public interface DocumentFactory
A factory parsing and building documents of the same type.
Each document produced by the same factory has a number of fields,
which represent units of information that should be indexed
separately. The number of available fields can be retrieved by calling
numberOfFields(), their types by calling fieldType(int),
and their symbolic names by calling fieldName(int).
Factories contain the parsing and document-level breaking logic. For instance,
a factory for HTML documents might extract the text into a title and a body, and
expose them as DocumentFactory.FieldType.TEXT fields. Additionally, the last modification
date might be exposed as a DocumentFactory.FieldType.DATE field, and so on.
Warning: implementations of this interface are not required
to be thread-safe, but they provide flyweight copies.
The return type of the copy() method is strengthened so as to return an instance of this interface.
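The warning above suggests a one-copy-per-thread usage pattern. The following is a minimal sketch of that pattern, not MG4J code: CopyableFactory, ToyFactory, parseTitle and parseAll are hypothetical names introduced only for illustration, and the toy factory merely trims a string instead of parsing a real document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the copy contract described above; not the MG4J interface. */
interface CopyableFactory {
    CopyableFactory copy();
    String parseTitle(String raw);
}

/** Toy factory: immutable configuration is shared by copies, the scratch buffer is not. */
final class ToyFactory implements CopyableFactory {
    private final String[] fieldNames;                          // immutable, shared between copies
    private final StringBuilder scratch = new StringBuilder();  // mutable, private to each copy

    ToyFactory(String[] fieldNames) { this.fieldNames = fieldNames; }

    /** Covariant return type: the copy is known to be a ToyFactory. */
    @Override
    public ToyFactory copy() { return new ToyFactory(fieldNames); }

    @Override
    public String parseTitle(String raw) {
        scratch.setLength(0);        // reuses per-instance state: unsafe if one instance is shared
        scratch.append(raw.trim());
        return fieldNames[0] + "=" + scratch;
    }

    boolean sharesConfigurationWith(ToyFactory other) { return fieldNames == other.fieldNames; }
}

final class CopyPerThreadSketch {
    /** Parses each input on its own thread, giving every thread a private copy of the prototype. */
    static List<String> parseAll(CopyableFactory prototype, List<String> inputs) {
        String[] results = new String[inputs.size()];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            final int slot = i;
            final CopyableFactory mine = prototype.copy();   // copy made before the thread starts
            Thread t = new Thread(() -> results[slot] = mine.parseTitle(inputs.get(slot)));
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) t.join();               // join makes the results visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return Arrays.asList(results);
    }

    public static void main(String[] args) {
        ToyFactory prototype = new ToyFactory(new String[] { "title", "body" });
        System.out.println(parseAll(prototype, Arrays.asList(" first ", " second ")));
    }
}
```

In this sketch the copies share only the immutable array of field names, while each copy owns its scratch buffer, which is one plausible reading of "flyweight copy".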
| Nested Class Summary | |
|---|---|
| static class | DocumentFactory.FieldType: A field type. |
| Method Summary | |
|---|---|
| DocumentFactory | copy() |
| int | fieldIndex(String fieldName): Returns the index of a field, given its symbolic name. |
| String | fieldName(int field): Returns the symbolic name of a field. |
| DocumentFactory.FieldType | fieldType(int field): Returns the type of a field. |
| Document | getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata): Returns the document obtained by parsing the given byte stream. |
| int | numberOfFields(): Returns the number of fields present in the documents produced by this factory. |
| Method Detail |
|---|
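int numberOfFields()
Returns the number of fields present in the documents produced by this factory.

String fieldName(int field)
Returns the symbolic name of a field.
Parameters:
field - the index of a field (between 0 inclusive and numberOfFields() exclusive).
Returns:
the symbolic name of the field-th field.

int fieldIndex(String fieldName)
Returns the index of a field, given its symbolic name.
Parameters:
fieldName - the name of a field of this factory.
Returns:
the index of the field named fieldName.

DocumentFactory.FieldType fieldType(int field)
Returns the type of a field. The possible types are defined in DocumentFactory.FieldType.
Parameters:
field - the index of a field (between 0 inclusive and numberOfFields() exclusive).
Returns:
the type of the field-th field.
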
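The four field accessors are typically used together to enumerate what a factory exposes. The sketch below illustrates this with a hypothetical stand-in interface, FieldCatalog, that only mirrors the signatures documented here; it is not the MG4J interface. Its FieldType enum lists only the two types named on this page, and returning -1 for an unknown name is an assumption of the sketch, not something this page states.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in mirroring the four field accessors; not the MG4J interface. */
interface FieldCatalog {
    enum FieldType { TEXT, DATE }   // only the two types mentioned on this page

    int numberOfFields();
    String fieldName(int field);
    int fieldIndex(String fieldName);
    FieldType fieldType(int field);
}

/** Toy factory for HTML-like documents: a title, a body and a last-modification date. */
final class HtmlLikeCatalog implements FieldCatalog {
    private static final List<String> NAMES = Arrays.asList("title", "body", "date");
    private static final List<FieldType> TYPES =
            Arrays.asList(FieldType.TEXT, FieldType.TEXT, FieldType.DATE);

    @Override public int numberOfFields() { return NAMES.size(); }
    @Override public String fieldName(int field) { return NAMES.get(field); }
    // Assumption of this sketch: an unknown name yields -1 (the behaviour of List.indexOf).
    @Override public int fieldIndex(String fieldName) { return NAMES.indexOf(fieldName); }
    @Override public FieldType fieldType(int field) { return TYPES.get(field); }
}

final class FieldListingSketch {
    /** Builds a one-line description "index:name/type" for every field of the factory. */
    static String describe(FieldCatalog factory) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < factory.numberOfFields(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(i).append(':').append(factory.fieldName(i))
              .append('/').append(factory.fieldType(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "0:title/TEXT, 1:body/TEXT, 2:date/DATE"
        System.out.println(describe(new HtmlLikeCatalog()));
    }
}
```

Note that fieldIndex and fieldName are inverses of each other on valid input: for every index i in range, fieldIndex(fieldName(i)) gives back i.
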
Document getDocument(InputStream rawContent,
Reference2ObjectMap<Enum<?>,Object> metadata)
throws IOException
Returns the document obtained by parsing the given byte stream.
The metadata parameter compensates for the lack of a simple keyword-based
parameter-passing system in Java. This method might take several different types of “suggestions”
that have been gathered by the collection: typically, the document title, a URI representing
the document, its MIME type, its encoding and so on. Some of this information might be
set by default (as happens, for instance, in a PropertyBasedDocumentFactory).
Implementations of this method must consult the metadata provided by the collection, possibly
complete it with default factory metadata, and then proceed to construct the document.
Parameters:
rawContent - the raw content from which the document should be extracted; it must not be closed, as resource management is a responsibility of the DocumentCollection.
metadata - a map from enums (e.g., keys taken in PropertyBasedDocumentFactory) to various kinds of objects.
Throws:
IOException

DocumentFactory copy()
Specified by:
copy in interface FlyweightPrototype<DocumentFactory>
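
The description of getDocument asks implementations to consult the metadata supplied by the collection and to complete it with factory defaults. One way this step might look is sketched below. The sketch is self-contained and makes several assumptions: java.util.IdentityHashMap stands in for Reference2ObjectMap (both compare keys by reference), and the Key enum and the resolve method are hypothetical names, not part of MG4J.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class MetadataDefaultsSketch {
    /** Hypothetical keys, modelled on the suggestions listed above (title, URI, MIME type, encoding). */
    enum Key { TITLE, URI, MIMETYPE, ENCODING }

    /**
     * Returns a new map holding the factory defaults, overridden by whatever the
     * collection supplied. Neither argument is modified.
     */
    static Map<Enum<?>, Object> resolve(Map<Enum<?>, Object> fromCollection,
                                        Map<Enum<?>, Object> factoryDefaults) {
        Map<Enum<?>, Object> resolved = new IdentityHashMap<>(factoryDefaults);
        resolved.putAll(fromCollection);   // collection-provided values win over defaults
        return resolved;
    }

    public static void main(String[] args) {
        Map<Enum<?>, Object> defaults = new IdentityHashMap<>();
        defaults.put(Key.ENCODING, "UTF-8");
        defaults.put(Key.MIMETYPE, "text/html");

        Map<Enum<?>, Object> supplied = new IdentityHashMap<>();
        supplied.put(Key.TITLE, "Home page");
        supplied.put(Key.ENCODING, "ISO-8859-1");

        Map<Enum<?>, Object> resolved = resolve(supplied, defaults);
        System.out.println(resolved.get(Key.ENCODING));   // prints "ISO-8859-1" (supplied value)
        System.out.println(resolved.get(Key.MIMETYPE));   // prints "text/html" (factory default)
    }
}
```

Copying into a fresh map keeps the caller's metadata untouched, which seems a sensible choice when the same map may be reused by the collection for other documents.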