Class TRECHeaderDocumentFactory
- java.lang.Object
  - it.unimi.di.big.mg4j.document.AbstractDocumentFactory
    - it.unimi.di.big.mg4j.document.TRECHeaderDocumentFactory
- All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class TRECHeaderDocumentFactory extends AbstractDocumentFactory
A factory without fields that is used to interpret the header of a TREC GOV2 document. It is usually the first factory to interpret a document of a TRECDocumentCollection.
Presently, its only rôle is that of parsing the document URI and setting a metadata item with key PropertyBasedDocumentFactory.MetadataKeys.URI.
- Author:
- Alessio Orlandi, Luca Natali
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
-
-
Constructor Summary
Constructors
Constructor / Description
TRECHeaderDocumentFactory()
-
Method Summary
Modifier and Type / Method / Description
DocumentFactory copy()
int fieldIndex(String fieldName)
    Returns the index of a field, given its symbolic name.
String fieldName(int fieldIndex)
    Returns the symbolic name of a field.
DocumentFactory.FieldType fieldType(int fieldIndex)
    Returns the type of a field.
Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata)
    Returns the document obtained by parsing the given byte stream.
int numberOfFields()
    Returns the number of fields present in the documents produced by this factory.
protected static boolean startsWith(byte[] a, int l, byte[] b)
protected static boolean startsWithIgnoreCase(byte[] a, int l, char[] b)
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex, toString
-
-
-
-
Method Detail
-
numberOfFields
public int numberOfFields()
Description copied from interface: DocumentFactory
Returns the number of fields present in the documents produced by this factory.
- Returns:
- the number of fields present in the documents produced by this factory.
-
fieldName
public String fieldName(int fieldIndex)
Description copied from interface: DocumentFactory
Returns the symbolic name of a field.
- Parameters:
fieldIndex - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the symbolic name of the fieldIndex-th field.
-
fieldIndex
public int fieldIndex(String fieldName)
Description copied from interface: DocumentFactory
Returns the index of a field, given its symbolic name.
- Parameters:
fieldName - the name of a field of this factory.
- Returns:
- the corresponding index, or -1 if there is no field with name
fieldName.
-
fieldType
public DocumentFactory.FieldType fieldType(int fieldIndex)
Description copied from interface: DocumentFactory
Returns the type of a field.
The possible types are defined in DocumentFactory.FieldType.
- Parameters:
fieldIndex - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the type of the fieldIndex-th field.
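The field-related contract above can be illustrated for the degenerate case of a factory without fields. The following is a minimal sketch, not MG4J code: FieldCatalog and EmptyFieldCatalog are hypothetical stand-ins for the relevant part of DocumentFactory, and throwing IndexOutOfBoundsException from fieldName is an assumption about how an out-of-range index might be rejected.

```java
/** Hypothetical stand-in for the field-related part of DocumentFactory (not the MG4J interface). */
interface FieldCatalog {
    int numberOfFields();
    String fieldName(int fieldIndex);
    int fieldIndex(String fieldName);
}

/** A catalog with no fields, mirroring the idea of a "factory without fields". */
final class EmptyFieldCatalog implements FieldCatalog {
    @Override
    public int numberOfFields() {
        return 0; // no fields at all
    }

    @Override
    public String fieldName(int fieldIndex) {
        // Valid indices lie in [0, numberOfFields()), which is empty here,
        // so every index is rejected (assumed behaviour).
        throw new IndexOutOfBoundsException("No field with index " + fieldIndex);
    }

    @Override
    public int fieldIndex(String fieldName) {
        return -1; // the documented "no field with this name" value
    }
}
```

Under these assumptions, a caller can rely on fieldIndex returning -1 for any name and need never call fieldName or a type lookup.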
-
startsWith
protected static boolean startsWith(byte[] a, int l, byte[] b)
-
startsWithIgnoreCase
protected static boolean startsWithIgnoreCase(byte[] a, int l, char[] b)
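Neither startsWith nor startsWithIgnoreCase is documented here. One plausible reading of the signatures is that l is the number of valid bytes in a, and that the methods test whether those bytes begin with b. The sketch below implements that assumed meaning in a hypothetical PrefixChecks class; it is not taken from the MG4J sources, and it assumes l is at most a.length.

```java
final class PrefixChecks {
    private PrefixChecks() {}

    /** Assumed semantics: true if the first l bytes of a begin with all of b. */
    static boolean startsWith(byte[] a, int l, byte[] b) {
        if (b.length > l) return false; // prefix longer than the valid data
        for (int i = 0; i < b.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    /** Assumed semantics: as above, but comparing letters case-insensitively against chars. */
    static boolean startsWithIgnoreCase(byte[] a, int l, char[] b) {
        if (b.length > l) return false;
        for (int i = 0; i < b.length; i++) {
            // Treat each byte as an unsigned 8-bit character before folding case.
            char fromBytes = Character.toLowerCase((char) (a[i] & 0xFF));
            if (fromBytes != Character.toLowerCase(b[i])) return false;
        }
        return true;
    }
}
```

With this reading, a buffer holding the illustrative bytes of "<DOCHDR> x" would match the prefix "<DOCHDR>" exactly and "<dochdr>" case-insensitively, but would fail either check if l were smaller than the prefix length.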
-
getDocument
public Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata) throws IOException
Description copied from interface: DocumentFactory
Returns the document obtained by parsing the given byte stream.
The parameter metadata compensates for the lack of a simple keyword-based parameter-passing system in Java. This method might take several different types of “suggestions” which have been collected by the collection: typically, the document title, a URI representing the document, its MIME type, its encoding and so on. Some of this information might be set by default (as happens, for instance, in a PropertyBasedDocumentFactory). Implementations of this method must consult the metadata provided by the collection, possibly complete it with default factory metadata, and proceed to the document construction.
- Parameters:
rawContent - the raw content from which the document should be extracted; it must not be closed, as resource management is the responsibility of the DocumentCollection.
metadata - a map from enums (e.g., keys taken in PropertyBasedDocumentFactory) to various kinds of objects.
- Returns:
- the document obtained by parsing the given byte stream.
- Throws:
IOException
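The "consult the collection metadata, then fall back to factory defaults" step described for getDocument can be sketched as follows. This is an illustration under stated assumptions, not the MG4J implementation: MetadataLookup, its Key enum and resolve are hypothetical names, and java.util.IdentityHashMap stands in for fastutil's Reference2ObjectMap so that the example needs only the JDK.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class MetadataLookup {
    /** Hypothetical keys; MG4J defines its own in PropertyBasedDocumentFactory.MetadataKeys. */
    enum Key { TITLE, URI, ENCODING }

    private MetadataLookup() {}

    /** Returns the collection-supplied value for key, or the factory default if absent. */
    static Object resolve(Map<Enum<?>, Object> metadata, Map<Enum<?>, Object> defaults, Enum<?> key) {
        Object value = metadata.get(key);
        return value != null ? value : defaults.get(key);
    }

    public static void main(String[] args) {
        // Metadata as it might be supplied by a collection (illustrative values).
        Map<Enum<?>, Object> fromCollection = new IdentityHashMap<>();
        fromCollection.put(Key.URI, "http://example.com/doc");

        // Defaults as they might be configured on a factory (illustrative values).
        Map<Enum<?>, Object> factoryDefaults = new IdentityHashMap<>();
        factoryDefaults.put(Key.ENCODING, "UTF-8");

        System.out.println(resolve(fromCollection, factoryDefaults, Key.URI));      // http://example.com/doc
        System.out.println(resolve(fromCollection, factoryDefaults, Key.ENCODING)); // UTF-8
        System.out.println(resolve(fromCollection, factoryDefaults, Key.TITLE));    // null
    }
}
```

Keying the map by enum constants gives each "suggestion" a typed, collision-free name, which is what lets the map play the part of keyword arguments.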
-
copy
public DocumentFactory copy()
-
-