Class AbstractTikaDocumentFactory
- java.lang.Object
  - it.unimi.di.big.mg4j.document.AbstractDocumentFactory
    - it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
      - it.unimi.di.big.mg4j.document.tika.AbstractTikaDocumentFactory
- All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
- Direct Known Subclasses:
AbstractSimpleTikaDocumentFactory
public abstract class AbstractTikaDocumentFactory extends PropertyBasedDocumentFactory
An abstract document factory that provides the mapping from field names to field indices. Concrete subclasses must implement the method fields(), providing the list of Tika fields.
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
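The name-to-index mapping described above can be illustrated with a minimal, self-contained sketch. It does not use MG4J or Tika: SketchFactory and TitleAndBodyFactory are hypothetical stand-ins, and plain strings take the place of TikaField objects. It only shows the general pattern, in which a subclass supplies an ordered list and the base class derives names and indices from list positions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldMappingSketch {
    /** Hypothetical stand-in for the abstract factory; this is NOT the MG4J class. */
    abstract static class SketchFactory {
        // Built lazily from fields() on the first lookup by name.
        private Map<String, Integer> name2Index;

        /** Subclasses list their field names; the list position becomes the field index. */
        protected abstract List<String> fields();

        public int numberOfFields() {
            return fields().size();
        }

        public String fieldName(int field) {
            return fields().get(field);
        }

        /** Returns the index of the named field, or -1 if there is no such field. */
        public int fieldIndex(String fieldName) {
            if (name2Index == null) {
                name2Index = new HashMap<>();
                List<String> names = fields();
                for (int i = 0; i < names.size(); i++) name2Index.put(names.get(i), i);
            }
            Integer index = name2Index.get(fieldName);
            return index == null ? -1 : index;
        }
    }

    /** Hypothetical concrete subclass that only has to provide the field list. */
    static class TitleAndBodyFactory extends SketchFactory {
        @Override
        protected List<String> fields() {
            return Arrays.asList("title", "body");
        }
    }

    public static void main(String[] args) {
        SketchFactory factory = new TitleAndBodyFactory();
        System.out.println(factory.numberOfFields());     // prints 2
        System.out.println(factory.fieldIndex("body"));   // prints 1
        System.out.println(factory.fieldIndex("author")); // prints -1
    }
}
```

In this sketch the subclass never deals with indices directly; reordering the list returned by fields() is enough to change the index assigned to each field.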
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
PropertyBasedDocumentFactory.MetadataKeys
-
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
-
-
Field Summary
-
Fields inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata
-
-
Constructor Summary
Constructors
AbstractTikaDocumentFactory()
AbstractTikaDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
AbstractTikaDocumentFactory(Properties properties)
AbstractTikaDocumentFactory(String[] property)
-
Method Summary
Modifier and Type | Method | Description
int | fieldIndex(String fieldName) | Returns the index of a field, given its symbolic name.
String | fieldName(int field) | Returns the symbolic name of a field.
protected abstract List<TikaField> | fields() | Returns the list of Tika fields (they will be mapped to MG4J fields whose index is their index in the list).
DocumentFactory.FieldType | fieldType(int field) | Returns the type of a field.
int | numberOfFields() | Returns the number of fields present in the documents produced by this factory.
-
Methods inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, parseProperty, resolve, resolve, resolveNotNull, sameKey, toString
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
copy, getDocument
-
-
-
-
Constructor Detail
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(Properties properties) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(String[] property) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory()
-
-
Method Detail
-
numberOfFields
public int numberOfFields()
Description copied from interface: DocumentFactory
Returns the number of fields present in the documents produced by this factory.
- Returns:
- the number of fields present in the documents produced by this factory.
-
fieldName
public String fieldName(int field)
Description copied from interface: DocumentFactory
Returns the symbolic name of a field.
- Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the symbolic name of the field-th field.
-
fieldIndex
public int fieldIndex(String fieldName)
Description copied from interface: DocumentFactory
Returns the index of a field, given its symbolic name.
- Parameters:
fieldName - the name of a field of this factory.
- Returns:
- the corresponding index, or -1 if there is no field with name fieldName.
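Because fieldIndex signals an unknown name with -1 rather than an exception, callers usually need to check the result before using it as an index. The following is a minimal sketch of such a guard. FIELD_NAMES and requireFieldIndex are hypothetical names introduced for illustration, and the local fieldIndex method only imitates the documented contract; it is not the MG4J implementation.

```java
import java.util.Arrays;
import java.util.List;

public class FieldLookupSketch {
    // Hypothetical field list; a real factory would supply its own names.
    static final List<String> FIELD_NAMES = Arrays.asList("title", "body");

    /** Imitates the documented contract: the index of the name, or -1 when absent. */
    static int fieldIndex(String fieldName) {
        return FIELD_NAMES.indexOf(fieldName);
    }

    /** Hypothetical helper: resolves a name and fails loudly instead of passing -1 on. */
    static int requireFieldIndex(String fieldName) {
        int index = fieldIndex(fieldName);
        if (index == -1) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(requireFieldIndex("body")); // prints 1
        try {
            requireFieldIndex("author");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Unknown field: author
        }
    }
}
```

Checking at the point of lookup keeps a stray -1 from surfacing later as an out-of-range index in fieldName or fieldType.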
-
fieldType
public DocumentFactory.FieldType fieldType(int field)
Description copied from interface: DocumentFactory
Returns the type of a field.
The possible types are defined in DocumentFactory.FieldType.
- Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the type of the field-th field.
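The index-based accessors are typically used together: a caller walks the indices from 0 (inclusive) to numberOfFields() (exclusive) and asks for the name and type of each. The sketch below shows that loop against hypothetical stand-ins. The local FieldType enum, its two constants, the field names, and the bounds check are all assumptions made for illustration and do not reproduce DocumentFactory.FieldType or MG4J's behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldTypeSketch {
    /** Local stand-in for a field-type enum; the constants are illustrative only. */
    enum FieldType { TEXT, INT }

    // Hypothetical parallel arrays: position i describes field i.
    static final String[] NAMES = { "title", "pages" };
    static final FieldType[] TYPES = { FieldType.TEXT, FieldType.INT };

    static int numberOfFields() {
        return NAMES.length;
    }

    static String fieldName(int field) {
        checkIndex(field);
        return NAMES[field];
    }

    static FieldType fieldType(int field) {
        checkIndex(field);
        return TYPES[field];
    }

    // Assumed behaviour: reject indices outside [0, numberOfFields()).
    private static void checkIndex(int field) {
        if (field < 0 || field >= numberOfFields()) {
            throw new IndexOutOfBoundsException("No such field: " + field);
        }
    }

    /** Builds one description line per field by walking every valid index. */
    static List<String> describeFields() {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < numberOfFields(); i++) {
            lines.add(i + ": " + fieldName(i) + " (" + fieldType(i) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeFields()) System.out.println(line);
    }
}
```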
-
-