Package it.unimi.di.big.mg4j.document
Class ConcatenatedDocumentCollection
java.lang.Object
  it.unimi.di.big.mg4j.document.AbstractDocumentSequence
    it.unimi.di.big.mg4j.document.AbstractDocumentCollection
      it.unimi.di.big.mg4j.document.ConcatenatedDocumentCollection

All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable, Serializable, AutoCloseable

public class ConcatenatedDocumentCollection extends AbstractDocumentCollection implements Serializable
A document collection that exposes a list of underlying document collections, called segments, as a single collection. The underlying collections are (virtually) concatenated: the first document of the second collection is renumbered to the size of the first collection, and so on. All underlying collections must use the same factory class.

A main method makes it easy to create concatenated collections given the filenames of the component collections.
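
The renumbering described above can be illustrated with a small, self-contained sketch. It is not taken from the MG4J sources: the class `ConcatenatedIndex` and its methods are hypothetical names, and the use of cumulative sizes with a binary search is an assumption about one reasonable way to map a global document index to a segment and a local index.

```java
/**
 * Hypothetical helper (not part of MG4J): maps a global document index
 * to the segment that contains it and to the index inside that segment.
 */
final class ConcatenatedIndex {
    /** cumulative[i] is the global index of the first document of segment i;
     *  the last entry is the total number of documents. */
    private final long[] cumulative;

    ConcatenatedIndex(long... segmentSizes) {
        cumulative = new long[segmentSizes.length + 1];
        for (int i = 0; i < segmentSizes.length; i++) {
            if (segmentSizes[i] < 0) throw new IllegalArgumentException("Negative size: " + segmentSizes[i]);
            cumulative[i + 1] = cumulative[i] + segmentSizes[i];
        }
    }

    /** Total number of documents: the sum of the segment sizes. */
    long size() {
        return cumulative[cumulative.length - 1];
    }

    /** Returns the segment containing the given global index. */
    int segment(long index) {
        if (index < 0 || index >= size()) throw new IndexOutOfBoundsException(Long.toString(index));
        // Invariant: cumulative[lo] <= index < cumulative[hi].
        int lo = 0, hi = cumulative.length - 1;
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] <= index) lo = mid;
            else hi = mid;
        }
        return lo; // Empty segments are skipped automatically.
    }

    /** Returns the index of the document inside its own segment. */
    long localIndex(long index) {
        return index - cumulative[segment(index)];
    }
}
```

With segments of sizes 3 and 2, global index 3 maps to segment 1, local index 0: the first document of the second collection is numbered with the size of the first.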
See Also:
Serialized Form

Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection
AbstractDocumentCollection.PropertyKeys

Field Summary
Fields inherited from interface it.unimi.di.big.mg4j.document.DocumentCollection
DEFAULT_EXTENSION

Constructor Summary
Constructors (modifier, constructor, description):
public ConcatenatedDocumentCollection(String... collectionName)
    Creates a new, partially uninitialised concatenated document collection using the given component collection names.
protected ConcatenatedDocumentCollection(String[] collectionName, DocumentCollection[] collection)
    Creates a new concatenated document collection using the given component collections.

Method Summary
Methods (modifier and type, method, description):
void close()
    Closes this document sequence, releasing all resources.
DocumentCollection copy()
Document document(long index)
    Returns the document given its index.
DocumentFactory factory()
    Returns the factory used by this sequence.
void filename(CharSequence filename)
    Does nothing.
static void main(String[] arg)
Reference2ObjectMap<Enum<?>,Object> metadata(long index)
    Returns the metadata map for a document.
long size()
    Returns the number of documents in this collection.
InputStream stream(long index)
    Returns an input stream for the raw content of a document.
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection
ensureDocumentIndex, iterator, printAllDocuments, toString
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentSequence
finalize, load

Constructor Detail
ConcatenatedDocumentCollection
protected ConcatenatedDocumentCollection(String[] collectionName, DocumentCollection[] collection)
Creates a new concatenated document collection using the given component collections.
Parameters:
collection - a list of component collections.

ConcatenatedDocumentCollection
public ConcatenatedDocumentCollection(String... collectionName) throws IllegalArgumentException, SecurityException
Creates a new, partially uninitialised concatenated document collection using the given component collection names.
Parameters:
collectionName - a list of names of component collections.
Throws:
IllegalArgumentException
SecurityException

Method Detail
filename
public void filename(CharSequence filename)
Description copied from class: AbstractDocumentSequence
Does nothing.
Specified by:
filename in interface DocumentSequence
Overrides:
filename in class AbstractDocumentSequence
Parameters:
filename - the filename of this document sequence.

copy
public DocumentCollection copy()
Specified by:
copy in interface DocumentCollection
Specified by:
copy in interface FlyweightPrototype<DocumentCollection>

document
public Document document(long index) throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.
Specified by:
document in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the index-th document.
Throws:
IOException

metadata
public Reference2ObjectMap<Enum<?>,Object> metadata(long index) throws IOException
Description copied from interface: DocumentCollection
Returns the metadata map for a document.
Specified by:
metadata in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the metadata map for the document.
Throws:
IOException

size
public long size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.
Specified by:
size in interface DocumentCollection
Returns:
the number of documents in this collection.

stream
public InputStream stream(long index) throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.
Specified by:
stream in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the raw content of the document as an input stream.
Throws:
IOException

factory
public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.
Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
Specified by:
factory in interface DocumentSequence
Returns:
the factory used by this sequence.

close
public void close() throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.
You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
Specified by:
close in interface DocumentSequence
Overrides:
close in class AbstractDocumentSequence
Throws:
IOException
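
Since the class implements AutoCloseable, a try-with-resources statement is one way to make sure close() is always called rather than relying on a finaliser. The sketch below uses only the Java standard library. `TrackedSegment`, `CompositeCloser` and `CloseDemo` are hypothetical stand-ins, and closing every segment in turn is an assumption about what a concatenated collection might do, not a description of the actual MG4J implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a segment; it only records whether close() was called. */
final class TrackedSegment implements Closeable {
    boolean closed;

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical composite that closes every segment, even if one of them fails. */
final class CompositeCloser implements Closeable {
    private final List<? extends Closeable> segments;

    CompositeCloser(List<? extends Closeable> segments) {
        this.segments = segments;
    }

    @Override
    public void close() throws IOException {
        IOException first = null;
        for (Closeable segment : segments) {
            try {
                segment.close();
            } catch (IOException e) {
                // Remember the first failure and attach later ones to it.
                if (first == null) first = e;
                else first.addSuppressed(e);
            }
        }
        if (first != null) throw first;
    }
}

final class CloseDemo {
    /** Returns true if both segments were closed when the try block exited. */
    static boolean run() {
        TrackedSegment a = new TrackedSegment();
        TrackedSegment b = new TrackedSegment();
        try (CompositeCloser all = new CompositeCloser(Arrays.asList(a, b))) {
            // Work with the collection here; close() runs automatically on exit.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return a.closed && b.closed;
    }
}
```

The same try-with-resources pattern applies to any DocumentCollection, because the interface list above includes AutoCloseable.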

main
public static void main(String[] arg) throws IOException, com.martiansoftware.jsap.JSAPException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Throws:
IOException
com.martiansoftware.jsap.JSAPException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException