Class SubsetDocumentSequence
- java.lang.Object
- it.unimi.di.big.mg4j.document.AbstractDocumentSequence
- it.unimi.di.big.mg4j.document.SubsetDocumentSequence
- All Implemented Interfaces:
- DocumentSequence, SafelyCloseable, Closeable, Serializable, AutoCloseable
public class SubsetDocumentSequence extends AbstractDocumentSequence implements Serializable
A collection that exposes a subset of documents (possibly not contiguous) from a given sequence.
This class provides several string-based constructors that use the ObjectParser conventions; they can be used to easily generate subcollections from the command line.
- Author:
- Paolo Boldi
- See Also:
- Serialized Form
Constructor Summary
Constructors
- SubsetDocumentSequence(DocumentSequence underlyingSequence, long first, long last): Creates a new subsequence.
- SubsetDocumentSequence(DocumentSequence underlyingSequence, LongSet documents): Creates a new subsequence.
- SubsetDocumentSequence(String underlyingSequenceFilename, String documentFileFilename): Creates a new subsequence.
- SubsetDocumentSequence(String underlyingSequenceFilename, String first, String last): Creates a new subsequence.
Method Summary
Methods
- void close(): Closes this document sequence, releasing all resources.
- DocumentFactory factory(): Returns the factory used by this sequence.
- DocumentIterator iterator(): Returns an iterator over the sequence of documents.
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentSequence
filename, finalize, load
Constructor Detail
-
SubsetDocumentSequence
public SubsetDocumentSequence(DocumentSequence underlyingSequence, LongSet documents)
Creates a new subsequence.
- Parameters:
- underlyingSequence - the underlying document sequence.
- documents - the set of documents in the subsequence.
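The selection of a possibly non-contiguous subset can be pictured as a single pass over the underlying sequence that keeps only the positions found in the set. The sketch below is a plain-Java stand-in, not the MG4J implementation: `SubsetFilterSketch` and `select` are hypothetical names, `java.util.Set<Long>` replaces `LongSet`, and documents are assumed to be numbered from 0 in iteration order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in (hypothetical); it does not use any MG4J class.
class SubsetFilterSketch {
    /** Returns the items whose 0-based position in the source belongs to keep. */
    static <T> List<T> select(Iterator<T> source, Set<Long> keep) {
        List<T> selected = new ArrayList<>();
        long index = 0; // position of the current item in the underlying sequence
        while (source.hasNext()) {
            T item = source.next();
            if (keep.contains(index)) selected.add(item);
            index++;
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("d0", "d1", "d2", "d3", "d4");
        // A non-contiguous subset: documents 0, 2 and 4.
        System.out.println(select(docs.iterator(), Set.of(0L, 2L, 4L))); // prints [d0, d2, d4]
    }
}
```

The result follows the order of the underlying sequence, not the order in which the set enumerates its elements.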
-
SubsetDocumentSequence
public SubsetDocumentSequence(DocumentSequence underlyingSequence, long first, long last)
Creates a new subsequence.
- Parameters:
- underlyingSequence - the underlying document sequence.
- first - the first document (inclusive) in the subsequence.
- last - the last document (exclusive) in this subsequence.
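Because first is inclusive and last is exclusive, the two bounds describe the half-open interval [first, last). A minimal sketch of that convention follows; `HalfOpenRangeSketch`, `inRange` and `size` are hypothetical helpers written for illustration, and `size` assumes the underlying sequence contains at least `last` documents.

```java
// Plain-Java illustration of the [first, last) convention (hypothetical helper class).
class HalfOpenRangeSketch {
    /** True if document index k falls in [first, last): first is included, last is not. */
    static boolean inRange(long k, long first, long last) {
        return k >= first && k < last;
    }

    /** Number of indices in [first, last); zero when the range is empty. */
    static long size(long first, long last) {
        return Math.max(0, last - first);
    }

    public static void main(String[] args) {
        System.out.println(inRange(10, 10, 20)); // prints true: first is inclusive
        System.out.println(inRange(20, 10, 20)); // prints false: last is exclusive
        System.out.println(size(10, 20));        // prints 10
    }
}
```

Under this convention, passing the same value for both bounds selects no documents.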
-
SubsetDocumentSequence
public SubsetDocumentSequence(String underlyingSequenceFilename, String documentFileFilename) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subsequence.
- Parameters:
- underlyingSequenceFilename - the filename of the underlying document sequence.
- documentFileFilename - the filename of a file containing a serialized version of the set of document pointers to be retained.
- Throws:
- NumberFormatException
- IllegalArgumentException
- SecurityException
- IOException
- ClassNotFoundException
-
SubsetDocumentSequence
public SubsetDocumentSequence(String underlyingSequenceFilename, String first, String last) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subsequence.
- Parameters:
- underlyingSequenceFilename - the filename of the underlying document sequence.
- first - the first document (inclusive) in the subsequence.
- last - the last document (exclusive) in this subsequence.
- Throws:
- NumberFormatException
- IllegalArgumentException
- SecurityException
- IOException
- ClassNotFoundException
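Since this constructor receives its bounds as strings, a non-numeric bound is one plausible source of the declared NumberFormatException. The sketch below assumes the strings are read as decimal longs; that is an assumption about the conversion, not something this page states, and `StringBoundsSketch` and `parseBounds` are hypothetical names.

```java
// Hypothetical illustration of converting string bounds; not taken from the MG4J sources.
class StringBoundsSketch {
    /** Parses both bounds as decimal longs and returns them as {first, last}. */
    static long[] parseBounds(String first, String last) {
        long f = Long.parseLong(first); // throws NumberFormatException on non-numeric input
        long l = Long.parseLong(last);
        return new long[] { f, l };
    }

    public static void main(String[] args) {
        long[] bounds = parseBounds("0", "1000");
        System.out.println(bounds[0] + " " + bounds[1]); // prints 0 1000
        try {
            parseBounds("zero", "1000");
        } catch (NumberFormatException e) {
            System.out.println("rejected non-numeric bound");
        }
    }
}
```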
-
-
Method Detail
-
iterator
public DocumentIterator iterator() throws IOException
Description copied from interface: DocumentSequence
Returns an iterator over the sequence of documents.
Warning: this method can be safely called only once. For instance, implementations based on standard input will usually throw an exception if this method is called twice.
Implementations may decide to override this restriction (in particular, if they implement DocumentCollection). Usually, however, it is not possible to obtain two iterators at the same time on a collection.
- Specified by:
- iterator in interface DocumentSequence
- Returns:
- an iterator over the sequence of documents.
- Throws:
- IOException
- See Also:
- DocumentCollection
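The single-call restriction can be illustrated with a sequence that hands out its underlying iterator once and rejects any later request. This is a plain-Java sketch under stated assumptions: `OneShotSequenceSketch` is a hypothetical class, it uses `java.util.Iterator` rather than DocumentIterator, and the choice of IllegalStateException is illustrative only.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical one-shot sequence; it mimics the restriction, not the MG4J classes.
class OneShotSequenceSketch<T> {
    private final Iterator<T> source;
    private boolean consumed = false;

    OneShotSequenceSketch(Iterator<T> source) {
        this.source = source;
    }

    /** Hands out the underlying iterator once; a second call fails fast. */
    Iterator<T> iterator() {
        if (consumed) throw new IllegalStateException("iterator() was already called");
        consumed = true;
        return source;
    }

    public static void main(String[] args) {
        OneShotSequenceSketch<String> seq =
            new OneShotSequenceSketch<>(List.of("d0", "d1").iterator());
        Iterator<String> it = seq.iterator();
        while (it.hasNext()) System.out.println(it.next());
        try {
            seq.iterator(); // second request for an iterator
        } catch (IllegalStateException e) {
            System.out.println("second call rejected");
        }
    }
}
```

Callers that need several passes should therefore not assume that a second iterator is available unless the implementation documents it.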
-
factory
public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.
Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
- Specified by:
- factory in interface DocumentSequence
- Returns:
- the factory used by this sequence.
-
close
public void close() throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.
You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
- Specified by:
- close in interface AutoCloseable
- Specified by:
- close in interface Closeable
- Specified by:
- close in interface DocumentSequence
- Overrides:
- close in class AbstractDocumentSequence
- Throws:
- IOException
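Since the class implements Closeable and AutoCloseable, one way to guarantee the call to close() without relying on finalisers is a try-with-resources statement. The sketch below uses a hypothetical stand-in resource (`TrackedResource`) instead of a real document sequence, so that it runs with the JDK alone.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in showing that try-with-resources closes the resource even on failure.
class CloseSketch {
    /** Resource that records whether close() was called. */
    static class TrackedResource implements Closeable {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    /** Fails while the resource is open and reports whether it was closed anyway. */
    static boolean closedAfterFailure() throws IOException {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            throw new IllegalStateException("simulated failure while reading documents");
        } catch (IllegalStateException e) {
            // close() has already run by the time this handler executes.
        }
        return resource.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(closedAfterFailure()); // prints true
    }
}
```

With a real sequence, the variable declared in the try header would be the sequence itself, and iteration would take place inside the block.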
-
-