java.lang.Object
it.unimi.di.mg4j.document.AbstractDocumentSequence
it.unimi.di.mg4j.document.AbstractDocumentCollection
it.unimi.di.mg4j.document.FileSetDocumentCollection
public class FileSetDocumentCollection
A DocumentCollection corresponding to a given set of files.
This class provides a main method with a flexible syntax that serialises a list of files, given on the command line or piped into standard input, into a document collection. Optionally, you can provide a parallel list of URIs, one to be associated with each file.
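As a rough illustration of the "command line or standard input" behaviour described above, the following sketch gathers file names from either source. It is not the actual syntax of this class's main method, which is not documented here; `FileListSketch` and `fileNames` are hypothetical names, and only the standard library is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of MG4J: gathers the file names a collection could be built from.
final class FileListSketch {
    /** Returns a copy of args if any were given, otherwise one name per non-empty line of input. */
    static String[] fileNames(String[] args, Readable input) {
        if (args.length > 0) return args.clone();
        List<String> names = new ArrayList<>();
        // The caller owns the input, so the scanner is deliberately not closed here.
        Scanner scanner = new Scanner(input);
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            if (!line.isEmpty()) names.add(line);
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Simulates piping two file names into standard input.
        String[] names = fileNames(new String[0], new StringReader("a.txt\nb.txt\n"));
        System.out.println(Arrays.toString(names));
    }
}
```

In a real command-line tool the second argument would typically be `new InputStreamReader(System.in)`.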
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection |
|---|
AbstractDocumentCollection.PropertyKeys |
| Field Summary |
|---|
| Fields inherited from interface it.unimi.di.mg4j.document.DocumentCollection |
|---|
DEFAULT_EXTENSION |
| Constructor Summary | |
|---|---|
| FileSetDocumentCollection(String[] file, DocumentFactory factory) | Builds a document collection corresponding to a given set of files specified as an array. |
| FileSetDocumentCollection(String[] file, String[] uri, DocumentFactory factory) | Builds a document collection corresponding to a given set of files specified as an array and a parallel array of URIs, one for each file. |
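The "parallel array" convention used by the second constructor can be sketched with a small stand-in class. This is not the MG4J implementation: `ParallelFileUris` is a hypothetical name, and both the length check and the fallback to the file name when no URIs are given are assumptions made for illustration.

```java
import java.util.Objects;

// Hypothetical stand-in, not the MG4J class: pairs each file with a URI from a parallel array.
final class ParallelFileUris {
    private final String[] file;
    private final String[] uri;

    ParallelFileUris(String[] file, String[] uri) {
        Objects.requireNonNull(file, "file");
        // Parallel arrays only make sense when they have the same length.
        if (uri != null && uri.length != file.length)
            throw new IllegalArgumentException("Expected " + file.length + " URIs, got " + uri.length);
        this.file = file.clone();
        // Assumption: when no URIs are supplied, the file name itself is used.
        this.uri = uri == null ? file.clone() : uri.clone();
    }

    int size() { return file.length; }
    String file(int index) { return file[index]; }
    String uri(int index) { return uri[index]; }

    public static void main(String[] args) {
        ParallelFileUris set = new ParallelFileUris(
            new String[] { "docs/a.html", "docs/b.html" },
            new String[] { "http://example.com/a", "http://example.com/b" });
        for (int i = 0; i < set.size(); i++)
            System.out.println(set.file(i) + " -> " + set.uri(i));
    }
}
```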
| Method Summary | |
|---|---|
| void | close() Closes this document sequence, releasing all resources. |
| FileSetDocumentCollection | copy() |
| Document | document(int index) Returns the document given its index. |
| DocumentFactory | factory() Returns the factory used by this sequence. |
| static void | main(String[] arg) |
| Reference2ObjectMap<Enum<?>,Object> | metadata(int index) Returns the metadata map for a document. |
| int | size() Returns the number of documents in this collection. |
| InputStream | stream(int index) Returns an input stream for the raw content of a document. |
| Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection |
|---|
ensureDocumentIndex, iterator, printAllDocuments, toString |
| Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentSequence |
|---|
filename, finalize, load |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface it.unimi.di.mg4j.document.DocumentSequence |
|---|
filename |
| Constructor Detail |
|---|
public FileSetDocumentCollection(String[] file,
DocumentFactory factory)
Beware. This class is not guaranteed to work if files are deleted or modified after creation!
Parameters:
file - an array containing the files that will be contained in the collection.
factory - the factory that will be used to create documents.
public FileSetDocumentCollection(String[] file,
String[] uri,
DocumentFactory factory)
Beware. This class is not guaranteed to work if files are deleted or modified after creation!
Parameters:
file - an array containing the files that will be contained in the collection.
uri - an array, parallel to file, containing URIs to be associated with each element of file.
factory - the factory that will be used to create documents.
| Method Detail |
|---|
public DocumentFactory factory()
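Description copied from interface: DocumentSequence
Returns the factory used by this sequence. Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
Specified by:
factory in interface DocumentSequence
public int size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.
Specified by:
size in interface DocumentCollection
public Reference2ObjectMap<Enum<?>,Object> metadata(int index)
Description copied from interface: DocumentCollection
Returns the metadata map for a document.
Specified by:
metadata in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).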
public Document document(int index)
throws IOException
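Description copied from interface: DocumentCollection
Returns the document given its index.
Specified by:
document in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the index-th document.
Throws:
IOException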
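The index contract shared by the index-based accessors (0 inclusive, size() exclusive) can be sketched as a simple range check. `IndexContractSketch` and `ensureIndex` are hypothetical names; the inherited ensureDocumentIndex method listed in the summary suggests a similar check, but its exact behaviour is not documented here, so the exception type below is an assumption.

```java
// Hypothetical sketch of the index contract: valid indices run from 0 (inclusive) to size (exclusive).
final class IndexContractSketch {
    /** Throws IndexOutOfBoundsException unless 0 <= index < size. */
    static void ensureIndex(int index, int size) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index " + index + " is not in [0, " + size + ")");
    }

    public static void main(String[] args) {
        int size = 3;
        ensureIndex(0, size); // first valid index
        ensureIndex(2, size); // last valid index
        try {
            ensureIndex(3, size); // size itself is excluded
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```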
public InputStream stream(int index)
throws IOException
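Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.
Specified by:
stream in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Throws:
IOException
public FileSetDocumentCollection copy()
Specified by:
copy in interface DocumentCollection
copy in interface FlyweightPrototype<DocumentCollection>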
public void close()
throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources. You should always call this method after having finished with this document sequence.
Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
Specified by:
close in interface DocumentSequence
close in interface Closeable
Overrides:
close in class AbstractDocumentSequence
Throws:
IOException
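Since close() is specified by Closeable, one way to follow the advice above without relying on finalisers is a try-with-resources block. The sketch below uses a hypothetical `TrackedSequence` class in place of a real document sequence, purely to show that close() runs when the block exits.

```java
import java.io.Closeable;

// Hypothetical sketch: closing a Closeable resource deterministically instead of relying on finalisers.
final class ClosingSketch {
    /** Stand-in for a document sequence; it only records whether close() was called. */
    static final class TrackedSequence implements Closeable {
        private boolean closed;
        @Override public void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Uses a sequence inside try-with-resources and returns it so callers can inspect its state. */
    static TrackedSequence useAndClose() {
        TrackedSequence sequence = new TrackedSequence();
        try (TrackedSequence s = sequence) {
            // Inside the block the resource is still open; documents would be read here.
            if (s.isClosed()) throw new IllegalStateException("closed too early");
        }
        return sequence; // close() has already run at this point
    }

    public static void main(String[] args) {
        System.out.println("Closed after use: " + useAndClose().isClosed());
    }
}
```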
public static void main(String[] arg)
throws IOException,
com.martiansoftware.jsap.JSAPException,
InstantiationException,
IllegalAccessException,
InvocationTargetException,
NoSuchMethodException
Throws:
IOException
com.martiansoftware.jsap.JSAPException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException