public interface DocumentCollectionBuilder
An interface for classes that can build collections during the indexing process.
A builder is usually based on a basename.
The same builder can build many different collections: each call to open(CharSequence) specifies a suffix that is appended to the basename. Creating several collections is a simple way to make collection construction scalable: for instance, Scan creates several collections, one per batch, and then puts them together using a ConcatenatedDocumentCollection.
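The batching pattern just described can be sketched as follows, under heavy simplifying assumptions: InMemoryBuilder is a hypothetical stand-in that only records the name of each opened collection and how many documents it received, and the "@" + batch suffix scheme is an illustrative assumption, not necessarily the one Scan uses. The final concatenation into a ConcatenatedDocumentCollection is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedBuilding {
    // Hypothetical, heavily simplified stand-in for a DocumentCollectionBuilder:
    // it only records the name of each opened collection and its document count.
    public static class InMemoryBuilder {
        private final String basename;
        private final List<String> opened = new ArrayList<>();
        private final List<Integer> sizes = new ArrayList<>();
        private int current = -1; // index of the open collection, -1 if none

        public InMemoryBuilder(String basename) { this.basename = basename; }

        public String basename() { return basename; }

        // Like open(CharSequence): the new collection is named basename + suffix.
        public void open(CharSequence suffix) {
            opened.add(basename + suffix);
            sizes.add(0);
            current = sizes.size() - 1;
        }

        // Stands in for a whole startDocument() ... endDocument() sequence.
        public void addDocument() {
            if (current < 0) throw new IllegalStateException("no open collection");
            sizes.set(current, sizes.get(current) + 1);
        }

        public void close() { current = -1; }

        public List<String> opened() { return opened; }
        public List<Integer> sizes() { return sizes; }
    }

    // Builds one collection per batch of batchSize documents, reusing the same
    // builder and distinguishing the collections only by their suffix.
    public static InMemoryBuilder build(String basename, int numDocs, int batchSize) {
        InMemoryBuilder builder = new InMemoryBuilder(basename);
        int batch = 0;
        for (int start = 0; start < numDocs; start += batchSize) {
            builder.open("@" + batch++); // hypothetical suffix scheme
            int end = Math.min(start + batchSize, numDocs);
            for (int d = start; d < end; d++) builder.addDocument();
            builder.close();
        }
        return builder;
    }

    public static void main(String[] args) {
        InMemoryBuilder b = build("coll", 10, 4);
        // Prints: [coll@0, coll@1, coll@2] [4, 4, 2]
        System.out.println(b.opened() + " " + b.sizes());
    }
}
```

With ten documents and batches of four, the single builder produces three collections whose names differ only in the suffix; a real indexer would then combine them into one logical collection.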
After creating an instance of this class and opening a new collection, new documents can be added incrementally. Each document must be started with startDocument(CharSequence, CharSequence)
and ended with endDocument(); inside each document, each non-text field must be written by passing
an object to nonTextField(Object), whereas each text field must be
started with startTextField() and ended with endTextField(): in between, a call
to add(MutableString, MutableString) must be made for each word/nonword pair retrieved
from the original collection. At the end, close() terminates the construction of the collection;
the resulting collection (e.g., a ZipDocumentCollection) must then be serialised.
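The call protocol above can be illustrated with a small sketch. SimpleBuilder is a hypothetical, JDK-only mirror of this interface: it uses CharSequence instead of MutableString and omits the IOException clauses. CheckingBuilder logs every call and rejects calls made out of the documented order; as documented, add() outside a text field is silently ignored. The helper indexOne is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderProtocol {
    // Hypothetical simplified mirror of DocumentCollectionBuilder (no IOException,
    // CharSequence in place of MutableString), so the sketch needs only the JDK.
    public interface SimpleBuilder {
        void open(CharSequence suffix);
        void startDocument(CharSequence title, CharSequence uri);
        void endDocument();
        void startTextField();
        void endTextField();
        void nonTextField(Object o);
        void add(CharSequence word, CharSequence nonWord);
        void close();
    }

    // Records calls in a log and throws on calls made out of order.
    public static class CheckingBuilder implements SimpleBuilder {
        private final List<String> log = new ArrayList<>();
        private boolean inDocument, inTextField;

        public List<String> log() { return log; }

        public void open(CharSequence suffix) { log.add("open " + suffix); }

        public void startDocument(CharSequence title, CharSequence uri) {
            if (inDocument) throw new IllegalStateException("document already started");
            inDocument = true;
            log.add("doc " + title + " " + uri);
        }

        public void endDocument() {
            if (!inDocument || inTextField) throw new IllegalStateException("cannot end document here");
            inDocument = false;
            log.add("end doc");
        }

        public void startTextField() {
            if (!inDocument || inTextField) throw new IllegalStateException("cannot start text field here");
            inTextField = true;
            log.add("text");
        }

        public void endTextField() {
            if (!inTextField) throw new IllegalStateException("no text field to end");
            inTextField = false;
            log.add("end text");
        }

        public void nonTextField(Object o) {
            if (!inDocument) throw new IllegalStateException("no document started");
            log.add("field " + o);
        }

        // As documented: outside a text field, add() does nothing.
        public void add(CharSequence word, CharSequence nonWord) {
            if (inTextField) log.add("[" + word + "|" + nonWord + "]");
        }

        public void close() { log.add("close"); }
    }

    // Drives a builder through one document with one non-text field and one
    // text field; pairs alternate word, nonword, word, nonword, ...
    public static void indexOne(SimpleBuilder b, CharSequence title, CharSequence uri,
                                Object nonText, String... pairs) {
        b.startDocument(title, uri);
        b.nonTextField(nonText);
        b.startTextField();
        for (int i = 0; i + 1 < pairs.length; i += 2) b.add(pairs[i], pairs[i + 1]);
        b.endTextField();
        b.endDocument();
    }

    public static void main(String[] args) {
        CheckingBuilder b = new CheckingBuilder();
        b.open("@0");
        indexOne(b, "Title", "http://example.com/", 42, "hello", " ", "world", "");
        b.close();
        b.log().forEach(System.out::println);
    }
}
```

Checking the order in a wrapper like this is one way to catch misuse early; whether real implementations perform such checks is not stated here.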
Several collection types (e.g., SimpleCompressedDocumentCollection, ZipDocumentCollection) can be
either exact or approximate: in the latter case, nonwords are not recorded, to reduce space usage.
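The difference between exact and approximate storage can be sketched with a hypothetical FieldStore for a single text field (not a class of this library): an exact store keeps each nonword, an approximate one drops it. When rebuilding text without nonwords, the sketch uses a single space as separator, which is an arbitrary assumption; real approximate collections may reconstruct text differently.

```java
import java.util.ArrayList;
import java.util.List;

public class ExactVsApproximate {
    // Hypothetical storage for one text field.
    public static class FieldStore {
        private final boolean exact;
        private final List<String> words = new ArrayList<>();
        private final List<String> nonWords = new ArrayList<>();

        public FieldStore(boolean exact) { this.exact = exact; }

        // Mirrors add(word, nonWord): the nonword is kept only when exact.
        public void add(CharSequence word, CharSequence nonWord) {
            words.add(word.toString());
            if (exact) nonWords.add(nonWord.toString());
        }

        // Rebuilds the field text; without nonwords, words are joined by a single space.
        public String text() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < words.size(); i++) {
                sb.append(words.get(i));
                if (exact) sb.append(nonWords.get(i));
                else if (i < words.size() - 1) sb.append(' ');
            }
            return sb.toString();
        }

        // Number of characters actually stored (a crude proxy for space usage).
        public int storedChars() {
            int n = 0;
            for (String w : words) n += w.length();
            for (String nw : nonWords) n += nw.length();
            return n;
        }
    }

    public static void main(String[] args) {
        FieldStore exact = new FieldStore(true);
        FieldStore approx = new FieldStore(false);
        for (FieldStore s : new FieldStore[] { exact, approx }) {
            s.add("Hello", ", ");
            s.add("world", "!");
        }
        System.out.println(exact.text() + " (" + exact.storedChars() + " chars)");
        System.out.println(approx.text() + " (" + approx.storedChars() + " chars)");
    }
}
```

The exact store reproduces "Hello, world!" from 13 stored characters, while the approximate store keeps only the 10 word characters and loses the original punctuation and spacing.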
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | add(MutableString word, MutableString nonWord) | Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, does nothing. |
| String | basename() | Returns the basename of this builder. |
| void | close() | Terminates the construction of the collection. |
| void | endDocument() | Ends a document entry. |
| void | endTextField() | Ends the current text field. |
| void | nonTextField(Object o) | Adds a non-text field. |
| void | open(CharSequence suffix) | Opens a new collection. |
| void | startDocument(CharSequence title, CharSequence uri) | Starts a document entry. |
| void | startTextField() | Starts a new text field. |
| void | virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments) | Adds a virtual field. |
Method Detail

String basename()

void open(CharSequence suffix) throws IOException
Parameters: suffix - a suffix that will be added to the basename provided at construction time.
Throws: IOException

void startDocument(CharSequence title, CharSequence uri) throws IOException
Parameters: title - the document title (usually, the result of Document.title()); uri - the document URI (usually, the result of Document.uri()).
Throws: IOException

void endDocument() throws IOException
Throws: IOException

void startTextField()

void endTextField() throws IOException
Throws: IOException

void nonTextField(Object o) throws IOException
Parameters: o - the content of the non-text field.
Throws: IOException

void virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments) throws IOException
Parameters: fragments - the virtual fragments to be added.
Throws: IOException

void add(MutableString word, MutableString nonWord) throws IOException
Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).
Parameters: word - a word; nonWord - a nonword.
Throws: IOException
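To show what a stream of word/nonword pairs looks like, here is a hypothetical, JDK-only stand-in for a word reader (it is not the WordReader API): a word is assumed to be a maximal run of letters or digits, and the nonword is the run of other characters that follows it. A leading nonword is returned with an empty word. The splitter works on chars, so supplementary Unicode characters are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class WordNonWordSplitter {
    // Splits text into {word, nonword} pairs; each pair's nonword is the
    // separator text that follows the word (possibly empty at the end).
    public static List<String[]> split(CharSequence text) {
        List<String[]> pairs = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            int wordStart = i;
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            int nonWordStart = i;
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            pairs.add(new String[] {
                text.subSequence(wordStart, nonWordStart).toString(),
                text.subSequence(nonWordStart, i).toString() });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each pair would be passed to add(word, nonWord) inside a text field.
        for (String[] p : split("Hello, world!")) {
            System.out.println("word=\"" + p[0] + "\" nonWord=\"" + p[1] + "\"");
        }
    }
}
```

Every loop iteration consumes at least one character, so the splitter always terminates; concatenating all words and nonwords in order gives back the original text, which is what an exact collection relies on.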

void close() throws IOException
Throws: IOException