Interface DocumentCollectionBuilder
-
- All Known Implementing Classes:
SimpleCompressedDocumentCollectionBuilder, ZipDocumentCollectionBuilder

public interface DocumentCollectionBuilder

An interface for classes that can build collections during the indexing process.

A builder is usually based on a basename. Many different collections can be built using the same builder, using open(CharSequence) to specify a suffix that will be added to the basename. Creating several collections is a simple way to make collection construction scalable: for instance, Scan creates several collections, one per batch, and then puts them together using a ConcatenatedDocumentCollection.

After creating an instance of this class and opening a new collection, new documents can be added incrementally. Each document must be started with startDocument(CharSequence, CharSequence) and ended with endDocument(); inside each document, each non-text field must be written by passing an object to nonTextField(Object), whereas each text field must be started with startTextField() and ended with endTextField(): in between, a call to add(MutableString, MutableString) must be made for each word/nonword pair retrieved from the original collection. At the end, close() terminates the construction of the collection (the method is declared void, so it does not return the resulting collection).

Several collections (e.g., SimpleCompressedDocumentCollection, ZipDocumentCollection) can be exact or approximated: in the latter case, nonwords are not recorded to decrease space usage.
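The call order described above can be sketched with a small self-contained stand-in. The SketchBuilder interface and the LoggingBuilder class below are hypothetical: they only mirror the method names documented on this page (with CharSequence in place of MutableString, and without basename() and virtualField), so that the example needs no MG4J dependency. They are not the library's implementation, and the "-0" suffix is an arbitrary choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BuilderProtocolSketch {

    /** Hypothetical stand-in mirroring the documented method names; not the MG4J interface. */
    interface SketchBuilder {
        void open(CharSequence suffix) throws IOException;
        void startDocument(CharSequence title, CharSequence uri) throws IOException;
        void endDocument() throws IOException;
        void startTextField();
        void endTextField() throws IOException;
        void nonTextField(Object o) throws IOException;
        void add(CharSequence word, CharSequence nonWord) throws IOException;
        void close() throws IOException;
    }

    /** Toy implementation: records each call in a log instead of writing a collection. */
    static final class LoggingBuilder implements SketchBuilder {
        private final String basename;
        final List<String> log = new ArrayList<>();

        LoggingBuilder(String basename) { this.basename = basename; }

        public void open(CharSequence suffix) { log.add("open " + basename + suffix); }
        public void startDocument(CharSequence title, CharSequence uri) { log.add("startDocument " + title); }
        public void endDocument() { log.add("endDocument"); }
        public void startTextField() { log.add("startTextField"); }
        public void endTextField() { log.add("endTextField"); }
        public void nonTextField(Object o) { log.add("nonTextField " + o); }
        public void add(CharSequence word, CharSequence nonWord) { log.add("add " + word); }
        public void close() { log.add("close"); }
    }

    /** Drives any builder through the documented call order for a single document. */
    static void writeOneDocument(SketchBuilder b) throws IOException {
        b.open("-0");                                   // suffix appended to the basename
        b.startDocument("Title", "http://example.com/doc");
        b.startTextField();
        b.add("hello", " ");                            // one call per word/nonword pair
        b.add("world", "\n");
        b.endTextField();
        b.nonTextField(Integer.valueOf(42));            // non-text fields are passed as objects
        b.endDocument();
        b.close();
    }

    /** Runs the sequence on the toy builder and returns the recorded calls. */
    public static List<String> buildOne() {
        LoggingBuilder b = new LoggingBuilder("coll");
        try {
            writeOneDocument(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return b.log;
    }

    public static void main(String[] args) {
        for (String event : buildOne()) System.out.println(event);
    }
}
```

With a real implementing class, the same sequence would be driven with MutableString arguments, typically taken from a WordReader.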
Method Summary
Modifier and Type  Method                                                      Description
void               add(MutableString word, MutableString nonWord)              Adds a word and a nonword to the current text field.
String             basename()                                                  Returns the basename of this builder.
void               close()                                                     Terminates the construction of the collection.
void               endDocument()                                               Ends a document entry.
void               endTextField()                                              Ends the current text field.
void               nonTextField(Object o)                                      Adds a non-text field.
void               open(CharSequence suffix)                                   Opens a new collection.
void               startDocument(CharSequence title, CharSequence uri)         Starts a document entry.
void               startTextField()                                            Starts a new text field.
void               virtualField(List<Scan.VirtualDocumentFragment> fragments)  Adds a virtual field.
Method Detail
-
basename
String basename()
Returns the basename of this builder.
- Returns:
  the basename
-
open
void open(CharSequence suffix) throws IOException
Opens a new collection.
- Parameters:
  suffix - a suffix that will be added to the basename provided at construction time.
- Throws:
  IOException
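One way to read the batching idea is that a single builder is reopened once per batch, each time with a different suffix. The sketch below only computes the resulting names; the class, the method, the "-N" suffix format and the plain concatenation of basename and suffix are all assumptions made for illustration, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSuffixSketch {

    /** Splits documents into batches and returns one (hypothetical) collection name per batch. */
    public static List<String> collectionNames(String basename, List<String> docs, int batchSize) {
        List<String> names = new ArrayList<>();
        for (int start = 0, batch = 0; start < docs.size(); start += batchSize, batch++) {
            String suffix = "-" + batch;  // assumed suffix format
            // A real builder would be driven here: open(suffix), add the
            // documents of this batch, then close().
            names.add(basename + suffix);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("d0", "d1", "d2", "d3", "d4");
        System.out.println(collectionNames("coll", docs, 2));
    }
}
```

Five documents in batches of two give three collections, which could then be combined in a separate step.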
-
startDocument
void startDocument(CharSequence title, CharSequence uri) throws IOException
Starts a document entry.
- Parameters:
  title - the document title (usually, the result of Document.title()).
  uri - the document URI (usually, the result of Document.uri()).
- Throws:
  IOException
-
endDocument
void endDocument() throws IOException
Ends a document entry.
- Throws:
  IOException
-
startTextField
void startTextField()
Starts a new text field.
-
endTextField
void endTextField() throws IOException
Ends the current text field.
- Throws:
  IOException
-
nonTextField
void nonTextField(Object o) throws IOException
Adds a non-text field.
- Parameters:
  o - the content of the non-text field.
- Throws:
  IOException
-
virtualField
void virtualField(List<Scan.VirtualDocumentFragment> fragments) throws IOException
Adds a virtual field.
- Parameters:
  fragments - the virtual fragments to be added.
- Throws:
  IOException
-
add
void add(MutableString word, MutableString nonWord) throws IOException
Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, does nothing.
Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).
- Parameters:
  word - a word.
  nonWord - a nonword.
- Throws:
  IOException
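The guard described above (pairs are recorded only between startTextField() and endTextField()) might be implemented with a single flag. TextFieldGuardSketch is a hypothetical class written for illustration, using CharSequence in place of MutableString; it is not taken from any implementing class.

```java
import java.util.ArrayList;
import java.util.List;

public class TextFieldGuardSketch {
    private boolean inTextField = false;               // true between start and end of a text field
    private final List<String> pairs = new ArrayList<>();

    public void startTextField() { inTextField = true; }

    public void endTextField() { inTextField = false; }

    /** Records the pair only while a text field is open; otherwise does nothing. */
    public void add(CharSequence word, CharSequence nonWord) {
        if (!inTextField) return;
        pairs.add(word + "|" + nonWord);
    }

    public List<String> pairs() { return pairs; }

    /** Calls add() before, inside and after a text field and returns what was kept. */
    public static List<String> demo() {
        TextFieldGuardSketch b = new TextFieldGuardSketch();
        b.add("before", " ");   // ignored: no text field has started
        b.startTextField();
        b.add("kept", " ");     // recorded
        b.endTextField();
        b.add("after", " ");    // ignored: the text field has ended
        return b.pairs();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```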
-
close
void close() throws IOException
Terminates the construction of the collection.
- Throws:
  IOException