Package it.unimi.di.big.mg4j.document
Class ZipDocumentCollectionBuilder
- java.lang.Object
-
- it.unimi.di.big.mg4j.document.ZipDocumentCollectionBuilder
-
- All Implemented Interfaces:
DocumentCollectionBuilder
public class ZipDocumentCollectionBuilder extends Object implements DocumentCollectionBuilder
A builder for zipped document collections.
-
-
Constructor Summary
Constructors
Constructor - Description
ZipDocumentCollectionBuilder(String basename, DocumentFactory factory, boolean exact) - Creates a new zipped collection builder.
-
Method Summary
Modifier and Type, Method - Description
void add(MutableString word, MutableString nonWord)
String basename() - Returns the basename of this builder.
void build(DocumentSequence inputSequence)
void close() - Terminates the construction of the collection.
void endDocument() - Ends a document entry.
void endTextField() - Ends the current text field.
static void main(String[] arg)
void nonTextField(Object o) - Adds a non-text field.
void open(CharSequence suffix) - Opens a new collection.
void startDocument(CharSequence title, CharSequence uri) - Starts a document entry.
void startTextField() - Starts a new text field.
void virtualField(List<Scan.VirtualDocumentFragment> fragments) - Adds a virtual field.
-
-
-
Constructor Detail
-
ZipDocumentCollectionBuilder
public ZipDocumentCollectionBuilder(String basename, DocumentFactory factory, boolean exact)
Creates a new zipped collection builder.
- Parameters:
factory - the factory of the base document sequence.
exact - true iff non-words should also be preserved.
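The effect of the exact flag can be pictured with a small, dependency-free sketch. It is not the class's implementation: ExactFlagSketch and keptTokens are hypothetical names, plain String pairs stand in for MutableString, and the sketch assumes that nonwords are simply dropped when exact is false. The stored format of the zipped collection is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

class ExactFlagSketch {
    /**
     * Hypothetical helper: returns the tokens that would be kept for a text field,
     * given (word, nonWord) pairs. Assumes nonwords are dropped when exact is false.
     */
    static List<String> keptTokens(String[][] pairs, boolean exact) {
        List<String> kept = new ArrayList<>();
        for (String[] pair : pairs) {
            kept.add(pair[0]);            // the word is always kept
            if (exact) kept.add(pair[1]); // the nonword is kept only in exact mode
        }
        return kept;
    }

    public static void main(String[] args) {
        String[][] pairs = { { "hello", ", " }, { "world", "!" } };
        System.out.println(keptTokens(pairs, false));
        System.out.println(keptTokens(pairs, true));
    }
}
```

Under this assumption, a collection built with exact set to true can reproduce the original text, while one built with exact set to false retains only the word sequence.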
-
-
Method Detail
-
open
public void open(CharSequence suffix) throws FileNotFoundException
Description copied from interface: DocumentCollectionBuilder
Opens a new collection.
- Specified by:
open in interface DocumentCollectionBuilder
- Parameters:
suffix - a suffix that will be added to the basename provided at construction time.
- Throws:
FileNotFoundException
-
basename
public String basename()
Description copied from interface: DocumentCollectionBuilder
Returns the basename of this builder.
- Specified by:
basename in interface DocumentCollectionBuilder
- Returns:
the basename
-
startDocument
public void startDocument(CharSequence title, CharSequence uri) throws IOException
Description copied from interface: DocumentCollectionBuilder
Starts a document entry.
- Specified by:
startDocument in interface DocumentCollectionBuilder
- Parameters:
title - the document title (usually, the result of Document.title()).
uri - the document URI (usually, the result of Document.uri()).
- Throws:
IOException
-
endDocument
public void endDocument() throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends a document entry.
- Specified by:
endDocument in interface DocumentCollectionBuilder
- Throws:
IOException
-
startTextField
public void startTextField()
Description copied from interface: DocumentCollectionBuilder
Starts a new text field.
- Specified by:
startTextField in interface DocumentCollectionBuilder
-
nonTextField
public void nonTextField(Object o) throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a non-text field.
- Specified by:
nonTextField in interface DocumentCollectionBuilder
- Parameters:
o - the content of the non-text field.
- Throws:
IOException
-
virtualField
public void virtualField(List<Scan.VirtualDocumentFragment> fragments) throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a virtual field.
- Specified by:
virtualField in interface DocumentCollectionBuilder
- Parameters:
fragments - the virtual fragments to be added.
- Throws:
IOException
-
endTextField
public void endTextField() throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends the current text field.
- Specified by:
endTextField in interface DocumentCollectionBuilder
- Throws:
IOException
-
add
public void add(MutableString word, MutableString nonWord) throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, does nothing.
Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).
- Specified by:
add in interface DocumentCollectionBuilder
- Parameters:
word - a word.
nonWord - a nonword.
- Throws:
IOException
-
close
public void close() throws IOException
Description copied from interface: DocumentCollectionBuilder
Terminates the construction of the collection.
- Specified by:
close in interface DocumentCollectionBuilder
- Throws:
IOException
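
The method descriptions above suggest a call order: open, then for each document startDocument, one or more fields, and endDocument, then close. The following is a minimal, dependency-free sketch of that order, not the MG4J implementation. RecordingBuilder is a hypothetical stand-in that only records calls; it uses String where the real methods take MutableString, omits the checked exceptions, and the basename, suffix, title and URI values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BuilderCallOrderSketch {
    /** Hypothetical stand-in mirroring a subset of the methods listed on this page. */
    static class RecordingBuilder {
        private final String basename;
        final List<String> events = new ArrayList<>();
        private boolean inTextField;

        RecordingBuilder(String basename) { this.basename = basename; }

        void open(CharSequence suffix) { events.add("open " + basename + suffix); }
        void startDocument(CharSequence title, CharSequence uri) { events.add("startDocument " + title); }
        void startTextField() { inTextField = true; events.add("startTextField"); }
        void add(String word, String nonWord) {
            // Per the interface description, add does nothing outside a text field.
            if (!inTextField) return;
            events.add("add " + word);
        }
        void endTextField() { inTextField = false; events.add("endTextField"); }
        void endDocument() { events.add("endDocument"); }
        void close() { events.add("close"); }
    }

    /** Drives the stand-in through one document with a single text field. */
    static List<String> demo() {
        RecordingBuilder b = new RecordingBuilder("corpus");
        b.open("-part0");                        // suffix is appended to the basename
        b.startDocument("Title", "http://example.com/");
        b.add("ignored", " ");                   // no text field open yet: not recorded
        b.startTextField();
        b.add("hello", " ");
        b.add("world", "\n");
        b.endTextField();
        b.endDocument();
        b.close();
        return b.events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the first add call is dropped because no text field is open, so demo() records eight events, starting with "open corpus-part0" and ending with "close".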
-
build
public void build(DocumentSequence inputSequence) throws IOException
- Throws:
IOException
-
main
public static void main(String[] arg) throws com.martiansoftware.jsap.JSAPException, IOException, ClassNotFoundException, InvocationTargetException, NoSuchMethodException, IllegalAccessException, InstantiationException, IllegalArgumentException, SecurityException
- Throws:
com.martiansoftware.jsap.JSAPException
IOException
ClassNotFoundException
InvocationTargetException
NoSuchMethodException
IllegalAccessException
InstantiationException
IllegalArgumentException
SecurityException
-
-