Class QuasiSuccinctIndexWriter
- java.lang.Object
-
- it.unimi.di.big.mg4j.index.QuasiSuccinctIndexWriter
-
- All Implemented Interfaces:
IndexWriter
public class QuasiSuccinctIndexWriter extends Object implements IndexWriter
An index writer for quasi-succinct indices.
- Author:
- Sebastiano Vigna
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class):
- protected static class QuasiSuccinctIndexWriter.Accumulator
- protected static class QuasiSuccinctIndexWriter.LongWordCache
- static class QuasiSuccinctIndexWriter.LongWordOutputBitStream
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static int DEFAULT_CACHE_SIZE: The default size of the bit cache.
-
Constructor Summary
Constructors (Constructor, Description):
- QuasiSuccinctIndexWriter(IOFactory ioFactory, CharSequence basename, long numberOfDocuments, int log2Quantum, int cacheSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, ByteOrder byteOrder): Creates a new index writer, with the specified basename.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- void close(): Closes this index writer, completing the index creation process and releasing all resources.
- static int lowerBits(long length, long upperBound, boolean strict): Returns the number of lower bits for the Elias–Fano encoding of a list of given length, upper bound and strictness.
- OutputBitStream newDocumentRecord(): Starts a new document record.
- long newInvertedList(): Starts a new inverted list.
- void newInvertedList(long frequency, long occurrency, long sumMaxPos): Starts a new inverted list.
- static long numberOfPointers(long length, long upperBound, int log2Quantum, boolean strict, boolean indexZeroes): Returns the number of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.
- static int pointerSize(long length, long upperBound, boolean strict, boolean indexZeroes): Returns the size in bits of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.
- void printStats(PrintStream stats): Writes to the given print stream statistical information about the index just built.
- Properties properties(): Returns properties of the index generated by this index writer.
- void writeDocumentPointer(OutputBitStream out, long pointer): Writes a document pointer.
- void writeDocumentPositions(OutputBitStream unused, int[] position, int offset, int count, int docSize): Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.
- void writeFrequency(long frequency): Writes the frequency.
- void writePayload(OutputBitStream unused, Payload payload): Writes the payload for the current document.
- void writePositionCount(OutputBitStream unused, int count): Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.
- long writtenBits(): Returns the overall number of bits written onto the underlying stream(s).
-
-
-
Field Detail
-
DEFAULT_CACHE_SIZE
public static final int DEFAULT_CACHE_SIZE
The default size of the bit cache.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
QuasiSuccinctIndexWriter
public QuasiSuccinctIndexWriter(IOFactory ioFactory, CharSequence basename, long numberOfDocuments, int log2Quantum, int cacheSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, ByteOrder byteOrder) throws IOException
Creates a new index writer, with the specified basename.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
basename - the basename.
numberOfDocuments - the number of documents in the collection to be indexed.
log2Quantum - the logarithm of the quantum.
cacheSize - the size in bytes of the bit caches.
byteOrder - the byte order of the index (if null, ByteOrder.nativeOrder()).
- Throws:
IOException
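The null handling described for byteOrder can be pictured with the following minimal sketch. The class and method names (ByteOrderDefaultSketch, effectiveOrder) are hypothetical and exist only for illustration; the sketch does not reproduce how the constructor is actually implemented.

```java
import java.nio.ByteOrder;

final class ByteOrderDefaultSketch {
    /** Hypothetical helper: falls back to the platform's native order when no order is requested. */
    static ByteOrder effectiveOrder(ByteOrder requested) {
        return requested != null ? requested : ByteOrder.nativeOrder();
    }

    public static void main(String[] args) {
        // Passing null selects whatever order the current platform uses.
        System.out.println(effectiveOrder(null) == ByteOrder.nativeOrder());
        // An explicit order is returned unchanged.
        System.out.println(effectiveOrder(ByteOrder.BIG_ENDIAN));
    }
}
```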
-
-
Method Detail
-
lowerBits
public static int lowerBits(long length, long upperBound, boolean strict)
Returns the number of lower bits for the Elias–Fano encoding of a list of given length, upper bound and strictness.
- Parameters:
length - the number of elements of the list.
upperBound - an upper bound for the elements of the list.
strict - if true, the elements of the list are strictly increasing, and the returned number of bits is for the strict representation (e.g., storing the k-th element decreased by k).
- Returns:
- the number of bits for the Elias–Fano encoding of a list with the specified parameters.
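As a rough illustration of the quantity described above: the textbook Elias–Fano choice for the number of lower bits is the floor of log2(u/n) for n elements bounded by u. The sketch below follows that textbook rule and, for the strict case, first subtracts the length from the upper bound, since the k-th element is stored decreased by k. The class name EliasFanoSketch, the handling of edge cases and the exact rounding are assumptions made for illustration; they are not taken from the MG4J source, whose results may differ.

```java
final class EliasFanoSketch {
    /** Textbook estimate: floor(log2(bound / length)), clamped at zero. */
    static int lowerBits(long length, long upperBound, boolean strict) {
        if (length == 0) return 0;
        // In the strict representation the k-th element is stored decreased by k,
        // so the effective upper bound shrinks by the list length (assumption).
        long bound = strict ? upperBound - length : upperBound;
        long ratio = bound / length;
        // 63 - numberOfLeadingZeros(x) equals floor(log2(x)) for x > 0.
        return ratio <= 0 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    }

    public static void main(String[] args) {
        System.out.println(lowerBits(4, 64, false)); // 64 / 4 = 16, prints 4
        System.out.println(lowerBits(4, 64, true));  // 60 / 4 = 15, prints 3
    }
}
```

With this rule, dense lists (upper bound close to the length) get zero lower bits, and sparser lists get more.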
-
pointerSize
public static int pointerSize(long length, long upperBound, boolean strict, boolean indexZeroes)
Returns the size in bits of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.
- Parameters:
length - the number of elements of the list.
upperBound - an upper bound for the elements of the list.
strict - if true, the elements of the list are strictly increasing, and the returned number of bits is for the strict representation (e.g., storing the k-th element decreased by k).
indexZeroes - if true, the number of bits for skip pointers is returned; otherwise, the number of bits for forward pointers is returned.
- Returns:
- the size in bits of forward or skip pointers to the Elias–Fano encoding of a list with the specified parameters.
-
numberOfPointers
public static long numberOfPointers(long length, long upperBound, int log2Quantum, boolean strict, boolean indexZeroes)
Returns the number of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.
- Parameters:
length - the number of elements of the list.
upperBound - an upper bound for the elements of the list.
log2Quantum - the logarithm of the quantum size.
strict - if true, the elements of the list are strictly increasing, and the returned number is for the strict representation (e.g., storing the k-th element decreased by k).
indexZeroes - if true, an upper bound on the number of skip pointers is returned; otherwise, the (exact) number of forward pointers is returned.
- Returns:
- an upper bound on the number of skip pointers or the (exact) number of forward pointers.
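To make the pointer quantities above more concrete, here is a hedged sketch based on the usual Elias–Fano layout, in which the upper-bits array holds one 1 per element and at most (upperBound >>> lowerBits) zeros. It assumes one forward pointer per quantum of ones and at most one skip pointer per quantum of zeros. The class PointerSketch, its method names, and the choice to take lowerBits as an explicit argument (and to ignore the strict flag) are all assumptions for illustration, not the MG4J implementation.

```java
final class PointerSketch {
    /** ceil(log2(x)) for x >= 1. */
    static int ceilLog2(long x) {
        return 64 - Long.numberOfLeadingZeros(x - 1);
    }

    /** Bits needed to address a position in an upper-bits array of length + (upperBound >>> lowerBits) bits. */
    static int pointerSize(long length, long upperBound, int lowerBits) {
        if (length == 0) return 0;
        return ceilLog2(length + (upperBound >>> lowerBits));
    }

    /** One forward pointer for every 2^log2Quantum ones (assumed exact). */
    static long forwardPointers(long length, int log2Quantum) {
        return length >>> log2Quantum;
    }

    /** At most one skip pointer for every 2^log2Quantum zeros (an upper bound). */
    static long skipPointersUpperBound(long upperBound, int lowerBits, int log2Quantum) {
        return (upperBound >>> lowerBits) >>> log2Quantum;
    }

    public static void main(String[] args) {
        // 1000 elements below 1,000,000 with 9 lower bits and a quantum of 256.
        System.out.println(pointerSize(1000, 1_000_000, 9));         // 1000 + 1953 = 2953 positions, prints 12
        System.out.println(forwardPointers(1000, 8));                // prints 3
        System.out.println(skipPointersUpperBound(1_000_000, 9, 8)); // prints 7
    }
}
```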
-
newInvertedList
public void newInvertedList(long frequency, long occurrency, long sumMaxPos) throws IOException
Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.
This method provides additional information which is necessary to build the posting list. The information can be omitted if only part of the index is being written (e.g., no positions or even no counts and positions).
- Parameters:
frequency - the frequency of the inverted list.
occurrency - the occurrency of the inverted list (use -1 if you are not writing counts).
sumMaxPos - the sum of the maximum position in each document (unused if positions are not indexed).
- Throws:
IllegalStateException - if too few records were written for the previous inverted list.
IOException
- See Also:
IndexWriter.newInvertedList()
-
newInvertedList
public long newInvertedList() throws IOException
Description copied from interface: IndexWriter
Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.
- Specified by:
newInvertedList in interface IndexWriter
- Returns:
- the position (in bits) of the underlying bit stream where the new inverted list starts.
- Throws:
IOException
-
writeFrequency
public void writeFrequency(long frequency)
Description copied from interface: IndexWriter
Writes the frequency.
- Specified by:
writeFrequency in interface IndexWriter
- Parameters:
frequency - the (positive) number of document records that this inverted list will contain.
-
newDocumentRecord
public OutputBitStream newDocumentRecord() throws IOException
Description copied from interface: IndexWriter
Starts a new document record.
This method must be called exactly f times, where f is the frequency specified with IndexWriter.writeFrequency(long).
- Specified by:
newDocumentRecord in interface IndexWriter
- Returns:
- the output bit stream where the next document record data should be written, if necessary, or null, if IndexWriter.writeDocumentPointer(OutputBitStream, long) ignores its first argument.
- Throws:
IOException
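The record-count contract (one document record per unit of frequency, and an IllegalStateException when a list is left short) can be pictured with the stand-alone sketch below. RecordCountSketch is a hypothetical stand-in that only counts calls; it writes no bits and is not the MG4J class. Rejecting surplus records is an assumption added for symmetry and is not stated in this documentation.

```java
final class RecordCountSketch {
    private long expected = 0; // frequency declared for the current list
    private long written = 0;  // document records started so far

    /** Starts a new list, refusing if the previous one is missing records. */
    void newInvertedList(long frequency) {
        if (written < expected)
            throw new IllegalStateException("Too few records: " + written + " < " + expected);
        expected = frequency;
        written = 0;
    }

    /** Starts a new record; going past the declared frequency is rejected (assumption). */
    void newDocumentRecord() {
        if (written == expected)
            throw new IllegalStateException("Too many records: frequency is " + expected);
        written++;
    }

    public static void main(String[] args) {
        RecordCountSketch w = new RecordCountSketch();
        w.newInvertedList(2);
        w.newDocumentRecord();
        w.newDocumentRecord();
        w.newInvertedList(1); // fine: both records of the previous list were started
        try {
            w.newInvertedList(3); // the single record of the previous list is missing
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```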
-
writeDocumentPointer
public void writeDocumentPointer(OutputBitStream out, long pointer) throws IOException
Description copied from interface: IndexWriter
Writes a document pointer.
This method must be called immediately after IndexWriter.newDocumentRecord().
- Specified by:
writeDocumentPointer in interface IndexWriter
- Parameters:
out - the output bit stream where the pointer will be written.
pointer - the document pointer.
- Throws:
IOException
-
writePayload
public void writePayload(OutputBitStream unused, Payload payload) throws IOException
Description copied from interface: IndexWriter
Writes the payload for the current document.
This method must be called immediately after IndexWriter.writeDocumentPointer(OutputBitStream, long).
- Specified by:
writePayload in interface IndexWriter
- Parameters:
unused - the output bit stream where the payload will be written.
payload - the payload.
- Throws:
IOException
-
writePositionCount
public void writePositionCount(OutputBitStream unused, int count) throws IOException
Description copied from interface: IndexWriter
Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.
- Specified by:
writePositionCount in interface IndexWriter
- Parameters:
unused - the output stream where the occurrences should be written.
count - the count.
- Throws:
IOException
-
writeDocumentPositions
public void writeDocumentPositions(OutputBitStream unused, int[] position, int offset, int count, int docSize) throws IOException
Description copied from interface: IndexWriter
Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.
- Specified by:
writeDocumentPositions in interface IndexWriter
- Parameters:
unused - the output stream where the occurrences should be written.
position - the position vector (a sequence of strictly increasing natural numbers).
offset - the first valid entry in position.
count - the number of valid entries in position starting from offset.
docSize - the size of the current document (only for Golomb and interpolative coding; you can safely pass -1 otherwise).
- Throws:
IOException
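The position, offset and count parameters together describe a slice of the array. The following sketch checks that such a slice satisfies the stated requirement (strictly increasing natural numbers). PositionSliceSketch and isValidSlice are hypothetical names for illustration, and nothing here implies that the writer performs this validation itself.

```java
final class PositionSliceSketch {
    /** Returns true if position[offset .. offset + count) is a strictly increasing sequence of natural numbers. */
    static boolean isValidSlice(int[] position, int offset, int count) {
        // The slice must lie entirely inside the array.
        if (offset < 0 || count < 0 || count > position.length - offset) return false;
        int previous = -1; // forces the first entry to be >= 0
        for (int i = offset; i < offset + count; i++) {
            if (position[i] <= previous) return false;
            previous = position[i];
        }
        return true;
    }

    public static void main(String[] args) {
        int[] buffer = { 9, 1, 4, 7, 0 };
        // Only entries 1, 4, 7 (offset 1, count 3) are meant to be written.
        System.out.println(isValidSlice(buffer, 1, 3)); // prints true
        System.out.println(isValidSlice(buffer, 0, 3)); // 9 is followed by 1, prints false
    }
}
```

Passing a reusable buffer together with offset and count avoids copying the valid entries into a fresh array for every document.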
-
writtenBits
public long writtenBits()
Description copied from interface: IndexWriter
Returns the overall number of bits written onto the underlying stream(s).
- Specified by:
writtenBits in interface IndexWriter
- Returns:
- the number of bits written, according to the variables keeping statistical records.
-
properties
public Properties properties()
Description copied from interface: IndexWriter
Returns properties of the index generated by this index writer.
This method should only be called after IndexWriter.close(). It returns a new property object containing values for (whenever appropriate) Index.PropertyKeys.DOCUMENTS, Index.PropertyKeys.TERMS, Index.PropertyKeys.POSTINGS, Index.PropertyKeys.MAXCOUNT, Index.PropertyKeys.INDEXCLASS, Index.PropertyKeys.CODING, Index.PropertyKeys.PAYLOADCLASS, BitStreamIndex.PropertyKeys.SKIPQUANTUM, and BitStreamIndex.PropertyKeys.SKIPHEIGHT.
- Specified by:
properties in interface IndexWriter
- Returns:
- a new set of properties for the just-created index.
-
close
public void close() throws IOException
Description copied from interface: IndexWriter
Closes this index writer, completing the index creation process and releasing all resources.
- Specified by:
close in interface IndexWriter
- Throws:
IOException
-
printStats
public void printStats(PrintStream stats)
Description copied from interface: IndexWriter
Writes to the given print stream statistical information about the index just built. This method must be called after IndexWriter.close().
- Specified by:
printStats in interface IndexWriter
- Parameters:
stats - a print stream where statistical information will be written.
-
-