Class Concatenate
java.lang.Object
  it.unimi.di.big.mg4j.tool.Combine
    it.unimi.di.big.mg4j.tool.Concatenate

public final class Concatenate extends Combine
Concatenates several indices.
This implementation of Combine concatenates the involved indices: document 0 of the first index is document 0 of the final collection, document 0 of the second index is numbered with the number of documents in the first index, and so on. The resulting index is exactly what you would obtain by concatenating the document sequences at the origin of each index.
Note that this class can also be used with a single index, making it easy to recompress an index using different compression flags.
- Since:
- 1.0
- Author:
- Sebastiano Vigna
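
The numbering rule in the class description can be illustrated with a small standalone sketch. It does not use MG4J at all: the class ConcatenationNumbering and its methods are hypothetical names introduced here only to show how the offset of each input index follows from the document counts of the indices before it.

```java
public final class ConcatenationNumbering {
    /**
     * Returns, for each input index, the number added to its local document
     * numbers: the total number of documents in all preceding indices.
     */
    public static long[] offsets(long[] numberOfDocuments) {
        long[] offset = new long[numberOfDocuments.length];
        long sum = 0;
        for (int i = 0; i < numberOfDocuments.length; i++) {
            offset[i] = sum;
            sum += numberOfDocuments[i];
        }
        return offset;
    }

    /** Maps a document of the given input index to its number in the concatenated collection. */
    public static long globalDocument(long[] numberOfDocuments, int index, long localDocument) {
        if (localDocument < 0 || localDocument >= numberOfDocuments[index])
            throw new IllegalArgumentException("No document " + localDocument + " in index " + index);
        return offsets(numberOfDocuments)[index] + localDocument;
    }

    public static void main(String[] args) {
        long[] numberOfDocuments = { 3, 2, 4 }; // three hypothetical input indices
        System.out.println(globalDocument(numberOfDocuments, 0, 0)); // 0
        System.out.println(globalDocument(numberOfDocuments, 1, 0)); // 3
        System.out.println(globalDocument(numberOfDocuments, 2, 1)); // 6
    }
}
```

With counts 3, 2 and 4, the offsets are 0, 3 and 5, so document 0 of the second index becomes document 3 of the final collection.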
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.tool.Combine
Combine.GammaCodedIntIterator, Combine.IndexType
-
-
Field Summary
-
Fields inherited from class it.unimi.di.big.mg4j.tool.Combine
additionalProperties, bufferSize, DEFAULT_BUFFER_SIZE, frequency, hasCounts, hasPayloads, hasPositions, haveSumsMaxPos, index, indexIterator, indexReader, indexWriter, inputBasename, ioFactory, maxCount, metadataOnly, needsSizes, numberOfDocuments, numberOfOccurrences, numIndices, outputBasename, p, positionArray, predictedLengthNumBits, predictedSize, quasiSuccinctIndexWriter, size, sumsMaxPos, termQueue, usedIndex, variableQuantumIndexWriter
-
-
Constructor Summary
Constructor and Description
Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval)
    Concatenates several indices into one.
Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, IntList delete, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval)
    Concatenates several indices into one.
-
Method Summary
Modifier and Type, Method, and Description
protected long combine(int numUsedIndices, long occurrency)
    Combines several indices.
protected long combineNumberOfDocuments()
    Combines the number of documents.
protected int combineSizes(OutputBitStream sizesOutputBitStream)
    Combines size lists.
static void main(String[] arg)
-
-
-
Constructor Detail
-
Concatenate
public Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) throws IOException, org.apache.commons.configuration.ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Concatenates several indices into one.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case interleaved is true.
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to hold temporarily inverted lists during the skipping structure construction, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
- Throws:
IOException
org.apache.commons.configuration.ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
Concatenate
public Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, IntList delete, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) throws IOException, org.apache.commons.configuration.ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Concatenates several indices into one.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
delete - a monotonically increasing list of integers representing documents that will be deleted from the output index, or null.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case interleaved is true.
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to hold temporarily inverted lists during the skipping structure construction, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
- Throws:
IOException
org.apache.commons.configuration.ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
-
Method Detail
-
combineNumberOfDocuments
protected long combineNumberOfDocuments()
Description copied from class: Combine
Combines the number of documents.
- Specified by:
combineNumberOfDocuments in class Combine
- Returns:
- the number of documents of the combined index.
-
combineSizes
protected int combineSizes(OutputBitStream sizesOutputBitStream) throws IOException
Description copied from class: Combine
Combines size lists.
- Specified by:
combineSizes in class Combine
- Returns:
- the maximum size of a document in the combined index.
- Throws:
IOException
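
As a rough illustration of what combining size lists by concatenation could look like, the sketch below appends the document sizes of each input index in order and returns the largest one, matching the documented return value. This is an assumption about the behaviour, not the MG4J implementation: SizeListConcatenation and concatenateSizes are hypothetical names, and plain Java collections stand in for the OutputBitStream used by the real method.

```java
import java.util.ArrayList;
import java.util.List;

public final class SizeListConcatenation {
    /**
     * Appends the size lists of all input indices to out, in index order,
     * and returns the maximum document size seen (0 if there are no documents).
     */
    public static int concatenateSizes(List<int[]> sizesPerIndex, List<Integer> out) {
        int maxSize = 0;
        for (int[] sizes : sizesPerIndex) {
            for (int size : sizes) {
                out.add(size); // the real method would write to a bit stream instead
                if (size > maxSize) maxSize = size;
            }
        }
        return maxSize;
    }

    public static void main(String[] args) {
        List<int[]> sizesPerIndex = new ArrayList<>();
        sizesPerIndex.add(new int[] { 5, 12 }); // sizes of the documents in a first, made-up index
        sizesPerIndex.add(new int[] { 7 });     // sizes of the documents in a second, made-up index
        List<Integer> combined = new ArrayList<>();
        int maxSize = concatenateSizes(sizesPerIndex, combined);
        System.out.println(combined + " max=" + maxSize);
    }
}
```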
-
combine
protected long combine(int numUsedIndices, long occurrency) throws IOException
Description copied from class: Combine
Combines several indices.
When this method is called, exactly numUsedIndices entries of Combine.usedIndex contain, in increasing order, the indices containing inverted lists for the current term. Implementations of this method must combine the inverted list and return the total frequency.
- Specified by:
combine in class Combine
- Parameters:
numUsedIndices - the number of valid entries in Combine.usedIndex.
occurrency - the occurrency of the term (used only when building Combine.IndexType.QUASI_SUCCINCT indices).
- Returns:
- the total frequency.
- Throws:
IOException
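
The contract above can be sketched for the concatenation case as follows. This is a simplified model under stated assumptions, not the actual MG4J code: PostingConcatenation, concatenatePostings and the array-based posting lists are hypothetical, the frequency of a term is taken to be the number of documents in which it appears, and counts, positions and payloads are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public final class PostingConcatenation {
    /**
     * Concatenates the posting lists of one term. Only the first numUsedIndices
     * entries of usedIndex are read; they must be in increasing order. Each local
     * document number is shifted by the number of documents in the preceding indices.
     * Returns the total frequency (number of postings written).
     */
    public static long concatenatePostings(int[] usedIndex, int numUsedIndices,
            long[][] postings, long[] numberOfDocuments, List<Long> out) {
        // Prefix sums of the document counts give the offset of each index.
        long[] offset = new long[numberOfDocuments.length];
        long sum = 0;
        for (int i = 0; i < numberOfDocuments.length; i++) {
            offset[i] = sum;
            sum += numberOfDocuments[i];
        }
        long totalFrequency = 0;
        for (int k = 0; k < numUsedIndices; k++) {
            int i = usedIndex[k];
            for (long document : postings[i]) out.add(offset[i] + document);
            totalFrequency += postings[i].length;
        }
        return totalFrequency;
    }

    public static void main(String[] args) {
        long[] numberOfDocuments = { 3, 2, 4 };
        long[][] postings = { { 0, 2 }, {}, { 1, 3 } }; // the term is absent from index 1
        int[] usedIndex = { 0, 2, 0 };                  // only the first two entries are valid
        List<Long> combined = new ArrayList<>();
        long frequency = concatenatePostings(usedIndex, 2, postings, numberOfDocuments, combined);
        System.out.println(combined + " frequency=" + frequency);
    }
}
```

Because the used indices are visited in increasing order and every offset is at least as large as the previous index's last document number plus one, the output list stays sorted without any merging step.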
-
main
public static void main(String[] arg) throws org.apache.commons.configuration.ConfigurationException, SecurityException, com.martiansoftware.jsap.JSAPException, IOException, URISyntaxException, ClassNotFoundException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
- Throws:
org.apache.commons.configuration.ConfigurationException
SecurityException
com.martiansoftware.jsap.JSAPException
IOException
URISyntaxException
ClassNotFoundException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
-