Package it.unimi.di.big.mg4j.tool
Class Merge
- java.lang.Object
-
- it.unimi.di.big.mg4j.tool.Combine
-
- it.unimi.di.big.mg4j.tool.Merge
-
public class Merge extends Combine
Merges several indices.
This class merges indices by performing a simple ordered list merge. Documents appearing in two indices will cause an error.
- Since:
- 1.0
- Author:
- Sebastiano Vigna
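The ordered list merge mentioned in the class description can be illustrated with a small, self-contained sketch. It is not the code of this class: the class name OrderedMergeSketch, the method merge and the use of java.util.PriorityQueue are assumptions made for illustration, and plain long arrays stand in for the document pointers of each input index. The sketch only shows the general technique, including the failure when the same document appears in two inputs.

```java
import java.util.PriorityQueue;

// Hypothetical illustration; not part of MG4J.
final class OrderedMergeSketch {
    /**
     * Merges sorted, duplicate-free lists of document pointers into one sorted list.
     * Throws IllegalStateException if a pointer appears in more than one list.
     */
    static long[] merge(final long[][] lists) {
        // pos[i] is the position of the current head of list i.
        final int[] pos = new int[lists.length];
        // The queue holds list indices, ordered by the head pointer of each list.
        PriorityQueue<Integer> queue = new PriorityQueue<>(
            (a, b) -> Long.compare(lists[a][pos[a]], lists[b][pos[b]]));

        int total = 0;
        for (int i = 0; i < lists.length; i++) {
            total += lists[i].length;
            if (lists[i].length > 0) queue.add(i);
        }

        long[] out = new long[total];
        int n = 0;
        while (!queue.isEmpty()) {
            int i = queue.poll();
            long d = lists[i][pos[i]];
            // Pointers come out in nondecreasing order, so a duplicate is always adjacent.
            if (n > 0 && out[n - 1] == d) {
                throw new IllegalStateException("Document " + d + " appears in two inputs");
            }
            out[n++] = d;
            // Advance list i and put it back if it still has pointers.
            if (++pos[i] < lists[i].length) queue.add(i);
        }
        return out;
    }
}
```

For example, merging {0, 3, 7}, {1, 4} and {2, 5, 6} yields the pointers 0 through 7 in order, while merging {0, 3} and {3, 4} fails because document 3 occurs twice.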
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.tool.Combine
Combine.GammaCodedIntIterator, Combine.IndexType
-
-
Field Summary
Fields
Modifier and Type, Field, Description
protected long[] doc - The reference array of the document queue.
protected LongHeapSemiIndirectPriorityQueue documentQueue - The queue containing document pointers (for remapped indices).
-
Fields inherited from class it.unimi.di.big.mg4j.tool.Combine
additionalProperties, bufferSize, DEFAULT_BUFFER_SIZE, frequency, hasCounts, hasPayloads, hasPositions, haveSumsMaxPos, index, indexIterator, indexReader, indexWriter, inputBasename, ioFactory, maxCount, metadataOnly, needsSizes, numberOfDocuments, numberOfOccurrences, numIndices, outputBasename, p, positionArray, predictedLengthNumBits, predictedSize, quasiSuccinctIndexWriter, size, sumsMaxPos, termQueue, usedIndex, variableQuantumIndexWriter
-
-
Constructor Summary
Constructors
Constructor, Description
Merge(IOFactory ioFactory, String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) - Merges several indices into one.
Merge(IOFactory ioFactory, String outputBasename, String[] inputBasename, IntList delete, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) - Merges several indices into one.
-
Method Summary
Modifier and Type, Method, Description
protected long combine(int numUsedIndices, long occurrency) - Combines several indices.
protected long combineNumberOfDocuments() - Combines the number of documents.
protected int combineSizes(OutputBitStream sizesOutputBitStream) - Combines size lists.
static void main(String[] arg)
-
-
-
Field Detail
-
doc
protected long[] doc
The reference array of the document queue.
-
documentQueue
protected LongHeapSemiIndirectPriorityQueue documentQueue
The queue containing document pointers (for remapped indices).
-
-
Constructor Detail
-
Merge
public Merge(IOFactory ioFactory, String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) throws IOException, org.apache.commons.configuration.ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Merges several indices into one.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case interleaved is true.
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to hold temporarily inverted lists during the skipping structure construction, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
- Throws:
IOException
org.apache.commons.configuration.ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
Merge
public Merge(IOFactory ioFactory, String outputBasename, String[] inputBasename, IntList delete, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) throws IOException, org.apache.commons.configuration.ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Merges several indices into one.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
delete - a monotonically increasing list of integers representing documents that will be deleted from the output index, or null.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case interleaved is true.
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to hold temporarily inverted lists during the skipping structure construction, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
- Throws:
IOException
org.apache.commons.configuration.ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
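Because the delete parameter is documented as a monotonically increasing list, deleted documents can in principle be skipped with a single forward pass over both sequences. The sketch below illustrates that idea only. It assumes that the entries of delete are document pointers and that surviving pointers are left unchanged; whether and how this class renumbers documents is not modelled. The class DeleteFilterSketch and the method removeDeleted are hypothetical names, and an int array stands in for the IntList.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of MG4J.
final class DeleteFilterSketch {
    /**
     * Returns the pointers in docs that do not occur in delete.
     * Both arrays must be sorted in increasing order; delete may be null.
     */
    static long[] removeDeleted(long[] docs, int[] delete) {
        if (delete == null) return docs.clone(); // nothing to delete

        long[] kept = new long[docs.length];
        int n = 0;
        int j = 0; // cursor into delete; it only ever moves forward
        for (long d : docs) {
            // Skip deletion entries smaller than the current document.
            while (j < delete.length && delete[j] < d) j++;
            // Drop the document if it is listed for deletion.
            if (j < delete.length && delete[j] == d) continue;
            kept[n++] = d;
        }
        return Arrays.copyOf(kept, n);
    }
}
```

With docs = {0, 2, 3, 5, 8} and delete = {2, 4, 8}, the surviving pointers are {0, 3, 5}. Since neither cursor moves backwards, the cost is linear in the combined length of the two sequences.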
-
-
Method Detail
-
combineNumberOfDocuments
protected long combineNumberOfDocuments()
Description copied from class: Combine
Combines the number of documents.
- Specified by:
combineNumberOfDocuments in class Combine
- Returns:
- the number of documents of the combined index.
-
combineSizes
protected int combineSizes(OutputBitStream sizesOutputBitStream) throws IOException
Description copied from class: Combine
Combines size lists.
- Specified by:
combineSizes in class Combine
- Returns:
- the maximum size of a document in the combined index.
- Throws:
IOException
-
combine
protected long combine(int numUsedIndices, long occurrency) throws IOException
Description copied from class: Combine
Combines several indices.
When this method is called, exactly numUsedIndices entries of Combine.usedIndex contain, in increasing order, the indices containing inverted lists for the current term. Implementations of this method must combine the inverted list and return the total frequency.
- Specified by:
combine in class Combine
- Parameters:
numUsedIndices - the number of valid entries in Combine.usedIndex.
occurrency - the occurrency of the term (used only when building Combine.IndexType.QUASI_SUCCINCT indices).
- Returns:
- the total frequency.
- Throws:
IOException
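The contract of combine can be illustrated with a minimal sketch of how a total frequency might be derived from the valid entries of usedIndex. It assumes that no document occurs in two input indices, so the total is simply the sum of the per-index frequencies for the current term. The class TotalFrequencySketch and its method are hypothetical; the arrays usedIndex and frequency are plain stand-ins named after the inherited fields, and their types here are assumptions. Writing the combined inverted list is omitted.

```java
// Hypothetical illustration; not part of MG4J.
final class TotalFrequencySketch {
    /**
     * Sums the frequencies of the current term over the indices that contain it.
     *
     * @param usedIndex      its first numUsedIndices entries are the indices holding the term.
     * @param numUsedIndices the number of valid entries in usedIndex.
     * @param frequency      frequency[i] is the frequency of the term in index i.
     */
    static long totalFrequency(int[] usedIndex, int numUsedIndices, long[] frequency) {
        long total = 0;
        // Only the first numUsedIndices entries are meaningful; the rest are ignored.
        for (int k = 0; k < numUsedIndices; k++) {
            total += frequency[usedIndex[k]];
        }
        return total;
    }
}
```

If the term occurs in indices 0 and 2 with frequencies 3 and 4, the call totalFrequency(new int[] {0, 2, 0}, 2, new long[] {3, 10, 4}) returns 7; the trailing entry of usedIndex and the frequency of index 1 are not read.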
-
main
public static void main(String[] arg) throws org.apache.commons.configuration.ConfigurationException, SecurityException, com.martiansoftware.jsap.JSAPException, IOException, URISyntaxException, ClassNotFoundException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
- Throws:
org.apache.commons.configuration.ConfigurationException
SecurityException
com.martiansoftware.jsap.JSAPException
IOException
URISyntaxException
ClassNotFoundException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
-