Class TermCollectionVisitor
java.lang.Object
  it.unimi.di.big.mg4j.search.visitor.AbstractDocumentIteratorVisitor
    it.unimi.di.big.mg4j.search.visitor.TermCollectionVisitor

All Implemented Interfaces:
DocumentIteratorVisitor<Boolean>
public class TermCollectionVisitor extends AbstractDocumentIteratorVisitor
A visitor collecting information about terms appearing in a DocumentIterator.

The purpose of this visitor is to explore, before iteration, the structure of a DocumentIterator in order to count how many terms are actually used, and to set up some internal data structures for the terms appearing in all leaves of nonzero frequency (the latter condition is used to skip empty iterators), possibly considering just a subset of indices. For this visitor to work, all leaves of nonzero frequency must return a non-null value on a call to IndexIterator.term().

During the visit, we keep track of which index/term pairs have already been seen. Each pair is assigned a distinct offset, a number from zero (inclusive) to the overall number of distinct pairs (exclusive), which is stored as the id of each index iterator and is used afterwards to access data about the pair quickly. Note that duplicate index/term pairs get the same offset. The overall number of distinct pairs is returned by numberOfPairs() after a visit.

The indices appearing in some valid pair are recorded; they are accessible as an array returned by indices(), and the map from positions in this array to indices is inverted by indexMap(). If you need to fix the index map, there is a special prepare(ReferenceSet) method: in that case, only terms associated with indices in the provided set will be collected.

Warning: the semantics of prepare(ReferenceSet) described above was introduced in MG4J 4.0. Previously, the only effect of prepare(ReferenceSet) was to add the given indices artificially to the index set.

The offset assigned to each index/term pair is returned by offset(Index, String). Should you need to know the terms associated with each index, they are returned by terms(Index).

After term collection, counters are usually set up by a visit of a CounterSetupVisitor.
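The bookkeeping described above can be modeled in a few lines of plain Java. This is a simplified sketch, not MG4J's implementation: the Leaf class is a hypothetical stand-in for an index iterator, indices are represented by their names (whereas MG4J compares Index objects by reference), and an id of -1 marks a leaf that received no offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index iterator leaf (not an MG4J class).
final class Leaf {
    final String index;   // index name, standing in for an Index object
    final String term;    // what IndexIterator.term() would return
    final long frequency; // zero for empty iterators
    int id = -1;          // offset of the pair; -1 if none was assigned

    Leaf(String index, String term, long frequency) {
        this.index = index;
        this.term = term;
        this.frequency = frequency;
    }
}

// Simplified model of term collection: assigns offsets to distinct pairs.
final class PairCollector {
    private final Map<String, Map<String, Integer>> offsets = new HashMap<>();
    private final List<String> indices = new ArrayList<>(); // order of first appearance
    private int numberOfPairs;

    void visit(Leaf leaf) {
        if (leaf.frequency == 0) return; // empty iterators are skipped
        Map<String, Integer> termOffsets = offsets.get(leaf.index);
        if (termOffsets == null) {
            termOffsets = new HashMap<>();
            offsets.put(leaf.index, termOffsets);
            indices.add(leaf.index); // first valid pair on this index
        }
        Integer offset = termOffsets.get(leaf.term);
        if (offset == null) { // new pair: take the next free offset
            offset = numberOfPairs++;
            termOffsets.put(leaf.term, offset);
        }
        leaf.id = offset; // duplicate pairs end up with the same offset
    }

    int numberOfPairs() { return numberOfPairs; }
    List<String> indices() { return indices; }
    int offset(String index, String term) { return offsets.get(index).get(term); }
}
```

For leaves (text, foo, 3), (title, foo, 1), (text, foo, 5) and (text, bar, 0), this model assigns offsets 0, 1 and 0 to the first three and nothing to the empty fourth leaf, so numberOfPairs() is 2.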
Constructor Summary
- TermCollectionVisitor(): Creates a new term-collection visitor.
Method Summary
- Reference2IntMap<Index> indexMap(): Returns a map from indices met during term collection to their positions in indices().
- Index[] indices(): Returns the indices met during pair collection.
- int numberOfPairs(): Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit.
- int offset(Index index, String term): Returns the offset associated with a given index/term pair.
- TermCollectionVisitor prepare(): Prepares this term-collection visitor.
- TermCollectionVisitor prepare(ReferenceSet<Index> indices): Prepares this term-collection visitor, possibly specifying the indices that should be collected.
- Object2IntLinkedOpenHashMap<String> term2Id(): Returns a map associating terms appearing in the query with ids.
- String[] terms(Index index): Returns the terms associated with the given index.
- String toString()
- Boolean visit(IndexIterator indexIterator): Visits an IndexIterator leaf.
Method Detail
prepare
public TermCollectionVisitor prepare()
Prepares this term-collection visitor.
- Specified by: prepare in interface DocumentIteratorVisitor<Boolean>
- Overrides: prepare in class AbstractDocumentIteratorVisitor
- Returns: this term-collection visitor.
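Because prepare() returns the visitor itself, it can be chained into the call that starts a visit (in MG4J, presumably something along the lines of documentIterator.accept(visitor.prepare())). The hypothetical, self-contained sketch below models that prepare-then-visit pattern; it assumes that preparing discards whatever a previous visit collected, so that one visitor instance can be reused across queries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reusable collector (not MG4J code) showing a fluent prepare().
final class ReusableCollector {
    // Composite index/term key -> offset, assigned in order of discovery.
    private final Map<String, Integer> pairOffsets = new HashMap<>();

    // Assumption: preparing forgets the results of the previous visit.
    ReusableCollector prepare() {
        pairOffsets.clear();
        return this; // returning this allows chaining
    }

    void visit(String index, String term) {
        // A NUL separator keeps ("ab", "c") and ("a", "bc") distinct.
        pairOffsets.putIfAbsent(index + '\0' + term, pairOffsets.size());
    }

    int numberOfPairs() { return pairOffsets.size(); }
}
```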
prepare
public TermCollectionVisitor prepare(ReferenceSet<Index> indices)
Prepares this term-collection visitor, possibly specifying the indices that should be collected.
- Parameters: indices - the set of indices that will be collected; if empty, all indices will be collected (i.e., the call is equivalent to prepare()).
- Returns: this term-collection visitor.
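The semantics of the indices parameter (an empty set collects everything, a nonempty set restricts collection to its members) can be sketched as follows. The Idx class is a hypothetical stand-in for Index, compared by identity to mirror the ReferenceSet in the signature, and zero-frequency leaves are ignored as in the class description. This is an illustration under those assumptions, not the MG4J implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an Index; identity semantics, like ReferenceSet.
final class Idx {
    final String name;
    Idx(String name) { this.name = name; }
}

// Hypothetical model of prepare(ReferenceSet) filtering (not MG4J code).
final class FilteringCollector {
    private Set<Idx> allowed = Collections.emptySet();
    private final Map<Idx, Map<String, Integer>> offsets = new IdentityHashMap<>();
    private int numberOfPairs;

    // An empty set means "collect every index", as with prepare().
    FilteringCollector prepare(Set<Idx> indices) {
        allowed = indices;
        offsets.clear();
        numberOfPairs = 0;
        return this;
    }

    // Returns the offset of the pair, or -1 if the leaf is ignored.
    int visit(Idx index, String term, long frequency) {
        if (frequency == 0) return -1;                                  // empty iterator
        if (!allowed.isEmpty() && !allowed.contains(index)) return -1; // filtered out
        Map<String, Integer> termOffsets = offsets.computeIfAbsent(index, k -> new HashMap<>());
        Integer offset = termOffsets.get(term);
        if (offset == null) {
            offset = numberOfPairs++;
            termOffsets.put(term, offset);
        }
        return offset;
    }

    int numberOfPairs() { return numberOfPairs; }
}
```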
visit
public Boolean visit(IndexIterator indexIterator) throws IOException
Description copied from interface: DocumentIteratorVisitor
Visits an IndexIterator leaf.
- Parameters: indexIterator - the leaf to be visited.
- Returns: an appropriate return value if the visit should continue, or null.
- Throws: IOException
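The Returns clause describes a contract in which a non-null value lets the visit go on and null stops it. A hypothetical model of how such a contract can propagate through a tree of iterators is sketched below; the node classes and the abort-on-null propagation at internal nodes are assumptions made for illustration, not a description of AbstractDocumentIteratorVisitor's code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical composite query tree (not MG4J classes).
interface QueryNode {}

final class LeafNode implements QueryNode {
    final String term;
    LeafNode(String term) { this.term = term; }
}

final class InnerNode implements QueryNode {
    final List<QueryNode> children;
    InnerNode(QueryNode... children) { this.children = Arrays.asList(children); }
}

// Hypothetical leaf callback: non-null to continue, null to abort the visit.
interface LeafVisitor {
    Boolean visit(LeafNode leaf);
}

final class Traversal {
    // Depth-first visit; a null from any leaf stops the whole visit at once.
    static Boolean accept(QueryNode node, LeafVisitor visitor) {
        if (node instanceof LeafNode) return visitor.visit((LeafNode) node);
        for (QueryNode child : ((InnerNode) node).children) {
            if (accept(child, visitor) == null) return null;
        }
        return Boolean.TRUE;
    }
}
```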
numberOfPairs
public int numberOfPairs()
Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit.
- Returns: the number of distinct index/term pairs corresponding to nonzero-frequency index iterators.
indices
public Index[] indices()
Returns the indices met during pair collection.
Note that the returned array does not include indices associated only with index iterators of zero frequency, unless prepare(ReferenceSet) was called with a nonempty argument.
- Returns: the indices met during term collection.
indexMap
public Reference2IntMap<Index> indexMap()
Returns a map from indices met during term collection to their positions in indices().
Note that the returned map does not include as keys indices associated only with index iterators of zero frequency, unless prepare(ReferenceSet) was called with a nonempty argument.
- Returns: a map from indices met during term collection to their positions in indices().
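The relationship between indices() and indexMap() is that of an array and its inverse: for every position i, the map sends indices()[i] back to i. The sketch below models that with hypothetical names, using plain Objects as stand-ins for Index and an IdentityHashMap to mirror the reference semantics of Reference2IntMap; ordering the array by first appearance is an assumption, not something the documentation specifies.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not MG4J code) of indices() and indexMap(): the array
// lists distinct indices, and the map inverts it (index -> position).
final class IndexTable {
    final Object[] indices;
    // Identity-based keys, mirroring the reference semantics of Reference2IntMap.
    final Map<Object, Integer> indexMap = new IdentityHashMap<>();

    // metInOrder: the index of each nonzero-frequency leaf, in visit order.
    IndexTable(List<?> metInOrder) {
        List<Object> distinct = new ArrayList<>();
        for (Object index : metInOrder) {
            if (!indexMap.containsKey(index)) {
                indexMap.put(index, distinct.size()); // position in the array
                distinct.add(index);
            }
        }
        indices = distinct.toArray();
    }
}
```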
terms
public String[] terms(Index index)
Returns the terms associated with the given index.
- Parameters: index - an index.
- Returns: the terms associated with index, in the same order in which they appeared during the visit, skipping duplicates, if some nonzero-frequency iterator based on index was found; null otherwise.
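The contract of terms(Index) (order of first appearance, duplicates skipped, null when no nonzero-frequency iterator on the index was seen) maps naturally onto an insertion-ordered set per index. The sketch below is a hypothetical model of that contract, with index names standing in for Index objects; it is not MG4J's implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model (not MG4J code) of the terms(Index) contract.
final class TermsByIndex {
    // LinkedHashSet keeps first-appearance order and drops duplicates.
    private final Map<String, Set<String>> terms = new HashMap<>();

    void record(String index, String term, long frequency) {
        if (frequency == 0) return; // zero-frequency leaves are skipped
        terms.computeIfAbsent(index, k -> new LinkedHashSet<>()).add(term);
    }

    // null if no nonzero-frequency leaf on this index was seen
    String[] terms(String index) {
        Set<String> s = terms.get(index);
        return s == null ? null : s.toArray(new String[0]);
    }
}
```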
term2Id
public Object2IntLinkedOpenHashMap<String> term2Id()
Returns a map associating terms appearing in the query with ids.
- Returns: a map from terms appearing in the query (in indices with counts) to ids.
offset
public int offset(Index index, String term)
Returns the offset associated with a given index/term pair.
- Parameters:
  index - an index appearing in indices().
  term - a term appearing in the array returned by terms(Index) with argument index.
- Returns: the offset associated with the index/term pair.
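Assuming offsets range from zero up to, but excluding, numberOfPairs(), as the class description suggests, they can index a plain array directly. The sketch below shows that idea with hypothetical per-pair counters; it is not a description of CounterSetupVisitor, whose actual data structures are not covered here.

```java
// Hypothetical per-pair storage (not MG4J code): once every leaf carries the
// offset of its index/term pair, data about the pair fits in a flat array of
// size numberOfPairs(), and duplicate pairs share a slot.
final class PairCounters {
    private final int[] counts;

    PairCounters(int numberOfPairs) { counts = new int[numberOfPairs]; }

    // Adds occurrences found by a leaf whose pair has the given offset.
    void add(int offset, int occurrences) { counts[offset] += occurrences; }

    int count(int offset) { return counts[offset]; }
}
```

With three leaves whose pairs have offsets 0, 1 and 0, the first and third leaf accumulate into the same slot, which is exactly why duplicate pairs share an offset.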