Tanl Linguistic Pipeline
Public Member Functions
reference | operator* ()
pointer | operator-> ()
const_iterator & | operator++ ()
Advance a PostingList::const_iterator.
const_iterator | operator++ (int)
const_iterator & | next (DocID min)
Advance posting list to document with DocID min.
bool | atEnd ()
size_type | size () const
void | copyHits (std::fstream &o)
Copy hits to file.
size_type | index () const
Public Attributes
HitsCursor | hitsCursor
Protected Member Functions
const_iterator (size_type s, byte const *p)
Protected Attributes
size_type | rest_
Number of remaining elements, including the current one, which has already been read into posting. It is never 0 while the iterator is valid; a value of 0 means end.
byte const * | c_
Start of the next Posting (may differ from the end of the current one).
size_type | tablesz_
The size of the skip list table.
size_type | size_
The size of the PostingList (same as size_ of the parent PostingList).
PostingOffset * | table_
Skip table: for every Postings_Segment_Size-th posting, its offset within the posting list and the DocID of the corresponding document.
value_type | posting
Size | hitlen
Friends
class | PostingList |
bool | operator== (const_iterator const &i, const_iterator const &j) |
bool | operator!= (const_iterator const &i, const_iterator const &j) |
size_type IXE::PostingList::const_iterator::index () const [inline]
Referenced by IXE::PostingList::remap_iterator::operator++().
PostingList::const_iterator & IXE::PostingList::const_iterator::next (DocID min)
Advance posting list to document with DocID min.
Exploit the PostingOffset table table_ to binary-search for the segment containing the requested posting, then jump to the start of that segment.
Parameters:
min  the DocID of the requested posting.
References c_, operator++(), IXE::parseEptacode(), rest_, size_, table_, and tablesz_.
Referenced by IXE::PostingList::remap_iterator::operator++().
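The lookup performed by next() can be sketched as follows. This is a minimal model, not the IXE code: the posting list is assumed to be already decoded into a sorted vector of DocIDs, SkipEntry is a hypothetical stand-in for PostingOffset that stores a posting index instead of a byte offset, and next is assumed to stop at the first posting whose DocID is at least min.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

using DocID = std::uint32_t;

// Hypothetical stand-in for PostingOffset: where a segment starts (an index
// into the decoded list here, a byte offset in the real table) and the DocID
// of the posting found there.
struct SkipEntry {
    std::size_t pos;
    DocID docid;
};

// Sketch of next(min): binary-search the skip table for the last entry whose
// DocID is <= min, jump there (never backwards), then scan linearly to the
// first posting whose DocID is >= min. Returns docs.size() at end of list.
std::size_t nextAtLeast(const std::vector<DocID>& docs,
                        const std::vector<SkipEntry>& table,
                        std::size_t cur, DocID min) {
    // First entry whose DocID is strictly greater than min.
    auto it = std::upper_bound(table.begin(), table.end(), min,
                               [](DocID value, const SkipEntry& e) {
                                   return value < e.docid;
                               });
    if (it != table.begin()) {
        std::size_t jump = std::prev(it)->pos;  // last entry with docid <= min
        if (jump > cur) cur = jump;             // an iterator only moves forward
    }
    while (cur < docs.size() && docs[cur] < min) ++cur;  // linear scan in segment
    return cur;
}
```

With docs = {3, 7, 12, 20, 31, 40, 52, 60, 75} and the table {{3, 20}, {6, 52}} (one entry every 3 postings), nextAtLeast(docs, table, 0, 45) jumps to position 3 and scans forward to position 6, where DocID 52 is found. Jumping to the last entry with DocID <= min is safe because DocIDs are sorted, so every posting before that entry is already below min.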
PostingList::const_iterator & IXE::PostingList::const_iterator::operator++ ()
Advance a PostingList::const_iterator.
A posting has the following format:
I[0x80{M}...0x80]OL{H}^O
that is: a DocID (I), followed by zero or more TermColors (M) enclosed between 0x80 bytes, followed by the number of occurrences of the word in the document (O), followed by the byte length of the hit list minus O (L), followed by O hits (H), i.e. the positions where the word occurs in document I. Each H is encoded as a delta from the previous position. The first word of a document is at position 1.
Reimplemented in IXE::PostingList::remap_iterator.
References c_, IXE::parseEptacode(), and rest_.
Referenced by next().
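The posting layout above can be decoded with a sketch like the following. The byte encoding of each number is an assumption: parseEptacode is modelled here as 7 data bits per byte, most significant group first, with the high bit set only on a number's last byte, which may not match IXE. The sketch further assumes that the bracketed TermColor group may be absent, that no TermColor is 0 (under this encoding 0 is the lone byte 0x80, the delimiter), and that the first H is the absolute position of the first hit. readEptacode, Posting and decodePosting are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using byte = unsigned char;

// ASSUMED encoding (not taken from IXE): 7 data bits per byte, most
// significant group first, high bit set only on the last byte of a number.
std::uint32_t readEptacode(const byte*& p, const byte* end) {
    std::uint32_t v = 0;
    while (p < end) {
        byte b = *p++;
        v = (v << 7) | (b & 0x7Fu);
        if (b & 0x80u) return v;  // high bit marks the final byte
    }
    throw std::runtime_error("truncated number");
}

struct Posting {
    std::uint32_t docid = 0;
    std::vector<std::uint32_t> colors;  // TermColors (M)
    std::vector<std::uint32_t> hits;    // absolute positions; first word is 1
};

// Decode one posting laid out as I[0x80{M}...0x80]OL{H}^O and leave p
// just past it.
Posting decodePosting(const byte*& p, const byte* end) {
    Posting post;
    post.docid = readEptacode(p, end);                    // I
    if (p < end && *p == 0x80) {                          // optional color group
        ++p;
        while (p < end && *p != 0x80)
            post.colors.push_back(readEptacode(p, end));  // M
        if (p == end) throw std::runtime_error("unterminated TermColor group");
        ++p;                                              // closing 0x80
    }
    std::uint32_t occurrences = readEptacode(p, end);     // O
    std::uint32_t extra = readEptacode(p, end);           // L = hit bytes - O
    const byte* hitsStart = p;
    std::uint32_t pos = 0;
    for (std::uint32_t i = 0; i < occurrences; ++i) {
        pos += readEptacode(p, end);                      // H: delta from previous
        post.hits.push_back(pos);
    }
    // The hit list should occupy exactly O + L bytes.
    if (static_cast<std::size_t>(p - hitsStart) !=
        static_cast<std::size_t>(occurrences) + extra)
        throw std::runtime_error("hit list length does not match O + L");
    return post;
}
```

For example, under these assumptions the bytes 0x02 0xAC 0x80 0x85 0x80 0x82 0x80 0x84 0x86 decode to DocID 300, one TermColor 5, O = 2, L = 0 and hits at positions 4 and 10 (deltas 4 and 6).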
PostingOffset* IXE::PostingList::const_iterator::table_ [protected]
table_ contains, for every Postings_Segment_Size-th posting, its offset within the posting list and the DocID of the corresponding document.
This is used to binary-search for the segment containing a posting; that segment is then scanned linearly. Example, with segments of 1024 postings:
0: off0, 1234 (the 1024th posting contains DocID 1234, at offset off0)
1: off1, 2345 (the 2*1024th posting contains DocID 2345, at offset off1)
...
n: offn, NNNN (the (n+1)*1024th posting contains DocID NNNN, at offset offn)
Referenced by next(), and IXE::PostingList::remap_iterator::remap_iterator().
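Under the layout in the example above, building the table amounts to sampling the posting list every Postings_Segment_Size postings. The sketch below is illustrative only: it takes already-known byte offsets and DocIDs as input, the field names of this PostingOffset are hypothetical, and it reads "the (n+1)*1024th posting" as counted from 1, which the example does not state explicitly.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using DocID = std::uint32_t;

// Hypothetical layout of one table entry: offset of the posting within the
// encoded list, and the DocID stored in that posting.
struct PostingOffset {
    std::size_t offset;
    DocID docid;
};

// Entry i describes posting number (i + 1) * segmentSize, counted from 1,
// matching the "(n+1)*1024th posting" wording of the example.
std::vector<PostingOffset> buildSkipTable(const std::vector<std::size_t>& offsets,
                                          const std::vector<DocID>& docids,
                                          std::size_t segmentSize) {
    if (segmentSize == 0 || offsets.size() != docids.size())
        throw std::invalid_argument("bad segment size or mismatched inputs");
    std::vector<PostingOffset> table;
    for (std::size_t n = segmentSize; n <= docids.size(); n += segmentSize)
        table.push_back({offsets[n - 1], docids[n - 1]});  // n-th posting, 1-based
    return table;
}
```

With this reading, a list of size_ postings would yield size_ / Postings_Segment_Size entries (integer division), and the linear scan that follows the binary search is bounded by roughly one segment.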