Tanl Linguistic Pipeline |
Trie representing the suffixes, plus additional tag-count information.
#include <SuffixGuesser.h>
Classes | |
struct | counts_iterator |
Iterator of a TrieNode. | |
Public Member Functions | |
TrieNode () | |
Default constructor. | |
void | set_tag_info (Counts *tag) |
Tag info setter. | |
void | serialize (std::ostream &out) |
Serializes a TrieNode object. | |
void | serialize (std::istream &in) |
De-serializes a TrieNode object. | |
TrieNode * | add_char (Counts *legacy_counts, bool after_branch, int ix, int stop, std::string &word, TagID tag, int count) |
Recursive method that adds a word to the trie, character by character. | |
bool | empty_node () |
This method verifies whether the node is empty. | |
Public Attributes | |
Counts * | tag_info |
Tag information. Can be null since it is optional. | |
bool | terminal |
Indicates whether the Node represents a terminal node. |
Trie representing the suffixes, plus additional tag-count information.
The trie is built from a struct TrieNode that inherits from map<char, TrieNode*>. Each node is therefore itself a map from a character to the corresponding descendant node. Each node also carries additional tag information, which may be null.
The trie defines an internal counts_iterator structure for iterating over the counts of the suffixes in the trie.
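The node layout described above can be sketched as follows. This is a minimal illustration, not the Tanl source: SketchCounts, SketchNode, and the child() helper are hypothetical names, and the fields of the real Counts class are assumed from the examples on this page (a global total plus a per-tag map).

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for Tanl::POS::Counts; the real class is not shown on this page.
struct SketchCounts {
    int global = 0;              // total count over all tags (assumed field)
    std::map<int, int> per_tag;  // tag id -> count (assumed field)
};

// A node that is itself a map from a character to a descendant node.
struct SketchNode : std::map<char, SketchNode*> {
    SketchCounts* tag_info = nullptr;  // optional, may stay null
    bool terminal = false;

    SketchNode() = default;
    SketchNode(const SketchNode&) = delete;             // the node owns raw pointers,
    SketchNode& operator=(const SketchNode&) = delete;  // so copying is disabled

    ~SketchNode() {
        delete tag_info;
        for (auto& kv : *this) delete kv.second;  // free the whole subtree
    }

    // Hypothetical helper: returns the child for c, creating it if absent.
    SketchNode* child(char c) {
        SketchNode*& slot = (*this)[c];
        if (!slot) slot = new SketchNode();
        return slot;
    }
};
```

Because the node inherits from std::map, the usual map operations (size(), find(), iteration) apply directly to a node and enumerate its children.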
Tanl::POS::TrieNode::TrieNode () [inline]
Default constructor.
This constructor sets the tag information to null and the terminal flag to false.
TrieNode* Tanl::POS::TrieNode::add_char (Counts *legacy_counts, bool after_branch, int ix, int stop, std::string &word, TagID tag, int count)
Recursive method that adds a word to the trie, character by character.
This method updates the trie of suffixes by adding new branches or by updating the existing tag-count information. Characters are added starting from the end of the word, so the path from the root spells the word backwards.
For example, if we start with the EMPTY trie and add the word dog with tag info tag = 1, value = 20, the following trie is obtained (each node shows its tag-count info in parentheses):
Node (global = 20, map = {1:20})
  |g ----> Node (null)
    |o ----> Node (null)
      |d ----> Terminal Node (null)
If we now add the word slumdog to the previous trie with tag info tag = 2, value = 10, the following trie is obtained:
Node (global = 30, map = {1:20, 2:10})
  |g ----> Node (null)
    |o ----> Node (null)
      |d ----> Node (null)
        |m ----> Node (global = 10, map = {2:10})
          |u ----> Node (null)
            |l ----> Node (null)
              |s ----> Terminal Node (null)
legacy_counts | Tag info inherited from upper nodes. | |
after_branch | Must be true when add_char is called from outside the trie or immediately after a branching operation. | |
ix | Current position in word; at the beginning it is the end of the word. | |
stop | Position in the word that bounds the suffix. If it is 0, the whole word is added to the trie. | |
word | String to add to the trie. | |
tag | Integer that identifies the tag. | |
count | Number of times word was found in the corpus tagged with tag. |
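To illustrate the right-to-left insertion, here is a simplified, iterative sketch. It is not the add_char implementation: SuffixNode and add_suffix are hypothetical names, stop is assumed to be the leftmost index of the suffix, and counts are accumulated at the root only, so the branch-aware placement of counts driven by legacy_counts and after_branch is not reproduced. The terminal flag is also never cleared here, which may differ from the real method.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical simplified node; it does not mirror the real TrieNode layout.
struct SuffixNode {
    std::map<char, std::unique_ptr<SuffixNode>> children;
    std::map<int, int> tag_counts;  // tag id -> count; filled only at the root in this sketch
    int global = 0;                 // sum of all counts seen at this node
    bool terminal = false;
};

// Adds the characters word[stop..end) to the trie, reading from right to left.
void add_suffix(SuffixNode& root, const std::string& word, std::size_t stop,
                int tag, int count) {
    assert(stop <= word.size());
    root.global += count;
    root.tag_counts[tag] += count;

    SuffixNode* node = &root;
    for (std::size_t i = word.size(); i > stop; --i) {
        std::unique_ptr<SuffixNode>& slot = node->children[word[i - 1]];
        if (!slot) slot = std::make_unique<SuffixNode>();  // new branch
        node = slot.get();
    }
    if (node != &root) node->terminal = true;  // last node reached ends the suffix
}
```

Adding dog (tag 1, count 20) and then slumdog (tag 2, count 10) with stop = 0 yields a single chain g, o, d, m, u, l, s below the root, whose global count is 30.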
bool Tanl::POS::TrieNode::empty_node ()
void Tanl::POS::TrieNode::serialize (std::istream &in)
De-serializes a TrieNode object.
in | The stream from which the object will be read |
References Tanl::POS::Counts::serialize(), serialize(), tag_info, and terminal.
void Tanl::POS::TrieNode::serialize (std::ostream &out)
Serializes a TrieNode object.
out | The stream to which the object will be written |
References Tanl::POS::Counts::serialize(), tag_info, and terminal.
Referenced by Tanl::POS::SuffixGuesser::serialize(), and serialize().
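The pair of serialize overloads, one taking an output stream and one taking an input stream, can be sketched as a recursive round trip. The on-disk format used by Tanl is not documented on this page, so the whitespace-separated text layout below is purely an assumption, as are the names SerNode and round_trip; tag information is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <utility>

// Hypothetical node used only to illustrate the overload pair.
struct SerNode {
    std::map<char, std::unique_ptr<SerNode>> children;
    bool terminal = false;

    // Writes: terminal flag, child count, then (character code, subtree) per child.
    void serialize(std::ostream& out) const {
        out << terminal << ' ' << children.size() << ' ';
        for (const auto& kv : children) {
            out << static_cast<int>(kv.first) << ' ';
            kv.second->serialize(out);
        }
    }

    // Reads the same layout back, replacing any existing children.
    void serialize(std::istream& in) {
        std::size_t n = 0;
        in >> terminal >> n;
        children.clear();
        for (std::size_t i = 0; i < n && in; ++i) {
            int code = 0;
            in >> code;
            auto child = std::make_unique<SerNode>();
            child->serialize(in);
            children[static_cast<char>(code)] = std::move(child);
        }
    }
};

// Convenience helper: serializes src and reads it back into a fresh node.
std::unique_ptr<SerNode> round_trip(const SerNode& src) {
    std::ostringstream out;
    src.serialize(out);
    std::istringstream in(out.str());
    auto copy = std::make_unique<SerNode>();
    copy->serialize(in);
    assert(!in.fail());
    return copy;
}
```

Note that a std::stringstream is both an istream and an ostream, so passing one to an overload pair like this is ambiguous; the sketch uses separate ostringstream and istringstream objects for that reason.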
void Tanl::POS::TrieNode::set_tag_info (Counts *tag)
Tag info setter.
While setting the new value, if an old value exists it is destroyed.
tag | New tag info value |
References tag_info.
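The destroy-on-replace behaviour of the setter can be sketched as below. TagCounts, Holder, and the g_destroyed counter are hypothetical names used only for illustration; the guard against re-setting the same pointer is an assumption, not something this page confirms about the real method.

```cpp
#include <cassert>

static int g_destroyed = 0;  // counts destructor calls, for illustration only

// Hypothetical stand-in for Tanl::POS::Counts.
struct TagCounts {
    ~TagCounts() { ++g_destroyed; }
};

// Hypothetical owner of an optional tag-info pointer.
struct Holder {
    TagCounts* tag_info = nullptr;

    Holder() = default;
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;
    ~Holder() { delete tag_info; }

    // Takes ownership of tag; any previously held value is destroyed.
    void set_tag_info(TagCounts* tag) {
        if (tag == tag_info) return;  // assumed guard: avoid deleting the value being set
        delete tag_info;              // deleting a null pointer is a no-op
        tag_info = tag;
    }
};
```

Under this sketch the caller hands ownership of the pointer to the node and must not delete it afterwards.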