Tanl Linguistic Pipeline

Tanl::POS::SuffixGuesser Struct Reference

The task of the suffix guesser is to predict a tag distribution based on the suffix of the word.

#include <SuffixGuesser.h>


Public Member Functions

void serialize (std::ostream &out)
 Serializes a SuffixGuesser object.
void serialize (std::istream &in)
 Deserializes a SuffixGuesser object.
void add_word (int n, std::string &word, TagID tag, int count)
 Adds a word to the suffix trie.
double tagprob (std::string &word, int tagid)
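 Returns the probability of tag tagid given the suffix of word.
double tagprobs (std::string &word, std::vector< double > &probs)
 Computes the suffix-based tag distribution of word into probs.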

Static Public Member Functions

static double calculate_theta (std::vector< double > &apriori_tag_probs)
 Computes theta from the a priori tag probabilities.
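calculate_theta is not documented further here. The TnT tagger by Brants uses a similar successive suffix interpolation and sets its theta to the sample standard deviation of the unconditioned tag probabilities. The sketch below assumes calculate_theta follows that recipe; it is a hypothetical free function taking a const reference, not Tanl's own code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SuffixGuesser::calculate_theta, ASSUMING the
// TnT recipe: theta = sample standard deviation of the a priori tag
// probabilities P(t_1), ..., P(t_s).
double calculate_theta(const std::vector<double>& apriori_tag_probs) {
    const std::size_t s = apriori_tag_probs.size();
    if (s < 2) return 0.0;  // sample standard deviation needs at least two tags

    double mean = 0.0;
    for (double p : apriori_tag_probs) mean += p;
    mean /= static_cast<double>(s);

    double sum_sq = 0.0;
    for (double p : apriori_tag_probs) sum_sq += (p - mean) * (p - mean);

    // Divide by s - 1 (sample variance), then take the square root.
    return std::sqrt(sum_sq / static_cast<double>(s - 1));
}
```

A tag set with very uneven prior probabilities yields a larger theta, which shifts more weight onto the accumulated estimate at each interpolation step.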

Public Attributes

double theta
 Theta used in the interpolation process.
TrieNode trie
 Trie of suffixes.
Counts empty_counts
 Empty Counts object.

Detailed Description

The task of the suffix guesser is to predict a tag distribution based on the suffix of the word.

In the training phase, it counts the occurrences of each suffix in the corpus, both in total and separately for each tag. Consider a word ending in ABCDE. During prediction, the guesser linearly interpolates the predictions looked up for the suffixes ABCDE, BCDE, CDE, DE, E and "" (the empty suffix). Interpolation is done successively with weights 1 and theta, so the weights are essentially powers of 1/(1+theta), with the shorter suffix getting the larger weight.
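A minimal sketch of this interpolation, assuming the per-suffix relative frequencies have already been looked up in the trie and are passed in order from the longest suffix (ABCDE) to the empty one. Following the ordering stated above, each step gives the newly added, shorter suffix weight 1 and the accumulated estimate weight theta, then divides by 1+theta; the shorter suffix therefore ends up with the larger weight whenever theta < 1. The function name and input layout are hypothetical, not Tanl's API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper. estimates[0] holds the relative tag frequencies for
// the longest suffix, ..., estimates.back() those for the empty suffix "".
// Every row has one entry per tag.
std::vector<double> interpolate_suffix_estimates(
        const std::vector<std::vector<double>>& estimates, double theta) {
    if (estimates.empty()) throw std::invalid_argument("no suffix estimates");

    std::vector<double> p = estimates.front();  // start from the longest suffix
    for (std::size_t k = 1; k < estimates.size(); ++k) {
        const std::vector<double>& shorter = estimates[k];
        if (shorter.size() != p.size()) throw std::invalid_argument("tag count mismatch");
        for (std::size_t t = 0; t < p.size(); ++t) {
            // Shorter suffix weighted 1, accumulated estimate weighted theta.
            p[t] = (shorter[t] + theta * p[t]) / (1.0 + theta);
        }
    }
    return p;  // still sums to 1 if every input row does
}
```

With theta = 0.1, for instance, the empty suffix contributes 1/1.1 ≈ 0.91 of the final estimate.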


Member Function Documentation

void Tanl::POS::SuffixGuesser::add_word (int n, std::string &word, TagID tag, int count)

Adds a word to the suffix trie.

Parameters:
n Maximum suffix length.
word String to be added to the trie.
tag Identifier of the tag assigned to word.
count Number of times word was tagged with tag in the corpus.
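The training step can be sketched as follows, assuming each trie node stores a total count plus one count per tag, as the detailed description suggests. A flat map from suffix to counts stands in for TrieNode, and TagID is assumed to be an int; all names here are hypothetical. Suffixes are taken byte-wise, so multi-byte UTF-8 characters would need extra care.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the counts kept per suffix.
struct SuffixCounts {
    int total = 0;               // occurrences of the suffix, any tag
    std::map<int, int> per_tag;  // occurrences per tag identifier
};

// Hypothetical flat replacement for TrieNode: suffix -> counts.
using SuffixTable = std::map<std::string, SuffixCounts>;

// Records `count` occurrences of `tag` for every suffix of `word` whose
// length ranges from 0 (the empty suffix) to min(n, word.size()).
void add_word(SuffixTable& table, int n, const std::string& word, int tag, int count) {
    const std::size_t max_len = std::min<std::size_t>(n < 0 ? 0 : n, word.size());
    for (std::size_t len = 0; len <= max_len; ++len) {
        SuffixCounts& c = table[word.substr(word.size() - len)];
        c.total += count;
        c.per_tag[tag] += count;
    }
}
```

A real trie shares the storage of common suffix tails and lets lookup stop at the longest suffix actually seen in training; the flat map only mirrors the counting.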

void Tanl::POS::SuffixGuesser::serialize (std::istream &in)

Deserializes a SuffixGuesser object.

Parameters:
in The stream from which the object will be read.

References Tanl::POS::TrieNode::serialize(), serialize(), theta, and trie.

void Tanl::POS::SuffixGuesser::serialize (std::ostream &out)

Serializes a SuffixGuesser object.

Parameters:
out The stream to which the object will be written.

References Tanl::POS::TrieNode::serialize(), theta, and trie.

Referenced by serialize().
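The two serialize overloads share one name and are distinguished only by stream direction. The toy model below illustrates that pattern with a round trip through memory. The struct, its map standing in for the trie, and the text format are all assumptions for illustration, not Tanl's actual file format.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <limits>
#include <map>
#include <ostream>
#include <sstream>  // std::ostringstream / std::istringstream for in-memory round trips
#include <stdexcept>
#include <string>

// Hypothetical toy mirroring the overloaded serialize() pair.
struct ToyGuesser {
    double theta = 0.0;
    std::map<std::string, int> suffix_totals;  // stand-in for the trie

    // Writes theta, the entry count, then one quoted suffix and total per line.
    void serialize(std::ostream& out) {
        out << std::setprecision(std::numeric_limits<double>::max_digits10)
            << theta << '\n' << suffix_totals.size() << '\n';
        for (const auto& kv : suffix_totals)
            out << std::quoted(kv.first) << ' ' << kv.second << '\n';  // quoting keeps "" intact
    }

    // Reads back exactly what the ostream overload wrote.
    void serialize(std::istream& in) {
        std::size_t n = 0;
        if (!(in >> theta >> n)) throw std::runtime_error("bad header");
        suffix_totals.clear();
        for (std::size_t i = 0; i < n; ++i) {
            std::string suffix;
            int total = 0;
            if (!(in >> std::quoted(suffix) >> total)) throw std::runtime_error("bad entry");
            suffix_totals[suffix] = total;
        }
    }
};
```

Because std::stringstream and std::fstream derive from both std::istream and std::ostream, a call such as serialize(my_fstream) is ambiguous with this pair of overloads and does not compile; pass a std::ofstream or std::ifstream (or bind the stream to the intended base reference) to select the direction.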


The documentation for this struct was generated from the following files:
Copyright © 2005-2011 G. Attardi. Generated on 4 Mar 2011 by doxygen 1.6.1.