Tanl Linguistic Pipeline
Public Member Functions
Resources (std::string &locale)
Resources (std::string &resourceDir, std::string &locale)
char const * typeName (EntityType et)
void load (std::string &resourceDir)
    Load all resources from the given directory.
template<class WordSet> void load (WordSet *sets, char const *file)
    Load a group of WordSets from a file.
Public Attributes
Text::WordIndex classId
    Maps class names to class IDs.
char const * language
TagSet prevTokenType
TagSet nextTokenType
Tanl::Text::NormWordSet FWL
    FWL (Frequent Word List): words that occur in more than 5 documents.
Tanl::Text::NormWordSet designators [NUM_CLASSES]
    CPW (Common Preceding Words): the 20 words that most often precede names of a certain class.
Tanl::Text::NormWordSet preBigrams [NUM_CLASSES]
    CPB (Common Preceding Bigrams): bigrams that often precede names of a certain class.
Tanl::Text::Suffixes suffixes [NUM_CLASSES]
    SUF (Suffix for Class): common 3-letter suffixes for each class.
Tanl::Text::NormWordSet lastWords [NUM_CLASSES]
    NLW (Name Last Words): list of words terminating an entity.
Tanl::Text::NormWordSet lowerInterm [NUM_CLASSES]
    LIW (Lowercase Intermediate Words): list of lowercase words appearing within a sequence, e.g. PER: "van der", "de", "of"; ORG: "al", "in", "zonder", "vor", "for".
Static Public Attributes
static char const * classNames []
    Table of entity type names.
static const int nClasses = NUM_CLASSES
    Number of NE classes.
template<class WordSet> void Tanl::SST::Resources::load (WordSet *sets, char const *file) [inline]
Load a group of WordSets from a file.
The file contains one word per line in the format "class word", where class is the name of an entity tag such as LOC, MISC, ORG or PER.
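The "class word" format described above can be read with a few lines of standard C++. The following is only a minimal sketch, not the Tanl implementation: WordSet is a hypothetical stand-in for Tanl::Text::NormWordSet, classIndex and loadWordSets are illustrative names, and taking the remainder of the line as the word (so that entries such as "van der" survive) is an assumption.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a word set; the real Tanl::Text::NormWordSet may differ.
using WordSet = std::set<std::string>;

// Returns the index of `name` in `classNames`, or -1 if it is not a known class.
inline int classIndex(const std::vector<std::string>& classNames,
                      const std::string& name) {
    for (std::size_t i = 0; i < classNames.size(); ++i)
        if (classNames[i] == name) return static_cast<int>(i);
    return -1;
}

// Reads "class word" lines from `in` and adds each word to the set of its class.
// `sets` must hold one entry per class name. Returns the number of new words stored.
inline std::size_t loadWordSets(std::istream& in,
                                const std::vector<std::string>& classNames,
                                std::vector<WordSet>& sets) {
    std::size_t stored = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string cls, word;
        if (!(fields >> cls)) continue;          // skip blank lines
        std::getline(fields >> std::ws, word);   // assumption: rest of line is the word
        if (word.empty()) continue;              // skip lines with a class but no word
        int id = classIndex(classNames, cls);
        if (id < 0) continue;                    // skip unknown class names
        if (sets[id].insert(word).second) ++stored;
    }
    return stored;
}
```

Unknown class names and malformed lines are skipped silently here; a real loader might prefer to report them.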
void Tanl::SST::Resources::load (std::string &resourceDir)
Load all resources from the given directory.
Referenced by Tanl::SST::SST::SST().
Text::WordIndex Tanl::SST::Resources::classId
Maps class names to class IDs.
char const * Tanl::SST::Resources::classNames [static]
{ "adj.all", "adj.pert", "adj.ppl", "adv.all", "noun.Tops", "noun.act", "noun.animal", "noun.artifact", "noun.attribute", "noun.body", "noun.cognition", "noun.communication", "noun.event", "noun.feeling", "noun.food", "noun.group", "noun.location", "noun.motive", "noun.object", "noun.other", "noun.person", "noun.phenomenon", "noun.plant", "noun.possession", "noun.process", "noun.quantity", "noun.relation", "noun.shape", "noun.state", "noun.substance", "noun.time", "verb.body", "verb.change", "verb.cognition", "verb.communication", "verb.competition", "verb.consumption", "verb.contact", "verb.creation", "verb.emotion", "verb.motion", "verb.perception", "verb.possession", "verb.social", "verb.stative", "verb.weather" }
Table of entity type names.
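A name table like this one is typically paired with an inverse lookup from name to position, which is the role classId is described as playing. The sketch below illustrates that relationship under stated assumptions: kClassNames is a shortened, hypothetical excerpt of the table, and buildClassIds and nameOf are illustrative helpers, not members of Resources (nameOf only plays a role similar to what typeName presumably does).

```cpp
#include <string>
#include <unordered_map>

// Hypothetical excerpt of a class-name table; the real classNames array is longer.
static char const* const kClassNames[] = { "noun.location", "noun.person", "verb.motion" };
static const int kNumClasses = sizeof(kClassNames) / sizeof(kClassNames[0]);

// Builds the inverse mapping: class name -> position in the table.
inline std::unordered_map<std::string, int> buildClassIds() {
    std::unordered_map<std::string, int> ids;
    for (int i = 0; i < kNumClasses; ++i) ids[kClassNames[i]] = i;
    return ids;
}

// Returns the name for an ID, or nullptr when the ID is out of range.
inline char const* nameOf(int id) {
    return (id >= 0 && id < kNumClasses) ? kClassNames[id] : nullptr;
}
```

Keeping the table as the single source of truth and deriving the map from it avoids the two getting out of step when a class is added.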
Tanl::Text::NormWordSet Tanl::SST::Resources::FWL
FWL (Frequent Word List): words that occur in more than 5 documents.
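The "more than 5 documents" criterion is a document-frequency threshold: a word counts once per document, however often it repeats there. The sketch below shows one way such a list could be built. It is an assumption about how the resource might be produced, not the procedure used for Tanl; Document and frequentWords are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

using Document = std::vector<std::string>;  // one document = its tokens

// Returns the words that occur in more than `minDocs` distinct documents.
inline std::set<std::string> frequentWords(const std::vector<Document>& docs,
                                           std::size_t minDocs = 5) {
    std::unordered_map<std::string, std::size_t> docFreq;
    for (const Document& doc : docs) {
        // Deduplicate so a word counts once per document, however often it repeats.
        std::set<std::string> unique(doc.begin(), doc.end());
        for (const std::string& w : unique) ++docFreq[w];
    }
    std::set<std::string> result;
    for (const auto& entry : docFreq)
        if (entry.second > minDocs) result.insert(entry.first);
    return result;
}
```

Note the strict comparison: a word found in exactly 5 documents is not included.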
const int Tanl::SST::Resources::nClasses = NUM_CLASSES [static]
Number of NE classes.
Referenced by Tanl::SST::SstFeatureExtractor::extract(), and Tanl::SST::SstFeatureExtractor::reset().