Tanl Linguistic Pipeline

Tanl::NER::Resources Struct Reference

Public Member Functions

 Resources (char const *POStag, char const *NEtag)
 Resources (std::string &locale, char const *POStag, char const *NEtag)
 Resources (std::string &resourceDir, std::string &locale, char const *POStag, char const *NEtag)
size_t typesCount ()
char const * typeName (EntityType et)
void load (std::string &resourceDir)
 Load all resources from the given directory.
template<class WordSet >
void load (std::vector< WordSet > &sets, char const *file)
 Load a group of WordSets from a file.
template<class WordSet >
void load (vector< WordSet > &sets, char const *file)
template<class WordSet >
void load (WordSet *sets, char const *file)

Public Attributes

Text::WordIndex classId
 Maps class names to class IDs.
char const * language
char const * POStag
char const * NEtag
TagSet prevTokenType
TagSet nextTokenType
std::vector< Tanl::Text::NormWordSet > dict
Tanl::Text::NormWordSet moneyDict
Tanl::Text::NormWordSet namesDict
Tanl::Text::NormWordSet timeDict
Tanl::Text::NormWordSet prodDict
Tanl::Text::NormWordSet FWL
 FWL (Frequent Word List): words that occur in more than 5 documents.
std::vector< Tanl::Text::NormWordSet > designators
 CPW (Common Preceding Words): 20 words that most often precede names of a certain class.
std::vector< Tanl::Text::NormWordSet > preBigrams
 CPB (Common Preceding Bigrams): bigrams that often precede names of a certain class.
std::vector< Tanl::Text::NormWordSet > prefixes
 PRE (Prefix for Class): common 3-letter prefix for each class.
std::vector< Tanl::Text::Suffixes > suffixes
 SUF (Suffix for Class): common 3-letter suffix for each class.
std::vector< Tanl::Text::NormWordSet > firstWords
 EFW (Entity First Words): list of words starting an entity.
std::vector< Tanl::Text::NormWordSet > lastWords
 ELW (Entity Last Words): list of words terminating an entity.
std::vector< Tanl::Text::NormWordSet > lowerInterm
 NAW (Name After Words): list of words after an entity.
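
The PRE and SUF lists above are keyed on 3-letter affixes, one list per entity class. The following is a minimal sketch of how such a lookup could work, not the library's own code: `AffixSet` and `matchAffix` are hypothetical names, a plain `std::set` stands in for `Tanl::Text::NormWordSet` and `Tanl::Text::Suffixes` (whose real interfaces are not shown on this page), and the affix is counted in bytes, so multi-byte UTF-8 text would need different handling.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-class affix containers.
typedef std::set<std::string> AffixSet;

// Returns the indices of all classes whose affix set contains the
// first (or, if `suffix` is true, the last) `len` bytes of `token`.
// Tokens shorter than `len` match no class.
std::vector<std::size_t> matchAffix(const std::string& token,
                                    const std::vector<AffixSet>& perClass,
                                    bool suffix,
                                    std::size_t len = 3)
{
    assert(len > 0);
    std::vector<std::size_t> hits;
    if (token.size() < len)
        return hits;
    const std::string affix = suffix ? token.substr(token.size() - len)
                                     : token.substr(0, len);
    for (std::size_t c = 0; c < perClass.size(); ++c)
        if (perClass[c].count(affix) != 0)
            hits.push_back(c);
    return hits;
}
```

Under these assumptions, the returned class indices could serve as features for a classifier; how the Tanl tagger actually consumes `prefixes` and `suffixes` is not stated here.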

Static Public Attributes

static IXE::conf_set< std::string > entityTypes
 The entity type names.

Member Function Documentation

template<class WordSet >
void Tanl::NER::Resources::load ( std::vector< WordSet > & sets, char const * file ) [inline]

Load a group of WordSets from a file.

The file contains one entry per line in the format "class word", where class is an entity type such as LOC, MISC, ORG or PER.
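
As an illustration of the file format described above, here is a minimal sketch of a reader for "class word" lines. It is not the actual implementation of `load`: `SimpleWordSet` and `loadWordSets` are hypothetical names, a `std::set` stands in for the `WordSet` template parameter, and a `std::map` plays the role that `classId` has in this struct. The sketch reads from a stream rather than a file name, takes only the first token after the class, and silently skips malformed lines and unknown classes, all of which are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a WordSet such as Tanl::Text::NormWordSet.
typedef std::set<std::string> SimpleWordSet;

// Reads "class word" lines from `in` and inserts each word into the set
// at the index that `classIds` assigns to its class name.
// Returns the number of words newly stored.
std::size_t loadWordSets(std::istream& in,
                         const std::map<std::string, std::size_t>& classIds,
                         std::vector<SimpleWordSet>& sets)
{
    assert(sets.size() >= classIds.size());
    std::size_t stored = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string cls, word;
        if (!(fields >> cls >> word))
            continue;                   // blank line or missing word
        std::map<std::string, std::size_t>::const_iterator it =
            classIds.find(cls);
        if (it == classIds.end() || it->second >= sets.size())
            continue;                   // unknown entity type
        if (sets[it->second].insert(word).second)
            ++stored;                   // duplicates are not counted
    }
    return stored;
}
```

A caller would typically open the resource file with `std::ifstream` and pass it as `in`; entries spanning several words would need the rest of the line to be read instead of a single token.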

void Tanl::NER::Resources::load ( std::string &  resourceDir  ) 

Load all resources from the given directory.

Referenced by Tanl::NER::NER::NER().


Member Data Documentation

Text::WordIndex Tanl::NER::Resources::classId

Maps class names to class IDs.

IXE::conf_set< std::string > Tanl::NER::Resources::entityTypes [static]

The entity type names.

Referenced by Tanl::NER::NER::tag().

Tanl::Text::NormWordSet Tanl::NER::Resources::FWL

FWL (Frequent Word List): words that occur in more than 5 documents.

std::vector< Tanl::Text::NormWordSet > Tanl::NER::Resources::lowerInterm

NAW (Name After Words): list of words after an entity, e.g.: center, museum, square, street.

LIW (Lowercase Intermediate Words): list of lowercase words appearing within a sequence, e.g.:
 PER: "van der", "de", "of"
 ORG: al, in, zonder, vor, for
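
To make the NAW idea concrete, the sketch below checks whether the token that follows a candidate entity is in a name-after-words list. It is only one plausible way to derive a feature from such a list, not the tagger's actual logic: `WordList`, `lowerAscii` and `followedByNameAfterWord` are hypothetical names, a `std::set` stands in for `Tanl::Text::NormWordSet`, and the ASCII-only lowercasing is an assumption about how normalization might work.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for Tanl::Text::NormWordSet.
typedef std::set<std::string> WordList;

// Lowercases ASCII letters only; other bytes are left unchanged.
std::string lowerAscii(std::string s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Returns true if the token at index `end` (the first token after a
// candidate entity that stops just before `end`) is in `naw`.
// Returns false when the entity reaches the end of the sentence.
bool followedByNameAfterWord(const std::vector<std::string>& tokens,
                             std::size_t end,
                             const WordList& naw)
{
    assert(end <= tokens.size());
    if (end == tokens.size())
        return false;
    return naw.count(lowerAscii(tokens[end])) != 0;
}
```

With a list containing "museum", the tokens "the", "Guggenheim", "Museum" and `end` set to 2 would yield true, since the token after the candidate name lowercases to a listed word.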


The documentation for this struct was generated from the following files:
Copyright © 2005-2011 G. Attardi. Generated on 4 Mar 2011 by doxygen 1.6.1.