Tanl Linguistic Pipeline

Tanl::SST::Resources Struct Reference

Public Member Functions

 Resources (std::string &locale)
 Resources (std::string &resourceDir, std::string &locale)
char const * typeName (EntityType et)
void load (std::string &resourceDir)
 Load all resources from the given directory.
template<class WordSet >
void load (WordSet *sets, char const *file)
 Load a group of WordSets from a file.

Public Attributes

Text::WordIndex classId
 Maps class names to class IDs.
char const * language
TagSet prevTokenType
TagSet nextTokenType
Tanl::Text::NormWordSet FWL
 FWL (Frequent Word List): words that occur in more than 5 documents.
Tanl::Text::NormWordSet designators [NUM_CLASSES]
 CPW (Common Preceding Words): 20 words that most often precede names of a certain class.
Tanl::Text::NormWordSet preBigrams [NUM_CLASSES]
 CPB (Common Preceding Bigrams): bigrams that often precede names of a certain class.
Tanl::Text::Suffixes suffixes [NUM_CLASSES]
 SUF (Suffix for Class): common 3-letter suffix for each class.
Tanl::Text::NormWordSet lastWords [NUM_CLASSES]
 NLW (Name Last Words): list of words terminating an entity.
Tanl::Text::NormWordSet lowerInterm [NUM_CLASSES]
 LIW (Lowercase Intermediate Words): list of lowercase words appearing within an entity sequence, e.g. PER: "van der", "de", "of"; ORG: "al", "in", "zonder", "vor", "for".

Static Public Attributes

static char const * classNames []
 Table of entity type names.
static const int nClasses = NUM_CLASSES
 Number of NE classes.

Member Function Documentation

template<class WordSet >
void Tanl::SST::Resources::load ( WordSet *  sets,
char const *  file 
) [inline]

Load a group of WordSets from a file.

The file contains one word per line. Each line has the format "class word", where class is the name of an entity tag, such as LOC, MISC, ORG or PER.
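
The line format described above can be illustrated with a minimal sketch. It is not the Tanl implementation: SimpleWordSet and loadWordSets are hypothetical names, the class-name-to-ID map is passed in explicitly, and each entry is assumed to be a single whitespace-free word.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a Tanl word set; the real WordSet types are not shown here.
using SimpleWordSet = std::unordered_set<std::string>;

// Reads "class word" lines from `in` and adds each word to sets[id],
// where id is looked up in `classIds`. Blank lines, lines without a word,
// and lines naming an unknown class are skipped.
// Returns the number of distinct words stored.
inline std::size_t loadWordSets(
    std::istream& in,
    const std::unordered_map<std::string, std::size_t>& classIds,
    std::vector<SimpleWordSet>& sets)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string cls, word;
        if (!(fields >> cls >> word))
            continue;                       // blank or malformed line
        auto it = classIds.find(cls);
        if (it == classIds.end() || it->second >= sets.size())
            continue;                       // unknown class name
        if (sets[it->second].insert(word).second)
            ++count;                        // count only newly added words
    }
    return count;
}
```

Passing a std::ifstream opened on the resource file, together with a vector holding one set per class, would fill each set with the words listed for that class.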

void Tanl::SST::Resources::load ( std::string &  resourceDir  ) 

Load all resources from the given directory.

Referenced by Tanl::SST::SST::SST().


Member Data Documentation

Text::WordIndex Tanl::SST::Resources::classId

Maps class names to class IDs.

char const * Tanl::SST::Resources::classNames [static]
Initial value:
 {
  "adj.all", "adj.pert", "adj.ppl", "adv.all", "noun.Tops", "noun.act",
  "noun.animal", "noun.artifact", "noun.attribute", "noun.body",
  "noun.cognition", "noun.communication", "noun.event", "noun.feeling",
  "noun.food", "noun.group", "noun.location", "noun.motive", "noun.object",
  "noun.other", "noun.person", "noun.phenomenon", "noun.plant",
  "noun.possession", "noun.process", "noun.quantity", "noun.relation",
  "noun.shape", "noun.state", "noun.substance", "noun.time", "verb.body",
  "verb.change", "verb.cognition", "verb.communication", "verb.competition",
  "verb.consumption", "verb.contact", "verb.creation", "verb.emotion",
  "verb.motion", "verb.perception", "verb.possession", "verb.social",
  "verb.stative", "verb.weather"
}

Table of entity type names.
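
As an illustration of how a name table like this can back a mapping from class names to class IDs, here is a small sketch. It assumes that a class ID is simply the position of the name in the table, which this page does not confirm; buildClassIndex and classIdOf are hypothetical helpers, not part of Tanl, and Text::WordIndex may work differently.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Builds a name -> index map from a table of class names, so that
// names[i] maps to i. Assumption: IDs are table positions.
inline std::unordered_map<std::string, int>
buildClassIndex(char const* const names[], std::size_t n)
{
    std::unordered_map<std::string, int> index;
    for (std::size_t i = 0; i < n; ++i)
        index.emplace(names[i], static_cast<int>(i));
    return index;
}

// Returns the ID for `name`, or -1 if the name is not in the index.
inline int classIdOf(const std::unordered_map<std::string, int>& index,
                     const std::string& name)
{
    auto it = index.find(name);
    return it == index.end() ? -1 : it->second;
}
```

With an index built from the first entries of the table, classIdOf(index, "adj.all") yields 0 under this assumption, and an unlisted name yields -1.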

Tanl::Text::NormWordSet Tanl::SST::Resources::FWL

FWL (Frequent Word List): words that occur in more than 5 documents.

const int Tanl::SST::Resources::nClasses = NUM_CLASSES [static]

Number of NE classes.

The documentation for this struct was generated from the following files:
Copyright © 2005-2011 G. Attardi. Generated on 4 Mar 2011 by doxygen 1.6.1.