Tanl::SST::Resources Struct Reference


Public Member Functions
	Resources (std::string &locale)
	Resources (std::string &resourceDir, std::string &locale)
char const *	typeName (EntityType et)
void	load (std::string &resourceDir)
	Load all resources from the given directory.
template<class WordSet >
void	load (WordSet *sets, char const *file)
	Load a group of WordSets from a file.
Public Attributes
Text::WordIndex	classId
	Maps class names to class IDs.
char const *	language
TagSet	prevTokenType
TagSet	nextTokenType
Tanl::Text::NormWordSet	FWL
	FWL (Frequent Word List): words that occur in more than 5 documents.
Tanl::Text::NormWordSet	designators [NUM_CLASSES]
	CPW (Common Preceding Words): 20 words that most often precede names of a certain class.
Tanl::Text::NormWordSet	preBigrams [NUM_CLASSES]
	CPB (Common Preceding Bigrams): bigrams that often precede names of a certain class.
Tanl::Text::Suffixes	suffixes [NUM_CLASSES]
	SUF (Suffix for Class): common 3-letter suffix for each class.
Tanl::Text::NormWordSet	lastWords [NUM_CLASSES]
	NLW (Name Last Words): list of words terminating an entity.
Tanl::Text::NormWordSet	lowerInterm [NUM_CLASSES]
	LIW (Lowercase Intermediate Words): lowercase words that may appear inside an entity name, e.g. PER: "van der", "de", "of"; ORG: "al", "in", "zonder", "vor", "for".
Static Public Attributes
static char const *	classNames []
	Table of entity type names.
static const int	nClasses = NUM_CLASSES
	Number of NE classes.

Member Function Documentation

template<class WordSet >
void Tanl::SST::Resources::load ( WordSet * sets, char const * file ) [inline]

Load a group of WordSets from a file.

The file contains one word per line, in the format "class word", where class is the name of an entity tag such as LOC, MISC, ORG, or PER.
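
To illustrate the file format, the following sketch reads "class word" lines from a stream and inserts each word into the set for its class. It is not the Tanl implementation. The names loadWordSets, classIndex, kClassNames and kNumClasses are hypothetical, the four-entry class table is taken from the example tags above, and the sketch reads from a std::istream rather than a file name. Any container with insert(std::string), such as std::set<std::string>, can stand in for Tanl::Text::NormWordSet.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical class table for this example only; the real table is
// Resources::classNames.
static char const* const kClassNames[] = { "LOC", "MISC", "ORG", "PER" };
static const int kNumClasses = 4;

// Return the index of a class name in kClassNames, or -1 if it is unknown.
inline int classIndex(std::string const& name)
{
  for (int i = 0; i < kNumClasses; ++i)
    if (name == kClassNames[i])
      return i;
  return -1;
}

// Read "class word" lines from `in` and insert each word into sets[class].
// `sets` must point to an array of at least kNumClasses WordSets.
// Returns the number of lines that were accepted.
template <class WordSet>
int loadWordSets(WordSet* sets, std::istream& in)
{
  assert(sets != nullptr);
  int loaded = 0;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string cls, word;
    if (!(fields >> cls >> word))
      continue;                 // skip blank or malformed lines
    int id = classIndex(cls);
    if (id < 0)
      continue;                 // skip lines with an unknown class name
    sets[id].insert(word);
    ++loaded;
  }
  return loaded;
}
```

Called with an array of kNumClasses std::set<std::string> objects and a stream over a resource file, this fills one set per class. Skipping malformed lines silently is a choice made for the sketch; how the real loader handles them is not documented here.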

void Tanl::SST::Resources::load ( std::string & resourceDir )

Load all resources from the given directory.

Referenced by Tanl::SST::SST::SST().

Member Data Documentation

Text::WordIndex Tanl::SST::Resources::classId

Maps class names to class IDs.
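
As a sketch of what such a mapping could look like, the code below builds a name-to-ID index from a table of class names. It assumes that the ID of a class is its position in the table, which this page does not confirm. std::unordered_map stands in for Text::WordIndex, whose interface is not documented here, and buildClassIndex and lookupClass are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Build a name -> ID map from a table of class names. The ID of a name is
// assumed to be its position in the table.
inline std::unordered_map<std::string, int>
buildClassIndex(char const* const* names, std::size_t count)
{
  assert(names != nullptr || count == 0);
  std::unordered_map<std::string, int> index;
  for (std::size_t i = 0; i < count; ++i)
    index.emplace(names[i], static_cast<int>(i));
  return index;
}

// Look up a class ID; returns -1 for names that are not in the table.
inline int lookupClass(std::unordered_map<std::string, int> const& index,
                       std::string const& name)
{
  auto it = index.find(name);
  return it == index.end() ? -1 : it->second;
}
```

With a table such as classNames, lookupClass(index, "adj.all") would return 0 under the position-as-ID assumption.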

char const * Tanl::SST::Resources::classNames [static]

Initial value:

 {
  "adj.all", "adj.pert", "adj.ppl", "adv.all", "noun.Tops", "noun.act",
  "noun.animal", "noun.artifact", "noun.attribute", "noun.body",
  "noun.cognition", "noun.communication", "noun.event", "noun.feeling",
  "noun.food", "noun.group", "noun.location", "noun.motive", "noun.object",
  "noun.other", "noun.person", "noun.phenomenon", "noun.plant",
  "noun.possession", "noun.process", "noun.quantity", "noun.relation",
  "noun.shape", "noun.state", "noun.substance", "noun.time", "verb.body",
  "verb.change", "verb.cognition", "verb.communication", "verb.competition",
  "verb.consumption", "verb.contact", "verb.creation", "verb.emotion",
  "verb.motion", "verb.perception", "verb.possession", "verb.social",
  "verb.stative", "verb.weather"
}

Table of entity type names.

Tanl::Text::NormWordSet Tanl::SST::Resources::FWL

FWL (Frequent Word List): words that occur in more than 5 documents.
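
The "more than 5 documents" criterion is a document-frequency threshold. The sketch below shows one way such a list could be computed from tokenized documents. It is an illustration only: this page does not say how the FWL resource is actually produced, and frequentWords, Document and the minDocs parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> Document;   // one tokenized document

// Return the words that occur in more than minDocs documents.
// A word is counted once per document, however often it repeats there.
inline std::set<std::string>
frequentWords(std::vector<Document> const& docs, std::size_t minDocs)
{
  std::map<std::string, std::size_t> docFreq;
  for (Document const& doc : docs) {
    std::set<std::string> seen(doc.begin(), doc.end());  // unique words
    for (std::string const& word : seen)
      ++docFreq[word];
  }
  std::set<std::string> result;
  for (auto const& entry : docFreq) {
    assert(entry.second <= docs.size());  // cannot exceed the document count
    if (entry.second > minDocs)
      result.insert(entry.first);
  }
  return result;
}
```

Calling frequentWords(docs, 5) would select words matching the description above.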

const int Tanl::SST::Resources::nClasses = NUM_CLASSES [static]

Number of NE classes.

Referenced by Tanl::SST::SstFeatureExtractor::extract() and Tanl::SST::SstFeatureExtractor::reset().

The documentation for this struct was generated from the following files:

tag/SST/SstFeatureExtractor.h
tag/SST/SstFeatureExtractor.cpp
