Tanl::NER::Resources Struct Reference


Public Member Functions
	Resources (char const *POStag, char const *NEtag)
	Resources (std::string &locale, char const *POStag, char const *NEtag)
	Resources (std::string &resourceDir, std::string &locale, char const *POStag, char const *NEtag)
size_t	typesCount ()
char const *	typeName (EntityType et)
void	load (std::string &resourceDir)
	Load all resources from the given directory.
template<class WordSet >
void	load (std::vector< WordSet > &sets, char const *file)
	Load a group of WordSets from a file.
template<class WordSet >
void	load (vector< WordSet > &sets, char const *file)
template<class WordSet >
void	load (WordSet *sets, char const *file)
Public Attributes
Text::WordIndex	classId
	Maps class names to class IDs.
char const *	language
char const *	POStag
char const *	NEtag
TagSet	prevTokenType
TagSet	nextTokenType
std::vector < Tanl::Text::NormWordSet >	dict
Tanl::Text::NormWordSet	moneyDict
Tanl::Text::NormWordSet	namesDict
Tanl::Text::NormWordSet	timeDict
Tanl::Text::NormWordSet	prodDict
Tanl::Text::NormWordSet	FWL
	FWL (Frequent Word List): words that occur in more than 5 documents.
std::vector < Tanl::Text::NormWordSet >	designators
	CPW (Common Preceding Words): 20 words that most often precede names of a certain class.
std::vector < Tanl::Text::NormWordSet >	preBigrams
	CPB (Common Preceding Bigrams): bigrams that often precede names of a certain class.
std::vector < Tanl::Text::NormWordSet >	prefixes
	PRE (Prefix for Class): common 3-letter prefix for each class.
std::vector< Tanl::Text::Suffixes >	suffixes
	SUF (Suffix for Class): common 3-letter suffix for each class.
std::vector < Tanl::Text::NormWordSet >	firstWords
	EFW (Entity First Words): list of words starting an entity.
std::vector < Tanl::Text::NormWordSet >	lastWords
	ELW (Entity Last Words): list of words terminating an entity.
std::vector < Tanl::Text::NormWordSet >	lowerInterm
	NAW (Name After Words): list of words after an entity.
Static Public Attributes
static IXE::conf_set< std::string >	entityTypes
	The entity type names.

Member Function Documentation

template<class WordSet >
void Tanl::NER::Resources::load ( std::vector< WordSet > & sets, char const * file ) [inline]

Load a group of WordSets from a file.

The file contains one word per line, in the format "class word", where class is an entity type such as LOC, MISC, ORG, or PER.
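
As a rough illustration of the file format described above, the following self-contained sketch parses "class word" lines into one word set per class. It does not use the Tanl types: std::set<std::string> stands in for WordSet, and the helper loadWordSets and the classIds map are hypothetical names introduced here, not part of the library. How the real load() handles malformed lines or unknown classes is not documented; skipping them is an assumption.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a WordSet; the real Tanl::Text::NormWordSet
// may normalize words before storing them.
using SimpleWordSet = std::set<std::string>;

// Reads "class word" lines from `in` and adds each word to the set of its class.
// `classIds` maps a class name (e.g. "LOC") to an index into `sets`.
// Lines with an unknown class or a missing word are skipped (an assumption).
// Returns the number of distinct words stored.
std::size_t loadWordSets(std::istream& in,
                         const std::map<std::string, std::size_t>& classIds,
                         std::vector<SimpleWordSet>& sets)
{
    std::size_t stored = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string cls, word;
        if (!(fields >> cls >> word))
            continue;                        // blank or malformed line
        auto it = classIds.find(cls);
        if (it == classIds.end() || it->second >= sets.size())
            continue;                        // unknown class
        if (sets[it->second].insert(word).second)
            ++stored;                        // count only newly inserted words
    }
    return stored;
}
```

A caller would typically open the file with std::ifstream and pass it as the stream, with sets sized to the number of entity types.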

void Tanl::NER::Resources::load ( std::string & resourceDir )

Load all resources from the given directory.

Referenced by Tanl::NER::NER::NER().

Member Data Documentation

Text::WordIndex Tanl::NER::Resources::classId

Maps class names to class IDs.

IXE::conf_set< std::string > Tanl::NER::Resources::entityTypes [static]

The entity type names.

Referenced by Tanl::NER::NER::tag().

Tanl::Text::NormWordSet Tanl::NER::Resources::FWL

FWL (Frequent Word List): words that occur in more than 5 documents.
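
The definition above amounts to a document-frequency threshold. The sketch below shows one way such a list could be built from tokenized documents; it is not how Tanl produces its FWL resource, and the function name frequentWords and its parameters are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Returns the words that occur in more than `minDocs` distinct documents.
// Each document is a list of tokens; repeated occurrences of a word
// within one document count only once.
std::set<std::string> frequentWords(
    const std::vector<std::vector<std::string>>& docs,
    std::size_t minDocs = 5)
{
    std::map<std::string, std::size_t> docFreq;
    for (const auto& doc : docs) {
        // Deduplicate tokens so each document contributes at most 1 per word.
        std::set<std::string> seen(doc.begin(), doc.end());
        for (const auto& word : seen)
            ++docFreq[word];
    }
    std::set<std::string> result;
    for (const auto& entry : docFreq)
        if (entry.second > minDocs)          // strictly more than minDocs
            result.insert(entry.first);
    return result;
}
```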

std::vector<Tanl::Text::NormWordSet> Tanl::NER::Resources::lowerInterm

NAW (Name After Words): list of words after an entity.

E.g.: center, museum, square, street.

LIW (Lowercase Intermediate Words): list of lowercase words appearing within a sequence, e.g.:
	PER: "van der", "de", "of"
	ORG: al, in, zonder, vor, for

The documentation for this struct was generated from the following files:

tag/NER/NerFeatureExtractor.h
tag/NER/NerFeatureExtractor.cpp
tag/SST/SstFeatureExtractor.cpp
