Tanl Linguistic Pipeline

Tanl::SST::SstFeatureExtractor Class Reference

Extract features for SST.

#include <SstFeatureExtractor.h>

Inheritance diagram for Tanl::SST::SstFeatureExtractor:
Tanl::Classifier::FeatureExtractor< Classifier::Features, const int >


Public Member Functions

 SstFeatureExtractor (Resources &resources)
void analyze (Sentence *sent, int zone)
 Set the sentence from which to extract features.
void extract (Classifier::Features &feats, const int &pos)
 Extract features at position pos in sentence and put them into feats.
void reset ()
 Reset to initial state.
void classified (int position, char const *className)
 Record that a token at given position has been classified in the given class.

Protected Attributes

Resources & resources
bool insideQuotes
TokenCategorizer tokenCategorizer
 extracts token type
std::vector< EntityType > tokenTypes
 types of sentence tokens
Sentence * sentence
 sentence being analyzed
unordered_map< string, bool > capitalized
 Words that appeared previously as capitalized.
Tanl::Text::NormWordSet prevClass [NUM_CLASSES]
 List of words previously designated as given class.
Tanl::Text::NormWordSet otherLast [NUM_CLASSES]
 Other word in a capitalized sequence is in Last Words.
Tanl::Text::NormWordSet acronyms
 List of previously found acronyms.

Detailed Description

Extract features for SST.
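
The member functions listed above suggest a call sequence of analyze(), then extract() for each position, then classified() once a label is known, and reset() between documents. The following is only a sketch of that sequence: MockExtractor, Token, Sentence, Features and featuresFor are hypothetical stand-ins and not the Tanl types, the emitted feature strings are invented for illustration, and the real constructor takes a Resources object whose interface is not documented on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: none of these are the real Tanl types.
namespace sketch {

struct Token { std::string form; };
typedef std::vector<Token> Sentence;
typedef std::vector<std::string> Features;

// Mirrors only the documented call sequence of SstFeatureExtractor.
class MockExtractor {
public:
  MockExtractor() : sentence(nullptr) {}

  // Set the sentence from which to extract features (zone is ignored here).
  void analyze(Sentence* sent, int /*zone*/) { sentence = sent; }

  // Put the features for position pos into feats.
  void extract(Features& feats, const int& pos) {
    assert(sentence != nullptr);
    assert(pos >= 0 && pos < static_cast<int>(sentence->size()));
    feats.clear();
    feats.push_back("W=" + (*sentence)[pos].form);
    // Illustrative history feature, fed by classified().
    if (!prevLabel.empty()) feats.push_back("PREV=" + prevLabel);
  }

  // Record the class assigned to the token at the given position.
  void classified(int /*position*/, char const* className) {
    prevLabel = className;
  }

  // Forget everything learned so far.
  void reset() { sentence = nullptr; prevLabel.clear(); }

private:
  Sentence* sentence;     // sentence being analyzed
  std::string prevLabel;  // last label reported through classified()
};

// Drive the extractor over one sentence, labelling every token "O".
std::vector<Features> featuresFor(MockExtractor& fe, Sentence& sent) {
  std::vector<Features> all;
  fe.analyze(&sent, 0);
  for (int pos = 0; pos < static_cast<int>(sent.size()); ++pos) {
    Features feats;
    fe.extract(feats, pos);
    all.push_back(feats);
    fe.classified(pos, "O");  // placeholder label; a classifier would supply it
  }
  return all;
}

}  // namespace sketch
```

In this sketch the label passed to classified() influences the features of the next position, which is one plausible reason for the feedback call; the actual use made of it by the library is not stated here.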


Member Function Documentation

void Tanl::SST::SstFeatureExtractor::analyze ( Sentence * sent,
int  zone
)

Set the sentence from which to extract features.

References Tanl::SST::TokenCategorizer::analyze(), Tanl::Token::form, sentence, tokenCategorizer, and tokenTypes.

Referenced by Tanl::SST::SstEventStream::analyze().

void Tanl::SST::SstFeatureExtractor::extract ( Classifier::Features & feats,
const int &  pos
) [virtual]

Extract features at position pos in sentence and put them into feats.

Local features include:

  1. lexical word features
  2. features that depend on surrounding words
  3. features from dictionary lookup of word.
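
The three kinds of local features can be illustrated roughly as follows. This is a minimal sketch, not the library's implementation: the function localFeatures, the FeatureList type, the feature names (W=, CAP, W-1=, W+1=, IN_DICT) and the use of a std::set as dictionary are all assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> FeatureList;

// Hypothetical helper: builds one feature list for the word at position pos.
FeatureList localFeatures(const std::vector<std::string>& words, int pos,
                          const std::set<std::string>& dictionary) {
  assert(pos >= 0 && pos < static_cast<int>(words.size()));
  const std::string& w = words[pos];
  const int n = static_cast<int>(words.size());
  FeatureList feats;

  // 1. Lexical features of the word itself.
  feats.push_back("W=" + w);
  if (!w.empty() && std::isupper(static_cast<unsigned char>(w[0])))
    feats.push_back("CAP");

  // 2. Features that depend on surrounding words
  //    (sentence boundaries are marked with <s> and </s>).
  feats.push_back("W-1=" + (pos > 0 ? words[pos - 1] : std::string("<s>")));
  feats.push_back("W+1=" + (pos + 1 < n ? words[pos + 1] : std::string("</s>")));

  // 3. Feature from dictionary lookup of the word.
  if (dictionary.count(w) > 0) feats.push_back("IN_DICT");

  return feats;
}
```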

Implements Tanl::Classifier::FeatureExtractor< Classifier::Features, const int >.

References acronyms, Tanl::Token::attrIndex(), capitalized, Tanl::Text::NormWordSet::contains(), Tanl::Token::form, Tanl::Token::get(), Tanl::Text::NormWordSet::insert(), Tanl::SST::Resources::nClasses, otherLast, prevClass, Tanl::Text::RegExp::Pattern::replace(), sentence, and Tanl::Text::to_upper().

Referenced by Tanl::SST::SstEventStream::next().

void Tanl::SST::SstFeatureExtractor::reset (  )  [virtual]

Reset to initial state.

Useful when reading several documents with the same tagger instance.
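
Since reset() references acronyms, capitalized, otherLast and prevClass, it presumably clears the state accumulated across sentences of a document. The sketch below shows that idea under stated assumptions: DocumentState is a hypothetical struct, std::set stands in for Tanl::Text::NormWordSet, and the number of classes is passed in by the caller rather than read from Resources::nClasses.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::set<std::string> WordSet;  // stand-in for Tanl::Text::NormWordSet

// Hypothetical container for the per-document state named on this page.
struct DocumentState {
  explicit DocumentState(std::size_t nClasses)
      : prevClass(nClasses), otherLast(nClasses) {
    assert(nClasses > 0);
  }

  // Clear everything remembered from earlier sentences.
  void reset() {
    capitalized.clear();
    acronyms.clear();
    for (std::size_t c = 0; c < prevClass.size(); ++c) {
      prevClass[c].clear();
      otherLast[c].clear();
    }
  }

  std::unordered_map<std::string, bool> capitalized;  // words seen capitalized
  std::vector<WordSet> prevClass;  // words previously assigned to each class
  std::vector<WordSet> otherLast;  // one set per class, as in the real member
  WordSet acronyms;                // previously found acronyms
};
```

Calling reset() before each new document keeps words seen in one document from influencing the features extracted for the next.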

Reimplemented from Tanl::Classifier::FeatureExtractor< Classifier::Features, const int >.

References acronyms, capitalized, Tanl::SST::Resources::nClasses, otherLast, and prevClass.

Referenced by Tanl::SST::SstEventStream::reset().


Member Data Documentation

unordered_map< string, bool > Tanl::SST::SstFeatureExtractor::capitalized [protected]

Words that appeared previously as capitalized.

Referenced by extract(), and reset().


The documentation for this class was generated from the following files:
Copyright © 2005-2011 G. Attardi. Generated on 4 Mar 2011 by doxygen 1.6.1.