Tanl Linguistic Pipeline

Tanl::Classifier::MaxEnt Class Reference
[Classifier]

A Maximum Entropy classifier.

#include <MaxEnt.h>

Inheritance diagram for Tanl::Classifier::MaxEnt:
Inherits Tanl::Classifier::Classifier. Inherited by Tanl::Classifier::GIS and Tanl::Classifier::LBFGS.

Public Member Functions

 MaxEnt (int iterations, int cutoff)
 MaxEnt (char const *file)
 MaxEnt (std::istream &ifs)
void estimate (Context &context, double prob[])
 Evaluate a context and return an array of the likelihood of each outcome in that context.
void estimate (Features &features, double prob[])
 Evaluate a set of features and return an array of the likelihood of each outcome in that context.
ClassID BestOutcome (double *ocs) const
 Return the ID of the outcome corresponding to the highest likelihood in ocs.
void load (std::istream &is)
 Load the model from file.
void save (char const *file)
 Save the model to file.
void read (EventStream &eventStream)
 Reads events from eventStream into a linked list.

Protected Types

typedef unordered_map< std::pair< ClassID, std::vector< PID > >, int > EventMap

Protected Member Functions

void readEvent (Event *ev)
 Consume a training event.
ClassID estimate (const std::vector< PID > &predicates, double alpha[])
 Estimates the conditional probabilities p(oid|cxt) for a given context.
void loadZhang (std::istream &is)
 Load a model saved in the format of Zhang's maxent implementation.

Static Protected Member Functions

static int buildIndex (std::list< Event * > &events, Text::WordIndex &predIndex, EventMap &eventMap, std::vector< char const * > &outcomeLabels, int evCutoff, bool verbose)

Protected Attributes

FeatureMap lambda
 the model parameters
PID numPreds
 = predLabels.size()
int numTokens
 # of unique events
EventMap eventMap
 occurrences of unique events
unsigned cutoff
 discard predicates below this frequency
int iterations
 steps of the algorithm
size_t correctionConstant
double correctionParam
std::list< Event * > events
WordCounts counter
int pID

Friends

std::ostream & operator<< (std::ostream &s, MaxEnt const &m)

Detailed Description

A Maximum Entropy classifier.

See Adwait Ratnaparkhi's tech report at the University of Pennsylvania, available at ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z.


Member Function Documentation

ClassID Tanl::Classifier::MaxEnt::BestOutcome ( double *  ocs  )  const

Return the ID of the outcome corresponding to the highest likelihood in ocs.

Parameters:
ocs A double[] as returned by the estimate() method.
Returns:
The classID of the most likely outcome.

Reimplemented from Tanl::Classifier::Classifier.

Referenced by Tanl::SST::SSTPipe::Current(), Parser::MeParser::parse(), and Parser::MeParser::revise().
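As an illustration of what selecting the outcome with the highest likelihood amounts to, here is a minimal sketch of an argmax over a probability array. It is not Tanl's implementation: the helper name argmaxOutcome and the explicit numOutcomes parameter are assumptions made for this example (the documented method takes only the array).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Tanl: returns the index of the largest
// entry in probs. numOutcomes is an assumed parameter for this sketch.
std::size_t argmaxOutcome(const double* probs, std::size_t numOutcomes)
{
    assert(probs != nullptr && numOutcomes > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < numOutcomes; ++i) {
        if (probs[i] > probs[best])
            best = i;  // strict '>' keeps the first of any tied maxima
    }
    return best;
}
```

With probabilities {0.1, 0.7, 0.2} the sketch selects index 1; how the real method breaks ties is not stated in this reference.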

ClassID Tanl::Classifier::MaxEnt::estimate ( const std::vector< PID > &  predicates,
double  alpha[] 
) [protected]

Estimates the conditional probabilities p(oid|cxt) for a given context.

Parameters:
predicates The predicates from a context.
alpha The estimated probabilities p(oid|cxt) for each outcome oid.

References lambda.
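For readers unfamiliar with the model, the following sketch shows the textbook maximum entropy computation of p(oid|cxt): sum the parameters of the active predicates per outcome, exponentiate, and normalize. It is a sketch under stated assumptions, not the Tanl code: the dense Weights table, the function name conditionalProbs, and the omission of any GIS correction term are all choices made for this example, since the layout of FeatureMap is not shown in this reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense table: weights[p][o] is the parameter for predicate p
// and outcome o. This layout is an assumption, not Tanl's FeatureMap.
using Weights = std::vector<std::vector<double>>;

std::vector<double> conditionalProbs(const Weights& weights,
                                     const std::vector<std::size_t>& predicates,
                                     std::size_t numOutcomes)
{
    assert(numOutcomes > 0);
    // Sum the parameters of every active predicate, per outcome.
    std::vector<double> score(numOutcomes, 0.0);
    for (std::size_t p : predicates) {
        assert(p < weights.size() && weights[p].size() == numOutcomes);
        for (std::size_t o = 0; o < numOutcomes; ++o)
            score[o] += weights[p][o];
    }
    // Subtract the maximum before exponentiating for numerical stability.
    const double maxScore = *std::max_element(score.begin(), score.end());
    double z = 0.0;
    for (double& s : score) {
        s = std::exp(s - maxScore);
        z += s;
    }
    // Normalize so the probabilities sum to 1.
    for (double& s : score)
        s /= z;
    return score;
}
```

With no active predicates every outcome gets the same probability, which is the uniform distribution a maximum entropy model falls back to without evidence.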

void Tanl::Classifier::MaxEnt::estimate ( Features features,
double  prob[] 
) [inline]

Evaluate a set of features and return an array of the likelihood of each outcome in that context.

Parameters:
features The set of features which have been observed at the present decision point.
Returns:
The normalized probabilities for the outcomes given the context. The indexes of the double[] are the outcome ids, and the actual string representation of the outcomes can be obtained from the method getOutcome(int i).

References estimate().

void Tanl::Classifier::MaxEnt::estimate ( Context context,
double  prob[] 
) [inline]

Evaluate a context and return an array of the likelihood of each outcome in that context.

Parameters:
context Represents the set of features which have been observed at the present decision point.
Returns:
The normalized probabilities for the outcomes given the context. The indexes of the double[] are the outcome ids, and the actual string representation of the outcomes can be obtained from the method getOutcome(int i).

Referenced by Tanl::SST::SSTPipe::Current(), estimate(), Parser::MeParser::parse(), Parser::MeParser::revise(), and Tanl::NER::NER::tag().

void Tanl::Classifier::MaxEnt::load ( std::istream &  is  ) 

Load the model from file.

The loader also recognizes text model files produced by the Zhang Maxent Toolkit (http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html).

Reimplemented from Tanl::Classifier::Classifier.

void Tanl::Classifier::MaxEnt::loadZhang ( std::istream &  is  )  [protected]

Load a model saved in the format of Zhang's maxent implementation.

void Tanl::Classifier::MaxEnt::read ( EventStream eventStream  ) 

Reads events from eventStream into a linked list.

Consider all events = (cID, [pred1, ..., predk]). The predicates associated with each event are counted, and each predicate that occurs at least cutoff times is assigned a pID and added to predLabels. Classes that are the outcome of a retained event are added to outcomeLabels.

References readEvent(), and Tanl::Classifier::Classifier::verbose.

Referenced by Parser::MeParser::train(), Tanl::Classifier::LBFGS::train(), and Tanl::Classifier::GIS::train().
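The counting and cutoff step described above can be sketched as follows. This is an illustration only: ToyEvent and indexPredicates are hypothetical names, std::map stands in for whatever index Tanl uses, and the sketch follows the "at least cutoff times" reading of the description.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type: an outcome label plus the predicates seen with it.
using ToyEvent = std::pair<std::string, std::vector<std::string>>;

// Assign consecutive IDs to predicates that occur at least `cutoff` times.
// IDs follow alphabetical order here because std::map iterates sorted keys.
std::map<std::string, std::size_t>
indexPredicates(const std::vector<ToyEvent>& events, std::size_t cutoff)
{
    // First pass: count every predicate occurrence across all events.
    std::map<std::string, std::size_t> counts;
    for (const ToyEvent& ev : events)
        for (const std::string& pred : ev.second)
            ++counts[pred];

    // Second pass: keep only predicates that reach the cutoff.
    std::map<std::string, std::size_t> ids;
    for (const auto& entry : counts) {
        if (entry.second >= cutoff) {
            const std::size_t next = ids.size();
            ids[entry.first] = next;
        }
    }
    assert(ids.size() <= counts.size());
    return ids;
}
```

Two passes are used so that a predicate's total count is known before any ID is handed out; a single pass would need to renumber IDs when rare predicates are dropped.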

void Tanl::Classifier::MaxEnt::save ( char const *  file  ) 

Save the model to file.

Format for the GIS maxent (.mem) files.

This format can be memory mapped.

1. GIS (model type identifier)

2. the correction constant (int)

3. the correction parameter (double)

4. # of outcomes (int)

  • list of outcome names (string)

5. # of predicates (int)

  • list of predicate names (string)

6. parameters

  • # of groups (i.e. predicates with the same set of outcomes)
  • The following is repeated for each group:
    a. group size (gs), # of group outcomes
    • The following is repeated for each outcome:
      1. outcome (i)
      2. param[n + j, i], for 0 <= j < gs, where n is the index of the first predicate in the group

Example of 5. and 6.:

  7            (# preds)
  Sunny        (first pred. name)
  Happy
  Dry
  Humid
  Sad
  Cloudy
  Rainy
  3            (# groups)
  1 1          (group 1: 1 predicate, 1 outcome)
  0            (outcome 0)
  2.4005893    (param[0, 0])
  5 2          (group 2: 5 predicates, 2 outcomes)
  0            (outcome 0)
  2.1392054    (param[1, 0])
  -0.3270814   (param[2, 0])
  0.2927261    (param[3, 0])
  -1.9319866   (param[4, 0])
  1.1981091    (param[5, 0])
  1            (outcome 1)
  -4.7484765   (param[1, 1])
  0.3342510    (param[2, 1])
  -0.3882752   (param[3, 1])
  2.0065205    (param[4, 1])
  -2.1725304   (param[5, 1])
  1 1          (group 3: 1 predicate, 1 outcome)
  1            (outcome 1)
  3.5907883    (param[6, 1])
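To make the group layout of section 6 concrete, the sketch below reads the parameter groups into a (predicate index, outcome) table. It assumes the values arrive as whitespace-separated text without the parenthesized annotations, which is an assumption made purely for illustration; the actual byte-level encoding of .mem files is not specified in this reference. The names ParamTable and readGroups are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <utility>

// (predicate index, outcome id) -> parameter value.
using ParamTable = std::map<std::pair<std::size_t, int>, double>;

// Hypothetical reader for section 6, assuming whitespace-separated text.
ParamTable readGroups(std::istream& in)
{
    ParamTable params;
    std::size_t numGroups = 0;
    in >> numGroups;
    std::size_t n = 0;  // index of the first predicate in the current group
    for (std::size_t g = 0; g < numGroups; ++g) {
        std::size_t gs = 0, numOutcomes = 0;
        in >> gs >> numOutcomes;
        for (std::size_t k = 0; k < numOutcomes; ++k) {
            int outcome = 0;
            in >> outcome;
            // One parameter per predicate in the group, for this outcome.
            for (std::size_t j = 0; j < gs; ++j) {
                double value = 0.0;
                in >> value;
                params[std::make_pair(n + j, outcome)] = value;
            }
        }
        n += gs;  // the next group starts after this group's predicates
    }
    assert(!in.fail());
    return params;
}
```

Fed the numbers of the example above, starting at the group count 3, the sketch yields 12 entries, one per param[...] annotation in the example.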

Reimplemented from Tanl::Classifier::Classifier.

Reimplemented in Tanl::Classifier::LBFGS.

References lambda, and numPreds.


Copyright © 2005-2011 G. Attardi. Generated on 4 Mar 2011 by doxygen 1.6.1.