Tanl Linguistic Pipeline |
A Maximum Entropy classifier.
#include <MaxEnt.h>
Public Member Functions | |
MaxEnt (int iterations, int cutoff) | |
MaxEnt (char const *file) | |
MaxEnt (std::istream &ifs) | |
void | estimate (Context &context, double prob[]) |
Evaluate a context and return an array of the likelihood of each outcome in that context. | |
void | estimate (Features &features, double prob[]) |
Evaluate a set of features and return an array of the likelihood of each outcome in that context. | |
ClassID | BestOutcome (double *ocs) const |
Return the ID of the outcome corresponding to the highest likelihood in ocs. | |
void | load (std::istream &is) |
Load the model from file. | |
void | save (char const *file) |
Save the model to file. | |
void | read (EventStream &eventStream) |
Reads events from eventStream into a linked list. | |
Protected Types | |
typedef unordered_map < std::pair< ClassID, std::vector< PID > >, int > | EventMap |
Protected Member Functions | |
void | readEvent (Event *ev) |
Consume a training event. | |
ClassID | estimate (const std::vector< PID > &predicates, double alpha[]) |
Estimates the conditional probabilities p(oid|cxt) for a given context. | |
void | loadZhang (std::istream &is) |
Load model saved in the format of Zhang maxent implementation. | |
Static Protected Member Functions | |
static int | buildIndex (std::list< Event * > &events, Text::WordIndex &predIndex, EventMap &eventMap, std::vector< char const * > &outcomeLabels, int evCutoff, bool verbose) |
Protected Attributes | |
FeatureMap | lambda |
the model parameters | |
PID | numPreds |
= predLabels.size() | |
int | numTokens |
# of unique events | |
EventMap | eventMap |
occurrences of unique events | |
unsigned | cutoff |
discard predicates below this frequency | |
int | iterations |
steps of the algorithm | |
size_t | correctionConstant |
double | correctionParam |
std::list< Event * > | events |
WordCounts | counter |
int | pID |
Friends | |
std::ostream & | operator<< (std::ostream &s, MaxEnt const &m) |
A Maximum Entropy classifier.
See Adwait Ratnaparkhi's tech report at the University of Pennsylvania, available at ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z.
ClassID Tanl::Classifier::MaxEnt::BestOutcome | ( | double * | ocs | ) | const |
Return the ID of the outcome corresponding to the highest likelihood in ocs.
ocs | A double[] as returned by the estimate() method. |
Reimplemented from Tanl::Classifier::Classifier.
Referenced by Tanl::SST::SSTPipe::Current(), Parser::MeParser::parse(), and Parser::MeParser::revise().
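The selection that BestOutcome() describes amounts to an argmax over the probability array. The following is a minimal sketch, not the library's code: `bestOutcome` and its explicit `numOutcomes` parameter are hypothetical (the real method takes only the array and presumably knows the number of outcomes from the model).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for BestOutcome(): returns the index of the largest
// entry in ocs. The explicit length parameter is an assumption made so the
// sketch is self-contained; ties resolve to the lowest index.
std::size_t bestOutcome(const double* ocs, std::size_t numOutcomes) {
    assert(ocs != nullptr && numOutcomes > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < numOutcomes; ++i) {
        if (ocs[i] > ocs[best]) best = i;
    }
    return best;
}
```

Called on an array such as `{0.1, 0.7, 0.2}` with `numOutcomes` set to 3, this returns index 1.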
ClassID Tanl::Classifier::MaxEnt::estimate | ( | const std::vector< PID > & | predicates, | |
double | alpha[] | |||
) | [protected] |
Estimates the conditional probabilities p(oid|cxt) for a given context.
predicates | from a context. | |
alpha | the estimated probabilities p(oid|cxt) for each outcome oid. |
References lambda.
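As an illustration of how conditional probabilities p(oid|cxt) can be derived from model parameters, the sketch below sums the weights of the active predicates per outcome, exponentiates, and normalizes. It is an assumption-laden sketch: `Weights`, `estimateSketch`, and the dense table layout are hypothetical, and the actual implementation may additionally apply the correction constant and correction parameter used by GIS.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter table: lambda[pid][oid] is the weight of predicate
// pid for outcome oid. The real FeatureMap layout is not shown in the source.
using Weights = std::vector<std::vector<double>>;

// Fills alpha with p(oid|cxt) and returns the index of the most likely outcome.
std::size_t estimateSketch(const Weights& lambda,
                           const std::vector<std::size_t>& predicates,
                           std::vector<double>& alpha) {
    assert(!alpha.empty());
    const std::size_t numOutcomes = alpha.size();

    // Sum the weights of the active predicates for each outcome.
    for (std::size_t o = 0; o < numOutcomes; ++o) alpha[o] = 0.0;
    for (std::size_t pid : predicates) {
        assert(pid < lambda.size() && lambda[pid].size() == numOutcomes);
        for (std::size_t o = 0; o < numOutcomes; ++o) alpha[o] += lambda[pid][o];
    }

    // Exponentiate, shifted by the maximum score for numerical stability.
    double maxScore = alpha[0];
    for (double s : alpha) {
        if (s > maxScore) maxScore = s;
    }
    double z = 0.0;
    for (double& s : alpha) {
        s = std::exp(s - maxScore);
        z += s;
    }

    // Normalize so the entries sum to 1, tracking the best outcome.
    std::size_t best = 0;
    for (std::size_t o = 0; o < numOutcomes; ++o) {
        alpha[o] /= z;
        if (alpha[o] > alpha[best]) best = o;
    }
    return best;
}
```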
void Tanl::Classifier::MaxEnt::estimate | ( | Features & | features, | |
double | prob[] | |||
) | [inline] |
Evaluate a set of features and return an array of the likelihood of each outcome in that context.
features | The set of features which have been observed at the present decision point. |
References estimate().
void Tanl::Classifier::MaxEnt::estimate | ( | Context & | context, | |
double | prob[] | |||
) | [inline] |
Evaluate a context and return an array of the likelihood of each outcome in that context.
context | Represents the set of features which have been observed at the present decision point. |
Referenced by Tanl::SST::SSTPipe::Current(), estimate(), Parser::MeParser::parse(), Parser::MeParser::revise(), and Tanl::NER::NER::tag().
void Tanl::Classifier::MaxEnt::load | ( | std::istream & | is | ) |
Load the model from file.
The loader also recognizes text model files produced by the Zhang Maxent Toolkit (http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html).
Reimplemented from Tanl::Classifier::Classifier.
void Tanl::Classifier::MaxEnt::loadZhang | ( | std::istream & | is | ) | [protected] |
Load model saved in the format of Zhang maxent implementation.
void Tanl::Classifier::MaxEnt::read | ( | EventStream & | eventStream | ) |
Reads events from eventStream into a linked list.
Consider all events = (cID, [pred1, ..., predk]). Assign a pID to each predicate that occurs > cutoff times. The predicates associated with each event are counted, and any that occurs at least cutoff times is added to predLabels. Classes that are the outcome of a retained event are added to outcomeLabels.
References readEvent(), and Tanl::Classifier::Classifier::verbose.
Referenced by Parser::MeParser::train(), Tanl::Classifier::LBFGS::train(), and Tanl::Classifier::GIS::train().
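The counting and indexing step described for read() can be sketched as two passes over the events: one to count predicate occurrences, one to assign IDs to predicates that pass the cutoff. This is only a sketch under stated assumptions: `EventSketch` and `indexPredicates` are hypothetical names, and the threshold is taken as "at least cutoff" (the description also mentions "> cutoff", so the exact comparison used by the library is not certain).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type: (outcome label, predicate names).
using EventSketch = std::pair<std::string, std::vector<std::string>>;

// Returns a map from each retained predicate name to its assigned ID.
// IDs are handed out in order of first appearance in the event list.
std::map<std::string, std::size_t>
indexPredicates(const std::vector<EventSketch>& events, unsigned cutoff) {
    // First pass: count how often each predicate occurs.
    std::map<std::string, unsigned> counts;
    for (const auto& ev : events) {
        for (const auto& pred : ev.second) ++counts[pred];
    }

    // Second pass: assign an ID to each predicate meeting the cutoff
    // (assumed here to mean count >= cutoff).
    std::map<std::string, std::size_t> predIndex;
    for (const auto& ev : events) {
        for (const auto& pred : ev.second) {
            if (counts[pred] >= cutoff && predIndex.count(pred) == 0) {
                const std::size_t id = predIndex.size();
                predIndex[pred] = id;
            }
        }
    }
    return predIndex;
}
```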
void Tanl::Classifier::MaxEnt::save | ( | char const * | file | ) |
Save the model to file.
Format for the GIS maxent (.mem) files.
This format can be memory mapped.
1. GIS (model type identifier)
2. the correction constant (int)
3. the correction parameter (double)
4. # of outcomes (int)
5. # of predicates (int)
6. parameters
Example of 5. and 6.:
  7 (# preds)
  Sunny (first pred. name)
  Happy
  Dry
  Humid
  Sad
  Cloudy
  Rainy
  3 (# groups)
  1 1 (group 1: 1 predicate, 1 outcome)
  0 (outcome 0)
  2.4005893 (param[0, 0])
  5 2 (group 2: 5 predicates, 2 outcomes)
  0 (outcome 0)
  2.1392054 (param[1, 0])
  -0.3270814 (param[2, 0])
  0.2927261 (param[3, 0])
  -1.9319866 (param[4, 0])
  1.1981091 (param[5, 0])
  1 (outcome 1)
  -4.7484765 (param[1, 1])
  0.3342510 (param[2, 1])
  -0.3882752 (param[3, 1])
  2.0065205 (param[4, 1])
  -2.1725304 (param[5, 1])
  1 1 (group 3: 1 predicate, 1 outcome)
  1 (outcome 1)
  3.5907883 (param[6, 1])
Reimplemented from Tanl::Classifier::Classifier.
Reimplemented in Tanl::Classifier::LBFGS.