Tanl::Classifier::MaxEnt Class Reference
[Classifier]

A Maximum Entropy classifier.

#include <MaxEnt.h>


Public Member Functions
	MaxEnt (int iterations, int cutoff)
	MaxEnt (char const *file)
	MaxEnt (std::istream &ifs)
void	estimate (Context &context, double prob[])
	Evaluate a context and fill `prob` with the likelihood of each outcome in that context.
void	estimate (Features &features, double prob[])
	Evaluate a set of features and fill `prob` with the likelihood of each outcome given those features.
ClassID	BestOutcome (double *ocs) const
	Return the ID of the outcome with the highest likelihood in `ocs`.
void	load (std::istream &is)
	Load the model from the input stream `is`.
void	save (char const *file)
	Save the model to file.
void	read (EventStream &eventStream)
	Reads events from `eventStream` into a linked list.
Protected Types
typedef unordered_map < std::pair< ClassID, std::vector< PID > >, int >	EventMap
Protected Member Functions
void	readEvent (Event *ev)
	Consume a training event.
ClassID	estimate (const std::vector< PID > &predicates, double alpha[])
	Estimates the conditional probabilities p(oid|cxt) for a given context.
void	loadZhang (std::istream &is)
	Load a model saved in the format of Zhang's maxent implementation.
Static Protected Member Functions
static int	buildIndex (std::list< Event * > &events, Text::WordIndex &predIndex, EventMap &eventMap, std::vector< char const * > &outcomeLabels, int evCutoff, bool verbose)
Protected Attributes
FeatureMap	lambda
	the model parameters
PID	numPreds
	= predLabels.size()
int	numTokens
	# of unique events
EventMap	eventMap
	occurrences of unique events
unsigned	cutoff
	discard predicates below this frequency
int	iterations
	number of iterations of the training algorithm
size_t	correctionConstant
double	correctionParam
std::list< Event * >	events
WordCounts	counter
int	pID
Friends
std::ostream &	operator<< (std::ostream &s, MaxEnt const &m)
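The protected `EventMap` type counts occurrences of unique events, keyed by (outcome, predicate IDs). The C++ standard library provides no `std::hash` for `std::pair` or `std::vector`, so a hasher has to be supplied somewhere for such a key; how Tanl does this is not documented on this page. The following minimal sketch assumes `ClassID` and `PID` are plain `int`s and uses a hypothetical `EventHash` (built on the boost `hash_combine` mixing step) and a hypothetical `countEvent` helper. It is illustrative only, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumed stand-ins for the Tanl typedefs (hypothetical).
using ClassID = int;
using PID = int;
using EventKey = std::pair<ClassID, std::vector<PID>>;

// Hypothetical hasher: mixes the outcome and every predicate ID
// using the boost::hash_combine formula.
struct EventHash {
    std::size_t operator()(EventKey const& key) const {
        std::size_t h = std::hash<ClassID>()(key.first);
        for (PID p : key.second)
            h ^= std::hash<PID>()(p) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Equality comes from std::pair / std::vector operator==.
using EventMap = std::unordered_map<EventKey, int, EventHash>;

// Record one occurrence of the event (cid, preds) and return its count.
int countEvent(EventMap& eventMap, ClassID cid, std::vector<PID> preds) {
    assert(!preds.empty());  // sketch assumption: events carry predicates
    return ++eventMap[EventKey(cid, std::move(preds))];
}
```

With such a map, the number of unique events (what `numTokens` is documented to hold) would simply be `eventMap.size()`.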

Detailed Description

A Maximum Entropy classifier.

See Adwait Ratnaparkhi's tech report at the University of Pennsylvania, available at ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z.
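For reference, the conditional maximum entropy model (written p(oid|cxt) in the estimate() documentation) has the standard log-linear form below. The f_j are binary feature functions of a context c and an outcome o, the λ_j are their weights, and Z(c) normalizes over all outcomes. Reading `lambda` as a table of per-(predicate, outcome) weights λ_{p,o} is an assumption about its layout, not something this page states.

```latex
p(o \mid c) = \frac{1}{Z(c)} \exp\Big( \sum_{j} \lambda_j \, f_j(c, o) \Big),
\qquad
Z(c) = \sum_{o'} \exp\Big( \sum_{j} \lambda_j \, f_j(c, o') \Big)

% With binary features pairing a contextual predicate p with an outcome o,
% the exponent reduces to the weights of the predicates active in c:
p(o \mid c) \propto \exp\Big( \sum_{p \in c} \lambda_{p,o} \Big)
```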

Member Function Documentation

ClassID Tanl::Classifier::MaxEnt::BestOutcome ( double * ocs ) const

Return the ID of the outcome with the highest likelihood in `ocs`.

Parameters:

	ocs	A double[] as returned by the estimate() method.

Returns: The ClassID of the most likely outcome.

Reimplemented from Tanl::Classifier::Classifier.

Referenced by Tanl::SST::SSTPipe::Current(), Parser::MeParser::parse(), and Parser::MeParser::revise().
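BestOutcome() receives only a pointer, so the number of outcomes must come from the model itself (presumably its outcome count). The sketch below makes that count an explicit parameter. `bestOutcome` is a hypothetical free function, and breaking ties toward the lowest ID is a choice of the sketch, not documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

using ClassID = int;  // assumed stand-in for the Tanl typedef

// Hypothetical analogue of BestOutcome(): index of the largest entry in
// ocs[0 .. numOutcomes). std::max_element returns the first maximum,
// so ties resolve to the lowest outcome ID.
ClassID bestOutcome(double const* ocs, std::size_t numOutcomes) {
    assert(ocs != nullptr && numOutcomes > 0);
    return static_cast<ClassID>(std::max_element(ocs, ocs + numOutcomes) - ocs);
}
```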

ClassID Tanl::Classifier::MaxEnt::estimate ( const std::vector< PID > & predicates, double alpha[] ) [protected]

Estimates the conditional probabilities p(oid|cxt) for a given context.

Parameters:

	predicates	the predicate IDs (PIDs) observed in a context.
	alpha	output array that receives the estimated probabilities p(oid|cxt) for each outcome oid.

References lambda.
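The computation performed by the protected estimate() can be pictured as follows. This is a minimal sketch, not the Tanl implementation. It assumes a dense weight table `lambda[p][o]` (the real `FeatureMap` is probably sparse), takes the number of outcomes as an explicit parameter, and assumes that the returned `ClassID` is the most likely outcome, which this page does not state. `Weights` and `estimateSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using ClassID = int;  // assumed stand-ins for the Tanl typedefs
using PID = int;
// Assumed dense layout: lambda[p][o] is the weight of predicate p for outcome o.
using Weights = std::vector<std::vector<double>>;

// Fill alpha[0 .. numOutcomes) with p(o | predicates); return the argmax.
ClassID estimateSketch(Weights const& lambda, std::vector<PID> const& predicates,
                       double alpha[], std::size_t numOutcomes) {
    assert(numOutcomes > 0);
    // Unnormalized log-scores: sum of the weights of the active predicates.
    std::fill(alpha, alpha + numOutcomes, 0.0);
    for (PID p : predicates)
        for (std::size_t o = 0; o < numOutcomes; ++o)
            alpha[o] += lambda[p][o];
    // Subtract the maximum score before exponentiating to avoid overflow.
    double maxScore = *std::max_element(alpha, alpha + numOutcomes);
    double z = 0.0;
    for (std::size_t o = 0; o < numOutcomes; ++o) {
        alpha[o] = std::exp(alpha[o] - maxScore);
        z += alpha[o];
    }
    // Normalize and track the most likely outcome (lowest ID on ties).
    ClassID best = 0;
    for (std::size_t o = 0; o < numOutcomes; ++o) {
        alpha[o] /= z;
        if (alpha[o] > alpha[best]) best = static_cast<ClassID>(o);
    }
    return best;
}
```

Subtracting the largest score leaves the normalized probabilities unchanged while keeping `std::exp` in range when the summed weights are large.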

void Tanl::Classifier::MaxEnt::estimate ( Features & features, double prob[] ) [inline]

Evaluate a set of features and fill `prob` with the likelihood of each outcome given those features.

Parameters:

	features	The set of features which have been observed at the present decision point.
	prob	Output array that receives the normalized probabilities of the outcomes given the features. The indexes of the array are the outcome IDs; the string representation of an outcome can be obtained from the method getOutcome(int i).

References estimate().

void Tanl::Classifier::MaxEnt::estimate ( Context & context, double prob[] ) [inline]

Evaluate a context and fill `prob` with the likelihood of each outcome in that context.

Parameters:

	context	Represents the set of features which have been observed at the present decision point.
	prob	Output array that receives the normalized probabilities of the outcomes given the context. The indexes of the array are the outcome IDs; the string representation of an outcome can be obtained from the method getOutcome(int i).

Referenced by Tanl::SST::SSTPipe::Current(), estimate(), Parser::MeParser::parse(), Parser::MeParser::revise(), and Tanl::NER::NER::tag().

void Tanl::Classifier::MaxEnt::load ( std::istream & is )

Load the model from the input stream `is`.

The loader also recognizes text model files produced by Zhang's Maxent Toolkit (http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html).

Reimplemented from Tanl::Classifier::Classifier.

void Tanl::Classifier::MaxEnt::loadZhang ( std::istream & is ) [protected]

Load a model saved in the format of Zhang's maxent implementation.

void Tanl::Classifier::MaxEnt::read ( EventStream & eventStream )

Reads events from `eventStream` into a linked list.

Each event has the form (cID, [pred1, ..., predk]). The predicates associated with each event are counted, and each predicate that occurs at least `cutoff` times is assigned a pID and added to predLabels. Classes that are the outcome of a retained event are added to outcomeLabels.

References readEvent(), and Tanl::Classifier::Classifier::verbose.

Referenced by Parser::MeParser::train(), Tanl::Classifier::LBFGS::train(), and Tanl::Classifier::GIS::train().
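The counting and cutoff rule described for read() can be sketched as below. The names (`RawEvent`, `Index`, `buildIndexSketch`) are hypothetical. The sketch assumes that a "retained" event is one that keeps at least one predicate after the cutoff, and it assigns pIDs in alphabetical order (via std::map) rather than in whatever order the library uses.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event representation: (outcome label, predicate labels).
using RawEvent = std::pair<std::string, std::vector<std::string>>;

struct Index {
    std::map<std::string, int> predIndex;    // predicate label -> pID
    std::vector<std::string> predLabels;     // pID -> predicate label
    std::vector<std::string> outcomeLabels;  // outcomes of retained events
};

Index buildIndexSketch(std::vector<RawEvent> const& events, int cutoff) {
    assert(cutoff >= 0);
    // 1. Count how often each predicate occurs across all events.
    std::map<std::string, int> counts;
    for (auto const& ev : events)
        for (auto const& pred : ev.second)
            ++counts[pred];
    // 2. Assign a pID to every predicate seen at least `cutoff` times.
    Index idx;
    for (auto const& [pred, n] : counts)
        if (n >= cutoff) {
            idx.predIndex[pred] = static_cast<int>(idx.predLabels.size());
            idx.predLabels.push_back(pred);
        }
    // 3. Collect, once each, the outcomes of events that keep at least
    //    one predicate (assumed meaning of "retained").
    std::set<std::string> seen;
    for (auto const& ev : events) {
        bool retained = false;
        for (auto const& pred : ev.second)
            if (idx.predIndex.count(pred)) { retained = true; break; }
        if (retained && seen.insert(ev.first).second)
            idx.outcomeLabels.push_back(ev.first);
    }
    return idx;
}
```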

void Tanl::Classifier::MaxEnt::save ( char const * file )

Save the model to file.

Format of the GIS maxent (.mem) files. This format can be memory mapped.

1. GIS (model type identifier)
2. the correction constant (int)
3. the correction parameter (double)
4. # of outcomes (int), followed by the list of outcome names (string)
5. # of predicates (int), followed by the list of predicate names (string)
6. parameters:
	# of groups (i.e. predicates with the same set of outcomes)
	then, for each group:
		a. group size (gs) and # of group outcomes
		b. for each outcome:
			1. the outcome (i)
			2. param[n + j, i], for 0 <= j < gs, where n is the index of the group's first predicate

Example of 5. and 6.:

	7	(# preds)
	Sunny	(first pred. name)
	Happy Dry Humid Sad Cloudy Rainy
	3	(# groups)
	1 1	(group 1: 1 predicate, 1 outcome)
	0	(outcome 0)
	2.4005893	(param[0, 0])
	5 2	(group 2: 5 predicates, 2 outcomes)
	0	(outcome 0)
	2.1392054	(param[1, 0])
	-0.3270814	(param[2, 0])
	0.2927261	(param[3, 0])
	-1.9319866	(param[4, 0])
	1.1981091	(param[5, 0])
	1	(outcome 1)
	-4.7484765	(param[1, 1])
	0.3342510	(param[2, 1])
	-0.3882752	(param[3, 1])
	2.0065205	(param[4, 1])
	-2.1725304	(param[5, 1])
	1 1	(group 3: 1 predicate, 1 outcome)
	1	(outcome 1)
	3.5907883	(param[6, 1])

Reimplemented from Tanl::Classifier::Classifier.

Reimplemented in Tanl::Classifier::LBFGS.

References lambda, and numPreds.
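To make the grouping in item 6 concrete, here is a hypothetical writer for items 5 and 6 in text form. It is a sketch under stated assumptions: `Pred`, `writePredsAndParams` and `formatPredsAndParams` are illustrative names; predicates that share an outcome set are assumed to be contiguous already (as the parameter indices in the example imply); and values are written one per line, without reproducing the exact encoding save() uses.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one predicate and its weights.
struct Pred {
    std::string name;
    std::map<int, double> params;  // outcome id -> param[pred, outcome]
};

// True if both predicates have the same set of outcomes (weights may differ).
bool sameOutcomes(Pred const& a, Pred const& b) {
    if (a.params.size() != b.params.size()) return false;
    auto ib = b.params.begin();
    for (auto const& kv : a.params) {
        if (kv.first != ib->first) return false;
        ++ib;
    }
    return true;
}

// Write items 5 and 6; assumes equal-outcome predicates are contiguous.
void writePredsAndParams(std::ostream& out, std::vector<Pred> const& preds) {
    out << preds.size() << '\n';                          // 5. # of predicates
    for (auto const& p : preds) {
        assert(!p.params.empty());                        // sketch assumption
        out << p.name << '\n';                            //    predicate names
    }
    // Maximal runs [begin, end) of consecutive predicates with equal outcome sets.
    std::vector<std::pair<std::size_t, std::size_t>> groups;
    for (std::size_t n = 0; n < preds.size();) {
        std::size_t end = n + 1;
        while (end < preds.size() && sameOutcomes(preds[n], preds[end])) ++end;
        groups.emplace_back(n, end);
        n = end;
    }
    // Enough digits for weights to round-trip; restore the caller's setting after.
    auto oldPrecision = out.precision(std::numeric_limits<double>::max_digits10);
    out << groups.size() << '\n';                         // 6. # of groups
    for (auto const& g : groups) {
        std::size_t n = g.first, gs = g.second - g.first;
        out << gs << ' ' << preds[n].params.size() << '\n';  // a. gs, # outcomes
        for (auto const& kv : preds[n].params) {
            out << kv.first << '\n';                      // b.1 outcome i
            for (std::size_t j = 0; j < gs; ++j)          // b.2 param[n + j, i]
                out << preds[n + j].params.at(kv.first) << '\n';
        }
    }
    out.precision(oldPrecision);
}

// Convenience wrapper that returns the written text.
std::string formatPredsAndParams(std::vector<Pred> const& preds) {
    std::ostringstream os;
    writePredsAndParams(os, preds);
    return os.str();
}
```

Called with predicates A {0}, B {0, 1}, C {0, 1} and D {1}, it emits three groups of sizes 1, 2 and 1, the same shape as the Sunny/Rainy example.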

The documentation for this class was generated from the following files:

classifier/MaxEnt.h
classifier/MaxEnt.cpp
