Tanl Linguistic Pipeline

Tanl::Classifier::GIS Class Reference
[Classifier]

Generalized Iterative Scaling algorithm to find the parameters {lambda1, ..., lambdak} of the unique maximum entropy distribution.

#include <GIS.h>

Inheritance diagram for Tanl::Classifier::GIS:
Base classes: Tanl::Classifier::MaxEnt, Tanl::Classifier::Classifier


Public Member Functions

 GIS (EventStream &es, int iterations, int cutoff=0, double alpha=0.0)
 Create and train a model from events from a single EventStream.
 GIS (int iterations=50, int cutoff=0, double alpha=0.0)
 The EventStream is supplied separately with method read().
void train ()
 Train a model on events read with previous calls to read().
void train (EventStream &es)
 Train a model on events read from the EventStream es.

Detailed Description

Generalized Iterative Scaling algorithm to find the parameters $\{\lambda_1, \ldots, \lambda_k\}$ of the unique distribution

\[ p^* = \arg\max_{p \in P} H(p) \]

where

\[ P = \{\, p \mid E_p f_j = E_{\tilde p} f_j,\ j = 1, \ldots, k \,\} \]

and

$S = \{(a_1, b_1), \ldots, (a_N, b_N)\}$, with $a_i \in A$ and $b_i \in B$, is the training set;

$p(x)$ is the model $p$'s probability of $x$;

$\tilde p(x)$ is the observed probability of $x$ in $S$;

\[ E_p f_j = \sum_{x \in A \times B} p(x)\, f_j(x) \]

\[ E_{\tilde p} f_j = \sum_{x \in A \times B} \tilde p(x)\, f_j(x) \]
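The two expectations defined above can be illustrated with a small sketch. The names Event, Feature, empiricalExpectation and modelExpectation are hypothetical and are not part of the Tanl API; the sketch assumes events are pairs of integer ids and that the model distribution is given explicitly as a table.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// An event x = (a, b): a context a in A paired with an outcome b in B.
// (Hypothetical representation, for illustration only.)
using Event = std::pair<int, int>;
// A feature f_j maps an event to a real value.
using Feature = std::function<double(const Event&)>;

// Empirical expectation of f: the average of f over the training set S.
// This equals SUM_x p~(x) f(x) with p~(x) = count(x) / N.
double empiricalExpectation(const std::vector<Event>& S, const Feature& f) {
    assert(!S.empty());
    double sum = 0.0;
    for (const Event& x : S) sum += f(x);
    return sum / static_cast<double>(S.size());
}

// Model expectation of f: SUM_x p(x) f(x), with the distribution p given
// explicitly as (event, probability) pairs covering A x B.
double modelExpectation(const std::vector<std::pair<Event, double>>& p,
                        const Feature& f) {
    double sum = 0.0;
    for (const auto& entry : p) sum += entry.second * f(entry.first);
    return sum;
}
```

A distribution p belongs to the set P exactly when these two values agree for every feature f_j.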

This version of GIS uses an exponential prior to mitigate overfitting, and avoids the use of a slack variable (see http://www.research.microsoft.com/~joshuago/longexponentialprior.ps).

An alternative would be the "Correction Free" GIS algorithm using Gaussian prior smoothing described in [Curran and Clark, 2003]: "Investigating GIS and Smoothing for Maximum Entropy Taggers" (see http://acl.ldc.upenn.edu/eacl2003/papers/main/p11.pdf).

Constructor & Destructor Documentation

Tanl::Classifier::GIS::GIS ( EventStream & es,
int  iterations,
int  cutoff = 0,
double  alpha = 0.0 
)

Create and train a model from events from a single EventStream.

Parameters:
es the EventStream supplying the training events.
iterations the number of iterations to perform.
cutoff discard features occurring fewer than this many times.
alpha defines the exponential prior of the lambda values in MaxEnt models.

References train().
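As an illustration of what a feature cutoff does, here is a minimal sketch. The function applyCutoff and the representation of features as strings are assumptions made for this example and are not taken from the Tanl sources; the sketch also assumes that "occurring fewer than cutoff times" means a feature is kept when its count is at least cutoff.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Count how often each feature occurs across all events and keep only the
// features seen at least `cutoff` times. Returns the kept features together
// with their counts. (Hypothetical helper, for illustration only.)
std::map<std::string, int> applyCutoff(
        const std::vector<std::vector<std::string>>& events, int cutoff) {
    assert(cutoff >= 0);
    std::map<std::string, int> counts;
    for (const auto& features : events)
        for (const auto& f : features) ++counts[f];

    std::map<std::string, int> kept;
    for (const auto& entry : counts)
        if (entry.second >= cutoff) kept.insert(entry);
    return kept;
}
```

With the default cutoff of 0 every feature is kept, so the filter is effectively disabled.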

Tanl::Classifier::GIS::GIS ( int  iterations = 50,
int  cutoff = 0,
double  alpha = 0.0 
)

The EventStream is supplied separately with method read().

This is useful to supply several streams in turn (for instance data from several files).

Parameters:
iterations the number of iterations to perform.
cutoff discard features occurring fewer than this many times.
alpha defines the exponential prior of the lambda values in MaxEnt models (see http://research.microsoft.com/en-us/um/people/joshuago/longexponentialprior.pdf).

Generalized Iterative Scaling uses an iterative algorithm to find the correct weights for a conditional exponential classifier. It initially sets all weights to zero; it then iteratively updates the weights using the formula:

lambda[i] := lambda[i] + (1/C) * log(E~f[i]/Ef[i])

which is equivalent to multiplying exp(lambda[i]) by (E~f[i]/Ef[i]) ^ (1/C).

Where: lambda[i] is the weight of the ith feature. C is the correction constant. E~f[i] is the sum of the feature values for the ith feature over the training events. Ef[i] is the sum of the feature values for the ith feature that is predicted by the current model.
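The iteration described above can be sketched as follows. This is a toy version under stated assumptions, not the Tanl implementation: Event, conditional and trainGIS are hypothetical names, features are binary (predicate, outcome) pairs, C is taken as the largest number of active predicates in any event, no prior is applied, and features never observed in training keep weight 0. The update is applied in its additive, log-domain form, lambda += (1/C) * log(E~f/Ef), so that weights initialised to zero can move.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// A training event: the binary context predicates that fire, plus the outcome.
struct Event {
    std::vector<int> predicates;
    int outcome;
};

using Weights = std::vector<std::vector<double>>;  // lambda[predicate][outcome]

// p(b | a) for every outcome b, given the predicates active in context a.
std::vector<double> conditional(const Weights& lambda,
                                const std::vector<int>& predicates,
                                int numOutcomes) {
    std::vector<double> p(numOutcomes, 0.0);
    double z = 0.0;
    for (int b = 0; b < numOutcomes; ++b) {
        double s = 0.0;
        for (int q : predicates) s += lambda[q][b];
        p[b] = std::exp(s);
        z += p[b];
    }
    for (double& v : p) v /= z;  // normalise
    return p;
}

Weights trainGIS(const std::vector<Event>& events, int numPredicates,
                 int numOutcomes, int iterations) {
    assert(!events.empty());
    // Correction constant C (assumption: max number of active features).
    double C = 1.0;
    for (const Event& e : events)
        C = std::max(C, static_cast<double>(e.predicates.size()));

    // Observed feature sums E~f.
    Weights observed(numPredicates, std::vector<double>(numOutcomes, 0.0));
    for (const Event& e : events)
        for (int q : e.predicates) observed[q][e.outcome] += 1.0;

    Weights lambda(numPredicates, std::vector<double>(numOutcomes, 0.0));
    for (int it = 0; it < iterations; ++it) {
        // Feature sums Ef predicted by the current model.
        Weights expected(numPredicates, std::vector<double>(numOutcomes, 0.0));
        for (const Event& e : events) {
            std::vector<double> p = conditional(lambda, e.predicates, numOutcomes);
            for (int q : e.predicates)
                for (int b = 0; b < numOutcomes; ++b) expected[q][b] += p[b];
        }
        for (int q = 0; q < numPredicates; ++q)
            for (int b = 0; b < numOutcomes; ++b)
                if (observed[q][b] > 0.0)  // unseen features stay at 0 here
                    lambda[q][b] += std::log(observed[q][b] / expected[q][b]) / C;
    }
    return lambda;
}
```

On a training set where one predicate co-occurs three times with outcome 0 and once with outcome 1, the trained model assigns that context probabilities close to 0.75 and 0.25, matching the observed frequencies.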


Member Function Documentation

void Tanl::Classifier::GIS::train ( EventStream & es ) 

Train a model on events read from the EventStream es.

Parameters:
es the EventStream supplying the training events.

References Tanl::Classifier::MaxEnt::read(), and train().


Copyright © 2005-2011 G. Attardi. Generated on 4 Mar 2011 by doxygen 1.6.1.