Tanl Linguistic Pipeline
Generalized Iterative Scaling algorithm to find the parameters {lambda1, ..., lambdak} of the unique maximum-entropy distribution.
#include <GIS.h>
Public Member Functions
  GIS(EventStream &es, int iterations, int cutoff = 0, double alpha = 0.0)
      Create and train a model on the events of a single EventStream.
  GIS(int iterations = 50, int cutoff = 0, double alpha = 0.0)
      The EventStream is supplied separately with method read().
  void train()
      Train a model on events read with previous calls to read().
  void train(EventStream &es)
      Train a model on events read from es.
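Generalized Iterative Scaling algorithm to find the parameters $\{\lambda_1, \dots, \lambda_k\}$ of the unique distribution
$$p^* = \arg\max_{p \in P} H(p)$$
where $H(p) = -\sum_{x} p(x) \log p(x)$ is the entropy of $p$, and
$$P = \{\, p \mid E_p f_j = E_{\tilde p} f_j,\ j = 1, \dots, k \,\}$$
$$S = \{(a_1, b_1), \dots, (a_N, b_N)\}, \quad a_i \in A,\ b_i \in B \quad \text{(the training set)}$$
$p(x)$: the model $p$'s probability of $x$
$\tilde p(x)$: the observed probability of $x$ in $S$
$$E_p f_j = \sum_{x \in A \times B} p(x)\, f_j(x)$$
$$E_{\tilde p} f_j = \sum_{x \in A \times B} \tilde p(x)\, f_j(x)$$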
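For a small, fully enumerable event space the two expectations in the constraint set $P$ can be computed directly. The sketch below is illustrative only: `Event`, `Feature`, `empiricalDistribution` and `expectation` are hypothetical names, not part of the Tanl API, and events are encoded as integer pairs (a, b) for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical encoding: an event x = (a, b) is a pair of integer ids.
using Event = std::pair<int, int>;
// A feature f_j maps an event to a value (usually 0 or 1).
using Feature = std::function<double(const Event&)>;

// Empirical distribution p~(x): relative frequency of x in the training set S.
std::map<Event, double> empiricalDistribution(const std::vector<Event>& S) {
    assert(!S.empty());
    std::map<Event, double> pTilde;
    for (const Event& x : S) pTilde[x] += 1.0;
    for (auto& entry : pTilde) entry.second /= static_cast<double>(S.size());
    return pTilde;
}

// Expectation of f under p: SUM_x p(x) f(x).
// Events absent from the map have probability 0 and contribute nothing.
double expectation(const std::map<Event, double>& p, const Feature& f) {
    double sum = 0.0;
    for (const auto& [x, prob] : p) sum += prob * f(x);
    return sum;
}
```

With S = {(0,0), (0,1), (1,1), (0,1)} and a feature that fires only on (0,1), the empirical expectation is 0.5, while a uniform model over the four events gives 0.25; such a model therefore does not belong to $P$.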
This version of GIS uses an exponential prior to mitigate overfitting and avoids the use of a slack variable (
An alternative would be the "Correction Free" GIS algorithm using Gaussian prior smoothing described in [Curran and Clark, 2003]: "Investigating GIS and Smoothing for Maximum Entropy Taggers" (
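To make the role of an exponential prior concrete, here is a sketch of a single weight update in the style of Goodman (2004), "Exponential Priors for Maximum Entropy Models": the observed count is discounted by alpha and, because the exponential distribution only has support on nonnegative values, the weight is kept at or above zero. This is an assumption about how a parameter like `alpha` can enter a GIS update, not a description of Tanl's code; `exponentialPriorUpdate` is a hypothetical name, and `observed`/`expected` are taken to be unnormalized counts in the same units as alpha.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One GIS update of a single weight lambda_i under an exponential prior
// (sketch after Goodman 2004; ASSUMPTION, not Tanl's implementation).
//   observed : empirical count of feature i
//   expected : count of feature i predicted by the current model
//   alpha    : prior parameter, acting as an absolute discount on observed
//   C        : the GIS correction constant
double exponentialPriorUpdate(double lambda, double observed, double expected,
                              double alpha, double C) {
    assert(expected > 0.0 && C > 0.0);
    const double discounted = observed - alpha;  // absolute discounting by alpha
    if (discounted <= 0.0) return 0.0;           // the MAP weight is 0: feature switched off
    const double step = std::log(discounted / expected) / C;
    return std::max(0.0, lambda + step);         // clip: the prior requires lambda >= 0
}
```

With alpha = 0 and no clipping needed, this reduces to the plain GIS step $\frac{1}{C}\log(\text{observed}/\text{expected})$.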
Tanl::Classifier::GIS::GIS(EventStream &es, int iterations, int cutoff = 0, double alpha = 0.0)
Create and train a model on the events of a single EventStream.
Parameters:
  iterations  the number of iterations to perform.
  cutoff      discard features occurring fewer than cutoff times.
  alpha       defines the exponential prior on the lambda values in MaxEnt models.
References train().
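The `cutoff` parameter prunes rare features before training. A minimal sketch of that filtering step, assuming features are identified by strings and counted once per occurrence across all events; `applyCutoff` is a hypothetical helper, not a Tanl function, and the exact counting rule used by Tanl may differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Keep only features that occur at least `cutoff` times over all events
// (i.e. discard those occurring fewer than `cutoff` times).
// ASSUMPTION: each occurrence of a feature string in an event counts once.
std::set<std::string> applyCutoff(
    const std::vector<std::vector<std::string>>& events, int cutoff) {
    assert(cutoff >= 0);
    std::map<std::string, int> counts;
    for (const auto& features : events)
        for (const auto& f : features) ++counts[f];
    std::set<std::string> kept;
    for (const auto& [feature, count] : counts)
        if (count >= cutoff) kept.insert(feature);
    return kept;
}
```

With the default cutoff of 0 every feature survives; raising it to 2 on events {"w=the", "pos-1=DT"}, {"w=the"}, {"w=cat"} keeps only "w=the".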
Tanl::Classifier::GIS::GIS(int iterations = 50, int cutoff = 0, double alpha = 0.0)
The EventStream is supplied separately with method read(). This is useful for supplying several streams in turn (for instance, data from several files).
Parameters:
  iterations  the number of iterations to perform.
  cutoff      discard features occurring fewer than cutoff times.
  alpha       defines the exponential prior on the lambda values in MaxEnt models (
Generalized Iterative Scaling uses an iterative algorithm to find the correct weights for a conditional exponential classifier. It initially sets all weights to zero; it then iteratively updates them using the formula:
$$\lambda_i := \lambda_i + \frac{1}{C} \log \frac{E_{\tilde p} f_i}{E_p f_i}$$
Equivalently, the multiplicative factor $e^{\lambda_i}$, which starts at 1, is multiplied by $(E_{\tilde p} f_i / E_p f_i)^{1/C}$ at each iteration. Here:
  $\lambda_i$ is the weight of the i-th feature.
  $C$ is the correction constant, an upper bound on the total feature value of any event (in standard GIS, $C = \max_x \sum_j f_j(x)$).
  $E_{\tilde p} f_i$ is the sum of the values of the i-th feature over the training events.
  $E_p f_i$ is the sum of the values of the i-th feature predicted by the current model.
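The update rule can be illustrated with a compact GIS trainer for a conditional exponential model. This is a sketch, not Tanl's implementation: it omits the prior and the cutoff, encodes each feature as a (context predicate, outcome) indicator, and uses C = the maximum number of active predicates per event with no slack feature. `TrainingEvent`, `conditional` and `trainGIS` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A training event: active context predicates and the observed outcome.
struct TrainingEvent {
    std::vector<int> predicates;  // ids in [0, numPredicates)
    int outcome;                  // id in [0, numOutcomes)
};

// p(a | context) for every outcome a, under weights lambda.
std::vector<double> conditional(const std::vector<double>& lambda,
                                const std::vector<int>& predicates, int numOutcomes) {
    std::vector<double> score(numOutcomes, 0.0);
    for (int a = 0; a < numOutcomes; ++a)
        for (int pred : predicates) score[a] += lambda[pred * numOutcomes + a];
    double z = 0.0;
    for (double& s : score) { s = std::exp(s); z += s; }
    for (double& s : score) s /= z;
    return score;
}

// Plain GIS. Feature (pred, a) has index pred * numOutcomes + a and value 1
// when `pred` is active and the candidate outcome is `a`.
std::vector<double> trainGIS(const std::vector<TrainingEvent>& events,
                             int numPredicates, int numOutcomes, int iterations) {
    assert(!events.empty());
    const std::size_t numFeatures = static_cast<std::size_t>(numPredicates) * numOutcomes;
    double C = 0.0;  // max number of active features in any event
    for (const auto& e : events) C = std::max(C, static_cast<double>(e.predicates.size()));
    assert(C > 0.0);
    // Observed counts; dividing both counts by N would not change the ratio.
    std::vector<double> observed(numFeatures, 0.0);
    for (const auto& e : events)
        for (int pred : e.predicates) observed[pred * numOutcomes + e.outcome] += 1.0;

    std::vector<double> lambda(numFeatures, 0.0);  // all weights start at zero
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> expected(numFeatures, 0.0);
        for (const auto& e : events) {
            const auto p = conditional(lambda, e.predicates, numOutcomes);
            for (int pred : e.predicates)
                for (int a = 0; a < numOutcomes; ++a) expected[pred * numOutcomes + a] += p[a];
        }
        // Pairs never seen in training are treated as absent features and keep
        // weight 0; C remains a valid bound for the features that are present.
        for (std::size_t j = 0; j < numFeatures; ++j)
            if (observed[j] > 0.0 && expected[j] > 0.0)
                lambda[j] += std::log(observed[j] / expected[j]) / C;
    }
    return lambda;
}
```

On toy data where predicate 0 is followed by outcome 0 in three of four events, the first iteration already sets p(0 | {0}) to 0.75: with C = 1 and a single active predicate the update is exact. For a predicate seen only with outcome 1, p(1 | {1}) climbs towards 1 over the iterations, as expected for an unsmoothed model.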
void Tanl::Classifier::GIS::train(EventStream &es)
Train a model on events read from es.
References Tanl::Classifier::MaxEnt::read(), and train().
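Judging from its cross-references, train(EventStream &es) presumably combines MaxEnt::read() with the argument-less train(), so events read earlier are trained on together with those in es. The sketch below mimics that composition with hypothetical stand-in types (`ToyEvent`, `ToyEventStream`, `ToyTrainer`); they are not Tanl classes, and `train()` here only returns the number of buffered events in place of real model fitting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the Tanl classes.
struct ToyEvent {
    std::vector<std::string> features;
    std::string outcome;
};
using ToyEventStream = std::vector<ToyEvent>;

class ToyTrainer {
public:
    // Append every event of `es` to the training buffer (cf. read()).
    void read(const ToyEventStream& es) {
        buffer_.insert(buffer_.end(), es.begin(), es.end());
    }
    // Train on everything read so far (cf. train()); this stand-in only
    // reports how many events would be used.
    std::size_t train() const { return buffer_.size(); }
    // Read one more stream, then train (cf. train(EventStream &es)).
    std::size_t train(const ToyEventStream& es) {
        read(es);
        return train();
    }

private:
    std::vector<ToyEvent> buffer_;
};
```

Reading a two-event stream and then calling train() with a one-event stream trains on all three events.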