Tanl Linguistic Pipeline

Parser::MlpModel Class Reference
Public Types

    typedef std::vector<Tanl::Classifier::PID>  X
    typedef Tanl::Classifier::ClassID           Y
    typedef std::pair<X, Y>                     Case
    typedef std::vector<Case>                   Cases
    typedef std::vector<Case*>                  ValidationSet
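For orientation, a Case pairs the feature ids (X) of one classification instance with its outcome class (Y). A minimal sketch, assuming only the typedefs above (the PID and ClassID values are invented):

    // Build one (features, outcome) pair and add it to a collection of cases.
    Parser::MlpModel::X feats;            // std::vector<Tanl::Classifier::PID>
    feats.push_back(12);                  // hypothetical feature ids
    feats.push_back(407);
    Parser::MlpModel::Y outcome = 3;      // hypothetical class id
    Parser::MlpModel::Cases cases;
    cases.push_back(Parser::MlpModel::Case(feats, outcome));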
Public Member Functions

    MlpModel(int numFeatures, int numOutcomes, int numHidden, int numLayers = 1)
    list<Event*> collectEvents(Enumerator<Sentence*>& sentenceReader, GlobalInfo& info)
        Collect events from sentenceReader.
    void buildCases(list<Event*>& events, Cases& cases)
        Create numeric cases out of training events.
    double train(Case& cas, int& argmax)
        Compute the gradients with respect to the negative log-likelihood.
    void train(Cases& cases, int epoch, ofstream& ofs)
        Train the model on cases, performing epoch iterations and saving intermediate model weights to ofs.
    void validate(ValidationSet& vs, double& avg, double& std)
    int crossentropy_softmax(Vector& x, double sm[])
        Compute a numerically stable softmax of x into sm.
    Vector gradCrossentropy(Vector& x, int y)
    int estimate(std::vector<PID>& features, double prob[])
    void load(ifstream& ifs, char const* file = "")
    void save(ofstream& ofs)
    void writeLabels(ofstream& ofs)
    streampos writeData(ofstream& ofs)
    void clearLabels()
Protected Attributes

    Matrix w1
    Matrix w2
    Matrix wh
    Vector b1
    Vector b2
    Vector bh
    int numLayers
        number of hidden layers
    int numHidden
        number of hidden variables
    int numFeatures
        number of features
    WordIndex outcomeIndex
void Parser::MlpModel::buildCases(list<Event*>& events, Cases& cases)

Create numeric cases out of training events.

Parameters:
    events    the training events.
    cases     (output) the numeric cases built from the events.
References numFeatures, numLayers, and Tanl::Classifier::Classifier::verbose.
Referenced by Parser::MlpParser::train().
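A hedged usage sketch of this step as it appears in a training flow like Parser::MlpParser::train() (the model, sentenceReader, and info objects are assumed to be already constructed):

    // Collect classifier events from a sentence reader, then encode them
    // as numeric cases suitable for training.
    list<Event*> events = model.collectEvents(sentenceReader, info);
    Parser::MlpModel::Cases cases;
    model.buildCases(events, cases);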
int Parser::MlpModel::crossentropy_softmax(Vector& x, double sm[])

Compute:

    softmax(x)[i] = exp(x[i]) / sum_j exp(x[j])

We compute this by subtracting off the max of x, which avoids numerical instability:

    m = max_j x[j]
    softmax(x)[i] = exp(x[i] - m) / sum_j exp(x[j] - m)

The negative log-likelihood at index t is then:

    nll(x, t) = -log(softmax(x)[t])
Referenced by estimate(), and train().
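A minimal standalone sketch of this computation, with std::vector<double> standing in for the internal Vector type; the int return value is assumed here to be the index of the largest activation:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Numerically stable softmax: subtract m = max_j x[j] before
    // exponentiating, so exp() never overflows for large activations.
    int stable_softmax(const std::vector<double>& x, double sm[]) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < x.size(); ++i)
            if (x[i] > x[best]) best = i;            // m = max_j x[j]
        double m = x[best];
        double sum = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            sm[i] = std::exp(x[i] - m);              // exp(x[i] - m)
            sum += sm[i];
        }
        for (std::size_t i = 0; i < x.size(); ++i)
            sm[i] /= sum;                            // softmax(x)[i]
        return static_cast<int>(best);
    }

    // The negative log-likelihood at index t is then simply:
    //   nll(x, t) = -log(sm[t])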
int Parser::MlpModel::estimate(std::vector<PID>& features, double prob[])
References crossentropy_softmax(), and numLayers.
Referenced by Parser::MlpParser::parse().
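At parse time, estimate() scores a feature vector and fills prob[] with outcome probabilities. A hedged sketch (the feature ids are invented, and numOutcomes refers to the value passed to the constructor):

    std::vector<PID> features;               // feature ids from the current parse state
    features.push_back(12);                  // hypothetical values
    features.push_back(407);
    std::vector<double> prob(numOutcomes);
    int best = model.estimate(features, &prob[0]);
    // best is assumed to be the index of the most probable outcome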
void Parser::MlpModel::train(Cases& cases, int epoch, ofstream& ofs)

Train the model on cases, performing epoch iterations and saving intermediate model weights to ofs.

Parameters:
    cases    the training cases.
    epoch    the number of training iterations to perform.
    ofs      stream where intermediate model weights are saved.
References Parser::MovingAverage::add(), Parser::Parser::procStat(), train(), Tanl::Classifier::Classifier::verbose, and writeData().
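A hedged end-to-end sketch (the epoch count and file name are assumptions, not values from the library):

    // Train on the cases built earlier, checkpointing weights to the stream.
    ofstream ofs("mlp.model");
    model.train(cases, 100, ofs);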
double Parser::MlpModel::train(Case& cas, int& argmax)

Compute the gradients with respect to the negative log-likelihood.

Forward pass (single hidden layer):

    xw1 = SUM_f w1[f]
    h   = softsign(xw1 + b1)
    h'  = 1 / (1 + |xw1 + b1|)^2
    x   = h w2 + b2

Loss for the correct outcome t:

    nll = -x[t] + log(SUM_j exp(x[j]))

Gradient with respect to the output:

    d nll/dx = -d x[t]/dx + (1 / SUM_j exp(x[j])) SUM_j(exp(x[j]) d x[j]/dx)
             = [0 .. -1 .. 0] + softmax(x)

Gradients with respect to the parameters:

    d nll/dw1 = d nll/dx dx/dw1 = d nll/dx (dh/dw1 w2) = d nll/dx (h' w2) dxw1/dw1
    d nll/db1 = d nll/dx dx/db1 = d nll/dx (dh/db1 w2) = d nll/dx (h' w2)
    d nll/dw2 = d nll/dx dx/dw2 = d nll/dx h
    d nll/db2 = d nll/dx dx/db2 = d nll/dx

Parameters:
    cas       the training case.
    argmax    (output) the most likely result.
References crossentropy_softmax(), and numLayers.
Referenced by Parser::MlpParser::train(), and train().
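The derivation above is plain backpropagation through one softsign hidden layer. A minimal single-layer SGD sketch under invented names, with dense std::vector containers standing in for the Matrix/Vector types (the real method also handles numLayers > 1 via wh/bh):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    using Vec = std::vector<double>;
    using Mat = std::vector<Vec>;   // Mat[row] is a Vec

    // One SGD step on a single case: features holds the active feature ids
    // (rows of w1), t is the gold outcome. Returns the nll and sets argmax
    // to the predicted class.
    double train_one(const std::vector<int>& features, int t,
                     Mat& w1, Vec& b1, Mat& w2, Vec& b2,
                     double lr, int& argmax) {
        const std::size_t H = b1.size(), O = b2.size();
        // Forward: xw1 = SUM_f w1[f]; h = softsign(xw1 + b1); x = h w2 + b2
        Vec h(H), hp(H), x(b2);
        for (std::size_t j = 0; j < H; ++j) {
            double a = b1[j];
            for (int f : features) a += w1[f][j];
            h[j]  = a / (1.0 + std::fabs(a));                             // softsign
            hp[j] = 1.0 / ((1.0 + std::fabs(a)) * (1.0 + std::fabs(a)));  // h'
        }
        for (std::size_t k = 0; k < O; ++k)
            for (std::size_t j = 0; j < H; ++j) x[k] += h[j] * w2[j][k];
        // Stable softmax and nll = -x[t] + log SUM_j exp(x[j])
        argmax = 0;
        for (std::size_t k = 1; k < O; ++k)
            if (x[k] > x[argmax]) argmax = static_cast<int>(k);
        double m = x[argmax], sum = 0.0;
        Vec dx(O);
        for (std::size_t k = 0; k < O; ++k) { dx[k] = std::exp(x[k] - m); sum += dx[k]; }
        for (std::size_t k = 0; k < O; ++k) dx[k] /= sum;   // softmax(x)
        double nll = -std::log(dx[t]);
        dx[t] -= 1.0;                 // d nll/dx = [0 .. -1 .. 0] + softmax(x)
        // Backward: dh = dx w2^T, gated by h'; then SGD parameter updates
        for (std::size_t j = 0; j < H; ++j) {
            double dh = 0.0;
            for (std::size_t k = 0; k < O; ++k) dh += dx[k] * w2[j][k];
            double g = dh * hp[j];    // d nll/db1[j] = d nll/dx (h' w2)
            for (std::size_t k = 0; k < O; ++k)
                w2[j][k] -= lr * dx[k] * h[j];          // d nll/dw2 = d nll/dx h
            b1[j] -= lr * g;
            for (int f : features) w1[f][j] -= lr * g;  // only rows of active features
        }
        for (std::size_t k = 0; k < O; ++k) b2[k] -= lr * dx[k];  // d nll/db2
        return nll;
    }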
streampos Parser::MlpModel::writeData(ofstream& ofs)
References numFeatures, numHidden, and numLayers.
Referenced by Parser::MlpParser::train(), and train().