Tanl Linguistic Pipeline

Parser::MovingAverage Class Reference

Public Member Functions
void add(double v)
    Add a value.
Public Attributes

double mean
double variance
int count
Detailed Description

Yoshua Bengio:
My preferred style of moving average is the following. Let's say you have a series x_t and you want to estimate the mean m of previous (recent) x's:
m <-- m + (2/t) (x_t - m)
Note that with (1/t) learning rate instead of (2/t) you get the exact historical average. With a larger learning rate (like 2/t) you give a bit more importance to recent stuff, which makes sense if x's are non-stationary (very likely here [in the setting of computing the moving average of the training error]). With a constant learning rate (independent of t) you get an exponential moving average.
You can estimate a running average of the gradient variance by running averages of the mean gradient and of the square of the difference to the moving mean.
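The recipe above can be sketched as a small C++ struct matching the members listed in this reference. This is an illustrative reconstruction, not the actual Tanl implementation: the first-sample initialization (setting the mean to the first value rather than applying a learning rate of 2/1 = 2) is an added practical choice, and the variance is tracked as a running average of squared differences to the moving mean, as the quote suggests.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of a MovingAverage following Bengio's recipe;
// the real Parser::MovingAverage in Tanl may differ.
struct MovingAverage {
  double mean = 0.0;      // running estimate m
  double variance = 0.0;  // running average of (x - m)^2
  int count = 0;          // number of values seen (t)

  // Add a value.
  void add(double v) {
    ++count;
    if (count == 1) {
      // Practical choice (not in the quote): seed with the first value,
      // since a 2/t rate at t = 1 would overshoot (rate = 2).
      mean = v;
      variance = 0.0;
      return;
    }
    // Learning rate 2/t weights recent values more than the plain
    // historical average (which would use 1/t).
    double rate = 2.0 / count;
    double delta = v - mean;
    mean += rate * delta;                       // m <-- m + (2/t)(x_t - m)
    variance += rate * (delta * delta - variance);
  }
};
```

With a constant input the mean tracks that value exactly and the variance estimate stays at zero; replacing `2.0 / count` with a constant rate would give an exponential moving average, as the quote notes.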
Member Function Documentation

void Parser::MovingAverage::add(double v) [inline]