Updaters
An updater is a generalization of a gradient optimization step. Several updaters do perform gradient optimization, but others do something different: for example, multiplicative updates are used in NMF, and models based on quotients of accumulated data are updated by recomputing the quotient on moving averages. Updating is kept separate from the generation of the gradient (or quotient pair), which happens in the model's update method. Updaters support a few methods:
* init: initializes any params and working storage for the updater.
* update: the basic update, called by the learner on each minibatch.
* updateM: an update called at the end of a pass over the dataset.
* clear: clears working storage, called at the beginning of a dataset pass.
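The method set above amounts to a small interface. Below is a minimal sketch of it in Scala, using a hypothetical Updater trait and plain arrays in place of the library's matrix types; the names here are illustrative assumptions, not the actual API.

```scala
// Minimal sketch of the updater lifecycle (illustrative, not the library's API).
trait Updater {
  def init(modelSize: Int): Unit          // allocate params and working storage
  def update(grad: Array[Double]): Unit   // basic update, called once per minibatch
  def updateM(): Unit                     // called at the end of a pass over the dataset
  def clear(): Unit                       // reset working storage before a new pass
}
```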
The Batch Norm updater is used for batch-mode inference in LDA, NMF and Gibbs sampling. It accumulates update data into numerator and denominator "accumulator" matrices during a pass over the dataset. At the end of the pass, it updates the model by computing the ratio of the accumulators.
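As an illustration of this accumulate-then-ratio pattern, here is a small self-contained sketch. The class name, plain arrays and the small epsilon guard are illustrative assumptions, not the library's actual types.

```scala
// Sketch of a batch accumulate-then-ratio update (illustrative types and names).
class BatchNormSketch(n: Int) {
  val num   = Array.fill(n)(0.0)   // numerator accumulator
  val den   = Array.fill(n)(0.0)   // denominator accumulator
  val model = Array.fill(n)(0.0)

  // Accumulate a (numerator, denominator) update pair from one minibatch.
  def update(u: Array[Double], d: Array[Double]): Unit =
    for (i <- 0 until n) { num(i) += u(i); den(i) += d(i) }

  // At the end of the pass, the model becomes the ratio of the accumulators.
  def updateM(): Unit =
    for (i <- 0 until n) model(i) = num(i) / math.max(den(i), 1e-12)

  // Clear the accumulators before the next pass over the dataset.
  def clear(): Unit = {
    java.util.Arrays.fill(num, 0.0)
    java.util.Arrays.fill(den, 0.0)
  }
}
```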
The incremental version of the Batch Norm updater updates both the numerator and the denominator with moving averages:
<math>M^{(t)} = \alpha U^{(t)} + (1-\alpha) M^{(t-1)} </math>
where <math>M^{(t)}</math> is the matrix at time <math>t</math>, <math>U^{(t)}</math> is the update at time <math>t</math>, and <math>\alpha</math> is defined as
<math>\alpha = \left( \frac{1}{t}\right)^p</math>
A value of <math>p=1</math> (set with the power option to the updater) gives a uniform average of the updates up to time <math>t</math>. Smaller values of <math>p</math> weight recent data more heavily. A value of <math>p=0.5</math> mimics the temporal envelope of ADAGRAD. In practice, values smaller than 0.5 often give the best performance. The default value is 0.3, which works well with most models.
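To make the moving-average formula concrete, here is a small sketch of the incremental variant. Only the <math>\alpha = (1/t)^p</math> blending follows the formulas above; the class name, array types and epsilon guard are illustrative assumptions.

```scala
// Sketch of the incremental (moving-average) variant (illustrative types and names).
class IncNormSketch(n: Int, p: Double = 0.3) {
  val num = Array.fill(n)(0.0)
  val den = Array.fill(n)(0.0)
  var t   = 0

  // Blend the new (numerator, denominator) pair into the moving averages.
  def update(u: Array[Double], d: Array[Double]): Unit = {
    t += 1
    val alpha = math.pow(1.0 / t, p)   // p = 1: uniform average; p = 0.5: ADAGRAD-like envelope
    for (i <- 0 until n) {
      num(i) = alpha * u(i) + (1 - alpha) * num(i)
      den(i) = alpha * d(i) + (1 - alpha) * den(i)
    }
  }

  // The model can be recomputed at any time as the ratio of the moving averages.
  def model: Array[Double] = Array.tabulate(n)(i => num(i) / math.max(den(i), 1e-12))
}
```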
The plain gradient updater simply adds gradient updates to the model, using a decay schedule. It includes a few options:
* lrate:FMat(1f): the learning rate, i.e. the scale factor applied to gradients at each step.
* texp:FMat(0.5f): the decay exponent.
* waitsteps = 2: the number of steps to wait before starting decay.
* mask:FMat = null: a mask used to prevent updates to some values (e.g. constants in the model).
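A rough sketch of such a decayed gradient step follows. The exact decay schedule and mask semantics are assumptions (a step of size lrate * t^(-texp) after waitsteps minibatches, and a 0/1 mask multiplying the update); only the option names come from the list above.

```scala
// Sketch of a gradient step with a decaying learning rate (illustrative, assumed schedule).
class GradSketch(n: Int,
                 lrate: Double = 1.0,
                 texp: Double = 0.5,
                 waitsteps: Int = 2,
                 mask: Option[Array[Double]] = None) {
  val model = Array.fill(n)(0.0)
  var t = 0

  def update(grad: Array[Double]): Unit = {
    t += 1
    // Hold the learning rate constant for the first waitsteps steps, then decay as t^(-texp).
    val scale = if (t <= waitsteps) lrate else lrate * math.pow(t, -texp)
    for (i <- 0 until n) {
      val m = mask.map(_(i)).getOrElse(1.0)   // a 0 entry keeps that model value constant
      model(i) += scale * m * grad(i)
    }
  }
}
```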