A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants

(Neal and Hinton, 1998), in *Learning in Graphical Models*, ed. Michael I. Jordan.

Expectation–Maximization (EM) is an alternating optimization algorithm. It starts from an initial guess of the model’s parameters, then alternates between computing a distribution over the unobserved variables given the current parameters (E-step) and re-estimating the parameters to maximize the expected complete-data log-likelihood under that distribution (M-step).
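To make the alternation concrete, here is a minimal sketch of EM for a two-component 1-D Gaussian mixture. This is not the paper’s example; the function name `em_gmm_1d`, the initialization, and the synthetic data are my own illustrative choices.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Crude initialization from the data (a hypothetical choice).
    pi = 0.5                                       # mixing weight of component 1
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()], dtype=float)

    for _ in range(n_iter):
        # E-step: responsibilities, i.e. the posterior over the hidden
        # component assignment of each point under the current parameters.
        p0 = (1 - pi) * np.exp(-(x - mu[0])**2 / (2 * var[0])) / np.sqrt(2 * np.pi * var[0])
        p1 = pi * np.exp(-(x - mu[1])**2 / (2 * var[1])) / np.sqrt(2 * np.pi * var[1])
        r = p1 / (p0 + p1)

        # M-step: re-estimate the parameters to maximize the expected
        # complete-data log-likelihood under the responsibilities.
        pi = r.mean()
        mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                       np.sum(r * x) / np.sum(r)])
        var = np.array([np.sum((1 - r) * (x - mu[0])**2) / np.sum(1 - r),
                        np.sum(r * (x - mu[1])**2) / np.sum(r)])
    return pi, mu, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1.0, 500), rng.normal(3, 0.5, 500)])
    print(em_gmm_1d(x))
```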

There are variants of EM in which the M-step does not fully maximize the likelihood but merely improves it. The E-step can also be partial, and this paper is the first to treat that style of EM formally.

EM can be seen as coordinate ascent on a single objective, the free energy F: the E-step maximizes F with respect to the distribution over the latent variables (holding the parameters fixed), and the M-step maximizes F with respect to the parameters (holding that distribution fixed).
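Written out, the two steps look like this. The symbols F, P̃, and the iteration superscript (t) follow the spirit of the paper’s formulation, but the exact notation here is my shorthand.

```latex
\begin{align*}
% Free energy: expected complete-data log-likelihood under a
% distribution \tilde{P} over the unobserved Y, plus its entropy.
F(\tilde{P}, \theta) &= E_{\tilde{P}}\big[\log P(y, z \mid \theta)\big] + H(\tilde{P}) \\
% E-step: maximize F over \tilde{P} with \theta fixed; the maximizer
% is the posterior over Y given the data and the current parameters.
\tilde{P}^{(t)} &= \arg\max_{\tilde{P}} F\big(\tilde{P}, \theta^{(t-1)}\big)
               = P\big(y \mid z, \theta^{(t-1)}\big) \\
% M-step: maximize F over \theta with \tilde{P} fixed.
\theta^{(t)} &= \arg\max_{\theta} F\big(\tilde{P}^{(t)}, \theta\big)
\end{align*}
```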

Let’s say that we’ve observed Z, but not Y. Their joint distribution is parameterized by θ, and we’re trying to maximize the log-likelihood of the observed data: **L(θ) = log P(z | θ)**. Regular EM is guaranteed to never decrease this quantity, and the free-energy view extends the guarantee to the partial variants: they never decrease F, and maxima of F coincide with maxima of L(θ).
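One way to see the guarantee is that the free energy is the log-likelihood minus a KL divergence, so it lower-bounds L(θ). This is a standard decomposition, written here in my own notation.

```latex
\begin{align*}
% The free energy equals the log-likelihood minus a KL divergence,
% so it lower-bounds L(\theta); the bound is tight exactly when
% \tilde{P} is the posterior P(y \mid z, \theta).
F(\tilde{P}, \theta) &= \log P(z \mid \theta)
  - D_{\mathrm{KL}}\big(\tilde{P}(y) \,\|\, P(y \mid z, \theta)\big)
  \le L(\theta)
\end{align*}
% Each E-step and M-step, full or partial, never decreases F, and
% maxima of F over both arguments occur at maxima of L, which is the
% paper's justification for the incremental and sparse variants.
```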
Written on February 12, 2018