# A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants

(Neal and Hinton 1998) in *Learning in Graphical Models*, ed. Michael I. Jordan.

Expectation–Maximization (EM) is an algorithm that performs alternating optimization. It starts at an initial guess of a model’s parameters, then alternately computes a distribution over the unknown (hidden) variables given the current parameters (E-step), and re-estimates the parameters to maximize the expected complete-data log-likelihood under that distribution (M-step).
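The alternation can be made concrete with a minimal sketch: EM for a two-component 1-D Gaussian mixture, a standard textbook instance (not an example from the paper; all variable names and data here are illustrative).

```python
import numpy as np

# Synthetic data from two well-separated Gaussians (illustrative only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial parameter guesses: component means, variances, and mixing weights.
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def log_likelihood(x, mu, var, pi):
    # log P(x | theta): log of the mixture density, summed over data points.
    comp = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.log(comp.sum(axis=1)).sum()

lls = []
for _ in range(25):
    # E-step: posterior responsibilities P(component | x, current theta).
    comp = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = comp / comp.sum(axis=1, keepdims=True)
    # M-step: closed-form maximizers of the expected complete-data log-likelihood.
    n_k = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
    pi = n_k / len(x)
    lls.append(log_likelihood(x, mu, var, pi))
```

The recorded sequence `lls` is non-decreasing, which is the monotonicity guarantee the paper analyzes.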

There are variants of EM where the M-step doesn’t maximize, but still improves, the likelihood. The E-step can also be partial, and this paper is the first to formally treat that style of EM.

EM can be seen as alternating maximization of a single function, the free energy: the E-step maximizes it with respect to the distribution over the hidden variables, and the M-step maximizes it with respect to the parameters.
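Concretely, for observed data $z$, hidden variable $y$, and any distribution $\tilde{P}$ over $y$ (notation following the paper), the free energy is

```latex
F(\tilde{P}, \theta)
  = E_{\tilde{P}}\big[\log P(y, z \mid \theta)\big] + H(\tilde{P})
  = L(\theta) - D\big(\tilde{P} \,\big\|\, P(y \mid z, \theta)\big),
```

where $H$ is the entropy and $D$ the KL divergence. Since $D \ge 0$, the free energy lower-bounds the log-likelihood $L(\theta)$, with equality exactly when $\tilde{P}$ is the true posterior over the hidden variables.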

Let’s say that we’ve observed Z (with value z), but not the hidden variable Y. Their joint distribution is parameterized by θ, and we’re trying to maximize the log-likelihood of the data: **L(θ) = log P(z | θ)**. Both regular and partial EM are guaranteed to never decrease this quantity.
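The guarantee follows from the free-energy view. Writing $\tilde{P}_t$ for the distribution the E-step produces at iteration $t$ (the exact posterior $P(y \mid z, \theta_t)$ in standard EM), one full iteration satisfies

```latex
L(\theta_t) \;=\; F(\tilde{P}_t, \theta_t)
  \;\le\; F(\tilde{P}_t, \theta_{t+1})
  \;\le\; \max_{\tilde{P}} F(\tilde{P}, \theta_{t+1})
  \;=\; L(\theta_{t+1}).
```

The first equality holds because the free energy equals the log-likelihood when the distribution is the exact posterior; the middle inequality is the M-step (which need only improve, not maximize, the free energy); and the final step holds because maximizing the free energy over the distribution recovers the log-likelihood. Partial E-steps still increase the free energy, so the same chain applies.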