Expectation Maximization-Based Algorithms
The expectation maximization (EM) method [8] is a widely used and powerful algorithm in machine learning, including the population modeling of pharmacokinetic (PK) and pharmacodynamic (PD) systems. EM frames the problem in terms of ‘complete’ and ‘missing’ data. Using Bayes’ theorem, the missing data are integrated out, and the parameters of the parametric model are learned by iterating between an expectation step (E-step) and a maximization step (M-step). An EM-algorithm-based exact maximum likelihood solution to the parametric population modeling problem was proposed by Schumitzky [9] in 1995 and fully implemented by Walker [10] in 1996 for non-mixture models. For mixture models, the corresponding formulas were derived in reference [3].
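To make the E-step/M-step cycle concrete, the following is a minimal sketch of EM for a deliberately simple population model, in which each subject’s individual parameter theta_i ~ N(mu, omega2) plays the role of the ‘missing’ data and the observations are y_ij = theta_i + eps_ij with eps_ij ~ N(0, sigma2). The model, function names, and interface here are illustrative assumptions, not the implementation of any particular engine; in this conjugate toy case the E-step posteriors happen to be available in closed form.

```python
import numpy as np

def em_fit(y, mu, omega2, sigma2, n_iter=200):
    """EM for the toy model y_ij = theta_i + eps_ij, theta_i ~ N(mu, omega2),
    eps_ij ~ N(0, sigma2).  `y` is a list of 1-D arrays, one per subject.
    Returns the final (mu, omega2, sigma2) and the marginal log-likelihood."""
    n_obs = sum(len(yi) for yi in y)
    for _ in range(n_iter):
        # E-step: the posterior of each theta_i is normal in this conjugate model
        v = np.array([1.0 / (len(yi) / sigma2 + 1.0 / omega2) for yi in y])
        m = np.array([vi * (yi.sum() / sigma2 + mu / omega2)
                      for vi, yi in zip(v, y)])
        # M-step: maximize the expected complete-data log-likelihood
        mu = m.mean()
        omega2 = np.mean(v + (m - mu) ** 2)
        sigma2 = sum(((yi - mi) ** 2).sum() + len(yi) * vi
                     for yi, mi, vi in zip(y, m, v)) / n_obs
    return (mu, omega2, sigma2), marginal_loglik(y, mu, omega2, sigma2)

def marginal_loglik(y, mu, omega2, sigma2):
    """Exact marginal log-likelihood of the toy model (each subject's data is
    multivariate normal with a compound-symmetric covariance)."""
    ll = 0.0
    for yi in y:
        n = len(yi)
        r = yi - mu
        logdet = (n - 1) * np.log(sigma2) + np.log(sigma2 + n * omega2)
        quad = ((r ** 2).sum() / sigma2
                - omega2 * r.sum() ** 2 / (sigma2 * (sigma2 + n * omega2)))
        ll += -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    return ll
```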
An EM algorithm needs to address two important problems.
The convergence problem. EM algorithms converge from the given initial conditions to a stationary point [3, 11, 12] of the likelihood function. However, a stationary point can be a local maximum, a local minimum, or a saddle point, and therefore may not be the global maximum of the likelihood function.
The compute-intensive calculations problem. A standard way to increase the probability of finding the global maximum is to run the algorithm repeatedly, initializing each run with different conditions. This is, of course, computationally intensive, so developing efficient methods to avoid converging to solutions that are not the global maximum of the likelihood is an important task.
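The multistart idea can be sketched generically as below. The helper names, starting ranges, and the reuse of the `em_fit` function from the sketch above are illustrative assumptions, not a description of any engine’s restart strategy.

```python
import numpy as np

def multistart(fit, sample_start, n_starts, rng):
    """Run an EM fit from several random initial conditions and keep the
    solution with the highest final log-likelihood.  `fit` takes one set of
    initial parameters and returns (estimates, log_likelihood); `sample_start`
    draws one random initial parameter set.  Both are caller-supplied."""
    best_est, best_ll = None, -np.inf
    for _ in range(n_starts):
        est, ll = fit(sample_start(rng))
        if ll > best_ll:
            best_est, best_ll = est, ll
    return best_est, best_ll

# Example wiring with the toy em_fit above (ranges are arbitrary):
# rng = np.random.default_rng(0)
# sample_start = lambda r: (r.normal(0, 5), r.uniform(0.1, 5), r.uniform(0.1, 5))
# fit = lambda start: em_fit(y, *start)
# best_est, best_ll = multistart(fit, sample_start, n_starts=20, rng=rng)
```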
Advantages of EM Methods
No formal numerical optimization procedure is needed to maximize the overall likelihood (or the approximation to the likelihood).
This contrasts with methods such as FO, FOCE-ELS, and Laplacian, which use numerical optimization procedures. These procedures, particularly in combination with numerical derivatives, are fragile and can easily fail or be unstable.
The EM procedures rely on numerical integration to obtain the means and covariances of the posteriors. Numerical integration is inherently much more stable and reliable than numerical differentiation and optimization.
The methods may be made as accurate as desired (i.e., they can produce estimates arbitrarily close to the true maximum likelihood estimate).
This is done by simply increasing the accuracy of the numerical integration procedure, typically by increasing the number of points at which the integrand is sampled.
This contrasts with FO, FOCE-ELS, and Laplacian, which are inherently limited in accuracy by the likelihood approximations they employ and may produce results quite different from the true maximum likelihood estimates.
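As one generic illustration of this point, the posterior mean and variance of a scalar random effect can be approximated by Gauss-Hermite quadrature, and the approximation tightens as the number of nodes grows. This is a sketch only, not the QRPEM implementation; the `loglik` interface is an assumed model interface.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def posterior_moments(loglik, omega2, n_nodes):
    """Gauss-Hermite approximation to the posterior mean and variance of a
    scalar random effect eta with prior N(0, omega2).  `loglik(eta)` returns
    the data log-likelihood at eta (an assumed interface).  Increasing
    n_nodes increases the accuracy of the integration."""
    x, w = hermgauss(n_nodes)               # nodes/weights for weight exp(-x^2)
    eta = np.sqrt(2.0 * omega2) * x         # rescale nodes to the N(0, omega2) prior
    logp = np.array([loglik(e) for e in eta]) + np.log(w)
    logp -= logp.max()                      # stabilize before exponentiating
    p = np.exp(logp)
    p /= p.sum()                            # normalized posterior weights at the nodes
    mean = np.sum(p * eta)
    var = np.sum(p * (eta - mean) ** 2)
    return mean, var

# e.g. for y_j = eta + eps_j with known sigma2:
#   loglik = lambda e: -0.5 * ((y - e) ** 2).sum() / sigma2
# Comparing n_nodes=5 with n_nodes=41 shows the refinement in accuracy.
```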
Sampling
The key step in most EM methods is computing the means and covariances of the posterior distributions. One common approach is to use Monte Carlo (MC) sampling of the posteriors, assisted by importance sampling techniques. In this case, the method is usually called MCPEM (or in the case of NONMEM, IMP). Each sample is drawn from a convenient importance sampling distribution, such as a multivariate normal that approximates the target posterior, and then each sample is weighted by the likelihood ratio of the target distribution to the importance sampling distribution evaluated at that sample. The means and covariances of the weighted samples are then used as approximations to the desired true means and covariances of the posteriors.
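The weighted-sample computation described above can be sketched as follows. The `log_target` interface and the proposal parameters are assumptions for illustration, and production MCPEM/IMP implementations add many refinements (proposal adaptation, sample-size control, and so on) not shown here.

```python
import numpy as np

def is_posterior_moments(log_target, proposal_mean, proposal_cov, n_samples, rng):
    """Importance-sampling estimate of the posterior mean and covariance of a
    vector of random effects.  `log_target(eta)` returns the unnormalized log
    posterior log p(y | eta) + log p(eta); the proposal is a multivariate
    normal approximation to that posterior.  Names are illustrative."""
    d = len(proposal_mean)
    samples = rng.multivariate_normal(proposal_mean, proposal_cov, size=n_samples)
    # Log density of the proposal at each sample
    diff = samples - proposal_mean
    prec = np.linalg.inv(proposal_cov)
    log_q = -0.5 * np.einsum("ij,jk,ik->i", diff, prec, diff)
    log_q -= 0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(proposal_cov)[1])
    # Importance weights: target / proposal, normalized to sum to one
    log_w = np.array([log_target(s) for s in samples]) - log_q
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()
    mean = w @ samples
    centered = samples - mean
    cov = (w[:, None] * centered).T @ centered
    return mean, cov
```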
Certara’s NLME engine includes the QRPEM and IT2S-EM algorithms. For a more mathematical look at the methods used by Expectation-Maximization-based algorithms, see: