FOCE Lindstrom-Bates (FOCE L-B)
FOCE L-B applies only to observed data that are continuous and modeled with a Gaussian likelihood. As with FOCE ELS and FO, the random effects (ETAs) are assumed to be normally distributed, as are the residuals (EPSs). Overall, this implies that a first-order linearization of the model with respect to the ETAs and EPSs is normally (Gaussian) distributed when evaluated at a given time point for a random individual.
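In rough notation (assumed here rather than taken from the documentation), and for a simple additive residual error model, this linearization for observation j of subject i can be written as

\[
y_{ij} \;\approx\; f_{ij}(\theta,\eta^{*}) \;+\; \left.\frac{\partial f_{ij}}{\partial \eta}\right|_{\eta^{*}} \left(\eta_i - \eta^{*}\right) \;+\; \varepsilon_{ij},
\qquad \eta_i \sim N(0,\Omega), \quad \varepsilon_{ij} \sim N(0,\sigma^2),
\]

where the expansion point \eta^{*} is 0 for FO and the conditional (optimal) ETA estimate for the FOCE methods. Because the right-hand side is linear in normally distributed ETAs and EPSs, it is itself Gaussian.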
FOCE L-B solves a sequence of linearized mixed effects problems. Each iteration consists of the following steps:
Conditional step: for each individual, Phoenix finds the optimal ETA values corresponding to the current (THETA, SIGMA, OMEGA) estimates by maximizing the joint log likelihood with respect to the ETAs (a minimal sketch of this step is given after this list). This optimization is performed with the same quasi-Newton algorithm that is used in step 3 and in other Phoenix engines.
Linearize the model function with respect to the ETAs around the optimal ETA values computed in step 1. The linearization is used to compute an FOCE approximation of the marginal log likelihood function.
Solve the linearized mixed effects problem by minimizing the FOCE approximation to the overall negative marginal log likelihood of the linearized problem to obtain a new set of estimates (THETA, SIGMA, OMEGA).
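As an illustration of the conditional step, the following is a minimal runnable sketch in Python (not Phoenix code); the mono-exponential model, the single ETA, and all data values are assumptions made purely for this example:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

theta_cl, omega, sigma = 0.3, 0.2, 0.1        # current THETA, OMEGA, SIGMA values
t = np.array([1.0, 2.0, 4.0])                 # one subject's observation times
y = np.array([0.70, 0.52, 0.30])              # one subject's observations

def neg_joint_loglik(eta):
    # individual prediction from a toy mono-exponential structural model
    pred = np.exp(-theta_cl * np.exp(eta[0]) * t)
    loglik_resid = norm.logpdf(y, loc=pred, scale=sigma).sum()   # residual part given ETA
    loglik_prior = norm.logpdf(eta[0], scale=np.sqrt(omega))     # population prior on ETA
    return -(loglik_resid + loglik_prior)

# quasi-Newton maximization of the joint log likelihood with respect to ETA
res = minimize(neg_joint_loglik, x0=[0.0], method="BFGS")
eta_hat = res.x                               # optimal (POSTHOC) ETA estimate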
The iterations are repeated until convergence, which in the Phoenix Model is defined as reduction of the gap (the difference between the starting and final optimal log likelihood values for the current linearized problem) to less than a specified tolerance, which in Phoenix is 0.001. The progress of the current gap value computation is displayed in an external window.
FOCE L-B usually converges, but there is no theoretical guarantee of convergence and both oscillatory and divergent behavior might occasionally occur. The final converged parameter values represent the optimal FOCE solution to the final linearized problem, and are usually, but not necessarily, very close to the optimal FOCE ELS solution.
Note that convergence need not be monotonic. That is, the gaps do not always decrease, and the log likelihoods of the solutions to the linearized problems do not necessarily improve from iteration to iteration.
First Order Conditional Estimation - Extended Least Squares (FOCE ELS)
FOCE ELS, which is applicable to Gaussian data only, is essentially equivalent to the NONMEM FOCE methodology with interaction. It is based on minimizing an extended least squares objective function that represents the FOCE approximation to the negative log of the marginal likelihood as a function of (THETA, SIGMA, OMEGA). Unlike FOCE L-B, which requires optimization of approximate marginal log likelihoods for a sequence of linearized mixed effects problems, FOCE ELS conceptually involves only a single optimization of a top-level approximate marginal log likelihood function.
However, each evaluation of the objective function in the top level minimization requires a conditional step in the form of an inner optimization of the joint log likelihood for each subject with respect to ETA at the current (THETA, SIGMA, OMEGA) values. The inner optimizations, which are the same calculations that are performed in step 1 of FOCE L-B, are nested within the overall optimization process but many more of them are required than in FOCE L-B. Therefore FOCE ELS is usually considerably slower than FOCE L-B.
The modeling computation window shows the current value of the FOCE marginal log likelihood for the current iteration. Iterations of FOCE ELS are defined in terms of iterations of the underlying quasi-Newton optimization algorithm as applied to the top-level minimization, and generally consist of a gradient evaluation followed by a line search along a direction computed from the gradient and previous iterates. Unlike FOCE L-B, progress is monotonic from iteration to iteration in the absence of numerical difficulties that require internal restarts.
Both FOCE L-B and FOCE ELS use “interaction”, meaning that the individual prediction, obtained by evaluating the model prediction function at the current optimal ETA estimates, is used to evaluate the residual error model. In contrast, the FO algorithm uses the population prediction, obtained by setting ETA=0, to evaluate the residual error model. The interaction computation is usually regarded as leading to more accurate overall estimates of THETA, SIGMA, and OMEGA.
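For example, with a proportional residual error model (notation assumed here, not taken from the documentation), the residual standard deviation used to weight observation j of subject i is

\[
\sigma\, f_{ij}(\theta,\hat\eta_i) \quad \text{(FOCE with interaction)}
\qquad \text{versus} \qquad
\sigma\, f_{ij}(\theta,0) \quad \text{(FO)},
\]

so with interaction the residual error weights depend on the current ETA estimates rather than only on the population prediction.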
First Order (FO)
First Order is applicable to Gaussian data only. The FO engine is similar to the FOCE ELS algorithm in that it requires a single top-level minimization of an objective function representing an approximate negative marginal log likelihood. However, the FO objective is simpler and much faster to evaluate than the FOCE ELS objective and does not require any nested conditional optimizations. Whereas the FOCE approximation used in FOCE ELS and FOCE L-B requires a model linearization around the current optimal ETAs computed in a conditional step, the FO approximation always linearizes around ETA=0. This means that FO gains speed by omitting all conditional steps, but in doing so sacrifices accuracy. FO is often the fastest and most reliably convergent method, but it produces results with the poorest statistical quality in terms of bias and RMSE with respect to true values.
Also, for purposes of computing statistical weights in the residual error model, FO uses the population prediction obtained with ETA=0. This is less accurate than the individual prediction used by other methods and also contributes to the relatively poor statistical quality of FO estimates.
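In the standard textbook form of the FO approximation (notation assumed here), subject i's observation vector is treated as approximately multivariate normal,

\[
y_i \;\sim\; N\!\left(f_i(\theta,0),\; G_i\,\Omega\,G_i^{T} + R_i\right),
\qquad G_i = \left.\frac{\partial f_i}{\partial \eta}\right|_{\eta=0},
\]

where R_i is the residual error covariance evaluated at the population prediction; the FO objective function is the negative log of this approximate marginal likelihood summed over subjects.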
Iterations for FO simply correspond to iterations of the quasi-Newton optimization algorithm. In principle only a single pass through the quasi-Newton method is required, but the Phoenix Model implementation repeats the optimization from the final results of the previous run until successive runs output the same log likelihood to within a tolerance of 0.001.
Laplacian
Applicable to both Gaussian data and data modeled with general user-supplied likelihoods, such as count, categorical, and time-to-event data. The Laplacian method has the same basic structure as FOCE ELS in that a top-level minimization of an approximation to the marginal negative log likelihood is performed, but the details of the approximation are different and somewhat more computationally complex than those of FOCE ELS.
The Laplacian engine is based on approximating the marginal likelihood with a Laplacian approximation to the integral of the joint likelihood (the exponentiated joint log likelihood). This replaces the joint likelihood with a “nearby” Gaussian likelihood, called the Gaussian kernel, that can be integrated analytically. Determining the approximating Gaussian function requires both a conditional step and a numerical evaluation of the second derivatives of the joint log likelihood function at each point where the top-level objective function is evaluated. This is usually more computationally intensive and less numerically reliable than a FOCE ELS objective function evaluation. In Gaussian data cases where FOCE is applicable, Laplacian is often regarded as slightly more accurate, but slower and numerically less reliable, than the FOCE algorithms.
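In standard notation (assumed here), the Laplace approximation to subject i's marginal likelihood is

\[
\int e^{\,\ell_i(\eta)} \, d\eta \;\approx\; e^{\,\ell_i(\hat\eta_i)} \, (2\pi)^{d/2} \, \left| -H_i(\hat\eta_i) \right|^{-1/2},
\]

where \ell_i is the joint log likelihood, \hat\eta_i is its mode from the conditional step, H_i is the matrix of second derivatives at the mode, and d is the number of ETAs. The Gaussian kernel mentioned above is the normal density with mean \hat\eta_i and covariance (-H_i)^{-1}.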
Note: The Laplacian method with FOCE Hess selected is the same as NONMEM's FOCE engine when NAGQ=1. When FOCE Hess is not selected, it is similar to NONMEM's Laplacian engine.
Naive Pooled
Applicable to Gaussian and user-defined log likelihood data. The Naive pooled engine, when applied to population data, treats all observations as if they came from a single individual in that it ignores inter-individual variations in ETA values. All ETAs are forced to zero, and no OMEGA parameters are computed, only THETA and SIGMA. The engine can also be applied to a single individual, or to individuals separately as a series of individual fits in a multiple individual dataset, in addition to application to all individuals collectively in a population model. When applied to all individuals in population mode, the engine pools the data for evaluation into a single individual log likelihood function that contains no random effect parameters, but respects inter-individual differences in dosing and covariate values.
The engine minimizes the exact negative log likelihood, either as a Gaussian or a user-specified function. No approximations are necessary since there is no population distribution and hence no joint likelihood to integrate. The minimization is performed by the same quasi-Newton algorithm used in the other engines. As with FO, FOCE ELS, and Laplacian, iterations simply correspond to iterations of the quasi-Newton optimization algorithm. Also as with FO, in principle only a single pass through the quasi-Newton method is required, but the Phoenix implementation repeats the optimization from the final value of the previous run until successive runs yield the same log likelihood to within a tolerance of 0.001.
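The following is a minimal runnable sketch in Python (not Phoenix code) of a naive pooled fit: all observations are pooled as if from one individual, no ETAs appear, and the exact Gaussian negative log likelihood (up to an additive constant) is minimized over THETA and SIGMA. The mono-exponential model and the pooled data values are assumptions made for the example:

import numpy as np
from scipy.optimize import minimize

t = np.array([1.0, 2.0, 4.0, 1.0, 2.0, 4.0])         # pooled times from all subjects
y = np.array([0.72, 0.50, 0.28, 0.66, 0.48, 0.25])   # pooled observations

def pooled_nll(params):
    theta_cl, log_sigma = params
    sigma = np.exp(log_sigma)                         # keep SIGMA positive
    pred = np.exp(-theta_cl * t)                      # toy structural model, no ETAs
    resid = (y - pred) / sigma
    # exact Gaussian negative log likelihood, constant terms omitted
    return 0.5 * np.sum(resid ** 2) + len(y) * np.log(sigma)

# the same quasi-Newton style of minimization used by the other engines
res = minimize(pooled_nll, x0=[0.5, np.log(0.2)], method="BFGS")
theta_hat, sigma_hat = res.x[0], np.exp(res.x[1])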
Iterative Two-Stage — Expectation-Maximization (IT2S-EM)
Applies to all types of data, including continuous data modeled with a Gaussian (normal) likelihood, as well as count, categorical, and time-to-event data for which the user must specify a likelihood function. IT2S-EM iteratively performs IT2S and EM-like steps, attempting to improve the approximate marginal likelihood at each iteration. It is not a true EM engine, such as that in MONOLIX, in that the THETA and SIGMA updates follow an iterative two-stage strategy, while the OMEGA update uses an EM strategy. Also, unlike true EM, the ETA estimates are modes of the joint density, whereas EM uses means.
The iteration sequence is as follows:
Conditional step: for current values of (THETA, SIGMA, OMEGA), compute an optimal ETA (also known as an empirical Bayesian or POSTHOC estimate) for each individual. These ETAs maximize the joint likelihood defined by the product of the distribution of the individual residuals, conditioned on ETA, and the population distribution of ETAs. Algorithms such as IT2S-EM, FOCE L-B, FOCE ELS, Laplacian, and Adaptive Gaussian Quadrature all require this same joint likelihood optimization step and are called 'conditional methods', since evaluating the approximate marginal likelihood requires model predictions conditioned on the ETA values computed by optimizing the joint likelihood.
Also compute covariance (uncertainty) estimates of the ETAs numerically by computing second derivatives of the joint log likelihood function with respect to the ETAs at the optimal ETA values. As a by-product of this computation, the Laplacian approximation to the marginal log likelihood function is obtained for the current (THETA, SIGMA, OMEGA) values.
Compute new estimates of THETA and SIGMA given the ETAs by optimizing the joint log likelihood function with respect to (THETA, SIGMA) with OMEGA and ETAs frozen at current values.
Compute new estimates of OMEGA from the ETAs and the uncertainties on the ETAs using the standard EM OMEGA update formula.
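One common form of this update, consistent with the description above (notation assumed here), is

\[
\Omega_{\text{new}} \;=\; \frac{1}{N} \sum_{i=1}^{N} \left( \hat\eta_i \hat\eta_i^{T} + C_i \right),
\]

where N is the number of subjects, \hat\eta_i are the optimal ETAs from step 1, and C_i are the ETA covariance (uncertainty) estimates from step 2.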
The progress of the computation, in terms of the value of the current iteration's negative Laplacian log likelihood, is displayed in the UI progress bar. Usually the likelihood improves from iteration to iteration, but there is no theoretical guarantee of this happening. The iterations stop based on lack of progress in the log likelihood over several iterations. This can indicate either convergence or, in some cases, oscillatory or even divergent behavior. The best likelihood solution obtained before termination is reported. Since the IT2S-EM fit is not an accurate likelihood maximum, standard error results are not reported, as they would also be inaccurate, and possibly not meaningful or even computable.
Often IT2S-EM makes rapid progress during the first few iterations even when the overall sequence of iterates does not converge. A useful strategy regardless of the convergence behavior can be to run a few iterations of IT2S-EM to get an improved starting solution for other engines.
Quasi-Random Parametric Expectation-Maximization (QRPEM)
QRPEM is a member of a general class of NLME estimation procedures known as EM methods (in addition to QRPEM in Phoenix NLME, MCPEM from S-ADAPT, SAEM from MONOLIX and NONMEM, and IMP from NONMEM are some of the other currently available EM variants for PK/PD NLME estimation). EM methods for NLME are based on the observation that maximum likelihood estimation would usually become much easier if the true values of the structural parameters were known for each subject. In the simplest case, the maximum likelihood estimates of the fixed and random effects parameters can then be obtained in a single simple step from the empirical mean and covariance matrix of the known structural parameters.
While the true values of the structural parameters are generally not known, a posterior distribution of the structural parameters (or equivalently, the random effects) can easily be computed for each subject for a given current estimate of the fixed and random effects parameters. Accurately computed means and covariances of these posterior distributions form the basis for a simple computation of updated fixed and random effect estimates. These updated estimates can be shown to have an improved likelihood relative to the starting estimates as long as the posterior means and covariance computations are sufficiently accurate. This procedure can be iterated and will usually converge to the desired maximum likelihood estimates of all parameters.
One major advantage of EM methods is that no formal numerical optimization procedure is necessary to optimize the overall likelihood (or approximation to the likelihood). This contrasts with methods such as FO, FOCE ELS, and LAPLACIAN, which rely on formal numerical optimization using numerical derivatives applied to an approximate likelihood. Numerical optimization procedures, particularly in combination with numerical derivatives, are relatively fragile and can easily fail or be unstable. In contrast, the EM procedures do not rely on numerical derivatives and numerical optimization, but rather on numerical integration to obtain the means and covariances of the posteriors. Numerical integration is inherently much more stable and reliable than numerical differentiation and optimization.
A second advantage of EM methods is that they may be made as accurate as desired (i.e., they can produce estimates arbitrarily close to the true maximum likelihood estimate). This is done by simply increasing the accuracy of the numerical integration procedure, typically by increasing the number of points at which the integrand is sampled. This contrasts with FO, FOCE ELS, and LAPLACIAN, which are inherently limited in accuracy by the particular likelihood approximations they employ, and which may produce results quite different from the true maximum likelihood estimates.
The key step in most EM methods is computing the means and covariances of the posterior distributions. One common approach is to use Monte Carlo (MC) sampling of the posteriors, assisted by importance sampling techniques. In this case the method is usually called MCPEM (or, in the case of NONMEM, IMP). Each sample is drawn from a convenient importance sampling distribution, such as a multivariate normal that approximates the target posterior, and then each sample is weighted by the likelihood ratio of the target distribution to the importance sampling distribution, evaluated at that sample. The means and covariances of the weighted samples are then used as approximations to the desired true means and covariances of the posteriors. Accuracy can be improved by simply taking more samples.
QRPEM is very similar to importance sampling-based MCPEM, with the exception that samples are no longer randomly drawn but rather taken from a low-discrepancy quasi-random sequence (QRPEM uses a Sobol sequence, with an option for Owen or TF scrambling). For purposes of numerical integration, quasi-random sequences fill the domain of interest much more uniformly than random sequences, and usually provide far more accurate integral values for a given sample size.
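The following is a minimal runnable sketch in Python (not Phoenix code) of this kind of quasi-random importance sampling for a single subject with one ETA: scrambled Sobol points are mapped through a normal importance distribution, weighted by the ratio of the joint likelihood to the importance density, and the weighted mean and variance approximate the posterior mean and variance of ETA. The toy model, data, and importance distribution are assumptions made for the example:

import numpy as np
from scipy.stats import norm, qmc

omega, sigma = 0.25, 0.1                        # current OMEGA (variance) and residual SD
t = np.array([1.0, 2.0, 3.0])
y = np.array([0.74, 0.52, 0.37])                # one subject's observations

def joint_loglik(eta):
    # toy mono-exponential structural model with individual rate exp(eta) * 0.3
    pred = np.exp(-np.exp(eta) * 0.3 * t)
    return norm.logpdf(y, loc=pred, scale=sigma).sum() + norm.logpdf(eta, scale=np.sqrt(omega))

# scrambled Sobol points mapped to a normal importance distribution centered at 0
sobol = qmc.Sobol(d=1, scramble=True, seed=1)
u = sobol.random_base2(m=10)                    # 2**10 quasi-random points in (0, 1)
imp_sd = 0.6                                    # importance distribution SD (assumed)
etas = norm.ppf(u[:, 0]) * imp_sd

log_w = np.array([joint_loglik(e) for e in etas]) - norm.logpdf(etas, scale=imp_sd)
w = np.exp(log_w - log_w.max())
w /= w.sum()                                    # normalized importance weights

post_mean = np.sum(w * etas)                    # weighted posterior mean of ETA
post_var = np.sum(w * (etas - post_mean) ** 2)  # weighted posterior variance of ETA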
Many models contain features such as nonlinear covariate models, fixed effects not paired with a random effect in a structural parameter definition, or certain types of residual error model parameters. Such models require an auxiliary estimation procedure to obtain estimates for the fixed effects associated with these features. Generally this involves solving a simple but potentially quite large likelihood optimization problem in those parameters, where each sample from each subject contributes a term, which can make the problem unnecessarily computationally intensive. In these cases QRPEM applies a resampling procedure called SIR (Sampling-Importance-Resampling) to prune the terms to a much smaller and more manageable number in a theoretically valid manner. This greatly accelerates the auxiliary procedure without significant loss of accuracy.
Like AGQ, the other accurate likelihood method in Phoenix, QRPEM can be applied to both Gaussian and user-supplied log likelihood models. As the number of random effects increases, QRPEM becomes increasingly faster and more stable relative to AGQ. However, unlike AGQ, there are currently two limitations on the types of models that can be handled directly with QRPEM. The first limitation is that only linear covariate models in a structural parameter definition are allowed (the covariate model must be linear either before or after a log transform of the structural parameter definition). In general, however, models with nonlinear covariate models can easily be handled in QRPEM by a simple manual restructuring of the text model so that the nonlinear covariate model is applied outside of the initial structural parameter definition rather than inside it. The second limitation is that if any covariate appearing in a covariate effect has more than one value for any subject, such as a time-varying covariate, QRPEM will not run. If a model of either unhandled type is encountered, QRPEM immediately stops with an error message.
See also the “Structural parameters and QRPEM PML example”.
Adaptive Gaussian Quadrature (AGQ)
Adaptive Gaussian quadrature is a generalization of the Laplacian method (and, when FOCE Hess is selected, of the FOCE ELS method) that uses numerical integration techniques to improve the likelihood approximation. It is applicable to both Gaussian and user-supplied log likelihood cases.
When Laplacian with FOCE Hess is selected with NAGQ=1, the resulting method is the same as FOCE ELS and very similar to NONMEM's FOCE engine with interaction. When FOCE Hess is not selected and NAGQ=1, it is similar to NONMEM's Laplacian engine. When NAGQ is set to a value greater than 1, the method has no NONMEM equivalent, and the quality of the likelihood approximation is improved over simple Laplacian or FOCE ELS.
The main difference is that the Laplacian approximation is replaced by a numerical integration step that samples the joint log likelihood function at a grid of ETA values in addition to the ETAs at the maximum. The initial steps are identical: a conditional step to obtain optimal ETAs, followed by a numerical second-derivative evaluation at the optimal ETAs to compute a Gaussian kernel. In the simpler Laplacian case, the approximation is computed using only the joint likelihood value at the optimal ETA values in addition to the second-derivative values, whereas AGQ continues with additional evaluations at special Gaussian quadrature nodes to improve the accuracy of the approximation.
The general AGQ method allows the user to specify the number N of nodes along each ETA variable axis, with the accuracy of the approximation, as well as the computational cost, increasing with N^d, where d is the number of random effects (ETAs) for each individual. Because of this, AGQ is often most useful for improving estimation accuracy for models with small numbers d of random effects, as is often the case with user-supplied log likelihood functions, particularly for count data.
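In one dimension (d = 1), and in assumed notation, the adaptive Gauss-Hermite form of the approximation is

\[
\int e^{\,\ell_i(\eta)} \, d\eta \;\approx\; \sqrt{2}\,\hat\sigma_i \sum_{k=1}^{N} w_k \, e^{\,z_k^{2}} \, e^{\,\ell_i\!\left(\hat\eta_i + \sqrt{2}\,\hat\sigma_i z_k\right)},
\]

where z_k and w_k are the standard Gauss-Hermite nodes and weights, \hat\eta_i is the mode from the conditional step, and \hat\sigma_i^2 is the kernel variance from the second-derivative step. With N = 1 this reduces to the Laplace approximation, and in d dimensions the nodes form the N^d tensor-product grid whose cost is described above.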
In the special case of Gaussian data, the user can optionally specify the use of a FOCE approximation to compute the Gaussian kernel covariance matrix instead of numerical second derivatives. This is more stable than a numerical second derivative computation.
Nonparametric
Applicable to Gaussian or user-defined likelihood data. The nonparametric engine is intended to be run as a post-processor after one of the parametric engines has been run. It makes no assumptions regarding the random effects distribution, conceptually modeling the distribution as a discrete distribution on an arbitrarily fine grid in random effects space. It can be used, for example, to detect bimodality in a parameter such as a clearance. In the nonparametric log likelihood function, the parameters to be fit are the probabilities associated with each grid point in the random effects space.
If the grid is very fine, there can be an enormous number of these probabilities. However, it can be shown mathematically that at the maximum likelihood distribution almost all of the probabilities are zero, which can be used to simplify the computation. The optimal nonparametric distribution takes the form of a discrete distribution on at most N support points, where N is the number of subjects; that is, at most N of the probabilities are non-zero, regardless of how many starting grid points were used.
An iteration for the nonparametric engine involves:
Selection of a set of candidate support points, which usually includes all of the support points with non-zero probability from the previous iteration plus generation of some additional candidates that are likely to improve the likelihood of the nonparametric distribution.
Computation of the optimal probabilities on the candidate support points.
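The following is a minimal runnable sketch in Python of step 2; it is not the Phoenix primal-dual engine, and instead uses a simple EM-style multiplicative update purely to illustrate what optimizing probabilities on fixed candidate support points means. The per-subject likelihood matrix is a toy assumption:

import numpy as np

def optimize_support_probabilities(L, n_iter=500):
    # L[i, j] = likelihood of subject i's data at candidate support point j; the goal is
    # to maximize sum_i log(sum_j p_j * L[i, j]) over the probability simplex
    n_subj, n_support = L.shape
    p = np.full(n_support, 1.0 / n_support)        # start from uniform probabilities
    for _ in range(n_iter):
        mix = L @ p                                # marginal likelihood of each subject
        p = p * (L / mix[:, None]).mean(axis=0)    # EM-style update for the probabilities
    return p

# toy example: 3 subjects, 4 candidate support points
L = np.array([[0.9, 0.1, 0.2, 0.1],
              [0.1, 0.8, 0.3, 0.1],
              [0.2, 0.1, 0.9, 0.2]])
p_opt = optimize_support_probabilities(L)          # most of the mass ends up on a few points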
The user specifies the number of iterations to apply. On the first iteration, the support points are set at the optimal POSTHOC estimates from the initial parametric run. Any fixed effects associated with the residual error model or covariate models are frozen to the values from the parametric run for all of the iterations. A specially designed convex primal-dual optimization engine then computes optimal probabilities on these support points. The results of the first iteration are in principle the same as the results of the nonparametric algorithm in NONMEM. However, NONMEM cannot perform any additional iterations, and the final NONMEM support points are fixed at the parametric POSTHOC values, which can be highly suboptimal.
If subsequent iterations are desired, Phoenix first discards any current-iteration support points with a probability of zero and then introduces additional candidate support points; the primal-dual algorithm is then reapplied to compute a new discrete distribution, which in general includes at least some of the new candidate support points. From iteration to iteration the likelihood improves monotonically, and the support points migrate to optimal positions. The Phoenix algorithm is thus able to optimize both probabilities and support point positions over multiple iterations, whereas the NONMEM nonparametric algorithm can only perform a single pass that optimizes probabilities on support points fixed at the POSTHOC estimates from a preceding parametric run.
The primary raw result of the Phoenix nonparametric algorithm is the optimal discrete distribution of ETAs in terms of support points and associated probabilities. The means, the covariances (when there are multiple ETAs), and the marginal distributions of the ETAs under this distribution are reported. In addition to the optimal population distribution, each individual has a discrete posterior distribution from which a mean ETA value can be computed. Tables of nonparametric ETA means are produced, as are covariate vs. nonparametric ETA mean plots, which can be used to screen for potential covariate relationships.
Note: If the Phoenix Model engine gives an exception, it is a general exception caused by a bad fit to the data. Should an exception occur, try reconsidering the engine, the initial parameter estimates, and the number of compartments.