Linear mixed effects computations

The following sections cover the methods the Linear Mixed Effects object uses to create model results.

Restricted maximum likelihood

For linear mixed effects models that contain random effects, the first step in model analysis is determining the maximum likelihood estimates of the variance parameters associated with the random effects. This is accomplished in LinMix using Restricted Maximum Likelihood (REML) estimation.

The linear mixed effects model described under “General linear mixed effects model” is:

y = Xb + Zγ + e
V = Variance(y) = Z G Z^T + R

(1)

 

Let θ be a vector consisting of the variance parameters in G and R. The full maximum likelihood procedure (ML) would simultaneously estimate both the fixed effects parameters b and the variance parameters θ by maximizing the likelihood of the observations y with respect to these parameters. In contrast, restricted maximum likelihood estimation (REML) maximizes a likelihood that is a function only of the variance parameters θ and the observations y, and not of the fixed effects parameters. Hence, for models that contain no fixed effects, REML is the same as ML.

To obtain the restricted likelihood function in a form that is computationally efficient to solve, the data matrix [X | Z | y] is reduced by a QR factorization to an upper triangular matrix. Let Q = [Q0 | Q1 | Q2] be an orthogonal matrix, partitioned to conform with X, Z, and the residual space, such that premultiplying [X | Z | y] by Q^T yields an upper triangular matrix:

                    [ R00  R01  y0 ]
    Q^T [X | Z | y] = [  0   R11  yr ]
                    [  0    0   ye ]

where y0 = Q0^T y, yr = Q1^T y, and ye = Q2^T y.
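This reduction can be sketched with NumPy on a toy data matrix. The dimensions and the partition of the last column into y0, yr, and ye below are illustrative assumptions, not LinMix internals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design: 8 observations, 2 fixed-effect columns, 3 random-effect columns.
n, p, k = 8, 2, 3
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, k))
y = rng.standard_normal((n, 1))

# Augmented data matrix [X | Z | y].
A = np.hstack([X, Z, y])

# QR factorization: A = Q @ T with Q orthogonal and T = Q.T @ A upper triangular.
Q, T = np.linalg.qr(A, mode="complete")

# Partition of the last (y) column of T:
#   rows 0..p-1    -> y0 (depends on the fixed effects; ignored by REML)
#   rows p..p+k-1  -> yr (drives the random-effect variance terms)
#   rows p+k..n-1  -> ye (depends only on the residual variance)
y0 = T[:p, -1]
yr = T[p:p + k, -1]
ye = T[p + k:, -1]

# The orthogonal reduction preserves all cross-products: A'A == T'T.
assert np.allclose(A.T @ A, T.T @ T)
```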

REML estimation maximizes the likelihood of θ based on the components yr and ye of the reduced data matrix, and ignores y0. Since ye has a distribution that depends only on the residual variance, and since yr and ye are independent, the restricted log-likelihood separates into the two individual log-likelihood terms plus a constant:

log(L(θ; yr, ye)) = log(L(θ; yr)) + log(L(θ; ye)) – ((N – rank(X))/2) log(2π)

(2)

where:

log(L(θ; yr)) = –½ log(|Vr|) – ½ yr^T Vr^-1 yr

log(L(θ; ye)) = –(ne/2) log(σ²) – ½ (ye^T ye / σ²)

Vr is Variance(yr)

σ² is the residual variance

ne is the residual degrees of freedom

N is the number of observations used

For some balanced problems, ye is treated as a multivariate residual: it is rearranged into a matrix Ye whose rows are independent multivariate normal residuals, in order to increase the speed of the program. The log-likelihood can then be computed by the equivalent form:

log(L(θ; ye)) = –(ne/2) log(|V|) – ½ tr(V^-1 Ye^T Ye)

(3)

 

where V is the dispersion matrix for the multivariate normal distribution.

For more information on REML, see Corbeil and Searle (1976).

Newton algorithm for maximization of restricted likelihood

Newton’s algorithm is an iterative algorithm used for finding the minimum of an objective function. The algorithm needs a starting point, and then at each iteration, it finds the next point using:

θi+1 = θi – H^-1(θi) g(θi)

(4)

where H–1 is the inverse of the Hessian of the objective function, and g is the gradient of the objective function.

The algorithm stops when a convergence criterion is met, or when it is unable to continue.
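The plain Newton update in equation (4), together with a stopping rule of the form used later under “Convergence criterion,” can be sketched as follows; the quadratic test function is purely illustrative:

```python
import numpy as np

def newton_minimize(grad, hess, theta0, tol=1e-10, max_iter=50):
    """Plain Newton iteration: theta <- theta - H^-1 g,
    stopping when g' H^-1 g falls below tol."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        H = hess(theta)
        step = np.linalg.solve(H, g)   # H^-1 g without forming the inverse
        if g @ step < tol:             # convergence criterion g' H^-1 g < tol
            break
        theta = theta - step
    return theta

# Illustrative strictly convex objective: f(t) = (t0 - 1)^2 + 2*(t1 + 3)^2.
grad = lambda t: np.array([2.0 * (t[0] - 1.0), 4.0 * (t[1] + 3.0)])
hess = lambda t: np.diag([2.0, 4.0])

theta_hat = newton_minimize(grad, hess, [10.0, 10.0])
# For a quadratic objective, Newton reaches the minimum (1, -3) in one step.
```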

In the Linear Mixed Effects object, a modification of Newton’s algorithm is used to minimize the negative restricted log-likelihood function with respect to the variance parameters θ, hence yielding the variance parameters that maximize the restricted likelihood. Since a constant term does not affect the minimization, the objective function used in the algorithm is the negative of the restricted log-likelihood without the constant term:

objective function = –log(L(θ; yr)) – log(L(θ; ye))

(5)

Newton’s algorithm is modified by adding a diagonal matrix to the Hessian when determining the direction vector, so that the diagonal matrix plus the Hessian is positive definite, ensuring a direction of descent.

θi+1 = θi – (Hi + Di)^-1 gi

(6)

The appropriate Di matrix is found by a modified Cholesky factorization as described in Gill, Murray and Wright (1981). Another modification to Newton’s algorithm is that, once the direction vector

d = –(Hi + Di)^-1 gi

(7)

has been determined, a line search is done as described in Fletcher (1980) to find an approximate minimum along the direction vector.
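A minimal sketch of these two modifications, assuming a simple diagonal inflation D = τI in place of the full Gill-Murray-Wright modified Cholesky factorization, and a crude Armijo backtracking rule in place of Fletcher’s line search:

```python
import numpy as np

def descent_direction(H, g, tau0=1e-3):
    """Return d = -(H + D)^-1 g with D = tau*I chosen so that H + D is
    positive definite -- a simplified stand-in for the modified Cholesky
    factorization of Gill, Murray and Wright (1981)."""
    tau = 0.0
    while True:
        try:
            # Cholesky succeeds only for positive definite matrices.
            C = np.linalg.cholesky(H + tau * np.eye(len(g)))
            break
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, tau0)
    # Solve (H + tau*I) d = -g using the Cholesky factor.
    return -np.linalg.solve(C.T, np.linalg.solve(C, g))

def line_search(f, theta, d, g, alpha=1.0, shrink=0.5, c=1e-4):
    """Crude Armijo backtracking along d: halve the step until the
    objective decreases sufficiently."""
    while f(theta + alpha * d) > f(theta) + c * alpha * (g @ d):
        alpha *= shrink
    return alpha

# An indefinite Hessian, for which the plain Newton step may go uphill:
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
d = descent_direction(H, g)
assert g @ d < 0   # descent direction despite the indefinite Hessian

# Backtracking demo on f(t) = t't from theta = (1, 1) along the negative gradient:
f = lambda t: t @ t
theta = np.array([1.0, 1.0])
alpha = line_search(f, theta, -2 * theta, 2 * theta)
```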

Note from the discussion of REML estimation that the restricted log-likelihood is a function of the variance matrix Vr and the residual variance, which are in turn functions of the parameters θ. The gradient and Hessian of the objective function with respect to θ are therefore functions of the first and second derivatives of the variances with respect to θ. Since the variances are determined by the structures specified for G and R (e.g., Variance Components, Unstructured, etc.), the first and second derivatives of the variance can be evaluated in closed form, and hence the gradient and Hessian are evaluated in closed form in the optimization algorithm.

Starting values

Newton’s algorithm requires initial values for the variance parameters θ. The user can supply these in the Linear Mixed Effects object’s General Options tab, by clicking the Generate button under Initial Variance Parameters. More often, LinMix will determine initial values. For variance structures that are linear in θ (Variance Components, Unstructured, Compound Symmetry, and Toeplitz), the initial values are determined by the method of moments. If the variance structure is nonlinear, then LinMix will calculate a “sample” variance by solving for the random effects and taking their sums of squares and cross-products. The sample variance is then equated with the parametric variance, and the system of equations is solved to obtain initial parameter estimates. If either method fails, LinMix will use values that should yield a reasonable variance matrix, e.g., 1.0 for variance components and standard deviations, 0.01 for correlations.

Sometimes the method of moments will yield estimates of θ that are REML estimates. In this case the program does not need to iterate.

Convergence criterion

The convergence criterion used in the Linear Mixed Effects object is that the algorithm has converged if:

g^T(θ) H^-1(θ) g(θ) < ε

(8)

where ε is the convergence criterion specified on the General Options tab. The default is 1×10^-10.

The possible convergence outcomes from Newton’s algorithm are:

Newton's algorithm converged with a final Hessian that is positive definite, indicating successful convergence at a local, and hopefully global, minimum.

Newton's algorithm converged with a modified Hessian, indicating that the model may be over-specified. The output is suspect. The algorithm converged, but not at a local minimum. It may be in a trough, at a saddle point, or on a flat surface.

Initial variance matrix is not positive definite. This indicates an invalid starting point for the algorithm.

Intermediate variance matrix is not positive definite. The algorithm cannot continue.

Failed to converge in allocated number of iterations.

Satterthwaite approximation for degrees of freedom

The linear model used in linear mixed effects modeling is:

Y ~ N(Xb, V)

(9)

where X is a matrix of fixed known constants, b is an unknown vector of constants, and V is the variance matrix depending on other unknown parameters. Wald (1943) developed a method for testing hypotheses in a rather general class of models. To test the hypothesis H0: Lb = 0, the Wald statistic is:

W = (Lb̂)^T [L(X^T V̂^-1 X)^- L^T]^-1 Lb̂

(10)

where:

b̂ is the estimator of b.

V̂ is the estimator of V.

L must be a matrix such that each row is in the row space of X.

The Wald statistic is asymptotically distributed under the null hypothesis as a chi-squared random variable with q=rank(L) degrees of freedom. In common variance component models with balanced data, the Wald statistic has an exact F-distribution. Using the chi-squared approximation results in a test procedure that is too liberal. LinMix uses a method based on analysis techniques described by Allen and Cady (1982) for finding an F distribution to approximate the distribution of the Wald statistic.
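Equation (10) can be sketched with NumPy for an illustrative generalized least squares fit. Here V is taken as known (the identity), whereas LinMix plugs in its REML estimate, and the design, sample size, and contrast below are assumptions for the example only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative design: intercept plus one covariate; V known for simplicity.
n = 20
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
V = np.eye(n)                       # i.i.d. errors for this sketch
beta_true = np.array([1.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

Vinv = np.linalg.inv(V)
XtVX = X.T @ Vinv @ X
beta_hat = np.linalg.solve(XtVX, X.T @ Vinv @ y)   # GLS estimate of b

# Test H0: Lb = 0 for the slope; the row of L lies in the row space of X.
L = np.array([[0.0, 1.0]])
C = L @ np.linalg.pinv(XtVX) @ L.T                 # L (X'V^-1 X)^- L'
W = float((L @ beta_hat) @ np.linalg.inv(C) @ (L @ beta_hat))
# W is asymptotically chi-squared with rank(L) = 1 degree of freedom.
```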

Giesbrecht and Burns (1985) suggested a procedure to determine denominator degrees of freedom when there is one numerator degree of freedom. Their methodology applies to variance component models with unbalanced data. Allen derived an extension of their technique to the general mixed model.

A one degree of freedom (df) test is synonymous with L having one row in equation (10). When L has a single row, the Wald statistic can be re-expressed as:

W = (Lb̂)² / [L(X^T V̂^-1 X)^- L^T]

(11)

Consider the right side of equation (11). The distribution of the numerator is approximated by a chi-squared random variable with 1 df. The objective is to find ν such that the denominator is approximately distributed as a chi-squared variable with ν df. If:

ν [L(X^T V̂^-1 X)^- L^T] / [L(X^T V^-1 X)^- L^T] ~ χ²(ν)

(12)

is true, then:

ν = 2 [L(X^T V^-1 X)^- L^T]² / Var[L(X^T V̂^-1 X)^- L^T]

(13)

The approach is to use equation (13) to solve for ν.

Find an orthogonal matrix R and a diagonal matrix Λ such that:

R^T [L(X^T V̂^-1 X)^- L^T] R = Λ

(14)

The Wald statistic, equation (10), can be re-expressed as:

W = Σ (ru^T Lb̂)² / λu,   summed over u = 1, …, q

(15)

where ru is the u-th column of R, and λu is the u-th diagonal element of Λ.

Each term in equation (15) is similar in form to equation (11). Assuming that the u-th term is distributed F(1, νu), the expected value of W is:

E{W} = Σ νu / (νu – 2),   summed over u = 1, …, q

(16)

Assuming W is distributed F(q, ν), its expected value is qν/(ν – 2). Equating expected values and solving for ν, one obtains:

ν = 2 E{W} / (E{W} – q)

(17)
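The final combination step, equations (16) and (17), can be sketched in a few lines. The per-term degrees of freedom νu are assumed to have been obtained already by the one-df Giesbrecht-Burns method:

```python
def satterthwaite_df(nu_terms):
    """Combine per-term denominator df (each term assumed F(1, nu_u))
    into a single denominator df for the q-df Wald F test, per
    equations (16)-(17): E{W} = sum nu_u/(nu_u - 2), nu = 2 E{W}/(E{W} - q)."""
    q = len(nu_terms)
    ew = sum(nu / (nu - 2.0) for nu in nu_terms)   # equation (16)
    return 2.0 * ew / (ew - q)                     # equation (17)

# With a single term of 10 df, the combined df is 10 again, as it should be.
print(satterthwaite_df([10.0]))   # -> 10.0
```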

Linear mixed effects scenario

The following example is used as a framework for discussing the functions and computations of the Linear Mixed Effects object.

An investigator wants to estimate the difference between two treatments for blood pressure, labeled A and B. Because responses are fairly heterogeneous among people, the investigator decided that each study subject will be his own control, which should increase the power of the study. Therefore, each subject will be exposed to treatments A and B. Since the treatment order might affect the outcome, half the subjects will receive A first, and the other half will receive B first. For the analysis, note that this crossover study has two periods, 1 and 2, and two sequence groups, AB and BA.

The model will be of the form:

yijkm = μ + pi + dj + tq + Sk(j) + eijkm

(18)

where:

i is the period index (1 or 2)

j is the sequence index (AB or BA)

k is the subject within sequence group index (1…nj)

m is the observation within subject index (1, 2)

q is the treatment index (1, 2)

μ is the intercept

pi is the effect due to period

dj is the effect due to sequence grouping

tq is the effect due to treatment, the purpose of the study

Sk(j) is the effect due to subject

eijkm is the random variation

Typically, eijkm is assumed to be normally distributed with mean zero and variance σ² > 0. The purpose of the study is to estimate t1 – t2 in all people with inclusion criteria similar to the study subjects’. Study subjects are assumed to be randomly selected from the population at large. Therefore, it is desirable to treat subject as a random effect. Hence, assume that the subject effect is normally distributed with mean zero and variance Var(Sk(j)) ≥ 0.

This mixing of the regression model and the random effects is known as a mixed effect model. The mixed effect model allows one to model the mean of the response as well as its variance. Note that in the model, equation (18), the mean is:

E{yijkm} = μ + pi + dj + tq

(19)

and the variance is given by:

Var{yijkm} = Var{Sk(j)} + σ²

(20)

E{yijkm} is known as the fixed effects part of the model, while Sk(j) + eijkm constitutes the random effects, since these terms represent random variables whose variances are to be estimated. A model whose only random term is the residual is known as a fixed effect model. A model without fixed effects is known as a random effect model. A model with both is a mixed effect model.

The data are shown in the table below. The Linear Mixed Effects object is used to analyze these data. The fixed and random effects of the model are entered on different tabs of the Diagram view. This is appropriate since the fixed effects model the mean of the response, while the random effects model the variance of the response.

Subject   Sequence   Period   Treatment   Response
1         AB         1        A           9.70
2         AB         1        A           10.24
3         AB         1        A           11.2
4         AB         1        A           7.82
5         AB         1        A           11.1
6         AB         1        A           9.31
7         BA         2        A           9.02
8         BA         2        A           7.88
9         BA         2        A           9.6
10        BA         2        A           9.63
11        BA         2        A           9.63
12        BA         2        A           9.91
7         BA         1        B           8.15
8         BA         1        B           9.23
9         BA         1        B           9.43
10        BA         1        B           10.13
11        BA         1        B           9.67
12        BA         1        B           11.34
1         AB         2        B           8.72
2         AB         2        B           11.28
3         AB         2        B           11.73
4         AB         2        B           9.77
5         AB         2        B           8.91
6         AB         2        B           8.31
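As a cross-check on the data above, a moment-based estimate of t1 – t2 can be computed by hand from within-subject period differences; in this balanced, complete-data case it should agree with the mixed-model fixed-effect estimate. The following pure-Python sketch is illustrative and is not LinMix output:

```python
# Within-subject analysis of the 2x2 crossover data from the table above.
# Each subject's period-1 minus period-2 difference removes the subject
# effect; contrasting the sequence-group means then isolates t1 - t2.

# (subject, sequence, period, treatment, response)
data = [
    (1, "AB", 1, "A", 9.70), (2, "AB", 1, "A", 10.24), (3, "AB", 1, "A", 11.2),
    (4, "AB", 1, "A", 7.82), (5, "AB", 1, "A", 11.1), (6, "AB", 1, "A", 9.31),
    (7, "BA", 2, "A", 9.02), (8, "BA", 2, "A", 7.88), (9, "BA", 2, "A", 9.6),
    (10, "BA", 2, "A", 9.63), (11, "BA", 2, "A", 9.63), (12, "BA", 2, "A", 9.91),
    (7, "BA", 1, "B", 8.15), (8, "BA", 1, "B", 9.23), (9, "BA", 1, "B", 9.43),
    (10, "BA", 1, "B", 10.13), (11, "BA", 1, "B", 9.67), (12, "BA", 1, "B", 11.34),
    (1, "AB", 2, "B", 8.72), (2, "AB", 2, "B", 11.28), (3, "AB", 2, "B", 11.73),
    (4, "AB", 2, "B", 9.77), (5, "AB", 2, "B", 8.91), (6, "AB", 2, "B", 8.31),
]

# Period-1 minus period-2 difference for each subject.
resp = {(subj, per): r for subj, _, per, _, r in data}
seq_of = {subj: seq for subj, seq, _, _, _ in data}
diffs = {"AB": [], "BA": []}
for subj in sorted(seq_of):
    diffs[seq_of[subj]].append(resp[(subj, 1)] - resp[(subj, 2)])

mean = lambda xs: sum(xs) / len(xs)
# For AB subjects the difference is (p1-p2)+(tA-tB); for BA it is
# (p1-p2)+(tB-tA), so half the difference of the sequence-group means
# estimates tA - tB free of the period effect.
t_diff = (mean(diffs["AB"]) - mean(diffs["BA"])) / 2.0
```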

The Linear Mixed Effects object can fit linear models to Gaussian data. The mean and variance structures can be simple or complex. Independent variables can be continuous or discrete, and no distributional assumption is made about them.

This scenario provides a context for the linear mixed effects model.

 


Last modified date:6/26/19
Certara USA, Inc.
© 2019 Certara USA, Inc. All rights reserved.