
Least squares means

Sometimes it is desirable to see the mean response for each level of a classification variable or for each combination of levels for two or more classification variables. If there are covariables with unequal means for the different levels, or if there are unbalanced data, the subsample means are not estimates that can be validly compared. This section describes least squares means, statistics that make proper adjustments for unequal covariable means and unbalanced data in the linear mixed effects model.

Consider a completely randomized design with a covariable. The model is:

$$y_{ij} = a_i + b\,x_{ij} + e_{ij}$$

where y_ij is the observed response for the jth individual on the ith treatment; a_i is the intercept for the ith treatment; x_ij is the value of the covariable for the jth individual on the ith treatment; b is the slope with respect to x; and e_ij is a random error with zero expected value. Suppose there are two treatments, the average of the x_1j is 5, and the average of the x_2j is 15. The respective expected values of the sample means are a₁ + 5b and a₂ + 15b. These are not comparable because b has a different coefficient in each. Instead, one can estimate a₁ + 10b and a₂ + 10b, where the overall mean of the covariable (10 when the two treatments have equal numbers of observations) is used in each linear combination.
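The adjustment can be checked numerically. The following is a minimal sketch in Python with NumPy, not LinMix output: the intercepts, slope, and data values are hypothetical and noise-free, chosen only so that the covariable averages are 5 and 15 as in the example.

```python
import numpy as np

# Hypothetical noise-free data: two treatments, covariable means 5 and 15.
a_true = np.array([2.0, 3.0])   # treatment intercepts (assumed values)
b_true = 0.5                    # common slope (assumed value)
x = np.array([3, 4, 5, 6, 7, 13, 14, 15, 16, 17], dtype=float)
trt = np.array([0] * 5 + [1] * 5)
y = a_true[trt] + b_true * x

# Design matrix: one indicator column per treatment, then the covariable.
X = np.column_stack([trt == 0, trt == 1, x]).astype(float)
est, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat = est[:2], est[2]

# Raw means mix the treatment effect with the covariable difference.
raw_means = np.array([y[trt == i].mean() for i in (0, 1)])

# Least squares means evaluate both treatments at the overall covariable mean.
ls_means = a_hat + b_hat * x.mean()

print("raw means:", raw_means)
print("least squares means:", ls_means)
```

With these values the raw means differ by 6, mostly because of the covariable, while the least squares means differ by 1, the difference between the two intercepts.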

Now consider a 2 × 2 factorial design. The model is:

$$y_{ijk} = m + a_i + b_j + e_{ijk}$$

where y_ijk is the observed response for the kth individual on the ith level of factor A and the jth level of factor B; m is the overall mean; a_i is the effect of the ith level of factor A; b_j is the effect of the jth level of factor B; and e_ijk is a random error with zero expected value. Suppose there are six observations for the combinations where i = j and four observations for the combinations where i ≠ j. The expected value of the average of all values on level 1 of A is m + (0.6 b₁ + 0.4 b₂) + a₁, and the expected value of the average of all values on level 2 of A is m + (0.4 b₁ + 0.6 b₂) + a₂. Thus, sample means cannot be used to compare levels of A because they contain different functions of b₁ and b₂. Instead, one compares the linear combinations:

$$m + (0.5\,b_1 + 0.5\,b_2) + a_1 \quad\text{and}\quad m + (0.5\,b_1 + 0.5\,b_2) + a_2$$
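The effect of the unequal cell counts can be illustrated with a short Python sketch. The cell counts (6, 4, 4, 6) come from the example; the parameter values for m, a, and b are hypothetical, and the equal weights of 0.5 are the weights that balanced data would give.

```python
import numpy as np

# Cell counts from the example: rows are levels of A, columns are levels of B.
counts = np.array([[6, 4],
                   [4, 6]])

# Hypothetical parameter values, for illustration only.
m = 10.0
a = np.array([1.0, 2.0])   # effects of factor A
b = np.array([0.0, 5.0])   # effects of factor B

# Weights on b1 and b2 in the expected sample mean of each level of A.
w_sample = counts / counts.sum(axis=1, keepdims=True)

# Balanced-data weights: each level of B counts equally.
w_lsm = np.full((2, 2), 0.5)

expected_sample_means = m + a + w_sample @ b
ls_means = m + a + w_lsm @ b

print("weights in sample means:\n", w_sample)
print("expected sample means:", expected_sample_means)
print("least squares means:", ls_means)
```

Here the expected sample means differ by 2, while the equally weighted combinations differ by 1, which is a₂ − a₁; the factor B effects cancel only under equal weighting.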

The preceding examples constructed linear combinations of parameters that, even with unbalanced data, represent the expected values of the sample means that balanced data would give. This is the idea behind least squares means.

Least squares means are given in the context of a defining term, though the process can be repeated for different defining terms for the same model. The defining term must contain only classification variables, and it must be one of the terms in the model. Treatment is the defining term in the first example, and factor A is the defining term in the second example. When least squares means are requested, LinMix automatically generates the coefficients l_j of the linear combination and processes them almost as it would process the coefficients specified in an estimate statement.

The rest of this section describes how the linear combinations of elements of b that represent least squares means are generated. One set of coefficients is created for each combination of levels of the classification variables in the defining term. Within a set, the indicator variables of the classification variables in the defining term are 1 for the levels in that combination and 0 otherwise. For every variable that is in the model but not in the defining term, the coefficient is the average value of the variable. The average value of a numeric variable (covariable) is its average over all cases used in the model fitting. For a classification variable with k levels, the average of each indicator variable is taken to be 1/k, which would be the actual average if the data were balanced.

The values of all variables in the model have now been defined. If some terms in the model are products, the products are formed using the same rules used for constructing rows of the X matrix, as described in the “Fixed effects specification” section. It is possible that some least squares means are not estimable.

For example, suppose the fixed portion of the model is: Drug + Form + Age + Drug*Form

To get means for each level of Drug, the defining term is Drug. Since Drug has three levels, three sets of coefficients are created. Consider the coefficients associated with the first level of Drug, DrugA. The first coefficient is 1, for the implied intercept. The next three coefficients are 1, 0, and 0, the indicator variables associated with DrugA. Form is not in the defining term, so average values are used: the next four coefficients are all 0.25, the average of each indicator variable of a four-level classification variable with balanced data. The next coefficient is 32.17, the average of Age. The next twelve elements are:

$$(1, 0, 0) \otimes (0.25, 0.25, 0.25, 0.25) = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0, 0, 0)$$

The final result is shown in the DrugA column in the following table. The results for DrugB and DrugC are also shown in the table. No new principles would be illustrated by finding the coefficients for the Form least squares means. The coefficients for the Drug*Form least squares means would be like representative rows of X except that Age would be replaced by the average of Age.

Coeff_LSM
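The construction of the Drug coefficients can be sketched in Python with NumPy. This is an illustration of the rules described above, not LinMix code: the function name `drug_lsm_coefficients` is hypothetical, and the column order of the Drug*Form block (Drug varying slowest, Form fastest) is an assumption about how the X matrix is laid out.

```python
import numpy as np

N_DRUG, N_FORM = 3, 4   # number of levels of Drug and Form, from the example
AGE_MEAN = 32.17        # average of Age, from the example

def drug_lsm_coefficients(level):
    """Coefficients for the least squares mean of one Drug level (0-based)."""
    # Drug is in the defining term: use its indicator variables.
    drug = np.zeros(N_DRUG)
    drug[level] = 1.0
    # Form is not in the defining term: use the balanced average 1/k.
    form = np.full(N_FORM, 1.0 / N_FORM)
    # Drug*Form: products of the Drug and Form values.
    # Assumed ordering: Drug varies slowest, Form fastest.
    interaction = np.kron(drug, form)
    # Order: intercept, Drug, Form, Age, Drug*Form.
    return np.concatenate([[1.0], drug, form, [AGE_MEAN], interaction])

coef_a = drug_lsm_coefficients(0)   # DrugA
coef_b = drug_lsm_coefficients(1)   # DrugB
print(coef_a)
```

Each vector has 21 elements (1 + 3 + 4 + 1 + 12). Moving from DrugA to DrugB changes only the Drug indicators and the position of the nonzero block in the Drug*Form part.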