A Likelihood Ratio Test

David Allen
University of Kentucky
February 23, 2012
1 Introduction

Earlier presentations gave a procedure for finding an estimate, and its standard error, of a single linear combination of the elements of β. Depending on the context, these are used for testing a null hypothesis or establishing a confidence interval. Here the likelihood ratio test, used to test multiple null hypotheses simultaneously, is presented. All alternative hypotheses are assumed to be two-sided. The derivation is based on the fixed effects only model; adjustments for mixed models are considered later.
An Alternate Expression of the Linear Model

Heretofore, the representation of the linear model has been

  Y = Xβ + ε.    (1)

An alternate definition of the linear model is

  Y = μ + ε    (2)

where μ ∈ V(X). In both representations, Y is an n-component vector of responses, X is a model matrix, and ε ~ N_n(0, σ²I).
Coordinate free representation

The difference between the two representations is a matter of emphasis. In Equation (1) the columns of X are axes of a coordinate system and the elements of β are the coordinates of a point in the system. In Equation (2) the set of possible values of μ is important but not the specific axes.
Testing the null hypothesis

The subject of this section is testing the null hypothesis

  H₀ : μ ∈ V(X₀)    (3)

versus the alternative hypothesis

  H₁ : μ ∉ V(X₀)    (4)

where V(X₀) ⊂ V(X). Testing (3) is done using the likelihood ratio principle. For this, the likelihood function is needed.
The likelihood function

The density function for the multivariate normal distribution was given earlier. With n components and Σ = σ²I the density function becomes

  f(y) = (2πσ²)^(−n/2) exp(−‖y − μ‖² / (2σ²)).

This expression, viewed as a function of the parameters, is called the likelihood function:

  L(μ, σ²) = (2πσ²)^(−n/2) exp(−‖y − μ‖² / (2σ²))    (5)
The likelihood ratio

The likelihood ratio is

  λ = sup_ω L(μ, σ²) / sup_Ω L(μ, σ²)    (6)

where Ω is the full parameter space and ω is the parameter space under the null hypothesis. The null hypothesis is rejected for small values of λ. The likelihood ratio test is discussed by Casella and Berger [1, Section 8.2.1].
The log likelihood function

First, consider the denominator of the likelihood ratio (6). The logarithm of the likelihood function is easier to maximize than the likelihood function itself. The log likelihood function, apart from an additive constant, is

  log L(μ, σ²) = −(n/2) log σ² − ‖y − μ‖² / (2σ²)    (7)

The next step is to find values of σ² and μ ∈ V(X) such that Equation (7) is maximized.
The Estimate of σ²

Regardless of the value of σ², the log likelihood function is maximized with respect to μ when ‖y − μ‖² is minimized with respect to μ. From the vector spaces presentation, this minimum is attained at μ̂ = ŷ, where ŷ is the projection of y onto V(X). See also Scheffé [3, page 383]. Now let SSR = ‖y − ŷ‖² and substitute into Equation (7) to obtain

  log L(μ̂, σ²) = −(n/2) log σ² − SSR / (2σ²)    (8)
Take the derivative of Equation (8) with respect to σ²:

  d/dσ² log L(μ̂, σ²) = −n/(2σ²) + SSR/(2σ⁴).

Setting the result equal to zero and solving yields

  σ̂² = SSR / n.    (9)
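As a numerical sanity check (a sketch with made-up values of n and SSR, not taken from the text), one can confirm that σ̂² = SSR/n maximizes Equation (8) by comparing the log likelihood at the maximizer with nearby candidate values:

```python
import math

# Hypothetical values for illustration only.
n = 12
SSR = 0.927

def log_lik(s2):
    # Equation (8): log L(mu_hat, sigma^2) = -(n/2) log sigma^2 - SSR/(2 sigma^2)
    return -(n / 2) * math.log(s2) - SSR / (2 * s2)

s2_hat = SSR / n  # Equation (9)

# The log likelihood at SSR/n dominates every nearby candidate value.
for s2 in [0.5 * s2_hat, 0.9 * s2_hat, 1.1 * s2_hat, 2.0 * s2_hat]:
    assert log_lik(s2_hat) > log_lik(s2)
```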
The Denominator

Now μ̂ and σ̂² are substituted into Formula (5) to obtain the denominator of the likelihood ratio:

  L(μ̂, σ̂²) = (2πσ̂²)^(−n/2) exp(−n/2).    (10)
The Ratio

The mechanics of maximizing the numerator of (6) are identical to those for the denominator. Let ŷ₀ be the projection of y onto V(X₀) and SSR₀ = ‖y − ŷ₀‖². The likelihood ratio, after some simplification, is

  λ = (SSR / SSR₀)^(n/2)    (11)

The null hypothesis (3) is rejected if λ < c, where λ is from Equation (11) and c, called the critical value, is chosen so that the test has the specified probability of a type I error.
An Alternate Statistic

Define the sum of squares for hypothesis, SSH = SSR₀ − SSR, and

  F = (SSH / (r − r₀)) / (SSR / (n − r))    (12)

where r = rank(X) and r₀ = rank(X₀). An equivalent test is to reject the null hypothesis (3) if F > f, where f is chosen so that the test has the specified probability of a type I error. The following sections consider the computation of F and its distribution.
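As an illustration with made-up data (not from the text, assuming numpy is available), the following sketch computes SSR and SSR₀ by least squares projection and forms both λ from Equation (11) and F from Equation (12). The full model here is a line with intercept and slope; the reduced model is an intercept only.

```python
import numpy as np

# Hypothetical toy data for illustration only.
y = np.array([1.0, 2.0, 2.0, 3.0])
X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])  # full model
X0 = np.ones((4, 1))                                               # reduced model
n, r, r0 = 4, 2, 1

def ssr(M, y):
    # Squared distance from y to its projection onto V(M).
    yhat = M @ np.linalg.lstsq(M, y, rcond=None)[0]
    return float(np.sum((y - yhat) ** 2))

SSR, SSR0 = ssr(X, y), ssr(X0, y)
lam = (SSR / SSR0) ** (n / 2)                   # Equation (11)
F = (SSR0 - SSR) / (r - r0) / (SSR / (n - r))   # Equation (12)
```

Rejecting for small λ and rejecting for large F are the same test, since λ is a decreasing function of F.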
2 The transformed model approach

Select a matrix X₁ such that V([X₀ X₁]) = V(X). The choice X₁ = X always works and is particularly useful if you would like an expression for the null hypothesis in terms of β. Let R be an orthogonal matrix such that Rᵗ[X₀ X₁] is an echelon matrix. Partition R as

  R = [R₀ R₁ R∗].    (13)

The numbers of columns in the respective partitions of (13) are r₀, r − r₀, and n − r.
Components of the Transformed Model

Let Y₀ = R₀ᵗY, Y₁ = R₁ᵗY, and Y∗ = R∗ᵗY. The components of the transformed model are

          [ Y₀ ]            [ E₀₀ ]           [ E₀₀  E₀₁ ]
  RᵗY  =  [ Y₁ ],  RᵗX₀  =  [  0  ],  RᵗX  =  [  0   E₁₁ ].    (14)
          [ Y∗ ]            [  0  ]           [  0    0  ]
Projections

The definition of a projection in the presentation on vector spaces is repeated here: Let V_r ⊂ V_n and y ∈ V_n. There exist vectors u and z such that

  y = u + z    (15)
  u ∈ V_r    (16)
  z ⊥ V_r    (17)

This decomposition is unique. The vector u is called the projection of y onto V_r. See also Scheffé [3, page 383].
Projection onto RᵗX

Partition RᵗY as

         [ Y₀ ]   [ 0  ]
  RᵗY =  [ Y₁ ] + [ 0  ].    (18)
         [ 0  ]   [ Y∗ ]

Refer to Conditions (15)–(17). Verify that the first term on the right in Equation (18) is in V(RᵗX) and the second term is perpendicular to V(RᵗX). Thus SSR = ‖Y∗‖².
Projection onto RᵗX₀

Now partition RᵗY as

         [ Y₀ ]   [ 0  ]
  RᵗY =  [ 0  ] + [ Y₁ ].    (19)
         [ 0  ]   [ Y∗ ]

Verify that the first term on the right in Equation (19) is in V(RᵗX₀) and the second term is perpendicular to V(RᵗX₀). Thus SSR₀ = ‖Y₁‖² + ‖Y∗‖².
The test statistic

Thus SSH = SSR₀ − SSR = ‖Y₁‖² and the test statistic is

  F = (SSH / (r − r₀)) / (SSR / (n − r))    (20)

Exercise 2.1. Find the distribution of F in Equation (20).
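The transformed model approach can be sketched numerically (again with made-up toy data, assuming numpy): a complete QR decomposition of [X₀ X₁] supplies an orthogonal matrix whose transpose puts [X₀ X₁] in echelon form, so it plays the role of R, and the pieces of RᵗY give SSH and SSR directly.

```python
import numpy as np

# Hypothetical toy data: full model = intercept plus slope, reduced = intercept.
y = np.array([1.0, 2.0, 2.0, 3.0])
X0 = np.ones((4, 1))
X1 = np.array([[1.0], [2.0], [3.0], [4.0]])
n, r, r0 = 4, 2, 1

# Q is orthogonal and Q^t [X0 X1] is upper triangular (echelon),
# so Q serves as the matrix R of Equation (13).
Q, _ = np.linalg.qr(np.hstack([X0, X1]), mode="complete")
z = Q.T @ y  # the stacked components (Y0, Y1, Y*)

SSH = float(np.sum(z[r0:r] ** 2))       # ||Y1||^2
SSR = float(np.sum(z[r:] ** 2))         # ||Y*||^2
F = (SSH / (r - r0)) / (SSR / (n - r))  # Equation (20)
```

With this toy data the result agrees with the direct projection computation, as it must, since orthogonal transformations preserve squared lengths.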
3 A Completely Randomized Design

The principles are illustrated using data from a completely randomized design as given in Kuehl [2]:

  Bacteria
  Package condition   log(count/cm²)
  Plastic wrap        7.66  6.98  7.80
  Vacuum packaged     5.26  5.44  5.80
  Mixed gas           7.41  7.33  7.04
  CO₂                 3.51  2.91  3.66

We refer to this data as the bacteria data.
The full model

The matrix representation of the model for the bacteria data is

      [ 1 0 0 0 ]
      [ 1 0 0 0 ]
      [ 1 0 0 0 ]
      [ 0 1 0 0 ]
      [ 0 1 0 0 ]   [ μ₁ ]
  y = [ 0 1 0 0 ]   [ μ₂ ] + ε    (21)
      [ 0 0 1 0 ]   [ μ₃ ]
      [ 0 0 1 0 ]   [ μ₄ ]
      [ 0 0 1 0 ]
      [ 0 0 0 1 ]
      [ 0 0 0 1 ]
      [ 0 0 0 1 ]
The reduced model

If the means are all equal, the model reduces to

      [ 1 ]
      [ 1 ]
      [ 1 ]
      [ 1 ]
      [ 1 ]
  y = [ 1 ] μ + ε    (22)
      [ 1 ]
      [ 1 ]
      [ 1 ]
      [ 1 ]
      [ 1 ]
      [ 1 ]
The Defining Matrices

In the notation of the section on the likelihood ratio test, X₁ is the matrix in Equation (21), and X₀ is the matrix in Equation (22). The matrix [X₀ X₁ y] is transformed to

  [ E₀₀  E₀₁  Y₀ ]
  [  0   E₁₁  Y₁ ].
  [  0    0   Y∗ ]
The Transformed Matrix

Numerically, the transformed matrix is

  -3.464  -0.866  -0.866  -0.866  -0.866  -20.438
   0      -1.500   0.500   0.500   0.500   -3.160
   0       0      -1.414   0.707   0.707   -0.269
   0       0       0       1.225  -1.225    4.777
   0       0       0       0       0       -0.412
   0       0       0       0       0       -0.052
   0       0       0       0       0       -0.139
   0       0       0       0       0       -0.219
   0       0       0       0       0       -0.509
   0       0       0       0       0        0.344
   0       0       0       0       0       -0.256
   0       0       0       0       0        0.494
The analysis of variance table

The sums of squares, degrees of freedom, mean squares, and inference statistics are displayed in an analysis of variance table:

              Sum of    Degrees of   Mean
  Source      squares   freedom      square    F        p-value
  Treatment   32.873    3            10.958    94.584   0.0000
  Residuals    0.927    8             0.116

The null hypothesis is soundly rejected.
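The table can be reproduced from the bacteria data. The sketch below (assuming numpy and scipy are available) uses the fact that for a one-way layout the projection onto V(X) replaces each observation by its group mean and the projection onto V(X₀) replaces it by the grand mean:

```python
import numpy as np
from scipy import stats

# Bacteria data: rows are package conditions, columns are replicates.
groups = np.array([[7.66, 6.98, 7.80],   # plastic wrap
                   [5.26, 5.44, 5.80],   # vacuum packaged
                   [7.41, 7.33, 7.04],   # mixed gas
                   [3.51, 2.91, 3.66]])  # CO2
n, r, r0 = groups.size, 4, 1

fits = groups.mean(axis=1, keepdims=True)    # projection onto V(X): group means
grand = groups.mean()                        # projection onto V(X0): grand mean

SSR = float(np.sum((groups - fits) ** 2))    # residual sum of squares
SSR0 = float(np.sum((groups - grand) ** 2))
SSH = SSR0 - SSR                             # sum of squares for hypothesis

F = (SSH / (r - r0)) / (SSR / (n - r))       # Equation (12)
p = stats.f.sf(F, r - r0, n - r)             # upper-tail F probability
```

The computed SSH, SSR, and F agree with the table entries 32.873, 0.927, and 94.584 up to rounding.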
4 Mixed Models

To use the likelihood ratio technique with mixed models, one would need to consider each model separately. However, for balanced models, usable techniques arise from calculating the statistics assuming all effects are fixed, and then deriving the distributions of those statistics under the mixed model.
Randomized Complete Block Data

This example uses simulated data from a randomized complete block design with three treatments and four blocks. There are two objectives: to test the null hypothesis that the treatment means are equal, and to test the null hypothesis that σ_b² = 0.
The Reduced Matrix

The matrix [X₀ X Z Y] reduced by an orthogonal transformation is

  -3.464  -1.155  -1.155  -1.155  -0.866  -0.866  -0.866  -0.866  -15.339
   0      -1.633   0.816   0.816   0       0       0       0        1.753
   0       0      -1.414   1.414   0       0       0       0        0.807
   0       0       0       0       1.500  -0.500  -0.500  -0.500    0.506
   0       0       0       0       0      -1.414   0.707   0.707    2.671
   0       0       0       0       0       0      -1.225   1.225   -1.973
   0       0       0       0       0       0       0       0        0.130
   0       0       0       0       0       0       0       0       -2.047
   0       0       0       0       0       0       0       0       -0.660
   0       0       0       0       0       0       0       0        1.019
   0       0       0       0       0       0       0       0        0.699
   0       0       0       0       0       0       0       0        1.211

Addressing the objectives is an in-class exercise.
An Exercise

Exercise 4.1. Assume μ₁ = 4, μ₂ = 5, μ₃ = 6, σ_b² = 4, and σ² = 1.

1. Give the power of the test for equality of means.
2. Give the power of the test for σ_b² = 0.
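A sketch of part 1 (an assumption-laden illustration: a 0.05 level test, the balanced layout above with b = 4 blocks, and scipy available): under the alternative, the F statistic for equality of means has a noncentral F distribution with noncentrality parameter b Σᵢ(μᵢ − μ̄)²/σ².

```python
from scipy import stats

# Parameter values from Exercise 4.1.
mu = [4.0, 5.0, 6.0]
sigma2 = 1.0
t, b = 3, 4           # treatments, blocks
alpha = 0.05          # assumed significance level

mubar = sum(mu) / t
ncp = b * sum((m - mubar) ** 2 for m in mu) / sigma2  # noncentrality, here 8.0

df1, df2 = t - 1, (t - 1) * (b - 1)  # 2 and 6
crit = stats.f.ppf(1 - alpha, df1, df2)          # central F critical value
power = stats.ncf.sf(crit, df1, df2, ncp)        # P(noncentral F > crit)
```

Part 2 differs: for the balanced design the block F statistic under σ_b² > 0 is a scaled central F rather than a noncentral F, so its power calculation uses a rescaled central F probability.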
References

[1] George Casella and Roger L. Berger. Statistical Inference. Duxbury, second edition, 2002.

[2] Robert O. Kuehl. Design of Experiments: Statistical Principles of Research Design and Analysis. Duxbury Press, second edition, 2000.

[3] Henry Scheffé. The Analysis of Variance. John Wiley & Sons, Inc., New York, 1959.