Ch4. Distribution of Quadratic Forms in y

ST4233, Linear Models, Semester 1 2008-2009

1 Definition

Definition 1.1 If A is a symmetric matrix and y is a vector, the product

$$y'Ay = \sum_i a_{ii} y_i^2 + \sum_{i \neq j} a_{ij} y_i y_j$$

is called a quadratic form. If x is n × 1, y is p × 1, and A is n × p, the product

$$x'Ay = \sum_{i,j} a_{ij} x_i y_j$$

is called a bilinear form.

Note: y'Ay = y'A'y, so if we let B = (A + A')/2, then B' = B is symmetric and y'By = y'Ay. Thus, to study a quadratic form y'Ay, we may always assume that A is symmetric, A' = A.
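The symmetrization remark is easy to see numerically. Below is a minimal numpy sketch (the matrix A and vector y are arbitrary illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))  # an arbitrary, generally non-symmetric matrix
y = rng.normal(size=4)

B = (A + A.T) / 2            # symmetrized version: B' = B
# y'Ay and y'By agree, so the matrix of a quadratic form can be taken
# to be symmetric without loss of generality.
print(y @ A @ y)
print(y @ B @ y)             # same value up to floating-point error
```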

2 Mean and Variance of Quadratic Forms

Theorem 2.1 If y is a random vector with mean µ and covariance matrix Σ, and if A is a symmetric matrix of constants, then

E(y'Ay) = tr(AΣ) + µ'Aµ.  (1)

Proof: Since Σ = E(y − µ)(y − µ)' = E(yy') − µµ', we have E(yy') = Σ + µµ'. Since y'Ay is a scalar, it is equal to its trace. Hence

E(y'Ay) = E[tr(y'Ay)] = E[tr(Ayy')] = tr[A E(yy')] = tr[A(Σ + µµ')] = tr(AΣ) + µ'Aµ.

Note that since y'Ay is not a linear function of y, E(y'Ay) ≠ (Ey)'A(Ey).

Example 2.1 Assume that y_1, y_2, ..., y_n are i.i.d. with mean µ and variance σ². Then the sample variance

s² = [1/(n − 1)] Σ_i (y_i − ȳ)²  (2)

is an unbiased estimator of the population variance σ².

Proof: The numerator of (2) can be written as Σ_i (y_i − ȳ)² = y'(I − (1/n)J)y, where J = jj' and j = (1, 1, ..., 1)'. Thus, for use in (1), we have A = I − (1/n)J, Σ = σ²I, and µ = µj. Hence

E[Σ_i (y_i − ȳ)²] = tr[(I − (1/n)J)σ²I] + µj'(I − (1/n)J)µj
= σ² tr(I − (1/n)J) + µ²[j'j − (1/n)j'jj'j]
= σ²(n − 1) + µ²[n − (1/n)n²] = (n − 1)σ².

Therefore E(s²) = σ².
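Theorem 2.1 is straightforward to check by simulation. Here is a minimal Monte Carlo sketch; the particular µ, Σ, A, and number of draws are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
mu = np.array([1.0, -2.0, 0.5])
M = rng.normal(size=(p, p))
Sigma = M @ M.T                    # a valid (positive semidefinite) covariance
A = np.diag([2.0, 1.0, 3.0])       # any symmetric matrix of constants

y = rng.multivariate_normal(mu, Sigma, size=200_000)
qf = np.einsum('ij,jk,ik->i', y, A, y)    # y'Ay for each draw

print(qf.mean())                          # Monte Carlo estimate of E(y'Ay)
print(np.trace(A @ Sigma) + mu @ A @ mu)  # tr(A Sigma) + mu'A mu, eq. (1)
```

(Normality is not needed for Theorem 2.1; the multivariate normal is used only for convenient sampling.)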

Theorem 2.2 If y is N_p(µ, Σ), then

var(y'Ay) = 2 tr[(AΣ)²] + 4µ'AΣAµ.  (3)

Proof: See Searle (1971), p. 57.

Theorem 2.3 If y is N_p(µ, Σ), then cov(y, y'Ay) = 2ΣAµ.

Proof: See Rencher (2000), pp. 96-97.

Corollary 2.1 Let B be a k × p matrix of constants. Then cov(By, y'Ay) = 2BΣAµ.

Theorem 2.4 Let v = (y', x')' be a partitioned random vector with mean vector and covariance matrix as follows:

$$E\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \qquad \mathrm{cov}\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix},$$

where y is p × 1, x is q × 1, and Σ_yx is p × q. Let A be a q × p matrix of constants. Then

E(x'Ay) = tr(AΣ_yx) + µ_x'Aµ_y.

Proof: Exercise for students.

Example 2.2 Show that the population covariance σ_xy = E[(x − µ_x)(y − µ_y)] can be estimated unbiasedly by the sample covariance

S_xy = [1/(n − 1)] Σ_i (x_i − x̄)(y_i − ȳ),

where (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) is a bivariate random sample from a population with means µ_x and µ_y and covariance σ_xy.

3 Noncentral Chi-square Distribution

Definition 3.1 Suppose that y is N_n(µ, I_n). Then v = y'y is distributed as a noncentral χ² distribution with n degrees of freedom and noncentrality parameter λ = µ'µ/2, i.e., v ~ χ²(n, λ). When λ = 0, v is distributed as a central χ² distribution, i.e., v ~ χ²_n.
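The moment formulas in Theorems 2.2 and 2.3 can be checked the same way. A rough Monte Carlo sketch (the µ, Σ, A, and sample size are again arbitrary; agreement holds only up to simulation error):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
mu = np.array([0.5, 1.0, -1.0])
M = rng.normal(size=(p, p))
Sigma = M @ M.T
A = (M + M.T) / 2                  # some symmetric matrix of constants

y = rng.multivariate_normal(mu, Sigma, size=500_000)
qf = np.einsum('ij,jk,ik->i', y, A, y)

# Theorem 2.2: var(y'Ay) = 2 tr[(A Sigma)^2] + 4 mu'A Sigma A mu
print(qf.var())
print(2 * np.trace(A @ Sigma @ A @ Sigma) + 4 * mu @ A @ Sigma @ A @ mu)

# Theorem 2.3: cov(y, y'Ay) = 2 Sigma A mu, checked coordinate by coordinate
print(np.array([np.cov(y[:, j], qf)[0, 1] for j in range(p)]))
print(2 * Sigma @ A @ mu)
```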

Theorem 3.1 If v is distributed as χ²(n, λ), then

E(v) = n + 2λ,  var(v) = 2n + 8λ,  and

$$M_v(t) = \frac{1}{(1-2t)^{n/2}}\, e^{-\lambda[1 - 1/(1-2t)]}.$$

Proof: See Graybill (1976), p. 126.

The χ² distribution has an additive property:

Theorem 3.2 If v_1, v_2, ..., v_k are independently distributed as χ²(n_i, λ_i), then Σ_{i=1}^k v_i is distributed as χ²(Σ_{i=1}^k n_i, Σ_{i=1}^k λ_i).

4 Noncentral F Distribution

Definition 4.1 Suppose u is distributed as a noncentral chi-square, χ²(p, λ), v is a central chi-square, χ²_q, and u and v are independent. Then

z = (u/p)/(v/q)

is distributed as F(p, q, λ), the noncentral F-distribution with noncentrality parameter λ, where λ is the same noncentrality parameter as in the distribution of u. The mean of the distribution is

E(z) = [q/(q − 2)](1 + 2λ/p).

5 Noncentral t Distribution

Definition 5.1 If y is N(µ, 1), u is χ²_p, and y and u are independent, then

t = y/√(u/p)

is distributed as t(p, µ), the noncentral t-distribution with p degrees of freedom and noncentrality parameter µ. Note that if y is N(µ, σ²), the noncentrality parameter becomes µ/σ, since y/σ is distributed as N(µ/σ, 1).
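The moments in Theorem 3.1 can be checked by simulating v = y'y directly from Definition 3.1. A small sketch with an arbitrary mean vector (note that λ here follows the convention of these notes, λ = µ'µ/2; some software parameterizes the noncentral χ² by µ'µ instead):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
mu = np.array([1.0, 0.0, 2.0, -1.0, 0.5])
lam = mu @ mu / 2                  # lambda = mu'mu / 2, as in Definition 3.1

y = rng.normal(loc=mu, scale=1.0, size=(400_000, n))
v = (y ** 2).sum(axis=1)           # v = y'y ~ chi^2(n, lambda)

print(v.mean(), n + 2 * lam)       # E(v) = n + 2*lambda
print(v.var(), 2 * n + 8 * lam)    # var(v) = 2n + 8*lambda
```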

6 Distribution of Quadratic Forms

Theorem 6.1 Let y be distributed as N_p(µ, Σ), let A be a p × p symmetric matrix of constants of rank r, and let λ = (1/2)µ'Aµ. Then y'Ay is χ²(r, λ) if and only if Σ^{1/2}AΣ^{1/2} is idempotent.

Corollary 6.1 If y is N_p(0, I), then y'Ay is χ²_r if and only if A is idempotent of rank r.

Corollary 6.2 If y is N_p(µ, σ²I), then y'Ay/σ² is χ²(r, µ'Aµ/2σ²) if and only if A is idempotent of rank r.

Example 6.1 Given an i.i.d. sample y_1, y_2, ..., y_n with mean µ and variance σ², consider the sample variance S² = [1/(n − 1)] Σ_i (y_i − ȳ)², where y_i ~ N(µ, σ²). In matrix form, it can be written as

S² = [1/(n − 1)] y'(I − (1/n)J)y,

where I − (1/n)J is idempotent and of rank n − 1. We next find λ:

λ = µ'Aµ/2σ² = µ²[j'j − (1/n)j'Jj]/2σ² = 0.

Therefore, y'(I − (1/n)J)y/σ² = (n − 1)S²/σ² is distributed as χ²_{n−1}.

7 Independence of Linear Forms and Quadratic Forms

Lemma 7.1 A symmetric matrix A, of order n and rank r, can be written as A = LL', where L is n × r of rank r, i.e., L has full column rank.

Proof:

$$P'AP = \begin{pmatrix} D_r^2 & O \\ O & O \end{pmatrix}$$

for some orthogonal P, where D_r² is diagonal of order r. Hence

$$A = P \begin{pmatrix} D_r \\ O \end{pmatrix} \begin{pmatrix} D_r & O \end{pmatrix} P' = LL',$$

where L' = (D_r O)P' is of order r × n and of full row rank; i.e., L is of full column rank. Note also that LL' = A and L'L = D_r². Also, L is real only when A is non-negative definite, for only then are the non-zero elements of D_r² positive.

Theorem 7.1 (linear and quadratic) Suppose B is a k × p matrix of constants, A is a p × p symmetric matrix of constants, and y is distributed as N_p(µ, Σ). Then By and y'Ay are independent if and only if BΣA = O.
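Example 6.1 is easy to verify numerically: the centering matrix I − (1/n)J is idempotent of rank n − 1, and the simulated (n − 1)S²/σ² matches the first two moments of χ²_{n−1}. A sketch with arbitrary n, µ, and σ:

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu, sigma = 8, 3.0, 2.0
I, J = np.eye(n), np.ones((n, n))
A = I - J / n                      # the centering matrix of Example 6.1

print(np.allclose(A @ A, A))       # idempotent, as Corollary 6.2 requires
print(np.linalg.matrix_rank(A))    # rank n - 1

y = rng.normal(mu, sigma, size=(300_000, n))
v = np.einsum('ij,jk,ik->i', y, A, y) / sigma ** 2   # (n-1)S^2 / sigma^2

print(v.mean(), n - 1)             # central chi^2_{n-1}: mean n - 1
print(v.var(), 2 * (n - 1))        # and variance 2(n - 1)
```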

Proof: (Sufficiency: BΣA = O implies independence.) From the lemma, A = LL', where L is of full column rank. BΣA = O implies BΣLL'L(L'L)^{-1} = O, i.e., BΣL = O. Therefore cov(By, L'y) = BΣL = O. Hence, because y is normal, By and L'y are independent. Consequently By and y'Ay = (L'y)'(L'y) are independent.

(Necessity: the independence of By and y'Ay implies BΣA = O.) Independence gives cov(By, y'Ay) = 0, while Corollary 2.1 gives cov(By, y'Ay) = 2BΣAµ. Hence 2BΣAµ = 0, and since this is true for all µ, BΣA = O, and so the proof is complete.

Example 7.1 Given an i.i.d. sample y_1, y_2, ..., y_n with mean µ and variance σ², consider s² = [1/(n − 1)] y'(I − (1/n)J)y and ȳ = (1/n)j'y. By Theorem 7.1, ȳ is independent of s², since j'(I − (1/n)J) = 0'.

Theorem 7.2 (quadratic and quadratic) Let A and B be symmetric matrices of constants. If y is N_p(µ, Σ), then y'Ay and y'By are independent if and only if AΣB = O.

Proof: (Sufficiency: AΣB = O implies independence.) By the Lemma, we have A = LL' and B = MM', where each of L and M has full column rank. Therefore, if AΣB = O, then LL'ΣMM' = O, and because (L'L)^{-1} and (M'M)^{-1} exist, this means L'ΣM = O. Therefore cov(L'y, M'y) = L'ΣM = O. Hence, because y is normal, L'y and M'y are independent. Consequently y'Ay = y'LL'y and y'By = y'MM'y are independent.

(Necessity: the independence implies AΣB = O.) When y'Ay and y'By are independent, cov(y'Ay, y'By) = 0, so that var(y'Ay + y'By) = var(y'Ay) + var(y'By), i.e., var[y'(A + B)y] = var(y'Ay) + var(y'By). Applying equation (3) to all three terms in this result leads, after a little simplification, to

tr(ΣAΣB) + 2µ'AΣBµ = 0.

This is true for all µ, including µ = 0, so tr(ΣAΣB) = 0, and substituting back gives 2µ'AΣBµ = 0. This in turn is true for all µ, and so AΣB = O. Thus the theorem is proved.

Example 7.2 To illustrate Theorem 7.2, consider the partitioning Σ_i y_i² = Σ_i (y_i − ȳ)² + nȳ², i.e.,

y'y = y'(I − (1/n)J)y + y'[(1/n)J]y.

If y is N_n(µj, σ²I), then by Theorem 7.2, y'(I − (1/n)J)y and y'[(1/n)J]y are independent if and only if (I − (1/n)J)[(1/n)J] = O, which holds because J² = nJ.
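Both independence conditions in Examples 7.1 and 7.2 reduce to simple matrix identities, which the following sketch checks, together with a Monte Carlo look at the (near-zero) sample correlation of ȳ and s² (the sample sizes and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
I, J = np.eye(n), np.ones((n, n))
j = np.ones(n)
A = I - J / n

# Example 7.1 (Theorem 7.1 with B = (1/n) j'): j'(I - J/n) = 0'
print(np.allclose((j / n) @ A, 0))
# Example 7.2 (Theorem 7.2): (I - J/n)(J/n) = O, since J^2 = nJ
print(np.allclose(A @ (J / n), 0))

# Monte Carlo: the sample correlation of ybar and s^2 is near zero
y = rng.normal(1.0, 2.0, size=(200_000, n))
ybar = y.mean(axis=1)
s2 = y.var(axis=1, ddof=1)
print(np.corrcoef(ybar, s2)[0, 1])
```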

Theorem 7.3 (several quadratic forms) Let y be N_n(µ, σ²I), let A_i be symmetric of rank r_i for i = 1, 2, ..., k, and let y'Ay = Σ_{i=1}^k y'A_i y, where A = Σ_{i=1}^k A_i is symmetric of rank r. Then

(1) y'A_i y/σ² is χ²(r_i, µ'A_iµ/2σ²), i = 1, 2, ..., k, and
(2) y'A_i y and y'A_j y are independent for all i ≠ j, and
(3) y'Ay/σ² is χ²(r, µ'Aµ/2σ²)

if and only if any two of the following three statements are true:

(a) each A_i is idempotent,
(b) A_i A_j = O for all i ≠ j,
(c) A = Σ_{i=1}^k A_i is idempotent,

or if and only if (c) and (d) are true, where (d) is the following statement:

(d) r = Σ_{i=1}^k r_i.
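The decomposition in Example 7.2 also illustrates Theorem 7.3: with A_1 = (1/n)J and A_2 = I − (1/n)J, conditions (a) through (d) all hold, as this minimal sketch confirms:

```python
import numpy as np

n = 5
I, J = np.eye(n), np.ones((n, n))
A1, A2 = J / n, I - J / n          # y'y = y'A1 y + y'A2 y, as in Example 7.2

# (a) each A_i is idempotent
print(np.allclose(A1 @ A1, A1), np.allclose(A2 @ A2, A2))
# (b) A1 A2 = O
print(np.allclose(A1 @ A2, 0))
# (c) A1 + A2 = I is idempotent
print(np.allclose((A1 + A2) @ (A1 + A2), A1 + A2))
# (d) the ranks add: rank(A1) + rank(A2) = rank(I) = n
r1, r2 = np.linalg.matrix_rank(A1), np.linalg.matrix_rank(A2)
print(r1, r2, r1 + r2 == n)
```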