STAT5044: Regression and ANOVA
Inyoung Kim
Outline
1. Matrix Expression
2. Linear and quadratic forms
3. Properties of quadratic forms
4. Properties of estimates
5. Distributional properties
Matrix Expression

Suppose we have $p$ variables $x_{i1},\dots,x_{ip}$ for the $i$th observation:

$$
\begin{array}{c|cccc}
 & \multicolumn{4}{c}{\text{Variable}} \\
\text{Observation} & 1 & 2 & \cdots & p \\ \hline
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
2 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
n & x_{n1} & x_{n2} & \cdots & x_{np} \\ \hline
 & \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_p
\end{array}
$$

Notation: let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_p$ be the column vectors.
Matrix Expression

$X$ represents the matrix whose $ij$th element is $x_{ij}$: $X = \{x_{ij}\}$, an $n \times p$ matrix, $i = 1,\dots,n$ and $j = 1,\dots,p$.
$\mathbf{x}_j$ = column vector of the $j$th variable.
Convention: capital = matrix, bold = vector, not bold = scalar.
We typically think of vectors as elements of real Euclidean space, i.e., $\mathbf{x} \in \mathbb{R}^p$.
Special matrices

Square matrix: $n \times n$ or $p \times p$.
Symmetric matrix: $X = X^t$.
Diagonal matrix: square matrix with zeros everywhere except possibly on the diagonal, e.g.,
$$
D = \begin{pmatrix}
d_1 & 0 & 0 & 0 \\
0 & d_2 & 0 & 0 \\
0 & 0 & d_3 & 0 \\
0 & 0 & 0 & d_4
\end{pmatrix}
$$
Special matrices

Identity matrix: square matrix with ones on the diagonal and zeros off the diagonal,
$$
I = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$
$A$ is an idempotent matrix if $A^2 = A$; it follows that $A^n = A$ for every $n \ge 1$.
Special matrices

$$
\mathbf{1}_n = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad
J_{n \times n} = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
\vdots & & & \vdots \\
1 & 1 & \cdots & 1
\end{pmatrix}, \qquad
J = J_{nn} = \mathbf{1}_n \mathbf{1}_n^t = \mathbf{1}\mathbf{1}^t
$$
Special matrices

$$
X = \begin{pmatrix}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{pmatrix}, \qquad
X^t = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_n
\end{pmatrix}
$$
Linear and quadratic forms

Let $\mathbf{y}_{n\times 1}$ be a random vector, and let $B_{t\times n}$ and $A_{n\times n}$ be fixed matrices. Then:
Linear form of $\mathbf{y}$: $\mathbf{l}_{t\times 1} = B_{t\times n}\,\mathbf{y}_{n\times 1}$ is called a linear form of $\mathbf{y}$.
Quadratic form of $\mathbf{y}$: $q_{1\times 1} = \mathbf{y}^t A \mathbf{y}$ is called a quadratic form of $\mathbf{y}$.
Quadratic forms

$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad
A_{n\times n} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
$$
$$
Y^t A Y = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} Y_i Y_j
$$
Quadratic forms

Quadratic-form results usually assume that $A$ is symmetric, e.g.,
$$
A = \begin{pmatrix} 5 & 3 \\ 3 & 4 \end{pmatrix}
$$
$$
\begin{pmatrix} Y_1 & Y_2 \end{pmatrix}
\begin{pmatrix} 5 & 3 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
= 5Y_1^2 + 6Y_1Y_2 + 4Y_2^2
$$
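A quick numerical check of the $2\times 2$ expansion, using an arbitrary test vector:

```python
import numpy as np

# Verify that y' A y equals the expanded polynomial 5*y1^2 + 6*y1*y2 + 4*y2^2.
A = np.array([[5.0, 3.0],
              [3.0, 4.0]])
y = np.array([2.0, -1.0])  # arbitrary test vector

quad = y @ A @ y                                 # matrix form y' A y
expanded = 5*y[0]**2 + 6*y[0]*y[1] + 4*y[1]**2   # expanded polynomial
print(quad, expanded)  # both are 12.0
```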
Quadratic forms

Quadratic forms are common in linear models as a way of expressing variation, e.g., $\sum_i (y_i - \hat{y}_i)^2$.
Decompositions of sums of squares may be expressed in terms of quadratic forms:
$$
\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2
$$
Distributions of quadratic forms are chi-square.
Independence of quadratic forms is based on idempotent matrices.
Properties of linear and quadratic forms

When $\mathbf{l}_{t\times 1} = B_{t\times n}\,\mathbf{y}_{n\times 1}$, $E(\mathbf{y}) = \boldsymbol{\mu}$ with $E(y_i) = \mu_i$, i.e., $\boldsymbol{\mu} = (\mu_1, \dots, \mu_n)^t$, and
$$
\mathrm{Var}(\mathbf{y}) = \mathrm{Cov}(\mathbf{y}) =
\begin{pmatrix}
\mathrm{var}(y_1) & \mathrm{cov}(y_1,y_2) & \cdots & \mathrm{cov}(y_1,y_n) \\
\mathrm{cov}(y_2,y_1) & \mathrm{var}(y_2) & \cdots & \mathrm{cov}(y_2,y_n) \\
\vdots & & & \vdots \\
\mathrm{cov}(y_n,y_1) & \mathrm{cov}(y_n,y_2) & \cdots & \mathrm{var}(y_n)
\end{pmatrix}
$$
then
$$
E(\mathbf{l}) = E(B\mathbf{y}) = B\,E(\mathbf{y}) = B\boldsymbol{\mu}
$$
$$
\mathrm{Cov}(\mathbf{y}) = E\big[\{\mathbf{y} - E(\mathbf{y})\}_{n\times 1}\{\mathbf{y} - E(\mathbf{y})\}^t_{1\times n}\big]
$$
If $\mathbf{l}_1 = B_1\mathbf{y}$ and $\mathbf{l}_2 = B_2\mathbf{y}$, then $\mathrm{Cov}(\mathbf{l}_1, \mathbf{l}_2) = B_1\,\mathrm{Cov}(\mathbf{y})\,B_2^t$.
Properties of quadratic forms

Using the following facts about the trace:
if $A = \{a_{ij}\}$, then $\mathrm{tr}(A) = \sum_{i=1}^n a_{ii}$
$\mathrm{tr}(AB) = \mathrm{tr}(BA)$
we can prove that
$$
E(q) = \boldsymbol{\mu}^t A \boldsymbol{\mu} + \mathrm{tr}(AV)
$$
where $q = \mathbf{y}^t A_{n\times n}\,\mathbf{y}$, $\mathbf{y} \sim [\boldsymbol{\mu}, V]$, and $V_{n\times n}$ is the covariance matrix.
Facts: $E(q) = \boldsymbol{\mu}^t A \boldsymbol{\mu} + \mathrm{tr}(AV)$
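A proof sketch of this expectation formula, using the trace facts stated above:

```latex
\begin{align*}
E(q) &= E(\mathbf{y}^t A \mathbf{y})
      = E\{\mathrm{tr}(\mathbf{y}^t A \mathbf{y})\}
      && (\mathbf{y}^t A \mathbf{y} \text{ is a scalar}) \\
     &= E\{\mathrm{tr}(A \mathbf{y}\mathbf{y}^t)\}
      && (\mathrm{tr}(AB) = \mathrm{tr}(BA)) \\
     &= \mathrm{tr}\{A\, E(\mathbf{y}\mathbf{y}^t)\}
      && (\text{trace and expectation are linear}) \\
     &= \mathrm{tr}\{A(V + \boldsymbol{\mu}\boldsymbol{\mu}^t)\}
      && (E(\mathbf{y}\mathbf{y}^t) = V + \boldsymbol{\mu}\boldsymbol{\mu}^t) \\
     &= \mathrm{tr}(AV) + \mathrm{tr}(A\boldsymbol{\mu}\boldsymbol{\mu}^t)
      = \mathrm{tr}(AV) + \boldsymbol{\mu}^t A \boldsymbol{\mu}.
\end{align*}
```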
Simple Linear Regression

$$
\begin{aligned}
y_1 &= \beta_0 \cdot 1 + \beta_1 x_1 + \varepsilon_1 \\
y_2 &= \beta_0 \cdot 1 + \beta_1 x_2 + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 \cdot 1 + \beta_1 x_n + \varepsilon_n
\end{aligned}
$$
Matrix Expression of Simple Linear Regression

Let
$$
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} \mathbf{1} & \mathbf{x} \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\quad \text{and} \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
$$
The simple linear regression model is $Y = X\beta + \varepsilon$.
Matrix expression of Normal Equation

Recall: $b_0 = \hat{\beta}_0$ and $b_1 = \hat{\beta}_1$,
$$
\begin{aligned}
n b_0 + b_1 \sum_i X_i &= \sum_i Y_i \\
b_0 \sum_i X_i + b_1 \sum_i X_i^2 &= \sum_i X_i Y_i
\end{aligned}
$$
Matrix expression for LSE

$$
\begin{aligned}
Q &= \sum_i \{Y_i - (\beta_0 + \beta_1 X_i)\}^2 \\
  &= (Y - X\beta)^t (Y - X\beta) \\
  &= Y^t Y - \beta^t X^t Y - Y^t X \beta + \beta^t X^t X \beta \\
  &= Y^t Y - 2 Y^t X \beta + \beta^t X^t X \beta \\
\frac{\partial Q}{\partial \beta} &= -2 X^t Y + 2 X^t X \beta
\end{aligned}
$$
Equating to zero, dividing by 2, and substituting $b$ for $\beta$ gives the matrix form of the least squares normal equations:
$$
X^t X b = X^t Y
$$
Matrix expression of Normal Equation

$$
X^t X \hat{\beta} = X^t Y
\quad \Rightarrow \quad
\hat{\beta} = (X^t X)^{-1} X^t Y
$$
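A minimal sketch of solving the normal equations numerically, on a small made-up data set, and comparing to the familiar scalar formulas $b_1 = S_{xy}/S_{xx}$, $b_0 = \bar{y} - b_1\bar{x}$:

```python
import numpy as np

# Made-up data for illustration.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0])

X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solve X'X b = X'Y

# Scalar formulas for comparison.
Sxx = np.sum((x - x.mean())**2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()
print(beta_hat, (b0, b1))  # both give b0 = -2/3, b1 = 3/2
```

Solving `X'X b = X'Y` directly with `np.linalg.solve` is numerically preferable to forming the inverse explicitly, though the two agree for well-conditioned problems.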
Matrix expression of $\hat{Y}$

$$
\hat{Y} = X\hat{\beta} = X(X^t X)^{-1} X^t Y = HY
$$
where $H = X(X^t X)^{-1} X^t$ is the hat matrix.
Note: $(X^t X)^{-1}$

$$
(X^t X)^{-1}
= \frac{1}{n S_{xx}}
\begin{pmatrix} \sum_i X_i^2 & -\sum_i X_i \\ -\sum_i X_i & n \end{pmatrix}
= \begin{pmatrix}
\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} & -\frac{\bar{x}}{S_{xx}} \\
-\frac{\bar{x}}{S_{xx}} & \frac{1}{S_{xx}}
\end{pmatrix}
$$
Hat matrix

$H$ is an $n \times n$ matrix.
$H$ is symmetric: $H^t = H$.
$H$ is idempotent (why?).
Note: properties of an idempotent matrix $A$:
If $\mathrm{rank}(A_{n\times n}) = n$, then $A = I_{n\times n}$.
$\mathrm{tr}(A) = \mathrm{rank}(A)$.
If $A$ is also symmetric, then $A$ is positive semi-definite (p.s.d.).
Def. of p.s.d.: for any vector $x$, $x^t A x \ge 0$.
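These properties are easy to confirm numerically; a sketch on a made-up design matrix:

```python
import numpy as np

# Build the hat matrix for a simple linear regression design.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix

print(np.allclose(H, H.T))    # True: symmetric
print(np.allclose(H @ H, H))  # True: idempotent
print(np.trace(H))            # 2 (up to rounding): tr(H) = rank(H) = # columns of X
```

The trace result follows from $\mathrm{tr}(AB) = \mathrm{tr}(BA)$: $\mathrm{tr}\{X(X^tX)^{-1}X^t\} = \mathrm{tr}\{(X^tX)^{-1}X^tX\} = \mathrm{tr}(I_2) = 2$.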
Matrix expression of residual

$$
e = Y - \hat{Y} = (I - H)Y
$$
$(I - H)$ is also symmetric and idempotent.
$$
\mathrm{Var}(e) = (I - H)\sigma^2
$$
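A short numerical check, with made-up data, that $I - H$ is idempotent and that the residuals are orthogonal to the columns of $X$ (so they sum to zero and are uncorrelated with $x$):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])   # made-up responses
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(4) - H                     # I - H

print(np.allclose(M @ M, M))          # True: I - H is idempotent
e = M @ y                             # residual vector e = (I - H)Y
print(np.allclose(X.T @ e, 0))        # True: X'e = 0
```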
Decomposition of sum of squares

$$
Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)
$$
$$
\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2
$$
$$
SST = SSR + SSE
$$
Matrix expression of SST

$$
SST = \sum_{i=1}^n (Y_i - \bar{Y})^2
= \sum_i Y_i^2 - \frac{\left(\sum_i Y_i\right)^2}{n}
$$
Matrix expression of SSE

$$
SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = e^t e
$$
Matrix expression of SSR

$$
SSR = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2
$$
Matrix expression of SST, SSE, SSR

$$
\begin{aligned}
SST &= Y^t \left(I - \tfrac{1}{n}J\right) Y \\
SSE &= Y^t (I - H) Y \\
SSR &= Y^t \left(H - \tfrac{1}{n}J\right) Y
\end{aligned}
$$
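The decomposition $SST = SSR + SSE$ can be verified directly from these quadratic forms; a sketch with made-up data:

```python
import numpy as np

n = 5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # made-up responses
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
I = np.eye(n)

SST = y @ (I - J/n) @ y
SSE = y @ (I - H) @ y
SSR = y @ (H - J/n) @ y
print(np.isclose(SST, SSR + SSE))                  # True
print(np.isclose(SST, np.sum((y - y.mean())**2)))  # True: matches scalar form
```

The identity holds because $(I - \tfrac{1}{n}J) = (I - H) + (H - \tfrac{1}{n}J)$, so the three quadratic forms share the same $Y$ and their matrices sum exactly.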
Properties of estimates

$$
E(\hat{\beta}) = \beta, \qquad \mathrm{Var}(\hat{\beta}) = \sigma^2 (X^t X)^{-1}
$$
Properties of estimates

If $Y \sim N(X\beta, \sigma^2 I)$, then $b = \hat{\beta} \sim N_2\big(\beta, \sigma^2 (X^t X)^{-1}\big)$.
NOTE:
$$
(X^t X)^{-1}
= \frac{1}{n S_{xx}}
\begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}
= \begin{pmatrix}
\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} & -\frac{\bar{x}}{S_{xx}} \\
-\frac{\bar{x}}{S_{xx}} & \frac{1}{S_{xx}}
\end{pmatrix}
$$
Hence
$$
\mathrm{Var}(\hat{\beta}) = \sigma^2 (X^t X)^{-1}, \quad
\mathrm{Var}(\hat{\beta}_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right), \quad
\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}, \quad
\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}}{S_{xx}}\sigma^2
$$
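The closed-form entries of $(X^tX)^{-1}$ can be checked against the numerical inverse; a sketch with made-up $x$ values (taking $\sigma^2 = 1$ for simplicity):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])  # made-up predictor values
n = len(x)
X = np.column_stack([np.ones(n), x])
Sxx = np.sum((x - x.mean())**2)

V = np.linalg.inv(X.T @ X)  # Var(beta_hat) / sigma^2
print(np.isclose(V[0, 0], 1/n + x.mean()**2 / Sxx))  # True: Var(b0)/sigma^2
print(np.isclose(V[1, 1], 1/Sxx))                    # True: Var(b1)/sigma^2
print(np.isclose(V[0, 1], -x.mean()/Sxx))            # True: Cov(b0,b1)/sigma^2
```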
Remark: Comparison between centering and non-centering

Regression in the centered form: let $x_1^* = x_1 - \bar{x},\; x_2^* = x_2 - \bar{x},\; \dots,\; x_n^* = x_n - \bar{x}$. Then $\bar{x}^* = 0$.
If we replace $x_1, x_2, \dots, x_n$ by $x_1^*, x_2^*, \dots, x_n^*$, then
$$
\hat{\beta}_0 = \bar{y} - \bar{x}^*\hat{\beta}_1 = \bar{y}
$$
and $\hat{\beta}_0$ and $\hat{\beta}_1$ are independent, because
$$
\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}^*}{S_{xx}}\sigma^2 = 0.
$$
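Centering makes $X^tX$ diagonal, which is what kills the covariance; a quick check with made-up $x$ values:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])  # made-up predictor values
xc = x - x.mean()                   # centered predictor, mean 0
Xc = np.column_stack([np.ones_like(xc), xc])

# X'X is diagonal after centering, so its inverse is too.
V = np.linalg.inv(Xc.T @ Xc)        # Var(beta_hat)/sigma^2, centered form
print(np.isclose(V[0, 1], 0.0))     # True: Cov(b0, b1) = 0
print(np.isclose(V[0, 0], 1/len(x)))  # True: Var(b0) = sigma^2 / n
```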
Properties of estimates

$$
E(\hat{Y}) = X\beta, \qquad \mathrm{Var}(\hat{Y}) = \sigma^2 H
$$
Distributional properties

If $\mathbf{y} \sim N(\boldsymbol{\mu}, V)$ and $\mathbf{w} = B\mathbf{y} + \boldsymbol{\delta}$, then $\mathbf{w} \sim N\big(B\boldsymbol{\mu} + \boldsymbol{\delta},\; B V B^t\big)$.

Distributional properties

If $\mathbf{y} \sim N(\boldsymbol{\mu}, V)$, $\mathbf{l} = B\mathbf{y}$, and $q = \mathbf{y}^t A \mathbf{y}$ with $A$ symmetric and idempotent, then $\mathbf{l}$ and $q$ are independent $\iff$ $BVA = 0$.
Proof:
Distributional properties

If $\mathbf{y} \sim N(\boldsymbol{\mu}, V)$, $q_1 = \mathbf{y}^t A_1 \mathbf{y}$, and $q_2 = \mathbf{y}^t A_2 \mathbf{y}$, then $q_1$ and $q_2$ are independent $\iff$ $A_1 V A_2 = 0$.
(NOTE: I didn't describe any properties of the matrices $A_1$ and $A_2$. What properties do they have?)
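A concrete instance of this condition: with $V = I$, the regression quadratic forms $Y^tHY$ and $Y^t(I-H)Y$ satisfy $A_1 V A_2 = H(I - H) = H - H^2 = 0$ because $H$ is idempotent; a numerical check:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(4) - H                  # I - H

# A1 V A2 with A1 = H, V = I, A2 = I - H:
print(np.allclose(H @ M, 0))       # True, so Y'HY and Y'(I-H)Y are independent
```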
Distributional properties

If $y_1, \dots, y_n \stackrel{iid}{\sim} N(0,1)$, then $\sum_{i=1}^n y_i^2 \sim \chi^2_n$.
Distributional properties

If $y_1, \dots, y_n \stackrel{iid}{\sim} N(0,\sigma^2)$, then $\dfrac{\sum_{i=1}^n y_i^2}{\sigma^2} \sim \chi^2_n$.
Distributional properties

If $\mathbf{y} \sim N(0, V)$, $q = \mathbf{y}^t A \mathbf{y}$, and $r = \mathrm{rank}(A)$, then $q \sim \chi^2_r$ iff $AV$ is idempotent.
Distributional properties

If $y \sim N(0,1)$, $\chi \sim \chi^2_d$, and $y$ and $\chi$ are independent, then
$$
t = \frac{y}{\sqrt{\chi/d}} \sim t_d
$$
Distributional properties

If $\chi_1 \sim \chi^2_{d_1}$, $\chi_2 \sim \chi^2_{d_2}$, and the quadratic forms $\chi_1$ and $\chi_2$ are independent, then
$$
F = \frac{\chi_1/d_1}{\chi_2/d_2} \sim F_{d_1, d_2}
$$
Distributional properties

If $t \sim t_d$, then $t^2 \sim F_{1,d}$. (WHY?)
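One way to answer the WHY, combining the definitions of $t$ and $F$ from the two slides above:

```latex
t = \frac{y}{\sqrt{\chi/d}}, \qquad
y \sim N(0,1) \;\text{independent of}\; \chi \sim \chi^2_d
\;\Longrightarrow\;
t^2 = \frac{y^2}{\chi/d} = \frac{y^2/1}{\chi/d} \sim F_{1,d},
```
since $y^2 \sim \chi^2_1$ and is independent of $\chi$, so $t^2$ is a ratio of independent chi-squares each divided by its degrees of freedom.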