
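Stat 74 Estimation for General Linear Model Prof. Goel

Broad Outline

General Linear Model (GLM):

Training Sample Model: Given $n$ observations, $[(Y_i, x_i),\ x_i = (x_{i1}, \dots, x_{ir})']$, $i = 1, \dots, n$, the sample model can be expressed as

$$Y_i = \mu(x_{i1}, x_{i2}, \dots, x_{ir}) + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1)$$

where $\varepsilon_i$, $i = 1, \dots, n$, denote the noise (random errors), each with mean zero and variance $\sigma^2$.

From now on, we denote the features $f_j$, $j = 1, \dots, p$, themselves as coded predictor variables $x_1, x_2, \dots, x_p$. In the simplest setting, the random errors are assumed to be uncorrelated with equal variance. Thus the sample GLM can be expressed as

$$Y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad E[\varepsilon_i] = 0, \quad Var[\varepsilon_i] = \sigma^2, \quad Cov(\varepsilon_i, \varepsilon_k) = 0, \ i \neq k. \qquad (2)$$

$$E[Y_i] = \mu_i = \sum_{j=1}^{p} \beta_j x_{ij}. \qquad (3)$$

Vector/matrix notation for the response, predictor variables, error terms and the unknown coefficients:

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad \text{and} \quad X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}.$$

Also, let $x_{\cdot j} = (x_{1j}, x_{2j}, \dots, x_{nj})'$ denote the $j$th column of $X$, i.e., $X = [x_{\cdot 1}, x_{\cdot 2}, \dots, x_{\cdot p}]$.

Given the response vector $Y$ and the design matrix $X$, the sample GLM can be written as

$$Y = X\beta + \varepsilon, \quad E[\varepsilon] = 0, \quad Cov[\varepsilon] = ((cov(\varepsilon_i, \varepsilon_k))) = ((E(\varepsilon_i \varepsilon_k))) = E[\varepsilon\varepsilon'] = \sigma^2 I. \qquad (4)$$

Thus,

$$E[Y] = \mu = X\beta, \quad Cov(Y) = E[(Y - X\beta)(Y - X\beta)'] = E[\varepsilon\varepsilon'] = \sigma^2 I. \qquad (5)$$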

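Ordinary Least Squares (OLS): For an estimate $\tilde{\beta}$ of $\beta$, the corresponding Residual Sum of Squares is

$$l(\tilde{\beta}) = \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} \tilde{\beta}_j x_{ij} \Big)^2 = \sum_{i=1}^{n} e_i^2(\tilde{\beta}) = e(\tilde{\beta})' e(\tilde{\beta}).$$

Problem: Find an estimated coefficient vector $\hat{\beta} = \arg\min_{\beta \in R^p} l(\beta)$. In matrix notation,

$$\min_{\beta \in R^p} e(\beta)' e(\beta) = \min_{\beta \in R^p} (Y - X\beta)'(Y - X\beta) = e(\hat{\beta})' e(\hat{\beta}). \qquad (6)$$

Expand $S(\beta) = (Y - X\beta)'(Y - X\beta) = Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta$. On setting the partial derivatives of $S(\beta)$ with respect to $\beta$ equal to zero, we get the Normal Equations

$$X'X\beta = X'Y. \qquad (7)$$

Any solution of (7) is an optimal solution to the OLS problem.

Full rank case: Example - Simple Linear Regression, Multiple Regression with linearly independent features. If the matrix $X'X$ is non-singular (the design matrix $X$ is of full column rank $p$), the inverse of $X'X$ exists, and the unique optimal least squares solution is

$$\hat{\beta} = (X'X)^{-1} X'Y. \qquad (8)$$

Note that $E[\hat{\beta}] = (X'X)^{-1} X' E[Y] = (X'X)^{-1} X'X \beta = \beta$. Since $Cov(TY) = T\,Cov(Y)\,T'$, we have $Cov(\hat{\beta}) = Cov(TY) = \sigma^2 [T I T']$, where $T = (X'X)^{-1} X'$. Therefore, $Cov(\hat{\beta}) = \sigma^2 (X'X)^{-1}$.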
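The full-rank solution (8) can be checked numerically. The following is a minimal sketch, assuming NumPy and a simulated data set; the sample size, the true coefficients, the noise level, and the helper name `ols_full_rank` are illustrative assumptions, not part of the notes.

```python
import numpy as np

def ols_full_rank(X, Y):
    """Solve the normal equations X'X b = X'Y, assuming X has full column rank."""
    XtX = X.T @ X
    XtY = X.T @ Y
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(XtX, XtY)

# Simulated data (all values below are assumptions made for illustration).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
sigma = 0.3
Y = X @ beta_true + sigma * rng.normal(size=n)

beta_hat = ols_full_rank(X, Y)
residuals = Y - X @ beta_hat

# Cov(beta_hat) = sigma^2 (X'X)^{-1}, using the known simulation sigma.
cov_beta_hat = sigma**2 * np.linalg.inv(X.T @ X)
```

In this sketch `beta_hat` satisfies the normal equations (7), and the residual vector is orthogonal to every column of `X`, which is another way of stating (7).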

Not full-rank case: Example - ANOVA for Designed Experiments. The normal equations (7) are consistent, but the system has infinitely many solutions. Each solution can be expressed as $\tilde{\beta} = (X'X)^{-} X'Y$, where $(X'X)^{-}$ is a generalized inverse of $X'X$. In fact,

$$E[\tilde{\beta}] = (X'X)^{-} X' E[Y] = (X'X)^{-} (X'X) \beta = H\beta \neq \beta,$$

so some components of $\beta$ do not possess unbiased estimators. The biases in $\tilde{\beta}$ may depend on the generalized inverse used in obtaining the particular OLS solution. All these estimators can't be regarded as optimal estimators of the vector $\beta$. Why? Optimal with respect to what criteria? We may need to add an additional criterion to OLS to get a unique solution, e.g., the Minimum norm OLS estimator: $\tilde{\beta}^{+} = (X'X)^{+} X'Y$, where $(X'X)^{+}$ is the Moore-Penrose generalized inverse (pseudo-inverse) of $X'X$.

Coordinate Free (Vector Space) Approach: Interpret the model $\mu = X\beta$ as $\mu \in C[X]$. However, note that $\hat{\mu} = X\tilde{\beta} = PY = X(X'X)^{-}X'Y$, the projection of $Y$ onto $C[X]$. The symmetric matrix $P = X(X'X)^{-}X'$ is the orthogonal projection matrix onto $C[X]$, the space spanned by the columns of $X$. Even though in the non-full rank case there are infinitely many solutions ($\tilde{\beta}$) to the normal equations, the projection $\hat{\mu} = PY$ (also called $\hat{Y}$) is unique, i.e., the matrix $P$ does not change with the choice of a generalized inverse of $X'X$.

For a vector $u \in C[X]$, i.e., $u = Xb$ for some vector $b$, $Pu = X(X'X)^{-}X'u = X(X'X)^{-}X'Xb = Xb = u$. Thus the projection of a vector $u \in C[X]$ onto $C[X]$ is $u$ itself.
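To see the non-uniqueness of $\tilde{\beta}$ and the uniqueness of $P$ side by side, here is a small sketch, assuming NumPy and a hypothetical one-way layout with two groups (an intercept plus one indicator per group, so $X$ has rank 2). The data values and the particular generalized inverse `G` are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical one-way layout: 3 observations in group 0, 2 in group 1.
groups = np.array([0, 0, 0, 1, 1])
X = np.column_stack([np.ones(5), groups == 0, groups == 1]).astype(float)
Y = np.array([1.0, 2.0, 3.0, 7.0, 9.0])

XtX, XtY = X.T @ X, X.T @ Y
rank_X = np.linalg.matrix_rank(X)      # 2, so X'X is singular

# One generalized inverse: invert the non-singular lower-right 2x2 block
# of X'X (here diag(3, 2)) and put zeros elsewhere.
G = np.diag([0.0, 1.0 / 3.0, 1.0 / 2.0])
is_g_inverse = np.allclose(XtX @ G @ XtX, XtX)

# Two different solutions of the normal equations.
beta_g = G @ XtY                                   # intercept forced to 0
beta_plus = np.linalg.pinv(XtX, rcond=1e-10) @ XtY # minimum norm solution

# The projection matrix is the same for both generalized inverses.
P_g = X @ G @ X.T
P_plus = X @ np.linalg.pinv(XtX, rcond=1e-10) @ X.T
```

In this example the two coefficient vectors differ, yet `X @ beta_g` and `X @ beta_plus` both give the group means as fitted values, and `P_g` agrees with `P_plus` up to rounding.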

Furthermore, for an arbitrary vector $Y \in V$, $PY \in C[X]$; therefore, $P(PY) = PY$ holds true for all $Y \in R^n$. Hence, $(P - P^2) = P(I - P) = 0$. Thus, $P^2 = P$ (i.e., $P$ is an idempotent matrix). Is $P$ a symmetric matrix?

Fact: Every symmetric idempotent matrix is an orthogonal projection matrix onto the space spanned by its columns.

Since $P(I - P) = 0$, rows (columns) of $P$ are orthogonal to the columns (rows) of $(I - P)$, i.e., $Py$ and $(I - P)y$ are orthogonal. Note that $X\tilde{\beta} = \hat{\mu} = PY = \hat{Y}$, and $Y - \hat{Y} = (I - P)Y = e$, the vector of residuals. Therefore, the vectors $\hat{Y}$ and $e$ are orthogonal, i.e., $\hat{Y}'e = \sum_{i=1}^{n} \hat{y}_i e_i = 0$.

Example: Details of full-rank linear regression model.

The Key Question: How to characterize the class of linear functions $c'\beta = \sum_{j=1}^{p} c_j \beta_j$ that can be estimated uniquely through the least squares solutions?

Estimable functions: A linear parametric function $c'\beta$ is said to be estimable if there exists at least one unbiased estimator of it. If there does not exist any unbiased estimator of the linear function $c'\beta$, it is said to be non-estimable. Why consider estimable functions? We will discuss its connection with the concept of Identifiability.

Note that $Y_i$ is an unbiased estimator of $x_{i\cdot}'\beta$. (Why?) Thus $x_{i\cdot}'\beta$ is estimable for each row $x_{i\cdot}'$ of the matrix $X$. Hence $c'\beta$, where the vector $c$ is some linear combination of rows of $X$, is also estimable.

Fact: A linear parametric function $c'\beta$ of $\beta$ is estimable if and only if $c \in C(X')$ = Row space of $X$. (Prove it.) $c \in C(X') \Rightarrow c = X'l$ for some $l$.
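The projection properties and the row-space criterion for estimability can be explored numerically. This is a minimal sketch, assuming NumPy and the same kind of hypothetical two-group design as an illustration. It uses the fact that $XX^{+}$ equals the projector $P$ and that $X^{+}X$ projects onto the row space of $X$; the helper name `is_estimable` and the tolerance are assumptions.

```python
import numpy as np

# Hypothetical rank-deficient design: intercept plus two group indicators.
groups = np.array([0, 0, 0, 1, 1])
X = np.column_stack([np.ones(5), groups == 0, groups == 1]).astype(float)
Y = np.array([1.0, 2.0, 3.0, 7.0, 9.0])

X_pinv = np.linalg.pinv(X, rcond=1e-10)
P = X @ X_pinv          # orthogonal projector onto C[X]
Y_hat = P @ Y           # fitted values
e = Y - Y_hat           # residuals, (I - P) Y

def is_estimable(c, X, tol=1e-8):
    """Check whether c lies in the row space of X, i.e. c = X'l for some l."""
    row_projector = np.linalg.pinv(X, rcond=1e-10) @ X
    return bool(np.allclose(row_projector @ c, c, atol=tol))

diff_estimable = is_estimable(np.array([0.0, 1.0, -1.0]), X)       # group difference
intercept_estimable = is_estimable(np.array([1.0, 0.0, 0.0]), X)   # intercept alone
```

Here the group difference $\beta_2 - \beta_3$ lies in the row space (it is the difference of two rows of $X$), while the intercept alone does not, which matches the intuition that the overall level and the group effects cannot be separated in this parametrization.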

Therefore, for any OLS solution $\tilde{\beta}$, $c'\tilde{\beta} = l'X(X'X)^{-}X'Y = l'PY = l'\hat{\mu}$. Thus $c'\tilde{\beta}$ is invariant to the choice of generalized inverse [Unique OLS solution for unbiased estimator of $c'\beta$].

Gauss-Markov Theorem - $c'\tilde{\beta}$ is the Best (Minimum Variance) Linear Unbiased Estimator (B.L.U.E.) of an estimable linear function $c'\beta$.

Generalized Least Squares: $Var(\varepsilon) = \sigma^2 V$, where $V$ is a known p.d. matrix. Reduce this problem to an OLS problem by a non-singular transformation. Since $V$ is a positive definite matrix, there exists a non-singular matrix $T$ such that $V^{-1} = T'T$. Now, consider the linear transformation $Z = TY$. Note that $E(Z) = T E(Y) = TX\beta$ and $Cov(Z) = \sigma^2 TVT' = \sigma^2 T(T'T)^{-1}T' = \sigma^2 I$. We can therefore consider the OLS problem for $Z$.
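One possible choice of $T$ comes from the Cholesky factorization: if $V = LL'$, then $T = L^{-1}$ satisfies $T'T = V^{-1}$. The sketch below assumes NumPy, a hypothetical covariance structure $V_{ik} = 0.5^{|i-k|}$, and simulated data; the function name `gls_by_whitening` is illustrative. It compares the whitened OLS fit with the closed form $(X'V^{-1}X)^{-1}X'V^{-1}Y$, which follows from applying (8) to $Z = TY$ and $TX$.

```python
import numpy as np

def gls_by_whitening(X, Y, V):
    """GLS via the transformation Z = T Y with T = L^{-1}, where V = L L'."""
    L = np.linalg.cholesky(V)          # lower-triangular factor of V
    TX = np.linalg.solve(L, X)         # T X without forming T explicitly
    Z = np.linalg.solve(L, Y)          # Z = T Y
    beta, *_ = np.linalg.lstsq(TX, Z, rcond=None)  # OLS on the transformed model
    return beta

# Hypothetical setup (all values are assumptions for illustration).
rng = np.random.default_rng(1)
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
idx = np.arange(n)
V = 0.5 ** np.abs(idx[:, None] - idx[None, :])     # positive definite
Y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(V) @ rng.normal(size=n)

beta_gls = gls_by_whitening(X, Y, V)

# Explicit T, formed only to inspect the identities from the text.
T = np.linalg.inv(np.linalg.cholesky(V))
```

Solving triangular systems with `L` rather than inverting `V` is a common design choice for numerical stability; any other non-singular `T` with $T'T = V^{-1}$ would give the same estimate.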

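Background - Vector differentiation: Vector of partial derivatives of a linear form $l'u = \sum_{i=1}^{p} l_i u_i$, and a quadratic form $u'Au = \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij} u_i u_j$, for a symmetric matrix $A$:

$$\frac{\partial (l'u)}{\partial u} = \begin{pmatrix} \partial (l'u) / \partial u_1 \\ \vdots \\ \partial (l'u) / \partial u_p \end{pmatrix} = \begin{pmatrix} l_1 \\ \vdots \\ l_p \end{pmatrix} = l; \qquad \frac{\partial (u'Au)}{\partial u} = \begin{pmatrix} \partial (u'Au) / \partial u_1 \\ \vdots \\ \partial (u'Au) / \partial u_p \end{pmatrix} = \begin{pmatrix} 2 \sum_{j} a_{1j} u_j \\ \vdots \\ 2 \sum_{j} a_{pj} u_j \end{pmatrix} = 2Au.$$

When $A$ is not symmetric, $u'Au = u'\{(A + A')/2\}u$, with $\{(A + A')/2\}$ symmetric. Therefore,

$$\frac{\partial (u'Au)}{\partial u} = (A + A')u.$$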
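These derivative formulas can be sanity-checked with central finite differences. The following is a rough sketch, assuming NumPy; the helper `numerical_gradient`, the step size, and the random test matrices are assumptions introduced only for this check.

```python
import numpy as np

def numerical_gradient(f, u, h=1e-6):
    """Central-difference approximation to the gradient of f at u."""
    grad = np.zeros_like(u)
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = h
        grad[i] = (f(u + step) - f(u - step)) / (2.0 * h)
    return grad

rng = np.random.default_rng(2)
p = 4
u = rng.normal(size=p)
l = rng.normal(size=p)
A = rng.normal(size=(p, p))        # not symmetric in general
A_sym = (A + A.T) / 2.0            # symmetric part of A

grad_linear = numerical_gradient(lambda v: l @ v, u)          # compare with l
grad_quad = numerical_gradient(lambda v: v @ A @ v, u)        # compare with (A + A')u
grad_quad_sym = numerical_gradient(lambda v: v @ A_sym @ v, u)  # compare with 2 A_sym u
```

Because `u'Au` and `u'A_sym u` are the same function of `u`, the last two numerical gradients coincide, which reflects the identity $(A + A')u = 2\{(A + A')/2\}u$.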