Multiple Linear Regression Analysis

LINEAR REGRESSION ANALYSIS
MODULE III
Lecture - 4
Multiple Linear Regression Analysis
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

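Confidence interval estimation

The confidence intervals in a multiple regression model can be constructed for individual regression coefficients as well as jointly. We consider both of them as follows.

Confidence interval on the individual regression coefficient

Assuming the $\varepsilon_i$'s are identically and independently distributed following $N(0, \sigma^2)$ in $y = X\beta + \varepsilon$, we have

$$y \sim N(X\beta, \sigma^2 I), \qquad b \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right).$$

Thus the marginal distribution of any regression coefficient estimate is

$$b_j \sim N(\beta_j, \sigma^2 C_{jj}),$$

where $C_{jj}$ is the $j$th diagonal element of $(X'X)^{-1}$. Thus

$$t_j = \frac{b_j - \beta_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} \sim t(n-k) \quad \text{under } H_0, \quad j = 1, 2, \ldots, k,$$

where

$$\hat{\sigma}^2 = \frac{SS_{res}}{n-k} = \frac{y'y - b'X'y}{n-k}.$$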

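So the $100(1-\alpha)\%$ confidence interval for $\beta_j$ $(j = 1, 2, \ldots, k)$ is obtained as follows:

$$P\left[-t_{\alpha/2,\,n-k} \le \frac{b_j - \beta_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} \le t_{\alpha/2,\,n-k}\right] = 1 - \alpha$$

$$P\left[b_j - t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 C_{jj}} \le \beta_j \le b_j + t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 C_{jj}}\right] = 1 - \alpha.$$

So the confidence interval is

$$\left(b_j - t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 C_{jj}},\;\; b_j + t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 C_{jj}}\right).$$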
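The interval above can be computed directly from the design matrix. The following is a minimal sketch, assuming NumPy is available; the function name `coefficient_intervals`, the toy data, and the choice to pass the critical value `t_crit` in by hand (rather than computing it) are all assumptions made for illustration, not part of the lecture.

```python
import numpy as np

def coefficient_intervals(X, y, t_crit):
    """Return (b, lower, upper) for every regression coefficient.

    t_crit is the upper alpha/2 point of the t distribution with n - k
    degrees of freedom, supplied by the caller (e.g. from a t table).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    C = np.linalg.inv(X.T @ X)           # C = (X'X)^{-1}
    b = C @ X.T @ y                      # OLS estimate b
    ss_res = y @ y - b @ X.T @ y         # SS_res = y'y - b'X'y
    sigma2_hat = ss_res / (n - k)        # estimate of sigma^2
    half_width = t_crit * np.sqrt(sigma2_hat * np.diag(C))
    return b, b - half_width, b + half_width

# Illustrative data (made up): an intercept column plus one regressor.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# t_{0.025, 3} is about 3.182 (tabulated value for n - k = 3, alpha = 0.05).
b, lower, upper = coefficient_intervals(X, y, t_crit=3.182)
for j in range(len(b)):
    print(f"beta_{j + 1}: {b[j]:.3f} in [{lower[j]:.3f}, {upper[j]:.3f}]")
```

Passing the critical value as an argument keeps the sketch dependent on NumPy only; if SciPy is installed, `scipy.stats.t.ppf(1 - alpha / 2, n - k)` gives the same quantity.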

Simultaneous confidence intervals on regression coefficients

A set of confidence intervals that are true simultaneously with probability $(1-\alpha)$ are called simultaneous or joint confidence intervals. It is relatively easy to define a joint confidence region for $\beta$ in the multiple regression model. Since

$$\frac{(b-\beta)'X'X(b-\beta)}{k\,MS_{res}} \sim F_{k,\,n-k},$$

we have

$$P\left[\frac{(b-\beta)'X'X(b-\beta)}{k\,MS_{res}} \le F_{\alpha}(k, n-k)\right] = 1 - \alpha.$$

So a $100(1-\alpha)\%$ joint confidence region for all of the parameters in $\beta$ is

$$\frac{(b-\beta)'X'X(b-\beta)}{k\,MS_{res}} \le F_{\alpha}(k, n-k),$$

which describes an elliptically shaped region.
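One way to use the joint region in practice is to test whether a candidate parameter vector falls inside the ellipsoid. The sketch below assumes NumPy; the function name `in_joint_region`, the toy data, and the hand-supplied critical value `f_crit` are hypothetical choices for illustration.

```python
import numpy as np

def in_joint_region(X, y, beta0, f_crit):
    """True if beta0 lies inside the 100(1 - alpha)% joint confidence region.

    f_crit is the upper alpha point of F(k, n - k), supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta0 = np.asarray(beta0, dtype=float)
    n, k = X.shape
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)    # OLS estimate b
    resid = y - X @ b
    ms_res = resid @ resid / (n - k)     # MS_res = SS_res / (n - k)
    d = b - beta0
    stat = d @ XtX @ d / (k * ms_res)    # (b - beta)'X'X(b - beta) / (k MS_res)
    return bool(stat <= f_crit)

# Illustrative data (made up): an intercept column plus one regressor.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# F_{0.05}(2, 3) is about 9.55 (tabulated value).
near = in_joint_region(X, y, beta0=[0.0, 2.0], f_crit=9.55)
far = in_joint_region(X, y, beta0=[10.0, 10.0], f_crit=9.55)
print(near, far)
```

A point close to the least squares estimate gives a small quadratic form and lies inside the region, while a distant point does not.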

Coefficient of determination ($R^2$) and adjusted $R^2$

Let $R$ be the multiple correlation coefficient between $y$ and $X_1, X_2, \ldots, X_k$. The square of the multiple correlation coefficient ($R^2$) is called the coefficient of determination. The value of $R^2$ commonly describes how well the sample regression line fits the observed data. This is also treated as a measure of goodness of fit of the model.

Assuming that the intercept term is present in the model as

$$y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + \ldots + \beta_k X_{ik} + u_i, \quad i = 1, 2, \ldots, n,$$

then

$$R^2 = 1 - \frac{e'e}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = 1 - \frac{SS_{res}}{SS_T} = \frac{SS_{reg}}{SS_T},$$

where
$SS_{res}$: sum of squares due to residuals,
$SS_T$: total sum of squares,
$SS_{reg}$: sum of squares due to regression.

$R^2$ measures the explanatory power of the model, which in turn reflects the goodness of fit of the model. It reflects the model adequacy in the sense of how much explanatory power the explanatory variables have.

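Since

$$e'e = y'\left[I - X(X'X)^{-1}X'\right]y = y'\bar{H}y, \qquad \bar{H} = I - X(X'X)^{-1}X',$$

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2,$$

where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\ell'y$ with $\ell = (1, 1, \ldots, 1)'$ and $y = (y_1, y_2, \ldots, y_n)'$. Thus

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = y'y - \frac{1}{n}y'\ell\ell'y = y'y - y'\ell(\ell'\ell)^{-1}\ell'y = y'\left[I - \ell(\ell'\ell)^{-1}\ell'\right]y = y'Ay,$$

where $A = I - \ell(\ell'\ell)^{-1}\ell'$. So

$$R^2 = 1 - \frac{y'\bar{H}y}{y'Ay}.$$

The limits of $R^2$ are 0 and 1, i.e., $0 \le R^2 \le 1$.

$R^2 = 0$ indicates the poorest fit of the model.
$R^2 = 1$ indicates the best fit of the model.
$R^2 = 0.95$ indicates that 95% of the variation in $y$ is explained by the explanatory variables. In simple words, the model is 95% good.

Similarly, any other value of $R^2$ between 0 and 1 indicates the adequacy of the fitted model.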
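The quadratic-form expression for the coefficient of determination can be checked numerically. This is a small sketch assuming NumPy; the function name `r_squared_quadratic_form` and the data are made up for illustration, and `H_bar` denotes the matrix I - X(X'X)^{-1}X' appearing in the residual sum of squares.

```python
import numpy as np

def r_squared_quadratic_form(X, y):
    """R^2 = 1 - (y' H_bar y) / (y' A y) for a model with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ell = np.ones((n, 1))                                  # vector of ones
    H_bar = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual maker
    A = np.eye(n) - ell @ ell.T / n                        # (l'l)^{-1} = 1/n
    return 1.0 - (y @ H_bar @ y) / (y @ A @ y)

# Illustrative data (made up): an intercept column plus one regressor.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r2 = r_squared_quadratic_form(X, y)

# Cross-check against the sums-of-squares definition 1 - SS_res / SS_T.
b = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = np.sum((y - X @ b) ** 2)
ss_t = np.sum((y - y.mean()) ** 2)
print(r2, 1.0 - ss_res / ss_t)
```

Building the n x n matrices explicitly is only sensible for small n; it is done here to mirror the algebra, not as a recommended way to compute the statistic.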

Adjusted $R^2$

If more explanatory variables are added to the model, then $R^2$ increases. In case the variables are irrelevant, $R^2$ will still increase and give an overly optimistic picture.

With the purpose of correcting this overly optimistic picture, the adjusted $R^2$, denoted as $\bar{R}^2$ or adj $R^2$, is used, which is defined as

$$\bar{R}^2 = 1 - \frac{SS_{res}/(n-k)}{SS_T/(n-1)} = 1 - \left(\frac{n-1}{n-k}\right)(1 - R^2).$$

We will see later that $(n-k)$ and $(n-1)$ are the degrees of freedom associated with the distributions of $SS_{res}$ and $SS_T$. Moreover, the quantities $\frac{SS_{res}}{n-k}$ and $\frac{SS_T}{n-1}$ are based on the unbiased estimators of the respective variances of $e$ and $y$ in the context of analysis of variance.

The adjusted $R^2$ will decline if the addition of an extra variable produces too small a reduction in $(1 - R^2)$ to compensate for the increase in $\frac{n-1}{n-k}$.

Another limitation of the adjusted $R^2$ is that it can also be negative. For example, if $k = 3$, $n = 10$, $R^2 = 0.16$, then

$$\bar{R}^2 = 1 - \frac{9}{7} \times 0.84 = -0.08 < 0,$$

which has no interpretation.
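The worked example with k = 3, n = 10 and a coefficient of determination of 0.16 can be reproduced in a few lines. The helper name `adjusted_r_squared` is a hypothetical choice for this sketch.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - ((n - 1) / (n - k)) * (1 - R^2)."""
    return 1.0 - (n - 1) / (n - k) * (1.0 - r2)

# Values from the example in the text: k = 3, n = 10, R^2 = 0.16.
adj = adjusted_r_squared(0.16, n=10, k=3)
print(round(adj, 2))  # -0.08
```

The negative value arises because the penalty factor 9/7 multiplies an unexplained share of 0.84, which pushes the product above 1.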

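Limitations

1. If the constant term is absent in the model, then $R^2$ cannot be defined. In such cases, $R^2$ can be negative. Some ad-hoc measures based on $R^2$ for a regression line through the origin have been proposed in the literature.

2. $R^2$ is sensitive to extreme values, so $R^2$ lacks robustness.

3. Consider a situation where we have the following two models:

$$y_i = \beta_1 + \beta_2 X_{i2} + \ldots + \beta_k X_{ik} + u_i, \quad i = 1, 2, \ldots, n,$$

$$\log y_i = \gamma_1 + \gamma_2 X_{i2} + \ldots + \gamma_k X_{ik} + v_i.$$

The question now is which model is better. For the first model,

$$R_1^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

and for the second model, an option is to define $R^2$ as

$$R_2^2 = 1 - \frac{\sum_{i=1}^{n}\left(\log y_i - \widehat{\log y_i}\right)^2}{\sum_{i=1}^{n}\left(\log y_i - \overline{\log y}\right)^2}.$$

As such, $R_1^2$ and $R_2^2$ are not comparable. If the two models still need to be compared, a better proposition is to define $R^2$ as

$$R_3^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \operatorname{antilog} \hat{y}_i^*\right)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

where $y_i^* = \log y_i$. Comparing $R_1^2$ and $R_3^2$ may then give an idea about the adequacy of the two models.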
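The comparison of the linear and the log model on the original scale of y can be sketched as follows. This assumes NumPy, natural logarithms (so the antilog is `np.exp`), and made-up data that grow roughly exponentially; the helper names `ols_fitted` and `r2_original_scale` are hypothetical.

```python
import numpy as np

def ols_fitted(X, y):
    """Fitted values X b from ordinary least squares."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return X @ b

def r2_original_scale(y, y_pred):
    """1 - sum (y - y_pred)^2 / sum (y - ybar)^2, measured in units of y."""
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Illustrative data (made up): y grows roughly like exp(x).
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([2.7, 7.4, 20.1, 54.6, 148.4])

# First model: y regressed on X directly.
r1 = r2_original_scale(y, ols_fitted(X, y))

# Second model: log y regressed on X, fitted values mapped back by antilog.
r3 = r2_original_scale(y, np.exp(ols_fitted(X, np.log(y))))

print(r1, r3)
```

Because both measures use the same denominator and the same scale for the residuals, they can be set side by side, which is not the case for a coefficient of determination computed on the log scale.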

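Relationship of analysis of variance test and coefficient of determination

Assuming $\beta_1$ to be an intercept term, then for $H_0: \beta_2 = \beta_3 = \ldots = \beta_k = 0$, the F-statistic in the analysis of variance test is

$$F = \frac{MS_{reg}}{MS_{res}} = \frac{(n-k)}{(k-1)}\frac{SS_{reg}}{SS_{res}} = \left(\frac{n-k}{k-1}\right)\frac{SS_{reg}}{SS_T - SS_{reg}} = \left(\frac{n-k}{k-1}\right)\frac{SS_{reg}/SS_T}{1 - SS_{reg}/SS_T} = \left(\frac{n-k}{k-1}\right)\frac{R^2}{1 - R^2},$$

where $R^2$ is the coefficient of determination.

So $F$ and $R^2$ are closely related. When $R^2 = 0$, then $F = 0$. In the limit, when $R^2 = 1$, $F = \infty$. So both $F$ and $R^2$ vary directly: a larger $R^2$ implies a greater $F$ value. That is why the F test under analysis of variance is termed the measure of overall significance of the estimated regression. It is also a test of significance of $R^2$. If $F$ is highly significant, it implies that we can reject $H_0$, i.e., $y$ is linearly related to the $X$'s.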
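The link between the F-statistic and the coefficient of determination is easy to evaluate numerically. The function name `f_from_r2` and the sample values of n and k below are assumptions chosen for illustration.

```python
def f_from_r2(r2, n, k):
    """F = ((n - k) / (k - 1)) * R^2 / (1 - R^2), valid for 0 <= R^2 < 1."""
    return (n - k) / (k - 1) * r2 / (1.0 - r2)

# Illustrative values: n = 10 observations, k = 3 coefficients (with intercept).
values = [f_from_r2(r2, n=10, k=3) for r2 in (0.0, 0.5, 0.9, 0.99)]
print(values)
```

The statistic starts at zero, increases with the coefficient of determination, and grows without bound as the latter approaches 1, which is why the function is not defined at exactly 1.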

Prediction of values of study variable

The prediction in a multiple regression model has two aspects:

1. Prediction of the average value of the study variable, or mean response.
2. Prediction of the actual value of the study variable.

1. Prediction of average value of y

We need to predict $E(y)$ at a given $x_0 = (x_{01}, x_{02}, \ldots, x_{0k})'$.

The predictor as a point estimate is

$$p = x_0'b = x_0'(X'X)^{-1}X'y,$$

$$E(p) = x_0'\beta.$$

So $p$ is an unbiased predictor for $E(y)$. Its variance is

$$Var(p) = E\left[p - E(y)\right]'\left[p - E(y)\right] = \sigma^2 x_0'(X'X)^{-1}x_0.$$
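The variance of the predictor of the mean response follows in one step from the covariance matrix of the least squares estimator. The short derivation below is a sketch that assumes the point at which we predict is fixed (non-random) and uses the covariance matrix of b stated earlier in the lecture:

```latex
\begin{aligned}
\operatorname{Var}(p) &= \operatorname{Var}(x_0' b) = x_0' \operatorname{Var}(b)\, x_0 \\
                      &= x_0' \left[ \sigma^2 (X'X)^{-1} \right] x_0
                       = \sigma^2\, x_0' (X'X)^{-1} x_0 .
\end{aligned}
```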

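The confidence interval on the mean response at a particular point, such as $x_{01}, x_{02}, \ldots, x_{0k}$, can be found as follows.

Define $x_0 = (x_{01}, x_{02}, \ldots, x_{0k})'$. The fitted value at $x_0$ is $\hat{y}_0 = x_0'b$. Then

$$E(\hat{y}_0) = x_0'\beta = E(y \mid x_0),$$

$$Var(\hat{y}_0) = \sigma^2 x_0'(X'X)^{-1}x_0,$$

$$P\left[-t_{\alpha/2,\,n-k} \le \frac{\hat{y}_0 - E(y \mid x_0)}{\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0}} \le t_{\alpha/2,\,n-k}\right] = 1 - \alpha,$$

$$P\left[\hat{y}_0 - t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0} \le E(y \mid x_0) \le \hat{y}_0 + t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0}\right] = 1 - \alpha.$$

The $100(1-\alpha)\%$ confidence interval on the mean response at the point $x_{01}, x_{02}, \ldots, x_{0k}$, i.e., $E(y \mid x_0)$, is

$$\left[\hat{y}_0 - t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0},\;\; \hat{y}_0 + t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0}\right].$$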

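2. Prediction of actual value of y

We need to predict $y$ at a given $x_0 = (x_{01}, x_{02}, \ldots, x_{0k})'$.

The predictor as a point estimate is

$$p_f = x_0'b,$$

$$E(p_f) = x_0'\beta.$$

So $p_f$ is an unbiased predictor for $y$. Its variance is

$$Var(p_f) = E\left[(p_f - y)(p_f - y)'\right] = \sigma^2\left[1 + x_0'(X'X)^{-1}x_0\right].$$

The $100(1-\alpha)\%$ confidence interval for this future observation is

$$\left[p_f - t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2\left[1 + x_0'(X'X)^{-1}x_0\right]},\;\; p_f + t_{\alpha/2,\,n-k}\sqrt{\hat{\sigma}^2\left[1 + x_0'(X'X)^{-1}x_0\right]}\right].$$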
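The two kinds of interval, one for the mean response and one for a future observation, differ only by the extra 1 inside the variance. The sketch below computes both at the same point so the difference in width is visible. It assumes NumPy; the function name `prediction_intervals`, the toy data, and the hand-supplied critical value `t_crit` are hypothetical.

```python
import numpy as np

def prediction_intervals(X, y, x0, t_crit):
    """Return (y0_hat, mean_interval, future_interval) at the point x0.

    t_crit is the upper alpha/2 point of t(n - k), supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, k = X.shape
    C = np.linalg.inv(X.T @ X)                   # (X'X)^{-1}
    b = C @ X.T @ y                              # OLS estimate b
    resid = y - X @ b
    sigma2_hat = resid @ resid / (n - k)         # estimate of sigma^2
    y0_hat = x0 @ b                              # point predictor x0'b
    q = x0 @ C @ x0                              # x0'(X'X)^{-1}x0
    half_mean = t_crit * np.sqrt(sigma2_hat * q)            # mean response
    half_future = t_crit * np.sqrt(sigma2_hat * (1.0 + q))  # future value
    mean_interval = (y0_hat - half_mean, y0_hat + half_mean)
    future_interval = (y0_hat - half_future, y0_hat + half_future)
    return y0_hat, mean_interval, future_interval

# Illustrative data (made up): an intercept column plus one regressor.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Predict at x = 3 (first entry 1 is the intercept); t_{0.025, 3} is about 3.182.
y0_hat, mean_iv, future_iv = prediction_intervals(X, y, x0=[1.0, 3.0], t_crit=3.182)
print(y0_hat, mean_iv, future_iv)
```

Both intervals are centred at the same point predictor, and the interval for the future observation is always the wider of the two because it also accounts for the random error of the new observation.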