Multiple Linear Regression and the General Linear Model

Pamela McGee
5 years ago

1 Multiple Linear Regression and the General Linear Model

2 Outline
1. Introduction to Multiple Linear Regression
2. Statistical Inference
3. Topics in Regression Modeling
4. Example
5. Variable Selection Methods
6. Regression Diagnostics and Strategy for Building a Model

3 1. Introduction to Multiple Linear Regression
Extending simple linear regression to two or more regressors

4 History
Francis Galton coined the term "regression" in his biological research.
Karl Pearson and Udny Yule extended Galton's work to the statistical context.
Legendre and Gauss developed the method of least squares.
Ronald Fisher developed the maximum likelihood method used in the related statistical inference.


5 History
Adrien-Marie Legendre (1752/9/18 - 1833/1/10). In 1805, he published an article named Nouvelles méthodes pour la détermination des orbites des comètes. In this article, he introduced the Method of Least Squares to the world. He was the first person who published an article regarding the method of least squares, which is the earliest form of regression.
Carl Friedrich Gauss (1777/4/30 - 1855/2/23). He developed the fundamentals of the basis for least-squares analysis in 1795 at the age of eighteen. He published an article called Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum. In 1821, he published another article about least squares analysis with further development, called Theoria combinationis observationum erroribus minimis obnoxiae. This article includes the Gauss-Markov theorem.
Francis Galton (1822/2/16 - 1911/1/17). He is regarded as the founder of Biostatistics. In his research he found that tall parents usually have shorter children, and vice versa. So the height of human beings has the tendency to regress to its mean. He was the first person who coined the word "regression" for such phenomena and problems.
Karl Pearson and Ronald Aylmer Fisher. They both developed regression theory after Galton.
Most content in this page comes from Wikipedia.

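6 Probabilistic Model
$Y_i$ is the observed value of the random variable (r.v.) which depends on $k$ fixed predictor values $x_{i1}, x_{i2}, \ldots, x_{ik}$ according to the following model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, \ldots, n$$
Here $\beta_0, \beta_1, \ldots, \beta_k$ are unknown model parameters, and $n$ is the number of observations.
The random errors $\epsilon_i$ are assumed to be independent r.v.'s with mean 0 and variance $\sigma^2$.
Thus the $Y_i$ are independent r.v.'s with mean $\mu_i$ and variance $\sigma^2$, where
$$\mu_i = E(Y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$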

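7 Fitting the model
The least squares (LS) method is used to find a line that fits the equation
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, \ldots, n$$
Specifically, LS provides estimates of the unknown model parameters $\beta_0, \beta_1, \ldots, \beta_k$ which minimize $Q$, the sum of squared differences of the $n$ observed values $y_i$ and the corresponding points on the line with the same $x$'s:
$$Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}) \right]^2$$
The LS estimates can be found by taking partial derivatives of $Q$ with respect to the unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$ and setting them equal to 0. The result is a set of simultaneous linear equations.
The resulting solutions, $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$, are the least squares (LS) estimators of $\beta_0, \beta_1, \ldots, \beta_k$, respectively.
Please note the LS method is non-parametric. That is, no probability distribution assumptions on $Y$ or $\epsilon$ are needed.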
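As a worked step, the partial derivatives described on this slide can be written out as follows. This is a sketch of the standard derivation, with the convention $x_{i0} = 1$ introduced here (it is not in the original slide) so that the intercept is handled like the other coefficients.

```latex
\frac{\partial Q}{\partial \beta_j}
  = -2 \sum_{i=1}^{n} x_{ij}
    \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \right]
  = 0, \qquad j = 0, 1, \ldots, k, \quad x_{i0} = 1

\Longrightarrow \quad
\sum_{i=1}^{n} x_{ij}
  \left( \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} \right)
  = \sum_{i=1}^{n} x_{ij}\, y_i, \qquad j = 0, 1, \ldots, k
```

These are $k+1$ linear equations in the $k+1$ unknowns $\hat\beta_0, \ldots, \hat\beta_k$.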

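8 Goodness of Fit of the Model
To evaluate the goodness of fit of the LS model, we use the residuals defined by
$$e_i = y_i - \hat y_i \quad (i = 1, 2, \ldots, n)$$
where the $\hat y_i$ are the fitted values:
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \cdots + \hat\beta_k x_{ik}$$
An overall measure of the goodness of fit is the error sum of squares (SSE):
$$\min Q = SSE = \sum_{i=1}^{n} e_i^2$$
A few other definitions similar to those in simple linear regression:
total sum of squares (SST): $SST = \sum_{i=1}^{n} (y_i - \bar y)^2$
regression sum of squares (SSR): $SSR = SST - SSE$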

9 Coefficient of determination:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1$$
Values closer to 1 represent better fits.
Adding predictor variables never decreases and generally increases $R^2$.
Multiple correlation coefficient (positive square root of $R^2$): $R = +\sqrt{R^2}$
Only the positive square root is used.
$R$ is a measure of the strength of the association between the predictors ($x$'s) and the one response variable $Y$.
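The sums of squares above can be computed directly from observed and fitted values. The following is a minimal Python sketch, not part of the original slides; the function name `goodness_of_fit` and the toy numbers are illustrative, and the fitted values are assumed to come from an LS fit with an intercept (otherwise $SSR = SST - SSE$ need not lie between 0 and $SST$).

```python
import math

def goodness_of_fit(y, y_hat):
    """Return SSE, SST, SSR, R^2 and R for observed y and LS fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ssr = sst - sse                                        # regression sum of squares
    r2 = ssr / sst
    return {"SSE": sse, "SST": sst, "SSR": ssr, "R2": r2, "R": math.sqrt(r2)}

# Toy data: y regressed on x = 1..5 by simple LS gives fitted values 0.6 + 0.8 x.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
y_hat = [0.6 + 0.8 * x for x in range(1, 6)]
fit = goodness_of_fit(y, y_hat)
```

For this toy data SSE is 3.6 and SST is 10, so $R^2$ is 0.64 and $R$ is 0.8.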

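10 Multiple Regression Model in Matrix Notation
The multiple regression model can be represented in a compact form using matrix notation.
Let:
$$Y = (Y_1, Y_2, \ldots, Y_n)', \quad y = (y_1, y_2, \ldots, y_n)', \quad \epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)'$$
be the $n \times 1$ vectors of the r.v.'s $Y_i$'s, their observed values $y_i$'s, and random errors $\epsilon_i$'s, respectively, for all $n$ observations.
Let:
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$
be the $n \times (k+1)$ matrix of the values of the predictor variables for all $n$ observations (the first column corresponds to the constant term $\beta_0$).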

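11 Let:
$$\beta = (\beta_0, \beta_1, \ldots, \beta_k)' \quad \text{and} \quad \hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)'$$
be the $(k+1) \times 1$ vectors of unknown model parameters and their LS estimates, respectively.
The model can be rewritten as:
$$Y = X\beta + \epsilon$$
The simultaneous linear equations whose solution yields the LS estimates:
$$X'X\hat\beta = X'y$$
If the inverse of the matrix $X'X$ exists, then the solution is given by:
$$\hat\beta = (X'X)^{-1}X'y$$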
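The normal equations can be solved numerically without forming the inverse explicitly. Below is a minimal pure-Python sketch, not from the original slides: the helper names (`solve`, `ls_fit`) and the toy data are illustrative assumptions, and Gaussian elimination is used here simply as one convenient way to solve $X'X\hat\beta = X'y$.

```python
def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ls_fit(x_rows, y):
    """LS estimates (intercept first) for predictor rows x_rows and responses y."""
    X = [[1.0] + list(row) for row in x_rows]       # prepend the constant column
    q = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(q)] for a in range(q)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(q)]
    return solve(XtX, Xty)

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [1 + 2 * a + 3 * b for a, b in x_rows]
beta_hat = ls_fit(x_rows, y)
```

Because the toy responses contain no error term, `beta_hat` recovers the generating coefficients (1, 2, 3) up to rounding.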

12 2. Statistical Inference

13 Determining the statistical significance of the predictor variables:
For statistical inferences, we need the assumption that
$$\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$
(i.i.d. means independent & identically distributed.)
We test the hypotheses:
$$H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \beta_j \ne 0$$
If we can't reject $H_{0j}: \beta_j = 0$, then the corresponding regressor $x_j$ is not a significant predictor of $y$.
It is easily shown that each $\hat\beta_j$ is normal with mean $\beta_j$ and variance $\sigma^2 v_{jj}$, where $v_{jj}$ is the $j$th diagonal entry of the matrix $V = (X'X)^{-1}$.

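14 Deriving a pivotal quantity for the inference on $\beta_j$
Recall $\hat\beta_j \sim N(\beta_j, \sigma^2 v_{jj})$.
The unbiased estimator of the unknown error variance $\sigma^2$ is given by
$$S^2 = \frac{SSE}{n-(k+1)} = \frac{\sum e_i^2}{n-(k+1)} = MSE = \frac{SSE}{\text{d.o.f.}}$$
We also know that
$$W = \frac{(n-(k+1))S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)}$$
and that $S^2$ and $\hat\beta_j$ are statistically independent.
With $Z = \dfrac{\hat\beta_j - \beta_j}{\sigma\sqrt{v_{jj}}} \sim N(0,1)$, and by the definition of the t-distribution, we obtain the pivotal quantity for the inference on $\beta_j$:
$$T = \frac{Z}{\sqrt{W/(n-(k+1))}} = \frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \sim t_{n-(k+1)}$$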

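15 Derivation of the Confidence Interval for $\beta_j$
$$P\left(-t_{\alpha/2,\,n-(k+1)} \le \frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \le t_{\alpha/2,\,n-(k+1)}\right) = 1 - \alpha$$
$$P\left(\hat\beta_j - t_{\alpha/2,\,n-(k+1)}\, S\sqrt{v_{jj}} \le \beta_j \le \hat\beta_j + t_{\alpha/2,\,n-(k+1)}\, S\sqrt{v_{jj}}\right) = 1 - \alpha$$
Thus the $100(1-\alpha)\%$ confidence interval for $\beta_j$ is:
$$\hat\beta_j \pm t_{\alpha/2,\,n-(k+1)}\, SE(\hat\beta_j)$$
where $SE(\hat\beta_j) = s\sqrt{v_{jj}}$.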

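16 Derivation of the Hypothesis Test for $\beta_j$ at the significance level $\alpha$
Hypotheses:
$$H_0: \beta_j = \beta_j^0 \quad \text{vs.} \quad H_a: \beta_j \ne \beta_j^0$$
The test statistic is:
$$T_0 = \frac{\hat\beta_j - \beta_j^0}{S\sqrt{v_{jj}}} \overset{H_0}{\sim} t_{n-(k+1)}$$
The decision rule of the test is derived based on the Type I error rate $\alpha$. That is,
$$P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P(|T_0| \ge c) = \alpha \;\Rightarrow\; c = t_{\alpha/2,\,n-(k+1)}$$
Therefore, we reject $H_0$ at the significance level $\alpha$ if and only if $|t_0| \ge t_{\alpha/2,\,n-(k+1)}$, where $t_0$ is the observed value of $T_0$.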
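The confidence interval and the t-test for a single coefficient use the same ingredients, so they can be sketched together. The Python function below is illustrative only: the name `coef_inference` and all numeric inputs are hypothetical, and the critical value $t_{\alpha/2,\,n-(k+1)}$ is assumed to be supplied by the user (1.96 is used as a large-sample approximation for $\alpha = 0.05$).

```python
import math

def coef_inference(beta_hat_j, s, v_jj, t_crit, beta_null=0.0):
    """Standard error, observed t statistic, CI and test decision for one coefficient."""
    se = s * math.sqrt(v_jj)                      # SE(beta_hat_j) = s * sqrt(v_jj)
    t0 = (beta_hat_j - beta_null) / se            # observed value of T_0
    ci = (beta_hat_j - t_crit * se, beta_hat_j + t_crit * se)
    reject = abs(t0) >= t_crit                    # two-sided decision rule
    return se, t0, ci, reject

# Hypothetical inputs: estimate 0.4, s = 2.0, v_jj = 0.0004, approximate critical value 1.96.
se, t0, ci, reject = coef_inference(0.4, 2.0, 0.0004, 1.96)
```

With these made-up inputs the standard error is about 0.04 and the observed statistic is about 10, so $H_0: \beta_j = 0$ would be rejected.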

17 Another Hypothesis Test
Now consider:
$$H_0: \beta_j = 0 \text{ for all } 1 \le j \le k \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } 1 \le j \le k$$
When $H_0$ is true, the test statistic
$$F_0 = \frac{MSR}{MSE} \sim f_{k,\,n-(k+1)}$$
An alternative and equivalent way to make a decision for a statistical test is through the p-value, defined as:
p = P(observe a test statistic value at least as extreme as the one observed | $H_0$)
At the significance level $\alpha$, we reject $H_0$ if and only if $p < \alpha$.

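18 The General Hypothesis Test
Consider the full model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$
Now consider a partial model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$
Hypotheses:
$$H_0: \beta_{k-m+1} = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } k-m+1 \le j \le k$$
Test statistic:
$$F_0 = \frac{(SSE_{k-m} - SSE_k)/m}{SSE_k/[n-(k+1)]} \overset{H_0}{\sim} f_{m,\,n-(k+1)}$$
Reject $H_0$ when $F_0 > f_{m,\,n-(k+1),\,\alpha}$.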
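The test statistic only needs the two error sums of squares and the degrees of freedom. A minimal Python sketch follows; the function name `general_f_statistic` and the numbers are hypothetical, and the comparison against the critical value $f_{m,\,n-(k+1),\,\alpha}$ is left to a table or statistical software.

```python
def general_f_statistic(sse_partial, sse_full, m, n, k):
    """F_0 for testing that the last m of k coefficients are all zero."""
    df_error = n - (k + 1)                        # error d.o.f. of the full model
    numerator = (sse_partial - sse_full) / m      # extra SSE explained per dropped term
    denominator = sse_full / df_error             # MSE of the full model
    return numerator / denominator

# Hypothetical values: dropping m = 2 of k = 4 predictors raises SSE from 100 to 120, n = 25.
f0 = general_f_statistic(sse_partial=120.0, sse_full=100.0, m=2, n=25, k=4)
```

Here the result is (20/2) / (100/20) = 2.0, to be compared with $f_{2,\,20,\,\alpha}$.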

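19 Estimating and Predicting Future Observations
Let $x^* = (x_0^*, x_1^*, \ldots, x_k^*)'$ and let
$$\mu^* = E(Y^*) = \beta_0 + \beta_1 x_1^* + \cdots + \beta_k x_k^* = x^{*\prime}\beta, \qquad \hat\mu^* = \hat Y^* = x^{*\prime}\hat\beta$$
The pivotal quantity for $\mu^*$ is
$$T = \frac{\hat\mu^* - \mu^*}{s\sqrt{x^{*\prime} V x^*}} \sim t_{n-(k+1)}$$
Using this pivotal quantity, we can derive a CI for the estimated mean $\mu^*$:
$$\hat\mu^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{x^{*\prime} V x^*}$$
Additionally, we can derive a prediction interval (PI) to predict $Y^*$:
$$\hat Y^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{1 + x^{*\prime} V x^*}$$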
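The two intervals differ only by the extra "1 +" under the square root, which a short computation makes visible. The Python sketch below is illustrative: the function names and every numeric input (including the matrix `V` and the critical value) are made up for the example and are not taken from the slides.

```python
import math

def quad_form(x, V):
    """Compute x' V x for a vector x and a square matrix V (nested lists)."""
    return sum(x[i] * V[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

def mean_ci_and_pi(x_star, beta_hat, V, s, t_crit):
    """Point prediction, CI for the mean response and PI for a new observation."""
    y_hat = sum(b * x for b, x in zip(beta_hat, x_star))
    q = quad_form(x_star, V)
    ci_half = t_crit * s * math.sqrt(q)           # half-width of the CI for the mean
    pi_half = t_crit * s * math.sqrt(1 + q)       # half-width of the prediction interval
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Hypothetical inputs: x* includes the leading 1 for the intercept.
x_star = [1.0, 2.0]
beta_hat = [1.0, 0.5]
V = [[0.5, -0.1], [-0.1, 0.04]]
y_hat, ci, pi = mean_ci_and_pi(x_star, beta_hat, V, s=1.0, t_crit=2.0)
```

The prediction interval is always wider than the confidence interval at the same $x^*$, because it also accounts for the error term of the new observation.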

20 Regression in SAS
    /* simple linear regression */
    proc reg;
      model y = x;
    run;

    /* multiple regression */
    proc reg;
      model y = x1 x2 x3;
    run;
Here are some print options for the model phrase:
    model y = x / noint;      /* regression with no intercept */
    model y = x / ss1;        /* print type I sums of squares */
    model y = x / p;          /* print predicted values and residuals */
    model y = x / r;          /* option p plus residual diagnostics */
    model y = x / clm;        /* option p plus 95% CI for estimated mean */
    model y = x / cli;        /* option p plus 95% CI for predicted value */
    model y = x / r cli clm;  /* options can be combined */

21 Confidence and Prediction Band in SAS
The CLM option adds confidence limits for the mean predicted values. The CLI option adds confidence limits for the individual predicted values.
    proc sgplot data=sashelp.class;
      reg x=height y=weight / CLM CLI;
    run;
For information about the SAS Sample Library, see: http://support.sas.com/documentation/cdl/en/grstatproc/67909/HTML/default/viewer.htm#0jxq3ea4njtvnn1xnj4a7s37yz.htm

22 Regression in SAS
It is possible to let SAS do the predicting of new observations and/or estimating of mean responses. The way to do this is to enter the x values (or x1, x2, x3 for multiple regression) you are interested in during the data input step, but put a period (.) for the unknown y value.
    data new;
      input x y;
      datalines;
    ;
    run;
    proc reg;
      model y = x / cli clm;
    run;

23 3. Topics in Regression Modeling

24 3.1 Multicollinearity
Definition. The predictor variables are linearly dependent.
This can cause serious numerical and statistical difficulties in fitting the regression model unless extra predictor variables are deleted.

25 How does the multicollinearity cause difficulties?
$\hat\beta$ is the solution to the equation $X^T X \hat\beta = X^T y$, thus $X^T X$ must be invertible in order for $\hat\beta$ to be unique and computable.
If approximate multicollinearity happens:
1. $X^T X$ is nearly singular, which makes $\hat\beta$ numerically unstable. This is reflected in large changes in their magnitudes with small changes in data.
2. The matrix $V = (X^T X)^{-1}$ has very large elements. Therefore the variances $Var(\hat\beta_j) = \sigma^2 v_{jj}$ are large, which makes $\hat\beta_j$ statistically nonsignificant.

26 Measures of Multicollinearity
1. The correlation matrix $R$. Easy, but can't reflect linear relationships between more than two variables.
2. The determinant of $R$ can be used as a measurement of the singularity of $X^T X$.
3. Variance Inflation Factors (VIF): the diagonal elements of $R^{-1}$. VIF > 10 is regarded as unacceptable.
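For the special case of two predictors, the correlation matrix is $R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$ and the diagonal elements of $R^{-1}$ both equal $1/(1 - r^2)$. The Python sketch below illustrates this case only; the function names and the toy data are assumptions made for the example.

```python
import math

def corr(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when there are exactly two: 1 / (1 - r^2)."""
    r = corr(x1, x2)
    return 1.0 / (1.0 - r * r)

# Nearly collinear toy predictors versus exactly uncorrelated ones.
vif_collinear = vif_two_predictors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1])
vif_orthogonal = vif_two_predictors([-1, 0, 1], [1, -2, 1])
```

The nearly collinear pair gives a VIF far above the threshold of 10 quoted on the slide, while the uncorrelated pair gives exactly 1.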

27 3.2 Polynomial Regression
A special case of a linear model:
$$y = \beta_0 + \beta_1 x + \cdots + \beta_k x^k + \epsilon$$
Problems:
1. The powers of $x$, i.e., $x, x^2, \ldots, x^k$, tend to be highly correlated.
2. If $k$ is large, the magnitudes of these powers tend to vary over a rather wide range.
So, setting $k \le 3$ is a good idea, and $k > 5$ should almost never be used.

28 Solutions
1. Centering the x-variable:
$$y = \beta_0^* + \beta_1^*(x - \bar x) + \cdots + \beta_k^*(x - \bar x)^k + \epsilon$$
2. Effect: removing the non-essential multicollinearity in the data.
3. Furthermore, we can standardize the data by dividing by the standard deviation $s_x$ of $x$: $\dfrac{x - \bar x}{s_x}$
4. Effect: helping to alleviate the second problem.
5. Using the first few principal components of the original variables instead of the original variables.
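The effect of centering can be checked numerically by comparing the correlation between the linear and quadratic terms before and after centering. This Python sketch uses made-up, evenly spaced x values (1 through 10); the helper `corr` is written out here so that the example is self-contained.

```python
import math

def corr(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

x = [float(v) for v in range(1, 11)]
x_bar = sum(x) / len(x)
centered = [v - x_bar for v in x]

# Correlation between the linear and quadratic terms, raw versus centered.
raw_corr = corr(x, [v ** 2 for v in x])
centered_corr = corr(centered, [v ** 2 for v in centered])
```

For these symmetric x values the raw correlation is above 0.95, while the centered terms are uncorrelated. With asymmetric data, centering typically reduces the correlation without removing it entirely.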

29 3.3 Dummy Predictor Variables & The General Linear Model
How to handle the categorical predictor variables?
1. If we have categories of an ordinal variable, such as the prognosis of a patient (poor, average, good), one can assign numerical scores to the categories (poor=1, average=2, good=3).

30 2. If we have a nominal variable with $c \ge 2$ categories, use $c-1$ indicator variables, $x_1, \ldots, x_{c-1}$, called Dummy Variables, to code:
$x_i = 1$ for the $i$th category, $1 \le i \le c-1$;
$x_1 = \cdots = x_{c-1} = 0$ for the $c$th category.

31 Why don't we just use $c$ indicator variables $x_1, x_2, \ldots, x_c$?
If we use that, there will be a linear dependency among them:
$$x_1 + x_2 + \cdots + x_c = 1$$
This will cause multicollinearity.

32 Example of the dummy variables
For instance, suppose we have four years of quarterly sales data for a certain brand of soda. How can we model the time trend by fitting a multiple regression equation?
Solution: We use quarter as a predictor variable x1. To model the seasonal trend, we use indicator variables x2, x3, x4 for Winter, Spring and Summer, respectively. For Fall, all three equal zero. That means: Winter-(1,0,0), Spring-(0,1,0), Summer-(0,0,1), Fall-(0,0,0).
Then we have the model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i, \quad i = 1, \ldots, 16$$
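The coding scheme in this example can be sketched as a small routine that builds the rows of the design matrix. The Python code below is illustrative only: the function name `design_row` is hypothetical, and it assumes that the first of the 16 quarters is a Winter quarter, which the slide does not state.

```python
SEASONS = ("Winter", "Spring", "Summer", "Fall")   # Fall is the baseline category

def design_row(quarter_index, season):
    """Row (1, x1, x2, x3, x4): intercept, quarter number, three seasonal dummies."""
    dummies = [1 if season == s else 0 for s in SEASONS[:-1]]
    return [1, quarter_index] + dummies

# Sixteen quarters; assumption: quarter 1 falls in Winter and seasons cycle in order.
rows = [design_row(t, SEASONS[(t - 1) % 4]) for t in range(1, 17)]
```

Each Fall row has all three dummies equal to zero, so the Fall level is absorbed into the intercept and the coefficients of x2, x3, x4 measure seasonal differences relative to Fall.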

33 3. Once the dummy variables are included, the resulting regression model is referred to as a General Linear Model.
This term must be differentiated from that of the Generalized Linear Model, which includes the General Linear Model as a special case with the identity link function:
$$\mu_i = E(Y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$
The generalized linear model will link the model parameters to the predictors through a link function. For another example, we recall the logit link in the logistic regression.

34 4. Example, Galton
Here we revisit the classic "Regression towards Mediocrity in Hereditary Stature" by Francis Galton.
He performed a simple regression to predict offspring height based on the average parent height. http://
The slope of the regression line was less than 1, showing that extremely tall parents had less extremely tall children.
At the time, Galton did not have multiple regression as a tool, so he had to use other methods to account for the difference between male and female heights.
We can now perform multiple regression on parent-offspring height and use multiple variables as predictors.

35 Example, Galton
OUR MODEL:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$
Y = height of child
x1 = height of father
x2 = height of mother
x3 = gender of child

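36 Example, Galton
In matrix notation:
$$Y = X\beta + \epsilon, \qquad \hat\beta = (X'X)^{-1}X'y$$
We find that: $\hat\beta_0 = \ldots$, $\hat\beta_1 = \ldots$, $\hat\beta_2 = \ldots$, $\hat\beta_3 = \ldots$ [numerical values not preserved]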

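37 Example, Galton
[Table: response vector y, one column: Child (height); values not preserved]

38 Example, Galton
[Table: design matrix X, columns: Father, Mother, Gender; values not preserved]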

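39 Example, Galton
Important calculations:
$$SSE = \sum (y_i - \hat y_i)^2, \qquad SST = \sum (y_i - \bar y)^2, \qquad r^2 = 1 - \frac{SSE}{SST}, \qquad MSE = \frac{SSE}{n-(k+1)} = 4.64$$
[other numerical values not preserved]
$\hat y = X\hat\beta$ is the predicted height of each child given a set of predictor variables.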

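40 Example, Galton
Are these values significantly different from zero?
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0$$
Reject $H_{0j}$ if
$$|t_j| = \left|\frac{\hat\beta_j}{SE(\hat\beta_j)}\right| > t_{n-(k+1),\,\alpha/2} = t_{894,\,0.025}$$
where $V = (X'X)^{-1}$ and $SE(\hat\beta_j) = \sqrt{MSE \cdot v_{jj}}$.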

41 Example, Galton
[Table: columns β-estimate, SE, t; rows Intercept, Father Height, Mother Height, Gender; numerical values not preserved]
* p < 0.05
We conclude that all β are significantly different from zero.

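42 Example, Galton
Testing the model as a whole:
$$H_0: \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{vs.} \quad H_a: \text{the above is not true}$$
Reject $H_0$ if
$$F = \frac{r^2\,[n-(k+1)]}{k\,(1-r^2)} > f_{k,\,n-(k+1),\,\alpha} = f_{3,\,894,\,.05} = 2.615$$
where $r^2 = 1 - SSE/SST$, $SSE = \sum (y_i - \hat y_i)^2$ and $SST = \sum (y_i - \bar y)^2$. [numerical values of SSE, SST and F not preserved]
Since the observed F exceeds 2.615, we reject $H_0$ and conclude that our model predicts height better than by chance.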
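The form of the F statistic in terms of $r^2$ follows from the earlier definition $F = MSR/MSE$. A short derivation, using only the identities $SSR = SST - SSE$ and $r^2 = SSR/SST$ from the goodness-of-fit slides:

```latex
F = \frac{MSR}{MSE}
  = \frac{SSR/k}{SSE/[n-(k+1)]}
  = \frac{SSR/SST}{SSE/SST} \cdot \frac{n-(k+1)}{k}
  = \frac{r^2\,[n-(k+1)]}{k\,(1-r^2)}
```

The last step uses $SSE/SST = 1 - r^2$.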


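43 Example, Galton
Making Predictions
Let's say George Clooney (71 inches) and Madonna (64 inches) would have a baby boy.
$$\hat Y^* = \ldots + \ldots(71) + 0.321495(64) + 5.225951(1) = 69.97$$
95% Prediction interval:
$$\hat Y^* - t_{894,\,.025}\sqrt{MSE\,(1 + x^{*\prime}Vx^*)} \;\le\; Y^* \;\le\; \hat Y^* + t_{894,\,.025}\sqrt{MSE\,(1 + x^{*\prime}Vx^*)}$$
$$69.97 \pm 4.84 = (65.13,\ 74.81)$$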

44 EXAMPLE, GALTON
SAS code, data step: http://
    Data Galton;
      Input Family Father Mother Gender $ Height Kids;
      Datalines;
      [data rows not preserved; only the Gender values M F F F M M F F survive]
    ;
    Run;

45 EXAMPLE, GALTON
SAS code, proc REG
    ods graphics on;
    data revise;
      set Galton;
      if Gender = 'F' then sex = 1.0;
      else sex = 0.0;
    run;
    proc reg data=revise;
      title "proc reg; Dependence of Child Heights on Parental Heights";
      model height = father mother sex / vif collin;
    run;
    quit;


49 EXAMPLE, GALTON
SAS code, proc GLM
Alternatively, one can use the proc GLM procedure, which can incorporate the categorical variable (sex) directly via the class statement. Another added benefit is that SAS will provide an overall significance test for the categorical variable.
    proc glm data=galton;
      Class gender;
      model height = father mother gender;
    run;
    quit;


51 EXAMPLE, GALTON
Dear Students, did you notice any violation of assumptions in the analysis of the Galton data?
The answer is that we have violated the independent observations assumption, as many kids were from the same families!
This, however, can be easily resolved with more advanced statistical models that will include Family as a random effect (random regressor).

52 5. Variables Selection Method
A. Stepwise Regression

53 Variables selection method
(1) Why do we need to select the variables?
(2) How do we select variables?
* stepwise regression
* best subset regression

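54 Stepwise Regression
$(p-1)$-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i$$
$p$-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \epsilon_i$$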

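55 Partial F-test
Hypothesis test:
$$H_{0p}: \beta_p = 0 \quad \text{vs.} \quad H_{1p}: \beta_p \ne 0$$
Test statistic:
$$F_p = \frac{(SSE_{p-1} - SSE_p)/1}{SSE_p/[n-(p+1)]}, \qquad \text{reject } H_{0p} \text{ if } F_p > f_{\alpha,\,1,\,n-(p+1)}$$
Equivalent t-test:
$$t_p = \frac{\hat\beta_p}{SE(\hat\beta_p)}, \qquad \text{reject } H_{0p} \text{ if } |t_p| > t_{n-(p+1),\,\alpha/2}$$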

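56 Partial correlation coefficients
$$r^2_{yx_p \mid x_1 \ldots x_{p-1}} = \frac{SSE_{p-1} - SSE_p}{SSE_{p-1}} = \frac{SSE(x_1, \ldots, x_{p-1}) - SSE(x_1, \ldots, x_p)}{SSE(x_1, \ldots, x_{p-1})}$$
$$F_p = t_p^2 = \frac{r^2_{yx_p \mid x_1 \ldots x_{p-1}}\,[n-(p+1)]}{1 - r^2_{yx_p \mid x_1 \ldots x_{p-1}}}$$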
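The link between the partial F statistic and the partial correlation coefficient can be verified in two lines. In this sketch the shorthand $r^2$ stands for $r^2_{yx_p \mid x_1 \ldots x_{p-1}}$ (the abbreviation is introduced here for readability):

```latex
1 - r^2 = \frac{SSE_p}{SSE_{p-1}}
\quad\Longrightarrow\quad
\frac{r^2}{1 - r^2} = \frac{SSE_{p-1} - SSE_p}{SSE_p}

F_p = \frac{SSE_{p-1} - SSE_p}{SSE_p / [n-(p+1)]}
    = \frac{r^2\,[n-(p+1)]}{1 - r^2}
```

So entering the variable with the largest partial F is the same as entering the variable with the largest squared partial correlation with $y$.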

57 5. Variables selection method
A. Stepwise Regression: SAS Example

58 Example 11.5 (T&D pg. 416), 11.9 (T&D pg. 431)
The following table shows data on the heat evolved in calories during the hardening of cement on a per gram basis (y) along with the percentages of four ingredients: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).
[Table: columns No., X1, X2, X3, X4, Y; values not preserved]
Ref: T & D: "Statistics and Data Analysis", by Tamhane and Dunlop, Pearson, 1999, 2nd edition; ISBN:

59 SAS Program (stepwise variable selection is used)
    data example115;
      input x1 x2 x3 x4 y;
      datalines;
    ;
    run;
    proc reg data=example115;
      model y = x1 x2 x3 x4 / selection=stepwise;
    run;

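60 Selected SAS output
The REG Procedure
Model: MODEL1
Dependent Variable: y
Stepwise Selection: Step 4
[Table: columns Variable, Parameter Estimate, Standard Error, Type II SS, F Value, Pr > F; rows for the Intercept and two x variables, each with Pr > F <.0001; other values and variable indices not preserved]
Bounds on condition number: [values not preserved]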

61 SAS Output (cont)
All variables left in the model are significant at the … level.
No other variable met the … significance level for entry into the model.
Summary of Stepwise Selection
[Table: columns Step, Variable Entered, Variable Removed, Number Vars In, Partial R-Square, Model R-Square, C(p), F Value, Pr > F; values not preserved]

62 5. Variables selection method
B. Best Subsets Regression

63 Best Subsets Regression
For the stepwise regression algorithm, the final model is not guaranteed to be optimal in any specified sense.
In the best subsets regression, a subset of variables is chosen (from the collection of all subsets of k predictor variables) that optimizes a well-defined objective criterion.

64 Best Subsets Regression
In the stepwise regression, we get only one single final model.
In the best subsets regression, the investigator could specify a size for the predictors for the model.

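65 Best Subsets Regression
Optimality Criteria
$r_p^2$-Criterion:
$$r_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST}$$
Adjusted $r_p^2$-Criterion:
$$r^2_{adj,p} = 1 - \frac{MSE_p}{MST}$$
$C_p$-Criterion (recommended for its ease of computation and its ability to judge the predictive power of a model):
$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\left[\left(\hat Y_{ip} - E(Y_i)\right)^2\right]$$
The sample estimator, Mallows' $C_p$-statistic, is given by
$$C_p = \frac{SSE_p}{\hat\sigma^2} + 2(p+1) - n$$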
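The three criteria can be computed from a handful of summary quantities. The Python sketch below is illustrative: the function name `subset_criteria` and the numbers are hypothetical, and $\hat\sigma^2$ is assumed to be the MSE of the full model with all $k$ predictors, which is a common convention but is not stated on the slide.

```python
def subset_criteria(sse_p, sst, n, p, sigma2_hat):
    """r_p^2, adjusted r_p^2 and Mallows' C_p for a model with p predictors."""
    r2 = 1 - sse_p / sst
    mse_p = sse_p / (n - (p + 1))                 # error mean square of the subset model
    mst = sst / (n - 1)                           # total mean square
    adj_r2 = 1 - mse_p / mst
    cp = sse_p / sigma2_hat + 2 * (p + 1) - n
    return r2, adj_r2, cp

# Hypothetical numbers: n = 13, SST = 100, full model (k = 4) has SSE = 8.
n, sst, k, sse_full = 13, 100.0, 4, 8.0
sigma2_hat = sse_full / (n - (k + 1))             # assumed estimate: full-model MSE
r2, adj_r2, cp = subset_criteria(sse_p=10.0, sst=sst, n=n, p=2, sigma2_hat=sigma2_hat)
_, _, cp_full = subset_criteria(sse_p=sse_full, sst=sst, n=n, p=k, sigma2_hat=sigma2_hat)
```

With this choice of $\hat\sigma^2$ the full model always has $C_p = k + 1$, so a subset is attractive when its $C_p$ is small and close to $p + 1$.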

66 Best Subsets Regression
Algorithm
Note that our problem is to find the minimum of a given function.
Use the stepwise subsets regression algorithm and replace the partial F criterion with another criterion such as $C_p$.
Enumerate all possible cases and find the minimum of the criterion functions.
Other possibility?
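The "enumerate all possible cases" option can be sketched in a few lines for a small number of predictors. The Python code below is a minimal illustration, not the algorithm SAS uses: the helper names, the synthetic data and the choice of $C_p$ (with $\hat\sigma^2$ taken as the full-model MSE) are all assumptions made for the example.

```python
from itertools import combinations

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def sse_for(columns, y):
    """SSE of the LS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    q = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(q)] for a in range(q)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(q)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))

def best_subset_by_cp(predictors, y):
    """Enumerate every non-empty subset and return (smallest C_p, subset names)."""
    names = sorted(predictors)
    n, k = len(y), len(names)
    sigma2_hat = sse_for([predictors[v] for v in names], y) / (n - (k + 1))
    best = None
    for p in range(1, k + 1):
        for subset in combinations(names, p):
            cp = sse_for([predictors[v] for v in subset], y) / sigma2_hat + 2 * (p + 1) - n
            if best is None or cp < best[0]:
                best = (cp, subset)
    return best

# Synthetic data: y depends on x1 and x2 only; x3 is an unrelated column.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, 1, -1, 1, -1, 1, -1],
    "x3": [1, 1, -1, -1, 1, 1, -1, -1],
}
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05]
y = [2 + 3 * a + 5 * b + e for a, b, e in zip(predictors["x1"], predictors["x2"], noise)]
best_cp, best_subset = best_subset_by_cp(predictors, y)
```

Full enumeration fits $2^k - 1$ models, so it is practical only for small $k$; this is one reason stepwise procedures and branch-and-bound methods exist.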

67 Best Subsets Regression & SAS
    proc reg data=example115;
      model y = x1 x2 x3 x4 / selection=adjrsq;
    run;
For the selection option, SAS has implemented 9 methods in total. For the best subset method, we have the following options:
Maximum R² Improvement (MAXR)
Minimum R² Improvement (MINR)
R² Selection (RSQUARE)
Adjusted R² Selection (ADJRSQ)
Mallows' C_p Selection (CP)

68 6. Building A Multiple Regression Model
Steps and Strategy

69 Modeling is an iterative process. Several cycles of the steps may be needed before arriving at the final model.
The basic process consists of seven steps.

70 Get started and Follow the Steps
Categorization by Usage
Collect the Data
Explore the Data
Divide the Data
Fit Candidate Models
Select and Evaluate
Select the Final Model

71 Linear Regression Assumptions
Mean of Error Is 0
Variance of Error is Constant
Probability Distribution of Error is Normal
Errors are Independent

72 Residual Plot for Functional Form (Linearity)
[Figure: two panels of residuals e versus X, titled "Add X^2 Term" and "Correct Specification"]

73 Residual Plot for Equal Variance
[Figure: two panels of standardized residuals SR versus X, titled "Unequal Variance" (fan-shaped) and "Correct Specification"]
Standardized residuals are typically used (residual divided by standard error of prediction).

74 Residual Plot for Independence
[Figure: two panels of standardized residuals SR versus X, titled "Not Independent" and "Correct Specification"]

75 Questions?
