Section 14. Simple linear regression.

Let us look at the cigarette dataset from [1] (available for download from the journal's website) and [2]. The cigarette dataset contains measurements of tar, nicotine, weight and carbon monoxide (CO) content for 25 brands of domestic cigarettes. We are going to try to predict CO as a function of tar and nicotine content. To visualize the data let us plot each of these variables against the others, see figure 14.1. Since the variables appear to have a linear relationship, we fit a least-squares line, which we will explain below, to the data using the Matlab tool polytool. For example, if our vectors are nic for nicotine, tar for tar and carb for CO, then polytool(nic,carb,1) will produce figure 14.1 (a), etc. We can also perform statistical analysis of these fits, in a sense that will gradually be explained below, using the Matlab regress function. For carbon monoxide vs. tar:

[b,bint,r,rint,stats]=regress(carb,[ones(25,1),tar]);
b =
    2.7433
    0.8010
bint =
    1.3465    4.1400
    0.6969    0.9051
stats =
    0.9168  253.3697    0.0000    1.9508

and for carbon monoxide vs. nicotine:

[b,bint,r,rint,stats]=regress(carb,[ones(25,1),nic]);
b =
    1.6647
   12.3954
bint =
   -0.3908    3.7201
   10.2147   14.5761
stats =
    0.8574  138.2659    0.0000    3.2343
[Figure 14.1: Least-squares line (solid line). (a) Carbon monoxide content (mg.) vs. nicotine content (mg.). (b) Carbon monoxide vs. tar content. (c) Tar content vs. nicotine content.]

and for tar vs. nicotine:

[b,bint,r,rint,stats]=regress(tar,[ones(25,1),nic]);
b =
   -1.4805
   15.6281
bint =
   -2.8795   -0.0815
   14.1439   17.1124
stats =
    0.9538  474.4314    0.0000    1.5488

The output of regress gives a vector b of parameters of the fitted least-squares line, 95% confidence intervals bint for these parameters, and stats contains, in order: the $R^2$ statistic, the $F$ statistic, the p-value of the $F$ statistic, and the MLE $\hat{\sigma}^2$ of the error variance. All of these will be explained below.
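As an aside, the same regress-style summary can be reproduced outside Matlab. The following Python sketch (numpy/scipy; the function name simple_regress and the 95% default are our choices for illustration, not part of the original analysis) computes the coefficients, their confidence intervals, and the $[R^2, F, p, \text{error variance}]$ row for one predictor:

```python
import numpy as np
from scipy import stats

def simple_regress(y, x, alpha=0.05):
    # Fit y ~ b0 + b1*x by least squares and return regress-style output:
    # coefficients b, confidence intervals bint, and (R^2, F, p, s2).
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    rss = resid @ resid                       # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()         # total sum of squares
    r2 = 1 - rss / tss
    F = (n - 2) * r2 / (1 - r2)               # F statistic for the regression
    p = stats.f.sf(F, 1, n - 2)               # its p-value
    s2 = rss / (n - 2)                        # error variance (the MLE divides by n)
    # 1 - alpha confidence intervals from the t distribution with n-2 df
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)
    bint = np.column_stack([b - tcrit * se, b + tcrit * se])
    return b, bint, (r2, F, p, s2)
```

Applied to the cigarette data, simple_regress(carb, tar) should reproduce the coefficients, intervals and $R^2$ of the first table above; note that conventions for the error-variance entry differ (dividing the residual sum of squares by $n-2$, as here, or by $n$, which gives the MLE $\hat{\sigma}^2$ discussed below).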
Simple linear regression model. Suppose that we have a pair of variables $(X, Y)$ and a variable $Y$ is a linear function of $X$ plus random noise:
$$Y = f(X) + \varepsilon = \beta_0 + \beta_1 X + \varepsilon,$$
where the random noise $\varepsilon$ is assumed to have normal distribution $N(0, \sigma^2)$. The variable $X$ is called a predictor variable, $Y$ a response variable, and the function $f(x) = \beta_0 + \beta_1 x$ a linear regression function.

Suppose that we are given a sequence of pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ that are described by the above model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. $N(0, \sigma^2)$. We have three unknown parameters, $\beta_0$, $\beta_1$ and $\sigma^2$, and we want to estimate them using the given sample. The points $X_1, \ldots, X_n$ can be either random or non-random, but from the point of view of estimating the linear regression function the nature of the $X$s is in some sense irrelevant, so we will think of them as fixed and non-random and assume that the randomness comes from the noise variables $\varepsilon_i$.

For a fixed $X_i$, the distribution of $Y_i$ is $N(f(X_i), \sigma^2)$ with p.d.f.
$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y - f(X_i))^2}{2\sigma^2}}$$
and the likelihood function of the sequence $Y_1, \ldots, Y_n$ is
$$\Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - f(X_i))^2} = \Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2}.$$
Let us find the maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ that maximize this likelihood function. First of all, it is obvious that for any $\sigma^2$ we need to minimize
$$L := \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2$$
over $\beta_0, \beta_1$. The line that minimizes the sum of squares $L$ is called the least-squares line. To find the critical points we write:
$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^n \big(Y_i - (\beta_0 + \beta_1 X_i)\big) = 0,$$
$$\frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^n \big(Y_i - (\beta_0 + \beta_1 X_i)\big)X_i = 0.$$
If we introduce the notations
$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i,\quad \bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i,\quad \overline{X^2} = \frac{1}{n}\sum_{i=1}^n X_i^2,\quad \overline{XY} = \frac{1}{n}\sum_{i=1}^n X_i Y_i,$$
then the critical point conditions can be rewritten as
$$\beta_0 + \beta_1 \bar{X} = \bar{Y} \ \text{ and } \ \beta_0 \bar{X} + \beta_1 \overline{X^2} = \overline{XY}.$$
Solving for $\beta_0$ and $\beta_1$ we get the MLE
$$\hat{\beta}_1 = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\overline{X^2} - \bar{X}^2} \ \text{ and } \ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$
These estimates were used to plot the least-squares regression lines in figure 14.1. Finally, to find the MLE of $\sigma^2$ we maximize the likelihood over $\sigma^2$ and get
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2.$$
The differences $r_i = Y_i - \hat{Y}_i$ between the observed response variables $Y_i$ and the values predicted by the estimated regression line, $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, are called the residuals. The $R^2$ statistic in the examples above is defined as
$$R^2 = 1 - \frac{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2}.$$
The numerator in the last ratio is the sum of squares of the residuals and the denominator is proportional to the sample variance of $Y$, and $R^2$ is usually interpreted as the proportion of variability in the data explained by the linear model. The higher $R^2$ is, the better our model explains the data.

Next, we would like to do statistical inference about the linear model:

1. Construct confidence intervals for the parameters of the model $\beta_0$, $\beta_1$ and $\sigma^2$.
2. Construct prediction intervals for $Y$ given any point $X$ (dotted lines in figure 14.1).
3. Test hypotheses about the parameters of the model. For example, the $F$-statistic in the output of the Matlab function regress comes from a test of the hypothesis $H_0: \beta_0 = 0, \beta_1 = 0$ that the response $Y$ is not correlated with the predictor variable $X$.

In spirit all these problems are similar to statistical inference about the parameters of the normal distribution, such as $t$-tests, $F$-tests, etc., so as a starting point we need to find the joint distribution of the estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\sigma}^2$. To compute the joint distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$ is very easy because they are linear combinations of the $Y_i$s, which have normal distributions, and, as a result, $\hat{\beta}_0$ and $\hat{\beta}_1$ will also have normal distributions. All we need to do is find their means, variances and covariance, which is a straightforward computation. However, we will obtain this as part of a more general computation that will also give us the joint distribution of all three estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\sigma}^2$. Let us denote the sample variance of the $X$s by
$$\bar{\sigma}_X^2 = \overline{X^2} - \bar{X}^2.$$
Then we will prove the following:
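To make these closed-form estimates concrete, here is a minimal Python sketch evaluating $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\sigma}^2$ and $R^2$ directly from the sample averages defined above (the data are synthetic, generated from a model with $\beta_0 = 1.5$, $\beta_1 = 2$, purely for illustration):

```python
import numpy as np

# Synthetic data for illustration (true beta0 = 1.5, beta1 = 2.0)
rng = np.random.default_rng(0)
n = 50
X = rng.uniform(0, 10, n)
Y = 1.5 + 2.0 * X + rng.normal(0, 1, n)

# Sample averages appearing in the normal equations
Xbar, Ybar = X.mean(), Y.mean()
X2bar, XYbar = (X ** 2).mean(), (X * Y).mean()

# MLE of the coefficients and of the error variance
beta1 = (XYbar - Xbar * Ybar) / (X2bar - Xbar ** 2)
beta0 = Ybar - beta1 * Xbar
resid = Y - beta0 - beta1 * X
sigma2 = (resid ** 2).mean()          # MLE: divides by n, not n - 2
R2 = 1 - (resid ** 2).sum() / ((Y - Ybar) ** 2).sum()
```

The estimates agree with any standard least-squares routine (e.g. numpy's polyfit), since both minimize the same sum of squares $L$.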
1. The estimates are normal:
$$\hat{\beta}_1 \sim N\Big(\beta_1,\ \frac{\sigma^2}{n\bar{\sigma}_X^2}\Big),\quad \hat{\beta}_0 \sim N\Big(\beta_0,\ \frac{\sigma^2}{n} + \frac{\bar{X}^2\sigma^2}{n\bar{\sigma}_X^2}\Big) = N\Big(\beta_0,\ \frac{\overline{X^2}\,\sigma^2}{n\bar{\sigma}_X^2}\Big),\quad \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{X}\sigma^2}{n\bar{\sigma}_X^2}.$$

2. $\hat{\sigma}^2$ is independent of $\hat{\beta}_0$ and $\hat{\beta}_1$.

3. $n\hat{\sigma}^2/\sigma^2$ has $\chi^2$ distribution with $n-2$ degrees of freedom.

Remark. Claim 1 means that $(\hat{\beta}_0, \hat{\beta}_1)$ have jointly normal distribution with mean $(\beta_0, \beta_1)$ and covariance matrix
$$\Sigma = \frac{\sigma^2}{n\bar{\sigma}_X^2}\begin{pmatrix} \overline{X^2} & -\bar{X} \\ -\bar{X} & 1 \end{pmatrix}.$$

Proof. Let us consider two vectors
$$a_1 = (a_{11}, \ldots, a_{1n}) = \Big(\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\Big)$$
and $a_2 = (a_{21}, \ldots, a_{2n})$ where
$$a_{2i} = \frac{X_i - \bar{X}}{\sqrt{n}\,\bar{\sigma}_X}.$$
It is easy to check that both vectors have length 1 and they are orthogonal to each other, since their scalar product is
$$a_1 \cdot a_2 = \sum_{i=1}^n a_{1i}a_{2i} = \frac{1}{n}\sum_{i=1}^n \frac{X_i - \bar{X}}{\bar{\sigma}_X} = 0.$$
Let us choose vectors $a_3, \ldots, a_n$ so that $a_1, \ldots, a_n$ form an orthonormal basis and, as a result, the matrix $A$ with columns $a_1, \ldots, a_n$,
$$A = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{pmatrix},$$
is orthogonal. Let us consider the vectors
$$Y = (Y_1, \ldots, Y_n),\quad \mu = \mathbb{E}Y = (\mathbb{E}Y_1, \ldots, \mathbb{E}Y_n)$$
and
$$Y' = (Y_1', \ldots, Y_n') = \frac{Y - \mu}{\sigma} = \Big(\frac{Y_1 - \mathbb{E}Y_1}{\sigma}, \ldots, \frac{Y_n - \mathbb{E}Y_n}{\sigma}\Big),$$
so that the random variables $Y_1', \ldots, Y_n'$ are i.i.d. standard normal. We proved before that if we consider an orthogonal transformation of an i.i.d. standard normal sequence,
$$Z' = (Z_1', \ldots, Z_n') = Y'A,$$
then $Z_1', \ldots, Z_n'$ will also be i.i.d. standard normal. Since
$$Z' = Y'A = \frac{Y - \mu}{\sigma}\,A = \frac{1}{\sigma}\big(YA - \mu A\big),$$
this implies that $YA = \sigma Z' + \mu A$. Let us define the vector
$$Z = (Z_1, \ldots, Z_n) = YA = \sigma Z' + \mu A.$$
Each $Z_i$ is a linear combination of the $Y_i$s and, therefore, has a normal distribution. Since we made a specific choice of the first two columns of the matrix $A$, we can write down explicitly the first two coordinates $Z_1$ and $Z_2$ of the vector $Z$. We have
$$Z_1 = \sum_{i=1}^n a_{1i}Y_i = \frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i = \sqrt{n}\,\bar{Y} = \sqrt{n}\,(\hat{\beta}_0 + \hat{\beta}_1\bar{X})$$
and the second coordinate is
$$Z_2 = \sum_{i=1}^n a_{2i}Y_i = \sum_{i=1}^n \frac{(X_i - \bar{X})Y_i}{\sqrt{n}\,\bar{\sigma}_X} = \sqrt{n}\,\bar{\sigma}_X\hat{\beta}_1,$$
since $\hat{\beta}_1 = \sum_{i=1}^n (X_i - \bar{X})Y_i / (n\bar{\sigma}_X^2)$. Solving these two equations for $\hat{\beta}_0$ and $\hat{\beta}_1$ we can express them in terms of $Z_1$ and $Z_2$ as
$$\hat{\beta}_1 = \frac{Z_2}{\sqrt{n}\,\bar{\sigma}_X} \ \text{ and } \ \hat{\beta}_0 = \frac{Z_1}{\sqrt{n}} - \frac{\bar{X}Z_2}{\sqrt{n}\,\bar{\sigma}_X}.$$
This easily implies claim 1. Next we will show how $n\hat{\sigma}^2$ can also be expressed in terms of the $Z_i$s:
$$n\hat{\sigma}^2 = \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2 = \sum_{i=1}^n \big((Y_i - \bar{Y}) - \hat{\beta}_1(X_i - \bar{X})\big)^2 \quad \{\text{since } \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\}$$
$$= \sum_{i=1}^n (Y_i - \bar{Y})^2 - 2\hat{\beta}_1\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X}) + \hat{\beta}_1^2\sum_{i=1}^n (X_i - \bar{X})^2$$
$$= \sum_{i=1}^n (Y_i - \bar{Y})^2 - \hat{\beta}_1^2\, n\bar{\sigma}_X^2 = \sum_{i=1}^n Y_i^2 - \underbrace{n\bar{Y}^2}_{Z_1^2} - \underbrace{\hat{\beta}_1^2\, n\bar{\sigma}_X^2}_{Z_2^2} = \sum_{i=1}^n Z_i^2 - Z_1^2 - Z_2^2 = Z_3^2 + \cdots + Z_n^2.$$
In the last step we used the fact that $Z = YA$ is an orthogonal transformation of $Y$, and since an orthogonal transformation preserves the length of a vector we have
$$\sum_{i=1}^n Z_i^2 = \sum_{i=1}^n Y_i^2.$$
If we can show that $Z_3, \ldots, Z_n$ are i.i.d. with distribution $N(0, \sigma^2)$, then
$$\frac{n\hat{\sigma}^2}{\sigma^2} = \Big(\frac{Z_3}{\sigma}\Big)^2 + \cdots + \Big(\frac{Z_n}{\sigma}\Big)^2$$
has $\chi^2$-distribution with $n-2$ degrees of freedom, because $Z_i/\sigma \sim N(0, 1)$. Since we showed above that
$$Z = \mu A + \sigma Z' \implies Z_i = (\mu A)_i + \sigma Z_i',$$
the fact that $Z_1', \ldots, Z_n'$ are i.i.d. standard normal implies that the $Z_i$s are independent of each other and $Z_i \sim N((\mu A)_i, \sigma^2)$. Let us compute the mean $\mathbb{E}Z_i = (\mu A)_i$:
$$(\mu A)_i = \mathbb{E}Z_i = \mathbb{E}\sum_{j=1}^n a_{ij}Y_j = \sum_{j=1}^n a_{ij}\mathbb{E}Y_j = \sum_{j=1}^n a_{ij}(\beta_0 + \beta_1 X_j)$$
$$= \sum_{j=1}^n a_{ij}\big(\beta_0 + \beta_1\bar{X} + \beta_1(X_j - \bar{X})\big) = (\beta_0 + \beta_1\bar{X})\sum_{j=1}^n a_{ij} + \beta_1\sum_{j=1}^n a_{ij}(X_j - \bar{X}).$$
Since the matrix $A$ is orthogonal, its columns are orthogonal to each other. Let $a_i$ be the vector in the $i$th column and consider $i \ge 3$. Then the fact that $a_i$ is orthogonal to the first column gives
$$a_i \cdot a_1 = \frac{1}{\sqrt{n}}\sum_{j=1}^n a_{ij} = 0$$
and the fact that $a_i$ is orthogonal to the second column gives
$$a_i \cdot a_2 = \frac{1}{\sqrt{n}\,\bar{\sigma}_X}\sum_{j=1}^n a_{ij}(X_j - \bar{X}) = 0.$$
This shows that for $i \ge 3$
$$\sum_{j=1}^n a_{ij} = 0 \ \text{ and } \ \sum_{j=1}^n a_{ij}(X_j - \bar{X}) = 0,$$
and this proves that $\mathbb{E}Z_i = 0$ and $Z_i \sim N(0, \sigma^2)$ for $i \ge 3$. As we mentioned above, this also proves that $n\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-2}$. Finally, $\hat{\sigma}^2$ is independent of $\hat{\beta}_0$ and $\hat{\beta}_1$ because $n\hat{\sigma}^2$ can be written as a function of $Z_3, \ldots, Z_n$, while $\hat{\beta}_0$ and $\hat{\beta}_1$ can be written as functions of $Z_1$ and $Z_2$.

Statistical inference in simple linear regression. Suppose now that we want to find confidence intervals for the unknown parameters of the model $\beta_0$, $\beta_1$ and $\sigma^2$. This is
straightforward and very similar to the confidence intervals for the parameters of the normal distribution. For example, using that $n\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-2}$, if we find constants $c_1$ and $c_2$ such that $\chi^2_{n-2}(0, c_1) = \frac{\alpha}{2}$ and $\chi^2_{n-2}(c_2, +\infty) = \frac{\alpha}{2}$, then with probability $1-\alpha$ we have
$$c_1 \le \frac{n\hat{\sigma}^2}{\sigma^2} \le c_2.$$
Solving this for $\sigma^2$ we find the $1-\alpha$ confidence interval
$$\frac{n\hat{\sigma}^2}{c_2} \le \sigma^2 \le \frac{n\hat{\sigma}^2}{c_1}.$$
Similarly, we find the $1-\alpha$ confidence interval for $\beta_1$. Since
$$\sqrt{n\bar{\sigma}_X^2}\,\frac{\hat{\beta}_1 - \beta_1}{\sigma} \sim N(0, 1) \ \text{ and } \ \frac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2},$$
the ratio
$$\sqrt{n\bar{\sigma}_X^2}\,\frac{\hat{\beta}_1 - \beta_1}{\sigma}\Big/\sqrt{\frac{1}{n-2}\cdot\frac{n\hat{\sigma}^2}{\sigma^2}}$$
has Student $t$-distribution with $n-2$ degrees of freedom. Simplifying, we get
$$\sqrt{\frac{(n-2)\bar{\sigma}_X^2}{\hat{\sigma}^2}}\,(\hat{\beta}_1 - \beta_1) \sim t_{n-2}. \tag{14.0.1}$$
Therefore, if we find $c$ such that $t_{n-2}(-c, c) = 1-\alpha$, then with probability $1-\alpha$
$$-c \le \sqrt{\frac{(n-2)\bar{\sigma}_X^2}{\hat{\sigma}^2}}\,(\hat{\beta}_1 - \beta_1) \le c,$$
and solving for $\beta_1$ we obtain the $1-\alpha$ confidence interval
$$\hat{\beta}_1 - c\sqrt{\frac{\hat{\sigma}^2}{(n-2)\bar{\sigma}_X^2}} \le \beta_1 \le \hat{\beta}_1 + c\sqrt{\frac{\hat{\sigma}^2}{(n-2)\bar{\sigma}_X^2}}.$$
Similarly, to find the confidence interval for $\beta_0$ we use that
$$\frac{\hat{\beta}_0 - \beta_0}{\sigma\sqrt{\frac{1}{n}\big(1 + \frac{\bar{X}^2}{\bar{\sigma}_X^2}\big)}}\Big/\sqrt{\frac{1}{n-2}\cdot\frac{n\hat{\sigma}^2}{\sigma^2}} = (\hat{\beta}_0 - \beta_0)\Big/\sqrt{\frac{\hat{\sigma}^2}{n-2}\Big(1 + \frac{\bar{X}^2}{\bar{\sigma}_X^2}\Big)} \sim t_{n-2}, \tag{14.0.2}$$
and the $1-\alpha$ confidence interval for $\beta_0$ is
$$\hat{\beta}_0 - c\sqrt{\frac{\hat{\sigma}^2}{n-2}\Big(1 + \frac{\bar{X}^2}{\bar{\sigma}_X^2}\Big)} \le \beta_0 \le \hat{\beta}_0 + c\sqrt{\frac{\hat{\sigma}^2}{n-2}\Big(1 + \frac{\bar{X}^2}{\bar{\sigma}_X^2}\Big)}.$$
We can now construct various $t$-tests based on the $t$-statistics (14.0.1) and (14.0.2).
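These intervals are easy to compute numerically. A sketch in Python (scipy supplies the $t$ and $\chi^2$ quantiles; the function name and the 95% default are illustrative choices, not part of the text):

```python
import numpy as np
from scipy import stats

def regression_cis(X, Y, alpha=0.05):
    # 1 - alpha confidence intervals for beta0, beta1 and sigma^2,
    # following the t- and chi^2-based intervals derived in the text.
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n = len(X)
    Xbar = X.mean()
    varX = ((X - Xbar) ** 2).mean()                  # sample variance of the X's
    beta1 = ((X - Xbar) * (Y - Y.mean())).sum() / (n * varX)
    beta0 = Y.mean() - beta1 * Xbar
    sigma2 = ((Y - beta0 - beta1 * X) ** 2).mean()   # MLE of the error variance
    c = stats.t.ppf(1 - alpha / 2, n - 2)            # t_{n-2} quantile
    half1 = c * np.sqrt(sigma2 / ((n - 2) * varX))
    half0 = c * np.sqrt(sigma2 / (n - 2) * (1 + Xbar ** 2 / varX))
    lo, hi = stats.chi2.ppf([alpha / 2, 1 - alpha / 2], n - 2)
    return ((beta0 - half0, beta0 + half0),
            (beta1 - half1, beta1 + half1),
            (n * sigma2 / hi, n * sigma2 / lo))
```

Although written in terms of the MLE $\hat{\sigma}^2$, the intervals for $\beta_0$ and $\beta_1$ agree with the usual textbook intervals based on the unbiased variance estimate, since $\hat{\sigma}^2/(n-2) = \mathrm{RSS}/(n(n-2))$ either way.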
Linear combinations of parameters. More generally, let us compute the distribution of a linear combination $c_0\hat{\beta}_0 + c_1\hat{\beta}_1$ of the estimates. This will allow us to construct confidence intervals and $t$-tests for linear combinations of parameters $c_0\beta_0 + c_1\beta_1$. Clearly, the distribution of this linear combination will be normal with mean
$$\mathbb{E}(c_0\hat{\beta}_0 + c_1\hat{\beta}_1) = c_0\beta_0 + c_1\beta_1.$$
We compute its variance:
$$\mathrm{Var}(c_0\hat{\beta}_0 + c_1\hat{\beta}_1) = \mathbb{E}\big(c_0(\hat{\beta}_0 - \beta_0) + c_1(\hat{\beta}_1 - \beta_1)\big)^2$$
$$= c_0^2\underbrace{\mathbb{E}(\hat{\beta}_0 - \beta_0)^2}_{\text{variance of }\hat{\beta}_0} + c_1^2\underbrace{\mathbb{E}(\hat{\beta}_1 - \beta_1)^2}_{\text{variance of }\hat{\beta}_1} + 2c_0c_1\underbrace{\mathbb{E}(\hat{\beta}_0 - \beta_0)(\hat{\beta}_1 - \beta_1)}_{\text{covariance}}$$
$$= c_0^2\,\frac{\sigma^2}{n}\Big(1 + \frac{\bar{X}^2}{\bar{\sigma}_X^2}\Big) + c_1^2\,\frac{\sigma^2}{n\bar{\sigma}_X^2} - 2c_0c_1\,\frac{\bar{X}\sigma^2}{n\bar{\sigma}_X^2} = \sigma^2\Big(\frac{c_0^2}{n} + \frac{(c_0\bar{X} - c_1)^2}{n\bar{\sigma}_X^2}\Big).$$
This proves that
$$c_0\hat{\beta}_0 + c_1\hat{\beta}_1 \sim N\Big(c_0\beta_0 + c_1\beta_1,\ \sigma^2\Big(\frac{c_0^2}{n} + \frac{(c_0\bar{X} - c_1)^2}{n\bar{\sigma}_X^2}\Big)\Big). \tag{14.0.3}$$
Taking $(c_0, c_1) = (1, 0)$ or $(0, 1)$ gives the distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$.

Prediction intervals. Suppose now that we have a new observation $X$ for which $Y$ is unknown and we want to predict $Y$ or find a confidence interval for $Y$. According to the simple regression model,
$$Y = \beta_0 + \beta_1 X + \varepsilon,$$
and it is natural to take $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ as the prediction of $Y$. Let us find the distribution of the difference $\hat{Y} - Y$. Clearly, the difference will have normal distribution, so we only need to compute its mean and variance. The mean is
$$\mathbb{E}(\hat{Y} - Y) = \mathbb{E}\hat{\beta}_0 + \mathbb{E}\hat{\beta}_1 X - \beta_0 - \beta_1 X - \mathbb{E}\varepsilon = \beta_0 + \beta_1 X - \beta_0 - \beta_1 X - 0 = 0.$$
Since the new pair $(X, Y)$ is independent of the prior data, $Y$ is independent of $\hat{Y}$. Therefore, since the variance of the sum or difference of independent random variables is equal to the sum of their variances, we get
$$\mathrm{Var}(\hat{Y} - Y) = \mathrm{Var}(\hat{Y}) + \mathrm{Var}(Y) = \sigma^2 + \mathrm{Var}(\hat{Y}),$$
where we also used that $\mathrm{Var}(Y) = \mathrm{Var}(\varepsilon) = \sigma^2$. To compute the variance of $\hat{Y}$ we can use formula (14.0.3) with $(c_0, c_1) = (1, X)$:
$$\mathrm{Var}(\hat{Y}) = \mathrm{Var}(\hat{\beta}_0 + X\hat{\beta}_1) = \sigma^2\Big(\frac{1}{n} + \frac{(X - \bar{X})^2}{n\bar{\sigma}_X^2}\Big).$$
Therefore, we showed that
$$\hat{Y} - Y \sim N\Big(0,\ \sigma^2\Big(1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{n\bar{\sigma}_X^2}\Big)\Big).$$
As a result, we have
$$(\hat{Y} - Y)\Big/\sqrt{\frac{n\hat{\sigma}^2}{n-2}\Big(1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{n\bar{\sigma}_X^2}\Big)} \sim t_{n-2}$$
and the $1-\alpha$ prediction interval for $Y$ is
$$\hat{Y} - c\sqrt{\frac{n\hat{\sigma}^2}{n-2}\Big(1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{n\bar{\sigma}_X^2}\Big)} \le Y \le \hat{Y} + c\sqrt{\frac{n\hat{\sigma}^2}{n-2}\Big(1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{n\bar{\sigma}_X^2}\Big)},$$
where $c$ is such that $t_{n-2}(-c, c) = 1-\alpha$. These are the dashed curves created by the Matlab polytool function.

Simultaneous confidence set for $(\beta_0, \beta_1)$ and the $F$-test. We will now construct a statistic that will allow us to give a confidence set for both parameters $\beta_0, \beta_1$ at the same time and to test hypotheses of the type
$$H_0: \beta_0 = 0 \ \text{ and } \ \beta_1 = 0. \tag{14.0.4}$$
The values $(0, 0)$ could be replaced by any other predetermined values. Looking at the proof of the joint distribution of the estimates, as an intermediate step we showed that the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ can be related to
$$Z_1 = \sqrt{n}\,(\hat{\beta}_0 + \hat{\beta}_1\bar{X}) \ \text{ and } \ Z_2 = \sqrt{n}\,\bar{\sigma}_X\hat{\beta}_1,$$
where the normal random variables $Z_1, Z_2$ are independent of each other and independent of $n\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-2}$. Also, $Z_1$ and $Z_2$ have variance $\sigma^2$. Standardizing these random variables we get
$$A = \frac{\sqrt{n}}{\sigma}\big((\hat{\beta}_0 - \beta_0) + (\hat{\beta}_1 - \beta_1)\bar{X}\big) \sim N(0, 1) \ \text{ and } \ B = \frac{\sqrt{n}\,\bar{\sigma}_X}{\sigma}(\hat{\beta}_1 - \beta_1) \sim N(0, 1),$$
which implies that $A^2 + B^2$ has $\chi^2_2$-distribution. By definition of the $F$-distribution,
$$\frac{(A^2 + B^2)/2}{\dfrac{1}{n-2}\cdot\dfrac{n\hat{\sigma}^2}{\sigma^2}} \sim F_{2,\, n-2}.$$
Simplifying the left-hand side we get
$$F := \frac{n-2}{2\hat{\sigma}^2}\Big((\hat{\beta}_0 - \beta_0)^2 + \overline{X^2}(\hat{\beta}_1 - \beta_1)^2 + 2\bar{X}(\hat{\beta}_0 - \beta_0)(\hat{\beta}_1 - \beta_1)\Big) \sim F_{2,\, n-2}.$$
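The statistic $F$ can be evaluated directly from data. A short Python sketch (the function name joint_F is ours; the formula is the one just derived, with $\overline{X^2}$ the second sample moment):

```python
import numpy as np

def joint_F(X, Y, beta0, beta1):
    # F statistic for the joint hypothesis that the true parameters equal
    # (beta0, beta1); under H0 it has the F distribution with 2 and n - 2
    # degrees of freedom.
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n = len(X)
    Xbar = X.mean()
    varX = ((X - Xbar) ** 2).mean()
    X2bar = (X ** 2).mean()                       # second sample moment of X
    b1 = ((X - Xbar) * (Y - Y.mean())).sum() / (n * varX)
    b0 = Y.mean() - b1 * Xbar
    sigma2 = ((Y - b0 - b1 * X) ** 2).mean()      # MLE of the error variance
    d0, d1 = b0 - beta0, b1 - beta1
    return (n - 2) / (2 * sigma2) * (d0 ** 2 + X2bar * d1 ** 2
                                     + 2 * Xbar * d0 * d1)
```

With scipy, the corresponding p-value is stats.f.sf(F, 2, n - 2); at the fitted values themselves the statistic is zero, and it grows as the hypothesized $(\beta_0, \beta_1)$ moves away from $(\hat{\beta}_0, \hat{\beta}_1)$.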
This allows us to obtain a joint confidence set (an ellipse) for the parameters $\beta_0, \beta_1$. Given a confidence level $1-\alpha$, if we define a threshold $c$ by $F_{2,n-2}(0, c) = 1-\alpha$, then with probability $1-\alpha$ we have
$$F := \frac{n-2}{2\hat{\sigma}^2}\Big((\hat{\beta}_0 - \beta_0)^2 + \overline{X^2}(\hat{\beta}_1 - \beta_1)^2 + 2\bar{X}(\hat{\beta}_0 - \beta_0)(\hat{\beta}_1 - \beta_1)\Big) \le c.$$
This inequality defines an ellipse for $(\beta_0, \beta_1)$. To test the hypothesis (14.0.4), we use the fact that under $H_0$ the statistic
$$F := \frac{n-2}{2\hat{\sigma}^2}\big(\hat{\beta}_0^2 + \overline{X^2}\hat{\beta}_1^2 + 2\bar{X}\hat{\beta}_0\hat{\beta}_1\big) \sim F_{2,\, n-2}$$
and define a decision rule by
$$\delta = \begin{cases} H_0: & F \le c \\ H_1: & F > c, \end{cases}$$
where $c$ is such that $F_{2,n-2}(c, \infty) = \alpha$, the level of significance. The $F$-statistic output by the Matlab regress function will be explained in the next section.

References.

[1] McIntyre, Lauren (1994), "Using Cigarette Data for An Introduction to Multiple Regression", Journal of Statistics Education, v. 2, n. 1.

[2] Mendenhall, W., and Sincich, T. (1992), Statistics for Engineering and the Sciences (3rd ed.), New York: Dellen Publishing Co.