

10 LINEAR REGRESSION

In Chapter 6 we discussed the concepts of covariance and correlation, two ways of measuring the extent to which two random variables, $X$ and $Y$, were related to each other. In many cases we would like to take this a step further and try to use information from one variable to make predictions about the outcome of the other.

10.1 sample covariance and correlation

We have so far considered summarizing a set of observations where one measurement is made on each individual or unit, but often in real-life random experiments we make multiple measurements on each individual. For example, during a health check-up a doctor might record the height, weight, age, sex, pulse rate, and blood pressure. Just as we did for single measurements, we can represent the observed data by their empirical distribution, which is now a function of multiple arguments. For example, if we measure two random variables $(X_i, Y_i)$ for the $i$th individual (say weight and blood pressure), then the empirical distribution function is given by

$$f(t, s) = \frac{1}{n}\,\#\{i : X_i = t,\ Y_i = s\}.$$

We can now use this to estimate population features by the corresponding feature of the empirical distribution. For example, the population covariance

$$\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$$

gives a measure of how $X$ and $Y$ relate to each other. The sample version of this is the sample covariance

$$S_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}\right). \tag{10.1.1}$$

The sample correlation coefficient is defined similarly to the population correlation coefficient $\rho[X, Y]$ as

$$r[X, Y] = \frac{S_{XY}}{S_X S_Y}, \tag{10.1.2}$$

where $S_X$ and $S_Y$ are the sample standard deviations of $X$ and $Y$ respectively. As with $\rho[X, Y]$, $r[X, Y]$ is bounded between $-1$ and $1$, and is invariant to scale and location transformations, that is, for real numbers $a, b, c, d$ with $ac > 0$,

$$r[aX + b, cY + d] = r[X, Y].$$

10.2 simple linear model

We will assume that the variable $Y$ depends on $X$ in a linear fashion, but that it is also affected by random factors.
Specifically we will assume there is a regression line $y = \alpha + \beta x$ and that for given $x$-values $X_1, X_2, \dots, X_n$ the corresponding $y$-values $Y_1, Y_2, \dots, Y_n$ are given by

$$Y_j = \alpha + \beta X_j + \epsilon_j, \tag{10.2.1}$$

for $j = 1, 2, \dots, n$, where the $\epsilon_j$ are independent random variables with $\epsilon_j \sim \text{Normal}(0, \sigma^2)$. Equation (10.2.1) is referred to as the simple linear model. In particular, $\epsilon_j$ is the (random) vertical distance of the point $(X_j, Y_j)$ from the regression line. For all results below we assume $\sigma^2 > 0$ is the variance of the errors, assumed to be the same for every data point. We also assume that not all of the $X_j$ quantities are the same, so that the variance of these quantities is non-zero. In particular this means $S_X^2 > 0$.

Version: April 25, 2016
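As a rough illustration of the simple linear model (10.2.1), the following R sketch generates one data set from it. The parameter values, the x-values, and the seed are assumptions chosen only for illustration; they do not come from the text.

```r
# Simulate one data set from Y_j = alpha + beta * X_j + eps_j.
# alpha, beta, sigma and x are illustrative assumptions, not values from the text.
set.seed(1)
alpha <- 2
beta  <- 0.5
sigma <- 1
x <- c(1, 2, 3, 4, 5, 6, 7, 8)                  # deterministic x-values, not all equal
eps <- rnorm(length(x), mean = 0, sd = sigma)   # independent Normal(0, sigma^2) errors
y <- alpha + beta * x + eps                     # observed y-values

var(x) > 0                                      # the x-values have non-zero sample variance
```

Each run with a different seed gives different y-values for the same x-values, which is the sense in which the data are random.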

10.3 the least squares line

The values of $(X_1, Y_1), \dots, (X_n, Y_n)$ are collected data. Though we assume that this data is produced via the simple linear model, we typically do not know the actual values of the slope $\beta$ or the $y$-intercept $\alpha$. The goal of this section is to illustrate a way to estimate these values from the data.

For a line $y = a + bx$ the residual of a data point $(X_j, Y_j)$ is defined to be the quantity $Y_j - (a + bX_j)$. This is the difference between the actual $y$-value of the data point and the location where the line predicts the $y$-value should be. In other words, it may be viewed as the error of the line when attempting to predict the $y$-value corresponding to the $X_j$ data point. Among all possible lines through the data, there is one which minimizes the sum of these squared residual errors. This is called the least squares line.

Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be points on the plane. Suppose we wish to find a line that minimizes the sum of squared residual errors. That is, let $g : \mathbb{R}^2 \to \mathbb{R}$ be defined as

$$g(a, b) = \sum_{j=1}^{n} [Y_j - (a + bX_j)]^2.$$

The objective is to minimize $g$. So using calculus,

$$0 = \frac{\partial g}{\partial a} = -2\sum_{j=1}^{n} [Y_j - a - bX_j] \tag{10.3.1}$$

and

$$0 = \frac{\partial g}{\partial b} = -2\sum_{j=1}^{n} X_j[Y_j - a - bX_j]. \tag{10.3.2}$$

From equation (10.3.1) we have

$$0 = \sum_{j=1}^{n} [Y_j - a - bX_j] = \sum_{j=1}^{n} Y_j - na - b\sum_{j=1}^{n} X_j = n\bar{Y} - na - nb\bar{X} = n(\bar{Y} - (a + b\bar{X})).$$

Therefore [1]

$$\bar{Y} = a + b\bar{X}, \tag{10.3.3}$$

which shows that the point $(\bar{X}, \bar{Y})$ must lie on the least squares line. The point $(\bar{X}, \bar{Y})$ is known as the point of averages. Similarly from equation (10.3.2),

$$0 = \sum_{j=1}^{n} X_j[Y_j - a - bX_j] = \sum_{j=1}^{n} (X_jY_j - aX_j - bX_j^2) = \sum_{j=1}^{n} X_jY_j - an\bar{X} - b\sum_{j=1}^{n} X_j^2,$$

so that

$$\sum_{j=1}^{n} X_jY_j = an\bar{X} + b\sum_{j=1}^{n} X_j^2. \tag{10.3.4}$$

We now use the system of two equations (given by (10.3.3) and (10.3.4)) to solve for $a$ and $b$, which gives

$$b = \frac{\left(\sum_{j=1}^{n} X_jY_j\right) - n\bar{X}\bar{Y}}{\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2} \tag{10.3.5}$$

$$a = \frac{\bar{Y}\left(\sum_{j=1}^{n} X_j^2\right) - \bar{X}\left(\sum_{j=1}^{n} X_jY_j\right)}{\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2}. \tag{10.3.6}$$

[1] We shall use the notation $\bar{X}$, $\bar{Y}$, $S_X$, $S_Y$, $r[X, Y]$ (below), even though they are not necessarily random quantities. This is to simplify notation and will allow us to use known properties, in the event they are random.
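The formulas (10.3.3) and (10.3.5) can be checked numerically. The R sketch below uses a small made-up data set (an assumption for illustration only), computes $b$ from (10.3.5) and $a$ from (10.3.3), confirms that both partial derivatives of $g$ vanish at that point, and compares the result with R's own `lm` fit.

```r
# Illustrative data, assumed for this sketch; any points with non-constant x would do.
x <- c(1, 2, 4, 5, 8)
y <- c(2, 3, 3, 6, 7)
n <- length(x)

# Slope from (10.3.5) and intercept from (10.3.3).
b <- (sum(x * y) - n * mean(x) * mean(y)) / (sum(x^2) - n * mean(x)^2)
a <- mean(y) - b * mean(x)

# Residuals of the fitted line and the two partial derivatives of g at (a, b).
res <- y - (a + b * x)
grad <- c(dg_da = -2 * sum(res), dg_db = -2 * sum(x * res))
grad                                    # both entries are zero up to rounding error

# Cross-check against R's built-in least squares fit.
fit <- unname(coef(lm(y ~ x)))          # c(intercept, slope)
all.equal(fit, c(a, b))
```

Solving the two equations by hand and calling `lm` are two routes to the same line; the closed form is useful later because it shows how $a$ and $b$ depend on the $Y_j$ values.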

Recall that the sample variance of $X_1, X_2, \dots, X_n$ is

$$S_X^2 = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X})^2 = \frac{1}{n-1}\sum_{j=1}^{n} \left[X_j^2 - 2X_j\bar{X} + \bar{X}^2\right] = \frac{1}{n-1}\left[\left(\sum_{j=1}^{n} X_j^2\right) - 2\bar{X}\left(\sum_{j=1}^{n} X_j\right) + n\bar{X}^2\right]$$

$$= \frac{1}{n-1}\left[\left(\sum_{j=1}^{n} X_j^2\right) - 2n\bar{X}^2 + n\bar{X}^2\right] = \frac{1}{n-1}\left[\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2\right].$$

Therefore, the denominator of (10.3.5) is simply $(n-1)S_X^2$. The numerator may be written more simply by using the notation of sample covariance and correlation defined in (10.1.1) and (10.1.2). So from (10.3.5) we have

$$b = \frac{\left(\sum_{j=1}^{n} X_jY_j\right) - n\bar{X}\bar{Y}}{\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2} = \frac{(n-1)S_{XY}}{(n-1)S_X^2} = r[X, Y]\frac{S_Y}{S_X}.$$

Using the above and (10.3.3), we can now also write a nice formula for $a$, which is

$$a = \bar{Y} - r[X, Y]\frac{S_Y}{S_X}\bar{X}. \tag{10.3.7}$$

By the above calculation we have shown that the least squares line minimizing the sum of the squared residual errors is the line passing through the point of averages $(\bar{X}, \bar{Y})$ and having a slope equal to $b = r[X, Y]\frac{S_Y}{S_X}$. We state this precisely in the theorem below.

Theorem. Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be given data points. Then the least squares line passes through $(\bar{X}, \bar{Y})$ and has slope given by $r[X, Y]\frac{S_Y}{S_X}$.

We illustrate the use of these formulas with two examples given below.

Example. Consider the following five data points:

X   3   4   5   6   7
Y   6   5   6   4   2

These points are not collinear, but suppose we wish to find a line that most closely approximates their trend in the least squares sense described above. Viewing these as samples, it is routine to calculate that the formulas above yield $a = 9.1$ and $b = -0.9$. Of all of the lines in the plane, the one that minimizes the sum of squared residual errors for the data set above is the line $y = 9.1 - 0.9x$.

The R software also has a feature to perform a regression directly. To obtain this result using R we could first create vectors that represent the data:

> x <- c(3,4,5,6,7)
> y <- c(6,5,6,4,2)

And then instruct R to perform the regression using the command lm, indicating the linear model.

> lm(y ~ x)

The order of the variables in this command is important, with y ~ x indicating that the y variable is being predicted using the x variable as input. The resulting output from R is

(Intercept)            x
        9.1         -0.9

the values of the intercept and slope of the least squares line respectively.

Example. Suppose as part of a health study, a researcher collects data for weights and heights of sixty adult men in a population. The average height of the men is 174 cm with a sample standard deviation of 8.0 cm. The average weight of the men is 78 kg with a sample standard deviation of 10 kg. The correlation between the variables in the sample was … . This information alone is enough to find the least squares line for predicting weight from height. The reader may use the formulas above, $b = r[X, Y]\frac{S_Y}{S_X}$ and $a = \bar{Y} - b\bar{X}$, to verify that $b$ = … and $a$ = … . Therefore, among all lines, $y$ = … is the one which minimizes the sum of squared residuals. This does not necessarily mean this line would be appropriate for predicting new data points. To make such a declaration, we would want to have some evidence that the two variables had a linear relationship to begin with, but regardless of whether or not the data was produced from a simple linear model, the line above minimizes error in the least squares sense.

exercises

Ex. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be data produced via the simple linear model and suppose $y = a + bx$ is the least squares line for the data. Recall from above that the residual for any given data point is $Y_j - (a + bX_j)$, the error the line makes in predicting the correct $y$-value from the given $x$-value. Show that the sum of the residuals over all data points must be zero.

Ex. Suppose that instead of using the simple linear model, we assume the regression line is known to pass through the origin.
That is, the regression line has the form $y = \beta x$ and for given $x$-values $X_1, X_2, \dots, X_n$ the corresponding $y$-values $Y_1, Y_2, \dots, Y_n$ are given by

$$Y_j = \beta X_j + \epsilon_j, \tag{10.3.8}$$

for $j = 1, 2, \dots, n$. As with the simple linear model, we assume the $\epsilon_j$ are independent random variables with $\epsilon_j \sim \text{Normal}(0, \sigma^2)$. (We will refer to this as the linear model through the origin and will have several exercises investigating how several formulas from this chapter would need to be modified for such a model.) Assuming data $(X_1, Y_1), \dots, (X_n, Y_n)$ was produced from the linear model through the origin, find the least squares line through the origin. That is, find a formula for $b$ such that the line $y = bx$ minimizes the sum of squared residual errors.

10.4 a and b as random variables

In this section (and the remainder of this chapter) we will assume that $(X_1, Y_1), \dots, (X_n, Y_n)$ follow the simple linear model (10.2.1). In other words, there is a regression line $y = \alpha + \beta x$ and for given $x$-values $X_1, X_2, \dots, X_n$ the corresponding $y$-values $Y_1, Y_2, \dots, Y_n$ are given by (10.2.1). In the previous section this data was used to produce the least squares line $y = a + bx$, which minimizes the sum of squared residual errors. In this section we investigate how well the random quantities $a$ and $b$ approximate the (unknown) values $\alpha$ and $\beta$.
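One way to get a feel for $a$ and $b$ as random quantities is to simulate many data sets from the simple linear model with the same x-values and fit a least squares line to each. The R sketch below does this; the parameter values, x-values, seed, and number of repetitions are all assumptions chosen for illustration, and `fit_once` is a hypothetical helper name.

```r
set.seed(42)
alpha <- 1; beta <- 2; sigma <- 3       # illustrative "true" parameters (assumed)
x <- 1:10                               # fixed, deterministic x-values
n <- length(x)

# Hypothetical helper: simulate one data set and return its least squares a and b.
fit_once <- function() {
  y <- alpha + beta * x + rnorm(n, mean = 0, sd = sigma)
  b <- cov(x, y) / var(x)               # S_XY / S_X^2, the same as r * S_Y / S_X
  a <- mean(y) - b * mean(x)            # line passes through the point of averages
  c(a = a, b = b)
}

fits <- replicate(5000, fit_once())     # 2 x 5000 matrix with rows "a" and "b"
a_sim <- fits["a", ]
b_sim <- fits["b", ]

c(mean_a = mean(a_sim), mean_b = mean(b_sim))   # close to alpha and beta
c(var_a = var(a_sim), var_b = var(b_sim))       # spread of the estimates
```

The fitted intercepts and slopes differ from one simulated data set to the next, but under these assumptions their averages settle near the values of alpha and beta used to generate the data.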

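Theorem. Under the assumptions of the simple linear model (10.2.1), the slope $b$ of the least squares line is a linear combination of the $Y_j$ variables. Further it has a normal distribution with mean $\beta$ and variance $\frac{\sigma^2}{(n-1)S_X^2}$.

Proof - First recall that the $X_1, X_2, X_3, \dots, X_n$ are assumed to be deterministic, so will be treated as known constants. The data points $Y_1, Y_2, \dots, Y_n$ are assumed to follow the simple linear model (10.2.1). So for $j = 1, \dots, n$,

$$E[Y_j] = E[\alpha + \beta X_j + \epsilon_j] = \alpha + \beta X_j + E[\epsilon_j] = \alpha + \beta X_j$$

and

$$Var[Y_j] = Var[\alpha + \beta X_j + \epsilon_j] = Var[\epsilon_j] = \sigma^2.$$

Using the formula (10.3.5) we derived for $b$ and the above, we have

$$E[b] = E\left[\frac{\left(\sum_{j=1}^{n} X_jY_j\right) - n\bar{X}\bar{Y}}{(n-1)S_X^2}\right] = \frac{1}{(n-1)S_X^2}\left[\left(\sum_{j=1}^{n} X_jE[Y_j]\right) - n\bar{X}E[\bar{Y}]\right]$$

$$= \frac{1}{(n-1)S_X^2}\left[\left(\sum_{j=1}^{n} X_j(\alpha + \beta X_j)\right) - n\bar{X}(\alpha + \beta\bar{X})\right] = \frac{1}{(n-1)S_X^2}\left[\alpha n\bar{X} + \beta\left(\sum_{j=1}^{n} X_j^2\right) - \alpha n\bar{X} - \beta n\bar{X}^2\right]$$

$$= \frac{\beta}{(n-1)S_X^2}\left[\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2\right] = \beta.$$

Similarly,

$$Var[b] = Var\left[\frac{\left(\sum_{j=1}^{n} X_jY_j\right) - n\bar{X}\bar{Y}}{(n-1)S_X^2}\right] = \frac{1}{[(n-1)S_X^2]^2}\left[\left(\sum_{j=1}^{n} X_j^2\,Var[Y_j]\right) - n^2\bar{X}^2\,Var[\bar{Y}]\right]$$

$$= \frac{1}{[(n-1)S_X^2]^2}\left[\left(\sum_{j=1}^{n} X_j^2\sigma^2\right) - n^2\bar{X}^2(\sigma^2/n)\right] = \frac{\sigma^2}{[(n-1)S_X^2]^2}\left[\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2\right] = \frac{\sigma^2}{(n-1)S_X^2}.$$

The second equality in this chain holds because $Cov\left[\sum_{j=1}^{n} X_jY_j,\ n\bar{X}\bar{Y}\right] = n\bar{X}^2\sigma^2 = Var[n\bar{X}\bar{Y}]$, so the variance of the difference equals the difference of the variances.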

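The algebra below justifies that $b$ is a linear combination of the $Y_j$ variables.

$$b = \frac{\left(\sum_{j=1}^{n} X_jY_j\right) - n\bar{X}\bar{Y}}{(n-1)S_X^2} = \frac{1}{(n-1)S_X^2}\left[\left(\sum_{j=1}^{n} X_jY_j\right) - \left(\sum_{j=1}^{n} \bar{X}Y_j\right)\right] = \sum_{j=1}^{n} \frac{X_j - \bar{X}}{(n-1)S_X^2}\,Y_j.$$

Since $b$ is a linear combination of independent, normal random variables $Y_j$, $b$ itself is also a normal random variable, by the earlier theorem on linear combinations of independent normal random variables.

As noted above, the least squares line can be defined as the line of slope $b$ passing through the point of averages. The following lemma is a useful fact about how these quantities relate to each other.

Lemma. Let $b$ be the slope of the least squares line and let $\bar{Y}$ be the sample average of the $Y_j$ variables. Then $b$ and $\bar{Y}$ are independent.

Proof - By earlier results $\bar{Y}$ has a normal distribution, and so does $b$ by the theorem above. By Theorem 6.4.3, all we have to show is that $\bar{Y}$ and $b$ are uncorrelated. Note that the $Y_j$ variables are all independent of each other and so $Cov[Y_j, Y_k]$ will be zero if $j \neq k$ and will equal the variance $\sigma^2$ otherwise. So,

$$Cov[b, \bar{Y}] = Cov\left[\sum_{j=1}^{n} \frac{X_j - \bar{X}}{(n-1)S_X^2}\,Y_j,\ \frac{1}{n}\sum_{k=1}^{n} Y_k\right] = \sum_{j=1}^{n} Cov\left[\frac{X_j - \bar{X}}{(n-1)S_X^2}\,Y_j,\ \frac{1}{n}\sum_{k=1}^{n} Y_k\right]$$

$$= \sum_{j=1}^{n}\sum_{k=1}^{n} \frac{X_j - \bar{X}}{n(n-1)S_X^2}\,Cov[Y_j, Y_k] = \sum_{j=1}^{n} \frac{X_j - \bar{X}}{n(n-1)S_X^2}\,\sigma^2 = \frac{\sigma^2}{n(n-1)S_X^2}\sum_{j=1}^{n} (X_j - \bar{X}) = 0.$$

We conclude this section with a result on the distribution of $a$.

Theorem. Under the assumptions of the simple linear model (10.2.1), the $y$-intercept $a$ (given by (10.3.7)) of the least squares line is a linear combination of the $Y_j$ variables. Further it has a normal distribution with mean $\alpha$ and variance $\sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)S_X^2}\right)$.

Proof - See the first exercise below.

exercises

Ex. Prove the theorem above on the distribution of $a$. (Hint: Make use of the fact that $\bar{Y} = a + b\bar{X}$ and what has previously been proven about $\bar{Y}$ and $b$.)

Ex. Show that, generally speaking, $a$ and $b$ are not independent. Find necessary and sufficient conditions for when the two variables are independent.

Ex. Show that $a$ and $\bar{Y}$ are never independent.

Ex. Continuing from the exercise in Section 10.3 on the linear model through the origin, assuming the regression line $y = \beta x$ passes through the origin and $y = bx$ is the least squares line through the origin, do the following: (a) Find the expected value of $b$.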

(b) Find the variance of $b$. (c) Determine whether or not $b$ has a normal distribution. (d) Determine if $b$ and $\bar{Y}$ are independent.

10.5 predicting new data when σ² is known

In this section we return to the question of using data for prediction. We continue to assume the simple linear model (10.2.1). We further assume that $\alpha$ and $\beta$ are estimated by $a$ and $b$ (as calculated from the data $(X_1, Y_1), \dots, (X_n, Y_n)$) and that the parameter $\sigma^2$, describing the variability of data around the regression line, is a known quantity. First suppose for a particular deterministic $x$-value $X^*$ that we want to use the data to estimate the corresponding $y$-value $Y^* = \alpha + \beta X^*$ on the regression line by $\hat{Y}^* = a + bX^*$.

Theorem. The quantity $\hat{Y}^* = a + bX^*$ has a normal distribution with mean $Y^* = \alpha + \beta X^*$ and variance $\sigma^2\left(\frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right)$.

Proof - Recall from the two theorems of Section 10.4 that $a$ and $b$ are both linear combinations of the random variables $Y_j$, which are independent and have a normal distribution. So $\hat{Y}^*$ has a normal distribution, again as a linear combination of independent normal random variables. We need to calculate only its mean and variance. The expected value is simple to calculate.

$$E[\hat{Y}^*] = E[a + bX^*] = E[a] + E[b]X^* = \alpha + \beta X^* = Y^*.$$

If $a$ and $b$ were independent, then calculating the variance of $\hat{Y}^*$ would also be a simple task, but this is typically not the case. However, from the lemma of Section 10.4, we know that $b$ and $\bar{Y}$ are independent. To make use of this, using (10.3.3), we may rewrite the line in point-slope form around the point of averages: $\hat{Y}^* = \bar{Y} + b(X^* - \bar{X})$. From this we have

$$Var[\hat{Y}^*] = Var[\bar{Y} + b(X^* - \bar{X})] = Var[\bar{Y}] + Var[b](X^* - \bar{X})^2 = \frac{\sigma^2}{n} + \frac{\sigma^2}{(n-1)S_X^2}(X^* - \bar{X})^2 = \sigma^2\left(\frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).$$

Note that for various values of $X^*$ this variance is minimal when $X^*$ is $\bar{X}$, the average value of the $x$-data. In this case $Var[\hat{Y}^*] = \frac{\sigma^2}{n} = Var[\bar{Y}]$ as expected. The further $X^*$ is from the average of the $x$-values, the more variance there is in predicting the point on the regression line.

Next suppose that, instead of trying to estimate a point on the regression line, we are trying to predict a new data point produced from the linear model.
Let $X^*$ now represent the $x$-value of some new data point and let $Y^* = \alpha + \beta X^* + \epsilon^*$ where $\epsilon^* \sim \text{Normal}(0, \sigma^2)$ and the random variable $\epsilon^*$ is assumed to be independent of all prior $\epsilon_j$ which produced the original data set. The following theorem addresses the distribution of the predictive error made when estimating $Y^*$ by the quantity $\hat{Y}^* = a + bX^*$.

Theorem. If $(X^*, Y^*)$ is a new data point, as described in the previous paragraph, then the predictive error in estimating $Y^*$ using the least squares line is $(a + bX^*) - Y^*$, which is normally distributed with mean 0 and variance $\sigma^2\left(1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right)$.

Proof - The expected value of the predictive error is zero since

$$E[(a + bX^*) - Y^*] = E[a] + E[b]X^* - E[\alpha + \beta X^* + \epsilon^*] = \alpha + \beta X^* - \alpha - \beta X^* - E[\epsilon^*] = 0.$$

Both quantities $a$ and $b$ are linear combinations of the $Y_j$ variables and so

$$a + bX^* - Y^* = a + bX^* - \alpha - \beta X^* - \epsilon^* = (-\alpha - \beta X^*) + \text{a linear combination of } Y_1, Y_2, \dots, Y_n, \epsilon^*.$$

All $(n + 1)$ of the variables $Y_1, Y_2, \dots, Y_n, \epsilon^*$ are independent and have a normal distribution. As $(-\alpha - \beta X^*)$ is a constant, from the above $(a + bX^* - Y^*)$ has a normal distribution. Finally, to calculate the variance, we again rewrite $a + bX^*$ in point-slope form and exploit independence.

$$Var[a + bX^* - Y^*] = Var[\bar{Y} + b(X^* - \bar{X}) - (\alpha + \beta X^* + \epsilon^*)] = Var[\bar{Y}] + Var[b](X^* - \bar{X})^2 + Var[\epsilon^*]$$

$$= \frac{\sigma^2}{n} + \frac{\sigma^2}{(n-1)S_X^2}(X^* - \bar{X})^2 + \sigma^2 = \sigma^2\left(1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).$$

Example. A mathematics professor at a large university is studying the relationship between scores on a preparation assessment quiz students take on the first day of class and their actual percentage score at the end of class. Assuming the simple linear model with $\sigma = 6$, he takes a random sample of 30 students and discovers their average score on the quiz is $\bar{X} = 54$ with a sample standard deviation of $S_X = 12$, while the average percentage score in the class is $\bar{Y} = 68$ with a sample standard deviation of $S_Y = 10$. The sample correlation is $r[X, Y] = 0.6$. So according to the results above, the least squares line for predicting the course percentage from the preliminary quiz will be $y = 0.5x + 41$. If we wish to use the line to predict the course percentage for someone who scores a 54 on the preliminary quiz, we would find $y = 0.5(54) + 41 = 68$, as expected since someone who gets an average score on the quiz is likely to get around the average percentage in the class. Similarly, if we wish to use the line to predict the course percentage for someone who scores an 80 on the preliminary quiz, we would find $y = 0.5(80) + 41 = 81$. This is also not surprising. Due to the positive correlation, a student scoring above average on the quiz is also likely to score higher in the course.

The previous theorem allows us to go further and calculate a standard deviation associated with these estimates.
For the student who scores a 54 on the preliminary quiz, let $Y^*$ be the actual course percentage and let $a + bX^* = 68$ be the least squares line estimate we made above. Then,

$$Var[a + bX^* - Y^*] = 36\left(1 + \frac{1}{30} + \frac{(54 - 54)^2}{29 \cdot 12^2}\right) = 37.2$$

and so the standard deviation in the predictive error is $SD[a + bX^* - Y^*] \approx 6.1$. This means that students who make an average score of 54 on the preliminary quiz will have a range of percentages in the course. Centered on the prediction, this range is described by a normal distribution with mean 68 and standard deviation 6.1. We could then use normal curve computations to make further predictions about how likely such a student may be to reach a certain benchmark.

Next take the example of a student who scores 80 on the preliminary quiz. The least squares line predicts the course percentage for such a student will be $a + bX^* = 81$, but now

$$Var[a + bX^* - Y^*] = 36\left(1 + \frac{1}{30} + \frac{(80 - 54)^2}{29 \cdot 12^2}\right) \approx 43.0$$

and so $SD[a + bX^* - Y^*] \approx 6.6$. Students who score an 80 on the preliminary quiz will have a range of course percentages described by a normal distribution of mean 81 and standard deviation 6.6. Thinking of the standard deviation as the likely error associated with prediction, this example suggests that predictions of data further from the mean will tend to have less accuracy than predictions near to the mean. This is true in the simple linear model and will be explored in the exercises.
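The arithmetic in this example can be reproduced in a few lines of R. The sketch below uses only the summary figures quoted in the example; `pred_sd` is a hypothetical helper name introduced here for the standard deviation of the predictive error.

```r
# Summary figures from the quiz example in the text.
n <- 30; sigma <- 6
xbar <- 54; s_x <- 12
ybar <- 68; s_y <- 10
r <- 0.6

b <- r * s_y / s_x            # slope of the least squares line
a <- ybar - b * xbar          # intercept, from the point of averages

# Hypothetical helper: SD of the predictive error (a + b * x_new) - Y_new.
pred_sd <- function(x_new) {
  sigma * sqrt(1 + 1 / n + (x_new - xbar)^2 / ((n - 1) * s_x^2))
}

pred <- a + b * c(54, 80)     # predicted course percentages
sds  <- pred_sd(c(54, 80))    # matching predictive standard deviations
round(pred, 1)                # 68 81
round(sds, 1)                 # 6.1 6.6
```

Because `pred_sd` is vectorized, it can be evaluated on a whole grid of quiz scores to see how the predictive standard deviation grows as the score moves away from the average of 54.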

exercises

Ex. Using the figures from the quiz example above, do the following. Two students are selected independently at random. The first scored a 50 on the preliminary quiz while the second scored 60. Determine how likely it is that the student who scored the lower grade on the quiz will score a higher percentage in the course.

Ex. Explain why $Var[a + bX^* - Y^*]$ is minimized when $X^* = \bar{X}$.

10.6 hypothesis testing and regression

As $a$ and $b$ both have a normal distribution under the assumption of the simple linear model, it is possible to perform tests of significance concerning the values of $\alpha$ and $\beta$. Of particular importance is a test with a null hypothesis that $\beta = 0$ and an alternate hypothesis $\beta \neq 0$. This is commonly called a test of utility. The reason for this name is that if $\beta = 0$, then the simple linear model produces output values $Y_j = \alpha + \epsilon_j$ which do not depend on the corresponding input $X_j$. Therefore knowing the value of $X_j$ should not be at all helpful in predicting the corresponding $Y_j$ result. However, if $\beta \neq 0$ then knowing $X_j$ should be at least somewhat useful in predicting the $Y_j$ value.

Example. Suppose $(X_1, Y_1), \dots, (X_{16}, Y_{16})$ follows the simple linear model with $\sigma = 5$ and produces a least squares line $y = a + 1.1x$. Suppose the sample average of the $X_j$ data is 20 and the sample variance is $S_X^2 = 10$. What is the conclusion of a test of utility at a significance level of $\alpha = 0.05$?

From the given least squares line, $b = 1.1$. As noted above, a test of utility compares a null hypothesis that $\beta = 0$ to an alternate hypothesis $\beta \neq 0$, so this will be a two-tailed test. If the null were true, then $E[b] = 0$ and we can use the normal distribution to determine whether the 1.1 value is so far from zero that the null seems unreasonable. Using the same sample-mimicking idea introduced in Chapter 9, we let $Z_1, \dots, Z_{16}$ be random variables produced from $X_1, \dots, X_{16}$ via the simple linear model. From the theorem on the distribution of $b$ in Section 10.4, the slope of the least squares line for the $(X_1, Z_1), \dots, (X_{16}, Z_{16})$ data has a normal distribution with mean $\beta = 0$ and variance $\frac{\sigma^2}{(n-1)S_X^2} = \frac{25}{15 \cdot 10} = \frac{1}{6}$.
Therefore we can calculate

$$P(|\text{slope of the least squares line}| \geq 1.1) = P\left(|Z| \geq \frac{1.1}{\sqrt{1/6}}\right) = 2P\left(Z < -\frac{1.1}{\sqrt{1/6}}\right) \approx 0.007,$$

where $Z \sim \text{Normal}(0, 1)$. As this P-value is less than the significance level, the test rejects the null hypothesis. That is, the test concludes that the slope of 1.1 is far enough from 0 that it demonstrates a true relationship between the $X_j$ input values and the $Y_j$ output values.

exercises

Ex. Continuing with the example above, use the theorem on the distribution of $a$ in Section 10.4 to devise a hypothesis test for determining whether or not the regression line goes through the origin. That is, determine whether or not $\alpha = 0$ is a plausible assumption.

10.7 estimating an unknown σ²

In many cases the variance $\sigma^2$ of the points around the regression line will be an unknown quantity and so, like $\alpha$ and $\beta$, it too will need to be approximated using the $(X_1, Y_1), \dots, (X_n, Y_n)$ data. The following theorem provides an unbiased estimator for $\sigma^2$ using the data.

Theorem. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be data following the simple linear model with $n > 2$. Let

$$S^2 = \frac{1}{n-2}\sum_{j=1}^{n} (Y_j - (a + bX_j))^2.$$

Then $S^2$ is an unbiased estimator for $\sigma^2$. (That is, $E[S^2] = \sigma^2$.)

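Proof - Before looking at $E[S^2]$ in its entirety, we look at three quantities that will be helpful in computing this expected value. First note,

$$Var[Y_j - \bar{Y}] = Var\left[Y_j - \frac{Y_1 + Y_2 + \dots + Y_n}{n}\right] = \frac{1}{n^2}Var\left[(n-1)Y_j - \sum_{i=1, i \neq j}^{n} Y_i\right] = \frac{1}{n^2}\left[(n-1)^2\sigma^2 + \sum_{i=1, i \neq j}^{n} \sigma^2\right]$$

$$= \frac{1}{n^2}\left[(n-1)^2\sigma^2 + (n-1)\sigma^2\right] = \frac{n-1}{n}\sigma^2$$

and therefore,

$$\sum_{j=1}^{n} E[(Y_j - \bar{Y})^2] = \sum_{j=1}^{n} \left(Var[Y_j - \bar{Y}] + (E[Y_j - \bar{Y}])^2\right) = \sum_{j=1}^{n} \left(\frac{n-1}{n}\sigma^2 + ((\alpha + \beta X_j) - (\alpha + \beta\bar{X}))^2\right)$$

$$= \sum_{j=1}^{n} \left(\frac{n-1}{n}\sigma^2 + \beta^2(X_j - \bar{X})^2\right) = (n-1)\sigma^2 + \beta^2\sum_{j=1}^{n} (X_j - \bar{X})^2 = (n-1)\sigma^2 + \beta^2(n-1)S_X^2. \tag{10.7.1}$$

Next,

$$\sum_{j=1}^{n} E[b^2(X_j - \bar{X})^2] = E[b^2]\sum_{j=1}^{n} (X_j - \bar{X})^2 = (Var[b] + (E[b])^2)\,(n-1)S_X^2 = \left(\frac{\sigma^2}{(n-1)S_X^2} + \beta^2\right)(n-1)S_X^2 = \sigma^2 + \beta^2(n-1)S_X^2. \tag{10.7.2}$$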

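Also,

$$E[bY_j] = Cov[b, Y_j] + E[b]E[Y_j] = Cov\left[\sum_{i=1}^{n} \frac{X_i - \bar{X}}{(n-1)S_X^2}\,Y_i,\ Y_j\right] + \beta(\alpha + \beta X_j) = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{(n-1)S_X^2}\,Cov[Y_i, Y_j] + \beta(\alpha + \beta X_j)$$

$$= \frac{X_j - \bar{X}}{(n-1)S_X^2}\,Var[Y_j] + \beta(\alpha + \beta X_j) = \frac{X_j - \bar{X}}{(n-1)S_X^2}\,\sigma^2 + \beta(\alpha + \beta X_j),$$

from which we may determine that

$$\sum_{j=1}^{n} E[(Y_j - \bar{Y})\,b\,(X_j - \bar{X})] = \sum_{j=1}^{n} (X_j - \bar{X})E[Y_jb] - \sum_{j=1}^{n} (X_j - \bar{X})E[\bar{Y}b]$$

$$= \sum_{j=1}^{n} (X_j - \bar{X})\left(\frac{X_j - \bar{X}}{(n-1)S_X^2}\,\sigma^2 + \beta(\alpha + \beta X_j)\right) - \sum_{j=1}^{n} (X_j - \bar{X})E[\bar{Y}]E[b]$$

$$= \sum_{j=1}^{n} \frac{(X_j - \bar{X})^2}{(n-1)S_X^2}\,\sigma^2 + \sum_{j=1}^{n} (X_j - \bar{X})\beta(\alpha + \beta X_j) - \sum_{j=1}^{n} (X_j - \bar{X})(\alpha + \beta\bar{X})\beta$$

$$= \sigma^2 + \sum_{j=1}^{n} (X_j - \bar{X})\beta^2(X_j - \bar{X}) = \sigma^2 + \beta^2(n-1)S_X^2. \tag{10.7.3}$$

Finally, putting together the results from equations (10.7.1), (10.7.2), and (10.7.3) we find

$$E\left[\sum_{j=1}^{n} (Y_j - (a + bX_j))^2\right] = E\left[\sum_{j=1}^{n} (Y_j - (\bar{Y} + b(X_j - \bar{X})))^2\right] = E\left[\sum_{j=1}^{n} ((Y_j - \bar{Y}) - b(X_j - \bar{X}))^2\right]$$

$$= E\left[\sum_{j=1}^{n} \left((Y_j - \bar{Y})^2 - 2(Y_j - \bar{Y})b(X_j - \bar{X}) + b^2(X_j - \bar{X})^2\right)\right]$$

$$= \sum_{j=1}^{n} E[(Y_j - \bar{Y})^2] - 2\sum_{j=1}^{n} E[(Y_j - \bar{Y})b(X_j - \bar{X})] + \sum_{j=1}^{n} E[b^2(X_j - \bar{X})^2]$$

$$= \left((n-1)\sigma^2 + \beta^2(n-1)S_X^2\right) - 2\left(\sigma^2 + \beta^2(n-1)S_X^2\right) + \left(\sigma^2 + \beta^2(n-1)S_X^2\right) = (n-2)\sigma^2.$$

Hence

$$E[S^2] = E\left[\frac{1}{n-2}\sum_{j=1}^{n} (Y_j - (a + bX_j))^2\right] = \sigma^2$$

as desired.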
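The unbiasedness of $S^2$ can also be checked empirically. The R sketch below simulates many data sets from the simple linear model and averages the resulting values of $S^2$. The parameter values, x-values, seed, and number of repetitions are assumptions made for illustration, and `s2_once` is a hypothetical helper name.

```r
set.seed(7)
alpha <- 1; beta <- 2; sigma <- 3       # illustrative parameters (assumed)
x <- 1:10                               # fixed x-values
n <- length(x)

# Hypothetical helper: one simulated data set -> one value of S^2.
s2_once <- function() {
  y <- alpha + beta * x + rnorm(n, mean = 0, sd = sigma)
  b <- cov(x, y) / var(x)               # least squares slope, S_XY / S_X^2
  a <- mean(y) - b * mean(x)            # least squares intercept
  sum((y - (a + b * x))^2) / (n - 2)    # sum of squared residuals over n - 2
}

s2 <- replicate(20000, s2_once())
mean(s2)                                # should be close to sigma^2 = 9
```

Dividing the sum of squared residuals by $n$ or $n - 1$ instead of $n - 2$ would, by the calculation in the proof, give an estimator whose expected value is smaller than $\sigma^2$.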

