University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions

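Statistics 100C
Instructor: Nicolas Christou

EXERCISE 1
Answer the following questions:

a. Suppose $y_1, y_2, \ldots, y_n$ are independent random variables and $y_i = \mu + \epsilon_i$ for $i = 1, \ldots, n$. Assume that $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$. Find the least squares estimate of $\mu$. Give the variance of this estimate.

We want to minimize $S = \sum_{i=1}^n (y_i - \mu)^2$ with respect to $\mu$. Therefore,
$$\frac{\partial S}{\partial \mu} = -2 \sum_{i=1}^n (y_i - \mu) = 0.$$
Solve for $\mu$ to get $\hat\mu = \bar Y$, and $\mathrm{var}(\bar Y) = \sigma^2 / n$.

b. Consider the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Assume that $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. In addition, it is given that $\sum x_i = 0$. What are the least squares estimates of $\beta_0$ and $\beta_1$?

If $\bar x = 0$ we get
$$\hat\beta_1 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad \hat\beta_0 = \bar y.$$

c. We have shown that $\hat y_i$ can be expressed as $\hat y_i = h_{ii} y_i + \sum_{j \neq i} h_{ij} y_j$. Use this expression to find $\mathrm{var}(\hat y_i)$.

Since $Y_1, Y_2, \ldots, Y_n$ are independent we find that
$$\mathrm{var}(\hat y_i) = h_{ii}^2 \sigma^2 + \sum_{j \neq i} h_{ij}^2 \sigma^2 = \sigma^2 \sum_{j=1}^n h_{ij}^2.$$
This is simplified as follows:
$$h_{ij}^2 = \left[\frac{1}{n} + \frac{(x_i - \bar x)(x_j - \bar x)}{\sum_k (x_k - \bar x)^2}\right]^2 = \frac{1}{n^2} + \frac{(x_i - \bar x)^2 (x_j - \bar x)^2}{\left[\sum_k (x_k - \bar x)^2\right]^2} + \frac{2 (x_i - \bar x)(x_j - \bar x)}{n \sum_k (x_k - \bar x)^2},$$
and after summing over $j$ we get
$$\sum_{j=1}^n h_{ij}^2 = \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_k (x_k - \bar x)^2} + 0 = h_{ii},$$
because $\sum_j (x_j - \bar x) = 0$. Therefore $\mathrm{var}(\hat y_i) = \sigma^2 h_{ii}$.

d. Find an expression for $\mathrm{corr}(e_i, e_j)$ in terms of $h_{ii}$, $h_{jj}$, $h_{ij}$.

In the homework (exercise 6) we found that
$$\mathrm{cov}(e_i, e_j) = -\sigma^2 \left[\frac{1}{n} + \frac{(x_i - \bar x)(x_j - \bar x)}{\sum_k (x_k - \bar x)^2}\right] = -\sigma^2 h_{ij}.$$
In addition, from the class notes, $\mathrm{var}(e_i) = \sigma^2 (1 - h_{ii})$ and $\mathrm{var}(e_j) = \sigma^2 (1 - h_{jj})$. Therefore,
$$\mathrm{corr}(e_i, e_j) = \frac{\mathrm{cov}(e_i, e_j)}{\mathrm{sd}(e_i)\,\mathrm{sd}(e_j)} = \frac{-\sigma^2 h_{ij}}{\sigma \sqrt{1 - h_{ii}}\; \sigma \sqrt{1 - h_{jj}}} = \frac{-h_{ij}}{\sqrt{(1 - h_{ii})(1 - h_{jj})}}.$$

EXERCISE 2
Answer the following questions:

a. Consider the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Assume that $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Suppose we rescale the $x$ values as $x^* = x - \alpha$, and we want to fit the model $y_i = \beta_0^* + \beta_1^* x_i^* + \epsilon_i$. Find the least squares estimates of $\beta_0^*$ and $\beta_1^*$.

The new sample mean of $x$ is $\bar x - \alpha$, so the deviations $x_i^* - \bar x^* = x_i - \bar x$ are unchanged. Therefore, $\hat\beta_1$ will not change: $\hat\beta_1^* = \hat\beta_1$. But
$$\hat\beta_0^* = \bar y - \hat\beta_1 (\bar x - \alpha) = \bar y - \hat\beta_1 \bar x + \alpha \hat\beta_1 = \hat\beta_0 + \alpha \hat\beta_1.$$

b. Refer to the model $y_i = \beta_0^* + \beta_1^* x_i^* + \epsilon_i$ of part (a). Find the SSE of this model and compare it to the SSE of the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. What is your conclusion?

$SSE^* = SST^* - SSR^*$. Note: SST is the same, because we only rescale $x$. Will SSR change?
$$SSR^* = \hat\beta_1^{*2} \sum \left[x_i - \alpha - (\bar x - \alpha)\right]^2 = \hat\beta_1^2 \sum (x_i - \bar x)^2 = SSR.$$
Therefore, $SSE^* = SSE$.

c. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Show that $E(S_{YY}) = (n - 1)\sigma^2 + \beta_1^2 S_{XX}$, where $S_{YY} = \sum (y_i - \bar y)^2$ and $S_{XX} = \sum (x_i - \bar x)^2$.

$$E(S_{YY}) = E(SST) = E(SSE) + E(SSR) = E\left[(n - 2) s_e^2\right] + E\left[\hat\beta_1^2 \sum (x_i - \bar x)^2\right]$$
$$= (n - 2) E(s_e^2) + \sum (x_i - \bar x)^2 \, E(\hat\beta_1^2) = (n - 2)\sigma^2 + \sum (x_i - \bar x)^2 \left[\mathrm{var}(\hat\beta_1) + \left(E \hat\beta_1\right)^2\right]$$
$$= (n - 2)\sigma^2 + \sum (x_i - \bar x)^2 \left[\frac{\sigma^2}{\sum (x_i - \bar x)^2} + \beta_1^2\right] = (n - 1)\sigma^2 + \beta_1^2 S_{XX}.$$

d. Refer to the model of part (c). Find $\mathrm{cov}(\epsilon_i, e_i)$.

$$\mathrm{cov}(\epsilon_i, e_i) = \mathrm{cov}\left(\epsilon_i,\; y_i - \bar y - \hat\beta_1 (x_i - \bar x)\right) = \mathrm{cov}(\epsilon_i, y_i) - \mathrm{cov}(\epsilon_i, \bar y) - (x_i - \bar x)\,\mathrm{cov}(\epsilon_i, \hat\beta_1)$$
$$= \sigma^2 - \frac{\sigma^2}{n} - \frac{(x_i - \bar x)^2 \sigma^2}{\sum_k (x_k - \bar x)^2} = \sigma^2 \left[1 - \frac{1}{n} - \frac{(x_i - \bar x)^2}{\sum_k (x_k - \bar x)^2}\right] = \sigma^2 (1 - h_{ii}).$$

EXERCISE 3
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma^2)$. Suppose we want to test simultaneously
$H_0$: $\beta_1 = \beta_1^*$ and $\beta_0 = \beta_0^*$
$H_a$: The hypothesis $H_0$ is not true.
Answer the following questions:

a. In the expression $Q = \sum (y_i - \beta_0^* - \beta_1^* x_i)^2$, if we add and subtract $\hat\beta_0$ and add and subtract $\hat\beta_1 x_i$, show that
$$Q = \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 + n (\hat\beta_0 - \beta_0^*)^2 + (\hat\beta_1 - \beta_1^*)^2 \sum x_i^2 + 2 n \bar x (\hat\beta_0 - \beta_0^*)(\hat\beta_1 - \beta_1^*).$$

$$Q = \sum (y_i - \beta_0^* - \beta_1^* x_i)^2 = \sum \left(y_i - \beta_0^* - \beta_1^* x_i + \hat\beta_0 - \hat\beta_0 + \hat\beta_1 x_i - \hat\beta_1 x_i\right)^2$$
$$= \sum \left[(y_i - \hat\beta_0 - \hat\beta_1 x_i) + (\hat\beta_0 - \beta_0^*) + (\hat\beta_1 - \beta_1^*) x_i\right]^2$$
$$= \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 + n (\hat\beta_0 - \beta_0^*)^2 + (\hat\beta_1 - \beta_1^*)^2 \sum x_i^2$$
$$\quad + 2 (\hat\beta_0 - \beta_0^*) \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i) \quad \text{(this is zero because } \textstyle\sum e_i = 0\text{)}$$
$$\quad + 2 (\hat\beta_1 - \beta_1^*) \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i) x_i \quad \text{(this is zero because } \textstyle\sum e_i x_i = 0\text{)}$$
$$\quad + 2 (\hat\beta_0 - \beta_0^*)(\hat\beta_1 - \beta_1^*) \sum x_i \quad \text{(but } \textstyle\sum x_i = n \bar x\text{)}$$
$$= \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 + n (\hat\beta_0 - \beta_0^*)^2 + (\hat\beta_1 - \beta_1^*)^2 \sum x_i^2 + 2 n \bar x (\hat\beta_0 - \beta_0^*)(\hat\beta_1 - \beta_1^*).$$

b. Let $D = \hat\beta_0 + \hat\beta_1 \bar x$. Show that the random variables $\hat\beta_1$ and $D$ are uncorrelated, and explain why $\hat\beta_1$ and $D$ must therefore be independent.

$$\mathrm{cov}(D, \hat\beta_1) = \mathrm{cov}(\hat\beta_0 + \hat\beta_1 \bar x,\; \hat\beta_1) = \mathrm{cov}(\hat\beta_0, \hat\beta_1) + \bar x\, \mathrm{var}(\hat\beta_1) = -\frac{\bar x \sigma^2}{\sum (x_i - \bar x)^2} + \frac{\bar x \sigma^2}{\sum (x_i - \bar x)^2} = 0.$$
They are independent because they are bivariate normal (both are linear combinations of the normal $y_i$), and uncorrelated jointly normal random variables are independent.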
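The decomposition of $Q$ in Exercise 3(a) is an algebraic identity, so it can be checked numerically on any data set. The following is a minimal sketch in R, assuming simulated data; the names `b0_star` and `b1_star` and their values are hypothetical stand-ins for the hypothesized intercept and slope and do not come from the exercise.

```r
# Numerical check of the decomposition of Q in Exercise 3(a).
# The data and the hypothesized values are made up for illustration.
set.seed(1)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)
b0 <- unname(coef(fit)[1])   # least squares intercept
b1 <- unname(coef(fit)[2])   # least squares slope

b0_star <- 1.5               # hypothesized intercept (assumed value)
b1_star <- 0.7               # hypothesized slope (assumed value)

# Left-hand side: Q evaluated at the hypothesized values
Q <- sum((y - b0_star - b1_star * x)^2)

# Right-hand side: SSE plus the three extra terms
sse <- sum(resid(fit)^2)
extra <- n * (b0 - b0_star)^2 +
  (b1 - b1_star)^2 * sum(x^2) +
  2 * n * mean(x) * (b0 - b0_star) * (b1 - b1_star)

decomposition_holds <- isTRUE(all.equal(Q, sse + extra))
print(decomposition_holds)
```

The cross terms vanish only because the residuals of the least squares fit satisfy $\sum e_i = 0$ and $\sum e_i x_i = 0$, which is why the check uses the coefficients returned by `lm`.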

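c. Show that the sum of the last three terms in part (a) is equal to
$$\sigma^2 \left[\frac{(\hat\beta_1 - \beta_1^*)^2}{\mathrm{var}(\hat\beta_1)} + \frac{(D - \beta_0^* - \beta_1^* \bar x)^2}{\mathrm{var}(D)}\right].$$

First let's find $\mathrm{var}(D)$:
$$\mathrm{var}(D) = \mathrm{var}(\hat\beta_0 + \hat\beta_1 \bar x) = \mathrm{var}(\hat\beta_0) + \bar x^2\, \mathrm{var}(\hat\beta_1) + 2 \bar x\, \mathrm{cov}(\hat\beta_0, \hat\beta_1)$$
$$= \sigma^2 \left[\frac{1}{n} + \frac{\bar x^2}{\sum (x_i - \bar x)^2}\right] + \frac{\bar x^2 \sigma^2}{\sum (x_i - \bar x)^2} - \frac{2 \bar x^2 \sigma^2}{\sum (x_i - \bar x)^2} = \frac{\sigma^2}{n}.$$
And now the proof. Using $\mathrm{var}(\hat\beta_1) = \sigma^2 / \sum (x_i - \bar x)^2$ and $D - \beta_0^* - \beta_1^* \bar x = (\hat\beta_0 - \beta_0^*) + \bar x (\hat\beta_1 - \beta_1^*)$,
$$\sigma^2 \left[\frac{(\hat\beta_1 - \beta_1^*)^2}{\mathrm{var}(\hat\beta_1)} + \frac{(D - \beta_0^* - \beta_1^* \bar x)^2}{\mathrm{var}(D)}\right] = (\hat\beta_1 - \beta_1^*)^2 \sum (x_i - \bar x)^2 + n \left[(\hat\beta_0 - \beta_0^*) + \bar x (\hat\beta_1 - \beta_1^*)\right]^2$$
$$= (\hat\beta_1 - \beta_1^*)^2 \left[\sum (x_i - \bar x)^2 + n \bar x^2\right] + n (\hat\beta_0 - \beta_0^*)^2 + 2 n \bar x (\hat\beta_0 - \beta_0^*)(\hat\beta_1 - \beta_1^*)$$
$$= n (\hat\beta_0 - \beta_0^*)^2 + (\hat\beta_1 - \beta_1^*)^2 \sum x_i^2 + 2 n \bar x (\hat\beta_0 - \beta_0^*)(\hat\beta_1 - \beta_1^*),$$
because $\sum (x_i - \bar x)^2 + n \bar x^2 = \sum x_i^2$.

d. If $H_0$ is true, what are the degrees of freedom of the random variables
$$\frac{(\hat\beta_1 - \beta_1^*)^2}{\mathrm{var}(\hat\beta_1)} \quad \text{and} \quad \frac{(D - \beta_0^* - \beta_1^* \bar x)^2}{\mathrm{var}(D)}?$$

Since $\hat\beta_1 \sim N\left(\beta_1, \sigma^2 / \sum (x_i - \bar x)^2\right)$ and $D \sim N\left(\beta_0 + \beta_1 \bar x, \sigma^2 / n\right)$, it follows that under $H_0$
$$\frac{(\hat\beta_1 - \beta_1^*)^2}{\mathrm{var}(\hat\beta_1)} \sim \chi^2_1 \quad \text{and} \quad \frac{(D - \beta_0^* - \beta_1^* \bar x)^2}{\mathrm{var}(D)} \sim \chi^2_1.$$

EXERCISE 4
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma^2)$. Answer the following questions:

a. Find $E(Y_i^2)$.
$$E(Y_i^2) = \mathrm{var}(Y_i) + \left[E(Y_i)\right]^2 = \sigma^2 + (\beta_0 + \beta_1 x_i)^2.$$

b. Find the distribution of $\bar Y$.
$$\bar Y \sim N\left(\beta_0 + \beta_1 \bar x,\; \frac{\sigma^2}{n}\right).$$

c. Find $E(\bar Y^2)$.
$$E(\bar Y^2) = \mathrm{var}(\bar Y) + \left[E(\bar Y)\right]^2 = \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar x)^2.$$

d. Find $\mathrm{cov}(\epsilon_i, \hat\beta_1)$.
With $\hat\beta_1 = \sum_j k_j y_j$ and $k_j = (x_j - \bar x) / \sum_k (x_k - \bar x)^2$,
$$\mathrm{cov}(\epsilon_i, \hat\beta_1) = \mathrm{cov}\left(\epsilon_i, \sum_j k_j y_j\right) = \sum_j k_j\, \mathrm{cov}(\epsilon_i, y_j) = k_i \sigma^2 + 0 = k_i \sigma^2.$$
Note: This is the same as $\mathrm{cov}(y_i, \hat\beta_1)$. Also, $\mathrm{cov}(\bar y, \hat\beta_1) = 0$.

e. Suppose $E(Y_i) = \beta_0 + \beta_1 x_i$, but that the $Y_i$'s are not necessarily independent or normally distributed and do not necessarily have equal variances. Are $\hat\beta_0$ and $\hat\beta_1$ unbiased estimators of $\beta_0$ and $\beta_1$?

Yes, $\hat\beta_1$ and $\hat\beta_0$ are still unbiased. Taking the expectation of $\hat\beta_1$ and $\hat\beta_0$ uses only the linearity of expectation and $E(Y_i) = \beta_0 + \beta_1 x_i$; it does not involve the independence, normality, or equal-variance assumptions.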
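The argument in Exercise 4(e) rests on writing $\hat\beta_1 = \sum_j k_j y_j$ with weights that satisfy $\sum_j k_j = 0$ and $\sum_j k_j x_j = 1$, so that $E(\hat\beta_1) = \beta_1$ follows from the means of the $Y_j$ alone. A minimal sketch in R, assuming simulated predictor values and made-up parameter values (`beta0`, `beta1` are illustrative, not from the exercise):

```r
# Exercise 4(e): unbiasedness only needs E(Y_j) = beta0 + beta1 * x_j.
# Predictor values and parameters below are assumed for illustration.
set.seed(2)
n <- 25
x <- runif(n, 1, 20)
beta0 <- 3
beta1 <- 1.2

# Weights k_j in beta1_hat = sum_j k_j y_j
k <- (x - mean(x)) / sum((x - mean(x))^2)

# Exact expectations, computed from the means of the Y_j only
mean_y <- beta0 + beta1 * x
expected_b1 <- sum(k * mean_y)
expected_b0 <- mean(mean_y) - expected_b1 * mean(x)

# One heteroscedastic draw: the weight form still reproduces lm's slope
y <- mean_y + rnorm(n, sd = 0.3 * x)
b1_weights <- sum(k * y)
b1_lm <- unname(coef(lm(y ~ x))[2])

print(c(sum_k = sum(k), sum_kx = sum(k * x)))
print(c(expected_b0 = expected_b0, expected_b1 = expected_b1))
```

No variance or covariance of the errors enters the computation of `expected_b1` or `expected_b0`, which is the point of the exercise; the error structure affects the variance of the estimators, not their expectation.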

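EXERCISE 5
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma^2)$. In this numerical example, $y$ represents the concentration of lead in ppm and $x$ represents the concentration of zinc in ppm of soil at a particular area of interest. The sample size was 5. These data gave the following results:

$\sum (y_i - \bar y)(\hat y_i - \bar y) = 7076708$
$\sum (y_i - \bar y)^2 = 73376$
$\sum (x_i - \bar x)^2 = 560$
$\sum x_i^2 = 50706$
$\bar y = 64$

Answer the following questions:

a. Find $\hat\beta_1$. Answer: 098
b. Find $\hat\beta_0$. Answer: 58346
c. Compute $s_e^2$. Answer: 9696
d. Compute the value of the $F$ statistic in testing the hypothesis $H_0$: $\beta_1 = 0$ against $H_a$: $\beta_1 \neq 0$. Answer: 3593
e. Compute $\mathrm{var}(\hat\beta_0)$. Answer: 46886

EXERCISE 6
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$, and $\epsilon_i \sim N(0, \sigma^2)$. Answer the following questions:

a. Find $\mathrm{cov}(\hat y_i, y_i)$.
$$\mathrm{cov}(\hat y_i, y_i) = \mathrm{cov}\left(\bar y + \hat\beta_1 (x_i - \bar x),\; y_i\right) = \mathrm{cov}(\bar y, y_i) + (x_i - \bar x)\, \mathrm{cov}(y_i, \hat\beta_1)$$
$$= \frac{\sigma^2}{n} + \frac{(x_i - \bar x)^2 \sigma^2}{\sum_k (x_k - \bar x)^2} = \sigma^2 h_{ii}.$$

b. Find $\mathrm{cov}\left(\epsilon_i, \sum_j e_j\right)$.
Since $\sum_j e_j = 0$, it follows that $\mathrm{cov}(\epsilon_i, 0) = 0$.

EXERCISE 7
Three variables $N$, $D$, and $Y$ all have zero sample means and unit sample variances. A fourth variable is $C = N + D$. In the regression of $C$ on $Y$, the slope is 0.8. In the regression of $C$ on $N$, the slope is 0.5. In the regression of $D$ on $Y$, the slope is 0.4. What is the error sum of squares in the regression of $C$ on $D$? There are $n = 21$ observations.

$C = N + D$, so $\mathrm{var}(C) = \mathrm{var}(N) + \mathrm{var}(D) + 2\,\mathrm{cov}(N, D) = 2 + 2\,\mathrm{cov}(N, D)$.

From the two simple regressions we have:
$\frac{\mathrm{cov}(C, Y)}{\mathrm{var}(Y)} = 0.8$, so $\mathrm{cov}(C, Y) = 0.8$. But $\mathrm{cov}(C, Y) = \mathrm{cov}(N + D, Y) = \mathrm{cov}(N, Y) + \mathrm{cov}(D, Y)$.
Also, $\frac{\mathrm{cov}(C, N)}{\mathrm{var}(N)} = 0.5$, so $\mathrm{cov}(C, N) = 0.5$. But $\mathrm{cov}(C, N) = \mathrm{cov}(N + D, N) = \mathrm{var}(N) + \mathrm{cov}(D, N) = 0.5$, so $\mathrm{cov}(D, N) = -0.5$.

Therefore, $\mathrm{var}(C) = 1 + 1 + 2(-0.5) = 1$.
Also, $\mathrm{cov}(C, D) = \mathrm{cov}(N + D, D) = \mathrm{cov}(N, D) + \mathrm{var}(D) = -0.5 + 1 = 0.5$.
To find the slope of the regression of $C$ on $D$: $\hat\beta_1 = \frac{\mathrm{cov}(C, D)}{\mathrm{var}(D)} = 0.5$.
Finally,
$$SSE = SST - SSR = (n - 1) s_C^2 - \hat\beta_1^2 (n - 1) s_D^2 = 20(1) - 0.5^2 (20)(1) = 15.$$

EXERCISE 8
Answer the following questions:

a. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$, and $\epsilon_i \sim N(0, \sigma^2)$. Show that the correlation coefficient between $\hat\beta_0$ and $\hat\beta_1$ is
$$\frac{-\bar x}{\sqrt{\frac{1}{n} \sum x_i^2}}.$$

$$\mathrm{corr}(\hat\beta_0, \hat\beta_1) = \frac{\mathrm{cov}(\hat\beta_0, \hat\beta_1)}{\mathrm{sd}(\hat\beta_0)\, \mathrm{sd}(\hat\beta_1)} = \frac{-\dfrac{\bar x \sigma^2}{\sum (x_i - \bar x)^2}}{\sigma \sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{\sum (x_i - \bar x)^2}} \cdot \dfrac{\sigma}{\sqrt{\sum (x_i - \bar x)^2}}}$$
$$= \frac{-\dfrac{\bar x}{\sum (x_i - \bar x)^2}}{\sqrt{\dfrac{\sum x_i^2}{n \sum (x_i - \bar x)^2}} \cdot \dfrac{1}{\sqrt{\sum (x_i - \bar x)^2}}} = \frac{-\bar x}{\sqrt{\frac{1}{n} \sum x_i^2}},$$
where the second step uses $\sum (x_i - \bar x)^2 + n \bar x^2 = \sum x_i^2$.

b. Refer to the model of part (a). Given that $\bar x = 0$, derive an $F$ statistic for testing the hypothesis $H_0$: $\beta_1 = \beta_0$ against the alternative $H_a$: $\beta_1 \neq \beta_0$. Follow these steps:

1. Find the distribution of $\hat\beta_1 - \hat\beta_0$. This is
$$N\left(\beta_1 - \beta_0,\; \mathrm{var}(\hat\beta_1) + \mathrm{var}(\hat\beta_0) - 2\,\mathrm{cov}(\hat\beta_1, \hat\beta_0)\right),$$
and with $\bar x = 0$ the covariance is zero, $\mathrm{var}(\hat\beta_1) = \sigma^2 / \sum x_i^2$, and $\mathrm{var}(\hat\beta_0) = \sigma^2 / n$.
2. And also
$$\frac{(n - 2) s_e^2}{\sigma^2} \sim \chi^2_{n-2}.$$
Using 1 and 2 we can create a ratio that follows the $F$ distribution with degrees of freedom $1, n - 2$ under $H_0$:
$$F = \frac{(\hat\beta_1 - \hat\beta_0)^2}{s_e^2 \left(\dfrac{1}{\sum x_i^2} + \dfrac{1}{n}\right)} \sim F_{1, n-2},$$
which can also be obtained as the square of the corresponding $t_{n-2}$ statistic.

EXERCISE 9
Access the following data in R:

    a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics100c/soil_complete.txt", header=TRUE)

Answer the following questions:

a. Run the regression of cadmium on zinc. Attach the R output.

    q <- lm(a$cadmium ~ a$zinc)
    summary(q)

    Call:
    lm(formula = a$cadmium ~ a$zinc)

    Residuals:
    Min 1Q Median 3Q Max
    -40976 -0785 008 0607 4539

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0885463 0855 -478 404e-06 ***
    a$zinc 0008795 00003 884 < 2.2e-16 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 47 on 53 degrees of freedom
    Multiple R-squared: 08394, Adjusted R-squared: 08384
    F-statistic: 800 on 1 and 53 DF, p-value: < 2.2e-16

b. Compute the leverage values.

    leverage <- influence(q)$hat
    head(leverage)
    #List the first 6 leverage values:
    1 2 3 4 5
    0050933 00867869 0007849009 00086300 0008393
    6
    000867903

c. Suppose the 0th observation is deleted. Give the formula that computes the new $\hat\beta_1$ and $\hat\beta_0$. Use R to compute them and attach the code.

The formula for computing $\hat\beta_1$ after point $i$ is deleted is given by
$$\hat\beta_{1(i)} = \frac{\hat\beta_1}{1 - h_{ii}} \left[\frac{n - 1}{n} - \frac{(x_i - \bar x)(y_i - \bar y)}{\sum_j (x_j - \bar x)(y_j - \bar y)}\right],$$
and $\hat\beta_0$ after point $i$ is deleted is given by
$$\hat\beta_{0(i)} = \bar y_{(i)} - \hat\beta_{1(i)} \bar x_{(i)},$$
where $\bar y_{(i)}$ and $\bar x_{(i)}$ are the sample means of $y$ and $x$ after observation $i$ is deleted from the data set.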
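The deleted-point estimates of Exercise 9(c) can be checked against a direct refit. The following is a sketch only: it uses simulated data rather than the soil data, the index `i <- 10` is an arbitrary choice, and the slope is written as $\hat\beta_{1(i)} = \frac{\hat\beta_1}{1 - h_{ii}}\left[\frac{n-1}{n} - \frac{(x_i - \bar x)(y_i - \bar y)}{\sum_j (x_j - \bar x)(y_j - \bar y)}\right]$, which is one algebraically valid form and may be arranged differently from the form used in class.

```r
# Exercise 9(c): slope and intercept after deleting observation i,
# compared with a direct refit. Data are simulated stand-ins.
set.seed(3)
n <- 40
x <- runif(n, 100, 1000)          # stand-in for the predictor
y <- -1 + 0.01 * x + rnorm(n)     # stand-in for the response
i <- 10                           # deleted observation (assumed index)

fit <- lm(y ~ x)
b1  <- unname(coef(fit)[2])
h_i <- unname(hatvalues(fit)[i])  # leverage of observation i

cx <- x - mean(x)
cy <- y - mean(y)

# Closed-form slope after deleting observation i
b1_del <- b1 / (1 - h_i) * ((n - 1) / n - cx[i] * cy[i] / sum(cx * cy))
# Intercept from the means of the remaining observations
b0_del <- mean(y[-i]) - b1_del * mean(x[-i])

# Direct refit without observation i
refit <- lm(y[-i] ~ x[-i])

print(rbind(formula = c(b0_del, b1_del), refit = unname(coef(refit))))
```

With the real data, `x` and `y` would be replaced by the zinc and cadmium columns and `i` by the observation named in the exercise; the formula avoids refitting the model for each deleted point.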

EXERCISE 10
Breast cancer mortality data: The data contain breast cancer mortality ($y$) from 1950 to 1960 and the adult white female populations ($x$) in 1960 for 301 counties in North Carolina, South Carolina, and Georgia. Access the data:

    a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics100c/cancer.txt", sep=",", header=TRUE)

Answer the following questions:
1. Construct a scatterplot of $y$ on $x$.
2. Run the regression through the origin of $y$ on $x$.
3. Check the assumptions.
4. Now run the regression of $\sqrt{y}$ on $\sqrt{x}$.
5. Check the assumptions of the model of question 4.

    #Breast cancer mortality data:
    #Read the data:
    a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics100c/cancer.txt", sep=",", header=TRUE)

    #See the names of the variables:
    names(a)

    #Plot y on x:
    plot(a$x, a$y)
    #We see non-constant variance.

    #Run the regression of y on x without the intercept:
    q <- lm(a$y ~ a$x + 0)

    #See summary of the regression:
    summary(q)

    #Non-constant variance can be detected with the following two plots:
    #Residuals on fitted values:
    plot(fitted(q), resid(q))
    #Residuals on x:
    plot(a$x, resid(q))

    #One suggestion is to transform the variables (take square roots):
    #Run the regression on the transformed variables:
    q1 <- lm(sqrt(a$y) ~ sqrt(a$x) + 0)

    #See summary of the new regression:
    summary(q1)

    #Make some plots:
    #First scatterplot of the transformed variables:
    plot(sqrt(a$x), sqrt(a$y))
    #Then plot of residuals on fitted values of the regression on the transformed variables:
    plot(fitted(q1), resid(q1))
    #And residuals on sqrt(x):
    plot(sqrt(a$x), resid(q1))

    #These plots using the transformed variables show that the variance is much closer to constant than before.
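The effect of the square-root transformation in Exercise 10 can be illustrated without the data file. The sketch below is an assumption-laden stand-in: it simulates Poisson counts whose mean is proportional to a made-up population size (the rate `0.002` and the range of `x` are arbitrary), and it summarizes "non-constant variance" by the correlation between the absolute residuals and the fitted values, a rough numerical companion to the residual plots.

```r
# Exercise 10 illustration on simulated counts (not the cancer data).
set.seed(4)
n <- 500
x <- runif(n, 500, 50000)           # simulated population sizes (assumed)
y <- rpois(n, lambda = 0.002 * x)   # counts whose variance grows with x

# Regression through the origin on the raw and on the square-root scale
fit_raw  <- lm(y ~ x + 0)
fit_sqrt <- lm(sqrt(y) ~ sqrt(x) + 0)

# Rough spread diagnostic: do absolute residuals grow with fitted values?
spread_raw  <- cor(abs(resid(fit_raw)),  fitted(fit_raw))
spread_sqrt <- cor(abs(resid(fit_sqrt)), fitted(fit_sqrt))

print(c(raw = spread_raw, sqrt = spread_sqrt))
```

For count-like responses the variance is roughly proportional to the mean, which is why the square root is a natural variance-stabilizing choice; whether the same holds for the actual mortality data should be judged from the residual plots.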