Lecture 2: Linear Least Squares Regression
Dave Armstrong
UW Milwaukee
February 8, 2016

Is the Relationship Linear?

library(car)
data(Davis)
ind <- which(Davis$weight > 150)
Davis$weight[ind] <- NA
with(Davis, plot(repwt, weight))
with(na.omit(Davis), lines(lowess(repwt, weight), col="red", lwd=2))

[Figure: scatterplot of weight against repwt with a lowess smooth in red]

Linear Relationships

If relationships look linear, we can describe them with a linear equation:

Deterministic: $Y = A + BX$
Stochastic: $Y_i = A + BX_i + E_i$

where $A$ is the y-intercept and $B$ is the slope. The systematic part of the equation is also called $\hat{Y}$:

$Y_i = A + BX_i + E_i = \hat{Y}_i + E_i$

$\hat{Y}$ is also called the predicted or fitted value; it is the value we expect $Y$ to take when $X$ takes on a particular value.

Geometry of the linear model
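The identity $Y_i = \hat{Y}_i + E_i$ is easy to verify in R. A minimal sketch with made-up data (the numbers here are assumptions for illustration, not from the slides): fitted() returns $\hat{Y}$, residuals() returns $E$, and the two add back up to $Y$.

```r
## Toy data, assumed for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.5, 2, 3, 6, 5)
m <- lm(y ~ x)
yhat <- fitted(m)   # fitted values: Yhat = A + B*x
## Yhat really is A + B*x ...
all.equal(unname(yhat), unname(coef(m)[1] + coef(m)[2]*x))
## ... and Y decomposes into Yhat + E
all.equal(y, unname(yhat + residuals(m)))
```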
Finding the Line I

In the Davis data, we might use the line $Y = 0 + 1X + E$, indicating that on average, people report their actual weight accurately.

with(Davis, plot(repwt, weight))
abline(a=0, b=1, col="red", lwd=2)

[Figure: scatterplot of weight against repwt with the line Y = X in red]

This appears to describe the data relatively well, so why not just use this one?

Finding the Line II

So, why not just use the line $Y = 0 + 1X + E$? We could here, but often we will not have sufficiently strong theory to guide our search. How do we know this is the best line? Without knowing that our line is the best one to describe this relationship, we can't be sure that someone else will come along with better results than ours that provide a different explanation of the results. We want to find the line that makes the residuals as small as possible.

What do we mean by Small?

What do we mean by small when we talk about the residuals?

We could mean make $\sum E_i$ as small as possible; however, this is an unhelpful quantity, as any line that passes through $(\bar{X}, \bar{Y})$ has $\sum E_i = 0$.

We could mean make $\sum |E_i|$ as small as possible. This is Least Absolute Values (LAV) regression, which does have some desirable properties, but also some undesirable ones, so we leave this strategy alone right now.

We could mean make $\sum E_i^2$ as small as possible. This is (Ordinary) Least Squares (OLS) regression, which we focus on for the remainder of the course.

Least Squares Regression

Remember, we can express the residual as a function of $Y$ and $\hat{Y}$: $E_i = Y_i - \hat{Y}_i$. We want to find the values of $A$ and $B$ that make the sum of squared residuals as small as possible. First, we can recognize that the residuals are functions of $A$ and $B$ (remember what a function is?):

$S(A, B) = \sum E_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum \left(Y_i - (A + BX_i)\right)^2 = \sum (Y_i - A - BX_i)^2$
$\qquad = \sum \left(Y_i^2 - 2AY_i - 2BX_iY_i + 2ABX_i + A^2 + B^2X_i^2\right)$
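The expansion of $S(A,B)$ can be checked numerically. A small sketch with made-up data and candidate values A = 1, B = 0.9 (all assumptions for illustration, not from the slides): the compact and expanded forms give the same sum.

```r
X <- c(1, 2, 3, 4, 5)
Y <- c(2.5, 2, 3, 6, 5)
## Compact form of the sum of squared residuals
S <- function(A, B) sum((Y - A - B*X)^2)
## Expanded (multiplied-out) form from the slide
S.exp <- function(A, B) sum(Y^2 - 2*A*Y - 2*B*X*Y + 2*A*B*X + A^2 + B^2*X^2)
S(1, 0.9)      # 3.7
S.exp(1, 0.9)  # 3.7, identical
```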
Elementary Scalar Calculus

A derivative tells us how a function of $x$ behaves given arbitrarily small changes in $x$. The derivative gives us the slope of the line tangent to the curve. To maximize or minimize a function, we have to set its first derivative to zero and solve for the part in which we are interested. This is what we will do when solving for $a$ and $b$ in the linear regression problem. We can write the derivative of $f(x)$ with respect to $x$ as:

$\frac{d}{dx} f(x)$   (1)

Example

Consider the problem where we want to find the mean of this set of numbers: $x = \{1, 2, 5, 8, 10\}$. We know how to find the mean with arithmetic, but we could use a least squares method to find it. Then we get:

$x_i = a + E_i$
$E_i = (x_i - a)$
$\sum E_i^2 = \sum (x_i - a)^2 = \sum (x_i^2 - 2ax_i + a^2) = \sum x_i^2 - 2a\sum x_i + na^2$

Example II

x <- c(1,2,5,8,10)
f <- function(a){sum((x-a)^2)}
s <- seq(1,10,length=1000)
fs <- sapply(s, f)
plot(s, fs, type="l", xlab="a", ylab="f(a)")

[Figure: plot of f(a) against a, a parabola with its minimum near the mean of x]

Basic Rules

Power Rule. With exponents, we can differentiate as follows: $\frac{d}{dx} x^n = nx^{n-1}$. This is the rule that we'll need most often.

The derivative of a constant is 0.

The derivative of a sum is simply the sum of the derivatives: $\frac{d}{dx}\left(f(x) + g(x)\right) = \frac{d}{dx}f(x) + \frac{d}{dx}g(x)$.

$\frac{d}{dx} n^x = \log(n)\, n^x$ and $\frac{d}{dx} n^{f(x)} = \log(n)\, n^{f(x)} \frac{d}{dx}f(x)$

$\frac{d}{dx} \log(f(x)) = \frac{1}{f(x)} \frac{d}{dx}f(x)$; e.g., $\frac{d}{dx}\log(x) = \frac{1}{x}$
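We can also let R do the minimization numerically. A quick sketch (not on the slides) using optimize(): the minimizer of f(a) over a search interval is the arithmetic mean, 5.2.

```r
x <- c(1, 2, 5, 8, 10)
f <- function(a) sum((x - a)^2)
## Numerical minimization of f over the interval [0, 11]
opt <- optimize(f, interval = c(0, 11))
opt$minimum   # approximately 5.2
mean(x)       # 5.2
```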
Partial Derivatives

When equations have many variable quantities, we can use $\partial$ instead of $d$ to indicate that the derivative is with respect to just one of the variable quantities. The operations performed are the same, though they are performed on only the pieces of the equation containing the variable quantity of interest. So,

$\frac{\partial}{\partial x}\left(2x^2 + 3y\right) = 4x$   (2)

Derivative in our example

$\sum E_i^2 = \sum x_i^2 - 2a\sum x_i + na^2$

$\frac{\partial}{\partial a} \sum x_i^2 = 0$
$\frac{\partial}{\partial a} \left(-2a\sum x_i\right) = (1)(-2)a^0\sum x_i = -2\sum x_i$
$\frac{\partial}{\partial a}\, na^2 = 2na$

Solve for $a$

$\frac{\partial}{\partial a} \sum (x_i - a)^2 = -2\sum x_i + 2na = 0$
$2na = 2\sum x_i$
$a = \frac{\sum x_i}{n} = \bar{x}$

The Solution: Step 1

Take the partial first derivatives of $S(A, B)$ with respect to $A$ and $B$:

$\frac{\partial S(A,B)}{\partial A} = \sum (-1)(2)(Y_i - A - BX_i)$
$\frac{\partial S(A,B)}{\partial B} = \sum (-X_i)(2)(Y_i - A - BX_i)$

Then, set them equal to zero and solve. This gives us:

$\sum Y_i = nA + B\sum X_i$
$\sum X_iY_i = A\sum X_i + B\sum X_i^2$
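The two normal equations are linear in $A$ and $B$, so for any data set they can be solved directly as a 2-by-2 linear system. A sketch with made-up data (the numbers are assumptions for illustration):

```r
X <- c(1, 2, 3, 4, 5)
Y <- c(2.5, 2, 3, 6, 5)
n <- length(X)
## Coefficient matrix and right-hand side of the normal equations
lhs <- matrix(c(n, sum(X), sum(X), sum(X^2)), nrow = 2)
rhs <- c(sum(Y), sum(X*Y))
solve(lhs, rhs)   # A = 1.0, B = 0.9
coef(lm(Y ~ X))   # same answer from lm()
```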
Solution for A

$\sum Y_i = nA + B\sum X_i$
$nA = \sum Y_i - B\sum X_i$
$A = \frac{1}{n}\sum Y_i - B\,\frac{1}{n}\sum X_i$
$A = \bar{Y} - B\bar{X}$

Solution for B

$\sum X_iY_i = A\sum X_i + B\sum X_i^2$
$\sum X_iY_i = \left(\bar{Y} - B\bar{X}\right)\sum X_i + B\sum X_i^2$
$\sum X_iY_i = \frac{1}{n}\sum Y_i \sum X_i - B\,\frac{1}{n}\left(\sum X_i\right)^2 + B\sum X_i^2$
$\sum X_iY_i - \frac{1}{n}\sum X_i \sum Y_i = B\left(\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2\right)$
$B = \dfrac{\sum X_iY_i - \frac{1}{n}\sum X_i \sum Y_i}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}$

Re-expressing the numerator

$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_iY_i - \bar{X}\sum Y_i - \bar{Y}\sum X_i + n\bar{X}\bar{Y}$
$\qquad = \sum X_iY_i - \frac{1}{n}\sum X_i\sum Y_i - \underbrace{\left(\bar{Y}\sum X_i - n\bar{X}\bar{Y}\right)}_{0}$
$\qquad = \sum X_iY_i - \frac{1}{n}\sum X_i\sum Y_i$

Re-expressing the denominator

$\sum (X_i - \bar{X})^2 = \sum X_i^2 - 2\bar{X}\sum X_i + n\bar{X}^2$
$\qquad = \sum X_i^2 - \frac{2}{n}\left(\sum X_i\right)^2 + \frac{1}{n}\left(\sum X_i\right)^2$
$\qquad = \sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2$
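Both re-expressions are algebraic identities, so they can be verified on any data. A sketch with made-up numbers (assumed for illustration): the "computational" and "deviation" forms of the numerator and denominator agree exactly.

```r
X <- c(1, 2, 3, 4, 5)
Y <- c(2.5, 2, 3, 6, 5)
n <- length(X)
## Numerator: computational form vs. deviation form
num.comp <- sum(X*Y) - sum(X)*sum(Y)/n
num.dev  <- sum((X - mean(X)) * (Y - mean(Y)))
## Denominator: computational form vs. deviation form
den.comp <- sum(X^2) - sum(X)^2/n
den.dev  <- sum((X - mean(X))^2)
c(num.comp, num.dev)   # both 9
c(den.comp, den.dev)   # both 10
num.comp / den.comp    # B = 0.9
```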
Putting it back together

$B = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$   (3)

The numerator is basically the covariance of $X$ and $Y$ (i.e., an unscaled version).
The denominator is basically the variance of $X$ (i.e., an unscaled version).

Davis Data

Y <- na.omit(Davis)$weight
X <- na.omit(Davis)$repwt
B.num1 <- X - mean(X)
B.num2 <- Y - mean(Y)
B.denom <- (X - mean(X))^2
B <- sum(B.num1*B.num2)/sum(B.denom)
A <- mean(Y) - B*mean(X)
A
[1] 2.83349
B
[1] 0.9571477

Regression using lm()

summary(lm(weight ~ repwt, data=na.omit(Davis)))

Call:
lm(formula = weight ~ repwt, data = na.omit(Davis))

Residuals:
    Min      1Q  Median      3Q     Max 
-7.5054 -1.1017 -0.155  1.1448  6.350 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.83349    0.81208   3.489 0.000611 ***
repwt        0.95715    0.01209  79.168  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.241 on 178 degrees of freedom
Multiple R-squared:  0.9724, Adjusted R-squared:  0.9722 
F-statistic:  6268 on 1 and 178 DF,  p-value: < 2.2e-16

Model Fit: Residual Standard Error

The residual standard error is one way that we can figure out how well our model fits:

$S_E = \sqrt{\dfrac{\sum E_i^2}{n - 2}}$

This tells us how big the average residual is. This number can be compared to the standard deviation of the dependent variable.
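The residual standard error that summary() reports can be reproduced by hand from the formula above. A sketch on made-up data (assumed for illustration, not the Davis data); the denominator is $n - 2$ because two quantities, $A$ and $B$, were estimated.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.5, 2, 3, 6, 5)
m <- lm(y ~ x)
## Hand computation of the residual standard error
SE <- sqrt(sum(residuals(m)^2) / (length(y) - 2))
SE         # about 1.11
sigma(m)   # the same value reported by summary()
```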
Model Fit: R-squared

The $R^2$ gives the proportion of variance in $Y$ accounted for by variability in $X$. First, we should think about how much variance there is to explain. There are two ways we could think about this:

We could first just think about the variance of the dependent variable, 178.048 in the Davis data.

We could also think about running a linear regression where we don't know anything other than the DV value: $Y_i = A + E_i$. This implies a perfectly flat line (i.e., $B = 0$). In this case, we can think of the total variability to explain as the variance in the residuals from this oversimplified model.

Residuals (R)

x <- c(1,2,3,4,5)
y <- c(2.5,2,3,6,5)
plot(x, y)
abline(h=mean(y), lty=2)
abline(lm(y ~ x))

[Figure: scatterplot of y against x with the flat line at the mean of y (dashed) and the least squares line]

r1 <- (y - mean(y))^2
r2 <- lm(y ~ x)$residuals^2
plot.dat <- data.frame(
  resids = c(r1, r2),
  x = rep(1:5, 2),
  mod = factor(rep(c(1,2), each=5), levels=1:2, labels=c("one", "x"))
)
library(lattice)
xyplot(resids ~ x | mod, data=plot.dat,
  panel = function(x, y, subscripts){
    panel.segments(x, 0, x, y)
  }
)

[Figure: squared residuals from the mean-only model ("one") and from the regression on x ("x"), drawn as vertical segments in two panels]
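The "know nothing" model $Y_i = A + E_i$ can be fit in R with an intercept-only formula. A quick sketch using the five-point toy data from the slide: its least squares intercept is just the mean of $y$, so its residual variance equals the variance of $y$.

```r
y <- c(2.5, 2, 3, 6, 5)
m0 <- lm(y ~ 1)      # intercept-only (flat-line) model
coef(m0)             # 3.7, the mean of y
var(residuals(m0))   # equal to var(y)
var(y)
```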
Sums of Squares

We can define three quantities that provide information about variation and the extent to which the model captures it:

Total: $TSS = \sum (Y_i - \bar{Y})^2$
Residual: $RSS = \sum (Y_i - \hat{Y}_i)^2$
Regression: $RegSS = \sum (\hat{Y}_i - \bar{Y})^2$

$R^2 = \dfrac{RegSS}{TotSS}$

What do we know so far

summary(lm(weight ~ repwt, data=na.omit(Davis)))

Call:
lm(formula = weight ~ repwt, data = na.omit(Davis))

Residuals:
    Min      1Q  Median      3Q     Max 
-7.5054 -1.1017 -0.155  1.1448  6.350 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.83349    0.81208   3.489 0.000611 ***
repwt        0.95715    0.01209  79.168  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.241 on 178 degrees of freedom
Multiple R-squared:  0.9724, Adjusted R-squared:  0.9722 
F-statistic:  6268 on 1 and 178 DF,  p-value: < 2.2e-16

Sums of Squares in R

mod <- lm(weight ~ repwt, data=Davis)
Anova(mod)

Anova Table (Type II tests)

Response: weight
          Sum Sq  Df F value    Pr(>F)    
repwt       3164   1    69.3 < 2.2e-16 ***
Residuals    914 180                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Marginal vs Partial Relationships

We can use regression to find either marginal or partial relationships (effects).

Marginal relationships are what simple linear regression (one $Y$ and one $X$) gives us; these do not control for any other variables.

Partial relationships are what multiple linear regression (one $Y$ and more than one $X$) gives us; these control for the effects of other variables.
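The three quantities obey the decomposition $TSS = RegSS + RSS$, which is why $R^2$ can also be written as $1 - RSS/TSS$. A sketch on made-up data (assumed for illustration):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.5, 2, 3, 6, 5)
m <- lm(y ~ x)
TSS   <- sum((y - mean(y))^2)          # total sum of squares, 11.8
RSS   <- sum((y - fitted(m))^2)        # residual sum of squares, 3.7
RegSS <- sum((fitted(m) - mean(y))^2)  # regression sum of squares, 8.1
all.equal(TSS, RegSS + RSS)            # TRUE: the decomposition holds
RegSS / TSS                            # R-squared, matches summary(m)$r.squared
```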
Multiple Regression

In the multiple regression model, we estimate the effect of more than one $X$:

$Y_i = A + B_1X_{1i} + B_2X_{2i} + E_i$

We will leave the math for the book; it is more complicated and not all that much more enlightening, but note that the coefficients $B_1$ and $B_2$ are each functions of both $X_1$ and $X_2$. In general,

$Y_i = A + B_1X_{1i} + B_2X_{2i} + \ldots + B_kX_{ki} + E_i$

Estimating Unique Effects

We can get unique estimates of the effects of the $X$ variables only if:

All $X$ variables have variance (i.e., none is constant).

No one $X$ variable is a perfect linear function of another $X$ variable. One example of how this could happen is if one $X$ variable were perfectly correlated ($r = 1$) with another $X$ variable that is also in the model.

Prestige Model 1

data(Prestige)
summary(mod1 <- lm(prestige ~ I(income/1000), data=Prestige))

Call:
lm(formula = prestige ~ I(income/1000), data = Prestige)

Residuals:
    Min      1Q  Median      3Q     Max 
-33.007  -8.378  -2.378   8.432  32.084 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     27.1412     2.2677   11.97   <2e-16 ***
I(income/1000)   2.8968     0.2833   10.22   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.09 on 100 degrees of freedom
Multiple R-squared:  0.5111, Adjusted R-squared:  0.5062 
F-statistic: 104.5 on 1 and 100 DF,  p-value: < 2.2e-16

Prestige Model 2

summary(mod2 <- lm(prestige ~ I(income/1000) + education, data=Prestige))

Call:
lm(formula = prestige ~ I(income/1000) + education, data = Prestige)

Residuals:
     Min       1Q   Median       3Q      Max 
-19.4040  -5.3308   0.0154   4.9803  17.6889 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -6.8478     3.2190  -2.127   0.0359 *  
I(income/1000)   1.3612     0.2242   6.071 2.36e-08 ***
education        4.1374     0.3489  11.858  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.81 on 99 degrees of freedom
Multiple R-squared:  0.798, Adjusted R-squared:  0.7939 
F-statistic: 195.6 on 2 and 99 DF,  p-value: < 2.2e-16
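The second condition can be seen directly in R. A sketch with simulated data (all names and numbers are assumptions for illustration): when X2 is an exact linear function of X1, lm() cannot separate their effects and reports NA for the redundant coefficient.

```r
set.seed(1)
X1 <- rnorm(20)
X2 <- 2*X1 + 3            # perfect linear function of X1
Y  <- 1 + X1 + rnorm(20)
## The coefficient on X2 is NA: it is not uniquely estimable
coef(lm(Y ~ X1 + X2))
```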
Model Fit: Multiple Regression

Model fit is basically the same in multiple, as in simple, regression. The concepts we use are the same with some simple modifications:

The degrees of freedom (i.e., the denominator) in the standard error of the residuals is $n - k - 1$, where $k$ is the number of independent variables in the model.

$R^2$ is calculated the same way (since it is defined in terms of the sums of squares only having to do with $Y$, $\hat{Y}$ and $\bar{Y}$).

We can make an adjustment to $R^2$ to account for increasing the number of variables in the model:

$\tilde{R}^2 = 1 - \dfrac{RSS / (n - k - 1)}{TSS / (n - 1)}$

Standardized Regression Coefficients

Sometimes we want to compare the effects of $X$ variables that are otherwise incomparable. There are a couple of different ways to do this:

Multiply the effect size by some comparable measure of spread (e.g., IQR, range, etc.). This tells us how much predictions would change as the variable of interest changes.

Standardized coefficients are another way, and we can accomplish this in two different ways:

$B^{*}_{k} = B_k \dfrac{S_{X_k}}{S_Y}$

Estimate a new regression where all of the variables are made into z-scores. You can do this in R with scale().

Standardized Regression in R

summary(mod2 <- lm(scale(prestige) ~ scale(income) + scale(education), data=Prestige))

Call:
lm(formula = scale(prestige) ~ scale(income) + scale(education), 
    data = Prestige)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.1278 -0.3099  0.0009  0.2895  1.0282 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -1.754e-17  4.495e-02   0.000        1    
scale(income)     3.359e-01  5.533e-02   6.071 2.36e-08 ***
scale(education)  6.562e-01  5.533e-02  11.858  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.454 on 99 degrees of freedom
Multiple R-squared:  0.798, Adjusted R-squared:  0.7939 
F-statistic: 195.6 on 2 and 99 DF,  p-value: < 2.2e-16

or use scaleDataFrame()

library(DAMisc)
summary(mod3 <- lm(prestige ~ income + education, data=scaleDataFrame(Prestige)))

Call:
lm(formula = prestige ~ income + education, data = scaleDataFrame(Prestige))

Residuals:
    Min      1Q  Median      3Q     Max 
-1.1278 -0.3099  0.0009  0.2895  1.0282 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.754e-17  4.495e-02   0.000        1    
income       3.359e-01  5.533e-02   6.071 2.36e-08 ***
education    6.562e-01  5.533e-02  11.858  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.454 on 99 degrees of freedom
Multiple R-squared:  0.798, Adjusted R-squared:  0.7939 
F-statistic: 195.6 on 2 and 99 DF,  p-value: < 2.2e-16
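The two routes to standardized coefficients agree exactly: multiplying each slope by $S_{X_k}/S_Y$ gives the same numbers as regressing z-scores on z-scores. A sketch with made-up data (assumed for illustration, not the Prestige examples):

```r
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- c(1, 3, 2, 5, 4, 6)
m  <- lm(y ~ x1 + x2)
## Route 1: rescale the raw slopes by S_Xk / S_Y
bstar <- coef(m)[-1] * c(sd(x1), sd(x2)) / sd(y)
## Route 2: regress z-scores on z-scores
ms <- lm(scale(y) ~ scale(x1) + scale(x2))
unname(bstar)
unname(coef(ms)[-1])   # same values
```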