Lecture 2: Linear Least Squares Regression


Dave Armstrong
UW Milwaukee
February 8, 2016

Is the Relationship Linear?

library(car)
data(Davis)
ind <- which(Davis$weight > 150)
Davis$weight[ind] <- NA
with(Davis, plot(repwt, weight))
with(na.omit(Davis), lines(lowess(repwt, weight), col="red", lwd=2))

[Figure: scatterplot of weight against repwt (both axes roughly 40 to 120) with a lowess smooth in red.]

Linear Relationships

If relationships look linear, we can describe them with a linear equation:

Deterministic: Y = A + BX
Stochastic:    Y = A + BX + E

where A is the y-intercept and B is the slope. The systematic part of the equation is also called Ŷ:

Y = A + BX + E = Ŷ + E

Ŷ is also called the predicted or fitted value; it is the value we expect Y to take when X takes on a particular value.

Geometry of the Linear Model

[Figure.]
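As a quick numerical aside, the identity Y = Ŷ + E can be checked directly in R. This is a minimal sketch using lm() (which is introduced properly later in the lecture) on the Davis data: the fitted values plus the residuals reconstruct the observed outcome exactly.

# Minimal check that Y = Y-hat + E: fitted values plus residuals
# reconstruct the observed outcome exactly
library(car)   # provides the Davis data
data(Davis)
d <- na.omit(Davis)
m <- lm(weight ~ repwt, data = d)
all.equal(unname(fitted(m) + residuals(m)), d$weight)  # TRUE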

Finding the Line

In the Davis data, we might use the line Y = 0 + 1X + E, indicating that on average, people report their actual weight accurately.

with(Davis, plot(repwt, weight))
abline(a=0, b=1, col="red", lwd=2)

[Figure: the Davis scatterplot with the line Y = 0 + 1X overlaid. This appears to describe the data relatively well; why not just use this one?]

Finding the Line II

So, why not just use the line Y = 0 + 1X + E? We could here, but often we will not have sufficiently strong theory to guide our search. How do we know this is the best line? Without knowing that our line is the best one to describe this relationship, we can't rule out that someone else will come along with better results than ours that provide a different explanation. We want to find the line that makes the residuals as small as possible.

What do we Mean by "Small"?

What do we mean by "small" when we talk about the residuals?
- We could mean make Σ E_i as small as possible; however, this is an unhelpful quantity, as any line that passes through (X̄, Ȳ) has Σ E_i = 0.
- We could mean make Σ |E_i| as small as possible. This is Least Absolute Values (LAV) regression, which does have some desirable properties, but also some undesirable ones, so we leave this strategy alone right now.
- We could mean make Σ E_i² as small as possible. This is (Ordinary) Least Squares (OLS) regression, which we focus on for the remainder of the course.

Least Squares Regression

Remember, we can express the residual as a function of Y and Ŷ: E_i = Y_i − Ŷ_i. We want to find the values of A and B that make the sum of squared residuals as small as possible. First, we can recognize that the residuals are functions of A and B (remember what a function is?):

S(A, B) = Σ E_i²
        = Σ (Y_i − Ŷ_i)²
        = Σ (Y_i − (A + BX_i))²
        = Σ (Y_i − A − BX_i)²
        = Σ (Y_i² − 2AY_i − 2BX_iY_i + 2ABX_i + A² + B²X_i²)
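To see why Σ E_i alone is unhelpful, here is a small sketch (assuming the cleaned Davis data from above, with b_bad an arbitrary slope chosen for illustration): both the OLS line and a deliberately bad line forced through (X̄, Ȳ) have residuals summing to zero, but the squared criterion separates them.

# Any line through (xbar, ybar) has residuals summing to zero;
# only the squared criterion separates good lines from bad
d <- na.omit(Davis)
xbar <- mean(d$repwt); ybar <- mean(d$weight)
ols <- lm(weight ~ repwt, data = d)
b_bad <- 0.5                                   # arbitrary slope, line forced through the means
e_bad <- d$weight - (ybar + b_bad * (d$repwt - xbar))
c(sum(residuals(ols)), sum(e_bad))             # both essentially zero
c(sum(residuals(ols)^2), sum(e_bad^2))         # the OLS sum of squares is far smaller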

Elementary Scalar Calculus

A derivative tells us how a function of x behaves given arbitrarily small changes in x. The derivative gives us the slope of the line tangent to the curve. To maximize or minimize a function, we set its first derivative to zero and solve for the part in which we are interested. This is what we will do when solving for A and B in the linear regression problem. We can write the derivative of f(x) with respect to x as d f(x)/dx.

Example

Consider the problem where we want to find the mean of this set of numbers: x = {1, 2, 5, 8, 10}. We know how to find the mean with arithmetic, but we could use a least squares method to find it. Then we get:

x_i = a + E_i, so E_i = x_i − a, where a is the mean, and

Σ E_i² = Σ (x_i − a)²
       = Σ (x_i² − 2ax_i + a²)
       = Σ x_i² − 2a Σ x_i + na²

Example II

x <- c(1, 2, 5, 8, 10)
f <- function(a){ sum((x - a)^2) }
s <- seq(1, 10, length=1000)
fs <- sapply(s, f)
plot(s, fs, type="l", xlab="a", ylab="f(a)")

[Figure: f(a) plotted against a; the curve is a parabola with its minimum at the mean of x.]

Basic Rules

- Power rule. With exponents, we can differentiate as follows: d/dx xⁿ = n x^(n−1). This is the rule that we'll need most often.
- The derivative of a constant is 0.
- The derivative of a sum is simply the sum of the derivatives: d/dx (f(x) + g(x)) = d/dx f(x) + d/dx g(x).
- d/dx 2^x = log(2) 2^x, and d/dx 2^f(x) = log(2) 2^f(x) · d/dx f(x).
- d/dx log(f(x)) = (1/f(x)) · d/dx f(x); e.g., d/dx log(x) = 1/x.
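Rather than reading the minimum off the plot, we can let R find it numerically. A minimal sketch using base R's optimize(), anticipating the calculus result on the next slide that the minimizer is the mean:

# Numerically minimize f(a) = sum((x - a)^2); the minimizer is mean(x)
x <- c(1, 2, 5, 8, 10)
f <- function(a) sum((x - a)^2)
optimize(f, interval = c(1, 10))$minimum  # ~5.2
mean(x)                                   # 5.2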

Partial Derivatives

When equations have many variable quantities, we can use ∂ instead of d to indicate that the derivative is with respect to just one of the variable quantities. The operations performed are the same, though they are performed on only the pieces of the equation containing the variable quantity of interest. So,

∂/∂x (2x² + 3y) = 4x

Derivative in our Example

Σ E_i² = Σ x_i² − 2a Σ x_i + na²

∂/∂a Σ x_i² = 0
∂/∂a (−2a Σ x_i) = −2 Σ x_i
∂/∂a na² = 2na

so

∂/∂a Σ (x_i − a)² = −2 Σ x_i + 2na

Solve for a

∂/∂a Σ (x_i − a)² = −2 Σ x_i + 2na = 0
2na = 2 Σ x_i
a = (1/n) Σ x_i = x̄

The least squares solution is the mean.

The Solution: Step 1

Take the partial first derivatives of S(A, B) with respect to A and B:

∂S(A, B)/∂A = Σ (−1)(2)(Y_i − A − BX_i)
∂S(A, B)/∂B = Σ (−X_i)(2)(Y_i − A − BX_i)

Then, set them equal to zero and solve. This gives us the normal equations:

nA + B Σ X_i = Σ Y_i
A Σ X_i + B Σ X_i² = Σ X_i Y_i
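The two normal equations say, equivalently, that the OLS residuals sum to zero and are orthogonal to X. A quick sketch checking both conditions for the Davis fit (assuming the data as loaded above):

# The fitted OLS line satisfies both normal equations
# (up to floating-point error): sum(E) = 0 and sum(X * E) = 0
d <- na.omit(Davis)
fit <- lm(weight ~ repwt, data = d)
e <- residuals(fit)
c(sum(e), sum(d$repwt * e))  # both essentially zero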

Solution for A

From the first normal equation:

Σ Y_i = nA + B Σ X_i
A = (1/n) Σ Y_i − B (1/n) Σ X_i
A = Ȳ − B X̄

Solution for B

Substitute A = Ȳ − BX̄ into the second normal equation:

Σ X_i Y_i = (Ȳ − BX̄) Σ X_i + B Σ X_i²
Σ X_i Y_i = (1/n) Σ Y_i Σ X_i − B (1/n)(Σ X_i)² + B Σ X_i²
Σ X_i Y_i − (1/n) Σ X_i Σ Y_i = B [Σ X_i² − (1/n)(Σ X_i)²]

B = [Σ X_i Y_i − (1/n) Σ X_i Σ Y_i] / [Σ X_i² − (1/n)(Σ X_i)²]

Re-expressing the Numerator

Σ X_i Y_i − (1/n) Σ X_i Σ Y_i
  = Σ X_i Y_i − X̄ Σ Y_i
  = Σ (X_i − X̄) Y_i
  = Σ (X_i − X̄) Y_i − Σ (X_i − X̄) Ȳ      [since Σ (X_i − X̄) Ȳ = 0]
  = Σ (X_i − X̄)(Y_i − Ȳ)

Re-expressing the Denominator

Σ X_i² − (1/n)(Σ X_i)²
  = Σ X_i² − 2X̄ Σ X_i + nX̄²
  = Σ (X_i² − 2X̄ X_i + X̄²)
  = Σ (X_i − X̄)²
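Since numerator and denominator are each (n − 1) times the sample covariance and variance, B can also be computed as cov(X, Y)/var(X). A one-line sketch (the fully manual version appears on the next slide):

# B as the ratio of sample covariance to sample variance;
# the (n - 1) scaling in each cancels
d <- na.omit(Davis)
B <- with(d, cov(repwt, weight) / var(repwt))
A <- with(d, mean(weight) - B * mean(repwt))
c(A, B)  # matches the manual computation below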

Putting it Back Together

B = Σ (X_i − X̄)(Y_i − Ȳ) / Σ (X_i − X̄)²

- The numerator is basically the covariance of X and Y (i.e., an unscaled version).
- The denominator is basically the variance of X (i.e., an unscaled version).

Davis Data

Y <- na.omit(Davis)$weight
X <- na.omit(Davis)$repwt
B.num1 <- X - mean(X)
B.num2 <- Y - mean(Y)
B.denom <- (X - mean(X))^2
B <- sum(B.num1 * B.num2)/sum(B.denom)
A <- mean(Y) - B * mean(X)
A
[1] 2.833492
B
[1] 0.9571477

Regression using lm()

summary(lm(weight ~ repwt, data=na.omit(Davis)))

Call:
lm(formula = weight ~ repwt, data = na.omit(Davis))

Residuals:
    Min      1Q  Median      3Q     Max
-7.5054 -1.1017 -0.155  1.1448  6.350

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.83349    0.81208   3.489 0.000611 ***
repwt        0.95715    0.01209  79.168  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.241 on 178 degrees of freedom
Multiple R-squared:  0.9724, Adjusted R-squared:  0.9722
F-statistic:  6268 on 1 and 178 DF,  p-value: < 2.2e-16

Model Fit: Residual Standard Error

The residual standard error is one way that we can figure out how well our model fits:

S_E = sqrt( Σ E_i² / (n − 2) )

This tells us how big the average residual is. This number can be compared to the standard deviation of the dependent variable.
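A sketch computing the residual standard error by hand from the definition and comparing it to the standard deviation of Y (same Davis data as above):

# Residual standard error from the definition, compared with sd(Y)
d <- na.omit(Davis)
fit <- lm(weight ~ repwt, data = d)
e <- residuals(fit)
sqrt(sum(e^2) / (length(e) - 2))  # matches summary()'s residual standard error
sd(d$weight)                      # much larger: the model removes most of the variation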

Model Fit: R-squared

The R² gives the proportion of variance in Y accounted for by variability in X. First, we should think about how much variance there is to explain. There are two ways we could think about this:
- We could first just think about the variance of the dependent variable (about 178 in the Davis data).
- We could also think about running a linear regression where we don't know anything other than the DV value: Y = A + E. This implies a perfectly flat line (i.e., B = 0). In this case, we can think of the total variability to explain as the variance of the residuals from this oversimplified model.

Residuals (R)

x <- c(1, 2, 3, 4, 5)
y <- c(2.5, 2, 3, 6, 5)
plot(x, y)
abline(h=mean(y), lty=2)
abline(lm(y ~ x))

[Figure: the five points with the flat mean-only line (dashed) and the fitted regression line.]

r1 <- (y - mean(y))^2
r2 <- lm(y ~ x)$residuals^2
plot.dat <- data.frame(
  resids = c(r1, r2),
  x = rep(1:5, 2),
  mod = factor(rep(c(1, 2), each=5), levels=1:2, labels=c("one", "x"))
)
library(lattice)
xyplot(resids ~ x | mod, data=plot.dat,
  panel = function(x, y, subscripts){
    panel.segments(x, 0, x, y)
  })

[Figure: squared residuals by observation in two panels, "one" (mean-only model) and "x" (regression on x); the segments in the regression panel are shorter.]
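A sketch of the "oversimplified" intercept-only model on the Davis data: its fitted value is Ȳ, so its residuals carry all of the variability there is to explain.

# The intercept-only model Y = A + E fits A = mean(Y),
# so its residual variance equals the variance of Y itself
d <- na.omit(Davis)
m0 <- lm(weight ~ 1, data = d)
coef(m0)             # equals mean(d$weight)
var(residuals(m0))   # equals var(d$weight)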

Sums of Squares

We can define three quantities that provide information about variation and the extent to which the model captures it:

TotSS = Σ (Y_i − Ȳ)²
ResSS = Σ (Y_i − Ŷ_i)²
RegSS = Σ (Ŷ_i − Ȳ)²

R² = RegSS / TotSS

What do we Know so Far?

summary(lm(weight ~ repwt, data=na.omit(Davis)))

(Output as shown above: A = 2.83349, B = 0.95715, residual standard error 2.241 on 178 df, Multiple R-squared 0.9724.)

Sums of Squares in R

mod <- lm(weight ~ repwt, data=Davis)
Anova(mod)

Anova Table (Type II tests)

Response: weight
          Sum Sq  Df F value    Pr(>F)
repwt      31642   1  6229.3 < 2.2e-16 ***
Residuals    914 180
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Marginal vs. Partial Relationships

We can use regression to find either marginal or partial relationships (effects). Marginal relationships are what simple linear regression (one Y and one X) gives us; these do not control for any other variables. Partial relationships are what multiple linear regression (one Y and more than one X) gives us; these control for the effects of other variables.
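Returning to the sums of squares above, here is a sketch computing all three directly and confirming the decomposition TotSS = RegSS + ResSS and the ratio definition of R²:

# Computing the three sums of squares directly
d <- na.omit(Davis)
fit <- lm(weight ~ repwt, data = d)
TotSS <- sum((d$weight - mean(d$weight))^2)
ResSS <- sum(residuals(fit)^2)
RegSS <- sum((fitted(fit) - mean(d$weight))^2)
all.equal(TotSS, RegSS + ResSS)  # the decomposition holds
RegSS / TotSS                    # R-squared, matching summary()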

Multiple Regression

In the multiple regression model, we estimate the effect of more than one X:

Y = A + B₁X₁ + B₂X₂ + E

We will leave the math for the book; it is more complicated and not all that much more enlightening, but note that the coefficients B₁ and B₂ are each functions of both X₁ and X₂. In general,

Y = A + B₁X₁ + B₂X₂ + … + B_k X_k + E

Estimating Unique Effects

We can get unique estimates of the effects of the X variables only if (see the sketch after the two Prestige models below):
- All X variables have variance (i.e., none is constant).
- No X variable is a perfect linear function of another X variable. One example of how this could happen is if one X variable were perfectly correlated (r = 1) with another X variable that is also in the model.

Prestige Model 1

data(Prestige)
summary(mod1 <- lm(prestige ~ I(income/1000), data=Prestige))

Call:
lm(formula = prestige ~ I(income/1000), data = Prestige)

Residuals:
    Min      1Q  Median      3Q     Max
-33.007  -8.378  -2.378   8.432  32.084

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     27.141     2.2677   11.97   <2e-16 ***
I(income/1000)   2.8968    0.2833   10.22   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.09 on 100 degrees of freedom
Multiple R-squared:  0.5111, Adjusted R-squared:  0.5062
F-statistic: 104.5 on 1 and 100 DF,  p-value: < 2.2e-16

Prestige Model 2

summary(mod2 <- lm(prestige ~ I(income/1000) + education, data=Prestige))

Call:
lm(formula = prestige ~ I(income/1000) + education, data = Prestige)

Residuals:
     Min       1Q   Median       3Q      Max
-19.4040  -5.3308   0.0154   4.9803  17.6889

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     -6.8478     3.2190  -2.127   0.0359 *
I(income/1000)   1.3612     0.2242   6.071 2.36e-08 ***
education        4.1374     0.3489  11.858  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.81 on 99 degrees of freedom
Multiple R-squared:  0.798, Adjusted R-squared:  0.7939
F-statistic: 195.6 on 2 and 99 DF,  p-value: < 2.2e-16
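Here is the failure-case sketch referenced above: income2 is a hypothetical constructed variable that is an exact linear function of income, so lm() cannot separate their effects and drops it, reporting NA.

# Perfect collinearity: income2 is an exact linear function of income,
# so lm() cannot estimate a unique effect for it and reports NA
data(Prestige)
Prestige$income2 <- 2 * Prestige$income + 7   # hypothetical, for illustration only
coef(lm(prestige ~ income + income2, data = Prestige))  # income2 coefficient is NA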

Model Fit: Multiple Regression

Model fit is basically the same in multiple regression as in simple regression. The concepts we use are the same, with some simple modifications:
- The degrees of freedom (i.e., the denominator) in the standard error of the residuals is n − k − 1, where k is the number of independent variables in the model.
- R² is calculated the same way (since it is defined in terms of the sums of squares only having to do with Y_i, Ŷ_i and Ȳ).
- We can make an adjustment to R² to account for increasing the number of variables in the model:

  Adjusted R² = 1 − [ResSS / (n − k − 1)] / [TotSS / (n − 1)]

Standardized Regression Coefficients

Sometimes we want to compare the effects of X variables that are not otherwise comparable. There are a couple of different ways to do this:
- Multiply the effect size by some comparable measure of spread (e.g., IQR, range, etc.). This tells us how much predictions would change as the variable of interest changes.
- Standardized coefficients are another way, and we can accomplish this in two different ways:
  - B*_k = B_k S_k / S_Y
  - Estimate a new regression where all of the variables are made into z-scores. You can do this in R with scale().

Standardized Regression in R

summary(mod <- lm(scale(prestige) ~ scale(income) + scale(education), data=Prestige))

Call:
lm(formula = scale(prestige) ~ scale(income) + scale(education), data = Prestige)

Residuals:
     Min       1Q   Median       3Q      Max
-1.1278  -0.3099   0.0009   0.2895   1.0282

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -1.754e-17  4.495e-02   0.000        1
scale(income)     3.359e-01  5.533e-02   6.071 2.36e-08 ***
scale(education)  6.562e-01  5.533e-02  11.858  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.454 on 99 degrees of freedom
Multiple R-squared:  0.798, Adjusted R-squared:  0.7939
F-statistic: 195.6 on 2 and 99 DF,  p-value: < 2.2e-16

...or use scaleDataFrame()

library(DAMisc)
summary(mod3 <- lm(prestige ~ income + education, data=scaleDataFrame(Prestige)))

(The coefficients, standard errors, and fit statistics are identical to the scale() version above.)
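A sketch confirming the first route, B*_k = B_k S_k / S_Y: rescaling the unstandardized coefficients reproduces the z-score regression coefficients above (income enters in raw dollars here; the sd scaling makes the units irrelevant).

# Rescaling unstandardized coefficients by S_k / S_Y reproduces
# the z-score regression coefficients
m <- lm(prestige ~ income + education, data = Prestige)
b_star <- coef(m)[-1] *
  c(sd(Prestige$income), sd(Prestige$education)) / sd(Prestige$prestige)
b_star  # ~0.336 (income) and ~0.656 (education), as in the scaled output above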