Lecture 1: Introduction to Regression


An Example: Explaining State Homicide Rates
What kinds of variables might we use to explain/predict state homicide rates? Let's consider just one predictor for now: poverty. Ignore omitted variables and measurement error. How might poverty be related to homicide rates?

Poverty and Homicide
These data are located here: http://www.public.asu.edu/~gasweete/crj64/data/hom_pov.dta
There appears to be some relationship between poverty and homicide rates, but it's not perfect. There's a lot of noise, which we will attribute to unobserved factors and random error.

Poverty and Homicide, cont.
There is some nonzero value of expected homicides in the absence of poverty ($\beta_0$). We expect homicide rates to increase as poverty rates increase ($\beta_1$). Thus,
$E(Y \mid X) = \beta_0 + \beta_1 X$
This is the Population Regression Function.

Poverty and Homicide, Sample Regression Function
$y_i = \hat\beta_0 + \hat\beta_1 x_i + u_i$
$y_i$ is the dependent variable, homicide rate, which we are trying to explain.
$\hat\beta_0$ represents our estimate of what the homicide rate would be in the absence of poverty.*
$\hat\beta_1$ is our estimate of the effect of a higher poverty rate on homicide.
$u_i$ is a noise term reflecting other things that influence homicide rates.
*This is extrapolation outside the range of data. Not recommended.

Poverty and Homicide, cont.
$y_i = \hat\beta_0 + \hat\beta_1 x_i + u_i$
Only $y_i$ and $x_i$ are directly observable in the equation above. The task of a regression analysis is to provide estimates of the slope and intercept terms. The relationship is assumed to be linear: an increase in $x$ is associated with an increase in $y$, with the same expected change in homicide going from 6 to 7% poverty as from 5 to 6%.

Ordinary Least Squares
$y_i = -.973 + .475\,x_i + u_i$
Substantively, what do these estimates mean? How did we arrive at them? By minimizing the sum of the squared errors, aka Ordinary Least Squares (OLS) estimation:
$\min \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
Why squared error? Why vertical error, not perpendicular?

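Ordinary Least Squares Estimates
$\min_{\hat\beta_0,\,\hat\beta_1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$
Solving for the minimum requires calculus (set the derivative with respect to each $\beta$ to 0 and solve). The book shows how we can go from some basic assumptions to estimates for $\beta_0$ and $\beta_1$ without using calculus. I will go through two different ways to obtain these estimates: Wooldridge's and Khan's (khanacademy.org).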

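Ordinary Least Squares: Estimating the intercept (Wooldridge's method)
$E(u) = 0$
$u = y - \beta_0 - \beta_1 x$
$E(y - \beta_0 - \beta_1 x) = 0$
$\bar{y} - \hat\beta_0 - \hat\beta_1 \bar{x} = 0$
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
Assuming that the average value of the error term is zero, it is a trivial matter to calculate $\beta_0$ once we know $\beta_1$.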

Ordinary Least Squares: Estimating the intercept (Wooldridge)
Incidentally, these last sets of equations also imply that the regression line passes through the point that corresponds to the mean of $x$ and the mean of $y$: $(\bar{x}, \bar{y})$.

Ordinary Least Squares: Estimating the slope (Wooldridge)
First, we use the fact that the expected value of the error term is zero to generate a new equation equal to zero. We saw this before, but here I use the exact formula used in the book.
$y = \beta_0 + \beta_1 x + u$
$u = y - \beta_0 - \beta_1 x$
$E(u) = E(y - \beta_0 - \beta_1 x) = 0$

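Ordinary Least Squares: Estimating the slope (Wooldridge)
We can multiply this last equation by $x$, since the covariance between $x$ and $u$ is assumed to be zero and the terms in the parentheses are equal to $u$. Next, we plug in our formula for the intercept and simplify.
$\mathrm{Cov}(x, u) = E(xu) = 0$
$E[x(y - \beta_0 - \beta_1 x)] = 0$
$\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$
$\sum_{i=1}^{n} x_i \left(y_i - (\bar{y} - \hat\beta_1 \bar{x}) - \hat\beta_1 x_i\right) = 0$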

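Ordinary Least Squares: Estimating the slope (Wooldridge)
Re-arranging...
$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$
Because $\sum x_i (x_i - \bar{x}) = \sum (x_i - \bar{x})^2$ and $\sum x_i (y_i - \bar{y}) = \sum (x_i - \bar{x})(y_i - \bar{y})$, this becomes
$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$
Interestingly, the final result leads us to the relationship between the covariance of $x$ and $y$ and the variance of $x$.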
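The two conditions behind Wooldridge's derivation can be checked numerically. The sketch below is only an illustration: it borrows the five points from the hand-worked example later in the lecture, and the variable names are my own. It computes the slope and intercept from the deviation formulas and then confirms that the residuals satisfy the sample versions of E(u) = 0 and E(xu) = 0.

```python
# Five illustrative (x, y) points, borrowed from the lecture's hand example.
x = [4, 7, 0, 6, 2]
y = [2, 6, 1, 3, 4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of cross-deviations over sum of squared x-deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = s_xy / s_xx

# Intercept: forces the line through (x_bar, y_bar).
beta0 = y_bar - beta1 * x_bar

# Residuals and the two sample moment conditions.
resid = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]
sum_resid = sum(resid)                                  # sample analog of E(u) = 0
sum_x_resid = sum(xi * ui for xi, ui in zip(x, resid))  # sample analog of E(xu) = 0

print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
print(f"sum of residuals = {sum_resid:.2e}, sum of x*residual = {sum_x_resid:.2e}")
```

Both sums should come out as zero up to floating-point rounding, which is exactly what the two equations used in the derivation require.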

Ordinary Least Squares: Estimates (Khan's method)
Khan starts with the actual points $(x_1, y_1), \ldots, (x_n, y_n)$ and elaborates how these points are related to the squared error: the square of the distance between each point and the line $y = mx + b = \beta_0 + \beta_1 x$.

Ordinary Least Squares: Estimates (Khan's method)
The vertical distance between a point $(x_i, y_i)$ and the regression line $y = \beta_0 + \beta_1 x$ is simply $y_i - (\beta_0 + \beta_1 x_i)$.
$\text{Total Error} = \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)$
It would be trivial to drive the total error to zero. We could set $\beta_1$ (the slope) equal to zero and $\beta_0$ equal to the mean of $y$, and then the total error would be zero. Another approach is to minimize the absolute difference, but this actually creates thornier math problems than squaring the differences, and results in situations where there is not a unique solution. In short, what we want is the sum of the squared error (SE), which means we have to square every term in that equation.
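To see why the raw total error is a poor criterion, here is a small sketch. The five points are borrowed from the lecture's hand example and the function names are hypothetical. Two very different lines, a flat one at the mean of y and a steep one with slope 1, both have zero total error, while their squared errors differ.

```python
# Five illustrative (x, y) points, borrowed from the lecture's hand example.
x = [4, 7, 0, 6, 2]
y = [2, 6, 1, 3, 4]
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

def total_error(b0, b1):
    """Sum of signed vertical distances from the points to the line b0 + b1*x."""
    return sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))

def squared_error(b0, b1):
    """Sum of squared vertical distances from the points to the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Flat line at the mean of y.
flat_total = total_error(y_mean, 0.0)
flat_sq = squared_error(y_mean, 0.0)

# A steep line (slope 1) that also passes through (x_mean, y_mean).
steep_total = total_error(y_mean - 1.0 * x_mean, 1.0)
steep_sq = squared_error(y_mean - 1.0 * x_mean, 1.0)

print(f"flat line:  total error {flat_total:.2e}, squared error {flat_sq:.2f}")
print(f"steep line: total error {steep_total:.2e}, squared error {steep_sq:.2f}")
```

Any line through the point of means has zero total error because positive and negative distances cancel, so only the squared criterion can tell these lines apart.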

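Ordinary Least Squares: Estimates (Khan's method)
We need to find the $\beta_0$ and $\beta_1$ that minimize the SE. Let's expand this out. To be clear, the subscripts for the $\beta$ estimates just refer to our two regression line estimates, whereas the subscripts for our $x$'s and $y$'s refer to the first observation, second observation and so on.
$SE = \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2$
$SE = \sum_{i=1}^{n} \left(y_i^2 - 2\beta_0 y_i - 2\beta_1 x_i y_i + \beta_0^2 + 2\beta_0\beta_1 x_i + \beta_1^2 x_i^2\right)$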

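Ordinary Least Squares: Estimates (Khan's method)
Summing these columns...
$SE = n\,\operatorname{mean}(y^2) - 2\beta_0 n\,\operatorname{mean}(y) - 2\beta_1 n\,\operatorname{mean}(xy) + n\beta_0^2 + 2\beta_0\beta_1 n\,\operatorname{mean}(x) + \beta_1^2 n\,\operatorname{mean}(x^2)$
Everything but the regression line coefficients is a known entity here. This equation represents a 3D surface, where different values of $\beta_0$ and $\beta_1$ correspond to different values of the squared error. We just need to pick the values of $\beta_0$ and $\beta_1$ that minimize the SE.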

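Ordinary Least Squares: Estimates (Khan's method)
Those familiar with calculus will know that the minimum of the squared error surface occurs where the partial derivative (slope) with respect to $\beta_0$ is equal to zero and the partial derivative with respect to $\beta_1$ is equal to zero.
$\frac{\partial SE}{\partial \beta_0} = -2n\,\operatorname{mean}(y) + 2n\beta_0 + 2\beta_1 n\,\operatorname{mean}(x) = 0$
$\beta_0 = \operatorname{mean}(y) - \beta_1\,\operatorname{mean}(x)$
We've seen that before. How about the other derivative?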

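Ordinary Least Squares: Estimates (Khan's method)
$\frac{\partial SE}{\partial \beta_1} = -2n\,\operatorname{mean}(xy) + 2\beta_0 n\,\operatorname{mean}(x) + 2\beta_1 n\,\operatorname{mean}(x^2) = 0$
Replacing $\beta_0$...
$-\operatorname{mean}(xy) + \left(\operatorname{mean}(y) - \beta_1\,\operatorname{mean}(x)\right)\operatorname{mean}(x) + \beta_1\,\operatorname{mean}(x^2) = 0$
$\beta_1 \left(\operatorname{mean}(x^2) - \operatorname{mean}(x)^2\right) = \operatorname{mean}(xy) - \operatorname{mean}(x)\,\operatorname{mean}(y)$
$\beta_1 = \frac{\operatorname{mean}(xy) - \operatorname{mean}(x)\,\operatorname{mean}(y)}{\operatorname{mean}(x^2) - \operatorname{mean}(x)^2} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$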
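As a rough check on the calculus, the sketch below codes the two partial derivatives exactly as written in terms of means and evaluates them at the closed-form solution. The data are the five points from the lecture's hand example, and `partials` is a hypothetical helper name. Both derivatives should vanish at the solution and be nonzero at an arbitrary other point.

```python
# Five illustrative (x, y) points, borrowed from the lecture's hand example.
x = [4, 7, 0, 6, 2]
y = [2, 6, 1, 3, 4]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n
mean_x2 = sum(xi ** 2 for xi in x) / n

def partials(b0, b1):
    """Partial derivatives of the squared error with respect to b0 and b1."""
    d_b0 = -2 * n * mean_y + 2 * n * b0 + 2 * n * b1 * mean_x
    d_b1 = -2 * n * mean_xy + 2 * n * b0 * mean_x + 2 * n * b1 * mean_x2
    return d_b0, d_b1

# Closed-form solution from the four means.
b1 = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
b0 = mean_y - b1 * mean_x

grad_at_solution = partials(b0, b1)
grad_elsewhere = partials(0.0, 1.0)  # an arbitrary line, for contrast

print("gradient at solution:", grad_at_solution)
print("gradient at (0, 1):  ", grad_elsewhere)
```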

Ordinary Least Squares Estimates
Hopefully it is reassuring to know that we can obtain the same answers from two very different methods. These formulas allow us, in a bivariate regression, to calculate the regression line by hand without using fancy statistical packages. All we need to do is find the mean of $x$, the mean of $y$, the mean of the products of $x$ and $y$, and the mean of the squares of $x$, and then we can plug these into the formulas and crank out our solutions.

OLS by hand, example
Let's look at a set of 5 points and see how to calculate a regression line by hand. Here are our five points: (4, 2), (7, 6), (0, 1), (6, 3), (2, 4)

OLS by hand, example
We can generally guess that the slope will be positive, but we can find the slope exactly if we calculate four things: the mean of $x$, the mean of $y$, the mean of the products of $x$ and $y$, and the mean of the squares of $x$.
The x's are 4, 7, 0, 6, and 2. Their mean is 19/5 = 3.8.
The y's are 2, 6, 1, 3, and 4. Their mean is 16/5 = 3.2.
The products are 8, 42, 0, 18, and 8. Their mean is 76/5 = 15.2.
The squared x's are 16, 49, 0, 36, and 4. Their mean is 105/5 = 21.

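OLS by hand, example
Recall the formula for the slope:
$\hat\beta_1 = \frac{\operatorname{mean}(xy) - \operatorname{mean}(x)\,\operatorname{mean}(y)}{\operatorname{mean}(x^2) - \operatorname{mean}(x)^2} = \frac{15.2 - 3.2 \times 3.8}{21 - 3.8 \times 3.8} = \frac{3.04}{6.56} = 0.463$
Once we have the slope, the intercept is trivial:
$\hat\beta_0 = 3.2 - 0.463 \times 3.8 = 1.44$
And our regression line that minimizes the sum of squared differences:
$y_i = 1.44 + 0.463\,x_i + u_i$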

OLS by hand, example
Checking our work...
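One way to check the hand calculation without a statistics package is a brute-force comparison. This is only a sketch, and the step size of 0.05 is an arbitrary choice of mine. It takes the rounded hand-computed line (intercept 1.44, slope 0.463) and confirms that nudging either coefficient up or down makes the sum of squared errors larger.

```python
# The five points from the hand example.
points = [(4, 2), (7, 6), (0, 1), (6, 3), (2, 4)]

def sse(b0, b1):
    """Sum of squared vertical distances from the points to the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

# Hand-computed estimates, rounded as on the slide.
intercept, slope = 1.44, 0.463
best = sse(intercept, slope)

# Squared error of the eight neighbouring lines on a small grid.
step = 0.05
neighbours = [
    sse(intercept + d0, slope + d1)
    for d0 in (-step, 0.0, step)
    for d1 in (-step, 0.0, step)
    if (d0, d1) != (0.0, 0.0)
]
is_minimum = all(value > best for value in neighbours)

print(f"SSE at the hand-computed line: {best:.4f}")
print(f"smallest SSE among neighbours: {min(neighbours):.4f}")
print("hand-computed line beats all neighbours:", is_minimum)
```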

Analysis of Variance
Once we have our regression line, we can define a fitted value as follows:
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$
This is our estimated value for $y_i$ given our slope and intercept estimates and the value of $x_i$. It is also sometimes called a predicted value. All of the y-hats fall on the regression line. For purposes of evaluating our regression, it makes sense to compare the y-hats to the actual values of $y$.

Analysis of Variance
The total variation in $Y$ is assessed relative to its mean. We want to partition this variation into two components:
$(y_i - \bar{y}) = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$
We first add and subtract the fitted value of $y$ for each observation, then combine terms to get the residual term, which is the deviation unexplained by the model, and the difference between the fitted value of $y$ and the mean of $y$, which is the portion of the variance explained by the model.

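Analysis of Variance, cont.
Of course, in order to assess variance, we square all of these terms and sum over observations (the cross-product term sums to zero for an OLS fit):
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$SST = SSE + SSR$
Where SST is the total sum of squares, SSE is the explained sum of squares, and SSR is the residual sum of squares.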
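The decomposition can be verified numerically. The sketch below is an illustration on the five points from the hand example, not on the homicide data. It fits the line, builds the fitted values, and computes the three sums of squares using this lecture's naming (SSE = explained, SSR = residual).

```python
# Five illustrative (x, y) points from the hand example.
x = [4, 7, 0, 6, 2]
y = [2, 6, 1, 3, 4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Fitted values all lie on the regression line.
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total
sse = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"SSE + SSR = {sse + ssr:.4f}")
```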

$R^2$ (R-squared)
$R^2$ represents the portion of the variance that is explained by the model.
$R^2 = \frac{SSE}{SST}$
Typically, in social science applications, our standards for $R^2$ are pretty low. Individual-level regressions rarely exceed .3

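Ordinary Least Squares Estimates by hand
See Excel file. Here x is the poverty rate and y is the homicide rate.

    state        hom    poverty   x - xbar   y - ybar   (x - xbar)*(y - ybar)   (x - xbar)^2
    Alabama      8.3    16.7        4.61       3.53             16.27               21.23
    Alaska       5.4    10.0       -2.09       0.63             -1.32                4.37
    Arizona      7.5    15.2        3.11       2.73              8.49                9.67
    Arkansas     7.3    13.8        1.71       2.53              4.33                2.92
    California   6.8    13.2        1.11       2.03              2.25                1.23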

Ordinary Least Squares Estimates by hand, cont.
We can also get $\beta_1$ from the covariance matrix in Stata (". corr hom pov, c"), which shows that the covariance of homicide and poverty is 4.304 and the variance of poverty is 9.06.
$\hat\beta_1 = 4.304 / 9.06 = .475$
The mean of homicide rates is 4.77, and the mean of poverty rates is 12.09.
$\hat\beta_0 = 4.77 - 12.09 \times .475 = -.973$
Or, in Stata: . reg hom pov

Stata output
$\hat\beta_1 = 4.304 / 9.06 = .475$
$\hat\beta_0 = 4.77 - 12.09 \times .475 = -.973$

    . reg hom pov

          Source |       SS       df       MS              Number of obs =      50
    -------------+------------------------------           F(1, 48)      =   21.36
           Model |  100.175656     1  100.175656           Prob > F      =  0.0000
        Residual |  225.109343    48  4.68977798           R-squared     =  0.3080
    -------------+------------------------------           Adj R-squared =  0.2935
           Total |  325.284999    49  6.63846936           Root MSE      =  2.1656

    ------------------------------------------------------------------------------
         homrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         poverty |       .475       .103     4.62   0.000         .268        .682
           _cons |      -.973      1.280    -0.76   0.451       -3.547       1.600
    ------------------------------------------------------------------------------

Assumptions of the Classical Linear Regression Model
1. X and Y are linearly related in the population.
2. We have a random sample of size $n$ from the population.
3. The values of $x_1$ through $x_n$ are not all the same.
4. The error has an expected value of zero for all values of $x$: $E(u \mid x) = 0$ (zero conditional mean).
5. The error term has a constant variance for all values of $x$: $\mathrm{Var}(u \mid x) = \sigma^2$ (homoscedasticity).

1. Linearity
If X and Y are not linearly related, the estimates will be incorrect. Look at your data! Example: how do these data compare?

    . summ

        Variable |   Obs        Mean    Std. Dev.       Min        Max
    -------------+------------------------------------------------------
              x1 |    11           9    3.316625          4         14
              x2 |    11           9    3.316625          4         14
              x3 |    11           9    3.316625          4         14
              x4 |    11           9    3.316625          8         19
              y1 |    11    7.500909    2.031568       4.26      10.84
    -------------+------------------------------------------------------
              y2 |    11    7.500909    2.031657        3.1       9.26
              y3 |    11         7.5    2.030424       5.39      12.74
              y4 |    11    7.500909    2.030579       5.25       12.5

Linearity, cont.
How do these models compare? In each of the four regressions, $\hat\beta_0 = 3$ and $\hat\beta_1 = .5$. Let's look at each of them separately.
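The summary statistics above match Anscombe's well-known quartet, so the sketch below assumes that these are the four datasets in question; the slides do not name them, and the `ols` helper is hypothetical. It fits a bivariate regression to each pair and shows that all four give nearly the same intercept and slope, even though the underlying scatterplots look nothing alike.

```python
# Assumption: the four (x, y) pairs are Anscombe's quartet.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
datasets = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def ols(x, y):
    """Return (intercept, slope) of the bivariate least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

fits = {label: ols(x, y) for label, (x, y) in datasets.items()}
for label, (b0, b1) in fits.items():
    print(f"Regression {label}: intercept = {b0:.3f}, slope = {b1:.3f}")
```

Identical coefficients say nothing about whether a straight line is the right shape for the data, which is why plotting each dataset is the real check.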

Linearity, cont., Regression 1

Linearity, cont., Regression 2

Linearity, cont., Regression 3

Linearity, cont., Regression 4

3. Sample variation
If there is no variation in the values of $x$, it is not possible to estimate a regression line. The line of best fit would point straight up and pass through every point. Minimal variation is sometimes problematic as well, as it makes regression estimates very unstable. This assumption is easy to check by looking at summary statistics.
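A small sketch of both problems, using made-up numbers and a hypothetical `ols_slope` helper. With identical x values the denominator of the slope formula is zero, so no slope exists. With almost identical x values the slope exists, but changing a single y value moves it dramatically.

```python
def ols_slope(x, y):
    """Bivariate OLS slope; raises ValueError if x has no variation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

# No variation in x: the slope cannot be estimated.
try:
    ols_slope([5, 5, 5, 5], [1, 2, 3, 4])
    undefined = False
except ValueError:
    undefined = True

# Minimal variation in x: a one-unit change in a single y shifts the slope a lot.
x_tight = [5.00, 5.01, 5.00, 5.01]
slope_a = ols_slope(x_tight, [1, 2, 3, 4])
slope_b = ols_slope(x_tight, [1, 3, 3, 4])

print("slope undefined with constant x:", undefined)
print(f"slope before: {slope_a:.1f}, after changing one y value: {slope_b:.1f}")
```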

4. Zero conditional mean: $E(u \mid x) = 0$
In practical terms, this means that the sum of the unobserved variables is not related to $x$. Also, it means that variation in our estimates of the intercept and slope is all due to variations in the error terms. Should this assumption hold true, our estimates of the slope and intercept are unbiased, meaning that on average we're going to get the right answer.

5. $\mathrm{Var}(u \mid x) = \sigma^2$ (homoscedasticity)
In practical terms, this means that the variance of the error term is unrelated to the independent variables.

Root Mean Squared Error (RMSE)
Root mean squared error gives us an indication of how well the regression line fits the data.
$RMSE = \sqrt{\frac{SSR}{n - k}}$
This is the square root of the residual sum of squares divided by the sample size minus the number of parameters being estimated ($k = 2$ in simple bivariate regression).

Root Mean Squared Error, cont.
Provided the error term is distributed normally, the RMSE tells us:
68.3% of the observations fall within the band that is ±1*RMSE of the regression line
95.4% of the observations fall within the band that is ±2*RMSE of the regression line
99.7% of the observations fall within the band that is ±3*RMSE of the regression line
RMSE is also an element in calculating the standard errors of $\beta_0$ and $\beta_1$.
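These percentages can be checked by simulation. The sketch below is illustrative only: the data-generating line y = 1 + 2x, the error standard deviation of 3, the sample size and the seed are all arbitrary choices of mine, not anything from the lecture data. It draws normal errors, fits the line, and counts how many observations fall within 1, 2 and 3 RMSE of it.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical data-generating process: y = 1 + 2x + u, with u ~ Normal(0, 3).
n = 20000
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 3.0) for xi in x]

# Bivariate OLS fit.
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# RMSE = sqrt(SSR / (n - k)) with k = 2 estimated parameters.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
rmse = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))

# Share of observations within 1, 2 and 3 RMSE of the fitted line.
shares = {m: sum(abs(r) <= m * rmse for r in resid) / n for m in (1, 2, 3)}
for m, share in shares.items():
    print(f"within +/-{m}*RMSE: {share:.3f}")
```

With non-normal errors the same code would generally report different shares, which is why the slide conditions the rule on normality.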

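Regression estimates, standard errors
$SE(\hat\beta_0) = RMSE \times \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}}$
$SE(\hat\beta_1) = \frac{RMSE}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$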

Regression estimates, standard errors, cont.
While these two standard error formulas may not appear very intuitive, we can glean some important information from them:
1. As uncertainty about the regression line increases (RMSE increases), the standard errors of both $\beta_0$ and $\beta_1$ increase.
2. As the variability of $x$ increases, the standard errors of both $\beta_0$ and $\beta_1$ decrease.
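Both points can be illustrated with the textbook bivariate formulas, SE(b1) = RMSE / sqrt(sum of squared x-deviations) and SE(b0) = RMSE * sqrt(sum of x squared / (n * sum of squared x-deviations)). The sketch assumes these are the formulas intended on the previous slide, uses the five points from the hand example, and `standard_errors` is a hypothetical helper. It computes the standard errors, then spreads the x values twice as far from their mean while holding RMSE fixed.

```python
import math

# Five illustrative (x, y) points from the hand example.
x = [4, 7, 0, 6, 2]
y = [2, 6, 1, 3, 4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# RMSE with k = 2 estimated parameters.
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(ssr / (n - 2))

def standard_errors(rmse_value, xs):
    """Return (se_b0, se_b1) for a bivariate regression with the given RMSE."""
    m = len(xs)
    mean = sum(xs) / m
    sxx = sum((xi - mean) ** 2 for xi in xs)
    se_b1 = rmse_value / math.sqrt(sxx)
    se_b0 = rmse_value * math.sqrt(sum(xi ** 2 for xi in xs) / (m * sxx))
    return se_b0, se_b1

se_b0, se_b1 = standard_errors(rmse, x)

# Same mean of x, but every point twice as far from it; RMSE held fixed.
x_spread = [x_bar + 2 * (xi - x_bar) for xi in x]
se_b0_spread, se_b1_spread = standard_errors(rmse, x_spread)

print(f"original x: SE(b0) = {se_b0:.3f}, SE(b1) = {se_b1:.3f}")
print(f"spread x:   SE(b0) = {se_b0_spread:.3f}, SE(b1) = {se_b1_spread:.3f}")
```

Doubling the spread of x quadruples the sum of squared deviations, so SE(b1) is cut in half, and SE(b0) shrinks as well.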

Formal test of model fit, F-test
$F_{k-1,\,N-k} = \frac{SSE / (k - 1)}{SSR / (N - k)}$
Where $k$ = the number of parameters in the model, and $N$ is the sample size.
This is a general test of model fit. If the F-test is statistically significant, it means that the model explains some of the variance in $Y$.
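The fit statistics in the Stata header can be reproduced from the sums of squares alone. The sketch below assumes the Model and Residual sums of squares from the homicide regression are 100.175656 and 225.109343, with N = 50 and k = 2; the adjusted R-squared formula is the standard one and is not derived in this lecture.

```python
import math

# Assumed inputs, read from the Stata ANOVA table for the homicide regression.
ss_model = 100.175656   # explained sum of squares (SSE in this lecture's notation)
ss_resid = 225.109343   # residual sum of squares (SSR in this lecture's notation)
n_obs = 50
k = 2                   # parameters estimated: intercept and slope

ss_total = ss_model + ss_resid

r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k)
root_mse = math.sqrt(ss_resid / (n_obs - k))
f_stat = (ss_model / (k - 1)) / (ss_resid / (n_obs - k))

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE      = {root_mse:.4f}")
print(f"F({k - 1}, {n_obs - k})      = {f_stat:.2f}")
```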

Next time:
Homework: Problems .4, .4, C.4, C.4
Read: Wooldridge Chapters 9 & Appendix C.6, and Bushway, Sweeten & Wilson 2006 article