Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Similar documents
Simple Linear Regression

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model

Econometric Methods. Review of Estimation

Linear Regression with One Regressor

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

Chapter 13 Student Lecture Notes 13-1

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model

Multiple Linear Regression Analysis

ENGI 3423 Simple Linear Regression Page 12-01

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then

ESS Line Fitting

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture Note to Rice Chapter 8

Simple Linear Regression

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

STA302/1001-Fall 2008 Midterm Test October 21, 2008

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Lecture 8: Linear Regression

Objectives of Multiple Regression

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger

Probability and. Lecture 13: and Correlation

Simulation Output Analysis

Lecture 3 Probability review (cont d)

Multivariate Transformation of Variables and Maximum Likelihood Estimation

residual. (Note that usually in descriptions of regression analysis, upper-case

Chapter 14 Logistic Regression Models

ρ < 1 be five real numbers. The

Maximum Likelihood Estimation

TESTS BASED ON MAXIMUM LIKELIHOOD

Special Instructions / Useful Data

Multiple Regression Analysis

Simple Linear Regression - Scalar Form

1 Solution to Problem 6.40

Mean is only appropriate for interval or ratio scales, not ordinal or nominal.

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

Functions of Random Variables

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d

Econometrics. 3) Statistical properties of the OLS estimator

Correlation and Simple Linear Regression

Chapter 4 Multiple Random Variables

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5

Statistics MINITAB - Lab 5

LINEAR REGRESSION ANALYSIS

Lecture Notes Types of economic variables

Summary of the lecture in Biostatistics

Point Estimation: definition of estimators

STATISTICAL INFERENCE

Wu-Hausman Test: But if X and ε are independent, βˆ. ECON 324 Page 1

Line Fitting and Regression

2SLS Estimates ECON In this case, begin with the assumption that E[ i

Statistics: Unlocking the Power of Data Lock 5

Lecture 2: Linear Least Squares Regression

ECON 5360 Class Notes GMM

Chapter 3 Sampling For Proportions and Percentages

Lecture Notes 2. The ability to manipulate matrices is critical in economics.

ENGI 4421 Propagation of Error Page 8-01

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

Lecture 1: Introduction to Regression

Qualifying Exam Statistical Theory Problem Solutions August 2005

Part 4b Asymptotic Results for MRR2 using PRESS. Recall that the PRESS statistic is a special type of cross validation procedure (see Allen (1971))

Chapter 5 Properties of a Random Sample

Multiple Choice Test. Chapter Adequacy of Models for Regression

Introduction to Matrices and Matrix Approach to Simple Linear Regression

Chapter 8. Inferences about More Than Two Population Central Values

ε. Therefore, the estimate

Third handout: On the Gini Index

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

Chapter 2 Supplemental Text Material

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

4. Standard Regression Model and Spatial Dependence Tests

Simple Linear Regression and Correlation.

i 2 σ ) i = 1,2,...,n , and = 3.01 = 4.01

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use.

X ε ) = 0, or equivalently, lim

GENERALIZED METHOD OF MOMENTS CHARACTERISTICS AND ITS APPLICATION ON PANELDATA

Answer key to problem set # 2 ECON 342 J. Marcelo Ochoa Spring, 2009

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes

b. There appears to be a positive relationship between X and Y; that is, as X increases, so does Y.

Simple Linear Regression and Correlation. Applied Statistics and Probability for Engineers. Chapter 11 Simple Linear Regression and Correlation

Lecture 1: Introduction to Regression

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I


Simple Linear Regression Analysis

Handout #8. X\Y f(x) 0 1/16 1/ / /16 3/ / /16 3/16 0 3/ /16 1/16 1/8 g(y) 1/16 1/4 3/8 1/4 1/16 1

Econ 388 R. Butler 2016 rev Lecture 5 Multivariate 2 I. Partitioned Regression and Partial Regression Table 1: Projections everywhere

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn:

Chapter -2 Simple Random Sampling

COV. Violation of constant variance of ε i s but they are still independent. The error term (ε) is said to be heteroscedastic.

Chapter -2 Simple Random Sampling

Chapter 2 Simple Linear Regression

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

1. The weight of six Golden Retrievers is 66, 61, 70, 67, 92 and 66 pounds. The weight of six Labrador Retrievers is 54, 60, 72, 78, 84 and 67.

Lecture Notes Forecasting the process of estimating or predicting unknown situations

Transcription:

Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos (Y X ), =,,.., o the relatoshp whch, because t s ot exact, we shall wrte as: y = α + β x +,..., I ths relatoshp α, β ad ε =,.., are fudametally uobservable ad we would lke to estmate α ad β. Approach: The dea s to select estmates of α ad β lets call them α* ad β* whch yeld a straght le (called the regresso le) Y = α* + β*x whch mmses a measure of the aggregate dstace of the pots (Y X ), =,,.., to that le X Y space, where Y s measured o the vertcal axs. The measure we use s the sum of squared vertcal dstaces whch we shall call the Error Sum of Squares (ESS) so that α* ad β* are solutos to the problem: ( ) t m ESS Y α β X α, β = + The multvarate calculus s employed to solve ths problem by settg the partal dervatves of ESS wth respect to α* ad β* to zero (called the frst order codtos) ad solvg thus: The solutos to whch are: ( Y ( α β X) ) ESS = + = 0 α ( ( α β ) ) ESS = Y + X X 0 = β

α = Y β X β = ( Y Y X ( X X X Note the formula for β * has may alteratve equvalet versos whch may be see by observg that for the umerator: Y Y X = Y Y X X = X X Y = X Y YX ad for the deomator: X X X = X X X X = X X so that varous combatos of umerator ad deomator formulae yeld alteratve represetatos of β *. Assumptos the Ordary Least Squares model. Note that whle α, β ad ε, =,.., are fudametally uobservable we oly cocer ourselves wth estmatg α ad β whch defe the relatoshp betwee Y ad X. The ε =,.., are cosdered errors whch accommodate all the other flueces o Y ot accouted for α +βx as such we assume them to be radom ad to obey the followg four assumptos: ) E(ε ) = 0 for all. Ths assumpto really says that the average or et effect of all the other flueces o Y ot accouted for α +βx s costat ad zero for each observato (,..,). ) V(ε ) = σ > 0 for all. Ths assumpto really says that the varablty of the et effect of all the other flueces o Y ot accouted for α +βx s costat ad o zero for each observato (,..,).

Ths s sometmes referred to as the homoskedastcty assumpto ad s ot as ocuous as t seems. For example f Y were the cosumpto behavour of a dvdual ad X was ther dsposable come t says that the scale of varablty of the uobserved effects would be the same for both rch ad poor dvduals. 3) E(ε ε j ) = 0 for all j. 4) E(ε X j ) = 0. For ad j These last two assumptos derve from the geeral assumpto that the et effects of all the other flueces o Y ot accouted for α +βx are depedet of the X s ad of each other. Aga these are ot ocuous assumptos, for example f the equato relates to the behavour of dvduals ad dvduals the sample are related to oe aother some fasho they are ulkely to be true. Smlarly f the equato relates to behavour through tme today s radom effects are ulkely to be depedet of yesterdays effects. Gve these assumptos certa propertes of the estmators follow. Ubasedess. Uder the above assumptos the ordary least squares estmators α* ad β* are ubased so that E(α*) = α ad E(β*) = β whch may be demostrated as follows. From the varous formulae for β* we may wrte: ( ( α β + ) β = = = = X X Y X X + X ( X X X X X X whch, after otg that the sum of devatos from mea s equal to 0, may be wrtte as: β = β + ( X ( X X X

whch ca readly be see to have a expectato β. For α* ote that: α = Y + β X = α + βx + β X = α + β β X + sce E(β - β*) = 0 ad gve assumpto E(α*) = α. The varaces of the estmators. The varace of the estmators α* ad β* facltate ferece ad cofdece terval estmato. The ca be derved, gve the above assumptos as follows. The varace of ay radom varable W s gve by E[(W-E(W)) ] so that for β* from the above we may wrte: X X X X E ( β E( β )) E β β E ( X X ( X X) = + = Note from our assumptos that: σ ( ) E X X X X = 0 for all j j j = X X for all = j whch meas that the varace may be wrtte as: E ( X σ σ ( β E( β )) = = ( X ) ( X X ) X By smlar argumets the varace of α* s gve by: X E ( α E( α )) = σ + ( X X The esduals.

The thgs that are ot explaed by the model, the ε s are fact mplctly estmated by the model by Y - α* - β*x. These estmates of the errors are frequetly referred to as the esduals. They too have propertes whch reflect the assumptos made regardg the true errors.. The resduals sum to 0. = α β = β β e Y X Y Y X X ( Y Y) β ( X X) = = 0 Ths mmcs the dea that the errors have a expectato 0.. The resduals are orthogoal to the X s. ex = Y α β X X = Y Y β X β X X β ( Y Y X = = = 0 Y Y X X X X Y Y X X X X ( X X Whch mmcs the dea that the errors are depedet of the X s. Iferece ad Cofdece Itervals. Notg that β* may be wrtte as: β = β + ( X X) ( X X X t ca readly be see that the estmator s a lear fucto of the errors ε, =,.., so that f they were ormally dstrbuted the so would β* be, fact based upo the foregog t s dstrbuted as:

β N β, σ ( X a smlar fasho α* wll be dstrbuted as: α N α σ +, X ( X so that ferece ad cofdece terval computatos ca proceed accordgly. A Estmator for σ. As usual t s extremely rare for the value of the varace to be kow so that geerally t has to be estmated. The estmator σ * s gve by problem: σ ( Y ( α + β X) ) ) ESS = = whch s most coveetly calculated by otg that: ( ) β ( ) ESS = Y Y X X employg ths estmate meas that: ( β β) ad smlarly: σ + σ ( X ( α α) X ( ) t ( X t ( so ferece ad cofdece terval calculatos ca be coducted accordgly.

ad all that. A very commo strumet for measurg the degree of explaatory power a equato s the statstc whch measures the proporto of the varablty the Y s that s attrbutable to the X s. Gve the amout of varablty the Y s attrbutable to the errors s gve by the error sum of squares (ESS), the proporto of the varablty of the Y s attrbutable to the X s s gve by: ( ) ( ( ) Y Y ESS β X X = = ( Y Y Y Y where the last equalty s obtaed by substtutg the formula for ESS. Notg that - s: t follows that: = ESS ( Y Y) ( X X) β β ( = = ESS st dev ( β ) whch s the square of the classc t statstc for β whch wll have a F(,-) dstrbuto. Multple egresso. Here we cosder the exteso of the smple regresso techque to aalysg the lear relatoshp betwee K+ varables Y ad X, X,..,X K. We have K+-tuples of observatos (Y X, X,..,X K ) =,,.., o the relatoshp whch, because t s ot exact, we shall wrte as: K Y = α + β X + =,..., k k k =

I ths relatoshp α, β k, k =,..,K ad ε =,.., are fudametally uobservable ad we would lke to estmate the α ad β k. Essetally ths deals wth the case where Y s descrbed by more tha oe varable X. We requre oe addto to our 4 assumptos, amely > K+. I the same way that estmators for α, β were developed the smple regresso case (by mmsg ESS) estmators for α, β k, k =,..,K (deoted α *, β * k, k =,..,K ) ca be developed by mmzg wth respect to α *, β * k, k =,..,K the multvarate verso of the error sum of squares gve by: ESS = Y + X K α β k k k= The formulae for these estmates ad smlarly formulae for Var(α * ) ad Var(β * k), k =,..,K are complcated ad wll ot be gve here however these estmates are readly calculated usg avalable software packages ad just as the smple regresso case ferece ad cofdece tervals may be pursued by otg that: ( βk βk) var ( βk ) ( K ) t ad smlarly: ( αk αk) var ( αk ) ( k ) t so that ferece ad cofdece terval calculatos ca be coducted accordgly. the Multvarate Case. I the multvarate case ca be calculated exactly the same fasho as: = ESS ( Y Y)

however ths case f we wsh to use t the cotext of a jot test of the sgfcace of all of the X k s we have to use: ( K ) K (, K ) F K whch provdes us wth a oe sded upper taled test statstc for the sgfcace of the K explaatory varables X k.