Economics 130. Lecture 4 Simple Linear Regression Continued


Readings for Week 4: Text, Chapters 2 and 3.

We continue with addressing our second issue and add in how we evaluate these relationships: Where do we get data to do this analysis? How do we create the model relating the data? How do we relate data to one another? How do we evaluate these relationships?

Tonight we will reprise last week's lecture. We will then derive the estimates $b_1$ and $b_2$. We will discuss what it means that OLS, under certain assumptions, is BLUE. We will discuss how to evaluate our estimates. We will do a problem.

Last week we: developed a simple linear regression model; discussed the error term; explained the differences between parameters and estimates; presented OLS for obtaining estimates; introduced minimizing the residuals; introduced $R^2$.

Remember, OLS chooses $b_1$ and $b_2$ to minimize the SSE, the sum of squared residuals. The solutions:

$$b_2 = \frac{\sum (Y_i - \bar{Y})(X_i - \bar{X})}{\sum (X_i - \bar{X})^2}, \qquad b_1 = \bar{Y} - b_2 \bar{X}$$
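As a quick numerical sanity check, here is a minimal Python sketch (the data values are made up for illustration) that applies these two formulas and compares the result against numpy's built-in least-squares fit:

```python
import numpy as np

# Hypothetical data, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# The OLS solutions from the slide.
b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()

# Cross-check against numpy's degree-1 polynomial fit.
slope, intercept = np.polyfit(X, Y, 1)
print(b1, b2)            # formula results
print(intercept, slope)  # should match b1, b2
```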

First derivation ($b_1$). Start from the residual $\hat{e}_i = Y_i - \hat{Y}_i$, so $\hat{e}_i = Y_i - b_1 - b_2 X_i$ and

$$\sum \hat{e}_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$$

Differentiating with respect to $b_1$ and setting the result to zero:

$$\frac{\partial \sum \hat{e}_i^2}{\partial b_1} = \sum 2(Y_i - b_1 - b_2 X_i)(-1) = 0$$
$$\sum (Y_i - b_1 - b_2 X_i) = 0$$
$$\sum Y_i - n b_1 - b_2 \sum X_i = 0$$
$$n b_1 = \sum Y_i - b_2 \sum X_i$$
$$b_1 = \bar{Y} - b_2 \bar{X}$$

Now let's do $b_2$:

$$\frac{\partial \sum \hat{e}_i^2}{\partial b_2} = \sum 2(Y_i - b_1 - b_2 X_i)(-X_i) = 0$$
$$\sum (Y_i - b_1 - b_2 X_i) X_i = 0$$
$$\sum (Y_i X_i - b_1 X_i - b_2 X_i^2) = 0$$
$$\sum Y_i X_i - b_1 \sum X_i - b_2 \sum X_i^2 = 0$$

Rearranging and substituting $b_1 = \bar{Y} - b_2 \bar{X}$:

$$\sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2$$
$$\sum Y_i X_i = (\bar{Y} - b_2 \bar{X}) \sum X_i + b_2 \sum X_i^2$$
$$\sum Y_i X_i = \left[\tfrac{1}{n} \sum Y_i - b_2 \tfrac{1}{n} \sum X_i\right] \sum X_i + b_2 \sum X_i^2$$
$$\sum Y_i X_i = \tfrac{1}{n} \sum Y_i \sum X_i - b_2 \tfrac{1}{n} \left(\sum X_i\right)^2 + b_2 \sum X_i^2$$

Solving for $b_2$ and adding and subtracting $n\bar{Y}\bar{X}$ in the numerator and $n\bar{X}^2$ in the denominator:

$$b_2 = \frac{\sum Y_i X_i - n \bar{Y} \bar{X}}{\sum X_i^2 - n \bar{X}^2} = \frac{\sum X_i Y_i - n \bar{Y} \bar{X} + n \bar{Y} \bar{X} - n \bar{Y} \bar{X}}{\sum X_i^2 - n \bar{X}^2 + n \bar{X}^2 - n \bar{X}^2}$$

Since $\bar{Y} \sum X_i = \bar{X} \sum Y_i = n \bar{Y} \bar{X}$, the numerator and denominator can be rewritten in deviation form:

$$b_2 = \frac{\sum Y_i X_i - \bar{Y} \sum X_i - \bar{X} \sum Y_i + n \bar{Y} \bar{X}}{\sum X_i^2 - 2 \bar{X} \sum X_i + n \bar{X}^2} = \frac{\sum (Y_i - \bar{Y})(X_i - \bar{X})}{\sum (X_i - \bar{X})^2}$$
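The first-order conditions can also be checked symbolically. A small sketch, assuming sympy is available, with three hypothetical data points:

```python
import sympy as sp

b1, b2 = sp.symbols('b1 b2')
X = [1, 2, 3]  # hypothetical data
Y = [2, 3, 5]

# Sum of squared residuals as a symbolic expression in b1, b2.
SSE = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))

# Solve the two first-order conditions simultaneously.
sol = sp.solve([sp.diff(SSE, b1), sp.diff(SSE, b2)], [b1, b2])
print(sol)  # {b1: 1/3, b2: 3/2} -- matches the deviation-form formula
```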

Now we turn to BLUE: Best Linear Unbiased Estimators.

Why might OLS be a good estimator? Desirable properties of an estimator: 1) unbiased, i.e., the expected value of the estimator equals the true parameter value that we want to estimate; 2) precise, i.e., the variance of the estimator is small. It turns out that the least squares estimator is unbiased, is linear in the y's, and, among linear unbiased estimators, is the best: it has the smallest variance. That is, the OLS (ordinary least squares) estimator is BLUE. This is the Gauss-Markov theorem.

The Gauss-Markov theorem states that OLS estimates of the regression coefficients are (1) unbiased, (2) consistent, and (3) most efficient, assuming our six assumptions from last week are true.

Six Critical Assumptions:
1. Linearity
2. Some observed X's are different
3. The conditional mean of $e_i$, given X, is 0
4. X's are given, and can be treated as nonrandom
5. All $e_i$'s are equally distributed with the same conditional variance $\sigma^2$ [homoskedasticity (equal scatter)]
6. The $e_i$'s are independently distributed: $\text{cov}(e_i, e_j) = 0$ for $i \neq j$
(A small data-generating sketch satisfying these assumptions follows below.)
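One way to make these assumptions concrete is to write down a data-generating process that satisfies all six. A minimal sketch; every parameter value here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2, sigma = 2.0, 0.5, 1.0      # hypothetical true parameters

x = np.linspace(0.0, 10.0, 50)           # fixed, nonrandom X's; not all equal
e = rng.normal(0.0, sigma, size=x.size)  # mean 0, equal variance, independent draws
y = beta1 + beta2 * x + e                # linear in the parameters
```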

To prove unbiasedness we need to remember only the first four assumptions:
1. Linearity
2. Some observed X's are different
3. The conditional mean of $e_i$, given X, is 0
4. X's are given, and can be treated as nonrandom

Definition of unbiased: $E(b_2) = \beta_2$. Begin with

$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

Please note:
1) The sum of a variable around its average is always zero, that is, $\sum (x_i - \bar{x}) = 0$.
2) For convenience, we will define $w_i$ as

$$w_i = \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}$$

Using these notes, we can rewrite $b_2$ as follows:

$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x}) y_i - \bar{y} \sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2} = \sum w_i y_i$$

Since $y_i = \beta_1 + \beta_2 x_i + e_i$, we can simplify our equation:

$$b_2 = \sum w_i y_i = \sum w_i (\beta_1 + \beta_2 x_i + e_i) = \beta_1 \sum w_i + \beta_2 \sum w_i x_i + \sum w_i e_i = \beta_2 + \sum w_i e_i$$

using the facts that $\sum w_i = 0$ and $\sum w_i x_i = 1$.
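Both facts about the weights are easy to confirm numerically; a short sketch with hypothetical x values:

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 7.0, 9.0])  # hypothetical regressor values
w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)

print(np.sum(w))      # ~0: the weights sum to zero
print(np.sum(w * x))  # ~1: the weights applied to x sum to one
```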

We can find the expected value of $b_2$ using the fact that the expected value of a sum is the sum of the expected values:

$$E(b_2) = E\left(\beta_2 + \sum w_i e_i\right) = E(\beta_2 + w_1 e_1 + w_2 e_2 + \cdots + w_N e_N)$$
$$= E(\beta_2) + E(w_1 e_1) + E(w_2 e_2) + \cdots + E(w_N e_N) = E(\beta_2) + \sum w_i E(e_i) = \beta_2$$

since $E(e_i) = 0$.

Using our assumption that the conditional mean (expected value) of the error terms is 0, $E(b_2) = \beta_2$. Therefore, OLS estimates are unbiased.

What about most efficient? Efficiency is defined by the size of the variance: if there are two unbiased estimators, the one with the smaller variance is the more efficient.

What about the variance of our estimates? Here we need the remaining two assumptions: all $e_i$'s are equally distributed with the same conditional variance $\sigma^2$ [homoskedasticity (equal scatter)], and the $e_i$'s are independently distributed, $\text{cov}(e_i, e_j) = 0$.

Remember: $b_2 = \beta_2 + \sum w_i e_i$. Therefore:

$$\text{var}(b_2) = E\left[\left(\beta_2 + \sum w_i e_i - \beta_2\right)^2\right] = E\left[\left(\sum w_i e_i\right)^2\right] = E\left[\sum w_i^2 e_i^2 + \sum_{i \neq j} w_i w_j e_i e_j\right]$$
$$= \sum w_i^2 E(e_i^2) + \sum_{i \neq j} w_i w_j E(e_i e_j) = \sigma^2 \sum w_i^2 = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$

using $E(e_i e_j) = 0$ for $i \neq j$ and $\sum w_i^2 = 1/\sum (x_i - \bar{x})^2$.
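A Monte Carlo sketch (all parameter values hypothetical) that illustrates both results: across repeated samples the mean of $b_2$ is close to $\beta_2$, and its variance is close to $\sigma^2 / \sum (x_i - \bar{x})^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2, sigma = 2.0, 0.5, 1.0  # hypothetical true values
x = np.linspace(0.0, 10.0, 50)       # fixed regressor
w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)

b2_draws = []
for _ in range(20000):
    y = beta1 + beta2 * x + rng.normal(0.0, sigma, x.size)
    b2_draws.append(np.sum(w * y))   # b2 = sum of w_i * y_i
b2_draws = np.array(b2_draws)

print(b2_draws.mean())                                       # ~0.5 (unbiased)
print(b2_draws.var(), sigma**2 / np.sum((x - x.mean())**2))  # ~equal
```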

If we postulate another, different linear unbiased estimator $b_2^*$ whose weights differ from the $w_i$ by constants $c_i$ (i.e., weights $w_i + c_i$):

$$\text{var}(b_2^*) = \text{var}\left(\beta_2 + \sum (w_i + c_i) e_i\right) = \sum (w_i + c_i)^2 \text{var}(e_i) = \sigma^2 \sum (w_i + c_i)^2$$
$$= \sigma^2 \sum w_i^2 + \sigma^2 \sum c_i^2 = \text{var}(b_2) + \sigma^2 \sum c_i^2 \geq \text{var}(b_2)$$

(The cross term $2\sigma^2 \sum w_i c_i$ vanishes: unbiasedness of $b_2^*$ requires $\sum c_i = 0$ and $\sum c_i x_i = 0$, which imply $\sum w_i c_i = 0$.)

Then, given the homoskedasticity and independence assumptions above, among all unbiased linear combinations of the Y's, our estimates $b_1$ and $b_2$ have the lowest variance: they are the most efficient.
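To see "best" in action, compare the OLS slope against another linear unbiased estimator, for example the endpoint estimator $(y_N - y_1)/(x_N - x_1)$, whose weights also satisfy the unbiasedness conditions. A sketch with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2, sigma = 2.0, 0.5, 1.0  # hypothetical true values
x = np.linspace(0.0, 10.0, 50)
w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)

ols, endpoint = [], []
for _ in range(20000):
    y = beta1 + beta2 * x + rng.normal(0.0, sigma, x.size)
    ols.append(np.sum(w * y))                         # OLS slope
    endpoint.append((y[-1] - y[0]) / (x[-1] - x[0]))  # linear, unbiased, but noisier
print(np.var(ols), np.var(endpoint))  # OLS variance is markedly smaller
```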

As for consistency, it is the property that estimates converge to the true values as the sample size is increased indefinitely. Similar to unbiasedness, if our first four assumptions hold (especially #4, which implies the X's and e's are uncorrelated), then the OLS estimators are consistent.

Therefore, OLS estimators are BLUE. What if the assumptions are not true? To be continued at a later date...

Now we combine statistical analysis with OLS estimates.

Statistical Aspects of Regression: $b_1$ and $b_2$ are only estimates of $\beta_1$ and $\beta_2$. Key question: how accurate are these estimates? Statistical procedures allow us to formally address this question.

The normal distribution of $b_2$, the least squares estimator of $\beta_2$, is

$$b_2 \sim N\!\left(\beta_2,\ \frac{\sigma^2}{\sum (x_i - \bar{x})^2}\right)$$

A standardized normal random variable is obtained from $b_2$ by subtracting its mean and dividing by its standard deviation:

$$Z = \frac{b_2 - \beta_2}{\sqrt{\sigma^2 / \sum (x_i - \bar{x})^2}} \sim N(0, 1)$$

We know that $P(-1.96 \leq Z \leq 1.96) = 0.95$. Substituting:

$$P\!\left(-1.96 \leq \frac{b_2 - \beta_2}{\sqrt{\sigma^2 / \sum (x_i - \bar{x})^2}} \leq 1.96\right) = 0.95$$

Rearranging:

$$P\!\left(b_2 - 1.96\sqrt{\frac{\sigma^2}{\sum (x_i - \bar{x})^2}} \leq \beta_2 \leq b_2 + 1.96\sqrt{\frac{\sigma^2}{\sum (x_i - \bar{x})^2}}\right) = 0.95$$

The two end-points $b_2 \pm 1.96\sqrt{\sigma^2 / \sum (x_i - \bar{x})^2}$ provide an interval estimator. In repeated sampling, 95% of the intervals constructed this way will contain the true value of the parameter $\beta_2$. This easy derivation of an interval estimator is based on assumption SR6 and on knowing the variance of the error term, $\sigma^2$.

Replacing $\sigma^2$ with its estimate $\hat{\sigma}^2$ creates a random variable

$$t = \frac{b_2 - \beta_2}{\sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}} = \frac{b_2 - \beta_2}{\sqrt{\widehat{\text{var}}(b_2)}} = \frac{b_2 - \beta_2}{\text{se}(b_2)} \sim t_{(N-2)}$$

The ratio $t = (b_2 - \beta_2)/\text{se}(b_2)$ has a t-distribution with $(N-2)$ degrees of freedom, which we denote as $t \sim t_{(N-2)}$.

In general we can say that, if assumptions SR1-SR6 hold in the simple linear regression model, then

$$t = \frac{b_k - \beta_k}{\text{se}(b_k)} \sim t_{(N-2)} \quad \text{for } k = 1, 2$$

The t-distribution is a bell-shaped curve centered at zero. It looks like the standard normal distribution, except that it is more spread out, with a larger variance and thicker tails. The shape of the t-distribution is controlled by a single parameter called the degrees of freedom, often abbreviated as df.

A Confidence Interval for $\beta_k$. Uncertainty about the accuracy of the estimate $b_k$ can be summarised in a confidence interval. The 95% confidence interval for $\beta_k$ is given by:

$$P\!\left(-t_c \leq \frac{b_k - \beta_k}{\text{se}(b_k)} \leq t_c\right) = 1 - \alpha$$
$$P\left[b_k - t_c\,\text{se}(b_k) \leq \beta_k \leq b_k + t_c\,\text{se}(b_k)\right] = 1 - \alpha$$

where $t_c$ is a critical value from the Student t-distribution and $\text{se}(b_k)$, the standard error of $b_k$, is a measure of the accuracy of $b_k$. For the slope,

$$\text{se}(b_2) = \sqrt{\frac{SSE/(N-2)}{\sum (X_i - \bar{X})^2}}$$

A Confidence Interval for $\beta_2$ (cont.): $t_c$ controls the confidence level (e.g., $t_c$ is bigger for 95% confidence than for 90%). se varies directly with SSE (i.e., with how variable the residuals are). se varies inversely with N, the number of data points. se varies inversely with $\sum (X_i - \bar{X})^2$, which is related to the variance/variability of X.
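A sketch of how $\text{se}(b_2)$ would be computed from raw data using the formula above (the data values are hypothetical):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # hypothetical data
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
N = len(Y)

b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()

resid = Y - (b1 + b2 * X)
SSE = np.sum(resid ** 2)
se_b2 = np.sqrt((SSE / (N - 2)) / np.sum((X - X.mean()) ** 2))
print(se_b2)
```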

Intuition of the Confidence Interval. Useful (but formally incorrect) intuition: there is a 95% probability that the true value of $\beta_2$ lies in the confidence interval. Correct intuition: if you repeatedly use the above formula for calculating a confidence interval, 95% of the intervals you construct will contain the true value of $\beta_2$. You can choose any level of confidence you want (e.g., 90%, 99%).

EXAMPLE FROM TEXT: $b_2 = 10.21$, $N = 40$, so df = 38; $\widehat{\text{var}}(b_2) = 4.38$. Create a 95% confidence interval ($\alpha = .05$). The critical value is $t_c = 2.024$ and $\text{se}(b_2) = (4.38)^{1/2} = 2.09$. A 95% confidence interval estimate for $\beta_2$:

$$b_2 \pm t_c\,\text{se}(b_2) = 10.21 \pm 2.024(2.09) = [5.97, 14.45]$$

When the procedure we used is applied to many random samples of data from the same population, 95% of all the interval estimates constructed using this procedure will contain the true parameter!
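The interval can be reproduced in a few lines, assuming scipy is available, plugging in the example's numbers:

```python
from scipy import stats

b2, var_b2, df = 10.21, 4.38, 38  # values from the text's example
se = var_b2 ** 0.5                # 2.09
t_c = stats.t.ppf(0.975, df)      # ~2.024 for a 95% two-sided interval

lower, upper = b2 - t_c * se, b2 + t_c * se
print(round(lower, 2), round(upper, 2))  # ~5.97, 14.45
```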

The most common method of evaluating estimates is hypothesis testing, using a test statistic based on the t-distribution.

Hypothesis Testing. Test whether $\beta_2 = 0$ (i.e., whether X has any explanatory power). One way of doing it: look at the confidence interval and check whether it contains zero; if not, then you are confident that $\beta_2 \neq 0$. An alternative (equivalent) way is to use the t-statistic (often called the t-ratio):

$$t = \frac{b_k - c}{\text{se}(b_k)} \sim t_{(N-2)}$$

If $c = 0$, then $t = b_k/\text{se}(b_k)$. Big values of $|t|$ indicate $\beta_2 \neq 0$; small values indicate $\beta_2 = 0$.

Hypothesis Testing (cont.). Q: What do we mean by "big" and "small"? A: Look at the p-value. If the p-value is $\leq .05$, then t is "big" and we conclude $\beta_2 \neq 0$. If the p-value is $> .05$, then t is "small" and we conclude $\beta_2 = 0$. Useful (but formally incorrect) intuition: the p-value measures the probability that $\beta_2 = 0$. Here .05 = 5% is the level of significance; other levels of significance (e.g., 1% or 10%) are occasionally used.

The test statistic for $H_0: \beta_2 = c$ is $t_0 = (b_2 - c)/\text{se}(b_2) \sim t(\text{df} = n - 2)$. The rejection regions are exactly the same as before (depending on whether you are doing a one-sided or two-sided test). The p-values are exactly the same as before (depending upon $t_0$ and whether you are doing a one-sided or two-sided test).

Components of Hypothesis Tests:
1. A null hypothesis, $H_0$
2. An alternative hypothesis, $H_1$
3. A test statistic
4. A rejection region
5. A conclusion

1. For our purposes, the null hypothesis is $H_0: \beta_2 = 0$ and the alternative hypothesis is $H_1: \beta_2 \neq 0$. 2. Let $\alpha = .05$. The critical values for this two-tail test are the 2.5th percentile $t_{(.025, 38)} = -2.024$ and the 97.5th percentile $t_{(.975, 38)} = 2.024$. We REJECT the null hypothesis if the calculated value of $t \geq 2.024$ or if $t \leq -2.024$. If $-2.024 < t < 2.024$, we DO NOT REJECT the null.

A coefficient is said to be STATISTICALLY SIGNIFICANT, or SIGNIFICANTLY DIFFERENT FROM ZERO, at level $\alpha$ (usually 1%, 5%, or 10%) if you reject the null hypothesis that the coefficient is ZERO (generally with a two-sided test). This is what reported t-ratios test.

p-value rule: reject the null hypothesis when the p-value is less than or equal to the level of significance $\alpha$. That is, if $p \leq \alpha$ then reject $H_0$; if $p > \alpha$ then do not reject $H_0$.

The null hypothesis is $H_0: \beta_2 = 0$; the alternative hypothesis is $H_1: \beta_2 \neq 0$. Recall the t-statistic for $b_2$: $t = 4.88$. The p-value for $H_0$ is

$$p = P\left[t_{(38)} \geq 4.88\right] + P\left[t_{(38)} \leq -4.88\right] = 0.0000$$
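The two-tail p-value can be reproduced with scipy (assuming it is available); the value is tiny, on the order of $10^{-5}$, so it prints as 0.0000 at four decimal places:

```python
from scipy import stats

t_stat, df = 4.88, 38
p = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
print(f"{p:.4f}")                    # 0.0000
```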

Returning to the housing problem: PRICE = 52.351 + .13875 SQFT, with (1.404) and (7.41) reported beneath the two coefficients.

Next Week: more evaluating results; a few words on linear algebra; begin multiple regression models; multiple $R^2$ and the F-test.