Wednesday, September 19 Handout: Ordinary Least Squares Estimation Procedure The Mechanics


Amherst College Department of Economics, Economics Fall 2012
Wednesday, September 19 Handout: Ordinary Least Squares Estimation Procedure: The Mechanics

Preview
Best Fitting Line: Income and Savings
Clint's Assignment
Simple Regression Model
  o Parameters of the Model
  o Error Term
  o Best Fitting Line
Ordinary Least Squares (OLS) Estimation Procedure
  o Sum of Squared Residuals Criterion
  o Finding the Best Fitting Line
Importance of the Error Term
  o Absence of Random Influences
  o Presence of Random Influences: Constant and Coefficient of Best Fitting Line Are Random Variables
Error Terms and Random Influences: A Closer Look
Clint's Assignment: The Two Parts

Best Fitting Line: Income and Savings
The following table reports the (after-tax) income of Americans and their savings between 1950 and 1975, in billions of dollars:
Year Income Savings Year Income Savings Year Income Savings 19 210.1 17.9 1959.5 2.9 1968 625.0 67.0 1951 21.0 22.5 19 65.4.7 1969 674.0 68.8 1952 24.4 2.9 1961 81.8 9.7 19 75.7 87.2 195 258.6 25.5 1962 405.1 41.8 1971 1.8 99.9 1954 264. 24. 196 425.1 42.4 1972 869.1 98.5 1955 28. 24.5 1964 462.5 51.1 197 978. 125.9 1956 0.0 1. 1965 498.1 54. 1974 1071.6 18.2 1957 19.8 2.9 1966 57.5 56.6 1975 1187.4 15.0 1958 0.5 4. 1967 575. 67.5

Economic theory suggests that as Americans earn more income, we will save more.
Theory: Additional income increases savings.

Question: Do the data support the theory?

Question: How can we estimate the relationship between savings and income more precisely? That is, what equation describes the best fitting line?

We estimate that an additional $1 of income increases savings by $____; or equivalently, an additional $1,000 of income increases savings by $____.

Aside: Random Influences

Clint's Assignment: Effect of Studying on Quiz Scores
Background: Three students are enrolled in Professor Jeff Lord's 8:30 am class. Every week, he gives a quiz. Professor Lord asks his students to report the number of minutes they studied; the students always respond honestly.
Theory: Additional studying increases quiz scores.

Professor Lord's First Quiz:
Student   Minutes   Score
1         5         66
2         15        87
3         25        90

Question: Do the data support the theory?
[Figure: scatter plot of quiz score against minutes studied for Students 1, 2, and 3]

The Regression Model
$$y_t = \beta_{Const} + \beta_x x_t + e_t$$
where
$y_t$ = quiz score received by student t
$x_t$ = number of minutes studied by student t
$e_t$ = error term for student t

Interpretation of the parameters, $\beta_{Const}$ and $\beta_x$: $\beta_{Const}$ represents the number of points Professor Lord gives students just for showing up; $\beta_x$ represents the number of additional points earned for each additional minute of study.

Interpretation of the error term, $e_t$: The error term, $e_t$, is a random variable. It represents random influences, which are the factors that cannot be anticipated or determined before the quiz is given.

Two implicit assumptions:
Professor Lord gives each student the same number of points for showing up.
The number of additional points earned for an additional minute of study is the same for each student.

Clint's Assignment: Find $\beta_{Const}$ and $\beta_x$. But $\beta_{Const}$ and $\beta_x$ are unobservable. What can Clint do?

Econometrician's Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

Strategy: Use the intercept and slope of the best fitting line to estimate $\beta_{Const}$ and $\beta_x$.
$b_{Const}$ = intercept of the best fitting line; $b_{Const}$ estimates the value of $\beta_{Const}$
$b_x$ = slope of the best fitting line; $b_x$ estimates the value of $\beta_x$

Problem: How can we decide on the best fitting line?

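The Ordinary Least Squares (OLS) Estimation Procedure
Ordinary Least Squares (OLS) Criterion: Minimize the sum of squared residuals. The following two equations achieve this objective:
$$b_{Const} = \bar{y} - b_x \bar{x}, \qquad b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$
where T is the number of observations (here, T = 3 students).

Step 1: Define the sum of squared residuals (SSR).
The Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
$y_t$ = actual quiz score received by student t (dependent variable)
$x_t$ = actual number of minutes studied by student t (explanatory variable)
$e_t$ = actual error for student t
$\beta_{Const}$ = actual constant: points awarded for showing up
$\beta_x$ = actual coefficient: additional points received for each additional minute studied

The Estimate: $\text{Esty}_t = b_{Const} + b_x x_t$
$\text{Esty}_t$ = estimated quiz score for student t
$b_{Const}$ = estimated constant; that is, $b_{Const}$ estimates the value of $\beta_{Const}$
$b_x$ = estimated coefficient; that is, $b_x$ estimates the value of $\beta_x$

The Residual: $\text{Res}_t = y_t - \text{Esty}_t$
$\text{Res}_t$ = residual for student t = actual quiz score for student t minus estimated quiz score for student t

Strategy: Determine the best fitting line by minimizing the sum of squared residuals.
$$\text{Esty}_1 = b_{Const} + b_x x_1 \qquad \text{Esty}_2 = b_{Const} + b_x x_2 \qquad \text{Esty}_3 = b_{Const} + b_x x_3$$
$$\text{Res}_1 = y_1 - \text{Esty}_1 = y_1 - b_{Const} - b_x x_1$$
$$\text{Res}_2 = y_2 - \text{Esty}_2 = y_2 - b_{Const} - b_x x_2$$
$$\text{Res}_3 = y_3 - \text{Esty}_3 = y_3 - b_{Const} - b_x x_3$$
$$\text{SSR} = \text{Res}_1^2 + \text{Res}_2^2 + \text{Res}_3^2 = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2$$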
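Below is a minimal Python sketch of the SSR criterion and the two OLS formulas, using the first-quiz data (5, 66), (15, 87), (25, 90). The function names `ssr` and `ols` are illustrative; they are not from the handout. The check at the end shows what "minimize the sum of squared residuals" means: moving either estimate away from its OLS value raises the SSR.

```python
# Professor Lord's first quiz: minutes studied (x) and quiz scores (y).
x = [5, 15, 25]
y = [66, 87, 90]


def ssr(b_const, b_x, x, y):
    """Sum of squared residuals for the line Esty_t = b_const + b_x * x_t."""
    return sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))


def ols(x, y):
    """Closed-form OLS estimates: b_x = Sxy / Sxx and b_const = ybar - b_x * xbar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    b_x = sxy / sxx
    return y_bar - b_x * x_bar, b_x


b_const, b_x = ols(x, y)
best = ssr(b_const, b_x, x, y)

# Any nudge away from the OLS estimates increases the SSR.
for d_const, d_x in [(1, 0), (-1, 0), (0, 0.1), (0, -0.1), (1, -0.1)]:
    assert ssr(b_const + d_const, b_x + d_x, x, y) > best

print(b_const, b_x, best)  # 63.0 1.2 54.0
```

A grid search over many candidate lines would reach the same answer more slowly. The closed-form formulas avoid the search entirely.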

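Step 2: Differentiate the sum of squared residuals (SSR) with respect to $b_{Const}$ and set the derivative equal to 0.
$$\frac{d\,\text{SSR}}{d b_{Const}} = -2(y_1 - b_{Const} - b_x x_1) - 2(y_2 - b_{Const} - b_x x_2) - 2(y_3 - b_{Const} - b_x x_3) = 0$$
Dividing by -2:
$$(y_1 - b_{Const} - b_x x_1) + (y_2 - b_{Const} - b_x x_2) + (y_3 - b_{Const} - b_x x_3) = 0$$
Collecting terms:
$$(y_1 + y_2 + y_3) - 3 b_{Const} - b_x (x_1 + x_2 + x_3) = 0$$
Dividing by 3:
$$\frac{y_1 + y_2 + y_3}{3} - b_{Const} - b_x \frac{x_1 + x_2 + x_3}{3} = 0 \quad\Rightarrow\quad \bar{y} - b_{Const} - b_x \bar{x} = 0 \quad\Rightarrow\quad b_{Const} = \bar{y} - b_x \bar{x}$$
Note that $\bar{y} = b_{Const} + b_x \bar{x}$. The best fitting line therefore passes through the point of means, $(\bar{x}, \bar{y})$.
[Figure: OLS estimate line, Esty = b_Const + b_x x, passing through the point of means (x̄, ȳ)]

Step 3: Differentiate the sum of squared residuals (SSR) with respect to $b_x$.
First substitute $b_{Const} = \bar{y} - b_x \bar{x}$ into the SSR:
$$\text{SSR} = \sum_{t=1}^{3} \left[y_t - (\bar{y} - b_x \bar{x}) - b_x x_t\right]^2 = \sum_{t=1}^{3} \left[y_t - \bar{y} + b_x \bar{x} - b_x x_t\right]^2 = \sum_{t=1}^{3} \left[(y_t - \bar{y}) - b_x (x_t - \bar{x})\right]^2$$
Differentiate and set the derivative equal to 0:
$$\frac{d\,\text{SSR}}{d b_x} = \sum_{t=1}^{3} -2\left[(y_t - \bar{y}) - b_x (x_t - \bar{x})\right](x_t - \bar{x}) = 0$$
Dividing by -2 and multiplying out:
$$\sum_{t=1}^{3} \left[(y_t - \bar{y})(x_t - \bar{x}) - b_x (x_t - \bar{x})^2\right] = 0 \quad\Rightarrow\quad \sum_{t=1}^{3} (y_t - \bar{y})(x_t - \bar{x}) = b_x \sum_{t=1}^{3} (x_t - \bar{x})^2$$
Solving for $b_x$:
$$b_x = \frac{(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x})}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2} = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$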
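The derivation can be checked numerically. The sketch below evaluates the two derivatives from Steps 2 and 3 at the OLS estimates and confirms that both are zero. It also compares the Step 3 derivative with a central finite difference at an arbitrary trial slope (0.5, chosen only for illustration), and confirms that the line passes through $(\bar{x}, \bar{y})$. The helper names are hypothetical.

```python
# Quiz data: minutes studied (x) and scores (y) for Students 1, 2, 3.
x = [5, 15, 25]
y = [66, 87, 90]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n


def dssr_dbconst(b_const, b_x):
    """Step 2: dSSR/db_Const = -2 * sum of (y_t - b_Const - b_x x_t)."""
    return -2 * sum(yt - b_const - b_x * xt for xt, yt in zip(x, y))


def ssr_sub(b_x):
    """Step 3: SSR after substituting b_Const = ybar - b_x * xbar."""
    return sum(((yt - y_bar) - b_x * (xt - x_bar)) ** 2 for xt, yt in zip(x, y))


def dssr_dbx(b_x):
    """Derivative of ssr_sub: -2 * sum of [(y_t - ybar) - b_x (x_t - xbar)](x_t - xbar)."""
    return -2 * sum(((yt - y_bar) - b_x * (xt - x_bar)) * (xt - x_bar)
                    for xt, yt in zip(x, y))


# Solutions of the two first-order conditions.
b_x = (sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
       / sum((xt - x_bar) ** 2 for xt in x))
b_const = y_bar - b_x * x_bar

# Both derivatives vanish at the OLS estimates.
print(dssr_dbconst(b_const, b_x), dssr_dbx(b_x))

# The analytic derivative agrees with a central finite difference.
h = 1e-4
b_test = 0.5
numeric = (ssr_sub(b_test + h) - ssr_sub(b_test - h)) / (2 * h)
print(numeric, dssr_dbx(b_test))

# The best fitting line passes through the point of means (xbar, ybar).
print(b_const + b_x * x_bar, y_bar)
```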

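Ordinary Least Squares Estimates: Calculations
The Data (x = minutes studied, y = quiz score):
Student   x    y
1         5    66
2         15   87
3         25   90

The equations:
$$b_{Const} = \bar{y} - b_x \bar{x}, \qquad b_x = \frac{(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x})}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2} = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$

The means:
$$\bar{y} = \frac{y_1 + y_2 + y_3}{3} = \frac{66 + 87 + 90}{3} = \frac{243}{3} = 81, \qquad \bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{5 + 15 + 25}{3} = \frac{45}{3} = 15$$

Deviations from the means:
Student   y_t   y_t - ȳ   x_t   x_t - x̄
1         66    -15       5     -10
2         87    6         15    0
3         90    9         25    10

Product of the x and y deviations, and squared x deviations:
Student   (y_t - ȳ)(x_t - x̄)   (x_t - x̄)²
1         (-15)(-10) = 150      (-10)² = 100
2         (6)(0) = 0            (0)² = 0
3         (9)(10) = 90          (10)² = 100
Sum       240                   200

Applying the formulas:
$$b_x = \frac{240}{200} = 1.2, \qquad b_{Const} = \bar{y} - b_x \bar{x} = 81 - 1.2 \times 15 = 81 - 18 = 63$$

Ordinary Least Squares (OLS) Best Fitting Line: $\text{Esty} = 63 + 1.2x$
[Figure: OLS estimate, Esty = 63 + 1.2x, passing through the point of means (15, 81)]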
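The hand calculation above can be reproduced step by step. The sketch below prints each student's deviations, product, and squared x deviation, then applies the two formulas. The variable names are illustrative.

```python
# Data from Professor Lord's first quiz.
x = [5, 15, 25]   # minutes studied
y = [66, 87, 90]  # quiz scores

n = len(x)
x_bar = sum(x) / n   # mean minutes studied
y_bar = sum(y) / n   # mean quiz score

sum_products = 0.0   # running sum of (y_t - ybar)(x_t - xbar)
sum_squares = 0.0    # running sum of (x_t - xbar)^2
print("Student  y-ybar  x-xbar  product  squared")
for t, (xt, yt) in enumerate(zip(x, y), start=1):
    dy, dx = yt - y_bar, xt - x_bar
    sum_products += dy * dx
    sum_squares += dx ** 2
    print(f"{t:>7}  {dy:>6.0f}  {dx:>6.0f}  {dy * dx:>7.0f}  {dx ** 2:>7.0f}")

b_x = sum_products / sum_squares
b_const = y_bar - b_x * x_bar
print(f"Sums: {sum_products:.0f} and {sum_squares:.0f}")
print(f"Esty = {b_const:.0f} + {b_x:.1f}x")  # Esty = 63 + 1.2x
```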

The sum of squared residuals for the best fitting line
Student   x_t   y_t   Esty_t = 63 + 1.2x_t   Res_t = y_t - Esty_t   Res_t²
1         5     66    63 + 6 = 69            -3                      9
2         15    87    63 + 18 = 81           6                       36
3         25    90    63 + 30 = 93           -3                      9
                                                                     SSR = 54

Simulation to Check Our Calculations for the OLS Best Fitting Line
EViews
Dependent Variable: Y
Included observations: 3
Variable   Coefficient   Std. Error   t-Statistic   Prob.
X          1.200000      0.519615     2.309401      0.2601
C          63.00000      8.874120     7.099296      0.0891
Sum squared resid   54.00000      Schwarz criterion   6.460657
Best Fitting Line: Esty = 63 + 1.2x

Summary
The Regression Model
Consider the following equation:
$$y_t = \beta_{Const} + \beta_x x_t + e_t$$
where $y_t$ = quiz score received by student t, $x_t$ = minutes studied by student t, and $e_t$ = error term for student t. $\beta_{Const}$ and $\beta_x$ are called the parameters of the model.

Before interpreting the parameters, recall that it is generally believed that Professor Lord gives students some credit just for showing up for the quiz, and that studying more will improve a student's score.

Interpreting $\beta_{Const}$ and $\beta_x$: $\beta_{Const}$ represents the number of points Professor Lord gives students just for showing up; $\beta_x$ represents the number of additional points earned for each additional minute of study.

Interpreting the Ordinary Least Squares Estimates: $\text{Esty} = 63 + 1.2x$
We estimate that Professor Lord gives students 63 points for showing up for the quiz. Studying one additional minute results in 1.2 additional points.
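As a cross-check on the regression output, the sketch below recomputes the residuals and the SSR of 54. It also recomputes the Std. Error and t-statistic columns. The standard-error formulas are not derived in this handout. They are assumed here to be the standard simple-regression ones: $s^2 = \text{SSR}/(T-2)$, $\text{se}(b_x) = \sqrt{s^2 / \sum (x_t - \bar{x})^2}$, and $\text{se}(b_{Const}) = \sqrt{s^2 (1/T + \bar{x}^2 / \sum (x_t - \bar{x})^2)}$. With these formulas the sketch reproduces 0.519615 and 8.874120.

```python
import math

x = [5, 15, 25]
y = [66, 87, 90]
b_const, b_x = 63.0, 1.2  # OLS estimates from the hand calculation

est = [b_const + b_x * xt for xt in x]     # Esty_t
res = [yt - e for yt, e in zip(y, est)]    # Res_t = y_t - Esty_t
ssr = sum(r ** 2 for r in res)             # sum of squared residuals

n = len(x)
x_bar = sum(x) / n
sxx = sum((xt - x_bar) ** 2 for xt in x)
s2 = ssr / (n - 2)   # assumed estimator of the error variance, T - 2 degrees of freedom
se_x = math.sqrt(s2 / sxx)
se_const = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))

print(res, ssr)
print(f"X: coef {b_x:.6f}  se {se_x:.6f}  t {b_x / se_x:.6f}")
print(f"C: coef {b_const:.5f}  se {se_const:.6f}  t {b_const / se_const:.6f}")
```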

Importance of the Error Term
Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
where $y_t$ = quiz score of student t, $x_t$ = minutes studied by student t, and $e_t$ = error term for student t.

For the moment, suppose that $\beta_{Const}$ equals 50 and $\beta_x$ equals 2. In words, this means:
Professor Lord gives each student 50 points for showing up.
Each additional minute of study provides 2 additional points.
The regression model is: $y_t = 50 + 2x_t + e_t$
The actual constant would be 50 and the actual coefficient would be 2.

The Error Term Represents Random Influences: $e_t$
The error term reflects all the factors that cannot be anticipated or determined before the quiz is given; that is, the error term represents all random influences.

WHAT IF Question: What if there were no random influences? That is, what if there were no error term?
In the absence of an error term, $y_t = 50 + 2x_t$; that is, there would be no random influences:
Actual: y = 50 + 2x

Absence of Random Influences
Student   Minutes (x_t)   Score (y_t = 50 + 2x_t)
1         5               50 + 2(5) = 60
2         15              50 + 2(15) = 80
3         25              50 + 2(25) = 100

Claim: In the absence of random influences, it would be trivial to compute the actual value of the constant and coefficient.

Coefficient Estimate Simulation: Absence of Random Influences
To address this question, we shall begin by using our simulation to illustrate the importance of the error term. NB: We can view each week's quiz as one repetition of an experiment.

Our simulation allows us to do something we cannot do in the real world: it lets us specify the constant and coefficient of our model, that is, select $\beta_{Const}$ and $\beta_x$ (the Act Const and Act Coef settings). We can therefore specify the points Professor Lord gives students just for showing up, 50, and the additional points earned for an additional minute of study, 2.

Note that initially the Err Term checkbox is checked, indicating that the error term, and hence random influences, are present. To eliminate the error term and random influences, clear the Err Term checkbox.

In each repetition, the simulation calculates the estimated coefficient value:
$$b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$

Coefficient Estimate: No Error Term
Repetition   Estimate of Coefficient Value
1            2.0
2            2.0
3            2.0
4            2.0

In the absence of random influences, the best fitting line fits the data perfectly: it coincides with the actual line, y = 50 + 2x. We can determine the actual value of the coefficient by calculating the slope of the line using any two points.
[Figure: the three scores lie exactly on the actual line, y = 50 + 2x]

But remember that the absence of random influences is unrealistic. In the real world, random influences are inevitably present. We shall now use a simulation to illustrate how the error term in the model captures the random influences.
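The no-error-term case can be mimicked in a few lines. This sketch assumes the illustrative values $\beta_{Const} = 50$ and $\beta_x = 2$ and sets every error term to zero. OLS then recovers both parameters exactly, and the slope between any two points equals the actual coefficient. The helper `ols` is hypothetical.

```python
# Assumed actual parameters, for illustration only.
BETA_CONST = 50
BETA_X = 2

x = [5, 15, 25]
# With no error term, every score lies exactly on the actual line.
y = [BETA_CONST + BETA_X * xt for xt in x]


def ols(x, y):
    """Closed-form OLS estimates (b_const, b_x)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    b_x = sxy / sxx
    return y_bar - b_x * x_bar, b_x


b_const, b_x = ols(x, y)
# Slope between every pair of points: all equal the actual coefficient.
pair_slopes = [(y[j] - y[i]) / (x[j] - x[i])
               for i in range(len(x)) for j in range(i + 1, len(x))]

print(y)             # [60, 80, 100]
print(b_const, b_x)  # 50.0 2.0
print(pair_slopes)   # [2.0, 2.0, 2.0]
```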

Random Influences Are Present in the Real World
But the real world is not this simple; random influences play an important role.

Presence of Random Influences
Student   Minutes   Score
1         5         66
2         15        87
3         25        90

The red points represent the actual scores from the first quiz; that is, the red points include the random influences. As a consequence of the random influences, Students 1 and 2 overperform while Student 3 underperforms. That is,
Student 1: $e_1$ is positive
Student 2: $e_2$ is positive
Student 3: $e_3$ is negative
[Figure: the actual line, y = 50 + 2x, and the three actual quiz scores (red points)]

Coefficient Estimate Simulation: Presence of Random Influences
Settings: Actual Constant $\beta_{Const}$ = 50; Actual Coefficient $\beta_x$ = 2; Err Term checkbox checked; variance of the error term's probability distribution, Var[e] = 500 (Act Err Var).

Coefficient Estimate: Estimate of Coefficient Value
Repetition   No Error Term   Act Err Var 500
1            2.0             ____
2            2.0             ____
3            2.0             ____
4            2.0             ____

As a consequence of the random influences, the line which best fits the data does not have an intercept of 50, the actual intercept; nor does it have a coefficient of 2, the actual coefficient. The simulation reports the coefficient estimate calculated in each repetition:
$$b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$
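A rough stand-in for the coefficient estimate simulation with the error term switched on is sketched below. The handout does not state the error distribution, so this sketch assumes normally distributed errors with mean 0 and variance 500, together with $\beta_{Const} = 50$ and $\beta_x = 2$. Each repetition is one weekly quiz with fresh error terms. The function names `ols_slope` and `simulate` are hypothetical.

```python
import math
import random
import statistics

# Assumed settings for illustration: actual constant, coefficient, error variance.
BETA_CONST, BETA_X, ERR_VAR = 50, 2, 500
x = [5, 15, 25]  # minutes studied, the same every week


def ols_slope(x, y):
    """OLS coefficient estimate b_x = Sxy / Sxx."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    return sxy / sxx


def simulate(reps, seed=0):
    """One repetition = one weekly quiz with fresh normal error terms."""
    rng = random.Random(seed)
    sd = math.sqrt(ERR_VAR)
    estimates = []
    for _ in range(reps):
        y = [BETA_CONST + BETA_X * xt + rng.gauss(0, sd) for xt in x]
        estimates.append(ols_slope(x, y))
    return estimates


estimates = simulate(10_000)
print("first four estimates:", [round(b, 2) for b in estimates[:4]])
print("mean of estimates:", round(statistics.mean(estimates), 3))
print("std. dev. of estimates:", round(statistics.stdev(estimates), 3))
```

The individual estimates scatter around 2 rather than equaling it. Under these assumptions their standard deviation should be close to $\sqrt{500/200} \approx 1.58$.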

Key Point: The constant and coefficient estimates are random variables.
Real world: Random influences are present.
We cannot expect the intercept and slope of the best fitting line to equal the actual constant and coefficient. In fact, even if we knew the actual values of the constant and coefficient, $\beta_{Const}$ and $\beta_x$, we could not predict the constant and coefficient of the best fitting line, $b_{Const}$ and $b_x$, with certainty before the quiz is given. The intercept and slope of the best fitting line, $b_{Const}$ and $b_x$, are random variables.
[Figure: actual line, y = 50 + 2x, versus OLS estimate, Esty = 63 + 1.2x]

The Error Term and Random Influences: A Closer Look
The Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
The error term, $e_t$, is a random variable.

Intuition: What happens after many, many quizzes? Since the error term represents the random influences, a student's error term should be:
positive about half the time, indicating that the student performs better than usual;
negative about half the time, indicating that the student performs worse than usual.
In the long run, however, the error terms should average out to 0.

Random Influence Error Term Simulation
Initially, the Pause checkbox is checked and the variance of the error term's probability distribution, Var[e], is 500. Click Start and record the error term for each of the three students in the first repetition:
Repetition   Student 1   Student 2   Student 3
1            ____        ____        ____
2            ____        ____        ____

Can you predict the numerical value of a student's error term beforehand? No.
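The error term experiment can be mimicked outside the simulation. This sketch assumes each student's error term is an independent normal draw with mean 0 and variance 500; the handout does not name the distribution. Individual draws are unpredictable. Over many repetitions, each student's average error is close to 0 and the error is positive about half the time.

```python
import math
import random
import statistics

ERR_VAR = 500      # assumed variance of the error term's probability distribution
REPS = 100_000     # number of simulated quizzes

rng = random.Random(42)
sd = math.sqrt(ERR_VAR)

# Draw an independent error term for each student in each repetition.
errors = {student: [rng.gauss(0, sd) for _ in range(REPS)] for student in (1, 2, 3)}

for student, draws in errors.items():
    share_positive = sum(e > 0 for e in draws) / REPS
    print(f"Student {student}: Mean[e] = {statistics.mean(draws):+.3f}, "
          f"positive {share_positive:.1%} of the time")
```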

Next, clear the Pause checkbox and click Continue. After 1,000,000 or so repetitions, click Stop.
Mean[e_1] ≈ 0      Mean[e_2] ≈ 0      Mean[e_3] ≈ 0
For each student t = 1, 2, 3:
e_t is positive about half the time and negative about half the time;
e_t has no systematic effect on Student t's score;
e_t represents a random influence.

Summary
The mean of the probability distribution for each student's error term equals 0.
The chances that a student's error term will be positive in any one quiz are about equal to the chances that it will be negative.
A student's error term has no systematic effect on his/her quiz score.
A student's error term represents a random influence.

Clint's Assignment: Where Do We Stand?
Summary: The OLS estimate for the value of the coefficient is 1.2. Clint estimates that an additional minute of studying results in 1.2 additional points, suggesting that the theory is correct. But, since random influences are present in the real world, we know that the coefficient estimate is a random variable. We are all but certain that the numerical value of the coefficient estimate, 1.2, does NOT equal the actual value of the coefficient. What should Clint do?

We shall proceed by dividing Clint's assignment into two related parts:
Coefficient Reliability: How reliable is the coefficient estimate calculated from the results of the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?
Theory Confidence: How much confidence should Clint have in the theory that additional studying increases quiz scores?