Wednesday, September 19 Handout: Ordinary Least Squares Estimation Procedure The Mechanics



Amherst College Department of Economics
Economics Fall 2012

Preview
Best Fitting Line: Income and Savings
Clint's Assignment
Simple Regression Model
o Parameters of the Model
o Error Term
o Best Fitting Line
Ordinary Least Squares (OLS) Estimation Procedure
o Sum of Squared Residuals Criterion
o Finding the Best Fitting Line
Importance of the Error Term
o Absence of Random Influences
o Presence of Random Influences: Constant and Coefficient of Best Fitting Line Are Random Variables
Error Terms and Random Influences: A Closer Look
Clint's Assignment: The Two Parts

Income and Savings

The following table reports on the (after tax) income of Americans and their savings between 19 and 1975 in billions of dollars:

Year    Income    Savings

Economic theory suggests that as Americans earn more income, we will save more.

Theory: Additional income increases savings.

Question: Do the data support the theory?

Question: How can we estimate the relationship between savings and income more precisely? That is, what equation describes the best fitting line?

We estimate that an additional $1 of income increases savings by $____; or equivalently, an additional $1,000 of income increases savings by $____.

Aside: Random Influences

Clint's Assignment: Effect of Studying on Quiz Scores

Background: Three students are enrolled in Professor Jeff Lord's 8:0 am class. Every week, he gives a quiz. Professor Lord asks his students to report the number of minutes they studied; the students always respond honestly.

Theory: Additional studying increases quiz scores.

Professor Lord's First Quiz:

Student    Minutes    Score

Question: Do the data support the theory?

The Regression Model

\[ y_t = \beta_{Const} + \beta_x x_t + e_t \]

where
\(y_t\) = Quiz score received by student t
\(x_t\) = Number of minutes studied by student t
\(e_t\) = Error term for student t

Interpretation of the parameters, \(\beta_{Const}\) and \(\beta_x\):
o \(\beta_{Const}\) represents the number of points Professor Lord gives students just for showing up;
o \(\beta_x\) represents the number of additional points earned for each additional minute of study.

Interpretation of the error term, \(e_t\): The error term, \(e_t\), is a random variable; it represents random influences, the factors that cannot be anticipated and/or determined before the quiz is given.

Two implicit assumptions:
o Professor Lord gives each student the same number of points for showing up.
o The number of additional points earned for an additional minute of study is the same for each student.

Clint's Assignment: Find \(\beta_{Const}\) and \(\beta_x\). But \(\beta_{Const}\) and \(\beta_x\) are unobservable. What can Clint do?

Econometrician's Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

Strategy: Use the intercept and slope of the best fitting line to estimate \(\beta_{Const}\) and \(\beta_x\).
\(b_{Const}\) = Intercept of the best fitting line; \(b_{Const}\) estimates the value of \(\beta_{Const}\)
\(b_x\) = Slope of the best fitting line; \(b_x\) estimates the value of \(\beta_x\)

Problem: How can we decide on the best fitting line?

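The Ordinary Least Squares (OLS) Estimation Procedure

Ordinary Least Squares (OLS) Criterion: Minimize the sum of squared residuals. The following two equations achieve this objective, where T is the number of observations:

\[ b_{Const} = \bar{y} - b_x \bar{x} \qquad\qquad b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \]

Step 1: Define the sum of squared residuals (SSR)

The Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)
\(y_t\) = Actual quiz score received by student t: Dependent variable
\(x_t\) = Actual number of minutes studied by student t: Explanatory variable
\(e_t\) = Actual error for student t
\(\beta_{Const}\) = Actual constant: Points awarded for showing up
\(\beta_x\) = Actual coefficient: Additional points received for each additional minute studied

The Estimate: \( \text{Esty}_t = b_{Const} + b_x x_t \)
\(\text{Esty}_t\) = Estimated quiz score for student t
\(b_{Const}\) = Estimated constant; that is, \(b_{Const}\) estimates the value of \(\beta_{Const}\)
\(b_x\) = Estimated coefficient; that is, \(b_x\) estimates the value of \(\beta_x\)

The Residual: \( \text{Res}_t = y_t - \text{Esty}_t \)
\(\text{Res}_t\) = Residual for student t
\(\text{Res}_t\) = Actual quiz score for student t - Estimated quiz score for student t

Strategy: Determine the best fitting line by minimizing the sum of squared residuals.

\[
\begin{aligned}
\text{Esty}_1 &= b_{Const} + b_x x_1 & \text{Esty}_2 &= b_{Const} + b_x x_2 & \text{Esty}_3 &= b_{Const} + b_x x_3 \\
\text{Res}_1 &= y_1 - \text{Esty}_1 & \text{Res}_2 &= y_2 - \text{Esty}_2 & \text{Res}_3 &= y_3 - \text{Esty}_3 \\
\text{Res}_1 &= y_1 - b_{Const} - b_x x_1 & \text{Res}_2 &= y_2 - b_{Const} - b_x x_2 & \text{Res}_3 &= y_3 - b_{Const} - b_x x_3
\end{aligned}
\]

\[
\begin{aligned}
\text{SSR} &= \text{Res}_1^2 + \text{Res}_2^2 + \text{Res}_3^2 \\
&= (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2
\end{aligned}
\]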
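To make the sum of squared residuals concrete, here is a minimal Python sketch that computes SSR for any candidate line. The function name `ssr` and the three (minutes, score) pairs are assumptions introduced for illustration rather than values taken from the handout.

```python
def ssr(b_const, b_x, xs, ys):
    """Sum of squared residuals for the candidate line Esty = b_const + b_x * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        est_y = b_const + b_x * x   # Esty_t: estimated quiz score
        res = y - est_y             # Res_t = y_t - Esty_t
        total += res ** 2
    return total


# Illustrative data (an assumption, not the handout's table).
minutes = [5, 15, 25]
scores = [66, 87, 90]

# Two candidate lines: the one with the smaller SSR fits better.
ssr_line_a = ssr(60.0, 1.0, minutes, scores)
ssr_line_b = ssr(63.0, 1.2, minutes, scores)
print("SSR for y = 60 + 1.0x:", ssr_line_a)
print("SSR for y = 63 + 1.2x:", ssr_line_b)
```

Comparing candidate lines this way is exactly what the OLS criterion does, except that OLS finds the minimizing line analytically instead of by trial and error.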

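Step 2: Differentiate the sum of squared residuals (SSR) with respect to \(b_{Const}\)

\[
\begin{aligned}
\frac{d\,\text{SSR}}{d\,b_{Const}} = -2(y_1 - b_{Const} - b_x x_1) - 2(y_2 - b_{Const} - b_x x_2) - 2(y_3 - b_{Const} - b_x x_3) &= 0 \\
(y_1 - b_{Const} - b_x x_1) + (y_2 - b_{Const} - b_x x_2) + (y_3 - b_{Const} - b_x x_3) &= 0 \\
(y_1 + y_2 + y_3) + (-b_{Const} - b_{Const} - b_{Const}) + (-b_x x_1 - b_x x_2 - b_x x_3) &= 0 \\
(y_1 + y_2 + y_3) - 3 b_{Const} - b_x (x_1 + x_2 + x_3) &= 0 \\
\frac{y_1 + y_2 + y_3}{3} - b_{Const} - b_x \frac{x_1 + x_2 + x_3}{3} &= 0 \\
\bar{y} - b_{Const} - b_x \bar{x} &= 0 \\
\bar{y} &= b_{Const} + b_x \bar{x}
\end{aligned}
\]

Note that \( b_{Const} = \bar{y} - b_x \bar{x} \).

[Figure: OLS Estimate: \( y = b_{Const} + b_x x \), passing through the point \( (\bar{x}, \bar{y}) \)]

Step 3: Differentiate the sum of squared residuals (SSR) with respect to \(b_x\)

\[
\begin{aligned}
\text{SSR} &= (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \\
&= [y_1 - (\bar{y} - b_x \bar{x}) - b_x x_1]^2 + [y_2 - (\bar{y} - b_x \bar{x}) - b_x x_2]^2 + [y_3 - (\bar{y} - b_x \bar{x}) - b_x x_3]^2 \\
&= [y_1 - \bar{y} + b_x \bar{x} - b_x x_1]^2 + [y_2 - \bar{y} + b_x \bar{x} - b_x x_2]^2 + [y_3 - \bar{y} + b_x \bar{x} - b_x x_3]^2 \\
&= [(y_1 - \bar{y}) - b_x (x_1 - \bar{x})]^2 + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})]^2 + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})]^2
\end{aligned}
\]

\[
\begin{aligned}
\frac{d\,\text{SSR}}{d\,b_x} = -2[(y_1 - \bar{y}) - b_x (x_1 - \bar{x})](x_1 - \bar{x}) - 2[(y_2 - \bar{y}) - b_x (x_2 - \bar{x})](x_2 - \bar{x}) - 2[(y_3 - \bar{y}) - b_x (x_3 - \bar{x})](x_3 - \bar{x}) &= 0 \\
[(y_1 - \bar{y}) - b_x (x_1 - \bar{x})](x_1 - \bar{x}) + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})](x_2 - \bar{x}) + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})](x_3 - \bar{x}) &= 0 \\
(y_1 - \bar{y})(x_1 - \bar{x}) - b_x (x_1 - \bar{x})^2 + (y_2 - \bar{y})(x_2 - \bar{x}) - b_x (x_2 - \bar{x})^2 + (y_3 - \bar{y})(x_3 - \bar{x}) - b_x (x_3 - \bar{x})^2 &= 0
\end{aligned}
\]

\[
\begin{aligned}
(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x}) &= b_x (x_1 - \bar{x})^2 + b_x (x_2 - \bar{x})^2 + b_x (x_3 - \bar{x})^2 \\
&= b_x [(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2]
\end{aligned}
\]

\[
b_x = \frac{(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x})}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2} = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}
\]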
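The two first-order conditions can be checked numerically. The sketch below is one possible way to do that: it computes the estimates from the closed-form equations and then evaluates both partial derivatives of SSR at those estimates. The function names and the data are assumptions made for illustration. The derivative with respect to b_x is taken directly here, without first substituting for b_Const as the handout does; both routes give the same minimizing line.

```python
def ols_estimates(xs, ys):
    """Closed-form OLS estimates (b_const, b_x) from the two equations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    b_x = numerator / denominator
    b_const = y_bar - b_x * x_bar
    return b_const, b_x


def ssr_gradient(b_const, b_x, xs, ys):
    """Partial derivatives of SSR with respect to b_const and b_x."""
    d_const = sum(-2 * (y - b_const - b_x * x) for x, y in zip(xs, ys))
    d_x = sum(-2 * (y - b_const - b_x * x) * x for x, y in zip(xs, ys))
    return d_const, d_x


# Illustrative data (an assumption, not the handout's table).
minutes = [5, 15, 25]
scores = [66, 87, 90]

b_const, b_x = ols_estimates(minutes, scores)
d_const, d_x = ssr_gradient(b_const, b_x, minutes, scores)
print("b_const, b_x:", b_const, b_x)
print("dSSR/db_const, dSSR/db_x at the estimates:", d_const, d_x)
```

Both derivatives should come out as zero, up to floating-point rounding, which is what setting them equal to 0 in Steps 2 and 3 requires.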

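Ordinary Least Squares Estimates: Calculations

The Data (x = Minutes Studied, y = Quiz score):

Student    x    y

The equations:

\[ b_{Const} = \bar{y} - b_x \bar{x} \]

\[
b_x = \frac{(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x})}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2} = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}
\]

The means:

\[ \bar{y} = \frac{y_1 + y_2 + y_3}{3} = \_\_\_\_ \qquad\qquad \bar{x} = \frac{x_1 + x_2 + x_3}{3} = \_\_\_\_ \]

Deviations from the means:

Student    y_t    ȳ    y_t - ȳ    x_t    x̄    x_t - x̄

Product of the x and y deviations and squared x deviations:

Student    (y_t - ȳ)(x_t - x̄)    (x_t - x̄)^2
1          (    )(    )            (    )^2
2          (    )(    )            (    )^2
3          (    )(    )            (    )^2
Sum        ____                    ____

Applying the formulas:

\[ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \_\_\_\_ \qquad\qquad b_{Const} = \bar{y} - b_x \bar{x} = \_\_\_\_ \]

Ordinary Least Squares (OLS) Best Fitting Line: y = ____ + ____ x

[Figure: OLS Estimate: y = ____ + ____ x, passing through the point \( (\bar{x}, \bar{y}) \)]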
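The worksheet above can be filled in mechanically. The following Python sketch walks through the same columns: means, deviations, products, squared deviations, and finally the two estimates. The data values are illustrative assumptions, chosen so that the slope works out to the 1.2 that the handout's closing summary reports; they should not be read as the handout's own table.

```python
# Illustrative data (an assumption): minutes studied and quiz scores.
minutes = [5, 15, 25]
scores = [66, 87, 90]

T = len(minutes)
x_bar = sum(minutes) / T
y_bar = sum(scores) / T

# One row per student: deviations, their product, and the squared x deviation.
rows = []
for student, (x, y) in enumerate(zip(minutes, scores), start=1):
    dev_y = y - y_bar
    dev_x = x - x_bar
    rows.append((student, dev_y, dev_x, dev_y * dev_x, dev_x ** 2))

sum_products = sum(row[3] for row in rows)
sum_squares = sum(row[4] for row in rows)

b_x = sum_products / sum_squares
b_const = y_bar - b_x * x_bar

print("Student  y-ybar  x-xbar  product  (x-xbar)^2")
for student, dev_y, dev_x, product, square in rows:
    print(f"{student:7d}  {dev_y:6.1f}  {dev_x:6.1f}  {product:7.1f}  {square:10.1f}")
print("Sums:", sum_products, sum_squares)
print("b_x =", b_x, " b_const =", b_const)
```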

The sum of squared residuals for the best fitting line

Student    x_t    y_t    Esty_t = ____ + ____ x_t    Res_t = y_t - Esty_t    Res_t^2

SSR = ____

Simulation to Check Our Calculations for the OLS Best Fitting Line

EViews
Dependent Variable: Y
Included observations:
Variable    Coefficient    Std. Error    t-Statistic    Prob.
X
C
Sum squared resid
Schwarz criterion

Best Fitting Line: y = ____ + ____ x

Summary

The Regression Model

Consider the following equation:

\[ y_t = \beta_{Const} + \beta_x x_t + e_t \]

where
\(y_t\) = Quiz score received by student t
\(x_t\) = Minutes studied by student t
\(e_t\) = Error term for student t

\(\beta_{Const}\) and \(\beta_x\) are called the parameters of the model.

Before interpreting the parameters, recall that it is generally believed that
o Professor Lord gives students some credit just for showing up for the quiz;
o studying more will improve a student's score.

Interpreting \(\beta_{Const}\) and \(\beta_x\):
\(\beta_{Const}\) represents ____;
\(\beta_x\) represents ____.

Interpreting the Ordinary Least Squares Estimates: Esty = ____ + ____ x
We estimate that Professor Lord gives students ____ points for showing up for the quiz.
Studying one additional minute results in ____ additional points.

Importance of the Error Term

Regression Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)

where
\(y_t\) = Quiz score of student t
\(x_t\) = Minutes studied by student t
\(e_t\) = Error term for student t

For the moment, suppose that \(\beta_{Const}\) equals ____ and \(\beta_x\) equals 2. In words, this means:
o Professor Lord gives each student ____ points for showing up.
o Each additional minute of study provides 2 additional points.

The regression model is: \( y_t = \_\_\_\_ + 2x_t + e_t \)

The actual constant would be ____ and the actual coefficient would be 2.

Error Term Represents Random Influences: \(e_t\)

The error term reflects all the factors that cannot be anticipated or determined before the quiz is given; that is, the error term represents all random influences.

WHAT IF Question: What if there were no random influences? That is, what if there were no error term?

In the absence of an error term, \( y_t = \_\_\_\_ + 2x_t \); that is, in the absence of an error term there would be no random influences.

[Figure: Actual: y = ____ + 2x]

Absence of Random Influences

Student    Minutes (x_t)    Score (y_t = ____ + 2x_t)

Claim: In the absence of random influences, it would be trivial to compute the actual value of the constant and coefficient.

Coefficient Estimate Simulation: Absence of Random Influences

Absence of Error Term

To address this question, we shall begin by using our simulation to illustrate the importance of the error term.

NB: We can view each week's quiz as one repetition of an experiment.

Our simulation allows us to do something we cannot do in the real world. It allows us to specify the constant and coefficient of our model; that is, we can select \(\beta_{Const}\) and \(\beta_x\). That is, we can specify
o the points Professor Lord gives students just for showing up, ____;
o the additional points earned for an additional minute of study, 2.

[Simulation panel: Act Const (Actual Constant, \(\beta_{Const}\)), Act Coef (Actual Coefficient, \(\beta_x\)), Err Term checkbox, Repetition, Coef Est]

Note that initially the Err Term checkbox is checked, indicating that the error term and hence random influences are present. To eliminate the error term and random influences, the Err Term checkbox is cleared.

Estimated coefficient value calculated from this repetition:

\[ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \]

Coefficient Estimate: Estimate of Coefficient Value

Repetition    No Error Term

[Figure: Actual: y = ____ + 2x]

In the absence of random influences, the best fitting line fits the data perfectly. The best fitting line coincides with the actual line. We can determine the actual value of the coefficient by calculating the slope of the line using any two points.

But remember that the absence of random influences is unrealistic. In the real world, random influences are inevitably present. We shall now use a simulation to illustrate how the error term in the model captures the random influences.
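The no-error-term case is easy to reproduce outside the simulation. In the Python sketch below, scores are generated exactly from an actual line with coefficient 2, as in the handout; the actual constant of 50 and the minutes studied are hypothetical values assumed purely for illustration. The sketch then shows that the OLS slope and the slope between any two points coincide with the actual coefficient.

```python
from itertools import combinations


def ols_slope(xs, ys):
    """OLS coefficient estimate b_x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))


beta_const = 50.0   # hypothetical actual constant (an assumption)
beta_x = 2.0        # actual coefficient used in the handout's example
minutes = [5, 15, 25]  # hypothetical minutes studied (an assumption)

# No error term: every score lies exactly on the actual line.
scores = [beta_const + beta_x * x for x in minutes]

slope_ols = ols_slope(minutes, scores)

# Slope through every pair of points.
points = list(zip(minutes, scores))
two_point_slopes = [(y2 - y1) / (x2 - x1)
                    for (x1, y1), (x2, y2) in combinations(points, 2)]

print("OLS slope:", slope_ols)
print("Two-point slopes:", two_point_slopes)
```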

Random Influences Are Present in the Real World

But the real world is not this simple; random influences play an important role in the real world.

Presence of Random Influences

[Figure: Actual: y = ____ + 2x, with the scores of Students 1, 2, and 3 plotted as red points]

The red points represent the actual scores from the first quiz; that is, the red points include the random influences. As a consequence of the random influences, Students 1 and 2 over perform while Student 3 under performs. That is,
Student 1: \(e_1\) is ____
Student 2: \(e_2\) is ____
Student 3: \(e_3\) is ____

Coefficient Estimate Simulation: Presence of Random Influences

Presence of Error Term

Coefficient Estimate: Estimate of Coefficient Value

Repetition    No Error Term

As a consequence of the random influences, the line which best fits the data does not have an intercept of ____, the actual intercept; also, the best fitting line does not have a coefficient of 2, the actual coefficient. The simulation is reporting on the coefficient estimates.

[Simulation panel: Act Const (Actual Constant, \(\beta_{Const}\)), Act Coef (Actual Coefficient, \(\beta_x\)), Act Err Var (Variance of Error Term Probability Distribution, Var[e]), Err Term checkbox, Repetition, Coef Est]

Estimated coefficient value calculated from this repetition:

\[ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \]
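A rough imitation of this simulation can be written in a few lines of Python. The sketch below treats each quiz as one repetition, adds a random error term to every student's score, and recomputes the coefficient estimate each time. Several things here are assumptions rather than facts from the handout: the actual constant (50), the error variance (500), the minutes studied, the use of normally distributed errors, and the function names.

```python
import random


def ols_slope(xs, ys):
    """OLS coefficient estimate b_x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))


def simulate_slopes(beta_const, beta_x, xs, err_var, repetitions, seed=0):
    """Coefficient estimates from repeated quizzes with random error terms."""
    rng = random.Random(seed)
    err_sd = err_var ** 0.5
    slopes = []
    for _ in range(repetitions):
        # Each score is the actual line plus a freshly drawn error term.
        ys = [beta_const + beta_x * x + rng.gauss(0.0, err_sd) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return slopes


# Hypothetical settings (assumptions): constant 50, coefficient 2, Var[e] = 500.
slopes = simulate_slopes(50.0, 2.0, [5, 15, 25], err_var=500.0, repetitions=10_000)

print("First five coefficient estimates:", [round(s, 2) for s in slopes[:5]])
print("Smallest and largest estimate:", round(min(slopes), 2), round(max(slopes), 2))
print("Average of all estimates:", round(sum(slopes) / len(slopes), 3))
```

The estimate changes from one repetition to the next even though the actual coefficient stays fixed at 2, which is the behavior the simulation is meant to display.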

Key Point: The constant and coefficient estimates are random variables.

Real world: Random influences are ____.

We ____ expect the intercept and slope of the best fitting line to equal the actual constant and coefficient.

In fact, even if we know the actual values of the constant and coefficient, \(\beta_{Const}\) and \(\beta_x\), we ____ predict the constant and coefficient of the best fitting line, \(b_{Const}\) and \(b_x\), with certainty before the quiz is given.

The intercept and slope of the best fitting line, \(b_{Const}\) and \(b_x\), are ____.

The Error Term and Random Influences: A Closer Look

[Figure: Actual: y = ____ + 2x; OLS Estimate: y = ____ + ____ x]

The Model: \( y_t = \beta_{Const} + \beta_x x_t + e_t \)

The error term, \(e_t\), is a random variable.

Intuition: What happens after many, many quizzes? Since the error term represents the random influences, a student's error term should be:
o positive about half the time, indicating that the student performs ____ than usual;
o negative about half the time, indicating that the student performs ____ than usual.
In the long run, however, the error terms should average out to ____.

Random Influence Error Term Simulation

[Simulation panel: Err Var (Actual Variance of Error Term's Probability Distribution, Var[e]), Repetition, Pause checkbox, Start]

Initially, the Pause checkbox is checked and the variance of the error term's probability distribution is ____. Click Start and record the error term for each of the three students in the first repetitions:

Repetition    Student 1    Student 2    Student 3
1
2

Can you predict the numerical value of a student's error term beforehand? ____

Next, clear the Pause checkbox and click Continue. After 1,000,000 or so repetitions, click Stop.

Student 1: Mean[\(e_1\)] = ____. \(e_1\) is positive about ____ the time and negative about ____ the time. \(e_1\) has ____ systematic effect on Student 1's score. \(e_1\) represents a ____ influence.
Student 2: Mean[\(e_2\)] = ____. \(e_2\) is positive about ____ the time and negative about ____ the time. \(e_2\) has ____ systematic effect on Student 2's score. \(e_2\) represents a ____ influence.
Student 3: Mean[\(e_3\)] = ____. \(e_3\) is positive about ____ the time and negative about ____ the time. \(e_3\) has ____ systematic effect on Student 3's score. \(e_3\) represents a ____ influence.

Summary
o The mean of the probability distribution for each student's error term equals 0.
o The chances that a student's error term will be positive in any one quiz are about equal to the chances that it will be negative.
o A student's error term has no systematic effect on his or her quiz score.
o A student's error term represents a random influence.

Clint's Assignment: Where Do We Stand?

Summary
o The OLS estimate for the value of the coefficient is 1.2; Clint estimates that an additional minute of studying results in 1.2 additional points, suggesting that the theory is correct.
o But, since random influences are present in the real world, we know that the coefficient estimate is a random variable. We are all but certain that the numerical value of the coefficient estimate, 1.2, does NOT equal the actual value of the coefficient.

What should Clint do? We shall proceed by dividing Clint's assignment into two related parts:
o Coefficient Reliability: How reliable is the coefficient estimate calculated from the results of the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?
o Theory Confidence: How much confidence should Clint have in the theory that additional studying increases quiz scores?
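The long-run behavior of a single student's error term can be sketched in Python as well. The code below draws many error terms and reports their mean and the share that are positive. It assumes normally distributed errors with mean 0 and a variance of 500; both the distribution and the variance are assumptions made for illustration, and drawing from a mean-zero distribution builds the result in, so this only shows what the long-run averages look like and is not a proof.

```python
import random


def simulate_error_terms(err_var, repetitions, seed=0):
    """Return (mean, share positive) of simulated error terms for one student."""
    rng = random.Random(seed)
    err_sd = err_var ** 0.5
    total = 0.0
    positives = 0
    for _ in range(repetitions):
        e = rng.gauss(0.0, err_sd)   # one quiz's random influence
        total += e
        if e > 0:
            positives += 1
    return total / repetitions, positives / repetitions


# Hypothetical error variance (an assumption): Var[e] = 500.
mean_e, share_positive = simulate_error_terms(500.0, 100_000)

print("Mean of the error terms:", round(mean_e, 3))
print("Share of positive error terms:", round(share_positive, 3))
```

No single draw can be predicted beforehand, yet the mean settles near 0 and roughly half the draws are positive, matching the summary's description of a random influence with no systematic effect.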
