Chapter 5: Ordinary Least Squares Estimation Procedure: The Mechanics


Chapter 5 Outline

Best Fitting Line
Clint's Assignment
Simple Regression Model
  o Parameters of the Model
  o Error Term and Random Influences
  o Best Fitting Line
  o Needed: A Systematic Procedure to Determine the Best Fitting Line
Ordinary Least Squares (OLS) Estimation Procedure
  o Sum of Squared Residuals Criterion
  o Finding the Best Fitting Line
Importance of the Error Term
  o Absence of Random Influences: A "What If" Question
  o Presence of Random Influences: Back to Reality
Error Terms and Random Influences: A Closer Look
Standard Ordinary Least Squares (OLS) Premises
Clint's Assignment: The Two Parts

Chapter 5 Prep Questions

1. The following table reports the (disposable) income earned by Americans and their total savings between 1950 and 1975 in billions of dollars:

   [Table: Year, Income (Billion $), Savings (Billion $), 1950 to 1975]

   a. Construct a scatter diagram for income and savings. Place income on the horizontal axis and savings on the vertical axis.
   b. Economic theory teaches that savings increases with income. Do these data tend to support this theory?

   c. Using a ruler, draw a straight line through these points to estimate the relationship between savings and income. What equation describes this line?
   d. Using the equation, estimate by how much savings will increase if income increases by $1 billion.

2. Three students are enrolled in Professor Jeff Lord's 8:30 am class. Every week, he gives a short quiz. After returning the quiz, Professor Lord asks his students to report the number of minutes they studied; the students always respond honestly. The minutes studied and the quiz scores for the first quiz appear in the table below:

   Student   Minutes Studied (x)   Quiz Score (y)
   1         5                     66
   2         15                    87
   3         25                    90

   a. Construct a scatter diagram for minutes studied and quiz scores. Place minutes on the horizontal axis and score on the vertical axis.
   b. Ever since first grade, what have your parents and teachers been telling you about the relationship between studying and grades? For the most part, do these data tend to support this theory?
   c. Using a ruler, draw a straight line through these points to estimate the relationship between minutes studied and quiz scores. What equation describes this line?
   d. Using the equation, estimate by how much a student's quiz score would increase if that student studies one additional minute.

3. Recall that the presence of a random variable brings forth both bad news and good news.
   a. What is the bad news?
   b. What is the good news?

4. What is the relative frequency interpretation of probability?

5. Calculus problem: Consider the following equation:

   \[ SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \]

   Differentiate SSR with respect to \(b_{Const}\) and set the derivative equal to 0:

   \[ \frac{dSSR}{db_{Const}} = 0 \]

   Solve for \(b_{Const}\), and show that

   \[ b_{Const} = \bar{y} - b_x \bar{x} \quad \text{where} \quad \bar{y} = \frac{y_1 + y_2 + y_3}{3}, \quad \bar{x} = \frac{x_1 + x_2 + x_3}{3} \]

6. Again, consider the following equation:

   \[ SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \]

   Let \(b_{Const} = \bar{y} - b_x \bar{x}\). Substitute this expression for \(b_{Const}\) into the equation for SSR. Show that after the substitution:

   \[ SSR = [(y_1 - \bar{y}) - b_x (x_1 - \bar{x})]^2 + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})]^2 + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})]^2 \]

Best Fitting Line

Recall the income and savings data we introduced in the chapter prep questions:

Income and Savings Data: Annual time series data of U.S. disposable income and savings from 1950 to 1975.

[Table 5.1: U.S. Annual Income and Savings Data 1950 to 1975; columns: Year, Income (Billion $), Savings (Billion $)]

Economic theory suggests that as American households earn more income, they will save more:

Theory: Additional income increases savings.

Project: Assess the effect of income on savings.

Question: How can we use our data to test this theory? That is, how can we assess the effect of income on savings?

Answer: We begin by drawing a scatter diagram of the income-savings data. Each point represents income and savings of a single year. The lower left point represents income and savings for 1950: (210.1, 17.9). The upper right point represents income and savings for 1975: (1187.4, 153.0). Each other point represents one of the other years.

[Figure 5.1: Income and Savings Scatter Diagram]

The data appear to support the theory: as income increases, savings generally increase.

Question: How can we estimate the relationship between income and savings?

Answer: Draw a line through the points that best fits the data; then, use the equation for the best fitting line to estimate the relationship.

[Figure 5.2: Income and Savings Scatter Diagram with Best Fitting Line]

By choosing two points on this line, we can solve for the equation of the best fitting line. It looks like the points (200, 15) and (1200, 155) are more or less on the line. Let us use these two points to estimate the slope:

\[ \text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{155 - 15}{1{,}200 - 200} = \frac{140}{1{,}000} = .14 \]

A little algebra allows us to derive the equation for this line:

\[ \begin{aligned} y - 15 &= .14(x - 200) \\ y - 15 &= .14x - 28 \\ y &= .14x - 13 \end{aligned} \]

This equation suggests that if Americans earn an additional $1 of income, savings will rise by an estimated $.14; or equivalently, we estimate that a $1,000 increase in income causes a $140 increase in savings. Since the slope is positive, the data appear to support our theory; additional income appears to increase savings.

Clint's Assignment

Next, consider a second example. Three students are enrolled in Professor Jeff Lord's 8:30 am class. Every week, he gives a short quiz. After returning the quiz, Professor Lord asks his students to report the number of minutes they studied; the students always respond honestly. The minutes studied and the quiz scores for the first quiz appear in Table 5.2:

Student   Minutes Studied (x)   Quiz Score (y)
1         5                     66
2         15                    87
3         25                    90

Table 5.2: First Quiz Results

The theory suggests that a student's score on the quiz depends on the number of minutes he/she studied:

Theory: Additional studying increases quiz scores.

Also, it is generally believed that Professor Lord, a very generous soul, awards students some points just for showing up for a quiz so early in the morning. Our friend Clint has been assigned the problem of assessing the theory. Clint's assignment is to use the data from Professor Lord's first quiz to assess the theory:

Project: Use data from Professor Lord's first quiz to assess the effect of studying on quiz scores.

Simple Regression Model

The following equation allows us to use the simple regression model to assess the theory:

\[ y_t = \beta_{Const} + \beta_x x_t + e_t \]

where
  \(y_t\) = Quiz score received by student t
  \(x_t\) = Number of minutes studied by student t
  \(e_t\) = Error term for student t: Random influences
  t = 1, 2, and 3, denoting the three students: Student 1, Student 2, and Student 3

\(y_t\), the quiz score, is called the dependent variable and \(x_t\), the minutes studied, the explanatory variable. The value of the dependent variable depends on the value of the explanatory variable. Or putting it differently, the value of the explanatory variable explains the value of the dependent variable.

Parameters of the Model

\(\beta_{Const}\) and \(\beta_x\), the constant and coefficient of the equation, are called the parameters of the model. To interpret the parameters, recall that:
  o it is generally believed that Professor Lord gives students some points just for showing up for the quiz;
  o the theory postulates that studying more will improve a student's score.

Using these observations, we can interpret the parameters, \(\beta_{Const}\) and \(\beta_x\):
  o \(\beta_{Const}\) represents the number of points Professor Lord gives students just for showing up.
  o \(\beta_x\) represents the number of additional points earned for an additional minute of studying.

Error Term and Random Influences

\(e_t\) is the error term. The error term reflects all the random influences on student t's quiz score, \(y_t\). For example, if Professor Lord were in an unusually bad humor when he graded one student's quiz, that student's quiz score might be unusually low; this would be reflected by a negative error term. On the other hand, if Professor Lord were in an unusually good humor, the student's score might be unusually high and a positive error term would result. Professor Lord's disposition is not the only source of randomness. For example, one particular student could have just lucked out by correctly anticipating the questions Professor Lord asked. In this case, the student's score would be unusually high and his/her error term would be positive. All these random influences are accounted for by the error term. The error term accounts for all the factors that cannot be determined or anticipated beforehand.

What Is Simple about the Simple Regression Model?

The word "simple" is used to describe the model because the model includes only a single explanatory variable. Obviously, many other factors influence a student's quiz score; the number of minutes studied is only one such factor. However, we must start somewhere. We will begin with the simple regression model. Later we shall move on and introduce multiple regression models to analyze more realistic scenarios in which two or more explanatory variables are used to explain the dependent variable.

Best Fitting Line

Question: How can Clint use the data to assess the effect of studying on quiz scores?

Answer: He begins by drawing a scatter diagram using the data appearing in Table 5.2:

[Figure 5.3: Minutes and Scores Scatter Diagram; horizontal axis: Minutes (x), vertical axis: Score (y)]

The data appear to confirm the theory. As minutes studied increase, quiz scores tend to increase.

Question: How can Clint estimate the relationship between minutes studied and the quiz score more precisely?

Answer: Draw a line through the points that best fits the data; then, use the best fitting line's equation to estimate the relationship.

[Figure 5.4: Minutes and Scores Scatter Diagram with Clint's Eyeballed Best Fitting Line; horizontal axis: Minutes (x), vertical axis: Score (y)]

Clint's effort to eyeball the best fitting line appears in Figure 5.4. By choosing two points on this line, Clint can solve for the equation of his best fitting line. It looks like the points (0, 60) and (20, 90) are more or less on the line. He can use these two points to estimate the slope:

\[ \text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{90 - 60}{20 - 0} = \frac{30}{20} = 1.5 \]

Next, Clint can use a little algebra to derive the equation for the line:

\[ \begin{aligned} y - 60 &= 1.5(x - 0) \\ y - 60 &= 1.5x \\ y &= 60 + 1.5x \end{aligned} \]

This equation suggests that an additional minute of studying increases a student's score by 1.5 points.

Needed: A Systematic Procedure to Determine the Best Fitting Line

Let us compare the two examples we introduced. In the income-savings case, the points were clustered tightly around our best fitting line. Two different individuals might not eyeball the identical best fitting line, but the difference would be slight. In the minutes-scores case, however, the points are not clustered nearly so tightly. Two individuals could eyeball the best fitting line very differently; therefore, two individuals could derive substantially different equations for the best fitting line and would then report very different estimates of the effect that studying has on quiz scores. Consequently, we need a systematic procedure to determine the best fitting line. Furthermore, once we determine the best fitting line, we need to decide how confident we should be in the theory. We will now address two issues:
  o What systematic procedure should we use to determine the best fitting line for the data?
  o In view of the best fitting line, how much confidence should we have in the theory's validity?

Ordinary Least Squares (OLS) Estimation Procedure

The ordinary least squares (OLS) estimation procedure is the most widely used estimation procedure to determine the equation for the line that best fits the data. Its popularity results from two factors. The procedure:
  o is computationally straightforward; it provides us (and computer software) with a relatively easy way to estimate the regression model's parameters, the constant and slope of the best fitting line.
  o possesses several desirable properties when the error term meets certain conditions.

This chapter focuses on the computational aspects of the ordinary least squares (OLS) estimation procedure. In Chapter 6 we turn to the properties of the estimation procedure.

We begin our study of the ordinary least squares (OLS) estimation procedure by introducing a little notation. We must distinguish between the actual values of the parameters and the estimates of the parameters. We have used the Greek letter beta, \(\beta\), to denote the actual values. Recall the original model:

\[ y_t = \beta_{Const} + \beta_x x_t + e_t \]

\(\beta_{Const}\) denotes the actual constant and \(\beta_x\) the actual coefficient.

We shall use Roman italicized b's to denote the estimates. \(b_{Const}\) denotes the estimate of the constant for the best fitting line and \(b_x\) denotes the estimate of the coefficient for the best fitting line. That is, the equation for the best fitting line is:

\[ y = b_{Const} + b_x x \]

The constant and slope of the best fitting line, \(b_{Const}\) and \(b_x\), estimate the values of \(\beta_{Const}\) and \(\beta_x\).

Sum of Squared Residuals Criterion

The ordinary least squares (OLS) estimation procedure chooses \(b_{Const}\) and \(b_x\) so as to minimize the sum of the squared residuals. We shall now use our example to illustrate precisely what this means. We begin by introducing an equation for each student's estimated score: \(Esty_1\), \(Esty_2\), and \(Esty_3\).

\[ Esty_1 = b_{Const} + b_x x_1 \qquad Esty_2 = b_{Const} + b_x x_2 \qquad Esty_3 = b_{Const} + b_x x_3 \]

\(Esty_1\), \(Esty_2\), and \(Esty_3\) estimate the score received by students 1, 2, and 3 based on the estimated constant, \(b_{Const}\), the estimated coefficient, \(b_x\), and the number of minutes each student studies, \(x_1\), \(x_2\), and \(x_3\).

The difference between a student's actual score, \(y_t\), and his/her estimated score, \(Esty_t\), is called the residual, \(Res_t\):

\[ Res_1 = y_1 - Esty_1 \qquad Res_2 = y_2 - Esty_2 \qquad Res_3 = y_3 - Esty_3 \]

Substituting for each student's estimated score:

\[ Res_1 = y_1 - b_{Const} - b_x x_1 \qquad Res_2 = y_2 - b_{Const} - b_x x_2 \qquad Res_3 = y_3 - b_{Const} - b_x x_3 \]

Next, we square each residual and add them together to compute the sum of squared residuals, SSR:

\[ \begin{aligned} SSR &= Res_1^2 + Res_2^2 + Res_3^2 \\ &= (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \end{aligned} \]

We can generalize the sum of squared residuals by considering a sample size of T:

\[ SSR = \sum_{t=1}^{T} Res_t^2 = \sum_{t=1}^{T} (y_t - b_{Const} - b_x x_t)^2 \quad \text{where } T = \text{sample size} \]

\(b_{Const}\) and \(b_x\) are chosen to minimize the sum of squared residuals. The following equations for \(b_{Const}\) and \(b_x\) accomplish this:

\[ b_{Const} = \bar{y} - b_x \bar{x} \qquad b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \]

To justify the equations, consider a sample size of 3:

\[ SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \]

Finding the Best Fitting Line

First, focus on \(b_{Const}\). Differentiate the sum of squared residuals, SSR, with respect to \(b_{Const}\) and set the derivative equal to 0:

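\[ \frac{dSSR}{db_{Const}} = -2(y_1 - b_{Const} - b_x x_1) - 2(y_2 - b_{Const} - b_x x_2) - 2(y_3 - b_{Const} - b_x x_3) = 0 \]

Dividing by \(-2\):

\[ (y_1 - b_{Const} - b_x x_1) + (y_2 - b_{Const} - b_x x_2) + (y_3 - b_{Const} - b_x x_3) = 0 \]

Collecting like terms:

\[ (y_1 + y_2 + y_3) + (-b_{Const} - b_{Const} - b_{Const}) + (-b_x x_1 - b_x x_2 - b_x x_3) = 0 \]

Simplifying:

\[ (y_1 + y_2 + y_3) - 3 b_{Const} - b_x (x_1 + x_2 + x_3) = 0 \]

Dividing by 3:

\[ \frac{y_1 + y_2 + y_3}{3} - b_{Const} - b_x \frac{x_1 + x_2 + x_3}{3} = 0 \]

Since \(\frac{y_1 + y_2 + y_3}{3}\) equals the mean of y, \(\bar{y}\), and \(\frac{x_1 + x_2 + x_3}{3}\) equals the mean of x, \(\bar{x}\):

\[ \bar{y} - b_{Const} - b_x \bar{x} = 0 \]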
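The first-order condition just derived can be checked numerically. The sketch below is only an illustration and not part of the chapter's own material: it uses the first quiz data reported elsewhere in the chapter (minutes 5, 15, 25; scores 66, 87, 90), and the helper names `residuals`, `ssr`, and `best_const` are hypothetical. For several trial slopes it checks that choosing the constant as the mean of y minus the slope times the mean of x makes the residuals sum to 0 and gives a smaller SSR than nudging the constant up or down.

```python
from fractions import Fraction

# First quiz data as reported in the chapter: minutes studied (x), scores (y).
x = [5, 15, 25]
y = [66, 87, 90]

x_bar = Fraction(sum(x), len(x))  # mean of x
y_bar = Fraction(sum(y), len(y))  # mean of y


def residuals(b_const, b_x):
    """Residual y_t - b_const - b_x * x_t for each student."""
    return [yt - b_const - b_x * xt for xt, yt in zip(x, y)]


def ssr(b_const, b_x):
    """Sum of squared residuals for a candidate line."""
    return sum(r * r for r in residuals(b_const, b_x))


def best_const(b_x):
    """Constant implied by the first-order condition for a given slope."""
    return y_bar - b_x * x_bar


step = Fraction(1, 2)  # size of the nudge applied to the constant
checks = []
for b_x in [Fraction(0), Fraction(1), Fraction(3, 2), Fraction(2)]:
    b_const = best_const(b_x)
    at_condition = ssr(b_const, b_x)
    nudged = min(ssr(b_const - step, b_x), ssr(b_const + step, b_x))
    checks.append({
        "b_x": b_x,
        "b_const": b_const,
        "residual_sum": sum(residuals(b_const, b_x)),
        "is_minimum": at_condition < nudged,
    })

for check in checks:
    print(check)
```

Exact fractions are used instead of floating-point numbers so that the "sum of residuals equals 0" check is an exact equality rather than an approximation.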

[Figure 5.5: Minutes and Scores Scatter Diagram with OLS Best Fitting Line; the OLS estimate \(y = b_{Const} + b_x x\) passes through \((\bar{x}, \bar{y}) = (15, 81)\)]

Our first equation, our equation for \(b_{Const}\), is now justified. To minimize the sum of squared residuals, the following relationship must be met:

\[ \bar{y} = b_{Const} + b_x \bar{x} \quad \text{or} \quad b_{Const} = \bar{y} - b_x \bar{x} \]

As illustrated in Figure 5.5, this equation simply says that the best fitting line must pass through the point \((\bar{x}, \bar{y})\), the point representing the mean of x, minutes studied, and the mean of y, the quiz scores. It is easy to calculate the means:

\[ \bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{5 + 15 + 25}{3} = \frac{45}{3} = 15 \qquad \bar{y} = \frac{y_1 + y_2 + y_3}{3} = \frac{66 + 87 + 90}{3} = \frac{243}{3} = 81 \]

The best fitting line passes through the point (15, 81).

Next, we shall justify the equation for \(b_x\). Reconsider the equation for the sum of squared residuals and substitute \(\bar{y} - b_x \bar{x}\) for \(b_{Const}\):

\[ SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \]

Substituting \(\bar{y} - b_x \bar{x}\) for \(b_{Const}\):

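\[ SSR = [y_1 - (\bar{y} - b_x \bar{x}) - b_x x_1]^2 + [y_2 - (\bar{y} - b_x \bar{x}) - b_x x_2]^2 + [y_3 - (\bar{y} - b_x \bar{x}) - b_x x_3]^2 \]

Simplifying each of the three terms:

\[ SSR = [y_1 - \bar{y} + b_x \bar{x} - b_x x_1]^2 + [y_2 - \bar{y} + b_x \bar{x} - b_x x_2]^2 + [y_3 - \bar{y} + b_x \bar{x} - b_x x_3]^2 \]

Switching the order of the \(b_x\) terms within each of the three squared terms:

\[ SSR = [y_1 - \bar{y} - b_x x_1 + b_x \bar{x}]^2 + [y_2 - \bar{y} - b_x x_2 + b_x \bar{x}]^2 + [y_3 - \bar{y} - b_x x_3 + b_x \bar{x}]^2 \]

Factoring out \(b_x\) within each of the three squared terms:

\[ SSR = [(y_1 - \bar{y}) - b_x (x_1 - \bar{x})]^2 + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})]^2 + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})]^2 \]

To minimize the sum of squared residuals, differentiate SSR with respect to \(b_x\) and set the derivative equal to 0:

\[ \begin{aligned} \frac{dSSR}{db_x} = {} & -2[(y_1 - \bar{y}) - b_x (x_1 - \bar{x})](x_1 - \bar{x}) - 2[(y_2 - \bar{y}) - b_x (x_2 - \bar{x})](x_2 - \bar{x}) \\ & - 2[(y_3 - \bar{y}) - b_x (x_3 - \bar{x})](x_3 - \bar{x}) = 0 \end{aligned} \]

Dividing by \(-2\):

\[ [(y_1 - \bar{y}) - b_x (x_1 - \bar{x})](x_1 - \bar{x}) + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})](x_2 - \bar{x}) + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})](x_3 - \bar{x}) = 0 \]

Simplifying the expression:

\[ (y_1 - \bar{y})(x_1 - \bar{x}) - b_x (x_1 - \bar{x})^2 + (y_2 - \bar{y})(x_2 - \bar{x}) - b_x (x_2 - \bar{x})^2 + (y_3 - \bar{y})(x_3 - \bar{x}) - b_x (x_3 - \bar{x})^2 = 0 \]

Moving all terms containing \(b_x\) to the right side:

\[ (y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x}) = b_x (x_1 - \bar{x})^2 + b_x (x_2 - \bar{x})^2 + b_x (x_3 - \bar{x})^2 \]

Factoring out \(b_x\) from the right side terms:

\[ (y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x}) = b_x [(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2] \]

Solving for \(b_x\):

\[ b_x = \frac{(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x})}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2} \]

Now, let us generalize this to a sample size of T: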

\[ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \]

Therefore, we have justified our second equation.

Let us return to Professor Lord's first quiz to calculate the constant and slope, \(b_{Const}\) and \(b_x\), of the ordinary least squares (OLS) best fitting line for the first quiz's data. We have already computed the means for the quiz scores and minutes studied:

\[ \bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{5 + 15 + 25}{3} = \frac{45}{3} = 15 \qquad \bar{y} = \frac{y_1 + y_2 + y_3}{3} = \frac{66 + 87 + 90}{3} = \frac{243}{3} = 81 \]

Now, for each student calculate the deviation of y from its mean and the deviation of x from its mean:

Student   y_t   ȳ    y_t − ȳ   x_t   x̄    x_t − x̄
1         66    81   −15       5     15   −10
2         87    81   6         15    15   0
3         90    81   9         25    15   10

Next, for each student calculate the product of the y and x deviations and the squared x deviation:

Student   (y_t − ȳ)(x_t − x̄)   (x_t − x̄)²
1         (−15)(−10) = 150      (−10)² = 100
2         (6)(0) = 0            (0)² = 0
3         (9)(10) = 90          (10)² = 100
          Sum = 240             Sum = 200

\(b_x\) equals the sum of the products of the y and x deviations divided by the sum of the squared x deviations:

\[ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \frac{240}{200} = \frac{6}{5} = 1.2 \]
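The arithmetic in the two tables above can be reproduced in a few lines. The following is a minimal sketch rather than the chapter's own software: the function name `ols_estimates` is hypothetical, and exact fractions are used so that the result matches the hand calculation digit for digit. It applies the two justified equations directly, first the deviation sums and the slope, then the constant from the mean-point relationship.

```python
from fractions import Fraction

minutes = [5, 15, 25]   # x: minutes studied by students 1, 2, 3
scores = [66, 87, 90]   # y: quiz scores of students 1, 2, 3


def ols_estimates(x, y):
    """Return (b_const, b_x, sum of deviation products, sum of squared x deviations)."""
    n = len(x)
    x_bar = Fraction(sum(x), n)
    y_bar = Fraction(sum(y), n)
    # Numerator: sum of (y_t - y_bar) * (x_t - x_bar)
    sum_xy = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    # Denominator: sum of (x_t - x_bar)^2
    sum_xx = sum((xt - x_bar) ** 2 for xt in x)
    b_x = sum_xy / sum_xx
    b_const = y_bar - b_x * x_bar  # line passes through (x_bar, y_bar)
    return b_const, b_x, sum_xy, sum_xx


b_const, b_x, sum_xy, sum_xx = ols_estimates(minutes, scores)
print(sum_xy, sum_xx, b_x, b_const)  # 240 200 6/5 63
```

The same function works for any sample size T, provided the x values are not all identical (otherwise the denominator is 0).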

[Figure 5.6: Minutes and Scores Scatter Diagram with OLS Best Fitting Line; OLS estimate: y = 63 + 1.2x, passing through \((\bar{x}, \bar{y}) = (15, 81)\)]

To calculate \(b_{Const}\), recall that the best fitting line passes through the point representing the average value of x and y, \((\bar{x}, \bar{y})\):

\[ \bar{y} = b_{Const} + b_x \bar{x} \]

Solving for \(b_{Const}\):

\[ b_{Const} = \bar{y} - b_x \bar{x} \]

We just learned that \(b_x\) equals 6/5. The average of the x's, \(\bar{x}\), equals 15 and the average of the y's, \(\bar{y}\), equals 81. Substituting:

\[ b_{Const} = 81 - \frac{6}{5} \times 15 = 81 - 18 = 63 \]

Using the ordinary least squares (OLS) estimation procedure, the best fitting line for Professor Lord's first quiz is:

\[ y = 63 + \frac{6}{5} x = 63 + 1.2x \]

Consequently, the least squares estimates for \(\beta_{Const}\) and \(\beta_x\) are 63 and 1.2. These estimates suggest that Professor Lord gives each student 63 points just for showing up; each minute studied earns the student 1.2 additional points. Based on the regression we estimate that:
  o 1 additional minute studied increases the quiz score by 1.2 points.
  o 2 additional minutes studied increase the quiz score by 2.4 points.
  o etc.

Let us now quickly calculate the sum of squared residuals for the best fitting line:

Student   x_t   y_t   Esty_t = 63 + 1.2 x_t      Res_t = y_t − Esty_t   Res_t²
1         5     66    63 + 1.2 × 5 = 69          66 − 69 = −3           9
2         15    87    63 + 1.2 × 15 = 81         87 − 81 = 6            36
3         25    90    63 + 1.2 × 25 = 93         90 − 93 = −3           9
                                                                        SSR = 54

The sum of squared residuals for the best fitting line is 54.

Econometrics Lab 5.1: Finding the Ordinary Least Squares (OLS) Estimates

We can use our Econometrics Lab to emphasize how the ordinary least squares (OLS) estimation procedure determines the best fitting line by accessing the Best Fit simulation.

[Link to MIT-Lab 5.1 goes here.]

By default the data from Professor Lord's first quiz are specified: the values of x and y for the first student are 5 and 66, for the second student 15 and 87, and for the third student 25 and 90:

[Figure 5.7: Best Fitting Line Simulation Data]

Now, click Go. A new screen appears as shown in Figure 5.8 with two slider bars, one slider bar for the constant and one for the coefficient.

[Figure 5.8: Best Fitting Line Simulation Parameter Estimates]

By default the constant and coefficient values are 63 and 1.2, the ordinary least squares (OLS) estimates. Also, the arithmetic used to calculate the sum of squared residuals is displayed. When the constant equals 63 and the coefficient equals 1.2, the sum of squared residuals equals 54.00; this is just the value that we calculated. Next, experiment with different values for the constant and coefficient by moving the two sliders. Convince yourself that the equations we used to calculate the estimates for the constant and coefficient indeed minimize the sum of squared residuals.

Software and the Ordinary Least Squares (OLS) Estimation Procedure

Fortunately, we do not have to trudge through the laborious arithmetic to compute the ordinary least squares (OLS) estimates. Statistical software can do the work for us.

Professor Lord's First Quiz Data: Cross section data of minutes studied and quiz scores in the first quiz for the 3 students enrolled in Professor Lord's class.

Student   Minutes Studied (x)   Quiz Score (y)
1         5                     66
2         15                    87
3         25                    90

Table 5.3: First Quiz Results

[Link to MIT-Quiz1.wf1 goes here.]

Getting Started in EViews

We can use the statistical package EViews to perform the calculations. After opening the workfile in EViews:
  o In the Workfile window: Click on the dependent variable, y, first; and then, click on the explanatory variable, x, while depressing the <Ctrl> key.
  o In the Workfile window: Double click on a highlighted variable.
  o In the Workfile window: Click Open Equation.
  o In the Equation Specification window: Click OK. This window previews the regression that will be run; note that the dependent variable, y, is the first variable listed, followed by two expressions representing the explanatory variable, x, and the constant, c.

Do not forget to close the workfile.

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate   SE    t-statistic   Prob
x                          1.2        ...   ...           ...
Const                      63         ...   ...           ...
Number of Observations     3
Sum Squared Residuals      54
Estimated Equation: Esty = 63 + 1.2x

Interpretation of Estimates:
  \(b_{Const}\) = 63: Students receive 63 points for showing up.
  \(b_x\) = 1.2: Students receive 1.2 additional points for each additional minute studied.
Critical Result: The coefficient estimate equals 1.2. The positive sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence lends support to our theory.

Table 5.4: OLS First Quiz Regression Results

Table 5.4 reports the values of the coefficient and constant for the best fitting line. Note that the sum of squared residuals for the best fitting line is also included.

Importance of the Error Term

Recall the regression model:

\[ y_t = \beta_{Const} + \beta_x x_t + e_t \]

where
  \(y_t\) = Quiz score of student t
  \(x_t\) = Minutes studied by student t
  \(e_t\) = Error term for student t

The parameters of the model, the constant, \(\beta_{Const}\), and the coefficient, \(\beta_x\), represent the actual:
  o number of points Professor Lord gives students just for showing up, \(\beta_{Const}\);
  o additional points earned for each minute of study, \(\beta_x\).

Obviously, the parameters of the model play an important role, but what about the error term, \(e_t\)? To illustrate the importance of the error term, suppose that somehow we know the values of \(\beta_{Const}\) and \(\beta_x\). For the moment, suppose that \(\beta_{Const}\), the actual constant, equals 50 and \(\beta_x\), the actual coefficient, equals 2. In words, this means that Professor Lord gives each student 50 points for showing up; furthermore, each minute of study provides the student with 2 additional points. Consequently, the regression model is:

\[ y_t = 50 + 2 x_t + e_t \]

NB: In the real world, we never know the actual values of the constant and coefficient. We are assuming that we do here, just to illustrate the importance of the error term.

The error term reflects all the factors that cannot be anticipated or determined before the quiz is given; that is, the error term represents all random influences. In the absence of random influences, the error terms would equal 0.

Absence of Random Influences: A "What If" Question

[Figure 5.9: Best Fitting Line with No Error Term; actual line: y = 50 + 2x]

Assume, only for the moment, that there are no random influences; consequently, each error term would equal 0. While this assumption is unrealistic, it allows us to appreciate the important role played by the error term. Focus on the first student taking Professor Lord's first quiz. The first student studies for 5 minutes. In the absence of random influences (that is, if \(e_1\) equaled 0), what score would the first student receive on the quiz? The answer is 60.

\[ y_1 = 50 + 2 \times 5 = 50 + 10 = 60 \]

Next, consider the second student. The second student studies for 15 minutes. In the absence of random influences, the second student would receive an 80 on the quiz:

\[ y_2 = 50 + 2 \times 15 = 50 + 30 = 80 \]

The third student would receive a 100:

\[ y_3 = 50 + 2 \times 25 = 50 + 50 = 100 \]

We summarize this in Table 5.5:

Absence of Random Influences
Student   Minutes (x)   Score (y)
1         5             60
2         15            80
3         25            100

Table 5.5: Quiz Results with No Random Influences (No Error Term)

In the absence of random influences, the intercept and slope of the best fitting line would equal the actual constant and the actual coefficient, \(\beta_{Const}\) and \(\beta_x\):

\[ y = \beta_{Const} + \beta_x x = 50 + 2x \]

Summary: In the absence of random influences, the error term of each student equals 0 and the best fitting line fits the data perfectly. The slope of this line equals 2, the actual coefficient, and the vertical intercept of the line equals 50, the actual constant. Without random influences, it is easy to determine the actual constant and coefficient. We shall now use a simulation to emphasize this point.

Econometrics Lab 5.2: Coefficient Estimates When Random Influences Are Absent

[Figure 5.10: Coefficient Estimate Simulation]

The simulation allows us to do something we cannot do in the real world. It allows us to specify the actual values of the constant and coefficient in the model; that is, we can select \(\beta_{Const}\) and \(\beta_x\). We can specify the:
  o number of points Professor Lord gives students just for showing up, \(\beta_{Const}\); by default, \(\beta_{Const}\) is set at 50.
  o additional points earned for an additional minute of study, \(\beta_x\); by default, \(\beta_x\) is set at 2.

Consequently, the regression model is:

\[ y_t = 50 + 2 x_t + e_t \]

Each repetition of the simulation represents a quiz from a single week. In each repetition, the simulation:
  o Calculates the score for each student based on the actual constant (\(\beta_{Const}\)), the actual coefficient (\(\beta_x\)), and the number of minutes the student studied; then, to be realistic, the simulation can add a random influence in the form of the error term, \(e_t\). An error term is included whenever the Err Term checkbox is checked.
  o Applies the ordinary least squares (OLS) estimation procedure to estimate the coefficient.
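The two steps each repetition performs can be sketched in a few lines. This is a hypothetical reconstruction, not the lab's actual code: the function names `ols_slope` and `simulate_quiz` are invented for illustration, and the error term is assumed to be normally distributed with mean 0 and variance 500 (the chapter's lab states a variance of 500 for its error term simulation, but the distribution's shape is an assumption here).

```python
import random

MINUTES = [5, 15, 25]  # minutes studied by students 1, 2, 3


def ols_slope(x, y):
    """OLS coefficient estimate: sum of deviation products over sum of squared x deviations."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    denominator = sum((xt - x_bar) ** 2 for xt in x)
    return numerator / denominator


def simulate_quiz(rng, beta_const=50.0, beta_x=2.0, err_term=True, err_variance=500.0):
    """Simulate one week's quiz and return the OLS coefficient estimate."""
    scores = []
    for xt in MINUTES:
        # Assumption: normally distributed random influence with mean 0.
        e_t = rng.gauss(0.0, err_variance ** 0.5) if err_term else 0.0
        scores.append(beta_const + beta_x * xt + e_t)
    return ols_slope(MINUTES, scores)


rng = random.Random(0)  # seeded so that runs are repeatable

# "Err Term" box cleared: scores are exactly 60, 80, 100 every week.
no_error_estimates = [simulate_quiz(rng, err_term=False) for _ in range(3)]
print(no_error_estimates)  # [2.0, 2.0, 2.0]

# "Err Term" box checked: one simulated week with random influences.
with_error_estimate = simulate_quiz(rng, err_term=True)
print(with_error_estimate)
```

Passing the random number generator in as an argument keeps the function free of hidden state, which makes individual repetitions easy to reproduce.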

When the Pause box is checked the simulation stops after each repetition; when it is cleared, quizzes are simulated repeatedly until the Stop button is clicked.

[Link to MIT-Lab 5.2 goes here.]

We can eliminate random influences by clearing the Err Term box. After doing so, click Start and then Continue a few times. We discover that in the absence of random influences the estimate of the coefficient value always equals the actual value, 2:

[Table 5.6: Simulation Results with No Random Influences (No Error Term); columns: Repetition, Coefficient Estimate (No Error Term)]

This is precisely what we concluded earlier from the scatter diagram. In the absence of random influences, the best fitting line fits the data perfectly. The best fitting line's slope equals the actual value of the coefficient.

Presence of Random Influences: Back to Reality

The real world is not that simple, however; random influences play an important role. In the real world, random influences are inevitably present:

Inclusion of Random Influences
Student   Minutes (x)   Score (y)
1         5             66
2         15            87
3         25            90

Table 5.7: Quiz Results with Random Influences (with Error Term)

In Figure 5.11, the actual scores on the first quiz have been added to the scatter diagram. As a consequence of the random influences, Students 1 and 2 overperform while Student 3 underperforms.

[Figure 5.11: Scatter Diagram with Error Term; actual line: y = 50 + 2x]

As illustrated in Figure 5.12, when random influences are present, we cannot expect the intercept and slope of the best fitting line to equal the actual constant and the actual coefficient. The intercept and slope of the best fitting line, \(b_{Const}\) and \(b_x\), are affected by the random influences.

[Figure 5.12: OLS Best Fitting Line with Error Term; OLS estimate: y = 63 + 1.2x; actual line: y = 50 + 2x]

Consequently, the intercept and slope of the best fitting line, \(b_{Const}\) and \(b_x\), are themselves random variables. Even if we knew the actual constant and slope, that is, if we knew the actual values of \(\beta_{Const}\) and \(\beta_x\), we could not predict the values of the constant and slope of the best fitting line, \(b_{Const}\) and \(b_x\), with certainty before the quiz was given.

Econometrics Lab 5.3: Coefficient Estimates When Random Influences Are Present

We shall now use the Coefficient Estimate simulation to emphasize this point. We shall show that in the presence of random influences, the coefficient of the best fitting line is a random variable.

[Link to MIT-Lab 5.3 goes here.]

Note that the Err Term checkbox is now checked to include the error term. Be certain that the Pause checkbox is checked and then click Start. When the simulation computes the best fitting line, the estimated value of the coefficient typically is not 2 despite the fact that the actual value of the coefficient is 2. Click the Continue button a few more times to simulate each successive week's quiz.

What do you observe? We simply cannot expect the coefficient estimate to equal the actual value of the coefficient. In fact, when random influences are present, the coefficient estimate almost never equals the actual value of the coefficient. Sometimes the estimate is less than the actual value, 2, and sometimes it is greater than the actual value. When random influences are present, the coefficient estimates are random variables:

[Table 5.8: Simulation Results with Random Influences (with Error Term); columns: Repetition, Coefficient Estimate (With Error Term)]

While your coefficient estimates will no doubt differ from the ones in Table 5.8, one thing is clear. Even if we know the actual value of the coefficient, as we do in the simulation, we cannot predict with certainty the value of the estimate from one repetition. Our last two simulations illustrate a critical point: The coefficient estimate is a random variable as a consequence of the random influences introduced by each student's error term.

Error Terms and Random Influences: A Closer Look

We shall now use a simulation to gain insights into random influences and error terms. As we know, random influences are those factors that cannot be anticipated or determined beforehand. Sometimes random influences lead to a higher quiz score and other times they lead to a lower score. The error terms embody these random influences:
  o Sometimes the error term is positive, indicating that the score is "higher than usual";
  o Other times the error term is negative, indicating that the score is "lower than usual."

If the random influences are indeed random, they should be a "wash" after many, many quizzes. That is, random influences should not systematically lead to higher or lower quiz scores. In other words, if the error terms truly reflect random influences, they should average out to 0 in the long run.
Econometrics Lab 5.4: Error Terms When Random Influences Are Present

Let us now check to be certain that the simulations are capturing random influences properly by accessing the Random Influence Error Terms simulation.

[Link to MIT-Lab 5.4 goes here.]

[Figure 5.13: Error Term Simulation]

Initially, the Pause checkbox is checked and the error term variance is 500. Now, click Start and observe that the simulation reports the numerical value of the error term for each of the three students. Record these three values. Also, note that the simulation constructs a histogram for each student's error term and reports the mean and variance. Click Continue again to observe the numerical values of the error terms for the second quiz. Confirm that the simulation is calculating the mean and variance of each student's error terms correctly. Click Continue a few more times. Note that the error terms are indeed random variables. Before the quiz is given, we cannot predict the numerical value of a student's error term. Each student's histogram shows that sometimes the error term for that student is positive and sometimes it is negative. Next, clear the Pause checkbox and click Continue. After many, many repetitions, click Stop.

Student 1: Mean: .0, Variance: 500.
Student 2: Mean: .0, Variance: 500.
Student 3: Mean: .0, Variance: 500.

Figure 5.14: Error Term Simulation Results

After many, many repetitions, the mean (average) of each student's error terms equals about 0. Consequently, each student's error term truly represents a random influence; it does not systematically influence the student's quiz score. It is also instructive to focus on each student's histogram. For each student, the numerical value of the error term is positive about half the time and negative about half the time after many, many repetitions.

Summary: The error terms represent random influences; consequently, the error terms have no systematic effect on quiz scores, the dependent variable:

Sometimes the error term is positive, indicating that the score is higher than "usual";

Other times the error term is negative, indicating that the score is lower than "usual."

What can we say about the students' error terms beforehand, before the next quiz? We can describe their probability distribution. The chance that a student's error term will be positive is the same as the chance that it will be negative. For any one quiz, the mean of each student's error term's probability distribution equals 0:

Mean[e_1] = 0: e_1 has no systematic effect on Student 1's score; e_1 represents a random influence.
Mean[e_2] = 0: e_2 has no systematic effect on Student 2's score; e_2 represents a random influence.
Mean[e_3] = 0: e_3 has no systematic effect on Student 3's score; e_3 represents a random influence.

Standard Ordinary Least Squares (OLS) Premises

Initially, we shall make some strong assumptions regarding the explanatory variables and the error terms:

Error Term Equal Variance Premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:
Var[e_1] = Var[e_2] = ... = Var[e_T] = Var[e]

Error Term/Error Term Independence Premise: The error terms are independent: Cov[e_i, e_j] = 0 for i ≠ j. Knowing the value of the error term from one observation does not help us predict the value of the error term for any other observation.

Explanatory Variable/Error Term Independence Premise: The explanatory variables, the x_t's, and the error terms, the e_t's, are not correlated. Knowing the value of an observation's explanatory variable does not help us predict the value of that observation's error term.

We call these premises the standard ordinary least squares (OLS) premises. They make the analysis as straightforward as possible. In Part Four of this textbook, we relax these premises to study more general cases. Our strategy is to start with the most straightforward case and then move on to more complex ones.
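A small Python sketch can make the first two premises concrete. It is an illustration under stated assumptions, not the lab's implementation: each student's error term is drawn independently from a normal distribution with mean 0 and variance 500, and all function names are hypothetical.

```python
import random


def simulate_error_terms(repetitions, error_variance=500.0, students=3, seed=0):
    """Return one list of simulated error terms per student.

    Independent normal draws are an assumption made for illustration.
    """
    rng = random.Random(seed)
    sd = error_variance ** 0.5
    return [[rng.gauss(0.0, sd) for _ in range(repetitions)]
            for _ in range(students)]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def covariance(a, b):
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)


if __name__ == "__main__":
    errors = simulate_error_terms(100_000)
    for i, e in enumerate(errors, start=1):
        share_positive = sum(v > 0 for v in e) / len(e)
        print(f"Student {i}: mean = {mean(e):.2f}, "
              f"variance = {variance(e):.1f}, "
              f"share positive = {share_positive:.3f}")
    print(f"Cov[e_1, e_2] = {covariance(errors[0], errors[1]):.2f}")
```

After many repetitions each student's mean should be near 0 and the share of positive error terms near one half. The variances should be close to one another (equal variance premise), and the covariance between two students' error terms should be small relative to the variance (independence premise).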
While we only briefly cite the premises here, we shall return to them in the fourth part of the textbook to study their implications.

Clint's Assignment: The Two Parts

Recall Clint's assignment. He must assess the effect of studying on quiz scores by using Professor Lord's first quiz as evidence. Clint can apply the ordinary least squares (OLS) estimation procedure; the OLS estimate for the value of the coefficient is 1.2. But we now know that the estimate is a random variable. We cannot expect the coefficient estimate from the one quiz, 1.2, to equal the actual value of the coefficient, the actual impact that studying has on a student's quiz score. We shall proceed by dividing Clint's assignment into two related parts:

Reliability of the Coefficient Estimate: How reliable is the coefficient estimate calculated from the results of the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?

Assessment of the Theory: In view of the fact that Clint's estimate of the coefficient equals 1.2, how confident should Clint be that the theory is correct, that additional studying increases quiz scores?

In the next few chapters, we shall address these issues.

1 NB: These data are not real. Instead, they were constructed to illustrate important pedagogical points.

2 There is another convention that is often used to denote the parameter estimates, the beta-hat convention. The estimate of the constant is denoted by β̂_Const and the coefficient by β̂_x. While the Roman italicized b's estimation convention will be used throughout this textbook, be aware that you will come across textbooks and articles that use the beta-hat convention. The b's and β̂'s denote the same thing; they are interchangeable.


Chapter 8. Linear Regression. The Linear Model. Fat Versus Protein: An Example. The Linear Model (cont.) Residuals Chapter 8 Linear Regression Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide 8-1 Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Fat Versus

More information

Experiment 2. Reaction Time. Make a series of measurements of your reaction time. Use statistics to analyze your reaction time.

Experiment 2. Reaction Time. Make a series of measurements of your reaction time. Use statistics to analyze your reaction time. Experiment 2 Reaction Time 2.1 Objectives Make a series of measurements of your reaction time. Use statistics to analyze your reaction time. 2.2 Introduction The purpose of this lab is to demonstrate repeated

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Conditions for Regression Inference:

Conditions for Regression Inference: AP Statistics Chapter Notes. Inference for Linear Regression We can fit a least-squares line to any data relating two quantitative variables, but the results are useful only if the scatterplot shows a

More information

Algebra Exam. Solutions and Grading Guide

Algebra Exam. Solutions and Grading Guide Algebra Exam Solutions and Grading Guide You should use this grading guide to carefully grade your own exam, trying to be as objective as possible about what score the TAs would give your responses. Full

More information

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Chapter 10 Regression Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Scatter Diagrams A graph in which pairs of points, (x, y), are

More information

2 Prediction and Analysis of Variance

2 Prediction and Analysis of Variance 2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering

More information

HOLLOMAN S AP STATISTICS BVD CHAPTER 08, PAGE 1 OF 11. Figure 1 - Variation in the Response Variable

HOLLOMAN S AP STATISTICS BVD CHAPTER 08, PAGE 1 OF 11. Figure 1 - Variation in the Response Variable Chapter 08: Linear Regression There are lots of ways to model the relationships between variables. It is important that you not think that what we do is the way. There are many paths to the summit We are

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice Name Period AP Statistics Bivariate Data Analysis Test Review Multiple-Choice 1. The correlation coefficient measures: (a) Whether there is a relationship between two variables (b) The strength of the

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Unit 1.1 Equations. Quarter 1. Section Days Lesson Notes. Algebra 1 Unit & Lesson Overviews Mathematics Variables and Expressions

Unit 1.1 Equations. Quarter 1. Section Days Lesson Notes. Algebra 1 Unit & Lesson Overviews Mathematics Variables and Expressions Unit 1.1 Equations Quarter 1 Section Days Lesson Notes 1.1 1 Variables and Expressions 1.2 1.3 1 Solving Equations by Addition, Subtraction, Multiplying or Dividing 1.4 1 Solving Two-Step and Multi-Step

More information

Scatterplots and Correlation

Scatterplots and Correlation Bivariate Data Page 1 Scatterplots and Correlation Essential Question: What is the correlation coefficient and what does it tell you? Most statistical studies examine data on more than one variable. Fortunately,

More information

Linear Regression with one Regressor

Linear Regression with one Regressor 1 Linear Regression with one Regressor Covering Chapters 4.1 and 4.2. We ve seen the California test score data before. Now we will try to estimate the marginal effect of STR on SCORE. To motivate these

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Lab 3 Acceleration. What You Need To Know: Physics 211 Lab

Lab 3 Acceleration. What You Need To Know: Physics 211 Lab b Lab 3 Acceleration Physics 211 Lab What You Need To Know: The Physics In the previous lab you learned that the velocity of an object can be determined by finding the slope of the object s position vs.

More information

Chapter 3: Describing Relationships

Chapter 3: Describing Relationships Chapter 3: Describing Relationships Section 3.2 The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Chapter 3 Describing Relationships 3.1 Scatterplots and Correlation 3.2 Section 3.2

More information

Simple Linear Regression: One Quantitative IV

Simple Linear Regression: One Quantitative IV Simple Linear Regression: One Quantitative IV Linear regression is frequently used to explain variation observed in a dependent variable (DV) with theoretically linked independent variables (IV). For example,

More information

Uncertainty, Error, and Precision in Quantitative Measurements an Introduction 4.4 cm Experimental error

Uncertainty, Error, and Precision in Quantitative Measurements an Introduction 4.4 cm Experimental error Uncertainty, Error, and Precision in Quantitative Measurements an Introduction Much of the work in any chemistry laboratory involves the measurement of numerical quantities. A quantitative measurement

More information