Chapter 8 Handout: Interval Estimates and Hypothesis Testing


Loraine Lang

Preview

Clint's Assignment: Taking Stock
General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
Estimate Reliability: Interval Estimate Question
    o Normal Distribution versus the Student t-distribution: One Last Complication
    o Assessing the Reliability of a Coefficient Estimate: Applying the Student t-distribution
Theory Assessment: Hypothesis Testing
Summary: The Ordinary Least Squares (OLS) Estimation Procedure
    o Regression Model and the Role of the Error Term
    o Standard Ordinary Least Squares (OLS) Premises
    o Ordinary Least Squares (OLS) Estimation Procedure: Three Important Parts
        o Value of the coefficient itself
        o Variance of the error term's probability distribution
        o Variance of the coefficient estimate's probability distribution
    o Properties of the Ordinary Least Squares (OLS) Estimation Procedure When the Standard Ordinary Least Squares (OLS) Premises Are Met:
        o Each estimation procedure is unbiased.
        o The estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
Causation versus Correlation

Clint's Assignment: Taking Stock

Theory: Additional studying increases quiz scores.

Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
    o $y_t$ = Quiz score; $\beta_{Const}$ = Points for showing up
    o $x_t$ = Minutes studied; $\beta_x$ = Points for each minute studied
    o $e_t$ = Error term; $e_t$ reflects random influences: Mean[$e_t$] = 0

Ordinary Least Squares (OLS) Estimation Procedure: Apply the ordinary least squares estimation procedure; use data from Professor Lord's first quiz to estimate the values of $\beta_{Const}$ and $\beta_x$ by finding the best fitting line, the equation that minimizes the sum of squared residuals.

First Quiz Data (columns): Student, x, y

Ordinary Least Squares (OLS) Estimates: Esty = 63 + 1.2x

$b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = 1.2$
$b_{Const} = \bar{y} - b_x \bar{x} = 63$

    o $b_{Const}$ = Estimated points for showing up = 63
    o $b_x$ = Estimated points for each minute studied = 1.2

Clint's Assignment

Coefficient Reliability: How reliable is the coefficient estimate calculated from the results of the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?

Theory Confidence: How much confidence should Clint have in the theory that additional studying increases quiz scores?
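The two estimation formulas above can be sketched in a few lines of Python. The three observations below are an assumption: the handout's data table did not survive, so these are illustrative values chosen to reproduce the reported estimates of 63 and 1.2. The function name `ols_simple` is likewise hypothetical.

```python
# Illustrative data (an assumption, not the handout's table): chosen so that
# the estimates match the reported values b_Const = 63 and b_x = 1.2.
minutes = [5, 15, 25]   # x_t: minutes studied
scores = [66, 87, 90]   # y_t: quiz score


def ols_simple(x, y):
    """Return (b_const, b_x) for the best fitting line y = b_const + b_x * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of (y_t - y_bar)(x_t - x_bar)
    sxy = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    # Denominator: sum of (x_t - x_bar)^2
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    b_x = sxy / sxx
    b_const = y_bar - b_x * x_bar
    return b_const, b_x


b_const, b_x = ols_simple(minutes, scores)
print(f"Esty = {b_const:.1f} + {b_x:.1f}x")
```

With different data the same function returns a different line; only the formulas come from the handout.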

General Properties of the Ordinary Least Squares (OLS) Estimation Procedure

When the standard ordinary least squares (OLS) premises are satisfied, the following equations describe the coefficient estimate's general properties, the estimate's probability distribution:

$\text{Mean}[b_x] = \beta_x$
$\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$

Importance of the Probability Distribution's Mean (Center) and Variance (Spread)

Mean: When the mean of the estimate's probability distribution, Mean[$b_x$], equals the actual value of the coefficient, $\beta_x$, the estimation procedure is unbiased.
    o Unbiased does not mean that the estimate will equal the actual value. In fact, we can be all but certain that the estimate will not equal the actual value.
    o What unbiased does mean is that the estimation procedure does not systematically underestimate or overestimate the actual value. Formally, the mean of the estimate's probability distribution equals the actual value. Furthermore, if the estimate's probability distribution is symmetric, the chances that the estimate is too high equal the chances that it is too low.

Variance: When the estimation procedure is unbiased, the variance of the estimate's probability distribution, Var[$b_x$], determines the reliability of the estimate.
    o As the variance decreases, the estimate is more likely to be close to the actual value.
    o The Problem: In reality, neither Clint nor we know the variance of the error term's probability distribution, which is needed to calculate the variance of the coefficient estimate's probability distribution. How can Clint proceed?
    o Solution: A Two Step Approach
        Step 1: Estimate the variance of the error term's probability distribution.
        Step 2: Use the estimate of the variance of the error term's probability distribution to estimate the variance of the coefficient estimate's probability distribution.

Clint's Assignment: Coefficient Reliability. How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?

Interval Estimate Question: What is the probability that the coefficient estimate, 1.2, lies within _____ of the actual coefficient value? _____

One Last Complication

Unfortunately, before we can tackle Clint's assignment, we must address one last complication:
    o To use the normal distribution we must know the actual value of the variance (and the standard deviation) of the random variable's probability distribution.
    o We do not know the actual value of the variance for the coefficient estimate's probability distribution. We must estimate its variance.

We cannot use the normal distribution when dealing with the coefficient estimate. Instead we must use another distribution, the Student t-distribution.
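The two step approach can be sketched numerically. The formulas used, EstVar[e] = SSR / Degrees of Freedom and EstVar[b_x] = EstVar[e] / Σ(x_t − x̄)², are the ones the handout's summary states. The data are illustrative values (an assumption, not the handout's table) that reproduce the reported estimates of 63 and 1.2, and `estimate_variances` is a hypothetical helper name.

```python
import math

# Illustrative data (an assumption): consistent with b_Const = 63 and b_x = 1.2.
minutes = [5, 15, 25]
scores = [66, 87, 90]
b_const, b_x = 63.0, 1.2


def estimate_variances(x, y, b_const, b_x, n_params=2):
    """Step 1: estimate Var[e]. Step 2: use it to estimate Var[b_x]."""
    residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]
    ssr = sum(r ** 2 for r in residuals)          # sum of squared residuals
    df = len(x) - n_params                        # sample size minus estimated parameters
    est_var_e = ssr / df                          # Step 1: EstVar[e] = SSR / DF
    x_bar = sum(x) / len(x)
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    est_var_bx = est_var_e / sxx                  # Step 2: EstVar[b_x]
    return ssr, df, est_var_e, est_var_bx


ssr, df, est_var_e, est_var_bx = estimate_variances(minutes, scores, b_const, b_x)
se_bx = math.sqrt(est_var_bx)                     # standard error of b_x
print(f"SSR={ssr:.1f}  DF={df}  EstVar[e]={est_var_e:.1f}  "
      f"EstVar[b_x]={est_var_bx:.3f}  SE[b_x]={se_bx:.4f}")
```

The square root of the estimated variance is the standard error, the quantity the Student t-distribution works with in place of the unknown standard deviation.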

Normal Distribution Versus the Student t-distribution

Estimating the standard deviation introduces an additional element of uncertainty; hence, the t-distribution is more _____ than the normal distribution.

[Figure: probability distribution of a random variable, normal versus Student t; horizontal axis: value of the random variable, centered at the distribution mean]

The Student t-distribution's spread depends on the _____. As the degrees of freedom increase, we have _____ information. Hence, the distribution's spread is _____.

The Normal Distribution's z and the Student t-distribution's t:

When the standard deviation is known, we use the normal distribution:

$z = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Standard Deviation of Random Variable}}$ = Number of Standard Deviations from the Mean

To calculate probabilities with the normal distribution we need only the value of _____.

When the standard deviation is not known it must be estimated; we must use the Student t-distribution rather than the normal distribution:

$t = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Estimated Standard Deviation of Random Variable}}$ = Number of Estimated Standard Deviations (Standard Errors) from the Mean

To calculate probabilities with the Student t-distribution we need the value of _____ and the _____.
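One way to compare the two distributions is to compute the probability of landing within a fixed number of standard deviations (or standard errors) of the mean. The sketch below uses only the Python standard library: the normal probability comes from `math.erf`, and the Student t probability from Simpson's rule applied to the t density. The function names and the choice of 2 as the cutoff are illustrative assumptions.

```python
import math


def normal_prob_within(z):
    """P(|Z| <= z) for a standard normal random variable."""
    return math.erf(z / math.sqrt(2))


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_prob_within(t, df, steps=2000):
    """P(|T| <= t): Simpson's rule on [0, t], doubled by symmetry (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


for df in (1, 5, 30):
    print(f"t, DF={df:2d}: {t_prob_within(2.0, df):.4f}")
print(f"normal  : {normal_prob_within(2.0):.4f}")
```

Running it for several degrees of freedom shows how the t probabilities move relative to the normal one as the degrees of freedom change.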

Clint's Assignment: Coefficient Reliability. How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?

Interval Estimate Question: What is the probability that the coefficient estimate, 1.2, lies within _____ of the actual coefficient value? _____

First Blank: We begin by filling in the first blank, choosing our "close to" criterion. Suppose that we choose 1.5:

Close To Criterion = 1.50

So, we write 1.50 in the first blank. Next, draw the picture.

Question: Why does the mean of the coefficient estimate's probability distribution equal the actual coefficient value, $\beta_x$? _____

[Figure: Student t-distribution of $b_x$ with Mean[$b_x$] = $\beta_x$, SE[$b_x$] = _____, DF = _____; the central area between $\beta_x - 1.50$ and $\beta_x + 1.50$ is the probability that the estimate is within 1.50 of the actual value; the left and right tail probabilities lie outside it]

Second Blank: Calculate the probability.

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):    Estimate    SE    t-statistic    Prob
x
Const
Number of Observations: 3

Convert 1.50 into standard errors: 1.50 = _____ SE's.

The probability that the estimate lies within 1.50 of the actual value = the probability that the estimate lies within _____ SE's of the actual value: between t's of _____ and _____.

Degrees of Freedom = Sample Size − Number of Estimated Parameters = _____ − _____ = _____

Using the Econometrics Lab (Lab 8.1a, Lab 8.1b):
    Degrees of Freedom: _____
    Left tail: t: _____    Left tail probability = _____
    Right tail: t: _____    Right tail probability = _____

The probability that the estimate lies within 1.5 of the actual value: 1.00 − (_____ + _____) = 1.00 − _____ = _____.
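The tail calculation above can be sketched without the Econometrics Lab. The sketch relies on two assumptions, both labeled in the code: the standard error value (the regression table's numbers did not survive, so 0.5196 is an illustrative figure consistent with the estimates 63 and 1.2), and the fact that with 3 observations and 2 estimated parameters there is 1 degree of freedom. With 1 degree of freedom the Student t-distribution coincides with the standard Cauchy distribution, whose cumulative probability has the closed form 1/2 + arctan(t)/π.

```python
import math


def t_cdf_df1(t):
    """CDF of the Student t-distribution with 1 degree of freedom (standard Cauchy)."""
    return 0.5 + math.atan(t) / math.pi


close_to = 1.50      # "close to" criterion chosen in the handout
se_bx = 0.5196       # ASSUMED standard error of b_x (illustrative, not from the handout's table)

# Convert 1.50 into standard errors.
t_right = close_to / se_bx
t_left = -t_right

left_tail = t_cdf_df1(t_left)           # probability the estimate is more than 1.50 too low
right_tail = 1.0 - t_cdf_df1(t_right)   # probability the estimate is more than 1.50 too high
prob_within = 1.00 - (left_tail + right_tail)

print(f"1.50 = {t_right:.2f} SE's")
print(f"left tail = {left_tail:.3f}, right tail = {right_tail:.3f}")
print(f"probability within 1.50 of the actual value = {prob_within:.2f}")
```

For other degrees of freedom the closed form no longer applies and a general t-distribution routine would be needed.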

Clint's Assignment: Theory Confidence. How much confidence should Clint have in the theory that additional studying increases quiz scores?

Theory: Additional studying increases quiz scores.

Step 0: Construct a model reflecting the theory to be tested:

$y_t = \beta_{Const} + \beta_x x_t + e_t$
    o $y_t$ = quiz score; $x_t$ = minutes studied; $e_t$ = error term
    o $\beta_{Const}$ reflects points for showing up
    o $\beta_x$ reflects points for each minute studied

The theory suggests that $\beta_x$ should be positive. Theory: $\beta_x > 0$.

Step 1: Collect data, run the regression, and interpret the estimates.

First Quiz Data (columns): Student, x, y

Ordinary Least Squares (OLS) Estimates: Esty = 63 + 1.2x

$b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = 1.2$
$b_{Const} = \bar{y} - b_x \bar{x} = 63$

    o $b_{Const}$ = Estimated points for showing up = 63
    o $b_x$ = Estimated points for each minute studied = 1.2

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):    Estimate    SE    t-statistic    Prob
x
Const
Number of Observations: 3
Sum Squared Residuals
SE of Regression
Estimated Equation: Esty = 63 + 1.2x

Interpretation: Focus on the parameter estimates:
    o The estimate of the constant suggests that students receive _____ points for showing up.
    o The estimate of the coefficient suggests that students receive _____ additional points for each additional minute studied.

Critical Result: The coefficient estimate equals _____. The _____ sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence _____ our theory.

Step 2: Play the cynic, challenge the evidence, and construct the null and alternative hypotheses.

Cynic's View: Sure, the coefficient estimate from the first quiz data was positive, but this result was just the luck of the draw. In fact, studying actually has no impact on a student's quiz score; the actual coefficient, $\beta_x$, equals 0.

$H_0$: $\beta_x = 0$    Cynic is correct: Studying has no impact on a student's quiz score
$H_1$: $\beta_x > 0$    Cynic is incorrect: Additional studying increases quiz scores

Question: Can we rule out the possibility that the cynic might be correct? _____

Lab 8.2

[Figure: probability distribution of coefficient estimates $b_x$, centered at 0]

If the actual coefficient, $\beta_x$, equals 0: Mean[$b_x$] = 0. The distribution mean (center) would equal 0. Why? _____

If $\beta_x = 0$, Prob[$b_x > 0$] ≈ .50. Therefore, the probability that the coefficient estimate from one repetition of the experiment (quiz) would be positive equals about _____.

Step 3: Formulate the question to assess the cynic's view.

Question to Assess Cynic's View:
    o Generic Question: What is the probability that the results would be like those we actually obtained (or even stronger), if the cynic is correct and studying actually has no impact?
    o Specific Question: The regression's coefficient estimate was 1.2: What is the probability that the coefficient estimate in one regression would be 1.2 or more, if $H_0$ were actually true (if the actual coefficient, $\beta_x$, equals 0)?

Answer: Prob[Results IF Cynic Correct] or equivalently Prob[Results IF $H_0$ True]

The size of this probability determines whether or not we reject the null hypothesis:
    o Prob[Results IF $H_0$ True] small: _____ that $H_0$ is true; _____ $H_0$
    o Prob[Results IF $H_0$ True] large: _____ that $H_0$ is true; _____ $H_0$

Prob[Results IF $H_0$ True]: What is the probability that the coefficient estimate in one regression would be 1.2 or more, if $H_0$ were actually true (if the actual coefficient, $\beta_x$, equals 0)?

Step 4: Use the general properties of the estimation procedure, the properties of the probability distribution of the estimates, to calculate Prob[Results IF $H_0$ True].

Since the ordinary least squares estimation procedure for the coefficient value is unbiased, the mean of the probability distribution for the estimate equals _____. If the null hypothesis were true, the actual coefficient would equal _____. The standard error equals _____. The degrees of freedom equal _____.

[Figure: probability distribution of $b_x$: Student t-distribution with Mean = _____, SE = _____, DF = _____]

    o OLS estimation procedure unbiased; assume $H_0$ is true: Mean[$b_x$] = $\beta_x$ = _____
    o Standard Error: SE[$b_x$] = _____
    o Number of Observations − Number of Parameters: DF = _____ − _____ = _____

Econometrics Lab (Lab 8.3): t = _____; Prob[Results IF $H_0$ True] = _____.

An easier way to calculate Prob[Results IF $H_0$ True]: Most statistical software packages make this easy for us. Focus attention on the t-statistic and Prob. columns.

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):    Estimate    SE    t-statistic    Prob
x
Const
Number of Observations: 3
Sum Squared Residuals
SE of Regression
Estimated Equation: Esty = 63 + 1.2x

t-statistic Column: What does the t-statistic column report?

Question: How many standard errors (estimated standard deviations) does the coefficient estimate, 1.2, lie from 0?

Answer: $\frac{\text{Coefficient Column}}{\text{Std. Error Column}}$ = _____ / _____ = _____

Question: What does the t-statistic column equal? Answer: _____
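Step 4 can be sketched as a short calculation: the t-statistic is the distance of the estimate from the null value, measured in standard errors, and Prob[Results IF H0 True] is the right tail of the Student t-distribution beyond it. As before, the standard error is an assumed illustrative value (the table's numbers did not survive), and the closed form used for the tail holds only because DF = 3 − 2 = 1, where the t-distribution is the standard Cauchy distribution.

```python
import math

b_x = 1.2            # coefficient estimate reported in the handout
se_bx = 0.5196       # ASSUMED standard error (illustrative, not from the handout's table)
null_value = 0.0     # value of beta_x if H0 is true

# t-statistic: number of standard errors the estimate lies from the null value.
t_stat = (b_x - null_value) / se_bx

# Right-tail probability for 1 degree of freedom (standard Cauchy):
# P(T >= t) = 1/2 - arctan(t) / pi
prob_results_if_h0_true = 0.5 - math.atan(t_stat) / math.pi

# Tails probability (both tails), the quantity a two-sided Prob. column reports.
tails_prob = 2 * prob_results_if_h0_true

print(f"t-statistic = {t_stat:.3f}")
print(f"Prob[Results IF H0 True] = {prob_results_if_h0_true:.3f}")
print(f"tails probability = {tails_prob:.3f}")
```

The one-tailed figure is the relevant one here because the alternative hypothesis, β_x > 0, points in a single direction.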

Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):    Estimate    SE    t-statistic    Prob
x
Const
Number of Observations: 3
Sum Squared Residuals
SE of Regression
Estimated Equation: Esty = 63 + 1.2x

Prob. Column:

Question: What does the Prob. column report?

Question: What is the tails probability, the probability that the coefficient estimate, $b_x$, resulting from one regression would lie at least 1.2 from 0, if the actual coefficient, $\beta_x$, equals 0? Answer: _____

[Figure: probability distribution of $b_x$: Student t-distribution with Mean = _____, SE = _____, DF = _____]

Question: What does the Prob. column equal? Answer: _____

Prob[Results IF $H_0$ True] = Prob. Column / 2 = _____ / 2 = _____

Step 5: Decide on the standard of proof, a significance level.

The significance level is the dividing line between the probability being small and the probability being large.
    o Prob[Results IF $H_0$ True] Less Than Significance Level: Prob[Results IF $H_0$ True] _____; _____ that $H_0$ is true; _____ $H_0$
    o Prob[Results IF $H_0$ True] Greater Than Significance Level: Prob[Results IF $H_0$ True] _____; _____ that $H_0$ is true; _____ $H_0$

Would we reject $H_0$ at a 1 percent (.01) significance level? _____
Would we reject $H_0$ at a 5 percent (.05) significance level? _____
Would we reject $H_0$ at a 10 percent (.10) significance level? _____

At the traditional significance levels we _____ reject the null hypothesis that studying has no impact on quiz scores.
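The decision rule in Step 5 amounts to halving the tails probability and comparing the result with the chosen significance level. A minimal sketch follows; the Prob. column value of 0.26 is an assumed illustrative figure (the table's numbers did not survive), and both function names are hypothetical.

```python
def one_tailed_prob(prob_column):
    """Halve the two-tailed Prob. column value.

    Valid here because the alternative hypothesis is one-sided (beta_x > 0)
    and the coefficient estimate has the sign the theory predicts.
    """
    return prob_column / 2


def reject_null(prob_results_if_h0_true, significance_level):
    """Reject H0 when Prob[Results IF H0 True] is less than the significance level."""
    return prob_results_if_h0_true < significance_level


prob_column = 0.26   # ASSUMED illustrative tails probability, not from the handout's table
p = one_tailed_prob(prob_column)

for level in (0.01, 0.05, 0.10):
    decision = "reject H0" if reject_null(p, level) else "do not reject H0"
    print(f"significance level {level:.2f}: {decision}")
```

Substituting the actual Prob. column value from a regression printout gives the decision for that regression.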

Summary: The Ordinary Least Squares Estimation Procedure

Now, let us sum up what we have learned about the ordinary least squares (OLS) estimation procedure:

Regression Model

$y_t = \beta_{Const} + \beta_x x_t + e_t$
    o $y_t$ = Dependent variable
    o $x_t$ = Explanatory variable
    o $e_t$ = Error term
    o $t = 1, 2, \ldots, T$; $T$ = Sample size

Role of the Error Term

The error term is a random variable; it represents random influences. The mean of the error term's probability distribution equals 0: Mean[$e_t$] = 0

Standard Ordinary Least Squares (OLS) Premises

    o Error Term Equal Variance Premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[$e$]: $\text{Var}[e_1] = \text{Var}[e_2] = \ldots = \text{Var}[e_T] = \text{Var}[e]$
    o Error Term/Error Term Independence Premise: The error terms are independent: $\text{Cov}[e_i, e_j] = 0$ for $i \neq j$. Knowing the value of the error term from one observation does not help you predict the value of the error term for any other observation.
    o Explanatory Variable/Error Term Independence Premise: The explanatory variables, the $x_t$'s, and the error terms, the $e_t$'s, are not correlated. Knowing the value of an observation's explanatory variable does not help you predict the value of that observation's error term.

The ordinary least squares (OLS) estimation procedure includes three procedures to estimate the:

    o Values of the regression parameters, $\beta_x$ and $\beta_{Const}$:
      $b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$ and $b_{Const} = \bar{y} - b_x \bar{x}$
    o Variance of the error term's probability distribution, Var[$e$]:
      $\text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}}$
    o Variance of the coefficient estimate's probability distribution, Var[$b_x$]:
      $\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$

Properties of the Ordinary Least Squares Estimation Procedure

When the standard regression premises are met:
    o Each estimation procedure is unbiased; each estimation procedure does not systematically underestimate or overestimate the actual value.
    o The ordinary least squares (OLS) estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
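Unbiasedness is a statement about many repetitions of the experiment, so a small simulation can illustrate it. The sketch below repeatedly generates quiz scores from a model whose actual parameters are known, estimates the coefficient each time, and compares the mean and variance of the estimates with β_x and Var[e] / Σ(x_t − x̄)². All numeric settings (the actual parameter values, the error variance, the x values, and normally distributed errors) are assumptions made for illustration; the premises themselves do not require normal errors.

```python
import random


def ols_slope(x, y):
    """Coefficient estimate b_x from the ordinary least squares formula."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    return sxy / sxx


def simulate(beta_const, beta_x, var_e, x, repetitions, seed=0):
    """Repeat the experiment; return mean and sample variance of the b_x estimates."""
    rng = random.Random(seed)
    sd_e = var_e ** 0.5
    estimates = []
    for _ in range(repetitions):
        # Error terms: mean 0, equal variance, independent of each other and of x.
        y = [beta_const + beta_x * xt + rng.gauss(0.0, sd_e) for xt in x]
        estimates.append(ols_slope(x, y))
    mean_b = sum(estimates) / repetitions
    var_b = sum((b - mean_b) ** 2 for b in estimates) / (repetitions - 1)
    return mean_b, var_b


x = [5, 15, 25]                      # ASSUMED minutes studied
beta_const, beta_x, var_e = 50.0, 2.0, 54.0   # ASSUMED actual values
mean_b, var_b = simulate(beta_const, beta_x, var_e, x, repetitions=20000)

x_bar = sum(x) / len(x)
theory_var = var_e / sum((xt - x_bar) ** 2 for xt in x)

print(f"mean of estimates = {mean_b:.3f}  (actual beta_x = {beta_x})")
print(f"variance of estimates = {var_b:.3f}  (Var[e] / sum of squares = {theory_var:.3f})")
```

Individual estimates scatter around the actual value; it is their average over many repetitions that settles near it.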

Causation versus Correlation

Our theory and Step 0 illustrate the important distinction between causation and correlation:

Theory: Additional studying increases quiz scores.

Step 0: Formulate a model reflecting the theory to be tested.

$y_t = \beta_{Const} + \beta_x x_t + e_t$
    o $y_t$ = Quiz score; $\beta_{Const}$ reflects points for showing up
    o $x_t$ = Minutes studied; $\beta_x$ reflects points for each minute studied
    o $e_t$ = Error term

The theory suggests that $\beta_x$ is positive. Theory: $\beta_x > 0$.

Increase in studying ($x_t$) _____ quiz score to increase ($y_t$). Our model is a _____ model.

Question: Does causation imply correlation?
    o If our theory is correct, does knowing the number of minutes a student studies help us predict his/her quiz score? _____
    o If our theory is correct, does knowing a student's quiz score help us predict the number of minutes he/she has studied? _____

Question: Does correlation necessarily imply causation? Consider the Twin Cities, Minneapolis and St. Paul.

Is rainfall in Minneapolis and St. Paul correlated? _____
    o Does knowing whether or not it rains in Minneapolis help us predict whether or not it will rain in St. Paul? _____
    o Does knowing whether or not it rains in St. Paul help us predict whether or not it will rain in Minneapolis? _____

Is there a causal relationship between rainfall in Minneapolis and St. Paul? _____
    o Does rain in Minneapolis cause rain in St. Paul? _____
    o Does rain in St. Paul cause rain in Minneapolis? _____

Summary: Causation versus Correlation
    o Causation _____ correlation.
    o Correlation _____ causation.
