The Simple Linear Regression Model


The Simple Linear Regression Model
Lesson 3
Ryan Safner
Department of Economics, Hood College
ECON 480 - Econometrics, Fall 2017
Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77

Bivariate Data and Relationships

Many uses of statistics, especially in economics and business, investigate relationships between variables:
- # of police & # of crimes
- healthcare spending & avg. lifespan
- government spending & GDP
- sales & profits

We will look at bivariate data: relationships between two variables (e.g. X and Y). Our immediate aim is to explore associations between variables, which we can quantify with measures such as correlation and linear regression. An eventual goal will be to examine causation, a very difficult thing to prove (that will require a course in econometrics).

Bivariate Data and Relationships

We examine many individuals with multiple variables in spreadsheets:
- A row contains data about all variables for a single individual.
- A column contains data about a single variable across all individuals.

Bivariate Data and Relationships

The most basic and helpful way to visualize the relationship between two quantitative variables is a scatterplot. Each data point coordinate (X_i, Y_i) is an individual observation (e.g. a country).

Bivariate Data and Relationships

On the horizontal axis we usually put the independent or explanatory variable (e.g. Economic Freedom Index). On the vertical axis we usually put the dependent or response variable (e.g. GDP per capita).

Bivariate Data and Relationships

We want to look for an association between the independent and dependent variables based on the following factors:
1. Direction: is the trend positive or negative?
2. Form: is the trend linear, quadratic, something else, or no pattern?
3. Strength: is the association strong or weak?
4. Outliers: are there unusual data points that break the trends above?

Correlation

We want a way to quantify the strength of the association between two variables. We can measure the sample correlation:

r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s_X} \right) \left( \frac{Y_i - \bar{Y}}{s_Y} \right)

Notice each parenthetical is a standardized (i.e. Z) score for each variable, so equivalently:

r_{X,Y} = \frac{\sum Z_X Z_Y}{n-1}

Take each coordinate pair, standardize the X and Y values, multiply them, and average the products over n - 1.
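The z-score form of the sample correlation can be checked directly. A minimal Python/NumPy sketch (the course itself uses Stata; the data here are made up for illustration):

```python
import numpy as np

def correlation(x, y):
    """Sample correlation: average the products of z-scores over n - 1."""
    n = len(x)
    zx = (x - x.mean()) / x.std(ddof=1)  # standardize X (sample s.d. uses n - 1)
    zy = (y - y.mean()) / y.std(ddof=1)  # standardize Y
    return (zx * zy).sum() / (n - 1)

# Hypothetical bivariate data (e.g. student-teacher ratios and test scores)
x = np.array([14.0, 16.0, 18.0, 20.0, 22.0, 24.0])
y = np.array([690.0, 681.0, 671.0, 668.0, 652.0, 644.0])
print(correlation(x, y))  # matches np.corrcoef(x, y)[0, 1]
```

The manual z-score average agrees with NumPy's built-in `corrcoef`, confirming the two formulas on the slide are the same quantity.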

Correlation

r_{X,Y} = \frac{\sum Z_X Z_Y}{n-1}

Correlation is standardized to be between -1 and 1:
- Negative values imply a negative association
- Positive values imply a positive association
- A correlation of zero implies no association
- The closer |r| is to 1, the stronger the association; |r| = 1 implies a perfect straight line


Correlation

Try the Guess The Correlation game.

Correlation and Endogeneity

Correlation does not imply causation! There is no way to conclude from correlation alone that X causes Y:
- There may be lurking or confounding variables (e.g. Z) that simultaneously affect both X and Y
- There may be simultaneous or reverse causation (e.g. maybe Y causes X!)

Most of econometrics deals with trying to properly identify causal effects by controlling for lurking variables.

Correlation Example

The correlation between Life Expectancy and Doctors Per Person is 0.705. So should we send more doctors to developing countries to increase their life expectancy? Or are lurking variables at work: income? living standards?

Linear Regression

If an association is roughly linear, we can estimate a line that would fit the data. Recall that a linear equation describing a line can be written as:

Y = a + bX

where a is the vertical intercept and b is the slope of the line. (We will use different symbols for a and b, in line with standard econometric notation.) How do we find the line that best fits the data? By linear regression.

The Population Linear Regression Model

Linear regression lets us estimate the slope of the population regression line between two variables, X and Y. We can then make inferences about the population slope coefficient. Ultimately, we want to estimate the causal effect on Y of a unit change in X:

\frac{\Delta Y}{\Delta X}

i.e. for a one-unit change in X, how many units will this cause Y to change? First, we will focus on fitting a straight line to data on X and Y.

The Population Linear Regression Model

The statistical analyses for linear regression are the same as the ones we looked at for estimating population means, proportions, or differences in means:
- Estimation: fit a line through data to estimate population relationships (slope)
- Hypothesis testing: test if the true slope is a certain value
- Confidence intervals: construct a confidence interval for the true slope

The Population Linear Regression Model

Example: What is the relationship between class size and educational performance? Policy question: What is the effect on test scores of reducing class sizes by 1 student per class? By 10 students?

The Population Linear Regression Model

Example: What is the relationship between class size and educational performance?

[Scatterplot: Test Score (vertical axis) vs. Student-Teacher Ratio (horizontal axis)]

The Population Linear Regression Model

If we change the class size by an amount, what would we expect the change in test scores to be?

\beta_{ClassSize} = \frac{\text{change in test score}}{\text{change in class size}} = \frac{\Delta \text{test score}}{\Delta \text{class size}}

If we knew \beta_{ClassSize}, we could say that increasing (decreasing) class size by 1 student will change test scores by (negative) \beta_{ClassSize}.

The Population Linear Regression Model

Rearranging:

\Delta \text{test score} = \beta_{ClassSize} \times \Delta \text{class size}

Suppose \beta_{ClassSize} = -0.6. If we shrank class size by 2 students, the model predicts:

\Delta \text{test score} = (-0.6) \times (-2) = 1.2

The line relating class size and test scores has the equation:

\text{test score} = \beta_0 + \beta_{ClassSize} \times \text{class size}

\beta_0 is the vertical intercept, the test score where class size is 0. \beta_{ClassSize} is the slope of the regression line. This relationship only holds on average for all districts in the population; individual districts are also affected by other factors.

The Population Linear Regression Model

To get an equation that holds true for each district, we need to include other factors:

\text{test score} = \beta_0 + \beta_{ClassSize} \times \text{class size} + \text{other factors}

For now, we will ignore these until the next lesson. Thus, \beta_0 + \beta_{ClassSize} \times \text{class size} gives the average effect of class size on scores. Later, we will want to estimate the marginal effect of each factor on a district's test score, holding all other factors constant.

The Population Linear Regression Model

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon

- Y is the dependent variable of interest (AKA the response variable, regressand, or left-hand side (LHS) variable)
- X_1 and X_2 are independent variables (AKA explanatory variables, regressors, right-hand side (RHS) variables, covariates, or control variables)
- We have observed values of Y, X_1, and X_2, and regress Y on X_1 and X_2
- \beta_0, \beta_1, and \beta_2 are unknown parameters to estimate
- \epsilon is the error term, incorporating all other factors that affect Y; it is stochastic (random). We can never measure the error term, only make assumptions about it.

The Population Linear Regression Model

How do we draw a line through the scatterplot? We do not know the true \beta_{ClassSize}, but we do have data from a sample of class sizes and test scores. So the real question is: how can we estimate \beta_0 and \beta_1?

The Ordinary Least Squares Estimators

Suppose we have a scatterplot of points (X_i, Y_i). We can draw a line of best fit through our scatterplot. The residual (\hat{\epsilon}_i) of each data point is the difference between the actual and predicted value of Y given X:

\hat{\epsilon}_i = Y_i - \hat{Y}_i

If we were to square each residual and add them all up, we get the Sum of Squared Errors (SSE):

SSE = \sum_{i=1}^{n} \hat{\epsilon}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

The line of best fit minimizes SSE.
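To see the SSE criterion in action, the sketch below (Python/NumPy with made-up data, not from the slides) computes SSE for a candidate line and confirms that nudging the OLS coefficients in any direction can only raise it:

```python
import numpy as np

def sse(b0, b1, x, y):
    """Sum of squared errors for the candidate line y_hat = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return (residuals ** 2).sum()

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.0, 4.5, 6.0])
b1, b0 = np.polyfit(x, y, 1)  # OLS estimates of slope and intercept

# The OLS line attains the minimum: perturbing either coefficient raises SSE.
print(sse(b0, b1, x, y) < sse(b0, b1 + 0.1, x, y))  # True
print(sse(b0, b1, x, y) < sse(b0 + 0.5, b1, x, y))  # True
```

Here `np.polyfit(x, y, 1)` stands in for solving the least-squares problem analytically; any other slope or intercept produces a strictly larger SSE.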

The Ordinary Least Squares Estimators

The ordinary least squares (OLS) estimators of the unknown population parameters \beta_0 and \beta_1 solve the calculus problem:

\min_{\beta_0, \beta_1} SSE = \min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left[ Y_i - \underbrace{(\beta_0 + \beta_1 X_i)}_{\hat{Y}_i} \right]^2

The OLS estimators minimize the average squared distance between the actual values (Y_i) and the predicted values (\hat{Y}_i) along the estimated regression line.

The OLS Regression Line

The OLS regression line or sample regression line is the linear function constructed using the OLS estimators:

\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i

\hat{\beta}_0 and \hat{\beta}_1 ("beta 0 hat" and "beta 1 hat") are the OLS estimators of the population parameters \beta_0 and \beta_1, using sample data. The predicted value of Y given X, based on the regression, is E(Y_i | X_i) = \hat{Y}_i. The residual or prediction error for the i-th observation is the difference between the observed Y_i and its predicted value: \hat{\epsilon}_i = Y_i - \hat{Y}_i.

The OLS Regression Estimators

The solution to the calculus problem yields:

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{s_{XY}}{s_X^2} = \frac{cov(X, Y)}{var(X)}

Equivalently (where r_{X,Y} is the correlation coefficient between X and Y):

\hat{\beta}_1 = r_{X,Y} \frac{s_Y}{s_X}
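The equivalence of the two formulas for \hat{\beta}_1 can be verified numerically. An illustrative Python/NumPy sketch with simulated data (variable names and numbers are invented, loosely echoing the class-size example):

```python
import numpy as np

rng = np.random.default_rng(480)
x = rng.normal(20, 2, size=200)                  # e.g. student-teacher ratios
y = 690 - 2.3 * x + rng.normal(0, 10, size=200)  # simulated test scores

xbar, ybar = x.mean(), y.mean()
# beta1_hat = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2) = s_XY / s_X^2
beta1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
beta0 = ybar - beta1 * xbar                      # beta0_hat = Ybar - beta1_hat * Xbar

# Equivalent form: beta1_hat = r_XY * (s_Y / s_X)
r = np.corrcoef(x, y)[0, 1]
beta1_alt = r * y.std(ddof=1) / x.std(ddof=1)
print(np.isclose(beta1, beta1_alt))  # True: the two formulas agree
```

Both expressions also match what a canned least-squares routine (`np.polyfit`) returns, which is a useful sanity check when implementing the formulas by hand.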

The Ordinary Least Squares Estimators

[Diagram: the fitted line \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X, with slope \hat{\beta}_1 = \Delta Y / \Delta X and vertical intercept \hat{\beta}_0]

The Population Regression Model

[Scatterplot: Test Score vs. Student-Teacher Ratio]

Population regression line: \text{Test score} = \beta_0 + \beta_1 STR

\beta_1 = \frac{\Delta \text{test score}}{\Delta STR} = ??

The Sample OLS Regression Model

[Scatterplot with fitted line: Test Score vs. Student-Teacher Ratio]

Using OLS, we find:

\widehat{\text{Test Score}} = 689.9 - 2.28 \, STR

The Sample OLS Regression Model

\widehat{\text{Test Score}} = 689.9 - 2.28 \, STR

Estimated slope: \hat{\beta}_1 = \frac{\Delta \text{test score}}{\Delta STR} = -2.28

Estimated intercept: \hat{\beta}_0 = 689.9. This is not economically meaningful; it just extrapolates the line from the data. Literally, districts with an STR of 0 have a predicted test score of 689.9.

The Sample OLS Regression Model

\widehat{\text{Test Score}} = 689.9 - 2.28 \, STR

We can now make predictions with our model. For a district with 20 students per teacher, the predicted test score is 689.9 - 2.28(20) = 644.3. Is this estimate big or small? How economically meaningful is it?
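Prediction is just plugging a value into the fitted line. A tiny Python sketch using the estimates reported on the slide (the function name is made up for illustration):

```python
# Estimates reported on the slide: intercept 689.9, slope -2.28
beta0_hat, beta1_hat = 689.9, -2.28

def predict_test_score(str_ratio):
    """Predicted test score for a district with the given student-teacher ratio."""
    return beta0_hat + beta1_hat * str_ratio

print(round(predict_test_score(20), 1))  # 644.3, as on the slide
```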

The Sample OLS Regression Model: In Stata

If we plug this into Stata and run OLS, this is the output. Highlighted in red: top row, the coefficient for STR (\hat{\beta}_1 = -2.280); bottom row, the intercept/constant (\hat{\beta}_0 = 698.93).

\widehat{\text{Test Score}} = 698.93 - 2.280 \, STR

The Sample OLS Regression Model

[Scatterplot: Test Score vs. Student-Teacher Ratio, with Richmond marked; Stock & Watson (2015: p. 113)]

One district in the sample is Richmond, with STR = 20.00 and Test Score = 672.45.

Predicted value: \hat{Y}_{Richmond} = 698 - 2.28(20.00) = 652.4
Residual: \hat{\epsilon}_{Richmond} = 672.4 - 652.4 = 20.0


Measures of Fit: R²

How well does a line fit data? How much variation in Y_i is explained by the model? How tightly clustered around the regression line are the observations? The primary measure of fit is the regression R², the fraction of the sample variance of Y_i explained (predicted) by \hat{Y}_i:

Y_i = \hat{Y}_i + \hat{\epsilon}_i

Observed values of the dependent variable are the sum of the predicted values and the residuals (errors). Recall that OLS has chosen a model specifically to minimize SSE.

Measures of Fit: R²

R^2 = \frac{ESS}{TSS}

R² is the ratio of the sample variance of the model (\hat{Y}_i) to the sample variance of the observations (Y_i), ranging from 0 to 1.

Explained Sum of Squares (ESS): the sum of squared deviations of the predicted values from their mean:

ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2

Total Sum of Squares (TSS): the sum of squared deviations of the observed values from their mean:

TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2

Measures of Fit: R²

Alternatively, R² can be written in terms of the fraction of the variance of the observations Y_i not explained by the model:

R^2 = 1 - \frac{SSE}{TSS}

Sum of Squared Errors (SSE): recall SSE = \sum_{i=1}^{n} \hat{\epsilon}_i^2. Note: you may see this called the sum of squared residuals (SSR). Lastly, the R² of the regression is also equal to the square of the correlation coefficient between X and Y:

R^2 = (r_{XY})^2
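All three expressions for R² can be checked on toy data. An illustrative Python/NumPy sketch (hypothetical numbers; the equalities hold because TSS = ESS + SSE for OLS with an intercept):

```python
import numpy as np

x = np.array([14.0, 16.0, 18.0, 20.0, 22.0, 24.0])
y = np.array([690.0, 680.0, 672.0, 665.0, 651.0, 643.0])

b1, b0 = np.polyfit(x, y, 1)           # OLS fit
yhat = b0 + b1 * x                     # predicted values
ess = ((yhat - y.mean()) ** 2).sum()   # explained sum of squares
tss = ((y - y.mean()) ** 2).sum()      # total sum of squares
sse = ((y - yhat) ** 2).sum()          # sum of squared errors (residuals)

r2_a = ess / tss                       # R^2 = ESS / TSS
r2_b = 1 - sse / tss                   # R^2 = 1 - SSE / TSS
r2_c = np.corrcoef(x, y)[0, 1] ** 2    # R^2 = r_XY^2
print(np.isclose(r2_a, r2_b), np.isclose(r2_a, r2_c))  # True True
```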

Measures of Fit: SER

The standard error of the regression (SER, \hat{\sigma}, or \hat{\sigma}_\epsilon) is an estimator of the standard deviation of \epsilon_i:

\hat{\sigma} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{\epsilon}_i^2} = \sqrt{\frac{SSE}{n-2}}

It measures the spread of the observations around the regression line, the average size of the residual error. The df correction of n - 2 reflects the use of 2 degrees of freedom to find \beta_0 and \beta_1. Stata gives us the Root Mean Squared Error (Root MSE), which divides by n rather than n - 2.
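The SER and Stata's Root MSE differ only in the degrees-of-freedom divisor, as a quick Python/NumPy sketch shows (hypothetical data):

```python
import numpy as np

x = np.array([14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0])
y = np.array([691.0, 679.0, 673.0, 664.0, 653.0, 649.0, 640.0])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)
sse = (residuals ** 2).sum()
n = len(y)

ser = np.sqrt(sse / (n - 2))  # divides by n - 2 (two df used for beta0, beta1)
root_mse = np.sqrt(sse / n)   # divides by n, as Stata's Root MSE does
print(ser > root_mse)         # True: the df correction makes SER slightly larger
```

With large n the two are nearly identical; the distinction matters mostly in small samples.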

Measures of Fit: Example

The R² of the regression is 0.051, so 5.1% of the variation in Test Scores is explained by the variation in Student-Teacher Ratios. The SER ("Root MSE") is 18.6, the standard deviation of the residuals. A large spread from the line means predictions will be off by a lot! It indicates there are other important factors that also influence test scores. Note: it is very rare in econo(metr)ics that we get very high R² values, due to the tons of unobserved variables affecting economic outcomes. Don't get discouraged!

Measures of Fit: Looking at Residuals

[Scatterplot: Test Score vs. Student-Teacher Ratio]

Recall that for every data point, the equation of the line constructs a predicted value \hat{Y} given X. This is different from the actual value Y_i; the difference (positive or negative) between the two is the residual error.

Measures of Fit: Looking at Residuals

[Residual plot: residuals vs. Student-Teacher Ratio]

We can construct a residual plot to examine the residuals (\hat{\epsilon}_i = Y_i - \hat{Y}_i). Stronger relationships should have small residuals, with data points more tightly concentrated around the regression line. We now turn to the question of quantifying just how well the line fits the data.

Distribution of OLS Estimators

OLS estimators (\hat{\beta}_0, \hat{\beta}_1) are computed from a specific sample of data. There are two sources of randomness in our estimate:
1. Sampling randomness: different samples will generate different OLS estimators
2. Modeled randomness: \epsilon includes all factors affecting Y other than X, and different samples have different values of those other factors

Thus, \hat{\beta}_0 and \hat{\beta}_1 are also random variables, with their own sampling distribution.

Distribution of OLS Estimators

The central limit theorem allows us to say that the distributions of \hat{\beta}_0 and \hat{\beta}_1 are normal. It is generally agreed that n > 100 is sufficient.

\hat{\beta}_1 \sim N(\beta_1, \sigma_{\hat{\beta}_1})

[Figure: normal sampling distribution of \hat{\beta}_1 centered at \beta_1]

Distribution of OLS Estimators

This is similar to the sampling distribution of sample means (\bar{X}). We care about the sampling distribution of \hat{\beta}_1 (\hat{\beta}_0 is less useful):
- What is E[\hat{\beta}_1]? (Where is the center?)
- What is var[\hat{\beta}_1]? (How precise is our estimate?)

Exogeneity and Unbiasedness

We want to see if \hat{\beta}_1 is unbiased: there is no systematic difference, on average, between sample values of \hat{\beta}_1 and the true population \beta_1, i.e. E[\hat{\beta}_1] = \beta_1. This doesn't mean every sample gives us \hat{\beta}_1 = \beta_1, only that the estimation procedure will, on average, yield the correct value: on average, random errors above and below the true value cancel. Long story short: \hat{\beta}_1 is an unbiased estimator of \beta_1, i.e. E[\hat{\beta}_1] = \beta_1, when X is exogenous (see handouts for proof).

Exogeneity and Unbiasedness

Recall, an independent variable (X) is exogenous if it is unrelated to other factors affecting Y, i.e. corr(X, \epsilon) = 0. Technically, this is called the Zero Conditional Mean Assumption:

E(\epsilon | X) = 0

For any known value of X, the expected value of \epsilon is 0. Knowing the value of X must tell us nothing about the value of \epsilon (anything else relevant to Y other than X). We can then confidently assert causation: X → Y.

Endogeneity and Bias

Nearly all independent variables are endogenous: they are related to the error term \epsilon, i.e. corr(X, \epsilon) ≠ 0. Suppose we estimate the following relationship:

\text{Violent crimes}_t = \beta_0 + \beta_1 \text{Ice cream sales}_t + \epsilon_t

We find \hat{\beta}_1 > 0. Does this mean Ice cream sales → Violent crimes? Tell me a story...

Endogeneity and Bias

The true expected value of \hat{\beta}_1 is actually (see handouts for proof):

E[\hat{\beta}_1] = \beta_1 + corr(X, \epsilon) \frac{\sigma_\epsilon}{\sigma_X}

Takeaways:
- If X is exogenous, corr(X, \epsilon) = 0, and we're just left with \beta_1
- The larger |corr(X, \epsilon)| is, the larger the bias (E[\hat{\beta}_1] - \beta_1)
- We can also sign the direction of the bias based on corr(X, \epsilon):
  - Positive corr(X, \epsilon) overestimates the true \beta_1 (\hat{\beta}_1 is too high)
  - Negative corr(X, \epsilon) underestimates the true \beta_1 (\hat{\beta}_1 is too low)
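A small Monte Carlo sketch (illustrative, simulated data; not from the slides) shows the bias formula at work. Here a lurking variable drives both X and the error, so corr(X, ε) = 0.5 by construction and OLS overestimates the true slope of 1.0:

```python
import numpy as np

rng = np.random.default_rng(7)
true_beta1, slopes = 1.0, []
for _ in range(1000):
    z = rng.normal(size=300)        # a lurking variable
    x = z + rng.normal(size=300)    # X depends on Z...
    eps = z + rng.normal(size=300)  # ...and so does the error: corr(X, eps) = 0.5
    y = 2.0 + true_beta1 * x + eps
    slopes.append(np.polyfit(x, y, 1)[0])

# Predicted bias: corr(X, eps) * sigma_eps / sigma_X = 0.5 * sqrt(2)/sqrt(2) = 0.5
print(np.mean(slopes))  # close to 1.5, not the true value 1.0
```

Averaged over many samples, the OLS slope centers on 1.5 rather than 1.0, exactly the bias the formula predicts.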

Endogeneity and Bias Example

\text{wages} = \beta_0 + \beta_1 \text{educ} + \epsilon

Is this an accurate reflection of educ → wages? Does E[\epsilon | educ] = 0? What would E[\epsilon | educ] > 0 mean?

Endogeneity and Bias Example

\text{per capita cigarette consumption} = \beta_0 + \beta_1 \text{State cig tax rate} + \epsilon

Is this an accurate reflection of tax → cons? Does E[\epsilon | tax] = 0? What would E[\epsilon | tax] > 0 mean?

Exogeneity and RCTs

Think about an idealized randomized controlled experiment: subjects are randomly assigned to a treatment or control group. This implies that knowing whether someone is treated (X) tells us nothing about their personal characteristics (\epsilon). Random assignment makes \epsilon independent of X, so corr(X, \epsilon) = 0 and E[\epsilon | X] = 0.

Precision and Variance of β̂1

So we know the center (expected value) of the sampling distribution of \hat{\beta}_1 is \beta_1. What about the spread, or variance?

[Figure: two sampling distributions centered at \beta_1, one with small variance, one with large variance]

Precision and Variance of β̂1

The variance of \hat{\beta}_1 measures how precise our estimate of the slope is:

var(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{n \cdot var(X)}

where \hat{\sigma}^2 is the variance of the regression:

\hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{1}{n-2} \sum_{i=1}^{n} \hat{\epsilon}_i^2

Recall we've seen the standard error of the regression (SER), \hat{\sigma}, as the square root of this! The standard error of \hat{\beta}_1 is the square root of the variance:

se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{n \cdot var(X)}}
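A hedged Python/NumPy sketch of this formula with simulated data. Note that if var(X) here is the population-style variance (dividing by n), then n · var(X) equals Σ(X_i − X̄)², which the code checks:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(20, 2, size=150)
y = 690 - 2.3 * x + rng.normal(0, 15, size=150)

n = len(y)
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)
sigma2_hat = (residuals ** 2).sum() / (n - 2)  # variance of the regression

# x.var() divides by n (ddof=0), so n * x.var() = sum of (X - Xbar)^2
var_beta1 = sigma2_hat / (n * x.var())
se_beta1 = np.sqrt(var_beta1)
print(se_beta1 > 0)  # True
```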

Precision and Variance of β̂1

The variance of \hat{\beta}_1 is affected by three things:
1. Model fit, measured by the variance (or S.E.) of the regression, \hat{\sigma}^2: larger \hat{\sigma}^2, larger var(\hat{\beta}_1)
2. Sample size n: larger n, lower var(\hat{\beta}_1)
3. Variation in X: larger var(X), lower var(\hat{\beta}_1)

Precision and Variance of β̂1

[Scatterplot: Test Score vs. Student-Teacher Ratio, showing a subsample (light dots) with small var(X_i) against the full sample (all dots) with large var(X_i)]

Smaller var(X_i) (light dots) means larger var(\hat{\beta}_1): harder to determine a precise slope! Larger var(X_i) (all dots) means smaller var(\hat{\beta}_1): easier to determine a precise slope!

Solvable Problems: Heteroskedasticity and Autocorrelation

Now we want to look at how the errors (\hat{\epsilon}_i) are distributed:
- Homoskedastic: errors have the same variance over all levels of X: var(\hat{\epsilon} | X) = \hat{\sigma}_\epsilon^2
- Heteroskedastic: errors have different variance over levels of X: var(\hat{\epsilon} | X) ≠ \hat{\sigma}_\epsilon^2

Heteroskedasticity will not cause \hat{\beta}_1 to be biased! But it does mess with the variance of \hat{\beta}_1, causing it to overstate statistical significance in hypothesis tests!

Heteroskedasticity

We've already seen the standard error of \hat{\beta}_1:

se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{n \cdot var(X)}}

But this assumes errors are homoskedastic. When errors are heteroskedastic, the standard error becomes:

se(\hat{\beta}_1) = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{\epsilon}_i^2}{\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2}}

These are known as the heteroskedasticity-robust (or just "robust") standard errors of \hat{\beta}_1. No need to memorize the formula (there is an easy fix in Stata), but know what robust standard errors are and when we need them!
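A Python/NumPy sketch of the robust formula as written on the slide, on simulated heteroskedastic data (this is the simplest "HC0"-style version, without small-sample corrections; the data-generating process is invented):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(10, 30, size=500)
# Heteroskedastic errors: the spread of the error grows with x
y = 690 - 2.3 * x + rng.normal(0, 0.5 * x, size=500)

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
dx = x - x.mean()

# Homoskedasticity-only SE vs. the robust SE from the slide's formula
se_homosk = np.sqrt(((resid ** 2).sum() / (len(y) - 2)) / (dx ** 2).sum())
se_robust = np.sqrt((dx ** 2 * resid ** 2).sum() / ((dx ** 2).sum()) ** 2)
print(se_robust, se_homosk)  # the two disagree when errors are heteroskedastic
```

When the errors really are homoskedastic the two estimates converge; the robust version simply weights each squared residual by its leverage.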

Homoskedasticity

[Figure: Stock & Watson (2011: 156)]

E[\epsilon | X] = 0 (exogenous); the variance of \epsilon does not depend on X.

Heteroskedasticity

[Figure: Stock & Watson (2011: 156)]

E[\epsilon | X] = 0 (exogenous); the variance of \epsilon does depend on X.

Variance of the Errors

[Figure: Stock & Watson (2011: 162)]

Homoskedastic or heteroskedastic?

Heteroskedasticity-Robust Standard Errors

Using the robust option, Stata computes heteroskedasticity-robust standard errors; otherwise, Stata defaults to homoskedasticity-only standard errors!

Outliers

An outlier is an observation that is strongly different from the rest of the sample. Outliers may bias our OLS estimates by having a strong influence on the shape of the line.

[Scatterplot: Test Score vs. Student-Teacher Ratio, with an outlying cluster of points]

Outliers

Often outliers are simply the result of human error in recording data: e.g. suppose you are recording height in inches (e.g. 60) and accidentally record one person's in centimeters (e.g. 130). But outliers may also be important and valid parts of the observed effect. Always check your data!
- Scatterplot the data and look for weird data points
- Run different models and see how adding/dropping outliers affects the OLS estimates
- In Stata: the dfbeta command

Hypothesis Testing: Overview

Objective: test a hypothesis (\beta_1 = \beta_{1,0}) using data.

H_0 and a two-sided alternative:
H_0: \beta_1 = \beta_{1,0}
H_1: \beta_1 \neq \beta_{1,0}

H_0 and a one-sided alternative:
H_0: \beta_1 = \beta_{1,0}
H_1: \beta_1 > \beta_{1,0}

Hypothesis Testing: Overview

General approach: construct a t-statistic and compute a p-value (or compare to the N(0,1) critical value, the Z-score):

t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}

Again, SE(\text{estimator}) = \sqrt{var(\text{estimator})}. Recall, for testing the mean of Y:

t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}

For testing \beta_1:

t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}

where SE(\hat{\beta}_1) is the square root of an estimator of the variance of the sampling distribution of \hat{\beta}_1.

Hypothesis Testing: Overview

Construct the t-statistic:

t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}

Reject H_0 at the 5% significance level if |t| > 1.96. The p-value = P[|t^*| > |t|], the probability in the tails of the normal distribution beyond the computed |t|; reject H_0 if p < 0.05. For large samples, the t-statistic is distributed as standard normal, so p-value = P(|Z^*| > |t|) = 2\Phi(-|t|).
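The test can be reproduced in a few lines of Python using the numbers from the next slide's example; the p-value here uses the large-sample standard normal approximation:

```python
import math

beta1_hat, beta1_null, se = -2.28, 0.0, 0.52  # estimates from the slides

t = (beta1_hat - beta1_null) / se
# Two-sided p-value from the standard normal: p = 2 * Phi(-|t|)
phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
p_value = 2 * phi(-abs(t))

print(round(t, 2))    # -4.38
print(abs(t) > 1.96)  # True: reject H0 at the 5% level
print(p_value < 0.05) # True
```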

Hypothesis Testing: Example

Estimated regression line: \widehat{\text{TestScore}} = 689.9 - 2.28 \, STR

Stata estimates the SEs: SE(\hat{\beta}_0) = 10.4, SE(\hat{\beta}_1) = 0.52

H_0: \beta_1 = 0, H_1: \beta_1 \neq 0

t-statistic: t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} = \frac{-2.28 - 0}{0.52} = -4.38

Since |-4.38| > 1.96 for \alpha = 0.05, we can reject H_0.

Confidence Intervals

Recall that a 95% confidence interval is:
- The set of points that cannot be rejected at the 5% significance level
- An interval that is a function of the data and contains the true parameter value 95% of the time over repeated samples

For large samples, since t is distributed standard normal N(0,1), the 95% confidence interval for \hat{\beta}_1 is similar to the case of the sample mean:

CI_{0.95}^{\hat{\beta}_1} = \hat{\beta}_1 \pm 1.96 \, SE(\hat{\beta}_1)

Confidence Intervals: Example

Estimated regression line: \widehat{\text{TestScore}} = 689.9 - 2.28 \, STR

Stata estimates the SEs: SE(\hat{\beta}_0) = 10.4, SE(\hat{\beta}_1) = 0.52

95% confidence interval for \hat{\beta}_1:

\hat{\beta}_1 \pm 1.96 \, SE(\hat{\beta}_1) = -2.28 \pm 1.96(0.52) = (-3.30, -1.26)
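The interval arithmetic, sketched in Python with the slide's numbers:

```python
beta1_hat, se = -2.28, 0.52  # slope estimate and standard error from the slides

lower = beta1_hat - 1.96 * se
upper = beta1_hat + 1.96 * se
print(round(lower, 2), round(upper, 2))  # -3.3 -1.26, matching the slide
# Zero lies outside the interval, consistent with rejecting H0: beta1 = 0
print(not (lower <= 0 <= upper))         # True
```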

Conventional Way to Report Regressions

\widehat{\text{Test Score}} = \underset{(10.4)}{689.9} - \underset{(0.52)}{2.28} \, STR, \quad R^2 = 0.05, \quad SER = 18.6

(standard errors in parentheses beneath each coefficient)

Stata Output of Regression Example

\widehat{\text{TestScore}} = \underset{(10.4)}{689.9} - \underset{(0.52)}{2.28} \, STR, \quad R^2 = 0.05, \quad SER = 18.6

t(\beta_1 = 0) = -4.38, p-value = 0.000 (2-sided)

95% confidence interval for \beta_1: (-3.30, -1.26)

Summary of Statistical Inference About β0 and β1

Estimation:
- OLS estimators of \beta_0 and \beta_1: \hat{\beta}_0 and \hat{\beta}_1
- \hat{\beta}_0 and \hat{\beta}_1 have approximately normal sampling distributions for large n

Testing:
- H_0: \beta_1 = \beta_{1,0} vs. H_1: \beta_1 \neq \beta_{1,0} (\beta_{1,0} is the value of \beta_1 under H_0)
- t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}
- p-value: area under the standard normal distribution (Z) outside |t| (for large n)

Confidence intervals:
- The 95% confidence interval for \beta_1 is \hat{\beta}_1 \pm 1.96 \, SE(\hat{\beta}_1)
- This is the set of \beta_1 not rejected at the 5% level, and it contains the true \beta_1 in 95% of all samples

Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 77 / 77