Economics 113: Simple Regression Models
Topics: simple regression assumptions; simple regression derivation; changing units of measurement; nonlinear effects; OLS and unbiased estimates; variance of the OLS estimates.
Book chapters: Chapters 1, 2, and 3 are relevant for the next few lectures. Books have been placed on reserve at the Science and Engineering Library.
Econometrics Intro
Statistics applied to non-experimental data. Estimate relationships and describe behaviors. Usually comes from an economic model: utility maximization, profit maximization, government objectives, political objectives.
Econometrics Example
Model of criminal activity: y = hours spent breaking into cars, y = f(x_1, x_2, ..., x_{m-1}, x_m). What should be included in the function? What do we do with our model and estimates? Form individual or joint hypotheses about the variables and their effects. Generate predictions.
Econometrics Causality
Causality: the most important thing you will learn in this class! Very hard to determine in a non-experimental setting. Examples:
Maternal smoking and infant birth weight: are smoking and non-smoking mothers otherwise the same?
Wages and labor supply: you only observe a wage if you have a job, and people in the labor force have a job for a reason.
Temperatures and CO2: more CO2 leads to higher temperatures, but higher temperatures also lead to more CO2.
Econometrics Data
Cross-sectional data: a sample of agents taken at one point in time. Ideally the data are a random sample and observations are independent. Are cross-sectional observations independent?
Time-series data: repeat observations on specific agents. Are time-series observations independent?
Panel data: repeat observations on the same agents in different time periods. Ideal data, but difficult to get; panel data can be used to analyze individual-specific differences. Are panel observations independent?
Simple Regression Model Intro
How does y change with x? y can be called the dependent variable, explained variable, or LHS variable. x can be called the independent variable, explanatory variable, or RHS variable.
y = β_0 + β_1 x + u
u is the error term, or "disturbance" term; u contains everything that we don't control for, both observed and unobserved. β_1 is the slope parameter; β_0 is the intercept parameter.
Simple Regression Model An Example
Example: class attendance and grades: grade_i = β_0 + β_1 Attend_i + u_i. How do we interpret β_0 and β_1? Suppose we estimate:
grade_i = 22.769 + 0.121 Attend_i
Each additional class attended is associated with a grade higher by 0.121. Is this causal? When does β_1 summarize a causal relationship between Attend and grade?
Simple Regression Model The Assumptions
General framework: y_i = β_0 + β_1 x_i + u_i
Assumption 1: E(u) = 0. This is innocuous as long as we have an intercept in the model.
Assumption 2: E(u|x) = E(u). Combined with Assumption 1 this gives us E(u|x) = 0. This means that given any x, the value of u we expect will be 0. This is not necessarily realistic. This is the hard assumption to satisfy.
Simple Regression Model The Example
Example: class attendance and grades: grade_i = β_0 + β_1 Attend_i + u_i. The key: u contains all the variables, other than Attend, that help determine your grade! Can you list some of these variables? For example, for A2 to hold, we would need E(u | Attend = 32) = E(u | Attend = 10). What does this mean? Is this likely?
Least Squares Regression The Derivation
How do we estimate β_0 and β_1?
Predicted value: ŷ_i = β̂_0 + β̂_1 x_i
Residual: û_i = y_i − ŷ_i
Suppose that we choose to minimize the sum of squared errors:
min over β̂_0, β̂_1 of Σ_{i=1}^n û_i²
Thus: take derivatives!
min over β̂_0, β̂_1 of Σ_{i=1}^n (y_i − β̂_0 − β̂_1 x_i)²   (1)
Least Squares Regression The Derivation
Differentiate Σ_{i=1}^n (y_i − β̂_0 − β̂_1 x_i)² with respect to β̂_0:
−2 Σ_{i=1}^n (y_i − β̂_0 − β̂_1 x_i) = 0
Divide by −2, divide by n:
(1/n) Σ_{i=1}^n (y_i − β̂_0 − β̂_1 x_i) = 0
To which assumption does this equation correspond? E(u) = 0
Least Squares Regression The Derivation
Differentiate Σ_{i=1}^n (y_i − β̂_0 − β̂_1 x_i)² with respect to β̂_1:
−2 Σ_{i=1}^n x_i (y_i − β̂_0 − β̂_1 x_i) = 0
Divide by −2, divide by n:
(1/n) Σ_{i=1}^n x_i (y_i − β̂_0 − β̂_1 x_i) = 0
To which assumption does this equation correspond? E(xu) = 0, which is implied by E(u|x) = 0
Least Squares Regression The Derivation
Combining the two equations, and after lots of algebra, we get:
β̂_1 = Σ_{i=1}^n (x_i − μ_x)(y_i − μ_y) / Σ_{i=1}^n (x_i − μ_x)²
Another way to write this: β̂_1 = σ_xy / σ_x²
To solve for β̂_0, take means of y_i = β_0 + β_1 x_i + u_i and rearrange: β̂_0 = μ_y − β̂_1 μ_x
We can also solve for the residuals: û_i = y_i − ŷ_i = y_i − (β̂_0 + β̂_1 x_i)
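A minimal sketch of these closed-form estimators on simulated data. The data-generating values (β_0 = 1, β_1 = 2, unit-variance errors) and the use of numpy are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + u  # hypothetical population: beta_0 = 1, beta_1 = 2

# Closed-form OLS estimates from the derivation
mu_x, mu_y = x.mean(), y.mean()
beta1_hat = np.sum((x - mu_x) * (y - mu_y)) / np.sum((x - mu_x) ** 2)
beta0_hat = mu_y - beta1_hat * mu_x

# Residuals: u_hat_i = y_i - (beta0_hat + beta1_hat * x_i)
u_hat = y - (beta0_hat + beta1_hat * x)
```

Note that the two first-order conditions hold exactly in the sample: the residuals average to zero and are uncorrelated with x, the sample analogues of E(u) = 0 and E(xu) = 0.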
Simple Regression Model Diagnostic Measures
SST: total sum of squares, measures the total amount of variability in the dependent variable: SST = Σ_{i=1}^n (y_i − μ_y)²
SSR: sum of squared residuals, measures the total amount of variability that the model does not explain: SSR = Σ_{i=1}^n û_i²
R-squared: R² = 1 − SSR/SST. Measures the variation "explained" by the model. Often misinterpreted as "goodness of fit".
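These diagnostics follow directly from the residuals. A sketch on simulated data; the data-generating process here is a hypothetical choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
y = 3.0 - 2.0 * x + rng.normal(0, 3, n)  # assumed DGP, for illustration only

# OLS fit via the closed-form estimators
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
u_hat = y - (beta0 + beta1 * x)

SST = np.sum((y - y.mean()) ** 2)  # total variability in y
SSR = np.sum(u_hat ** 2)           # variability the model does not explain
R2 = 1 - SSR / SST                 # share of variation "explained"
```

Because SSR can never exceed SST in a model with an intercept, R² always lands between 0 and 1.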
OLS Changing Units of Measurement
Data scaling: predictions in different units, different interpretations. Example:
wage = β_0 + β_1 educ + β_2 tenure + u
where educ is in years, tenure is years on the job, and wage is in dollars. Estimates:
ŵage = β̂_0 + β̂_1 educ + β̂_2 tenure
Again, the u vanishes since E[u | educ, tenure] = 0.
OLS Changing Units of Measurement
Wage in cents rather than dollars? wage_dollars = (1/100) wage_cents
Original equation: ŵage_dollars = β̂_0 + β̂_1 educ + β̂_2 tenure
Substitute: (1/100) ŵage_cents = β̂_0 + β̂_1 educ + β̂_2 tenure
ŵage_cents = 100 β̂_0 + 100 β̂_1 educ + 100 β̂_2 tenure
What if we want to measure tenure in months? tenure_years = (1/12) tenure_months
Substitute: ŵage = β̂_0 + β̂_1 educ + β̂_2 (1/12) tenure_months = β̂_0 + β̂_1 educ + (β̂_2/12) tenure_months
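The scaling result can be checked numerically: multiplying the dependent variable by 100 multiplies every estimated coefficient by 100. A sketch with made-up wage data (the numbers and the single-regressor simplification are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
educ = rng.uniform(8, 20, n)
wage_dollars = 2.0 + 0.5 * educ + rng.normal(0, 1, n)  # hypothetical wages in dollars
wage_cents = 100 * wage_dollars                        # same wages, measured in cents

def ols(x, y):
    """Return (intercept, slope) from the closed-form simple-OLS estimators."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0_d, b1_d = ols(educ, wage_dollars)
b0_c, b1_c = ols(educ, wage_cents)
# b0_c = 100 * b0_d and b1_c = 100 * b1_d, up to floating-point error
```

Nothing substantive changes: fitted values and R² are identical, only the units of the coefficients move.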
OLS Handling non-linearity
Not everything is linear in real life. Is the relationship between education and wage linear? No. Which has the higher benefit: 3 more years after 6th grade, or 3 more years after undergrad? Common ways to easily handle non-linearity:
1. Take logs of the dependent variable
2. Take logs of the independent variable
3. Take logs of both
OLS Wage in Levels [Figure: scatter plot of wage (in dollars) against educ]
OLS Wage in Logs [Figure: scatter plot of log(wage) against educ]
OLS Handling non-linearity
If data are in levels: ŵage = β̂_0 + β̂_1 educ. How do we interpret β̂_1? Totally differentiate: Δŵage = β̂_1 Δeduc. Simplify: Δŵage/Δeduc = β̂_1.
Interpret β̂_1 in the following results: ŵage = 15,432 + 1,324 educ. For each additional year of education, you earn $1,324 more.
OLS Handling non-linearity
If wage is in logs: log(wage) = β_0 + β_1 educ. How do we interpret β_1? Totally differentiate: Δwage/wage = β_1 Δeduc. Simplify: 100 × (Δwage/wage) [% change] = (100 β_1) × Δeduc [unit change].
Interpret β_1 in the following results: log(ŵage) = 9.64 + 0.08 educ. A one-year increase in education yields an 8% increase in wage.
OLS Handling non-linearity
If wage and educ are in logs: log(wage) = β_0 + β_1 log(educ). How do we interpret β_1? Totally differentiate: Δwage/wage = β_1 (Δeduc/educ). Simplify: 100 × (Δwage/wage) [% change] = β_1 × 100 × (Δeduc/educ) [% change].
Interpret β_1 in the following results: log(ŵage) = 9.64 + 0.5 log(educ). A 1% increase in education yields a 0.5% increase in wage.
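Worth remembering: the "100 β_1 percent" reading of a log-level coefficient is an approximation that is accurate only for small β_1. The exact percent change implied by a one-unit increase in educ is 100(e^{β_1} − 1). A quick numeric check, using the coefficients from the examples above:

```python
import math

beta1_loglevel = 0.08                       # from log(wage) = 9.64 + 0.08 educ
approx_pct = 100 * beta1_loglevel           # the quick "8%" reading
exact_pct = 100 * (math.exp(beta1_loglevel) - 1)  # exact percent change, about 8.33%

beta1_loglog = 0.5                          # from log(wage) = 9.64 + 0.5 log(educ)
# In the log-log model, beta_1 is an elasticity:
# a 1% increase in educ is associated with about a 0.5% increase in wage.
```

For a coefficient like 0.08 the gap (8% vs. 8.33%) is small; for large coefficients the exact formula matters.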
Simple Regression Model Biased or unbiased
When is β̂_1 a good estimate, where "good" is defined as unbiased? By unbiased, E(β̂_1 | x) = β_1: the β̂_1's are centered around β_1. Unbiased if the following assumptions hold!
1. Linear in parameters: y_i = β_0 + β_1 x_i + u_i
2. Random sample of size n: {(x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n)}
3. Zero conditional mean: E(u|x) = 0
4. σ_x² > 0 (there is variation in x)
Simple Regression Model Biased or unbiased
Simple example. Suppose that the population is characterized by: y = 3 − 2x_1 + u, with β_0 = 3, β_1 = −2, u distributed normal with mean 0 and sd 3, x's between 0.01 and 10, spaced evenly, 1000 people. Estimate using: y = β_0 + β_1 x_1 + u. Plot y on x.
[Figure: scatter plot of the simulated population, y against x]
Simple Regression Model Biased or unbiased
Suppose that we sample 30 people from the population and estimate β_1 via OLS.
First sample: β̂_1 = −1.951. Second sample: β̂_1 = −1.890. Third sample: β̂_1 = −1.559.
They're all wrong. Is this a problem? Keep sampling!! Sample 1000 times and plot a histogram of the estimates of β_1. How does the distribution of estimates compare to −2?
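The resampling experiment can be sketched as follows. The population matches the slide's setup (y = 3 − 2x + u, u ~ N(0, 3), 1000 evenly spaced x's, samples of 30); the seed and the without-replacement sampling are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(113)
x_pop = np.linspace(0.01, 10, 1000)
y_pop = 3.0 - 2.0 * x_pop + rng.normal(0, 3, 1000)  # population per the slide

def ols_slope(x, y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Draw 1000 samples of 30 people; estimate beta_1 in each
estimates = np.array([
    ols_slope(x_pop[idx], y_pop[idx])
    for idx in (rng.choice(1000, size=30, replace=False) for _ in range(1000))
])
# Each individual estimate is "wrong", but the distribution is centered near -2
```

Any single β̂_1 misses β_1, yet the histogram of estimates piles up around the truth: that is what unbiasedness means.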
Histogram of Beta1 [Figure: density of the β̂_1 estimates, ranging from about −2.5 to −1.5]
OLS - Variance Basics
If Assumptions 1-4 hold, β̂_1 is centered around β_1. Central tendency says nothing about dispersion. We are also interested in estimating Var(β̂_1). From the previous histogram, there is variance in the estimate β̂_1. Is the estimate of β_1 precise/reliable?
Assumption 5 - Homoskedastic errors: Var[u|x] = σ². The variance of the errors is common across x. Assumptions 1-5 are called the "Gauss-Markov assumptions". If Var[u|x] ≠ Var[u], the errors are heteroskedastic.
[Figure: scatter of y against x with homoskedastic errors]
[Figure: scatter of y against x with heteroskedastic errors]
OLS - Variance Estimate Variance
Variance of the slope parameter: Var(β̂_1) = σ² / Σ_{i=1}^n (x_i − μ_x)²
What do I need for these variance estimates? An estimate of σ²: σ̂² = (1/(n−2)) Σ_{i=1}^n û_i²
Why n − 2? σ̂² requires estimating β_0 and β_1.
OLS Estimate Variance
Standard error of β̂_1: se(β̂_1) = sqrt( σ̂² / Σ_{i=1}^n (x_i − μ_x)² )
Dispersion of β̂_1 around β_1, on the same scale as β̂_1. How does σ affect the precision of our estimates? Why? A higher σ yields higher standard errors (lower precision). With higher σ, there is more noise, and thus it is harder to get a precise estimate of β_1. Using the original example, compare the following two situations: u distributed normal, mean 0 and sd 10; u distributed normal, mean 0 and sd 3.
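Putting the pieces together, σ̂² and se(β̂_1) can be computed by hand for the two noise levels the slide compares. A sketch using the same population form as before; the seed and helper function are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0.01, 10, 1000)

def se_beta1(sd_u):
    """Simulate y = 3 - 2x + u with error sd sd_u, fit OLS, return se(beta1_hat)."""
    y = 3.0 - 2.0 * x + rng.normal(0, sd_u, x.size)
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - (b0 + b1 * x)
    sigma2_hat = np.sum(u_hat ** 2) / (x.size - 2)  # n - 2: two parameters estimated
    return np.sqrt(sigma2_hat / Sxx)

se_sd3, se_sd10 = se_beta1(3), se_beta1(10)
# Larger error sd -> larger standard error -> less precise beta1_hat
```

With sd(u) = 10 the standard error is roughly three times as large as with sd(u) = 3, which is exactly the widening the next histograms display.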
Histogram of Beta1 - SD(u)=10 [Figure: density of the β̂_1 estimates, ranging from about −5 to 0]
Histogram of Beta1 - Adding SD(u)=3 [Figure: densities of the β̂_1 estimates for sd(u)=10 and sd(u)=3 overlaid]