Part II: The Simple Regression Model (as of Sep 22, 2015)
Outline
1 The Simple Regression Model
   Definition
   Estimation of the model, OLS
   OLS Statistics
      Algebraic properties
      Goodness-of-Fit, the R-square
   Units of Measurement and Functional Form
      Scaling and translation
      Nonlinearities
   Expected Values and Variances of the OLS Estimators
      Unbiasedness
      Variances
      Estimating the Error Variance
      Standard Errors of β̂_0 and β̂_1
Definition

Two (observable) variables y and x:

    y = β_0 + β_1 x + u.    (1)

Equation (1) defines the simple regression model. Terminology:

    y                     x
    Dependent variable    Independent variable
    Explained variable    Explanatory variable
    Response variable     Control variable
    Predicted variable    Predictor
    Regressand            Regressor
The error term u_i combines a number of effects, such as:

1. Omitted variables: accounts for the effects of variables omitted from the model.
2. Nonlinearities: captures the effects of nonlinearities between y and x. Thus, if the true model is y_i = β_0 + β_1 x_i + γx_i² + v_i but we assume it is y_i = β_0 + β_1 x_i + u_i, then the effect of x_i² is absorbed into u_i; in fact, u_i = γx_i² + v_i.
3. Measurement errors: errors in measuring y and x are absorbed in u_i.
4. Unpredictable effects: u_i also includes inherently unpredictable random effects.
Estimation of the model, OLS
Given observations (x_i, y_i), i = 1, ..., n, the task of the econometrician is to estimate the population parameters β_0 (intercept parameter) and β_1 (slope parameter) of (1) in an optimal way, utilizing the sample information.

Assumptions (classical assumptions):

1. Zero average error: E[u_i] = 0 for all i.
2. Homoscedasticity: var[u_i] = σ² for all i.
3. Uncorrelated errors: cov[u_i, u_j] = 0 for all i ≠ j.
4. Σ_{i=1}^n (x_i − x̄)² > 0.
5. cov[u_i, x_i] = 0.

Note: Assumption 5 is the key assumption. It is often expressed as E[u_i | x_i] = 0, which implies cov[u_i, x_i] = 0. Assumption 3 is mostly relevant in time series or clustered data.
The goal of estimation is to find values for β_0 and β_1 of model (1) such that the error terms u_i = y_i − β_0 − β_1 x_i are as small as possible (in a suitable sense). Under the classical assumptions above, Ordinary Least Squares (OLS) minimizes the sum of squared errors, which produces optimal estimates of the parameters (the optimality criteria are discussed later). Denote the sum of squares by

    f(β_0, β_1) = Σ_{i=1}^n (y_i − β_0 − β_1 x_i)².    (2)

The first order conditions (foc) for the minimum are found by setting the partial derivatives equal to zero. Denote by β̂_0 and β̂_1 the values satisfying the foc.
First order conditions:

    ∂f(β̂_0, β̂_1)/∂β_0 = −2 Σ_{i=1}^n (y_i − β̂_0 − β̂_1 x_i) = 0    (3)
    ∂f(β̂_0, β̂_1)/∂β_1 = −2 Σ_{i=1}^n x_i (y_i − β̂_0 − β̂_1 x_i) = 0    (4)

These yield the so-called normal equations

    n β̂_0 + β̂_1 Σ x_i = Σ y_i
    β̂_0 Σ x_i + β̂_1 Σ x_i² = Σ x_i y_i,    (5)

where the summation is from 1 to n.
The explicit solutions for β̂_0 and β̂_1 (the OLS estimators of β_0 and β_1) are

    β̂_1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)²    (6)
    β̂_0 = ȳ − β̂_1 x̄,    (7)

where x̄ = (1/n) Σ_{i=1}^n x_i and ȳ = (1/n) Σ_{i=1}^n y_i are the sample means. In the solutions (6) and (7) we have used the properties

    Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = Σ_{i=1}^n x_i y_i − n x̄ ȳ    (8)

and

    Σ_{i=1}^n (x_i − x̄)² = Σ_{i=1}^n x_i² − n x̄².    (9)
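As an illustration, the estimators (6) and (7) can be computed directly from the sums above. The sketch below uses a small hypothetical sample, not data from the text.

```python
# Sketch of the OLS formulas (6) and (7) on made-up data.
def ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # beta1_hat = sum (x_i - xbar)(y_i - ybar) / sum (x_i - xbar)^2
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar) ** 2 for xi in x)
    # beta0_hat = ybar - beta1_hat * xbar
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4, 5]                 # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical response
b0, b1 = ols(x, y)
```

For this sample x̄ = 3, ȳ = 6.02, so the formulas give β̂_1 = 19.9/10 = 1.99 and β̂_0 = 6.02 − 1.99·3 = 0.05.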
Fitted regression line:

    ŷ = β̂_0 + β̂_1 x.    (10)

Residuals:

    û_i = y_i − ŷ_i = (β_0 − β̂_0) + (β_1 − β̂_1) x_i + u_i.    (11)

Thus the residual û_i consists of the pure error term u_i and the sampling errors due to the estimation of the parameters β_0 and β_1.
Remark 2.1: The slope coefficient β̂_1 in terms of the sample covariance of x and y and the sample variance of x.

Sample covariance:

    s_xy = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ)    (12)

Sample variance:

    s_x² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)².    (13)

Thus

    β̂_1 = s_xy / s_x².    (14)
Remark 2.2: The slope coefficient β̂_1 in terms of the sample correlation and the standard deviations of x and y.

Sample correlation:

    r_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / √(Σ_{i=1}^n (x_i − x̄)² Σ_{i=1}^n (y_i − ȳ)²) = s_xy / (s_x s_y),    (15)

where s_x = √(s_x²) and s_y = √(s_y²) are the sample standard deviations of x and y, respectively. Thus we can also write the slope coefficient in terms of the sample standard deviations and correlation as

    β̂_1 = (s_y / s_x) r_xy.    (16)
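The identities (14) and (16) are easy to verify numerically; the sketch below checks both against the direct formula (6) on hypothetical data.

```python
from math import sqrt

# Numerical check of Remarks 2.1 and 2.2: the OLS slope equals
# s_xy / s_x^2 and (s_y / s_x) * r_xy.
x = [1.0, 2.0, 4.0, 5.0, 8.0]   # hypothetical data
y = [1.5, 3.1, 4.0, 6.9, 9.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
sx2 = sum((a - xbar) ** 2 for a in x) / (n - 1)
sy2 = sum((b - ybar) ** 2 for b in y) / (n - 1)
rxy = sxy / sqrt(sx2 * sy2)
b1_direct = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
            / sum((a - xbar) ** 2 for a in x)
assert abs(b1_direct - sxy / sx2) < 1e-12              # Remark 2.1, eq. (14)
assert abs(b1_direct - sqrt(sy2 / sx2) * rxy) < 1e-12  # Remark 2.2, eq. (16)
```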
Example 2.1: Relationship between wage and education.

    wage = average hourly earnings
    educ = years of education

The data were collected in 1976, n = 526. Excerpt from the data set wage.raw (Wooldridge):

    wage (y)   educ (x)
    3.10       11
    3.24       12
    3.00       11
    6.00        8
    5.30       12
    8.75       16
    11.25      18
    5.00       12
    3.60       12
    18.18      17
    6.25       16
    ...
[Scatterplot of the data with the fitted regression line: hourly earnings (0 to 30 dollars) against education (0 to 20 years).]
Figure 2.2: Wages and education.
EViews estimation results:

    Dependent Variable: WAGE
    Method: Least Squares
    Date: 02/07/06   Time: 20:25
    Sample: 1 526
    Included observations: 526

    Variable    Coefficient   Std. Error   t-Statistic   Prob.
    C           -0.904852     0.684968     -1.321013     0.1871
    EDUC         0.541359     0.053248     10.16675      0.0000

    R-squared            0.164758   Mean dependent var     5.896103
    Adjusted R-squared   0.163164   S.D. dependent var     3.693086
    S.E. of regression   3.378390   Akaike info criterion  5.276470
    Sum squared resid    5980.682   Schwarz criterion      5.292688
    Log likelihood      -1385.712   F-statistic            103.3627
    Durbin-Watson stat   1.823686   Prob(F-statistic)      0.000000

The estimated model is ŷ = −0.905 + 0.541x. Thus the model predicts that an additional year of education increases the hourly wage on average by 0.54 dollars. Using (16) you can verify that the OLS estimate of β_1 can be computed from the correlation and the standard deviations; applying (7) then gives the OLS estimate of the intercept. Thus, in all, the estimates can be derived from basic sample statistics.
OLS Statistics
Algebraic properties of the OLS fit:

    Σ_{i=1}^n û_i = 0.    (17)
    Σ_{i=1}^n x_i û_i = 0.    (18)
    ȳ = β̂_0 + β̂_1 x̄.    (19)

Sums of squares:

    SST = Σ_{i=1}^n (y_i − ȳ)²   (total sum of squares)    (20)
    SSE = Σ_{i=1}^n (ŷ_i − ȳ)²   (explained sum of squares)    (21)
    SSR = Σ_{i=1}^n (y_i − ŷ_i)²  (residual sum of squares)    (22)
It can be shown that

    Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (ŷ_i − ȳ)² + Σ_{i=1}^n (y_i − ŷ_i)²,    (23)

that is,

    SST = SSE + SSR.    (24)

Prove this!

Remark 2.3: It is unfortunate that different books and different statistical packages use different definitions, particularly of SSR and SSE. In many, the former means the regression sum of squares and the latter the error sum of squares, i.e. just the opposite of the convention here!
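The decomposition (24) can be checked numerically; the sketch below fits a line by the OLS formulas (6) and (7) on hypothetical data and verifies SST = SSE + SSR.

```python
# Numerical verification of the decomposition SST = SSE + SSR, eq. (24).
x = [1, 2, 3, 4, 5, 6]                 # hypothetical data
y = [1.2, 2.3, 2.9, 4.4, 4.9, 6.3]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
     / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]        # fitted values
SST = sum((b - ybar) ** 2 for b in y)
SSE = sum((h - ybar) ** 2 for h in yhat)
SSR = sum((b - h) ** 2 for b, h in zip(y, yhat))
assert abs(SST - (SSE + SSR)) < 1e-9   # holds exactly for the OLS fit
```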
Goodness-of-Fit, the R-square
R-square (coefficient of determination):

    R² = SSE/SST = 1 − SSR/SST.    (25)

The positive square root of R², denoted R, is called the multiple correlation.

Remark 2.4: Here, in the case of simple regression, R² = r_xy², i.e. R = |r_xy|. These identities do not hold in the general case (multiple regression)! Prove Remark 2.4 yourself.

Remark 2.5: Generally, however, it holds for OLS estimation that R = r_{y,ŷ}, the correlation between the observed and the fitted (or predicted) values.
Remark 2.6: It is obvious that 0 ≤ R² ≤ 1, with R² = 0 representing no linear relation between x and y and R² = 1 representing a perfect fit.

Adjusted R-square:

    R̄² = 1 − s_u²/s_y²,    (26)

where

    s_u² = (1/(n−2)) Σ_{i=1}^n (y_i − ŷ_i)²    (27)

is an estimate of the error variance σ_u² = var[u]. One finds immediately that

    R̄² = 1 − ((n−1)/(n−2))(1 − R²),    (28)

so that R̄² < R² whenever R² < 1.
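A quick numerical check that the definition (26) of the adjusted R-square agrees with the identity (28), on hypothetical data:

```python
# Check that (26) and (28) give the same adjusted R-square.
x = [1, 2, 3, 4, 5, 6, 7]                    # hypothetical data
y = [2.0, 2.7, 4.1, 4.2, 5.9, 6.1, 7.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
     / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
SSR = sum(e ** 2 for e in resid)
SST = sum((b - ybar) ** 2 for b in y)
R2 = 1 - SSR / SST
su2 = SSR / (n - 2)          # residual variance estimate, eq. (27)
sy2 = SST / (n - 1)          # sample variance of y
R2_adj = 1 - su2 / sy2       # definition (26)
assert abs(R2_adj - (1 - (n - 1) / (n - 2) * (1 - R2))) < 1e-12  # identity (28)
assert R2_adj < R2           # since R^2 < 1 here
```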
Example 2.2: In the previous example R² = 0.164758 and the adjusted R-square is R̄² = 0.163164. The R² tells us that about 16.5 percent of the variation in hourly earnings can be explained by education; the remaining 83.5 percent is not accounted for by the model.
Units of Measurement and Functional Form
Consider the simple regression model

    y_i = β_0 + β_1 x_i + u_i    (29)

with σ_u² = var[u_i]. Let y_i* = a_0 + a_1 y_i and x_i* = b_0 + b_1 x_i, with a_1 ≠ 0 and b_1 ≠ 0. Then (29) becomes

    y_i* = β_0* + β_1* x_i* + u_i*,    (30)

where

    β_0* = a_1 β_0 + a_0 − (a_1/b_1) β_1 b_0,    (31)
    β_1* = (a_1/b_1) β_1,    (32)
    σ_{u*}² = a_1² σ_u².    (33)
Remark 2.7: The coefficients a_1 and b_1 scale the measurements, while a_0 and b_0 shift them. For example, if y is temperature measured in Celsius, then y* = 32 + (9/5)y gives the temperature in Fahrenheit.
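The transformation rules (31) and (32) can be checked numerically with the Celsius-to-Fahrenheit example: y* = 32 + (9/5)y gives a_0 = 32, a_1 = 9/5, and leaving x unchanged gives b_0 = 0, b_1 = 1. The data below are hypothetical.

```python
# Refitting OLS on rescaled data reproduces the transformed
# coefficients of (31) and (32).
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
            / sum((a - xbar) ** 2 for a in x)
    return ybar - slope * xbar, slope

x = [0.0, 5.0, 10.0, 15.0, 20.0]     # hypothetical regressor
y = [2.0, 4.5, 6.8, 9.1, 11.6]       # hypothetical temperatures, Celsius
beta0, beta1 = ols(x, y)
ystar = [32 + 9 / 5 * v for v in y]  # the same temperatures in Fahrenheit
beta0s, beta1s = ols(x, ystar)
a0, a1, b0, b1 = 32, 9 / 5, 0, 1     # the transformation coefficients
assert abs(beta0s - (a1 * beta0 + a0 - (a1 / b1) * beta1 * b0)) < 1e-9  # (31)
assert abs(beta1s - (a1 / b1) * beta1) < 1e-9                           # (32)
```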
Example 2.3: Let the estimated model be ŷ_i = β̂_0 + β̂_1 x_i. Demeaned observations:

    y_i* = y_i − ȳ and x_i* = x_i − x̄.

So a_0 = −ȳ, b_0 = −x̄, and a_1 = b_1 = 1. Because β̂_0 = ȳ − β̂_1 x̄, we obtain from (31)

    β̂_0* = β̂_0 − (ȳ − β̂_1 x̄) = 0.

So ŷ* = β̂_1 x*. (Note that β̂_1 remains unchanged.)
If we further set a_1 = 1/s_y and b_1 = 1/s_x, where s_y and s_x are the sample standard deviations of y and x, respectively, the transformation yields the standardized observations

    y_i* = (y_i − ȳ)/s_y and x_i* = (x_i − x̄)/s_x.

Then again β̂_0* = 0. The slope coefficient becomes

    β̂_1* = (s_x/s_y) β̂_1,

which is called the standardized regression coefficient. As an exercise, show that in this case β̂_1* = r_xy, the correlation coefficient of x and y.
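The claim in the exercise can be checked numerically: on standardized data the OLS slope equals the sample correlation r_xy. The data are hypothetical.

```python
from math import sqrt

# On standardized data the OLS slope equals the sample correlation.
x = [2.0, 3.5, 5.0, 6.0, 8.5, 9.0]   # hypothetical data
y = [1.0, 2.2, 2.8, 4.5, 5.1, 6.7]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx = sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
sy = sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
xs = [(a - xbar) / sx for a in x]    # standardized x
ys = [(b - ybar) / sy for b in y]    # standardized y
# Standardized data have zero means, so the slope is sum(xs*ys)/sum(xs^2).
b1_std = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)
rxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
assert abs(b1_std - rxy) < 1e-12
```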
Nonlinearities
The logarithmic transformation is one of the most frequently applied transformations of economic variables.

Table 2.1: Functional forms involving log-transformations

    Model        Dependent variable   Independent variable   Interpretation of β_1
    level-level  y                    x                      Δy = β_1 Δx
    level-log    y                    log(x)                 Δy = (β_1/100) %Δx
    log-level    log(y)               x                      %Δy = (100 β_1) Δx
    log-log      log(y)               log(x)                 %Δy = β_1 %Δx

Can you find the rationale for these interpretations?

Remark 2.8: The log-transformation can only be applied to variables that take strictly positive values!
Example 2.4: Consider the wage example (Example 2.1). Suppose we believe that, instead of the absolute change, a better choice is to model the percentage change of wage (y) as a function of education (x). Then we consider the model

    log(y) = β_0 + β_1 x + u.

Estimation of this model yields:

    Dependent Variable: LOG(WAGE)
    Method: Least Squares
    Sample: 1 526
    Included observations: 526

    Variable    Coefficient   Std. Error   t-Statistic   Prob.
    C           0.583773      0.097336     5.997510      0.0000
    EDUC        0.082744      0.007567     10.93534      0.0000

    R-squared            0.185806   Mean dependent var     1.623268
    Adjusted R-squared   0.184253   S.D. dependent var     0.531538
    S.E. of regression   0.480079   Akaike info criterion  1.374061
    Sum squared resid    120.7691   Schwarz criterion      1.390279
    Log likelihood      -359.3781   F-statistic            119.5816
    Durbin-Watson stat   1.801328   Prob(F-statistic)      0.000000
That is, the fitted model is

    log(y) = 0.584 + 0.083x    (34)

with n = 526, R² = 0.186. Note that the R-squares of this model and of the level-level model are not comparable. The model predicts that an additional year of education increases hourly earnings on average by about 8.3%.
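A side note on the 8.3% reading: it relies on the approximation %Δy ≈ 100·β_1·Δx from Table 2.1, which is accurate for small β_1. The exact implied change in y for a one-unit change in x is 100(e^{β_1} − 1)%. A quick check with the estimated slope:

```python
from math import exp

# Approximate vs. exact percentage effect in the log-level model.
b1 = 0.082744                    # slope from the estimated log-wage model
approx = 100 * b1                # approximation: about 8.27%
exact = 100 * (exp(b1) - 1)      # exact implied change: about 8.63%
assert exact > approx            # exp(b) - 1 > b for b > 0
```

For slopes this small the two readings differ by only a few tenths of a percentage point, which is why the approximation is routinely used.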
Remark 2.9: Typically, any model in which the transformations of y and x are functions of these variables alone can be cast into the form of the linear model. That is, if in general

    g(y) = β_0 + β_1 h(x) + u,    (35)

where g and h are functions, then defining y* = g(y) and x* = h(x) gives the linear model y* = β_0 + β_1 x* + u. Note, however, that not all models can be cast into linear form. An example is

    cons = 1 / (β_0 + β_1 income + u).
Expected Values and Variances of the OLS Estimators
We say generally that an estimator θ̂ of a parameter θ is unbiased if E[θ̂] = θ.

Theorem 2.1: Under the classical assumptions 1–5,

    E[β̂_0] = β_0 and E[β̂_1] = β_1.    (36)

Proof: Given the observations x_1, ..., x_n, the expectations are conditional on the given x_i-values. We prove first the unbiasedness of β̂_1. Now

    β̂_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
        = Σ(x_i − x̄) y_i / Σ(x_i − x̄)²
        = β_0 Σ(x_i − x̄) / Σ(x_i − x̄)² + β_1 Σ(x_i − x̄) x_i / Σ(x_i − x̄)² + Σ(x_i − x̄) u_i / Σ(x_i − x̄)²    (37)
        = β_1 + Σ(x_i − x̄) u_i / Σ(x_i − x̄)²,

because Σ(x_i − x̄) = 0 and Σ(x_i − x̄) x_i = Σ(x_i − x̄)².
That is,

    β̂_1 = β_1 + Σ(x_i − x̄) u_i / Σ(x_i − x̄)².    (38)

Because the x_i's are fixed, we get

    E[β̂_1] = β_1 + Σ(x_i − x̄) E[u_i] / Σ(x_i − x̄)² = β_1,    (39)

because E[u_i] = 0 by Assumption 1. Thus β̂_1 is unbiased. The proof of the unbiasedness of β̂_0 is left to the students.
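The identity (38) holds exactly in any sample, which can be illustrated by generating data from known β_0, β_1 and known errors u_i (all values below are hypothetical):

```python
import random

# When the data-generating process is known, the estimation error
# beta1_hat - beta1 equals sum (x_i - xbar) u_i / sum (x_i - xbar)^2,
# exactly as in identity (38).
random.seed(1)
beta0, beta1 = 1.0, 2.0                        # hypothetical true parameters
x = [float(i) for i in range(1, 21)]           # fixed regressor values
u = [random.gauss(0, 1) for _ in x]            # known error draws
y = [beta0 + beta1 * a + e for a, e in zip(x, u)]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
rhs = beta1 + sum((a - xbar) * e for a, e in zip(x, u)) / sxx
assert abs(b1 - rhs) < 1e-9
```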
Variances
Theorem 2.2: Under the classical assumptions 1 through 5 and given x_1, ..., x_n,

    var[β̂_1] = σ_u² / Σ_{i=1}^n (x_i − x̄)²,    (40)
    var[β̂_0] = (1/n + x̄² / Σ(x_i − x̄)²) σ_u²,    (41)

and, for ŷ = β̂_0 + β̂_1 x with given x,

    var[ŷ] = (1/n + (x − x̄)² / Σ(x_i − x̄)²) σ_u².    (42)
Proof: Again we prove, as an example, only (40). Using (38) and the properties of variance, with x_1, ..., x_n given,

    var[β̂_1] = var[β_1 + (1/Σ(x_i − x̄)²) Σ(x_i − x̄) u_i]
             = (1/Σ(x_i − x̄)²)² Σ(x_i − x̄)² var[u_i]
             = (1/Σ(x_i − x̄)²)² Σ(x_i − x̄)² σ_u²    (43)
             = σ_u² Σ(x_i − x̄)² / (Σ(x_i − x̄)²)²
             = σ_u² / Σ(x_i − x̄)².
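Theorems 2.1 and 2.2 can also be illustrated by simulation: over repeated samples with fixed x, the mean of β̂_1 approaches β_1 and its variance approaches σ_u² / Σ(x_i − x̄)². The sketch below is a rough Monte Carlo check with loose tolerances; all parameter values are hypothetical.

```python
import random

# Monte Carlo sketch of E[beta1_hat] = beta1 and of formula (40).
random.seed(42)
beta0, beta1, sigma = 1.0, 0.5, 2.0            # hypothetical true values
x = [float(i) for i in range(1, 31)]           # fixed regressor values
n = len(x)
xbar = sum(x) / n
sxx = sum((a - xbar) ** 2 for a in x)
reps = 5000
b1s = []
for _ in range(reps):
    y = [beta0 + beta1 * a + random.gauss(0, sigma) for a in x]
    ybar = sum(y) / n
    b1s.append(sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx)
mean_b1 = sum(b1s) / reps
var_b1 = sum((b - mean_b1) ** 2 for b in b1s) / (reps - 1)
assert abs(mean_b1 - beta1) < 0.01             # unbiasedness (Theorem 2.1)
assert abs(var_b1 - sigma ** 2 / sxx) < 0.001  # variance formula (40)
```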
Remark 2.10: Equation (41) can be written equivalently as

    var[β̂_0] = σ_u² Σ x_i² / (n Σ(x_i − x̄)²).    (44)
Estimating the Error Variance
Recalling equation (11), the residual û_i = y_i − ŷ_i is of the form

    û_i = u_i − (β̂_0 − β_0) − (β̂_1 − β_1) x_i.    (45)

This reminds us of the difference between the error term u_i and the residual û_i. An unbiased estimator of the error variance σ_u² = var[u_i] is

    σ̂_u² = (1/(n−2)) Σ_{i=1}^n û_i².    (46)

Taking the (positive) square root gives an estimator of the error standard deviation σ_u = √(σ_u²), usually called the standard error of the regression:

    σ̂_u = √((1/(n−2)) Σ û_i²).    (47)
Theorem 2.3: Under assumptions 1–5, E[σ̂_u²] = σ_u², i.e. σ̂_u² is an unbiased estimator of σ_u².

Proof: Omitted.
Standard Errors of β̂_0 and β̂_1
Replacing σ_u² in (40) and (41) by σ̂_u² and taking square roots gives the standard errors of β̂_1 and β̂_0:

    se(β̂_1) = σ̂_u / √(Σ(x_i − x̄)²)    (48)

and

    se(β̂_0) = σ̂_u √(1/n + x̄² / Σ(x_i − x̄)²).    (49)

These belong to the standard output of regression estimation; see the computer print-outs above.
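The whole chain from residuals to standard errors can be sketched in a few lines; a regression package's summary table (such as the EViews output above) reports these same quantities. The data below are hypothetical.

```python
from math import sqrt

# Standard error of the regression (47) and the standard errors
# (48) and (49) of the OLS coefficients, computed from scratch.
x = [1, 2, 3, 4, 5, 6, 7, 8]                 # hypothetical data
y = [1.1, 2.4, 2.8, 4.5, 5.2, 5.9, 7.4, 8.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
sigma_hat = sqrt(sum(e ** 2 for e in resid) / (n - 2))  # eq. (47)
se_b1 = sigma_hat / sqrt(sxx)                           # eq. (48)
se_b0 = sigma_hat * sqrt(1 / n + xbar ** 2 / sxx)       # eq. (49)
```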