ECN 3202 APPLIED ECONOMETRICS
2. Simple Linear Regression (B)
Mr. Sydney Armstrong, Lecturer, The University of Guyana
Semester 2, 2015/2016
PREDICTION The true value of y when x takes some particular value x0 is y0 = β1 + β2x0 + e0. The OLS-estimated value (or predictor) of y0 is ŷ0 = b1 + b2x0 (note the analogy). Can we trust it? Is it a good predictor? How big is the prediction error? Can we construct a confidence interval on the predicted value? Let us define the prediction error (or forecast error) as f = ŷ0 − y0. Note: because y is a r.v., the prediction error is also a random variable, with some mean (expectation) and some variance. Let's see what they are.
PREDICTION cont. If assumptions SR1 to SR5 hold, then E(f) = 0, i.e., ŷ0 is an unbiased predictor of y0. Similarly, var(f) = σ²[1 + 1/N + (x0 − x̄)²/Σ(xi − x̄)²].
Note: the smaller the variance σ² of the original error (noise), the better (i.e., less noisy) the prediction, ceteris paribus; the larger the sample size N, the better (i.e., less noisy) the prediction, ceteris paribus; the larger the variation in x (i.e., the larger Σ(xi − x̄)²), the better (i.e., less noisy) the prediction, ceteris paribus. Where is the variance of the prediction smallest?
PREDICTION cont. With the help of simple algebra, we can also derive the estimated forecast-error variance. Note: σ² is unknown, but as before we can replace it by its estimate σ̂² (as we did in Lecture 2), and so we get var̂(f) = σ̂²[1 + 1/N + (x0 − x̄)²/Σ(xi − x̄)²]. The square root of the estimated forecast-error variance is called the standard error of the forecast, denoted se(f) = √var̂(f). If SR6 (normality) is correct (or N is large), then a 100(1 − α)% confidence interval, or prediction interval, for y0 is ŷ0 ± tc·se(f), where tc = t(1−α/2, N−2) is the value that leaves an area of α/2 in the right tail of a t-distribution with N − 2 degrees of freedom.
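As a minimal sketch of these formulas, the snippet below computes the point prediction, the standard error of the forecast, and the 95% prediction interval from scratch. The data are simulated (the sample size N = 40 and x0 = 20 echo the lecture's example, but the numbers themselves are hypothetical, not the food-expenditure dataset):

```python
# Sketch: OLS prediction interval computed directly from the slide formulas.
# Simulated data for illustration only -- not the lecture's dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N = 40
x = rng.uniform(5, 30, N)
y = 80 + 10 * x + rng.normal(0, 90, N)        # hypothetical "true" process

xbar, ybar = x.mean(), y.mean()
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b1 = ybar - b2 * xbar
e_hat = y - (b1 + b2 * x)                     # OLS residuals
sigma2_hat = np.sum(e_hat ** 2) / (N - 2)     # estimate of sigma^2

x0 = 20.0
y0_hat = b1 + b2 * x0                         # point prediction
var_f = sigma2_hat * (1 + 1 / N + (x0 - xbar) ** 2 / np.sum((x - xbar) ** 2))
se_f = np.sqrt(var_f)                         # standard error of the forecast

tc = stats.t.ppf(0.975, df=N - 2)             # two-sided 95% critical value
print(f"prediction: {y0_hat:.2f} +/- {tc * se_f:.2f}")
```

Note how var̂(f) exceeds σ̂²: the forecast error combines the noise in y0 itself with the sampling error in b1 and b2.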
PREDICTION cont. If we construct prediction intervals for y at all values of x and plot them (dashed curves) together with the fitted regression line, we see a figure like this: [figure: fitted regression line with prediction-interval bands that are narrowest at x̄ and widen away from it]. The smallest variance of the prediction is obtained when x0 = x̄, and the further x0 is from x̄, the larger the variance of the prediction. Reducing the variance (or se) of the prediction error reduces the width of the confidence interval.
Example. Using N = 40 observations on food expenditure and income, the least squares prediction of y given x = 20 is ŷ0 = 83.42 + 10.21(20) = 287.61. The standard error of the forecast is se(f) = 90.63. So the 95% prediction interval for y0, given x = 20, is 287.61 ± (2.024)(90.63), or 287.61 ± 183.43.
Intuitively: we are 95% confident that the weekly food expenditure of a person with x = 20 lies between $104.18 and $471.04.
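For comparison, the same interval can be obtained directly from statsmodels; this is a sketch continuing the simulated `x` and `y` from the earlier snippet, so the numbers will not match the lecture's example:

```python
# Sketch: the prediction interval via statsmodels' built-in machinery.
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
pred = res.get_prediction(np.array([[1.0, 20.0]]))   # [constant, x0 = 20]
# obs_ci_* columns give the prediction interval for a new observation.
print(pred.summary_frame(alpha=0.05)[["mean", "obs_ci_lower", "obs_ci_upper"]])
```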
Example cont. The estimated confidence intervals might be somewhat disappointing: they are too wide! Well, statistics is a very powerful tool, but it is not a crystal ball! Remember, confidence intervals (CIs) in general depend on: the variance of the original error (noise); the sample size (N); and the variation in the explanatory variable (x). In addition, for prediction CIs, the variance of the prediction is smallest when x0 = x̄.
What about the data we used? N is small and there is large variation in y. Maybe other important explanatory variables are missing? What we need is a measure of how well a regression fits the data! This measure would then indicate (before we estimate prediction intervals!) how reliable predictions based on such a regression would be.
GOODNESS OF FIT Let ŷi = b1 + b2xi; then yi = ŷi + êi, and so yi − ȳ = (ŷi − ȳ) + êi. Now square both sides of the last equation and sum over all i to get Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σêi² (the cross-product term vanishes under OLS).
SST = Σ(yi − ȳ)² = total sum of squares (a measure of total variation in the dependent variable about its sample mean); SSR = Σ(ŷi − ȳ)² = regression sum of squares (the part that is explained by the regression); SSE = Σêi² = sum of squared errors (the part of the total variation that is left unexplained). Thus SST = SSR + SSE.
GOODNESS OF FIT cont. Now let's compute how much of the total variation in the dependent variable y (i.e., of SST) is explained by our estimated regression (i.e., by SSR). For this we can use R² = SSR/SST = 1 − SSE/SST; this R² is called the coefficient of determination. If R² = 1, the data fall exactly on the fitted OLS regression line, in which case we call it a perfect fit. If the sample data for y and x are uncorrelated and show no linear association, then the fitted OLS line is horizontal, so SSR = 0 and therefore R² = 0. Also note: for a simple regression model, R² can also be computed as the square of the correlation coefficient between yi and ŷi.
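The decomposition and both routes to R² can be verified numerically; this sketch continues the simulated example (`b1`, `b2`, `x`, `y` from the first snippet):

```python
# Sketch: SST = SSR + SSE and the two equivalent computations of R^2.
import numpy as np

y_hat = b1 + b2 * x
SST = np.sum((y - y.mean()) ** 2)        # total variation about the mean
SSR = np.sum((y_hat - y.mean()) ** 2)    # explained by the regression
SSE = np.sum((y - y_hat) ** 2)           # left unexplained
assert np.isclose(SST, SSR + SSE)        # the decomposition holds under OLS

R2 = SSR / SST
print(f"R^2 = {R2:.3f} = 1 - SSE/SST = {1 - SSE / SST:.3f}")
# For simple regression, R^2 equals the squared correlation of y and y_hat:
print(np.corrcoef(y, y_hat)[0, 1] ** 2)
```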
Example. Using N = 40 observations on income and food expenditure: R² = 0.385.
Example cont.: Output from EViews. [Regression output not reproduced.] A common format for reporting regression results is the fitted equation with standard errors in parentheses beneath the coefficient estimates, together with R² and N.
Example cont. We conclude that 38.5% of the variation in food expenditure (about its sample mean) is explained by the variation in income (x).
THE EFFECTS OF SCALING THE DATA Changing the scale of x into x* = x/c: y = β1 + β2x + e = β1 + (cβ2)(x/c) + e = β1 + β2*x* + e, where β2* = cβ2. The slope estimate and its standard error are both multiplied by c, so the t-statistic is unchanged.
Example food expenditure cont. Measuring food expenditure in dollars and income in $100: ŷ = 83.42 + 10.21x. Food expenditure and income both in dollars (i.e., x* = 100x): ŷ = 83.42 + 0.1021x*. Food expenditure in $100 and income in $100 (i.e., y* = y/100): ŷ* = 0.8342 + 0.1021x. But t-statistics and R² are unaffected!
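This invariance is easy to check; a sketch, again using the simulated `x` and `y` from the earlier snippets rather than the lecture's dataset:

```python
# Sketch: rescaling x changes the slope and its se by the same factor,
# so t-statistics and R^2 are unchanged.
import statsmodels.api as sm

res1 = sm.OLS(y, sm.add_constant(x)).fit()         # x in original units
res2 = sm.OLS(y, sm.add_constant(x / 100)).fit()   # x rescaled by c = 100
print(res1.params[1], res2.params[1])              # slope scales by c = 100
print(res1.tvalues[1], res2.tvalues[1])            # identical t-statistics
print(res1.rsquared, res2.rsquared)                # identical R^2
```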
CHOOSING A FUNCTIONAL FORM Different functional forms imply different relationships between y and x, and certainly different estimated coefficients! So one must choose the functional form carefully!
Linear Functional Form Model: y = β1 + β2x + e. Slope ("marginal effect"): dy/dx = β2. Meaning of slope: a one-unit increase in x leads to a β2-unit change in y (in whatever units x and y are measured). Elasticity: ε = (dy/dx)(x/y) = β2x/y. Meaning of elasticity: the elasticity measures the percent change in y with respect to a one-percent change in x; it varies with x (in spite of the linear relationship between x and y!). Elasticity may be a more convenient measure of the impact of x on y than the slope.
Log-Log Functional Form Suppose the true model is y = exp(β1)·x^β2·exp(e). Let's transform it by taking natural logarithms: y* = β1 + β2x* + e, where y* = ln(y) and x* = ln(x), i.e., the natural logarithms of y and x. So the slope coefficient in the transformed model is β2 = d ln(y)/d ln(x). Also note that d ln(y)/d ln(x) = (dy/dx)(x/y),
i.e., the slope coefficient β2 in the log-log model is the elasticity of y with respect to x! Also note that for the log-log model, the elasticity of y with respect to x is constant (= β2). To use this model we must have y > 0 and x > 0.
Reciprocal Functional Form Model: y = β1 + β2(1/x) + e. Slope: dy/dx = −β2/x². Elasticity: (dy/dx)(x/y) = −β2/(xy).
Common Functional Forms

Name         Model                    Slope (dy/dx)    Elasticity ((dy/dx)(x/y))
Linear       y = β1 + β2x             β2               β2x/y
Log-log      ln(y) = β1 + β2 ln(x)    β2(y/x)          β2
Log-linear   ln(y) = β1 + β2x         β2y              β2x
Linear-log   y = β1 + β2 ln(x)        β2/x             β2/y
Reciprocal   y = β1 + β2(1/x)         −β2/x²           −β2/(xy)
Examples: [graphs illustrating these functional forms; figures not reproduced]
A Practical Approach We should choose a functional form that is consistent with what economic theory tells us about the relationship between x and y; is compatible with assumptions SR2 to SR6; and is flexible enough to fit the data. In practice, this involves plotting the data and choosing economically plausible models; testing hypotheses concerning the parameters; performing residual analysis; assessing forecasting performance; measuring goodness-of-fit; and using the principle of parsimony.
Example food expenditure cont.: which model to use? Linear: y = β1 + β2x + e. Linear-log: y = β1 + β2 ln(x) + e. Log-linear: ln(y) = β1 + β2x + e. [Estimated equations not reproduced.] All slope coefficients are significantly different from zero at the 1% level of significance. So which of the models should we trust more? Can we just compare R² values and choose the highest one? No! It is not so simple.
Remarks on Goodness-of-Fit In the linear model, R² measures how well the model explains the variation in y; in the log-linear model, R² measures how well the model explains the variation in ln(y). So the two measures should not be compared! To compare goodness-of-fit in models with different dependent variables, we can compute the generalised R²: R²g = [corr(y, ŷ)]², where ŷ is the fitted value of y from the estimated regression of the particular model of interest and corr(y, ŷ) is the sample correlation coefficient between y and ŷ.
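A sketch of the generalised R² for a log-linear fit, continuing the simulated example (the `mask` simply guards against non-positive y values, which cannot be logged):

```python
# Sketch: generalised R^2 = corr(y, y_hat)^2, making a log-linear model's
# fit comparable with a linear model's R^2.
import numpy as np
import statsmodels.api as sm

mask = y > 0                                    # ln requires positive y
res_log = sm.OLS(np.log(y[mask]), sm.add_constant(x[mask])).fit()
y_hat_nat = np.exp(res_log.fittedvalues)        # natural predictor of y
R2_g = np.corrcoef(y[mask], y_hat_nat)[0, 1] ** 2
print(f"generalised R^2 = {R2_g:.3f}")
```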
Example food expenditure cont. Linear model: R² = 0.385. Log-linear model: generalised R² [value not reproduced]. Note: for the log-linear model, we can compute the generalised R² using either the natural or the corrected predictions, because they differ only by a constant factor; R²g is the same, since correlation is not affected by multiplication by a constant. Conclusion: in our example, both models have very similar (and not very high!) R² and so can be deemed to fit the data similarly well, with the linear model fitting slightly better; it might therefore be preferred for the sake of simplicity (parsimony principle!).
TESTING FOR NORMALLY DISTRIBUTED ERRORS The k-th central moment of the random variable e is μk = E[(e − μ)^k], where μ denotes the mean (and the first moment!) of e. Measures of spread (dispersion), symmetry, and peakedness are: true variance: μ2 = σ²; true skewness: S = μ3/σ³; true kurtosis: K = μ4/σ⁴. If e is normally distributed, then S = 0 and K = 3.
The Jarque-Bera Test There are many tests for normality of the errors (or residuals). The idea of the most popular test for normality, the Jarque-Bera test, is to test how far the measures of residual skewness and kurtosis are from 0 and 3, respectively. Hypotheses: H0: errors are normal; H1: errors are non-normal. Test statistic: JB = (N/6)[S² + (K − 3)²/4], which under H0 is approximately χ²(2)-distributed. Decision: reject H0 if the value of JB is beyond the critical value from χ²(2) for the chosen α or, simply, if the p-value is less than or equal to α. Conclusion: if H0 is rejected, it is unlikely that the errors are normal. If H0 is not rejected, we cannot say we accept H0 as the truth, but we have more confidence in the assumption that the errors are normal.
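The JB statistic is simple enough to compute by hand from the residuals; this sketch does so (using `e_hat` from the first snippet) and cross-checks against scipy's built-in test:

```python
# Sketch: Jarque-Bera statistic from sample skewness and kurtosis of the
# OLS residuals, compared with scipy.stats.jarque_bera.
import numpy as np
from scipy import stats

e = e_hat                               # OLS residuals from the earlier fit
S = stats.skew(e)                       # sample skewness (normal => 0)
K = stats.kurtosis(e, fisher=False)     # sample kurtosis (normal => 3)
N = len(e)
JB = (N / 6) * (S ** 2 + (K - 3) ** 2 / 4)
p = 1 - stats.chi2.cdf(JB, df=2)        # approximate chi-square(2) p-value
print(f"JB = {JB:.3f}, p = {p:.3f}")
print(stats.jarque_bera(e))             # should agree with the hand computation
```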
Example food expend. cont. Hypotheses: H0: errors are normal; H1: not H0. Test statistic: JB [value not reproduced]. Decision rule: reject H0 if JB exceeds the χ²(2) critical value (5.99 at α = 0.05). Decision: do not reject H0. Conclusion: we cannot reject the hypothesis of normally distributed errors with our data and assumptions. Although, based on this inability to reject normality, we can't claim the errors are normal (i.e., we can't accept H0), we gain more confidence in making the assumption of normality, if we need it.
Prediction and Functional Forms When making predictions, one must remember the units of measurement and the functional form used. For example, in the case of the log-linear regression model, the fitted regression line predicts ln(y), i.e., ln(ŷ) = b1 + b2x, but we need to predict y! How can we get one from the other? A natural predictor of y is ŷn = exp(b1 + b2x). However, if assumption SR6 holds, then a better predictor is the corrected predictor ŷc = exp(b1 + b2x + σ̂²/2) = ŷn·exp(σ̂²/2).
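A sketch of both predictors, using the `res_log` fit from the generalised-R² snippet and a hypothetical x0 = 20:

```python
# Sketch: natural vs corrected prediction of y from a log-linear fit.
import numpy as np

x0 = 20.0
lny0_hat = res_log.params[0] + res_log.params[1] * x0   # predicted ln(y)
sigma2_hat_log = res_log.mse_resid                      # estimated error variance
y_nat = np.exp(lny0_hat)                                # natural predictor
y_corr = np.exp(lny0_hat + sigma2_hat_log / 2)          # corrected predictor
print(y_nat, y_corr)                                    # y_corr > y_nat always
```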
Example The estimated log-linear model: [estimates not reproduced]. The natural prediction of y given, for example, x = 20 is ŷn = exp(b1 + b2·20); a better (corrected) prediction is ŷc = ŷn·exp(σ̂²/2).
Prediction Intervals for Log-Linear Models For the purpose of prediction intervals in a log-linear model, it is easier to use the natural predictor (because the corrected predictor includes the estimated error variance, making the t-distribution no longer applicable!): construct the prediction interval in the usual manner on the log scale, then take antilogs of its endpoints. Specifically, if SR6 (normality) is correct (or N is large), then a 100(1 − α)% prediction interval for y0 is [exp(ln(ŷ0) − tc·se(f)), exp(ln(ŷ0) + tc·se(f))], where tc = t(1−α/2, N−2) is the value that leaves an area of α/2 in the right tail of a t-distribution with N − 2 degrees of freedom.
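A sketch of the antilog construction, again with `res_log` from above and an illustrative x0 = 20: the interval is built on the log scale, then both endpoints are exponentiated.

```python
# Sketch: 95% prediction interval for y in a log-linear model -- interval on
# the ln(y) scale, then antilogs of the endpoints.
import numpy as np

pred = res_log.get_prediction(np.array([[1.0, 20.0]]))  # [constant, x0 = 20]
frame = pred.summary_frame(alpha=0.05)                  # interval for ln(y0)
lo = np.exp(frame["obs_ci_lower"].iloc[0])
hi = np.exp(frame["obs_ci_upper"].iloc[0])
print(f"95% prediction interval for y0: [{lo:.2f}, {hi:.2f}]")
```

Note that the resulting interval for y0 is asymmetric about the natural predictor, as expected after the antilog transformation.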