Basic Econometrics in Transportation: Heteroscedasticity

Outline
What is the nature of heteroscedasticity?
What are its consequences?
How does one detect it?
What are the remedial measures?

Amir Samimi
Civil Engineering Department, Sharif University of Technology
Primary Source: Basic Econometrics (Gujarati)

Nature of Heteroscedasticity
An important assumption of the classical linear regression model (CLRM) is that $E(u_i^2) = \sigma^2$ for every $i$.
This is the assumption of equal (homo) spread (scedasticity). Under heteroscedasticity the variance differs across observations: $E(u_i^2) = \sigma_i^2$.
Example: higher-income families save more on average than lower-income families, but their savings are also more variable.

Possible Reasons
1. As people learn, their errors of behavior become smaller over time. As the number of hours of typing practice increases, the average number of typing errors and their variance both decrease.
2. As incomes grow, people have more choices about the disposition of their income. Rich people have more choices about their savings behavior.
3. As data-collecting techniques improve, $\sigma_i^2$ is likely to decrease. Banks that have sophisticated data-processing equipment are likely to commit fewer errors.

Possible Reasons
4. Heteroscedasticity can arise when there are outliers, that is, observations that differ greatly from the other observations in the sample.
5. Heteroscedasticity arises when the model is not correctly specified. Very often, what looks like heteroscedasticity is due to important variables being omitted from the model.
6. Skewness in the distribution of a regressor is another source. The distribution of income and wealth in most societies is uneven, with the bulk of income and wealth owned by a few at the top.
7. Other sources of heteroscedasticity:
   Incorrect data transformation (ratio or first-difference transformations).
   Incorrect functional form (linear versus log-linear models).

Cross-sectional and Time Series Data
Heteroscedasticity is likely to be more common in cross-sectional than in time series data.
In cross-sectional data, one usually deals with members of a population at a given point in time. These members may differ in size, income, etc.
In time series data, the variables tend to be of similar orders of magnitude because one generally collects the data for the same entity over a period of time.

OLS Estimation with Heteroscedasticity
OLS estimator and its variance when $E(u_i^2) = \sigma_i^2$ (with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$):
$\hat{\beta}_2 = \sum x_i y_i / \sum x_i^2$
$\mathrm{var}(\hat{\beta}_2) = \sum x_i^2 \sigma_i^2 / (\sum x_i^2)^2$
Under homoscedasticity ($\sigma_i^2 = \sigma^2$) this variance reduces to the usual $\sigma^2 / \sum x_i^2$.
Is the OLS estimator still BLUE when we drop only the homoscedasticity assumption?
We can easily prove that it is still linear and unbiased.
We can also show that it is a consistent estimator.
It is no longer best: the minimum variance is not given by the variance formula above.
What is BLUE in the presence of heteroscedasticity?

Method of Generalized Least Squares
Ideally, we would like to give less weight to the observations coming from populations with greater variability.
Consider: $Y_i = \beta_1 + \beta_2 X_i + u_i = \beta_1 X_{0i} + \beta_2 X_i + u_i$, where $X_{0i} = 1$ for every $i$.
Assume the heteroscedastic variances $\sigma_i^2$ are known and divide through by $\sigma_i$:
$Y_i/\sigma_i = \beta_1 (X_{0i}/\sigma_i) + \beta_2 (X_i/\sigma_i) + u_i/\sigma_i$
The variance of the transformed disturbance term $u_i^* = u_i/\sigma_i$ is now homoscedastic:
$\mathrm{var}(u_i^*) = E(u_i/\sigma_i)^2 = E(u_i^2)/\sigma_i^2 = 1$
Apply OLS to the transformed model to get BLUE estimators.
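The GLS transformation above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the simulated data, the true coefficients (2 and 3), and the assumption that $\sigma_i = 0.5 X_i$ is known are all hypothetical, and the helper names are made up for this example.

```python
import random

def ols_two_regressors_no_intercept(z1, z2, y):
    """OLS of y on z1 and z2 (no separate intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    s1y = sum(a * c for a, c in zip(z1, y))
    s2y = sum(b * c for b, c in zip(z2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def gls_known_sigma(x, y, sigma):
    """Divide Y_i, X_0i = 1 and X_i by sigma_i, then apply OLS to the transformed model."""
    y_star = [yi / si for yi, si in zip(y, sigma)]
    x0_star = [1.0 / si for si in sigma]
    x1_star = [xi / si for xi, si in zip(x, sigma)]
    return ols_two_regressors_no_intercept(x0_star, x1_star, y_star)

# Hypothetical simulated data: true beta1 = 2, beta2 = 3, sd proportional to X.
random.seed(0)
x = [1.0 + 9.0 * i / 199 for i in range(200)]
sigma = [0.5 * xi for xi in x]  # assumed known, as the GLS method requires
y = [2.0 + 3.0 * xi + random.gauss(0.0, si) for xi, si in zip(x, sigma)]

b1_gls, b2_gls = gls_known_sigma(x, y, sigma)
# Plain OLS for comparison: regress y on X_0 = 1 and X without any weighting.
b1_ols, b2_ols = ols_two_regressors_no_intercept([1.0] * len(x), x, y)
print(f"GLS: b1 = {b1_gls:.3f}, b2 = {b2_gls:.3f}")
print(f"OLS: b1 = {b1_ols:.3f}, b2 = {b2_ols:.3f}")
```

Both estimators are unbiased, so in a single sample the two sets of estimates are usually close; the gain from GLS is a smaller sampling variance, which shows up only over repeated samples.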

GLS Estimators
Minimize $\sum w_i \hat{u}_i^2 = \sum w_i (Y_i - \hat{\beta}_1^* - \hat{\beta}_2^* X_i)^2$, where $w_i = 1/\sigma_i^2$.
Following the standard calculus techniques, we have:
$\hat{\beta}_2^* = \dfrac{(\sum w_i)(\sum w_i X_i Y_i) - (\sum w_i X_i)(\sum w_i Y_i)}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2}$
$\mathrm{var}(\hat{\beta}_2^*) = \dfrac{\sum w_i}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2}$

Consequences of Using OLS
The usual OLS estimator of the variance is a biased estimator of the true variance.
On average it overestimates or underestimates the true variance.
We cannot tell whether the bias is positive or negative.
We can no longer rely on the conventional confidence intervals and t and F tests.
If we persist in using the usual testing procedures despite heteroscedasticity, whatever conclusions we draw may be very misleading.
Heteroscedasticity is potentially a serious problem, and the researcher needs to know whether it is present in a given situation.

Detection
There are no hard-and-fast rules for detecting heteroscedasticity, only a few rules of thumb.
This is inevitable because $\sigma_i^2$ can be known only if we have the entire Y population corresponding to the chosen X's.
More often than not, there is only one sample Y value corresponding to a particular value of X, and there is no way to know $\sigma_i^2$ from just one Y observation.
Thus, detecting heteroscedasticity may be a matter of intuition, educated guesswork, or prior empirical experience.
Most of the detection methods are based on examination of the OLS residuals $\hat{u}_i$.
The residuals are what we observe, not the disturbances $u_i$. We hope they are good estimates.
This hope may be fulfilled if the sample size is fairly large.

Informal Methods
Nature of the Problem
The nature of the problem may suggest that heteroscedasticity is likely to be encountered.
Example: the residual variance around the regression of consumption on income increases with income.
Graphical Method
The squared residuals $\hat{u}_i^2$ are plotted against the estimated $\hat{Y}_i$.
Is the estimated mean value of Y systematically related to the squared residual?
Panel (a): no systematic pattern, so perhaps no heteroscedasticity.
Panels (b) to (e): a definite pattern, so perhaps no homoscedasticity.
Using such knowledge, one may transform the data to alleviate the problem.

Park Test
Park formalizes the graphical method by suggesting a log-linear model:
$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$
Since $\sigma_i^2$ is generally unknown, Park suggests using $\hat{u}_i^2$ as a proxy:
$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i = \alpha + \beta \ln X_i + v_i$
If $\beta$ turns out to be statistically insignificant, the homoscedasticity assumption may be accepted.
The particular functional form chosen by Park is only suggestive.
Note: the error term $v_i$ may not satisfy the OLS assumptions.

Glejser Test
Glejser suggests regressing the absolute value of the estimated error term, $|\hat{u}_i|$, on the X variable.
The following functional forms are suggested:
$|\hat{u}_i| = \beta_1 + \beta_2 X_i + v_i$
$|\hat{u}_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i$
$|\hat{u}_i| = \beta_1 + \beta_2 (1/X_i) + v_i$
$|\hat{u}_i| = \beta_1 + \beta_2 (1/\sqrt{X_i}) + v_i$
$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$
$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$
For large samples the first four generally give satisfactory results.
The last two models are nonlinear in the parameters.
Note: some have argued that $v_i$ does not have a zero expected value, is serially correlated, and is itself heteroscedastic.

Spearman's Rank Correlation Test
Fit the regression to the data on Y and X and obtain the residuals.
Rank both the absolute values of the residuals and $X_i$ (or the estimated $\hat{Y}_i$), and compute Spearman's rank correlation coefficient:
$r_s = 1 - 6 \left[ \sum d_i^2 / (n(n^2 - 1)) \right]$
where $d_i$ is the difference in the ranks for the $i$th observation.
Assuming that the population rank correlation coefficient is zero and $n > 8$, the significance of the sample $r_s$ can be tested by the t test, with df = $n - 2$:
$t = r_s \sqrt{n - 2} / \sqrt{1 - r_s^2}$
If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity.

Goldfeld-Quandt Test
Rank the observations according to their $X_i$ values.
Omit $c$ central observations, and divide the remaining observations into two groups of $(n - c)/2$ observations each.
Fit separate OLS regressions to the first and last sets of observations, and obtain the residual sums of squares $RSS_1$ and $RSS_2$.
Compute the ratio
$\lambda = \dfrac{RSS_2/df}{RSS_1/df}$, with $df = (n - c)/2 - k = (n - c - 2k)/2$,
where $k$ is the number of parameters estimated, including the intercept.
If the $u_i$ are assumed to be normally distributed, and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ follows the F distribution with $(n - c - 2k)/2$ df in both the numerator and the denominator.
The power of the test depends on how $c$ is chosen.
Goldfeld and Quandt suggest $c = 8$ if $n = 30$ and $c = 16$ if $n = 60$.
Judge et al. note that $c = 4$ if $n = 30$ and $c = 10$ if $n$ is about 60.
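The Goldfeld-Quandt steps can be sketched for the two-variable model as follows. This is an illustrative sketch under stated assumptions: the function names, the simulated data, and the choice of error standard deviation proportional to X are hypothetical; only the procedure and the $n = 30$, $c = 4$ setting come from the slides.

```python
import random

def simple_ols_rss(x, y):
    """Residual sum of squares from OLS of y on a constant and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, c, k=2):
    """Return (lambda, df) with lambda = (RSS2/df) / (RSS1/df) and df = (n - c - 2k) / 2."""
    pairs = sorted(zip(x, y), key=lambda p: p[0])  # rank observations by X
    n = len(pairs)
    if (n - c) % 2 != 0:
        raise ValueError("n - c must be even so that both groups have the same size")
    m = (n - c) // 2                               # size of each group
    low, high = pairs[:m], pairs[n - m:]           # the c central observations are dropped
    rss1 = simple_ols_rss([p[0] for p in low], [p[1] for p in low])
    rss2 = simple_ols_rss([p[0] for p in high], [p[1] for p in high])
    df = m - k
    return (rss2 / df) / (rss1 / df), df

# Hypothetical data: n = 30, error sd proportional to X.
random.seed(1)
x = [float(i) for i in range(1, 31)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

lam, df = goldfeld_quandt(x, y, c=4)
print(f"lambda = {lam:.2f} with ({df}, {df}) df")
```

The computed $\lambda$ is then compared with the critical F value for (df, df) degrees of freedom from a table; for (11, 11) df at the 5 percent level that value is about 2.82.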

Breusch-Pagan-Godfrey Test
The success of the GQ test depends on $c$ and on the X variable by which the observations are ordered. The BPG test avoids these limitations:
Estimate $Y_i = \beta_1 + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$ by OLS and obtain the residuals $\hat{u}_i$.
Obtain $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$ (the ML estimator of $\sigma^2$).
Construct variables $p_i$ defined as $p_i = \hat{u}_i^2 / \tilde{\sigma}^2$.
Regress $p_i$ on the Z's as $p_i = \alpha_1 + \alpha_2 Z_{2i} + \dots + \alpha_m Z_{mi} + v_i$
  o $\sigma_i^2$ is assumed to be a linear function of the Z's.
  o Some or all of the X's can serve as Z's.
Obtain the ESS (explained sum of squares) from this regression and define $\Theta = 0.5 \, ESS$.
Assuming the $u_i$ are normally distributed, one can show that if there is homoscedasticity and if the sample size $n$ increases indefinitely, then $\Theta \overset{asy}{\sim} \chi^2_{m-1}$.
The BPG test is an asymptotic, or large-sample, test.

White's General Heteroscedasticity Test
It does not rely on the normality assumption and is easy to implement.
Estimate $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ and obtain the residuals $\hat{u}_i$.
Run the following auxiliary regression:
$\hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i$
Higher powers of the regressors can also be introduced.
Under the null hypothesis (homoscedasticity), if the sample size $n$ increases indefinitely, it can be shown that
$n R^2 \overset{asy}{\sim} \chi^2_{df}$, where df = number of regressors (excluding the constant) in the auxiliary regression.
If the chi-square value exceeds the critical value, the conclusion is that there is heteroscedasticity.
If it does not, there is no heteroscedasticity, which is to say that $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$.
It has been argued that if cross-product terms are present, then it is a test of both heteroscedasticity and specification bias.

Remedial Measures
Heteroscedasticity does not destroy unbiasedness and consistency.
But OLS estimators are no longer efficient, not even asymptotically.
There are two approaches to remediation: when $\sigma_i^2$ is known, and when $\sigma_i^2$ is not known.
When $\sigma_i^2$ is known:
The most straightforward method of correcting heteroscedasticity is weighted least squares.
The WLS method provides BLUE estimators.
When $\sigma_i^2$ is unknown:
Is there a way of obtaining consistent estimates of the variances and covariances of OLS estimators even if there is heteroscedasticity?
The answer is yes.
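As an illustration of the Breusch-Pagan-Godfrey steps described earlier in this section, here is a minimal sketch for the two-variable model. It assumes a single Z equal to X (so $m = 2$ and the statistic has 1 df); the function names and the simulated data are hypothetical and not taken from the slides.

```python
import random

def simple_ols(x, y):
    """Return (b1, b2) from OLS of y on a constant and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

def breusch_pagan_godfrey(x, y):
    """BPG statistic Theta = ESS / 2 for Y on X, using Z = X (an assumption of this sketch)."""
    n = len(x)
    b1, b2 = simple_ols(x, y)
    resid_sq = [(yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)]
    sigma2_ml = sum(resid_sq) / n              # ML estimator: divide by n, not n - 2
    p = [r / sigma2_ml for r in resid_sq]      # p_i = squared residual / sigma2_ml
    a1, a2 = simple_ols(x, p)                  # auxiliary regression of p on Z = X
    pbar = sum(p) / n
    ess = sum((a1 + a2 * xi - pbar) ** 2 for xi in x)  # explained sum of squares
    return 0.5 * ess

# Hypothetical data: error sd proportional to X.
random.seed(2)
x = [1.0 + 9.0 * i / 199 for i in range(200)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

theta = breusch_pagan_godfrey(x, y)
CHI2_1DF_5PCT = 3.8415                         # 5 percent critical value of chi-square with 1 df
print(f"Theta = {theta:.2f}, exceeds 5% critical value: {theta > CHI2_1DF_5PCT}")
```

With more than one Z variable the auxiliary regression becomes a multiple regression and the degrees of freedom become $m - 1$; this sketch keeps a single Z only so that closed-form simple-regression formulas suffice.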

White's Correction
White has suggested a procedure by which asymptotically valid statistical inferences can be made about the true parameter values.
Several computer packages present White's heteroscedasticity-corrected variances and standard errors along with the usual OLS variances and standard errors.
White's heteroscedasticity-corrected standard errors are also known as robust standard errors.

White's Procedure
For the two-variable regression model $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ we showed:
$\mathrm{var}(\hat{\beta}_2) = \sum x_i^2 \sigma_i^2 / (\sum x_i^2)^2$
White has shown that $\sum x_i^2 \hat{u}_i^2 / (\sum x_i^2)^2$ is a consistent estimator of $\mathrm{var}(\hat{\beta}_2)$.
For $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$ we have:
$\mathrm{var}(\hat{\beta}_j) = \sum \hat{w}_{ji}^2 \hat{u}_i^2 / (\sum \hat{w}_{ji}^2)^2$
$\hat{u}_i$ are the residuals obtained from the original regression.
$\hat{w}_{ji}$ are the residuals obtained from the auxiliary regression of the regressor $X_j$ on the remaining regressors.

Example
Y = per capita expenditure on public schools by state in 1979
Income = per capita income by state in 1979
On the basis of the usual OLS standard errors, both regressors are statistically significant at the 5 percent level, whereas on the basis of the White estimators they are not.
Since robust standard errors are now available in established regression packages, it is recommended to report them.
The WHITE option can be used to compare the output with the regular OLS output as a check for heteroscedasticity.

Reasonable Heteroscedasticity Patterns
Apart from being a large-sample procedure, one drawback of the White procedure is that the estimators thus obtained may not be as efficient as those obtained by methods that transform the data to reflect specific types of heteroscedasticity.
We may consider several assumptions about the pattern of heteroscedasticity.
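White's two-variable variance formula from the procedure slide can be compared with the usual OLS variance in a short sketch. The function name and the simulated data are hypothetical; the example covers only the slope of the two-variable model, not the general $k$-variable case.

```python
import math
import random

def slope_with_variances(x, y):
    """Return (b2, usual OLS variance of b2, White variance of b2) for Y on a constant and X."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]                        # deviations x_i = X_i - mean(X)
    sxx = sum(d * d for d in dx)
    b2 = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(r * r for r in resid) / (n - 2)    # usual unbiased estimator of sigma^2
    var_ols = sigma2_hat / sxx                          # valid only under homoscedasticity
    var_white = sum(d * d * r * r for d, r in zip(dx, resid)) / sxx ** 2
    return b2, var_ols, var_white

# Hypothetical data: error sd proportional to X.
random.seed(3)
x = [1.0 + 9.0 * i / 199 for i in range(200)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

b2, var_ols, var_white = slope_with_variances(x, y)
print(f"b2 = {b2:.3f}")
print(f"usual OLS se  = {math.sqrt(var_ols):.4f}")
print(f"White robust se = {math.sqrt(var_white):.4f}")
```

The point estimate is the same in both cases; only the standard error, and therefore the t statistic, changes. Regression packages often apply small-sample adjustments to the White estimator, so their reported robust standard errors may differ slightly from this unadjusted version.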

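Reasonable Heteroscedasticity Patterns
Assumption 1: if $E(u_i^2) = \sigma^2 X_i^2$, divide the original model through by $X_i$.
Assumption 2: if $E(u_i^2) = \sigma^2 X_i$, divide the original model through by $\sqrt{X_i}$.
Assumption 3: if $E(u_i^2) = \sigma^2 [E(Y_i)]^2$, divide the original model through by the estimated $\hat{Y}_i$.
Assumption 4: A log transformation such as $\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$ very often reduces heteroscedasticity.

Homework 5
Basic Econometrics (Gujarati, 2003)
1. Chapter 11, Problem 15 [50 points]
2. Chapter 11, Problem 16 [50 points]
Assignment weight factor = 0.5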