Practical Regression: Noise, Heteroskedasticity, and Grouped Data
DAVID DRANOVE

This is one in a series of notes entitled Practical Regression. These notes supplement the theoretical content of most statistics texts with practical advice on solving real-world empirical problems through regression analysis.

Introduction to Noisy Variables

A variable is noisy if it does not exactly equal the variable of interest (the one that best fits what the theory demands) or if it is mismeasured. Here are some examples:

- You want to measure the impact of product-level advertising on product sales. You have data on firms' total advertising budgets. To estimate product-level budgets, you divide the total budget by the number of products. Your measure of product-level advertising is noisy.
- You want to determine if inventory turnaround is faster in firms that use just-in-time (JIT) inventory techniques. You survey logistics managers to get information about inventory turnaround. The busy managers provide rough, and therefore noisy, estimates.
- Continuing this investigation of inventory turnaround, you next study whether turnaround times differ by nation. Using the survey responses, you compute the average turnaround in each nation. The number of survey respondents ranges from seventy-five in the United States to two in Chile. Based on the law of large numbers, you know that the seventy-five U.S. firms in the sample are fairly representative of the United States as a whole. But you feel that the two Chilean responses may not accurately reflect all Chilean firms. Your measure of nation-level turnaround times is noisy, especially for nations with few sample respondents.

The first part of this note describes the implications of noisy variables and suggests possible ways to deal with them.
The second part of this note discusses problems that arise when the error term does not satisfy the ordinary least squares (OLS) assumptions of homoskedasticity and independence.

© by the Kellogg School of Management, Northwestern University. This technical note was prepared by Professor David Dranove. Technical notes are developed solely as the basis for class discussion. Technical notes are not intended to serve as endorsements, sources of primary data, or illustrations of effective or ineffective management. To order copies or request permission to reproduce materials, call or email cases@kellogg.northwestern.edu. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the permission of the Kellogg School of Management.
TECHNICAL NOTE: NOISE, HETEROSKEDASTICITY, AND GROUPED DATA

Implications of Noisy Variables

It is not always easy to determine which variables are noisy. After all, the best way to know if a variable is noisy is to compare it with an accurate measure. But if you had an accurate measure, you would not need the noisy one. It is sometimes possible to apply statistical common sense to determine whether variables are noisy. For now, we will suppose that we know when a variable is noisy and discuss what that means for the analysis.

Noisy Dependent Variables

Here are two key facts:

- Coefficients obtained from OLS regressions with noisy dependent variables are unbiased. This implies that your predictions are also unbiased.
- Coefficients obtained from OLS regressions with noisy dependent variables are estimated less precisely (i.e., the standard errors increase). Thus, your predictions are less accurate.

These statements are readily confirmed. Suppose that the true model relating X to Y is:

(1) Y = B0 + B1X + ε_y

where ε_y is normally distributed. Suppose further that you do not have an accurate measure of Y. Instead, you have:

(2) Z = Y + ε_z

where ε_z is a normally distributed noise term that is independent of ε_y.[1] Substituting from (2) into (1) yields:

(3) Z = B0 + B1X + (ε_y + ε_z)

This is a regression equation. In fact, the only difference between equations (1) and (3) is that the error term is larger in equation (3) ((ε_y + ε_z) versus ε_y).[2] This implies that the standard errors on B0 and B1 are larger when you use Z as the dependent variable. This causes the standard errors of any predictions to increase as well.

Noisy Predictor Variables

Things are a bit different when the predictor variables are noisy. Let's see what happens when X is noisy. Suppose that the true model is:

[1] In general, you do not know the precise nature of the noise. Assuming that it is normally distributed is usually a good approximation and makes the math much easier.
[2] Recall that the sum of two normally distributed variables is also normal. Thus, ε_y + ε_z is normal, so that equation (3) is a standard OLS regression model.
(4) Y = B0 + B1X + ε_y

Suppose that you cannot measure X with precision. Instead, you measure:

(5) Q = X + ε_x

where ε_x is normally distributed and independent of ε_y. We won't derive it here, but the estimated B1 will tend toward the following value:

(6) Estimated B1 = (True B1) / (1 + σ²_x/σ²_y)

where σ²_x and σ²_y are the variances of ε_x and ε_y, respectively. Noting that the denominator is larger than 1, we conclude that the estimate of B1 is biased toward zero.[3] This is known as attenuation bias. The degree of attenuation bias depends on the relative values of σ²_x and σ²_y. If σ²_x is large relative to σ²_y (i.e., X is measured with a lot of noise relative to the regression error), then the bias can be quite large.

Most of the time, you should not be overly concerned about attenuation bias. It is inevitable that you will measure some predictor variables with error. If the measurement errors are relatively small, the bias is small as well. Moreover, if you are mainly interested in hypothesis testing as opposed to examining magnitudes, then the bias is of the right type. That is, if the estimated B1 is statistically significant when you have measurement error, then the true B1 would be larger and likely more significant if you could eliminate that error.

Heteroskedasticity

A key assumption of OLS regression is that the errors for all observations are distributed identically. In other words, you expect the model to give equally precise predictions for all observations. Recalling that the OLS regression residuals are unbiased estimates of the errors in the underlying regression equation, we expect that any variation in the residuals from one observation to another must be completely random. This requirement is violated if the magnitude of the residuals is correlated with some factor Z.[4] It does not matter whether Z is in your model.
For example, your residuals may be large in magnitude whenever Z is large, and small in magnitude whenever Z is small. If the magnitude of the residuals is correlated with any factor Z, then your model suffers from heteroskedasticity. When you have heteroskedasticity, the OLS standard errors are incorrect.

[3] If the true value of B1 is positive, the computer will report an estimate of B1 that is a smaller positive number. Similarly, if the true value is negative, the computer will report a smaller (in magnitude) negative number.

[4] Remember, the error is the ε in the underlying model. The residual is the difference between the actual and predicted values. The two are not the same due to the randomness of the process that generates your data. Even so, the residual is your best estimate of the actual error.
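The attenuation result is easy to check by simulation. The Python sketch below is not part of the note (which uses Stata), and all numbers are hypothetical: it fits the same simple regression twice, once with the true X and once with a noisy measure Q, and the noisy-predictor slope is pulled toward zero.

```python
import random

# Illustration only: attenuation bias from a noisy predictor.
# True model: Y = B0 + B1*X + e_y; we observe Q = X + e_x instead of X.
random.seed(1)

B0, B1 = 2.0, 1.5
n = 20_000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [B0 + B1 * x + random.gauss(0, 1) for x in X]
Q = [x + random.gauss(0, 1) for x in X]   # noisy measure of X

def ols_slope(x, y):
    """Closed-form simple-regression slope: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

b_clean = ols_slope(X, Y)   # close to the true B1
b_noisy = ols_slope(Q, Y)   # attenuated toward zero
print(round(b_clean, 2), round(b_noisy, 2))
```

Here the measurement noise is as variable as X itself, so the noisy-predictor slope settles at roughly half the true value; shrinking the noise variance shrinks the attenuation.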
Testing for Heteroskedasticity

Heteroskedasticity can arise in any number of ways, and there is no universal test for it. We can illustrate the problem by examining the relationship between sales and price of yogurt. Our data contain eighty-eight weeks of sales and pricing information on yogurt. The key dependent variable, labeled sales1 in the data set, gives the number of yogurt containers sold in a week. The key predictor, price1, is the price of yogurt in dollars per ounce. The variable promo1 indicates whether the yogurt is promoted in a special display case. We run:

regress sales1 price1 promo1

(Regression output not shown in this transcription.)

One way to test the assumption of homoskedasticity is to perform an interocular (eyeball) test. Plot the residuals against the predicted values, or plot the absolute values of the residuals against the predicted values.
The residuals seem to show greater variance when the predicted values are larger (look at the wide range of residuals when the fitted values are around 9,000). This is evidence of heteroskedasticity.

Eyeballing the data makes us suspicious of heteroskedasticity, but we can also perform statistical tests. Specifically, we can test specific hypotheses about the residuals. Remember, heteroskedasticity can arise in countless ways, so there are countless tests we can perform. In practice, econometricians limit their toolkits to just a few tests. The most common test for heteroskedasticity is the Breusch-Pagan (BP) test. To perform the BP test, you regress the squared residuals on the predictor variables in the original regression and then perform a joint (F) test of all the predictors in that second regression:

Step 1: Perform the regression: Y = B0 + B_x X + e
Step 2: Regress the squared residuals on the predictors: e² = γ0 + γ_x X + v
Step 3: Perform a joint (F) test on the γ_x

Fortunately, Stata performs this test in one command (hettest) following the regression. The test statistic (output not shown) reveals that we can reject the null hypothesis of constant variance of the residuals, which is tantamount to rejecting the null hypothesis of homoskedasticity. Thus, the regression suffers from heteroskedasticity.
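For a single predictor, the BP mechanics can be sketched directly. The Python example below uses simulated data (not the yogurt data from the note): it regresses the squared residuals on the predictor and forms the LM version of the statistic, n·R², which is approximately chi-squared with one degree of freedom under homoskedasticity (5 percent critical value about 3.84).

```python
import random

# Sketch of the Breusch-Pagan idea for one predictor (illustrative data).
random.seed(2)

n = 2000
x = [random.uniform(0.0, 3.0) for _ in range(n)]
# Error spread grows with x, so the data are heteroskedastic by design.
y = [1.0 + 2.0 * x_i + random.gauss(0, 1) * x_i for x_i in x]

def ols(xv, yv):
    """Return (intercept, slope) of a simple OLS regression."""
    mx, my = sum(xv) / len(xv), sum(yv) / len(yv)
    b1 = sum((a - mx) * (b - my) for a, b in zip(xv, yv)) / \
         sum((a - mx) ** 2 for a in xv)
    return my - b1 * mx, b1

b0, b1 = ols(x, y)
e2 = [(y_i - b0 - b1 * x_i) ** 2 for x_i, y_i in zip(x, y)]

# Auxiliary regression: squared residuals on x. Its R^2 measures how
# much of the residual spread the predictor explains.
g0, g1 = ols(x, e2)
mean_e2 = sum(e2) / n
ss_tot = sum((e - mean_e2) ** 2 for e in e2)
ss_res = sum((e - g0 - g1 * xi) ** 2 for e, xi in zip(e2, x))
lm = n * (1 - ss_res / ss_tot)
print("LM statistic:", round(lm, 1))   # far above the 5% cutoff of 3.84
```

Because the simulated error spread rises with x, the auxiliary slope is strongly positive and the LM statistic rejects homoskedasticity decisively.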
Fixing the Problem

So you have heteroskedasticity and you don't know what to do about it. You certainly can't ignore it. Stata (and all regression software) computes standard errors and performs t-tests under the assumption of homoskedastic residuals. If you have heteroskedasticity, your standard errors and significance tests are incorrect.

Most often, the cause of heteroskedasticity is a misspecified model. Misspecification can occur if we have omitted predictors or if we should have transformed the dependent and/or independent variables, for example by taking logs. We have discussed omitted variables in a previous technical note.[5]

Sometimes regressions remain heteroskedastic despite the best specifications, and your standard errors are still incorrect. (If your model is well specified, however, your coefficients are unbiased.) The standard solution to heteroskedasticity in well-specified regressions is the White correction (attributed to Halbert White). The White correction adjusts the variance-covariance matrix that is used to compute the standard errors. The technical details are not important, but there are a few things you should note:

- The White method corrects for heteroskedasticity without altering the regression coefficients.
- If the data are homoskedastic, the White-corrected standard errors are equivalent to the standard errors from OLS. (In practice, there are always at least small differences.)
- There are numerous modifications to the White correction, so different software packages may yield slightly different results.
- The White correction can be applied to models estimated via maximum likelihood techniques.

Getting White-corrected standard errors (sometimes known as "whitewashing") is very simple in Stata. Just repeat your regression and add ,robust:

regress sales1 price1 promo1, robust

[5] David Dranove, Practical Regression: Introduction to Endogeneity: Omitted Variable Bias, Case # (Kellogg School of Management, 2012).
Note the following:

- The regression coefficients are unchanged; thus, the regression R² is unchanged.
- The corrected standard errors (now called "robust standard errors") are different from those in the original regression.
- In this case, the corrected standard errors are larger than before. This is typical, but it does not always occur.
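For a single predictor, the White calculation itself fits in a few lines. The Python sketch below is not from the note and uses simulated data: conventional OLS pools all squared residuals into one variance estimate, while the robust (HC0) formula lets each observation's squared residual enter with its own leverage.

```python
import math
import random

# Conventional vs. White (HC0) standard errors on heteroskedastic data.
random.seed(3)

n = 2000
x = [random.uniform(0.0, 1.0) for _ in range(n)]
# Error spread rises sharply with x -> heteroskedastic by construction.
y = [1.0 + 2.0 * x_i + random.gauss(0, 1) * 5.0 * x_i ** 2 for x_i in x]

mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
b0 = my - b1 * mx
resid = [y_i - b0 - b1 * x_i for x_i, y_i in zip(x, y)]

# Conventional OLS standard error of the slope: s^2 / Sxx.
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_ols = math.sqrt(s2 / sxx)

# White (HC0) standard error: sum of (x_i - xbar)^2 * e_i^2, over Sxx^2.
meat = sum(((a - mx) ** 2) * (e ** 2) for a, e in zip(x, resid))
se_robust = math.sqrt(meat / sxx ** 2)

print(round(se_ols, 4), round(se_robust, 4))
```

Because the large squared residuals here coincide with high-leverage observations, the robust standard error comes out larger than the conventional one, just as in the note's example.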
One final point: the simple regression I have been working with is badly misspecified. Before I did any test for heteroskedasticity, I should have worked harder to correct the specification by considering additional predictors, using fixed effects (for the grocery store), and using a logged dependent variable.

Using Weighted Least Squares to Correct Heteroskedasticity

The Breusch-Pagan test regresses the squared residuals on all of the predictors in the model. Sometimes the squared residuals are a function of a single predictor in the model; the BP test might capture this. Because the BP regression includes all the regressors in your model, however, it is a weaker test than if you had just regressed the squared residuals on that single predictor.[6] Because you hypothesize that only a single predictor matters, your test should be limited to that predictor.

A potentially more severe problem with the BP test is that the Z factor that causes heteroskedasticity might not be in your regression. If this is the case, the BP test will fail. This problem occurs in a wide class of regressions and can be fixed by using weighted least squares (WLS).

One important class of applications for which the weighting factor is easy to identify and essential to use occurs whenever the left-hand-side (LHS) variable is drawn from individual survey data that is aggregated up to the market level. That is a rich sentence with lots of content, so let's break it down:

1. You need to have survey data.
2. The survey data is used to construct the LHS variable.
3. The LHS variable is computed by aggregating individual responses to create a market-level mean.

If all three conditions hold (and they often do), then WLS is indicated. The following example should make things clearer. Suppose that you are studying determinants of television viewing in different cities.
You survey lots of viewers in lots of cities to find out about their viewing habits. Your unit of analysis is the city, so you compute citywide average viewing levels. In some cities, you may have just one or two responses. In others, you have fifty or one hundred responses. Simple statistics tells you that in those cities with a higher number of responses, the citywide averages you compute are pretty close to the actual averages for those cities (assuming you have a representative sample). In those cities with only one or two responses, however, the averages you compute may be very different from the citywide averages. Because the sample sizes are rather small in many cities, your LHS variable (estimated citywide viewership) is noisy.

But there is something predictable about the magnitude of the noise. A bit of statistics will show that if n_i is the number of respondents in city i, and e_i is the regression residual for city i, then the magnitude of e_i is proportional to 1/√n_i. This implies that you have heteroskedasticity, as e_i is systematically related to some factor (in this case, n_i).

[6] By adding insignificant predictors to the BP test regression, you decrease your chances of getting a significant result.
The BP test will not pick this up because sample size (n_i) is not a regressor and is therefore excluded from the test. A variant of the test can be used instead: simply regress the squared residuals on n_i.

To illustrate weighting, let's look at data on managed care organization (MCO) penetration. The dependent variable is the percentage of physician revenues derived from managed care insurers in each of 294 metropolitan areas. Predictors include the income, education (percentage with college degrees), and hospital concentration in each market. We will regress MCO penetration on income, percentage of population with college education, and a measure of hospital concentration in the market. Note that the dependent variable is derived from survey data and there are a different number of survey respondents in each market. I perform the standard tests for heteroskedasticity (output not shown).
Neither the plot nor the BP test indicates any problem with heteroskedasticity. But because I have survey data, I have my doubts. First, I plot the residuals against the number of survey respondents in each market. Note the classic funnel shape; the residuals get smaller as the number of respondents gets larger. Now perform the modified BP test, regressing the squared residuals on the number of respondents: I have heteroskedasticity.

To eliminate this problem, I need to get the computer to pay less attention to those cities with fewer respondents. Specifically, I will weight each observation by n and run a simple OLS regression. Weighting by n means that you multiply each and every value in your data set by √n before running the regression.
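The recipe can be sketched in a few lines of Python (toy numbers, not the MCO data from the note). Each city's dependent variable is an average of n_i individual responses, so its error variance shrinks with n_i; weighting by n_i (equivalently, multiplying the intercept column, x, and y by √n_i) noticeably tightens the slope estimates.

```python
import math
import random

# OLS vs. WLS when the dependent variable is a city-level survey mean.
random.seed(4)

def wls_slope(x, y, w):
    """WLS slope with intercept: solve the 2x2 weighted normal
    equations directly (w = 1 for every observation gives plain OLS)."""
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s_w * s_wxx - s_wx ** 2
    return (s_w * s_wxy - s_wx * s_wy) / det

B0, B1 = 1.0, 0.5
ols_est, wls_est = [], []
for _ in range(300):                          # 300 simulated samples
    x, y, n = [], [], []
    for _ in range(60):                       # 60 cities per sample
        n_i = random.choice([2, 5, 10, 50, 100])
        x_i = random.uniform(0, 1)
        # City mean of n_i individual responses, each with sd 2.
        noise = sum(random.gauss(0, 2) for _ in range(n_i)) / n_i
        x.append(x_i)
        y.append(B0 + B1 * x_i + noise)
        n.append(n_i)
    ols_est.append(wls_slope(x, y, [1.0] * len(x)))
    wls_est.append(wls_slope(x, y, [float(k) for k in n]))

def sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

print("sd of OLS slopes:", round(sd(ols_est), 3))
print("sd of WLS slopes:", round(sd(wls_est), 3))
```

Both estimators are centered on the true slope; the weighted one simply varies less from sample to sample, because it leans on the precisely measured large-n cities.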
Here is why this works. If you multiply everything by √n, then the error term for each observation is also multiplied by √n. This in turn implies that the squared errors are multiplied by n. Recall that OLS works hardest to fit the observations that contribute the most to the sum of squared errors. By multiplying the squared errors by n, you force the computer to do a good job of fitting the observations with the largest n's, which is exactly what you want.

But be careful. If the residuals are not exactly inversely proportional to the square root of the sample size, then WLS is not exactly the correct solution. Still, it is probably better to use a simple solution like WLS when it is theoretically sound than to perform ex post data picking in search of the best-fitting solution (i.e., picking out a significant result after the fact, then coming up with a theory to explain that result). Be warned, however, that widespread use of WLS can cause more problems than it solves.

Needless to say, there is a simple way to perform WLS in Stata. Note the addition of [w=number_surveyed] at the end of the regression command. Some observations:

- The results can be interpreted like any OLS results.
- The R² is lower than before. Ignore this. The original regression was heteroskedastic, so the standard errors used to compute the R² in the original regression were incorrect.
- The WLS model can still have general heteroskedasticity that can be detected using the BP test and corrected using the White correction.

REVIEW: A GUIDE TO WEIGHTING

You may want to put more weight on some observations than others. This is certainly the case if the errors are systematically smaller for some observations; these observations deserve more weight (e.g., when you aggregate survey data).
- You can check whether you need to do WLS by correlating the absolute value of the residuals with the potential weighting factor n or, better still, by running hettest n.
- WLS multiplies the LHS and right-hand-side (RHS) variables by √n, where n is the weighting factor. This weights the squared errors by n, which is what you want.
- It is easy to perform WLS in Stata; just add [w=n] at the end of your regression statement.
- Avoid using WLS unless absolutely justified by the nature of the data.

Summarizing Heteroskedasticity

- You have heteroskedasticity if the magnitude of the residuals is correlated with some factor, measured or not.
- Heteroskedasticity biases the standard errors, but the coefficients are unbiased.
- You can test for heteroskedasticity using the hettest command in Stata.
- The ,robust option corrects the standard errors in heteroskedastic OLS regressions.
- A common source of heteroskedasticity is the use of aggregated survey data. This can be corrected by using WLS.
- WLS and ,robust are not mutually exclusive.

Grouped Data

Another critical assumption of OLS is that all the observations are independent. This assumption is frequently violated in practice. A prime example is regression with grouped data. For example, you may run a regression of profits for firms in a variety of industries. It seems plausible that profits will be correlated for firms within any given industry.

Here is a more extreme example. Suppose you want to know if redheads are more popular than brunets. You have two friends named John and Paul. John has brown hair and Paul has red hair. At 1:00 p.m., you poll the class to see how many classmates like John more than Paul. You find that forty-five prefer John and fifteen prefer Paul. You repeat this poll at 1:10, 1:20, and so on. Your data look as follows:
Name    Hair Color    Popularity
John    B             45
Paul    R             15
John    B             46*
Paul    R             15
John    B             46
Paul    R             15
John    B             46
Paul    R             15
John    B             46
Paul    R             14*

*One student arrived late and another student left class early to go to a job interview.

Given these ten observations, you regress popularity on an indicator variable for hair color, where Hair = 1 if the hair is brown and Hair = 0 if red. The resulting coefficient is B_hair ≈ 30.5, which is statistically significant, thanks to the ten observations and the apparent nine degrees of freedom. Do you conclude that people with brown hair are more popular? Of course not. The reason you get a significant coefficient on hair color is that you do not have ten independent observations; you have two observations that are each repeated five times. The computer has no way of knowing this; it thinks you have lots of experiments and computes the standard errors accordingly.

This is an extreme example of groupiness in the data. If you do not account for the groupiness of your data, you will overstate the true degrees of freedom in your model, and the reported standard errors will be artificially small. You run a great risk of tricking yourself into thinking that you have significant findings when in reality you do not.

One way to deal with grouped data is to estimate fixed effects. In fixed effects models, the computer ignores across-group variation when estimating the coefficients. Thus, only within-group variation matters. (To determine the effect of hair color in the prior example, either John or Paul would need to change his hair from brown to red, or vice versa.)

There are times when you do not want to estimate fixed effects models. This is especially true if you do not have much within-group variation. For example, suppose you want to study the effect of market demographics on yogurt sales. The demographics of the communities surrounding the stores will change little over time.
If you include store fixed effects, you will not have sufficient within-store variation. You will have to omit the store dummies and rely on across-store variation. (You now run a heightened risk of omitted variable bias, of course, but if you have a rich set of demographics, this risk is minimized.)

Suppose you go ahead and omit the store fixed effects. It is now likely that the errors across observations within each store are no longer independent. You have grouped data, and if you don't account for it, your standard errors will be biased. The technique for adjusting the standard errors to account for groupiness is preprogrammed into Stata. Continuing the example, if you have data on the income of each store's local community, you could estimate the following regression:

regress sales1 price1 promo1 income
To correct the standard errors for possible groupiness, just use the cluster option in Stata. In this case, the groupiness comes from the variable store, so you type:

regress sales1 price1 promo1 income, cluster(store)

Coping with Grouped Data

- Use common sense as a guide to determine if your data fall naturally into groups. As an alternative, examine the error terms for observations within specific groups. Are they systematically positive or negative? If so, then you may not have independent observations and, as a result, your standard errors are too small.
- You can estimate a fixed effects model to avoid the resulting bias in the standard errors, but you will be unable to examine the effects of variables that vary only between groups.
- If you want to preserve between-group action but avoid biased standard errors, use the ,cluster(groupname) option in Stata.
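The cluster adjustment can also be sketched directly for a single predictor. The Python example below uses simulated data (not from the note): the predictor is constant within each group and the errors share a group component, so the conventional standard error is far too small; the cluster formula sums the score within each group before squaring.

```python
import math
import random

# Conventional vs. cluster-robust standard errors with grouped data.
random.seed(5)

G, m = 40, 25                      # 40 groups, 25 observations each
x, y, gid = [], [], []
for g in range(G):
    x_g = random.uniform(0, 1)     # predictor varies only across groups
    u_g = random.gauss(0, 1)       # shared group-level error component
    for _ in range(m):
        x.append(x_g)
        y.append(1.0 + 0.5 * x_g + u_g + random.gauss(0, 1))
        gid.append(g)

n = G * m
mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
b0 = my - b1 * mx
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Conventional standard error treats all n residuals as independent.
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_conv = math.sqrt(s2 / sxx)

# Cluster-robust: sum (x_i - xbar) * e_i within each group, then square.
group_sums = [0.0] * G
for xi, e, g in zip(x, resid, gid):
    group_sums[g] += (xi - mx) * e
se_cluster = math.sqrt(sum(s ** 2 for s in group_sums) / sxx ** 2)

print(round(se_conv, 3), round(se_cluster, 3))
```

With this much within-group correlation, the clustered standard error is several times the conventional one, which is the John-and-Paul lesson in miniature: the effective number of independent observations is closer to the number of groups than to n.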
An Unexpected Problem (Math Optional)

Suppose your initial model is:

Y = B0 + B1X + ε_y

You decide that you want to divide both Y and X by some other variable V. An example might be when you express both variables in per capita amounts, where V is the size of the population. If you divide Y by V, then you must divide the RHS by V to keep the equation correct. This means you are effectively regressing:

Y/V = B0/V + (B1X)/V + ε_y/V

Note that the error term is now clearly larger when V is smaller, that is, when the dependent and independent variables are smaller. This is blatant heteroskedasticity. One excuse for keeping the model this way is that the underlying model is in fact:

Y/V = B0 + (B1X)/V + ε_y

OLS appears to be safe here. If you think this is the correct model, you are almost safe. A word of caution is still necessary. Suppose the true model is:

Y/V = B0 + (B1X)/V + ε_y

but you do not have a precise measure of V. Instead, you have U = V + ε_u. Thus, you are actually regressing:

Y/(V + ε_u) = B0 + (B1X)/(V + ε_u) + ε_y

Note that whenever ε_u is positive, both the dependent variable (Y/(V + ε_u)) and the predictor variable (X/(V + ε_u)) in the regression are smaller in magnitude than the corresponding variables in the true model. Similarly, if ε_u is negative, both variables are larger than they are supposed to be. This implies that the two variables move together in the data, not because they are causally related but because of noisy measurement of V. This will bias the estimate of B1 upward; it is more positive than the true B1. This bias emerges whenever you divide the dependent and predictor variables by the same variable and the divisor is a noisy variable. Many empirical researchers feel that such bias is inevitable and suggest that you restate the regression in such a way as to avoid dividing both the LHS and RHS by the same variable.
I generally side with this skeptical group, although I think it important to determine whether the divisor accurately measures the theoretical construct. For example, I am less worried about dividing the LHS and RHS by population (to obtain per capita values) than I am about dividing by other variables that might be measured with considerable noise (or be noisy measures of the underlying theoretical construct).
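To see the bias concretely, the Python simulation below (hypothetical numbers, not from the note) sets the true B1 to zero in the model Y/V = B0 + (B1X)/V + ε_y, measures the divisor V with noise, and still recovers a clearly positive slope.

```python
import random

# Spurious slope from dividing both sides by a noisy variable.
random.seed(6)

n = 20_000
dep, ind = [], []
for _ in range(n):
    V = random.uniform(5.0, 10.0)
    X = random.uniform(1.0, 2.0)
    # True model: Y/V = B0 + (B1*X)/V + e_y with B0 = 3 and B1 = 0,
    # so Y = (3 + e_y) * V and Y is unrelated to X.
    Y = (3.0 + random.gauss(0, 0.3)) * V
    U = V + random.gauss(0, 1.0)   # noisy measure of the divisor V
    dep.append(Y / U)
    ind.append(X / U)

m_i, m_d = sum(ind) / n, sum(dep) / n
b1 = sum((a - m_i) * (b - m_d) for a, b in zip(ind, dep)) / \
     sum((a - m_i) ** 2 for a in ind)
print("estimated B1:", round(b1, 2))   # positive despite true B1 = 0
```

Setting the measurement noise on V to zero makes the estimated slope collapse back toward zero, confirming that the mismeasured common divisor, not any real relationship between Y and X, is driving the result.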
More informationEC4051 Project and Introductory Econometrics
EC4051 Project and Introductory Econometrics Dudley Cooke Trinity College Dublin Dudley Cooke (Trinity College Dublin) Intro to Econometrics 1 / 23 Project Guidelines Each student is required to undertake
More informationMidterm 2 - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares
Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit
More informationLab 11 - Heteroskedasticity
Lab 11 - Heteroskedasticity Spring 2017 Contents 1 Introduction 2 2 Heteroskedasticity 2 3 Addressing heteroskedasticity in Stata 3 4 Testing for heteroskedasticity 4 5 A simple example 5 1 1 Introduction
More informationdownload instant at
Answers to Odd-Numbered Exercises Chapter One: An Overview of Regression Analysis 1-3. (a) Positive, (b) negative, (c) positive, (d) negative, (e) ambiguous, (f) negative. 1-5. (a) The coefficients in
More informationAnswers to Problem Set #4
Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2
More informationECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes
ECON 4551 Econometrics II Memorial University of Newfoundland Panel Data Models Adapted from Vera Tabakova s notes 15.1 Grunfeld s Investment Data 15.2 Sets of Regression Equations 15.3 Seemingly Unrelated
More informationECON2228 Notes 7. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 41
ECON2228 Notes 7 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 6 2014 2015 1 / 41 Chapter 8: Heteroskedasticity In laying out the standard regression model, we made
More informationRockefeller College University at Albany
Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationA particularly nasty aspect of this is that it is often difficult or impossible to tell if a model fails to satisfy these steps.
ECON 497: Lecture 6 Page 1 of 1 Metropolitan State University ECON 497: Research and Forecasting Lecture Notes 6 Specification: Choosing the Independent Variables Studenmund Chapter 6 Before we start,
More informationHypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =
Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,
More informationACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.
ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,
More information2 Prediction and Analysis of Variance
2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering
More informationEconometrics (60 points) as the multivariate regression of Y on X 1 and X 2? [6 points]
Econometrics (60 points) Question 7: Short Answers (30 points) Answer parts 1-6 with a brief explanation. 1. Suppose the model of interest is Y i = 0 + 1 X 1i + 2 X 2i + u i, where E(u X)=0 and E(u 2 X)=
More informationEcon 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5.
Outline 1 Elena Llaudet 2 3 4 October 6, 2010 5 based on Common Mistakes on P. Set 4 lnftmpop = -.72-2.84 higdppc -.25 lackpf +.65 higdppc * lackpf 2 lnftmpop = β 0 + β 1 higdppc + β 2 lackpf + β 3 lackpf
More informationSampling and Sample Size. Shawn Cole Harvard Business School
Sampling and Sample Size Shawn Cole Harvard Business School Calculating Sample Size Effect Size Power Significance Level Variance ICC EffectSize 2 ( ) 1 σ = t( 1 κ ) + tα * * 1+ ρ( m 1) P N ( 1 P) Proportion
More informationEconometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague
Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationChapter 8 Heteroskedasticity
Chapter 8 Walter R. Paczkowski Rutgers University Page 1 Chapter Contents 8.1 The Nature of 8. Detecting 8.3 -Consistent Standard Errors 8.4 Generalized Least Squares: Known Form of Variance 8.5 Generalized
More informationLinear Regression with Multiple Regressors
Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution
More informationCHAPTER 6: SPECIFICATION VARIABLES
Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero
More informationFreeing up the Classical Assumptions. () Introductory Econometrics: Topic 5 1 / 94
Freeing up the Classical Assumptions () Introductory Econometrics: Topic 5 1 / 94 The Multiple Regression Model: Freeing Up the Classical Assumptions Some or all of classical assumptions needed for derivations
More informationOrdinary Least Squares Regression
Ordinary Least Squares Regression Goals for this unit More on notation and terminology OLS scalar versus matrix derivation Some Preliminaries In this class we will be learning to analyze Cross Section
More informationstatistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:
Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility
More informationWooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems
Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Functional form misspecification We may have a model that is correctly specified, in terms of including
More informationMultiple Linear Regression CIVL 7012/8012
Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 6: Multiple regression analysis: Further issues
Wooldridge, Introductory Econometrics, 4th ed. Chapter 6: Multiple regression analysis: Further issues What effects will the scale of the X and y variables have upon multiple regression? The coefficients
More information1 Correlation and Inference from Regression
1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is
More informationSection 3: Simple Linear Regression
Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationRockefeller College University at Albany
Rockefeller College University at Albany PAD 705 Handout: Simultaneous quations and Two-Stage Least Squares So far, we have studied examples where the causal relationship is quite clear: the value of the
More informationLinear Regression with 1 Regressor. Introduction to Econometrics Spring 2012 Ken Simons
Linear Regression with 1 Regressor Introduction to Econometrics Spring 2012 Ken Simons Linear Regression with 1 Regressor 1. The regression equation 2. Estimating the equation 3. Assumptions required for
More informationOutline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity
1/25 Outline Basic Econometrics in Transportation Heteroscedasticity What is the nature of heteroscedasticity? What are its consequences? How does one detect it? What are the remedial measures? Amir Samimi
More informationWISE International Masters
WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are
More informationLinear Regression with Multiple Regressors
Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution
More informationThe returns to schooling, ability bias, and regression
The returns to schooling, ability bias, and regression Jörn-Steffen Pischke LSE October 4, 2016 Pischke (LSE) Griliches 1977 October 4, 2016 1 / 44 Counterfactual outcomes Scholing for individual i is
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationECON 497 Midterm Spring
ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain
More informationReliability of inference (1 of 2 lectures)
Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationBasic Linear Model. Chapters 4 and 4: Part II. Basic Linear Model
Basic Linear Model Chapters 4 and 4: Part II Statistical Properties of Least Square Estimates Y i = α+βx i + ε I Want to chooses estimates for α and β that best fit the data Objective minimize the sum
More informationSociology 593 Exam 2 Answer Key March 28, 2002
Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably
More informationSTOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Saturday, May 9, 008 Examination time: 3
More informationInstrumental Variables and the Problem of Endogeneity
Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =
More informationContest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2.
Updated: November 17, 2011 Lecturer: Thilo Klein Contact: tk375@cam.ac.uk Contest Quiz 3 Question Sheet In this quiz we will review concepts of linear regression covered in lecture 2. NOTE: Please round
More informationLecture 4: Heteroskedasticity
Lecture 4: Heteroskedasticity Econometric Methods Warsaw School of Economics (4) Heteroskedasticity 1 / 24 Outline 1 What is heteroskedasticity? 2 Testing for heteroskedasticity White Goldfeld-Quandt Breusch-Pagan
More informationEconometrics Part Three
!1 I. Heteroskedasticity A. Definition 1. The variance of the error term is correlated with one of the explanatory variables 2. Example -- the variance of actual spending around the consumption line increases
More informationRegression Analysis. BUS 735: Business Decision Making and Research
Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn
More informationApplied Econometrics (MSc.) Lecture 3 Instrumental Variables
Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.
More informationApplied Quantitative Methods II
Applied Quantitative Methods II Lecture 4: OLS and Statistics revision Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 4 VŠE, SS 2016/17 1 / 68 Outline 1 Econometric analysis Properties of an estimator
More informationM(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1
Math 66/566 - Midterm Solutions NOTE: These solutions are for both the 66 and 566 exam. The problems are the same until questions and 5. 1. The moment generating function of a random variable X is M(t)
More information1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE
1. You have data on years of work experience, EXPER, its square, EXPER, years of education, EDUC, and the log of hourly wages, LWAGE You estimate the following regressions: (1) LWAGE =.00 + 0.05*EDUC +
More informationECON 497: Lecture Notes 10 Page 1 of 1
ECON 497: Lecture Notes 10 Page 1 of 1 Metropolitan State University ECON 497: Research and Forecasting Lecture Notes 10 Heteroskedasticity Studenmund Chapter 10 We'll start with a quote from Studenmund:
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationBusiness Statistics. Lecture 9: Simple Regression
Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals
More informationInstrumental variables estimation using heteroskedasticity-based instruments
Instrumental variables estimation using heteroskedasticity-based instruments Christopher F Baum, Arthur Lewbel, Mark E Schaffer, Oleksandr Talavera Boston College/DIW Berlin, Boston College, Heriot Watt
More information2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0
Introduction to Econometrics Midterm April 26, 2011 Name Student ID MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. (5,000 credit for each correct
More informationEcon 510 B. Brown Spring 2014 Final Exam Answers
Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity
More informationCRP 272 Introduction To Regression Analysis
CRP 272 Introduction To Regression Analysis 30 Relationships Among Two Variables: Interpretations One variable is used to explain another variable X Variable Independent Variable Explaining Variable Exogenous
More informationHeteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.
Heteroskedasticity y i = β + β x i + β x i +... + β k x ki + e i where E(e i ) σ, non-constant variance. Common problem with samples over individuals. ê i e ˆi x k x k AREC-ECON 535 Lec F Suppose y i =
More informationChapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression
Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression Outline 1. Hypothesis tests and confidence intervals for a single coefficie. Joint hypothesis tests on multiple coefficients 3.
More informationAnswer Key: Problem Set 6
: Problem Set 6 1. Consider a linear model to explain monthly beer consumption: beer = + inc + price + educ + female + u 0 1 3 4 E ( u inc, price, educ, female ) = 0 ( u inc price educ female) σ inc var,,,
More informationFinQuiz Notes
Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable
More informationSolutions to Problem Set 5 (Due November 22) Maximum number of points for Problem set 5 is: 220. Problem 7.3
Solutions to Problem Set 5 (Due November 22) EC 228 02, Fall 2010 Prof. Baum, Ms Hristakeva Maximum number of points for Problem set 5 is: 220 Problem 7.3 (i) (5 points) The t statistic on hsize 2 is over
More informationSimple Regression Model. January 24, 2011
Simple Regression Model January 24, 2011 Outline Descriptive Analysis Causal Estimation Forecasting Regression Model We are actually going to derive the linear regression model in 3 very different ways
More information1 A Non-technical Introduction to Regression
1 A Non-technical Introduction to Regression Chapters 1 and Chapter 2 of the textbook are reviews of material you should know from your previous study (e.g. in your second year course). They cover, in
More informationECON3150/4150 Spring 2015
ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2
More informationEconometrics Review questions for exam
Econometrics Review questions for exam Nathaniel Higgins nhiggins@jhu.edu, 1. Suppose you have a model: y = β 0 x 1 + u You propose the model above and then estimate the model using OLS to obtain: ŷ =
More informationInference in Regression Model
Inference in Regression Model Christopher Taber Department of Economics University of Wisconsin-Madison March 25, 2009 Outline 1 Final Step of Classical Linear Regression Model 2 Confidence Intervals 3
More informationEconometrics Problem Set 11
Econometrics Problem Set WISE, Xiamen University Spring 207 Conceptual Questions. (SW 2.) This question refers to the panel data regressions summarized in the following table: Dependent variable: ln(q
More informationMid-term exam Practice problems
Mid-term exam Practice problems Most problems are short answer problems. You receive points for the answer and the explanation. Full points require both, unless otherwise specified. Explaining your answer
More information8. TRANSFORMING TOOL #1 (the Addition Property of Equality)
8 TRANSFORMING TOOL #1 (the Addition Property of Equality) sentences that look different, but always have the same truth values What can you DO to a sentence that will make it LOOK different, but not change
More informationSTAT 212 Business Statistics II 1
STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb
More informationEconometrics Honor s Exam Review Session. Spring 2012 Eunice Han
Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity
More information8. Instrumental variables regression
8. Instrumental variables regression Recall: In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term Violation of the first OLS assumption
More information