Lecture Notes Part 3: Regression Model


3. Regression Model

Regression is central to the social sciences because social scientists usually want to know the effect of one variable or set of variables of interest on an outcome variable, holding all else equal. Here are some classic, important problems:

What is the effect of knowledge of a subject on preferences about that subject? Do uninformed people have any preferences? Are those preferences short-sighted? Do people develop "enlightened" interests, seeing how they benefit from collective benefits?

What is the effect of increasing enforcement and punishment on deviant behavior? For example, how much does an increase in the number of police on the streets reduce the crime rate? Does the death penalty deter crime?

How are markets structured? How does the supply of a good change with the price people are willing to pay? How does people's willingness to pay for a good change as a good becomes scarce?

To whom are elected representatives responsible? Do parties and candidates converge to the preferred policy of the median voter, as predicted by many analytical models of electoral competition? Or do parties and interest groups strongly influence legislators' behavior?

In each of these problems we are interested in measuring how changes in an input or independent variable produce changes in an output or dependent variable. For example, my colleagues interested in the study of energy wish to know whether increasing public understanding of the issue would change attitudes about global warming. In particular, would people become more likely to support remediation? I have done a couple of surveys for this initiative. Part of the design of these surveys involves capturing the attitude toward global warming, the dependent variable. One simple variable is the willingness to pay.

If it solved global warming, would you be willing to pay $5 more a month on your electricity bill? [of those who answered yes] $10 more a month? $25 more a month? $50 more a month? $100 more a month?

We also wanted to distinguish the attitudes of those who know a lot about global warming and carbon emissions from those who do not. We asked a battery of questions. People could have gotten up to 10 factual questions right. The average was 5. Finally, we controlled for many other factors, such as income, religiosity, attitude toward business regulation, size of electricity bill, and so forth.

The results of the analysis are provided on the handout. The dependent variable is coded 0 for those unwilling to pay $5, 1 for those willing to pay no more than $5, 2 for those willing to pay no more than $10, 3 for those willing to pay no more than $25, 4 for those willing to pay no more than $50, and 5 for those who said they would be willing to pay $100 (roughly double the typical electricity bill). The coefficient on information (gotit) estimates the effect of knowing more about carbon dioxide on willingness to pay for a solution to global warming. Moving from no information to complete information (from 0 to 10) increases willingness to pay by .65 units along the willingness-to-pay scale, which ranges from 0 to 5 and has a mean of 1.7 and a standard deviation of ….

More generally, we seek to measure the effect on Y of increasing X₁ from x₁⁰ to x₁¹. The expected change in the random variable Y is:

E[y | x₁¹, x₂, ..., x_k] − E[y | x₁⁰, x₂, ..., x_k],

where the values of all independent variables other than x₁ are held constant.

Unfortunately, the world is a mess. We usually deal with observational data rather than carefully constructed experiments. It is exceedingly difficult to hold all else constant in order to isolate the partial effect of a variable of interest on the outcome variable. Instead of trying to isolate and control the independent variable of interest in the design of a study, we attempt to measure its independent effect in the analysis of the data collected.

Multivariate regression is the primary data analytic tool with which we attempt to hold other things constant.

3.1. Basic Model

In general, we can treat y and X as random variables. At times we will switch into a simpler problem: X fixed. What is meant by this distinction? Fixed values of X would arise if the researcher chose the specific values of X and assigned them to the units. Random or stochastic X arises when the researcher does not control the assignment; this is true in the lab as well as the field. Fixed X will be easier to deal with analytically. If X is fixed then it is guaranteed to have no correlation with ε, because the columns of X consist of fixed values of the variables, not random draws from the set of possible values of the variables. More generally, X is random, and we must analyze the conditional distribution of ε|X carefully.

The basic regression model consists of several sets of assumptions.

1. Linearity: y = Xβ + ε

2. Zero mean of the errors: E[ε|X] = 0

3. No correlation between X and ε: E[εX|X] = 0

4. Spherical errors (constant error variance and no autocorrelation): E[εε'|X] = σ²_ε I

5. Normally distributed errors: ε|X ~ N(0, σ²_ε I)
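As a concrete illustration, here is a minimal simulation sketch (in Python with numpy; the parameter values and names are ours, not from the notes) of data generated under assumptions 1 through 5:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
    beta = np.array([1.0, 2.0, -0.5])
    sigma_eps = 1.0
    eps = rng.normal(0.0, sigma_eps, size=n)  # normal, spherical, mean zero given X
    y = X @ beta + eps                        # linearity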

Indeed, these properties are implied by (sufficient but not necessary) the assumption that y and X are jointly normally distributed. See the properties of the conditional distributions in section 2.6.

It is easy to see that this model implements the definition of an effect above. Indeed, if all of the assumptions hold we might even say the effect captured by the regression model is causal. Two failures of causality might emerge.

First, there may be omitted variables. Any variable X_j that is not measured and included in a model is captured in the error term ε. An included variable might appear to cause (or not cause) y, but we have in fact missed the true relationship because we did not hold X_j constant. Of course, there are a very large number of potential omitted variables, and the struggle in any field of inquiry is to speculate what those might be and come up with explicit measures to capture those effects or designs to remove them.

Second, there may be simultaneous causation. X₁ might cause y, but y might also cause X₁. This is a common problem in economics, where the prices at which goods are bought and the quantities purchased are determined through bargaining or through markets. Prices and quantities are simultaneously determined. This shows up, in complicated ways, in the error term. Specifically, the error term becomes recursive: ε|X₁ necessarily depends on u|y, the error term from the regression of X₁ on y.

Social scientists have found ways to solve these problems, involving "quasi-experiments" and sometimes real experiments. Within areas of research there is a lot of back and forth about what specific designs and tools really solve the problems of omitted variables and simultaneity. The last third of the course will be devoted to these ideas. Now we focus on the tools for holding constant other factors directly.

3.2. Estimation

The regression model has K + 1 parameters: the regression coefficients and the error variance. But there are n equations. The parameters are overdetermined, assuming n > K + 1. We must somehow reduce the data at hand to devise estimates of the unknown parameters in terms of data we know. Overdetermination might seem like a nuisance, but if n were smaller than K + 1 we could not hope to estimate the parameters. Here lies a more general lesson about social inquiry. Be wary of generalizations from a single or even a small set of observations.

The Usual Suspects

There are, as mentioned before, three general ideas about how to use data to estimate parameters.

1. The Method of Moments.

(a) Express the theoretical moments of the random variables as functions of the unknown parameters. The theoretical moments are the means, variances, covariances, and higher powers of the random variables.

(b) Use the empirical moments as estimators of the theoretical moments; these will be unbiased estimates of the theoretical moments.

(c) Solve the system of equations for the values of the parameters in terms of the empirical moments.

2. Minimum Variance/Minimum Mean Squared Error (Minimum …).

(a) Express the objective function, the sum of squared errors, as a function of the unknown parameters.

(b) Find values of the parameters that minimize that function.

3. Maximum Likelihood.

(a) Assume that the random variables follow a particular density function (usually normal).

(b) Find values of the parameters that maximize the density function.

We have shown that for the regression model, all three approaches lead to the same estimate of the vector of coefficients:

b = (X'X)⁻¹X'y

An unbiased estimate of the variance of the error term is

s² = e'e / (n − K).

The maximum likelihood estimate is

σ̂²_ε = e'e / n.

We may also estimate the variance decomposition. We wish to account for the total sum of squared deviations in y: y'y (with y measured as deviations from its mean), which has mean square y'y/(n − 1) (the estimated variance of y). The residual or error sum of squares is e'e, which has mean square s² = e'e/(n − K). The model or explained sum of squares is the difference between these:

y'y − e'e = y'y − (y' − b'X')(y − Xb) = b'X'Xb,

which has mean square b'X'Xb/(K − 1). And

R² = b'X'Xb / y'y.
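These formulas translate directly into code. Here is a sketch (Python with numpy; the function and variable names are ours), not a substitute for a library routine such as numpy.linalg.lstsq:

    import numpy as np

    def ols(X, y):
        """OLS estimates: b = (X'X)^(-1) X'y, plus variance estimates."""
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ (X.T @ y)
        e = y - X @ b                 # residuals
        n, K = X.shape
        s2 = (e @ e) / (n - K)        # unbiased estimate of the error variance
        sigma2_ml = (e @ e) / n       # maximum likelihood estimate
        cov_b = s2 * XtX_inv          # estimated variance matrix of b
        return b, s2, sigma2_ml, cov_b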

Interpretation of the Ordinary Least Squares Regression

An important interpretation of the regression estimates is that they estimate the partial derivative, β_j. The coefficient from a simple bivariate regression of y on X_j measures the TOTAL effect of X_j on y. This is akin to the total derivative. Let us break this down into component parts and see where we end up.

The total derivative, you may recall, is the partial derivative of y with respect to X_j plus the sum, over all other variables X_k, of the partial derivative of y with respect to X_k times the derivative of X_k with respect to X_j. If y depends on two X variables,

dy/dx₁ = ∂y/∂x₁ + (∂y/∂x₂)(dx₂/dx₁)
dy/dx₂ = ∂y/∂x₂ + (∂y/∂x₁)(dx₁/dx₂)

Using the idea that the bivariate regression coefficients measure the total effect of one variable on another, we can write the bivariate regression estimates in terms of the partial regression coefficient (from a multivariate regression) plus other partial regression coefficients and bivariate regression coefficients. For simplicity consider a case with 2 X variables. Let a₁ and a₂ be the slope coefficients from the bivariate regressions of y on X₁ and of y on X₂. Let b₁ and b₂ be the partial regression coefficients from the multivariate regression of y on X₁ and X₂. And let c₁ be the slope coefficient from the regression of X₂ on X₁, and c₂ the slope from the regression of X₁ on X₂. Let r be the correlation between X₁ and X₂. We can express our estimates of the regression parameters in terms of the other coefficients:

b₁ = [(x₂'x₂)(x₁'y) − (x₁'x₂)(x₂'y)] / [(x₁'x₁)(x₂'x₂) − (x₁'x₂)²] = (a₁ − a₂c₁) / (1 − r²)
b₂ = [(x₁'x₁)(x₂'y) − (x₁'x₂)(x₁'y)] / [(x₁'x₁)(x₂'x₂) − (x₁'x₂)²] = (a₂ − a₁c₂) / (1 − r²)

We can solve these two equations to express a₁ and a₂ in terms of b₁ and b₂:

a₁ = b₁ + b₂c₁
a₂ = b₂ + b₁c₂

That is, the estimated total effect (the bivariate regression coefficient) equals the estimated direct effect of X_j on y (the partial coefficient) plus the indirect effect of X_j through X_k.
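A quick simulation check of the decomposition a₁ = b₁ + b₂c₁ (a sketch in Python with numpy; the coefficient values are ours). The identity holds exactly in sample, so the two printed numbers agree:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)          # x2 correlated with x1
    y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

    def slope(v, w):
        """Bivariate slope coefficient from regressing v on w."""
        wc, vc = w - w.mean(), v - v.mean()
        return (wc @ vc) / (wc @ wc)

    a1 = slope(y, x1)                           # total effect of x1
    b = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]
    c1 = slope(x2, x1)                          # slope of x2 on x1
    print(a1, b[1] + b[2] * c1)                 # identical up to rounding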

Another way to express these results is that multivariate regression can be simplified to bivariate regression once we partial out the effects of the other variables on y and on X_j. Suppose we wish to measure and display the effect of X₁ on y.

Regress X₁ on X₂, ..., X_k. Regress y on X₂, ..., X_k. Take the residuals from each of these regressions, e(x₁|x₂, ..., x_k) and e(y|x₂, ..., x_k). The first vector of residuals equals the part of X₁ remaining after subtracting out the other variables. Importantly, this vector of residuals is independent of X₂, ..., X_k. The second vector equals the part of y remaining after subtracting the effects of X₂, ..., X_k; that is, the direct effect of X₁ and the error term ε. Regress e(y|x₂, ..., x_k) on e(x₁|x₂, ..., x_k). This bivariate regression estimates the partial regression coefficient from the multivariate regression, β₁.

Partial regression is very useful for displaying a particular relationship. Plotting the first vector of residuals against the second vector displays the partial effect of X₁ on y, holding all other variables constant. In STATA partial regression plots are implemented through the command avplot. After performing a regression, type the command avplot x1, where x1 is the variable of interest.
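The same partialling logic is easy to verify directly. A sketch (Python with numpy; the residuals() helper and variable names are ours, and y, x1, X_rest are assumed to hold the data):

    import numpy as np

    def residuals(v, W):
        """Residuals from regressing v on W (W should include a constant)."""
        return v - W @ np.linalg.lstsq(W, v, rcond=None)[0]

    W = np.column_stack([np.ones(len(y)), X_rest])   # X2, ..., Xk plus a constant
    e_x1 = residuals(x1, W)
    e_y = residuals(y, W)
    b1 = (e_x1 @ e_y) / (e_x1 @ e_x1)   # equals the multivariate coefficient on x1
    # plotting e_y against e_x1 reproduces the added-variable (avplot) display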

A final way to interpret regression is using prediction. The predicted values of a regression are the values of the regression plane, ŷ. These are estimates of E[y|X]. We may vary the values of any X_j to generate predicted values. Typically, we consider the effect on ŷ of a one-standard-deviation change in X_j, say from 1/2 standard deviation below the mean of X_j to 1/2 standard deviation above the mean. Denote ŷ¹ as the predicted value at x_j¹ = x̄_j + .5s_j and ŷ⁰ as the predicted value at x_j⁰ = x̄_j − .5s_j. The matrix of all variables except x_j is X₍ⱼ₎ and b₍ⱼ₎ is the vector of coefficients except b_j. Choose any vector x₍ⱼ₎. A standard deviation change in the predicted values is:

ŷ¹ − ŷ⁰ = (b₀ + b_j x_j¹ + x₍ⱼ₎b₍ⱼ₎) − (b₀ + b_j x_j⁰ + x₍ⱼ₎b₍ⱼ₎) = b_j(x_j¹ − x_j⁰) = b_j s_j.

For non-linear functions such variations are harder to analyze, because the effect of any one variable depends on the values of the other variables. We typically set the other variables equal to their means.

An Alternative Estimation Concept: Instrumental Variables

There are many other ways we may estimate the regression parameters. For example, we could minimize the mean absolute deviation of the errors. One important alternative to ordinary least squares estimation is instrumental variables estimation. Assume that there is a variable or matrix of variables, Z, such that Z does not directly affect y, i.e., it is uncorrelated with ε, but it does correlate strongly with X. The instrumental variables estimator is:

b_IV = (Z'X)⁻¹Z'y.

This is a linear estimator. It is a weighted average of the y's, where the weights are of the form (z_i − z̄)/Σ(z_i − z̄)(x_i − x̄), instead of (x_i − x̄)/Σ(x_i − x̄)².

In the bivariate case, this estimator is the ratio of the slope coefficient from the regression of y on z to the slope coefficient from the regression of x on z. This estimator is quite important in quasi-experiments because, if we can find valid instruments, the estimator will be unbiased: it will be independent of omitted factors in the least squares regression. It will, however, come at some cost. It is a noisier estimator than least squares.
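A minimal sketch of the just-identified IV estimator (Python with numpy; names ours):

    import numpy as np

    def iv(X, Z, y):
        """b_IV = (Z'X)^(-1) Z'y; Z has the same dimensions as X (just identified)."""
        return np.linalg.solve(Z.T @ X, Z.T @ y)

    # In the bivariate case this reduces to slope(y, z) / slope(x, z).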

3.3. Properties of Estimates

Parameter estimates themselves are random variables. They are functions of random variables that we use to make guesses about unknown constants (parameters). Therefore, from one study to the next, parameter estimates will vary. It is our hope that a given study has no bias, so that if the study were repeated under identical conditions the results would vary around the true parameter value. It is also hoped that estimation uses all available data as efficiently as possible.

We wish to know the sample properties of estimates in order to understand when we might face problems that could lead us to draw the wrong conclusions from our estimates, such as spurious correlations. We also wish to know the sample properties of estimates in order to perform inferences. At times those inferences are about testing particular theories, but often our inferences concern whether we've built the best possible model using the data at hand. In this regard, we are concerned about potential bias, and will try to err on the side of getting unbiased estimates. Erring on the side of safety, though, can cost us efficiency. Tests for the appropriateness of one model versus another, then, depend on the tradeoff between bias and efficiency.

Sampling Distributions of Estimates

We will characterize the random vector b with its mean, its variance, and its frequency function. The mean is β, which means that b is unbiased. The variance is σ²_ε(X'X)⁻¹, the matrix describing the variances and covariances of the estimated coefficients. And the density function f(b) is approximated by the normal distribution as n becomes large.

The results about the mean and the variance stem from the regression assumptions. First, consider the mean of the parameter estimate. Under the assumption that E[X'ε] = 0,

the regression estimates are unbiased. That is,

E[b] = E[b|X] = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'(Xβ + ε)] = (X'X)⁻¹X'Xβ + E[(X'X)⁻¹X'ε] = β.

This means that before we do a study we expect the data to yield a b of β. Of course, the density of any one point is zero. So another way to think about this is that if we do repeated sampling under identical circumstances, the coefficients will vary from sample to sample. The mean of those coefficients, though, will be β.

Second, consider the variance:

V[b] = E[(b − β)(b − β)'] = E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)'] = E[(X'X)⁻¹X'εε'X(X'X)⁻¹] = σ²_ε((X'X)⁻¹X'X(X'X)⁻¹) = σ²_ε(X'X)⁻¹.

Here we have assumed the sphericality of the errors and the exogeneity of the independent variables.

The sampling distribution can be shown to follow the joint normal. This follows from the multivariate version of the central limit theorem, which I will not present because of the complexity of the mathematics. The result emerges, however, from the fact that the regression coefficients are weighted averages of the values of the random variable y, where the weights are of the form (x_i − x̄)/Σ(x_i − x̄)². More formally, let b_n be the vector of regression coefficients estimated from a sample of size n. Then

b_n →_d N(β, σ²_ε(X'X)⁻¹).

Of course, as n → ∞, (X'X)⁻¹ → 0. We can see this in the following way. The elements of (X'X)⁻¹ are, up to a signed term, cross products x_j'x_k divided by the determinant |X'X|. The elements of the determinant are multiples of the cross products, and as observations are added the determinant grows faster than any single cross product. Hence, each element approaches 0. Consider the 2x2 case:

(X'X)⁻¹ = 1/[(x₁'x₁)(x₂'x₂) − (x₁'x₂)²] × [  x₂'x₂   −x₁'x₂ ]
                                           [ −x₂'x₁    x₁'x₁ ]

This may be rewritten as

(X'X)⁻¹ = [ 1/(x₁'x₁ − (x₁'x₂)²/(x₂'x₂))            −1/((x₁'x₁)(x₂'x₂)/(x₁'x₂) − x₁'x₂) ]
          [ −1/((x₁'x₁)(x₂'x₂)/(x₂'x₁) − x₂'x₁)      1/(x₂'x₂ − (x₁'x₂)²/(x₁'x₁)) ]

Taking the limit of this matrix as n → ∞ amounts to taking the limit of each element. Considering each term, we see that the sums of squares in the denominators grow with the addition of each observation, while the numerators remain constant. Hence, each element approaches 0.

Combined with unbiasedness, this last result proves consistency. Consistency means that, as n grows, the probability that an estimator deviates from the parameter of interest by more than any fixed amount approaches 0. That is, for any δ > 0,

lim_{n→∞} Pr(|θ̂_n − θ| > δ) = 0.

A sufficient condition for this is that the estimator be unbiased and that the variance of the estimator shrink to 0. This condition is called convergence in mean squared error, and is an immediate application of Chebychev's Inequality.

Of course, this means that the limiting distribution of f(b) shrinks to a single point, which is not so good. So the normality of the distribution of b is sometimes written as follows:

√n(b_n − β) ~ N(0, σ²_ε Q⁻¹),

where Q = lim_{n→∞} (1/n)(X'X), the asymptotic variance-covariance matrix of the X's.

We may consider the distribution of a single element of b:

b_j ~ N(β_j, σ²_ε a_jj),

where a_jj is the jth diagonal element of (X'X)⁻¹. Hence, we can construct a 95 percent confidence interval for any parameter β_j using the normal distribution and the above formula for the variance of b_j. The standard error is √(σ²_ε a_jj). So, a 95 percent confidence interval for a single parameter is:

b_j ± 1.96 √(σ²_ε a_jj).
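In code, the interval follows from the estimated coefficient covariance matrix. A sketch reusing the hypothetical ols() function from above (with s² standing in for the unknown σ²_ε):

    import numpy as np

    b, s2, _, cov_b = ols(X, y)          # cov_b = s2 * (X'X)^(-1)
    se = np.sqrt(np.diag(cov_b))         # standard errors sqrt(s2 * a_jj)
    ci_lower, ci_upper = b - 1.96 * se, b + 1.96 * se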

The instrumental variables estimator provides an interesting contrast to the least squares estimator. Let us consider its sampling properties:

E[b_IV] = E[(Z'X)⁻¹Z'y] = E[(Z'X)⁻¹Z'(Xβ + ε)] = E[(Z'X)⁻¹(Z'X)β + (Z'X)⁻¹Z'ε] = E[β + (Z'X)⁻¹Z'ε] = β.

The instrumental variables estimator is an unbiased estimator of β. The variance of the instrumental variables estimator is:

V[b_IV] = E[(b_IV − β)(b_IV − β)'] = E[((Z'X)⁻¹Z'ε)((Z'X)⁻¹Z'ε)'] = (Z'X)⁻¹Z'(σ²_ε I)Z(X'Z)⁻¹ = σ²_ε(Z'X)⁻¹Z'Z(X'Z)⁻¹.

As with the least squares estimator, the instrumental variables estimator will follow a normal distribution, because the IV estimator is a (weighted) sum of the random variables y.

Bias versus Efficiency

General Efficiency of Least Squares

An important property of the ordinary least squares estimates is that they have the lowest variance of all linear, unbiased estimators. That is, they are the most efficient unbiased estimators. This result is the Gauss-Markov Theorem. Another version arises as a property of the maximum likelihood estimator, where the lower bound for the variances of all possible consistent estimators is σ²_ε(X'X)⁻¹.

This implies that the instrumental variables estimator is less efficient than the ordinary least squares estimator. To see this, consider the bivariate regression case:

V[b] = σ²_ε / Σ(x_i − x̄)²

V[b_IV] = σ²_ε Σ(z_i − z̄)² / [Σ(x_i − x̄)(z_i − z̄)]²

Comparing these two formulas:

V[b_IV] / V[b] = Σ(z_i − z̄)² Σ(x_i − x̄)² / [Σ(x_i − x̄)(z_i − z̄)]².

This ratio is the inverse of the square of the correlation between X and Z. Since the correlation never exceeds 1, we know the numerator must be at least as large as the denominator. (That the square of the correlation is at most 1 follows from the Cauchy-Schwarz inequality.)

Sources of Bias and Inefficiency in Least Squares

There are four primary sources of bias and inconsistency in least squares estimates: measurement error in the independent variables, omitted variables, non-linearities, and simultaneity. We'll discuss two of these cases here: specification of regressors (omitted variables) and measurement error.

Measurement Error. Assume that X* is the true variable of interest, but we can only measure

X = X* + u,

where u is a random error term. For example, suppose we regress the actual share of the vote for the incumbent president in an election on the job approval rating of the incumbent president, measured in a 500-person preelection poll the week before the election. Gallup has measured this since …. Each election is an observation. The polling data will have measurement error based on the random sampling inherent in surveys. Specifically, the variance of the measurement error is p(1 − p)/n, where p is the percent approving of the president.

Another common source of measurement error arises from typographical errors in datasets. Keypunching errors are very common, even in datasets distributed publicly through reputable sources such as the Inter-university Consortium for Political and Social Research. For example, Gary King, Jim Snyder, and others who have worked with election data estimate that about 10 percent of the party identification codes of candidates are incorrect in some of the older ICPSR datasets on elections.

Finally, some data sources are not very reliable, or estimates must be made. This is

common in social and economic data in developing economies and in data on wars.

If we regress y on X, the coefficient is a function of both the true variable X* and the error term. That is,

b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
  = Σ(x*_i − x̄* + u_i − ū)(y_i − ȳ) / Σ(x*_i − x̄* + u_i − ū)²
  = [Σ(x*_i − x̄*)(y_i − ȳ) + Σ(u_i − ū)(y_i − ȳ)] / [Σ(x*_i − x̄*)² + Σ(u_i − ū)² + 2Σ(x*_i − x̄*)(u_i − ū)]

This looks like quite a mess. A few assumptions are made to get some traction. First, it is usually assumed that u and ε are uncorrelated and that u and X* are uncorrelated. If they are correlated, things are even worse. Second, the u_i's are assumed to be uncorrelated with one another and to have constant variance.

Taking the expected value of b would be very difficult, because b is a function of a ratio of random variables. Here is a situation where probability limits (plims) make life easier. Because plims are limits, they obey the basic rules of limits: the limit of a sum is the sum of the limits, and the limit of a ratio is the ratio of the limits. Divide the top and bottom of b by n. Now consider the probability limit of each element of b:

plim (1/n) Σ(x*_i − x̄*)² = σ²_x*
plim (1/n) Σ(x*_i − x̄*)(y_i − ȳ) = plim (1/n) Σ(x*_i − x̄*)(βx*_i − βx̄*) = βσ²_x*
plim (1/n) Σ(u_i − ū)(y_i − ȳ) = 0 (since u is uncorrelated with x* and ε)
plim (1/n) Σ(x*_i − x̄*)(u_i − ū) = 0
plim (1/n) Σ(u_i − ū)² = σ²_u

We can pull these limits together as follows:

plim b = βσ²_x* / (σ²_x* + σ²_u) < β.

Thus, in a bivariate regression, measurement error in the X variable biases the estimate toward 0.
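A quick simulation illustrating the attenuation result (a sketch in Python with numpy; the parameter values are ours). With σ²_x* = σ²_u = 1, plim b = β/2:

    import numpy as np

    rng = np.random.default_rng(2)
    n, beta = 100_000, 2.0
    x_star = rng.normal(0, 1, n)            # true regressor, variance 1
    x = x_star + rng.normal(0, 1, n)        # observed with error, sigma2_u = 1
    y = beta * x_star + rng.normal(0, 1, n)
    xc, yc = x - x.mean(), y - y.mean()
    print((xc @ yc) / (xc @ xc))            # close to beta/2 = 1.0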

In a multivariate setting, the bias cannot generally be signed. Attenuation is typical, but it is possible also to reverse signs or inflate the coefficients. In non-linear models, such as those using the square of X, the bias terms become quite substantial and even more troublesome.

The best approach for eliminating measurement error is cleaning the dataset. However, it is possible to fix some measurement error using instrumental variables. One approach is to use the quantiles or the ranks of the observed X to predict the observed X, and then use the predicted values in the regression predicting y. The idea behind this is that the ranks of X are correlated with the underlying true values, X*, but not with u or ε.

These examples of measurement error assume that the error is purely random. It may not be. Sometimes measurement error is systematic. For example, people underreport socially undesirable attitudes or behavior. This is an interesting subject that is extensively studied in public opinion research, but often understudied in other fields. A good survey researcher, for example, will tap into archives of questions and even do question-wording experiments to test the validity and reliability of instruments.

Choice of Regressors: Omitted and Included Variables. The struggle in most statistical modeling is to specify the most appropriate regression model using the data at hand. The difficulty is deciding which variables to include and which to exclude. In rare cases we can be guided by a specific theory. Most often, though, we have gathered data or will gather data to test specific ideas and arguments. From the data at hand, what is the best model? There are three important rules to keep in mind.

1. Omitted Variables That Directly Affect Y And Are Correlated With X Produce Bias.

The most common threat to the validity of estimates from a multivariate statistical analysis is omitted variables. Omitted variables affect both the consistency and efficiency of our estimates. First and foremost, they create bias. Even if they do not bias our results, we often want to control for other factors to improve efficiency.

To see the bias due to omitted variables, assume that X is a matrix of included variables and Z is a matrix of variables not included in the analysis. The full model is

y = Xβ_X + Zβ_Z + ε.

Suppose that Z is omitted. Obviously, we can't estimate the coefficient vector β_Z. Will the other coefficients be biased? Is there a loss of efficiency? Let b_X be the parameter vector estimated when only the variables in the matrix X are included. Let β_X be the subset of coefficients from the true model on the included variables, X. The model estimated is

y = Xβ_X + u, where u = Zβ_Z + ε.

Then

E[b_X] = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'Xβ_X + (X'X)⁻¹X'u]
       = β_X + E[(X'X)⁻¹X'Zβ_Z] + E[(X'X)⁻¹X'ε] = β_X + Δ_ZX β_Z,

where Δ_ZX = (X'X)⁻¹X'Z is the matrix of coefficients from the regressions of the columns of Z on the variables in X.

This is an extremely useful formula. There are two important lessons to take away.

First, omitted variables will bias regression estimates of the included variables (and lead to inconsistency) if (1) those variables directly affect Y and (2) those variables are correlated with the included variables (X). It is not enough, then, to object to an analysis that there are variables that have not been included. That is always true. Rather, science advances by conjecturing (and then gathering the data on) variables that affect y directly and are correlated with X. I think the latter is usually hard to find.

Second, we can generally sign the bias of omitted variables.
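The bias formula is easy to verify by simulation. A sketch (Python with numpy; the coefficient values are ours): with a direct effect of z on y of 2.0 and a slope of z on x of 0.5, the short regression's coefficient is biased upward by 1.0:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    x = rng.normal(size=n)
    z = 0.5 * x + rng.normal(size=n)        # omitted variable, correlated with x
    y = 1.0 * x + 2.0 * z + rng.normal(size=n)
    xc, yc = x - x.mean(), y - y.mean()
    print((xc @ yc) / (xc @ xc))            # close to 1.0 + 2.0 * 0.5 = 2.0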

When we think about the potential problem of an omitted variable, we usually have in mind the direct effect that it likely has on the dependent variable, and we might also know or can make a reasonable guess about its correlation with the included variables of interest. The bias in an included variable will be the direct effect of the omitted variable on y times the effect of the included variable on the excluded variable. If both of those effects are positive or both are negative, then the estimated effect of X on Y will be biased up: it will be too large. If one of these effects is negative and the other positive, then the estimated effect of X on Y will be biased downward.

2. Efficiency Can Be Gained By Including Variables That Predict Y But Are Uncorrelated With X.

A straightforward analysis reveals that the estimated V[b_X] = s²_e(X'X)⁻¹. So far so good. But the variance of the error term is inflated. Specifically, s²_e = u'u/(n − K). Because u = Zβ_Z + ε,

E[u'u/(n − K)] = β_Z'Z'Zβ_Z/(n − K) + σ²_ε > σ²_ε.

In fact, the estimated residual variance is too large by the explained or model sum of squares for the omitted variables.

This has an interesting implication for experiments. Randomization in experiments guarantees unbiasedness. But we still want to control for other factors to reduce noise. In fact, combining regression models with the data analysis is a powerful way to gain efficiency (and reduce the necessary sample sizes) in randomized experiments.

Even in observational studies we may want to keep in a model a variable whose inclusion does not affect the estimated value of the parameter of a variable of interest, if the included variable strongly affects the dependent variable. Keeping such a variable captures some of the otherwise unexplained error variance, thereby reducing the estimated variance of the residuals. As a result, the size of confidence intervals will narrow.

3. Including Variables That Are Unrelated To Y and X Loses Efficiency (Precision), And There Is No Bias From Their Exclusion.

That there is no bias can be seen readily from the argument we have just made.

The loss of efficiency occurs because we use up degrees of freedom. Hence, all variance estimates will be too large. Simply put, parsimonious models are better.

COMMENTS:

a. Thinking through the potential consequences of omitted variables in this manner is very useful. It helps you identify what other variables will matter in your analysis and why, and it helps you identify additional information to see if this could be a problem. It turns out that the ideological fit with the district has some correlation with the vote, but it is not that strong. The reason is that there is relatively little variation in ideological fit that is not explained by simple knowledge of party. So this additional information allows us to conjecture (reasonably safely) that, although ideological fit could be a problem, it likely does not explain the substantial bias in the coefficients on spending.

b. The general challenge in statistical modeling and inference is deciding how to balance possible biases against possible inefficiencies in choosing a particular specification. Naturally, we usually wish to err on the side of inefficiency. But these are choices we make on the margin. As we will see, statistical tests measure whether the possible improvement in bias from one model outweighs the loss of efficiency compared to another model. This should not distract from the main objective of your research, which is to find phenomena and relationships of large magnitude and of substantive importance. Concern about omitted variable bias should be a seed of doubt that drives you to make the estimates that you make as good as possible.

Examples

Model building in statistics is really a progressive activity. We usually begin with interesting or important observations. Sometimes those originate in a theory of social behavior,

and sometimes they come from observation of the world. Statistical analyses allow us to refine those observations. And sometimes they lead to a more refined view of how social behavior works. Obvious problems of measurement error or omitted variables exist when the implications of an analysis are absurd. Equally valid, though, are arguments that suggest a problem with a plausible result. Here we'll consider four examples.

1. Incumbency Advantages

The observation of the incumbency advantage stems from a simple difference of means. From 1978 to 2002, the average Democratic vote share of a typical U.S. House Democratic incumbent is 68%; the average Democratic vote share of a typical U.S. House Republican incumbent is 34%.

The incumbency advantage model is specified as follows. The vote for the Democratic candidate in district i in election t equals the normal party vote, N_i, plus a national party tide, α_t, plus the effect of incumbency. Incumbency is coded I_it = +1 for Democratic incumbents, I_it = −1 for Republican incumbents, and I_it = 0 for open seats:

V_it = α_t + βN_i + γI_it + ε_it

Controlling for the normal vote and year tides reduces the estimated incumbency effect to about 7 to 9 percentage points.

2. Strategic Retirement and the Incumbency Advantage

An objection to models of the incumbency advantage is that incumbents choose to step down only when they are threatened, either by changing times, personal problems, or an unusually good challenger. The reasons that someone retires, then, might depend on factors that the researcher cannot measure but that predict the vote: omitted variables. This would cause I to be correlated with the regression error u. The factors that are thought to affect the retirement decision and the vote are negatively correlated with V and negatively correlated with I (making it less likely that the incumbent runs). Hence, the incumbency advantage may be inflated.

3. Police/Prisons and Crime

The theory of crime and punishment begins with simple assumptions about rational behavior. People will commit crimes if the likelihood of being caught and the severity of the punishment are lower than the benefit of the crime. A very common observation in the sociology of crime is that areas that have larger numbers of police or more people in prison have higher crime rates.

4. Campaign Spending and Votes

Researchers measuring the factors that explain House election outcomes include various measures of electoral competition in explaining the vote. Campaign expenditures, and the advertising they buy, are thought to be one of the main forces affecting election outcomes. A commonly used model treats the incumbent party's share of the votes as a function of the normal party division in the congressional district, candidate characteristics (such as experience or scandals), and the campaign expenditures of the incumbent and the challenger. Most of the results from such regressions make sense: the coefficients on challenger spending and on incumbent party strength make sense. But the coefficient on incumbent spending has the wrong (negative) sign. The naive interpretation is that the more incumbents spend, the worse they do.

One possible explanation is that incumbents who are unpopular and out of step with their districts have to spend more in order to remain in place. Could this explain the incorrect sign? In this account of the bias in the spending coefficients, there is a positive correlation between the omitted variable, "incumbent's fit with the district," and the included variable, "incumbent spending." Also, the incumbent's fit with the district likely has a negative direct effect on the vote: the more out-of-step an incumbent is, the worse he will do on election day. Hence, lacking a measure of "fit with the district" might cause a downward bias.

General Strategies for Correcting for Omitted Variables and Measurement Error

1. More Data, More Variables. Identify relevant omitted variables and then try to collect them. This is why many regression analyses will include a large number of variables that do not seem relevant to the immediate question. They are included to hold other things constant, and also to improve efficiency.

2. Multiple Measurement. Two sorts of uses of multiple measurement are common. First, to reduce measurement error, researchers often average repeated measures of a variable or construct an index to capture a "latent" variable. Although not properly a topic for this course, factor analysis and multi-dimensional scaling techniques are very handy for this sort of data reduction. Second, to eliminate bias we may observe the "same observation" many times. For example, we could observe the crime rate in a set of cities over a long period of time. If the omitted factor is due to factors that are constant within units, panel models can treat the omitted variables as nuisance factors and remove them, using the idea of "control in the design."

3. Instrumental Variables. Instrumental variables estimates allow researchers to purge the independent variable of interest of its correlation with the omitted variables, which causes bias. What is difficult is finding suitable variables with which to construct instruments. We will deal with this matter at length later in the course.

3.4. Prediction

Prediction and Interpretation

We have given one interpretation to the regression model as an estimate of the partial derivatives of a function, i.e., the effects of a set of independent variables holding constant the values of the other independent variables. In constructing this definition we began with the definition of an effect as the difference in the conditional mean of Y across two distinct values of X. And a causal effect assumes that we hold all else constant.

Another important way to interpret conditional means and regressions is as predicted values. Indeed, sometimes the goal of an analysis is not to estimate effects but to generate predictions. For example, one might be asked to formulate a prediction about the coming presidential election. A common sort of election forecasting model regresses the incumbent president's vote share on the rate of growth in the economy plus a measure of presidential popularity plus a measure of party identification in the public. Based on elections since 1948, that regression has the following coefficients:

Vote = xxx + xxx·Growth + xxx·Popularity + xxx·Party.

We then consider plausible values for Growth, Popularity, and Partisanship to generate predictions about the Vote.

For any set of values of X, say x⁰, the most likely value or expected value of Y is y⁰ = E[Y|X = x⁰]. This value is calculated in a straightforward manner from the estimated regression. Let x⁰ be a row vector of values of the independent variables for which a prediction is made, i.e., x⁰ = (x₁⁰, x₂⁰, ..., x_K⁰). The predicted value is calculated as

ŷ⁰ = x⁰b = b₀ + Σ_{j=1}^K b_j x_j⁰.

It is standard practice to set variables equal to their mean values if no specific value is of interest in a prediction. One should be somewhat careful in the choice of predicted values, so that the value does not lie too far outside the set of values on which the regression was originally estimated.

Consider the presidential election example. Assume a growth rate of 2 percent, a popularity rating of 50 percent, and a Republican party identification of 50 percent (an equal split between the parties); then Bush is predicted to receive xxx percent of the two-party presidential vote in ….

The predicted value or forecast is itself subject to error. To measure the forecast error we construct the deviation of the observed value from the "true" value, which we wish to predict. The true value is itself a random variable: y⁰ = x⁰β + ε⁰. The prediction or forecast error is the deviation of the predicted value from the true value:

e⁰ = y⁰ − ŷ⁰ = x⁰(β − b) + ε⁰

The variance of the prediction error is

V[e⁰] = σ²_ε + V[x⁰(β − b)] = σ²_ε + x⁰E[(β − b)(β − b)']x⁰' = σ²_ε + σ²_ε x⁰(X'X)⁻¹x⁰'

We can use this last result to construct the 95 percent confidence interval for the predicted value:

ŷ⁰ ± 1.96 √V[e⁰]

As a practical matter this might become somewhat cumbersome to do. A quick way to generate prediction confidence intervals is with an "augmented" regression. Suppose we have estimated a regression using n observations, and we wish to construct several different predicted values based on different sets of values for X, say X⁰. A handy trick is to append the matrix of values X⁰ to the bottom of the X matrix. That is, add n⁰ observations to your dataset for which the values of the independent variables are the appropriate values of X⁰, and let the dependent variable equal 0 for all of these new observations. Finally, add n⁰ columns to your dataset, one for each new observation, equal to −1 for that observation and 0 otherwise. Now regress y on X and the new set of dummy variables. The resulting estimates will reproduce the original regression coefficients on each of the independent variables, the coefficients on the dummy variables equal the predicted values, and the standard errors of these estimates are the correct prediction standard errors.

As an example, consider the analysis of the relationship between voting weights and posts in parliamentary governments. Let's let regression calculate the predicted values and standard errors for 6 distinct cases: Voting Weight = .25 and Formateur, Voting Weight = .25 and Not Formateur, Voting Weight = .35 and Formateur, Voting Weight = .35 and Not Formateur, Voting Weight = .45 and Formateur, and Voting Weight = .45 and Not Formateur.

First, I ran the regression of Share of Posts on Share of Voting Weight plus an indicator variable for the party that formed the government (Formateur). I then ran the regression with 6 additional observations, constructed as described above. Below are the estimated coefficients and standard errors (constants not reported).

Using Augmented Regression To Calculate Predicted Values

Variable         Coeff. (SE)       Coeff. (SE)
Voting Weight    .9812 (.0403)     .9812 (.0403)
Formateur        .2300 (.0094)     .2300 (.0094)
D1 (.25, 1)      --                .5572 (.0952)
D2 (.25, 0)      --                .3272 (.0952)
D3 (.35, 1)      --                .6554 (.0953)
D4 (.35, 0)      --                .4253 (.0955)
D5 (.45, 1)      --                .7535 (.0955)
D6 (.45, 0)      --                .5235 (.0960)
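A sketch of the augmented-regression trick in code (Python with numpy; X, y, and the prediction rows X0 are assumed to hold the data, and all names are ours):

    import numpy as np

    # X: n x K design matrix (with constant), y: n-vector, X0: n0 x K prediction rows
    n, K = X.shape
    n0 = X0.shape[0]
    X_aug = np.vstack([X, X0])
    D = np.vstack([np.zeros((n, n0)), -np.eye(n0)])   # -1 dummy for each new row
    W = np.hstack([X_aug, D])
    y_aug = np.concatenate([y, np.zeros(n0)])
    coef, *_ = np.linalg.lstsq(W, y_aug, rcond=None)
    y_hat0 = coef[K:]   # last n0 coefficients are the predicted values;
                        # their standard errors are the prediction standard errors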

Model Checking

Predicted values allow us to detect deviations from most of the assumptions of the regression model. The one assumption we cannot validate is that X and ε are uncorrelated. The residual vector, e, is defined to be orthogonal to the independent variables, X: e'X = 0. This implies that e'ŷ = e'Xb = 0. This is a restriction, so we cannot test how well the data approximate or agree with this assumption. The other assumptions (linearity, homoskedasticity, no autocorrelation, and normality) are readily verified, and fixed.

A useful diagnostic tool is a residual plot: graph the residuals from a regression against the predicted values. This plot will immediately show many problems, if they exist. We expect an elliptical cloud centered around e = 0.

If the underlying model is non-linear, the residual plot will reflect the deviations of the data from a straight-line fit through the data. Data generated from a quadratic concave function will have negative residuals for low values of ŷ, then positive residuals for intermediate values of ŷ, then negative residuals for large values of ŷ.

If there is heteroskedasticity, the residual plot will show non-constant deviations around e = 0. A common case arises when the residual plot looks like a funnel. This situation means that the effects are multiplicative. That is, the model is y = αX^β ε (where ε takes only positive values), not y = α + Xβ + ε. This is readily fixed by taking logarithms, so that the model becomes:

log y = log(α) + β log(X) + log(ε).

A further plot for measuring heteroskedasticity is of e² against ŷ, against a particular X variable, or against some other factor, such as the "size" of the unit. In this plot e² serves as the estimate of the variance. This is an enlightening plot when the variance is a function of a particular X or when we are dealing with aggregated data, where the aggregates consist of averages of variables across places of different populations.

To detect autocorrelation we use a slightly different plot. Suppose the units are indexed by time, say years, t. We may examine the extent of autocorrelation by taking the correlations between observations that are s units apart:

r_s = Σ_{t=s+1}^T e_t e_{t−s} / Σ_{t=1}^T e_t²

The correlation between an observation and the previous observation is r₁. This is called first-order autocorrelation. Another form of autocorrelation, especially in monthly economic data, is seasonal variation, which is captured with s = 12.

It is instructive to plot the estimated autocorrelation against s, where s runs from 0 to a relatively large number, say 1/10th of T. What should this plot look like? At s = 0, the autocovariance is just the variance, σ²_ε = σ²_u/(1 − ρ²), so r₀ = 1. The values of r_s for s > 0 depend on the nature of the autocorrelation structure.
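Both diagnostics take only a few lines of code. A sketch (Python with numpy and matplotlib; e and y_hat are assumed to hold the residuals and fitted values from an earlier fit):

    import numpy as np
    import matplotlib.pyplot as plt

    # residual plot: look for an elliptical cloud centered on zero
    plt.scatter(y_hat, e)
    plt.axhline(0.0)
    plt.xlabel("predicted values")
    plt.ylabel("residuals")
    plt.show()

    def r(e, s):
        """Lag-s autocorrelation of residuals, r_s."""
        return 1.0 if s == 0 else (e[s:] @ e[:-s]) / (e @ e)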

Let's take a closer look. The most basic and commonly analyzed autocorrelation structure involves an autoregression of order 1 (or AR-1). Let u_t be an error term that is independent of ε_t. First-order autocorrelation in ε is of the form:

ε_t = ρε_{t−1} + u_t

The variance of ε_t follows immediately from the definition of the variance:

σ²_ε = E[(ρε_{t−1} + u_t)(ρε_{t−1} + u_t)] = ρ²σ²_ε + σ²_u

Solving for σ²_ε:

σ²_ε = σ²_u / (1 − ρ²).

To derive the correlation between t and t − s, we must derive the covariance first. For s = 1, the covariance between two observations is

E[ε_t ε_{t−1}] = E[(ρε_{t−1} + u_t)ε_{t−1}] = ρσ²_ε

Using repeated substitutions for ε_t we find that, for an AR-1,

E[ε_t ε_{t−s}] = ρ^s σ²_ε

Now we can state what we expect to observe in the r_s when the residuals contain first-order autocorrelation:

ρ_s = Cov(ε_t, ε_{t−s}) / √(V(ε_t)V(ε_{t−s})) = ρ^s σ²_ε / σ²_ε = ρ^s

There are two patterns that may appear in the autocorrelation plot, depending on the sign of ρ. If ρ > 0, the plot should decline exponentially toward 0. For example, suppose ρ = .5; then we expect r₀ = 1, r₁ = .5, r₂ = .25, r₃ = .125, r₄ = .0625, r₅ = .03125. If ρ < 0, the plot will seesaw, converging on 0. For example, suppose ρ = −.5; then we expect r₀ = 1, r₁ = −.5, r₂ = .25, r₃ = −.125, r₄ = .0625, r₅ = −.03125.

Higher-order autocorrelation structures, such as ε_t = ρ₁ε_{t−1} + ρ₂ε_{t−2} + u_t, lead to more complicated patterns.
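A simulation sketch of the AR-1 pattern (Python with numpy; reuses the hypothetical r() function from the previous sketch):

    import numpy as np

    rng = np.random.default_rng(4)
    T, rho = 10_000, 0.5
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = rho * e[t - 1] + rng.normal()   # AR-1 errors
    for s in range(6):
        print(s, round(r(e, s), 3), rho**s)    # sample r_s tracks rho**s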

We may test for the appropriateness of a particular structure by comparing the estimated autocorrelations, r_s, with the values implied by that structure. For example, we may test for first-order autocorrelation by comparing the observed r_s with those implied by the AR-1 model when ρ̂ = r₁. A simple rule of thumb applies for all autocorrelation structures: if r_s < .2 for all s, then there is no significant degree of autocorrelation.

3.5. Inference

General Framework

Hypotheses. What is an hypothesis? An hypothesis is a statement about the data derived from an argument, model, or theory. It usually takes the form of a claim about the behavior of parameters of the distribution function or about the effect of one variable on another.

For example, a simple argument about voting holds that, in the absence of other factors such as incumbency, voters use party to determine their candidate of choice. Therefore, when no incumbent is on the ticket, an additional one percent Democratic in an electoral district should translate into a one percent higher Democratic vote for a particular office. In a regression model controlling for incumbency, the slope on the normal vote ought to equal one. Of course there are a number of reasons why this hypothesis might fail to hold. The argument itself might be incorrect; there may be other factors besides incumbency that must be included in the model; the normal vote is hard to measure and we must use proxies, which introduce measurement error.

In classical statistics, an hypothesis test is a probability statement. We reject an hypothesis if the probability of observing the data, given that the hypothesis is true, is sufficiently small, say below .05. Let T be the vector of estimated parameters and θ be the true parameters of the distribution. Let θ₀ be the values of the distribution posited by the hypothesis. Finally, let Σ be the variance of T. If the hypothesis is true, then θ = θ₀, and the deviation of T from θ₀ should look like a random draw from the sampling distribution of T; a large deviation would be unlikely to have occurred by chance.

What we have just described is the size of a test. We also care about the power of a test: if θ₀ is not true, what is the probability of observing a deviation sufficiently small that we do not reject the hypothesis? This depends on the sample size and the variance of X.

Tests of a Single Parameter. We may extend the framework for statistical tests about means and differences of means to the case of tests about a single regression coefficient. Recall that the classical hypothesis test for a single mean was:

Pr(|x̄ − μ₀| > t_{α/2,n−1} s/√n) < α

Because x̄ is the sum of random variables, its distribution is approximately normal. However, because we must estimate the standard deviation of X with s, the t-distribution is used as the reference distribution for the hypothesis test. The test criterion can be rewritten as follows: we reject the hypothesis if

|x̄ − μ₀| / (s/√n) > t_{α/2,n−1}.

There is an important duality between the test criterion above and the confidence interval. The test criterion for size .05 may be rewritten as:

Pr(x̄ − t_{.025,n−1} s/√n < μ₀ < x̄ + t_{.025,n−1} s/√n) > .95

So, we can test the hypothesis by ascertaining whether the hypothesized value falls inside the 95 percent confidence interval.

Now consider the regression coefficient, β. Suppose our hypothesis is H: β = β₀. A common value is β₀ = 0, i.e., no effect of X on Y. Like the sample average, a single regression parameter follows the normal distribution, because the regression parameter is a weighted sum of random variables. The mean of this distribution is β and the variance is σ²_ε/Σ(x_i − x̄)² in the case of a bivariate regression or, more generally, σ²_ε a_jj, where a_jj is the jth diagonal element of the matrix (X'X)⁻¹.

The test criterion for the hypothesis states the following. If the null hypothesis is true, then a large standardized deviation of b from β₀ will be unlikely to have occurred by chance:

Pr(|b − β₀| > t_{α/2,n−K} s√a_jj) < α

As with the sample mean, this test criterion can be expressed as follows: we reject the hypothesized value if

|b − β₀| / (s√a_jj) > t_{α/2,n−K}.
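In code, the t statistics and p-values follow directly from the earlier sketches (Python with numpy and scipy; b, cov_b, n, and K are assumed to come from the hypothetical ols() function above):

    import numpy as np
    from scipy import stats

    t_stat = b / np.sqrt(np.diag(cov_b))                  # test of H: beta_j = 0
    p_val = 2 * stats.t.sf(np.abs(t_stat), n - K)         # two-sided p-values
    reject = np.abs(t_stat) > stats.t.ppf(0.975, n - K)   # size .05 test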


More information

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails GMM-based inference in the AR() panel data model for parameter values where local identi cation fails Edith Madsen entre for Applied Microeconometrics (AM) Department of Economics, University of openhagen,

More information

The Causal Inference Problem and the Rubin Causal Model

The Causal Inference Problem and the Rubin Causal Model The Causal Inference Problem and the Rubin Causal Model Lecture 2 Rebecca B. Morton NYU Exp Class Lectures R B Morton (NYU) EPS Lecture 2 Exp Class Lectures 1 / 23 Variables in Modeling the E ects of a

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Quantitative Techniques - Lecture 8: Estimation

Quantitative Techniques - Lecture 8: Estimation Quantitative Techniques - Lecture 8: Estimation Key words: Estimation, hypothesis testing, bias, e ciency, least squares Hypothesis testing when the population variance is not known roperties of estimates

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35 Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate

More information

THE MULTIVARIATE LINEAR REGRESSION MODEL

THE MULTIVARIATE LINEAR REGRESSION MODEL THE MULTIVARIATE LINEAR REGRESSION MODEL Why multiple regression analysis? Model with more than 1 independent variable: y 0 1x1 2x2 u It allows : -Controlling for other factors, and get a ceteris paribus

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

2 Prediction and Analysis of Variance

2 Prediction and Analysis of Variance 2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering

More information

Advanced Quantitative Research Methodology Lecture Notes: January Ecological 28, 2012 Inference1 / 38

Advanced Quantitative Research Methodology Lecture Notes: January Ecological 28, 2012 Inference1 / 38 Advanced Quantitative Research Methodology Lecture Notes: Ecological Inference 1 Gary King http://gking.harvard.edu January 28, 2012 1 c Copyright 2008 Gary King, All Rights Reserved. Gary King http://gking.harvard.edu

More information

ECONOMETRICS HONOR S EXAM REVIEW SESSION

ECONOMETRICS HONOR S EXAM REVIEW SESSION ECONOMETRICS HONOR S EXAM REVIEW SESSION Eunice Han ehan@fas.harvard.edu March 26 th, 2013 Harvard University Information 2 Exam: April 3 rd 3-6pm @ Emerson 105 Bring a calculator and extra pens. Notes

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

review session gov 2000 gov 2000 () review session 1 / 38

review session gov 2000 gov 2000 () review session 1 / 38 review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review

More information

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity

More information

Introduction to Game Theory

Introduction to Game Theory COMP323 Introduction to Computational Game Theory Introduction to Game Theory Paul G. Spirakis Department of Computer Science University of Liverpool Paul G. Spirakis (U. Liverpool) Introduction to Game

More information

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1 PANEL DATA RANDOM AND FIXED EFFECTS MODEL Professor Menelaos Karanasos December 2011 PANEL DATA Notation y it is the value of the dependent variable for cross-section unit i at time t where i = 1,...,

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1 Markov-Switching Models with Endogenous Explanatory Variables by Chang-Jin Kim 1 Dept. of Economics, Korea University and Dept. of Economics, University of Washington First draft: August, 2002 This version:

More information

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc. Notes on regression analysis 1. Basics in regression analysis key concepts (actual implementation is more complicated) A. Collect data B. Plot data on graph, draw a line through the middle of the scatter

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Econometrics Homework 1

Econometrics Homework 1 Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Multiple Choice Questions (circle one part) 1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e

Multiple Choice Questions (circle one part) 1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e Economics 102: Analysis of Economic Data Cameron Fall 2005 Department of Economics, U.C.-Davis Final Exam (A) Tuesday December 16 Compulsory. Closed book. Total of 56 points and worth 40% of course grade.

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

NOWCASTING THE OBAMA VOTE: PROXY MODELS FOR 2012

NOWCASTING THE OBAMA VOTE: PROXY MODELS FOR 2012 JANUARY 4, 2012 NOWCASTING THE OBAMA VOTE: PROXY MODELS FOR 2012 Michael S. Lewis-Beck University of Iowa Charles Tien Hunter College, CUNY IF THE US PRESIDENTIAL ELECTION WERE HELD NOW, OBAMA WOULD WIN.

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Polling and sampling. Clement de Chaisemartin and Douglas G. Steigerwald UCSB

Polling and sampling. Clement de Chaisemartin and Douglas G. Steigerwald UCSB Polling and sampling. Clement de Chaisemartin and Douglas G. Steigerwald UCSB 1 What pollsters do Pollsters want to answer difficult questions: As of November 1 st 2016, what % of the Pennsylvania electorate

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Physics 509: Non-Parametric Statistics and Correlation Testing

Physics 509: Non-Parametric Statistics and Correlation Testing Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Economics Introduction to Econometrics - Fall 2007 Final Exam - Answers

Economics Introduction to Econometrics - Fall 2007 Final Exam - Answers Student Name: Economics 4818 - Introduction to Econometrics - Fall 2007 Final Exam - Answers SHOW ALL WORK! Evaluation: Problems: 3, 4C, 5C and 5F are worth 4 points. All other questions are worth 3 points.

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory

More information

Economics 620, Lecture 7: Still More, But Last, on the K-Varable Linear Model

Economics 620, Lecture 7: Still More, But Last, on the K-Varable Linear Model Economics 620, Lecture 7: Still More, But Last, on the K-Varable Linear Model Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 7: the K-Varable Linear Model IV

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Saturday, May 9, 008 Examination time: 3

More information

1 Regression with Time Series Variables

1 Regression with Time Series Variables 1 Regression with Time Series Variables With time series regression, Y might not only depend on X, but also lags of Y and lags of X Autoregressive Distributed lag (or ADL(p; q)) model has these features:

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 15, 2013 Christophe Hurlin (University of Orléans)

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Intro to Economic analysis

Intro to Economic analysis Intro to Economic analysis Alberto Bisin - NYU 1 Rational Choice The central gure of economics theory is the individual decision-maker (DM). The typical example of a DM is the consumer. We shall assume

More information

Physics 509: Bootstrap and Robust Parameter Estimation

Physics 509: Bootstrap and Robust Parameter Estimation Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept

More information

1. The Multivariate Classical Linear Regression Model

1. The Multivariate Classical Linear Regression Model Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The

More information

ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University

ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University Instructions: Answer all four (4) questions. Be sure to show your work or provide su cient justi cation for

More information

LECTURE 13: TIME SERIES I

LECTURE 13: TIME SERIES I 1 LECTURE 13: TIME SERIES I AUTOCORRELATION: Consider y = X + u where y is T 1, X is T K, is K 1 and u is T 1. We are using T and not N for sample size to emphasize that this is a time series. The natural

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models

Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models James J. Heckman and Salvador Navarro The University of Chicago Review of Economics and Statistics 86(1)

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Econometrics - 30C00200

Econometrics - 30C00200 Econometrics - 30C00200 Lecture 11: Heteroskedasticity Antti Saastamoinen VATT Institute for Economic Research Fall 2015 30C00200 Lecture 11: Heteroskedasticity 12.10.2015 Aalto University School of Business

More information

In the previous chapter, we learned how to use the method of least-squares

In the previous chapter, we learned how to use the method of least-squares 03-Kahane-45364.qxd 11/9/2007 4:40 PM Page 37 3 Model Performance and Evaluation In the previous chapter, we learned how to use the method of least-squares to find a line that best fits a scatter of points.

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Ref.:   Spring SOS3003 Applied data analysis for social science Lecture note SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

Introduction: structural econometrics. Jean-Marc Robin

Introduction: structural econometrics. Jean-Marc Robin Introduction: structural econometrics Jean-Marc Robin Abstract 1. Descriptive vs structural models 2. Correlation is not causality a. Simultaneity b. Heterogeneity c. Selectivity Descriptive models Consider

More information

Parametric Inference on Strong Dependence

Parametric Inference on Strong Dependence Parametric Inference on Strong Dependence Peter M. Robinson London School of Economics Based on joint work with Javier Hualde: Javier Hualde and Peter M. Robinson: Gaussian Pseudo-Maximum Likelihood Estimation

More information

EC402 - Problem Set 3

EC402 - Problem Set 3 EC402 - Problem Set 3 Konrad Burchardi 11th of February 2009 Introduction Today we will - briefly talk about the Conditional Expectation Function and - lengthily talk about Fixed Effects: How do we calculate

More information

Statistical Inference with Regression Analysis

Statistical Inference with Regression Analysis Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing

More information

We begin by thinking about population relationships.

We begin by thinking about population relationships. Conditional Expectation Function (CEF) We begin by thinking about population relationships. CEF Decomposition Theorem: Given some outcome Y i and some covariates X i there is always a decomposition where

More information

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Functional form misspecification We may have a model that is correctly specified, in terms of including

More information

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e Economics 102: Analysis of Economic Data Cameron Spring 2016 Department of Economics, U.C.-Davis Final Exam (A) Tuesday June 7 Compulsory. Closed book. Total of 58 points and worth 45% of course grade.

More information

Multivariate Regression: Part I

Multivariate Regression: Part I Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a

More information

1/34 3/ Omission of a relevant variable(s) Y i = α 1 + α 2 X 1i + α 3 X 2i + u 2i

1/34 3/ Omission of a relevant variable(s) Y i = α 1 + α 2 X 1i + α 3 X 2i + u 2i 1/34 Outline Basic Econometrics in Transportation Model Specification How does one go about finding the correct model? What are the consequences of specification errors? How does one detect specification

More information

Greene, Econometric Analysis (7th ed, 2012)

Greene, Econometric Analysis (7th ed, 2012) EC771: Econometrics, Spring 2012 Greene, Econometric Analysis (7th ed, 2012) Chapters 2 3: Classical Linear Regression The classical linear regression model is the single most useful tool in econometrics.

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Ability Bias, Errors in Variables and Sibling Methods. James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006

Ability Bias, Errors in Variables and Sibling Methods. James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006 Ability Bias, Errors in Variables and Sibling Methods James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006 1 1 Ability Bias Consider the model: log = 0 + 1 + where =income, = schooling,

More information

MORE ON MULTIPLE REGRESSION

MORE ON MULTIPLE REGRESSION DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 MORE ON MULTIPLE REGRESSION I. AGENDA: A. Multiple regression 1. Categorical variables with more than two categories 2. Interaction

More information