POLI 618 Notes. Stuart Soroka, Department of Political Science, McGill University. March 2010
These pages were written originally as my own lecture notes, but are now designed to be distributed to students taking the stats methods course POLI 618 at McGill University. They are also freely available online, at snsoroka.com. The notes draw on a good number of statistics texts, including Kennedy's Econometrics, Greene's Econometric Analysis, and a number of volumes in Sage's quantitative methods series. That said, please do keep in mind that they are just lecture notes: there are errors and omissions, and for no single topic is there enough information here to learn statistics from the notes alone. (There are of course many textbooks better equipped for that purpose.) The notes are nonetheless a useful background guide to POLI 618 and perhaps, more generally, to some of the basic statistics most common in empirical political science. If you find errors (and you will), please do let me know. Thanks, Stuart Soroka stuart.soroka@mcgill.ca
Table of Contents

Variance, Covariance and Correlation
Introducing Bivariate Ordinary Least Squares Regression
Multivariate Ordinary Least Squares Regression
Error, and Model Fit
Assumptions of OLS Regression
Nonlinearities
Collinearity and Multicollinearity
Heteroskedasticity
Outliers
Models for Dichotomous Data: Linear Probability Models; Nonlinear Probability Model: Logistic Regression; An Alternative Description: The Latent Variable Model; Nonlinear Probability Model: Probit Regression; Maximum Likelihood Estimation; Interpretation & Goodness of Fit Measures for Categorical Models
Models for Categorical Data: Ordinal Outcomes; Nominal Outcomes
Time Series: Autocorrelation; Univariate Statistics; Bivariate Statistics; Multivariate Models
Significance Tests: Distribution Functions; The Chi-Square Test; The t Test; The F Test
Factor Analysis: Background: Correlations and Factor Analysis; An Algebraic Description; Factor Analysis Results; Rotated Factor Analyses
Variance, Covariance and Correlation

Let's begin with $Y_i$, a continuous variable measuring some value for each individual (i) in a representative sample of the population. $Y_i$ can be income, or age, or a thermometer score expressing degrees of approval for a presidential candidate. Variance in our variable $Y_i$ is calculated as follows:

(1) $S^2_Y = \frac{\sum (Y_i - \bar{Y})^2}{N-1}$, or

(2) $S^2_Y = \frac{N \sum Y_i^2 - \left(\sum Y_i\right)^2}{N(N-1)}$,

where both versions are equivalent, and the latter is referred to as the computational formula (because it is, in principle, easier to calculate by hand). Note that the equation is pretty simple: we are interested in variance in $Y_i$, and Equation 1 is basically taking the average of each individual $Y_i$'s variance around the mean ($\bar{Y}$). There are a few tricky parts. First, the differences between each individual $Y_i$ and $\bar{Y}$ (that is, $Y_i - \bar{Y}$) are squared in Equation 1, so that negative values do not cancel out positive values (since squaring leads to only positive values). Second, we use N-1 as the denominator rather than N (where N is the number of cases). This produces a more conservative (slightly inflated) result, in light of the fact that we're working with a sample variance rather than the population variance: the values of $Y_i$ in our (hopefully) representative sample, versus the values of $Y_i$ that we believe may exist in the total real-world population. For a small-N sample, where we might suspect that we under-estimate the variance in the population, using N-1 effectively adjusts the estimated variance upwards. With a large-N sample, the difference between N-1 and N is increasingly marginal. That the adjustment matters more for small samples than for big samples reflects our increasing confidence in the representativeness of our sample as it grows.
(Note that some texts distinguish between $S^2_Y$ and $\sigma^2_Y$, where the Roman S is the sample variance and the Greek $\sigma$ is the population variance. Indeed, some texts will distinguish between sample values and population values using Roman and Greek versions across the board: b for an estimated slope coefficient, for instance, and $\beta$ for the actual slope in the population. I am not this systematic below.) The standard deviation is a simple function of variance:
(3) $S_Y = \sqrt{S^2_Y} = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{N-1}}$,

so standard deviations are also indications of the extent to which a given variable varies around its mean. $S_Y$ is important for understanding distributions and significance tests, as we shall see below. So far, we've looked only at univariate statistics: statistics describing a single variable. Most of the time, though, what we want to do is describe relationships between two (or more) variables. Covariance, a measure of common variance between two variables, or how much two variables change together, is calculated as follows:

(4) $S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N-1}$, or

(5) $S_{XY} = \frac{N \sum X_i Y_i - \sum X_i \sum Y_i}{N(N-1)}$,

the latter of which is the computational formula. Again, we use N-1 as the denominator, for the same reasons as above. Pearson's correlation coefficient is also built from covariances and standard deviations, as follows:

(6) $r = \frac{S_{XY}}{S_X S_Y}$, or

(7) $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$,

where $S_{XY}$ is the sample covariance between $X_i$ and $Y_i$, and $S_X$ and $S_Y$ are the sample standard deviations of $X_i$ and $Y_i$ respectively. (Note the relationship between Equation 7 and the preceding equations for standard deviations and covariances, Equation 3 and Equation 4.)
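These formulas are easy to verify numerically. Here is a minimal Python sketch of Equations 1, 2, 4 and 6, using only the standard library; the data in the checks are invented for illustration and do not come from the notes:

```python
import math

def variance(y):
    """Sample variance (Equation 1): squared deviations from the mean, over N-1."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)

def variance_computational(y):
    """The computational formula (Equation 2); algebraically identical to Equation 1."""
    n = len(y)
    return (n * sum(yi ** 2 for yi in y) - sum(y) ** 2) / (n * (n - 1))

def covariance(x, y):
    """Sample covariance (Equation 4)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson's r (Equation 6): covariance over the product of the two SDs."""
    return covariance(x, y) / (math.sqrt(variance(x)) * math.sqrt(variance(y)))
```

With any small series you can confirm that the definitional and computational formulas agree, and that a variable plotted perfectly against a multiple of itself yields r = 1.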
Introducing Bivariate Ordinary Least Squares Regression

Take a simple data series, and plot it. [Data table and scatterplot of X and Y omitted in this version.] What we want to do is describe the relationship between X and Y. Essentially, we want to draw a line through the dots, and describe that line. Given that the data here are relatively simple, we can just do this by hand, and describe the line using two basic properties, $\alpha$ and $\beta$, where $\alpha$, the constant, is in this case equal to 1, and $\beta$, the slope, is 1 (the increase in Y) divided by 2 (the increase in X) = .5. So we can produce an
equation for this line, allowing us to predict values of Y based on values of X. The general model is

(8) $Y_i = \alpha + \beta X_i$,

and the particular model in this case is Y = 1 + .5X. Note that the constant is simply a function of the means of both X and Y, along with the slope. That is:

(9) $\alpha = \bar{Y} - \beta \bar{X}$

So, following Equation 9, $\alpha = \bar{Y} - \beta\bar{X} = 3.5 - (.5)(5) = 3.5 - 2.5 = 1$. This is pretty simple. The difficulty is that real data aren't like this: they don't fall along a perfect line. They're likely more like this: [data table and scatterplot omitted]. Now, note that we can draw any number of lines that will satisfy Equation 8. All that matters is that the line goes through the means of X and Y. So the means are:
$\bar{X} = 5$, $\bar{Y} = 3.75$. And let's make up an equation where Y = 3.75 when X = 5:

Y = $\alpha$ + $\beta$X
3.75 = $\alpha$ + $\beta$(5)
3.75 = 4 + (-.05)(5)
3.75 = 4 + (-.25)

So here it is: Y = 4 + (-.05)X. Plotted, it looks like this: [plot omitted]. Note that this new model has to be expressed in a slightly different manner, including an error term:

(10) $Y_i = \alpha + \beta X_i + \epsilon_i$, or, alternatively:

(11) $Y_i = \hat{Y}_i + \epsilon_i$,

where $\hat{Y}_i$ are the estimated values of the actual $Y_i$, and where the error can be expressed in the following ways:

(12) $\epsilon_i = Y_i - \hat{Y}_i$, or $\epsilon_i = Y_i - (\alpha + \beta X_i)$.

So we've now accounted for the fact that we work with messy data, and that there will consequently be a certain degree of error in the model. This is
inevitable, of course, since we're trying to draw a straight line through points that are unlikely to be perfectly distributed along a straight line. Of course, the line above won't do: it quite clearly does not describe the relationship between X and Y. What we need is a method of deriving a model that better describes the effect that X has on Y; essentially, a method that draws a line that comes as close to all the dots as possible. Or, more precisely, a model that minimizes the total amount of error ($\epsilon_i$). We first need a measure of the total amount of error, the degree to which our predictions miss the actual values of $Y_i$. We can't simply take the sum of all errors, $\sum \epsilon_i$, because positive and negative errors can cancel each other out. We could take the sum of the absolute values, $\sum |\epsilon_i|$, which in fact is used in some estimations. The norm is to use the sum of squared errors, the SSE, or $\sum \epsilon_i^2$. This sum is most greatly affected by large errors: by squaring residuals, large residuals take on very large magnitudes. An estimation of Equation 10 that tries to minimize $\sum \epsilon_i^2$ accordingly tries especially hard to avoid large errors. (By implication, outlying cases will have a particularly strong effect on the overall estimation. We return to this in the section on outliers below.) This is what we are trying to do in ordinary least squares (OLS) regression: minimize the SSE, and have an estimate of $\beta$ (on which our estimate of $\alpha$ relies) that comes as close to all the dots as is possible. Least-squares coefficients for simple bivariate regression are estimated as follows:

(13) $\beta = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$, or
(14) $\beta = \frac{N \sum X_i Y_i - \sum X_i \sum Y_i}{N \sum X_i^2 - \left(\sum X_i\right)^2}$.

The latter is referred to as the computational formula, as it's supposed to be easier to compute by hand. (I actually prefer the former, which I find easier to compute, and which has the added advantage of nicely illustrating the important features of OLS regression.) We can use Equation 13 to calculate the least squares estimate for the above data. [The case-by-case table of $X_i - \bar{X}$, $Y_i - \bar{Y}$, and their products is omitted; its key results are $\bar{X} = 5$, $\bar{Y} = 3.75$, $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 9$, and $\sum (X_i - \bar{X})^2 = 20$.] So solving Equation 13 with the values above looks like this:

$\beta = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{9}{20} = .45$

And we can use these results in Equation 9 to find the constant:

$\alpha = \bar{Y} - \beta\bar{X} = 3.75 - (.45)(5) = 3.75 - 2.25 = 1.5$

So the final model looks like this: $Y_i = 1.5 + (.45)X_i$
Using this model, we can easily see what the individual predicted values ($\hat{Y}_i$) are, as well as the associated errors ($\epsilon_i = Y_i - \hat{Y}_i$). [The case-by-case table is omitted; recall $\bar{X} = 5$ and $\bar{Y} = 3.75$.] One further note about Equation 13, and our means of estimating OLS slope coefficients. Recall the equations for variance (Equation 1) and covariance (Equation 4). If we take the ratio of covariance and variance, as follows,

(15) $\frac{S_{XY}}{S_X^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})\,/\,(N-1)}{\sum (X_i - \bar{X})^2\,/\,(N-1)}$,

we can adjust somewhat to produce the following,

(16) $\frac{S_{XY}}{S_X^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$,

where Equation 16 simply drops the N-1 denominators, which cancel each other out. More importantly, Equation 16 looks suspiciously (indeed, exactly) like the formula for $\beta$ (Equation 13). $\beta$ is thus essentially a ratio of the covariance between X and Y to the variance of X, as follows:

(17) $\beta_{YX} = \frac{S_{YX}}{S_X^2}$

This should make sense when we consider the standard interpretation of $\beta$: for a one-unit shift in X, how much does Y change?
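The equivalence between Equation 13 and the covariance/variance ratio in Equation 17 is easy to confirm in code. A Python sketch, with made-up data (the raw values from the worked example above are not reproduced in this version of the notes):

```python
def ols_bivariate(x, y):
    """Slope via Equation 13 and constant via Equation 9."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx                       # Equation 9
    return alpha, beta

def cov_var_ratio(x, y):
    """Equation 17: the same slope, as sample covariance over sample variance of X.
    The N-1 denominators cancel, leaving Equation 13."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s_xx = sum((a - mx) ** 2 for a in x) / (n - 1)
    return s_xy / s_xx

# Illustrative data (not from the notes):
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
alpha, beta = ols_bivariate(x, y)   # beta = 1.6, alpha = 0.5
```

For any data series the two functions return the same slope, which is the point of Equations 15-17.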
Multivariate Ordinary Least Squares Regression

Things are more complicated for multiple, or multivariate, regression, where there is more than one independent variable. The standard OLS multivariate model is nevertheless a relatively simple extension of bivariate regression. Imagine, for instance, plotting a line through dots plotted along two X axes, in what amounts to three-dimensional space. [Three-dimensional scatterplot omitted.] This is all we're doing in multivariate regression: drawing a line through these dots, where values of Y are driven by a combination of $X_1$ and $X_2$, and where the model itself would be as follows:

(18) $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$.

That said, when we have more than two regressors, we start plotting lines through four- and five-dimensional space, and that gets hard to draw. Least squares coefficients for multiple regression with two regressors, as in Equation 18, are calculated as follows:

(19) $\beta_1 = \frac{\sum (X_{1i} - \bar{X}_1)(Y_i - \bar{Y}) \sum (X_{2i} - \bar{X}_2)^2 - \sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) \sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sum (X_{1i} - \bar{X}_1)^2 \sum (X_{2i} - \bar{X}_2)^2 - \left(\sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)\right)^2}$

and

(20) $\beta_2 = \frac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) \sum (X_{1i} - \bar{X}_1)^2 - \sum (X_{1i} - \bar{X}_1)(Y_i - \bar{Y}) \sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sum (X_{1i} - \bar{X}_1)^2 \sum (X_{2i} - \bar{X}_2)^2 - \left(\sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)\right)^2}$,

and the constant is now estimated as follows:

(21) $\alpha = \bar{Y} - \beta_1 \bar{X}_1 - \beta_2 \bar{X}_2$.
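Equations 19-21 are tedious by hand but entirely mechanical in code. A Python sketch, written with centered sums; the data in the check below are invented so that Y is an exact linear function of the two regressors, which the formulas should recover exactly:

```python
def ols_two_regressors(x1, x2, y):
    """Slopes and constant for Y = a + b1*X1 + b2*X2 (Equations 19-21)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered cross-products and sums of squares:
    s1y = sum((a - m1) * (b - my) for a, b in zip(x1, y))
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((a - m2) ** 2 for a in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    denom = s11 * s22 - s12 ** 2          # zero only if X1, X2 exactly collinear
    b1 = (s1y * s22 - s2y * s12) / denom  # Equation 19
    b2 = (s2y * s11 - s1y * s12) / denom  # Equation 20
    a = my - b1 * m1 - b2 * m2            # Equation 21
    return a, b1, b2
```

Note the denominator: when the two regressors are perfectly collinear it is zero and the slopes are undefined, which previews the multicollinearity discussion below.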
Error, and Model Fit

The standard deviation of the residuals, or the standard error of the slope, is as follows:

(22) $SE_\beta = \sqrt{\frac{\sum \epsilon_i^2}{N-2}}$,

or, more generally,

(23) $SE_\beta = \sqrt{\frac{\sum \epsilon_i^2}{N-K-1}}$.

Equation 22 is the same as Equation 23, except that the former is a simple version that applies to bivariate regression only (where K = 1), and the latter is a more general version that applies to multivariate regression with any number of independent variables. N in these equations refers to the total number of cases, while K is the total number of independent variables in the model. The $SE_\beta$ is a useful measure of the fit of a regression slope: it gives you the average error of the prediction. It's also used to test the significance of the slope coefficient. For instance, if we are going to be 95% confident that our estimate is significantly different from zero, zero should not fall within the interval $\beta \pm 2(SE_\beta)$. Alternatively, if we are using t-statistics to examine coefficients' significance, then the ratio of $\beta$ to $SE_\beta$ should be roughly 2. Assuming you remember the basic sampling and distributional material in your basic statistics course, this reasoning should sound familiar. Here's a quick refresher: Testing model fit is based on some standard beliefs about distributions. Normal distributions are unimodal, symmetric, and are described by the following probability distribution:

(24) $p(Y) = \frac{e^{-(Y - \mu_Y)^2 / 2\sigma_Y^2}}{\sqrt{2\pi\sigma_Y^2}}$,

where p(Y) refers to the probability of a given value of Y, and where the shape of the curve is determined by only two values: the population mean, $\mu_Y$, and its variance, $\sigma_Y^2$. (Also see our discussion of distribution functions, below.) Assuming two distributions with the same mean (of zero, for instance), the effect of changing variances is something like this:
[Figure omitted: two normal curves with the same mean and different variances.] We know that many natural phenomena follow a normal distribution. So we assume that many political phenomena do as well. Indeed, where the current case is concerned, we believe that our estimated slope coefficient, $\beta$, is one of a distribution of possible $\beta$'s we might find in repeated samples. These $\beta$'s are normally distributed, with a standard deviation that we try to estimate from our data. We also know that in any normal distribution, roughly 68% of all cases fall within plus or minus one standard deviation from the mean, and 95% of all cases fall within plus or minus two standard deviations from the mean. It follows that our slope should not be within two standard errors of zero. If it is, we cannot be 95% confident that our coefficient is significantly different from zero; that is, we cannot reject the null hypothesis that there is no significant effect. Going through this process step-by-step is useful. Let's begin with our estimated bivariate model from above, where the model is $Y_i = 1.5 + (.45)X_i$. [The case-by-case table of predicted values and squared errors is omitted; recall $\bar{X} = 5$ and $\bar{Y} = 3.75$, and note that $\sum \epsilon_i^2 = 2.7$.] Based on Equation 22, we calculate the standard error of the slope as follows:
$SE_\beta = \sqrt{\frac{\sum \epsilon_i^2}{N-2}} = \sqrt{\frac{2.7}{4-2}} = \sqrt{1.35} = 1.16$

So, we can be 95% confident that the slope estimate in the population is .45 ± (2 × 1.16), or .45 ± 2.32. Zero is certainly within this interval, so our results are not statistically significant. This is mainly due to our very small sample size. Imagine the same slope and sum of squared errors, but based on a sample of 200 cases:

$SE_\beta = \sqrt{\frac{\sum \epsilon_i^2}{N-2}} = \sqrt{\frac{2.7}{198}} = \sqrt{.014} = .118$

Now we can be 95% confident that the slope estimate in the population is .45 ± (2 × .118), or .45 ± .236. Zero is not within this interval, so our results in this case would be statistically significant. Just to recap, our decision about the statistical significance of the slope is based on a combination of the magnitude of the slope ($\beta$), the total amount of error in the estimate (using the $SE_\beta$), and the sample size (N, used in our calculation of the $SE_\beta$). Any one of these things can contribute to significant findings: a greater slope, less error, and/or a larger sample size. (Here, we saw the effect that sample size can have.) Another means of examining the overall model fit (that is, including all independent variables in a multivariate context) is by looking at the proportion of the total variation in $Y_i$ explained by the model. First, total variation can be decomposed into explained and unexplained components as follows:

TSS is the Total Sum of Squares
RSS is the Regression Sum of Squares (note that some texts call this RegSS)
ESS is the Error Sum of Squares (some texts call this the residual sum of squares, RSS)

So, TSS = RSS + ESS, where

(25) $TSS = \sum (Y_i - \bar{Y})^2$,

(26) $RSS = \sum (\hat{Y}_i - \bar{Y})^2$, and

(27) $ESS = \sum (Y_i - \hat{Y}_i)^2$.
We're basically dividing up the total variance in $Y_i$ around its mean (TSS) into two parts: the variance accounted for in the regression model (RSS), and the variance not accounted for by the regression model (ESS). Indeed, we can illustrate on a case-by-case basis the variance from the mean that is accounted for by the model, and the remaining, unaccounted-for, variance: [figure omitted]. All the explained variance (squared) is summed to form RSS; all the unexplained variance (squared) is summed to form ESS. Using these terms, the coefficient of determination, more commonly the $R^2$, is calculated as follows:

(28) $R^2 = \frac{RSS}{TSS}$, or $R^2 = 1 - \frac{ESS}{TSS}$, or $R^2 = \frac{TSS - ESS}{TSS}$.

Or, alternatively, following from Equations 25-27:

(29) $R^2 = \frac{RSS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$

And we can estimate all of this for our data. [The case-by-case table is omitted; the resulting sums are TSS = 6.74, RSS = 4.04, and ESS = 2.7.]
The coefficient of determination is thus $R^2 = \frac{RSS}{TSS} = \frac{4.04}{6.74} = .599$. The coefficient of determination is calculated the same way for multivariate regression. The $R^2$ has one problem, though: it can only ever increase or stay the same as variables are added to the equation. More to the point, including extra variables can never lower the $R^2$, and the measure accordingly does not reward model parsimony. If you want a measure that does so, you need to use a correction for degrees of freedom (sometimes called an adjusted R-squared):

(30) $\bar{R}^2 = 1 - \frac{ESS/(N-K-1)}{TSS/(N-1)}$

Note that this should only make a difference when the sample size is relatively small, or the number of independent variables is relatively large. You can see in Equation 30 that, holding fit constant, increasing the number of variables shrinks N-K-1, inflates the error term ESS/(N-K-1), and thus reduces the adjusted $R^2$; the penalty bites hardest when N is small. One further note about the coefficient of determination: in the bivariate case, the $R^2$ is equivalent to the square of Pearson's r (Equation 6). That is,

(31) $r = \frac{S_{XY}}{S_X S_Y} = \sqrt{R^2_{XY}}$.

There is, then, a clear relationship between the correlation coefficient and the coefficient of determination. There is also a relationship between a bivariate correlation coefficient and the regression coefficient. Let's begin with an equation for the regression coefficient, as in Equation 17 above:

(32) $\beta_{XY} = \frac{S_{XY}}{S_X^2}$,

and rearrange these terms to isolate the covariance:

(33) $S_{XY} = \beta_{XY} S_X^2$.

Now, let's substitute this for $S_{XY}$ in the equation for correlation (Equation 6):

(34) $r_{XY} = \frac{S_{XY}}{S_X S_Y} = \frac{\beta_{XY} S_X^2}{S_X S_Y}$.
So the correlation coefficient and the bivariate regression coefficient are simple functions of each other. More clearly:

(35) $r_{XY} = \beta_{XY} \frac{S_X}{S_Y}$, and

(36) $\beta_{XY} = r_{XY} \frac{S_Y}{S_X}$.

The relationship between the two in multivariate regression is of course much more complicated. But the point is that all these measures - measures capturing various aspects of the relationship between two (or more) variables - are related to each other, each a function of a given set of variances and covariances.
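Both $R^2$ measures (Equations 28 and 30) are simple functions of the sums of squares in Equations 25-27. A Python sketch, checked against hypothetical observed and fitted values rather than the worked example's data:

```python
def fit_stats(y, yhat, k):
    """R-squared (Equation 28) and adjusted R-squared (Equation 30).
    y: observed values; yhat: model predictions; k: number of regressors."""
    n = len(y)
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)               # Equation 25
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # Equation 27
    r2 = 1 - ess / tss
    adj_r2 = 1 - (ess / (n - k - 1)) / (tss / (n - 1))
    return r2, adj_r2

# Hypothetical data: r2 = 0.8, adjusted r2 = 0.7 with one regressor.
r2, adj = fit_stats([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5], k=1)
```

A perfect fit (ESS = 0) gives $R^2 = 1$, and the adjusted version is always at or below the unadjusted one when the model explains anything at all.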
Assumptions of OLS regression

The preceding OLS linear regression models are unbiased and efficient (that is, they provide the Best Linear Unbiased Estimator, or BLUE) provided five assumptions are not violated. If any of these assumptions are violated, the regular linear OLS model ceases to be unbiased and/or efficient. The assumptions themselves, as well as problems resulting from violating each one, are listed below (drawn from Kennedy, Econometrics). Of course, many data or models violate one or more of these assumptions, so much of what we have to cover now is how to deal with these problems.

1. Y can be calculated as a linear function of X, plus a disturbance term. Problems: wrong regressors, nonlinearity, changing parameters.

2. The expected value of the disturbance is zero; the mean of $\epsilon$ is zero. Problems: biased intercept.

3. Disturbance terms have the same variance and are not correlated with one another. Problems: heteroskedasticity, autocorrelated errors.

4. Observations of X are fixed in repeated samples; it is possible to repeat the sample with the same independent values. Problems: errors in variables, autoregression, simultaneity.

5. The number of observations is greater than the number of independent variables, and there are no exact linear relationships between the independent variables. Problems: multicollinearity.
Nonlinearities

So far, we've assumed that the relationship between $Y_i$ and $X_i$ is linear. In many cases, this will not be true. We could imagine any number of non-linear relationships. Here are just two common possibilities: [figures omitted: an exponential increase, and a ceiling effect]. We can of course estimate a linear relationship in both cases; it doesn't capture the actual relationship very well, though. In order to better capture the relationship between Y and X, we may want to adjust our variables to represent this non-linearity. Let's begin with the basic multivariate model,

(37) $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$.

Where a single X is believed to have a nonlinear relationship with Y, the simplest approach is to manipulate that X: to use $X^2$ in place of X, for instance:

(38) $Y_i = \alpha + \beta_1 X_{1i}^2 + \beta_2 X_{2i} + \epsilon_i$.

This may capture the exponential increase depicted in the first figure above. To capture the ceiling effect in the second figure, we could use both the linear (X) and quadratic ($X^2$) terms, with the expectation that the coefficient on the former ($\beta_1$) would be positive and large, and the coefficient on the latter ($\beta_2$) would be negative and small:

(39) $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{2i} + \epsilon_i$.

The coefficient on the quadratic will gradually, and increasingly, reduce the positive effect of $X_1$. Indeed, if the effect of the quadratic is great enough, it can, in combination with the linear version of $X_1$, produce a line that increases, peaks, and then begins to decrease.
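The linear-plus-quadratic specification in Equation 39 is still linear in the coefficients, so it can be estimated by ordinary least squares with X and $X^2$ treated as two separate regressors. A Python sketch, checked on fabricated data that follow an exact ceiling-shaped curve (so the coefficients should be recovered exactly):

```python
def fit_quadratic(x, y):
    """OLS of Y on X and X^2 (the quadratic part of Equation 39, with no
    additional regressor). Uses the standard two-regressor slope formulas."""
    z = [v ** 2 for v in x]                    # the constructed quadratic term
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((a - mz) ** 2 for a in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    d = sxx * szz - sxz ** 2
    b1 = (sxy * szz - szy * sxz) / d           # linear coefficient
    b2 = (szy * sxx - sxy * sxz) / d           # quadratic coefficient
    return my - b1 * mx - b2 * mz, b1, b2      # (constant, b1, b2)

# Fabricated ceiling-shaped data: y = 2 + 3x - 0.5x^2 exactly.
x = [0, 1, 2, 3, 4]
y = [2.0, 4.5, 6.0, 6.5, 6.0]
a, b1, b2 = fit_quadratic(x, y)   # recovers a = 2, b1 = 3, b2 = -0.5
```

As the notes describe, the positive linear coefficient and small negative quadratic coefficient together produce a curve that rises and then flattens (here it peaks at x = 3).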
Of course, these are just two of the simplest (and most common) nonlinearities. You can imagine any number of different non-linear relationships; most can be captured by some kind of mathematical adjustment to the regressors. Sometimes we believe there is a nonlinear relationship between all the Xs and Y; that is, all Xs combined have a nonlinear effect on Y, for instance:

(40) $Y_i = (\alpha + \beta_1 X_{1i} + \beta_3 X_{2i})^2 + \epsilon_i$.

The easiest way to estimate this is not Equation 40, though, but rather an adjustment as follows:

(41) $\sqrt{Y_i} = \alpha + \beta_1 X_{1i} + \beta_3 X_{2i} + \epsilon_i$.

Here, we simply transform the dependent variable. I've replaced the squared version of the right-hand side (RHS) variables with the square root of the left-hand side (LHS) because it's a simple example of a nonlinear transformation. It's not the most common, however. The most common is taking the log of Y, as follows:

(42) $\ln(Y_i) = \alpha + \beta_1 X_{1i} + \beta_3 X_{2i} + \epsilon_i$.

Doing so serves two purposes. First, we might believe that the shape of the effect of our RHS variables on $Y_i$ is actually nonlinear, and specifically logistic in shape (an S-curve); this transformation may quite nicely capture that nonlinearity. Second, taking the log of $Y_i$ can solve a distributional problem with that variable. OLS estimations work more efficiently with variables that are normally distributed. If $Y_i$ has a great many small values and a long right-hand tail (as many of our variables will; income, for instance), then taking the log of $Y_i$ often does a nice job of generating a more normal distribution. This example highlights a second reason for transforming a variable, on the LHS or RHS. Sometimes, a transformation is based on a particular shape of an effect, based on theory. Other times, a transformation is used to fix a non-normally distributed variable.
The first kind of transformation is based on theoretical expectations; the second is based on a statistical problem. (In practice, separating the two is not always easy.)
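A quick way to see the distributional payoff of the log transformation in Equation 42 is to log a right-skewed variable and watch the mean move back toward the median. A Python sketch with invented income values (none of this comes from course data; the mean-minus-median gap is used here only as a crude skew indicator):

```python
import math

def log_transform(y):
    """ln(Y), as in Equation 42. Values must be strictly positive;
    zeros (e.g., zero incomes) are typically recoded before logging."""
    return [math.log(v) for v in y]

def skew_gap(v):
    """Mean minus median: a rough indicator of right skew (positive when
    a long right tail pulls the mean above the median)."""
    mean = sum(v) / len(v)
    median = sorted(v)[len(v) // 2]
    return mean - median

# Invented, right-skewed "incomes" (in thousands):
incomes = [20, 25, 30, 40, 60, 120, 400]
logged = log_transform(incomes)
# The gap shrinks dramatically after logging.
```

The raw series has a mean far above its median; the logged series is much more symmetric, which is exactly the distributional fix described above.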
Collinearity and Multicollinearity

When there is a linear relationship among the regressors, the OLS coefficients are not uniquely identified. This is not a problem if your goal is only to predict Y: multicollinearity will not affect the overall prediction of the regression model. If your goal is to understand how the individual RHS variables impact Y, however, multicollinearity is a big problem. One problem is that the individual p-values can be misleading: confidence intervals on the regression coefficients will be very wide. Essentially, what we are concerned about is the correlation amongst regressors, for instance, $X_1$ and $X_2$:

(43) $r_{12} = \frac{\sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum (X_{1i} - \bar{X}_1)^2 \sum (X_{2i} - \bar{X}_2)^2}}$.

This is of course just a simple adjustment to the Pearson's r equation (Equation 7). Equation 43 deals just with the relationship between two variables, however, and we are often worried about a more complicated situation: one in which a given regressor is correlated with a combination of several, or even all, of the other regressors in a model. (Note that this multicollinearity can exist even if there are no striking bivariate relationships between regressors.) Multicollinearity is perhaps most easily depicted as a regression model in which one X is regressed on all others. That is, for the regression model,

(44) $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \epsilon_i$,

we might be concerned that the following regression produces strong results:

(45) $X_{1i} = \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \epsilon_i$.

If $X_1$ is well predicted by $X_2$ through $X_4$, it will be very difficult to identify the slope (and error) for $X_1$ separately from the set of other slopes (and errors). (The slopes and errors for the other variables may be affected as well.) Variance inflation factors (VIFs) are one measure that can be used to detect multicollinearity.
Essentially, VIFs are a scaled version of the multiple correlation coefficient between variable j and the rest of the independent variables. Specifically,

(46) $VIF_j = \frac{1}{1 - R_j^2}$,

where $R_j^2$ would be based on results from a model as in Equation 45. If $R_j^2$ equals zero (i.e., no correlation between $X_j$ and the remaining independent
variables), then $VIF_j$ equals 1. This is the minimum value. As $R_j^2$ increases, however, the denominator of Equation 46 decreases, and the estimated VIF rises as a consequence. A value greater than 10 represents a pretty big multicollinearity problem. VIFs tell us how much the variance of the estimated regression coefficient is 'inflated' by the existence of correlation among the predictor variables in the model. The square root of the VIF actually tells us how much the standard error is inflated. [A table drawn from the Sage volume by Fox, "Coefficient Variance Inflation as a Function of Inter-Regressor Multiple Correlation," showing $R_j^2$, the corresponding VIF, and the resulting inflation of $SE_{\beta_j}$, is omitted in this version.] Ways of dealing with multicollinearity include (a) dropping variables, (b) combining multiple collinear variables into a single measure, and/or (c) if collinearity is only moderate, and all variables are of substantive importance to the model, simply interpreting coefficients and standard errors taking into account the effects of multicollinearity.
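Equation 46 is trivial to compute once you have $R_j^2$. In the simplest case, where there is only one other regressor, $R_j^2$ from the auxiliary regression in Equation 45 is just the squared bivariate correlation, so the VIF can be sketched in a few lines of Python (invented data in the checks; a real application would regress $X_j$ on all remaining regressors):

```python
def vif_two(x1, x2):
    """VIF for X1 when the only other regressor is X2 (Equation 46).
    With one other regressor, R^2_j is the squared Pearson correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 ** 2 / (s11 * s22)      # squared correlation = R^2_j here
    return 1 / (1 - r2)
```

Uncorrelated regressors give the minimum VIF of 1; a near-duplicate regressor drives the VIF far past the rule-of-thumb threshold of 10.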
Heteroskedasticity

Heteroskedasticity refers to unequal variance in the regression errors. Note that there can be heteroskedasticity relating to the effect of individual independent variables, and also heteroskedasticity related to the combined effect of all independent variables. (In addition, there can be heteroskedasticity in terms of unequal variance over time.) The following figure portrays the standard case of heteroskedasticity, where the variance in Y (and thus the regression error as well) is systematically related to values of X. [Figure omitted: a scatterplot in which the spread of Y widens as X increases.] The difficulty here is that the error of the slope will be poorly estimated: it will over-estimate the error at small values of X, and under-estimate the error at large values of X. Diagnosing heteroskedasticity is often easiest by looking at a plot of errors ($\epsilon_i$) by values of the dependent variable ($Y_i$). Basically, we begin with the standard bivariate model of $Y_i$,

(47) $Y_i = \alpha + \beta X_i + \epsilon_i$,

and then plot the resulting values of $\epsilon_i$ by $Y_i$. If we did so for the data in the preceding figure, then the resulting residuals plot would look as follows:
[Residuals plot omitted.] As $Y_i$ increases here, so too does the variance in $\epsilon_i$. There are of course other possible (heteroskedastic) relationships between $Y_i$ and $\epsilon_i$; for instance, the variance may be much greater in the middle of the range. Any version of heteroskedasticity presents problems for OLS models. When the sample size is relatively small, these diagnostic graphs are probably the best means of identifying heteroskedasticity. When the sample size is large, there are too many dots on the graph to distinguish what's going on. There are several tests for heteroskedasticity, however. The Breusch-Pagan test tests for a relationship between the error and the independent variables. It starts with a standard multivariate regression model,

(48) $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \epsilon_i$,

and then substitutes the estimated errors, squared, for the dependent variable:

(49) $\hat{\epsilon}_i^2 = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \nu_i$.

We then use a standard F-test to test the joint significance of the coefficients in Equation 49. If they are significant, there is some kind of systematic relationship between the independent variables and the error.
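The core of the Breusch-Pagan procedure, the auxiliary regression in Equation 49, can be sketched for the bivariate case in a few lines. This is a simplified illustration only: it reports the auxiliary regression's $R^2$ (how much of the squared-residual variation X explains) rather than the F or LM statistic a full test would compute, and the data in the checks are fabricated:

```python
def ols(x, y):
    """Bivariate OLS constant and slope (Equations 13 and 9)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

def breusch_pagan_r2(x, y):
    """Auxiliary R^2 for a bivariate Breusch-Pagan check (Equation 49):
    regress squared residuals on X. High values suggest the error variance
    is systematically related to X (heteroskedasticity)."""
    a, b = ols(x, y)
    e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]  # squared residuals
    aa, bb = ols(x, e2)                                      # Equation 49
    me = sum(e2) / len(e2)
    tss = sum((v - me) ** 2 for v in e2)
    ess = sum((v - (aa + bb * xi)) ** 2 for xi, v in zip(x, e2))
    return 1 - ess / tss if tss > 0 else 0.0
```

With errors whose spread grows with X, the auxiliary $R^2$ is large; with constant-spread errors, it is near zero.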
Outliers

Recall that OLS regression pays particularly close attention to avoiding large errors. It follows that outliers (cases that are unusual) can have a particularly large effect on an estimated regression slope. Consider the following two possibilities, where a single outlier has a huge effect on the estimated slope:

[figures omitted: two scatterplots, each with a single outlying case pulling the fitted line away from the bulk of the data]

Hat values (h_i) are the common measure of leverage in a regression. It is possible to express the fitted values (Ŷ_j) in terms of the observed values (Y_i):

(50) Ŷ_j = h_1j·Y_1 + h_2j·Y_2 + ... + h_nj·Y_n = Σ_{i=1}^{n} h_ij·Y_i.

The coefficient, or weight, h_ij captures the contribution of each observation Y_i to the fitted value Ŷ_j. Outlying cases can usually not be discovered by looking at residuals; OLS estimation tries, after all, to minimize the error for high-leverage cases. In fact, the variance in residuals is in part a function of leverage,

(51) V(E_i) = σ²(1 − h_i).

The greater the hat value in Equation 51, the lower the variance. How can we identify high-leverage cases? Sometimes, simply plotting the data can be very helpful. Also, we can look closely at residuals. Start with the formula for standardized residuals, as follows,

(52) E′_i = E_i / (S_E·√(1 − h_i)),

which simply expresses each residual as a number (or increment) of standard deviations in E_i. The problem with Equation 52 is that case i is included in the
estimation of the variance; what we really want is a sense of how case i looks in relation to the variance in all other cases. This is the studentized residual,

(53) E*_i = E_i / (S_E(−i)·√(1 − h_i)),

and it provides a good indication of just how far out a given case is in relation to all other cases. (To test significance, the statistic follows a t-distribution with N−K−2 degrees of freedom.)

Note that you can estimate studentized residuals in a quite different way (though with the same results). Start by defining a variable D, equal to 1 for case i and equal to 0 for all other cases. Now, for a multivariate regression model as follows:

(54) Y_i = α + β_1 X_1 + β_2 X_2 + ... + β_k X_k + ε_i,

add variable D and estimate,

(55) Y_i = α + β_1 X_1 + β_2 X_2 + ... + β_k X_k + γD_i + ε_i.

This is referred to as a mean-shift outlier model, and the t-statistic for γ provides a test equivalent to the studentized residual.

What do we do if we have outliers? That depends. If there are reasons to believe the case is abnormal, then sometimes it's best just to drop it from the dataset. If you believe the case is correct, or justifiable, in spite of the fact that it's an outlier, then you may choose to keep it in the model. At a minimum, you will want to test your model with and without the outlier, to explore the extent to which your results are driven by a single case (or, in the case of several outliers, a small number of cases).
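For the bivariate case, hat values and studentized residuals can be computed directly. A pure-Python sketch with hypothetical data (ten well-behaved cases plus one gross outlier); the deletion identity used to compute S_E(−i) without literally refitting is the standard one:

```python
import math

# Hypothetical data: ten cases close to y = 2x, plus a high-leverage outlier.
x = list(range(1, 11)) + [30]
y = [2 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x[:-1])] + [20.0]

n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Bivariate hat values: h_i = 1/n + (x_i - xbar)^2 / Sxx
h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Full-sample OLS fit and residuals
ybar = sum(y) / n
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
alpha = ybar - beta * xbar
e = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

# Externally studentized residuals (Eq. 53): S_E(-i) leaves case i out,
# computed here via the standard deletion identity rather than n refits.
s2 = sum(ei ** 2 for ei in e) / (n - 2)
t = []
for ei, hi in zip(e, h):
    s2_minus_i = ((n - 2) * s2 - ei ** 2 / (1 - hi)) / (n - 3)
    t.append(ei / math.sqrt(s2_minus_i * (1 - hi)))

worst = max(range(n), key=lambda i: abs(t[i]))
print(worst, round(h[worst], 2), round(t[worst], 1))
```

Note that the outlier's raw residual is unremarkable, because the fitted line chases the high-leverage case; only its hat value and studentized residual give it away.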
Models for dichotomous data

Linear Probability Models

Let's begin with a simple definition of our binary dependent variable. We have a variable, Y_i, which only takes on the values 0 or 1. We want to predict when Y_i is equal to 0, or 1; put differently, we want to know for each individual case i the probability that Y_i is equal to 1, given X_i. More formally,

(56) E(Y_i) = Pr(Y_i = 1 | X_i),

which states that the expected value of Y_i is equal to the probability that Y_i is equal to one, given X_i. Now, a linear probability model simply estimates Pr(Y_i = 1) in the same way as we would estimate an interval-level Y_i:

(57) Pr(Y_i = 1) = α + βX_i.

There are two difficulties with this kind of model. First, while the estimated slope coefficients are unbiased, the standard errors are incorrect due to heteroskedasticity (errors increase in the middle range, first negative, then positive). Graphing the data with a regular linear regression line, for instance, would look something like this:

[figure omitted: binary Y_i plotted against X_i, with a linear regression line running below 0 and above 1 at the extremes of X_i]

The second problem with the linear probability model is that it will generate predictions that are greater than 1 and/or less than 0 (as shown in the preceding figure), even though these are nonsensical where probabilities are concerned. As a consequence, it is desirable to transform either the LHS or RHS of the model so predictions are both realistic and efficient.
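The boundary problem is easy to demonstrate. A small pure-Python sketch with hypothetical data: a binary outcome fit by OLS, producing fitted "probabilities" below 0 and above 1 at the extremes of X.

```python
# Hypothetical data: a binary outcome that switches from 0 to 1 as x crosses zero.
x = list(range(-10, 11))
y = [0 if xi < 0 else 1 for xi in x]

# Fit the linear probability model Pr(Y=1) = a + b*x by OLS (Eq. 57).
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

pred = [a + b * xi for xi in x]
print(round(min(pred), 3), round(max(pred), 3))
```

The minimum fitted value is negative and the maximum exceeds one, exactly the nonsensical predictions described above.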
Nonlinear Probability Model: Logistic Regression

One option is to transform Y_i, to develop a nonlinear probability model. To extend the range beyond 0 to 1, we first transform the probability into the odds,

(58) Pr(Y_i = 1 | X_i) / Pr(Y_i = 0 | X_i) = Pr(Y_i = 1 | X_i) / (1 − Pr(Y_i = 1 | X_i)),

which indicate how often something happens relative to how often it does not, and range from 0 to infinity as Pr(Y_i = 1 | X_i) approaches 1. We then take the log of this to get,

(59) ln( Pr(Y_i = 1 | X_i) / (1 − Pr(Y_i = 1 | X_i)) ),

or more simply,

(60) ln( p_i / (1 − p_i) ),

where,

(61) p_i = Pr(Y_i = 1 | X_i).

The model in Equation 60 then captures the log odds that something will happen. By taking the log, we've effectively stretched out the ends of the 0 to 1 range, and consequently have a comparatively unconstrained dependent variable that can be used without difficulty in a linear model, where

(62) ln( p_i / (1 − p_i) ) = βX_i.

Just to make clear the effects of our transformation, here's what taking the log odds of a simple probability looks like:
Probability   Odds              Logit
.01           1/99  = .0101     −4.60
.05           5/95  = .0526     −2.94
.10           10/90 = .1111     −2.20
.30           30/70 = .4286     −.85
.50           50/50 = 1         0
.70           70/30 = 2.333     .85
.90           90/10 = 9         2.20
.95           95/5  = 19        2.94
.99           99/1  = 99        4.60

Note that there is another way of representing a logit model, essentially the inverse (un-logging of both sides) of Equation 62:

(63) Pr(Y_i = 1 | X_i) = exp(βX_i) / (1 + exp(βX_i)).

Just to be clear, we can work our way backwards from Equation 63 to Equation 62 as follows:

(64) Pr(Y_i = 1 | X_i) = exp(βX_i) / (1 + exp(βX_i)), and
     Pr(Y_i = 0 | X_i) = 1 / (1 + exp(βX_i)), or 1 − Pr(Y_i = 1 | X_i).

So,

(65) p_i / (1 − p_i) = [ exp(βX_i) / (1 + exp(βX_i)) ] / [ 1 / (1 + exp(βX_i)) ] = exp(βX_i),

and,

(66) p_i / (1 − p_i) = exp(βX_i),

which when logging both sides becomes,

(67) ln( p_i / (1 − p_i) ) = βX_i.
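The columns of the table can be reproduced directly. A quick sketch of the transformation and its inverse (Equation 63):

```python
import math

def logit(p):
    """Log odds of a probability (Eq. 60)."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse transformation (Eq. 63): maps any real number back into (0, 1)."""
    return math.exp(z) / (1 + math.exp(z))

for p in [0.01, 0.05, 0.10, 0.30, 0.50, 0.70, 0.90, 0.95, 0.99]:
    odds = p / (1 - p)
    print(f"{p:.2f}  odds={odds:8.4f}  logit={logit(p):6.2f}")
```

Note the symmetry around p = .50 (where the logit is exactly zero), and that the inverse transformation recovers each probability.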
The notation in Equation 62 is perhaps the most useful in connecting logistic with probit and other non-linear estimations for binary data. The logit transformation is just one possible transformation that effectively maps the linear prediction into the 0 to 1 interval, allowing us to retain the fundamentally linear structure of the model while at the same time avoiding the contradiction of probabilities below 0 or above 1. Many cumulative density functions (CDFs) will meet this requirement. (Note that CDFs define the probability mass to the left of a given value of X; they are closely related to PDFs, a CDF being the integral of a PDF, and PDFs are dealt with in more detail in the section on significance tests.)

Equation 63 is in contrast useful for thinking about the logit model as just one example of transformations in which Pr(Y_i = 1) is a function of a non-linear transformation of the RHS variables, based on any number of CDFs. A more general version of Equation 63 is, then,

(68) Pr(Y_i = 1 | X_i) = F(βX_i),

where F is the logistic CDF for the logit model, as follows,

(69) Pr(Y_i = 1 | X_i) = F(βX_i), where F(x) = 1 / (1 + exp(−(x − µ)/s)),

but F could just as easily be the normal CDF for the probit model, or a variety of other CDFs. How do we know which CDF to use? The CDF we choose should reflect our beliefs about the distribution of Y_i, or, alternatively (and equivalently), the distribution of error in Y_i. We discuss this more below.

An Alternative Description: The Latent Variable Model

Another way to draw the link between logistic and regular regression is through the latent variable model, which posits that there is an unobserved, latent variable Y*_i, where

(70) Y*_i = βX_i + ε_i,

and the link between the observed binary Y_i and the latent Y*_i is as follows:

(71) Y_i = 1 if Y*_i > 0, and

(72) Y_i = 0 if Y*_i ≤ 0.
Using this example, the relationship between the observed binary Y_i and the latent Y*_i can be graphed as follows:
[figure omitted: latent Y*_i plotted against X_i, with the regression line crossing zero and an error distribution drawn around the line]

So, at any given value of X_i there is a given probability that Y*_i is greater than zero. This figure also shows how our beliefs about the distribution of error (ε_i) are fundamental: there is a distribution of possible outcomes in Y*_i when, in this figure, X_i = 4. For a probit model, we assume that Var(ε_i) = 1; for a logit model, we assume that Var(ε_i) = π²/3. Other CDFs make other assumptions.

The distribution of error (ε_i) at any given value of X_i is related to a non-linear increase in the probability that Y_i = 1. Indeed, we can show this non-linear shift first by plotting a distribution of ε_i at each value of X_i, and then by looking at how the movement of this distribution across the zero line shifts the probability that Y_i = 1:
[figure omitted: error distributions at increasing values of X_i, with the probability mass above zero growing non-linearly]

As the thick part of the distribution moves across the zero line, the probability increases dramatically.

Nonlinear Probability Model: Probit Regression

As noted above, probit models are based on the same logic as logistic models. Again, they can be thought of as a non-linear transformation of the LHS or RHS variables. The only difference for probit models is that rather than assume a logistic distribution, we assume a normal one. In Equation 68, then, F would now be the cumulative density function for a normal distribution.

Why assume a normal distribution? The critical question is: why assume a logistic one? We typically assume a logistic distribution because it is very close to normal, and estimating a logistic model is computationally much easier than estimating a probit model. We now have faster computers, so there is now less reason to rely on logit rather than probit models. That said, logit has some advantages where teaching is concerned. Compared to probit, it's very simple.
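One way to see how little rides on the logit/probit choice: rescale the logistic CDF to unit variance (so it is comparable to the standard normal) and compare the two curves. A pure-Python sketch:

```python
import math

def logistic_cdf(z, s=1.0):
    """Logistic CDF with location 0 and scale s (variance s^2 * pi^2 / 3)."""
    return 1 / (1 + math.exp(-z / s))

def normal_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Rescale the logistic so its variance matches the standard normal's
# (s = sqrt(3)/pi gives variance 1), then compare the two curves on a grid.
s = math.sqrt(3) / math.pi
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(logistic_cdf(z, s) - normal_cdf(z)) for z in grid)
print(round(max_gap, 3))
```

The maximum vertical gap between the two CDFs is only about two percentage points, which is why logit and probit results are nearly always substantively indistinguishable.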
Maximum Likelihood Estimation

Models for categorical variables are not estimated using OLS, but using maximum likelihood. ML estimates are the values of the parameters that have the greatest likelihood (that is, the maximum likelihood) of generating the observed sample of data, if the assumptions of the model are true. For a simple model like Y_i = α + βX_i, an ML estimation looks at many different possible values of α and β, and finds the combination which is most likely to have generated the observed values of Y_i.

[figure omitted: observed values of Y_i on the horizontal axis, with two candidate probability distributions, A and B, overlaid]

Take, for instance, the above graph, which shows the observed values of Y_i on the bottom axis. There are two different probability distributions, one produced by one set of parameters, A, and one produced by another set of parameters, B. MLE asks which distribution seems more likely to have produced the observed data. Here, it looks like the B parameters have an estimated distribution more likely to produce the observed data.

Alternatively, consider the following. If we are interested in the probability that Y_i = 1, given a certain set of parameters (p), then an ML estimation is interested in the likelihood of p given the observed data,

(73) L(p | Y_i).

This is a likelihood function. Finding the best set of parameters is an iterative process, which starts somewhere and starts searching; different optimization algorithms may start in slightly different places, and conduct the search differently; all base their decision about searching for parameters on the rate of improvement in the model. (The way in which model fit is judged is addressed below.)
Note that our being vague about "parameters" here is purposeful. As analysts, the parameters we are thinking about are the coefficients for the various independent variables (βX). The parameters critical to the ML estimation, however, are those that define the shape of the distribution; for a normal distribution, for instance, these are the mean (µ) and variance (σ²) (see Equation 24). Every set of parameters, βX, however, produces a given estimated normal distribution of Y_i with mean µ and variance σ²; the ML estimation tries to find the βX producing the distribution most likely to have generated our observed data.

Note also that while we speak about ML estimations maximizing the likelihood equation, in practice programs maximize the log of the likelihood, which simplifies computations considerably (and gives the same results). Because the likelihood is always between 0 and 1, the log likelihood is always negative. We can see this in the iteration log in STATA logit estimates, for instance.
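A deliberately crude sketch of the ML idea, using a hypothetical sample and a grid search over candidate means for a normal distribution with known variance. Real estimators use smarter iterative algorithms, but the logic of "pick the parameters most likely to have generated the observed data" is the same:

```python
import math

# Hypothetical sample of Y_i, treated as draws from a normal distribution
# with unknown mean mu and known variance 1.
y = [4.2, 5.1, 3.8, 4.9, 5.4, 4.4, 5.0, 4.6]

def log_likelihood(mu):
    """Sum of normal log-densities: negative here, since each density < 1."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - mu) ** 2 for yi in y)

# Crude "search": evaluate the log likelihood over a grid of candidate means
# and keep the value most likely to have generated the data.
grid = [i / 100 for i in range(300, 701)]  # candidate mu from 3.00 to 7.00
best_mu = max(grid, key=log_likelihood)
print(best_mu, round(log_likelihood(best_mu), 2))
```

The grid maximizer lands (up to grid resolution) on the sample mean, which is indeed the analytic ML estimate of a normal mean, and the maximized log likelihood is negative, as described above.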
Interpretation & Goodness of Fit Measures for Categorical Models

Indeed, the -2 log likelihood is the measure of model fit for most categorical models. It is as follows,

(74) −2(LL_B − LL_A),

where LL_A is the log likelihood of finding our sample of Y_i in a distribution produced by our parameterized model, and LL_B is the log likelihood of finding our sample of Y_i in the distribution produced when all parameters are restricted to 0. Essentially, then, we're looking at the total improvement in the model's predictive power: the difference between our model and no model (save for a distributional assumption). Multiplying this difference by −2 has the (albeit mysterious) advantage of producing a statistic that is asymptotically χ²-distributed. There are various versions of a pseudo-R² for categorical models, usually based on some manipulation of the -2 log likelihood.

To interpret individual coefficients resulting from a categorical model, we usually transform them into odds ratios (from log-odds ratios, which are not readily interpretable). This transformation is relatively simple. Recall that one version of the logit model is as follows,

(75) ln( p_i / (1 − p_i) ) = βX_i.

This is the log odds ratio, of course, equivalent to the following,

(76) p_i / (1 − p_i) = exp(βX_i).

This transformation of coefficients produces odds ratios, where each exponentiated coefficient now expresses the multiplicative change in the odds that Y_i is equal to 1 (rather than 0) associated with a one-unit increase in X_i. (There are equivalent transformations for probit coefficients.)
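Putting the pieces together, here is a pure-Python sketch on simulated (hypothetical) data: a bivariate logit fit by Newton-Raphson, the −2(LL_B − LL_A) fit statistic, and the odds ratio exp(β). One small liberty: the "no model" baseline here keeps an intercept (the sample proportion), as most software does, rather than restricting literally all parameters to zero.

```python
import math, random

# Simulated data: Pr(Y=1) follows a logit with true a = -1, b = 0.8.
rng = random.Random(618)
x = [rng.uniform(-3, 3) for _ in range(400)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(-1 + 0.8 * xi))) else 0 for xi in x]

def log_lik(a, b):
    """Binary log likelihood under the logit model (Eq. 63)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(a + b * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

# Newton-Raphson maximization of the log likelihood (intercept a, slope b).
a = b = 0.0
for _ in range(25):
    g_a = g_b = w = wx = wxx = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(a + b * xi)))
        g_a += yi - p                       # gradient, intercept
        g_b += xi * (yi - p)                # gradient, slope
        v = p * (1 - p)                     # information weights
        w += v; wx += v * xi; wxx += v * xi * xi
    det = w * wxx - wx * wx
    a += (wxx * g_a - wx * g_b) / det       # 2x2 inverse times gradient
    b += (-wx * g_a + w * g_b) / det

# Model fit: -2(LL_B - LL_A), with an intercept-only null model.
p0 = sum(y) / len(y)
ll_null = sum(yi * math.log(p0) + (1 - yi) * math.log(1 - p0) for yi in y)
chi2 = -2 * (ll_null - log_lik(a, b))
odds_ratio = math.exp(b)   # Eq. 76: multiplicative change in odds per unit x
print(round(b, 2), round(odds_ratio, 2), round(chi2, 1))
```

The estimated slope lands near the true 0.8, the likelihood-ratio statistic is far beyond any χ²(1) critical value, and exponentiating the slope gives its odds-ratio interpretation.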
Models for Categorical Data

Ordinal Outcomes

For models where the dependent variable is categorical, but ordered, ordered logit is the most appropriate modelling strategy. A typical description begins with a latent variable Y*_i, which is a function of

(77) Y*_i = βX_i + ε_i,

and a link between the observed ordinal Y_i and the latent Y*_i as follows:

(78) Y_i = 1 if Y*_i ≤ δ_1, and
     Y_i = 2 if δ_1 < Y*_i ≤ δ_2, and
     Y_i = 3 if Y*_i > δ_2,

where δ_1 and δ_2 are unknown threshold parameters to be estimated along with the β in Equation 77. We can restate the model, then, as follows:

(79) Pr(Y_i = 1 | X_i) = Pr(βX_i + ε_i ≤ δ_1) = Pr(ε_i ≤ δ_1 − βX_i), and
     Pr(Y_i = 2 | X_i) = Pr(δ_1 < βX_i + ε_i ≤ δ_2) = Pr(δ_1 − βX_i < ε_i ≤ δ_2 − βX_i), and
     Pr(Y_i = 3 | X_i) = Pr(βX_i + ε_i > δ_2) = Pr(ε_i > δ_2 − βX_i).

The last statement of each line here makes clear the importance that the distribution of error plays in the estimation: the probability of a given outcome can be expressed as the probability that the error is, in the first line for instance, smaller than the difference between δ and the estimated value. This set of statements can also be expressed as follows, adding hats to denote estimated values, substituting the predicted Ŷ_i for βX_i, and inserting a given cumulative distribution function, F, from which we derive our probability estimates:

(80) p̂_i1 = Pr(ε_i ≤ δ̂_1 − Ŷ_i) = F(δ̂_1 − Ŷ_i), and
     p̂_i2 = Pr(δ̂_1 − Ŷ_i < ε_i ≤ δ̂_2 − Ŷ_i) = F(δ̂_2 − Ŷ_i) − F(δ̂_1 − Ŷ_i), and
     p̂_i3 = Pr(ε_i > δ̂_2 − Ŷ_i) = 1 − F(δ̂_2 − Ŷ_i),

where F can again be the logistic CDF (for ordered logit), but also the normal CDF (for ordered probit), and so on. Again, using the logistic version as the
example is far easier, and we can express the whole system in another way, as follows:

(81) ln( p_1 / (1 − p_1) ) = δ_1 − βX_i, and
     ln( (p_1 + p_2) / (1 − p_1 − p_2) ) = δ_2 − βX_i, and
     ...
     ln( (p_1 + p_2 + ... + p_k) / (1 − p_1 − p_2 − ... − p_k) ) = δ_k − βX_i,

where each cumulative split has its own cut-point δ but shares the same β. Note that these models rest on the parallel slopes assumption: the slope coefficients do not vary between different categories of the dependent variable (i.e., from the first to second category, the second to third category, and so on). If this assumption is unreasonable, a multinomial model is more appropriate. (In fact, this assumption can be tested by fitting a multinomial model and examining differences and similarities in coefficients across categories.) And now, when we talk about odds ratios, we are talking about a shift in the odds of falling into a given category (m) or above,

(82) OR(m) = Pr(Y_i ≥ m) / Pr(Y_i < m).

Nominal Outcomes

Multinomial logit is essentially a series of logit regressions examining the probability that Y_i = m rather than Y_i = k, where k is a reference category. This means that one category of the dependent variable is set aside as the reference category, and all models show the probability of Y_i being one outcome rather than outcome k. Say, for instance, there are four outcomes: k, m, n, and q, where k is the reference category. The models estimated are:

(83) ln( Pr(Y_i = m) / Pr(Y_i = k) ) = β_m X, and
     ln( Pr(Y_i = n) / Pr(Y_i = k) ) = β_n X, and
     ln( Pr(Y_i = q) / Pr(Y_i = k) ) = β_q X.

These models explore the variables that distinguish each of m, n, and q from k. Any category can be the base category, of course.
It may be that it is additionally interesting to see how q is distinguished from the other categories, in which case the following models can be estimated:

(84) ln( Pr(Y_i = k) / Pr(Y_i = q) ) = β_k X, and
     ln( Pr(Y_i = m) / Pr(Y_i = q) ) = β_m X, and
     ln( Pr(Y_i = n) / Pr(Y_i = q) ) = β_n X.

Results for multinomial logit models aren't expressed as odds ratios, since odds ratios refer to the probability of an outcome divided by one minus that probability. Rather, multinomial results are expressed as a risk ratio, or relative risk, which is easily calculated by taking the exponential of the log risk-ratio, where the log risk-ratio is the estimated coefficient itself: the log of the probability of one outcome relative to the probability of the reference outcome.
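The probabilities implied by the system in Equation 83 can be sketched directly, with hypothetical coefficients: probabilities for all categories come from a softmax over the linear predictors, with the reference category's coefficient fixed at zero, and exponentiating a coefficient gives the relative risk ratio.

```python
import math

# Hypothetical multinomial-logit coefficients for outcomes m, n, q relative to
# reference category k (whose coefficient is fixed at zero), with a single x.
beta = {"k": 0.0, "m": 0.7, "n": -0.4, "q": 1.2}

def probs(x):
    """Category probabilities implied by Eq. 83: softmax over linear parts."""
    num = {c: math.exp(b * x) for c, b in beta.items()}
    total = sum(num.values())
    return {c: v / total for c, v in num.items()}

p = probs(x=1.0)
# The log risk-ratio relative to k recovers the coefficient (at x = 1)...
log_rr_m = math.log(p["m"] / p["k"])
# ...and exponentiating a coefficient gives the relative risk ratio per unit x.
rr_m = math.exp(beta["m"])
print(round(sum(p.values()), 6), round(log_rr_m, 3), round(rr_m, 3))
```

The category probabilities sum to one by construction, and the log of the ratio Pr(Y = m)/Pr(Y = k) reproduces β_m, which is the sense in which each equation in the system is "just" a binary logit against the reference category.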
Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur Module No. # 01 Lecture No. # 28 LOGIT and PROBIT Model Good afternoon, this is doctor Pradhan
More informationECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47
ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with
More informationSociology 593 Exam 2 Answer Key March 28, 2002
Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably
More informationChapte The McGraw-Hill Companies, Inc. All rights reserved.
12er12 Chapte Bivariate i Regression (Part 1) Bivariate Regression Visual Displays Begin the analysis of bivariate data (i.e., two variables) with a scatter plot. A scatter plot - displays each observed
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationChapter 16. Simple Linear Regression and Correlation
Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationRegression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.
Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if
More informationMATH CRASH COURSE GRA6020 SPRING 2012
MATH CRASH COURSE GRA6020 SPRING 2012 STEFFEN GRØNNEBERG Contents 1. Basic stuff concerning equations and functions 2 2. Sums, with the Greek letter Sigma (Σ) 3 2.1. Why sums are so important to us 3 2.2.
More informationOrdinary Least Squares Regression Explained: Vartanian
Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent
More informationISQS 5349 Final Exam, Spring 2017.
ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC
More informationDiscrete Dependent Variable Models
Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)
More informationECON 497 Midterm Spring
ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain
More informationAlgebra & Trig Review
Algebra & Trig Review 1 Algebra & Trig Review This review was originally written for my Calculus I class, but it should be accessible to anyone needing a review in some basic algebra and trig topics. The
More informationLECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity
LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists
More informationDo not copy, post, or distribute
14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares
Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit
More informationRockefeller College University at Albany
Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard
More informationGeneralized Linear Models
York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear
More informationLECTURE 15: SIMPLE LINEAR REGRESSION I
David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).
More informationFormal Statement of Simple Linear Regression Model
Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor
More informationGeneralized Models: Part 1
Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes
More informationGov 2000: 9. Regression with Two Independent Variables
Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics
More informationOverview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation
Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already
More informationChapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a
Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.
More informationHeteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.
Heteroskedasticity y i = β + β x i + β x i +... + β k x ki + e i where E(e i ) σ, non-constant variance. Common problem with samples over individuals. ê i e ˆi x k x k AREC-ECON 535 Lec F Suppose y i =
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationBiostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras
Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationSo far our focus has been on estimation of the parameter vector β in the. y = Xβ + u
Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationEstimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.
Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.
More informationAP Statistics. Chapter 6 Scatterplots, Association, and Correlation
AP Statistics Chapter 6 Scatterplots, Association, and Correlation Objectives: Scatterplots Association Outliers Response Variable Explanatory Variable Correlation Correlation Coefficient Lurking Variables
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationMultiple Regression Analysis
Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators
More informationHypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima
Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s
More informationLinear Regression 9/23/17. Simple linear regression. Advertising sales: Variance changes based on # of TVs. Advertising sales: Normal error?
Simple linear regression Linear Regression Nicole Beckage y " = β % + β ' x " + ε so y* " = β+ % + β+ ' x " Method to assess and evaluate the correlation between two (continuous) variables. The slope of
More informationReview of Multiple Regression
Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate
More informationECON3150/4150 Spring 2015
ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2
More informationExploratory Factor Analysis and Principal Component Analysis
Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationA Re-Introduction to General Linear Models
A Re-Introduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation
More informationMultiple Linear Regression
Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach
More informationRewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35
Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate
More information