Gov 2000: 9. Regression with Two Independent Variables
1 Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell (Harvard University) Fall 1 / 62
2 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics with Two Covariates 5. OLS Assumptions with Two Covariates 6. Omitted Variable Bias 7. Goodness of Fit & Multicollinearity 2 / 62
3 Where are we? Where are we going? [Figure: number of covariates in our regressions over time, showing last week] 3 / 62
4 Where are we? Where are we going? [Figure: number of covariates in our regressions, last week and this week] 4 / 62
5 Where are we? Where are we going? [Figure: number of covariates in our regressions, last week, this week, and next week] 5 / 62
6 1/ Why Add Variables to a Regression? 6 / 62
7 [Figure] 7 / 62
8 Berkeley gender bias Graduate admissions data from Berkeley, 1973 Acceptance rates: Men: 8442 applicants, 44% admission rate Women: 4321 applicants, 35% admission rate Evidence of discrimination toward women in admissions? This is a marginal relationship. What about the conditional relationship within departments? 8 / 62
9 Berkeley gender bias, II Within departments:

Dept | Men applied | Men admitted | Women applied | Women admitted
A    |             |           %  |               |            %
B    |             |           %  |           25  |          68%
C    |             |           %  |               |            %
D    |             |           %  |               |            %
E    |             |           %  |               |            %
F    |        373  |           6% |          341  |           7%

Within departments, women do somewhat better than men! Women apply to more challenging departments. The marginal relationship (admissions and gender) differs from the conditional relationship given a third variable (department). 9 / 62
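The reversal on this slide can be reproduced with a tiny made-up example. The department names and counts below are hypothetical, not the actual Berkeley figures: within every department women are admitted at a higher rate, yet the pooled (marginal) rate is lower for women, because women apply disproportionately to the selective department.

```python
depts = {
    # dept: (men_applied, men_admitted, women_applied, women_admitted)
    "easy": (80, 48, 20, 14),   # men 60%, women 70%
    "hard": (20, 4, 80, 24),    # men 20%, women 30%
}

# Women do better within each department...
for name, (ma, mx, wa, wx) in depts.items():
    assert wx / wa > mx / ma

# ...but the marginal rates point the other way.
men_rate = sum(v[1] for v in depts.values()) / sum(v[0] for v in depts.values())
women_rate = sum(v[3] for v in depts.values()) / sum(v[2] for v in depts.values())
print(men_rate, women_rate)  # 0.52 0.38: men look favored marginally
```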
10 Simpson's paradox [Figure: scatterplot of Y against X with two groups of points, Z = 0 and Z = 1] Overall a positive relationship between Y i and X i. 10 / 62
11 Simpson's paradox [Figure: the same scatterplot, with the within-group relationships drawn] Overall a positive relationship between Y i and X i. But within levels of Z i, the opposite. 11 / 62
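A minimal numeric sketch of the paradox in regression form, with invented points chosen so the within-group slopes are negative while the pooled slope is positive:

```python
def slope(x, y):
    """OLS slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

x0, y0 = [0, 1, 2], [0, -1, -2]   # Z = 0 group: within-group slope is -1
x1, y1 = [3, 4, 5], [3, 2, 1]     # Z = 1 group: within-group slope is -1
overall = slope(x0 + x1, y0 + y1)  # pooled slope is positive (about 0.54)
print(slope(x0, y0), slope(x1, y1), overall)
```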
12 Basic idea Old goal: estimate the mean of Y as a function of some independent variable, X: E[Y i | X i ]. For continuous X's, we modeled the CEF/regression function with a line: Y i = β 0 + β 1 X i + u i New goal: estimate the relationship of two variables, Y i and X i, conditional on a third variable, Z i : Y i = β 0 + β 1 X i + β 2 Z i + u i The β's are the population parameters we want to estimate. 12 / 62
13 Why control for another variable? Descriptive: get a sense for the relationships in the data. Conditional on the number of steps I've taken, do higher activity levels correlate with less weight? Predictive: we can usually make better predictions about the dependent variable with more information on independent variables. Causal: block potential confounding, which is when X doesn't cause Y, but only appears to because a third variable Z causally affects both of them. 13 / 62
14 Plan of attack 1. Interpretation with a binary Z i 2. Interpretation with a continuous Z i 3. Mechanics of OLS with 2 covariates 4. OLS assumptions with 2 covariates: Omitted variable bias Multicollinearity 14 / 62
15 What we won't cover in lecture 1. The OLS formulas for 2 covariates 2. Proofs 3. The second covariate being a function of the first: Z i = X i ^2 4. Hypothesis tests/confidence intervals (almost exactly the same as before) 15 / 62
16 2/ Adding a Binary Covariate 16 / 62
17 Example Non-African countries Log GDP per capita African countries Strength of Property Rights 17 / 62
18 Basics Ye olde model: E[Y i | X i ] = α 0 + α 1 X i (α 0, α 1 ) are the bivariate intercept/slope, e i is the bivariate error. Concern: AJR might be picking up an African effect : African countries might have low incomes and weak property rights. Condition on a country being in Africa or not to remove this: E[Y i | X i , Z i ] = β 0 + β 1 X i + β 2 Z i Z i = 1 to indicate that i is an African country, Z i = 0 to indicate that i is a non-African country. Effects are now within Africa or within non-Africa, not between. 18 / 62
19 AJR model

ajr.mod <- lm(logpgp95 ~ avexpr + africa, data = ajr)
summary(ajr.mod)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              <2e-16 ***
## avexpr                                   <2e-16 ***
## africa                                     e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: on 108 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared: 0.708, Adjusted R-squared:
## F-statistic: 131 on 2 and 108 DF, p-value: <2e-16

19 / 62
20 Two lines, one regression How can we interpret this model? Plug in two possible values for Z i and rearrange When Z i = 0: When Z i = 1: Y i = β 0 + β 1 X i + β 2 Z i = β 0 + β 1 X i + β 2 0 = β 0 + β 1 X i Y i = β 0 + β 1 X i + β 2 Z i = β 0 + β 1 X i + β 2 1 = ( β 0 + β 2 ) + β 1 X i Two different intercepts, same slope 20 / 62
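The two-lines-one-regression logic can be checked mechanically. The coefficient values below are made up for illustration (the slide's numeric estimates did not survive transcription): the gap between the two lines is always β 2, and the slope in X is β 1 in both groups.

```python
# Illustrative coefficients, not the AJR estimates
b0, b1, b2 = 5.5, 0.5, -0.75

def yhat(x, z):
    """Predicted Y from the single regression with a binary Z."""
    return b0 + b1 * x + b2 * z

for x in [0, 3, 7]:
    # Vertical gap between the Z = 1 and Z = 0 lines is b2 at every x
    assert yhat(x, 1) - yhat(x, 0) == b2
    # Slope in x is b1 in both groups
    assert yhat(x + 1, 0) - yhat(x, 0) == b1
    assert yhat(x + 1, 1) - yhat(x, 1) == b1
```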
21 Interpretation of the coefficients

                               Intercept for X i   Slope for X i
Non-African country (Z i = 0)  β 0                 β 1
African country (Z i = 1)      β 0 + β 2           β 1

In this example, we have: Ŷ i = β̂ 0 + β̂ 1 X i + β̂ 2 Z i β̂ 0 : the average log income for a non-African country (Z i = 0) with property rights measured at 0. β̂ 1 : a one-unit increase in property rights is associated with a β̂ 1 increase in average log income, comparing two African countries (or two non-African countries). β̂ 2 : the average difference in log income per capita between African and non-African countries, conditional on property rights. 21 / 62
22 General interpretation of the coefficients Y i = β 0 + β 1 X i + β 2 Z i β 0 : average value of Y i when both X i and Z i are equal to 0 β 1 : A 1-unit increase in X i is associated with a β 1 -unit change in Y i for units with the same value of Z i β 2 : average difference in Y i between Z i = 1 group and Z i = 0 group for units with the same value of X i 22 / 62
23 Adding a binary variable, visually [Figure: log GDP per capita vs. strength of property rights, with the fitted line for non-African countries; intercept β̂ 0, slope β̂ 1] 23 / 62
24 Adding a binary variable, visually [Figure: both fitted lines; the African-country line has intercept β̂ 0 + β̂ 2 and the same slope β̂ 1] 24 / 62
25 Marginal vs conditional [Figure: log GDP per capita vs. strength of property rights, comparing the marginal fit to the two conditional fits] 25 / 62
26 3/ Adding a Continuous Covariate 26 / 62
27 Adding a continuous variable Ye olde model: E[Y i | X i ] = α 0 + α 1 X i New concern: geography is confounding the effect: geography might affect political institutions, and geography might affect average incomes (through diseases like malaria). Condition on Z i : mean temperature in country i (continuous): E[Y i | X i , Z i ] = β 0 + β 1 X i + β 2 Z i 27 / 62
28 AJR model, revisited

ajr.mod2 <- lm(logpgp95 ~ avexpr + meantemp, data = ajr)
summary(ajr.mod2)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                e-12 ***
## avexpr                                     e-08 ***
## meantemp                                        **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: on 57 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.615, Adjusted R-squared:
## F-statistic: 45.6 on 2 and 57 DF, p-value: 1.48e

28 / 62
29 Interpretation with a continuous Z

             Intercept for X i   Slope for X i
Z i = 0°C    β 0                 β 1
Z i = 21°C   β 0 + β 2 · 21      β 1
Z i = 24°C   β 0 + β 2 · 24      β 1
Z i = 26°C   β 0 + β 2 · 26      β 1

In this example we have: Ŷ i = β̂ 0 + β̂ 1 X i − 0.06 Z i β̂ 0 : the average log income for a country with property rights measured at 0 and a mean temperature of 0. β̂ 1 : a one-unit increase in property rights is associated with a β̂ 1 change in average log income, conditional on a country's mean temperature. β̂ 2 : a one-degree increase in mean temperature is associated with a 0.06 decrease in average log income, conditional on strength of property rights. 29 / 62
30 General interpretation Y i = β 0 + β 1 X i + β 2 Z i The coefficient β 1 measures how the predicted outcome varies in X i for units with the same value of Z i. The coefficient β 2 measures how the predicted outcome varies in Z i for units with the same value of X i. 30 / 62
31 4/ OLS Mechanics with Two Covariates 31 / 62
32 Fitted values and residuals Where do we get our hats? β̂ 0 , β̂ 1 , β̂ 2 Fitted values for i = 1, , n: Ŷ i = Ê[Y i | X i , Z i ] = β̂ 0 + β̂ 1 X i + β̂ 2 Z i Residuals for i = 1, , n: û i = Y i − Ŷ i Minimize the sum of the squared residuals, just like before: ( β̂ 0 , β̂ 1 , β̂ 2 ) = arg min over (b 0 , b 1 , b 2 ) of Σ_{i=1}^n (Y i − b 0 − b 1 X i − b 2 Z i )² We'll derive closed-form estimators with arbitrary covariates next week. 32 / 62
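The minimization itself can be illustrated by brute force. The dataset and grid below are invented for illustration: on noise-free data generated as Y = 1 + 2X + 3Z, a coarse grid search over (b 0, b 1, b 2) bottoms out (SSR = 0) at exactly the generating values.

```python
xs = [0, 1, 0, 1, 2]
zs = [0, 0, 1, 1, 1]
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]  # [1, 3, 4, 6, 8]

def ssr(b0, b1, b2):
    """Sum of squared residuals at candidate coefficients."""
    return sum((y - b0 - b1 * x - b2 * z) ** 2
               for x, z, y in zip(xs, zs, ys))

grid = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
best = min((ssr(a, b, c), a, b, c)
           for a in grid for b in grid for c in grid)
print(best)  # (0.0, 1.0, 2.0, 3.0): zero SSR at the generating coefficients
```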
33 OLS estimator recipe using two steps No explicit OLS formulas this week, but a recipe instead. Partialling out OLS recipe: 1. Run a regression of X i on Z i : X̂ i = Ê[X i | Z i ] = δ̂ 0 + δ̂ 1 Z i 2. Calculate the residuals from this regression: r̂ xz,i = X i − X̂ i 3. Run a simple regression of Y i on the residuals r̂ xz,i : Ŷ i = α̂ 0 + α̂ 1 r̂ xz,i The estimate α̂ 1 will be equivalent to β̂ 1 from the big regression: Y i = β 0 + β 1 X i + β 2 Z i 33 / 62
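The recipe can be sketched in a few lines of plain Python. The data are made up; the benchmark is the standard closed-form two-regressor OLS slope, and the partialling-out slope matches it exactly.

```python
x = [1, 2, 3, 4, 5]
z = [2, 1, 4, 3, 5]
y = [2, 3, 7, 6, 9]
n = len(x)

def s(a, b):
    """Sum of centered cross-products."""
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Steps 1-2: regress X on Z, keep the residuals
d1 = s(x, z) / s(z, z)
d0 = sum(x) / n - d1 * sum(z) / n
r = [xi - d0 - d1 * zi for xi, zi in zip(x, z)]

# Step 3: simple regression of Y on the residuals
# (residuals have mean zero, so the slope is sum(r*y) / sum(r*r))
beta1_fwl = sum(ri * yi for ri, yi in zip(r, y)) / sum(ri * ri for ri in r)

# Closed-form slope on X from the two-covariate regression
beta1 = ((s(z, z) * s(x, y) - s(x, z) * s(z, y))
         / (s(x, x) * s(z, z) - s(x, z) ** 2))
assert abs(beta1_fwl - beta1) < 1e-9
```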
34 First regression Regress X i on Z i :

## when missing data exist, we need na.action in order
## to place residuals or fitted values back into the data
ajr.first <- lm(avexpr ~ meantemp, data = ajr, na.action = na.exclude)
summary(ajr.first)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             < 2e-16 ***
## meantemp                                        ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.32 on 58 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.241, Adjusted R-squared:
## F-statistic: 18.4 on 1 and 58 DF, p-value:

34 / 62
35 Regression of log income on the residuals Save the residuals:

## store the residuals
ajr$avexpr.res <- residuals(ajr.first)

Now we compare the estimated slopes:

coef(lm(logpgp95 ~ avexpr.res, data = ajr))
## (Intercept)  avexpr.res

coef(lm(logpgp95 ~ avexpr + meantemp, data = ajr))
## (Intercept)      avexpr    meantemp

35 / 62
36 Residual/partial regression plot We can plot the conditional relationship between property rights and income given temperature: [Figure: log GDP per capita against Residuals(Property Rights ~ Mean Temperature)] 36 / 62
37 5/ OLS Assumptions with Two Covariates 37 / 62
38 OLS assumptions for unbiasedness Simple regression assumptions for unbiasedness/consistency of OLS: 1. Linearity 2. Random/iid sample 3. Variation in X i 4. Zero conditional mean error: E[u i | X i ] = 0 Small modification to these with 2 covariates: 1. Linearity: Y i = β 0 + β 1 X i + β 2 Z i + u i 2. Random/iid sample 3. No perfect collinearity 4. Zero conditional mean error (both X i and Z i unrelated to u i ): E[u i | X i , Z i ] = 0 38 / 62
39 New assumption Assumption 3: No perfect collinearity. (1) No independent variable is constant in the sample, and (2) there are no exact linear relationships among the independent variables. Two components: 1. Both X i and Z i have to vary. 2. Z i cannot be a deterministic, linear function of X i . Part 2 rules out anything of the form: Z i = a + bX i What's the correlation between Z i and X i in that case? Exactly 1 (or −1 when b is negative)! 39 / 62
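A small numeric check (invented data) of what an exact linear relationship Z i = a + bX i does: the denominator of the two-covariate OLS slope formula is exactly zero, so separate slopes cannot be computed, and the X-Z correlation is exactly 1.

```python
import math

x = [1, 2, 3, 4]
z = [3 + 2 * xi for xi in x]  # Z = 3 + 2X: perfectly collinear
n = len(x)

def s(a, b):
    """Sum of centered cross-products."""
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Denominator of the two-covariate slope estimator: S_xx * S_zz - S_xz^2
denom = s(x, x) * s(z, z) - s(x, z) ** 2
corr = s(x, z) / math.sqrt(s(x, x) * s(z, z))
print(denom, corr)  # 0.0 1.0
```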
40 Perfect collinearity example Simple example: X i = 1 if a country is not in Africa and 0 otherwise; Z i = 1 if a country is in Africa and 0 otherwise. But clearly we have the following: Z i = 1 − X i These two variables are perfectly collinear. What about the following: X i = property rights, Z i = X i ^2. Do we have to worry about perfect collinearity here? No! While Z i is a deterministic function of X i , it is a nonlinear function of X i . 40 / 62
41 R and perfect collinearity R, Stata, et al. will drop one of the variables if there is perfect collinearity:

ajr$nonafrica <- 1 - ajr$africa
summary(lm(logpgp95 ~ africa + nonafrica, data = ajr))

## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             < 2e-16 ***
## africa                                     e-14 ***
## nonafrica         NA         NA      NA      NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: on 146 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared: 0.323, Adjusted R-squared:
## F-statistic: 69.7 on 1 and 146 DF, p-value: 4.87e-14

41 / 62
42 6/ Omitted Variable Bias 42 / 62
43 Unbiasedness revisited Long regression: Y i = β 0 + β 1 X i + β 2 Z i + u i Under Assumptions 1-4, OLS is unbiased for β 0 , β 1 , β 2 . What happens if we ignore Z i and just run the simple linear regression with X i alone? Short regression: Y i = α 0 + α 1 X i + u i OLS estimates from the short regression: ( α̂ 0 , α̂ 1 ) Question: will E[ α̂ 1 ] = β 1 ? If not, what will the difference be? 43 / 62
44 Deriving the short regression How can we relate α 1 to β 1 ? The short regression will be unbiased for the CEF of Y i given just X i . Write the short CEF in terms of the long regression model: E[Y i | X i ] = E[β 0 + β 1 X i + β 2 Z i + u i | X i ] = β 0 + β 1 X i + β 2 E[Z i | X i ] + E[u i | X i ] By Assumption 4, X i is unrelated to the long-regression error, so E[u i | X i ] = 0: E[Y i | X i ] = β 0 + β 1 X i + β 2 E[Z i | X i ] 44 / 62
45 Deriving the short regression E[Y i | X i ] = β 0 + β 1 X i + β 2 E[Z i | X i ] Let E[Z i | X i ] = γ 0 + γ 1 X i be the (population) CEF from a regression of Z i on X i . Then we can write the short CEF as: E[Y i | X i ] = β 0 + β 1 X i + β 2 (γ 0 + γ 1 X i ) = (β 0 + β 2 γ 0 ) + (β 1 + β 2 γ 1 )X i = α 0 + α 1 X i Under these assumptions, the short-regression OLS is unbiased for α 1 : E[ α̂ 1 ] = α 1 = β 1 + β 2 γ 1 45 / 62
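The identity α 1 = β 1 + β 2 γ 1 can be verified numerically. The coefficients and data below are invented, constructed so that the sample regression of Z on X recovers γ 1 exactly (the extra piece v sums to zero and is uncorrelated with X in the sample):

```python
b0, b1, b2 = 0.5, 1.0, 3.0   # long-regression coefficients (invented)
g0, g1 = 1.0, 2.0            # CEF of Z given X (invented)

x = [1, 2, 3, 4]
v = [1, -1, -1, 1]           # sums to 0 and is orthogonal to x
z = [g0 + g1 * xi + vi for xi, vi in zip(x, v)]
y = [b0 + b1 * xi + b2 * zi for xi, zi in zip(x, z)]

def slope(a, b):
    """OLS slope of b on a."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return (sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
            / sum((ai - ma) ** 2 for ai in a))

assert slope(x, z) == g1              # regression of Z on X gives gamma1
assert slope(x, y) == b1 + b2 * g1    # short slope = 1 + 3*2 = 7
```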
46 Omitted variable bias Omitted variable bias: the bias of the short-regression estimator for the long-regression coefficient when we omit Z i : Bias( α̂ 1 ) = E[ α̂ 1 ] − β 1 = β 2 γ 1 In other words, the omitted variable bias is: ( effect of Z i on Y i ) × ( effect of X i on Z i ), that is, (effect of omitted on outcome) × (effect of included on omitted). 46 / 62
47 Omitted variable bias, summary Remember that by OLS, the slope from regressing Z i on X i (the effect of X i on Z i ) is: γ 1 = cov(Z i , X i ) / var(X i ) We can summarize the direction of the bias like so:

           cov(X i , Z i ) > 0   cov(X i , Z i ) < 0   cov(X i , Z i ) = 0
β 2 > 0    Positive bias         Negative bias         No bias
β 2 < 0    Negative bias         Positive bias         No bias
β 2 = 0    No bias               No bias               No bias

Very relevant when Z i is unobserved for some reason! 47 / 62
48 Including irrelevant variables What if we do the opposite and include an irrelevant variable? What would it mean for Z i to be an irrelevant variable? Y i = β 0 + β 1 X i + 0 × Z i + u i So in this case, the true value of β 2 = 0. But under Assumptions 1-4, OLS is unbiased for all the parameters: E[ β̂ 0 ] = β 0 , E[ β̂ 1 ] = β 1 , E[ β̂ 2 ] = 0 Including an irrelevant variable will, however, increase the standard errors for β̂ 1 . 48 / 62
49 7/ Goodness of Fit & Multicollinearity 49 / 62
50 Prediction error How do we judge how well a regression fits the data? How much does X i help us predict Y i ? Prediction errors without X i : the best prediction is the mean, Ȳ. The prediction error is called the total sum of squares (SS tot ): SS tot = Σ_{i=1}^n (Y i − Ȳ)² Prediction errors with X i : the best predictions are the fitted values, Ŷ i . The prediction error is the sum of the squared residuals, SS res : SS res = Σ_{i=1}^n (Y i − Ŷ i )² 50 / 62
51 Total SS vs SSR [Figure: log GDP per capita vs. strength of property rights, showing the total prediction errors around the mean Ȳ] 51 / 62
52 Total SS vs SSR [Figure: the same plot, showing the residuals around the fitted values Ŷ i ] 52 / 62
53 R-squared Regression will always improve in-sample fit: SS tot ≥ SS res . How much better does using X i do? The coefficient of determination, or R²: R² = (SS tot − SS res ) / SS tot = 1 − SS res / SS tot R² = the fraction of the total prediction error eliminated by conditioning on X i . Common interpretation: R² is the fraction of the variation in Y i that is explained by X i . R² = 0 means no linear relationship; R² = 1 implies a perfect linear fit. 53 / 62
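R² computed straight from its definition, on a small invented dataset:

```python
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]
n = len(x)

# Fit the simple regression
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b0 = my - b1 * mx
fitted = [b0 + b1 * xi for xi in x]

# SS_res from the fitted values, SS_tot from the mean
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # 0.931
```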
54 Sampling variance for bivariate regression Under simple linear regression and homoskedasticity, the sampling variance of the slope was: V[ β̂ 1 | X] = σ²_u / Σ_{i=1}^n (X i − X̄)² = σ²_u / ((n − 1) S²_X) Factors affecting the standard errors: the error variance σ²_u (a higher conditional variance of Y i leads to bigger SEs); the sample variance of X i , S²_X (lower variation in X i leads to bigger SEs); the sample size n (a higher sample size leads to lower SEs). 54 / 62
55 Sampling variation with 2 covariates Regression with an additional independent variable: V[ β̂ 1 | X i , Z i ] = σ²_u / ((1 − R²_1)(n − 1) S²_X) Here, R²_1 is the R² from the regression of X i on Z i : X̂ i = δ̂ 0 + δ̂ 1 Z i Factors now affecting the standard errors: the error variance σ²_u ; the sample variance of X i , S²_X ; the sample size n; and the strength of the (linear) relationship between X i and Z i (stronger relationships mean a higher R²_1 and thus bigger SEs). 55 / 62
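The role of the (1 − R²_1) term can be seen by treating it as a variance inflation factor: holding σ²_u, n, and S²_X fixed, it is the ratio of V[β̂ 1] with collinearity to V[β̂ 1] without.

```python
def var_inflation(r2_1):
    """Multiplier on V[beta1-hat] induced by R-squared of X on Z."""
    return 1 / (1 - r2_1)

for r2_1 in [0.0, 0.5, 0.9, 0.99]:
    print(r2_1, var_inflation(r2_1))
# Standard errors scale with the square root: R2_1 = 0.99
# multiplies the SE of beta1-hat by about 10.
```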
56 Multicollinearity Definition: Multicollinearity is high, but not perfect, correlation between two independent variables in a regression. With multicollinearity, we'll have R²_1 ≈ 1, but not exactly 1. The stronger the relationship between X i and Z i , the closer R²_1 will be to 1, and the higher the SEs will be: V[ β̂ 1 | X i , Z i ] = σ²_u / ((1 − R²_1)(n − 1) S²_X) Given the symmetry, it will also increase var( β̂ 2 ) as well. 56 / 62
57 Intuition for multicollinearity Remember the OLS recipe: r̂ xz,i are the residuals from the regression of X i on Z i , and β̂ 1 comes from the regression of Y i on r̂ xz,i . Estimated coefficient: β̂ 1 = ĉov( r̂ xz,i , Y i ) / V̂( r̂ xz,i ) When Z i and X i have a strong relationship, the residuals will have low variation: we explain away a lot of the variation in X i through Z i . 57 / 62
58 Multicollinearity, visualized [Figure: X plotted against Z under a weak X-Z relationship and a strong X-Z relationship] 58 / 62
59 Multicollinearity, visualized [Figure: Y plotted against Residuals(X ~ Z) for the weak and strong X-Z cases; the residuals have much less spread in the strong case] 59 / 62
61 Effects of multicollinearity No effect on the bias of OLS; it only increases the standard errors. It is really just a sample-size problem: if X i and Z i are extremely highly correlated, you're going to need a much bigger sample to accurately differentiate between their effects. 61 / 62
62 Conclusion In this brave new world with 2 independent variables: 1. The β's have slightly different interpretations. 2. OLS still minimizes the sum of the squared residuals. 3. Adding or omitting variables in a regression can affect the bias and the variance of OLS. Remainder of class: 1. Regression in its most general glory (matrices) 2. How to diagnose and fix violations of the OLS assumptions 62 / 62
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationEmpirical Application of Simple Regression (Chapter 2)
Empirical Application of Simple Regression (Chapter 2) 1. The data file is House Data, which can be downloaded from my webpage. 2. Use stata menu File Import Excel Spreadsheet to read the data. Don t forget
More informationFinal Exam. Economics 835: Econometrics. Fall 2010
Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each
More informationEconometrics Problem Set 4
Econometrics Problem Set 4 WISE, Xiamen University Spring 2016-17 Conceptual Questions 1. This question refers to the estimated regressions in shown in Table 1 computed using data for 1988 from the CPS.
More informationThe Simple Regression Model. Simple Regression Model 1
The Simple Regression Model Simple Regression Model 1 Simple regression model: Objectives Given the model: - where y is earnings and x years of education - Or y is sales and x is spending in advertising
More informationChapter 2 The Simple Linear Regression Model: Specification and Estimation
Chapter The Simple Linear Regression Model: Specification and Estimation Page 1 Chapter Contents.1 An Economic Model. An Econometric Model.3 Estimating the Regression Parameters.4 Assessing the Least Squares
More informationECON Introductory Econometrics. Lecture 17: Experiments
ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationLecture 16 - Correlation and Regression
Lecture 16 - Correlation and Regression Statistics 102 Colin Rundel April 1, 2013 Modeling numerical variables Modeling numerical variables So far we have worked with single numerical and categorical variables,
More informationECON Introductory Econometrics. Lecture 13: Internal and external validity
ECON4150 - Introductory Econometrics Lecture 13: Internal and external validity Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 9 Lecture outline 2 Definitions of internal and external
More informationThe returns to schooling, ability bias, and regression
The returns to schooling, ability bias, and regression Jörn-Steffen Pischke LSE October 4, 2016 Pischke (LSE) Griliches 1977 October 4, 2016 1 / 44 Counterfactual outcomes Scholing for individual i is
More informationEstadística II Chapter 4: Simple linear regression
Estadística II Chapter 4: Simple linear regression Chapter 4. Simple linear regression Contents Objectives of the analysis. Model specification. Least Square Estimators (LSE): construction and properties
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More informationPath Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis
Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate
More informationLecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1
Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor
More informationSimple Regression Model (Assumptions)
Simple Regression Model (Assumptions) Lecture 18 Reading: Sections 18.1, 18., Logarithms in Regression Analysis with Asiaphoria, 19.6 19.8 (Optional: Normal probability plot pp. 607-8) 1 Height son, inches
More informationA Re-Introduction to General Linear Models (GLM)
A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationGov Multiple Random Variables
Gov 2000-4. Multiple Random Variables Matthew Blackwell September 29, 2015 Where are we? Where are we going? We described a formal way to talk about uncertain outcomes, probability. We ve talked about
More informationThe Classical Linear Regression Model
The Classical Linear Regression Model ME104: Linear Regression Analysis Kenneth Benoit August 14, 2012 CLRM: Basic Assumptions 1. Specification: Relationship between X and Y in the population is linear:
More informationImportant note: Transcripts are not substitutes for textbook assignments. 1
In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance
More informationEssential of Simple regression
Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship
More informationLab 07 Introduction to Econometrics
Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationIntroduction to Econometrics. Review of Probability & Statistics
1 Introduction to Econometrics Review of Probability & Statistics Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com Introduction 2 What is Econometrics? Econometrics consists of the application of mathematical
More informationLinear Regression. Junhui Qian. October 27, 2014
Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency
More informationMultiple Linear Regression. Chapter 12
13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.
More informationFNCE 926 Empirical Methods in CF
FNCE 926 Empirical Methods in CF Lecture 2 Linear Regression II Professor Todd Gormley Today's Agenda n Quick review n Finish discussion of linear regression q Hypothesis testing n n Standard errors Robustness,
More informationECON3150/4150 Spring 2015
ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2
More informationMultiple Regression Theory 2006 Samuel L. Baker
MULTIPLE REGRESSION THEORY 1 Multiple Regression Theory 2006 Samuel L. Baker Multiple regression is regression with two or more independent variables on the right-hand side of the equation. Use multiple
More informationMaking sense of Econometrics: Basics
Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/
More informationLECTURE 15: SIMPLE LINEAR REGRESSION I
David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).
More informationSimple, Marginal, and Interaction Effects in General Linear Models
Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means
More informationMultiple Regression Analysis
Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,
More informationLogistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015
Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article
More informationIntroduction to Econometrics. Multiple Regression
Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Multiple Regression Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci 1 OLS estimate of the TS/STR
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationSimple Regression Model. January 24, 2011
Simple Regression Model January 24, 2011 Outline Descriptive Analysis Causal Estimation Forecasting Regression Model We are actually going to derive the linear regression model in 3 very different ways
More informationReview of probability and statistics 1 / 31
Review of probability and statistics 1 / 31 2 / 31 Why? This chapter follows Stock and Watson (all graphs are from Stock and Watson). You may as well refer to the appendix in Wooldridge or any other introduction
More informationRewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35
Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate
More informationChapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression
BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between
More information