THE MULTIVARIATE LINEAR REGRESSION MODEL
- Samson Lawrence Lee
1 THE MULTIVARIATE LINEAR REGRESSION MODEL
2 Why multiple regression analysis?
Model with more than one independent variable: y = β0 + β1x1 + β2x2 + u
It allows:
- Controlling for other factors, to get a ceteris paribus effect. Ex: y: wage, x1: education, x2: IQ => IQ is no longer part of u => better job at inferring causality.
- Better predictions: more of the variation in y can be explained.
3 Why multiple regression analysis? (2)
It also allows:
- Estimating non-linear relationships. Ex: quadratic relationship between wage and experience:
wage = β0 + β1·exper + β2·exper² + u
Careful: no ceteris paribus interpretation here!
- Testing joint hypotheses on the parameters.
Key assumption: E(u | x1, x2) = 0
4 Example: Determinants of wage
Source: Wooldridge, WAGE1.dta (data from 1976 Current Population Survey)
Population model: wage = β0 + β1·educ + β2·exper + u
. use WAGE1.dta
. sum wage educ exper
[summary statistics: Obs, Mean, Std. Dev., Min, Max for wage, educ, exper]
. corr educ exper
(obs=526)
[correlation matrix for educ and exper]
5 Example: Determinants of wage (2)
. reg wage educ
[regression output: Number of obs = 526, F(1, 524), R-squared, coefficient table for educ and _cons]
. reg wage educ exper
[regression output: Number of obs = 526, F(2, 523), R-squared, coefficient table for educ, exper and _cons]
6 Example: Determinants of wage (3)
Interpretation: A one-year increase in education is predicted to increase hourly wage by 64 cents, ceteris paribus. An additional year of experience is predicted to increase wage by 7 cents, ceteris paribus.
Compared with the results of the bivariate model, we now obtain a higher estimate of the returns to education. We suspect the results of the bivariate case to be biased, since experience is correlated with education, and experience affects wage too. I.e., the zero conditional mean assumption was likely violated in the bivariate case. In other words: in the bivariate case, the impact of education accounted for the impact of experience as well. As the correlation between the two variables is negative, the estimate of the impact of education on wage was downward biased.
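As a quick numerical illustration of this bias direction (a Python/numpy sketch with made-up data, not the CPS data): when the omitted variable has a positive effect (β2 > 0) and is negatively correlated with the included one, the short regression understates β1.

```python
import numpy as np

# Illustrative simulation: true model y = 1 + 2*x1 + 3*x2,
# with x1 and x2 negatively (but not perfectly) correlated.
x1 = np.arange(10, dtype=float)              # plays the role of "education"
x2 = 9 - x1 + 0.5 * (np.arange(10) % 2)      # plays the role of "experience"
y = 1 + 2 * x1 + 3 * x2                      # no error term, for a clean illustration

# Long regression: y on (1, x1, x2) recovers the true coefficients.
X_long = np.column_stack([np.ones(10), x1, x2])
b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]

# Short regression: y on (1, x1) only -- x2 is omitted.
X_short = np.column_stack([np.ones(10), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

# With beta2 > 0 and Corr(x1, x2) < 0, the short-regression slope
# is biased downward relative to the true beta1 = 2.
print(b_long[1], b_short[1])
```

The short-regression slope comes out well below 2, mirroring the downward bias described above.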
7 Example: introducing quadratics
What if the impact of a variable is not constant?
wage = β0 + β1·exper + β2·exper² + u
Introducing quadratics allows us to:
- Model an increasing or decreasing effect of experience as experience increases:
wage-hat = β̂0 + β̂1·exper + β̂2·exper²
- Determine the turning point of the effect: exper* = |β̂1 / (2β̂2)|
8 . list exper* in 1/10
[first 10 observations of exper and expersq]
. reg wage exper*
[regression output: Number of obs = 526, F(2, 523), R-squared, coefficient table for exper, expersq and _cons]
Interpretation: For low levels of experience, wage is predicted to increase with experience, ceteris paribus. The negative sign on the squared term indicates, however, that as the number of years of experience increases, the returns to an additional year decrease. In fact we can calculate the turning point, i.e. the point where the marginal returns to experience are 0. This happens at .298/(2*.006), i.e. approximately at 25 years of experience.
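The turning-point arithmetic can be checked directly from the coefficients reported on the slide (.298 on exper, −.006 on expersq):

```python
# Turning point of wage-hat = b0 + b1*exper + b2*exper^2:
# d(wage-hat)/d(exper) = b1 + 2*b2*exper = 0  =>  exper* = -b1 / (2*b2).
b1, b2 = 0.298, -0.006          # coefficients reported on the slide
turning_point = -b1 / (2 * b2)
print(turning_point)            # roughly 24.8 years of experience
```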
9 Stata commands:
. scatter wage exper || qfit wage exper, name(multiple)
. scatter wage exper || lfit wage exper, name(simple)
. graph combine multiple simple, saving(simple_multiple)
(file simple_multiple.gph saved)
[two panels: wage and fitted values plotted against exper, with quadratic and linear fits]
10 The model with k independent variables
The general multiple linear regression model (also called the multiple regression model) can be written in the population as:
y = β0 + β1x1 + β2x2 + ... + βkxk + u
Notation: x1, x2, ..., xk are the independent variables, with k the number of independent variables, and xik the value of variable xk for observation i.
Key assumption: E(u | x1, x2, ..., xk) = 0
11 Deriving the OLS estimates
The estimated model is: ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk
We want to estimate β̂0, β̂1, ..., β̂k => k+1 OLS estimates.
Minimize the sum of squared residuals:
Min over (β̂0, β̂1, ..., β̂k) of Σi=1..n (yi − β̂0 − β̂1xi1 − ... − β̂kxik)²
=> First order conditions (using calculus, see Appendix 3A) give k+1 linear equations with k+1 unknowns: β̂0, β̂1, ..., β̂k.
12 Interpretation of OLS estimates
Estimated model: ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk (3.11)
How do we interpret β̂1, β̂2, ..., β̂k?
We can obtain from (3.11) the predicted change in y given changes in the xi:
Δŷ = β̂1Δx1 + β̂2Δx2 + ... + β̂kΔxk
The coefficient on x1 measures the change in ŷ due to a one-unit increase in x1, holding all other independent variables fixed. That is, if we hold x2, x3, ..., xk constant:
Δŷ = β̂1Δx1 => allows ceteris paribus estimation, even if data were not collected this way!!
13 OLS Fitted Values and Residuals
For observation i, the fitted value is simply: ŷi = β̂0 + β̂1xi1 + ... + β̂kxik
The actual value yi will not in general equal the predicted value. Residual: ûi = yi − ŷi
The fitted values and residuals have the same properties as in the simple regression case:
- The sample average of the residuals is zero.
- The sample covariance between each xi and the residuals is zero => between fitted values and residuals also.
- The point of averages (x̄1, ..., x̄k, ȳ) is always on the regression line.
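These algebraic properties can be verified numerically on any dataset; a small numpy sketch with arbitrary toy data:

```python
import numpy as np

# Arbitrary toy data; the residual properties hold by OLS algebra, not by luck.
x1 = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
x2 = np.array([1., 0., 3., 2., 5., 4., 7., 6.])
y = np.array([2., 1., 5., 7., 8., 12., 11., 17.])

X = np.column_stack([np.ones(8), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b
u_hat = y - y_hat

print(u_hat.mean())             # ~0: residuals average to zero
print(x1 @ u_hat, x2 @ u_hat)   # ~0: residuals orthogonal to each regressor
print(y_hat @ u_hat)            # ~0: hence orthogonal to the fitted values too
print(y_hat.mean(), y.mean())   # equal: the point of averages is on the line
```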
14 Simple vs. Multiple regression estimates
Simple regression model: ỹ = β̃0 + β̃1x1
Multiple regression model: ŷ = β̂0 + β̂1x1 + β̂2x2
β̃1 = β̂1 if:
- the partial effect of x2 is zero in the sample, or
- x1 and x2 are uncorrelated in the sample.
β̃1 ≈ β̂1 if:
- the partial effect of x2 is small in the sample, or
- x1 and x2 are weakly correlated in the sample.
15 How good is the estimation at explaining the dependent variable?
Measure of sample variation: Total Sum of Squares: SST = Σi=1..n (yi − ȳ)²
Part that is explained by x: Explained Sum of Squares: SSE = Σi=1..n (ŷi − ȳ)²
Part that is unexplained by x: Residual Sum of Squares: SSR = Σi=1..n ûi²
Just as in the simple regression case, SST = SSE + SSR.
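The decomposition SST = SSE + SSR (which holds whenever the regression includes an intercept) is easy to verify numerically; a small sketch with toy data:

```python
import numpy as np

# Toy bivariate data; the decomposition requires an intercept in the model.
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2., 1., 4., 3., 7., 6.])
X = np.column_stack([np.ones(6), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b

SST = np.sum((y - y.mean()) ** 2)       # total variation in y
SSE = np.sum((y_hat - y.mean()) ** 2)   # explained variation
SSR = np.sum((y - y_hat) ** 2)          # residual variation
print(SST, SSE + SSR)                   # equal up to rounding
```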
16 Goodness of fit: the R-squared
R² = SSE/SST = 1 − SSR/SST
R² is the proportion of the sample variation in yi that is explained by the OLS regression line.
R² lies between 0 and 1. A higher value indicates a better fit, but: R² never decreases, and it usually increases when another independent variable is added to a regression => a poor tool for deciding which model to choose. We will need another criterion to decide whether to include a variable.
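A sketch of why R² is a poor model-selection tool: adding a regressor can only raise it, even one chosen to be irrelevant (toy data, hypothetical "junk" variable):

```python
import numpy as np

def r_squared(X, y):
    """R^2 = 1 - SSR/SST for an OLS fit of y on X (X includes a constant)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    return 1 - (u @ u) / np.sum((y - y.mean()) ** 2)

x1 = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
junk = np.array([3., 1., 4., 1., 5., 9., 2., 6.])  # unrelated regressor
y = np.array([1., 3., 2., 5., 4., 7., 6., 9.])

r2_small = r_squared(np.column_stack([np.ones(8), x1]), y)
r2_big = r_squared(np.column_stack([np.ones(8), x1, junk]), y)
print(r2_small, r2_big)   # r2_big >= r2_small even though 'junk' is irrelevant
```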
17 Example: explaining arrest records
Population model: narr86 = β0 + β1·pcnv + β2·ptime86 + β3·qemp86 + u
First, we estimate the model without the variable avgsen. We obtain:
. use
. reg narr86 pcnv ptime86 qemp86
[regression output: Number of obs = 2725, F(3, 2721), R-squared, Root MSE = .8416, coefficient table for pcnv, ptime86, qemp86 and _cons]
18 So we obtain the estimated equation:
narr86-hat = β̂0 − .150·pcnv − .034·ptime86 + β̂3·qemp86
n = 2,725, R² = .0413
The three variables pcnv, ptime86, and qemp86 explain about 4.1 percent of the variation in narr86.
What happens if pcnv increases by .5 (i.e., by 50 percentage points)? Δnarr86-hat = −.150(.5) = −.075, so predicted arrests fall by .075.
What happens if ptime86 increases from 0 to 12? Predicted arrests for a particular man fall by .034(12) = .408.
What if we include avgsen in the model?
19 . reg narr86 avgsen pcnv ptime86 qemp86
[regression output: Number of obs = 2725, F(4, 2720), R-squared, coefficient table for avgsen, pcnv, ptime86, qemp86 and _cons]
R² increases from .0413 to .0422, a practically small effect. The sign of the coefficient on avgsen is also unexpected: a longer average sentence length increases criminal activity.
=> What should we conclude about the two models?
20 Unbiasedness of OLS
Remember the assumptions:
- Linearity (in parameters!): y = β0 + β1x1 + ... + βkxk + u
- Random sampling: yi = β0 + β1xi1 + ... + βkxik + ui, i = 1, 2, ..., n
- Zero conditional mean: E(u | x1, x2, ..., xk) = 0
- No perfect collinearity: in the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
Using all these assumptions we can prove the first important statistical property of OLS: unbiasedness.
E(β̂j) = βj, j = 1, 2, ..., k
21 Violations of zero conditional mean
The ZCM assumption will not be true if the functional relationship between the explained and explanatory variables is misspecified in the equation:
Ex 1: True model: cons = β0 + β1·inc + β2·inc² + u
Estimated model: cons = β0 + β1·inc + u
Ex 2: True model: log(wage) = β0 + β1·educ + u
Estimated model: wage = β0 + β1·educ + u
It will also fail if we omit a variable that is correlated with the included xj => endogeneity.
22 Violations of no perfect collinearity
The assumption is violated if there exist (a, b) such that x1 = a + b·x2:
- One variable can't be a constant multiple of another. (Ex: inc and inc² are ok, but log(inc) and log(inc²) are not ok, since log(inc²) = 2·log(inc).)
- One variable can't be the sum of some of the others.
- When variables are shares: can't include all the shares (together with a constant).
Practical note: Stata will not estimate models with perfect collinearity. Solution: drop one of the perfectly collinear variables!
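The shares case can be sketched in a few lines of numpy (toy share values, in the spirit of the vote-share example that follows): two shares summing to 100 are perfectly collinear with the constant, so the design matrix loses rank and X'X cannot be inverted.

```python
import numpy as np

# Toy shares: shareb = 100 - sharea, exactly.
sharea = np.array([30., 45., 50., 60., 75.])
shareb = 100 - sharea

# Design matrix with a constant and both shares.
X = np.column_stack([np.ones(5), sharea, shareb])

# Rank is 2, not 3: X'X is singular, so OLS has no unique solution
# (Stata reacts by dropping one of the collinear variables).
print(np.linalg.matrix_rank(X))
```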
23 The no-perfect-collinearity assumption can also fail if n < k+1: to estimate k+1 parameters, we need at least k+1 observations. Bad luck in collecting the sample.
24 Examples of perfect collinearity: Voting outcomes and campaign expenditures
Source: Wooldridge, VOTE1.dta (from M. Barone and G. Ujifusa, The Almanac of American Politics, Washington, DC: National Journal). Two-party races for the US House of Representatives.
. bcuse vote1
. ge shareb = 100 - sharea
Data description:
votea: percent vote for A
expenda: campaign expends. by A, $1000s
expendb: campaign expends. by B, $1000s
sharea: 100*(expendA/(expendA+expendB))
. su votea expenda expendb sharea shareb
[summary statistics: Obs, Mean, Std. Dev., Min, Max for each variable]
25 . reg votea sharea shareb
note: sharea omitted because of collinearity
[regression output: Number of obs = 173, F(1, 171), R-squared, coefficient table: sharea (omitted), shareb, _cons]
. reg votea shareb
[regression output: Number of obs = 173, coefficient table for shareb and _cons]
. reg votea sharea
[regression output: Number of obs = 173, coefficient table for sharea and _cons]
26 Interpretation: The variables sharea and shareb are perfectly collinear (sharea = 100 − shareb). Therefore they cannot both be used as independent variables in the regression. Stata will automatically drop one => the first two sets of estimates are the same.
ShareA as the only explanatory variable and shareB as the only explanatory variable yield the same results: increasing the share of expenditures of B by one percentage point (= a one percentage point decrease in the share of A) is predicted to decrease the share of votes for A by .46 percentage points, ceteris paribus.
27 Omitted variable bias
Let y = β0 + β1x1 + β2x2 + u be the true model, where all 4 assumptions are verified. When estimated, it gives: ŷ = β̂0 + β̂1x1 + β̂2x2
We want the effect of x1 on y. What happens if we regress y on x1 only? The estimated (underspecified) model then is: ỹ = β̃0 + β̃1x1
β̃1 is biased for β1: E(β̃1) = β1 + omitted variable bias, where the bias equals β2·δ̃1 and δ̃1 is the slope from regressing x2 on x1.
28 About the omitted variable bias:
Two cases when β̃1 is not biased:
- When β2 = 0, so that x2 does not appear in the true model.
- When δ̃1 = 0, i.e. if and only if x1 and x2 are uncorrelated in the sample.
Direction of the omitted variable bias (2-variable case):
              Corr(x1,x2) > 0    Corr(x1,x2) < 0
β2 > 0        Positive bias      Negative bias
β2 < 0        Negative bias      Positive bias
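The bias formula β̃1 = β1 + β2·δ̃1 can be verified numerically; a numpy sketch with toy data and no error term, so the identity holds exactly in-sample:

```python
import numpy as np

# True model y = beta0 + beta1*x1 + beta2*x2 (no error term).
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x1 = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
x2 = np.array([1., 3., 2., 5., 4., 7., 6., 9.])  # positively correlated with x1
y = beta0 + beta1 * x1 + beta2 * x2

# Short regression of y on x1 alone (x2 omitted):
Xs = np.column_stack([np.ones(8), x1])
b1_tilde = np.linalg.lstsq(Xs, y, rcond=None)[0][1]

# Auxiliary regression of the omitted x2 on x1 gives delta1:
delta1 = np.linalg.lstsq(Xs, x2, rcond=None)[0][1]

# Bias = beta2 * delta1; here Corr(x1,x2) > 0 and beta2 > 0 => positive bias.
print(b1_tilde, beta1 + beta2 * delta1)
```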
29 Example 3: Impact of IQ on the relationship between wage and education
Source: WAGE2.dta, Wooldridge (data used in M. Blackburn and D. Neumark (1992), Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials, Quarterly Journal of Economics 107).
. use WAGE2.dta
. su wage IQ educ
[summary statistics: Obs, Mean, Std. Dev., Min, Max for wage, IQ, educ]
30 . reg wage educ
[regression output: Number of obs = 935, F(1, 933), coefficient table for educ and _cons]
. reg wage educ IQ
[regression output: Number of obs = 935, F(2, 932), coefficient table for educ, IQ and _cons]
. corr educ IQ
(obs=935)
[correlation matrix for educ and IQ]
31 Interpretation: Intellectual ability is likely to affect both people's wage and their education. Therefore a simple regression of wage on education is likely to be biased, as intellectual ability will be included in the error term, resulting in a violation of the zero conditional mean assumption. If we use IQ (as a proxy for intellectual ability) we correct for this bias.
Given that IQ and education are positively correlated, and IQ and wage are also positively correlated, we suspect the coefficient in the bivariate model to be positively biased (i.e. overestimated). This is confirmed when we run the regression including IQ: the coefficient on education drops from 60 to 42. An increase in the IQ score of 1 is predicted to increase wage by $5 per month, ceteris paribus. Given that the correlation between education and IQ, and between IQ and wage, is strong, the bias in the bivariate model was large.
32 Omitted variable bias: multiple case
What happens with multiple regressors? Correlation between a single explanatory variable and the error generally results in all OLS estimators being biased. If the focus is on a particular explanatory variable, say x1, deriving the sign of the bias from the relationship between x1 and the key omitted factor alone, ignoring all other explanatory variables, is strictly valid only when the other regressors are uncorrelated with x1; still, it is a useful guide.
33 Including irrelevant variables
Overspecifying the model: one (or more) of the independent variables is included in the model even though it has no partial effect on y in the population (that is, its population coefficient is zero).
No bias (when the 4 assumptions hold), but not harmless: it has undesirable effects on the variances of the OLS estimators.
34 Back to the Broad Picture
We are interested in understanding the effect of a variable x on a variable y. We need a coefficient estimate, and we need to know its sign and magnitude. We also need to know how precise this estimate is => we need to find its variance.
4 assumptions give us unbiasedness of the coefficient estimates. We need one more assumption to obtain an unbiased estimate of the variance of the coefficient estimates, and to show that OLS is efficient.
35 5 Gauss-Markov assumptions: 4+1
- Linearity
- Random sampling
- Zero conditional mean
- No perfect collinearity
- Homoskedasticity: the variance of the error term, conditional on the explanatory variables, is constant: Var(u | x1, x2, ..., xk) = σ²
Under these conditions, the OLS estimate of the error variance is unbiased: E(σ̂²) = σ². We can derive a formula for the sampling variance of the OLS coefficients, and OLS is efficient (i.e. its variance is the smallest variance possible).
36 Sampling variance of the OLS coefficients
Under Assumptions 1 through 5, conditional on the sample values of the independent variables:
Var(β̂j) = σ² / (SSTj · (1 − Rj²)), for j = 1, 2, ..., k
where SSTj = Σi=1..n (xij − x̄j)² is the total variation in xj, and Rj² is the R-squared from regressing xj on all other independent variables.
Why should we care about its size?
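A numerical check of this formula (toy data): computed via SSTj and Rj², the variance of the slope on x1 matches the usual matrix expression σ²·[(X'X)⁻¹]jj.

```python
import numpy as np

sigma2 = 1.0   # assume a known error variance for the check
x1 = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
x2 = np.array([1., 3., 2., 5., 4., 7., 6., 9.])
X = np.column_stack([np.ones(8), x1, x2])

# R_1^2: R-squared from regressing x1 on the other regressors (constant and x2).
Z = np.column_stack([np.ones(8), x2])
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
resid = x1 - Z @ g
R1_sq = 1 - (resid @ resid) / np.sum((x1 - x1.mean()) ** 2)

SST1 = np.sum((x1 - x1.mean()) ** 2)
var_formula = sigma2 / (SST1 * (1 - R1_sq))          # slide formula
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]   # matrix formula
print(var_formula, var_matrix)                       # the two agree
```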
37 Unbiased estimator of σ²
We need an unbiased estimator of σ² to get an unbiased estimator of Var(β̂j).
σ² = E(u²) => a logical estimator would be (1/n)·Σ ui². Problem: the errors are not observable! But the residuals are. => use (1/n)·Σ ûi²?
An unbiased estimator of σ² is: σ̂² = (Σi=1..n ûi²) / (n − k − 1) = SSR / (n − k − 1)
Why n − k − 1? Degrees of freedom = number of observations − number of estimated parameters = n − (k+1).
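Unbiasedness of σ̂² can be illustrated with a small Monte Carlo sketch (simulated data with a known σ² = 1, not one of the slide datasets): averaging SSR/(n − k − 1) over many samples lands close to the truth, while dividing by n would be biased downward.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
x1 = np.linspace(0, 1, n)
x2 = np.linspace(0, 1, n) ** 2
X = np.column_stack([np.ones(n), x1, x2])

draws = []
for _ in range(2000):
    # True model with standard normal errors, so sigma^2 = 1.
    y = 1 + 2 * x1 - x2 + rng.standard_normal(n)
    u_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    draws.append((u_hat @ u_hat) / (n - k - 1))   # SSR / (n - k - 1)

sigma2_hat_avg = np.mean(draws)
print(sigma2_hat_avg)   # close to 1
```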
38 The estimate of βj is more precise when:
- σ² is lower: more noise in the equation (a larger σ²) makes it more difficult to estimate the partial effect of any of the xj on y. To reduce it, add relevant explanatory variables.
- The total variation in xj, SSTj, is larger: to increase it, increase the sample size. SSTj = 0 is ruled out by assumption 4.
- There is less correlation between the xj, i.e. Rj² is smaller. Two extreme cases: Rj² = 0 gives the smallest variance; Rj² = 1 is perfect collinearity (ruled out by assumption 4).
39 What if Rj² is close to 1? This is called multicollinearity. It does not violate assumption 4, but it is still a problem, as the variance of the estimator increases. How to reduce multicollinearity? Dropping a variable? How big an issue multicollinearity is depends on which variable is your focus.
40 Example of Multicollinearity: Relationship between education and family background
Source: WAGE2.dta, Wooldridge (data used in M. Blackburn and D. Neumark (1992), Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials, Quarterly Journal of Economics 107).
. use WAGE2.dta
. su educ sibs meduc feduc
[summary statistics: Obs, Mean, Std. Dev., Min, Max for educ, sibs, meduc, feduc]
In order to predict educational attainment, should we include all of these variables?
41 Omitted variable bias vs multicollinearity
What happens if we omit father's education?
. reg educ sibs meduc
[regression output: Number of obs = 857, F(2, 854), coefficient table for sibs, meduc and _cons]
These estimates are biased if feduc affects educ and is correlated with meduc and/or sibs.
. corr feduc meduc sibs
(obs=722)
[correlation matrix for feduc, meduc, sibs]
. corr educ feduc
(obs=741)
[correlation matrix for educ and feduc]
Corr(feduc, meduc) is high => there is also a problem of multicollinearity if both are in the model.
42 Should we still include the omitted variable?
. reg educ sibs meduc feduc
[regression output: Number of obs = 722, F(3, 718), coefficient table for sibs, meduc, feduc and _cons]
Because of the high correlation between meduc and feduc, the standard error of the coefficient on meduc increases substantially. But given that multicollinearity is not a violation of any assumption, while omitting feduc causes bias, we prefer the second estimation (with feduc) over the first.
43 Try to redefine the research question: create a third variable to sum up the information contained in the two variables meduc and feduc.
. gen avpareduc = (feduc + meduc)/2
(213 missing values generated)
. reg educ sibs avpareduc
[regression output: Number of obs = 722, F(2, 719), coefficient table for sibs, avpareduc and _cons]
Note: if x1 is uncorrelated with x2 and x3, and x1 is the variable of interest, then we do not really care whether x2 and x3 are correlated. So include x3: it will make a better case for causality, and only the variances of the estimators of the coefficients on x2 and x3 will increase.
44 Misspecification
Let y = β0 + β1x1 + β2x2 + u be the true model, where all Gauss-Markov assumptions hold. We consider two estimators of β1:
- β̂1 from ŷ = β̂0 + β̂1x1 + β̂2x2, and
- β̃1 from the estimated (underspecified) model ỹ = β̃0 + β̃1x1
Which one is the best? If bias is the criterion: β̂1 will be better. If variance is the criterion? Var(β̃1) ≤ Var(β̂1).
45 Trade-off: variance vs bias
Var(β̃1) ≤ Var(β̂1), with equality if x1 and x2 are uncorrelated in the sample. If not:
- When β2 ≠ 0: β̃1 is biased, β̂1 is not, and Var(β̃1) < Var(β̂1).
- When β2 = 0: β̃1 and β̂1 are both unbiased, and Var(β̃1) < Var(β̂1).
Why should we nevertheless prefer β̂1?
- The variances decrease when n increases.
- When we omit x2 and β2 ≠ 0, the variance of β̃1 is bigger than it seems, because x2 is in the error term, so σ² is bigger.
46 Example of misspecification
. regress educ sibs meduc feduc brthord
[regression output: Number of obs = 663, F(4, 658), coefficient table for sibs, meduc, feduc, brthord and _cons]
. regress educ sibs meduc feduc if brthord != .
[regression output: Number of obs = 663, F(3, 659), coefficient table for sibs, meduc, feduc and _cons]
The first regression suggests that birth order has no significant effect on education. There has to be, however, a correlation between the number of siblings and the birth order, which causes some multicollinearity. As a result the standard error of the coefficient on sibs in the first model is larger than in the second model.
47 Efficiency of OLS: the Gauss-Markov theorem
Under the first four assumptions, the OLS estimators are unbiased. But maybe there are other estimators with smaller variances?
GM theorem: If assumptions 1 to 5 are satisfied, OLS gives us the Best Linear Unbiased Estimators (BLUE).
- Unbiased: assumptions 1 to 4: E(β̂j) = βj, j = 1, ..., k
- Best: smallest variances => most precise.
- Linear: β̂j can be written as a linear combination of the yi.
48 Stata do-file:
findit bcuse
bcuse vote1
d
su expend*
ge sharea2=(expenda/(expenda+expendb))*100
ge shareb=(expendb/(expenda+expendb))*100
list share*
su share*
ge a=sharea+shareb
list sharea shareb a
rename a sumshare
reg votea sharea shareb
reg votea sharea2 shareb
su votea
reg votea shareb
clear
bcuse wage2
su hours
reg wage educ
su wage
reg wage educ IQ
reg wage educ exper
corr educ exper
corr educ IQ
corr educ exper wage
corr IQ wage
More informationRegression #8: Loose Ends
Regression #8: Loose Ends Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #8 1 / 30 In this lecture we investigate a variety of topics that you are probably familiar with, but need to touch
More informationProblem Set 10: Panel Data
Problem Set 10: Panel Data 1. Read in the data set, e11panel1.dta from the course website. This contains data on a sample or 1252 men and women who were asked about their hourly wage in two years, 2005
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More informationAnswer all questions from part I. Answer two question from part II.a, and one question from part II.b.
B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries
More informationNonlinear Regression Functions
Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.
More informationSpecification Error: Omitted and Extraneous Variables
Specification Error: Omitted and Extraneous Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 5, 05 Omitted variable bias. Suppose that the correct
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares
Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit
More informationSimultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser
Simultaneous Equations with Error Components Mike Bronner Marko Ledic Anja Breitwieser PRESENTATION OUTLINE Part I: - Simultaneous equation models: overview - Empirical example Part II: - Hausman and Taylor
More informationChapter 6: Linear Regression With Multiple Regressors
Chapter 6: Linear Regression With Multiple Regressors 1-1 Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution
More informationsociology 362 regression
sociology 36 regression Regression is a means of studying how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,
More informationProblem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics
Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics C1.1 Use the data set Wage1.dta to answer the following questions. Estimate regression equation wage =
More informationsociology 362 regression
sociology 36 regression Regression is a means of modeling how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,
More informationWeek 3: Simple Linear Regression
Week 3: Simple Linear Regression Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED 1 Outline
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 5 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 44 Outline of Lecture 5 Now that we know the sampling distribution
More informationMultivariate Regression: Part I
Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a
More informationCHAPTER 6: SPECIFICATION VARIABLES
Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationProblem Set 1 ANSWERS
Economics 20 Prof. Patricia M. Anderson Problem Set 1 ANSWERS Part I. Multiple Choice Problems 1. If X and Z are two random variables, then E[X-Z] is d. E[X] E[Z] This is just a simple application of one
More information5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is
Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do
More informationChapter 2: simple regression model
Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.
More informationECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests
ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one
More informationEconomics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects
Economics 113 Simple Regression Models Simple Regression Assumptions Simple Regression Derivation Changing Units of Measurement Nonlinear effects OLS and unbiased estimates Variance of the OLS estimates
More informationIntroduction to Econometrics. Multiple Regression (2016/2017)
Introduction to Econometrics STAT-S-301 Multiple Regression (016/017) Lecturer: Yves Dominicy Teaching Assistant: Elise Petit 1 OLS estimate of the TS/STR relation: OLS estimate of the Test Score/STR relation:
More informationEconometrics Midterm Examination Answers
Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i
More informationEconomics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2
Economics 326 Methods of Empirical Research in Economics Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Vadim Marmer University of British Columbia May 5, 2010 Multiple restrictions
More informationMultiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =
Economics 130 Lecture 6 Midterm Review Next Steps for the Class Multiple Regression Review & Issues Model Specification Issues Launching the Projects!!!!! Midterm results: AVG = 26.5 (88%) A = 27+ B =
More informationProblem Set 4 ANSWERS
Economics 20 Problem Set 4 ANSWERS Prof. Patricia M. Anderson 1. Suppose that our variable for consumption is measured with error, so cons = consumption + e 0, where e 0 is uncorrelated with inc, educ
More informationSection Least Squares Regression
Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it
More informationGov 2000: 9. Regression with Two Independent Variables
Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics
More informationECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests
ECON4150 - Introductory Econometrics Lecture 7: OLS with Multiple Regressors Hypotheses tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 7 Lecture outline 2 Hypothesis test for single
More information1 The basics of panel data
Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Related materials: Steven Buck Notes to accompany fixed effects material 4-16-14 ˆ Wooldridge 5e, Ch. 1.3: The Structure of Economic Data ˆ Wooldridge
More informationMultiple Linear Regression
Multiple Linear Regression Asymptotics Asymptotics Multiple Linear Regression: Assumptions Assumption MLR. (Linearity in parameters) Assumption MLR. (Random Sampling from the population) We have a random
More informationEconometrics Homework 1
Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z
More informationCourse Econometrics I
Course Econometrics I 4. Heteroskedasticity Martin Halla Johannes Kepler University of Linz Department of Economics Last update: May 6, 2014 Martin Halla CS Econometrics I 4 1/31 Our agenda for today Consequences
More informationEconometrics. 8) Instrumental variables
30C00200 Econometrics 8) Instrumental variables Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Thery of IV regression Overidentification Two-stage least squates
More informationProblem 4.1. Problem 4.3
BOSTON COLLEGE Department of Economics EC 228 01 Econometric Methods Fall 2008, Prof. Baum, Ms. Phillips (tutor), Mr. Dmitriev (grader) Problem Set 3 Due at classtime, Thursday 14 Oct 2008 Problem 4.1
More informationECON Introductory Econometrics. Lecture 13: Internal and external validity
ECON4150 - Introductory Econometrics Lecture 13: Internal and external validity Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 9 Lecture outline 2 Definitions of internal and external
More information(a) Briefly discuss the advantage of using panel data in this situation rather than pure crosssections
Answer Key Fixed Effect and First Difference Models 1. See discussion in class.. David Neumark and William Wascher published a study in 199 of the effect of minimum wages on teenage employment using a
More informationSoc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis
Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Problem 1. The files
More informationECON Introductory Econometrics. Lecture 17: Experiments
ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.
More informationIntroduction to Econometrics. Multiple Regression
Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Multiple Regression Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci 1 OLS estimate of the TS/STR
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model
Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory
More informationLecture 14. More on using dummy variables (deal with seasonality)
Lecture 14. More on using dummy variables (deal with seasonality) More things to worry about: measurement error in variables (can lead to bias in OLS (endogeneity) ) Have seen that dummy variables are
More informationHeteroskedasticity. (In practice this means the spread of observations around any given value of X will not now be constant)
Heteroskedasticity Occurs when the Gauss Markov assumption that the residual variance is constant across all observations in the data set so that E(u 2 i /X i ) σ 2 i (In practice this means the spread
More informationQuantitative Methods Final Exam (2017/1)
Quantitative Methods Final Exam (2017/1) 1. Please write down your name and student ID number. 2. Calculator is allowed during the exam, but DO NOT use a smartphone. 3. List your answers (together with
More informationLecture 7: OLS with qualitative information
Lecture 7: OLS with qualitative information Dummy variables Dummy variable: an indicator that says whether a particular observation is in a category or not Like a light switch: on or off Most useful values:
More informationThe Simple Linear Regression Model
The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate
More informationPractice exam questions
Practice exam questions Nathaniel Higgins nhiggins@jhu.edu, nhiggins@ers.usda.gov 1. The following question is based on the model y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 + u. Discuss the following two hypotheses.
More informationECON Introductory Econometrics. Lecture 16: Instrumental variables
ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental
More informationMultiple Regression Analysis: Heteroskedasticity
Multiple Regression Analysis: Heteroskedasticity y = β 0 + β 1 x 1 + β x +... β k x k + u Read chapter 8. EE45 -Chaiyuth Punyasavatsut 1 topics 8.1 Heteroskedasticity and OLS 8. Robust estimation 8.3 Testing
More informationEconometrics II Censoring & Truncation. May 5, 2011
Econometrics II Censoring & Truncation Måns Söderbom May 5, 2011 1 Censored and Truncated Models Recall that a corner solution is an actual economic outcome, e.g. zero expenditure on health by a household
More informationMaking sense of Econometrics: Basics
Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/
More informationEconometrics Review questions for exam
Econometrics Review questions for exam Nathaniel Higgins nhiggins@jhu.edu, 1. Suppose you have a model: y = β 0 x 1 + u You propose the model above and then estimate the model using OLS to obtain: ŷ =
More information1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5.
Economics 3 Introduction to Econometrics Winter 2004 Professor Dobkin Name Final Exam (Sample) You must answer all the questions. The exam is closed book and closed notes you may use calculators. You must
More information