Applied Econometrics. Professor Bernard Fingleton


Applied Econometrics Professor Bernard Fingleton

Regression A quick summary of some key issues

Some key issues
Text book: J.H. Stock & M.W. Watson, Introduction to Econometrics, 2nd Edition
Software: Gretl (gretl.sourceforge.net)

Course outline
Week 5 - introduction to regression
Week 6 - endogeneity & instrumental variables
Week 7 - panel data
Week 8 - spurious regression, Dickey-Fuller etc
Week 9 - cointegration and error correction
Week 10 - autoregressive distributed lag models
Week 11 - vector autoregression (VAR), vector error correction, multiple cointegrating vectors
Week 12 - VAR, Johansen etc

Regression
Regression is used to analyze how a single dependent variable (or Y variable) is affected by the values of one or more independent variables (also called regressors, X variables, factors).

Multiple regression
Y = b0 + b1 X1 + b2 X2 + ... + b(k-1) X(k-1) + e
E(Y) = b0 + b1 X1 + b2 X2 + ... + b(k-1) X(k-1)
Ŷ = b̂0 + b̂1 X1 + b̂2 X2 + ... + b̂(k-1) X(k-1)
b1 is the change in E(Y) per unit change in X1
b(k-1) is the change in E(Y) per unit change in X(k-1)

Interpreting partial regression coefficients
bi is the change in E(Y) per unit change in Xi, with all other variables held statistically constant
E(Y) = b0 + b1 X1 + b2 X2
assume we change X1 by an amount equal to ΔX1, but keep X2 constant; this changes E(Y) to new E(Y)
new E(Y) = b0 + b1 (X1 + ΔX1) + b2 X2
new E(Y) - E(Y) = ΔE(Y) = b1 ΔX1
thus if ΔX1 = 1, ΔE(Y) = b1
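This algebra can be checked numerically. A minimal sketch, assuming made-up data and coefficient values (everything below is hypothetical), fitting the regression with NumPy least squares and confirming that the fitted value shifts by exactly b1 when X1 rises by one unit with X2 held fixed:

```python
import numpy as np

# Hypothetical data: Y generated from two regressors X1 and X2
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.1, size=n)

# Fit E(Y) = b0 + b1*X1 + b2*X2 by least squares
X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Raise X1 by dX1 = 1 while holding X2 fixed at its value:
# the fitted value changes by exactly b1
yhat_before = b[0] + b[1] * 3.0 + b[2] * 5.0
yhat_after = b[0] + b[1] * (3.0 + 1.0) + b[2] * 5.0
change = yhat_after - yhat_before  # equals b[1]
```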

Theory indicating X variables
output = f(labour, capital)
adopt a Cobb-Douglas production function: output = labour^α capital^β
ln(output) = α ln(labour) + β ln(capital)
if α + β > 1 we have increasing returns: doubling inputs more than doubles output
if α + β = 1 we have constant returns to scale
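A quick numerical illustration of the returns-to-scale claim, using hypothetical parameter values α = 0.7, β = 0.5 (so α + β = 1.2 > 1):

```python
# Hypothetical Cobb-Douglas technology: output = labour**alpha * capital**beta
alpha, beta = 0.7, 0.5  # alpha + beta = 1.2 > 1: increasing returns

def output(labour, capital):
    return labour ** alpha * capital ** beta

# Doubling both inputs scales output by 2**(alpha + beta)
ratio = output(2 * 10.0, 2 * 20.0) / output(10.0, 20.0)
# ratio = 2**1.2 > 2, i.e. output more than doubles
```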

elasticity = (dY/Y) / (dX/X): the % change in Y per 1% change in X
log Ŷ = b̂0 + b̂1 log X
Ŷ = exp(b̂0) X^b̂1 = b̂0* X^b̂1, writing b̂0* = exp(b̂0)
dŶ/dX = b̂0* b̂1 X^(b̂1 - 1) = b̂1 (b̂0* X^b̂1)/X = b̂1 Ŷ/X
hence (dŶ/Ŷ) / (dX/X) = b̂1
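A numerical check of this derivation, with hypothetical values for b̂0 and b̂1: a small percentage change in X moves Ŷ by b̂1 times that percentage.

```python
import math

# Hypothetical log-log fit: log Yhat = b0 + b1 * log X
b0, b1 = 0.5, 0.8

def yhat(x):
    return math.exp(b0) * x ** b1

x = 50.0
dx = x * 1e-4                         # a tiny proportional change in X
pct_dy = (yhat(x + dx) - yhat(x)) / yhat(x)
pct_dx = dx / x
elasticity = pct_dy / pct_dx          # approximately b1
```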

Data indicating X variables
letting the data speak is a good way to obtain a realistic theory: we look at the data to identify the important variables; unimportant variables, which are indistinguishable from random variation, can be left in the error term

Methods for choosing X variables: R², t tests, F tests

R²
R² = 1 - Σê²/SYY = corr(Y, Ŷ)²
indicates, on a scale from 0 to 1 (or 0% to 100%), how much of Y's variation is accounted for by the Xs contained in the regression model [this equation assumes that b0 is present]
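The two forms of R² given above can be verified on simulated data; a sketch assuming an intercept is included (the data here are entirely hypothetical):

```python
import numpy as np

# Hypothetical data for a simple regression with an intercept
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
resid = y - yhat

# R-squared via residuals, and via the squared correlation of y and yhat
r2_from_resid = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
r2_from_corr = np.corrcoef(y, yhat)[0, 1] ** 2
# With an intercept present, the two definitions agree
```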

R² DISADVANTAGES
R²'s probability distribution is not constant, making it difficult to objectively compare the R² of different models.
R² ALWAYS increases if additional (perhaps unimportant) variables are added to the model. Hence the most complex model always seems the best using R². BUT adjusted R² takes into account the number of Xs.
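A small simulation (again with hypothetical data) of the point above: adding an irrelevant regressor never lowers R², while adjusted R² penalises the extra X.

```python
import numpy as np

# Hypothetical data: y depends on x only; "junk" is an irrelevant regressor
rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
junk = rng.normal(size=n)  # unrelated to y
y = 1.0 + 2.0 * x + rng.normal(size=n)

def r2_pair(X):
    # Return (R-squared, adjusted R-squared) for the design matrix X
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1 - resid @ resid / np.sum((y - y.mean())**2)
    k = X.shape[1]
    adj = 1 - (1 - r2) * (n - 1) / (n - k)
    return r2, adj

r2_small, adj_small = r2_pair(np.column_stack([np.ones(n), x]))
r2_big, adj_big = r2_pair(np.column_stack([np.ones(n), x, junk]))
# r2_big >= r2_small always; adj_big can fall below adj_small
```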

t-test
Say we wish to test whether a particular variable, Xi, should be included [in practice i could be, say, 2 if we were testing X2]
H0: bi = 0 [Xi has no effect on Y]
t = (b̂i - bi) / se(b̂i), where in the single-regressor case se(b̂i) = σ̂/√SXX
the t ratio ~ t(T-k) when H0 is correct for the population
T = sample size, k = number of regression coefficients


Taiwanese agricultural output
The regression equation is: ln output = -3.34 + 1.4988 ln labour + 0.4899 ln capital

Predictor    Coef          Stdev    t-ratio   p
Constant     b0 = -3.338   2.450    -1.36     0.198
ln labour    b1 = 1.4988   0.5398    2.78     0.017
ln capital   b2 = 0.4899   0.1020    4.80     0.000

For ln(labour), t = 1.4988/0.5398 = 2.78
From t tables, t_crit = 2.18 with T - k = 15 - 3 = 12 degrees of freedom; t_crit is the value (ignoring sign) with p-value 0.05 in the t(12) distribution
since t > t_crit (i.e. the p-value for t, 0.017, is < 0.05) we reject H0 that b1 = 0
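The arithmetic of this worked example can be reproduced; a sketch using SciPy's t distribution with the slide's own coefficient and standard error:

```python
from scipy import stats

# Reproduce the ln(labour) t test from the Taiwanese output regression
b1, se_b1 = 1.4988, 0.5398
T, k = 15, 3                       # sample size, number of coefficients
t_ratio = b1 / se_b1               # about 2.78

df = T - k                         # 12 degrees of freedom
t_crit = stats.t.ppf(0.975, df)    # two-sided 5% critical value, about 2.18
p_value = 2 * stats.t.sf(abs(t_ratio), df)  # about 0.017

reject_h0 = abs(t_ratio) > t_crit  # b1 is significantly different from zero
```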

Analysis of Variance (ANOVA)

F test: jointly testing a group of Xs
H0: b1 = b2 = ... = b(k-1) = 0
The test statistic is the F ratio calculated from an ANOVA table. This is calculated automatically whenever a regression model is fitted to a data set.

F test of a group of X variables
H0: b1 = b2 = ... = b(k-1) = 0; the alternative HA is that H0 is untrue
F = [(SYY - D)/(k - 1)] / [D/(T - k)] ~ F(k-1, T-k) assuming H0 is true
where D = Σ(Yi - Ŷi)² = Σêi² and SYY = Σ(Yi - Ȳ)²

The regression equation is: cons = 24.8 + 0.942 income - 0.0424 wealth

Predictor   Coef       Stdev     t-ratio   p
Constant    24.775     6.752      3.67     0.008
income      0.9415     0.8229     1.14     0.290
wealth     -0.04243    0.08066   -0.53     0.615

R-sq = 0.964

Analysis of Variance (ANOVA table)
SOURCE       DF    SS       MS       F       p
Regression    2   8565.6   4282.8   92.40   <0.0001
Error         7    324.4     46.3
Total         9   8890.0

F = 92.4; the p-value in F(2,7) is <0.0001

Analysis of Variance (ANOVA table)
SOURCE       DF     SS         MS                F
Regression   k-1    SYY - D    (SYY - D)/(k-1)   [(SYY - D)/(k-1)] / [D/(T-k)]
Error        T-k    D = Σêi²   D/(T-k)
Total        T-1    SYY

Analysis of Variance (ANOVA table)
SOURCE       DF    SS       MS       F       p
Regression    2   8565.6   4282.8   92.40   <0.0001
Error         7    324.4     46.3
Total         9   8890.0
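The F ratio in the numerical table can be reproduced directly from its SS and DF columns:

```python
# Verify the F ratio in the consumption ANOVA table
ss_reg, df_reg = 8565.6, 2   # regression sum of squares, k - 1
ss_err, df_err = 324.4, 7    # error (residual) sum of squares, T - k

ms_reg = ss_reg / df_reg     # 4282.8
ms_err = ss_err / df_err     # about 46.3
F = ms_reg / ms_err          # about 92.4, matching the table
```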


Interpreting the F test
since the p-value for F = 92.4 is < 0.05, H0 is rejected
this means H0: b1 = b2 = ... = b(k-1) = 0 is rejected, indicating that one or more bi is unequal to zero

Multicollinearity
b (and hence t) for any one variable will change as X variables are added or subtracted, EXCEPT when the X variables are not correlated with each other, in which case t is the same in bivariate and multiple regression

Multicollinearity: consumption, income and wealth example
A) The regression equation is: consumption = 24.5 + 0.509 income

Predictor   Coef       Stdev     t-ratio   p
Constant    24.455     6.414      3.81     0.005
income      0.50909    0.03574   14.24     0.000

B) The regression equation is: consumption = 24.8 + 0.942 income - 0.0424 wealth

Predictor   Coef       Stdev     t-ratio   p
Constant    24.775     6.752      3.67     0.008
income      0.9415     0.8229     1.14     0.290
wealth     -0.04243    0.08066   -0.53     0.615

Correlation of income and wealth = 0.999

Multicollinearity
in A) income alone is highly significant; similarly, wealth alone is highly significant
in B) wealth and income are both insignificant, yet paradoxically R² = 0.96
the reason for such extreme changes in the apparent effects is multicollinearity

Multicollinearity
multicollinearity means that the X variables are very highly correlated, so that they are not distinct
with severe multicollinearity the estimated b's become very unreliable: if we change X1 (say) slightly, b1 changes a lot
the standard errors of the b's become very large; hence we see low t values but high R²
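A simulation sketch of this mechanism, on entirely hypothetical data: as the correlation between X1 and X2 approaches 1, the OLS standard error of b1 blows up even though the overall fit stays good.

```python
import numpy as np

# Hypothetical simulation: nearly collinear regressors inflate se(b1)
rng = np.random.default_rng(2)
n = 200

def se_b1(corr):
    # Build X2 with the given population correlation to X1, fit
    # Y = b0 + b1*X1 + b2*X2 + e, and return the OLS standard error of b1
    x1 = rng.normal(size=n)
    x2 = corr * x1 + np.sqrt(1 - corr**2) * rng.normal(size=n)
    y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 3)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(cov[1, 1])

se_low, se_high = se_b1(0.0), se_b1(0.999)
# se_high is many times larger than se_low: low t ratios despite a good fit
```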

Multicollinearity: ln Halifax House Price Index, Greater London and Scotland

Model 1: OLS, using observations 1983:2-2007:2 (T = 97)
Dependent variable: lns

           coefficient   std. error   t-ratio   p-value
  ------------------------------------------------------------
  const     1.75890      0.171794     10.24     5.12e-017 ***
  lngl      0.629605     0.0305174    20.63     7.26e-037 ***

Mean dependent var   5.287020    S.D. dependent var   0.375992
Sum squared resid    2.476363    S.E. of regression   0.161453
R-squared            0.817532    Adjusted R-squared   0.815612
F(1, 95)             425.6400    P-value(F)           7.26e-37

Model 2: OLS, using observations 1983:2-2007:2 (T = 97)
Dependent variable: lns

           coefficient   std. error   t-ratio   p-value
  ------------------------------------------------------------
  const      1.10835     0.0988124    11.22     4.92e-019 ***
  lngl      -0.376925    0.0653108    -5.771    1.01e-07  ***
  lne_ro     1.15071     0.0723979    15.89     2.21e-028 ***

Mean dependent var   5.287020    S.D. dependent var   0.375992
Sum squared resid    0.671550    S.E. of regression   0.084523
R-squared            0.950518    Adjusted R-squared   0.949465
F(2, 94)             902.8345    P-value(F)           4.36e-62

[Figure: time series plot of lns, lngl and lne_ro, 1983-2007]

Solutions to multicollinearity problems
use less-correlated X variables, e.g. data for a longer/different time period, so that X1, X2, etc. become more separated
use the change in Y and X at each point in time rather than the levels of Y and X, since changes tend not to be as strongly correlated as levels

Solutions to multicollinearity problems: use the change in Y, X at each point in time
the difference in logs equals the exponential (continuously compounded) growth rate:
X(t) = 105, X(t-1) = 100, growth 5%
ln X(t) = 4.65396, ln X(t-1) = 4.60517
ln X(t) - ln X(t-1) = 0.04879
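Checking the slide's own numbers:

```python
import math

# Log difference approximates the growth rate (the slide's numbers)
x_now, x_prev = 105.0, 100.0
growth = (x_now - x_prev) / x_prev                # 0.05, i.e. 5%
log_diff = math.log(x_now) - math.log(x_prev)     # about 0.04879
```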

Solutions to multicollinearity problems: use differences (= growth with logs)

Model 3: OLS, using observations 1983:2-2007:2 (T = 97)
Dependent variable: d_lns

             coefficient   std. error   t-ratio   p-value
  ----------------------------------------------------------
  const       0.00749513   0.00346701    2.162    0.0332   **
  d_lngl     -0.176427     0.122453     -1.441    0.1530
  d_lne_ro    0.684976     0.138318      4.952    3.22e-06 ***

Mean dependent var   0.017016    S.D. dependent var   0.031070
Sum squared resid    0.069073    S.E. of regression   0.027108
R-squared            0.254643    Adjusted R-squared   0.238785
F(2, 94)             16.05706    P-value(F)           1.00e-06

[Figure: fitted values (fv) versus Scotland house price growth (d_lns), 1983-2007]

Fitted values versus Scotland house price growth: with quarterly dummies

Model 4: OLS, using observations 1983:2-2007:2 (T = 97)
Dependent variable: d_lns

             coefficient   std. error   t-ratio   p-value
  ----------------------------------------------------------
  const       0.00891064   0.00521555    1.708    0.0910   *
  d_lngl     -0.233089     0.112314     -2.075    0.0408   **
  d_lne_ro    0.551339     0.131062      4.207    6.06e-05 ***
  dq1        -0.0122192    0.00710341   -1.720    0.0888   *
  dq2         0.0233635    0.00754418    3.097    0.0026   ***
  dq3        -0.00229161   0.00721425   -0.3177   0.7515

Mean dependent var   0.017016    S.D. dependent var   0.031070
Sum squared resid    0.054784    S.E. of regression   0.024536
R-squared            0.408834    Adjusted R-squared   0.376353
F(5, 91)             12.58663    P-value(F)           2.70e-09

[Figure: observed Scotland house price growth (d_lns) and fitted values (fv), with quarterly dummies, 1983-2007]

F test of seasonal effects

Model 5: OLS, using observations 1983:2-2007:2 (T = 97)
Dependent variable: d_lns

             coefficient   std. error   t-ratio   p-value
  ----------------------------------------------------------
  const       0.00749513   0.00346701    2.162    0.0332   **
  d_lngl     -0.176427     0.122453     -1.441    0.1530
  d_lne_ro    0.684976     0.138318      4.952    3.22e-06 ***

Mean dependent var   0.017016    S.D. dependent var   0.031070
Sum squared resid    0.069073    S.E. of regression   0.027108
R-squared            0.254643    Adjusted R-squared   0.238785
F(2, 94)             16.05706    P-value(F)           1.00e-06
Log-likelihood       213.8572    Akaike criterion    -421.7143
Schwarz criterion   -413.9902    Hannan-Quinn        -418.5911
rho                 -0.164960    Durbin-Watson        2.314335

Comparison of Model 5 and Model 4:
Null hypothesis: the regression parameters are zero for the variables dq1, dq2, dq3
Test statistic: F(3, 91) = 7.9117, with p-value = 9.55e-05
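The comparison statistic can be reproduced from the two sums of squared residuals printed in the outputs (restricted Model 5 versus unrestricted Model 4):

```python
# Nested-model F test for the quarterly dummies dq1, dq2, dq3
rss_r = 0.069073   # sum of squared residuals, Model 5 (restricted: no dummies)
rss_u = 0.054784   # sum of squared residuals, Model 4 (unrestricted)
q = 3              # number of restrictions (the three dummies)
df_u = 91          # T - k in the unrestricted model: 97 - 6

F = ((rss_r - rss_u) / q) / (rss_u / df_u)   # about 7.91, matching F(3, 91)
```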

Summary
Statistical criteria:
R² gives the overall % of Y's variation accounted for by the X variables
the t test is for testing the significance of individual Xs
the F test is for testing whether groups of Xs should be present in the model
Multicollinearity is a problem that occurs when we have highly correlated X variables (as often occurs in time series)
Solve by reducing the correlation through differencing and/or extra data