MRA: Further Issues

:Effects of Data Scaling

We've already looked at the effects of data scaling on the OLS statistics $\hat{\beta}$, $\hat{\sigma}^2$, and $R^2$. What about test statistics?

1. Scaling the explanatory variables

Suppose we replace the explanatory variables $X$ by $X^*$, where $X^* = XD$ and $D$ is an invertible matrix (a change of units is the special case where $D = \mathrm{diag}(c_1, c_2, \ldots, c_K)$). So instead of the model $y = X\beta + u$ we consider the model $y = X^*\beta^* + u$ with $\beta^* = D^{-1}\beta$.

The general linear hypothesis $H_0: R\beta = r$ is replaced with $H_0: R^*\beta^* = r$, where $R^* = RD$. The test statistic for $H_0: R\beta = r$ is
$$\frac{(R\hat{\beta} - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat{\beta} - r)/q}{\hat{\sigma}^2}$$
The test statistic for $H_0: R^*\beta^* = r$ is given by
$$\frac{(R^*\hat{\beta}^* - r)'\left[R^*(X^{*\prime}X^*)^{-1}R^{*\prime}\right]^{-1}(R^*\hat{\beta}^* - r)/q}{\hat{\sigma}^{*2}} = \frac{(R\hat{\beta} - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat{\beta} - r)/q}{\hat{\sigma}^2}$$
given that we know $R^*\hat{\beta}^* = RDD^{-1}\hat{\beta} = R\hat{\beta}$, and $\hat{\sigma}^{*2} = \hat{\sigma}^2$ (changes in the basis representing $\mathrm{Sp}(X)$ don't change the sum of squared residuals). But

$$(X^{*\prime}X^*)^{-1} = \left[(XD)'(XD)\right]^{-1} = \left[D'X'XD\right]^{-1} = D^{-1}(X'X)^{-1}(D')^{-1}$$
(with repeated application of the property $(BA)^{-1} = A^{-1}B^{-1}$, provided both inverses exist), so $R^*(X^{*\prime}X^*)^{-1}R^{*\prime} = RD\,D^{-1}(X'X)^{-1}(D')^{-1}D'R' = R(X'X)^{-1}R'$. We conclude that tests of the general linear hypothesis are invariant to the basis chosen for $\mathrm{Sp}(X)$. As the $t$ test is a special case, $t$ statistics are invariant to the choice of units.

2. Scaling the dependent variable

Suppose we replace the dependent variable $y$ with $y^* = cy$. So our new model is $y^* = X\beta_v + u_v$, where $\beta_v = c\beta$ (and $u_v = cu$). The restrictions under test become $R\beta_v = r_v$, where $r_v = cr$. The test statistic is
$$\frac{(R\hat{\beta}_v - r_v)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat{\beta}_v - r_v)/q}{\hat{\sigma}_v^2}$$
But we know $\hat{\beta}_v = c\hat{\beta}$ and $\hat{\sigma}_v^2 = c^2\hat{\sigma}^2$. We conclude that tests of the general linear hypothesis are invariant to the units chosen for the dependent variable.
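As a quick numerical check of both results, here is a minimal sketch using simulated data and statsmodels; it is not part of the original notes, and the scaling matrix $D$, the constant $c$, and all variable names are purely illustrative.

```python
# Numerical check (illustrative, not from the notes): t and F statistics are
# unchanged when the regressors are rescaled (X* = XD) or the dependent
# variable is rescaled (y* = c*y).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))             # columns: [1, x1, x2]
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

D = np.diag([1.0, 100.0, 0.01])                          # change of units for the regressors
c = 12.0                                                 # change of units for y

base     = sm.OLS(y, X).fit()
scaled_X = sm.OLS(y, X @ D).fit()
scaled_y = sm.OLS(c * y, X).fit()

print(base.tvalues, scaled_X.tvalues, scaled_y.tvalues)  # identical t statistics
print(base.fvalue, scaled_X.fvalue, scaled_y.fvalue)     # identical overall F statistics
```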

:Change of basis for the restrictions

Suppose we have the restrictions $\beta_1 = 2$ and $\beta_2 = 0$. This is equivalent to saying $\beta_1 + \beta_2 = 2$ and $\beta_2 = 0$. But these two ways of expressing the same restrictions generate two different values for the $R$ matrix and $r$ vector. Fortunately, if we replace $H_0: R\beta = r$ with $H_0: R_v\beta = r_v$, where $R_v = BR$ and $r_v = Br$ with $B$ invertible, we don't change the value of the test statistic.

Exercise: Prove the result in the previous bullet.

Another exercise: Find the $B$ matrix for the example I've given in the first bullet above.
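Below is a small numerical illustration of the invariance claim (it is not a proof, and the matrix $B$ used here is an arbitrary invertible matrix chosen for the demonstration, not the answer to the exercise). The Wald/F statistic is computed directly from the formula above with simulated data.

```python
# Illustration only: the F statistic for H0: R*beta = r equals the F statistic
# for the rewritten restrictions (B R)*beta = B r, for invertible B.
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - k)           # sigma^2 hat
XtX_inv = np.linalg.inv(X.T @ X)

def f_stat(R, r, q):
    d = R @ beta_hat - r
    return d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (q * s2)

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])                          # beta1 = 2 and beta2 = 0
r = np.array([2.0, 0.0])
B = np.array([[2.0, -1.0],
              [1.0, 1.0]])                               # any invertible 2x2 matrix

print(f_stat(R, r, 2), f_stat(B @ R, B @ r, 2))          # the two values coincide
```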

:Beta (standardized) coefficients

In some applications where units are difficult to interpret, researchers divide the dependent and independent variables by their standard deviations. In this case, the coefficients tell us by how many standard deviations the dependent variable changes in response to a one-standard-deviation increase in each explanatory variable. Even in cases where the units are easy to interpret, it is sometimes useful to report standardized coefficients to give a sense of the importance of "typical" movements in an explanatory variable.
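A minimal sketch of how standardized coefficients can be computed after an ordinary OLS fit (simulated data; the variables and their units are illustrative). Equivalently, one could z-score every variable and rerun the regression.

```python
# Standardized ("beta") coefficients: beta_j * sd(x_j) / sd(y) gives the response
# in standard deviations of y to a one-standard-deviation move in x_j.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(50, 10, n)                     # regressors in awkward units
x2 = rng.normal(0.002, 0.0005, n)
y = 3 + 0.4 * x1 + 900 * x2 + rng.normal(0, 5, n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

sd_x = np.std(np.column_stack([x1, x2]), axis=0, ddof=1)
beta_std = res.params[1:] * sd_x / np.std(y, ddof=1)
print(beta_std)                                # comparable across regressors
```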

:Using logs

Suppose we estimate the model
$$\ln y = \beta_0 + \beta_1 \ln x_1 + \beta_2 x_2 + u$$
The parameter $\beta_1$ measures an elasticity,
$$\beta_1 = \frac{\partial E(\ln y)}{\partial \ln x_1} \approx \frac{\%\Delta \text{ in predicted } y}{\%\Delta \text{ in } x_1}$$
$\beta_1$ is "dimensionless" or "unit free". If we change units, $y \to y^* = c_0 y$ and $x_1 \to x_1^* = c_1 x_1$, the model becomes
$$\ln y^* = \beta_0^* + \beta_1^* \ln x_1^* + \beta_2^* x_2 + u^*$$
$$\ln y^* = \ln c_0 + \beta_0 + \beta_1 (\ln x_1^* - \ln c_1) + \beta_2 x_2 + u$$
Therefore

$$\beta_1^* = \beta_1, \quad \beta_2^* = \beta_2, \quad u^* = u, \quad \beta_0^* = \beta_0 + \ln c_0 - \beta_1 \ln c_1$$
Using logs for the dependent variable may lead to disturbances that appear more likely to be i.i.d. draws from a normal density (fewer outliers, less heteroskedasticity). That's the idea behind the Box-Cox and similar transformations (see the paper by Wooldridge on the home page). But to use it, we must have strictly positive values for the dependent variable. Also, in a regression setting, using logs versus levels changes what we want to explain. This seems innocuous in wage regressions, but not if the dependent variable is, say, (gross) returns.

Using a Taylor series expansion around $E(y|X)$,
$$\ln y \approx \ln E(y|X) + \frac{1}{E(y|X)}\left(y - E(y|X)\right) - 0.5\,\frac{1}{E(y|X)^2}\left(y - E(y|X)\right)^2$$
So
$$E(\ln y \mid X) \approx \ln E(y|X) - 0.5\,\frac{\sigma^2(y|X)}{E(y|X)^2}$$
where $\sigma^2(y|X) = E\left[(y - E(y|X))^2 \mid X\right]$. Even if returns are unpredictable, so that $E(y|X)$ is a constant, running a regression on log-returns could generate statistically significant coefficients if the standard deviation of returns is predictable.

Whether or not logarithms should be used for the independent variables is a much more straightforward matter. We can treat it as a problem of hypothesis testing. For example, we could run the regression
$$\ln y = \beta_0 + \beta_1 \ln x_1 + \beta_2 x_2 + \beta_3 x_1 + u$$
Test $\beta_3 = 0$ to decide if the log specification is sufficient, or $\beta_1 = 0$ to see if the linear specification is sufficient. If we don't reject either null, then the data don't care and it's a matter of taste which specification we use. If we reject one null, but not the other, then the data tell us which to choose. If we reject both nulls, then neither the linear nor the logarithmic specification is sufficient to capture the response of $\ln y$ to $x_1$.
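A hedged sketch of this specification check with simulated data (column names such as lnx1 are mine, not the notes'): include both $x_1$ and $\ln x_1$ in the regression and look at the two t tests.

```python
# Specification check: can the level term be dropped (log spec sufficient), or
# can the log term be dropped (linear spec sufficient)?
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 400
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
lny = 1.0 + 0.8 * np.log(x1) + 0.3 * x2 + rng.normal(0, 0.2, n)   # truth: log in x1

X = sm.add_constant(pd.DataFrame({"lnx1": np.log(x1), "x2": x2, "x1": x1}))
res = sm.OLS(lny, X).fit()

print(res.t_test("x1 = 0"))     # H0: beta_3 = 0, i.e. the log specification suffices
print(res.t_test("lnx1 = 0"))   # H0: beta_1 = 0, i.e. the linear specification suffices
```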

:Parameter heterogeneity

A very important consideration in applied work is that responses can differ across the observations. A conceptually simple case is where this variation only depends on the regressors, i.e.
$$y_i = x_i \beta_i + u_i = x_i \beta(x_i) + u_i$$
For example, consider the special case with
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2(x_i)\, x_{2i} + u_i, \qquad \beta_2(x_i) = \beta_2 + \beta_3 x_{1i} + \beta_4 x_{2i}$$

Substituting out for $\beta_2(x_i)$, the model becomes
$$y_i = \beta_0 + \beta_1 x_{1i} + (\beta_2 + \beta_3 x_{1i} + \beta_4 x_{2i})\, x_{2i} + u_i$$
So
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \beta_4 x_{2i}^2 + u_i$$
$$\frac{\partial E(y_i | x_i)}{\partial x_{1i}} = \beta_1 + \beta_3 x_{2i}, \qquad \frac{\partial E(y_i | x_i)}{\partial x_{2i}} = \beta_2 + \beta_3 x_{1i} + 2\beta_4 x_{2i}$$
If an explanatory variable is not continuous (e.g. the number of rooms in a house), then it makes sense to work with $\Delta E(y_i | x_i)$ (the textbook uses $\Delta \hat{y}_i$) to understand the effect of changes in the explanatory variables. This creates a small but important difference in interpretation.
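To make the interpretation point concrete, here is a tiny sketch with made-up parameter values (they are not estimates from any data set): the derivative above and the discrete change $\Delta E(y_i|x_i)$ from a one-unit move in $x_{2i}$ do not coincide because of the squared term.

```python
# Marginal effect (derivative) versus a one-unit discrete change in x2.
b0, b1, b2, b3, b4 = 1.0, 0.5, 2.0, -0.3, 0.1   # illustrative parameter values

def Ey(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + b4 * x2 ** 2

x1, x2 = 3.0, 4.0
deriv = b2 + b3 * x1 + 2 * b4 * x2              # partial derivative at (x1, x2)
delta = Ey(x1, x2 + 1) - Ey(x1, x2)             # effect of a one-unit change in x2
print(deriv, delta)                             # they differ by b4
```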

:Goodness of fit and selection of regressors

In what follows, assume the model contains an intercept. A high $R^2$ doesn't mean that we have a good model (trending data often have a high $R^2$); a low $R^2$ doesn't mean that we have a bad model (the market efficiency hypothesis predicts a zero $R^2$ if we try to forecast returns). $R^2$ cannot fall when we add a regressor:
$$R^2 = 1 - \frac{SSR}{SST}$$
and, by definition, SSR can never increase if we add a regressor. A (relatively dumb) alternative to $R^2$ is sometimes used (especially in finance) that does penalize for adding regressors. It's called the adjusted $R^2$ or the "R-bar squared":

$$\bar{R}^2 = 1 - \frac{SSR/(n - K)}{SST/(n - 1)} = 1 - (1 - R^2)\,\frac{n - 1}{n - K}$$
If we add a regressor, $x_{K+1}$, to the model, then $\bar{R}^2$ increases iff the t-statistic for $H_0: \beta_{K+1} = 0$ exceeds 1 in absolute value. If we add a set of regressors, $X_{K+1}$, to the model, then $\bar{R}^2$ increases iff the F-statistic for $H_0: \beta_{K+1} = 0$ exceeds 1. Notice that we can also write
$$\bar{R}^2 = 1 - \frac{\hat{\sigma}^2}{y'Ay/(n - 1)}$$
where $A$ is the demeaning projection matrix, so $y'Ay = SST$. Changing regressors affects only $\hat{\sigma}^2$, so $\bar{R}^2$ increases iff $\hat{\sigma}^2$ decreases. It is better to report $R^2$ and $\hat{\sigma}^2$ than $R^2$ and $\bar{R}^2$.
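A quick numerical check of the first claim (simulated data, not from the notes): add one candidate regressor and compare the change in $\bar{R}^2$ with its t statistic.

```python
# Adjusted R-squared rises after adding a regressor exactly when that regressor's
# t statistic exceeds 1 in absolute value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 150
x1 = rng.normal(size=n)
x_new = rng.normal(size=n)                        # candidate regressor
y = 1 + 0.5 * x1 + rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(x1)).fit()
big = sm.OLS(y, sm.add_constant(np.column_stack([x1, x_new]))).fit()

print(abs(big.tvalues[-1]) > 1)                   # |t| on the new regressor exceeds 1?
print(big.rsquared_adj > small.rsquared_adj)      # ... iff adjusted R^2 went up
```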

We saw that if a model is "false" then $E(\hat{\sigma}^2) \geq \sigma^2$. This led to a (very old) suggestion that we should use $\bar{R}^2$ to choose between various specifications. This is NOT A GOOD IDEA. It makes no sense if the dependent variable changes across specifications. If the models are nested, we can use standard hypothesis tests. And if the models are non-nested, then we should use an information criterion (Akaike, or better Schwarz/BIC, or Hannan-Quinn), especially if we have more than two models to compare.

Loose ends:

1. Controlling for too many factors. In the attempt to remove bias, you may make a coefficient estimate something very different from the effect of interest (e.g. fatalities on beer tax and beer consumption, or wages on gender and industry dummies). See Wooldridge.

2. Adding regressors to reduce error variance. Even if the coefficient estimates are unbiased (random treatments), we can benefit from adding regressors if they reduce the variance of the error term (see Wooldridge).

:Prediction

Suppose we wish to predict an out-of-sample observation, $y^0$, using our regression estimates. For our sample, we have the model $y = X\beta + u$. Assume that $y^0$ comes from the model $y^0 = x^0\beta + u^0$, where
$$\begin{pmatrix} u \\ u^0 \end{pmatrix} \sim N\left(0,\ \sigma^2 I_{n+1}\right)$$
The obvious predictor is just $\hat{y}^0 = x^0\hat{\beta}$ with $\hat{\beta} = (X'X)^{-1}X'y$. But $\hat{y}^0$ is just a linear combination of $\hat{\beta}$. Therefore
$$x^0\hat{\beta} \sim N\left(x^0\beta,\ \sigma^2 x^0 (X'X)^{-1} x^{0\prime}\right)$$

This result allows us to form a confidence interval for $x^0\beta$ using the t-distribution:
$$\frac{x^0\hat{\beta} - x^0\beta}{\sqrt{\hat{\sigma}^2\, x^0 (X'X)^{-1} x^{0\prime}}} \sim t_{n-K}$$
It is easy to generalize to the case where we want to predict several out-of-sample observations simultaneously. Then $x^0$ is a matrix and $y^0$ is a vector, but nothing else changes, except that we would use the F-distribution for a confidence ellipsoid.

A prediction interval for $y^0$ combines parameter uncertainty (coming from $\hat{\beta}$) with intrinsic uncertainty coming from the disturbance $u^0$. The prediction error is defined by
$$e^0 = y^0 - \hat{y}^0 = x^0(\beta - \hat{\beta}) + u^0$$
But both pieces are normal and independent, therefore
$$e^0 \sim N\left(0,\ \sigma^2\left[1 + x^0(X'X)^{-1}x^{0\prime}\right]\right)$$
Proceeding as above, we get an interval estimate for $y^0$ that is a mix of a confidence interval and a prediction interval. WARNING: Asymptotic theory gives a justification for using the MVN to approximate the distribution of $\hat{\beta}$, but to construct the prediction interval above we have to take seriously the small-sample distributional assumption for $u^0$.
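The two intervals can be computed directly from these formulas; here is a minimal numpy sketch with simulated data (the point $x^0$ and all other values are illustrative).

```python
# Confidence interval for x0*beta and prediction interval for y0, following the
# formulas above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - k)        # sigma^2 hat
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 0.3, -1.2])                       # out-of-sample regressor values
y0_hat = x0 @ beta_hat
tcrit = stats.t.ppf(0.975, n - k)

se_mean = np.sqrt(s2 * (x0 @ XtX_inv @ x0))           # uncertainty about x0*beta only
se_pred = np.sqrt(s2 * (1 + x0 @ XtX_inv @ x0))       # also includes u0

print("95% CI for x0*beta:", y0_hat - tcrit * se_mean, y0_hat + tcrit * se_mean)
print("95% prediction interval:", y0_hat - tcrit * se_pred, y0_hat + tcrit * se_pred)
```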

:Residual analysis

Which observations have the largest and smallest residuals $\hat{u}_i$? Looking at these residuals may suggest left-out variables. But a better approach is the "leave one out" regression residuals discussed in the lecture "Multiple Regression 1". To see why looking at $\hat{u}_i$ can be very misleading, consider the following example. Suppose the data on $(y, x)$ are $(2, -2)$, $(1, -1)$, $(0, 0)$, $(-1, 1)$, $(13, 2)$. The first four observations lie on the straight line $y = -x$. The fitted regression line is $\hat{y} = 3 + 2x$, and the OLS residuals are $3, 0, -3, -6, 6$. It looks like observations 4 and 5 are a bit strange, but it's really only observation 5 that is out of line. Examples can be constructed where the "leave one out" outlier isn't the largest OLS residual.
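The example is easy to reproduce; the sketch below also computes one version of the "leave one out" residual (here simply $y_i$ minus the fitted value from a regression that omits observation $i$), which isolates observation 5 much more clearly.

```python
# The five-observation example: OLS residuals versus leave-one-out residuals.
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([2.0, 1.0, 0.0, -1.0, 13.0])
X = np.column_stack([np.ones_like(x), x])

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)                     # approximately [3, 2]: the line yhat = 3 + 2x
print(y - X @ beta_hat)             # OLS residuals: 3, 0, -3, -6, 6

loo = []
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    loo.append(y[i] - X[i] @ b_i)
print(np.round(loo, 2))             # observation 5 now clearly has the largest residual
```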

Sometimes residuals are used to measure "value-added" after controlling for the quality of inputs. For example:
- Frontier production or cost functions
- School quality rankings (C.D. Howe, David Johnson)
- Law school rankings (see Wooldridge)
- Searching for "alpha" (expected returns in excess of compensation for risk)

:Predicting y when ln y is the dependent variable

When we regress
$$\ln y = X\beta + u$$
the coefficients measure $\partial E(\ln y \mid X)/\partial X$. But what if we are interested in $\partial E(y \mid X)/\partial X$?

Case 1. $u \sim N(0, \sigma^2)$. We can show that if $\ln y_i \sim N(\mu_i, \sigma_i^2)$ then $E(y_i) = \exp(\mu_i + \sigma_i^2/2)$. Therefore,
$$E(y_i \mid x_i) = e^{\sigma^2/2}\, e^{x_i\beta}, \qquad \frac{\partial E(y_i \mid x_i)}{\partial x_i} = \beta\, e^{x_i\beta}\, e^{\sigma^2/2}$$
Replacing the unknown parameters by the OLS estimators gives us a consistent estimate of the response.

Case 2. $E[\exp(u_i) \mid x_i] = \alpha_0$ (a constant). Then $E(y_i \mid x_i) = \alpha_0 \exp(x_i\beta)$, and
$$\frac{\partial E(y_i \mid x_i)}{\partial x_i} = \alpha_0\, \beta\, e^{x_i\beta}$$
We can estimate $\alpha_0$:
1. by estimating the regression model through the origin $y_i = \alpha_0 m_i + \varepsilon_i$, where $m_i = \exp(x_i\hat{\beta})$ and $\hat{\beta}$ is the OLS estimator; or
2. using the smearing estimate $\hat{\alpha}_0 = \frac{1}{n}\sum_i \exp(\hat{u}_i)$.

Remark: If $x_i$ contains variables that aren't continuous, then we should look at $\Delta E(y_i \mid x_i)$ (see Wooldridge Ex. 7.5).
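A rough sketch of both cases with simulated lognormal-style data (none of it from the notes): the normal-errors correction $e^{\hat{\sigma}^2/2}$, the smearing estimate of $\alpha_0$, and the regression-through-the-origin estimate.

```python
# Estimating the retransformation constant when ln(y) is the dependent variable.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 2)))
u = rng.normal(0, 0.5, n)
y = np.exp(X @ np.array([0.5, 1.0, -0.5]) + u)

res = sm.OLS(np.log(y), X).fit()
m = np.exp(res.fittedvalues)                     # m_i = exp(x_i * beta_hat)

case1 = np.exp(res.mse_resid / 2)                # Case 1: exp(sigma^2_hat / 2)
a0_smearing = np.mean(np.exp(res.resid))         # Case 2: smearing estimate
a0_origin = np.sum(y * m) / np.sum(m ** 2)       # Case 2: y on m through the origin

print(case1, a0_smearing, a0_origin)             # all close to E[exp(u)] = exp(0.125)
y_hat_level = a0_smearing * m                    # predicted level of y, smearing version
```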

:Choosing levels or logs for the dependent variable

Case 1. $u \sim N$ (Box-Cox). Replace $y$ with $y^* = y / y_g$, where $y_g$ is the geometric mean of $y$, i.e. $\ln y_g = \sum_i \ln y_i / n$. Run the two regressions
$$y^* = X\beta + u, \qquad \ln y^* = X\gamma + w$$
and choose the model that has the smallest value for $\hat{\sigma}^2$.

Case 2. $E[\exp(u_i) \mid x_i] = \alpha_0$ (a constant). Regress $y = X\beta + u$ and store the $R^2$. Then compute the fitted vector $\tilde{y}_i = \hat{\alpha}_0 \exp(x_i\hat{\gamma})$, where $\hat{\gamma}$ comes from the regression of $\ln y$ on $X$ and $\hat{\alpha}_0$ denotes either of the two estimators described in the section above, and calculate the squared correlation $r^2_{y\tilde{y}}$. If $R^2 > r^2_{y\tilde{y}}$, choose the level specification. Else, choose the log.
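A short sketch of this comparison with simulated data where the log specification is the true one (the smearing estimate from the previous section is used for $\hat{\alpha}_0$; all names are illustrative).

```python
# Levels versus logs for the dependent variable: compare the level-regression
# R-squared with the squared correlation between y and the log-based fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))
y = np.exp(X @ np.array([0.5, 0.8, -0.4]) + rng.normal(0, 0.4, n))   # log-linear truth

level = sm.OLS(y, X).fit()
logreg = sm.OLS(np.log(y), X).fit()

y_tilde = np.mean(np.exp(logreg.resid)) * np.exp(logreg.fittedvalues)
r2_log = np.corrcoef(y, y_tilde)[0, 1] ** 2

print(level.rsquared, r2_log)      # here the log specification should come out ahead
```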