Interpreting Regression Results

Carlo Favero

Interpreting Regression Results

Interpreting regression results is not a simple exercise. We propose to split the procedure into three steps. First, understand the relevance of your regression independently of inference on the parameters. There is an easy way to do this: suppose all parameters in the model are known and identical to the estimated values, and learn how to read them. Second, introduce a measure of sampling variability and evaluate again what you know, taking into account that the parameters are estimated and there is uncertainty surrounding your point estimates. Third, remember that each regression is run after a reduction process has been implemented, explicitly or implicitly. The relevant question is: what happens if something went wrong in the reduction process? What are the consequences of omitting relevant information, or of including irrelevant information, in your specification?

The relevance of a regression is different from the statistical significance of the estimated parameters. In fact, confusing the statistical significance of an estimated parameter describing the effect of a regressor on the dependent variable with the practical relevance of that effect is a rather common mistake in the use of the linear model. Statistical inference is a tool for estimating parameters in a probability model and assessing the amount of sampling variability. Statistics gives us an indication of what we can say about the values of the parameters in the model on the basis of our sample. The relevance of a regression is instead determined by the share of the unconditional variance of $y$ that is explained by the variance of $E(y \mid X)$. Measuring how large the share of the unconditional variance of $y$ explained by the regression function is constitutes the fundamental role of $R^2$.

The R-squared as a measure of relevance of a regression

To illustrate the point, let us consider two specific applications of the CAPM:

$$
\begin{aligned}
r^i_t - r^{rf}_t &= 0.8\,\sigma_m u_{m,t} + \sigma_i u_{i,t}, \\
r^m_t - r^{rf}_t &= \mu_m + \sigma_m u_{m,t}, \\
\begin{pmatrix} u_{i,t} \\ u_{m,t} \end{pmatrix} &\sim n.i.d.\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right],
\end{aligned}
$$

with $\mu_m = 0.0065$, $\sigma_m = 0.054$, $\sigma_1 = 0.09$, $\sigma_2 = 0.005$.

We simulate an artificial sample of 1056 observations (the same length as the sample July 1926-June 2014) for each process. $\mu_m$ and $\sigma_m$ are calibrated to match the first two moments of the market portfolio excess returns over the sample 1926:7-2014:7, while $\sigma_1$ and $\sigma_2$, the standard deviations of the idiosyncratic components of the two excess returns, are calibrated to deliver an $R^2$ in the CAPM regression of about 0.22 and 0.98, respectively.

The R-squared as a measure of relevance of a regression

Running the two CAPM regressions on the artificial sample gives:

TABLE 3.1: The estimation of the CAPM on artificial data

Dependent variable: $r^1_t - r^{rf}_t$
  Regressor: $r^m_t - r^{rf}_t$    Coefficient: 0.875    t-ratio: 17.48    Prob.: 0.000
  $R^2$ = 0.22    S.E. of regression = 0.0076

Dependent variable: $r^2_t - r^{rf}_t$
  Regressor: $r^m_t - r^{rf}_t$    Coefficient: 0.793    t-ratio: 201.86    Prob.: 0.000
  $R^2$ = 0.972    S.E. of regression = 0.0000

In both cases the estimated betas are statistically significant and very close to their true value of 0.8.
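As a concrete illustration, here is a minimal simulation sketch of the experiment above in Python (the slides do not specify any software, so the library choice and the seed are our own assumptions; the estimates and $R^2$ will be close to, but not exactly, the values in Table 3.1).

```python
# A minimal sketch of the simulation experiment, assuming the DGP on the
# previous slide. Seed and libraries are illustrative choices.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
T = 1056                          # July 1926 - June 2014, monthly
mu_m, sigma_m = 0.0065, 0.054     # market excess-return moments
sigma_1, sigma_2 = 0.09, 0.005    # idiosyncratic volatilities
beta = 0.8                        # true beta for both assets

u_m = rng.standard_normal(T)
rx_m = mu_m + sigma_m * u_m       # market excess return

for sigma_i in (sigma_1, sigma_2):
    u_i = rng.standard_normal(T)
    rx_i = beta * sigma_m * u_m + sigma_i * u_i  # asset excess return
    res = sm.OLS(rx_i, rx_m).fit()               # CAPM regression, no constant
    print(f"sigma_i={sigma_i}: beta_hat={res.params[0]:.3f}, "
          f"t={res.tvalues[0]:.2f}, R2={res.rsquared:.3f}")
```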

The R-squared as a measure of relevance of a regression

Simulate again the processes, but introduce at some point a temporary shift of two per cent in the excess returns on the market portfolio.

[Figure: simulated market portfolio, portfolio 1, and portfolio 2 excess returns, 1950-1960, baseline versus alternative simulation.]

In both experiments the conditional expectation changes by the same amount, but the share of the unconditional variance of $y$ explained by the regression is very different in the two cases.

Inference in the Linear Regression Model

Users of econometric models in finance attribute high priority to the concept of "statistical significance" of their estimates. In the standard statistical jargon, an estimate of a parameter is statistically significant if its estimated value, compared with its sampling standard deviation, makes it unlikely that in other samples the estimate would change sign. In the linear regression model the statistical index most used is the t-ratio, and the significance of an estimated parameter is usually measured in terms of its P-value: the probability, under the null hypothesis that the coefficient is zero, of observing a statistic at least as extreme as the one obtained. In the previous section we discussed the common confusion between statistical significance and relevance. In this section we illustrate the basic principles that allow us to evaluate statistical significance and to perform tests of relevant hypotheses on the estimated coefficients of a linear model.

Elements of distribution theory

We consider the distribution of a generic $n$-dimensional vector $\mathbf{z}$, together with the derived distribution of the vector $\mathbf{x} = g(\mathbf{z})$, which admits the inverse $\mathbf{z} = h(\mathbf{x})$, with $h = g^{-1}$. If

$$\operatorname{prob}(z_1 < z < z_2) = \int_{z_1}^{z_2} f(z)\,dz, \qquad \operatorname{prob}(x_1 < x < x_2) = \int_{x_1}^{x_2} f(x)\,dx,$$

then

$$f(\mathbf{x}) = f(h(\mathbf{x}))\,|J|, \qquad
J = \begin{vmatrix}
\dfrac{\partial h_1}{\partial x_1} & \cdots & \dfrac{\partial h_n}{\partial x_1} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial h_1}{\partial x_n} & \cdots & \dfrac{\partial h_n}{\partial x_n}
\end{vmatrix}
= \left| \frac{\partial h}{\partial \mathbf{x}} \right|.$$

The normal distribution

The standardized univariate normal has density

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right), \qquad E(z) = 0, \quad \operatorname{var}(z) = 1.$$

By considering the transformation $x = \sigma z + \mu$, we derive the density of the general univariate normal as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad E(x) = \mu, \quad \operatorname{var}(x) = \sigma^2.$$

The normal multivariate distribution

Consider now the vector $\mathbf{z} = (z_1, z_2, \ldots, z_n)'$, such that

$$f(\mathbf{z}) = \prod_{i=1}^{n} f(z_i) = (2\pi)^{-\frac{n}{2}} \exp\left(-\frac{1}{2}\mathbf{z}'\mathbf{z}\right).$$

$\mathbf{z}$ is, by construction, a vector of independent normal variables with zero mean and identity variance-covariance matrix. The conventional notation is $\mathbf{z} \sim N(0, I_n)$.

The normal multivariate distribution

Consider now the linear transformation $\mathbf{x} = A\mathbf{z} + \boldsymbol{\mu}$, where $A$ is an $(n \times n)$ invertible matrix, and the inverse transformation $\mathbf{z} = A^{-1}(\mathbf{x} - \boldsymbol{\mu})$ with Jacobian $J = |A^{-1}| = 1/|A|$. By applying the formula for the transformation of variables, we have

$$f(\mathbf{x}) = (2\pi)^{-\frac{n}{2}}\,|A^{-1}| \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'A^{-1\prime}A^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),$$

which, by defining the positive definite matrix $\Sigma = AA'$, equals

$$f(\mathbf{x}) = (2\pi)^{-\frac{n}{2}}\,|\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).$$

The conventional notation for the multivariate normal is $\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma)$.

The transformation of the normal multivariate

The formula for the transformation of variables allows us to better understand the theorem introduced in a previous section of this chapter.

Theorem. For any $\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma)$, given any $(m \times n)$ matrix $B$ and any $(m \times 1)$ vector $\mathbf{d}$, if $\mathbf{y} = B\mathbf{x} + \mathbf{d}$, then $\mathbf{y} \sim N(B\boldsymbol{\mu} + \mathbf{d},\; B\Sigma B')$.

Consider a partitioning of an $n$-variate normal vector into two sub-vectors of dimensions $n_1$ and $n - n_1$:

$$\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix} \sim N\left[ \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right].$$

By applying the formula for the transformation of variables, we obtain two results:

1. $\mathbf{x}_1 \sim N(\boldsymbol{\mu}_1, \Sigma_{11})$, which follows from applying the general formula with $\mathbf{d} = 0$, $B = (I_{n_1} \;\; 0)$;
2. $(\mathbf{x}_1 \mid \mathbf{x}_2) \sim N\left(\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$, which is the conditional distribution of $\mathbf{x}_1$ given $\mathbf{x}_2$.

Distributions derived from the normal

Consider $\mathbf{z} \sim N(0, I_n)$, an $n$-variate standard normal. The distribution of $\omega = \mathbf{z}'\mathbf{z}$ is defined as a $\chi^2(n)$ distribution, with $n$ degrees of freedom. Consider two vectors $\mathbf{z}_1$ and $\mathbf{z}_2$, of dimensions $n_1$ and $n_2$ respectively, with the following distribution:

$$\begin{pmatrix} \mathbf{z}_1 \\ \mathbf{z}_2 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} I_{n_1} & 0 \\ 0 & I_{n_2} \end{pmatrix} \right].$$

We have $\omega_1 = \mathbf{z}_1'\mathbf{z}_1 \sim \chi^2(n_1)$, $\omega_2 = \mathbf{z}_2'\mathbf{z}_2 \sim \chi^2(n_2)$, and $\omega_1 + \omega_2 = \mathbf{z}_1'\mathbf{z}_1 + \mathbf{z}_2'\mathbf{z}_2 \sim \chi^2(n_1 + n_2)$. In general, the sum of two independent $\chi^2$ random variables is itself distributed as a $\chi^2$, with degrees of freedom equal to the sum of the degrees of freedom of the two $\chi^2$.

Distributions derived from the normal

Our discussion of the multivariate normal implies that if $\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma)$, then $(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \sim \chi^2(n)$. A related result establishes that if $\mathbf{z} \sim N(0, I_n)$ and $M$ is a symmetric idempotent $(n \times n)$ matrix of rank $r$, then $\mathbf{z}'M\mathbf{z} \sim \chi^2(r)$. Another distribution related to the normal is the F-distribution, obtained as the ratio of two independent $\chi^2$ random variables, each divided by its degrees of freedom. Given independent $\omega_1 \sim \chi^2(n_1)$ and $\omega_2 \sim \chi^2(n_2)$, we have

$$\frac{\omega_1/n_1}{\omega_2/n_2} \sim F(n_1, n_2).$$
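The result $\mathbf{z}'M\mathbf{z} \sim \chi^2(r)$ is easy to check by Monte Carlo. The sketch below is our own construction: the projection matrix, dimensions, and seed are illustrative.

```python
# Monte Carlo check of z'Mz ~ chi2(r) for a symmetric idempotent M of rank r.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, r, reps = 10, 4, 50_000

# Projection onto the column space of a random n x r matrix:
# symmetric, idempotent, rank r.
X = rng.standard_normal((n, r))
M = X @ np.linalg.inv(X.T @ X) @ X.T

z = rng.standard_normal((reps, n))
q = np.einsum("ij,jk,ik->i", z, M, z)   # z'Mz for each replication

print("sample mean:", q.mean(), "(chi2(r) mean = r =", r, ")")
print("KS test vs chi2(4), p-value:", stats.kstest(q, stats.chi2(df=r).cdf).pvalue)
```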

Distributions derived from the normal

The Student's t distribution is then defined by the relation $t_n = \sqrt{F(1,n)}$: the square of a $t(n)$ random variable is distributed as $F(1,n)$. Another useful result establishes that two quadratic forms in the standard multivariate normal, $\mathbf{z}'M\mathbf{z}$ and $\mathbf{z}'Q\mathbf{z}$, are independent if $MQ = 0$. We can finally state the following theorem, which is fundamental to statistical inference in the linear model:

Theorem. If $\mathbf{z} \sim N(0, I_n)$, and $M$ and $Q$ are symmetric idempotent matrices of ranks $r$ and $s$ respectively with $MQ = 0$, then

$$\frac{\mathbf{z}'Q\mathbf{z}/s}{\mathbf{z}'M\mathbf{z}/r} \sim F(s, r).$$

The conditional distribution of y given X

To perform inference in the linear regression model, we need a further hypothesis specifying the distribution of $\mathbf{y}$ conditional upon $X$:

$$(\mathbf{y} \mid X) \sim N(X\boldsymbol{\beta},\, \sigma^2 I), \tag{1}$$

or, equivalently,

$$(\boldsymbol{\epsilon} \mid X) \sim N(0,\, \sigma^2 I). \tag{2}$$

Given (1), we can immediately derive the distribution of $(\hat{\boldsymbol{\beta}} \mid X)$ which, being a linear combination of a normal distribution, is also normal:

$$(\hat{\boldsymbol{\beta}} \mid X) \sim N\left(\boldsymbol{\beta},\, \sigma^2 (X'X)^{-1}\right). \tag{3}$$

The conditional distribution of y given X

Equation (3) constitutes the basis for constructing confidence intervals and performing hypothesis testing in the linear regression model. Consider the following expression:

$$\frac{(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'X'X(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})}{\sigma^2} = \frac{\boldsymbol{\epsilon}'X(X'X)^{-1}X'X(X'X)^{-1}X'\boldsymbol{\epsilon}}{\sigma^2} = \frac{\boldsymbol{\epsilon}'Q\boldsymbol{\epsilon}}{\sigma^2}, \qquad Q = X(X'X)^{-1}X',$$

and, applying the results derived in the previous section, we know that

$$\left(\frac{\boldsymbol{\epsilon}'Q\boldsymbol{\epsilon}}{\sigma^2}\,\Big|\,X\right) \sim \chi^2(k). \tag{4}$$

The conditional distribution of y given X

Equation (4) is not useful in practice, as we do not know $\sigma^2$. However, we know that

$$\left(\frac{S(\hat{\boldsymbol{\beta}})}{\sigma^2}\,\Big|\,X\right) = \left(\frac{\boldsymbol{\epsilon}'M\boldsymbol{\epsilon}}{\sigma^2}\,\Big|\,X\right) \sim \chi^2(T-k), \tag{5}$$

$$M = I - Q. \tag{6}$$

Since $MQ = 0$, we know the distribution of the ratio of (4) and (5); moreover, by taking the ratio we get rid of the unknown term $\sigma^2$:

$$\frac{(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'X'X(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})/\sigma^2}{s^2/\sigma^2} = \frac{\boldsymbol{\epsilon}'Q\boldsymbol{\epsilon}}{\boldsymbol{\epsilon}'M\boldsymbol{\epsilon}}\,(T-k) \sim k\,F(k, T-k). \tag{7}$$

Clicker 6

Insert Clicker 6 here

Confidence Intervals for $\beta$

We use result (7) to obtain from the tables of the F-distribution the critical value $F_\alpha(k, T-k)$ such that

$$\operatorname{prob}\left[F(k, T-k) > F_\alpha(k, T-k)\right] = \alpha, \qquad 0 < \alpha < 1.$$

For different values of $\alpha$ we are then in a position to evaluate exactly an inequality of the following form:

$$\operatorname{prob}\left\{(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'X'X(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \leq k s^2 F_\alpha(k, T-k)\right\} = 1 - \alpha,$$

which defines (ellipsoidal) confidence regions for $\boldsymbol{\beta}$ centred upon $\hat{\boldsymbol{\beta}}$.
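A minimal sketch of how the region is used in practice, assuming the classical hypotheses and illustrative simulated data: a candidate $\boldsymbol{\beta}^0$ lies inside the $(1-\alpha)$ region if and only if $(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^0)'X'X(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^0) \leq k s^2 F_\alpha(k, T-k)$.

```python
# Sketch of the joint confidence region in (7); data and names are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T, k = 200, 2
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + 0.8 * rng.standard_normal(T)

b = np.linalg.solve(X.T @ X, X.T @ y)    # OLS estimate
resid = y - X @ b
s2 = resid @ resid / (T - k)             # s^2
F_crit = stats.f.ppf(0.95, k, T - k)     # F_alpha(k, T-k), alpha = 0.05

def in_region(beta_0):
    """True iff beta_0 lies inside the 95% confidence ellipsoid."""
    d = b - beta_0
    return d @ (X.T @ X) @ d <= k * s2 * F_crit

print(in_region(beta_true))   # True with probability 0.95
```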

Hypothesis Testing

Hypothesis testing is strictly linked to the derivation of confidence intervals. When testing a hypothesis, we aim at rejecting the validity of restrictions imposed on the model on the basis of the sample evidence. Within this framework, (1)-(3) are the maintained hypothesis, and the restricted version of the model is identified with the null hypothesis $H_0$. Following the Neyman-Pearson approach to hypothesis testing, one derives a statistic with a known distribution under the null; the probability of a type-I error (rejecting $H_0$ when it is true) is then fixed at $\alpha$. For example, a test at level $\alpha$ of the null hypothesis $\boldsymbol{\beta} = \boldsymbol{\beta}^0$, based on the F-statistic, does not reject $H_0$ if $\boldsymbol{\beta}^0$ lies within the confidence region associated with probability $1-\alpha$. However, in practice this is not a very useful way of proceeding, as the economic hypotheses of interest rarely involve a number of restrictions equal to the number of estimated parameters.

Hypothesis Testing

The general case of interest is therefore the one in which we have $r$ restrictions on the vector of parameters, with $r < k$. If we limit our attention to the class of linear restrictions, we can express them as $H_0: R\boldsymbol{\beta} = \mathbf{r}$, where $R$ is an $(r \times k)$ matrix of constants with rank $r$ and $\mathbf{r}$ is an $(r \times 1)$ vector of constants. To illustrate how $R$ and $\mathbf{r}$ are constructed, consider the baseline case of the CAPM; we want to impose the restriction $\beta_{0,i} = 0$ on the following specification:

$$r^i_t - r^{rf}_t = \beta_{0,i} + \beta_{1,i}\left(r^m_t - r^{rf}_t\right) + u_{i,t}, \tag{8}$$

$$R\boldsymbol{\beta} = \mathbf{r}, \qquad \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} \beta_{0,i} \\ \beta_{1,i} \end{pmatrix} = (0).$$

The distribution of a known statistic under the null is derived by applying known results.

Hypothesis Testing

If $(\hat{\boldsymbol{\beta}} \mid X) \sim N\left(\boldsymbol{\beta}, \sigma^2(X'X)^{-1}\right)$, then

$$(R\hat{\boldsymbol{\beta}} - \mathbf{r} \mid X) \sim N\left(R\boldsymbol{\beta} - \mathbf{r},\; \sigma^2 R(X'X)^{-1}R'\right). \tag{9}$$

The test is constructed by deriving the distribution of (9) under the null $R\boldsymbol{\beta} - \mathbf{r} = 0$. Given that

$$(R\hat{\boldsymbol{\beta}} - \mathbf{r} \mid X) = R\boldsymbol{\beta} - \mathbf{r} + R(X'X)^{-1}X'\boldsymbol{\epsilon},$$

under $H_0$ we have

$$(R\hat{\boldsymbol{\beta}} - \mathbf{r})'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\boldsymbol{\beta}} - \mathbf{r}) = \boldsymbol{\epsilon}'X(X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}R(X'X)^{-1}X'\boldsymbol{\epsilon} = \boldsymbol{\epsilon}'P\boldsymbol{\epsilon},$$

where $P$ is a symmetric idempotent matrix of rank $r$, orthogonal to $M$.

Hypothesis Testing

Then, under $H_0$,

$$\frac{(R\hat{\boldsymbol{\beta}} - \mathbf{r})'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\boldsymbol{\beta}} - \mathbf{r})}{r s^2} \sim F(r, T-k),$$

which can be used to test the relevant hypothesis.
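A sketch of this F-test on simulated data, with the CAPM restriction of (8) encoded as $R = (1 \;\; 0)$, $\mathbf{r} = (0)$; the data-generating numbers below are illustrative.

```python
# Sketch of the F-test of H0: R beta = r; simulated CAPM-style data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T, k = 1056, 2
rx_m = 0.0065 + 0.054 * rng.standard_normal(T)            # market excess return
rx_i = 0.0 + 0.8 * rx_m + 0.05 * rng.standard_normal(T)   # H0 true: intercept 0

X = np.column_stack([np.ones(T), rx_m])
b = np.linalg.solve(X.T @ X, X.T @ rx_i)
resid = rx_i - X @ b
s2 = resid @ resid / (T - k)

R = np.array([[1.0, 0.0]])            # one restriction: intercept = 0
r_vec = np.array([0.0])
q = R.shape[0]                        # number of restrictions

d = R @ b - r_vec
middle = np.linalg.inv(R @ np.linalg.inv(X.T @ X) @ R.T)
F = (d @ middle @ d) / (q * s2)
print(f"F = {F:.3f}, p-value = {stats.f.sf(F, q, T - k):.3f}")
```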

Clicker 7

Insert Clicker 7 here

The Partitioned Regression Model

Given the linear model

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

partition $X$ into two blocks of dimensions $(T \times r)$ and $(T \times (k-r))$, and partition $\boldsymbol{\beta}$ correspondingly into $\begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix}$. The partitioned regression model can then be written as

$$\mathbf{y} = X_1\boldsymbol{\beta}_1 + X_2\boldsymbol{\beta}_2 + \boldsymbol{\epsilon}.$$

The Partitioned Regression Model

It is useful to derive the formula for the OLS estimator in the partitioned regression model. To obtain this result, we partition the normal equations $X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y}$ as

$$\begin{pmatrix} X_1' \\ X_2' \end{pmatrix} \begin{pmatrix} X_1 & X_2 \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}}_1 \\ \hat{\boldsymbol{\beta}}_2 \end{pmatrix} = \begin{pmatrix} X_1' \\ X_2' \end{pmatrix} \mathbf{y},$$

or, equivalently,

$$\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}}_1 \\ \hat{\boldsymbol{\beta}}_2 \end{pmatrix} = \begin{pmatrix} X_1'\mathbf{y} \\ X_2'\mathbf{y} \end{pmatrix}. \tag{10}$$

The Partitioned Regression Model

System (10) can be solved in two stages: first, derive an expression for $\hat{\boldsymbol{\beta}}_2$,

$$\hat{\boldsymbol{\beta}}_2 = (X_2'X_2)^{-1}\left(X_2'\mathbf{y} - X_2'X_1\hat{\boldsymbol{\beta}}_1\right);$$

then substitute it into the first equation of (10), obtaining

$$X_1'X_1\hat{\boldsymbol{\beta}}_1 + X_1'X_2(X_2'X_2)^{-1}\left(X_2'\mathbf{y} - X_2'X_1\hat{\boldsymbol{\beta}}_1\right) = X_1'\mathbf{y},$$

from which

$$\hat{\boldsymbol{\beta}}_1 = (X_1'M_2X_1)^{-1}X_1'M_2\mathbf{y}, \qquad M_2 = I - X_2(X_2'X_2)^{-1}X_2'.$$

The Partitioned Regression Model

Note that, as $M_2$ is symmetric and idempotent, we can also write

$$\hat{\boldsymbol{\beta}}_1 = \left(X_1'M_2'M_2X_1\right)^{-1}X_1'M_2'M_2\mathbf{y},$$

and $\hat{\boldsymbol{\beta}}_1$ can be interpreted as the vector of OLS coefficients of the regression of $\mathbf{y}$ on the matrix of residuals of the regression of $X_1$ on $X_2$. Thus, an OLS regression on two sets of regressors is equivalent to a sequence of OLS regressions each involving a single set (the Frisch-Waugh theorem).
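The equivalence is easy to verify numerically. The sketch below (our own construction, with illustrative data) compares the coefficient on $X_1$ from the full regression with the coefficient from the regression of $\mathbf{y}$ on the residuals of $X_1$ on $X_2$.

```python
# Numerical illustration of the Frisch-Waugh result; data are illustrative.
import numpy as np

rng = np.random.default_rng(3)
T = 500
x2 = rng.standard_normal((T, 2))
x1 = 0.5 * x2[:, [0]] + rng.standard_normal((T, 1))  # X1 correlated with X2
y = x1 @ [1.5] + x2 @ [0.7, -0.3] + rng.standard_normal(T)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full regression of y on [X1, X2]
b_full = ols(np.hstack([x1, x2]), y)

# Partial out X2 from X1, then regress y on the residuals
M2 = np.eye(T) - x2 @ np.linalg.inv(x2.T @ x2) @ x2.T
b_fwl = ols(M2 @ x1, y)

print(b_full[0], b_fwl[0])   # identical up to floating-point error
```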

The Partitioned Regression Model

Finally, consider the residuals of the partitioned model:

$$
\begin{aligned}
\hat{\boldsymbol{\epsilon}} &= \mathbf{y} - X_1\hat{\boldsymbol{\beta}}_1 - X_2\hat{\boldsymbol{\beta}}_2 \\
&= \mathbf{y} - X_1\hat{\boldsymbol{\beta}}_1 - X_2(X_2'X_2)^{-1}\left(X_2'\mathbf{y} - X_2'X_1\hat{\boldsymbol{\beta}}_1\right) \\
&= M_2\mathbf{y} - M_2X_1\hat{\boldsymbol{\beta}}_1 \\
&= M_2\mathbf{y} - M_2X_1(X_1'M_2X_1)^{-1}X_1'M_2\mathbf{y} \\
&= \left(M_2 - M_2X_1(X_1'M_2X_1)^{-1}X_1'M_2\right)\mathbf{y}.
\end{aligned}
$$

However, we already know that $\hat{\boldsymbol{\epsilon}} = M\mathbf{y}$; therefore,

$$M = M_2 - M_2X_1(X_1'M_2X_1)^{-1}X_1'M_2. \tag{11}$$

Testing restrictions on a subset of coefficients

In the general framework to test linear restrictions, set $\mathbf{r} = 0$, $R = (I_r \;\; 0)$, and partition $\boldsymbol{\beta}$ correspondingly into $\begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix}$. In this case the restriction $R\boldsymbol{\beta} - \mathbf{r} = 0$ is equivalent to $\boldsymbol{\beta}_1 = 0$ in the partitioned regression model. Under $H_0$, $X_1$ has no additional explanatory power for $\mathbf{y}$ with respect to $X_2$; therefore

$$H_0: \mathbf{y} = X_2\boldsymbol{\beta}_2 + \boldsymbol{\epsilon}, \qquad (\boldsymbol{\epsilon} \mid X_1, X_2) \sim N(0, \sigma^2 I).$$

Note that the statement

$$\mathbf{y} = X_2\boldsymbol{\gamma}_2 + \boldsymbol{\epsilon}, \qquad (\boldsymbol{\epsilon} \mid X_2) \sim N(0, \sigma^2 I),$$

is always true under our maintained hypotheses; however, in general $\boldsymbol{\gamma}_2 \neq \boldsymbol{\beta}_2$.

Testing restrictions on a subset of coefficients

To derive a statistic to test $H_0$, remember that with $R = (I_r \;\; 0)$ the matrix $R(X'X)^{-1}R'$ is the upper-left block of $(X'X)^{-1}$, which we can now write as $(X_1'M_2X_1)^{-1}$. The statistic then takes the form

$$\frac{\hat{\boldsymbol{\beta}}_1'(X_1'M_2X_1)\hat{\boldsymbol{\beta}}_1}{r s^2} = \frac{\mathbf{y}'M_2X_1(X_1'M_2X_1)^{-1}X_1'M_2\mathbf{y}}{\mathbf{y}'M\mathbf{y}} \cdot \frac{T-k}{r} \sim F(r, T-k).$$

Given (11), this can be rewritten as

$$\frac{\mathbf{y}'M_2\mathbf{y} - \mathbf{y}'M\mathbf{y}}{\mathbf{y}'M\mathbf{y}} \cdot \frac{T-k}{r} \sim F(r, T-k), \tag{12}$$

where the denominator is the sum of squared residuals in the unconstrained model, while the numerator is the difference between the sum of squared residuals in the constrained model and the sum of squared residuals in the unconstrained model.
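Equation (12) is the familiar restricted-versus-unrestricted F-test. A minimal sketch on simulated data (the design and names are illustrative):

```python
# Sketch of the restricted-vs-unrestricted F-test in (12).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T, r = 300, 2                       # r coefficients under test
X1 = rng.standard_normal((T, r))    # regressors whose joint significance we test
X2 = np.column_stack([np.ones(T), rng.standard_normal(T)])
k = r + X2.shape[1]
y = X2 @ [0.5, 1.0] + rng.standard_normal(T)   # H0 true: beta_1 = 0

def ssr(X, y):
    """Sum of squared OLS residuals of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

ssr_u = ssr(np.hstack([X1, X2]), y)   # unconstrained: y on [X1, X2]
ssr_r = ssr(X2, y)                    # constrained: y on X2 only

F = (ssr_r - ssr_u) / ssr_u * (T - k) / r
print(f"F = {F:.3f}, p-value = {stats.f.sf(F, r, T - k):.3f}")
```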

Testing restrictions on a subset of coefficients

Consider the limiting case in which $r = 1$ and $\beta_1$ is a scalar. The F-statistic takes the form

$$\frac{\hat{\beta}_1^2\,(X_1'M_2X_1)}{s^2} \sim F(1, T-k) \quad \text{under } H_0,$$

where $(X_1'M_2X_1)^{-1}$ is element $(1,1)$ of the matrix $(X'X)^{-1}$. Using the result on the relation between the F and the Student's t distribution,

$$\frac{\hat{\beta}_1}{s\,(X_1'M_2X_1)^{-1/2}} \sim t(T-k) \quad \text{under } H_0.$$

Therefore, an immediate test of significance of a coefficient can be performed by taking the ratio of each estimated coefficient to its associated standard error.

The partial regression theorem

The Frisch-Waugh theorem described above deserves further consideration. The theorem tells us that any given regression coefficient in the model $E(\mathbf{y} \mid X) = X\boldsymbol{\beta}$ can be computed in two different but exactly equivalent ways: (1) by regressing $\mathbf{y}$ on all the columns of $X$; or (2) by first regressing the $j$-th column of $X$ on all the other columns of $X$, computing the residuals of this regression, and then regressing $\mathbf{y}$ on these residuals. This result is relevant in that it clarifies that the relationships pinned down by the estimated parameters in a linear model do not describe the connections between the regressand and each regressor, but rather the connection between the regressand and the part of each regressor that is not explained by the other regressors.

What if analysis

The relevant question in this case becomes: how much will $y$ change if I change $X_i$? The estimation of a single-equation linear model does not allow us to answer that question, for a number of reasons. First, estimated parameters in a linear model can only answer the question: how much will $E(\mathbf{y} \mid X)$ change if I change $X$? We have seen that the two questions are very different if the $R^2$ of the regression is low; in that case a change in $E(\mathbf{y} \mid X)$ may not have any visible and relevant effect on $\mathbf{y}$. Second, a regression model is a conditional expectation GIVEN $X$; in this sense there is no room for changing the value of any element of $X$.

What if analysis

Any statement involving such a change requires some assumption on how the conditional expectation of $\mathbf{y}$ changes if $X$ changes, and a correct analysis of this requires an assumption on the joint distribution of $\mathbf{y}$ and $X$. Simulation might require the use of the multivariate joint model even when valid estimation can be performed by concentrating only on the conditional model: simulation requires strong exogeneity, a condition stronger than the weak exogeneity that is sufficient for estimation of the parameters of interest.

What if analysis

Think of a linear model with known parameters:

$$y = \beta_1 x_1 + \beta_2 x_2.$$

What is, in this model, the effect on $y$ of changing $x_1$ by one unit while keeping $x_2$ constant? Easy: $\beta_1$. Now think of the estimated linear model:

$$y = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{u}.$$

Now $y$ is different from $E(y \mid X)$, and the question "what is, in this model, the effect on $E(y \mid X)$ of changing $x_1$ by one unit while keeping $x_2$ constant?" does not, in general, make sense.

Clicker 8

Insert Clicker 8 here

What if analysis

Changing $x_1$ while keeping $x_2$ unaltered implies that there is zero correlation between these variables. But the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are obtained using data in which, in general, there is some correlation between $x_1$ and $x_2$. Data in which fluctuations in $x_1$ did not have any effect on $x_2$ would most likely have generated estimates different from those obtained in the estimation sample. The only valid question that can be answered using the coefficients of a linear regression is: "What is the effect on $E(\mathbf{y} \mid X)$ of changing the part of each regressor that is orthogonal to the other ones?" "What if" analysis requires simulation and, in most cases, a lower level of reduction than that used for regression analysis.

The semi-partial R-squared

When the columns of $X$ are orthogonal to each other, the total $R^2$ can be exactly decomposed into the sum of the partial $R^2$ due to each regressor $x_i$ (the partial $R^2$ of a regressor $i$ is defined as the $R^2$ of the regression of $\mathbf{y}$ on $x_i$). This is in general not the case in applications with non-experimental data: the columns of $X$ are correlated, and an (often large) part of the overall $R^2$ depends on the joint behaviour of the columns of $X$. However, it is always possible to compute the marginal contribution to the overall $R^2$ due to each regressor $x_i$, defined as the difference between the overall $R^2$ and the $R^2$ of the regression that includes all columns of $X$ except $x_i$. This is called the semi-partial $R^2$.

The semi-partial R-squared

Interestingly, the semi-partial $R^2$ is a simple transformation of the t-ratio:

$$spR^2_i = t^2_{\hat{\beta}_i}\,\frac{1 - R^2}{T - k}.$$

This result has two interesting implications. First, a quantity which we considered as just a measure of statistical reliability can lead to a measure of relevance when combined with the overall $R^2$ of the regression. Second, we can re-iterate the difference between statistical significance and relevance. Suppose you have a sample size of 10000, you have 10 columns in $X$, and the t-ratio on a coefficient $\beta_i$ is about 4, with an associated P-value well below 0.01: very statistically significant! The formula for the semi-partial $R^2$ tells us that the contribution of this variable to the overall $R^2$ is at most approximately $16/(10000-10)$, that is, less than two thousandths.
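The identity is easy to verify numerically. A minimal sketch, with illustrative data, computing the semi-partial $R^2$ both directly and through the t-ratio formula:

```python
# Check of spR2_i = t_i^2 (1 - R2)/(T - k) on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 1000
X = sm.add_constant(rng.standard_normal((T, 3)))
y = X @ [1.0, 0.4, -0.2, 0.1] + rng.standard_normal(T)

full = sm.OLS(y, X).fit()
i = 1                                    # regressor whose contribution we measure
drop = sm.OLS(y, np.delete(X, i, axis=1)).fit()

sp_direct = full.rsquared - drop.rsquared
sp_formula = full.tvalues[i] ** 2 * (1 - full.rsquared) / (T - X.shape[1])
print(sp_direct, sp_formula)             # equal up to floating-point error
```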

Clicker 9

Insert Clicker 9 here