Analysis of Cross-Sectional Data

Analysis of Cross-Sectional Data
Kevin Sheppard
http://www.kevinsheppard.com
Oxford MFE
This version: November 8, 2017
November 13-14, 2017

Outline

- Econometric models
  - Specifications that can be analyzed with linear regression
  - Factor models for returns
- Estimation of unknown parameters
- Assessing model fit
- The relationship between OLS and ML
- Testing hypotheses: Wald, LR and LM
- Model building
- Checking for assumption violation

Large Sample Properties

Theorem (Asymptotic Distribution of $\hat\beta$)
Under these assumptions,
$$\sqrt{n}(\hat\beta_n - \beta) \xrightarrow{d} N\left(0, \Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}\right) \qquad (1)$$
where $\Sigma_{XX} = E[x_i' x_i]$ and $S = V[n^{-1/2} X'\epsilon]$.

- The CLT is a strong result that will form the basis of the inference we can make on $\beta$.
- What good is a CLT?

Estimating the parameter covariance

Before making inference, the covariance of $\sqrt{n}\,\hat\beta$ must be estimated.

Theorem (Asymptotic Covariance Consistency)
Under the large sample assumptions,
$$\hat\Sigma_{XX} = n^{-1} X'X \xrightarrow{p} \Sigma_{XX}$$
$$\hat S = n^{-1}\sum_{i=1}^{n}\hat\epsilon_i^2 x_i' x_i = n^{-1} X'\hat E X \xrightarrow{p} S$$
and
$$\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1} \xrightarrow{p} \Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}$$
where $\hat E = \mathrm{diag}(\hat\epsilon_1^2, \ldots, \hat\epsilon_n^2)$.

Bootstrap Estimation of Parameter Covariance

Two common bootstraps for estimating the regression parameter covariance:
1. Residual bootstrap
   - Appropriate when data are conditionally homoskedastic
   - Selects $x_i$ and $\hat\epsilon_i$ separately when constructing the bootstrap $\tilde y_i$
2. Non-parametric bootstrap
   - Works under more general conditions
   - Resamples $\{y_i, x_i\}$ as a pair

Both are for data where the errors are not cross-sectionally correlated.

The Bootstrap Algorithm for homoskedastic data

Algorithm (Residual Bootstrap Regression Covariance)
1. Generate two sets of $n$ uniform integers $\{u_{1,i}\}_{i=1}^{n}$ and $\{u_{2,i}\}_{i=1}^{n}$ on $[1, 2, \ldots, n]$.
2. Construct a simulated sample $\tilde y_i = x_{u_{1,i}}\hat\beta + \hat\epsilon_{u_{2,i}}$.
3. Estimate the parameters of interest using $\tilde y_i = x_{u_{1,i}}\beta + \epsilon_i$, and denote the estimate $\tilde\beta_b$.
4. Repeat steps 1 through 3 a total of $B$ times.
5. Estimate the variance of $\hat\beta$ using
$$\hat V[\hat\beta] = B^{-1}\sum_{b=1}^{B}\left(\tilde\beta_b - \hat\beta\right)\left(\tilde\beta_b - \hat\beta\right)' \quad\text{or}\quad \hat V[\hat\beta] = B^{-1}\sum_{b=1}^{B}\left(\tilde\beta_b - \bar{\tilde\beta}\right)\left(\tilde\beta_b - \bar{\tilde\beta}\right)'$$
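A minimal numpy sketch of this algorithm; the data generating process, sample size and B = 1000 replications are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Residual bootstrap covariance: resample x and residuals separately,
# as appropriate for conditionally homoskedastic data.
rng = np.random.default_rng(0)
n, k, B = 500, 3, 1000
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, k - 1))])
y = X @ np.array([0.1, 1.0, 0.5]) + rng.standard_normal(n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

beta_b = np.empty((B, k))
for b in range(B):
    u1 = rng.integers(0, n, n)          # indices for the regressors
    u2 = rng.integers(0, n, n)          # separate indices for the residuals
    y_tilde = X[u1] @ beta_hat + resid[u2]
    beta_b[b] = np.linalg.lstsq(X[u1], y_tilde, rcond=None)[0]

# Covariance estimate: average outer product of deviations from beta_hat
V_hat = (beta_b - beta_hat).T @ (beta_b - beta_hat) / B
print(np.sqrt(np.diag(V_hat)))          # bootstrap standard errors
```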

The Bootstrap Algorithm for heteroskedastic data

Algorithm (Nonparametric Bootstrap Regression Covariance)
1. Generate a set of $n$ uniform integers $\{u_i\}_{i=1}^{n}$ on $[1, 2, \ldots, n]$.
2. Construct a simulated sample $\{y_{u_i}, x_{u_i}\}$.
3. Estimate the parameters of interest using $y_{u_i} = x_{u_i}\beta + \epsilon_{u_i}$, and denote the estimate $\tilde\beta_b$.
4. Repeat steps 1 through 3 a total of $B$ times.
5. Estimate the variance of $\hat\beta$ using
$$\hat V[\hat\beta] = B^{-1}\sum_{b=1}^{B}\left(\tilde\beta_b - \hat\beta\right)\left(\tilde\beta_b - \hat\beta\right)' \quad\text{or}\quad \hat V[\hat\beta] = B^{-1}\sum_{b=1}^{B}\left(\tilde\beta_b - \bar{\tilde\beta}\right)\left(\tilde\beta_b - \bar{\tilde\beta}\right)'$$
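A matching sketch of the non-parametric (pairs) bootstrap; the heteroskedastic data are simulated purely for illustration. Only the single index set and pairwise resampling differ from the residual bootstrap above.

```python
import numpy as np

# Pairs bootstrap: resample (y_i, x_i) jointly, valid under
# heteroskedasticity of unknown form.
rng = np.random.default_rng(1)
n, k, B = 500, 2, 1000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 0.1 + 1.0 * x + rng.standard_normal(n) * np.sqrt(0.5 + x ** 2)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

beta_b = np.empty((B, k))
for b in range(B):
    u = rng.integers(0, n, n)           # one index set; keep pairs together
    beta_b[b] = np.linalg.lstsq(X[u], y[u], rcond=None)[0]

V_hat = (beta_b - beta_hat).T @ (beta_b - beta_hat) / B
print(np.sqrt(np.diag(V_hat)))          # bootstrap standard errors
```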

Hypothesis Testing


Elements of a hypothesis test

Definition (Null Hypothesis)
The null hypothesis, denoted $H_0$, is a statement about the population values of some parameters to be tested. The null hypothesis is also known as the maintained hypothesis.

- The null is important because it determines the conditions under which the distribution of $\hat\beta$ must be known.

Definition (Alternative Hypothesis)
The alternative hypothesis, denoted $H_1$, is a complementary hypothesis to the null and determines the range of values of the population parameter that should lead to rejection of the null.

- The alternative is important because it determines the conditions where the null should be rejected.
- Example: $H_0: \beta_{\text{Market}} = 0$, $H_1: \beta_{\text{Market}} > 0$ or $H_1: \beta_{\text{Market}} \neq 0$

Elements of a hypothesis test

Definition (Hypothesis Test)
A hypothesis test is a rule that specifies the values where $H_0$ should be rejected in favor of $H_1$.

- The test embeds a test statistic and a rule which determines if $H_0$ can be rejected.
- Note: failing to reject the null does not mean the null is accepted.

Definition (Critical Value)
The critical value for an $\alpha$-sized test, denoted $C_\alpha$, is the value where a test statistic, $T$, indicates rejection of the null hypothesis when the null is true.

- The critical value is the value where the null is just rejected.
- The critical value is usually a point, although it can be a set.

Elements of a hypothesis test

Definition (Rejection Region)
The rejection region is the region where $T > C_\alpha$.

Definition (Type I Error)
A Type I error is the event that the null is rejected when the null is actually valid.

- Controlling the Type I error is the basis of frequentist testing.
- Note: a Type I error occurs only when the null is true.

Definition (Size)
The size or level of a test, denoted $\alpha$, is the probability of rejecting the null when the null is true. The size is also the probability of a Type I error.

- The size represents the tolerance for being wrong and rejecting a true null.

Elements of a hypothesis test

Definition (Type II Error)
A Type II error is the event that the null is not rejected when the alternative is true.

- A Type II error occurs when the null is not rejected when it should be.

Definition (Power)
The power of the test is the probability of rejecting the null when the alternative is true. The power is equivalently defined as 1 minus the probability of a Type II error.

- High power tests can discriminate between the null and the alternative with a relatively small amount of data.

Type I & II Errors, Size and Power

Size and power can be related to correct and incorrect decisions:

Truth      Do not reject H0      Reject H0
H0         Correct               Type I Error (Size)
H1         Type II Error         Correct (Power)

Hypothesis testing in regressions

- Distribution theory allows for inference.
- Hypothesis: $H_0: R(\beta) = 0$
  - $R(\cdot)$ is a function from $\mathbb{R}^k \to \mathbb{R}^m$, $m \le k$
  - All equality hypotheses can be written this way, e.g. $H_0: (\beta_1 - 1)(\beta_2 - 1) = 0$
- Linear Equality Hypotheses (LEH): $H_0: R\beta - r = 0$ or, in long hand,
$$\sum_{j=1}^{k} R_{i,j}\beta_j = r_i, \quad i = 1, 2, \ldots, m$$
  - $R$ is an $m$ by $k$ matrix
  - $r$ is an $m$ by 1 vector
- Attention is limited to linear hypotheses in this chapter; nonlinear hypotheses are examined in the GMM notes.

What is a linear hypothesis

3-Factor FF Model: $BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \epsilon_i$

- $H_0: \beta_2 = 0$ [Market Neutral]
  $R = [0\ 1\ 0\ 0]$, $r = 0$
- $H_0: \beta_2 + \beta_3 = 1$
  $R = [0\ 1\ 1\ 0]$, $r = 1$
- $H_0: \beta_3 = \beta_4 = 0$ [CAPM with nonzero intercept]
  $R = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$, $r = [0\ 0]'$
- $H_0: \beta_1 = 0,\ \beta_2 = 1,\ \beta_2 + \beta_3 + \beta_4 = 1$
  $R = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$, $r = [0\ 1\ 1]'$

Estimating linear regressions subject to LEH

Linear regressions subject to linear equality constraints can always be directly estimated using a transformed regression.

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \epsilon_i$$
$$H_0: \beta_1 = 0,\ \beta_2 = 1,\ \beta_2 + \beta_3 + \beta_4 = 1 \;\Rightarrow\; \beta_1 = 0,\ \beta_2 = 1,\ \beta_4 = -\beta_3$$

Combine to produce the restricted model:
$$BH_i^e = 0 + 1\cdot VWM_i^e + \beta_3 SMB_i - \beta_3 HML_i + \epsilon_i$$
$$BH_i^e - VWM_i^e = \beta_3(SMB_i - HML_i) + \epsilon_i$$
$$r_i = \beta_3 r_i^P + \epsilon_i$$

3 Major Categories of Tests

Wald
- Directly tests the magnitude of $R\beta - r$
- The t-test is a special case
- Estimation only under the alternative (unrestricted model)

Lagrange Multiplier (LM)
- Also known as the score test or Rao test
- Tests how close to a minimum the sum of squared errors is if the null is true
- Estimation only under the null (restricted model)

Likelihood Ratio (LR)
- Tests the magnitude of the log-likelihood difference between the null and alternative
- Invariant to reparameterization: a good thing!
- Estimation under both the null and the alternative
- Close to LM in the asymptotic framework

Visualizing the three tests

[Figure: the sum of squared errors $(y - X\beta)'(y - X\beta)$ plotted against $\beta$, showing the unrestricted minimum at $\hat\beta$ and the restriction $R\beta - r = 0$. The Wald test measures the distance in the parameter, the Likelihood Ratio the difference in the objective, and the Lagrange Multiplier the slope $-2X'(y - X\beta)$ at the restricted estimate.]

t-tests

Single linear hypothesis: $H_0: R\beta = r$
$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N\left(0, \Sigma_{XX}^{-1}S\Sigma_{XX}^{-1}\right) \;\Rightarrow\; \sqrt{n}\left(R\hat\beta - r\right) \xrightarrow{d} N\left(0, R\Sigma_{XX}^{-1}S\Sigma_{XX}^{-1}R'\right)$$
Note: under the null $H_0: R\beta = r$.

Transform to a standard normal random variable:
$$z = \frac{\sqrt{n}\left(R\hat\beta - r\right)}{\sqrt{R\Sigma_{XX}^{-1}S\Sigma_{XX}^{-1}R'}}$$
- Infeasible: depends on the unknown covariance.

Construct a feasible version using the estimate:
$$t = \frac{\sqrt{n}\left(R\hat\beta - r\right)}{\sqrt{R\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}R'}}$$
- The denominator is the estimated variance of $R\hat\beta$.
- Note: the asymptotic distribution is unaffected since the covariance estimator is consistent.

t-test and t-stat

- Unique property of t-tests: easily test one-sided alternatives
  $$H_0: \beta_1 = 0 \text{ vs. } H_1: \beta_1 > 0$$
- More powerful if you know the sign (e.g. risk premia)
- The t-stat of a coefficient $\hat\beta_k$ is the test of $H_0: \beta_k = 0$ against $H_1: \beta_k \neq 0$:
$$t = \frac{\sqrt{n}\,\hat\beta_k}{\sqrt{\left[\Sigma_{XX}^{-1}S\Sigma_{XX}^{-1}\right]_{[kk]}}}$$
- The single most common statistic, reported for nearly every coefficient.

Distribution and rejection region

[Figure: standard normal density of $(\hat\beta - \beta_0)/\mathrm{se}(\hat\beta)$ with rejection regions: the one-sided (upper) 95% test rejects for $z > 1.645$; the two-sided 95% test rejects for $|z| > 1.960$.]

Implementing a t Test

Algorithm (t-test)
1. Estimate the unrestricted model $y_i = x_i\beta + \epsilon_i$.
2. Estimate the parameter covariance using $\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}$.
3. Construct the restriction matrix, $R$, and the value of the restriction, $r$, from the null.
4. Compute
$$t = \frac{\sqrt{n}\left(R\hat\beta_n - r\right)}{\sqrt{v}}, \qquad v = R\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}R'$$
5. Make a decision ($C_\alpha$ is the upper $\alpha$ critical value from $N(0,1)$):
   a. 1-sided upper: reject the null if $t > C_\alpha$
   b. 1-sided lower: reject the null if $t < -C_\alpha$
   c. 2-sided: reject the null if $|t| > C_{\alpha/2}$
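A hedged numpy sketch of this algorithm with the heteroskedasticity-robust covariance estimator, for the single restriction $H_0: \beta_3 = 0$; the simulated model and the 5% two-sided rule are illustrative choices.

```python
import numpy as np

# t-test with White's covariance estimator on simulated data.
rng = np.random.default_rng(2)
n = 1000
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
y = X @ np.array([0.1, 1.0, 0.0]) + rng.standard_normal(n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat

Sxx_inv = np.linalg.inv(X.T @ X / n)
S_hat = (X * e[:, None] ** 2).T @ X / n       # n^-1 sum e_i^2 x_i' x_i
avar = Sxx_inv @ S_hat @ Sxx_inv              # asy. var of sqrt(n)*beta_hat

R = np.array([0.0, 0.0, 1.0])                 # R*beta = r with r = 0
t = np.sqrt(n) * (R @ beta_hat) / np.sqrt(R @ avar @ R)
print(t, abs(t) > 1.96)                       # 2-sided test at the 5% size
```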

t-tests on the FF Portfolios

Testing the coefficients on the $BH^e$ portfolio; all have $H_0: \beta_j = 0$ and $H_1: \beta_j \neq 0$, with heteroskedasticity-robust standard errors.

Parameter    β̂        S.E.    t-stat    p-val
Constant    -0.064    0.042   -1.518    0.129
VWMᵉ         1.077    0.011   97.013    0.000
SMB          0.019    0.024    0.771    0.440
HML          0.803    0.018   43.853    0.000
UMD         -0.040    0.014   -2.824    0.005

Definition (P-value)
The p-value is the largest size ($\alpha$) where the null hypothesis cannot be rejected. The p-value can be equivalently defined as the smallest size where the null hypothesis can be rejected.

- Should be familiar with 1.645, 1.96 (approximately 2) and 2.32.

Wald tests

Wald tests examine the validity of one or more equality restrictions by measuring the magnitude of $R\beta - r$. For the same reasons as the t-test, under the null
$$\sqrt{n}\left(R\hat\beta - r\right) \xrightarrow{d} N\left(0, R\Sigma_{XX}^{-1}S\Sigma_{XX}^{-1}R'\right)$$
Standardized and squared:
$$W = n\left(R\hat\beta - r\right)'\left[R\Sigma_{XX}^{-1}S\Sigma_{XX}^{-1}R'\right]^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2_m$$
Again, this is infeasible, so use the feasible version:
$$W = n\left(R\hat\beta - r\right)'\left[R\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}R'\right]^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2_m$$

Bivariate confidence sets

[Figure: 80%, 90% and 99% bivariate confidence ellipses under four configurations: no correlation, positive correlation, negative correlation, and different variances.]

Implementing a Wald Test

Algorithm (Large Sample Wald Test)
1. Estimate the unrestricted model $y_i = x_i\beta + \epsilon_i$.
2. Estimate the parameter covariance using $\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}$ where
$$\hat\Sigma_{XX} = n^{-1}\sum_{i=1}^{n} x_i'x_i, \qquad \hat S = n^{-1}\sum_{i=1}^{n}\hat\epsilon_i^2 x_i'x_i$$
3. Construct the restriction matrix, $R$, and the value of the restriction, $r$, from the null hypothesis.
4. Compute
$$W = n\left(R\hat\beta_n - r\right)'\left[R\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}R'\right]^{-1}\left(R\hat\beta_n - r\right)$$
5. Reject the null if $W > C_\alpha$ where $C_\alpha$ is the critical value from a $\chi^2_m$ using a size of $\alpha$.
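A sketch of this algorithm for $H_0: \beta_2 = \beta_3 = 0$ on simulated data; the DGP and restriction matrix are illustrative only.

```python
import numpy as np
from scipy import stats

# Large-sample Wald test with heteroskedasticity-robust covariance.
rng = np.random.default_rng(3)
n, m = 1000, 2
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
y = X @ np.array([0.1, 0.0, 0.0]) + rng.standard_normal(n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat

Sxx_inv = np.linalg.inv(X.T @ X / n)
S_hat = (X * e[:, None] ** 2).T @ X / n
avar = Sxx_inv @ S_hat @ Sxx_inv

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(m)
d = R @ beta_hat - r
W = n * d @ np.linalg.inv(R @ avar @ R.T) @ d
print(W, 1 - stats.chi2.cdf(W, m))      # compare to a chi^2(m) critical value
```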

Wald tests on the FF Portfolios

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 UMD_i + \epsilon_i$$

Wald tests only fail to reject the null that the constant and the coefficient on SMB are zero.

Null                          M    W        p-val
β_j = 0, j = 1, ..., 5        5    16303    0.000
β_j = 0, j = 1, 3, 4, 5       4    2345.3   0.000
β_j = 0, j = 1, 5             2    13.59    0.001
β_j = 0, j = 1, 3             2    2.852    0.240
β_5 = 0                       1    7.97     0.005

Note: nulls are "and", alternatives are "or":
$$H_0: \beta_1 = 0 \cap \beta_3 = 0, \qquad H_1: \beta_1 \neq 0 \cup \beta_3 \neq 0$$

Lagrange Multiplier (LM) tests

LM tests examine the shadow price of the constraint (null):
$$\min_\beta\ (y - X\beta)'(y - X\beta) \quad \text{s.t.} \quad R\beta - r = 0$$
Lagrangian:
$$L(\beta, \lambda) = (y - X\beta)'(y - X\beta) + (R\beta - r)'\lambda$$
If the null is true, then $\lambda \approx 0$.

First order conditions:
$$\frac{\partial L}{\partial \beta} = -2X'(y - X\tilde\beta) + R'\tilde\lambda = 0, \qquad \frac{\partial L}{\partial \lambda} = R\tilde\beta - r = 0$$

A few minutes of matrix algebra later:
$$\tilde\lambda = 2\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat\beta - r\right)$$
$$\tilde\beta = \hat\beta - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat\beta - r\right)$$

Tilde denotes estimates computed under the null (e.g. $\tilde\beta$).

Why LM tests are also known as score tests

$$\tilde\lambda = 2\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat\beta - r\right)$$
- $\tilde\lambda$ is just a function of normals (via $\hat\beta$, the OLS estimator).
- Alternatively, $R'\tilde\lambda = 2X'\tilde\epsilon$; since $R$ has rank $m$, $R'\tilde\lambda = 0$ only if $X'\tilde\epsilon = 0$, where $\tilde\epsilon$ are the estimated residuals under the null.
- Under the assumptions,
$$\sqrt{n}\,\bar s = \sqrt{n}\, n^{-1}X'\tilde\epsilon \xrightarrow{d} N(0, S)$$
- And we know how to test multivariate normals for equality to 0:
$$LM = n\,\bar s' S^{-1}\bar s \xrightarrow{d} \chi^2_m$$
- But we always have to use the feasible version:
$$LM = n\,\bar s'\tilde S^{-1}\bar s = n\,\bar s'\left(n^{-1}X'\tilde E X\right)^{-1}\bar s \xrightarrow{d} \chi^2_m$$

Note: $\tilde S$ (and $\tilde E$) is estimated using the errors from the restricted regression.

Implementing an LM test

Algorithm (Large Sample Lagrange Multiplier Test)
1. Form the unrestricted model, $y_i = x_i\beta + \epsilon_i$.
2. Impose the null on the unrestricted model and estimate the restricted model, $\tilde y_i = \tilde x_i\beta + \epsilon_i$.
3. Compute the residuals from the restricted regression, $\tilde\epsilon_i = y_i - x_i\tilde\beta$.
4. Construct the score using the residuals from the restricted regression, $\tilde s_i = x_i\tilde\epsilon_i$, where $x_i$ are the regressors from the unrestricted model.
5. Estimate the average score and the covariance of the score,
$$\bar s = n^{-1}\sum_{i=1}^{n}\tilde s_i, \qquad \tilde S = n^{-1}\sum_{i=1}^{n}\tilde s_i'\tilde s_i \qquad (2)$$
6. Compute the LM test statistic as $LM = n\,\bar s\,\tilde S^{-1}\bar s'$ and compare to the critical value from a $\chi^2_m$ using a size of $\alpha$.
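A sketch of this algorithm for $H_0: \beta_2 = \beta_3 = 0$; under this null the restricted model is a constant-only regression, so the restricted estimate is the sample mean. Data are simulated and all names are illustrative.

```python
import numpy as np
from scipy import stats

# Large-sample LM test: only the restricted model is estimated.
rng = np.random.default_rng(4)
n, m = 1000, 2
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
y = X @ np.array([0.1, 0.0, 0.0]) + rng.standard_normal(n)

eps_tilde = y - y.mean()               # restricted residuals
s = X * eps_tilde[:, None]             # scores use unrestricted regressors
s_bar = s.mean(axis=0)
S_tilde = s.T @ s / n                  # score covariance, restricted errors

LM = n * s_bar @ np.linalg.inv(S_tilde) @ s_bar
print(LM, 1 - stats.chi2.cdf(LM, m))   # compare to a chi^2(m) critical value
```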

LM tests on the FF Portfolios

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 UMD_i + \epsilon_i$$

LM broadly agrees with Wald, but note the magnitudes.

Null                          M    LM      p-val
β_j = 0, j = 1, ..., 5        5    106     0.000
β_j = 0, j = 1, 3, 4, 5       4    170.5   0.000
β_j = 0, j = 1, 5             2    14.18   0.001
β_j = 0, j = 1, 3             2    3.012   0.222
β_5 = 0                       1    8.38    0.004

Likelihood ratio (LR) tests

- A large sample LR test can be constructed using a test statistic that looks like the LM test.
- Formally, the large-sample LR is based on testing whether the difference of the scores, evaluated at the restricted and unrestricted parameters, is large in a statistically meaningful sense.
- Suppose $S$ is known; then
$$n\left(\bar s - \hat s\right)'S^{-1}\left(\bar s - \hat s\right) = n\left(\bar s - 0\right)'S^{-1}\left(\bar s - 0\right) = n\,\bar s'S^{-1}\bar s \xrightarrow{d} \chi^2_m$$
(Why? The unrestricted score $\hat s$ is exactly 0 by the OLS first order conditions.)
- This leads to a definition of the large sample LR identical to LM but using a different variance estimator:
$$LR = n\,\bar s'\hat S^{-1}\bar s \xrightarrow{d} \chi^2_m$$

Note: $\hat S$ (and $\hat E$) is estimated using the errors from the unrestricted regression.
- $\hat S$ is estimated under the alternative and $\tilde S$ is estimated under the null.
- $\hat S$ is usually smaller than $\tilde S$ $\Rightarrow$ LR is usually larger than LM.

Implementing an LR test

Algorithm (Large Sample Likelihood Ratio Test)
1. Estimate the unrestricted model $y_i = x_i\beta + \epsilon_i$.
2. Impose the null on the unrestricted model and estimate the restricted model, $\tilde y_i = \tilde x_i\beta + \epsilon_i$.
3. Compute the residuals from the restricted regression, $\tilde\epsilon_i = y_i - x_i\tilde\beta$, and from the unrestricted regression, $\hat\epsilon_i = y_i - x_i\hat\beta$.
4. Construct the scores from both models, $\tilde s_i = x_i\tilde\epsilon_i$ and $\hat s_i = x_i\hat\epsilon_i$, where in both cases $x_i$ are the regressors from the unrestricted model.
5. Estimate the average score and the covariance of the score,
$$\bar s = n^{-1}\sum_{i=1}^{n}\tilde s_i, \qquad \hat S = n^{-1}\sum_{i=1}^{n}\hat s_i'\hat s_i \qquad (3)$$
6. Compute the LR test statistic as $LR = n\,\bar s\,\hat S^{-1}\bar s'$ and compare to the critical value from a $\chi^2_m$ using a size of $\alpha$.
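A sketch of the large-sample LR statistic: identical to the LM sketch above except that the score covariance uses unrestricted residuals. Same simulated setup and null ($\beta_2 = \beta_3 = 0$), both assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Large-sample LR test: scores from the restricted model, covariance
# from the unrestricted residuals.
rng = np.random.default_rng(5)
n, m = 1000, 2
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
y = X @ np.array([0.1, 0.0, 0.0]) + rng.standard_normal(n)

eps_tilde = y - y.mean()                      # restricted: constant only
s_bar = (X * eps_tilde[:, None]).mean(axis=0)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
eps_hat = y - X @ beta_hat                    # unrestricted residuals
s_hat = X * eps_hat[:, None]
S_hat = s_hat.T @ s_hat / n                   # covariance under the alternative

LR = n * s_bar @ np.linalg.inv(S_hat) @ s_bar
print(LR, 1 - stats.chi2.cdf(LR, m))
```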

Likelihood ratio (LR) tests (Classical Assumptions)

Warning: the LR on this slide critically relies on homoskedasticity and normality.

If the null is close to the alternative, the log-likelihood should be similar under both:
$$LR = -2\ln\left[\frac{\max_{\beta,\sigma^2} f(y|X;\beta,\sigma^2)\ \text{subject to}\ R\beta = r}{\max_{\beta,\sigma^2} f(y|X;\beta,\sigma^2)}\right]$$

A little simple algebra later:
$$LR = n\ln\frac{SSE_R}{SSE_U} = n\ln\frac{s_R^2}{s_U^2}$$

In the classical setup, the distribution of LR is
$$\frac{n-k}{m}\left[\exp\left(\frac{LR}{n}\right) - 1\right] \sim F_{m,n-k}$$
although $m\,F_{m,n-k} \xrightarrow{d} \chi^2_m$ as $n \to \infty$.

LR tests on the FF Portfolios

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 UMD_i + \epsilon_i$$

LR tests agree with the other two but are closer to Wald in magnitude.

Null                          M    LR        p-val
β_j = 0, j = 1, ..., 5        5    16303     0.000
β_j = 0, j = 1, 3, 4, 5       4    10211.4   0.000
β_j = 0, j = 1, 5             2    13.99     0.001
β_j = 0, j = 1, 3             2    3.088     0.213
β_5 = 0                       1    8.28      0.004

Comparing the three tests

- Three tests: which to choose?
- Asymptotically all are equivalent.
- Rule of thumb: $W \ge LR > LM$, since W and LR use errors estimated under the alternative.
- Larger test statistics are good since all have the same distribution $\Rightarrow$ more power.
- If derived from MLE (classical assumptions: normality, homoskedasticity), an exact relationship: $W \ge LR \ge LM$.
- In some contexts (not linear regression) ease of estimation is a useful criterion to prefer one test over the others:
  - Easy estimation of the null: LM
  - Easy estimation of the alternative: Wald
  - Easy to estimate both: LR or Wald

Comparing the three

[Figure: the same geometric comparison of the Wald, LR and LM tests on the sum-of-squares surface as shown earlier.]

Heteroskedasticity

Heteroskedasticity

Heteroskedasticity: from the Greek hetero ("different") and skedannumi ("to scatter").

- Heteroskedasticity is pervasive in financial data.
- The usual covariance estimator (previously given) allows for heteroskedasticity of unknown form.
- It is tempting to always use the heteroskedasticity-robust covariance estimator, also known as White's (or Eicker-Huber) covariance estimator.
  - Its finite sample properties are generally worse if the data are homoskedastic.
- If the data are homoskedastic, a simpler estimator can be used. Formally,
$$E\left[\epsilon_i^2 x_{j,i}\right] = E\left[\epsilon_i^2\right]E\left[x_{j,i}\right] \quad\text{and}\quad E\left[\epsilon_i^2 x_{j,i}^2\right] = E\left[\epsilon_i^2\right]E\left[x_{j,i}^2\right]$$
for $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, k$, is needed to justify the simpler estimator.

Estimating the parameter covariance (Homoskedasticity)

Theorem (Homoskedastic CLT)
Under the large sample assumptions, and if the errors are homoskedastic,
$$\sqrt{n}(\hat\beta_n - \beta) \xrightarrow{d} N\left(0, \sigma^2\Sigma_{XX}^{-1}\right)$$
where $\Sigma_{XX} = E[x_i'x_i]$ and $\sigma^2 = V[\epsilon_i]$.

Theorem (Homoskedastic Covariance Estimator)
Under the large sample assumptions, and if the errors are homoskedastic,
$$\hat\sigma^2\hat\Sigma_{XX}^{-1} \xrightarrow{p} \sigma^2\Sigma_{XX}^{-1}$$

- Homoskedasticity justifies the usual estimator $\hat\sigma^2\left(n^{-1}X'X\right)^{-1}$.
- When using financial data this is the unusual estimator.

Testing for heteroskedasticity

- Two covariance estimators: $\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}$ and $\hat\sigma^2\hat\Sigma_{XX}^{-1}$.
- White's covariance estimator has worse finite sample properties and should be avoided if homoskedasticity is plausible.
- White's test: $\hat\epsilon_i^2 = z_i\gamma + \eta_i$, where $z_i$ consists of all cross products $x_{j,i}x_{l,i}$.
- LM test that all coefficients (except the constant) are zero:
$$H_0: \gamma_2 = \gamma_3 = \ldots = \gamma_{k(k+1)/2} = 0$$
- White's test on the $BH^e$ portfolio returns in the CAPM:
$$\hat\epsilon_i^2 = \gamma_1 + \gamma_2 VWM_i^e + \gamma_3 VWM_i^{e\,2} + \eta_i$$

          γ̂_1     γ̂_2     γ̂_3
γ̂         3.32    0.93    0.21
t-stat    2.89    4.63    18.28

- LM test: 274.9, p-val 0.000 $\Rightarrow$ strong evidence of heteroskedasticity.

Implementing White's Test for Heteroskedasticity

Algorithm (White's Test for Heteroskedasticity)
1. Fit the model $y_i = x_i\beta + \epsilon_i$.
2. Construct the fit residuals $\hat\epsilon_i = y_i - x_i\hat\beta$.
3. Construct the auxiliary regressors $z_i$, where the $k(k+1)/2$ elements of $z_i$ are computed from $x_{i,o}x_{i,p}$ for $o = 1, 2, \ldots, k$, $p = o, o+1, \ldots, k$.
4. Estimate the auxiliary regression $\hat\epsilon_i^2 = z_i\gamma + \eta_i$.
5. Compute White's test statistic as $nR^2$ where the $R^2$ is from the auxiliary regression, and compare to the critical value at size $\alpha$ from a $\chi^2_{k(k+1)/2 - 1}$.

Note: this algorithm assumes the model contains a constant. If the original model does not contain a constant, then $z_i$ should be augmented with a constant, and the asymptotic distribution is a $\chi^2_{k(k+1)/2}$.
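A sketch of White's test for a single-regressor model with a constant, where the auxiliary regressors are 1, $x$ and $x^2$; the data are simulated with variance increasing in $x^2$ so the test should reject.

```python
import numpy as np
from scipy import stats

# White's test via the n*R^2 statistic from the auxiliary regression.
rng = np.random.default_rng(6)
n = 1000
x = rng.standard_normal(n)
eps = rng.standard_normal(n) * np.sqrt(0.5 + x ** 2)  # heteroskedastic
y = 0.1 + 1.0 * x + eps

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta_hat) ** 2

# z_i: constant, x, x^2 (the unique cross products of [1, x])
Z = np.column_stack([np.ones(n), x, x ** 2])
gamma = np.linalg.lstsq(Z, e2, rcond=None)[0]
u = e2 - Z @ gamma
R2 = 1 - (u @ u) / ((e2 - e2.mean()) @ (e2 - e2.mean()))
stat = n * R2                            # chi^2 with k(k+1)/2 - 1 dof
print(stat, 1 - stats.chi2.cdf(stat, Z.shape[1] - 1))
```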

Pitfalls and Problems

Problems with models

What happens when the assumptions are violated?
- Model misspecified
  - Omitted variables
  - Extraneous variables
  - Functional form
- Heteroskedasticity
- Too few moments
- Errors correlated with regressors
  - Rare in asset pricing

Not enough moments

Too few moments causes problems for both $\hat\beta$ and t-stats.
- Consistency requires 2 moments of $x_i$ and 1 of $\epsilon_i$.
- Consistent estimation of the variance requires 4 moments of $x_i$ and 2 of $\epsilon_i$.
- Fewer than 2 moments of $x_i$:
  - Slopes can still be consistent
  - Intercepts cannot
- Fewer than 1 moment of $\epsilon_i$:
  - $\hat\beta$ is inconsistent: too much noise!
- Between 2 and 4 moments of $x_i$, or 1 and 2 of $\epsilon_i$:
  - Tests are inconsistent

Omitted Variables

What if the linearity assumption is violated? Omitted variables:

Correct model: $y_i = x_{1,i}\beta_1 + x_{2,i}\beta_2 + \epsilon_i$
Model estimated: $y_i = x_{1,i}\beta_1 + \epsilon_i$

Can show
$$\hat\beta_1 \xrightarrow{p} \beta_1 + \delta'\beta_2, \quad\text{where}\quad x_{2,i} = x_{1,i}\delta + \eta_i$$

- $\hat\beta_1$ captures any portion of $y_i$ explainable by $x_{1,i}$: from the model ($\beta_1$) and from $\beta_2$ through the correlation between $x_{1,i}$ and $x_{2,i}$.
- Two cases where omitted variables do not produce bias:
  - $x_{1,i}$ and $x_{2,i}$ uncorrelated (e.g. dummy variables)
  - $\beta_2 = 0$: model correct
- The estimated variance is still inconsistent.

Extraneous Variables

Correct model: $y_i = x_{1,i}\beta_1 + \epsilon_i$
Model estimated: $y_i = x_{1,i}\beta_1 + x_{2,i}\beta_2 + \epsilon_i$

Can show: $\hat\beta_1 \xrightarrow{p} \beta_1$. No problem, right?

- Including extraneous regressors increases parameter uncertainty.
- Excluding marginally relevant regressors reduces parameter uncertainty but increases the chance the model is misspecified.
- Bias-variance trade-off:
  - Smaller models reduce variance, even if introducing bias
  - Larger models have less bias
- Related to model selection...

Heteroskedasticity

A common problem across most financial data sets: asset returns, firm characteristics, executive compensation.

- Solution 1: heteroskedasticity-robust covariance estimator $\hat\Sigma_{XX}^{-1}\hat S\hat\Sigma_{XX}^{-1}$
- Partial solution 2: use data transformations
  - Ratios: Volume vs. Turnover (Volume/Shares Outstanding)
  - Logs: Volume vs. ln Volume; if Volume = Size × Shock, then ln Volume = ln Size + ln Shock

GLS and FGLS

Solution 3: Generalized Least Squares (GLS)
$$\hat\beta_n^{GLS} = \left(X'W^{-1}X\right)^{-1}X'W^{-1}y, \qquad W \text{ is } n \times n \text{ positive definite}$$

- Can choose $W$ cleverly: choose $W$ so that $W^{-1/2}\epsilon$ is homoskedastic and uncorrelated.
- $\hat\beta_n^{GLS} \xrightarrow{p} \beta$, and $\hat\beta^{GLS}$ is asymptotically efficient.
- In practice $W$ is unknown, but it can be estimated:
$$\hat\epsilon_i^2 = z_i\gamma + \eta_i, \qquad \hat W = \mathrm{diag}(z_i\hat\gamma)$$
- The resulting estimator is Feasible GLS (FGLS).
  - Still asymptotically efficient
  - Small sample properties are not assured and may be quite bad
- In between: use a pre-specified but potentially sub-optimal $W$, for example a diagonal $W$ which ignores potential correlation.
  - Requires an alternative estimator of the parameter covariance (see notes)
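A minimal FGLS sketch; the variance model $z_i = (1, x_i^2)$ is an assumption made for this illustration only, and the fitted variances are floored at a small positive value since nothing guarantees $z_i\hat\gamma > 0$.

```python
import numpy as np

# FGLS: estimate the skedastic function, then reweight by W^{-1/2}.
rng = np.random.default_rng(7)
n = 1000
x = rng.standard_normal(n)
y = 0.1 + 1.0 * x + rng.standard_normal(n) * np.sqrt(0.5 + x ** 2)

X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta_ols) ** 2

Z = np.column_stack([np.ones(n), x ** 2])     # assumed variance regressors
gamma = np.linalg.lstsq(Z, e2, rcond=None)[0]
w = np.clip(Z @ gamma, 1e-4, None)            # fitted variances, floored > 0
Xw = X / np.sqrt(w)[:, None]                  # premultiply by W^{-1/2}
yw = y / np.sqrt(w)
beta_fgls = np.linalg.lstsq(Xw, yw, rcond=None)[0]
print(beta_ols, beta_fgls)
```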

Model Building

Model Building

- The "Black Art" of econometrics: many rules and procedures, most contradictory.
- Always a trade-off between bias and variance in finite samples.
- Better models usually have a finance or economic theory behind them.
- Three steps:
  1. Model selection
  2. Specification checking
  3. Model evaluation: pseudo-out-of-sample evaluation (unless on orders from your boss)

Strategies

General to Specific
- Fit the largest specification
- Drop the variable with the largest p-value and refit
- Stop if all p-values indicate significance at size $\alpha$
- $\alpha$ is the econometrician's choice

Specific to General
- Fit all specifications that include a single explanatory variable
- Include the variable with the smallest p-value
- Starting from this model, test all other variables by adding them in one at a time
- Stop if no p-value of an excluded variable indicates significance at size $\alpha$

Information Criteria

Akaike Information Criterion (AIC):
$$AIC = \ln\hat\sigma^2 + \frac{2k}{n}$$
Schwarz (Bayesian) Information Criterion (SIC/BIC):
$$BIC = \ln\hat\sigma^2 + \frac{k\ln n}{n}$$

- Both have versions suitable for likelihood-based estimation.
- Reward for better fit: reduce $\ln\hat\sigma^2$.
- Penalty for more parameters: $2k/n$ or $k\ln n/n$.
- Choose the model with the smallest IC.
- AIC has a fixed penalty $\Rightarrow$ inclusion of extraneous variables.
- BIC has a larger penalty if $\ln n > 2$ ($n > 7$).
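A short sketch computing AIC and BIC across nested candidate models on simulated data; with the true model using the first three regressors, both criteria should typically select $k = 3$.

```python
import numpy as np

# IC-based model selection over nested models with the first k regressors.
rng = np.random.default_rng(8)
n = 500
X_all = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 4))])
y = X_all[:, :3] @ np.array([0.1, 1.0, 0.5]) + rng.standard_normal(n)

for k in range(1, 6):
    X = X_all[:, :k]
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = e @ e / n                        # sigma^2 estimate
    aic = np.log(s2) + 2 * k / n
    bic = np.log(s2) + k * np.log(n) / n
    print(k, round(aic, 4), round(bic, 4))
```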

Cross Validation

Use $(100 - m)\%$ of the data to estimate the parameters and evaluate using the remaining $m\%$; $m = 100\,k^{-1}$ in k-fold cross validation. A sketch follows the algorithm below.

Algorithm (k-fold cross validation)
1. For each model:
   a. Randomly divide the observations into $k$ equally sized blocks, $S_j$, $j = 1, \ldots, k$.
   b. For $j = 1, \ldots, k$, estimate $\hat\beta_j$ by excluding the observations in block $j$.
   c. Compute the cross-validated SSE using the observations in block $j$ and $\hat\beta_j$:
$$SSE_{xv} = \sum_{j=1}^{k}\sum_{i \in S_j}\left(y_i - x_i\hat\beta_j\right)^2$$
2. Select the model with the lowest cross-validated SSE.

Typical values for $k$ are 5 or 10.
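A sketch of the k-fold procedure with $k = 5$ on simulated data; the candidate models using the first $j$ columns of $X$ are an illustrative choice.

```python
import numpy as np

# 5-fold cross-validated SSE for a set of nested candidate models.
rng = np.random.default_rng(9)
n, kfold = 500, 5
X_all = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 4))])
y = X_all[:, :3] @ np.array([0.1, 1.0, 0.5]) + rng.standard_normal(n)

idx = rng.permutation(n)
blocks = np.array_split(idx, kfold)       # the blocks S_j, roughly equal size

for j in range(1, 6):                     # candidate model sizes
    X = X_all[:, :j]
    sse_xv = 0.0
    for held_out in blocks:
        mask = np.ones(n, dtype=bool)
        mask[held_out] = False            # exclude the held-out block
        b = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        err = y[held_out] - X[held_out] @ b
        sse_xv += err @ err
    print(j, round(sse_xv, 1))            # pick the model with the lowest SSE
```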

Specification Analysis

Is a selected model any good? Test it:

- Stability test (Chow):
$$y_i = x_i\beta + I_{[i>C]}x_i\gamma + \epsilon_i, \qquad H_0: \gamma = 0$$
- Nonlinearity test (Ramsey's RESET):
$$y_i = x_i\beta + \gamma_1\hat y_i^2 + \gamma_2\hat y_i^3 + \ldots + \gamma_{L-1}\hat y_i^L + \epsilon_i, \qquad H_0: \gamma = 0$$
- Recursive and/or rolling estimation
- Influence function: $x_i(X'X)^{-1}x_i'$ (normalized length of $x_i$)
- Normality tests (Jarque-Bera):
$$JB = n\left(\frac{sk^2}{6} + \frac{(\kappa - 3)^2}{24}\right)$$

Implementing Chow & RESET Tests

Algorithm (Chow Test)
1. Estimate the model $y_i = x_i\beta + I_{[i>C]}x_i\gamma + \epsilon_i$.
2. Test the null $H_0: \gamma = 0$ against the alternative $H_1: \gamma_i \neq 0$, for some $i$, using a Wald, LM or LR test.

Algorithm (RESET Test)
1. Estimate the model $y_i = x_i\beta + \epsilon_i$ and construct the fit values, $\hat y_i = x_i\hat\beta$.
2. Re-estimate the model $y_i = x_i\beta + \gamma_1\hat y_i^2 + \gamma_2\hat y_i^3 + \ldots + \epsilon_i$.
3. Test the null $H_0: \gamma_1 = \gamma_2 = \ldots = \gamma_m = 0$ against the alternative $H_1: \gamma_i \neq 0$, for some $i$, using a Wald, LM or LR test, all of which have a $\chi^2_m$ distribution.
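A sketch of the RESET test with powers 2 and 3 of the fitted values, assessed with the heteroskedasticity-robust Wald statistic from the earlier slides; the quadratic DGP is simulated so the test should reject.

```python
import numpy as np
from scipy import stats

# RESET test: augment with powers of the fitted values, Wald-test gammas.
rng = np.random.default_rng(10)
n = 1000
x = rng.standard_normal(n)
y = 0.1 + x + 0.3 * x ** 2 + rng.standard_normal(n)   # nonlinear truth

X = np.column_stack([np.ones(n), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
Xa = np.column_stack([X, y_hat ** 2, y_hat ** 3])     # augmented regressors

beta = np.linalg.lstsq(Xa, y, rcond=None)[0]
e = y - Xa @ beta
Sxx_inv = np.linalg.inv(Xa.T @ Xa / n)
S = (Xa * e[:, None] ** 2).T @ Xa / n
avar = Sxx_inv @ S @ Sxx_inv

R = np.zeros((2, 4)); R[0, 2] = R[1, 3] = 1.0         # H0: gamma_1 = gamma_2 = 0
Rb = R @ beta
W = n * Rb @ np.linalg.inv(R @ avar @ R.T) @ Rb
print(W, 1 - stats.chi2.cdf(W, 2))
```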

BH^e residual plot in 4 factor model

[Figure: time series of residuals from the four-factor model for the $BH^e$ portfolio, roughly between -10 and +10, plotted from 1930 to the 2000s.]

Specification Analysis: The BH^e model

Chow and RESET tests; dependent variable $BH_i^e$.

Chow (break at 1947):
Variable                    Coefficient   P-val
Constant                    -0.04         0.304
VWM^e_i                      1.06         0.000
SMB_i                        0.01         0.743
HML_i                        0.87         0.000
UMD_i                       -0.03         0.260
VWM^e_i × I[YR>1947]         0.00         0.961
SMB_i × I[YR>1947]          -0.01         0.773
HML_i × I[YR>1947]          -0.11         0.016
UMD_i × I[YR>1947]          -0.00         0.773
Chow statistic: 9.73, p-val 0.045 (χ²₄)

RESET:
Variable     Coefficient   P-val
Constant     -0.093        0.040
VWM^e_i       1.079        0.000
SMB_i         0.016        0.506
HML_i         0.813        0.000
UMD_i        -0.038        0.009
ŷ_i²          0.000        0.282
ŷ_i³         -0.000        0.691
RESET statistic: 1.52, p-val 0.466 (χ²₂)

Normality: Jarque-Bera 1828.7, p-val 0.00 (χ²₂); kurtosis ≈ 10, p-val 0.00 (χ²₁); skewness ≈ 1, p-val 0.00 (χ²₁).

Outliers

Outliers happen for a number of reasons:
- Data entry errors
- Funds blowing up
- Hyper-inflation

Often interested in results which are robust to some outliers. Three common options:
- Trimming
- Winsorization
- (Iteratively) Reweighted Least Squares (IRWLS)
  - Similar to GLS, but uses weights based on the outlyingness of the error

Trimming

Trimming involves removing observations. A sketch follows below.
- Removal must be based on the values of $\epsilon_i$, not $y_i$; removal based on $y_i$ can lead to bias.
- Requires an initial estimate of $\beta$, denoted $\bar\beta$:
  - Could use the full sample, but this is sensitive to outliers, especially if extreme
  - Use a subsample that you believe is good
  - Choose subsamples at random and use a typical value
- Construct the residuals $\bar\epsilon_i = y_i - x_i\bar\beta$ and delete observations if $\bar\epsilon_i < \hat q_\alpha$ or $\bar\epsilon_i > \hat q_{1-\alpha}$ for some small $\alpha$ (typically 2.5% or 5%).
  - $\hat q_\alpha$ is the $\alpha$-quantile of the empirical distribution of $\bar\epsilon_i$.
- Estimate the final $\hat\beta$ using OLS on the remaining (non-trimmed) data.
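A sketch of residual-based trimming with $\alpha = 2.5\%$; the planted outliers and the full-sample initial OLS estimate are illustrative assumptions (the slides note the initial estimator may instead come from a clean subsample).

```python
import numpy as np

# Trimming: drop observations whose residuals fall in the tails.
rng = np.random.default_rng(11)
n, alpha = 500, 0.025
x = rng.standard_normal(n)
y = 0.1 + x + rng.standard_normal(n)
y[:10] += 8.0                                  # plant a few outliers

X = np.column_stack([np.ones(n), x])
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # initial estimate
e = y - X @ beta0
lo, hi = np.quantile(e, [alpha, 1 - alpha])    # empirical residual quantiles
keep = (e >= lo) & (e <= hi)                   # trim on residuals, not on y
beta_trim = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
print(beta0, beta_trim)
```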

Correct and Incorrect Trimming

[Figure: two scatter plots contrasting correct trimming, based on residuals, with incorrect trimming, based on $y_i$, which removes legitimate observations and leads to bias.]

Winsorization

Winsorization involves replacing outliers with less outlying observations.
- Like trimming, the replacement must be based on the values of $\epsilon_i$, not $y_i$.
- Requires an initial estimate of $\beta$, denoted $\bar\beta$.
- Construct the residuals $\bar\epsilon_i = y_i - x_i\bar\beta$.
- Reconstruct $y_i$ as
$$y_i^\star = \begin{cases} x_i\bar\beta + \hat q_\alpha & \bar\epsilon_i < \hat q_\alpha \\ y_i & \hat q_\alpha \le \bar\epsilon_i \le \hat q_{1-\alpha} \\ x_i\bar\beta + \hat q_{1-\alpha} & \bar\epsilon_i > \hat q_{1-\alpha} \end{cases}$$
- Estimate the final $\hat\beta$ using OLS on the reconstructed data.
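The same simulated setup as the trimming sketch, winsorizing instead of deleting.

```python
import numpy as np

# Winsorization: move tail observations to the fitted value plus the
# relevant residual quantile, then re-run OLS on the full sample.
rng = np.random.default_rng(12)
n, alpha = 500, 0.025
x = rng.standard_normal(n)
y = 0.1 + x + rng.standard_normal(n)
y[:10] -= 8.0                                  # plant a few outliers

X = np.column_stack([np.ones(n), x])
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # initial estimate
e = y - X @ beta0
lo, hi = np.quantile(e, [alpha, 1 - alpha])
y_w = np.where(e < lo, X @ beta0 + lo,
               np.where(e > hi, X @ beta0 + hi, y))
beta_w = np.linalg.lstsq(X, y_w, rcond=None)[0]
print(beta0, beta_w)
```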

Correct and Incorrect Winsorization

[Figure: two scatter plots contrasting correct winsorization, based on residuals, with incorrect winsorization, based on $y_i$, which leads to bias.]

Rolling and Recursive Regressions

- Parameter stability is often an important concern.
- Rolling regressions are an easy method to examine parameter stability:
$$\hat\beta_j = \left(\sum_{i=j}^{j+m} x_i'x_i\right)^{-1}\left(\sum_{i=j}^{j+m} x_i'y_i\right), \quad j = 1, 2, \ldots, n - m$$
- Constructing confidence intervals formally is difficult.
  - An approximate method computes the full sample covariance matrix and then scales it by $n/m$ to reflect the smaller sample used.
  - Similar to building a confidence interval under a null that the parameters are constant.
- Recursive regression is defined similarly, only using an expanding window:
$$\hat\beta_j = \left(\sum_{i=1}^{j} x_i'x_i\right)^{-1}\left(\sum_{i=1}^{j} x_i'y_i\right), \quad j = m, m+1, \ldots, n$$
- Similar issues with confidence intervals.
- Often hard to see variation near the end of the sample if $n$ is large.
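A sketch of rolling and recursive OLS on simulated data with a slope break halfway through the sample; the window length $m = 120$ is an illustrative choice.

```python
import numpy as np

# Rolling (fixed window m) and recursive (expanding window) OLS.
rng = np.random.default_rng(13)
n, m = 1000, 120
x = rng.standard_normal(n)
slope = np.where(np.arange(n) < n // 2, 1.0, 1.5)     # break in the slope
y = 0.1 + slope * x + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

rolling = np.array([np.linalg.lstsq(X[j:j + m], y[j:j + m], rcond=None)[0]
                    for j in range(n - m + 1)])
recursive = np.array([np.linalg.lstsq(X[:j], y[:j], rcond=None)[0]
                      for j in range(m, n + 1)])
print(rolling[[0, -1], 1], recursive[[0, -1], 1])     # first/last slopes
```

As the slides note, the recursive estimates move slowly near the end of a long sample, while the rolling estimates track the break more visibly.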

Rolling Estimates of $\hat\beta$

[Figure: rolling coefficient estimates for VWM^e, SMB, HML and UMD, plotted from 1930 to the 2000s.]

Recursive Estimates of $\hat\beta$

[Figure: recursive coefficient estimates for VWM^e, SMB, HML and UMD, plotted from 1930 to the 2000s.]

Influence Function, BH^e on 4-factors

[Figure: influence measure $x_i(X'X)^{-1}x_i'$ over time, 1930 to the 2000s.]