Analysis of Cross-Sectional Data
Kevin Sheppard
http://www.kevinsheppard.com
Oxford MFE
This version: November 8, 2017
November 13-14, 2017
Outline

- Econometric models
  - Specifications that can be analyzed with linear regression
  - Factor models for returns
- Estimation of unknown parameters
- Assessing model fit
- The relationship between OLS and ML
- Testing hypotheses: Wald, LR and LM
- Model building
- Checking for assumption violations
Large Sample Properties

Theorem (Asymptotic Distribution of $\hat{\beta}$)
Under these assumptions,
$$\sqrt{n}(\hat{\beta}_n - \beta) \stackrel{d}{\to} N(0, \Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}) \qquad (1)$$
where $\Sigma_{XX} = E[x_i' x_i]$ and $S = V[n^{-1/2} X' \epsilon]$.

- The CLT is a strong result that forms the basis of the inference we can make on $\beta$
- What good is a CLT?
Estimating the parameter covariance

- Before making inference, the covariance of $\sqrt{n}\,\hat{\beta}$ must be estimated

Theorem (Asymptotic Covariance Consistency)
Under the large sample assumptions,
$$\hat{\Sigma}_{XX} = n^{-1} X'X \stackrel{p}{\to} \Sigma_{XX}$$
$$\hat{S} = n^{-1} \sum_{i=1}^{n} \hat{\epsilon}_i^2\, x_i' x_i = n^{-1} X' \hat{E} X \stackrel{p}{\to} S$$
$$\hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1} \stackrel{p}{\to} \Sigma_{XX}^{-1} S\, \Sigma_{XX}^{-1}$$
where $\hat{E} = \mathrm{diag}(\hat{\epsilon}_1^2, \ldots, \hat{\epsilon}_n^2)$.
Bootstrap Estimation of Parameter Covariance

Two common bootstraps for estimating the regression parameter covariance:

1. Residual bootstrap
   - Appropriate when data are conditionally homoskedastic
   - Selects $x_i$ and $\hat{\epsilon}_i$ separately when constructing the bootstrap $\tilde{y}_i$
2. Non-parametric bootstrap
   - Works under more general conditions
   - Resamples $\{y_i, x_i\}$ as a pair

Both are for data where the errors are not cross-sectionally correlated.
The Bootstrap: Algorithm for homoskedastic data

Algorithm (Residual Bootstrap Regression Covariance)
1. Generate two sets of $n$ uniform integers $\{u_{1,i}\}_{i=1}^n$ and $\{u_{2,i}\}_{i=1}^n$ on $[1, 2, \ldots, n]$.
2. Construct a simulated sample $\tilde{y}_i = x_{u_{1,i}} \hat{\beta} + \hat{\epsilon}_{u_{2,i}}$.
3. Estimate the parameters of interest by regressing $\tilde{y}_i$ on $x_{u_{1,i}}$, and denote the estimate $\tilde{\beta}_b$.
4. Repeat steps 1 through 3 a total of $B$ times.
5. Estimate the variance of $\hat{\beta}$ using
$$\widehat{V}[\hat{\beta}] = B^{-1} \sum_{b=1}^{B} (\tilde{\beta}_b - \hat{\beta})(\tilde{\beta}_b - \hat{\beta})' \quad \text{or} \quad B^{-1} \sum_{b=1}^{B} (\tilde{\beta}_b - \bar{\tilde{\beta}})(\tilde{\beta}_b - \bar{\tilde{\beta}})'$$
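A minimal sketch of the residual bootstrap in Python (numpy only; the function name, the default $B$, and the data used below are illustrative assumptions, not from the slides):

```python
import numpy as np

def residual_bootstrap_cov(y, X, B=500, seed=0):
    """Residual bootstrap covariance of the OLS estimator (homoskedastic case).

    Rows of X and residuals are resampled with *separate* index sets,
    following steps 1-5 of the algorithm above.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    betas = np.empty((B, k))
    for b in range(B):
        u1 = rng.integers(0, n, n)          # indices for the regressors
        u2 = rng.integers(0, n, n)          # separate indices for residuals
        y_star = X[u1] @ beta + resid[u2]   # simulated sample
        betas[b] = np.linalg.lstsq(X[u1], y_star, rcond=None)[0]
    dev = betas - beta                      # deviations from the full-sample estimate
    return dev.T @ dev / B
```

Note this returns an estimate of $V[\hat{\beta}]$ directly, using the first of the two variance formulas above.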
The Bootstrap: Algorithm for heteroskedastic data

Algorithm (Nonparametric Bootstrap Regression Covariance)
1. Generate a set of $n$ uniform integers $\{u_i\}_{i=1}^n$ on $[1, 2, \ldots, n]$.
2. Construct a simulated sample $\{y_{u_i}, x_{u_i}\}$.
3. Estimate the parameters of interest by regressing $y_{u_i}$ on $x_{u_i}$, and denote the estimate $\tilde{\beta}_b$.
4. Repeat steps 1 through 3 a total of $B$ times.
5. Estimate the variance of $\hat{\beta}$ using
$$\widehat{V}[\hat{\beta}] = B^{-1} \sum_{b=1}^{B} (\tilde{\beta}_b - \hat{\beta})(\tilde{\beta}_b - \hat{\beta})' \quad \text{or} \quad B^{-1} \sum_{b=1}^{B} (\tilde{\beta}_b - \bar{\tilde{\beta}})(\tilde{\beta}_b - \bar{\tilde{\beta}})'$$
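The pairs version differs only in resampling $\{y_i, x_i\}$ jointly; a sketch under the same illustrative assumptions as the previous block:

```python
import numpy as np

def pairs_bootstrap_cov(y, X, B=500, seed=0):
    """Nonparametric (pairs) bootstrap covariance of the OLS estimator.

    A single index set keeps each y_i attached to its x_i, which is why
    the procedure remains valid under heteroskedasticity.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    betas = np.empty((B, k))
    for b in range(B):
        u = rng.integers(0, n, n)           # one index set: (y_i, x_i) stay paired
        betas[b] = np.linalg.lstsq(X[u], y[u], rcond=None)[0]
    dev = betas - beta
    return dev.T @ dev / B
```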
Hypothesis Testing
Elements of a hypothesis test

Definition (Null Hypothesis)
The null hypothesis, denoted $H_0$, is a statement about the population values of some parameters to be tested. The null hypothesis is also known as the maintained hypothesis.

- The null is important because it determines the conditions under which the distribution of $\hat{\beta}$ must be known

Definition (Alternative Hypothesis)
The alternative hypothesis, denoted $H_1$, is a complementary hypothesis to the null and determines the range of values of the population parameter that should lead to rejection of the null.

- The alternative is important because it determines the conditions where the null should be rejected
- Example: $H_0: \beta_{Market} = 0$, $H_1: \beta_{Market} > 0$ or $H_1: \beta_{Market} \neq 0$
Elements of a hypothesis test

Definition (Hypothesis Test)
A hypothesis test is a rule that specifies the values where $H_0$ should be rejected in favor of $H_1$.

- The test embeds a test statistic and a rule which determines if $H_0$ can be rejected
- Note: Failing to reject the null does not mean the null is accepted.

Definition (Critical Value)
The critical value for an $\alpha$-sized test, denoted $C_\alpha$, is the value where a test statistic, $T$, indicates rejection of the null hypothesis when the null is true.

- The CV is the value where the null is just rejected
- The CV is usually a point, although it can be a set
Elements of a hypothesis test

Definition (Rejection Region)
The rejection region is the region where $T > C_\alpha$.

Definition (Type I Error)
A Type I error is the event that the null is rejected when the null is actually valid.

- Controlling the Type I error is the basis of frequentist testing
- Note: Occurs only when the null is true

Definition (Size)
The size or level of a test, denoted $\alpha$, is the probability of rejecting the null when the null is true. The size is also the probability of a Type I error.

- The size represents the tolerance for being wrong and rejecting a true null
Elements of a hypothesis test

Definition (Type II Error)
A Type II error is the event that the null is not rejected when the alternative is true.

- A Type II error occurs when the null is not rejected when it should be

Definition (Power)
The power of the test is the probability of rejecting the null when the alternative is true. The power is equivalently defined as 1 minus the probability of a Type II error.

- High power tests can discriminate between the null and the alternative with a relatively small amount of data
Type I & II Errors, Size and Power

Size and power can be related to correct and incorrect decisions:

| Truth      | Do not reject $H_0$ | Reject $H_0$        |
|------------|---------------------|---------------------|
| $H_0$ true | Correct             | Type I Error (Size) |
| $H_1$ true | Type II Error       | Correct (Power)     |
Hypothesis testing in regressions

- Distribution theory allows for inference
- Hypothesis: $H_0: R(\beta) = 0$
  - $R(\cdot)$ is a function from $\mathbb{R}^k \to \mathbb{R}^m$, $m \le k$
  - All equality hypotheses can be written this way, e.g. $H_0: (\beta_1 - 1)(\beta_2 - 1) = 0$, i.e. $\beta_1 \beta_2 - \beta_1 - \beta_2 + 1 = 0$
- Linear Equality Hypotheses (LEH): $H_0: R\beta - r = 0$, or in long hand
$$\sum_{j=1}^{k} R_{i,j}\, \beta_j = r_i, \quad i = 1, 2, \ldots, m$$
  - $R$ is an $m$ by $k$ matrix
  - $r$ is an $m$ by 1 vector
- Attention is limited to linear hypotheses in this chapter
- Nonlinear hypotheses are examined in the GMM notes
What is a linear hypothesis?

3-Factor FF model: $BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \epsilon_i$

- $H_0: \beta_2 = 0$ [Market Neutral]
  $R = [0\ \ 1\ \ 0\ \ 0]$, $r = 0$
- $H_0: \beta_2 + \beta_3 = 1$
  $R = [0\ \ 1\ \ 1\ \ 0]$, $r = 1$
- $H_0: \beta_3 = \beta_4 = 0$ [CAPM with nonzero intercept]
  $R = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$, $r = [0\ \ 0]'$
- $H_0: \beta_1 = 0,\ \beta_2 = 1,\ \beta_2 + \beta_3 + \beta_4 = 1$
  $R = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$, $r = [0\ \ 1\ \ 1]'$
Estimating linear regressions subject to LEH

Linear regressions subject to linear equality constraints can always be estimated directly using a transformed regression.

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \epsilon_i$$

$H_0: \beta_1 = 0,\ \beta_2 = 1,\ \beta_2 + \beta_3 + \beta_4 = 1$

- $\beta_1 = 0$ and $\beta_2 = 1$ directly; substituting $\beta_2 = 1$ into the final restriction gives $\beta_3 + \beta_4 = 0$, so $\beta_4 = -\beta_3$

Combine to produce the restricted model:
$$BH_i^e = VWM_i^e + \beta_3 SMB_i - \beta_3 HML_i + \epsilon_i$$
$$BH_i^e - VWM_i^e = \beta_3 (SMB_i - HML_i) + \epsilon_i$$
$$\tilde{r}_i = \beta_3\, \tilde{r}_i^P + \epsilon_i$$
Three Major Categories of Tests

- Wald
  - Directly tests the magnitude of $R\beta - r$
  - The t-test is a special case
  - Estimation only under the alternative (unrestricted model)
- Lagrange Multiplier (LM)
  - Also known as the score test or Rao test
  - Tests how close the sum of squared errors is to a minimum if the null is true
  - Estimation only under the null (restricted model)
- Likelihood Ratio (LR)
  - Tests the magnitude of the log-likelihood difference between the null and alternative
  - Invariant to reparameterization, which is a good thing!
  - Estimation under both null and alternative
  - Close to LM in the asymptotic framework
Visualizing the three tests

[Figure: the sum of squared errors $(y - X\beta)'(y - X\beta)$ plotted against $\beta$, showing the Wald test as the distance of $\hat{\beta}$ from the restriction $R\beta - r = 0$, the Lagrange Multiplier test as the slope $-2X'(y - X\beta)$ at the restricted estimate, and the Likelihood Ratio test as the difference in objective values.]
t-tests

- Single linear hypothesis: $H_0: R\beta = r$
- From $\sqrt{n}(\hat{\beta} - \beta) \stackrel{d}{\to} N(0, \Sigma_{XX}^{-1} S \Sigma_{XX}^{-1})$, under the null $H_0: R\beta = r$,
$$\sqrt{n}(R\hat{\beta} - r) \stackrel{d}{\to} N(0, R\, \Sigma_{XX}^{-1} S\, \Sigma_{XX}^{-1} R')$$
- Transform to a standard normal random variable:
$$z = \frac{\sqrt{n}(R\hat{\beta} - r)}{\sqrt{R\, \Sigma_{XX}^{-1} S\, \Sigma_{XX}^{-1} R'}}$$
  - Infeasible: depends on the unknown covariance
- Construct a feasible version using the estimate:
$$t = \frac{\sqrt{n}(R\hat{\beta} - r)}{\sqrt{R\, \hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1} R'}}$$
  - The denominator is the estimated variance of $\sqrt{n}\, R\hat{\beta}$
- Note: The asymptotic distribution is unaffected since the covariance estimator is consistent
t-test and t-stat

- Unique property of t-tests: they easily test one-sided alternatives
  - $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 > 0$
  - More powerful if you know the sign (e.g. risk premia)
- The t-stat of a coefficient $\hat{\beta}_k$ is a test of $H_0: \beta_k = 0$ against $H_1: \beta_k \neq 0$:
$$t = \frac{\sqrt{n}\, \hat{\beta}_k}{\sqrt{\left[ \hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1} \right]_{kk}}}$$
- The single most common statistic, reported for nearly every coefficient
Distribution and rejection region

[Figure: the standard normal density of $(\hat{\beta} - \beta_0)/\mathrm{se}(\hat{\beta})$, with the rejection region marked at $z = 1.645$ for a 95% one-sided (upper) test and at $z = \pm 1.960$ for a 95% two-sided test.]
Implementing a t-Test

Algorithm (t-test)
1. Estimate the unrestricted model $y_i = x_i \beta + \epsilon_i$
2. Estimate the parameter covariance using $\hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1}$
3. Construct the restriction matrix, $R$, and the value of the restriction, $r$, from the null
4. Compute
$$t = \frac{\sqrt{n}(R\hat{\beta}_n - r)}{\sqrt{v}}, \quad v = R\, \hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1} R'$$
5. Make a decision ($C_\alpha$ is the upper-tail $\alpha$-CV from $N(0,1)$):
   a. One-sided upper: reject the null if $t > C_\alpha$
   b. One-sided lower: reject the null if $t < -C_\alpha$
   c. Two-sided: reject the null if $|t| > C_{\alpha/2}$
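A sketch of this algorithm in Python for a single restriction, using the heteroskedasticity-robust covariance from the earlier slides (the function name and the synthetic data are illustrative assumptions):

```python
import math
import numpy as np

def t_test(y, X, R, r):
    """Large-sample t-test of H0: R beta = r for a single restriction.

    R is a length-k vector and r a scalar; returns the t-stat and the
    two-sided p-value from the standard normal limit.
    """
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    eps = y - X @ beta
    Sxx_inv = np.linalg.inv(X.T @ X / n)       # Sigma_XX hat, inverted
    S = (X * eps[:, None] ** 2).T @ X / n      # S hat = n^-1 sum eps_i^2 x_i'x_i
    V = Sxx_inv @ S @ Sxx_inv                  # avar of sqrt(n) beta_hat
    t = math.sqrt(n) * (float(R @ beta) - r) / math.sqrt(float(R @ V @ R))
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))
    return t, p
```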
t-tests on the FF Portfolios

Testing the coefficients on the $BH^e$ portfolio; all have $H_0: \beta_j = 0$ and $H_1: \beta_j \neq 0$, with heteroskedasticity-robust standard errors.

| Parameter | $\hat{\beta}$ | S.E.  | t-stat | p-val |
|-----------|--------|-------|--------|-------|
| Constant  | -0.064 | 0.042 | -1.518 | 0.129 |
| $VWM^e$   | 1.077  | 0.011 | 97.013 | 0.000 |
| SMB       | 0.019  | 0.024 | 0.771  | 0.440 |
| HML       | 0.803  | 0.018 | 43.853 | 0.000 |
| UMD       | -0.040 | 0.014 | -2.824 | 0.005 |

Definition (P-value)
The p-value is the largest size ($\alpha$) where the null hypothesis cannot be rejected. The p-value can be equivalently defined as the smallest size where the null hypothesis can be rejected.

- Should be familiar with 1.645, 1.96 ($\approx$ 2) and 2.32
Wald tests

- Wald tests examine the validity of one or more equality restrictions by measuring the magnitude of $R\beta - r$
- For the same reasons as the t-test, under the null,
$$\sqrt{n}(R\hat{\beta} - r) \stackrel{d}{\to} N(0, R\, \Sigma_{XX}^{-1} S\, \Sigma_{XX}^{-1} R')$$
- Standardized and squared:
$$W = n (R\hat{\beta} - r)' \left[ R\, \Sigma_{XX}^{-1} S\, \Sigma_{XX}^{-1} R' \right]^{-1} (R\hat{\beta} - r) \stackrel{d}{\to} \chi^2_m$$
- Again, this is infeasible, so use the feasible version:
$$W = n (R\hat{\beta} - r)' \left[ R\, \hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1} R' \right]^{-1} (R\hat{\beta} - r) \stackrel{d}{\to} \chi^2_m$$
Bivariate confidence sets

[Figure: four panels of bivariate confidence sets (99%, 90% and 80% contours) for estimators with no correlation, positive correlation, negative correlation, and different variances.]
Implementing a Wald Test

Algorithm (Large Sample Wald Test)
1. Estimate the unrestricted model $y_i = x_i \beta + \epsilon_i$.
2. Estimate the parameter covariance using $\hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1}$ where
$$\hat{\Sigma}_{XX} = n^{-1} \sum_{i=1}^{n} x_i' x_i, \quad \hat{S} = n^{-1} \sum_{i=1}^{n} \hat{\epsilon}_i^2\, x_i' x_i$$
3. Construct the restriction matrix, $R$, and the value of the restriction, $r$, from the null hypothesis.
4. Compute
$$W = n (R\hat{\beta}_n - r)' \left[ R\, \hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1} R' \right]^{-1} (R\hat{\beta}_n - r).$$
5. Reject the null if $W > C_\alpha$ where $C_\alpha$ is the critical value from a $\chi^2_m$ using a size of $\alpha$.
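A sketch of this algorithm in Python (numpy only; function name and test data are illustrative):

```python
import numpy as np

def wald_test(y, X, R, r):
    """Large-sample Wald test of H0: R beta = r with the robust covariance.

    R is m-by-k and r has length m; W is asymptotically chi^2 with m dof.
    """
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    eps = y - X @ beta
    Sxx_inv = np.linalg.inv(X.T @ X / n)
    S = (X * eps[:, None] ** 2).T @ X / n
    V = Sxx_inv @ S @ Sxx_inv
    d = R @ beta - r
    return float(n * d @ np.linalg.solve(R @ V @ R.T, d))
```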
Wald tests on the FF Portfolios

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 UMD_i + \epsilon_i$$

Wald tests only fail to reject the null that the constant and the coefficient on SMB are zero.

| Null | m | W | p-val |
|------|---|---|-------|
| $\beta_j = 0,\ j = 1, \ldots, 5$ | 5 | 16303 | 0.000 |
| $\beta_j = 0,\ j = 1, 3, 4, 5$ | 4 | 2345.3 | 0.000 |
| $\beta_j = 0,\ j = 1, 5$ | 2 | 13.59 | 0.001 |
| $\beta_j = 0,\ j = 1, 3$ | 2 | 2.852 | 0.240 |
| $\beta_5 = 0$ | 1 | 7.97 | 0.005 |

Note: Nulls are "and", alternatives are "or":
$H_0: \beta_1 = 0 \cap \beta_3 = 0$, $H_1: \beta_1 \neq 0 \cup \beta_3 \neq 0$
Lagrange Multiplier (LM) tests

- LM tests examine the shadow price of the constraint (null):
$$\min_\beta\ (y - X\beta)'(y - X\beta) \quad \text{s.t.} \quad R\beta - r = 0$$
- Lagrangian: $L(\beta, \lambda) = (y - X\beta)'(y - X\beta) + (R\beta - r)'\lambda$
- If the null is true, then $\lambda \approx 0$
- FOC:
$$\frac{\partial L}{\partial \beta} = -2X'(y - X\beta) + R'\lambda = 0, \qquad \frac{\partial L}{\partial \lambda} = R\beta - r = 0$$
- A few minutes of matrix algebra later:
$$\tilde{\lambda} = 2\left[ R(X'X)^{-1}R' \right]^{-1} (R\hat{\beta} - r)$$
$$\tilde{\beta} = \hat{\beta} - (X'X)^{-1} R' \left[ R(X'X)^{-1}R' \right]^{-1} (R\hat{\beta} - r)$$
- Tildes denote estimates computed under the null (e.g. $\tilde{\beta}$)
Why LM tests are also known as score tests

- $\tilde{\lambda} = 2\left[ R(X'X)^{-1}R' \right]^{-1} (R\hat{\beta} - r)$ is just a function of normals (via $\hat{\beta}$, the OLS estimator)
- Alternatively, from the FOC, $R'\tilde{\lambda} = 2X'\tilde{\epsilon}$
  - $R$ has rank $m$, so $\tilde{\lambda} \approx 0$ exactly when $X'\tilde{\epsilon} \approx 0$
  - $\tilde{\epsilon}$ are the estimated residuals under the null
- Under the assumptions,
$$\sqrt{n}\, \bar{s} = \sqrt{n}\, n^{-1} X' \tilde{\epsilon} \stackrel{d}{\to} N(0, S)$$
- And we know how to test multivariate normals for equality to 0:
$$LM = n\, \bar{s}'\, S^{-1} \bar{s} \stackrel{d}{\to} \chi^2_m$$
- But we always have to use the feasible version:
$$LM = n\, \bar{s}'\, \tilde{S}^{-1} \bar{s} = n\, \bar{s}' \left[ n^{-1} X' \tilde{E} X \right]^{-1} \bar{s} \stackrel{d}{\to} \chi^2_m$$
- Note: $\tilde{S}$ (and $\tilde{E}$) is estimated using the errors from the restricted regression.
Implementing a LM test

Algorithm (Large Sample Lagrange Multiplier Test)
1. Form the unrestricted model, $y_i = x_i \beta + \epsilon_i$.
2. Impose the null on the unrestricted model and estimate the restricted model, yielding $\tilde{\beta}$.
3. Compute the residuals from the restricted regression, $\tilde{\epsilon}_i = y_i - x_i \tilde{\beta}$.
4. Construct the score using the residuals from the restricted regression, $\tilde{s}_i = x_i \tilde{\epsilon}_i$, where $x_i$ are the regressors from the unrestricted model.
5. Estimate the average score and the covariance of the score,
$$\bar{s} = n^{-1} \sum_{i=1}^{n} \tilde{s}_i, \quad \tilde{S} = n^{-1} \sum_{i=1}^{n} \tilde{s}_i' \tilde{s}_i \qquad (2)$$
6. Compute the LM test statistic as $LM = n\, \bar{s}\, \tilde{S}^{-1} \bar{s}'$ and compare to the critical value from a $\chi^2_m$ using a size of $\alpha$.
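A sketch of this algorithm in Python, using the standard restricted least squares formula from the previous slides for step 2 (function name and test data are illustrative):

```python
import numpy as np

def lm_test(y, X, R, r):
    """Large-sample LM test of H0: R beta = r.

    The score covariance is built from the *restricted* residuals,
    as the slide's algorithm specifies.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_u = XtX_inv @ X.T @ y
    beta_r = beta_u - XtX_inv @ R.T @ np.linalg.solve(
        R @ XtX_inv @ R.T, R @ beta_u - r)       # restricted LS estimate
    eps_r = y - X @ beta_r                       # residuals under the null
    s = X * eps_r[:, None]                       # scores s_i = x_i eps_i
    s_bar = s.mean(axis=0)
    S_tilde = s.T @ s / n                        # score covariance under null
    return float(n * s_bar @ np.linalg.solve(S_tilde, s_bar))
```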
LM tests on the FF Portfolios

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 UMD_i + \epsilon_i$$

LM broadly agrees with Wald, but note the magnitudes.

| Null | m | LM | p-val |
|------|---|----|-------|
| $\beta_j = 0,\ j = 1, \ldots, 5$ | 5 | 106 | 0.000 |
| $\beta_j = 0,\ j = 1, 3, 4, 5$ | 4 | 170.5 | 0.000 |
| $\beta_j = 0,\ j = 1, 5$ | 2 | 14.18 | 0.001 |
| $\beta_j = 0,\ j = 1, 3$ | 2 | 3.012 | 0.222 |
| $\beta_5 = 0$ | 1 | 8.38 | 0.004 |
Likelihood ratio (LR) tests

- A large sample LR test can be constructed using a test statistic that looks like the LM test
- Formally, the large-sample LR is based on testing whether the difference of the scores, evaluated at the restricted and unrestricted parameters, is large in a statistically meaningful sense
- Suppose $S$ is known; then, since the unrestricted score $\hat{s} = 0$ (why?),
$$n (\bar{s} - \hat{s})' S^{-1} (\bar{s} - \hat{s}) = n (\bar{s} - 0)' S^{-1} (\bar{s} - 0) = n\, \bar{s}'\, S^{-1} \bar{s} \stackrel{d}{\to} \chi^2_m$$
- Leads to a definition of the large-sample LR identical to the LM but using a different variance estimator:
$$LR = n\, \bar{s}'\, \hat{S}^{-1} \bar{s} \stackrel{d}{\to} \chi^2_m$$
- Note: $\hat{S}$ (and $\hat{E}$) is estimated using the errors from the unrestricted regression.
- $\hat{S}$ is estimated under the alternative and $\tilde{S}$ is estimated under the null
- $\hat{S}$ is usually smaller than $\tilde{S}$, so LR is usually larger than LM
Implementing a LR test

Algorithm (Large Sample Likelihood Ratio Test)
1. Estimate the unrestricted model $y_i = x_i \beta + \epsilon_i$.
2. Impose the null on the unrestricted model and estimate the restricted model, yielding $\tilde{\beta}$.
3. Compute the residuals from the restricted regression, $\tilde{\epsilon}_i = y_i - x_i \tilde{\beta}$, and from the unrestricted regression, $\hat{\epsilon}_i = y_i - x_i \hat{\beta}$.
4. Construct the scores from both models, $\tilde{s}_i = x_i \tilde{\epsilon}_i$ and $\hat{s}_i = x_i \hat{\epsilon}_i$, where in both cases $x_i$ are the regressors from the unrestricted model.
5. Estimate the average score and the covariance of the score,
$$\bar{s} = n^{-1} \sum_{i=1}^{n} \tilde{s}_i, \quad \hat{S} = n^{-1} \sum_{i=1}^{n} \hat{s}_i' \hat{s}_i \qquad (3)$$
6. Compute the LR test statistic as $LR = n\, \bar{s}\, \hat{S}^{-1} \bar{s}'$ and compare to the critical value from a $\chi^2_m$ using a size of $\alpha$.
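A sketch of the large-sample LR in Python; it mirrors the LM statistic but estimates the score covariance from the unrestricted residuals (function name and test data are illustrative):

```python
import numpy as np

def lr_test(y, X, R, r):
    """Large-sample LR test of H0: R beta = r.

    Identical to the LM statistic except that the score covariance is
    built from the *unrestricted* residuals, per the slide's algorithm.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_u = XtX_inv @ X.T @ y
    beta_r = beta_u - XtX_inv @ R.T @ np.linalg.solve(
        R @ XtX_inv @ R.T, R @ beta_u - r)
    eps_r = y - X @ beta_r                  # restricted residuals -> score
    eps_u = y - X @ beta_u                  # unrestricted residuals -> S hat
    s_bar = (X * eps_r[:, None]).mean(axis=0)
    su = X * eps_u[:, None]
    S_hat = su.T @ su / n
    return float(n * s_bar @ np.linalg.solve(S_hat, s_bar))
```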
Likelihood ratio (LR) tests (Classical Assumptions)

- Warning: The LR on this slide critically relies on homoskedasticity and normality
- If the null is close to the alternative, the log-likelihood should be similar under both:
$$LR = -2 \ln \left[ \frac{\max_{\beta, \sigma^2} f(y|X; \beta, \sigma^2) \ \text{subject to}\ R\beta = r}{\max_{\beta, \sigma^2} f(y|X; \beta, \sigma^2)} \right]$$
- A little simple algebra later:
$$LR = n \ln \left( \frac{SSE_R}{SSE_U} \right) = n \ln \left( \frac{s_R^2}{s_U^2} \right)$$
- In the classical setup, the exact distribution of LR is
$$\frac{n - k}{m} \left[ \exp\left( \frac{LR}{n} \right) - 1 \right] \sim F_{m, n-k}$$
- Although $m\, F_{m, n-k} \stackrel{d}{\to} \chi^2_m$ as $n \to \infty$
LR tests on the FF Portfolios

$$BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 UMD_i + \epsilon_i$$

LR tests agree with the other two but are closer to the Wald in magnitude.

| Null | m | LR | p-val |
|------|---|----|-------|
| $\beta_j = 0,\ j = 1, \ldots, 5$ | 5 | 16303 | 0.000 |
| $\beta_j = 0,\ j = 1, 3, 4, 5$ | 4 | 10211.4 | 0.000 |
| $\beta_j = 0,\ j = 1, 5$ | 2 | 13.99 | 0.001 |
| $\beta_j = 0,\ j = 1, 3$ | 2 | 3.088 | 0.213 |
| $\beta_5 = 0$ | 1 | 8.28 | 0.004 |
Comparing the three tests

- Three tests; which to choose?
- Asymptotically, all are equivalent
- Rule of thumb: $W \approx LR > LM$, since W and LR use errors estimated under the alternative
  - Larger test statistics are good since all have the same distribution, hence more power
- If derived from MLE (classical assumptions: normality, homoskedasticity), an exact relationship holds: $W \geq LR \geq LM$
- In some contexts (not linear regression) ease of estimation is a useful criterion to prefer one test over the others:
  - Easy estimation under the null: LM
  - Easy estimation under the alternative: Wald
  - Easy to estimate both: LR or Wald
Comparing the three

[Figure: repeat of the earlier diagram showing the sum of squared errors $(y - X\beta)'(y - X\beta)$ as a function of $\beta$, with the Wald, Lagrange Multiplier, and Likelihood Ratio tests marked relative to $R\beta - r = 0$ and $\hat{\beta}$.]
Heteroskedasticity
Heteroskedasticity

- Heteroskedasticity: hetero (different) + skedannymi (to scatter)
- Heteroskedasticity is pervasive in financial data
- The usual covariance estimator (previously given) allows for heteroskedasticity of unknown form
  - Tempting to always use the heteroskedasticity-robust covariance estimator
  - Also known as White's covariance (or Eicker-Huber-White) estimator
- Its finite sample properties are generally worse if data are homoskedastic
  - If data are homoskedastic, can use a simpler estimator
- Formally, need
$$E\left[ \epsilon_i^2\, x_{j,i}\, x_{l,i} \right] = E\left[ \epsilon_i^2 \right] E\left[ x_{j,i}\, x_{l,i} \right]$$
for $i = 1, 2, \ldots, n$ and $j, l = 1, 2, \ldots, k$ to justify the simpler estimator
Estimating the parameter covariance (Homoskedasticity)

Theorem (Homoskedastic CLT)
Under the large sample assumptions, and if the errors are homoskedastic,
$$\sqrt{n}(\hat{\beta}_n - \beta) \stackrel{d}{\to} N(0, \sigma^2 \Sigma_{XX}^{-1})$$
where $\Sigma_{XX} = E[x_i' x_i]$ and $\sigma^2 = V[\epsilon_i]$.

Theorem (Homoskedastic Covariance Estimator)
Under the large sample assumptions, and if the errors are homoskedastic,
$$\hat{\sigma}^2\, \hat{\Sigma}_{XX}^{-1} \stackrel{p}{\to} \sigma^2\, \Sigma_{XX}^{-1}$$

- Homoskedasticity justifies the usual estimator $\hat{\sigma}^2 \left( n^{-1} X'X \right)^{-1}$
- When using financial data this is the unusual estimator
Testing for heteroskedasticity

- Two covariance estimators: $\hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1}$ and $\hat{\sigma}^2\, \hat{\Sigma}_{XX}^{-1}$
- White's covariance estimator has worse finite sample properties
  - Should be avoided if homoskedasticity is plausible
- White's test: $\hat{\epsilon}_i^2 = z_i \gamma + \eta_i$
  - $z_i$ consists of all cross products of $x_{j,i}$ and $x_{l,i}$
  - LM test that all coefficients on parameters (except the constant) are zero:
    $H_0: \gamma_2 = \gamma_3 = \ldots = \gamma_{k(k+1)/2} = 0$
- White's test on the $BH^e$ portfolio returns in the CAPM:
$$\hat{\epsilon}_i^2 = \gamma_1 + \gamma_2 VWM_i^e + \gamma_3 VWM_i^{e2} + \eta_i$$

|               | $\gamma_1$ | $\gamma_2$ | $\gamma_3$ |
|---------------|------|------|-------|
| $\hat{\gamma}$ | 3.32 | 0.93 | 0.21  |
| t-stat        | 2.89 | 4.63 | 18.28 |

- LM test: 274.9, p-val 0.000, so strong evidence of heteroskedasticity
Implementing White s Test for Heteroskedasticity Algorithm (White s Test for Heteroskedasticity) 1. Fit the model y i = x i + i 2. Construct the fit residuals ˆ i = y i x i ˆ 3. Construct the auxiliary regressors z i where the k(k + 1)/2 elements of z i are computed from x i,o x i,p for o = 1, 2,...,k, p = o, o + 1,...,k. 4. Estimate the auxiliary regression ˆ 2 i = z i + i 5. Compute White s Test statistic as nr 2 where the R 2 is from the auxiliary regression and compare to the critical value at size from a 2 k(k+1)/2 1. Note: This algorithm assumes the model contains a constant. If the original model does not contain a constant, then z i should be augmented with a constant, and the asymptotic distribution is a 2 k(k+1)/2. 41 / 64
Pitfalls and Problems
Problems with models

What happens when the assumptions are violated?

- Model misspecified
  - Omitted variables
  - Extraneous variables
  - Functional form
- Heteroskedasticity
- Too few moments
- Errors correlated with regressors
  - Rare in asset pricing
Not enough moments

Too few moments causes problems for both $\hat{\beta}$ and t-stats:

- Consistency requires 2 moments for $x_i$, 1 for $\epsilon_i$
- Consistent estimation of the variance requires 4 moments of $x_i$ and 2 of $\epsilon_i$
- Fewer than 2 moments of $x_i$:
  - Slopes can still be consistent
  - Intercepts cannot
- Fewer than 1 for $\epsilon_i$: $\hat{\beta}$ is inconsistent. Too much noise!
- Between 2 and 4 moments of $x_i$, or 1 and 2 of $\epsilon_i$: tests are inconsistent
Omitted Variables

What if the linearity assumption is violated?

- Omitted variables
- Correct model: $y_i = x_{1,i}\, \beta_1 + x_{2,i}\, \beta_2 + \epsilon_i$
- Model estimated: $y_i = x_{1,i}\, \beta_1 + \epsilon_i$
- Can show
$$\hat{\beta}_1 \stackrel{p}{\to} \beta_1 + \delta' \beta_2, \quad \text{where } x_{2,i} = x_{1,i}\, \delta + \eta_i$$
- $\hat{\beta}_1$ captures any portion of $y_i$ explainable by $x_{1,i}$: directly through $\beta_1$, and through the correlation between $x_{1,i}$ and $x_{2,i}$
- Two cases where omitted variables do not produce bias:
  - $x_{1,i}$ and $x_{2,i}$ uncorrelated (e.g. dummy variables)
  - $\beta_2 = 0$: model correct
  - Even then, the estimated variance may be inconsistent
Extraneous Variables

- Correct model: $y_i = x_{1,i}\, \beta_1 + \epsilon_i$
- Model estimated: $y_i = x_{1,i}\, \beta_1 + x_{2,i}\, \beta_2 + \epsilon_i$
- Can show: $\hat{\beta}_1 \stackrel{p}{\to} \beta_1$
- No problem, right?
  - Including extraneous regressors increases parameter uncertainty
  - Excluding marginally relevant regressors reduces parameter uncertainty but increases the chance the model is misspecified
- Bias-variance trade-off
  - Smaller models reduce variance, even if introducing bias
  - Large models have less bias
- Related to model selection...
Heteroskedasticity

Common problem across most financial data sets: asset returns, firm characteristics, executive compensation.

- Solution 1: Heteroskedasticity-robust covariance estimator $\hat{\Sigma}_{XX}^{-1} \hat{S}\, \hat{\Sigma}_{XX}^{-1}$
- Partial solution 2: Use data transformations
  - Ratios: Volume vs. Turnover (Volume/Shares Outstanding)
  - Logs: Volume vs. ln Volume
    - Volume = Size $\times$ Shock, so ln Volume = ln Size + ln Shock
GLS and FGLS

- Solution 3: Generalized Least Squares (GLS)
$$\hat{\beta}_n^{GLS} = (X' W^{-1} X)^{-1} X' W^{-1} y, \quad W \text{ is } n \times n \text{ positive definite}$$
- Can choose $W$ cleverly: if $W^{-1/2} \epsilon$ is homoskedastic and uncorrelated, then $\hat{\beta}^{GLS}$ is asymptotically efficient
- $\hat{\beta}_n^{GLS} \stackrel{p}{\to} \beta$
- In practice $W$ is unknown, but can be estimated:
$$\hat{\sigma}_i^2 = z_i \gamma + \eta_i, \quad \hat{W} = \mathrm{diag}(z_i \hat{\gamma})$$
- The resulting estimator is Feasible GLS (FGLS)
  - Still asymptotically efficient
  - Small sample properties are not assured and may be quite bad
- In between: use a pre-specified but potentially sub-optimal $W$
  - For example: diagonal, which ignores potential correlation
  - Requires an alternative estimator of the parameter covariance (see notes)
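A two-step FGLS sketch in Python. The choice of variance regressors `Z` and the clipping floor on the fitted variances are assumptions of this sketch, not from the slides:

```python
import numpy as np

def fgls(y, X, Z):
    """Two-step feasible GLS sketch.

    Squared OLS residuals are regressed on the user-chosen variance
    regressors Z, and the fitted values form the diagonal of W hat.
    """
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    eps2 = (y - X @ beta_ols) ** 2
    gamma = np.linalg.lstsq(Z, eps2, rcond=None)[0]
    w = np.clip(Z @ gamma, 1e-8, None)     # fitted variances, kept positive
    Xw = X / np.sqrt(w)[:, None]           # weight each row by 1/sigma_i
    yw = y / np.sqrt(w)
    return np.linalg.lstsq(Xw, yw, rcond=None)[0]
```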
Model Building
Model Building

- The "black art" of econometrics
- Many rules and procedures, most contradictory
- Always a trade-off between bias and variance in finite samples
- Better models usually have a finance or economic theory behind them
- Three steps:
  - Model selection
  - Specification checking
  - Model evaluation
    - Pseudo out-of-sample evaluation, unless on orders from your boss
Strategies

- General to Specific
  - Fit the largest specification
  - Drop the variable with the largest p-val and refit
  - Stop if all p-values indicate significance at size $\alpha$
  - $\alpha$ is the econometrician's choice
- Specific to General
  - Fit all specifications that include a single explanatory variable
  - Include the variable with the smallest p-val
  - Starting from this model, test all other variables by adding them in one at a time
  - Stop if no p-val of an excluded variable indicates significance at size $\alpha$
Information Criteria

- Akaike Information Criterion (AIC):
$$AIC = \ln \hat{\sigma}^2 + \frac{2k}{n}$$
- Schwarz (Bayesian) Information Criterion (SIC/BIC):
$$BIC = \ln \hat{\sigma}^2 + \frac{k \ln n}{n}$$
- Both have versions suitable for likelihood-based estimation
- Reward for better fit: reduce $\ln \hat{\sigma}^2$
- Penalty for more parameters: $2k/n$ or $k \ln n / n$
- Choose the model with the smallest IC
- AIC has a fixed penalty, allowing inclusion of extraneous variables
- BIC has a larger penalty if $\ln n > 2$ ($n > 7$)
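The two criteria above can be computed directly from the OLS fit; a minimal sketch (the function name is an assumption):

```python
import math
import numpy as np

def aic_bic(y, X):
    """AIC and BIC exactly as defined on the slide (ln sigma^2 version)."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = ((y - X @ beta) ** 2).mean()   # ML-style variance estimate
    aic = math.log(sigma2) + 2.0 * k / n
    bic = math.log(sigma2) + k * math.log(n) / n
    return aic, bic
```

Since both criteria share the fit term, adding a regressor changes BIC by $(\ln n - 2)/n$ more than AIC whenever $n > 7$.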
Cross Validation

- Use $(100 - m)$% of the data to estimate the parameters, and evaluate using the remaining $m$%
- $m = 100 k^{-1}$ in k-fold cross validation

Algorithm (k-fold cross validation)
1. For each model:
   a. Randomly divide the observations into $k$ equally-sized blocks, $S_j$, $j = 1, \ldots, k$
   b. For $j = 1, \ldots, k$, estimate $\hat{\beta}_{(j)}$ by excluding the observations in block $j$
   c. Compute the cross-validated SSE using the observations in block $j$ and $\hat{\beta}_{(j)}$:
$$SSE_{xv} = \sum_{j=1}^{k} \sum_{i \in S_j} \left( y_i - x_i \hat{\beta}_{(j)} \right)^2$$
2. Select the model with the lowest cross-validated SSE

Typical values for $k$ are 5 or 10.
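A sketch of this algorithm in Python (numpy only; function name and test data are illustrative):

```python
import numpy as np

def cv_sse(y, X, k_folds=5, seed=0):
    """k-fold cross-validated SSE for a linear regression (slide algorithm)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k_folds)
    sse = 0.0
    for j in range(k_folds):
        test = folds[j]
        train = np.concatenate([folds[i] for i in range(k_folds) if i != j])
        beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        err = y[test] - X[test] @ beta
        sse += float(err @ err)
    return sse
```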
Specification Analysis

Is a selected model any good? Test it:

- Stability test: Chow
$$y_i = x_i \beta + I_{[i > C]}\, x_i \gamma + \epsilon_i, \quad H_0: \gamma = 0$$
- Nonlinearity test: Ramsey's RESET
$$y_i = x_i \beta + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + \ldots + \gamma_{L-1} \hat{y}_i^L + \epsilon_i, \quad H_0: \gamma = 0$$
- Recursive and/or rolling estimation
- Influence function
  - Influence: $x_i (X'X)^{-1} x_i'$, a normalized length of $x_i$
- Normality tests: Jarque-Bera
$$JB = n \left( \frac{sk^2}{6} + \frac{(\kappa - 3)^2}{24} \right)$$
Implementing Chow & RESET Tests

Algorithm (Chow Test)
1. Estimate the model $y_i = x_i \beta + I_{[i > C]}\, x_i \gamma + \epsilon_i$.
2. Test the null $H_0: \gamma = 0$ against the alternative $H_1: \gamma_i \neq 0$, for some $i$, using a Wald, LM or LR test.

Algorithm (RESET Test)
1. Estimate the model $y_i = x_i \beta + \epsilon_i$ and construct the fit values, $\hat{y}_i = x_i \hat{\beta}$.
2. Re-estimate the model $y_i = x_i \beta + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + \ldots + \epsilon_i$.
3. Test the null $H_0: \gamma_1 = \gamma_2 = \ldots = \gamma_m = 0$ against the alternative $H_1: \gamma_i \neq 0$, for some $i$, using a Wald, LM or LR test, all of which have a $\chi^2_m$ distribution.
$BH^e$ residual plot in 4-factor model

[Figure: time series of the residuals from the 4-factor model for the $BH^e$ portfolio, 1930-2000.]
Specification Analysis: The $BH^e$ model

Chow and RESET tests; dependent variable $BH_i^e$.

Chow regression:

| Variable | Coefficient | P-val |
|----------|-------------|-------|
| Constant | -0.04 | 0.304 |
| $VWM_i^e$ | 1.06 | 0.000 |
| $SMB_i$ | 0.01 | 0.743 |
| $HML_i$ | 0.87 | 0.000 |
| $UMD_i$ | -0.03 | 0.260 |
| $VWM_i^e \times I_{[YR \geq 1947]}$ | 0.00 | 0.961 |
| $SMB_i \times I_{[YR \geq 1947]}$ | -0.01 | 0.773 |
| $HML_i \times I_{[YR \geq 1947]}$ | -0.11 | 0.016 |
| $UMD_i \times I_{[YR \geq 1947]}$ | -0.00 | 0.773 |

Chow statistic: 9.73, p-val 0.045 ($\chi^2_4$)

RESET regression:

| Variable | Coefficient | P-val |
|----------|-------------|-------|
| Constant | -0.093 | 0.040 |
| $VWM_i^e$ | 1.079 | 0.000 |
| $SMB_i$ | 0.016 | 0.506 |
| $HML_i$ | 0.813 | 0.000 |
| $UMD_i$ | -0.038 | 0.009 |
| $\hat{y}_i^2$ | 0.000 | 0.282 |
| $\hat{y}_i^3$ | -0.000 | 0.691 |

RESET statistic: 1.52, p-val 0.466 ($\chi^2_2$)

Normality: Jarque-Bera 1828.7, p-val 0.00 ($\chi^2_2$); kurtosis $\approx$ 10, p-val 0.00 ($\chi^2_1$); skewness $\approx$ 1, p-val 0.00 ($\chi^2_1$)
Outliers

Outliers happen for a number of reasons: data entry errors, funds blowing up, hyper-inflation.

- Often interested in results which are robust to some outliers
- Three common options:
  - Trimming
  - Winsorization
  - (Iteratively) Reweighted Least Squares (IRWLS)
    - Similar to GLS, only using weights based on the outlyingness of the error
Trimming

- Trimming involves removing observations
- Removal must be based on the values of $\epsilon_i$, not $y_i$
  - Removal based on $y_i$ can lead to bias
- Requires an initial estimate of $\beta$, denoted $\bar{\beta}$
  - Could use the full sample, but this is sensitive to outliers, especially if extreme
  - Use a subsample that you believe is good
  - Or choose subsamples at random and use a typical value
- Construct residuals $\bar{\epsilon}_i = y_i - x_i \bar{\beta}$ and delete observations if $\bar{\epsilon}_i < \hat{q}_\alpha$ or $\bar{\epsilon}_i > \hat{q}_{1-\alpha}$ for some small $\alpha$ (typically 2.5% or 5%)
  - $\hat{q}_\alpha$ is the $\alpha$-quantile of the empirical distribution of $\bar{\epsilon}_i$
- Estimate the final $\hat{\beta}$ using OLS on the remaining (non-trimmed) data
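A sketch of trimming in Python. The preliminary estimate here is full-sample OLS, which is one of the options listed above but is sensitive to extreme outliers; function name and test data are illustrative:

```python
import numpy as np

def trimmed_ols(y, X, alpha=0.05):
    """Trimmed OLS: drop observations whose preliminary residuals fall
    outside the [alpha, 1 - alpha] empirical quantiles, then re-fit."""
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # preliminary estimate
    eps = y - X @ beta0
    lo, hi = np.quantile(eps, [alpha, 1.0 - alpha])
    keep = (eps >= lo) & (eps <= hi)               # trim on residuals, not y
    return np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
```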
Correct and Incorrect Trimming

[Figure: two scatter plots contrasting correct trimming (based on residuals) with incorrect trimming (based on $y_i$), which leads to bias.]
Winsorization

- Winsorization involves replacing outliers with less outlying observations
- Like trimming, replacement must be based on the values of $\epsilon_i$, not $y_i$
- Requires an initial estimate of $\beta$, denoted $\bar{\beta}$
- Construct residuals $\bar{\epsilon}_i = y_i - x_i \bar{\beta}$
- Reconstruct $y_i$ as
$$y_i^\star = \begin{cases} x_i \bar{\beta} + \hat{q}_\alpha & \bar{\epsilon}_i < \hat{q}_\alpha \\ y_i & \hat{q}_\alpha \le \bar{\epsilon}_i \le \hat{q}_{1-\alpha} \\ x_i \bar{\beta} + \hat{q}_{1-\alpha} & \bar{\epsilon}_i > \hat{q}_{1-\alpha} \end{cases}$$
- Estimate the final $\hat{\beta}$ using OLS on the reconstructed data
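A sketch of this procedure in Python; the piecewise formula above reduces to clipping the preliminary residuals. The full-sample preliminary fit, function name, and test data are assumptions of this sketch:

```python
import numpy as np

def winsorized_ols(y, X, alpha=0.05):
    """Winsorized OLS: cap preliminary residuals at the alpha and
    1 - alpha empirical quantiles, rebuild y, and re-fit by OLS."""
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # preliminary estimate
    eps = y - X @ beta0
    lo, hi = np.quantile(eps, [alpha, 1.0 - alpha])
    y_star = X @ beta0 + np.clip(eps, lo, hi)      # the piecewise formula
    return np.linalg.lstsq(X, y_star, rcond=None)[0]
```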
Correct and Incorrect Winsorization

[Figure: two scatter plots contrasting correct winsorization (based on residuals) with incorrect winsorization (based on $y_i$), which leads to bias.]
Rolling and Recursive Regressions

- Parameter stability is often an important concern
- Rolling regressions are an easy method to examine parameter stability:
$$\hat{\beta}_j = \left( \sum_{i=j}^{j+m} x_i' x_i \right)^{-1} \left( \sum_{i=j}^{j+m} x_i' y_i \right), \quad j = 1, 2, \ldots, n - m$$
- Constructing confidence intervals formally is difficult
  - An approximate method computes the full sample covariance matrix and then scales it by $n/m$ to reflect the smaller sample used
  - Similar to building a confidence interval under the null that the parameters are constant
- Recursive regression is defined similarly, only using an expanding window:
$$\hat{\beta}_j = \left( \sum_{i=1}^{j} x_i' x_i \right)^{-1} \left( \sum_{i=1}^{j} x_i' y_i \right), \quad j = m, m+1, \ldots, n$$
- Similar issues with confidence intervals
- Often hard to see variation near the end of the sample if $n$ is large
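A sketch of rolling-window estimation in Python (numpy only; function name and test data are illustrative):

```python
import numpy as np

def rolling_betas(y, X, m):
    """Rolling-window OLS: one coefficient vector per window of m observations."""
    n, k = X.shape
    out = np.empty((n - m + 1, k))
    for j in range(n - m + 1):
        out[j] = np.linalg.lstsq(X[j:j + m], y[j:j + m], rcond=None)[0]
    return out
```

The recursive version would replace the fixed window `X[j:j + m]` with the expanding window `X[:j]` for `j = m, ..., n`.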
Rolling Estimates of $\hat{\beta}$

[Figure: rolling estimates of the coefficients on $VWM^e$, SMB, HML and UMD, 1930-2000.]
Recursive Estimates of $\hat{\beta}$

[Figure: recursive estimates of the coefficients on $VWM^e$, SMB, HML and UMD, 1930-2000.]
Influence Function, $BH^e$ on 4 factors

[Figure: time series of the influence measure $x_i (X'X)^{-1} x_i'$, roughly between 0.02 and 0.16, 1930-2000.]