Analysis of Cross-Sectional Data

Size: px

Start display at page:

Download "Analysis of Cross-Sectional Data"

Johnathan Waters
6 years ago
Views:

1 Analysis of Cross-Sectional Data Kevin Sheppard Oxford MFE This version: October 30, 2017 November 6, 2017

2 Outline Econometric models Specification that can be analyzed with linear regression Factor models for returns Estimation of unknown parameters Assessing model fit The relationship between OLS and ML Testing Hypotheses: Wald, LR and LM Model building Checking for assumption violation 2 / 61

3 Basic Notation y i = β 1 x 1,i + β 2 x 2,i β k x k,i + ε i, yi : Regressand, Dependant Variable, LHS Variable xj,i : Regressor, also Independent Variable, RHS Variable, Explanatory Variable εi : Innovation, also Shock, Error or Disturbance n observations, indexed i = 1, 2,..., n k regressors, indexed j = 1, 2,..., k Usually use matrix notation y: n 1 X: n k β: k 1 y = Xβ + ε ε: n 1 3 / 61

4 More Notation Row form: y i = x i β + ε i Column form: Throughout the notes and slides: y i = β 1 x 1 + β 2 x β k x k + ε i Standard math notation indicates a scalar y i, x i, β, ε i Lower case bold math indicates a vector y, x i, ε, β Upper case bold math indicates a matrix X, A, Γ, Σ 4 / 61

5 What is a linear regression? Many specifications can be examined using the tools of linear regression y i = βx i + ε i Two key requirements Additive error One multiplicative parameter per term Examples: Polynomials yi = β 1 x i + β 2 xi 2 + ε i Level shifts y i = β 1 x i + β 2 x i I [xi >κ] + ε i I[xi >κ] is an indicator variable that takes the value 1 or 0 I[xi >κ] = 1 if x i > κ Non-linear relationships y i = β 1 sin x i + β 2 ln x i + ε i 5 / 61

6 What cannot be analyzed as a linear regression? Non-separable parameters yi = β1x β2 i Lots of solutions: Non-linear least squares GMM Maximum Likelihood Note that yi = β 1 x β 2 i ε i is a LR! + ε i Requires nonnegativity of yi and x i ln y i = ln β 1 + β 2 ln x i + ln ε i ỹ i = β 1 + β 2 x i + ε i y t = GARCH σt 2ε t σ 2 t = ω + αy2 t 1 + βσ2 t 1 6 / 61

7 Regression Coefficient Interpretation Ceteris Paribus Not usually applicable Holding other (included) variables constant More reasonable On average, In nonlinear (in x i ) specifications y i = β 1 x 1,i + β 2 x 2,i β k x k,i β k y i x k,i ln y i = β ln x i β y i x i x i y i = E y,x y i = β 1 x i + β 2 x 2 i β 1 + 2β 2 x i y i x i 7 / 61

8 What is a model? Pretty hard question. Not to be resolved today. Two competing views Data generating process (DGP) Model taken as literal Simpler to think about Implausible for nearly everything we do Approximation to probability law (a.k.a. distribution) All models are misspecified, but... Even misspecified models can aid in understanding important relationships Reduces reality to tractable problem Some caution is needed My favorite example: GARCH model y t = σt 2ε t σ 2 t = ω + αy2 t 1 + βσ2 t 1 Relates today s variance to yesterday s variance and the squared return 8 / 61

9 Example: Approximate Factor Models Factor models are widely used in finance Capital Asset Pricing Model (CAP-M) Arbitrage Pricing (APT) Risk Exposure Basic specification r i = f i β + ε i ri : Return on dependent asset, often excess (ri e) fi : 1 k vector of factor innovations ε i innovation, corr(ε i, f j,i )=0, j = 1, 2,..., k Example: CAP-M r i r f i = β(r m i r e i = βr me i r f i ) + ε i + ε i 9 / 61

10 Factor Models Fama-French Model r e i = β 1 + β 2 VWM e i + β 3 SMB i + β 4 HML i + ε i VWM e i : Excess return on the value weighted market portfolio SMBi : Return on the Small Minus Big Portfolio HMLi : Return on the High Minus Low portfolio Carhart 4-Factor model FF + UMDi : Return on the Up Minus Down portfolio Recent papers have proposed many factors Asymmetric β PINS Default Idiosyncratic Risk Sheep Factor 10 / 61

11 The data Returns on 4 factors, risk-free rate, 6 portfolios: BH (Big-High), BM (Big-Medium), BL (Big-Low), SH (Small-High), SM (Small-Medium), SL (Small-Low) Monthly return data from January June 2008 All returns 100 (ln pt ln p t 1 ) High beats Low, Small beats Big Variance decreases with average return Data is the same as used in the MATLAB course, and is available at 11 / 61

12 Scatter plots of BH e (x-axis) VWM SMB HML UMD / 61

13 Dummy Variables Definition (Dummy Variable) A dummy variable is a variable that takes the value 0 or 1. Value depends on the value of some x variable(s) Denoted I [f (x) c] f (x) is some function of the regressors c is an arbitrary constant could be anything that would produce a logical expression (, >) Cannot depend on yi Dummies in finance Asymmetries: I[xi <0] Calendar effects: I[xi =1] x i is the month or day of the week) Structural breaks: I[xi >1987] x i is the year 13 / 61

14 Variable Interactions Non-linearities often introduced through interactions x 2 1,i, x 1,ix 2,i or x 2 1,i x 2,i Interactions can include dummy variables x 1,i I [x1,i <0] Asymmetric slope coefficient x 1,i x 2,i I [x1,i <0]I [x2,i <0] Asymmetric slope coefficient in (-,-) quadrant Interactions, particularly dummy interactions can capture important highly-linear features Kinked lines y i = β 1 + β 2 x i + β 3 x i I [xi <0] + ε i Jumps in lines y i = β 1 + β 2 I [xi <0] + β 3 x i + ε i Piece-wise linear splines y i = β 1 + β 2 x i + β 3 x i I [xi >c] + (β 1 + β 2 c β 3 c)i [xi >c] + ε i Polynomial or Tensor product expansions y i = β 1 + β 2 x 1,i + β 3 x 2,i + β 4 x 2 1,i + β 5x 2 2,i + β 6x 1,i x 2,i + ε i 14 / 61

15 A Caveat for Using Dummy Variables One warning Cannot include an intercept and all dummies I1,i = 1 if Monday, I 2,i = 1 if Tuesday, etc. Problematic specification: y i = β 1 + β 2 I 1,i + β 3 I 2,i + β 4 I 3,i + β 5 I 4,i + β 6 I 5,i + ε i 5 j=1 I j,i = 1 always Perfect Collinearity: Cannot estimate model Solution 1: Remove the constant y i = β 1 I 1,i + β 2 I 2,i + β 3 I 3,i + β 4 I 4,i + β 5 I 5,i + ε i Solution 2: Remove one dummy y i = β 1 + β 2 I 2,i + β 3 I 3,i + β 4 I 4,i + β 5 I 5,i + ε i Interpretation changes, models identical Most software will produce an error or warning 15 / 61

16 Estimating the unknown parameters Many possible ways to estimate β Take k data points and solve (Gaussian Elimination) Exact and simple solution Doesn t work if n > k Minimize the maximum error Maximum Score Very hard Minimize the average error Many solutions Minimize some nonnegative function of the errors Least squares min Least absolute deviations min n n ε 2 i = (y i x i β) 2 i=1 i=1 n n ε i = y i x i β i=1 i=1 16 / 61

17 Calculus of Least Squares Formal problem min β n (y i x i β) 2 = i=1 n i=1 ε 2 i Matrix equivalent min (y Xβ) (y Xβ) = ε ε β k First Order Conditions (F.O.C) 2X (y Xβ) = 2X ˆε = 0 2X y + 2X Xβ = 0 Solve for β to get LS estimator, denoted ˆβ ˆβ = (X X) 1 X y Second derivative is always positive definite as long as rank(x) = k. 2X X 17 / 61

18 Other estimators Fit values ŷ i = x i ˆβ Estimated errors ˆε i = y i x i ˆβ = yi ŷ i Error variance estimator s 2 = n i=1 ˆε2 i n k ˆσ 2 = n i=1 ˆε2 i n n k is a degree of freedom correction ˆεi are too close to zero. 18 / 61

19 Properties of OLS estimators One assumption implicitly made rank(x) = k X X is invertible Only assumption needed for estimation! Estimated errors are orthogonal to X n X ˆε = 0 or for each variables, x ij ˆε i = 0, j = 1, 2,..., k If model includes a constant, estimated errors have mean 0 n ˆε i = 0 i=1 Closed under linear transformations to either X or y Linear: az, a nonzero Closed under affine transformation to X or y if model has constant Affine: az + c, a nonzero i=1 19 / 61

20 Scatter plots of BH e VWM SMB HML UMD / 61

21 Scatter plots of BH e i ˆ β 1 ˆ β 2 VWM e i = ˆε i VWM SMB HML UMD / 61

22 Assessing fit Next step: Does my model fit? A few preliminaries n i=1 (y i ȳ) 2 Total Sum of Squares (TSS) n (x i ˆβ x ˆβ) 2 Regression Sum of Squares (RSS) i=1 n (y i x i ˆβ) 2 Sum of Squared Errors (SSE) i=1 ι is a k 1 vector of 1s. Note: ȳ = x ˆβ if the model contains a constant TSS = RSS + SSE Can form ratios of explained and unexplained R 2 = RSS TSS = 1 SSE TSS 22 / 61

23 Uncentered R 2 : R 2 u Usual R 2 is formally known as centered R 2 (R 2 c) Only appropriate if model contains a constant Alternative definition for models without constant n yi 2 Uncentered Total Sum of Squares (TSS U ) i=1 n (x i ˆβ) 2 Uncentered Regression Sum of Squares (RSS U ) i=1 n (y i x i ˆβ) 2 Uncentered Sum of Squares Errors (SSE U ) i=1 Uncentered R 2 : R 2 u Warning: Most software packages return R 2 c for any model Inference based on R 2 c when the model does not contain a constant will be wrong! Warning: Using the wrong definition can produce nonsensical and/or misleading numbers 23 / 61

24 The limitation of R 2 R 2 has one huge shortcoming: Adding variables cannot decrease the R 2 Enter R 2 Limits usefulness for selecting models : Bigger model always preferred R 2 = 1 SSE n k TSS n 1 = 1 s2 s 2 y = 1 SSE n 1 TSS n k = 1 (1 R2 ) n 1 n k R 2 is read as Adjusted R 2 R 2 increases if and only if the estimated error variance decreases Adding noise variables should generally decrease R 2 Caveat: For large n, penalty is essentially nonexistent Much better way to do model selection coming later / 61

25 Abuse of R 2 Using the wrong R 2 produces silly numbers Regressors R 2 U R 2 C R 2 U R 2 C VWM e VWM e, SMB VWM e, HML VWM e, SMB, HML VWM e, SMB, HML, UMD , VWM e , VWM e, SMB , VWM e, SMB, HML , VWM e, SMB, HML, UMD Warning: R 2 (either) is not invariant to changes in the regressand, so y i = βx i + ε i is never comparable to ln y i = β ln x i + ε i using R 2 25 / 61

26 Making sense of estimators Only one assumption in 30 slides X X is nonsingular (Identification) More needed to make any statements about unknown parameters Two standard setups: Classical (also Small Sample, Finite Sample, Exact) Make strong assumptions get clear results Easier to work with Implausible for most finance data Asymptotic (also Large Sample) Make weak assumptions hope distribution close Requires limits and convergence notions Plausible for many financial problems Extensions to make applicable to most finance problem We ll cover only the Asymptotic framework since the Classical framework is not appropriate for most financial data. 26 / 61

27 The assumptions Assumption (Linearity) y i = x i β + ε i Model is correct and conformable to requirements of linear regression Strong (kind of) Assumption (Stationary Ergodicity) {(x i, ε i )} is a strictly stationary and ergodic sequence. Distribution of (xi, ε i ) does not change across observations Allows for applications to time-series data Allows for i.i.d. data as a special case 27 / 61

28 The assumptions Assumption (Rank) E[x i x i] = Σ XX is nonsingular and finite. Needed to ensure estimator is well defined in large samples Rules out some types of regressors Functions of time Unit roots (random walks) Assumption (Moment Existence) E[x 4 j,i ] <, i = 1, 2,..., j = 1, 2,..., k and E[ε2 i ] = σ2 <, i = 1, 2,.... Needed to estimate parameter covariances Rules out very heavy-tailed data 28 / 61

29 The assumptions Assumption (Martingale Difference) [ (xj,i {x i ε ) ] 2 i, F i } is a martingale difference sequence, E ε i < j = 1, 2,..., k, i = 1, 2... and S = V[n 1 2 X ε] is finite and nonsingular. Provides conditions for a cental limit theorem to hold Definition (Martingale Difference Sequence) Let {z i } be a vector stochastic process and F i be the information set corresponding to observation i containing all information available when observation i was collected except z i. {z i, F i } is a martingale difference sequence if E[z i F i ] = 0 29 / 61

30 Large Sample Properties ˆβ n = ( n 1 ) 1 ( n x i x i n 1 ) n x i y i i=1 i=1 Theorem (Consistency of ˆβ) Under these assumptions ˆβ n p β Consistency means that the estimate will be close eventually to the population value Without further results it is a very weak condition 30 / 61

31 Large Sample Properties Theorem (Asymptotic Distribution of ˆβ) Under these assumptions n( ˆβ n β) where Σ XX = E[x i x i] and S = V[n 1/2 X ε]. d N(0, Σ 1 XX SΣ 1 XX ) (1) CLT is a strong result that will form the basis of the inference we can make on β What good is a CLT? 31 / 61

32 Understanding why a CLT may apply ˆβ n is a weighted average of n observations ˆβ n = = ( ( = β n 1 n 1 ( = β + ) 1 ( n x i x i i=1 ) 1 ( n x i x i i=1 n 1 ( n 1 β + Σ 1 XX n 1 n 1 ) 1 ( n x i x i i=1 ( ) n x i y i = ( X X ) 1 ( X y ) i=1 ) n x i (x iβ + ε i ) i=1 n 1 ) 1 ( n x i x i i=1 n 1 ) n x i ε i=1 n 1 ) ( n x i x i + i=1 ) n x i ε i=1 n 1 ) 1 ( n x i x i i=1 n 1 ) n x i ε i=1 32 / 61

33 Gauss-Markov Theory and BLUE ˆβ = (X X) 1 X y Theorem (Gauss-Markov Theorem) Under assumptions stronger than those of the asymptotic framework, ˆβ is the minimum variance estimator among all linear unbiased estimators. That is V[ β X] - V[ ˆβ X] is positive semi-definite where β = Cy and E[ β] = β. [ Main difference is that stronger assumption E εi X ] = 0 is required Produces much confusion Superficial argument favoring OLS Does not require normality Other estimators can be BUE Data specific 33 / 61

35 Solution and Cramer-Rao Theorem ˆβ MLE = (X X) 1 X y ˆσ 2 MLE = n 1 (y X ˆβ) (y X ˆβ) = n 1 ˆε ˆε ˆβ MLE = ˆβ, ˆσ 2 MLE s 2 Theorem (Cramer-Rao Inequality) Let f (z; θ ) be the joint density of z where θ is an k dimensional parameter vector. Let ˆθ be an unbiased estimator of θ with finite covariance. Under some regularity conditions on f ( ) where [ ] 2 ln f (z; θ ) I(θ ) = E θ θ V[ ˆθ ] > I 1 (θ ) and under some additional regularity conditions, [ ln f (z; θ ), J (θ ) = E θ I(θ ) = J (θ ). ] ln f (z; θ ) θ 35 / 61

36 BUE: Best Unbiased Estimator I(β, σ 2 ) = [ X X σ n 2σ 4 ], and I(β, σ 2 ) 1 = [ σ 2 (X X) σ 4 n ] Theorem (Best Unbiased Estimator) Under strong assumptions including joint normality of ε, ˆβ = ˆβ MLE is the best unbiased estimator for β. OLS mean parameters achieve the Cramer-Rao lower bound Unlike Gauss-Markov Theorem, this result is very powerful For some DGP, there is no better estimator, linear or nonlinear Not valid for ˆσ 2 MLE since E[ ˆσ 2 MLE ] = n k n Important: σ2 OLS is not MLE! OLS requires no specific assumptions about distribution OLS is valid under much weaker assumptions than those which justify Cramer-Rao or Gauss-Markov 36 / 61

37 Estimating the parameter covariance Before making inference, the covariance of n ( ˆβ β ) must be estimated Theorem (Asymptotic Covariance Consistency) Under the large sample assumptions, and ˆΣ XX =n 1 X X p Σ XX n Ŝ =n 1 ˆε 2 i x i x p i S i=1 ( ) =n 1 X ÊX ˆΣ 1 XXŜ ˆΣ 1 XX p Σ 1 XX SΣ 1 XX where Ê = diag( ˆε2 1,..., ˆε2 n ). 37 / 61

38 Consistent estimation of regression error variance Variance of the regression error can also be consistently estimated Theorem (Variance Consistency) Under the large sample assumptions, ˆσ 2 n = n 1 ˆε ˆε p σ / 61

39 Inside the covariance estimator Ŝ = n 1 X ÊX Ê is a diagonal matrix with ˆε 2 i on its i th diagonal position ˆε ˆε ˆε 2 i Shorthand notation for n 1 X ÊX = n 1 n i=1 ˆε 2 i x i x i Picks up the covariance between xj,i, x 2 j,i, and x j,ix l,i and ˆε 2 i Heteroskedasticity means E[ ˆε 2 i x i x i x i ] σ 2 x i x i 39 / 61

40 Bootstrap Estimation of Parameter Covariance 2 common bootstraps for estimating regression parameter covariance 1. Residual Bootstrap Appropriate when data are conditionally homoskedastic Separate selection of xi and ˆε i when constructing bootstrap ỹ i 2. Non-parametric Bootstrap Works under more general conditions Resamples {yi, x i } as a pair Both are for data where the errors are not cross-sectionally correlated 40 / 61

41 The Bootstrap Algorithm for homoskedastic data Algorithm (Residual Bootstrap Regression Covariance) 1. Generate 2 sets of n uniform integers {u 1,i } n i=1 and {u 2,i} n i=1on [1, 2,..., n]. 2. Construct a simulated sample { } ỹ u1,i = x u1,i ˆβ + ˆεu2,i. 3. Estimate the parameters of interest using ỹ u1,i = x u1,i β + ε u1,i, and denote the estimate β b. 4. Repeat steps 1 through 3 a total of B times. 5. Estimate the variance of ˆβ using V [ ˆβ] V [ ˆβ] = B 1 B b=1 = B 1 B b=1 ( β j ˆβ ) ( β j ˆβ ) or ( β j β ) ( β) β j 41 / 61

42 The Bootstrap Algorithm for heteroskedastic data Algorithm (Nonparametric Bootstrap Regression Covariance) 1. Generate a sets of n uniform integers {u i } n i=1 on [1, 2,..., n]. 2. Construct a simulated sample {y ui, x ui }. 3. Estimate the parameters of interest using y ui = x ui β + ε ui, and denote the estimate β b. 4. Repeat steps 1 through 3 a total of B times. 5. Estimate the variance of ˆβ using V [ ˆβ] V [ ˆβ] = B 1 B b=1 = B 1 B b=1 ( β j ˆβ ) ( β j ˆβ ) or ( β j β ) ( β) β j 42 / 61

43 Elements of a hypothesis test Definition (Null Hypothesis) The null hypothesis, denoted H 0, is a statement about the population values of some parameters to be tested. The null hypothesis is also known as the maintained hypothesis. Null is important because it determines the conditions under which the distribution of ˆβ must be known Definition (Alternative Hypothesis) The alternative hypothesis, denoted H 1, is a complementary hypothesis to the null and determines the range of values of the population parameter that should lead to rejection of the null. Alternative is important because it determines the conditions where the null should be rejected H 0 : λ Market = 0, H 1 : λ Market > 0 or H 1 : λ Market 0 43 / 61

44 Elements of a hypothesis test Definition (Hypothesis Test) A hypothesis test is a rule that specifies the values where H 0 should be rejected in favor of H 1. The test embeds a test statistic and a rule which determines if H0 can be rejected Note: Failing to reject the null does not mean the null is accepted. Definition (Critical Value) The critical value for an α-sized test, denoted C α, is the value where a test statistic, T, indicates rejection of the null hypothesis when the null is true. CV is the value where the null is just rejected CV is usually a point although can be a set 44 / 61

45 Elements of a hypothesis test Definition (Rejection Region) The rejection region is the region where T > C α. Definition (Type I Error) A Type I error is the event that the null is rejected when the null is actually valid. Controlling the Type I is the basis of frequentist testing Note: Occurs only when null is true Definition (Size) The size or level of a test, denoted α, is the probability of rejecting the null when the null is true. The size is also the probability of a Type I error. Size represents the preference for being wrong and rejecting true null 45 / 61

46 Elements of a hypothesis test Definition (Type II Error) A Type II error is the event that the null is not rejected when the alternative is true. A Type II occurs when the null is not rejected when it should be Definition (Power) The power of the test is the probability of rejecting the null when the alternative is true. The power is equivalently defined as 1 minus the probability of a Type II error. High power tests can discriminate between the null and the alternative with a relatively small amount of data 46 / 61

47 Type I & II Errors, Size and Power Size and power can be related to correct and incorrect decisions Decision Do not reject H 0 Reject H 0 Truth H 0 Correct Type I Error (Size) H 1 Type II Error Correct (Power) 47 / 61

48 Hypothesis testing in regressions Distribution theory allows for inference Hypothesis H 0 : R(β) = 0 R( ) is a function from k m, m k All equality hypotheses can be written this way H 0 : (β 1 1)(β 2 1) = 0 H 0 : Linear Equality Hypotheses (LEH) H 0 : Rβ r = 0 or in long hand, R is an m by k matrix r is an m by 1 vector β 1 β 2 β 1 + β 2 1 = 0 k R i,j β j = r i, i = 1, 2,..., m Attention limited to linear hypotheses in this chapter Nonlinear hypotheses examined in GMM notes 48 / 61 j=1

49 What is a linear hypothesis 3-Factor FF Model: BH e i = β 1 + β 2 VWM e i + β 3 SMB i + β 4 HML i + ε i H 0 : β 2 = 0 [Market Neutral] R = [ ] r = 0 H 0 : β 2 + β 3 = 1 R = [ ] r = 1 H 0 : β 3 = β 4 = 0 [CAPM with nonzero intercept] [ ] R = r = [0 0] H 0 : β 1 = 0, β 2 = 1, β 2 + β 3 + β 4 = R = r = [0 1 1] 49 / 61

50 Estimating linear regressions subject to LER Linear regressions subject to linear equality constraints can always be directly estimated using a transformed regression BHi e = β 1 + β 2 VWMi e + β 3 SMB i + β 4 HML i + ε i H 0 : β 1 = 0, β 2 = 1, β 2 + β 3 + β 4 = 1 β 2 = 1 β 3 β 4 1 = 1 β 3 β 4 β 3 = β 4 BHi e Combine to produce restricted model BH e i = 0 + 1VWM e i + β 3 SMB i β 3 HML i + ε i BH e i VWM e i = β 3(SMB i HML i ) + ε i r i = β 3 r P i + ε i 50 / 61

51 3 Major Categories of Tests Wald Directly tests magnitude of Rβ r t-test is a special case Estimation only under alternative (unrestricted model) Lagrange Multiplier (LM) Also Score test or Rao test Tests how close to a minimum the sum of squared errors is if the null is true Estimation only under null (restricted model) Likelihood Ratio (LR) Tests magnitude of log-likelihood difference between the null and alternative Invariant to reparameterization Good thing! Estimation under both null and alternative Close to LM in asymptotic framework 51 / 61

52 Visualizing the three tests Rβ r =0 2X (y Xβ) elihood tio Lagrange Multiplier Wald (y Xβ) (y Xβ)) β ˆβ 52 / 61

53 t-tests Single linear hypothesis: H0 : Rβ = r n ( ˆβ β ) d N(0, Σ 1 XX SΣ 1 XX ) n ( R ˆβ r ) d N(0, RΣ 1 XX SΣ 1 XX R ) Note: Under the null H0 : Rβ = r Transform to standard normal random variable z = R n ˆβ r RΣ 1 XX SΣ 1 XX R Infeasible: Depends on unknown covariance Construct feasible version using estimate t = R n ˆβ r R ˆΣ 1 XXŜ ˆΣ 1 XXR Estimated variance of R ˆβ Note: Asymptotic distribution is unaffected since covariance estimator is consistent 53 / 61

54 t-test and t-stat Unique property of t-tests: Easily test one-sided alternatives H 0 : β 1 = 0 vs. H 1 : β 1 > 0 More powerful if you know the sign (e.g. risk premia) t-stat of a coefficient ˆβk is test of H 0 : β k = 0 against H 0 : β k 0 n ˆβk ( ) Σ 1 XX SΣ 1 XX [kk] ] Single most common statistic Reported for nearly every coefficient 54 / 61

55 Distribution and rejection region 95% One sided (Upper) 95% 2 sided z ˆβ β 0 se(ˆβ) / 61

56 Implementing a t Test Algorithm (t-test) 1. Estimate the unrestricted model y i = x i β + ε i 2. Estimate the parameter covariance using ˆΣ 1 XXŜ ˆΣ 1 XX 3. Construct the restriction matrix, R, and the value of the restriction, r, from null 4. Compute t = n R ˆβ n r v, v = R ˆΣ 1 XXŜ ˆΣ 1 XXR 5. Make decision (C α is the upper tail α-cv from N(0, 1)): a. 1-sided Upper: Reject the null if t > C α b. 1-sided Lower: Reject the null if t < C α c. 2-sided: Reject the null if t > C α/2 56 / 61

57 t-tests on the FF Portfolios Testing the coefficients on the BH e portfolio All have H0 : β j = 0 and H 1 : β j 0 Definition (P-value) Heteroskedasticity Parameter ˆβ S.E. t-stat p-val Constant VWM e SMB HML UMD The p-value is largest size (α) where the null hypothesis cannot be rejected. The p-value can be equivalently defined as the smallest size where the null hypothesis can be rejected. Should be familiar with 1.645, 1.96 (2) and / 61

58 Wald tests Wald tests examine validity of one or more equality restriction by measuring magnitude of Rβ r For same reasons as t-test, under the null n ( R ˆβ r ) d N(0, RΣ 1 XX SΣ 1 XX R ) Standardized and squared W = n(r ˆβ r) [ RΣ 1 XX SΣ 1 XX R ] 1 (R ˆβ r) d χ 2 m Again, this is infeasible, so use the feasible version W = n(r ˆβ r) [ R ˆΣ 1 XXŜ ˆΣ 1 XXR ] 1 (R ˆβ r) d χ 2 m 58 / 61

59 Bivariate confidence sets No Correlation Positive Correlation Negative Correlation Different Variances % 90% 80% / 61

60 Implementing a Wald Test Algorithm (Large Sample Wald Test) 1. Estimate the unrestricted model y i = x i β + ε i. 2. Estimate the parameter covariance using ˆΣ 1 XXŜ ˆΣ 1 XX where ˆΣ XX = n 1 n n x i x i, Ŝ = n 1 i=1 i=1 ˆε 2 i x i x i 3. Construct the restriction matrix, R, and the value of the restriction, r, from the null hypothesis. 4. Compute W = n(r ˆβ [ ] n r) R ˆΣ 1 XXŜ ˆΣ 1 1 XXR (R ˆβ n r). 5. Reject the null if W > C α where C α is the critical value from a χ 2 m using a size of α. 60 / 61

61 Wald tests on the FF Portfolios BH e i = β 1 + β 2 VWM e i + β 3 SMB i + β 4 HML i + β 5 UMB i + ε i Wald tests only fail to reject the null that the constant and the coefficient on SMB are zero Wald Tests Null M W p-val β j = 0, j = 1,..., β j = 0, j = 1, 3, 4, β j = 0, j = 1, β j = 0, j = 1, β 5 = Note: Nulls are and, alternatives are or H0 : β 1 = 0 β 3 = 0, H 1 : β 1 0 β / 61

Analysis of Cross-Sectional Data

Analysis of Cross-Sectional Data Kevin Sheppard http://www.kevinsheppard.com Oxford MFE This version: November 8, 2017 November 13 14, 2017 Outline Econometric models Specification that can be analyzed