Correlation and Heterogeneity Robust Inference using Conservativeness of Test Statistic. London School of Economics November 24, 2011

Size: px

Start display at page:

Download "Correlation and Heterogeneity Robust Inference using Conservativeness of Test Statistic. London School of Economics November 24, 2011"

Merry Maxwell
5 years ago
Views:

1 Correlation and Heterogeneity Robust Inference using Conservativeness of Test Statistic Rustam Ibragimov Harvard University Ulrich K. Müller Princeton University London School of Economics November 24, 2011

2 Introduction Most economic data is observational data is plausibly not i.i.d.: think of countries, firms, or time series data Correlations and heterogeneity do not necessarily invalidate standard estimators, but standard errors and inference become a challenge 1

3 Consistent Variance Estimators White (1980) for heterogeneous and independent disturbances Newey and West (1987), Andrews (1991), Davidson and de Jong (2000) and others for time series disturbances Rogers (1993) and Arellano (1987) for clustered and panel data Conley (1999) for spatially correlated data based on a Law of Large Numbers, and thus require an infinite amount of independence poor small sample properties in many instances of interest 2

4 Inconsistent Variance Estimators Part of the problem is that sample variability of consistent variance estimators is neglected Approaches that account for sample variability of variance estimator: time series: Kiefer, Vogelsang and Bunzel (2000) and Kiefer and Vogelsang (2002, 2005), Müller (2006) panel data: Donald and Lang (2004), Hansen (2005) We add to this literature and develop a general approach to robust inference when little is known about correlations and heterogeneity 3

5 Our Approach Tests on parameter β : t statistic in group estimators Assume data can be classified in a finite number q of groups that allow asymptotically independent normal inference about the (scalar) parameter of interest β, so that ˆβ a j idn (β, v 2 j ) for j = 1,, q. Time series example: Divide data into q = 4 consecutive blocks, and estimate the model 4 times. Treat ˆβ j as observations for the usual t-statistic, and reject a 5% level test if t-statistic is larger than usual critical value for q 1 degrees of freedom. Results in valid inference by small sample result. Exploits information ˆβ j a idn (β, v 2 j ) in an efficient way. Does not rely on single asymptotic model of sampling variability for estimated standard deviation Important precursor: Fama-MacBeth (1973) method

6 Our Approach Tests on equality of parameters β 1 = β 2 : Behrens-Fisher statistic in group estimators Partition data into q 1 + q 2 groups; estimate β 1 and β 2 with q 1 and q 2 groups: ˆβij a idn (βi, v 2 ij ) for j = 1,, q i, i = 1, 2. Treat ˆβ ij as observations for the usual Behrens-Fisher (BF) statistic, and reject a 1% or 5% level test if BF statistic is larger than critical value of Student-t with min(q 1, q 2 ) 1 degrees of freedom. Valid inference by new small sample result. Exploits information ˆβ a ij idn (βi, vij 2 ) in an efficient way. Applications: robust analysis of treatment effects, structural breaks, inference under heavy tails, etc.

7 Plan of Talk Introduction The Small Sample t- and Behrens-Fisher Tests Conservativeness for Heterogenous Variances Optimality t-statistic & Behrens-Fisher Statistic Based Large Sample Robust Inference Basic Idea and Properties Comparison with Standard Inference and Known Asymptotic Variance Applications: Panel data, Time Series, Inequality Measures (on-going work with Paul Kattuman, Univ. of Cambridge) Conclusions

8 t statistic based robust inference: small samples Bakirov and Székely (2005), Ibragimov and Müller (2006): Usual small sample t test of level α 5% : conservative for independent heterogeneous Gaussian observations (not α = 10%) X j N (µ, σ 2 j ), j = 1,, q : H 0 : µ = 0 against H 1 : µ 0 t-statistic t = q X sx X = q 1 q j=1 X j, s 2 X = (q 1) 1 q j=1 (X j X) 2 cv q (α) = critical value of T q 1 : P ( T q 1 > cv q (α)) = α P ( t > cv(α) H 0 ) P ( t > cv(α) H 0, σ 2 1 =... = σ2 q) = P ( T q 1 > cv(α)) = α Holds under heavy tails: mixtures of normals (stable, Student-t)

9 Conservativeness Theorem (Bakirov and Székely 2005): Let cv q (α) be the critical value of the usual two-sided t-test based on t of level α, i.e. P ( T q 1 > cv q (α)) = α, wheret k is student-t distributed with k degrees of freedom, and let Φ denote the cumulative density function of a standard normal random variable. (i) If α 2Φ( 3) = , thenforallq 2, sup P ( t > cv q (α) H 0 )=P( T q 1 > cv q (α)) = α. (1) {σ 2 1,,σ2 q} (ii) Equation (1) also holds true for 2 q 14 if α α 1 = 0.1, and for q {2, 3} if α α 2 = 0.2. Moreover, define fcv q (α i ) = q ki (q 1) cv ki (α i ) 2 / q q(k i 1) + (q k i )cv ki (α i ) 2, i {1, 2}, where k 1 =14andk 2 =3. Thenforq k i +1, sup P ( t > fcv q (α i ) H 0 )=α i. {σ 2 1,,σ2 q} 8

10 Sketch of Proof I Rewrite t 2 = (q 1)(P q j=1 X j) 2 q P q j=1 (X j X) 2 > cv2 as X 0 AX > 0, where A =(1+ cv2 q 1 )ee0 q cv2 q 1 I q,ande =(1,, 1) 0. Let D =diag(σ 1,,σ q ). Then X 0 AX Z 0 BZ, whereb = DAD and Z N (0,I q ). Z 0 BZ P q j=1 λ jz 2 j,whereλ j are the eigenvalues of B. Observation: One eigenvalue (say, λ q )ofb is positive, and q 1 are negative. Thus, P (t 2 > cv 2 )=P (Z 2 q > q 1 X j=1 c j Z 2 j ) with c j = λ j /λ q. But: eigenvalues of B are a very messy function of {σ 2 1,,σ2 q}. 9

11 Sketch of Proof II Fact, already exploited in Bakirov (1989): for c j > 0, j =1,,q 1 P (Z 2 q > q 1 X j=1 c j Z 2 j )= Z 1 0 u (q 1)/2 (1 u) 1/2 r du Qq 1 π j=1 (u + c j) Q q 1 j=1 (u+c j) is a polynomial in u with roots c j = λ j /λ q. Eigenvalues λ j are roots of det(ui q B), and so after a scale normalization of Z 0 BZ > 0toensureλ q =1,wehave q 1 Y j=1 (u + c j )= det(ui q B) u 1 Obtain expression for P (t 2 > cv 2 )asanintegralwithanintegrand that is a reasonably tractable function of {σ 2 1,,σ2 q}. Argue why it cannot be maximized for positive but unequal variances... 10

12 14 12 Sharpe ratio density: Normal vs. Cauchy Normal Cauchy

13 Behrens-Fisher statistic robust inference: small sample results X ij N (µ i, σ 2 j ), j = 1,..., q i, i = 1, 2 H 0 : µ 1 = µ 2 vs. H 1 : µ 1 µ 2 Behrens-Fisher statistic BF = X 1 X 2 s s2 2 n 1 n2 X i = q 1 qi i j=1 X i,j s 2 i = (q i 1) 1 q i j=1 (X i,j X i ) 2, i = 1, 2 P( BF cv m 1 (α)) P( T m 1 cv m 1 (α)) = α, m = min(q 1, q 2 ) for 2 q 1, q 2 50 and α {0.1%, 1%, 5%}

14 11 Effective Rejection Probabilities

15 Optimality of t-statistic Let X j, j =1,,q with q 2, be distributed independent N (μ, σ 2 j ), and consider the hypothesis test H 0 : μ =0and{σ 2 j }q j=1 arbitrary H 1 : μ 6= 0andσ 2 j = σ2 for all j Theorem: Let cv be such that P ( T q 1 > cv) = α For any q 2, a test that rejects the null hypothesis for t > cv is the uniformly most powerful scale invariant level α test. Proof: For equal variance case, t-test is UMP scale invariant. Conservativeness of t-test implies that equal variance case under the null hypothesis is least favorable. 12

16 Asymptotic t-statistic Based Inference Partition the data into q 2 groups, with n j observations in group j, and P q j=1 n j = n. Denote by ˆβ j the estimator of β using observations in group j only. Suppose the groups are chosen such that n(ˆβ j β) N (0,σ 2 j )forallj (where max 1 j q σ 2 j > 0). Satisfiedformanymodelsaslongasmin j n j, linear or nonlinear. n(ˆβ i β) and n(ˆβ j β) are asymptotically independent for i 6= j. this amounts to n(ˆβ 1 β,, ˆβ q β) 0 N (0, diag(σ 2 1,,σ2 q)) 13

17 Asymptotic t-statistic Based Inference Rejection of H 0 : β = β 0 against H 1 : β 6= β 0 if t β exceeds the (1 α/2) percentile of the student-t distribution with q 1 degrees of freedom, where t β is the usual t-statistic t β = q ˆβ β 0 sˆβ with ˆβ = q 1 P q j=1 ˆβ j and s 2ˆβ =(q 1) 1 P q j=1 (ˆβ j ˆβ) 2,isasymptotically valid inference by small sample t-test result and the Continuous Mapping Theorem. This way of conducting inference efficiently exploits the information n(ˆβ 1 β,, ˆβ q β) 0 N (0, diag(σ 2 1,,σ2 q)) as it maximizes asymptotic local power uniformly against all local alternatives where β = β n = β 0 + c/ n for some c 6= 0andσ 2 i = σ2 j for all i, j, among all scale invariant tests by optimality of t-statistic. 14

18 Asymptotic Behrens-Fisher Statistic Based Inference Robust tests of H 0 : β 1 = β 2 against H 1 : β 1 β 2 Partition the data into q 1 + q 2 groups ˆβij, j = 1,..., q i : estimators of β i, i = 1, 2, using observations in group j only Behrens-Fisher statistic BF = ˆβ 1 ˆβ 2 S S2 2 q 1 q2 ˆβ i = q 1 qi ˆβ i j=1 i,j Si 2 = (q i 1) 1 q i j=1 ( ˆβ i,j ˆβ i ) 2, i = 1, 2 Reject H 0 if BF is larger than the critical value of Student-t distribution with min(q 1, q 2 ) 1 degrees of freedom

19 15 Size Control under AR(1) Correlation

20 Comparison with Inference with Known Asymptotic Variance Typically, n(ˆβ 1 β,, ˆβ q β) 0 N (0, diag(σ 2 1,,σ2 q)) is weaker than what is required to consistently estimate the asymptotic variance. Consistent estimation not only requires more assumptions on correlation structure, but typically also higher moments. What are the efficiency cost of this additional robustness, i.e. how does the t-statistic approach compare to an approach based on consistent variance estimation (if the variance can indeed be consistently estimated)? Consider question in exactly identified GMM framework with k 1 parameter θ and moment condition E[g(θ, y i )] = 0. We are interested in conducting inference about the first element of θ, denoted by β. 16

21 Properties of Group Estimators Let G j the set of indices of group j observations. Suppose the estimator ˆθ j basedongroupj =1,,q data satisfies n(ˆθ j θ) =Γ 1 j Q j + o p (1) N (0, Γ 1 j Ω j Γ 0 1 j ) where n 1 P g(a,y i ) p i G j a a=ˆθ Γj and Q j j = n 1/2 P i G j g(θ, y i ) N (0, Ω j ), and (Q 0 1,,Q0 q) 0 N (0, diag(ω 1,, Ω q )). The simple average of the group estimators ˆθ then satisfies n(ˆθ θ) =q 1 where Σ q = q 2 P q j=1 Γ 1 j q X j=1 Ω j Γ 0 1 Γ 1 j Q j + o p (1) N (0, Σ q ) j. 17

22 Full Sample Estimator In contrast, full sample GMM estimator ˆθ (with first element ˆβ) that solves n 1 P n i=1 g(ˆθ, y i ) 0 g(ˆθ, y i )=0satisfies n(ˆθ θ) = 1 qx X q Γ j j=1 j=1 Q j + o p (1) N (0, Σ q ) where Σ q = ³ P q j=1 Γ j 1 ³ Pq j=1 Ω j ³ Pq j=1 Γ0 j 1. Estimator ˆθ not efficient under group heterogeneity. Efficient estimator would exploit q k moment conditions E[g(θ, y i )] = 0 for i G j, j =1,,q. Optimal weighting matrix depends on {Ω j } q j=1, which is assumed difficult to estimate. We thus focus on comparison of t-statistic approach with ˆβ in the numerator with inference based on ˆβ and known σ (the (1,1) element of Σ q ). 18

23 General Comparison I In general, Σ q and Σ q are not identical. t-statistic approach and inference based on ˆβ with σ 2 known differ not only in the denominator, but also in the numerator. Both tests are consistent against fixed alternatives and have power against the same local alternatives β = β 0 + c/ n. 19

24 General Comparison II Theorem: (i) Let ι be the k 1vectorwithaoneinthefirst row and zeros elsewhere. Then ι 0 Σ q ι ι 0 ( Σ q ι 1/q inf {Γ i } q i=1,{ω i} q ι 0 Σ =0 and inf i=1 q ι {Γ i } q i=1,{ω i} q ι 0 Σ i=1 q ι = 2 if k =1 0 if k 2 (ii) For any sequence of full rank matrices {Γ i } q i=1 there exists a positive definite sequence { Ω j } q j=1 so that Σ q Σ q is positive semidefinite for {Ω j } q j=1 = { Ω j } q j=1, and for any sequence of symmetric positive definite matrices {Γ i } q i=1 there exists a positive definite sequence {Ω j} q j=1 so that Σ q Σ q is negative semidefinite for {Ω j } q j=1 = {Ω j} q j=1. (iii) If Γ i = Γ for i =1,,q,then Σ q = Σ q for all {Ω j } q j=1. 20

25 Special Case Consider the special case Γ j = Γ for j =1,,q. This naturally arises when the groups have an equal number of observations n/q, and the average of the derivative of the moment condition is homogenous across groups (leading example: i.i.d. data). Then n(ˆθ θ) = n(ˆθ θ)+o p (1) and ˆβ and ˆβ are asymptotically equivalent to order n. The asymptotic local power of tests based on t β and ˆβ with σ 2 known simply reduces to the small sample power of the small sample t-statistic and z-statistic of H 0 : μ =0whenX i N (μ, σ 2 i ), that is z = P q i=1 X i q Pq i=1 σ2 i where σ 2 i is the (1,1) element of Γ 1 Ω i Γ

26 22 Numerical Comparison

27 Applications t-statistic approach requires n(ˆβ1 β,, ˆβq β) N (0, diag(σ 2 1,, σ2 q )). 1. Panel Data 2. Time Series Data 3. Inequality, poverty, risk & concentration measures Small sample results Large sample inference 28

28 Panel Data Panel with potential time series correlation and few individuals N y i,t = x 0 i,t θ + u i,t, i =1,,N, t =1,,T where {x i,t,u i,t } T t=1 are independent across i and E[x i,tu i,t ]=0for all i, t. t-statistic approach asymptotically valid if T 1 P T t=1 x i,t x 0 p i,t Γ i and T 1/2 P T t=1 x i,t u i,t N (0, Ω i )foralli as T and N fixed for some full rank matrices Γ i and Ω i. Hansen (2005) shows that usual t-statistic with Rogers (1993) standard errors converges under the null to a scaled t-statistic with q 1 degrees of freedom under asymptotic homogeneity, i.e. when Γ i = Γ and Ω i = Ω for all i. 27

29 Panel Data II For applications in Finance, concern about cross section correlation. Our results justify Fama MacBeth method where regression is run cross sectionally for each time period, and inference is based on t- statistic of resulting estimators ˆβ j, j =1,,T,evenforsmallT and potential heterogeneity of variances as long as no correlation across t. For corporate Finance applications, uncorrelatedness in time is often implausible. Rather than to try to consistently estimate the longrun variance with few observations, the approach taken here suggests forming groups of more than one unit in time to achieve approximate independence. Alternatively, assume independence in cross section dimension, say, across industries, as in Froot (1989). Combinations are possible. Same possibilities for long-run event studies, country panel data, city panel data and so forth. 29

30 Monte Carlo Results Same design as in Thompson (2006): N =50,T =25. Linear Regression, one nonconstant regressor, individual persistence : u i,t = ξ t + η i,t, η i,t = ρη i,t 1 + ε i,t common persistence : factor structure u i,t = h i f t +ε i,t, f t = ρf t 1 +ξ t, h i N(1, 0.25) 30 Individual Persistence Common Persistence ρ Size t-statistic q = t-statistic q = t-statistic q = FM with Newey-West cluster by i and t cluster by i and t + cp Size Adjusted Power t-statistic q = t-statistic q = t-statistic q = FM with Newey-West cluster by i and t cluster by i and t + cp

31 Time Series Data In absence of more specific knowledge, exploit the default assumption that correlations between observations become weaker the further apart in time they are: Divide the sample of size T into q (approximately) equal sized groups of consecutive observations. Under a wide range of assumptions on the underlying model and observations, exactly identified GMM inference satisfies brtc X sup T 1 0 r 1 t=1 T b T X c 1/2 t=1 Z g(a, y t ) r a Γ(λ)dλ p a=ˆθ 0 (1) 0 g(θ, y t ) Z 0 h(λ)dw (λ) (2) where Γ( ) isapositivedefinite k k matrix function and h( ) is nonzero. 16

32 Time Series Data With that convergence, t-statistic approach is asymptotically valid, since T ˆθ 1 θ ˆθ 2 θ. ˆθ q θ µ R 1/q 1 R 1/q 0 Γ(λ)dλ 0 h(λ)dw (λ) µ R 2/q 1 1/q Γ(λ)dλ R 2/q h(λ)dw (λ) 1/q. ( R 1 (q 1)/q Γ(λ)dλ) 1 R 1 (q 1)/q h(λ)dw (λ) In contrast, no other known way of conducting asymptotically valid inference under (1) and (2): Kiefer and Vogelsang (2002, 2005) approach requires Γ( ) and h( ) to be constant Müller (2004) shows that no long-run variance estimator can be consistent for Var[ R 1 0 h(λ)dw (λ)] for all processes that satisfy (2) 17

33 Monte Carlo Results Same design as in Andrews (1991): Linear Regression, 5 regressors, 4 nonconstant regressors are independent draws from stationary Gaussian AR(1), as are the disturbances, + heteroskedasticity. T = 128, 5% level test about coefficient of one nonconstant regressor. t-statistic (q) ˆω 2 QA ˆω2 PW ˆω 2 BT (b) ρ Size ρ Size Adjusted Power

34 Heavy-tailed inference y t = β + j= ψ js t j, S t i.i.d. stable domain of attraction, α (1, 2] Subsampling: works under strong mixing for y t, extensions for GARCH (McElroy and Politis, 2002) t statistic approach: symmetric stable (scale mixtures of normals)

35 Differences: Subsampling: first calculate block t statistic, then approximate cdf of full sample t statistic by empirical cdf s Our approach: first calculate estimators over blocks, then form their t statistic to conduct asymptotically valid inference

36 Monte Carlo Results y t = β + u t, S t : symm. stable with index α (0, 2) AR: u t = 0.5u t 1 + S t, MA: u t = 10 j=0 ψ js t j, {ψ j } 10 j=0 = {.03,.05,.07,.1,.15,.2,.15,.1,.07,.05,.03}

37 Robust inequality measurement (on-going work with Paul Kattuman, University of Cambridge) Economic & financial data: correlated, heterogeneous or heavy-tailed Income, wealth & financial returns: heterogeneity, dependence & outliers Heterogeneous observations & outliers heavy-tailed power laws P (I > x) C x ζ Tail index ζ : heavy-tailedness, likelihood of extreme observations & outliers Income: 1.5 < ζ < 3; wealth: ζ 1.5 Infinite variance Failure of standard OLS! Problematic inference on income & wealth inequality Tail index ζ inequality measures: Gini G = 1/(2ζ 1) (inequality in upper tails) Similar relations: Singh-Maddala power law distributions, log-normal Crises, emerging markets: more volatile, more extreme shocks more pronounced heterogeneity, outliers & heavy tails (e.g., IIK, 2009: income & wealth in CIS) Financial returns: 2 < ζ < 4 finite variance, infinite fourth moment 3

38 A tale of two tails Simulated data from Normal, Cauchy and Levy distributions, n=25 Normal Cauchy α=1 Levy α=1/ Figure: Heavy-tailed distributions: more extreme observations

39 A tale of two tails Light vs. heavy tails Cauchy distributiion Levy distribution α=1/2 Normal distribution α= Figure: Tails of Cauchy distributions are heavier than those of normal distributions. Tails of Lévy distributions are heavier than those of Cauchy or normal distributions.

40 Robust inequality measurement Many inequality measures: problematic under heterogeneity, heavy-tailedness & correlation Inequality, poverty & concentration measures: very sensitive to extremes & heavy tails (GE, Theil, MLD, Gini, HHI) Income & wealth: heavy-tailed (Singh-Maddala, Pareto, log-normal) poor performance (Mandelbrot, 1997, Cowell & Flachaire, 2007, Davidson & Flachaire, 2007) Robustness of measure choice, estimation & inference Asymptotic, standard & non-standard (moon) bootstrap Semiparametric bootstrap & asymptotic methods Improvement Computationally expensive (bootstrap), problematic s.e. s (asymptotic) Many studies: only point estimates of inequality Conclusions may need to be modified: standard errors & statistical significance 4

41 t statistic based robust inference Inference on income inequality measure I (Gini or Theil) Data: heavy tails, heterogeneity or dependence H 0 : I = I 0 against the alternative H a : I I 0 t statistic based robust test of level α = 5% : 1. Partition sample I 1, I 2,..., I n of observations on incomes into q 2 groups 2. Estimate income inequality measure I for each group empirical inequality measures Î j, j = 1,..., q 3. Compute the usual t statistic t I = q Î I 0 s Î with Î = q 1 q j=1 Îj and s 2 Î = (q 1) 1 q j=1 (Î j Î) 2 (t statistic treating q group estimators as observations) 4. Reject H 0 if t R exceeds the 97.5-percentile of Student t distribution T q 1 with q 1 degrees of freedom 5. 95% confidence interval for unknown I : (Î τ 0.05 sî, Î + τ 0.05 sî) where P ( T q 1 > τ 0.05 ) =

42 Connection to inequality measures Ex.: coefficient of variation for logs Y j = log X j of X 1,..., X q Ỹ 1,..., Ỹ q : i.i.d. standard normal r.v. s: X j N (0, 1) P ( T q 1 > τ 0.05 ) = 0.05 Empirical coefficient of variation ˆ CV X = s X /X = q/t Sharpe ratio, Herfindahl-Hirschman Index Analogues for logs If Y j N (0, σ 2 j ) or scale mixtures of normals (stable, etc.) P (0 < CV Y < y) P (0 < CVỸ < y) for y < 1/(τ 0.05 q) In general, does not hold for y < 1/(τ 0.1 q) Homogeneity and thin-tails (normality): Reduce the inequality in region of their small values Does not not hold poor measure of inequality for medium & high 10

43 t statistic based robust inference Asymptotically valid & efficient if empirical inequality measures Î j, j = 1,..., q : asymptotically independent, unbiased & Gaussian of possibly different variances Standard results on CLT for empirical income inequality measures (Gini, Theil, etc.): n j (Î j I) d N (0, σ 2 j ) Same conditions as for empirical inequality measure Î over full sample I 1, I 2,..., I N : N(Î I) d N (0, σ 2 ) Asymptotic validity: also holds when group estimators Î j converge (at arbitrary rate) to mixed normal Symmetric stable Conditionally normal variates, dependence through second moments Heavy tails, extremes and outliers Dependent models with common shocks 7

44 t statistic based robust inference First applications in inequality measurement Assumptions: Asymptotic normality & independence for group estimators 8 Asymptotic normality: From usual CLT for (full) sample Many applications: Natural choice of groups for asymptotic independence Time series: Divide into q = 4 consecutive blocks & estimate model 4 times Panel data, cross section correlation: Justification of Fama-MacBeth Regression is run cross sectionally for each time period & inference using t statistic if no correlation across t Other combinations Income & wealth: spatial dependence, dependence across states, common shocks, crises, etc. Groups by geographical regions asymptotically correlated estimators Groups to achieve asymptotic independence of group estimators? Independence condition: methodological problems for direct applications

45 Asymptotic independence through randomization Solution: randomization of initial samples I 1, I 2,..., I N of income observations Randomization stage: randomly assign each observation I i, i = 1,..., N, to one of q groups j = 1,..., q with equal probability 1/q t statistic stage: Apply t statistic approach to q groups of consecutive observations in randomized sample of incomes Ĩ 1, Ĩ 2,..., Ĩ N As in time series applications: observation Ĩ i in randomized sample is element of group j if (j 1)N/q < i jn/q Group estimators for randomized sample: independent by construction Both conditions for validity of t statistic inference: satisfied Randomization-based modification of t statistic robust inference: may also prove to be useful in other problems with natural dependence for group estimators 9

46 Risk, Inequality & Concentration: MC Results Cowell & Flachaire (2007): Theil measure I = (E[Y α ]/(E[Y ]) α 1)/(α(α 1)) Singh-Maddala: F (y) = 1 (1 + ay b ) c, a = 100, b = 2.8, c = 1.7 tail index ζ = bc = 4.8, I = 0.14 Pareto: F (y) = 1 (y l /y) ζ, y l = 0.1, ζ = 2.5 I = 1/(ζ 1) + log((ζ 1)/ζ) = 0.16 Lognormal: log(y ) N (µ, σ 2 ), µ = 2, σ = 1 I = σ 2 /2 = 0.5 Singh-Maddala Pareto Lognormal N Size t-statistic q = t-statistic q = Asymptotic Standard bootstrap Moon bootstrap Semiparametric bootstrap Asymptotic test with semiparametric measure % of cases semipar. measure cannot be computed % 6.3% 9.0% 3.3%

47 Risk, Inequality & Concentration: MC Results Cowell & Flachaire (2007): Mean logarithmic deviation I = log(e[y ]) E[log(Y )] Singh-Maddala: F (y) = 1 (1 + ay b ) c, a = 100, b = 2.8, c = 1.7 tail index ζ = bc = 4.8, I = 0.15 Pareto: F (y) = 1 (y l /y) ζ, y l = 0.1, ζ = 2.5 I = 1/ζ log((ζ 1)/ζ) = 0.11 Lognormal: log(y ) N (µ, σ 2 ), µ = 2, σ = 1 I = σ 2 /2 = 0.5 Singh-Maddala Pareto Lognormal N Size t-statistic q = t-statistic q = Asymptotic Standard bootstrap Moon bootstrap Semiparametric bootstrap Asymptotic test with semiparametric measure % of cases semipar. measure cannot be computed % 6.3% 9% 3.3%

48 Robust inference for risk, inequality, poverty & concentration: t-statistic based approach vs. alternative procedures Singh-Maddala:, a=100, b=2.8, c=1.7 ξ b c P( X. > y) = (1 + ay ) = bc = 4. 8

49 Tail index estimation & inequality in upper tails Inequality in upper tails, applications of semiparametric asymptotic and bootstrap inequality inference: Estimates of tail index ζ in power laws Hill s estimator OLS log-log rank-size: optimal shifts & correct s.e. s, Gabaix & Ibragimov (11) I (1) I (2)... I (n) I (n+1) : power law P (I > x) C x ζ OLS approach: regression log(t) = a b log(i (t) ) log(rank) = a b log(size) b : estimate of tail index: Simple & robust Biased in small samples One should use the Rank 1/2, & run log(rank 1/2) = a b log(size) Shift of 1/2: optimal, and reduces bias to a leading order S.e. on ζ : not OLS s.e., but is asymptotically 2ζ/ n Numerical results: advantage over standard OLS estimation procedures Performs well under dependence & deviations from power laws 11

50 Conclusion New approach to correlation and heterogeneity robust inference in large samples. Method imposes only a finite amount of independence through the assumption that estimators from different groups are independent. Approach exploits this assumption in an efficient way. Many potential applications, encouraging Monte Carlo results. Challenge to choose groups in practice. But inference requires some assumption on correlation structure, and other methods make more implicit and even less interpretable assumptions. 36

t statistic based correlation and heterogeneity robust inference

t statistic based correlation and heterogeneity robust inference Rustam Ibragimov Harvard University Economics Department 1875 Cambridge Street Cambridge, MA 02138 Ulrich K. Müller Princeton University