Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University)


1 Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University), with Koushiki Bose, Qiang Sun, Wenxin Zhou. August 11, 2017

2 Outline
1. Introduction
2. A principle of robustification
3. Adaptive Huber estimation
4. FARM-test
5. Numerical studies

3 Introduction

4 Heavy-tailed distributions are ubiquitous in modern statistics and machine learning: financial returns, macroeconomic time series, and high-throughput data such as microarrays, proteomics, and fMRI. They arise easily in high-dimensional data, at odds with sub-Gaussian or sub-exponential assumptions.

5 Example 1: Macroeconomic time series. 131 macroeconomic series (Stock & Watson, 10; Ludvigson & Ng, 10; McCracken & Ng, 16). Figure: histogram of kurtosis (frequency vs. kurtosis), with an annotation for all financial return time series; … series are heavier-tailed than $t_5$!

6 Example 2: RNA-seq data. Gene expressions for 104 autism patients and controls. Figure: distribution of kurtosis; …/19K gene expressions are heavier-tailed than $t_5$! By chance alone, some variables have heavier tails in high dimensions. Aim: significantly weaken the tail assumptions.

7 Example 3: Protein and gene expressions. NCI-60: 60 human cancer cell lines (Shankavaram et al., 2007). Figure: histograms of kurtosis of protein expressions and of gene expressions. Heavier-tailed than $t_5$: proteins 49/162, genes 6542/17924!

8 Large-Scale Hypothesis Testing. $X = (X_1,\dots,X_p)^T$ has mean $\mu = (\mu_1,\dots,\mu_p)^T$. Multiple test: $H_{0j}: \mu_j = 0$ vs $H_{1j}: \mu_j \neq 0$, for $j = 1,\dots,p$. Challenge: strong dependence among $X_1,\dots,X_p$ (Benjamini & Hochberg, 95; Storey, 02; Donoho & Jin, 04; Genovese & Wasserman, 04; Efron, 07, 10; Fan et al., 12; Desai & Storey, 12; Barber & Candès, 15; Fan & Han, 17+). Global test: $H_0: \mu = 0$ vs $H_1: \mu \neq 0$.

9 A common model for dependence. Factor model: $X_{ij} = \mu_j + b_j^T f_i + u_{ij}$, $i = 1,\dots,n$, $j = 1,\dots,p$.

10 Importance of Factor Adjustments. A synthetic three-factor model: $X_i = \mu + Bf_i + u_i$, $i = 1,\dots,n$, with $f_i \sim N(0, I_3)$, $B = (b_{jl}) \overset{\text{IID}}{\sim} U(-1, 1)$ and $u_i \sim t_3(0, I_p)$. Model setup: $(n, p) = (100, 500)$, $\mu_j = 0.6$ for $j \le p/4$ and $0$ otherwise. Figure: histograms of sample means, without and with factor adjustment.
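A minimal NumPy sketch of this synthetic experiment, assuming the stated distributions; the function name `simulate_factor_model` and the seed are illustrative choices, not from the slides. The adjustment here is an oracle version that subtracts $b_j^T\bar f$ using the true loadings and factors, whereas FARM has to estimate them.

```python
import numpy as np

def simulate_factor_model(n=100, p=500, signal=0.6, seed=0):
    """Draw X_i = mu + B f_i + u_i with f_i ~ N(0, I_3), b_jl ~ U(-1, 1), u_i ~ t_3."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(p)
    mu[: p // 4] = signal                      # mu_j = 0.6 for j <= p/4, else 0
    B = rng.uniform(-1.0, 1.0, size=(p, 3))    # p x 3 loading matrix
    F = rng.standard_normal((n, 3))            # row i is the factor vector f_i
    U = rng.standard_t(3, size=(n, p))         # heavy-tailed idiosyncratic errors
    X = mu + F @ B.T + U                       # n x p data matrix
    return X, mu, B, F, U

X, mu, B, F, U = simulate_factor_model()
sample_means = X.mean(axis=0)
# Oracle factor adjustment: remove the common component b_j^T f_bar from each mean.
adjusted_means = sample_means - B @ F.mean(axis=0)
# Spread of the null coordinates (j > p/4) before and after adjustment.
print(sample_means[500 // 4:].std(), adjusted_means[500 // 4:].std())
```

For the null coordinates, the adjusted means equal $\bar u_j$ exactly, so their spread reflects only the idiosyncratic noise.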

11 Importance of Robust Adjustments. Figure: histograms of robust means, without and with factor adjustment. Decreased noise!

12 Principles of Robustification

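13 Principle of Robustification I: Truncation. Data: $X_i \overset{\text{IID}}{\sim} (\mu, \sigma^2)$. Truncation: let $\widetilde X_i = \operatorname{sgn}(X_i)\min(|X_i|, \tau)$. Exponential concentration: when $\tau \asymp \sigma\sqrt{n}$ (Fan, Wang, Zhu, 17),
$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \widetilde X_i - \mu\Big| \ge t\,\frac{\sigma}{\sqrt n}\Big) \le 2\exp(-ct^2)$$
for a universal constant $c$, compared with $1/t^2$ for the sample mean. Fundamental to high-dimensional estimation.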
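The truncation step and the $\tau \asymp \sigma\sqrt n$ scaling can be sketched as follows. The helper names and the constant `c = 1` are hypothetical; the slide only fixes the order of $\tau$.

```python
import numpy as np

def truncated_mean(x, tau):
    """Average of the truncated data sgn(X_i) * min(|X_i|, tau)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.sign(x) * np.minimum(np.abs(x), tau)))

def truncation_level(sigma, n, c=1.0):
    """tau = c * sigma * sqrt(n); the constant c is a hypothetical tuning choice."""
    return c * sigma * np.sqrt(n)

rng = np.random.default_rng(1)
x = rng.standard_t(3, size=1000)                # heavy-tailed sample: mu = 0, sigma^2 = 3
tau = truncation_level(np.sqrt(3.0), len(x))
print(truncated_mean(x, tau), x.mean())
```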

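14 Robust Covariance Inputs. Data: $X_i \overset{\text{IID}}{\sim} (0, \Sigma)$, $d$-dimensional, with $\sigma_{ij} = E(X_iX_j) - \underbrace{(EX_i)(EX_j)}_{=0}$. Elementwise truncation: $\widetilde x_{ij} = \operatorname{sgn}(x_{ij})\min(|x_{ij}|, \tau)$ and $\widehat\Sigma = \frac{1}{n}\sum_{i=1}^n \widetilde x_i \widetilde x_i^T$. If $E|X_jX_k|^2 \le M$ and $\tau \asymp (nM/\log d)^{1/4}$, we have
$$P\Big(\|\widehat\Sigma - \Sigma\|_{\max} \ge \sqrt{\frac{aM\log d}{n}}\Big) \le d^{\,2 - a/c}$$
for any $a > 2$ and a universal constant $c$.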
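A sketch of the elementwise-truncated covariance input, assuming centered data as on the slide. The helper names and the constant `c` in the truncation level are hypothetical.

```python
import numpy as np

def truncated_covariance(X, tau):
    """Sigma_hat = (1/n) sum_i x~_i x~_i^T with x~_ij = sgn(x_ij) * min(|x_ij|, tau).

    Assumes centered data (E X = 0), so no mean is subtracted.
    """
    X = np.asarray(X, dtype=float)
    Xt = np.sign(X) * np.minimum(np.abs(X), tau)    # elementwise truncation
    return Xt.T @ Xt / X.shape[0]

def covariance_truncation_level(n, d, M, c=1.0):
    """tau = c * (n M / log d)^(1/4); c is a hypothetical constant."""
    return c * (n * M / np.log(d)) ** 0.25
```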


16 Applications of Covariance. Finance: portfolio risk and management. Stat & ML: classification, graphical models, PCA. Regression: let $\Sigma = \operatorname{cov}((X^T, Y)^T)$; for centered data, $E(Y - X^T\beta)^2 = (-\beta^T, 1)\,\Sigma\,(-\beta^T, 1)^T$. Inference: Hotelling $T^2$, FDR/FDP control, …
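The regression identity means any covariance input, robust or not, yields a risk and a best linear coefficient. A small NumPy check, assuming zero-mean variables so that the plain second-moment matrix plays the role of $\Sigma$; the helper names are hypothetical.

```python
import numpy as np

def regression_risk(Sigma, beta):
    """E(Y - X^T beta)^2 = (-beta^T, 1) Sigma (-beta^T, 1)^T.

    Sigma is the (p+1) x (p+1) second-moment matrix of (X^T, Y)^T (the covariance
    for centered data); a robust estimate can be plugged in instead.
    """
    v = np.append(-np.asarray(beta, dtype=float), 1.0)
    return float(v @ Sigma @ v)

def best_linear_coef(Sigma):
    """Minimizer of the risk above: beta = Sigma_XX^{-1} Sigma_XY."""
    return np.linalg.solve(Sigma[:-1, :-1], Sigma[:-1, -1])

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(200)
Z = np.column_stack([X, y])
S = Z.T @ Z / len(y)              # uncentered second moments as the covariance input
beta = best_linear_coef(S)
```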

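17 Principle of High-dim Robustification II: Adaptive Huber loss (Catoni, 12; Fan, Li, Wang, 17):
$$\rho_\tau(x) = \begin{cases} x^2, & |x| \le \tau,\\ \tau(2|x| - \tau), & |x| > \tau,\end{cases} \qquad \widehat\mu_\tau = \operatorname*{argmin}_{\mu} \sum_{i=1}^n \rho_\tau(Y_i - \mu).$$
Then, for $\tau = c\sqrt{n}/t$ with $c \ge \mathrm{SD}(Y)$ (Fan, Li, Wang, 17),
$$P\big(|\widehat\mu_\tau - \mu| \ge t\,c/\sqrt{n}\big) \le 2\exp(-t^2/16), \qquad t \le \sqrt{n}/8,$$
compared with $1/t^2$ for the sample mean. Figure: Huber loss functions for $\tau = 0.5, 1, 2, 3, 4, 5$ and least squares.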
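One way to compute $\widehat\mu_\tau$ is iteratively reweighted least squares: the minimizer solves $\sum_i \psi_\tau(Y_i - \mu) = 0$ with $\psi_\tau(r) = \max(-\tau, \min(r, \tau))$ (half the derivative of $\rho_\tau$), which is a weighted mean with weights $\min(1, \tau/|Y_i - \mu|)$. The sketch below uses that approach; the names are hypothetical, and plugging the sample SD in for $c$ is an assumption (the slide only requires $c \ge \mathrm{SD}(Y)$).

```python
import numpy as np

def huber_mean(y, tau, tol=1e-10, max_iter=500):
    """argmin_mu sum_i rho_tau(y_i - mu), computed by IRLS (requires tau > 0)."""
    y = np.asarray(y, dtype=float)
    mu = float(np.median(y))                       # robust starting point
    for _ in range(max_iter):
        # w_i = tau / max(|r_i|, tau) = min(1, tau / |r_i|), safe when r_i = 0
        w = tau / np.maximum(np.abs(y - mu), tau)
        mu_new = float(np.sum(w * y) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

def adaptive_tau(y, t):
    """tau = c * sqrt(n) / t, using the sample SD as c (an assumption)."""
    y = np.asarray(y, dtype=float)
    return float(np.std(y, ddof=1) * np.sqrt(len(y)) / t)

rng = np.random.default_rng(7)
y = 1.0 + rng.standard_t(2.5, size=500)            # heavy tails, true mean 1
print(huber_mean(y, adaptive_tau(y, t=2.0)), y.mean())
```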

18 Adaptive Huber estimation

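19 Adaptive Huber Estimation. Robust covariate adjustments:
$$(\widehat\mu, \widehat\beta^T)^T \in \operatorname*{argmin}_{\mu,\beta} \sum_{i=1}^n \ell_\tau(\underbrace{Y_i - \mu - X_i^T\beta}_{\varepsilon_i}).$$
Bahadur representation: if $\tau = \tau_0\sqrt{n}\,(p+t)^{-1/2}$ with $\tau_0 \ge \sigma = \sqrt{\operatorname{var}(\varepsilon_i)}$ and $X$ sub-Gaussian, then
$$\Big\|\sqrt{n}\,\Sigma^{1/2}\begin{pmatrix}\widehat\mu - \mu\\ \widehat\beta - \beta\end{pmatrix} - \frac{1}{\sqrt n}\sum_{i=1}^n \ell_\tau'(\varepsilon_i)\,\Sigma^{-1/2}\begin{pmatrix}1\\ X_i\end{pmatrix}\Big\|_2 \le R_n(\tau), \qquad P\Big\{R_n(\tau) > c\,\frac{p+t}{\sqrt n}\Big\} \le 8e^{-t}.$$
Results extended to the case with $E X_i^4 \le C$ via truncation.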
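The same reweighting idea extends to the regression with an intercept. This is a sketch of an IRLS solver for $(\widehat\mu, \widehat\beta)$ together with the slide's choice $\tau = \tau_0\sqrt n\,(p+t)^{-1/2}$; the function names are hypothetical and $\tau_0$ must be supplied by the user.

```python
import numpy as np

def huber_regression(X, y, tau, tol=1e-10, max_iter=500):
    """(mu_hat, beta_hat) minimizing sum_i l_tau(y_i - mu - x_i^T beta) via IRLS.

    Huber weights: w_i = tau / max(|r_i|, tau) = min(1, tau / |r_i|).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])        # intercept column for mu
    theta = np.linalg.lstsq(Z, y, rcond=None)[0]     # least-squares start
    for _ in range(max_iter):
        sw = np.sqrt(tau / np.maximum(np.abs(y - Z @ theta), tau))  # sqrt of weights
        theta_new = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)[0]
        done = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if done:
            break
    return theta[0], theta[1:]

def covariate_tau(n, p, t, tau0):
    """tau = tau0 * sqrt(n) * (p + t)^(-1/2), with tau0 >= sd(eps) chosen by the user."""
    return tau0 * np.sqrt(n / (p + t))

x = np.arange(20.0)
y = 1.0 + 2.0 * x
y[5] += 1000.0                                       # one gross outlier
mu_hat, beta_hat = huber_regression(x[:, None], y, tau=1.0)
```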

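20 A Note on Huber Regression in Low Dimensions. Huber's M-estimator
$$\widehat\beta_\tau = \operatorname*{argmin}_{\beta\in\mathbb R^d} \underbrace{\frac1n\sum_{i=1}^n \ell_\tau(y_i - x_i^T\beta)}_{L_\tau(\beta)}$$
is a sample version of $\beta_\tau^* = \operatorname*{argmin}_{\beta\in\mathbb R^d} E L_\tau(\beta)$. Heavy-tailed noise: $v_\delta = \frac1n\sum_{i=1}^n E|\varepsilon_i|^{1+\delta} < \infty$ for some $\delta \in (0, 1]$. Bias: $\|\beta_\tau^* - \beta^*\|_2 \lesssim v_\delta\,\tau^{-\delta}$ for large $\tau$. Larger $\tau$ gives less bias, at the cost of robustness.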
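The bias-robustness trade-off is easy to see numerically on a skewed distribution, where the Huber location target differs from the mean. This sketch uses a deterministic grid of Exp(1) quantiles as a stand-in for the population (an illustrative choice, not from the slides) and reports $|\widehat\mu_\tau - \text{mean}|$ for several $\tau$.

```python
import numpy as np

def huber_mean(y, tau, tol=1e-12, max_iter=1000):
    """Huber location estimate via IRLS with weights min(1, tau / |y_i - mu|)."""
    y = np.asarray(y, dtype=float)
    mu = float(np.median(y))
    for _ in range(max_iter):
        w = tau / np.maximum(np.abs(y - mu), tau)
        mu_new = float(np.sum(w * y) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

# Deterministic stand-in for the skewed Exp(1) population: quantiles at (i + 0.5)/N.
N = 20_000
y = -np.log1p(-(np.arange(N) + 0.5) / N)
target = y.mean()                                  # the mean we would like to recover
taus = [0.5, 1.0, 2.0, 5.0]
biases = [abs(huber_mean(y, tau) - target) for tau in taus]
for tau, b in zip(taus, biases):
    print(f"tau = {tau}: |bias| = {b:.4f}")
```

The bias shrinks as $\tau$ grows, in line with the remark that larger $\tau$ buys less bias at the cost of robustness.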

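21 Phase Transition and Optimality. Error bound: for any $t > 0$, $\tau_0 \ge v_\delta$, and $\tau = \tau_0 (n/t)^{\max\{1/(1+\delta),\,1/2\}}$,
$$P\Big\{\|S_n^{1/2}(\widehat\beta_\tau - \beta^*)\|_2 > 4\tau_0\sqrt{p}\,(t/n)^{\min\{\delta/(1+\delta),\,1/2\}}\Big\} \le (2p+1)e^{-t}.$$
Figure: rate exponent $\min\{\delta/(1+\delta), 1/2\}$ versus $\delta$, showing the optimal rate and the impossible region.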
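The two exponents in the bound can be tabulated directly. In this sketch (hypothetical helper names), the rate exponent $\delta/(1+\delta)$ increases with $\delta$ and reaches the sub-Gaussian value $1/2$ at $\delta = 1$, i.e. finite variance; the functions also accept $\delta > 1$ to show that the exponent then stays at $1/2$.

```python
def rate_exponent(delta):
    """min{delta / (1 + delta), 1/2}: the error scales like (t/n) to this power."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return min(delta / (1.0 + delta), 0.5)

def huber_tau(n, t, delta, tau0):
    """tau = tau0 * (n/t)^{max{1/(1+delta), 1/2}}."""
    return tau0 * (n / t) ** max(1.0 / (1.0 + delta), 0.5)

for delta in (0.25, 0.5, 1.0, 2.0):
    print(delta, rate_exponent(delta), huber_tau(n=1000, t=1.0, delta=delta, tau0=1.0))
```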

22 Remarks. The estimator behaves as in the sub-Gaussian case when the variance is finite. The results extend to random designs with sub-Gaussian tails. The Bahadur representation holds with exponential concentration. The results extend to high dimensions with an $L_1$ penalty.

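23 Dependence-Adjusted Test. $T = \sqrt{n}(\widehat\mu_\tau - \mu)/\sigma$, with $\tau = \tau_0\sqrt{n}\,(p + w_n)^{-1/2}$. Cramér-type deviation: if $E|\varepsilon|^3 < \infty$, $w_n \to \infty$ and $w_n = o(\sqrt n)$, then
$$\sup_{0 \le z \le o\{\min(\sqrt{w_n},\, \sqrt{n}\,w_n^{-1})\}} \Big|\frac{P(|T| \ge z)}{2 - 2\Phi(z)} - 1\Big| \to 0.$$
Taking $\tau \asymp n^{1/3}$ gives the optimal result: $P(T_{\tau_n} \ge x)/(1 - \Phi(x)) \to 1$ uniformly over $x \in [0, o(n^{1/6}))$. The rate has also been derived. Extended to heavy-tailed designs; $\sigma$ can be estimated by robust residual variances.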

24 Covariate- and Factor-Adjusted Multiple Tests

25 Covariate-Adjusted Robust Test. Covariate adjustment: $X_{ij} = \mu_j + b_j^T f_i + u_{ij}$ (known factors). For each null, run the robust two-sided test. Let $P_j = 2\Phi(-|T_j|)$ be the P-value for testing $H_{0j}: \mu_j = 0$. Uniform approximation of P-values: under regularity conditions, if $\log p = o\{\min(w_n, n w_n^{-2})\}$, then
$$\max_{1\le j\le p}\Big|\frac{P_j}{P_j^{\text{true}}} - 1\Big|\,1\{P_j^{\text{true}} > \alpha/p\} = o(1) \quad \text{as } n, p \to \infty.$$
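A sketch of the per-hypothesis robust test with known factors: fit each column by Huber regression on $(1, f_i)$, then form $T_j = \sqrt n\,\widehat\mu_j/\widehat\sigma_j$ and $P_j = 2\Phi(-|T_j|)$. Several choices here are assumptions, not from the slides: the factors are taken to be centered, $\widehat\sigma_j^2$ is the mean of squared residuals truncated at $\tau^2$ (one possible robust residual variance), and $\tau = 2n^{1/3}$ follows the $n^{1/3}$ order from the Cramér-type result with an arbitrary constant. Function names are hypothetical.

```python
import math
import numpy as np

def huber_fit(Z, y, tau, tol=1e-10, max_iter=500):
    """IRLS for argmin_theta sum_i l_tau(y_i - z_i^T theta)."""
    theta = np.linalg.lstsq(Z, y, rcond=None)[0]
    for _ in range(max_iter):
        sw = np.sqrt(tau / np.maximum(np.abs(y - Z @ theta), tau))
        theta_new = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)[0]
        done = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if done:
            break
    return theta

def covariate_adjusted_pvalues(X, F, tau):
    """Robust fit of X_ij = mu_j + b_j^T f_i + u_ij per column j; returns (T, P)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Z = np.column_stack([np.ones(n), F])
    T = np.empty(p)
    for j in range(p):
        theta = huber_fit(Z, X[:, j], tau)
        resid = X[:, j] - Z @ theta
        sigma_hat = math.sqrt(np.mean(np.minimum(resid ** 2, tau ** 2)))  # assumed robust variance
        T[j] = math.sqrt(n) * theta[0] / sigma_hat
    P = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in T])        # 2 * Phi(-|T_j|)
    return T, P

rng = np.random.default_rng(4)
n, p = 200, 20
F = rng.standard_normal((n, 2))
B = rng.uniform(-1.0, 1.0, size=(p, 2))
mu = np.zeros(p)
mu[:2] = 1.0
X = mu + F @ B.T + rng.standard_t(3, size=(n, p))
T, P = covariate_adjusted_pvalues(X, F, tau=2.0 * n ** (1 / 3))
```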

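26 FDP Approximation. Total discoveries: $R(z) = \sum_{j=1}^p 1(|T_j| \ge z)$.
$$\mathrm{FDP}(z) = \frac{\sum_{j\in\text{Null}} 1(|T_j| \ge z)}{R(z)}, \qquad \mathrm{FDP}_N(z) = \frac{2p_0\Phi(-z)}{R(z)}.$$
Valid approximation of FDP: if $m_p \le p$ and $m_p \to \infty$, then
$$\max_{0\le z\le \Phi^{-1}(1 - m_p/(2p))}\Big|\frac{\mathrm{FDP}(z)}{\mathrm{FDP}_N(z)} - 1\Big| \to 0 \quad \text{in probability.}$$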
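Given statistics $T_j$ and a value (or estimate) of $p_0$, $\mathrm{FDP}_N(z)$ and a data-driven cutoff can be computed as below. This is a sketch with hypothetical names; it picks the smallest cutoff with $\mathrm{FDP}_N(z) \le \alpha$, a common way to operationalize $\mathrm{FDP}_N(\widehat z) = \alpha$ since $\mathrm{FDP}_N$ jumps at the observed $|T_j|$.

```python
import math
import numpy as np

def fdp_normal(T, z, p0):
    """FDP_N(z) = 2 p0 Phi(-z) / R(z), with R(z) = #{j : |T_j| >= z} (0 if R(z) = 0)."""
    R = int(np.sum(np.abs(T) >= z))
    if R == 0:
        return 0.0
    return p0 * math.erfc(z / math.sqrt(2.0)) / R     # 2 * Phi(-z) = erfc(z / sqrt(2))

def critical_value(T, alpha, p0):
    """Smallest observed |T_j| at which FDP_N <= alpha.

    Between consecutive |T_j| the count R(z) is constant while Phi(-z) decreases,
    so scanning the observed values finds the same rejection set as scanning all z.
    """
    for z in np.sort(np.abs(T)):
        if fdp_normal(T, z, p0) <= alpha:
            return float(z)
    return math.inf

T = np.array([6.0, 5.0, 4.5, 0.1, -0.2, 0.3, 0.5, -0.7, 1.0, 0.05])
z_hat = critical_value(T, alpha=0.1, p0=7)
rejected = np.abs(T) >= z_hat                          # the first three hypotheses
```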

27 Remarks.
1. The true FDP can be well approximated using the normal distribution after factor adjustment, as if the data were weakly dependent, with reduced noise.
2. Verify that the chosen critical value $\widehat z_{N,\alpha}$ truly yields FDP at level $\alpha$.
3. The proportion of true nulls $p_0/p$ can be estimated as in Storey (2002): $\widehat\pi_0(\lambda) = \frac{1}{(1-\lambda)p}\sum_{j=1}^p 1(\widehat P_j > \lambda)$, where $\widehat P_j = 2\Phi(-|\widehat T_j|)$.
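Storey's estimator from the third remark is a one-liner; the name `storey_pi0` and the default $\lambda = 0.5$ are illustrative choices.

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """pi0_hat(lambda) = #{j : P_j > lambda} / ((1 - lambda) p), as in Storey (2002)."""
    P = np.asarray(pvalues, dtype=float)
    return float(np.sum(P > lam) / ((1.0 - lam) * P.size))
```

Multiplying by $p$ gives an estimate of $p_0$ for $\mathrm{FDP}_N$; in practice the estimate is often capped at 1.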

28 Factor-Adjusted Robust Test. Model: $X_i = \mu + Bf_i + u_i$, $i \in [n]$, where $f_i$ is unobserved. Robust estimation of realized factors: note that $\bar X_j = \mu_j + b_j^T\bar f + \bar u_j$, $j = 1,\dots,p$. Cross-sectional regression: if $B$ were known, we would estimate $\bar f$ by
$$\widehat f = \operatorname*{argmin}_{f\in\mathbb R^d}\sum_{j=1}^p \ell_\gamma(\bar X_j - b_j^T f),$$
where $\gamma = \gamma(n, p)$ is a robustification parameter.
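With $B$ known, the cross-sectional regression is an ordinary Huber regression of the $p$ column means on the loadings, so the same IRLS idea applies. In the sketch below (hypothetical names; a noise-free toy example for clarity), the sparse nonzero $\mu_j$ act like outliers, which the Huber loss downweights while least squares does not.

```python
import numpy as np

def robust_factor_estimate(xbar, B, gamma, tol=1e-10, max_iter=500):
    """f_hat = argmin_f sum_j l_gamma(xbar_j - b_j^T f), fitted by IRLS."""
    xbar = np.asarray(xbar, dtype=float)
    B = np.asarray(B, dtype=float)
    f = np.linalg.lstsq(B, xbar, rcond=None)[0]            # least-squares start
    for _ in range(max_iter):
        sw = np.sqrt(gamma / np.maximum(np.abs(xbar - B @ f), gamma))
        f_new = np.linalg.lstsq(B * sw[:, None], xbar * sw, rcond=None)[0]
        done = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if done:
            break
    return f

rng = np.random.default_rng(5)
p = 500
B = rng.uniform(-1.0, 1.0, size=(p, 3))
f_bar = np.array([0.3, -0.2, 0.5])
mu = np.zeros(p)
mu[:25] = 5.0                                  # sparse signals
xbar = mu + B @ f_bar                          # column means without noise, for clarity
f_huber = robust_factor_estimate(xbar, B, gamma=1.0)
f_ls = np.linalg.lstsq(B, xbar, rcond=None)[0]
print(np.linalg.norm(f_huber - f_bar), np.linalg.norm(f_ls - f_bar))
```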

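30 FARM Test Statistics.
$$\widehat T_j = \sqrt{\frac{n}{\widehat\sigma_{jj} - \|\widehat b_j\|_2^2}}\,\big(\widehat\mu_j - \widehat b_j^T\widehat f\big), \qquad \widehat\sigma_{jj} = \mathrm{RVar}(X_{ij}).$$
Validity of FDP approximation: assume $p \gg n\log(n)$, $\log(p) = o(n^{1/2})$, and that $\mu = (\mu_1,\dots,\mu_p)^T$ is sparse, among other regularity conditions. Then $\mathrm{FDP}_N(z) - \mathrm{FDP}(z) = o_P(1)$ as $n, p \to \infty$.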
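The FARM statistic is a vectorized one-liner once $\widehat\mu$, $\widehat B$, $\widehat f$ and the robust variances are available. The sketch assumes the usual normalization $\operatorname{cov}(f_i) = I$, under which $\widehat\sigma_{jj} - \|\widehat b_j\|_2^2$ estimates $\operatorname{var}(u_{ij})$; the function name is hypothetical.

```python
import numpy as np

def farm_statistics(mu_hat, B_hat, f_hat, sigma_hat, n):
    """T_j = sqrt(n / (sigma_jj - ||b_j||_2^2)) * (mu_hat_j - b_j^T f_hat)."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    B_hat = np.asarray(B_hat, dtype=float)
    resid_var = np.asarray(sigma_hat, dtype=float) - np.sum(B_hat ** 2, axis=1)
    if np.any(resid_var <= 0):
        raise ValueError("need sigma_jj > ||b_j||^2 for every j")
    return np.sqrt(n / resid_var) * (mu_hat - B_hat @ np.asarray(f_hat, dtype=float))
```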

31 Estimation of Loading Matrix. $B$ is estimated by the top $d$ unnormalized eigenvectors of a robust estimate of $\operatorname{var}(X)$:
1. Elementwise robust estimator, based on $\sigma_{ij} = E(X_iX_j) - (EX_i)(EX_j)$.
2. Robust U-type estimator: with $\psi_\tau(u) = (|u| \wedge \tau)\operatorname{sign}(u)$,
$$\widehat\Sigma_U(\tau) = \binom{n}{2}^{-1}\sum_{j<k}\psi_\tau\Big(\tfrac12\|X_j - X_k\|_2^2\Big)\frac{(X_j - X_k)(X_j - X_k)^T}{\|X_j - X_k\|_2^2}.$$
FARM-test: choose $\widehat z_\alpha$ such that $\mathrm{FDP}_N(\widehat z_\alpha; \widehat B) = \alpha$. Asymptotic results: the validity of this procedure for FDP control and estimation is proved.
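A direct sketch of the U-type estimator and of the unnormalized eigenvectors $\sqrt{\lambda_l}\,v_l$ used as $\widehat B$; the names are hypothetical. As $\tau \to \infty$, $\psi_\tau(u) = u$ and the estimator reduces to the usual sample covariance, which gives a convenient sanity check.

```python
import numpy as np

def u_type_covariance(X, tau):
    """Sigma_U(tau): average over pairs j < k of
    psi_tau(||X_j - X_k||^2 / 2) (X_j - X_k)(X_j - X_k)^T / ||X_j - X_k||^2."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    S = np.zeros((d, d))
    for j in range(n - 1):
        D = X[j] - X[j + 1:]                     # differences X_j - X_k for all k > j
        sq = np.sum(D ** 2, axis=1)
        keep = sq > 0                            # identical rows contribute nothing
        D, sq = D[keep], sq[keep]
        coef = np.minimum(sq / 2.0, tau) / sq    # psi_tau(u) / ||.||^2, with u >= 0
        S += (D * coef[:, None]).T @ D
    return S / (n * (n - 1) / 2.0)

def loadings_from_cov(Sigma, d):
    """Top-d unnormalized eigenvectors: columns sqrt(lambda_l) * v_l."""
    vals, vecs = np.linalg.eigh(Sigma)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(8)
X = rng.standard_t(3, size=(100, 5))
B_hat = loadings_from_cov(u_type_covariance(X, tau=10.0), d=2)
```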

32 Numerical Studies

33 Model and Methods. Factor model: $X_i = \mu + Bf_i + u_i$, $n \in \{100, 150, 200\}$, $p = 500$. $B = (b_{jl}) \overset{\text{IID}}{\sim} U(-1, 1)$, $f_i \sim N(0, I_3)$, $u_i \sim N(0, 4I_p)$ or $t_3(0, I_p)$; $\mu_j = 0.5$ for $1 \le j \le 25$ and $\mu_j = 0$ otherwise. Competing methods:
1. FARM-H: FARM-Test with adaptive Huber covariance estimator;
2. FARM-U: FARM-Test with U-type covariance estimator;
3. FAM: a non-robust counterpart of FARM (sample mean and sample covariance);
4. PFA: principal factor approximation (Fan and Han, 17+);
5. Naive: multiple t-tests ignoring factors.

34 FDP Control. Table: empirical mean absolute error between estimated and oracle FDP ($t = 0.01$), $p = 500$; rows by noise $u_i$ (Normal, $t$) and $n$; columns FARM-H, FARM-U, FAM, PFA, Naive. Non-robust methods break down!

35 Power Comparisons. Table: empirical power, $p = 500$; rows by noise $u_i$ (Normal, $t$) and $n$; columns FARM-H, FARM-U, FAM, PFA, Naive. Little price to pay for robustness!

36 Power Curve under Varying Signal Strength. Figure: empirical power versus signal strength for $t_3$-distributed noise (FARM-H, FARM-U, FAM, PFA, Naive).

37 Application to Neuroblastoma Data. German patients diagnosed between 1989 and 2004, aged 0 to 296 months (median 15 months); customized oligonucleotide microarray with $p = 10,$… genes. Focus on 3-year event-free survival (49 positive and 190 negative); … and 420 genes, respectively, have kurtosis heavier than $t_5$.

38 Effect of Adjustments and Differentially Expressed Genes. Figure: correlations in the negative group before and after adjustment (legend: corr > 1/3, corr < −1/3). At $t = 0.05$, FARM-U, FAM and the naive method identify 2128, 1767 and 1131 differentially expressed genes, respectively.

39 Summary. Introduced a simple robustification principle. Developed a non-asymptotic Bahadur representation for the adaptive Huber estimator. Demonstrated a phase-transition phenomenon. Proposed a new factor-adjusted robust multiple test: the FARM-test. Verified its benefits and conclusions by simulation studies.

40 The End. A new perspective on robust M-estimation: finite sample theory and applications to dependence-adjusted multiple testing (with W.-X. Zhou, K. Bose & H. Liu). Preprint. FARM-Test: factor-adjusted robust multiple testing with false discovery control (with Y. Ke, Q. Sun & W.-X. Zhou). Preprint, 2017.

FARM-Test: Factor-adjusted robust multiple testing with false discovery control. Jianqing Fan, Yuan Ke, Qiang Sun and Wen-Xin Zhou. arXiv:1711.05386v1 [stat.ME] 15 Nov 2017. Abstract Large-scale multiple