Marginal Screening and Post-Selection Inference

Ian McKeague (Columbia University)
August 13, 2017

Outline
1. Background on Marginal Screening
2. 2 × 2 Tables in Case-Control Studies (joint work with Min Qian)
3. Binary Screening Test (BST)
4. Forward Stepwise BST
5. Censored Survival Data (Tzu-Jung Huang, M. and Min Qian)

Wald Lecture at JSM 2017 by Candès

Panning for gold
- Are there any active predictors (in the sense defined by Candès)?
- This is the most relevant question when the signal-to-noise ratio is low (e.g., epidemiology) and signals are sparse (if present at all).
- The knockoff approach furnishes rigorous FDR control (not FWER control):
  - Barber and Candès (2016): knockoff filter for testing associations in a high-dimensional linear model.
  - Candès, Fan, Janson and Lv (2017): Panning for gold: model-free knockoffs for high-dimensional controlled variable selection.
- Fighting words: "To constrain oneself to marginal testing is to completely ignore the vast modern literature on sparse regression that, while lacking finite-sample Type I error control, has had tremendous success establishing other useful inferential guarantees such as model selection consistency under high-dimensional asymptotics..."

The unreasonable effectiveness of marginal testing
Assumption-lean full linear model:
$$Y = \alpha_0 + \sum_{k=1}^{p} \beta_k X_k + \epsilon,$$
where $\epsilon$ has mean 0 and finite variance, and is uncorrelated with each $X_k$.
The power of marginal testing derives from the following fact: if $(X_1, \dots, X_p)$ has a non-singular covariance matrix, then
$$H_0\colon \beta_k = 0, \quad k = 1, \dots, p$$
holds if and only if $Y$ is marginally uncorrelated with each $X_k$.
That is, marginal testing does address the question of whether there are any active predictors in the full model (not the wrong question after all).
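The equivalence follows in one line from the model. The derivation below is a sketch; $\Sigma$ (the covariance matrix of the predictors) and the vector $\beta = (\beta_1, \dots, \beta_p)^T$ are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\mathrm{Cov}(X_j, Y)
  &= \sum_{k=1}^{p} \beta_k \,\mathrm{Cov}(X_j, X_k) + \mathrm{Cov}(X_j, \epsilon)
   = (\Sigma \beta)_j, \qquad j = 1, \dots, p, \\
\mathrm{Cov}(X_j, Y) = 0 \ \text{for all } j
  &\iff \Sigma \beta = 0
  \iff \beta = 0 \quad \text{when } \Sigma \text{ is non-singular.}
\end{align*}
```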

Marginal screening
Least squares fitting of
$$E(Y \mid X_k) = \alpha_k + \beta_k X_k$$
to each (standardized) predictor $X_k$.
This abuses the notation for $\beta_k$, but it doesn't matter! $Y$ is marginally uncorrelated with $X_k$ if and only if $\beta_k = 0$.
Parameter of interest: $\theta_0 = \beta_{k_0}$, where $k_0 \in \arg\max_{k=1,\dots,p} |\mathrm{Corr}(X_k, Y)|$.
The problem: test whether $\theta_0 \neq 0$ and provide a CI for $\theta_0$.

Marginal screening (cont'd)
Test statistics for $H_0\colon \theta_0 = 0$ versus $H_a\colon \theta_0 \neq 0$.
Maximally-selected slope:
$$\hat\theta_n = \frac{\widehat{\mathrm{Cov}}(X_{\hat k_n}, Y)}{\widehat{\mathrm{Var}}(X_{\hat k_n})}, \qquad \hat k_n \in \arg\max_{k=1,\dots,p} |\widehat{\mathrm{Corr}}(X_k, Y)|.$$
Reject $H_0$ for large $|\hat\theta_n|$.
Equivalently, maximally-selected correlation:
$$\hat\rho_n = \widehat{\mathrm{Corr}}(X_{\hat k_n}, Y).$$
Reject $H_0$ for large $|\hat\rho_n|$.
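The two statistics above can be computed directly from sample moments. The following is a minimal sketch, assuming an n-by-p NumPy array of predictors; the function name `max_selected_slope` and the toy data are hypothetical and only illustrate the definitions.

```python
import numpy as np

def max_selected_slope(X, y):
    """Return (k_hat, theta_hat, rho_hat) for an n-by-p predictor matrix X and response y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = len(y)
    cov = Xc.T @ yc / (n - 1)                    # sample Cov(X_k, Y) for each k
    var = Xc.var(axis=0, ddof=1)                 # sample Var(X_k) for each k
    corr = cov / np.sqrt(var * yc.var(ddof=1))   # sample Corr(X_k, Y) for each k
    k_hat = int(np.argmax(np.abs(corr)))         # most correlated predictor (0-based index)
    return k_hat, cov[k_hat] / var[k_hat], corr[k_hat]

# Toy data (hypothetical): only the predictor with index 3 is active.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 3] + rng.normal(size=200)
k_hat, theta_hat, rho_hat = max_selected_slope(X, y)
```

The selected slope coincides with the ordinary least squares slope from regressing `y` on the selected column alone. A naive p-value for that regression ignores the selection step, which is the problem the adaptive resampling test addresses.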

Marginal screening (cont'd)
- Has the hallmarks of post-selection inference: non-regular asymptotics at the null hypothesis, unstable behavior in small samples.
- How can we provide an accurate p-value and CI for $\theta_0$ while preserving the assumption-lean approach, i.e., without adding assumptions such as independent normal errors, as needed for conditional testing approaches, say?
- McKeague and Qian (2015, JASA): adaptive resampling test (ART). A modified nonparametric bootstrap provides valid (post-selection) p-values. Forward stepwise ART is used to identify the presence of additional active predictors.
- Luedtke and van der Laan (2017, JASA): regularization approach. Asymptotically normal test statistic, CIs easily constructed, $p$ can grow exponentially with $n$. Restricted to independent errors.

2 × 2 Tables in Case-Control Studies (GWAS)
Motivating example: Risk Assessment of Cerebrovascular Events (RACE Study, 2017)
- Ongoing case-control study of stroke: over 5,000 imaging-confirmed cases of stroke and 5,000 controls recruited from seven medical centers in Pakistan.
- Subset of 1,220 cases with early-onset stroke (stroke before age 60) and 1,273 controls.
- Genome-wide SNP data available for this subset of subjects.
- Goal: identify novel genetic factors associated with early-onset stroke.
- Few if any known genetic associations (in contrast to Crohn's disease, say, where 30 genes are known).

Case-Control Set-Up
- Standard unmatched case-control study: $N = M_1$ (cases) $+ M_2$ (controls).
- Binary disease status: $D \in \{\text{case}, \text{control}\}$.
- Binary risk factors: $W_k \in \{\text{exposed}, \text{unexposed}\}$, $k = 1, \dots, p$.
- Fixed margins.
- The log-odds ratio (instead of correlation) quantifies the association.
- Is $D$ significantly associated with any of the risk factors $W_1, \dots, W_p$?
- The ideal approach would be model free, avoiding the requirement of a high-dimensional logistic regression model.
- Claim: the unreasonable effectiveness of marginal screening still holds.

Multiple 2 × 2 Tables
For the $k$-th risk factor, $k = 1, \dots, p$:

             Cases          Controls                Total
Exposed      X_k            N_{1k} - X_k            N_{1k}
Unexposed    M_1 - X_k      X_k + N_{2k} - M_1      N_{2k}
Total        M_1            M_2                     N

$X_k \sim$ noncentral hypergeometric, parameterized by the odds ratio.
Odds ratio:
$$\theta_k = \frac{P(D = 1 \mid W_k = 1) / P(D = 0 \mid W_k = 1)}{P(D = 1 \mid W_k = 0) / P(D = 0 \mid W_k = 0)}.$$
Hypotheses: $H_0\colon \theta_1 = \dots = \theta_p = 1$ versus $H_a\colon$ at least one $\theta_k \neq 1$.
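For reference, the noncentral hypergeometric law of $X_k$ given the margins can be written out explicitly. This is the standard form of Fisher's noncentral hypergeometric distribution expressed in the table's notation; the name $\varphi_k$ for the normalizing polynomial is chosen here for convenience.

```latex
\[
P(X_k = u) \;=\;
  \frac{\binom{N_{1k}}{u}\binom{N_{2k}}{M_1 - u}\,\theta_k^{\,u}}{\varphi_k(\theta_k)},
\qquad
\varphi_k(z) \;=\; \sum_{u = \max(0,\, M_1 - N_{2k})}^{\min(M_1,\, N_{1k})}
  \binom{N_{1k}}{u}\binom{N_{2k}}{M_1 - u}\, z^{u},
\]
\[
\max(0,\, M_1 - N_{2k}) \;\le\; u \;\le\; \min(M_1,\, N_{1k}).
\]
```

Setting $\theta_k = 1$ recovers the ordinary (central) hypergeometric distribution underlying Fisher's exact test.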

Overview of Inference for 2 × 2 Tables
- Classical chi-squared and Fisher exact tests.
- Mantel-Haenszel test of $H_0\colon \theta_1 = \dots = \theta_p = 1$ versus $H_a\colon$ at least one $\theta_k \neq 1$: requires all the odds ratios $\theta_k$ to be identical.
- Kou and Ying (1996): asymptotic theory of the empirical log-odds ratio for a single table.
- Kou and Ying (2006) studied the problem of estimating a common odds ratio from sequences of dependent 2 × 2 tables.
- There is an extensive literature on tests for homogeneity of multiple odds ratios, e.g., Reis, Hirji and Afifi (1999).

Existing screening methods
$H_0\colon \theta_1 = \dots = \theta_p = 1$ versus $H_a\colon$ at least one $\theta_k \neq 1$.
- Marginal p-values: control of FWER using Bonferroni is highly conservative for large $p$.
- Permutation test ($D$ randomly permuted among subjects): heavy computational burden.
- FDR control for lasso (logistic regression) based on knockoffs.
Is there a more powerful, computationally efficient, and model-free approach?

Hypotheses
Question: Is $D$ significantly related to any of the risk factors $W_1, \dots, W_p$?
Define
$$k_0 \in \arg\max_{k=1,\dots,p} |\log\theta_k| / \sigma_k,$$
where $\sigma_k > 0$ is a prescribed sequence of normalizing constants.
Hypotheses: $H_0\colon \log\theta_0 = 0$ versus $H_a\colon \log\theta_0 \neq 0$, where $\theta_0 = \theta_{k_0}$.

Binary Screening Test (BST)
Empirical odds ratio:
$$\hat\theta_k = \frac{X_k (X_k + N_{2k} - M_1)}{(N_{1k} - X_k)(M_1 - X_k)}, \qquad k = 1, \dots, p.$$
Estimate of $k_0$:
$$\hat k_N \in \arg\max_{k} |\log\hat\theta_k| / \hat\tau_k,$$
where $\hat\tau_k$ is the standard error of $\log\hat\theta_k$.
Test statistic:
$$T_N = \log\hat\theta_N = \log\hat\theta_{\hat k_N}.$$
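The test statistic can be computed from the raw exposure and case indicators as sketched below. The slides do not say how the standard error $\hat\tau_k$ is computed; the sketch assumes the usual Woolf formula $\sqrt{1/a + 1/b + 1/c + 1/d}$ over the four cell counts, and also assumes no cell is zero. The function name `bst_statistic` and the toy data are hypothetical.

```python
import numpy as np

def bst_statistic(W, D):
    """W: n-by-p 0/1 exposure matrix, D: length-n 0/1 case indicator.
    Returns (k_hat, T_N, log_or, tau). Assumes every cell count is positive."""
    W = np.asarray(W, dtype=float)
    D = np.asarray(D, dtype=float)
    M1 = D.sum()                     # number of cases
    N1 = W.sum(axis=0)               # exposed subjects, per risk factor
    N2 = len(D) - N1                 # unexposed subjects, per risk factor
    X = W.T @ D                      # exposed cases X_k
    a, b = X, N1 - X                 # exposed row: cases, controls
    c, d = M1 - X, X + N2 - M1       # unexposed row: cases, controls
    log_or = np.log(a * d) - np.log(b * c)       # log empirical odds ratios
    tau = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error (assumed form)
    k_hat = int(np.argmax(np.abs(log_or) / tau))  # selected risk factor (0-based)
    return k_hat, log_or[k_hat], log_or, tau

# Toy data (hypothetical): 20 cases then 20 controls, two risk factors.
D = np.repeat([1, 0], 20)
W = np.column_stack([
    np.r_[np.ones(15), np.zeros(5), np.ones(5), np.zeros(15)],    # odds ratio 9
    np.r_[np.ones(10), np.zeros(10), np.ones(10), np.zeros(10)],  # odds ratio 1
])
k_hat, T_N, log_or, tau = bst_statistic(W, D)
```

In this toy example the first risk factor has cell counts 15, 5, 5, 15, so it is selected and `T_N` equals log 9.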

Asymptotic behavior of $\hat\theta_N$ under local alternatives
Local parameterization: $\log\theta^{(N)} = \log\theta^{(0)} + b/\sqrt{N}$, where $\theta = (\theta_1, \dots, \theta_p)^T$.
Hypotheses: $H_0\colon \log\theta_N = 0$ versus $H_a\colon \log\theta_N \neq 0$, where $\theta_N = \theta_{k_N}$ and $k_N \in \arg\max_k |\log\theta_k| / \sigma_k$.
Theorem. Under regularity conditions,
$$\sqrt{N}\,(\log\hat\theta_N - \log\theta_N) \xrightarrow{d}
\begin{cases}
\sigma_{k_0} Z_{k_0} & \text{if } \theta^{(0)} \neq 1, \\
\sigma_K Z_K + b_K - b_{\bar k} & \text{if } \theta^{(0)} = 1,
\end{cases}$$
where
- $(Z_1, \dots, Z_p)^T \sim N(0, C_X)$, with $C_X$ the limit of $\mathrm{Corr}(X_1, \dots, X_p)$;
- $k_0 = \arg\max_k |\log\theta_k^{(0)}| / \sigma_k$, assumed to be unique when $\theta^{(0)} \neq 1$;
- $\bar k = \arg\max_k |b_k| / \sigma_k$, assumed to be unique when $\theta^{(0)} = 1$ and $b \neq 0$; and
- $K = \arg\max_{k=1,\dots,p} (Z_k + b_k/\sigma_k)^2$.

Key regularity condition
Kou and Ying (1996) established the (marginal) representation
$$X_k \overset{d}{=} \sum_{s=1}^{M_1} I\big(\eta_{sk} \le (1 + \theta_k^{-1}\lambda_{sk})^{-1},\ s \le N_{1k}\big),$$
where the $\eta_{sk}$ are iid Unif(0, 1) and $-\lambda_{sk} \le 0$ are the roots of the Jacobi polynomial
$$\varphi_k(z) = \sum_{u=\max(0,\, M_1 - N_{2k})}^{\min(M_1,\, N_{1k})} \binom{N_{1k}}{u} \binom{N_{2k}}{M_1 - u} z^u.$$
We need to assume this representation holds jointly over all $k = 1, \dots, p$.

Calibration of BST
- Calibrate under the null $\theta_N = 1$, that is, $\theta^{(0)} = 1$ and $b = 0$.
- Only need to estimate the distribution of $\sigma_K Z_K$, where $K = \arg\max_{k=1,\dots,p} Z_k^2$.
- $\sigma_k^2$ can be consistently estimated by $N\hat\tau_k^2$, where $\hat\tau_k$ is the standard error of $\log\hat\theta_k$.
- $(Z_1, \dots, Z_p)^T \sim N(0, C_X)$, with $C_X$ consistently estimated by the sample correlation matrix of the vector of risk factors $(W_1, \dots, W_p)$ restricted to the data on $D = 1$.
- Draw from the estimated null distribution of $\log\hat\theta_N$ using Monte Carlo simulations to obtain critical values.
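The calibration steps above can be sketched as follows. Since $\sigma_k^2 \approx N\hat\tau_k^2$, the null law of $\log\hat\theta_N$ is approximated by that of $\hat\tau_K Z_K$. The function names, the default of 1,000 draws, and the two-sided p-value convention are assumptions made for illustration; the slides only say that critical values are obtained by Monte Carlo.

```python
import numpy as np

def bst_null_draws(tau, C, n_draws=1000, seed=0):
    """Monte Carlo draws from the estimated null law of log(theta_hat_N).

    tau: length-p standard errors of log(theta_hat_k), estimating sigma_k / sqrt(N).
    C:   estimated p-by-p correlation matrix C_X.
    """
    tau = np.asarray(tau, dtype=float)
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(len(tau)), C, size=n_draws)
    K = np.argmax(Z**2, axis=1)                  # K = argmax_k Z_k^2 in each draw
    return tau[K] * Z[np.arange(n_draws), K]     # sigma_K Z_K / sqrt(N)

def bst_p_value(T_N, tau, C, n_draws=1000, seed=0):
    """Two-sided Monte Carlo p-value for the observed statistic T_N (assumed convention)."""
    draws = bst_null_draws(tau, C, n_draws, seed)
    return float(np.mean(np.abs(draws) >= abs(T_N)))

# Hypothetical inputs: three independent risk factors with equal standard errors.
tau = np.array([0.3, 0.3, 0.3])
C = np.eye(3)
draws = bst_null_draws(tau, C)
p_null = bst_p_value(0.0, tau, C)    # statistic at the null value
p_far = bst_p_value(5.0, tau, C)     # statistic far in the tail
```

In practice `C` would be the sample correlation matrix of the risk factors among the cases, as described in the slide.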

Confidence intervals for $\log\theta_0$
Three possibilities:
- $\mathrm{CI}_0$: use the same Monte Carlo calibration as BST.
- $\mathrm{CI}_{\max}$: use the most conservative critical values given by the limiting distributions in the Theorem as a function of $b$, with the $\hat k_N$-th component of $b$ allowed to vary freely and all its other components set to zero.
- $\mathrm{CI}_{\mathrm{boot}}$: select the value of $b_{\hat k_N}$ on the grid that provides the best agreement with the nominal 95% level in terms of the coverage of a bootstrapped version of $\log\hat\theta_N$.
Only $\mathrm{CI}_{\mathrm{boot}}$ works well.

Simulation studies
Three scenarios:
A) Null: $W_k \sim \mathrm{Ber}(0.5)$ for $k = 1, \dots, p$.
B) Alternative (weak dense signal): $W_k \sim \mathrm{Ber}(0.6)$, $k = 1, \dots, p/2$, and $W_k \sim \mathrm{Ber}(0.5)$, $k = p/2 + 1, \dots, p$, for cases; $W_k \sim \mathrm{Ber}(0.5)$, $k = 1, \dots, p/2$, and $W_k \sim \mathrm{Ber}(0.6)$, $k = p/2 + 1, \dots, p$, for controls.
C) Alternative (strong sparse signal): $W_1 \sim \mathrm{Ber}(0.65)$, $W_2 \sim \mathrm{Ber}(0.6)$, and $W_3 \sim \mathrm{Ber}(0.55)$ for cases; $W_k \sim \mathrm{Ber}(0.4)$, $k = 1, 2, 3$, for controls; and $W_k \sim \mathrm{Ber}(0.5)$, $k = 4, \dots, p$.
$N = 200$ with $M_1 = M_2 = 100$. Varying $p$ from 10 to 400.
Three correlation structures: independent, exchangeable, AR(1).

Independent risk factors

Model     p    BST           Bonferroni     Permutation
A        10     4.8 (0.5)     3.1 (0.15)     4.7 (10.7)
A        50     5.8 (0.9)     1.9 (0.06)     4.5 (51)
A       100     5.2 (1.6)     2.7 (0.1)      4.7 (102)
A       200     6.5 (2.1)     3.6 (0.2)      5.6 (193)
A       400     6.0 (4.3)     2.3 (0.4)      4.6 (389)
B        10    60.5 (0.5)    51.0 (0.02)    60.3 (11)
B        50    82.0 (1.0)    64.0 (0.06)    81.0 (55)
B       100    86.4 (1.6)    73.0 (0.1)     85.0 (102)
B       200    94.3 (2.4)    85.7 (0.2)     91.9 (200)
B       400    98.0 (4.3)    86.1 (0.4)     96.3 (387)
C        10    93.9 (0.6)    90.9 (0.02)    93.6 (11)
C        50    80.0 (1.0)    68.5 (0.06)    77.5 (51)
C       100    73.8 (1.4)    64.0 (0.1)     71.3 (100)
C       200    63.2 (2.2)    54.8 (0.2)     60.4 (200)
C       400    58.7 (4.0)    42.7 (0.4)     54.3 (388)

Table: Empirical rejection rates (%) over 1,000 Monte Carlo iterations, with average runtime (seconds) per iteration in parentheses, when $W_k$, $k = 1, \dots, p$, are independent.

AR(1) risk factors

Model     p    BST     Bonferroni    Permutation
A        10     3.9     1.8           4.0
A        50     5.5     2.4           4.8
A       100     5.3     2.3           4.3
A       200     5.2     2.4           4.5
A       400     6.0     2.5           5.3
B        10    53.9    43.8          54.0
B        50    76.2    59.2          73.6
B       100    82.7    66.4          79.2
B       200    91.0    79.4          89.1
B       400    96.2    81.2          93.3
C        10    88.5    84.9          88.7
C        50    72.0    59.8          70.1
C       100    67.9    56.3          65.1
C       200    59.5    51.6          57.1
C       400    51.5    37.9          48.0

Table: Empirical rejection rates (%) with AR(1) correlation structure $\mathrm{Corr}(W_j, W_k) = 0.5^{|j-k|}$.

Exchangeable risk factors

Model     p    BST     Bonferroni    Permutation
A        10     4.6     2.3           7.2
A        50     4.5     1.2           6.7
A       100     5.8     2.1           5.4
A       200     6.0     1.2           5.1
A       400     5.8     1.1           4.9
B        10    59.8    42.1          58.1
B        50    67.0    38.9          65.1
B       100    72.4    37.5          67.8
B       200    75.3    42.7          69.6
B       400    79.2    38.1          73.1
C        10    90.6    85.4          90.2
C        50    76.6    58.3          74.5
C       100    73.4    54.4          70.3
C       200    69.1    48.9          65.6
C       400    61.1    34.6          55.2

Table: Empirical rejection rates (%) based on 1,000 samples generated from models A, B and C with exchangeable correlation structure $\mathrm{Corr}(W_j, W_k) = 0.5$ for $j \neq k$.

Performance Comparisons
Simulation studies show:
- BST has good control of the type I error rate, while consistently maintaining the highest power compared with the Bonferroni and permutation test approaches.
- The advantage of BST is most evident when the 2 × 2 tables are highly correlated.
- BST is 10 times slower than Bonferroni (due to the computationally intensive simulation step).
- BST is 100 times faster than the permutation test (using 1000 permutations and 1000 Monte Carlo draws).
- BST is 1000 times faster than ART (our marginal screening test based on linear regression, which needs the double bootstrap).

Forward Stepwise BST
Run BST. If a significant risk factor is found (say $\hat k_N$), then:
1. Split the data on the remaining risk factors into two collections of $p - 1$ tables: exposed or unexposed to $\hat k_N$.
2. For each of the remaining $p - 1$ risk factors, calculate the Mantel-Haenszel OR estimate $\hat\theta_k$ and the standard error $\hat\tau_k$ of the log-OR from each pair of 2 × 2 tables. This yields a new test statistic $T_N$.
3. Estimate the null distribution with the new correlation matrix $C_X$, estimated by the $(p - 1) \times (p - 1)$ submatrix of the original estimate of $C_X$ excluding the entries involving $\hat k_N$.
Repeat steps 1-3 until no more significant risk factors are found.
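Step 2 relies on the Mantel-Haenszel estimate of a common odds ratio across the two strata defined by the selected risk factor. The sketch below uses the standard Mantel-Haenszel formula $\sum_s (a_s d_s / n_s) / \sum_s (b_s c_s / n_s)$; the function name and toy data are hypothetical, and the standard error of the log-OR, which the slides do not specify, is not computed here.

```python
import numpy as np

def mantel_haenszel_or(W_k, D, stratum):
    """Mantel-Haenszel common odds ratio of exposure W_k versus case status D,
    pooled over the two strata of a 0/1 stratifying variable (all arrays 0/1)."""
    W_k, D, stratum = (np.asarray(v) for v in (W_k, D, stratum))
    num = den = 0.0
    for s in (0, 1):
        m = stratum == s
        n = m.sum()                               # stratum size
        a = np.sum((W_k == 1) & (D == 1) & m)     # exposed cases
        b = np.sum((W_k == 1) & (D == 0) & m)     # exposed controls
        c = np.sum((W_k == 0) & (D == 1) & m)     # unexposed cases
        d = np.sum((W_k == 0) & (D == 0) & m)     # unexposed controls
        num += a * d / n
        den += b * c / n
    return num / den

# Toy data (hypothetical): two strata with the same 2 x 2 table (15, 5, 5, 15).
D_one = np.repeat([1, 0], 20)
W_one = np.r_[np.ones(15), np.zeros(5), np.ones(5), np.zeros(15)]
D = np.tile(D_one, 2)
W_k = np.tile(W_one, 2)
stratum = np.repeat([0, 1], 40)       # exposure to the previously selected factor
or_mh = mantel_haenszel_or(W_k, D, stratum)
```

Because both strata share the same table in this toy example, the pooled estimate equals the within-stratum odds ratio of 9.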

Example: RACE Study
$N = 2493$ with 1220 cases and 1273 controls.
$p = 2000$: first 2,000 genetic variants on chromosome 5.

Example: RACE Study (cont'd)
95% confidence intervals based on $\mathrm{CI}_{\mathrm{boot}}$:

Censored survival data (Huang, M, Q, 2017, Statistica Sinica, submitted)
- Only observe $\tilde Y = \min(Y, C)$, $\delta = 1_{\{Y \le C\}}$, and $X = (X_1, \dots, X_p)$, where $C$ is independent of $X$ and $\epsilon$.
- ART extends using a synthetic response $Y_S$ in place of $Y$.
- Koul, Susarla and Van Ryzin (1981): linear regression based on
$$Y_S = \frac{\delta \tilde Y}{S(\tilde Y)},$$
where $S(t) = P(C > t)$ is the survival function of $C$, with plug-in of the K-M estimator of $S$.
- Correlations preserved: $\mathrm{Corr}(Y_S, X_k) = \mathrm{Corr}(Y, X_k)$ for all $k$, so the unreasonable effectiveness of marginal screening still applies.
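The synthetic response can be built from the observed times and censoring indicators as sketched below. The Kaplan-Meier estimate is taken for the censoring time $C$, so censored observations ($\delta = 0$) play the role of events. The function names are hypothetical, and the sketch assumes no tied observation times.

```python
import numpy as np

def km_censoring_survival(y_obs, delta):
    """Kaplan-Meier estimate of S(t) = P(C > t), evaluated at each y_obs[i].
    Censoring (delta == 0) is treated as the event. Assumes no ties."""
    y_obs = np.asarray(y_obs, dtype=float)
    delta = np.asarray(delta)
    n = len(y_obs)
    order = np.argsort(y_obs)
    censored_sorted = delta[order] == 0
    at_risk = n - np.arange(n)                     # n, n-1, ..., 1
    factors = 1.0 - censored_sorted / at_risk      # below 1 only at censoring times
    S = np.empty(n)
    S[order] = np.cumprod(factors)
    return S

def synthetic_response(y_obs, delta):
    """Synthetic response Y_S = delta * y_obs / S_hat(y_obs)."""
    y_obs = np.asarray(y_obs, dtype=float)
    delta = np.asarray(delta)
    S = km_censoring_survival(y_obs, delta)
    y_s = np.zeros(len(y_obs))
    unc = delta == 1
    y_s[unc] = y_obs[unc] / S[unc]                 # censored entries stay at 0
    return y_s

# Toy data (hypothetical): the second observation is censored.
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 1])
y_s = synthetic_response(y_obs, delta)
```

Uncensored observations after a censoring time are inflated by $1/\hat S$ to compensate for the mass lost to censoring. With the true $S$ in place of the estimate, $E(Y_S \mid Y, X) = Y$, so covariances with the predictors are unchanged.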

Selected References
- McKeague, I. W. and Qian, M. (2017). Marginal screening of 2 × 2 tables in large-scale case-control studies. In preparation for Biometrics.
- McKeague, I. W. and Qian, M. (2015). An adaptive resampling test for detecting the presence of significant predictors (with discussion). JASA 110, 1422–1433.
- Wang, H. Judy, McKeague, I. W. and Qian, M. (2017). Testing for marginal linear effects in quantile regression. JRSS-B, to appear.
- Huang, T.-J., McKeague, I. W. and Qian, M. (2017). Marginal screening for high-dimensional predictors of survival outcomes. Submitted to Statistica Sinica.