The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

Ian Barnett, Rajarshi Mukherjee & Xihong Lin
Harvard University
ibarnett@hsph.harvard.edu
June 24, 2014
Ian Barnett, BIRS, June 24, 2014

Background
Genome-wide association studies (GWAS): millions of common SNPs (minor allele frequency > 0.05) are genotyped.
Gene-level/pathway-level analysis can gain power to detect SNP effects by combining information across the SNPs in a set.
Goal: develop powerful, computationally efficient statistical methodology for SNP-sets that has the power to detect joint SNP effects.

Model
n subjects, q covariates, p genetic variants.
$Y_i$ is the phenotype for the $i$th individual.
$X_i$ contains the $q$ covariates for the $i$th individual.
$G_i$ contains the SNP information (minor allele counts) in a gene/pathway/SNP-set for the $i$th individual.
$\alpha$ and $\beta$ contain the regression coefficients.
$$\mu_i = E(Y_i \mid G_i, X_i), \qquad h(\mu_i) = X_i \alpha + G_i \beta,$$
where $h(\cdot)$ is the link function.
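
To make the model concrete, here is a minimal Python sketch that evaluates $\mu_i$ and draws binary phenotypes. The logistic link, the function names, the minor allele frequency and all coefficient values are assumptions made for illustration; the slides leave $h(\cdot)$ general.

```python
import math
import random

def mean_response(x_i, g_i, alpha, beta):
    """mu_i = h^{-1}(X_i alpha + G_i beta), with a logistic link assumed."""
    eta = sum(a * x for a, x in zip(alpha, x_i)) + sum(b * g for b, g in zip(beta, g_i))
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_phenotypes(X, G, alpha, beta, rng):
    """Draw one binary phenotype per subject with P(Y_i = 1) = mu_i."""
    return [1 if rng.random() < mean_response(x, g, alpha, beta) else 0
            for x, g in zip(X, G)]

# Hypothetical toy data: intercept plus one covariate, p = 3 variants.
rng = random.Random(0)
n, p, maf = 6, 3, 0.3
X = [[1.0, rng.gauss(0, 1)] for _ in range(n)]
# Minor allele counts in {0, 1, 2}: two independent allele draws per variant.
G = [[sum(rng.random() < maf for _ in range(2)) for _ in range(p)] for _ in range(n)]
alpha = [-0.5, 0.2]          # illustrative covariate effects
beta = [0.0, 0.4, 0.0]       # one causal variant; beta = 0 everywhere is H0
y = simulate_phenotypes(X, G, alpha, beta, rng)
```

Setting every entry of `beta` to zero gives data generated under the null hypothesis of no SNP-set effect.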

Marginal SNP test statistics
The marginal score test statistic for the $j$th variant is
$$Z_j = G_j^T (Y - \hat{\mu}_0),$$
where $\hat{\mu}_0$ is the MLE of $E(Y \mid H_0)$. Assume $Z_j$ is normalized.
Letting $UU^T = \widehat{\mathrm{Cov}}(Z) = \hat{\Sigma}$, define the transformed (decorrelated) test statistics:
$$Z^* = U^{-1} Z \xrightarrow{L} MVN(0, I_p) \quad \text{as } n \to \infty.$$
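
As an illustration of the normalized marginal score statistic, the sketch below handles the simplest case only: an identity link with an intercept as the sole covariate, so that $\hat{\mu}_0$ is the sample mean. That restriction, the function name and the variance estimate are assumptions of this example, not the general procedure in the slides.

```python
import math

def marginal_score_stats(G, y):
    """Normalized Z_j = G_j^T (Y - mu0_hat) for an intercept-only linear null model.

    G: list of n rows, each holding p minor allele counts; y: list of n phenotypes.
    Assumes no genotype column is constant (otherwise the null variance is zero).
    """
    n, p = len(G), len(G[0])
    mu0 = sum(y) / n                                  # MLE of E(Y | H0)
    resid = [yi - mu0 for yi in y]
    sigma2 = sum(r * r for r in resid) / (n - 1)      # residual variance under H0
    z = []
    for j in range(p):
        gj = [row[j] for row in G]
        gbar = sum(gj) / n
        score = sum(g * r for g, r in zip(gj, resid))         # G_j^T (Y - mu0_hat)
        null_var = sigma2 * sum((g - gbar) ** 2 for g in gj)  # variance of the score
        z.append(score / math.sqrt(null_var))
    return z

# Toy example with n = 4 subjects and p = 2 variants.
G = [[0, 1], [1, 0], [2, 1], [1, 2]]
y = [1.0, 3.0, 5.0, 3.0]
z = marginal_score_stats(G, y)
```

In this special case $Z_j$ equals $\sqrt{n-1}$ times the sample correlation between the $j$th genotype column and the phenotype.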

Current popular methods
SKAT. Test statistic: $\sum_{j=1}^{p} Z_j^2$. Pros: high power when signal sparsity is low; accurate p-values can be obtained quickly. Cons: can have very low power when sparsity is high.
MinP. Test statistic: $\max_j \{|Z_j|\}$. Pros: high power when signal sparsity is high. Cons: slightly lower power when sparsity is low; difficult to obtain accurate analytic p-values.
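
The contrast between a summing test and a maximum-based test can be seen on two toy vectors. This is only a sketch of the two test statistics as written above (unweighted SKAT, no p-value calculation); the function names and the example vectors are hypothetical.

```python
def skat_statistic(z):
    """Sum of squared marginal statistics (unweighted form)."""
    return sum(v * v for v in z)

def minp_statistic(z):
    """Largest absolute marginal statistic (equivalently, the smallest marginal p-value)."""
    return max(abs(v) for v in z)

# Two hypothetical signal patterns with the same sum of squares.
sparse = [0.0, 0.0, 0.0, 4.0]     # one strong signal
dense = [2.0, -2.0, 2.0, -2.0]    # several moderate signals

same_skat = skat_statistic(sparse) == skat_statistic(dense)
larger_max_when_sparse = minp_statistic(sparse) > minp_statistic(dense)
```

SKAT cannot tell the two patterns apart, while the maximum is twice as large for the sparse one, which is the intuition behind the pros and cons listed for each method.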

The higher criticism
Let $S(t) = \sum_{j=1}^{p} 1\{|Z_j| \ge t\}$.
Assumes $\Sigma = I_p$.
Under $H_0$, $S(t) \sim \mathrm{Binomial}(p, 2\bar{\Phi}(t))$, where $\bar{\Phi}(t) = 1 - \Phi(t)$ is the survival function of the standard normal distribution.
The Higher Criticism test statistic is:
$$HC = \sup_{t>0} \left\{ \frac{S(t) - 2p\bar{\Phi}(t)}{\sqrt{2p\bar{\Phi}(t)\,(1 - 2\bar{\Phi}(t))}} \right\}$$
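
A minimal sketch of the HC statistic follows. It evaluates the supremum only at the observed $|Z_j|$, which is a common simplification and an assumption here rather than something the slides specify; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def higher_criticism(z):
    """HC statistic, with the supremum restricted to thresholds t = |Z_j|."""
    p = len(z)
    norm = NormalDist()
    best = -math.inf
    for t in sorted(abs(v) for v in z):
        if t <= 0.0:
            continue                                  # the supremum is over t > 0
        tail = 2.0 * (1.0 - norm.cdf(t))              # 2 * Phi_bar(t) = P(|Z_j| >= t) under H0
        s = sum(1 for v in z if abs(v) >= t)          # S(t)
        denom = math.sqrt(p * tail * (1.0 - tail))    # binomial standard deviation
        if denom > 0.0:
            best = max(best, (s - p * tail) / denom)
    return best

null_like = higher_criticism([0.1, -0.2, 0.3])
with_signal = higher_criticism([0.1, -0.2, 5.0])
```

A single large $|Z_j|$ drives the statistic up sharply because the expected count $2p\bar{\Phi}(t)$ and its standard deviation are both tiny at that threshold.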

The higher criticism
[Figure: histogram of the $Z_i$ (y-axis: density), annotated with the threshold $t$ at argmax{HC(t)}.]

Adjusting for correlation
Recalling that $Z^* = U^{-1} Z$, let $S^*(t) = \sum_{j=1}^{p} 1\{|Z^*_j| \ge t\}$.
Note that under $H_0$, $S^*(t) \sim \mathrm{Binomial}(p, 2\bar{\Phi}(t))$ even for a general correlated $\Sigma$.
The innovated Higher Criticism test statistic is:
$$iHC = \sup_{t>0} \left\{ \frac{S^*(t) - 2p\bar{\Phi}(t)}{\sqrt{2p\bar{\Phi}(t)\,(1 - 2\bar{\Phi}(t))}} \right\}$$
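
The decorrelation step can be sketched as below. The factor $U$ with $UU^T = \hat{\Sigma}$ is not unique; taking the lower-triangular Cholesky factor is an assumption of this example, as are the function names. The exceedance count $S^*(t)$ is then computed from the returned vector exactly as $S(t)$ is computed from $Z$.

```python
import math

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L

def decorrelate(z, sigma):
    """Return Z* = L^{-1} Z by forward substitution."""
    L = cholesky(sigma)
    z_star = []
    for i in range(len(z)):
        acc = z[i] - sum(L[i][k] * z_star[k] for k in range(i))
        z_star.append(acc / L[i][i])
    return z_star

# Two statistics with correlation 0.5 (illustrative values).
sigma_hat = [[1.0, 0.5], [0.5, 1.0]]
z_star = decorrelate([1.0, 1.0], sigma_hat)
```

In this toy case the second decorrelated statistic is smaller than the original one: two concordant, positively correlated signals partly cancel once the shared component is removed.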

Adjusting for correlation
Cancer Genetic Markers of Susceptibility (CGEM) breast cancer GWAS: FGFR2 gene.
[Figure: two histograms (y-axis: frequency) of the marginal test statistics, $Z$ and the decorrelated $Z^*$.]
Decorrelating causes iHC to lose power.

Comparison
Methods compared (table columns): MinP, SKAT, iHC.
Criteria (table rows):
Robust to signal sparsity
Robust to correlation/LD structure
Computationally efficient
Does not require decorrelating test statistics

Comparison
Methods compared (table columns): MinP, SKAT, iHC, GHC.
Criteria (table rows):
Robust to signal sparsity
Robust to correlation/LD structure
Computationally efficient
Does not require decorrelating test statistics
*We will also consider the omnibus test, OMNI, in our power simulations. It is based on the minimum p-value of SKAT, MinP, and GHC.

Our contribution: the generalized higher criticism (GHC)
Recall $S(t) = \sum_{j=1}^{p} 1\{|Z_j| \ge t\}$.
Now we allow $\Sigma$ to have an arbitrary correlation structure.
$S(t)$ is no longer binomial. Instead we approximate it with a Beta-binomial distribution, matching on the first two moments.
The Generalized Higher Criticism test statistic is:
$$GHC = \sup_{t>0} \frac{S(t) - 2p\bar{\Phi}(t)}{\sqrt{\widehat{\mathrm{Var}}(S(t))}}$$
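
Matching a Beta-binomial on two moments has a closed form. The sketch below uses the standard parameterization Beta-binomial($p$, $a$, $b$) with mean $p\pi$ and variance $p\pi(1-\pi)\{1 + (p-1)\rho\}$, where $\pi = a/(a+b)$ and $\rho = 1/(a+b+1)$. The function name is hypothetical, and the target mean and variance are simply taken as inputs.

```python
def beta_binomial_params(p, mean, var):
    """Return (a, b) of the Beta-binomial(p, a, b) with the given mean and variance."""
    pi = mean / p                                   # success probability, e.g. 2 * Phi_bar(t)
    binom_var = p * pi * (1.0 - pi)                 # variance if the indicators were independent
    rho = (var / binom_var - 1.0) / (p - 1)         # intra-class correlation implied by var
    if not 0.0 < rho < 1.0:
        raise ValueError("variance must exceed the binomial variance and stay below p^2 pi (1 - pi)")
    a = pi * (1.0 - rho) / rho
    b = (1.0 - pi) * (1.0 - rho) / rho
    return a, b

# Check against a known case: Beta-binomial(10, 2, 3) has mean 4 and variance 6.
a, b = beta_binomial_params(10, 4.0, 6.0)
```

The match requires overdispersion relative to the binomial, which is what positive correlation between the indicators $1\{|Z_j| \ge t\}$ produces.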

The variance estimator $\widehat{\mathrm{Var}}(S(t))$
Theorem 1. Let
$$\bar{r}^{n} = \frac{2}{p(p-1)} \sum_{1 \le k < l \le p} (\Sigma_{kl})^{n}$$
(the average of the $n$th powers of the pairwise correlations), and let $H_i(t)$ be the Hermite polynomials: $H_0(t) = 1$, $H_1(t) = t$, $H_2(t) = t^2 - 1$ and so on. Then
$$\mathrm{Cov}\big(S(t_k), S(t_j)\big) = p\left[2\bar{\Phi}(\max\{t_j, t_k\}) - 4\bar{\Phi}(t_j)\bar{\Phi}(t_k)\right] + 4p(p-1)\,\phi(t_j)\phi(t_k) \sum_{i=1}^{\infty} \frac{H_{2i-1}(t_j)\,H_{2i-1}(t_k)\,\bar{r}^{2i}}{(2i)!}$$
Proof follows from Schwartzman and Lin (2009), where they showed:
$$P(Z_k > t_i, Z_l > t_j) = \bar{\Phi}(t_i)\bar{\Phi}(t_j) + \phi(t_i)\phi(t_j) \sum_{n=1}^{\infty} \frac{\Sigma_{kl}^{n}}{n!} H_{n-1}(t_i)\,H_{n-1}(t_j)$$
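
The covariance formula can be evaluated numerically by truncating the infinite series. The sketch below does that and plugs the resulting variance into the GHC statistic. The truncation at 20 terms, the restriction of the supremum to the observed $|Z_j|$, and all function names are assumptions of this example.

```python
import math
from statistics import NormalDist

NORM = NormalDist()

def hermite(t, n_max):
    """Probabilists' Hermite polynomials H_0(t), ..., H_{n_max}(t)."""
    h = [1.0, t]
    for n in range(1, n_max):
        h.append(t * h[n] - n * h[n - 1])      # H_{n+1}(t) = t H_n(t) - n H_{n-1}(t)
    return h[: n_max + 1]

def cov_S(tj, tk, sigma, terms=20):
    """Cov(S(t_k), S(t_j)) from Theorem 1, with the series truncated at `terms`."""
    p = len(sigma)
    sj, sk = 1.0 - NORM.cdf(tj), 1.0 - NORM.cdf(tk)
    cov = p * (2.0 * (1.0 - NORM.cdf(max(tj, tk))) - 4.0 * sj * sk)
    if p < 2:
        return cov
    pairs = [sigma[k][l] for k in range(p) for l in range(k + 1, p)]
    hj, hk = hermite(tj, 2 * terms - 1), hermite(tk, 2 * terms - 1)
    series = 0.0
    for i in range(1, terms + 1):
        r_bar = sum(r ** (2 * i) for r in pairs) / len(pairs)   # average 2i-th power
        series += hj[2 * i - 1] * hk[2 * i - 1] * r_bar / math.factorial(2 * i)
    return cov + 4.0 * p * (p - 1) * NORM.pdf(tj) * NORM.pdf(tk) * series

def ghc_statistic(z, sigma):
    """GHC with the supremum restricted to thresholds t = |Z_j|."""
    p = len(z)
    best = -math.inf
    for t in sorted(abs(v) for v in z):
        var = cov_S(t, t, sigma) if t > 0.0 else 0.0
        if var > 0.0:
            s = sum(1 for v in z if abs(v) >= t)
            best = max(best, (s - 2.0 * p * (1.0 - NORM.cdf(t))) / math.sqrt(var))
    return best

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
z = [0.1, -0.2, 5.0]
ghc_independent = ghc_statistic(z, identity)
ghc_correlated = ghc_statistic(z, correlated)
```

With $\Sigma = I_p$ the series vanishes and the variance reduces to the binomial variance used by HC. Correlation inflates the variance, so the same $Z$ yields a smaller GHC value.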

Analytic p-values for the GHC
Letting $h$ be the observed GHC statistic:
$$\text{p-value} = \mathrm{pr}\left\{ \sup_{t>0} \frac{S(t) - 2p\bar{\Phi}(t)}{\sqrt{\widehat{\mathrm{Var}}(S(t))}} \ge h \right\}$$
There exist $0 < t_1 < \cdots < t_p$ such that
$$\text{p-value} = 1 - \mathrm{pr}\left( \bigcap_{k=1}^{p} \{S(t_k) \le p - k\} \right)$$

Power simulations
[Figure: four panels of power (y-axis, 0 to 1) against $\rho_2$ (x-axis) for GHC, MinP, SKAT, iHC and OMNI. Panels: $\rho_1 = 0.4$, $\rho_3 = 0$, $p = 40$, 2 causal; $\rho_1 = 0.4$, $\rho_3 = 0$, $p = 40$, 4 causal; $\rho_1 = 0.4$, $\rho_3 = 0.4$, $p = 40$, 2 causal; $\rho_1 = 0.4$, $\rho_3 = 0.4$, $p = 40$, 4 causal.]
$\rho_1$: correlation within causal variants.
$\rho_2$: correlation between causal and non-causal variants.
$\rho_3$: correlation within non-causal variants.

Data analysis
The National Cancer Institute's Cancer Genetic Markers of Susceptibility (CGEM) breast cancer GWAS.
The sample has 1145 cases and 1142 controls of European ancestry.
[Figure: observed against expected $-\log_{10}$(p-value), both axes from 0 to 5, with the genes FGFR2, PTCD3 and POLR1A labeled.]

Next (and final) step
Thresholding tests (GHC and MinP) and summing tests (SKAT) are good complements.
A more principled way (than OMNI) to combine these classes of tests is to use the following test statistic:
$$\sup_{\gamma, t} \sum_{j=1}^{p} |Z_j|^{\gamma}\, I\{|Z_j| > t\}$$
$\gamma = 2$, $t = 0$ gives SKAT; $\gamma = 0$ gives GHC.
We label this test as OPT.
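
To see how the $(\gamma, t)$ family spans the summing and thresholding tests, the sketch below tabulates the raw sum $\sum_j |Z_j|^{\gamma} I\{|Z_j| > t\}$ over the grid of $t$ and $\gamma$ values used in the simulations. The slides do not say how the members are standardized before the supremum is taken, so this example stops at the raw values; the function and variable names are hypothetical.

```python
def power_threshold_statistic(z, gamma, t):
    """Raw sum of |Z_j|^gamma over the statistics exceeding the threshold t."""
    return sum(abs(v) ** gamma for v in z if abs(v) > t)

T_GRID = [0.5 * k for k in range(9)]        # 0, 0.5, ..., 4
GAMMA_GRID = [0.5 * k for k in range(5)]    # 0, 0.5, ..., 2

def statistic_grid(z):
    """Map each (gamma, t) pair on the grid to its raw statistic."""
    return {(g, t): power_threshold_statistic(z, g, t)
            for g in GAMMA_GRID for t in T_GRID}

grid = statistic_grid([1.0, -2.0, 3.0])
skat_like = grid[(2.0, 0.0)]     # sum of squares, the SKAT member
count_like = grid[(0.0, 1.5)]    # number of |Z_j| above 1.5, the exceedance count behind GHC
```

Because the members live on different scales, comparing them directly would always favor large $\gamma$; some form of standardization or p-value scale is needed before maximizing.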

Simulations
$p = 20$, exchangeable correlation $\rho$.
For OPT, the supremum is selected from $t \in (0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4)$ and $\gamma \in (0, 0.5, 1, 1.5, 2)$.
Non-zero $\beta$ decrease with $\rho$.
[Figure: two panels of power (y-axis, 0 to 1) against $\rho$ (x-axis, 0 to 0.4) for SKAT, MinP, GHC and OPT. Panels: $p = 20$, $k = 1$, exchangeable correlation $\rho$; $p = 20$, $k = 4$, exchangeable correlation $\rho$.]