Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics PDF Free Download

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 46

Lecture Overview 1. Variance Component Tests 2. Omnibus Tests 3. Weights 2 / 46

Recall: Region Based Analysis of Rare Variants Single variant test is not powerful to identify rare variant associations Strategy: Region based analysis Test the joint effect of rare/common variants in a gene/region while adjusting for covariates. Major Classes of Tests Burden/Collapsing tests Supervised/Adaptive Burden/Collapsing tests Variance component (similarity) based tests Omnibus tests: hedge against difference scenarios 3 / 46

Rare variants test: Variance component test Variance component test Burden tests are not powerful, if there exist variants with different association directions or many non-causal variants Variance component tests have been proposed to address it. Similarity based test 4 / 46

Rare variants test: Variance component test C-alpha test Neale BM, et al.(2011). Plos Genet. Case-control studies without covariates. Assume the jth variant is observed n j1 times, with r j1 times in cases. Under H 0 a A Total Case r j1 r j2 r Control s j1 s j2 s Total n j1 n j2 n r j1 Binomial(n j1, q) (q = r/n) 5 / 46

Rare variants test: Variance component test C-alpha test Risk increasing variant: r j1 qn j1 > 0 Risk decreasing variant: r j1 qn j1 < 0 Test statistic: T α = p p (r j1 qn j1 ) 2 n j1 q(1 q) j=1 j=1 This test is robust in the presence of the opposite association directions. 6 / 46

Rare variants test: Variance component test C-alpha test Weighting scheme T α = p p w j (r j1 qn j1 ) 2 w j n j1 q(1 q) j=1 j=1 Test for the over-dispersion due to genetic effects Neyman s C(α) test. 7 / 46

Rare variants test: Variance component test C-alpha test, P-value calculation Using normal approximation, since the test statistic is the sum of random variables. T α = p p (r j1 qn j1 ) 2 n j1 q(1 q) j=1 j=1 Doesn t work well when p is small (or moderate). P-value is computed using permutation. 8 / 46

Rare variants test: Variance component test C-alpha test C-alpha test is robust in the presence of the different association directions Disadvantages: Permutation is computationally expensive. Cannot adjust for covariates. 9 / 46

Rare variants test: Variance component test Sequence Kernal Association Test (SKAT) Wu et al.(2010, 2011). AJHG Recall the original regression models: Variance component test: Assume β j dist.(0, w 2 j τ). µ i /logit(µ i ) = α 0 + X T i α + G T i β H0 : β 1 = = β p = 0 <=> H 0 : τ = 0. 10 / 46

Rare variants test: Variance component test Sequence Kernel Association Test (SKAT) β j dist.(0, wj 2 τ): τ = 0 is on the boundary of the hypothesis. Score test statistic for τ = 0: Q SKAT = (y µ 0 ) K(y µ 0 ), K = GWWG : weighted linear kernel (W = diag[w 1,..., w p ]). 11 / 46

Rare variants test: Variance component test Sequence Kernel Association Test (SKAT) The C-alpha test is a special case of SKAT With no covariates and flat weights: p Q SKAT = (r j1 qn j1 ) 2 j=1 12 / 46

Rare variants test: Variance component test SKAT Q SKAT is a weighted sum of single variant score statistics Q SKAT = (y µ 0 ) GWWG (y µ 0 ) p p = wj 2 [g j(y µ 0 )] = wj 2 Uj 2 j=1 where U j = n i=1 g ij(y i µ 0i ). U j is a score of individual SNP j only model: j=1 µ i /logit(µ i ) = α 0 + X T i α + g ij β j 13 / 46

Rare variants test: Variance component test SKAT Q SKAT (asymptotically) follows a mixture of χ 2 distribution under the NULL. Q = (y µ 0 ) K(y µ 0 ) = (y µ 0 ) V 1/2 V1/2 K V 1/2 V 1/2 (y µ 0 ) p = λ j [u V j 1/2 (y µ 0 )] 2 j=1 p λ j χ 2 1,j j=1 14 / 46

Rare variants test: Variance component test SKAT λ j and u j are eigenvalues and eigenvectors of P 1/2 KP 1/2. where P = V 1 V 1 X( X V 1 X) 1 X V 1 is the project matrix to account that α is estimated. 15 / 46

Rare variants test: Variance component test SKAT: P-value calculation P-values can be computed by inverting the characteristic function using Davies method (1973, 1980) Characteristic function ϕ x (t) = E(e itx ). p Characteristic function of j=1 λ jχ 2 1,j p ϕ x (t) = (1 2λ j it) 1/2. i=j Inversion Formula P(X < u) = 1 2 1 π 0 Im[e itu ϕ x (t)] t dt. 16 / 46

Rare variants test: Variance component test Small sample adjustment Lee et al.(2012). AJHG When the sample size is small and the trait is binary, asymptotics does not work well. SKAT test statistic: Q SKAT = (y µ 0 ) K(y µ 0 ) p = λ v ηv 2, v=1 η v s are asymptotically independent and follow N(0,1). 17 / 46

Rare variants test: Variance component test Small sample adjustment When the trait is binary and the sample size is small: Var(η v ) < 1. ηv s are negatively correlated. 18 / 46

Rare variants test: Variance component test Small sample adjustment Mean and variance of the Q SKAT Large Sample Small Sample Mean λj λj Variance λ 2 j λj λ k c jk Adjust null distribution of Q SKAT using the estimated small sample variance. 19 / 46

Rare variants test: Variance component test Small sample adjustment Variance adjustment is not enough to accurately approximate far tail areas. Kurtosis adjustment: Estimate the kurtosis of QSKAT using parametric bootstrapping: γ (estimated kurtosis) D.F. estimator: df = 12/ˆγ Null distribution (Q SKAT λ 2 2 df j ) + df χ 2 df λj λ k c jk 20 / 46

Rare variants test: Variance component test Small sample adjustment Figure: ARDS data (89 samples) 21 / 46

Rare variants test: Variance component test General SKAT General SKAT Model: where h i GP(0, τk). µ i /logit(µ i ) = α 0 + X i α + h i Kernel K(G i, G i ) measures genetic similarity between two subjects. 22 / 46

Rare variants test: Variance component test General SKAT Examples: Linear kernel=linear effect K(Z i, Z i ) = w 2 1 Z i1 Z i 1 + w 2 p Z ip Z i p IBS Kernel (Epistatic Effect: SNP-SNP interactions) p k=1 K(Z i, Z j ) = w k 2IBS(Z ik, Z jk ) 2p 23 / 46

Omnibus Tests Omnibus Tests Questions: Which group of variants test? I.e. what is the threshold for rare? Which type of test should I use? Variance component or burden? Truth is unknown: depends on the situation Omnibus tests: work well across situation 24 / 46

Omnibus Tests Variable threshold (VT) test Most methods use a fixed threshold for rare variants: 0.5%, 1%,... 5%? Choosing an appropriate threshold can have a huge impact on power: prefer to restrict analysis to meaningful variants 25 / 46

Test statistics: Z max = max t Z(t) Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Omnibus Tests Variable threshold (VT) test Price AL, Kryukov GV, et al.(2010) AJHG Find the optimal threshold to increase the power. Weight: wj(t) = { 1 if mafj t 0 if maf j > t Ci (t) = w j (t)g ij where Z(t) is a Z-score of C i. 26 / 46

Omnibus Tests P-value Calculations of Variable threshold (VT) test Price et al.proposed to use permutation to get a p-value Lin and Tang (2011) showed that the p-values can be calculated through numerical integration using normal approximation 27 / 46

Omnibus Tests Variable threshold (VT) test More robust than using a fixed threshold. Provide information on the MAF ranges of the causal variants. Lose power if there exist variants with opposite association directions. 28 / 46

Omnibus Tests SKAT vs. Collapsing Collapsing tests are more powerful when a large % of variants are causal and effects are in the same direction. SKAT is more powerful when a small % of variants are causal, or the effects have mixed directions. Both scenarios can happen when scanning the genome. Best test to use depends on the underlying biology. Difficult to choose which test to use in practice. We want to develop a unified test that works well in both situations. Omnibus tests 29 / 46

Omnibus Tests Combine p-values of Burden and SKAT Derkach A et al.(2013) Genetic Epi, 37:110-121 Fisher method: Q Fisher = 2 log(p Burden ) 2 log(p SKAT ) Q Fisher follows χ 2 with 4 d.f when these two p-values are independent Since they are not independent, p-values are calculated using resampling Mist (Sun et al. 2013) modified the SKAT test statistics to make them independent 30 / 46

Omnibus Tests Combine Test Statistics: Unified Test Statistics Lee et al.(2012). Biostatistics Combined Test of Burden tests and SKAT Q ρ = (1 ρ)q SKAT + ρq Burden, 0 ρ 1. Q ρ includes SKAT and burden tests. ρ = 0: SKAT ρ = 1: Burden 31 / 46

Omnibus Tests Derivation of the Unified Test Statistics Model: g(µ i ) = X i α + G i β where β j /w j follows any arbitrary distribution with mean 0 and variance τ and the correlation among β j s is ρ. Special cases: SKAT: ρ = 0 Burden: ρ = 1 Combined: 0 ρ 1 32 / 46

Omnibus Tests Derivation of the Unified Test Statsitics Q ρ is a test statistic of the SKAT with corr(β) = R(ρ): R(ρ) = (1 ρ)i + ρ1 1 (compound symmetric) Kρ = GWR(ρ)WG. Q ρ = (y ˆµ) K ρ (y ˆµ) = (1 ρ)q SKAT + ρq Burden 33 / 46

Omnibus Tests Adaptive Test (SKAT-O) Use the smallest p-value from different ρs: T = inf P ρ. 0 ρ 1 where P ρ is the p-value of Q ρ for given ρ. Test statistic: T = minp ρb, 0 = ρ 1 <... < ρ B = 1. 34 / 46

Omnibus Tests Adaptive Test (SKAT-O) Q ρ is a mixture of two quadratic forms. Q ρ = (1 ρ)(y ˆµ) GWWG (y ˆµ) + ρ(y ˆµ) GW 1 1 WG (y ˆµ) = (1 ρ)(y ˆµ) K 1 (y ˆµ) + ρ(y ˆµ) K 2 (y ˆµ) Q ρ is asymptotically equivalent to (1 ρ)κ + a(ρ)η 0, where and η 0 χ 2 1, κ approximately follows a mixture of χ2. 35 / 46

Omnibus Tests SKAT-O Q ρ is the asymptotically same as the sum of two independent random variables. (1 ρ)κ + a(ρ)η 0 η 0 χ 2 1 Approximate κ via moments matching. P-value of T: 1 Pr {Q ρ1 < q ρ1 (T ),..., Q ρb < q ρb (T )} = 1 E [Pr {(1 ρ 1 )κ + a(ρ 1 )η 0 < q ρ1 (T ),... η 0 }] = 1 E [P {κ < min{(q ρv (T )) a(ρ v )η 0 )/(1 ρ v )} η 0 }], where q ρ (T ) = quantile function of Q ρ 36 / 46

Omnibus Tests Simulation Simulate sequencing data using COSI 3kb randomly selected regions. Percentages of causal variants = 10%, 20%, or 50%. (β j > 0)% among causal variants = 100% or 80%. Three methods Burden test with beta(1,25) weight SKAT SKAT-O 37 / 46

Omnibus Tests Simulation 38 / 46

Omnibus Tests Simulation SKAT is more powerful than Burden test (Collapsing) when Existence of +/ βs Small percentage of variants are causal variants Burden test is more powerful than SKAT when All βs were positive and a large proportion of variants were casual variants SKAT-O is robustly powerful under different scenarios. 39 / 46

Omnibus Tests Summary Region based tests can increase the power of rare variants analysis. Relative performance of rare variant tests depends on underlying disease models The combined test (omnibus test), e.g, SKAT-O, is robust and powerful in different scenarios 40 / 46

Weighting and Thresholding MAF based weighting It is generally assumed that rarer variants are more likely to be causal variants with larger effect sizes. Simple thresholding is widely used. w(maf j ) = { 1 if MAFj < c 0 if MAF j c 41 / 46

Weighting and Thresholding MAF based weighting Instead of thresholding, continuous weighting can be used to upweight rarer variants. Ex: Flexible beta density function. w(maf j ) = (MAF j ) α 1 (1 MAF j ) β 1 (α = 0.5, β = 0.5) : Madsen and Browning weight (α = 1, β = 1) : Flat weight 42 / 46

Weighting and Thresholding MAF based weighting- beta weight 43 / 46

Weighting and Thresholding MAF based weighting- logistic weight Soft-thresholding. w(maf j ) = exp((α maf j )β)/{1 + exp((α maf j )β} 44 / 46

Weighting and Thresholding Weighting Using Functional information Variants have different functionalities. Non-synonymous mutations (e.g. missense and nonsense mutations) change the amino-acid (AA) sequence. Synonymous mutations do not change AA sequence. 45 / 46

Weighting and Thresholding Weighting Using Functional information Bioinformatic tools to predict the functionality of mutations. Polyphen2 (http://genetics.bwh.harvard.edu/pph2/) SIFT (http://sift.jcvi.org/) Test only functional mutations can increase the power. 46 / 46

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017