Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 46

Lecture Overview
1. Variance Component Tests
2. Omnibus Tests
3. Weights
2 / 46

Recall: Region Based Analysis of Rare Variants
- Single-variant tests have low power to identify rare variant associations.
- Strategy: region-based analysis. Test the joint effect of rare/common variants in a gene/region while adjusting for covariates.
Major classes of tests:
- Burden/collapsing tests
- Supervised/adaptive burden/collapsing tests
- Variance component (similarity) based tests
- Omnibus tests: hedge against different scenarios
3 / 46

Rare variants test: Variance component test
Variance component test
- Burden tests lose power when variants have different association directions or when many variants are non-causal.
- Variance component tests have been proposed to address this.
- They are similarity-based tests.
4 / 46

Rare variants test: Variance component test
C-alpha test
Neale BM, et al. (2011). PLoS Genet.
- Case-control studies without covariates.
- Assume the jth variant is observed $n_{j1}$ times, with $r_{j1}$ of those observations in cases.

            a         A         Total
Case        r_j1      r_j2      r
Control     s_j1      s_j2      s
Total       n_j1      n_j2      n

Under $H_0$: $r_{j1} \sim \mathrm{Binomial}(n_{j1}, q)$, where $q = r/n$.
5 / 46

Rare variants test: Variance component test
C-alpha test
- Risk-increasing variant: $r_{j1} - q n_{j1} > 0$
- Risk-decreasing variant: $r_{j1} - q n_{j1} < 0$
- Test statistic:
$$T_\alpha = \sum_{j=1}^{p} (r_{j1} - q n_{j1})^2 - \sum_{j=1}^{p} n_{j1}\, q (1 - q)$$
- Because the deviations are squared, this test is robust to opposite association directions.
6 / 46
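The C-alpha statistic above can be computed directly from the per-variant counts. The following is a minimal sketch, not the authors' implementation; the function name `c_alpha_statistic` and the toy counts are hypothetical.

```python
def c_alpha_statistic(case_counts, total_counts, r_total, n_total):
    """T_alpha = sum_j (r_j1 - q*n_j1)^2 - sum_j n_j1*q*(1-q), with q = r/n.

    case_counts[j]  : r_j1, observations of variant j in cases
    total_counts[j] : n_j1, observations of variant j overall
    r_total, n_total: the row total r for cases and the grand total n
    """
    q = r_total / n_total
    squared_dev = sum((r - q * n) ** 2 for r, n in zip(case_counts, total_counts))
    binom_var = sum(n * q * (1 - q) for n in total_counts)
    return squared_dev - binom_var


# Hypothetical toy data: three variants, each seen 4 times, balanced design (q = 0.5).
# Variant 1 is seen only in cases, variant 2 only in controls, variant 3 is split evenly.
t_mixed = c_alpha_statistic([4, 0, 2], [4, 4, 4], r_total=500, n_total=1000)
t_null = c_alpha_statistic([2, 2, 2], [4, 4, 4], r_total=500, n_total=1000)
print(t_mixed, t_null)
```

The opposite-direction variants both add to `t_mixed`, whereas a burden count (6 of 12 observations in cases) would show no signal at all.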

Rare variants test: Variance component test
C-alpha test
- Weighting scheme:
$$T_\alpha = \sum_{j=1}^{p} w_j (r_{j1} - q n_{j1})^2 - \sum_{j=1}^{p} w_j\, n_{j1}\, q (1 - q)$$
- Tests for over-dispersion due to genetic effects: Neyman's C($\alpha$) test.
7 / 46

Rare variants test: Variance component test
C-alpha test, P-value calculation
- A normal approximation can be used, since the test statistic is a sum of random variables:
$$T_\alpha = \sum_{j=1}^{p} (r_{j1} - q n_{j1})^2 - \sum_{j=1}^{p} n_{j1}\, q (1 - q)$$
- The approximation does not work well when $p$ is small (or moderate).
- In that case the p-value is computed by permutation.
8 / 46
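A permutation p-value for the C-alpha statistic can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: genotypes are stored per individual as minor-allele counts, $q$ is taken as the proportion of cases among individuals, and the names `c_alpha` and `permutation_pvalue` are hypothetical.

```python
import random


def c_alpha(genotypes, status):
    """C-alpha statistic from individual-level data.

    genotypes[i][j]: minor-allele count of individual i at variant j
    status[i]      : 1 for a case, 0 for a control
    """
    q = sum(status) / len(status)  # assumption: q taken as the case proportion
    t = 0.0
    for j in range(len(genotypes[0])):
        n_j = sum(g[j] for g in genotypes)
        r_j = sum(g[j] for g, s in zip(genotypes, status) if s == 1)
        t += (r_j - q * n_j) ** 2 - n_j * q * (1 - q)
    return t


def permutation_pvalue(genotypes, status, n_perm=2000, seed=0):
    """Shuffle case/control labels and compare permuted statistics to the observed one."""
    rng = random.Random(seed)
    observed = c_alpha(genotypes, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if c_alpha(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps the p-value above zero


# Hypothetical toy data: 12 individuals, 2 variants with opposite directions.
status = [1] * 6 + [0] * 6
genotypes = [[1, 0]] * 4 + [[0, 0]] * 2 + [[0, 1]] * 4 + [[0, 0]] * 2
p_value = permutation_pvalue(genotypes, status)
print(p_value)
```

Each permutation recomputes the full statistic, which is why the slides describe the approach as computationally expensive.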

Rare variants test: Variance component test
C-alpha test
- The C-alpha test is robust in the presence of different association directions.
- Disadvantages:
  - Permutation is computationally expensive.
  - It cannot adjust for covariates.
9 / 46

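Rare variants test: Variance component test
Sequence Kernel Association Test (SKAT)
Wu et al. (2010, 2011). AJHG
- Recall the original regression models (linear for continuous traits, logistic for binary traits):
$$\mu_i \;/\; \mathrm{logit}(\mu_i) = \alpha_0 + X_i^\top \alpha + G_i^\top \beta$$
- Variance component test: assume $\beta_j \sim \mathrm{dist.}(0,\, w_j^2 \tau)$. Then
$$H_0: \beta_1 = \cdots = \beta_p = 0 \iff H_0: \tau = 0.$$
10 / 46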

Rare variants test: Variance component test
Sequence Kernel Association Test (SKAT)
- $\beta_j \sim \mathrm{dist.}(0,\, w_j^2 \tau)$: the null value $\tau = 0$ lies on the boundary of the parameter space ($\tau \ge 0$).
- Score test statistic for $\tau = 0$:
$$Q_{SKAT} = (y - \hat\mu_0)^\top K\, (y - \hat\mu_0)$$
- $K = G W W G^\top$ is the weighted linear kernel, with $W = \mathrm{diag}[w_1, \ldots, w_p]$.
11 / 46

Rare variants test: Variance component test
Sequence Kernel Association Test (SKAT)
- The C-alpha test is a special case of SKAT.
- With no covariates and flat weights:
$$Q_{SKAT} = \sum_{j=1}^{p} (r_{j1} - q n_{j1})^2$$
12 / 46

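Rare variants test: Variance component test
SKAT
- $Q_{SKAT}$ is a weighted sum of single-variant score statistics:
$$Q_{SKAT} = (y - \hat\mu_0)^\top G W W G^\top (y - \hat\mu_0) = \sum_{j=1}^{p} w_j^2 \left[ g_j^\top (y - \hat\mu_0) \right]^2 = \sum_{j=1}^{p} w_j^2 U_j^2,$$
where $U_j = \sum_{i=1}^{n} g_{ij} (y_i - \hat\mu_{0i})$.
- $U_j$ is the score statistic from the model containing only SNP $j$:
$$\mu_i \;/\; \mathrm{logit}(\mu_i) = \alpha_0 + X_i^\top \alpha + g_{ij} \beta_j$$
13 / 46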
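The equivalence between the kernel quadratic form and the weighted sum of squared scores can be checked numerically. The sketch below assumes a continuous trait with no covariates, so that $\hat\mu_0$ is just the sample mean; the function names and toy data are hypothetical.

```python
def residuals_under_null(y):
    """Null-model residuals y - mu0_hat for an intercept-only linear model."""
    mu0 = sum(y) / len(y)
    return [yi - mu0 for yi in y]


def skat_q_from_scores(G, y, w):
    """Q = sum_j w_j^2 U_j^2 with U_j = sum_i g_ij (y_i - mu0_i)."""
    resid = residuals_under_null(y)
    n, p = len(y), len(w)
    scores = [sum(G[i][j] * resid[i] for i in range(n)) for j in range(p)]
    return sum((w[j] * scores[j]) ** 2 for j in range(p))


def skat_q_from_kernel(G, y, w):
    """Q = r' K r with the weighted linear kernel K = G W W G'."""
    resid = residuals_under_null(y)
    n, p = len(y), len(w)
    K = [[sum(w[j] ** 2 * G[a][j] * G[b][j] for j in range(p)) for b in range(n)]
         for a in range(n)]
    return sum(resid[a] * K[a][b] * resid[b] for a in range(n) for b in range(n))


# Hypothetical toy data: 5 subjects, 3 variants coded as minor-allele counts.
G = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 1, 1], [0, 0, 0]]
y = [1.2, -0.3, 0.8, 2.1, -0.5]
w = [1.0, 2.0, 0.5]
print(skat_q_from_scores(G, y, w), skat_q_from_kernel(G, y, w))
```

The score form needs only $p$ inner products, so it avoids building the $n \times n$ kernel when the linear kernel is used.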

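Rare variants test: Variance component test
SKAT
- Under the null, $Q_{SKAT}$ (asymptotically) follows a mixture of $\chi^2$ distributions:
$$Q = (y - \hat\mu_0)^\top K (y - \hat\mu_0) = (y - \hat\mu_0)^\top V^{-1/2}\, V^{1/2} K V^{1/2}\, V^{-1/2} (y - \hat\mu_0)$$
$$= \sum_{j=1}^{p} \lambda_j \left[ u_j^\top V^{-1/2} (y - \hat\mu_0) \right]^2 \approx \sum_{j=1}^{p} \lambda_j \chi^2_{1,j}$$
14 / 46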

Rare variants test: Variance component test
SKAT
- $\lambda_j$ and $u_j$ are the eigenvalues and eigenvectors of $P^{1/2} K P^{1/2}$, where
$$P = V^{-1} - V^{-1} \tilde X (\tilde X^\top V^{-1} \tilde X)^{-1} \tilde X^\top V^{-1}$$
is the projection matrix that accounts for $\alpha$ being estimated.
15 / 46

Rare variants test: Variance component test
SKAT: P-value calculation
- P-values can be computed by inverting the characteristic function using Davies' method (1973, 1980).
- Characteristic function: $\varphi_X(t) = E(e^{itX})$.
- Characteristic function of $\sum_{j=1}^{p} \lambda_j \chi^2_{1,j}$:
$$\varphi_X(t) = \prod_{j=1}^{p} (1 - 2 \lambda_j i t)^{-1/2}$$
- Inversion formula:
$$P(X < u) = \frac{1}{2} - \frac{1}{\pi} \int_0^\infty \frac{\mathrm{Im}\left[ e^{-itu} \varphi_X(t) \right]}{t}\, dt$$
16 / 46
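The inversion formula can be evaluated numerically to see how a tail probability follows from the eigenvalues. The sketch below uses a plain midpoint rule on a truncated interval; it is not Davies' algorithm, which also controls the truncation and discretization errors. The function name and the defaults `t_max` and `n_steps` are assumptions chosen for illustration.

```python
import cmath
import math


def mixture_chi2_cdf(u, lambdas, t_max=100.0, n_steps=20000):
    """P(X < u) for X = sum_j lambda_j * chi2_1, via the inversion formula.

    Uses a midpoint rule on (0, t_max); accuracy depends on how quickly the
    characteristic function decays, so treat the result as approximate.
    """
    h = t_max / n_steps
    integral = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * h
        phi = 1.0 + 0.0j
        for lam in lambdas:
            phi *= (1 - 2j * lam * t) ** -0.5  # characteristic function factor
        integral += (cmath.exp(-1j * t * u) * phi).imag / t * h
    return 0.5 - integral / math.pi


# Sanity check: four unit eigenvalues give a chi-square with 4 d.f., whose CDF
# has the closed form 1 - exp(-u/2) * (1 + u/2).
approx = mixture_chi2_cdf(4.0, [1.0, 1.0, 1.0, 1.0])
exact = 1.0 - math.exp(-2.0) * (1.0 + 2.0)
print(approx, exact)

# Upper-tail p-value for a hypothetical observed Q with unequal eigenvalues.
p_value = 1.0 - mixture_chi2_cdf(9.0, [2.0, 1.0, 0.5, 0.25])
print(p_value)
```

For very small p-values a dedicated routine is preferable, since the fixed truncation used here limits the attainable precision.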

Rare variants test: Variance component test
Small sample adjustment
Lee et al. (2012). AJHG
- When the sample size is small and the trait is binary, the asymptotic approximation does not work well.
- SKAT test statistic:
$$Q_{SKAT} = (y - \hat\mu_0)^\top K (y - \hat\mu_0) = \sum_{v=1}^{p} \lambda_v \eta_v^2$$
- The $\eta_v$'s are asymptotically independent and follow $N(0, 1)$.
17 / 46

Rare variants test: Variance component test
Small sample adjustment
When the trait is binary and the sample size is small:
- $\mathrm{Var}(\eta_v) < 1$.
- The $\eta_v$'s are negatively correlated.
18 / 46

Rare variants test: Variance component test
Small sample adjustment
Mean and variance of $Q_{SKAT}$:

            Large sample                 Small sample
Mean        $\sum_j \lambda_j$           $\sum_j \lambda_j$
Variance    $2 \sum_j \lambda_j^2$       $\sum_{j,k} \lambda_j \lambda_k c_{jk}$

Adjust the null distribution of $Q_{SKAT}$ using the estimated small-sample variance.
19 / 46

Rare variants test: Variance component test
Small sample adjustment
- Variance adjustment alone is not enough to accurately approximate far-tail areas.
- Kurtosis adjustment: estimate the kurtosis of $Q_{SKAT}$ using parametric bootstrapping, giving $\hat\gamma$ (estimated kurtosis).
- D.F. estimator: $df = 12 / \hat\gamma$
- Null distribution:
$$\frac{\left( Q_{SKAT} - \sum_j \lambda_j \right) \sqrt{2\, df}}{\sqrt{\sum_{j,k} \lambda_j \lambda_k c_{jk}}} + df \;\sim\; \chi^2_{df}$$
20 / 46

Rare variants test: Variance component test Small sample adjustment Figure: ARDS data (89 samples) 21 / 46

Rare variants test: Variance component test
General SKAT
- General SKAT model:
$$\mu_i \;/\; \mathrm{logit}(\mu_i) = \alpha_0 + X_i^\top \alpha + h_i,$$
where $h \sim GP(0, \tau K)$.
- The kernel $K(G_i, G_{i'})$ measures genetic similarity between two subjects.
22 / 46

Rare variants test: Variance component test
General SKAT
Examples:
- Linear kernel (linear effect):
$$K(Z_i, Z_{i'}) = w_1^2 Z_{i1} Z_{i'1} + \cdots + w_p^2 Z_{ip} Z_{i'p}$$
- IBS kernel (epistatic effect: SNP-SNP interactions):
$$K(Z_i, Z_j) = \frac{\sum_{k=1}^{p} w_k^2\, \mathrm{IBS}(Z_{ik}, Z_{jk})}{2p}$$
23 / 46
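Both kernels reduce to a few lines for a pair of subjects. The sketch below assumes genotypes coded as minor-allele counts (0, 1, 2), for which the number of alleles shared identical by state at one variant is $2 - |Z_{ik} - Z_{jk}|$. The function names are hypothetical, and the IBS function takes the per-variant kernel weights as given, since whether they enter as $w_k$ or $w_k^2$ depends on the convention in use.

```python
def linear_kernel(z1, z2, w):
    """Weighted linear kernel: sum_k w_k^2 * z1_k * z2_k."""
    return sum(wk ** 2 * a * b for wk, a, b in zip(w, z1, z2))


def ibs_kernel(z1, z2, kernel_weights):
    """Weighted IBS kernel: sum_k weight_k * IBS(z1_k, z2_k) / (2p).

    For 0/1/2 genotype coding, IBS at one variant is 2 - |z1_k - z2_k|.
    """
    p = len(z1)
    shared = sum(wk * (2 - abs(a - b)) for wk, a, b in zip(kernel_weights, z1, z2))
    return shared / (2 * p)


# Hypothetical genotypes for two subjects at three variants.
subject_a = [0, 1, 2]
subject_b = [0, 0, 1]
flat = [1.0, 1.0, 1.0]
print(linear_kernel(subject_a, subject_b, flat))  # only variant 3 contributes
print(ibs_kernel(subject_a, subject_b, flat))     # shares 2 + 1 + 1 alleles out of 6
```

With flat weights the IBS kernel is the fraction of alleles shared, so identical genotype vectors score 1 and fully opposite homozygotes score 0.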

Omnibus Tests
Omnibus Tests
Questions:
- Which group of variants should be tested? That is, what is the threshold for "rare"?
- Which type of test should I use: variance component or burden?
The truth is unknown and depends on the situation.
Omnibus tests: work well across situations.
24 / 46

Omnibus Tests
Variable threshold (VT) test
- Most methods use a fixed threshold for rare variants: 0.5%, 1%, ... 5%?
- Choosing an appropriate threshold can have a large impact on power: it is preferable to restrict the analysis to meaningful variants.
25 / 46

Omnibus Tests
Variable threshold (VT) test
Price AL, Kryukov GV, et al. (2010). AJHG
- Find the optimal threshold to increase power.
- Weight:
$$w_j(t) = \begin{cases} 1 & \text{if } \mathrm{maf}_j \le t \\ 0 & \text{if } \mathrm{maf}_j > t \end{cases}$$
- Burden score:
$$C_i(t) = \sum_j w_j(t)\, g_{ij}$$
- Test statistic:
$$Z_{max} = \max_t Z(t),$$
where $Z(t)$ is the Z-score of $C_i(t)$.
26 / 46
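The VT statistic can be sketched by scanning the observed MAF values as candidate thresholds. The Z-score used below (a score statistic from a simple linear regression of a quantitative trait on the burden score, without covariates) is an assumption made for illustration, and the function names and toy data are hypothetical.

```python
import math


def burden_z(G, y, maf, t):
    """Z-score of the burden score C_i(t) = sum of g_ij over variants with maf_j <= t."""
    n = len(y)
    C = [sum(g for g, m in zip(row, maf) if m <= t) for row in G]
    c_bar = sum(C) / n
    y_bar = sum(y) / n
    sxx = sum((c - c_bar) ** 2 for c in C)
    if sxx == 0.0:
        return 0.0  # no variation in the burden score at this threshold
    s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    score = sum((c - c_bar) * (yi - y_bar) for c, yi in zip(C, y))
    return score / math.sqrt(s2 * sxx)


def vt_statistic(G, y, maf):
    """Return (Z_max, best threshold), scanning each observed MAF as a threshold."""
    thresholds = sorted(set(maf))
    z_by_t = {t: burden_z(G, y, maf, t) for t in thresholds}
    best_t = max(z_by_t, key=z_by_t.get)
    return z_by_t[best_t], best_t


# Hypothetical toy data: 6 subjects, 3 variants with increasing MAF.
maf = [0.005, 0.02, 0.04]
G = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
y = [2.5, 1.9, 0.1, -0.2, 0.0, 0.3]
z_max, t_best = vt_statistic(G, y, maf)
print(z_max, t_best)
```

Because $Z_{max}$ is a maximum over correlated statistics, its p-value cannot be read from a standard normal table; it needs permutation or the numerical integration mentioned on the next slide.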

Omnibus Tests
P-value calculation for the variable threshold (VT) test
- Price et al. proposed using permutation to obtain a p-value.
- Lin and Tang (2011) showed that the p-values can be calculated through numerical integration using a normal approximation.
27 / 46

Omnibus Tests
Variable threshold (VT) test
- More robust than using a fixed threshold.
- Provides information on the MAF ranges of the causal variants.
- Loses power if there are variants with opposite association directions.
28 / 46

Omnibus Tests
SKAT vs. Collapsing
- Collapsing tests are more powerful when a large % of variants are causal and effects are in the same direction.
- SKAT is more powerful when a small % of variants are causal, or the effects have mixed directions.
- Both scenarios can happen when scanning the genome.
- The best test to use depends on the underlying biology, so it is difficult to choose which test to use in practice.
- We want a unified test that works well in both situations: omnibus tests.
29 / 46

Omnibus Tests
Combine p-values of Burden and SKAT
Derkach A et al. (2013) Genetic Epi, 37:110-121
- Fisher method:
$$Q_{Fisher} = -2 \log(P_{Burden}) - 2 \log(P_{SKAT})$$
- $Q_{Fisher}$ follows a $\chi^2$ distribution with 4 d.f. when the two p-values are independent.
- Since they are not independent, p-values are calculated using resampling.
- MiST (Sun et al. 2013) modified the SKAT test statistics to make them independent.
30 / 46
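The Fisher combination is easy to compute because the $\chi^2_4$ survival function has the closed form $e^{-x/2}(1 + x/2)$. The sketch below gives the reference p-value under the independence assumption only; as the slide notes, the burden and SKAT p-values are correlated, so this value is not valid without resampling. The function name is hypothetical.

```python
import math


def fisher_combined(p_burden, p_skat):
    """Return (Q_Fisher, p-value under independence) for two p-values."""
    q = -2.0 * math.log(p_burden) - 2.0 * math.log(p_skat)
    # Survival function of a chi-square with 4 d.f.: exp(-q/2) * (1 + q/2).
    p_independent = math.exp(-q / 2.0) * (1.0 + q / 2.0)
    return q, p_independent


q_fisher, p_comb = fisher_combined(0.05, 0.05)
print(q_fisher, p_comb)
```

Two borderline p-values of 0.05 combine to roughly 0.017 under independence, which shows why ignoring the positive dependence between the two tests would overstate significance.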

Omnibus Tests
Combine Test Statistics: Unified Test Statistics
Lee et al. (2012). Biostatistics
- Combined test of burden tests and SKAT:
$$Q_\rho = (1 - \rho)\, Q_{SKAT} + \rho\, Q_{Burden}, \quad 0 \le \rho \le 1.$$
- $Q_\rho$ includes SKAT and burden tests as special cases:
  - $\rho = 0$: SKAT
  - $\rho = 1$: Burden
31 / 46

Omnibus Tests
Derivation of the Unified Test Statistics
- Model:
$$g(\mu_i) = X_i \alpha + G_i \beta,$$
where $\beta_j / w_j$ follows an arbitrary distribution with mean 0 and variance $\tau$, and the correlation among the $\beta_j$'s is $\rho$.
- Special cases:
  - SKAT: $\rho = 0$
  - Burden: $\rho = 1$
  - Combined: $0 \le \rho \le 1$
32 / 46

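Omnibus Tests
Derivation of the Unified Test Statistics
- $Q_\rho$ is the SKAT test statistic with $\mathrm{corr}(\beta) = R(\rho)$:
$$R(\rho) = (1 - \rho) I + \rho\, \mathbf{1}\mathbf{1}^\top \quad \text{(compound symmetric)}$$
$$K_\rho = G W R(\rho) W G^\top$$
$$Q_\rho = (y - \hat\mu)^\top K_\rho (y - \hat\mu) = (1 - \rho)\, Q_{SKAT} + \rho\, Q_{Burden}$$
33 / 46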
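The identity on this slide follows from writing $S = W G^\top (y - \hat\mu)$, so that $Q_\rho = S^\top R(\rho) S$, $Q_{SKAT} = \sum_j S_j^2$ and $Q_{Burden} = (\sum_j S_j)^2$. A small numerical check, with a hypothetical function name and made-up weighted scores:

```python
def q_rho(weighted_scores, rho):
    """Quadratic form S' R(rho) S with R(rho) = (1 - rho) I + rho 1 1'."""
    p = len(weighted_scores)
    total = 0.0
    for a in range(p):
        for b in range(p):
            r_ab = rho + (1.0 - rho) * (1.0 if a == b else 0.0)
            total += weighted_scores[a] * r_ab * weighted_scores[b]
    return total


# Hypothetical weighted scores S_j = w_j * U_j with mixed signs.
S = [1.5, -0.5, 2.0]
q_skat = sum(s * s for s in S)   # rho = 0 limit
q_burden = sum(S) ** 2           # rho = 1 limit
for rho in (0.0, 0.3, 1.0):
    print(rho, q_rho(S, rho), (1.0 - rho) * q_skat + rho * q_burden)
```

Intermediate values of $\rho$ interpolate linearly between the two statistics, which is what allows SKAT-O to search over a grid of $\rho$ values.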

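Omnibus Tests
Adaptive Test (SKAT-O)
- Use the smallest p-value over different values of $\rho$:
$$T = \inf_{0 \le \rho \le 1} P_\rho,$$
where $P_\rho$ is the p-value of $Q_\rho$ for a given $\rho$.
- Test statistic, computed on a grid:
$$T = \min_b P_{\rho_b}, \quad 0 = \rho_1 < \cdots < \rho_B = 1.$$
34 / 46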

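Omnibus Tests
Adaptive Test (SKAT-O)
- $Q_\rho$ is a mixture of two quadratic forms:
$$Q_\rho = (1 - \rho)(y - \hat\mu)^\top G W W G^\top (y - \hat\mu) + \rho\, (y - \hat\mu)^\top G W \mathbf{1}\mathbf{1}^\top W G^\top (y - \hat\mu)$$
$$= (1 - \rho)(y - \hat\mu)^\top K_1 (y - \hat\mu) + \rho\, (y - \hat\mu)^\top K_2 (y - \hat\mu)$$
- $Q_\rho$ is asymptotically equivalent to
$$(1 - \rho)\kappa + a(\rho)\, \eta_0,$$
where $\eta_0 \sim \chi^2_1$ and $\kappa$ approximately follows a mixture of $\chi^2$ distributions.
35 / 46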

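Omnibus Tests
SKAT-O
- $Q_\rho$ is asymptotically the same as the sum of two independent random variables:
$$(1 - \rho)\kappa + a(\rho)\, \eta_0, \quad \eta_0 \sim \chi^2_1$$
- Approximate $\kappa$ via moment matching.
- P-value of $T$:
$$1 - \Pr\{ Q_{\rho_1} < q_{\rho_1}(T), \ldots, Q_{\rho_B} < q_{\rho_B}(T) \}$$
$$= 1 - E\left[ \Pr\{ (1 - \rho_1)\kappa + a(\rho_1)\eta_0 < q_{\rho_1}(T), \ldots \mid \eta_0 \} \right]$$
$$= 1 - E\left[ P\left\{ \kappa < \min_v \frac{q_{\rho_v}(T) - a(\rho_v)\eta_0}{1 - \rho_v} \,\Big|\, \eta_0 \right\} \right],$$
where $q_\rho(T)$ is the quantile function of $Q_\rho$.
36 / 46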

Omnibus Tests
Simulation
- Simulate sequencing data using COSI.
- 3kb randomly selected regions.
- Percentages of causal variants = 10%, 20%, or 50%.
- $(\beta_j > 0)$% among causal variants = 100% or 80%.
- Three methods:
  - Burden test with beta(1,25) weight
  - SKAT
  - SKAT-O
37 / 46

Omnibus Tests Simulation 38 / 46

Omnibus Tests
Simulation
- SKAT is more powerful than the burden (collapsing) test when:
  - both positive and negative $\beta$s exist, or
  - a small percentage of variants are causal.
- The burden test is more powerful than SKAT when all $\beta$s are positive and a large proportion of variants are causal.
- SKAT-O is robustly powerful under different scenarios.
39 / 46

Omnibus Tests
Summary
- Region-based tests can increase the power of rare variant analysis.
- The relative performance of rare variant tests depends on the underlying disease model.
- The combined (omnibus) test, e.g., SKAT-O, is robust and powerful in different scenarios.
40 / 46

Weighting and Thresholding
MAF based weighting
- It is generally assumed that rarer variants are more likely to be causal variants with larger effect sizes.
- Simple thresholding is widely used:
$$w(\mathrm{MAF}_j) = \begin{cases} 1 & \text{if } \mathrm{MAF}_j < c \\ 0 & \text{if } \mathrm{MAF}_j \ge c \end{cases}$$
41 / 46

Weighting and Thresholding
MAF based weighting
- Instead of thresholding, continuous weighting can be used to upweight rarer variants.
- Example: flexible beta density function:
$$w(\mathrm{MAF}_j) = (\mathrm{MAF}_j)^{\alpha - 1} (1 - \mathrm{MAF}_j)^{\beta - 1}$$
  - $(\alpha = 0.5, \beta = 0.5)$: Madsen and Browning weight
  - $(\alpha = 1, \beta = 1)$: flat weight
42 / 46
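The beta-density weight is a one-line function, and evaluating it at a few MAF values shows how the parameter choices behave. This is a minimal sketch of the unnormalized form written on the slide; the function name is hypothetical, and the (1, 25) setting is the one used for the burden test in the simulation slide.

```python
def beta_weight(maf, a, b):
    """Unnormalized beta-density weight: maf^(a-1) * (1-maf)^(b-1)."""
    return maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0)


for maf in (0.001, 0.01, 0.05, 0.2):
    flat = beta_weight(maf, 1.0, 1.0)              # always 1
    madsen_browning = beta_weight(maf, 0.5, 0.5)   # 1 / sqrt(maf * (1 - maf))
    beta_1_25 = beta_weight(maf, 1.0, 25.0)        # decays quickly as MAF grows
    print(maf, flat, madsen_browning, beta_1_25)
```

With $(\alpha, \beta) = (1, 25)$ the weight stays close to 1 for very rare variants and is nearly 0 for common ones, giving a smooth alternative to a hard MAF cutoff.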

Weighting and Thresholding MAF based weighting- beta weight 43 / 46

Weighting and Thresholding
MAF based weighting: logistic weight
- Soft-thresholding:
$$w(\mathrm{maf}_j) = \frac{\exp\{(\alpha - \mathrm{maf}_j)\beta\}}{1 + \exp\{(\alpha - \mathrm{maf}_j)\beta\}}$$
44 / 46
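The logistic weight behaves like a smoothed version of the hard threshold, with $\alpha$ playing the role of the cutoff and $\beta$ controlling how sharp the transition is. A minimal sketch, with a hypothetical function name and illustrative parameter values:

```python
import math


def logistic_weight(maf, alpha, beta):
    """Soft-threshold weight exp(x) / (1 + exp(x)) with x = (alpha - maf) * beta."""
    x = (alpha - maf) * beta
    return 1.0 / (1.0 + math.exp(-x))  # algebraically equal to exp(x) / (1 + exp(x))


# Illustrative settings: soft cutoff at MAF = 0.05 with steepness 100.
for maf in (0.001, 0.03, 0.05, 0.07, 0.2):
    print(maf, logistic_weight(maf, alpha=0.05, beta=100.0))
```

The weight equals 0.5 exactly at $\mathrm{maf}_j = \alpha$, and a larger $\beta$ moves the curve closer to the 0/1 step function of simple thresholding.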

Weighting and Thresholding
Weighting Using Functional Information
- Variants have different functionalities.
- Non-synonymous mutations (e.g. missense and nonsense mutations) change the amino-acid (AA) sequence.
- Synonymous mutations do not change the AA sequence.
45 / 46

Weighting and Thresholding
Weighting Using Functional Information
- Bioinformatic tools can predict the functionality of mutations:
  - Polyphen2 (http://genetics.bwh.harvard.edu/pph2/)
  - SIFT (http://sift.jcvi.org/)
- Testing only functional mutations can increase power.
46 / 46