Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017

Size: px
Start display at page:

Download "Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017"

Transcription

1 Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics / 46

2 Lecture Overview 1. Variance Component Tests 2. Omnibus Tests 3. Weights 2 / 46

3 Recall: Region Based Analysis of Rare Variants Single variant test is not powerful to identify rare variant associations Strategy: Region based analysis Test the joint effect of rare/common variants in a gene/region while adjusting for covariates. Major Classes of Tests Burden/Collapsing tests Supervised/Adaptive Burden/Collapsing tests Variance component (similarity) based tests Omnibus tests: hedge against difference scenarios 3 / 46

4 Rare variants test: Variance component test Variance component test Burden tests are not powerful, if there exist variants with different association directions or many non-causal variants Variance component tests have been proposed to address it. Similarity based test 4 / 46

5 Rare variants test: Variance component test C-alpha test Neale BM, et al.(2011). Plos Genet. Case-control studies without covariates. Assume the jth variant is observed n j1 times, with r j1 times in cases. Under H 0 a A Total Case r j1 r j2 r Control s j1 s j2 s Total n j1 n j2 n r j1 Binomial(n j1, q) (q = r/n) 5 / 46

6 Rare variants test: Variance component test C-alpha test Risk increasing variant: r j1 qn j1 > 0 Risk decreasing variant: r j1 qn j1 < 0 Test statistic: T α = p p (r j1 qn j1 ) 2 n j1 q(1 q) j=1 j=1 This test is robust in the presence of the opposite association directions. 6 / 46

7 Rare variants test: Variance component test C-alpha test Weighting scheme T α = p p w j (r j1 qn j1 ) 2 w j n j1 q(1 q) j=1 j=1 Test for the over-dispersion due to genetic effects Neyman s C(α) test. 7 / 46

8 Rare variants test: Variance component test C-alpha test, P-value calculation Using normal approximation, since the test statistic is the sum of random variables. T α = p p (r j1 qn j1 ) 2 n j1 q(1 q) j=1 j=1 Doesn t work well when p is small (or moderate). P-value is computed using permutation. 8 / 46

9 Rare variants test: Variance component test C-alpha test C-alpha test is robust in the presence of the different association directions Disadvantages: Permutation is computationally expensive. Cannot adjust for covariates. 9 / 46

10 Rare variants test: Variance component test Sequence Kernal Association Test (SKAT) Wu et al.(2010, 2011). AJHG Recall the original regression models: Variance component test: Assume β j dist.(0, w 2 j τ). µ i /logit(µ i ) = α 0 + X T i α + G T i β H0 : β 1 = = β p = 0 <=> H 0 : τ = / 46

11 Rare variants test: Variance component test Sequence Kernel Association Test (SKAT) β j dist.(0, wj 2 τ): τ = 0 is on the boundary of the hypothesis. Score test statistic for τ = 0: Q SKAT = (y µ 0 ) K(y µ 0 ), K = GWWG : weighted linear kernel (W = diag[w 1,..., w p ]). 11 / 46

12 Rare variants test: Variance component test Sequence Kernel Association Test (SKAT) The C-alpha test is a special case of SKAT With no covariates and flat weights: p Q SKAT = (r j1 qn j1 ) 2 j=1 12 / 46

13 Rare variants test: Variance component test SKAT Q SKAT is a weighted sum of single variant score statistics Q SKAT = (y µ 0 ) GWWG (y µ 0 ) p p = wj 2 [g j(y µ 0 )] = wj 2 Uj 2 j=1 where U j = n i=1 g ij(y i µ 0i ). U j is a score of individual SNP j only model: j=1 µ i /logit(µ i ) = α 0 + X T i α + g ij β j 13 / 46

14 Rare variants test: Variance component test SKAT Q SKAT (asymptotically) follows a mixture of χ 2 distribution under the NULL. Q = (y µ 0 ) K(y µ 0 ) = (y µ 0 ) V 1/2 V1/2 K V 1/2 V 1/2 (y µ 0 ) p = λ j [u V j 1/2 (y µ 0 )] 2 j=1 p λ j χ 2 1,j j=1 14 / 46

15 Rare variants test: Variance component test SKAT λ j and u j are eigenvalues and eigenvectors of P 1/2 KP 1/2. where P = V 1 V 1 X( X V 1 X) 1 X V 1 is the project matrix to account that α is estimated. 15 / 46

16 Rare variants test: Variance component test SKAT: P-value calculation P-values can be computed by inverting the characteristic function using Davies method (1973, 1980) Characteristic function ϕ x (t) = E(e itx ). p Characteristic function of j=1 λ jχ 2 1,j p ϕ x (t) = (1 2λ j it) 1/2. i=j Inversion Formula P(X < u) = π 0 Im[e itu ϕ x (t)] t dt. 16 / 46

17 Rare variants test: Variance component test Small sample adjustment Lee et al.(2012). AJHG When the sample size is small and the trait is binary, asymptotics does not work well. SKAT test statistic: Q SKAT = (y µ 0 ) K(y µ 0 ) p = λ v ηv 2, v=1 η v s are asymptotically independent and follow N(0,1). 17 / 46

18 Rare variants test: Variance component test Small sample adjustment When the trait is binary and the sample size is small: Var(η v ) < 1. ηv s are negatively correlated. 18 / 46

19 Rare variants test: Variance component test Small sample adjustment Mean and variance of the Q SKAT Large Sample Small Sample Mean λj λj Variance λ 2 j λj λ k c jk Adjust null distribution of Q SKAT using the estimated small sample variance. 19 / 46

20 Rare variants test: Variance component test Small sample adjustment Variance adjustment is not enough to accurately approximate far tail areas. Kurtosis adjustment: Estimate the kurtosis of QSKAT using parametric bootstrapping: γ (estimated kurtosis) D.F. estimator: df = 12/ˆγ Null distribution (Q SKAT λ 2 2 df j ) + df χ 2 df λj λ k c jk 20 / 46

21 Rare variants test: Variance component test Small sample adjustment Figure: ARDS data (89 samples) 21 / 46

22 Rare variants test: Variance component test General SKAT General SKAT Model: where h i GP(0, τk). µ i /logit(µ i ) = α 0 + X i α + h i Kernel K(G i, G i ) measures genetic similarity between two subjects. 22 / 46

23 Rare variants test: Variance component test General SKAT Examples: Linear kernel=linear effect K(Z i, Z i ) = w 2 1 Z i1 Z i 1 + w 2 p Z ip Z i p IBS Kernel (Epistatic Effect: SNP-SNP interactions) p k=1 K(Z i, Z j ) = w k 2IBS(Z ik, Z jk ) 2p 23 / 46

24 Omnibus Tests Omnibus Tests Questions: Which group of variants test? I.e. what is the threshold for rare? Which type of test should I use? Variance component or burden? Truth is unknown: depends on the situation Omnibus tests: work well across situation 24 / 46

25 Omnibus Tests Variable threshold (VT) test Most methods use a fixed threshold for rare variants: 0.5%, 1%,... 5%? Choosing an appropriate threshold can have a huge impact on power: prefer to restrict analysis to meaningful variants 25 / 46

26 Test statistics: Z max = max t Z(t) Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Omnibus Tests Variable threshold (VT) test Price AL, Kryukov GV, et al.(2010) AJHG Find the optimal threshold to increase the power. Weight: wj(t) = { 1 if mafj t 0 if maf j > t Ci (t) = w j (t)g ij where Z(t) is a Z-score of C i. 26 / 46

27 Omnibus Tests P-value Calculations of Variable threshold (VT) test Price et al.proposed to use permutation to get a p-value Lin and Tang (2011) showed that the p-values can be calculated through numerical integration using normal approximation 27 / 46

28 Omnibus Tests Variable threshold (VT) test More robust than using a fixed threshold. Provide information on the MAF ranges of the causal variants. Lose power if there exist variants with opposite association directions. 28 / 46

29 Omnibus Tests SKAT vs. Collapsing Collapsing tests are more powerful when a large % of variants are causal and effects are in the same direction. SKAT is more powerful when a small % of variants are causal, or the effects have mixed directions. Both scenarios can happen when scanning the genome. Best test to use depends on the underlying biology. Difficult to choose which test to use in practice. We want to develop a unified test that works well in both situations. Omnibus tests 29 / 46

30 Omnibus Tests Combine p-values of Burden and SKAT Derkach A et al.(2013) Genetic Epi, 37: Fisher method: Q Fisher = 2 log(p Burden ) 2 log(p SKAT ) Q Fisher follows χ 2 with 4 d.f when these two p-values are independent Since they are not independent, p-values are calculated using resampling Mist (Sun et al. 2013) modified the SKAT test statistics to make them independent 30 / 46

31 Omnibus Tests Combine Test Statistics: Unified Test Statistics Lee et al.(2012). Biostatistics Combined Test of Burden tests and SKAT Q ρ = (1 ρ)q SKAT + ρq Burden, 0 ρ 1. Q ρ includes SKAT and burden tests. ρ = 0: SKAT ρ = 1: Burden 31 / 46

32 Omnibus Tests Derivation of the Unified Test Statistics Model: g(µ i ) = X i α + G i β where β j /w j follows any arbitrary distribution with mean 0 and variance τ and the correlation among β j s is ρ. Special cases: SKAT: ρ = 0 Burden: ρ = 1 Combined: 0 ρ 1 32 / 46

33 Omnibus Tests Derivation of the Unified Test Statsitics Q ρ is a test statistic of the SKAT with corr(β) = R(ρ): R(ρ) = (1 ρ)i + ρ1 1 (compound symmetric) Kρ = GWR(ρ)WG. Q ρ = (y ˆµ) K ρ (y ˆµ) = (1 ρ)q SKAT + ρq Burden 33 / 46

34 Omnibus Tests Adaptive Test (SKAT-O) Use the smallest p-value from different ρs: T = inf P ρ. 0 ρ 1 where P ρ is the p-value of Q ρ for given ρ. Test statistic: T = minp ρb, 0 = ρ 1 <... < ρ B = / 46

35 Omnibus Tests Adaptive Test (SKAT-O) Q ρ is a mixture of two quadratic forms. Q ρ = (1 ρ)(y ˆµ) GWWG (y ˆµ) + ρ(y ˆµ) GW 1 1 WG (y ˆµ) = (1 ρ)(y ˆµ) K 1 (y ˆµ) + ρ(y ˆµ) K 2 (y ˆµ) Q ρ is asymptotically equivalent to (1 ρ)κ + a(ρ)η 0, where and η 0 χ 2 1, κ approximately follows a mixture of χ2. 35 / 46

36 Omnibus Tests SKAT-O Q ρ is the asymptotically same as the sum of two independent random variables. (1 ρ)κ + a(ρ)η 0 η 0 χ 2 1 Approximate κ via moments matching. P-value of T: 1 Pr {Q ρ1 < q ρ1 (T ),..., Q ρb < q ρb (T )} = 1 E [Pr {(1 ρ 1 )κ + a(ρ 1 )η 0 < q ρ1 (T ),... η 0 }] = 1 E [P {κ < min{(q ρv (T )) a(ρ v )η 0 )/(1 ρ v )} η 0 }], where q ρ (T ) = quantile function of Q ρ 36 / 46

37 Omnibus Tests Simulation Simulate sequencing data using COSI 3kb randomly selected regions. Percentages of causal variants = 10%, 20%, or 50%. (β j > 0)% among causal variants = 100% or 80%. Three methods Burden test with beta(1,25) weight SKAT SKAT-O 37 / 46

38 Omnibus Tests Simulation 38 / 46

39 Omnibus Tests Simulation SKAT is more powerful than Burden test (Collapsing) when Existence of +/ βs Small percentage of variants are causal variants Burden test is more powerful than SKAT when All βs were positive and a large proportion of variants were casual variants SKAT-O is robustly powerful under different scenarios. 39 / 46

40 Omnibus Tests Summary Region based tests can increase the power of rare variants analysis. Relative performance of rare variant tests depends on underlying disease models The combined test (omnibus test), e.g, SKAT-O, is robust and powerful in different scenarios 40 / 46

41 Weighting and Thresholding MAF based weighting It is generally assumed that rarer variants are more likely to be causal variants with larger effect sizes. Simple thresholding is widely used. w(maf j ) = { 1 if MAFj < c 0 if MAF j c 41 / 46

42 Weighting and Thresholding MAF based weighting Instead of thresholding, continuous weighting can be used to upweight rarer variants. Ex: Flexible beta density function. w(maf j ) = (MAF j ) α 1 (1 MAF j ) β 1 (α = 0.5, β = 0.5) : Madsen and Browning weight (α = 1, β = 1) : Flat weight 42 / 46

43 Weighting and Thresholding MAF based weighting- beta weight 43 / 46

44 Weighting and Thresholding MAF based weighting- logistic weight Soft-thresholding. w(maf j ) = exp((α maf j )β)/{1 + exp((α maf j )β} 44 / 46

45 Weighting and Thresholding Weighting Using Functional information Variants have different functionalities. Non-synonymous mutations (e.g. missense and nonsense mutations) change the amino-acid (AA) sequence. Synonymous mutations do not change AA sequence. 45 / 46

46 Weighting and Thresholding Weighting Using Functional information Bioinformatic tools to predict the functionality of mutations. Polyphen2 ( SIFT ( Test only functional mutations can increase the power. 46 / 46

Power and sample size calculations for designing rare variant sequencing association studies.

Power and sample size calculations for designing rare variant sequencing association studies. Power and sample size calculations for designing rare variant sequencing association studies. Seunggeun Lee 1, Michael C. Wu 2, Tianxi Cai 1, Yun Li 2,3, Michael Boehnke 4 and Xihong Lin 1 1 Department

More information

Review We have covered so far: Single variant association analysis and effect size estimation GxE interaction and higher order >2 interaction Measurement error in dietary variables (nutritional epidemiology)

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu June 24, 2014 Ian Barnett

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett

More information

Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results

Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results Statistical Science 2014, Vol. 29, No. 2, 302 321 DOI: 10.1214/13-STS456 c Institute of Mathematical Statistics, 2014 Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results Andriy

More information

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure) Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials

Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials Biostatistics (2013), pp. 1 31 doi:10.1093/biostatistics/kxt006 Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials XINYI LIN, SEUNGGUEN

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing

Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota,

More information

An Adaptive Association Test for Microbiome Data

An Adaptive Association Test for Microbiome Data An Adaptive Association Test for Microbiome Data Chong Wu 1, Jun Chen 2, Junghi 1 Kim and Wei Pan 1 1 Division of Biostatistics, School of Public Health, University of Minnesota; 2 Division of Biomedical

More information

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

Comparison of Statistical Tests for Disease Association with Rare Variants

Comparison of Statistical Tests for Disease Association with Rare Variants Comparison of Statistical Tests for Disease Association with Rare Variants Saonli Basu, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 November

More information

THE STATISTICAL ANALYSIS OF GENETIC SEQUENCING AND RARE VARIANT ASSOCIATION STUDIES. Eugene Urrutia

THE STATISTICAL ANALYSIS OF GENETIC SEQUENCING AND RARE VARIANT ASSOCIATION STUDIES. Eugene Urrutia THE STATISTICAL ANALYSIS OF GENETIC SEQUENCING AND RARE VARIANT ASSOCIATION STUDIES Eugene Urrutia A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Testing for group differences in brain functional connectivity

Testing for group differences in brain functional connectivity Testing for group differences in brain functional connectivity Junghi Kim, Wei Pan, for ADNI Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Banff Feb

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

Methods for Cryptic Structure. Methods for Cryptic Structure

Methods for Cryptic Structure. Methods for Cryptic Structure Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases

More information

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,

More information

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015 Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 18: Introduction to covariates, the QQ plot, and population structure II + minimal GWAS steps Jason Mezey jgm45@cornell.edu April

More information

Quantitative trait evolution with mutations of large effect

Quantitative trait evolution with mutations of large effect Quantitative trait evolution with mutations of large effect May 1, 2014 Quantitative traits Traits that vary continuously in populations - Mass - Height - Bristle number (approx) Adaption - Low oxygen

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Big Hypothesis Testing with Kernel Embeddings

Big Hypothesis Testing with Kernel Embeddings Big Hypothesis Testing with Kernel Embeddings Dino Sejdinovic Department of Statistics University of Oxford 9 January 2015 UCL Workshop on the Theory of Big Data D. Sejdinovic (Statistics, Oxford) Big

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman [ Teaching Miscellaneous lectures]

More information

NIH Public Access Author Manuscript Stat Sin. Author manuscript; available in PMC 2013 August 15.

NIH Public Access Author Manuscript Stat Sin. Author manuscript; available in PMC 2013 August 15. NIH Public Access Author Manuscript Published in final edited form as: Stat Sin. 2012 ; 22: 1041 1074. ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION

More information

On Identifying Rare Variants for Complex Human Traits

On Identifying Rare Variants for Complex Human Traits On Identifying Rare Variants for Complex Human Traits Ruixue Fan Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

Group exponential penalties for bi-level variable selection

Group exponential penalties for bi-level variable selection for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Multidimensional heritability analysis of neuroanatomical shape. Jingwei Li

Multidimensional heritability analysis of neuroanatomical shape. Jingwei Li Multidimensional heritability analysis of neuroanatomical shape Jingwei Li Brain Imaging Genetics Genetic Variation Behavior Cognition Neuroanatomy Brain Imaging Genetics Genetic Variation Neuroanatomy

More information

MATH5745 Multivariate Methods Lecture 07

MATH5745 Multivariate Methods Lecture 07 MATH5745 Multivariate Methods Lecture 07 Tests of hypothesis on covariance matrix March 16, 2018 MATH5745 Multivariate Methods Lecture 07 March 16, 2018 1 / 39 Test on covariance matrices: Introduction

More information

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 28 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 3, 2015 1 2 3 4 5 1 Familywise error rates 2 procedure 3 Performance of with multiple

More information

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES Hisashi Tanizaki Graduate School of Economics, Kobe University, Kobe 657-8501, Japan e-mail: tanizaki@kobe-u.ac.jp Abstract:

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Vast Volatility Matrix Estimation for High Frequency Data

Vast Volatility Matrix Estimation for High Frequency Data Vast Volatility Matrix Estimation for High Frequency Data Yazhen Wang National Science Foundation Yale Workshop, May 14-17, 2009 Disclaimer: My opinion, not the views of NSF Y. Wang (at NSF) 1 / 36 Outline

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman Backcross P 1 P 2 P 1 F 1 BC 4

More information

Frequency Estimation of Rare Events by Adaptive Thresholding

Frequency Estimation of Rare Events by Adaptive Thresholding Frequency Estimation of Rare Events by Adaptive Thresholding J. R. M. Hosking IBM Research Division 2009 IBM Corporation Motivation IBM Research When managing IT systems, there is a need to identify transactions

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models. Some New Methods for Latent Variable Models and Survival Analysis Marie Davidian Department of Statistics North Carolina State University 1. Introduction Outline 3. Empirically checking latent-model robustness

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Econometric Methods for Panel Data

Econometric Methods for Panel Data Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

A Conditional Approach to Modeling Multivariate Extremes

A Conditional Approach to Modeling Multivariate Extremes A Approach to ing Multivariate Extremes By Heffernan & Tawn Department of Statistics Purdue University s April 30, 2014 Outline s s Multivariate Extremes s A central aim of multivariate extremes is trying

More information

SEM for Categorical Outcomes

SEM for Categorical Outcomes This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Human vs mouse Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] www.daviddeen.com

More information

Feature Engineering, Model Evaluations

Feature Engineering, Model Evaluations Feature Engineering, Model Evaluations Giri Iyengar Cornell University gi43@cornell.edu Feb 5, 2018 Giri Iyengar (Cornell Tech) Feature Engineering Feb 5, 2018 1 / 35 Overview 1 ETL 2 Feature Engineering

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Test of Association between Two Ordinal Variables while Adjusting for Covariates

Test of Association between Two Ordinal Variables while Adjusting for Covariates Test of Association between Two Ordinal Variables while Adjusting for Covariates Chun Li, Bryan Shepherd Department of Biostatistics Vanderbilt University May 13, 2009 Examples Amblyopia http://www.medindia.net/

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

N Utilization of Nursing Research in Advanced Practice, Summer 2008

N Utilization of Nursing Research in Advanced Practice, Summer 2008 University of Michigan Deep Blue deepblue.lib.umich.edu 2008-07 536 - Utilization of ursing Research in Advanced Practice, Summer 2008 Tzeng, Huey-Ming Tzeng, H. (2008, ctober 1). Utilization of ursing

More information

Sliced Inverse Regression

Sliced Inverse Regression Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed

More information

Multiple QTL mapping

Multiple QTL mapping Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power

More information

Fractal functional regression for classification of gene expression data by wavelets

Fractal functional regression for classification of gene expression data by wavelets Fractal functional regression for classification of gene expression data by wavelets Margarita María Rincón 1 and María Dolores Ruiz-Medina 2 1 University of Granada Campus Fuente Nueva 18071 Granada,

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS 776 1 14 G-Estimation ( G-Estimation of Structural Nested Models 14) Outline 14.1 The causal question revisited 14.2 Exchangeability revisited

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal Hypothesis testing, part 2 With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal 1 CATEGORICAL IV, NUMERIC DV 2 Independent samples, one IV # Conditions Normal/Parametric Non-parametric

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 00 MODULE : Statistical Inference Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks. The

More information

1 Preliminary Variance component test in GLM Mediation Analysis... 3

1 Preliminary Variance component test in GLM Mediation Analysis... 3 Honglang Wang Depart. of Stat. & Prob. wangho16@msu.edu Omics Data Integration Statistical Genetics/Genomics Journal Club Summary and discussion of Joint Analysis of SNP and Gene Expression Data in Genetic

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

On Expected Gaussian Random Determinants

On Expected Gaussian Random Determinants On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/35195 holds various files of this Leiden University dissertation Author: Balliu, Brunilda Title: Statistical methods for genetic association studies with

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Nilanjan Chatterjee, Zeynep Kalaylioglu 2, Roxana Moslehi, Ulrike Peters 3, Sholom Wacholder

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Optimal exact tests for complex alternative hypotheses on cross tabulated data

Optimal exact tests for complex alternative hypotheses on cross tabulated data Optimal exact tests for complex alternative hypotheses on cross tabulated data Daniel Yekutieli Statistics and OR Tel Aviv University CDA course 29 July 2017 Yekutieli (TAU) Optimal exact tests for complex

More information

Exam: high-dimensional data analysis February 28, 2014

Exam: high-dimensional data analysis February 28, 2014 Exam: high-dimensional data analysis February 28, 2014 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question (not the subquestions) on a separate piece of paper.

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

Hierarchical Linear Models. Hierarchical Linear Models. Much of this material already seen in Chapters 5 and 14. Hyperprior on K parameters α:

Hierarchical Linear Models. Hierarchical Linear Models. Much of this material already seen in Chapters 5 and 14. Hyperprior on K parameters α: Hierarchical Linear Models Hierarchical Linear Models Much of this material already seen in Chapters 5 and 14 Hierarchical linear models combine regression framework with hierarchical framework Unified

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information