How to analyze many contingency tables simultaneously?

1 How to analyze many contingency tables simultaneously?
Thorsten Dickhaus
Humboldt-Universität zu Berlin
Beuth Hochschule für Technik Berlin

2 Outline
- Motivation: genetic association studies
- Statistical setup
- Refined statistical inference methods
- Real data example
Reference: Dickhaus, T., Straßburger, K., Schunk, D., Morcillo, C., Illig, T., and Navarro, A. (2012): How to analyze many contingency tables simultaneously in genetic association studies. SAGMB 11, Article 12.

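8 What is a SNP (single nucleotide polymorphism)?
Bi-allelic SNPs: exactly two possible alleles per locus (e.g. A and G at locus i, G and C at locus M). Each person carries a maternal (m) and a paternal (p) copy of the sequence:

Locus                ...  i  ...  M
Tom (m)     A A G T  ...  A  ...  G
Tom (p)     A A G T  ...  A  ...  C
Andrew (m)  A A G C  ...  A  ...  C
Andrew (p)  A A G C  ...  G  ...  C
Rachel (m)  A A G C  ...  G  ...  G
Rachel (p)  A A G T  ...  G  ...  G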

9 Contingency table layout in association studies
Assume a bi-allelic marker (SNP) at a particular locus and a binary phenotype of interest, e.g., a disease status.

Genotype         A1A1      A1A2      A2A2      Σ
Phenotype 1      x_{1,1}   x_{1,2}   x_{1,3}   n_{1.}
Phenotype 0      x_{2,1}   x_{2,2}   x_{2,3}   n_{2.}
Absolute count   n_{.1}    n_{.2}    n_{.3}    N

In case of allelic tests:

Allele           A1        A2        Σ
Phenotype 1      x_{1,1}   x_{1,2}   n_{1.}
Phenotype 0      x_{2,1}   x_{2,2}   n_{2.}
Absolute count   n_{.1}    n_{.2}    N
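As a sketch of how both tables arise from per-person data, the code below assumes a hypothetical coding in which each genotype is the number of A2 alleles (0 = A1A1, 1 = A1A2, 2 = A2A2) and the phenotype is 1 or 0. The function names and the six individuals are illustrative only, not taken from the paper.

```python
from collections import Counter

# Hypothetical coding: genotype = number of A2 alleles (0, 1, 2); phenotype = 1 or 0.
# Row 0 of each table is phenotype 1, row 1 is phenotype 0.


def genotype_table(genotypes, phenotypes):
    """2 x 3 genotype table (x_rs) for one marker."""
    counts = Counter(zip(phenotypes, genotypes))
    return [[counts[(ph, g)] for g in (0, 1, 2)] for ph in (1, 0)]


def allelic_table(genotypes, phenotypes):
    """2 x 2 allele table: every individual contributes two alleles."""
    table = [[0, 0], [0, 0]]
    for g, ph in zip(genotypes, phenotypes):
        row = 0 if ph == 1 else 1
        table[row][0] += 2 - g  # copies of A1
        table[row][1] += g      # copies of A2
    return table


def margins(table):
    """Row sums n_r., column sums n_.s and total N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


genotypes = [0, 1, 2, 1, 0, 0]    # six made-up individuals
phenotypes = [1, 1, 1, 0, 0, 0]
print(genotype_table(genotypes, phenotypes))   # [[1, 1, 1], [2, 1, 0]]
print(allelic_table(genotypes, phenotypes))    # [[3, 3], [5, 1]]
```

Because the allelic table counts alleles rather than people, its total N is twice the number of individuals (12 for the six made-up individuals above).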

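11 Formalized association test problem
Multiple test problem with system of hypotheses $\mathcal{H} = (H_j : 1 \le j \le M)$, where $H_j$: Genotype$_j$ $\perp$ Phenotype (genotype at marker $j$ and phenotype are stochastically independent), with two-sided alternatives $K_j$.
Abbreviated notation (one particular position): $n = (n_{1.}, n_{2.}, n_{.1}, n_{.2}, n_{.3}) \in \mathbb{N}^5$ resp. $n = (n_{1.}, n_{2.}, n_{.1}, n_{.2}) \in \mathbb{N}^4$, and
$x = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{pmatrix} \in \mathbb{N}^{2 \times 3}$ resp. $x = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} \in \mathbb{N}^{2 \times 2}$.
In both cases, the probability of observing $x$ given $n$ is under the null given by
$f(x \mid n) = \frac{\prod_r n_{r.}! \, \prod_s n_{.s}!}{N! \, \prod_{r,s} x_{rs}!}$.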
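A minimal sketch of $f(x \mid n)$, assuming the margins $n$ are read off the table itself; `null_probability` is a hypothetical name. Exact `Fraction` arithmetic is a design choice that avoids rounding. For a 2 × 2 table the formula is the hypergeometric probability, e.g. for x = ((3, 1), (1, 3)) it equals C(4,3)·C(4,1)/C(8,4) = 16/70 = 8/35.

```python
from fractions import Fraction
from math import factorial, prod


def null_probability(table):
    """f(x | n) for a 2 x K table x, with n given by the table's own margins.

    f(x | n) = prod_r n_r.! * prod_s n_.s! / (N! * prod_{r,s} x_rs!)
    Returned as an exact fraction; use float() if a decimal is needed.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    numerator = prod(factorial(v) for v in rows + cols)
    denominator = factorial(sum(rows)) * prod(factorial(x) for row in table for x in row)
    return Fraction(numerator, denominator)


print(null_probability([[3, 1], [1, 3]]))   # 8/35
```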

12 Tests for association of marker and phenotype (i): chi-squared test
$Q(x) = \sum_r \sum_s \frac{(x_{rs} - e_{rs})^2}{e_{rs}}$, where $e_{rs} = n_{r.} \, n_{.s} / N$.
Resulting exact (non-asymptotic) p-value: $p_Q(x) = \sum_{\tilde x} f(\tilde x \mid n)$, with summation over all $\tilde x$ with marginals $n$ such that $Q(\tilde x) \ge Q(x)$.
(Local) level-α test: $\phi_Q(x) = \mathbf{1}\{p_Q(x) \le \alpha\}$.
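A brute-force sketch of the exact chi-squared p-value, assuming a 2 × K table small enough that every table with the same margins can be enumerated. The helper names (`tables_with_margins`, `chi_squared`, `p_chi_squared_exact`) are hypothetical. Computing $Q$ as an exact fraction makes the comparison $Q(\tilde x) \ge Q(x)$ free of floating-point tie problems.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def null_probability(table):
    """f(x | n) from slide 11, as an exact fraction."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = prod(factorial(v) for v in rows + cols)
    den = factorial(sum(rows)) * prod(factorial(x) for row in table for x in row)
    return Fraction(num, den)


def tables_with_margins(rows, cols):
    """Enumerate all 2 x K tables with row totals (n1., n2.) and column totals cols."""
    # Choose the first-row entries of columns 1..K-1; everything else is then fixed.
    for head in product(*(range(c + 1) for c in cols[:-1])):
        last = rows[0] - sum(head)
        if 0 <= last <= cols[-1]:
            first = [*head, last]
            yield [first, [c - a for c, a in zip(cols, first)]]


def chi_squared(table):
    """Q(x) = sum_rs (x_rs - e_rs)^2 / e_rs with e_rs = n_r. n_.s / N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n_total = sum(rows)
    q = Fraction(0)
    for r, row in enumerate(table):
        for s, x in enumerate(row):
            e = Fraction(rows[r] * cols[s], n_total)
            if e:  # an empty margin forces x_rs = 0; skip that 0/0 cell
                q += (x - e) ** 2 / e
    return q


def p_chi_squared_exact(table):
    """p_Q(x): null mass of all tables with the same margins and Q >= Q(x)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    q_obs = chi_squared(table)
    return sum(
        (null_probability(t) for t in tables_with_margins(rows, cols) if chi_squared(t) >= q_obs),
        Fraction(0),
    )


print(p_chi_squared_exact([[3, 1], [1, 3]]))   # 17/35
```

The number of tables grows quickly with N, so for realistic sample sizes a more efficient enumeration than this full scan would be needed.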

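14 Tests for association of marker and phenotype (ii): tests of Fisher type
$p_{\mathrm{Fisher}}(x) = \sum_{\tilde x} f(\tilde x \mid n)$, with summation over all $\tilde x$ with marginals $n$ such that $f(\tilde x \mid n) \le f(x \mid n)$.
Corresponding level-α test: $\phi_{\mathrm{Fisher}}(x) = \mathbf{1}\{p_{\mathrm{Fisher}}(x) \le \alpha\}$.
$\phi_Q(x)$ and $\phi_{\mathrm{Fisher}}(x)$ keep the (local) significance level α conservatively for any sample size N. In other words, under the null, $p_Q(X) \ge_{st} U$ and $p_{\mathrm{Fisher}}(X) \ge_{st} U$ (stochastically larger), where $U \sim \mathrm{UNI}[0, 1]$.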
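The Fisher-type p-value can be sketched the same way, again assuming a small 2 × K table and using hypothetical helper names. Exact fractions make the condition $f(\tilde x \mid n) \le f(x \mid n)$ reliable when several tables share the same probability.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def null_probability(table):
    """f(x | n) from slide 11, as an exact fraction."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = prod(factorial(v) for v in rows + cols)
    den = factorial(sum(rows)) * prod(factorial(x) for row in table for x in row)
    return Fraction(num, den)


def tables_with_margins(rows, cols):
    """Enumerate all 2 x K tables with row totals (n1., n2.) and column totals cols."""
    for head in product(*(range(c + 1) for c in cols[:-1])):
        last = rows[0] - sum(head)
        if 0 <= last <= cols[-1]:
            first = [*head, last]
            yield [first, [c - a for c, a in zip(cols, first)]]


def p_fisher(table):
    """p_Fisher(x): null mass of all tables with the same margins and f <= f(x)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    f_obs = null_probability(table)
    probs = (null_probability(t) for t in tables_with_margins(rows, cols))
    return sum((f for f in probs if f <= f_obs), Fraction(0))


print(p_fisher([[3, 1], [1, 3]]))   # 17/35
print(p_fisher([[4, 0], [0, 4]]))   # 1/35
```

For 2 × 2 tables, SciPy's `scipy.stats.fisher_exact` documents the same two-sided rule (sum over tables at most as probable as the observed one, with a small floating-point tolerance), so the two should agree there.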

15 Estimating the proportion of informative SNPs (references: Schweder and Spjøtvoll, 1982; Storey et al., 2004)
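A minimal sketch of the basic Schweder-Spjøtvoll/Storey estimator of the proportion $\pi_0$ of uninformative SNPs, assuming continuous p-values and the common, but here hypothetical, tuning choice λ = 0.5. `storey_pi0` is an illustrative name; the cited papers discuss refinements such as how to choose λ.

```python
def storey_pi0(p_values, lam=0.5):
    """Schweder-Spjøtvoll / Storey-type estimate of pi0, the share of true nulls.

    Under H_j a p-value is UNI[0, 1], so about pi0 * M * (1 - lam) p-values
    should exceed lam, while informative markers contribute few large p-values.
    pi0_hat(lam) = #{j : p_j > lam} / (M * (1 - lam)), truncated at 1.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    exceed = sum(1 for p in p_values if p > lam)
    return min(1.0, exceed / (len(p_values) * (1 - lam)))


p = [0.001, 0.004, 0.02, 0.03, 0.6, 0.9]   # made-up p-values
print(storey_pi0(p))   # 2 of 6 exceed 0.5, so 2 / (6 * 0.5), i.e. about 0.667
```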

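18 Caveat: Storey's method does not work for the discrete p-values $p_Q(X)$ and $p_{\mathrm{Fisher}}(X)$: under the null they are stochastically larger than UNI[0, 1], so they produce too many large p-values and bias the estimate of $\pi_0$ upward.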
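To see the caveat concretely, the sketch below computes, for a single true null with fixed margins, the expected contribution $P_H(p_{\mathrm{Fisher}}(X) > \lambda)/(1 - \lambda)$ of that marker to the estimator $\hat\pi_0(\lambda) = \#\{j : p_j > \lambda\} / (M(1 - \lambda))$. A UNI[0, 1] p-value would give exactly 1. The helper names are hypothetical and the enumeration is brute force.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def null_probability(table):
    """f(x | n) from slide 11, as an exact fraction."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = prod(factorial(v) for v in rows + cols)
    den = factorial(sum(rows)) * prod(factorial(x) for row in table for x in row)
    return Fraction(num, den)


def tables_with_margins(rows, cols):
    """Enumerate all 2 x K tables with row totals (n1., n2.) and column totals cols."""
    for head in product(*(range(c + 1) for c in cols[:-1])):
        last = rows[0] - sum(head)
        if 0 <= last <= cols[-1]:
            first = [*head, last]
            yield [first, [c - a for c, a in zip(cols, first)]]


def storey_inflation(rows, cols, lam):
    """P_H(p_Fisher(X) > lam) / (1 - lam) for one true null with the given margins.

    lam should be a Fraction for an exact result. Values above 1 mean the
    marker inflates Storey's estimate of pi0 on average.
    """
    probs = [null_probability(t) for t in tables_with_margins(rows, cols)]
    # Fisher-type p-value of every table that can occur with these margins
    p_values = [sum(q for q in probs if q <= f) for f in probs]
    mass = sum((f for f, p in zip(probs, p_values) if p > lam), Fraction(0))
    return mass / (1 - lam)


print(storey_inflation([4, 4], [4, 4], Fraction(1, 2)))   # 36/35
print(storey_inflation([4, 4], [4, 4], Fraction(2, 5)))   # 34/21
```

With margins (4, 4) and (4, 4), a single true null counts about 1.03 times at λ = 1/2 and about 1.62 times at λ = 2/5, because the p-value has point masses (including one at 1) instead of being uniform.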

19 Discreteness: realized randomized p-values
Definition: Let a statistical model $(\Omega, \mathcal{A}, (P_\vartheta)_{\vartheta \in \Theta})$ be given, together with
- the two-sided test problem $H: \{\vartheta = \vartheta_0\}$ versus $K: \{\vartheta \ne \vartheta_0\}$,
- a discrete test statistic $X \sim P_\vartheta$ with values in $\Omega$,
- $U \sim \mathrm{UNI}[0, 1]$, stochastically independent of $X$.
A realized randomized p-value for testing H versus K is a measurable mapping $p^r: \Omega \times [0, 1] \to [0, 1]$ with $P_{\vartheta_0}(p^r(X, U) \le t) = t$ for all $t \in [0, 1]$.

20 Realized randomized p-values based on $p_Q(X)$ and $p_{\mathrm{Fisher}}(X)$
Lemma: Based upon the chi-squared and Fisher-type testing strategies, corresponding realized randomized p-values can be calculated as
$p^r_Q(x, u) = p_Q(x) - u \sum_{\tilde x : Q(\tilde x) = Q(x)} f(\tilde x \mid n)$,
$p^r_{\mathrm{Fisher}}(x, u) = p_{\mathrm{Fisher}}(x) - u \, \gamma \, f(x \mid n)$,
where $u$ denotes the realization of $U \sim \mathrm{UNI}[0, 1]$, stochastically independent of $X$, and $\gamma = \gamma(x) = |\{\tilde x : f(\tilde x \mid n) = f(x \mid n)\}|$ (sums and counts range over tables $\tilde x$ with marginals $n$).
We propose realized randomized p-values for estimating $\pi_0$. For final decision making, their non-randomized counterparts should be used (reproducibility!).
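A sketch of both formulas from the lemma, plus an exact check of the defining property $P(p^r(X, U) \le t) = t$: for each table, $U$ is integrated out analytically, so no simulation is involved. All function names are hypothetical, and the enumeration assumes small 2 × K tables. In practice $u$ would be drawn once per marker (e.g. with `random.Random(seed).random()`) and the seed recorded.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def null_probability(table):
    """f(x | n) as an exact fraction (slide 11)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = prod(factorial(v) for v in rows + cols)
    den = factorial(sum(rows)) * prod(factorial(x) for row in table for x in row)
    return Fraction(num, den)


def tables_with_margins(rows, cols):
    """All 2 x K tables with row totals rows and column totals cols."""
    for head in product(*(range(c + 1) for c in cols[:-1])):
        last = rows[0] - sum(head)
        if 0 <= last <= cols[-1]:
            first = [*head, last]
            yield [first, [c - a for c, a in zip(cols, first)]]


def chi_squared(table):
    """Q(x) as an exact fraction (slide 12); cells with e_rs = 0 are skipped."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n_total = sum(rows)
    q = Fraction(0)
    for r, row in enumerate(table):
        for s, x in enumerate(row):
            e = Fraction(rows[r] * cols[s], n_total)
            if e:
                q += (x - e) ** 2 / e
    return q


def margins(table):
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]


def p_r_chi_squared(table, u):
    """p^r_Q(x, u) = p_Q(x) - u * (sum of f over tables with Q equal to Q(x))."""
    q_obs = chi_squared(table)
    p = ties = Fraction(0)
    for t in tables_with_margins(*margins(table)):
        q, f = chi_squared(t), null_probability(t)
        if q >= q_obs:
            p += f
        if q == q_obs:
            ties += f
    return p - u * ties


def p_r_fisher(table, u):
    """p^r_Fisher(x, u) = p_Fisher(x) - u * gamma(x) * f(x | n)."""
    f_obs = null_probability(table)
    probs = [null_probability(t) for t in tables_with_margins(*margins(table))]
    p = sum(f for f in probs if f <= f_obs)
    gamma = sum(1 for f in probs if f == f_obs)
    return p - u * gamma * f_obs


def null_cdf(p_r, rows, cols, t):
    """P(p_r(X, U) <= t) under the null, with U ~ UNI[0, 1] integrated out exactly.

    For fixed x, p_r(x, u) decreases linearly from hi = p_r(x, 0) to lo = p_r(x, 1).
    """
    total = Fraction(0)
    for x in tables_with_margins(rows, cols):
        hi, lo = p_r(x, 0), p_r(x, 1)
        share = min(1, max(0, (t - lo) / (hi - lo)))
        total += null_probability(x) * share
    return total


for t in (Fraction(1, 10), Fraction(3, 10), Fraction(77, 100)):
    # each line prints t three times: the randomized p-values are exactly uniform
    print(t, null_cdf(p_r_fisher, [4, 4], [4, 4], t),
          null_cdf(p_r_chi_squared, [3, 3], [3, 2, 1], t))
```

The check works because, within each group of tied tables, $p^r$ spreads the group's null mass evenly over an interval of exactly that length, and these intervals tile [0, 1].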

23 Effective number of tests: a thought experiment
Assume markers indexed by $I = \{1, \ldots, M\}$ can be divided into disjoint groups with indices in subsets $I_g \subseteq I$, $g \in \{1, \ldots, G\}$. Let $\phi = (\phi_i, i \in I)$ and assume that for each $g \in \{1, \ldots, G\}$ and for any pair $(i, j) \in I_g \times I_g$ the identity $\{\phi_i = 1\} = \{\phi_j = 1\}$ holds. Then, effectively only one single test is performed in each subgroup.
Denoting $i(g) = \min I_g$ for $g = 1, \ldots, G$, and by $I_0$ the indices of the true null hypotheses, it holds
$\mathrm{FWER}_\vartheta(\phi) = P_\vartheta\big(\bigcup_{g=1}^{G} \bigcup_{i \in I_0 \cap I_g} \{\phi_i = 1\}\big) \le P_\vartheta\big(\bigcup_{g=1}^{G} \{\phi_{i(g)} = 1\}\big)$.
Consequently, multiplicity correction in this extreme scenario only has to be done with respect to $G \ll M$; a Bonferroni-type adjustment $\alpha/G$ would be valid!
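A small numerical illustration of the thought experiment, under extra assumptions made only for this sketch: all hypotheses are true, the G group-level tests are independent, and each rejects with probability exactly equal to its per-test level. Independence is needed only for the closed form $1 - (1 - c)^G$; the Bonferroni bound $G \cdot \alpha/G = \alpha$ holds without it. The numbers (1000 markers in 50 perfectly linked groups) are hypothetical.

```python
def fwer_global_null(level_per_test, n_groups):
    """FWER under the global null when each of the G groups acts as one test.

    Assumes (for illustration only) independent group-level tests, each rejecting
    with probability exactly level_per_test, so FWER = 1 - (1 - c)^G.
    """
    return 1 - (1 - level_per_test) ** n_groups


alpha, m, g = 0.05, 1000, 50   # hypothetical: 1000 markers in 50 perfectly linked groups
print(fwer_global_null(alpha / m, g))   # Bonferroni over M: about 0.0025, far below alpha
print(fwer_global_null(alpha / g, g))   # Bonferroni over G: about 0.0488, still <= alpha
```

Correcting for all M markers wastes most of the error budget here, while correcting for the G groups uses almost all of it without exceeding α.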

24 Effective number of tests: Cheverud-Nyholt method and beyond
$M_{\mathrm{eff}} = 1 + \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{M} (1 - r_{ij}^2)$.
The numbers $r_{ij}$ are measures of correlation among markers $i$ and $j$ and can typically be obtained from linkage disequilibrium (LD) matrices. More sophisticated methods exist in the literature, e.g.:
- simpleM by X. Gao et al. (2008)
- $K_{\mathrm{eff}}$ by Moskvina and Schmidt (2008)
All rely on the correlation structure reflected by the $r_{ij}$'s.
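A minimal sketch of the Cheverud-Nyholt estimate $M_{\mathrm{eff}} = 1 + \frac{1}{M}\sum_i \sum_j (1 - r_{ij}^2)$. Since the eigenvalues λ of the correlation matrix satisfy Σλ = M and Σλ² = Σ r_ij², this equals the eigenvalue form 1 + (M − 1)(1 − Var(λ)/M) when Var(λ) is the sample variance (divisor M − 1). The sketch assumes `r` holds signed correlations with 1 on the diagonal; if an LD table already stores r², those values should be used without squaring again. The function name and the example matrix are hypothetical.

```python
def m_eff_cheverud_nyholt(r):
    """Effective number of tests from an M x M marker correlation matrix r.

    M_eff = 1 + (1/M) * sum_i sum_j (1 - r_ij**2), with r_ii = 1 on the diagonal.
    Gives M for uncorrelated markers and 1 for perfectly correlated ones.
    """
    m = len(r)
    if any(len(row) != m for row in r):
        raise ValueError("r must be a square matrix")
    return 1 + sum(1 - r_ij ** 2 for row in r for r_ij in row) / m


r = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]   # made-up LD correlations for three SNPs
print(m_eff_cheverud_nyholt(r))   # about 2.46
```

Two strongly correlated SNPs plus one independent SNP count as roughly 2.5 tests rather than 3.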

25 Our proposed data analysis workflow
1. Compute realized randomized p-values $p^r(x_j, u_j)$ and non-randomized versions $p(x_j)$, $j = 1, \ldots, M$.
2. Estimate the proportion $\pi_0$ of uninformative SNPs by $\hat\pi_0$, computed from the realized randomized p-values.
3. Determine the effective number of tests $M_{\mathrm{eff}}$ by utilizing correlation values obtained from an appropriate LD matrix of the M SNPs.
4. For a pre-defined FWER level α, determine the list of associated markers by performing the multiple test $\phi = (\phi_j, j = 1, \ldots, M)$, where $\phi_j(x_j) = \mathbf{1}\{p(x_j) \le t\}$ with $t = \alpha / (M_{\mathrm{eff}} \, \hat\pi_0)$.
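Steps 2 and 4 of the workflow can be sketched as follows, assuming steps 1 and 3 have already produced the two lists of p-values and a value for $M_{\mathrm{eff}}$. The function names, λ = 0.5 and all inputs are hypothetical. The guard against $\hat\pi_0 = 0$ is an added choice of this sketch, since the threshold is undefined in that case.

```python
def storey_pi0(p_values, lam=0.5):
    """#{j : p_j > lam} / (M * (1 - lam)), truncated at 1."""
    exceed = sum(1 for p in p_values if p > lam)
    return min(1.0, exceed / (len(p_values) * (1 - lam)))


def associated_markers(p_randomized, p_exact, m_eff, alpha=0.05, lam=0.5):
    """Estimate pi0 from randomized p-values, then decide with non-randomized ones.

    Returns the threshold t = alpha / (M_eff * pi0_hat) and the indices j with
    p(x_j) <= t.
    """
    pi0_hat = storey_pi0(p_randomized, lam)
    if pi0_hat == 0:
        raise ValueError("no randomized p-value exceeds lam; pi0_hat is 0")
    t = alpha / (m_eff * pi0_hat)
    return t, [j for j, p in enumerate(p_exact) if p <= t]


# Made-up inputs for eight markers; m_eff = 5 stands in for a step-3 result.
p_rand = [0.001, 0.004, 0.02, 0.3, 0.7, 0.9, 0.1, 0.2]
p_exact = [0.0012, 0.005, 0.03, 0.31, 0.72, 0.91, 0.11, 0.21]
t, hits = associated_markers(p_rand, p_exact, m_eff=5)
print(t, hits)   # t is 0.05 / (5 * 0.5), i.e. 0.02 up to rounding; hits is [0, 1]
```

Using the randomized p-values only inside $\hat\pi_0$ keeps the final list of markers reproducible, since the decisions depend on the non-randomized p-values alone.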

26 Real data example: Herder et al. (2008) replication study
Herder, C. et al. (2008). Variants of the PPARG, IGF2BP2, CDKAL1, HHEX, and TCF7L2 genes confer risk of type 2 diabetes independently of BMI in the German KORA studies. Horm. Metab. Res. 40, ...
Data: M = 44 SNPs on ten different genes (N ≈ 1900 study participants).
Results section: "...(conservative) Bonferroni correction for 10 genes..."
Authors' claim: threshold t = ... for raw marginal p-values controls the FWER at α = 5%.

27 Herder et al. (2008): data re-analysis
LD information: taken from the HapMap project (population CEU).
Estimated effective number of tests: $M_{\mathrm{eff}} = \ldots$ (Cheverud-Nyholt method), $K_{\mathrm{eff}} = \ldots$ (Moskvina-Schmidt method).
Estimated proportion of uninformative SNPs: $\hat\pi_0 = \ldots$ (Storey et al., 2004).
Resulting threshold according to our method: $t = \alpha / (K_{\mathrm{eff}} \, \hat\pi_0) = \alpha / (\ldots) = \alpha / 7.604 = \ldots$
In conclusion: our proposed method confirms the authors' heuristic argument and endorses their scientific claims.
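Assuming α = 0.05, the FWER level stated in the authors' claim on slide 26, the final division can be carried out explicitly. For comparison, the Bonferroni correction for the 10 genes mentioned in the quoted results section is shown as well.

```latex
t = \frac{\alpha}{K_{\mathrm{eff}}\,\hat\pi_0} = \frac{0.05}{7.604} \approx 6.58 \times 10^{-3},
\qquad
\frac{\alpha}{10} = 5 \times 10^{-3} < 6.58 \times 10^{-3}
```

Under this assumption, a Bonferroni correction over the 10 genes is somewhat stricter than the threshold derived from the effective number of tests.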

28 Future research goals
- Effective number of tests for continuous response
- Effective number of tests for FDR control
- Adaptive estimation of effective numbers of tests
- Statistical methodology for confirmatory functional studies (fMRI data)
- Hierarchical multiple testing methods for (auto)correlated data (time series)


Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) 12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²

More information

Régression en grande dimension et épistasie par blocs pour les études d association

Régression en grande dimension et épistasie par blocs pour les études d association Régression en grande dimension et épistasie par blocs pour les études d association V. Stanislas, C. Dalmasso, C. Ambroise Laboratoire de Mathématiques et Modélisation d Évry "Statistique et Génome" 1

More information

On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits

On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits Wang BMC Genetics 011, 1:8 http://www.biomedcentral.com/171-156/1/8 METHODOLOGY ARTICLE Open Access On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative

More information

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Lee H. Dicker Rutgers University and Amazon, NYC Based on joint work with Ruijun Ma (Rutgers),

More information

Humans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase

Humans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase Humans have two copies of each chromosome Inherited from mother and father. Genotyping technologies do not maintain the phase Genotyping technologies do not maintain the phase Recall that proximal SNPs

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

Visualizing Population Genetics

Visualizing Population Genetics Visualizing Population Genetics Supervisors: Rachel Fewster, James Russell, Paul Murrell Louise McMillan 1 December 2015 Louise McMillan Visualizing Population Genetics 1 December 2015 1 / 29 Outline 1

More information

Efficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study

Efficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study Efficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study Xiang Zhang A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial

More information

Evolution of quantitative traits

Evolution of quantitative traits Evolution of quantitative traits Introduction Let s stop and review quickly where we ve come and where we re going We started our survey of quantitative genetics by pointing out that our objective was

More information

Stationary Distribution of the Linkage Disequilibrium Coefficient r 2

Stationary Distribution of the Linkage Disequilibrium Coefficient r 2 Stationary Distribution of the Linkage Disequilibrium Coefficient r 2 Wei Zhang, Jing Liu, Rachel Fewster and Jesse Goodman Department of Statistics, The University of Auckland December 1, 2015 Overview

More information

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.

More information

Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence

More information

Topic 3: Hypothesis Testing

Topic 3: Hypothesis Testing CS 8850: Advanced Machine Learning Fall 07 Topic 3: Hypothesis Testing Instructor: Daniel L. Pimentel-Alarcón c Copyright 07 3. Introduction One of the simplest inference problems is that of deciding between

More information

Variance Component Models for Quantitative Traits. Biostatistics 666

Variance Component Models for Quantitative Traits. Biostatistics 666 Variance Component Models for Quantitative Traits Biostatistics 666 Today Analysis of quantitative traits Modeling covariance for pairs of individuals estimating heritability Extending the model beyond

More information

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA)

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) BIRS 016 1 HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) Malka Gorfine, Tel Aviv University, Israel Joint work with Li Hsu, FHCRC, Seattle, USA BIRS 016 The concept of heritability

More information

Private Computation with Genomic Data for Genome-Wide Association and Linkage Studies

Private Computation with Genomic Data for Genome-Wide Association and Linkage Studies Private Computation with Genomic Data for Genome-Wide Association and Linkage Studies Abstract Ali Shahbazi 1, Fattaneh Bayatbabolghani 1, and Marina Blanton 2 1 Department of Computer Science and Engineering,

More information

Lecture 6 April

Lecture 6 April Stats 300C: Theory of Statistics Spring 2017 Lecture 6 April 14 2017 Prof. Emmanuel Candes Scribe: S. Wager, E. Candes 1 Outline Agenda: From global testing to multiple testing 1. Testing the global null

More information

Multiple point hypothesis test problems and effective numbers of tests

Multiple point hypothesis test problems and effective numbers of tests SFB 649 Discussion Paper 2012-041 Multiple point hypothesis test problems and effective numbers of tests Thorsten Dickhaus* Jens Stange* * Humboldt-Universität zu Berlin, Germany SFB 6 4 9 E C O N O M

More information

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012 Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based

More information

Computational Approaches to Statistical Genetics

Computational Approaches to Statistical Genetics Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

2. Map genetic distance between markers

2. Map genetic distance between markers Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,

More information

Statistical Methods in Mapping Complex Diseases

Statistical Methods in Mapping Complex Diseases University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-12-2011 Statistical Methods in Mapping Complex Diseases Jing He University of Pennsylvania, jinghe@mail.med.upenn.edu

More information

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors The Multiple Testing Problem Multiple Testing Methods for the Analysis of Microarray Data 3/9/2009 Copyright 2009 Dan Nettleton Suppose one test of interest has been conducted for each of m genes in a

More information

the long tau-path for detecting monotone association in an unspecified subpopulation

the long tau-path for detecting monotone association in an unspecified subpopulation the long tau-path for detecting monotone association in an unspecified subpopulation Joe Verducci Current Challenges in Statistical Learning Workshop Banff International Research Station Tuesday, December

More information

Question: If mating occurs at random in the population, what will the frequencies of A 1 and A 2 be in the next generation?

Question: If mating occurs at random in the population, what will the frequencies of A 1 and A 2 be in the next generation? October 12, 2009 Bioe 109 Fall 2009 Lecture 8 Microevolution 1 - selection The Hardy-Weinberg-Castle Equilibrium - consider a single locus with two alleles A 1 and A 2. - three genotypes are thus possible:

More information

Breeding Values and Inbreeding. Breeding Values and Inbreeding

Breeding Values and Inbreeding. Breeding Values and Inbreeding Breeding Values and Inbreeding Genotypic Values For the bi-allelic single locus case, we previously defined the mean genotypic (or equivalently the mean phenotypic values) to be a if genotype is A 2 A

More information

Principal-Agent Games - Equilibria under Asymmetric Information -

Principal-Agent Games - Equilibria under Asymmetric Information - Principal-Agent Games - Equilibria under Asymmetric Information - Ulrich Horst 1 Humboldt-Universität zu Berlin Department of Mathematics and School of Business and Economics Work in progress - Comments

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

High-dimensional statistics, with applications to genome-wide association studies

High-dimensional statistics, with applications to genome-wide association studies EMS Surv. Math. Sci. x (201x), xxx xxx DOI 10.4171/EMSS/x EMS Surveys in Mathematical Sciences c European Mathematical Society High-dimensional statistics, with applications to genome-wide association

More information