How to analyze many contingency tables simultaneously?
1 How to analyze many contingency tables simultaneously?
Thorsten Dickhaus, Humboldt-Universität zu Berlin
Beuth Hochschule für Technik Berlin
2 Outline
- Motivation: Genetic association studies
- Statistical setup
- Refined statistical inference methods
- Real data example
Reference: Dickhaus, T., Straßburger, K., Schunk, D., Morcillo, C., Illig, T., and Navarro, A. (2012): How to analyze many contingency tables simultaneously in genetic association studies. SAGMB 11, Article 12.
8 What is a SNP (single nucleotide polymorphism)?
Bi-allelic SNPs: Exactly two possible alleles

Locus                      ...  i  ...  M
Tom (m)       A  A  G  T   ...  A  ...  G
Tom (p)       A  A  G  T   ...  A  ...  C
Andrew        A  A  G  C   ...  A  ...  C
              A  A  G  C   ...  G  ...  C
Rachel        A  A  G  C   ...  G  ...  G
              A  A  G  T   ...  G  ...  G
9 Contingency table layout in association studies
Assume a bi-allelic marker (SNP) at a particular locus and a binary phenotype of interest, e.g., a disease status.

Genotype          A1A1    A1A2    A2A2    Σ
Phenotype 1       x_11    x_12    x_13    n_1.
Phenotype 0       x_21    x_22    x_23    n_2.
Absolute count    n_.1    n_.2    n_.3    N

In case of allelic tests:

Genotype          A1      A2      Σ
Phenotype 1       x_11    x_12    n_1.
Phenotype 0       x_21    x_22    n_2.
Absolute count    n_.1    n_.2    N
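The two layouts can be illustrated with a minimal Python sketch. It assumes genotypes are coded as the number of A2 alleles (0, 1, 2) and phenotypes as 0/1. The allelic table is obtained here by counting two alleles per individual, which is a common convention and an assumption of this sketch rather than something stated on the slide. All function names and the toy data are hypothetical.

```python
def genotype_table(genotypes, phenotypes):
    """Return the 2 x 3 table: rows = phenotype 1, phenotype 0;
    columns = A1A1, A1A2, A2A2 (genotype coded as number of A2 alleles)."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, phenotypes):
        row = 0 if y == 1 else 1
        table[row][g] += 1
    return table


def allelic_table(geno_table):
    """Collapse genotype counts to allele counts (two alleles per individual)."""
    return [[2 * a1a1 + a1a2, a1a2 + 2 * a2a2] for a1a1, a1a2, a2a2 in geno_table]


def marginals(table):
    """Return (row sums, column sums, grand total N)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return row_sums, col_sums, sum(row_sums)


# Hypothetical toy data: 7 individuals.
genotypes = [0, 0, 1, 2, 2, 1, 0]
phenotypes = [1, 1, 1, 0, 0, 0, 0]
x = genotype_table(genotypes, phenotypes)
print(x, marginals(x))
print(allelic_table(x), marginals(allelic_table(x)))
```

Under this convention the allelic table has grand total 2N, because every individual contributes two alleles.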
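11 Formalized association test problem
Multiple test problem with system of hypotheses $\mathcal{H} = (H_j : 1 \le j \le M)$, where
$H_j$: Genotype $j$ $\perp$ Phenotype (the genotype at locus $j$ is independent of the phenotype),
with two-sided alternatives $K_j$.
Abbreviated notation (one particular position):
$n = (n_{1\cdot}, n_{2\cdot}, n_{\cdot 1}, n_{\cdot 2}, n_{\cdot 3}) \in \mathbb{N}^5$ resp. $n = (n_{1\cdot}, n_{2\cdot}, n_{\cdot 1}, n_{\cdot 2}) \in \mathbb{N}^4$,
$x = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{pmatrix} \in \mathbb{N}^{2 \times 3}$ resp. $x = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} \in \mathbb{N}^{2 \times 2}$.
In both cases, the probability of observing $x$ given $n$ is under the null given by
$$f(x \mid n) = \frac{\prod_{r} n_{r\cdot}! \, \prod_{s} n_{\cdot s}!}{N! \, \prod_{r,s} x_{rs}!}.$$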
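As a small check of the null probability $f(x \mid n)$, the following Python sketch evaluates it with exact rational arithmetic. The function name `table_probability` is hypothetical. The 2 x 2 examples are toy tables chosen so that the result can be compared by hand with the hypergeometric distribution.

```python
from fractions import Fraction
from math import factorial, prod


def table_probability(x):
    """f(x | n): product of marginal factorials over N! times product of cell factorials."""
    rows = [sum(r) for r in x]
    cols = [sum(c) for c in zip(*x)]
    numerator = prod(factorial(n) for n in rows + cols)
    denominator = factorial(sum(rows)) * prod(factorial(v) for r in x for v in r)
    return Fraction(numerator, denominator)


# Toy 2 x 2 tables sharing the marginals n = (2, 2, 2, 2), N = 4.
print(table_probability([[1, 1], [1, 1]]))  # 2/3
print(table_probability([[2, 0], [0, 2]]))  # 1/6
```

The three tables compatible with these marginals have probabilities 1/6, 2/3 and 1/6, which sum to one as they should.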
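12 Tests for association of marker and phenotype
(i) Chi-squared test
$$Q(x) = \sum_{r} \sum_{s} \frac{(x_{rs} - e_{rs})^2}{e_{rs}}, \quad \text{where } e_{rs} = n_{r\cdot} \, n_{\cdot s} / N.$$
Resulting exact (non-asymptotic) p-value:
$$p_Q(x) = \sum_{\tilde{x}} f(\tilde{x} \mid n),$$
with summation over all $\tilde{x}$ with marginals $n$ such that $Q(\tilde{x}) \ge Q(x)$.
(Local) level $\alpha$ test: $\varphi_Q(x) = \mathbf{1}\{p_Q(x) \le \alpha\}$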
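The exact p-value $p_Q(x)$ can be sketched by brute-force enumeration of all tables with the observed marginals. This is only feasible for small counts and is not claimed to be the authors' implementation. All function names are hypothetical, tables are restricted to two rows, and all marginals are assumed to be positive so that every expected count is non-zero.

```python
from fractions import Fraction
from math import factorial, prod


def margins(x):
    return [sum(r) for r in x], [sum(c) for c in zip(*x)]


def table_probability(x):
    """Null probability f(x | n) of a table given its marginals."""
    rows, cols = margins(x)
    numerator = prod(factorial(n) for n in rows + cols)
    denominator = factorial(sum(rows)) * prod(factorial(v) for r in x for v in r)
    return Fraction(numerator, denominator)


def tables_with_margins(rows, cols):
    """Yield every 2 x c table of non-negative integers with the given marginals."""
    def top_rows(caps, remaining):
        if not caps:
            if remaining == 0:
                yield []
            return
        for v in range(min(caps[0], remaining) + 1):
            for rest in top_rows(caps[1:], remaining - v):
                yield [v] + rest
    for top in top_rows(list(cols), rows[0]):
        yield [top, [c - t for c, t in zip(cols, top)]]


def chi_squared(x):
    """Q(x) with expected counts e_rs = n_r. * n_.s / N, in exact arithmetic."""
    rows, cols = margins(x)
    N = sum(rows)
    return sum(
        (x[r][s] - Fraction(n_r * n_s, N)) ** 2 / Fraction(n_r * n_s, N)
        for r, n_r in enumerate(rows)
        for s, n_s in enumerate(cols)
    )


def exact_chi2_pvalue(x):
    """Sum f over all tables with the same marginals and Q at least as large."""
    rows, cols = margins(x)
    q_obs = chi_squared(x)
    return sum(
        table_probability(t)
        for t in tables_with_margins(rows, cols)
        if chi_squared(t) >= q_obs
    )


print(exact_chi2_pvalue([[3, 0], [0, 3]]))  # 1/10
```

Exact fractions are used so that the comparison $Q(\tilde{x}) \ge Q(x)$ is not affected by floating-point ties.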
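14 Tests for association of marker and phenotype
(ii) Tests of Fisher-type
$$p_{\text{Fisher}}(x) = \sum_{\tilde{x}} f(\tilde{x} \mid n),$$
with summation over all $\tilde{x}$ with marginals $n$ such that $f(\tilde{x} \mid n) \le f(x \mid n)$.
Corresponding level $\alpha$ test: $\varphi_{\text{Fisher}}(x) = \mathbf{1}\{p_{\text{Fisher}}(x) \le \alpha\}$
$\varphi_Q(x)$ and $\varphi_{\text{Fisher}}(x)$ keep the (local) significance level $\alpha$ conservatively for any sample size $N$.
In other words: $p_Q(X) \ge_{\text{st}} U$ and $p_{\text{Fisher}}(X) \ge_{\text{st}} U$ under the null, $U \sim \text{UNI}[0, 1]$, i.e., each p-value is at most $t$ with probability at most $t$ for all $t \in [0, 1]$.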
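A Fisher-type p-value can be sketched in the same brute-force style: enumerate all tables with the observed marginals and add up the probabilities of those that are at most as likely as the observed one. The function names are hypothetical, and the enumeration is restricted to two-row tables with small counts.

```python
from fractions import Fraction
from math import factorial, prod


def margins(x):
    return [sum(r) for r in x], [sum(c) for c in zip(*x)]


def table_probability(x):
    """Null probability f(x | n) of a table given its marginals."""
    rows, cols = margins(x)
    numerator = prod(factorial(n) for n in rows + cols)
    denominator = factorial(sum(rows)) * prod(factorial(v) for r in x for v in r)
    return Fraction(numerator, denominator)


def tables_with_margins(rows, cols):
    """Yield every 2 x c table of non-negative integers with the given marginals."""
    def top_rows(caps, remaining):
        if not caps:
            if remaining == 0:
                yield []
            return
        for v in range(min(caps[0], remaining) + 1):
            for rest in top_rows(caps[1:], remaining - v):
                yield [v] + rest
    for top in top_rows(list(cols), rows[0]):
        yield [top, [c - t for c, t in zip(cols, top)]]


def fisher_type_pvalue(x):
    """Sum f over all tables with the same marginals and f no larger than observed."""
    rows, cols = margins(x)
    f_obs = table_probability(x)
    return sum(
        f_t
        for f_t in map(table_probability, tables_with_margins(rows, cols))
        if f_t <= f_obs
    )


print(fisher_type_pvalue([[3, 0], [0, 3]]))        # 1/10
print(fisher_type_pvalue([[0, 0, 2], [1, 1, 0]]))  # 1/3
```

With only four attainable tables in the first example, the p-value can take very few distinct values. This is the discreteness that the later slides address.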
15 Estimating the proportion of informative SNPs (References: Schweder and Spjøtvoll (1982), Storey et al., 2004)
18 Caveat: Storey's method does not work for discrete p-values $p_Q(X)$ and $p_{\text{Fisher}}(X)$
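The slides cite Schweder and Spjøtvoll (1982) and Storey et al. (2004) without showing a formula. The sketch below assumes the commonly used form $\hat{\pi}_0(\lambda) = (\#\{j : p_j > \lambda\} + 1) / (M (1 - \lambda))$. The tuning parameter $\lambda = 0.5$, the truncation at 1, the function name and the toy p-values are all assumptions of this example. The estimator relies on null p-values being uniform on [0, 1], which is exactly what fails for the discrete p-values named in the caveat.

```python
def storey_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from the p-values exceeding lam.

    Assumed form: (#{p_j > lam} + 1) / (M * (1 - lam)), truncated at 1.
    """
    M = len(pvalues)
    exceed = sum(1 for p in pvalues if p > lam)
    return min(1.0, (exceed + 1) / (M * (1.0 - lam)))


# Hypothetical p-values: 2 of 10 exceed lam = 0.5.
pvals = [0.001, 0.01, 0.02, 0.03, 0.04, 0.1, 0.2, 0.4, 0.7, 0.9]
print(storey_pi0(pvals))
```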
19 Discreteness: Realized randomized p-values
Definition:
- Statistical model $(\Omega, \mathcal{A}, (P_\vartheta)_{\vartheta \in \Theta})$ given
- Two-sided test problem $H: \{\vartheta = \vartheta_0\}$ versus $K: \{\vartheta \ne \vartheta_0\}$
- Discrete test statistic: $X \sim P_\vartheta$ with values in $\Omega$
- $U \sim \text{UNI}[0, 1]$, stochastically independent of $X$
A realized randomized p-value for testing $H$ versus $K$ is a measurable mapping $p^r: \Omega \times [0, 1] \to [0, 1]$ with
$$P_{\vartheta_0}(p^r(X, U) \le t) = t \quad \text{for all } t \in [0, 1].$$
20 Realized randomized p-values based on $p_Q(X)$ and $p_{\text{Fisher}}(X)$
Lemma: Based upon the chi-squared and Fisher-type testing strategies, corresponding realized randomized p-values can be calculated as
$$p^r_Q(x, u) = p_Q(x) - u \sum_{\tilde{x}: Q(\tilde{x}) = Q(x)} f(\tilde{x} \mid n),$$
$$p^r_{\text{Fisher}}(x, u) = p_{\text{Fisher}}(x) - u \, \gamma \, f(x \mid n),$$
where $u$ denotes the realization of $U \sim \text{UNI}[0, 1]$, stochastically independent of $X$, and $\gamma \equiv \gamma(x) = |\{\tilde{x} : f(\tilde{x} \mid n) = f(x \mid n)\}|$.
We propose realized randomized p-values for estimating $\pi_0$.
For final decision making, their non-randomized counterparts should be used (Reproducibility!).
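Both formulas of the lemma share one structure: the non-randomized p-value minus $u$ times the null probability of all tables that tie with the observed one. The Python sketch below exploits this with a single generic function. It uses brute-force enumeration of two-row tables, the names are hypothetical, and $u$ is passed as an exact fraction only to make the toy results easy to verify.

```python
from fractions import Fraction
from math import factorial, prod


def margins(x):
    return [sum(r) for r in x], [sum(c) for c in zip(*x)]


def table_probability(x):
    """Null probability f(x | n) of a table given its marginals."""
    rows, cols = margins(x)
    numerator = prod(factorial(n) for n in rows + cols)
    denominator = factorial(sum(rows)) * prod(factorial(v) for r in x for v in r)
    return Fraction(numerator, denominator)


def tables_with_margins(rows, cols):
    """Yield every 2 x c table of non-negative integers with the given marginals."""
    def top_rows(caps, remaining):
        if not caps:
            if remaining == 0:
                yield []
            return
        for v in range(min(caps[0], remaining) + 1):
            for rest in top_rows(caps[1:], remaining - v):
                yield [v] + rest
    for top in top_rows(list(cols), rows[0]):
        yield [top, [c - t for c, t in zip(cols, top)]]


def chi_squared(x):
    """Q(x); larger values are more extreme. Assumes positive marginals."""
    rows, cols = margins(x)
    N = sum(rows)
    return sum(
        (x[r][s] - Fraction(n_r * n_s, N)) ** 2 / Fraction(n_r * n_s, N)
        for r, n_r in enumerate(rows)
        for s, n_s in enumerate(cols)
    )


def fisher_statistic(x):
    """Negative null probability, so that less likely tables are more extreme."""
    return -table_probability(x)


def realized_randomized_pvalue(x, u, statistic):
    """p^r(x, u) = p(x) - u * (null probability of tables tying with x)."""
    rows, cols = margins(x)
    s_obs = statistic(x)
    p_value, tie_mass = Fraction(0), Fraction(0)
    for t in tables_with_margins(rows, cols):
        s_t = statistic(t)
        if s_t >= s_obs:
            p_value += table_probability(t)
        if s_t == s_obs:
            tie_mass += table_probability(t)
    return p_value - u * tie_mass


u = Fraction(1, 2)  # hypothetical realization of U
print(realized_randomized_pvalue([[3, 0], [0, 3]], u, fisher_statistic))  # 1/20
print(realized_randomized_pvalue([[2, 1], [1, 2]], u, chi_squared))       # 11/20
```

Setting `u = 0` recovers the non-randomized p-value. For the Fisher-type statistic the tie mass equals $\gamma f(x \mid n)$, because all tying tables have the same probability.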
23 Effective number of tests
A thought experiment
Assume markers indexed by $I = \{1, \ldots, M\}$ can be divided into disjoint groups with indices in subsets $I_g \subset I$, $g \in \{1, \ldots, G\}$.
Let $\varphi = (\varphi_i, i \in I)$ and assume that for each $g \in \{1, \ldots, G\}$ and for any pair $i, j \in I_g$ the identity $\{\varphi_i = 1\} = \{\varphi_j = 1\}$ holds.
Then, effectively only one single test is performed in each subgroup.
Denoting $i(g) = \min I_g$ for $g = 1, \ldots, G$, and writing $I_0$ for the index set of true null hypotheses, it holds
$$\text{FWER}_\vartheta(\varphi) = P_\vartheta\left( \bigcup_{g=1}^{G} \bigcup_{i \in I_0 \cap I_g} \{\varphi_i = 1\} \right) \le P_\vartheta\left( \bigcup_{g=1}^{G} \{\varphi_{i(g)} = 1\} \right).$$
Consequently, multiplicity correction in this extreme scenario only has to be done with respect to $G \ll M$.
Bonferroni-type adjustment $\alpha / G$ would be valid!
24 Effective number of tests
Cheverud-Nyholt method and beyond
$$M_{\text{eff.}} = 1 + \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{M} (1 - r_{ij}^2).$$
The numbers $r_{ij}$ are measures of correlation among markers $i$ and $j$ and can typically be obtained from linkage disequilibrium (LD) matrices.
More sophisticated methods exist in the literature, e.g.:
- simpleM by X. Gao et al. (2008)
- $K_{\text{eff.}}$ by Moskvina and Schmidt (2008)
All rely on the correlation structure reflected by the $r_{ij}$'s.
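The Cheverud-Nyholt quantity is a direct double sum over the correlation matrix, as this minimal Python sketch shows. The function name and the toy matrices are hypothetical. The form used is $M_{\text{eff.}} = 1 + \frac{1}{M} \sum_{i,j} (1 - r_{ij}^2)$, which yields $M$ for uncorrelated markers and 1 for perfectly correlated markers.

```python
def cheverud_nyholt_meff(r):
    """Effective number of tests from an M x M correlation matrix (list of lists)."""
    M = len(r)
    total = sum(1.0 - r[i][j] ** 2 for i in range(M) for j in range(M))
    return 1.0 + total / M


identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
all_ones = [[1.0] * 3 for _ in range(3)]
# Two blocks of two perfectly correlated markers each.
two_blocks = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
print(cheverud_nyholt_meff(identity))    # 3.0
print(cheverud_nyholt_meff(all_ones))    # 1.0
print(cheverud_nyholt_meff(two_blocks))  # 3.0
```

In the two-block example the thought experiment of the previous slide would suggest two effective tests, whereas this formula returns 3. This kind of gap is one motivation for the more sophisticated methods listed on the slide.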
25 Our proposed data analysis workflow
1. Compute realized randomized p-values $p^r(x_j, u_j)$ and non-randomized versions $p(x_j)$, $j = 1, \ldots, M$.
2. Estimate the proportion $\pi_0$ of uninformative SNPs by $\hat{\pi}_0$ (based on the realized randomized p-values).
3. Determine the effective number of tests $M_{\text{eff.}}$ by utilizing correlation values obtained from an appropriate LD matrix of the $M$ SNPs.
4. For a pre-defined FWER level $\alpha$, determine the list of associated markers by performing the multiple test $\varphi = (\varphi_j, j = 1, \ldots, M)$, where $\varphi_j(x_j) = \mathbf{1}\{p(x_j) \le t\}$ with $t = \alpha / (M_{\text{eff.}} \cdot \hat{\pi}_0)$.
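The final decision step of the workflow amounts to one division and one comparison per marker. The sketch below assumes that the non-randomized p-values, $M_{\text{eff.}}$ and $\hat{\pi}_0$ have already been computed in steps 1 to 3. The function name and all numerical inputs are hypothetical and are not taken from the real data example.

```python
def associated_markers(pvalues, alpha, m_eff, pi0_hat):
    """Return the threshold t = alpha / (m_eff * pi0_hat) and the rejected indices."""
    t = alpha / (m_eff * pi0_hat)
    rejected = [j for j, p in enumerate(pvalues) if p <= t]
    return t, rejected


# Hypothetical inputs: four non-randomized p-values, M_eff = 8, pi0_hat = 0.5.
t, rejected = associated_markers([0.001, 0.02, 0.012, 0.3], alpha=0.05, m_eff=8.0, pi0_hat=0.5)
print(t, rejected)
```

Compared with a plain Bonferroni threshold $\alpha / M$, both factors can only enlarge $t$ as long as $M_{\text{eff.}} \le M$ and $\hat{\pi}_0 \le 1$.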
26 Real data example: Herder et al. (2008)
Replication study
Herder, C. et al. (2008). Variants of the PPARG, IGF2BP2, CDKAL1, HHEX, and TCF7L2 genes confer risk of type 2 diabetes independently of BMI in the German KORA studies. Horm. Metab. Res. 40.
Data: M = 44 SNPs on ten different genes (N ≈ 1900 study participants)
Results section: "... (conservative) Bonferroni correction for 10 genes ..."
Authors claim: A threshold t for raw marginal p-values controls the FWER at α = 5%
27 Herder et al. (2008): Data re-analysis
LD information: Taken from the HapMap project (population CEU)
Estimated effective number of tests: $M_{\text{eff.}}$ (Cheverud-Nyholt method), $K_{\text{eff.}}$ (Moskvina-Schmidt method).
Estimated proportion of uninformative SNPs: $\hat{\pi}_0$ (Storey et al., 2004)
Resulting threshold according to our method: $t = \alpha / (K_{\text{eff.}} \cdot \hat{\pi}_0) = \alpha / 7.604$
In conclusion: Our proposed method confirms the authors' heuristic argumentation and endorses their scientific claims.
28 Future research goals
- Effective number of tests for continuous response
- Effective number of tests for FDR control
- Adaptive estimation of effective numbers of tests
- Statistical methodology for confirmatory functional studies (fMRI data)
- Hierarchical multiple testing methods for (auto-) correlated data (time series)
More informationMultiple point hypothesis test problems and effective numbers of tests
SFB 649 Discussion Paper 2012-041 Multiple point hypothesis test problems and effective numbers of tests Thorsten Dickhaus* Jens Stange* * Humboldt-Universität zu Berlin, Germany SFB 6 4 9 E C O N O M
More informationLecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012
Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based
More informationComputational Approaches to Statistical Genetics
Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen
More informationBustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either
More information2. Map genetic distance between markers
Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,
More informationStatistical Methods in Mapping Complex Diseases
University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-12-2011 Statistical Methods in Mapping Complex Diseases Jing He University of Pennsylvania, jinghe@mail.med.upenn.edu
More informationTable of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors
The Multiple Testing Problem Multiple Testing Methods for the Analysis of Microarray Data 3/9/2009 Copyright 2009 Dan Nettleton Suppose one test of interest has been conducted for each of m genes in a
More informationthe long tau-path for detecting monotone association in an unspecified subpopulation
the long tau-path for detecting monotone association in an unspecified subpopulation Joe Verducci Current Challenges in Statistical Learning Workshop Banff International Research Station Tuesday, December
More informationQuestion: If mating occurs at random in the population, what will the frequencies of A 1 and A 2 be in the next generation?
October 12, 2009 Bioe 109 Fall 2009 Lecture 8 Microevolution 1 - selection The Hardy-Weinberg-Castle Equilibrium - consider a single locus with two alleles A 1 and A 2. - three genotypes are thus possible:
More informationBreeding Values and Inbreeding. Breeding Values and Inbreeding
Breeding Values and Inbreeding Genotypic Values For the bi-allelic single locus case, we previously defined the mean genotypic (or equivalently the mean phenotypic values) to be a if genotype is A 2 A
More informationPrincipal-Agent Games - Equilibria under Asymmetric Information -
Principal-Agent Games - Equilibria under Asymmetric Information - Ulrich Horst 1 Humboldt-Universität zu Berlin Department of Mathematics and School of Business and Economics Work in progress - Comments
More informationLecture 7: Hypothesis Testing and ANOVA
Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis
More informationHigh-dimensional statistics, with applications to genome-wide association studies
EMS Surv. Math. Sci. x (201x), xxx xxx DOI 10.4171/EMSS/x EMS Surveys in Mathematical Sciences c European Mathematical Society High-dimensional statistics, with applications to genome-wide association
More information