Learning ancestral genetic processes using nonparametric Bayesian models

1 Learning ancestral genetic processes using nonparametric Bayesian models
Kyung-Ah Sohn
October 31, 2011
Committee Members: Eric P. Xing (Chair), Zoubin Ghahramani, Russell Schwartz, Kathryn Roeder, Matthew Stephens (University of Chicago)
In partial fulfillment of the requirements for the degree of Doctor of Philosophy.

2 Recent explosion of genomic data

3 Inference on human migration history
Li, Hui, Kelly Cho, Judith R. Kidd, and Kenneth K. Kidd. Genetic landscape of Eurasia and 'admixture' in Uyghurs. American Journal of Human Genetics 85 (6) (December): 934-7; author reply
Xu, Shuhua, and Li Jin. A genome-wide analysis of admixture in Uyghurs and a high-density admixture map for disease-gene discovery. American Journal of Human Genetics 83 (3) (September):

4 Finding Disease Genes
- Search for causal genes and mutations
[Figure: a mutation, labeled G and A]

5 Key challenges
- How to model complex inheritance processes underlying the data
- Realistic and flexible inheritance model
Images courtesy of thurj.org and stormfront.org

6 Key challenges
- Lack of large-scale samples; data in high-dimensional space
- Need to exploit structural information in the data from genetically distinct but related groups
Image courtesy of Nature Reviews: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. 3 (5) (May): doi: /nrg795.

7 Thesis goal
Develop flexible non-parametric Bayesian models for learning ancestral genetic processes that efficiently utilize structural information in genetic data
- Biologically, provide practical tools that can reveal interesting characteristics of study populations
- Statistically, validate how non-parametric Bayesian models can help improve the performance of real applications

8 Theoretical component
- Nonparametric Bayesian models based on the Dirichlet process and its extensions, such as the hierarchical Dirichlet process and the infinite hidden Markov model
- Combined with a haplotype inheritance model describing various genetic processes

9 SNP haplotypes
- Genetic polymorphism: a difference in DNA sequence among individuals or populations
- Single Nucleotide Polymorphism (SNP): a DNA sequence variation occurring when a single nucleotide (A, T, C, or G) differs between members of a group
- Haplotypes
- Genotypes: sequences of unordered pairs of haplotype alleles in diploids, e.g. the two haplotypes (AATC) and (ACGG) form the genotype sequence ({A,A}, {A,C}, {G,T}, {C,G})
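The genotype definition on this slide can be illustrated with a small sketch. The function names are hypothetical and are not taken from the thesis software; the sketch only shows how phase information is lost when two haplotypes are combined into a genotype.

```python
def genotype_from_haplotypes(h1, h2):
    """Return the unordered allele pair at each SNP site of a diploid."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    # A frozenset drops the ordering, so {A, C} == {C, A}; a homozygous
    # site such as {A, A} collapses to the single-element set {A}.
    return [frozenset((a, b)) for a, b in zip(h1, h2)]


def count_compatible_haplotype_pairs(genotype):
    """Count the unordered haplotype pairs consistent with a genotype."""
    het_sites = sum(1 for site in genotype if len(site) == 2)
    # Each heterozygous site can be phased two ways; swapping the two
    # haplotypes gives the same unordered pair, hence 2 ** (het_sites - 1).
    return 1 if het_sites == 0 else 2 ** (het_sites - 1)


genotype = genotype_from_haplotypes("AATC", "ACGG")
print([sorted(site) for site in genotype])
print(count_compatible_haplotype_pairs(genotype))
```

The slide's example has three heterozygous sites, so four distinct haplotype pairs produce the same genotype; choosing among them is the haplotype inference problem.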

10 Thesis outline
1. Haplotype inference from multi-population data
2. Joint inference of population structure and recombination events
3. Local ancestry estimation in admixed populations

11 Application 1
A hierarchical Dirichlet process mixture model for haplotype reconstruction from multi-population data (International Conference on Machine Learning 2006, Annals of Applied Statistics 2009)

12 Motivation
- Few existing programs explicitly use population labels in haplotype inference, even though many datasets come from genetically distinct populations
- Each population has its own characteristics
- Populations are related and may share some components
- Develop a principled approach that explicitly exploits the population labels

13 Bayesian haplotype inference model
[Figure: graphical model. Labels: mutation model; founder haplotypes A; genotyping model; individual haplotypes H_n1, H_n2; individual genotypes G_n (observed)]
- Each individual haplotype is a mixture of founder haplotypes
Eric P. Xing, Michael I. Jordan, and Roded Sharan. Bayesian haplotype inference via the Dirichlet process. Journal of Computational Biology 14 (3) (April): doi: /cmb

14 Bayesian haplotype inference model
[Figure: graphical model. Labels: Dirichlet process prior (G_0, F); mutation model; founder haplotypes A; genotyping model; individual haplotypes H_n1, H_n2; individual genotypes G_n (observed)]
- Each individual haplotype is a mixture of founder haplotypes
Eric P. Xing, Michael I. Jordan, and Roded Sharan. Bayesian haplotype inference via the Dirichlet process. Journal of Computational Biology 14 (3) (April): doi: /cmb

15 Dirichlet Process
A CDF F on $\Omega$ follows a Dirichlet process if, for any finite measurable partition $(B_1, B_2, \ldots, B_m)$ of $\Omega$, the joint distribution of the random variables satisfies
$(F(B_1), F(B_2), \ldots, F(B_m)) \sim \mathrm{Dirichlet}(\alpha G_0(B_1), \ldots, \alpha G_0(B_m))$
where $G_0$ is the base measure and $\alpha$ is the scale parameter

16 Dirichlet Process: Pólya urn model
- We associate mixture components (founders) with colors, and samples (individual chromosomes) with balls, in the Pólya urn model
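The Pólya urn view can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `polya_urn` and the parameter name `alpha` (standing for the scale parameter) are hypothetical. Each new ball (chromosome) takes an existing color (founder) with probability proportional to the number of balls already of that color, or a new color with probability proportional to the scale parameter.

```python
import random


def polya_urn(n_samples, alpha, rng):
    """Assign n_samples balls to colors under a Polya urn with scale alpha."""
    counts = []        # counts[k] = number of balls that have color k
    assignments = []   # assignments[i] = color of ball i
    for i in range(n_samples):
        # Total weight is i (balls drawn so far) plus alpha (a new color).
        u = rng.random() * (i + alpha)
        chosen = len(counts)           # default: open a new color
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if u < acc:                # landed in the mass of color k
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


assignments, counts = polya_urn(20, alpha=1.0, rng=random.Random(0))
print(len(counts), "colors used for", len(assignments), "balls")
```

The number of colors is not fixed in advance; it grows with the sample, which is the property the slides rely on for an open-ended pool of founders.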

17 DP mixture model
- A flexible haplotype inheritance model
- The number of founders can grow as large as the given data require
- Reasonable approximation to coalescent theory
- Coalescent with mutation

18 Multi-population data

19 A hierarchical Dirichlet process mixture
Two-level Pólya urn scheme
- Assume a population-specific DP
- Use a common base measure distributed as another Dirichlet process
- Atoms (or founder haplotypes) are shared across populations
[Figure labels: mutation model, genotyping model]
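The two-level urn scheme can be sketched as follows. This is an illustrative simulation under assumptions, not the author's code: the names `draw_from_urn` and `two_level_urn`, and the scale parameters `alpha` (population level) and `gamma` (top level), are hypothetical. Each population has its own urn; whenever a population urn opens a new color, that color is drawn from a shared top-level urn, which is how founders come to be shared across populations.

```python
import random


def draw_from_urn(counts, scale, rng):
    """Return an existing index w.p. proportional to its count,
    or len(counts) (meaning 'new') w.p. proportional to scale."""
    u = rng.random() * (sum(counts) + scale)
    acc = 0.0
    for k, c in enumerate(counts):
        acc += c
        if u < acc:
            return k
    return len(counts)


def two_level_urn(pop_sizes, alpha, gamma, rng):
    """Simulate founder assignments for several populations."""
    top_counts = []          # top-level urn over shared founders
    founders_by_pop = []     # founder index of every chromosome, per population
    for n in pop_sizes:
        local_counts = []        # population-level urn over local colors
        local_to_founder = []    # shared founder behind each local color
        used = []
        for _ in range(n):
            c = draw_from_urn(local_counts, alpha, rng)
            if c == len(local_counts):
                # New local color: pick its founder from the shared urn.
                f = draw_from_urn(top_counts, gamma, rng)
                if f == len(top_counts):
                    top_counts.append(0)
                top_counts[f] += 1
                local_counts.append(0)
                local_to_founder.append(f)
            local_counts[c] += 1
            used.append(local_to_founder[c])
        founders_by_pop.append(used)
    return founders_by_pop, top_counts


founders_by_pop, top_counts = two_level_urn([15, 15], 1.0, 1.0, random.Random(1))
shared = set(founders_by_pop[0]) & set(founders_by_pop[1])
print("founders:", len(top_counts), "shared by both populations:", len(shared))
```

Because both populations draw new colors from the same top-level urn, a founder that is popular in one population is also more likely to appear in the other.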

20 Performance
Comparison of DP and HDP mixtures
- Mode-1: ignore labels
- Mode-2: handle each group separately
Haplotype inference as a measurable application of DP and HDP mixture models

21 Comparison with benchmark algorithms

22 Sensitivity analysis on hyperparameters

23 Discussion
- Explicitly leverages population labels in haplotype inference using a hierarchical Dirichlet process mixture model
- Recombination is not taken into account; a divide-and-conquer scheme called Partition-Ligation is used to deal with long sequences

24 Application 2
Spectrum: Joint Inference of Population Structure and Recombination Events (NIPS 2007, ISMB 2007, BA 2007)

25 Inheritance process
[Figure: chromosomes after one generation and after many generations]

26 A new model for population representation
Basic assumption
- Modern chromosomes are derived from hypothetical founder chromosomes via recombination and mutation
- Each individual chromosome is a mosaic of founders
[Figure: hypothetical founder chromosomes and an individual chromosome]
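The mosaic assumption can be illustrated with a small generative sketch. The function `sample_mosaic`, its parameters `recomb_prob` and `mut_prob`, and the uniform choice of founder at each recombination are assumptions made for illustration; the thesis model places a nonparametric prior over founders instead of fixing them.

```python
import random


def sample_mosaic(founders, recomb_prob, mut_prob, rng, alphabet="ACGT"):
    """Generate one chromosome as a mosaic of the given founder haplotypes.

    Returns (path, chromosome), where path[t] is the founder copied at site t.
    """
    n_sites = len(founders[0])
    k = rng.randrange(len(founders))          # founder copied at the first site
    path, chrom = [], []
    for t in range(n_sites):
        # Recombination: with probability recomb_prob, re-pick the founder
        # (the same founder may be picked again).
        if t > 0 and rng.random() < recomb_prob:
            k = rng.randrange(len(founders))
        allele = founders[k][t]
        # Mutation: with probability mut_prob, replace the copied allele.
        if rng.random() < mut_prob:
            allele = rng.choice([a for a in alphabet if a != allele])
        path.append(k)
        chrom.append(allele)
    return path, "".join(chrom)


founders = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]
path, chrom = sample_mosaic(founders, recomb_prob=0.2, mut_prob=0.01,
                            rng=random.Random(2))
print(path)
print(chrom)
```

Runs of identical founder indices in `path` correspond to the blocks of the mosaic; changes in the index mark recombination events.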

27 Population analysis
Given population data, recover the pool of hypothetical founders and their association with individual haplotypes
[Figure: hypothetical founders, individual haplotypes, and their association]

28 How to recover the association
- Use hidden Markov models (HMM)
[Figure: hidden state sequence c_1, c_2, c_3, ..., c_n and the observation sequence]
- How many founders? Use an infinite HMM (Beal et al. (2002), Teh et al. (2006))
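For intuition, the association between a chromosome and its founders can be recovered with a standard Viterbi decoder when the founders are known and finite in number. This is a swapped-in simplification, not the infinite HMM of the thesis: the function name, the fixed founder set, the uniform recombination target, and the mismatch probability `mut_prob` (adequate for biallelic SNPs) are all assumptions.

```python
import math


def viterbi_founder_path(chrom, founders, recomb_prob, mut_prob):
    """Most likely founder index at each site, for a finite founder set."""
    n_founders, n_sites = len(founders), len(chrom)

    def log_emit(k, t):
        # Copy the founder allele faithfully, or mismatch with prob mut_prob.
        match = founders[k][t] == chrom[t]
        return math.log(1.0 - mut_prob) if match else math.log(mut_prob)

    # On recombination the next founder is chosen uniformly (possibly the same).
    log_stay = math.log(1.0 - recomb_prob + recomb_prob / n_founders)
    log_jump = math.log(recomb_prob / n_founders)

    def log_trans(j, k):
        return log_stay if j == k else log_jump

    score = [math.log(1.0 / n_founders) + log_emit(k, 0)
             for k in range(n_founders)]
    backpointers = []
    for t in range(1, n_sites):
        new_score, ptr = [], []
        for k in range(n_founders):
            best_j = max(range(n_founders),
                         key=lambda j: score[j] + log_trans(j, k))
            new_score.append(score[best_j] + log_trans(best_j, k) + log_emit(k, t))
            ptr.append(best_j)
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final state.
    k = max(range(n_founders), key=lambda j: score[j])
    path = [k]
    for ptr in reversed(backpointers):
        k = ptr[k]
        path.append(k)
    return path[::-1]


print(viterbi_founder_path("AAAACCCC", ["AAAAAAAA", "CCCCCCCC"], 0.1, 0.01))
```

The decoder needs the number of founders up front, which is exactly the question the slide raises; the infinite HMM removes that requirement.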

29 Infinite hidden Markov model (Hidden Markov Dirichlet Process)
- Transitions among an infinite number of founders
- Infinite-dimensional transition matrix modeled by a hierarchical Dirichlet process:
  - each row is modeled with a DP
  - rows are coupled by a common base measure under another DP

30 Infinite HMM model for population analysis
[Figure: graphical model with founders A, hidden states c_1, c_2, c_3, ..., c_n, and haplotypes H]

31 Infinite HMM model for population analysis
- Founder allele reconstruction
- Inferring population structure
- Inferring recombination hotspots
[Figure: graphical model with founders A, hidden states c_1, c_2, c_3, ..., c_n, and haplotypes H]

32 Discussion
- A new haplotype model using an infinite hidden Markov model allows joint inference of population structure and recombination events
- Provides an alternative way of characterizing a population

33 Application 3
Robust Estimation of Local Ancestry in Admixed Populations (In submission)

34 Recently admixed population
- African Americans: multiple sources of ancestry (African / European)
[Figure: local genetic ancestry, with African and European segments]

35 Admixture mapping in recently admixed populations
[Figure: percent ancestry from population A (y-axis, 0% to 80%) against chromosome position (x-axis) for patients and controls, with a candidate disease locus marked]

36 Local genetic ancestry estimation
- Input: ancestral populations (train data): African, European; admixed individual (test data)
- Output: aaaaaaaeeee

37 Challenges
- Real ancestral populations are not available for study
  - Use un-admixed modern descendants
- How to reflect the discrepancy in the statistical model
- How to encode ancestral population data

38 Previous Work
1. Allele-frequency-based approach [Figure: probability of allele A in the European and African populations]
2. Model individual-based approach [Figure: European and African individuals]

39 Our approach
- Founder-based population model
[Figure: hypothetical founders for the African and European populations]

40 Local ancestry estimation
[Figure: hypothetical founders, the African and European populations, and an African American individual]

41 How to construct the pool of founders
- Associate each population with a unique infinite HMM and link them hierarchically by using a common base measure under a DP
[Figure: iHMM for population 1 and iHMM for population 2]

42 Hidden Markov Model
- Hidden state: S = (founder indicator k, population indicator j)
- Transition model: recombination between founders and between populations
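The joint hidden state can be made concrete with a finite sketch. Everything beyond the state definition S = (k, j) is an assumption for illustration: the function `build_transition`, the recombination probability `r`, the population-switch probability `s`, and the uniform choice of the next founder are hypothetical and do not reproduce the thesis's nonparametric transition prior. The sketch requires at least two populations.

```python
from itertools import product


def build_transition(n_founders, n_pops, r, s):
    """Transition probabilities over joint states (founder k, population j).

    With probability 1 - r the state is kept. With probability r a
    recombination occurs: the population is kept w.p. 1 - s or switched
    w.p. s, and the next founder is chosen uniformly in either case.
    """
    states = list(product(range(n_founders), range(n_pops)))
    trans = {}
    for (k, j) in states:
        row = {}
        for (k2, j2) in states:
            p = 0.0
            if (k2, j2) == (k, j):
                p += 1.0 - r                                  # no recombination
            if j2 == j:
                p += r * (1.0 - s) / n_founders               # within population
            else:
                p += r * s / ((n_pops - 1) * n_founders)      # across populations
            row[(k2, j2)] = p
        trans[(k, j)] = row
    return states, trans


states, trans = build_transition(n_founders=3, n_pops=2, r=0.1, s=0.2)
print(len(states), "joint states")
print(round(sum(trans[(0, 0)].values()), 12))
```

The population component j of the decoded state at each locus is what a local ancestry estimate reads off.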

43 Local ancestry model using infinite HMMs
- A new haplotype-based ancestry model
- Unique population encoding based on founder haplotypes
- Leverages the structural relatedness between multiple ancestral populations
- Insensitive to the choice of ancestral population data
- Robust under deviation from typical modeling assumptions

44 Error rates as a function of train data size
- Even when only a limited amount of reference data is available, our method still performs very well
[Figure: error rates for Data 1, Data 2, and Data 3; x-axis: number of individuals per train population]

45 Robustness under deviation from modeling assumption
- More accurate (lower error rates) and most robust (errors do not increase much)
[Figure: error rates of our method, HAPMIX, and LAMP; x-axis: deviation from modeling assumption]

46 Analysis of HGDP dataset
- Training: YRI, CEU, JPT+CHB, MAYA
[Figure: results by geographic location: African, Middle East, Europe, Asia, Oceania, American]

47 Analysis of HGDP dataset

48 Discussion
- A new population representation model using a non-parametric Bayesian approach based on infinite HMMs
- Efficiently exploits the shared structural information across multiple populations
- Insensitive to the amount of train data; robust under deviation from modeling assumptions
- Reveals interesting characteristics of the study populations

49 Conclusion
We have presented nonparametric Bayesian models for learning ancestral genetic processes
- Systematically handle grouped data by using hierarchical models
- Explicitly exploit the structural information
- Provide a flexible haplotype inheritance model that can incorporate various genetic properties

50 Conclusion
Future work
- Improve computational complexity
- Computation in dynamic programming: O((KJ)^2 * T) per chromosome per iteration
- Reduce redundant computation
- Parallelization
- Fine-scaled analysis of real datasets
- Combination of admixture mapping and association study
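As a rough worked example of the stated dynamic-programming cost: the values K = 10, J = 2, and T = 1000 are hypothetical, and reading K as the number of founders, J as the number of populations, and T as the number of loci is an assumption based on the hidden-state definition (k, j) given earlier.

```latex
(KJ)^2 \cdot T = (10 \cdot 2)^2 \cdot 1000 = 400 \cdot 1000 = 4 \times 10^{5}
```

Under these values each chromosome costs on the order of 4 x 10^5 basic operations per iteration. Because the cost is quadratic in KJ, doubling the number of founders quadruples it, which is why reducing redundant computation and parallelization are listed as future work.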

51 Dedicated to my family: Celine, KiHyun, and my parents

52 Thank you

53 HDP mixture model for multipopulation haplotype inference

54 Result on three-way admixture

55 Biological Terms
- Genetic admixture: mixing of genetically distant populations
- Local ancestry: locus-by-locus ancestry in an admixed individual chromosome

Learning Ancestral Genetic Processes using Nonparametric Bayesian Models

Learning Ancestral Genetic Processes using Nonparametric Bayesian Models Learning Ancestral Genetic Processes using Nonparametric Bayesian Models Kyung-Ah Sohn CMU-CS-11-136 November 2011 Computer Science Department School of Computer Science Carnegie Mellon University Pittsburgh,

More information

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007 Genetic Polymorphism Single nucleotide polymorphism (SNP) Genetic Polymorphism

More information

Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space

Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space Kyung-Ah Sohn School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 ksohn@cs.cmu.edu Eric P.

More information

A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data

A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data Eric P. Xing January 27 CMU-ML-7-17 Kyung-Ah Sohn School of Computer Science Carnegie Mellon University

More information

Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA

Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA Andreas Sundquist*, Eugene Fratkin*, Chuong B. Do, Serafim Batzoglou Department of Computer Science, Stanford University, Stanford,

More information

Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture

Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Eric P. Xing epxing@cs.cmu.edu Kyung-Ah Sohn sohn@cs.cmu.edu Michael I. Jordan jordan@cs.bereley.edu Yee-Whye

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing

More information

Mathematical models in population genetics II

Mathematical models in population genetics II Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population

More information

Genetic Drift in Human Evolution

Genetic Drift in Human Evolution Genetic Drift in Human Evolution (Part 2 of 2) 1 Ecology and Evolutionary Biology Center for Computational Molecular Biology Brown University Outline Introduction to genetic drift Modeling genetic drift

More information

Modelling Genetic Variations with Fragmentation-Coagulation Processes

Modelling Genetic Variations with Fragmentation-Coagulation Processes Modelling Genetic Variations with Fragmentation-Coagulation Processes Yee Whye Teh, Charles Blundell, Lloyd Elliott Gatsby Computational Neuroscience Unit, UCL Genetic Variations in Populations Inferring

More information

Statistical Methods for studying Genetic Variation in Populations

Statistical Methods for studying Genetic Variation in Populations Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure August 2012 CMU-ML-12-105 Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure CMU-ML-12-105

More information

mstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations

mstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations mstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations Suyash Shringarpure and Eric P. Xing May 2008 CMU-ML-08-105 mstruct: Inference of Population Structure

More information

Populations in statistical genetics

Populations in statistical genetics Populations in statistical genetics What are they, and how can we infer them from whole genome data? Daniel Lawson Heilbronn Institute, University of Bristol www.paintmychromosomes.com Work with: January

More information

Introduction to population genetics & evolution

Introduction to population genetics & evolution Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics

More information

BIOINFORMATICS. StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data

BIOINFORMATICS. StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data BIOINFORMATICS Vol. 00 no. 00 2011 Pages 1 9 StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data Suyash Shringarpure 1, Daegun Won 1 and Eric P. Xing

More information

A Brief Overview of Nonparametric Bayesian Models

A Brief Overview of Nonparametric Bayesian Models A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine

More information

Population Genetics I. Bio

Population Genetics I. Bio Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

Statistical Methods for studying Genetic Variation in Populations

Statistical Methods for studying Genetic Variation in Populations Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure August 2012 CMU-ML-12-105 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the

More information

Frequency Spectra and Inference in Population Genetics

Frequency Spectra and Inference in Population Genetics Frequency Spectra and Inference in Population Genetics Although coalescent models have come to play a central role in population genetics, there are some situations where genealogies may not lead to efficient

More information

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) 12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²

More information

Genotype Imputation. Biostatistics 666

Genotype Imputation. Biostatistics 666 Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives

More information

p(d g A,g B )p(g B ), g B

p(d g A,g B )p(g B ), g B Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)

More information

Similarity Measures and Clustering In Genetics

Similarity Measures and Clustering In Genetics Similarity Measures and Clustering In Genetics Daniel Lawson Heilbronn Institute for Mathematical Research School of mathematics University of Bristol www.paintmychromosomes.com Talk outline Introduction

More information

Fast Approximate MAP Inference for Bayesian Nonparametrics

Fast Approximate MAP Inference for Bayesian Nonparametrics Fast Approximate MAP Inference for Bayesian Nonparametrics Y. Raykov A. Boukouvalas M.A. Little Department of Mathematics Aston University 10th Conference on Bayesian Nonparametrics, 2015 1 Iterated Conditional

More information

Introduction to Advanced Population Genetics

Introduction to Advanced Population Genetics Introduction to Advanced Population Genetics Learning Objectives Describe the basic model of human evolutionary history Describe the key evolutionary forces How demography can influence the site frequency

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities

More information

How robust are the predictions of the W-F Model?

How robust are the predictions of the W-F Model? How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics 70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

Robust demographic inference from genomic and SNP data

Robust demographic inference from genomic and SNP data Robust demographic inference from genomic and SNP data Laurent Excoffier Isabelle Duperret, Emilia Huerta-Sanchez, Matthieu Foll, Vitor Sousa, Isabel Alves Computational and Molecular Population Genetics

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

1. Understand the methods for analyzing population structure in genomes

1. Understand the methods for analyzing population structure in genomes MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Haplotyping as Perfect Phylogeny: A direct approach

Haplotyping as Perfect Phylogeny: A direct approach Haplotyping as Perfect Phylogeny: A direct approach Vineet Bafna Dan Gusfield Giuseppe Lancia Shibu Yooseph February 7, 2003 Abstract A full Haplotype Map of the human genome will prove extremely valuable

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

Supporting Information Text S1

Supporting Information Text S1 Supporting Information Text S1 List of Supplementary Figures S1 The fraction of SNPs s where there is an excess of Neandertal derived alleles n over Denisova derived alleles d as a function of the derived

More information

Computational Approaches to Statistical Genetics

Computational Approaches to Statistical Genetics Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen

More information

Phylogenetic Networks with Recombination

Phylogenetic Networks with Recombination Phylogenetic Networks with Recombination October 17 2012 Recombination All DNA is recombinant DNA... [The] natural process of recombination and mutation have acted throughout evolution... Genetic exchange

More information

Computational Methods for Learning Population History from Large Scale Genetic Variation Datasets

Computational Methods for Learning Population History from Large Scale Genetic Variation Datasets Computational Methods for Learning Population History from Large Scale Genetic Variation Datasets Ming-Chi Tsai CMU-CB-13-102 July 2, 2013 School of Computer Science Carnegie Mellon University Pittsburgh,

More information

Bayesian Nonparametric Learning of Complex Dynamical Phenomena

Bayesian Nonparametric Learning of Complex Dynamical Phenomena Duke University Department of Statistical Science Bayesian Nonparametric Learning of Complex Dynamical Phenomena Emily Fox Joint work with Erik Sudderth (Brown University), Michael Jordan (UC Berkeley),

More information

Detecting selection from differentiation between populations: the FLK and hapflk approach.

Detecting selection from differentiation between populations: the FLK and hapflk approach. Detecting selection from differentiation between populations: the FLK and hapflk approach. Bertrand Servin bservin@toulouse.inra.fr Maria-Ines Fariello, Simon Boitard, Claude Chevalet, Magali SanCristobal,

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Lecture 13: Population Structure. October 8, 2012

Lecture 13: Population Structure. October 8, 2012 Lecture 13: Population Structure October 8, 2012 Last Time Effective population size calculations Historical importance of drift: shifting balance or noise? Population structure Today Course feedback The

More information

Hierarchical Dirichlet Processes

Hierarchical Dirichlet Processes Hierarchical Dirichlet Processes Yee Whye Teh, Michael I. Jordan, Matthew J. Beal and David M. Blei Computer Science Div., Dept. of Statistics Dept. of Computer Science University of California at Berkeley

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm

More information

Big Idea #1: The process of evolution drives the diversity and unity of life

Big Idea #1: The process of evolution drives the diversity and unity of life BIG IDEA! Big Idea #1: The process of evolution drives the diversity and unity of life Key Terms for this section: emigration phenotype adaptation evolution phylogenetic tree adaptive radiation fertility

More information

Bayesian Nonparametrics for Speech and Signal Processing

Bayesian Nonparametrics for Speech and Signal Processing Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Supplementary Materials for

Supplementary Materials for www.sciencemag.org/content/343/6172/747/suppl/dc1 Supplementary Materials for A Genetic Atlas of Human Admixture History Garrett Hellenthal, George B. J. Busby, Gavin Band, James F. Wilson, Cristian Capelli,

More information

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations

More information

An Integrated Approach for the Assessment of Chromosomal Abnormalities

An Integrated Approach for the Assessment of Chromosomal Abnormalities An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 26, 2007 Karyotypes Karyotypes General Cytogenetics

More information

Bayesian Nonparametrics: Models Based on the Dirichlet Process

Bayesian Nonparametrics: Models Based on the Dirichlet Process Bayesian Nonparametrics: Models Based on the Dirichlet Process Alessandro Panella Department of Computer Science University of Illinois at Chicago Machine Learning Seminar Series February 18, 2013 Alessandro

More information

URN MODELS: the Ewens Sampling Lemma

URN MODELS: the Ewens Sampling Lemma Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 3, 2014 1 2 3 4 Mutation Mutation: typical values for parameters Equilibrium Probability of fixation 5 6 Ewens Sampling

More information

Spatial Normalized Gamma Process

Spatial Normalized Gamma Process Spatial Normalized Gamma Process Vinayak Rao Yee Whye Teh Presented at NIPS 2009 Discussion and Slides by Eric Wang June 23, 2010 Outline Introduction Motivation The Gamma Process Spatial Normalized Gamma

More information

SAT in Bioinformatics: Making the Case with Haplotype Inference

SAT in Bioinformatics: Making the Case with Haplotype Inference SAT in Bioinformatics: Making the Case with Haplotype Inference Inês Lynce 1 and João Marques-Silva 2 1 IST/INESC-ID, Technical University of Lisbon, Portugal ines@sat.inesc-id.pt 2 School of Electronics

More information

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES: .5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the

More information
