Learning ancestral genetic processes using nonparametric Bayesian models
1 Learning ancestral genetic processes using nonparametric Bayesian models. Kyung-Ah Sohn, October 31, 2011. Committee members: Eric P. Xing (Chair), Zoubin Ghahramani, Russell Schwartz, Kathryn Roeder, and Matthew Stephens (University of Chicago). In partial fulfillment of the requirements for the degree of Doctor of Philosophy.
2 Recent explosion of genomic data
3 Inference on human migration history. References: Li, Hui, Kelly Cho, Judith R. Kidd, and Kenneth K. Kidd. Genetic landscape of Eurasia and 'admixture' in Uyghurs. American Journal of Human Genetics 85 (6) (December): 934-7; author reply. Xu, Shuhua, and Li Jin. A genome-wide analysis of admixture in Uyghurs and a high-density admixture map for disease-gene discovery. American Journal of Human Genetics 83 (3) (September):
4 Finding disease genes. Search for causal genes and mutations (figure: a G to A mutation).
5 Key challenges. How to model the complex inheritance processes underlying the data: a realistic and flexible inheritance model is needed. Images courtesy of thurj.org and stormfront.org.
6 Key challenges. Lack of large-scale samples, and data in a high-dimensional space: need to exploit structural information in the data from genetically distinct but related groups. Image courtesy of Nature Reviews: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. 3 (5) (May): doi: /nrg795.
7 Thesis goal. Develop flexible nonparametric Bayesian models for learning ancestral genetic processes that efficiently utilize structural information in genetic data. Biologically, provide practical tools that can reveal interesting characteristics of study populations. Statistically, validate how nonparametric Bayesian models can help improve the performance of real applications.
8 Theoretical component. Nonparametric Bayesian models based on the Dirichlet process and its extensions, such as the hierarchical Dirichlet process and the infinite hidden Markov model, combined with a haplotype inheritance model describing various genetic processes.
9 SNP haplotypes. Genetic polymorphism: a difference in DNA sequence among individuals or populations. Single nucleotide polymorphism (SNP): DNA sequence variation occurring when a single nucleotide (A, T, C, or G) differs between members of a group. Haplotypes and genotypes: a genotype is a sequence of unordered pairs of haplotype alleles in a diploid; e.g., the two haplotypes (AATC) and (ACGG) form the genotype sequence ({A,A}, {A,C}, {G,T}, {C,G}).
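The genotype example on this slide can be reproduced with a short sketch. The function names `genotype_from_haplotypes` and `count_compatible_pairs` are illustrative assumptions, not part of any software from the thesis. The second function relies on a standard counting fact: a genotype with h heterozygous sites is consistent with 2^(h-1) unordered haplotype pairs, which is why recovering haplotypes from genotypes is a nontrivial inference problem.

```python
def genotype_from_haplotypes(hap1, hap2):
    """Pair up the alleles of two haplotypes site by site.

    Each site becomes an unordered pair, stored as a sorted tuple so that
    swapping the two haplotypes gives the same genotype.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must cover the same sites")
    return [tuple(sorted((a, b))) for a, b in zip(hap1, hap2)]


def count_compatible_pairs(genotype):
    """Number of unordered haplotype pairs consistent with a genotype."""
    het_sites = sum(1 for a, b in genotype if a != b)
    return 1 if het_sites == 0 else 2 ** (het_sites - 1)


genotype = genotype_from_haplotypes("AATC", "ACGG")
print(genotype)                          # the genotype sequence from the slide
print(count_compatible_pairs(genotype))  # three heterozygous sites
```

The phase information (which allele sits on which chromosome) is exactly what is lost when going from the two haplotypes to the genotype.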
10 Thesis outline. 1. Haplotype inference from multi-population data. 2. Joint inference of population structure and recombination events. 3. Local ancestry estimation in admixed populations.
11 Application 1. A hierarchical Dirichlet process mixture model for haplotype reconstruction from multi-population data (International Conference on Machine Learning 2006; Annals of Applied Statistics 2009).
12 Motivation. Few existing programs explicitly use population labels in haplotype inference, even though many datasets come from genetically distinct populations. Each population has its own characteristics, but the populations are related and may share some components. Goal: develop a principled approach that explicitly exploits the population labels.
13 Bayesian haplotype inference model. Figure (graphical model): founder haplotypes A give rise, through a mutation model, to the individual haplotypes H_{n1} and H_{n2}, which in turn give rise, through a genotyping model, to the observed individual genotype G_n. Each individual haplotype is a mixture of founder haplotypes. Eric P. Xing, Michael I. Jordan, and Roded Sharan. Bayesian haplotype inference via the Dirichlet process. Journal of Computational Biology 14 (3) (April): doi: /cmb
14 Bayesian haplotype inference model. Figure (graphical model): the same model as on the previous slide, with a Dirichlet process prior (labels G_0 and F) placed over the founder haplotypes. Each individual haplotype is a mixture of founder haplotypes.
15 Dirichlet process. A CDF $F$ on a space $\Omega$ follows a Dirichlet process if, for any finite measurable partition $(B_1, B_2, \ldots, B_m)$ of $\Omega$, the joint distribution of the random variables is $(F(B_1), F(B_2), \ldots, F(B_m)) \sim \mathrm{Dirichlet}(\alpha G_0(B_1), \ldots, \alpha G_0(B_m))$, where $G_0$ is the base measure and $\alpha$ is the scale parameter.
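A worked consequence of this definition may help fix ideas. Writing $\alpha$ for the scale parameter and assuming $G_0$ is a probability measure, apply the definition to the two-set partition $(B, B^c)$; the Dirichlet then reduces to a Beta distribution:

```latex
F(B) \sim \mathrm{Beta}\bigl(\alpha G_0(B),\; \alpha\,(1 - G_0(B))\bigr),
\qquad
\mathbb{E}[F(B)] = G_0(B),
\qquad
\mathrm{Var}[F(B)] = \frac{G_0(B)\,\bigl(1 - G_0(B)\bigr)}{\alpha + 1}.
```

So the base measure sets the mean of the random distribution, and a larger scale parameter concentrates it more tightly around the base measure.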
16 Dirichlet process: Pólya urn model. We associate mixture components (founders) with colors and samples (individual chromosomes) with balls in the Pólya urn model.
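The urn can be simulated in a few lines. This is a minimal sketch of the standard Pólya urn for a Dirichlet process, not code from the thesis; the function name `polya_urn` and the parameter name `alpha` (the scale parameter) are assumptions made for illustration. Ball i takes a brand-new color with probability proportional to `alpha`, or copies an existing color with probability proportional to the number of balls already carrying it.

```python
import random


def polya_urn(num_balls, alpha, seed=0):
    """Assign a color (founder index) to each ball (chromosome)."""
    rng = random.Random(seed)
    colors = []   # color index of each ball, in order of arrival
    counts = []   # counts[k] = number of balls with color k
    for _ in range(num_balls):
        # Existing colors are weighted by their counts; the extra last
        # slot, weighted by alpha, stands for "open a new color".
        pick = rng.choices(range(len(counts) + 1), weights=counts + [alpha])[0]
        if pick == len(counts):
            counts.append(0)
        counts[pick] += 1
        colors.append(pick)
    return colors, counts


colors, counts = polya_urn(num_balls=50, alpha=1.0)
print(len(counts), "colors used by", len(colors), "balls")
```

The number of colors is not fixed in advance; it grows with the number of balls at a rate controlled by `alpha`.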
17 DP mixture model. A flexible haplotype inheritance model: the number of founders can grow as large as the data require. A reasonable approximation to coalescent theory (the coalescent with mutation).
18 Multi-population data
19 A hierarchical Dirichlet process mixture. Two-level Pólya urn scheme: assume a population-specific DP; use a common base measure distributed as another Dirichlet process; atoms (founder haplotypes) are shared across populations. Figure labels: mutation model, genotyping model.
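The two-level urn can be sketched as follows. This is an illustrative simulation under stated assumptions, not the thesis implementation: `alpha` is taken to be the scale of each population-level urn, `gamma` the scale of the shared top-level urn, and the function name `two_level_urn` is hypothetical. Each chromosome first consults its own population's urn; only when that urn asks for a new color is the shared top-level urn consulted, which is what lets founders recur across populations.

```python
import random


def two_level_urn(pop_sizes, alpha, gamma, seed=0):
    """Draw founder indices for several populations from a two-level urn."""
    rng = random.Random(seed)
    top_counts = []      # top_counts[k] = times founder k was drawn at the top level
    assignments = []     # assignments[j] = founder index of each chromosome in population j
    for size in pop_sizes:
        local = {}       # founder index -> number of balls in this population's urn
        draws = []
        for _ in range(size):
            founders = list(local)
            weights = [local[k] for k in founders] + [alpha]
            pick = rng.choices(range(len(founders) + 1), weights=weights)[0]
            if pick < len(founders):
                k = founders[pick]            # reuse a founder already seen locally
            else:
                # Ask the shared urn: an existing founder or a brand-new one.
                k = rng.choices(range(len(top_counts) + 1),
                                weights=top_counts + [gamma])[0]
                if k == len(top_counts):
                    top_counts.append(0)
                top_counts[k] += 1
            local[k] = local.get(k, 0) + 1
            draws.append(k)
        assignments.append(draws)
    return assignments, top_counts


assignments, top_counts = two_level_urn([30, 30], alpha=1.0, gamma=1.0)
shared = set(assignments[0]) & set(assignments[1])
print(len(top_counts), "founders in total;", len(shared), "shared by both populations")
```

Founder indices are global, so the same index appearing in two populations means the two populations share that founder.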
20 Performance. Comparison of DP and HDP mixtures. Mode-1: ignore labels. Mode-2: handle each group separately. Haplotype inference serves as a measurable application of DP and HDP mixture models.
21 Comparison with benchmark algorithms
22 Sensitivity analysis on hyperparameters
23 Discussion. A hierarchical Dirichlet process mixture model explicitly leverages population labels in haplotype inference. Recombination is not taken into account; a divide-and-conquer scheme called Partition-Ligation is used to deal with long sequences.
24 Application 2. Spectrum: Joint Inference of Population Structure and Recombination Events (NIPS 2007, ISMB 2007, BA 2007).
25 Inheritance process. Figure: chromosomes after one generation and after many generations.
26 A new model for population representation. Basic assumption: modern chromosomes are derived from hypothetical founder chromosomes via recombination and mutation. Each individual chromosome is a mosaic of founders. Figure labels: hypothetical founder chromosomes, individual chromosome.
27 Population analysis. Given population data, recover the pool of hypothetical founders and their association with individual haplotypes. Figure labels: haplotypes, hypothetical founders, association.
28 How to recover the association. Use hidden Markov models (HMMs): a hidden state sequence c_1, c_2, c_3, ..., c_n underlies the observation sequence. How many founders? Use an infinite HMM (Beal et al. (2002), Teh et al. (2006)).
29 Infinite hidden Markov model (hidden Markov Dirichlet process). Transitions among an infinite number of founders. The infinite-dimensional transition matrix is modeled by a hierarchical Dirichlet process: each row is modeled with a DP, and the rows are coupled by a common base measure distributed as another DP.
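A finite truncation makes the row-coupling concrete. The sketch below is an approximation chosen for illustration, not the inference scheme used in the thesis: it truncates the infinite model to `num_states` founders, draws shared weights by stick-breaking with an assumed top-level scale `gamma`, and then draws every row of the transition matrix from a Dirichlet centered on those shared weights with an assumed row-level scale `alpha`. All function and parameter names are hypothetical.

```python
import random


def stick_breaking(gamma, num_states, rng):
    """Truncated stick-breaking weights that sum to one."""
    weights, remaining = [], 1.0
    for _ in range(num_states - 1):
        fraction = rng.betavariate(1.0, gamma)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last state absorbs the leftover stick
    return weights


def dirichlet(params, rng):
    """Dirichlet draw via normalized gamma variates."""
    # The tiny floor keeps gammavariate's shape argument strictly positive.
    draws = [rng.gammavariate(max(p, 1e-12), 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]


def truncated_ihmm_transitions(num_states, gamma, alpha, seed=0):
    """Shared founder weights plus one transition row per founder."""
    rng = random.Random(seed)
    shared = stick_breaking(gamma, num_states, rng)
    rows = [dirichlet([alpha * w for w in shared], rng) for _ in range(num_states)]
    return shared, rows


shared, rows = truncated_ihmm_transitions(num_states=6, gamma=2.0, alpha=5.0)
for row in rows:
    print(" ".join(f"{p:.2f}" for p in row))
```

Because every row is centered on the same shared weights, founders that are popular overall tend to be likely transition targets from every state, which is the coupling the slide describes.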
30 Infinite HMM model for population analysis. Figure (graphical model): founders A, hidden founder indicators c_1, c_2, c_3, ..., c_n, and haplotypes H.
31 Infinite HMM model for population analysis. Founder allele reconstruction; inferring population structure; inferring recombination hotspots. Figure: the same graphical model as on the previous slide.
32 Discussion. A new haplotype model using an infinite hidden Markov model allows joint inference of population structure and recombination events, and provides an alternative way of characterizing a population.
33 Application 3. Robust Estimation of Local Ancestry in Admixed Populations (in submission).
34 Recently admixed populations. African Americans: multiple sources of ancestry (African / European). Figure: local genetic ancestry along a chromosome, African vs. European.
35 Admixture mapping in recently admixed populations. Figure: percent ancestry from population A (y-axis, 0% to 80%) against chromosome position (x-axis) for patients and controls, with a candidate disease locus marked.
36 Local genetic ancestry estimation. Input: ancestral populations (training data: African, European) and an admixed individual (test data). Output: an ancestry label at each position, e.g. aaaaaaaeeee.
37 Challenges. Real ancestral populations are not available for study, so un-admixed modern descendants are used instead. How to reflect this discrepancy in the statistical model? How to encode ancestral population data?
38 Previous work. 1. Allele-frequency-based approach (figure: probability of allele A, European vs. African). Model individual-based approach (figure: European and African).
39 Our approach. Founder-based population model. Figure labels: hypothetical founders, African, European.
40 Local ancestry estimation. Figure labels: hypothetical founders, African, European, African American.
41 How to construct the pool of founders. Associate each population with its own infinite HMM and link them hierarchically by using a common base measure under a DP. Figure labels: iHMM for population 1, iHMM for population 2.
42 Hidden Markov model. Hidden state: S = (founder indicator k, population indicator j). Transition model: recombination between founders and between populations.
43 Local ancestry model using infinite HMMs. A new haplotype-based ancestry model. Unique population encoding based on founder haplotypes. Leverages the structural relatedness between multiple ancestral populations. Insensitive to the choice of ancestral population data. Robust under deviations from typical modeling assumptions.
44 Error rates as a function of training data size. Even when only a limited amount of reference data is available, our method still performs very well. Figure: error rates for Data 1, Data 2, and Data 3; x-axis: number of individuals per training population.
45 Robustness under deviation from modeling assumptions. Methods compared: our method, HAPMIX, LAMP. Our method is more accurate (lower error rates) and the most robust (errors do not increase much). Figure x-axis: deviation from modeling assumptions.
46 Analysis of HGDP dataset. Training: YRI, CEU, JPT+CHB, MAYA. Geographic locations: African, Middle East, Europe, Asia, Oceania, American.
47 Analysis of HGDP dataset
48 Discussion. A new population representation model using a nonparametric Bayesian approach based on infinite HMMs. Efficiently exploits the shared structural information across multiple populations. Insensitive to the amount of training data and robust under deviations from modeling assumptions. Reveals interesting characteristics of the study populations.
49 Conclusion. We have presented nonparametric Bayesian models for learning ancestral genetic processes. They systematically handle grouped data by using hierarchical models, explicitly exploit the structural information, and provide a flexible haplotype inheritance model that can incorporate various genetic properties.
50 Conclusion: future work. Improve computational complexity: computation in dynamic programming is $O((KJ)^2 T)$ per chromosome per iteration; reduce redundant computation; parallelization. Fine-scale analysis of real datasets. Combination of admixture mapping and association studies.
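The quadratic cost quoted above can be seen in a generic forward recursion over the joint state space. The sketch below assumes K founders and J populations flattened into K*J hidden states, and T sites along the chromosome; it is a textbook scaled forward pass written for illustration, not the sampler used in the thesis, and the names `state_index` and `forward_log_likelihood` are hypothetical.

```python
import math


def state_index(k, j, num_founders):
    """Flatten the pair (founder k, population j) into a single state index."""
    return j * num_founders + k


def forward_log_likelihood(init, trans, emit):
    """Scaled forward recursion for an HMM.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[t][s]  : probability of the observation at site t under state s
    """
    num_states = len(init)
    alpha = [init[s] * emit[0][s] for s in range(num_states)]
    log_lik = 0.0
    for t in range(len(emit)):
        if t > 0:
            # Every target state sums over every source state: with K*J
            # states this is (K*J)^2 multiply-adds per site, repeated T times.
            alpha = [
                emit[t][s] * sum(alpha[r] * trans[r][s] for r in range(num_states))
                for s in range(num_states)
            ]
        scale = sum(alpha)            # rescale to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Toy example: K = 2 founders, J = 1 population, T = 3 sites, all uniform.
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(forward_log_likelihood([0.5, 0.5], uniform, [[0.5, 0.5]] * 3))
```

The double loop over source and target states is the term that the future-work items (reducing redundant computation, parallelization) would target.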
51 Dedicated to my family: Celine, KiHyun, and my parents
52 Thank you
53 HDP mixture model for multi-population haplotype inference
54 Result on three-way admixture
55 Biological terms. Genetic admixture: mixing of genetically distant populations. Local ancestry: locus-by-locus ancestry in an admixed individual chromosome.
Learning Ancestral Genetic Processes using Nonparametric Bayesian Models
Learning Ancestral Genetic Processes using Nonparametric Bayesian Models Kyung-Ah Sohn CMU-CS-11-136 November 2011 Computer Science Department School of Computer Science Carnegie Mellon University Pittsburgh,
More informationCSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation
CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007 Genetic Polymorphism Single nucleotide polymorphism (SNP) Genetic Polymorphism
More informationHidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space Kyung-Ah Sohn School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 ksohn@cs.cmu.edu Eric P.
More informationA Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data
A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data Eric P. Xing January 27 CMU-ML-7-17 Kyung-Ah Sohn School of Computer Science Carnegie Mellon University
More informationEffect of Genetic Divergence in Identifying Ancestral Origin using HAPAA
Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA Andreas Sundquist*, Eugene Fratkin*, Chuong B. Do, Serafim Batzoglou Department of Computer Science, Stanford University, Stanford,
More informationBayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture
Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Eric P. Xing epxing@cs.cmu.edu Kyung-Ah Sohn sohn@cs.cmu.edu Michael I. Jordan jordan@cs.bereley.edu Yee-Whye
More informationAdvanced Machine Learning
Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing
More informationMathematical models in population genetics II
Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population
More informationGenetic Drift in Human Evolution
Genetic Drift in Human Evolution (Part 2 of 2) 1 Ecology and Evolutionary Biology Center for Computational Molecular Biology Brown University Outline Introduction to genetic drift Modeling genetic drift
More informationModelling Genetic Variations with Fragmentation-Coagulation Processes
Modelling Genetic Variations with Fragmentation-Coagulation Processes Yee Whye Teh, Charles Blundell, Lloyd Elliott Gatsby Computational Neuroscience Unit, UCL Genetic Variations in Populations Inferring
More informationStatistical Methods for studying Genetic Variation in Populations
Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure August 2012 CMU-ML-12-105 Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure CMU-ML-12-105
More informationmstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations
mstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations Suyash Shringarpure and Eric P. Xing May 2008 CMU-ML-08-105 mstruct: Inference of Population Structure
More informationPopulations in statistical genetics
Populations in statistical genetics What are they, and how can we infer them from whole genome data? Daniel Lawson Heilbronn Institute, University of Bristol www.paintmychromosomes.com Work with: January
More informationIntroduction to population genetics & evolution
Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics
More informationBIOINFORMATICS. StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data
BIOINFORMATICS Vol. 00 no. 00 2011 Pages 1 9 StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data Suyash Shringarpure 1, Daegun Won 1 and Eric P. Xing
More informationA Brief Overview of Nonparametric Bayesian Models
A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine
More informationPopulation Genetics I. Bio
Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn
More informationMajor questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.
Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary
More informationStatistical Methods for studying Genetic Variation in Populations
Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure August 2012 CMU-ML-12-105 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the
More informationFrequency Spectra and Inference in Population Genetics
Frequency Spectra and Inference in Population Genetics Although coalescent models have come to play a central role in population genetics, there are some situations where genealogies may not lead to efficient
More informationChapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)
12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²
More informationGenotype Imputation. Biostatistics 666
Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives
More informationp(d g A,g B )p(g B ), g B
Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)
More informationSimilarity Measures and Clustering In Genetics
Similarity Measures and Clustering In Genetics Daniel Lawson Heilbronn Institute for Mathematical Research School of mathematics University of Bristol www.paintmychromosomes.com Talk outline Introduction
More informationFast Approximate MAP Inference for Bayesian Nonparametrics
Fast Approximate MAP Inference for Bayesian Nonparametrics Y. Raykov A. Boukouvalas M.A. Little Department of Mathematics Aston University 10th Conference on Bayesian Nonparametrics, 2015 1 Iterated Conditional
More informationIntroduction to Advanced Population Genetics
Introduction to Advanced Population Genetics Learning Objectives Describe the basic model of human evolutionary history Describe the key evolutionary forces How demography can influence the site frequency
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities
More informationHow robust are the predictions of the W-F Model?
How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population
More information6 Introduction to Population Genetics
70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,
More informationRobust demographic inference from genomic and SNP data
Robust demographic inference from genomic and SNP data Laurent Excoffier Isabelle Duperret, Emilia Huerta-Sanchez, Matthieu Foll, Vitor Sousa, Isabel Alves Computational and Molecular Population Genetics
More informationProcesses of Evolution
15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection
More information1. Understand the methods for analyzing population structure in genomes
MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationHaplotyping as Perfect Phylogeny: A direct approach
Haplotyping as Perfect Phylogeny: A direct approach Vineet Bafna Dan Gusfield Giuseppe Lancia Shibu Yooseph February 7, 2003 Abstract A full Haplotype Map of the human genome will prove extremely valuable
More information6 Introduction to Population Genetics
Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,
More informationSupporting Information Text S1
Supporting Information Text S1 List of Supplementary Figures S1 The fraction of SNPs s where there is an excess of Neandertal derived alleles n over Denisova derived alleles d as a function of the derived
More informationComputational Approaches to Statistical Genetics
Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen
More informationPhylogenetic Networks with Recombination
Phylogenetic Networks with Recombination October 17 2012 Recombination All DNA is recombinant DNA... [The] natural process of recombination and mutation have acted throughout evolution... Genetic exchange
More informationComputational Methods for Learning Population History from Large Scale Genetic Variation Datasets
Computational Methods for Learning Population History from Large Scale Genetic Variation Datasets Ming-Chi Tsai CMU-CB-13-102 July 2, 2013 School of Computer Science Carnegie Mellon University Pittsburgh,
More informationBayesian Nonparametric Learning of Complex Dynamical Phenomena
Duke University Department of Statistical Science Bayesian Nonparametric Learning of Complex Dynamical Phenomena Emily Fox Joint work with Erik Sudderth (Brown University), Michael Jordan (UC Berkeley),
More informationDetecting selection from differentiation between populations: the FLK and hapflk approach.
Detecting selection from differentiation between populations: the FLK and hapflk approach. Bertrand Servin bservin@toulouse.inra.fr Maria-Ines Fariello, Simon Boitard, Claude Chevalet, Magali SanCristobal,
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More informationLecture 13: Population Structure. October 8, 2012
Lecture 13: Population Structure October 8, 2012 Last Time Effective population size calculations Historical importance of drift: shifting balance or noise? Population structure Today Course feedback The
More informationHierarchical Dirichlet Processes
Hierarchical Dirichlet Processes Yee Whye Teh, Michael I. Jordan, Matthew J. Beal and David M. Blei Computer Science Div., Dept. of Statistics Dept. of Computer Science University of California at Berkeley
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm
More informationBig Idea #1: The process of evolution drives the diversity and unity of life
BIG IDEA! Big Idea #1: The process of evolution drives the diversity and unity of life Key Terms for this section: emigration phenotype adaptation evolution phylogenetic tree adaptive radiation fertility
More informationBayesian Nonparametrics for Speech and Signal Processing
Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer
More informationChapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang
Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check
More informationSupplementary Materials for
www.sciencemag.org/content/343/6172/747/suppl/dc1 Supplementary Materials for A Genetic Atlas of Human Admixture History Garrett Hellenthal, George B. J. Busby, Gavin Band, James F. Wilson, Cristian Capelli,
More informationInfinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix
Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations
More informationAn Integrated Approach for the Assessment of Chromosomal Abnormalities
An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 26, 2007 Karyotypes Karyotypes General Cytogenetics
More informationBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process Alessandro Panella Department of Computer Science University of Illinois at Chicago Machine Learning Seminar Series February 18, 2013 Alessandro
More informationURN MODELS: the Ewens Sampling Lemma
Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 3, 2014 1 2 3 4 Mutation Mutation: typical values for parameters Equilibrium Probability of fixation 5 6 Ewens Sampling
More informationSpatial Normalized Gamma Process
Spatial Normalized Gamma Process Vinayak Rao Yee Whye Teh Presented at NIPS 2009 Discussion and Slides by Eric Wang June 23, 2010 Outline Introduction Motivation The Gamma Process Spatial Normalized Gamma
More informationSAT in Bioinformatics: Making the Case with Haplotype Inference
SAT in Bioinformatics: Making the Case with Haplotype Inference Inês Lynce 1 and João Marques-Silva 2 1 IST/INESC-ID, Technical University of Lisbon, Portugal ines@sat.inesc-id.pt 2 School of Electronics
More information1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:
.5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the
More informationHidden Markov models in population genetics and evolutionary biology
Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models
More informationBIOINFORMATICS. SequenceLDhot: Detecting Recombination Hotspots. Paul Fearnhead a 1 INTRODUCTION 2 METHOD
BIOINFORMATICS Vol. 00 no. 00 2006 Pages 1 5 SequenceLDhot: Detecting Recombination Hotspots Paul Fearnhead a a Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK ABSTRACT
More informationBinomial Mixture Model-based Association Tests under Genetic Heterogeneity
Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,
More informationSupporting Information
Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider
More informationInference in Explicit Duration Hidden Markov Models
Inference in Explicit Duration Hidden Markov Models Frank Wood Joint work with Chris Wiggins, Mike Dewar Columbia University November, 2011 Wood (Columbia University) EDHMM Inference November, 2011 1 /
More informationWeek 7.2 Ch 4 Microevolutionary Proceses
Week 7.2 Ch 4 Microevolutionary Proceses 1 Mendelian Traits vs Polygenic Traits Mendelian -discrete -single gene determines effect -rarely influenced by environment Polygenic: -continuous -multiple genes
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large
More informationAn Integrated Approach for the Assessment of Chromosomal Abnormalities
An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 6, 2007 Karyotypes Mitosis and Meiosis Meiosis Meiosis
More informationThe E-M Algorithm in Genetics. Biostatistics 666 Lecture 8
The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as
More informationDemographic Inference with Coalescent Hidden Markov Model
Demographic Inference with Coalescent Hidden Markov Model Jade Y. Cheng Thomas Mailund Bioinformatics Research Centre Aarhus University Denmark The Thirteenth Asia Pacific Bioinformatics Conference HsinChu,
More informationNon-parametric Clustering with Dirichlet Processes
Non-parametric Clustering with Dirichlet Processes Timothy Burns SUNY at Buffalo Mar. 31 2009 T. Burns (SUNY at Buffalo) Non-parametric Clustering with Dirichlet Processes Mar. 31 2009 1 / 24 Introduction
More information2. Map genetic distance between markers
Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,
More information(Genome-wide) association analysis
(Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by
More informationThe Lander-Green Algorithm. Biostatistics 666 Lecture 22
The Lander-Green Algorithm Biostatistics 666 Lecture Last Lecture Relationship Inferrence Likelihood of genotype data Adapt calculation to different relationships Siblings Half-Siblings Unrelated individuals
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationEstimating Recombination Rates. LRH selection test, and recombination
Estimating Recombination Rates LRH selection test, and recombination Recall that LRH tests for selection by looking at frequencies of specific haplotypes. Clearly the test is dependent on the recombination
More informationResearch Statement on Statistics Jun Zhang
Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation
More information27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1
10-708: Probabilistic Graphical Models, Spring 2015 27: Case study with popular GM III Lecturer: Eric P. Xing Scribes: Hyun Ah Song & Elizabeth Silver 1 Introduction: Gene association mapping for complex
More informationIntraspecific gene genealogies: trees grafting into networks
Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation
More informationInteger Programming in Computational Biology. D. Gusfield University of California, Davis Presented December 12, 2016.!
Integer Programming in Computational Biology D. Gusfield University of California, Davis Presented December 12, 2016. There are many important phylogeny problems that depart from simple tree models: Missing
Hidden Markov models: from the beginning to the state of the art
Hidden Markov models: from the beginning to the state of the art Frank Wood Columbia University November, 2011 Wood (Columbia University) HMMs November, 2011 1 / 44 Outline Overview of hidden Markov models
Bayesian Nonparametrics
Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichlet
Haplotyping. Biostatistics 666
Haplotyping Biostatistics 666 Previously Introduction to the E-M algorithm Approach for likelihood optimization Examples related to gene counting Allele frequency estimation recessive disorder Allele frequency
Applied Bayesian Nonparametrics 3. Infinite Hidden Markov Models
Applied Bayesian Nonparametrics 3. Infinite Hidden Markov Models Tutorial at CVPR 2012 Erik Sudderth Brown University Work by E. Fox, E. Sudderth, M. Jordan, & A. Willsky AOAS 2011: A Sticky HDP-HMM with
Nonparametric Factor Analysis with Beta Process Priors
Nonparametric Factor Analysis with Beta Process Priors John Paisley Lawrence Carin Department of Electrical & Computer Engineering Duke University, Durham, NC 7708 jwp4@ee.duke.edu lcarin@ee.duke.edu Abstract
Genetic Association Studies in the Presence of Population Structure and Admixture
Genetic Association Studies in the Presence of Population Structure and Admixture Purushottam W. Laud and Nicholas M. Pajewski Division of Biostatistics Department of Population Health Medical College
Dr. Amira A. AL-Hosary
Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological
STA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field (PRF) model, it is assumed that non-synonymous mutations at a given gene are either
Feature Selection via Block-Regularized Regression
Feature Selection via Block-Regularized Regression Seyoung Kim School of Computer Science Carnegie Mellon University Pittsburgh, PA 3 Eric Xing School of Computer Science Carnegie Mellon University Pittsburgh,
List the five conditions that can disturb genetic equilibrium in a population.(10)
List the five conditions that can disturb genetic equilibrium in a population.(10) The five conditions are non-random mating, small population size, immigration or emigration, mutations, and natural selection.
25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2013 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche) 1 Graph
arxiv: v1 [cs.lg] 22 Jun 2009
Bayesian two-sample tests arxiv:0906.4032v1 [cs.lg] 22 Jun 2009 Karsten M. Borgwardt 1 and Zoubin Ghahramani 2 1 Max-Planck-Institutes Tübingen, 2 University of Cambridge June 22, 2009 Abstract In this
Population Structure
Ch 4: Population Subdivision Population Structure • most natural populations exist across a landscape (or seascape) that is more or less divided into areas of suitable habitat • to the extent that populations
Estimating Evolutionary Trees. Phylogenetic Methods
Estimating Evolutionary Trees • if the data are consistent with infinite sites then all methods should yield the same tree • it gets more complicated when there is homoplasy, i.e., parallel or convergent
CS1820 Notes. hgupta1, kjline, smechery. April 3-April 5. output: plausible Ancestral Recombination Graph (ARG)
CS1820 Notes hgupta1, kjline, smechery April 3-April 5 April 3 Notes 1 Minichiello-Durbin Algorithm input: set of sequences output: plausible Ancestral Recombination Graph (ARG) note: the optimal ARG is
Learning in Bayesian Networks
Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks
Bayesian Hidden Markov Models and Extensions
Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling
Learning gene regulatory networks Statistical methods for haplotype inference Part I
Learning gene regulatory networks Statistical methods for haplotype inference Part I Input: Measurement of mRNA levels of all genes from microarray or RNA sequencing Samples (e.g. 200 patients with lung
The problem Lineage model Examples. The lineage model
The lineage model A Bayesian approach to inferring community structure and evolutionary history from whole-genome metagenomic data Jack O'Brien Bowdoin College with Daniel Falush and Xavier Didelot Cambridge,
Visualizing Population Genetics
Visualizing Population Genetics Supervisors: Rachel Fewster, James Russell, Paul Murrell Louise McMillan 1 December 2015 Louise McMillan Visualizing Population Genetics 1 December 2015 1 / 29 Outline 1
Linear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes
Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes Yee Whye Teh (1), Michael I. Jordan (1,2), Matthew J. Beal (3) and David M. Blei (1) (1) Computer Science Div., (2) Dept. of Statistics
Bayesian analysis of the Hardy-Weinberg equilibrium model
Bayesian analysis of the Hardy-Weinberg equilibrium model Eduardo Gutiérrez Peña Department of Probability and Statistics IIMAS, UNAM 6 April, 2010 Outline Statistical Inference 1 Statistical Inference