CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation
|
|
- Dylan Ray
- 5 years ago
- Views:
Transcription
1 CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007
2 Genetic Polymorphism Single nucleotide polymorphism (SNP)
3 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus
4 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G}
5 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs
6 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele
7 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype
8 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype List of alleles in a local region of a chromosome
9 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype List of alleles in a local region of a chromosome Inherited as a unit, if there is no recombination
10 Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype List of alleles in a local region of a chromosome Inherited as a unit, if there is no recombination Repeated recombinations between ancestral haplotypes
11 Genetic Polymorphism (Contd.) Linkage disequilibrium (LD)
12 Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci
13 Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD
14 Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD Infer chromosomal recombination hotspots
15 Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD Infer chromosomal recombination hotspots Help understand origin and characteristics of genetic variation
16 Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD Infer chromosomal recombination hotspots Help understand origin and characteristics of genetic variation Analyze genetic variation to reconstruct evolutionary history
17 Haplotype Recombination and Inheritance
18 Hidden Markov Process Generative model for choosing recombination sites
19 Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process
20 Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes
21 Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes Transition probabilities correspond to recombination rates
22 Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes Transition probabilities correspond to recombination rates Emission model corresponds to mutation process that give descendants
23 Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes Transition probabilities correspond to recombination rates Emission model corresponds to mutation process that give descendants Implemented using a Hidden Markov Dirichlet Process (HMDP)
24 Dirichlet Process Mixtures We know the basics of DPMs
25 Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model
26 Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders
27 Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown
28 Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown Standard coalescence based models
29 Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown Standard coalescence based models Hidden variables is prohibitively large
30 Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown Standard coalescence based models Hidden variables is prohibitively large Hard to perform inference of ancestral features
31 Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i
32 Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k
33 Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i
34 Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i Generative Model:
35 Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i Generative Model: Draw a first haplotype a 1 DP(τ, Q 0 ) Q 0 h 1 P h ( a 1, θ 1 )
36 Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i Generative Model: Draw a first haplotype a 1 DP(τ, Q 0 ) Q 0 h 1 P h ( a 1, θ 1 ) For subsequent haplotypes { p(c i = c j for some j < i c 1,..., c i 1 ) = c i DP(τ, Q 0 ) p(c i c j for all j < i c 1,..., c i 1 ) = nc j i 1+α 0 α0 i 1+α 0
37 Dirichlet Process Mixtures (Contd.) Generative Model (contd)
38 Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i
39 Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci )
40 Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci ) Assumes each haplotype originates from one ancestor
41 Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci ) Assumes each haplotype originates from one ancestor Valid only for short regions in chromosome
42 Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci ) Assumes each haplotype originates from one ancestor Valid only for short regions in chromosome Long regions will have recombination
43 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM
44 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space
45 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support
46 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns
47 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k
48 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities
49 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities Let m j,k be the number of balls of color k in urn Q j
50 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities Let m j,k be the number of balls of color k in urn Q j HDPM can be simulated by sampling from the urn hierarchy
51 Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities Let m j,k be the number of balls of color k in urn Q j HDPM can be simulated by sampling from the urn hierarchy Hierarchical DPM Q 0 α, F DP(α, F ) Q j τ, Q 0 DP(τ, Q 0 )
52 Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k }
53 Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k } For n random draws from Q 0 φ n φ n K k=1 n k n 1 + α δ α φ k (φ n ) + n 1 + α F (φ n)
54 Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k } For n random draws from Q 0 φ n φ n K k=1 n k n 1 + α δ α φ k (φ n ) + n 1 + α F (φ n) Conditioned on Q 0, the marginal configs from Q j φ mj φ mj k n k m j,k + τ n 1+α m j 1 + tau + τ α m j 1 + τ n 1 + α F (φ m j )
55 Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k } For n random draws from Q 0 φ n φ n K k=1 n k n 1 + α δ α φ k (φ n ) + n 1 + α F (φ n) Conditioned on Q 0, the marginal configs from Q j φ mj φ mj k n k m j,k + τ n 1+α m j 1 + tau + τ α m j 1 + τ n 1 + α F (φ m j )
56 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ)
57 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta
58 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i
59 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k
60 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk
61 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk d is the distance between the two loci
62 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk d is the distance between the two loci r is the rate of recombination per unit distance
63 HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk d is the distance between the two loci r is the rate of recombination per unit distance The transition probability to state k is P(C i,t = k, C i,t+1 = k ) = (1 exp(dr))π kk
64 HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes
65 HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes Model is a time-inhomogenous infinite HMM
66 HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes Model is a time-inhomogenous infinite HMM With r, we get stationary HMM
67 HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes Model is a time-inhomogenous infinite HMM With r, we get stationary HMM Single locus mutation model for emission ( ) 1 θ I(ht at) p(h t a t, θ) = θ I(ht=at) B 1
68 Haplotype Recombination and Inheritance
69 HMDP for Recombination and Inheritance (Contd.) Conditional probability of haplotype list h p(h c, a) = p(h i,t a k,t, θ k )Beta(θ k α h, β h )dθ k k θ k i,t c i,t =k = Γ(α h + β h ) Γ(α h + l k )Γ(β h + l k ) ( ) 1 l k Γ(α h )Γ(β h ) Γ(α h + β h + l k + l k k ) B 1 where l k = i,t I(h i,t = a k,t )I(c i,t = k) l k = i,t I(h i,t a k,t )I(c i,t = k)
70 Inference Gibbs sampler proceeds in two steps
71 Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a
72 Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C
73 Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C Improve mixing for sampling inheritance
74 Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C Improve mixing for sampling inheritance By Bayes rule t+δ p(c t+1 : t + δ c, h, a) p(c j+1 c j, m, n) j=t t+δ j=t+1 p(h j a cj,j, l cj )
75 Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C Improve mixing for sampling inheritance By Bayes rule t+δ p(c t+1 : t + δ c, h, a) p(c j+1 c j, m, n) j=t t+δ j=t+1 p(h j a cj,j, l cj ) Assume probability of having two recombinations is small p(c t+1 : t + δ c, h, a) p(c t c t 1, m, n)p(c t+δ+1 c t+δ = c t, m, n) t j=
76 Inference (Contd.) Assuming d, r to be small, λ = 1 exp( dr) dr { λπ k,k + (1 λ)δ(k, k )fork {1 p(c t = k c t 1 = k, m, n, r, d) = λπ k,k+1 fork = K + 1
77 Inference (Contd.) Assuming d, r to be small, λ = 1 exp( dr) dr { λπ k,k + (1 λ)δ(k, k )fork {1 p(c t = k c t 1 = k, m, n, r, d) = λπ k,k+1 fork = K + 1 Terms can be replaced in original equation to get sampler
78 Inference (Contd.) Assuming d, r to be small, λ = 1 exp( dr) dr { λπ k,k + (1 λ)δ(k, k )fork {1 p(c t = k c t 1 = k, m, n, r, d) = λπ k,k+1 fork = K + 1 Terms can be replaced in original equation to get sampler Posterior distribution for ancestors p(a k,t c, h) Γ(α h + β h ) Γ(α h + l k,t )Γ(β h + l k,t ) ( ) 1 l k,t Γ(α h )Γ(β h ) Γ(α h + β h + l k,t + l k,t ) B 1
79 Single Population Data Haplotype block boundaries HMDP (black solid), HMM (red dotted), MDL (blue dashed)
80 Two Population Data
81 Hierarchical DPM for Haplotype Inference
82 Hierarchical DPM for Haplotype Inference (Contd.)
83 Experiments: Hapmap Data SNP genotypes from four populations
84 Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60
85 Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60
86 Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60 CHB, Han Chinese in Beijing, 45
87 Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60 CHB, Han Chinese in Beijing, 45 JPT, Japanese in Tokyo, 44
88 Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60 CHB, Han Chinese in Beijing, 45 JPT, Japanese in Tokyo, 44 Experiments on short ( 10) and long ( ) SNPs
89 Short SNP Sequences
90 Long SNP Sequences
91 Mutation Rates and Diversity
92 Mutation Rates and Diversity (Contd.)
Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space Kyung-Ah Sohn School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 ksohn@cs.cmu.edu Eric P.
More informationA Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data
A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data Eric P. Xing January 27 CMU-ML-7-17 Kyung-Ah Sohn School of Computer Science Carnegie Mellon University
More informationLearning ancestral genetic processes using nonparametric Bayesian models
Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew
More information1. Understand the methods for analyzing population structure in genomes
MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population
More informationBayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture
Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Eric P. Xing epxing@cs.cmu.edu Kyung-Ah Sohn sohn@cs.cmu.edu Michael I. Jordan jordan@cs.bereley.edu Yee-Whye
More informationLearning Ancestral Genetic Processes using Nonparametric Bayesian Models
Learning Ancestral Genetic Processes using Nonparametric Bayesian Models Kyung-Ah Sohn CMU-CS-11-136 November 2011 Computer Science Department School of Computer Science Carnegie Mellon University Pittsburgh,
More informationAdvanced Machine Learning
Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing
More informationLinkage and Linkage Disequilibrium
Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationEffect of Genetic Divergence in Identifying Ancestral Origin using HAPAA
Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA Andreas Sundquist*, Eugene Fratkin*, Chuong B. Do, Serafim Batzoglou Department of Computer Science, Stanford University, Stanford,
More informationMathematical models in population genetics II
Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population
More informationImage segmentation combining Markov Random Fields and Dirichlet Processes
Image segmentation combining Markov Random Fields and Dirichlet Processes Jessica SODJO IMS, Groupe Signal Image, Talence Encadrants : A. Giremus, J.-F. Giovannelli, F. Caron, N. Dobigeon Jessica SODJO
More informationp(d g A,g B )p(g B ), g B
Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)
More informationIntroduction to Linkage Disequilibrium
Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have
More informationGenotype Imputation. Biostatistics 666
Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives
More informationFrequency Spectra and Inference in Population Genetics
Frequency Spectra and Inference in Population Genetics Although coalescent models have come to play a central role in population genetics, there are some situations where genealogies may not lead to efficient
More informationGenetic Association Studies in the Presence of Population Structure and Admixture
Genetic Association Studies in the Presence of Population Structure and Admixture Purushottam W. Laud and Nicholas M. Pajewski Division of Biostatistics Department of Population Health Medical College
More informationA Brief Overview of Nonparametric Bayesian Models
A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine
More informationSupporting Information
Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider
More informationLecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012
Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More informationBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process Alessandro Panella Department of Computer Science University of Illinois at Chicago Machine Learning Seminar Series February 18, 2013 Alessandro
More informationModelling Genetic Variations with Fragmentation-Coagulation Processes
Modelling Genetic Variations with Fragmentation-Coagulation Processes Yee Whye Teh, Charles Blundell, Lloyd Elliott Gatsby Computational Neuroscience Unit, UCL Genetic Variations in Populations Inferring
More informationChapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)
12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities
More informationmstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations
mstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations Suyash Shringarpure and Eric P. Xing May 2008 CMU-ML-08-105 mstruct: Inference of Population Structure
More informationInfinite latent feature models and the Indian Buffet Process
p.1 Infinite latent feature models and the Indian Buffet Process Tom Griffiths Cognitive and Linguistic Sciences Brown University Joint work with Zoubin Ghahramani p.2 Beyond latent classes Unsupervised
More informationFeature Selection via Block-Regularized Regression
Feature Selection via Block-Regularized Regression Seyoung Kim School of Computer Science Carnegie Mellon University Pittsburgh, PA 3 Eric Xing School of Computer Science Carnegie Mellon University Pittsburgh,
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm
More informationGenotype Imputation. Class Discussion for January 19, 2016
Genotype Imputation Class Discussion for January 19, 2016 Intuition Patterns of genetic variation in one individual guide our interpretation of the genomes of other individuals Imputation uses previously
More informationCS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I
X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown
More informationBayesian Nonparametrics
Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent
More informationBayesian nonparametrics
Bayesian nonparametrics 1 Some preliminaries 1.1 de Finetti s theorem We will start our discussion with this foundational theorem. We will assume throughout all variables are defined on the probability
More informationThe problem Lineage model Examples. The lineage model
The lineage model A Bayesian approach to inferring community structure and evolutionary history from whole-genome metagenomic data Jack O Brien Bowdoin College with Daniel Falush and Xavier Didelot Cambridge,
More information1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:
.5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the
More informationBIOINFORMATICS. StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data
BIOINFORMATICS Vol. 00 no. 00 2011 Pages 1 9 StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data Suyash Shringarpure 1, Daegun Won 1 and Eric P. Xing
More informationStationary Distribution of the Linkage Disequilibrium Coefficient r 2
Stationary Distribution of the Linkage Disequilibrium Coefficient r 2 Wei Zhang, Jing Liu, Rachel Fewster and Jesse Goodman Department of Statistics, The University of Auckland December 1, 2015 Overview
More informationAnalysis of DNA variations in GSTA and GSTM gene clusters based on the results of genome-wide data from three Russian populations taken as an example
Filippova et al. BMC Genetics 2012, 13:89 RESEARCH ARTICLE Open Access Analysis of DNA variations in GSTA and GSTM gene clusters based on the results of genome-wide data from three Russian populations
More informationSharing Clusters Among Related Groups: Hierarchical Dirichlet Processes
Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes Yee Whye Teh (1), Michael I. Jordan (1,2), Matthew J. Beal (3) and David M. Blei (1) (1) Computer Science Div., (2) Dept. of Statistics
More informationLatent class analysis variable selection
Ann Inst Stat Math (2010) 62:11 35 DOI 10.1007/s10463-009-0258-9 Latent class analysis variable selection Nema Dean Adrian E. Raftery Received: 19 July 2008 / Revised: 22 April 2009 / Published online:
More informationLinear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
More informationSupporting Information Text S1
Supporting Information Text S1 List of Supplementary Figures S1 The fraction of SNPs s where there is an excess of Neandertal derived alleles n over Denisova derived alleles d as a function of the derived
More informationBayesian Nonparametric Models
Bayesian Nonparametric Models David M. Blei Columbia University December 15, 2015 Introduction We have been looking at models that posit latent structure in high dimensional data. We use the posterior
More informationAn Integrated Approach for the Assessment of Chromosomal Abnormalities
An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 26, 2007 Karyotypes Karyotypes General Cytogenetics
More informationCS1820 Notes. hgupta1, kjline, smechery. April 3-April 5. output: plausible Ancestral Recombination Graph (ARG)
CS1820 Notes hgupta1, kjline, smechery April 3-April 5 April 3 Notes 1 Minichiello-Durbin Algorithm input: set of sequences output: plausible Ancestral Recombination Graph (ARG) note: the optimal ARG is
More informationStatistical Methods for studying Genetic Variation in Populations
Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure August 2012 CMU-ML-12-105 Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure CMU-ML-12-105
More informationComputational Approaches to Statistical Genetics
Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen
More informationNonparametric Bayes Modeling
Nonparametric Bayes Modeling Lecture 6: Advanced Applications of DPMs David Dunson Department of Statistical Science, Duke University Tuesday February 2, 2010 Motivation Functional data analysis Variable
More informationBayesian Nonparametric Modelling of Genetic Variations using Fragmentation-Coagulation Processes
Journal of Machine Learning Research 1 (2) 1-48 Submitted 4/; Published 1/ Bayesian Nonparametric Modelling of Genetic Variations using Fragmentation-Coagulation Processes Yee Whye Teh y.w.teh@stats.ox.ac.uk
More information(Genome-wide) association analysis
(Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by
More informationNotes on Population Genetics
Notes on Population Genetics Graham Coop 1 1 Department of Evolution and Ecology & Center for Population Biology, University of California, Davis. To whom correspondence should be addressed: gmcoop@ucdavis.edu
More informationGenetic Variation in Finite Populations
Genetic Variation in Finite Populations The amount of genetic variation found in a population is influenced by two opposing forces: mutation and genetic drift. 1 Mutation tends to increase variation. 2
More informationNonparametric Bayesian Methods - Lecture I
Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics
More informationApplied Bayesian Nonparametrics 3. Infinite Hidden Markov Models
Applied Bayesian Nonparametrics 3. Infinite Hidden Markov Models Tutorial at CVPR 2012 Erik Sudderth Brown University Work by E. Fox, E. Sudderth, M. Jordan, & A. Willsky AOAS 2011: A Sticky HDP-HMM with
More informationHumans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase
Humans have two copies of each chromosome Inherited from mother and father. Genotyping technologies do not maintain the phase Genotyping technologies do not maintain the phase Recall that proximal SNPs
More informationFigure S1: The model underlying our inference of the age of ancient genomes
A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years Priya Moorjani, Sriram Sankararaman, Qiaomei Fu, Molly Przeworski, Nick Patterson,
More informationIntroduction to Advanced Population Genetics
Introduction to Advanced Population Genetics Learning Objectives Describe the basic model of human evolutionary history Describe the key evolutionary forces How demography can influence the site frequency
More informationLecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu
Lecture 16-17: Bayesian Nonparametrics I STAT 6474 Instructor: Hongxiao Zhu Plan for today Why Bayesian Nonparametrics? Dirichlet Distribution and Dirichlet Processes. 2 Parameter and Patterns Reference:
More informationHidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays
Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays Department of Biostatistics Johns Hopkins Bloomberg School of Public Health November 18, 2008 Acknowledgments
More informationBayesian Nonparametrics: Dirichlet Process
Bayesian Nonparametrics: Dirichlet Process Yee Whye Teh Gatsby Computational Neuroscience Unit, UCL http://www.gatsby.ucl.ac.uk/~ywteh/teaching/npbayes2012 Dirichlet Process Cornerstone of modern Bayesian
More informationConstruction of Dependent Dirichlet Processes based on Poisson Processes
1 / 31 Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin Eric Grimson John Fisher CSAIL MIT NIPS 2010 Outstanding Student Paper Award Presented by Shouyuan Chen Outline
More informationWeek 7.2 Ch 4 Microevolutionary Proceses
Week 7.2 Ch 4 Microevolutionary Proceses 1 Mendelian Traits vs Polygenic Traits Mendelian -discrete -single gene determines effect -rarely influenced by environment Polygenic: -continuous -multiple genes
More informationAn Efficient and Accurate Graph-Based Approach to Detect Population Substructure
An Efficient and Accurate Graph-Based Approach to Detect Population Substructure Srinath Sridhar, Satish Rao and Eran Halperin Abstract. Currently, large-scale projects are underway to perform whole genome
More information2. Map genetic distance between markers
Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,
More informationDirichlet Processes: Tutorial and Practical Course
Dirichlet Processes: Tutorial and Practical Course (updated) Yee Whye Teh Gatsby Computational Neuroscience Unit University College London August 2007 / MLSS Yee Whye Teh (Gatsby) DP August 2007 / MLSS
More informationBayesian nonparametric latent feature models
Bayesian nonparametric latent feature models François Caron UBC October 2, 2007 / MLRG François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 1 / 29 Overview 1 Introduction
More informationPhasing via the Expectation Maximization (EM) Algorithm
Computing Haplotype Frequencies and Haplotype Phasing via the Expectation Maximization (EM) Algorithm Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 14, 2010 Outline
More informationBinomial Mixture Model-based Association Tests under Genetic Heterogeneity
Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,
More informationTutorial Session 2. MCMC for the analysis of genetic data on pedigrees:
MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation
More informationTime-Sensitive Dirichlet Process Mixture Models
Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce
More informationAn Integrated Approach for the Assessment of Chromosomal Abnormalities
An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 6, 2007 Karyotypes Mitosis and Meiosis Meiosis Meiosis
More informationURN MODELS: the Ewens Sampling Lemma
Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 3, 2014 1 2 3 4 Mutation Mutation: typical values for parameters Equilibrium Probability of fixation 5 6 Ewens Sampling
More informationDetecting selection from differentiation between populations: the FLK and hapflk approach.
Detecting selection from differentiation between populations: the FLK and hapflk approach. Bertrand Servin bservin@toulouse.inra.fr Maria-Ines Fariello, Simon Boitard, Claude Chevalet, Magali SanCristobal,
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationFigure S2. The distribution of the sizes (in bp) of syntenic regions of humans and chimpanzees on human chromosome 21.
Frequency 0 1000 2000 3000 4000 5000 0 2 4 6 8 10 Distance Figure S1. The distribution of human-chimpanzee sequence divergence for syntenic regions of humans and chimpanzees on human chromosome 21. Distance
More informationBayesian Nonparametric Models on Decomposable Graphs
Bayesian Nonparametric Models on Decomposable Graphs François Caron INRIA Bordeaux Sud Ouest Institut de Mathématiques de Bordeaux University of Bordeaux, France francois.caron@inria.fr Arnaud Doucet Departments
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationInfinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix
Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations
More informationMCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES
MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES Elizabeth A. Thompson Department of Statistics, University of Washington Box 354322, Seattle, WA 98195-4322, USA Email: thompson@stat.washington.edu This
More informationCollapsed Variational Dirichlet Process Mixture Models
Collapsed Variational Dirichlet Process Mixture Models Kenichi Kurihara Dept. of Computer Science Tokyo Institute of Technology, Japan kurihara@mi.cs.titech.ac.jp Max Welling Dept. of Computer Science
More informationStatistical Methods for studying Genetic Variation in Populations
Statistical Methods for studying Genetic Variation in Populations Suyash Shringarpure August 2012 CMU-ML-12-105 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the
More information2 : Directed GMs: Bayesian Networks
10-708: Probabilistic Graphical Models 10-708, Spring 2017 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Jayanth Koushik, Hiroaki Hayashi, Christian Perez Topic: Directed GMs 1 Types
More informationBustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either
More informationHierarchical Bayesian Nonparametric Models with Applications
Hierarchical Bayesian Nonparametric Models with Applications Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 17 Queen Square London WC1N 3AR, United Kingdom Michael I. Jordan
More informationCover Page. The handle holds various files of this Leiden University dissertation
Cover Page The handle http://hdl.handle.net/1887/35195 holds various files of this Leiden University dissertation Author: Balliu, Brunilda Title: Statistical methods for genetic association studies with
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationFull Likelihood Inference from the Site Frequency Spectrum based on the Optimal Tree Resolution
Full Likelihood Inference from the Site Frequency Spectrum based on the Optimal Tree Resolution Raazesh Sainudiin 1, Amandine Véber 2 March 3, 2018 1 Department of Mathematics, Uppsala University, Uppsala,
More information10. Exchangeability and hierarchical models Objective. Recommended reading
10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.
More informationSpatial Normalized Gamma Process
Spatial Normalized Gamma Process Vinayak Rao Yee Whye Teh Presented at NIPS 2009 Discussion and Slides by Eric Wang June 23, 2010 Outline Introduction Motivation The Gamma Process Spatial Normalized Gamma
More informationFast Approximate MAP Inference for Bayesian Nonparametrics
Fast Approximate MAP Inference for Bayesian Nonparametrics Y. Raykov A. Boukouvalas M.A. Little Department of Mathematics Aston University 10th Conference on Bayesian Nonparametrics, 2015 1 Iterated Conditional
More informationModelling Linkage Disequilibrium, And Identifying Recombination Hotspots Using SNP Data
Modelling Linkage Disequilibrium, And Identifying Recombination Hotspots Using SNP Data Na Li and Matthew Stephens July 25, 2003 Department of Biostatistics, University of Washington, Seattle, WA 98195
More informationDirichlet Processes and other non-parametric Bayesian models
Dirichlet Processes and other non-parametric Bayesian models Zoubin Ghahramani http://learning.eng.cam.ac.uk/zoubin/ zoubin@cs.cmu.edu Statistical Machine Learning CMU 10-702 / 36-702 Spring 2008 Model
More informationInfering the Number of State Clusters in Hidden Markov Model and its Extension
Infering the Number of State Clusters in Hidden Markov Model and its Extension Xugang Ye Department of Applied Mathematics and Statistics, Johns Hopkins University Elements of a Hidden Markov Model (HMM)
More informationBayesian Hidden Markov Models and Extensions
Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationBayesian Nonparametric Autoregressive Models via Latent Variable Representation
Bayesian Nonparametric Autoregressive Models via Latent Variable Representation Maria De Iorio Yale-NUS College Dept of Statistical Science, University College London Collaborators: Lifeng Ye (UCL, London,
More informationSteven L. Scott. Presented by Ahmet Engin Ural
Steven L. Scott Presented by Ahmet Engin Ural Overview of HMM Evaluating likelihoods The Likelihood Recursion The Forward-Backward Recursion Sampling HMM DG and FB samplers Autocovariance of samplers Some
More informationSupplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles
Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To
More information