CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

Similar documents
Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space

A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data

Learning ancestral genetic processes using nonparametric Bayesian models

1. Understand the methods for analyzing population structure in genomes

Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture

Learning Ancestral Genetic Processes using Nonparametric Bayesian Models

Advanced Machine Learning

Linkage and Linkage Disequilibrium

Non-Parametric Bayes

Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA

Mathematical models in population genetics II

Image segmentation combining Markov Random Fields and Dirichlet Processes

p(d g A,g B )p(g B ), g B

Introduction to Linkage Disequilibrium

Genotype Imputation. Biostatistics 666

Frequency Spectra and Inference in Population Genetics

Genetic Association Studies in the Presence of Population Structure and Admixture

A Brief Overview of Nonparametric Bayesian Models

Supporting Information

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Lecture 3a: Dirichlet processes

Bayesian Inference of Interactions and Associations

Bayesian Nonparametrics: Models Based on the Dirichlet Process

Modelling Genetic Variations with Fragmentation-Coagulation Processes

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Calculation of IBD probabilities

mstruct: Inference of Population Structure in Light of both Genetic Admixing and Allele Mutations

Infinite latent feature models and the Indian Buffet Process

Feature Selection via Block-Regularized Regression

Calculation of IBD probabilities

Genotype Imputation. Class Discussion for January 19, 2016

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I

CSC 2541: Bayesian Methods for Machine Learning

Bayesian Nonparametrics

Bayesian nonparametrics

The problem Lineage model Examples. The lineage model

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

BIOINFORMATICS. StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data

Stationary Distribution of the Linkage Disequilibrium Coefficient r 2

Analysis of DNA variations in GSTA and GSTM gene clusters based on the results of genome-wide data from three Russian populations taken as an example

Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes

Latent class analysis variable selection

Linear Regression (1/1/17)

Supporting Information Text S1

Bayesian Nonparametric Models

An Integrated Approach for the Assessment of Chromosomal Abnormalities

CS1820 Notes. hgupta1, kjline, smechery. April 3-April 5. output: plausible Ancestral Recombination Graph (ARG)

Statistical Methods for studying Genetic Variation in Populations

Computational Approaches to Statistical Genetics

Nonparametric Bayes Modeling

Bayesian Nonparametric Modelling of Genetic Variations using Fragmentation-Coagulation Processes

(Genome-wide) association analysis

Notes on Population Genetics

Genetic Variation in Finite Populations

Nonparametric Bayesian Methods - Lecture I

Applied Bayesian Nonparametrics 3. Infinite Hidden Markov Models

Humans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase

Figure S1: The model underlying our inference of the age of ancient genomes

Introduction to Advanced Population Genetics

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu

Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays

Bayesian Nonparametrics: Dirichlet Process

Construction of Dependent Dirichlet Processes based on Poisson Processes

Week 7.2 Ch 4 Microevolutionary Proceses

An Efficient and Accurate Graph-Based Approach to Detect Population Substructure

2. Map genetic distance between markers

Dirichlet Processes: Tutorial and Practical Course

Bayesian nonparametric latent feature models

Phasing via the Expectation Maximization (EM) Algorithm

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Time-Sensitive Dirichlet Process Mixture Models

An Integrated Approach for the Assessment of Chromosomal Abnormalities

URN MODELS: the Ewens Sampling Lemma

Detecting selection from differentiation between populations: the FLK and hapflk approach.

13: Variational inference II

Figure S2. The distribution of the sizes (in bp) of syntenic regions of humans and chimpanzees on human chromosome 21.

Bayesian Nonparametric Models on Decomposable Graphs

Contents. Part I: Fundamentals of Bayesian Inference 1

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES

Collapsed Variational Dirichlet Process Mixture Models

Statistical Methods for studying Genetic Variation in Populations

2 : Directed GMs: Bayesian Networks

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Hierarchical Bayesian Nonparametric Models with Applications

Cover Page. The handle holds various files of this Leiden University dissertation

Computational statistics

Full Likelihood Inference from the Site Frequency Spectrum based on the Optimal Tree Resolution

10. Exchangeability and hierarchical models Objective. Recommended reading

Spatial Normalized Gamma Process

Fast Approximate MAP Inference for Bayesian Nonparametrics

Modelling Linkage Disequilibrium, And Identifying Recombination Hotspots Using SNP Data

Dirichlet Processes and other non-parametric Bayesian models

Infering the Number of State Clusters in Hidden Markov Model and its Extension

Bayesian Hidden Markov Models and Extensions

Lecture 13 : Variational Inference: Mean Field Approximation

Bayesian Nonparametric Autoregressive Models via Latent Variable Representation

Steven L. Scott. Presented by Ahmet Engin Ural

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Transcription:

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007

Genetic Polymorphism Single nucleotide polymorphism (SNP)

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G}

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype List of alleles in a local region of a chromosome

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype List of alleles in a local region of a chromosome Inherited as a unit, if there is no recombination

Genetic Polymorphism Single nucleotide polymorphism (SNP) Two possible kinds of nucleotides at a single locus Nucleotide can be one of {A, C, T, G} Most genetic human variation are related to SNPs Each variant is called an allele Haplotype List of alleles in a local region of a chromosome Inherited as a unit, if there is no recombination Repeated recombinations between ancestral haplotypes

Genetic Polymorphism (Contd.) Linkage disequilibrium (LD)

Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci

Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD

Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD Infer chromosomal recombination hotspots

Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD Infer chromosomal recombination hotspots Help understand origin and characteristics of genetic variation

Genetic Polymorphism (Contd.) Linkage disequilibrium (LD) Non-random association of alleles at different loci Recombination decouples alleles, increase randomness, decrease LD Infer chromosomal recombination hotspots Help understand origin and characteristics of genetic variation Analyze genetic variation to reconstruct evolutionary history

Haplotype Recombination and Inheritance

Hidden Markov Process Generative model for choosing recombination sites

Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process

Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes

Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes Transition probabilities correspond to recombination rates

Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes Transition probabilities correspond to recombination rates Emission model corresponds to mutation process that give descendants

Hidden Markov Process Generative model for choosing recombination sites Hidden Markov process Hidden states correspond to index over chromosomes Transition probabilities correspond to recombination rates Emission model corresponds to mutation process that give descendants Implemented using a Hidden Markov Dirichlet Process (HMDP)

Dirichlet Process Mixtures We know the basics of DPMs

Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model

Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders

Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown

Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown Standard coalescence based models

Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown Standard coalescence based models Hidden variables is prohibitively large

Dirichlet Process Mixtures We know the basics of DPMs Haplotype modeling using an infinite mixture model A pool of ancestor haplotypes or founders The size of the pool is unknown Standard coalescence based models Hidden variables is prohibitively large Hard to perform inference of ancestral features

Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i

Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k

Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i

Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i Generative Model:

Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i Generative Model: Draw a first haplotype a 1 DP(τ, Q 0 ) Q 0 h 1 P h ( a 1, θ 1 )

Dirichlet Process Mixtures (Contd.) H i = [H i,1,..., H i,t ] haplotype over T SNPs, chromosome i A k = [A k,1,..., A k,t ] ancestral haplotype, mutation rate θ k C i, inheritance variable, latent ancestor of H i Generative Model: Draw a first haplotype a 1 DP(τ, Q 0 ) Q 0 h 1 P h ( a 1, θ 1 ) For subsequent haplotypes { p(c i = c j for some j < i c 1,..., c i 1 ) = c i DP(τ, Q 0 ) p(c i c j for all j < i c 1,..., c i 1 ) = nc j i 1+α 0 α0 i 1+α 0

Dirichlet Process Mixtures (Contd.) Generative Model (contd)

Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i

Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci )

Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci ) Assumes each haplotype originates from one ancestor

Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci ) Assumes each haplotype originates from one ancestor Valid only for short regions in chromosome

Dirichlet Process Mixtures (Contd.) Generative Model (contd) Sample the founder of haplotype i { = {a cj, θ cj }ifc i = c j for somej < i φ ci DP(τ, Q 0 ) Q(a, θ)ifc i c j for allj < i Sample the haplotype according to its founder h i c i P( a ci, θ ci ) Assumes each haplotype originates from one ancestor Valid only for short regions in chromosome Long regions will have recombination

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities Let m j,k be the number of balls of color k in urn Q j

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities Let m j,k be the number of balls of color k in urn Q j HDPM can be simulated by sampling from the urn hierarchy

Hidden Markov Dirichlet Process Nonparametric Bayesian HMM Sample a DP to form the support of the infinite state space Conditioned on each state, sample a DP with the same support Hierarchical Urns Stock urn Q 0 with balls of K colors, n k of color k HMM-urns Q 1,..., Q K for prior and transition probabilities Let m j,k be the number of balls of color k in urn Q j HDPM can be simulated by sampling from the urn hierarchy Hierarchical DPM Q 0 α, F DP(α, F ) Q j τ, Q 0 DP(τ, Q 0 )

Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k }

Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k } For n random draws from Q 0 φ n φ n K k=1 n k n 1 + α δ α φ k (φ n ) + n 1 + α F (φ n)

Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k } For n random draws from Q 0 φ n φ n K k=1 n k n 1 + α δ α φ k (φ n ) + n 1 + α F (φ n) Conditioned on Q 0, the marginal configs from Q j φ mj φ mj k n k m j,k + τ n 1+α m j 1 + tau + τ α m j 1 + τ n 1 + α F (φ m j )

Hidden Markov Dirichlet Process (Contd.) Each color corresponds to ancestor configuration φ k = {a k, θ k } For n random draws from Q 0 φ n φ n K k=1 n k n 1 + α δ α φ k (φ n ) + n 1 + α F (φ n) Conditioned on Q 0, the marginal configs from Q j φ mj φ mj k n k m j,k + τ n 1+α m j 1 + tau + τ α m j 1 + τ n 1 + α F (φ m j )

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ)

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk d is the distance between the two loci

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk d is the distance between the two loci r is the rate of recombination per unit distance

HMDP for Recombination and Inheritance Priors for the conditional model parameters F (A, θ) = p(a)p(θ) p(a) is assumed uniform, p(θ) is assumed beta C i = [C i,1,..., C i,t ] ancestral index for chromosome i With no recombination, C i,t = k, t for some k Non-recombination is modeled by Poisson point process P(C i,t+1 = C i,t = k) = exp( dr) + (1 exp( dr))π kk d is the distance between the two loci r is the rate of recombination per unit distance The transition probability to state k is P(C i,t = k, C i,t+1 = k ) = (1 exp(dr))π kk

HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes

HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes Model is a time-inhomogenous infinite HMM

HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes Model is a time-inhomogenous infinite HMM With r, we get stationary HMM

HMDP for Recombination and Inheritance (Contd.) H i is a mosaic of multiple ancestral chromosomes Model is a time-inhomogenous infinite HMM With r, we get stationary HMM Single locus mutation model for emission ( ) 1 θ I(ht at) p(h t a t, θ) = θ I(ht=at) B 1

Haplotype Recombination and Inheritance

HMDP for Recombination and Inheritance (Contd.) Conditional probability of haplotype list h p(h c, a) = p(h i,t a k,t, θ k )Beta(θ k α h, β h )dθ k k θ k i,t c i,t =k = Γ(α h + β h ) Γ(α h + l k )Γ(β h + l k ) ( ) 1 l k Γ(α h )Γ(β h ) Γ(α h + β h + l k + l k k ) B 1 where l k = i,t I(h i,t = a k,t )I(c i,t = k) l k = i,t I(h i,t a k,t )I(c i,t = k)

Inference Gibbs sampler proceeds in two steps

Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a

Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C

Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C Improve mixing for sampling inheritance

Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C Improve mixing for sampling inheritance By Bayes rule t+δ p(c t+1 : t + δ c, h, a) p(c j+1 c j, m, n) j=t t+δ j=t+1 p(h j a cj,j, l cj )

Inference Gibbs sampler proceeds in two steps Sample inheritance {C i,k } given h and a Sample ancestors a = {a 1,..., a K } given h, C Improve mixing for sampling inheritance By Bayes rule t+δ p(c t+1 : t + δ c, h, a) p(c j+1 c j, m, n) j=t t+δ j=t+1 p(h j a cj,j, l cj ) Assume probability of having two recombinations is small p(c t+1 : t + δ c, h, a) p(c t c t 1, m, n)p(c t+δ+1 c t+δ = c t, m, n) t j=

Inference (Contd.) Assuming d, r to be small, λ = 1 exp( dr) dr { λπ k,k + (1 λ)δ(k, k )fork {1 p(c t = k c t 1 = k, m, n, r, d) = λπ k,k+1 fork = K + 1

Inference (Contd.) Assuming d, r to be small, λ = 1 exp( dr) dr { λπ k,k + (1 λ)δ(k, k )fork {1 p(c t = k c t 1 = k, m, n, r, d) = λπ k,k+1 fork = K + 1 Terms can be replaced in original equation to get sampler

Inference (Contd.) Assuming d, r to be small, λ = 1 exp( dr) dr { λπ k,k + (1 λ)δ(k, k )fork {1 p(c t = k c t 1 = k, m, n, r, d) = λπ k,k+1 fork = K + 1 Terms can be replaced in original equation to get sampler Posterior distribution for ancestors p(a k,t c, h) Γ(α h + β h ) Γ(α h + l k,t )Γ(β h + l k,t ) ( ) 1 l k,t Γ(α h )Γ(β h ) Γ(α h + β h + l k,t + l k,t ) B 1

Single Population Data Haplotype block boundaries HMDP (black solid), HMM (red dotted), MDL (blue dashed)

Two Population Data

Hierarchical DPM for Haplotype Inference

Hierarchical DPM for Haplotype Inference (Contd.)

Experiments: Hapmap Data SNP genotypes from four populations

Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60

Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60

Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60 CHB, Han Chinese in Beijing, 45

Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60 CHB, Han Chinese in Beijing, 45 JPT, Japanese in Tokyo, 44

Experiments: Hapmap Data SNP genotypes from four populations CEPH, Utah residents with northern/weatern European ancestry, 60 YRI, Yoruba in Ibadan, Nigeria, 60 CHB, Han Chinese in Beijing, 45 JPT, Japanese in Tokyo, 44 Experiments on short ( 10) and long ( 10 2 10 3 ) SNPs

Short SNP Sequences

Long SNP Sequences

Mutation Rates and Diversity

Mutation Rates and Diversity (Contd.)