Hidden Markov models in population genetics and evolutionary biology

Gerton Lunter, Wellcome Trust Centre for Human Genetics, Oxford, UK. April 29, 2013

Topics for today:
- Markov chains
- Hidden Markov models
- Examples:
  - sequence features (genes, domains)
  - sequence evolution (alignment, conserved elements)
  - population genetics (phasing; demographic inference)
- Journal club

Markov chains

Markov chains. Suppose a stochastic process of interest is modelled as a discrete-time process $\{X_i\}_{i \geq 1}$. This process is a Markov process if it is characterized by
$X_1 \sim \mu(\cdot)$ (the initial distribution)
$X_k \mid (X_{k-1} = x_{k-1}) \sim f(\cdot \mid x_{k-1})$ (the transition probabilities)
Notation: $x_{i:j} = (x_i, x_{i+1}, \ldots, x_{j-1}, x_j)$
$$p(x_{1:n}) = p(x_1) \prod_{k=2}^{n} p(x_k \mid x_{1:k-1}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})$$
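A minimal sketch in Python (not from the slides) of this factorization: simulating a two-state discrete-time Markov chain and evaluating $p(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})$. The chain and its parameter values are illustrative assumptions.

```python
# A two-state Markov chain: simulate a path and evaluate its probability
# p(x_{1:n}) = mu(x_1) * prod_k f(x_k | x_{k-1}).  Parameter values are assumed.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, 0.5])                  # initial distribution (assumed)
f = np.array([[6/7, 1/7],                  # f[i, j] = p(X_k = j | X_{k-1} = i)
              [1/3, 2/3]])

def simulate(n):
    """Draw x_{1:n} from the chain."""
    x = [rng.choice(2, p=mu)]
    for _ in range(n - 1):
        x.append(rng.choice(2, p=f[x[-1]]))
    return x

def chain_prob(x):
    """p(x_{1:n}) = mu(x_1) * prod_k f(x_k | x_{k-1})."""
    p = mu[x[0]]
    for k in range(1, len(x)):
        p *= f[x[k-1], x[k]]
    return p

x = simulate(6)
print(x, chain_prob(x))
```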

Example: a weather model. Modeling the observation that today's weather is likely to be similar to yesterday's. [Diagram: a two-state Markov chain on weather symbols, with self-transition probabilities 6/7 and 2/3 and cross-transition probabilities 1/7 and 1/3; e.g. $f(\text{state 2} \mid \text{state 1}) = 1/7$.] The probability of a particular sequence of six days of weather is $p(x_{1:6}) = \mu(x_1)\, f(x_2 \mid x_1)\, f(x_3 \mid x_2)\, f(x_4 \mid x_3)\, f(x_5 \mid x_4)\, f(x_6 \mid x_5)$.

Example: CpG frequency in mammalian genomes. In mammals, the C in CpG dinucleotides is often methylated, increasing the rate of the C→T transition and causing CpGs to be 5 times less frequent than expected. This can be modelled by a Markov chain along the sequence, on the state space {A, C, G, T}. [State diagram: Start, A, C, G, T, End; blue = lower-probability transition.]

Example: mutation process. Let $X_i \in \{A, C, G, T\}$ be the nucleotide state at some site at time $i\,\Delta t$. Then $f(x_k \mid x_{k-1}) = A_{x_{k-1} x_k}$ with
$$A = \begin{pmatrix} 1-3\epsilon & \epsilon & \epsilon & \epsilon \\ \epsilon & 1-3\epsilon & \epsilon & \epsilon \\ \epsilon & \epsilon & 1-3\epsilon & \epsilon \\ \epsilon & \epsilon & \epsilon & 1-3\epsilon \end{pmatrix}$$
(rows and columns ordered A, C, G, T), so that
$$p(x_{2:n} \mid x_1) = \prod_{k=2}^{n} A_{x_{k-1} x_k}; \qquad p(x_n \mid x_1) = \sum_{x_{2:n-1}} \prod_{k=2}^{n} A_{x_{k-1} x_k} = (A^{n-1})_{x_1 x_n}$$
Let $A = I + B\,\Delta t$ ($B$ is the rate matrix), $\Delta t = 1/n$, and let $n \to \infty$; then
$$p(x_n \mid x_1) = \left( (I + B/n)^{n-1} \right)_{x_1 x_n} \to \exp(B)_{x_1 x_n}$$
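A minimal numerical sketch (not from the slides) of this limit for the Jukes-Cantor-style rate matrix above: the matrix exponential $\exp(Bt)$ computed with SciPy is compared with the discrete-time approximation $(I + Bt/n)^n$. The rate $\epsilon$ and the elapsed time $t$ are illustrative values.

```python
# Jukes-Cantor-style mutation process: compare exp(B t) with (I + B t/n)^n.
# The rate eps and the elapsed time t are illustrative, not from the talk.
import numpy as np
from scipy.linalg import expm

eps = 0.1                                          # substitution rate (assumed)
B = eps * (np.ones((4, 4)) - 4 * np.eye(4))        # rate matrix: off-diagonal eps, rows sum to 0
t = 1.0                                            # elapsed time (assumed)

P_exact = expm(B * t)                              # continuous-time transition probabilities
n = 10_000
P_approx = np.linalg.matrix_power(np.eye(4) + B * t / n, n)

print(np.allclose(P_exact, P_approx, atol=1e-4))   # True: the limit argument above
print(P_exact[0])                                  # p(x_n | x_1 = A) for A, C, G, T
```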

Hidden Markov models

Hidden Markov models. Suppose that $\{X_k\}_{k \geq 1}$ is not observed (it is hidden), but that we do observe a related process $\{Y_k\}_{k \geq 1}$. Conditional on $\{X_k\}_{k \geq 1}$ the observations $\{Y_k\}_{k \geq 1}$ are independent, and marginally distributed as
$$Y_k \mid (X_k = x_k) \sim g(\cdot \mid x_k)$$
This implies that conditional on $\{X_k\}_{k \geq 1}$ we have
$$p(y_{1:n} \mid x_{1:n}) = \prod_{k=1}^{n} g(y_k \mid x_k)$$

Example: weather model. [Diagram: hidden states H and L with the transition probabilities from before (6/7, 1/7, 1/3, 2/3) and state-dependent emission probabilities (9/10, 1/10, 4/5, 1/5, etc.) for the observed weather symbols.]
Markov chain: move from state to state according to the transition probabilities $f(\cdot \mid \cdot)$; the observations are the states visited.
Hidden Markov model: move between states (H, L) according to a Markov chain as before, but emit the observation according to a probability distribution $g(\cdot \mid \cdot)$ instead.

Example: weather model.
Hidden Markov chain: $p(HHHLLL) = \mu(H)\, f(H \mid H)\, f(H \mid H)\, f(L \mid H)\, f(L \mid L)\, f(L \mid L)$
Observations: $p(y_{1:6}, HHHLLL) = p(HHHLLL)\, g(y_1 \mid H)\, g(y_2 \mid H)\, g(y_3 \mid H)\, g(y_4 \mid L)\, g(y_5 \mid L)\, g(y_6 \mid L)$, where $y_{1:6}$ are the observed weather symbols.
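A minimal sketch (not from the slides) of this joint probability for a two-state weather HMM. The initial distribution, emission probabilities and observation alphabet are illustrative assumptions; only the transition probabilities 6/7, 1/7, 1/3, 2/3 echo the earlier Markov-chain example.

```python
# Joint probability of a state path and an observation sequence,
# p(x_{1:n}, y_{1:n}) = mu(x_1) g(y_1|x_1) prod_k f(x_k|x_{k-1}) g(y_k|x_k).
import numpy as np

states = ["H", "L"]
symbols = ["sun", "rain"]                      # hypothetical observation alphabet

mu = np.array([0.5, 0.5])                      # initial distribution (assumed)
f = np.array([[6/7, 1/7],                      # transition probabilities f[i, j] = p(j | i)
              [1/3, 2/3]])
g = np.array([[0.9, 0.1],                      # emission probabilities g[i, k] = p(symbol k | state i)
              [0.2, 0.8]])                     # (assumed values)

def joint_prob(path, obs):
    s = [states.index(c) for c in path]
    o = [symbols.index(y) for y in obs]
    p = mu[s[0]] * g[s[0], o[0]]
    for k in range(1, len(s)):
        p *= f[s[k-1], s[k]] * g[s[k], o[k]]
    return p

print(joint_prob("HHHLLL", ["sun"] * 3 + ["rain"] * 3))
```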

Hidden Markov models: questions you may want to ask.
What is the likelihood of the observations? $p(y_{1:n}) = \sum_{x_{1:n}} p(x_{1:n}, y_{1:n})$ (Forward algorithm)
What is the posterior probability of a particular state given the observations? $p(x_k \mid y_{1:n}) = \frac{\sum_{x_{1:k-1}} \sum_{x_{k+1:n}} p(x_{1:n}, y_{1:n})}{\sum_{x_{1:n}} p(x_{1:n}, y_{1:n})}$ (Forward + Backward algorithms)
What is the single most likely state sequence given the observations? $\arg\max_{x_{1:n}} p(x_{1:n}, y_{1:n})$ (Viterbi algorithm)

Example: posterior probability of a particular state.
$$p(x_k \mid y_{1:n}) = \frac{p(x_k, y_{1:n})}{\sum_{x_k} p(x_k, y_{1:n})}$$
$$\begin{aligned}
p(x_k, y_{1:n}) &= \sum_{x_{1:k-1}} \sum_{x_{k+1:n}} p(x_{1:n}, y_{1:n}) \\
&= \sum_{x_{1:k-1}} \sum_{x_{k+1:n}} p(x_{1:k}, y_{1:k})\, p(x_{k+1:n}, y_{k+1:n} \mid x_{1:k}, y_{1:k}) \\
&= \sum_{x_{1:k-1}} \sum_{x_{k+1:n}} p(x_{1:k}, y_{1:k})\, p(x_{k+1:n}, y_{k+1:n} \mid x_k) \\
&= p(x_k, y_{1:k})\, p(y_{k+1:n} \mid x_k) := \alpha_k(x_k)\, \beta_k(x_k)
\end{aligned}$$

Example: posterior probability of a particular state (forward recursion).
$$\alpha_k(x_k) := p(x_k, y_{1:k}), \qquad \alpha_1(x_1) = p(x_1, y_1) = \mu(x_1)\, g(y_1 \mid x_1)$$
$$\begin{aligned}
\alpha_{k+1}(x_{k+1}) &= \sum_{x_{1:k}} p(x_{1:k+1}, y_{1:k+1}) \\
&= \sum_{x_{1:k}} p(x_{1:k}, y_{1:k})\, f(x_{k+1} \mid x_k)\, g(y_{k+1} \mid x_{k+1}) \\
&= \sum_{x_k} \Big( \sum_{x_{1:k-1}} p(x_{1:k}, y_{1:k}) \Big)\, f(x_{k+1} \mid x_k)\, g(y_{k+1} \mid x_{k+1}) \\
&= \sum_{x_k} \alpha_k(x_k)\, f(x_{k+1} \mid x_k)\, g(y_{k+1} \mid x_{k+1})
\end{aligned}$$
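A minimal sketch (not from the slides) of the forward recursion just derived, using the same illustrative two-state HMM parameters as above; summing the final forward values gives the likelihood $p(y_{1:n})$.

```python
# Forward recursion: alpha_{k+1}(x) = sum_{x'} alpha_k(x') f(x|x') g(y_{k+1}|x).
# Parameters are the illustrative two-state weather-HMM values assumed earlier.
import numpy as np

mu = np.array([0.5, 0.5])                  # initial distribution (assumed)
f = np.array([[6/7, 1/7], [1/3, 2/3]])     # transition probabilities
g = np.array([[0.9, 0.1], [0.2, 0.8]])     # emission probabilities (assumed)

def forward(mu, f, g, obs):
    """alpha[k, x] = p(x_{k+1} = x, y_{1:k+1}) in 0-based indexing."""
    n = len(obs)
    alpha = np.zeros((n, len(mu)))
    alpha[0] = mu * g[:, obs[0]]                    # alpha_1(x) = mu(x) g(y_1 | x)
    for k in range(1, n):
        alpha[k] = (alpha[k-1] @ f) * g[:, obs[k]]  # sum over the previous state
    return alpha

obs = [0, 0, 0, 1, 1, 1]                   # observation symbol indices
alpha = forward(mu, f, g, obs)
print(alpha[-1].sum())                     # p(y_{1:n}): the likelihood (forward algorithm)
```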

Example: posterior probability of a particular state (backward recursion).
$$\beta_k(x_k) := p(y_{k+1:n} \mid x_k), \qquad \beta_n(x_n) = 1$$
$$\begin{aligned}
\beta_{k-1}(x_{k-1}) &= p(y_{k:n} \mid x_{k-1}) = \sum_{x_k} p(x_k \mid x_{k-1})\, p(y_k \mid x_k)\, p(y_{k+1:n} \mid x_k) \\
&= \sum_{x_k} f(x_k \mid x_{k-1})\, g(y_k \mid x_k)\, \beta_k(x_k)
\end{aligned}$$
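A minimal sketch (not from the slides) combining the forward and backward recursions to obtain the state posteriors $p(x_k \mid y_{1:n}) = \alpha_k(x_k)\,\beta_k(x_k)/p(y_{1:n})$, again with the illustrative parameter values assumed above.

```python
# Forward-backward: state posteriors p(x_k | y_{1:n}) = alpha_k beta_k / p(y_{1:n}).
# Parameters are the same illustrative weather-HMM values assumed above.
import numpy as np

mu = np.array([0.5, 0.5])
f = np.array([[6/7, 1/7], [1/3, 2/3]])
g = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 0, 1, 1, 1]

n, S = len(obs), len(mu)

alpha = np.zeros((n, S))
alpha[0] = mu * g[:, obs[0]]
for k in range(1, n):
    alpha[k] = (alpha[k-1] @ f) * g[:, obs[k]]      # forward recursion

beta = np.ones((n, S))                              # beta_n(x) = 1
for k in range(n - 2, -1, -1):
    beta[k] = f @ (g[:, obs[k+1]] * beta[k+1])      # sum over the next hidden state

likelihood = alpha[-1].sum()                        # p(y_{1:n})
posterior = alpha * beta / likelihood               # p(x_k | y_{1:n}); each row sums to 1
print(posterior)
```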

Summary of useful HMM algorithms:
- Sampling from the prior (trivial)
- Forward or Backward for the likelihood: $p(y_{1:n}) = \sum_{x_n} \alpha_n(x_n) = \sum_{x_1} \alpha_1(x_1)\, \beta_1(x_1)$
- Forward and Backward together for:
  - state posteriors: $p(x_k \mid y_{1:n}) = \alpha_k(x_k)\, \beta_k(x_k) / p(y_{1:n})$
  - sampling state paths from the posterior
  - Expectation-Maximization (Baum-Welch) to estimate parameters
  - posterior decoding (MAP paths)
- Viterbi for the single most likely state path, $\arg\max_{x_{1:n}} p(x_{1:n} \mid y_{1:n})$
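A minimal sketch (not from the slides) of the Viterbi algorithm listed above, computed in log space for numerical stability, with the same illustrative two-state parameters.

```python
# Viterbi: the single most likely state path argmax_x p(x_{1:n}, y_{1:n}).
# Parameters are the illustrative weather-HMM values assumed above.
import numpy as np

mu = np.array([0.5, 0.5])
f = np.array([[6/7, 1/7], [1/3, 2/3]])
g = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 0, 1, 1, 1]

log_mu, log_f, log_g = np.log(mu), np.log(f), np.log(g)
n, S = len(obs), len(mu)

v = np.zeros((n, S))               # v[k, x] = best log joint prob of a path ending in state x at step k
ptr = np.zeros((n, S), dtype=int)  # back-pointers for the traceback
v[0] = log_mu + log_g[:, obs[0]]
for k in range(1, n):
    scores = v[k-1][:, None] + log_f          # scores[i, j]: best path ending in i, then i -> j
    ptr[k] = scores.argmax(axis=0)
    v[k] = scores.max(axis=0) + log_g[:, obs[k]]

# Traceback of the most likely state path
path = [int(v[-1].argmax())]
for k in range(n - 1, 0, -1):
    path.append(int(ptr[k][path[-1]]))
path.reverse()
print(path)                        # [0, 0, 0, 1, 1, 1] for these observations
```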

Examples

Example 1: Motif finding
hx0a  ( 173 ) ...dyvrsmiadylnklid-igvagfridaskhmw...
1smd  ( 173 ) ...dyvrskiaeymnhlid-igvagfridaskhmw...
1jae  ( 161 ) ...dyvrgvlidymnhmid-lgvagfrvdaakhms...
1g94a ( 150 ) ...nyvqntiaayindlqa-igvkgfrfdaskhva...
1bag  ( 152 ) ...tqvqsylkrfleraln-dgadgfrfdaakhie...
1smaa ( 303 ) ...pevkrylldvatywirefdidgwrldvaneid...
1bvza ( 301 ) ...pevkeylfdvarfwm-eqgidgwrldvanevd...
1uok  ( 175 ) ...ekvrqdvyemmkfwle-kgidgfrmdvinfis...
2aaa  ( 181 ) ...tavrtiwydwvadlvsnysvdglridsvlevq...
7taa  ( 181 ) ...dvvknewydwvgslvsnysidglridtvkhvq...
1cgt  ( 205 ) ...atidkyfkdaiklwld-mgvdgirvdavkhmp...
1ciu  ( 206 ) ...stidsylksaikvwld-mgidgirldavkhmp...
1cyg  ( 201 ) ...pvidrylkdavkmwid-mgidgirmdavkhmp...
1qhpa ( 204 ) ...gtiaqyltdaavqlva-hgadglridavkhfn...
1hvxa ( 209 ) ...pevvtelkswgkwyvnttnidgfrldavkhik...
1vjs  ( 206 ) ...pdvaaeikrwgtwyanelqldgfrldavkhik...
1gcya ( 168 ) ...pqvygmfrdeftnlrsqygaggfrfdfvrgya...
1avaa ( 154 ) ...lrvqkelvewlnwlkadigfdgwrfdfakgys...
1ehaa ( 227 ) ...devrkfilenveywikeynvdgfrldavhaii...
1bf2  ( 350 ) ...tvaqnlivdslaywantmgvdgfrfdlasvlg...
1gjwa ( 360 ) ...relweylagviphyqkkygidgarldmghalp...

Example 1: Motif finding. A motif modeled as an ungapped weight matrix can be represented as an HMM. We can ask for a local alignment by adding padding states at the beginning and the end. [Diagram: Start and End states, flanking padding (X) states, and a chain of six match states with position-specific emission probabilities (e.g. A: 0.8, G: 0.2 at the first position).]

Example 1: Motif finding. Not all related motifs have exactly the same length; some may lack certain residues. This is modeled by introducing delete states into the HMM. [Diagram: the match-state chain as before, with silent delete states that allow individual positions to be skipped.] The transition probabilities to/from delete states are position-dependent: the probability of deleting a particular nucleotide depends on the location within the motif.

Example 1: Motif finding. Similarly, some motifs may have extra residues, which are modeled with insert states. [Diagram: the match-state chain as before, with insert (X) states between match positions.]

Example 1: Motif finding. We can add a loop-back transition to allow for multiple consecutive matches (think e.g. zinc-finger proteins). [Diagram: the profile as before, with a loop-back transition so that the motif can be matched repeatedly.]

Example 1: Motif finding. This is the profile HMM architecture used in the SAM/HMMER packages. [Diagram: profile HMM with match, insert and delete states.] In this context, the standard algorithms achieve the following:
- Viterbi: alignment of a sequence to the HMM
- Forward: likelihood that a sequence contains the motif
- Forward-Backward: posterior expected state/transition counts
- Baum-Welch uses these expectations to maximise the likelihood of a given training set (see the sketch below)
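A minimal sketch (not from the slides, and for a plain discrete-emission HMM rather than a profile HMM) of one Baum-Welch re-estimation step built from the forward-backward expectations. All parameter values are illustrative assumptions.

```python
# One Baum-Welch (EM) step: forward-backward gives expected state and transition
# counts, which are used to re-estimate the transition and emission parameters.
import numpy as np

def forward_backward(mu, f, g, obs):
    n, S = len(obs), len(mu)
    alpha = np.zeros((n, S)); beta = np.ones((n, S))
    alpha[0] = mu * g[:, obs[0]]
    for k in range(1, n):
        alpha[k] = (alpha[k-1] @ f) * g[:, obs[k]]
    for k in range(n - 2, -1, -1):
        beta[k] = f @ (g[:, obs[k+1]] * beta[k+1])
    return alpha, beta, alpha[-1].sum()

def baum_welch_step(mu, f, g, obs):
    alpha, beta, like = forward_backward(mu, f, g, obs)
    n, S = len(obs), len(mu)
    gamma = alpha * beta / like                      # p(x_k | y_{1:n}): expected state counts
    xi = np.zeros((S, S))                            # expected transition counts
    for k in range(n - 1):
        xi += (alpha[k][:, None] * f * (g[:, obs[k+1]] * beta[k+1])[None, :]) / like
    new_f = xi / xi.sum(axis=1, keepdims=True)
    new_g = np.zeros_like(g)
    for sym in range(g.shape[1]):
        new_g[:, sym] = gamma[np.array(obs) == sym].sum(axis=0)
    new_g /= new_g.sum(axis=1, keepdims=True)
    return gamma[0], new_f, new_g, like              # new mu, f, g, and current likelihood

# A few EM iterations on illustrative data; the printed likelihood never decreases.
mu = np.array([0.5, 0.5])
f = np.array([[6/7, 1/7], [1/3, 2/3]])
g = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1, 1, 1, 0, 0, 1]
for _ in range(5):
    mu, f, g, like = baum_welch_step(mu, f, g, obs)
    print(like)
```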

Example 2: Gene finding (source unknown)

Example 2: Gene finding Burge and Karlin, JMB 1998

Example 2: Gene finding UCSC genome browser

Example 3: PhyloHMM Siepel et al., Genome Research 2005

Example 4: Alignment. Observation: two sequences, GAATTCGA and GCATCGA. Required: an alignment, i.e. a sequence of alignment columns:
########
####-###
GAATTCGA
GCAT-CGA

Example 4: Alignment. To fit into the HMM framework:
- allow two sequences to be emitted simultaneously
- allow states with empty emissions
The Markov chain $\{X_i\}_{i \geq 1}$ is the sequence of alignment columns: (#/#), (#/#), (#/#), (#/#), (#/-), (#/#), (#/#), (#/#). Nucleotides emitted together are correlated (homologous). $\alpha$ and $\beta$ now have two indices; computing them involves traversing a 2-dimensional dynamic programming table (see the sketch below).
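A minimal sketch (not from the slides) of a pair HMM of the kind described here: states M (emit a column #/#), X (emit #/-) and Y (emit -/#), with a Viterbi recursion over a two-dimensional dynamic programming table. The affine-gap transition and emission parameters are illustrative assumptions, not the model used in the talk.

```python
# Pair-HMM Viterbi over a 2-dimensional DP table: best log probability of an
# alignment of two sequences under an assumed symmetric affine-gap model.
import numpy as np

def pair_hmm_viterbi(x, y, delta=0.05, eps=0.4, p_match=0.7):
    nx, ny = len(x), len(y)
    p_mis = (1.0 - p_match) / 3.0              # emission prob of a mismatching pair (assumed)
    q = 0.25                                   # emission prob of a single symbol in X/Y states
    NEG = -np.inf
    # v[state, i, j]: best log prob of aligning x[:i] with y[:j], ending in state (0=M, 1=X, 2=Y)
    v = np.full((3, nx + 1, ny + 1), NEG)
    v[0, 0, 0] = 0.0
    lm = np.log(1 - 2 * delta)                 # stay in M
    ld = np.log(delta)                         # open a gap
    le = np.log(eps)                           # extend a gap
    lc = np.log(1 - eps)                       # close a gap (return to M)
    for i in range(nx + 1):
        for j in range(ny + 1):
            if i > 0 and j > 0:                # M: emit the pair (x[i-1], y[j-1])
                e = np.log(p_match if x[i-1] == y[j-1] else p_mis)
                v[0, i, j] = e + max(v[0, i-1, j-1] + lm,
                                     v[1, i-1, j-1] + lc,
                                     v[2, i-1, j-1] + lc)
            if i > 0:                          # X: emit x[i-1] against a gap
                v[1, i, j] = np.log(q) + max(v[0, i-1, j] + ld, v[1, i-1, j] + le)
            if j > 0:                          # Y: emit y[j-1] against a gap
                v[2, i, j] = np.log(q) + max(v[0, i, j-1] + ld, v[2, i, j-1] + le)
    return v[:, nx, ny].max()                  # log probability of the best alignment path

print(pair_hmm_viterbi("GAATTCGA", "GCATCGA"))
```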

Example 5: co-estimating alignment and conservation. States: M = (#/#); Ins = (-/#); Del = (#/-).

Example 5: co-estimating alignment and conservation

Example 6: Probabilistic progressive alignment. Problem: how to align more than 2 sequences?
Naive HMM implementation:
- properly accounts for uncertainty (in the alignment, though not in the tree)
- complexity $O(L^N)$; the DP table has $N$ dimensions
Progressive alignment (pairwise alignments + inferring the sequence at the root):
- practical approach; complexity $O(N L^2)$
- inferences are biased and overconfident
- e.g. PRANK; Loytynoja & Goldman 2005

Example 6: Probabilistic progressive alignment. Solution: combine the progressive and probabilistic approaches.
- Represent the ancestral sequence $a$ of a pair of descendant sequences $s_1, s_2$ as a partial likelihood, with $a$ as parameter: $P(s_1, s_2 \mid a)$
- Prune unlikely alignment columns, and represent the remainder of the dynamic programming table as a graph
- Iterate, aligning/pruning graphs progressively up the tree
- At the root, use the prior distribution on $a$ to find the multiple alignment
The algorithm can be formalized in terms of transducers. Details: Westesson & Holmes, arXiv:1103.4347v2; PLoS ONE e34572.

Example 6: Probabilistic progressive alignment

Example 7: Lander-Green (phasing in pedigrees)

Example 7: Lander-Green (phasing in pedigrees).
- Transmission in a pedigree with $n$ non-founders is determined by $2n$ bits: 2 bits identify the grandparent of origin for the paternal and maternal chromosome of each non-founder.
- The transmission vector is the state of the HMM ($2^{2n}$ states).
- State changes (single bit flips) correspond to recombinations.
- The state determines which observed genotypes are more or less likely; some would require more than one mutation per site.
Lander and Green, PNAS, 1987

Example 8: Li and Stephens (phasing in populations) Li and Stephens, Genetics 2003

Intermezzo: the Wright-Fisher model

Example 9: CoalHMM and incomplete lineage sorting. Hobolth, Dutheil, Hawks, Schierup, Mailund (2011) Genome Research

Example 10: PSMC and demographic inference Li and Durbin, Nature 2011

Journal club

Papers:
- Hobolth, Christensen, Mailund, Schierup (2007) Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet 3(2):e7.
- Li and Durbin (2011) Inference of human population history from individual whole-genome sequences. Nature 475:493-496.
- Lunter, Rocco, Mimouni, Heger, Caldeira, Hein (2008) Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res 18(2):298-309.
- Scheet P and Stephens M (2006) A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78(4).
- Lander and Green (1987) PNAS 84:2363-2367; and Kruglyak, Daly, Reeve-Daly, Lander (1996) AJHG 58:1347-1363.

Questions - Hobolth et al.
- Explain the difference between phylogeny and genealogy, and the concept of incomplete lineage sorting.
- What do the states of the HMM represent?
- What are informative sites for the model?
- The model can in principle be applied to any quartet of species. What aspect of the shape of the phylogeny relating the species, and what other parameters (if any), are relevant to assess whether the model might provide useful inferences?

Questions - Li and Durbin
- What do the states of the HMM represent?
- At any locus, the density of heterozygous sites determines which state is currently most likely. On average, in humans, how many heterozygous sites occur between state switches? Do you think the data are very informative about the HMM state at any given position?
- What limits the power to infer $N_e$ at recent and at ancient times?

Questions - Lunter et al.
- List some causes of inaccuracies in alignments. Would a more accurate model of sequence evolution improve alignments? Is model misfit the main cause of alignment inaccuracies?
- What is the practical limit (in terms of evolutionary distance, in mutations/site) for pairwise alignment of DNA? Would multiple alignment allow DNA from more divergent species to be aligned? How can divergence be assessed by alignment for species that are more divergent?
- What is posterior decoding and how does it work? In what way does it improve alignments compared to Viterbi decoding, and why?