Hidden Markov models in population genetics and evolutionary biology

1 Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013

2 Topics for today
Markov chains
Hidden Markov models
Examples
  Sequence features (genes, domains)
  Sequence evolution (alignment, conserved elements)
  Population genetics (phasing; demographic inference)
Journal club

3 Markov chains

4 Markov chains
Suppose a stochastic process of interest is modelled as a discrete-time process $\{X_i\}_{i \ge 1}$. This process is a Markov process if it is characterized by
$X_1 \sim \mu(\cdot)$ (the initial distribution)
$X_k \mid (X_{k-1} = x_{k-1}) \sim f(\cdot \mid x_{k-1})$ (the transition probabilities)
Notation: $x_{i:j} = (x_i, x_{i+1}, \ldots, x_{j-1}, x_j)$
$p(x_{1:n}) = p(x_1) \prod_{k=2}^{n} p(x_k \mid x_{1:k-1}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})$
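
The factorization $p(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})$ can be evaluated directly. The following is a minimal sketch in Python; the two-state chain, its state names and all probabilities are hypothetical values chosen for illustration and do not come from the slides.

```python
import math

def chain_log_prob(states, mu, f):
    """log p(x_{1:n}) = log mu(x_1) + sum_{k=2}^n log f(x_k | x_{k-1})."""
    logp = math.log(mu[states[0]])
    for prev, cur in zip(states, states[1:]):
        logp += math.log(f[prev][cur])
    return logp

# Hypothetical two-state chain (illustrative numbers only).
mu = {"sun": 0.5, "rain": 0.5}              # initial distribution mu(.)
f = {                                        # f[prev][cur] = f(cur | prev)
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

path = ["sun", "sun", "rain", "rain"]
prob = math.exp(chain_log_prob(path, mu, f))
print(prob)  # mu(sun) * f(sun|sun) * f(rain|sun) * f(rain|rain)
```

Working in log space is a design choice rather than a requirement: for long sequences the product of many probabilities underflows, while the sum of logs does not.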

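7 Example: a weather model
Modeling the observation that today's weather is likely to be similar to yesterday's: $f(x_k \mid x_{k-1}) = 1/7$ for one pair of weather states, etc. [weather symbols not recoverable]
For a six-day sequence $x_{1:6}$: $p(x_{1:6}) = \mu(x_1)\, f(x_2 \mid x_1)\, f(x_3 \mid x_2)\, f(x_4 \mid x_3)\, f(x_5 \mid x_4)\, f(x_6 \mid x_5)$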

8 Example: CpG frequency in mammalian genomes
In mammals, the C in CpG dinucleotides is often methylated, increasing the rate of the C→T transition, and causing CpGs to be 5 times less frequent than expected.
This can be modelled by a Markov chain along the sequence, on the state space {A, C, G, T}.
[Figure: state diagram with states Start, A, C, G, T, End. Blue = lower-probability transition.]

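9 Example: mutation process
Let $X_i \in \{A, C, G, T\}$ be the nucleotide state at some site at time $i\,\Delta t$.
$f(x_k \mid x_{k-1}) = A_{x_{k-1} x_k}$ with rows and columns ordered A, C, G, T:
$A = \begin{pmatrix} 1-3\epsilon & \epsilon & \epsilon & \epsilon \\ \epsilon & 1-3\epsilon & \epsilon & \epsilon \\ \epsilon & \epsilon & 1-3\epsilon & \epsilon \\ \epsilon & \epsilon & \epsilon & 1-3\epsilon \end{pmatrix}$
$p(x_{2:n} \mid x_1) = \prod_{k=2}^{n} A_{x_{k-1} x_k}$
$p(x_n \mid x_1) = \sum_{x_{2:n-1}} \prod_{k=2}^{n} A_{x_{k-1} x_k} = (A^{n-1})_{x_1 x_n}$
Let $A = I + B\,\Delta t$ ($B$ is the rate matrix), $\Delta t = 1/n$ and let $n \to \infty$, then
$p(x_n \mid x_1) = \left( (I + B/n)^{n-1} \right)_{x_1 x_n} \to \exp(B)_{x_1 x_n}$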
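
The limit $(I + B/n)^{n-1} \to \exp(B)$ can be checked numerically. The sketch below assumes a rate matrix $B$ with the same symmetric shape as $A$ (off-diagonal rate `r`, diagonal `-3r`); the value of `r` is an illustrative choice, not taken from the slides. For this particular $B$ the matrix exponential has a closed form, with diagonal entries $1/4 + (3/4)e^{-4r}$ and off-diagonal entries $1/4 - (1/4)e^{-4r}$, which is used as the reference.

```python
import math

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, m):
    """P**m by repeated squaring (m >= 0)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while m > 0:
        if m & 1:
            result = matmul(result, P)
        P = matmul(P, P)
        m >>= 1
    return result

def rate_matrix(r):
    """Rate matrix B: off-diagonal rate r, diagonal -3r (order A, C, G, T)."""
    return [[-3.0 * r if i == j else r for j in range(4)] for i in range(4)]

def discrete_chain(B, n):
    """(I + B/n)^(n-1): n-1 steps of the discrete chain with step 1/n."""
    A = [[(1.0 if i == j else 0.0) + B[i][j] / n for j in range(4)]
         for i in range(4)]
    return matpow(A, n - 1)

def exp_rate_matrix(r):
    """Closed form of exp(B) for the symmetric rate matrix above."""
    same = 0.25 + 0.75 * math.exp(-4.0 * r)
    diff = 0.25 - 0.25 * math.exp(-4.0 * r)
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

r = 0.1  # illustrative rate (assumption)
limit = exp_rate_matrix(r)
errors = {}
for n in (10, 100, 10000):
    approx = discrete_chain(rate_matrix(r), n)
    errors[n] = max(abs(approx[i][j] - limit[i][j])
                    for i in range(4) for j in range(4))
    print(n, errors[n])  # the maximum entrywise error shrinks as n grows
```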

14 Hidden Markov models

15 Hidden Markov models
Suppose that $\{X_k\}_{k \ge 1}$ is not observed (it is hidden), but that we do observe a related process $\{Y_k\}_{k \ge 1}$.
Conditional on $\{X_k\}_{k \ge 1}$ the observations $\{Y_k\}_{k \ge 1}$ are independent, and marginally distributed as
$Y_k \mid (X_k = x_k) \sim g(\cdot \mid x_k)$
This implies that conditional on $\{X_k\}_{k \ge 1}$ we have
$p(y_{1:n} \mid x_{1:n}) = \prod_{k=1}^{n} g(y_k \mid x_k)$

18 Example: Weather model
[Figure: state diagrams with probability labels; the values 9/10 and 3/10 survive, the weather symbols do not.]
Markov chain: Move from state to state according to the transition probabilities $f(\cdot \mid \cdot)$. Observations are the states visited.
Hidden Markov model: Move between states (H, L) according to a Markov chain as before, but emit the observation (a weather symbol) according to a probability distribution $g(\cdot \mid \cdot)$ instead.

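19 Example: Weather model
Markov chain (hidden): $p(HHHLLL) = \mu(H)\, f(H \mid H)\, f(H \mid H)\, f(L \mid H)\, f(L \mid L)\, f(L \mid L)$
Observations $y_{1:6}$ (the weather symbols): $p(y_{1:6}, HHHLLL) = p(HHHLLL)\, g(y_1 \mid H)\, g(y_2 \mid H)\, g(y_3 \mid H)\, g(y_4 \mid L)\, g(y_5 \mid L)\, g(y_6 \mid L)$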
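
The joint probability of a hidden path and its observations is the chain probability times one emission term per position. A minimal sketch follows, assuming a hypothetical parameterization of the H/L weather model: the observation labels `"sun"` and `"rain"` and every number below are illustrative stand-ins, since the slide's symbols and values are not recoverable.

```python
def joint_prob(states, obs, mu, f, g):
    """p(x_{1:n}, y_{1:n}) = mu(x_1) g(y_1|x_1) * prod_k f(x_k|x_{k-1}) g(y_k|x_k)."""
    p = mu[states[0]] * g[states[0]][obs[0]]
    for k in range(1, len(states)):
        p *= f[states[k - 1]][states[k]] * g[states[k]][obs[k]]
    return p

# Hypothetical parameters (assumptions, not taken from the slides).
mu = {"H": 0.5, "L": 0.5}
f = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.3, "L": 0.7}}   # f[prev][cur]
g = {"H": {"sun": 0.8, "rain": 0.2}, "L": {"sun": 0.3, "rain": 0.7}}  # g[state][obs]

hidden = list("HHHLLL")
observed = ["sun", "sun", "sun", "rain", "rain", "rain"]
p = joint_prob(hidden, observed, mu, f, g)
print(p)
```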

23 Hidden Markov models
Questions you may want to ask:
What is the likelihood of observations (Forward algorithm):
$p(y_{1:6}) = \sum_{x_{1:6}} p(x_{1:6}, y_{1:6})$
What is the posterior probability of a particular state given observations (Forward + Backward algorithms):
$p(x_k \mid y_{1:n}) = \frac{\sum_{x_{1:k-1}} \sum_{x_{k+1:n}} p(x_{1:n}, y_{1:n})}{\sum_{x_{1:n}} p(x_{1:n}, y_{1:n})}$
What is the single most likely state sequence given observations (Viterbi algorithm):
$\arg\max_{x_{1:n}} p(x_{1:n}, y_{1:n})$

24 Example: posterior probability of a particular state
$p(x_k \mid y_{1:n}) = \frac{p(x_k, y_{1:n})}{\sum_{x_k} p(x_k, y_{1:n})}$
$p(x_k, y_{1:n}) = \sum_{x_{1:k-1}} \sum_{x_{k+1:n}} p(x_{1:n}, y_{1:n})$
$= \sum_{x_{1:k-1}} \sum_{x_{k+1:n}} p(x_{1:k}, y_{1:k})\, p(x_{k+1:n}, y_{k+1:n} \mid x_{1:k}, y_{1:k})$
$= \sum_{x_{1:k-1}} p(x_{1:k}, y_{1:k}) \sum_{x_{k+1:n}} p(x_{k+1:n}, y_{k+1:n} \mid x_k)$
$= p(x_k, y_{1:k})\, p(y_{k+1:n} \mid x_k) := \alpha_k(x_k)\, \beta_k(x_k)$

30 Example: posterior probability of a particular state
$\alpha_k(x_k) := p(x_k, y_{1:k})$
$\alpha_1(x_1) = p(x_1, y_1) = \mu(x_1)\, g(y_1 \mid x_1)$
$\alpha_{k+1}(x_{k+1}) = \sum_{x_{1:k}} p(x_{1:k+1}, y_{1:k+1})$
$= \sum_{x_{1:k-1}} \sum_{x_k} p(x_{1:k}, y_{1:k})\, f(x_{k+1} \mid x_k)\, g(y_{k+1} \mid x_{k+1})$
$= \sum_{x_k} \left[ \sum_{x_{1:k-1}} p(x_{1:k}, y_{1:k}) \right] f(x_{k+1} \mid x_k)\, g(y_{k+1} \mid x_{k+1})$
$= \sum_{x_k} \alpha_k(x_k)\, f(x_{k+1} \mid x_k)\, g(y_{k+1} \mid x_{k+1})$
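
The recursion for $\alpha_k$ translates almost line by line into code. The sketch below is one possible implementation, not the lecturer's; the two-state model, the observation labels and all probabilities are hypothetical. Lists are 0-based, so `alpha[k]` holds $\alpha_{k+1}$.

```python
def forward(obs, states, mu, f, g):
    """Return [alpha_1, ..., alpha_n] with alpha_k[x] = p(x_k = x, y_{1:k})."""
    # Initialisation: alpha_1(x_1) = mu(x_1) g(y_1 | x_1)
    alpha = [{x: mu[x] * g[x][obs[0]] for x in states}]
    # Recursion: alpha_{k+1}(x') = g(y_{k+1} | x') * sum_x alpha_k(x) f(x' | x)
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            x_next: g[x_next][y] * sum(prev[x] * f[x][x_next] for x in states)
            for x_next in states
        })
    return alpha

# Hypothetical parameters (assumptions for illustration only).
states = ["H", "L"]
mu = {"H": 0.5, "L": 0.5}
f = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.3, "L": 0.7}}   # f[prev][cur]
g = {"H": {"sun": 0.8, "rain": 0.2}, "L": {"sun": 0.3, "rain": 0.7}}

obs = ["sun", "sun", "sun", "rain", "rain", "rain"]
alpha = forward(obs, states, mu, f, g)
likelihood = sum(alpha[-1].values())   # p(y_{1:n}) = sum_{x_n} alpha_n(x_n)
print(likelihood)
```

The cost is proportional to $n$ times the square of the number of states, compared with a sum over exponentially many paths in the definition. For long sequences the $\alpha_k$ values underflow, so practical implementations rescale each step or work in log space; that is omitted here to keep the correspondence with the formulas visible.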

36 Example: posterior probability of a particular state
$\beta_k(x_k) := p(y_{k+1:n} \mid x_k)$
$\beta_n(x_n) = p(\emptyset \mid x_n) = 1$
$\beta_{k-1}(x_{k-1}) = p(y_{k:n} \mid x_{k-1})$
$= \sum_{x_k} p(x_k \mid x_{k-1})\, p(y_k \mid x_k)\, p(y_{k+1:n} \mid x_k)$
$= \sum_{x_k} f(x_k \mid x_{k-1})\, g(y_k \mid x_k)\, \beta_k(x_k)$
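
Combining the backward recursion with the forward quantities $\alpha_k(x_k) = p(x_k, y_{1:k})$ gives the state posteriors $p(x_k \mid y_{1:n}) = \alpha_k(x_k) \beta_k(x_k) / p(y_{1:n})$. The following is a minimal sketch under assumed, purely illustrative parameters; function and variable names are hypothetical. Lists are 0-based, so `beta[k]` holds $\beta_{k+1}$.

```python
def forward(obs, states, mu, f, g):
    """alpha_k[x] = p(x_k = x, y_{1:k})."""
    alpha = [{x: mu[x] * g[x][obs[0]] for x in states}]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            x_next: g[x_next][y] * sum(prev[x] * f[x][x_next] for x in states)
            for x_next in states
        })
    return alpha

def backward(obs, states, f, g):
    """beta_k[x] = p(y_{k+1:n} | x_k = x), with beta_n = 1."""
    n = len(obs)
    beta = [None] * n
    beta[n - 1] = {x: 1.0 for x in states}
    # beta_{k-1}(x') = sum_x f(x | x') g(y_k | x) beta_k(x), for k = n, ..., 2
    for k in range(n - 1, 0, -1):
        beta[k - 1] = {
            x_prev: sum(f[x_prev][x] * g[x][obs[k]] * beta[k][x] for x in states)
            for x_prev in states
        }
    return beta

def state_posteriors(obs, states, mu, f, g):
    """p(x_k | y_{1:n}) = alpha_k(x_k) beta_k(x_k) / p(y_{1:n}) for every k."""
    alpha = forward(obs, states, mu, f, g)
    beta = backward(obs, states, f, g)
    likelihood = sum(alpha[-1].values())
    return [{x: alpha[k][x] * beta[k][x] / likelihood for x in states}
            for k in range(len(obs))]

# Hypothetical parameters (assumptions for illustration only).
states = ["H", "L"]
mu = {"H": 0.5, "L": 0.5}
f = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.3, "L": 0.7}}
g = {"H": {"sun": 0.8, "rain": 0.2}, "L": {"sun": 0.3, "rain": 0.7}}

obs = ["sun", "sun", "sun", "rain", "rain", "rain"]
post = state_posteriors(obs, states, mu, f, g)
for k, dist in enumerate(post, start=1):
    print(k, dist)
```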

41 Summary of useful HMM algorithms
Sampling from the prior (trivial)
Forward or Backward for the likelihood: $p(y_{1:n}) = \sum_{x_n} \alpha_n(x_n) = \sum_{x_1} \mu(x_1)\, g(y_1 \mid x_1)\, \beta_1(x_1)$
Forward and Backward for:
  State posteriors: $p(x_k \mid y_{1:n}) = \alpha_k(x_k)\, \beta_k(x_k) / p(y_{1:n})$
  Sampling state paths from the posterior
  Expectation-Maximization (Baum-Welch) to estimate parameters
  Posterior decoding (MAP paths)
Viterbi for single most likely state path, $\arg\max_{x_{1:n}} p(x_{1:n} \mid y_{1:n})$
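
The slides derive the forward and backward recursions but only name Viterbi. Viterbi has the same structure as the forward recursion with the sum over the previous state replaced by a maximum, plus back-pointers to recover the path. The sketch below is one standard log-space formulation under hypothetical parameters; none of the names or numbers come from the slides.

```python
import math

def viterbi(obs, states, mu, f, g):
    """Return (most likely state path, log p(path, y_{1:n}))."""
    # v[x] = max over x_{1:k-1} of log p(x_{1:k-1}, x_k = x, y_{1:k})
    v = {x: math.log(mu[x]) + math.log(g[x][obs[0]]) for x in states}
    back = []  # back[k][x] = best predecessor of state x at step k+2
    for y in obs[1:]:
        new_v, ptr = {}, {}
        for x_next in states:
            best = max(states, key=lambda x: v[x] + math.log(f[x][x_next]))
            ptr[x_next] = best
            new_v[x_next] = (v[best] + math.log(f[best][x_next])
                             + math.log(g[x_next][y]))
        v = new_v
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda x: v[x])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]

# Hypothetical parameters (assumptions for illustration only).
states = ["H", "L"]
mu = {"H": 0.5, "L": 0.5}
f = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.3, "L": 0.7}}
g = {"H": {"sun": 0.8, "rain": 0.2}, "L": {"sun": 0.3, "rain": 0.7}}

obs = ["sun", "sun", "sun", "rain", "rain", "rain"]
path, logp = viterbi(obs, states, mu, f, g)
print("".join(path), math.exp(logp))
```

Because $p(y_{1:n})$ does not depend on the path, maximizing the joint $p(x_{1:n}, y_{1:n})$ gives the same path as maximizing the conditional $p(x_{1:n} \mid y_{1:n})$.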

42 Examples

43 Example 1: Motif finding
hx0a  (173) ...dyvrsmiadylnklid-igvagfridaskhmw...
1smd  (173) ...dyvrskiaeymnhlid-igvagfridaskhmw...
1jae  (161) ...dyvrgvlidymnhmid-lgvagfrvdaakhms...
1g94a (150) ...nyvqntiaayindlqa-igvkgfrfdaskhva...
1bag  (152) ...tqvqsylkrfleraln-dgadgfrfdaakhie...
1smaa (303) ...pevkrylldvatywirefdidgwrldvaneid...
1bvza (301) ...pevkeylfdvarfwm-eqgidgwrldvanevd...
1uok  (175) ...ekvrqdvyemmkfwle-kgidgfrmdvinfis...
2aaa  (181) ...tavrtiwydwvadlvsnysvdglridsvlevq...
7taa  (181) ...dvvknewydwvgslvsnysidglridtvkhvq...
1cgt  (205) ...atidkyfkdaiklwld-mgvdgirvdavkhmp...
1ciu  (206) ...stidsylksaikvwld-mgidgirldavkhmp...
1cyg  (201) ...pvidrylkdavkmwid-mgidgirmdavkhmp...
1qhpa (204) ...gtiaqyltdaavqlva-hgadglridavkhfn...
1hvxa (209) ...pevvtelkswgkwyvnttnidgfrldavkhik...
1vjs  (206) ...pdvaaeikrwgtwyanelqldgfrldavkhik...
1gcya (168) ...pqvygmfrdeftnlrsqygaggfrfdfvrgya...
1avaa (154) ...lrvqkelvewlnwlkadigfdgwrfdfakgys...
1ehaa (227) ...devrkfilenveywikeynvdgfrldavhaii...
1bf2  (350) ...tvaqnlivdslaywantmgvdgfrfdlasvlg...
1gjwa (360) ...relweylagviphyqkkygidgarldmghalp...

44 Example 1: Motif finding
A motif modeled as an ungapped weight matrix can be represented as an HMM. We can ask for a local alignment by adding padding states at the beginning and the end.
[Figure: HMM state diagram with Start, X states, a chain of states with emission probabilities for A, C, G, U, and End.]

45 Example 1: Motif finding
Not all related motifs have exactly the same length; some may lack certain residues. This is modeled by introducing delete states into the HMM.
[Figure: HMM state diagram with Start, X states, a chain of states with emission probabilities for A, C, G, U, and End.]
The transition probabilities to/from delete states are position-dependent: the probability of deleting a particular nucleotide depends on the location within the motif.

46 Example 1: Motif finding
Similarly, some motifs may have extra residues, which are modeled with insert states.
[Figure: HMM state diagram with Start, X states, a chain of states with emission probabilities for A, C, G, U, and End.]

47 Example 1: Motif finding
We can add a loopback transition to allow for multiple consecutive matches (think e.g. zinc-finger proteins).
[Figure: HMM state diagram with Start, X states, a chain of states with emission probabilities for A, C, G, U, and End.]

48 Example 1: Motif finding
This is the profile HMM architecture in the SAM/HMMER packages.
[Figure: HMM state diagram with Start, X states, a chain of states with emission probabilities for A, C, G, U, and End.]
In this context, the standard algorithms achieve the following:
Viterbi: alignment of a sequence to the HMM
Forward: likelihood that a sequence contains the motif
Forward-Backward: posterior expected state/transition counts
Baum-Welch uses these expectations to maximise the likelihood of a given training set

49 Example 2: Gene finding (source unknown)

50 Example 2: Gene finding Burge and Karlin, JMB 1998

51 Example 2: Gene finding UCSC genome browser

52 Example 3: PhyloHMM Siepel et al., Genome Research 2005

53 Example 4: Alignment
Observation: two sequences GAATTCGA; GCATCGA
Required: Alignment; sequence of alignment columns
########
####-###
GAATTCGA
GCAT-CGA

55 Example 4: Alignment
To fit into the HMM framework:
Allow two sequences to be emitted simultaneously
Allow states with empty emissions
The Markov chain $\{X_i\}_{i \ge 1}$ is the sequence of alignment columns, (#,#), (#,#), (#,#), (#,#), (#,-), (#,#), (#,#), (#,#)
Nucleotides emitted together are correlated (homologous)
$\alpha$, $\beta$ now have two indices; computing them involves traversing a 2-dimensional dynamic programming table.

58 Example 5: co-estimating alignment and conservation
M = (#,#); Ins = (-,#); Del = (#,-)
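
With the state labels M, Ins and Del, a pairwise alignment is simply a path through the pair HMM: one state per alignment column. The sketch below reads that path off an alignment given as two gapped rows. The function name and the use of `-` as the gap character are assumptions made for illustration; the example rows are the GAATTCGA / GCAT-CGA alignment from Example 4.

```python
def alignment_to_states(row1, row2, gap="-"):
    """Map each alignment column to a pair-HMM state label.

    M   = residue in both rows      (#,#)
    Ins = gap in row1, residue in row2  (-,#)
    Del = residue in row1, gap in row2  (#,-)
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    states = []
    for a, b in zip(row1, row2):
        if a != gap and b != gap:
            states.append("M")
        elif a == gap and b != gap:
            states.append("Ins")
        elif a != gap and b == gap:
            states.append("Del")
        else:
            raise ValueError("column with a gap in both rows")
    return states

path = alignment_to_states("GAATTCGA", "GCAT-CGA")
print(path)
```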

60 Example 6: Probabilistic progressive alignment
Problem: How to align > 2 sequences?
Naive HMM implementation
  Properly accounts for uncertainty (in alignment; not tree)
  Complexity $O(L^N)$; DP table has N dimensions
Progressive alignment: pairwise + infer sequence at root
  Practical approach; complexity $O(NL^2)$
  Inferences are biased and overconfident
  e.g. PRANK; Löytynoja & Goldman 2005

63 Example 6: Probabilistic progressive alignment
Solution: Combine progressive and probabilistic approaches.
Represent ancestral sequence $a$ of a pair of descendant sequences $s_1$, $s_2$ as a partial likelihood, with $a$ as parameter: $P(s_1, s_2 \mid a)$
Prune unlikely alignment columns, and represent remainder of dynamic programming table as a graph
Iterate, aligning/pruning graphs progressively up the tree
At root, use prior distribution on $a$ to find multiple alignment
Algorithm can be formalized in terms of transducers.
Details: Westesson/Holmes, arxiv: v2; PLoS ONE e34572

66 Example 7: Lander-Green (phasing in pedigrees)
Transmission in pedigree (n non-founders) determined by 2n bits
2 bits identify grandparent of origin for paternal/maternal chromosome
The transmission vector is the state of the HMM ($2^{2n}$ states)
State changes (single bit flips) correspond to recombinations
State determines more/less likely observed genotypes
Some require > 1 mutation per site
Lander and Green, PNAS, 1987
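
Since each bit flip is one recombination, the transition probability between two transmission vectors depends only on how many bits differ. The sketch below uses the usual form $\theta^{d}(1-\theta)^{2n-d}$, where $d$ is the number of differing bits and $\theta$ is the recombination fraction between adjacent markers. That formula, the independence of the 2n meioses it assumes, and the value of `theta` are not stated on the slide and are assumptions here; transmission vectors are encoded as integers whose binary digits are the 2n bits.

```python
def hamming(v, w):
    """Number of differing bits between two transmission vectors."""
    return bin(v ^ w).count("1")

def transition_matrix(n_nonfounders, theta):
    """T[v][w] = theta**d * (1 - theta)**(2n - d), d = Hamming distance.

    Assumes each of the 2n meioses recombines independently with
    probability theta between adjacent markers.
    """
    bits = 2 * n_nonfounders
    size = 1 << bits            # 2**(2n) states
    T = []
    for v in range(size):
        row = []
        for w in range(size):
            d = hamming(v, w)
            row.append(theta ** d * (1.0 - theta) ** (bits - d))
        T.append(row)
    return T

T = transition_matrix(n_nonfounders=2, theta=0.01)  # illustrative values
print(len(T), T[0][0], T[0][1])
```

The state space grows as $2^{2n}$, so this dense matrix is only practical for small pedigrees; it is written out here purely to make the structure of the transitions explicit.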

67 Example 8: Li and Stephens (phasing in populations) Li and Stephens, Genetics 2003

68 Intermezzo: the Wright-Fisher model

71 Example 9: CoalHMM and incomplete lineage sorting
Hobolth, Dutheil, Hawks, Schierup, Mailund (2011) Genome Research

72 Example 10: PSMC and demographic inference Li and Durbin, Nature 2011

75 Journal club

76 Papers:
Hobolth, Christensen, Mailund, Schierup (2007) Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet 3(2):e7
Li and Durbin (2011) Inference of human population history from individual whole-genome sequences. Nature 475,
Lunter, Rocco, Mimouni, Heger, Caldeira, Hein (2008) Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res 18(2)
P Scheet and M Stephens (2006) A fast and flexible method for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78(4)
Lander and Green (1987) PNAS 84:2363-7; and Kruglyak, Daly, Reeve-Daly and Lander (1996) AJHG 58:

77 Questions - Hobolth et al.
Explain the difference between phylogeny and genealogy, and the concept of incomplete lineage sorting.
What do the states of the HMM represent?
What are informative sites for the model?
The model can in principle be applied to any quartet of species. What aspect of the shape of the phylogeny relating the species, and what other parameters (if any), are relevant to assess whether the model might provide useful inferences?

78 Questions - Li and Durbin
What do the states of the HMM represent?
At any locus, the density of heterozygous sites determines which state is currently most likely. On average, and in human, how many heterozygous sites occur between state switches? Do you think the data is very informative about the HMM state at any position?
What limits the power to infer $N_e$ at recent and ancient times?

79 Questions - Lunter et al.
List some causes of inaccuracies in alignments. Would a more accurate model of sequence evolution improve alignments? Is model misfit the main cause for alignment inaccuracies?
What is the practical limit (in terms of evolutionary distance, in mutations/site) for pairwise alignment of DNA? Would multiple alignment allow DNA from more divergent species to be aligned? How can divergence be assessed by alignment for species that are more divergent?
What is posterior decoding and how does it work? In what way does this improve alignments compared to a Viterbi decoding? Why is this?


More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Hidden Markov Models for biological sequence analysis I

Hidden Markov Models for biological sequence analysis I Hidden Markov Models for biological sequence analysis I Master in Bioinformatics UPF 2014-2015 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Example: CpG Islands

More information

Lecture 9. Intro to Hidden Markov Models (finish up)

Lecture 9. Intro to Hidden Markov Models (finish up) Lecture 9 Intro to Hidden Markov Models (finish up) Review Structure Number of states Q 1.. Q N M output symbols Parameters: Transition probability matrix a ij Emission probabilities b i (a), which is

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information
