Bioinformatics 1 -- Lectures 15, 16: Markov chains, hidden Markov models, profile HMMs


1 Bioinformatics 1 -- Lectures 15, 16: Markov chains, hidden Markov models, profile HMMs

2 Overview (flowchart). A target sequence is the input to a database search; the search results are a sequence family; the family sequences are aligned and become an MSA; pairwise distances give a tree; redundancy is removed by sequence weights; columns are condensed to a profile, with pseudocounts (background-weighted or position-specific) to account for unobserved data; the profile columns become match states in order; adding delete and insert states gives a profile HMM.

3 Profile hidden Markov models. M = match state, I = insertion state, D = deletion state. Topology: begin [D I M] [D I M] ... [D I M] end. The probability of a gap or insertion might be position specific; profile HMMs can model this.

4 Markov processes. A Markov process is any process where the next item in the list depends only on the current item. The dimension can be time, sequence position, etc.

5 Modeling proteins using Markov chains. A Markov chain is a network of states connected by transitions; it is a stochastic model that emits symbol data whose probability depends only on the last symbol emitted. States: H = helix, E = extended (strand), L = loop.

6 What is a stochastic model? A model is a simplified version of reality; the simpler, the better. A stochastic model has the form: random numbers -> model -> synthetic data.

7 Markov states. A Markov state emits a symbol each time you visit it, and connects to other states (and possibly itself) with probabilities attached (in the slide's example: 0.6 to E, 0.3 to L, 0.1 to H). The sum of all outgoing transition probabilities = 1. Note: Markov chains emit discrete, one-dimensional data.

8 Setting the parameters of a Markov model from data. Secondary structure data:
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLEEEEELLLLLLLLLLLEEEEEEEEELLLLLEEEEEEEEELL
LLLLLLEEEEEELLLLLEEEEEELLLLLLLLLLLLLLLEEEEEELLLLLEEEEELLLLLLLLLLEEEELLLLEEEELLLLEEE
EEEEELLLLLLEEEEEEEEELLLLLLEELLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLEEEEEELLLLLLLLLLEEEEEEL
LLLLEEELLLLLLLLLLLLLEEEEEEEEELLLEEEEEELLLLLLLLLLLLLLLLLLLHHHHHHHHHHHHHLLLLLLLEEL
HHHHHHHHHHLLLLLLHHHHHHHHHHHLLLLLLLELHHHHHHHHHHHHLLLLLHHHHHHHHHHHHHLLLLLEEEL
HHHHHHHHHHLLLLLLHHHHHHHHHHEELLLLLLHHHHHHHHHHHLLLLLLLHHHHHHHHHHHHHHHHHHHHH
HHHHHHLLLLLLHHHHHHHHHHHHHHHHHHLLLHHHHHHHHHHHHHHLLLLEEEELLLLLLLLLLLLLLLLEEEEL
LLLHHHHHHHHHHHHHHHLLLLLLLLEELLLLLHHHHHHHHHHHHHHLLLLLLEEEEELLLLLLLLLLHHHHHHHH
HHHHHHHHHHHHHHHLLLLLHHHHHHHHHLLLLHHHHHHHLLHHHHHHHHHHHHHHHHHHHH
Count the adjacent pairs to get the transition probabilities. For example:
P(L|E) = P(EL)/P(E) = counts(EL)/counts(E), where counts(E) = counts(EE) + counts(EL) + counts(EH).
Therefore: P(E|E) + P(L|E) + P(H|E) = 1.
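Below is a minimal sketch of this counting procedure in Python (the function name and the toy data string are illustrative, not from the lecture):

```python
from collections import defaultdict

def transition_probs(ss_strings):
    """Estimate P(next state | current state) from secondary-structure strings."""
    pair_counts = defaultdict(lambda: defaultdict(int))
    for s in ss_strings:
        for prev, curr in zip(s, s[1:]):      # count adjacent pairs
            pair_counts[prev][curr] += 1
    probs = {}
    for prev, nexts in pair_counts.items():   # normalize each row to sum to 1
        total = sum(nexts.values())
        probs[prev] = {c: n / total for c, n in nexts.items()}
    return probs

a = transition_probs(["LLLLEEEEELLLLHHHHHHHLLL"])  # toy data
print(a["E"])   # {'E': 0.8, 'L': 0.2}
```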

9 Bayes notation and Rabiner's notation.
a_yx = P(x|y) = P(y,x)/P(y) = F(y,x)/F(y) ... the conditional probability of x given y.
π_x = P(x) = F(x)/N ... the (unconditional) probability of x.
Here F denotes observed counts and N the total count.

10 A transition matrix, P(q_t | q_t-1):

            to H      to E      to L
  from H   P(H|H)    P(E|H)    P(L|H)
  from E   P(H|E)    P(E|E)    P(L|E)
  from L   P(H|L)    P(E|L)    P(L|L)

**This is a first-order MM: transition probabilities depend only on the single previous state.

11 What is P(S|λ), the probability of a sequence, given the model λ?
P(HHEELL|λ) = P(H) P(H|H) P(E|H) P(E|E) P(L|E) P(L|L) = (.33)(.93)(.01)(.80)(.19)(.90) = 4.2E-4
P(HHHHHH|λ) = 0.69 -- common protein secondary structure
P(HEHEHE|λ) = 1E-6 -- not protein secondary structure
Probability discriminates between realistic and unrealistic sequences.
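A short sketch of this calculation in Python; the transition values are the ones shown on the slide, and the initial probabilities π are assumed (only P(H) = .33 appears on the slide):

```python
# Probability of a state string under the Markov chain of the slide.
pi = {"H": 0.33, "E": 0.33, "L": 0.34}            # assumed initial probabilities
a = {("H", "H"): 0.93, ("H", "E"): 0.01, ("E", "E"): 0.80,
     ("E", "L"): 0.19, ("L", "L"): 0.90}          # transitions from the slide

def seq_prob(seq):
    p = pi[seq[0]]
    for prev, curr in zip(seq, seq[1:]):
        p *= a[(prev, curr)]
    return p

print(seq_prob("HHEELL"))   # ~4.2e-4, as on the slide
```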

12 What is the maximum likelihood model given a dataset of sequences? Dataset: HHEELL, HHEELL, HHEELL, HHEELL, HHEELL, HHEELL. Maximum likelihood model: count the state pairs, then normalize by row.

13 Is this model too simple? (figure: frequency distributions of synthetic helix-length data generated from this model versus real helix-length data; *L. Pal et al., J. Mol. Biol. (2003) 326) "A model should be as simple as possible but not simpler" --Einstein

14 A Markov chain for proteins where helices are always exactly 4 residues long: four H states chained in a row (H -> H -> H -> H), exiting to L or E.

15 A Markov chain for proteins where helices are always at least 4 residues long: four chained H states, the last of which can loop back to itself before exiting to L or E. Can you draw a Markov chain where helix lengths are always a multiple of 4?

16 Exercise: generate a Markov model (MM) based on the data: "how much wood would a wood chuck chuck if a wood chuck would chuck wood?"
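One hedged way to set up this exercise in Python, treating each word as a state (a sketch of the counting, not an official solution):

```python
from collections import defaultdict

text = "how much wood would a wood chuck chuck if a wood chuck would chuck wood"
words = text.split()

counts = defaultdict(lambda: defaultdict(int))
for prev, curr in zip(words, words[1:]):
    counts[prev][curr] += 1

# Row-normalize the counts to get transition probabilities.
model = {w: {v: n / sum(nxt.values()) for v, n in nxt.items()}
         for w, nxt in counts.items()}
print(model["wood"])   # {'would': 0.33..., 'chuck': 0.66...}
```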

17 Markov chain for DNA sequence: four fully connected states A, T, C, G.
P(ATCGCGTA...) = π_A a_AT a_TC a_CG a_GC a_CG a_GT a_TA ...

18 DNA + methylation: CpG islands. DNA is methylated on C to protect against endonucleases. Using mass spectrometry we can find regions of DNA that are methylated and regions that are not. Regions that are protected from methylation may be functionally important, e.g. transcription factor binding sites. Over the course of evolution, methylated CpGs get mutated to TpGs: NNNCGNNN -> NNNTGNNN.

19 Using Markov chains for discrimination: CpG islands in human chromosome sequences. Train two chains: a "+" model on CpG-rich regions and a "-" model on CpG-poor regions (transition tables from Durbin, Eddy, Krogh and Mitchison, Biological Sequence Analysis (1998), p. 50). For example:
P(CGCG|+) = π_C (0.274)(0.339)(0.274) = 0.0255 π_C
P(CGCG|-) = π_C (0.078)(0.246)(0.078) = 0.0015 π_C

20 Comparing two MMs: the log likelihood ratio (LLR).
LLR(x) = log [ Π_{i=1..L} a+_{x(i-1) x(i)} / Π_{i=1..L} a-_{x(i-1) x(i)} ] = Σ_{i=1..L} log( a+_{x(i-1) x(i)} / a-_{x(i-1) x(i)} ) = Σ_{i=1..L} β_{x(i-1) x(i)}
The log-likelihood ratios β_{xy} = log( a+_{xy} / a-_{xy} ) form a 4x4 table over A, C, G, T. Sum the LLRs along the sequence; if the result is positive, it's a CpG island, otherwise not. Using the transition values above (base-2 logs): LLR(CGCG) = 1.81 + 0.46 + 1.81 = 4.09 > 0 => yes.
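A sketch of the LLR scorer in Python. Only the C->G and G->C entries are taken from the probabilities quoted above; the full 4x4 tables would come from Durbin et al.:

```python
import math

# Transition probabilities for the "+" and "-" models (partial; values from
# the slide above, which quotes Durbin et al. p. 50).
a_plus  = {("C", "G"): 0.274, ("G", "C"): 0.339}
a_minus = {("C", "G"): 0.078, ("G", "C"): 0.246}

def llr(seq):
    """Sum beta = log2(a+/a-) over adjacent pairs; positive => CpG island."""
    return sum(math.log2(a_plus[pair] / a_minus[pair])
               for pair in zip(seq, seq[1:]))

print(llr("CGCG"))   # ~4.09 -> CpG island
```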

21 In-class exercise: what's the LLR? What is the LLR that this sequence is a CpG island? ATGTCTTAGCGCGATCAGCGAAAGCCACG
LLR = Σ_{i=1..L} β_{x(i-1) x(i)}, looking each β up in the 4x4 (A, C, G, T) table.

22 Markov chain for DNA sequence (repeated from slide 17): P(ATCGCGTA...) = π_A a_AT a_TC a_CG a_GC a_CG a_GT a_TA ...

23 Combining two Markov chains to make a hidden Markov model. Take the "+" model (A, C, G, T) and the "-" model (A, C, G, T) and add transitions between the +/- models. A hidden Markov model can have multiple paths for a sequence: in a hidden Markov model (HMM), there is no one-to-one correspondence between the state and the emitted symbol.

24 Probability of a sequence using an HMM. Different state sequences can produce the same emitted sequence. Nucleotide sequence: C G C G. State sequences (paths) and their probabilities P(sequence, path):
C+ G+ C+ G+ : π_C+ a_C+G+ a_G+C+ a_C+G+
C- G- C- G- : π_C- a_C-G- a_G-C- a_C-G-
C+ G+ C- G- : π_C+ a_C+G+ a_G+C- a_C-G-
C+ G- C- G+ : π_C+ a_C+G- a_G-C- a_C-G+
etc...
Sum these: P(CGCG|λ) = Σ over all paths Q of P(CGCG, Q). Each state sequence has a probability; the sum over all state sequences that emit CGCG is P(CGCG).

25 The problem is finding the states given the sequence. Typically, when using an HMM, the task is to determine the optimal state pathway given the sequence. The state pathway provides some predictive feature, such as secondary structure, splice site/not splice site, CpG island/not CpG island, etc. In principle, we can do this task by trying all state pathways Q and choosing the optimal one. In practice, this is usually impossible, because the number of pathways grows as the number of states raised to the power of the length, i.e. O(n^m) for n states and length m. How do we do it, then?

26 HMMs that use profiles. Each state (H, E, L) carries an emission probability distribution b over amino acids -- the "marble bag", or profile. Each state emits one amino acid from the marble bag on each visit. A probability distribution is a set of probabilities (0 ≤ p ≤ 1) that sum to 1.

27 States emit amino acids; the state labels are the secondary structure (H, E, L). Given an amino acid sequence, what is the most probable state sequence (secondary structure)?

28 Maximize: the joint probability of a sequence and pathway.
Q = {q_1, q_2, q_3, ..., q_T} = sequence of Markov states, or pathway.
S = {s_1, s_2, s_3, ..., s_T} = sequence of amino acids or nucleotides.
T = length of S and Q.
Joint probability of a pathway and sequence, given an HMM λ:
S = A G P L V D
Q = H H E E E L
P = π_H b_H(A) a_HH b_H(G) a_HE b_E(P) a_EE b_E(L) a_EE b_E(V) a_EL b_L(D)

29 Joint probability: general expression. For pathway Q through HMM λ:
P(S,Q|λ) = π_{q_1} Π_{t=1..T} b_{q_t}(s_t) a_{q_t q_{t+1}} **
**when t = T, there is no q_{T+1}; use a = 1.
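The general expression translates directly into code. A sketch, with illustrative placeholder parameters (pi, a, b are not the lecture's numbers):

```python
# Joint probability P(S, Q | lambda) of a sequence S and a state path Q.
pi = {"H": 1.0}
a = {("H", "H"): 0.8, ("H", "E"): 0.2, ("E", "E"): 0.7, ("E", "L"): 0.3}
b = {"H": {"A": 0.10, "G": 0.05},
     "E": {"P": 0.04, "L": 0.10, "V": 0.08},
     "L": {"D": 0.06}}

def joint_prob(S, Q):
    p = pi[Q[0]] * b[Q[0]][S[0]]
    for t in range(1, len(S)):        # transitions run up to t = T-1 only
        p *= a[(Q[t-1], Q[t])] * b[Q[t]][S[t]]
    return p

print(joint_prob("AGPLVD", "HHEEEL"))
```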

30 The Three HMM Algorithms. 1. The Viterbi algorithm: get the optimal state pathway (maximum joint probability). 2. The Forward/Backward algorithm: get the probability of each state at each position (sum over all joint probabilities). 3. Expectation/Maximization: refine the parameters of the model using the data.

31 The Viterbi algorithm: the maximum probability path.
v_k(t) = MAX_l [ v_l(t-1) a_lk b_k(s_t) ]
Trc_k(t) = ARGMAX_l [ v_l(t-1) a_lk b_k(s_t) ]
Recursive: we save the value v and also a traceback arrow Trc as we go along. Plot state versus position; each v is a MAX over the whole previous column of v's. When t = T, the last position, following the traceback arrows from the MAX gives the optimal state sequence.

32 Exercise: write the Viterbi algorithm.
v_k(t) = MAX_l [ v_l(t-1) a_lk b_k(s_t) ]
Trc_k(t) = ARGMAX_l [ v_l(t-1) a_lk b_k(s_t) ]
States 1..L, positions 1..T.

33 Exercise: write the Viterbi algorithm.
initialize: v_k(1) = π_k b_k(s_1) for k = 1..L
for t = 2..T {
    for k = 1..L {
        v_k(t) = MAX_l [ v_l(t-1) a_lk b_k(s_t) ]
        Trc_k(t) = ARGMAX_l [ v_l(t-1) a_lk b_k(s_t) ]
    }
}
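A runnable Python version of the same recursion (the parameter shapes -- pi, a, b as dictionaries -- are our convention; for long sequences you would work in log space to avoid underflow):

```python
def viterbi(S, states, pi, a, b):
    """Most probable state path for sequence S under the model (pi, a, b)."""
    T = len(S)
    v = [{k: pi[k] * b[k][S[0]] for k in states}]     # v_k(1)
    trc = [{}]                                        # traceback arrows
    for t in range(1, T):
        v.append({}); trc.append({})
        for k in states:
            best = max(states, key=lambda l: v[t-1][l] * a[(l, k)])
            trc[t][k] = best
            v[t][k] = v[t-1][best] * a[(best, k)] * b[k][S[t]]
    path = [max(states, key=lambda k: v[T-1][k])]     # best final state
    for t in range(T - 1, 0, -1):                     # follow arrows backward
        path.append(trc[t][path[-1]])
    return path[::-1]
```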

34 α -- the Forward algorithm: all paths to a state.
α_k(t) = Σ_l α_l(t-1) a_lk b_k(s_t)
α is the forward probability; a is the transition between states. "Forward" stands for forward recursion. After the first column, each α depends on the whole previous column of α's: the sum of P over all paths up to state k at position t is α_k(t). At the end of the sequence, when t = T, the sum of α_k(T) over k equals the total probability of the sequence given the model, P(S|λ).

35 β -- the Backward algorithm: all paths from a state.
β_k(t) = Σ_l a_kl b_l(s_{t+1}) β_l(t+1)
"Backward" stands for backward recursion; the algorithm starts at t = T, the end of the sequence. (The transitions are still forward.) Each β depends on the whole next column of β's: β_k(t) sums over all paths leaving state k at t. At the beginning of the sequence, when t = 1, Σ_k π_k b_k(s_1) β_k(1) equals the total probability of the sequence given the model, P(S|λ).

36 Exercise: write the Forward algorithm.
α_k(t) = Σ_l α_l(t-1) a_lk b_k(s_t)
initialize: α_k(1) = π_k b_k(s_1) for k = 1..L
for t = 2..T {
    for k = 1..L {
        α_k(t) = Σ_l α_l(t-1) a_lk b_k(s_t)
    }
}
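The same skeleton, completed as a Python sketch (same parameter conventions as the Viterbi sketch above; the only change from Viterbi is MAX -> Σ):

```python
def forward(S, states, pi, a, b):
    """Forward probabilities alpha; sum of the last column is P(S | lambda)."""
    alpha = [{k: pi[k] * b[k][S[0]] for k in states}]
    for t in range(1, len(S)):
        alpha.append({k: sum(alpha[t-1][l] * a[(l, k)] for l in states)
                         * b[k][S[t]]
                      for k in states})
    return alpha

# P(S | lambda) = sum(forward(S, states, pi, a, b)[-1].values())
```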

37 γ -- the Forward/Backward algorithm: all paths through a state.
γ_k(t) = α_k(t) β_k(t) / P(S|λ)
γ_k(t) is the total probability of state k at position t, given the sequence S and the model λ. State k at t is the bottleneck through which all the counted paths must travel.
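A sketch of the backward companion and the posterior γ, following the same conventions as the forward sketch above:

```python
def backward(S, states, a, b):
    """Backward probabilities beta, initialized with beta_k(T) = 1."""
    T = len(S)
    beta = [None] * T
    beta[T-1] = {k: 1.0 for k in states}
    for t in range(T - 2, -1, -1):
        beta[t] = {k: sum(a[(k, l)] * b[l][S[t+1]] * beta[t+1][l]
                          for l in states)
                   for k in states}
    return beta

def posterior(alpha, beta, states):
    """gamma_k(t) = alpha_k(t) * beta_k(t) / P(S | lambda)."""
    p_seq = sum(alpha[-1].values())
    return [{k: alpha[t][k] * beta[t][k] / p_seq for k in states}
            for t in range(len(alpha))]
```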

38 Expectation/Maximization: refining the model. Example: refining b_k(G) (i.e. the amount of Gly in the k-th marble bag). Step 1) Count how many glycines are found in state k. Step 2) Normalize it; reset b_k(G) in the new model to that value. Step 3) Do steps 1-2 for all states k in λ and all 20 amino acids. Repeat steps 1-3 using the new model; iterate to convergence. Expectation/Maximization is often abbreviated EM.

39 Expectation/Maximization: refining the model. Example: refining b_k(G). To count the glycines, we calculate the Forward/Backward value for state k at every glycine in the database, then sum over all G in all sequences S:
P(state k at t | S, λ) = Σ over all paths through k at t = γ_k(t) = α_k(t) β_k(t) / P(S|λ)
(figure: database sequences with their glycine positions highlighted; the γ_k values at those positions are summed to give the new b_k(G))
The result is normalized to sum to 1 over all 20 amino acids.

40 Expectation/Maximization: refining the model. Example: refining a_jk, the probability of a transition from state j to state k. Step 1) Get the probability of ending in state j at t --> α_j(t). Step 2) Get the probability of starting in state k at t+1 --> β_k(t+1). Step 3) Multiply these by the current a_jk. Step 4) Do steps 1-3 for all positions t and all sequences S; sum --> a'. Then normalize, and reset a_jk in the new model to the normalized a'. Do 1-4 using the new model; repeat until convergence.

41 Expectation/Maximization: refining the model. Refining a_jk pictorially:
a'_jk ∝ Σ over all sequences S, Σ over all t: α_j(t) a_jk b_k(s_{t+1}) β_k(t+1)
After summing, the a'_jk are normalized to sum to 1 over k.
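A sketch of this transition re-estimate for a single sequence, built on the forward/backward sketches above (one EM iteration; pseudocount smoothing and the multi-sequence sum are omitted):

```python
def update_transitions(S, states, pi, a, b):
    """One Baum-Welch re-estimate of a_jk from one sequence (sketch)."""
    al = forward(S, states, pi, a, b)
    be = backward(S, states, a, b)
    # Expected number of j -> k transitions, summed over positions t.
    xi = {(j, k): sum(al[t][j] * a[(j, k)] * b[k][S[t+1]] * be[t+1][k]
                      for t in range(len(S) - 1))
          for j in states for k in states}
    # Normalize each row j so the new a_jk sum to 1 over k.
    return {(j, k): xi[(j, k)] / sum(xi[(j, m)] for m in states)
            for j in states for k in states}
```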

42 Profile HMMs. State emissions: I = insert state, emits one character from the background profile; D = delete state, non-emitting (a connector); M = match state, emits one character from a position-specific profile. Topology: begin [D I M] [D I M] ... [D I M] end. Begin = non-emitting source state; End = non-emitting sink state. All π(q) = 0, except π(begin) = 1. To get the score of a sequence against a profile HMM, we use the forward algorithm to get P(End); this is the measure of how well the sequence fits the model. Then we can test several models.

43 Generating a profile HMM from a multiple sequence alignment. Base the model on a parent sequence, say the first one:
VGA--H
V----N
VEA--D
VKG---
VYS--T
FNA--N
IAGADN
Make four match states: begin M M M M end

45 Generating a profile HMM from a multiple sequence alignment (continued). Add insertion states where there are insertions (red on the slide: the AD insertion in IAGADN): begin M M M [I] M end, with the insertion state I between the third and fourth match states.

46 Generating a profile HMM from a multiple sequence alignment (continued). Add deletion states where there are deletions (red dashes on the slide): begin M D M D I M D M end -- a delete state parallels each match state that some sequence skips. ...Now optimize using expectation maximization.

47 Getting profiles for every match state. The sequences carry weights w_1..w_7:
VGA--H
V----N
VEA--D
VKG---
VYS--T
FNA--N
IAGADN
Count the frequency of each amino acid, scaled by the sequence weights w:
b_M(V) = Σ_{i: s_i = V} w_i / Σ_{all i} w_i
For the first match state: b_M1(V) = (w_1+w_2+w_3+w_4+w_5) / (w_1+w_2+w_3+w_4+w_5+w_6+w_7)
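A sketch of this weighted count in Python (the function name and the equal-weight example are illustrative):

```python
def match_profile(column, weights):
    """Weighted amino-acid frequencies b_M for one match-state column."""
    total = sum(weights)
    freqs = {}
    for aa, w in zip(column, weights):
        freqs[aa] = freqs.get(aa, 0.0) + w / total
    return freqs

# First column of the alignment above, with equal weights w_1..w_7 = 1:
print(match_profile("VVVVVFI", [1.0] * 7))   # b_M1(V) = 5/7
```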

48 Calculating the probability of a sequence given the model, P(S|λ): begin [M D I] ... end. Sum forward (forward algorithm) along the sequence S:
- For each match state, multiply by the transition probability a and the profile value b_M(s_i), and increment i.
- For each deletion state, multiply by a; do not increment i.
- For each insertion state, multiply by a (the emission comes from the background profile), and increment i.

49 Picking a parent sequence. The parent defines the number of match states. A match state should conserve the chemical nature of the side chain as much as possible; a match state implies structural similarity.

50 Homolog detection using a library of profile HMMs. For query MYSEQUENCE, get P(S|λ) for each model: P(S|λ_1), P(S|λ_2), P(S|λ_3), P(S|λ_4), ... Pick the model with the maximum P.

51 In-class exercise: make a profile HMM.
AGF---PDG
AGGYL-PDG
AG----PNG
SGFFLIPNG
SGF--EPNG
Pick the best parent. Draw match states. Draw insertion states for positions followed by "-" in the parent. Draw deletion states for positions in the parent that align with "-". For each match state, write the predominant amino acid.

52 Make an HMM from BLAST data
Sequences producing significant alignments: Score (bits), E-value
gi ref NP_ (NC_003413) hypothetical protein [P e-32
gi ref NP_ (NC_000868) hypothetical protein [P e-09
gi ref NP_ (NC_000961) hypothetical protein [P e-08
gi ref NP_ (NC_003364) translation elongation e-04
gi sp P41203 EF1A_DESMO Elongation factor 1-alpha (EF-1-a
gi pir S54734 translation elongation factor aef-1 alpha
gi ref NP_ (NC_003364) translation initiation
QUERY 3 GLFDFLKRKEVKEEEKIEILSKKPAGKVVVEEVVNIMGK-DVI-IGTVESGMIGVGFK-V
GLFDFLKRKEVKEEEKIEILSKKPAGKVVVEEVVNIMGK-DVI-IGTVESGMIGVGFK-V
MLGFFRRKKKEEEEKI---TGKPVGKVKVENILIVGFK-TVI-ICEVLEGMVKVGYK-V
MFKFFKRKGEDEKD----VTGKPVGKVKVESILKVGFR-DVI-ICEVLEGIVKVGYK-V
RMPIQDVFTITGAGTVV-VGRVETGVLKVGDR-V
RIPIQDVYNISGI-GVVPVGRVETGVLKVGDKLV
RIPIQDVYNISGI-GVVPVGRVETGVLKVGDKLV
IVGV-KVL-AGTIKPGVT----L-V 504
QUERY 60 --KGPSGIGGIVR-IERNREKVEFAIAGDRIGISIEGKI---GK--VKKGDVLEIYQT
KGPSGIGGIVR-IERNREKVEFAIAGDRIGISIEGKI---GK--VKKGDVLEIYQT
RKGKKVAGIVS-MEREHKKVEFAIPGDKIGIMLEKNI---G---AEKGDILEVF
KKGKKVAGIVS-MEREHKKIEFAIPGDRVGMMLEKNI----N--AEKDDILEVY
VIVPPAKVGDVRS-IETHHMKLEQAQPGDNIGVNVRG-I---AKEDVKRGDVL
FMPAGLVAEVKTIETHHTKIEKAEPGDNIGFNVKGVE---KKD-IKRGDV
FMPAGLVAEVKTIETHHTKIEKAEPGDNIGFNVKGVE---KKD-IKRGDV
KDGREVGRIMQ-IQKTGRAINEAAAGDEVAISIHGDVIVGRQ--IKEGDILYVY-- 555

53 Make an HMM from BLAST data (continued). Labels on the alignment: delete states, insert states, match states.
begin
GLFDFLKRKEVKEEEKIEILSKKPAGKVVVEEVVNIMGK-DVI-IGTVESGMIGVGFK-V
GLFDFLKRKEVKEEEKIEILSKKPAGKVVVEEVVNIMGK-DVI-IGTVESGMIGVGFK-V
-MLGFFRRKKKEEEEKI---TGKPVGKVKVENILIVGFK-TVI-ICEVLEGMVKVGYK-V
-MFKFFKRKGEDEKD----VTGKPVGKVKVESILKVGFR-DVI-ICEVLEGIVKVGYK-V
RMPIQDVFTITGAGTVV-VGRVETGVLKVGDR-V
RIPIQDVYNISGI-GVVPVGRVETGVLKVGDKLV
RIPIQDVYNISGI-GVVPVGRVETGVLKVGDKLV
IVGV-KVL-AGTIKPGVT----L-V
--KGPSGIGGIVR-IERNREKVEFAIAGDRIGISIEGKI---GK--VKKGDVLEIYQT
--KGPSGIGGIVR-IERNREKVEFAIAGDRIGISIEGKI---GK--VKKGDVLEIYQT
--RKGKKVAGIVS-MEREHKKVEFAIPGDKIGIMLEKNI---G---AEKGDILEVF--
--KKGKKVAGIVS-MEREHKKIEFAIPGDRVGMMLEKNI----N--AEKDDILEVY--
VIVPPAKVGDVRS-IETHHMKLEQAQPGDNIGVNVRG-I---AKEDVKRGDVL
FMPAGLVAEVKTIETHHTKIEKAEPGDNIGFNVKGVE---KKD-IKRGDV
FMPAGLVAEVKTIETHHTKIEKAEPGDNIGFNVKGVE---KKD-IKRGDV
KDGREVGRIMQ-IQKTGRAINEAAAGDEVAISIHGDVIVGRQ--IKEGDILYVY--
end

54 Added information. In dynamic programming (pairwise alignment), we assumed insertions and deletions were equally probable, and that the probability was independent of position. With profile HMMs we allow insertions and deletions to have different probabilities, and to be dependent on the position.

55 Many uses of HMMs: weather prediction, ecosystem modeling, brain activity, language structure, econometrics, etc. HMMs can be applied to any dataset that can be represented as strings. The expert input is the topology -- how the states are connected.

56 Profile HMM libraries available via the web: Pfam (HMMer): pfam.wustl.edu; SAM.
