Biological Sequences and Hidden Markov Models CPBS7711
1 Biological Sequences and Hidden Markov Models, CPBS7711, Sept 27, 2011. Sonia Leach, PhD, Assistant Professor, Center for Genes, Environment, and Health, National Jewish Health. Slides created from David Pollock's 2009 slides for 7711 and the current reading list from the CPBS7711 website.
2 Introduction: Despite their complex 3-D structure, biological molecules have a primary linear sequence (DNA, RNA, protein) or a linear sequence of features (CpG islands; models of exons, introns, regulatory regions, genes). Hidden Markov Models (HMMs) are probabilistic models for processes that transition through a discrete set of states, each emitting a symbol (a probabilistic finite state machine). HMMs exhibit the Markov property: the conditional probability distribution of future states of the process depends only upon the present state (memory-less; named for Andrey Markov). The linear sequence of molecules/features is modelled as a path through the states of the HMM, which emit the sequence of molecules/features. The actual state is hidden and is observed only through the output symbols.
3 Hidden Markov Model: a finite set of N states X, a finite set of M observations O, and a parameter set λ = (A, B, π). Initial state distribution: π_i = Pr(X_1 = i). Transition probability: a_ij = Pr(X_t = j | X_t-1 = i). Emission probability: b_ik = Pr(O_t = k | X_t = i). Example: N=3, M=2, π = (0.25, 0.55, 0.2), with transition matrix A and emission matrix B.
4 Hidden Markov Model (HMM): a finite set of N states X, a finite set of M observations O, and a parameter set λ = (A, B, π). Initial state distribution: π_i = Pr(X_1 = i). Transition probability: a_ij = Pr(X_t = j | X_t-1 = i). Emission probability: b_ik = Pr(O_t = k | X_t = i). As a graphical model, the hidden chain runs X_t-1 → X_t, and each hidden state X_t emits an observation O_t. Example: N=3, M=2, π = (0.25, 0.55, 0.2).
5 Probabilistic Graphical Models: a Markov Process (MP) is a fully observed chain X_t-1 → X_t. Adding partial observability gives the Hidden Markov Model (HMM): hidden states X_t-1 → X_t with observations O_t-1, O_t. Adding utility gives the Markov Decision Process (MDP): actions A_t, states X_t, utilities U_t. Adding both observability and utility gives the Partially Observable Markov Decision Process (POMDP): actions A_t, hidden states X_t, observations O_t, utilities U_t.
6 Three basic problems of HMMs: 1. Given the observation sequence O = O_1, O_2, ..., O_n, how do we compute Pr(O | λ)? 2. Given the observation sequence, how do we choose the corresponding state sequence X = X_1, X_2, ..., X_n that is optimal? 3. How do we adjust the model parameters λ to maximize Pr(O | λ)?
7 Example: π_i = Pr(X_1 = i), a_ij = Pr(X_t = j | X_t-1 = i), b_ik = Pr(O_t = k | X_t = i). N=3, M=2, π = (0.25, 0.55, 0.2), with matrices A and B. What is an observation sequence O? A state sequence X? What is Prob(O, X | λ)?
8 Example (continued): the probability of O is a sum over all state sequences: Pr(O | λ) = Σ_all X Pr(O | X, λ) Pr(X | λ) = Σ_all X π_x1 b_x1,o1 a_x1,x2 b_x2,o2 ... a_xT-1,xT b_xT,oT. What is the computational complexity of this sum?
9 Example (continued): Pr(O | λ) = Σ_all X π_x1 b_x1,o1 a_x1,x2 b_x2,o2 ... a_xT-1,xT b_xT,oT. At each t there are N states to reach, so there are N^T possible state sequences, with about 2T multiplications per sequence, i.e. O(2T·N^T) operations. So for 3 states, a length-10 sequence takes 2·10·3^10 = 1,180,980 operations, and length 20 takes about 1e11!
10 Example (continued): an efficient dynamic programming algorithm exists for this: the Forward algorithm (Baum and Welch), which runs in O(N^2 T).
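To make the complexity claim concrete, the following sketch (not from the slides) compares the brute-force sum over all N^T paths with the O(N^2·T) forward recursion; both must return the same Pr(O | λ). The slide gives N=3, M=2, and π = (0.25, 0.55, 0.2); the transition matrix A and emission matrix B below are assumed values for illustration, since the slide's matrices appear only as figures.

```python
import itertools

# Slide's setup: N=3 states, M=2 symbols, pi = (0.25, 0.55, 0.2).
# A and B are assumed (illustrative) values: the slide's matrices
# are shown only as figures, not reproduced in the text.
pi = [0.25, 0.55, 0.2]
A = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.2, 0.7]]   # A[i][j] = Pr(X_t = j | X_t-1 = i)
B = [[0.7, 0.3],
     [0.4, 0.6],
     [0.1, 0.9]]        # B[i][k] = Pr(O_t = k | X_t = i)

def brute_force(pi, A, B, obs):
    """Sum Pr(O, X | lambda) over all N^T state sequences: O(2T * N^T)."""
    N = len(pi)
    total = 0.0
    for path in itertools.product(range(N), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

def forward(pi, A, B, obs):
    """Forward algorithm: the same sum computed in O(N^2 * T)."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 for j in range(N)]
    return sum(alpha)

obs = [0, 1, 1, 0, 1, 0]   # a length-6 observation sequence
p_slow = brute_force(pi, A, B, obs)
p_fast = forward(pi, A, B, obs)
```

For this length-6 sequence the brute force already enumerates 3^6 = 729 paths; at length 20 it would be about 3.5e9 paths, while the forward recursion stays at N^2·T multiplications per step.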
11 A Simple HMM for CpG Islands: in one state there is a much higher probability of emitting C or G. CpG state emissions: G .3, C .3, A .2, T .2. Non-CpG state emissions: G .1, C .1, A .4, T .4. From David Pollock's slides.
12 The Forward Algorithm: the probability of a sequence is the sum over all paths that can produce it. Model: CpG emits G .3, C .3, A .2, T .2; Non-CpG emits G .1, C .1, A .4, T .4; transitions: CpG→CpG 0.8, CpG→Non-CpG 0.2, Non-CpG→CpG 0.1, Non-CpG→Non-CpG 0.9. Assuming π = (0.5, 0.5), what is Pr(O=G | λ)? For O=G there are 2 possible state sequences: C (the CpG state) and N (the Non-CpG state). Adapted from David Pollock's slides.
13 The Forward Algorithm (continued): assuming π_X = 0.5 for each state, Pr(G | λ) = π_C b_CG + π_N b_NG = .5*.3 + .5*.1. For convenience, drop the 0.5s for now and add them back in later (so the number to the right of G in each box is the probability of emitting G in that state, i.e. b_XG). Adapted from David Pollock's slides.
14 The Forward Algorithm (continued): to extend to O=GC, each state at step 2 sums over its predecessors (paths CC or NC, then emit C; paths CN or NN, then emit C). For O=GC there are 4 possible state sequences: CC, NC, CN, NN. Adapted from David Pollock's slides.
15 The Forward Algorithm (continued): for O=GC, the C entry is (.3*.8 + .1*.1)*.3 = .075 and the N entry is (.3*.2 + .1*.9)*.1 = .015, covering the 4 possible state sequences CC, NC, CN, NN. Adapted from David Pollock's slides.
16 The Forward Algorithm (continued): carrying forward the step-2 values .075 (C) and .015 (N), extend to O=GCG. There are now 8 possible state sequences: CCC, CNC, NCC, NNC, CCN, CNN, NCN, NNN. Adapted from David Pollock's slides.
17 The Forward Algorithm (continued): each step-3 entry comes from C or from N and then emits G. For O=GCG there are 8 possible state sequences: CCC, CNC, NCC, NNC, CCN, CNN, NCN, NNN. Adapted from David Pollock's slides.
18 The Forward Algorithm (continued): for O=GCG, the step-3 entries are (.075*.8 + .015*.1)*.3 = .0185 for C and (.075*.2 + .015*.9)*.1 = .0029 for N. Adapted from David Pollock's slides.
19 The Forward Algorithm (continued): continuing with O=GCGAA, step 4 (A) gives (.0185*.8 + .0029*.1)*.2 = .003 for C and (.0185*.2 + .0029*.9)*.4 = .0025 for N; step 5 (A) gives (.003*.8 + .0025*.1)*.2 = .0005 for C and (.003*.2 + .0025*.9)*.4 = .0011 for N. Adapted from David Pollock's slides.
20 The Forward Algorithm (continued): the full trellis for O=GCGAA is: C row .3, .075, .0185, .003, .0005; N row .1, .015, .0029, .0025, .0011. Problem 1: Pr(O | λ) = 0.5*(.0005 + .0011) = 8e-4 (restoring the 0.5 initial probabilities).
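The worked example above can be reproduced in a few lines. This sketch (an illustration, not code from the slides) runs the forward recursion on O = GCGAA with the CpG model from the slides, keeping the 0.5 initial probabilities throughout, and recovers the answer of about 8e-4:

```python
# CpG-island HMM from the slides: state 0 = CpG, state 1 = Non-CpG
pi = [0.5, 0.5]
A = [[0.8, 0.2],          # CpG -> CpG .8, CpG -> Non-CpG .2
     [0.1, 0.9]]          # Non-CpG -> CpG .1, Non-CpG -> Non-CpG .9
B = [{'G': 0.3, 'C': 0.3, 'A': 0.2, 'T': 0.2},   # CpG emissions
     {'G': 0.1, 'C': 0.1, 'A': 0.4, 'T': 0.4}]   # Non-CpG emissions

obs = "GCGAA"
# alpha[i] = Pr(O_1..t, X_t = i | lambda), with the 0.5 priors kept in
alpha = [pi[i] * B[i][obs[0]] for i in range(2)]
for sym in obs[1:]:
    alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][sym]
             for j in range(2)]

p = sum(alpha)   # Pr(O | lambda) = 0.00083646, i.e. about 8e-4
```

Because the 0.5 priors are included from the start, every trellis entry is half the slide's value, and the final sum directly equals the slide's answer without a separate correction step.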
21 The Forward Algorithm (continued): with the same trellis (C row .3, .075, .0185, .003, .0005; N row .1, .015, .0029, .0025, .0011), Problem 2: what is the optimal state sequence?
22 The Forward Algorithm (continued): the trellis entries give the probability of being in state CpG or Non-CpG at step i (jointly with the observations so far). Adapted from David Pollock's slides.
23 The Viterbi Algorithm, Most Likely Path (use max instead of sum): the forward entries (.3*.8 + .1*.1)*.3 = .075 and (.3*.2 + .1*.9)*.1 = .015 become, with max, max(.3*.8, .1*.1)*.3 = .072 and max(.3*.2, .1*.9)*.1 = .009 in the Viterbi algorithm. Adapted from David Pollock's slides (note: corrects an error in the formulas on his).
24 The Viterbi Algorithm, Most Likely Path (use max instead of sum): for O=GCGAA, step 2 gives max(.3*.8, .1*.1)*.3 = .072 for C and max(.3*.2, .1*.9)*.1 = .009 for N. Adapted from David Pollock's slides.
25 The Viterbi Algorithm (continued): step 3 gives max(.072*.8, .009*.1)*.3 = .0173 (C) and max(.072*.2, .009*.9)*.1 = .0014 (N); step 4 gives max(.0173*.8, .0014*.1)*.2 = .0028 (C) and max(.0173*.2, .0014*.9)*.4 = .0014 (N); step 5 gives max(.0028*.8, .0014*.1)*.2 ≈ .0004 (C) and max(.0028*.2, .0014*.9)*.4 ≈ .0005 (N). Adapted from David Pollock's slides.
26 The Viterbi Algorithm, Most Likely Path: the full trellis for O=GCGAA is: C row .3, .072, .0173, .0028, .0004; N row .1, .009, .0014, .0014, .0005. What if we chose the max-probability state at each step independently? Answer: CCCCN. What is the problem with doing that? Adapted from David Pollock's slides.
27 Hint: suppose that, chosen in the same way, the most likely state at each step is ...
28 The Viterbi Algorithm, Most Likely Path: Backtracking. Trellis for O=GCGAA: C row .3, .072, .0173, .0028, .0004; N row .1, .009, .0014, .0014, .0005.
29 The Viterbi Algorithm, Backtracking (continued): start from the higher-scoring final state (N, .0005).
30 The Viterbi Algorithm, Backtracking (continued): follow the stored argmax pointers back through the trellis to recover the full most-likely path, CCCNN. Adapted from David Pollock's slides.
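The backtracking procedure can be sketched as follows (an illustration, not the slides' code): run the Viterbi recursion on O = GCGAA with the CpG model, store an argmax pointer at each step, and trace back from the best final state. It also computes the greedy per-step answer for comparison: the traceback yields CCCNN, while independently picking the highest-scoring state at each step yields CCCCN, because the per-step maxima need not form a consistent high-probability path.

```python
# CpG-island HMM from the slides: state 0 = CpG ('C'), state 1 = Non-CpG ('N')
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.1, 0.9]]
B = [{'G': 0.3, 'C': 0.3, 'A': 0.2, 'T': 0.2},
     {'G': 0.1, 'C': 0.1, 'A': 0.4, 'T': 0.4}]
names = "CN"
obs = "GCGAA"

# Viterbi: like forward, but max over predecessors, keeping argmax pointers
trellis = [[pi[i] * B[i][obs[0]] for i in range(2)]]
ptr = []
for sym in obs[1:]:
    prev, row, back = trellis[-1], [], []
    for j in range(2):
        i_best = max(range(2), key=lambda i: prev[i] * A[i][j])
        back.append(i_best)
        row.append(prev[i_best] * A[i_best][j] * B[j][sym])
    trellis.append(row)
    ptr.append(back)

# Backtrack from the best final state
states = [max(range(2), key=trellis[-1].__getitem__)]
for back in reversed(ptr):
    states.append(back[states[-1]])
states.reverse()
viterbi_path = "".join(names[s] for s in states)       # "CCCNN"

# Greedy alternative: independently take the best state at each step
greedy_path = "".join(names[max(range(2), key=row.__getitem__)]
                      for row in trellis)              # "CCCCN"
```

Note where they disagree: at step 4 the CpG state scores higher on its own, but it is not on the best path into the final Non-CpG state, which is exactly the problem the slide's question is pointing at.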
31 Forward-backward algorithm: Problem 3: how do we learn the model? Recall the forward trellis for O=GCGAA (C row .3, .075, .0185, .003, .0005; N row .1, .015, .0029, .0025, .0011); the forward algorithm calculated α_t(i) = Pr(O_1..t, X_t = i | λ).
32 How do you learn an HMM? The iterative Baum-Welch algorithm is popular; it is equivalent to Expectation Maximization (EM). Maximize: if the hidden variables (states) were known, maximize the model parameters with respect to that knowledge. Expectation: if the model parameters are known, find the expected values of the hidden variables (states). Iterate between the two steps until the parameter estimates converge.
33 Parameter estimation by Baum-Welch (the Forward-Backward algorithm): forward variable α_t(i) = Pr(O_1..t, X_t = i | λ); backward variable β_t(i) = Pr(O_t+1..n | X_t = i, λ). Rabiner 1989.
34 Parameter Estimation: define two variables, ξ and γ. The probability of transitioning at time t from state i to state j, no matter the path: ξ_t(i,j) = Pr(q_t = S_i, q_t+1 = S_j | O, λ) = α_t(i) a_ij b_j,Ot+1 β_t+1(j) / Σ_i=1..N Σ_j=1..N α_t(i) a_ij b_j,Ot+1 β_t+1(j). The probability of being in state i at time t, no matter the path: γ_t(i) = Pr(q_t = S_i | O, λ) = α_t(i) β_t(i) / Σ_i=1..N α_t(i) β_t(i). Then the expected values for the parameters are: π_i = γ_1(i); a_ij = Σ_t=1..T-1 ξ_t(i,j) / Σ_t=1..T-1 γ_t(i); b_jk = Σ_t=1..T s.t. Ot=k γ_t(j) / Σ_t=1..T γ_t(j).
35 Baum-Welch algorithm (equivalent to EM): given an initial assignment to the parameters λ = (π, a, b), compute ξ and γ from α and β. Generate a new estimate λ* = (π*, a*, b*) from π*_i = γ_1(i); a*_ij = Σ_t=1..T-1 ξ_t(i,j) / Σ_t=1..T-1 γ_t(i); b*_jk = Σ_t=1..T s.t. Ot=k γ_t(j) / Σ_t=1..T γ_t(j). Set λ = λ* and repeat until convergence.
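One Baum-Welch iteration can be sketched directly from these re-estimation formulas (an illustrative implementation, not code from the course): compute α and β, form γ and ξ, and re-estimate (π, a, b). A single EM step is guaranteed not to decrease Pr(O | λ). The CpG model and the sequence GCGAA from the earlier example are reused here, with symbols encoded as integers.

```python
def forward_trellis(pi, A, B, obs):
    """alpha[t][i] = Pr(O_1..t, X_t = i | lambda)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                      * B[j][obs[t]] for j in range(N)])
    return alpha

def backward_trellis(A, B, obs, N):
    """beta[t][i] = Pr(O_t+1..T | X_t = i, lambda)."""
    T = len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(N)) for i in range(N)]
    return beta

def baum_welch_step(pi, A, B, obs, M):
    """One EM re-estimation of (pi, a, b) from gamma and xi."""
    N, T = len(pi), len(obs)
    alpha = forward_trellis(pi, A, B, obs)
    beta = backward_trellis(A, B, obs, N)
    p_obs = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    pi_new = list(gamma[0])
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    B_new = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return pi_new, A_new, B_new

# CpG model; symbols G=0, C=1, A=2, T=3; O = GCGAA
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.1, 0.9]]
B = [[0.3, 0.3, 0.2, 0.2], [0.1, 0.1, 0.4, 0.4]]
obs = [0, 1, 0, 2, 2]
p_before = sum(forward_trellis(pi, A, B, obs)[-1])
pi2, A2, B2 = baum_welch_step(pi, A, B, obs, M=4)
p_after = sum(forward_trellis(pi2, A2, B2, obs)[-1])   # never smaller
```

In practice this single step is looped until the likelihood (or the parameters) stop changing, and real implementations work with scaled or log-space α and β to avoid underflow on long sequences.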
36 Where are HMMs used in Computational Biology? DNA: motif matching, gene matching, multiple sequence alignment. Amino acids: domain matching, fold recognition. Microarrays/whole genome sequencing: assigning copy number. ChIP-chip/ChIP-seq: distinguishing chromatin states.
37 Homologous Sequences: what is the consensus sequence? How can we recognize all of them? And how can we distinguish unlikely members? Krogh 1998.
38 Homologous Sequences Krogh 1998 Center for Genes, Environment, and Health 38
39 Probability of Sequences Center for Genes, Environment, and Health 39
40 Learning Parameters of Compbio HMMs: when built from pre-aligned (pre-labeled) sequences, states have meaningful biological labels (such as an insertion position), and parameter estimation just tabulates frequencies, as in the previous example. Note that longer sequences have lower probability, so scores are often converted to log-odds parameters (see Krogh 1998). When built from unaligned/unlabelled sequences, the semantics of states can (sometimes) be interpreted later, and Baum-Welch or an equivalent must be used for parameter estimation, as in the chromatin state example shown later. HMMs encode regular grammars, so they do a poor job on problems with long-range (complementary) correlations (e.g. RNA/protein secondary structure).
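The log-odds remark points at a practical issue: for long sequences the forward probabilities underflow ordinary floating point, so real tools compute in log space. A minimal sketch (an illustration, not the slides' method), reusing the CpG model from the earlier example and replacing sums of probabilities with the log-sum-exp trick:

```python
import math

# CpG-island model from the earlier example, stored as log-probabilities
log = math.log
LPI = [log(0.5), log(0.5)]
LA = [[log(0.8), log(0.2)], [log(0.1), log(0.9)]]
LB = [{'G': log(0.3), 'C': log(0.3), 'A': log(0.2), 'T': log(0.2)},
      {'G': log(0.1), 'C': log(0.1), 'A': log(0.4), 'T': log(0.4)}]

def logsumexp(xs):
    """log(sum(exp(x) for x in xs)), computed without underflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_loglik(obs):
    """Forward algorithm in log space: returns log Pr(O | lambda)."""
    la = [LPI[i] + LB[i][obs[0]] for i in range(2)]
    for sym in obs[1:]:
        la = [logsumexp([la[i] + LA[i][j] for i in range(2)]) + LB[j][sym]
              for j in range(2)]
    return logsumexp(la)

ll_short = forward_loglik("GCGAA")        # exp(ll_short) recovers ~8e-4
ll_long = forward_loglik("GCGA" * 1000)   # fine in log space; the plain
                                          # product would underflow to 0
```

The length-4000 sequence has a probability far below the smallest positive double, yet its log-likelihood is an unremarkable finite number, which is why profile-HMM scores are reported in log (or log-odds) units.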
41 Homology HMM: gene recognition; classify sequences to identify distant homologs of a common ancestral sequence. Parameter set λ = (A, B, π), strict left-right model. Specially defined set of states: start, stop, match, insert, delete. For the initial state distribution π, use the start state. For the transition matrix A, use global transition probabilities. For the emission matrix B: match states use site-specific emission probabilities; insert states (relative to the ancestor) use global emission probabilities; delete states emit nothing. Built from multiple sequence alignments. Adapted from David Pollock's slides.
42 Homology HMM architecture: a start state leads into a linear chain of match states ending at an end state; each position also has a self-looping insert state above it and a silent delete state below it, so sequences can expand or skip positions. Adapted from David Pollock's slides.
43 Homology HMM Example: three match states with site-specific emission probabilities. Match 1: A .1, C .05, D .2, E .08, F .01. Match 2: A .04, C .1, D .01, E .2, F .02. Match 3: A .2, C .01, D .05, E .1, F .06.
44 Profile HMM architectures: ungapped blocks, where insertion states model the intervening sequence between blocks; insert/delete states allowed anywhere; and architectures that allow multiple domains and sequence fragments. Eddy 1998.
45 Uses for a Homology HMM: find homologs to a profile HMM in a database, i.e. score multiple sequences for a match to one HMM. This is not always Pr(O | λ), since some regions may be highly diverged; sometimes the highest-scoring subsequence is used. Classify a sequence using a library of profile HMMs, i.e. compare one sequence to more than one alternative model (e.g. the Pfam and PROSITE motif databases). Align additional sequences. Structural alignment, when the alphabet is secondary-structure symbols, enables fold recognition, etc. Adapted from David Pollock's slides.
46 Variable Length and Composition of Protein Domains Center for Genes, Environment, and Health 46
47 Why Hidden Markov Models for MSA? Multiple sequence alignment as consensus: there may be substitutions, and not all amino acids are equal. FOS_RAT IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTPSTGAYARAGVV 112 FOS_MOUSE IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQSAGAYARAGMV 112. We could use regular expressions, but how do we handle indels? FOS_RAT IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTPS-TGAYARAGVV 112 FOS_MOUSE IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQS-AGAYARAGMV 112 FOS_CHICK VPTVTAISTSPDLQWLVQPTLISSVAPSQNRG-HPYGVPAPAPPAAYSRPAVL 112. What about variable-length members of the family? FOS_RAT IPTVTAISTSPDLQWLVQPTLVSSVAPSQ TRAPHPYGLPTPS-TGAYARAGVV 112 FOS_MOUSE IPTVTAISTSPDLQWLVQPTLVSSVAPSQ TRAPHPYGLPTQS-AGAYARAGMV 112 FOS_CHICK VPTVTAISTSPDLQWLVQPTLISSVAPSQ NRG-HPYGVPAPAPPAAYSRPAVL 112 FOSB_MOUSE VPTVTAITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTS----YSTPGLS 110 FOSB_HUMAN VPTVTAITTSQDLQWLVQPTLISSMAQSQGQPLASQPPVVDPYDMPGTS----YSTPGMS 110
48 Why Hidden Markov Models? Rather than a consensus sequence, which describes only the most common amino acid per position, HMMs allow more than one amino acid to appear at each position. Rather than profiles as position-specific scoring matrices (PSSMs), which assign a probability to each amino acid in each position of the domain and slide a fixed-length profile along a longer sequence to calculate a score, HMMs model the probability of variable-length sequences. Rather than regular expressions, which can capture variable-length sequences yet specify only a limited subset of amino acids per position, HMMs quantify the difference among using different amino acids at each position.
49 Detecting Copy Number in Array CGH Data: a discrete number of copies is found by segmenting array intensities along the chromosome; HMM segmentation is compared against naïve smoothing.
50 Detecting Copy Number in Whole Genome Sequencing Data (ABI Bioscope manual, 2010): compute the log ratio of observed coverage to expected coverage, fit an HMM with states for 0-9 copies, and assign the copy number of each region with the Viterbi algorithm.
51 HMMs for Chromatin States: a specific amino acid of a specific histone protein, modified at a given level, can be tagged and assayed. Example: H3K27me3 means 3 methyl groups have been added to the lysine at position 27 of histone 3. Rodenhiser & Mann, CMAJ (3):341.
52 Combination of Chromatin States: an HMM for the sequence from a single mark has states such as "has H3K27me3" or "no H3K27me3" (peak finding). However, peaks for a single mark could still be distributed all across the genome; which ones are important? Comparing across multiple signals identifies specific combinations which distinguish the important peaks in an individual signal (combinatorial patterns). Barski et al., Cell.
53 Combination States: the model was learned and optimized to Q=51 labels (a.k.a. states), where semantics were assigned post hoc based on prior biological knowledge, relation to gene models, gene expression data, and sequence conservation.
54 Multivariate HMMs for Chromatin States: Ernst 2010 learned 51 distinct chromatin states, interpreted post hoc as promoter-associated, transcription-associated, active intergenic, large-scale repressed, and repeat-associated states.
55 Hot Topic, Better than HMMs for Chromatin States: Dynamic Bayes Nets! These allow specification of the min/max length of a feature, a way to count that length down ("memory"), and a way to enforce or disallow certain transitions. Recall the HMM structure: hidden states X_t-1 → X_t with emissions O_t-1, O_t; here the hidden state of the model emits a sequence of observations for each of n chromatin/transcription-factor marks. Segway, by Hoffman et al. 2011.
56 Hot Topic, Better than HMMs for Chromatin States: Segway, by Hoffman et al. 2011, specifies Q=25 labels (a.k.a. states); the semantics of the learned states are assigned post hoc based on prior biological knowledge.
57 Hot Topic, Better than HMMs for Chromatin States: Segway, by Hoffman et al. 2011.
58 Resources. Homology HMMs: a great tutorial (Krogh 1998); WUSTL/Janelia (Eddy, Bioinformatics (9):755); Pfam: a database of pre-computed HMM alignments for various proteins; HMMer: a program for building HMMs; UCSC (Haussler) SAM: alignment, secondary structure predictions, HMM parameters, etc. Chromatin states: Ernst et al. (PMCID: PMC); Segway.
59 Center for Genes, Environment, and Health 59
60 Other David Pollock Slides 2009 Center for Genes, Environment, and Health 60
61 Model Comparison: based on P(D | θ, M). For maximum likelihood (ML), take P_max(D | θ, M), usually ln P_max(D | θ, M) to avoid numeric error. For heuristics, the score is log2 P(D | θ_fixed, M). For Bayesian comparison, calculate P(θ, M | D) ∝ P(D | θ, M) · P(θ) · P(M), which uses prior information on the parameters P(θ). Adapted from David Pollock's slides.
62 Parameters θ, types of parameters: amino acid distributions for positions (match states); global amino acid distributions for insert states; order of match states; transition probabilities; phylogenetic tree topology and branch lengths; hidden states (integrate or augment). To fit them, wander the parameter space (search) and either maximize, or move according to the posterior probability (Bayes). Adapted from David Pollock's slides.
63 Expectation Maximization (EM): a classic algorithm to fit probabilistic model parameters with unobservable states. Two stages. Maximize: if the hidden variables (states) were known, maximize the model parameters with respect to that knowledge. Expectation: if the model parameters are known, find the expected values of the hidden variables (states). Works well, even with e.g. Bayesian approaches, to find a near-equilibrium space. Adapted from David Pollock's slides.
64 Homology HMM EM: start with a heuristic MSA (e.g., ClustalW). Maximize: match states are residues aligned in most sequences; amino acid frequencies are those observed in the columns. Expectation: realign all the sequences given the model. Repeat until convergence. Problems: this is local, not global, optimization, so use procedures to check how well it worked. Adapted from David Pollock's slides.
65 Model Comparison: determining significance depends on comparing two models (family vs. non-family), usually a null model H_0 and a test model H_1. The models are nested if H_0 is a subset of H_1. If not nested: the Akaike Information Criterion (AIC) [similar to empirical Bayes] or the Bayes Factor (BF) [but be careful]. Generating a null distribution of the statistic: Z-factor, bootstrapping, parametric bootstrapping, posterior predictive simulation. Adapted from David Pollock's slides.
66 Z Test Method: use a database of known negative controls, e.g., non-homologous (NH) sequences, and assume NH scores ~ N(μ, σ²), i.e., model known NH sequence scores as a normal distribution. Set an appropriate significance level for multiple comparisons (more below). Problems: Is homology certain? Is it the appropriate null model? The normal distribution is often not a good approximation, and parameter control is hard: e.g., the length distribution. Adapted from David Pollock's slides.
67 Bootstrapping and Parametric Models: random sequences are sampled from the same set of emission probability distributions; matching the length is easy. Bootstrapping is re-sampling columns. Parametric models use estimated frequencies and may include variance, a tree, etc.; they are more flexible and can express a more complex null. Use pseudocounts of global frequencies if data are limited. Insertions are relatively hard to model: what frequencies for insert states? Global? Adapted from David Pollock's slides.
68 Center for Genes, Environment, and Health 68
I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington
More informationMarkov Models & DNA Sequence Evolution
7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationHidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:
Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm
More informationHidden Markov Models for biological sequence analysis
Hidden Markov Models for biological sequence analysis Master in Bioinformatics UPF 2017-2018 http://comprna.upf.edu/courses/master_agb/ Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training
More informationHidden Markov Models for biological sequence analysis I
Hidden Markov Models for biological sequence analysis I Master in Bioinformatics UPF 2014-2015 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Example: CpG Islands
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationHidden Markov Models Hamid R. Rabiee
Hidden Markov Models Hamid R. Rabiee 1 Hidden Markov Models (HMMs) In the previous slides, we have seen that in many cases the underlying behavior of nature could be modeled as a Markov process. However
More informationorder is number of previous outputs
Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/14/07 CAP5510 1 CpG Islands Regions in DNA sequences with increased
More informationINTEGRATING EPIGENETIC PRIORS FOR IMPROVING COMPUTATIONAL IDENTIFICATION OF TRANSCRIPTION FACTOR BINDING SITES AFFAN SHOUKAT
INTEGRATING EPIGENETIC PRIORS FOR IMPROVING COMPUTATIONAL IDENTIFICATION OF TRANSCRIPTION FACTOR BINDING SITES AFFAN SHOUKAT A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationBioinformatics 1--lectures 15, 16. Markov chains Hidden Markov models Profile HMMs
Bioinformatics 1--lectures 15, 16 Markov chains Hidden Markov models Profile HMMs target sequence database input to database search results are sequence family pseudocounts or background-weighted pseudocounts
More informationMarkov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University
Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models Modeling the statistical properties of biological sequences and distinguishing regions
More informationBMI/CS 576 Fall 2016 Final Exam
BMI/CS 576 all 2016 inal Exam Prof. Colin Dewey Saturday, December 17th, 2016 10:05am-12:05pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.
More informationBioinformatics 2 - Lecture 4
Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationExample: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding
Example: The Dishonest Casino Hidden Markov Models Durbin and Eddy, chapter 3 Game:. You bet $. You roll 3. Casino player rolls 4. Highest number wins $ The casino has two dice: Fair die P() = P() = P(3)
More informationHidden Markov Models. Introduction to. Model Fitting. Hagit Shatkay, Celera. Data. Model. The Many Facets of HMMs... Tübingen, Sept.
Introduction to Hidden Markov Models Hagit Shatkay, Celera Tübingen, Sept. 2002 Model Fitting Data Model 2 The Many Facets of HMMs... @#$% Found no match for your criteria. Speech Recognition DNA/Protein
More informationLecture 12: Algorithms for HMMs
Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 26 February 2018 Recap: tagging POS tagging is a sequence labelling task.
More informationRecall: Modeling Time Series. CSE 586, Spring 2015 Computer Vision II. Hidden Markov Model and Kalman Filter. Modeling Time Series
Recall: Modeling Time Series CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates
More informationUsing Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics
Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Tandy Warnow Founder Professor of Engineering The University of Illinois at Urbana-Champaign http://tandy.cs.illinois.edu
More informationHidden Markov Models and Gaussian Mixture Models
Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian
More informationLecture 12: Algorithms for HMMs
Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 17 October 2016 updated 9 September 2017 Recap: tagging POS tagging is a
More informationLecture 7 Sequence analysis. Hidden Markov Models
Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:
More informationGibbs Sampling Methods for Multiple Sequence Alignment
Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical
More informationLecture 3: Markov chains.
1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.
More informationHidden Markov Models (HMMs) November 14, 2017
Hidden Markov Models (HMMs) November 14, 2017 inferring a hidden truth 1) You hear a static-filled radio transmission. how can you determine what did the sender intended to say? 2) You know that genes
More informationRobert Collins CSE586 CSE 586, Spring 2015 Computer Vision II
CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter Recall: Modeling Time Series State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2011 1 HMM Lecture Notes Dannie Durand and Rose Hoberman October 11th 1 Hidden Markov Models In the last few lectures, we have focussed on three problems
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationHidden Markov Models. Ron Shamir, CG 08
Hidden Markov Models 1 Dr Richard Durbin is a graduate in mathematics from Cambridge University and one of the founder members of the Sanger Institute. He has also held carried out research at the Laboratory
More informationStatistical Methods for NLP
Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured
More informationPlan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping
Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics
More informationLearning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling
Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence
More informationLecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008
Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationAdvanced Data Science
Advanced Data Science Dr. Kira Radinsky Slides Adapted from Tom M. Mitchell Agenda Topics Covered: Time series data Markov Models Hidden Markov Models Dynamic Bayes Nets Additional Reading: Bishop: Chapter
More informationOutline of Today s Lecture
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Jeff A. Bilmes Lecture 12 Slides Feb 23 rd, 2005 Outline of Today s
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationHidden Markov Modelling
Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationAssignments for lecture Bioinformatics III WS 03/04. Assignment 5, return until Dec 16, 2003, 11 am. Your name: Matrikelnummer: Fachrichtung:
Assignments for lecture Bioinformatics III WS 03/04 Assignment 5, return until Dec 16, 2003, 11 am Your name: Matrikelnummer: Fachrichtung: Please direct questions to: Jörg Niggemann, tel. 302-64167, email:
More informationMultiscale Systems Engineering Research Group
Hidden Markov Model Prof. Yan Wang Woodruff School of Mechanical Engineering Georgia Institute of echnology Atlanta, GA 30332, U.S.A. yan.wang@me.gatech.edu Learning Objectives o familiarize the hidden
More informationMultiple Sequence Alignment using Profile HMM
Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,
More informationUniversity of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I
University of Cambridge MPhil in Computer Speech Text & Internet Technology Module: Speech Processing II Lecture 2: Hidden Markov Models I o o o o o 1 2 3 4 T 1 b 2 () a 12 2 a 3 a 4 5 34 a 23 b () b ()
More informationROBI POLIKAR. ECE 402/504 Lecture Hidden Markov Models IGNAL PROCESSING & PATTERN RECOGNITION ROWAN UNIVERSITY
BIOINFORMATICS Lecture 11-12 Hidden Markov Models ROBI POLIKAR 2011, All Rights Reserved, Robi Polikar. IGNAL PROCESSING & PATTERN RECOGNITION LABORATORY @ ROWAN UNIVERSITY These lecture notes are prepared
More informationLab 3: Practical Hidden Markov Models (HMM)
Advanced Topics in Bioinformatics Lab 3: Practical Hidden Markov Models () Maoying, Wu Department of Bioinformatics & Biostatistics Shanghai Jiao Tong University November 27, 2014 Hidden Markov Models
More informationSyllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)
Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural
More informationHidden Markov Models Part 2: Algorithms
Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:
More informationWeek 10: Homology Modelling (II) - HHpred
Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative
More informationHuman Mobility Pattern Prediction Algorithm using Mobile Device Location and Time Data
Human Mobility Pattern Prediction Algorithm using Mobile Device Location and Time Data 0. Notations Myungjun Choi, Yonghyun Ro, Han Lee N = number of states in the model T = length of observation sequence
More informationBiology 644: Bioinformatics
A stochastic (probabilistic) model that assumes the Markov property Markov property is satisfied when the conditional probability distribution of future states of the process (conditional on both past
More information11.3 Decoding Algorithm
11.3 Decoding Algorithm 393 For convenience, we have introduced π 0 and π n+1 as the fictitious initial and terminal states begin and end. This model defines the probability P(x π) for a given sequence
More information