Molecular Evolution and Comparative Genomics

Size: px
Start display at page:

Download "Molecular Evolution and Comparative Genomics"

Transcription

1 Molecular Evolution and Comparative Genomics --- the phylogenetic HMM model , CMB lecture 5---Eric Xing Some important dates in history (billions of years ago) Origin of the universe 15 ±4 Formation of the solar system 4.6 First self-replicating system 3.5 ±0.5 Prokaryotic-eukaryotic divergence 1.8 ±0.3 Plant-animal divergence 1.0 Invertebrate-vertebrate divergence 0.5 Mammalian radiation beginning CSH Doolittle et al. 1

2 The three kingdoms Two important early observations Different proteins evolve at different rates, and this seems more or less independent of the host organism, including its generation time. It is necessary to adjust the observed percent difference between two homologous proteins to get a distance more or less linearly related to the time since their common ancestor. ( Later we offer a rational basis for doing this.) n striking early version of these observations is next. 2

3 Rates of macromolecular evolution Corrected amino acid changes per 100 residues 220 Mammals MY Fibrinopeptides 4 Birds/Reptiles Mammals/ Reptiles Reptiles/Fish 200 abcde f g h i Pliocene Miocene Oligocene Cretaceous Jurassic Triassic Permian Carboniferous Devonian Silurian Ordovician Carp/Lamprey V ertebrates/ Insects Hemoglobin Cambrian j 5.8 MY Evolution of the globins Cytochrome c lgonkian 20.0 MY 1 Separation of ancestors of plants and animals Eocene Paleocene Huronian Millions of years since divergence fter Dickerson (1971) How does sequence variation arise? Mutation: (a) Inherent: DN replication errors are not always corrected. (b) External: exposure to chemicals and radiation. Selection: Deleterious mutations are removed quickly. Neutral and rarely, advantageous mutations, are tolerated and stick around. Fixation: It takes time for a new variant to be established (having a stable frequency) in a population. 3

4 Modeling DN base substitution Standard assumptions (sometimes weakened) 1. Site independence. 2. Site homogeneity. 3. Markovian: given current base, future substitutions independent of past. 4. Temporal homogeneity: stationary Markov chain. Strictly speaking, only applicable to regions undergoing little selection. Some terminology In evolution, homology (here of proteins), means similarity due to common ancestry. common mode of protein evolution is by duplication. Depending on the relations between duplication and speciation dates, we have two different types of homologous proteins. Loosely, Orthologues: the same gene in different organisms;common ancestry goes back to a speciation event. Paralogues: different genes in the same organism; common ancestry goes back to a gene duplication. Lateral gene transfer gives another form of homology. 4

5 Beta-globins (orthologues) BG-human BG-macaque BG-bovine BG-platypus BG-chicken BG-shark M V H L T P E E K S V T L W G K V N V D E V G G E L G R L L V V Y P W T Q N... T M F.... K S G G N I N. L W.... Q L I. G C I W S E V. L H E I. T T. K S I D K H S L. K... M F I..... T BG-human BG-macaque BG-bovine BG-platypus BG-chicken BG-shark R F F E S F G D L S T P D V M G N P K V K H G K K V L G F S D G L H L D S N N D S.. N. M K S G T S. G.. K N N.. S. T. I L... M. R T S. G. V K N... Y. G N L K E F T C S Y G E.... T.. L G V V T.. G BG-human BG-macaque BG-bovine BG-platypus BG-chicken BG-shark N L K G T F T L S E L H C D K L H V D P E N F R L L G N V L V C V L H H F G Q K D K V... R N.. D K N R..... I V... R.. S. I. N.. S Q D I. I I..... S D V. S Q. T D.. K K. E E.... V. S. K.. K C F. V E. G I L L K BG-human BG-macaque BG-bovine BG-platypus BG-chicken BG-shark K E F T P P V Q Y Q K V V G V N L H K Y H..... Q V L.. D F R... D. S. E.... W.. L. S... H.. G..... D... E C... W.. L. R V.. H... R... D K.. Q T.. I W E. Y F G V. V D. I S K E... means same as reference sequence - means deletion Beta-globins: uncorrected pairwise distances DISTNCES between protein sequences, calculated over: 1 to 147 Below diagonal: observed number of differences bove diagonal: number of differences per 100 amino acids hum mac bov pla chi sha hum mac bov pla chi sha

6 Beta-globins: corrected pairwise distances DISTNCES between protein sequences, calculated over 1 to 147. Below diagonal: observed number of differences bove diagonal: estimated number of substitutions per 100 amino acids Correction method: Jukes-Cantor hum mac bov pla chi sha hum mac bov pla chi sha Human globins (paralogues) alpha-human - V L S P D K T N V K W G K V G H G E Y G E L E R M F L S F P T T beta-human V H. T. E E. S. T. L N V D. V. G... G. L L V V Y. W. delta-human V H. T. E E... N. L N V D V. G... G. L L V V Y. W. epsilon-human V H F T E E.. T S L. S. M - - N V E.. G... G. L L V V Y. W. gamma-human G H F T E E.. T I T S L N V E D. G. T. G. L L V V Y. W. myo-human - G.. D G E W Q L. L N V.... E. D I P G H. Q. V. I. L. K G H. E alpha-human K T Y F P H F - D L S H G S Q V K G H G K K V D L T N V H V beta-human Q R F. E S. G... T P D. V M G N P K L G. F S D G L.. L delta-human Q R F. E S. G... S P D. V M G N P K L G. F S D G L.. L epsilon-human Q R F. D S. G N.. S P.. I L G N P K L T S F G D. I K N M gamma-human Q R F. D S. G N.. S.. I M G N P K L T S. G D. I K. L myo-human L E K. D K. K H. K S E D E M K S E D L. K.. T. L T.. G G I L K K K alpha-human D D M P N L S L S D L H H K L R V D P V N F K L L S H C L L V T L H L beta-human. N L K G T F T.. E.. C D.. H... E.. R.. G N V. V C V.. H. F delta-human. N L K G T F. Q.. E.. C D.. H... E.. R.. G N V. V C V.. R N F epsilon-human. N L K P. F K.. E.. C D.. H... E..... G N V M V I I.. T. F gamma-human.. L K G T F Q.. E.. C D.. H... E..... G N V. V T V.. I. F myo-human G H H E E I K P. Q S.. T. H K I P V K Y L E F I. E. I I Q V. Q S K H alpha-human P E F T P V H S L D K F L S V S T V L T S K Y R beta-human G K.... P. Q. Y Q. V V. G. N. H.. H delta-human G K.... Q M Q. Y Q. V V. G. N. H.. H epsilon-human G K.... E. Q. W Q. L V S. I. H.. H gamma-human G K.... E. Q.. W Q. M V T. S. S. R. H myo-human. G D. G D Q G M N.. E L F R K D M. N. K E L G F Q G 6

7 Human globins: corrected pairwise distances DISTNCES between protein sequences, calculated over 1 to 141. Below diagonal: observed number of differences bove diagonal: estimated number of substitutions per 100 amino acids Correction method: Jukes-Cantor alpha beta delta epsil gamma myo alpha beta delta epsil gamma myo Correcting distances between DN and protein sequences Why it is necessary to adjust observed percent differences to get a distance measure which scales linearly with time? This is because we can have multiple and back substitutions at a given position along a lineage. ll of the correction methods (with names like Jukes-Cantor, 2- parameter Kimura, etc) are justified by simple probabilistic arguments involving Markov chains whose basis is worth mastering. The same molecular evolutionary models can be used in scoring sequence alignments. 7

8 Markov chain State space = {,C,G,T}. p(i,j) = pr(next state S j current state S i ) Markov assumption: p(next state S j current state S i & any configuration of states before this) = p(i,j) Only the present state, not previous states, affects the probs of moving to next states. The multiplication rule pr(state after next is S k current state is S i ) =? j pr(state after next is S k, next state is S j current state is S i ) [addition rule] =? j pr(next state is S j current state is S i ) x pr(state after next is S k current state is S i, next state is S j ) =? j p i,j x p j,k [multiplication rule] [Markov assumption] = (i,k)-element of P 2, where P=(p i,j ). More generally, pr(state t steps from now is S k current state is S i ) = i,k element of P t 8

9 Continuous-time version For any (s, t), let p ij (t) = pr(s j at time t+s S i at time s) denote the stationary (timehomogeneous) transition probabilities. Let P(t) = (p ij (t)) denote the matrix of p ij (t) s. Then for any (t, u): P(t+u) = P(t) P(u). It follows that P(t) = exp(qt), where Q = P (0) ( the derivative of P(t) at t = 0 ). Q is called the infinitesimal matrix (transition rate matrix) of P(t), and satisfies P (t) = QP(t) = P(t)Q. Interpretation of Q Roughly, q ij is the rate of transitions of i to j, while q ii = Σ j i q ij, so each row sum is 0. Now we have the short-time approximation: p i j (t+h) = q ij h + o(h), and p ii (t+h) = 1+q ii h + o(h), Now consider the Chapman-Kolmogorov relation: (assuming we have a continuous -time Markov chain, and let p j (t) = pr(s j at time t)) p j (t+h) =Σ i pr(s i at t, S j at t+h) = Σ i pr(s i at t)pr(s j at t+h S i at t) = p j (t)x(1+hq jj ) + Σ i j p i (t)x hq ij i.e., h -1 [p j (t+h) - p j (t)] = p j (t)q jj + Σ i j p i (t)q ij, which becomes P = QP as h 0. Important approximation: when t is small, P(t) I + Qt. 9

10 Probabilistic models for DN changes Orc: Elf: Dwarf: Hobbit: Human: CGTGCGCCCCCGT CGTGCGCTCCGT CCTGTGCGTCCG CCTGTGCGTGCCG CCTGTGCGTGCCG The Jukes-Cantor model (1969) Substitution rate: -µ µ/3 -µ G µ/3 µ/3 µ/3 µ/3 -µ C µ/3 T -µ the simplest sysmmetrical model for DN evolution 10

11 Jukes-Cantor (cont.) Prob transition matrix: C G T r(t) s(t) s(t) s(t) S(t) = C s(t) r(t) s(t) s(t) G s(t) s(t) r(t) s(t) T s(t) s(t) s(t) r(t) Where we can derive: r(t) = ¼ (1 + 3 e -4αt ) s(t) = ¼ (1 e -4αt ) Jukes-Cantor (cont.) Fraction of sites differences difference per site time 11

12 Kimura's K2P model (1980) Substitution rate: -α-2β α G -α-2β β β β β -α-2β C α T -α-2β which allows for different rates of transition and transversions. Transitions (rate a) are much more likely than transversions (rate b). Kimura (cont.) Prob transition matrix: r(t) s(t) u(t) s(t) S(t) = s(t) r(t) s(t) u(t) u(t) s(t) r(t) s(t) s(t) u(t) s(t) r(t) Where s(t) = ¼ (1 e -4βt ) u(t) = ¼ (1 + e -4βt e -2(α+β)t ) r(t) = 1 2s(t) u(t) By proper choice of and one can achieve the overall rate of change and Ts=Tn ratio R you want (warning: terminological tangle). 12

13 Kimura (cont.) Transitions, transversions expected under different R: The general time-reversible model It maintains "detailed balance" so that the probability of starting at (say) and ending at (say) T in evolution is the same as the probability of starting at T and ending at : C G T C G T? s(t) ap s(t) s(t) C ßpG?p ap s(t)? s(t) dp s(t) G ep ßp s(t) s(t) dp? s(t) C?p?p s(t) s(t) ep s(t)? C?pG T T T nd there is of course the general 12-parameter model which has arbitrary rates for each of the 12 possible changes (from each of the 4 nucleotides to each of the 3 others). (Neither of these has formulas for the transition probabilities, but those can be done numerically.) 13

14 Relation between models Phylogenetic trees 14

15 pair of homologous bases ancestor T years Q h Q m C Typically, the ancestor is unknown. More assumptions Q h = s h Q and Q m = s m Q, for some positive s h, s m, and a rate matrix Q. The ancestor is sampled from the stationary distribution π of Q. Q is reversible: for a, b, t 0 π(a)p(t,a,b) = P(t,b,a)π(b), (detailed balance). 15

16 The stationary distribution probability distribution π on {,C,G,T} is a stationary distribution of the Markov chain with transition probability matrix P = P(i,j), if for all j, i π(i) P(i,j) = π(j). Exercise. Given any initial distribution, the distribution at time t of a chain with transition matrix P converges to π as t. Thus, π is also called an equilibrium distribution. Exercise. For the Jukes-Cantor and Kimura models, the uniform distribution is stationary. (Hint: diagonalize their infinitesimal rate matrices.) We often assume that the ancestor sequence is i.i.d π. New picture ancestor ~ π s h T PMs Q Q s m T PMs C 16

17 Phylogeny The shaded nodes represent the observed nucleotides at a given site for a set of organisms The unshaded nodes represent putative ancestral nucleotides Transitions between nodes capture the dynamic of evolution Phylogeny methods Basic principles: Degree of sequence difference is proportional to length of independent sequence evolution Only use positions where alignment is pretty certain avoid areas with (too many) gaps Major methods: Parsimony phylogeny methods Likelihood methods 17

18 Likelihood methods tree, with branch lengths, and the data at a single site. t 7 t 5 t 8 t 6 t 2 t 1 t 3 t 4 CGTGCGCCCCCGT CGTGCGCTCCGT CCTGTGCGTCCG CCTGTGCGTGCCG CCTGTGCGTGCCG Since the sites evolve independently on the same tree, L = P( D T) = m? i=1 P( D ( i ) T ) Likelihood at one site on a tree We can compute this by summing over all assignments of states x, y, z and w to the interior nodes: P( D ( i ) T ) =???? P(,, C, C, C, x, y, z, w T ) x y z w x t 8 z t 7 w t 5 t 3 t 4 Due to the Markov property of the tree, we can factorize the complete likelihood according to the tree topology: P(,, C, C, C, x, y, z, w T ) = P( x) P( y x, t6) P( y, t1) P( C y, t2) P z x, t ) P( C y, ) ( 8 t3 P( w z, t7 ) P( C y, t4 ) P( C y, t5 Summing this up, there are 256 terms in this case! t 6 ) y t 2 t 1 18

19 Getting a recursive algorithm when we move the summation signs as far right as possible: ( ( i ) P D T ) =????? x z x y z w P(,, C, C, C, x, y, z, w T ) = P ( x) (? P( y x, t6 ) P( y, t1) P( C y, t2 )) y? P( z x, t8) P( C z, t3) (? P ( w z, t7) P( C w, t4) P( C w, t5) ( w )) Felsenstein s Pruning lgorithm To calculate P(x 1, x 2,, x N T, t) Initialization: Set k = 2N 1 Recursion: Compute P(L k a) for all a Σ If k is a leaf node: Set P(L k a) = 1(a = x k ) If k is not a leaf node: 1. Compute P(L i b), P(L j b) for all b, for daughter nodes i, j 2. Set P(L k a) = Σ b, c P(b a, t i )P(L i b) P(c a, t j ) P(L j c) Termination: Likelihood at this column = P(x 1, x 2,, x N T, t) = Σ a P(L 2N-1 a)p(a) 19

20 Finding the ML tree So far I have just talked about the computation of the likelihood for one tree with branch lengths known. To find a ML tree, we must search the space of tree topologies, and for each one examined, we need to optimize the branch lengths to maximize the likelihood. Phylogenetic HMM G... G T C G G 20

21 Modeling rate variation among sites There are a finite number of rates (denote rate i as r i ). There are probabilities p i of a site having rate i. process not visible to us ("hidden") assigns rates to sites. The probability of our seeing some data are to be obtained by summing over all possible combinations of rates, weighting approproately by their probabilities of occurrence. Rocall the HMM Y 1 Y 2 Y 3... Y T i i i i i e 1 e 2 e 3 e 4 e 5 e 6 e 7 e 8 i i i i C G T... G G G T C T X 1 X 2 X 3 X T The shaded nodes represent the observed nucleotides at particular sites of an organism's genome For discrete Y i, widely used in computational biology to represent segments of sequences gene finders and motif finders profile models of protein domains models of secondary structure 21

22 Definition (of HMM) Definition: hidden Markov model (HMM) Observation alphabet Σ = { b 1, b 2,, b M } Set of hidden states Q = { 1,..., K } Transition probabilities between any two states a ij = P(y=j y=i) a i1 + + a ik = 1, for all states i = 1 K Start probabilities a 0i a a 0K = 1 Emission probabilities associated with each state e ib = P( x i = b y i = k) e i,b1 + + e i,bm = 1, for all states i = 1 K 1 K 2 Hidden Markov Phylogeny... G G C G T G This yields a gene finder that exploits evolutionary constraints Based on sequence data from primate species, Mculiffe et al (2003) obtained sensitivity of 100%, with a specificity of 89%. Genscan (state-of-the-art gene finder) yield a sensitivity of 45%, with a specificity of 34%. 22

23 The Forward lgorithm We can compute f k (t) for all k, t, using dynamic programming! Initialization: f 0 (0) = 1 f k (0) = 0, for all k > 0 Iteration: f l (t) = e l (x t ) Σ k f k (t-1) a kl (a 0k is a vector of initial probability) Termination: P(x) = Σ k f k (T) The Backward lgorithm We can compute b k (t) for all k, t, using dynamic programming Initialization: b k (T) = 1, for all k Iteration: b k (t) = Σ l e l (x t+1 ) a kl b l (t+1) Termination: P(x) = Σ l a 0l e l (x 1 ) b l (1) 23

24 Open questions (philosophical) Observation: Finding a good phylogeny will help in finding the genes. Finding the genes will help to find biologically meaningful phylogenetic trees Which came first, the chicken or the egg? Open questions (technical) How to learn a phylogeny (topology and transition prob.)? Should different site use the same phylogeny? Functionspecific phylogeny? Other evolutionary events: duplication, rearrangement, lateral transfer, etc. 24

Chapter 2: Sequence Alignment

Chapter 2: Sequence Alignment Chapter 2: Sequence lignment 2.2 Scoring Matrices Prof. Yechiam Yemini (YY) Computer Science Department Columbia University COMS4761-2007 Overview PM: scoring based on evolutionary statistics Elementary

More information

Computational Genomics

Computational Genomics omputational Genomics Molecular Evolution: Phylogenetic trees Eric Xing Lecture, March, 2007 Reading: DTW boo, hap 2 DEKM boo, hap 7, Phylogeny In, Ernst Haecel coined the word phylogeny and presented

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Advanced Algorithms and Models for Computational Biology -- a machine learning approach. Some important dates in history (billions of years ago)

Advanced Algorithms and Models for Computational Biology -- a machine learning approach. Some important dates in history (billions of years ago) Advanced Algorithm and Model for Computational Biology -- a machine learning approach Molecular Evolution: nucleotide ubtitution model Eric Xing Lecture 20, April 3, 2006 Reading: DTW book, Chap 12 DEKM

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

More information

Reading for Lecture 13 Release v10

Reading for Lecture 13 Release v10 Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i 第 4 章 : 进化树构建的概率方法 问题介绍 进化树构建方法的概率方法 部分 lid 修改自 i i f l 的 ih l i 部分 Slides 修改自 University of Basel 的 Michael Springmann 课程 CS302 Seminar Life Science Informatics 的讲义 Phylogenetic Tree branch internal node

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models. x 1 x 2 x 3 x K Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

More information

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics CS5263 Bioinformatics Guest Lecture Part II Phylogenetics Up to now we have focused on finding similarities, now we start focusing on differences (dissimilarities leading to distance measures). Identifying

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Bio 2 Plant and Animal Biology

Bio 2 Plant and Animal Biology Bio 2 Plant and Animal Biology Evolution Evolution as the explanation for life s unity and diversity Darwinian Revolution Two main Points Descent with Modification Natural Selection Biological Species

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

More information

Fossils Biology 2 Thursday, January 31, 2013

Fossils Biology 2 Thursday, January 31, 2013 Fossils Biology 2 Evolution Change in the genetic composition of a group of organisms over time. Causes: Natural Selection Artificial Selection Genetic Engineering Genetic Drift Hybridization Mutation

More information

Phylogeny Tree Algorithms

Phylogeny Tree Algorithms Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

More information

Chapter 7: Models of discrete character evolution

Chapter 7: Models of discrete character evolution Chapter 7: Models of discrete character evolution pdf version R markdown to recreate analyses Biological motivation: Limblessness as a discrete trait Squamates, the clade that includes all living species

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

The History of Life. Fossils and Ancient Life (page 417) How Fossils Form (page 418) Interpreting Fossil Evidence (pages ) Chapter 17

The History of Life. Fossils and Ancient Life (page 417) How Fossils Form (page 418) Interpreting Fossil Evidence (pages ) Chapter 17 Chapter 17 The History of Life Section 17 1 The Fossil Record (pages 417 422) This section explains how fossils form and how they can be interpreted. It also describes the geologic time scale that is used

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

Bio94 Discussion Activity week 3: Chapter 27 Phylogenies and the History of Life

Bio94 Discussion Activity week 3: Chapter 27 Phylogenies and the History of Life Bio94 Discussion Activity week 3: Chapter 27 Phylogenies and the History of Life 1. Constructing a phylogenetic tree using a cladistic approach Construct a phylogenetic tree using the following table:

More information

Hidden Markov models in population genetics and evolutionary biology

Hidden Markov models in population genetics and evolutionary biology Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

Pairwise sequence alignment

Pairwise sequence alignment Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

More information

7.36/7.91 recitation CB Lecture #4

7.36/7.91 recitation CB Lecture #4 7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

Chapter Study Guide Section 17-1 The Fossil Record (pages )

Chapter Study Guide Section 17-1 The Fossil Record (pages ) Name Class Date Chapter Study Guide Section 17-1 The Fossil Record (pages 417-422) Key Concepts What is the fossil record? What information do relative dating and radioactive dating provide about fossils?

More information

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics

More information

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49 Molecular evolution Joe Felsenstein GENOME 453, utumn 2009 Molecular evolution p.1/49 data example for phylogeny inference Five DN sequences, for some gene in an imaginary group of species whose names

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models. x 1 x 2 x 3 x K Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)

More information

17-1 The Fossil Record Slide 2 of 40

17-1 The Fossil Record Slide 2 of 40 2 of 40 Fossils and Ancient Life What is the fossil record? 3 of 40 Fossils and Ancient Life Fossils and Ancient Life Paleontologists are scientists who collect and study fossils. All information about

More information

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe? How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

More information

Hidden Markov Models (I)

Hidden Markov Models (I) GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables

More information

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.

More information

Stephen Scott.

Stephen Scott. 1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family

Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family Review: Gene Families Gene Families part 2 03 327/727 Lecture 8 What is a Case study: ian globin genes Gene trees and how they differ from species trees Homology, orthology, and paralogy Last tuesday 1

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

Week 5: Distance methods, DNA and protein models

Week 5: Distance methods, DNA and protein models Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2011 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Molecular Evolution and Phylogenetic Tree Reconstruction

Molecular Evolution and Phylogenetic Tree Reconstruction 1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

More information

Molecular Evolution, course # Final Exam, May 3, 2006

Molecular Evolution, course # Final Exam, May 3, 2006 Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least

More information

Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life Chapter focus Shifting from the process of how evolution works to the pattern evolution produces over time. Phylogeny Phylon = tribe, geny = genesis or origin

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/14/07 CAP5510 1 CpG Islands Regions in DNA sequences with increased

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Lecture 7 Sequence analysis. Hidden Markov Models

Lecture 7 Sequence analysis. Hidden Markov Models Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden

More information

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010 Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition

More information