Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides
1 Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides
2
- Hennigian logic reconstructs the tree if we know the polarity of characters and there is no homoplasy.
- UPGMA infers a tree from a distance matrix:
  - groups based on similarity
  - fails to give the correct tree if rates of character evolution vary much
- Modern distance-based approaches:
  - find trees and branch lengths such that patristic distances approximate the distances computed from character data
  - do not use all of the information in the data
- Parsimony: prefer the tree that requires the fewest character state changes. Minimize the number of times you invoke homoplasy to explain the data.
  - can work well if homoplasy is not rare
  - fails if homoplasy is very common or is concentrated on certain parts of the tree
- Maximum likelihood:
  - computes the probability of the data given a model (tree and branch lengths)
  - computationally expensive
3 Review: Tree Searching
- Hennigian logic builds a tree directly from the characters.
- UPGMA builds a tree from distances.
- Parsimony, maximum likelihood, and modern distance methods are optimality criteria. We still have to search for the best tree.
- There are too many trees to enumerate them exhaustively.
- We rely on hill-climbing heuristics.
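To give a feel for why exhaustive enumeration is hopeless, the following minimal sketch counts unrooted, fully bifurcating trees on n labeled tips using the standard formula (2n-5)!!. The function name `count_unrooted_trees` is an illustrative choice, not something from the slides.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees on n_taxa labeled tips.

    Uses the double-factorial formula (2n - 5)!! = 3 * 5 * ... * (2n - 5),
    which is valid for n_taxa >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted binary tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n = 3).
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

With only 10 taxa there are already more than two million candidate trees, which is why heuristic searches are used in practice.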
4 Even if we find the optimal tree, we do not know that it is the true tree. How do we assess statistical support?
5 The bootstrap [Figure: the (unknown) true distribution and the (unknown) true value of θ; the empirical distribution of the sample and the estimate of θ; bootstrap replicates; the distribution of estimates of parameters.] Week 7: Bayesian inference, Testing trees, Bootstraps p.33/54
6 The bootstrap for phylogenies [Figure: the original data (sequences by sites) yield the estimate of the tree. Bootstrap sample #1 samples the same number of sites, with replacement, and yields bootstrap estimate of the tree #1; bootstrap sample #2 does the same and yields bootstrap estimate of the tree #2 (and so on).]
7 Bootstrapping: first step. From the original data (k sites), estimate a tree using, say, parsimony (NJ, LS, ML, etc. could be used instead). Copyright 2007 Paul O. Lewis
8 Bootstrapping: first replicate. [Figure: a weight for each of the k sites.] The sum of weights equals k (i.e., each bootstrap dataset has the same number of sites as the original). From the bootstrap dataset, estimate the tree using the same method you used for the original dataset.
9 Bootstrapping: second replicate. [Figure: a weight for each of the k sites.] Note that the weights are different this time, reflecting the random sampling with replacement used to generate them. This time the tree that is estimated is different from the one estimated using the original dataset.
10 Bootstrapping: 20 replicates. [Table: split patterns (column "1234") and their frequencies (column "Freq").] Note: usually at least 100 replicates are performed, and 500 is better.
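The resampling step described on the bootstrapping slides can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the toy alignment are hypothetical, and the tree-estimation step (parsimony, NJ, ML, and so on) is deliberately left out.

```python
import random
from collections import Counter


def bootstrap_weights(n_sites, rng):
    """Draw n_sites site indices with replacement and return, for each
    original site, how many times it was drawn. The weights sum to n_sites."""
    draws = rng.choices(range(n_sites), k=n_sites)
    counts = Counter(draws)
    return [counts[i] for i in range(n_sites)]


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap dataset by resampling alignment columns.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    """
    n_sites = len(next(iter(alignment.values())))
    columns = rng.choices(range(n_sites), k=n_sites)
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


rng = random.Random(2007)                 # fixed seed so the sketch is repeatable
toy = {"taxon1": "ACGTACGTAC",            # hypothetical toy data
       "taxon2": "ACGTTCGTAC",
       "taxon3": "ACGAACGTTC"}
weights = bootstrap_weights(10, rng)
replicate = bootstrap_alignment(toy, rng)
print(weights, sum(weights))
```

Each replicate would then be passed to the same tree-estimation method used for the original data, and the frequency of each split across replicates reported as its support.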
15 [Figure: tree with branches labeled by percent sequence divergence (0.5%, 4.5%, 5%, 10%, 20%) and a 200 Million Year Old Fossil.]
16 The "Clock Idea": 20% Sequence Divergence in 200 Mill. Years means 1% divergence per 10 Mill. Years. [Figure: tree with branches labeled 0.5%, 4.5%, 5%, 10%, and 20%, with dates of 10 Million, 100 Million, 200 Million, and 400 Million years; 200 Million Year Old Fossil.]
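The arithmetic behind the "Clock Idea" can be written out as a short sketch. It assumes a strict clock and treats the percentages as pairwise sequence divergences calibrated by the 20% per 200 million years figure from the slide; the function name `clock_age` and the query divergences are illustrative assumptions.

```python
def clock_age(divergence_pct, calib_divergence_pct=20.0, calib_age_my=200.0):
    """Estimate a divergence time (in millions of years) under a strict clock.

    The calibration says calib_divergence_pct of divergence accumulated in
    calib_age_my million years; any other divergence is scaled linearly.
    """
    return divergence_pct * calib_age_my / calib_divergence_pct


# 20% in 200 My is 1% per 10 My, so:
for pct in (1.0, 10.0, 40.0):           # hypothetical query divergences
    print(f"{pct:5.1f}% divergence -> {clock_age(pct):6.1f} million years")
```

The linear scaling is exactly what breaks down when rates change over time, which is the problem raised on the following slides.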
17 "A comparison of the structures of homologous proteins... from different species is important, therefore, for two reasons. First, the similarities found give a measure of the minimum structure for biological function. Second, the differences found may give us important clues to the rate at which successful mutations have occurred throughout evolutionary time and may also serve as an additional basis for establishing phylogenetic relationships." From p. 143 of The Molecular Basis of Evolution by Dr. Christian B. Anfinsen (Wiley, 1959)
18 A problem with the "Clock Idea": Rates of Molecular Evolution Change Over Time!! [Figure: tree with branches labeled 0.5%, 4.5%, 5%, 10%, and 20%, with dates of 10 Million, 100 Million, 200 Million, and 400 Million years; 200 Million Year Old Fossil.]
19 Ernst Mayr recalled at this meeting that there are two distinct aspects to phylogeny: the splitting of lines, and what happens to the lines subsequently by divergence. He emphasized that, after splitting, the resulting lines may evolve at very different rates... How can one then expect a given type of protein to display constant rates of evolutionary modification along different lines of descent? (Evolving Genes and Proteins. Zuckerkandl and Pauling, 1965, p. 138).
20 [Figure: two trees of taxa A to E, one labeled "Molecular Clock" and one labeled "No Clock"; axis: amount of evolution (substitutions per site).]
21 Assuming a Strict Molecular Clock
No Clock: lnL =
Clock: lnL =
LR test statistic = 232
n = 15 taxa, n - 2 = 13 d.f.
Null (clock) hypothesis rejected
Langley, C. H., and W. M. Fitch. An estimation of the constancy of the rate of molecular evolution. Journal of Molecular Evolution 3:
Felsenstein, J. Statistical inference of phylogenies. Journal of the Royal Statistical Society 146:
by Paul O. Lewis
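The likelihood ratio test on this slide can be sketched as below. The log-likelihood values did not survive in the transcript, so the two numbers used here are hypothetical and chosen only so that the statistic comes out at the reported 232. The 5% critical value for a chi-square distribution with 13 degrees of freedom (22.362) is taken from standard tables; the function name is illustrative.

```python
# 5% critical value of the chi-square distribution with 13 degrees of freedom.
CHI2_CRIT_5PCT_DF13 = 22.362


def clock_lrt(lnl_no_clock, lnl_clock, n_taxa):
    """Likelihood ratio test of the strict clock.

    The clock model is nested in the no-clock model and has n_taxa - 2
    fewer free branch-length parameters, so the test statistic
    2 * (lnL_no_clock - lnL_clock) is compared to chi-square with
    n_taxa - 2 degrees of freedom.
    """
    statistic = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_taxa - 2
    return statistic, df


# Hypothetical log-likelihoods (the slide's actual values are not recoverable).
stat, df = clock_lrt(lnl_no_clock=-1000.0, lnl_clock=-1116.0, n_taxa=15)
reject_clock = stat > CHI2_CRIT_5PCT_DF13
print(stat, df, reject_clock)
```

A statistic of 232 is far beyond the critical value, which matches the slide's conclusion that the null (clock) hypothesis is rejected.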
22 Reasons that the clock might be rejected
1. Rates of evolution can vary across lineages and over time:
   (a) mutation rates can vary (mutations per cell cycle, mutations per time, number of cell cycles per generation, generation time)
   (b) strength and targets of selection can vary
   (c) population sizes can vary
2. Incorrect models of sequence evolution lead to errors in the estimation of rates:
   (a) Almost any error in the model can lead to biases (or higher than needed variance) in detecting multiple hits.
   (b) The assumption of a Poisson clock can be wrong: even if we correctly count the number of changes, if we don't account for over-dispersion (higher than Poisson variance in the number of substitutions) then we can falsely reject the clock (Cutler, 2000).
23
- Penalized likelihood (penalize rates that vary too much).
- Bayesian approaches: model the rate of evolution of the rate of evolution. This incorporates prior knowledge of which rate combinations are most likely.
24 Molecular sequence data
- protein and (later) DNA sequences
- clearly not environmental or plastic
- Kimura's neutral theory implies that homoplasy due to functional convergence should be rare
25 [Figure: sequences of Homo sap., Pan trog., Gorilla gor., and Pongo pyg.] The sequences cannot be character states in a Hennigian analysis: no two are shared!
26 [Figure: sequences of Homo sap., Pan trog., Gorilla gor., and Pongo pyg.] We could treat columns ("sites") as characters and the bases as states. This requires an alignment.
27
- Insertions and deletions (indels) of nucleotides occur during evolution, so we cannot count on the 5th position in every sequence being descended from the same ancestral base.
- Alignment: adding gap characters ("-") to sequences.
- The goal of alignment is to make homologous sites occur in the same column.
- Multiple sequence alignment is a very difficult problem compared to pairwise alignment.
28 Uses of multiple sequence alignment
- Correspondence: we often want to know which parts do the same thing or have the same structure.
- Profiles: we can create profiles that summarize the characteristics of a protein family.
- Genome assembly: alignment is a part of the creation of contig maps of genomic fragments such as ESTs.
- Phylogenetics: the vast majority of phylogenetic methods require aligned data.
29 Current standard operating procedure for tree reconstruction from molecular sequence data
1. Collect sequences
2. Align the sequences (usually with clustalw or clustalx)
3. Remove/recode regions of uncertain alignment
4. Infer phylogenetic trees
30
human KRSV
chimp KRV
orang KPRV
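31 [Figure: tree relating human (KRSV), chimp (KRV), and orangutan (KPRV), with ancestral sequences KRSV and KPSV and the events "del S", "S->R", and "P->R" marked on branches.]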
32 How should we align these sequences?
human KRSV, chimp KRV, gorilla KSV, orang KPRV

human   KRSV        human   KRSV
chimp   KR-V   OR   chimp   K-RV
gorilla KS-V        gorilla K-SV
orang   KPRV        orang   KPRV
33 Pairwise alignment
- Gap penalties and a substitution matrix imply a score for any alignment.
- Pairwise alignment involves finding the alignment that maximizes this score.
- Substitution matrices assign positive values to matches or similar substitutions (for example, Leucine to Isoleucine).
- Unlikely substitutions receive negative scores.
- Gaps are rare and are heavily penalized (given large negative values).
34 Scoring an alignment. Simplest case
Costs: Match 1, Mismatch 0, Gap -5
Alignment:
Pongo   V D E V E L R L F V V P Q
Gorilla V E V D L R L L I V Y P S R
Total score = 5
35 Scoring a different alignment. Simplest case
Costs: Match 1, Mismatch 0, Gap -5
Alignment:
Pongo   V D E V E L R L - F V V P Q
Gorilla V - E V D L R L L I V Y P S R
Total score = 0
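The "simplest case" scoring scheme (match 1, mismatch 0, gap -5) can be sketched as a short function. Because the Pongo and Gorilla rows in this transcript have lost letters, the example below uses the short human/chimp/orang sequences from the earlier slides exactly as they appear here; the function name is an illustrative choice.

```python
def score_alignment(row_a, row_b, match=1, mismatch=0, gap=-5):
    """Score a pairwise alignment column by column.

    row_a and row_b are the two aligned rows, using '-' for gaps.
    Each column contributes match, mismatch, or gap to the total.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot consist of two gaps")
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("KRSV", "KR-V"))   # three matches and one gap
print(score_alignment("KRSV", "K-RV"))   # two matches, one gap, one mismatch
print(score_alignment("KRSV", "KPRV"))   # two matches and two mismatches
```

Comparing the first two calls shows how the score discriminates between alternative gap placements for the same pair of sequences.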
36 BLOSUM 62 Substitution matrix [Table: lower-triangular 20 x 20 matrix of substitution scores, with rows and columns in the order A R N D C Q E G H I L K M F P S T W Y V. Only the first entries survive: A/A = 4, R/A = -1, R/R = 5.]
37 Scoring an alignment with the BLOSUM 62 matrix
Pongo   V D E V E L R L F V V P Q
Gorilla V E V D L R L L I V Y P S R
The score for the alignment is $D_{ij} = \sum_k d_{ij}^{(k)}$, where $d_{ij}^{(k)}$ is the score of column k. If i indicates Pongo and j indicates Gorilla, $D_{ij} = 12$.
38 Scoring an alignment with gaps
If the GP is -8:
Pongo   V D E V E L R L - F V V P Q
Gorilla V - E V D L R L L I V Y P S R
By introducing gaps we have improved the score: $D_{ij} = 40$
39 Gap Penalties
Gaps are penalized more heavily than substitutions to avoid alignments like this:
Pongo   VDEVE-LRLFVVPQ
Gorilla VDEV-WLRLFVVPQ
40 Gap Penalties
Because multiple residues are often inserted or deleted at the same time, affine gap penalties are often used:
$GP = GO + l \cdot GE$
where:
- GP is the gap penalty
- GO is the gap-opening penalty
- GE is the gap-extension penalty
- l is the length of the gap
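The affine penalty GP = GO + l*GE can be illustrated with a small sketch. The default values for the gap-opening and gap-extension penalties below are hypothetical (the slides do not give any), and the helper `total_gap_cost` is an illustrative name.

```python
from itertools import groupby


def affine_gap_penalty(length, gap_open=-10.0, gap_extend=-0.5):
    """GP = GO + l * GE for a single gap of the given length."""
    if length < 1:
        raise ValueError("a gap must have length >= 1")
    return gap_open + length * gap_extend


def total_gap_cost(aligned_row, gap_open=-10.0, gap_extend=-0.5):
    """Sum the affine penalty over every run of '-' in one aligned row."""
    total = 0.0
    for char, run in groupby(aligned_row):
        if char == "-":
            total += affine_gap_penalty(len(list(run)), gap_open, gap_extend)
    return total


print(affine_gap_penalty(1))        # one-residue gap
print(affine_gap_penalty(3))        # three-residue gap costs little more
print(total_gap_cost("KR--V-"))     # one gap of length 2 plus one of length 1
```

Under these assumed values a single gap of length 3 is much cheaper than three separate gaps of length 1, which is the behaviour the affine scheme is meant to produce.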
41 Finding an optimal alignment [Figure: alignment grid with the Gorilla sequence (V E V D L R L L I V Y P S R) along the top and the Pongo sequence (V D E V E L R L F V V P Q) down the side.]
42 Aligning two sequences, each with length = 1: D and E
43 Alignment 1:
D-
-E
44 Alignment 2:
D
E
45 Alignment 3:
-D
E-
46 Longer sequences: up to 2 amino acids! Aligning V D with V E.
47-59 Alignments 1 to 13 [Figures: one slide for each of the 13 possible alignments of V D with V E.]
60
Pongo   V D E V E L R L F V V P Q
Gorilla V E V D L R L L I V Y P S R
[Figure: this alignment shown on the grid with Gorilla along the top and Pongo down the side.]
61
Pongo   V D E V E L R L - F V V P Q
Gorilla V - E V D L R L L I V Y P S R
[Figure: this alignment shown on the grid with Gorilla along the top and Pongo down the side.]
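62 [Table: number of possible alignments (# alignments) as a function of the length of Seq #1 and the length of Seq #2. Surviving fragments of the entries: ",462," and ",425,834,724,419".]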
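The growth in the number of alignments can be reproduced with a short recursive count. This sketch assumes the counting convention implied by the earlier slides, where the last column of an alignment is either a residue pair or a residue opposite a gap in either row, so that two length-1 sequences have 3 alignments and two length-2 sequences have 13. The function name is an illustrative choice.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of distinct alignments of sequences of lengths n and m.

    The last column either pairs two residues, or puts a residue of the
    first sequence against a gap, or a residue of the second against a gap.
    """
    if n == 0 or m == 0:
        return 1                      # only one way: all gaps in the empty row
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


for length in (1, 2, 3, 5, 10, 20):
    print(length, count_alignments(length, length))
```

The counts grow exponentially with sequence length, which is why enumerating alignments is impractical for anything but toy examples.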
63 Needleman-Wunsch algorithm (paraphrased)
- Work from the top left (the beginning of both sequences).
- For each cell, store the highest score possible for that cell and a back pointer to the previous step in the best path.
- When you reach the lower right corner, you know the optimal score, and the back pointers tell you the alignment.
- The highest-score calculation at each cell depends only on the cell's three possible previous neighbors.
If one sequence has length N and the other has length M, then Needleman-Wunsch takes only 6NM calculations, even though there is a much larger number of possible alignments.
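The paraphrased steps can be turned into a compact sketch. It uses the simple costs from the earlier scoring slide (match 1, mismatch 0, gap -5) with a linear gap penalty rather than a substitution matrix or affine gaps, so it is an illustration of the dynamic programming idea and not a production aligner. The function and variable names are illustrative.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=0, gap=-5):
    """Global pairwise alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b).
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    # First column and first row: the only option is a run of gaps.
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * gap, "left"
    # Each cell depends only on its three previous neighbors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, "diag"),   # align two residues
                (score[i - 1][j] + gap, "up"),         # gap in seq_b
                (score[i][j - 1] + gap, "left"),       # gap in seq_a
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Follow the back pointers from the lower right corner.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            row_a.append(seq_a[i - 1]); row_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            row_a.append(seq_a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


print(needleman_wunsch("KRSV", "KRV"))
print(needleman_wunsch("VDEV", "VEVD"))
```

The nested loop visits each of the N x M cells once and does a constant amount of work per cell, which is where the cost proportional to NM comes from.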
64-71 [Figures: the Needleman-Wunsch score matrix for V D E V against V E V D, filled in step by step; surviving cell values include 0, -5, 4, -10, and -15. The final slide shows the grid for the longer Pongo and Gorilla sequences.]
72 Aligning multiple sequences [Figure: taxa A, B, C, D, E.]
73 Progressive alignment
Devised by Feng and Doolittle (1987) and Higgins and Sharp. An approximate method for producing multiple sequence alignments using a guide tree.
- Perform pairwise alignments to produce a distance matrix.
- Produce a guide tree from the distances.
- Use the guide tree to specify the ordering used for aligning sequences, closest to furthest.
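The step from a distance matrix to an alignment order can be sketched with a simple average-linkage (UPGMA-style) clustering. The slides do not say which tree method builds the guide tree, so the clustering rule here is an assumption, and the pairwise distances for taxa A to E are hypothetical numbers chosen for illustration.

```python
def guide_tree_order(names, dist):
    """Return the merge order implied by average-linkage clustering.

    names: list of sequence names.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Each merge is a pair of tuples of names (the two groups being aligned).
    """
    clusters = [(name,) for name in names]

    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        _, i, j = min((average_distance(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical pairwise distances for five sequences.
pairs = {("A", "B"): 0.2, ("C", "D"): 0.09, ("C", "E"): 0.3, ("D", "E"): 0.3,
         ("A", "C"): 0.5, ("A", "D"): 0.5, ("B", "C"): 0.5, ("B", "D"): 0.5,
         ("A", "E"): 0.8, ("B", "E"): 0.8}
dist = {frozenset(p): d for p, d in pairs.items()}
merges = guide_tree_order(["A", "B", "C", "D", "E"], dist)
for left, right in merges:
    print(left, "+", right)
```

With these made-up distances the order is two sequence-sequence steps, then a sequence-group step, then a group-group step; a real progressive aligner would perform the corresponding profile alignment at each merge.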
74 [Figure: the three stages of progressive alignment.]
Input sequences:
A PEEKSVLWKVNVDEV
B EEKVLLWDKVNEEEV
C PDKNVKWKVHEY
D DKNVKWSKVHEY
E EHEWQLVLHVWKVEDVHQ
Pairwise alignment: distance matrix over A, B, C, D, E
Tree inference: guide tree over A, B, C, D, E
Alignment stage:
A PEEKSVLWKVN--VDEV
B EEKVLLWDKVN--EEEV
C PDKNVKWKVHEY
D DKNVKWSKVHEY
E EHEWQLVLHVWKVEDVHQ
75 Alignment stage of progressive alignments
- Sequences of clades become grouped into profiles as the algorithm descends the tree.
- The next youngest internal node is selected at each step to create a new profile.
- Alignment at each step involves:
  - Sequence-Sequence
  - Sequence-Profile
  - Profile-Profile
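76 Aligning multiple sequences [Figure: guide tree over taxa A, B, C, D, E with the alignment steps labeled Seq-Seq, Seq-Seq, Seq-Group, and Group-Group; a value of 0.09 survives.]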
77 Profile to Profile alignment V E V D L R L L I Y P S R V E D E V L M R L F V P Q L D D E V - V R L F V P Q V E I D L - - L L L Y P R V V E V E L - - L L L Y P K I
78 Profile to profile alignments
- Adding a gap to a profile means that every member of that group of sequences gets a gap at that position of the sequence.
- Usually the scores for each edge in the Needleman-Wunsch graph are calculated using a sum-of-pairs scoring system.
- Clustal W (Thompson, Higgins, and Gibson. Nuc. Acids Res. 1994) uses weights assigned to each sequence in a profile group to downweight closely related sequences so that they are not overrepresented.
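79 Weighted sum-of-pairs score between one column of Profile 1 and one column of Profile 2
Profile 1:
Seq      weight  residue
taxon A  0.3     V
taxon C  0.24    ?
taxon E  0.19    I
Profile 2:
Seq      weight  residue
taxon B  0.15    V
taxon D  0.25    M
$D_{P_1,P_2} = \frac{\sum_i \sum_j w_i w_j d_{ij}}{n_i n_j} = \frac{1}{6}\left[ d(V,V)\, w_A w_B + d(V,M)\, w_A w_D + d(?,V)\, w_C w_B + d(?,M)\, w_C w_D + d(I,V)\, w_E w_B + d(I,M)\, w_E w_D \right] = \ldots$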
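The weighted sum-of-pairs formula can be sketched as a small function. The sequence weights below are the ones on the slide, but two things are assumptions: the residue for taxon C is lost in the transcript, so "L" is used as a hypothetical stand-in, and a simple identity score (1 for a match, 0 otherwise) replaces the BLOSUM 62 values that a real profile aligner would use.

```python
def profile_column_score(column_1, column_2, pair_score):
    """Weighted sum-of-pairs score between two profile columns.

    column_1, column_2: lists of (residue, weight), one entry per sequence.
    pair_score: function (residue_a, residue_b) -> substitution score d_ij.
    Implements sum_i sum_j w_i * w_j * d_ij / (n_i * n_j).
    """
    total = sum(w1 * w2 * pair_score(r1, r2)
                for r1, w1 in column_1
                for r2, w2 in column_2)
    return total / (len(column_1) * len(column_2))


def identity_score(a, b):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 1.0 if a == b else 0.0


# Weights from the slide; the residue "L" for taxon C is a made-up placeholder.
profile_1 = [("V", 0.3), ("L", 0.24), ("I", 0.19)]   # taxa A, C, E
profile_2 = [("V", 0.15), ("M", 0.25)]               # taxa B, D
print(profile_column_score(profile_1, profile_2, identity_score))
```

Swapping `identity_score` for a lookup into a real substitution matrix would give the six-term sum written out on the slide, divided by n_i * n_j = 6.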
80 Dealing with alignment ambiguity [Figure from M. S. Y. Lee, TREE, 2001 (TRENDS in Ecology & Evolution Vol.16 No.12, December 2001): panels (a) to (e), each with columns X, Y, and Z for the Outgroup and Taxa A to E; one panel is labeled "Elision".]
81 Dealing with alignment ambiguity: deletion [Figure from M. S. Y. Lee, TREE, 2001: panels (a) to (e) with columns X, Y, and Z for the Outgroup and Taxa A to E.]
82 Dealing with alignment ambiguity. The elision method (Wheeler, 1995) involves simply concatenating matrices. [Figure from M. S. Y. Lee, TREE, 2001.]
83 Simultaneous tree inference and alignment
- Ideally we would address uncertainty in both types of inference at the same time.
- Allows for application of statistical models to improve inference and assessments of reliability.
- Just now becoming feasible: POY (Wheeler, Gladstein, Laet, 2002), Handel (Holmes and Bruno, 2001), BAli-Phy (Redelings and Suchard, 2005), and BEAST (Lunter et al., 2005; Drummond and Rambaut, 2003).
- SATé (Liu et al. 2009; Yu and Holder software).
More informationMany of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!
Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis
More informationWhat is Phylogenetics
What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)
More informationMultiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:
Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading: