Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT
1 Inferring phylogeny Constructing phylogenetic trees Tõnu Margus
2 Contents: What is phylogeny? How and why is it possible to infer it? Representing evolutionary relationships on trees. What types of questions can we ask? How to infer phylogeny?
3 Phylogenetics In biology, phylogenetics is the study of evolutionary relatedness among various groups of organisms (for example, species or populations), which is discovered through molecular sequencing data and morphological data matrices. A phylogenetic analysis is the scientific procedure that lets you reconstruct the evolutionary history of a group of organisms or sequences.
4 Time aspect - We are successful descendants of our predecessors
5 Some surprising cases of using the concept of phylogeny. Who is Probo? Warning: do not take it too seriously!
6 Warning: do not take it too seriously!
7 Contents: What is phylogeny? How and why is it possible to infer phylogeny? Representing evolutionary relationships on trees. What types of questions can we ask? How to infer phylogeny?
8 How and why do sequences evolve? Chromosomes are copied and the copies are divided between daughter cells during cell division. However, these copies are not perfect: they contain a few mistakes, also referred to as mutations. The number of mutations accumulated in a sequence has been proposed to be proportional to the time that separates it from the ancestral sequence. [Figure: cell division, chromosomes shown in yellow (Wang Z et al. 2010).] Example sequences from the slide:
ATGTGGCATTAGCGGCTATTCGGC
ATGTGGCAGTAGCGGCTATTCGGC
ATGTGGCAGTAGCGTTTATTCGGC
ATGTGGCAGTAG--TTTATTCTGC
ATGTGGCAGTAG---TTATTCTGC
AGTTGGCATTAG---TTATTCTGC
9 How and why is it possible to infer phylogeny? Because the accumulation of mutations in a sequence is proportional to time, closely related sequences are more similar, i.e. the differences between them are smaller. Differences can be expressed as the proportion of changed positions between two sequences, for example as changes per position:
A - ATGTGGCATT
B - ATGTGGCAGT
There is one difference per 10 nt => 0.1 changes per position.
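The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `p_distance` is hypothetical, and skipping columns that contain a gap is an assumption made here.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns where either sequence has a gap are not compared.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


seq_a = "ATGTGGCATT"
seq_b = "ATGTGGCAGT"
print(p_distance(seq_a, seq_b))  # one difference in 10 positions -> 0.1
```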
10 Calculating distance. Pairwise distances are calculated for each pair of sequences and expressed, for example, as changes per position.
A - ATGTGGCATT
B - ATGTGGCAGT
C - ATCTCGTAGT
[Slide table: pairwise distance matrix for A, B and C, with 0 on the diagonal.]
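As a hedged sketch of how such a matrix could be filled in, the Python below computes changes per position for every pair of the three slide sequences. The helper names `changes_per_position` and `distance_matrix` are hypothetical, and the sequences are assumed to be gap-free and of equal length, as they are on the slide.

```python
from itertools import combinations


def changes_per_position(seq_x, seq_y):
    """Fraction of positions that differ between two equal-length sequences."""
    differences = sum(1 for x, y in zip(seq_x, seq_y) if x != y)
    return differences / len(seq_x)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances, stored as a dict of dicts."""
    names = list(seqs)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = changes_per_position(seqs[n], seqs[m])
        matrix[n][m] = matrix[m][n] = d
    return matrix


sequences = {"A": "ATGTGGCATT", "B": "ATGTGGCAGT", "C": "ATCTCGTAGT"}
matrix = distance_matrix(sequences)
for name in matrix:
    print(name, [round(matrix[name][other], 2) for other in matrix])
# A [0.0, 0.1, 0.4]
# B [0.1, 0.0, 0.3]
# C [0.4, 0.3, 0.0]
```

A and B are the closest pair (0.1), which is why they are joined first when the tree is drawn.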
11 Drawing a tree: neighbour joining. First, the closest sequences are connected. Then the more distant ones are added. [Figure: distance matrix for A, B and C and the resulting tree.]
12 Contents: What is phylogeny? How and why is it possible to infer phylogeny? Representing evolutionary relationships on trees. What types of questions can we ask? How to infer phylogeny?
13 Understanding a tree. A, B and C are LEAVES or OTUs (operational taxonomic units). A hypothetical ancestral sequence could occupy a NODE. [Figure labels: node, leaf or OTU, branch length; distance matrix for A, B and C.]
14 Rooted and unrooted trees. ROOTING means determining the ancestral node; the other leaves (OTUs) are rearranged according to it. Often, the root is represented by an additional line without an OTU. [Figure: rooted tree with ROOT, the ancestral node for A, B and C.]
15 Rooted and unrooted trees. For ROOTING we can use an OUTGROUP. An OUTGROUP can be a sequence from an organism whose lineage diverged before the species we are trying to root. For example, for rooting human and horse we can use mouse as an outgroup. [Figure: tree with human, horse and mouse as outgroup; ancestral node for human and horse.]
16 Rooted trees. ROOTING defines the branching order by placing it on a timeline. More ancient events are close to the ROOT and more recent events are close to the LEAVES. [Figure: rooted tree of A, B and C with a time axis.]
17 Unrooted trees: different ways of presenting the same tree. [Figure: two drawings of the same tree of A, B and C; one layout is labelled "often used for unrooted trees".]
18 Contents: What is phylogeny? How and why is it possible to infer phylogeny? Representing evolutionary relationships on trees. What types of questions can we ask? How to infer phylogeny?
19 Four major reasons why you may want to use phylogenetics
1. Determining the closest relatives of the organism that you're interested in. For instance, if you're studying a new bacterium, you can sequence its ribosomal RNA and use it for constructing a phylogenetic tree.
2. Discovering the function of a gene. You can use phylogenetic trees to be sure that the gene you're interested in is orthologous to another well-characterized gene in another species.
3. Retracing the origin of a gene. From time to time, individual genes may jump from one species to another. Phylogenetic trees are a great way to reveal such events, which are called horizontal (or lateral) transfers.
4. Characterizing a gene family. Describing the structure of the gene/protein family: determining functionally homogeneous subsets (families/subfamilies), gene distribution in different organisms, gene duplications and LGT.
20 Determining closest relatives: example of an environmental sample on a 16S tree. [Figure: 16S tree with Firmicutes and an unknown species.]
21 Resolving the evolutionary history of living organisms: constructing species trees
22 Discovering the function of a gene. Each group of proteins formed a distinct branch. [Figure: unrooted tree of translational GTPases (Margus et al. 2007).]
23 Example of lateral gene transfer (LGT). [Figure label: alpha-proteobacteria.] Genes of [?]-proteobacteria become nested among alpha-proteobacterial genes: the [?]-proteobacteria have acquired an extra copy of EF-G laterally.
24 Studying gene duplications and gene families. Gene duplications are very widespread. Gene duplication generates material (copies) for evolution. The copies start to change/evolve and, in some cases, genes with new functions arise from them. This makes it difficult to recognize genes/proteins that carry out the same function in different organisms, mainly because it can be difficult to choose the proper gene among many homologs. Genes in two different organisms that share common ancestry and are the closest pair between them are called ORTHOLOGS.
25 Orthologs and paralogs. Duplicate genes in the same genome are called PARALOGS. As time passes, functional differences may appear between PARALOGS.
26 Orthologs and paralogs. ORTHOLOGS have the same function!
27 Example of a protein family tree. A big tree is difficult to read. We need some marker sequences and good support. Similar OTUs (subfamily, orthologs...) can be compressed. [Figure labels: compressed clade, number of sequences, good support, diversity.]
28 Example of a protein family tree. Proteins in one compressed triangle are orthologs. [Figure: phylogenetic tree of elongation factor G (EF-G); four subfamilies are clearly seen.]
29 Associating function with orthologs. [Figure labels for the subfamilies: unknown; translocation and ribosome recycling; translocation; recycling.]
30 Phylogenetic profiling: mapping subfamilies to bacterial phylogeny. [Figure: species tree based on 16S rRNA with EF-G subfamilies mapped onto it; labels include two [?]-proteobacteria groups, Spirochaetes, "disappearing EF-G-I" and "appearing spdefgs".]
31 Go to infer function?! We have well-defined sets of homologous proteins. Function is characterized for three of them. Several 3-D structures are available. Several functional domains, motifs and amino acids have been determined for EF-G I. Can we find the positions that best characterize the as yet unknown function of this subfamily?
32 Contents: What is phylogeny? How and why is it possible to infer phylogeny? Representing evolutionary relationships on trees. What types of questions can we ask? How to infer phylogeny? (methods, data, estimating reliability)
33 Methods for building trees
34 There are two very different ways to produce trees. The first one uses ClustalW; it's quick, hassle-free, and somehow very similar to (good) fast food. The second step list is for those of you out there who love to buy fresh vegetables on the market and make your own salad dressing: sticklers for detail, if you catch our drift. With these steps, you can control every ingredient that...
35 Distance methods. A single statistic, DISTANCE, is calculated for each pair of sequences. Based on the distances, the final tree is built using the neighbour joining (NJ) or UPGMA method. Better methods for calculating DISTANCE use: an amino acid distance matrix (PAM or BLOSUM); correction of the distance for multiple changes at the same position; position classes (invariant, slowly evolving, medium and fast). Distance methods are fast. You can use much larger (~10 times) datasets than for ML or Bayesian methods.
36 Parsimony methods (MP, maximum parsimony): good for inferring trees from DNA data; many models for DNA evolution; models for coding regions and for each codon position separately; no models for protein evolution. Likelihood methods (ML, maximum likelihood): take into account amino acid replacement patterns observed from sequences (PAM, BLOSUM, WAG... many different models); used for computing protein trees. All these methods are CPU expensive, and it might take days to compute a tree.
37 Data: the sequences selected as input are crucial
38 About the importance of preparing data
39 Choosing the right sequences for the right tree There is the assumption that the sequences you are comparing have a common ancestor
40 Using DNA or protein? DNA > 70% identical: you can align it directly; if it is a coding sequence, align it as protein-coding DNA. DNA < 70% identical: translate to protein and align the proteins; then use the protein alignment to align the DNA codon by codon. If most synonymous sites are different, this indicates saturation, and it is safer to use proteins for phylogeny. If synonymous sites are not saturated, a distance measure from DNA is more accurate.
41 Choosing sequences to make either a gene tree or a species tree. Homologous genes are genes that derive from a common ancestor. They can have three types of relationships. Orthologs: separated only by speciation. When you need a species tree, use ONLY orthologs. Paralogs: separated by a duplication event (within a genome). Xenologs (Greek xeno, foreigner): the result of LGT between two organisms, when the original copy of a gene is replaced with a foreign ortholog.
42 Pre-computed sets of orthologous proteins/genes. COG: Clusters of Orthologous Groups at NCBI, by Eugene Koonin. Collections of homologous genes include HOGENOM and HOVERGEN, developed by the Pôle Bioinformatique Lyonnais. RDP II: the Ribosomal Database Project contains mainly bacterial, structurally aligned rRNA sequences. It contains a lot of paralogues, but is nevertheless widely used for inferring species trees of bacteria.
43 Create the perfect set
44 Preparing your multiple sequence alignment Computing multiple sequence alignment (ClustalW, T-Coffee, MUSCLE) Making sure you have the right multiple sequence alignment The quality of your multiple sequence alignment is the real limiting factor when you make a tree; there is no way you can make a good tree with a bad alignment.
45 To ensure that your multiple alignment is both accurate and suitable
1. Make sure there are as many gap-free columns as possible. Gaps cause trouble for most phylogeny reconstruction methods.
2. Remove the extremities of your multiple alignment. The N-terminus and the C-terminus tend to be poorly conserved and therefore poorly aligned. You can safely remove them.
3. Remove the gap-rich regions of your alignment. Internal, gap-rich regions in a multiple sequence alignment often correspond to loops.
4. Be sure to keep the most informative blocks. The ideal multiple alignment for building a tree would be a high-quality alignment of sequences with a low level of identity.
What does a good block look like? It's typically 20 to 30 amino acids long, and contains a few conserved positions. Such blocks are ideal for producing high-quality trees. You can use the T-Coffee server to evaluate your multiple sequence alignment and remove columns that are unlikely to be correctly aligned. The best way to edit your multiple alignment is to use BioEdit or Jalview.
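Removing gap-rich columns can also be scripted. The Python below is a minimal sketch, not the procedure used by T-Coffee, BioEdit or Jalview: the function name `drop_gap_rich_columns` and the `max_gap_fraction` threshold are hypothetical, and the toy alignment reuses three nucleotide sequences from the mutation example earlier in the slides.

```python
def drop_gap_rich_columns(alignment, max_gap_fraction=0.0):
    """Keep only columns whose fraction of '-' characters is at most max_gap_fraction."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(row[i] == "-" for row in alignment) / len(alignment) <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in alignment]


toy_alignment = [
    "ATGTGGCAGTAG--TTTATTCTGC",
    "ATGTGGCAGTAG---TTATTCTGC",
    "ATGTGGCATTAGCGGCTATTCGGC",
]
trimmed = drop_gap_rich_columns(toy_alignment)  # default: keep gap-free columns only
for row in trimmed:
    print(row)
```

With the default threshold of 0.0 every column containing a gap is dropped; a looser value such as 0.5 would keep columns in which at most half of the sequences have a gap.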
46 Contents: What is phylogeny? How and why is it possible to infer phylogeny? Representing evolutionary relationships on trees. What types of questions can we ask? How to infer phylogeny? (methods, data, estimating reliability)
47 The spectrum of available sequences is restricted to the current time window. Generally, NO ancestral sequences have been preserved. Therefore, evolutionary history needs to be reconstructed using data that have survived and are available NOW. [Figure: sequences A to E within the current timeframe on a time axis.]
48 Question about the reliability of a tree. True tree: there is only one true tree. Inferred tree: a tree obtained by using a certain set of data and a certain method of tree reconstruction. How reliable is the inferred tree?
49 Bootstrapping - step 1: generate alignments based on the original. [Figure: original alignment and bootstrap alignments up to n.]
50 Bootstrapping - step 2: computing trees. [Figure: each bootstrap alignment of seq1, seq2 and seq3 gives its own tree: tree 1, tree 2, ..., the n-th tree.]
51 Bootstrapping - step 3: computing consensus trees. Support values: …% is considered very good; 96-80% is good; branches with < 50% are not supported. [Figure: consensus tree of seq1, seq2 and seq3 with a value of 80 on one branch: this branching order was found in 80% of cases (bootstrap trees).]
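The three bootstrapping steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular phylogeny package: all function names are hypothetical, the toy alignment reuses the three 10-nt sequences from the distance example, and a full tree-building step is replaced by a stand-in that only asks which two sequences are closest in each replicate.

```python
import random


def resample_columns(alignment, rng):
    """Step 1: draw alignment columns with replacement, keeping the original length."""
    length = len(alignment[0])
    picked = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[i] for i in picked) for row in alignment]


def closest_pair(alignment, names):
    """Stand-in for step 2: the pair of sequences with the fewest differences.

    Ties are broken by order: the first pair encountered wins.
    """
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            diffs = sum(x != y for x, y in zip(alignment[i], alignment[j]))
            if best is None or diffs < best[0]:
                best = (diffs, (names[i], names[j]))
    return best[1]


def bootstrap_support(alignment, names, pair, n_replicates=1000, seed=0):
    """Step 3: fraction of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        replicate = resample_columns(alignment, rng)
        if closest_pair(replicate, names) == pair:
            hits += 1
    return hits / n_replicates


names = ["seq1", "seq2", "seq3"]
alignment = ["ATGTGGCATT", "ATGTGGCAGT", "ATCTCGTAGT"]
support = bootstrap_support(alignment, names, ("seq1", "seq2"))
print(f"seq1 and seq2 grouped together in {support:.0%} of bootstrap replicates")
```

In a real analysis each replicate would be passed to the same tree-building method as the original alignment, and the support for each branch would be read from the consensus of the replicate trees.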
52 Second round: models of sequence evolution
53 Flow diagram
54 Assumptions. Phylogenetic reconstruction from a set of homologous sequences would be considerably easier than it is if two conditions had held during sequence evolution. First, that all the sequences evolved at a constant mutation rate for all mutations at all times. If true, the number of observed differences between any two aligned sequences would be directly proportional to the time elapsed since they diverged from their most recent common ancestor. Second, that the sequences have only diverged to a moderate degree, such that no position has been subjected to more than one mutation. If true, then once the sequences have been accurately aligned, all the mutational events could be observed as non-identical aligned bases, and each mutation could be assumed to be from one base to the other.
55 Distance correction The simplest way of estimating the evolutionary distance from an alignment is to count the fraction of nonidentical alignment positions, to obtain a measure called the p-distance.
56 The rate of accepted mutation is usually not the same for all types of base substitution The models of evolution used for phylogenetic analysis define base mutation rates and substitution preferences for each position in the alignment. The simplest models assume all rates to be identical and time-invariant with no substitution preferences. More sophisticated models have been proposed that relax these assumptions.
57 Transitions and transversions. If a purine base is replaced by another purine on mutation, or a pyrimidine by a pyrimidine, the structure will suffer little if any distortion. Such mutations are called transitions, as opposed to transversions, in which purines become pyrimidines or vice versa. Note that there are twice as many ways of generating transversions as transitions. [Figure: diagram of transitions and transversions among the four bases.]
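The distinction can be checked with a short Python sketch. The names `classify_substitution`, `PURINES` and `PYRIMIDINES` are hypothetical and only the four DNA bases are handled; the loop at the end counts all possible base changes to confirm the two-to-one ratio mentioned on the slide.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(base_from, base_to):
    """Label a single-base change as 'identical', 'transition' or 'transversion'."""
    a, b = base_from.upper(), base_to.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("only the DNA bases A, C, G and T are supported")
    if a == b:
        return "identical"
    # Same chemical class (purine to purine, pyrimidine to pyrimidine) is a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


counts = {"transition": 0, "transversion": 0}
for base_from, base_to in permutations("ACGT", 2):
    counts[classify_substitution(base_from, base_to)] += 1
print(counts)  # {'transition': 4, 'transversion': 8}
```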
58 Different codon positions have different mutation rate The points on each line represent percentage GC values for each of 11 bacteria at the codon position indicated, plotted against the overall genome percentage GC content. While all three codon positions adapt to some extent to the compositional bias of the genome, the third position adapts most.
59 Distance and distance correction. p-distance: if an alignment of two sequences has L positions (gaps excluded), of which D differ, then the fractional alignment difference, usually called the p-distance, is defined as $p = D / L$. The Poisson distance correction takes account of multiple mutations at the same site: $d_P = -\ln(1 - p)$, where $p$ is the p-distance and $d_P$ is the corrected distance, called the Poisson distance. The assumption made is that each position in a given sequence mutates at the same rate (r). The gamma distribution assumes that different positions can mutate at different rates (r).
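A small Python sketch of the Poisson correction, assuming the p-distance has already been computed. The function name `poisson_distance` is hypothetical, and the example values of p are illustrative rather than taken from the slides.

```python
import math


def poisson_distance(p):
    """Poisson-corrected distance d_P = -ln(1 - p) for a p-distance 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p-distance must satisfy 0 <= p < 1")
    return -math.log(1 - p)


for p in (0.05, 0.1, 0.2, 0.4):
    print(f"p = {p:.2f}  d_P = {poisson_distance(p):.4f}")
```

The corrected distance is always at least as large as p, and the gap widens as p grows, because more of the observed sites are expected to hide multiple changes.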
60 Gamma distribution (Γ). The gamma distribution, proposed by T. Uzzell and K. Corbin in 1971, takes account of mutation rate variation at different sequence positions. The corrected distance is called the gamma distance: $d_\Gamma = \alpha \left[ (1 - p)^{-1/\alpha} - 1 \right]$. It is a more realistic model for the distribution of sites with different mutation rate constants. Such a distribution can be written with one parameter α, which determines the site variation. Values of α have been estimated from data. When p < 0.2, $d_\Gamma$ is not significantly different from the p-distance.
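The gamma distance can be sketched in Python in the same way. The function name `gamma_distance` is hypothetical, the shape parameter is written `alpha`, and the value alpha = 2.0 in the loop is only an illustrative choice, not an estimate from data.

```python
def gamma_distance(p, alpha):
    """Gamma-corrected distance alpha * ((1 - p) ** (-1 / alpha) - 1)."""
    if not 0 <= p < 1:
        raise ValueError("p-distance must satisfy 0 <= p < 1")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * ((1 - p) ** (-1 / alpha) - 1)


# Compare the uncorrected and corrected distances for an illustrative alpha.
for p in (0.05, 0.1, 0.2, 0.4):
    print(f"p = {p:.2f}  d_gamma = {gamma_distance(p, alpha=2.0):.4f}")
```

Smaller values of alpha correspond to stronger rate variation among sites and give a larger correction; for very large alpha the result approaches the Poisson distance.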
More informationMolecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço
Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço jcarrico@fm.ul.pt Charles Darwin (1809-1882) Charles Darwin s tree of life in Notebook B, 1837-1838 Ernst Haeckel (1934-1919)
More informationPhylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science
Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.
More informationComparative Bioinformatics Midterm II Fall 2004
Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationHow to read and make phylogenetic trees Zuzana Starostová
How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation
More informationBig Idea #1: The process of evolution drives the diversity and unity of life
BIG IDEA! Big Idea #1: The process of evolution drives the diversity and unity of life Key Terms for this section: emigration phenotype adaptation evolution phylogenetic tree adaptive radiation fertility
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationChapter 19: Taxonomy, Systematics, and Phylogeny
Chapter 19: Taxonomy, Systematics, and Phylogeny AP Curriculum Alignment Chapter 19 expands on the topics of phylogenies and cladograms, which are important to Big Idea 1. In order for students to understand
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More information7. Tests for selection
Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info
More informationBioinformatics Exercises
Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted
More informationMultiple Sequence Alignment. Sequences
Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe
More informationResearch Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.
Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research
More informationMassachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution
Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral
More informationPhylogeny: building the tree of life
Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan
More informationBayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies
Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development
More informationModule: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment
Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand
More informationOrthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona
Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Toni Gabaldón Contact: tgabaldon@crg.es Group website: http://gabaldonlab.crg.es Science blog: http://treevolution.blogspot.com
More informationReading for Lecture 13 Release v10
Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................
More informationWarm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab
Date: Agenda Warm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab Ask questions based on 5.1 and 5.2 Quiz on 5.1 and 5.2 How
More informationChapter 27: Evolutionary Genetics
Chapter 27: Evolutionary Genetics Student Learning Objectives Upon completion of this chapter you should be able to: 1. Understand what the term species means to biology. 2. Recognize the various patterns
More informationPhylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)
Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction Lesser Tenrec (Echinops telfairi) Goals: 1. Use phylogenetic experimental design theory to select optimal taxa to
More informationBiochemistry 324 Bioinformatics. Pairwise sequence alignment
Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene
More informationBiol478/ August
Biol478/595 29 August # Day Inst. Topic Hwk Reading August 1 M 25 MG Introduction 2 W 27 MG Sequences and Evolution Handouts 3 F 29 MG Sequences and Evolution September M 1 Labor Day 4 W 3 MG Database
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationTheory of Evolution Charles Darwin
Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties
More informationPhylogeny and the Tree of Life
Chapter 26 Phylogeny and the Tree of Life PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from
More informationExample of Function Prediction
Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little
More informationSCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology
SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION Using Anatomy, Embryology, Biochemistry, and Paleontology Scientific Fields Different fields of science have contributed evidence for the theory of
More informationAdditive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.
Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then
More informationUsing Bioinformatics to Study Evolutionary Relationships Instructions
3 Using Bioinformatics to Study Evolutionary Relationships Instructions Student Researcher Background: Making and Using Multiple Sequence Alignments One of the primary tasks of genetic researchers is comparing
More informationBiological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor
Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms
More informationPage 1. Evolutionary Trees. Why build evolutionary tree? Outline
Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny
More informationPhylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center
Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationEvolution by duplication
6.095/6.895 - Computational Biology: Genomes, Networks, Evolution Lecture 18 Nov 10, 2005 Evolution by duplication Somewhere, something went wrong Challenges in Computational Biology 4 Genome Assembly
More informationGene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family
Review: Gene Families Gene Families part 2 03 327/727 Lecture 8 What is a Case study: ian globin genes Gene trees and how they differ from species trees Homology, orthology, and paralogy Last tuesday 1
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More information