Fast Hash-Based Algorithms for Analyzing Tens of Thousands of Evolutionary Trees
- Baldwin Matthews
- 5 years ago
1 Fast Hash-Based Algorithms for Analyzing Tens of Thousands of Evolutionary Trees Tiffani L. Williams Department of Computer Science & Engineering Texas A&M University
2 What is an Evolutionary (or Phylogenetic) Tree? Evolutionary relationships between organisms (taxa) are depicted in a family-tree structure called a phylogenetic tree. Popular techniques often return tens to hundreds of thousands of phylogenetic trees that represent equally plausible hypotheses for how the taxa evolved from a common ancestor. Figure: (a) rooted tree, (b) unrooted tree.
3 Where do large tree collections come from? 33,306 trees on 567 taxa of flowering plants (U. of Florida); 90,000 trees on 264 taxa of fish (Texas A&M); 150,000 trees on 525 taxa of insects (Texas A&M). We need computational approaches for analyzing these large tree collections, especially as the size of phylogenetic studies continues to increase.
4 Our Hash-Based Algorithms for Analyzing Large-Scale Tree Collections Let t represent the number of phylogenetic trees of interest. HashRF: computes a t × t Robinson-Foulds (RF) distance matrix. MrsRF: a multi-core version of HashRF (using MapReduce). HashCS: computes a majority or strict consensus tree. TreeZip: compresses t trees into a smaller representation. Figure: (c) MDS, (d) heatmap.
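To make the HashRF idea concrete, here is a minimal Python sketch, assuming each tree has already been reduced to a set of hashable bipartitions (for example, bitstrings). It hashes every bipartition to the list of trees that contain it, counts the bipartitions shared by each pair of trees, and reports the RF distance as the size of the symmetric difference, |B_i| + |B_j| - 2·shared(i, j). The function name `rf_matrix` is hypothetical, and some definitions halve this count; this illustrates the hashing idea and is not the HashRF implementation.

```python
from collections import defaultdict
from itertools import combinations


def rf_matrix(trees):
    """All-pairs RF distances for a list of bipartition sets (one set per tree)."""
    # Hash each bipartition to the IDs of the trees that contain it.
    table = defaultdict(list)
    for tree_id, bips in enumerate(trees):
        for b in bips:
            table[b].append(tree_id)

    # Every pair of trees listed under the same bipartition shares it.
    t = len(trees)
    shared = [[0] * t for _ in range(t)]
    for tree_ids in table.values():
        for i, j in combinations(tree_ids, 2):
            shared[i][j] += 1
            shared[j][i] += 1

    # Unnormalized RF distance = size of the symmetric difference.
    return [[0 if i == j else len(trees[i]) + len(trees[j]) - 2 * shared[i][j]
             for j in range(t)]
            for i in range(t)]


trees = [{"00111", "00011"}, {"00111", "00110"}, {"00111", "00011"}]
print(rf_matrix(trees))  # prints [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
```

The pairwise loop only visits trees that actually share a bipartition, which is what makes the hash table useful when many trees share most of their bipartitions.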
5 Why Do We Need to Compress Trees? Large collections of trees can be expensive to store and transfer, and they are only continuing to grow in size. Ideally, we should be able to share tree collections quickly and at little or no cost. The most convenient way to share trees seems to be via email. Most of the tree collections we have received have been emailed. However, large collections either had to be broken into smaller pieces before being sent to us or had to be hand-delivered to our lab. When compressed with our approach, these large collections can now be sent via email.
6 Our Solution: TreeZip
7 The Newick File Format The Newick file format is the most widely used format for storing phylogenetic trees in a file. In Newick format, a phylogenetic tree is represented using a notation based on balanced parentheses. The topology of a phylogenetic tree is uniquely defined by its set of bipartitions, and TreeZip represents these bipartitions internally as bitstrings.
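The sketch below shows one way to turn a Newick string into bipartition bitstrings, assuming no branch lengths, no whitespace inside taxon names, and unique taxon names. Bit i refers to the i-th taxon in sorted order, and each bipartition is stored as the side that does not contain the first taxon, so both sides of a split map to the same bitstring. The helper names `parse_newick` and `bipartitions` are hypothetical; TreeZip's own bitstring layout may differ.

```python
def parse_newick(newick):
    """Parse a Newick string (no branch lengths) into nested lists of taxon names."""
    s = "".join(newick.split()).rstrip(";")  # drop whitespace and the trailing ';'
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume '('
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ','
                children.append(node())
            pos += 1                      # consume ')'
            while pos < len(s) and s[pos] not in ",)":
                pos += 1                  # skip an optional internal-node label
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",)":
            pos += 1
        return s[start:pos]               # leaf: taxon name

    return node()


def bipartitions(newick):
    """Return the set of non-trivial bipartitions of the tree as bitstrings."""
    tree = parse_newick(newick)
    taxa = []

    def collect(subtree):
        if isinstance(subtree, str):
            taxa.append(subtree)
        else:
            for child in subtree:
                collect(child)

    collect(tree)
    taxa.sort()                           # bit i <-> taxa[i], independent of input order
    index = {name: i for i, name in enumerate(taxa)}
    n = len(taxa)
    full = (1 << n) - 1
    found = set()

    def walk(subtree):
        """Return the bitmask of taxa below this node, recording its split."""
        if isinstance(subtree, str):
            return 1 << index[subtree]
        mask = 0
        for child in subtree:
            mask |= walk(child)
        size = bin(mask).count("1")
        if 2 <= size <= n - 2:                         # skip trivial splits
            canon = full ^ mask if mask & 1 else mask  # side without taxa[0]
            found.add(format(canon, f"0{n}b")[::-1])   # character i <-> taxa[i]
        return mask

    walk(tree)
    return found


print(sorted(bipartitions("((A,B),C,(D,E));")))  # prints ['00011', '00111']
```

Because taxa are sorted before bits are assigned, reordering the taxa inside the Newick string does not change the resulting bitstrings.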
8 Compression: Extracting Bipartitions from Trees (e) Newick file (f) Hash table Figure: During compression, TreeZip parses the input Newick file, extracting all bipartitions and inserting them into a hash table.
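A minimal sketch of this step, assuming each tree in the file has already been converted to a set of bipartition bitstrings (for example, by parsing each Newick line). Each unique bipartition becomes one hash-table key, and the value records the IDs of the trees that contain it. The function name `build_bipartition_table` is hypothetical, and a Python dict stands in for TreeZip's own hash table.

```python
from collections import defaultdict


def build_bipartition_table(tree_bipartitions):
    """Map each unique bipartition bitstring to the IDs of the trees that contain it.

    tree_bipartitions: list where entry k is the set of bitstrings of tree k.
    """
    table = defaultdict(list)
    for tree_id, bips in enumerate(tree_bipartitions):
        for bits in bips:
            table[bits].append(tree_id)  # tree IDs are appended in increasing order
    return dict(table)


table = build_bipartition_table([{"00111", "00011"}, {"00111", "00110"}, {"00111", "00011"}])
print(table["00111"])  # prints [0, 1, 2]
```

A bipartition shared by many trees is stored once, with only its list of tree IDs growing, which is the redundancy the later compression steps exploit.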
9 Compression: Generating the .trz File (a) Hash table (b) Shared table Figure: Once the hash table is populated, its contents are loaded into a ragged array structure called the shared table.
10 Compression: Generating the .trz File (a) Shared table (b) Compact table Figure: Each line of the shared table is compacted by storing whichever is shorter: the original line or its complement.
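One way to read this compaction rule, sketched below, assumes each shared-table line lists the IDs of the trees that contain a bipartition. A bipartition found in most trees is cheaper to describe by the trees that lack it, so the sketch stores the shorter list together with a flag recording which form was chosen. The flag and the function names `compact_row` and `expand_row` are our assumptions, not TreeZip's file format.

```python
def compact_row(tree_ids, t):
    """Return (is_complement, ids): the shorter of the row and its complement.

    tree_ids: IDs of trees containing the bipartition; t: total number of trees.
    """
    present = set(tree_ids)
    absent = [i for i in range(t) if i not in present]
    if len(absent) < len(present):
        return True, absent          # store the trees that LACK the bipartition
    return False, sorted(present)    # store the trees that HAVE it (ties keep original)


def expand_row(is_complement, ids, t):
    """Invert compact_row: recover the sorted IDs of trees containing the bipartition."""
    if not is_complement:
        return list(ids)
    skip = set(ids)
    return [i for i in range(t) if i not in skip]


print(compact_row([0, 1, 2, 4], 5))  # prints (True, [3])
```

A bipartition present in all t trees compacts to an empty complement list, so the most common case costs almost nothing.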
11 Compression: Generating the .trz File (a) Compact table (b) TreeZip (.trz) file Figure: The compact representation of the shared table is run-length encoded and forms the heart of the .trz file.
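The slides do not spell out the exact .trz layout, so the following is only a generic run-length encoder for a 0/1 string, such as a bipartition bitstring or a tree-membership bit vector. Runs alternate starting with a run of 0s, which is given length 0 when the string starts with 1. The names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(bits):
    """Encode a '0'/'1' string as alternating run lengths, starting with a 0-run."""
    runs = [0] if bits.startswith("1") else []    # empty leading 0-run if needed
    runs += [len(list(group)) for _, group in groupby(bits)]
    return runs


def rle_decode(runs):
    """Even positions are runs of '0', odd positions are runs of '1'."""
    return "".join(("0" if k % 2 == 0 else "1") * n for k, n in enumerate(runs))


print(rle_encode("0001111100"))  # prints [3, 5, 2]
```

Fixing the first run to be 0s means the bit values never need to be stored, only the run lengths.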
12 Decompression (a) Bipartition collector (b) Newick file Figure: During decompression, the bipartition data stored in the .trz file are extracted and used to rebuild the Newick file.
13 Experimental Methodology: Biological Tree Collections Table columns: Dataset, Taxa, Trees, File size (MB), Bipartitions. Datasets: 1 mammals (16 taxa), 2 freshwater, 3 angiosperms, 4 fish, 5 insects.
14 Experimental Methodology: Platform and Performance Metric System: 2.5 GHz Intel Core 2 quad-core machine with 4 GB of RAM running Ubuntu Linux. Performance measure: space savings S is the reduction in size relative to the uncompressed size, $S = 1 - \frac{\text{compressed file size}}{\text{original file size}}$. Our implementation of TreeZip can be found at
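As a small worked example of the space-savings metric, the helper below computes S from two file sizes given in the same unit; the function name `space_savings` and the sizes are illustrative only. A hypothetical 200 MB Newick file compressed to 10 MB gives S = 1 - 10/200 = 0.95, that is, 95% space savings.

```python
def space_savings(compressed_size, original_size):
    """S = 1 - compressed/original; both sizes must use the same unit."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 1.0 - compressed_size / original_size


print(f"{space_savings(10, 200):.2%}")  # hypothetical sizes in MB; prints 95.00%
```

Higher S is better; S = 0 means no reduction at all.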
15 TreeZip Results
16 TreeZip Running Time: Compression + Decompression Figure: Total time (s) for compression + decompression with 7zip, TreeZip, and TreeZip+7zip on the fish, angiosperms, freshwater, mammals, and insects data sets.
17 Using Different, but Equivalent Newick Strings There are $O(2^{n-1})$ Newick strings for a tree with n taxa. The designers of TASPI note that their algorithm is affected by the ordering of the taxa in the Newick string, which can result in a larger compressed file. TreeZip, however, is not affected by the ordering of taxa in a Newick string.
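To illustrate both claims, the sketch below enumerates every Newick string obtainable by swapping the two children at each internal node of a small rooted binary tree (a tree with n taxa has n - 1 internal nodes, hence 2^(n-1) orderings) and checks that all of them yield the same set of clusters. Using clusters rather than TreeZip's bitstrings, and the names `variants` and `clusters`, are our assumptions for this illustration; unrooted trees admit further strings from different rootings.

```python
from itertools import product


def variants(tree):
    """Yield every Newick body (no ';') obtained by reordering children.

    tree: a taxon name (str) or a 2-tuple (left, right) of subtrees.
    """
    if isinstance(tree, str):
        yield tree
        return
    left, right = tree
    for a, b in product(list(variants(left)), list(variants(right))):
        yield f"({a},{b})"
        yield f"({b},{a})"


def clusters(newick):
    """Return the clusters (taxa below each internal node) of a label-free Newick string."""
    stack, found, name = [[]], set(), ""
    for ch in newick.rstrip(";"):
        if ch == "(":
            stack.append([])             # open a new internal node
        elif ch in ",)":
            if name:                     # finish the pending leaf name
                stack[-1].append(frozenset([name]))
                name = ""
            if ch == ")":                # close the node: union of its children
                cluster = frozenset().union(*stack.pop())
                found.add(cluster)
                stack[-1].append(cluster)
        else:
            name += ch
    return found


tree = (("A", "B"), ("C", ("D", "E")))   # 5 taxa, 4 internal nodes
strings = {body + ";" for body in variants(tree)}
print(len(strings))                                    # prints 16, i.e. 2**(5-1)
print(len({frozenset(clusters(s)) for s in strings}))  # prints 1
```

Any method keyed on the set of clusters or bipartitions, rather than on the raw string, sees all 16 strings as the same tree.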
18 TreeZip Results: Using Different, but Equivalent Newick Strings (Collections 1–5) Figure: Different Newick file / original Newick file for 7zip, TreeZip, and TreeZip+7zip on the fish, angiosperms, freshwater, mammals, and insects data sets.
19 Conclusions & Future Work Compression algorithms such as TreeZip will become critical tools for helping biologists manage their rapidly expanding phylogenetic tree collections. TreeZip allows large phylogenetic tree collections to be easily exchanged with others, which is essential for successful scientific collaboration. When compressed with TreeZip+7zip, our largest dataset (434 MB) shrinks to 32 KB, small enough to be easily sent over email! In the future, TreeZip will be optimized for speed and will support branch lengths.
20 Thanks for Listening! A big thank you to: Ph.D. students Suzanne Matthews and Seung-Jin Sul. Additional input from Grant Brammer, Charles Lively, Ralph Crosby, and Brian Davis. This project was supported by the NSF under grants DEB and IIS
21 Backup Slides
22 How Encoding Works
23 Decompression: Extracting Bipartitions from the .trz File (a) TreeZip file (.trz) (b) Bipartition collector Figure: During decompression, the bipartition data stored in the .trz file are loaded into the bipartition collector.
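A minimal sketch of filling the bipartition collector, assuming the compact table has already been decoded into rows of (bitstring, is_complement, tree_ids), where is_complement marks rows that list the trees lacking the bipartition rather than the trees containing it. The row layout and the name `collect_bipartitions` are hypothetical.

```python
def collect_bipartitions(rows, t):
    """Invert decoded table rows into one list of bipartition bitstrings per tree.

    rows: iterable of (bitstring, is_complement, tree_ids); t: number of trees.
    """
    collector = [[] for _ in range(t)]
    for bits, is_complement, tree_ids in rows:
        listed = set(tree_ids)
        for tree_id in range(t):
            # Plain row: tree has the bipartition iff listed.
            # Complemented row: tree has it iff NOT listed.
            if (tree_id in listed) != is_complement:
                collector[tree_id].append(bits)
    return collector


rows = [("00111", True, []), ("00011", False, [0, 2]), ("00110", False, [1])]
print(collect_bipartitions(rows, 3))
# prints [['00111', '00011'], ['00111', '00110'], ['00111', '00011']]
```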
24 Decompression: Rebuilding Trees (a) Bipartition Collector (b) Newick File Figure: The bipartition collector is then used to rebuild the phylogenetic trees and output the corresponding Newick file.
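Rebuilding a tree from its bipartitions can be sketched as follows, assuming bit i refers to taxa[i] and each bitstring marks the side of the split that does not contain taxa[0]. Rooting at taxa[0] turns every bipartition into a cluster; each cluster becomes a child of the smallest larger cluster that contains it, and taxa not covered by any child are attached as leaves. The function name `build_newick` is hypothetical, and the sketch assumes the bipartitions are pairwise compatible, as they are for any single tree.

```python
def build_newick(taxa, bipartitions):
    """Rebuild an unrooted tree (Newick, no branch lengths) from bipartition bitstrings."""
    n = len(taxa)
    rest = frozenset(range(1, n))                  # every taxon except taxa[0]
    clusters = {frozenset(i for i, b in enumerate(bits) if b == "1")
                for bits in bipartitions}
    clusters.add(rest)
    ordered = sorted(clusters, key=len, reverse=True)  # supersets come first
    children = {c: [] for c in ordered}
    for k, c in enumerate(ordered):
        if c == rest:
            continue
        # Parent = smallest already-placed cluster that strictly contains c.
        parent = min((p for p in ordered[:k] if c < p), key=len)
        children[parent].append(c)

    def render(c):
        covered = set().union(*children[c])
        items = [(min(ch), render(ch)) for ch in children[c]]
        items += [(i, taxa[i]) for i in c - covered]   # leaves attached directly
        return "(" + ",".join(s for _, s in sorted(items)) + ")"

    # Attach taxa[0] at the top level: "(" + "A," + "B,...)" + ";"
    return "(" + taxa[0] + "," + render(rest)[1:] + ";"


print(build_newick(["A", "B", "C", "D", "E"], {"00111", "00011"}))
# prints (A,B,(C,(D,E)));
```

Children are emitted in order of their smallest taxon index, so the same bipartition set always produces the same Newick string.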
25 TreeZip and TASPI Results (Collections 6–14): Full Results Figure: Compression ratio (%) of gzip, bz2, 7zip, TreeZip, TreeZip+gzip, TreeZip+bz2, TreeZip+7zip, TASPI, and TASPI+bz2 on the lipsc439, john921, eern476, aster328, will2000, three567, rbcl500, ocho854, and mari2594 data sets. TreeZip achieves a better (lower) compression ratio than TASPI on all these sets.
26 TreeZip Results (Collections 1–5): Full Results Figure: Compression ratio (%) of gzip, bz2, 7zip, TreeZip, TreeZip+gzip, TreeZip+bz2, and TreeZip+7zip on the fish, angiosperms, freshwater, mammals, and insects data sets.
27 TreeZip Results: Using Different, but Equivalent Newick Strings (Collections 1–5): Full Results Figure: Different Newick file / original Newick file for gzip, bz2, 7zip, TreeZip, TreeZip+gzip, TreeZip+bz2, and TreeZip+7zip on the fish, angiosperms, freshwater, mammals, and insects data sets.
28 TreeZip Running Time: Compression + Decompression: Full Results Figure: Total time (s) for gzip, bz2, 7zip, TreeZip, TreeZip+gzip, TreeZip+bz2, and TreeZip+7zip on the fish, angiosperms, freshwater, mammals, and insects data sets.
29 TreeZip Running Time: Compression (Full Results) Figure: Compression time (s) for gzip, bz2, 7zip, TreeZip, TreeZip+gzip, TreeZip+bz2, and TreeZip+7zip on the fish, angiosperms, freshwater, mammals, and insects data sets.
30 TreeZip Running Time: Decompression (Full Results) Figure: Decompression time (s) for gzip, bz2, 7zip, TreeZip, TreeZip+gzip, TreeZip+bz2, and TreeZip+7zip on the fish, angiosperms, freshwater, mammals, and insects data sets.
More informationDraft document version 0.6; ClustalX version 2.1(PC), (Mac); NJplot version 2.3; 3/26/2012
Comparing DNA Sequences to Determine Evolutionary Relationships of Molluscs This activity serves as a supplement to the online activity Biodiversity and Evolutionary Trees: An Activity on Biological Classification
More information! Where are we on course map? ! What we did in lab last week. " How it relates to this week. ! Compression. " What is it, examples, classifications
Lecture #3 Compression! Where are we on course map?! What we did in lab last week " How it relates to this week! Compression " What is it, examples, classifications " Probability based compression # Huffman
More informationMultiple choice questions (1 pt each)
Ant1050 Exam 1 Spring 2012 Name: 1 Multiple choice questions (1 pt each) 1. Which of the following items of evidence supports the view that change occurs within species? a. polyploid hybridization in plants
More informationSupertree Algorithms for Ancestral Divergence Dates and Nested Taxa
Supertree Algorithms for Ancestral Divergence Dates and Nested Taxa Charles Semple 1, Philip Daniel 1, Wim Hordijk 1, Roderic D. M. Page 2, and Mike Steel 1 1 Biomathematics Research Centre, Department
More informationFundamentals of Computational Science
Fundamentals of Computational Science Dr. Hyrum D. Carroll August 23, 2016 Introductions Each student: Name Undergraduate school & major Masters & major Previous research (if any) Why Computational Science
More informationFast Hierarchical Clustering from the Baire Distance
Fast Hierarchical Clustering from the Baire Distance Pedro Contreras 1 and Fionn Murtagh 1,2 1 Department of Computer Science. Royal Holloway, University of London. 57 Egham Hill. Egham TW20 OEX, England.
More informationLecture 11 Friday, October 21, 2011
Lecture 11 Friday, October 21, 2011 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean system
More informationCOMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST
Big Idea 1 Evolution INVESTIGATION 3 COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST How can bioinformatics be used as a tool to determine evolutionary relationships and to
More informationALDEx: ANOVA-Like Differential Gene Expression Analysis of Single-Organism and Meta-RNA-Seq
ALDEx: ANOVA-Like Differential Gene Expression Analysis of Single-Organism and Meta-RNA-Seq Andrew Fernandes, Gregory Gloor, Jean Macklaim July 18, 212 1 Introduction This guide provides an overview of
More information