CONSTRUCTION OF PHYLOGENETIC TREE FROM MULTIPLE GENE TREES USING PRINCIPAL COMPONENT ANALYSIS
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET), Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14), Volume 5, Issue 12, December (2014), IAEME

Anu Sabarish R, Tessamma Thomas
Department of Electronics, Cochin University of Science and Technology, Cochin, India

ABSTRACT

A wide range of methods have been used to study molecular phylogeny and to discover the degree of relationship within a group of organisms. An approach to constructing a phylogenetic tree from multiple gene trees is presented here. Multiple gene trees corresponding to different proteins obtained from 15 placental mammals are created. A principal component analysis based method of inferring a phylogenetic tree from these multiple gene trees is described. The information thus gathered provides meaningful insights into the pattern and process of evolution and shows the recency of common ancestry. The results indicate that the species tree generated using the proposed method is more accurate and consistent than trees inferred from a single gene.

Keywords: Correlation, Dendrogram, Distance Matrix, Electron Ion Interaction Potential, Gene Tree, Genomic Signal Processing, Principal Component Analysis, Phylogenetic Tree, SVD.

1. INTRODUCTION

Genomic signal processing is an interdisciplinary field concerned with representing, analyzing and understanding biological sequences and images using improved computational techniques that are commonly applied to voice, image and video processing. It is becoming increasingly important for processing the vast amount of data generated by the wide range of new sequencing techniques.
Compared to the conventional approaches, traditional as well as new signal processing methods can play a significant role in processing this large volume of data. Various pattern recognition, data mining and machine learning algorithms have been developed for processing and understanding genomic data.

The genome contains the biological information needed to construct and maintain an organism. In most organisms, except a few viruses, the genome is made of deoxyribonucleic acid (DNA), which has a double helix structure [1] with two strands. Each strand is a linked chain of nucleotides whose nitrogenous bases are of four types: adenine (A), thymine (T), cytosine (C) and guanine (G). Three adjacent bases in a coding DNA sequence form a triplet called a codon, each of which represents an amino acid. A protein is a linear chain of amino acids; its coding sequence starts with the codon ATG, corresponding to the amino acid methionine, and ends with a stop codon. Protein sequences belonging to the same functional class in different organisms share a degree of sequence similarity that allows them to perform their common function. This similarity in structure and sequence can be attributed to the fact that they are derived phylogenetically from a common precursor, and that evolution appears to have exerted a considerable degree of conservatism towards functionally critical residues [2].

Phylogenetics is the study of evolutionary relationships among the different forms of life, both existing and extinct. In the past, different methods were used for phylogenetic analysis, including paleontological, morphological and embryological studies. Recently, owing to the development of advanced sequencing techniques, molecular phylogenetics is widely used. It involves the analysis of hereditary molecular differences, mainly in DNA sequences and amino acid sequences. The pattern of evolutionary relationship among different species is best illustrated by phylogenetic trees. A relationship topology based on the observation of divergence within a single homologous gene is called a gene tree. An internal node in a gene tree represents the divergence of an ancestral gene into alleles with different DNA sequences by mutation. A gene tree represents the evolutionary history of that gene, while a tree based on data from multiple genes is a species tree. An internal node in a species tree represents a speciation event. Species trees are not exactly the same as gene trees, because mutations and speciation events do not occur strictly at the same time, but gene trees are generally an accurate representation of species trees.

A Molecular Clock Hypothesis was proposed in [3], where the accumulation of amino acid changes was compared to the steady ticking of a clock and the substitution rates were expected to remain constant within a homologous protein over a long period of time. Thus the number of differences between two homologous proteins can be correlated with the amount of time since speciation caused them to diverge independently. This helps to decipher the phylogenetic relationship between different species and also the time of their divergence. As genomes evolve by gradual accumulation of mutations, the amount of difference in nucleotide sequence between a pair of genomes indicates the recency of their common ancestor. In [2] it is demonstrated that the number of variant residues gives an approximation of the evolutionary distance between two species and of their time of divergence. The evolutionary distances between all pairs of sequences, obtained from the corresponding nucleotide differences, are represented as the distance matrix.
A tree representation is then generated from this distance matrix using the Unweighted Pair Group Method with Arithmetic average (UPGMA) algorithm.

In this work, a computational technique based on direct protein sequence correlation, described in [4], is used to analyse the phylogenetic relationship between different species. From a group of organisms, a sample set of proteins is selected. For processing, the alphabetical amino acid sequence has to be converted into a numerical form. Individual gene trees are constructed from each of these sequences. A method based on principal component analysis is then proposed for combining the information gained from the individual gene trees to infer a phylogenetic tree. Using this method, phylogenetic trees for two sets of organisms are constructed and compared.

The paper is organized as follows. The method for representing protein sequence data in numerical form is given in section 2. Section 3 describes a method of constructing a phylogenetic tree using sequence correlation, the theory of principal component analysis and its application in inferring a phylogenetic tree from gene trees. Section 4 describes the details of the implementation and the results.

2. NUMERICAL REPRESENTATION OF AMINO ACID SEQUENCES

Most of the identified protein sequence data is freely available over the web in various online databases, one of which is the Entrez search and retrieval system of the National Center for Biotechnology Information [5]. The protein sequences obtained from these sources are usually in the form of a sequence of characters, each representing a distinct amino acid, which has to be converted into numerical form for further processing. Various methods have been used in the literature for numerical mapping [6] [7] [8] [9] [10] [11] [12].
A comparison of the informational capacity of various physicochemical, thermodynamic, structural and statistical parameters of amino acids is performed in [9], where it is shown that the Electron Ion Interaction Potential (EIIP) is the most suitable known amino acid property for structure-function analysis of proteins. The EIIP values for amino acids and nucleotides are calculated using the general model of pseudopotential described in [13], which gives the matrix element

$$ \langle k + q \,|\, w \,|\, k \rangle = [\text{right-hand side not recoverable from the source; it involves } \alpha,\ \beta,\ \mu,\ Z,\ Z_0 \text{ and } \sin(2\pi\beta\mu)] \tag{1} $$

where q is the change in momentum of the delocalized electron in the interaction with the potential w, Z is the atomic number, Z_0 is the atomic number of the inert element that begins the period which includes the actual Z in the standard periodic table, and

$$ \mu = \frac{q}{2K_F} \tag{2} $$

where q is a wave number and K_F the corresponding Fermi momentum. The parameter β is defined by

$$ \beta = [\text{expression not recoverable from the source; it involves } (E_F)_Z,\ \alpha,\ Z \text{ and } Z_0] \tag{3} $$

where (E_F)_Z is the corresponding Fermi energy.
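In practice the per-residue values are tabulated rather than recomputed, so encoding a protein reduces to a table lookup. The following is a minimal sketch of that lookup, not the authors' code. The function name `encode_protein` is hypothetical, and the table holds the EIIP values as they are commonly quoted in the literature; they should be checked against reference [14] before use.

```python
# EIIP values for the 20 standard amino acids as commonly tabulated.
# Assumption: verify these numbers against reference [14] before relying on them.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode_protein(sequence, table=EIIP):
    """Replace each one-letter amino acid code with its value from `table`."""
    residues = sequence.strip().upper()
    unknown = sorted(set(residues) - set(table))
    if unknown:
        # Ambiguity codes such as X or B have no entry; fail loudly instead
        # of silently dropping residues and shifting the signal.
        raise ValueError(f"no table value for residue(s): {unknown}")
    return [table[r] for r in residues]


# Hypothetical ten-residue fragment, used only to show the call.
signal = encode_protein("MGLSDGEWQL")
```

Raising on unknown letters is a design choice of this sketch; how ambiguous residues were handled in the original work is not stated.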
The EIIP values of the 20 amino acids that form the linear polypeptide chain of each protein sequence are obtained from [14]. A numerical sequence is obtained by substituting the EIIP value of each amino acid for the corresponding letter in the protein sequence. In this work the EIIP values are used to transform protein sequences into numerical form.

3. METHODOLOGY

3.1 Phylogenetic analysis using sequence correlation

In this method, the numerical form of an amino acid sequence is cross-correlated with other homologous sequences. The correlation function is a measure of similarity between two functions, normalized such that its magnitude never exceeds 1. The correlation coefficient R(i) is calculated as

$$ R(i) = \frac{\sum_{n=0}^{N-1} Y(n)\, X(n-i)}{\sqrt{\sum_{n=0}^{N-1} X^{2}(n) \, \sum_{n=0}^{N-1} Y^{2}(n)}} \tag{4} $$

where X(n) and Y(n) are the two sequences and i is the shift. The maximum value of R(i), denoted C_xy, is taken as the measure of similarity between the two sequences X and Y. The correlation C_xy is then converted to the corresponding distance parameter D_xy using the relation

$$ D_{xy} = 1 - C_{xy} \tag{5} $$

where $0 \le D_{xy} \le 1$. All pairwise sequence correlations are computed and the pairwise distances are calculated from them. The resulting distance matrix is used for gene tree construction with the UPGMA method.

3.2 Principal component analysis

Principal component analysis (PCA) is a dimensionality reduction technique which transforms high dimensional data to a lower dimensional subspace. PCA is an orthogonal transformation that transforms a data set with correlated variables into a smaller set of uncorrelated variables with less redundancy, while retaining most of the useful information. The first principal component (PC) represents the direction of highest variance; the second PC represents the direction that maximises the remaining variance in the subspace orthogonal to the first component.
This can be extended to as many PCs as are needed to represent the system adequately. PCA can be considered as a linear transformation P that transforms X into Y, where X is the original data set and Y is the new data set, the projection of X onto the principal components. Both X and Y are p x q matrices, where p is the number of variables and q is the number of observations or samples. To quantify the redundancy among the variables, a covariance matrix C_x is defined as

$$ C_x = \frac{1}{q-1}\, X X^{T} \tag{6} $$

where each row of X is taken to have zero mean. C_x is a p x p matrix whose diagonal elements are the variances of the variables and whose off-diagonal elements are the covariances between the variables. The optimized data set without redundancy must have a covariance matrix whose off-diagonal elements are all zero. Hence the transformation matrix P should be selected such that C_y is diagonal. One method is to calculate the eigenvectors of C_x and form a transformation matrix P with the eigenvectors as its rows. The eigenvectors are the PCs of the original data set X, and the significance of each PC is given by the corresponding eigenvalue. The first PC is the eigenvector with the largest eigenvalue, and so on.

The principal components can also be obtained by singular value decomposition (SVD). The SVD of a p x q matrix X is

$$ X = U \Sigma V^{T} \tag{7} $$

The singular vectors associated with the variable dimension are equivalent to the eigenvectors obtained from the covariance matrix C_x and are the principal components of X. For the p x q layout used here these are the columns of U; they appear as the columns of V when the decomposition is applied to X^T, that is, with observations in rows. The number of PCs extracted is the same as the number of variables in the original data. To achieve dimensionality reduction, only the first few meaningful components are retained.

3.3 Inferring phylogenetic tree from multiple gene trees using PCA

Multiple gene trees are generated using the method described above. Using the correlation method, a separate distance matrix D_i containing all pairwise distances is obtained for each of the N genes.
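The correlation method referred to here (Section 3.1, Eqs. 4 and 5) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, sequences of unequal length are allowed, and samples that fall outside a sequence at a given shift are treated as zero.

```python
import numpy as np


def correlation_distance(x, y):
    """Return D_xy = 1 - C_xy, where C_xy is the peak of the normalised
    cross-correlation R(i) taken over all shifts i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if norm == 0.0:
        raise ValueError("each sequence needs at least one non-zero value")
    # mode="full" evaluates the sliding sum of products at every relative
    # shift; positions outside either sequence contribute zero (assumption).
    r = np.correlate(y, x, mode="full") / norm
    # Clamp at zero so rounding error cannot produce a tiny negative distance.
    return max(0.0, 1.0 - float(r.max()))


def distance_matrix(signals):
    """Symmetric k-by-k matrix of pairwise correlation distances."""
    k = len(signals)
    d = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            d[a, b] = d[b, a] = correlation_distance(signals[a], signals[b])
    return d


# Toy numerical sequences (not real protein data): the second is the first
# shifted by one position, the third has a different pattern.
signals = [
    [0.1, 0.2, 0.3, 0.0],
    [0.0, 0.1, 0.2, 0.3],
    [0.3, 0.0, 0.3, 0.0],
]
d = distance_matrix(signals)
```

Because the peak is taken over all shifts, a sequence and a shifted copy of it are assigned a distance of essentially zero, which is the behaviour the maximum of R(i) is meant to provide.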
The distance matrix for the i-th gene has the form

$$ D_i = \begin{bmatrix} D_{11} & D_{12} & \cdots & D_{1k} \\ D_{21} & D_{22} & \cdots & D_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ D_{k1} & D_{k2} & \cdots & D_{kk} \end{bmatrix} \tag{8} $$

where i = 1, 2, 3, ..., N. Here D_i is the distance matrix for the i-th gene, N is the number of genes taken for analysis, k is the number of species under consideration, and D_xy is the distance between species x and y. Each distance matrix is converted into a one-dimensional distance vector d_i of length M,

$$ d_i = [\, d_{i1},\ d_{i2},\ \ldots,\ d_{iM} \,] \tag{9} $$

where i = 1, 2, 3, ..., N. There are N such vectors, one for each gene selected for analysis. A joint data matrix X of size N x M is then formed, with the distance vectors of length M as its rows:

$$ X = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1M} \\ \vdots & \vdots & & \vdots \\ d_{N1} & d_{N2} & \cdots & d_{NM} \end{bmatrix} \tag{10} $$

A principal component analysis of the data set X is performed using singular value decomposition to obtain the principal components of X, that is, the eigenvectors of its covariance matrix. The eigenvector with the highest eigenvalue is taken as the first principal component, PC_1. Only PC_1 is used here; the remaining components are not considered. Projecting the data X onto PC_1 gives the distance vector Y, which can be taken as the consensus vector of all the genes selected for the population under consideration:

$$ Y = PC_1^{T} X = [\, y_1,\ y_2,\ \ldots,\ y_M \,] \tag{11} $$

Y is a one-dimensional distance vector of the same form as d_i, of length M, which is converted back to a distance matrix of the form D_i. Finally, the phylogenetic tree based on the N different genes of the k populations is generated from this matrix using the UPGMA method.

4. IMPLEMENTATION AND RESULTS

4.1 Database

The amino acid sequences of the proteins are obtained from the National Center for Biotechnology Information (NCBI) website. In the first example, amino acid sequences of testin, myoglobin, lysozyme, caveolin-1 and cytochrome b from 11 primates are considered: Papio anubis (Baboon), Pan troglodytes (Chimpanzee), Hylobates agilis (Gibbon), Gorilla beringei (Gorilla), Homo sapiens (Human), Lepilemur mustelinus (Lemur), Callithrix jacchus (Marmoset), Aotus trivirgatus (Night monkey), Saimiri sciureus (Squirrel monkey), Macaca mulatta (Rhesus monkey) and Pongo abelii (Orangutan).
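With per-gene distance matrices for such a data set in hand, the consensus step of Section 3.3 can be sketched as below. Several details are assumptions of this sketch and are not stated in the source: each matrix is flattened to its upper triangle (so M = k(k-1)/2), rows are mean-centred before the SVD, the sign of PC_1 is chosen so that the gene weights sum to a positive value, and the uncentred data are projected so that the result stays on a distance-like scale. The function name `consensus_distances` and the toy data are hypothetical; UPGMA is obtained from SciPy's `linkage` with `method="average"`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def consensus_distances(gene_matrices):
    """Project N per-gene k-by-k distance matrices onto their first PC."""
    # One row per gene: the condensed (upper-triangle) distance vector.
    x = np.vstack([
        squareform(np.asarray(d, dtype=float), checks=False)
        for d in gene_matrices
    ])
    # Centre each gene (row) before the SVD, as in standard PCA (assumed).
    centred = x - x.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centred, full_matrices=False)
    pc1 = u[:, 0]
    # A singular vector is defined only up to sign; orient it so that the
    # gene weights sum to a positive value.
    if pc1.sum() < 0:
        pc1 = -pc1
    y = pc1 @ x  # consensus vector of length M
    return squareform(y, checks=False), pc1


# Toy data: three "genes" that share one distance pattern for four species
# but evolve at different rates (scales) with small perturbations.
base = np.array([0.10, 0.80, 0.90, 0.85, 0.80, 0.15])
scales = [1.0, 0.5, 1.5]
noise = np.array([
    [0.01, -0.02, 0.00, 0.02, -0.01, 0.00],
    [0.00, 0.01, -0.01, 0.00, 0.02, -0.02],
    [-0.02, 0.00, 0.01, -0.01, 0.00, 0.02],
])
genes = [squareform(s * base + e) for s, e in zip(scales, noise)]

consensus, weights = consensus_distances(genes)
# UPGMA is average linkage on the condensed consensus distances.
tree = linkage(squareform(consensus, checks=False), method="average")
groups = fcluster(tree, t=2, criterion="maxclust")
```

In this toy case the first two species and the last two species end up in separate clusters. If some genes receive weights of opposite sign, the projected values can become negative and would need separate handling before tree construction.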
In the second example, amino acid sequences of 10 proteins (testin, myoglobin, lysozyme, caveolin-1, caveolin-2, caveolin-3, cytochrome b, somatotropin, prolactin and flotillin) from 15 organisms under Eutheria are considered. The 15 species selected are Papio anubis (Baboon), Pan troglodytes (Chimpanzee), Gorilla beringei (Gorilla), Homo sapiens (Human), Callithrix jacchus (Marmoset), Saimiri sciureus (Squirrel monkey), Macaca mulatta (Rhesus monkey), Pongo abelii (Orangutan), Mus musculus (Mouse), Rattus norvegicus (Rat), Ovis aries (Sheep), Bos taurus (Cattle), Canis lupus familiaris (Dog), Felis catus (Cat) and Oryctolagus cuniculus (Rabbit).

4.2 Construction of phylogenetic tree from multiple gene trees

In this method phylogenies are inferred separately for each gene and the resulting gene trees are used to generate a consensus phylogeny. In the first example, the 11 primates mentioned above are considered. Five protein sequences of these organisms (testin, myoglobin, lysozyme, caveolin-1 and cytochrome b) are obtained in the form of character strings and converted into numerical form using the EIIP method. Using the steps described in section 3.1, five distance matrices D_i (i = 1, 2, ..., 5), one for each of the five genes, are generated. Separate gene trees are then constructed from the distance matrices by the UPGMA method. The structures of the gene trees obtained for the same set of organisms vary considerably from one another. Hence a consensus structure to represent the evolutionary pattern is needed. The consensus phylogenetic tree is obtained using the method described in section 3.3. Applying principal component analysis to the joint data matrix formed from the distance matrices achieves a variable reduction, and the resulting phylogenetic tree is shown in Fig. 1. This tree matches the taxonomic classification of the organisms involved very well and is more consistent than the individual gene trees.

In the second example, the sample set is widened to include more diverse organisms. Here 15 species belonging to Eutheria, or placental mammals, are selected, including organisms from different orders such as Primates, Lagomorpha, Rodentia, Carnivora and Artiodactyla. Ten different proteins from these organisms are used for analysis and a separate tree is generated for each of them. Here also the 10 trees vary considerably among themselves. The consensus phylogenetic tree obtained using PCA, shown in Fig. 2, matches the taxonomic classification accurately. The results illustrate that the proposed method of generating a consensus tree from multiple genes is more accurate and consistent than phylogenetic trees inferred from individual genes.

Figure 1: Phylogenetic tree of 11 primates constructed by combining the five gene trees using PCA.

Figure 2: Phylogenetic tree of 15 species belonging to Eutheria constructed by combining the 10 gene trees using PCA. Leaf labels as printed: Cat, Cattle, Sheep, Rabbit, Dog, Mouse, Rat, Squirrel monkey, Marmoset, Baboon, Rhesus monkey, Orangutan, Human, Gorilla, Chimpanzee.

5. CONCLUSION

An approach to generating a consensus phylogeny from multiple genes using principal component analysis is presented. Principal component analysis is employed to remove the unimportant variability in the evolutionary space. In this method phylogenetic trees are generated separately for individual genes using direct protein sequence correlation analysis.
The individual phylogeny information is then combined and a consensus phylogenetic tree is generated with the help of principal component analysis. The proposed method can give better results in inferring phylogenetic relationships when as many individual protein families as possible are used to identify the common phylogenetic pattern prevailing in them. A comparison of the topology of the consensus tree with that of the individual trees indicates that a phylogenetic tree based on multiple genes is more consistent than one inferred from a single gene. This is because the rate of change varies widely from one gene to another: some genes remain almost constant among the species under consideration, while others vary considerably, and these different rates of change can lead to inconsistent relationship topologies when a single gene is used. The proposed approach can also be used to generate a consensus phylogenetic tree from a group of phylogenetic trees generated using different approaches.

REFERENCES

[1] J.D. Watson and F.H.C. Crick, A structure for DNA, Nature, Vol. 171.
[2] E. Margoliash, Primary structure and evolution of cytochrome c, Proceedings of the National Academy of Sciences of the USA, Vol. 50.
[3] E. Zuckerkandl and L.B. Pauling, Molecular disease, evolution and genetic heterogeneity, Horizons in Biochemistry, Academic Press, New York.
[4] A.R. Sabarish and T. Thomas, Molecular phylogeny analysis using correlation distance and spectral distance, Int. J. Data Mining and Bioinformatics, Vol. 10, No. 4.
[5] National Center for Biotechnology Information. [online] (Accessed 15 October 2013).
[6] K.H. Chu, J. Qi, Z.G. Yu and V. Anh, Origin and phylogeny of chloroplasts revealed by a simple correlation analysis of complete genomes, Molecular Biology and Evolution, Vol. 21, No. 1.
[7] I. Cosic and E. Pirogova, Application of ionization constant of amino acids for protein signal analysis within the resonant recognition model, Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 20, No. 2.
[8] J.A. Glazier, S. Raghavachari, C.L. Berthelsen and M.H. Skolnick, Reconstructing phylogeny from the multifractal spectrum of mitochondrial DNA, Physical Review Letters, Vol. 51, No. 3.
[9] L. Lazovic, Selection of amino acid parameters for Fourier transform-based analysis of proteins, CABIOS Communication, Vol. 12, No. 6.
[10] L. Marsella, F. Sirocco, A. Trovato, F. Seno and S.C.E. Tosatto, REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform, Bioinformatics, Vol. 25.
[11] A.R. Sabarish and T. Thomas, A frequency domain approach to protein sequence similarity analysis and functional classification, Signal & Image Processing: An International Journal, Vol. 2, No. 1, 36-49.
[12] P.P. Vaidyanathan, Genomics and proteomics: A signal processor's tour, IEEE Circuits and Systems Magazine, Vol. 4, 6-29.
[13] V. Veljkovic and I. Slavic, Simple General-Model Pseudopotential, Physical Review Letters, Vol. 29, No. 2.
[14] I. Cosic, Macromolecular Bioactivity: Is it resonant interaction between macromolecules? Theory and applications, IEEE Transactions on Biomedical Engineering, Vol. 41, No. 12.
[15] Shashikant S. Patil, Sachin A. Sonawane, Nischay Upadhyay and Aanchal Srivastava, Compressive Assessment of Bioinformatics in Biomedical Imaging and Image Processing, International Journal of Computer Engineering & Technology (IJCET), Volume 5, Issue 4, 2014.
Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony
More informationHomology. and. Information Gathering and Domain Annotation for Proteins
Homology and Information Gathering and Domain Annotation for Proteins Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology
More informationPhylogeny and the Tree of Life
Chapter 26 Phylogeny and the Tree of Life PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from
More informationName: Class: Date: ID: A
Class: _ Date: _ Ch 17 Practice test 1. A segment of DNA that stores genetic information is called a(n) a. amino acid. b. gene. c. protein. d. intron. 2. In which of the following processes does change
More informationCOMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST
Big Idea 1 Evolution INVESTIGATION 3 COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST How can bioinformatics be used as a tool to determine evolutionary relationships and to
More informationChapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships
Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic
More informationEmily Blanton Phylogeny Lab Report May 2009
Introduction It is suggested through scientific research that all living organisms are connected- that we all share a common ancestor and that, through time, we have all evolved from the same starting
More informationMolecular phylogeny How to infer phylogenetic trees using molecular sequences
Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues
More informationReproduction- passing genetic information to the next generation
166 166 Essential Question: How has biological evolution led to the diversity of life? B-5 Natural Selection Traits that make an organism more or less likely to survive in an environment and reproduce
More informationBioinformatics. Dept. of Computational Biology & Bioinformatics
Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS
More informationMolecular phylogeny How to infer phylogenetic trees using molecular sequences
Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues
More informationthebiotutor.com AS Biology Unit 2 Classification, Adaptation & Biodiversity
thebiotutor.com AS Biology Unit 2 Classification, Adaptation & Biodiversity 1 Classification and taxonomy Classification Phylogeny Taxonomy The process of sorting living things into groups. The study of
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationPlan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method
Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary
More informationPhylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University
Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X
More informationBIOINFORMATICS LAB AP BIOLOGY
BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to
More informationThe Theory of Evolution
Name Date Class CHAPTER 13 DIRECTED READING The Theory of Evolution Section 13-1: The Theory of Evolution by Natural Selection Darwin Proposed a Mechanism for Evolution Mark each statement below T if it
More informationOutline. Evolution: Speciation and More Evidence. Key Concepts: Evolution is a FACT. 1. Key concepts 2. Speciation 3. More evidence 4.
Evolution: Speciation and More Evidence Evolution is a FACT 1. Key concepts 2. Speciation 3. More evidence 4. Conclusions Outline Key Concepts: A species consist of one or more populations of individuals
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationModule: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment
Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand
More informationLecture 11 Friday, October 21, 2011
Lecture 11 Friday, October 21, 2011 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean system
More informationEvidence for Evolution
Evidence for Evolution 1. 2. 3. 4. 5. Paleontology Comparative Anatomy Embryology Comparative Biochemistry Geographical Distribution How old is everything? The History of Earth as a Clock Station 1: Paleontology
More informationPHYLOGENY AND SYSTEMATICS
AP BIOLOGY EVOLUTION/HEREDITY UNIT Unit 1 Part 11 Chapter 26 Activity #15 NAME DATE PERIOD PHYLOGENY AND SYSTEMATICS PHYLOGENY Evolutionary history of species or group of related species SYSTEMATICS Study
More informationMULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE
MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE Manmeet Kaur 1, Navneet Kaur Bawa 2 1 M-tech research scholar (CSE Dept) ACET, Manawala,Asr 2 Associate Professor (CSE Dept) ACET, Manawala,Asr
More informationTheory of Evolution Charles Darwin
Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties
More informationTree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More informationHUMAN EVOLUTION. Where did we come from?
HUMAN EVOLUTION Where did we come from? www.christs.cam.ac.uk/darwin200 Darwin & Human evolution Darwin was very aware of the implications his theory had for humans. He saw monkeys during the Beagle voyage
More informationCHAPTER 26 PHYLOGENY AND THE TREE OF LIFE Connecting Classification to Phylogeny
CHAPTER 26 PHYLOGENY AND THE TREE OF LIFE Connecting Classification to Phylogeny To trace phylogeny or the evolutionary history of life, biologists use evidence from paleontology, molecular data, comparative
More informationMaster Biomedizin ) UCSC & UniProt 2) Homology 3) MSA 4) Phylogeny. Pablo Mier
Master Biomedizin 2018 1) UCSC & UniProt 2) Homology 3) MSA 4) 1 12 a. All of the sequences in file1.fasta (https://cbdm.uni-mainz.de/mb18/) are homologs. How many groups of orthologs would you say there
More informationPhylogenetic inference
Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types
More information9/19/2012. Chapter 17 Organizing Life s Diversity. Early Systems of Classification
Section 1: The History of Classification Section 2: Modern Classification Section 3: Domains and Kingdoms Click on a lesson name to select. Early Systems of Classification Biologists use a system of classification
More informationThe Contribution of Bioinformatics to Evolutionary Thought
The Contribution of Bioinformatics to Evolutionary Thought A demonstration of the abilities of Entrez, BLAST, and UCSC s Genome Browser to provide information about common ancestry. American Scientific
More informationPhylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?
Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species
More information5/23/2017. XAVIER RECYCLES! Please place recyclable material in the common area recycling containers
YES: paper, cardboard, and empty plastic containers NO: cans, glass, liquid, garbage, food waste XAVIER RECYCLES! Please place recyclable material in the common area recycling containers Topics Biodiversity
More informationPhylogenetic Tree Generation using Different Scoring Methods
International Journal of Computer Applications (975 8887) Phylogenetic Tree Generation using Different Scoring Methods Rajbir Singh Associate Prof. & Head Department of IT LLRIET, Moga Sinapreet Kaur Student
More informationChapter 26 Phylogeny and the Tree of Life
Chapter 26 Phylogeny and the Tree of Life Biologists estimate that there are about 5 to 100 million species of organisms living on Earth today. Evidence from morphological, biochemical, and gene sequence
More informationWhat is Principal Component Analysis?
What is Principal Component Analysis? Principal component analysis (PCA) Reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables Retains most
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationMolecules consolidate the placental mammal tree.
Molecules consolidate the placental mammal tree. The morphological concensus mammal tree Two decades of molecular phylogeny Rooting the placental mammal tree Parallel adaptative radiations among placental
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties
More informationUsing phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)
Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationIntroduction to Bioinformatics. Shifra Ben-Dor Irit Orr
Introduction to Bioinformatics Shifra Ben-Dor Irit Orr Lecture Outline: Technical Course Items Introduction to Bioinformatics Introduction to Databases This week and next week What is bioinformatics? A
More informationGenomes and Their Evolution
Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from
More informationEvidence of EVOLUTION
Evidence of EVOLUTION Evolution: Genetic change in a population through time Charles Darwin On his journey around the world, Darwin found evidence of GRADUAL CHANGE (evolution) He cited evidences he found
More informationBioinformatics Report Branchiostoma lanceolatum dopamine D 1 / receptor protein phylogenetic analysis. Alanna Lewis
Bioinformatics Report Branchiostoma lanceolatum dopamine D 1 / receptor protein phylogenetic analysis Alanna Lewis 0 Abstract: Dopamine is an essential neurotransmitter for many species of chordates. The
More informationFUNDAMENTALS OF MOLECULAR EVOLUTION
FUNDAMENTALS OF MOLECULAR EVOLUTION Second Edition Dan Graur TELAVIV UNIVERSITY Wen-Hsiung Li UNIVERSITY OF CHICAGO SINAUER ASSOCIATES, INC., Publishers Sunderland, Massachusetts Contents Preface xiii
More informationAn Evolutionary Trend Discovery Algorithm Based on Cubic Spline Interpolation
An Evolutionary Trend Discovery Algorithm Based on Cubic Spline Interpolation ZHIQIANG LI and PETER Z. REVESZ Department of Computer Science and Engineering University of Nebraska-Lincoln Lincoln, NE 68588-0115
More informationConcept Modern Taxonomy reflects evolutionary history.
Concept 15.4 Modern Taxonomy reflects evolutionary history. What is Taxonomy: identification, naming, and classification of species. Common Names: can cause confusion - May refer to several species (ex.
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationPhylogenetic analysis. Characters
Typical steps: Phylogenetic analysis Selection of taxa. Selection of characters. Construction of data matrix: character coding. Estimating the best-fitting tree (model) from the data matrix: phylogenetic
More informationProtein Structure Prediction Using Multiple Artificial Neural Network Classifier *
Protein Structure Prediction Using Multiple Artificial Neural Network Classifier * Hemashree Bordoloi and Kandarpa Kumar Sarma Abstract. Protein secondary structure prediction is the method of extracting
More informationConcepts and Methods in Molecular Divergence Time Estimation
Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks
More informationLetter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons
Letter to the Editor Slow Rates of Molecular Evolution Temperature Hypotheses in Birds and the Metabolic Rate and Body David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons *Department
More informationObjective 3.01 (DNA, RNA and Protein Synthesis)
Objective 3.01 (DNA, RNA and Protein Synthesis) DNA Structure o Discovered by Watson and Crick o Double-stranded o Shape is a double helix (twisted ladder) o Made of chains of nucleotides: o Has four types
More informationAlgorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,
Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837
More informationMolecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016
Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,
More informationResearch Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.
Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research
More information