MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Similar documents
IMPLEMENTING HIERARCHICAL CLUSTERING METHOD FOR MULTIPLE SEQUENCE ALIGNMENT AND PHYLOGENETIC TREE CONSTRUCTION

Phylogenetic Tree Generation using Different Scoring Methods

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Dr. Amira A. AL-Hosary

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Copyright 2000 N. AYDIN. All rights reserved. 1

Phylogenetic inference

EVOLUTIONARY DISTANCES

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Quantifying sequence similarity

Bioinformatics. Dept. of Computational Biology & Bioinformatics

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

BIOINFORMATICS: An Introduction

Sequence analysis and Genomics

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Phylogenetic Tree Reconstruction

Sequencing alignment Ameer Effat M. Elfarash

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Constructing Evolutionary/Phylogenetic Trees

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Warm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab

Introduction to Bioinformatics Online Course: IBT

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.


Introduction to Bioinformatics Introduction to Bioinformatics

Phylogenetic analyses. Kirsi Kostamo

A Phylogenetic Network Construction due to Constrained Recombination

Single alignment: Substitution Matrix. 16 march 2017

Effects of Gap Open and Gap Extension Penalties

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Sequencing alignment Ameer Effat M. Elfarash

Cladistics and Bioinformatics Questions 2013

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Phylogeny Tree Algorithms

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Multiple Sequence Alignment. Sequences

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Sequence Analysis '17- lecture 8. Multiple sequence alignment

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Comparing whole genomes

Bioinformatics Chapter 1. Introduction

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Tools and Algorithms in Bioinformatics

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Tree Building Activity

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Constructing Evolutionary/Phylogenetic Trees

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Computational Biology: Basics & Interesting Problems

Multiple Sequence Alignment

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Algorithms in Bioinformatics

Phylogeny: building the tree of life

Phylogenetics: Building Phylogenetic Trees

Lecture Notes: Markov chains

How to read and make phylogenetic trees Zuzana Starostová

Molecular evolution 2. Please sit in row K or forward

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

A (short) introduction to phylogenetics

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Week 10: Homology Modelling (II) - HHpred

Evolutionary Tree Analysis. Overview

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Using Bioinformatics to Study Evolutionary Relationships Instructions

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

An ant colony algorithm for multiple sequence alignment in bioinformatics

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Introduction to Bioinformatics

Motivating the need for optimal sequence alignments...

SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Multiple sequence alignment

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

Chapter 19: Taxonomy, Systematics, and Phylogeny

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Transcription:

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE Manmeet Kaur 1, Navneet Kaur Bawa 2 1 M-tech research scholar (CSE Dept) ACET, Manawala,Asr 2 Associate Professor (CSE Dept) ACET, Manawala,Asr ABSTRACT The bioinformatics research has accumulated a large amount of data. The cost of storing is decreasing as the hardware technology is advancing. Since more and more data is added in the field of proteomics, there is a great need for computational methods to be really efficient. The biological data has significant complexity due to the presence of data in different formats and to derive knowledge from these complex databases is the key problem of this era. Machine learning and data mining techniques are required which can scale to the size of the problems and can be made to suit and hence applied to biology. To a molecule, the part of molecular sequence is functionally very important and which is resistant to change. So comparative approaches are used in order to ensure the reliability of sequence alignment. The problem of multiple sequence alignment is a part of evolutionary history. The different pairwise methods are described in this present work. For each individual sequence position, an explicit homologous correspondence is established, aligning species column wise. A method is introduced for aligning sequences based on local alignment with consensus sequence. Medicago Sativa varities are loaded from NCBI databank and phylogenetic trees are constructed for divided parts of dataset. Keywords: Local alignment, Multiple sequence alignment, NCBI databank, Phylogenetic tree 1. INTRODUCTION The management of biological information by applying computer technology is known as Bioinformatics. The science of using and developing algorithms and computer databases and analysing biological information utilizing computer and statistical techniques to enhance and accelerate biological research. It is the tool for analysing biological data and hence is more of a tool than a discipline. In bioinformatics, the way in which primary sequences of DNA, RNA, or protein are arranged to identify regions of similarity that may be a result of structural, functional, or evolutionary relationships among the sequences is called multiple sequence alignment. Aligned sequences of amino acid residues and nucleotides are generally represented as rows in a matrix. In order to align the residues with similar or identical characters, gaps are inserted between the residues in further columns. If two sequences in an alignment share a common ancestor and gaps as insertion and deletion mutations that is indels introduced in lineages, then from that mismatches are identified. In DNA sequence alignment, the amount of similarity among amino acids occupying a particular position in the sequence are represented roughly as a measure of how conserved a region is between various lineages. The substitutions of nucleotides in which the chains have similar biochemical properties tells that the region has functional or structural significance. 2. METHODOLOGY JUKES CANTOR METHOD The model used for the computation of substitution from one state to another is the jukes cantor model. This model helps in calculating the evolutionary distance between two species.the formula is as follows: D = -3/4 ln (1-4/3*Nd/N) Here Nd = number of different nucleotides among two sequences. N = Nucleotide length. ALGORITHM The jukes cantor method is used to calculate distance and on the basis of these distances, the NJ method constructs phylogenetic tree. Volume 5, Issue 6, June 2016 Page 126

STEPS OF ALGORITHM: Load medicago sativa sequences from NCBI databank. Use jukes cantor method to calculate distance. On the basis of JC distance, create the first matrix for various species. By examining the original matrix,get the smallest distance. Half of the smallest distance gives branch length from the first node to each of those two species. With the values for new node and the unpicked species; a new and reduced matrix is created. Analyse the reduced matrix by finding the smallest distance. The distance calculation and matrix reduction steps are continued if the species are unpicked one s, so on till all species are picked. Now the first tree is constructed. The phylogenetic tree is constructed using NJ method 3. RELATED WORK The research work involves Model for Phylogenetic Tree Construction. The available literature has been reviewed in this context. Manmeet Kaur, et. al. (2016): The construction of phylogenetic tree is complex yet very important in the field of bioinformatics. A phylogenetic or evolutionary tree once constructed can help get insight into the evolution of different species. But there is computational complexity as the problem grows into large number of species that cannot be easily solved. So,new methods have been researched that are applied to phylogenetic tree construction and have provided some promising results. Various algorithms have been reviewed and are discovered by studying the patterns of nature. Geetika Munjal,et. al. (2015): specified sequence analysis which includes the alignment based and alignment free methods of tree generation are reviewed and these find distance/similarity among the sequences of different species. Alignment free method based on tuple count and set theory is proposed and the results are compared with the guide tree obtained using alignment based method. The proposed method is tested on DNA sequence of length below 1000bp (dataset1) and Sequence of length above 16000bp (dataset2). It achieves the similar performance as that of the alignment based method but without the alignment phase. Dega Ravi Kumar Yadav and Gunes Ercal (2015): performs multiple sequence alignment, an important way of which is the progressive alignment approach studied in this work.. Our contribution is in comparing two main methods of guide tree construction in terms of both efficiency and accuracy of the overall alignment: UPGMA and Neighbor Join methods. Our experimental results indicate that the Neighbor Join method is both more efficient in terms of performance and more accurate in terms of overall cost minimization. Baye Wodajo, et. al. (2015) : analysed Safflower, Carthamus tinctorius, L. an oilseed crop that belongs to the family Asteracea. The genus Carthamus is comprised of 25 species including the only cultivated species of Carthamus tinctorius. So far, the characterization of safflower using molecular markers has been limited. The objective of this study was to examine the cluster analysis of safflower accessions collected from different regions of Ethiopia using ISSR molecular markers. J.Jayapriya, et. al. (2015): This paper illustrates an enhanced algorithm based on one of the swarm intelligence techniques for constructing the Phylogenetic tree (PT), which is used to study the relationship between species. The main scheme is to formulate a PT, an NP- complete problem through an evolutionary algorithm called Artificial Bee Colony (ABC). The tradeoff between the accuracy and the computational time taken for constructing the tree makes way for new variants of algorithms. A new variant of ABC algorithm is proposed to promote the convergence rate of general ABC algorithm through recommending a new formula for searching solution. Prof. Baldeep Singh, et. al. (2015): discussed that Bioinformatics is a branch of biology science and information technology (Computational Technique) in the field of research and development. Phylogeny is the study of evolutionary relatedness amongst organisms based on genetic information codes. The genetic relationships between species can be represented using phylogenetic trees. To construct a phylogenetic tree is a very challenging problem. The main purpose of phylogenetic tree is to determine the structure of unknown sequence and to predict the genetic difference between different species. There are different methods for phylogenetic tree construction from character or distance data. There are different methods to compute distance which include the comparative distance from two sequences using computational methods. A method for construction of distance based phylogenetic tree using hierarchical clustering is Volume 5, Issue 6, June 2016 Page 127

proposed and implemented on different data sequences. The sequences are downloaded from NCBI databank. Evolutionary distances are calculated using computational methods. 4.RESULTS AND DISCUSSION Biological sequence for different species includes the accession no. and variety as inputs. Table: biological sequence. SERIAL NO. VARIETY ACCESSION NUMBER 1. Phytohemagglutinin gene L38479 2. Cinnayml-alcohol L46857 3. MAP-Kinase L07042 4. Phytoene-synthase KJ955630 5. Cultivar-chilean JL880881 5.EXPERIMENTAL ANALYSIS Multiple sequence alignment of five medicago sativa varities using consensus sequence. Cont. Multiple sequence alignment for five closely related medicago sativa varities. Volume 5, Issue 6, June 2016 Page 128

Cont. Multiple sequence alignment for five closely related medicago sativa varities. Cont. multiple sequence alignment for closely related medicago sativa varities. Distance calculations of medicago sativa varities using jukes cantor method. Volume 5, Issue 6, June 2016 Page 129

Construction of phylogenetic tree on the basis of jukes cantor distance 6.CONCLUSIONS Various heuristics are adopted to simplify the complexity of MSA, to finish the process of MSA each group of patterns is aligned. In order to reduce the complexity; the search space is limited to only the most conserved local portions of all the sequences involved. For aligning the DNA sequences of different medicago sativa varities, a model has been constructed. From the various available sequence formats, fasta format has been used and from different datasets available phylogenetic trees have been constructed. REFERENCES [1]. Manmeet kaur, Navneet kaur and Rajbir Singh(2016) Phylogenetic tree construction revision, International Journal of Advanced Research in Computer Science and Software Engineering Vol.6, pp. 517-520. [2]. Geetika Munjal, Madasu Hanmandlu and Deepti Gaur (2015) A New Alignment Free Method for Phylogenetic Tree Construction,International Journal of Database Theory and Application Vol.8, pp.111-124. [3]. Dega Ravi Kumar Yadav and Gunes Ercal (2015) a comparative analysis of progressive multiple sequence alignment approaches using upgma and neighbor join based guidetrees, International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 5, No.3/4. [4]. Baye Wodajo, Feysal Bushira Mustefa and Kassahun Tesfaye (2015) Clustering Analysis of Ethiopian Safflower (Carthamustinctorius) Using ISSR Markers,International Journal of Scientific and Research Publications, Vol 5. [5]. J. Jayapriya and Michael Arock (2015) Enhanced Bio-Inspired Algorithm For Constructing Phylogenetic Tree,ICTACT Journal On Soft Computing, Vol 06. [6]. Shaminder Kaur,Prof. Baldeep Singh and Prof. Tajinder Kaur (2015) Improved Computational Methods for Phylogenetic Tree Construction using Cluster Analysis, International Journal of Advanced Research in Computer Science and Software Engineering,Vol 5, pp. 378-385. AUTHOR Manmeet Kaur has received B-Tech degree from Punjab Technical University, Jalandhar in 2014 and is currently pursuing her M-Tech degree from PTU,Jal. Volume 5, Issue 6, June 2016 Page 130