# Tools and Algorithms in Bioinformatics

Size: px
Start display at page:

Transcription

1 Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology Core University of Nebraska Medical Center Basic Elements in Searching Biological Databases Sensitivity versus Specificity/selectivity Scoring Scheme, Gap penalties Distance/Substitution Matrices (PAM, BLOSSUM Series) Search Parameters (E-value, Bit score) Handling Data Quality Issues (Filtering, Clustering) Type of Algorithm (Smith-Waterman, Needleman-Wunsch) 1

2 Choice of the Searching Algorithm An ideal algorithm should have Good specificity and sensitivity Should be fast running Should not use too much memory Greedy algorithms are very sensitive, but very slow. Heuristic algorithms are relatively fast, but loose some sensitivity. It s always a challenge for a programmer to develop algorithms that fulfill both of these requirements Needleman-Wunsch algorithm (JMB 48:443-53, 1970) Very greedy algorithm, so very sensitive Implements Dynamic programming Provides global alignment between the two sequences Smith-Waterman algorithm (JMB 147:195-97, 1981) A set of heuristics were applied to the above algorithm to make it less greedy, so it is less sensitive but runs faster Implements Dynamic programming Provide local alignment between two sequences Both BLAST and FASTA use this algorithm with varying heuristics applied in each case 2

3 FASTA (FAST Algorithm) The fist step is application of heuristics and the second step is using dynamic programming First, the query sequence and the database sequence are cut into defined length words and a word matching is performed in all-to-all combinations Word size is 2 for proteins and 6 for nucleic acids If the initial score is above a threshold, the second score is computed by joining fragments and using gaps of less than some maximum length If this second score is above some threshold, Smith-Waterman alignment is performed within the regions of high identities (known as high-scoring pairs) Protein and nucleotide substitution matrices 3

4 BLAST (Basic Local Alignment Search Tool) The fist step is application of heuristics and the second step is using dynamic programming First, the query sequence and the database sequence are cut into defined length words and a word matching is performed in all combinations Words that score above a threshold are used to extend the word list 7 words Expanded list- 47 words Word Expanded List ql ql, qm, hl ln ln nf nf, af, ny, df, qf, ef, gf, hf, kf, sf, tf fs fs, fa, fn, fd, fg, fp, ft, ys gw gw, aw, rw, nw, dw, qw, ew, hw, iw, kw, mw, pw, sw, tw, vw BLAST continued... BLAST is a local alignment algorithm Several High Scoring Segments are found, with the maximum scoring segment used to define a band in the path graph Smith-Waterman algorithm is performed on several possible segments to obtain optimal alignment The word size for Protein is 3 and for Nucleic acid is 11 4

5 Finding High Scoring Pairs (HSPs) Comparison of BLAST and FASTA BLAST uses an expanded list to compensate for the loss of sensitivity from increased word size BLAST is more sensitive than FASTA for protein searches while FASTA is more sensitive than BLAST for nucleic acid searches Both BLAST and FASTA run faster than the original Needleman-Waunch algorithm at the cost of loss of sensitivity Both algorithms fail to find optimal alignments that fall outside of the defined band width 5

6 Essential Elements of an Alignment Algorithm Defining the problem (Global, semi-global, local alignment) Scoring scheme (Gap penalties) Distance Matrix (PAM, BLOSUM series) Scoring/Target function (How scores are calculated) Good programming language to test the algorithm Types of Alignments Global - When two sequences are of approximately equal length. Here, the goal is to obtain maximum score by completely aligning them Semi-global - When one sequence matches towards one end of the other. Ex. Searches for 5 or 3 regulatory sequences Local - When one sequence is a sub-string of the other or the goal is to get maximum local score Protein motif searches in a database 6

7 Local vs Global alignments Pair-wise sequence comparison 7

8 Different Alignment Paths Sequence T S T S T S T S Whole genome sequence comparison 8

9 Scoring System for Alignments Scoring Weights Matches +10 -These are arbitrary values, but Mismatches +4 the real values come from distance matrices (PAM, BLOSUM etc.) Gap Penalties Gap Initiation Arbitrarily chosen but, optimized Gap Extension - 2 for a particular Distance matrix Rules of Thumb for Affine Gap Penalties Gap Initiation Penalty should be 2 to 3 times the largest negative score in a distance matrix table Gap Extension Penalty should be 0.3 to 0.l times the Initiation Penalty Similarity Score Similarity score is the sum of pair-wise scores for matches/ mismatches and gap penalties Scoring Weights Matches +10 Mismatches +4 Gap Initiation -10 Gap Extension -2 Similarity Score = (Match score + Mismatch score) + Gap penalty Global Alignment EPSGFPAWVSTVHGQEQI E----PAWVST-----QI Score: (9 x 10) + (0 x 4) + (2 x x -2) Score = 90 + (- 34) = 56 9

10 Scoring/Target function The scoring function calculates the similarity score. The goal of the algorithm is to maximize similarity score Global Alignment EPSGFPAWVSTVHGQEQI E----PAWVST-----QI Score: (9 x 10) + (2 x x -2) Score = 90 + (-34) = 56 Scoring Weights Matches +10 Mismatches +4 Gap Initiation -10 Gap Extension -2 Local Alignment EPSGFPAWVSTVHGQEQI E----PAWVST-----QI Score: (6 x 10) + (2 x x 0) Score = = 60 Scoring Weights Matches +10 Mismatches +4 Gap Initiation -10 Gap Extension -2 No terminal gap penalty Different types of BLAST programs Program Query Database Comparison blastn nucleotide nucleotide nucleotide blastp protein protein protein blastx nucleotide protein protein tblastn protein nucleotide protein tblastx nucleotide nucleotide protein megablast nucleotide nucleotide nucleotide PHI-Blast protein protein protein PSI-Blast Protein-Profile protein protein RPS-Blast protein Profiles protein 10

11 Multiple Sequence Alignment 21 Terminology Homologous : Similar Paralogous : Present in the same species, diverged after gene duplication. Orthologous: Present in different species, diverged after speciation. Xenologous: Genes acquired by horizontal gene transfer Analogous: Similarity observed by convergent evolution, but not by common evolutionary origin. 11

12 Evolution of Genomes and Genetic Variation Variation between species Recombination, Cis-regulatory elements Point mutations/ insertions/deletions Evolutionary selection à speciation Variation within species: Phenotype = Genotype + Environment SNPs, haplotypes, aneuploidy, etc. Variation at cellular level: Spatial state (Tissue/Subcellular location) Temporal state (Stages in life cycle) Physiological state (normal/disease) External stimuli Phylogenetic Tree of Life 12

13 Multiple Sequence Alignments Aligning homologous residues among a set of sequences, together in columns Multiple alignments exhibit structural and evolutionary information among closely related species i.e., families A column of aligned residues in a conserved region is likely to occupy similar 3-D positions in protein structure Patterns or motifs common to a set of sequences may only be apparent from multiple alignments Necessary for building family profiles, phylogenetic trees and extracting evolutionary relationships Useful for homology modeling if at least one member of the family has a known 3-D structure Phylogenetic Complexity 13

14 Multiple alignments from different VGC protein families Sodium VGC Calcium VGC Potassium VGC Voltage-gated ion channel proteins Topology of six TMS in VGC proteins 14

15 Scoring a Multiple Alignment Some positions are more conserved than others Sequences are not independent, but are related by a phylogeny Pair-wise Multiple Most of the multiple alignment algorithms assume that the individual columns of an alignment are statistically independent Total Alignment Score S = Σ i S(m i ) + G where, m i is column i of the multiple alignment m, S(m i ) is the score for column i, G is gap penalty 15

16 Basic progressive alignment procedure Calculate alignment score for all pair-wise combinations using Dynamic Programming Determine distances from scores for all pairs of sequences and build a distance matrix Use a distance-based method to construct a guide-tree Add sequences to the growing alignment using the order given by the tree Most of the multiple alignment methods differ in the way the guide tree is constructed 16

17 Feng-Doolittle Method Get a distance matrix Calculate pair-wise scores with DP method, convert raw scores into pair-wise distances using the following formula D = -ln S eff S eff = [S obs - S rand ] / [S max - S rand ] where, -S eff is the effective score -S obs is the similarity score(dp score) between a pair -S max is the max score, average of aligning either sequence to itself -S rand is the background noise, obtained by aligning two random sequences of equal length and composition Feng-Doolittle Method.. From pair-wise distances, build a matrix for all sequence combinations as follows Human Chimp Gorilla Orang Human Chimp Gorilla Orang 0 The number of pair-wise distances are N(N-1)/2 17

18 Feng-Doolittle Method.. Alignment of sequences The alignment order is determined from the order sequences were added to the guide tree First 2 sequences from the node are added first. In this case, C-H are aligned according to the standard DP algorithm O G C H Next, G is aligned to CH as the best of G(CH) and (CH)G alignments Assuming that of the above two, G(CH) has the best score, O is aligned to G(CH) as the best of O(G(CH)) and G(O(CH)) Again, higher similarity score determines which one is the best. This process is repeated iteratively until all sequences are aligned. As you notice, the order of sequences in the output are not the same as you see in the guide tree. Multiple sequence alignment (MSA) MSA: tools Clustal, Mafft, Muscle, TCoffee., Mview Accessible from Ebi website: Obtains a distance matrix by using scores from pair-wise DP method Guide tree is built using neighbor joining method Sequences are weighed to compensate for biased representation in larger families Substitution matrices change on the fly as the alignment progresses; closely related sequences are aligned with hard matrices (e. g. BLOSUM80) and distant sequences are aligned with soft matrices (e.g. BLOSUM40) Position specific gap penalties are used similar to profiles Guide tree may be adjusted on the fly to defer the alignment of low-scoring sequences 18

### Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

### CSE : Computational Issues in Molecular Biology. Lecture 6. Spring 2004

CSE 397-497: Computational Issues in Molecular Biology Lecture 6 Spring 2004-1 - Topics for today Based on premise that algorithms we've studied are too slow: Faster method for global comparison when sequences

### Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

### Bioinformatics for Biologists

Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Definitions The use of computational

### Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

### THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

### Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

### 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

### CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

### Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

### Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

### Bioinformatics and BLAST

Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists

### Algorithms in Bioinformatics

Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

### CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

### Sequence Database Search Techniques I: Blast and PatternHunter tools

Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered

### InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

### Sequence analysis and Genomics

Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

### Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

### Practical considerations of working with sequencing data

Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

### Alignment & BLAST. By: Hadi Mozafari KUMS

Alignment & BLAST By: Hadi Mozafari KUMS SIMILARITY - ALIGNMENT Comparison of primary DNA or protein sequences to other primary or secondary sequences Expecting that the function of the similar sequence

### Grundlagen der Bioinformatik, SS 08, D. Huson, May 2,

Grundlagen der Bioinformatik, SS 08, D. Huson, May 2, 2008 39 5 Blast This lecture is based on the following, which are all recommended reading: R. Merkl, S. Waack: Bioinformatik Interaktiv. Chapter 11.4-11.7

### Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

### Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

### Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out

### BLAST. Varieties of BLAST

BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

### Large-Scale Genomic Surveys

Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

### Collected Works of Charles Dickens

Collected Works of Charles Dickens A Random Dickens Quote If there were no bad people, there would be no good lawyers. Original Sentence It was a dark and stormy night; the night was dark except at sunny

### In-Depth Assessment of Local Sequence Alignment

2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

### Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

### Computational Biology

Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

### Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

### Heuristic Alignment and Searching

3/28/2012 Types of alignments Global Alignment Each letter of each sequence is aligned to a letter or a gap (e.g., Needleman-Wunsch). Local Alignment An optimal pair of subsequences is taken from the two

### 5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:

### Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

### Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment: today s goals to define what a multiple sequence alignment is and how it is generated; to describe profile HMMs to introduce databases of multiple

### Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction

### Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

### Introduction to Bioinformatics

Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

### Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary

### Effects of Gap Open and Gap Extension Penalties

Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

### Introduction to sequence alignment. Local alignment the Smith-Waterman algorithm

Lecture 2, 12/3/2003: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties Local alignment the Smith-Waterman algorithm 1 Computational

### Protein function prediction based on sequence analysis

Performing sequence searches Post-Blast analysis, Using profiles and pattern-matching Protein function prediction based on sequence analysis Slides from a lecture on MOL204 - Applied Bioinformatics 18-Oct-2005

### BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

### Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

### Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

### Quantifying sequence similarity

Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

### Study and Implementation of Various Techniques Involved in DNA and Protein Sequence Analysis

Study and Implementation of Various Techniques Involved in DNA and Protein Sequence Analysis Kumud Joseph Kujur, Sumit Pal Singh, O.P. Vyas, Ruchir Bhatia, Varun Singh* Indian Institute of Information

### Phylogenetic inference

Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

### Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

### G4120: Introduction to Computational Biology

ICB Fall 2003 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics and

### Introduction to protein alignments

Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

### Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

### Moreover, the circular logic

Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT

### "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

### SUPPLEMENTARY INFORMATION

Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

### Similarity searching summary (2)

Similarity searching / sequence alignment summary Biol4230 Thurs, February 22, 2016 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 What have we covered? Homology excess similiarity but no excess similarity

### EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics Lecture 05: Index-based alignment algorithms Slides adapted from Dr. Shaojie Zhang (University of Central Florida) Real applications of alignment Database search

### Week 10: Homology Modelling (II) - HHpred

Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative

### Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene

### Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

### Biology Tutorial. Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan

Biology Tutorial Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan Viruses A T4 bacteriophage injecting DNA into a cell. Influenza A virus Electron micrograph of HIV. Cone-shaped cores are

### Similarity or Identity? When are molecules similar?

Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are

### Overview Multiple Sequence Alignment

Overview Multiple Sequence Alignment Inge Jonassen Bioinformatics group Dept. of Informatics, UoB Inge.Jonassen@ii.uib.no Definition/examples Use of alignments The alignment problem scoring alignments

### Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement

### Fundamentals of database searching

Fundamentals of database searching Aligning novel sequences with previously characterized genes or proteins provides important insights into their common attributes and evolutionary origins. The principles

### Phylogenetic Tree Reconstruction

I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

### EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

### Pairwise sequence alignment

Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

### CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings

### A greedy, graph-based algorithm for the alignment of multiple homologous gene lists

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists Jan Fostier, Sebastian Proost, Bart Dhoedt, Yvan Saeys, Piet Demeester, Yves Van de Peer, and Klaas Vandepoele Bioinformatics

### Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Toni Gabaldón Contact: tgabaldon@crg.es Group website: http://gabaldonlab.crg.es Science blog: http://treevolution.blogspot.com

### Lecture 5,6 Local sequence alignment

Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution

### Sequence analysis and comparison

The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

### Sequence Alignment (chapter 6)

Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

### Practical Bioinformatics

5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o

### Local Alignment Statistics

Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison

### Comparative genomics: Overview & Tools + MUMmer algorithm

Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first

### BLAST: Target frequencies and information content Dannie Durand

Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences

### Dr. Amira A. AL-Hosary

Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

### RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

Molecular Biology-2018 1 Definitions: RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Heterologues: Genes or proteins that possess different sequences and activities. Homologues: Genes or proteins that

### Tutorial 4 Substitution matrices and PSI-BLAST

Tutorial 4 Substitution matrices and PSI-BLAST 1 Agenda Substitution Matrices PAM - Point Accepted Mutations BLOSUM - Blocks Substitution Matrix PSI-BLAST Cool story of the day: Why should we care about

### Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1

### Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

### Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 1/23/07 CAP5510 1 Genomic Databases Entrez Portal at National

### Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural

### DNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids

Database searches 1 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids 2 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids (cntd) 3 DNA and protein databases SWISS-PROT

### Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations

Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:

### Sequence Comparison. mouse human

Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity

### Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

### Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

### BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

### 9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

### A profile-based protein sequence alignment algorithm for a domain clustering database

A profile-based protein sequence alignment algorithm for a domain clustering database Lin Xu,2 Fa Zhang and Zhiyong Liu 3, Key Laboratory of Computer System and architecture, the Institute of Computing

### Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research

### Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

### Lecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties

Lecture 1, 31/10/2001: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties 1 Computational sequence-analysis The major goal of computational

### Motivating the need for optimal sequence alignments...

1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use