Basic Local Alignment Search Tool

Alignments
  o Used to uncover homologies between sequences
  o Combined with phylogenetic studies, can determine orthologous and paralogous relationships

Local Alignment
  o Uses a subset of a sequence and attempts to align it to a subset of other sequences
  o Computationally less expensive than other methods

Local Alignment Example
  First, a small seed is uncovered. The initial seed for the alignment:
    TAT
    AAGCGAATAATATATTTATACTCAGATTATTGCGCG
  And now the extended alignment:
    TATATATTAGTA
    AAGCGAATAATATATTTATACTCAGATTATTGCGCG

BLAST: Basic Local Alignment Search Tool
Altschul et al. (1990. J Mol Biol 215:403)
  o A set of alignment algorithms
  o All use the same search protocol to find a short fragment of a query sequence that aligns with a fragment of a subject sequence found in a database

General Concept for the Original BLAST Program
  1. The sequence (query) is broken into words of length W
  2. All words are aligned with sequences in the database
  3. A score T is calculated, using a substitution matrix, for each word that aligns with a sequence in the database
  4. Words whose T value is below a neighborhood score threshold are discarded
  5. Words are extended in both directions until the score falls by the dropoff value X when compared to the previous best score

Search Matrices
BLOSUM 62 matrix (BLOSUM = BLOcks SUbstitution Matrix)
Henikoff and Henikoff (1992. PNAS 89:10915-10919)
  o Studied 2000 aligned blocks of 500 groups of related proteins
  o Determined the different types of amino acid substitutions that occurred in these proteins
  o Developed the matrix based on the study
  o Positive value: identities or highly similar substitutions
  o Negative value: penalty for dissimilar substitutions

BLOSUM 62 Amino Acid Matrix (upper triangle; the matrix is symmetric)

    A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
A   4  0 -2 -1 -2  0 -2 -1 -1 -1 -1 -2 -1 -1 -1  1  0  0 -3 -2
C      9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -2
D         6  2 -3 -1 -1 -3 -1 -4 -3  1 -1  0 -2  0 -1 -3 -4 -3
E            5 -3 -2  0 -3  1 -3 -2  0 -1  2  0  0 -1 -2 -3 -2
F               6 -3 -1  0 -3  0  0 -3 -4 -3 -3 -2 -2 -1  1  3
G                  6 -2 -4 -2 -4 -3  0 -2 -2 -2  0 -2 -3 -2 -3
H                     8 -3 -1 -3 -2  1 -2  0  0 -1 -2 -3 -2  2
I                        4 -3  2  1 -3 -3 -3 -3 -2 -1  3 -3 -1
K                           5 -2 -1  0 -1  1  2  0 -1 -2 -3 -2
L                              4  2 -3 -3 -2 -2 -2 -1  1 -2 -1
M                                 5 -2 -2  0 -1 -1 -1  1 -1 -1
N                                    6 -2  0  0  1  0 -3 -4 -2
P                                       7 -1 -2 -1 -1 -2 -4 -3
Q                                          5  1  0 -1 -2 -2 -1
R                                             5 -1 -1 -3 -3 -2
S                                                4  1 -2 -3 -2
T                                                   5  0 -2 -2
V                                                      4 -3 -1
W                                                        11  2
Y                                                            7

BLAST Words
  o Three characters in length for proteins
  o Compiled by using a sliding window that moves along the query one residue at a time

    Query:  M E N G G P A P E S
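The sliding window described above can be sketched in a few lines of Python. This is only an illustration: the function name `words` is hypothetical, and the word length of 3 is the protein value given on the slide.

```python
def words(seq, w=3):
    """Return every overlapping word of length w, moving one residue at a time."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


query = "MENGGPAPES"  # query sequence from the slide
for word in words(query):
    print(word)
```

A query of length L yields L - W + 1 words, so the ten-residue query above gives eight three-letter words.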

Align all words and calculate T score

  Query               M  E  N  G  G  P  A  P  E  S
  BLOSUM 62 score     1 -1 -2  6  6  7  4  7  5  4
  Database sequence   I  P  A  G  G  P  A  P  E  S

  Word   T score
  MEN    -2
  ENG     3
  NGG    10
  GGP    19   TOP SCORE
  GPA    17
  PAP    18
  APE    16
  PES    16
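As a rough sketch of this step, the word scores can be recomputed from the per-position BLOSUM 62 scores shown on the slide. The function names and the cutoff of 11 are illustrative assumptions, not values taken from the slides.

```python
query = "MENGGPAPES"
# Per-position BLOSUM 62 scores for the query paired with IPAGGPAPES,
# copied from the slide.
pos_scores = [1, -1, -2, 6, 6, 7, 4, 7, 5, 4]


def word_scores(query, pos_scores, w=3):
    """Score each length-w word as the sum of its per-position scores."""
    return [(query[i:i + w], sum(pos_scores[i:i + w]))
            for i in range(len(query) - w + 1)]


def keep_words(scored, threshold):
    """Discard words whose score is below the threshold."""
    return [(word, score) for word, score in scored if score >= threshold]


scored = word_scores(query, pos_scores)
best = max(scored, key=lambda pair: pair[1])
kept = keep_words(scored, threshold=11)  # 11 is an assumed cutoff for illustration

print(scored)
print("top word:", best)
print("kept:", kept)
```

The top-scoring word, GGP with a score of 19, is the one the slide uses as the starting point for extension.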

Build Alignment

  BLOSUM 62 score     1 -1 -2  6  6  7  4  7  5  4
  Database sequence   I  P  A  G  G  P  A  P  E  S

  1. Original alignment: T score = 19
       Query word: G G P
  2. Extend one amino acid in each direction: T score = 21
       Query segment: N G G P A
  3. Stop when the next extension drops off below value X compared to the previous score

Points to Remember
  o The T score is converted into a bit score by a complicated formula
  o The X value is based on the bit score
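The extension rule can be sketched as follows. This is a simplified, ungapped version that walks each direction independently and works on raw scores rather than bit scores, which is an assumption made for brevity; the function name `extend` and the dropoff value of 2 are hypothetical.

```python
def extend(scores, start, end, x_drop):
    """Ungapped extension of the seed scores[start:end] in both directions.

    Each direction is walked separately; the walk stops once the running
    score has fallen more than x_drop below the best score seen so far.
    Returns (new_start, new_end, total_score) with new_end exclusive.
    """
    def walk(indices):
        running = best = 0
        best_steps = 0
        for steps, i in enumerate(indices, start=1):
            running += scores[i]
            if running > best:
                best, best_steps = running, steps
            elif best - running > x_drop:
                break
        return best, best_steps

    seed_score = sum(scores[start:end])
    right_gain, right_steps = walk(range(end, len(scores)))
    left_gain, left_steps = walk(range(start - 1, -1, -1))
    return start - left_steps, end + right_steps, seed_score + left_gain + right_gain


query = "MENGGPAPES"
pos_scores = [1, -1, -2, 6, 6, 7, 4, 7, 5, 4]  # per-position scores from the slide

# The seed GGP occupies positions 3..5 (0-based), i.e. the slice [3:6].
new_start, new_end, total = extend(pos_scores, 3, 6, x_drop=2)
print(query[new_start:new_end], total)
```

With these inputs the rightward walk keeps gaining score to the end of the sequence, while the leftward walk only meets negative scores and is abandoned, so the seed grows on one side only.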

BLAST Statistics
Score (bits)
  o A statistical conversion of the score derived by summing the substitution matrix values
E value
  o An E value of e-10 (= 1 x 10^-10) means it is unlikely that random chance led to the alignment, compared to an alignment with an E value of 1
  o Often considered to be a probability (strictly, it is the expected number of chance alignments scoring at least this well)
Rules of thumb
  o E value of e-30 or less: sequences are homologous
  o E value of e-5: often considered significant enough when annotating a genome
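The conversion from a raw score to a bit score and an E value is usually described with the Karlin-Altschul formulas, bits = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-bits). The sketch below applies them; the lambda and K values, the query length, and the database length are all assumptions chosen for illustration and are not taken from the slides.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


# Assumed statistical parameters; real searches use values estimated for the
# specific substitution matrix and gap costs.
LAMBDA, K = 0.267, 0.041

bits = bit_score(39, LAMBDA, K)
print(round(bits, 2))
print(e_value(bits, query_len=10, db_len=1_000_000))  # hypothetical sizes
```

Each additional bit halves the E value, which is why small changes in score can move a hit across a significance cutoff.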

BLAST2 (1997. Nucleic Acids Research 25:3389-3402)
Takes a different (and three times faster) approach than the original BLAST algorithm
  o Same word search
  o Lower T value
  o Neighboring words discovered; they must be at a distance less than A (default 40)
  o Alignment extended from the neighboring words

Gap penalties
  o New in BLAST2
  o Allow for better alignments
  o Default for amino acid search
      Introducing a gap:   -11
      Extending that gap:  -1

BLAST Algorithms

  Search    Query                                      Database
  blastn    nucleotide                                 nucleotide
  blastx    translated nucleotide in all six frames    protein
  tblastx   translated nucleotide in all six frames    translated nucleotide in all six frames
  blastp    protein                                    protein
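The neighboring-word rule of BLAST2 can be sketched as a search for two word hits on the same diagonal that lie closer together than A. This is a simplified illustration under stated assumptions: hits are given as (query position, subject position) pairs, the function name `two_hit_pairs` is hypothetical, and the real program tracks hits differently.

```python
from collections import defaultdict


def two_hit_pairs(hits, window=40, w=3):
    """Find pairs of word hits that would trigger an extension.

    hits: iterable of (query_pos, subject_pos) word hits.
    Two hits qualify when they lie on the same diagonal
    (subject_pos - query_pos), do not overlap, and are less than
    `window` residues apart.
    """
    by_diagonal = defaultdict(list)
    for q_pos, s_pos in hits:
        by_diagonal[s_pos - q_pos].append(q_pos)

    pairs = []
    for diagonal, q_positions in by_diagonal.items():
        q_positions.sort()
        for first, second in zip(q_positions, q_positions[1:]):
            if w <= second - first < window:
                pairs.append((diagonal, first, second))
    return sorted(pairs)


hits = [(0, 5), (10, 15), (60, 65), (3, 20)]  # made-up hits for illustration
print(two_hit_pairs(hits))
```

Requiring two nearby hits lets the word threshold T be lowered without extending every single hit, which is where the speed gain comes from.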

Homology, Orthology, Paralogy, Identity, Similarity
How do we define the relationship between two nucleotide or protein sequences?

Homology
  o Two sequences are said to show homology if they are identical by descent in a taxonomic lineage or the result of within-species duplication
Homologs
  o Two sequences that exhibit homology
Orthologs
  o Two sequences related by descent in a species lineage

  [Figure: the same sequence in Species A and Species B, labeled Orthologs]

Paralogs
  o Two sequences related by a duplication event within a species

  [Figure: two copies of sequence A within one species, labeled Paralogs]

Describing the relationship between orthologs and paralogs
Important note
  o Two sequences are homologous or not homologous; there is no percentage of homology
Identity
  o The % of exact nucleotide or amino acid matches between two sequences
Similarity
  o The % of identical or similar amino acids between two protein sequences
Examples of similar amino acids
  o Valine/Leucine
  o Valine/Isoleucine
  o Threonine/Serine
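Identity and similarity can be computed from two aligned sequences as sketched below. The set of similar pairs contains only the three examples listed above, and dividing by the number of ungapped columns is one of several conventions, assumed here for simplicity; the function name is hypothetical.

```python
# Similar pairs taken from the examples above: V/L, V/I, T/S.
SIMILAR = {frozenset("VL"), frozenset("VI"), frozenset("TS")}


def identity_and_similarity(a, b):
    """Return (% identity, % similarity) for two aligned sequences.

    Columns containing a gap ('-') are ignored (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(x == y for x, y in columns)
    similar = sum(x != y and frozenset((x, y)) in SIMILAR for x, y in columns)
    total = len(columns)
    return 100.0 * identical / total, 100.0 * (identical + similar) / total


print(identity_and_similarity("VTAG", "LSAG"))
```

Similarity is never lower than identity, because every identical column also counts toward similarity.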