# Substitution matrices

Size: px
Start display at page:

Transcription

1 Introduction to Bioinformatics Substitution matrices Jacques van Helden Université d Aix-Marseille, France Lab. Technological Advances for Genomics and Clinics (TAGC, INSERM Unit U1090) B!GRe Bioinformatique des Génomes et Réseaux FORMER ADDRESS ( ) Université Libre de Bruxelles, Belgique Bioinformatique des Génomes et des Réseaux (BiGRe lab)

2 Substitution matrix A substitution matrix indicates the score associated to each possible pair of aligned residues in an alignment. Each row and each column represent one of the possible residues (4 for DNA, 20 for proteins). The diagonal indicates identities. The lower triangle indicates substitutions. The upper triangle is symmetrical to the lower triangle, and does not need to be displayed. Positive scores indicate that the aligned pair of residue is considered beneficial for the alignment. Note that some mismatches might have positive scores in protein alignments (see later). Negative scores are considered as penalties associated to mismatches, and alignment algorithms will try to avoid aligning the pairs of residues. One could decide to give a lower cost to A-T substitutions, if we assume that these are more likely to occur in our sequences Example: the top matrix represents arbitrarily defined scores for DNA alignment match 2 A-T mismatch -1 other mismatch -2 The scoring scheme can be represented as a substitution matrix A C G T A 2 C -2 2 G T A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V 2

3 Scoring an alignment matrix with a substitution matrix Let us come back to our previous alignment matrix For each cell of the alignment matrix, we compare the residue in sequences A and B, and take the score for this pair of residues in the substitution matrix. A C G T A 2 C -2 2 G T A A T C T T C A G C G T A T T G C T A T C T T A G C C G G A G G T A T T

4 Substitution counts in 71 groups of aligned proteins (Dayhoff, 1978) A R 30 N D en rouge: chiffres illisibles C Q E G H I L K M F è P S T W Y V A R N D C Q E G H I L K M F P S T W Y V Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 4

5 Substitution matrices for proteins " s i, j = s j,i = log 2 \$ # f % i, j ( f i f ' j )& C S T P A G... C S T P A G Margaret Dayhoff (1978) measured the rate of substitutions between each pair of amino acids, in a collection of aligned proteins. Scores are calculated as log-odds Positive values reflect frequent ("accepted") substitutions, i.e. substitutions that occur more frequently than expected by chance. Negative values reflect rare ("unfavourable") mutations, i.e. substitutions that occur less frequently than expected by chance The diagonal reflect residue conservation Reference: Dayhoff et al. (1978). A model of evolutionary change in proteins. In Atlas of tein Sequence and Structure, vol. 5, suppl. 3, National Biomedical Research Foundation, Silver Spring, MD,

6 PAM scoring matrices The alignments used by Dayhoff had ~85% identity However, frequencies of substitutions are expected to depend on the rate of divergence between sequences: the number of substitutions increases with time. In order to take into account the divergence rate, Margaret Dayhoff calculated a series of scoring matrices, each reflecting a certain level of divergence PAM001 rates of substitutions between amino-acid pairs expected for proteins with an average of 1% substitution per position PAM050 rates of substitutions between amino-acid pairs expected for proteins with an average of 50% substitution per position PAM % mutations/position (note: a position could mutate several times) The substitution matrix must this be chosen according to the relatedness of the sequences to be aligned Reference: Dayhoff et al. (1978). A model of evolutionary change in proteins. In Atlas of tein Sequence and Structure, vol. 5, suppl. 3, National Biomedical Research Foundation, Silver Spring, MD,

7 Extrapolation of the PAM series from PAM001 M i,3 =P(X ) M 17,j =P( X) P( -> )= P( -> -> ) + P( -> -> ) P( -> -> ) = (0.0009)(0.0001) + (0.0001)(0.0002) (0.0001)(0.009) 7

8 PAM250 matrix C 12 S 0 2 T P A G N D E Q H R K M I L V F Y W C S T P A G N D E Q H R K M I L V F Y W Hydrophobic C P A G M I L V Aromatic H F Y W Polar S T N Q Y Basic H R K Acidic D E 8

9 Hinton diagram of the PAM250 matrix Yellow boxes indicate positive values (accepted mutations) Red boxes indicate negative values (avoided mutations). The area of each box is proportional to the absolute value of the log-odds score. 9

10 BLOSUM scoring matrices Henikoff and Henikoff (1992) analyzed substitution rates on the basis of aligned regions (blocks) They calculated scoring matrices from blocks with different percentages of protein divergence Example: BLOSUM62calculated from blocks with ~62% identity BLOSUM80calculated from blocks with ~80% identity When these substitution matrices are used to score sequence alignments, one should always choose the matrix appropriate to the expected percentage of similarity. Reference: Henikoff, S. and Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks. PNAS 89:

11 BLOSUM30 A 4 R -1 8 N D C Q E G H I L K M F P S T W Y V Asx B Glx Z Unkown X End * Asx Glx Unk End A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 11

12 BLOSUM62 A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V Asx B Glx Z Unkown X End * Asx Glx Unk End A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 12

13 BLOSUM80 A 5 R -2 6 N D C Q E G H I L K M F P S T W Y V Asx B Glx Z Unkown X End * Asx Glx Unk End A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 13

14 BLOSUM62 Amino acid properties A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V Asx B Glx Z Unkown X End * Asx Glx Unk End A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 14

15 BLOSUM62 - substitutions between acidic residues A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V Asx B Glx Z Unkn X End * Asx Glx Unk End A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 15

16 BLOSUM62- substitutions between basic residues A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 16

17 BLOSUM62 - substitutions between aromatic residues A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 17

18 BLOSUM62 - substitutions between polar residues A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 18

19 BLOSUM62 - substitutions between hydrophobic residues A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T W Y V B Z X * Hydrophobic A C G I L M P V Aromatic H F W Y Polar N Q S T Y Basic R H K Acidic D E 19

20 Substitution matrices - summary Different substitution scoring matrices have been established Residue categories (Phylip) PAM (Dayhoff, 1979). PAM means Percent Accepted Mutations BLOSUM (Henikoff & Henikoff, 1992). BLOSUM means Block sum. Substitution matrices allow to detect similarities between more distant proteins than what would be detected with the simple identity of residues. The matrix must be chosen carefully, depending on the expected rate of conservation between the sequences to be aligned. Beware With PAM matrices the score indicates the percentage of substitution per position -> higher numbers are appropriate for more distant proteins With BLOSUM matrices the score indicates the percentage of conservation -> higher numbers are appropriate for more conserved proteins 20

21 Scoring an alignment with a substitution matrix A 4 R -1 5 N D C Q E G H I L K M F P S L " S = s r1,i r 2,i i=1 T W Y V The substitution matrix can be used to assign a score to a pair-wise alignment. The score of the alignment is the sum, over all the aligned positions (i from 1 to L), of the scores of the pairs of residues (r 1,I and r 2,I ). Gaps are treated by subtracting a penalty, with two parameters: Gap opening (go) penalty Typical values : between -10 and -15 Gap extension (ge) penalty Typical values: between -0.5 and -2 A R N D C Q E G H I L K M F P S T W Y V i ! R L A S V E T D M P L T L R Q H!! T L T S L Q T T L K N L K E M A H L G T H! S! 21

22 Scoring an alignment with a substitution matrix A 4 R -1 5 N D C Q E G H I L K M F P S L " S = s r1,i r 2,i i=1 T W Y V The substitution matrix can be used to assign a score to a pair-wise alignment. The score of the alignment is the sum, over all the aligned positions (i from 1 to L), of the scores of the pairs of residues (r 1,I and r 2,I ). Gaps are treated by subtracting a penalty, with two parameters: Gap opening (go) penalty Typical values : between -10 and -15 Gap extension (ge) penalty Typical values: between -0.5 and -2 A R N D C Q E G H I L K M F P S T W Y V i ! R L A S V E T D M P L T L R Q H!.. : :. :. go ge ge ge ge....! T L T S L Q T T L K N L K E M A H L G T H! S = 7! 22

23 Bibliography Substitution matrices PAM series Dayhoff, M. O., Schwartz, R. M. & Orcutt, B. (1978). A model of evolutionary change in proteins. Atlas of tein Sequence and Structure 5, BLOSUM substitution matrices Henikoff, S. & Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. c Natl Acad Sci U S A 89, Gonnet matrices, built by an iterative procedure Gonnet, G. H., Cohen, M. A. & Benner, S. A. (1992). Exhaustive matching of the entire protein sequence database. Science 256,

### Theoretical distribution of PSSM scores

Regulatory Sequence Analysis Theoretical distribution of PSSM scores Jacques van Helden Jacques.van-Helden@univ-amu.fr Aix-Marseille Université, France Technological Advances for Genomics and Clinics (TAGC,

### CSE 549: Computational Biology. Substitution Matrices

CSE 9: Computational Biology Substitution Matrices How should we score alignments So far, we ve looked at arbitrary schemes for scoring mutations. How can we assign scores in a more meaningful way? Are

### Position-specific scoring matrices (PSSM)

Regulatory Sequence nalysis Position-specific scoring matrices (PSSM) Jacques van Helden Jacques.van-Helden@univ-amu.fr Université d ix-marseille, France Technological dvances for Genomics and Clinics

### Quantifying sequence similarity

Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

### Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement

### CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

### Scoring Matrices. Shifra Ben-Dor Irit Orr

Scoring Matrices Shifra Ben-Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison

### Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene

### Local Alignment Statistics

Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison

### Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

### Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

### Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

### In-Depth Assessment of Local Sequence Alignment

2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

### Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

### Sequence Database Search Techniques I: Blast and PatternHunter tools

Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered

### Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

### Pairwise sequence alignments

Pairwise sequence alignments Volker Flegel VI, October 2003 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs VI, October

### THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

### Large-Scale Genomic Surveys

Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

### BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University

BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University Measures of Sequence Similarity Alignment with dot

### Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)

### Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

### Practical considerations of working with sequencing data

Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

### Sequence analysis and comparison

The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

### CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

### Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

### Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

### Scoring Matrices. Shifra Ben Dor Irit Orr

Scoring Matrices Shifra Ben Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison

### Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel )

Pairwise sequence alignments Vassilios Ioannidis (From Volker Flegel ) Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs Importance

### Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

### Computational Biology

Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

### Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

### 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

### Optimization of a New Score Function for the Detection of Remote Homologs

PROTEINS: Structure, Function, and Genetics 41:498 503 (2000) Optimization of a New Score Function for the Detection of Remote Homologs Maricel Kann, 1 Bin Qian, 2 and Richard A. Goldstein 1,2 * 1 Department

### Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

### Matrix-based pattern matching

Regulatory sequence analysis Matrix-based pattern matching Jacques van Helden Jacques.van-Helden@univ-amu.fr Aix-Marseille Université, France Technological Advances for Genomics and Clinics (TAGC, INSERM

### Algorithms in Bioinformatics

Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

### Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

### Statistical Distributions of Optimal Global Alignment Scores of Random Protein Sequences

BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. The fully-formatted PDF version will become available shortly after the date of publication, from the

### Pairwise sequence alignment

Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

### Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary

### Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

### Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas Informal inductive proof of best alignment path onsider the last step in the best

### C E N T R. Introduction to bioinformatics 2007 E B I O I N F O R M A T I C S V U F O R I N T. Lecture 5 G R A T I V. Pair-wise Sequence Alignment

C E N T R E F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U Introduction to bioinformatics 2007 Lecture 5 Pair-wise Sequence Alignment Bioinformatics Nothing in Biology makes sense except in

### Multiple Alignment using Hydrophobic Clusters : a tool to align and identify distantly related proteins

Multiple Alignment using Hydrophobic Clusters : a tool to align and identify distantly related proteins J. Baussand, C. Deremble, A. Carbone Analytical Genomics Laboratoire d Immuno-Biologie Cellulaire

### Sequence comparison: Score matrices

Sequence comparison: Score matrices http://facultywashingtonedu/jht/gs559_2013/ Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best

### Lecture 5,6 Local sequence alignment

Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution

### Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics

Inferring Phylogenies from Protein Sequences by Parsimony, Distance, and Likelihood Methods Joseph Felsenstein Department of Genetics University of Washington Box 357360 Seattle, Washington 98195-7360

### Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best alignment path onsider the last step in

### Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

### Exercise 5. Sequence Profiles & BLAST

Exercise 5 Sequence Profiles & BLAST 1 Substitution Matrix (BLOSUM62) Likelihood to substitute one amino acid with another Figure taken from https://en.wikipedia.org/wiki/blosum 2 Substitution Matrix (BLOSUM62)

### Matrix-based pattern discovery algorithms

Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)

### Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1

### Moreover, the circular logic

Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT

### 7.36/7.91 recitation CB Lecture #4

7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer

### Tutorial 4 Substitution matrices and PSI-BLAST

Tutorial 4 Substitution matrices and PSI-BLAST 1 Agenda Substitution Matrices PAM - Point Accepted Mutations BLOSUM - Blocks Substitution Matrix PSI-BLAST Cool story of the day: Why should we care about

### Foreword by. Stephen Altschul. An Essential Guide to the Basic Local Alignment Search Tool BLAST. Ian Korf, Mark Yandell & Joseph Bedell

An Essential Guide to the Basic Local Alignment Search Tool Foreword by Stephen Altschul BLAST Ian Korf, Mark Yandell & Joseph Bedell BLAST Other resources from O Reilly oreilly.com oreilly.com is more

### An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

### Week 10: Homology Modelling (II) - HHpred

Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative

### Evaluation Measures of Multiple Sequence Alignments. Gaston H. Gonnet, *Chantal Korostensky and Steve Benner. Institute for Scientic Computing

Evaluation Measures of Multiple Sequence Alignments Gaston H. Gonnet, *Chantal Korostensky and Steve Benner Institute for Scientic Computing ETH Zurich, 8092 Zuerich, Switzerland phone: ++41 1 632 74 79

### Pairwise Sequence Alignment

Introduction to Bioinformatics Pairwise Sequence Alignment Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Outline Introduction to sequence alignment pair wise sequence alignment The Dot Matrix Scoring

### Similarity searching summary (2)

Similarity searching / sequence alignment summary Biol4230 Thurs, February 22, 2016 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 What have we covered? Homology excess similiarity but no excess similarity

### INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld University of Illinois at Chicago, Dept. of Electrical

### A New Similarity Measure among Protein Sequences

A New Similarity Measure among Protein Sequences Kuen-Pin Wu, Hsin-Nan Lin, Ting-Yi Sung and Wen-Lian Hsu * Institute of Information Science Academia Sinica, Taipei 115, Taiwan Abstract Protein sequence

### Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

### First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences

First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences 140.638 where do sequences come from? DNA is not hard to extract (getting DNA from a

### Local Alignment: Smith-Waterman algorithm

Local Alignment: Smith-Waterman algorithm Example: a shared common domain of two protein sequences; extended sections of genomic DNA sequence. Sensitive to detect similarity in highly diverged sequences.

### Sequence Comparison. mouse human

Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity

### Sequence analysis and Genomics

Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

### Substitution Matrices

C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E [1] Substitution matrices Sequence analysis 2006 Substitution Matrices Introduction to bioinformatics 2007 Lecture 8 C E N T R F

### Bioinformatics and BLAST

Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists

### Sequence Analysis '17 -- lecture 7

Sequence Analysis '17 -- lecture 7 Significance E-values How significant is that? Please give me a number for......how likely the data would not have been the result of chance,......as opposed to......a

### Introduction to Bioinformatics

Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

### ... and searches for related sequences probably make up the vast bulk of bioinformatics activities.

1 2 ... and searches for related sequences probably make up the vast bulk of bioinformatics activities. 3 The terms homology and similarity are often confused and used incorrectly. Homology is a quality.

### Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:

### Solving the protein sequence metric problem

Solving the protein sequence metric problem William R. Atchley*, Jieping Zhao, Andrew D. Fernandes, and Tanja Drüke *Department of Genetics, Bioinformatics Research Center, Graduate Program in Biomathematics,

### BINF 730. DNA Sequence Alignment Why?

BINF 730 Lecture 2 Seuence Alignment DNA Seuence Alignment Why? Recognition sites might be common restriction enzyme start seuence stop seuence other regulatory seuences Homology evolutionary common progenitor

### Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

### PAM-1 Matrix 10,000. From: Ala Arg Asn Asp Cys Gln Glu To:

119-1 atrix 10,000 rom: la rg sn sp ys ln lu o: la 9867 2 9 10 3 8 17 rg 1 9913 1 0 1 10 0 sn 4 1 9822 36 0 4 6 sp 6 0 42 9859 0 6 53 ys 1 1 0 0 9973 0 0 ln 3 9 4 5 0 9876 27 lu 10 0 7 56 0 35 9865 120

### Chapter 2: Sequence Alignment

Chapter 2: Sequence lignment 2.2 Scoring Matrices Prof. Yechiam Yemini (YY) Computer Science Department Columbia University COMS4761-2007 Overview PM: scoring based on evolutionary statistics Elementary

### Substitutional Analysis of Orthologous Protein Families Using BLOCKS

www.bioinformation.net Volume 13(1) Hypothesis Substitutional Analysis of Orthologous Protein Families Using BLOCKS Parth Sarthi Sen Gupta 1, Shyamashree Banerjee 1, Rifat Nawaz Ul Islam 1, Vishma Pratap

### Introduction to sequence alignment. Local alignment the Smith-Waterman algorithm

Lecture 2, 12/3/2003: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties Local alignment the Smith-Waterman algorithm 1 Computational

### Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

### Similarity or Identity? When are molecules similar?

Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are

### BLAST: Target frequencies and information content Dannie Durand

Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences

### bioinformatics 1 -- lecture 7

bioinformatics 1 -- lecture 7 Probability and conditional probability Random sequences and significance (real sequences are not random) Erdos & Renyi: theoretical basis for the significance of an alignment

### BLAST: Basic Local Alignment Search Tool

.. CSC 448 Bioinformatics Algorithms Alexander Dekhtyar.. (Rapid) Local Sequence Alignment BLAST BLAST: Basic Local Alignment Search Tool BLAST is a family of rapid approximate local alignment algorithms[2].

### Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

### Lecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties

Lecture 1, 31/10/2001: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties 1 Computational sequence-analysis The major goal of computational

### Bioinformatics. Transcriptome

Bioinformatics Transcriptome Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe) http://www.bigre.ulb.ac.be/ Bioinformatics

### Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

### Introduction to Bioinformatics Algorithms Homework 3 Solution

Introduction to Bioinformatics Algorithms Homework 3 Solution Saad Mneimneh Computer Science Hunter College of CUNY Problem 1: Concave penalty function We have seen in class the following recurrence for

### Lecture Notes: Markov chains

Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

### Algorithms in Bioinformatics: A Practical Introduction. Sequence Similarity

Algorithms in Bioinformatics: A Practical Introduction Sequence Similarity Earliest Researches in Sequence Comparison Doolittle et al. (Science, July 1983) searched for platelet-derived growth factor (PDGF)

Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

### BIOINFORMATICS: An Introduction

BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and

### 5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology: