Sequence Comparison: Local Alignment. Genome 373 Genomic Informatics Elhanan Borenstein


A quick review: Global Alignment. Mission: Find the best global alignment between two sequences. Two ingredients: an algorithm for finding the alignment with the best score, and a method for scoring alignments.


Review: Global Alignment. Three possible moves: A diagonal move aligns a character from each sequence. A horizontal move aligns a gap in the sequence along the left edge. A vertical move aligns a gap in the sequence along the top edge. The move you keep is the best scoring of the three.

Review: Global Alignment. Fill DP matrix from upper left to lower right. Traceback alignment from lower right corner.

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

DP in equation form. Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]
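The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code. The function names are hypothetical, and the scoring function `review_score` is an assumption: the slides do not state the substitution matrix behind the review matrix, but match = 10, purine/purine or pyrimidine/pyrimidine mismatch = 0, other mismatches = -5, with d = -4, reproduces its values.

```python
def needleman_wunsch_matrix(x, y, s, d):
    """Fill the global-alignment DP matrix F (rows follow x, columns follow y).

    s(a, b) is the substitution score; d is the (negative) linear gap penalty.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Borders: aligning a prefix against nothing costs one gap per character.
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    # Interior: keep the best of the three possible moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # diagonal move
                F[i - 1][j] + d,                          # gap in y
                F[i][j - 1] + d,                          # gap in x
            )
    return F


PURINES = {"A", "G"}


def review_score(a, b):
    """Assumed scoring (not given on the slide): 10 / 0 / -5."""
    if a == b:
        return 10
    same_class = (a in PURINES) == (b in PURINES)
    return 0 if same_class else -5


F = needleman_wunsch_matrix("CATAC", "GAATC", review_score, d=-4)
for row in F:
    print(" ".join(f"{v:4d}" for v in row))
print("global score:", F[-1][-1])
```

The global score is read from the lower right corner, which is also where the traceback starts.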

DP equation graphically: F(i, j) is reached from F(i-1, j-1) by adding s(x_i, y_j), or from F(i-1, j) or F(i, j-1) by adding d; take the max of these three.

Local alignment. Mission: Find the best partial alignment between two sequences. Why?

Local alignment. A single-domain protein may be similar only to one region within a multi-domain protein. A DNA/RNA query may align to a small part of a genome/genomes/metagenomes. An alignment that spans the complete length of both sequences may be undesirable.

BLAST does local alignments. A typical search has a short query against long targets. The alignments returned show only the well-aligned match region of both query and target. Query: matched regions returned in alignment. Targets: (e.g. genome contigs, full genomes, metagenomes)

How can we modify the Needleman-Wunsch DP algorithm (for finding the global alignment) so that it instead finds the best local alignment?

The global alignment, scored column by column:

    G    A    A    T    -    C
    -    C    A    T    A    C
   -4   -5   10   10   -4   10   = 17
   -4   -9    1   11    7   17   (running total)
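The column-by-column arithmetic can be checked with a short sketch. The function name `score_alignment` is hypothetical, and the scoring is read off the column scores in this example only (match = 10, the A/C mismatch = -5, gap = -4); treating every other mismatch as -5 is an assumption that does not affect this alignment.

```python
def score_alignment(top, bottom, s, d):
    """Sum per-column scores of a gapped alignment; also return the running total."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    running = []
    for a, b in zip(top, bottom):
        # A column containing '-' is a gap; otherwise score the aligned pair.
        total += d if "-" in (a, b) else s(a, b)
        running.append(total)
    return total, running


def s(a, b):
    # Assumed scores; only match and the A/C mismatch occur in this example.
    return 10 if a == b else -5


total, running = score_alignment("GAAT-C", "-CATAC", s, d=-4)
print(total, running)
```

The running total starts out negative before recovering, which is the part of the alignment a local method would prefer to leave out.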

Remember: Global alignment DP. Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

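Local alignment DP. Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

\[
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

(The 0 corresponds to the start of an alignment.)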
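As a minimal sketch of the local-alignment fill, the only change from the global version is the extra 0 in the max, which also leaves the borders at 0. The function name is hypothetical; the substitution scores are the nucleotide matrix from the simple example in this transcript (match 2, A/G and C/T mismatches -5, other mismatches -7), with d = -5.

```python
BASES = "ACGT"
ROWS = [
    [2, -7, -5, -7],   # A
    [-7, 2, -7, -5],   # C
    [-5, -7, 2, -7],   # G
    [-7, -5, -7, 2],   # T
]
SUBST = {(a, b): ROWS[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def smith_waterman_matrix(x, y, d=-5):
    """Fill the local-alignment DP matrix (rows follow x, columns follow y)."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]  # borders stay 0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(
                0,                                            # start a new alignment here
                F[i - 1][j - 1] + SUBST[x[i - 1], y[j - 1]],  # diagonal move
                F[i - 1][j] + d,                              # gap in y
                F[i][j - 1] + d,                              # gap in x
            )
    return F


F = smith_waterman_matrix("AGC", "AAG")
for row in F:
    print(row)
```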

A simple example. Find the optimal local alignment of AAG (along the top) and AGC (along the left edge). Substitution matrix s, with gap penalty d = -5:

         A    C    G    T
    A    2   -7   -5   -7
    C   -7    2   -7   -5
    G   -5   -7    2   -7
    T   -7   -5   -7    2

Initialize the same way as for global alignment; because negative scores are replaced with 0, the first row and column are all 0. Each remaining cell takes the max of 0, F(i-1, j-1) + s(x_i, y_j), F(i-1, j) + d and F(i, j-1) + d.

First cell (A against A): the three moves give 2, -5 and -5, so F = 2.

Next row, first cell (G against A): the three moves give -5, -5 and -3, all negative, so F = 0 (signify no preceding alignment with no arrow).

Completed DP matrix:

              A    A    G
         0    0    0    0
    A    0    2    2    0
    G    0    0    0    4
    C    0    0    0    0

What's different about the DP matrix? [Figure: global alignment DP matrix next to local alignment DP matrix]

A simple example: the DP matrix is complete. But how do we trace back?

Traceback. Start traceback at the highest score anywhere in the matrix (here 4) and follow the arrows back until you reach 0. The resulting local alignment:

    AG
    AG
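A hedged sketch of the full procedure, fill plus traceback, follows. The names `local_align` and `s` are hypothetical. Instead of storing arrows, this version re-derives each move during traceback by checking which predecessor produced the cell's score, which is a simplification of the arrow bookkeeping on the slide; when several moves tie, it prefers the diagonal.

```python
def s(a, b):
    """Nucleotide scores from the example: match 2, A/G or C/T -5, otherwise -7."""
    if a == b:
        return 2
    transition = {a, b} in ({"A", "G"}, {"C", "T"})
    return -5 if transition else -7


def local_align(x, y, d=-5):
    """Return (best score, aligned part of x, aligned part of y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:  # remember the highest score anywhere in the matrix
                best, bi, bj = F[i][j], i, j
    # Traceback from the highest score until a 0 is reached.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(local_align("AGC", "AAG"))
```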

Multiple local alignments. Traceback from the highest score, marking each DP matrix score along the traceback. Now traceback from the remaining highest score, etc. The alignments may or may not include the same parts of the two sequences.

Local alignment. Two differences from global alignment: If a DP score is negative, replace it with 0. Traceback from the highest score in the matrix and continue until you reach 0. Global alignment algorithm: Needleman-Wunsch. Local alignment algorithm: Smith-Waterman.

Another example. Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the substitution matrix from the simple example. Completed DP matrix:

              A    A    G
         0    0    0    0
    G    0    0    0    2
    A    0    2    2    0
    A    0    2    4    0
    G    0    0    0    6
    G    0    0    0    2
    C    0    0    0    0

The highest score is 6; tracing back from it gives AAG aligned to AAG.

Compare with the best GLOBAL alignment (contrast with the best local alignment). Find the optimal global alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the substitution matrix from the simple example. Initialized DP matrix:

              A    A    G
         0   -5  -10  -15
    G   -5
    A  -10
    A  -15
    G  -20
    G  -25
    C  -30
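To contrast the two modes on this pair of sequences, a single hypothetical helper `best_score` can run either recurrence. It is a sketch under the example's stated scoring (d = -5 and the nucleotide substitution matrix); the `local` flag and function names are illustrative choices, not part of the slides.

```python
def s(a, b):
    """Nucleotide scores from the example: match 2, A/G or C/T -5, otherwise -7."""
    if a == b:
        return 2
    transition = {a, b} in ({"A", "G"}, {"C", "T"})
    return -5 if transition else -7


def best_score(x, y, d=-5, local=False):
    """Best alignment score: Smith-Waterman if local, else Needleman-Wunsch."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, n + 1):
            F[i][0] = i * d
        for j in range(1, m + 1):
            F[0][j] = j * d
    floor = 0 if local else float("-inf")  # local scores never drop below 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(floor,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    if local:
        return max(max(row) for row in F)  # highest score anywhere
    return F[n][m]                         # lower right corner


global_score = best_score("GAAGGC", "AAG", local=False)
local_score = best_score("GAAGGC", "AAG", local=True)
print("global:", global_score, "local:", local_score)
```

The global score is dragged down by the three gaps needed to span all of GAAGGC, while the local score counts only the matching AAG region.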

Summary. Global alignment algorithm: Needleman-Wunsch. Local alignment algorithm: Smith-Waterman.

Using sequence alignment to study evolution

Are these proteins related?

SEQ 1: RVVNLVPS--FWVLDATYKNYAINYNCDVTYKLY
           L P      L   Y N    Y C     L
SEQ 2: QFFPLMPPAPYFILATDYENLPLVYSCTTFFWLF
score = -1. The intuitive answer: NO?

SEQ 1: RVVNLVPS--FWVLDATYKNYAINYNCDVTYKLY
           L P    W LDATYKN A  Y C     L
SEQ 2: QFFPLMPPAPYWILDATYKNLALVYSCTTFFWLF
score = 15. The intuitive answer: PROBABLY?

SEQ 1: RVVNLVPS--FWVLDATYKNYAINYNCDVTYKLY
       RVV L PS   W LDATYKNYA  Y CDVTYKL
SEQ 2: RVVPLMPSAPYWILDATYKNYALVYSCDVTYKLF
score = 37. The intuitive answer: YES?

Significance of scores. Two sequences, HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT and LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE, go into the alignment algorithm, which returns a score of 45. Low score = unrelated; high score = related. But how high is high enough? The answer is subjective, problem specific and parameter specific.

The null hypothesis. We want to know how surprising a given score is, assuming that the two sequences are not related. This assumption is called the null hypothesis. The purpose of most statistical tests is to determine whether the observed result provides a reason to reject the null hypothesis. We want to characterize the distribution of scores from pairwise sequence alignments.

Sequence similarity score distribution. Search a database of unrelated sequences using a given query sequence. What will be the form of the resulting distribution of pairwise alignment scores? [Figure: histogram with sequence comparison score on the x-axis and frequency on the y-axis]
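One way to get a feel for such a distribution is to simulate it. The sketch below is an illustration under loud assumptions: shuffled copies of a target sequence stand in for a database of unrelated sequences, and the scoring (match +2, mismatch -1, gap d = -2) is a hypothetical placeholder rather than a real protein substitution matrix. The function names `local_score` and `null_scores` are also hypothetical.

```python
import random
from collections import Counter


def local_score(x, y, match=2, mismatch=-1, d=-2):
    """Best Smith-Waterman score under a simple match/mismatch scheme (assumed)."""
    prev = [0] * (len(y) + 1)
    best = 0
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(0, diag, prev[j] + d, cur[j - 1] + d))
        best = max(best, max(cur))
        prev = cur
    return best


def null_scores(query, target, trials=200, seed=0):
    """Scores of the query against shuffled targets (a stand-in for unrelated sequences)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    letters = list(target)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(local_score(query, "".join(letters)))
    return scores


query = "HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT"
target = "LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE"
scores = null_scores(query, target)

# Text histogram: score on the left, frequency as a bar.
for score, count in sorted(Counter(scores).items()):
    print(f"{score:3d} {'#' * count}")
print("observed score:", local_score(query, target))
```

Comparing the observed score against this simulated histogram is the kind of "how surprising is it?" question the null hypothesis is meant to answer.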