Introduction to sequence alignment. Local alignment: the Smith-Waterman algorithm

Lecture 2, 12/3/2003: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties. Local alignment: the Smith-Waterman algorithm

Computational sequence analysis
The major goal of computational sequence analysis is to predict the function and structure of genes and proteins from their sequence. This is possible because organisms evolve by mutation, duplication and selection of their genes. Thus, sequence similarity often indicates functional and structural similarity.

Sequence alignment
  5' ATCAGAGTC 3'
  5' TTCAGTC 3'
  ATC CTA AG GA etc.

Sequence alignment
[Figure: ATCAGAGTC compared with TTCAGTC at several shifts and with the gapped form TTCA--GTC; matching positions are marked "+" and gapped positions "^".]
We wish to identify which regions of the two sequences are most similar to each other. The sequences are shifted against each other and gaps are introduced, so as to cover all possible alignments. The shifts and gaps provide the steps by which one sequence can be converted into the other.

Sequence alignment: dot-plot
[Figure: dot-plot of ATCAGAGTC against TTCAGTC, with the alignment TCAGAGTC / TCA--GTC.]

Sequence alignment scoring
Substitution matrix: the similarity value between each pair of residues.

     A  C  G  T
  A  2  0  0  0
  C  0  2  0  0
  G  0  0  2  0
  T  0  0  0  2

Gap penalty: the cost of introducing gaps. Here the gap penalty is -2 for each gapped residue.

  ATCAGAGTC
  TTCA--GTC
   +++^^+++    score: 0+2+2+2-2-2+2+2+2 = 8
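The scoring rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `score_alignment` and its keyword defaults are assumptions chosen to match the slide (2 for a match, 0 for a mismatch, -2 for each gapped residue).

```python
def score_alignment(top, bottom, match=2, mismatch=0, gap=-2):
    """Sum the column scores of two aligned strings of equal length.

    '-' marks a gap; every gapped residue costs `gap` (hypothetical defaults
    mirror the slide's matrix and gap penalty).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # residue aligned with a gap
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


print(score_alignment("ATCAGAGTC", "TTCA--GTC"))  # 0+2+2+2-2-2+2+2+2 = 8
```

Changing `match`, `mismatch` or `gap` shows directly how the same alignment is valued under a different scoring scheme.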

Sequence alignment: Needleman-Wunsch global alignment
Substitution matrix: 2 for a match, 0 for a mismatch (A, C, G, T). Gap penalty: -2.
Pairwise substitution scores for the two sequences:

     A  T  C  A  G  A  G  T  C
  T  0  2  0  0  0  0  0  2  0
  T  0  2  0  0  0  0  0  2  0
  C  0  0  2  0  0  0  0  0  2
  A  2  0  0  2  0  2  0  0  0
  G  0  0  0  0  2  0  2  0  0
  T  0  2  0  0  0  0  0  2  0
  C  0  0  2  0  0  0  0  0  2

Position 3,2 can be reached in three ways:
  from [T2 T1]:  ATC  / -TT
  from [C3 T1]:  ATC- / --TT
  from [T2 T2]:  ATC  / TT-
Figure labels: Initialization; step types [a b], [a -], [- b].

Sequence alignment: Needleman-Wunsch global alignment
Substitution matrix: 2 for a match, 0 for a mismatch (A, C, G, T). Gap penalty: -2.
Initialization: the first row and the first column hold the cumulative gap penalties. The other cells still show the pairwise substitution scores.

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   2   0   0   0   0   0   2   0
T  -4   0   2   0   0   0   0   0   2   0
C  -6   0   0   2   0   0   0   0   0   2
A  -8   2   0   0   2   0   2   0   0   0
G -10   0   0   0   0   2   0   2   0   0
T -12   0   2   0   0   0   0   0   2   0
C -14   0   0   2   0   0   0   0   0   2

Figure labels: Initialization; Directionality of score calculation; step types [a b], [a -], [- b].

Sequence alignment: Needleman-Wunsch global alignment
Substitution matrix: 2 for a match, 0 for a mismatch (A, C, G, T). Gap penalty: -2.
The first two sequence rows are filled in. The other cells still show the pairwise substitution scores.

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6   0   0   2   0   0   0   0   0   2
A  -8   2   0   0   2   0   2   0   0   0
G -10   0   0   0   0   2   0   2   0   0
T -12   0   2   0   0   0   0   0   2   0
C -14   0   0   2   0   0   0   0   0   2

Sequence alignment: Needleman-Wunsch global alignment
Substitution matrix: 2 for a match, 0 for a mismatch (A, C, G, T). Gap penalty: -2.
The fill continues into the third sequence row: its first cell (C against A) is now -4.

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   2   0   0   0   0   0   2
A  -8   2   0   0   2   0   2   0   0   0
G -10   0   0   0   0   2   0   2   0   0
T -12   0   2   0   0   0   0   0   2   0
C -14   0   0   2   0   0   0   0   0   2

Sequence alignment: Needleman-Wunsch global alignment
Substitution matrix: 2 for a match, 0 for a mismatch (A, C, G, T). Gap penalty: -2.
The third cell of the third sequence row (C against C) is now 4.

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   0   0   0   0   0   2
A  -8   2   0   0   2   0   2   0   0   0
G -10   0   0   0   0   2   0   2   0   0
T -12   0   2   0   0   0   0   0   2   0
C -14   0   0   2   0   0   0   0   0   2

Sequence alignment: Needleman-Wunsch algorithm
σ[a b] : score of aligning a pair of residues a and b
σ[a -] : score of aligning residue a with a gap (gap penalty: -q)
S : score matrix
S(i,j) : optimal score of aligning residues at positions 1 to i of one sequence with residues at positions 1 to j of the other sequence

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ← 0
for j ← 1 to N do
    S(0,j) ← S(0,j-1) + σ[- b_j]
for i ← 1 to M do {
    S(i,0) ← S(i-1,0) + σ[a_i -]
    for j ← 1 to N do
        S(i,j) ← max( S(i-1,j-1) + σ[a_i b_j],
                      S(i-1,j) + σ[a_i -],
                      S(i,j-1) + σ[- b_j] )
}

Pearson & Miller, Meth Enz 210:575, 1992
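The recurrence above can be sketched in Python as follows. This is an illustrative version under stated assumptions: the function name `nw_score_matrix` is hypothetical, the substitution score is reduced to a match/mismatch pair, and every gapped residue costs the same `gap` value (2, 0 and -2 by default, as in the lecture's nucleotide example).

```python
def nw_score_matrix(a, b, match=2, mismatch=0, gap=-2):
    """Fill the Needleman-Wunsch matrix S for sequences a (rows) and b (columns).

    S[i][j] is the best score for aligning a[:i] with b[:j]; each gapped
    residue costs `gap` (assumed linear gap model).
    """
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):            # first row: b[:j] aligned to gaps only
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap      # first column: a[:i] aligned to gaps only
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,   # align a_i with b_j
                          S[i - 1][j] + gap,       # align a_i with a gap
                          S[i][j - 1] + gap)       # align b_j with a gap
    return S


S = nw_score_matrix("TTCAGTC", "ATCAGAGTC")
print(S[-1][-1])  # optimal global score: 8
```

The bottom-right cell holds the optimal global score; the rest of the matrix is only needed if the alignment itself has to be recovered.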

Sequence alignment: Needleman-Wunsch global alignment
The optimal score(s) are found first; more steps are needed to find the corresponding alignment(s). This is a time-saving property in database searches and other applications where the score alone is sufficient: only a single pass through the alignment matrix is needed.

Needleman-Wunsch global alignment: the traceback
Substitution matrix: 2 for a match, 0 for a mismatch (A, C, G, T). Gap penalty: -2.

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

  ATCAGAGTC
  TTC--AGTC    Score: 2 x 6 - 2 x 2 = 8

  ATCAGAGTC
  TTCAG--TC    Score: 2 x 6 - 2 x 2 = 8
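A traceback over the filled matrix can be sketched as below. This is one possible version, not the lecture's own code: the name `needleman_wunsch` is hypothetical, the scoring defaults follow the slide, and ties are broken by preferring the diagonal step, so only one of the co-optimal alignments is returned.

```python
def needleman_wunsch(a, b, match=2, mismatch=0, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Walk back from the bottom-right cell, re-deriving which step produced each score.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + sub:      # step [a b]
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:    # step [a -]
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:                                         # step [- b]
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ATCAGAGTC", "TTCAGTC")
print(score)
print(top)
print(bottom)
```

To enumerate every co-optimal alignment rather than one, the traceback would have to branch wherever more than one of the three conditions holds.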

Sequence alignment: Needleman-Wunsch global alignment
The algorithm calculates the score(s) of optimal global sequence alignments, penalizes end gaps, and penalizes each residue in a gap equally.

  ATCAGAGTC    has a lower score than    CAGAGTC
  --TTCAGTC                              TTCAGTC

  ATCACAGTC    has the same score as     ATCACAGTC
  T-C--AGTC                              T---CAGTC

  ATCACAGTC    has a lower score than    ACACAGTC
  T---CAGTC                              T--CAGTC

Sequence alignment: Needleman-Wunsch global alignment
To score gaps with a penalty q that is independent of the gap length, i.e. so that

  ACACAGTC    ATCACAGTC    AGCTTTCACAGTC
  T--CAGTC    T---CAGTC    T-------CAGTC

all have the same score, the algorithm we presented is modified to extend alignments in more than the three ways we considered.

Sequence alignment: Needleman-Wunsch global alignment
[Figure: the grid of pairwise substitution scores for ATCAGAGTC (columns) and TTCAGTC (rows), annotated with the step types [a b], [a -] and [- b].]

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ← 0
for j ← 1 to N do
    S(0,j) ← -q
for i ← 1 to M do {
    S(i,0) ← -q
    for j ← 1 to N do
        S(i,j) ← max( S(i-1,j-1) + σ[a_i b_j],
                      max{S(0,j) ... S(i-1,j)} - q,
                      max{S(i,0) ... S(i,j-1)} - q )
}

Pearson & Miller, Meth Enz 210:575, 1992
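A direct, unoptimized Python sketch of this length-independent gap variant follows. The function name `nw_constant_gap` and the match/mismatch defaults are assumptions for illustration; the row and column maxima are recomputed for every cell, which keeps the code close to the pseudocode at the cost of extra running time.

```python
def nw_constant_gap(a, b, match=2, mismatch=0, q=2):
    """Global alignment score where a gap of any length costs q (assumed model)."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        S[0][j] = -q                      # one leading gap, whatever its length
    for i in range(1, m + 1):
        S[i][0] = -q
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            best_in_column = max(S[k][j] for k in range(i))   # S(0,j) ... S(i-1,j)
            best_in_row = max(S[i][k] for k in range(j))      # S(i,0) ... S(i,j-1)
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          best_in_column - q,
                          best_in_row - q)
    return S[m][n]


for seq in ("ACACAGTC", "ATCACAGTC", "AGCTTTCACAGTC"):
    print(seq, nw_constant_gap(seq, "TCAGTC"))   # each pair scores 8
```

With the 2/0 matrix and q = 2, the three sequence pairs from the preceding slide all reach the same optimal score, which is the behaviour the modification is meant to produce.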

Sequence alignment: Needleman-Wunsch global alignment caveats
Every algorithm is limited by the model it is built upon. For example, the NW dynamic programming algorithm guarantees optimal global alignments for the parameters we supply (substitution matrix, gap penalty and gap scoring model). However: different parameters can give different alignments; the correct alignment might not be the optimal one; and the correct alignment might correspond to only part of the global alignment.

More details, sources and things to do for next class
Source: Pearson WR & Miller W "Dynamic programming algorithms for biological sequence comparison." Methods in Enzymology, 210:575-601 (1992).
Assignment: Calculate NW alignments with a constant gap penalty and observe the effect of different gap penalties and match/mismatch scores. In all cases use substitution matrices that have only two types of scores: a value for an exact match and a lower value for mismatches. Try the nucleotide sequences used in class and the following amino acid sequences: ACDGSMF & AMDFR.

Local sequence alignments
Local sequence alignments are necessary in cases of: modular organization of genes and proteins (exons, domains, etc.); repeats; and sequences that have diverged so that similarity was retained, or can be detected, only in some sub-regions.

Modular organization of genes
[Figure: gene A, gene B, gene C; gene W, gene X, gene Y, gene Z.]
Advanced Topics in Bioinformatics, Weizmann Institute of Science, spring 2003

Modular protein organization
Adapted from Henikoff et al., Science 278:609, 1997
[Figure: domain layouts of the TLK receptor tyrosine-kinase and the TEK receptor tyrosine-kinase. Domain labels in the figure: Kringle domain, IG domain, Protein-kinase domain, FN3 domain, EGF domain.]

Modular protein organization
[Figure: 1KAP, a secreted calcium-binding alkaline protease, showing the calcium-binding repeats and the protease domain.]

Local sequence alignment

Local sequence alignment
For local sequence alignment we wish to find which regions (sub-sequences) of the compared pair of sequences give the best alignment scores with the parameters we supply (substitution matrix, gap penalty and gap scoring model). The aligned regions may be anywhere along the sequences. More than one region might be aligned with a score above the threshold.

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ← 0
for j ← 1 to N do
    S(0,j) ← -q
for i ← 1 to M do {
    S(i,0) ← -q
    for j ← 1 to N do
        S(i,j) ← max( S(i-1,j-1) + σ[a_i b_j],
                      max{S(0,j) ... S(i-1,j)} - q,
                      max{S(i,0) ... S(i,j-1)} - q )
}

Step types shown next to the three terms: [a b], [a -], [- b].

Local sequence alignment: Smith-Waterman algorithm
σ[a b] : score of aligning a pair of residues a and b
-q : gap penalty
S(i,j) : optimal score of an alignment ending at residues i,j
best : highest score in the score matrix (S)

Local sequence alignment: Smith-Waterman algorithm

best ← 0
S(0,0) ← 0
for j ← 1 to N do
    S(0,j) ← 0
for i ← 1 to M do {
    S(i,0) ← 0
    for j ← 1 to N do {
        S(i,j) ← max( S(i-1,j-1) + σ[a_i b_j],
                      max{S(0,j) ... S(i-1,j)} - q,
                      max{S(i,0) ... S(i,j-1)} - q,
                      0 )
        best ← max( S(i,j), best )
    }
}

Pearson & Miller, Meth Enz 210:575, 1992
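The Smith-Waterman recurrence above can be sketched in Python like this. The function name `smith_waterman_score` and the defaults (1 for a match, -1 for a mismatch, q = 2, with a gap of any length costing q) are assumptions chosen to mirror the lecture's worked example; the code favours readability over speed.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, q=2):
    """Return (best, S): the best local score and the full score matrix.

    S[i][j] is the best score of a local alignment ending at a[i-1], b[j-1];
    a gap of any length costs q (assumed length-independent gap model).
    """
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]    # row 0 and column 0 stay 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            best_in_column = max(S[k][j] for k in range(i))   # S(0,j) ... S(i-1,j)
            best_in_row = max(S[i][k] for k in range(j))      # S(i,0) ... S(i,j-1)
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          best_in_column - q,
                          best_in_row - q,
                          0)                                  # restart the alignment
            best = max(best, S[i][j])
    return best, S


best, S = smith_waterman_score("GTCAGTCA", "ATCAGAGTC")
print(best)  # 4
```

The extra `0` in the maximum is what makes the alignment local: any prefix that would drag the score below zero is dropped and the alignment starts afresh.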

Local sequence alignment: Smith-Waterman algorithm. Finding the optimal alignment
Substitution matrix: 1 for a match, -1 for a mismatch (A, C, G, T). Gap penalty: -2.

      A  T  C  A  G  A  G  T  C
   0  0  0  0  0  0  0  0  0  0
G  0  0  0  0  0  1  0  1  0  0
T  0  0  1  0  0  0  0  0  2  0
C  0  0  0  2  0  0  0  0  0  3
A  0  1  0  0  3  1  1  1  1  1
G  0  0  0  0  1  4  2  2  2  2
T  0  0  1  0  1  2  3  1  3  1
C  0  0  0  2  1  2  1  2  1  4
A  0  1  0  0  3  2  3  1  1  2

The optimal local alignment is:

  ATCAGAGTC
  GTCAG--TCA
   ++++^^++    score: 1+1+1+1-2+1+1 = 4

Local sequence alignment: Smith-Waterman algorithm. Finding the optimal alignment
Substitution matrix: 1 for a match, -1 for a mismatch (A, C, G, T). Gap penalty: -2. Score threshold: 3.

      A  T  C  A  G  A  G  T  C
   0  0  0  0  0  0  0  0  0  0
G  0  0  0  0  0  1  0  1  0  0
T  0  0  1  0  0  0  0  0  2  0
C  0  0  0  2  0  0  0  0  0  3
A  0  1  0  0  3  1  1  1  1  1
G  0  0  0  0  1  4  2  2  2  2
T  0  0  1  0  1  2  3  1  3  1
C  0  0  0  2  1  2  1  2  1  4
A  0  1  0  0  3  2  3  1  1  2

Local sequence alignment: Smith-Waterman algorithm. Finding the optimal alignment
Substitution matrix: 1 for a match, -1 for a mismatch (A, C, G, T). Gap penalty: -2.
Remove the scores of the current optimal alignment (ATCAGAGTC / GTCAG--TCA) and then recalculate the matrix to find the next best alignment(s). In the grid of pairwise scores below, the cells of the optimal alignment are set to 0.

      A  T  C  A  G  A  G  T  C
   0  0  0  0  0  0  0  0  0  0
G  0 -1 -1 -1 -1  1 -1  1 -1 -1
T  0 -1  0 -1 -1 -1 -1 -1  1 -1
C  0 -1 -1  0 -1 -1 -1 -1 -1  1
A  0  1 -1 -1  0 -1  1 -1 -1 -1
G  0 -1 -1 -1 -1  0 -1  1 -1 -1
T  0 -1  1 -1 -1 -1 -1 -1  0 -1
C  0 -1 -1  1 -1 -1 -1 -1 -1  0
A  0  1 -1 -1  1 -1  1 -1 -1 -1

Local sequence alignment: Smith-Waterman algorithm. Finding the sub-optimal alignment
Substitution matrix: 1 for a match, -1 for a mismatch (A, C, G, T). Gap penalty: -2. Score threshold: 3.

      A  T  C  A  G  A  G  T  C
   0  0  0  0  0  0  0  0  0  0
G  0  0  0  0  0  1  0  1  0  0
T  0  0  0  0  0  0  0  0  2  0
C  0  0  0  0  0  0  0  0  0  3
A  0  1  0  0  0  0  1  0  0  0
G  0  0  0  0  0  0  0  2  0  0
T  0  0  1  0  0  0  0  0  0  0
C  0  0  0  2  0  0  0  0  0  0
A  0  1  0  0  3  1  1  1  1  1

Sub-optimal alignment of ATCAGAGTC and GTCAGTCA: +++ : 1+1+1 = 3

Local sequence alignment: Smith-Waterman algorithm
For the algorithm to identify local alignments, the score for aligning unrelated sequence segments should typically be negative. Otherwise, true optimal local alignments will be extended beyond their correct ends, or will have lower scores than longer alignments between unrelated regions. Alignment scores are determined by the substitution matrix, the gap penalties and the gap scoring model.

Alignment scoring schemes: gap models
For each model, the example is the gap in the alignment ATCACA / T---CA, where g = 3 (g is the number of gapped residues).
Gap scoring in constant proportion to the gap length: σ = -q g. Example: σ = -3q.
Gap scoring by a constant penalty, independent of the gap length: σ = -q. Example: σ = -q.
Affine gap scoring (gap opening penalty d and gap extension penalty e): σ = -(d + e (g-1)). Example: σ = -(d + 2e).
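The three gap models can be written as small Python functions to compare them side by side. The function names and the sample values of q, d and e are assumptions for illustration only; g is the number of gapped residues, as on the slide.

```python
def linear_gap(g, q):
    """Gap score proportional to the gap length: -q * g."""
    return -q * g


def constant_gap(g, q):
    """Gap score independent of the gap length: -q for any gap of length >= 1."""
    return -q if g > 0 else 0


def affine_gap(g, d, e):
    """Affine gap score: -(d + e * (g - 1)), opening penalty d, extension penalty e."""
    return -(d + e * (g - 1)) if g > 0 else 0


# The gap in ATCACA / T---CA has g = 3.
print(linear_gap(3, q=2))        # -6, i.e. -3q
print(constant_gap(3, q=2))      # -2, i.e. -q
print(affine_gap(3, d=4, e=1))   # -6, i.e. -(d + 2e)
```

Setting e = d in the affine model gives the proportional model, and e = 0 gives the length-independent one, so the affine form covers both as special cases.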

Local sequence alignment: Smith-Waterman algorithm
If the alignment scores of unrelated sequences are mainly or solely determined by the substitution scores, then such alignments will have negative scores if the expected substitution score is negative:

$$\sum_{i,j} p_i \, p_j \, s_{ij} < 0$$

where i and j are residues, $p_i$ is the frequency of residue i, and $s_{ij}$ is the score of aligning residues i and j.
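The condition can be checked numerically. The sketch below assumes uniform nucleotide frequencies (0.25 each), which is an illustrative choice and not a value given in the lecture; the helper name `expected_score` is likewise hypothetical. It compares the 1/-1 matrix used in the Smith-Waterman example with the 2/0 matrix used in the Needleman-Wunsch example.

```python
def expected_score(freqs, score):
    """Sum of p_i * p_j * s_ij over all ordered residue pairs (i, j)."""
    return sum(freqs[i] * freqs[j] * score(i, j) for i in freqs for j in freqs)


uniform = {base: 0.25 for base in "ACGT"}   # assumed background frequencies


def unitary(i, j):
    """1 for a match, -1 for a mismatch."""
    return 1 if i == j else -1


def match_only(i, j):
    """2 for a match, 0 for a mismatch."""
    return 2 if i == j else 0


print(expected_score(uniform, unitary))     # -0.5: negative, as required
print(expected_score(uniform, match_only))  #  0.5: positive
```

Under these assumed frequencies the 1/-1 matrix satisfies the condition, while the 2/0 matrix does not, which suggests why the local-alignment example switches to a matrix with negative mismatch scores.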

Local sequence alignment: Smith-Waterman algorithm
We can easily identify substitution matrices that will not give positive scores to random alignments. However, we have no analytical way of finding which gap scores will satisfy the requirement that random alignment scores be less than or equal to zero and still produce local sequence alignments. Nevertheless, certain scoring schemes (substitution matrix and gap scores) were found to give satisfactory local alignments.

More details, sources and things to do for next lecture
Sources: Pearson & Miller "Dynamic programming algorithms for biological sequence comparison." Methods in Enz., 210:575-601 (1992); Altschul "Amino acid substitution matrices from an information theoretic perspective." J Mol Biol 219:555-565 (1991); Henikoff "Scores for sequence searches and alignments." Curr Opin Struct Biol 6:353-360 (1996).
Assignment: Read the source articles for this lecture. They have more details on the material we covered and introduce topics for the next lectures. Calculate S for the sequences presented in class, using the unitary matrix (1 for match, -1 for mismatch) and the constant gap penalty model with q=-1, -2 or -4.

More details, sources and things to do for next lecture
For those who are not acquainted with information theory or want to be certain they know the basics of it: An information theory primer for molecular biologists, http://www.lecb.ncifcrf.gov/~toms/paper/primer

Next lecture, 12/12/2001: Substitution matrices: amino-acid features and empirical matrices. BLAST and FASTA: algorithms and statistics; assumptions and associated artifacts