# CSE 549: Computational Biology. Substitution Matrices

Size: px
Start display at page:

Transcription

1 CSE 9: Computational Biology Substitution Matrices

2 How should we score alignments So far, we ve looked at arbitrary schemes for scoring mutations. How can we assign scores in a more meaningful way? Are these scores better than these scores? A C G T A C G T A C G T A C G T - - -

3 How should we score alignments So far, we ve looked at arbitrary schemes for scoring mutations. How can we assign scores in a more meaningful way? Are these scores A C G T A better than these scores? A C G T A C C One option learn the substitution / mutation rates from real data G G T T - - -

4 How should we score alignments Main Idea: Assume we can obtain (through a potentially burdensome process) a collection of high quality, high confidence sequence alignments. We have a collection of sequences which, presumably, originated from the same ancestor differences are mutations due to divergence. Learn the frequency of different mutations from these alignments, and use the frequencies to derive our scoring function.

5 BLOSUM matrix Brick, Kevin, and Elisabetta Pizzi. "A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins." BMC bioinformatics 9. (8):.

6 Probabilities to Scores Assuming we have a reasonable process by which to compute frequencies, how can we use this to obtain a score?

7 Probabilities to Scores Assuming we have a reasonable process by which to compute frequencies, how can we use this to obtain a score? Hypothesis we wish to test; two amino acids are correlated because they are homologous. score = log odds ratio = s AB / log observed expected Null hypothesis; two amino acids occur independently (and are uncorrelated and unrelated). Eddy, Sean R. "Where did the BLOSUM alignment score matrix come from?." Nature biotechnology.8 (): -.

8 Probabilities to Scores score = log odds ratio = s AB / log observed expected ratio - score - Positive scores mean we find conservative substitutions Negative scores mean we find nonconservative substitutions Eddy, Sean R. "Where did the BLOSUM alignment score matrix come from?." Nature biotechnology.8 (): -.

9 BLOSUM matrix Introduced by Henikoff & Henikoff in 99 Start with the BLOCKS database (H&H 9). Look for conserved (gapless, >=% identical) regions in alignments.. Count all pairs of amino acids in each column of the alignments.. Use amino acid pair frequencies to derive score for a mutation/replacement Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

10 } BLOSUM matrix Start with the BLOCKS database (H&H 9). Look for conserved (gapless) regions in alignments. } collection of related proteins conserved block within these proteins sequences too similar are clustered & represented by either a single sequence, or a weighted combination of the cluster members BLOSUM r: the matrix built from blocks with no more than r% of similarity e.g., BLOSUM is the matrix built using sequences with no less than % similarity.* Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

11 BLOSUM matrix Start with the BLOCKS database (H&H 9). Count all pairs of amino acids in each column of the alignments. FPTADAGGRS FVTADALGRS FPTPDAGLRN FVTAEAGIRQ FPTAEAGGRS c (i) AB = c (i) A 8 < : c (i) A A c(i) B c (i) if A = B otherwise = num. of occurrences of A in column i What is the intuition behind this expression? Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

12 BLOSUM matrix Start with the BLOCKS database (H&H 9). Count all pairs of amino acids in each column of the alignments. FPTADAGGRS FVTADALGRS FPTPDAGLRN FVTAEAGLRQ FPTAEAGGRS c (i) GG = c (i) Example: = GL = c (i) LL = = In this column, there are ways to pair G with G, potential ways to pair G with L and potential way to pair L with L. Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

13 Computing Scores. Use amino acid pair frequencies to derive score for a mutation/replacement Total # of potential align. between A & B: c AB = X i c (i) AB Total number of pairwise char. alignments: T = X A B c AB Normalized frequency of aligning A & B: q AB = c AB T

14 BLOSUM matrix FPTADAGGRS FVTADALGRS FPTPDAGLRN FVTAEAGLRQ FPTAEAGGRS In our example, we get q GL = ()() = Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

15 BLOSUM matrix FPTADAGGRS FVTADALGRS FPTPDAGLRN FVTAEAGLRQ FPTAEAGGRS In our example, we get q GL = ()() = why does this denominator work? Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

16 BLOSUM matrix FPTADAGGRS FVTADALGRS FPTPDAGLRN FVTAEAGLRQ FPTAEAGGRS cvp = * = cpp = choose = cvv = choose = So cvp + cpp +cvv = = choose In our example, we get q GL = ()() = why does this denominator work? Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

17 BLOSUM matrix FPTADAGGRS FVTADALGRS FPTPDAGLRN FVTAEAGLRQ FPTAEAGGRS In our example, we get q GL = ()() = total column sum is always # rows choose Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

18 Computing Scores. Use amino acid pair frequencies to derive score for a mutation/replacement Probability of occurrence of amino acid A in any {A,B} pair: p A = q AA + X A=B q AB Expected likelihood of each {A,B} pair, assuming independence: ( (p A )(p B )=(p A ) if A = B e AB = (p A )(p B )+(p B )(p A )=(p A )(p B ) otherwise

19 Computing Scores Recall the original idea (likelihood scores) score = log odds ratio = s AB / log observed expected score = log odds ratio = s AB = Round log qab e AB! Scaling factor used to produce scores that can be rounded to integers; set to. in H&H 9. Henikoff, S.; Henikoff, J.G. (99). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (): 9 99.

20 Scores are data-dependent distribution of amino acids across columns matters GG GA WG WA NG GA GA pg =. egg =. qgg =. sgg = Round[()log(. /.)] = Round[()(-.)] = GW GA GW GA GN GA GA pg =. egg =. qgg =. sgg = Round[()log(. /.)] = Round[()()] = adopted from:

21 Scores are data-dependent {G,W} observed a lot GG GA WG AW NG GA GA {G,W} observed rarely GW GA GW GA GN GA AG pg =. pw =. pg =. pw =. egw =. qgw =.7 sgw = Round[()log(.7 /.)] = Round[()(.)] = egw =. qgw =.8 sgw = Round[()log(.8 /.)] = Round[()(-.7)] = - adopted from:

22 Example FPTADAGGRS FVTADALGRS FPTPDAGLRN FVTAEAGLRQ FPTAEAGGRS c AB = X i c (i) AB Matrix of cab values A D E F G L N P Q R S T V A D E F G 9 L N P Q R S T V

23 Example Matrix of qab values cab q AB = c AB T p A = q AA + X A=B AB q AB A D E F G L N P Q R S T V A. D. E.. F. G.9 L.. N P.. Q. R. S... T. V.. P A P D P E P F P G P L P N P P P Q P R P S P T P V

24 Example Matrix of eab values A D E F G L N P Q R S T V A. D.. E..8. F...8. G L N P Q R S T V pa qab e AB = ( (p A )(p B )=(p A ) if A = B (p A )(p B )+(p B )(p A )=(p A )(p B ) otherwise

25 Example s AB = Round qab eab log qab e AB! Matrix of scores A D E F G L N P Q R S T V A D E 7 F 7 G L N P Q 7 R 7 S 7 7 T 7 V Blank cells are missing data (i.e. no observed values); wouldn t happen with sufficient training data.

26 Dealing with sequence redundancy E.g., for BLOSUM-8, group sequences that are >8% similar (slide from Michael Gribskov)

27 Point Accepted Mutation Matrix Introduced by Margaret Dayhoff in 978 Observed mutation probabilities between amino acids over 7 families of closely related proteins (8% sequence identity within a family) Based on a Markov mutation model; The PAM is a unit of evolutionary mutation. PAM is the unit for which mutation to occurs per amino acids (this varies e.g. by species). The PAM matrix express the log odds ratio of the likelihood of a point accepted mutation from one amino acid to another to the likelihood that these amino acids were aligned by chance. PAM matrix slides below courtesy of Didier Gonze ( Picture from:

28 PAM scoring matrices The substitution score is expected to depend on the rate of divergence between sequences. accepted mutations The PAM matrices derived by Dayhoff (978): are based on evolutionary distances. have been obtained from carefully aligned closely related protein sequences (7 gapless alignments of sequences having at least 8% similarity). M. Dayhoff Reference: Dayhoff et al. (978). A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, vol., suppl.,. National Biomedical Research Foundation, Silver Spring, MD, 978.

29 PAM scoring matrices PAM = Percent (or Point) Accepted Mutation The PAM matrices are series of scoring matrices, each reflecting a certain level of divergence: PAM = unit of evolution ( PAM = mutation/ amino acid) PAM PAM PAM proteins with an evolutionary distance of % mutation/position idem for % mutations/position % mutations/position (a position could mutate several times) Reference: Dayhoff et al. (978). A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, vol., suppl.,. National Biomedical Research Foundation, Silver Spring, MD, 978.

30 Derivation of the PAM matrices To illustrate how the PAM substitution matrices have been derived, we will consider the following artificial ungapped aligned sequences: A C G H D B G H A D I J C B I J Reference: Borodovsky & Ekisheva (7) Problems and Solutions in Biological sequence analysis. Cambridge Univ Press.

31 Derivation of the PAM matrices Phylogenetic trees (maximum parsimony) ABGH ABIJ H-J G-I J-H!I-G ABGH ABIJ ABGH ABIJ B-C A-D B-D A-C ACGH DBGH ADIJ CBIJ B-C A-D B-D A-C ACGH DBGH ADIJ CBIJ ABIH ABGJ I-G H-J J-H G-I ABGH ABIJ ABGH ABIJ B-C A-D B-D A-C ACGH DBGH ADIJ CBIJ B-C A-D B-D A-C ACGH DBGH ADIJ CBIJ Here are represented the four more parsimonious (minimum of substitutions) phylogenetic trees for the alignment given above.

32 Derivation of the PAM matrices Matrix of accepted point mutation counts (A) J I H G D C B A J I H G D C B A For each pair of different amino acids (i,j), the total number a ij of substitutions from i to j along the edges of the phylogenetic tree is calculated. (they are indicated in blue on the previous slide)

33 Derivation of the PAM matrices Each edge of a given tree is associated with the ungapped alignment of the two sequences connected by this edge. Thus, any tree shown above generates alignments. For example the first phylogenetic tree generates the following alignments: ABGH H-J G-I A B G H A B G H A B G H A B G H A B I J A C G H ABGH ABIJ B-C A-D B-D A-C ACGH DBGH ADIJ CBIJ A B G H A B I J A B I J D B G H A D I J C B I J Those alignments can be used to assess the "relative mutability" of each amino acid.

34 Derivation of the PAM matrices Relative mutability (m i ) The relative mutability is defined by the ratio of the total number of times that amino acid j has changed in all the pair-wise alignments (in our case x= alignments) to the number of times that j has occurred in these alignments, i.e. alns / tree # of trees m j = number of changes of j number of occurrences of j Relative amino acid mutability values m j for our example Amino acid Changes (substitutions) A 8 B 8 Frequency of occurrence 8 8 Relative mutability m j The relative mutability accounts for the fact that the different amino acids have different mutation rates. This is thus the probability to mutate. I H G J C 8 D 8

35 Derivation of the PAM matrices Relative mutability of the amino acids aa m i aa m i Trp and Cys are less mutable Asn Ser Asp Glu Ala Thr 97 His Arg Lys Pro Gly Tyr 9 Cys is known to have several unique, indispensable function (attachment site of heme group in cytochrome and of FeS clusters in ferredoxin). It also forms cross-links such as in chymotrypsin or ribonuclease. Big groups like Trp or Phe are less mutable due to their particular chemistry. On the other extreme, the low mutability of Cys must be due to its unique smallness that is advatageous in many places. Ile 9 Phe Met 9 Leu Asn, Ser, Asp and Glu are most mutable Gln Val 9 7 Cys Trp 8 Values according Dayhoff (978) The value for Ala has been arbitrarily set at. Although Ser sometimes functions in the active center, it more often performs a function of lesser importance, easily mimicked by several other amino acids of similar physical and chemical properties.

36 Derivation of the PAM matrices Effective frequency (f i ) The notion of effective frequency f i takes into account the difference in variability of the primary structure conservation in proteins with different functional roles. Two alignment blocks corresponding to different families may contribute differently to f i even if the number of occurrence of amino acid j in these blocks is the same. " relative frequency of % \$ ' = # exposure to mutation& " average composition% " number of mutations in% \$ ' ( \$ ' # of each group & # the corresponding tree&

37 Derivation of the PAM matrices Effective frequency (f i ) The effective frequency is defined as where f j = k " b (b q ) j N (b ) the sum is taken over all alignment blocks b q j (b) is the observed frequency of amino acid j in block b, N (b) is the number of substitutions in a tree built for b and the coefficient k is chosen the ensure that the sum of the frequences f j =. In our example, there is only one block, therefore the effective frequencies are equal to the compositional frequencies (f i = q j ): Amino acid A B I H G J C D Frequency f

38 Derivation of the PAM matrices Effective frequency of the amino acids determined for the original alignment data (Dayhoff et al., 978) Amino acid Gly Ala Leu Lys Ser Val Thr Frequency f Amino acid Pro Glu Asp Arg Asn Phe Gln Frequency f Amino acid Ile His Cys Tyr Met Trp Frequency f Source: Dayhoff, 978 Distribution of amino acids found in 8 peptides and proteins listed in the Atlas of Protein Sequence and Structure (98). Doolittle RF (98) Similar amino acid sequences: chance or common ancestry? Science. :9-9.

39 Derivation of the PAM matrices Mutational probability matrix (M) Let's define M ij the probability of the amino acid in column j having been substituted by an amino acid in row i over a given evolutionary time unit. Non-diagonal elements of M: Diagonal elements of M: M ij = "m j A ij A kj # k M ii =" #m i In these equations, m is the relative mutability and A is the matrix of accepted point mutations. The constant λ represents a degree of freedom that could be used to connect the matrix M with an evolutionary time scale. In our example: see matrix A A A B C D this represents / of the cases this represents 8/ of the cases mutability m If A is mutated, the probability that it is mutated into D is A DA /(A BA +A CA +A DA ) = /8 Thus the probability that A is mutated into D is: M DA = /8 * 8/ = / and the probability that A is not mutated is: M AA = - 8/ = /

40 Derivation of the PAM matrices Mutational probability matrix (M) Let's define M ij the probability of the amino acid in column j having been substituted by an amino acid in row i over a given evolutionary time unit. Non-diagonal elements of M: Diagonal elements of M: M ij = "m j A ij A kj # k M ii =" #m i In these equations, m is the relative mutability and A is the matrix of accepted point mutations. The constant λ represents a degree of freedom that could be used to connect the matrix M with an evolutionary time scale. The coefficient λ could be adjusted to ensure that a specific (small) number of substitutions would occur on average per hundred residues. This adjustement was done by Dayhoff et al in the following way. The expected number of amino acids that will remain inchanged unchanged in a protein sequence amino acid long is given by: " f j M jj =" f j (# \$m j ) j If only one substitution per residue is allowed, then λ is calculated from the equation: \$ f j (" #m j ) = 99 j j

41 For every amino acids We want 99 of them to remain unchanged. \$ j f j (" #m j ) = 99 Average probability that amino acids will not mutate

42 Derivation of the PAM matrices Mutational probability matrix In our example, λ =. and the mutation probability matrix (PAM) is: A B C D G H I J A B C...97 D...97 G.997. H.997. I..997 J..997 Note that M is a non-symmetric matrix.

43 Derivation of the PAM matrices Mutational probability matrix derived by Dayhoff for the amino acids V 99 Y 997 W T S 99 8 P F M K 997 L I H 99 7 G 98 7 E Q 997 C 989 D 9 98 N R A V Y W T S P F M K L I H G E Q C D N R A For clarity, the values have been multiplied by Source: Dayhoff, 978 This matrix corresponds to an evolution time period giving mutation/ amino acids, and is refered to as the PAM matrix.

44 Derivation of the PAM matrices Mutational probability matrix derived by Dayhoff for the amino acids This matrix is the mutation probability matrix for an evolution time of PAM. The diagonal represents the probability to still observe the same residue after PAM. Therefore the diagonal represents the 99% of the case of non-mutation. Note that this does not mean that there was no mutation during this time interval. Indeed, the conservation of a residue could reflect either a conservation during the whole period, or a succession of two or more mutations ending at the initial residue Source: J. van Helden

45 Derivation of the PAM matrices From PAM to PAM line of PAM column 7 of PAM => Matrix product: PAM = PAM x PAM Source: J. van Helden

46 Derivation of the PAM matrices From PAM to PAM, PAM, PAM, etc... Remark (from graph theory) a b c d a b c a b c d d Matrix Q indicates the number of paths going from one node to another in step a b c a b c d Matrix Q indicates the number of paths going from one node to another in steps d Source: J. van Helden a b c d a b c d Matrix Q n indicates the number of paths going from one node to another in n steps

47 Derivation of the PAM matrices From PAM to PAM, PAM, PAM, etc... Similarly: PAM gives the probability to observe the changes i j per mutations PAM = PAM gives the probability to observe the changes i j per mutations PAM = PAM gives the probability to observe the changes i j per mutations PAM = PAM gives the probability to observe the changes i j per mutations PAMn = PAM n gives the probability to observe the changes i j per xn mutations. Convergence: it can be verified that PAM = PAM converges to the observed frequencies: \$ f A f A... f A ' & ) f lim M n = & R f R... f R ) n "# & ) & ) % f V f V... f V ( Dayhoff et al. (978) checked this convergence by computing M.

48 Derivation of the PAM matrices PAM derived by Dayhoff for the amino acids V Y W T S P F M K L I H G E Q C D N R A V Y W T S P F M K L I H G E Q C D N R A For clarity, the values have been multiplied by Source: Dayhoff, 978 This matrix corresponds to an evolution time period giving mutation/ amino acids (i.e. an evolutionary distance of PAM), and is refered to as the PAM matrix.

49 Derivation of the PAM matrices Interpretation of the PAM matrix In comparing sequences at this evolutionary distance ( PAM), there is: * * * * A * * * * * PAM * * * * A * * * * * probability of % * * * * R * * * * * probability of % * * * * N * * * * * probability of % * * * * W * * * * * probability of %... Source: Dayhoff, 978

50 Derivation of the PAM matrices From probabilities to scores So far, we have obtained a probability matrix, but we would like a scoring matrix. A score should reflect the significance of an alignment occurring as a result of an evolutionary process with respect to what we could expect by chance. A score should involve the ratio between the probability derived from nonrandom (evolutionary) to random models: r n (i, j) = M n ji f j = P ji,n f i f j probability to see a pair (i,j) due to evolution probability to see a pair (i,j) by chance The matrix M ji n is the mutational probability matrices at PAM distance n. Matrices M and M have been shown before. P ji,n = f i M ji n is the probability that two aligned amino acids have diverged from a common ancestor n/ PAM unit ago, assuming that the substitutions follow a Markov process (for details, see Borodovsky & Ekisheva, 7). Note that R (the odd-score or relatedness matrix) is a symmetric matrix.

51 Derivation of the PAM matrices Log-odd scores In practice, we often use the log-odd scores defined by s n (i, j) = log M n ji f j = log P ji,n f i f j This definition has convenient practical consequences: A positive score (s n > ) characterizes the accepted mutations A negative score (s n < ) characterizes the unfavourable mutations Another property of the log-odd scores is that they can be added to produce the score of an alignment: T A H G K Y S D G D S alignment = s(t,y) + s(a,s) + s(h,d) + s(g,g) + s(k,d)

52 Derivation of the PAM matrices Cys C Ser S PAM matrix (log-odds) Thr T - Pro P - Ala A - Gly G - - Asn N - - Asp D - - Glu E - - Gln Q His H Arg R Lys K Met M Ile I Leu L Val V Phe F Tyr Y Trp W C S T P A G N D E Q H R K M I L V F Y W Cys Ser Thr Pro Ala Gly Asn Asp Glu Gln His Arg Lys Met Ile Leu Val Phe Tyr Trp Hydrophobic C P A G M I L V Aromatic H F Y W Polar S T N Q Y Basic H R K Acidic D E Source: J. van Helden

53 PAM matrices: exercise The original PAM substitution matrix scores a substitution of Gly by Arg by a negative score - (decimal logarithm and scaling factor are used, with rounding to the nearest neighbour). The average frequency of Arg in the protein sequence database is.. Use this information as well as the method described above to estimate the probability that Gly will be substituted by Arg after a PAM time period. Source: Borodovsky & Ekisheva (7)

54 PAM matrices: exercise The original PAM substitution matrix scores a substitution of Gly by Arg by a negative score - (decimal logarithm and scaling factor are used, with rounding to the nearest neighbour). The average frequency of Arg in the protein sequence database is.. Use this information as well as the method described above to estimate the probability that Gly will be substituted by Arg after a PAM time period. The element s ij of the PAM substitution matrix and the frequency of amino acid j (f j ) in a protein sequence database are connected by the following formula: # s ij = % log \$ P(i " j in PAM) f j Therefore, the probability of substitution of Gly by Arg is: & ( ' P(Gly " Arg in PAM) =.# \$. =. Source: Borodovsky & Ekisheva (7)

55 Derivation of the PAM matrices Scoring an alignment A scoring matrix like PAM can be used to score an alignment T A H G K Y S D G D S alignment = s(t,y) + s(a,s) + s(h,d) + s(g,g) + s(k,d) = =

56 Choosing the appropriate PAM matrix How to choose the appropriate PAM matrix? Correspondance between the observed percent of amino acid difference d between the evolutionary distance n (in PAM) between them: # j f j M n jj = " d identity (%) 99 difference d (%) PAM index n twilight zone (detection limit) 8 8

57 Choosing the appropriate PAM matrix How to choose the appropriate PAM matrix? Altschul SF(99) Amino acid substitution matrices from an information theoretic perspective. J Mol Biol. 9:-. PAM matrix is the most appropriate for database searches PAM matrix is the most appropriate for comparing two specific proteins with suspected homology Remark: In the PAM matrices, the index indicates the percentage of substitution per position. Higher indexes are more appropriate for more distant proteins (PAM better than PAM for distant proteins).

58 Other Scoring Matrices PAM vs. BLOSUM from:

59 Other Scoring Matrices PAM vs. BLOSUM PAM BLOSUM Based on global alignments of closely related proteins. Based on local alignments of protein segments. PAM is the matrix calculated from comparisons of sequences with no more than % divergent Other PAM matrices are extrapolated from PAM Larger numbers in name denote larger evolutionary distance Based on explicit, Markovian, model of evolution BLOSUM is calculated from comparisons of sequences no less than % identical Other BLOSUM matrices are not extrapolated, but computed based on observed alignments at different identity percentage Larger numbers in name denote higher sequence similarity (& therefore smaller evolutionary distance) Not based on any explicit model of evolution, but learned empirically from alignments from:

60 What about gap penalties? Despite some work +, the setting of gap penalties is still much more arbitrary than the selection of a substitution matrix. Gap penalty values are designed to reduce the score when an alignment has been disturbed by indels. The value should be small enough to allow a previously accumulated alignment to continue with an insertion of one of the sequences, but should not be so large that this previous alignment score is removed completely. Changing the gap function can have significant effects on reported alignments. People often resort to defaults to avoid having to justify a choice. +Reese, J. T., and William R. Pearson. "Empirical determination of effective gap penalties for sequence comparison." Bioinformatics 8. (): -7.

### Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

### Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement

### Substitution matrices

Introduction to Bioinformatics Substitution matrices Jacques van Helden Jacques.van-Helden@univ-amu.fr Université d Aix-Marseille, France Lab. Technological Advances for Genomics and Clinics (TAGC, INSERM

### Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

### Quantifying sequence similarity

Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

### Similarity or Identity? When are molecules similar?

Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are

### Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)

### M.O. Dayhoff, R.M. Schwartz, and B. C, Orcutt

A Model of volutionary Change in Proteins M.O. Dayhoff, R.M. Schwartz, and B. C, Orcutt n the eight years since we last examined the amino acid exchanges seen in closely related proteins,' the information

### Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

### Local Alignment Statistics

Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison

### Scoring Matrices. Shifra Ben-Dor Irit Orr

Scoring Matrices Shifra Ben-Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison

### CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

### Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

### Sequence analysis and comparison

The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

### Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

### Exercise 5. Sequence Profiles & BLAST

Exercise 5 Sequence Profiles & BLAST 1 Substitution Matrix (BLOSUM62) Likelihood to substitute one amino acid with another Figure taken from https://en.wikipedia.org/wiki/blosum 2 Substitution Matrix (BLOSUM62)

### Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

### Pairwise sequence alignments

Pairwise sequence alignments Volker Flegel VI, October 2003 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs VI, October

### Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel )

Pairwise sequence alignments Vassilios Ioannidis (From Volker Flegel ) Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs Importance

### Computational Biology

Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

### CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

### Pairwise sequence alignment

Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

### In-Depth Assessment of Local Sequence Alignment

2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

### Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

### Statistical Distributions of Optimal Global Alignment Scores of Random Protein Sequences

BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. The fully-formatted PDF version will become available shortly after the date of publication, from the

### Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

### Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary

### Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

### Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene

### Practical considerations of working with sequencing data

Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

### First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences

First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences 140.638 where do sequences come from? DNA is not hard to extract (getting DNA from a

### Introduction to Comparative Protein Modeling. Chapter 4 Part I

Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature

### BLAST: Target frequencies and information content Dannie Durand

Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences

### Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

### Sequence Database Search Techniques I: Blast and PatternHunter tools

Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered

### Supplemental Materials for. Structural Diversity of Protein Segments Follows a Power-law Distribution

Supplemental Materials for Structural Diversity of Protein Segments Follows a Power-law Distribution Yoshito SAWADA and Shinya HONDA* National Institute of Advanced Industrial Science and Technology (AIST),

### THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

### BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University

BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University Measures of Sequence Similarity Alignment with dot

### Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas Informal inductive proof of best alignment path onsider the last step in the best

### 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

### Proteins: Characteristics and Properties of Amino Acids

SBI4U:Biochemistry Macromolecules Eachaminoacidhasatleastoneamineandoneacidfunctionalgroupasthe nameimplies.thedifferentpropertiesresultfromvariationsinthestructuresof differentrgroups.thergroupisoftenreferredtoastheaminoacidsidechain.

### Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Bioch/BIMS 503 Lecture 2 Structure and Function of Proteins August 28, 2008 Robert Nakamoto rkn3c@virginia.edu 2-0279 Secondary Structure Φ Ψ angles determine protein structure Φ Ψ angles are restricted

### Sequence comparison: Score matrices

Sequence comparison: Score matrices http://facultywashingtonedu/jht/gs559_2013/ Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best

### Algorithms in Bioinformatics

Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

### What makes a good graphene-binding peptide? Adsorption of amino acids and peptides at aqueous graphene interfaces: Electronic Supplementary

Electronic Supplementary Material (ESI) for Journal of Materials Chemistry B. This journal is The Royal Society of Chemistry 21 What makes a good graphene-binding peptide? Adsorption of amino acids and

### Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best alignment path onsider the last step in

### Packing of Secondary Structures

7.88 Lecture Notes - 4 7.24/7.88J/5.48J The Protein Folding and Human Disease Professor Gossard Retrieving, Viewing Protein Structures from the Protein Data Base Helix helix packing Packing of Secondary

### Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1

### Sequence Based Bioinformatics

Structural and Functional Analysis of Inosine Monophosphate Dehydrogenase using Sequence-Based Bioinformatics Barry Sexton 1,2 and Troy Wymore 3 1 Bioengineering and Bioinformatics Summer Institute, Department

### Week 10: Homology Modelling (II) - HHpred

Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative

### An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

### Structure and evolution of the spliceosomal peptidyl-prolyl cistrans isomerase Cwc27

Acta Cryst. (2014). D70, doi:10.1107/s1399004714021695 Supporting information Volume 70 (2014) Supporting information for article: Structure and evolution of the spliceosomal peptidyl-prolyl cistrans isomerase

### C H E M I S T R Y N A T I O N A L Q U A L I F Y I N G E X A M I N A T I O N SOLUTIONS GUIDE

C H E M I S T R Y 2 0 0 0 A T I A L Q U A L I F Y I G E X A M I A T I SLUTIS GUIDE Answers are a guide only and do not represent a preferred method of solving problems. Section A 1B, 2A, 3C, 4C, 5D, 6D,

### Clustering and Model Integration under the Wasserstein Metric. Jia Li Department of Statistics Penn State University

Clustering and Model Integration under the Wasserstein Metric Jia Li Department of Statistics Penn State University Clustering Data represented by vectors or pairwise distances. Methods Top- down approaches

### SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

### ... and searches for related sequences probably make up the vast bulk of bioinformatics activities.

1 2 ... and searches for related sequences probably make up the vast bulk of bioinformatics activities. 3 The terms homology and similarity are often confused and used incorrectly. Homology is a quality.

### Exam III. Please read through each question carefully, and make sure you provide all of the requested information.

09-107 onors Chemistry ame Exam III Please read through each question carefully, and make sure you provide all of the requested information. 1. A series of octahedral metal compounds are made from 1 mol

### Large-Scale Genomic Surveys

Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

### Pairwise Sequence Alignment

Introduction to Bioinformatics Pairwise Sequence Alignment Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Outline Introduction to sequence alignment pair wise sequence alignment The Dot Matrix Scoring

### 114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 9 Protein tertiary structure Sources for this chapter, which are all recommended reading: D.W. Mount. Bioinformatics: Sequences and Genome

### Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

### A New Similarity Measure among Protein Sequences

A New Similarity Measure among Protein Sequences Kuen-Pin Wu, Hsin-Nan Lin, Ting-Yi Sung and Wen-Lian Hsu * Institute of Information Science Academia Sinica, Taipei 115, Taiwan Abstract Protein sequence

### InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

### Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

Cell communication channel Bioinformatics Methods Iosif Vaisman Email: ivaisman@gmu.edu SEQUENCE STRUCTURE DNA Sequence Protein Sequence Protein Structure Protein structure ATGAAATTTGGAAACTTCCTTCTCACTTATCAGCCACCT...

### Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

### Physiochemical Properties of Residues

Physiochemical Properties of Residues Various Sources C N Cα R Slide 1 Conformational Propensities Conformational Propensity is the frequency in which a residue adopts a given conformation (in a polypeptide)

### Scoring Matrices. Shifra Ben Dor Irit Orr

Scoring Matrices Shifra Ben Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison

### Supplementary Figure 3 a. Structural comparison between the two determined structures for the IL 23:MA12 complex. The overall RMSD between the two

Supplementary Figure 1. Biopanningg and clone enrichment of Alphabody binders against human IL 23. Positive clones in i phage ELISA with optical density (OD) 3 times higher than background are shown for

### Properties of amino acids in proteins

Properties of amino acids in proteins one of the primary roles of DNA (but not the only one!) is to code for proteins A typical bacterium builds thousands types of proteins, all from ~20 amino acids repeated

### Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

### Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

### C E N T R. Introduction to bioinformatics 2007 E B I O I N F O R M A T I C S V U F O R I N T. Lecture 5 G R A T I V. Pair-wise Sequence Alignment

C E N T R E F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U Introduction to bioinformatics 2007 Lecture 5 Pair-wise Sequence Alignment Bioinformatics Nothing in Biology makes sense except in

### Other Methods for Generating Ions 1. MALDI matrix assisted laser desorption ionization MS 2. Spray ionization techniques 3. Fast atom bombardment 4.

Other Methods for Generating Ions 1. MALDI matrix assisted laser desorption ionization MS 2. Spray ionization techniques 3. Fast atom bombardment 4. Field Desorption 5. MS MS techniques Matrix assisted

### 8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009

8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number

### Moreover, the circular logic

Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT

### 8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011

8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number

### NMR Assignments using NMRView II: Sequential Assignments

NMR Assignments using NMRView II: Sequential Assignments DO THE FOLLOWING, IF YOU HAVE NOT ALREADY DONE SO: For Mac OS X, you should have a subdirectory nmrview. At UGA this is /Users/bcmb8190/nmrview.

### Amino Acid Side Chain Induced Selectivity in the Hydrolysis of Peptides Catalyzed by a Zr(IV)-Substituted Wells-Dawson Type Polyoxometalate

Amino Acid Side Chain Induced Selectivity in the Hydrolysis of Peptides Catalyzed by a Zr(IV)-Substituted Wells-Dawson Type Polyoxometalate Stef Vanhaecht, Gregory Absillis, Tatjana N. Parac-Vogt* Department

### bioinformatics 1 -- lecture 7

bioinformatics 1 -- lecture 7 Probability and conditional probability Random sequences and significance (real sequences are not random) Erdos & Renyi: theoretical basis for the significance of an alignment

### Ramachandran Plot. 4ysz Phi (degrees) Plot statistics

B Ramachandran Plot ~b b 135 b ~b ~l l Psi (degrees) 5-5 a A ~a L - -135 SER HIS (F) 59 (G) SER (B) ~b b LYS ASP ASP 315 13 13 (A) (F) (B) LYS ALA ALA 315 173 (E) 173 (E)(A) ~p p ~b - -135 - -5 5 135 (degrees)

### Peptides And Proteins

Kevin Burgess, May 3, 2017 1 Peptides And Proteins from chapter(s) in the recommended text A. Introduction B. omenclature And Conventions by amide bonds. on the left, right. 2 -terminal C-terminal triglycine

### EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

### Lecture Notes: Markov chains

Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

### Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

### Supplementary Information. Broad Spectrum Anti-Influenza Agents by Inhibiting Self- Association of Matrix Protein 1

Supplementary Information Broad Spectrum Anti-Influenza Agents by Inhibiting Self- Association of Matrix Protein 1 Philip D. Mosier 1, Meng-Jung Chiang 2, Zhengshi Lin 2, Yamei Gao 2, Bashayer Althufairi

### Supporting information to: Time-resolved observation of protein allosteric communication. Sebastian Buchenberg, Florian Sittel and Gerhard Stock 1

Supporting information to: Time-resolved observation of protein allosteric communication Sebastian Buchenberg, Florian Sittel and Gerhard Stock Biomolecular Dynamics, Institute of Physics, Albert Ludwigs

### Chapter 2: Sequence Alignment

Chapter 2: Sequence lignment 2.2 Scoring Matrices Prof. Yechiam Yemini (YY) Computer Science Department Columbia University COMS4761-2007 Overview PM: scoring based on evolutionary statistics Elementary

### Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

### Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

### Course Notes: Topics in Computational. Structural Biology.

Course Notes: Topics in Computational Structural Biology. Bruce R. Donald June, 2010 Copyright c 2012 Contents 11 Computational Protein Design 1 11.1 Introduction.........................................

### Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out

### Similarity searching summary (2)

Similarity searching / sequence alignment summary Biol4230 Thurs, February 22, 2016 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 What have we covered? Homology excess similiarity but no excess similarity

### MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL

MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL MATTHEW W. DIMMIC*, DAVID P. MINDELL RICHARD A. GOLDSTEIN* * Biophysics Research Division Department of Biology and

### Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

### Sequence Comparison. mouse human

Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity

### Sequential resonance assignments in (small) proteins: homonuclear method 2º structure determination

Lecture 9 M230 Feigon Sequential resonance assignments in (small) proteins: homonuclear method 2º structure determination Reading resources v Roberts NMR of Macromolecules, Chap 4 by Christina Redfield