Approximate P-Values for Local Sequence Alignments: Numerical Studies. JOHN D. STOREY and DAVID SIEGMUND ABSTRACT
|
|
- Clifton Richard
- 5 years ago
- Views:
Transcription
1 JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 5, 200 Mary Ann Liebert, Inc. Pp Approximate P-Values for Local Sequence Alignments: Numerical Studies JOHN D. STOREY and DAVID SIEGMUND ABSTRACT Siegmund and Yair (2000) have given an approximate p-value when two independent, identically distributed sequences from a nite alphabet are optimally aligned based on a scoring system that rewards similarities according to a general scoring matrix and penalizes gaps (insertions and deletions). The approximation involves an in nite sequence of dif cultto-compute parameters. In this paper, it is shown by numerical studies that these reduce to essentially two numerically distinct parameters, which can be computed as one-dimensional numerical integrals. For an arbitrary scoring matrix and af ne gap penalty, this modi ed approximation is easily evaluated. Comparison with published numerical results show that it is reasonably accurate. Key words: local alignment, af ne gap penalty, p-value, Marov renewal theory.. INTRODUCTION Comparison of a new gene (DNA sequence) or protein (amino acid sequence) with the existing sequences in a database can be an important rst step in learning the structure and/or function of the new sequence. Local pairwise alignment of sequences plays an important role in such database searches. The quality of an alignment is determined by the total similarity score of letters at aligned positions and the number of gaps (insertions/deletions) in each sequence. (These concepts will be de ned precisely in what follows.) Smith and Waterman (98) and Altschul et al. (990) have developed rapid and effective computational algorithms for local pairwise comparison of DNA or amino acid sequences. In addition, one would lie to evaluate the statistical signi cance of sequences showing a particular level of similarity (e.g., Arratia et al., 990; Dembo et al., 994; Altschul and Gish, 996). Approximate p-values when gaps are not allowed have been given by a number of authors, e.g., Arratia et al. (990), for a special scoring function and more generally by Dembo et al. (994). The form of the approximation in the ungapped case has been conjectured to be valid also in the gapped case (Waterman and Vingron, 994) and has been empirically t to simulated and to real data to obtain values for parameters in the approximating formula (cf. Waterman and Vingron, 994; Altschul and Gish, 996). While these approximations seem to be reasonably accurate, development of each approximation requires substantial statistical analysis and computation that go beyond the basic data of the problem: the scoring matrix and the gap costs. More recent results in this direction have been obtained by Altschul et al. (200). Department of Statistics, Stanford University, Stanford, CA
2 550 STOREY AND SIEGMUND For af ne gap costs, Mott and Tribe (999) obtain a useful approximation by an interesting heuristic argument. Assuming the gap open penalty is suf ciently large, Siegmund and Yair (2000) provide a somewhat different approximation, which depends on in nitely many parameters. They also conjecture that for numerical purposes a further approximation depending on two of these parameters would suf ce. The purpose of this paper is to show that a) this conjecture is correct, b) the modi ed approximation is reasonably accurate, and c) it can be rapidly computed directly from basic data of the problem. In addition to better theoretical understanding of how the p-value depends on the parameters of the problem, this approximation requires effectively zero computing time, so it could be implemented on line for a scoring matrix of the user s choice. Section 2 gives our notation and states the formal approximation of Siegmund and Yair (2000), which depends on the parameters 0; ; : Section 3 explains the probabilistic meaning of these parameters. Section 4 contains simulations based on the representations of Section 3, which show for several common scoring matrices that the r for r are essentially constant. Using the two parameters 0 and 3 D lim r r, one obtains a much simpler approximation, which is shown numerically to be in reasonable agreement with the approximation of Atlschul and Gish (996) based on their empirically estimated parameters. Convenient integral representations for 0;3 are given in an appendix. 2. NOTATION AND APPROXIMATION Consider two nite sequences x and y from a nite alphabet. Thus, x D x x 2 x m and y D y y 2 y n with x i ;y j 2 A. We assume throughout that x ;:::;x m are independently distributed with P 0 fx i D g D ¹ for all i; and similarly y ;:::;y n are independently distributed with P 0 fy j D g Dº for all j. Moreover, the x s and y s are independent. A candidate alignment, z D f.i t ;j t / : t g, for some i < i 2 < < i m and j <j 2 < <j n, speci es that x it and y jt are aligned for all t D ; ;: The other x s with subscripts between i and i and the other y s with subscripts between j and j are said to be unaligned. Note that there may be other letters, at the beginning or end of the two sequences, that are neither aligned nor formally designated as unaligned. We assume that either i tc D i t C orj tc D j t C for all t<; i.e., there can be unaligned letters in only one sequence at a time. With each candidate alignment z we associate a score S z D S z.x; y/. Aligned letters x i and y j are scored according to a similarity matrix K.x i ;y j /: We assume a penalty of ± for each unaligned letter. The total number of unaligned letters is l D.i i C C j j C /. A gap is the interval of unaligned letters that begins with a value t C such that i t D j t and either i tc >i t C orj tc >j t C and ends with the next aligned pair after.x it ;y jt /. Each gap is assessed a cost in addition to the cost ± of each unaligned letter. Frequently one refers to as the gap open and ± as the gap extension cost. The score of the candidate alignment z is S z D S z.x; y/ D 6tD K.x i t ;y jt / j ±l, where j is the number of gaps and l is the total number of unaligned letters, or equivalently the total length of all gaps. Note that there are no costs assessed for letters that are neither aligned nor unaligned. Within the collection Z of all candidate alignments, one can identify the best alignment the one with the highest score. The p-value of the best score under the null assumption that the sequences x and y are independent random samples from the given alphabet is P 0.max z2z S z b/; () where b is the observed value of the score for the best alignment and P 0 is the null probability described above. We assume that E 0 K.x;y/ < 0 and P 0 fk.x;y/ > 0g > 0: (2) Let Ã.µ/ D log E 0 exp[µk.x;y/]. Then exp[µk. ; / Ã.µ/]¹ º de nes an exponential family of probabilities indexed by µ, which for µ D 0 reduces to the original probability Q 0. ; / D ¹ º. It follows from (2) by convexity that there is a unique value µ > 0 for which Ã.µ / D 0: Let ¹ 0 D Ã 0.µ />0:
3 LOCAL SEQUENCE ALIGNMENTS 55 Let Q. ; / D exp[µ K. ; /]Q 0. ; / and note that ¹ 0 D E K.x;y/: Also let Q i;j be de ned to be the product probability that gives x the marginal distribution it has under Q i and y the marginal distribution it has under Q j. We assume that E K.x;y/ E i;j K.x;y/ > 0 (3) for all i; j. This condition is easily checed and excludes the case that K. ; / is a sum of a function of and a function of. We shall require the technical assumptions that min.m; n/=b!and that for some 0 <²<, mnexpf.µ b/ ² g is bounded. The second assumption puts the probability of interest into the domain of large deviations. To simplify the presentation of the main approximation, we assume that K.x;y/ is a nonarithmetic random variable and discuss the more complicated arithmetic case below. The main result of Siegmund and Yair (2000) is as follows. Theorem. Assume conditions (2) and (3) hold and that µ D log.µ b/ C C (4) for some constant C. There exist parameters 0; ; in (0,] such that as b!, P 0 max S z b z2z X X mne µ b.µ ¹ 0 / 20 [.2be µ =¹ 0 / j =j!] jd0 ldj e µ ±l X i il j C l jc ; (5) where the innermost summation extends over the l j terms having i C Ci l jc D j and i C 2i 2 C C.l j C /i l jc D l: An upper bound for the indicated sums is =[exp.µ ±/ ]: Remars. (i) A principal consequence of the numerical studies reported below is that the constants r for r are essentially all equal, say, to a single parameter 3. Hence the right hand side of (5) can reasonably be approximated by the much simpler expression mn.µ ¹ 0 / 20 exp. µ bf 2.µ ¹ 0 / 3e µ =[e µ ± ]g/: (6) (ii) In applications, the random variables K.x i ;y j / will be arithmetic of span h, typically h D : In this case it does not seem possible to give a precise asymptotic expression for the desired probability, which can be shown to lie asymptotically in an interval bounded below by hµ =.e hµ / and above by. e hµ /=.hµ / times the quantitity on the right hand side of (5). (iii) As explained by Siegmund and Yair (2000) the condition (4) can be viewed as a diagnostic for the accuracy of the approximation (5). If we suppose that b and are given and (4) de nes the value of C, then one should be careful in applying (5) when C is small, say, C< : THE CONSTANTS l r, r 0,, 2, In the rest of this paper we assume that K.x i ;y j / is arithmetic with span h D : The constant 0 appears in the approximation of Dembo et al. (994), where gaps are not permitted. It has the same probabilistic meaning as the other r, but it is de ned relative to a random wal process rather than a Marov chain and consequently is easily evaluated numerically. It has been thoroughly discussed in the context of sequential analysis (e.g., Woodroofe, 982; Siegmund, 985). To be more precise and prepare for the more complex case of r.r ) given below, let P denote the probability under which.x ;y /;.x 2 ;y 2 /; are independent and identically distributed with the distribution Q de ned following display (2). The log lielihood ratio of the rst n pairs.x i ;y i / under P
4 552 STOREY AND SIEGMUND relative to P 0 is `n D µ S n, where S n D P n K.x i;y i /: It follows from the arguments of Siegmund and Yair (2000) that 0 D lim N which after summation by parts and a change of measure equals N E 0 [exp.max n N `n/]; (7) lim N. e µ / X j E [expf µ.s.j/ j/gi.j/ N]; (8) where.j/ D minfn : S n j g for j (and.0/ is similarly de ned with a strict inequality). Since ¹ 0 D E K.x ;y />0, by the strong law of large numbers P f.j/ N g converges to for j<. ²/N¹ 0 and to 0 for j>.c²/n¹ 0 for arbitrary 0 <²<: It follows from the renewal theorem that º 0 D lim E [expf µ.s.j/ j/g] (9) j exists. From (5) (7) and elementary analysis one obtains 0 D ¹ 0. e µ /º 0 : (0) In addition, º 0 can be expressed in terms of the distribution of the ladder variables..0/; S.0/ / (e.g., Siegmund, 985, Chapter 8) and then can be evaluated numerically as a one-dimensional inverse Fourier transform (cf. Woodroofe [982] for nonarithmetic variables and Tu and Siegmund [999] for the arithmetic case). For completeness the result of Tu and Siegmund (999) is stated in the appendix. Except for the numerical evaluation of º 0, the preceding argument applies with minor modi cations to the r for r : Consider two sequences x and y of length N, for some integers N!. Given r and, <N r, consider the alignment which matches the rst x s with the rst y s and the x s between C and N r with the y s between Cr C andn. Let P.r/ denote the probability measure that maes the aligned pairs x i ;y j independent and identically distributed with joint distribution Q and leaves the distribution of the other x i and y j unchanged. Let `.r/ denote the log lielihood ratio of P.r/ relative to P.r/,so`.r/ 0 D µ [ P K.x i;y i / C P N r C K.x i;y /]and`.r/ icr D µ P 2 [K.x i;y i / K.x i ;y icr /]. Siegmund and Yair (2000) obtain the representations (cf. (7)) r D lim N µ N E.r/ 0 exp.max `.r/ N / D lim N µ N E.r/ exp.max f`.r/ N g/ : () Let P.r/ denote the extension of P.r/ N r to the in nitely long sequence fx i;y i ; i<g: Note that D `.r/ D µ [K.x ;y / K.x ;y Cr /] is a function of x, y and y Cr only. Under P.r/ the are identically distributed; and.r/ is independent of any.r/ l for l such that.l mod r/ 6D. mod r/. For each 0 i<r, the process f.r/ : N r;. mod r/ D ig is a rst order Marov chain. Thus `.r/ D P 2.r/ i is an additive functional of an rth order Marov chain. By () and the argument leading from (7) to (0), we see that.r/.r/ r D ¹ r. e µ /º r ; (2) where ¹ r D E.r/ [K.x 2 ;y 2 / K.x 2 ;y 2Cr ] is constant in r, and the limit de ning º r (exactly as in (9)) is guaranteed to exist by the renewal theorem for additive functionals of a Marov chain applied to the process of ladder variables, e.g., Athreya et al. (978). Althoughwe obtain a representationstructurally identicalto (0), numerical computationsare much more complicated than for a random wal. Asmussen (989) has used a representation of the limit de ning º r in terms of ladder variables to evaluate such a parameter for a similar process. The problem is also discussed by Karlin and Dembo (992), although they do not appear to have developed a numerical algorithm. In the
5 LOCAL SEQUENCE ALIGNMENTS 553 following section we use the representation in () for simulating º r : In this regard, note that simulation requires only the generation of independent, identically distributed, discrete, random variables, while an analytic approach must deal with the Marovian dependence described above. From the probabilistic meaning of the º r, it seems natural to conjecture that they are effectively constant in r ; and since.r/ 2 ; ;.r/ rc are independent and identically distributed, with a distribution not depending on r, the limit of º r as r!can be evaluated in terms of a random wal (albeit a different random wal from that entering into º 0 above). 4. NUMERICAL RESULTS Our goal in this section is two-fold: (i) to show by simulation that it is reasonable to regard the constants r for r as a single constant, so that the complicated approximations given in Theorem can be replaced by the much simpler expresson (6), which is very easy to evaluate numerically via the algorithm given in the appendix; and (ii) to compare the results of our approximation to selected results in the literature reported by Altschul and Gish (996). We use the amino acid frequencies reported by Robinson and Robinson (99) and the scoring matrices BLOSUM62, BLOSUM50 (Henioff and Henioff (992), and PAM250 (e.g., Dayhoff, 978). Tables and 2 give estimated values of º r for the BLOSUM62 and PAM250 substitution matrices, respectively. The estimates are based on 50,000 repetitions of a Monte Carlo experiment and are given for various values of r. Since º r is represented as a limit as j! and large values of j require proportionately long Monte Carlo runs, the estimates are also tabled for different values of j. (The process increases at rate ¹, so a rough estimate, which ignores edge effects, of the mean value of `.r/.j/ is j=¹. Table 3 gives values of ¹.) Standard deviations of the estimates are about The estimates are essentially constant in both r and j. Very similar results have been obtained for the BLOSUM50 substitution matrix and hence are omitted. We conclude from these simulations that it is reasonable to use the simpli ed approximation (6) instead of the much more complicated (5). Table 3 gives the numerical values of different parameters that are required for implementation of the approximation (6). In particular, instead of º r for r, we propose to use º D lim r º r, which involves independent, identically distributed, random variables and hence can be computed by the method given in the appendix. Then 3 D ¹. e µ /º. For ease in maing numerical comparisons with published values, we continue to use the parameters in the form given above. We also give parameters relevant to the scale that Altschul and Gish (996) call nats, which arises from multiplying K given in an arbitrary scale by µ. An increase of one unit in the threshold a D µ b results in a change of the probability () by approximately the factor e. In this scale, the mean values ¹ r become the Kullbac Leibler information numbers I r D µ ¹ r.r D 0; /. In Table 3, the values given for º 0 and º have been obtained by the method given in the appendix, which requires a one-dimensional numerical integration. It is easily automated and very fast to evaluate. To the extent that the values of º r do change with r, the value given for º in Table 3 should be a better approximation for large r, when the increments of the process `.r/ are independent over long stretches. Although the value of º is close to the values given in Table, it is outside the range of Table. Simulated Values of º r for BLOSUM62 Matrix a r j a Estimates are based on 50,000 repetitions of a Monte Carlo experiment, for various values of j.
6 554 STOREY AND SIEGMUND Table 2. Simulated Values of º r for PAM250 Matrix a r j a Estimates are based on 50,000 repetitions of a Monte Carlo experiment, for various values of j. Table 3. Parameters of Selected Substitution Matrices BLOSUM62 BLOSUM50 PAM250 µ º ¹ I º ¹ I Table 4. p-values a K ± b p 0 6 BLOSUM BLOSUM PAM a Upper entry is our approximation; lower entry from Altschul and Gish (996); m D n D 500.
7 LOCAL SEQUENCE ALIGNMENTS 555 reasonable sampling variability. The values of º for the BLOSUM50 and PAM250 matrices are within the range of reasonable sampling variability of the Monte Carlo estimates for large r. To verify that these values are at least reasonable, one can compare them to the case of a random wal having normally distributed increments, for which there is the simple approximation º ¼ exp[ ½.2I/ =2 ]; where ½ D 0:583 (cf. Siegmund, 985). This gives º 0 ¼ 0:625 and º ¼ 0:503 for the BLOSUM62 matrix and similar results for the other two cases. For implementation of the approximation (6), we also mae edge corrections to m and n, by replacing m with m 0 D m b=¹ 0 and by a similar correction to n. This correction is justi ed by the observation that an (ungapped) interval that achieves a score of b is roughly of length b=¹ 0. Similar edge corrections are used by Altschul and Gish (996) and by Mott and Tribe (999). Table 4 gives p-values for m D n D 500 and selected values of ; ±,andb. The rst entry given is the approximation (6); the second uses the extreme value approximation suggested by Waterman and Vingron (994) and by Altschul and Gish (996) with the empirically determined parameters given in Altschul and Gish (996). Our approximation is typically about 2 5 times as large as the Altschul Gish approximation. Very similar results hold for other values of m D n in the range [300,000]. The practical advantage of our approximation is that it requires only the almost trivial evaluation of µ and related parameters and computation of two one dimensional numerical integrals. Hence it can be easily evaluated for a scoring matrix chosen by the user without restriction to the few cases for which the empirical research necessary to estimate the parameters of the Altschul Gish approximation has been carried out. ACKNOWLEDGMENTS This research was supported in part by National Science Foundation Grant DMS and by an NSF Graduate Research Fellowship. 5. APPENDIX Theorem 2. Let x ;x 2 ; ;x n be independent and identically distributed arithmetic random variables with positive mean ¹, nite positive variance ¾ 2, and span h. LetÁ.t/ be the characteristic function of x and.t/ D log. Á.t// D P nd [Á n.t/=n]. LetS n D P n id x i. Then for arbitrary >0, 8 t ± ¹ ² 9 X µ n E.e SC n / C log.¹=h/ D Z 2¼ >< t e h it C log h h. eit / >= dt 2¼ 0 h >: e h it C e it : >; nd Let S. / denote the left hand side of (). Siegmund (985, p. 75) shows that º 0 and º are of the form hexp[ S.µ /]=[ e µ h ]: The values given in Table 2 were obtained from this expression. (3) REFERENCES Altschul, S.F., Bundschuh, R., Olsen, R., and Hwa, T The estimation of statistical parameters for local alignment score distributions. Nucl. Acids Res. 29, Altschul, S.F., and Gish, W Local alignment statistics. Methods in Enzymology 266, Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J Basic local alignment search tool. J. Mol. Biol. 25, Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25,
8 556 STOREY AND SIEGMUND Arratia, R., Goldstein, L., and Gordon L Two moments suf ce for Poisson approximation: The Chen Stein method. Ann. Probab. 7, Arratia, R., Gordon, L., and Waterman, M.S The Erdös-Rényi Law in distribution for coin tossing and sequence matching. Ann. Statist. 8, Asmussen, S Ris theory in a Marovian environment. Scand. Actuarial J., Athreya, K.B., McDonald, D., and Ney, P Limit theorems for semi-marov processes and renewal theory for Marov chains. Ann. Probab. 6, Dayhoff, M.O Atlas of Protein Sequence and Structure, vol. 5, suppl. 3, National Biomedical Research Foundation, Washington, DC. Dembo, A., Karlin, S., and Zeitouni, O Limit distribution of maximal non-aligned two-sequence segmental score, Ann. Probab. 22, Henioff, S., and Henioff, J.G Amino acid substitution matrices from protein blocs. Proc. Nat. Acad. Sci. USA 89, Karlin, S., and Dembo, A Limit distributions of maximal segmental score among Marov-dependent partial sums. Adv. Appl. Probab. 24, Mott, R., and Tribe, R Approximate statistics of gapped alignments. J. Comp. Biol. 6, 9 2. Pearson, W.R Comparison of methods for searching protein databases. Protein Sci. 4, Robinson, A.B., and Robinson, L.R. 99. Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins. Proc. Nat. Acad. Sci. USA 88, Siegmund, D Sequential Analysis: Tests and Con dence Intervals. Springer-Verlag, New Yor. Siegmund, D., and Yair, B Approximate p-values for local sequence alignments. Ann. Statist. 28, Smith, T.F., and Waterman, M.S. 98. Identi cation of common molecular subsequences.j. Mol. Biol. 47, Tu, I.-P., and Siegmund, D The maximum of a function of a Marov chain and application to linage analysis. Adv. Appl. Probab. 3, Waterman, M Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman and Hall, London. Waterman, M., and Vingron, M Sequence comparison and Poisson approximation. Statistical Science 9, Woodroofe, M Nonlinear Renewal Theory in Sequential Analysis. SIAM, Philadelphia. Yair, B., and Polla, M A new representation for a renewal-theoretic constant appearing in asymptotic approximations of large deviations. Ann. Appl. Probab. 8, Address correspondence to: David Siegmund Stanford University Department of Statistics Sequoia Hall 40 Stanford, CA, dos@stat.stanford.edu
In-Depth Assessment of Local Sequence Alignment
2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.
More informationSingle alignment: Substitution Matrix. 16 march 2017
Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block
More informationSequence Database Search Techniques I: Blast and PatternHunter tools
Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered
More informationLocal Alignment Statistics
Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison
More informationGrundlagen der Bioinformatik, SS 08, D. Huson, May 2,
Grundlagen der Bioinformatik, SS 08, D. Huson, May 2, 2008 39 5 Blast This lecture is based on the following, which are all recommended reading: R. Merkl, S. Waack: Bioinformatik Interaktiv. Chapter 11.4-11.7
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationSequence Alignment Techniques and Their Uses
Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationBLAST: Target frequencies and information content Dannie Durand
Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationPacific Symposium on Biocomputing 4: (1999)
OPTIMIZING SMITH-WATERMAN ALIGNMENTS ROLF OLSEN, TERENCE HWA Department of Physics, University of California at San Diego La Jolla, CA 92093-0319 email: rolf@cezanne.ucsd.edu, hwa@ucsd.edu MICHAEL L ASSIG
More informationTiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1
Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with
More informationStatistical Signi cance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models ABSTRACT
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 3, 2001 Mary Ann Liebert, Inc. Pp. 249 282 Statistical Signi cance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models YI-KUO YU
More informationOptimization of a New Score Function for the Detection of Remote Homologs
PROTEINS: Structure, Function, and Genetics 41:498 503 (2000) Optimization of a New Score Function for the Detection of Remote Homologs Maricel Kann, 1 Bin Qian, 2 and Richard A. Goldstein 1,2 * 1 Department
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationBLAST: Basic Local Alignment Search Tool
.. CSC 448 Bioinformatics Algorithms Alexander Dekhtyar.. (Rapid) Local Sequence Alignment BLAST BLAST: Basic Local Alignment Search Tool BLAST is a family of rapid approximate local alignment algorithms[2].
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationA profile-based protein sequence alignment algorithm for a domain clustering database
A profile-based protein sequence alignment algorithm for a domain clustering database Lin Xu,2 Fa Zhang and Zhiyong Liu 3, Key Laboratory of Computer System and architecture, the Institute of Computing
More informationChapter 7: Rapid alignment methods: FASTA and BLAST
Chapter 7: Rapid alignment methods: FASTA and BLAST The biological problem Search strategies FASTA BLAST Introduction to bioinformatics, Autumn 2007 117 BLAST: Basic Local Alignment Search Tool BLAST (Altschul
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationLecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models
Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary
More informationHomology Modeling. Roberto Lins EPFL - summer semester 2005
Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More informationBiologically significant sequence alignments using Boltzmann probabilities
Biologically significant sequence alignments using Boltzmann probabilities P. Clote Department of Biology, Boston College Gasson Hall 416, Chestnut Hill MA 02467 clote@bc.edu May 7, 2003 Abstract In this
More informationMPIPairwiseStatSig: Parallel Pairwise Statistical Significance Estimation of Local Sequence Alignment
MPIPairwiseStatSig: Parallel Pairwise Statistical Significance Estimation of Local Sequence Alignment Ankit Agrawal, Sanchit Misra, Daniel Honbo, Alok Choudhary Dept. of Electrical Engg. and Computer Science
More informationStatistical Distributions of Optimal Global Alignment Scores of Random Protein Sequences
BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. The fully-formatted PDF version will become available shortly after the date of publication, from the
More informationCISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)
CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST
More informationSequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013
Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationGeneralized Affine Gap Costs for Protein Sequence Alignment
PROTEINS: Structure, Function, and Genetics 32:88 96 (1998) Generalized Affine Gap Costs for Protein Sequence Alignment Stephen F. Altschul* National Center for Biotechnology Information, National Library
More informationIntroduction to sequence alignment. Local alignment the Smith-Waterman algorithm
Lecture 2, 12/3/2003: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties Local alignment the Smith-Waterman algorithm 1 Computational
More informationLocal Alignment: Smith-Waterman algorithm
Local Alignment: Smith-Waterman algorithm Example: a shared common domain of two protein sequences; extended sections of genomic DNA sequence. Sensitive to detect similarity in highly diverged sequences.
More informationFinite Width Model Sequence Comparison
Finite Width Model Sequence Comparison Nicholas Chia and Ralf Bundschuh Department of Physics, Ohio State University, 74 W. 8th St., Columbus, OH 4, USA (Dated: May, 4) Sequence comparison is a widely
More informationA Practical Approach to Significance Assessment in Alignment with Gaps
A Practical Approach to Significance Assessment in Alignment with Gaps Nicholas Chia and Ralf Bundschuh 1 Ohio State University, Columbus, OH 43210, USA Abstract. Current numerical methods for assessing
More information1.5 Sequence alignment
1.5 Sequence alignment The dramatic increase in the number of sequenced genomes and proteomes has lead to development of various bioinformatic methods and algorithms for extracting information (data mining)
More informationStephen Scott.
1 / 21 sscott@cse.unl.edu 2 / 21 Introduction Designed to model (profile) a multiple alignment of a protein family (e.g., Fig. 5.1) Gives a probabilistic model of the proteins in the family Useful for
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationSEQUENCE ALIGNMENT AS HYPOTHESIS TESTING METHODS /(X) ¼ I Q(X)
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 5, 20 # Mary Ann Liebert, Inc. Pp. 677 69 DOI: 0.089/cmb.200.0328 Research Articles Sequence Alignment as Hypothesis Testing LU MENG, FENGZHU SUN, 2 XUEGONG
More informationSequence Analysis '17 -- lecture 7
Sequence Analysis '17 -- lecture 7 Significance E-values How significant is that? Please give me a number for......how likely the data would not have been the result of chance,......as opposed to......a
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationSequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas Informal inductive proof of best alignment path onsider the last step in the best
More informationPoisson Approximation for Two Scan Statistics with Rates of Convergence
Poisson Approximation for Two Scan Statistics with Rates of Convergence Xiao Fang (Joint work with David Siegmund) National University of Singapore May 28, 2015 Outline The first scan statistic The second
More informationE-value Estimation for Non-Local Alignment Scores
E-value Estimation for Non-Local Alignment Scores 1,2 1 Wadsworth Center, New York State Department of Health 2 Department of Computer Science, Rensselaer Polytechnic Institute April 13, 211 Janelia Farm
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationScoring Matrices. Shifra Ben-Dor Irit Orr
Scoring Matrices Shifra Ben-Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison
More informationFundamentals of database searching
Fundamentals of database searching Aligning novel sequences with previously characterized genes or proteins provides important insights into their common attributes and evolutionary origins. The principles
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationA New Similarity Measure among Protein Sequences
A New Similarity Measure among Protein Sequences Kuen-Pin Wu, Hsin-Nan Lin, Ting-Yi Sung and Wen-Lian Hsu * Institute of Information Science Academia Sinica, Taipei 115, Taiwan Abstract Protein sequence
More informationDerived Distribution Points Heuristic for Fast Pairwise Statistical Significance Estimation
Derived Distribution Points Heuristic for Fast Pairwise Statistical Significance Estimation Ankit Agrawal, Alok Choudhary Dept. of Electrical Engg. and Computer Science Northwestern University 2145 Sheridan
More informationBioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement
More informationSEPA: Approximate Non-Subjective Empirical p-value Estimation for Nucleotide Sequence Alignment
SEPA: Approximate Non-Subjective Empirical p-value Estimation for Nucleotide Sequence Alignment Ofer Gill and Bud Mishra Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street,
More informationSEQUENCE alignment is an underlying application in the
194 IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 8, NO. 1, JANUARY/FEBRUARY 2011 Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific
More informationAlignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)
Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in
More informationDo Aligned Sequences Share the Same Fold?
J. Mol. Biol. (1997) 273, 355±368 Do Aligned Sequences Share the Same Fold? Ruben A. Abagyan* and Serge Batalov The Skirball Institute of Biomolecular Medicine Biochemistry Department NYU Medical Center
More informationDistributional regimes for the number of k-word matches between two random sequences
Distributional regimes for the number of k-word matches between two random sequences Ross A. Lippert, Haiyan Huang, and Michael S. Waterman Informatics Research, Celera Genomics, Rockville, MD 0878; Department
More informationProtein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche
Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its
More informationSequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best alignment path onsider the last step in
More informationBIOINFORMATICS: An Introduction
BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and
More informationPairwise & Multiple sequence alignments
Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationGrouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids
Science in China Series C: Life Sciences 2007 Science in China Press Springer-Verlag Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids
More informationExercise 5. Sequence Profiles & BLAST
Exercise 5 Sequence Profiles & BLAST 1 Substitution Matrix (BLOSUM62) Likelihood to substitute one amino acid with another Figure taken from https://en.wikipedia.org/wiki/blosum 2 Substitution Matrix (BLOSUM62)
More informationComputational Biology
Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,
More informationbioinformatics 1 -- lecture 7
bioinformatics 1 -- lecture 7 Probability and conditional probability Random sequences and significance (real sequences are not random) Erdos & Renyi: theoretical basis for the significance of an alignment
More informationSequence comparison: Score matrices
Sequence comparison: Score matrices http://facultywashingtonedu/jht/gs559_2013/ Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison
CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture
More informationLecture 5,6 Local sequence alignment
Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution
More informationAlignment & BLAST. By: Hadi Mozafari KUMS
Alignment & BLAST By: Hadi Mozafari KUMS SIMILARITY - ALIGNMENT Comparison of primary DNA or protein sequences to other primary or secondary sequences Expecting that the function of the similar sequence
More informationReducing storage requirements for biological sequence comparison
Bioinformatics Advance Access published July 15, 2004 Bioinfor matics Oxford University Press 2004; all rights reserved. Reducing storage requirements for biological sequence comparison Michael Roberts,
More informationWeek 10: Homology Modelling (II) - HHpred
Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative
More informationPairwise alignment. 2.1 Introduction GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD----LHAHKL
2 Pairwise alignment 2.1 Introduction The most basic sequence analysis task is to ask if two sequences are related. This is usually done by first aligning the sequences (or parts of them) and then deciding
More information14.1 Finding frequent elements in stream
Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours
More information1 Variance of a Random Variable
Indian Institute of Technology Bombay Department of Electrical Engineering Handout 14 EE 325 Probability and Random Processes Lecture Notes 9 August 28, 2014 1 Variance of a Random Variable The expectation
More informationThe Kuhn-Tucker Problem
Natalia Lazzati Mathematics for Economics (Part I) Note 8: Nonlinear Programming - The Kuhn-Tucker Problem Note 8 is based on de la Fuente (2000, Ch. 7) and Simon and Blume (1994, Ch. 18 and 19). The Kuhn-Tucker
More information2 ANDREW L. RUKHIN We accept here the usual in the change-point analysis convention that, when the minimizer in ) is not determined uniquely, the smal
THE RATES OF CONVERGENCE OF BAYES ESTIMATORS IN CHANGE-OINT ANALYSIS ANDREW L. RUKHIN Department of Mathematics and Statistics UMBC Baltimore, MD 2228 USA Abstract In the asymptotic setting of the change-point
More information11. Bootstrap Methods
11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and
More informationLet S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT
More informationBioinformatics and BLAST
Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists
More informationChapter 5. Proteomics and the analysis of protein sequence Ⅱ
Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and
More informationSequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University
Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)
More informationChapter 1. GMM: Basic Concepts
Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationSequence Comparison. mouse human
Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity
More informationBisection Ideas in End-Point Conditioned Markov Process Simulation
Bisection Ideas in End-Point Conditioned Markov Process Simulation Søren Asmussen and Asger Hobolth Department of Mathematical Sciences, Aarhus University Ny Munkegade, 8000 Aarhus C, Denmark {asmus,asger}@imf.au.dk
More informationGibbs Sampling Methods for Multiple Sequence Alignment
Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical
More informationMarkov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1
Markov-Switching Models with Endogenous Explanatory Variables by Chang-Jin Kim 1 Dept. of Economics, Korea University and Dept. of Economics, University of Washington First draft: August, 2002 This version:
More informationThe Distribution of Short Word Match Counts Between Markovian Sequences
The Distribution of Short Word Match Counts Between Markovian Sequences Conrad J. Burden 1, Paul Leopardi 1 and Sylvain Forêt 2 1 Mathematical Sciences Institute, Australian National University, Canberra,
More informationOn the length of the longest exact position. match in a random sequence
On the length of the longest exact position match in a random sequence G. Reinert and Michael S. Waterman Abstract A mixed Poisson approximation and a Poisson approximation for the length of the longest
More informationA Poisson model for gapped local alignments
A Poisson model for gapped local alignments Dirk Metzler 1, 2, Steffen Grossmann, and Anton Wakolbinger Department of Mathematics, Goethe-Universität, Postfach 11 19 32, D-60054 Frankfurt am Main, Germany
More informationPROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 46, Number 1, October 1974 ASYMPTOTIC DISTRIBUTION OF NORMALIZED ARITHMETICAL FUNCTIONS PAUL E
PROCEEDIGS OF THE AMERICA MATHEMATICAL SOCIETY Volume 46, umber 1, October 1974 ASYMPTOTIC DISTRIBUTIO OF ORMALIZED ARITHMETICAL FUCTIOS PAUL ERDOS AD JAOS GALAMBOS ABSTRACT. Let f(n) be an arbitrary arithmetical
More informationComparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes
More informationECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University
ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University Instructions: Answer all four (4) questions. Be sure to show your work or provide su cient justi cation for
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More informationNonlinear Programming (NLP)
Natalia Lazzati Mathematics for Economics (Part I) Note 6: Nonlinear Programming - Unconstrained Optimization Note 6 is based on de la Fuente (2000, Ch. 7), Madden (1986, Ch. 3 and 5) and Simon and Blume
More informationMaking sense of score statistics for sequence alignments
Marco Pagni is staff member at the Swiss Institute of Bioinformatics. His research interests include software development and handling of databases of protein domains. C. Victor Jongeneel is Director of
More informationHomology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB
Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded
More information