6.6 Sequence Alignment
|
|
- Bruce Austin
- 5 years ago
- Views:
Transcription
1 6.6 Sequence Alignment
2 String Similarity How similar are two strings? ocurrance o c u r r a n c e - occurrence First model the problem Q. How can we measure the distance? o c c u r r e n c e 6 mismatches, 1 gap o c - u r r a n c e o c c u r r e n c e 1 mismatch, 1 gap o c - u r r - a n c e o c c u r r e - n c e 0 mismatches, 3 gaps 2
3 String Similarity How similar are two strings? ocurrance o c u r r a n c e - occurrence o c c u r r e n c e First model the problem Q. How can we measure the distance? A. Idea: best of all possibilities - Penalty for mismatches (depending on characters) - Penalty for gaps Minimize total penalty 6 mismatches, 1 gap o c - u r r a n c e o c c u r r e n c e 1 mismatch, 1 gap o c - u r r - a n c e o c c u r r e - n c e 0 mismatches, 3 gaps 3
4 Edit Distance Applications. Basis for Unix diff. Speech recognition. Spelling suggestions in document editor. Computational biology. Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970] Gap penalty ; mismatch penalty pq of chars p and q. Cost = sum of gap and mismatch penalties. C T G A C C T A C C T - C T G A C C T A C C T C C T G A C T A C A T C C T G A C - T A C A T TC + GT + AG + 2 CA 2 + CA 4
5 Sequence Alignment Goal: Given two strings X = x 1 x 2... x m and Y = y 1 y 2... y n find alignment M of minimum cost. Def. An alignment M is a set of ordered pairs x i -y j such that each item occurs in at most one pair and no crossings. Def. The pair x i -y j and x i' -y j' cross if i < i', but j > j'. cost ( M ) x i y j ( x i, y j ) M i : x i unmatched j : y j unmatched mismatch gap 5
6 Sequence Alignment cost ( M ) x i y j ( x i, y j ) M i : x i unmatched j : y j unmatched mismatch gap Q. CTACCG vs. TACATG. Suppose all α=1 for all mismatches and δ=1, what then is cost of M = {x 2 -y 1, x 3 -y 2, x 4 -y 3, x 5 -y 4, x 6 -y 6 }? A. x 1 x 2 x 3 x 4 x 5 x 6 C T A C C G T A C A T G y 1 y 2 y 3 y 4 y 5 y 6 6
7 Sequence Alignment cost ( M ) x i y j ( x i, y j ) M i : x i unmatched j : y j unmatched mismatch gap Q. CTACCG vs. TACATG. Suppose all α=1 for all mismatches and δ=1, what then is cost of M = {x 2 -y 1, x 3 -y 2, x 4 -y 3, x 5 -y 4, x 6 -y 6 }? A. 3 x 1 x 2 x 3 x 4 x 5 x 6 C T A C C - G - T A C A T G y 1 y 2 y 3 y 4 y 5 y 6 7
8 Sequence Alignment: Problem Structure Def. OPT(i, j) = min cost of aligning strings x 1 x 2... x i and y 1 y 2... y j. Q. How to define OPT recursively? What are the cases? (1 min) x 1 x i-1 x i y 1 y j-1 y j 8
9 Sequence Alignment: Problem Structure Def. OPT(i, j) = min cost of aligning strings x 1 x 2... x i and y 1 y 2... y j. Case 1: OPT matches x i with y j. x 1 x i-1 x i y 1 y j-1 y j Case 2a: OPT leaves x i unmatched. x 1 x i-1 x i y 1 y j-1 y j - Case 2b: OPT leaves y j unmatched. x 1 x i-1 x i - y 1 y j-1 y j 9
10 Sequence Alignment: Problem Structure Def. OPT(i, j) = min cost of aligning strings x 1 x 2... x i and y 1 y 2... y j. Case 1: OPT matches x i with y j. pay mismatch for x i with y j + min cost of aligning x 1 x 2... x i-1 and y 1 y 2... y j-1 Case 2a: OPT leaves x i unmatched. x 1 y 1 x i-1 y j-1 x i y j Case 2b: OPT leaves y j unmatched. 10
11 Sequence Alignment: Problem Structure Def. OPT(i, j) = min cost of aligning strings x 1 x 2... x i and y 1 y 2... y j. Case 1: OPT matches x i with y j. pay mismatch for x i with y j + min cost of aligning x 1 x 2... x i-1 and y 1 y 2... y j-1 Case 2a: OPT leaves x i unmatched. pay gap for x i and min cost of x 1 x i-1 x i aligning x 1 x 2... x i-1 and y 1 y 2... y j Case 2b: OPT leaves y j unmatched. y 1 y j-1 y j - 11
12 Sequence Alignment: Problem Structure Def. OPT(i, j) = min cost of aligning strings x 1 x 2... x i and y 1 y 2... y j. Case 1: OPT matches x i with y j. pay mismatch for x i with y j + min cost of aligning x 1 x 2... x i-1 and y 1 y 2... y j-1 Case 2a: OPT leaves x i unmatched. pay gap for x i and min cost of aligning x 1 x 2... x i-1 and y 1 y 2... y j Case 2b: OPT leaves y j unmatched. pay gap for y j and min cost of x 1 x i-1 x i - aligning x 1 x 2... x i and y 1 y 2... y j-1 y 1 y j-1 y j Q. What to do if one of the strings (subproblems) is empty? 12
13 Sequence Alignment: Problem Structure Def. OPT(i, j) = min cost of aligning strings x 1 x 2... x i and y 1 y 2... y j. Case 1: OPT matches x i with y j. pay mismatch for x i with y j + min cost of aligning x 1 x 2... x i-1 and y 1 y 2... y j-1 Case 2a: OPT leaves x i unmatched. pay gap for x i and min cost of aligning x 1 x 2... x i-1 and y 1 y 2... y j Case 2b: OPT leaves y j unmatched. pay gap for y j and min cost of x 1 x i-1 x i - aligning x 1 x 2... x i and y 1 y 2... y j-1 y 1 y j-1 y j OPT (i, j) j if i 0 x i y OPT (i 1, j 1) j min OPT (i 1, j) otherwise OPT (i, j 1) i if j 0 13
14 Sequence Alignment: Algorithm Sequence-Alignment(m, n, x 1 x 2...x m, y 1 y 2...y n,, ) { for i = 0 to m M[i, 0] = i for j = 0 to n M[0, j] = j } for i = 1 to m for j = 1 to n M[i, j] = min( [x i, y j ] + M[i-1, j-1], + M[i-1, j], + M[i, j-1]) return M[m, n] Q. How to prove correctness of such an algorithms? 14
15 Proving Correctness of Dynamic Programming Approaches Thm. Sequence-Alignment gives minimal cost of possible alignment. Pf. 15
16 Proving Correctness of Dynamic Programming Approaches Thm. Sequence-Alignment gives minimal cost of possible alignment. Pf. (by induction) Base: by definition of cost: if m=0, we have n gap penalties; similar if n=0 IH: Suppose Sequence-Alignment gives the minimal cost of any possible alignment up to lengths i and j-1 and up i-1 and j. Step: Let x and y of length i and j be given. 16
17 Proving Correctness of Dynamic Programming Approaches Thm. Sequence-Alignment gives minimal cost of possible alignment. Pf. (by induction) Base: by definition of cost: if m=0, we have n gap penalties; similar if n=0 IH: Suppose Sequence-Alignment gives the minimal cost of any possible alignment up to lengths i and j-1 and up i-1 and j. Step: Let x and y of length i and j be given. Then there are three options for the alignment of the last characters: Case 1: OPT matches x i with y j. pay mismatch for x i with y j + min cost of aligning up to i-1 and j-1 Case 2a: OPT leaves x i unmatched. pay gap for x i and min cost of aligning up to i-1 and j Case 2b: OPT leaves y j unmatched. pay gap for y j and min cost of aligning up to i and j-1 The algorithm takes exactly the minimum of these three options. With induction on both i and j, the theorem now follows. 17
18 Sequence Alignment: Algorithm Sequence-Alignment(m, n, x 1 x 2...x m, y 1 y 2...y n,, ) { for i = 0 to m M[i, 0] = i for j = 0 to n M[0, j] = j } for i = 1 to m for j = 1 to n M[i, j] = min( [x i, y j ] + M[i-1, j-1], + M[i-1, j], + M[i, j-1]) return M[m, n] Q. What is time + space complexity? 18
19 Sequence Alignment: Algorithm Sequence-Alignment(m, n, x 1 x 2...x m, y 1 y 2...y n,, ) { for i = 0 to m M[i, 0] = i for j = 0 to n M[0, j] = j } for i = 1 to m for j = 1 to n M[i, j] = min( [x i, y j ] + M[i-1, j-1], + M[i-1, j], + M[i, j-1]) return M[m, n] Q. What is time + space complexity? A. (mn) time and space. English words or sentences: m, n 10. Computational biology: m = n = billions ops OK, but 10GB array? 19
20 Sequence Alignment: Linear Space Q. How to avoid quadratic space when only interested in the value? (1 min) OPT (i, j) j if i 0 x i y OPT (i 1, j 1) j min OPT (i 1, j) otherwise OPT (i, j 1) i if j 0 20
21 Sequence Alignment: Linear Space Q. How to avoid quadratic space when only interested in the value? A. We can calculate the optimal value in O(m + n) space and O(mn) time. Compute OPT(i, ) from OPT(i-1, ). Re-use space for row i-1. No longer a simple way to recover alignment itself. OPT (i, j) j if i 0 x i y OPT (i 1, j 1) j min OPT (i 1, j) otherwise OPT (i, j 1) i if j 0 21
22 Sequence Alignment: Linear Space Q. How to avoid quadratic space when only interested in the value? A. We can calculate the optimal value in O(m + n) space and O(mn) time. Compute OPT(i, ) from OPT(i-1, ). Re-use space for row i-1. No longer a simple way to recover alignment itself. Q*. How can we still get the solution as well? 22
23 Sequence Alignment: Linear Space Q. How to avoid quadratic space when only interested in the value? A. We can calculate the optimal value in O(m + n) space and O(mn) time. Compute OPT(i, ) from OPT(i-1, ). Re-use space for row i-1. No longer a simple way to recover alignment itself. Q*. How can we still get the solution as well? Theorem. [Hirschberg 1975] Optimal alignment in O(m + n) space and O(mn) time. Clever combination of divide-and-conquer and dynamic programming. Inspired by idea of Savitch from complexity theory. 23
Objec&ves. Review. Dynamic Programming. What is the knapsack problem? What is our solu&on? Ø Review Knapsack Ø Sequence Alignment 3/28/18
/8/8 Objec&ves Dynamic Programming Ø Review Knapsack Ø Sequence Alignment Mar 8, 8 CSCI - Sprenkle Review What is the knapsack problem? What is our solu&on? Mar 8, 8 CSCI - Sprenkle /8/8 Dynamic Programming:
More informationCSE 202 Dynamic Programming II
CSE 202 Dynamic Programming II Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally,
More informationCS 580: Algorithm Design and Analysis
CS 58: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 28 Announcement: Homework 3 due February 5 th at :59PM Midterm Exam: Wed, Feb 2 (8PM-PM) @ MTHW 2 Recap: Dynamic Programming
More informationChapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing
More information6. DYNAMIC PROGRAMMING II
6. DYNAMIC PROGRAMMING II sequence alignment Hirschberg's algorithm Bellman-Ford algorithm distance vector protocols negative cycles in a digraph Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison
More informationChapter 6. Dynamic Programming. CS 350: Winter 2018
Chapter 6 Dynamic Programming CS 350: Winter 2018 1 Algorithmic Paradigms Greedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into
More informationDynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming
lgorithmic Paradigms Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each sub-problem
More informationChapter 6. Weighted Interval Scheduling. Dynamic Programming. Algorithmic Paradigms. Dynamic Programming Applications
lgorithmic Paradigms hapter Dynamic Programming reedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into sub-problems, solve each
More informationAreas. ! Bioinformatics. ! Control theory. ! Information theory. ! Operations research. ! Computer science: theory, graphics, AI, systems,.
lgorithmic Paradigms hapter Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each
More informationCopyright 2000, Kevin Wayne 1
/9/ lgorithmic Paradigms hapter Dynamic Programming reed. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into two sub-problems, solve
More informationChapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing some
More informationMore Dynamic Programming
CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider
More informationMore Dynamic Programming
Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?
More informationOutline DP paradigm Discrete optimisation Viterbi algorithm DP: 0 1 Knapsack. Dynamic Programming. Georgy Gimel farb
Outline DP paradigm Discrete optimisation Viterbi algorithm DP: Knapsack Dynamic Programming Georgy Gimel farb (with basic contributions by Michael J. Dinneen) COMPSCI 69 Computational Science / Outline
More informationAlgoritmi e strutture di dati 2
Algoritmi e strutture di dati 2 Paola Vocca Lezione 5: Allineamento di sequenze Lezione 5 - Allineamento di sequenze 1 Allineamento sequenze Struttura secondaria dell RNA Algoritmi e strutture di dati
More informationChapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing
More informationLecture 2: Pairwise Alignment. CG Ron Shamir
Lecture 2: Pairwise Alignment 1 Main source 2 Why compare sequences? Human hexosaminidase A vs Mouse hexosaminidase A 3 www.mathworks.com/.../jan04/bio_genome.html Sequence Alignment עימוד רצפים The problem:
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 18 Dynamic Programming (Segmented LS recap) Longest Common Subsequence Adam Smith Segmented Least Squares Least squares. Foundational problem in statistic and numerical
More informationAside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n
Aside: Golden Ratio Golden Ratio: A universal law. Golden ratio φ = lim n a n+b n a n = 1+ 5 2 a n+1 = a n + b n, b n = a n 1 Ruta (UIUC) CS473 1 Spring 2018 1 / 41 CS 473: Algorithms, Spring 2018 Dynamic
More informationDynamic Programming. Cormen et. al. IV 15
Dynamic Programming Cormen et. al. IV 5 Dynamic Programming Applications Areas. Bioinformatics. Control theory. Operations research. Some famous dynamic programming algorithms. Unix diff for comparing
More informationOutline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009
Outline Approximation: Theory and Algorithms The Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 1 Nikolaus Augsten (DIS) Approximation: Theory and
More informationOutline. Similarity Search. Outline. Motivation. The String Edit Distance
Outline Similarity Search The Nikolaus Augsten nikolaus.augsten@sbg.ac.at Department of Computer Sciences University of Salzburg 1 http://dbresearch.uni-salzburg.at WS 2017/2018 Version March 12, 2018
More informationApproximation: Theory and Algorithms
Approximation: Theory and Algorithms The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 Nikolaus Augsten (DIS) Approximation:
More informationSimilarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012
Similarity Search The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 8, 2012 Nikolaus Augsten (DIS) Similarity Search Unit 2 March 8,
More informationCSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo
CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a
More informationSequence Comparison. mouse human
Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity
More informationSimilarity Search. The String Edit Distance. Nikolaus Augsten.
Similarity Search The String Edit Distance Nikolaus Augsten nikolaus.augsten@sbg.ac.at Dept. of Computer Sciences University of Salzburg http://dbresearch.uni-salzburg.at Version October 18, 2016 Wintersemester
More informationAnalysis and Design of Algorithms Dynamic Programming
Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................
More informationCMPSCI 311: Introduction to Algorithms Second Midterm Exam
CMPSCI 311: Introduction to Algorithms Second Midterm Exam April 11, 2018. Name: ID: Instructions: Answer the questions directly on the exam pages. Show all your work for each question. Providing more
More informationCSE 421 Weighted Interval Scheduling, Knapsack, RNA Secondary Structure
CSE Weighted Interval Scheduling, Knapsack, RNA Secondary Structure Shayan Oveis haran Weighted Interval Scheduling Interval Scheduling Job j starts at s(j) and finishes at f j and has weight w j Two jobs
More informationCSE 421 Dynamic Programming
CSE Dynamic Programming Yin Tat Lee Weighted Interval Scheduling Interval Scheduling Job j starts at s(j) and finishes at f j and has weight w j Two jobs compatible if they don t overlap. Goal: find maximum
More informationA Simple Linear Space Algorithm for Computing a Longest Common Increasing Subsequence
A Simple Linear Space Algorithm for Computing a Longest Common Increasing Subsequence Danlin Cai, Daxin Zhu, Lei Wang, and Xiaodong Wang Abstract This paper presents a linear space algorithm for finding
More informationComputational Biology Lecture 5: Time speedup, General gap penalty function Saad Mneimneh
Computational Biology Lecture 5: ime speedup, General gap penalty function Saad Mneimneh We saw earlier that it is possible to compute optimal global alignments in linear space (it can also be done for
More informationUnsupervised Vocabulary Induction
Infant Language Acquisition Unsupervised Vocabulary Induction MIT (Saffran et al., 1997) 8 month-old babies exposed to stream of syllables Stream composed of synthetic words (pabikumalikiwabufa) After
More informationBio nformatics. Lecture 3. Saad Mneimneh
Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per
More informationLecture 5: Minimizing DFAs
6.45 Lecture 5: Minimizing DFAs 6.45 Announcements: - Pset 2 is up (as of last night) - Dylan says: It s fire. - How was Pset? 2 DFAs NFAs DEFINITION Regular Languages Regular Expressions 3 4 Some Languages
More informationLecture 2: Divide and conquer and Dynamic programming
Chapter 2 Lecture 2: Divide and conquer and Dynamic programming 2.1 Divide and Conquer Idea: - divide the problem into subproblems in linear time - solve subproblems recursively - combine the results in
More informationA Survey of the Longest Common Subsequence Problem and Its. Related Problems
Survey of the Longest Common Subsequence Problem and Its Related Problems Survey of the Longest Common Subsequence Problem and Its Related Problems Thesis Submitted to the Faculty of Department of Computer
More informationarxiv: v1 [cs.ds] 9 Apr 2018
From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract
More informationAlgorithms and Theory of Computation. Lecture 9: Dynamic Programming
Algorithms and Theory of Computation Lecture 9: Dynamic Programming Xiaohui Bei MAS 714 September 10, 2018 Nanyang Technological University MAS 714 September 10, 2018 1 / 21 Recursion in Algorithm Design
More informationEvolution. CT Amemiya et al. Nature 496, (2013) doi: /nature12027
Sequence Alignment Evolution CT Amemiya et al. Nature 496, 311-316 (2013) doi:10.1038/nature12027 Evolutionary Rates next generation OK OK OK X X Still OK? Sequence conservation implies function Alignment
More informationPairwise alignment, Gunnar Klau, November 9, 2005, 16:
Pairwise alignment, Gunnar Klau, November 9, 2005, 16:36 2012 2.1 Growth rates For biological sequence analysis, we prefer algorithms that have time and space requirements that are linear in the length
More informationLecture 4: September 19
CSCI1810: Computational Molecular Biology Fall 2017 Lecture 4: September 19 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationDynamic Programming 1
Dynamic Programming 1 lgorithmic Paradigms Divide-and-conquer. Break up a problem into two sub-problems, solve each sub-problem independently, and combine solution to sub-problems to form solution to original
More informationLecture 5: September Time Complexity Analysis of Local Alignment
CSCI1810: Computational Molecular Biology Fall 2017 Lecture 5: September 21 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More information6. DYNAMIC PROGRAMMING I
6. DYNAMIC PRORAMMIN I weighted interval scheduling segmented least squares knapsack problem RNA secondary structure Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos
More information6. DYNAMIC PROGRAMMING I
6. DYNAMIC PROGRAMMING I weighted interval scheduling segmented least squares knapsack problem RNA secondary structure Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley Copyright 2013
More informationFoundations of Natural Language Processing Lecture 6 Spelling correction, edit distance, and EM
Foundations of Natural Language Processing Lecture 6 Spelling correction, edit distance, and EM Alex Lascarides (Slides from Alex Lascarides and Sharon Goldwater) 2 February 2019 Alex Lascarides FNLP Lecture
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationEfficient High-Similarity String Comparison: The Waterfall Algorithm
Efficient High-Similarity String Comparison: The Waterfall Algorithm Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)
More informationSequence Comparison with Mixed Convex and Concave Costs
Sequence Comparison with Mixed Convex and Concave Costs David Eppstein Computer Science Department Columbia University New York, NY 10027 February 20, 1989 Running Head: Sequence Comparison with Mixed
More informationSEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.
More informationCOMP 355 Advanced Algorithms Algorithm Design Review: Mathematical Background
COMP 355 Advanced Algorithms Algorithm Design Review: Mathematical Background 1 Polynomial Time Brute force. For many non-trivial problems, there is a natural brute force search algorithm that checks every
More informationCS2223 Algorithms D Term 2009 Exam 3 Solutions
CS2223 Algorithms D Term 2009 Exam 3 Solutions May 4, 2009 By Prof. Carolina Ruiz Dept. of Computer Science WPI PROBLEM 1: Asymptoptic Growth Rates (10 points) Let A and B be two algorithms with runtimes
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More informationPair Hidden Markov Models
Pair Hidden Markov Models Scribe: Rishi Bedi Lecturer: Serafim Batzoglou January 29, 2015 1 Recap of HMMs alphabet: Σ = {b 1,...b M } set of states: Q = {1,..., K} transition probabilities: A = [a ij ]
More informationPattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching 1
Pattern Matching a b a c a a b 1 4 3 2 Pattern Matching 1 Outline and Reading Strings ( 9.1.1) Pattern matching algorithms Brute-force algorithm ( 9.1.2) Boyer-Moore algorithm ( 9.1.3) Knuth-Morris-Pratt
More informationCSE 549 Lecture 3: Sequence Similarity & Alignment. slides (w/*) courtesy of Carl Kingsford
CSE 549 Lecture 3: Sequence Similarity & Alignment slides (w/*) courtesy of Carl Kingsford Relatedness of Biological Sequence https://en.wikipedia.org/wiki/phylogenetic_tree Relatedness of Biological Sequence
More informationAlgorithms in Bioinformatics: A Practical Introduction. Sequence Similarity
Algorithms in Bioinformatics: A Practical Introduction Sequence Similarity Earliest Researches in Sequence Comparison Doolittle et al. (Science, July 1983) searched for platelet-derived growth factor (PDGF)
More information6. DYNAMIC PROGRAMMING II. sequence alignment Hirschberg's algorithm Bellman-Ford distance vector protocols negative cycles in a digraph
6. DYNAMIC PROGRAMMING II sequence alignment Hirschberg's algorithm Bellman-Ford distance vector protocols negative cycles in a digraph Shortest paths Shortest path problem. Given a digraph G = (V, E),
More informationSearching. Constant time access. Hash function. Use an array? Better hash function? Hash function 4/18/2013. Chapter 9
Constant time access Searching Chapter 9 Linear search Θ(n) OK Binary search Θ(log n) Better Can we achieve Θ(1) search time? CPTR 318 1 2 Use an array? Use random access on a key such as a string? Hash
More information6. DYNAMIC PROGRAMMING II
6. DYNAMIC PROGRAMMING II sequence alignmen Hirschberg's algorihm Bellman-Ford algorihm disance vecor proocols negaive cycles in a digraph 6. DYNAMIC PROGRAMMING II sequence alignmen Hirschberg's algorihm
More informationIntroduction to Bioinformatics Algorithms Homework 3 Solution
Introduction to Bioinformatics Algorithms Homework 3 Solution Saad Mneimneh Computer Science Hunter College of CUNY Problem 1: Concave penalty function We have seen in class the following recurrence for
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17 12.1 Introduction Today we re going to do a couple more examples of dynamic programming. While
More informationMinimum Edit Distance. Defini'on of Minimum Edit Distance
Minimum Edit Distance Defini'on of Minimum Edit Distance How similar are two strings? Spell correc'on The user typed graffe Which is closest? graf gra@ grail giraffe Computa'onal Biology Align two sequences
More informationSequence Alignment (chapter 6)
Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:
More information20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming
20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment
More informationCopyright 2000, Kevin Wayne 1
//8 Fast Integer Division Too (!) Schönhage Strassen algorithm CS 8: Algorithm Design and Analysis Integer division. Given two n-bit (or less) integers s and t, compute quotient q = s / t and remainder
More informationCS 580: Algorithm Design and Analysis
CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 208 Announcement: Homework 3 due February 5 th at :59PM Final Exam (Tentative): Thursday, May 3 @ 8AM (PHYS 203) Recap: Divide
More informationPattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching Goodrich, Tamassia
Pattern Matching a b a c a a b 1 4 3 2 Pattern Matching 1 Brute-Force Pattern Matching ( 11.2.1) The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift
More informationLarge-Scale Genomic Surveys
Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationString Matching Problem
String Matching Problem Pattern P Text T Set of Locations L 9/2/23 CAP/CGS 5991: Lecture 2 Computer Science Fundamentals Specify an input-output description of the problem. Design a conceptual algorithm
More informationDynamic programming. Curs 2015
Dynamic programming. Curs 2015 Fibonacci Recurrence. n-th Fibonacci Term INPUT: n nat QUESTION: Compute F n = F n 1 + F n 2 Recursive Fibonacci (n) if n = 0 then return 0 else if n = 1 then return 1 else
More informationAccelerated Natural Language Processing Lecture 3 Morphology and Finite State Machines; Edit Distance
Accelerated Natural Language Processing Lecture 3 Morphology and Finite State Machines; Edit Distance Sharon Goldwater (based on slides by Philipp Koehn) 20 September 2018 Sharon Goldwater ANLP Lecture
More informationCOMP 355 Advanced Algorithms
COMP 355 Advanced Algorithms Algorithm Design Review: Mathematical Background 1 Polynomial Running Time Brute force. For many non-trivial problems, there is a natural brute force search algorithm that
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationDynamic Programming( Weighted Interval Scheduling)
Dynamic Programming( Weighted Interval Scheduling) 17 November, 2016 Dynamic Programming 1 Dynamic programming algorithms are used for optimization (for example, finding the shortest path between two points,
More informationWelcome to CS262: Computational Genomics
Instructor: Welcome to CS262: Computational Genomics Serafim Batzoglou TA: Paul Chen email: cs262-win2015-staff@lists.stanford.edu Tuesdays & Thursdays 12:50-2:05pm Clark S361 http://cs262.stanford.edu
More informationAn Algorithm for Computing the Invariant Distance from Word Position
An Algorithm for Computing the Invariant Distance from Word Position Sergio Luán-Mora 1 Departamento de Lenguaes y Sistemas Informáticos, Universidad de Alicante, Campus de San Vicente del Raspeig, Ap.
More informationLecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties
Lecture 1, 31/10/2001: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties 1 Computational sequence-analysis The major goal of computational
More informationCollected Works of Charles Dickens
Collected Works of Charles Dickens A Random Dickens Quote If there were no bad people, there would be no good lawyers. Original Sentence It was a dark and stormy night; the night was dark except at sunny
More informationLecture 6: Recurrent Algorithms:
Lecture 6: Recurrent Algorithms: Divide-and-Conquer Principle Georgy Gimel farb COMPSCI 220 Algorithms and Data Structures 1 / 20 1 Divide-and-conquer 2 Math induction 3 Telescoping 4 Examples 5 Pros and
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationAdam Blank Spring 2017 CSE 311. Foundations of Computing I
Adam Blank Spring 2017 CSE 311 Foundations of Computing I CSE 311: Foundations of Computing Lecture 17: Structural Induction Strings An alphabet Σ is any finite set of characters The set Σ* is the set
More information2. ALGORITHM ANALYSIS
2. ALGORITHM ANALYSIS computational tractability asymptotic order of growth survey of common running times Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos
More informationImplementing Approximate Regularities
Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National
More informationDynamic Programming. Reading: CLRS Chapter 15 & Section CSE 6331: Algorithms Steve Lai
Dynamic Programming Reading: CLRS Chapter 5 & Section 25.2 CSE 633: Algorithms Steve Lai Optimization Problems Problems that can be solved by dynamic programming are typically optimization problems. Optimization
More informationInformation Processing Letters. A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings
Information Processing Letters 108 (2008) 360 364 Contents lists available at ScienceDirect Information Processing Letters www.vier.com/locate/ipl A fast and simple algorithm for computing the longest
More informationSTATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler
STATC141 Spring 2005 The materials are from Pairise Sequence Alignment by Robert Giegerich and David Wheeler Lecture 6, 02/08/05 The analysis of multiple DNA or protein sequences (I) Sequence similarity
More informationPage 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence
Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)
More informationApplication of the LCS Problem to High Frequency Financial Data
Application of the LCS Problem to High Frequency Financial Data Kevin Ngan Lady Margaret Hall University of Oxford A thesis submitted in partial fulfillment of the MSc in Mathematical and Computational
More informationBIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University
BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University Measures of Sequence Similarity Alignment with dot
More informationDynamic programming. Curs 2017
Dynamic programming. Curs 2017 Fibonacci Recurrence. n-th Fibonacci Term INPUT: n nat QUESTION: Compute F n = F n 1 + F n 2 Recursive Fibonacci (n) if n = 0 then return 0 else if n = 1 then return 1 else
More informationMatching Residents to Hospitals
Midterm Review Matching Residents to Hospitals Goal. Given a set of preferences among hospitals and medical school students, design a self-reinforcing admissions process. Unstable pair: applicant x and
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationAlgorithms for biological sequence Comparison and Alignment
Algorithms for biological sequence Comparison and Alignment Sara Brunetti, Dipartimento di Ingegneria dell'informazione e Scienze Matematiche University of Siena, Italy, sara.brunetti@unisi.it 1 A piece
More informationCSE 591 Foundations of Algorithms Homework 4 Sample Solution Outlines. Problem 1
CSE 591 Foundations of Algorithms Homework 4 Sample Solution Outlines Problem 1 (a) Consider the situation in the figure, every edge has the same weight and V = n = 2k + 2. Easy to check, every simple
More informationMat 3770 Bin Packing or
Basic Algithm Spring 2014 Used when a problem can be partitioned into non independent sub problems Basic Algithm Solve each sub problem once; solution is saved f use in other sub problems Combine solutions
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties
More information