Algebraic Dynamic Programming. Dynamic Programming, Old Country Style


Algebraic Dynamic Programming, Session 2: Dynamic Programming, Old Country Style
Robert Giegerich (Lecture), Stefan Janssen (Exercises), Faculty of Technology, Summer, robert@techfak.uni-bielefeld.de


Programme of the day:
- Review a classical application of dynamic programming in biosequence analysis.
- Find out about sources of intrinsic difficulties in dynamic programming.
An anonymous referee wrote back in 2000: "The development of successful dynamic programming recurrences is a matter of experience, talent and luck."

Virtues of DP: DP solves combinatorial optimization problems over an exponential search space in polynomial time, via recursive problem decomposition and tabulation of intermediate results, whenever Bellman's Principle of Optimality holds.

Simple string edit distance. Today's example: edit distance via string alignment of DARLING and AIRLINE:

D A - R L I N G
- A I R L I N E
d m i m m m m r

The standard edit distance model of x and y is based on the operations
r/m: replacement (resp. match)
d: deletion from x
i: insertion into y
A variety of scoring schemes δ for r, d, i...
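As a minimal sketch in Python, the score of an alignment can be computed directly from its edit script. It assumes a hypothetical unit-cost scheme in which matches are free and every other column costs 1, and it uses "-" for the empty word ε. Under these assumptions the DARLING/AIRLINE script dmimmmmr scores 3.

```python
def script_score(x: str, y: str, ops: str, delta) -> int:
    """Score an edit script over x and y.

    ops uses the slide's letters: 'm' match, 'r' replacement,
    'd' deletion from x, 'i' insertion into y.
    delta(a, b) scores one column; "-" stands for the empty word epsilon.
    """
    i = j = 0          # number of characters of x and y consumed so far
    total = 0
    for op in ops:
        if op in "mr":
            if op == "m" and x[i] != y[j]:
                raise ValueError(f"'m' used but {x[i]!r} != {y[j]!r}")
            total += delta(x[i], y[j])
            i += 1
            j += 1
        elif op == "d":
            total += delta(x[i], "-")
            i += 1
        elif op == "i":
            total += delta("-", y[j])
            j += 1
        else:
            raise ValueError(f"unknown operation {op!r}")
    if i != len(x) or j != len(y):
        raise ValueError("script does not consume both strings completely")
    return total


def unit_cost(a: str, b: str) -> int:
    # Hypothetical scoring scheme: matches are free, every other column costs 1.
    return 0 if a == b else 1


print(script_score("DARLING", "AIRLINE", "dmimmmmr", unit_cost))  # 3
```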

Simple string edit distance: variations of a fundamental theme. String edit distance comes in many variations:
- many types of sequences and alphabets: text, program code, DNA, proteins, numerical measurement series, ...
- many different scoring schemes, even within the same domain, e.g. for protein sequences Dayhoff matrices or BLOSUM matrices, depending on evolutionary distance; additive versus affine gap scores, ...
- many variants of the question asked: distance versus similarity, global/local alignment, small-in-large and free-shift alignment, ...

Edit distance computation via DP: $\mathrm{Ali}_\delta(i, j)$ is the best alignment score for the suffixes $x_{i+1} \ldots x_m$ and $y_{j+1} \ldots y_n$ under the additive score function $\delta$. $\mathrm{Ali}_\delta$ is an $(m+1) \times (n+1)$ table ($i$ ranges over $0..m$, $j$ over $0..n$) that stores the results of sub-alignments for re-use, which is essential for polynomial efficiency!

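Edit distance computation via DP (ctd.):

$$
\begin{aligned}
\mathrm{Ali}_\delta(m, n) &= 0 && (1)\\
\mathrm{Ali}_\delta(i, n) &= \mathrm{Ali}_\delta(i+1, n) + \delta(x_{i+1}, \varepsilon) && (2)\\
\mathrm{Ali}_\delta(m, j) &= \mathrm{Ali}_\delta(m, j+1) + \delta(\varepsilon, y_{j+1}) && (3)\\
\mathrm{Ali}_\delta(i, j) &= \min \begin{cases} \mathrm{Ali}_\delta(i+1, j) + \delta(x_{i+1}, \varepsilon)\\ \mathrm{Ali}_\delta(i, j+1) + \delta(\varepsilon, y_{j+1})\\ \mathrm{Ali}_\delta(i+1, j+1) + \delta(x_{i+1}, y_{j+1}) \end{cases} && (4)
\end{aligned}
$$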
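The recurrences (1)-(4) can be transcribed almost literally into a memoized recursive function. The sketch below makes three assumptions: Python strings are 0-based (so the slide's $x_{i+1}$ is `x[i]`), "-" stands for ε, and δ is a hypothetical unit-cost function. `functools.lru_cache` plays the role of the Ali table, so each pair (i, j) is evaluated only once.

```python
from functools import lru_cache

GAP = "-"  # stands for the empty word epsilon in delta's arguments


def unit_cost(a: str, b: str) -> int:
    # Hypothetical scoring scheme delta: free match, cost 1 otherwise.
    return 0 if a == b else 1


def ali(x: str, y: str, delta=unit_cost) -> int:
    """Top-down transcription of recurrences (1)-(4)."""
    m, n = len(x), len(y)

    @lru_cache(maxsize=None)          # the table: each (i, j) is computed once
    def A(i: int, j: int) -> int:
        if i == m and j == n:                                   # (1)
            return 0
        if j == n:                                              # (2)
            return A(i + 1, n) + delta(x[i], GAP)
        if i == m:                                              # (3)
            return A(m, j + 1) + delta(GAP, y[j])
        return min(A(i + 1, j) + delta(x[i], GAP),              # (4)
                   A(i, j + 1) + delta(GAP, y[j]),
                   A(i + 1, j + 1) + delta(x[i], y[j]))

    return A(0, 0)


print(ali("DARLING", "AIRLINE"))  # 3
```

Because the recursion depth grows with m + n, Python's default recursion limit makes this top-down form suitable only for fairly short strings.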

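Edit distance computation via DP, with index bounds:

$$
\begin{aligned}
&\mathrm{Ali}_\delta(m, n) = 0 && (5)\\
&\text{for } i = m-1 \text{ down to } 0:\quad \mathrm{Ali}_\delta(i, n) = \mathrm{Ali}_\delta(i+1, n) + \delta(x_{i+1}, \varepsilon) && (6)\\
&\text{for } j = n-1 \text{ down to } 0:\quad \mathrm{Ali}_\delta(m, j) = \mathrm{Ali}_\delta(m, j+1) + \delta(\varepsilon, y_{j+1}) && (7)\\
&\text{for } i = m-1 \text{ down to } 0,\ \text{for } j = n-1 \text{ down to } 0:\\
&\quad \mathrm{Ali}_\delta(i, j) = \min \begin{cases} \mathrm{Ali}_\delta(i+1, j) + \delta(x_{i+1}, \varepsilon)\\ \mathrm{Ali}_\delta(i, j+1) + \delta(\varepsilon, y_{j+1})\\ \mathrm{Ali}_\delta(i+1, j+1) + \delta(x_{i+1}, y_{j+1}) \end{cases} && (8)
\end{aligned}
$$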
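With the index bounds made explicit, the same computation becomes a bottom-up table fill. This sketch relies on the following assumptions:
- indexing is 0-based;
- "-" stands for ε;
- costs are hypothetical unit costs.

The loop order guarantees that row i+1 and cell (i, j+1) are finished before cell (i, j) is filled.

```python
def unit_cost(a: str, b: str) -> int:
    # Hypothetical scoring scheme delta: free match, cost 1 otherwise.
    return 0 if a == b else 1


def ali_table(x: str, y: str, delta=unit_cost) -> list[list[int]]:
    """Bottom-up evaluation of (5)-(8); returns the whole (m+1) x (n+1) table."""
    m, n = len(x), len(y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    A[m][n] = 0                                              # (5)
    for i in range(m - 1, -1, -1):                           # (6)
        A[i][n] = A[i + 1][n] + delta(x[i], "-")
    for j in range(n - 1, -1, -1):                           # (7)
        A[m][j] = A[m][j + 1] + delta("-", y[j])
    for i in range(m - 1, -1, -1):                           # (8)
        for j in range(n - 1, -1, -1):
            A[i][j] = min(A[i + 1][j] + delta(x[i], "-"),
                          A[i][j + 1] + delta("-", y[j]),
                          A[i + 1][j + 1] + delta(x[i], y[j]))
    return A


table = ali_table("DARLING", "AIRLINE")
print(table[0][0])  # 3
```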


DP as a programming method...
(1) WHY does this find the optimal alignment? Bellman's Principle of Optimality! (Proof by induction)
(2) What type of problem decomposition is this? Plain structural recursion over an invisible alignment data type...
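One way to make the "invisible" data type visible is to represent an alignment explicitly as a sequence of columns and enumerate the whole search space. The sketch below does this by brute force. Its size grows exponentially (two strings of length 2 already have 13 alignments), so it is only useful as a test oracle for small inputs. The column representation and the names `alignments` and `score` are assumptions for illustration. Scoring is then a plain structural recursion (here a sum) over each alignment.

```python
from typing import Iterator

# One alignment = a tuple of columns (a, b); "-" marks a gap.
Column = tuple[str, str]


def alignments(x: str, y: str) -> Iterator[tuple[Column, ...]]:
    """Enumerate every alignment of x and y (exponentially many)."""
    if not x and not y:
        yield ()
        return
    if x:                                   # column (x_1, -): deletion
        for rest in alignments(x[1:], y):
            yield ((x[0], "-"),) + rest
    if y:                                   # column (-, y_1): insertion
        for rest in alignments(x, y[1:]):
            yield (("-", y[0]),) + rest
    if x and y:                             # column (x_1, y_1): replacement/match
        for rest in alignments(x[1:], y[1:]):
            yield ((x[0], y[0]),) + rest


def score(alignment: tuple[Column, ...], delta) -> int:
    # Structural recursion over the alignment: the sum of its column scores.
    return sum(delta(a, b) for a, b in alignment)


def unit_cost(a: str, b: str) -> int:
    return 0 if a == b else 1               # hypothetical scoring scheme


print(sum(1 for _ in alignments("AB", "CD")))                          # 13
print(min(score(a, unit_cost) for a in alignments("AIR", "AIRLINE")))  # 4
```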

How does this scale up? More sophisticated problems need
- more recurrences/tables (= subproblem types),
- manifold case distinctions,
- an intricate search space,
- higher asymptotic complexity.
This just means: more recurrences.

Some examples of larger problems. Here are data from some larger problems. For each, we give the number of DP tables that are computed, the number of different cases that make up the central case distinction, and the asymptotic runtime complexity.

application                                  tables    cases   runtime
edit distance (affine gaps) [Gotoh 1982]     3         15      n²
spliced alignment [Usuka/Brendel 2000]       4         22      n²
pknotsrg-enf [Reeder/Giegerich 2004]         47 (17)   140     n⁴

The last example has 47 recurrences, but only 17 tables are actually stored. The other 30 recurrences are recomputed on demand.

Edit distance with affine gap costs. We omit the δ-subscript and the for-loops...

$$
\begin{aligned}
\mathrm{Ali}(m, n) &= 0 && (9)\\
\mathrm{Ali}(i, n) &= \mathrm{Del}(i+1, n) + \mathrm{OPEN} && (10)\\
\mathrm{Ali}(m, j) &= \mathrm{Ins}(m, j+1) + \mathrm{OPEN} && (11)\\
\mathrm{Ali}(i, j) &= \min \begin{cases} \mathrm{Del}(i+1, j) + \mathrm{OPEN}\\ \mathrm{Ins}(i, j+1) + \mathrm{OPEN}\\ \mathrm{Ali}(i+1, j+1) + \delta(x_{i+1}, y_{j+1}) \end{cases} && (12)
\end{aligned}
$$

The new tables Del and Ins store scores of alignments that extend an already opened gap in x or y.

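Affine gap costs (ctd.):

$$
\begin{aligned}
\mathrm{Del}(m, n) &= 0 && (13)\\
\mathrm{Del}(i, n) &= \mathrm{Del}(i+1, n) + \mathrm{EXTEND} && (14)\\
\mathrm{Del}(i, j) &= \min \begin{cases} \mathrm{Ali}(i, j)\\ \mathrm{Del}(i+1, j) + \mathrm{EXTEND}\\ \mathrm{Ins}(i, j+1) + \mathrm{OPEN} \end{cases} && (15)\\
\mathrm{Ins}(m, n) &= 0 && (16)\\
\mathrm{Ins}(m, j) &= \mathrm{Ins}(m, j+1) + \mathrm{EXTEND} && (17)\\
\mathrm{Ins}(i, j) &= \min \begin{cases} \mathrm{Ali}(i, j)\\ \mathrm{Del}(i+1, j) + \mathrm{OPEN}\\ \mathrm{Ins}(i, j+1) + \mathrm{EXTEND} \end{cases} && (18)
\end{aligned}
$$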
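A memoized transcription of (9)-(18) might look as follows. The gap costs OPEN = 3 and EXTEND = 1 and the mismatch cost 2 are hypothetical. The slides do not list the boundary cases Del(m, j) and Ins(i, n). This sketch assumes they take the minimum over whichever alternatives of (15) and (18) remain applicable.

```python
from functools import lru_cache

OPEN, EXTEND = 3, 1  # hypothetical gap costs (OPEN >= EXTEND)


def delta(a: str, b: str) -> int:
    # Hypothetical replacement score: match free, mismatch costs 2.
    return 0 if a == b else 2


def affine(x: str, y: str) -> int:
    """Memoized transcription of recurrences (9)-(18)."""
    m, n = len(x), len(y)

    @lru_cache(maxsize=None)
    def Ali(i, j):
        if i == m and j == n:                                    # (9)
            return 0
        if j == n:                                               # (10)
            return Del(i + 1, n) + OPEN
        if i == m:                                               # (11)
            return Ins(m, j + 1) + OPEN
        return min(Del(i + 1, j) + OPEN,                         # (12)
                   Ins(i, j + 1) + OPEN,
                   Ali(i + 1, j + 1) + delta(x[i], y[j]))

    @lru_cache(maxsize=None)
    def Del(i, j):
        if i == m and j == n:                                    # (13)
            return 0
        if j == n:                                               # (14)
            return Del(i + 1, n) + EXTEND
        if i == m:   # Del(m, j) is not listed on the slide; assumed here
            return min(Ali(m, j), Ins(m, j + 1) + OPEN)
        return min(Ali(i, j),                                    # (15)
                   Del(i + 1, j) + EXTEND,
                   Ins(i, j + 1) + OPEN)

    @lru_cache(maxsize=None)
    def Ins(i, j):
        if i == m and j == n:                                    # (16)
            return 0
        if i == m:                                               # (17)
            return Ins(m, j + 1) + EXTEND
        if j == n:   # Ins(i, n) is not listed on the slide; assumed here
            return min(Ali(i, n), Del(i + 1, n) + OPEN)
        return min(Ali(i, j),                                    # (18)
                   Del(i + 1, j) + OPEN,
                   Ins(i, j + 1) + EXTEND)

    return Ali(0, 0)


print(affine("AXXB", "AB"))   # 4: one gap of length 2 costs OPEN + EXTEND
print(affine("AXBXC", "ABC")) # 6: two separate gaps cost 2 * OPEN
```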

A mild critique: we usually say that the scoring scheme δ, including OPEN and EXTEND, is a parameter of the algorithm. But how about 0, + and min?
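One way to take the critique literally is to pass 0, + and min in as parameters, i.e. a semiring. In this sketch the `Semiring` record and its field names are hypothetical. With (0, +, min) the recurrence computes the unit-cost edit distance. With (1, ×, +), and every column scored 1, the same recurrence counts the alignments instead.

```python
from functools import lru_cache, reduce
from operator import add, mul
from typing import Any, Callable, NamedTuple


class Semiring(NamedTuple):
    one: Any                           # replaces the 0 in Ali(m, n) = 0
    times: Callable[[Any, Any], Any]   # replaces +
    plus: Callable[[Any, Any], Any]    # replaces min


MIN_PLUS = Semiring(0, add, min)       # edit distance
COUNTING = Semiring(1, mul, add)       # number of alignments


def ali(x, y, sr, delta):
    """Recurrences (1)-(4) with 0, + and min taken from the semiring sr."""
    m, n = len(x), len(y)

    @lru_cache(maxsize=None)
    def A(i, j):
        if i == m and j == n:
            return sr.one
        cands = []
        if i < m:
            cands.append(sr.times(A(i + 1, j), delta(x[i], "-")))
        if j < n:
            cands.append(sr.times(A(i, j + 1), delta("-", y[j])))
        if i < m and j < n:
            cands.append(sr.times(A(i + 1, j + 1), delta(x[i], y[j])))
        return reduce(sr.plus, cands)  # at least one case always applies

    return A(0, 0)


def unit_cost(a, b):
    return 0 if a == b else 1          # hypothetical distance scores


def one_per_column(a, b):
    return 1                           # counting: each column contributes factor 1


print(ali("DARLING", "AIRLINE", MIN_PLUS, unit_cost))  # 3
print(ali("AB", "CD", COUNTING, one_per_column))       # 13
```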

Non-separation of concerns. Critique of traditional-style DP: DP does not scale well. Simple problems are simple to solve, but large problems are catastrophic, because DP recurrences lack abstraction. Problem decomposition (top-down), computation (bottom-up), table design, and correctness and efficiency concerns: all these issues are intermingled in the DP recurrences.


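Bad habit 1: incomplete scoring abstraction.

$$
\begin{aligned}
\mathrm{Ali}(m, n) &= 0 && (19)\\
\mathrm{Ali}(i, n) &= \mathrm{Del}(i+1, n) + \mathrm{OPEN} && (20)\\
\mathrm{Ali}(m, j) &= \mathrm{Ins}(m, j+1) + \mathrm{OPEN} && (21)\\
\mathrm{Ali}(i, j) &= \min \begin{cases} \mathrm{Del}(i+1, j) + \mathrm{OPEN}\\ \mathrm{Ins}(i, j+1) + \mathrm{OPEN}\\ \mathrm{Ali}(i+1, j+1) + \delta(x_{i+1}, y_{j+1}) \end{cases} && (22)\\
\mathrm{Del}(m, n) &= 0 && (23)\\
\mathrm{Del}(i, n) &= \mathrm{Del}(i+1, n) + \mathrm{EXTEND} && (24)\\
\mathrm{Del}(i, j) &= \min \begin{cases} \mathrm{Ali}(i, j)\\ \mathrm{Del}(i+1, j) + \mathrm{EXTEND}\\ \mathrm{Ins}(i, j+1) + \mathrm{OPEN} \end{cases} && (25)
\end{aligned}
$$

With a fully abstract scoring scheme, (25) reads:

$$
\mathrm{Del}(i, j) = h\,\big[\,\mathrm{Ali}(i, j),\ \mathit{xdel}(x_{i+1}, \mathrm{Del}(i+1, j)),\ \mathit{ins}(y_{j+1}, \mathrm{Ins}(i, j+1))\,\big]
$$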
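Following the h-notation, each case of the recurrences can call a named scoring function, with answers passed around as lists and h choosing among them. In this sketch only `xdel`, `ins` and `h` come from the slide. The following are assumptions:
- the functions `nil`, `dl` (open a deletion), `xins` and `rep`;
- the cost values;
- the dictionary representation of the algebra.

Boundary cases that the slides do not list are handled uniformly, by offering only the alternatives whose indices stay in range. This is also an assumption.

```python
from functools import lru_cache


def affine_score_algebra(open_cost=3, extend_cost=1):
    """Hypothetical evaluation algebra: one function per case, plus choice h."""
    def delta(a, b):
        return 0 if a == b else 2                      # assumed replacement scores
    return {
        "nil":  lambda: 0,                             # empty alignment
        "dl":   lambda a, s: s + open_cost,            # open a deletion gap
        "xdel": lambda a, s: s + extend_cost,          # extend a deletion gap
        "ins":  lambda b, s: s + open_cost,            # open an insertion gap
        "xins": lambda b, s: s + extend_cost,          # extend an insertion gap
        "rep":  lambda a, b, s: s + delta(a, b),       # replacement / match
        "h":    lambda xs: [min(xs)] if xs else [],    # choice: keep the best
    }


def align(x, y, alg):
    m, n = len(x), len(y)

    @lru_cache(maxsize=None)
    def Ali(i, j):
        cands = [alg["nil"]()] if (i, j) == (m, n) else []
        if i < m:
            cands += [alg["dl"](x[i], s) for s in Del(i + 1, j)]
        if j < n:
            cands += [alg["ins"](y[j], s) for s in Ins(i, j + 1)]
        if i < m and j < n:
            cands += [alg["rep"](x[i], y[j], s) for s in Ali(i + 1, j + 1)]
        return tuple(alg["h"](cands))

    @lru_cache(maxsize=None)
    def Del(i, j):   # h[Ali(i, j), xdel(x_{i+1}, Del(i+1, j)), ins(y_{j+1}, Ins(i, j+1))]
        cands = list(Ali(i, j))
        if i < m:
            cands += [alg["xdel"](x[i], s) for s in Del(i + 1, j)]
        if j < n:
            cands += [alg["ins"](y[j], s) for s in Ins(i, j + 1)]
        return tuple(alg["h"](cands))

    @lru_cache(maxsize=None)
    def Ins(i, j):
        cands = list(Ali(i, j))
        if i < m:
            cands += [alg["dl"](x[i], s) for s in Del(i + 1, j)]
        if j < n:
            cands += [alg["xins"](y[j], s) for s in Ins(i, j + 1)]
        return tuple(alg["h"](cands))

    return list(Ali(0, 0))


print(align("AXXB", "AB", affine_score_algebra()))  # [4]
```

Swapping in a different dictionary (for example one whose h keeps several answers) changes what is computed without touching `align`.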


Bad habit 2: program restricted to a single answer. It always happens:
- someone asks: are there several optimal answers?
- someone asks for the k best answers;
- someone asks for all answers (on small examples).
Always provide lists of answers, and let the scoring scheme (h) decide how many! Added benefit: this avoids the (ab)use of +∞ or −∞ to label non-candidates; use [] instead.
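With answer lists, "how many answers" becomes a property of the choice function h alone. The sketch below keeps the recurrences (1)-(4) fixed, uses hypothetical unit costs, and swaps h between four choices:
- the minimum;
- the k best, via `heapq.nsmallest`;
- all answers;
- a hypothetical threshold filter that may legitimately return [] for "no candidate".

```python
import heapq
from functools import lru_cache


def unit_cost(a, b):
    # Hypothetical scoring scheme: free match, cost 1 otherwise ("-" is the gap).
    return 0 if a == b else 1


def ali_answers(x, y, h, delta=unit_cost):
    """Recurrences (1)-(4) over answer lists; h decides which answers survive."""
    m, n = len(x), len(y)

    @lru_cache(maxsize=None)
    def A(i, j):
        if i == m and j == n:
            return (0,)
        cands = []
        if i < m:
            cands += [s + delta(x[i], "-") for s in A(i + 1, j)]
        if j < n:
            cands += [s + delta("-", y[j]) for s in A(i, j + 1)]
        if i < m and j < n:
            cands += [s + delta(x[i], y[j]) for s in A(i + 1, j + 1)]
        return tuple(h(cands))

    return list(A(0, 0))


def minimum(xs):
    return [min(xs)] if xs else []        # [] marks "no candidate"


def k_best(k):
    return lambda xs: heapq.nsmallest(k, xs)


def everything(xs):
    return sorted(xs)                     # all answers (small inputs only)


def at_most(limit):
    # Hypothetical filter: discard candidates scoring above limit.
    return lambda xs: [s for s in xs if s <= limit]


print(ali_answers("AB", "CD", minimum))     # [2]
print(ali_answers("AB", "CD", k_best(3)))   # [2, 3, 3]
print(ali_answers("AB", "CD", at_most(1)))  # []
```

Keeping only the k best (or only the sufficiently small) scores per subproblem is sound here because every step adds a non-negative constant, which preserves the order of candidates.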


Bad habit 3: subword decomposition via subscript fiddling. For example, (Ali) rling/irline is decomposed into

r + (Del) ling/irline
(Ins) rling/rline + i
r + (Ali) ling/rline + i

Rules of string decomposition are more clearly written as a CFG:

Ali → $ | a Del | Ins a | a Ali a    (26)
Del → $ | a Del | Ins a | Ali        (27)
Ins → $ | Ins a | a Del | Ali        (28)

Each derivation of DARLING $ ENILRIA describes an alignment. Even simpler with a two-track grammar...


Bad habit 4: redundant search space analysis. The alignment fragments

a -        - a
- b  and   b -

are equivalent, and

a b
- -

as one gap (1 gap) competes with the same columns read as two adjacent gaps (2 gaps).

Refine the grammar (axiom is Ali):

Rep → a Ali a | $
Ali → Rep | a Del | Ins a
Del → Rep | a Del | Ins a
Ins → Rep | Ins a
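A quick way to see the effect of the refinement is to count derivations. The sketch below counts derivations of the refined grammar for strings of lengths m and n, and compares them with the number of all alignments. It reads `a Del` as consuming the next character of x and `Ins a` as consuming the next character of y. The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_all(m: int, n: int) -> int:
    # Every column is a deletion, an insertion or a replacement.
    if m == 0 or n == 0:
        return 1
    return count_all(m - 1, n) + count_all(m, n - 1) + count_all(m - 1, n - 1)


def count_refined(m: int, n: int) -> int:
    """Count derivations of the refined grammar for |x| = m, |y| = n."""
    @lru_cache(maxsize=None)
    def rep(i, j):                       # Rep -> a Ali a | $
        if i == m and j == n:
            return 1
        if i < m and j < n:
            return ali(i + 1, j + 1)
        return 0

    @lru_cache(maxsize=None)
    def ali(i, j):                       # Ali -> Rep | a Del | Ins a
        total = rep(i, j)
        if i < m:
            total += dele(i + 1, j)
        if j < n:
            total += ins(i, j + 1)
        return total

    @lru_cache(maxsize=None)
    def dele(i, j):                      # Del -> Rep | a Del | Ins a
        total = rep(i, j)
        if i < m:
            total += dele(i + 1, j)
        if j < n:
            total += ins(i, j + 1)
        return total

    @lru_cache(maxsize=None)
    def ins(i, j):                       # Ins -> Rep | Ins a   (no Ins -> a Del)
        total = rep(i, j)
        if j < n:
            total += ins(i, j + 1)
        return total

    return ali(0, 0)


print(count_all(2, 2), count_refined(2, 2))  # 13 6
```

For two strings of length 2, the refined grammar offers 6 candidates instead of 13.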

Bad habit 5: over-tabulation. Note: Rep needs no DP table. In general, with many nonterminal symbols, only some represent subproblems whose answers must be stored; others can be (re-)calculated in O(1). The large example mentioned above: 47 nonterminals, 17 tables (optimal).
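In the refined grammar, Rep only combines a δ-score with one lookup of Ali. It can therefore be an ordinary function, while Ali, Del and Ins are tabulated.

This sketch uses hypothetical costs OPEN = 3, EXTEND = 1 and mismatch 2. The scores attached to the productions are also assumptions:
- OPEN when a gap is entered from Ali, or via Del → Ins a;
- EXTEND when a gap continues.

Answers are tuples of length 0 or 1, so a nonterminal with no derivation yields () rather than an infinite score.

```python
from functools import lru_cache

OPEN, EXTEND = 3, 1                      # hypothetical gap costs


def delta(a, b):
    return 0 if a == b else 2            # hypothetical replacement scores


def minimum(xs):
    return (min(xs),) if xs else ()      # () means: no derivation


def affine_refined(x, y):
    """Score the refined grammar; Ali, Del, Ins are tabulated, Rep is not."""
    m, n = len(x), len(y)

    def Rep(i, j):                       # Rep -> a Ali a | $   (no table)
        if i == m and j == n:
            return (0,)
        if i < m and j < n:
            return tuple(delta(x[i], y[j]) + s for s in Ali(i + 1, j + 1))
        return ()

    @lru_cache(maxsize=None)
    def Ali(i, j):                       # Ali -> Rep | a Del | Ins a
        cands = list(Rep(i, j))
        if i < m:
            cands += [OPEN + s for s in Del(i + 1, j)]
        if j < n:
            cands += [OPEN + s for s in Ins(i, j + 1)]
        return minimum(cands)

    @lru_cache(maxsize=None)
    def Del(i, j):                       # Del -> Rep | a Del | Ins a
        cands = list(Rep(i, j))
        if i < m:
            cands += [EXTEND + s for s in Del(i + 1, j)]
        if j < n:
            cands += [OPEN + s for s in Ins(i, j + 1)]
        return minimum(cands)

    @lru_cache(maxsize=None)
    def Ins(i, j):                       # Ins -> Rep | Ins a
        cands = list(Rep(i, j))
        if j < n:
            cands += [EXTEND + s for s in Ins(i, j + 1)]
        return minimum(cands)

    return Ali(0, 0)[0]                  # Ali(0, 0) always has a derivation


print(affine_refined("AXXB", "AB"))  # 4
```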


Bad habit 6: backtracing. To retrieve the candidate with optimal score, we trace back through the optimal decisions. Enumerating the k best scores together with their candidates requires ambitious programming, and this is not treated in the textbooks. I never got around to programming the full backtracing... Solution: substitute backtracing by a forward calculation, for free!
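Here is a minimal sketch of the forward idea for plain edit distance. Each table entry carries the alignment rows along with the score, so the optimal candidate is available at (0, 0) without any traceback pass. The unit costs and the (score, top row, bottom row) triple are assumptions. Ties are broken by the order of the cases, so any one of several co-optimal alignments may come out.

```python
from functools import lru_cache


def unit_cost(a, b):
    return 0 if a == b else 1            # hypothetical scoring scheme


def best_alignment(x, y, delta=unit_cost):
    """Compute (score, top row, bottom row) forward; no backtracing pass."""
    m, n = len(x), len(y)

    @lru_cache(maxsize=None)
    def A(i, j):
        if i == m and j == n:
            return (0, "", "")
        cands = []
        if i < m:                        # deletion column (x_{i+1}, -)
            s, top, bot = A(i + 1, j)
            cands.append((s + delta(x[i], "-"), x[i] + top, "-" + bot))
        if j < n:                        # insertion column (-, y_{j+1})
            s, top, bot = A(i, j + 1)
            cands.append((s + delta("-", y[j]), "-" + top, y[j] + bot))
        if i < m and j < n:              # replacement/match column
            s, top, bot = A(i + 1, j + 1)
            cands.append((s + delta(x[i], y[j]), x[i] + top, y[j] + bot))
        return min(cands, key=lambda c: c[0])   # choose by score only

    return A(0, 0)


score, top, bottom = best_alignment("DARLING", "AIRLINE")
print(score)   # 3
print(top)
print(bottom)
```

Copying strings in every cell adds work proportional to the alignment length per cell. The sketch favors clarity over efficiency.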

Bad habit 7: no reuse. Adapting a DP algorithm to a problem variant means changing the code and testing it extensively.

The promise of algebraic dynamic programming: we can spend a constant-factor runtime overhead and avoid all the bad habits. We obtain more reliable (and even faster) code more quickly. We make DP fun.


Summary: bad habits and their remedies
1. incomplete scoring abstraction → evaluation algebras
2. single answer → answer lists
3. subscript fiddling → tree grammar
4. search space redundancy → grammar refinement
5. overtabulation → annotated grammar
6. backtracing → pretty-printing algebras
7. more power → product algebras

Sneak preview. Next session's topics: the reverse-engineering view on dynamic programming (given a DP algorithm, what is its invisible data type?), and the basic definitions of algebraic dynamic programming.


More information

CSE 311: Foundations of Computing I Autumn 2014 Practice Final: Section X. Closed book, closed notes, no cell phones, no calculators.

CSE 311: Foundations of Computing I Autumn 2014 Practice Final: Section X. Closed book, closed notes, no cell phones, no calculators. CSE 311: Foundations of Computing I Autumn 014 Practice Final: Section X YY ZZ Name: UW ID: Instructions: Closed book, closed notes, no cell phones, no calculators. You have 110 minutes to complete the

More information

CS 241 Analysis of Algorithms

CS 241 Analysis of Algorithms CS 241 Analysis of Algorithms Professor Eric Aaron Lecture T Th 9:00am Lecture Meeting Location: OLB 205 Business Grading updates: HW5 back today HW7 due Dec. 10 Reading: Ch. 22.1-22.3, Ch. 25.1-2, Ch.

More information

.Cycle counting: the next generation. Matthew Skala 30 January 2013

.Cycle counting: the next generation. Matthew Skala 30 January 2013 .Cycle counting: the next generation δ ζ µ β ι θ γ η ε λ κ σ ρ α o Matthew Skala 30 January 2013 Outline Cycle counting ECCHI Knight s Tours Equivalent circuits The next generation Cycle counting Let G

More information

Indiana Academic Standards for Precalculus

Indiana Academic Standards for Precalculus PRECALCULUS correlated to the Indiana Academic Standards for Precalculus CC2 6/2003 2004 Introduction to Precalculus 2004 by Roland E. Larson and Robert P. Hostetler Precalculus thoroughly explores topics

More information

INF4130: Dynamic Programming September 2, 2014 DRAFT version

INF4130: Dynamic Programming September 2, 2014 DRAFT version INF4130: Dynamic Programming September 2, 2014 DRAFT version In the textbook: Ch. 9, and Section 20.5 Chapter 9 can also be found at the home page for INF4130 These slides were originally made by Petter

More information

Pairwise sequence alignment

Pairwise sequence alignment Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

More information

COS 341: Discrete Mathematics

COS 341: Discrete Mathematics COS 341: Discrete Mathematics Midterm Exam Fall 2006 Print your name General directions: This exam is due on Monday, November 13 at 4:30pm. Late exams will not be accepted. Exams must be submitted in hard

More information

Data Structures and Algorithms CSE 465

Data Structures and Algorithms CSE 465 Data Structures and Algorithms CSE 465 LECTURE 3 Asymptotic Notation O-, Ω-, Θ-, o-, ω-notation Divide and Conquer Merge Sort Binary Search Sofya Raskhodnikova and Adam Smith /5/0 Review Questions If input

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 03: Edit distance and sequence alignment Slides adapted from Dr. Shaojie Zhang (University of Central Florida) KUMC visit How many of you would like to attend

More information

Moreover, the circular logic

Moreover, the circular logic Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT

More information

Pairwise sequence alignment and pair hidden Markov models

Pairwise sequence alignment and pair hidden Markov models Pairwise sequence alignment and pair hidden Markov models Martin C. Frith April 13, 2012 ntroduction Pairwise alignment and pair hidden Markov models (phmms) are basic textbook fare [2]. However, there

More information

Great Theoretical Ideas

Great Theoretical Ideas 15-251 Great Theoretical Ideas in Computer Science Gödel s Legacy: Proofs and Their Limitations Lecture 25 (November 16, 2010) The Halting Problem A Quick Recap of the Previous Lecture Is there a program

More information

NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR. Sp ' 00

NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR. Sp ' 00 NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR Sp ' 00 May 3 OPEN BOOK exam (students are permitted to bring in textbooks, handwritten notes, lecture notes

More information

Foundations of

Foundations of 91.304 Foundations of (Theoretical) Computer Science Chapter 3 Lecture Notes (Section 3.2: Variants of Turing Machines) David Martin dm@cs.uml.edu With some modifications by Prof. Karen Daniels, Fall 2012

More information

Algorithms: COMP3121/3821/9101/9801

Algorithms: COMP3121/3821/9101/9801 Algorithms: COMP311/381/9101/9801 Aleks Ignjatović, ignjat@cse.unsw.edu.au office: 504 (CSE building); phone: 5-6659 Course Admin: Amin Malekpour, a.malekpour@unsw.edu.au School of Computer Science and

More information

UNIT-VIII COMPUTABILITY THEORY

UNIT-VIII COMPUTABILITY THEORY CONTEXT SENSITIVE LANGUAGE UNIT-VIII COMPUTABILITY THEORY A Context Sensitive Grammar is a 4-tuple, G = (N, Σ P, S) where: N Set of non terminal symbols Σ Set of terminal symbols S Start symbol of the

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Computational Models Lecture 8 1

Computational Models Lecture 8 1 Computational Models Lecture 8 1 Handout Mode Ronitt Rubinfeld and Iftach Haitner. Tel Aviv University. April 18/ May 2, 2016 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice

More information

Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out

More information

1 Computational Problems

1 Computational Problems Stanford University CS254: Computational Complexity Handout 2 Luca Trevisan March 31, 2010 Last revised 4/29/2010 In this lecture we define NP, we state the P versus NP problem, we prove that its formulation

More information

Parsing. Unger s Parser. Introduction (1) Unger s parser [Grune and Jacobs, 2008] is a CFG parser that is

Parsing. Unger s Parser. Introduction (1) Unger s parser [Grune and Jacobs, 2008] is a CFG parser that is Introduction (1) Unger s parser [Grune and Jacobs, 2008] is a CFG parser that is Unger s Parser Laura Heinrich-Heine-Universität Düsseldorf Wintersemester 2012/2013 a top-down parser: we start with S and

More information

Pattern-Matching for Strings with Short Descriptions

Pattern-Matching for Strings with Short Descriptions Pattern-Matching for Strings with Short Descriptions Marek Karpinski marek@cs.uni-bonn.de Department of Computer Science, University of Bonn, 164 Römerstraße, 53117 Bonn, Germany Wojciech Rytter rytter@mimuw.edu.pl

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 15 Ana Bove May 17th 2018 Recap: Context-free Languages Chomsky hierarchy: Regular languages are also context-free; Pumping lemma

More information

CSE 206A: Lattice Algorithms and Applications Spring Basic Algorithms. Instructor: Daniele Micciancio

CSE 206A: Lattice Algorithms and Applications Spring Basic Algorithms. Instructor: Daniele Micciancio CSE 206A: Lattice Algorithms and Applications Spring 2014 Basic Algorithms Instructor: Daniele Micciancio UCSD CSE We have already seen an algorithm to compute the Gram-Schmidt orthogonalization of a lattice

More information

Supplementary Notes on Inductive Definitions

Supplementary Notes on Inductive Definitions Supplementary Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 August 29, 2002 These supplementary notes review the notion of an inductive definition

More information

Theory of Computation

Theory of Computation Thomas Zeugmann Hokkaido University Laboratory for Algorithmics http://www-alg.ist.hokudai.ac.jp/ thomas/toc/ Lecture 3: Finite State Automata Motivation In the previous lecture we learned how to formalize

More information

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)

More information

Theory of Computation 8 Deterministic Membership Testing

Theory of Computation 8 Deterministic Membership Testing Theory of Computation 8 Deterministic Membership Testing Frank Stephan Department of Computer Science Department of Mathematics National University of Singapore fstephan@comp.nus.edu.sg Theory of Computation

More information

13 Comparative RNA analysis

13 Comparative RNA analysis 13 Comparative RNA analysis Sources for this lecture: R. Durbin, S. Eddy, A. Krogh und G. Mitchison, Biological sequence analysis, Cambridge, 1998 D.W. Mount. Bioinformatics: Sequences and Genome analysis,

More information

Classes of Boolean Functions

Classes of Boolean Functions Classes of Boolean Functions Nader H. Bshouty Eyal Kushilevitz Abstract Here we give classes of Boolean functions that considered in COLT. Classes of Functions Here we introduce the basic classes of functions

More information

NP, polynomial-time mapping reductions, and NP-completeness

NP, polynomial-time mapping reductions, and NP-completeness NP, polynomial-time mapping reductions, and NP-completeness In the previous lecture we discussed deterministic time complexity, along with the time-hierarchy theorem, and introduced two complexity classes:

More information

15.1 Proof of the Cook-Levin Theorem: SAT is NP-complete

15.1 Proof of the Cook-Levin Theorem: SAT is NP-complete CS125 Lecture 15 Fall 2016 15.1 Proof of the Cook-Levin Theorem: SAT is NP-complete Already know SAT NP, so only need to show SAT is NP-hard. Let L be any language in NP. Let M be a NTM that decides L

More information

Notes for Lecture Notes 2

Notes for Lecture Notes 2 Stanford University CS254: Computational Complexity Notes 2 Luca Trevisan January 11, 2012 Notes for Lecture Notes 2 In this lecture we define NP, we state the P versus NP problem, we prove that its formulation

More information

String Matching Problem

String Matching Problem String Matching Problem Pattern P Text T Set of Locations L 9/2/23 CAP/CGS 5991: Lecture 2 Computer Science Fundamentals Specify an input-output description of the problem. Design a conceptual algorithm

More information

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming 20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information