Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter

Size: px
Start display at page:

Download "Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter"

Transcription

1 Alignment of Long Sequences BMI/CS Spring 2016 Anthony Gitter

2 Gols for Lecture Key concepts how lrge-scle lignment differs from the simple cse the cnonicl three step pproch of lrge-scle ligners using suffix trees to find mximl unique mtching subsequences (MUMs) If time permits using tries nd threded tries to find lignment seeds constrined dynmic progrmming to lign between/round nchors using sprse dynmic progrmming (DP) to find chin of locl lignments 2

3 Pirwise Lrge-Scle Alignment: Tsk Definition Given pir of lrge-scle sequences (e.g. chromosomes) method for scoring the lignment (e.g. substitution mtrices, insertion/deletion prmeters) Do construct globl lignment: identify ll mtching positions between the two sequences 3

4 Lrge Scle Alignment Exmple: Mouse Chr6 vs. Humn Chr12 Figure from: Delcher et l., Nucleic Acids Reserch 27,

5 Why the Problem is Chllenging Sequences too big to mke O(n 2 ) dynmicprogrmming methods prcticl Long sequences re less likely to be colliner becuse of rerrngements initilly we ll ssume collinerity we ll consider rerrngements in next lecture 5

6 Generl Strtegy Figure from: Brudno et l. Genome Reserch, perform pttern mtching to find seeds for globl lignment 2. find good chin of nchors 3. fill in reminder with stndrd but constrined lignment method 6

7 The MUMmer System Delcher et l., Nucleic Acids Reserch, 1999 Given: genomes A nd B 1. find ll mximl unique mtching subsequences (MUMs) 2. extrct the longest possible set of mtches tht occur in the sme order in both genomes 3. close the gps 7

8 Step 1: Finding Seeds in MUMmer Mximl unique mtch: occurs exctly once in both genomes A nd B not contined in ny longer MUM mismtches Key insight: significntly long MUM is certin to be prt of the globl lignment 8

9 Suffix Trees Substring problem: given text S of length m preprocess S in O(m) time such tht, given query string Q of length n, find occurrence (if ny) of Q in S in O(n) time Suffix trees solve this problem nd others 9

10 Suffix Tree Definition key property A suffix tree T for string S of length m is tree with the following properties: rooted nd directed m leves, lbeled 1 to m ech edge lbeled by substring of S conctention of edge lbels on pth from root to lef i is suffix i of S (we will denote this by Si...m) ech internl non-root node hs t lest two children edges out of node must begin with different chrcters 10

11 Suffixes S = bnn$ suffixes of S $ $ n$ n$ nn$ nn$ bnn$ 11

12 Suffix Tree Exmple S = bnn$ Add $ to end so tht suffix tree exists (no suffix is prefix of nother suffix) n $ $ n $ b n n $ n $ n $ $

13 Solving the Substring Problem Assume we hve suffix tree T FindMtch(Q, T): follow (unique) pth down from root of T ccording to chrcters in Q if ll of Q is found to be prefix of such pth return lbel of some lef below this pth else, return no mtch found 13

14 Solving the Substring Problem Q = nn Q = nb n $ $ n $ b n n $ n $ n $ $ 7 STOP n $ $ n $ b n n $ n $ n $ $ return return no mtch found 14

15 MUMs nd Generlized Suffix Trees Build one suffix tree for both genomes A nd B Lbel ech lef node with genome it represents Genome A: cccg# Genome B: cct$ cg# c g# t$ ech internl node represents repeted sequence A, 3 cg# c g# t$ A, 5 B, 3 A, 2 A, 4 B, 2 cg# t$ 15 A, 1 B, 1 ech lef represents suffix nd its position in sequence

16 MUMs nd Suffix Trees Unique mtch: internl node with 2 children, lef nodes from different genomes But these mtches re not necessrily mximl Genome A: cccg# Genome B: cct$ cg# c g# t$ A, 3 cg# c g# t$ A, 5 B, 3 A, 2 A, 4 B, 2 cg# t$ A, 1 B, 1 represents unique mtch 16

17 MUMs nd Suffix Trees To identify mximl mtches, cn compre suffixes following unique mtch nodes Genome A: ct# Genome B: c$ c t# A, 4 $ $ c t# t# $ B, 4 B, 3 A, 3 A, 2 B, 2 t# A, 1 $ B, 1 the suffixes following these two mtch nodes re the sme; the left one represents longer mtch (c) 17

18 Using Suffix Trees to Find MUMs O(n) time to construct suffix tree for both sequences (of lengths n) O(n) time to find MUMs - one scn of the tree (which is O(n) in size) O(n) possible MUMs in contrst to O(n 2 ) possible exct mtches Min prmeter of pproch: length of shortest MUM tht should be identified (20 50 bses) 18

19 Step 2: Chining in MUMmer Sort MUMs ccording to position in genome A Solve vrition of Longest Incresing Subsequence (LIS) problem to find sequences in scending order in both genomes Figure from: Delcher et l., Nucleic Acids Reserch 27,

20 Finding Longest Subsequence Unlike ordinry LIS problems, MUMmer tkes into ccount lengths of sequences represented by MUMs overlps Requires O( k log k) time where k is number of MUMs 20

21 Types of Gps in MUMmer Alignment Figure from: Delcher et l., Nucleic Acids Reserch 27,

22 Step 3: Close the Gps SNPs: between MUMs: trivil to detect otherwise: hndle like repets Insertions trnspositions (subsequences tht were deleted from one loction nd inserted elsewhere): look for out-of-sequence MUMs simple insertions: trivil to detect 22

23 Step 3: Close the Gps Polymorphic regions short ones: lign them with dynmic progrmming method long ones: cll MUMmer recursively with reduced minimum MUM length Repets detected by overlpping MUMs Figure from: Delcher et l. Nucleic Acids Reserch 27,

24 The LAGAN Method Brudno et l., Genome Reserch, 2003 Given: genomes A nd B nchors = find_nchors(a, B) step 3: finish globl lignment with DP constrined by nchors find_nchors(a, B) step 1: find locl lignments by mtching, chining k-mer seeds step 2: nchors = highest-weight sequence of locl lignments for ech pir of djcent nchors 1, 2 in nchors if 1, 2 re more thn d bses prt A, B = sequences between 1, 2 sub-nchors = find_nchors( A, B ) insert sub-nchors between 1, 2 in nchors return nchors 24

25 Step 1: Finding Seeds in LAGAN Degenerte k-mers: mtching k-long sequences with smll number of mismtches llowed By defult, LAGAN uses 10-mers nd llows 1 mismtch ccg cgcgctct cct ct cgcggtct cgt 25

26 Finding Seeds in LAGAN Exmple: trie to represent ll 3-mers of the sequence gccgcct c g c c g c c g t c 2 3, One sequence is used to build the trie The other sequence (the query) is wlked through to find mtching k-mers 26

27 Allowing Degenerte Mtches Suppose we re llowing 1 bse to mismtch in looking for mtches to the 3-mer cc; need to explore green nodes c g c c g c c g t 2 3, c 27

28 LAGAN Uses Threded Tries In threded trie, ech lef for word w 1...w p hs bck pointer to the node for w 2...w p c g c c g c c g t 2 3, c 28

29 Usully requires following only two pointers to mtch ginst the next k-mer, insted of trversing tree from root for ech 29 Trversing Threded Trie Consider trversing the trie to find 3-mer mtches for the query sequence: ccgt c g c c g c c g t 2 3, c

30 Step 1b: Chining Seeds in LAGAN Cn chin seeds s 1 nd s 2 if the indices of s 1 > indices of s 2 (for both sequences) s 1 nd s 2 re ner ech other Keep trck of seeds in the serch box s the query sequence is processed Figure from: Brudno et l. BMC Bioinformtics,

31 Step 2: Chining in LAGAN Use sprse dynmic progrmming to chin locl lignments 31

32 The Problem: Find Chin of Locl Alignments (x,y) (x,y ) requires x < x y < y Ech locl lignment hs weight FIND the chin with highest totl weight Slide from Serfim Btzoglou, Stnford University 32

33 Sprse DP for rectngle chining 1,, N: rectngles h (h j, l j ): y-coordintes of rectngle j l w(j): weight of rectngle j V(j): optiml score of chin ending in j L: list of triplets (l j, V(j), j) y L is sorted by l j : smllest (North) to lrgest (South) vlue L is implemented s blnced binry tree Slide from Serfim Btzoglou, Stnford University 33

34 Sprse DP for rectngle chining Min ide: Sweep through x- coordintes To the right of b, nything chinble to is chinble to b Therefore, if V(b) > V(), rectngle is useless for subsequent chining In L, keep rectngles j sorted with incresing l j - coordintes sorted with incresing V(j) score V() V(b) Slide from Serfim Btzoglou, Stnford University 34

35 Sprse DP for rectngle chining Go through rectngle x-coordintes, from lowest to highest: 1. When on the leftmost end of rectngle i: j. j: rectngle in L, with lrgest l j < h i b. V(i) = w(i) + V(j) k i 2. When on the rightmost end of i:. k: rectngle in L, with lrgest l k l i b. If V(i) > V(k): i. INSERT (l i, V(i), i) in L ii. REMOVE ll (l j, V(j), j) with V(j) V(i) & l j l i Slide from Serfim Btzoglou, Stnford University 35

36 Exmple x y : 5 c: 3 b: 6 d: 4 e: 2 V L b c d e l i V(i) i cb d e 1. When on the leftmost end of rectngle i:. j: rectngle in L, with lrgest l j < h i b. V(i) = w(i) + V(j) 36 Slide from Serfim Btzoglou, Stnford University 2. When on the rightmost end of i:. k: rectngle in L, with lrgest l k l i b. If V(i) > V(k): i. INSERT (l i, V(i), i) in L ii. REMOVE ll (l j, V(j), j) with V(j) V(i) & l j l i

37 Time Anlysis 1. Sorting the x-coords tkes O(N log N) 2. Going through x-coords: N steps 3. Ech of N steps requires O(log N) time: Serching L tkes log N Inserting to L tkes log N All deletions re consecutive, so log N per deletion Ech element is deleted t most once: N log N for ll deletions Recll tht INSERT, DELETE, SUCCESSOR, tke O(log N) time in blnced binry serch tree Slide from Serfim Btzoglou, Stnford University 37

38 Constrined Dynmic If we know tht the i th element in one sequence must lign with the j th element in the other, we cn ignore two rectngles in the DP mtrix Progrmming j i 38

39 Step 3: Computing the Globl Alignment in LAGAN Given n nchor tht strts t (i, j) nd ends t (i, j ), LAGAN limits the DP to the unshded regions Thus nchors re somewht flexible Figure from: Brudno et l. Genome Reserch,

40 Step 3: Computing the Globl Alignment in LAGAN Figures from: Brudno et l. Genome Reserch,

41 Exmple Alignment: E. Coli O157:H7 vs. E. coli K-12 Figure from: Pern et l. Nture,

42 Comprison of Lrge-Scle Alignment Methods Method Pttern mtching Chining MUMmer suffix tree - MUMs LIS vrint AVID (not discussed) suffix tree - exct & wobble mtches Smith-Wtermn vrint LAGAN k-mer trie, inexct mtches sprse DP 42

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer

More information

Where did dynamic programming come from?

Where did dynamic programming come from? Where did dynmic progrmming come from? String lgorithms Dvid Kuchk cs302 Spring 2012 Richrd ellmn On the irth of Dynmic Progrmming Sturt Dreyfus http://www.eng.tu.c.il/~mi/cd/ or50/1526-5463-2002-50-01-0048.pdf

More information

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n )

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n ) Alignment Grph Alignment Mtrix Computing the Optiml Globl Alignment Vlue An Introduction to Bioinformtics Algorithms A = n c t 2 3 c c 4 g 5 g 6 7 8 9 B = n 0 c g c g 2 3 4 5 6 7 8 t 9 0 2 3 4 5 6 7 8

More information

Balanced binary search trees

Balanced binary search trees 02110 Inge Li Gørtz Overview Blnced binry serch trees: Red-blck trees nd 2-3-4 trees Amortized nlysis Dynmic progrmming Network flows String mtching String indexing Computtionl geometry Introduction to

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

1 APL13: Suffix Arrays: more space reduction

1 APL13: Suffix Arrays: more space reduction 1 APL13: Suffix Arrys: more spce reduction In Section??, we sw tht when lphbet size is included in the time nd spce bounds, the suffix tree for string of length m either requires Θ(m Σ ) spce or the minimum

More information

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms Preview Greed Algorithms Greed Algorithms Coin Chnge Huffmn Code Greed lgorithms end to e simple nd strightforwrd. Are often used to solve optimiztion prolems. Alws mke the choice tht looks est t the moment,

More information

Fingerprint idea. Assume:

Fingerprint idea. Assume: Fingerprint ide Assume: We cn compute fingerprint f(p) of P in O(m) time. If f(p) f(t[s.. s+m 1]), then P T[s.. s+m 1] We cn compre fingerprints in O(1) We cn compute f = f(t[s+1.. s+m]) from f(t[s.. s+m

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment

More information

Global alignment. Genome Rearrangements Finding preserved genes. Lecture 18

Global alignment. Genome Rearrangements Finding preserved genes. Lecture 18 Computt onl Biology Leture 18 Genome Rerrngements Finding preserved genes We hve seen before how to rerrnge genome to obtin nother one bsed on: Reversls Knowledge of preserved bloks (or genes) Now we re

More information

Tries and suffixes trees

Tries and suffixes trees Trie: A dt-structure for set of words Tries nd suffixes trees Alon Efrt Comuter Science Dertment University of Arizon All words over the lhet Σ={,,..z}. In the slides, let sy tht the lhet is only {,,c,d}

More information

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!)

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!) CMSC 330: Orgniztion of Progrmming Lnguges DFAs, nd NFAs, nd Regexps (Oh my!) CMSC330 Spring 2018 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All

More information

(e) if x = y + z and a divides any two of the integers x, y, or z, then a divides the remaining integer

(e) if x = y + z and a divides any two of the integers x, y, or z, then a divides the remaining integer Divisibility In this note we introduce the notion of divisibility for two integers nd b then we discuss the division lgorithm. First we give forml definition nd note some properties of the division opertion.

More information

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute Victor Admchik Dnny Sletor Gret Theoreticl Ides In Computer Science CS 5-25 Spring 2 Lecture 2 Mr 3, 2 Crnegie Mellon University Deterministic Finite Automt Finite Automt A mchine so simple tht you cn

More information

What s Behind BLAST. Gene Myers, Director MPI for Cell Biology and Genetics Dresden, DE

What s Behind BLAST. Gene Myers, Director MPI for Cell Biology and Genetics Dresden, DE Wht s Behind BLAST Gene Myers, Director MPI for Cell Biology nd Genetics Dresden, DE Approximte String Serch Given string A of length n, query Q of length p n, n lignment scoring function δ, nd threshold

More information

Strong Bisimulation. Overview. References. Actions Labeled transition system Transition semantics Simulation Bisimulation

Strong Bisimulation. Overview. References. Actions Labeled transition system Transition semantics Simulation Bisimulation Strong Bisimultion Overview Actions Lbeled trnsition system Trnsition semntics Simultion Bisimultion References Robin Milner, Communiction nd Concurrency Robin Milner, Communicting nd Mobil Systems 32

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

Faster Regular Expression Matching. Philip Bille Mikkel Thorup

Faster Regular Expression Matching. Philip Bille Mikkel Thorup Fster Regulr Expression Mtching Philip Bille Mikkel Thorup Outline Definition Applictions History tour of regulr expression mtching Thompson s lgorithm Myers lgorithm New lgorithm Results nd extensions

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

4. GREEDY ALGORITHMS I

4. GREEDY ALGORITHMS I 4. GREEDY ALGORITHMS I coin chnging intervl scheduling scheduling to minimize lteness optiml cching Lecture slides by Kevin Wyne Copyright 2005 Person-Addison Wesley http://www.cs.princeton.edu/~wyne/kleinberg-trdos

More information

1.4 Nonregular Languages

1.4 Nonregular Languages 74 1.4 Nonregulr Lnguges The number of forml lnguges over ny lphbet (= decision/recognition problems) is uncountble On the other hnd, the number of regulr expressions (= strings) is countble Hence, ll

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs Prm University, Mth. Deprtment Summry of lecture 9 Algorithms nd Dt Structures Disjoint sets Summry of this lecture: (CLR.1-3) Dt Structures for Disjoint sets: Union opertion Find opertion Mrco Pellegrini

More information

CS:4330 Theory of Computation Spring Regular Languages. Equivalences between Finite automata and REs. Haniel Barbosa

CS:4330 Theory of Computation Spring Regular Languages. Equivalences between Finite automata and REs. Haniel Barbosa CS:4330 Theory of Computtion Spring 208 Regulr Lnguges Equivlences between Finite utomt nd REs Hniel Brbos Redings for this lecture Chpter of [Sipser 996], 3rd edition. Section.3. Finite utomt nd regulr

More information

USA Mathematical Talent Search Round 1 Solutions Year 21 Academic Year

USA Mathematical Talent Search Round 1 Solutions Year 21 Academic Year 1/1/21. Fill in the circles in the picture t right with the digits 1-8, one digit in ech circle with no digit repeted, so tht no two circles tht re connected by line segment contin consecutive digits.

More information

The area under the graph of f and above the x-axis between a and b is denoted by. f(x) dx. π O

The area under the graph of f and above the x-axis between a and b is denoted by. f(x) dx. π O 1 Section 5. The Definite Integrl Suppose tht function f is continuous nd positive over n intervl [, ]. y = f(x) x The re under the grph of f nd ove the x-xis etween nd is denoted y f(x) dx nd clled the

More information

1.3 Regular Expressions

1.3 Regular Expressions 56 1.3 Regulr xpressions These hve n importnt role in describing ptterns in serching for strings in mny pplictions (e.g. wk, grep, Perl,...) All regulr expressions of lphbet re 1.Ønd re regulr expressions,

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018 DATA620006 魏忠钰 Serch I Mrch 7 th, 2018 Outline Serch Problems Uninformed Serch Depth-First Serch Bredth-First Serch Uniform-Cost Serch Rel world tsk - Pc-mn Serch problems A serch problem consists of:

More information

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1 Mth 33 Volume Stewrt 5.2 Geometry of integrls. In this section, we will lern how to compute volumes using integrls defined by slice nlysis. First, we recll from Clculus I how to compute res. Given the

More information

Introduction to Computational Molecular Biology. Suffix Trees

Introduction to Computational Molecular Biology. Suffix Trees 18.417 Itroductio to Computtiol Moleculr Biology Lecture 11: October 14, 004 Scribe: Athich Muthitchroe Lecturer: Ross Lippert Editor: Toy Scelfo Suffix Trees This is oe of the most complicted dt structures

More information

Solution for Assignment 1 : Intro to Probability and Statistics, PAC learning

Solution for Assignment 1 : Intro to Probability and Statistics, PAC learning Solution for Assignment 1 : Intro to Probbility nd Sttistics, PAC lerning 10-701/15-781: Mchine Lerning (Fll 004) Due: Sept. 30th 004, Thursdy, Strt of clss Question 1. Bsic Probbility ( 18 pts) 1.1 (

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages C 314 Principles of Progrmming Lnguges Lecture 6: LL(1) Prsing Zheng (Eddy) Zhng Rutgers University Ferury 5, 2018 Clss Informtion Homework 2 due tomorrow. Homework 3 will e posted erly next week. 2 Top

More information

Winter 2016 COMP-250: Introduction to Computer Science. Lecture 24, April 7, 2016

Winter 2016 COMP-250: Introduction to Computer Science. Lecture 24, April 7, 2016 Winter 2016 COMP-250: Introduction to Computer Science Lecture 24, April 7, 2016 Tries 1 2 3 4 5 Tries Atrie is tree-sed dt dte structure for storing strings in order to mke pttern mtching fster. Tries

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

Scientific notation is a way of expressing really big numbers or really small numbers.

Scientific notation is a way of expressing really big numbers or really small numbers. Scientific Nottion (Stndrd form) Scientific nottion is wy of expressing relly big numbers or relly smll numbers. It is most often used in scientific clcultions where the nlysis must be very precise. Scientific

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

State Minimization for DFAs

State Minimization for DFAs Stte Minimiztion for DFAs Red K & S 2.7 Do Homework 10. Consider: Stte Minimiztion 4 5 Is this miniml mchine? Step (1): Get rid of unrechle sttes. Stte Minimiztion 6, Stte is unrechle. Step (2): Get rid

More information

Lecture 21: Order statistics

Lecture 21: Order statistics Lecture : Order sttistics Suppose we hve N mesurements of sclr, x i =, N Tke ll mesurements nd sort them into scending order x x x 3 x N Define the mesured running integrl S N (x) = 0 for x < x = i/n for

More information

We partition C into n small arcs by forming a partition of [a, b] by picking s i as follows: a = s 0 < s 1 < < s n = b.

We partition C into n small arcs by forming a partition of [a, b] by picking s i as follows: a = s 0 < s 1 < < s n = b. Mth 255 - Vector lculus II Notes 4.2 Pth nd Line Integrls We begin with discussion of pth integrls (the book clls them sclr line integrls). We will do this for function of two vribles, but these ides cn

More information

Lecture 2: January 27

Lecture 2: January 27 CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full

More information

The Greedy Algorithm for the Minimum Common String Partition Problem

The Greedy Algorithm for the Minimum Common String Partition Problem The Greedy Algorithm for the Minimum Common String Prtition Problem Mrek Chrobk Petr Kolmn Jiří Sgll July 23, 2004 Abstrct In the Minimum Common String Prtition problem (MCSP) we re given two strings on

More information

Algorithms in Computational. Biology. More on BWT

Algorithms in Computational. Biology. More on BWT Algorithms in Computtionl Biology More on BWT tody Plese Lst clss! don't forget to submit And by next (vi emil, repo ) implementtion week or shre prgectfltw get Not I would like reding overview! Discuss

More information

This lecture covers Chapter 8 of HMU: Properties of CFLs

This lecture covers Chapter 8 of HMU: Properties of CFLs This lecture covers Chpter 8 of HMU: Properties of CFLs Turing Mchine Extensions of Turing Mchines Restrictions of Turing Mchines Additionl Reding: Chpter 8 of HMU. Turing Mchine: Informl Definition B

More information

Lexical Analysis Part III

Lexical Analysis Part III Lexicl Anlysis Prt III Chpter 3: Finite Automt Slides dpted from : Roert vn Engelen, Florid Stte University Alex Aiken, Stnford University Design of Lexicl Anlyzer Genertor Trnslte regulr expressions to

More information

Solving the String Statistics Problem in Time O(n log n)

Solving the String Statistics Problem in Time O(n log n) Alcom-FT Technicl Report Series ALCOMFT-TR-02-55 Solving the String Sttistics Prolem in Time O(n log n) Gerth Stølting Brodl 1,,, Rune B. Lyngsø 3, Ann Östlin1,, nd Christin N. S. Pedersen 1,2, 1 BRICS,

More information

Lecture 1: Introduction to integration theory and bounded variation

Lecture 1: Introduction to integration theory and bounded variation Lecture 1: Introduction to integrtion theory nd bounded vrition Wht is this course bout? Integrtion theory. The first question you might hve is why there is nything you need to lern bout integrtion. You

More information

FLAG: Fast Local Alignment Generating Methodology. Abstract. Introduction

FLAG: Fast Local Alignment Generating Methodology. Abstract. Introduction Romnin Biotechnologicl Letters Vol 8, No, 23 Copyright 23 University of Buchrest Printed in Romni All rights reserved SHORT COMMUNICATION FLAG: Fst Locl Alignment Generting Methodology Abstrct Received

More information

Natural examples of rings are the ring of integers, a ring of polynomials in one variable, the ring

Natural examples of rings are the ring of integers, a ring of polynomials in one variable, the ring More generlly, we define ring to be non-empty set R hving two binry opertions (we ll think of these s ddition nd multipliction) which is n Abelin group under + (we ll denote the dditive identity by 0),

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

1 Structural induction, finite automata, regular expressions

1 Structural induction, finite automata, regular expressions Discrete Structures Prelim 2 smple uestions s CS2800 Questions selected for spring 2017 1 Structurl induction, finite utomt, regulr expressions 1. We define set S of functions from Z to Z inductively s

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

Java II Finite Automata I

Java II Finite Automata I Jv II Finite Automt I Bernd Kiefer Bernd.Kiefer@dfki.de Deutsches Forschungszentrum für künstliche Intelligenz Finite Automt I p.1/13 Processing Regulr Expressions We lredy lerned out Jv s regulr expression

More information

Formal languages, automata, and theory of computation

Formal languages, automata, and theory of computation Mälrdlen University TEN1 DVA337 2015 School of Innovtion, Design nd Engineering Forml lnguges, utomt, nd theory of computtion Thursdy, Novemer 5, 14:10-18:30 Techer: Dniel Hedin, phone 021-107052 The exm

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design nd Anlysis LECTURE 12 Solving Recurrences Mster Theorem Adm Smith Review Question: Exponentition Problem: Compute b, where b N is n bits long. Question: How mny multiplictions? Nive lgorithm:

More information

First Midterm Examination

First Midterm Examination 24-25 Fll Semester First Midterm Exmintion ) Give the stte digrm of DFA tht recognizes the lnguge A over lphet Σ = {, } where A = {w w contins or } 2) The following DFA recognizes the lnguge B over lphet

More information

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018 CS 301 Lecture 04 Regulr Expressions Stephen Checkowy Jnury 29, 2018 1 / 35 Review from lst time NFA N = (Q, Σ, δ, q 0, F ) where δ Q Σ P (Q) mps stte nd n lphet symol (or ) to set of sttes We run n NFA

More information

Physics 201 Lab 3: Measurement of Earth s local gravitational field I Data Acquisition and Preliminary Analysis Dr. Timothy C. Black Summer I, 2018

Physics 201 Lab 3: Measurement of Earth s local gravitational field I Data Acquisition and Preliminary Analysis Dr. Timothy C. Black Summer I, 2018 Physics 201 Lb 3: Mesurement of Erth s locl grvittionl field I Dt Acquisition nd Preliminry Anlysis Dr. Timothy C. Blck Summer I, 2018 Theoreticl Discussion Grvity is one of the four known fundmentl forces.

More information

On Suffix Tree Breadth

On Suffix Tree Breadth On Suffix Tree Bredth Golnz Bdkoeh 1,, Juh Kärkkäinen 2, Simon J. Puglisi 2,, nd Bell Zhukov 2, 1 Deprtment of Computer Science University of Wrwick Conventry, United Kingdom g.dkoeh@wrwick.c.uk 2 Helsinki

More information

Math 1B, lecture 4: Error bounds for numerical methods

Math 1B, lecture 4: Error bounds for numerical methods Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the

More information

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

INTRODUCTION TO LINEAR ALGEBRA

INTRODUCTION TO LINEAR ALGEBRA ME Applied Mthemtics for Mechnicl Engineers INTRODUCTION TO INEAR AGEBRA Mtrices nd Vectors Prof. Dr. Bülent E. Pltin Spring Sections & / ME Applied Mthemtics for Mechnicl Engineers INTRODUCTION TO INEAR

More information

NFAs and Regular Expressions. NFA-ε, continued. Recall. Last class: Today: Fun:

NFAs and Regular Expressions. NFA-ε, continued. Recall. Last class: Today: Fun: CMPU 240 Lnguge Theory nd Computtion Spring 2019 NFAs nd Regulr Expressions Lst clss: Introduced nondeterministic finite utomt with -trnsitions Tody: Prove n NFA- is no more powerful thn n NFA Introduce

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Frobenius numbers of generalized Fibonacci semigroups

Frobenius numbers of generalized Fibonacci semigroups Frobenius numbers of generlized Fiboncci semigroups Gretchen L. Mtthews 1 Deprtment of Mthemticl Sciences, Clemson University, Clemson, SC 29634-0975, USA gmtthe@clemson.edu Received:, Accepted:, Published:

More information

Exact Matching. Exact Matching Algorithms 5/19/2015. Exact Matching Problem: search pattern P in text T (P,T are strings)

Exact Matching. Exact Matching Algorithms 5/19/2015. Exact Matching Problem: search pattern P in text T (P,T are strings) Exct Mtching Exct Mtching Problem: Exct Mtching Algorithms serch pttern P in text T (P,T re strings) Knuth Morris Prtt preprocess pttern P Aho Corsick preprocess pttern P of severl strings P = { P 1,,

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Graph Theory. Dr. Saad El-Zanati, Faculty Mentor Ryan Bunge Graduate Assistant Illinois State University REU. Graph Theory

Graph Theory. Dr. Saad El-Zanati, Faculty Mentor Ryan Bunge Graduate Assistant Illinois State University REU. Graph Theory Grph Theory Gibson, Ngel, Stnley, Zle Specil Types of Bckground Motivtion Grph Theory Dniel Gibson, Concordi University Jckelyn Ngel, Dominicn University Benjmin Stnley, New Mexico Stte University Allison

More information

Lecture Note 9: Orthogonal Reduction

Lecture Note 9: Orthogonal Reduction MATH : Computtionl Methods of Liner Algebr 1 The Row Echelon Form Lecture Note 9: Orthogonl Reduction Our trget is to solve the norml eution: Xinyi Zeng Deprtment of Mthemticl Sciences, UTEP A t Ax = A

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

GRADE 4. Division WORKSHEETS

GRADE 4. Division WORKSHEETS GRADE Division WORKSHEETS Division division is shring nd grouping Division cn men shring or grouping. There re cndies shred mong kids. How mny re in ech shre? = 3 There re 6 pples nd go into ech bsket.

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

GNFA GNFA GNFA GNFA GNFA

GNFA GNFA GNFA GNFA GNFA DFA RE NFA DFA -NFA REX GNFA Definition GNFA A generlize noneterministic finite utomton (GNFA) is grph whose eges re lele y regulr expressions, with unique strt stte with in-egree, n unique finl stte with

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Math 61CM - Solutions to homework 9

Math 61CM - Solutions to homework 9 Mth 61CM - Solutions to homework 9 Cédric De Groote November 30 th, 2018 Problem 1: Recll tht the left limit of function f t point c is defined s follows: lim f(x) = l x c if for ny > 0 there exists δ

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 Automt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Problem (II) Chpter II.6.: Push Down Automt Remrk: This mteril is no longer tught nd not directly exm relevnt Anton Setzer (Bsed

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

CISC 4090 Theory of Computation

CISC 4090 Theory of Computation 9/6/28 Stereotypicl computer CISC 49 Theory of Computtion Finite stte mchines & Regulr lnguges Professor Dniel Leeds dleeds@fordhm.edu JMH 332 Centrl processing unit (CPU) performs ll the instructions

More information

Part 5 out of 5. Automata & languages. A primer on the Theory of Computation. Last week was all about. a superset of Regular Languages

Part 5 out of 5. Automata & languages. A primer on the Theory of Computation. Last week was all about. a superset of Regular Languages Automt & lnguges A primer on the Theory of Computtion Lurent Vnbever www.vnbever.eu Prt 5 out of 5 ETH Zürich (D-ITET) October, 19 2017 Lst week ws ll bout Context-Free Lnguges Context-Free Lnguges superset

More information

Recitation 3: More Applications of the Derivative

Recitation 3: More Applications of the Derivative Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech

More information

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh Lnguges nd Automt Finite Automt Informtics 2A: Lecture 3 John Longley School of Informtics University of Edinburgh jrl@inf.ed.c.uk 22 September 2017 1 / 30 Lnguges nd Automt 1 Lnguges nd Automt Wht is

More information

Closure Properties of Regular Languages

Closure Properties of Regular Languages Closure Properties of Regulr Lnguges Regulr lnguges re closed under mny set opertions. Let L 1 nd L 2 e regulr lnguges. (1) L 1 L 2 (the union) is regulr. (2) L 1 L 2 (the conctention) is regulr. (3) L

More information

The Knapsack Problem. COSC 3101A - Design and Analysis of Algorithms 9. Fractional Knapsack Problem. Fractional Knapsack Problem

The Knapsack Problem. COSC 3101A - Design and Analysis of Algorithms 9. Fractional Knapsack Problem. Fractional Knapsack Problem The Knpsck Prolem COSC A - Design nd Anlsis of Algorithms Knpsck Prolem Huffmn Codes Introduction to Grphs Mn of these slides re tken from Monic Nicolescu, Univ. of Nevd, Reno, monic@cs.unr.edu The - knpsck

More information

A recursive construction of efficiently decodable list-disjunct matrices

A recursive construction of efficiently decodable list-disjunct matrices CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct

More information

1 Error Analysis of Simple Rules for Numerical Integration

1 Error Analysis of Simple Rules for Numerical Integration cs41: introduction to numericl nlysis 11/16/10 Lecture 19: Numericl Integrtion II Instructor: Professor Amos Ron Scries: Mrk Cowlishw, Nthnel Fillmore 1 Error Anlysis of Simple Rules for Numericl Integrtion

More information

Student Activity 3: Single Factor ANOVA

Student Activity 3: Single Factor ANOVA MATH 40 Student Activity 3: Single Fctor ANOVA Some Bsic Concepts In designed experiment, two or more tretments, or combintions of tretments, is pplied to experimentl units The number of tretments, whether

More information

8 Laplace s Method and Local Limit Theorems

8 Laplace s Method and Local Limit Theorems 8 Lplce s Method nd Locl Limit Theorems 8. Fourier Anlysis in Higher DImensions Most of the theorems of Fourier nlysis tht we hve proved hve nturl generliztions to higher dimensions, nd these cn be proved

More information

Chapter 2 Finite Automata

Chapter 2 Finite Automata Chpter 2 Finite Automt 28 2.1 Introduction Finite utomt: first model of the notion of effective procedure. (They lso hve mny other pplictions). The concept of finite utomton cn e derived y exmining wht

More information

Math 426: Probability Final Exam Practice

Math 426: Probability Final Exam Practice Mth 46: Probbility Finl Exm Prctice. Computtionl problems 4. Let T k (n) denote the number of prtitions of the set {,..., n} into k nonempty subsets, where k n. Argue tht T k (n) kt k (n ) + T k (n ) by

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014 CS125 Lecture 12 Fll 2014 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Lecture 17. Integration: Gauss Quadrature. David Semeraro. University of Illinois at Urbana-Champaign. March 20, 2014

Lecture 17. Integration: Gauss Quadrature. David Semeraro. University of Illinois at Urbana-Champaign. March 20, 2014 Lecture 17 Integrtion: Guss Qudrture Dvid Semerro University of Illinois t Urbn-Chmpign Mrch 0, 014 Dvid Semerro (NCSA) CS 57 Mrch 0, 014 1 / 9 Tody: Objectives identify the most widely used qudrture method

More information