SUBSTRING SEARCH BBM ALGORITHMS DEPT. OF COMPUTER ENGINEERING

Size: px
Start display at page:

Download "SUBSTRING SEARCH BBM ALGORITHMS DEPT. OF COMPUTER ENGINEERING"

Transcription

1 BBM LGORITHMS DEPT. OF OMPUTER ENGINEERING SUBSTRING SERH cknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University. 1

2 TODY Substring search Brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp 2

3 Substring search Goal. Find pattern of length M in a text of length N. typically N >> M pattern text N E E D L E I N H Y S T K N E E D L E I N match 3 3

4 Substring search applications Goal. Find pattern of length M in a text of length N. typically N >> M pattern N E E D L E text I N H Y S T K N E E D L E I N match Substring search omputer forensics. Search memory or disk for signatures, e.g., all URLs or RS keys that the user has entered

5 Substring search applications Goal. Find pattern of length M in a text of length N. typically N >> M pattern text N E E D L E I N H Y S T K N E E D L E I N match Identify patterns indicative of spam. PROFITS L0SE WE1GHT There is no catch. This is a one-time mailing. This message is sent in compliance with spam regulations. 5 5

6 Substring search applications Electronic surveillance. Need to monitor all internet traffic. (security) No way! (privacy) Well, we re mainly interested in TTK T DWN OK. Build a machine that just looks for that. TTK T DWN substring search machine found 6 6

7 Substring search applications Screen scraping. Extract relevant data from web page. Ex. Find string delimited by <b> and </b> after first occurrence of pattern Last Trade:. <tr> <td class= "yfnc_tablehead1" width= "48%"> Last Trade: </td> <td class= "yfnc_tabledata1"> <big><b>452.92</b></big> </td></tr> <td class= "yfnc_tablehead1" width= "48%"> Trade Time: </td> <td class= "yfnc_tabledata1">

8 Screen scraping: Java implementation Java library. The indexof() method in Java's string library returns the index of the first occurrence of a given string, starting at a given offset. public class StockQuote { public static void main(string[] args) { String name = " In in = new In(name + args[0]); String text = in.readll(); int start = text.indexof("last Trade:", 0); int from = text.indexof("<b>", start); int to = text.indexof("</b>", from); String price = text.substring(from + 3, to); StdOut.println(price); } } % java StockQuote goog % java StockQuote msft

9 Brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp SUBSTRING SERH 9

10 Brute-force substring search heck for pattern starting at each text position. i j i+j txt B D B R B R pat B R entries in red are mismatches B R B R entries in gray are for reference only B R entries in black match the text B R B R return i when j is M match 10 10

11 Brute-force substring search: Java implementation heck for pattern starting at each text position. i j i+j B D B R D R D R public static int search(string pat, String txt) { int M = pat.length(); int N = txt.length(); for (int i = 0; i <= N - M; i++) { int j; } for (j = 0; j < M; j++) if (txt.chart(i+j)!= pat.chart(j)) break; index in text where if (j == M) return i; pattern starts } return N; not found 11 11

12 Brute-force substring search: worst case Brute-force algorithm can be slow if text and pattern are repetitive. i j i+j txt B B pat B B B B B Brute-force substring search (worst case) match Worst case. ~ M N char compares

13 Backup In many applications, we want to avoid backup in text stream. Treat input as stream of data. bstract model: standard input. Brute-force algorithm needs backup for every mismatch. matched chars backup pproach 1. Maintain buffer of last M characters. B pproach 2. Stay tuned. TTK T DWN substring search machine found B shift pattern right one position B mismatch B 13 13

14 Brute-force substring search: alternate implementation Same sequence of char compares as previous implementation. i points to end of sequence of already-matched chars in text. j stores number of already-matched chars (end of sequence in pattern). i j B D B R 7 3 D R 5 0 D R public static int search(string pat, String txt) { int i, N = txt.length(); int j, M = pat.length(); for (i = 0, j = 0; i < N && j < M; i++) { if (txt.chart(i) == pat.chart(j)) j++; else { i -= j; j = 0; } } if (j == M) return i - M; } else return N; backup 14 14

15 lgorithmic challenges in substring search Brute-force is not always good enough. Theoretical challenge. Linear-time guarantee. Practical challenge. void backup in text stream. fundamental algorithmic problem often no room or time to save text Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their attack at dawn party. Now is the time for each person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party

16 Brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp SUBSTRING SERH 16

17 Knuth-Morris-Pratt substring search Intuition. Suppose we are searching in text for pattern B. Suppose we match 5 chars in pattern, with mismatch on 6 th char. We know previous 6 chars in text are BB. Don't need to back up text pointer! text after mismatch on sixth char brute-force backs up to try this B B B and this and this B and this but no backup is needed i B and this assuming {, B } alphabet pattern B B B B Knuth-Morris-Pratt algorithm. lever method to always avoid backup. (!) 17 17

18 Deterministic finite state automaton (DF) DF is abstract string-searching machine. Finite number of states (including start and halt). Exactly one transition for each char in alphabet. ccept if sequence of transitions leads to halt state. internal representation j pat.chart(j) dfa[][j] B B B If in state j reading char c: if j is 6 halt and accept else move to state dfa[c][j] graphical representation B, B 0 1 B 2 3 B B, B, 18 18

19 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 19 19

20 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 20 20

21 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 21 21

22 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 22 22

23 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 23 23

24 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 24 24

25 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 25 25

26 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 26 26

27 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 27 27

28 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 28 28

29 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 29 29

30 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 30 30

31 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, B, 31 31

32 DF simulation B B B pat.chart(j) dfa[][j] B B B B, B 0 1 B 2 3 B B, substring found B, 32 32

33 Interpretation of Knuth-Morris-Pratt DF Q. What is interpretation of DF state after reading in txt[i]?. State = number of characters in pattern that have been matched. Ex. DF is in state 3 after reading in txt[0..6]. length of longest prefix of pat[] that is a suffix of txt[0..i] i txt B B B suffix of text[0..6] pat B B prefix of pat[] B, B 0 1 B 2 3 B B, B, 33 33

34 Knuth-Morris-Pratt substring search: Java implementation Key differences from brute-force implementation. Need to precompute dfa[][] from pattern. Text pointer i never decrements. public int search(string txt) { int i, j, N = txt.length(); } for (i = 0, j = 0; i < N && j < M; i++) j = dfa[txt.chart(i)][j]; if (j == M) return i - M; else return N; Running time. Simulate DF on text: at most N character accesses. Build DF: how to do efficiently? [warning: tricky algorithm ahead] no backup 34 34

35 Knuth-Morris-Pratt substring search: Java implementation Key differences from brute-force implementation. Need to precompute dfa[][] from pattern. Text pointer i never decrements. ould use input stream. public int search(in in) { int i, j; } for (i = 0, j = 0;!in.isEmpty() && j < M; i++) j = dfa[in.readhar()][j]; if (j == M) return i - M; else return NOT_FOUND; no backup B, B 0 1 B 2 3 B B, B, 35 35

36 Knuth-Morris-Pratt construction Include one state for each character in pattern (plus accept state). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B

37 Knuth-Morris-Pratt construction Match transition. If in state j and next char c == pat.chart(j), go to j+1. first j characters of pattern have already been matched next char matches now first j+1 characters of pattern have been matched pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B 0 1 B 2 3 B

38 Knuth-Morris-Pratt construction Mismatch transition: back up if c!= pat.chart(j). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B

39 Knuth-Morris-Pratt construction Mismatch transition: back up if c!= pat.chart(j). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B

40 Knuth-Morris-Pratt construction Mismatch transition: back up if c!= pat.chart(j). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B B, 40 40

41 Knuth-Morris-Pratt construction Mismatch transition: back up if c!= pat.chart(j). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B B, 41 41

42 Knuth-Morris-Pratt construction Mismatch transition: back up if c!= pat.chart(j). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B B, B, 42 42

43 Knuth-Morris-Pratt construction Mismatch transition: back up if c!= pat.chart(j). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, B j 0 1 B 2 3 B B, B, 43 43

44 Knuth-Morris-Pratt construction pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, B 0 1 B 2 3 B B, B, 44 44

45 How to build DF from pattern? Include one state for each character in pattern (plus accept state). pat.chart(j) dfa[][j] B B B

46 How to build DF from pattern? Match transition. If in state j and next char c == pat.chart(j), go to j+1. first j characters of pattern have already been matched next char matches now first j+1 characters of pattern have been matched pat.chart(j) dfa[][j] B B B B 2 3 B

47 How to build DF from pattern? Mismatch transition. If in state j and next char c!= pat.chart(j), then the last j-1 characters of input are pat[1..j-1], followed by c. To compute dfa[c][j]: Simulate pat[1..j-1] on DF and take transition c. Running time. Seems to require j steps. still under construction (!) Ex. dfa[''][5] = 1; dfa['b'][5] = 4 simulate BB; simulate BB; take transition '' take transition 'B' = dfa[''][3] = dfa['b'][3] j pat.chart(j) B B simulation B, of BB B j 0 1 B 2 3 B B, B, 47 47

48 How to build DF from pattern? Mismatch transition. If in state j and next char c!= pat.chart(j), then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DF and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa[''][5] = 1; dfa['b'][5] = 4; from state X, from state X, take transition '' take transition 'B' = dfa[''][x] = dfa['b'][x] X'= 0 from state X, take transition '' = dfa[''][x] B B B, X B j 0 1 B 2 3 B B, B, 48 48

49 Knuth-Morris-Pratt construction (in linear time) Include one state for each character in pattern (plus accept state). pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B

50 Knuth-Morris-Pratt construction (in linear time) Match transition. For each state j, dfa[pat.chart(j)][j] = j+1. first j characters of pattern have already been matched now first j+1 characters of pattern have been matched pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B 0 1 B 2 3 B

51 Knuth-Morris-Pratt construction (in linear time) Mismatch transition. For state 0 and char c!= pat.chart(j), set dfa[c][0] = 0. pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B

52 Knuth-Morris-Pratt construction (in linear time) Mismatch transition. For each state j and char c!= pat.chart(j), set dfa[c][j] = dfa[c][x]; then update X = dfa[pat.chart(j)][x]. X = simulation of empty string pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B X 52 52

53 Knuth-Morris-Pratt construction (in linear time) Mismatch transition. For each state j and char c!= pat.chart(j), set dfa[c][j] = dfa[c][x]; then update X = dfa[pat.chart(j)][x]. X = simulation of B pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j X 0 1 B 2 3 B B, 53 53

54 Knuth-Morris-Pratt construction (in linear time) Mismatch transition. For each state j and char c!= pat.chart(j), set dfa[c][j] = dfa[c][x]; then update X = dfa[pat.chart(j)][x]. X = simulation of B pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B X B, 54 54

55 Knuth-Morris-Pratt construction (in linear time) Mismatch transition. For each state j and char c!= pat.chart(j), set dfa[c][j] = dfa[c][x]; then update X = dfa[pat.chart(j)][x]. X = simulation of B B pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j 0 1 B 2 3 B X B, B, 55 55

56 Knuth-Morris-Pratt construction (in linear time) Mismatch transition. For each state j and char c!= pat.chart(j), set dfa[c][j] = dfa[c][x]; then update X = dfa[pat.chart(j)][x]. X = simulation of B B pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, j B 0 1 B 2 3 B X B, B, 56 56

57 Knuth-Morris-Pratt construction (in linear time) Mismatch transition. For each state j and char c!= pat.chart(j), set dfa[c][j] = dfa[c][x]; then update X = dfa[pat.chart(j)][x]. X = simulation of B B pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B X B, B 0 1 B 2 3 B B, j B, 57 57

58 Knuth-Morris-Pratt construction (in linear time) pat.chart(j) dfa[][j] B B B onstructing the DF for KMP substring search for B B B, B 0 1 B 2 3 B B, B, 58 58

59 onstructing the DF for KMP substring search: Java implementation For each state j: opy dfa[][x] to dfa[][j] for mismatch case. Set dfa[pat.chart(j)][j] to j+1 for match case. Update X. public KMP(String pat) { this.pat = pat; M = pat.length(); dfa = new int[r][m]; dfa[pat.chart(0)][0] = 1; for (int X = 0, j = 1; j < M; j++) { for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x]; dfa[pat.chart(j)][j] = j+1; X = dfa[pat.chart(j)][x]; } } copy mismatch cases set match case update restart state Running time. M character accesses (but space proportional to R M)

60 KMP substring search analysis Proposition. KMP substring search accesses no more than M + N chars to search for a pattern of length M in a text of length N. Pf. Each pattern char accessed once when constructing the DF; each text char accessed once (in the worst case) when simulating the DF. Proposition. KMP constructs dfa[][] in time and space proportional to R M. Larger alphabets. Improved version of KMP constructs nfa[] in time and space proportional to M. 0 1 B 2 3 B KMP NF for BB 60 60

61 Knuth-Morris-Pratt: brief history Independently discovered by two theoreticians and a hacker. - Knuth: inspired by esoteric theorem, discovered linear-time algorithm - Pratt: made running time independent of alphabet size - Morris: built a text editor for the D 6400 computer Theory meets practice. SIM J. OMPUT. Vol. 6, No. 2, June 1977 FST PTTERN MTHING IN STRINGS* DONLD E. KNUTHf, JMES H. MORRIS, JR.:l: ND VUGHN R. PRTT bstract. n algorithm is presented which finds all occurrences of one. given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems. theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language {can}*, can be recognized in linear time. Other algorithms which run even faster on the average are also considered. Don Knuth Jim Morris Vaughan Pratt 61 61

62 Brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp SUBSTRING SERH 62

63 Boyer Moore Intuition Scan the text with a window of M chars (length of pattern) Pattern in Text (M) Text Scan Window (M) ase 1: Scan Window is exactly on top of the searched pattern - Starting from one end check if all characters are equal. (We must check!) ase 2: Scan Window starts after the pattern starts

64 Boyer Moore Intuition (2) ase 3: Scan Window starts before the pattern starts ase 4: Independent In case 4, simply shift window M characters void ase 2 onvert ase 3 to ase 1, by shifting appropriately 64 64

65 Boyer Moore Intuition (3) If we can recognise the character in the scan window end-point, we can find how many characters to shift. B D E F G H So, for example D is the 4th character, we must shift window 4 characters so that they overlap. d = 4 d = 4 B D E F G H 65 65

66 Boyer Moore Intuition (4) potential problem, the character in the text can repeat. For example, pattern = XXXX and the text is X X X X X X X X X X X Solution: be conservative, choose the instance with the least Shift (so we cannot miss the others)

67 Boyer Moore Intuition (5) X X X X X X X X X X X Search: XXXX So, for the example when it is at the endpoint we must shift for 2 characters. text: X we have a mismatch in last, now we must shift only once, so that we can check the configuration where the we found moves to middle. text: YXX we have a mismatch in Y, now we must shift 3 times as we know that the last 2 characters are in pattern and they can be repeating in the first 3 characters

68 Boyer-Moore: mismatched character heuristic Intuition. Scan characters in pattern from right to left. an skip as many as M text chars when finding one not in the pattern. - First we check the character in index pattern.length()-1 - It is N which is not E, so we know that first 5 characters is not a match. Shift text 5 characters - S!= E so shift 5, E == E so we can check for the pattern.length()-2, L!=N, skip 4. i text j F I N D I N H Y S T K N E E D L E I N 0 5 N E E D L E pattern 5 5 N E E D L E 11 4 N E E D L E 15 0 N E E D L E return i = 15 Mismatched character heuristic for right-to-left (Boyer-Moore) substring search 68 68

69 Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 1. Mismatch character not in pattern. before i txt pat T L E N E E D L E i after txt pat T L E N E E D L E mismatch character 'T' not in pattern: increment i one character beyond 'T' 69 69

70 Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 2a. Mismatch character in pattern. before i txt pat N L E N E E D L E after txt pat i N L E N E E D L E mismatch character 'N' in pattern: align text 'N' with rightmost pattern 'N' 70 70

71 Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 2b. Mismatch character in pattern (but heuristic no help). before i txt pat E L E N E E D L E i aligned with rightmost E? txt pat E L E N E E D L E mismatch character 'E' in pattern: align text 'E' with rightmost pattern 'E'? 71 71

72 Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 2b. Mismatch character in pattern (but heuristic no help). before i txt pat E L E N E E D L E i after txt pat E L E N E E D L E mismatch character 'E' in pattern: increment i by

73 Boyer-Moore: mismatched character heuristic Q. How much to skip?. Precompute index of rightmost occurrence of character c in pattern (-1 if character not in pattern). right = new int[r]; for (int c = 0; c < R; c++) right[c] = -1; for (int j = 0; j < M; j++) right[pat.chart(j)] = j; N E E D L E c right[c] B D E L M N Boyer-Moore skip table computation 73 73

74 Boyer-Moore: Java implementation } public int search(string txt) { int N = txt.length(); int M = pat.length(); int skip; for (int i = 0; i <= N-M; i += skip) { skip = 0; for (int j = M-1; j >= 0; j--) { if (pat.chart(j)!= txt.chart(i+j)) { skip = Math.max(1, j - right[txt.chart(i+j)]); break; } in case other term is nonpositive } if (skip == 0) return i; } return N; compute skip value match 74 74

75 nother Example SERH FOR: XXXX X X X X X X X X X X X If the window scan points to an unrecognised character, we can skip past that character. For this example, for the initial step we first match X at the end, when check for previous character () which is not in the string we skip 3 steps. The X at the end, we matched can still be the first character of the pattern, so we do not skip that

76 Boyer-Moore: analysis Property. Substring search with the Boyer-Moore mismatched character heuristic takes about ~ N / M character compares to search for a pattern of length M in a text of length N. Worst-case. an be as bad as ~ M N. sublinear! i skip txt B B B B B B B B B B 0 0 B B B B 1 1 B B B B 2 1 B B B B 3 1 B B B B 4 1 B B B B 5 1 B B B B Boyer-Moore-Horspool substring search (worst case) Boyer-Moore variant. an improve worst case to ~ 3 N by adding a KMP-like rule to guard against repetitive patterns. pat 76 76

77 Brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp SUBSTRING SERH 77

78 Rabin-Karp fingerprint search Basic idea = modular hashing. ompute a hash of pattern characters 0 to M - 1. For each i, compute a hash of text characters i to M + i - 1. If pattern hash = text substring hash, check for a match. pat.chart(i) i % 997 = 613 txt.chart(i) i % 997 = % 997 = % 997 = % 997 = % 997 = % 997 = 929 match 6 return i = % 997 = 613 Basis for Rabin-Karp substring search 78 78

79 Efficiently computing the hash function Modular hash function. Using the notation t i for txt.chart(i), we wish to compute xi = ti R M-1 + ti+1 R M ti+m-1 R 0 (mod Q) Intuition. M-digit, base-r integer, modulo Q. Horner's method. Linear-time method to evaluate degree-m polynomial. pat.chart() i % 997 = 2 R Q % 997 = (2*10 + 6) % 997 = % 997 = (26*10 + 5) % 997 = % 997 = (265*10 + 3) % 997 = % 997 = (659*10 + 5) % 997 = 613 // ompute hash for M-digit key private long hash(string key, int M) { long h = 0; for (int j = 0; j < M; j++) h = (R * h + key.chart(j)) % Q; return h; } 79 79

80 Efficiently computing the hash function hallenge. How to efficiently compute x i+1 given that we know x i. xi = ti R M 1 + ti+1 R M ti+m 1 R 0 xi+1 = ti+1 R M 1 + ti+2 R M ti+m R 0 Key property. an update hash function in constant time! xi+1 = ( xi t i R M 1 ) R + t i +M current value subtract leading digit multiply by radix add new trailing digit (can precompute R M 2 ) i current value new value text * current value subtract leading digit multiply by radix add new trailing digit new value 80 80

81 Rabin-Karp substring search example i % 997 = 3 Q % 997 = (3*10 + 1) % 997 = % 997 = (31*10 + 4) % 997 = % 997 = (314*10 + 1) % 997 = % 997 = (150*10 + 5) % 997 = 508 RM R % 997 = (( *(997-30))*10 + 9) % 997 = % 997 = (( *(997-30))*10 + 2) % 997 = % 997 = (( *(997-30))*10 + 6) % 997 = % 997 = (( *(997-30))*10 + 5) % 997 = 442 match % 997 = (( *(997-30))*10 + 3) % 997 = % 997 = (( *(997-30))*10 + 5) % 997 = 613 return i-m+1 = 6 Rabin-Karp substring search example 81 81

82 Rabin-Karp: Java implementation public class RabinKarp { private long pathash; private int M; private long Q; private int R; private long RM; // pattern hash value // pattern length // modulus // radix // R^(M-1) % Q public RabinKarp(String pat) { M = pat.length(); R = 256; Q = longrandomprime(); } RM = 1; for (int i = 1; i <= M-1; i++) RM = (R * RM) % Q; pathash = hash(pat, M); a large prime (but avoid overflow) precompute R M 1 (mod Q) private long hash(string key, int M) { /* as before */ } } public int search(string txt) { /* see next slide */ } 82 82

83 Rabin-Karp: Java implementation (continued) Monte arlo version. Return match if hash match. public int search(string txt) check for hash collision { int N = txt.length(); using rolling hash function int txthash = hash(txt, M); if (pathash == txthash) return 0; for (int i = M; i < N; i++) { txthash = (txthash + Q - RM*txt.chart(i-M) % Q) % Q; txthash = (txthash*r + txt.chart(i)) % Q; if (pathash == txthash) return i - M + 1; } return N; } Las Vegas version. heck for substring match if hash match; continue search if false collision

84 Rabin-Karp analysis Theory. If Q is a sufficiently large random prime (about M N 2 ), then the probability of a false collision is about 1 / N. Practice. hoose Q to be a large prime (but not so large as to cause overflow). Under reasonable assumptions, probability of a collision is about 1 / Q. Monte arlo version. lways runs in linear time. Extremely likely to return correct answer (but not always!). Las Vegas version. lways returns correct answer. Extremely likely to run in linear time (but worst case is M N)

85 Rabin-Karp fingerprint search dvantages. Extends to 2d patterns. Extends to finding multiple patterns. Disadvantages. rithmetic ops slower than char compares. Las Vegas version requires backup. Poor worst-case guarantee

86 Substring search cost summary ost of searching for an M-character pattern in an N-character text. algorithm version operation count guarantee typical backup in input? correct? extra space brute force M N 1.1 N yes yes 1 Knuth-Morris-Pratt full DF (lgorithm 5.6 ) mismatch transitions only 2 N 1.1 N no yes MR 3 N 1.1 N no yes M full algorithm 3 N N / M yes yes R Boyer-Moore mismatched char heuristic only (lgorithm 5.7 ) M N N / M yes yes R Rabin-Karp Monte arlo (lgorithm 5.8 ) 7 N 7 N no yes 1 Las Vegas 7 N 7 N no yes yes 1 probabilisitic guarantee, with uniform hash function 86 86

SUBSTRING SEARCH BBM ALGORITHMS DEPT. OF COMPUTER ENGINEERING ERKUT ERDEM. Apr. 28, 2015

SUBSTRING SEARCH BBM ALGORITHMS DEPT. OF COMPUTER ENGINEERING ERKUT ERDEM. Apr. 28, 2015 BBM 202 - LGORITHMS DEPT. OF OMPUTER ENGINEERING ERKUT ERDEM SUBSTRING SERH pr. 28, 2015 cknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton

More information

5.3 Substring Search

5.3 Substring Search 5.3 Substring Search brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp lgorithms, 4 th Edition Robert Sedgewick and Kevin Wayne opyright 2002 2010 pril 5, 2011 9:29:31 PM Substring search Goal. Find

More information

Algorithms. Algorithms 5.3 SUBSTRING SEARCH. introduction brute force Knuth Morris Pratt Boyer Moore Rabin Karp ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.3 SUBSTRING SEARCH. introduction brute force Knuth Morris Pratt Boyer Moore Rabin Karp ROBERT SEDGEWICK KEVIN WAYNE lgorithms ROBERT SEDGEWIK KEVIN WYNE 5.3 SUBSTRING SERH lgorithms F O U R T H E D I T I O N ROBERT SEDGEWIK KEVIN WYNE introduction brute force Knuth Morris Pratt Boyer Moore Rabin Karp http://algs4.cs.princeton.edu

More information

SUBSTRING SEARCH BBM ALGORITHMS TODAY DEPT. OF COMPUTER ENGINEERING ERKUT ERDEM. Apr. 28, Substring search applications.

SUBSTRING SEARCH BBM ALGORITHMS TODAY DEPT. OF COMPUTER ENGINEERING ERKUT ERDEM. Apr. 28, Substring search applications. M 22 - LGORITHMS TODY DEPT. OF OMPUTER ENGINEERING ERKUT ERDEM Substrng search rute force Knuth-Morrs-Pratt oyer-moore Rabn-Karp SUSTRING SERH pr. 28, 215 cknowledgement:.the$course$sldes$are$adapted$from$the$sldes$prepared$by$r.$sedgewck$

More information

SUBSTRING SEARCH BBM ALGORITHMS TODAY DEPT. OF COMPUTER ENGINEERING ERKUT ERDEM. Apr. 29, Substring search applications.

SUBSTRING SEARCH BBM ALGORITHMS TODAY DEPT. OF COMPUTER ENGINEERING ERKUT ERDEM. Apr. 29, Substring search applications. M 22 - LGORITHMS TODY Substrng search DPT. OF OMPUTR NGINRING RKUT RDM rute force Knuth-Morrs-Pratt oyer-moore Rabn-Karp SUSTRING SRH pr. 29, 214 cknowledgement: The course sldes are adapted from the sldes

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries and String Matching CS 240 - Data Structures and Data Management Sajed Haque Veronika Irvine Taylor Smith Based on lecture notes by many previous cs240 instructors David R. Cheriton School

More information

2. Exact String Matching

2. Exact String Matching 2. Exact String Matching Let T = T [0..n) be the text and P = P [0..m) the pattern. We say that P occurs in T at position j if T [j..j + m) = P. Example: P = aine occurs at position 6 in T = karjalainen.

More information

String Search. 6th September 2018

String Search. 6th September 2018 String Search 6th September 2018 Search for a given (short) string in a long string Search problems have become more important lately The amount of stored digital information grows steadily (rapidly?)

More information

INF 4130 / /8-2017

INF 4130 / /8-2017 INF 4130 / 9135 28/8-2017 Algorithms, efficiency, and complexity Problem classes Problems can be divided into sets (classes). Problem classes are defined by the type of algorithm that can (or cannot) solve

More information

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching 1

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching 1 Pattern Matching a b a c a a b 1 4 3 2 Pattern Matching 1 Outline and Reading Strings ( 9.1.1) Pattern matching algorithms Brute-force algorithm ( 9.1.2) Boyer-Moore algorithm ( 9.1.3) Knuth-Morris-Pratt

More information

15 Text search. P.D. Dr. Alexander Souza. Winter term 11/12

15 Text search. P.D. Dr. Alexander Souza. Winter term 11/12 Algorithms Theory 15 Text search P.D. Dr. Alexander Souza Text search Various scenarios: Dynamic texts Text editors Symbol manipulators Static texts Literature databases Library systems Gene databases

More information

INF 4130 / /8-2014

INF 4130 / /8-2014 INF 4130 / 9135 26/8-2014 Mandatory assignments («Oblig-1», «-2», and «-3»): All three must be approved Deadlines around: 25. sept, 25. oct, and 15. nov Other courses on similar themes: INF-MAT 3370 INF-MAT

More information

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching Goodrich, Tamassia

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching Goodrich, Tamassia Pattern Matching a b a c a a b 1 4 3 2 Pattern Matching 1 Brute-Force Pattern Matching ( 11.2.1) The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift

More information

Analysis of Algorithms Prof. Karen Daniels

Analysis of Algorithms Prof. Karen Daniels UMass Lowell Computer Science 91.503 Analysis of Algorithms Prof. Karen Daniels Spring, 2012 Tuesday, 4/24/2012 String Matching Algorithms Chapter 32* * Pseudocode uses 2 nd edition conventions 1 Chapter

More information

Algorithm Theory. 13 Text Search - Knuth, Morris, Pratt, Boyer, Moore. Christian Schindelhauer

Algorithm Theory. 13 Text Search - Knuth, Morris, Pratt, Boyer, Moore. Christian Schindelhauer Algorithm Theory 13 Text Search - Knuth, Morris, Pratt, Boyer, Moore Institut für Informatik Wintersemester 2007/08 Text Search Scenarios Static texts Literature databases Library systems Gene databases

More information

Lecture 3: String Matching

Lecture 3: String Matching COMP36111: Advanced Algorithms I Lecture 3: String Matching Ian Pratt-Hartmann Room KB2.38: email: ipratt@cs.man.ac.uk 2017 18 Outline The string matching problem The Rabin-Karp algorithm The Knuth-Morris-Pratt

More information

String Matching II. Algorithm : Design & Analysis [19]

String Matching II. Algorithm : Design & Analysis [19] String Matching II Algorithm : Design & Analysis [19] In the last class Simple String Matching KMP Flowchart Construction Jump at Fail KMP Scan String Matching II Boyer-Moore s heuristics Skipping unnecessary

More information

String Matching. Jayadev Misra The University of Texas at Austin December 5, 2003

String Matching. Jayadev Misra The University of Texas at Austin December 5, 2003 String Matching Jayadev Misra The University of Texas at Austin December 5, 2003 Contents 1 Introduction 1 2 Rabin-Karp Algorithm 3 3 Knuth-Morris-Pratt Algorithm 5 3.1 Informal Description.........................

More information

Efficient Sequential Algorithms, Comp309

Efficient Sequential Algorithms, Comp309 Efficient Sequential Algorithms, Comp309 University of Liverpool 2010 2011 Module Organiser, Igor Potapov Part 2: Pattern Matching References: T. H. Cormen, C. E. Leiserson, R. L. Rivest Introduction to

More information

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then String Matching Thanks to Piotr Indyk String Matching Input: Two strings T[1 n] and P[1 m], containing symbols from alphabet Σ Goal: find all shifts 0 s n-m such that T[s+1 s+m]=p Example: Σ={,a,b,,z}

More information

Algorithms: COMP3121/3821/9101/9801

Algorithms: COMP3121/3821/9101/9801 NEW SOUTH WALES Algorithms: COMP3121/3821/9101/9801 Aleks Ignjatović School of Computer Science and Engineering University of New South Wales LECTURE 8: STRING MATCHING ALGORITHMS COMP3121/3821/9101/9801

More information

Graduate Algorithms CS F-20 String Matching

Graduate Algorithms CS F-20 String Matching Graduate Algorithms CS673-2016F-20 String Matching David Galles Department of Computer Science University of San Francisco 20-0: String Matching Given a source text, and a string to match, where does the

More information

Hash tables. Hash tables

Hash tables. Hash tables Dictionary Definition A dictionary is a data-structure that stores a set of elements where each element has a unique key, and supports the following operations: Search(S, k) Return the element whose key

More information

Hash tables. Hash tables

Hash tables. Hash tables Dictionary Definition A dictionary is a data-structure that stores a set of elements where each element has a unique key, and supports the following operations: Search(S, k) Return the element whose key

More information

Knuth-Morris-Pratt Algorithm

Knuth-Morris-Pratt Algorithm Knuth-Morris-Pratt Algorithm Jayadev Misra June 5, 2017 The Knuth-Morris-Pratt string matching algorithm (KMP) locates all occurrences of a pattern string in a text string in linear time (in the combined

More information

Searching Sear ( Sub- (Sub )Strings Ulf Leser

Searching Sear ( Sub- (Sub )Strings Ulf Leser Searching (Sub-)Strings Ulf Leser This Lecture Exact substring search Naïve Boyer-Moore Searching with profiles Sequence profiles Ungapped approximate search Statistical evaluation of search results Ulf

More information

Motivation. Dictionaries. Direct Addressing. CSE 680 Prof. Roger Crawfis

Motivation. Dictionaries. Direct Addressing. CSE 680 Prof. Roger Crawfis Motivation Introduction to Algorithms Hash Tables CSE 680 Prof. Roger Crawfis Arrays provide an indirect way to access a set. Many times we need an association between two sets, or a set of keys and associated

More information

Overview. Knuth-Morris-Pratt & Boyer-Moore Algorithms. Notation Review (2) Notation Review (1) The Kunth-Morris-Pratt (KMP) Algorithm

Overview. Knuth-Morris-Pratt & Boyer-Moore Algorithms. Notation Review (2) Notation Review (1) The Kunth-Morris-Pratt (KMP) Algorithm Knuth-Morris-Pratt & s by Robert C. St.Pierre Overview Notation review Knuth-Morris-Pratt algorithm Discussion of the Algorithm Example Boyer-Moore algorithm Discussion of the Algorithm Example Applications

More information

Hash tables. Hash tables

Hash tables. Hash tables Basic Probability Theory Two events A, B are independent if Conditional probability: Pr[A B] = Pr[A] Pr[B] Pr[A B] = Pr[A B] Pr[B] The expectation of a (discrete) random variable X is E[X ] = k k Pr[X

More information

Samson Zhou. Pattern Matching over Noisy Data Streams

Samson Zhou. Pattern Matching over Noisy Data Streams Samson Zhou Pattern Matching over Noisy Data Streams Finding Structure in Data Pattern Matching Finding all instances of a pattern within a string ABCD ABCAABCDAACAABCDBCABCDADDDEAEABCDA Knuth-Morris-Pratt

More information

Formal Languages. We ll use the English language as a running example.

Formal Languages. We ll use the English language as a running example. Formal Languages We ll use the English language as a running example. Definitions. A string is a finite set of symbols, where each symbol belongs to an alphabet denoted by. Examples. The set of all strings

More information

Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts

Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Domenico Cantone Simone Faro Emanuele Giaquinta Department of Mathematics and Computer Science, University of Catania, Italy 1 /

More information

Fast String Searching. J Strother Moore Department of Computer Sciences University of Texas at Austin

Fast String Searching. J Strother Moore Department of Computer Sciences University of Texas at Austin Fast String Searching J Strother Moore Department of Computer Sciences University of Texas at Austin 1 The Problem One of the classic problems in computing is string searching: find the first occurrence

More information

Text Searching. Thierry Lecroq Laboratoire d Informatique, du Traitement de l Information et des

Text Searching. Thierry Lecroq Laboratoire d Informatique, du Traitement de l Information et des Text Searching Thierry Lecroq Thierry.Lecroq@univ-rouen.fr Laboratoire d Informatique, du Traitement de l Information et des Systèmes. International PhD School in Formal Languages and Applications Tarragona,

More information

Formal Languages. We ll use the English language as a running example.

Formal Languages. We ll use the English language as a running example. Formal Languages We ll use the English language as a running example. Definitions. A string is a finite set of symbols, where each symbol belongs to an alphabet denoted by Σ. Examples. The set of all strings

More information

Authentication. Chapter Message Authentication

Authentication. Chapter Message Authentication Chapter 5 Authentication 5.1 Message Authentication Suppose Bob receives a message addressed from Alice. How does Bob ensure that the message received is the same as the message sent by Alice? For example,

More information

Searching, mainly via Hash tables

Searching, mainly via Hash tables Data structures and algorithms Part 11 Searching, mainly via Hash tables Petr Felkel 26.1.2007 Topics Searching Hashing Hash function Resolving collisions Hashing with chaining Open addressing Linear Probing

More information

Deterministic Finite Automaton (DFA)

Deterministic Finite Automaton (DFA) 1 Lecture Overview Deterministic Finite Automata (DFA) o accepting a string o defining a language Nondeterministic Finite Automata (NFA) o converting to DFA (subset construction) o constructed from a regular

More information

Integer Sorting on the word-ram

Integer Sorting on the word-ram Integer Sorting on the word-rm Uri Zwick Tel viv University May 2015 Last updated: June 30, 2015 Integer sorting Memory is composed of w-bit words. rithmetical, logical and shift operations on w-bit words

More information

Deterministic Finite Automata (DFAs)

Deterministic Finite Automata (DFAs) CS/ECE 374: Algorithms & Models of Computation, Fall 28 Deterministic Finite Automata (DFAs) Lecture 3 September 4, 28 Chandra Chekuri (UIUC) CS/ECE 374 Fall 28 / 33 Part I DFA Introduction Chandra Chekuri

More information

Deterministic Finite Automata (DFAs)

Deterministic Finite Automata (DFAs) Algorithms & Models of Computation CS/ECE 374, Fall 27 Deterministic Finite Automata (DFAs) Lecture 3 Tuesday, September 5, 27 Sariel Har-Peled (UIUC) CS374 Fall 27 / 36 Part I DFA Introduction Sariel

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming lgorithmic Paradigms Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each sub-problem

More information

UNIT-III REGULAR LANGUAGES

UNIT-III REGULAR LANGUAGES Syllabus R9 Regulation REGULAR EXPRESSIONS UNIT-III REGULAR LANGUAGES Regular expressions are useful for representing certain sets of strings in an algebraic fashion. In arithmetic we can use the operations

More information

Insert Sorted List Insert as the Last element (the First element?) Delete Chaining. 2 Slide courtesy of Dr. Sang-Eon Park

Insert Sorted List Insert as the Last element (the First element?) Delete Chaining. 2 Slide courtesy of Dr. Sang-Eon Park 1617 Preview Data Structure Review COSC COSC Data Structure Review Linked Lists Stacks Queues Linked Lists Singly Linked List Doubly Linked List Typical Functions s Hash Functions Collision Resolution

More information

Mechanized Operational Semantics

Mechanized Operational Semantics Mechanized Operational Semantics J Strother Moore Department of Computer Sciences University of Texas at Austin Marktoberdorf Summer School 2008 (Lecture 5: Boyer-Moore Fast String Searching) 1 The Problem

More information

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms Simone Faro Thierry Lecroq University of Catania, Italy University of Rouen, LITIS EA 4108, France Symposium on Eperimental Algorithms

More information

Ad Placement Strategies

Ad Placement Strategies Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January

More information

Deterministic Finite Automata (DFAs)

Deterministic Finite Automata (DFAs) Algorithms & Models of Computation CS/ECE 374, Spring 29 Deterministic Finite Automata (DFAs) Lecture 3 Tuesday, January 22, 29 L A TEXed: December 27, 28 8:25 Chan, Har-Peled, Hassanieh (UIUC) CS374 Spring

More information

All three must be approved Deadlines around: 21. sept, 26. okt, and 16. nov

All three must be approved Deadlines around: 21. sept, 26. okt, and 16. nov INF 4130 / 9135 29/8-2012 Today s slides are produced mainly by Petter Kristiansen Lecturer Stein Krogdahl Mandatory assignments («Oblig1», «-2», and «-3»): All three must be approved Deadlines around:

More information

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata CISC 4090: Theory of Computation Chapter Regular Languages Xiaolan Zhang, adapted from slides by Prof. Werschulz Section.: Finite Automata Fordham University Department of Computer and Information Sciences

More information

Lecture 7: Fingerprinting. David Woodruff Carnegie Mellon University

Lecture 7: Fingerprinting. David Woodruff Carnegie Mellon University Lecture 7: Fingerprinting David Woodruff Carnegie Mellon University How to Pick a Random Prime How to pick a random prime in the range {1, 2,, M}? How to pick a random integer X? Pick a uniformly random

More information

Lecture 5: The Shift-And Method

Lecture 5: The Shift-And Method Biosequence Algorithms, Spring 2005 Lecture 5: The Shift-And Method Pekka Kilpeläinen University of Kuopio Department of Computer Science BSA Lecture 5: Shift-And p.1/19 Seminumerical String Matching Most

More information

CpSc 421 Final Exam December 15, 2006

CpSc 421 Final Exam December 15, 2006 CpSc 421 Final Exam December 15, 2006 Do problem zero and six of problems 1 through 9. If you write down solutions for more that six problems, clearly indicate those that you want graded. Note that problems

More information

Turing Machines Part II

Turing Machines Part II Turing Machines Part II Problem Set Set Five Five due due in in the the box box up up front front using using a late late day. day. Hello Hello Condensed Slide Slide Readers! Readers! This This lecture

More information

The streaming k-mismatch problem

The streaming k-mismatch problem The streaming k-mismatch problem Raphaël Clifford 1, Tomasz Kociumaka 2, and Ely Porat 3 1 Department of Computer Science, University of Bristol, United Kingdom raphael.clifford@bristol.ac.uk 2 Institute

More information

V Honors Theory of Computation

V Honors Theory of Computation V22.0453-001 Honors Theory of Computation Problem Set 3 Solutions Problem 1 Solution: The class of languages recognized by these machines is the exactly the class of regular languages, thus this TM variant

More information

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is Abstraction of Problems Data: abstracted as a word in a given alphabet. Σ: alphabet, a finite, non-empty set of symbols. Σ : all the words of finite length built up using Σ: Conditions: abstracted as a

More information

Hashing Techniques For Finite Automata

Hashing Techniques For Finite Automata Hashing Techniques For Finite Automata Hady Zeineddine Logic Synthesis Course Project - Spring 2007 Professor Adnan Aziz 1. Abstract This report presents two hashing techniques - Alphabet and Power-Set

More information

Chapter 5. Finite Automata

Chapter 5. Finite Automata Chapter 5 Finite Automata 5.1 Finite State Automata Capable of recognizing numerous symbol patterns, the class of regular languages Suitable for pattern-recognition type applications, such as the lexical

More information

Turing Machine Variants

Turing Machine Variants CS311 Computational Structures Turing Machine Variants Lecture 12 Andrew Black Andrew Tolmach 1 The Church-Turing Thesis The problems that can be decided by an algorithm are exactly those that can be decided

More information

Automata Theory and Formal Grammars: Lecture 1

Automata Theory and Formal Grammars: Lecture 1 Automata Theory and Formal Grammars: Lecture 1 Sets, Languages, Logic Automata Theory and Formal Grammars: Lecture 1 p.1/72 Sets, Languages, Logic Today Course Overview Administrivia Sets Theory (Review?)

More information

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises Automata and Computability Solutions to Exercises Spring 27 Alexis Maciel Department of Computer Science Clarkson University Copyright c 27 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata

More information

Lecture 18 - Secret Sharing, Visual Cryptography, Distributed Signatures

Lecture 18 - Secret Sharing, Visual Cryptography, Distributed Signatures Lecture 18 - Secret Sharing, Visual Cryptography, Distributed Signatures Boaz Barak November 27, 2007 Quick review of homework 7 Existence of a CPA-secure public key encryption scheme such that oracle

More information

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam Compilers Lecture 3 Lexical analysis Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Big picture Source code Front End IR Back End Machine code Errors Front end responsibilities Check

More information

Formal Definition of Computation. August 28, 2013

Formal Definition of Computation. August 28, 2013 August 28, 2013 Computation model The model of computation considered so far is the work performed by a finite automaton Finite automata were described informally, using state diagrams, and formally, as

More information

Finite Automata Part Two

Finite Automata Part Two Finite Automata Part Two Recap from Last Time Old MacDonald Had a Symbol, Σ-eye-ε-ey, Oh! You may have noticed that we have several letter- E-ish symbols in CS103, which can get confusing! Here s a quick

More information

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises Automata and Computability Solutions to Exercises Fall 28 Alexis Maciel Department of Computer Science Clarkson University Copyright c 28 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata

More information

CSE 311: Foundations of Computing. Lecture 26: Cardinality

CSE 311: Foundations of Computing. Lecture 26: Cardinality CSE 311: Foundations of Computing Lecture 26: Cardinality Cardinality and Computability Computers as we know them grew out of a desire to avoid bugs in mathematical reasoning A brief history of reasoning

More information

Three new strategies for exact string matching

Three new strategies for exact string matching Three new strategies for exact string matching Simone Faro 1 Thierry Lecroq 2 1 University of Catania, Italy 2 University of Rouen, LITIS EA 4108, France SeqBio 2012 November 26th-27th 2012 Marne-la-Vallée,

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 16 October 30, 2017 CPSC 467, Lecture 16 1/52 Properties of Hash Functions Hash functions do not always look random Relations among

More information

CS 580: Algorithm Design and Analysis

CS 580: Algorithm Design and Analysis CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 2018 Announcements: Homework 6 deadline extended to April 24 th at 11:59 PM Course Evaluation Survey: Live until 4/29/2018

More information

Languages, regular languages, finite automata

Languages, regular languages, finite automata Notes on Computer Theory Last updated: January, 2018 Languages, regular languages, finite automata Content largely taken from Richards [1] and Sipser [2] 1 Languages An alphabet is a finite set of characters,

More information

Lexical Analysis: DFA Minimization & Wrap Up

Lexical Analysis: DFA Minimization & Wrap Up Lexical Analysis: DFA Minimization & Wrap Up Automating Scanner Construction PREVIOUSLY RE NFA (Thompson s construction) Build an NFA for each term Combine them with -moves NFA DFA (subset construction)

More information

CP405 Theory of Computation

CP405 Theory of Computation CP405 Theory of Computation BB(3) q 0 q 1 q 2 0 q 1 1R q 2 0R q 2 1L 1 H1R q 1 1R q 0 1L Growing Fast BB(3) = 6 BB(4) = 13 BB(5) = 4098 BB(6) = 3.515 x 10 18267 (known) (known) (possible) (possible) Language:

More information

Parallel Rabin-Karp Algorithm Implementation on GPU (preliminary version)

Parallel Rabin-Karp Algorithm Implementation on GPU (preliminary version) Bulletin of Networking, Computing, Systems, and Software www.bncss.org, ISSN 2186-5140 Volume 7, Number 1, pages 28 32, January 2018 Parallel Rabin-Karp Algorithm Implementation on GPU (preliminary version)

More information

Foundations of

Foundations of 91.304 Foundations of (Theoretical) Computer Science Chapter 3 Lecture Notes (Section 3.2: Variants of Turing Machines) David Martin dm@cs.uml.edu With some modifications by Prof. Karen Daniels, Fall 2012

More information

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering Tasks of lexer CISC 5920: Compiler Construction Chapter 2 Lexical Analysis Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Copyright Arthur G. Werschulz, 2017. All

More information

String Range Matching

String Range Matching String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings

More information

A Pattern Matching Algorithm Using Deterministic Finite Automata with Infixes Checking. Jung-Hua Hsu

A Pattern Matching Algorithm Using Deterministic Finite Automata with Infixes Checking. Jung-Hua Hsu A Pattern Matching Algorithm Using Deterministic Finite Automata with Infixes Checking Jung-Hua Hsu A Pattern Matching Algorithm Using Deterministic Finite Automata with Infixes Checking Student:Jung-Hua

More information

Outline. 1 Merging. 2 Merge Sort. 3 Complexity of Sorting. 4 Merge Sort and Other Sorts 2 / 10

Outline. 1 Merging. 2 Merge Sort. 3 Complexity of Sorting. 4 Merge Sort and Other Sorts 2 / 10 Merge Sort 1 / 10 Outline 1 Merging 2 Merge Sort 3 Complexity of Sorting 4 Merge Sort and Other Sorts 2 / 10 Merging Merge sort is based on a simple operation known as merging: combining two ordered arrays

More information

2. ALGORITHM ANALYSIS

2. ALGORITHM ANALYSIS 2. ALGORITHM ANALYSIS computational tractability asymptotic order of growth survey of common running times Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

Nondeterminism. September 7, Nondeterminism

Nondeterminism. September 7, Nondeterminism September 7, 204 Introduction is a useful concept that has a great impact on the theory of computation Introduction is a useful concept that has a great impact on the theory of computation So far in our

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17 12.1 Introduction Today we re going to do a couple more examples of dynamic programming. While

More information

Closure under the Regular Operations

Closure under the Regular Operations September 7, 2013 Application of NFA Now we use the NFA to show that collection of regular languages is closed under regular operations union, concatenation, and star Earlier we have shown this closure

More information

ALG 4.1 Randomized Pattern Matching:

ALG 4.1 Randomized Pattern Matching: Algorithms Professor John Reif Generalized Pattern Matching ALG 4.1 Randomized Pattern Matching: Reading Selection: CLR: Chapter 12 Handout: R. Karp and M. Rabin, "Efficient Randomized Pattern-Matching"

More information

CSCI Honor seminar in algorithms Homework 2 Solution

CSCI Honor seminar in algorithms Homework 2 Solution CSCI 493.55 Honor seminar in algorithms Homework 2 Solution Saad Mneimneh Visiting Professor Hunter College of CUNY Problem 1: Rabin-Karp string matching Consider a binary string s of length n and another

More information

4.5 Applications of Congruences

4.5 Applications of Congruences 4.5 Applications of Congruences 287 66. Find all solutions of the congruence x 2 16 (mod 105). [Hint: Find the solutions of this congruence modulo 3, modulo 5, and modulo 7, and then use the Chinese remainder

More information

CPSC 421: Tutorial #1

CPSC 421: Tutorial #1 CPSC 421: Tutorial #1 October 14, 2016 Set Theory. 1. Let A be an arbitrary set, and let B = {x A : x / x}. That is, B contains all sets in A that do not contain themselves: For all y, ( ) y B if and only

More information

Nondeterministic Finite Automata

Nondeterministic Finite Automata Nondeterministic Finite Automata Mahesh Viswanathan Introducing Nondeterminism Consider the machine shown in Figure. Like a DFA it has finitely many states and transitions labeled by symbols from an input

More information

A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS *

A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS * A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS * 1 Jorma Tarhio and Esko Ukkonen Department of Computer Science, University of Helsinki Tukholmankatu 2, SF-00250 Helsinki,

More information

Lecture 3: Nondeterministic Finite Automata

Lecture 3: Nondeterministic Finite Automata Lecture 3: Nondeterministic Finite Automata September 5, 206 CS 00 Theory of Computation As a recap of last lecture, recall that a deterministic finite automaton (DFA) consists of (Q, Σ, δ, q 0, F ) where

More information

Streaming for Aibohphobes: Longest Near-Palindrome under Hamming Distance

Streaming for Aibohphobes: Longest Near-Palindrome under Hamming Distance Streaming for Aibohphobes: Longest Near-Palindrome under Hamming Distance Elena Grigorescu, Purdue University Erfan Sadeqi Azer, Indiana University Samson Zhou, Purdue University Structure of Talk Background

More information

Continuing discussion of CRC s, especially looking at two-bit errors

Continuing discussion of CRC s, especially looking at two-bit errors Continuing discussion of CRC s, especially looking at two-bit errors The definition of primitive binary polynomials Brute force checking for primitivity A theorem giving a better test for primitivity Fast

More information

Deterministic Finite Automata (DFAs)

Deterministic Finite Automata (DFAs) Algorithms & Models of Computation CS/ECE 374, Fall 27 Deterministic Finite Automata (DFAs) Lecture 3 Tuesday, September 5, 27 Part I DFA Introduction Sariel Har-Peled (UIUC) CS374 Fall 27 / 36 Sariel

More information

Pattern Matching (Exact Matching) Overview

Pattern Matching (Exact Matching) Overview CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm

More information

UNIVERSITY of OSLO. Faculty of Mathematics and Natural Sciences. INF 4130/9135: Algorithms: Design and efficiency Date of exam: 12th December 2014

UNIVERSITY of OSLO. Faculty of Mathematics and Natural Sciences. INF 4130/9135: Algorithms: Design and efficiency Date of exam: 12th December 2014 UNIVERSITY of OSLO Faculty of Mathematics and Natural Sciences Exam in: INF 4130/9135: Algorithms: Design and efficiency Date of exam: 12th December 2014 Exam hours: 09:00 13:00 (4 hours) Exam paper consists

More information

Cryptographic Hash Functions

Cryptographic Hash Functions Cryptographic Hash Functions Çetin Kaya Koç koc@ece.orst.edu Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report December 9, 2002 Version 1.5 1 1 Introduction

More information

Inf2A: The Pumping Lemma

Inf2A: The Pumping Lemma Inf2A: Stuart Anderson School of Informatics University of Edinburgh October 8, 2009 Outline 1 Deterministic Finite State Machines and Regular Languages 2 3 4 The language of a DFA ( M = Q, Σ, q 0, F,

More information

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata Sungjin Im University of California, Merced 1-27-215 Nondeterminism Michael Rabin and Dana Scott (1959) Michael Rabin Dana

More information