Fingerprint idea
Assume:
  We can compute a fingerprint f(P) of P in O(m) time.
  If f(P) ≠ f(T[s..s+m-1]), then P ≠ T[s..s+m-1].
  We can compare fingerprints in O(1).
  We can compute f' = f(T[s+1..s+m]) from f = f(T[s..s+m-1]) in O(1).
AALG, lecture 3, Simonas Šaltenis, 2004

Algorithm with Fingerprints
Let the alphabet Σ = {0,1,2,3,4,5,6,7,8,9}.
Let the fingerprint be just a decimal number, i.e., f("1045") = 1*10^3 + 0*10^2 + 4*10^1 + 5 = 1045.

Fingerprint-Search(T,P)
    fp ← compute f(P)
    f ← compute f(T[0..m-1])
    for s ← 0 to n-m do
        if fp = f return s
        f ← (f - T[s]*10^(m-1))*10 + T[s+m]    // drop T[s], append T[s+m]
    return -1

Running time: 2*O(m) + O(n-m) = O(n)
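
The Fingerprint-Search pseudocode can be sketched in Python as follows. This is an illustration only: the function name `fingerprint_search` is hypothetical, the text and pattern are assumed to be strings of decimal digits, and the sketch relies on Python's arbitrary-precision integers, so the O(1) arithmetic assumption of the slide does not really hold here. The guard before the update is an addition that avoids reading T[n] on the last shift.

```python
def fingerprint_search(text: str, pattern: str) -> int:
    """Return the first shift s with text[s:s+m] == pattern, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    fp = int(pattern)        # f(P)
    f = int(text[:m])        # f(T[0..m-1])
    high = 10 ** (m - 1)     # weight of the leading digit
    for s in range(n - m + 1):
        if fp == f:
            return s
        if s < n - m:        # guard: no window follows the last shift
            f = (f - int(text[s]) * high) * 10 + int(text[s + m])
    return -1
```

Because both windows have exactly m digits, equal fingerprints imply equal strings in this exact (non-hashed) variant.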

Using a Hash Function
Problem: we cannot assume we can do arithmetic with m-digit-long numbers in O(1) time.
Solution: use a hash function h = f mod q.
  For example, if q = 7, h("52") = 52 mod 7 = 3.
  h(S1) ≠ h(S2) ⇒ S1 ≠ S2
  But h(S1) = h(S2) does not imply S1 = S2!
  For example, if q = 7, h("73") = 3, but "73" ≠ "52".
Basic mod q arithmetic:
  (a+b) mod q = (a mod q + b mod q) mod q
  (a*b) mod q = ((a mod q)*(b mod q)) mod q
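
The collision example and the two modular identities can be checked with a few lines of Python. The helper name `h` mirrors the slide's notation; treating the digit string as an integer via `int()` is an assumption made for this sketch.

```python
def h(digits: str, q: int = 7) -> int:
    """Hash of a decimal digit string: its numeric value mod q."""
    return int(digits) % q

# "52" and "73" collide for q = 7 although the strings differ.
collision = h("52") == h("73") and "52" != "73"

# The identities that let us keep every intermediate value below q.
a, b, q = 52, 73, 7
sum_ok = (a + b) % q == ((a % q) + (b % q)) % q
prod_ok = (a * b) % q == ((a % q) * (b % q)) % q
```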

Preprocessing and Stepping
Preprocessing:
  fp = (P[m-1] + 10*(P[m-2] + 10*(P[m-3] + … + 10*(P[1] + 10*P[0])…))) mod q
  In the same way compute ft from T[0..m-1].
  Example: P = "2531", q = 7, fp = ?
Stepping:
  ft = ((ft - T[s]*(10^(m-1) mod q))*10 + T[s+m]) mod q
  10^(m-1) mod q can be computed once in the preprocessing.
  Example: Let T[…] = "5319", q = 7; what is the corresponding ft?
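
The nested preprocessing formula is Horner's rule evaluated mod q. A minimal sketch, assuming decimal digit strings; the name `horner_mod` is hypothetical:

```python
def horner_mod(digits: str, q: int) -> int:
    """Evaluate the decimal value of `digits` mod q, one digit at a time."""
    value = 0
    for ch in digits:
        # Reduce after every step so intermediates stay below 10*q.
        value = (10 * value + int(ch)) % q
    return value

fp = horner_mod("2531", 7)   # fingerprint of the example pattern
ft = horner_mod("5319", 7)   # fingerprint of the example text window
```

Reducing after every digit is what keeps the computation within word-sized arithmetic, at a cost of O(m) steps.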

Stepping
T = 25319446766, m = 4, q = 7
T_0 = 2531: ft = 2531 mod 7 = 4
T_1 = 5319:
  ft = ((ft - T[s]*(10^(m-1) mod q))*10 + T[s+m]) mod q
  ft = ((ft - T[0]*(10^3 mod 7))*10 + T[0+4]) mod 7
     = ((4 - 2*(1000 mod 7))*10 + T[4]) mod 7
     = ((4 - 2*6)*10 + 9) mod 7
     = (-8*10 + 9) mod 7
     = -71 mod 7 = 6
Check: 5319 mod 7 = 6
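
The single stepping update from window 2531 to window 5319 can be reproduced as below. The function name `step` and its parameter names are hypothetical. Note that Python's `%` returns a non-negative result for a positive modulus; in languages where the remainder of a negative number can be negative, one would add q before reducing.

```python
def step(ft: int, old_digit: int, new_digit: int, c: int, q: int) -> int:
    """Slide the window one position: drop old_digit, append new_digit."""
    return ((ft - old_digit * c) * 10 + new_digit) % q

q, m = 7, 4
c = pow(10, m - 1, q)            # 10^(m-1) mod q, computed once
ft0 = 2531 % q                   # fingerprint of the first window
ft1 = step(ft0, 2, 9, c, q)      # drop the leading 2, append 9
```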

Rabin-Karp Algorithm
Rabin-Karp-Search(T,P)
    q ← a prime larger than m
    c ← 10^(m-1) mod q                    // run a loop multiplying by 10 mod q
    fp ← 0; ft ← 0
    for i ← 0 to m-1                      // preprocessing
        fp ← (10*fp + P[i]) mod q
        ft ← (10*ft + T[i]) mod q
    for s ← 0 to n-m                      // matching
        if fp = ft then                   // run a loop to compare strings
            if P[0..m-1] = T[s..s+m-1] return s
        ft ← ((ft - T[s]*c)*10 + T[s+m]) mod q
    return -1
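
A runnable sketch of Rabin-Karp-Search for decimal digit strings follows. The function name, the default modulus 2^31 - 1 (a Mersenne prime, chosen here only for illustration) and the guard that skips the update after the last shift are assumptions of this sketch, not part of the slide.

```python
def rabin_karp_search(text: str, pattern: str, q: int = 2**31 - 1) -> int:
    """Return the first shift where pattern occurs in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    c = pow(10, m - 1, q)                 # 10^(m-1) mod q
    fp = ft = 0
    for i in range(m):                    # preprocessing
        fp = (10 * fp + int(pattern[i])) % q
        ft = (10 * ft + int(text[i])) % q
    for s in range(n - m + 1):            # matching
        # Equal fingerprints may be a collision, so verify the strings.
        if fp == ft and text[s:s + m] == pattern:
            return s
        if s < n - m:                     # guard: T[s+m] must exist
            ft = ((ft - int(text[s]) * c) * 10 + int(text[s + m])) % q
    return -1
```

Passing a tiny modulus such as `q=7` forces frequent collisions and shows that the explicit string comparison keeps the result correct.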

Analysis
If q is a prime, the hash function distributes m-digit strings evenly among the q values.
Thus, only every q-th value of shift s will result in matching fingerprints (which will require comparing strings with O(m) comparisons).
Expected running time (if q > m):
  Preprocessing: O(m)
  Outer loop: O(n-m)
  All inner loops: ((n-m)/q) * m = O(n-m)
  Total time: O(n-m)
Worst-case running time: O(nm)

Rabin-Karp in Practice
If the alphabet has d characters, interpret characters as radix-d digits (replace 10 with d in the algorithm).
Choosing a prime q > m can be done with randomized algorithms in O(m), or q can be fixed to be the largest prime such that 10*q fits in a computer word.
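
Replacing 10 with a radix d gives a version that works on ordinary text. In this sketch the name `rabin_karp_text`, the radix d = 256, the use of `ord()` as the digit value, and the fixed prime 2^31 - 1 are all assumptions for illustration; characters with code points above 255 still hash consistently, and the string comparison guarantees correctness either way.

```python
def rabin_karp_text(text: str, pattern: str, d: int = 256,
                    q: int = 2**31 - 1) -> int:
    """Rabin-Karp over arbitrary characters, treated as radix-d digits."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    c = pow(d, m - 1, q)                  # d^(m-1) mod q
    fp = ft = 0
    for i in range(m):
        fp = (d * fp + ord(pattern[i])) % q
        ft = (d * ft + ord(text[i])) % q
    for s in range(n - m + 1):
        if fp == ft and text[s:s + m] == pattern:
            return s
        if s < n - m:
            ft = ((ft - ord(text[s]) * c) * d + ord(text[s + m])) % q
    return -1
```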

Searching in n comparisons
The goal: each character of the text is compared only once!
Problem with the naïve algorithm: it forgets what was learned from a partial match!
Examples:
  T = "Tweedledee and Tweedledum" and P = "Tweedledum"
  T = "pappappappar" and P = "pappar"
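
To make the waste of the naïve algorithm visible, the sketch below counts character comparisons. The function `naive_search` and its returned pair (shift, comparison count) are hypothetical constructs for this illustration.

```python
def naive_search(text: str, pattern: str) -> tuple[int, int]:
    """Return (first matching shift or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[s + j] != pattern[j]:
                break                     # partial match is thrown away
            j += 1
        if j == m:
            return s, comparisons
    return -1, comparisons

shift, count = naive_search("pappappappar", "pappar")
```

On the second example the count exceeds the text length n = 12, because characters already matched in a failed attempt are compared again at later shifts.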
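
Finite automaton search
Pattern P = ababaca. Transition table (the P column shows the pattern character that advances each state):

  state | a  b  c | P
    0   | 1  0  0 | a
    1   | 1  2  0 | b
    2   | 3  0  0 | a
    3   | 1  4  0 | b
    4   | 5  0  0 | a
    5   | 1  4  6 | c
    6   | 7  0  0 | a
    7   | 1  2  0 |

Running the automaton on the text:

  i           | --  1  2  3  4  5  6  7  8  9 10 11
  T[i]        | --  a  b  a  b  a  b  a  c  a  b  a
  state φ(Ti) |  0  1  2  3  4  5  4  5  6  7  2  3

Processing time takes Θ(n). But we have to construct the FA first.
Main issue: how to construct the FA?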
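
The search itself is a single pass over the text with one table lookup per character. The sketch below hard-codes the transition table for P = ababaca over the alphabet {a, b, c}; the names `TABLE`, `fa_trace` and `fa_matches` are hypothetical, and match positions are reported as 0-based shifts.

```python
ALPHABET = "abc"
# TABLE[state] = (next state on 'a', on 'b', on 'c') for P = ababaca
TABLE = [
    (1, 0, 0), (1, 2, 0), (3, 0, 0), (1, 4, 0),
    (5, 0, 0), (1, 4, 6), (7, 0, 0), (1, 2, 0),
]

def fa_trace(text: str) -> list[int]:
    """Return the state before any input and after each character."""
    state, states = 0, [0]
    for ch in text:
        state = TABLE[state][ALPHABET.index(ch)]
        states.append(state)
    return states

def fa_matches(text: str, m: int = 7) -> list[int]:
    """0-based shifts at which the pattern occurs (state m reached)."""
    return [i - m for i, st in enumerate(fa_trace(text)) if st == m]

trace = fa_trace("abababacaba")
```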
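
Need some Notation
φ(w) = the state the FA ends up in after processing w.
  Example: φ(abab) = 4.
σ(x) = max{k : P_k suf x}, i.e. the length of the longest prefix of P that is a suffix of x. Called the suffix function.
  Examples: Let P = ab.
    σ(ε) = 0
    σ(ccaca) = 1
    σ(ccab) = 2
Note: If |P| = m, then σ(x) = m indicates a match.
(Figure: the text T with the FA state under each character; every position where the state equals m marks a match.)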
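
The suffix function can be computed directly from its definition by trying prefix lengths from longest to shortest. This is a straightforward sketch for checking small examples, not an efficient method; the name `sigma` follows the slide's symbol σ.

```python
def sigma(x: str, pattern: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of `x`."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):   # k = 0 always succeeds (empty prefix)
            return k
    return 0  # not reached; kept so every path returns an int

examples = [sigma("", "ab"), sigma("ccaca", "ab"), sigma("ccab", "ab")]
```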

FA Construction
Given: P[1..m]
Let Q = states = {0, 1, …, m}, where 0 is the initial state and m is the final state.
Define the transition function δ as follows:
  δ(q, a) = σ(P_q a) for each state q and each character a.
Example: δ(5, b) = σ(P_5 b) = σ(ababab) = 4
Intuition: Encountering a b in state 5 means the current substring doesn't match. But you know this substring ends with "abab", and this is the longest suffix that matches the beginning of P. Thus, we go to state 4 and continue processing.
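
The definition δ(q, a) = σ(P_q a) translates into a short construction routine. The sketch below applies the definition literally, which costs O(m^3 |Σ|) time in the worst case; faster constructions exist. The names `sigma` and `build_delta` are hypothetical, and the alphabet is passed in explicitly as an assumption.

```python
def sigma(x: str, pattern: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of `x`."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0

def build_delta(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = sigma(P_q + a) for every state q and character a."""
    m = len(pattern)
    return [
        {a: sigma(pattern[:q] + a, pattern) for a in alphabet}
        for q in range(m + 1)
    ]

delta = build_delta("ababaca", "abc")
```

For P = ababaca this yields, for instance, `delta[5]["b"] == 4`, the transition discussed in the intuition above.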
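
Example: P = ababaca, m = 7; Q = {0, 1, 2, 3, 4, 5, 6, 7}
Prefixes: P_0 = ε, P_1 = a, P_2 = ab, P_3 = aba, P_4 = abab, P_5 = ababa, P_6 = ababac, P_7 = ababaca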
Building the transitions for P = ababaca, state by state:
  δ(1, a) = σ(P_1 a) = σ(aa) = 1
  δ(1, c) = σ(P_1 c) = σ(ac) = 0
  δ(2, b) = σ(P_2 b) = σ(abb) = 0
  δ(2, c) = σ(P_2 c) = σ(abc) = 0
  (fast forward, simplified)
  δ(5, a) = σ(P_5 a) = σ(ababaa) = 1
  δ(5, b) = σ(P_5 b) = σ(ababab) = 4
(Figure: the final, simplified automaton for P = ababaca.)

Search
(Figure sequence: the automaton for P = ababaca reads the text T one character at a time, starting in state 0 and following one transition per character.)
Accept state reached: we are done.

Analysis of FA
Searching: O(n) (good)
Preprocessing: O(m|Σ|) (bad)
Memory: O(m|Σ|) (bad)