XML and Databases
Lecture 8
Streaming Evaluation: how much memory do you need?

Sebastian Maneth
NICTA and UNSW
CSE@UNSW -- Semester [number lost in extraction], 2009

Small XPath Quiz
Can you give an expression that returns the last / first occurrence of each distinct price element?
[Example document: one parent element with eight <price> children over four distinct values; apart from 4 and 7 the values were lost in extraction. For "last" the result is the final occurrence of each distinct value, for "first" the initial occurrence of each.]

Recall Small XPath Quiz
What if we mean number-distinctness (not strings)?
[Same example, but some prices are written with trailing decimal zeros (values ending in .0, .00, .000), so prices that are equal as numbers differ as strings.]

Outline
0. Recall
   - Evaluation of Simple Paths
   - Arbitrary Queries over //, /, *
1. Automaton Approach
2. Parallel Evaluation of Multiple Queries
3. Sizes of Automata
4. How to deal with Filters
5. Existing Systems for Streaming XPath Evaluation

Recall: Top-down Evaluation of Simple Paths
- Evaluated in one single pre-order traversal (using a stack).
- Q is the query; p is the query match position.
- [startElement( )]: push the current position, then update p.
- [endElement( )]: p = pop().
- A node at which all of Q is matched is a result node.
[The element names of Q and the position values in the event trace were lost in extraction.]

Simple Path //a_1/a_2/a_3/.../a_n
Streaming Algorithm!
- TIME: one pass through the document tree.
- SPACE: stack of query positions; its height is bounded by the depth of the document tree.
- No need to store the document!! Can evaluate on a SAX event stream.
- BUT: need output buffers, if subtrees of match nodes should be printed!
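The stack algorithm for a simple path can be sketched as follows. This is a minimal illustration, not the lecture's code: it assumes the step names are pairwise distinct (so a mismatch can only restart at the first step), it reports pre-order node ids instead of subtrees, and the names `events_of` and `match_simple_path` are made up for this example.

```python
def events_of(text):
    """Hypothetical helper: 'a b /b /a' -> SAX-like (kind, tag) events."""
    return [("end", t[1:]) if t.startswith("/") else ("start", t)
            for t in text.split()]


def match_simple_path(events, steps):
    """Yield pre-order ids of nodes matching //steps[0]/steps[1]/.../steps[-1].

    Assumption: the names in `steps` are pairwise distinct.
    """
    stack = []      # one saved query position per open element
    p = 0           # number of query steps matched on the current path
    node_id = 0
    for kind, tag in events:
        if kind == "start":
            node_id += 1
            stack.append(p)                      # push(p)
            if p < len(steps) and tag == steps[p]:
                p += 1                           # next step matched
            elif tag == steps[0]:
                p = 1                            # restart at the first step
            else:
                p = 0                            # no match
            if p == len(steps):
                yield node_id                    # result node
        else:
            p = stack.pop()                      # p = pop()


doc = events_of("r a b c /c /b c /c /a b c /c /b /r")
print(list(match_simple_path(doc, ["a", "b", "c"])))  # [4]
```

Memory use is one stack entry per open element, so it grows with document depth and not with document size.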
Recall: Top-down Evaluation of Simple Paths
- If we print node-ids, then no output buffers are needed!
- True Streaming, with memory need proportional to height.
- Any good implementation of this algorithm should work for documents with depth up to a couple of millions, and NO restriction on document size!
- SPACE: stack of query positions. 1 Byte is enough for small queries!

Arbitrary Slash+Slashslash
- Evaluated in one single pre-order traversal (using a stack).
- Arbitrary queries with /, //, *; multiple //'s.
- Query match position: p.
- No match: stay in the current position!
- [startElement( )]: push the current position. [endElement( )]: p = pop().
- Result node! Mark it, and stay in the same position.
- At a result node: output its Node-ID, or start copying to the Output Buffer.
[The example query and the position values in the event trace were lost in extraction.]
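For queries with several //'s and *'s, one easy-to-check variant keeps a set of query positions per open element instead of the single position p used on the slides. This is an NFA-style simulation swapped in for illustration, not the jump-back scheme of the lecture; `parse_path`, `match_path` and `events_of` are hypothetical names, and queries are assumed to start with / or //.

```python
def events_of(text):
    """Hypothetical helper: 'a b /b /a' -> SAX-like (kind, tag) events."""
    return [("end", t[1:]) if t.startswith("/") else ("start", t)
            for t in text.split()]


def parse_path(query):
    """Split e.g. '//a/*//b' into [('//','a'), ('/','*'), ('//','b')]."""
    steps, i = [], 0
    while i < len(query):
        axis = "//" if query.startswith("//", i) else "/"
        i += len(axis)
        j = i
        while j < len(query) and query[j] != "/":
            j += 1
        steps.append((axis, query[i:j]))
        i = j
    return steps


def match_path(events, query):
    """Yield pre-order ids of nodes selected by a path over /, // and *."""
    steps = parse_path(query)
    n = len(steps)
    stack, current, node_id = [], {0}, 0   # position i = first i steps matched
    for kind, tag in events:
        if kind == "start":
            node_id += 1
            stack.append(current)
            nxt = set()
            for i in current:
                if i == n:
                    continue
                axis, name = steps[i]
                if name == "*" or name == tag:
                    nxt.add(i + 1)          # this element matches step i
                if axis == "//":
                    nxt.add(i)              # skip this element, stay at i
            current = nxt
            if n in current:
                yield node_id
        else:
            current = stack.pop()


doc = events_of("r a b c /c /b c /c /a b c /c /b /r")
print(list(match_path(doc, "//a//c")))   # [4, 5]
print(list(match_path(doc, "/r/*/c")))   # [5, 7]
```

Each stack entry is now a set of at most n+1 positions, so memory is still bounded by depth times query size.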
Arbitrary Slash+Slashslash
Optimizations (for Output Buffers)
- Stay at the match position for the complete subtree! Never go back to an earlier position!
- (1) If inside a match subtree, record the position (or range within the buffer), instead of creating a new output buffer.
- (2) If a subtree is finished (we are not inside a match), then we can write its buffer out and can start with an empty buffer again.
[Worst case: root node selected. Size of the document needed.]

Arbitrary Slash+Slashslash
[Example query: a long path mixing /, // and * steps and ending in .../g//h; the other element names were lost in extraction.]
- Same as before: jump back within a /-sequence, AT MOST to the beginning of the last //.
- Use KMP within a /-sequence.
- For *'s: build several KMP-tables.
- Query problem is solved! Leave optimizations to OS/UNIX hackers..
  [Command shown on the slide: >cat file.xml [..7,.,..., ] (arguments garbled in extraction).]
- If Node-IDs are printed, then no output buffers are needed. Then: memory proportional to height. Should run for arbitrarily large docs!

1. Automaton Approach
[Figure: automaton for the example query, with transitions labelled by the query's element names, by *, g, h, and by "not x and not y" for all remaining symbols; state labels lost in extraction.]
- Recall: a Deterministic Automaton runs in linear time and constant space (plus a stack of states, if we run on paths of a tree).
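A sketch of "use KMP within a /-sequence", for a single sequence //s_1/s_2/.../s_n without *. The failure table tells how far to jump back when names repeat inside the query; the stack restores the position when an element closes. The function names are illustrative, and the per-node `while` loop could be replaced by a precomputed transition table to get exactly one look-up per node.

```python
def events_of(text):
    """Hypothetical helper: 'a b /b /a' -> SAX-like (kind, tag) events."""
    return [("end", t[1:]) if t.startswith("/") else ("start", t)
            for t in text.split()]


def failure_table(steps):
    """fail[p] = length of the longest proper border of steps[:p] (KMP)."""
    fail = [0] * (len(steps) + 1)
    k = 0
    for p in range(1, len(steps)):
        while k and steps[p] != steps[k]:
            k = fail[k]
        if steps[p] == steps[k]:
            k += 1
        fail[p + 1] = k
    return fail


def advance(steps, fail, p, tag):
    """New match position after reading `tag` in position p."""
    while True:
        if p < len(steps) and steps[p] == tag:
            return p + 1
        if p == 0:
            return 0
        p = fail[p]          # jump back inside the /-sequence


def match_sequence(events, steps):
    """Yield pre-order ids of nodes matching //steps[0]/.../steps[-1]."""
    fail = failure_table(steps)
    stack, p, node_id = [], 0, 0
    for kind, tag in events:
        if kind == "start":
            node_id += 1
            stack.append(p)
            p = advance(steps, fail, p, tag)
            if p == len(steps):
                yield node_id
        else:
            p = stack.pop()


doc = events_of("a a a b /b /a /a /a")
print(list(match_sequence(doc, ["a", "a", "b"])))  # [4]
```

On this document a plain "restart at the first step" rule would miss node 4, because the third a has to fall back to position 2 rather than position 1.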
1. Automaton Approach
[Figure: the automaton for the example query, now with the * transition highlighted; state and symbol labels lost in extraction.]
- E.g., after reading a given path we should be in state [label lost]!
- Problem: if the next symbol is NOT the expected one here, then what to do??
- Need to know what the * was!!
- * = ? Which other letters need to be considered?
- Splitting of states needed: at most #different symbols many.
- For * = x it is not important what x is exactly; only whether x equals the relevant letter or not matters.
1. Automaton Approach
[Figure: the automaton with the * state split by case, and the match state marked; labels lost in extraction.]
- * = ? Which other letters need to be considered?
- Advantage of automata: can be combined to evaluate MANY queries in parallel.

2. Parallel Evaluation of Multiple Queries
[Figure: two queries Q1 and Q2 (paths lost in extraction), their automata, and a partly built combined automaton whose states are pairs of states; match states marked per query.]
Questions
1. Which transition is WRONG?
2. How many transitions are missing?
[Figure: the combined automaton for Q1 and Q2, completed step by step; labels lost in extraction.]
- ONE look-up per node!
- Combined automaton: SIZE ≤ SIZE(A1) x SIZE(A2)
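The combination of two automata can be sketched as a product construction over reachable state pairs. The two example queries (//a/b and //a/c), the three-symbol alphabet and all function names are assumptions made for this illustration; `path_dfa` only handles two-step paths with distinct names.

```python
from collections import deque


def path_dfa(first, second):
    """DFA for //first/second: states 0, 1, 2 (2 = match); first != second."""
    def step(state, symbol):
        if state == 1 and symbol == second:
            return 2
        return 1 if symbol == first else 0
    return 0, step


def product_dfa(dfa1, dfa2, alphabet):
    """Build the reachable part of the product; states are pairs (q1, q2)."""
    (start1, step1), (start2, step2) = dfa1, dfa2
    start = (start1, start2)
    table, seen, todo = {}, {start}, deque([start])
    while todo:
        q = todo.popleft()
        for symbol in alphabet:
            r = (step1(q[0], symbol), step2(q[1], symbol))
            table[q, symbol] = r          # one table entry per (pair, symbol)
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return start, table, seen


start, table, states = product_dfa(path_dfa("a", "b"), path_dfa("a", "c"),
                                   ["a", "b", "c"])
print(len(states))            # 4 reachable pairs, out of at most 3 x 3 = 9
print(table[(1, 1), "b"])     # (2, 0): Q1 matches here, Q2 does not
```

While streaming, a single look-up `table[state, tag]` per start event then advances both queries at once.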
3. The Size of the DFA
Question: What is SIZE(A) wrt the size of Q?
Take
(1) SIZE(A) = #states
(2) SIZE(A) = #transitions

[Example: a query of the form //x/*/*/*/y; element names lost in extraction.]
Size of DFA = exponential in *'s (not a real concern)
[Figure: the NFA for the example query, with a chain of * transitions, and a fragment of the corresponding DFA (without back edges) whose states are sets of NFA states; labels lost in extraction.]

3. The Size of the DFA
Theorem [GMOS 03]
The number of states in the DFA for one linear XPath expression P is at most:
$k + |P| \, k \, s^m$
- k = number of //'s
- s = size of the alphabet (number of tags)
- m = max number of * between two consecutive //

4. How to deal with filters?
[Example: a query of the form //x[.//y]/...; element names lost in extraction.]
- When we meet the candidate nodes (in pre-order traversal) we do not know yet if the filter will evaluate to true.
- We have to use buffers, as before. However, now buffers may be deleted without being used.
- Must store in memory.
- Size of largest documents that can be streamed in this way depends on
  - #filters,
  - sizes of (pre) selected trees,
  - quality of (1), (2), etc.
Optimizations
(1) Store potential match trees as DAGs
(2) Release potential match trees as early as possible!
Question: If we output node IDs, then how much memory is needed in the worst case for queries with filters?
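The growth in the number of *'s can be observed with a small subset construction. The sketch below counts the reachable DFA states for the family //a/*/.../*/b with m stars over a three-symbol alphabet; the family, the alphabet and the function names are assumptions chosen for illustration.

```python
def star_query(m):
    """Steps of //a/*/.../*/b with m wildcard steps."""
    return [("//", "a")] + [("/", "*")] * m + [("/", "b")]


def dfa_state_count(steps, alphabet):
    """Number of reachable subsets in the subset construction."""
    n = len(steps)

    def move(state, tag):
        nxt = set()
        for i in state:
            if i == n:
                continue
            axis, name = steps[i]
            if name == "*" or name == tag:
                nxt.add(i + 1)
            if axis == "//":
                nxt.add(i)
        return frozenset(nxt)

    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for tag in alphabet:
            nxt = move(state, tag)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


ALPHABET = ["a", "b", "other"]
print([dfa_state_count(star_query(m), ALPHABET) for m in range(5)])
# [3, 6, 12, 24, 48]
```

Each extra * doubles the count here, because the DFA has to remember for one more ancestor whether it was an a.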
4. How to deal with filters?
[Example: a query of the form //x[.//y]/...; element names lost in extraction.]
- Size of largest documents that can be streamed in this way depends on
  - #filters,
  - sizes of (pre) selected trees,
  - quality of (1), (2), etc.
- Release potential match trees as early as possible!
- Find the earliest point at which we know the filter is true.
- Before that point: must store in memory. After it: no need to store. Stream!

Harder for Boolean combinations:
[A filter of the form [not(.//x) and (.//y or <path>)]; element names lost in extraction.]
Question: where is the earliest point for this filter? And now?

Another Idea
Use a 2-pass algorithm:
- First (bottom-up) phase to mark subtrees with filter information.
- Second (top-down) phase to determine match nodes.
Why is this interesting?
- We can also construct automata for filter expressions! Fast main memory evaluation.
- Use disk as intermediate store (stream twice).
- Use a push-down for potential candidates. The Push-Down Automaton can probably be designed so that it pops/outputs candidates as early as possible.
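The 2-pass idea can be sketched for one concrete filter shape, //outer[.//name]/child. The first pass marks every node with "has a descendant called name"; the second pass selects children of marked nodes. The marks dictionary stands in for the intermediate store, the event list is read twice, and all names are illustrative assumptions.

```python
def events_of(text):
    """Hypothetical helper: 'a b /b /a' -> SAX-like (kind, tag) events."""
    return [("end", t[1:]) if t.startswith("/") else ("start", t)
            for t in text.split()]


def mark_has_descendant(events, name):
    """Pass 1 (bottom-up): node id -> True if some descendant is `name`."""
    marks, stack, node_id = {}, [], 0
    for kind, tag in events:
        if kind == "start":
            node_id += 1
            if tag == name and stack:
                stack[-1][1] = True        # the parent now has such a descendant
            stack.append([node_id, False])
        else:
            nid, found = stack.pop()
            marks[nid] = found
            if found and stack:
                stack[-1][1] = True        # propagate upwards on close
    return marks


def select_children(events, marks, outer, child):
    """Pass 2 (top-down): ids of `child` nodes whose parent is a marked `outer`."""
    stack, node_id = [], 0
    for kind, tag in events:
        if kind == "start":
            node_id += 1
            if stack and stack[-1] and tag == child:
                yield node_id
            stack.append(tag == outer and marks[node_id])
        else:
            stack.pop()


doc = events_of("r a c /c b /b /a a c /c /a /r")
marks = mark_has_descendant(doc, "b")
print(list(select_children(doc, marks, "a", "c")))  # [3]
```

Node 3 (the c) comes before node 4 (the b) in document order, which is exactly the situation where a one-pass evaluator would have to buffer.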
5. Streaming XPath Algorithms
- XFilter and YFilter [Altinel and Franklin 00] [Diao et al 0?]
- X-scan [Ives, Levy, and Weld 00]
- XMLTK [Avila-Campillo et al 02]
- XTrie [Chan et al 02]
- SPEX [Olteanu, Kiesling, and Bry 0?]
- Lazy DFAs [Green et al 03]
- The XPush Machine [Gupta and Suciu 03]
- XSQ [Peng and Chawathe 03]
- TurboXPath [Josifovski, Fontoura, and Barta 04]

5. Streaming XPath Algorithms
Some following slides are by T. Amagasa and M. Onizuka (Japan).
See http://www.s07.it..th/dasfaa007_tutoril_.p
Most of the following slides are by Dan Suciu (the above slides are actually also based on Suciu's slides).
See http://www.s.wshington.u/homs/suiu/tlk-spir00.ppt
Basic NFA Evaluation
Properties:
- Space = linear
- Throughput = decreases linearly
Systems:
- XFilter [Altinel&Franklin 99], YFilter.
- XTrie [Chan et al. 02]

Basic DFA Evaluation
Properties:
- Throughput = constant!
- Space = GOOD QUESTION
System:
- XML Toolkit [University of Washington] http://xmltk.sourceforge.net

The Size of the DFA
Theorem [GMOS 03]
The number of states in the DFA for one linear XPath expression P is at most:
$k + |P| \, k \, s^m$
- k = number of //'s
- s = size of the alphabet (number of tags)
- m = max number of * between two consecutive //

Size of DFA: Multiple Expressions
  //section//footnote
  //table//footnote
  //figure//footnote
  ........
  //abstract//footnote
100 expressions: 2^100 states!!!!
There is a theorem here too, but it's not useful.

Solution: Compute the DFA Lazily
- Also used in text searching.
- But will it work for 10^6 XPath expressions?
- YES! For XPath it is provably effective, for two reasons:
  - XML data is not very deep.
  - The nesting structure in XML data tends to be predictable.
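Lazy construction can be sketched as a memoised subset construction: a DFA state (a set of NFA positions across all queries) and its transitions are only computed when the document actually reaches them. The class and function names below are illustrative and not taken from XMLTK; queries are given directly as lists of (axis, name) steps.

```python
def events_of(text):
    """Hypothetical helper: 'a b /b /a' -> SAX-like (kind, tag) events."""
    return [("end", t[1:]) if t.startswith("/") else ("start", t)
            for t in text.split()]


class LazyDFA:
    def __init__(self, queries):
        self.queries = queries                       # lists of (axis, name)
        self.start = frozenset((q, 0) for q in range(len(queries)))
        self.table = {}                              # (state, tag) -> state
        self.states = {self.start}                   # states built so far

    def step(self, state, tag):
        key = (state, tag)
        if key not in self.table:                    # build on first use only
            nxt = set()
            for q, i in state:
                steps = self.queries[q]
                if i == len(steps):
                    continue
                axis, name = steps[i]
                if name == "*" or name == tag:
                    nxt.add((q, i + 1))
                if axis == "//":
                    nxt.add((q, i))
            self.table[key] = frozenset(nxt)
            self.states.add(self.table[key])
        return self.table[key]

    def matched(self, state):
        return sorted(q for q, i in state if i == len(self.queries[q]))


def run(dfa, events):
    """Return (node id, query index) pairs for all matches."""
    stack, state, node_id, out = [], dfa.start, 0, []
    for kind, tag in events:
        if kind == "start":
            node_id += 1
            stack.append(state)
            state = dfa.step(state, tag)
            out.extend((node_id, q) for q in dfa.matched(state))
        else:
            state = stack.pop()
    return out


names = ["section", "table", "figure", "abstract"]
dfa = LazyDFA([[("//", n), ("//", "footnote")] for n in names])
doc = events_of("document section figure footnote /footnote /figure "
                "table footnote /footnote /table /section /document")
print(run(dfa, doc))        # [(4, 0), (4, 2), (6, 0), (6, 1)]
print(len(dfa.states))      # 6
```

Only the state sets that occur on some root-to-node path of the document are ever created, which is why predictable nesting keeps the lazy DFA small.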
Lazy DFA and Simple DTDs
Document Type Definition (DTD)
- Part of the XML standard
- Will be replaced by XML Schema
Example DTD:
  <!ELEMENT document (section*)>
  <!ELEMENT section ((section | abstract | table | figure)*)>
  <!ELEMENT figure (table?, footnote*)>
  ..........
Definition: A DTD is simple if all cycles are loops.

Lazy DFA and Simple DTDs
[Figure: graph of the simple DTD over the elements document, section, abstract, table, figure, footnote.]
XPath expressions:
  //section//footnote
  //table//footnote
  //figure//footnote
  //abstract//footnote
- Eager DFA remembers 2^4 sets
- Lazy DFA remembers only 4 sets

Lazy DFA and Simple DTDs
Theorem [GMOS 03]
If the XML data has a simple DTD, then the lazy DFA has at most $1 + D(1+n)^d$ states.
- n = max depth of the XPath expressions
- D = size of the unfolded DTD
- d = max depth of self-loops in the DTD
Fact of life: Data-like XML has simple DTDs.

Lazy DFA and Data Guides
Non-simple DTDs are useless for the lazy DFA. Everything may contain everything:
  <!ELEMENT document (section*)>
  <!ELEMENT section ((section | table | figure | abstract | footnote)*)>
  <!ELEMENT table ((section | table | figure | abstract | footnote)*)>
  <!ELEMENT figure ((section | table | figure | abstract | footnote)*)>
  <!ELEMENT abstract ((section | table | figure | abstract | footnote)*)>
Fact of life: Text-like XML has non-simple DTDs.

Lazy DFA and Data Guides
Definition [Goldman&Widom 97]
The data guide for an XML data instance is the Trie of all its root-to-leaf paths.
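A data guide in the sense of this definition can be built from the event stream with one stack of trie nodes. The sketch below uses nested dictionaries as the trie; the example document and the names `data_guide` and `guide_size` are assumptions for illustration.

```python
def events_of(text):
    """Hypothetical helper: 'a b /b /a' -> SAX-like (kind, tag) events."""
    return [("end", t[1:]) if t.startswith("/") else ("start", t)
            for t in text.split()]


def data_guide(events):
    """Trie of all root-to-node label paths, as nested dicts."""
    root = {}
    stack = [root]
    for kind, tag in events:
        if kind == "start":
            # reuse the trie node for this label path if it already exists
            stack.append(stack[-1].setdefault(tag, {}))
        else:
            stack.pop()
    return root


def guide_size(trie):
    """Number of nodes G in the data guide."""
    return sum(1 + guide_size(child) for child in trie.values())


doc = events_of("document "
                "section table /table table /table /section "
                "section figure /figure section table /table /section /section "
                "/document")
guide = data_guide(doc)
print(guide_size(guide))   # 6 guide nodes for 8 document nodes
```

Repeated structure in the document collapses into a single path of the guide, so G stays small when the nesting is regular.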
Lazy DFA and Data Guides
[Figure: an XML data tree over the elements document, section, table, figure, next to its (smaller) data guide.]

Lazy DFA and Data Guides
Theorem [GMOS 03]
If the XML data has a data guide with G nodes, then the number of states in the lazy DFA is at most: $1 + G$
- G = number of nodes in the data guide
Fact of life: real XML data has small data guide [Liefke&S. 00]

[Figure: Number of Lazy DFA States - SYNTHETIC Data. Bar chart on a logarithmic axis with one bar per workload size (number of XPath expressions), for the data sets simple, prov and ebBPSS. Numeric labels are incomplete in extraction.]

[Figure: Number of Lazy DFA States - REAL Data. Same layout, for the data sets protein, nasa and treebank; one data set is annotated with G = 50000. Other numeric labels are incomplete in extraction.]

[Figure: Throughput for 10^3, 10^4, 10^5, 10^6 XPath expressions, with fixed prob(*) and prob(//) (percentages incomplete in extraction). Throughput in MB/s on a logarithmic axis against total input size (5MB to 25MB), with curves for the parser, for lazyDFA and for xfilter at each workload size. Annotations: parser and lazy DFA throughput in MB/s, the latter printed as 5.4MB/s (leading digits may be missing).]

END Lecture 9