Graphs and Trees: cycles detection and stream segmentation
Lorenzo Cioni
Dipartimento di Informatica, Largo Pontecorvo 3, Pisa
lcioni@di.unipi.it
Main topics of the talk
Two algorithms:
- segmentation of a stream of data
  application: syllabification of written Italian
- cycles detection in directed (even complete) graphs
  application: a graphic editor for editing graphs with cycles detection, designed to be used for the definition of the directed graphs used in causal loop graphs (system dynamics)
Trees and segmentation
- RB-tree (paper of mine, 1996): syllabification of written Italian; binary tree; recursive calls; vowels/consonants.
- m-ary tree (under development): m direct descendants of each node; recursive calls; alphabet of m distinct symbols.
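Trees and segmentation 2
Lexicon
- $A$: alphabet, $|A| = m$
- $S$: strings on $A$ of length $n$, potentially unbounded, $S = \{s \mid s \in A^+\}$
- $M$: set of markers, $M \cap A = \emptyset$
- $R$: rules, a fixed and finite set, $R = \{r_1, \dots, r_k\}$
- $W$: weights, a fixed and finite set, $W = \{w_1, \dots, w_k\}$
- $r_i \leftrightarrow w_i$: one-to-one correspondence
Lexicon/Syntax
- $A = \{\alpha_0, \dots, \alpha_{m-1}\}$
- $\beta = m$: base of the numbering scheme
- $\hat{W} = \{w_j \mid w_j = \sum_{i=0}^{k} n_i \beta^i\}$, $k \in \{0, \dots, K-1\}$, where $K$ is the maximum number of elements of a rule and $n_i$ is the code of the $i$-th symbol of the rule
- $W \subseteq \hat{W}$ identifies the set of weights to which a rule corresponds
The m-ary tree is dynamically built: no permanent data structures, rules are stored as weights.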
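A minimal Python sketch of the numbering scheme, assuming that the first symbol of a rule is the least significant digit (this is the reading under which the weights of the later two-letter example come out as 0, 1, 2, 3). The names `rule_weight` and `coding` are hypothetical.

```python
def rule_weight(rule, coding):
    """Weight of a rule string: sum of coding[symbol] * beta**i, first symbol at i = 0."""
    beta = len(coding)  # beta = m = |A|, base of the numbering scheme
    return sum(coding[symbol] * beta**i for i, symbol in enumerate(rule))


# Coding of the two-letter example later in the talk: b -> 0, a -> 1, so beta = 2.
coding = {"b": 0, "a": 1}
weights = {r: rule_weight(r, coding) for r in ["bb", "ab", "ba", "aa"]}
print(weights)  # {'bb': 0, 'ab': 1, 'ba': 2, 'aa': 3}
```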
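Trees and segmentation 3
- input stream $n_1 n_2 n_3 \dots n_i \dots n_k \dots$ with $n_i \in A$
- $\hat{w} = \sum_i n_i \beta^i$, where $n_i$ is a coding of each input symbol (cf. the examples)
- if $\hat{w} \in W$, then it identifies a rule $r_j$
Formally: we define a running sum $\hat{w} = \sum_{i=0}^{k} n_i \beta^i$, with $i$ counted from the start of the current segment, and at each step we check whether $\hat{w} \in W$. In the former case we apply the corresponding rule $r_j$; otherwise we step to the following input symbol and update $\hat{w}$.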
Trees and segmentation 4
Evaluation of $\hat{w}$: checking whether $\hat{w} \in W$ costs $O(1)$. Two cases:
- no: we increment the pointer $k$ on the input stream by one ($k = k + 1$) and evaluate the new value $\hat{w} = \hat{w} + n_k \beta^k$;
- yes: we apply the corresponding rule $r_i$ and reset to zero both the pointer $k$ and the running sum $\hat{w}$.
The application of a rule is equivalent to the insertion of:
- one marker: after the matched symbols (segmentation) or inside them (syllabification);
- two markers, one before and one after (extraction).
Properties of the rules
- Uniformity: we have uniformity if every rule involves the same number of input symbols; otherwise there is a value max that defines the longest rule[s].
- Completeness: a set of rules is complete if there is a rule for every value of the running sum from $0$ to $\beta^K - 1$.
There is no relation between the two properties. Uniformity and completeness translate into properties of the structure of the dynamically built m-ary tree.
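Both properties can be checked mechanically. The sketch below is hypothetical and assumes the same coding as before (first symbol least significant); for non-uniform rule sets it takes $K$ to be the length of the longest rule, which is one possible reading of the definition.

```python
def rule_weight(rule, coding):
    beta = len(coding)
    return sum(coding[s] * beta**i for i, s in enumerate(rule))


def is_uniform(rules):
    """Uniformity: every rule involves the same number of input symbols."""
    return len({len(r) for r in rules}) == 1


def is_complete(rules, coding):
    """Completeness (assumed reading): every running-sum value 0 .. beta**K - 1
    is the weight of some rule, K being the length of the longest rule."""
    beta = len(coding)
    K = max(len(r) for r in rules)
    weights = {rule_weight(r, coding) for r in rules}
    return all(v in weights for v in range(beta**K))


coding = {"b": 0, "a": 1}
print(is_uniform(["bb", "ab", "ba", "aa"]), is_complete(["bb", "ab", "ba", "aa"], coding))  # True True
print(is_uniform(["bb", "aa"]), is_complete(["bb", "aa"], coding))  # True False
```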
Execution of the rules
    if ŵ ∈ W (ŵ = w_i) then
        apply r_i
        reset the pointer k and the running sum ŵ
    else
        k = k + 1
        update the running sum with the current input symbol
apply = insert, in the proper position[s], one or two markers
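The execution loop can be sketched in Python as follows. This is a hypothetical helper, not the author's implementation. It assumes that a rule is identified by its length together with its weight (i.e. by its node in the m-ary tree), since with this coding b and bb both have running sum 0, and that when no rule matches a block of the maximum length the block is copied unchanged, which is the behaviour that reproduces the incomplete example.

```python
def segment(stream, coding, rules, marker="*"):
    """Running-sum segmentation (hypothetical sketch).

    rules maps a rule string to where its marker goes: 'inside' (after the
    first symbol), 'after', or 'around' (one marker before and one after).
    """
    beta = len(coding)
    table = {}  # (length, weight) -> (rule, mode): the rule's node in the m-ary tree
    for rule, mode in rules.items():
        w = sum(coding[s] * beta**i for i, s in enumerate(rule))
        table[(len(rule), w)] = (rule, mode)
    max_len = max(len(r) for r in rules)

    out = []
    start, k, w_hat = 0, 0, 0  # start of current block, pointer k, running sum
    while start + k < len(stream):
        w_hat += coding[stream[start + k]] * beta**k  # w_hat = w_hat + n_k * beta**k
        hit = table.get((k + 1, w_hat))
        if hit is not None:
            rule, mode = hit
            if mode == "inside":
                out.append(rule[:1] + marker + rule[1:])
            elif mode == "after":
                out.append(rule + marker)
            else:  # 'around'
                out.append(marker + rule + marker)
            start, k, w_hat = start + k + 1, 0, 0  # reset pointer and running sum
        elif k + 1 == max_len:
            # Assumption: no rule fits a block of maximum length, copy it unchanged.
            out.append(stream[start:start + k + 1])
            start, k, w_hat = start + k + 1, 0, 0
        else:
            k += 1  # step to the next input symbol
    out.append(stream[start:])  # unfinished tail, copied as-is
    return "".join(out)


coding = {"b": 0, "a": 1}
complete = {"bb": "inside", "aa": "inside", "ab": "after", "ba": "after"}
print(segment("abbbaaab", coding, complete))                          # ab*b*ba*aab*
print(segment("abbbaaab", coding, {"bb": "around", "aa": "around"}))  # ab*bb**aa*ab
```

Keying the table by (length, weight) keeps the membership test a constant-time dictionary lookup, in line with the $O(1)$ check of the slides.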
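Example
$A = \{a, b\}$, $b \mapsto 0$, $a \mapsto 1$, $M = \{*\}$, $S = \{s \mid s \in A^+\}$
Completeness and uniformity:
- $r_0 = bb$, $w_0 = 0$; $r_1 = ab$, $w_1 = 1$; $r_2 = ba$, $w_2 = 2$; $r_3 = aa$, $w_3 = 3$
- $R = \{r_0, r_1, r_2, r_3\}$, $W = \{w_0, w_1, w_2, w_3\}$
- $w_i \leftrightarrow r_i$: a rule defines where to insert the marker[s]
- $r_0, r_3$: marker inside: aa → a*a, bb → b*b
- $r_1, r_2$: marker after: ab → ab*, ba → ba*
abbbaaab... → ab*b*ba*aab*...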
Example continued
$A = \{a, b\}$, $b \mapsto 0$, $a \mapsto 1$, $M = \{*\}$, $S = \{s \mid s \in A^+\}$
No completeness but uniformity:
- $r_0 = bb$, $w_0 = 0$; $r_3 = aa$, $w_3 = 3$
- $r_0, r_3$: marker before and after
abbbaaab... → ab*bb**aa*ab...
No completeness and no uniformity: conflicting rules? (used in syllabification)
Example continued
- $r_1 = ab$, $w_1 = 1$; $r_5 = aba$, $w_5 = 5$: conflict, the former covers the latter (no completeness, no uniformity; cf. figure)
- $r_1 = ab$, $w_1 = 1$; $r_2 = bab$, $w_2 = 2$; $r_4 = bba$, $w_4 = 4$
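One way to read "the former covers the latter" is that a rule which is a proper prefix of another is matched first, because the running sum reaches its node of the tree earlier, so the longer rule can never fire. Under that assumption, such conflicts can be listed with a short check; `covering_conflicts` is a hypothetical name.

```python
def covering_conflicts(rules):
    """Pairs (r, s) where rule r is a proper prefix of rule s: the running sum
    reaches r's node first, so s is never applied (r 'covers' s)."""
    return [(r, s) for r in rules for s in rules if r != s and s.startswith(r)]


print(covering_conflicts(["ab", "aba"]))         # [('ab', 'aba')]
print(covering_conflicts(["ab", "bab", "bba"]))  # []
```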
Example 2: syllabification
Main features: marker inside; no uniformity; completeness; lookahead & stepback vs. stepforward.
Two versions:
- simple version: small alphabet, complex rules;
- complex version: big alphabet, simpler rules.
Example 2 continued
Simple version:
- alphabet $A$: vowels $V$ (with/without stress), consonants $C$, separators $S$ (spaces, tabs) and punctuation marks $P$; $A = V \cup C \cup S \cup P$
- markers $M = \{-\}$, weights $W = \{w_i\}$, rules $R = \{r_i\}$, $r_i \leftrightarrow w_i$
Rules:
- rules are applied with a lookahead and a stepback with a recursive call;
- rules are complex since we have to discriminate many cases within a rule.
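Example 2 continued
Some rules of the simple version ($v \in V$, $v \mapsto 1$; $c \in C$, $c \mapsto 0$):
- $w_{10} = 10$: $r_{10} = cvcv \to cv\text{-}cv$
- $w_9 = 9$: $r_9 = v c_1 c_2 v$, some cases:
  - if $c_1 = c_2$ then $v c_1\text{-}c_2 v$
  - if $c_1 = n$ then $v c_1\text{-}c_2 v$
  - if $c_1 = s$ then $v\text{-}c_1 c_2 v$
Examples (the number in parentheses is the weight of the rule applied at that step; the part after the last marker is analysed with a recursive call):
- pallone →(9) pal-lone →(10) pal-lo-ne
- asmatico →(9) a-smatico →(10) a-sma-tico →(10) a-sma-ti-co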
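A hypothetical Python sketch of the simple version, restricted to the two rules listed above ($w_{10}$ and the three sub-cases of $w_9$) and to lowercase words; the vowel set is an assumption. It looks ahead over a window of four symbols, inserts the marker, then steps back and recurses on the rest of the word, which reproduces the two examples.

```python
VOWELS = set("aeiouàèéìíòóùú")  # assumption: Italian vowels with/without stress


def code(ch):
    """Coding of the simple version: vowel -> 1, consonant -> 0."""
    return 1 if ch in VOWELS else 0


def weight(window):
    return sum(code(ch) * 2**i for i, ch in enumerate(window))  # beta = 2


def syllabify(word):
    """Only rules w10 (c v c v) and w9 (v c1 c2 v, three sub-cases) are covered."""
    for p in range(len(word) - 3):
        window = word[p:p + 4]  # lookahead of four symbols
        w = weight(window)
        if w == 10:  # cvcv -> cv-cv
            cut = p + 2
        elif w == 9:  # v c1 c2 v
            c1, c2 = window[1], window[2]
            if c1 == "s":
                cut = p + 1  # v-c1c2v
            elif c1 == c2 or c1 == "n":
                cut = p + 2  # vc1-c2v
            else:
                continue  # other sub-cases are not covered by this sketch
        else:
            continue
        # insert the marker, then step back and recurse on the rest of the word
        return word[:cut] + "-" + syllabify(word[cut:])
    return word


print(syllabify("pallone"))   # pal-lo-ne
print(syllabify("asmatico"))  # a-sma-ti-co
```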
Example 2 continued
Complex version (under development):
- alphabet $A' = A \cup V^n \cup C^2 \cup C^3 \cup \dots$: $A'$ contains the old alphabet plus groups of vowels ($V^n$), groups of two ($C^2$) and three ($C^3$) consonants, and all the relevant subgroups
- markers $M = \{-\}$, weights $W = \{w_i\}$, rules $R = \{r_i\}$, $r_i \leftrightarrow w_i$, $\beta = |A'|$
Rules:
- rules are applied with a lookahead and a stepback with a recursive call;
- we have more rules, but each rule is simpler since it contains a small set of sub-cases.
Segmentation: computational complexity
Constant number of rules; input stream of n symbols. Two cases:
- n fixed (not really relevant for complexity);
- n not known a priori.
Only stepforward: complexity $O(n)$.
With lookahead and stepback: complexity $> O(n)$ and $< O(n^2)$; complexity $O(n \log n)$??
Cycles detection in directed graphs
Data structures:
- $G = (N, E)$ directed graph, $|N| = n$, $|E| = m$
- $A$ alphabet, $A = \{a_i\}$, $i = 1, \dots, n$
- $S = \hat{A} = \{a_i \dots a_k \mid a_i \in A, k \ge 2\}$
- $a_i \leftrightarrow n_i \in N$, $(a_l, a_k) \leftrightarrow (n_l, n_k) \in E$
Three main steps:
- mapping: from graphs to strings;
- searching substrings with given properties (every substring identifies a cycle);
- pruning of equivalent substrings (duplicate cycles).
Cycles detection in directed graphs 2
Data structures: $G = (N, E)$ directed graph, $|N| = n$, $|E| = m$, with $m = n(n-1)$ at the most; number of cycles (complete graph):
First example: five cycles
- ABBA, BCCB, ACCA, ABBCCA (anticlockwise), ACCBBA (clockwise)
- CBBAAC and ACCBBA are equivalent through a left shift
- BCCAAB and ABBCCA are equivalent through a left shift
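The expression for the number of cycles of a complete graph is not recoverable from the slide. A standard count, given here as a known fact rather than as the author's formula, is $\sum_{k=2}^{n} \binom{n}{k}(k-1)!$: choose $k$ nodes, then one of their $(k-1)!$ cyclic orders. For $n = 3$ it gives 5, matching the first example. The sketch checks it against brute force (function names are hypothetical).

```python
from itertools import permutations
from math import comb, factorial


def cycles_complete_digraph(n):
    """Directed simple cycles of length >= 2 in the complete digraph on n nodes."""
    return sum(comb(n, k) * factorial(k - 1) for k in range(2, n + 1))


def cycles_brute_force(n):
    """Cross-check: count node sequences that start at their smallest node,
    i.e. one representative per rotation of each cycle."""
    count = 0
    for k in range(2, n + 1):
        for seq in permutations(range(n), k):
            if seq[0] == min(seq):
                count += 1
    return count


print(cycles_complete_digraph(3))  # 5
```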
Cycles detection in directed graphs 3
Second example: three cycles: ABBA, BCCB, ABBCCA
- casual mapping: ABCBCABCBA
- lexicographic mapping (by node order and by forward star): ABBABCCACB
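The two mapping modes can be sketched as follows, assuming one symbol per node (uniform mapping) and a graph given as a dictionary of forward stars; the function names and the `seed` parameter are hypothetical.

```python
import random


def lexicographic_mapping(forward_star):
    """One two-symbol substring per arc: nodes in alphabetical order,
    and each node's forward star in alphabetical order."""
    return "".join(u + v for u in sorted(forward_star) for v in sorted(forward_star[u]))


def casual_mapping(forward_star, seed=None):
    """The same arcs, concatenated in a random order."""
    arcs = [u + v for u in forward_star for v in forward_star[u]]
    random.Random(seed).shuffle(arcs)
    return "".join(arcs)


# Second example: arcs A->B, B->A, B->C, C->A, C->B.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["A", "B"]}
print(lexicographic_mapping(graph))  # ABBABCCACB
```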
Cycles detection in directed graphs 4
Second example continued: cycles detection in ABBABCCACB
- ABBA: cycle, 1 scan
- ABBCCB: subcycle (premature closure), discard
- ABBCCA: cycle, 2 scans
- BAAB: duplicate cycle, 2 scans
- BCCAAB: duplicate cycle, 2 scans
- BCCB: cycle, 1 scan
- CAABBA: subcycle, discard
- CAABBC: duplicate cycle, 2 scans
- and so on...
Operations
Mapping
- $g: G = (N, E) \to s \in S = A A^+$ (strings of at least two symbols of $A$)
- uniform vs. non-uniform number of symbols for each node identifier
- modes of mapping: casual, lexicographic
Searching
- number of scans: at most $n - 1$, with $n = |N|$
- search operations produce a list of substrings, one for each (duplicate) cycle
Pruning (or removal of duplicate cycles)
- ex-ante
- ex-post
Search operations
- contiguity_check (cc): finds a couple of contiguous arcs; returns a boolean
- premature_closure (pc): finds subcycles (i.e. cyclical substrings) to be discarded; condition: the tail is a node already in the chain, other than the head
- closure (cl): finds a cycle, represented by a cyclic substring without pc; condition: head = tail
Examples: ABBCCACB (cc), ABBCCBCA (pc), ABBCCA (cl)
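A hypothetical Python sketch of the search phase built on the three operations, assuming one symbol per node and a depth-first extension of chains of arcs. It does not model the number of scans and it keeps duplicate cycles, which are left to the pruning step. On the second example it finds the three cycles together with their shifted duplicates.

```python
def contiguity_check(chain, arc):
    """cc: the arc starts where the chain ends."""
    return chain[-1] == arc[0]


def closure(chain, arc):
    """cl: the arc ends on the head node of the chain."""
    return arc[1] == chain[0]


def premature_closure(chain, arc):
    """pc: the arc ends on an already visited node other than the head."""
    return arc[1] in chain and arc[1] != chain[0]


def search(mapped):
    """Every cyclic substring found in the mapped graph, duplicates included."""
    arcs = [mapped[i:i + 2] for i in range(0, len(mapped), 2)]
    found = []

    def extend(chain):
        for arc in arcs:
            if not contiguity_check(chain, arc):
                continue
            if closure(chain, arc):
                found.append(chain + arc)
            elif not premature_closure(chain, arc):
                extend(chain + arc)  # keep growing the chain
            # otherwise: subcycle, discarded

    for arc in arcs:
        if arc[0] != arc[1]:  # chains start from arcs of two distinct nodes
            extend(arc)
    return found


print(search("ABBABCCACB"))
# ['ABBA', 'ABBCCA', 'BAAB', 'BCCAAB', 'BCCB', 'CAABBC', 'CBBC']
```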
Pruning
Removes duplicate cycles, i.e. substrings of equal length that are equivalent under left/right shift.
- ex-ante: during the search phase; equivalent substrings are not inserted in the list, usually one single growing vector C[]
- ex-post: after the search phase; two main cases:
  - one single unordered vector C[] of k elements;
  - a set of vectors $C_i[]$, $i = 2, \dots, n$, each of $k_i$ elements, some even empty;
  all substrings contained in the list created at the end of the search phase are examined and equivalent substrings are removed.
Pruning 2
ex-ante
- to each arc (of two distinct nodes) corresponds a string $s \in S$, so that to a chain of arcs corresponds a chain of strings $s_1 s_2 s_3 \dots s_k$
- $C_1[]$ is either empty (at the very beginning) or contains the substrings (cycles) defined up to a certain point
- we have an iterated algorithm: at step $i$ ($i = 1, \dots, k$) we check whether $s_i$ occurs in the elements of $C_i[]$ and define $C_{i+1}[] = \{c_j \in C_i[] \mid s_i \in c_j\}$
- if the cycle contains $k$ arcs, the corresponding string has $2k$ symbols and $C_{k+1}[] \neq \emptyset$, then the current substring corresponds to a duplicate cycle and must be discarded
- if, at any step, we have $C_i[] = \emptyset$, then the current string (if it satisfies the aforesaid properties) corresponds to a new cycle and must be added to $C_1[]$
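The ex-ante filter can be sketched as below (hypothetical names; cycles are strings of two-symbol arcs). Here it is applied to each cycle as it is produced; inside a search routine it would replace the plain append of a found cycle. The final length test reflects the "2k symbols" condition, and it suffices for simple cycles because a stored cycle of the same length that contains all $k$ arcs has exactly those arcs.

```python
def arcs_of(cycle):
    """Split a mapped cycle string into its two-symbol arcs."""
    return [cycle[i:i + 2] for i in range(0, len(cycle), 2)]


def add_ex_ante(C1, cycle):
    """Append cycle to C1 unless an equivalent cycle is already stored.

    C_{i+1} = the elements of C_i that contain arc s_i; a non-empty C_{k+1}
    restricted to strings of 2k symbols means a duplicate."""
    candidates = list(C1)  # C_1[]
    for s in arcs_of(cycle):  # s_1 ... s_k
        candidates = [c for c in candidates if s in arcs_of(c)]
        if not candidates:  # C_i[] empty: new cycle
            break
    candidates = [c for c in candidates if len(c) == len(cycle)]
    if candidates:
        return False  # duplicate, discarded
    C1.append(cycle)
    return True


C1 = []
for cyc in ["ABBA", "ABBCCA", "BAAB", "BCCAAB", "BCCB", "CAABBC", "CBBC"]:
    add_ex_ante(C1, cyc)
print(C1)  # ['ABBA', 'ABBCCA', 'BCCB']
```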
Pruning 3
ex-post
First case: one single unordered vector C[] of k elements. We iterate for $i = 1$ to $k-1$; at each step:
- compare C[i] with all the shifted versions of C[j] for $j = i+1$ to $k$;
- every matching string is removed from C[], whose number of elements decreases.
At the end C[] contains the residual substrings, to each of which corresponds a cycle of the given graph.
Second case: a set of vectors $C_i[]$, $i = 2, \dots, n$, each of $k_i$ elements, some even empty: we repeat the algorithm designed for the single-vector case on every non-empty vector $C_i[]$.
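A sketch of the ex-post pruning for both cases, under the same string representation (hypothetical function names). "Shifted versions" are taken to be left shifts by whole arcs, i.e. by two symbols at a time, which is how CBBAAC and ACCBBA are related in the first example. Comparing C[i] with the shifts of C[j] is the same as checking whether C[j] is among the shifts of C[i].

```python
def shifted_versions(cycle):
    """All left shifts of a cycle string by whole arcs (two symbols at a time)."""
    return {cycle[i:] + cycle[:i] for i in range(0, len(cycle), 2)}


def prune_ex_post(C):
    """Single unordered vector: for each C[i], remove every later C[j]
    that is a shifted version of it."""
    C = list(C)
    i = 0
    while i < len(C) - 1:
        versions = shifted_versions(C[i])
        C[i + 1:] = [c for c in C[i + 1:] if c not in versions]
        i += 1
    return C


def prune_ex_post_by_length(vectors):
    """Set of vectors, one per cycle length: prune every non-empty vector."""
    return {length: prune_ex_post(C) for length, C in vectors.items() if C}


print(prune_ex_post(["ABBA", "ABBCCA", "BAAB", "BCCAAB", "BCCB", "CAABBC", "CBBC"]))
# ['ABBA', 'ABBCCA', 'BCCB']
```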
Applications of the algorithm
- cycles detection
- connectivity?
- more?
Complexity of the algorithm
- mapping (complete graph of n nodes): $O(n^2)$
- searching (complete graph of n nodes): $O(n^{n-1})$??
- pruning: ex-ante, costly; ex-post, very costly
Concluding remarks
So many things to do and such a short time...
Game Over... Thank you for your attention