Towards Efficient String Processing of Annotated Events
David Woods (1), Tim Fernando (2), Carl Vogel (2)
(1) ADAPT Centre, Trinity College Dublin, Ireland
(2) Computational Linguistics Group, Trinity Centre for Computing and Language Studies, School of Computer Science, Trinity College Dublin, Ireland
ISA-13, 2017
Motivation: ISO-TimeML
(Figure: ISO-TimeML fragment.)
Motivation: TLINKs
(Figure: TLINKs in an ISO-TimeML document.)
Motivation: TLINKs: Examples
1. <TLINK reltype="IS_INCLUDED" eventinstanceid="ei1" relatedtotime="t1"/>
2. <TLINK reltype="IS_INCLUDED" timeid="t1" relatedtoeventinstance="ei9"/>
3. <TLINK reltype="before" eventinstanceid="ei9" relatedtoeventinstance="ei10"/>
Motivation: Allen Relations
(Figure: the Allen relations, from Allen (1983, p. 835, Fig. 2).)
Motivation: Allen Relations: Example
John slept through the fire alarm last Tuesday.
This sentence gives us two events and one time period:
1. js = John slept (event)
2. fa = a fire alarm occurred (event)
3. lt = last Tuesday (time period)
We can represent this information with the binary Allen Relations:
js di fa
js d lt
Introduction: Strings as Models
We can use strings as models to represent this event data effectively.
Example: John slept through the fire alarm last Tuesday.
[lt][lt,js][lt,js,fa][lt,js][lt]
Each bracketed box is one symbol of the string.
Introduction: Sets as Symbols
Fix a finite set $A$ of fluents. Each fluent is understood as naming an event instance (or time) in ISO-TimeML. We encode finite sets of these fluents as symbols, which may appear in a string.
Introduction: Event-Strings
A string $s = \alpha_1 \cdots \alpha_n$ of subsets $\alpha_i$ of $A$ can be construed as a finite model consisting of $n$ moments of time $i \in \{1, \ldots, n\}$.
Each $\alpha_i$ specifies all fluents in $A$ that hold simultaneously at $i$.
Each $\alpha_i$ is understood to occur chronologically before $\alpha_j$ if and only if $i < j$.
The powerset $2^A$ of $A$ serves as the alphabet $\Sigma = 2^A$ of an event-string $s \in \Sigma^+$.
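One possible way to hold such a model in code is sketched below. The representation (a tuple of frozensets) and the helper names `es` and `show` are assumptions made for illustration; they do not come from the talk.

```python
from typing import FrozenSet, Tuple

Box = FrozenSet[str]            # one symbol: the fluents holding at a moment
EventString = Tuple[Box, ...]   # a string over the alphabet 2^A


def es(*boxes: str) -> EventString:
    """Build an event-string; each argument lists one box's fluents, comma-separated."""
    return tuple(
        frozenset(name.strip() for name in box.split(",") if name.strip())
        for box in boxes
    )


def show(s: EventString) -> str:
    """Render each box in square brackets, fluents sorted for a stable display."""
    return "".join("[" + (",".join(sorted(box)) or " ") + "]" for box in s)


# "John slept through the fire alarm last Tuesday."
sleep = es("lt", "lt,js", "lt,js,fa", "lt,js", "lt")
print(len(sleep))   # the number n of moments in the model
print(show(sleep))
```

Position in the tuple plays the role of the moment index i, so chronological order is simply tuple order.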
Introduction: No Time Without Change
"But neither does time exist without change." (Aristotle, Physics IV)
Introduction: No Time Without Change
- The precise real-time duration of each symbol is disregarded (for now).
- Event-strings model a kind of inertial world.
- Change is the only marker of progression from one moment to the next.
Superposition and Block Compression: Superposition
In order to usefully collect information from multiple strings into a single string, we define the operation of superposition:
Definition: Given two strings $s$ and $s'$ of equal length, their superposition $s \mathbin{\&} s'$ is their componentwise union:
$\alpha_1 \cdots \alpha_n \mathbin{\&} \alpha'_1 \cdots \alpha'_n := (\alpha_1 \cup \alpha'_1) \cdots (\alpha_n \cup \alpha'_n)$
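Superposition and Block Compression: Superposition: Box Notation
For convenience of notation, we draw boxes rather than curly braces { } to represent sets of fluents in an event-string. In this text a box is written with square brackets, so [a,b] stands for {a, b} and [ ] for the empty set.
Example: With $a, b, c, d \in A$:
[a][c] & [b][d] = [a,b][c,d]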
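The componentwise union can be sketched in a few lines. The function name `superpose` and the tuple-of-frozensets representation are illustrative assumptions, not part of the talk.

```python
def superpose(s, t):
    """Componentwise union of two event-strings of equal length."""
    if len(s) != len(t):
        raise ValueError("superposition needs strings of equal length")
    return tuple(a | b for a, b in zip(s, t))


# The box-notation example: [a][c] & [b][d]
a_c = (frozenset({"a"}), frozenset({"c"}))
b_d = (frozenset({"b"}), frozenset({"d"}))
result = superpose(a_c, b_d)
print(result == (frozenset({"a", "b"}), frozenset({"c", "d"})))
```

Raising on unequal lengths mirrors the definition, which is only stated for strings of the same length.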
Superposition and Block Compression: String Manipulation: Stutter
A string $s = \alpha_1 \cdots \alpha_n$ stutters if $\alpha_i = \alpha_{i+1}$ for some integer $0 < i < n$, and any string can be made to stutter by repeating one of its boxes.
For example, [a][a][a][c][c] is a stuttering version of [a][c].
Since the real-time duration of each box is not taken into account, the interpretation of the string is unaffected.
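Superposition and Block Compression: String Manipulation: Block Compression
We can transform a stuttering string into a stutterless string through block compression:
Definition:
$$bc(s) := \begin{cases} s & \text{if } length(s) \le 1 \\ bc(\alpha s') & \text{if } s = \alpha \alpha s' \\ \alpha \, bc(\alpha' s') & \text{if } s = \alpha \alpha' s' \text{ with } \alpha \ne \alpha' \end{cases}$$
Thus, bc([a][a][a][c][c]) = [a][c].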
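A minimal sketch of block compression, written as a loop rather than the recursive case analysis of the definition (the two should agree, since both keep a box only when it differs from the one before it). The function name `bc` follows the slides; the representation is an assumption.

```python
def bc(s):
    """Block compression: drop every box that repeats its predecessor."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


a, c = frozenset({"a"}), frozenset({"c"})
print(bc((a, a, a, c, c)) == (a, c))
```

Two strings are then bc-equivalent exactly when `bc` maps them to the same tuple.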
Superposition and Block Compression: String Manipulation: Inverse Block Compression
We can generate infinitely many stuttering strings, all of which are bc-equivalent:
Example: bc^{-1}([a][c]) = { [a][c], [a][a][c], [a][c][c], ... } = [a]+[c]+
Precisely, a string $s'$ is bc-equivalent to a string $s$ iff $s' \in bc^{-1}bc(s)$, and $s' \in bc^{-1}bc(s)$ iff $bc(s) = bc(s')$.
Superposition and Block Compression: Asynchronous Superposition
This gives our initial definition of asynchronous superposition:
Definition (Initial): The asynchronous superposition of two strings $s$ and $s'$ is the set of strings obtained by block compressing the results of superposing the strings which are bc-equivalent to $s$ and $s'$:
$s \mathbin{\&_{*}} s' := \{\, bc(s'') \mid s'' \in bc^{-1}bc(s) \mathbin{\&} bc^{-1}bc(s') \,\}$
Here $\&$ is applied to every pair of equal-length strings drawn from the two languages. In the examples the operation is written &*.
Example: [a][c] &* [b][d] = { [a,b][c,d], [a,b][a,d][c,d], [a,b][b,c][c,d] }
Superposition and Block Compression: Asynchronous Superposition: Upper Bound on Asynchronous Superposition
We can improve this definition. It can be shown that, for two strings of lengths $n$ and $n'$, the longest string produced by asynchronous superposition has length $n + n' - 1$.
Thus, for any integer $k > 0$ and string $s$, we introduce a new operation $pad_k(s)$, which generates the set of strings of length $k$ that are bc-equivalent to $s$.
Definition: $pad_k(s) := bc^{-1}bc(s) \cap \Sigma^k$
Superposition and Block Compression: Asynchronous Superposition: Upper Bound on Asynchronous Superposition
An improved definition of asynchronous superposition puts a clear finite bound on the infinite language generated by inverse block compression:
Definition (Improved): For any $s, s' \in \Sigma^+$ with nonzero lengths $n$ and $n'$ respectively,
$s \mathbin{\&_{*}} s' := \{\, bc(s'') \mid s'' \in pad_{n+n'-1}(s) \mathbin{\&} pad_{n+n'-1}(s') \,\}$
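The bounded definition is directly computable. The sketch below is one way to realise it, assuming event-strings are tuples of frozensets; `pad` enumerates the length-k strings by choosing how often each box of the compressed string is repeated, and `async_superpose` is a hypothetical name for the asynchronous operation.

```python
from itertools import combinations, product


def bc(s):
    """Block compression: drop every box that repeats its predecessor."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(k, s):
    """All strings of length k that are bc-equivalent to s."""
    core = bc(s)
    m = len(core)
    if m == 0 or k < m:
        return set()
    padded = set()
    # Splitting k positions into m non-empty runs = choosing m-1 cut points.
    for cuts in combinations(range(1, k), m - 1):
        bounds = (0, *cuts, k)
        padded.add(tuple(
            box
            for i, box in enumerate(core)
            for _ in range(bounds[i + 1] - bounds[i])
        ))
    return padded


def async_superpose(s, t):
    """Pad both strings to length n + n' - 1, superpose every pair, compress."""
    k = len(s) + len(t) - 1
    return {
        bc(tuple(a | b for a, b in zip(x, y)))
        for x, y in product(pad(k, s), pad(k, t))
    }


a, b, c, d = (frozenset({x}) for x in "abcd")
results = async_superpose((a, c), (b, d))
print(len(results))
```

For [a][c] and [b][d] each padded set has two members, so four superpositions are computed and then collapsed by `bc` into the distinct results.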
Event Representation: Allen Relations: Bounding Boxes
We use the empty box [ ] as a string of length 1 (not to be confused with the empty string $\epsilon$, which has length 0) to bound events, allowing us to represent the fact that they are finite.
Asynchronous superposition allows us to generate the 13 strings in [ ][e][ ] &* [ ][e'][ ], each of which corresponds to one of the unique Allen Relations, and also to one of the relation types in ISO-TimeML's TLINKs.
Event Representation: Allen Relations: Allen Relations as Event-Strings
Relation   Event-string              Name
e = e'     [ ][e,e'][ ]              equal
e s e'     [ ][e,e'][e'][ ]          starts
e si e'    [ ][e,e'][e][ ]           starts (inverse)
e f e'     [ ][e'][e,e'][ ]          finishes
e fi e'    [ ][e][e,e'][ ]           finishes (inverse)
e d e'     [ ][e'][e,e'][e'][ ]      during
e di e'    [ ][e][e,e'][e][ ]        during (inverse)
e o e'     [ ][e][e,e'][e'][ ]       overlaps
e oi e'    [ ][e'][e,e'][e][ ]       overlaps (inverse)
e m e'     [ ][e][e'][ ]             meets
e mi e'    [ ][e'][e][ ]             meets (inverse)
e < e'     [ ][e][ ][e'][ ]          before
e > e'     [ ][e'][ ][e][ ]          after
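The correspondence between the asynchronous superposition of two bounded events and the thirteen relation strings can be checked mechanically. The sketch below assumes the tuple-of-frozensets representation and the padding-based definition of asynchronous superposition; `es`, `pad`, `async_superpose` and the `ALLEN` dictionary are illustrative names, and the fluent names "e" and "e'" are plain strings.

```python
from itertools import combinations, product


def es(*boxes):
    """Event-string from comma-separated fluent names, one argument per box."""
    return tuple(frozenset(filter(None, b.split(","))) for b in boxes)


def bc(s):
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(k, s):
    core = bc(s)
    m = len(core)
    padded = set()
    if m == 0 or k < m:
        return padded
    for cuts in combinations(range(1, k), m - 1):
        bounds = (0, *cuts, k)
        padded.add(tuple(box for i, box in enumerate(core)
                         for _ in range(bounds[i + 1] - bounds[i])))
    return padded


def async_superpose(s, t):
    k = len(s) + len(t) - 1
    return {bc(tuple(a | b for a, b in zip(x, y)))
            for x, y in product(pad(k, s), pad(k, t))}


# The thirteen event-strings for the Allen relations between e and e'.
ALLEN = {
    "=":  es("", "e,e'", ""),
    "s":  es("", "e,e'", "e'", ""),
    "si": es("", "e,e'", "e", ""),
    "f":  es("", "e'", "e,e'", ""),
    "fi": es("", "e", "e,e'", ""),
    "d":  es("", "e'", "e,e'", "e'", ""),
    "di": es("", "e", "e,e'", "e", ""),
    "o":  es("", "e", "e,e'", "e'", ""),
    "oi": es("", "e'", "e,e'", "e", ""),
    "m":  es("", "e", "e'", ""),
    "mi": es("", "e'", "e", ""),
    "<":  es("", "e", "", "e'", ""),
    ">":  es("", "e'", "", "e", ""),
}

generated = async_superpose(es("", "e", ""), es("", "e'", ""))
print(generated == set(ALLEN.values()))
```

Inverting `ALLEN` (string to name) gives a simple lookup from an event-string over two bounded fluents to its relation.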
Event Representation: Allen Relations: Three Unconstrained Bounded Events
[ ][e][ ] &* [ ][e'][ ] &* [ ][e''][ ] = { ... }
(Listing of the resulting event-strings over e, e' and e''; the slide shows only an initial portion of the set, ending in an ellipsis.)
Constraints on Event-Strings: Well-formed Event-Strings: Constraints
How can we prevent unnecessary over-generation?
Constraints on Event-Strings: Well-formed Event-Strings: Reduct
The reduct operation will help to identify well-formed event-strings:
Definition: The reduct $\rho_X(s)$, for any $X \subseteq A$ and event-string $s$, is the componentwise intersection of $s$ with $X$:
$\rho_X(\alpha_1 \cdots \alpha_n) := (\alpha_1 \cap X) \cdots (\alpha_n \cap X)$
Example: With $a, b \in A$:
bc(ρ_{a}([a][a,b][b])) = [a][ ]
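A short sketch of the reduct followed by block compression, under the same assumed representation (tuples of frozensets); `reduct` and `bc` are illustrative function names.

```python
def reduct(X, s):
    """Componentwise intersection of the event-string s with the fluent set X."""
    X = frozenset(X)
    return tuple(box & X for box in s)


def bc(s):
    """Block compression: drop every box that repeats its predecessor."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


s = (frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b"}))
projected = bc(reduct({"a"}, s))
print(projected == (frozenset({"a"}), frozenset()))
```

The reduct alone leaves [a][a][ ]; it is the subsequent block compression that merges the two [a] boxes.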
Constraints on Event-Strings: Well-formed Event-Strings
- Fluents are interval-like. Thus for any event-string $s$ and any $e \in A$, bc(ρ_{e}(s)) = [ ][e][ ] (or [ ], if e doesn't appear in s).
- Relations are consistent. For example, if the relations e > e' and e' > e'' hold, then the relation e'' > e cannot also hold.
We may discard any event-string which is not well-formed.
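The first condition can be tested directly with the reduct and block compression. The sketch below covers only the interval-like condition for bounded strings and says nothing about consistency between relations; `is_interval_like` is a hypothetical name.

```python
def reduct(X, s):
    X = frozenset(X)
    return tuple(box & X for box in s)


def bc(s):
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def is_interval_like(s):
    """True if every fluent occurring in s projects to [ ][e][ ]."""
    empty = frozenset()
    vocabulary = frozenset().union(*s)   # fluents that actually occur in s
    return all(
        bc(reduct({e}, s)) == (empty, frozenset({e}), empty)
        for e in vocabulary
    )


E, A, B = frozenset(), frozenset({"a"}), frozenset({"b"})
print(is_interval_like((E, A, E, B, E)))   # a before b, bounded
print(is_interval_like((E, A, B, A, E)))   # a holds, stops, then holds again
```

Fluents that do not occur in the string are skipped, which corresponds to the "or [ ]" case of the condition.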
Constraints on Event-Strings: Well-formed Event-Strings: Constrained Superposition
When a fluent appears in both of two strings, $s$ and $s'$, which are to be asynchronously superposed, the number of well-formed results is usually reduced.
Example: The fluent b appears in both strings, yielding only one well-formed result:
[ ][a][ ][b][ ] &* [ ][b][ ][c][ ] = [ ][a][ ][b][ ][c][ ]
Without the constraint of being well-formed, the above example would generate 270 strings, rather than 1.
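One way to filter the results is sketched below. It rests on an assumption that is not spelled out on the slides: a result is kept only if its reduct to each input's fluents block-compresses back to that input, which is one reading of the requirement that relations stay consistent. All function names are illustrative.

```python
from itertools import combinations, product


def bc(s):
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def reduct(X, s):
    return tuple(box & X for box in s)


def pad(k, s):
    core = bc(s)
    m = len(core)
    padded = set()
    if m == 0 or k < m:
        return padded
    for cuts in combinations(range(1, k), m - 1):
        bounds = (0, *cuts, k)
        padded.add(tuple(box for i, box in enumerate(core)
                         for _ in range(bounds[i + 1] - bounds[i])))
    return padded


def async_superpose(s, t):
    k = len(s) + len(t) - 1
    return {bc(tuple(a | b for a, b in zip(x, y)))
            for x, y in product(pad(k, s), pad(k, t))}


def constrained_superpose(s, t):
    """Keep results that project back onto both inputs (assumed constraint)."""
    vs, vt = frozenset().union(*s), frozenset().union(*t)
    return {r for r in async_superpose(s, t)
            if bc(reduct(vs, r)) == bc(s) and bc(reduct(vt, r)) == bc(t)}


E = frozenset()
A, B, C = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})
a_before_b = (E, A, E, B, E)
b_before_c = (E, B, E, C, E)

unconstrained = async_superpose(a_before_b, b_before_c)
constrained = constrained_superpose(a_before_b, b_before_c)
print(len(unconstrained), len(constrained))
```

The projection test rejects strings such as [ ][a][b][c][ ], where each fluent is still interval-like but the gap between a and b required by the first input has been lost.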
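Constraints on Event-Strings: Multiple Events: Transitivity Table Fragment
Each entry lists the well-formed results of superposing the row string with the column string.
Columns: b before c = [ ][b][ ][c][ ]; b during c = [ ][c][b,c][c][ ]; b meets c = [ ][b][c][ ]
a before b = [ ][a][ ][b][ ]
  with b before c: [ ][a][ ][b][ ][c][ ]
  with b during c: [ ][a][ ][c][b,c][c][ ], [ ][a][c][b,c][c][ ], [ ][a][a,c][c][b,c][c][ ], [ ][a,c][c][b,c][c][ ], [ ][c][a,c][c][b,c][c][ ]
  with b meets c: [ ][a][ ][b][c][ ]
a during b = [ ][b][a,b][b][ ]
  with b before c: [ ][b][a,b][b][ ][c][ ]
  with b during c: [ ][c][b,c][a,b,c][b,c][c][ ]
  with b meets c: [ ][b][a,b][b][c][ ]
a meets b = [ ][a][b][ ]
  with b before c: [ ][a][b][ ][c][ ]
  with b during c: [ ][a][a,c][b,c][c][ ], [ ][c][a,c][b,c][c][ ], [ ][a,c][b,c][c][ ]
  with b meets c: [ ][a][b][c][ ]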
Constraints on Event-Strings: Multiple Events: Arbitrary Events
No matter how many events feature in an event-string, applying the reduct ρ_{e,e'} and block compressing (where e and e' are the events we are interested in) will give the event-string which corresponds to the Allen Relation between e and e'.
For example, given a < b, b < c, and c < d, we can deduce a < d:
1. bc(ρ_{a,d}([ ][a][ ][b][ ] &* [ ][b][ ][c][ ] &* [ ][c][ ][d][ ]))
2. bc(ρ_{a,d}([ ][a][ ][b][ ][c][ ][d][ ]))
3. bc([ ][a][ ][ ][ ][ ][ ][d][ ])
4. [ ][a][ ][d][ ]
5. a < d
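Reading a relation off a combined string can be sketched as a reduct, a block compression, and a lookup against the thirteen relation strings. The helper names `allen_strings` and `relation` are assumptions for illustration, and the combined string is taken as given rather than computed.

```python
def reduct(X, s):
    X = frozenset(X)
    return tuple(box & X for box in s)


def bc(s):
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def allen_strings(x, y):
    """The thirteen bounded event-strings relating fluent x to fluent y."""
    E, X, Y, XY = frozenset(), frozenset({x}), frozenset({y}), frozenset({x, y})
    return {
        "=": (E, XY, E),
        "s": (E, XY, Y, E), "si": (E, XY, X, E),
        "f": (E, Y, XY, E), "fi": (E, X, XY, E),
        "d": (E, Y, XY, Y, E), "di": (E, X, XY, X, E),
        "o": (E, X, XY, Y, E), "oi": (E, Y, XY, X, E),
        "m": (E, X, Y, E), "mi": (E, Y, X, E),
        "<": (E, X, E, Y, E), ">": (E, Y, E, X, E),
    }


def relation(x, y, s):
    """Name of the Allen relation between x and y in s, or None if no match."""
    projected = bc(reduct({x, y}, s))
    for name, string in allen_strings(x, y).items():
        if string == projected:
            return name
    return None


E = frozenset()
chain = (E, frozenset({"a"}), E, frozenset({"b"}), E,
         frozenset({"c"}), E, frozenset({"d"}), E)
print(relation("a", "d", chain))
```

Because the lookup depends only on the two named fluents, the same call works however many other events the string mentions.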
Applied to ISO-TimeML: Translating TLINKs: Example TLINKs
1. <TLINK reltype="IS_INCLUDED" eventinstanceid="ei1" relatedtotime="t1"/>
2. <TLINK reltype="IS_INCLUDED" timeid="t1" relatedtoeventinstance="ei9"/>
3. <TLINK reltype="before" eventinstanceid="ei9" relatedtoeventinstance="ei10"/>
Applied to ISO-TimeML: Translating TLINKs: TLINKs as Allen Relations
1. ei1 d t1
2. t1 d ei9
3. ei9 < ei10
Applied to ISO-TimeML: Translating TLINKs: TLINKs as Event-Strings
1. [ ][t1][ei1,t1][t1][ ]
2. [ ][ei9][t1,ei9][ei9][ ]
3. [ ][ei9][ ][ei10][ ]
Applied to ISO-TimeML: Translating TLINKs: Combining Information
[ ][t1][ei1,t1][t1][ ] &* [ ][ei9][t1,ei9][ei9][ ] &* [ ][ei9][ ][ei10][ ]
= [ ][ei9][t1,ei9][ei1,t1,ei9][t1,ei9][ei9][ ][ei10][ ]
Applied to ISO-TimeML: Translating TLINKs: Extracting New Information
1. bc(ρ_{ei1,ei10}([ ][ei9][t1,ei9][ei1,t1,ei9][t1,ei9][ei9][ ][ei10][ ]))
2. bc([ ][ ][ ][ei1][ ][ ][ ][ei10][ ])
3. [ ][ei1][ ][ei10][ ]
4. ei1 < ei10
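The translation step from TLINK elements to event-strings can be sketched with the standard library's XML parser. Several things here are assumptions: the attribute names are spelled as they appear in the example TLINKs, the wrapping `<links>` element exists only so that the fragment parses, and only the two relation types used in the example are covered, with IS_INCLUDED mapped to "during" as in the translation above.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element around the three example TLINKs.
TLINKS = """<links>
<TLINK reltype="IS_INCLUDED" eventinstanceid="ei1" relatedtotime="t1"/>
<TLINK reltype="IS_INCLUDED" timeid="t1" relatedtoeventinstance="ei9"/>
<TLINK reltype="before" eventinstanceid="ei9" relatedtoeventinstance="ei10"/>
</links>"""


def box(*fluents):
    return frozenset(fluents)


# Bounded event-strings for the relation types used in the example.
TEMPLATES = {
    # x IS_INCLUDED y is read as "x during y": [ ][y][x,y][y][ ]
    "IS_INCLUDED": lambda x, y: (box(), box(y), box(x, y), box(y), box()),
    # x BEFORE y: [ ][x][ ][y][ ]
    "BEFORE": lambda x, y: (box(), box(x), box(), box(y), box()),
}


def tlink_to_string(link):
    """Translate one TLINK element into its event-string."""
    source = link.get("eventinstanceid") or link.get("timeid")
    target = link.get("relatedtoeventinstance") or link.get("relatedtotime")
    # Normalise spellings such as "before" or "is INCLUDED".
    reltype = link.get("reltype").upper().replace(" ", "_")
    return TEMPLATES[reltype](source, target)


strings = [tlink_to_string(link) for link in ET.fromstring(TLINKS).iter("TLINK")]
for s in strings:
    print("".join("[" + (",".join(sorted(b)) or " ") + "]" for b in s))
```

An unsupported relation type raises a `KeyError`, which seems preferable to silently guessing a template.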
Further Work
- Deciding when to use asynchronous superposition (too many generated strings may not be worth it).
- Developing the framework to treat event types and include more information (durations, etc.).
Acknowledgements
This research is supported by Science Foundation Ireland (SFI) through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre (https://www.adaptcentre.ie) at Trinity College Dublin. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Thank you for listening!