Learning Dependency-Based Compositional Semantics Semantic Representations for Textual Inference Workshop Mar. 0, 0 Percy Liang Google/Stanford joint work with Michael Jordan and Dan Klein
Motivating Problem: Question Answering What is the largest city in California? What is the largest city in a state bordering California?
Semantic Interpretation
Question: What is the largest city in a state bordering California?
Logical form: argmax({c : city(c) ∧ ∃s. state(s) ∧ loc(c, s) ∧ border(s, CA)}, population)
Computation: evaluating the logical form yields the answer, Phoenix.
When only the question and the answer are observed, the logical form is the unknown step ("??") in between.
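The "computation" step can be sketched in a few lines of Python. This is only an illustration of how the logical form above denotes an answer: the relation contents and the population figures are made-up placeholders (assumptions, not the benchmark database), and the function name is hypothetical.

```python
# Toy database. All facts and population figures are illustrative
# placeholders chosen for this sketch, not the benchmark's data.
city = {"Los Angeles", "San Francisco", "Phoenix", "Portland"}
state = {"California", "Arizona", "Oregon"}
loc = {("Los Angeles", "California"), ("San Francisco", "California"),
       ("Phoenix", "Arizona"), ("Portland", "Oregon")}
border = {("Arizona", "California"), ("Oregon", "California")}
population = {"Los Angeles": 4_000_000, "San Francisco": 900_000,
              "Phoenix": 1_600_000, "Portland": 650_000}


def largest_city_in_state_bordering(target):
    """argmax({c : city(c) and exists s. state(s), loc(c, s), border(s, target)}, population)"""
    candidates = {
        c for c in city
        if any((c, s) in loc and (s, target) in border for s in state)
    }
    # Pick the candidate with the largest population value.
    return max(candidates, key=population.__getitem__)


print(largest_city_in_state_bordering("California"))
```

With these toy facts the candidate set is the cities located in Arizona or Oregon, and the argmax picks among them by population.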
Supervision for Semantic Interpretation
Detailed supervision (current): an expert annotates each question with a logical form.
- doesn't scale up
- representation-dependent
Example: What is the largest city in California? ⇒ argmax({c : city(c) ∧ loc(c, CA)}, population)
Natural supervision (new): a non-expert supplies only the answer.
- scales up
- representation-independent
Example: What is the largest city in California? ⇒ Los Angeles
Outline: Representation, Learning, Experiments
Considerations
Computational: how to efficiently search an exponential space of logical forms (LFs)?
What is the most populous city in California? ⇒ Los Angeles
Candidate logical forms include:
- λx.state(x)
- λx.city(x)
- λx.city(x) ∧ loc(x, CA)
- λx.state(x) ∧ border(x, CA)
- population(CA)
- argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
Statistical: how to parametrize the mapping from sentence to logical form?
What is the most populous city in California? ⇒ argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
Dependency-Based Compositional Semantics (DCS)
What is the most populous city in California? ⇒ Los Angeles
[Figure: DCS tree with nodes city, loc, CA, population, and argmax]
Advantages of DCS: nice computational, statistical, and linguistic properties
Where do the answers come from?
What is the most populous city in California?
[Figure: the same DCS tree (city, loc, CA, population, argmax), evaluated against a database to produce Los Angeles]
Database
city: San Francisco; Chicago; Boston; ...
loc: (Mount Shasta, California); (San Francisco, California); (Boston, Massachusetts); ...
state: Alabama; Alaska; Arizona; ...
border: (Washington, Oregon); (Washington, Idaho); (Oregon, Washington); ...
Basic DCS Trees
DCS tree: city, with child loc, which has child CA
Constraints:
- $c \in \text{city}$
- $c_1 = \ell_1$
- $\ell \in \text{loc}$
- $\ell_2 = s_1$
- $s \in \text{CA}$
Database:
- city: San Francisco; Chicago; Boston
- loc: (Mount Shasta, California); (San Francisco, California); (Boston, Massachusetts)
- CA: California
A DCS tree encodes a constraint satisfaction problem (CSP).
Computation: dynamic programming ⇒ time = O(# nodes)
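A minimal sketch of how such a tree could be evaluated bottom-up, assuming a hypothetical `Node` class and 0-based tuple indices (the slide's constraints are 1-based). It uses the three tables shown on the slide and visits each node once; it is an illustration of the CSP view, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    predicate: str
    # Each child is (parent_index, child_index, subtree): the edge requires
    # parent_tuple[parent_index] == child_tuple[child_index] (0-based).
    children: list = field(default_factory=list)


# Tables copied from the slide; every relation is a set of tuples.
DB = {
    "city": {("San Francisco",), ("Chicago",), ("Boston",)},
    "loc": {("Mount Shasta", "California"),
            ("San Francisco", "California"),
            ("Boston", "Massachusetts")},
    "CA": {("California",)},
}


def denotation(node, db):
    """Tuples of node.predicate consistent with every constraint in the subtree."""
    result = set(db[node.predicate])
    for parent_idx, child_idx, child in node.children:
        # Solve the child subtree once, then keep only parent tuples that
        # agree with some child tuple on the joined component.
        allowed = {u[child_idx] for u in denotation(child, db)}
        result = {t for t in result if t[parent_idx] in allowed}
    return result


# city --(c_1 = l_1)--> loc --(l_2 = s_1)--> CA
tree = Node("city", [(0, 0, Node("loc", [(1, 0, Node("CA"))]))])
print(denotation(tree, DB))
```

Because each subtree is solved once and summarized by the values it allows, the number of recursive calls is linear in the number of nodes, which is the point of the dynamic-programming remark.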
Properties of DCS Trees
[Figure: DCS tree over the predicates city, loc, state, border, CA, traverse, river, major, and AZ]
Trees give:
- Linguistics: syntactic locality
- Computation: efficient interpretation
Divergence between Syntactic and Semantic Scope
Phrase: most populous city in California
Syntax: [Figure: dependency tree headed by city; populous and in attach to city, most attaches to populous, California attaches to in]
Semantics: argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
Problem: syntactic scope is lower than semantic scope
If DCS trees look like syntax, how do we get correct semantics?
Solution: Mark-Execute
Mark at syntactic scope; execute at semantic scope.
- Superlatives: most populous city in California [Figure: DCS tree with nodes city, loc, CA, population, argmax and edge labels c and x]
- Negation: Alaska borders no states. [Figure: DCS tree with nodes AK, border, state, no and edge labels q and x]
- Quantification (narrow): Some river traverses every city. [Figure: DCS tree with nodes river, traverse, city, some, every and edge labels q and x]
- Quantification (wide): Some river traverses every city. [Figure: the same nodes with a different execute order]
Analogy: Montague's quantifying in, Carpenter's scoping constructor
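To see why the quantification example needs two readings at all, the following sketch evaluates both scope orders of "Some river traverses every city" on an invented toy relation. It does not implement Mark-Execute; it only shows that the two orders can disagree, which is what the choice of execute position has to control. All facts below are assumptions made up for illustration.

```python
# Toy facts, invented for illustration only.
rivers = {"R1", "R2"}
cities = {"A", "B"}
traverse = {("R1", "A"), ("R2", "B")}

# "some" outscopes "every": a single river traverses all cities.
some_over_every = any(
    all((r, c) in traverse for c in cities) for r in rivers
)

# "every" outscopes "some": each city is traversed by some river,
# possibly a different river for each city.
every_over_some = all(
    any((r, c) in traverse for r in rivers) for c in cities
)

print(some_over_every, every_over_some)
```

On this toy relation no single river covers both cities, so the first reading is false while the second is true.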
Graphical Model
- x: question (capital of California?)
- θ: parameters
- z: DCS tree (capital, CA)
- w: database
- y: answer (Sacramento)
Semantic parsing: $p(z \mid x, \theta)$ (probabilistic)
Interpretation: $p(y \mid z, w)$ (deterministic)
Plan
- What's possible? $z \in Z(x)$
- What's probable? $p(z \mid x, \theta)$
- Learning $\theta$ from $(x, y)$ data
[Figure: graphical model with x (capital of California?), parameters θ, z (capital, CA), database w, and y (Sacramento)]
Words to Predicates (Lexical Semantics)
What is the most populous city in CA?
Lexical triggers:
1. String match: CA ⇒ CA
2. Function words (0 words): most ⇒ argmax
3. Nouns/adjectives: city ⇒ city, state, river, population
[Figure: candidate predicates stacked above each word: CA above "CA"; argmax above "most"; city, state, river, population above both "populous" and "city"]
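The three trigger types can be sketched as a lookup from a word to its candidate predicates. The tables below contain only the slide's examples; the `NOUNS_ADJS` set is a stand-in for a part-of-speech tagger and, like the function name, is an assumption of this sketch.

```python
# Only the examples shown on the slide; a real lexicon would be larger.
ENTITIES = {"CA": "CA"}                        # 1. string match
FUNCTION_WORDS = {"most": ["argmax"]}          # 2. function words
NOUN_ADJ_PREDICATES = ["city", "state", "river", "population"]  # 3. nouns/adjectives

# Hypothetical stand-in for a POS tagger: words treated as nouns/adjectives.
NOUNS_ADJS = {"populous", "city"}


def candidate_predicates(word):
    """Return the predicates that `word` may trigger (possibly none)."""
    if word in ENTITIES:
        return [ENTITIES[word]]
    if word in FUNCTION_WORDS:
        return list(FUNCTION_WORDS[word])
    if word in NOUNS_ADJS:
        # Any noun or adjective may trigger any of these predicates;
        # learning has to decide which one is right.
        return list(NOUN_ADJ_PREDICATES)
    return []


for w in "What is the most populous city in CA".split():
    print(w, candidate_predicates(w))
```

Note that "populous" and "city" receive the same candidate list here, so the lexicon alone does not say that "populous" means population.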
Predicates to DCS Trees (Compositional Semantics)
$C_{i,j}$ = set of DCS trees for span $[i, j]$
Span: i [most populous] k [city in California] j
Trees in $C_{i,j}$ are built by combining a tree from $C_{i,k}$ with a tree from $C_{k,j}$.
[Figure: successive builds combine the subtree (population, argmax) from $C_{i,k}$ with the subtree (city, loc, CA) from $C_{k,j}$ in several ways, including variants with an extra loc or border predicate between them]
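The recursion over spans can be sketched as a CKY-style chart. This is a deliberately simplified version under stated assumptions: trees are plain nested tuples `(predicate, children)`, edges carry no relation labels, no extra predicates (such as the loc or border variants in the figure) are inserted, and words without a trigger are simply skipped. The names `attach` and `build_chart` are hypothetical.

```python
from itertools import product


def attach(parent, child):
    """Return a copy of `parent` with `child` added as its last child."""
    pred, kids = parent
    return (pred, kids + (child,))


def build_chart(words, triggers):
    """chart[i, j] = set of trees for the words in span [i, j)."""
    n = len(words)
    chart = {}
    for i in range(n):
        # Single words: one leaf per triggered predicate (maybe none).
        chart[i, i + 1] = {(p, ()) for p in triggers.get(words[i], [])}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            trees = set()
            for k in range(i + 1, j):
                left, right = chart[i, k], chart[k, j]
                # If one side has no trees, its words are skipped.
                if not right:
                    trees |= left
                if not left:
                    trees |= right
                # Otherwise either side may become the head.
                for a, b in product(left, right):
                    trees.add(attach(a, b))
                    trees.add(attach(b, a))
            chart[i, j] = trees
    return chart


words = ["city", "in", "California"]
triggers = {"city": ["city"], "California": ["CA"]}
chart = build_chart(words, triggers)
print(chart[0, 3])
```

Even in this stripped-down form the number of trees grows quickly with sentence length, which is why the deck treats search as the hard part.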
Log-linear Model
z: DCS tree (city, loc, CA)
x: city in California
$\text{features}(x, z) \in \mathbb{R}^d$, with entries such as (word "in", predicate loc) and (predicate city, predicate loc)
$\text{score}(x, z) = \text{features}(x, z) \cdot \theta$
$p(z \mid x, \theta) = \dfrac{e^{\text{score}(x, z)}}{\sum_{z' \in Z(x)} e^{\text{score}(x, z')}}$
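The distribution above is a softmax over the candidate trees in Z(x). A minimal sketch, assuming sparse feature dictionaries with hypothetical feature names such as `in~loc`; the second candidate and the parameter values are invented for illustration.

```python
import math


def score(features, theta):
    """Dot product of a sparse feature vector with the parameters."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())


def parse_distribution(candidate_features, theta):
    """p(z | x, theta) for every candidate z, given each candidate's features."""
    scores = [score(f, theta) for f in candidate_features]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates for x = "city in California".
candidates = [
    {"in~loc": 1.0, "city~loc": 1.0},        # z = (city, loc, CA)
    {"in~border": 1.0, "city~border": 1.0},  # a competing tree
]
theta = {"in~loc": 1.0}  # illustrative parameter values
probs = parse_distribution(candidates, theta)
print(probs)
```

With all parameters at zero the distribution is uniform over the candidates, which is the starting point of the learning procedure.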
Learning
Objective function: $\max_\theta \sum_z p(y \mid z, w)\, p(z \mid x, \theta)$, where $p(y \mid z, w)$ is interpretation and $p(z \mid x, \theta)$ is semantic parsing
EM-like algorithm:
- Start with parameters $\theta = (0, 0, \ldots, 0)$
- Enumerate/score DCS trees to produce a k-best list
- Run numerical optimization (L-BFGS) on the k-best list to update $\theta$
- Repeat: as $\theta$ changes, the k-best list changes too (in the slide's animation, new trees such as tree8 and tree9 enter the list on later iterations)
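The alternation can be sketched as follows. Two substitutions are made to keep the example self-contained, and both are assumptions of this sketch rather than the method on the slide: plain gradient ascent stands in for L-BFGS, and sorting a fixed candidate list stands in for beam search over DCS trees. Because $p(y \mid z, w)$ is deterministic, the gradient of the log-objective is the expected features under the candidates that give the right answer minus the expected features under all candidates.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(feats, theta):
    return sum(theta.get(f, 0.0) * v for f, v in feats.items())


def train(examples, k=2, iterations=5, step=0.5):
    """examples: list of (candidates, answer); a candidate is (features, denotation)."""
    theta = {}
    for _ in range(iterations):
        grad = {}
        for candidates, answer in examples:
            # Stand-in for search: keep the k highest-scoring candidates.
            kbest = sorted(candidates, key=lambda c: dot(c[0], theta), reverse=True)[:k]
            correct = [c for c in kbest if c[1] == answer]
            if not correct:
                continue  # nothing on the k-best list gives the answer: skip
            p_all = softmax([dot(f, theta) for f, _ in kbest])
            p_ok = softmax([dot(f, theta) for f, _ in correct])
            # Expected features given the answer ...
            for (f, _), p in zip(correct, p_ok):
                for name, v in f.items():
                    grad[name] = grad.get(name, 0.0) + p * v
            # ... minus expected features under the current model.
            for (f, _), p in zip(kbest, p_all):
                for name, v in f.items():
                    grad[name] = grad.get(name, 0.0) - p * v
        # Gradient ascent step (swapped in for L-BFGS).
        for name, g in grad.items():
            theta[name] = theta.get(name, 0.0) + step * g
    return theta


# One toy example: two candidate trees with invented features and denotations.
examples = [([({"in~loc": 1.0}, "Sacramento"), ({"in~border": 1.0}, "Nevada")],
             "Sacramento")]
theta = train(examples)
print(theta)
```

The `continue` branch is where an example drops out of an iteration when search has not yet found any tree that produces its answer.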
US Geography Benchmark
Standard semantic parsing benchmark since the 1990s
600 training examples, 280 test examples
Supervision in past work: question + program
- What is the highest point in Florida? ⇒ answer(A,highest(A,(place(A),loc(A,B),const(B,stateid(florida)))))
- How many states have a city called Rochester? ⇒ answer(A,count(B,(state(B),loc(C,B),const(C,cityid(rochester,_))),A))
- What is the longest river that runs through a state that borders Tennessee? ⇒ answer(A,longest(A,(river(A),traverse(A,B),state(B),next_to(B,C),const(C,stateid(tennessee)))))
- Of the states washed by the Mississippi river which has the lowest point? ⇒ answer(A,lowest(B,(state(A),traverse(C,A),const(C,riverid(mississippi)),loc(B,A),place(B))))
Supervision in this work: question + answer
- What is the highest point in Florida? ⇒ Walton County
- How many states have a city called Rochester? ⇒
- What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri
- Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana
Input to Learning Algorithm
Training data (600 examples):
- What is the highest point in Florida? ⇒ Walton County
- How many states have a city called Rochester? ⇒
- What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri
- Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana
Lexicon (75 words):
- city ⇒ city
- state ⇒ state
- mountain ⇒ mountain, peak
Database:
- city: San Francisco; Chicago; Boston; ...
- loc: (Mount Shasta, California); (San Francisco, California); (Boston, Massachusetts); ...
- state: Alabama; Alaska; Arizona; ...
- border: (Washington, Oregon); (Washington, Idaho); (Oregon, Washington); ...
Experiment
On Geo, 250 training examples, 250 test examples
System   Description                    Test accuracy
cgcr10   FunQL [Clarke et al., 2010]    73.2%
dcs      our system                     78.9%
dcs+     our system                     87.2%
[The slide's table also has Lexicon (gen./spec.) and Logical forms columns; their per-system entries are not shown here. The accuracies come from the accompanying bar chart.]
Experiment
On Geo, 600 training examples, 280 test examples
System   Description                                       Test accuracy
zc05     CCG [Zettlemoyer & Collins, 2005]                 79.3%
zc07     relaxed CCG [Zettlemoyer & Collins, 2007]         86.1%
kzgs10   CCG w/unification [Kwiatkowski et al., 2010]      88.9%
dcs      our system                                        88.6%
dcs+     our system                                        91.1%
[The slide's table also has Lexicon and Logical forms columns; their per-system entries are not shown here. The accuracies come from the accompanying bar chart.]
Some Intuition on Learning
Learning alternates between two steps:
(1) search DCS trees (hard!): parameters ⇒ k-best lists
(2) numerical optimization: k-best lists ⇒ parameters
If no DCS tree on the k-best list is correct, skip the example in (2)
[Figure: % of examples trained on, plotted against iteration]
Effect: automatic curriculum learning; learning improves search
Current Limitations
- Only using forward information: we execute the program to get the answer, but want to invert
- Non-identifiability of program: if all cities in the database are in the US, then we can't distinguish {c : city(c)} and {c : city(c) ∧ loc(c, US)}
- Unknown facts: How far is Los Angeles from Boston? The database has no distance information
- Unknown concepts: What states are landlocked? Need to induce a database view for landlocked(x) = ¬border(x, ocean)
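The non-identifiability point can be checked directly on a toy database. The `loc` facts with "US" below are an assumption introduced for this sketch (the slide's database excerpt lists only state-level locations); the city names are the ones from the Database slide.

```python
city = {"San Francisco", "Chicago", "Boston"}
# Assumption for illustration: the database places every city in the US.
loc = {("San Francisco", "US"), ("Chicago", "US"), ("Boston", "US")}

# {c : city(c)}
all_cities = {c for c in city}
# {c : city(c) and loc(c, US)}
us_cities = {c for c in city if (c, "US") in loc}

# Both programs denote the same set, so any question whose answer is
# computed from this set cannot tell them apart.
print(all_cities == us_cities)
```

Since learning only ever sees answers, two programs with identical denotations on every training question receive identical supervision.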
Conclusion
Goal: learn to answer questions from question/answer pairs
Empirical result: DCS (no logical forms) ≈ existing systems (with logical forms)
Conceptual contribution: DCS trees
- Trees: connect dependency syntax with efficient evaluation
- Mark-Execute: unifying framework for handling scope
Thank you