The Infinite PCFG using Hierarchical Dirichlet Processes
1 The Infinite PCFG using Hierarchical Dirichlet Processes. EMNLP 2007, Prague, Czech Republic, June 29, 2007. Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein. [Title-slide background: the parse tree S → NP VP; NP → PRP; VP → VBD NP; NP → DT NN for "she heard the noise", tiled repeatedly.]
2–4 How do we choose the grammar complexity? Grammar induction: how many grammar symbols (NP, VP, etc.)? Grammar refinement: how many grammar subsymbols (NP-loc, NP-subj, etc.)? [Parse tree of "She heard the noise" with every symbol marked with an unknown refinement: S-?, NP-?, VP-?, PRP-?, VBD-?, DT-?, NN-?.] Our solution: the HDP-PCFG allows the number of (sub)symbols to adapt to the data.
5–7 A motivating example. True grammar: S → A A | B B | C C | D D; A → a1 | a2 | a3; B → b1 | b2 | b3; C → c1 | c2 | c3; D → d1 | d2 | d3. Generate examples: [trees S → A A over a2 a3; S → B B over b1 b3; S → A A over a1 a1; S → C C over c1 c1.] Then collapse A, B, C, D into a single symbol X: [the same trees with every nonterminal below S relabeled X.]
8–9 A motivating example: results. [Bar charts of the posterior over subsymbols (y-axis up to 0.25): one for the standard PCFG, one for the HDP-PCFG.]
10–13 The meeting of two fields. Grammar learning: lexicalized grammars [Charniak, 1996; Collins, 1999]; manual refinement [Johnson, 1998; Klein & Manning, 2003]; automatic refinement [Matsuzaki et al., 2005; Petrov et al., 2006]. Bayesian nonparametrics: basic theory [Ferguson, 1973; Antoniak, 1974; Sethuraman, 1994; Escobar & West, 1995; Neal, 2000]; more complex models [Teh et al., 2006; Beal et al., 2002; Goldwater et al., 2006; Sohn & Xing, 2007]. Their meeting point: nonparametric grammars [Johnson et al., 2006; Finkel et al., 2007; Liang et al., 2007]. Our contribution: a definition of the HDP-PCFG, a simple and efficient variational inference algorithm, and an empirical comparison with finite models on a full-scale parsing task.
14–15 Bayesian paradigm. Generative model: hyperparameters α → grammar θ (the HDP-PCFG) → parse tree z → sentence x. Bayesian posterior inference: observe x; what are θ and z?
16–17 HDP probabilistic context-free grammars. [Graphical model: top-level symbol weights β; per-symbol binary production parameters φ^B_z and emission parameters φ^E_z; a tree fragment z_1 → (z_2, z_3) emitting x_2, x_3.] The HDP-PCFG:
- β ~ GEM(α) [generate distribution over symbols]
- For each symbol z ∈ {1, 2, ...}: φ^E_z ~ Dirichlet(α^E) [generate emission probs]; φ^B_z ~ DP(α^B, ββ^T) [binary production probs]
- For each nonterminal node i: (z_{L(i)}, z_{R(i)}) ~ Multinomial(φ^B_{z_i}) [child symbols]
- For each preterminal node i: x_i ~ Multinomial(φ^E_{z_i}) [terminal symbol]
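To make the generative process concrete, here is a minimal Python sketch (illustrative, not the authors' code): the DP over child-symbol pairs is approximated by a finite Dirichlet over the flattened K × K grid, and tree shape is controlled by a toy depth cutoff rather than a learned rule-type distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 10, 6                        # truncation level, vocabulary size
alpha, alpha_E, alpha_B = 1.0, 1.0, 1.0

# beta ~ GEM(alpha), truncated at K: stick-breaking, last stick takes the rest.
sticks = rng.beta(1.0, alpha, size=K)
sticks[-1] = 1.0
beta = sticks * np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))

# Per-symbol parameters. DP(alpha_B, beta beta^T) over child pairs is
# approximated here by a finite Dirichlet on the flattened K*K grid.
phi_E = rng.dirichlet(np.full(V, alpha_E), size=K)         # emission probs
base = np.outer(beta, beta).ravel()                        # mean over child pairs
phi_B = np.stack([rng.dirichlet(alpha_B * base + 1e-9) for _ in range(K)])

def generate(z, depth=0, max_depth=4):
    """Expand symbol z; the depth cutoff is a toy stand-in for tree structure."""
    if depth == max_depth:
        return int(rng.choice(V, p=phi_E[z]))              # x_i ~ Mult(phi^E_{z_i})
    pair = int(rng.choice(K * K, p=phi_B[z]))              # (z_L, z_R) ~ Mult(phi^B_{z_i})
    return (z, generate(pair // K, depth + 1), generate(pair % K, depth + 1))

tree = generate(z=0)
```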
18–24 HDP-PCFG: prior over symbols, β ~ GEM(α). [Bar plots of stick-breaking draws from GEM(α) for α = 0.2, 0.5, 1, 5, 10: small α puts most of the mass on a few symbols; large α spreads it over many.]
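A quick stick-breaking sketch of this prior (the 1% threshold below is an arbitrary illustration) shows how α controls how many symbols carry appreciable mass:

```python
import numpy as np

def sample_gem(alpha, K=50, seed=0):
    """Truncated draw of beta ~ GEM(alpha) via stick-breaking."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0                                  # last stick absorbs the remainder
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

for a in [0.2, 0.5, 1.0, 5.0, 10.0]:
    beta = sample_gem(a)
    print(f"alpha={a}: {np.sum(beta > 0.01)} symbols hold >1% of the mass")
```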
25 (Recap of the HDP-PCFG model definition from slides 16–17.)
26–31 HDP-PCFG: binary productions. Distribution over symbols (top level): β ~ GEM(α). Mean distribution over pairs of child symbols: the outer product ββ^T (a left-child × right-child grid). Per-state distribution over child symbols: φ^B_z ~ DP(α^B, ββ^T). [Heatmaps over the left-child × right-child grid: the shared mean ββ^T and several per-state draws φ^B_z.]
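As a small numeric check of the "mean" claim (using the same finite-Dirichlet surrogate for the DP as in the sketch above), per-state draws of φ^B_z average out to the shared grid ββ^T:

```python
import numpy as np

rng = np.random.default_rng(0)
K, alpha_B = 5, 1.0
beta = rng.dirichlet(np.ones(K))            # stand-in for a GEM draw
mean = np.outer(beta, beta)                 # shared mean over (left, right) pairs

# phi_B_z ~ DP(alpha_B, beta beta^T), via the finite Dirichlet surrogate.
draws = rng.dirichlet(alpha_B * mean.ravel() + 1e-9, size=2000)
print(np.abs(draws.mean(axis=0).reshape(K, K) - mean).max())  # close to 0
```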
32–33 (Recap of the HDP-PCFG model definition.)
34–36 Variational Bayesian inference. Goal: compute the posterior p(θ, z | x). Variational inference: approximate the posterior with the best member of a set of tractable distributions Q: q* = argmin_{q ∈ Q} KL(q ‖ p). [Diagram: the family Q, with q* the member closest to p.] Mean-field approximation: Q = {q : q = q(z) q(β) q(φ)}, with φ = (φ^B_z, φ^E_z).
37–41 Coordinate-wise descent algorithm. Goal: argmin_{q ∈ Q} KL(q ‖ p), where q = q(z) q(φ) q(β); z = parse tree, φ = rule probabilities, β = inventory of symbols. Iterate (see the sketch after this list):
- Optimize q(z) (E-step): run inside-outside with rule weights W(r); gather expected rule counts C(r).
- Optimize q(φ) (M-step): update the Dirichlet posteriors (expected rule counts + pseudocounts); compute the rule weights W(r).
- Optimize q(β) (no equivalent in EM): truncate at level K (the maximum number of symbols); use projected gradient to adapt the number of symbols.
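Schematically, the loop looks like this (a sketch, not the authors' implementation: `inside_outside` is a placeholder for a real chart parser returning expected rule counts, the toy grammar has a single parent symbol so all rules share one Dirichlet, and the q(β) update is omitted):

```python
import numpy as np
from scipy.special import digamma

def mean_field_weights(counts, prior):
    """M-step: turn expected rule counts C(r) plus Dirichlet pseudocounts
    into the rule weights W(r) used by the next inside-outside pass."""
    post = prior + counts
    return np.exp(digamma(post) - digamma(post.sum()))

def train(sentences, inside_outside, prior, n_iters=50):
    """Coordinate-wise descent, alternating q(z) (E-step) and q(phi) (M-step)."""
    W = prior / prior.sum()                     # initial rule weights
    for _ in range(n_iters):
        counts = np.zeros_like(prior)           # expected rule counts C(r)
        for x in sentences:
            counts += inside_outside(x, W)      # E-step for one sentence
        W = mean_field_weights(counts, prior)   # M-step: new weights W(r)
    return W
```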
42–48 Rule weights. The weight W(r) of a rule r is similar to a probability p(r), but W(r) is unnormalized: an extra degree of freedom.
- EM (maximum likelihood): W(r) = C(r) / Σ_{r'} C(r').
- EM (maximum a posteriori): W(r) = (prior(r) − 1 + C(r)) / Σ_{r'} (prior(r') − 1 + C(r')).
- Mean-field (with DP prior): W(r) = exp Ψ(prior(r) + C(r)) / exp Ψ(Σ_{r'} (prior(r') + C(r'))).
[Plot: exp(Ψ(x)) against x; for large x, exp(Ψ(x)) ≈ x − 0.5, so the mean-field weight behaves like (C(r) − 0.5) / Σ_{r'} C(r').] Subtracting 0.5 means small counts hurt more than large counts: the rich get richer, which controls the number of symbols.
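Side by side, the three updates differ only in how counts are transformed (a sketch; the pseudocounts below are arbitrary stand-ins for the DP base measure α^B ββ^T):

```python
import numpy as np
from scipy.special import digamma

C = np.array([20.0, 2.0, 0.2])        # expected rule counts C(r), one parent symbol
prior = np.full(3, 2.0)               # Dirichlet pseudocounts prior(r)

W_ml = C / C.sum()                                               # EM, max likelihood
W_map = (prior - 1 + C) / (prior - 1 + C).sum()                  # EM, MAP
W_mf = np.exp(digamma(prior + C) - digamma((prior + C).sum()))   # mean-field

# exp(digamma(x)) is roughly x - 0.5 for moderate x, so W_mf behaves like
# (C(r) - 0.5) / sum: it sums to less than 1 and shrinks rare rules hardest.
print(W_ml, W_map, W_mf, W_mf.sum())
```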
49–53 Parsing the WSJ Penn Treebank. Setup: grammar refinement (split symbols into subsymbols). [Plot, training on one section: F1 against the maximum number of subsymbols (truncation K), with curves for the standard PCFG and the HDP-PCFG (K = 20).] Training on 20 sections (K = 16): [table comparing standard-PCFG and HDP-PCFG F1; the scores did not survive transcription.] Results: the HDP-PCFG overfits less than the standard PCFG; given large amounts of data, HDP-PCFG ≈ standard PCFG.
54 Conclusions. What? The HDP-PCFG model allows the number of grammar symbols to adapt to the data. How? A mean-field algorithm (variational inference): simple, efficient, similar to EM. When? With small amounts of data: it overfits less than a standard PCFG. Why? A declarative framework: grammar complexity is specified declaratively in the model rather than in the learning procedure. [Closing-slide background: the same tiled parse tree as on the title slide.]