Latent Variable Models in NLP
Aria Haghighi, with Slav Petrov, John DeNero, and Dan Klein
UC Berkeley, CS Division
Latent Variable Models
Observed vs. hidden structure, e.g. word alignment: the bitext is observed, the alignment is hidden.
Basic question for LV models:
Q: How do I make latent variables behave the way I want them to?
A: Very careful model design!
Recent Applications
- Parsing: PCFG annotation
- Machine Translation: learning phrase tables
- Information Extraction / Discourse: unsupervised coreference resolution
Disclaimer: There are many other excellent examples of LV models in NLP; these examples are from the Berkeley group.
PCFG Annotation
Annotation refines base treebank symbols to improve the statistical fit of the grammar (e.g., subject NP vs. object NP):
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99]
- Automatic clustering? [Matsuzaki 05, Petrov 06]
Learning Latent Annotations [Matsuzaki et al., 05]
EM algorithm:
- Brackets are known
- Base categories are known
- Only subcategories are hidden
- Just like forward-backward for HMMs
[Figure: parse tree with latent symbols X1-X7 over the sentence "He was right", with forward (inside) and backward (outside) passes]
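To make the HMM analogy concrete, here is a minimal sketch of the E-step under the assumptions above: since brackets and base categories are fixed, inference is exact belief propagation up and down each binarized tree. The names (`Tree`, `rule_probs`, `emit_probs`) are hypothetical illustrations, not the Matsuzaki et al. implementation.

```python
import numpy as np

class Tree:
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

def inside(node, rule_probs, emit_probs):
    """Upward pass: P(yield below node | node's latent subcategory)."""
    if node.word is not None:                       # preterminal
        return emit_probs[node.label][node.word]    # shape (K,)
    left, right = node.children                     # assume binarized trees
    b = inside(left, rule_probs, emit_probs)
    c = inside(right, rule_probs, emit_probs)
    R = rule_probs[(node.label, left.label, right.label)]  # shape (K, K, K)
    node._in_left, node._in_right, node._R = b, c, R
    return np.einsum('abc,b,c->a', R, b, c)

def outside_and_counts(node, out, counts):
    """Downward pass: accumulate expected rule counts for the M-step."""
    if node.word is not None:
        return
    b, c, R = node._in_left, node._in_right, node._R
    # Posterior over (parent, left, right) subcategories at this node;
    # the normalizer equals the tree likelihood at every internal node.
    joint = np.einsum('a,abc,b,c->abc', out, R, b, c)
    key = (node.label, node.children[0].label, node.children[1].label)
    counts[key] = counts.get(key, 0) + joint / joint.sum()
    outside_and_counts(node.children[0],
                       np.einsum('a,abc,c->b', out, R, c), counts)
    outside_and_counts(node.children[1],
                       np.einsum('a,abc,b->c', out, R, b), counts)

# Usage sketch: for each treebank tree, run `inside` from the root, then
# `outside_and_counts(root, root_prior, counts)` with e.g. a uniform
# root_prior over the root's subcategories; re-normalizing `counts`
# gives the next iteration's rule probabilities.
```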
Learning Latent Annotations
Each base category splits into latent subcategories, e.g. DT becomes DT-1, DT-2, DT-3, DT-4.
Hierarchical Split/Merge Training [Petrov et al., 06]
Repeatedly split each symbol (e.g., DT) in two and retrain; merge back unhelpful splits.

Model (k=16)     F1
Flat training    87.3
Split/merge      89.5

Lesson: Don't give EM too much of a leash.
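A sketch of the split/merge control flow described above, with hypothetical helper functions (`split_in_two`, `run_em`, `merge_gain`, `merge_back`) standing in for the real Berkeley-parser machinery; only the loop structure mirrors the slide.

```python
def split_merge_train(grammar, treebank, rounds=6, merge_fraction=0.5):
    for _ in range(rounds):
        # Split every subcategory in two, adding small random noise so
        # that EM can break the symmetry between the two copies.
        grammar, new_splits = split_in_two(grammar, noise=0.01)
        grammar = run_em(grammar, treebank)
        # Estimate how much likelihood each split buys; merging back the
        # least useful half removes the "unhelpful splits" on the slide
        # and keeps EM on a short leash.
        ranked = sorted(new_splits,
                        key=lambda s: merge_gain(grammar, treebank, s))
        grammar = merge_back(grammar,
                             ranked[:int(len(ranked) * merge_fraction)])
        grammar = run_em(grammar, treebank)
    return grammar
```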
Linguistic Candy [Petrov et al., 06]
Proper nouns (NNP):
  NNP-14: Oct., Nov., Sept.
  NNP-12: John, Robert, James
  NNP-2:  J., E., L.
  NNP-1:  Bush, Noriega, Peters
  NNP-15: New, San, Wall
  NNP-3:  York, Francisco, Street
Personal pronouns (PRP):
  PRP-0: It, He, I
  PRP-1: it, he, they
  PRP-2: it, them, him
Learning Latent Annotations: Results [Petrov et al., 07]

                                          <=40 words F1   all F1
ENG  Charniak & Johnson 05 (generative)   90.1            89.6
     Petrov et al. 07                     90.6            90.1
GER  Dubey 05                             76.3            -
     Petrov et al. 07                     80.8            80.1
CHN  Chiang et al. 02                     80.0            76.6
     Petrov et al. 07                     86.3            83.4

Download the parser: http://nlp.cs.berkeley.edu
Outline
- Parsing: PCFG annotation
- Machine Translation: learning phrase tables
- Information Extraction / Discourse: unsupervised coreference resolution
Phrase-Based Model
Pipeline: sentence-aligned corpus -> directional word alignments -> intersected and grown word alignments -> phrase table (translation model)

Phrase table example:
  cat / chat            0.9
  the cat / le chat     0.8
  dog / chien           0.8
  house / maison        0.6
  my house / ma maison  0.9
  language / langue     0.9
Overview: Phrase Extraction
Example: "appelle un chat un chat" / "call a spade a spade"
Minimal phrasal alignment: appelle = call; un chat = a spade (twice).
What's hidden: sentence segmentation and phrasal alignment.

All extracted phrases:
  appelle          = call
  appelle un       = call a
  appelle un chat  = call a spade
  un               = a                (x2)
  un chat          = a spade          (x2)
  un chat un       = a spade a
  chat             = spade            (x2)
  un chat un chat  = a spade a spade
  chat un chat     = spade a spade

Extract all phrases; there is no competition over which phrases are more useful.
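A self-contained sketch of the standard extraction heuristic the slide refers to: collect every phrase pair consistent with a word alignment, i.e. every box with no alignment link leaving it. This is the generic recipe, not the Berkeley group's exact code.

```python
def extract_phrases(n_f, n_e, alignment, max_len=4):
    """alignment: set of (f_index, e_index) links; returns consistent spans."""
    pairs = []
    for f1 in range(n_f):
        for f2 in range(f1, min(f1 + max_len, n_f)):
            # English positions linked to the foreign span [f1, f2].
            es = [e for (f, e) in alignment if f1 <= f <= f2]
            if not es:
                continue
            e1, e2 = min(es), max(es)
            # Consistency: no link from inside [e1, e2] back outside [f1, f2].
            if all(f1 <= f <= f2 for (f, e) in alignment if e1 <= e <= e2):
                if e2 - e1 < max_len:
                    pairs.append(((f1, f2), (e1, e2)))
    return pairs

# The slide's example, with a monotone word alignment:
# "appelle un chat un chat" / "call a spade a spade"
links = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)}
print(extract_phrases(5, 5, links))   # every contiguous box is extracted
```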
Learning Phrases
Phrase-level generative model: learn the phrase table (translation model) directly from the sentence-aligned corpus.
- DeNero et al. (2006) argue this underperforms the surface statistic: the latent variables are abused, and EM finds a bad solution.
- The DeNero et al. (2008) model outperforms surface statistics.
Dirichlet Process (DP) Prior
- A distribution over distributions, with a concentration parameter and a base measure.
- Draws are almost surely discrete (even if the base measure isn't):
    G = sum_{i=1}^inf  pi_i * delta_{x_i}(.)
- Each point x_i represents the parameters of a latent cluster.
- Nonparametric: the number of parameters grows with the data (about log N clusters for N points).
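A runnable sketch of the stick-breaking construction behind G = sum_i pi_i delta_{x_i}: weights come from breaking a unit stick with Beta(1, alpha) proportions, and atoms are i.i.d. draws from the base measure. The truncation level is an illustration choice, not part of the DP itself.

```python
import numpy as np

def draw_dp(alpha, base_sample, truncation=1000,
            rng=np.random.default_rng(0)):
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining       # pi_i; sums to ~1 for large truncation
    atoms = [base_sample(rng) for _ in range(truncation)]
    return weights, atoms

# Even with a continuous base measure (here a Gaussian), the draw G is
# discrete: a handful of atoms carry most of the mass when alpha is small.
weights, atoms = draw_dp(alpha=1.0, base_sample=lambda rng: rng.normal())
print(sorted(weights, reverse=True)[:5])
```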
Chinese Restaurant Process
[Figure sequence: customers entering a restaurant and choosing tables, illustrating how a DP clusters data points.]
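A self-contained simulation of the seating process the figures depict: customer n joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to alpha. The number of tables grows roughly like alpha * log N, matching the nonparametric growth on the DP slide.

```python
import random

def crp(n_customers, alpha, seed=0):
    random.seed(seed)
    tables = []                                  # occupancy counts
    for n in range(n_customers):
        r = random.uniform(0, n + alpha)
        if r < alpha or not tables:
            tables.append(1)                     # open a new table
        else:
            r -= alpha
            for k, count in enumerate(tables):   # pick an occupied table
                if r < count:
                    tables[k] += 1
                    break
                r -= count
    return tables

print(len(crp(10000, alpha=1.0)))                # on the order of log(10000)
```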
Bayesian Phrase Learning [DeNero et al., 2008]
Given: sentence pair (e, f)
1. Generate the number of phrase pairs l:
     P(l) = p_$ (1 - p_$)^{l-1}
2. Generate l phrase pairs {(e_i, f_i)}:
     (e_i, f_i) ~ G,   G ~ DP(alpha, P_0)
     P_0(f, e) = P_f(f) P_e(e)
     P_e(e) = [ p_s (1 - p_s)^{|e|-1} ] (1 / n_e)^{|e|}
3. Align and reorder the foreign phrases {f_i}
Output: aligned phrase pairs {(e_i, f_i)}
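A hedged sketch of this generative story: a geometric number of phrase pairs, each drawn from a DP whose base measure P_0 factors into two geometric-length, uniform-word phrase models. The urn-style `draws` list stands in for integrating out G; vocabularies and parameter values are toy choices, not from the paper.

```python
import random
random.seed(0)

P_STOP_PAIR, P_STOP_WORD = 0.2, 0.5        # p_$ and p_s from the slide
ALPHA = 1.0
EN, FR = ["call", "a", "spade"], ["appelle", "un", "chat"]

def base_phrase(vocab):                    # geometric length, uniform words:
    phrase = [random.choice(vocab)]        # p_s (1-p_s)^(len-1) (1/n)^len
    while random.random() > P_STOP_WORD:
        phrase.append(random.choice(vocab))
    return tuple(phrase)

def base_pair():                           # P0(f, e) = Pf(f) * Pe(e)
    return (base_phrase(FR), base_phrase(EN))

def sample_pair(draws):                    # Blackwell-MacQueen urn for the DP
    if random.uniform(0, len(draws) + ALPHA) < ALPHA:
        pair = base_pair()                 # fresh draw from P0
    else:
        pair = random.choice(draws)        # reuse, proportional to count
    draws.append(pair)
    return pair

def generate_sentence_pair(draws):
    l = 1
    while random.random() > P_STOP_PAIR:   # P(l) = p_$ (1 - p_$)^(l-1)
        l += 1
    pairs = [sample_pair(draws) for _ in range(l)]
    random.shuffle(pairs)                  # stand-in for align-and-reorder
    return pairs

draws = []
print(generate_sentence_pair(draws))
```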
Bayesian Phrase Learning: [DeNero et al., 2006] vs. [DeNero et al., 2008]
[Figure: comparison of the two models.]
Gibbs Sampling Inference [DeNero et al., 2008]
[Figure sequence: local Gibbs operators resample the segmentation and alignment of one sentence pair at a time.]
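The slides leave the sampler details to the paper, so the following is only a generic illustration of the collapsed-Gibbs pattern such models use: score each local alternative by the DP's predictive probability under the counts of all *other* phrase pairs, then sample in proportion. The names (`crp_predictive`, `resample_boundary`) are hypothetical, and the single boundary-toggle operator is a simplification of the actual DeNero et al. (2008) operators.

```python
import random

def crp_predictive(pair, counts, total, alpha, p0):
    """P(pair | all other pairs) under a DP(alpha, p0) prior."""
    return (counts.get(pair, 0) + alpha * p0(pair)) / (total + alpha)

def resample_boundary(merged, split_a, split_b, counts, total, alpha, p0):
    """Choose between one merged phrase pair and its two-way split;
    returns True if the merged analysis is sampled. (Ignores the
    length penalty on l for simplicity.)"""
    w_merged = crp_predictive(merged, counts, total, alpha, p0)
    w_split = (crp_predictive(split_a, counts, total, alpha, p0) *
               crp_predictive(split_b, counts, total + 1, alpha, p0))
    return random.random() * (w_merged + w_split) < w_merged
```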
Phrase Learning Results [DeNero et al., 2008]
EN-ES Europarl, evaluated with BLEU:

                     10k    100k
Heuristic baseline   14.6   21.8
Learned              15.5   23.2
Outline
- Parsing: PCFG annotation
- Machine Translation: learning phrase tables
- Information Extraction / Discourse: unsupervised coreference resolution
Coreference Resolution
Mentions (observed): Weir group .. whose .. headquarters .. U.S. .. corporation .. power plant .. which .. Jiangsu
Entities (hidden): each mention is assigned to an entity:
  Weir group   -> Weir Group       whose       -> Weir Group
  headquarters -> Weir HQ          U.S.        -> United States
  corporation  -> Weir Group       power plant -> Weir Plant
  which        -> Weir Plant       Jiangsu     -> Jiangsu
Finite Mixture Model [Haghighi & Klein, 2007]
- Entity distribution beta over K entities: P(Weir Group) = 0.2, P(Weir HQ) = 0.5, ...
  (K is how many entities there are)
- Entity assignments: Z1 = Weir Group, Z2 = Weir Group, Z3 = Weir HQ
- Mention distribution phi_k per entity: P(W | Weir Group): "Weir Group" = 0.4, "whose" = 0.2, ...
- Observed mentions: W1 = "Weir Group", W2 = "whose", W3 = "headquart."
Bayesian version: put priors on beta and on the K mention distributions phi.
But how do you choose K?
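A self-contained sketch of collapsed Gibbs sampling in this finite mixture: resample each mention's entity assignment Z_i given all other assignments, combining the collapsed Dirichlet prior on beta with the entity's smoothed mention model. The hyperparameters and toy data here are illustrative choices, not the paper's.

```python
import random
from collections import Counter

GAMMA, DELTA, K = 1.0, 0.1, 3
mentions = ["Weir Group", "whose", "headquarters", "Weir Group", "it"]
z = [random.randrange(K) for _ in mentions]          # entity assignments
entity_sizes = Counter(z)
word_counts = {k: Counter() for k in range(K)}
for zi, w in zip(z, mentions):
    word_counts[zi][w] += 1
VOCAB = len(set(mentions))

def gibbs_sweep():
    for i, w in enumerate(mentions):
        k_old = z[i]                       # remove mention i from its entity
        entity_sizes[k_old] -= 1
        word_counts[k_old][w] -= 1
        weights = []
        for k in range(K):
            prior = entity_sizes[k] + GAMMA              # collapsed Dir(beta)
            like = ((word_counts[k][w] + DELTA) /        # smoothed phi_k
                    (sum(word_counts[k].values()) + DELTA * VOCAB))
            weights.append(prior * like)
        k_new = random.choices(range(K), weights=weights)[0]
        z[i] = k_new                       # add mention i back
        entity_sizes[k_new] += 1
        word_counts[k_new][w] += 1

for _ in range(50):
    gibbs_sweep()
print(z)       # coreferent mentions tend to share an entity index
```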
Infinite Mixture Model [Haghighi & Klein, 2007]
Instead of fixing K, draw the entity distribution beta from a Dirichlet Process (DP) prior [Teh et al., 2006]; the number of entities is unbounded.
[Figure: plate diagram with entity distribution beta, entity assignments Z, mention distributions phi, and observed mention words W.]
Global Coreference Resolution [Haghighi & Klein, 2007]
Global entities are shared across documents.
HDP Model
- Global entity distribution beta_0, drawn from a DP.
- Each document's entity distribution is subsampled from the global distribution: a hierarchical Dirichlet process.
[Figure: nested plate diagram over documents and their mentions.]
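A sketch of the two levels as the slide describes them: a global entity distribution drawn from a DP, and each document's distribution drawn from a DP whose base measure is the global one, so documents reuse and reweight the same global entities. Truncated stick-breaking keeps the illustration finite; the truncation and concentrations are toy choices.

```python
import numpy as np
rng = np.random.default_rng(0)

def stick_weights(alpha, size):
    betas = rng.beta(1.0, alpha, size=size)
    stick = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    w = betas * stick
    return w / w.sum()                    # renormalize the truncation

T = 20                                    # truncation level
global_beta = stick_weights(alpha=1.0, size=T)   # beta_0 over entities

def document_distribution(concentration=2.0):
    # Draw a document-level distribution from DP(concentration, global_beta):
    # each document-level atom points back at some global entity, so mass
    # concentrates on a reweighted subset of the global entities.
    doc_w = stick_weights(concentration, size=T)
    pointers = rng.choice(T, size=T, p=global_beta)
    beta_doc = np.zeros(T)
    np.add.at(beta_doc, pointers, doc_w)
    return beta_doc

print(document_distribution().round(3))
```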
Unsupervised Coref Results [Haghighi & Klein, 2007]
MUC6: 30 train/test documents

Data set   # Docs   P      R      F1
MUC6       60       80.8   52.8   63.9
+DRYRUN    251      79.1   59.7   68.0
+NWIRE     381      80.4   62.4   70.3

A recent supervised number: 73.4 F1 [McCallum & Wellner, 2004]
Summary
Latent variable models:
- are flexible, but require careful model design;
- can yield state-of-the-art results for a variety of tasks;
- leave many interesting unsupervised problems yet to explore.
Thanks http://nlp.cs.berkeley.edu
Latent Variable Models
Modeling:
- Capture information missing in the annotation
- Compensate for independence assumptions
Induction (unsupervised learning):
- The only game in town
- Requires careful structural design
Manual Annotation [Klein & Manning, 2003]
Manually split categories:
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional
Advantages: fairly compact grammar; linguistic motivations.
Disadvantages: performance leveled out.
Manual annotation: 86.36 F1
Learning Latent Annotations
Learning [Petrov et al., 06]: adaptive merging; hierarchical training.
Inference [Petrov et al., 07]: coarse-to-fine approximation; max-constituent parsing.
Latent Variable Models
Observed vs. hidden: hidden structure either gives a better model of the observed data, or is itself what we want to induce.