Latent Variable Models in NLP


1 Latent Variable Models in NLP. Aria Haghighi, with Slav Petrov, John DeNero, and Dan Klein. UC Berkeley, CS Division.

2-13 Latent Variable Models: Observed and Hidden variables. E.g. word alignment: the bitext is observed, the alignment is hidden. Basic question for LV models. Q: How do I make latent variables behave the way I want them to? A: Very careful model design!

14-17 Recent Applications: Parsing (PCFG Annotation); Machine Translation (Learning Phrase Tables); Information Extraction / Discourse (Unsupervised Coref Resolution)

18 Disclaimer: There are many other excellent examples of LV models in NLP; these examples are from the Berkeley group.

19-23 PCFG Annotation: e.g. Subject NP vs. Object NP

24-27 PCFG Annotation refines base treebank symbols to improve the statistical fit of the grammar: Parent annotation [Johnson 98]; Head Lexicalization [Collins 99]; Automatic Clustering? [Matsuzaki 05, Petrov 06]
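
As a minimal illustrative sketch (not code from the talk), parent annotation can be written as a single tree transform; trees here are assumed to be plain nested tuples of the form (label, children...).

    # Sketch of parent annotation [Johnson 98]: every nonterminal is refined
    # with its parent's label, e.g. an NP under S becomes NP^S.
    def parent_annotate(tree, parent=None):
        label, *children = tree
        if not children:                 # leaf word: leave it unchanged
            return tree
        new_label = f"{label}^{parent}" if parent else label
        return (new_label, *[parent_annotate(c, label) for c in children])

    t = ("S", ("NP", ("PRP", ("she",))), ("VP", ("VBD", ("heard",))))
    print(parent_annotate(t))
    # ('S', ('NP^S', ('PRP^NP', ('she',))), ('VP^S', ('VBD^VP', ('heard',))))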

28-31 Learning Latent Annotations [Matsuzaki et al., 05]. EM algorithm: brackets are known, base categories are known, only the subcategories are hidden. Just like Forward-Backward for HMMs.

32 [Figure: inside (forward) and outside (backward) passes over the latent nodes X1-X7 of the fixed tree for "He was right"]
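
As a rough illustrative sketch of the E-step idea (not the Matsuzaki or Petrov implementation): because the bracketing and base categories are fixed, the inside (forward-style) pass at each node only sums over the k hidden subcategories, just as forward-backward sums over HMM states. The rule and emission tables below are assumed to be nested dicts of NumPy arrays, filled with arbitrary values purely for illustration.

    import numpy as np
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        label: str
        word: Optional[str] = None            # set only for leaves
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        def is_leaf(self):
            return self.word is not None

    def inside(node, rule_probs, emit_probs, k):
        """Length-k vector of inside scores, one per latent subcategory of node.label."""
        if node.is_leaf():
            return emit_probs[node.label][node.word]            # shape (k,)
        left = inside(node.left, rule_probs, emit_probs, k)     # shape (k,)
        right = inside(node.right, rule_probs, emit_probs, k)   # shape (k,)
        R = rule_probs[node.label][node.left.label][node.right.label]  # (k, k, k): P(A_x -> B_y C_z)
        return np.einsum('xyz,y,z->x', R, left, right)

    k = 2
    rng = np.random.default_rng(0)
    emit = {"PRP": {"he": rng.dirichlet(np.ones(k))}, "VBD": {"was": rng.dirichlet(np.ones(k))}}
    rules = {"S": {"PRP": {"VBD": rng.dirichlet(np.ones(k * k * k)).reshape(k, k, k)}}}
    tree = Node("S", left=Node("PRP", word="he"), right=Node("VBD", word="was"))
    print(inside(tree, rules, emit, k))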

33-35 Learning Latent Annotations: DT is split into latent subcategories DT-1, DT-2, DT-3, DT-4.

36-44 Hierarchical Split/Merge Training of DT [Petrov et al., 06]: repeatedly split each category in two and retrain with EM; splits that turn out to be unhelpful are merged back.

45-46 Hierarchical Split/Merge Training of DT [Petrov et al., 06]. Lesson: Don't give EM too much of a leash.

    Model (k=16)     F1
    Flat Training    87.3
    Split-Merge      89.5

47 Linguistic Candy [Petrov et al., 06]
Proper Nouns (NNP):
    NNP-14  Oct. Nov. Sept.
    NNP-12  John Robert James
    NNP-2   J. E. L.
    NNP-1   Bush Noriega Peters
    NNP-15  New San Wall
    NNP-3   York Francisco Street
Personal pronouns (PRP):
    PRP-0   It He I
    PRP-1   it he they
    PRP-2   it them him

48-49 Learning Latent Annotations [Petrov et al., 07]. [Table: F1 at 40 words and on all sentences for English (Charniak & Johnson 05, generative, vs. Petrov et al.), German (Dubey vs. Petrov et al.), and Chinese (Chiang et al. vs. Petrov et al.)] Download Parser.

50 Outline: Parsing (PCFG Annotation); Machine Translation (Learning Phrase Tables); Information Extraction / Discourse (Unsupervised Coref Resolution)

51-58 Phrase-Based Model pipeline: sentence-aligned corpus -> directional word alignments -> intersected and grown word alignments -> phrase table (translation model), e.g. cat/chat 0.9, the cat/le chat 0.8, dog/chien 0.8, house/maison 0.6, my house/ma maison 0.9, language/langue 0.9.

59-62 Overview: Phrase Extraction. "appelle un chat un chat" / "call a spade a spade"; e.g. appelle <-> call, un chat <-> a spade. What's hidden: the sentence segmentation and the phrasal alignment.

63-65 Overview: Phrase Extraction. From the word alignment, every consistent phrase pair is extracted: appelle <-> call; appelle un <-> call a; appelle un chat <-> call a spade; un <-> a (x2); un chat <-> a spade (x2); un chat un chat <-> a spade a spade; chat <-> spade (x2); chat un chat <-> spade a spade. Extract all phrases, with no competition for which are more useful.
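
For concreteness, here is a small sketch of the standard "extract every consistent phrase pair" heuristic the slides allude to; the function and its max_len limit are illustrative choices, not the exact extractor used in the experiments.

    # Heuristic phrase-pair extraction from a word alignment: a source span and
    # the target span it projects to form a pair if no alignment link crosses
    # the boundaries of the box.
    def extract_phrases(src, tgt, alignment, max_len=4):
        pairs = []
        for i1 in range(len(src)):
            for i2 in range(i1, min(i1 + max_len, len(src))):
                js = [j for (i, j) in alignment if i1 <= i <= i2]
                if not js:
                    continue
                j1, j2 = min(js), max(js)
                if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                    pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
        return pairs

    src = "appelle un chat un chat".split()
    tgt = "call a spade a spade".split()
    print(extract_phrases(src, tgt, {(k, k) for k in range(5)}))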

66-69 Learning Phrases: a phrase-level generative model is trained on the sentence-aligned corpus to produce the phrase table (translation model), e.g. cat/chat 0.9, the cat/le chat 0.8, dog/chien 0.8, house/maison 0.6, my house/ma maison 0.9, language/langue 0.9.

70-72 DeNero et al. (2006) argue that this model underperforms the surface statistic: the latent variables are abused and EM finds a bad solution. The DeNero et al. (2008) model outperforms the surface statistics.

73-82 Dirichlet Process (DP) Prior: a distribution over distributions, specified by a concentration parameter and a base measure. Draws are almost surely discrete (even if the base measure isn't):

    $G = \sum_{i=1}^{\infty} \pi_i \, \delta_{x_i}(\cdot)$

Each point x_i represents the parameters of a latent cluster. The model is nonparametric: the number of instantiated parameters grows with the data (roughly log N for N points).
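
A quick sketch of what a draw G = sum_i pi_i delta_{x_i} looks like via truncated stick-breaking; the truncation level and the Gaussian base measure are arbitrary illustrative choices.

    import numpy as np

    def sample_dp(alpha, base_sampler, truncation=100, seed=0):
        """Truncated stick-breaking draw from DP(alpha, P0): weights pi_i and atoms x_i."""
        rng = np.random.default_rng(seed)
        betas = rng.beta(1.0, alpha, size=truncation)               # stick-breaking proportions
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
        weights = betas * remaining                                  # pi_i (sums to ~1)
        atoms = [base_sampler(rng) for _ in range(truncation)]       # x_i drawn from the base measure
        return weights, atoms

    weights, atoms = sample_dp(alpha=1.0, base_sampler=lambda rng: rng.normal())
    print(weights[:5], atoms[:5])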

83-91 Chinese Restaurant Process: customers (data points) arrive one at a time and join an existing table with probability proportional to the number of customers already seated there, or open a new table with probability proportional to the concentration parameter.
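
A toy implementation of the CRP seating rule (illustrative only): each new customer joins an existing table with probability proportional to its occupancy and opens a new table with probability proportional to the concentration parameter alpha.

    import random

    def crp(n, alpha, rng=random.Random(0)):
        tables = []                        # tables[k] = number of customers at table k
        seating = []
        for customer in range(n):
            r = rng.uniform(0, customer + alpha)
            cum = 0.0
            for k, count in enumerate(tables):
                cum += count
                if r < cum:                # join table k w.p. count / (customer + alpha)
                    tables[k] += 1
                    seating.append(k)
                    break
            else:                          # new table w.p. alpha / (customer + alpha)
                tables.append(1)
                seating.append(len(tables) - 1)
        return seating

    print(crp(10, alpha=1.0))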

92-104 Bayesian Phrase Learning [DeNero et al., 2008]. Given: sentence pair (e, f).
1. Generate the number of phrase pairs l: $P(l) = p_{\$}(1 - p_{\$})^{l-1}$
2. Generate l phrase pairs $\{(e_i, f_i)\}$:
   $(e_i, f_i) \sim G, \quad G \sim \mathrm{DP}(\alpha, P_0)$
   $P_0(f, e) = P_f(f)\, P_e(e)$
   $P_e(e) = \left( p_s (1 - p_s)^{|e|-1} \right) \left( \frac{1}{n_e} \right)^{|e|}$
3. Align and reorder the foreign phrases $\{f_i\}$.
Output: aligned phrase pairs $\{(e_i, f_i)\}$.
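
A schematic sketch of this generative story for one sentence pair, with the DP realized through Chinese-restaurant counts; the parameter values, the toy base distribution, and the uniform reordering are assumptions made purely for illustration, not the paper's exact choices.

    import random
    rng = random.Random(0)

    def generate_sentence_pair(p_dollar, alpha, sample_from_p0, crp_state):
        # 1. number of phrase pairs l ~ Geometric(p_$)
        l = 1
        while rng.random() > p_dollar:
            l += 1
        # 2. each phrase pair ~ G with G ~ DP(alpha, P0), via CRP counts
        pairs = []
        for _ in range(l):
            n = sum(crp_state.values())
            if rng.random() < alpha / (n + alpha):
                pair = sample_from_p0()            # new table: draw a fresh pair from P0
            else:
                r, cum = rng.uniform(0, n), 0.0    # old table: pick an existing pair
                for pair, count in crp_state.items():
                    cum += count
                    if r < cum:
                        break
            crp_state[pair] = crp_state.get(pair, 0) + 1
            pairs.append(pair)
        # 3. align and reorder the foreign phrases
        foreign = [f for (e, f) in pairs]
        rng.shuffle(foreign)
        return pairs, foreign

    p0 = lambda: (rng.choice(["cat", "the cat"]), rng.choice(["chat", "le chat"]))
    print(generate_sentence_pair(0.5, 1.0, p0, {}))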

105-107 Bayesian Phrase Learning: [DeNero et al., 2006] vs. [DeNero et al., 2008]

108-113 Gibbs Sampling Inference [DeNero et al., 2008]

114 Phrase Learning Results [DeNero et al., 2008]: EN-ES Europarl, evaluated on BLEU with 10k and 100k training sentences, comparing the heuristic baseline against the learned phrase table.

115 Outline: Parsing (PCFG Annotation); Machine Translation (Learning Phrase Tables); Information Extraction / Discourse (Unsupervised Coref Resolution)

116-119 Coreference Resolution. Mentions in text: Weir group .. whose .. headquarters .. U.S. .. corporation .. power plant .. which .. Jiangsu ..

120-122 Coreference Resolution. Each mention is labeled with its entity: Weir group -> Weir Group; whose -> Weir Group; headquarters -> Weir HQ; U.S. -> United States; corporation -> Weir Group; power plant -> Weir Plant; which -> Weir Plant; Jiangsu -> Jiangsu.

123-127 Finite Mixture Model [Haghighi & Klein, 2007]. An entity distribution (e.g. P(Weir Group) = 0.2, ..., P(Weir HQ) = 0.5) generates entity assignments Z1 = Weir Group, Z2 = Weir Group, Z3 = Weir HQ; each entity's mention distribution (e.g. P(W | Weir Group): "Weir Group" = 0.4, "whose" = 0.2, ...) then generates the mentions W1 = "Weir Group", W2 = "whose", W3 = "headquart."

128-133 Bayesian Finite Mixture Model [Haghighi & Klein, 2007]: the entity distribution β over K entities (K is how many entities there are) and the K mention distributions φ are themselves given priors. But how do you choose K?

134-136 Infinite Mixture Model [Haghighi & Klein, 2007]: the entity distribution β is drawn from a Dirichlet Process (DP) prior [Teh et al., 2006], so the number of entities does not have to be fixed in advance; each entity has its own mention distribution φ.

137 [Plate diagram of the infinite mixture model: β, φ, entity assignments Z, and per-mention variables L, S, T, N, G, W over M mentions]
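
To make the infinite-mixture idea concrete, here is a toy collapsed Gibbs sampler over mention head words, with entities as CRP tables and a symmetric Dirichlet prior on each entity's word distribution. The mention list and hyperparameters are made up, and this is a sketch of the general technique rather than the Haghighi & Klein system.

    import random
    from collections import defaultdict

    mentions = ["weir", "weir", "headquarters", "us", "weir", "plant", "plant", "jiangsu"]
    alpha, beta, vocab = 1.0, 0.1, len(set(mentions))
    rng = random.Random(0)

    z = [0] * len(mentions)                                  # entity assignment per mention
    sizes = defaultdict(int)
    word_counts = defaultdict(lambda: defaultdict(int))
    for w in mentions:
        sizes[0] += 1
        word_counts[0][w] += 1

    for sweep in range(50):
        for i, w in enumerate(mentions):
            k = z[i]
            sizes[k] -= 1                                    # remove mention i from the counts
            word_counts[k][w] -= 1
            if sizes[k] == 0:
                del sizes[k], word_counts[k]
            entities = list(sizes) + ["new"]
            weights = []
            for e in entities:
                if e == "new":
                    weights.append(alpha * (1.0 / vocab))    # CRP new-entity mass times base measure
                else:
                    like = (word_counts[e][w] + beta) / (sizes[e] + beta * vocab)
                    weights.append(sizes[e] * like)          # CRP occupancy times word likelihood
            r, cum = rng.uniform(0, sum(weights)), 0.0
            for e, wt in zip(entities, weights):
                cum += wt
                if r < cum:
                    break
            k = max(sizes, default=-1) + 1 if e == "new" else e
            z[i] = k
            sizes[k] += 1
            word_counts[k][w] += 1

    print(z)   # mentions sharing a head word tend to end up in the same entity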

138-142 Global Coreference Resolution [Haghighi & Klein, 2007]: entities are shared globally across documents (Global Entities).

143-148 HDP Model: a global entity distribution β0 is drawn from a DP; each document's entity distribution β is subsampled from the global distribution; per-document entity assignments Z and mention parameters φ generate the per-mention variables (L, S, T, N, G, W) for the M mentions in each of the N documents.

149 Unsupervised Coref Results [Haghighi & Klein, 2007]. MUC-6: 30 train/test documents. [Table: # Doc, P, R, F for the MUC-6, DRYRUN, and NWIRE data sets.] Recent supervised number: 73.4 F1 [McCallum & Wellner, 2004].

150 Summary. Latent Variable Models: flexible; require careful model design; can yield state-of-the-art results for a variety of tasks; many interesting unsupervised problems yet to explore.

151 Thanks

152 Latent Variable Models. Modeling: capture information missing in the annotation; compensate for independence assumptions. Induction: the only game in town; requires careful structural design.

153-156 Manual Annotation [Klein & Manning, 2003]. Manually split categories: NP: subject vs. object; DT: determiners vs. demonstratives; IN: sentential vs. prepositional. Advantages: fairly compact grammar; linguistic motivations. Disadvantages: performance leveled out; manually annotated.

157-162 Latent Variable Models: Observed vs. Hidden

163-168 Latent Variable Models: Modeling; Induction (Unsupervised Learning)

169-170 Learning Latent Annotations. Learning [Petrov et al. 06]: adaptive merging; hierarchical training. Inference [Petrov et al. 07]: coarse-to-fine approximation; max constituent parsing.

171-176 Latent Variable Models: Observed vs. Hidden. Better model; induce.

177-179 Linguistic Candy [Petrov et al., 06]. Proper Nouns (NNP): NNP-14 Oct. Nov. Sept.; NNP-12 John Robert James; NNP-15 New San Wall; NNP-3 York Francisco Street. Personal pronouns (PRP): PRP-0 It He I; PRP-1 it he they; PRP-2 it them him.
