Current Word Previous Word Next Word Current Word Character n-gram all Current POS Tag Surrounding POS Tag Sequence Current Word Shape Surrounding




Crystal Stafford
6 years ago

1 Features used for NER:
- Current word
- Previous word
- Next word
- Current word character n-grams (all)
- Current POS tag
- Surrounding POS tag sequence
- Current word shape
- Surrounding word shape sequence
- Presence of word in left window (size 4)
- Presence of word in right window (size 4)
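The feature list above can be turned into sparse binary features per token. What follows is a minimal sketch under a few assumptions. The function names `word_shape`, `char_ngrams` and `ner_features` are hypothetical. The shape alphabet (X/x/d) and the n-gram cap of 4 are illustrative choices. "Presence in window" is treated as a bag of words that ignores position.

```python
PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def word_shape(word):
    """Map characters to classes: X (upper), x (lower), d (digit), else unchanged."""
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def char_ngrams(word, max_n=4):
    """All character n-grams for n = 1..max_n (max_n = 4 is an assumption)."""
    return {word[i:i + n] for n in range(1, max_n + 1) for i in range(len(word) - n + 1)}


def at(seq, j):
    """seq[j], or PAD when j falls outside the sequence."""
    return seq[j] if 0 <= j < len(seq) else PAD


def ner_features(tokens, pos_tags, i, window=4):
    """Return the set of string features for token i."""
    word = tokens[i]
    shapes = [word_shape(t) for t in tokens]
    feats = {
        "w=" + word,
        "w-1=" + at(tokens, i - 1),
        "w+1=" + at(tokens, i + 1),
        "pos=" + pos_tags[i],
        "pos_seq=" + "|".join(at(pos_tags, j) for j in (i - 1, i, i + 1)),
        "shape=" + shapes[i],
        "shape_seq=" + "|".join(at(shapes, j) for j in (i - 1, i, i + 1)),
    }
    feats |= {"ngram=" + g for g in char_ngrams(word)}
    # Presence of words in the left/right windows (position within the window ignored).
    feats |= {"left=" + w for w in tokens[max(0, i - window):i]}
    feats |= {"right=" + w for w in tokens[i + 1:i + 1 + window]}
    return feats
```

For example, `ner_features(["John", "lives", "in", "New", "York", "."], ["NNP", "VBZ", "IN", "NNP", "NNP", "."], 3)` includes `shape=Xxx`, `w+1=York` and `left=John`. A linear classifier such as a maximum-entropy model could consume the resulting sets.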

2 Machine Learning in Practice: describing your data with features a computer can understand, which are then passed to a learning algorithm.

10 [Figure: IPA chart, Consonants (Pulmonic), 2005. Places of articulation: bilabial, labiodental, dental, alveolar, postalveolar, retroflex, palatal, velar, uvular, pharyngeal, glottal. Manners: plosive, nasal, trill, tap or flap, fricative, lateral fricative, approximant, lateral approximant.] Where symbols appear in pairs, the one to the right represents a voiced consonant. Shaded areas denote articulations judged impossible.

14 [Figure: comparing "all reptiles walk" with "some turtles move". Composition RN(T)N layers build a vector for each sentence from its words. A comparison N(T)N layer then feeds a softmax classifier that outputs a probability for each relation (0.8 for one relation in the example).]

16 [Figure: decision flowchart. Is the main verb a trigger? (Yes/No). Conditions: Is the wh-word the subject? Is the wh-word the object? Regular expressions assign the AGENT and THEME roles. Outcomes: default (ENABLE SUPER), + DIRECT (ENABLE SUPER), PREVENT (ENABLE SUPER).]

26 $h_{w,b}(x) = f(w^\top x + b)$, where $f(z) = \frac{1}{1 + e^{-z}}$
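The single neuron $h_{w,b}(x) = f(w^\top x + b)$ with the logistic $f$ can be written in a few lines. This is a plain-Python sketch, and the function names are illustrative. It also includes the standard identity $f'(z) = f(z)(1 - f(z))$, which back-propagation relies on. No overflow guarding is done, so very negative `z` can overflow `math.exp`.

```python
import math


def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative f'(z) = f(z) * (1 - f(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def neuron(w, b, x):
    """h_{w,b}(x) = f(w^T x + b) for plain Python lists w and x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)
```

For instance, `neuron([1.0, -1.0], 0.0, [2.0, 2.0])` has pre-activation 0 and returns 0.5.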

33 Simple Chain Rule
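Along a single path $x \to y \to z$, the chain rule multiplies local derivatives. The second line is an illustrative instance that assumes $y = wx$ and a sigmoid $z = \sigma(y)$; this choice is not taken from the slide.

```latex
\begin{aligned}
z = f(y),\; y = g(x) \;&\Longrightarrow\; \frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} \\
y = wx,\; z = \sigma(y) \;&\Longrightarrow\; \frac{dz}{dx} = \sigma(wx)\,\bigl(1 - \sigma(wx)\bigr)\, w
\end{aligned}
```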

34 Multiple Paths Chain Rule
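When $x$ reaches $z$ through several intermediate variables $y_1, \dots, y_n$, the contributions of the individual paths are summed. The worked example below is a hypothetical one. It checks out against direct differentiation, because $z = 3x^3$ gives $dz/dx = 9x^2$.

```latex
\begin{aligned}
\frac{\partial z}{\partial x} &= \sum_{i=1}^{n} \frac{\partial z}{\partial y_i}\,\frac{\partial y_i}{\partial x} \\
z = y_1 y_2,\; y_1 = x^2,\; y_2 = 3x:\quad
\frac{dz}{dx} &= y_2 \cdot 2x + y_1 \cdot 3 = 6x^2 + 3x^2 = 9x^2
\end{aligned}
```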

36 Chain Rule in Flow Graph

37 Back-Prop in Multi-Layer Net h = sigmoid(vx)
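For a net with hidden layer $h = \mathrm{sigmoid}(Vx)$, back-propagation applies the chain rule layer by layer, from the loss back to the weights. This sketch rests on several assumptions that the slide title does not state:
- `V` is a weight matrix.
- The output is linear, `y = w · h`.
- The loss is squared error, `0.5 * (y - t)**2`.

All names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(V, w, x):
    """h = sigmoid(V x); y = w . h (linear output layer is an assumption)."""
    a = [sum(vij * xj for vij, xj in zip(row, x)) for row in V]
    h = [sigmoid(ai) for ai in a]
    y = sum(wi * hi for wi, hi in zip(w, h))
    return h, y


def loss(V, w, x, t):
    """Squared-error loss (assumed) for a single example with target t."""
    _, y = forward(V, w, x)
    return 0.5 * (y - t) ** 2


def backprop(V, w, x, t):
    """Gradients of the loss w.r.t. w and V, computed by the chain rule."""
    h, y = forward(V, w, x)
    dy = y - t                                   # dL/dy
    grad_w = [dy * hi for hi in h]               # dL/dw_i = dL/dy * h_i
    dh = [dy * wi for wi in w]                   # dL/dh_i = dL/dy * w_i
    da = [dhi * hi * (1.0 - hi) for dhi, hi in zip(dh, h)]  # sigmoid'(a_i) = h_i (1 - h_i)
    grad_V = [[dai * xj for xj in x] for dai in da]          # dL/dV_ij = dL/da_i * x_j
    return grad_w, grad_V
```

The gradients can be checked against central finite differences of `loss`, which is a common sanity test for hand-written back-propagation.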

38 Back-Prop in General Flow Graph

39 Automatic Differentiation
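Reverse-mode automatic differentiation mechanises back-prop over a general flow graph:
- Each node records its local derivatives with respect to its parents.
- A backward sweep in reverse topological order multiplies these local derivatives along each path.
- Contributions arriving through multiple paths are summed, as in the multiple-paths chain rule.

The sketch below handles scalars only. The class name `Var` and its methods are hypothetical and do not come from any particular library.

```python
import math


class Var:
    """A scalar node in a flow graph."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def sigmoid(self):
        s = 1.0 / (1.0 + math.exp(-self.value))
        return Var(s, ((self, s * (1.0 - s)),))

    def backward(self):
        """Propagate d(self)/d(node) to every node in the graph."""
        order, seen = [], set()

        def visit(node):  # post-order DFS: parents are appended before children
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local  # "+=" sums over multiple paths
```

Usage: with `x = Var(2.0)` and `z = (x * x) * (3 * x)`, calling `z.backward()` leaves `x.grad` equal to $9x^2 = 36$.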

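45 $y = b + Wx + U\tanh(d + Hx)$
$\hat{P}(w_t \mid w_{t-1}, \dots, w_{t-n+1}) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}}$
[Figure: neural probabilistic language model. The indices for $w_{t-n+1}, \dots, w_{t-2}, w_{t-1}$ are used for table look-ups in the matrix $C$, whose parameters are shared across words. The vectors $C(w_{t-n+1}), \dots, C(w_{t-2}), C(w_{t-1})$ are concatenated into $x$ and passed through a tanh layer. A softmax output layer, where most of the computation happens, gives the $i$-th output $= P(w_t = i \mid \text{context})$.]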
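A forward pass of this model follows the two equations directly:
1. Look up and concatenate the context embeddings.
2. Apply the tanh hidden layer.
3. Compute $y = b + Wx + U\tanh(d + Hx)$.
4. Normalise $y$ with a softmax.

This is a pure-Python sketch with tiny random parameters. The function names and sizes are assumptions. The maximum is subtracted before exponentiating for numerical stability, which leaves the softmax unchanged.

```python
import math
import random


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def nplm_probs(context, C, H, d, U, W, b):
    """P(w_t = i | context) for every word i.

    context: word indices w_{t-n+1}, ..., w_{t-1}
    C: |V| x m embedding table; H: h x (n-1)m; d: h; U: |V| x h; W: |V| x (n-1)m; b: |V|
    """
    x = [val for idx in context for val in C[idx]]              # table look-up + concatenation
    hidden = [math.tanh(di + hi) for di, hi in zip(d, matvec(H, x))]  # tanh(d + Hx)
    y = [bi + wi + ui for bi, wi, ui in zip(b, matvec(W, x), matvec(U, hidden))]
    m = max(y)                                                   # stabilise the softmax
    exps = [math.exp(yi - m) for yi in y]
    total = sum(exps)
    return [e / total for e in exps]
```

The `U` and `W` products produce one score per vocabulary word. This is why the softmax layer dominates the cost for realistic vocabulary sizes.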

73 $\frac{\partial\, a^\top X a}{\partial X} = \frac{\partial\, a^\top X^\top a}{\partial X} = a a^\top$
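The identity can be checked numerically. Since $a^\top X a = \sum_{i,j} a_i X_{ij} a_j$, the derivative with respect to $X_{ij}$ is $a_i a_j$, the $(i,j)$ entry of $a a^\top$. Transposing $X$ does not change the scalar. The sketch below is a central finite-difference check with arbitrary example values; the helper names are hypothetical.

```python
def quad_form(a, X):
    """Scalar a^T X a for a vector a and square matrix X (lists of lists)."""
    n = len(a)
    return sum(a[i] * X[i][j] * a[j] for i in range(n) for j in range(n))


def transpose(X):
    return [list(col) for col in zip(*X)]


def numeric_grad(f, X, eps=1e-6):
    """Central finite-difference estimate of df/dX_ij for every entry."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += eps
            Xm[i][j] -= eps
            row.append((f(Xp) - f(Xm)) / (2 * eps))
        G.append(row)
    return G


a = [1.0, -2.0, 0.5]
X = [[0.3, 1.0, -0.4], [2.0, 0.1, 0.7], [-1.2, 0.5, 0.9]]
outer = [[ai * aj for aj in a] for ai in a]                 # a a^T
G1 = numeric_grad(lambda M: quad_form(a, M), X)             # d(a^T X a)/dX
G2 = numeric_grad(lambda M: quad_form(a, transpose(M)), X)  # d(a^T X^T a)/dX
```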

85 Methods compared on fine-grained and binary classification:
- RAE (Socher et al., 2013)
- MV-RNN (Socher et al., 2013)
- RNTN (Socher et al., 2013)
- DCNN (Blunsom et al., 2014)
- Paragraph-Vec (Le and Mikolov, 2014)
- CNN-non-static (Kim, 2014)
- CNN-multichannel (Kim, 2014)
- DRNN (Irsoy and Cardie, 2014)
- LSTM
- Bidirectional LSTM
- 2-layer LSTM
- 2-layer Bidirectional LSTM
- Constituency Tree LSTM (no tuning)
- Constituency Tree LSTM

86 Relatedness score examples (sentence pairs):
- A: A man is jumping into an empty pool. B: There is no biker jumping in the air.
- A: Two children are lying in the snow and are making snow angels. B: Two angels are making snow on the lying children.
- A: The young boys are playing outdoors and the man is smiling nearby. B: There is no boy playing outdoors and there is no man smiling.
- A: A person in a black jacket is doing tricks on a motorbike. B: A man in a black jacket is doing tricks on a motorbike.

87 Methods compared (columns: Pearson r, Spearman ρ, MSE):
- Mean vectors
- DT-RNN (Socher et al., 2014)
- SDT-RNN (Socher et al., 2014)
- Illinois-LH (Lai and Hockenmaier, 2014)
- UNAL-NLP (Jimenez et al., 2014)
- Meaning Factory (Bjerva et al., 2014)
- ECNU (Zhao et al., 2014)
- LSTM
- Bidirectional LSTM
- 2-layer LSTM
- 2-layer Bidirectional LSTM
- Constituency Tree LSTM
- Dependency Tree LSTM
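The three columns score predicted relatedness against gold relatedness:
- **Pearson r:** linear correlation between the two sets of scores.
- **Spearman ρ:** Pearson correlation computed on ranks, with tied values sharing their average rank.
- **MSE:** mean squared error between predicted and gold scores.

Below is a plain-Python sketch with hypothetical function names. In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` provide the two correlations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Spearman rank correlation rho."""
    return pearson(ranks(xs), ranks(ys))


def mse(xs, ys):
    """Mean squared error."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

A prediction that preserves the gold ordering but not the exact values gets ρ = 1 while r stays slightly below 1. This is why both correlations are reported.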

88 [Plot: Pearson r versus mean sentence length for DepTree-LSTM, LSTM, Bi-LSTM and ConstTree-LSTM.]
