Efficient Higher-Order CRFs for Morphological Tagging


1 Efficient Higher-Order CRFs for Morphological Tagging
Thomas Müller, Helmut Schmid and Hinrich Schütze
Center for Information and Language Processing, University of Munich

2 Outline
1 Contributions
2 Motivation
3 Model
4 Experiments

3-6 Contributions
- Fast approximate CRF tagger for big tag sets
- Allows training CRFs with high orders
- Training time reduced from several days to several hours
- Accuracy improvements due to higher orders

7 Motivation

8-12 Introduction

Die       The        ART   nom pl masc
Rebellen  rebels     NN    nom pl masc
haben     have       VAF   pl 3
kein      no         PI    acc sg neut
Lösegeld  ransom     NN    acc sg neut
verlangt  demanded   VVP   -

- Assign coarse POS and fine MORPH tags
- Tagging works for short-distance dependencies
- Need higher orders for long-distance dependencies:

  weil     er  kein  Lösegeld  verlangt
  because  he  no    ransom    demands
  KO       PP  PI    NN        VVF

- Higher-order tagging is expensive: Coarse-To-Fine approach

13-17 Coarse-To-Fine Decoding
Basic Idea:
1 Create a 0-order lattice
2 Calculate posterior probabilities
3 Prune states with low posteriors
4 Increase lattice order
5 Go to 2

18 Coarse-To-Fine Decoding II: Create a 0-order lattice
(Die Rebellen haben kein Lösegeld verlangt; gold tags: ART NN VAF PI NN VVP)

Die:       ART   PDS   PRO   ,
Rebellen:  NN    ,     PRO   VAF
haben:     VAF   VAI   VVI   ,
kein:      PI    ART   PRO   ,
Lösegeld:  NN    NE    ADJ   FM
verlangt:  VVF   VVP   VAF   VAP

19 Coarse-To-Fine Decoding II: Calculate posterior probabilities

Die:       ART 0.90   PDS 0.10   PRO 0.00   , 0.00
Rebellen:  NN 1.00    , 0.00     PRO 0.00   VAF 0.00
haben:     VAF 0.93   VAI 0.07   VVI 0.00   , 0.00
kein:      PI 1.00    ART 0.00   PRO 0.00   , 0.00
Lösegeld:  NN 0.93    NE 0.05    ADJ 0.01   FM 0.01
verlangt:  VVF 0.54   VVP 0.46   VAF 0.00   VAP

20 Coarse-To-Fine Decoding II: Prune states with low posteriors

Die:       ART 0.90   PDS 0.10
Rebellen:  NN 1.00
haben:     VAF 0.93   VAI 0.07   VVI 0.00
kein:      PI 1.00
Lösegeld:  NN 0.93    NE 0.05    ADJ 0.01   FM 0.01
verlangt:  VVF 0.54   VVP 0.46

21 Coarse-To-Fine Decoding II: Increase lattice order to 1

Die:       ART 1.00   PDS 0.00
Rebellen:  NN 1.00
haben:     VAF 1.00   VAI 0.00   VVI 0.00
kein:      PI 1.00
Lösegeld:  NN 1.00    NE 0.00    ADJ 0.00   FM 0.00
verlangt:  VVF 0.69   VVP 0.31

22 Coarse-To-Fine Decoding II: Increase lattice order to 2

Die:       (S ART) 1.00
Rebellen:  (ART NN) 1.00
haben:     (NN VAF) 1.00   (NN VAI) 0.00
kein:      (VAF PI) 0.99   (VAI PI) 0.01
Lösegeld:  (PI NN) 1.00    (PI NE) 0.00
verlangt:  (NN VVP) 0.55   (NN VVF) 0.45   (NE VVP) 0.00   (NE VVF) 0.00

23 Coarse-To-Fine Decoding II: Increase lattice order to 3

Die:       (S ART) 1.00
Rebellen:  (S ART NN) 1.00
haben:     (ART NN VAF) 0.99   (ART NN VAI) 0.01
kein:      (NN VAF PI) 0.99    (NN VAI PI) 0.01
Lösegeld:  (VAF PI NN) 0.99    (VAI PI NE) 0.01
verlangt:  (PI NN VVP) 0.63    (PI NN VVF) 0.37   (PI NE VVP) 0.00   (PI NE VVF) 0.00
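The posterior columns above come from forward-backward sums over the current lattice. As a minimal illustration of that step, here is a numpy sketch of state-posterior computation on a first-order chain; the array layout and function names are our own assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.special import logsumexp

def chain_posteriors(unary, trans):
    """State posteriors p(y_t = y | x) on a first-order chain, in log space.

    unary: (T, Y) log scores for each position and tag
    trans: (Y, Y) log transition scores, trans[i, j] for y_{t-1}=i -> y_t=j
    """
    T, Y = unary.shape
    fwd = np.empty((T, Y))
    bwd = np.zeros((T, Y))          # backward scores are 0 at the last position
    fwd[0] = unary[0]
    for t in range(1, T):           # forward pass
        fwd[t] = unary[t] + logsumexp(fwd[t - 1][:, None] + trans, axis=0)
    for t in range(T - 2, -1, -1):  # backward pass
        bwd[t] = logsumexp(trans + (unary[t + 1] + bwd[t + 1])[None, :], axis=1)
    log_z = logsumexp(fwd[-1])      # log partition function
    return np.exp(fwd + bwd - log_z)  # (T, Y) matrix of posteriors
```

States whose posterior falls below the threshold are then removed before the lattice order is increased.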

24 Model

25-31 Conditional Random Fields

Model:

$$p_{ME}(\vec{y} \mid \vec{x}) = \prod_t \frac{1}{Z_{ME}(\vec{\lambda}, \vec{x})} \exp\left(\vec{\lambda} \cdot \vec{\phi}(\vec{x}, y_t, t)\right)$$

$$p_{CRF}(\vec{y} \mid \vec{x}) = \frac{1}{Z_{CRF}(\vec{\lambda}, \vec{x})} \exp\left(\sum_t \vec{\lambda} \cdot \vec{\phi}(\vec{x}, y_t, y_{t-1}, t)\right)$$

Prune:

$$p_{ME}(y \mid \vec{x}, t) = \frac{1}{Z_{ME}(\vec{\lambda}, \vec{x})} \exp\left(\vec{\lambda} \cdot \vec{\phi}(\vec{x}, y, t)\right)$$

$$p_{CRF}(y \mid \vec{x}, t) = \frac{\sum_{\vec{y}\,' : \, y'_t = y} \exp \sum_{t'} \vec{\lambda} \cdot \vec{\phi}(\vec{x}, y'_{t'}, y'_{t'-1}, t')}{Z_{CRF}(\vec{x})}$$

Train using L1-regularized Stochastic Gradient Descent (SGD) [Tsuruoka et al., 2009]
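At order 0 the pruning quantity p_ME(y | x, t) is just a per-position softmax over the feature scores. A minimal sketch, assuming the scores lambda . phi(x, y, t) are already collected in a (T, Y) array (names are ours):

```python
import numpy as np

def zero_order_posteriors(scores):
    """p_ME(y | x, t): per-position softmax over scores lambda . phi(x, y, t)."""
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def prune(posteriors, tau):
    """Keep, per position, the tag indices whose posterior reaches tau."""
    return [set(np.flatnonzero(row >= tau)) for row in posteriors]
```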

32-36 Coarse-To-Fine Decoding
We could now do the following:
- Train a 0-order model and filter
- Train a 1st-order model on the filtered data and filter
- ...

We do not want to do that, because:
- We would need to train multiple models
- We would get multiple weights for the same features
- We do not need optimal lower-order models

Train a single joint model instead!

37 Lattice Generation

function GetSumLattice(sentence, τ, n)
    candidates ← getAllCandidates(sentence)
    lattice ← ZeroOrderLattice(candidates)
    for i = 1 to n do
        candidates ← lattice.prune(τ_{i-1})
        lattice ← SequenceLattice(candidates, i)
    end for
    return lattice
end function

38-40 Lattice Generation II
- With the current lattice generation we never do lower-order updates
- We thus never force the model to keep the gold tags in the lower-order lattices
- If a gold tag gets pruned, do an Early Update [Collins and Roark, 2004]

41 Lattice Generation III

function GetSumLattice(sentence, τ, n)
    gold-tags ← getTags(sentence)
    candidates ← getAllCandidates(sentence)
    lattice ← ZeroOrderLattice(candidates)
    for i = 1 to n do
        candidates ← lattice.prune(τ_{i-1})
        if gold-tags ⊈ candidates then
            return lattice
        end if
        lattice ← SequenceLattice(candidates, i)
    end for
    return lattice
end function
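A hedged Python rendering of this routine: the data structures are simplified (a "lattice" is just the candidate sets plus an order) and the posteriors callback stands in for the forward-backward pass, so this sketches the control flow rather than the authors' implementation.

```python
def get_sum_lattice(candidates, gold_tags, posteriors, taus, n):
    """Sketch of GetSumLattice with early update.

    candidates: per-position sets of candidate tags (order-0 lattice)
    gold_tags:  gold tag per position
    posteriors: posteriors(candidates, order) -> per-position {tag: prob};
                stands in for forward-backward on the order-`order` lattice
    taus:       pruning threshold per order
    """
    lattice = (list(candidates), 0)
    for i in range(1, n + 1):
        cols = posteriors(candidates, i - 1)
        candidates = [{y for y, p in col.items() if p >= taus[i - 1]}
                      for col in cols]
        # Early update [Collins and Roark, 2004]: stop as soon as a gold tag
        # has been pruned and train against the current lattice.
        if any(g not in col for g, col in zip(gold_tags, candidates)):
            return lattice
        lattice = (candidates, i)  # "increase lattice order"
    return lattice
```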

42 Lattice Generation IV
[Figure: fraction of unreachable gold candidates over training epochs]

43-47 Lattice Generation V

How do we set τ_i, the pruning threshold at order i?

Fixed τ_i do not work, because:
- p(y | x, t) decreases with increasing tag set sizes
- During training, we start with uniform models and end with sparse models

Solution: set τ_i dynamically to achieve a certain average number of tags μ:

$$\tau_i \leftarrow \begin{cases} \tau_i + 0.1\,\tau_i & \text{if } \hat{\mu}_i < \mu_i \\ \tau_i - 0.1\,\tau_i & \text{if } \hat{\mu}_i > \mu_i \end{cases}$$
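Read literally, the update nudges τ_i by 10% depending on how the measured average number of tags μ̂_i compares to the target μ_i. A minimal sketch with exactly that sign convention; the variable names are ours:

```python
def update_threshold(tau_i, mu_hat_i, mu_i):
    """Multiplicative 10% update of the order-i pruning threshold, following
    the sign convention on the slide: up if mu_hat_i < mu_i, down otherwise."""
    if mu_hat_i < mu_i:
        return tau_i + 0.1 * tau_i
    if mu_hat_i > mu_i:
        return tau_i - 0.1 * tau_i
    return tau_i
```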

48 Experiments

49 Languages

Language    Sentences   POS
Arabic      15,…        …
Czech       38,…        …
English     38,…        …
Spanish     14,…        …
German      40,…        …
Hungarian   61,034      57

[Most numeric cells of this table were lost in extraction]

50 Baseline Taggers
- SVMTool [Giménez and Màrquez, 2004]: SVM-based left-to-right tagger
- CRFSuite [Okazaki, 2007]: first-order Conditional Random Field (trained with SGD)
- Morfette [Chrupała et al., 2008]: averaged perceptron
- Stanford Tagger [Toutanova et al., 2003]: bidirectional Maximum Entropy Markov Model

51-54 POS Experiments

[Table: training time (TT) and accuracy (ACC) per language (Arabic, Czech, Spanish, German, Hungarian, English) for CRF and PCRF at increasing orders n; the numeric cells were lost in extraction, * marks significant results]

- CRF training is fast for small tag sets (Czech and Spanish) but slow for big tag sets
- First order: PCRF is two to thirty times as fast as CRF
- Higher orders: small but significant improvements in accuracy for all languages

55-56 POS Experiments II - Accuracy

[Table: accuracies of SVMTool, Morfette, CRFSuite, Stanford and PCRF (orders 1-3) per language; most numeric cells were lost in extraction, * marks significant improvements]

- PCRF outperforms the best baseline for 3 out of 6 languages
- It is never significantly worse than the best baseline

57-58 POS Experiments II - Training Times

[Table: training times of Morfette, CRFSuite and PCRF per language; the numeric cells were lost in extraction]

- CRFSuite is the fastest baseline tagger for all languages
- PCRF is faster for bigger tag sets (> 38)

59 Languages

[Table: sentence counts and tag set sizes (POS and POS+MORPH) for Arabic, Czech, Spanish, German and Hungarian; the numeric cells were lost in extraction]

60-62 POS + MORPH Experiments

[Table: accuracies of the Oracle and PCRF at orders 1-3 per language; most numeric cells were lost in extraction, * marks significant results]

- Oracle is a first-order PCRF, but gold tags get reinserted when pruned
- Small losses for Spanish and Hungarian, greater losses for Arabic, English and German
- Higher-order models outperform the Oracle

63-65 POS + MORPH Experiments II - Accuracy

[Table: accuracies of SVMTool, Morfette, CRFSuite and PCRF (orders 1-3) per language; the numeric cells were lost in extraction]

- PCRF outperforms the best baselines
- Moderate improvements for less ambiguous languages (Spanish, Hungarian)
- Large improvements for more ambiguous languages (Arabic, Czech, German)

66-68 POS + MORPH Experiments II - Training Times

[Table: training times of Morfette, CRFSuite and PCRF (orders 1-3) per language; the numeric cells were lost in extraction]

- CRFSuite is slower than Morfette (by an order of magnitude for Hungarian and Czech)
- PCRF is usually twice as fast as Morfette
- Czech training takes a week with CRFSuite but 5 hours with PCRF

69 Conclusion
- Approximate CRF tagger for big tag sets
- Fast due to coarse-to-fine decoding (speed-ups of up to 30x)
- Supports high-order CRFs
- Higher accuracy than a number of baselines thanks to the high orders

70 MarMoT - Morphological Tagger
Our open-source implementation MarMoT is available online [URL lost in extraction]. Thank you for your attention!

71 References I
Chrupała, G., Dinu, G., and van Genabith, J. (2008). Learning morphology with Morfette. In Proceedings of LREC.
Collins, M. and Roark, B. (2004). Incremental parsing with the perceptron algorithm. In Proceedings of ACL.
Giménez, J. and Màrquez, L. (2004). SVMTool: A general POS tagger generator based on Support Vector Machines. In Proceedings of LREC.
Okazaki, N. (2007). CRFsuite: A fast implementation of conditional random fields (CRFs).

72 References II
Schmid, H. and Laws, F. (2008). Estimation of conditional probabilities with decision trees and an application to fine-grained POS tagging. In Proceedings of COLING.
Toutanova, K., Klein, D., Manning, C. D., and Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of NAACL.
Tsuruoka, Y., Tsujii, J., and Ananiadou, S. (2009). Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty. In Proceedings of ACL.
