Structured Prediction
|
|
- Silas Newton
- 5 years ago
- Views:
Transcription
1 Machine Learning Fall 2017 (structured perceptron, HMM, structured SVM) Professor Liang Huang (Chap. 17 of CIML)
2 x x the man bit the dog x the man bit the dog x DT NN VBD DT NN S =+1 =-1 the man bit the dog x DT NP NN VB VP NP the man bit DT NN binar classification: output is binar multiclass classification: output is a number (small # of classes) structured classification: output is a structure (seq., tree, graph) part-of-speech tagging, parsing, summarization, translation exponentiall man classes: search (inference) efficienc is crucial! 2 the dog
3 Generic Perceptron online-learning: one example at a time learning b doing update weights at mistakes find the best output under the current weights xi inference i zi update weights w 3
4 Perceptron: from binar to structured binar classification x x 2 classes trivial x w exact inference z update weights if z =+1 =-1 multiclass classification constant x # of classes eas w exact inference z update weights if z structured classification exponential # of classes hard w the man bit the dog DT NN VBD DT NN x x exact inference z update weights if z 4
5 From Perceptron to SVM max margin 1964 Vapnik Chervonenkis same journal! 1964 Aizerman+ online kernels kernel. perc. +soft-margin fall of USSR 1995 Cortes/Vapnik SVM conservative updates online approx. max margin subgradient descent 2003 Crammer/Singer MIRA minibatch batch Singer group Pegasos 2006 Singer group aggressive 1959 Rosenblatt invention 1962 Novikoff proof inseparable case 1999 Freund/Schapire voted/avg: revived 2003 Taskar M3N 2004 Tsochantaridis struct. SVM multinomial logistic regression (max. entrop) 2001 Laffert+ CRF AT&T Research 2002 Collins structured ex-at&t and students 2005 McDonald+ structured MIRA 5
6 Multiclass Classification one weight vector ( prototpe ) for each class: multiclass decision rule: ŷ = argmax z21...m (best agreement w/ prototpe) w (z) x Q1: what about 2-class? Q2: do we still need augmented space?
7 Multiclass Perceptron on an error, penalize the weight for the wrong class, and reward the weight for the true class
8 Convergence of Multiclass update rule: w w + (x,, z) separabilit: 9u, s.t. 8(x, ) 2 D, z 6= u (x,, z) for all
9 Example: POS Tagging gold-standard: DT NN VBD DT NN the man bit the dog current output: DT NN NN DT NN the man bit the dog assume onl two feature classes tag bigrams t i-1 ti word/tag pairs w i weights ++: (NN, VBD) (VBD, DT) (VBD, bit) weights --: (NN, NN) (NN, DT) (NN, bit) x Φ(x, ) z Φ(x, z) x ɸ(x,)={(<s>, DT): 1, (DT, NN):2,..., (NN, </s>): 1, (DT, the):1,..., (VBD, bit): 1,..., (NN, dog): 1} ɸ(x,z)={(<s>, DT): 1, (DT, NN):2,..., (NN, </s>): 1, (DT, the):1,..., (NN, bit): 1,..., (NN, dog): 1} ɸ(x,) - ɸ(x,z) 9
10 Structured Perceptron xi inference w i zi update weights 10
11 Inference: Dnamic Programming w x exact inference z update weights if z 11
12 Pthon implementation Q: what about top-down recursive + memoization? CS Lec 5-6: Probs & WFSTs 12
13 xi Efficienc vs. Expressiveness argmax GEN(x) inference w the inference (argmax) must be efficient i zi update weights either the search space GEN(x) is small, or factored features must be local to (but can be global to x) e.g. bigram tagger, but look at all input words (cf. CRFs) x 13
14 Averaged Perceptron 0 j j +1 j = j W j more stable and accurate results approximation of voted perceptron (Freund & Schapire, 1999) 14
15 Averaging Tricks Daume (2006, PhD thesis) sparse vector: defaultdict w t t w w (0) = w (1) = w (2) = w (3) = w (4) = w (1) w (1) w (2) w (1) w (2) w (3) w (1) w (2) w (3) w (4) 15
16 Do we need smoothing? x smoothing is much easier in discriminative models just make sure for each feature template, its subset templates are also included e.g., to include (t 0 w0 w-1) ou must also include (t 0 w0) (t0 w-1) (w0 w-1) and mabe also (t 0 t-1) because t is less sparse than w 16
17 Convergence with Exact Search linear classification: converges iff. data is separable structured: converges iff. data separable & search exact there is an oracle vector that correctl labels all examples one vs the rest (correct label better than all incorrect labels) theorem: if separable, then # of updates R 2 / δ 2 R: diameter R: diameter R: diameter x 100 x 100 x x 3012 δ δ x 2000 =-1 =+1 Rosenblatt => Collins z
18 Geometr of Convergence Proof pt 1 w z exact 1-best x exact inference z update weights if z (x,, z) update perceptron update: correct label w (k) current model update new model w (k+1) δ separation unit oracle vector u margin (b induction) (part 1: upperbound) 18
19 Geometr of Convergence Proof pt 2 z (x,, z) exact 1-best update x w exact inference z update weights if z violation: incorrect label scored higher perceptron update: R: max diameter correct label w (k) current model update <90 new w (k+1) model b induction: apple R 2 diameter violation (part 2: upperbound) parts 1+2 => update bounds: k apple R 2 / 2 19
20 Experiments
21 Experiments: Tagging (almost) identical features from (Ratnaparkhi, 1996) trigram tagger: current tag t i, previous tags ti-1, ti-2 current word w i and its spelling features surrounding words w i-1 wi+1 wi-2 wi
22 Experiments: NP Chunking B-I-O scheme Rockwell International Corp. B I I 's Tulsa unit said it signed B I I O B O a tentative agreement... B I I features: unigram model surrounding words and POS tags 22
23 Experiments: NP Chunking results (Sha and Pereira, 2003) trigram tagger voted perceptron: 94.09% vs. CRF: 94.38% 23
24 Structured SVM structured perceptron: w Δɸ(x,,z) > 0 SVM: for all (x,), functional margin (w x) 1 structured SVM version 1: simple loss for all (x,), for all z, margin w Δɸ(x,,z) 1 correct has to score higher than an wrong z b 1 structured SVM version 2: structured loss for all (x,), for all z, margin w Δɸ(x,,z) (,z) correct has to score higher than an wrong z b (,z), a distance metric such as hamming loss 24
25 Loss-Augmented Decoding want for all z: w ɸ(x,) w ɸ(x,z) + (,z) same as: w ɸ(x,) maxz w ɸ(x,z) + (,z) loss-augmented decoding: argmaxz w ɸ(x,z) + (,z) if (,z) factors in z (e.g. hamming), just modif DP CIML version modified DP should have learning rate! λ = 1/(2C) ver similar to Pegasos; but should use Pegasos framework instead 25
26 Correct Version following Pegasos want for all z: w ɸ(x,) w ɸ(x,z) + (,z) same as: w ɸ(x,) maxz w ɸ(x,z) + (,z) loss-augmented decoding: argmaxz w ɸ(x,z) + (,z) if (,z) factors in z (e.g. hamming), just modif DP 1/t NC/2t ( ) N= D, C is from SVM t +=1 for each example 26
27 Struct. Perceptron vs Struct. SVM tagging, ATIS (train: 488 sent); SVM < avg perc << perc perceptron epoch 1 updates 102, W =291, train_err 3.90%, dev_err 9.36% avg_err 6.14% epoch 2 updates 91, W =334, train_err 3.33%, dev_err 8.19% avg_err 4.97% epoch 3 updates 78, W =347, train_err 2.92%, dev_err 5.85% avg_err 4.97% epoch 4 updates 81, W =368, train_err 3.11%, dev_err 6.73% avg_err 5.85% epoch 5 updates 78, W =378, train_err 2.70%, dev_err 6.14% avg_err 5.56% epoch 6 updates 63, W =385, train_err 2.26%, dev_err 6.14% avg_err 5.56% epoch 7 updates 69, W =385, train_err 2.43%, dev_err 7.02% avg_err 5.56% epoch 8 updates 60, W =388, train_err 2.15%, dev_err 6.73% avg_err 5.56% epoch 9 updates 59, W =390, train_err 2.04%, dev_err 6.14% avg_err 5.56% epoch 10 updates 64, W =394, train_err 2.15%, dev_err 5.85% avg_err 5.26% SVM C=1 epoch 1 updates 116, W =311, train_err 4.55%, dev_err 5.85% epoch 2 updates 82, W =328, train_err 3.05%, dev_err 4.97% epoch 3 updates 78, W =334, train_err 2.92%, dev_err 5.56% epoch 4 updates 77, W =339, train_err 2.92%, dev_err 5.26% epoch 5 updates 80, W =344, train_err 2.94%, dev_err 5.56% epoch 6 updates 73, W =345, train_err 2.75%, dev_err 4.68% epoch 7 updates 72, W =347, train_err 2.75%, dev_err 4.97% epoch 8 updates 75, W =352, train_err 2.86%, dev_err 4.97% epoch 9 updates 74, W =353, train_err 2.78%, dev_err 4.97% epoch 10 updates 72, W =354, train_err 2.78%, dev_err 4.97% 27
Machine Learning A Geometric Approach
Machine Learning A Geometric Approach Linear Classification: Perceptron Professor Liang Huang some slides from Alex Smola (CMU) Perceptron Frank Rosenblatt deep learning multilayer perceptron linear regression
More informationStructured Prediction with Perceptron: Theory and Algorithms
Structured Prediction with Perceptron: Theor and Algorithms the man bit the dog the man hit the dog DT NN VBD DT NN 那 人咬了狗 =+1 =-1 Kai Zhao Dept. Computer Science Graduate Center, the Cit Universit of
More informationMachine Learning A Geometric Approach
Machine Learning A Geometric Approach CIML book Chap 7.7 Linear Classification: Support Vector Machines (SVM) Professor Liang Huang some slides from Alex Smola (CMU) Linear Separator Ham Spam From Perceptron
More informationMachine Learning for Structured Prediction
Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for
More informationAlgorithms for Predicting Structured Data
1 / 70 Algorithms for Predicting Structured Data Thomas Gärtner / Shankar Vembu Fraunhofer IAIS / UIUC ECML PKDD 2010 Structured Prediction 2 / 70 Predicting multiple outputs with complex internal structure
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationAlgorithms for NLP. Classification II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Classification II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Minimize Training Error? A loss function declares how costly each mistake is E.g. 0 loss for correct label,
More informationMachine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML)
Machine Learning Fall 2017 Kernels (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang (Chap. 12 of CIML) Nonlinear Features x4: -1 x1: +1 x3: +1 x2: -1 Concatenated (combined) features XOR:
More informationLow-Dimensional Discriminative Reranking. Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park
Low-Dimensional Discriminative Reranking Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park Discriminative Reranking Useful for many NLP tasks Enables us to use arbitrary features
More informationLecture 3: Multiclass Classification
Lecture 3: Multiclass Classification Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Some slides are adapted from Vivek Skirmar and Dan Roth CS6501 Lecture 3 1 Announcement v Please enroll in
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationPerceptron (Theory) + Linear Regression
10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron (Theory) Linear Regression Matt Gormley Lecture 6 Feb. 5, 2018 1 Q&A
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationLecture 13: Structured Prediction
Lecture 13: Structured Prediction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501: NLP 1 Quiz 2 v Lectures 9-13 v Lecture 12: before page
More informationIntroduction to Data-Driven Dependency Parsing
Introduction to Data-Driven Dependency Parsing Introductory Course, ESSLLI 2007 Ryan McDonald 1 Joakim Nivre 2 1 Google Inc., New York, USA E-mail: ryanmcd@google.com 2 Uppsala University and Växjö University,
More informationGeneralized Linear Classifiers in NLP
Generalized Linear Classifiers in NLP (or Discriminative Generalized Linear Feature-Based Classifiers) Graduate School of Language Technology, Sweden 2009 Ryan McDonald Google Inc., New York, USA E-mail:
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationSearch-Based Structured Prediction
Search-Based Structured Prediction by Harold C. Daumé III (Utah), John Langford (Yahoo), and Daniel Marcu (USC) Submitted to Machine Learning, 2007 Presented by: Eugene Weinstein, NYU/Courant Institute
More informationPerceptron. Subhransu Maji. CMPSCI 689: Machine Learning. 3 February February 2015
Perceptron Subhransu Maji CMPSCI 689: Machine Learning 3 February 2015 5 February 2015 So far in the class Decision trees Inductive bias: use a combination of small number of features Nearest neighbor
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationNLP Programming Tutorial 11 - The Structured Perceptron
NLP Programming Tutorial 11 - The Structured Perceptron Graham Neubig Nara Institute of Science and Technology (NAIST) 1 Prediction Problems Given x, A book review Oh, man I love this book! This book is
More informationA Support Vector Method for Multivariate Performance Measures
A Support Vector Method for Multivariate Performance Measures Thorsten Joachims Cornell University Department of Computer Science Thanks to Rich Caruana, Alexandru Niculescu-Mizil, Pierre Dupont, Jérôme
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationLecture 13: Discriminative Sequence Models (MEMM and Struct. Perceptron)
Lecture 13: Discriminative Sequence Models (MEMM and Struct. Perceptron) Intro to NLP, CS585, Fall 2014 http://people.cs.umass.edu/~brenocon/inlp2014/ Brendan O Connor (http://brenocon.com) 1 Models for
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x))
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard and Mitch Marcus (and lots original slides by
More informationPersonal Project: Shift-Reduce Dependency Parsing
Personal Project: Shift-Reduce Dependency Parsing 1 Problem Statement The goal of this project is to implement a shift-reduce dependency parser. This entails two subgoals: Inference: We must have a shift-reduce
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More information2.2 Structured Prediction
The hinge loss (also called the margin loss), which is optimized by the SVM, is a ramp function that has slope 1 when yf(x) < 1 and is zero otherwise. Two other loss functions squared loss and exponential
More informationLecture 12: Algorithms for HMMs
Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 26 February 2018 Recap: tagging POS tagging is a sequence labelling task.
More informationLecture 12: Algorithms for HMMs
Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 17 October 2016 updated 9 September 2017 Recap: tagging POS tagging is a
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationLogistic Regression Logistic
Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,
More informationStatistical Methods for NLP
Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationEXPERIMENTS ON PHRASAL CHUNKING IN NLP USING EXPONENTIATED GRADIENT FOR STRUCTURED PREDICTION
EXPERIMENTS ON PHRASAL CHUNKING IN NLP USING EXPONENTIATED GRADIENT FOR STRUCTURED PREDICTION by Porus Patell Bachelor of Engineering, University of Mumbai, 2009 a project report submitted in partial fulfillment
More informationThe Perceptron Algorithm
The Perceptron Algorithm Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Outline The Perceptron Algorithm Perceptron Mistake Bound Variants of Perceptron 2 Where are we? The Perceptron
More informationPredicting Sequences: Structured Perceptron. CS 6355: Structured Prediction
Predicting Sequences: Structured Perceptron CS 6355: Structured Prediction 1 Conditional Random Fields summary An undirected graphical model Decompose the score over the structure into a collection of
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Max-margin learning of GM Eric Xing Lecture 28, Apr 28, 2014 b r a c e Reading: 1 Classical Predictive Models Input and output space: Predictive
More informationDistributed Training Strategies for the Structured Perceptron
Distributed Training Strategies for the Structured Perceptron Ryan McDonald Keith Hall Gideon Mann Google, Inc., New York / Zurich {ryanmcd kbhall gmann}@google.com Abstract Perceptron training is widely
More informationProbabilistic Context Free Grammars. Many slides from Michael Collins
Probabilistic Context Free Grammars Many slides from Michael Collins Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar
More informationMore on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013
More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative
More informationMIRA, SVM, k-nn. Lirong Xia
MIRA, SVM, k-nn Lirong Xia Linear Classifiers (perceptrons) Inputs are feature values Each feature has a weight Sum is the activation activation w If the activation is: Positive: output +1 Negative, output
More informationLinear Classifiers IV
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers IV Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic
More informationFrom Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018
From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction
More informationOnline Learning. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 21. Slides adapted from Mohri
Online Learning Jordan Boyd-Graber University of Colorado Boulder LECTURE 21 Slides adapted from Mohri Jordan Boyd-Graber Boulder Online Learning 1 of 31 Motivation PAC learning: distribution fixed over
More informationMachine Learning Lecture 6 Note
Machine Learning Lecture 6 Note Compiled by Abhi Ashutosh, Daniel Chen, and Yijun Xiao February 16, 2016 1 Pegasos Algorithm The Pegasos Algorithm looks very similar to the Perceptron Algorithm. In fact,
More informationML4NLP Multiclass Classification
ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we
More informationACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging
ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationProbabilistic Graphical Models
Probabilistic Graphical Models David Sontag New York University Lecture 12, April 23, 2013 David Sontag NYU) Graphical Models Lecture 12, April 23, 2013 1 / 24 What notion of best should learning be optimizing?
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationMinibatch and Parallelization for Online Large Margin Structured Learning
Minibatch and Parallelization for Online Large Margin Structured Learning Kai Zhao Computer Science Program, Graduate Center City University of New York kzhao@gc.cuny.edu Liang Huang, Computer Science
More informationA short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie
A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationUndirected Graphical Models for Sequence Analysis
Undirected Graphical Models for Sequence Analysis Fernando Pereira University of Pennsylvania Joint work with John Lafferty, Andrew McCallum, and Fei Sha Motivation: Information Extraction Co-reference
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationCS7267 MACHINE LEARNING
CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning
More informationSequential Supervised Learning
Sequential Supervised Learning Many Application Problems Require Sequential Learning Part-of of-speech Tagging Information Extraction from the Web Text-to to-speech Mapping Part-of of-speech Tagging Given
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationStatistical NLP Spring A Discriminative Approach
Statistical NLP Spring 2008 Lecture 6: Classification Dan Klein UC Berkeley A Discriminative Approach View WSD as a discrimination task (regression, really) P(sense context:jail, context:county, context:feeding,
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationIntroduction to Classification and Sequence Labeling
Introduction to Classification and Sequence Labeling Grzegorz Chrupa la Spoken Language Systems Saarland University Annual IRTG Meeting 2009 Grzegorz Chrupa la (UdS) Classification and Sequence Labeling
More informationSupport Vector Machines. Machine Learning Fall 2017
Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce
More informationCOMS 4771 Introduction to Machine Learning. Nakul Verma
COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative
More informationEE 511 Online Learning Perceptron
Slides adapted from Ali Farhadi, Mari Ostendorf, Pedro Domingos, Carlos Guestrin, and Luke Zettelmoyer, Kevin Jamison EE 511 Online Learning Perceptron Instructor: Hanna Hajishirzi hannaneh@washington.edu
More informationOnline Passive-Aggressive Algorithms
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationSequence Labeling: HMMs & Structured Perceptron
Sequence Labeling: HMMs & Structured Perceptron CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu HMM: Formal Specification Q: a finite set of N states Q = {q 0, q 1, q 2, q 3, } N N Transition
More informationStructured Prediction Theory and Algorithms
Structured Prediction Theory and Algorithms Joint work with Corinna Cortes (Google Research) Vitaly Kuznetsov (Google Research) Scott Yang (Courant Institute) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE
More informationLinear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt)
Linear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt) Nathan Schneider (some slides borrowed from Chris Dyer) ENLP 12 February 2018 23 Outline Words, probabilities Features,
More informationLab 12: Structured Prediction
December 4, 2014 Lecture plan structured perceptron application: confused messages application: dependency parsing structured SVM Class review: from modelization to classification What does learning mean?
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More information6.891: Lecture 24 (December 8th, 2003) Kernel Methods
6.891: Lecture 24 (December 8th, 2003) Kernel Methods Overview ffl Recap: global linear models ffl New representations from old representations ffl computational trick ffl Kernels for NLP structures ffl
More informationCS 188: Artificial Intelligence. Outline
CS 188: Artificial Intelligence Lecture 21: Perceptrons Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. Outline Generative vs. Discriminative Binary Linear Classifiers Perceptron Multi-class
More informationwith Local Dependencies
CS11-747 Neural Networks for NLP Structured Prediction with Local Dependencies Xuezhe Ma (Max) Site https://phontron.com/class/nn4nlp2017/ An Example Structured Prediction Problem: Sequence Labeling Sequence
More informationIN FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning
1 IN4080 2018 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning 2 Logistic regression Lecture 8, 26 Sept Today 3 Recap: HMM-tagging Generative and discriminative classifiers Linear classifiers Logistic
More informationProbabilistic Context-free Grammars
Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John
More informationCSE 546 Midterm Exam, Fall 2014
CSE 546 Midterm Eam, Fall 2014 1. Personal info: Name: UW NetID: Student ID: 2. There should be 14 numbered pages in this eam (including this cover sheet). 3. You can use an material ou brought: an book,
More informationSupport Vector Machines
Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization
More informationNatural Language Processing
Natural Language Processing Global linear models Based on slides from Michael Collins Globally-normalized models Why do we decompose to a sequence of decisions? Can we directly estimate the probability
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationAbstract of Discriminative Methods for Label Sequence Learning by Yasemin Altun, Ph.D., Brown University, May 2005.
Abstract of Discriminative Methods for Label Sequence Learning by Yasemin Altun, Ph.D., Brown University, May 2005. Discriminative learning framework is one of the very successful fields of machine learning.
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationESS2222. Lecture 4 Linear model
ESS2222 Lecture 4 Linear model Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Logistic Regression Predicting Continuous Target Variables Support Vector Machine (Some Details)
More informationMulticlass and Introduction to Structured Prediction
Multiclass and Introduction to Structured Prediction David S. Rosenberg New York University March 27, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 27, 2018 1 / 49 Contents
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationSequential Data Modeling - The Structured Perceptron
Sequential Data Modeling - The Structured Perceptron Graham Neubig Nara Institute of Science and Technology (NAIST) 1 Prediction Problems Given x, predict y 2 Prediction Problems Given x, A book review
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationMulticlass and Introduction to Structured Prediction
Multiclass and Introduction to Structured Prediction David S. Rosenberg Bloomberg ML EDU November 28, 2017 David S. Rosenberg (Bloomberg ML EDU) ML 101 November 28, 2017 1 / 48 Introduction David S. Rosenberg
More informationLinear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.
Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also
More informationAN ABSTRACT OF THE DISSERTATION OF
AN ABSTRACT OF THE DISSERTATION OF Kai Zhao for the degree of Doctor of Philosophy in Computer Science presented on May 30, 2017. Title: Structured Learning with Latent Variables: Theory and Algorithms
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Linear Classification Given labeled data: (xi, feature vector yi) label i=1,..,n where y is 1 or 1, find a hyperplane to separate from Linear Classification
More informationStructured Prediction
Structured Prediction Classification Algorithms Classify objects x X into labels y Y First there was binary: Y = {0, 1} Then multiclass: Y = {1,...,6} The next generation: Structured Labels Structured
More informationGraphical models for part of speech tagging
Indian Institute of Technology, Bombay and Research Division, India Research Lab Graphical models for part of speech tagging Different Models for POS tagging HMM Maximum Entropy Markov Models Conditional
More information