Search-Based Structured Prediction
1 Search-Based Structured Prediction
By Harold C. Daumé III (Utah), John Langford (Yahoo), and Daniel Marcu (USC). Submitted to Machine Learning, 2007.
Presented by: Eugene Weinstein, NYU/Courant Institute, October 2nd, 2007.
2 Structured Prediction Intro
Given: labeled training data $(x_1, y_1), \ldots, (x_m, y_m) \in X \times Y$
Task: learn a mapping from inputs $x \in X$ to outputs $y \in Y$
Special cases:
Binary classification: $Y = \{-1, 1\}$
Multiclass classification: $Y = \{1, \ldots, k\}$
Natural language parsing example: [figure not preserved]
3 Exploiting Structure
Naive approach: treat each possible output in Y as a discrete label and apply multiclass classification. But:
Enumerating all members of Y is often intractable.
It cannot model closeness of examples (changing one node of a tree vs. changing the entire tree).
Approach: try to exploit structure and dependencies within the output space.
Represent closeness of outputs using a loss function.
[Figure: example outputs labelled "small loss" and "big loss"]
4 SP Overview
Discriminative structured prediction papers typically extend multiclass classification or regression techniques.
Most classification schemes use SVM-like max-margin linear classifiers incorporating loss functions: [Taskar, Guestrin, Koller 03], [Tsochantaridis, Hofmann, Joachims, Altun 04], [Sha, Saul 07].
Regression formulation of SP: [Cortes, Mohri, Weston 06].
Searn is a meta-algorithm. Claim: given a multiclass classifier achieving good generalization, Searn does the same for SP.
5 Search-based SP [Daumé 06] [Daumé, Langford, Marcu 07]
Searn: view structured prediction as a search problem.
SP: a distribution D over pairs $(x, c)$ of inputs and output costs, with $|c| = |Y|$.
E.g.: $x_i$ is the input, and $c_y$ is the loss of any output $y$ relative to the true label $y_i$.
Define the loss of a cost-sensitive classifier $h : X \to Y$ as $L(D, h) = E_{(x,c) \sim D} \{ c_{h(x)} \}$.
View outputs as vectors $y = [y^{(1)}, \ldots, y^{(l)}]$, but classification problems are not limited to sequences.
A classifier defines a path through the space of input/output pairs, and the training process iteratively refines the classifier.
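As a minimal sketch of the loss definition above, the expectation over D can be estimated by averaging $c_{h(x)}$ over a sample. The function name `empirical_loss`, the dictionary encoding of the cost vector, and the toy data are all assumptions made for illustration; they do not come from the slides.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# One example is (input x, cost vector c), with c indexed by candidate output y.
Example = Tuple[Hashable, Dict[Hashable, float]]


def empirical_loss(examples: List[Example], h: Callable[[Hashable], Hashable]) -> float:
    """Sample average of c[h(x)], an estimate of L(D, h) = E_{(x,c)~D}[c_{h(x)}]."""
    return sum(c[h(x)] for x, c in examples) / len(examples)


# Hypothetical toy sample: two inputs, three candidate outputs each.
toy = [
    ("x1", {"a": 0.0, "b": 1.0, "c": 2.0}),
    ("x2", {"a": 3.0, "b": 0.0, "c": 1.0}),
]


def always_a(x):
    """A classifier that ignores its input."""
    return "a"


def best_output(x):
    """A classifier that picks the zero-cost output for each toy input."""
    return {"x1": "a", "x2": "b"}[x]


print(empirical_loss(toy, always_a), empirical_loss(toy, best_output))
```

The cost vector plays the role of the label here: a classifier is charged whatever cost the vector assigns to the output it chose.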
6 Searn Specifics
We need to provide:
A cost-sensitive multiclass learning algorithm
An initial classifier
A loss function
The initial classifier should have low training error, but need not generalize well. It could be the best path from any standard search algorithm.
Each Searn iteration finds a classifier that is not as good on the training set, but generalizes a little better.
7 Searn Training
Search state space: (input, partial output): $s = (x, y^{(1)}, \ldots, y^{(l)})$
Initial classifier: pick the next label that minimizes cost, assuming that all future decisions are also optimal:
$h_0(s, c) = \arg\min_{y^{(l+1)}} \min_{y^{(l+2)}, \ldots, y^{(L)}} c_{(y^{(1)}, \ldots, y^{(L)})}$
Iterative step: use the current classifier $h$ to construct a set of examples to train the next classifier; then interpolate.
For each state, try every possible next output.
The cost assigned to each output tried is the loss difference
$\ell_h(c, s, a) = E_{y \sim (s,a,h)} c_y - \min_{a'} E_{y \sim (s,a',h)} c_y$
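The two definitions above can be sketched for sequence labelling under Hamming loss. This is an illustration under stated assumptions, not the authors' code: the classifier is taken to be deterministic, so the expectation collapses to a single rollout, and the names `optimal_policy`, `rollout`, and `loss_differences` are hypothetical.

```python
def hamming(y_pred, y_true):
    """Number of positions where the two label sequences disagree."""
    return sum(p != t for p, t in zip(y_pred, y_true))


def optimal_policy(y_true):
    """h_0 under Hamming loss: the cost-minimising next label is the true one."""
    def h0(x, partial):
        return y_true[len(partial)]
    return h0


def rollout(x, partial, first_action, h, length):
    """Take first_action in state (x, partial), then follow h to the end."""
    y = list(partial) + [first_action]
    while len(y) < length:
        y.append(h(x, y))
    return y


def loss_differences(x, partial, actions, h, y_true):
    """l_h(c, s, a) = c_{y(s,a,h)} - min_{a'} c_{y(s,a',h)} for deterministic h."""
    costs = {
        a: hamming(rollout(x, partial, a, h, len(y_true)), y_true)
        for a in actions
    }
    best = min(costs.values())
    return {a: cost - best for a, cost in costs.items()}


# Hypothetical toy sentence with determiner / noun / verb labels.
x = ["the", "dog", "barks"]
y_true = ["D", "N", "V"]
h0 = optimal_policy(y_true)
print(loss_differences(x, [], ["D", "N", "V"], h0, y_true))
```

Subtracting the minimum means the best available action always has cost zero, even in a state that was reached through earlier mistakes.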
8-12 Searn Training Illustration
$\ell_h(c, s, a) = E_{y \sim (s,a,h)} c_y - \min_{a'} E_{y \sim (s,a',h)} c_y$
[Figure, built up over five slides: a diagram over positions i = 1 to i = 6. Legend: prediction of current classifier $h$; other path being considered $(s, a, h)$; current state $s$; potential next state $a$. Successive slides annotate the paths being considered with $\ell_h = 2$, $\ell_h = 5$, $\ell_h = 1$, and $\ell_h = 0$.]
13 Searn Meta-Algorithm
Input: $(x_1, y_1), \ldots, (x_m, y_m)$, $h_0$, $A$
Initialize: $h \leftarrow h_0$
while $h$ has a significant dependence on $h_0$:
    Initialize set of cost-sensitive examples: $S \leftarrow \emptyset$
    for $i \in 1, \ldots, m$:
        Compute prediction: $(y^{(1)}, \ldots, y^{(L)}) \leftarrow h(x_i)$
        for $l \in 1, \ldots, L$:
            State consists of input and partial output: $s_l \leftarrow (x_i, y^{(1)}, \ldots, y^{(l)})$
            for each next output $a$ after $s_l$:
                $c'_{s_l,a} \leftarrow \ell_h(c, s_l, a)$
            Compute features and add example: $S \leftarrow S \cup \{(f(s_l), c')\}$
    Learn and interpolate: $h' \leftarrow A(S)$; $h \leftarrow \beta h' + (1 - \beta) h$
Return $h$ with $h_0$ removed
The losses are used to build up the training examples for the next iteration.
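The loop above can be sketched end to end on a toy tagging task. Several pieces here are simplifying assumptions rather than anything taken from the slides: the loss is Hamming loss, the learner `table_learner` is a trivial stand-in for the cost-sensitive algorithm A, the feature of a state is just the current word, expectations are approximated by a single sampled rollout, and a fixed iteration count replaces the "significant dependence on $h_0$" test.

```python
import random
from collections import defaultdict


def hamming(y_pred, y_true):
    return sum(p != t for p, t in zip(y_pred, y_true))


class Mixture:
    """Stochastic policy: each call samples one component policy by weight."""

    def __init__(self, components, rng):
        self.components = components  # list of (weight, policy, is_initial)
        self.rng = rng

    def __call__(self, x, partial, y_true=None):
        weights = [w for w, _, _ in self.components]
        _, policy, _ = self.rng.choices(self.components, weights=weights)[0]
        return policy(x, partial, y_true)

    def interpolate(self, new_policy, beta):
        """h <- beta * h' + (1 - beta) * h."""
        scaled = [(w * (1 - beta), p, init) for w, p, init in self.components]
        return Mixture(scaled + [(beta, new_policy, False)], self.rng)

    def without_initial(self):
        """Drop h_0; random.choices renormalises the remaining weights."""
        return Mixture([c for c in self.components if not c[2]], self.rng)


def rollout(x, partial, action, h, y_true=None):
    """Take `action` after `partial`, then follow h to the end of x."""
    y = list(partial) + [action]
    while len(y) < len(x):
        y.append(h(x, y, y_true))
    return y


def predict(h, x, y_true=None):
    return rollout(x, [], h(x, [], y_true), h, y_true)


def table_learner(examples):
    """Stand-in for A: per feature, pick the action with the lowest total cost."""
    totals = defaultdict(lambda: defaultdict(float))
    for feature, costs in examples:
        for action, cost in costs.items():
            totals[feature][action] += cost
    best = {f: min(c, key=c.get) for f, c in totals.items()}
    default = next(iter(best.values()))
    return lambda x, partial, y_true=None: best.get(x[len(partial)], default)


def searn(data, actions, beta=0.5, iterations=5, seed=0):
    rng = random.Random(seed)
    h0 = lambda x, partial, y_true: y_true[len(partial)]  # optimal for Hamming loss
    h = Mixture([(1.0, h0, True)], rng)
    for _ in range(iterations):  # fixed count instead of a dependence test
        examples = []
        for x, y_true in data:
            y_hat = predict(h, x, y_true)
            for l in range(len(x)):
                state = y_hat[:l]
                costs = {a: hamming(rollout(x, state, a, h, y_true), y_true)
                         for a in actions}
                best = min(costs.values())
                examples.append((x[l], {a: c - best for a, c in costs.items()}))
        h = h.interpolate(table_learner(examples), beta)
    return h.without_initial()


# Hypothetical toy data: determiner / noun / verb tagging.
data = [(["the", "dog", "barks"], ["D", "N", "V"]),
        (["a", "cat", "sleeps"], ["D", "N", "V"])]
model = searn(data, ["D", "N", "V"])
print(predict(model, ["a", "dog", "sleeps"]))
```

The initial policy needs the true labels, so it is only usable at training time; that is why the returned mixture drops it and keeps only the learned components.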
14 Algorithm Analysis
$h_i$ is the classifier trained up to the $i$th iteration, and $\ell_{h_i}(h'_i)$ is the loss of $h'_i$ on this iteration's training examples.
$T$ is the maximum length of any output sequence.
Theorem: If $c_{\max} = E_{(x,c) \sim D} \max_y c_y$ and $\ell_{\mathrm{avg}} = \frac{1}{I} \sum_{i=1}^{I} \ell_{h_i}(h'_i)$ (average loss over $I$ iterations), then the total loss with $\beta = 1/T^3$ and $2T^3 \ln T$ iterations is bounded as
$L(D, h_{\mathrm{last}}) \le L(D, h_0) + 2T \ell_{\mathrm{avg}} \log T + (1 + \log T) c_{\max} / T$
The proof analyses the mixture of old and new classifiers.
In practice, $\beta$ can be larger (more aggressive learning).
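To get a feel for the quantities in the theorem, the following sketch evaluates $\beta$, the iteration count, and the bound for given inputs. The function name `searn_bound` and the example numbers are hypothetical, and the code assumes that "log" in the bound means the natural logarithm.

```python
import math


def searn_bound(loss_h0, l_avg, c_max, T):
    """Evaluate beta = 1/T^3, the iteration count 2 T^3 ln T, and the loss bound.

    Assumes natural logarithms throughout.
    """
    beta = 1.0 / T**3
    iterations = 2 * T**3 * math.log(T)
    bound = loss_h0 + 2 * T * l_avg * math.log(T) + (1 + math.log(T)) * c_max / T
    return beta, iterations, bound


# Hypothetical values: sequences of length 10, costs of at most 10.
beta, iterations, bound = searn_bound(loss_h0=0.0, l_avg=0.01, c_max=10.0, T=10)
print(beta, round(iterations), round(bound, 3))
```

With these inputs the theoretical setting asks for thousands of iterations, which illustrates why the slide notes that a larger $\beta$ is used in practice.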
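15 Proof
Lemma 1: For a classifier $h_{\mathrm{new}}$ learned by interpolating $h$ and $h'$ as $h_{\mathrm{new}} \leftarrow \beta h' + (1 - \beta) h$, if $c_{\max} = E_{(x,c) \sim D} \max_y c_y$, we have
$L(D, h_{\mathrm{new}}) \le L(D, h) + T \beta \ell_h^{CS}(h') + \frac{1}{2} \beta^2 T^2 c_{\max}$
Proof: Consider 3 cases: $h'$ is never called ($c = 0$), is called exactly once ($c = 1$), and is called more than once ($c \ge 2$), where $c$ here counts the calls to $h'$.
Then the loss of $h_{\mathrm{new}}$ is bounded as
$L(D, h_{\mathrm{new}}) = \Pr(c = 0) L(D, h_{\mathrm{new}} \mid c = 0) + \Pr(c = 1) L(D, h_{\mathrm{new}} \mid c = 1) + \Pr(c \ge 2) L(D, h_{\mathrm{new}} \mid c \ge 2)$
$\le (1 - \beta)^T L(D, h) + T \beta (1 - \beta)^{T-1} [L(D, h) + \ell_h^{CS}(h')] + [1 - (1 - \beta)^T - T \beta (1 - \beta)^{T-1}] c_{\max}$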
16 Proof Cont'd
$L(D, h_{\mathrm{new}}) \le (1 - \beta)^T L(D, h) + T \beta (1 - \beta)^{T-1} [L(D, h) + \ell_h^{CS}(h')] + [1 - (1 - \beta)^T - T \beta (1 - \beta)^{T-1}] c_{\max}$
$= L(D, h) + T \beta (1 - \beta)^{T-1} \ell_h^{CS}(h') - \left( \sum_{i=2}^{T} (-1)^i (i - 1) \binom{T}{i} \beta^i \right) L(D, h) + [1 - (1 - \beta)^T - T \beta (1 - \beta)^{T-1}] c_{\max}$ [Binomial Expansion]
$\le L(D, h) + T \beta \ell_h^{CS}(h') + [1 - (1 - \beta)^T - T \beta (1 - \beta)^{T-1}] (c_{\max} - L(D, h))$
$\le L(D, h) + T \beta \ell_h^{CS}(h') + [1 - (1 - \beta)^T - T \beta (1 - \beta)^{T-1}] c_{\max}$
$= L(D, h) + T \beta \ell_h^{CS}(h') + \left( \sum_{i=2}^{T} (-1)^i (i - 1) \binom{T}{i} \beta^i \right) c_{\max}$ [Binomial Expansion]
$\le L(D, h) + T \beta \ell_h^{CS}(h') + \frac{1}{2} T^2 \beta^2 c_{\max}$ [Keep first term and $\beta < T/2$]
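The last step of the proof replaces the probability that $h'$ is called more than once by $\frac{1}{2} T^2 \beta^2$. A quick numerical check of that inequality, as a sketch with hypothetical function names (it tests a grid of values and is not a proof):

```python
def prob_called_at_least_twice(T, beta):
    """1 - (1-beta)^T - T*beta*(1-beta)^(T-1): h' is called two or more times."""
    return 1 - (1 - beta) ** T - T * beta * (1 - beta) ** (T - 1)


def quadratic_bound(T, beta):
    """The term (1/2) T^2 beta^2 that multiplies c_max in Lemma 1."""
    return 0.5 * T**2 * beta**2


# Compare the two quantities over a small grid of T and beta.
for T in (2, 5, 10, 50):
    for beta in (0.001, 0.01, 0.1):
        p = prob_called_at_least_twice(T, beta)
        print(T, beta, p <= quadratic_bound(T, beta))
```

The gap between the two sides is small when $T\beta$ is small, which is the regime the theorem uses ($\beta = 1/T^3$).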
17 Proof Cont'd
Lemma 2: After $C/\beta$ iterations of Searn, the loss of the final classifier learned is bounded as
$L(D, h_{\mathrm{last}}) \le L(D, h_0) + C T \ell_{\mathrm{avg}} + c_{\max} \left( \frac{1}{2} C T^2 \beta + T \exp(-C) \right)$
Proof: Invoking Lemma 1 repeatedly, we get
$L(D, h) \le L(D, h_0) + C T \ell_{\mathrm{avg}} + c_{\max} \left( \frac{1}{2} C T^2 \beta \right)$
If we remove the initial (optimal) classifier, we might incur a loss of $c_{\max}$; the probability of failing after $C/\beta$ iterations is
$T (1 - \beta)^{C/\beta} \le T \exp[-C]$
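A short worked substitution shows how Lemma 2 yields the bound stated in the theorem on the Algorithm Analysis slide. The choice $C = 2 \ln T$ is an assumption made here because it reproduces the stated iteration count; the slides do not spell it out, and logarithms are read as natural logarithms.

```latex
\begin{align*}
\text{With } C = 2\ln T,\ \beta = \frac{1}{T^{3}}:\qquad
\frac{C}{\beta} &= 2T^{3}\ln T, \\
C\,T\,\ell_{\mathrm{avg}} &= 2T\,\ell_{\mathrm{avg}}\ln T, \\
c_{\max}\left(\tfrac{1}{2}\,C\,T^{2}\beta + T e^{-C}\right)
  &= c_{\max}\left(\frac{\ln T}{T} + \frac{1}{T}\right)
   = \frac{(1+\ln T)\,c_{\max}}{T}.
\end{align*}
```

Adding the last two lines to $L(D, h_0)$ gives $L(D, h_0) + 2T \ell_{\mathrm{avg}} \ln T + (1 + \ln T) c_{\max} / T$, which matches the theorem term by term.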
18 Experiments
Handwriting recognition [Kassel 95]
Named entity recognition:
El presidente de la [Junta de Extremadura]_ORG, [Juan Carlos Rodríguez Ibarra]_PER, recibirá en la sede de la [Presidencia del Gobierno]_ORG extremeño a familiares de varios de los condenados por el proceso [Lasa-Zabala]_MISC, entre ellos a [Lourdes Díez Urraca]_PER, esposa del ex gobernador civil de [Guipúzcoa]_LOC [Julen Elgorriaga]_PER; y a [Antonio Rodríguez Galindo]_PER, hermano del general [Enrique Rodríguez Galindo]_PER.
Syntactic chunking and part-of-speech (POS) tagging:
[Great American]_NP [said]_VP [it]_NP [increased]_VP [its loan-loss reserves]_NP [by]_PP [$ 93 million]_NP [after]_PP [reviewing]_VP [its loan portfolio]_NP, [raising]_VP [its total loan and real estate reserves]_NP [to]_PP [$ 217 million]_NP.
Token      POS   Chunk
Great      NNP   B-NP
American   NNP   I-NP
said       VBD   B-VP
it         PRP   B-NP
increased  VBD   B-VP
its        PRP$  B-NP
loan-loss  NN    I-NP
reserves   NNS   I-NP
by         IN    B-PP
$          $     B-NP
93         CD    I-NP
million    CD    I-NP
after      IN    B-PP
reviewing  VBG   B-VP
its        PRP$  B-NP
loan       NN    I-NP
portfolio  NN    I-NP
.          .     O
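The per-token B-/I-/O tags and the bracketed chunks above are two views of the same labelling. A minimal sketch of the conversion from tags to chunks follows; the function name `bio_to_chunks` is hypothetical, and treating an I- tag that does not continue the open chunk as the start of a new chunk is a design choice made here, not something the slides specify.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (label, text) chunks from B-/I-/O tags."""
    chunks, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        starts_new = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if starts_new:
            # Close the chunk that was open, if any.
            if current:
                chunks.append((label, " ".join(current)))
            current, label = [], None
        if tag != "O":
            if not current:
                label = tag[2:]
            current.append(tok)
    if current:
        chunks.append((label, " ".join(current)))
    return chunks


tokens = ["Great", "American", "said", "it", "increased",
          "its", "loan-loss", "reserves", "."]
tags = ["B-NP", "I-NP", "B-VP", "B-NP", "B-VP",
        "B-NP", "I-NP", "I-NP", "O"]
print(bio_to_chunks(tokens, tags))
```

Encoding chunks as per-token tags is what lets a chunking problem be treated as a sequence of multiclass decisions, one per token.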
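19 Experiments
[Table: results by algorithm and task; numeric entries not preserved in this transcription.]
Columns: ALGORITHM; Handwriting (Small, Large); NER (Small, Large); Chunk; C+T
CLASSIFICATION rows: Perceptron, Log Reg, SVM-Lin, SVM-Quad
STRUCTURED rows: Str. Perc., CRF, SVM^struct, M^3N-Lin, M^3N-Quad
SEARN rows: Perceptron, Log Reg, SVM-Lin, SVM-Quad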
20 Experiments
New vine-growth model for sentence summarization.
DUC 2005 data set: 50 sets of 25 documents each.
Evaluation: Rouge (n-gram overlap) vs. human summaries.
[Table 2 (Daumé III et al.): Summarization results; values are Rouge 2 scores (higher is better). Column groups: ORACLE (Vine, Extr), SEARN (Vine, Extr), BAYESUM (D05, D03), Base, Best; rows by summary length in words. Numeric entries not preserved in this transcription.]
21 Bibliography
Harold C. Daumé III. Practical structured learning for natural language processing. Ph.D. thesis, University of Southern California, 2006.
Harold C. Daumé III, John Langford, and Daniel Marcu. Search-Based Structured Prediction. Submitted to Machine Learning, 2007.
Robert Kassel. A Comparison of Approaches to On-line Handwritten Character Recognition. Ph.D. thesis, Massachusetts Institute of Technology, Spoken Language Systems Group, 1995.
Koby Crammer and Yoram Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2(5).
Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. Neural Information Processing Systems (NIPS) 16, 2003.
Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun. Support vector machine learning for interdependent and structured output spaces. Proceedings of ICML, 2004.
Fei Sha and Lawrence K. Saul. Large margin hidden Markov models for automatic speech recognition. Neural Information Processing Systems (NIPS) 19, 2007.
William W. Cohen and Vitor Carvalho. Stacked sequential learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
Michael Collins and Brian Roark. Incremental parsing with the perceptron algorithm. In Proceedings of the Conference of the Association for Computational Linguistics (ACL).
Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford, and Bianca Zadrozny. Error limiting reductions between classification tasks. In Proceedings of the International Conference on Machine Learning (ICML), 2005.
More information10/17/04. Today s Main Points
Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points
More informationMACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING
MACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING Outline Some Sample NLP Task [Noah Smith] Structured Prediction For NLP Structured Prediction Methods Conditional Random Fields Structured Perceptron Discussion
More informationNatural Language Processing
Natural Language Processing Global linear models Based on slides from Michael Collins Globally-normalized models Why do we decompose to a sequence of decisions? Can we directly estimate the probability
More informationHidden Markov Models
CS 2750: Machine Learning Hidden Markov Models Prof. Adriana Kovashka University of Pittsburgh March 21, 2016 All slides are from Ray Mooney Motivating Example: Part Of Speech Tagging Annotate each word
More informationProbabilistic Context Free Grammars. Many slides from Michael Collins
Probabilistic Context Free Grammars Many slides from Michael Collins Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar
More informationConditional Random Fields: An Introduction
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science 2-24-2004 Conditional Random Fields: An Introduction Hanna M. Wallach University of Pennsylvania
More informationStructural Learning with Amortized Inference
Structural Learning with Amortized Inference AAAI 15 Kai-Wei Chang, Shyam Upadhyay, Gourab Kundu, and Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign {kchang10,upadhya3,kundu2,danr}@illinois.edu
More informationOn the Generalization Ability of Online Strongly Convex Programming Algorithms
On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract
More informationThe Perceptron. Volker Tresp Summer 2014
The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a
More informationFeature Noising. Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager
Feature Noising Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager Outline Part 0: Some backgrounds Part 1: Dropout as adaptive
More informationEmpirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs
Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21
More informationOn Label Dependence in Multi-Label Classification
Krzysztof Dembczynski 1,2 dembczynski@informatik.uni-marburg.de Willem Waegeman 3 willem.waegeman@ugent.be Weiwei Cheng 1 cheng@informatik.uni-marburg.de Eyke Hüllermeier 1 eyke@informatik.uni-marburg.de
More information