Predicting Sequences: Structured Perceptron. CS 6355: Structured Prediction
Conditional Random Fields summary

- An undirected graphical model
- Decompose the score over the structure into a collection of factors; each factor assigns a score to the assignment of the random variables it is connected to
- Training and prediction: final prediction via argmax_y w^T φ(x, y); train by maximum (regularized) likelihood
- Connections to other models: effectively a linear classifier; a generalization of logistic regression to structures; an instance of a Markov Random Field with some random variables observed (we will see this soon)
Global features

The feature function decomposes over the sequence. For a chain y_0 → y_1 → y_2 → y_3 over input x, the total score is the sum of per-edge scores:

w^T φ(x, y_0, y_1) + w^T φ(x, y_1, y_2) + w^T φ(x, y_2, y_3)
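To make the decomposition concrete, here is a minimal sketch (not from the slides; phi_local, the one-hot feature layout, and the start tag are assumptions) of a global feature map built by summing local transition and emission indicator features:

```python
import numpy as np

def phi_local(x, i, y_prev, y_curr, num_tags, vocab):
    """Hypothetical local feature map at position i: a one-hot transition
    indicator for (y_prev, y_curr) stacked with a one-hot emission
    indicator for (y_curr, x[i]). vocab maps words to indices."""
    f = np.zeros(num_tags * num_tags + num_tags * len(vocab))
    f[y_prev * num_tags + y_curr] = 1.0                                # transition
    f[num_tags * num_tags + y_curr * len(vocab) + vocab[x[i]]] = 1.0   # emission
    return f

def phi_global(x, y, num_tags, vocab, start_tag=0):
    """Global features = sum of local features over the sequence, so
    w . phi_global(x, y) decomposes into a sum of per-edge scores."""
    return sum(phi_local(x, i, y[i - 1] if i > 0 else start_tag, y[i], num_tags, vocab)
               for i in range(len(x)))
```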
Outline

- Sequence models
  - Hidden Markov models: inference with HMMs, learning
  - Conditional models and local classifiers
  - Global models: Conditional Random Fields, Structured Perceptron for sequences
HMM is also a linear classifier

Consider the HMM: its log probability is a sum of transition scores and emission scores,

log P(x, y) = Σ_i log P(y_i | y_{i−1}) + Σ_i log P(x_i | y_i)

Define indicators: I_z = 1 if z is true, else 0. Each sum can then be rewritten over tag pairs and word-tag pairs, weighting each log probability by the number of times it fires in (x, y). Or equivalently,

log P(x, y) = Σ_{tags u, v} count(u → v in y) · log P(v | u) + Σ_{tag v, word a} count(v emits a in (x, y)) · log P(a | v)

This is a linear function: the log P terms are the weights, and the counts and indicators are the features. It can be written as w^T φ(x, y), and we are free to add more features.
HMM is a linear classifier: An example

Consider the tag sequence Det Noun Verb Det Noun for the sentence "The dog ate the homework". Its log probability is a sum of transition scores and emission scores:

log P(x, y) = 2 · log P(Det → Noun) + 1 · log P(Noun → Verb) + 1 · log P(Verb → Det)
            + 1 · log P(The | Det) + 1 · log P(dog | Noun) + 1 · log P(ate | Verb) + 1 · log P(the | Det) + 1 · log P(homework | Noun)

The log P terms are w, the parameters of the model; the counts are φ(x, y), properties of this output and the input. So log P(x, y) is a linear scoring function: w^T φ(x, y).
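A minimal numeric sketch of this dot product (the probabilities below are made up for illustration; like the slide, it leaves out the initial-state term):

```python
import math

# hypothetical parameters w: log-probabilities of transitions and emissions
w = {
    ("Det", "Noun"): math.log(0.5), ("Noun", "Verb"): math.log(0.3),
    ("Verb", "Det"): math.log(0.4),
    ("The", "Det"): math.log(0.6), ("dog", "Noun"): math.log(0.1),
    ("ate", "Verb"): math.log(0.2), ("the", "Det"): math.log(0.3),
    ("homework", "Noun"): math.log(0.05),
}

# features phi(x, y): how many times each transition/emission fires
phi = {("Det", "Noun"): 2, ("Noun", "Verb"): 1, ("Verb", "Det"): 1,
       ("The", "Det"): 1, ("dog", "Noun"): 1, ("ate", "Verb"): 1,
       ("the", "Det"): 1, ("homework", "Noun"): 1}

log_p = sum(w[k] * phi[k] for k in phi)   # log P(x, y) = w^T phi(x, y)
print(log_p)
```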
Towards structured Perceptron

1. The HMM is a linear classifier. Can we treat it like any other linear classifier for training? If so, we could add additional features that capture global properties, as long as the output can still be decomposed for easy inference.
2. The Viterbi algorithm calculates max_y w^T φ(x, y). Viterbi only cares about the scores assigned to structures, which need not be normalized (see the sketch below).
3. So we can push the learning algorithm to train with unnormalized scores. If we need normalization, we can always exponentiate and divide by Z. That is, the learning algorithm can focus directly on the score of y for a particular x: train a discriminative model instead of the generative one!
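A minimal Viterbi sketch for such linear scores (the array layout and the decomposition into first-order transition plus emission terms are assumptions):

```python
import numpy as np

def viterbi(emission, transition, initial):
    """emission[t, k]: score of tag k at position t; transition[j, k]: score of
    tag j -> tag k; initial[k]: score of starting with tag k. Scores are
    arbitrary reals, not probabilities. Returns the argmax tag sequence."""
    T, K = emission.shape
    delta = np.empty((T, K))            # best score of a prefix ending in tag k
    back = np.zeros((T, K), dtype=int)  # backpointers
    delta[0] = initial + emission[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + transition + emission[t][None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    y = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):       # follow backpointers
        y.append(int(back[t, y[-1]]))
    return y[::-1]
```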
Structured Perceptron algorithm

Given a training set D = {(x, y)}:

1. Initialize w = 0 ∈ ℝ^n
2. For epoch = 1 … T:
   For each training example (x, y) ∈ D:
     Predict y' = argmax_{y'} w^T φ(x, y')
     If y' ≠ y, update w ← w + learning_rate · (φ(x, y) − φ(x, y'))
3. Return w

Prediction: argmax_y w^T φ(x, y)

Notes: T is a hyperparameter of the algorithm. In practice, it is good to shuffle D before the inner loop. Inference happens inside the training loop! The update happens only on an error: the Perceptron is a mistake-driven algorithm; if there is a mistake, promote y and demote y'.
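A minimal sketch of this loop (phi, predict, and n_features are assumed helpers: phi is the global feature map and predict runs argmax inference, e.g. the Viterbi sketch above):

```python
import random
import numpy as np

def structured_perceptron(D, phi, predict, n_features, T=10, lr=1.0):
    """D: list of (x, y) pairs; phi(x, y) -> feature vector of length n_features;
    predict(w, x) -> argmax_y w . phi(x, y)."""
    w = np.zeros(n_features)                 # 1. initialize w = 0
    for epoch in range(T):                   # 2. T epochs (a hyperparameter)
        random.shuffle(D)                    # shuffle D before the inner loop
        for x, y in D:
            y_hat = predict(w, x)            # inference inside the training loop!
            if y_hat != y:                   # update only on a mistake:
                w += lr * (phi(x, y) - phi(x, y_hat))  # promote y, demote y_hat
    return w                                 # 3. return w
```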
Notes on structured perceptron

- Mistake bound for separable data, just like the perceptron
- In practice, use averaging for better generalization: initialize a = 0; after each step, whether there is an update or not, set a ← a + w; note that we still check for mistakes using w, not a; return a at the end instead of w. (Exercise: optimize this for performance by modifying a only on errors.)
- Global update: one weight vector for the entire sequence, not one per position
- The same algorithm can be derived via constraint classification: create a binary classification data set and run the perceptron
Structured Perceptron with averaging

Given a training set D = {(x, y)}:

1. Initialize w = 0 ∈ ℝ^n, a = 0 ∈ ℝ^n
2. For epoch = 1 … T:
   For each training example (x, y) ∈ D:
     Predict y' = argmax_{y'} w^T φ(x, y')
     If y' ≠ y, update w ← w + learning_rate · (φ(x, y) − φ(x, y'))
     Set a ← a + w
3. Return a
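The averaged variant as a sketch (same assumed helpers as above); a accumulates w after every example, whether or not there was an update, and is returned in place of w:

```python
import random
import numpy as np

def averaged_structured_perceptron(D, phi, predict, n_features, T=10, lr=1.0):
    w = np.zeros(n_features)
    a = np.zeros(n_features)
    for epoch in range(T):
        random.shuffle(D)
        for x, y in D:
            y_hat = predict(w, x)            # mistakes are checked with w, not a
            if y_hat != y:
                w += lr * (phi(x, y) - phi(x, y_hat))
            a += w                           # accumulate after every step
    return a                                 # the (unnormalized) average
```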
CRF vs. structured perceptron

Stochastic gradient descent update for the CRF, for a training example (x_i, y_i):

w ← w + learning_rate · (φ(x_i, y_i) − E_{y ~ P(y | x_i; w)}[φ(x_i, y)])

Structured perceptron, for a training example (x_i, y_i):

w ← w + learning_rate · (φ(x_i, y_i) − φ(x_i, argmax_y w^T φ(x_i, y)))

Expectation vs. max. Caveat: adding regularization changes the CRF update, and averaging changes the perceptron update.
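A minimal sketch of the two updates side by side, brute-forcing the expectation over a small candidate set Y (an assumption for illustration; a real CRF computes the expectation with forward-backward instead):

```python
import numpy as np

def crf_sgd_update(w, x, y_gold, Y, phi, lr=0.1):
    feats = np.stack([phi(x, y) for y in Y])     # features of every candidate
    scores = feats @ w
    p = np.exp(scores - scores.max())
    p /= p.sum()                                 # P(y | x) over the candidates
    expected = p @ feats                         # E_{y ~ P(y|x)}[phi(x, y)]
    return w + lr * (phi(x, y_gold) - expected)

def perceptron_update(w, x, y_gold, Y, phi, lr=0.1):
    y_hat = max(Y, key=lambda y: phi(x, y) @ w)  # argmax instead of expectation
    return w + lr * (phi(x, y_gold) - phi(x, y_hat))
```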
The lay of the land

HMM: a generative model that assigns probabilities to sequences. From here, two roads diverge.

Road one: Hidden Markov Models are actually just linear classifiers. We don't really care whether we are predicting probabilities; we are assigning scores to a full output for a given input (as in multiclass classification). Generalize the algorithms for linear classifiers into sophisticated models that can use arbitrary features: the Structured Perceptron and the Structured SVM (coming soon).

Road two, discriminative/conditional models: model probabilities via exponential functions, which gives the log-linear representation of log-probabilities for sequences given an input. Learn by maximizing likelihood, again with sophisticated models that can use arbitrary features: the Conditional Random Field.

All of these are applicable beyond sequences. Eventually, they minimize similar objectives with different loss functions.
Sequence models: Summary

- Goal: predict an output sequence given an input sequence
- Hidden Markov Model: inference and prediction via the Viterbi algorithm
- Conditional models/discriminative models:
  - Local approaches (no inference during training): MEMM, conditional Markov model
  - Global approaches (inference during training): CRF, structured perceptron
- Prediction is not always tractable for general structures; the same dichotomy holds for more general structures
- To think about: What are the parts in a sequence model? How does each model score these parts?