CS395T: Structured Models for NLP Lecture 2: Machine Learning Review
|
|
- Clarence Harris
- 5 years ago
- Views:
Transcription
1 CS395T: Structured Models for NLP Lecture 2: Machine Learning Review Greg Durrett Some slides adapted from Vivek Srikumar, University of Utah
2 Administrivia Course enrollment Lecture slides posted on website
3 This Lecture Linear classificalon fundamentals Naive Bayes, maximum likelihood in generalve models Three discriminalve models: logislc regression, perceptron, SVM Different molvalons but very similar update rules / inference!
4 ClassificaLon Datapoint x with label y 2{0, 1} Embed datapoint in a feature space f(x) 2 R n but in this lecture f(x) and x are interchangeable Linear decision rule: w > f(x)+b>0 w > f(x) > 0 Can delete bias if we augment feature space: f(x) = [0.5, 1.6, 0.3] [0.5, 1.6, 0.3, 1]
5 Linear funclons are powerful! x x x
6 Linear funclons are powerful! x 2 x 1 x ??? x 1 x f(x) = [x 1, x 2 ] f(x) = [x 1, x 2, x 12, x 22, x 1 x 2 ] Kernel trick does this for free, but is too expensive to use in NLP applicalons, training is O(n 2 ) instead of O(n (num feats))
7 ClassificaLon: SenLment Analysis this movie was great! would watch again PosiLve this movie was not really very enjoyable NegaLve Doing well at this is going to require structure, but let s start with simple approaches
8 Text ClassificaLon: Ham or Spam Hi, I just wanted to send over the latest results from training the LSTM model. In the aeachment. What do you think of the performance? hi i have very valuable business proposilon for you. you make lots of $$$ I just need you to send a small amount of funds Ham Spam Surface cues can basically tell you what s going on here Machine learning is good at this! Lots of data, simple paeern recognilon task, hard to write rules by hand
9 Text ClassificaLon: Ham or Spam Hi, I just wanted to send over the latest results from training the LSTM model. In the aeachment. What do you think of the performance? hi i have very valuable business proposilon for you. you make lots of $$$ I just need you to send a small amount of funds Ham Spam??? Why do we think this? CondiLonal probabililes (chance of spam given $$$ is high)
10 Text ClassificaLon: Ham or Spam Hi, I just wanted to send over the latest results from training the LSTM model. In the aeachment. What do you think of the performance? hi i have very valuable business proposilon for you. you make lots of $$$ I just need you to send a small amount of funds Ham Spam??? Feature representalon: Indicator[doc contains $$$], Indicator[doc contains training], Indicator[doc contains send] Convert a document to a vector: [1, 0, 1, ] (~50,000 long) Requires indexing the features (mapping them to axes) Very high dimensional space! How do we learn feature weights?
11 Naive Bayes Data point x =(x 1,...,x n ), label y 2{0, 1} Formulate a probabilislc model that places a distribulon P (x, y) Compute P (y x) and then label an example with argmax y P (y x) P (y x) = P (y)p (x y) Bayes Rule y P (x) constant: irrelevant / P (y)p (x y) for finding the max x i ny Naive assumplon: n = P (y) i=1 P (x i y) linear model! nx # argmax y P (y x) = argmax y log P (y x) = argmax y "log P (y)+ log P (x i y) i=1
12 Text ClassificaLon: Ham or Spam Hi, I just wanted to send over the latest results from training the LSTM model. In the aeachment. What do you think of the performance? hi i have very valuable business proposilon for you. you make lots of $$$ I just need you to send a small amount of funds Ham Spam??? nx # argmax y log P (y x) = argmax y "log P (y)+ log P (x i y) i=1 P (x funds =0 spam) = 0.9 P (x funds =1 spam) = 0.1 P (x funds =0 ham) = 0.99 P (x funds =1 ham) = 0.01 spam gets more points in the final posterior Note that this is not P (y x) not the probability of ham given the word
13 Maximum Likelihood EsLmaLon Data points (x j,y j ) provided (j indexes over examples) Find values of my P (y), P(x i y) that maximize data likelihood (generalve): " # my Y n P (y j,x j )= P (y j ) P (x ji y j ) j=1 j=1 i=1 data points (j) features (i) ith feature of jth example Equivalent to maximizing logarithm of data likelihood: mx mx " nx # log P (y j,x j )= log P (y j )+ log P (x ji y j ) j=1 j=1 i=1
14 Maximum Likelihood EsLmaLon Imagine a coin flip which is heads with probability p Observe (H, H, H, T) and maximize log likelihood likelihood mx log P (y j ) = 3 log p + log(1 p) p j=1 Maximum likelihood parameters for mullnomial = read counts off of the data
15 Maximum Likelihood for Naive Bayes Hi, I just wanted to send over the latest results from training the LSTM model. In the aeachment. What do you think of the performance? hi i have very valuable business proposilon for you. you make lots of $$$ I just need you to send a small amount of funds Ham Spam??? P (y = ham) = 0.5 P (x funds =1 spam) = 1 P (x funds =0 spam) = 0 Smoothing: add very small counts for each entry to avoid zeroes (bias-variance tradeoff) P (x funds =1 spam) = 0.99 P (x funds =0 spam) = 0.01
16 Naive Bayes: Summary Model y ny P (x, y) =P (y) P (x i y) x i i=1 n Inference nx # argmax y log P (y x) = argmax y "log P (y)+ log P (x i y) i=1 AlternaLvely: log P (y = spam x) log P (y = ham x) > 0, log P (y = spam x) P (y = ham x) + n X i=1 log P (x i y = spam) P (x i y = ham) > 0 Learning: maximize P (x, y) by reading counts off the data
17 Problems with Naive Bayes Features are correlated P (x funds =1 spam) = 0.1 P (x funds =1 ham) = 0.01 P (x transfer =1 spam) = 0.1 P (x transfer =1 ham) = 0.01 Hi, in order to close on the house we need you to transfer the requested funds to the escrow account. Ham Spam??? This one sentence will make the probability of spam very high! Ham Bad independence assumplon in NB: these words are not independent! SoluLon: beeer model, algorithms that explicitly minimize loss rather than maximizing data likelihood
18 GeneraLve vs. DiscriminaLve Models GeneraLve models: P (x, y) Bayes nets / graphical models Some of the model capacity goes to explaining the distribulon of x; prediclon uses Bayes rule post-hoc Can sample new instances (x, y) DiscriminaLve models: P (y x) SVMs, logislc regression, CRFs, most neural networks Model is trained to be good at prediclon, but doesn t model x We ll come back to this dislnclon throughout this class Break!
19 LogisLc Regression P (y = spam x) = logistic(w > x) P (y = spam x) = exp (P n i=1 w ix i ) 1+exp( P n i=1 w ix i ) How to set the weights w? (StochasLc) gradient ascent to maximize log likelihood L(x j,y j = spam) = log P (y j = spam x j ) nx = w i x ji log 1+exp nx w i x ji!! i=1 i=1
20 LogisLc Regression L(x j,y j = spam) = log P (y j x j )= nx w i x ji log 1+exp nx w i x ji!! i=1 j,y j i = i log 1+exp = x ji 1 1+exp( P n i=1 w ix ji ) i w i x ji!! 1+exp = x ji 1 1+exp( P n i=1 w ix ji ) x ji exp nx i=1 nx i=1 w i x ji!! w i x ji! deriv of log deriv of exp = x ji x ji exp ( P n i=1 w ix ji ) 1+exp( P n i=1 w ix ji ) = x ji (1 P (y j = spam x j ))
21 LogisLc Regression Gradient of w i on posilve example = x ji (1 P (y j = spam x j )) If P(spam) is close to 1, make very liele update Otherwise make w i look more like x ji, which will increase P(spam) Gradient of wi on negalve example = x ji ( P (y j = spam x j )) If P(spam) is close to 0, make very liele update Otherwise make w i look less like x ji, which will decrease P(spam) Final gradient: x j (y j P (y j =1 x j ))
22 RegularizaLon Can end up making extreme updates to fit the training data w funds = w transfer = -900 w send = +742 w the = +203 All examples have P(correct) > 0.999, but classifier does crazy things on new examples
23 RegularizaLon Can end up making extreme updates to fit the training data Rather than oplmizing likelihood alone, impose a penalty on the norm of the weight vector (can also view as a Gaussian prior) w 2 `1 Maximize mx `2 L(x j,y j ) kwk 2 2 j=1 w 1
24 LogisLc Regression: Summary Model P (y = spam x) = exp (P n i=1 w ix i ) 1+exp( P n i=1 w ix i ) Inference argmax y P (y x) similar to Naive Bayes, but different model/learning P (y =1 x) 0.5, w > x 0 Learning: gradient ascent on the (regularized) discriminalve loglikelihood
25 Perceptron Simple error-driven learning approach similar to logislc regression Decision rule: w > f(x) > 0 LogisLc Regression If incorrect: if posilve, w w + x w w + x(1 P (y =1 x)) if negalve, w w x w w xp (y =1 x) Guaranteed to eventually separate the data if the data are separable, but does it learn a good boundary?
26 Support Vector Machines Many separalng hyperplanes is there a best one?
27 Support Vector Machines Many separalng hyperplanes is there a best one? margin
28 Support Vector Machines Constraint formulalon: find w via following quadralc program: Minimize kwk 2 2 minimizing norm <=> s.t. 8j w > x j 1ify j =1 maximizing margin w > x j apple 1ify j =0 As a single constraint: 8j y j (w > x j )+(1 y j )( w > x j ) 1 8j (2y j 1)(w > x j ) 1 What s wrong with this quadralc program for real data? Data is generally non-separable need slack!
29 N-Slack SVMs Minimize kwk mx j=1 j s.t. 8j (2y j 1)(w > x j ) 1 j 8j j 0 The j are a fudge factor to make all constraints salsfied (Sub-)gradient descent: focus on second part i j =0if i j =(2y j 1)x ji if j > 0 = x ji if y j =1, x ji if y j =0 Looks like the perceptron! But updates more frequently
30 Loss FuncLons Hinge (SVM) 0-1 (ideal) LogisLc Perceptron
31 OpLmizaLon next Lme Haven t talked about oplmizalon at all Range of techniques from simple gradient descent (works preey well) to more complex methods (can work beeer)
32 SenLment Analysis Classify sentence as posilve or negalve senlment PosiLve this movie was great! would watch again NegaLve the movie was gross and overwrought, but I liked it this movie was not really very enjoyable Bag-of-words doesn t seem sufficient (discourse structure, negalon) There are some ways around this: extract bigram feature for not X for all X following the not Bo Pang, Lillian Lee, Shivakumar Vaithyanathan (2002)
33 SenLment Analysis Simple feature sets can do preey well! Bo Pang, Lillian Lee, Shivakumar Vaithyanathan (2002)
34 SenLment Analysis Wang and Manning (2012) Naive Bayes is doing well! Ng and Jordan (2002) NB can be beeer for small data Before neural nets had taken off results weren t that great Two years later Kim (2014) with neural networks
35 Recap LogisLc regression: P (y =1 x) = exp ( P n i=1 w ix i ) (1 + exp ( P n i=1 w ix i )) Decision rule: P (y =1 x) 0.5, w > x 0 Gradient (unregularized): x(y P (y =1 x)) SVM: Decision rule: w > x 0 (Sub)gradient (unregularized): 0 if correct with margin of 1, else x(2y 1)
36 Recap LogisLc regression, SVM, and perceptron are closely related SVM and perceptron inference require taking maxes, logislc regression has a similar update but is soyer due to its probabilislc nature All gradient updates: make it look more like the right thing and less like the wrong thing
CS395T: Structured Models for NLP Lecture 5: Sequence Models II
CS395T: Structured Models for NLP Lecture 5: Sequence Models II Greg Durrett Some slides adapted from Dan Klein, UC Berkeley Recall: HMMs Input x =(x 1,...,x n ) Output y =(y 1,...,y n ) y 1 y 2 y n P
More informationLinear Regression. Machine Learning Seyoung Kim. Many of these slides are derived from Tom Mitchell. Thanks!
Linear Regression Machine Learning 10-601 Seyoung Kim Many of these slides are derived from Tom Mitchell. Thanks! Regression So far, we ve been interested in learning P(Y X) where Y has discrete values
More informationSlides modified from: PATTERN RECOGNITION CHRISTOPHER M. BISHOP. and: Computer vision: models, learning and inference Simon J.D.
Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP and: Computer vision: models, learning and inference. 2011 Simon J.D. Prince ClassificaLon Example: Gender ClassificaLon
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 23: Perceptrons 11/20/2008 Dan Klein UC Berkeley 1 General Naïve Bayes A general naive Bayes model: C E 1 E 2 E n We only specify how each feature depends
More informationGeneral Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Overfitting. Example: OCR. Example: Spam Filtering. Example: Spam Filtering
CS 188: Artificial Intelligence Fall 2008 General Naïve Bayes A general naive Bayes model: C Lecture 23: Perceptrons 11/20/2008 E 1 E 2 E n Dan Klein UC Berkeley We only specify how each feature depends
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 4: Curse of Dimensionality, High Dimensional Feature Spaces, Linear Classifiers, Linear Regression, Python, and Jupyter Notebooks Peter Belhumeur Computer Science
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9 Slides adapted from Jordan Boyd-Graber Machine Learning: Chenhao Tan Boulder 1 of 39 Recap Supervised learning Previously: KNN, naïve
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationSupport Vector Machines
Support Vector Machines Jordan Boyd-Graber University of Colorado Boulder LECTURE 7 Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Jordan Boyd-Graber Boulder Support Vector Machines 1 of
More informationStatistical Methods for NLP
Statistical Methods for NLP Text Categorization, Support Vector Machines Sameer Maskey Announcement Reading Assignments Will be posted online tonight Homework 1 Assigned and available from the course website
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationAlgorithms for NLP. Classification II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Classification II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Minimize Training Error? A loss function declares how costly each mistake is E.g. 0 loss for correct label,
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationLast Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression
CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationAbout this class. Maximizing the Margin. Maximum margin classifiers. Picture of large and small margin hyperplanes
About this class Maximum margin classifiers SVMs: geometric derivation of the primal problem Statement of the dual problem The kernel trick SVMs as the solution to a regularization problem Maximizing the
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationIntroduction to Machine Learning
Introduction to Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland SUPPORT VECTOR MACHINES Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Machine Learning: Jordan
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationCSE546: SVMs, Dual Formula5on, and Kernels Winter 2012
CSE546: SVMs, Dual Formula5on, and Kernels Winter 2012 Luke ZeClemoyer Slides adapted from Carlos Guestrin Linear classifiers Which line is becer? w. = j w (j) x (j) Data Example i Pick the one with the
More informationLinear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt)
Linear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt) Nathan Schneider (some slides borrowed from Chris Dyer) ENLP 12 February 2018 23 Outline Words, probabilities Features,
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More information10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers
Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationGenerative Learning algorithms
CS9 Lecture notes Andrew Ng Part IV Generative Learning algorithms So far, we ve mainly been talking about learning algorithms that model p(y x; θ), the conditional distribution of y given x. For instance,
More informationCOMP 562: Introduction to Machine Learning
COMP 562: Introduction to Machine Learning Lecture 20 : Support Vector Machines, Kernels Mahmoud Mostapha 1 Department of Computer Science University of North Carolina at Chapel Hill mahmoudm@cs.unc.edu
More informationDay 4: Classification, support vector machines
Day 4: Classification, support vector machines Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 21 June 2018 Topics so far
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationStatistical NLP Spring A Discriminative Approach
Statistical NLP Spring 2008 Lecture 6: Classification Dan Klein UC Berkeley A Discriminative Approach View WSD as a discrimination task (regression, really) P(sense context:jail, context:county, context:feeding,
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationCS388: Natural Language Processing Lecture 9: CNNs, Neural CRFs. CNNs. Administrivia. This Lecture. Greg Durrett. Project 1 due today at 5pm
CS388: Natural Language Processing Lecture 9: CNNs, Neural CRFs Project 1 due today at 5pm Administrivia Mini 2 out tonight, due in two weeks Greg Durrett This Lecture CNNs CNNs for SenLment Neural CRFs
More informationCOMP 875 Announcements
Announcements Tentative presentation order is out Announcements Tentative presentation order is out Remember: Monday before the week of the presentation you must send me the final paper list (for posting
More informationMachine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.
10-601 Machine Learning, Midterm Exam: Spring 2008 Please put your name on this cover sheet If you need more room to work out your answer to a question, use the back of the page and clearly mark on the
More informationCS 188: Artificial Intelligence Spring Today
CS 188: Artificial Intelligence Spring 2006 Lecture 9: Naïve Bayes 2/14/2006 Dan Klein UC Berkeley Many slides from either Stuart Russell or Andrew Moore Bayes rule Today Expectations and utilities Naïve
More informationCase Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More informationFrom Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018
From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction
More informationMachine Learning. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 8 May 2012
Machine Learning Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421 Introduction to Artificial Intelligence 8 May 2012 g 1 Many slides courtesy of Dan Klein, Stuart Russell, or Andrew
More informationCS325 Artificial Intelligence Chs. 18 & 4 Supervised Machine Learning (cont)
CS325 Artificial Intelligence Cengiz Spring 2013 Model Complexity in Learning f(x) x Model Complexity in Learning f(x) x Let s start with the linear case... Linear Regression Linear Regression price =
More informationPart of the slides are adapted from Ziko Kolter
Part of the slides are adapted from Ziko Kolter OUTLINE 1 Supervised learning: classification........................................................ 2 2 Non-linear regression/classification, overfitting,
More informationMachine Learning Gaussian Naïve Bayes Big Picture
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Naïve Bayes Matt Gormley Lecture 18 Oct. 31, 2018 1 Reminders Homework 6: PAC Learning
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationIntroduction to Machine Learning. Introduction to ML - TAU 2016/7 1
Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationIntroduction to SVM and RVM
Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationLogistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor
Logistic Regression Some slides adapted from Dan Jurfasky and Brendan O Connor Naïve Bayes Recap Bag of words (order independent) Features are assumed independent given class P (x 1,...,x n c) =P (x 1
More informationBANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1
BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationSupport Vector Machines and Bayes Regression
Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering
More informationSupport Vector Machines
Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization
More informationStatistical Methods for Data Mining
Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationTopics we covered. Machine Learning. Statistics. Optimization. Systems! Basics of probability Tail bounds Density Estimation Exponential Families
Midterm Review Topics we covered Machine Learning Optimization Basics of optimization Convexity Unconstrained: GD, SGD Constrained: Lagrange, KKT Duality Linear Methods Perceptrons Support Vector Machines
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationPolynomial Regression and Regularization
Polynomial Regression and Regularization Administrivia o If you still haven t enrolled in Moodle yet Enrollment key on Piazza If joined course recently, email me to get added to Piazza o Homework 1 posted
More information