Online Learning. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 21. Slides adapted from Mohri
Motivation

- PAC learning: distribution fixed over time (training and test), IID assumption.
- Online learning: no distributional assumption; worst-case (adversarial) analysis; training and test are mixed.
- Performance measures: mistake model, regret.
General Online Setting

For t = 1 to T:
- Get instance x_t ∈ X
- Predict ŷ_t ∈ Y
- Get true label y_t ∈ Y
- Incur loss L(ŷ_t, y_t)

Classification: Y = {0, 1}, L(y', y) = |y' − y|
Regression: Y ⊆ R, L(y', y) = (y' − y)²

Objective: minimize total loss Σ_t L(ŷ_t, y_t)
Plan

- Experts
- Perceptron Algorithm
- Online Perceptron for Structure Learning
Prediction with Expert Advice

For t = 1 to T:
- Get instance x_t ∈ X and expert advice a_{t,i} ∈ Y for i ∈ [1, N]
- Predict ŷ_t ∈ Y
- Get true label y_t ∈ Y
- Incur loss L(ŷ_t, y_t)

Objective: minimize regret, i.e., the difference between our total loss and that of the best expert:

Regret(T) = Σ_t L(ŷ_t, y_t) − min_i Σ_t L(a_{t,i}, y_t)    (1)
Mistake Bound Model

Define the maximum number of mistakes a learning algorithm L makes to learn a concept c over any sequence of examples (until its predictions are perfect):

M_L(c) = max_{x_1,...,x_T} |mistakes(L, c)|    (2)

For a concept class C, this is the max over concepts c:

M_L(C) = max_{c ∈ C} M_L(c)    (3)

In the expert advice case, this assumes some expert matches the concept (the realizable setting).
Halving Algorithm

H_1 ← H
for t ← 1 ... T:
    Receive x_t
    ŷ_t ← Majority(H_t, x_t)
    Receive y_t
    if ŷ_t ≠ y_t:
        H_{t+1} ← {h ∈ H_t : h(x_t) = y_t}
return H_{T+1}

Algorithm 1: The Halving Algorithm (Mitchell, 1997)
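The pseudocode above can be sketched in a few lines of Python. This is a minimal illustration: the threshold hypothesis class and the data stream below are invented for the example, not from the slides.

```python
def halving(hypotheses, stream):
    """Run the Halving algorithm; return surviving hypotheses and mistakes.

    hypotheses: finite list of callables h(x) -> label in {0, 1}
    stream: iterable of (x, y) pairs (realizable: some h is always right)
    """
    H = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = [h(x) for h in H]
        # Majority vote of the current version space (ties broken toward 1).
        y_hat = 1 if votes.count(1) >= votes.count(0) else 0
        if y_hat != y:
            mistakes += 1
            # On a mistake, keep only consistent hypotheses; this removes
            # at least half of H, which is what gives the lg|H| bound.
            H = [h for h in H if h(x) == y]
    return H, mistakes

# Hypothetical class: threshold functions on {0,...,7}; target threshold is 5.
hyps = [lambda x, t=t: 1 if x >= t else 0 for t in range(8)]
stream = [(x, 1 if x >= 5 else 0) for x in [3, 6, 4, 5, 0, 7]]
survivors, m = halving(hyps, stream)
```

With |H| = 8 the bound guarantees at most lg 8 = 3 mistakes on any realizable stream, and this particular run makes only one.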
Halving Algorithm Bound (Littlestone, 1988)

For a finite hypothesis set H:

M_Halving(H) ≤ lg |H|    (4)

After each mistake, the hypothesis set is reduced by at least half.

Consider the optimal mistake bound opt(H). Then:

VC(H) ≤ opt(H) ≤ M_Halving(H) ≤ lg |H|    (5)

For a fully shattered set, an adversary can form a binary tree of mistakes with height VC(H).

What about the non-realizable case?
Weighted Majority (Littlestone and Warmuth, 1994)

for i ← 1 ... N: w_{1,i} ← 1
for t ← 1 ... T:
    Receive x_t
    ŷ_t ← 1[ Σ_{i: a_{t,i}=1} w_{t,i} ≥ Σ_{i: a_{t,i}=0} w_{t,i} ]
    Receive y_t
    if ŷ_t ≠ y_t:
        for i ← 1 ... N:
            if a_{t,i} ≠ y_t: w_{t+1,i} ← β w_{t,i}
            else: w_{t+1,i} ← w_{t,i}
return w_{T+1}

- Weights for every expert
- Classify in favor of the side with the higher total weight (y ∈ {0, 1})
- Experts that are wrong get their weights decreased (β ∈ [0, 1])
- Experts that are right stay unchanged
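The slide's version of Weighted Majority (weights touched only on rounds where the algorithm itself errs) can be sketched directly; the three-expert run at the bottom is invented for illustration.

```python
def weighted_majority(advice_rounds, labels, beta=0.5):
    """Weighted Majority as on the slide: weights are updated only on rounds
    where the algorithm errs; experts wrong that round are scaled by beta.

    advice_rounds[t][i]: expert i's {0,1} prediction at round t.
    """
    n = len(advice_rounds[0])
    w = [1.0] * n
    mistakes = 0
    for advice, y in zip(advice_rounds, labels):
        # Predict with the side holding the larger total weight (ties -> 1).
        w1 = sum(wi for wi, a in zip(w, advice) if a == 1)
        w0 = sum(w) - w1
        y_hat = 1 if w1 >= w0 else 0
        if y_hat != y:
            mistakes += 1
            w = [wi * beta if a != y else wi for wi, a in zip(w, advice)]
    return mistakes, w

# Hypothetical run: expert 0 is always right, experts 1 and 2 always wrong.
labels = [1, 1, 0]
advice = [[y, 1 - y, 1 - y] for y in labels]
m, w = weighted_majority(advice, labels, beta=0.5)
```

The always-correct expert keeps weight 1 while the two wrong experts shrink geometrically, so the algorithm quickly starts siding with the good expert.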
Weighted Majority Bound

Let m_T be the number of mistakes made by WM through time T, and m*_T the number made by the best expert. Then:

m_T ≤ (log N + m*_T log(1/β)) / log(2/(1+β))    (6)

- Thus the mistake bound is O(log N) plus a constant times the best expert's mistakes
- The Halving algorithm is the special case β = 0
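A quick numeric sanity check of the bound (using natural logs, which cancel in the ratio): with a perfect expert (m*_T = 0) and β → 0, it recovers the Halving-style lg N guarantee.

```python
import math

def wm_bound(n_experts, best_mistakes, beta):
    """Right-hand side of the Weighted Majority mistake bound."""
    return ((math.log(n_experts) + best_mistakes * math.log(1 / beta))
            / math.log(2 / (1 + beta)))

# With 16 experts, a perfect expert, and beta near 0: bound -> lg 16 = 4.
val = wm_bound(n_experts=16, best_mistakes=0, beta=1e-9)
```

The bound also grows with the best expert's mistakes, as expected in the non-realizable case.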
Proof: Potential Function (Lower Bound)

The potential function is the sum of all weights:

Φ_t ≡ Σ_i w_{t,i}    (7)

We'll create a sandwich of upper and lower bounds. For any expert i, we have the lower bound:

Φ_t ≥ w_{t,i} = β^{m_{t,i}}    (8)

- Weights are nonnegative, so Σ_i w_{t,i} ≥ w_{t,i}
- Each error multiplicatively reduces expert i's weight by β
Proof: Potential Function (Upper Bound)

If the algorithm makes an error at round t:

Φ_{t+1} ≤ Φ_t/2 + β Φ_t/2 = [(1+β)/2] Φ_t    (9)

- At most half of the experts by weight were right
- At least half of the experts by weight were wrong (and get multiplied by β)

Initially the potential function sums all weights, which start at 1:

Φ_1 = N    (10)

After m_T mistakes over T rounds:

Φ_T ≤ [(1+β)/2]^{m_T} N    (11)
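The potential-function sandwich can be checked numerically on a small simulated run (the random experts below are invented for the check): after any sequence of rounds, the final potential must sit between the best surviving weight and the upper bound in (11).

```python
import random

random.seed(0)
N, T, beta = 10, 200, 0.5
w = [1.0] * N
mistakes = 0
for _ in range(T):
    y = random.randint(0, 1)
    # Expert i agrees with the label with probability 0.6 + 0.03*i.
    advice = [y if random.random() < 0.6 + 0.03 * i else 1 - y
              for i in range(N)]
    w1 = sum(wi for wi, a in zip(w, advice) if a == 1)
    y_hat = 1 if w1 >= sum(w) - w1 else 0
    if y_hat != y:
        # Algorithm erred: downweight the experts that were wrong.
        mistakes += 1
        w = [wi * beta if a != y else wi for wi, a in zip(w, advice)]

phi = sum(w)                                # Phi_T
upper = ((1 + beta) / 2) ** mistakes * N    # bound (11)
lower = max(w)                              # any single expert's weight
```

On every mistake round at least half the total weight is on the wrong side and gets scaled by β, which is exactly why the upper bound must hold.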
Weighted Majority Proof

Put the two inequalities together, using the best expert's mistake count m*_T:

β^{m*_T} ≤ Φ_T ≤ [(1+β)/2]^{m_T} N    (12)

Take the log of both sides:

m*_T log β ≤ log N + m_T log[(1+β)/2]    (13)

Solve for m_T:

m_T ≤ (log N + m*_T log(1/β)) / log(2/(1+β))    (14)
Weighted Majority Recap

- Simple algorithm
- No harsh assumptions (handles the non-realizable case)
- Guarantee depends on the best expert
- Downside: can take a long time to do well in the worst case (but okay in practice)
- Solution: randomization
Plan

- Experts
- Perceptron Algorithm
- Online Perceptron for Structure Learning
Perceptron Algorithm

- Online algorithm for classification
- Very similar to logistic regression (but with 0/1 loss)
- But what can we prove?
Perceptron Algorithm

w_1 ← 0
for t ← 1 ... T:
    Receive x_t
    ŷ_t ← sgn(w_t · x_t)
    Receive y_t
    if ŷ_t ≠ y_t: w_{t+1} ← w_t + y_t x_t
    else: w_{t+1} ← w_t
return w_{T+1}

Algorithm 2: Perceptron Algorithm (Rosenblatt, 1958)
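The slide's perceptron fits in a few lines of Python; this sketch assumes labels in {−1, +1}, treats sgn(0) as −1, and uses an invented toy stream.

```python
def perceptron(stream, dim):
    """One online pass; returns the final weights and the mistake count."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score > 0 else -1
        if y_hat != y:
            mistakes += 1
            # Mistake-driven update: w <- w + y*x.
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes

# Separable toy stream: one mistake on the first example, then clean.
stream = [((1.0, 1.0), 1), ((-1.0, -1.0), -1)] * 3
w, m = perceptron(stream, dim=2)
```

Note the update only fires on mistakes, so on separable data the weight vector eventually stops changing.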
Objective Function

The perceptron optimizes:

(1/T) Σ_t max(0, −y_t (w · x_t))    (15)

Convex but not differentiable.
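A quick numeric illustration of why the update rule matches this objective: on a misclassified example the step w ← w + y·x lowers that example's term max(0, −y(w · x)) by ‖x‖² (floored at zero, since the loss cannot go negative). The specific numbers are made up.

```python
def example_loss(w, x, y):
    """One example's contribution to the perceptron objective."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, -y * score)

w = [0.5, -1.0]
x, y = [1.0, 2.0], 1            # misclassified: w.x = -1.5 but y = +1
before = example_loss(w, x, y)  # -y(w.x) = 1.5
w_new = [wi + y * xi for wi, xi in zip(w, x)]
after = example_loss(w_new, x, y)
```

This is exactly a subgradient step on (15) taken one example at a time.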
Margin and Errors

- If there's a good margin ρ, you'll converge quickly
- Whenever you see an error, you move the classifier to get it right
- Convergence is only possible if the data are separable
How many errors does Perceptron make?

If your data fit in a ball of radius R and there is a margin

ρ ≤ y_t (v · x_t) / ‖v‖    (16)

for some v and all t, then the number of mistakes is bounded by R²/ρ².

- The examples where you make an error are support vectors
- Convergence can be slow for small margins
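The R²/ρ² bound can be checked empirically on an invented separable 2-D set: points on two sides of the separator v = (1, 0) with margin ρ = 0.5, using the same update as on the perceptron slide.

```python
import math
import random

random.seed(1)
# Positives have x-coordinate in [0.5, 1], negatives in [-1, -0.5].
data = [((random.uniform(0.5, 1.0), random.uniform(-1.0, 1.0)), 1)
        for _ in range(20)]
data += [((random.uniform(-1.0, -0.5), random.uniform(-1.0, 1.0)), -1)
         for _ in range(20)]

R = max(math.hypot(px, py) for (px, py), _ in data)  # radius of data ball
rho = 0.5                     # every point has y*(v.x)/||v|| >= 0.5

w = [0.0, 0.0]
mistakes = 0
for _ in range(1000):         # cycle until a clean pass (separable => halts)
    errors = 0
    for (px, py), y in data:
        if (1 if w[0] * px + w[1] * py > 0 else -1) != y:
            errors += 1
            mistakes += 1
            w = [w[0] + y * px, w[1] + y * py]
    if errors == 0:
        break
```

No matter how the examples are ordered or repeated, the total mistake count stays under (R/ρ)².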
Plan

- Experts
- Perceptron Algorithm
- Online Perceptron for Structure Learning
Binary to Structure

[figure: mapping binary classification to structured outputs]
Generic Perceptron

- The perceptron is the simplest machine learning algorithm
- Online learning: one example at a time
- Learning by doing:
  - find the best output under the current weights
  - update weights on mistakes
Structured Perceptron

[figure: structured perceptron]

Perceptron Algorithm

[figure: structured perceptron pseudocode]

POS Example

[figure: part-of-speech tagging example]
What must be true?

- Finding the highest-scoring structure must be really fast (you'll do it often)
- Requires some sort of dynamic programming algorithm
- For tagging: features must be local to y (but can be global to x)
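These requirements can be made concrete with a tiny structured perceptron for tagging. The emission/transition feature templates, the three-tag set, and the toy sentences are invented for illustration; because the features decompose over adjacent tags, the argmax is exact Viterbi, and the update is the usual "reward gold features, penalize guessed features".

```python
from collections import defaultdict

TAGS = ["D", "N", "V"]

def features(words, tags):
    """Emission (word, tag) and transition (tag, tag) indicator counts."""
    f = defaultdict(int)
    prev = "<s>"
    for word, tag in zip(words, tags):
        f[("emit", word, tag)] += 1
        f[("trans", prev, tag)] += 1
        prev = tag
    return f

def viterbi(words, weights):
    """Exact argmax over tag sequences; features are local to adjacent tags."""
    delta = {t: weights[("emit", words[0], t)] + weights[("trans", "<s>", t)]
             for t in TAGS}
    backpointers = []
    for word in words[1:]:
        new_delta, bp = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: delta[p] + weights[("trans", p, t)])
            new_delta[t] = (delta[prev] + weights[("trans", prev, t)]
                            + weights[("emit", word, t)])
            bp[t] = prev
        delta = new_delta
        backpointers.append(bp)
    tag = max(TAGS, key=lambda t: delta[t])
    seq = [tag]
    for bp in reversed(backpointers):
        tag = bp[tag]
        seq.append(tag)
    return list(reversed(seq))

def train(examples, epochs=5):
    """Structured perceptron: decode, then reward gold / penalize guess."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in examples:
            guess = viterbi(words, weights)
            if guess != gold:
                for k, v in features(words, gold).items():
                    weights[k] += v
                for k, v in features(words, guess).items():
                    weights[k] -= v
    return weights

examples = [(["the", "dog", "runs"], ["D", "N", "V"]),
            (["a", "cat", "sleeps"], ["D", "N", "V"])]
weights = train(examples)
```

The dynamic program runs in O(|words| · |TAGS|²), which is why decoding can be called on every training example.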
Averaging is Good

[figure: averaged vs. unaveraged perceptron performance]
Smoothing

- Must include subset templates for features
- For example, if you have the feature (t_0, w_0, w_1), you must also have (t_0, w_0), (t_0, w_1), and (w_0, w_1)
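Generating the subset ("back-off") templates from a full conjunction template can be sketched mechanically; the template names are the slide's (t_0, w_0, w_1) example.

```python
from itertools import combinations

def subset_templates(template):
    """All non-empty sub-tuples of a feature template, largest first."""
    return [combo for size in range(len(template), 0, -1)
            for combo in combinations(template, size)]

templates = subset_templates(("t0", "w0", "w1"))
```

A size-k template yields 2^k − 1 templates in total, so the smaller conjunctions are available to fire even when the full one is rare.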
Inexact Search?

- Sometimes exact search is too hard
- So we use beam search instead
- To create algorithms that respect this relaxation: track when the right answer falls off the beam
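Tracking when the right answer falls off the beam can be sketched as beam search with "early update" detection (in the spirit of Collins and Roark, 2004): decode left to right with a fixed-width beam and stop as soon as the gold prefix is gone, returning the partial hypothesis to update against. The scoring function and tag set below are invented stand-ins.

```python
TAGS = ["A", "B"]

def beam_early_update(words, gold, score, width=2):
    """Return (best sequence so far, index where gold fell off or None)."""
    beam = [((), 0.0)]
    for i, word in enumerate(words):
        expanded = [(seq + (t,), s + score(word, seq, t))
                    for seq, s in beam for t in TAGS]
        expanded.sort(key=lambda pair: pair[1], reverse=True)
        beam = expanded[:width]
        if all(seq != tuple(gold[:i + 1]) for seq, _ in beam):
            # Gold prefix is gone: update on the partial hypotheses now,
            # rather than letting search errors compound to the end.
            return beam[0][0], i
    return beam[0][0], None

# A score that always prefers "A" pushes the all-"B" gold off a width-2 beam.
best, fell_off = beam_early_update(
    ["x", "y", "z"], ["B", "B", "B"],
    score=lambda word, seq, tag: 1.0 if tag == "A" else 0.0)
```

Updating at the fall-off point keeps the model's mistakes aligned with states the search procedure can actually reach.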
Wrapup

- Structured prediction: when one label isn't enough
- Generative models can help when you don't have a lot of data
- Discriminative models are the state of the art
- More in Natural Language Processing (at least when I teach it)
From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction
More informationVC Dimension Review. The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces.
VC Dimension Review The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces. Previously, in discussing PAC learning, we were trying to answer questions about
More informationComputational Learning Theory (VC Dimension)
Computational Learning Theory (VC Dimension) 1 Difficulty of machine learning problems 2 Capabilities of machine learning algorithms 1 Version Space with associated errors error is the true error, r is
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationCS 188: Artificial Intelligence. Outline
CS 188: Artificial Intelligence Lecture 21: Perceptrons Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. Outline Generative vs. Discriminative Binary Linear Classifiers Perceptron Multi-class
More information1 Learning Linear Separators
10-601 Machine Learning Maria-Florina Balcan Spring 2015 Plan: Perceptron algorithm for learning linear separators. 1 Learning Linear Separators Here we can think of examples as being from {0, 1} n or
More informationGame Theory, On-line prediction and Boosting (Freund, Schapire)
Game heory, On-line prediction and Boosting (Freund, Schapire) Idan Attias /4/208 INRODUCION he purpose of this paper is to bring out the close connection between game theory, on-line prediction and boosting,
More informationCS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell
CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationEfficient and Principled Online Classification Algorithms for Lifelon
Efficient and Principled Online Classification Algorithms for Lifelong Learning Toyota Technological Institute at Chicago Chicago, IL USA Talk @ Lifelong Learning for Mobile Robotics Applications Workshop,
More informationOnline Learning Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das
Online Learning 9.520 Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das About this class Goal To introduce the general setting of online learning. To describe an online version of the RLS algorithm
More informationIntroduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16
600.463 Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 25.1 Introduction Today we re going to talk about machine learning, but from an
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More informationLearning: Binary Perceptron. Examples: Perceptron. Separable Case. In the space of feature vectors
Linear Classifiers CS 88 Artificial Intelligence Perceptrons and Logistic Regression Pieter Abbeel & Dan Klein University of California, Berkeley Feature Vectors Some (Simplified) Biology Very loose inspiration
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and
More informationThe No-Regret Framework for Online Learning
The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,
More information1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016
AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 2013
COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 203 Review of Zero-Sum Games At the end of last lecture, we discussed a model for two player games (call
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationClassification II: Decision Trees and SVMs
Classification II: Decision Trees and SVMs Digging into Data: Jordan Boyd-Graber February 25, 2013 Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Digging into Data: Jordan Boyd-Graber ()
More information4.1 Online Convex Optimization
CS/CNS/EE 53: Advanced Topics in Machine Learning Topic: Online Convex Optimization and Online SVM Lecturer: Daniel Golovin Scribe: Xiaodi Hou Date: Jan 3, 4. Online Convex Optimization Definition 4..
More informationLecture 13: Discriminative Sequence Models (MEMM and Struct. Perceptron)
Lecture 13: Discriminative Sequence Models (MEMM and Struct. Perceptron) Intro to NLP, CS585, Fall 2014 http://people.cs.umass.edu/~brenocon/inlp2014/ Brendan O Connor (http://brenocon.com) 1 Models for
More informationML4NLP Multiclass Classification
ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we
More informationMachine Learning Theory (CS 6783)
Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Kimball, B-11 Instructor : Karthik Sridharan ABOUT THE COURSE No exams! 5 assignments that count towards your grades (55%) One term project (40%)
More informationTraining the linear classifier
215, Training the linear classifier A natural way to train the classifier is to minimize the number of classification errors on the training data, i.e. choosing w so that the training error is minimized.
More informationMore about the Perceptron
More about the Perceptron CMSC 422 MARINE CARPUAT marine@cs.umd.edu Credit: figures by Piyush Rai and Hal Daume III Recap: Perceptron for binary classification Classifier = hyperplane that separates positive
More information