Classification: Weighted Majority
Jordan Boyd-Graber, University of Maryland
Slides adapted from Mohri
Beyond Binary Classification

- Before, we've talked about combining weak predictors (boosting)
- What if you have strong predictors?
- How do you make inherently binary algorithms multiclass?
- How do you answer questions like ranking?
General Online Setting

For t = 1 to T:
- Get instance $x_t \in X$
- Predict $\hat{y}_t \in Y$
- Get true label $y_t \in Y$
- Incur loss $L(\hat{y}_t, y_t)$

Classification: $Y = \{0, 1\}$, $L(y, y') = |y - y'|$
Regression: $Y \subseteq \mathbb{R}$, $L(y, y') = (y - y')^2$

Objective: minimize total loss $\sum_t L(\hat{y}_t, y_t)$
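To make the protocol concrete, here is a minimal runnable sketch of the online loop in Python. The names (online_loop, stream, the toy majority learner) are illustrative assumptions, not part of the slides.

```python
def online_loop(predict, update, stream):
    """Run the online protocol over a stream of (x_t, y_t) pairs:
    predict first, then observe the true label, then incur loss."""
    total_loss = 0
    for x_t, y_t in stream:
        y_hat = predict(x_t)            # commit to a prediction
        total_loss += abs(y_hat - y_t)  # 0/1 loss: L(y, y') = |y - y'|
        update(x_t, y_t)                # only now may the learner adapt
    return total_loss

# Toy learner: predict the majority label seen so far (ties -> 1).
counts = [0, 0]

def predict(x):
    return int(counts[1] >= counts[0])

def update(x, y):
    counts[y] += 1

print(online_loop(predict, update,
                  [(None, 1), (None, 1), (None, 0), (None, 1)]))  # 1
```

The key design point the loop enforces is that the learner must commit to $\hat{y}_t$ before seeing $y_t$.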
Prediction with Expert Advice

For t = 1 to T:
- Get instance $x_t \in X$ and advice $a_{t,i} \in Y$, $i \in [1, N]$
- Predict $\hat{y}_t \in Y$
- Get true label $y_t \in Y$
- Incur loss $L(\hat{y}_t, y_t)$

Objective: minimize regret, i.e., the difference between our total loss and that of the best expert:
$$\text{Regret}(T) = \sum_t L(\hat{y}_t, y_t) - \min_i \sum_t L(a_{t,i}, y_t) \qquad (1)$$
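A small helper that evaluates equation (1) under 0/1 loss; the list-of-lists layout for the advice matrix is an assumption made for illustration.

```python
def regret(predictions, advice, labels):
    """Regret(T) = sum_t L(y_hat_t, y_t) - min_i sum_t L(a_{t,i}, y_t),
    with 0/1 loss; advice[t][i] is expert i's advice in round t."""
    learner_loss = sum(abs(p - y) for p, y in zip(predictions, labels))
    n_experts = len(advice[0])
    best_loss = min(
        sum(abs(a_t[i] - y) for a_t, y in zip(advice, labels))
        for i in range(n_experts)
    )
    return learner_loss - best_loss

# Two experts over three rounds; regret can be negative if the
# learner happens to beat every expert.
print(regret([1, 0, 1], [[1, 0], [0, 0], [0, 1]], [1, 1, 1]))  # -1
```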
Mistake Bound Model

Define the maximum number of mistakes a learning algorithm L makes to learn a concept c over any sequence of examples (until it is perfect):
$$M_L(c) = \max_{x_1, \dots, x_T} \text{mistakes}(L, c) \qquad (2)$$

For any concept class C, take the max over concepts c:
$$M_L(C) = \max_{c \in C} M_L(c) \qquad (3)$$

In the expert-advice case, this assumes some expert matches the concept (realizable).
Halving Algorithm

    H_1 ← H
    for t ← 1 ... T do
        Receive x_t
        ŷ_t ← Majority(H_t, a_t, x_t)
        Receive y_t
        if ŷ_t ≠ y_t then
            H_{t+1} ← {a ∈ H_t : a(x_t) = y_t}
    return H_{T+1}

Algorithm 1: The Halving Algorithm (Mitchell, 1997)
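A runnable sketch of Algorithm 1 in Python, under the assumption that hypotheses are functions from instances to {0, 1}; the tie-breaking toward 1 in the majority vote is an illustrative choice.

```python
def halving(hypotheses, stream):
    """Halving Algorithm: predict by majority vote over the current
    version space; on a mistake, keep only the hypotheses consistent
    with the revealed label. Returns (version space, mistakes)."""
    H = list(hypotheses)
    mistakes = 0
    for x_t, y_t in stream:
        votes_for_1 = sum(h(x_t) for h in H)
        y_hat = int(2 * votes_for_1 >= len(H))   # majority (ties -> 1)
        if y_hat != y_t:
            mistakes += 1
            H = [h for h in H if h(x_t) == y_t]  # at least halved
    return H, mistakes

# Hypotheses: thresholds h_k(x) = 1[x >= k] for k = 0..7.
hyps = [lambda x, k=k: int(x >= k) for k in range(8)]
H, m = halving(hyps, [(3, 1), (1, 0), (2, 1)])
print(m, len(H))  # 1 mistake, 3 hypotheses remain
```

With |H| = 8 here, the bound on the next slide guarantees at most lg 8 = 3 mistakes on any sequence; this run makes only one.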
Halving Algorithm Bound (Littlestone, 1988)

For a finite hypothesis set,
$$M_{\text{Halving}}(H) \le \lg |H| \qquad (4)$$

- After each mistake, the hypothesis set is reduced at least by half

Consider the optimal mistake bound opt(H). Then
$$\text{VC}(H) \le \text{opt}(H) \le M_{\text{Halving}}(H) \le \lg |H| \qquad (5)$$

- For a fully shattered set, form a binary tree of mistakes with height VC(H)

What about the non-realizable case?
Weighted Majority (Littlestone and Warmuth, 1994)

    for i ← 1 ... N do
        w_{1,i} ← 1
    for t ← 1 ... T do
        Receive x_t
        ŷ_t ← 1 if Σ_{i: a_{t,i} = 1} w_{t,i} ≥ Σ_{i: a_{t,i} = 0} w_{t,i}, else 0
        Receive y_t
        if ŷ_t ≠ y_t then
            for i ← 1 ... N do
                if a_{t,i} ≠ y_t then w_{t+1,i} ← β w_{t,i}
                else w_{t+1,i} ← w_{t,i}
    return w_{T+1}

- Keep a weight for every expert
- Classify in favor of the side with the higher total weight (y ∈ {0, 1})
- Experts that are wrong get their weights decreased (β ∈ [0, 1))
- If you're right, your weight stays unchanged
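A minimal runnable sketch of Weighted Majority in Python; the data layout (advice as a list of per-round 0/1 vectors) is an illustrative assumption. As in the pseudocode, weights change only on rounds where the algorithm itself errs.

```python
def weighted_majority(advice, labels, beta=0.5):
    """Weighted Majority: advice[t][i] in {0, 1} is expert i's advice
    in round t. After each algorithm mistake, every expert that
    advised the wrong label is down-weighted by beta.
    Returns (final weights, number of algorithm mistakes)."""
    n = len(advice[0])
    w = [1.0] * n
    mistakes = 0
    for a_t, y_t in zip(advice, labels):
        weight_1 = sum(w[i] for i in range(n) if a_t[i] == 1)
        weight_0 = sum(w[i] for i in range(n) if a_t[i] == 0)
        y_hat = int(weight_1 >= weight_0)
        if y_hat != y_t:
            mistakes += 1
            # penalize only the experts that advised the wrong label
            w = [w[i] * beta if a_t[i] != y_t else w[i]
                 for i in range(n)]
    return w, mistakes

# Experts 0 and 1 keep saying 1; expert 2 is always right.
advice = [[1, 1, 1], [1, 1, 0], [1, 1, 0]]
labels = [1, 0, 0]
print(weighted_majority(advice, labels))  # ([0.25, 0.25, 1.0], 2)
```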
Weighted Majority Bound

- Let $m_T$ be the number of mistakes made by WM through time T
- Let $m_T^*$ be the best expert's mistakes through time T
- N is the number of experts

$$m_T \le \frac{\log N + m_T^* \log\frac{1}{\beta}}{\log\frac{2}{1+\beta}} \qquad (6)$$

- Thus, the mistake bound is O(log N) plus a constant times the best expert's mistakes
- The Halving algorithm is the special case β = 0
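Plugging numbers into (6) gives a feel for the bound; this sketch is an added illustration with arbitrary example values.

```python
import math

def wm_mistake_bound(n_experts, best_expert_mistakes, beta):
    """Evaluate the Weighted Majority bound, equation (6):
    m_T <= (log N + m*_T log(1/beta)) / log(2 / (1 + beta))."""
    return (math.log(n_experts)
            + best_expert_mistakes * math.log(1 / beta)) \
        / math.log(2 / (1 + beta))

# 100 experts whose best member errs 10 times, beta = 0.5:
print(wm_mistake_bound(100, 10, 0.5))  # ~40.1 mistakes at most
```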
Proof: Potential Function

The potential function is the sum of all weights:
$$\Phi_t = \sum_i w_{t,i} \qquad (7)$$

We'll create a sandwich of upper and lower bounds.

For any expert i, we have the lower bound
$$\Phi_t \ge w_{t,i} = \beta^{m_{t,i}} \qquad (8)$$

where $m_{t,i}$ is the number of mistakes expert i has made up to time t:
- Weights are nonnegative, so $\sum_i w_{t,i} \ge w_{t,i}$
- Each error multiplicatively reduces the weight by β
Proof: Potential Function (Upper Bound)

If the algorithm makes an error at round t,
$$\Phi_{t+1} \le \frac{\Phi_t}{2} + \beta\frac{\Phi_t}{2} = \frac{1+\beta}{2}\Phi_t \qquad (9)$$

- Half (at most) of the experts by weight were right
- Half (at least) of the experts by weight were wrong

Initially, the potential function sums all weights, which start at 1:
$$\Phi_1 = N \qquad (10)$$

After $m_T$ mistakes in T rounds,
$$\Phi_T \le \left(\frac{1+\beta}{2}\right)^{m_T} N \qquad (11)$$
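As a quick numeric check of the sandwich (8)–(11), reusing the weighted_majority sketch and example from above (an added illustration, not from the slides):

```python
# Check beta^(m*_T) <= Phi_T <= ((1 + beta)/2)^(m_T) * N on the
# earlier example, where the best expert (expert 2) made 0 mistakes.
beta, N = 0.5, 3
advice = [[1, 1, 1], [1, 1, 0], [1, 1, 0]]
labels = [1, 0, 0]
w, m_T = weighted_majority(advice, labels, beta=beta)
phi_T = sum(w)                        # 1.5 after m_T = 2 mistakes
lower = beta ** 0                     # best expert: 0 mistakes
upper = ((1 + beta) / 2) ** m_T * N   # 0.75**2 * 3 = 1.6875
print(lower <= phi_T <= upper)        # True
```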
Weighted Majority Proof

Put the two inequalities together, using the best expert:
$$\beta^{m_T^*} \le \Phi_T \le \left(\frac{1+\beta}{2}\right)^{m_T} N \qquad (12)$$

Take the log of both sides:
$$m_T^* \log\beta \le \log N + m_T \log\frac{1+\beta}{2} \qquad (13)$$

Solve for $m_T$:
$$m_T \le \frac{\log N + m_T^* \log\frac{1}{\beta}}{\log\frac{2}{1+\beta}} \qquad (14)$$
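The rearrangement from (13) to (14) deserves one line of care: $\log\frac{1+\beta}{2} < 0$ for $\beta < 1$, so signs flip when terms move across. A sketch of the algebra (added here, not on the original slides):

```latex
\begin{align*}
m_T^* \log\beta &\le \log N + m_T \log\tfrac{1+\beta}{2}
  && \text{(13)} \\
-m_T \log\tfrac{1+\beta}{2} &\le \log N - m_T^* \log\beta
  && \text{move terms across} \\
m_T \log\tfrac{2}{1+\beta} &\le \log N + m_T^* \log\tfrac{1}{\beta}
  && -\log x = \log\tfrac{1}{x} \\
m_T &\le \frac{\log N + m_T^* \log\frac{1}{\beta}}{\log\frac{2}{1+\beta}}
  && \text{divide by } \log\tfrac{2}{1+\beta} > 0
\end{align*}
```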
Weighted Majority Recap

- Simple algorithm
- No harsh assumptions (handles the non-realizable case)
- Bound depends on the best expert
- Downside: can take a long time to do well in the worst case (but okay in practice)
- Solution: randomization