Online Learning Summer School Copenhagen 2015 Lecture 1


Online Learning Summer School, Copenhagen 2015, Lecture 1
Shai Shalev-Shwartz
School of CS and Engineering, The Hebrew University of Jerusalem

Outline
1. The Online Learning Framework: online classification; the hypothesis class
2. Learning Finite Hypothesis Classes: the Consistent learner; the Halving learner
3. Structure over the Hypothesis Class: halfspaces; the Ellipsoid learner

Gentle Start: An Online Classification Game
For t = 1, 2, ...
- Environment presents a question x_t
- Learner predicts an answer ŷ_t ∈ {±1}
- Environment reveals the true label y_t ∈ {±1}
- Learner pays 1 if ŷ_t ≠ y_t and 0 otherwise
Goal of the learner: make few mistakes
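
To make the protocol concrete, here is a minimal Python sketch of the game loop; the learner interface (predict/update) and the environment callbacks are hypothetical names introduced here, not notation from the slides.

```python
# A minimal sketch of the online classification protocol. The names
# `learner`, `get_question`, and `get_label` are hypothetical; they
# only illustrate the order of events in each round.

def play(learner, get_question, get_label, T):
    """Run T rounds of the game and return the learner's mistake count."""
    mistakes = 0
    for t in range(T):
        x_t = get_question(t)           # environment presents a question
        y_hat = learner.predict(x_t)    # learner predicts in {-1, +1}
        y_t = get_label(t, x_t)         # environment reveals the true label
        mistakes += int(y_hat != y_t)   # pay 1 on a mistake
        learner.update(x_t, y_t)        # learner may adapt to the feedback
    return mistakes
```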

Example Applications
- Weather forecasting (will it rain tomorrow?)
- Finance (buy or sell an asset?)
- Spam filtering (is this message spam?)
- Compression (what's the next symbol in the sequence?)
- Proxy for optimization (will become clear later)

When can we hope to make few mistakes?
- The task is hopeless if there's no correlation between past and future
- We are making no statistical assumptions on the origin of the sequence
- We need to give more knowledge to the learner

Prior Knowledge
Recall the online game: for t = 1, 2, ...: get question x_t ∈ X, predict ŷ_t ∈ {±1}, then get y_t ∈ {±1}
The realizability-by-H assumption:
- H is a predefined set of functions from X to {±1}
- There exists f ∈ H such that for every t, y_t = f(x_t)
- The learner knows H (but of course doesn't know f)
Remark: what if our prior knowledge is wrong? We'll get back to this question later.

Not always helpful
Let X = R, and let H be the class of thresholds: H = {h_θ : θ ∈ R}, where h_θ(x) = sign(x − θ)
[Figure: h_θ(x) = −1 to the left of θ and h_θ(x) = +1 to the right]
Theorem: for every learner, there exists a sequence of examples which is consistent with some f ∈ H but on which the learner errs on every round.
Exercise: prove the theorem by showing that the environment can follow the bisection method.
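
As a hint toward the exercise, the following sketch shows one possible formalization of the bisection adversary (the `learner` object with predict/update is an assumed interface, as in the game-loop sketch earlier).

```python
# Sketch of the bisection adversary for thresholds. The open interval
# (lo, hi) of consistent thresholds stays nonempty forever, so the
# sequence is realizable by some h_theta, yet every round is a mistake.

def bisection_adversary(learner, T):
    lo, hi = 0.0, 1.0
    for t in range(T):
        x_t = (lo + hi) / 2
        y_t = -learner.predict(x_t)     # always answer the opposite
        if y_t == 1:
            hi = x_t                    # thresholds in (lo, x_t) stay consistent
        else:
            lo = x_t                    # thresholds in (x_t, hi) stay consistent
        learner.update(x_t, y_t)
```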

Learning Finite Classes
Assume that H is of finite size. For example:
- H is all the functions from X to {±1} that can be implemented using a Python program of length at most b
- H is the class of thresholds over a grid X = {0, 1/n, 2/n, ..., 1}

Learning Finite Classes: The Consistent Learner
Initialize V_1 = H
For t = 1, 2, ...
- Get x_t
- Pick some h ∈ V_t and predict ŷ_t = h(x_t)
- Get y_t and update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}
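
A naive Python sketch of the Consistent learner, assuming each hypothesis in H is given as a callable returning ±1; it matches the game-loop interface sketched earlier.

```python
# Sketch: the Consistent learner over a finite class H given as a
# list of callables h(x) -> {-1, +1}. Written for clarity, not speed.

class ConsistentLearner:
    def __init__(self, H):
        self.V = list(H)            # version space V_1 = H

    def predict(self, x):
        return self.V[0](x)         # pick some h in V_t and follow it

    def update(self, x, y):
        # Keep only the hypotheses consistent with the revealed label.
        self.V = [h for h in self.V if h(x) == y]
```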

Analysis
Theorem: the Consistent learner makes at most |H| − 1 mistakes.
Proof: if we err at round t, then the h ∈ V_t used for prediction is not in V_{t+1}, so |V_{t+1}| ≤ |V_t| − 1. By realizability, f ∈ V_t for every t, so |V_t| ≥ 1; hence at most |H| − 1 mistakes can occur.
Can we do better?

The Halving Learner
Initialize V_1 = H
For t = 1, 2, ...
- Get x_t
- Predict ŷ_t = Majority(h(x_t) : h ∈ V_t)
- Get y_t and update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}
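
The same sketch as before, with the only change Halving requires: the prediction becomes a majority vote over the current version space (ties broken arbitrarily; here as +1).

```python
# Sketch: the Halving learner. Identical bookkeeping to the Consistent
# learner, but the prediction is the majority vote over V_t.

class HalvingLearner:
    def __init__(self, H):
        self.V = list(H)

    def predict(self, x):
        votes = sum(h(x) for h in self.V)   # in {-|V|, ..., +|V|}
        return 1 if votes >= 0 else -1      # majority; ties broken as +1

    def update(self, x, y):
        self.V = [h for h in self.V if h(x) == y]
```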

Analysis
Theorem: the Halving learner makes at most log₂(|H|) mistakes.
Proof: if we err at round t, then at least half of the functions in V_t are not in V_{t+1}, so |V_{t+1}| ≤ |V_t|/2. Since realizability keeps |V_t| ≥ 1, the number of mistakes is at most log₂(|H|).
Corollary: the Halving learner can learn the class H of all Python programs of length < b bits while making at most b mistakes.

Powerful, but...
1. What if the environment is not consistent with any f ∈ H? We'll deal with this later.
2. While the mistake bound of Halving grows with log₂(|H|), the runtime of Halving grows with |H|. Learning must take computational considerations into account.

Efficient learning with structured H
Example: recall the class H of thresholds over a grid X = {0, 1/n, ..., 1} for some integer n ≥ 1
- Halving's mistake bound is log₂(n + 1)
- A naive implementation of Halving takes Ω(n) time per round
- How can we implement Halving efficiently?

Efficient Halving for discrete thresholds
Initialize l_1 = 0, r_1 = 1
For t = 1, 2, ...
- Get x_t ∈ {0, 1/n, ..., 1}
- Predict ŷ_t = sign((x_t − l_t) − (r_t − x_t))
- Get y_t and, if x_t ∈ [l_t, r_t], update:
    if y_t = 1 then l_{t+1} = l_t, r_{t+1} = x_t
    if y_t = −1 then l_{t+1} = x_t, r_{t+1} = r_t
Exercise: show that the above is indeed an implementation of Halving and that the runtime of each iteration is O(log(n))
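
A sketch of this interval implementation: only the two endpoints are stored, so each round costs a constant number of arithmetic operations on O(log n)-bit grid numbers, which is where the O(log n) per-round runtime in the exercise comes from.

```python
# Sketch: Halving for thresholds over the grid {0, 1/n, ..., 1},
# maintaining only the interval endpoints instead of the version space.

class EfficientThresholdHalving:
    def __init__(self):
        self.l, self.r = 0.0, 1.0   # l_1 = 0, r_1 = 1

    def predict(self, x):
        # Majority of thresholds in [l, r]: +1 iff more of them lie at or below x.
        return 1 if (x - self.l) - (self.r - x) >= 0 else -1

    def update(self, x, y):
        if self.l <= x <= self.r:
            if y == 1:
                self.r = x          # consistent thresholds lie in [l, x]
            else:
                self.l = x          # consistent thresholds lie in (x, r]
```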

Halfspaces
[Figure: a halfspace in the plane; the weight vector w is orthogonal to the separating hyperplane, with + examples on one side and − examples on the other]
H = {x ↦ sign(⟨w, x⟩ + b) : w ∈ R^d, b ∈ R}
Inner product: ⟨w, x⟩ = w^T x = Σ_{i=1}^d w_i x_i
w is called a weight vector and b a bias
For d = 1, the class of halfspaces is the class of thresholds
W.l.o.g., assume that x_d = 1 for all examples; then we can treat w_d as the bias and forget about b

Using Halving to learn halfspaces on a grid
Let us represent all numbers on the grid G = {−1, −1 + 1/n, ..., 1 − 1/n, 1}
Then |H| = |G^d| = (2n + 1)^d, so Halving's mistake bound is at most d log₂(2n + 1)
We will show an algorithm with a slightly worse mistake bound but that can be implemented efficiently

The Ellipsoid Learner
Recall that Halving maintains the version space V_t, containing all hypotheses in H which are consistent with the examples observed so far
Each halfspace hypothesis corresponds to a vector in G^d
Instead of maintaining V_t, we will maintain an ellipsoid E_t that contains V_t
We will show that every time we make a mistake, the volume of E_t shrinks by a factor of e^{−1/(2d+2)}
On the other hand, we will show that the volume of E_t cannot become too small (this is where we use the grid assumption)

Background: Balls and Ellipsoids
Let B = {w ∈ R^d : ‖w‖ ≤ 1} be the unit ball of R^d
Recall: ‖w‖² = ⟨w, w⟩ = w^T w = Σ_{i=1}^d w_i²
An ellipsoid is the image of a ball under an affine mapping: given a matrix M and a vector v,
E(M, v) = {Mw + v : ‖w‖ ≤ 1}
[Figure: the unit ball centered at 0 is mapped by w ↦ Mw + v to an ellipsoid centered at v]

The Ellipsoid Learner
We implicitly maintain an ellipsoid E_t = E(A_t^{1/2}, w_t)
Start with w_1 = 0, A_1 = I
For t = 1, 2, ...
- Get x_t
- Predict ŷ_t = sign(⟨w_t, x_t⟩)
- Get y_t
- If ŷ_t ≠ y_t, update:
    w_{t+1} = w_t + (y_t A_t x_t) / ((d + 1) √(x_t^T A_t x_t))
    A_{t+1} = (d²/(d² − 1)) · (A_t − (2/(d + 1)) · (A_t x_t x_t^T A_t) / (x_t^T A_t x_t))
- If ŷ_t = y_t, keep w_{t+1} = w_t and A_{t+1} = A_t
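
A NumPy sketch of the learner as stated (the class name and interface are mine; it assumes d ≥ 2 so that d² − 1 > 0).

```python
import numpy as np

# Sketch: the Ellipsoid learner. We maintain the center w and shape
# matrix A of E_t = E(A^{1/2}, w). Assumes d >= 2.

class EllipsoidLearner:
    def __init__(self, d):
        self.d = d
        self.w = np.zeros(d)   # w_1 = 0
        self.A = np.eye(d)     # A_1 = I, so E_1 is the unit ball

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

    def update(self, x, y):
        if self.predict(x) == y:
            return             # correct prediction: E_{t+1} = E_t
        d, A = self.d, self.A
        Ax = A @ x
        xAx = x @ Ax           # x^T A x
        self.w = self.w + y * Ax / ((d + 1) * np.sqrt(xAx))
        self.A = (d**2 / (d**2 - 1)) * (A - (2 / (d + 1)) * np.outer(Ax, Ax) / xAx)
```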

Intuition
Suppose x_1 = (1, 0) and y_1 = 1. Then:
w_2 = (1/3, 0),   A_2 = diag(4/9, 4/3)
E_2 is the ellipsoid of minimum volume that contains E_1 ∩ {w : y_1 ⟨w, x_1⟩ > 0}
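
As a quick numerical check of this example, a standalone sketch mirroring the update rule above reproduces these values after one mistake update from w_1 = 0, A_1 = I:

```python
import numpy as np

# One mistake update with x_1 = (1, 0), y_1 = 1, starting from
# w_1 = 0 and A_1 = I (d = 2).
d = 2
w, A = np.zeros(d), np.eye(d)
x, y = np.array([1.0, 0.0]), 1

Ax = A @ x
w2 = w + y * Ax / ((d + 1) * np.sqrt(x @ Ax))
A2 = (d**2 / (d**2 - 1)) * (A - (2 / (d + 1)) * np.outer(Ax, Ax) / (x @ Ax))

print(w2)  # [0.3333... 0.]  i.e. w_2 = (1/3, 0)
print(A2)  # diag(4/9, 4/3): shrinks along x_1, stretches along the other axis
```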

Analysis
Theorem: the Ellipsoid learner makes at most 2d(2d + 2) log(n) mistakes.
The proof is based on two lemmas:
Lemma (Volume Reduction): whenever we make a mistake, Vol(E_{t+1}) ≤ Vol(E_t) · e^{−1/(2d+2)}.
Lemma (Volume can't be too small): for every t, Vol(E_t) ≥ Vol(B) · (1/n)^{2d}.
Therefore, after M mistakes,
Vol(B) · (1/n)^{2d} ≤ Vol(E_t) ≤ Vol(B) · e^{−M/(2d+2)},
and rearranging gives M ≤ 2d(2d + 2) log(n).

Summary
- A basic online classification model
- The need for prior knowledge
- Learning finite hypothesis classes using Halving
- The runtime problem
- The Ellipsoid learner efficiently learns halfspaces (over a grid)
Next lectures:
- Online learnability: for which H can we guarantee a finite number of mistakes?
- Non-realizable sequences
- Beyond binary classification
- A more general online learning game

Exercises
- The bisection exercise on the slide "Not always helpful"
- The exercise on the slide "Efficient Halving for discrete thresholds"
- Derive the Ellipsoid update equations
- Prove the two lemmas from the analysis of the Ellipsoid learner

Background: Balls and Ellipsoids
Recall: E(M, v) = {Mw + v : ‖w‖ ≤ 1}
We deal with non-degenerate ellipsoids, i.e., M is invertible
SVD theorem: every real invertible matrix M can be decomposed as M = U D V^T, where U, V are orthonormal and D is diagonal with D_{i,i} > 0
Exercise: show that E(M, v) = E(UD, v) = E(UDU^T, v)
Therefore, we can assume w.l.o.g. that M = UDU^T (i.e., M is symmetric positive definite)
Exercise: show that for such M,
E(M, v) = {x : (x − v)^T M^{−2} (x − v) ≤ 1},
where M^{−2} = U D^{−2} U^T with (D^{−2})_{i,i} = D_{i,i}^{−2}

Volume Calculations
Let Vol(B) denote the volume of the unit ball.
Lemma: if M = U D U^T is positive definite, then
Vol(E(M, v)) = det(M) · Vol(B) = (∏_{i=1}^d D_{i,i}) · Vol(B)

Why volume shrinks
Suppose A_t = U D² U^T, and define x̃_t = D U^T x_t. Then:
A_{t+1} = (d²/(d² − 1)) · (A_t − (2/(d + 1)) · (A_t x_t x_t^T A_t) / (x_t^T A_t x_t))
        = (d²/(d² − 1)) · U D (I − (2/(d + 1)) · x̃_t x̃_t^T / ‖x̃_t‖²) D U^T
By Sylvester's determinant theorem, det(I + u v^T) = 1 + ⟨u, v⟩. Therefore,
det(A_{t+1}) = (d²/(d² − 1))^d · det(D) · det(I − (2/(d + 1)) · x̃_t x̃_t^T / ‖x̃_t‖²) · det(D)
             = (d²/(d² − 1))^d · (1 − 2/(d + 1)) · det(A_t)

Why volume shrinks (continued)
We obtain:
Vol(E_{t+1}) / Vol(E_t) = (d²/(d² − 1))^{d/2} · (1 − 2/(d + 1))^{1/2}
                        = (d²/(d² − 1))^{(d−1)/2} · (d/(d + 1))
                        = (1 + 1/(d² − 1))^{(d−1)/2} · (1 − 1/(d + 1))
                        ≤ e^{(d−1)/(2(d²−1))} · e^{−1/(d+1)}
                        = e^{1/(2(d+1))} · e^{−1/(d+1)} = e^{−1/(2(d+1))}
where we used 1 + a ≤ e^a, which holds for all a ∈ R (the second line follows since (d²/(d² − 1)) · (d − 1)/(d + 1) = d²/(d + 1)²).
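
The ratio in the first line depends only on d (not on x_t or A_t), so the Volume Reduction lemma is easy to sanity-check numerically; a small sketch:

```python
import numpy as np

# Sanity check: after a mistake update, Vol(E_{t+1})/Vol(E_t) equals
# sqrt(det(A_{t+1})/det(A_t)), which should be at most exp(-1/(2(d+1))).
rng = np.random.default_rng(0)
for d in [2, 5, 20]:
    A = np.eye(d)
    x = rng.normal(size=d)
    Ax = A @ x
    A_next = (d**2 / (d**2 - 1)) * (A - (2 / (d + 1)) * np.outer(Ax, Ax) / (x @ Ax))
    ratio = np.sqrt(np.linalg.det(A_next) / np.linalg.det(A))
    print(d, ratio <= np.exp(-1 / (2 * (d + 1))))  # True for each d
```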

Why volume can't be too small
Recall that y_t ⟨w*, x_t⟩ > 0 for every t. Since w* and x_t lie on the grid G, it follows that y_t ⟨w*, x_t⟩ ≥ 1/n².
Therefore, if ‖w − w*‖ < 1/n², then
y_t ⟨w, x_t⟩ = y_t ⟨w − w*, x_t⟩ + y_t ⟨w*, x_t⟩ ≥ −‖w − w*‖ · ‖x_t‖ + 1/n² > 0
Convince yourself (by induction) that E_t contains the ball of radius 1/n² centered at w*. It follows that
Vol(B) · (1/n²)^d = Vol(E((1/n²) I, w*)) ≤ Vol(E_t)
