Learnability, Stability, Regularization and Strong Convexity
1 Learnability, Stability, Regularization and Strong Convexity
Nati Srebro (Toyota Technological Institute Chicago), Shai Shalev-Shwartz (HUJI), Ohad Shamir (Weizmann), Karthik Sridharan (Cornell), Ambuj Tewari (Michigan) (2008)
2 What is this tutorial about?
- Characterizing learnability: what makes a problem learnable? Necessary and sufficient conditions; the role of stability
- Universal learning rules: anything learnable is learnable using XXX; only need to consider rules of the form XXX
- Relationship between online and statistical learning
3 Plan
- Statistical learning and its relationship to online learning
- Characterizing (statistical) learnability: stability as the master property
- Stability in online learning
- Convex problems: strong convexity as the master property
4 The General Learning Setting, aka Stochastic Optimization
- minimize F(w) = E_{z~D}[f(w,z)] over w ∈ W
- Known objective function f: W × Z → R
- Unknown distribution D over Z, i.e. an unknown distribution over random functions w ↦ f(w,z)
- Given an iid sample z_1, z_2, … ~ D, i.e. iid samples of random functions f(·, z_1), f(·, z_2), …
[Vapnik 95]
5 General Learning: Examples
Minimize F(w) = E_z[f(w;z)] based on a sample z_1, z_2, …, z_m.
- Supervised learning: z = (x,y); w specifies a predictor h_w: X → Y; f(w;(x,y)) = loss(h_w(x), y). E.g. linear prediction: w, φ(x) ∈ R^d and f(w;(x,y)) = loss(⟨w, φ(x)⟩, y)
- Unsupervised learning, e.g. k-means clustering: z = x ∈ R^d; w = (μ[1],…,μ[k]) ∈ R^{d×k} specifies k cluster centers; f((μ[1],…,μ[k]); x) = min_j ‖μ[j] − x‖²
- Density estimation: w specifies a probability density p_w(x); f(w;x) = −log p_w(x)
- Optimization in an uncertain environment, e.g.: z = traffic delays on each road segment; w = route chosen (indicator over road segments in the route); f(w;z) = ⟨w, z⟩ = total delay along the route
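One of these examples can be written out directly; a minimal sketch of the k-means objective f((μ[1],…,μ[k]); x) = min_j ‖μ[j] − x‖² (the array shapes are my own convention):

```python
import numpy as np

def f_kmeans(w, x):
    """k-means instance loss from slide 5: squared distance from the sample
    point x to the nearest of the k cluster centers in w.

    w: array of shape (k, d) -- the k cluster centers mu[1..k]
    x: array of shape (d,)   -- one sample z = x
    """
    return min(np.sum((c - x) ** 2) for c in w)

# Two centers in R^2; the loss picks whichever center is closer.
w = np.array([[0.0, 0.0], [10.0, 0.0]])
assert f_kmeans(w, np.array([1.0, 0.0])) == 1.0   # nearest center is (0, 0)
assert f_kmeans(w, np.array([9.0, 0.0])) == 1.0   # nearest center is (10, 0)
```

F(w) is then the expectation of this instance loss over the data distribution, exactly as in the general template above.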
6 General Learning Setting: Learnability
Minimize F(w) = E_z[f(w,z)] based on a sample z_1, …, z_m. A problem is specified by f and a domain W.
The problem is learnable if there is a learning rule w̃(z_1,…,z_m) for choosing w based on z_1,…,z_m s.t. for every ε > 0 and large enough sample size m, for any distribution D:
E_{z_1,…,z_m ~ D}[F(w̃)] ≤ inf_{w∈W} F(w) + ε
(write w* for the minimizer, so inf_{w∈W} F(w) = F(w*))
Supervised learning: when f(w;(x,y)) = loss(h_w(x), y), learnable = agnostic PAC learnable.
7 Online Learning (Optimization)
- Adversary presents f(·, z_1), f(·, z_2), f(·, z_3), …; learner responds w_1, w_2, w_3, …
- Known function f; unknown sequence z_1, z_2, …
- Online learning rule: w_i(z_1,…,z_{i−1})
- Goal: minimize Σ_i f(w_i, z_i)
- Differences vs the stochastic setting: any sequence, not necessarily iid; no distinction between train and test
8 Online and Stochastic Regret
Problem specified by f(·,·) and domain W.
- Online regret: for any sequence,
  (1/m) Σ_{i=1}^m f(w_i(z_1,…,z_{i−1}), z_i) ≤ inf_{w∈W} (1/m) Σ_{i=1}^m f(w, z_i) + Reg(m)
- Statistical regret: for any distribution D,
  E_{z_1,…,z_m ~ D}[F_D(w̃(z_1,…,z_m))] ≤ inf_{w∈W} F_D(w) + ε(m)
- Online-to-batch: set w̃(z_1,…,z_m) = w_i with probability 1/m; then E[F(w̃)] ≤ F(w*) + Reg(m)
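The online-to-batch conversion on this slide can be sketched as follows (the randomized rule is the slide's; the averaged variant, justified for convex f by Jensen's inequality, is my addition):

```python
import numpy as np

def online_to_batch_random(iterates, rng):
    """Slide 8's conversion: output w_i chosen with probability 1/m."""
    return iterates[rng.integers(len(iterates))]

def online_to_batch_average(iterates):
    """For convex f, Jensen's inequality gives the same excess-risk
    guarantee for the average of the online iterates."""
    return np.mean(iterates, axis=0)

# Toy iterates w_1..w_4 produced by some online rule.
its = [np.array([float(i)]) for i in range(1, 5)]
assert online_to_batch_average(its)[0] == 2.5
assert online_to_batch_random(its, np.random.default_rng(0))[0] in {1.0, 2.0, 3.0, 4.0}
```

Either output inherits the online algorithm's average regret as an excess-risk bound in the statistical setting.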
9 Online vs Statistical
Statistical learnability does not imply online learnability. Thresholds: θ ∈ [0,1], h_θ(x) = sign(x − θ), z = (x,y) with x ∈ [0,1], y ∈ ±1, and f(θ, (x,y)) = 1[y ≠ h_θ(x)] = 1[y ≠ sign(x − θ)].
- Statistically learnable with excess error ε using m = O(1/ε²) samples
- Online: an adversary can force the learner to make a mistake on every round of an adversarial sequence (Reg(m) = 1)
10 What is Learnable?
- Is a problem, specified by f and W, learnable? I.e., can we drive the regret / excess error ε to zero?
- Using what learning rule?
11 Supervised Classification
f(w;(x,y)) = loss(h_w(x), y) with binary hypotheses h_w(x) ∈ ±1:
{ h_w : w ∈ W } has finite VC dimension
⇔ uniform convergence: sup_{w∈W} |F(w) − F̂(w)| → 0 as m → ∞
⇔ learnable using ERM: ŵ = argmin F̂(w) satisfies F(ŵ) → F(w*)
⇔ learnable (using some rule): F(w̃) → F(w*)
(online learnable ⇒ learnable)
12 Supervised Classification
f(w;(x,y)) = loss(h_w(x), y) with real-valued h_w (and an appropriate loss function):
{ h_w : w ∈ W } has finite fat-shattering dimension
⇔ uniform convergence: sup_{w∈W} |F(w) − F̂(w)| → 0 as m → ∞
⇔ learnable using ERM: ŵ = argmin F̂(w) satisfies F(ŵ) → F(w*)
⇔ learnable (using some rule): F(w̃) → F(w*)
(online learnable ⇒ learnable)
[Alon Ben-David Cesa-Bianchi Haussler 93]
13 Beyond Supervised Learning
- Supervised learning, f(w,(x,y)) = loss(h_w(x), y): a combinatorial condition is necessary and sufficient for learnability, and ERM is universal (if learnable, learnable with ERM)
- General learning / stochastic optimization, arbitrary f(w,z): ????
14 Convex Lipschitz Problems
- W a convex bounded subset of a Hilbert space (e.g. R^d): ∀w ∈ W, ‖w‖₂ ≤ B
- For each z, f(w,z) convex and L-Lipschitz w.r.t. w: |f(w,z) − f(w′,z)| ≤ L‖w − w′‖₂
- Online Gradient Descent: Reg(m) ≤ O(√(B²L²/m))
- Stochastic setting?
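Online Gradient Descent for this convex-Lipschitz setting can be sketched as follows (a minimal sketch; the fixed step size η = B/(L√m) and the zero initialization are standard textbook choices, assumed here rather than taken from the slide):

```python
import numpy as np

def project_ball(w, B):
    """Euclidean projection onto the domain W = {w : ||w||_2 <= B}."""
    n = np.linalg.norm(w)
    return w if n <= B else w * (B / n)

def ogd(grad, m, d, B, L):
    """Projected online gradient descent: w_{i+1} = Proj_W(w_i - eta * g_i),
    with the standard fixed step eta = B / (L * sqrt(m))."""
    eta = B / (L * np.sqrt(m))
    w, iterates = np.zeros(d), []
    for i in range(m):
        iterates.append(w.copy())
        w = project_ball(w - eta * grad(i, w), B)
    return iterates

# Toy run: linear losses f(w; z) = <w, z> with z = e_1 every round;
# the minimizer over the unit ball is -e_1.
iterates = ogd(lambda i, w: np.array([1.0, 0.0]), m=100, d=2, B=1.0, L=1.0)
assert np.allclose(iterates[-1], [-1.0, 0.0])
assert all(np.linalg.norm(w) <= 1.0 + 1e-9 for w in iterates)
```

The projection step is what keeps the iterates in the bounded domain that the B²L²/m regret bound assumes.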
15 Stochastic Convex Optimization
Generalized linear objectives: f(w,z) = g(⟨w, φ(z)⟩; z) with g l_g-Lipschitz and ‖φ(z)‖ ≤ R, so f is L = l_g·R Lipschitz in w.
- Uniform convergence: sup_{w∈W} |F(w) − F̂(w)| ≤ O(√(B²L² log(1/δ)/m)) w.p. 1−δ, hence E[F(ŵ)] ≤ F(w*) + O(√(B²L²/m))
General Lipschitz-bounded problems:
- Learnable in the statistical setting? Of course, via online-to-batch!
- Using ERM? Is there uniform convergence? Finite VC?
16 Empirical Minimizer Not Consistent!
- f(w; α) = ‖α ⊙ w‖² (⊙ = elementwise product), W = {w ∈ R^d : ‖w‖ ≤ 1}, α ∈ {0,1}^d
- α ~ Uniform({0,1}^d), i.e. α[i] ~ iid uniform 0/1
- If d ≫ 2^m (think of d = ∞) then with high probability there exists a coordinate j that is never on in the sample, i.e. α_i[j] = 0 for all i = 1..m
- Then F̂(e_j) = 0 but F(e_j) = 1/2, so sup_{w∈W} |F(w) − F̂(w)| ≥ 1/2: no uniform convergence!
- e_j is an empirical minimizer with F(e_j) = 1/2, far from F(w*) = F(0) = 0
17 Empirical Minimizer Not Consistent!
- f(w; (α,x)) = ‖α ⊙ (w − x)‖² (⊙ = elementwise product), α ∈ {0,1}^d, x ∈ R^d, ‖x‖ ≤ 1
- α ~ Uniform({0,1}^d), i.e. α[i] ~ iid uniform 0/1
- If d ≫ 2^m (think of d = ∞) then with high probability there exists a coordinate j that is never on in the sample, i.e. α_i[j] = 0 for all i = 1..m
- Then F̂(e_j) = 0 but F(e_j) = 1/2, so sup_{w∈W} |F(w) − F̂(w)| ≥ 1/2: no uniform convergence!
- e_j is an empirical minimizer with F(e_j) = 1/2, far from F(w*) = F(0) = 0
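The construction on slides 16-17 is easy to simulate: when d ≫ 2^m, with high probability some coordinate is never "on" in the sample, so e_j has zero empirical risk even though its true risk is 1/2 (a quick numeric check; the dimensions are my choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 2 ** 14, 8                          # d >> 2^m, as on the slides
alphas = rng.integers(0, 2, size=(m, d))   # sample alpha_1..alpha_m ~ Uniform({0,1}^d)

# For f(w; alpha) = ||alpha * w||^2 (elementwise product):
# F_hat(e_j) = (1/m) * sum_i alpha_i[j], while F(e_j) = E[alpha[j]] = 1/2.
col_means = alphas.mean(axis=0)
j = int(np.argmin(col_means))              # a coordinate never "on" in the sample (w.h.p.)
assert col_means[j] == 0.0                 # F_hat(e_j) = 0, although F(e_j) = 1/2
```

So e_j is an exact empirical minimizer whose true risk stays bounded away from F(w*) = 0, which is the failure of uniform convergence the slides point to.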
18 General Setting
In the general setting, the implications only go one way (the converses hold for supervised learning):
{ z ↦ f(w;z) : w ∈ W } has finite fat-shattering dimension
⇒ uniform convergence: sup_{w∈W} |F(w) − F̂(w)| → 0 as m → ∞ (⇐ only for supervised learning)
⇒ learnable with ERM: F(ŵ) → F(w*) (⇐ only for supervised learning)
⇒ learnable (using some rule): F(w̃) → F(w*) (⇐ only for supervised learning; also implied by online learnability)
19 Stochastic Convex Optimization and Supervised Learning
For supervised learning objectives, e.g.
- hinge loss: f(w; (x,y)) = [1 − y·h_w(x)]₊ (y ∈ ±1)
- squared loss: f(w; (x,y)) = (h_w(x) − y)²
learnability is equivalent to uniform convergence.
What types of predictors h_w yield convex problems?
- When y = +1, h_w(x) should be convex in w
- When y = −1, h_w(x) should be concave in w
⇒ h_w(x) should be linear in w
Convex supervised learning = generalized linear objective (y ∈ {−1,+1})
20 Stochastic Convex Optimization
- Empirical minimization might not be consistent
- Learnable using a specific procedural rule (online-to-batch conversion of online gradient descent)
- ???
21 Strongly Convex Objective
f(w,z) is λ-strongly convex in w iff
f(αw + (1−α)w′, z) ≤ α f(w,z) + (1−α) f(w′,z) − (λ/2) α(1−α) ‖w − w′‖₂²
(equivalent to ∇²_w f(w,z) ⪰ λI).
If f(w,z) is λ-strongly convex and L-Lipschitz w.r.t. w, Online Gradient Descent achieves
Reg(m) ≤ O(L² log(m)/(λm))   [Hazan Kalai Kale Agarwal 2006]
Stochastic setting? ERM?
22 Strongly Convex Objective
f(w,z) is λ-strongly convex in w iff
f(αw + (1−α)w′, z) ≤ α f(w,z) + (1−α) f(w′,z) − (λ/2) α(1−α) ‖w − w′‖₂²
(equivalent to ∇²_w f(w,z) ⪰ λI).
If f(w,z) is λ-strongly convex and L-Lipschitz w.r.t. w, Online Gradient Descent achieves
Reg(m) ≤ O(L² log(m)/(λm))   [Hazan Kalai Kale Agarwal 2006]
Empirical Risk Minimization: E[F(ŵ)] ≤ F(w*) + O(L²/(λm))
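The strong-convexity inequality on this slide is easy to sanity-check numerically; for f(w) = ½‖w‖² (which is 1-strongly convex) it in fact holds with equality (a quick check of my own, not from the slides; the test points are random):

```python
import numpy as np

def satisfies_strong_convexity(f, lam, w, w2, alpha):
    """Check slide 21's inequality at one triple (w, w', alpha):
    f(a*w + (1-a)*w') <= a*f(w) + (1-a)*f(w') - (lam/2)*a*(1-a)*||w - w'||^2."""
    lhs = f(alpha * w + (1 - alpha) * w2)
    rhs = (alpha * f(w) + (1 - alpha) * f(w2)
           - 0.5 * lam * alpha * (1 - alpha) * np.sum((w - w2) ** 2))
    return lhs <= rhs + 1e-9   # small tolerance for float round-off

f = lambda w: 0.5 * np.sum(w ** 2)   # 0.5*||w||^2 is 1-strongly convex
rng = np.random.default_rng(1)
for _ in range(100):
    w, w2 = rng.normal(size=3), rng.normal(size=3)
    assert satisfies_strong_convexity(f, 1.0, w, w2, rng.uniform())
```

For ½‖w‖² the two sides agree exactly (the parallelogram identity), which is why it is the canonical λ = 1 example.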
23 Strong Convexity and Stability
Definition: a rule w(z_1,…,z_m) is β(m)-stable if
|f(w(z_1,…,z_{m−1}), z_m) − f(w(z_1,…,z_m), z_m)| ≤ β(m)
- If f is λ-strongly convex and L-Lipschitz, the ERM is stable with β(m) = 4L²/(λm)
- If a symmetric rule w is β-stable, then E[F(w)] ≤ E[F̂(w)] + β(m)
- Conclusion for the ERM: E[F(ŵ)] ≤ E[F̂(ŵ)] + β(m) ≤ E[F̂(w*)] + β(m) = F(w*) + β(m)
24 General Setting (continued)
{ z ↦ f(w;z) : w ∈ W } has finite fat-shattering dimension
⇒ uniform convergence: sup_{w∈W} |F(w) − F̂(w)| → 0 (⇐ only for supervised learning; in the convex example there is not even local uniform convergence)
⇒ empirical minimizer is consistent: F(ŵ) → F(w*) (⇐ only for supervised learning)
⇒ solvable (using some algorithm): F(w̃) → F(w*) (⇐ only for supervised learning; also implied by online learnability)
25 Back to Weak Convexity
- f(w,z) L-Lipschitz (and convex), ‖w‖₂ ≤ B
- Use regularized ERM: w_λ = argmin_{w∈W} F̂(w) + (λ/2)‖w‖₂²
- Setting λ = √(L²/(B²m)): E[F(w_λ)] ≤ F(w*) + O(√(L²B²/m))
- Key: the strongly convex regularizer ensures stability
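Regularized ERM with the slide's setting λ = √(L²/(B²m)) can be sketched in one dimension with the 1-Lipschitz loss f(w; z) = |w − z| (my choice of loss, and a brute-force grid solver for illustration only):

```python
import numpy as np

def regularized_erm(zs, lam, grid):
    """Slide 25's rule w_lambda = argmin_w F_hat(w) + (lam/2)*||w||^2 for the
    1-D, 1-Lipschitz loss f(w; z) = |w - z|, solved by grid search."""
    objective = (np.mean(np.abs(grid[:, None] - zs[None, :]), axis=1)
                 + 0.5 * lam * grid ** 2)
    return grid[np.argmin(objective)]

rng = np.random.default_rng(0)
zs = rng.uniform(0.4, 0.6, size=1000)   # unregularized minimizer is the median, near 0.5
B, L, m = 1.0, 1.0, len(zs)
lam = np.sqrt(L ** 2 / (B ** 2 * m))    # the slide's setting lam = sqrt(L^2/(B^2 m))
w = regularized_erm(zs, lam, np.linspace(-1, 1, 2001))
assert abs(w - np.median(zs)) < 0.1     # small regularization barely moves the solution
```

With this λ the regularizer perturbs the empirical minimizer only by O(1/√m), while (per the slide) buying the stability that plain ERM lacks.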
26 The Role of Regularization
Typical view of regularization: adding a regularization term effectively constrains the domain to a lower-complexity domain W_r = {w : ‖w‖ ≤ r}. Learning guarantees (e.g. for SVMs, LASSO) are actually for empirical minimization inside W_r, and are based on uniform convergence in W_r.
In our case:
- No uniform convergence in W_r, for any r > 0
- No uniform convergence even of the regularized loss
- Cannot solve the stochastic optimization problem by restricting to W_r, for any r
What regularization buys is stability.
27 Stability Characterizes Learnability
Theorem: learnable with a (symmetric) ERM ŵ iff for every distribution D,
E[|f(ŵ(z_1,…,z_{m−1}), z_m) − f(ŵ(z_1,…,z_m), z_m)|] ≤ β(m)
for some β(m) → 0.
Theorem: learnable iff there exists a symmetric rule w s.t. for every distribution D:
- w is an almost ERM (AERM): E[F̂(w) − F̂(ŵ)] ≤ ε(m)
- w is stable: E[|f(w(z_1,…,z_{m−1}), z_m) − f(w(z_1,…,z_m), z_m)|] ≤ β(m)
for some ε(m) → 0, β(m) → 0.
Summary: finite fat-shattering dimension ⇒ uniform convergence ⇒ stable ERM ⇔ learnable with ERM (the converses hold for supervised learning); stable AERM ⇔ learnable.
28 Stability in Online Learning
Reminder: a rule w(z_1,…,z_m) is β(m)-stable if |f(w(z_1,…,z_{m−1}), z_m) − f(w(z_1,…,z_m), z_m)| ≤ β(m).
- Follow The Leader (FTL): w_m(z_1,…,z_{m−1}) = argmin_w Σ_{i=1}^{m−1} f(w, z_i)
- Be The Leader (BTL): w_m(z_1,…,z_m) = argmin_w Σ_{i=1}^{m} f(w, z_i); Reg_BTL(m) ≤ 0
- If the ERM is β(m)-stable: Reg_FTL(m) ≤ Reg_BTL(m) + (1/m) Σ_i β(i) ≤ (1/m) Σ_i β(i)
- E.g. for λ-strongly convex and L-Lipschitz f: Reg_FTL(m) ≤ O(L² log(m)/(λm))
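FTL is particularly transparent for the λ = 1 strongly convex loss f(w; z) = ½‖w − z‖², where the leader is just the running mean of the past samples (a sketch; the choice of loss and of the zero first iterate are mine):

```python
import numpy as np

def ftl_iterates(zs):
    """Follow The Leader for f(w; z) = 0.5*||w - z||^2: the leader
    argmin_w sum_{i<m} f(w, z_i) is the running mean of z_1..z_{m-1}."""
    ws, total = [np.zeros_like(zs[0])], np.zeros_like(zs[0])  # w_1 arbitrary
    for i, z in enumerate(zs[:-1], start=1):
        total += z
        ws.append(total / i)      # w_{i+1} = mean(z_1..z_i): depends only on the past
    return ws

def avg_regret(zs):
    ws = ftl_iterates(zs)
    f = lambda w, z: 0.5 * np.sum((w - z) ** 2)
    best = np.mean(zs, axis=0)    # offline minimizer of sum_i f(w, z_i)
    return np.mean([f(w, z) - f(best, z) for w, z in zip(ws, zs)])

rng = np.random.default_rng(0)
zs = rng.normal(size=(1000, 2))
assert avg_regret(zs) < 0.1       # average regret decays like O(log m / m)
```

The small measured average regret matches the O(L² log(m)/(λm)) rate obtained on the slide from the stability of the ERM.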
29 Regularization and Stability
How do we deal with a non-strongly-convex objective?
- Follow The Regularized Leader (FTRL): w_m(z_1,…,z_{m−1}) = argmin_w Σ_{i=1}^{m−1} f(w, z_i) + λR(w)
- Be The Regularized Leader (BTRL): w_m(z_1,…,z_m) = argmin_w Σ_{i=1}^{m} f(w, z_i) + λR(w); Reg_BTRL(m) ≤ λR(w*)
- If the regularized ERM is β(λ,m)-stable: Reg_FTRL(m) ≤ Reg_BTRL(m) + (1/m) Σ_i β(λ,i) ≤ λR(w*) + (1/m) Σ_i β(λ,i)
- Follow The Almost Leader (FTAL): any w_m s.t. (1/(m−1)) Σ_{i=1}^{m−1} f(w_m, z_i) ≤ inf_w (1/(m−1)) Σ_{i=1}^{m−1} f(w, z_i) + ε(m)
- Then Reg_FTAL(m) ≤ Reg_BTAL(m) + (1/m) Σ_i β(i) ≤ (1/m) Σ_i ε(i) + (1/m) Σ_i β(i)
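FTRL likewise has a closed form in the simplest case of linear losses with R(w) = ½‖w‖² (an unconstrained sketch of the update only, which is my simplification; with a bounded domain one would add a projection):

```python
import numpy as np

def ftrl_linear(zs, lam):
    """Follow The Regularized Leader (slide 29) for linear losses
    f(w; z) = <w, z> and R(w) = 0.5*||w||^2:
    w_m = argmin_w sum_{i<m} <w, z_i> + (lam/2)*||w||^2 = -(1/lam) * sum_{i<m} z_i."""
    ws, total = [], np.zeros_like(zs[0])
    for z in zs:
        ws.append(-total / lam)   # w_m depends only on z_1..z_{m-1}
        total += z
    return ws

ws = ftrl_linear([np.array([1.0]), np.array([1.0]), np.array([1.0])], lam=1.0)
assert [w[0] for w in ws] == [0.0, -1.0, -2.0]
```

This closed form is exactly the "lazy" version of gradient descent, which is the bridge to the next slides.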
30 Regularization in Online Gradient Descent
w_{i+1} ← argmin_w [ F̂(w_i) + ⟨∇_i, w − w_i⟩ + (1/2η)‖w − w_i‖₂² ] = w_i − η∇_i, where ∇_i = ∇f(w_i, z_i)
- The first term is a 1st-order model of F(w) around w_i, based on f(w_i, z_i) only
- The model is only valid near w_i, so don't go too far
(figure: F(w) and its first-order model F(w_i) + ⟨∇F(w_i), w − w_i⟩ at w_i)
31 Regularization in Online Gradient Descent
w_{i+1} ← argmin_w [ ⟨∇_i, w − w_i⟩ + (1/2η)‖w − w_i‖₂² ] = argmin_w [ η⟨∇_i, w⟩ + ½‖w − w_i‖₂² ] = w_i − η∇_i
(figure: F(w) and its first-order model F(w_i) + ⟨∇F(w_i), w − w_i⟩ at w_i)
32 Regularization in Online Gradient Descent
w_{i+1} ← argmin_w [ η⟨∇_i, w⟩ + ½‖w − w_i‖₂² ] = w_i − η∇_i
- Replace ½‖·‖₂² with another regularizer?
- Effect of the regularization: performance depends on ‖w*‖₂²
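The closed form claimed on slides 30-32, that the minimizer of the linearized objective plus the proximal term is exactly w_i − η∇_i, can be checked numerically in one dimension (the particular values are chosen arbitrarily):

```python
import numpy as np

# Minimize  g*w + (w - w_i)^2 / (2*eta)  by brute force over a fine grid
# and compare with the closed form w_i - eta*g from the slides.
eta, w_i, g = 0.5, 0.3, 2.0
grid = np.linspace(-5, 5, 100001)
obj = g * grid + (grid - w_i) ** 2 / (2 * eta)
w_next = grid[np.argmin(obj)]
assert abs(w_next - (w_i - eta * g)) < 1e-3   # closed form: 0.3 - 0.5*2.0 = -0.7
```

Setting the derivative g + (w − w_i)/η to zero gives the same answer analytically, which is why the quadratic regularizer reproduces plain gradient descent.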
33 Online Mirror Descent
Main ingredient: R(w) strongly convex w.r.t. some norm ‖·‖:
R(αw + (1−α)w′) ≤ αR(w) + (1−α)R(w′) − (λ/2) α(1−α) ‖w − w′‖²
Mirror Descent update:
w_{i+1} ← argmin_{w∈W} [ η⟨∇f(w_i, z_i), w⟩ + D_R(w ‖ w_i) ] = Π^R_W( (∇R)^{−1}( ∇R(w_i) − η∇_i ) )
where D_R is the Bregman divergence of R.
If R(w*) ≤ B² and f(·,z) is L-Lipschitz w.r.t. ‖·‖, i.e. |f(w,z) − f(w′,z)| ≤ L‖w − w′‖, i.e. ‖∇f‖_* ≤ L (for generalized linear objectives: ‖φ‖_* ≤ L), then
Reg(m) ≤ O(√(B²L²/m))
Example: with R(w) = entropy(w) = KL(w ‖ unif) and ‖w‖₁ ≤ 1, MD = Multiplicative Weights / Exponentiated Gradient.
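The entropy-regularized instance mentioned on this slide (Multiplicative Weights / Exponentiated Gradient) amounts to a multiplicative update followed by normalization (a minimal sketch; the step size and the toy losses are my choices):

```python
import numpy as np

def eg_step(w, grad, eta):
    """One Exponentiated Gradient / Multiplicative Weights step: mirror descent
    on the simplex with the (negative) entropy regularizer. The exp is the
    mirror-map update and the normalization is the KL projection to the simplex."""
    w = w * np.exp(-eta * grad)
    return w / w.sum()

# Linear losses f(w; z) = <w, z> with coordinate 0 always cheapest:
w = np.full(3, 1.0 / 3.0)
z = np.array([0.0, 1.0, 1.0])
for _ in range(100):
    w = eg_step(w, z, eta=0.1)
assert w[0] > 0.99               # mass concentrates on the best coordinate
assert abs(w.sum() - 1.0) < 1e-9
```

With the Euclidean regularizer the same template recovers projected gradient descent; swapping R is all that changes between the two.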
34 Strong Convexity as the Master Property
R(w) strongly convex w.r.t. ‖w‖ yields:
- Uniform convergence of { x ↦ ⟨w, x⟩ : ‖w‖ ≤ B } ⇒ statistical learnability of generalized linear objectives (including supervised learning)
- Stability of the RERM argmin F̂(w) + λR(w) ⇒ statistical learnability of { f : f L-Lipschitz } over { ‖w‖ ≤ B }
- Online Mirror Descent / FTRL ⇒ online learnability of { f : f L-Lipschitz } over { ‖w‖ ≤ B }
35 Answers and Questions
Statistical setting:
- A stable AERM (almost ERM) is necessary and sufficient
- Is a stable RERM (regularized ERM) necessary? I.e., can we always take w_λ = argmin F̂(w) + λR(w)?
Online general setting:
- A stable FTRL (Follow The Regularized Leader) is sufficient
- Is a stable FTAL (Follow The Almost Leader) sufficient? Is stability/regularization necessary and sufficient?
Convex problems:
- Strong convexity is necessary and sufficient for online learning; strong convexity implies stability; Online Mirror Descent (and FTRL) is universal
- Strong convexity is sufficient and almost necessary for stochastic learning
- Direct analysis of Mirror Descent in terms of stability?
More informationPAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning Matt Gormley Lecture 14 March 5, 2018 1 ML Big Picture Learning Paradigms:
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationActive Learning: Disagreement Coefficient
Advanced Course in Machine Learning Spring 2010 Active Learning: Disagreement Coefficient Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz In previous lectures we saw examples in which
More informationLogistic Regression Logistic
Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,
More informationLecture 23: Online convex optimization Online convex optimization: generalization of several algorithms
EECS 598-005: heoretical Foundations of Machine Learning Fall 2015 Lecture 23: Online convex optimization Lecturer: Jacob Abernethy Scribes: Vikas Dhiman Disclaimer: hese notes have not been subjected
More informationA Second-order Bound with Excess Losses
A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands
More informationLearning Theory. Machine Learning CSE546 Carlos Guestrin University of Washington. November 25, Carlos Guestrin
Learning Theory Machine Learning CSE546 Carlos Guestrin University of Washington November 25, 2013 Carlos Guestrin 2005-2013 1 What now n We have explored many ways of learning from data n But How good
More informationMaking Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania
More informationMachine Learning. Computational Learning Theory. Eric Xing , Fall Lecture 9, October 5, 2016
Machine Learning 10-701, Fall 2016 Computational Learning Theory Eric Xing Lecture 9, October 5, 2016 Reading: Chap. 7 T.M book Eric Xing @ CMU, 2006-2016 1 Generalizability of Learning In machine learning
More informationAccelerating Online Convex Optimization via Adaptive Prediction
Mehryar Mohri Courant Institute and Google Research New York, NY 10012 mohri@cims.nyu.edu Scott Yang Courant Institute New York, NY 10012 yangs@cims.nyu.edu Abstract We present a powerful general framework
More informationExponentiated Gradient Descent
CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationarxiv: v1 [cs.lg] 8 Feb 2018
Online Learning: A Comprehensive Survey Steven C.H. Hoi, Doyen Sahoo, Jing Lu, Peilin Zhao School of Information Systems, Singapore Management University, Singapore School of Software Engineering, South
More informationOnline Sparse Linear Regression
JMLR: Workshop and Conference Proceedings vol 49:1 11, 2016 Online Sparse Linear Regression Dean Foster Amazon DEAN@FOSTER.NET Satyen Kale Yahoo Research SATYEN@YAHOO-INC.COM Howard Karloff HOWARD@CC.GATECH.EDU
More informationCOS 402 Machine Learning and Artificial Intelligence Fall Lecture 3: Learning Theory
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 3: Learning Theory Sanjeev Arora Elad Hazan Admin Exercise 1 due next Tue, in class Enrolment Recap We have seen: AI by introspection
More informationLearnability of the Superset Label Learning Problem
Learnability of the Superset Label Learning Problem Li-Ping Liu LIULI@EECSOREGONSTATEEDU Thomas G Dietterich TGD@EECSOREGONSTATEEDU School of Electrical Engineering and Computer Science, Oregon State University,
More informationAdaptive Online Learning in Dynamic Environments
Adaptive Online Learning in Dynamic Environments Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 210023, China {zhanglj, lusy, zhouzh}@lamda.nju.edu.cn
More informationRegression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011!
Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011! 1 Todayʼs topics" Learning from Examples: brief review! Univariate Linear Regression! Batch gradient descent! Stochastic gradient
More informationComputational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory
More information