Online (and Distributed) Learning with Information Constraints. Ohad Shamir
Online (and Distributed) Learning with Information Constraints
Ohad Shamir, Weizmann Institute of Science
Online Algorithms and Learning Workshop, Leiden, November 2014
Ohad Shamir Learning with Information Constraints 1/23

Main Question
How well can we learn with information constraints on how we can interact with data?

Partial-Information Constraints
Only part of the data is visible to the learning algorithm.
Examples:
- A few coordinates from each example: multi-armed bandits + variants, attribute-efficient learning, missing features...
- A linear projection of each example: bandit linear optimization
- Some other function of each example: partial monitoring; bandit convex optimization...
Price of partial information: known in some cases, setting-dependent.

Memory Constraints
Example (Online Convex Optimization)
- Consider the squared loss or exp-concave losses.
- O(log(T)) regret is possible, but known algorithms require Ω(d²) memory [Vovk 1998; Azoury and Warmuth 2001; Hazan et al. 2007].
- O(√T) regret is possible with O(d)-memory algorithms (e.g. online gradient descent).
Is there a price to pay for linear-memory methods?

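Online gradient descent keeps only the current iterate in memory, which is what makes it an O(d)-memory method. A minimal sketch on online least-squares regression; the data stream, target vector and step-size schedule below are illustrative assumptions, not from the talk:

```python
import numpy as np

def online_gradient_descent(stream, eta):
    """O(d)-memory online learner: the entire state is the iterate w."""
    w = np.zeros(stream[0][0].shape[0])
    for t, (x, y) in enumerate(stream, start=1):
        err = w @ x - y                     # suffer squared loss 0.5*err**2
        w -= (eta / np.sqrt(t)) * err * x   # gradient step, step size eta/sqrt(t)
    return w

rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])
stream = []
for _ in range(2000):
    x = rng.normal(size=3)
    stream.append((x, w_star @ x + 0.01 * rng.normal()))

w = online_gradient_descent(stream, eta=0.1)
print(np.round(w, 2))  # close to w_star
```

By contrast, the Ω(d²)-memory algorithms cited above maintain a d×d matrix (e.g. second-order statistics of the data) to achieve O(log(T)) regret on such losses.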
Memory Constraints
Example (Online PCA)
- i.i.d. data stream x_1, ..., x_m ∈ R^d
- Goal: find the direction(s) with most variance.
- Solution: consider the empirical covariance matrix (1/m) Σ_{i=1}^m x_i x_iᵀ.
- This is a d×d matrix: too large in high dimensions.
Can we do online PCA with limited memory? [Warmuth and Kuzmin 2008; Arora et al. 2012; Mitliagkas et al. 2013; Balsubramani et al.]

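The cited algorithms differ in their guarantees, but the shared idea of tracking a top principal direction with O(d) memory, rather than storing the d×d covariance matrix, is visible in Oja's classic update. A sketch (the synthetic stream and step size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, eta = 10, 5000, 0.01

# Synthetic i.i.d. stream whose variance is largest along coordinate 0
scales = np.ones(d)
scales[0] = 3.0

w = rng.normal(size=d)
w /= np.linalg.norm(w)
for _ in range(m):
    x = rng.normal(size=d) * scales  # one sample; never store more than O(d)
    w += eta * x * (x @ w)           # Oja's update: w += eta * (x x^T) w
    w /= np.linalg.norm(w)           # renormalize to the unit sphere

print(abs(w[0]))  # alignment with the top direction e_1; should be near 1
```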
Communication Constraints
- Training examples are partitioned across a distributed system.
- Communication is (relatively) slow and expensive.
- Can we learn with small communication complexity?

Main Questions
- Can we quantify, information-theoretically, how such constraints affect our performance?
- Can we trade off between data size and information constraints?
[Figure: a feasible region in the plane of memory/communication/information per example versus sample complexity (data size)]

Related Work
- Problem-specific lower bounds (e.g. multi-armed bandits).
- Learning with communication constraints [e.g. Balcan et al. 2012, Zhang et al. 2013]: non-i.i.d. data or very small communication budgets.
- Communication/space bounds in theoretical computer science, but not for learning problems and/or with non-i.i.d. data.
- Statistical estimation with limited memory [e.g. Hellman and Cover 1970, Leighton and Rivest 1986, Ertin and Potter 2003, Kontorovich 2012]: asymptotic / small-memory regime.

(b, n, m) Protocols
Definition. For t = 1, ..., m:
- Receive n i.i.d. instances X_t
- Compute W_t = f_t(X_t, W_1, W_2, ..., W_{t-1}), where W_t is at most b bits
Return f(W_1, ..., W_m).
[Figure: each batch X_t enters the learning algorithm, which emits a b-bit message W_t based on X_t and the past messages W_1, ..., W_{t-1}]

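The definition translates directly into a small simulator. A sketch, with a toy instance (estimating a coin's bias using one bit per round) that is an illustrative assumption, not an example from the talk:

```python
import random

def run_protocol(draw, message_fn, estimator, b, n, m, rng):
    """Generic (b, n, m) protocol: m rounds, n fresh i.i.d. instances per
    round; each message W_t is a bit string of at most b bits and may
    depend on all previous messages W_1, ..., W_{t-1}."""
    messages = []
    for _ in range(m):
        X_t = [draw(rng) for _ in range(n)]   # receive n i.i.d. instances
        W_t = message_fn(X_t, messages)       # W_t = f_t(X_t, W_1, ..., W_{t-1})
        assert len(W_t) <= b, "message exceeds the b-bit budget"
        messages.append(W_t)
    return estimator(messages)                # return f(W_1, ..., W_m)

# Toy instance: estimate the bias of a coin with b = 1 bit per round.
rng = random.Random(0)
est = run_protocol(
    draw=lambda r: 1 if r.random() < 0.7 else 0,
    message_fn=lambda X, past: "1" if X[0] == 1 else "0",  # forward the flip
    estimator=lambda ms: sum(w == "1" for w in ms) / len(ms),
    b=1, n=1, m=10000, rng=rng,
)
print(est)  # close to the true bias 0.7
```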
Examples of (b, n, m) Protocols
Bounded-memory online algorithms. For t = 1, ..., m:
- Receive an i.i.d. instance x_t
- Compute W_t = f_t(x_t, W_{t-1}), where W_t is at most b bits
Return f(W_m).
[Figure: the b-bit state W_{t-1} and the new instance x_t enter the learning algorithm, which emits the updated state W_t]

Examples of (b, n, m) Protocols
Learning from expert advice with partial information. For t = 1, ..., m:
- A reward vector x_t is generated
- Observe W_t = f_t(x_t, W_1, W_2, ..., W_{t-1}), where W_t is at most b bits
E.g. for multi-armed bandits, W_t = x_{t,i_t}, where the coordinate i_t is selected based on the past observations x_{1,i_1}, ..., x_{t-1,i_{t-1}}.
[Figure: out of the full reward vector x_t, the learning algorithm observes only x_{t,i_t}]

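In the multi-armed bandit instantiation, each round reveals exactly one coordinate of the reward vector. The slide does not fix a particular algorithm; as an illustration of the feedback model, here is a minimal ε-greedy learner (all parameters below are assumptions for the demo):

```python
import numpy as np

def epsilon_greedy(rewards, eps, rng):
    """T rounds over d arms; each round the learner observes only x_{t, i_t}."""
    T, d = rewards.shape
    counts, means, total = np.zeros(d), np.zeros(d), 0.0
    for t in range(T):
        if rng.random() < eps:
            i = int(rng.integers(d))      # explore a random arm
        else:
            i = int(np.argmax(means))     # exploit the best current estimate
        r = rewards[t, i]                 # partial information: one coordinate
        total += r
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # running mean of arm i
    return total / T

rng = np.random.default_rng(0)
T, d = 5000, 5
probs = np.array([0.2, 0.3, 0.4, 0.5, 0.8])          # arm 4 is best
rewards = (rng.random((T, d)) < probs).astype(float)
avg = epsilon_greedy(rewards, eps=0.1, rng=rng)
print(avg)  # approaches the best arm's mean reward 0.8
```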
Examples of (b, n, m) Protocols
Non-interactive distributed learning.
[Figure: machines 1, ..., m hold samples X_1, ..., X_m, and each independently sends a message W_t to a central aggregator]

Examples of (b, n, m) Protocols
One-pass distributed learning.
[Figure: machines 1, ..., m hold samples X_1, ..., X_m; the messages W_1, W_2, ... are passed sequentially from each machine to the next]

Hide-and-Seek Problem
- Instances x_t are drawn i.i.d. from a product distribution on {−1, +1}^d.
- A single unknown coordinate j ∈ {1, ..., d} is biased: E[x_t] = ρ·e_j.
- Goal: given the sampled data, find j.

Result for (b, 1, m) Protocols (e.g. bounded memory)
Theorem. Given m i.i.d. instances with bias ρ:
- The biased coordinate can be detected if m ≳ log(d)/ρ²: just return the coordinate J with the highest empirical mean.
- Any (b, 1, m) protocol will fail if m ≲ (d/b)/ρ².
Tight up to log factors; can be generalized to n > 1.

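The upper bound's detector is just "average and take the argmax". A quick simulation with m on the order of log(d)/ρ² (the constants and the hidden coordinate are chosen for the demo, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
d, rho = 1000, 0.1
j = 42                                 # the hidden biased coordinate
m = int(20 * np.log(d) / rho**2)       # sample size ~ log(d)/rho^2

# x_t in {-1,+1}^d; coordinate j has mean rho, all others have mean 0
p = np.full(d, 0.5)
p[j] = (1 + rho) / 2
X = np.where(rng.random((m, d)) < p, 1.0, -1.0)

J = int(np.argmax(X.mean(axis=0)))     # coordinate with highest empirical mean
print(J)  # recovers the hidden coordinate with high probability
```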
Proof Idea
[Figure: the message W_t is computed from the coordinates x_{t,1}, ..., x_{t,d}, of which only x_{t,j} depends on j]
- As long as W_t doesn't "know" j, it can only convey ≈ b/d bits on x_{t,j} in expectation.
- x_{t,j} conveys at most ≈ ρ² bits on j.
- Standard data processing inequality: W_t conveys ≲ min{ρ², b/d} bits on j.
- We prove an information contraction inequality: W_t conveys ≲ ρ²·b/d bits on j.

Proof Idea (continued)
- W_t conveys ≲ ρ²·b/d bits on j.
- After m rounds: ≲ m·ρ²·b/d bits of information on j.
- Insufficient to detect j if m ≲ (d/b)/ρ².

Application: Online Learning with Partial Information
- T reward vectors x_1, ..., x_T are chosen from {0, 1}^d; some b bits of each x_t are sequentially observed.
- Theorem: expected regret Ω(√((d/b)·T)).
- The result is generic: it holds regardless of what those b bits are.
  - The chosen coordinate x_{t,i_t} (multi-armed bandits)
  - Some other coordinate x_{t,j_t}, or some subset of coordinates x_{t,j_1}, x_{t,j_2}, ..., x_{t,j_k} (semi-bandit feedback; prediction with limited advice; bandits with side-information; attribute-efficient learning...)
  - A linear projection of x_t (bandit linear optimization)
  - Feedback through some bounded-width feedback matrix (partial monitoring)...
- It holds even if the algorithm can choose the b bits based on x_t!

Application: Stochastic/Online Optimization
Theorem. There exist stochastic/online optimization problems* in R^d such that:
- Õ(√T) regret is possible;
- any b-memory online algorithm (or b-communication distributed algorithm) has Ω(√((d²/b)·T)) regret.
(* Caveat: non-convex, but efficiently solvable.)
Conclusion: any online algorithm with o(d²) memory is suboptimal (e.g. gradient descent, mirror descent).
New memory-sample and communication-sample trade-offs: in a statistical setting, one can get the same average regret with more data, even under memory/communication constraints.

Application: Sparse PCA / Covariance Estimation
Simplest case (detecting correlations): given an i.i.d. sample x_1, ..., x_m, detect a single pair of correlated features.
- Statistically optimal methods use the empirical covariance matrix (1/m) Σ_{i=1}^m x_i x_iᵀ.
- Problem: this requires too much communication/memory when the dimension d and the sample size m are large.
- We show: there are situations where no memory/communication-efficient method can be statistically optimal.

Application: Sparse PCA / Covariance Estimation
Theorem. Suppose E[xxᵀ] = I_d + τ·(E_{i,j} + E_{j,i}) for unknown i, j, and that given m samples, for all i ≠ j, Pr(|(1/m) Σ_t x_{t,i} x_{t,j} − E[x_i x_j]| ≥ τ/2) ≤ 2·exp(−mτ²/6). Then for τ = Θ(1/d):
- m ≳ d²: enough to return the largest off-diagonal entry of the empirical covariance matrix.
- m ≲ d⁴/b: any b-memory online algorithm / (b, n, m) protocol (for appropriate n) will fail on some such distribution.

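The estimator in the m ≳ d² regime is again simple: form the empirical covariance and take the largest off-diagonal entry. A sketch; for a fast demo the correlation τ is taken much larger than the Θ(1/d) scaling in the theorem, and all parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, tau = 20, 2000, 0.4
i_true, j_true = 3, 11                 # the hidden correlated pair

# Covariance I_d + tau*(E_ij + E_ji): features i and j are correlated
cov = np.eye(d)
cov[i_true, j_true] = cov[j_true, i_true] = tau
X = rng.multivariate_normal(np.zeros(d), cov, size=m)

C = (X.T @ X) / m                      # empirical covariance matrix
np.fill_diagonal(C, -np.inf)           # ignore the (variance) diagonal
i_hat, j_hat = np.unravel_index(np.argmax(C), C.shape)
print(sorted((int(i_hat), int(j_hat))))  # the detected pair
```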
Summary
- A generic framework for information constraints in learning.
- The same results apply to:
  - different information constraints (memory, communication, partial information...);
  - different learning settings (multi-armed bandits and variants, stochastic/online optimization, sparse PCA and covariance estimation...).
- New resource trade-offs: more data, less memory; more data, less communication (in a natural regime).
- Open question: information trade-offs for other learning settings?

More details: "Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation", on arXiv and at NIPS 2014.
THANKS!
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More information1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016
AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework
More informationAdvanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12
Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview
More informationLinear Probability Forecasting
Linear Probability Forecasting Fedor Zhdanov and Yuri Kalnishkan Computer Learning Research Centre and Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, TW0 0EX, UK {fedor,yura}@cs.rhul.ac.uk
More informationAllocating Resources, in the Future
Allocating Resources, in the Future Sid Banerjee School of ORIE May 3, 2018 Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making online resource allocation: basic model......
More informationApproximate Principal Components Analysis of Large Data Sets
Approximate Principal Components Analysis of Large Data Sets Daniel J. McDonald Department of Statistics Indiana University mypage.iu.edu/ dajmcdon April 27, 2016 Approximation-Regularization for Analysis
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationGibbs Sampling in Linear Models #2
Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationSummary and discussion of: Dropout Training as Adaptive Regularization
Summary and discussion of: Dropout Training as Adaptive Regularization Statistics Journal Club, 36-825 Kirstin Early and Calvin Murdock November 21, 2014 1 Introduction Multi-layered (i.e. deep) artificial
More informationLearnability, Stability, Regularization and Strong Convexity
Learnability, Stability, Regularization and Strong Convexity Nati Srebro Shai Shalev-Shwartz HUJI Ohad Shamir Weizmann Karthik Sridharan Cornell Ambuj Tewari Michigan Toyota Technological Institute Chicago
More informationStructured Prediction
Structured Prediction Ningshan Zhang Advanced Machine Learning, Spring 2016 Outline Ensemble Methods for Structured Prediction[1] On-line learning Boosting AGeneralizedKernelApproachtoStructuredOutputLearning[2]
More informationClassification: Logistic Regression from Data
Classification: Logistic Regression from Data Machine Learning: Jordan Boyd-Graber University of Colorado Boulder LECTURE 3 Slides adapted from Emily Fox Machine Learning: Jordan Boyd-Graber Boulder Classification:
More informationCollaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationEvaluation of multi armed bandit algorithms and empirical algorithm
Acta Technica 62, No. 2B/2017, 639 656 c 2017 Institute of Thermomechanics CAS, v.v.i. Evaluation of multi armed bandit algorithms and empirical algorithm Zhang Hong 2,3, Cao Xiushan 1, Pu Qiumei 1,4 Abstract.
More informationOracle Complexity of Second-Order Methods for Smooth Convex Optimization
racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il
More informationMachine Learning Theory (CS 6783)
Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Hollister, 306 Instructor : Karthik Sridharan ABOUT THE COURSE No exams! 5 assignments that count towards your grades (55%) One term project (40%)
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationStatistical Machine Learning from Data
January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Other Artificial Neural Networks Samy Bengio IDIAP Research Institute, Martigny, Switzerland,
More informationAn Experimental Evaluation of High-Dimensional Multi-Armed Bandits
An Experimental Evaluation of High-Dimensional Multi-Armed Bandits Naoki Egami Romain Ferrali Kosuke Imai Princeton University Talk at Political Data Science Conference Washington University, St. Louis
More informationPresentation in Convex Optimization
Dec 22, 2014 Introduction Sample size selection in optimization methods for machine learning Introduction Sample size selection in optimization methods for machine learning Main results: presents a methodology
More informationStochastic and Adversarial Online Learning without Hyperparameters
Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford
More informationCommunication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality
Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality Mark Braverman Ankit Garg Tengyu Ma Huy Nguyen David Woodruff DIMACS Workshop on Big Data through
More informationSimple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017
Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient
More informationBandit Convex Optimization: T Regret in One Dimension
Bandit Convex Optimization: T Regret in One Dimension arxiv:1502.06398v1 [cs.lg 23 Feb 2015 Sébastien Bubeck Microsoft Research sebubeck@microsoft.com Tomer Koren Technion tomerk@technion.ac.il February
More informationDoes Better Inference mean Better Learning?
Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract
More information