Applications of on-line prediction. in telecommunication problems
|
|
- Dwayne King
- 5 years ago
- Views:
Transcription
1 Applications of on-line prediction in telecommunication problems Gábor Lugosi Pompeu Fabra University, Barcelona based on joint work with András György and Tamás Linder 1
2 Outline On-line prediction; Some algorithmic issues; An application to lossy data compression without delay; An application to a routing problem. 2
3 Randomized prediction A game between forecaster and environment. At each round t, the forecaster chooses an action I t {1,..., N}; the environment chooses an action y t Y; the forecaster suffers loss l(i t, y t ) [0, 1]. The goal is to minimize the cumulative excess loss ( n ) 1 n l(i t, y t ) min l(i, y t ) n i N The forecaster may randomize. At time t chooses a probability distribution p t = (p 1,t,..., p N,t ) and plays action i with probability p i,t. Actions are often called experts.. 3
4 Randomized prediction This and related models have been studied in game theory: playing repeated games; information theory: gambling and data compression; statistics: sequential decisions; statistical learning theory: on-line learning; The simplest model assumes that after each round, the losses l(i, y t ) (i = 1,..., N) are revealed (full information). In this model Hannan (1957) showed that the forecaster has a strategy such that ( n ) 1 n l(i t, y t ) min l(i, y t ) 0 a.s. n i N 4
5 Hannan consistency: basic ideas Obviously, the forecaster must randomize. l(p t, y t ) = N p i,t l(i, y t ) = E t l(i t, y t ) i=1 denotes the expected loss of the forecaster. By martingale convergence, ( 1 n l(i t, y t ) n so it suffices to study ( 1 n n ) n l(p t, y t ) = O P (n 1/2 ) l(p t, y t ) min i N ) n l(i, y t ) 5
6 Weighted average prediction Idea: assign a higher probability to better-performing actions. A popular choice is ( exp η ) t s=1 l(i, y s) p i,t = N k=1 ( η exp ) i = 1,..., N. t s=1 l(k, y s) where η > 0. Then, with η = 8 ln N/n, ( n ) 1 n l(p n t, y t ) min l(i, y t ) i N = ln N nη + η 8 ln N 2n If N is large, the algorithm is not feasible. There is hope for structured classes of experts. 6
7 The shortest path problem v PSfrag replacements u v u Given a dag (V, E) and vertices u and v, a path from u to v is identified by a binary vector i {0, 1} E. For each t = 1,..., n, the forecaster chooses a path I t. A loss l e,t [0, 1] is assigned to each e E. The loss of a path i is the sum of losses over the edges l(i, Y t ) = i l t. 7
8 Efficient implementation The number of paths (experts) is exponentially large. However, the weighted average forecaster can be computed efficiently. For each path i, we write p i,t = P t [I t = i] = K i k=1 It can be seen that, if (ŵ, w) E, P t [v It,k = v ik v It,k 1 = v i,k 1,..., v It,0 = u]. P t [v It,k = w v It,k 1 = ŵ,..., v It,0 = u] = e ηl ( bw,w),t 1 G t 1(w) G t 1 (ŵ) where, for any vertex w, G t (w) = i P w e η P e i L e,t. 8
9 The algorithm computes the weighted average forecaster over all paths such that it requires O( E ) operations at each time. The expected regret is bounded by n l(p t, Y t ) min i n n ln M l(i, Y t ) K 2 where M is the number of paths from u to v and K is the length of the longest path. 9
10 Lossy source coding problem Asymptotically optimal coding of a source sequence with zero delay (sequential encoding and decoding); no probabilistic assumptions on the source; asymptotically the same performance as the best scalar quantizer matched to the entire source sequence (for any sequence). Goals: solve the above problem with low complexity; determine the achievable rate of convergence (also for probabilistic sources) 10
11 Zero-Delay Sequential Source Coding x i encoder y i {1,..., M} decoder ˆx i Ui Source sequence: x i [0, 1], i = 1, 2,...; rate R = log M code: Encoder: f i : [0, 1] i {1, 2..., M}; Decoder: g i : {1, 2,..., M} i [0, 1]; y i = f i (x i ) = f i (x i, U i ) ˆx i = g i (y i ) Randomizing sequence: U i, i.i.d. uniform on [0, 1]; 11
12 Performance Expected normalized cumulative squared distortion: [ ] 1 n E (x i ˆx i ) 2 n i=1 Goal: Perform (essentially) as well as the best scalar quantizer matched to the entire sequence x n = (x 1,..., x n ). Expected distortion redundancy with respect to the class of scalar quantizers: ( [ ] ) 1 n ρ n = sup E (x i ˆx i ) 2 1 n min (x i Q(x i )) 2 x n n Q Q n i=1 where Q is the set of M-level scalar quantizers. i=1 12
13 A scheme based on on-line prediction n {}}{ }{{} (k + 1)st block l {}}{}{{}}{{} 1 R log ( M) K transmitting Q k (x i ) describing Q k the source sequence is divided into blocks of length l at the end of the kth block a quantizer Q k is chosen randomly from a given family of ( K M) quantizers in the first 1 R log ( K M) time instants of the (k + 1)st block the index of Q k is transmitted in the rest of the block, at time instants i, Q k (x i ) is transmitted 13
14 Choosing Q k Let Q K (K n 1/3 for minimum distortion) be the set of M-level nearest neighbor scalar quantizers with code points in the set C (K) = {1/(2K), 3/(2K),..., (2K 1)/(2K)}. Q k is chosen randomly from Q K according to the probabilities p k (Q) = P{Q k = Q} = e η P kl (x t Q(x t )) 2 b Q QK e η P kl (x t bq(x t )) 2 l log K log K Performance: ρ n C 1 + C 2 n l where C 1, C 2 > 0 are constants depending only on M. + 1 K Straightforward implementation: need to compute the cumulative distortion for all Q Q K, i.e., for about K M n M/3 quantizers. 14
15 Efficient implementation The shortest path algorithm can be used to construct a zero-delay coding scheme of rate R = log M such that for any sequence x n [0, 1] n ρ n C 1 l log K n + C 2 log K l + 3 K where C 1, C 2 > 0 are constants depending only on M, with O(MK 2 n/l) computational complexity; and O(K 2 ) memory requirement. Linear computational complexity O(M) per time instant and O(n 1/2 ) memory requirement with O(n 1/4 log n) distortion redundancy 15
16 Tracking the best expert (Herbster and Warmuth, 1998). Given any sequence i 1,..., i n of actions from {1,..., N}, define size(i 1,..., i n ) = n t=2 I i t 1 i t. The tracking regret is ( n ) n max l(p t, Y t ) l(i t, Y t ) over all sequences with size(i 1,..., i n ) m. ( n 1 k The number of meta experts is m k=0 weighted average would give a regret bounded by n 2 ( (m + 1) ln N + m ln ) N(N 1) k. Simple ) e(n 1) m 16
17 Fixed share forecaster Parameters: Real numbers η > 0 and 0 α = m/(n 1). Initialization: w 0 = (1/N,..., 1/N). For each round t = 1, 2,... (1) draw an action I t from {1,..., N} according to the distribution p i,t = w i,t 1 N j=1 w j,t 1 i = 1,..., N (2) obtain Y t and compute v i,t = w i,t 1 e η l(i,y t) for each i = 1,..., N (3) let w i,t = α W t N + (1 α)v i,t where W t = v 1,t + + v N,t. for each i = 1,..., N 17
18 Tracking the shortest path Goal: predict as well as the best path that can change m times. m ( n 1 ) k=0 k M(M 1) k meta experts. (M is the number of paths.) The fixed share forecaster requires computation proportional to M. We need efficient computation of the fixed share algorithm. 18
19 Fixed share forecaster revisited Initialization: For t = 1, choose I 1 uniformly from the set {1,..., N}. For each round t = 2,..., n (1) Draw τ t randomly according to the distribution (1 α) t 1 Z 1,t 1 NW t 1 for t = 1 P t [τ t = t ] = where we define Z t,t 1 = N. P t [I t = i τ t = t ] = α(1 α) t t W t 1 Z t,t 1 NW t 1 for t = 2,..., t (2) Given τ t = t, choose I t randomly according to the probabilities e η P t 1 s=t l(i,y s) Z for t = 1,..., t 1 t,t 1 1 N for t = t 19
20 Tracking the shortest path The alternative fixed share algorithm can be implemented such that, at time t, computing the prediction I t requires time O(tK E + t 2 ). The expected tracking regret of the algorithm satisfies K n 2 ( (m + 1) ln M + m ln e n ) m for all sequences of paths i 1,..., i n such that size(i 1,..., i n ) m where M is the number of all paths in the graph between vertices u and v and K is the length of the longest path. 20
21 Application to zero-delay coding An efficiently computable (O(M) computations per time instant) zero-delay coding scheme can be constructed whose average distortion is at most as large as the best code that can change scalar quantizers at most m times plus a redundancy of the order of mn 1/6. 21
22 Multi-armed bandit problem (Auer, Cesa-Bianchi, Freund, and Schapire, 2002.) The forecaster only observes his/her own loss. Losses of experts have to be estimated. Let g(i, Y t ) = 1 l(i, Y t ). Gains are estimated by g(i, Y t )/p i,t if I t = i g(i, Y t ) = i = 1,..., N. 0 otherwise Note that E[ g(i, Y t ) I 1,..., I t 1 ] = g(i, Y t ), and therefore g(i, Y t ) is an unbiased estimate of g(i, Y t ). 22
23 A strategy for the multi-armed bandit problem Initialization: w i,0 = 1 and p i,1 = 1/N for i = 1,..., N. (1) Select an action I t {1,..., N} according to the probability distribution p t. (2) Calculate the estimated gains g (i, Y t ) = g(i, Y t ) + β p i,t = ( g(i, Yt ) + β ) /p i,t if I t = i β/p i,t otherwise (3) Update the weights w i,t = w i,t 1 e ηg (i,y t ) (4) Calculate the updated probability distribution w i,t p i,t+1 = (1 γ) N j=1 w j,t + γ N, i = 1,..., N. 23
24 Performance bound The regret of the algorithm satisfies, with probability 1 δ, L n min i=1,...,n L i,n 11 2 nn ln(n/δ) + ln N 2 Bound depends badly on the number of actions. Cannot be improved, in general.. 24
25 Bandit version of shortest path: a routing problem A packet is sent through a network (graph), and at each edge the transmission suffers a delay that can vary arbitrarily. We want to choose the route such that the total delay is not much larger than that of the best constant route. We only receive information about the delays of the packets sent, but don t know anything about the other edges. 25
26 The general bandit algorithm needs to be modified to exploit the special structure of the class of experts of all paths. The probability of certain paths needs to be kept large. We obtain an algorithm, computable in time quadratic in E whose regret is, with probability 1 δ, n l(i t, Y t ) min i n l(i, Y t ) nk 3 E ln(m/δ). 26
The on-line shortest path problem under partial monitoring
The on-line shortest path problem under partial monitoring András György Machine Learning Research Group Computer and Automation Research Institute Hungarian Academy of Sciences Kende u. 11-13, Budapest,
More informationLearning, Games, and Networks
Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More informationLecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem
Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationFrom Bandits to Experts: A Tale of Domination and Independence
From Bandits to Experts: A Tale of Domination and Independence Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Domination and Independence 1 / 1 From Bandits to Experts: A
More informationPrediction and Playing Games
Prediction and Playing Games Vineel Pratap vineel@eng.ucsd.edu February 20, 204 Chapter 7 : Prediction, Learning and Games - Cesa Binachi & Lugosi K-Person Normal Form Games Each player k (k =,..., K)
More informationEASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University
EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Adversarial bandits Definition: sequential game. Lower bounds on regret from the stochastic case. Exp3: exponential weights
More informationInternal Regret in On-line Portfolio Selection
Internal Regret in On-line Portfolio Selection Gilles Stoltz (gilles.stoltz@ens.fr) Département de Mathématiques et Applications, Ecole Normale Supérieure, 75005 Paris, France Gábor Lugosi (lugosi@upf.es)
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationAdvanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12
Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview
More informationEfficient On-line Schemes for Encoding Individual Sequences with Side Information at the Decoder
Efficient On-line Schemes for Encoding Individual Sequences with Side Information at the Decoder Avraham Reani and Neri Merhav Department of Electrical Engineering Technion - Israel Institute of Technology
More informationThe No-Regret Framework for Online Learning
The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,
More informationFull-information Online Learning
Introduction Expert Advice OCO LM A DA NANJING UNIVERSITY Full-information Lijun Zhang Nanjing University, China June 2, 2017 Outline Introduction Expert Advice OCO 1 Introduction Definitions Regret 2
More informationEfficient Tracking of Large Classes of Experts
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 58, NO. 11, NOVEMBER 2012 6709 Efficient Tracking of Large Classes of Experts András György, Member, IEEE, Tamás Linder, Senior Member, IEEE, and Gábor Lugosi
More informationTHE first formalization of the multi-armed bandit problem
EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can
More informationA Second-order Bound with Excess Losses
A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands
More informationReinforcement Learning
Reinforcement Learning Lecture 5: Bandit optimisation Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Introduce bandit optimisation: the
More informationMultimedia Communications. Scalar Quantization
Multimedia Communications Scalar Quantization Scalar Quantization In many lossy compression applications we want to represent source outputs using a small number of code words. Process of representing
More informationNew Algorithms for Contextual Bandits
New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised
More informationBandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University
Bandit Algorithms Zhifeng Wang Department of Statistics Florida State University Outline Multi-Armed Bandits (MAB) Exploration-First Epsilon-Greedy Softmax UCB Thompson Sampling Adversarial Bandits Exp3
More informationYevgeny Seldin. University of Copenhagen
Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Noga Alon (Tel-Aviv University) Ofer Dekel (Microsoft Research) Tomer Koren (Technion and Microsoft
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. Converting online to batch. Online convex optimization.
More informationGambling in a rigged casino: The adversarial multi-armed bandit problem
Gambling in a rigged casino: The adversarial multi-armed bandit problem Peter Auer Institute for Theoretical Computer Science University of Technology Graz A-8010 Graz (Austria) pauer@igi.tu-graz.ac.at
More informationInternal Regret in On-line Portfolio Selection
Internal Regret in On-line Portfolio Selection Gilles Stoltz gilles.stoltz@ens.fr) Département de Mathématiques et Applications, Ecole Normale Supérieure, 75005 Paris, France Gábor Lugosi lugosi@upf.es)
More informationCombinatorial Bandits
Combinatorial Bandits Nicolò Cesa-Bianchi Università degli Studi di Milano, Italy Gábor Lugosi 1 ICREA and Pompeu Fabra University, Spain Abstract We study sequential prediction problems in which, at each
More informationOnline learning CMPUT 654. October 20, 2011
Online learning CMPUT 654 Gábor Bartók Dávid Pál Csaba Szepesvári István Szita October 20, 2011 Contents 1 Shooting Game 4 1.1 Exercises...................................... 6 2 Weighted Majority Algorithm
More informationLearning with Large Number of Experts: Component Hedge Algorithm
Learning with Large Number of Experts: Component Hedge Algorithm Giulia DeSalvo and Vitaly Kuznetsov Courant Institute March 24th, 215 1 / 3 Learning with Large Number of Experts Regret of RWM is O( T
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Claudio Gentile INRIA and Google NY clagentile@gmailcom NYC March 6th, 2018 1 Content of this lecture Regret analysis of sequential prediction problems lying between
More informationMinimax Policies for Combinatorial Prediction Games
Minimax Policies for Combinatorial Prediction Games Jean-Yves Audibert Imagine, Univ. Paris Est, and Sierra, CNRS/ENS/INRIA, Paris, France audibert@imagine.enpc.fr Sébastien Bubeck Centre de Recerca Matemàtica
More informationOptimal and Adaptive Online Learning
Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known
More informationAdaptive Online Prediction by Following the Perturbed Leader
Journal of Machine Learning Research 6 (2005) 639 660 Submitted 10/04; Revised 3/05; Published 4/05 Adaptive Online Prediction by Following the Perturbed Leader Marcus Hutter Jan Poland IDSIA, Galleria
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More informationThe Online Approach to Machine Learning
The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I
More informationReducing contextual bandits to supervised learning
Reducing contextual bandits to supervised learning Daniel Hsu Columbia University Based on joint work with A. Agarwal, S. Kale, J. Langford, L. Li, and R. Schapire 1 Learning to interact: example #1 Practicing
More informationMulti-armed Bandits in the Presence of Side Observations in Social Networks
52nd IEEE Conference on Decision and Control December 0-3, 203. Florence, Italy Multi-armed Bandits in the Presence of Side Observations in Social Networks Swapna Buccapatnam, Atilla Eryilmaz, and Ness
More informationToward a Classification of Finite Partial-Monitoring Games
Toward a Classification of Finite Partial-Monitoring Games Gábor Bartók ( *student* ), Dávid Pál, and Csaba Szepesvári Department of Computing Science, University of Alberta, Canada {bartok,dpal,szepesva}@cs.ualberta.ca
More informationAdvanced Machine Learning
Advanced Machine Learning Learning with Large Expert Spaces MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Problem Learning guarantees: R T = O( p T log N). informative even for N very large.
More informationTHE WEIGHTED MAJORITY ALGORITHM
THE WEIGHTED MAJORITY ALGORITHM Csaba Szepesvári University of Alberta CMPUT 654 E-mail: szepesva@ualberta.ca UofA, October 3, 2006 OUTLINE 1 PREDICTION WITH EXPERT ADVICE 2 HALVING: FIND THE PERFECT EXPERT!
More informationCoding into a source: an inverse rate-distortion theorem
Coding into a source: an inverse rate-distortion theorem Anant Sahai joint work with: Mukul Agarwal Sanjoy K. Mitter Wireless Foundations Department of Electrical Engineering and Computer Sciences University
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Online Learning Repeated game: Aim: minimize ˆL n = Decision method plays a t World reveals l t L n l t (a
More informationImportance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits
Journal of Machine Learning Research 17 (2016 1-21 Submitted 3/15; Revised 7/16; Published 8/16 Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits Gergely
More informationCODING SAMPLE DIFFERENCES ATTEMPT 1: NAIVE DIFFERENTIAL CODING
5 0 DPCM (Differential Pulse Code Modulation) Making scalar quantization work for a correlated source -- a sequential approach. Consider quantizing a slowly varying source (AR, Gauss, ρ =.95, σ 2 = 3.2).
More informationRegret in Online Combinatorial Optimization
Regret in Online Combinatorial Optimization Jean-Yves Audibert Imagine, Université Paris Est, and Sierra, CNRS/ENS/INRIA audibert@imagine.enpc.fr Sébastien Bubeck Department of Operations Research and
More informationExplore no more: Improved high-probability regret bounds for non-stochastic bandits
Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret
More informationStrategies for Prediction Under Imperfect Monitoring
MATHEMATICS OF OPERATIONS RESEARCH Vol. 33, No. 3, August 008, pp. 53 58 issn 0364-765X eissn 56-547 08 3303 053 informs doi 0.87/moor.080.03 008 INFORMS Strategies for Prediction Under Imperfect Monitoring
More informationOnline Forest Density Estimation
Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density
More informationOnline prediction with expert advise
Online prediction with expert advise Jyrki Kivinen Australian National University http://axiom.anu.edu.au/~kivinen Contents 1. Online prediction: introductory example, basic setting 2. Classification with
More informationIntroduction to Bandit Algorithms. Introduction to Bandit Algorithms
Stochastic K-Arm Bandit Problem Formulation Consider K arms (actions) each correspond to an unknown distribution {ν k } K k=1 with values bounded in [0, 1]. At each time t, the agent pulls an arm I t {1,...,
More informationEvaluation of multi armed bandit algorithms and empirical algorithm
Acta Technica 62, No. 2B/2017, 639 656 c 2017 Institute of Thermomechanics CAS, v.v.i. Evaluation of multi armed bandit algorithms and empirical algorithm Zhang Hong 2,3, Cao Xiushan 1, Pu Qiumei 1,4 Abstract.
More informationLecture 3: Lower Bounds for Bandit Algorithms
CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and
More informationTime Series Prediction & Online Learning
Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.
More informationThe Multi-Armed Bandit Problem
Università degli Studi di Milano The bandit problem [Robbins, 1952]... K slot machines Rewards X i,1, X i,2,... of machine i are i.i.d. [0, 1]-valued random variables An allocation policy prescribes which
More informationTheory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!
Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training
More informationMARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for
MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1
More informationOnline Learning with Costly Features and Labels
Online Learning with Costly Features and Labels Navid Zolghadr Department of Computing Science University of Alberta zolghadr@ualberta.ca Gábor Bartók Department of Computer Science TH Zürich bartok@inf.ethz.ch
More informationExponentiated Gradient Descent
CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,
More informationCOS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 22 Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 How to balance exploration and exploitation in reinforcement
More informationExponential Weights on the Hypercube in Polynomial Time
European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts
More informationAdvanced Machine Learning
Advanced Machine Learning Follow-he-Perturbed Leader MEHRYAR MOHRI MOHRI@ COURAN INSIUE & GOOGLE RESEARCH. General Ideas Linear loss: decomposition as a sum along substructures. sum of edge losses in a
More informationGame-Theoretic Learning:
Game-Theoretic Learning: Regret Minimization vs. Utility Maximization Amy Greenwald with David Gondek, Amir Jafari, and Casey Marks Brown University University of Pennsylvania November 17, 2004 Background
More informationOnline Learning of Non-stationary Sequences
Online Learning of Non-stationary Sequences Claire Monteleoni and Tommi Jaakkola MIT Computer Science and Artificial Intelligence Laboratory 200 Technology Square Cambridge, MA 02139 {cmontel,tommi}@ai.mit.edu
More informationOnline Prediction: Bayes versus Experts
Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,
More informationVector Quantization. Institut Mines-Telecom. Marco Cagnazzo, MN910 Advanced Compression
Institut Mines-Telecom Vector Quantization Marco Cagnazzo, cagnazzo@telecom-paristech.fr MN910 Advanced Compression 2/66 19.01.18 Institut Mines-Telecom Vector Quantization Outline Gain-shape VQ 3/66 19.01.18
More informationTwo optimization problems in a stochastic bandit model
Two optimization problems in a stochastic bandit model Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishnan Journées MAS 204, Toulouse Outline From stochastic optimization
More informationLecture 4: Lower Bounds (ending); Thompson Sampling
CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Concentration inequalities. P(X ǫ) exp( ψ (ǫ))). Cumulant generating function bounds. Hoeffding
More informationSequential prediction with coded side information under logarithmic loss
under logarithmic loss Yanina Shkel Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA Maxim Raginsky Department of Electrical and Computer Engineering Coordinated Science
More informationDownloaded 11/11/14 to Redistribution subject to SIAM license or copyright; see
SIAM J. COMPUT. Vol. 32, No. 1, pp. 48 77 c 2002 Society for Industrial and Applied Mathematics THE NONSTOCHASTIC MULTIARMED BANDIT PROBLEM PETER AUER, NICOLÒ CESA-BIANCHI, YOAV FREUND, AND ROBERT E. SCHAPIRE
More informationOnline Prediction Peter Bartlett
Online Prediction Peter Bartlett Statistics and EECS UC Berkeley and Mathematical Sciences Queensland University of Technology Online Prediction Repeated game: Cumulative loss: ˆL n = Decision method plays
More informationMinimax strategy for prediction with expert advice under stochastic assumptions
Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction
More informationOptimality of Walrand-Varaiya Type Policies and. Approximation Results for Zero-Delay Coding of. Markov Sources. Richard G. Wood
Optimality of Walrand-Varaiya Type Policies and Approximation Results for Zero-Delay Coding of Markov Sources by Richard G. Wood A thesis submitted to the Department of Mathematics & Statistics in conformity
More informationRegret to the Best vs. Regret to the Average
Regret to the Best vs. Regret to the Average Eyal Even-Dar 1, Michael Kearns 1, Yishay Mansour 2, Jennifer Wortman 1 1 Department of Computer and Information Science, University of Pennsylvania 2 School
More informationOn Minimaxity of Follow the Leader Strategy in the Stochastic Setting
On Minimaxity of Follow the Leader Strategy in the Stochastic Setting Wojciech Kot lowsi Poznań University of Technology, Poland wotlowsi@cs.put.poznan.pl Abstract. We consider the setting of prediction
More informationRevisiting the Exploration-Exploitation Tradeoff in Bandit Models
Revisiting the Exploration-Exploitation Tradeoff in Bandit Models joint work with Aurélien Garivier (IMT, Toulouse) and Tor Lattimore (University of Alberta) Workshop on Optimization and Decision-Making
More informationOnline Learning for Time Series Prediction
Online Learning for Time Series Prediction Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values.
More informationarxiv: v3 [cs.lg] 30 Jun 2012
arxiv:05874v3 [cslg] 30 Jun 0 Orly Avner Shie Mannor Department of Electrical Engineering, Technion Ohad Shamir Microsoft Research New England Abstract We consider a multi-armed bandit problem where the
More information18.2 Continuous Alphabet (discrete-time, memoryless) Channel
0-704: Information Processing and Learning Spring 0 Lecture 8: Gaussian channel, Parallel channels and Rate-distortion theory Lecturer: Aarti Singh Scribe: Danai Koutra Disclaimer: These notes have not
More informationOnline learning with feedback graphs and switching costs
Online learning with feedback graphs and switching costs A Proof of Theorem Proof. Without loss of generality let the independent sequence set I(G :T ) formed of actions (or arms ) from to. Given the sequence
More informationOnline Learning with Experts & Multiplicative Weights Algorithms
Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect
More information6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011
6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 On the Structure of Real-Time Encoding and Decoding Functions in a Multiterminal Communication System Ashutosh Nayyar, Student
More informationOnline combinatorial optimization with stochastic decision sets and adversarial losses
Online combinatorial optimization with stochastic decision sets and adversarial losses Gergely Neu Michal Valko SequeL team, INRIA Lille Nord urope, France {gergely.neu,michal.valko}@inria.fr Abstract
More informationActive Learning and Optimized Information Gathering
Active Learning and Optimized Information Gathering Lecture 7 Learning Theory CS 101.2 Andreas Krause Announcements Project proposal: Due tomorrow 1/27 Homework 1: Due Thursday 1/29 Any time is ok. Office
More informationEfficient learning by implicit exploration in bandit problems with side observations
Efficient learning by implicit exploration in bandit problems with side observations omáš Kocák Gergely Neu Michal Valko Rémi Munos SequeL team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr
More informationAn Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes
An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes Prokopis C. Prokopiou, Peter E. Caines, and Aditya Mahajan McGill University
More informationBandit Algorithms. Tor Lattimore & Csaba Szepesvári
Bandit Algorithms Tor Lattimore & Csaba Szepesvári Bandits Time 1 2 3 4 5 6 7 8 9 10 11 12 Left arm $1 $0 $1 $1 $0 Right arm $1 $0 Five rounds to go. Which arm would you play next? Overview What are bandits,
More informationThe Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm
he Algorithmic Foundations of Adaptive Data Analysis November, 207 Lecture 5-6 Lecturer: Aaron Roth Scribe: Aaron Roth he Multiplicative Weights Algorithm In this lecture, we define and analyze a classic,
More informationDigital communication system. Shannon s separation principle
Digital communication system Representation of the source signal by a stream of (binary) symbols Adaptation to the properties of the transmission channel information source source coder channel coder modulation
More informationPiecewise-stationary Bandit Problems with Side Observations
Jia Yuan Yu jia.yu@mcgill.ca Department Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada. Shie Mannor shie.mannor@mcgill.ca; shie@ee.technion.ac.il Department Electrical
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationNotes from Week 8: Multi-Armed Bandit Problems
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 8: Multi-Armed Bandit Problems Instructor: Robert Kleinberg 2-6 Mar 2007 The multi-armed bandit problem The multi-armed bandit
More informationDefensive forecasting for optimal prediction with expert advice
Defensive forecasting for optimal prediction with expert advice Vladimir Vovk $25 Peter Paul $0 $50 Peter $0 Paul $100 The Game-Theoretic Probability and Finance Project Working Paper #20 August 11, 2007
More informationFundamental rate delay tradeoffs in multipath routed and network coded networks
Fundamental rate delay tradeoffs in multipath routed and network coded networks John Walsh and Steven Weber Drexel University, Dept of ECE Philadelphia, PA 94 {jwalsh,sweber}@ecedrexeledu IP networks subject
More informationIntroducing strategic measure actions in multi-armed bandits
213 IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications: Workshop on Cognitive Radio Medium Access Control and Network Solutions Introducing strategic measure actions
More information