Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods


Sandeep Juneja, Tata Institute of Fundamental Research, Mumbai, India
Joint work with Peter Glynn
Applied Probability Frontiers, Banff, June 1, 2015

The ordinal optimization problem

$d$ different designs are compared on the basis of random performance measures $X(i)$, $1 \le i \le d$, and the goal is to identify $i^* = \arg\min_{1 \le j \le d} E X(j)$.

The goal is only to identify the best design, not to actually estimate its performance.

We have the ability to generate iid realizations of each of the $d$ random variables.

We focus primarily on $d = 2$, so given independent samples of $X$ we want to find whether $EX > 0$ or $EX < 0$.

Determining the smallest mean population
[Figure: sample values drawn from populations 1, 2 and 3.]

Talk Overview

Estimating the difference of mean values relies on the central limit theorem, with an associated slow $n^{-1/2}$ convergence rate.

Ho et al. (1990) observed that identifying the best system typically has a faster convergence rate.

Finding whether $EX < EY$ or $EX > EY$ is easier than determining the value $EX - EY$.

L. Dai (1996) showed, using large deviations methods, that the probability of false selection decays at an exponential rate under mild light-tailed assumptions. Thus, if $EX_1 < \min_{i \ge 2} EX_i$, then
$$\lim_{n\to\infty} \frac{1}{n} \log P\left(\bar X_1(n) > \min_{i\ge 2} \bar X_i(n)\right) = -I$$
for a large deviations rate function $I > 0$.

A series of papers by Chen, Yucesan, Dai, Chick and others (2000 and later) attempted to optimize the budget allocated to each population under a normality assumption.

There is substantial literature on selecting the best system amongst many alternatives using ranking and selection procedures; the Gaussian assumption is critical to most of that analysis (Nelson and others).

Glynn and Juneja (2004) observed that if $EX_1 < \min_{i\ge 2} EX_i$, then for $p_i > 0$ with $\sum_{i=1}^d p_i = 1$,
$$P\left(\bar X_1(p_1 n) > \min_{i\ge 2} \bar X_i(p_i n)\right) \approx e^{-n H(p_1,\dots,p_d)},$$
so that $H(p_1,\dots,p_d)$ can be optimised to determine optimal allocations.

There has been significant literature since then relying on large deviations analysis (e.g., Hunter and Pasupathy 2013, Szechtman and Yucesan 2008, Broadie, Han, Zeevi 2007, Blanchet, Liu, Zwart 2008).
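For reference (a hedged recollection of the form of the exponent in Glynn and Juneja (2004), not stated on this slide): writing $I_1$ and $I_i$ for the rate functions of populations $1$ and $i$, the pairwise exponent and the overall exponent take the form
$$G_i(p_1, p_i) = \inf_{x}\bigl(p_1 I_1(x) + p_i I_i(x)\bigr), \qquad H(p_1,\dots,p_d) = \min_{i\ge 2} G_i(p_1, p_i),$$
so maximising $H$ over the probability simplex balances the pairwise exponents.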

Some observations

If $P(FS) \le e^{-nI}$ for some $I > 0$, then $n = \frac{1}{I}\log(1/\delta)$ ensures $P(FS) \le \delta$.

$O(\log(1/\delta))$ effort is also necessary: if only $\log(1/\delta)^{1-\epsilon}$ samples are generated, then for any positive rate,
$$e^{-I\,\log(1/\delta)^{1-\epsilon}} = \delta^{\,I\,\log(1/\delta)^{-\epsilon}} \gg \delta \quad \text{as } \delta \to 0.$$

HOPE

However, methods relying on $P(FS) \le e^{-nI}$ rely on estimating the large deviations rate function $I$ from the samples generated.

One hopes for algorithms that, with $n = O(\log(1/\delta))$ samples, ensure that at least asymptotically $P(FS) \lesssim \delta$, that is,
$$\limsup_{\delta\to 0} P(FS)\,\delta^{-1} \le 1,$$
even when the means are separated by a fixed and known $\epsilon > 0$, so that $\min_{1\le i\le d} EX_i < EX_j - \epsilon$ for all suboptimal $j$.

Contributions

We argue through two popular implementations that these rate functions are difficult to estimate accurately using $O(\log(1/\delta))$ samples.

En route, we identify the large deviations rate function of the empirically estimated rate function. This may be of independent interest.

Key negative result

Given any $(\epsilon, \delta)$ algorithm, i.e., one that correctly separates designs with mean difference at least $\epsilon$ with
$$\limsup_{\delta\to 0} P(FS)\,\delta^{-1} \le 1,$$
we prove that for populations with unbounded support, under mild restrictions, the expected number of samples cannot be $O(\log(1/\delta))$.

Positive contributions

Under explicitly available moment upper bounds, we develop truncation-based $(\epsilon, \delta)$ algorithms with $O(\log(1/\delta))$ computation time.

We also adapt recently proposed sequential algorithms from the multi-armed bandit regret setting to this pure exploration setting.

Two phase ordinal optimisation implementation

Basic large deviations theory

Suppose $X_1, X_2, \dots, X_n$ are i.i.d. samples of $X$ and $a > EX$. Then, for $\theta > 0$,

Cramér's bound:
$$P(\bar X_n \ge a) \le e^{-n(\theta a - \Lambda(\theta))}, \quad \text{where } \Lambda(\theta) = \log E e^{\theta X}.$$

Cramér's Theorem:
$$P(\bar X_n \ge a) = e^{-n I(a)(1+o(1))},$$
where the large deviations rate function is $I(a) = \sup_{\theta\in\mathbb{R}} (\theta a - \Lambda(\theta))$.
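For completeness (not on the slide), the standard one-line derivation of Cramér's (Chernoff) bound via Markov's inequality, for any $\theta > 0$:
$$P(\bar X_n \ge a) = P\bigl(e^{\theta \sum_i X_i} \ge e^{n\theta a}\bigr) \le e^{-n\theta a}\bigl(E e^{\theta X}\bigr)^n = e^{-n(\theta a - \Lambda(\theta))};$$
optimising over $\theta > 0$ then gives the exponent $I(a)$.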

The rate function
[Figure: the rate function $I(x)$, a convex curve that vanishes at $x = EX$ and increases away from it.]

A simple setting

Consider a single rv $X$ with unknown mean $EX$. We need to decide whether $EX > 0$ or $EX < 0$ with error probability $\delta$.

If $EX < 0$, we may take $\exp(-nI(0))$ as a proxy for $P(\bar X_n \ge 0)$, the probability of false selection.

If $EX > 0$, we may again take $\exp(-nI(0))$ as a proxy for $P(\bar X_n \le 0)$, the probability of false selection.

A two phase implementation

Thus, $\frac{\log(1/\delta)}{I(0)}$ samples ensure that $P(FS) \le \delta$. Hence, one reasonable estimation procedure is:

First phase: generate $m = \log(1/\delta)$ samples to estimate $I(0)$ by $\hat I_m(0)$.

Second phase: generate $\log(1/\delta)/\hat I_m(0) = m/\hat I_m(0) =: n$ samples of $X$.

Decide the sign of $EX$ based on whether $\bar X_n > 0$ or $\bar X_n \le 0$.

We now discuss estimation of $I(0)$.

Estimating rate function

Graphic view of I(0)

The log-moment generating function of $X$, $\Lambda(\theta) = \log E\exp(\theta X)$, is convex with $\Lambda(0) = 0$ and $\Lambda'(0) = EX$.

Then, $I(0) = -\inf_\theta \Lambda(\theta)$.

[Figure: $\Lambda(\theta)$ as a convex curve with slope $EX$ at the origin; $I(0)$ is the depth of its minimum below zero.]

Estimating rate function I(0)

A natural estimator for $I(0)$ based on samples $(X_i : 1 \le i \le m)$ is
$$\hat I_m(0) = -\inf_{\theta\in\mathbb{R}} \hat\Lambda_m(\theta), \quad \text{where } \hat\Lambda_m(\theta) = \log\left(\frac{1}{m}\sum_{i=1}^m \exp(\theta X_i)\right).$$
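As a concrete illustration (not from the talk), a minimal Python sketch of this estimator and of the two-phase procedure built on it; the function names, the numerical optimisation over theta, and the example population are all illustrative assumptions:

import numpy as np
from scipy.optimize import minimize_scalar

def empirical_rate_at_zero(x):
    """Estimate I(0) = -inf_theta log((1/m) sum_i exp(theta*X_i)).
    Assumes the sample contains both signs so the minimiser is interior."""
    x = np.asarray(x, dtype=float)
    def lambda_hat(theta):
        # empirical log-MGF at theta, computed stably via log-sum-exp
        return np.logaddexp.reduce(theta * x) - np.log(len(x))
    res = minimize_scalar(lambda_hat)      # Lambda_hat is convex in theta
    return -res.fun                        # I_m(0) = -inf_theta Lambda_hat(theta)

def two_phase_sign_test(sample, delta, rng=np.random.default_rng(0)):
    """Heuristic two-phase procedure: estimate the rate, then decide the sign."""
    m = max(int(np.ceil(np.log(1.0 / delta))), 2)
    first = sample(m, rng)                             # phase 1
    i_hat = empirical_rate_at_zero(first)
    n = max(int(np.ceil(m / max(i_hat, 1e-12))), m)    # phase 2 budget
    second = sample(n, rng)
    return "EX > 0" if second.mean() > 0 else "EX < 0"

Example usage with a hypothetical normal population of mean -0.3:
decision = two_phase_sign_test(lambda k, rng: rng.normal(-0.3, 1.0, k), delta=1e-4)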

Graphic view of the estimated log-moment generating function
[Figure: a realisation of $\hat\Lambda_m(\theta)$ versus $\theta$; $\hat I_m(0)$ is the depth of its minimum.]

Large deviations rate function of $\hat I_m(0)$

One can show that for $a > I(0)$,
$$\lim_{m\to\infty} \frac{1}{m}\log P\bigl(\hat I_m(0) \ge a\bigr) = \lim_{m\to\infty} \frac{1}{m}\log P\left(\inf_\theta \frac{1}{m}\sum_{i=1}^m e^{\theta X_i} \le e^{-a}\right) = -\inf_{\theta\in\mathbb{R}} I_\theta(e^{-a}),$$
where
$$I_\theta(\nu) = \sup_\alpha\left(\alpha\nu - \log E\exp\bigl(\alpha e^{\theta X}\bigr)\right).$$

Further, for $a < I(0)$,
$$\lim_{m\to\infty} \frac{1}{m}\log P\bigl(\hat I_m(0) \le a\bigr) = -\sup_{\theta\in\mathbb{R}} I_\theta(e^{-a}).$$

Also, the natural estimator for $I(x)$ is
$$\hat I_m(x) = \sup_{\theta\in\mathbb{R}}\bigl(\theta x - \hat\Lambda_m(\theta)\bigr).$$

It can be seen that the rate function for $\hat I_m(x)$ equals $\inf_{\theta\in\mathbb{R}} I_\theta(e^{-a+\theta x})$ for $a > I(x)$, and equals $\sup_{\theta\in\mathbb{R}} I_\theta(e^{-a+\theta x})$ for $a < I(x)$.

Negative Result 1: Failure of the Naive

Returning to the two phase procedure

We generate samples $X_1, \dots, X_m$ for $m = \log(1/\delta)$ and set $\hat I_m(0) = -\inf_\theta \hat\Lambda_m(\theta)$.

We then generate $\log(1/\delta)/\hat I_m(0) = m/\hat I_m(0)$ samples of $X$ in the second phase, so that
$$P(FS) \approx E\left[\exp\left(-\frac{m}{\hat I_m(0)}\, I(0)\right)\right].$$

Errors arise from large values of $\hat I_m(0)$ that lead to under-sampling in the second phase, due to conspiratorial large deviations behaviour of all the terms.

Lower Bound for P(FS)

One can show
$$\lim_{m\to\infty} \frac{1}{m}\log P(FS) = \sup_{a>0}\,\sup_{\theta}\left(-\frac{I(0)}{a} - I_\theta(e^{-a})\right).$$

In particular, $\liminf_{\delta\to 0} P(FS)\,\delta^{-1} > 1$.

Negative Result 2: Fall of the Sophisticated

Another common estimation method

Generate $m = c\log(1/\delta)$ samples in the first phase to estimate $I(0)$ by $\hat I_m(0)$.

If $\exp(-m\hat I_m(0)) \le \delta$, stop.

Else, provide another $c\log(1/\delta)$ of computational budget, and so on.

We can identify distributions for which this would not be accurate.

We need to find $X$ with $EX < 0$ such that
$$P\left(\bar X_m \ge 0 \ \text{ and } \ \exp(-m\hat I_m(0)) \le \delta\right) > \delta.$$

This follows if, for some $\theta < 0$, $I_\theta(e^{-1/c}) < 1/c$.

This can be shown for some light-tailed distributions, even for large $c$.

Graphic view: $I(0) < 1/c$ and $\hat I_m(0) > 1/c$
[Figure: $\Lambda(\theta)$ and a realisation of $\hat\Lambda_m(\theta)$ versus $\theta$, with the level $-1/c$ marked; the empirical curve dips below $-1/c$ while the true curve does not.]

Negative Result 3: Perdition

Let $\mathcal{L}$ denote a collection of probability distributions with finite mean and unbounded positive support, open under the Kullback-Leibler distance.

Let $P(\epsilon, \delta)$ denote a policy that can adaptively sample from any two distributions in $\mathcal{L}$ with
$$\limsup_{\delta\to 0} P(FS)\,\delta^{-1} \le 1.$$

Result: For any two such distributions in $\mathcal{L}$, with means arbitrarily far apart, the $P(\epsilon, \delta)$ policy on average requires more than $O(\log(1/\delta))$ samples.

Details

Under probability $P_a$, $\{X_i\}$ has distribution $F$ with mean $\mu_F$, and $\{Y_i\}$ has distribution $G$ with mean $\mu_G < \mu_F - \epsilon$.

Under probability $P_b$, $\{X_i\}$ has distribution $F$, and $\{Y_i\}$ has distribution $\tilde G$ with mean $\mu_{\tilde G} > \mu_F + \epsilon$.

Under $P(\epsilon, \delta)$, writing $T_G$ for the number of samples drawn from the $\{Y_i\}$ population,
$$\liminf_{\delta\to 0} \frac{E_a T_G}{\log(1/\delta)} \ge \frac{1}{3\,KL(G, \tilde G)},$$
where $KL(G, \tilde G) = \int_{x\in\mathbb{R}} \log\left(\frac{dG}{d\tilde G}(x)\right) dG(x)$.

Proof similar to Lai and Robbins 1985, Mannor and Tsitsiklis 2004

Asymptotically,
$$P_a(\text{algorithm selects } F) \ge 1 - \delta, \qquad P_b(\text{algorithm selects } F) \le \delta.$$

By a change of measure,
$$P_b(\text{algo. selects } F) = E_a\left(\prod_{i=1}^{T_G} \frac{d\tilde G}{dG}(Y_i)\; I(\text{algo. selects } F)\right) \approx E_a\left(e^{-T_G\, KL(G,\tilde G)}\, I(\text{algo. selects } F)\right) \gtrsim e^{-2 E_a(T_G)\, KL(G,\tilde G)}\; P_a(\text{algo. selects } F),$$
and the result is easily deduced.
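Spelling out the last deduction from the chain above (a step not written on the slide):
$$\delta \;\ge\; P_b(\text{algo. selects } F) \;\gtrsim\; e^{-2 E_a(T_G)\,KL(G,\tilde G)}\,(1-\delta) \quad\Longrightarrow\quad E_a T_G \;\gtrsim\; \frac{\log\bigl((1-\delta)/\delta\bigr)}{2\,KL(G,\tilde G)},$$
which is of order $\log(1/\delta)/KL(G,\tilde G)$ as $\delta \to 0$; the constant $1/3$ in the stated bound presumably absorbs the slack in the approximations.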

Result: Given $G$ with finite mean and unbounded positive support, for any $\alpha > 0$ and $k > \mu_G$, there exists a distribution $\tilde G_k$ such that $KL(G, \tilde G_k) \le \alpha$ and $\mu_{\tilde G_k} \ge k$.

Way forward

Additional information is needed to attain $\log(1/\delta)$ convergence rates.

Often, upper bounds on moments may be available in simulation models. It is easy to develop such bounds once suitable Lyapunov functions can be identified (not discussed here).

We use such bounds to develop $(\epsilon, \delta)$ strategies by truncating the random variables while controlling the truncation error to be less than $\epsilon$, and then applying Hoeffding's inequality.

Recent multi-armed bandit methods do this in a sequential and adaptive manner.

δ guarantees using log(1/δ) samples: Static approach

$P(\epsilon, \delta)$ policy for bounded random variables

Consider $\mathcal{X}_\epsilon = \{X : |EX| > \epsilon,\; a \le X \le b\}$. A reasonable algorithm on $\mathcal{X}_\epsilon$ is:

Generate iid samples $X_1, X_2, \dots, X_n$ of $X$.
If $\bar X_n \ge 0$, declare $EX > 0$.
If $\bar X_n < 0$, declare $EX < 0$.

Hoeffding's inequality can be used to bound the probability of false selection. Suppose $EX < -\epsilon$; then
$$P(\bar X_n \ge 0) \le P(\bar X_n - EX \ge \epsilon) \le \exp\left(-\frac{2n\epsilon^2}{(b-a)^2}\right).$$

Thus, $n = \frac{(b-a)^2}{2\epsilon^2}\log(1/\delta)$ provides the desired $(\epsilon, \delta)$ policy.
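A minimal sketch (not from the slides) of this static policy, assuming known bounds $a, b$ and gap $\epsilon$; the function names and the example population are illustrative:

import numpy as np

def hoeffding_sign_test(sample, a, b, eps, delta, rng=np.random.default_rng(0)):
    """Static (eps, delta) policy for a <= X <= b with |EX| > eps:
    draw n = (b-a)^2/(2 eps^2) * log(1/delta) samples and report the sign of
    the sample mean; Hoeffding bounds the false-selection probability by delta."""
    n = int(np.ceil((b - a) ** 2 / (2 * eps ** 2) * np.log(1.0 / delta)))
    xs = sample(n, rng)
    return "EX > 0" if xs.mean() >= 0 else "EX < 0"

Example usage with a hypothetical uniform population on [-0.8, 1.2] (mean 0.2):
decision = hoeffding_sign_test(lambda k, rng: rng.uniform(-0.8, 1.2, k), a=-0.8, b=1.2, eps=0.2, delta=1e-4)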

Truncation when explicit bounds on a function of the rv are known

Suppose $f$ is a strictly increasing convex function and we know that $Ef(X) \le a$; further, $X \le b$. Then the truncation error $E[X\, I(X \ge u)]$ can be bounded explicitly in terms of $a$, $f(u)$ and $f(b)$, so $u$ can be chosen to make it at most $\epsilon$.

This follows from the optimisation problem
$$\max_X\; E[X\, I(X \ge u)] \quad \text{such that } Ef(X) \le a,\; X \le b.$$

This has a two-point solution, relying on the observation that $Y = E[X \mid X < u]\, I(X < u) + E[X \mid X \ge u]\, I(X \ge u)$ is better than $X$ for this maximisation: it achieves the same objective while, by Jensen's inequality, $Ef(Y) \le Ef(X)$.
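The exact two-point bound is not reproduced here; as a simpler (generally weaker) illustration of the same idea, assuming additionally $f \ge 0$, $u > 0$ and $b > 0$, a Markov-type argument already controls the truncation error:
$$E\bigl[X\, I(X \ge u)\bigr] \;\le\; b\, P(X \ge u) \;\le\; b\,\frac{E f(X)}{f(u)} \;\le\; \frac{a\, b}{f(u)},$$
so choosing $u$ with $f(u) \ge ab/\epsilon$ keeps the truncation error below $\epsilon$.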

Pure Exploration Multi-Armed Bandit Approach

Pure exploration bandit algorithms

There are $n$ arms in total. Each arm $a$, when sampled, gives a Bernoulli reward with mean $p_a$.

Let $a^* = \arg\max_{a\in A} p_a$ and let $\Delta_a = p_{a^*} - p_a$.

Even-Dar et al. (2006) devise a sequential sampling strategy to find $a^*$ with probability at least $1-\delta$, with expected computational effort
$$O\left(\sum_{a\ne a^*} \frac{\ln(n/\delta)}{\Delta_a^2}\right).$$

Popular successive rejection algorithm

Sample every arm $a$ once and let $\hat\mu^t_a$ be the average reward of arm $a$ by time $t$.

Each arm $a$ such that $\hat\mu^t_{\max} - \hat\mu^t_a \ge 2\alpha_t$ is removed from consideration, where $\alpha_t = \sqrt{\log(5nt^2/\delta)/t}$.

Set $t = t + 1$ and repeat till one arm is left.
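A minimal, illustrative Python sketch of this successive-elimination scheme for Bernoulli arms (not the slides' pseudocode); the arm simulator and constants follow the formula above:

import numpy as np

def successive_rejection(pull, n_arms, delta, rng=np.random.default_rng(0), max_rounds=100000):
    """Sample every surviving arm once per round; drop arms whose empirical mean
    falls 2*alpha_t below the best empirical mean, alpha_t = sqrt(log(5 n t^2 / delta) / t)."""
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for arm in active:
            sums[arm] += pull(arm, rng)          # one sample from each surviving arm
        means = sums[active] / t                 # each surviving arm has t samples
        alpha = np.sqrt(np.log(5 * n_arms * t * t / delta) / t)
        best = means.max()
        active = [arm for arm, m in zip(active, means) if best - m < 2 * alpha]
    return active[0] if len(active) == 1 else active

Example with hypothetical Bernoulli arms:
p = [0.5, 0.6, 0.8]
best = successive_rejection(lambda arm, rng: rng.binomial(1, p[arm]), n_arms=3, delta=0.05)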

Key idea
[Figure: confidence intervals around the estimates of $\mu_1$ and $\mu_2$ shrink with $t$, forming cones that eventually separate the arms.]

Generalizing to heavy tails

Bubeck, Cesa-Bianchi and Lugosi (2013) develop $\log(1/\delta)$ algorithms in the regret setting when bounds on the $(1+\epsilon)$-th moments of each arm's output are available.

The analysis again relies on forming a cone, which they do through truncation and a clever use of the Bernstein inequality.

We adapt these algorithms to the pure exploration setting.

In conclusion

We discussed that the ordinal optimisation method relies on an exponential convergence rate of the probability of false selection.

However, we show through a series of negative results that this convergence rate, or equivalently $O(\log(1/\delta))$ computation algorithms, is not achievable for unbounded-support distributions without further restrictions.

Under explicit restrictions on moments of the underlying random variables, we devise $O(\log(1/\delta))$ algorithms.

These are closely related to the evolving multi-armed bandit literature on pure exploration methods.