Ordinal optimization: empirical large deviations rate estimators, and multi-armed bandit methods
Sandeep Juneja, Tata Institute of Fundamental Research, Mumbai, India
Joint work with Peter Glynn
Applied Probability Frontiers, Banff, June 1, 2015
Glynn, Juneja: Ordinal optimization
The ordinal optimization problem

d different designs are compared on the basis of random performance measures X(i), i ≤ d, and the goal is to identify i* = arg min_{1 ≤ j ≤ d} EX(j).

The goal is only to identify the best design, not to estimate its performance.

We have the ability to generate iid realizations of each of the d random variables.

We focus primarily on d = 2, so that, given independent samples of X, we want to decide whether EX > 0 or EX < 0.
Determining the smallest mean population

[Figure: sample values plotted for populations 1, 2, 3.]
Talk Overview

Estimating the difference of mean values relies on the central limit theorem, with an associated slow n^{-1/2} convergence rate.

Ho et al. (1990) observed that identifying the best system typically has a faster convergence rate.

Finding whether EX < EY or EX > EY is easier than determining the value of EX - EY.
L. Dai (1996) showed, using large deviations methods, that the probability of false selection decays at an exponential rate under mild light-tailed assumptions. Thus, if EX_1 < min_{i ≥ 2} EX_i, then

  lim_{n → ∞} (1/n) log P( X̄_1(n) > min_{i ≥ 2} X̄_i(n) ) = -I

for a large deviations rate function I > 0.

A series of papers by Chen, Yucesan, L. Dai, Chick and others (2000s) attempted to optimize the budget allocated to each population under a normality assumption.

There is substantial literature on selecting the best system amongst many alternatives using ranking/selection procedures; the Gaussian assumption is critical to most of the analysis (Nelson and others).
Glynn and J. (2004) observed that if EX_1 < min_{i ≥ 2} EX_i, then for p_i > 0 with Σ_{i=1}^d p_i = 1,

  P( X̄_1(p_1 n) > min_{i ≥ 2} X̄_i(p_i n) ) ≈ e^{-n H(p_1, ..., p_d)},

so that H(p_1, ..., p_d) can be optimised to determine optimal allocations.

There is significant literature since then relying on large deviations analysis (e.g., Hunter and Pasupathy 2013, Szechtman and Yucesan 2008, Broadie, Han, Zeevi 2007, Blanchet, Liu, Zwart 2008).
Some observations

If P(FS) ≤ e^{-nI} for some I > 0, then n = (1/I) log(1/δ) ensures P(FS) ≤ δ.

O(log(1/δ)) effort is also necessary: if only log(1/δ)^{1-ɛ} samples are generated, then for any event A with p = P(X_i ∈ A) > 0,

  p^{log(1/δ)^{1-ɛ}} = δ^{(-log p) log(1/δ)^{-ɛ}} >> δ

as δ → 0, so the probability of false selection cannot be driven down to δ.
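As a small numerical sketch of the first observation, assume (hypothetically, for illustration only) a Gaussian population, for which I(0) = µ²/(2σ²) is available in closed form:

```python
import math

def gaussian_rate_at_zero(mu, sigma):
    """Large deviations rate I(0) for the sample mean of N(mu, sigma^2) at level 0.
    For Gaussians, I(a) = (a - mu)^2 / (2 sigma^2), so I(0) = mu^2 / (2 sigma^2)."""
    return mu * mu / (2.0 * sigma * sigma)

def samples_needed(delta, rate):
    """n = log(1/delta) / I ensures exp(-n * I) <= delta."""
    return math.ceil(math.log(1.0 / delta) / rate)

# Example: mean -0.5, std 1 gives I(0) = 0.125; delta = 1e-6 needs 111 samples.
n = samples_needed(1e-6, gaussian_rate_at_zero(-0.5, 1.0))
```

The point is that the required sample size scales as log(1/δ)/I(0), so small rate functions (designs that are hard to separate) dominate the cost.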
HOPE

However, methods relying on P(FS) ≤ e^{-nI} rely on estimating the large deviations rate function I from the samples generated.

One hopes for algorithms that, with n = O(log(1/δ)) samples, ensure that at least asymptotically P(FS) ≤ δ, that is,

  lim sup_{δ → 0} P(FS) δ^{-1} ≤ 1,

even when the means are separated by a fixed and known ɛ > 0, so that min_{1 ≤ i ≤ d} EX_i < EX_j - ɛ for all suboptimal j.
Contributions

We argue, through two popular implementations, that these rate functions are difficult to estimate accurately using O(log(1/δ)) samples.

En route, we identify the large deviations rate function of the empirically estimated rate function. This may be of independent interest.
Key negative result

Given any (ɛ, δ) algorithm, that is, one that correctly separates designs with mean difference at least ɛ with

  lim sup_{δ → 0} P(FS) δ^{-1} ≤ 1,

we prove that for populations with unbounded support, under mild restrictions, the expected number of samples cannot be O(log(1/δ)).
Positive contributions

Under explicitly available moment upper bounds, we develop truncation-based (ɛ, δ) algorithms with O(log(1/δ)) computation time.

We also adapt recently proposed sequential algorithms from the multi-armed bandit regret setting to this pure exploration setting.
Two phase ordinal optimisation implementation
Basic large deviations theory

Suppose X_1, X_2, ..., X_n are i.i.d. samples of X and a > EX.

Cramér's bound: for θ > 0,

  P( X̄_n ≥ a ) ≤ e^{-n(θa - Λ(θ))},

where Λ(θ) = log E e^{θX}.

Cramér's theorem:

  P( X̄_n ≥ a ) = e^{-n I(a)(1 + o(1))},

where the large deviations rate function I(a) = sup_{θ ∈ R} (θa - Λ(θ)).
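The rate function I(a) = sup_θ (θa - Λ(θ)) can be computed numerically whenever Λ is known. A minimal sketch (the Gaussian check at the end is an assumption used only for validation, not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rate_function(a, cumulant, bounds=(-50.0, 50.0)):
    """I(a) = sup_theta (theta*a - Lambda(theta)), computed numerically.
    `cumulant` is the log-moment generating function Lambda(theta)."""
    res = minimize_scalar(lambda th: -(th * a - cumulant(th)),
                          bounds=bounds, method="bounded")
    return -res.fun

# Sanity check against the Gaussian case: Lambda(theta) = mu*theta + sigma^2*theta^2/2
# and I(a) = (a - mu)^2 / (2 sigma^2).
mu, sigma = 0.0, 1.0
gauss_cumulant = lambda th: mu * th + 0.5 * sigma**2 * th**2
approx = rate_function(1.0, gauss_cumulant)   # exact value is 0.5
```

The same routine applies to any distribution whose moment generating function is finite in a neighbourhood of the maximising θ.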
The rate function

[Figure: the rate function I(x) plotted against x; I(x) ≥ 0 and vanishes at x = EX.]
A simple setting

Consider a single rv X with unknown mean EX. We need to decide whether EX > 0 or EX < 0 with error probability δ.

If EX < 0, we may take exp(-n I(0)) as a proxy for P( X̄_n ≥ 0 ), the probability of false selection.

If EX > 0, we may again take exp(-n I(0)) as a proxy for P( X̄_n ≤ 0 ), the probability of false selection.
A two phase implementation

Thus, log(1/δ)/I(0) samples ensure that P(FS) ≈ δ. Hence, one reasonable estimation procedure is:

First phase: generate m = log(1/δ) samples to estimate I(0) by Î_m(0).

Second phase: generate n = log(1/δ)/Î_m(0) = m/Î_m(0) samples of X.

Decide the sign of EX based on whether X̄_n > 0 or X̄_n ≤ 0.

We now discuss estimation of I(0).
Estimating rate function
Graphic view of I(0)

The log-moment generating function of X, Λ(θ) = log E exp(θX), is convex with Λ(0) = 0 and Λ'(0) = EX. Then I(0) = -inf_θ Λ(θ).

[Figure: Λ(θ) plotted against θ, with slope EX at the origin; the depth of its minimum equals I(0).]
Estimating the rate function I(0)

A natural estimator for I(0) based on samples (X_i : 1 ≤ i ≤ m) is

  Î_m(0) = -inf_{θ ∈ R} Λ̂_m(θ),

where

  Λ̂_m(θ) = log( (1/m) Σ_{i=1}^m exp(θ X_i) ).
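This plug-in estimator is straightforward to compute. A minimal sketch (the Gaussian sampling at the end is an illustrative assumption):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def empirical_rate_at_zero(samples, bounds=(-20.0, 20.0)):
    """Plug-in estimator I_hat_m(0) = -inf_theta Lambda_hat_m(theta), where
    Lambda_hat_m(theta) = log( (1/m) sum_i exp(theta * X_i) )."""
    x = np.asarray(samples, dtype=float)

    def emp_cumulant(theta):
        # log-mean-exp, stabilised to avoid overflow for large |theta|
        z = theta * x
        zmax = z.max()
        return zmax + np.log(np.mean(np.exp(z - zmax)))

    res = minimize_scalar(emp_cumulant, bounds=bounds, method="bounded")
    return -res.fun

rng = np.random.default_rng(0)
# For N(-0.5, 1) the true value is I(0) = 0.125; the estimate should be close.
est = empirical_rate_at_zero(rng.normal(-0.5, 1.0, size=20_000))
```

Note the optimisation is restricted to a bounded θ-interval; in practice the tails of Λ̂_m are dominated by the extreme sample values, which is precisely the source of the instability discussed next.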
Graphic view of the estimated log-moment generating function

[Figure: Λ̂_m(θ) plotted against θ; the depth of its minimum gives Î_m(0).]
Large deviations rate function of Î_m(0)

One can show that for a > I(0), lim_{m → ∞} (1/m) log P( Î_m(0) ≥ a ) equals

  lim_{m → ∞} (1/m) log P( inf_θ (1/m) Σ_{i=1}^m e^{θ X_i} ≤ e^{-a} ) = -inf_{θ ∈ R} I_θ(e^{-a}),

where

  I_θ(ν) = sup_α ( αν - log E exp(α e^{θX}) ).

Further, for a < I(0),

  lim_{m → ∞} (1/m) log P( Î_m(0) ≤ a ) = -sup_{θ ∈ R} I_θ(e^{-a}).
Also, the natural estimator for I(x) is

  Î_m(x) = sup_{θ ∈ R} ( θx - Λ̂_m(θ) ).

It can be seen that the rate function for Î_m(x) equals inf_{θ ∈ R} I_θ(e^{-a + θx}) for a > I(x), and sup_{θ ∈ R} I_θ(e^{-a + θx}) for a < I(x).
Negative Result 1: Failure of the Naive
Returning to the two phase procedure

We generate samples X_1, ..., X_m for m = log(1/δ) and set Î_m(0) = -inf_θ Λ̂_m(θ). We then generate n = log(1/δ)/Î_m(0) = m/Î_m(0) samples of X in the second phase, so that

  P(FS) ≈ E exp( -m I(0)/Î_m(0) ).

Errors arise from large values of Î_m(0), which lead to under-sampling in the second phase, due to conspiratorial large deviations behaviour of all the terms.
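The two-phase procedure can be sketched end to end as follows (a Monte Carlo sketch, not the paper's exact implementation; the clipping guard is an added assumption to keep the second phase finite):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def two_phase_decision(sampler, delta, rng):
    """Sketch of the two-phase procedure for deciding sign(EX).
    Phase 1: m = log(1/delta) samples estimate I(0); phase 2 draws
    n = m / I_hat_m(0) samples and decides by the sign of the mean."""
    m = max(2, int(np.ceil(np.log(1.0 / delta))))
    x = sampler(m, rng)

    def emp_cumulant(theta):
        z = theta * x
        return z.max() + np.log(np.mean(np.exp(z - z.max())))

    i_hat = -minimize_scalar(emp_cumulant, bounds=(-20, 20), method="bounded").fun
    i_hat = max(i_hat, 1e-6)          # guard against a degenerate estimate
    n = int(np.ceil(m / i_hat))
    return "EX > 0" if sampler(n, rng).mean() > 0 else "EX < 0"

rng = np.random.default_rng(1)
verdict = two_phase_decision(lambda k, r: r.normal(-0.5, 1.0, k), 1e-3, rng)
```

An overshoot of Î_m(0) in phase 1 directly shrinks n in phase 2, which is exactly the failure mode quantified on this slide.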
Lower bound for P(FS)

One can show that

  lim_{m → ∞} (1/m) log P(FS) = sup_{a > 0} sup_θ ( -I(0)/a - I_θ(e^{-a}) ).

In particular, lim inf_{δ → 0} P(FS) δ^{-1} > 1.
Negative Result 2: Fall of the Sophisticated
Another common estimation method

Generate m = c log(1/δ) samples in the first phase to estimate I(0) by Î_m(0).

If exp(-m Î_m(0)) ≤ δ, stop. Else, provide another c log(1/δ) of computational budget, and so on.

We can identify distributions for which this would not be accurate.
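The sequential budget-growing method above can be sketched as follows (illustrative only; the block size, θ-interval, and stopping cap are assumptions, not the slide's tuning):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sequential_decision(sampler, delta, c=2.0, rng=None, max_rounds=50):
    """Sketch of the sequential method: grow the sample in blocks of
    c*log(1/delta) until exp(-m * I_hat_m(0)) <= delta, then decide by sign."""
    if rng is None:
        rng = np.random.default_rng()
    block = max(2, int(np.ceil(c * np.log(1.0 / delta))))
    x = np.empty(0)
    for _ in range(max_rounds):
        x = np.concatenate([x, sampler(block, rng)])
        m = len(x)

        def emp_cumulant(theta, data=x):
            z = theta * data
            return z.max() + np.log(np.mean(np.exp(z - z.max())))

        i_hat = max(-minimize_scalar(emp_cumulant, bounds=(-20, 20),
                                     method="bounded").fun, 0.0)
        if np.exp(-m * i_hat) <= delta:   # estimated P(FS) small enough: stop
            break
    return ("EX > 0" if x.mean() > 0 else "EX < 0"), len(x)

verdict, used = sequential_decision(lambda k, r: -np.ones(k), 1e-3,
                                    rng=np.random.default_rng(2))
```

The negative result says this stopping rule can be fooled: Î_m(0) may overshoot so that the test passes while X̄_m has the wrong sign, with probability exceeding δ.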
We need to find X with EX < 0 such that

  P( X̄_m ≥ 0 and exp(-m Î_m(0)) ≤ δ ) > δ.

This follows if, for some θ < 0, I_θ(e^{-1/c}) < 1/c.

This can be shown for some light-tailed distributions, even for large c.
Graphic view: I(0) < 1/c while Î_m(0) > 1/c

[Figure: Λ(θ) and Λ̂_m(θ) plotted against θ, with the level -1/c marked.]
Negative Result 3: Perdition
Let L denote a collection of probability distributions with finite mean and unbounded positive support, open under the Kullback-Leibler distance.

Let P(ɛ, δ) denote a policy that can adaptively sample from any two distributions in L with

  lim sup_{δ → 0} P(FS) δ^{-1} ≤ 1.

Result: for any two such distributions in L, with means arbitrarily far apart, a P(ɛ, δ) policy on average requires more than O(log(1/δ)) samples.
Details

Under probability P_a, {X_i} has distribution F with mean µ_F, and {Y_i} has distribution G with mean µ_G < µ_F - ɛ.

Under probability P_b, {X_i} has distribution F, and {Y_i} has a distribution G̃ with mean µ_G̃ > µ_F + ɛ.

Under P(ɛ, δ),

  lim inf_{δ → 0} E_a T_G / log(1/δ) ≥ 1/(3 KL(G, G̃)),

where KL(G, G̃) = ∫ log( dG/dG̃ (x) ) dG(x).
Proof similar to Lai and Robbins 1985, Mannor and Tsitsiklis 2004

Asymptotically,

  P_a( algorithm selects F ) ≥ 1 - δ,  P_b( algorithm selects F ) ≤ δ.

By a change of measure,

  P_b( algo. selects F ) = E_a( Π_{i=1}^{T_G} (dG̃/dG)(Y_i) I( algo. selects F ) )
                         ≈ E_a( e^{-T_G KL(G, G̃)} I( algo. selects F ) )
                         ≥ e^{-2 E_a(T_G) KL(G, G̃)} P_a( algo. selects F ),

and the result is easily deduced.
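The deduction in the last step can be written out as follows (a sketch; the constant in the exponent is hedged, and the middle step uses concentration of the log-likelihood ratio around -T_G KL(G, G̃)):

```latex
\begin{align*}
\delta \;\ge\; P_b(\text{algo selects } F)
  &= E_a\!\left[\prod_{i=1}^{T_G} \frac{d\tilde G}{dG}(Y_i)\,
        I(\text{algo selects } F)\right] \\
  &\gtrsim\; e^{-2 E_a[T_G]\,\mathrm{KL}(G,\tilde G)}\,
        P_a(\text{algo selects } F)
  \;\ge\; e^{-2 E_a[T_G]\,\mathrm{KL}(G,\tilde G)} (1-\delta).
\end{align*}
% Taking logarithms: E_a[T_G] >= log((1-delta)/delta) / (2 KL(G, G~)),
% so E_a[T_G] grows at least like log(1/delta) / KL(G, G~) as delta -> 0.
```

Combined with the next slide's result that KL(G, G̃) can be made arbitrarily small while moving the mean arbitrarily far, this rules out a uniform O(log(1/δ)) sample bound.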
Result

Given G with finite mean and unbounded positive support, for any α > 0 and k > µ_G, there exists a distribution G̃_k such that KL(G, G̃_k) ≤ α and µ_{G̃_k} ≥ k.
Way forward
Additional information is needed to attain log(1/δ) convergence rates.

Often, upper bounds on moments may be available in simulation models. It is easy to develop such bounds once suitable Lyapunov functions can be identified (not to be discussed here).

We use such bounds to develop (ɛ, δ) strategies by truncating the random variables while controlling the truncation error to be less than ɛ; we then use Hoeffding's inequality.

Recent multi-armed bandit methods do this in a sequential and adaptive manner.
δ guarantees using log(1/δ) samples: Static approach
A P(ɛ, δ) policy for bounded random variables

Consider X_ɛ = {X : |EX| > ɛ, a ≤ X ≤ b}. A reasonable algorithm on X_ɛ is:

Generate iid samples X_1, X_2, ..., X_n of X. If X̄_n ≥ 0, declare EX > 0; if X̄_n < 0, declare EX < 0.

Hoeffding's inequality can be used to bound the probability of false selection. Suppose EX < -ɛ; then

  P( X̄_n ≥ 0 ) ≤ P( X̄_n - EX ≥ ɛ ) ≤ exp( -2nɛ²/(b - a)² ).

Thus, n = ((b - a)²/(2ɛ²)) log(1/δ) provides the desired P(ɛ, δ) policy.
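The policy above is a one-line computation. A minimal sketch:

```python
import math
import numpy as np

def hoeffding_sample_size(eps, delta, a, b):
    """n = (b-a)^2/(2 eps^2) * log(1/delta) makes Hoeffding's bound
    exp(-2 n eps^2 / (b-a)^2) at most delta."""
    return math.ceil((b - a) ** 2 / (2.0 * eps ** 2) * math.log(1.0 / delta))

def decide_sign(sampler, eps, delta, a, b, rng):
    """Draw the Hoeffding-prescribed number of samples and decide by the mean."""
    n = hoeffding_sample_size(eps, delta, a, b)
    return "EX > 0" if sampler(n, rng).mean() >= 0 else "EX < 0"

# Example: rewards bounded in [0, 1], eps = 0.1, delta = 1e-3 => n = 346.
n = hoeffding_sample_size(0.1, 1e-3, 0.0, 1.0)
```

Note the sample size is fully determined in advance by (ɛ, δ, a, b); no rate function needs to be estimated, which is what makes the bounded case easy.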
Truncation when explicit bounds on a function of the rv are known

Suppose f is a strictly increasing convex function and we know that Ef(X) ≤ a. Further, X ≥ b. Then

  max_X E[ X I(X ≥ u) ] ≤ ( (a - f(b)) / (f(u) - f(b)) ) u.

This follows from the optimisation problem: max_X E[ X I(X ≥ u) ] such that Ef(X) ≤ a. This has a two-point solution, relying on the observation that

  Y = E[X | X < u] I(X < u) + E[X | X ≥ u] I(X ≥ u)

is better than X.
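Given the bound, a truncation level u that keeps the bias below ɛ can be found by a simple search. A sketch, assuming (for illustration) a second-moment bound f(x) = x²; the doubling search is an implementation choice, not from the slides:

```python
import math

def truncation_tail_bound(u, a, b, f):
    """Upper bound on E[X 1(X >= u)] for X >= b with E f(X) <= a and f convex
    increasing: the two-point extremal distribution gives
    E[X 1(X >= u)] <= u * (a - f(b)) / (f(u) - f(b))."""
    return u * (a - f(b)) / (f(u) - f(b))

def truncation_level(eps, a, b, f, u_hi=1e9):
    """Smallest power-of-two level u with truncation bias at most eps."""
    u = max(2.0 * abs(b) + 1.0, 1.0)
    while truncation_tail_bound(u, a, b, f) > eps and u < u_hi:
        u *= 2.0
    return u

# Example: X >= 0 with E[X^2] <= 4. Truncating at u costs at most 4/u in mean,
# so u = 64 suffices for eps = 0.1.
u = truncation_level(0.1, 4.0, 0.0, lambda x: x * x)
```

After truncating at u, the variable is bounded, and the Hoeffding-based policy of the previous slide applies with the separation shrunk from ɛ to, say, ɛ/2 to absorb the bias.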
Pure Exploration Multi-Armed Bandit Approach
Pure exploration bandit algorithms

There are n arms in total. Each arm a, when sampled, gives a Bernoulli reward with mean p_a.

Let a* = arg max_{a ∈ A} p_a and let Δ_a = p_{a*} - p_a.

Even-Dar et al. 2006 devise a sequential sampling strategy to find a* with probability at least 1 - δ.

The expected computational effort is O( Σ_{a ≠ a*} ln(n/δ)/Δ_a² ).
A popular successive rejection algorithm

Sample every arm a once, and let µ̂_a^t be the average reward of arm a by time t.

Each arm a such that max_{a'} µ̂_{a'}^t - µ̂_a^t ≥ 2α_t is removed from consideration, where α_t = sqrt( log(5nt²/δ)/t ).

Set t = t + 1; repeat till one arm is left.
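The elimination loop above can be sketched directly (a minimal sketch in the style of Even-Dar et al.; the round cap `max_t` is an added safeguard, not part of the algorithm):

```python
import math
import numpy as np

def successive_elimination(means, delta, rng, max_t=200_000):
    """Successive elimination for Bernoulli arms: sample every surviving arm
    once per round; drop arms whose empirical mean trails the leader by at
    least 2*alpha_t, with alpha_t = sqrt(log(5 n t^2 / delta) / t)."""
    n = len(means)
    alive = list(range(n))
    sums = np.zeros(n)
    for t in range(1, max_t + 1):
        for a in alive:
            sums[a] += rng.random() < means[a]     # Bernoulli(means[a]) reward
        alpha = math.sqrt(math.log(5 * n * t * t / delta) / t)
        best = max(sums[a] / t for a in alive)
        alive = [a for a in alive if best - sums[a] / t < 2 * alpha]
        if len(alive) == 1:
            return alive[0]
    return max(alive, key=lambda a: sums[a])       # fallback if cap is hit

arm = successive_elimination([0.2, 0.8], delta=0.05, rng=np.random.default_rng(0))
```

Each surviving arm is sampled once per round, so suboptimal arms stop consuming budget as soon as the shrinking confidence radius α_t separates them from the leader.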
Key idea

[Figure: empirical means of two arms with confidence bands shrinking in t, eventually separating µ_1 from µ_2.]
Generalizing to heavy tails

Bubeck, Cesa-Bianchi, Lugosi 2013 develop log(1/δ) algorithms in regret settings when bounds on the (1 + ɛ)-th moments of each arm's output are available.

The analysis again relies on forming a cone, which they do through truncation and clever usage of Bernstein's inequality.

We adapt these algorithms to pure exploration settings.
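One building block in that line of work is a truncated empirical mean. A minimal sketch in their style; the truncation schedule and the moment bound used below are illustrative assumptions, not their exact tuning:

```python
import numpy as np

def truncated_mean(samples, delta, moment_bound, eps):
    """Sketch of a truncated empirical mean for heavy-tailed rewards:
    sample i is kept only if |X_i| lies below a level that grows with i.
    The level (moment_bound * i / log(1/delta))**(1/(1+eps)) follows the
    style of Bubeck, Cesa-Bianchi, Lugosi 2013 (illustrative tuning)."""
    x = np.asarray(samples, dtype=float)
    i = np.arange(1, len(x) + 1)
    levels = (moment_bound * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return np.where(np.abs(x) <= levels, x, 0.0).mean()

rng = np.random.default_rng(3)
# Lomax(1.5) samples have mean 2 but E|X|^(1+eps) finite only for eps < 0.5;
# moment_bound = 15 is an assumed rough bound on E|X|^1.4 for this example.
est = truncated_mean(rng.pareto(1.5, size=50_000), delta=0.01,
                     moment_bound=15.0, eps=0.4)
```

The truncation discards the rare huge observations that would otherwise dominate the empirical mean, at the price of a small, controllable downward bias.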
In conclusion

We discussed that the ordinal optimisation method relies on the exponential convergence rate of the probability of false selection.

However, we showed through a series of negative results that this convergence rate, or equivalently O(log(1/δ)) computation algorithms, is not achievable for unbounded-support distributions without further restrictions.

Under explicit restrictions on moments of the underlying random variables, we devise O(log(1/δ)) algorithms.

These are closely related to the evolving multi-armed bandit literature on pure exploration methods.