Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods


Sandeep Juneja, Tata Institute of Fundamental Research, Mumbai, India
Joint work with Peter Glynn
Applied Probability Frontiers, Banff, June 1, 2015

The ordinal optimization problem

d different designs are compared on the basis of random performance measures X(i), i ≤ d, and the goal is to identify

    i* = arg min_{1≤j≤d} E X(j).

The goal is only to identify the best design, not to actually estimate its performance. We have the ability to generate iid realizations of each of the d random variables. We focus primarily on d = 2; taking X to be the difference of the two performance measures, the problem becomes: given independent samples of X, determine whether EX > 0 or EX < 0.

Determining the smallest mean population

[Figure: sample values drawn from each population; the task is to pick the population with the smallest mean.]

Talk Overview

Estimating the difference of mean values relies on the central limit theorem, with an associated slow n^{−1/2} convergence rate. Ho et al. (1990) observed that identifying the best system typically has a faster convergence rate: finding whether EX < EY or EX > EY is easier than determining the value EX − EY.

L. Dai (1996) showed, using large deviations methods, that the probability of false selection decays at an exponential rate under mild light-tailed assumptions. Thus, if EX_1 < min_{i≥2} EX_i, then

    lim_{n→∞} (1/n) log P( X̄_1(n) > min_{i≥2} X̄_i(n) ) = −I

for a large deviations rate function I > 0.

A series of papers by Chen, Yucesan, Dai, Chick and others (2000 onwards) attempted to optimize the budget allocated to each population under a normality assumption. There is substantial literature on selecting the best system amongst many alternatives using ranking/selection procedures; the Gaussian assumption is critical to most of that analysis (Nelson and others).

Glynn and Juneja (2004) observed that if EX_1 < min_{i≥2} EX_i, then for p_i > 0 with Σ_{i=1}^d p_i = 1,

    P( X̄_1(p_1 n) > min_{i≥2} X̄_i(p_i n) ) ≈ e^{−n H(p_1,...,p_d)},

so that H(p_1,...,p_d) can be optimised to determine optimal allocations. Significant literature since then relies on large deviations analysis (e.g., Hunter and Pasupathy 2013, Szechtman and Yucesan 2008, Broadie, Han and Zeevi 2007, Blanchet, Liu and Zwart 2008).

Some observations

If P(FS) ≤ e^{−nI} for some I > 0, then n = (1/I) log(1/δ) ensures P(FS) ≤ δ.

Order log(1/δ) effort is also necessary: if only log(1/δ)^{1−ɛ} samples are generated, then for any event A with p = P(X_i ∈ A) ∈ (0,1),

    p^{log(1/δ)^{1−ɛ}} = δ^{c log(1/δ)^{−ɛ}} ≫ δ   (c = log(1/p))

as δ → 0, so even the rarest sample outcome remains far more likely than δ.

HOPE

However, methods relying on P(FS) ≤ e^{−nI} require estimating the large deviations rate function I from the samples generated. One hopes for algorithms that, with n = O(log(1/δ)) samples, ensure at least asymptotically that P(FS) ≤ δ, that is,

    lim sup_{δ→0} P(FS) δ^{−1} ≤ 1,

even when the means are separated by a fixed and known ɛ > 0, so that min_{1≤i≤d} EX_i < EX_j − ɛ for all suboptimal j.

Contributions

We argue through two popular implementations that these rate functions are difficult to estimate accurately using O(log(1/δ)) samples. En route, we identify the large deviations rate function of the empirically estimated rate function; this may be of independent interest.

Key negative result

Given any (ɛ, δ) algorithm, that is, one that correctly separates designs with mean difference at least ɛ and satisfies

    lim sup_{δ→0} P(FS) δ^{−1} ≤ 1,

we prove that for populations with unbounded support, under mild restrictions, the expected number of samples cannot be O(log(1/δ)).

Positive contributions

Under explicitly available moment upper bounds, we develop truncation-based (ɛ, δ) algorithms with O(log(1/δ)) computation time. We also adapt recently proposed sequential algorithms from the multi-armed bandit regret setting to this pure exploration setting.

Two phase ordinal optimisation implementation

Basic large deviations theory

Suppose X_1, X_2, ..., X_n are i.i.d. samples of X and a > EX. Then, for θ > 0, Cramér's bound gives

    P( X̄_n ≥ a ) ≤ e^{−n(θa − Λ(θ))},   where Λ(θ) = log E e^{θX}.

Cramér's Theorem:

    P( X̄_n ≥ a ) = e^{−n I(a)(1+o(1))},

where the large deviations rate function is

    I(a) = sup_{θ∈R} ( θa − Λ(θ) ).
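As a concrete instance (a standard computation, not on the slides), for a Gaussian design everything is explicit:

\[
X \sim N(\mu, \sigma^2):\quad
\Lambda(\theta) = \theta\mu + \tfrac{1}{2}\theta^2\sigma^2,
\qquad
I(a) = \sup_{\theta}\big(\theta a - \Lambda(\theta)\big) = \frac{(a-\mu)^2}{2\sigma^2},
\]

so Cramér's bound reads P( X̄_n ≥ a ) ≤ exp( −n (a−µ)² / (2σ²) ).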

The rate function I(x)

[Figure: the rate function I(x); convex, equal to zero at x = EX and increasing away from it.]

A simple setting

Consider a single rv X with unknown mean EX. We need to decide whether EX > 0 or EX < 0 with error probability δ. If EX < 0, we may take exp(−n I(0)) as a proxy for P( X̄_n ≥ 0 ), the probability of false selection. If EX > 0, we may again take exp(−n I(0)) as a proxy for P( X̄_n ≤ 0 ), the probability of false selection.

A two phase implementation

Thus, log(1/δ)/I(0) samples ensure that P(FS) ≈ δ. Hence, one reasonable estimation procedure is:

First phase: generate m = log(1/δ) samples to estimate I(0) by Î_m(0).
Second phase: generate n = log(1/δ)/Î_m(0) = m/Î_m(0) samples of X.
Decide the sign of EX based on whether X̄_n > 0 or X̄_n ≤ 0.

We now discuss estimation of I(0). A sketch of the pipeline appears below.
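A minimal sketch of this two-phase pipeline in Python, assuming a sampler() that draws one iid copy of X and an estimate_rate0 helper implementing Î_m(0) (sketched after the estimator slide below); an illustration, not the authors' implementation:

    import math

    def two_phase_sign(sampler, delta, estimate_rate0):
        """Two-phase sign test: a pilot phase estimates I(0), then a second
        phase of m / I_m(0)-hat samples decides by the sign of the mean."""
        m = max(2, math.ceil(math.log(1.0 / delta)))   # first phase: m = log(1/delta)
        pilot = [sampler() for _ in range(m)]
        rate = estimate_rate0(pilot)                   # plug-in estimate of I(0)
        n = max(1, math.ceil(m / rate))                # second phase budget
        xbar = sum(sampler() for _ in range(n)) / n
        return "EX > 0" if xbar > 0 else "EX < 0"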

Estimating the rate function

Graphic view of I(0)

The log-moment generating function of X, Λ(θ) = log E exp(θX), is convex with Λ(0) = 0 and Λ'(0) = EX. Then

    I(0) = −inf_{θ} Λ(θ).

[Figure: Λ(θ), convex, passing through the origin with slope EX there; the depth of its minimum equals I(0).]

Estimating the rate function I(0)

A natural estimator for I(0) based on samples (X_i : 1 ≤ i ≤ m) is

    Î_m(0) = −inf_{θ∈R} Λ̂_m(θ),   where Λ̂_m(θ) = log( (1/m) Σ_{i=1}^m exp(θ X_i) ).
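A numerical sketch of this estimator, assuming NumPy/SciPy are available; the bounded search bracket is an artifact of the sketch, needed because the infimum is −∞ when all samples share one sign:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import logsumexp

    def estimate_rate0(samples):
        """I_m(0)-hat = -inf_theta Lambda_m-hat(theta), the plug-in estimator."""
        x = np.asarray(samples, dtype=float)

        def lambda_hat(theta):
            # empirical log-moment generating function, computed stably
            return logsumexp(theta * x) - np.log(len(x))

        # Lambda_m-hat is convex in theta; the finite bracket keeps the sketch
        # well defined when the samples do not straddle zero.
        res = minimize_scalar(lambda_hat, bounds=(-50.0, 50.0), method="bounded")
        return -res.fun

For instance, on a few thousand samples from N(−0.5, 1) the output should land near the true value I(0) = µ²/(2σ²) = 0.125.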

Graphic view of the estimated log-moment generating function

[Figure: the empirical log-MGF Λ̂_m(θ) and its minimum, whose depth gives Î_m(0).]

Large deviations rate function of Î_m(0)

One can show that for a > I(0),

    lim_{m→∞} −(1/m) log P( Î_m(0) ≥ a )

equals (noting that {Î_m(0) ≥ a} is exactly the event { inf_θ (1/m) Σ_{i=1}^m e^{θ X_i} ≤ e^{−a} })

    lim_{m→∞} −(1/m) log P( inf_θ (1/m) Σ_{i=1}^m e^{θ X_i} ≤ e^{−a} ) = inf_{θ∈R} I_θ(e^{−a}),

where

    I_θ(ν) = sup_α ( αν − log E exp(α e^{θX}) ).

Further, for a < I(0),

    lim_{m→∞} −(1/m) log P( Î_m(0) ≤ a ) = sup_{θ∈R} I_θ(e^{−a}).

Also, the natural estimator for I(x) is

    Î_m(x) = sup_{θ∈R} ( θx − Λ̂_m(θ) ).

It can be seen that the rate function for Î_m(x) equals inf_{θ∈R} I_θ(e^{−a+θx}) for a > I(x), and sup_{θ∈R} I_θ(e^{−a+θx}) for a < I(x).

Negative Result 1: Failure of the Naive

Returning to the two phase procedure

We generate samples X_1, ..., X_m for m = log(1/δ) and set Î_m(0) = −inf_θ Λ̂_m(θ). We then generate log(1/δ)/Î_m(0) = m/Î_m(0) samples of X in the second phase. Conditioning on the first phase,

    P(FS) ≈ E exp( −(m/Î_m(0)) I(0) ).

Errors are driven by large values of Î_m(0), which lead to under-sampling in the second phase, due to conspiratorial large deviations behaviour of all the terms.

Lower Bound for P(FS)

Can show

    lim_{m→∞} (1/m) log P(FS) = sup_{a>0} sup_{θ} ( −I(0)/a − I_θ(e^{−a}) ).

In particular,

    lim inf_{δ→0} P(FS) δ^{−1} > 1.

(A one-line expansion of why the displayed limit forces this follows.)
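To spell out the step (using m = log(1/δ) and writing R for the magnitude of the decay rate above):

\[
P(\mathrm{FS}) = e^{-mR(1+o(1))} = \delta^{R(1+o(1))},
\qquad
R = \inf_{a>0}\Big(\frac{I(0)}{a} + \inf_{\theta} I_\theta(e^{-a})\Big),
\]

so whenever R < 1, P(FS) δ^{−1} = δ^{R−1+o(1)} → ∞ as δ → 0.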

Negative Result 2: Fall of the Sophisticated

Another common estimation method

Generate m = c log(1/δ) samples in the first phase to estimate I(0) by Î_m(0). If exp(−m Î_m(0)) ≤ δ, stop. Else, provide another c log(1/δ) of computational budget, and so on. One can identify distributions for which this is not accurate.

We need to find X with EX < 0 so that

    P( X̄_m ≥ 0 and exp(−m Î_m(0)) ≤ δ ) > δ.

Since m = c log(1/δ), the stopping condition exp(−m Î_m(0)) ≤ δ is exactly Î_m(0) ≥ 1/c, so the above follows if for some θ < 0,

    I_θ(e^{−1/c}) < 1/c.

This can be shown for some light-tailed distributions even for large c.

Graphic view: I(0) < 1/c and Î_m(0) > 1/c

[Figure: a Λ(θ) whose minimum is shallower than 1/c while the empirical Λ̂_m dips below −1/c, so that Î_m(0) > 1/c.]

Negative Result 3: Perdition

Let L denote a collection of probability distributions with finite mean and unbounded positive support, open under the Kullback-Leibler distance. Let P(ɛ, δ) denote a policy that can adaptively sample from any two distributions in L and satisfies

    lim sup_{δ→0} P(FS) δ^{−1} ≤ 1.

Result: for any two such distributions in L, with means arbitrarily far apart, a P(ɛ, δ) policy on average requires more than O(log(1/δ)) samples.

Details

Under probability P_a, {X_i} has distribution F with mean µ_F, and {Y_i} has distribution G with mean µ_G < µ_F − ɛ. Under probability P_b, {X_i} has distribution F, and {Y_i} has a distribution G̃ with mean µ_{G̃} > µ_F + ɛ. Under P(ɛ, δ), with T_G denoting the number of samples the policy draws from the second population,

    lim inf_{δ→0} E_a[T_G] / log(1/δ) ≥ 1 / (3 KL(G, G̃)),

where

    KL(G, G̃) = ∫ log( dG/dG̃ (x) ) dG(x).

Proof similar to Lai and Robbins 1985, Mannor and Tsitsiklis 2004

Asymptotically,

    P_a( algorithm selects F ) ≥ 1 − δ,    P_b( algorithm selects F ) ≤ δ.

By a change of measure,

    P_b( algo. selects F ) = E_a[ ( Π_{i=1}^{T_G} dG̃/dG (Y_i) ) I( algo. selects F ) ]
                           ≈ E_a[ e^{−T_G KL(G, G̃)} I( algo. selects F ) ]
                           ≥ e^{−2 E_a(T_G) KL(G, G̃)} P_a( algo. selects F ),

and the result is easily deduced.

Result: Given G with finite mean and unbounded positive support, for any α > 0 and k > µ_G, there exists a distribution G̃_k such that KL(G, G̃_k) ≤ α and µ_{G̃_k} ≥ k.

Way forward

Additional information is needed to attain log(1/δ) convergence rates. Often, upper bounds on moments are available in simulation models; such bounds are easy to develop once suitable Lyapunov functions are identified (not discussed here). We use such bounds to develop (ɛ, δ) strategies by truncating the random variables while controlling the truncation error to be less than ɛ, and then applying Hoeffding's inequality. Recent multi-armed bandit methods do this in a sequential and adaptive manner.

δ guarantees using log(1/δ) samples: Static approach

P(ɛ, δ) policy for bounded random variables

Consider X_ɛ = {X : |EX| > ɛ, a ≤ X ≤ b}. A reasonable algorithm on X_ɛ is:

Generate iid samples X_1, X_2, ..., X_n of X.
If X̄_n ≥ 0, declare EX > 0.
If X̄_n < 0, declare EX < 0.

Hoeffding's inequality bounds the probability of false selection: suppose EX < −ɛ; then

    P( X̄_n ≥ 0 ) ≤ P( X̄_n − EX ≥ ɛ ) ≤ exp( −2nɛ² / (b−a)² ).

Thus

    n = ((b−a)² / (2ɛ²)) log(1/δ)

provides the desired P(ɛ, δ) policy. (A sketch follows.)
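A minimal sketch of this static policy, using exactly the Hoeffding sample count from the slide (sampler is an assumed callable returning values in [a, b]):

    import math

    def hoeffding_sign_policy(sampler, a, b, eps, delta):
        """Static (eps, delta) policy for X supported on [a, b] with |EX| > eps."""
        n = math.ceil((b - a) ** 2 / (2.0 * eps ** 2) * math.log(1.0 / delta))
        xbar = sum(sampler() for _ in range(n)) / n
        return "EX > 0" if xbar >= 0 else "EX < 0"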

Truncation when explicit bounds on a function of the rv are known

Suppose f is a strictly increasing convex function and we know that E f(X) ≤ a; further, X ≥ b. Then

    max_X E[ X I(X ≥ u) ] ≤ ( (a − f(b)) / (f(u) − f(b)) ) u.

This follows from the optimisation problem

    max_X E[ X I(X ≥ u) ]   such that   E f(X) ≤ a,

which has a two-point solution, relying on the observation that

    Y = E[X | X < u] I(X < u) + E[X | X ≥ u] I(X ≥ u)

is better than X. (A numerical sketch of choosing the truncation level follows.)
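Given the displayed bound, the smallest truncation level u whose tail contribution is at most ɛ/2 can be found numerically. A sketch under stated assumptions (b ≥ 0, f increasing and superlinear on [b, ∞); the default f and the ɛ/2 split are illustrative choices, not from the slides):

    def truncation_level(a, b, eps, f=lambda x: x * x):
        """Smallest level u (doubling plus bisection) with
        (a - f(b)) * u / (f(u) - f(b)) <= eps / 2, the slide's tail bound."""
        def tail_bound(u):
            return (a - f(b)) * u / (f(u) - f(b))
        lo = b + 1.0                         # any point strictly above b
        hi = lo
        while tail_bound(hi) > eps / 2.0:    # bound decreases in u for superlinear f
            hi *= 2.0
        for _ in range(60):                  # bisect down to the threshold
            mid = 0.5 * (lo + hi)
            if tail_bound(mid) > eps / 2.0:
                lo = mid
            else:
                hi = mid
        return hi

With such a u, the truncated variable X I(X < u) is bounded, its mean is within ɛ/2 of EX, and the Hoeffding policy above applies to it.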

Pure Exploration Multi-Armed Bandit Approach

Pure exploration bandit algorithms

There are n arms in total; each arm a, when sampled, gives a Bernoulli reward with mean p_a. Let a* = arg max_{a∈A} p_a and let Δ_a = p_{a*} − p_a. Even-Dar et al. devise a sequential sampling strategy that finds a* with probability at least 1 − δ, with expected computational effort

    O( Σ_{a ≠ a*} ln(n/δ) / Δ_a² ).

Popular successive rejection algorithm

Sample every arm a once and let µ̂_a^t be the average reward of arm a by time t. Each arm a such that µ̂_max^t − µ̂_a^t ≥ 2α_t is removed from consideration, where

    α_t = sqrt( log(5nt²/δ) / t ).

Set t = t + 1 and repeat till one arm is left. (A sketch follows.)
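A compact sketch of this elimination loop (pull(a) is an assumed sampling oracle returning rewards in [0, 1]; the radius follows the slide's α_t):

    import math

    def successive_elimination(pull, n, delta):
        """Return the index of the surviving arm."""
        active = list(range(n))
        sums = [0.0] * n
        t = 0
        while len(active) > 1:
            t += 1
            for a in active:                  # sample every surviving arm once
                sums[a] += pull(a)
            alpha = math.sqrt(math.log(5 * n * t * t / delta) / t)
            means = {a: sums[a] / t for a in active}
            best = max(means.values())
            # keep only arms within 2 * alpha of the current leader
            active = [a for a in active if means[a] > best - 2 * alpha]
        return active[0]

For Bernoulli arms with means p, pull = lambda a: float(random.random() < p[a]) exercises it.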

Key idea

[Figure: confidence intervals of width α_t around the running means shrink with t, so the arm means µ_1 and µ_2 eventually separate.]

Generalizing to heavy tails

Bubeck, Cesa-Bianchi and Lugosi (2013) develop log(1/δ) algorithms in regret settings when bounds on the (1 + ɛ)-th moments of each arm's output are available. The analysis again relies on forming a confidence cone, which they obtain through truncation and clever use of the Bernstein inequality. We adapt these algorithms to pure exploration settings. (A sketch of the underlying mean estimator follows.)
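For flavour, a sketch of the truncated empirical mean behind such confidence cones, assuming a known bound u on E|X|^{1+ɛ}; the truncation level follows the form in Bubeck et al., but the exact constants should be checked against the paper:

    import math

    def truncated_mean(xs, u, eps, delta):
        """Truncated empirical mean: keep sample i only if |X_i| is below a
        level growing like (u * i / log(1/delta)) ** (1 / (1 + eps))."""
        n = len(xs)
        log1d = math.log(1.0 / delta)
        total = 0.0
        for i, x in enumerate(xs, start=1):
            level = (u * i / log1d) ** (1.0 / (1.0 + eps))
            if abs(x) <= level:
                total += x
        return total / n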

In conclusion

We discussed that the ordinal optimisation method relies on the exponential convergence rate of the probability of false selection. However, we showed through a series of negative results that this convergence rate, or equivalently O(log(1/δ))-computation algorithms, is not achievable for unbounded-support distributions without further restrictions. Under explicit restrictions on the moments of the underlying random variables, we devised O(log(1/δ)) algorithms. These are closely related to the evolving multi-armed bandit literature on pure exploration methods.
