Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods


Sandeep Juneja, Tata Institute of Fundamental Research, Mumbai, India
Joint work with Peter Glynn
Applied Probability Frontiers, Banff, June 1, 2015

The ordinal optimization problem

d different designs are compared on the basis of random performance measures X(i), i ≤ d, and the goal is to identify

    i* = arg min_{1≤j≤d} E X(j).

The goal is only to identify the best design, not to actually estimate its performance. We have the ability to generate iid realizations of each of the d random variables. We focus primarily on d = 2; taking X to be the difference of the two performance measures, the problem becomes: given independent samples of X, determine whether EX > 0 or EX < 0.

Determining the smallest mean population

[Figure: sample values drawn from each population; the task is to pick the population with the smallest mean.]

Talk Overview

Estimating the difference of mean values relies on the central limit theorem, with an associated slow n^{−1/2} convergence rate. Ho et al. (1990) observed that identifying the best system typically has a faster convergence rate: finding whether EX < EY or EX > EY is easier than determining the value EX − EY.

L. Dai (1996) showed, using large deviations methods, that the probability of false selection decays at an exponential rate under mild light-tailed assumptions. Thus, if EX_1 < min_{i≥2} EX_i, then

    lim_{n→∞} (1/n) log P( X̄_1(n) > min_{i≥2} X̄_i(n) ) = −I

for a large deviations rate function I > 0.

A series of papers by Chen, Yucesan, Dai, Chick and others (2000 onwards) attempted to optimize the budget allocated to each population under a normality assumption. There is substantial literature on selecting the best system amongst many alternatives using ranking/selection procedures; the Gaussian assumption is critical to most of that analysis (Nelson and others).

Glynn and Juneja (2004) observed that if EX_1 < min_{i≥2} EX_i, then for p_i > 0 with Σ_{i=1}^d p_i = 1,

    P( X̄_1(p_1 n) > min_{i≥2} X̄_i(p_i n) ) ≈ e^{−n H(p_1,...,p_d)},

so that H(p_1,...,p_d) can be optimised to determine optimal allocations. Significant literature since then relies on large deviations analysis (e.g., Hunter and Pasupathy 2013, Szechtman and Yucesan 2008, Broadie, Han and Zeevi 2007, Blanchet, Liu and Zwart 2008).

Some observations

If P(FS) ≤ e^{−nI} for some I > 0, then n = (1/I) log(1/δ) ensures P(FS) ≤ δ.

Order log(1/δ) effort is also necessary: if only log(1/δ)^{1−ɛ} samples are generated, then for any event A with p = P(X_i ∈ A) ∈ (0,1),

    p^{log(1/δ)^{1−ɛ}} = δ^{c log(1/δ)^{−ɛ}} ≫ δ   (c = log(1/p))

as δ → 0, so even the rarest sample outcome remains far more likely than δ.

HOPE

However, methods relying on P(FS) ≤ e^{−nI} require estimating the large deviations rate function I from the samples generated. One hopes for algorithms that, with n = O(log(1/δ)) samples, ensure at least asymptotically that P(FS) ≤ δ, that is,

    lim sup_{δ→0} P(FS) δ^{−1} ≤ 1,

even when the means are separated by a fixed and known ɛ > 0, so that min_{1≤i≤d} EX_i < EX_j − ɛ for all suboptimal j.

Contributions

We argue through two popular implementations that these rate functions are difficult to estimate accurately using O(log(1/δ)) samples. En route, we identify the large deviations rate function of the empirically estimated rate function; this may be of independent interest.

Key negative result

Given any (ɛ, δ) algorithm, that is, one that correctly separates designs with mean difference at least ɛ and satisfies

    lim sup_{δ→0} P(FS) δ^{−1} ≤ 1,

we prove that for populations with unbounded support, under mild restrictions, the expected number of samples cannot be O(log(1/δ)).

Positive contributions

Under explicitly available moment upper bounds, we develop truncation-based (ɛ, δ) algorithms with O(log(1/δ)) computation time. We also adapt recently proposed sequential algorithms from the multi-armed bandit regret setting to this pure exploration setting.

Two phase ordinal optimisation implementation

Basic large deviations theory

Suppose X_1, X_2, ..., X_n are i.i.d. samples of X and a > EX. Then, for θ > 0, Cramér's bound gives

    P( X̄_n ≥ a ) ≤ e^{−n(θa − Λ(θ))},   where Λ(θ) = log E e^{θX}.

Cramér's Theorem:

    P( X̄_n ≥ a ) = e^{−n I(a)(1+o(1))},

where the large deviations rate function is

    I(a) = sup_{θ∈R} ( θa − Λ(θ) ).
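As a concrete instance (a standard computation, not on the slides), for a Gaussian design everything is explicit:

\[
X \sim N(\mu, \sigma^2):\quad
\Lambda(\theta) = \theta\mu + \tfrac{1}{2}\theta^2\sigma^2,
\qquad
I(a) = \sup_{\theta}\big(\theta a - \Lambda(\theta)\big) = \frac{(a-\mu)^2}{2\sigma^2},
\]

so Cramér's bound reads P( X̄_n ≥ a ) ≤ exp( −n (a−µ)² / (2σ²) ).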

The rate function I(x)

[Figure: the rate function I(x); convex, equal to zero at x = EX and increasing away from it.]

A simple setting

Consider a single rv X with unknown mean EX. We need to decide whether EX > 0 or EX < 0 with error probability δ. If EX < 0, we may take exp(−n I(0)) as a proxy for P( X̄_n ≥ 0 ), the probability of false selection. If EX > 0, we may again take exp(−n I(0)) as a proxy for P( X̄_n ≤ 0 ), the probability of false selection.

A two phase implementation

Thus, log(1/δ)/I(0) samples ensure that P(FS) ≈ δ. Hence, one reasonable estimation procedure is:

First phase: generate m = log(1/δ) samples to estimate I(0) by Î_m(0).
Second phase: generate n = log(1/δ)/Î_m(0) = m/Î_m(0) samples of X.
Decide the sign of EX based on whether X̄_n > 0 or X̄_n ≤ 0.

We now discuss estimation of I(0). A sketch of the pipeline appears below.
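A minimal sketch of this two-phase pipeline in Python, assuming a sampler() that draws one iid copy of X and an estimate_rate0 helper implementing Î_m(0) (sketched after the estimator slide below); an illustration, not the authors' implementation:

    import math

    def two_phase_sign(sampler, delta, estimate_rate0):
        """Two-phase sign test: a pilot phase estimates I(0), then a second
        phase of m / I_m(0)-hat samples decides by the sign of the mean."""
        m = max(2, math.ceil(math.log(1.0 / delta)))   # first phase: m = log(1/delta)
        pilot = [sampler() for _ in range(m)]
        rate = estimate_rate0(pilot)                   # plug-in estimate of I(0)
        n = max(1, math.ceil(m / rate))                # second phase budget
        xbar = sum(sampler() for _ in range(n)) / n
        return "EX > 0" if xbar > 0 else "EX < 0"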

Estimating the rate function

Graphic view of I(0)

The log-moment generating function of X, Λ(θ) = log E exp(θX), is convex with Λ(0) = 0 and Λ'(0) = EX. Then

    I(0) = −inf_{θ} Λ(θ).

[Figure: Λ(θ), convex, passing through the origin with slope EX there; the depth of its minimum equals I(0).]

Estimating the rate function I(0)

A natural estimator for I(0) based on samples (X_i : 1 ≤ i ≤ m) is

    Î_m(0) = −inf_{θ∈R} Λ̂_m(θ),   where Λ̂_m(θ) = log( (1/m) Σ_{i=1}^m exp(θ X_i) ).
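A numerical sketch of this estimator, assuming NumPy/SciPy are available; the bounded search bracket is an artifact of the sketch, needed because the infimum is −∞ when all samples share one sign:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import logsumexp

    def estimate_rate0(samples):
        """I_m(0)-hat = -inf_theta Lambda_m-hat(theta), the plug-in estimator."""
        x = np.asarray(samples, dtype=float)

        def lambda_hat(theta):
            # empirical log-moment generating function, computed stably
            return logsumexp(theta * x) - np.log(len(x))

        # Lambda_m-hat is convex in theta; the finite bracket keeps the sketch
        # well defined when the samples do not straddle zero.
        res = minimize_scalar(lambda_hat, bounds=(-50.0, 50.0), method="bounded")
        return -res.fun

For instance, on a few thousand samples from N(−0.5, 1) the output should land near the true value I(0) = µ²/(2σ²) = 0.125.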

Graphic view of the estimated log-moment generating function

[Figure: the empirical log-MGF Λ̂_m(θ) and its minimum, whose depth gives Î_m(0).]

Large deviations rate function of Î_m(0)

One can show that for a > I(0),

    lim_{m→∞} −(1/m) log P( Î_m(0) ≥ a )

equals (noting that {Î_m(0) ≥ a} is exactly the event { inf_θ (1/m) Σ_{i=1}^m e^{θ X_i} ≤ e^{−a} })

    lim_{m→∞} −(1/m) log P( inf_θ (1/m) Σ_{i=1}^m e^{θ X_i} ≤ e^{−a} ) = inf_{θ∈R} I_θ(e^{−a}),

where

    I_θ(ν) = sup_α ( αν − log E exp(α e^{θX}) ).

Further, for a < I(0),

    lim_{m→∞} −(1/m) log P( Î_m(0) ≤ a ) = sup_{θ∈R} I_θ(e^{−a}).

Also, the natural estimator for I(x) is

    Î_m(x) = sup_{θ∈R} ( θx − Λ̂_m(θ) ).

It can be seen that the rate function for Î_m(x) equals inf_{θ∈R} I_θ(e^{−a+θx}) for a > I(x), and sup_{θ∈R} I_θ(e^{−a+θx}) for a < I(x).

Negative Result 1: Failure of the Naive

Returning to the two phase procedure

We generate samples X_1, ..., X_m for m = log(1/δ) and set Î_m(0) = −inf_θ Λ̂_m(θ). We then generate log(1/δ)/Î_m(0) = m/Î_m(0) samples of X in the second phase. Conditioning on the first phase,

    P(FS) ≈ E exp( −(m/Î_m(0)) I(0) ).

Errors are driven by large values of Î_m(0), which lead to under-sampling in the second phase, due to conspiratorial large deviations behaviour of all the terms.

Lower Bound for P(FS)

Can show

    lim_{m→∞} (1/m) log P(FS) = sup_{a>0} sup_{θ} ( −I(0)/a − I_θ(e^{−a}) ).

In particular,

    lim inf_{δ→0} P(FS) δ^{−1} > 1.

(A one-line expansion of why the displayed limit forces this follows.)
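To spell out the step (using m = log(1/δ) and writing R for the magnitude of the decay rate above):

\[
P(\mathrm{FS}) = e^{-mR(1+o(1))} = \delta^{R(1+o(1))},
\qquad
R = \inf_{a>0}\Big(\frac{I(0)}{a} + \inf_{\theta} I_\theta(e^{-a})\Big),
\]

so whenever R < 1, P(FS) δ^{−1} = δ^{R−1+o(1)} → ∞ as δ → 0.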

Negative Result 2: Fall of the Sophisticated

Another common estimation method

Generate m = c log(1/δ) samples in the first phase to estimate I(0) by Î_m(0). If exp(−m Î_m(0)) ≤ δ, stop. Else, provide another c log(1/δ) of computational budget, and so on. One can identify distributions for which this is not accurate.

We need to find X with EX < 0 so that

    P( X̄_m ≥ 0 and exp(−m Î_m(0)) ≤ δ ) > δ.

Since m = c log(1/δ), the stopping condition exp(−m Î_m(0)) ≤ δ is exactly Î_m(0) ≥ 1/c, so the above follows if for some θ < 0,

    I_θ(e^{−1/c}) < 1/c.

This can be shown for some light-tailed distributions even for large c.

Graphic view: I(0) < 1/c and Î_m(0) > 1/c

[Figure: a Λ(θ) whose minimum is shallower than 1/c while the empirical Λ̂_m dips below −1/c, so that Î_m(0) > 1/c.]

Negative Result 3: Perdition

Let L denote a collection of probability distributions with finite mean and unbounded positive support, open under the Kullback-Leibler distance. Let P(ɛ, δ) denote a policy that can adaptively sample from any two distributions in L and satisfies

    lim sup_{δ→0} P(FS) δ^{−1} ≤ 1.

Result: for any two such distributions in L, with means arbitrarily far apart, a P(ɛ, δ) policy on average requires more than O(log(1/δ)) samples.

Details

Under probability P_a, {X_i} has distribution F with mean µ_F, and {Y_i} has distribution G with mean µ_G < µ_F − ɛ. Under probability P_b, {X_i} has distribution F, and {Y_i} has a distribution G̃ with mean µ_{G̃} > µ_F + ɛ. Under P(ɛ, δ), with T_G denoting the number of samples the policy draws from the second population,

    lim inf_{δ→0} E_a[T_G] / log(1/δ) ≥ 1 / (3 KL(G, G̃)),

where

    KL(G, G̃) = ∫ log( dG/dG̃ (x) ) dG(x).

Proof similar to Lai and Robbins 1985, Mannor and Tsitsiklis 2004

Asymptotically,

    P_a( algorithm selects F ) ≥ 1 − δ,    P_b( algorithm selects F ) ≤ δ.

By a change of measure,

    P_b( algo. selects F ) = E_a[ ( Π_{i=1}^{T_G} dG̃/dG (Y_i) ) I( algo. selects F ) ]
                           ≈ E_a[ e^{−T_G KL(G, G̃)} I( algo. selects F ) ]
                           ≥ e^{−2 E_a(T_G) KL(G, G̃)} P_a( algo. selects F ),

and the result is easily deduced.

Result: Given G with finite mean and unbounded positive support, for any α > 0 and k > µ_G, there exists a distribution G̃_k such that KL(G, G̃_k) ≤ α and µ_{G̃_k} ≥ k.

Way forward

Additional information is needed to attain log(1/δ) convergence rates. Often, upper bounds on moments are available in simulation models; such bounds are easy to develop once suitable Lyapunov functions are identified (not discussed here). We use such bounds to develop (ɛ, δ) strategies by truncating the random variables while controlling the truncation error to be less than ɛ, and then applying Hoeffding's inequality. Recent multi-armed bandit methods do this in a sequential and adaptive manner.

δ guarantees using log(1/δ) samples: Static approach

P(ɛ, δ) policy for bounded random variables

Consider X_ɛ = {X : |EX| > ɛ, a ≤ X ≤ b}. A reasonable algorithm on X_ɛ is:

Generate iid samples X_1, X_2, ..., X_n of X.
If X̄_n ≥ 0, declare EX > 0.
If X̄_n < 0, declare EX < 0.

Hoeffding's inequality bounds the probability of false selection: suppose EX < −ɛ; then

    P( X̄_n ≥ 0 ) ≤ P( X̄_n − EX ≥ ɛ ) ≤ exp( −2nɛ² / (b−a)² ).

Thus

    n = ((b−a)² / (2ɛ²)) log(1/δ)

provides the desired P(ɛ, δ) policy. (A sketch follows.)
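A minimal sketch of this static policy, using exactly the Hoeffding sample count from the slide (sampler is an assumed callable returning values in [a, b]):

    import math

    def hoeffding_sign_policy(sampler, a, b, eps, delta):
        """Static (eps, delta) policy for X supported on [a, b] with |EX| > eps."""
        n = math.ceil((b - a) ** 2 / (2.0 * eps ** 2) * math.log(1.0 / delta))
        xbar = sum(sampler() for _ in range(n)) / n
        return "EX > 0" if xbar >= 0 else "EX < 0"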

Truncation when explicit bounds on a function of the rv are known

Suppose f is a strictly increasing convex function and we know that E f(X) ≤ a; further, X ≥ b. Then

    max_X E[ X I(X ≥ u) ] ≤ ( (a − f(b)) / (f(u) − f(b)) ) u.

This follows from the optimisation problem

    max_X E[ X I(X ≥ u) ]   such that   E f(X) ≤ a,

which has a two-point solution, relying on the observation that

    Y = E[X | X < u] I(X < u) + E[X | X ≥ u] I(X ≥ u)

is better than X. (A numerical sketch of choosing the truncation level follows.)
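Given the displayed bound, the smallest truncation level u whose tail contribution is at most ɛ/2 can be found numerically. A sketch under stated assumptions (b ≥ 0, f increasing and superlinear on [b, ∞); the default f and the ɛ/2 split are illustrative choices, not from the slides):

    def truncation_level(a, b, eps, f=lambda x: x * x):
        """Smallest level u (doubling plus bisection) with
        (a - f(b)) * u / (f(u) - f(b)) <= eps / 2, the slide's tail bound."""
        def tail_bound(u):
            return (a - f(b)) * u / (f(u) - f(b))
        lo = b + 1.0                         # any point strictly above b
        hi = lo
        while tail_bound(hi) > eps / 2.0:    # bound decreases in u for superlinear f
            hi *= 2.0
        for _ in range(60):                  # bisect down to the threshold
            mid = 0.5 * (lo + hi)
            if tail_bound(mid) > eps / 2.0:
                lo = mid
            else:
                hi = mid
        return hi

With such a u, the truncated variable X I(X < u) is bounded, its mean is within ɛ/2 of EX, and the Hoeffding policy above applies to it.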

Pure Exploration Multi-Armed Bandit Approach

Pure exploration bandit algorithms

There are n arms in total; each arm a, when sampled, gives a Bernoulli reward with mean p_a. Let a* = arg max_{a∈A} p_a and let Δ_a = p_{a*} − p_a. Even-Dar et al. devise a sequential sampling strategy that finds a* with probability at least 1 − δ, with expected computational effort

    O( Σ_{a ≠ a*} ln(n/δ) / Δ_a² ).

Popular successive rejection algorithm

Sample every arm a once and let µ̂_a^t be the average reward of arm a by time t. Each arm a such that µ̂_max^t − µ̂_a^t ≥ 2α_t is removed from consideration, where

    α_t = sqrt( log(5nt²/δ) / t ).

Set t = t + 1 and repeat till one arm is left. (A sketch follows.)
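A compact sketch of this elimination loop (pull(a) is an assumed sampling oracle returning rewards in [0, 1]; the radius follows the slide's α_t):

    import math

    def successive_elimination(pull, n, delta):
        """Return the index of the surviving arm."""
        active = list(range(n))
        sums = [0.0] * n
        t = 0
        while len(active) > 1:
            t += 1
            for a in active:                  # sample every surviving arm once
                sums[a] += pull(a)
            alpha = math.sqrt(math.log(5 * n * t * t / delta) / t)
            means = {a: sums[a] / t for a in active}
            best = max(means.values())
            # keep only arms within 2 * alpha of the current leader
            active = [a for a in active if means[a] > best - 2 * alpha]
        return active[0]

For Bernoulli arms with means p, pull = lambda a: float(random.random() < p[a]) exercises it.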

Key idea

[Figure: confidence intervals of width α_t around the running means shrink with t, so the arm means µ_1 and µ_2 eventually separate.]

Generalizing to heavy tails

Bubeck, Cesa-Bianchi and Lugosi (2013) develop log(1/δ) algorithms in regret settings when bounds on the (1 + ɛ)-th moments of each arm's output are available. The analysis again relies on forming a confidence cone, which they obtain through truncation and clever use of the Bernstein inequality. We adapt these algorithms to pure exploration settings. (A sketch of the underlying mean estimator follows.)
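For flavour, a sketch of the truncated empirical mean behind such confidence cones, assuming a known bound u on E|X|^{1+ɛ}; the truncation level follows the form in Bubeck et al., but the exact constants should be checked against the paper:

    import math

    def truncated_mean(xs, u, eps, delta):
        """Truncated empirical mean: keep sample i only if |X_i| is below a
        level growing like (u * i / log(1/delta)) ** (1 / (1 + eps))."""
        n = len(xs)
        log1d = math.log(1.0 / delta)
        total = 0.0
        for i, x in enumerate(xs, start=1):
            level = (u * i / log1d) ** (1.0 / (1.0 + eps))
            if abs(x) <= level:
                total += x
        return total / n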

In conclusion

We discussed that the ordinal optimisation method relies on the exponential convergence rate of the probability of false selection. However, we showed through a series of negative results that this convergence rate, or equivalently O(log(1/δ))-computation algorithms, is not achievable for unbounded-support distributions without further restrictions. Under explicit restrictions on the moments of the underlying random variables, we devised O(log(1/δ)) algorithms. These are closely related to the evolving multi-armed bandit literature on pure exploration methods.
