Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods
Ordinal optimization: Empirical large deviations rate estimators, and multi-armed bandit methods
Sandeep Juneja, Tata Institute of Fundamental Research, Mumbai, India
Joint work with Peter Glynn
Applied Probability Frontiers, Banff, June 1, 2015
Glynn, Juneja: Ordinal optimization
The ordinal optimization problem
d different designs are compared on the basis of random performance measures X(i), i ≤ d, and the goal is to identify i* = arg min_{1 ≤ j ≤ d} E X(j).
The goal is only to identify the best design, not to actually estimate its performance.
We have the ability to generate iid realizations of each of the d random variables.
We focus primarily on d = 2, so given independent samples of X we want to determine whether EX > 0 or EX < 0.
Determining the smallest mean population
[Figure: sample values plotted for each population]
Talk overview
Estimating the difference of mean values relies on the central limit theorem, with an associated slow n^{-1/2} convergence rate.
Ho et al. (1990) observed that identifying the best system typically has a faster convergence rate.
Finding whether EX < EY or EX > EY is easier than determining the value EX − EY.
L. Dai (1996) showed using large deviations methods that the probability of false selection decays at an exponential rate under mild light-tailed assumptions. Thus, if EX_1 < min_{i ≥ 2} EX_i, then
lim_n (1/n) log P( X̄_1(n) > min_{i ≥ 2} X̄_i(n) ) = −I
for a large deviations rate function I > 0.
A series of papers by Chen, Yucesan, L. Dai, Chick and others (2000s) attempted to optimize the budget allocated to each population under a normality assumption.
There is substantial literature on selecting the best system amongst many alternatives using ranking/selection procedures; the Gaussian assumption is critical to most of the analysis (Nelson and others).
Glynn and J (2004) observed that if EX_1 < min_{i ≥ 2} EX_i, then for p_i > 0 with Σ_{i=1}^d p_i = 1,
P( X̄_1(p_1 n) > min_{i ≥ 2} X̄_i(p_i n) ) ≈ e^{−n H(p_1, ..., p_d)},
so that H(p_1, ..., p_d) can be optimised to determine optimal allocations.
Significant literature since then relies on large deviations analysis (e.g., Hunter and Pasupathy 2013, Szechtman and Yucesan 2008, Broadie, Han, Zeevi 2007, Blanchet, Liu, Zwart 2008).
Some observations
If P(FS) ≤ e^{−nI} for some I > 0, then n = (1/I) log(1/δ) ensures P(FS) ≤ δ.
O(log(1/δ)) effort is also necessary: if only (log(1/δ))^{1−ɛ} samples are generated, then for any event A with P(X_i ∈ A) a fixed positive number less than one, as δ → 0,
P(X_i ∈ A)^{(log(1/δ))^{1−ɛ}} = δ^{c (log(1/δ))^{−ɛ}} ≫ δ, with c = log(1/P(X_i ∈ A)),
so any fixed sample pattern occurs with probability much larger than δ.
Hope
However, methods relying on P(FS) ≤ e^{−nI} rely on estimating the large deviations rate function I from the samples generated.
One hopes for algorithms that for n = O(log(1/δ)) ensure, at least asymptotically, that P(FS) ≤ δ, that is,
lim sup_{δ → 0} P(FS) δ^{−1} ≤ 1,
even when the means are separated by a fixed and known ɛ > 0, so that min_{1 ≤ i ≤ d} EX_i < EX_j − ɛ for all suboptimal j.
Contributions
We argue through two popular implementations that these rate functions are difficult to estimate accurately using O(log(1/δ)) samples.
En route, we identify the large deviations rate function of the empirically estimated rate function. This may be of independent interest.
Key negative result
Given any (ɛ, δ) algorithm, that is, one that correctly separates designs with mean difference at least ɛ with
lim sup_{δ → 0} P(FS) δ^{−1} ≤ 1,
we prove that for populations with unbounded support, under mild restrictions, the expected number of samples cannot be O(log(1/δ)).
Positive contributions
Under explicitly available moment upper bounds, we develop truncation-based (ɛ, δ) algorithms with O(log(1/δ)) computation time.
We also adapt recently proposed sequential algorithms from the multi-armed bandit regret setting to this pure exploration setting.
Two phase ordinal optimisation implementation
Basic large deviations theory
Suppose X_1, X_2, ..., X_n are i.i.d. samples of X and a > EX. Then, for θ > 0, Cramér's bound:
P( X̄_n ≥ a ) ≤ e^{−n(θa − Λ(θ))}, where Λ(θ) = log E e^{θX}.
Cramér's theorem:
P( X̄_n ≥ a ) = e^{−n I(a)(1 + o(1))},
where the large deviations rate function I(a) = sup_{θ ∈ R} (θa − Λ(θ)).
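As an illustrative aside, not from the talk: for X ~ N(µ, σ²) the rate function has the closed form I(a) = (a − µ)²/(2σ²), since Λ(θ) = θµ + θ²σ²/2. A minimal sketch checking a grid maximisation of θa − Λ(θ) against the closed form (the grid range [−5, 5] is an assumption that happens to contain the maximiser):

```python
import math

def rate_function(a, mu, sigma, grid=None):
    """Numerically evaluate I(a) = sup_theta (theta*a - Lambda(theta))
    for X ~ N(mu, sigma^2), where Lambda(theta) = theta*mu + theta^2*sigma^2/2."""
    if grid is None:
        grid = [i / 1000.0 for i in range(-5000, 5001)]  # theta in [-5, 5]
    return max(theta * a - (theta * mu + 0.5 * theta**2 * sigma**2)
               for theta in grid)

mu, sigma = -1.0, 2.0
closed_form = mu**2 / (2 * sigma**2)       # I(0) = mu^2 / (2 sigma^2) = 0.125
numeric = rate_function(0.0, mu, sigma)
print(numeric, closed_form)                # -> 0.125 0.125
```

The maximiser θ* = (a − µ)/σ² = 0.25 lies on the grid, so the two values agree exactly here.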
The rate function
[Figure: I(x) as a function of x, equal to zero at x = EX]
A simple setting
Consider a single rv X with unknown mean EX. Need to decide whether EX > 0 or EX < 0 with error probability δ.
Then if EX < 0, we may take exp(−n I(0)) as a proxy for P( X̄_n ≥ 0 ), the probability of false selection.
If EX > 0, we may again take exp(−n I(0)) as a proxy for P( X̄_n ≤ 0 ), the probability of false selection.
A two phase implementation
Thus, log(1/δ)/I(0) samples ensure that P(FS) ≈ δ. Hence, one reasonable estimation procedure is:
First phase: generate m = log(1/δ) samples to estimate I(0) by Î_m(0).
Second phase: generate n = log(1/δ)/Î_m(0) = m/Î_m(0) samples of X.
Decide the sign of EX based on whether X̄_n > 0 or X̄_n ≤ 0.
We now discuss estimation of I(0).
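A minimal sketch of this two-phase procedure, for illustration only: the grid minimisation of the empirical log-moment generating function over θ ∈ [−3, 3], the sampler interface, and the numerical guard on small Î_m(0) are my assumptions, not from the talk:

```python
import math
import random

def empirical_rate_at_zero(samples, grid=None):
    """I_hat_m(0) = -inf_theta log((1/m) sum_i exp(theta * X_i)),
    approximated by a grid search over theta (convex in theta)."""
    if grid is None:
        grid = [i / 100.0 for i in range(-300, 301)]  # theta in [-3, 3]
    m = len(samples)
    lam = lambda t: math.log(sum(math.exp(t * x) for x in samples) / m)
    return -min(lam(t) for t in grid)

def two_phase_sign(sampler, delta):
    """Decide whether the mean of the sampled rv is positive or negative."""
    m = max(2, int(math.ceil(math.log(1.0 / delta))))
    first = [sampler() for _ in range(m)]           # first phase
    i_hat = empirical_rate_at_zero(first)
    n = max(m, int(math.ceil(m / max(i_hat, 1e-6))))  # second-phase budget
    xbar = sum(sampler() for _ in range(n)) / n
    return "EX > 0" if xbar > 0 else "EX < 0"

random.seed(0)
print(two_phase_sign(lambda: random.gauss(-2.0, 0.5), delta=1e-3))  # -> EX < 0
```

Note Λ̂_m(0) = 0, so Î_m(0) ≥ 0 always; the failure mode discussed later is precisely that Î_m(0) can overshoot I(0), making the second-phase budget n too small.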
Estimating rate function
Graphic view of I(0)
The log-moment generating function of X, Λ(θ) = log E exp(θX), is convex with Λ(0) = 0 and Λ'(0) = EX. Then, I(0) = −inf_θ Λ(θ).
[Figure: Λ(θ), with slope EX at θ = 0 and minimum value −I(0)]
Estimating the rate function I(0)
A natural estimator for I(0) based on samples (X_i : 1 ≤ i ≤ m) is
Î_m(0) = −inf_{θ ∈ R} Λ̂_m(θ), where Λ̂_m(θ) = log( (1/m) Σ_{i=1}^m exp(θ X_i) ).
Graphic view of the estimated log-moment generating function
[Figure: Λ̂_m(θ) as a function of θ, with minimum value −Î_m(0)]
Large deviations rate function of Î_m(0)
Can show that for a > I(0), lim_m −(1/m) log P( Î_m(0) ≥ a ) equals
lim_m −(1/m) log P( inf_θ (1/m) Σ_{i=1}^m e^{θ X_i} ≤ e^{−a} ) = inf_{θ ∈ R} I_θ(e^{−a}),
where
I_θ(ν) = sup_α ( αν − log E exp(α e^{θX}) ).
Further, for a < I(0),
lim_m −(1/m) log P( Î_m(0) ≤ a ) = sup_{θ ∈ R} I_θ(e^{−a}).
Also, the natural estimator for I(x) is
Î_m(x) = sup_{θ ∈ R} ( θx − Λ̂_m(θ) ).
It can be seen that the rate function for Î_m(x) equals inf_{θ ∈ R} I_θ(e^{−a+θx}) for a > I(x), and equals sup_{θ ∈ R} I_θ(e^{−a+θx}) for a < I(x).
Negative Result 1: Failure of the Naive
Returning to the two phase procedure
We generate samples X_1, ..., X_m for m = log(1/δ) and set Î_m(0) = −inf_θ Λ̂_m(θ).
Then generate log(1/δ)/Î_m(0) = m/Î_m(0) samples of X in the second phase, so that
P(FS) ≈ E exp( −(m/Î_m(0)) I(0) ).
Errors are due to large values of Î_m(0) that lead to undersampling in the second phase, arising from conspiratorial large deviations behaviour of all the terms.
Lower bound for P(FS)
Can show
lim_m (1/m) log P(FS) = sup_{a > 0} sup_θ ( −I(0)/a − I_θ(e^{−a}) ).
In particular, lim inf_{δ → 0} P(FS) δ^{−1} > 1.
Negative Result 2: Fall of the Sophisticated
Another common estimation method
Generate m = c log(1/δ) samples in the first phase to estimate I(0) by Î_m(0).
If exp(−m Î_m(0)) ≤ δ, stop.
Else, provide another c log(1/δ) of computational budget, and so on.
Can identify distributions for which this would not be accurate.
Need to find X with EX < 0 so that
P( X̄_m ≥ 0 and exp(−m Î_m(0)) ≤ δ ) > δ.
This follows if, for some θ < 0, I_θ(e^{−1/c}) < 1/c.
Can be shown for some light-tailed distributions even for large c.
Graphic view: I(0) < 1/c and Î_m(0) > 1/c
[Figure: Λ(θ) with the level −1/c marked]
Negative Result 3: Perdition
Let L denote a collection of probability distributions with finite mean and unbounded positive support, open under the Kullback-Leibler distance.
Let P(ɛ, δ) denote a policy that can adaptively sample from any two distributions in L with
lim sup_{δ → 0} P(FS) δ^{−1} ≤ 1.
Result: For any two such distributions in L, with means arbitrarily far apart, a P(ɛ, δ) policy on average requires more than O(log(1/δ)) samples.
Details
Under probability P_a, {X_i} has distribution F with mean µ_F, and {Y_i} has distribution G with mean µ_G < µ_F − ɛ.
Under probability P_b, {X_i} has distribution F, and {Y_i} has distribution G̃ with mean µ_G̃ > µ_F + ɛ.
Under P(ɛ, δ),
lim inf_{δ → 0} E_a[T_G] / log(1/δ) ≥ 1 / ( 3 KL(G, G̃) ),
where KL(G, G̃) = ∫ log( dG/dG̃ (x) ) dG(x).
Proof similar to Lai and Robbins 1985, Mannor and Tsitsiklis 2004
Asymptotically,
P_a( algorithm selects F ) ≥ 1 − δ,
P_b( algorithm selects F ) ≤ δ.
By a change of measure,
P_b( algo. selects F ) = E_a[ Π_{i=1}^{T_G} (dG̃/dG)(Y_i) · I( algo. selects F ) ]
≈ E_a[ e^{−T_G KL(G, G̃)} I( algo. selects F ) ]
≥ e^{−2 E_a(T_G) KL(G, G̃)} P_a( algo. selects F ),
and the result is easily deduced.
Result: Given G with finite mean and unbounded positive support, for any α > 0 and k > µ_G, there exists a distribution G̃_k such that KL(G, G̃_k) ≤ α and µ_{G̃_k} ≥ k.
Way forward
Additional information is needed to attain log(1/δ) convergence rates.
Often, upper bounds on moments may be available in simulation models.
It is easy to develop such bounds once suitable Lyapunov functions can be identified (not discussed here).
Use such bounds to develop (ɛ, δ) strategies by truncating random variables while controlling the truncation error to be less than ɛ; then use Hoeffding's inequality.
Recent multi-armed bandit methods do this in a sequential and adaptive manner.
δ guarantees using log(1/δ) samples: Static approach
P(ɛ, δ) policy for bounded random variables
Consider X_ɛ = {X : |EX| > ɛ, a ≤ X ≤ b}. A reasonable algorithm on X_ɛ is:
Generate iid samples X_1, X_2, ..., X_n of X.
If X̄_n ≥ 0, declare EX > 0.
If X̄_n < 0, declare EX < 0.
Hoeffding's inequality can be used to bound the probability of false selection. Suppose EX < −ɛ. Then
P( X̄_n ≥ 0 ) ≤ P( X̄_n − EX ≥ ɛ ) ≤ exp( −2nɛ² / (b − a)² ).
Thus, n = (b − a)²/(2ɛ²) log(1/δ) provides the desired P(ɛ, δ) policy.
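A minimal sketch of this static policy; the sampler interface and the uniform example distribution are assumptions for illustration:

```python
import math
import random

def hoeffding_sign_test(sampler, a, b, eps, delta):
    """Decide the sign of EX for a rv bounded in [a, b] with |EX| > eps,
    using the fixed sample size n = (b-a)^2/(2 eps^2) * log(1/delta)."""
    n = int(math.ceil((b - a) ** 2 / (2 * eps ** 2) * math.log(1.0 / delta)))
    xbar = sum(sampler() for _ in range(n)) / n
    return "EX > 0" if xbar >= 0 else "EX < 0"

random.seed(0)
# Example: X uniform on [-1, 0.5], so EX = -0.25 and |EX| > eps = 0.2.
sampler = lambda: random.uniform(-1.0, 0.5)
print(hoeffding_sign_test(sampler, a=-1.0, b=0.5, eps=0.2, delta=1e-3))  # -> EX < 0
```

Here n ≈ 195 samples suffice for δ = 10⁻³; the budget depends only on the known bounds (a, b, ɛ, δ), not on the samples, which is what makes the approach static.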
Truncation when explicit bounds on a function of the rv are known
Suppose f is a strictly increasing convex function and we know that E f(X) ≤ a. Further, X ≥ b. Then
max_X E[ X I(X ≥ u) ] ≤ u ( a − f(b) ) / ( f(u) − f(b) ).
This follows from the optimisation problem
max_X E[ X I(X ≥ u) ] such that E f(X) ≤ a.
This has a two-point solution, relying on the observation that Y = E[X | X < u] I(X < u) + E[X | X ≥ u] I(X ≥ u) is better than X.
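For instance, with f(x) = x² and b = 0 the bound reads E[X I(X ≥ u)] ≤ a/u, which also follows directly from X ≤ X²/u on {X ≥ u}. A small Monte Carlo check of this special case (the Exp(1) example is my own choice, not from the talk):

```python
import random

# Truncation-error bound E[X 1{X >= u}] <= a/u when X >= 0 and E[X^2] <= a.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(200000)]  # X ~ Exp(1), X >= 0

a = sum(x * x for x in samples) / len(samples)  # empirical proxy for E[X^2] (~2)
u = 4.0
tail_mean = sum(x for x in samples if x >= u) / len(samples)  # E[X 1{X >= u}]
print(tail_mean, a / u)  # the truncated mean is well below the bound a/u
```

In the truncation-based (ɛ, δ) strategy, one would pick u large enough that this bound is below ɛ/2, then apply Hoeffding's inequality to the truncated variable X I(X < u).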
Pure Exploration Multi-Armed Bandit Approach
Pure exploration bandit algorithms
Total n arms. Each arm a, when sampled, gives a Bernoulli reward with mean p_a.
Let a* = arg max_{a ∈ A} p_a and let Δ_a = p_{a*} − p_a.
Even-Dar et al. devise a sequential sampling strategy to find a* with probability at least 1 − δ.
Expected computational effort O( Σ_{a ≠ a*} ln(n/δ) / Δ_a² ).
Popular successive rejection algorithm
Sample every arm a once and let µ̂_a^t be the average reward of arm a by time t.
Each arm a such that max_a' µ̂_a'^t − µ̂_a^t ≥ 2α_t is removed from consideration, where α_t = √( log(5nt²/δ) / t ).
Set t = t + 1; repeat till one arm is left.
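A minimal sketch of this successive elimination scheme for Bernoulli arms; the sampler interface is an assumption, and α_t follows the slide's form with the square root restored as in Even-Dar et al.:

```python
import math
import random

def successive_elimination(arms, delta):
    """arms: list of zero-argument samplers returning rewards in [0, 1].
    Returns the index of the single surviving arm."""
    n = len(arms)
    active = list(range(n))
    sums = [0.0] * n
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:                  # sample every active arm once
            sums[a] += arms[a]()
        alpha = math.sqrt(math.log(5 * n * t * t / delta) / t)
        means = {a: sums[a] / t for a in active}
        best = max(means.values())
        active = [a for a in active if best - means[a] < 2 * alpha]
    return active[0]

random.seed(0)
p = [0.3, 0.5, 0.9]                       # arm 2 has the highest mean
arms = [lambda q=q: 1.0 if random.random() < q else 0.0 for q in p]
print(successive_elimination(arms, delta=0.05))  # index of the surviving arm
```

Arms far from the current leader are discarded early, so the total effort concentrates on the hard-to-separate arms, matching the Σ_{a ≠ a*} ln(n/δ)/Δ_a² complexity above.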
Key idea
[Figure: estimated means µ̂_1 and µ̂_2 with confidence widths shrinking like a cone as t grows]
Generalizing to heavy tails
Bubeck, Cesa-Bianchi, Lugosi (2013) develop log(1/δ) algorithms in regret settings when bounds on 1 + ɛ moments of each arm's output are available.
The analysis again relies on forming a cone, which they do through truncation and clever usage of the Bernstein inequality.
We adapt these algorithms to pure exploration settings.
In conclusion
We discussed that the ordinal optimisation method relies on the exponential convergence rate of the probability of false selection.
However, we show through a series of negative results that this convergence rate, or equivalently O(log(1/δ)) computation algorithms, is not possible for unbounded support distributions without further restrictions.
Under explicit restrictions on moments of the underlying random variables, we devise O(log(1/δ)) algorithms.
These are closely related to the evolving multi-armed bandit literature on pure exploration methods.
More informationMulti armed bandit problem: some insights
Multi armed bandit problem: some insights July 4, 20 Introduction Multi Armed Bandit problems have been widely studied in the context of sequential analysis. The application areas include clinical trials,
More informationAnytime optimal algorithms in stochastic multi-armed bandits
Rémy Degenne LPMA, Université Paris Diderot Vianney Perchet CREST, ENSAE REMYDEGENNE@MATHUNIV-PARIS-DIDEROTFR VIANNEYPERCHET@NORMALESUPORG Abstract We introduce an anytime algorithm for stochastic multi-armed
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision
More informationLecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem
Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:
More informationSubsampling, Concentration and Multi-armed bandits
Subsampling, Concentration and Multi-armed bandits Odalric-Ambrym Maillard, R. Bardenet, S. Mannor, A. Baransi, N. Galichet, J. Pineau, A. Durand Toulouse, November 09, 2015 O-A. Maillard Subsampling and
More informationConcentration inequalities and tail bounds
Concentration inequalities and tail bounds John Duchi Outline I Basics and motivation 1 Law of large numbers 2 Markov inequality 3 Cherno bounds II Sub-Gaussian random variables 1 Definitions 2 Examples
More informationStochastic Contextual Bandits with Known. Reward Functions
Stochastic Contextual Bandits with nown 1 Reward Functions Pranav Sakulkar and Bhaskar rishnamachari Ming Hsieh Department of Electrical Engineering Viterbi School of Engineering University of Southern
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems Sébastien Bubeck Theory Group Part 1: i.i.d., adversarial, and Bayesian bandit models i.i.d. multi-armed bandit, Robbins [1952]
More informationConsider the context of selecting an optimal system from among a finite set of competing systems, based
INFORMS Journal on Computing Vol. 25, No. 3, Summer 23, pp. 527 542 ISSN 9-9856 print) ISSN 526-5528 online) http://dx.doi.org/.287/ijoc.2.59 23 INFORMS Optimal Sampling Laws for Stochastically Constrained
More informationInternational Conference on Applied Probability and Computational Methods in Applied Sciences
International Conference on Applied Probability and Computational Methods in Applied Sciences Organizing Committee Zhiliang Ying, Columbia University and Shanghai Center for Mathematical Sciences Jingchen
More informationReward Maximization Under Uncertainty: Leveraging Side-Observations on Networks
Reward Maximization Under Uncertainty: Leveraging Side-Observations Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks Swapna Buccapatnam AT&T Labs Research, Middletown, NJ
More informationComputational Oracle Inequalities for Large Scale Model Selection Problems
for Large Scale Model Selection Problems University of California at Berkeley Queensland University of Technology ETH Zürich, September 2011 Joint work with Alekh Agarwal, John Duchi and Clément Levrard.
More informationAn Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement
An Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement Satyanath Bhat Joint work with: Shweta Jain, Sujit Gujar, Y. Narahari Department of Computer Science and Automation, Indian
More informationMultiple Identifications in Multi-Armed Bandits
Multiple Identifications in Multi-Armed Bandits arxiv:05.38v [cs.lg] 4 May 0 Sébastien Bubeck Department of Operations Research and Financial Engineering, Princeton University sbubeck@princeton.edu Tengyao
More informationLecture 2: Review of Basic Probability Theory
ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent
More informationBandits : optimality in exponential families
Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities
More informationStat 260/CS Learning in Sequential Decision Problems.
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Exponential families. Cumulant generating function. KL-divergence. KL-UCB for an exponential
More informationConcentration of Measures by Bounded Size Bias Couplings
Concentration of Measures by Bounded Size Bias Couplings Subhankar Ghosh, Larry Goldstein University of Southern California [arxiv:0906.3886] January 10 th, 2013 Concentration of Measure Distributional
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More informationOn the Complexity of A/B Testing
JMLR: Workshop and Conference Proceedings vol 35:1 3, 014 On the Complexity of A/B Testing Emilie Kaufmann LTCI, Télécom ParisTech & CNRS KAUFMANN@TELECOM-PARISTECH.FR Olivier Cappé CAPPE@TELECOM-PARISTECH.FR
More informationAn Empirical Algorithm for Relative Value Iteration for Average-cost MDPs
2015 IEEE 54th Annual Conference on Decision and Control CDC December 15-18, 2015. Osaka, Japan An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs Abhishek Gupta Rahul Jain Peter
More informationNearly Optimal Sampling Algorithms for Combinatorial Pure Exploration
JMLR: Workshop and Conference Proceedings vol 65: 55, 207 30th Annual Conference on Learning Theory Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration Editor: Under Review for COLT 207
More informationStatistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it
Statistics 300B Winter 08 Final Exam Due 4 Hours after receiving it Directions: This test is open book and open internet, but must be done without consulting other students. Any consultation of other students
More informationLecture 3: Lower Bounds for Bandit Algorithms
CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and
More informationAdaptive Concentration Inequalities for Sequential Decision Problems
Adaptive Concentration Inequalities for Sequential Decision Problems Shengjia Zhao Tsinghua University zhaosj12@stanford.edu Ashish Sabharwal Allen Institute for AI AshishS@allenai.org Enze Zhou Tsinghua
More informationIntroduction to Self-normalized Limit Theory
Introduction to Self-normalized Limit Theory Qi-Man Shao The Chinese University of Hong Kong E-mail: qmshao@cuhk.edu.hk Outline What is the self-normalization? Why? Classical limit theorems Self-normalized
More informationMulti-armed Bandits in the Presence of Side Observations in Social Networks
52nd IEEE Conference on Decision and Control December 0-3, 203. Florence, Italy Multi-armed Bandits in the Presence of Side Observations in Social Networks Swapna Buccapatnam, Atilla Eryilmaz, and Ness
More informationInformational Confidence Bounds for Self-Normalized Averages and Applications
Informational Confidence Bounds for Self-Normalized Averages and Applications Aurélien Garivier Institut de Mathématiques de Toulouse - Université Paul Sabatier Thursday, September 12th 2013 Context Tree
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationEASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University
EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show
More informationStochastic bandits: Explore-First and UCB
CSE599s, Spring 2014, Online Learning Lecture 15-2/19/2014 Stochastic bandits: Explore-First and UCB Lecturer: Brendan McMahan or Ofer Dekel Scribe: Javad Hosseini In this lecture, we like to answer this
More informationAdaptive Learning with Unknown Information Flows
Adaptive Learning with Unknown Information Flows Yonatan Gur Stanford University Ahmadreza Momeni Stanford University June 8, 018 Abstract An agent facing sequential decisions that are characterized by
More informationCHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD
CHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD Vijay Singamasetty, Navneeth Nair, Srikrishna Bhashyam and Arun Pachai Kannu Department of Electrical Engineering Indian
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationCsaba Szepesvári 1. University of Alberta. Machine Learning Summer School, Ile de Re, France, 2008
LEARNING THEORY OF OPTIMAL DECISION MAKING PART I: ON-LINE LEARNING IN STOCHASTIC ENVIRONMENTS Csaba Szepesvári 1 1 Department of Computing Science University of Alberta Machine Learning Summer School,
More informationConcentration of Measures by Bounded Couplings
Concentration of Measures by Bounded Couplings Subhankar Ghosh, Larry Goldstein and Ümit Işlak University of Southern California [arxiv:0906.3886] [arxiv:1304.5001] May 2013 Concentration of Measure Distributional
More informationIntroduction to Rare Event Simulation
Introduction to Rare Event Simulation Brown University: Summer School on Rare Event Simulation Jose Blanchet Columbia University. Department of Statistics, Department of IEOR. Blanchet (Columbia) 1 / 31
More informationExercises and Answers to Chapter 1
Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean
More informationRegional Multi-Armed Bandits
School of Information Science and Technology University of Science and Technology of China {wzy43, zrd7}@mail.ustc.edu.cn, congshen@ustc.edu.cn Abstract We consider a variant of the classic multiarmed
More informationKULLBACK-LEIBLER UPPER CONFIDENCE BOUNDS FOR OPTIMAL SEQUENTIAL ALLOCATION
Submitted to the Annals of Statistics arxiv: math.pr/0000000 KULLBACK-LEIBLER UPPER CONFIDENCE BOUNDS FOR OPTIMAL SEQUENTIAL ALLOCATION By Olivier Cappé 1, Aurélien Garivier 2, Odalric-Ambrym Maillard
More informationPure Exploration Stochastic Multi-armed Bandits
C&A Workshop 2016, Hangzhou Pure Exploration Stochastic Multi-armed Bandits Jian Li Institute for Interdisciplinary Information Sciences Tsinghua University Outline Introduction 2 Arms Best Arm Identification
More informationThe multi armed-bandit problem
The multi armed-bandit problem (with covariates if we have time) Vianney Perchet & Philippe Rigollet LPMA Université Paris Diderot ORFE Princeton University Algorithms and Dynamics for Games and Optimization
More informationBandit Algorithms. Tor Lattimore & Csaba Szepesvári
Bandit Algorithms Tor Lattimore & Csaba Szepesvári Bandits Time 1 2 3 4 5 6 7 8 9 10 11 12 Left arm $1 $0 $1 $1 $0 Right arm $1 $0 Five rounds to go. Which arm would you play next? Overview What are bandits,
More informationThe Moment Method; Convex Duality; and Large/Medium/Small Deviations
Stat 928: Statistical Learning Theory Lecture: 5 The Moment Method; Convex Duality; and Large/Medium/Small Deviations Instructor: Sham Kakade The Exponential Inequality and Convex Duality The exponential
More informationRacing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors
Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conugate Priors Yichi Zhou 1 Jun Zhu 1 Jingwe Zhuo 1 Abstract Thompson sampling has impressive empirical performance for many multi-armed
More informationThe knowledge gradient method for multi-armed bandit problems
The knowledge gradient method for multi-armed bandit problems Moving beyond inde policies Ilya O. Ryzhov Warren Powell Peter Frazier Department of Operations Research and Financial Engineering Princeton
More informationChapter 7. Basic Probability Theory
Chapter 7. Basic Probability Theory I-Liang Chern October 20, 2016 1 / 49 What s kind of matrices satisfying RIP Random matrices with iid Gaussian entries iid Bernoulli entries (+/ 1) iid subgaussian entries
More informationarxiv: v1 [cs.lg] 14 Nov 2018 Abstract
Sample complexity of partition identification using multi-armed bandits Sandeep Juneja Subhashini Krishnasamy TIFR, Mumbai November 15, 2018 arxiv:1811.05654v1 [cs.lg] 14 Nov 2018 Abstract Given a vector
More informationMulti-armed Bandits and the Gittins Index
Multi-armed Bandits and the Gittins Index Richard Weber Statistical Laboratory, University of Cambridge A talk to accompany Lecture 6 Two-armed Bandit 3, 10, 4, 9, 12, 1,... 5, 6, 2, 15, 2, 7,... Two-armed
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com Navin Goyal Microsoft Research India navingo@microsoft.com Abstract The multi-armed
More informationThe No-Regret Framework for Online Learning
The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,
More informationTwo generic principles in modern bandits: the optimistic principle and Thompson sampling
Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle
More informationTheory and Applications of Stochastic Systems Lecture Exponential Martingale for Random Walk
Instructor: Victor F. Araman December 4, 2003 Theory and Applications of Stochastic Systems Lecture 0 B60.432.0 Exponential Martingale for Random Walk Let (S n : n 0) be a random walk with i.i.d. increments
More informationMathematics Ph.D. Qualifying Examination Stat Probability, January 2018
Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from
More informationUpper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits Alexandra Carpentier 1, Alessandro Lazaric 1, Mohammad Ghavamzadeh 1, Rémi Munos 1, and Peter Auer 2 1 INRIA Lille - Nord Europe,
More informationMS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10. x n+1 = f(x n ),
MS&E 321 Spring 12-13 Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 Section 4: Steady-State Theory Contents 4.1 The Concept of Stochastic Equilibrium.......................... 1 4.2
More informationBandit Algorithms for Pure Exploration: Best Arm Identification and Game Tree Search. Wouter M. Koolen
Bandit Algorithms for Pure Exploration: Best Arm Identification and Game Tree Search Wouter M. Koolen Machine Learning and Statistics for Structures Friday 23 rd February, 2018 Outline 1 Intro 2 Model
More informationLearning, Games, and Networks
Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to
More informationSequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process
Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University
More informationSelected Exercises on Expectations and Some Probability Inequalities
Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ
More informationMinimax strategy for Stratified Sampling for Monte Carlo
Journal of Machine Learning Research 1 2000 1-48 Submitted 4/00; Published 10/00 Minimax strategy for Stratified Sampling for Monte Carlo Alexandra Carpentier INRIA Lille - Nord Europe 40, avenue Halley
More informationProbability and Measure
Chapter 4 Probability and Measure 4.1 Introduction In this chapter we will examine probability theory from the measure theoretic perspective. The realisation that measure theory is the foundation of probability
More informationMarch 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.
Florida State University March 1, 2018 Framework 1. (Lizhe) Basic inequalities Chernoff bounding Review for STA 6448 2. (Lizhe) Discrete-time martingales inequalities via martingale approach 3. (Boning)
More informationSub-Gaussian estimators under heavy tails
Sub-Gaussian estimators under heavy tails Roberto Imbuzeiro Oliveira XIX Escola Brasileira de Probabilidade Maresias, August 6th 2015 Joint with Luc Devroye (McGill) Matthieu Lerasle (CNRS/Nice) Gábor
More informationLecture 4: September Reminder: convergence of sequences
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 4: September 6 In this lecture we discuss the convergence of random variables. At a high-level, our first few lectures focused
More informationarxiv: v3 [cs.lg] 7 Nov 2017
ESAIM: PROCEEDINGS AND SURVEYS, Vol.?, 2017, 1-10 Editors: Will be set by the publisher arxiv:1702.00001v3 [cs.lg] 7 Nov 2017 LEARNING THE DISTRIBUTION WITH LARGEST MEAN: TWO BANDIT FRAMEWORKS Emilie Kaufmann
More informationAn Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes
An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes Prokopis C. Prokopiou, Peter E. Caines, and Aditya Mahajan McGill University
More informationl 1 -Regularized Linear Regression: Persistence and Oracle Inequalities
l -Regularized Linear Regression: Persistence and Oracle Inequalities Peter Bartlett EECS and Statistics UC Berkeley slides at http://www.stat.berkeley.edu/ bartlett Joint work with Shahar Mendelson and
More informationLearning Algorithms for Minimizing Queue Length Regret
Learning Algorithms for Minimizing Queue Length Regret Thomas Stahlbuhk Massachusetts Institute of Technology Cambridge, MA Brooke Shrader MIT Lincoln Laboratory Lexington, MA Eytan Modiano Massachusetts
More informationMore on Estimation. Maximum Likelihood Estimation.
More on Estimation. In the previous chapter we looked at the properties of estimators and the criteria we could use to choose between types of estimators. Here we examine more closely some very popular
More informationConsistency of the maximum likelihood estimator for general hidden Markov models
Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 2: Introduction to statistical learning theory. 1 / 22 Goals of statistical learning theory SLT aims at studying the performance of
More informationONLINE ADVERTISEMENTS AND MULTI-ARMED BANDITS CHONG JIANG DISSERTATION
c 2015 Chong Jiang ONLINE ADVERTISEMENTS AND MULTI-ARMED BANDITS BY CHONG JIANG DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and
More information