An Asymptotically Optimal Algorithm for the Max k-armed Bandit Problem


Matthew J. Streeter and Stephen F. Smith
February 27, 2006
CMU-CS
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA

Abstract

We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O(ln(1/δ) · ln(n)²/ɛ²) trials are sufficient to identify, with probability at least 1 − δ, a machine whose expected maximum payoff is within ɛ of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.

Keywords: multi-armed bandit problem, PAC bounds

1 Introduction

In the k-armed bandit problem one is faced with a set of k slot machines, each having an arm that, when pulled, yields a payoff drawn from a fixed (but unknown) distribution. The goal is to allocate trials to the arms so as to maximize the expected cumulative payoff obtained over a series of n trials. Solving the problem entails striking a balance between exploration (determining which arm yields the highest mean payoff) and exploitation (repeatedly pulling this arm).

In the max k-armed bandit problem, the goal is to maximize the expected maximum (rather than cumulative) payoff. This version of the problem arises in practice when tackling combinatorial optimization problems for which a number of randomized search heuristics exist: given k heuristics, each yielding a stochastic outcome when applied to some particular problem instance, we wish to allocate trials to the heuristics so as to maximize the maximum payoff (e.g., the maximum number of clauses satisfied by any sampled variable assignment, or the minimum makespan of any sampled schedule). Cicirello and Smith [3] show that a max k-armed bandit approach yields good performance on the resource-constrained project scheduling problem with maximum time lags (RCPSP/max).

1.1 Summary of Results

We consider a restricted version of the max k-armed bandit problem in which each arm yields payoff drawn from a generalized extreme value (GEV) distribution (defined in §2). This paper presents the first provably asymptotically optimal algorithm for this problem. Roughly speaking, the justification for assuming a GEV distribution is the Extremal Types Theorem (stated in §2), which states that the distribution of the sample maximum of n independent, identically distributed random variables approaches a GEV distribution as n → ∞. A more formal justification is given in §3. For reasons that will become clear, the nature of our results depends on the shape parameter (ξ) of the GEV distribution.
Assuming all arms have ξ ≤ 0, our results can be summarized as follows.

Let a be an arm that yields payoff drawn from a GEV distribution with unknown parameters; let M_n denote the maximum payoff obtained after pulling a n times, and let m_n = E[M_n]. We provide an algorithm that, after pulling the arm O(ln(1/δ) · ln(n)²/ɛ²) times, produces an estimate m̃_n of m_n with the property that P[|m̃_n − m_n| < ɛ] ≥ 1 − δ.

Let a_1, a_2, ..., a_k be k arms, each yielding payoff from (distinct) GEV distributions with unknown parameters. Let m_n^i denote the expected maximum payoff obtained by pulling the i-th arm n times, and let m_n = max_{1≤i≤k} m_n^i. We provide an algorithm that, when run for n pulls, obtains expected maximum payoff m_n − o(1).

Our results for the case ξ > 0 are similar, except that our estimates and expected maximum payoffs come within arbitrarily small factors (rather than absolute distances) of optimality. Specifically, our estimates have the property that P[1/(1 + ɛ) < (m̃_n − α_1)/(m_n − α_1) < 1 + ɛ] ≥ 1 − δ for a constant α_1 independent of n, while the expected maximum payoff obtained by using our algorithm for n pulls is m_n(1 − o(1)).
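As a concrete (if simplistic) illustration of the setting, the sketch below simulates a max k-armed bandit instance with GEV arms and runs a naive explore-then-exploit heuristic. This is not the algorithm analyzed in this paper (which estimates m_n by interpolation rather than by raw empirical maxima), and all function names are ours:

```python
import math
import random

def gev_sample(rng, mu, sigma, xi):
    """Inverse-transform draw from GEV(mu, sigma, xi); xi == 0 is the Gumbel case."""
    u = rng.random()  # u in [0, 1); u == 0 has negligible probability
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(u))
    return mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi

def explore_then_exploit(rng, arms, n, t):
    """Pull each of the k arms t times, then commit the remaining
    n - t*k pulls to the arm with the largest empirical maximum.
    `arms` is a list of (mu, sigma, xi) triples; returns the maximum
    payoff observed over all n pulls."""
    assert t * len(arms) <= n
    scores = []          # empirical maximum of each arm's t pulls
    best_payoff = -math.inf
    for (mu, sigma, xi) in arms:
        m = max(gev_sample(rng, mu, sigma, xi) for _ in range(t))
        scores.append(m)
        best_payoff = max(best_payoff, m)
    # Exploitation: spend the rest of the budget on the apparent best arm.
    mu, sigma, xi = arms[scores.index(max(scores))]
    for _ in range(n - t * len(arms)):
        best_payoff = max(best_payoff, gev_sample(rng, mu, sigma, xi))
    return best_payoff
```

Because most of the budget is spent on the apparently best arm, the returned value tracks that arm's expected maximum over the remaining pulls, which is the quantity the guarantees above are about.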

1.2 Related Work

The classical k-armed bandit problem was first studied by Robbins [7] and has since been the subject of numerous papers; see Berry and Fristedt [1] and Kaelbling [6] for overviews. In a paper similar in spirit to ours, Fong [5] showed that an initial exploration phase consisting of O((k/ɛ²) ln(k/δ)) pulls is sufficient to identify, with probability at least 1 − δ, an arm whose mean payoff is within ɛ of optimal. Theorem 2 of this paper proves a bound similar to Fong's on the number of pulls needed to identify an arm whose expected maximum payoff (over a series of n trials) is near-optimal.

The max variant of the k-armed bandit problem was first studied by Cicirello and Smith [2, 3], who successfully used a heuristic for the max k-armed bandit problem to select among priority rules for the RCPSP/max. The design of Cicirello and Smith's heuristic is motivated by an analysis of the special case in which each arm's payoff distribution is a GEV distribution with shape parameter ξ = 0, but they do not rigorously analyze the heuristic's behavior. Our paper is more theoretical and less empirical: on the one hand we do not perform experiments on any practical combinatorial problem, but on the other hand we provide stronger performance guarantees under weaker distributional assumptions.

1.3 Notation

For an arbitrary cumulative distribution function G, let the random variable M_n^G be defined by M_n^G = max{Z_1, Z_2, ..., Z_n}, where Z_1, Z_2, ..., Z_n are independent random variables, each having distribution G. Let m_n^G = E[M_n^G].

2 Extreme Value Theory

This section provides a self-contained overview of results in extreme value theory that are relevant to this work. Our presentation is based on the text by Coles [4]. The central result of extreme value theory is an analogue of the central limit theorem that applies to extremely rare events.
Recall that the central limit theorem states that (under certain regularity conditions) the distribution of the sum of n independent, identically distributed (i.i.d.) random variables converges to a normal distribution as n → ∞. The Extremal Types Theorem states that (under certain regularity conditions) the distribution of the maximum of n i.i.d. random variables converges to a generalized extreme value (GEV) distribution.

Definition (GEV distribution). A random variable Z has a generalized extreme value distribution if, for constants µ, σ > 0, and ξ, P[Z ≤ z] = GEV_(µ,σ,ξ)(z), where

GEV_(µ,σ,ξ)(z) = exp(−(1 + ξ(z − µ)/σ)^(−1/ξ))
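Since the GEV family is defined through a closed-form CDF, it is straightforward to evaluate and to sample by inverse transform. The following is a minimal sketch (function names are ours, not from the paper):

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """P[Z <= z] for Z ~ GEV(mu, sigma, xi); xi == 0 is the Gumbel limit."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    s = 1.0 + xi * (z - mu) / sigma
    if s <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-s ** (-1.0 / xi))

def gev_quantile(u, mu, sigma, xi):
    """Inverse CDF: the z with gev_cdf(z) == u, for 0 < u < 1.
    Feeding in a uniform random u yields a GEV sample."""
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(u))
    return mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi
```

The quantile function is obtained by solving exp(−(1 + ξ(z − µ)/σ)^(−1/ξ)) = u for z, so cdf and quantile are exact inverses of one another on (0, 1).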

for z ∈ {z : 1 + ξ(z − µ)/σ > 0}, and GEV_(µ,σ,ξ)(z) = 1 otherwise. The case ξ = 0 is interpreted as the limit

lim_{ξ′→0} GEV_(µ,σ,ξ′)(z) = exp(−exp((µ − z)/σ)).

The following three propositions establish properties of the GEV distribution.

Proposition 1. Let Z be a random variable with P[Z ≤ z] = GEV_(µ,σ,ξ)(z). Then

E[Z] = µ + (σ/ξ)(Γ(1 − ξ) − 1)   if ξ < 1, ξ ≠ 0;
E[Z] = µ + σγ                     if ξ = 0;
E[Z] = ∞                          if ξ ≥ 1,

where Γ(z) = ∫_0^∞ t^{z−1} exp(−t) dt is the complete gamma function and γ = lim_{n→∞} (Σ_{k=1}^n 1/k − ln(n)) is Euler's constant.

Proposition 2. Let G = GEV_(µ,σ,ξ). Then M_n^G has distribution GEV_(µ′,σ′,ξ′), where

µ′ = µ + (σ/ξ)(n^ξ − 1) if ξ ≠ 0; µ′ = µ + σ ln(n) otherwise,

σ′ = σn^ξ, and ξ′ = ξ.

Substituting the parameters of M_n^G given by Proposition 2 into Proposition 1 gives an expression for m_n^G.

Proposition 3. Let G = GEV_(µ,σ,ξ) where ξ < 1. Then

m_n^G = µ + (σ/ξ)(n^ξ Γ(1 − ξ) − 1) if ξ ≠ 0; m_n^G = µ + σγ + σ ln(n) otherwise.

It follows that for ξ > 0, m_n^G is Θ(n^ξ); for ξ = 0, m_n^G is Θ(ln(n)); and for ξ < 0, m_n^G = µ − σ/ξ − Θ(n^ξ).
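Propositions 2 and 3 are easy to check numerically: raising the GEV CDF to the n-th power reproduces the re-parameterized GEV of Proposition 2, and Proposition 3 gives m_n^G in closed form. A small sketch (function names are ours):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def gev_cdf(z, mu, sigma, xi):
    """CDF of GEV(mu, sigma, xi); xi == 0 is the Gumbel limit."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    s = 1.0 + xi * (z - mu) / sigma
    if s <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-s ** (-1.0 / xi))

def max_params(mu, sigma, xi, n):
    """Parameters of the distribution of the max of n draws (Proposition 2)."""
    if xi == 0.0:
        return mu + sigma * math.log(n), sigma, xi
    return mu + sigma * (n ** xi - 1.0) / xi, sigma * n ** xi, xi

def expected_max(mu, sigma, xi, n):
    """m_n^G for G = GEV(mu, sigma, xi), valid for xi < 1 (Proposition 3)."""
    if xi == 0.0:
        return mu + sigma * EULER_GAMMA + sigma * math.log(n)
    return mu + (sigma / xi) * (n ** xi * math.gamma(1.0 - xi) - 1.0)
```

The identity gev_cdf(z)**n == gev_cdf(z) under max_params is the "max-stability" of the GEV family, and it is what makes the family closed under taking sample maxima.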

Figure 1: The effect of the shape parameter (ξ) on the expected maximum of n independent draws from a GEV distribution.

It is useful to have a visual picture of what Proposition 3 means. Figure 1 plots m_n^G as a function of n for three GEV distributions with µ = 0, σ = 1, and ξ ∈ {−0.1, 0, 0.1}.

The central result of extreme value theory is the following theorem.

The Extremal Types Theorem. Let G be an arbitrary cumulative distribution function, and suppose there exist sequences of constants {a_n > 0} and {b_n} such that

lim_{n→∞} P[(M_n^G − b_n)/a_n ≤ z] = G*(z)   (2.1)

for any continuity point z of G*, where G* is not a point mass. Then there exist constants µ, σ > 0, and ξ such that G*(z) = GEV_(µ,σ,ξ)(z) for all z. Furthermore,

lim_{n→∞} P[M_n^G ≤ z] = GEV_(µa_n+b_n, σa_n, ξ)(z).

Condition (2.1) holds for a variety of distributions, including the normal, lognormal, uniform, and Cauchy distributions.

3 The Max k-armed Bandit Problem

Definition (max k-armed bandit instance). An instance I = (n, G) of the max k-armed bandit problem is an ordered pair whose first element is a positive integer n, and whose second element is a set G = {G_1, G_2, ..., G_k} of k cumulative distribution functions, each thought of as an arm on a slot machine. The i-th arm, when pulled, returns a realization of a random variable with distribution G_i.

Definition (max k-armed bandit strategy). A max k-armed bandit strategy S is an algorithm that, given an instance I = (n, G) of the max k-armed bandit problem, performs a sequence of n

arm-pulls. For any strategy S and integer ℓ ≤ n, we denote by S^ℓ(I) the expected maximum payoff obtained by running S on I for ℓ trials:

S^ℓ(I) = E[max_{0≤j≤ℓ} p_j]

where p_j is the payoff obtained from the j-th pull, and we define p_0 = 0. Our goal is to come up with a strategy S such that S^n(I) is near-maximal.

Note that the problem is ill-posed (i.e., there is no clear criterion for preferring one strategy over another) unless we make some assumptions about the distributions G_i. We will assume that each arm G_i = GEV_(µ_i,σ_i,ξ_i) is a GEV distribution whose parameters satisfy

1. µ_i ≤ µ_u,
2. 0 < σ_l ≤ σ_i ≤ σ_u, and
3. ξ_l ≤ ξ_i ≤ ξ_u < 1/2

for known constants µ_u, σ_l, σ_u, ξ_l, and ξ_u.

There are two arguments for assuming that each arm is a GEV distribution. First, in practice the distribution of payoffs returned by a strong heuristic may be approximately GEV, even if the conditions of the Extremal Types Theorem are not formally satisfied [2]. A second argument runs as follows. Suppose I = (n, G) is an instance of the max k-armed bandit problem in which each distribution G_i ∈ G satisfies condition (2.1) of the Extremal Types Theorem. Consider the instance Ī = (⌈n/m⌉, Ḡ), where Ḡ = {Ḡ_1, Ḡ_2, ..., Ḡ_k}, and arm Ḡ_i returns the maximum payoff obtained by pulling the corresponding arm G_i m times. Effectively, Ī is a restricted version of I in which the arms must be pulled in batches of size m, rather than in any arbitrary order. For m sufficiently large, the Extremal Types Theorem guarantees that for each i, Ḡ_i ≈ GEV_(µ_i,σ_i,ξ_i) for some constants µ_i, σ_i, and ξ_i. Thus, the instance I′ = (⌈n/m⌉, G′) with G′ = {G′_1, G′_2, ..., G′_k} and G′_i = GEV_(µ_i,σ_i,ξ_i) is approximately equivalent to Ī and satisfies our distributional assumptions.

The purpose of the restrictions on the parameters µ_i, σ_i, and ξ_i is to ensure that each GEV distribution has finite, bounded mean and variance.

4 An Asymptotically Optimal Algorithm

We will analyze the following max k-armed bandit strategy.

Strategy S_1(ɛ, δ):

1. (Exploration) For each arm G_i ∈ G:

   (a) Using t = O(ln(1/δ) · ln(n)²/ɛ²) samples of G_i, obtain an estimate m̃_n^{G_i} of m_n^{G_i}. Assuming that arm G_i has shape parameter ξ_i ≤ 0, our estimate will have the property that P[|m̃_n^{G_i} − m_n^{G_i}| < ɛ] ≥ 1 − δ.

2. (Exploitation) Set î = arg max_{1≤i≤k} m̃_n^{G_i}, and pull arm G_î for the remaining n − tk trials.

If an arm G_i has shape parameter ξ_i > 0, the estimate obtained in step 1(a) will instead have the property that P[1/(1 + ɛ) < (m̃_n − α_1)/(m_n − α_1) < 1 + ɛ] ≥ 1 − δ for a constant α_1 independent of n.

The following theorem shows that with appropriate settings of ɛ and δ, strategy S_1 is asymptotically optimal.

Theorem 1. Let I = (n, G) be an instance of the max k-armed bandit problem, where G = {G_1, G_2, ..., G_k} and G_i = GEV_(µ_i,σ_i,ξ_i). Let m_n = max_{1≤i≤k} m_n^{G_i}, ξ_max = max_{1≤i≤k} ξ_i, and S = S_1(∆, 1/(kn²)), where ∆ = (k · ln(nk) · ln(n)²/n)^{1/3}. Then if ξ_max ≤ 0,

lim_{n→∞} (m_n − S^n(I)) = 0,

while if ξ_max > 0,

lim_{n→∞} S^n(I)/m_n = 1.

Proof. Case ξ_max ≤ 0. Let m̂_n = m_n^{G_î} (where î is the arm selected for exploitation in step 2). Then m̂_{n−tk} is the expected maximum payoff obtained during the exploitation step, so S^n(I) ≥ m̂_{n−tk}.

Claim 1. m̂_n − m̂_{n−tk} is O(tk/n).

Proof of Claim 1. Let µ = µ_î, σ = σ_î, and ξ = ξ_î be the parameters of the arm selected for exploitation. Suppose ξ = 0. Then by Proposition 3,

m̂_n − m̂_{n−tk} = σ(ln(n) − ln(n − tk)).

Expanding ln(x + β) in powers of β about β = 0, for 0 ≤ β < x/2 and x > 0, yields

ln(x + β) − ln(x) = Σ_{i=1}^∞ (−1)^{i+1} (1/i)(β/x)^i ≤ (β/x)(1 − β/x)^{−1} < 2β/x,

so for n sufficiently large,

m̂_n − m̂_{n−tk} < 2σ · tk/(n − tk) = O(tk/n).

Now suppose ξ < 0. By Proposition 3,

m̂_n − m̂_{n−tk} = σξ^{−1}Γ(1 − ξ)(n^ξ − (n − tk)^ξ) = O((n − tk)^ξ − n^ξ),

where we have used the fact that σξ^{−1}Γ(1 − ξ) < 0. Expanding (n − tk)^ξ in powers of tk about tk = 0 gives

(n − tk)^ξ = n^ξ − ξn^{ξ−1}(tk) + O(n^{ξ−2}(tk)²),

and so

(n − tk)^ξ − n^ξ ≤ −ξn^{ξ−1}(tk) ≤ −ξ_l · tk/n = O(tk/n).

With probability at least 1 − kδ, all estimates obtained during the exploration phase are within ɛ of the correct values, so that m_n − m̂_n < 2ɛ. Assuming m_n − m̂_n < 2ɛ, it follows that

m_n − m̂_{n−tk} = (m_n − m̂_n) + (m̂_n − m̂_{n−tk}) < 2ɛ + O(tk/n) = 2ɛ + (k/n) · O(ln(1/δ) · ln(n)²/ɛ²) = O(∆),

where ∆ = (k · ln(nk) · ln(n)²/n)^{1/3}, and in the second step we have used Claim 1. Thus

S^n(I) ≥ (1 − kδ)(m_n − O(∆)) = m_n − O(∆) = m_n − o(1).

Case ξ_max > 0. See Appendix A.

Theorem 1 completes our analysis of the performance of S_1. It remains only to describe how the estimates in step 1(a) are obtained.

4.1 Estimating m_n

We adopt the following notation: let G = GEV_(µ,σ,ξ) denote a GEV distribution with (unknown) parameters µ, σ, and ξ satisfying the conditions stated in §3, and

let m_i = m_i^G. To estimate m_n, we first obtain an accurate estimate of ξ. Then

1. if ξ = 0 (so that the growth of m_n as a function of ln(n) is linear), we estimate m_n by first estimating m_1 and m_2, then performing linear interpolation;

2. otherwise, we estimate m_n by first estimating m_1, m_2, and m_4, then performing a nonlinear interpolation.

4.1.1 Estimating m_i for i ∈ {1, 2, 4}

The following two lemmas use well-known ideas to efficiently estimate m_i for small values of i.

Lemma 1. Let i be a positive integer and let ɛ > 0 be a real number. Then O(i/ɛ²) draws from G suffice to obtain an estimate m̃_i of m_i such that P[|m̃_i − m_i| < ɛ] ≥ 3/4.

Proof. First consider the special case i = 1. Let X denote the sum of t draws from G, for some to-be-specified positive integer t. Then E[X] = m_1 · t and Var[X] = σ̄² · t, where σ̄ is the (unknown) standard deviation of G. We take m̃_1 = X/t as our estimate of m_1. Then

P[|m̃_1 − m_1| ≥ ɛ] = P[|tm̃_1 − tm_1| ≥ tɛ] = P[|X − E[X]| ≥ (tɛ/(σ̄√t)) · √Var[X]] ≤ σ̄²/(tɛ²),

where the last inequality is Chebyshev's. Thus to guarantee P[|m̃_1 − m_1| ≥ ɛ] ≤ 1/4 we must set t = 4σ̄²/ɛ² = O(1/ɛ²) (note that due to the assumptions in §3, σ̄ is O(1)). For i > 1, we let X be the sum of t block maxima (each the maximum of i independent draws from G), which increases the number of samples required by a factor of i.

To boost the probability that |m̃_i − m_i| < ɛ from 3/4 to 1 − δ, we use the median of means method.

Lemma 2. Let i be a positive integer and let ɛ > 0 and δ ∈ (0, 1) be real numbers. Then O(ln(1/δ) · i/ɛ²) draws from G suffice to obtain an estimate m̃_i of m_i such that P[|m̃_i − m_i| < ɛ] ≥ 1 − δ.

Proof. We invoke Lemma 1 r times (for r to be determined), yielding a set E = {m̃_i^(1), m̃_i^(2), ..., m̃_i^(r)} of estimates of m_i. Let m̃_i be the median element of E. Let A = {m̃_i^(j) ∈ E : |m̃_i^(j) − m_i| < ɛ} be the set of accurate estimates of m_i, and let A = |A|. Then |m̃_i − m_i| ≥ ɛ implies A ≤ r/2, while E[A] ≥ (3/4)r. Using the standard Chernoff bound, we have

P[|m̃_i − m_i| ≥ ɛ] ≤ P[A ≤ r/2] ≤ exp(−r/C)

for a constant C > 0. Thus r = O(ln(1/δ)) repetitions suffice to ensure P[|m̃_i − m_i| ≥ ɛ] ≤ δ.
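Lemmas 1 and 2 combine two standard ingredients: estimate m_i by averaging block maxima, then take a median over independent repetitions to boost the confidence. A hedged sketch of the two estimators (function names are ours):

```python
import statistics

def estimate_m_i(draws, i):
    """Lemma 1's estimator: the average of block maxima, each block
    being the max of i consecutive draws.  len(draws) should be a
    multiple of i."""
    maxima = [max(draws[j:j + i]) for j in range(0, len(draws), i)]
    return sum(maxima) / len(maxima)

def median_of_means(draws, i, r):
    """Lemma 2's boost: run the Lemma 1 estimator on r disjoint chunks
    of the sample and return the median of the r results.  Taking
    r = O(ln(1/delta)) repetitions drives the failure probability of
    the 3/4-confidence base estimator down to delta."""
    chunk = len(draws) // r
    estimates = [estimate_m_i(draws[j * chunk:(j + 1) * chunk], i)
                 for j in range(r)]
    return statistics.median(estimates)
```

The median step is what makes the estimate robust: even if a minority of chunks produce wildly inaccurate averages, the median is unaffected.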

4.1.2 Estimating m_n when ξ = 0

Lemma 3. Assume G has shape parameter ξ = 0. Let n be a positive integer and let ɛ > 0 and δ ∈ (0, 1) be real numbers. Then O(ln(1/δ) · ln(n)²/ɛ²) draws from G suffice to obtain an estimate m̃_n of m_n such that P[|m̃_n − m_n| < ɛ] ≥ 1 − δ.

Proof. By Proposition 3, m_i = µ + σγ + σ ln(i). Thus

m_n = m_1 + (m_2 − m_1) log_2(n).   (4.1)

Let m̃_1 and m̃_2 be estimates of m_1 and m_2, respectively, and let m̃_n be the estimate of m_n obtained by plugging m̃_1 and m̃_2 into (4.1). Define ∆_i = |m̃_i − m_i| for i ∈ {1, 2, n}. Then

∆_n ≤ (1 + log_2(n))(∆_1 + ∆_2).

Thus to guarantee P[∆_n < ɛ] ≥ 1 − δ, it suffices that P[∆_i ≤ ɛ/(2(1 + log_2(n)))] ≥ 1 − δ/2 for all i ∈ {1, 2}. By Lemma 2, this requires O(ln(1/δ) · ln(n)²/ɛ²) draws from G.

4.1.3 Estimating m_n when ξ ≠ 0

For the purpose of the proofs presented here, we will make a minor assumption concerning an arm's shape parameter ξ_i: we assume that for some known constant ξ* > 0,

|ξ_i| < ξ* ⟹ ξ_i = 0.

Removing this assumption does not fundamentally change the results, but it makes the proofs more complicated.

Lemma 5 shows how to efficiently estimate ξ. Lemmas 6 and 7 show how to efficiently estimate m_n in the cases ξ < 0 and ξ > 0, respectively. We will use the following lemma.

Lemma 4. m_4 − m_2 ≥ σ/8 and m_2 − m_1 ≥ σ/4.

Proof. See Appendix A.

Lemma 5. For real numbers ɛ > 0 and δ ∈ (0, 1), O(ln(1/δ) · 1/ɛ²) draws from G suffice to obtain an estimate ξ̃ of ξ such that P[|ξ̃ − ξ| < ɛ] ≥ 1 − δ.

Proof. Using Proposition 3, it is straightforward to check that for any ξ < 1,

ξ = log_2((m_4 − m_2)/(m_2 − m_1)).   (4.2)
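Equations (4.1) and (4.2) can be exercised directly against the closed form of Proposition 3: with exact inputs the recovery is exact, while the algorithm feeds in the estimates of Lemma 2. A sketch (function names are ours):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def true_m_i(mu, sigma, xi, i):
    """Closed form for m_i from Proposition 3 (valid for xi < 1)."""
    if xi == 0.0:
        return mu + sigma * EULER_GAMMA + sigma * math.log(i)
    return mu + (sigma / xi) * (i ** xi * math.gamma(1.0 - xi) - 1.0)

def estimate_shape(m1, m2, m4):
    """Equation (4.2): recover xi from the expected maxima of 1, 2 and
    4 draws.  The ratio (m4 - m2)/(m2 - m1) equals 2**xi."""
    return math.log2((m4 - m2) / (m2 - m1))

def interpolate_m_n_gumbel(m1, m2, n):
    """Equation (4.1): when xi = 0, m_n is linear in log2(n), so two
    points determine the whole curve."""
    return m1 + (m2 - m1) * math.log2(n)
```

For ξ = 0 the ratio in estimate_shape is exactly 1 (both gaps equal σ ln 2), which is why the estimated shape can be thresholded against ξ* to decide between the linear and nonlinear interpolation branches.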

Let m̃_1, m̃_2, and m̃_4 be estimates of m_1, m_2, and m_4, respectively, and let ξ̃ be the estimate of ξ obtained by plugging m̃_1, m̃_2, and m̃_4 into (4.2). Define ∆_m = max_{i∈{1,2,4}} |m̃_i − m_i| and ∆_ξ = |ξ̃ − ξ|. We wish to upper bound ∆_ξ as a function of ∆_m. In Claim 1 of Theorem 1 we showed that ln(x + β) − ln(x) ≤ 2β/x for β ≤ x/2. Letting N = m_4 − m_2 and D = m_2 − m_1, and noting that ξ = log_2(N) − log_2(D) = (1/ln 2)(ln(N) − ln(D)), it follows that

∆_ξ ≤ (1/ln 2)(2(2∆_m)/N + 2(2∆_m)/D)

for ∆_m < (1/2) min(N, D). Thus by Lemma 4 and the assumption that σ ≥ σ_l, ∆_ξ is O(∆_m). Define ∆_i = |m̃_i − m_i|, so that ∆_m = max_{i∈{1,2,4}} ∆_i. Then to guarantee P[∆_ξ < ɛ] ≥ 1 − δ, it suffices that P[∆_i ≤ Ω(ɛ)] ≥ 1 − δ/3 for all i ∈ {1, 2, 4}. By Lemma 2, this requires O(ln(1/δ) · 1/ɛ²) draws from G.

Lemma 6. Assume G has shape parameter ξ ≤ −ξ*. Let n be a positive integer and let ɛ > 0 and δ ∈ (0, 1) be real numbers. Then O(ln(1/δ) · 1/ɛ²) draws from G suffice to obtain an estimate m̃_n of m_n such that P[|m̃_n − m_n| < ɛ] ≥ 1 − δ.

Proof. By Proposition 3, m_i = µ + (σ/ξ)(i^ξ Γ(1 − ξ) − 1). Define

α_1 = µ − σξ^{−1}
α_2 = σξ^{−1} Γ(1 − ξ)
α_3 = 2^ξ

so that

m_i = α_1 + α_2 · α_3^{log_2(i)}.   (4.3)

Plugging the values i = 1, i = 2, and i = 4 into (4.3) yields a system of three quadratic equations. Solving this system for α_1, α_2, and α_3 yields

α_1 = (m_1 m_4 − m_2²)(m_1 − 2m_2 + m_4)^{−1}
α_2 = (m_1² − 2m_1 m_2 + m_2²)(m_1 − 2m_2 + m_4)^{−1}
α_3 = (m_4 − m_2)(m_2 − m_1)^{−1}.

Let m̃_1, m̃_2, and m̃_4 be estimates of m_1, m_2, and m_4, respectively. Plugging m̃_1, m̃_2, and m̃_4 into the above equations yields estimates, say ᾱ_1, ᾱ_2, and ᾱ_3, of α_1, α_2, and α_3, respectively. Define ∆_m = max_{i∈{1,2,4}} |m̃_i − m_i| and ∆_α = max_{i∈{1,2,3}} |ᾱ_i − α_i|. To complete the proof, we show that |m̃_n − m_n| is O(∆_m). The argument consists of two parts: in claims 1 through 3 we show that ∆_α is O(∆_m), then in Claim 4 we show that |m̃_n − m_n| is O(∆_α).

Claim 1. Each of the numerators in the expressions for α_1, α_2, and α_3 has absolute value bounded from above, while each of the denominators has absolute value bounded from below. (The bounds are independent of the unknown parameters of G.)

Proof of Claim 1. The numerators will have bounded absolute value as long as m_1, m_2, and m_4 are bounded. Upper bounds on m_1, m_2, and m_4 follow from the restrictions on the parameters µ, σ, and ξ. As for the denominators, by Lemma 4 we have

|m_1 − 2m_2 + m_4| = (m_2 − m_1)|α_3 − 1| ≥ (1/8)σ_l (2^{ξ*} − 1).

Claim 2. Let N and D ≠ 0 be fixed real numbers, and let β_N and β_D be real numbers with |β_D| < |D|/2. Then |(N + β_N)/(D + β_D) − N/D| is O(|β_N| + |β_D|).

Proof of Claim 2. First, using the Taylor series expansion of N/(D + β_D),

|N/(D + β_D) − N/D| = |Nβ_D/D²| · |Σ_{i=0}^∞ (−β_D/D)^i| ≤ |Nβ_D/D²| (1 − |β_D/D|)^{−1} = O(|β_D|).

Then

|(N + β_N)/(D + β_D) − N/D| ≤ |N/(D + β_D) − N/D| + |β_N/(D + β_D)| = O(|β_N| + |β_D|).

Claim 3. ∆_α is O(∆_m).

Proof of Claim 3. We show that |ᾱ_1 − α_1| is O(∆_m). Similar arguments show that |ᾱ_2 − α_2| and |ᾱ_3 − α_3| are O(∆_m), which proves the claim. To see that |ᾱ_1 − α_1| is O(∆_m), let N = m_1 m_4 − m_2² and let D = m_1 − 2m_2 + m_4, so that α_1 = N/D. Define N̄ and D̄ in the natural way so that ᾱ_1 = N̄/D̄. Because m_1, m_2, and m_4 are all O(1) (by Claim 1), it follows that both |N̄ − N| and |D̄ − D| are O(∆_m). That |ᾱ_1 − α_1| is O(∆_m) then follows by Claim 2.

Claim 4. |m̃_n − m_n| is O(∆_α).

Proof of Claim 4. Because ξ_l ≤ ξ ≤ −ξ*, it must be that 0 < 2^{ξ_l} ≤ α_3 ≤ 2^{−ξ*} < 1. So for ∆_α sufficiently small, 0 < ᾱ_3 < 1. Then

|m̃_n − m_n| = |(ᾱ_1 + ᾱ_2 ᾱ_3^{log_2(n)}) − (α_1 + α_2 α_3^{log_2(n)})|
≤ |ᾱ_1 − α_1| + |ᾱ_2 ᾱ_3^{log_2(n)} − ᾱ_2 α_3^{log_2(n)}| + |ᾱ_2 α_3^{log_2(n)} − α_2 α_3^{log_2(n)}|
≤ |ᾱ_1 − α_1| + ᾱ_2 |ᾱ_3 − α_3| + |ᾱ_2 − α_2|
= O(∆_α),

where on the third line we have used the fact that both α_3 and ᾱ_3 are between 0 and 1, and in the last line we have used the fact that ᾱ_2 is O(1).

Putting claims 3 and 4 together, |m̃_n − m_n| is O(∆_m). Define ∆_i = |m̃_i − m_i|, so that ∆_m = max_{i∈{1,2,4}} ∆_i. Thus to guarantee P[|m̃_n − m_n| < ɛ] ≥ 1 − δ, it suffices that P[∆_i < Ω(ɛ)] ≥ 1 − δ/3 for all i ∈ {1, 2, 4}. By Lemma 2, this requires O(ln(1/δ) · 1/ɛ²) draws from G.

Lemma 7. Assume G has shape parameter ξ ≥ ξ*. Let n be a positive integer and let ɛ > 0 and δ ∈ (0, 1) be real numbers. Then O(ln(1/δ) · ln(n)²/ɛ²) draws from G suffice to obtain an estimate m̃_n of m_n such that

P[1/(1 + ɛ) < (m̃_n − α_1)/(m_n − α_1) < 1 + ɛ] ≥ 1 − δ

where α_1 = µ − σ/ξ.

Proof. See Appendix A.

Putting the results of lemmas 3, 5, 6, and 7 together, we obtain the following theorem.

Theorem 2. Let n be a positive integer and let ɛ > 0 and δ ∈ (0, 1) be real numbers. Then O(ln(1/δ) · ln(n)²/ɛ²) draws from G suffice to obtain an estimate m̃_n of m_n such that with probability at least 1 − δ, one of the following holds:

ξ ≤ 0 and |m̃_n − m_n| < ɛ, or

ξ > 0 and 1/(1 + ɛ) < (m̃_n − α_1)/(m_n − α_1) < 1 + ɛ, where α_1 = µ − σ/ξ.

Proof. First, invoke Lemma 5 with parameters ξ* and δ/2. Then invoke one of Lemmas 3, 6, or 7 (depending on the estimate ξ̃ obtained from Lemma 5) with parameters ɛ and δ/2.

Theorem 2 shows that step 1(a) of strategy S_1 can be performed as described.

5 Conclusions

The max k-armed bandit problem is a variant of the classical k-armed bandit problem with practical applications to combinatorial optimization. Motivated by extreme value theory, we studied a restricted version of this problem in which each arm yields payoff drawn from a GEV distribution. We derived PAC bounds on the sample complexity of estimating m_n, the expected maximum of n draws from a GEV distribution. Using these bounds, we showed that a simple algorithm for the max k-armed bandit problem is asymptotically optimal. Ours is the first algorithm for this problem with rigorous asymptotic performance guarantees.

References

[1] Donald A. Berry and Bert Fristedt. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.

[2] Vincent A. Cicirello and Stephen F. Smith. Heuristic selection for stochastic search optimization: Modeling solution quality by extreme value theory. In Proceedings of the 10th International Conference on Principles and Practice of Constraint Programming.

[3] Vincent A. Cicirello and Stephen F. Smith. The max k-armed bandit: A new model of exploration applied to search heuristic selection. In Proceedings of AAAI 2005.

[4] Stuart Coles. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London.

[5] Philip W. L. Fong. A quantitative study of hypothesis selection. In International Conference on Machine Learning.

[6] Leslie P. Kaelbling. Learning in Embedded Systems. The MIT Press, Cambridge, MA.

[7] Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58.

Appendix A

Theorem 1. Let I = (n, G) be an instance of the max k-armed bandit problem, where G = {G_1, G_2, ..., G_k} and G_i = GEV_(µ_i,σ_i,ξ_i). Let m_n = max_{1≤i≤k} m_n^{G_i}, ξ_max = max_{1≤i≤k} ξ_i, and S = S_1(∆, 1/(kn²)), where ∆ = (k · ln(nk) · ln(n)²/n)^{1/3}. Then if ξ_max ≤ 0,

lim_{n→∞} (m_n − S^n(I)) = 0,

while if ξ_max > 0,

lim_{n→∞} S^n(I)/m_n = 1.

Proof. For ξ_max ≤ 0, the theorem was proved in the main text. It remains only to address the case ξ_max > 0.

Case ξ_max > 0. We will prove the stronger claim that

(S^n(I) − α_1)/(m_n − α_1) = 1 − O(∆)   (5.1)

where ∆ = (k · ln(nk) · ln(n)²/n)^{1/3} and α_1 = max_{1≤i≤k} α_1^i, where α_1^i = µ_i − σ_i/ξ_i.

For the moment, let us assume that all arms have shape parameter ξ_i > 0. Let A be the event (which occurs with probability at least 1 − kδ) that all estimates obtained in step 1(a) satisfy the inequality in Theorem 2.

Claim 1. To prove (5.1), it suffices to show that A implies (m_n − α_1)/(m̂_{n−tk} − α_1) = 1 + O(∆).

Proof of Claim 1. Because S^n(I) ≥ m̂_{n−tk} and the event A occurs with probability at least 1 − kδ, it suffices to show that A implies (1 − δk)(m̂_{n−tk} − α_1)/(m_n − α_1) = 1 − O(∆). Because δk · m̂_{n−tk}/(m_n − α_1) is O(1/n²) = o(∆), it suffices to show that A implies (m̂_{n−tk} − α_1)/(m_n − α_1) = 1 − O(∆). This can be rewritten as m_n − α_1 = (m̂_{n−tk} − α_1) · 1/(1 − O(∆)) = (m̂_{n−tk} − α_1)(1 + O(∆)), where we can replace 1/(1 − O(∆)) with 1 + O(∆) because for r < 1/2, 1/(1 − r) = 1 + r/(1 − r) < 1 + 2r.

Claim 2. (m̂_n − α̂_1)/(m̂_{n−tk} − α̂_1) = 1 + O(∆), where α̂_1 = µ_î − σ_î/ξ_î.

Proof of Claim 2. Using Proposition 3,

ln((m̂_n − α̂_1)/(m̂_{n−tk} − α̂_1)) = ln(n^ξ/(n − tk)^ξ) = ξ(ln(n) − ln(n − tk)) = O(tk/n) = O(∆).

The claim follows from the fact that exp(β) < 1 + 3β for β < 1, so that exp(O(∆)) = 1 + O(∆).

Claim 3. A implies that for all i, (m_n^i − α_1)/(m̃_n^i − α_1) < 1 + ɛ and (m̃_n^i − α_1)/(m_n^i − α_1) < 1 + ɛ.

Proof of Claim 3. By definition, α_1 = α_1^i + β for some β ≥ 0. The claim follows from Theorem 2 and the fact that for positive N and D and β ≥ 0, N/D < 1 + ɛ implies (N + β)/(D + β) < 1 + ɛ.

Claim 4. A implies (m_n − α_1)/(m̂_{n−tk} − α_1) = 1 + O(∆).

Proof of Claim 4. Let i* = arg max_{1≤i≤k} m_n^i, so that m_n = m_n^{i*}. Then

(m_n − α_1)/(m̂_{n−tk} − α_1) = [(m_n^{i*} − α_1)/(m̃_n^{i*} − α_1)] · [(m̃_n^{i*} − α_1)/(m̃_n^î − α_1)] · [(m̃_n^î − α_1)/(m̂_{n−tk} − α_1)] ≤ (1 + ɛ) · 1 · (1 + ɛ)(1 + O(∆)) = 1 + O(∆),

where in the second step we have used claims 2 and 3, together with the fact that î maximizes m̃_n^i.

Putting claims 1 and 4 together completes the proof. To remove the assumption that all arms have ξ_i > 0, we need to show that A implies that, for n sufficiently large, the arms î and i* (the only arms that play a role in the proof) have shape parameters > 0. This follows from the fact that if ξ_i ≤ 0, m_n^i is O(ln(n)), while if ξ_i ≥ ξ* > 0, m_n^i is Ω(n^{ξ_i}).

Lemma 4. m_4 − m_2 ≥ σ/8 and m_2 − m_1 ≥ σ/4.

Proof. If ξ = 0, then by Proposition 3, m_4 − m_2 = m_2 − m_1 = ln(2)σ and we are done. Otherwise,

m_2 − m_1 = σ(2^ξ − 1)ξ^{−1}Γ(1 − ξ) and m_4 − m_2 = σ(4^ξ − 2^ξ)ξ^{−1}Γ(1 − ξ).

It thus suffices to prove the following claim.

Claim 1.

min_{ξ<1/2} (2^ξ − 1)ξ^{−1}Γ(1 − ξ) ≥ 1/4, and

min_{ξ<1/2} (4^ξ − 2^ξ)ξ^{−1}Γ(1 − ξ) ≥ 1/8.

Proof of Claim 1. We state without proof the following properties of the Γ function: Γ(1 + z) = zΓ(z) for z > 0, and Γ(z) ≥ 1/2 for z > 1/2. Making the change of variable y = −ξ, it suffices to show

min_{y>−1/2} (1 − 2^{−y})y^{−1}Γ(1 + y) ≥ 1/4,   (5.2)

min_{y>−1/2} 2^{−y}(1 − 2^{−y})y^{−1}Γ(1 + y) ≥ 1/8.   (5.3)

(5.2) holds because for −1/2 < y ≤ 1, (1 − 2^{−y})y^{−1} ≥ 1/2, so that

(1 − 2^{−y})y^{−1}Γ(1 + y) ≥ (1/2)Γ(1 + y) ≥ 1/4,

while for y > 1, 1 − 2^{−y} ≥ 1/2 and y^{−1}Γ(1 + y) = Γ(y), so that

(1 − 2^{−y})y^{−1}Γ(1 + y) ≥ (1/2)Γ(y) ≥ 1/4.

Similarly, (5.3) holds because for −1/2 < y ≤ 1, additionally 2^{−y} ≥ 1/2, so that

2^{−y}(1 − 2^{−y})y^{−1}Γ(1 + y) ≥ (1/4)Γ(1 + y) ≥ 1/8,

while for y > 1, one checks that

2^{−y}(1 − 2^{−y})y^{−1}Γ(1 + y) = (2^{−y} − 4^{−y})Γ(y) ≥ 1/8;

the left-hand side attains its minimum (≈ 0.187, near y ≈ 2.1) above 1/8, and grows without bound as y → ∞.

Lemma 7. Assume G has shape parameter ξ ≥ ξ*. Let n be a positive integer and let ɛ > 0 and δ ∈ (0, 1) be real numbers. Then O(ln(1/δ) · ln(n)²/ɛ²) draws from G suffice to obtain an estimate m̃_n of m_n such that

P[1/(1 + ɛ) < (m̃_n − α_1)/(m_n − α_1) < 1 + ɛ] ≥ 1 − δ

where α_1 = µ − σ/ξ.

Proof. We use the same estimation procedure as in the proof of Lemma 6. Let α_1, α_2, α_3, ∆_α, and ∆_m be defined as they were in that proof. The inequality 1/(1 + ɛ) < (m̃_n − α_1)/(m_n − α_1) < 1 + ɛ is the same as |ln(m̃_n − α_1) − ln(m_n − α_1)| < ln(1 + ɛ). For ɛ < 1/4, ln(1 + ɛ) ≥ (7/8)ɛ, so it suffices to guarantee that |ln(m̃_n − α_1) − ln(m_n − α_1)| < (7/8)ɛ.

Claim 1. |ln(m̃_n − α_1) − ln(m_n − α_1)| is O(ln(n) · ∆_α).

Proof of Claim 1. Because |ln(m̃_n − α_1) − ln(m̃_n − ᾱ_1)| is O(∆_α), it suffices to show that |ln(m̃_n − ᾱ_1) − ln(m_n − α_1)| is O(ln(n) · ∆_α). This is true because

ln(m̃_n − ᾱ_1) = ln(ᾱ_2 ᾱ_3^{log_2(n)}) = log_2(n) ln(ᾱ_3) + ln(ᾱ_2) = log_2(n) ln(α_3) + ln(α_2) ± O(ln(n) · ∆_α) = ln(α_2 α_3^{log_2(n)}) ± O(ln(n) · ∆_α) = ln(m_n − α_1) ± O(ln(n) · ∆_α).

Setting ∆_α < Ω(ln(n)^{−1} ɛ) then guarantees |ln(m̃_n − α_1) − ln(m_n − α_1)| < (7/8)ɛ. By Claim 3 of the proof of Lemma 6 (which did not depend on the assumption ξ < 0), ∆_α is O(∆_m), so we require P[∆_m < Ω(ln(n)^{−1} ɛ)] ≥ 1 − δ. Define ∆_i = |m̃_i − m_i|, so that ∆_m = max_{i∈{1,2,4}} ∆_i. It suffices that P[∆_i < Ω(ln(n)^{−1} ɛ)] ≥ 1 − δ/3 for i ∈ {1, 2, 4}. By Lemma 2, ensuring this requires O(ln(1/δ) · ln(n)²/ɛ²) draws from G.
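To close, the interpolation underlying Lemmas 6 and 7 and the constants asserted in Lemma 4 can both be spot-checked numerically. The grid scan below is a sanity check, not a proof, and the function names are ours:

```python
import math

def fit_m_n(m1, m2, m4, n):
    """Lemma 6/7 estimator: fit m_i = a1 + a2 * a3**log2(i) through
    (m1, m2, m4) and extrapolate to n.  Exact when the inputs are the
    true expected maxima and xi != 0."""
    d = m1 - 2.0 * m2 + m4
    a1 = (m1 * m4 - m2 * m2) / d
    a2 = (m1 - m2) ** 2 / d
    a3 = (m4 - m2) / (m2 - m1)
    return a1 + a2 * a3 ** math.log2(n)

def normalized_gap(xi, which):
    """(m2 - m1)/sigma ('21') or (m4 - m2)/sigma ('42') as a function
    of xi (Proposition 3, xi != 0) -- the quantities bounded below in
    Lemma 4."""
    g = math.gamma(1.0 - xi) / xi
    if which == "21":
        return (2.0 ** xi - 1.0) * g
    return (4.0 ** xi - 2.0 ** xi) * g

# Scan xi over a grid covering [-6, 0.49] (excluding 0, where the
# Gumbel gaps are both ln(2) * sigma) and record the smallest gaps.
grid = [x / 100.0 for x in range(-600, 50) if x != 0]
min_gap_21 = min(normalized_gap(x, "21") for x in grid)
min_gap_42 = min(normalized_gap(x, "42") for x in grid)
```

With exact inputs from Proposition 3, fit_m_n recovers m_n exactly for any ξ ≠ 0 (positive or negative), and the grid minima stay above the 1/4 and 1/8 constants of Lemma 4.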


More information

Notes 1 : Measure-theoretic foundations I

Notes 1 : Measure-theoretic foundations I Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,

More information

The information complexity of sequential resource allocation

The information complexity of sequential resource allocation The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Discriminative Learning can Succeed where Generative Learning Fails

Discriminative Learning can Succeed where Generative Learning Fails Discriminative Learning can Succeed where Generative Learning Fails Philip M. Long, a Rocco A. Servedio, b,,1 Hans Ulrich Simon c a Google, Mountain View, CA, USA b Columbia University, New York, New York,

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence

Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Chao Zhang The Biodesign Institute Arizona State University Tempe, AZ 8587, USA Abstract In this paper, we present

More information

6.1 Moment Generating and Characteristic Functions

6.1 Moment Generating and Characteristic Functions Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,

More information

1 Review of The Learning Setting

1 Review of The Learning Setting COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #8 Scribe: Changyan Wang February 28, 208 Review of The Learning Setting Last class, we moved beyond the PAC model: in the PAC model we

More information

Lecture 5: January 30

Lecture 5: January 30 CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 5: January 30 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They

More information

Homework 4 Solutions

Homework 4 Solutions CS 174: Combinatorics and Discrete Probability Fall 01 Homework 4 Solutions Problem 1. (Exercise 3.4 from MU 5 points) Recall the randomized algorithm discussed in class for finding the median of a set

More information

Lectures 2 3 : Wigner s semicircle law

Lectures 2 3 : Wigner s semicircle law Fall 009 MATH 833 Random Matrices B. Való Lectures 3 : Wigner s semicircle law Notes prepared by: M. Koyama As we set up last wee, let M n = [X ij ] n i,j=1 be a symmetric n n matrix with Random entries

More information

Learning symmetric non-monotone submodular functions

Learning symmetric non-monotone submodular functions Learning symmetric non-monotone submodular functions Maria-Florina Balcan Georgia Institute of Technology ninamf@cc.gatech.edu Nicholas J. A. Harvey University of British Columbia nickhar@cs.ubc.ca Satoru

More information

The space complexity of approximating the frequency moments

The space complexity of approximating the frequency moments The space complexity of approximating the frequency moments Felix Biermeier November 24, 2015 1 Overview Introduction Approximations of frequency moments lower bounds 2 Frequency moments Problem Estimate

More information

1 Introduction. 2 Measure theoretic definitions

1 Introduction. 2 Measure theoretic definitions 1 Introduction These notes aim to recall some basic definitions needed for dealing with random variables. Sections to 5 follow mostly the presentation given in chapter two of [1]. Measure theoretic definitions

More information

Lecture 9. = 1+z + 2! + z3. 1 = 0, it follows that the radius of convergence of (1) is.

Lecture 9. = 1+z + 2! + z3. 1 = 0, it follows that the radius of convergence of (1) is. The Exponential Function Lecture 9 The exponential function 1 plays a central role in analysis, more so in the case of complex analysis and is going to be our first example using the power series method.

More information

Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

Lecture 4: Two-point Sampling, Coupon Collector s problem

Lecture 4: Two-point Sampling, Coupon Collector s problem Randomized Algorithms Lecture 4: Two-point Sampling, Coupon Collector s problem Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized Algorithms

More information

Bandits : optimality in exponential families

Bandits : optimality in exponential families Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities

More information

Module 3. Function of a Random Variable and its distribution

Module 3. Function of a Random Variable and its distribution Module 3 Function of a Random Variable and its distribution 1. Function of a Random Variable Let Ω, F, be a probability space and let be random variable defined on Ω, F,. Further let h: R R be a given

More information

Lecture 9: March 26, 2014

Lecture 9: March 26, 2014 COMS 6998-3: Sub-Linear Algorithms in Learning and Testing Lecturer: Rocco Servedio Lecture 9: March 26, 204 Spring 204 Scriber: Keith Nichols Overview. Last Time Finished analysis of O ( n ɛ ) -query

More information

Experience-Efficient Learning in Associative Bandit Problems

Experience-Efficient Learning in Associative Bandit Problems Alexander L. Strehl strehl@cs.rutgers.edu Chris Mesterharm mesterha@cs.rutgers.edu Michael L. Littman mlittman@cs.rutgers.edu Haym Hirsh hirsh@cs.rutgers.edu Department of Computer Science, Rutgers University,

More information

Hoeffding, Chernoff, Bennet, and Bernstein Bounds

Hoeffding, Chernoff, Bennet, and Bernstein Bounds Stat 928: Statistical Learning Theory Lecture: 6 Hoeffding, Chernoff, Bennet, Bernstein Bounds Instructor: Sham Kakade 1 Hoeffding s Bound We say X is a sub-gaussian rom variable if it has quadratically

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

THE first formalization of the multi-armed bandit problem

THE first formalization of the multi-armed bandit problem EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can

More information

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

Proofs for Large Sample Properties of Generalized Method of Moments Estimators Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my

More information

1 A Support Vector Machine without Support Vectors

1 A Support Vector Machine without Support Vectors CS/CNS/EE 53 Advanced Topics in Machine Learning Problem Set 1 Handed out: 15 Jan 010 Due: 01 Feb 010 1 A Support Vector Machine without Support Vectors In this question, you ll be implementing an online

More information

Distinguishing a truncated random permutation from a random function

Distinguishing a truncated random permutation from a random function Distinguishing a truncated random permutation from a random function Shoni Gilboa Shay Gueron July 9 05 Abstract An oracle chooses a function f from the set of n bits strings to itself which is either

More information

The Multi-Armed Bandit Problem

The Multi-Armed Bandit Problem Università degli Studi di Milano The bandit problem [Robbins, 1952]... K slot machines Rewards X i,1, X i,2,... of machine i are i.i.d. [0, 1]-valued random variables An allocation policy prescribes which

More information

Electronic Companion to: What matters in school choice tie-breaking? How competition guides design

Electronic Companion to: What matters in school choice tie-breaking? How competition guides design Electronic Companion to: What matters in school choice tie-breaking? How competition guides design This electronic companion provides the proofs for the theoretical results (Sections -5) and stylized simulations

More information

Lecture Notes 3 Convergence (Chapter 5)

Lecture Notes 3 Convergence (Chapter 5) Lecture Notes 3 Convergence (Chapter 5) 1 Convergence of Random Variables Let X 1, X 2,... be a sequence of random variables and let X be another random variable. Let F n denote the cdf of X n and let

More information

A sequential hypothesis test based on a generalized Azuma inequality 1

A sequential hypothesis test based on a generalized Azuma inequality 1 A sequential hypothesis test based on a generalized Azuma inequality 1 Daniël Reijsbergen a,2, Werner Scheinhardt b, Pieter-Tjerk de Boer b a Laboratory for Foundations of Computer Science, University

More information

Finite-time Analysis of the Multiarmed Bandit Problem*

Finite-time Analysis of the Multiarmed Bandit Problem* Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010

More information

A Result of Vapnik with Applications

A Result of Vapnik with Applications A Result of Vapnik with Applications Martin Anthony Department of Statistical and Mathematical Sciences London School of Economics Houghton Street London WC2A 2AE, U.K. John Shawe-Taylor Department of

More information

The Game of Normal Numbers

The Game of Normal Numbers The Game of Normal Numbers Ehud Lehrer September 4, 2003 Abstract We introduce a two-player game where at each period one player, say, Player 2, chooses a distribution and the other player, Player 1, a

More information

How Much Evidence Should One Collect?

How Much Evidence Should One Collect? How Much Evidence Should One Collect? Remco Heesen October 10, 2013 Abstract This paper focuses on the question how much evidence one should collect before deciding on the truth-value of a proposition.

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

Financial Econometrics and Volatility Models Extreme Value Theory

Financial Econometrics and Volatility Models Extreme Value Theory Financial Econometrics and Volatility Models Extreme Value Theory Eric Zivot May 3, 2010 1 Lecture Outline Modeling Maxima and Worst Cases The Generalized Extreme Value Distribution Modeling Extremes Over

More information

Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning

Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Christos Dimitrakakis Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

10.1 The Formal Model

10.1 The Formal Model 67577 Intro. to Machine Learning Fall semester, 2008/9 Lecture 10: The Formal (PAC) Learning Model Lecturer: Amnon Shashua Scribe: Amnon Shashua 1 We have see so far algorithms that explicitly estimate

More information

The concentration of the chromatic number of random graphs

The concentration of the chromatic number of random graphs The concentration of the chromatic number of random graphs Noga Alon Michael Krivelevich Abstract We prove that for every constant δ > 0 the chromatic number of the random graph G(n, p) with p = n 1/2

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information

CLASSICAL PROBABILITY MODES OF CONVERGENCE AND INEQUALITIES

CLASSICAL PROBABILITY MODES OF CONVERGENCE AND INEQUALITIES CLASSICAL PROBABILITY 2008 2. MODES OF CONVERGENCE AND INEQUALITIES JOHN MORIARTY In many interesting and important situations, the object of interest is influenced by many random factors. If we can construct

More information

Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems

Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems 1 Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems Alyson K. Fletcher, Mojtaba Sahraee-Ardakan, Philip Schniter, and Sundeep Rangan Abstract arxiv:1706.06054v1 cs.it

More information

March 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.

March 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang. Florida State University March 1, 2018 Framework 1. (Lizhe) Basic inequalities Chernoff bounding Review for STA 6448 2. (Lizhe) Discrete-time martingales inequalities via martingale approach 3. (Boning)

More information

SOME MEMORYLESS BANDIT POLICIES

SOME MEMORYLESS BANDIT POLICIES J. Appl. Prob. 40, 250 256 (2003) Printed in Israel Applied Probability Trust 2003 SOME MEMORYLESS BANDIT POLICIES EROL A. PEKÖZ, Boston University Abstract We consider a multiarmed bit problem, where

More information

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Lecture 3: Lower Bounds for Bandit Algorithms

Lecture 3: Lower Bounds for Bandit Algorithms CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and

More information

Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12

Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Statistical Convergence of Kernel CCA

Statistical Convergence of Kernel CCA Statistical Convergence of Kernel CCA Kenji Fukumizu Institute of Statistical Mathematics Tokyo 106-8569 Japan fukumizu@ism.ac.jp Francis R. Bach Centre de Morphologie Mathematique Ecole des Mines de Paris,

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Lecture 1: Introduction to Sublinear Algorithms

Lecture 1: Introduction to Sublinear Algorithms CSE 522: Sublinear (and Streaming) Algorithms Spring 2014 Lecture 1: Introduction to Sublinear Algorithms March 31, 2014 Lecturer: Paul Beame Scribe: Paul Beame Too much data, too little time, space for

More information

Tijmen Daniëls Universiteit van Amsterdam. Abstract

Tijmen Daniëls Universiteit van Amsterdam. Abstract Pure strategy dominance with quasiconcave utility functions Tijmen Daniëls Universiteit van Amsterdam Abstract By a result of Pearce (1984), in a finite strategic form game, the set of a player's serially

More information

Ordinal Optimization and Multi Armed Bandit Techniques

Ordinal Optimization and Multi Armed Bandit Techniques Ordinal Optimization and Multi Armed Bandit Techniques Sandeep Juneja. with Peter Glynn September 10, 2014 The ordinal optimization problem Determining the best of d alternative designs for a system, on

More information

On the Complexity of Best Arm Identification with Fixed Confidence

On the Complexity of Best Arm Identification with Fixed Confidence On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse

More information

Robustness and duality of maximum entropy and exponential family distributions

Robustness and duality of maximum entropy and exponential family distributions Chapter 7 Robustness and duality of maximum entropy and exponential family distributions In this lecture, we continue our study of exponential families, but now we investigate their properties in somewhat

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma

Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma Suppose again we have n sample points x,..., x n R p. The data-point x i R p can be thought of as the i-th row X i of an n p-dimensional

More information

Lossless Online Bayesian Bagging

Lossless Online Bayesian Bagging Lossless Online Bayesian Bagging Herbert K. H. Lee ISDS Duke University Box 90251 Durham, NC 27708 herbie@isds.duke.edu Merlise A. Clyde ISDS Duke University Box 90251 Durham, NC 27708 clyde@isds.duke.edu

More information

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the

More information

Introduction to Statistical Learning Theory

Introduction to Statistical Learning Theory Introduction to Statistical Learning Theory Definition Reminder: We are given m samples {(x i, y i )} m i=1 Dm and a hypothesis space H and we wish to return h H minimizing L D (h) = E[l(h(x), y)]. Problem

More information

The distribution inherited by Y is called the Cauchy distribution. Using that. d dy ln(1 + y2 ) = 1 arctan(y)

The distribution inherited by Y is called the Cauchy distribution. Using that. d dy ln(1 + y2 ) = 1 arctan(y) Stochastic Processes - MM3 - Solutions MM3 - Review Exercise Let X N (0, ), i.e. X is a standard Gaussian/normal random variable, and denote by f X the pdf of X. Consider also a continuous random variable

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

Sampling Contingency Tables

Sampling Contingency Tables Sampling Contingency Tables Martin Dyer Ravi Kannan John Mount February 3, 995 Introduction Given positive integers and, let be the set of arrays with nonnegative integer entries and row sums respectively

More information

Case study: stochastic simulation via Rademacher bootstrap

Case study: stochastic simulation via Rademacher bootstrap Case study: stochastic simulation via Rademacher bootstrap Maxim Raginsky December 4, 2013 In this lecture, we will look at an application of statistical learning theory to the problem of efficient stochastic

More information

Answers to Problem Set Number MIT (Fall 2005).

Answers to Problem Set Number MIT (Fall 2005). Answers to Problem Set Number 5. 18.305 MIT Fall 2005). D. Margetis and R. Rosales MIT, Math. Dept., Cambridge, MA 02139). November 23, 2005. Course TA: Nikos Savva, MIT, Dept. of Mathematics, Cambridge,

More information

Lecture 8 Inequality Testing and Moment Inequality Models

Lecture 8 Inequality Testing and Moment Inequality Models Lecture 8 Inequality Testing and Moment Inequality Models Inequality Testing In the previous lecture, we discussed how to test the nonlinear hypothesis H 0 : h(θ 0 ) 0 when the sample information comes

More information

Game Theory, On-line prediction and Boosting (Freund, Schapire)

Game Theory, On-line prediction and Boosting (Freund, Schapire) Game heory, On-line prediction and Boosting (Freund, Schapire) Idan Attias /4/208 INRODUCION he purpose of this paper is to bring out the close connection between game theory, on-line prediction and boosting,

More information

CSCI-6971 Lecture Notes: Monte Carlo integration

CSCI-6971 Lecture Notes: Monte Carlo integration CSCI-6971 Lecture otes: Monte Carlo integration Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu February 21, 2006 1 Overview Consider the following

More information

Lecture 8: The Field B dr

Lecture 8: The Field B dr Lecture 8: The Field B dr October 29, 2018 Throughout this lecture, we fix a perfectoid field C of characteristic p, with valuation ring O C. Fix an element π C with 0 < π C < 1, and let B denote the completion

More information

Essentials on the Analysis of Randomized Algorithms

Essentials on the Analysis of Randomized Algorithms Essentials on the Analysis of Randomized Algorithms Dimitris Diochnos Feb 0, 2009 Abstract These notes were written with Monte Carlo algorithms primarily in mind. Topics covered are basic (discrete) random

More information

Quantum query complexity of entropy estimation

Quantum query complexity of entropy estimation Quantum query complexity of entropy estimation Xiaodi Wu QuICS, University of Maryland MSR Redmond, July 19th, 2017 J o i n t C e n t e r f o r Quantum Information and Computer Science Outline Motivation

More information

Econ 508B: Lecture 5

Econ 508B: Lecture 5 Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

Analysis of Thompson Sampling for the multi-armed bandit problem

Analysis of Thompson Sampling for the multi-armed bandit problem Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Lectures 2 3 : Wigner s semicircle law

Lectures 2 3 : Wigner s semicircle law Fall 009 MATH 833 Random Matrices B. Való Lectures 3 : Wigner s semicircle law Notes prepared by: M. Koyama As we set up last wee, let M n = [X ij ] n i,j= be a symmetric n n matrix with Random entries

More information

Convergence Rates on Root Finding

Convergence Rates on Root Finding Convergence Rates on Root Finding Com S 477/577 Oct 5, 004 A sequence x i R converges to ξ if for each ǫ > 0, there exists an integer Nǫ) such that x l ξ > ǫ for all l Nǫ). The Cauchy convergence criterion

More information

Lecture 18: March 15

Lecture 18: March 15 CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 18: March 15 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may

More information