An Asymptotically Optimal Algorithm for the Max k-armed Bandit Problem
An Asymptotically Optimal Algorithm for the Max k-armed Bandit Problem

Matthew J. Streeter and Stephen F. Smith
February 27, 2006
CMU-CS
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA

Abstract

We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that $O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ trials are sufficient to identify, with probability at least $1-\delta$, a machine whose expected maximum payoff is within $\epsilon$ of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as $n \to \infty$.

Keywords: multi-armed bandit problem, PAC bounds
1 Introduction

In the k-armed bandit problem one is faced with a set of k slot machines, each having an arm that, when pulled, yields a payoff drawn from a fixed (but unknown) distribution. The goal is to allocate trials to the arms so as to maximize the expected cumulative payoff obtained over a series of n trials. Solving the problem entails striking a balance between exploration (determining which arm yields the highest mean payoff) and exploitation (repeatedly pulling this arm).

In the max k-armed bandit problem, the goal is to maximize the expected maximum (rather than cumulative) payoff. This version of the problem arises in practice when tackling combinatorial optimization problems for which a number of randomized search heuristics exist: given k heuristics, each yielding a stochastic outcome when applied to some particular problem instance, we wish to allocate trials to the heuristics so as to maximize the maximum payoff (e.g., the maximum number of clauses satisfied by any sampled variable assignment, the minimum makespan of any sampled schedule). Cicirello and Smith [3] show that a max k-armed bandit approach yields good performance on the resource-constrained project scheduling problem with maximum time lags (RCPSP/max).

1.1 Summary of Results

We consider a restricted version of the max k-armed bandit problem in which each arm yields payoff drawn from a generalized extreme value (GEV) distribution (defined in §2). This paper presents the first provably asymptotically optimal algorithm for this problem. Roughly speaking, the reason for assuming a GEV distribution is the Extremal Types Theorem (stated in §2), which states that the distribution of the sample maximum of n independent, identically distributed random variables approaches a GEV distribution as $n \to \infty$. A more formal justification is given in §3. For reasons that will become clear, the nature of our results depends on the shape parameter ($\xi$) of the GEV distribution.
Assuming all arms have $\xi \le 0$, our results can be summarized as follows.

- Let $a$ be an arm that yields payoff drawn from a GEV distribution with unknown parameters; let $M_n$ denote the maximum payoff obtained after pulling $a$ $n$ times; and let $m_n = E[M_n]$. We provide an algorithm that, after pulling the arm $O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ times, produces an estimate $\tilde{m}_n$ of $m_n$ with the property that $P[|\tilde{m}_n - m_n| < \epsilon] \ge 1 - \delta$.

- Let $a_1, a_2, \ldots, a_k$ be $k$ arms, each yielding payoff from (distinct) GEV distributions with unknown parameters. Let $m_n^i$ denote the expected maximum payoff obtained by pulling the $i$th arm $n$ times, and let $m_n^* = \max_{1 \le i \le k} m_n^i$. We provide an algorithm that, when run for $n$ pulls, obtains expected maximum payoff $m_n^* - o(1)$.

Our results for the case $\xi > 0$ are similar, except that our estimates and expected maximum payoffs come within arbitrarily small factors (rather than absolute distances) of optimality. Specifically, our estimates have the property that $P\left[\frac{1}{1+\epsilon} < \frac{\tilde{m}_n - \alpha_1}{m_n - \alpha_1} < 1+\epsilon\right] \ge 1 - \delta$ for a constant $\alpha_1$ independent of n, while the expected maximum payoff obtained by using our algorithm for n pulls is $m_n^*(1 - o(1))$.
1.2 Related Work

The classical k-armed bandit problem was first studied by Robbins [7] and has since been the subject of numerous papers; see Berry and Fristedt [1] and Kaelbling [6] for overviews. In a paper similar in spirit to ours, Fong [5] showed that an initial exploration phase consisting of $O(\frac{k}{\epsilon^2}\ln(\frac{k}{\delta}))$ pulls is sufficient to identify, with probability at least $1-\delta$, an arm whose mean payoff is within $\epsilon$ of optimal. Theorem 2 of this paper proves a bound similar to Fong's on the number of pulls needed to identify an arm whose expected maximum payoff (over a series of n trials) is near-optimal.

The max variant of the k-armed bandit problem was first studied by Cicirello and Smith [2, 3], who successfully used a heuristic for the max k-armed bandit problem to select among priority rules for the RCPSP/max. The design of Cicirello and Smith's heuristic is motivated by an analysis of the special case in which each arm's payoff distribution is a GEV distribution with shape parameter $\xi = 0$, but they do not rigorously analyze the heuristic's behavior. Our paper is more theoretical and less empirical: on the one hand we do not perform experiments on any practical combinatorial problem, but on the other hand we provide stronger performance guarantees under weaker distributional assumptions.

1.3 Notation

For an arbitrary cumulative distribution function G, let the random variable $M_n^G$ be defined by
$$M_n^G = \max\{Z_1, Z_2, \ldots, Z_n\}$$
where $Z_1, Z_2, \ldots, Z_n$ are independent random variables, each having distribution G. Let $m_n^G = E[M_n^G]$.

2 Extreme Value Theory

This section provides a self-contained overview of results in extreme value theory that are relevant to this work. Our presentation is based on the text by Coles [4]. The central result of extreme value theory is an analogue of the central limit theorem that applies to extremely rare events.
Recall that the central limit theorem states that (under certain regularity conditions) the distribution of the sum of n independent, identically distributed (i.i.d.) random variables converges to a normal distribution as $n \to \infty$. The extremal types theorem states that (under certain regularity conditions) the distribution of the maximum of n i.i.d. random variables converges to a generalized extreme value (GEV) distribution.

Definition (GEV distribution). A random variable Z has a generalized extreme value distribution if, for constants $\mu$, $\sigma > 0$, and $\xi$, $P[Z \le z] = GEV_{(\mu,\sigma,\xi)}(z)$, where
$$GEV_{(\mu,\sigma,\xi)}(z) = \exp\left(-\left(1 + \frac{\xi(z-\mu)}{\sigma}\right)^{-\frac{1}{\xi}}\right)$$
for $z \in \{z : 1 + \xi(z-\mu)\sigma^{-1} > 0\}$, and $GEV_{(\mu,\sigma,\xi)}(z) = 1$ otherwise. The case $\xi = 0$ is interpreted as the limit
$$\lim_{\xi' \to 0} GEV_{(\mu,\sigma,\xi')}(z) = \exp\left(-\exp\left(\frac{\mu - z}{\sigma}\right)\right).$$

The following three propositions establish properties of the GEV distribution.

Proposition 1. Let Z be a random variable with $P[Z \le z] = GEV_{(\mu,\sigma,\xi)}(z)$. Then
$$E[Z] = \begin{cases} \mu + \frac{\sigma}{\xi}\left(\Gamma(1-\xi) - 1\right) & \text{if } \xi < 1,\ \xi \neq 0 \\ \mu + \sigma\gamma & \text{if } \xi = 0 \\ \infty & \text{if } \xi \ge 1 \end{cases}$$
where
$$\Gamma(z) = \int_0^\infty t^{z-1}\exp(-t)\,dt$$
is the complete gamma function and
$$\gamma = \lim_{n\to\infty}\left(\sum_{k=1}^n \frac{1}{k}\right) - \ln(n)$$
is Euler's constant.

Proposition 2. Let $G = GEV_{(\mu,\sigma,\xi)}$. Then $M_n^G$ has distribution $GEV_{(\mu',\sigma',\xi')}$, where
$$\mu' = \begin{cases} \mu + \frac{\sigma}{\xi}\left(n^\xi - 1\right) & \text{if } \xi \neq 0 \\ \mu + \sigma\ln(n) & \text{otherwise,} \end{cases}$$
$\sigma' = \sigma n^\xi$, and $\xi' = \xi$.

Substituting the parameters of $M_n^G$ given by Proposition 2 into Proposition 1 gives an expression for $m_n^G$.

Proposition 3. Let $G = GEV_{(\mu,\sigma,\xi)}$ where $\xi < 1$. Then
$$m_n^G = \begin{cases} \mu + \frac{\sigma}{\xi}\left(n^\xi\,\Gamma(1-\xi) - 1\right) & \text{if } \xi \neq 0 \\ \mu + \sigma\gamma + \sigma\ln(n) & \text{otherwise.} \end{cases}$$
It follows that for $\xi > 0$, $m_n^G$ is $\Theta(n^\xi)$; for $\xi = 0$, $m_n^G$ is $\Theta(\ln(n))$; and for $\xi < 0$, $m_n^G = \mu - \frac{\sigma}{\xi} - \Theta(n^\xi)$.
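Propositions 2 and 3 can be checked numerically. The sketch below (stdlib Python only; the function name and the constants are ours) samples from a GEV distribution by inverting its CDF and compares the empirical mean of block maxima against the closed form of Proposition 3 in the Gumbel case $\xi = 0$:

```python
import math
import random

def gev_draw(mu, sigma, xi, rng):
    """One draw from GEV(mu, sigma, xi) by inverting the CDF
    exp(-(1 + xi*(z - mu)/sigma)^(-1/xi)); xi = 0 is the Gumbel limit."""
    u = rng.random()
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(u))
    return mu + (sigma / xi) * ((-math.log(u)) ** (-xi) - 1.0)

# Monte Carlo check of Proposition 3 for xi = 0:
# m_n = mu + sigma*gamma + sigma*ln(n), with gamma Euler's constant.
rng = random.Random(0)
n, trials = 100, 10_000
emp = sum(max(gev_draw(0.0, 1.0, 0.0, rng) for _ in range(n))
          for _ in range(trials)) / trials
gamma = 0.5772156649015329
theory = gamma + math.log(n)
print(f"empirical {emp:.3f} vs theory {theory:.3f}")
```

The empirical mean of the block maxima agrees with the closed form to Monte Carlo accuracy, consistent with Proposition 2's statement that the maximum of n Gumbel draws is again Gumbel with location shifted by $\sigma\ln(n)$.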
[Figure 1: The effect of the shape parameter ($\xi$) on the expected maximum of n independent draws from a GEV distribution.]

It is useful to have a visual picture of what Proposition 3 means. Figure 1 plots $m_n^G$ as a function of n for three GEV distributions with $\mu = 0$, $\sigma = 1$, and $\xi \in \{-0.1, 0, 0.1\}$.

The central result of extreme value theory is the following theorem.

The Extremal Types Theorem. Let G be an arbitrary cumulative distribution function, and suppose there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that
$$\lim_{n\to\infty} P\left[\frac{M_n^G - b_n}{a_n} \le z\right] = G^*(z) \quad (2.1)$$
for any continuity point z of $G^*$, where $G^*$ is not a point mass. Then there exist constants $\mu$, $\sigma > 0$, and $\xi$ such that $G^*(z) = GEV_{(\mu,\sigma,\xi)}(z)$ for all z. Furthermore,
$$\lim_{n\to\infty} P\left[M_n^G \le z\right] = GEV_{(\mu a_n + b_n,\ \sigma a_n,\ \xi)}(z).$$
Condition (2.1) holds for a variety of distributions including the normal, lognormal, uniform, and Cauchy distributions.

3 The Max k-Armed Bandit Problem

Definition (max k-armed bandit instance). An instance $I = (n, \mathcal{G})$ of the max k-armed bandit problem is an ordered pair whose first element is a positive integer n, and whose second element is a set $\mathcal{G} = \{G_1, G_2, \ldots, G_k\}$ of k cumulative distribution functions, each thought of as an arm on a slot machine. The $i$th arm, when pulled, returns a realization of a random variable with distribution $G_i$.

Definition (max k-armed bandit strategy). A max k-armed bandit strategy S is an algorithm that, given an instance $I = (n, \mathcal{G})$ of the max k-armed bandit problem, performs a sequence of n arm-pulls. For any strategy S and integer $l \le n$, we denote by $S^l(I)$ the expected maximum payoff obtained by running S on I for l trials:
$$S^l(I) = E\left[\max_{0 \le j \le l} p_j\right]$$
where $p_j$ is the payoff obtained from the $j$th pull, and we define $p_0 = 0$.

Our goal is to come up with a strategy S such that $S^n(I)$ is near-maximal. Note that the problem is ill-posed (i.e., there is no clear criterion for preferring one strategy over another) unless we make some assumptions about the distributions $G_i$. We will assume that each arm $G_i = GEV_{(\mu_i,\sigma_i,\xi_i)}$ is a GEV distribution whose parameters satisfy

1. $|\mu_i| \le \mu_u$
2. $0 < \sigma_l \le \sigma_i \le \sigma_u$
3. $\xi_l \le \xi_i \le \xi_u < \frac{1}{2}$

for known constants $\mu_u$, $\sigma_l$, $\sigma_u$, $\xi_l$, and $\xi_u$.

There are two arguments for assuming that each arm is a GEV distribution. First, in practice the distribution of payoffs returned by a strong heuristic may be approximately GEV, even if the conditions of the Extremal Types Theorem are not formally satisfied [2]. A second argument runs as follows. Suppose $I = (n, \mathcal{G})$ is an instance of the max k-armed bandit problem in which each distribution $G_i \in \mathcal{G}$ satisfies condition (2.1) of the Extremal Types Theorem. Consider the instance $\bar{I} = (\frac{n}{m}, \bar{\mathcal{G}})$, where $\bar{\mathcal{G}} = \{\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_k\}$, and arm $\bar{G}_i$ returns the maximum payoff obtained by pulling the corresponding arm $G_i$ m times. Effectively, $\bar{I}$ is a restricted version of I in which the arms must be pulled in batches of size m, rather than in any arbitrary order. For m sufficiently large, the Extremal Types Theorem guarantees that for each i, $\bar{G}_i \approx GEV_{(\mu_i,\sigma_i,\xi_i)}$ for some constants $\mu_i$, $\sigma_i$, and $\xi_i$. Thus, the instance $I' = (\frac{n}{m}, \mathcal{G}')$ with $\mathcal{G}' = \{G'_1, G'_2, \ldots, G'_k\}$ and $G'_i = GEV_{(\mu_i,\sigma_i,\xi_i)}$ is approximately equivalent to $\bar{I}$ and satisfies our distributional assumptions.

The purpose of the restrictions on the parameters $\mu_i$, $\sigma_i$, and $\xi_i$ is to ensure that each GEV distribution has finite, bounded mean and variance.

4 An Asymptotically Optimal Algorithm

We will analyze the following max k-armed bandit strategy.
Strategy $S_1(\epsilon, \delta)$:

1. (Exploration) For each arm $G_i \in \mathcal{G}$:
   (a) Using $t = O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ samples of $G_i$, obtain an estimate $\tilde{m}_n^{G_i}$ of $m_n^{G_i}$. Assuming that arm $G_i$ has shape parameter $\xi_i \le 0$, our estimate will have the property that $P\left[|\tilde{m}_n^{G_i} - m_n^{G_i}| < \epsilon\right] \ge 1 - \delta$.

2. (Exploitation) Set $\hat{i} = \arg\max_{1 \le i \le k} \tilde{m}_n^{G_i}$, and pull arm $G_{\hat{i}}$ for the remaining $n - tk$ trials.

If an arm $G_i$ has shape parameter $\xi_i > 0$, the estimate obtained in step 1(a) will instead have the property that $P\left[\frac{1}{1+\epsilon} < \frac{\tilde{m}_n - \alpha_1}{m_n - \alpha_1} < 1+\epsilon\right] \ge 1 - \delta$ for a constant $\alpha_1$ independent of n.

The following theorem shows that with appropriate settings of $\epsilon$ and $\delta$, strategy $S_1$ is asymptotically optimal.

Theorem 1. Let $I = (n, \mathcal{G})$ be an instance of the max k-armed bandit problem, where $\mathcal{G} = \{G_1, G_2, \ldots, G_k\}$ and $G_i = GEV_{(\mu_i,\sigma_i,\xi_i)}$. Let $m_n^* = \max_{1 \le i \le k} m_n^{G_i}$, $\xi_{max} = \max_{1 \le i \le k} \xi_i$, and
$$S = S_1\left(\sqrt[3]{\frac{k}{n}},\ \frac{1}{kn^2}\right).$$
Then if $\xi_{max} \le 0$,
$$\lim_{n\to\infty} m_n^* - S^n(I) = 0,$$
while if $\xi_{max} > 0$,
$$\lim_{n\to\infty} \frac{S^n(I)}{m_n^*} = 1.$$

Proof. Case $\xi_{max} \le 0$. Let $\hat{m}_n = m_n^{G_{\hat{i}}}$ (where $\hat{i}$ is the arm selected for exploitation in step 2). Then $\hat{m}_{n-tk}$ is the expected maximum payoff obtained during the exploitation step, so $S^n(I) \ge \hat{m}_{n-tk}$.

Claim 1. $\hat{m}_n - \hat{m}_{n-tk}$ is $O(\frac{tk}{n})$.

Proof of Claim 1. Let $\mu = \mu_{\hat{i}}$, $\sigma = \sigma_{\hat{i}}$, and $\xi = \xi_{\hat{i}}$ be the parameters of the arm selected for exploitation. Suppose $\xi = 0$. Then by Proposition 3,
$$\hat{m}_n - \hat{m}_{n-tk} = \sigma\left(\ln(n) - \ln(n - tk)\right).$$
Expanding $\ln(x+\beta)$ in powers of $\beta$ about $\beta = 0$, for $|\beta| < \frac{x}{2}$, $x > 0$, yields
$$\ln(x+\beta) - \ln(x) = \sum_{i=1}^\infty (-1)^{i+1}\frac{1}{i}\left(\frac{\beta}{x}\right)^i \le \left|\frac{\beta}{x}\right|\frac{1}{1 - \left|\frac{\beta}{x}\right|} < 2\left|\frac{\beta}{x}\right|$$
so for n sufficiently large,
$$\hat{m}_n - \hat{m}_{n-tk} \le 2\sigma\frac{tk}{n} = O\left(\frac{tk}{n}\right).$$

Now suppose $\xi < 0$. By Proposition 3,
$$\hat{m}_n - \hat{m}_{n-tk} = \frac{\sigma}{\xi}\Gamma(1-\xi)\left(n^\xi - (n-tk)^\xi\right) = O\left((n-tk)^\xi - n^\xi\right)$$
where we have used the fact that $\frac{\sigma}{\xi}\Gamma(1-\xi) < 0$. Expanding $(n-t)^\xi$ in powers of t about $t = 0$ gives
$$(n-t)^\xi = n^\xi - \xi n^{\xi-1}t + O\left(t^2 n^{\xi-2}\right)$$
and so
$$(n-t)^\xi - n^\xi \approx -\xi n^{\xi-1}t \le -\xi_l\frac{t}{n} = O\left(\frac{t}{n}\right).$$

With probability at least $1 - k\delta$, all estimates obtained during the exploration phase are within $\epsilon$ of the correct values, so that $m_n^* - \hat{m}_n < 2\epsilon$. Assuming $m_n^* - \hat{m}_n < 2\epsilon$, it follows that
$$m_n^* - \hat{m}_{n-tk} = (m_n^* - \hat{m}_n) + (\hat{m}_n - \hat{m}_{n-tk}) < 2\epsilon + O\left(\frac{tk}{n}\right) = 2\epsilon + \frac{k}{n}O\left(\ln\left(\frac{1}{\delta}\right)\frac{(\ln n)^2}{\epsilon^2}\right) = O(\Delta)$$
where $\Delta = \ln(nk)\ln(n)^2\sqrt[3]{\frac{k}{n}}$, and on the second line we have used Claim 1. Thus,
$$S^n(I) \ge (1 - k\delta)\left(m_n^* - O(\Delta)\right) = m_n^* - O(\Delta) = m_n^* - o(1).$$

Case $\xi_{max} > 0$. See Appendix A.

Theorem 1 completes our analysis of the performance of $S_1$. It remains only to describe how the estimates in step 1(a) are obtained.

4.1 Estimating $m_n$

We adopt the following notation: let $G = GEV_{(\mu,\sigma,\xi)}$ denote a GEV distribution with (unknown) parameters $\mu$, $\sigma$, and $\xi$ satisfying the conditions stated in §3, and
let $m_i = m_i^G$.

To estimate $m_n$, we first obtain an accurate estimate of $\xi$. Then:

1. if $\xi = 0$ (so that the growth of $m_n$ as a function of $\ln n$ is linear), we estimate $m_n$ by first estimating $m_1$ and $m_2$, then performing linear interpolation;
2. otherwise we estimate $m_n$ by first estimating $m_1$, $m_2$, and $m_4$, then performing a nonlinear interpolation.

4.1.1 Estimating $m_i$ for $i \in \{1, 2, 4\}$

The following two lemmas use well-known ideas to efficiently estimate $m_i$ for small values of i.

Lemma 1. Let i be a positive integer and let $\epsilon > 0$ be a real number. Then $O\left(\frac{i}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{m}_i$ of $m_i$ such that $P[|\tilde{m}_i - m_i| < \epsilon] \ge \frac{3}{4}$.

Proof. First consider the special case $i = 1$. Let X denote the sum of t draws from G, for some to-be-specified positive integer t. Then $E[X] = m_1 t$ and $Var[X] = \sigma^2 t$, where $\sigma$ is the (unknown) standard deviation of G. We take $\tilde{m}_1 = \frac{X}{t}$ as our estimate of $m_1$. Then
$$P[|\tilde{m}_1 - m_1| \ge \epsilon] = P[|t\tilde{m}_1 - tm_1| \ge t\epsilon] = P\left[|X - E[X]| \ge \frac{\sqrt{t}\,\epsilon}{\sigma}\sqrt{Var[X]}\right] \le \frac{\sigma^2}{t\epsilon^2}$$
where the last inequality is Chebyshev's. Thus to guarantee $P[|\tilde{m}_1 - m_1| \ge \epsilon] \le \frac{1}{4}$ we must set $t = 4\frac{\sigma^2}{\epsilon^2} = O\left(\frac{1}{\epsilon^2}\right)$ (note that due to the assumptions in §3, $\sigma$ is $O(1)$). For $i > 1$, we let X be the sum of t block maxima (each the maximum of i independent draws from G), which increases the number of samples required by a factor of i.

To boost the probability that $|\tilde{m}_i - m_i| < \epsilon$ from $\frac{3}{4}$ to $1-\delta$, we use the median of means method.

Lemma 2. Let i be a positive integer and let $\epsilon > 0$ and $\delta \in (0,1)$ be real numbers. Then $O\left(\ln(\frac{1}{\delta})\frac{i}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{m}_i$ of $m_i$ such that $P[|\tilde{m}_i - m_i| < \epsilon] \ge 1 - \delta$.

Proof. We invoke Lemma 1 r times (for r to be determined), yielding a set $E = \{\tilde{m}_i^{(1)}, \tilde{m}_i^{(2)}, \ldots, \tilde{m}_i^{(r)}\}$ of estimates of $m_i$. Let $\tilde{m}_i$ be the median element of E. Let $A = \{\tilde{m}_i^{(j)} \in E : |\tilde{m}_i^{(j)} - m_i| < \epsilon\}$ be the set of accurate estimates of $m_i$, and let $a = |A|$. Then $|\tilde{m}_i - m_i| \ge \epsilon$ implies $a \le \frac{r}{2}$, while $E[a] \ge \frac{3}{4}r$.
Using the standard Chernoff bound, we have
$$P[|\tilde{m}_i - m_i| \ge \epsilon] \le P\left[a \le \frac{r}{2}\right] \le \exp\left(-\frac{r}{C}\right)$$
for some constant $C > 0$. Thus $r = O(\ln(\frac{1}{\delta}))$ repetitions suffice to ensure $P[|\tilde{m}_i - m_i| \ge \epsilon] \le \delta$.
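A minimal sketch of the estimator behind Lemmas 1 and 2 (the function name, parameter choices, and the uniform test distribution are ours, not from the paper):

```python
import random
import statistics

def mom_estimate_m_i(draw, i, t, groups, rng):
    """Median-of-means estimate of m_i = E[max of i draws] (Lemmas 1-2):
    take `groups` independent means, each averaged over t block maxima of
    block size i, and return their median.  t controls accuracy (Chebyshev),
    groups controls the failure probability (Chernoff)."""
    means = []
    for _ in range(groups):
        s = sum(max(draw(rng) for _ in range(i)) for _ in range(t))
        means.append(s / t)
    return statistics.median(means)

# Sanity check on a case with a known answer: for Uniform(0, 1),
# E[max of 2 draws] = 2/3.
rng = random.Random(0)
est = mom_estimate_m_i(lambda g: g.random(), i=2, t=2000, groups=9, rng=rng)
print(est)  # close to 2/3
```

Taking the median of a few independent means is what lets the failure probability drop geometrically in the number of groups, so only $O(\ln(\frac{1}{\delta}))$ repetitions of the basic Chebyshev-accurate estimate are needed.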
4.1.2 Estimating $m_n$ when $\xi = 0$

Lemma 3. Assume G has shape parameter $\xi = 0$. Let n be a positive integer and let $\epsilon > 0$ and $\delta \in (0,1)$ be real numbers. Then $O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{m}_n$ of $m_n$ such that $P[|\tilde{m}_n - m_n| < \epsilon] \ge 1 - \delta$.

Proof. By Proposition 3, $m_i = \mu + \sigma\gamma + \sigma\ln(i)$. Thus
$$m_n = m_1 + (m_2 - m_1)\log_2(n). \quad (4.1)$$
Let $\tilde{m}_1$ and $\tilde{m}_2$ be estimates of $m_1$ and $m_2$, respectively, and let $\tilde{m}_n$ be the estimate of $m_n$ obtained by plugging $\tilde{m}_1$ and $\tilde{m}_2$ into (4.1). Define $\Delta_i = |\tilde{m}_i - m_i|$ for $i \in \{1, 2, n\}$. Then
$$\Delta_n \le (1 + \log_2(n))(\Delta_1 + \Delta_2).$$
Thus to guarantee $P[\Delta_n < \epsilon] \ge 1 - \delta$, it suffices that $P\left[\Delta_i \ge \frac{\epsilon}{2(1+\log_2(n))}\right] \le \frac{\delta}{2}$ for all $i \in \{1, 2\}$. By Lemma 2, this requires $O\left(\frac{\ln(\frac{1}{\delta})(\ln n)^2}{\epsilon^2}\right)$ draws from G.

4.1.3 Estimating $m_n$ when $\xi \neq 0$

For the purpose of the proofs presented here, we will make a minor assumption concerning an arm's shape parameter $\xi_i$: we assume that for some known constant $\bar{\xi} > 0$,
$$|\xi_i| < \bar{\xi} \implies \xi_i = 0.$$
Removing this assumption does not fundamentally change the results, but it makes the proofs more complicated.

Lemma 5 shows how to efficiently estimate $\xi$. Lemmas 6 and 7 show how to efficiently estimate $m_n$ in the cases $\xi < 0$ and $\xi > 0$, respectively. We will use the following lemma.

Lemma 4. $m_4 - m_2 \ge \frac{\sigma}{8}$ and $m_2 - m_1 \ge \frac{\sigma}{4}$.

Proof. See Appendix A.

Lemma 5. For real numbers $\epsilon > 0$ and $\delta \in (0,1)$, $O\left(\ln(\frac{1}{\delta})\frac{1}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{\xi}$ of $\xi$ such that $P\left[|\tilde{\xi} - \xi| < \epsilon\right] \ge 1 - \delta$.

Proof. Using Proposition 3, it is straightforward to check that for any $\xi < 1$,
$$\xi = \log_2\left(\frac{m_4 - m_2}{m_2 - m_1}\right). \quad (4.2)$$
Let $\tilde{m}_1$, $\tilde{m}_2$, and $\tilde{m}_4$ be estimates of $m_1$, $m_2$, and $m_4$, respectively, and let $\tilde{\xi}$ be the estimate of $\xi$ obtained by plugging $\tilde{m}_1$, $\tilde{m}_2$, and $\tilde{m}_4$ into (4.2). Define $\Delta_m = \max_{i\in\{1,2,4\}}|\tilde{m}_i - m_i|$ and define $\Delta_\xi = |\tilde{\xi} - \xi|$. We wish to upper bound $\Delta_\xi$ as a function of $\Delta_m$. In Claim 1 of Theorem 1 we showed that $\ln(x+\beta) - \ln(x) \le 2\frac{|\beta|}{x}$ for $|\beta| \le \frac{x}{2}$. Letting $N = m_4 - m_2$ and $D = m_2 - m_1$, and noting that
$$\xi = \log_2(N) - \log_2(D) = \frac{1}{\ln 2}\left(\ln(N) - \ln(D)\right),$$
it follows that
$$\Delta_\xi \le \frac{1}{\ln(2)}\left(\frac{2(2\Delta_m)}{N} + \frac{2(2\Delta_m)}{D}\right)$$
for $\Delta_m < \frac{1}{2}\min(N, D)$. Thus by Lemma 4 and the assumption that $\sigma \ge \sigma_l$, $\Delta_\xi$ is $O(\Delta_m)$. Define $\Delta_i = |\tilde{m}_i - m_i|$, so that $\Delta_m = \max_{i\in\{1,2,4\}}\Delta_i$. Then to guarantee $P[\Delta_\xi < \epsilon] \ge 1 - \delta$, it suffices that $P[\Delta_i \ge \Omega(\epsilon)] \le \frac{\delta}{3}$ for all $i \in \{1,2,4\}$. By Lemma 2, this requires $O\left(\ln(\frac{1}{\delta})\frac{1}{\epsilon^2}\right)$ draws from G.

Lemma 6. Assume G has shape parameter $\xi \le -\bar{\xi}$. Let n be a positive integer and let $\epsilon > 0$ and $\delta \in (0,1)$ be real numbers. Then $O\left(\ln(\frac{1}{\delta})\frac{1}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{m}_n$ of $m_n$ such that $P[|\tilde{m}_n - m_n| < \epsilon] \ge 1 - \delta$.

Proof. By Proposition 3, $m_i = \mu + \frac{\sigma}{\xi}\left(i^\xi\,\Gamma(1-\xi) - 1\right)$. Define
$$\alpha_1 = \mu - \sigma\xi^{-1}, \quad \alpha_2 = \sigma\xi^{-1}\Gamma(1-\xi), \quad \alpha_3 = 2^\xi$$
so that
$$m_i = \alpha_1 + \alpha_2\,\alpha_3^{\log_2(i)}. \quad (4.3)$$
Plugging the values $i = 1$, $i = 2$, and $i = 4$ into (4.3) yields a system of three quadratic equations. Solving this system for $\alpha_1$, $\alpha_2$, and $\alpha_3$ yields
$$\alpha_1 = (m_1 m_4 - m_2^2)(m_1 - 2m_2 + m_4)^{-1}$$
$$\alpha_2 = (m_1^2 - 2m_1 m_2 + m_2^2)(m_1 - 2m_2 + m_4)^{-1}$$
$$\alpha_3 = (m_4 - m_2)(m_2 - m_1)^{-1}.$$
Let $\tilde{m}_1$, $\tilde{m}_2$, and $\tilde{m}_4$ be estimates of $m_1$, $m_2$, and $m_4$, respectively. Plugging $\tilde{m}_1$, $\tilde{m}_2$, and $\tilde{m}_4$ into the above equations yields estimates, say $\bar{\alpha}_1$, $\bar{\alpha}_2$, and $\bar{\alpha}_3$, of $\alpha_1$, $\alpha_2$, and $\alpha_3$, respectively. Define $\Delta_m = \max_{i\in\{1,2,4\}}|\tilde{m}_i - m_i|$ and $\Delta_\alpha = \max_{i\in\{1,2,3\}}|\bar{\alpha}_i - \alpha_i|$. To complete the proof, we show that $|\tilde{m}_n - m_n|$ is $O(\Delta_m)$. The argument consists of two parts: in claims 1 through 3 we show that $\Delta_\alpha$ is $O(\Delta_m)$, then in Claim 4 we show that $|\tilde{m}_n - m_n|$ is $O(\Delta_\alpha)$.

Claim 1.
Each of the numerators in the expressions for $\alpha_1$, $\alpha_2$, and $\alpha_3$ has absolute value bounded from above, while each of the denominators has absolute value bounded from below. (The bounds are independent of the unknown parameters of G.)
Proof of Claim 1. The numerators will have bounded absolute value as long as $m_1$, $m_2$, and $m_4$ are bounded. Upper bounds on $m_1$, $m_2$, and $m_4$ follow from the restrictions on the parameters $\mu$, $\sigma$, and $\xi$. As for the denominators, by Lemma 4 we have
$$|m_1 - 2m_2 + m_4| = (m_2 - m_1)|\alpha_3 - 1| \ge \frac{1}{8}\sigma_l\left(1 - 2^{-\bar{\xi}}\right).$$

Claim 2. Let N and D be fixed real numbers with $D \neq 0$, and let $\beta_N$ and $\beta_D$ be real numbers with $|\beta_D| < \frac{|D|}{2}$. Then $\left|\frac{N+\beta_N}{D+\beta_D} - \frac{N}{D}\right|$ is $O(|\beta_N| + |\beta_D|)$.

Proof of Claim 2. First, using the Taylor series expansion of $\frac{N}{D+\beta_D}$,
$$\left|\frac{N}{D+\beta_D} - \frac{N}{D}\right| = \left|\frac{N\beta_D}{D^2}\sum_{i=0}^\infty\left(-\frac{\beta_D}{D}\right)^i\right| \le \frac{|N\beta_D|}{D^2}\left(1 - \left|\frac{\beta_D}{D}\right|\right)^{-1} \le \frac{2|N\beta_D|}{D^2} = O(|\beta_D|).$$
Then
$$\left|\frac{N+\beta_N}{D+\beta_D} - \frac{N}{D}\right| \le \left|\frac{N}{D+\beta_D} - \frac{N}{D}\right| + \left|\frac{\beta_N}{D+\beta_D}\right| = O(|\beta_N| + |\beta_D|).$$

Claim 3. $\Delta_\alpha$ is $O(\Delta_m)$.

Proof of Claim 3. We show that $|\bar{\alpha}_1 - \alpha_1|$ is $O(\Delta_m)$. Similar arguments show that $|\bar{\alpha}_2 - \alpha_2|$ and $|\bar{\alpha}_3 - \alpha_3|$ are $O(\Delta_m)$, which proves the claim. To see that $|\bar{\alpha}_1 - \alpha_1|$ is $O(\Delta_m)$, let $N = m_1 m_4 - m_2^2$, and let $D = m_1 - 2m_2 + m_4$, so that $\alpha_1 = \frac{N}{D}$. Define $\bar{N}$ and $\bar{D}$ in the natural way so that $\bar{\alpha}_1 = \frac{\bar{N}}{\bar{D}}$. Because $m_1$, $m_2$, and $m_4$ are all $O(1)$ (by Claim 1), it follows that both $|\bar{N} - N|$ and $|\bar{D} - D|$ are $O(\Delta_m)$. That $|\bar{\alpha}_1 - \alpha_1|$ is $O(\Delta_m)$ then follows by Claim 2.

Claim 4. $|\tilde{m}_n - m_n|$ is $O(\Delta_\alpha)$.

Proof of Claim 4. Because $\xi_l \le \xi \le -\bar{\xi}$, it must be that $0 < 2^{\xi_l} \le \alpha_3 \le 2^{-\bar{\xi}} < 1$. So for $\Delta_\alpha$ sufficiently small, $0 < \bar{\alpha}_3 < 1$. Then
$$|\tilde{m}_n - m_n| = \left|\left(\bar{\alpha}_1 + \bar{\alpha}_2\bar{\alpha}_3^{\log_2(n)}\right) - \left(\alpha_1 + \alpha_2\alpha_3^{\log_2(n)}\right)\right|$$
$$\le |\bar{\alpha}_1 - \alpha_1| + |\bar{\alpha}_2|\left|\bar{\alpha}_3^{\log_2(n)} - \alpha_3^{\log_2(n)}\right| + \left|\alpha_3^{\log_2(n)}\right||\bar{\alpha}_2 - \alpha_2|$$
$$\le |\bar{\alpha}_1 - \alpha_1| + |\bar{\alpha}_2||\bar{\alpha}_3 - \alpha_3| + |\bar{\alpha}_2 - \alpha_2| = O(\Delta_\alpha)$$
where on the third line we have used the fact that both $\alpha_3$ and $\bar{\alpha}_3$ are between 0 and 1, and in the last line we have used the fact that $\bar{\alpha}_2$ is $O(1)$.
Putting claims 3 and 4 together, $|\tilde{m}_n - m_n|$ is $O(\Delta_m)$. Define $\Delta_i = |\tilde{m}_i - m_i|$, so that $\Delta_m = \max_{i\in\{1,2,4\}}\Delta_i$. Thus to guarantee $P[|\tilde{m}_n - m_n| < \epsilon] \ge 1 - \delta$, it suffices that $P[\Delta_i \ge \Omega(\epsilon)] \le \frac{\delta}{3}$ for all $i \in \{1,2,4\}$. By Lemma 2, this requires $O\left(\ln(\frac{1}{\delta})\frac{1}{\epsilon^2}\right)$ draws from G.

Lemma 7. Assume G has shape parameter $\xi \ge \bar{\xi}$. Let n be a positive integer and let $\epsilon > 0$ and $\delta \in (0,1)$ be real numbers. Then $O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{m}_n$ of $m_n$ such that
$$P\left[\frac{1}{1+\epsilon} < \frac{\tilde{m}_n - \alpha_1}{m_n - \alpha_1} < 1+\epsilon\right] \ge 1 - \delta$$
where $\alpha_1 = \mu - \frac{\sigma}{\xi}$.

Proof. See Appendix A.

Putting the results of lemmas 3, 5, 6, and 7 together, we obtain the following theorem.

Theorem 2. Let n be a positive integer and let $\epsilon > 0$ and $\delta \in (0,1)$ be real numbers. Then $O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{m}_n$ of $m_n$ such that with probability at least $1-\delta$, one of the following holds:

- $\xi \le 0$ and $|\tilde{m}_n - m_n| < \epsilon$, or
- $\xi > 0$ and $\frac{1}{1+\epsilon} < \frac{\tilde{m}_n - \alpha_1}{m_n - \alpha_1} < 1+\epsilon$, where $\alpha_1 = \mu - \frac{\sigma}{\xi}$.

Proof. First, invoke Lemma 5 with parameters $\bar{\xi}$ and $\frac{\delta}{2}$. Then invoke one of Lemmas 3, 6, or 7 (depending on the estimate $\tilde{\xi}$ obtained from Lemma 5) with parameters $\epsilon$ and $\frac{\delta}{2}$.

Theorem 2 shows that step 1(a) of strategy $S_1$ can be performed as described.

5 Conclusions

The max k-armed bandit problem is a variant of the classical k-armed bandit problem with practical applications to combinatorial optimization. Motivated by extreme value theory, we studied a restricted version of this problem in which each arm yields payoff drawn from a GEV distribution. We derived PAC bounds on the sample complexity of estimating $m_n$, the expected maximum of n draws from a GEV distribution. Using these bounds, we showed that a simple algorithm for the max k-armed bandit problem is asymptotically optimal. Ours is the first algorithm for this problem with rigorous asymptotic performance guarantees.

References

[1] Donald A. Berry and Bert Fristedt. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
[2] Vincent A. Cicirello and Stephen F. Smith. Heuristic selection for stochastic search optimization: Modeling solution quality by extreme value theory. In Proceedings of the 10th International Conference on Principles and Practice of Constraint Programming.

[3] Vincent A. Cicirello and Stephen F. Smith. The max k-armed bandit: A new model of exploration applied to search heuristic selection. In Proceedings of AAAI 2005.

[4] Stuart Coles. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London.

[5] Philip W. L. Fong. A quantitative study of hypothesis selection. In International Conference on Machine Learning.

[6] Leslie P. Kaelbling. Learning in Embedded Systems. The MIT Press, Cambridge, MA.

[7] Herbert Robbins. Some aspects of sequential design of experiments. Bulletin of the American Mathematical Society, 58.

Appendix A

Theorem 1. Let $I = (n, \mathcal{G})$ be an instance of the max k-armed bandit problem, where $\mathcal{G} = \{G_1, G_2, \ldots, G_k\}$ and $G_i = GEV_{(\mu_i,\sigma_i,\xi_i)}$. Let $m_n^* = \max_{1\le i\le k} m_n^{G_i}$, $\xi_{max} = \max_{1\le i\le k}\xi_i$, and
$$S = S_1\left(\sqrt[3]{\frac{k}{n}},\ \frac{1}{kn^2}\right).$$
Then if $\xi_{max} \le 0$,
$$\lim_{n\to\infty} m_n^* - S^n(I) = 0,$$
while if $\xi_{max} > 0$,
$$\lim_{n\to\infty} \frac{S^n(I)}{m_n^*} = 1.$$

Proof. For $\xi_{max} \le 0$, the theorem was proved in the main text. It remains only to address the case $\xi_{max} > 0$.

Case $\xi_{max} > 0$. We will prove the stronger claim that
$$\frac{S^n(I) - \alpha_1^*}{m_n^* - \alpha_1^*} = 1 - O(\Delta) \quad (5.1)$$
where $\Delta = \ln(nk)\ln(n)^2\sqrt[3]{\frac{k}{n}}$ and $\alpha_1^* = \max_{1\le i\le k}\alpha_1^i$, where $\alpha_1^i = \mu_i - \frac{\sigma_i}{\xi_i}$.
For the moment, let us assume that all arms have shape parameter $\xi_i > 0$. Let A be the event (which occurs with probability at least $1 - k\delta$) that all estimates obtained in step 1(a) satisfy the inequality in Theorem 2.

Claim 1. To prove (5.1), it suffices to show that A implies
$$\frac{m_n^* - \alpha_1^*}{\hat{m}_{n-tk} - \alpha_1^*} = 1 + O(\Delta).$$

Proof of Claim 1. Because $S^n(I) \ge \hat{m}_{n-tk}$ and the event A occurs with probability at least $1 - k\delta$, it suffices to show that A implies
$$(1 - \delta k)\frac{\hat{m}_{n-tk} - \alpha_1^*}{m_n^* - \alpha_1^*} = 1 - O(\Delta).$$
Because $\frac{\delta k(\hat{m}_{n-tk})}{m_n^* - \alpha_1^*}$ is $O\left(\frac{1}{n^2}\right) = o(\Delta)$, it suffices to show that A implies
$$\frac{\hat{m}_{n-tk} - \alpha_1^*}{m_n^* - \alpha_1^*} = 1 - O(\Delta).$$
This can be rewritten as $m_n^* - \alpha_1^* = \frac{\hat{m}_{n-tk} - \alpha_1^*}{1 - O(\Delta)} = (\hat{m}_{n-tk} - \alpha_1^*)(1 + O(\Delta))$ (we can replace $\frac{1}{1-O(\Delta)}$ with $1 + O(\Delta)$ because for $r < \frac{1}{2}$, $\frac{1}{1-r} < 1 + 2r$).

Claim 2.
$$\frac{\hat{m}_n - \hat{\alpha}_1}{\hat{m}_{n-tk} - \hat{\alpha}_1} = 1 + O(\Delta).$$

Proof of Claim 2. Using Proposition 3,
$$\ln\left(\frac{\hat{m}_n - \hat{\alpha}_1}{\hat{m}_{n-tk} - \hat{\alpha}_1}\right) = \ln\left(\frac{n^{\hat{\xi}}}{(n-tk)^{\hat{\xi}}}\right) = \hat{\xi}\left(\ln(n) - \ln(n-tk)\right) = O\left(\frac{tk}{n}\right) = O(\Delta).$$
The claim follows from the fact that $\exp(\beta) < 1 + 3\beta$ for $\beta < 1$, so that $\exp(O(\Delta)) = 1 + O(\Delta)$.

Claim 3. A implies that for all i,
$$\frac{m_n^i - \alpha_1^*}{\tilde{m}_n^i - \alpha_1^*} < 1 + \epsilon.$$

Proof of Claim 3. By definition, $\alpha_1^* = \alpha_1^i + \beta$ for some $\beta \ge 0$. The claim follows from the fact that for positive N and D and $\beta \ge 0$, $\frac{N}{D} < 1 + \epsilon$ implies $\frac{N+\beta}{D+\beta} < 1 + \epsilon$.

Claim 4. A implies
$$\frac{m_n^* - \alpha_1^*}{\hat{m}_{n-tk} - \alpha_1^*} = 1 + O(\Delta).$$

Proof of Claim 4.
$$\frac{m_n^* - \alpha_1^*}{\hat{m}_{n-tk} - \alpha_1^*} = \frac{m_n^* - \alpha_1^*}{\tilde{m}_n^{i^*} - \alpha_1^*} \cdot \frac{\tilde{m}_n^{i^*} - \alpha_1^*}{\tilde{m}_n^{\hat{i}} - \alpha_1^*} \cdot \frac{\tilde{m}_n^{\hat{i}} - \alpha_1^*}{\hat{m}_{n-tk} - \alpha_1^*} \le (1+\epsilon)\cdot 1\cdot(1+\epsilon)(1 + O(\Delta)) = 1 + O(\Delta)$$
where in the second step we have used claims 2 and 3.
Putting claims 1 and 4 together completes the proof. To remove the assumption that all arms have $\xi_i > 0$, we need to show that A implies that for n sufficiently large, the arms $\hat{i}$ and $i^*$ (the only arms that play a role in the proof) will have shape parameters $> 0$. This follows from the fact that if $\xi_i \le 0$, $m_n^i$ is $O(\ln(n))$, while if $\xi_i \ge \bar{\xi} > 0$, $m_n^i$ is $\Omega(n^{\xi_i})$.

Lemma 4. $m_4 - m_2 \ge \frac{\sigma}{8}$ and $m_2 - m_1 \ge \frac{\sigma}{4}$.

Proof. If $\xi = 0$, then by Proposition 3, $m_4 - m_2 = m_2 - m_1 = \ln(2)\sigma$ and we are done. Otherwise,
$$m_2 - m_1 = \sigma(2^\xi - 1)\xi^{-1}\Gamma(1-\xi) \quad \text{and} \quad m_4 - m_2 = \sigma(4^\xi - 2^\xi)\xi^{-1}\Gamma(1-\xi).$$
It thus suffices to prove the following claim.

Claim 1.
$$\min_{\xi < \frac{1}{2}}\left\{\frac{2^\xi - 1}{\xi}\,\Gamma(1-\xi)\right\} \ge \frac{1}{4}, \quad \text{and} \quad \min_{\xi < \frac{1}{2}}\left\{\frac{4^\xi - 2^\xi}{\xi}\,\Gamma(1-\xi)\right\} \ge \frac{1}{8}.$$

Proof of Claim 1. We state without proof the following properties of the $\Gamma$ function: $\Gamma(1+y) \ge \lfloor y\rfloor!$ for $y \ge 1$, and $\Gamma(z) \ge \frac{1}{2}$ for $z > 0$. Making the change of variable $y = -\xi$, it suffices to show
$$\min_{y > -\frac{1}{2}}\left\{\frac{1 - 2^{-y}}{y}\,\Gamma(1+y)\right\} \ge \frac{1}{4} \quad (5.2)$$
and
$$\min_{y > -\frac{1}{2}}\left\{\frac{2^{-y}(1 - 2^{-y})}{y}\,\Gamma(1+y)\right\} \ge \frac{1}{8}. \quad (5.3)$$
(5.2) holds because for $-\frac{1}{2} < y \le 1$,
$$\frac{1 - 2^{-y}}{y}\,\Gamma(1+y) \ge \frac{1}{2}\,\Gamma(1+y) \ge \frac{1}{4},$$
while for $y > 1$, $1 - 2^{-y} \ge \frac{1}{2}$ and $\Gamma(1+y) \ge \lfloor y\rfloor! \ge \frac{y}{2}$, so the expression is at least $\frac{1}{4}$. Similarly, (5.3) holds because for $-\frac{1}{2} < y \le 1$, each of the factors $2^{-y}$, $\frac{1-2^{-y}}{y}$, and $\Gamma(1+y)$ is at least $\frac{1}{2}$,
while for $y > 1$,
$$\frac{2^{-y}(1 - 2^{-y})}{y}\,\Gamma(1+y) \ge \frac{1}{8}$$
follows from the growth of $\Gamma(1+y)$ relative to $2y(2^y)$.

Lemma 7. Assume G has shape parameter $\xi \ge \bar{\xi}$. Let n be a positive integer and let $\epsilon > 0$ and $\delta \in (0,1)$ be real numbers. Then $O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ draws from G suffice to obtain an estimate $\tilde{m}_n$ of $m_n$ such that
$$P\left[\frac{1}{1+\epsilon} < \frac{\tilde{m}_n - \alpha_1}{m_n - \alpha_1} < 1+\epsilon\right] \ge 1 - \delta$$
where $\alpha_1 = \mu - \frac{\sigma}{\xi}$.

Proof. We use the same estimation procedure as in the proof of Lemma 6. Let $\alpha_1$, $\alpha_2$, $\alpha_3$, $\Delta_\alpha$, and $\Delta_m$ be defined as they were in that proof. The inequality $\frac{1}{1+\epsilon} < \frac{\tilde{m}_n - \alpha_1}{m_n - \alpha_1} < 1+\epsilon$ is the same as $\left|\ln\left(\frac{\tilde{m}_n - \alpha_1}{m_n - \alpha_1}\right)\right| < \ln(1+\epsilon)$. For $\epsilon < \frac{1}{4}$, $\ln(1+\epsilon) \ge \frac{7}{8}\epsilon$, so it suffices to guarantee that
$$|\ln(\tilde{m}_n - \alpha_1) - \ln(m_n - \alpha_1)| < \frac{7}{8}\epsilon.$$

Claim 1. $|\ln(\tilde{m}_n - \alpha_1) - \ln(m_n - \alpha_1)|$ is $O(\ln(n)\Delta_\alpha)$.

Proof of Claim 1. Because $|\ln(\tilde{m}_n - \alpha_1) - \ln(\tilde{m}_n - \bar{\alpha}_1)|$ is $O(\Delta_\alpha)$, it suffices to show that $|\ln(\tilde{m}_n - \bar{\alpha}_1) - \ln(m_n - \alpha_1)|$ is $O(\ln(n)\Delta_\alpha)$. This is true because
$$\ln(\tilde{m}_n - \bar{\alpha}_1) = \ln\left(\bar{\alpha}_2\,\bar{\alpha}_3^{\log_2(n)}\right) = \log_2(n)\ln(\bar{\alpha}_3) + \ln(\bar{\alpha}_2) = \log_2(n)\ln(\alpha_3) + \ln(\alpha_2) \pm O(\ln(n)\Delta_\alpha)$$
$$= \ln\left(\alpha_2\,\alpha_3^{\log_2(n)}\right) \pm O(\ln(n)\Delta_\alpha) = \ln(m_n - \alpha_1) \pm O(\ln(n)\Delta_\alpha).$$

Setting $\Delta_\alpha < \Omega(\ln(n)^{-1}\epsilon)$ then guarantees $|\ln(\tilde{m}_n - \alpha_1) - \ln(m_n - \alpha_1)| < \frac{7}{8}\epsilon$. By Claim 3 of the proof of Lemma 6 (which did not depend on the assumption $\xi < 0$), $\Delta_\alpha$ is $O(\Delta_m)$, so we require $P[\Delta_m < \Omega(\ln(n)^{-1}\epsilon)] \ge 1 - \delta$. Define $\Delta_i = |\tilde{m}_i - m_i|$, so that $\Delta_m = \max_{i\in\{1,2,4\}}\Delta_i$. It suffices that $P[\Delta_i < \Omega(\ln(n)^{-1}\epsilon)] \ge 1 - \frac{\delta}{3}$ for $i \in \{1,2,4\}$. By Lemma 2, ensuring this requires $O\left(\frac{\ln(\frac{1}{\delta})\ln(n)^2}{\epsilon^2}\right)$ draws from G.
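As a closing numeric check, the interpolation identities (4.2) and (4.3) can be exercised with noise-free values of $m_1$, $m_2$, $m_4$ computed from Proposition 3. The helper names are ours, and the sketch deliberately skips the estimation noise that Lemmas 5 and 6 account for:

```python
import math

def m_i_exact(mu, sigma, xi, i):
    """Closed form for m_i = E[max of i GEV draws] (Proposition 3), xi != 0."""
    return mu + (sigma / xi) * (i ** xi * math.gamma(1.0 - xi) - 1.0)

def extrapolate_m_n(m1, m2, m4, n):
    """Recover xi via (4.2) and the alphas of Lemma 6, then evaluate the
    interpolation formula (4.3) at i = n.  Inputs are assumed noise-free
    here; the paper feeds in the median-of-means estimates of Section 4.1.1."""
    xi_hat = math.log2((m4 - m2) / (m2 - m1))
    a3 = (m4 - m2) / (m2 - m1)
    a1 = (m1 * m4 - m2 * m2) / (m1 - 2 * m2 + m4)
    a2 = (m2 - m1) ** 2 / (m1 - 2 * m2 + m4)
    return xi_hat, a1 + a2 * a3 ** math.log2(n)

mu, sigma, xi, n = 0.0, 1.0, -0.5, 100
m1, m2, m4 = (m_i_exact(mu, sigma, xi, i) for i in (1, 2, 4))
xi_hat, mn_hat = extrapolate_m_n(m1, m2, m4, n)
print(xi_hat, mn_hat)
```

With exact inputs the recovery is exact up to floating point: $\frac{m_4 - m_2}{m_2 - m_1} = 2^\xi$, so $\hat{\xi} = \xi$ and the extrapolated $\tilde{m}_n$ matches Proposition 3's value of $m_n$.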
More informationDiscriminative Learning can Succeed where Generative Learning Fails
Discriminative Learning can Succeed where Generative Learning Fails Philip M. Long, a Rocco A. Servedio, b,,1 Hans Ulrich Simon c a Google, Mountain View, CA, USA b Columbia University, New York, New York,
More informationLinear models. Linear models are computationally convenient and remain widely used in. applied econometric research
Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y
More informationBennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence
Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Chao Zhang The Biodesign Institute Arizona State University Tempe, AZ 8587, USA Abstract In this paper, we present
More information6.1 Moment Generating and Characteristic Functions
Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,
More information1 Review of The Learning Setting
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #8 Scribe: Changyan Wang February 28, 208 Review of The Learning Setting Last class, we moved beyond the PAC model: in the PAC model we
More informationLecture 5: January 30
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 5: January 30 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They
More informationHomework 4 Solutions
CS 174: Combinatorics and Discrete Probability Fall 01 Homework 4 Solutions Problem 1. (Exercise 3.4 from MU 5 points) Recall the randomized algorithm discussed in class for finding the median of a set
More informationLectures 2 3 : Wigner s semicircle law
Fall 009 MATH 833 Random Matrices B. Való Lectures 3 : Wigner s semicircle law Notes prepared by: M. Koyama As we set up last wee, let M n = [X ij ] n i,j=1 be a symmetric n n matrix with Random entries
More informationLearning symmetric non-monotone submodular functions
Learning symmetric non-monotone submodular functions Maria-Florina Balcan Georgia Institute of Technology ninamf@cc.gatech.edu Nicholas J. A. Harvey University of British Columbia nickhar@cs.ubc.ca Satoru
More informationThe space complexity of approximating the frequency moments
The space complexity of approximating the frequency moments Felix Biermeier November 24, 2015 1 Overview Introduction Approximations of frequency moments lower bounds 2 Frequency moments Problem Estimate
More information1 Introduction. 2 Measure theoretic definitions
1 Introduction These notes aim to recall some basic definitions needed for dealing with random variables. Sections to 5 follow mostly the presentation given in chapter two of [1]. Measure theoretic definitions
More informationLecture 9. = 1+z + 2! + z3. 1 = 0, it follows that the radius of convergence of (1) is.
The Exponential Function Lecture 9 The exponential function 1 plays a central role in analysis, more so in the case of complex analysis and is going to be our first example using the power series method.
More informationRandom Bernstein-Markov factors
Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit
More informationLecture 4: Two-point Sampling, Coupon Collector s problem
Randomized Algorithms Lecture 4: Two-point Sampling, Coupon Collector s problem Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized Algorithms
More informationBandits : optimality in exponential families
Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities
More informationModule 3. Function of a Random Variable and its distribution
Module 3 Function of a Random Variable and its distribution 1. Function of a Random Variable Let Ω, F, be a probability space and let be random variable defined on Ω, F,. Further let h: R R be a given
More informationLecture 9: March 26, 2014
COMS 6998-3: Sub-Linear Algorithms in Learning and Testing Lecturer: Rocco Servedio Lecture 9: March 26, 204 Spring 204 Scriber: Keith Nichols Overview. Last Time Finished analysis of O ( n ɛ ) -query
More informationExperience-Efficient Learning in Associative Bandit Problems
Alexander L. Strehl strehl@cs.rutgers.edu Chris Mesterharm mesterha@cs.rutgers.edu Michael L. Littman mlittman@cs.rutgers.edu Haym Hirsh hirsh@cs.rutgers.edu Department of Computer Science, Rutgers University,
More informationHoeffding, Chernoff, Bennet, and Bernstein Bounds
Stat 928: Statistical Learning Theory Lecture: 6 Hoeffding, Chernoff, Bennet, Bernstein Bounds Instructor: Sham Kakade 1 Hoeffding s Bound We say X is a sub-gaussian rom variable if it has quadratically
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationTHE first formalization of the multi-armed bandit problem
EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can
More informationProofs for Large Sample Properties of Generalized Method of Moments Estimators
Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my
More information1 A Support Vector Machine without Support Vectors
CS/CNS/EE 53 Advanced Topics in Machine Learning Problem Set 1 Handed out: 15 Jan 010 Due: 01 Feb 010 1 A Support Vector Machine without Support Vectors In this question, you ll be implementing an online
More informationDistinguishing a truncated random permutation from a random function
Distinguishing a truncated random permutation from a random function Shoni Gilboa Shay Gueron July 9 05 Abstract An oracle chooses a function f from the set of n bits strings to itself which is either
More informationThe Multi-Armed Bandit Problem
Università degli Studi di Milano The bandit problem [Robbins, 1952]... K slot machines Rewards X i,1, X i,2,... of machine i are i.i.d. [0, 1]-valued random variables An allocation policy prescribes which
More informationElectronic Companion to: What matters in school choice tie-breaking? How competition guides design
Electronic Companion to: What matters in school choice tie-breaking? How competition guides design This electronic companion provides the proofs for the theoretical results (Sections -5) and stylized simulations
More informationLecture Notes 3 Convergence (Chapter 5)
Lecture Notes 3 Convergence (Chapter 5) 1 Convergence of Random Variables Let X 1, X 2,... be a sequence of random variables and let X be another random variable. Let F n denote the cdf of X n and let
More informationA sequential hypothesis test based on a generalized Azuma inequality 1
A sequential hypothesis test based on a generalized Azuma inequality 1 Daniël Reijsbergen a,2, Werner Scheinhardt b, Pieter-Tjerk de Boer b a Laboratory for Foundations of Computer Science, University
More informationFinite-time Analysis of the Multiarmed Bandit Problem*
Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010
More informationA Result of Vapnik with Applications
A Result of Vapnik with Applications Martin Anthony Department of Statistical and Mathematical Sciences London School of Economics Houghton Street London WC2A 2AE, U.K. John Shawe-Taylor Department of
More informationThe Game of Normal Numbers
The Game of Normal Numbers Ehud Lehrer September 4, 2003 Abstract We introduce a two-player game where at each period one player, say, Player 2, chooses a distribution and the other player, Player 1, a
More informationHow Much Evidence Should One Collect?
How Much Evidence Should One Collect? Remco Heesen October 10, 2013 Abstract This paper focuses on the question how much evidence one should collect before deciding on the truth-value of a proposition.
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationFinancial Econometrics and Volatility Models Extreme Value Theory
Financial Econometrics and Volatility Models Extreme Value Theory Eric Zivot May 3, 2010 1 Lecture Outline Modeling Maxima and Worst Cases The Generalized Extreme Value Distribution Modeling Extremes Over
More informationComplexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Christos Dimitrakakis Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
More informationLARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011
LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic
More information10.1 The Formal Model
67577 Intro. to Machine Learning Fall semester, 2008/9 Lecture 10: The Formal (PAC) Learning Model Lecturer: Amnon Shashua Scribe: Amnon Shashua 1 We have see so far algorithms that explicitly estimate
More informationThe concentration of the chromatic number of random graphs
The concentration of the chromatic number of random graphs Noga Alon Michael Krivelevich Abstract We prove that for every constant δ > 0 the chromatic number of the random graph G(n, p) with p = n 1/2
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More informationCLASSICAL PROBABILITY MODES OF CONVERGENCE AND INEQUALITIES
CLASSICAL PROBABILITY 2008 2. MODES OF CONVERGENCE AND INEQUALITIES JOHN MORIARTY In many interesting and important situations, the object of interest is influenced by many random factors. If we can construct
More informationRigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems
1 Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems Alyson K. Fletcher, Mojtaba Sahraee-Ardakan, Philip Schniter, and Sundeep Rangan Abstract arxiv:1706.06054v1 cs.it
More informationMarch 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.
Florida State University March 1, 2018 Framework 1. (Lizhe) Basic inequalities Chernoff bounding Review for STA 6448 2. (Lizhe) Discrete-time martingales inequalities via martingale approach 3. (Boning)
More informationSOME MEMORYLESS BANDIT POLICIES
J. Appl. Prob. 40, 250 256 (2003) Printed in Israel Applied Probability Trust 2003 SOME MEMORYLESS BANDIT POLICIES EROL A. PEKÖZ, Boston University Abstract We consider a multiarmed bit problem, where
More informationEstimation of AUC from 0 to Infinity in Serial Sacrifice Designs
Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,
More informationStochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions
International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.
More informationLecture 3: Lower Bounds for Bandit Algorithms
CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and
More informationAdvanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12
Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationStatistical Convergence of Kernel CCA
Statistical Convergence of Kernel CCA Kenji Fukumizu Institute of Statistical Mathematics Tokyo 106-8569 Japan fukumizu@ism.ac.jp Francis R. Bach Centre de Morphologie Mathematique Ecole des Mines de Paris,
More informationEstimation of Dynamic Regression Models
University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.
More informationLecture 1: Introduction to Sublinear Algorithms
CSE 522: Sublinear (and Streaming) Algorithms Spring 2014 Lecture 1: Introduction to Sublinear Algorithms March 31, 2014 Lecturer: Paul Beame Scribe: Paul Beame Too much data, too little time, space for
More informationTijmen Daniëls Universiteit van Amsterdam. Abstract
Pure strategy dominance with quasiconcave utility functions Tijmen Daniëls Universiteit van Amsterdam Abstract By a result of Pearce (1984), in a finite strategic form game, the set of a player's serially
More informationOrdinal Optimization and Multi Armed Bandit Techniques
Ordinal Optimization and Multi Armed Bandit Techniques Sandeep Juneja. with Peter Glynn September 10, 2014 The ordinal optimization problem Determining the best of d alternative designs for a system, on
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse
More informationRobustness and duality of maximum entropy and exponential family distributions
Chapter 7 Robustness and duality of maximum entropy and exponential family distributions In this lecture, we continue our study of exponential families, but now we investigate their properties in somewhat
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationMultivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma
Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma Suppose again we have n sample points x,..., x n R p. The data-point x i R p can be thought of as the i-th row X i of an n p-dimensional
More informationLossless Online Bayesian Bagging
Lossless Online Bayesian Bagging Herbert K. H. Lee ISDS Duke University Box 90251 Durham, NC 27708 herbie@isds.duke.edu Merlise A. Clyde ISDS Duke University Box 90251 Durham, NC 27708 clyde@isds.duke.edu
More informationCS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm
CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the
More informationIntroduction to Statistical Learning Theory
Introduction to Statistical Learning Theory Definition Reminder: We are given m samples {(x i, y i )} m i=1 Dm and a hypothesis space H and we wish to return h H minimizing L D (h) = E[l(h(x), y)]. Problem
More informationThe distribution inherited by Y is called the Cauchy distribution. Using that. d dy ln(1 + y2 ) = 1 arctan(y)
Stochastic Processes - MM3 - Solutions MM3 - Review Exercise Let X N (0, ), i.e. X is a standard Gaussian/normal random variable, and denote by f X the pdf of X. Consider also a continuous random variable
More informationLecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1
Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).
More informationSampling Contingency Tables
Sampling Contingency Tables Martin Dyer Ravi Kannan John Mount February 3, 995 Introduction Given positive integers and, let be the set of arrays with nonnegative integer entries and row sums respectively
More informationCase study: stochastic simulation via Rademacher bootstrap
Case study: stochastic simulation via Rademacher bootstrap Maxim Raginsky December 4, 2013 In this lecture, we will look at an application of statistical learning theory to the problem of efficient stochastic
More informationAnswers to Problem Set Number MIT (Fall 2005).
Answers to Problem Set Number 5. 18.305 MIT Fall 2005). D. Margetis and R. Rosales MIT, Math. Dept., Cambridge, MA 02139). November 23, 2005. Course TA: Nikos Savva, MIT, Dept. of Mathematics, Cambridge,
More informationLecture 8 Inequality Testing and Moment Inequality Models
Lecture 8 Inequality Testing and Moment Inequality Models Inequality Testing In the previous lecture, we discussed how to test the nonlinear hypothesis H 0 : h(θ 0 ) 0 when the sample information comes
More informationGame Theory, On-line prediction and Boosting (Freund, Schapire)
Game heory, On-line prediction and Boosting (Freund, Schapire) Idan Attias /4/208 INRODUCION he purpose of this paper is to bring out the close connection between game theory, on-line prediction and boosting,
More informationCSCI-6971 Lecture Notes: Monte Carlo integration
CSCI-6971 Lecture otes: Monte Carlo integration Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu February 21, 2006 1 Overview Consider the following
More informationLecture 8: The Field B dr
Lecture 8: The Field B dr October 29, 2018 Throughout this lecture, we fix a perfectoid field C of characteristic p, with valuation ring O C. Fix an element π C with 0 < π C < 1, and let B denote the completion
More informationEssentials on the Analysis of Randomized Algorithms
Essentials on the Analysis of Randomized Algorithms Dimitris Diochnos Feb 0, 2009 Abstract These notes were written with Monte Carlo algorithms primarily in mind. Topics covered are basic (discrete) random
More informationQuantum query complexity of entropy estimation
Quantum query complexity of entropy estimation Xiaodi Wu QuICS, University of Maryland MSR Redmond, July 19th, 2017 J o i n t C e n t e r f o r Quantum Information and Computer Science Outline Motivation
More informationEcon 508B: Lecture 5
Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline
More informationFORMULATION OF THE LEARNING PROBLEM
FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationLectures 2 3 : Wigner s semicircle law
Fall 009 MATH 833 Random Matrices B. Való Lectures 3 : Wigner s semicircle law Notes prepared by: M. Koyama As we set up last wee, let M n = [X ij ] n i,j= be a symmetric n n matrix with Random entries
More informationConvergence Rates on Root Finding
Convergence Rates on Root Finding Com S 477/577 Oct 5, 004 A sequence x i R converges to ξ if for each ǫ > 0, there exists an integer Nǫ) such that x l ξ > ǫ for all l Nǫ). The Cauchy convergence criterion
More informationLecture 18: March 15
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 18: March 15 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may
More information