arxiv: v1 [cs.lg] 14 Nov 2018 Abstract


Sample complexity of partition identification using multi-armed bandits

Sandeep Juneja, Subhashini Krishnasamy
TIFR, Mumbai
November 15, 2018

Abstract

Given a vector of probability distributions, or arms, each of which can be sampled independently, we consider the problem of identifying the partition to which this vector belongs from a finitely partitioned universe of such vectors of distributions. We study this as a pure exploration problem in multi-armed bandit settings and develop sample complexity bounds on the total mean number of samples required for identifying the correct partition with high probability. This framework subsumes well-studied problems in the literature such as finding the best arm or the best few arms. We consider distributions belonging to the single parameter exponential family and primarily consider partitions where the vector of means of arms lies either in a given set or its complement. The sets considered correspond to distributions where there exists a mean above a specified threshold, where the set is a half space, and where either the set or its complement is convex. When the set is convex we restrict our analysis to its complement being a union of half spaces. In all these settings, we characterize the lower bounds on the mean number of samples for each arm. Further, inspired by the lower bounds, and building upon Garivier and Kaufmann (2016), we propose algorithms that can match these bounds asymptotically with decreasing probability of error. Applications of this framework may be diverse. We briefly discuss a few associated with finance.

1 Introduction

Suppose that $\Omega$ denotes a collection of vectors $\nu = (\nu_1, \ldots, \nu_K)$ where each $\nu_i$ is a probability distribution on $\mathbb{R}$. Further, $\Omega = \cup_{i=1}^m A_i$ where the component sets $A_i$ are disjoint, and thus partition $\Omega$. In this set-up, given $\mu = (\mu_1, \ldots, \mu_K) \in \Omega$, we consider the problem of identifying the correct component $A_i$ that contains $\mu$.
The distributions $(\mu_i : i \le K)$ are not known to us; however, it is possible to generate independent samples from each $\mu_i$. We call this the partition identification or PI problem. In the multi-armed bandit literature, for any $\nu \in \Omega$, generating a sample from distribution $\nu_i$ is referred to as sampling from, or pulling, arm $i$. We also follow this convention. We consider algorithms that sequentially and adaptively generate samples from each arm in $\mu$ and then, after generating finitely many samples, stop to announce a component of $\Omega$ that is inferred to contain $\mu$. We study the so-called $\delta$-PAC algorithms in the PI framework. As is well known, PAC stands for probably approximately correct.

Definition 1. For any $\delta \in (0, 1)$, an algorithm is said to be $\delta$-PAC for the PI problem $\Omega = \cup_{i=1}^m A_i$ if, for every $\mu \in \Omega$, it restricts the probability of announcing an incorrect component to at most $\delta$.

More generally, in similar sequential decision making problems, algorithms are said to provide $\delta$-PAC guarantees if the probability of an incorrect decision is bounded from above by $\delta$ for each $\delta \in (0, 1)$. The PI framework is quite general and captures popular pure exploration problems studied in the multi-armed bandit literature. For instance, finding the best arm, that is, the arm with the highest mean, with the above $\delta$-PAC type guarantees, is well studied in the literature and fits the PI framework (see, e.g., in learning theory - Garivier and Kaufmann (2016), Kaufmann et al. (2016), Russo (2016), Jamieson et al. (2014), Bubeck et al. (2011), Audibert and Bubeck (2010), Even-Dar et al. (2006), Mannor and Tsitsiklis (2004); in earlier statistics literature - Jennison et al. (1982), Bechhofer et al. (1968), Paulson et al. (1964), Chernoff (1959); in simulation theory literature - Glynn and Juneja (2004), Kim and Nelson (2001), Chen et al. (2000), Dai (1996), Ho et al. (1992)). More generally, identifying the $m$ arms (for some $m < K$) with the largest $m$ means amongst $K$ distributions is also a PI problem (see, e.g., Kaufmann and Kalyanakrishnan (2013), Kalyanakrishnan et al. (2012)).

The advantage of the PI framework is that it provides a unified approach to tackle a large class of problems, both in developing lower bounds on the sample complexity (expected total number of arm pulls) to achieve $\delta$-PAC guarantees (we call this the lower bound problem), as well as in arriving at algorithms that match the developed lower bounds, under certain distributional restrictions. Analysis of the lower bound problem relies on a fundamental inequality developed by Garivier and Kaufmann (2016). Their work in turn is built upon earlier analysis that goes back at least to Lai and Robbins (1985) (also see Mannor and Tsitsiklis (2004), Burnetas and Katehakis (1996)).
This inequality allows us to formulate the lower bound problem as an optimization problem - a linear program with infinitely many constraints; as well as an equivalent min-max formulation. To further analyze this optimization problem, some distributional restrictions are needed (see Glynn and Juneja (2018) for the necessity of distributional restrictions). As is customary in the learning theory literature (see, e.g., Cappé et al. (2013), Garivier and Kaufmann (2016), Kaufmann et al. (2016)), we assume that each arm distribution belongs to a single parameter exponential family. Examples include Binomial, Poisson, Gaussian with known variance, etc. See Cappé et al. (2013) for an elaborate discussion on such distributions. Any member of a single parameter exponential family can be represented by its corresponding parameter (say its mean), which allows us to consider the partition problem in the parameter space (i.e., $\Omega \subset \mathbb{R}^K$) instead of the distribution space.

In our analysis, we solve the lower bound problem in the following settings that fit the general framework:

Given a $\mu \in \Omega$ and a threshold $u$, determining if $\max_{i \le K} \mu_i > u$. We refer to this as the threshold crossing problem. We briefly discuss how this problem arises naturally in financial applications involving nested simulations.

Ascertaining if $\mu \in \Omega$ lies in the half space
\[ \Big\{ \nu \in \mathbb{R}^K : \sum_{i=1}^K a_i \nu_i > b \Big\} \]
for $(a_1, \ldots, a_K, b) \in \mathbb{R}^{K+1}$. This is a generic problem with potentially many applications. One application is in the capital budgeting problem faced by a financial manager, where the manager needs to ascertain if the expected profitability of a given set of projects exceeds an acceptable threshold. While the expected cash flow from a project may not be known in closed form, independent, identically distributed samples of cash flow from each project can be generated via simulation.

More generally, ascertaining if $\mu \in \Omega$ lies in a convex set or in the complement of a convex set. When considering the complement of a convex set, we restrict our analysis to a simpler setting where this set is a union of half spaces. Here, under further simplifying assumptions, we highlight some of the key features of the solution.

Garivier and Kaufmann (2016) solve an equivalent optimization problem in the best arm setting. They further use the solution to arrive at an adaptive $\delta$-PAC algorithm whose sample complexity asymptotically matches the lower bound (as $\delta \to 0$). We note that their algorithm can also be adapted to the problems we consider to again arrive at an adaptive $\delta$-PAC algorithm whose sample complexity asymptotically matches the corresponding lower bound.

The rest of the paper is organized as follows: In Section 2, we state the lower bound inequality from Kaufmann et al. (2016) and state the resultant lower bound problem in our framework as an optimization problem. We also spell out preliminaries such as the single parameter exponential family distributions and related assumptions in this section. In Section 3, we characterize the solution to the lower bound problem for various special cases of partition of $\Omega$ into sets $A$ and $A^c$ (the complement of set $A$). For the threshold crossing problem (Section 3.1), we give a closed form expression for the solution to the lower bound problem and discuss its applications to a specific problem in finance. For the half-space problem (Section 3.2), we give a simple characterization of the solution that is useful in designing the sampling rule in our $\delta$-PAC algorithm. Similarly, for the problem where $\Omega$ is partitioned into a convex set and its complement, we derive some useful properties of the solution to the lower bound problem (Sections 3.3 and 3.4). In Section 4, we propose a $\delta$-PAC algorithm that in substantial generality achieves the derived lower bounds asymptotically as $\delta$ decreases to zero.
2 Preliminaries and basic optimization problem

Recall that $\Omega$ denotes a collection of vectors $\nu = (\nu_1, \ldots, \nu_K)$ where each $\nu_i$ is a probability distribution on $\mathbb{R}$. Further, $\Omega = \cup_{i=1}^m A_i$ where the $A_i$ are disjoint, and thus partition $\Omega$. Let
\[ KL(\mu_i \| \nu_i) = \int \log\Big( \frac{d\mu_i}{d\nu_i}(x) \Big) \, d\mu_i(x) \]
denote the Kullback-Leibler divergence between distributions $\mu_i$ and $\nu_i$. We further assume that for each $\nu, \tilde{\nu} \in \Omega$, the components $\nu_i$ and $\tilde{\nu}_i$ for each $i$ are mutually absolutely continuous and the expectation $KL(\nu_i \| \tilde{\nu}_i)$ exists (it may be infinite). For $p, q \in (0, 1)$, let
\[ d(p, q) := p \log\Big( \frac{p}{q} \Big) + (1 - p) \log\Big( \frac{1 - p}{1 - q} \Big), \]
that is, $d(p, q)$ denotes the KL-divergence between Bernoulli distributions with mean $p$ and $q$, respectively. For any set $B$, let $B^c$ denote its complement, $B^o$ its interior, $\bar{B}$ its closure and $\partial B$ its boundary.

Under a $\delta$-PAC algorithm, and for $\mu \in A_j$, it follows from Kaufmann et al. (2016) that
\[ \sum_{i=1}^K E_\mu N_i \, KL(\mu_i \| \nu_i) \ge d(\delta, 1 - \delta) \ge \log\Big( \frac{1}{2.4\delta} \Big) \tag{1} \]
for any $\nu \in A_j^c$, where $N_i$ denotes the number of times arm $i$ is pulled by the algorithm. The assumption that $KL(\nu_i \| \tilde{\nu}_i)$ exists allows the use of Wald's Lemma in the proof of (1) in Kaufmann et al. (2016). It is easy to see that $d(\delta, 1 - \delta) / \log(\delta^{-1}) \to 1$ as $\delta \to 0$.

Taking $t_i = E_\mu N_i / \log\big( \frac{1}{2.4\delta} \big)$, our lower bound problem can be modelled as the following convex programming problem, when $\mu \in A_j$:
\[ \min_{t = (t_1, \ldots, t_K)} \sum_{i=1}^K t_i \tag{2} \]
\[ \text{s.t.} \quad \inf_{\nu \in A_j^c} \sum_{i=1}^K t_i \, KL(\mu_i \| \nu_i) \ge 1, \tag{3} \]
\[ t_i \ge 0 \quad \forall i. \]
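The two facts about $d(\delta, 1-\delta)$ used above can be checked numerically. The following is a minimal sketch; the function name `bernoulli_kl` and the particular values of $\delta$ are illustrative choices, not part of the paper.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """d(p, q): KL divergence between Bernoulli(p) and Bernoulli(q), p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


# Compare d(delta, 1 - delta) with log(1 / (2.4 delta)) and with log(1 / delta).
checks = []
for delta in (0.1, 0.01, 1e-4, 1e-8):
    d_val = bernoulli_kl(delta, 1 - delta)
    bound = math.log(1 / (2.4 * delta))
    ratio = d_val / math.log(1 / delta)
    checks.append((delta, d_val, bound, ratio))
    print(f"delta={delta:g}  d={d_val:.4f}  bound={bound:.4f}  ratio={ratio:.6f}")
```

The ratio in the last column approaches 1 as $\delta$ shrinks, which is the sense in which $\log(1/(2.4\delta))$ captures the leading-order behaviour of the lower bound.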

Observe that, by making the change of variables
\[ \frac{t_i}{\sum_j t_j} = w_i \quad \text{and} \quad \frac{1}{\sum_j t_j} = \Lambda, \]
and letting
\[ \mathcal{P}_K \triangleq \Big\{ w \in \mathbb{R}^K : w_i \ge 0 \ \forall i, \ \sum_{i=1}^K w_i = 1 \Big\} \]
denote the $K$-dimensional probability simplex, our optimization problem may be equivalently stated as
\[ \max_{w \in \mathcal{P}_K, \, \Lambda} \Lambda \quad \text{s.t.} \quad \inf_{\nu \in A_j^c} \sum_{i=1}^K w_i \, KL(\mu_i \| \nu_i) \ge \Lambda. \]
The above problem is in turn equivalent to
\[ \max_{w \in \mathcal{P}_K} \inf_{\nu \in A_j^c} \sum_{i=1}^K w_i \, KL(\mu_i \| \nu_i). \tag{Problem LB} \]
Let $C^*(\mu)$ be the optimal value of the above problem. The lower bound on the total expected number of samples is then given by $\log\big( \frac{1}{2.4\delta} \big) T^*(\mu)$ where $T^*(\mu) = 1/C^*(\mu)$.

Remark 1. As mentioned earlier, the optimization problem in (2) and (3) is equivalent to Problem LB. One advantage of the formulation in (2) and (3) is that it can be viewed as a linear program with infinitely many constraints, or a semi-infinite linear program (see, e.g., López and Still (2007)). Linear programming duality then provides a great deal of insight into the solution structure for the problems that we consider. However, we instead present our analysis on Problem LB, since Sion's minimax theorem can be applied to it to directly arrive at the solution.

Remark 2. For any $w \in \mathcal{P}_K$, the sub-problem in LB, $\inf_{\nu \in A_j^c} \sum_{i=1}^K w_i \, KL(\mu_i \| \nu_i)$, has an elegant geometrical interpretation. For $c > 0$, consider the sublevel set
\[ S(\mu, w, c) \triangleq \Big\{ \nu : \sum_{i=1}^K w_i \, KL(\mu_i \| \nu_i) \le c \Big\}. \]
Then, for element-wise strictly positive $w$, $S(\mu, w, 0) = \{\mu\}$. The set $S(\mu, w, c)$ for some $c > 0$ intersects with $A_j^c$. Further, the set shrinks as $c$ reduces. We are looking for the smallest $c$ for which $S(\mu, w, c)$ has a non-empty intersection with $\overline{A_j^c}$. Equivalently, we are looking for the first $c > 0$ for which the set grows beyond the interior of $A_j$ and intersects with $\overline{A_j^c}$. Thus,
\[ \inf_{\nu \in \overline{A_j^c}} \sum_{i=1}^K w_i \, KL(\mu_i \| \nu_i) = \inf\big\{ c : S(\mu, w, c) \cap \overline{A_j^c} \ne \emptyset \big\}. \]

Garivier and Kaufmann (2016) considered Problem LB in the best arm setting. In Section 3, we discuss how the lower bound problem simplifies in other specific settings. In Section 4, we analyze an asymptotically optimal $\delta$-PAC algorithm in the general PI setting.
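To make the max-min structure of Problem LB concrete, here is a brute-force sketch under strong simplifying assumptions that are not made in the paper: there are only two arms, the set of alternatives is replaced by a finite list, and the arms are Gaussian with variance 1/2 so that the divergence is a squared difference. The names `problem_lb_two_arms` and `gaussian_kl` are hypothetical.

```python
def gaussian_kl(m: float, n: float) -> float:
    """KL between Gaussians with means m, n and common variance 1/2 (assumption)."""
    return (m - n) ** 2


def problem_lb_two_arms(mu, alternatives, kl, grid=1000):
    """Grid search over w = (w1, 1 - w1) for
    max_w min_{nu in alternatives} sum_i w_i * kl(mu_i, nu_i)."""
    best_w, best_val = None, float("-inf")
    for k in range(grid + 1):
        w = (k / grid, 1 - k / grid)
        # Inner problem: the alternative that is hardest to tell apart under w.
        val = min(
            sum(wi * kl(m, n) for wi, m, n in zip(w, mu, nu))
            for nu in alternatives
        )
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val


# Two symmetric alternatives around mu = (0, 0).
w_star, c_star = problem_lb_two_arms(
    (0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], gaussian_kl
)
t_star = 1.0 / c_star  # T*(mu) = 1 / C*(mu)
```

In this toy instance each alternative differs from $\mu$ in a single coordinate, so the inner minimum is $\min(w_1, w_2)$ and the sampling effort is split evenly between the arms.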
2.1 Single Parameter Exponential Families

In the remaining paper, we consider a single parameter exponential family (SPEF) of distributions for each arm. For each $1 \le i \le K$, let $\rho_i$ denote a reference measure on the real line, and let
\[ \Lambda_i(\eta) \triangleq \log\Big( \int_{x \in \mathbb{R}} \exp(\eta x) \, d\rho_i(x) \Big). \]

$\Lambda_i$ is referred to as a cumulant or the log-partition function. Further, set $D_i \triangleq \{\eta : \Lambda_i(\eta) < \infty\}$. An SPEF distribution for arm $i$ and $\eta \in D_i$, $p_{i,\eta}$, has the form
\[ dp_{i,\eta}(x) = \exp(\eta x - \Lambda_i(\eta)) \, d\rho_i(x). \]
Note that $\Lambda_i$ is $C^\infty$ in $D_i^o$ (see, e.g., Dembo and Zeitouni (2011)). Further, $\Lambda_i(\eta)$ is a convex function of $\eta \in D_i^o$, and if the underlying distribution is non-degenerate, then it is strictly convex. Let $\Lambda_i^*$ denote the Legendre-Fenchel transform of $\Lambda_i$, that is,
\[ \Lambda_i^*(\theta) = \sup_{\eta \in D_i} \big( \eta\theta - \Lambda_i(\eta) \big). \]
Further, let $\mu_i$ denote the mean under $p_{i,\eta_i}$. Then, $\mu_i = \Lambda_i'(\eta_i)$ for $\eta_i \in D_i^o$. In particular, $\mu_i$ is a strictly increasing function of $\eta_i$, and there is a one-to-one mapping between the two. Below we suppress the notational dependence of $\mu_i$ on $\eta_i$ and vice-versa.

Let $U_i \triangleq \{\Lambda_i'(\eta_i), \eta_i \in D_i^o\}$. Since $\Lambda_i'(\eta_i)$ is strictly increasing for $\eta_i \in D_i^o$, $U_i$ is an open interval, and sans the boundary cases, denotes the values of means attainable for arm $i$. For $\eta_i \in D_i^o$, the following are well known and easily checked:
\[ \eta_i = \Lambda_i^{*\prime}(\mu_i), \qquad \Lambda_i^*(\mu_i) + \Lambda_i(\eta_i) = \mu_i \eta_i. \tag{4} \]
For $\eta_i, \beta_i \in D_i^o$, it is easily seen that
\[ KL(p_{i,\eta_i} \| p_{i,\beta_i}) = \Lambda_i(\beta_i) - \Lambda_i(\eta_i) - \mu_i (\beta_i - \eta_i), \]
where again $\mu_i = \Lambda_i'(\eta_i)$. We denote the above by $K_i(\mu_i \| \nu_i)$ with $\nu_i = \Lambda_i'(\beta_i)$, emphasizing that when the two distributions are from the same SPEF, the Kullback-Leibler divergence only depends on the mean values of the distributions. Using (4), we have
\[ K_i(\mu_i \| \nu_i) = \Lambda_i^*(\mu_i) - \Lambda_i^*(\nu_i) - \beta_i (\mu_i - \nu_i), \tag{5} \]
where $\beta_i = \Lambda_i^{*\prime}(\nu_i)$. Again, it can be shown that $\Lambda_i^*$ is $C^\infty$ in $U_i$ (see Dembo and Zeitouni (2011)), and it is strictly convex if $\Lambda_i$ is strictly convex. Thus, $K_i$ is $C^\infty$ in $U_i$ with respect to each of its arguments.

In the remaining paper, Problem LB refers to
\[ \max_{w \in \mathcal{P}_K} \inf_{\nu \in A_j^c} \sum_{i=1}^K w_i \, KL(\mu_i \| \nu_i), \tag{6} \]
with $\mu, \nu$ taking values in $\mathbb{R}^K$ and $A_j^c$ a subset of $\mathbb{R}^K$.
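The cumulant-based expression for the KL divergence can be sanity-checked against familiar closed forms. The sketch below does this for two standard families; the choice of reference measures (giving $\Lambda(\eta) = e^\eta$ for Poisson and $\Lambda(\eta) = \log(1 + e^\eta)$ for Bernoulli), the helper name `kl_from_cumulant` and the numerical values are assumptions made for illustration.

```python
import math


def kl_from_cumulant(cumulant, mean_to_eta, mu, nu):
    """KL(p_eta || p_beta) = Lambda(beta) - Lambda(eta) - mu * (beta - eta),
    where eta and beta are the natural parameters with means mu and nu."""
    eta, beta = mean_to_eta(mu), mean_to_eta(nu)
    return cumulant(beta) - cumulant(eta) - mu * (beta - eta)


# Poisson: Lambda(eta) = exp(eta), so the mean is exp(eta) and eta = log(mean).
poisson_kl = kl_from_cumulant(math.exp, math.log, 3.0, 5.0)
poisson_closed = 3.0 * math.log(3.0 / 5.0) + 5.0 - 3.0

# Bernoulli: Lambda(eta) = log(1 + exp(eta)), so eta = log(mean / (1 - mean)).
bernoulli_kl_value = kl_from_cumulant(
    lambda e: math.log1p(math.exp(e)),
    lambda m: math.log(m / (1 - m)),
    0.3,
    0.6,
)
bernoulli_closed = 0.3 * math.log(0.3 / 0.6) + 0.7 * math.log(0.7 / 0.4)
```

In both cases the divergence depends on the distributions only through their means, which is what justifies the notation $K_i(\mu_i \| \nu_i)$.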

2.1.1 Conditions on KL-Divergence

Since $\Lambda_i^*$ is a convex function, we have that $K_i$ is convex in its first argument. Since $K_i(\mu_i \| \nu_i)$ decreases with $\nu_i$ for $\nu_i \le \mu_i$, and increases with $\nu_i$ for $\nu_i \ge \mu_i$, it is a quasi-convex function of $\nu_i$.

Remark 3. For many known SPEFs, for instance, Bernoulli, Poisson and Gaussian with known variance, the KL-divergence is also strictly convex in the second argument. But there are also SPEFs for which it is not convex in the second argument, e.g., Rayleigh, centered Laplacian and negative Binomial (with the number of failures fixed). Problem LB is easier to analyze when the sublevel sets of $\sum_{i=1}^K w_i \, KL(\mu_i \| \nu_i)$ are convex, i.e., when $\sum_{i=1}^K w_i \, KL(\mu_i \| \cdot)$ is a quasi-convex function. But it is known that a sum of quasi-convex functions need not be quasi-convex. Further, it is also known (see Yaari (1977)) that if $u$ is a quasi-convex real function defined on $\mathbb{R}^n$ and is of the form
\[ u(x_1, x_2, \ldots, x_n) = u_1(x_1) + u_2(x_2) + \cdots + u_n(x_n), \]
where $u_i$, $1 \le i \le n$, are real continuous functions whose domains are intervals on the real line, then at least $n - 1$ of the functions $u_i$, $1 \le i \le n$, are convex. We therefore restrict ourselves only to those SPEFs whose KL-divergence is strictly convex in its second argument.

Assumption 1. For each $i$, $D_i^o$ is non-empty and $\Lambda_i(\eta_i)$ is strictly convex for $\eta_i \in D_i^o$. Further, for any $\mu_i \in U_i$, $K_i(\mu_i \| \nu_i)$ is a strictly convex function of $\nu_i \in U_i$.

Under this assumption, the function $\sum_{i=1}^K w_i \, KL(\mu_i \| \cdot)$ satisfies strict convexity. In addition, the following assumption considerably simplifies our analysis.

Assumption 2. For any $\mu_i \in U_i$, $K_i(\mu_i \| \nu_i) \to \infty$ as $\nu_i \to \partial U_i$ with $\nu_i$ taking values in $U_i$.

3 Lower bounds for some PI problems

3.1 Threshold crossing problem

Let $U = \times_{i=1}^K U_i$. Consider $\Omega = A_1 \cup A_2$ where
\[ A_1 = \{\mu \in U : \max_{i \le K} \mu_i > u\} \quad \text{and} \quad A_2 = \{\mu \in U : \max_{i \le K} \mu_i < u\}. \]
In this section, we consider the associated lower bound problem for $\mu \in \Omega$.
We first discuss how the threshold crossing problem arises naturally in nested simulation used in financial portfolio risk measurement.

Example 1. Consider the problem of measuring tail risk in a portfolio comprising financial derivatives. The key property of a financial derivative is that, as a function of underlying stock prices or other financial instruments, its value is a conditional expectation (see, e.g., Duffie (2010), Shreve (2004)). Thus, the value of a portfolio of financial securities that contains financial derivatives can also be expressed as a conditional expectation given the value of the underlying financial instruments. Suppose that $(X_1, \ldots, X_K)$, where each $X_t$ is a vector in a Euclidean space, denote the macroeconomic variables and financial instruments at time $t$, such as prevailing interest rates, stock index value and stock prices, on which the value of a portfolio depends. For notational convenience we have assumed that times take integer values.

Portfolio loss amount at any time $t$ is a function of $\mathbf{X}_t \triangleq (X_1, \ldots, X_t)$ and is given by $E(Y_t \mid \mathbf{X}_t)$ for some random variable $Y_t$ (see, e.g., Gordy and Juneja (2010), Broadie et al. (2011) for further discussion on portfolio loss as a conditional expectation, and the need for nested simulation). The quantity $E(Y_t \mid \mathbf{X}_t)$ is not known; however, conditional on $\mathbf{X}_t$, independent samples of $Y_t$ can be generated via simulation. Our interest is in estimating the probability that the portfolio loss by time $K$ exceeds a large threshold $u$, or
\[ \gamma \triangleq P\Big( \max_{1 \le t \le K} Z_t \ge u \Big), \tag{7} \]
where $Z_t = E(Y_t \mid \mathbf{X}_t)$. These probabilities typically do not have a closed form expression and are estimated using Monte Carlo simulation. An algorithm to estimate this probability may be nested and is given as follows:

1. Repeat the outer loop iterations for $1 \le j \le n$.
2. At outer loop iteration $j$, generate through Monte Carlo a sample of underlying factors $(X_{1,j}, \ldots, X_{K,j})$.
3. Given this sample, we need to ascertain whether $\max_{1 \le t \le K} Z_{t,j} \ge u$, where $Z_{t,j} = E(Y_t \mid \mathbf{X}_{t,j})$; let $W_j$ denote the indicator of this event. This fits our framework of the threshold crossing problem, where we may sequentially generate conditionally independent samples of $Y_t$ for each $t$ conditional on $(X_{1,j}, \ldots, X_{t,j})$ and arrive at an indicator $\hat{W}_j$ that equals $W_j$ with probability at least $1 - \delta$.

Then,
\[ \hat{\gamma}_n \triangleq \frac{1}{n} \sum_{j=1}^n \hat{W}_j \]
denotes our estimator for $\gamma$. There are interesting technical issues related to optimally distributing the computational budget in deciding the number of samples in the outer loop, in the inner loop and the value of $\delta$ to be selected. These issues, however, are not addressed in the paper and may be a topic for future research.

Theorem 1 below points to an interesting asymmetry that arises in the lower bound problem associated with threshold crossing as a function of $\mu \in \Omega$.

Theorem 1. Suppose that $(u, \ldots, u) \in U$. Consider $\mu \in A_1$ such that, w.l.o.g., for some $i \ge 1$, $\mu_j > u$ for $j = 1, \ldots, i$, $\mu_j < u$ for $i + 1 \le j \le K$, and $K_1(\mu_1 \| u) > K_j(\mu_j \| u)$ for $j = 2, \ldots, i$.
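Then, Problem LB has a unique solution given by
\[ w_1^* = 1, \quad \text{and} \quad w_j^* = 0 \ \text{for} \ j = 2, \ldots, K. \tag{8} \]
The lower bound on the expected total number of samples generated equals
\[ \frac{1}{K_1(\mu_1 \| u)} \log\Big( \frac{1}{2.4\delta} \Big). \]
When $\mu \in A_2$, Problem LB has a unique solution given by
\[ w_j^* \propto 1 / K_j(\mu_j \| u), \quad 1 \le j \le K, \tag{9} \]
and the lower bound on the expected total number of samples generated equals
\[ \sum_{j=1}^K \frac{1}{K_j(\mu_j \| u)} \log\Big( \frac{1}{2.4\delta} \Big). \]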
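The asymmetry in Theorem 1 is easy to see numerically. The sketch below evaluates (8) and (9) for a user-supplied divergence; the function name `threshold_crossing_lower_bound`, the Gaussian divergence with variance 1/2 and the example means are illustrative assumptions. It also assumes no mean equals the threshold, and ties for the largest divergence (excluded by the theorem) are broken arbitrarily.

```python
def threshold_crossing_lower_bound(mu, u, kl):
    """Return (w_star, T_star) for the threshold crossing problem, where the
    lower bound on expected samples is T_star * log(1 / (2.4 * delta))."""
    divergences = [kl(m, u) for m in mu]  # K_j(mu_j || u) for every arm
    above = [j for j, m in enumerate(mu) if m > u]
    if above:
        # mu in A_1: all effort goes to the arm above u that is easiest to certify.
        j_star = max(above, key=lambda j: divergences[j])
        w_star = [1.0 if j == j_star else 0.0 for j in range(len(mu))]
        return w_star, 1.0 / divergences[j_star]
    # mu in A_2: every arm must be shown to lie below u.
    inverse = [1.0 / k for k in divergences]
    total = sum(inverse)
    return [x / total for x in inverse], total


def gaussian_kl(m, n):
    """Gaussian arms with common variance 1/2 (assumption): KL = (m - n)^2."""
    return (m - n) ** 2


w_a1, t_a1 = threshold_crossing_lower_bound((2.0, 1.5, 0.0), 1.0, gaussian_kl)
w_a2, t_a2 = threshold_crossing_lower_bound((0.0, 0.5), 1.0, gaussian_kl)
```

When some mean exceeds $u$, a single arm carries the whole sampling budget, whereas when all means lie below $u$ the budget is spread over every arm in inverse proportion to its divergence from $u$.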

Proof: To see (8), first observe that due to continuity of each $K_j(\mu_j \| \nu_j)$ as a function of $\nu_j \in U_j$, we have
\[ \inf_{\nu \in A_2} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j) = \inf_{\nu \in \bar{A}_2} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j), \]
where recall that for any set $A$, $\bar{A}$ denotes its closure. The RHS above is solved by
\[ \nu^* = (u, \ldots, u, \mu_{i+1}, \ldots, \mu_K) \]
in the sense that for any other $\nu \in \bar{A}_2$,
\[ \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j) \ge \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j^*) = \sum_{j=1}^i w_j K_j(\mu_j \| u). \]
Our lower bound problem reduces to
\[ \max_{w \in \mathcal{P}_K} \sum_{j=1}^i w_j K_j(\mu_j \| u). \]
This can easily be seen to be solved uniquely by $w_1^* = 1$, $w_j^* = 0$ for $j = 2, \ldots, K$, and the optimal value $C^*$ is $K_1(\mu_1 \| u)$. The lower bound on the overall expected number of samples generated is then given by $\log\big( \frac{1}{2.4\delta} \big) / C^*$.

To see (9), observe that to simplify $\inf_{\nu \in \bar{A}_1} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j)$, it suffices to consider $\nu(s) \in \bar{A}_1$ for each $s$ ($1 \le s \le K$), where
\[ \nu(s) \triangleq (\mu_1, \ldots, \mu_{s-1}, u, \mu_{s+1}, \ldots, \mu_K), \]
in the sense that for any $\nu \in \bar{A}_1$,
\[ \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j) \ge \min_{s = 1, \ldots, K} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j(s)) = \min_{s = 1, \ldots, K} w_s K_s(\mu_s \| u). \]
The lower bound problem then reduces to
\[ \max_{w \in \mathcal{P}_K} \min_j w_j K_j(\mu_j \| u). \]
The solution to this problem is given by
\[ w_j^* \propto 1 / K_j(\mu_j \| u) \quad \forall j, \]
and the optimal value $C^*$ is $\big( \sum_{j=1}^K \frac{1}{K_j(\mu_j \| u)} \big)^{-1}$. The lower bound on the overall expected number of samples generated is equal to $\log\big( \frac{1}{2.4\delta} \big) / C^*$.

3.2 Half-space problem

In this section, we consider the problem of identifying the half-space to which the mean vector belongs, i.e., where the partition is defined by the hyperplane $\sum_{k=1}^K a_k \nu_k = b$. Set
\[ A_1 = \Big\{ \nu \in \mathbb{R}^K \cap U : \sum_{k=1}^K a_k \nu_k < b \Big\} \]

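and
\[ A_2 = \Big\{ \nu \in \mathbb{R}^K \cap U : \sum_{k=1}^K a_k \nu_k > b \Big\}. \]
W.l.o.g. each $a_i$ can be taken to be non-zero and $b > 0$. Problem LB may be formulated as: for $\mu \in A_1$ and non-empty $A_2$,
\[ \max_{w \in \mathcal{P}_K} \inf_{\nu \in \bar{A}_2} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j). \tag{10} \]

Theorem 2. Under Assumptions 1, 2, and that $A_2$ is non-empty, there is a unique optimal solution $(w^*, \nu^*)$ to Problem LB. Further,
\[ K_i(\mu_i \| \nu_i^*) = K_1(\mu_1 \| \nu_1^*) \quad \forall i, \tag{11} \]
\[ \sum_{k=1}^K a_k \nu_k^* = b, \tag{12} \]
\[ \nu_i^* > \mu_i \ \text{if} \ a_i > 0, \quad \text{and} \quad \nu_i^* < \mu_i \ \text{if} \ a_i < 0. \tag{13} \]
Relations (11), (12) and (13) uniquely specify $\nu^* \in U$. Moreover,
\[ \frac{w_i^*}{a_i} K_i'(\mu_i \| \nu_i^*) = \frac{w_1^*}{a_1} K_1'(\mu_1 \| \nu_1^*) \quad \forall i, \tag{14} \]
where $K_i'$ denotes the derivative of $K_i$ with respect to its second argument.

Let $\bar{u}_i = \sup\{u \in U_i\}$ and $\underline{u}_i = \inf\{u \in U_i\}$. Further, set $\hat{u}_i = \bar{u}_i$ if $a_i > 0$, and $\hat{u}_i = \underline{u}_i$ if $a_i < 0$. The following lemma is useful in proving Theorem 2.

Lemma 1. Under Assumption 2, the following are equivalent:
1. $A_2 \ne \emptyset$.
2. $\sum_{i=1}^K a_i \hat{u}_i > b$.
3. There exists a unique $\nu^* \in U$ such that (11), (12) and (13) hold.

Proof of Lemma 1: Claim 1 implies existence of $\nu$ such that $\sum_{i=1}^K a_i \nu_i > b$ and $K_i(\mu_i \| \nu_i) < \infty$ for all $i$. Claim 2 follows as $a_i \nu_i < a_i \hat{u}_i$.

To see that Claim 2 implies Claim 3, recall that $K_i(\mu_i \| \nu_i)$ equals zero at $\nu_i = \mu_i$. It strictly increases with $\nu_i$ for $\nu_i \ge \mu_i$ and it strictly reduces with $\nu_i$ for $\nu_i \le \mu_i$. Assume w.l.o.g. that $a_1 > 0$, and for $\nu_1 \ge \mu_1$, consider the function
\[ \nu_i(\nu_1) = K_i^{-1}\big( K_1(\mu_1 \| \nu_1) \big), \]
where $\nu_i(\nu_1) \ge \mu_i$ if $a_i > 0$, and $\nu_i(\nu_1) \le \mu_i$ if $a_i < 0$. Now, the function
\[ h(\nu_1) \triangleq \sum_{i=1}^K a_i \nu_i(\nu_1) \]
satisfies $h(\nu_1) < b$ for $\nu_1 = \mu_1$, and it strictly increases with $\nu_1$.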

Further, observe that as $\nu_1 \to \bar{u}_1$, $\nu_i(\nu_1) \to \bar{u}_i$ if $a_i > 0$, and $\nu_i(\nu_1) \to \underline{u}_i$ if $a_i < 0$. Thus, $h(\nu_1) \to \sum_{i=1}^K a_i \hat{u}_i$, and thus there exists a unique $\nu^* \in U$ such that $h(\nu_1^*) = b$, and (11) and (13) hold.

To see that Claim 3 implies Claim 1, observe that Claim 3 guarantees that
\[ (\nu_1^*, \nu_2(\nu_1^*), \ldots, \nu_K(\nu_1^*)) \in U. \]
By selecting $\nu_1 > \nu_1^*$ with $\nu_1 - \nu_1^*$ sufficiently small, Claim 1 follows.

Proof of Theorem 2: Lemma 1 guarantees the existence of $\nu^*, w^*$ that solve Eqs. (11) to (14). Here, observe that (14) defines $w^*$. Note that $\nu^*$ is the solution to the optimization problem
\[ \inf_{\nu \in \bar{A}_2} \sum_{j=1}^K w_j^* K_j(\mu_j \| \nu_j). \]
This can be verified by observing that the first order KKT conditions for this convex programming problem are given by Eqs. (12) to (14) (recall that $\bar{A}_2 = \{\nu : \sum_{i=1}^K a_i \nu_i \ge b\}$). Further, from Eq. (11), it follows that
\[ \inf_{\nu \in \bar{A}_2} \sum_{j=1}^K w_j^* K_j(\mu_j \| \nu_j) = \sum_{j=1}^K w_j^* K_j(\mu_j \| \nu_j^*) = K_1(\mu_1 \| \nu_1^*). \]
For any other feasible solution $w$, we have
\[ \inf_{\nu \in \bar{A}_2} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) \le \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i^*) = K_1(\mu_1 \| \nu_1^*), \]
which shows that $w^*$ is an optimal solution to the problem.

Uniqueness: It remains to show that the above is the unique solution to Problem LB. We skip the details as, in Section 3.4, we prove uniqueness of the solution for a general case where $\bar{A}_2$ is a union of half-spaces. See Lemma 3 for uniqueness in this more general setting.

3.3 $A^c$ is convex

Suppose that $\Omega = A \cup A^c$ where $\mu \in A$ and $A^c$ is a closed convex set. To avoid trivialities, assume that $\Omega \subset U$. Further, there exists a $\nu^{(0)} \in A^c$ so that $A^c$ is non-empty. Let the associated lower bound problem be denoted by Problem CVX:
\[ \max_{w \in \mathcal{P}_K} \inf_{\nu \in A^c} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j). \tag{Problem CVX} \]
The solution to Problem CVX and to each of its sub-problems $\inf_{\nu \in A^c} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j)$ is finite. This follows as for each feasible $w \in \mathcal{P}_K$,
\[ \inf_{\nu \in A^c} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j) \le \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j^{(0)}) \le \max_j K_j(\mu_j \| \nu_j^{(0)}) < \infty. \]
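Returning briefly to the half-space problem of Section 3.2: for Gaussian arms, relations (11) to (14) of Theorem 2 can be solved in closed form, which gives a convenient sanity check. The sketch below assumes arm $i$ is Gaussian with known standard deviation $\sigma_i$, so that $K_i(\mu_i \| \nu_i) = (\nu_i - \mu_i)^2 / (2\sigma_i^2)$; this specialization, the function name `gaussian_half_space_solution` and the example values are illustrative and not taken from the paper.

```python
import math


def gaussian_half_space_solution(mu, sigma, a, b):
    """Solve (11)-(14) for Gaussian arms with standard deviations sigma.

    Assumes mu is in A_1, i.e. sum_i a_i * mu_i < b, and every a_i is non-zero.
    Returns (w_star, nu_star, c_star).
    """
    slack = b - sum(ai * mi for ai, mi in zip(a, mu))        # distance to the hyperplane
    scale = sum(abs(ai) * si for ai, si in zip(a, sigma))
    root = slack / scale                                      # sqrt(2 * C*), from (11) and (12)
    # (11) and (13): each arm moves by sigma_i * root in the direction of sign(a_i).
    nu_star = [mi + math.copysign(si * root, ai) for mi, si, ai in zip(mu, sigma, a)]
    # (14): w_i proportional to |a_i| * sigma_i.
    w_star = [abs(ai) * si / scale for ai, si in zip(a, sigma)]
    return w_star, nu_star, 0.5 * root ** 2


s = math.sqrt(0.5)  # variance 1/2 for both arms
w_hs, nu_hs, c_hs = gaussian_half_space_solution((0.0, 0.0), (s, s), (1.0, 3.0), 1.0)
```

With these inputs the common divergence level is $C^* = 1/16$, and arms with a larger coefficient $|a_i|$ or a larger standard deviation receive proportionally more samples.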

Let $C^*$ denote the optimal value for Problem CVX. Under Assumption 1, $\sum_{j=1}^K w_j K_j(\mu_j \| \cdot)$ is strictly convex and there is a unique $\nu \in A^c$ that achieves the minimum in the sub-problem
\[ \inf_{\nu \in A^c} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j). \]
Let $\nu(w)$ denote this unique solution for any $w \in \mathcal{P}_K$. Lemma 2 below shows that for every optimal solution to Problem CVX, the same $\nu$ achieves the minimum in the above sub-problem.

Lemma 2. Under Assumption 1, for any $w^*, s^*$ that are optimal for Problem CVX, $\nu(w^*) = \nu(s^*)$.

Proof. To see this, first note that $\inf_{\nu \in A^c} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j)$ is a concave function of $w$. This shows that, if $w^*$ and $s^*$ are two optimal solutions, then $\alpha w^* + (1 - \alpha) s^*$ for $\alpha \in (0, 1)$ is another optimal solution. Since it is optimal, we have
\[ \sum_{j=1}^K (\alpha w_j^* + (1 - \alpha) s_j^*) K_j\big( \mu_j \| \nu_j(\alpha w^* + (1 - \alpha) s^*) \big) = C^*. \]
Now due to Assumption 1,
\[ \sum_{j=1}^K w_j^* K_j\big( \mu_j \| \nu_j(\alpha w^* + (1 - \alpha) s^*) \big) > C^* \quad \text{if} \ \nu(\alpha w^* + (1 - \alpha) s^*) \ne \nu(w^*) \]
and
\[ \sum_{j=1}^K s_j^* K_j\big( \mu_j \| \nu_j(\alpha w^* + (1 - \alpha) s^*) \big) > C^* \quad \text{if} \ \nu(\alpha w^* + (1 - \alpha) s^*) \ne \nu(s^*). \]
It follows that $\nu(w^*) = \nu(\alpha w^* + (1 - \alpha) s^*) = \nu(s^*)$.

Let $\nu^*$ be the unique value of $\nu$ which achieves the minimum in the sub-problem for every optimal solution. In Theorem 3, we provide an alternate characterization of $\nu^*$, as well as a characterization of the solution of Problem CVX.

Some notation is needed to state Theorem 3. For any index set $J \subset [K]$ and vector $\nu \in \mathbb{R}^K$, let $\nu_J$ denote the projection of the vector $\nu$ onto the lower dimensional subspace with coordinate set given by $J$. Similarly, for any set $B \subset \mathbb{R}^K$, let $B_J$ denote its projection onto the subspace restricted to the coordinate set $J$, i.e., $B_J = \{\nu_J : \nu \in B\}$. Note that if $B$ is convex, then $B_J$ is also convex. If $B$ is the $c$-sublevel set of a convex function $f$, then
\[ B_J = \big\{ \nu_J : f(\nu_J, \nu_{J^c}) \le c \ \text{for some} \ \nu_{J^c} \in \mathbb{R}^{|J^c|} \big\} = \Big\{ \nu_J : \inf_{\nu_{J^c} \in \mathbb{R}^{|J^c|}} f(\nu_J, \nu_{J^c}) \le c \Big\}. \]
In other words, $B_J$ is the $c$-sublevel set of the function $h_J := \inf_{\nu_{J^c} \in \mathbb{R}^{|J^c|}} f(\nu_J, \nu_{J^c})$.

Theorem 3. Suppose that $\mu \in A$, $A^c$ is non-empty, and Assumptions 1 and 2 hold.
Then, for any optimal solution $(w^*, \nu^*)$ to Problem CVX, $\nu^*$ uniquely solves the min-max problem
\[ \inf_{\nu \in A^c} \max_i K_i(\mu_i \| \nu_i). \tag{15} \]
Further, the following are necessary and sufficient conditions for such a $(w^*, \nu^*)$. Let $I = \arg\max_i K_i(\mu_i \| \nu_i^*)$. Then,
(a) $w_i^* = 0 \ \forall i \in I^c$,

(b) $\nu_I^* \in \partial A_I^c$, and
(c) there exists a supporting hyperplane of $(A^c)_I$ at $\nu_I^*$ given by $\sum_{i \in I} a_i \nu_i = b$ such that
\[ \nu_i^* > \mu_i \ \text{if} \ a_i > 0, \quad \text{and} \quad \nu_i^* < \mu_i \ \text{if} \ a_i < 0 \quad \forall i \in I, \tag{16} \]
\[ \frac{w_i^*}{a_i} K_i'(\mu_i \| \nu_i^*) = \frac{w_j^*}{a_j} K_j'(\mu_j \| \nu_j^*) \quad \forall i, j \in I. \tag{17} \]

Remark 4. Condition (c) shows that the problem has a unique solution, i.e., the optimal $w^*$ is a singleton, if there is a unique supporting hyperplane of $(A^c)_I$ at $\nu_I^*$ satisfying (16). Consider the case where $A^c = \{\nu : f(\nu) \le c\}$ is the $c$-sublevel set of a convex function $f$. Then, $A_I^c$ is the $c$-sublevel set of the function $h : \mathbb{R}^{|I|} \to \mathbb{R}$,
\[ h(\nu_I) := \inf_{\nu_{I^c} \in \mathbb{R}^{|I^c|}} f(\nu_I, \nu_{I^c}). \]
Further suppose that $h(\cdot)$ is a smooth function. Then, the unique tangential hyperplane at $\nu_I^*$ is given by $\nabla h(\nu_I^*)^T (\nu_I - \nu_I^*) = 0$. In particular, in this case for $i \in I$,
\[ w_i^* \propto \frac{\frac{\partial h}{\partial \nu_i}(\nu_I^*)}{K_i'(\mu_i \| \nu_i^*)}. \]

Proof of Theorem 3. Let $B_n$ denote a closed ball centered at $\mu$ with radius $n$. Consider $n$ sufficiently large so that $\nu^*$, defined as the solution to (15), lies in $B_n$ (since the objective function $\max_i K_i(\mu_i \| \nu_i)$ is strictly convex in $\nu$, such a $\nu^*$ is unique). Since $A^c \cap B_n$ is a compact set, and $\sum_{i=1}^K w_i K_i(\mu_i \| \nu_i)$ is continuous in $w$ and $\nu$, concave in $w \in \mathcal{P}_K$ and convex in $\nu \in A^c \cap B_n$, by Sion's Minimax Theorem
\[ \max_{w \in \mathcal{P}_K} \inf_{\nu \in A^c \cap B_n} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) = \inf_{\nu \in A^c \cap B_n} \max_{w \in \mathcal{P}_K} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) = \inf_{\nu \in A^c \cap B_n} \max_i K_i(\mu_i \| \nu_i) = \inf_{\nu \in A^c} \max_i K_i(\mu_i \| \nu_i). \tag{18} \]
Observe that
\[ r_n(w) \triangleq \inf_{\nu \in A^c \cap B_n} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) \]
is continuous in $w$ (see Theorem 2.1 in Fiacco and Ishizuka (1990)) and decreases with $n$ to $r(w) \triangleq \inf_{\nu \in A^c} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i)$. Thus, we have uniform convergence (see Theorem 7.13 in Rudin (1976)):
\[ \sup_{w \in \mathcal{P}_K} |r_n(w) - r(w)| \to 0. \]
This in turn implies that
\[ \max_{w \in \mathcal{P}_K} r_n(w) \to \max_{w \in \mathcal{P}_K} r(w). \]
From (18) it follows that the LHS above is independent of $n$. Therefore, the min-max relation
\[ \max_{w \in \mathcal{P}_K} \inf_{\nu \in A^c} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) = \inf_{\nu \in A^c} \max_i K_i(\mu_i \| \nu_i) \tag{19} \]
holds. Now if $(w^*, \tilde{\nu})$ is a saddle point of the minmax problem, then since $\nu^*$ is unique, $\tilde{\nu}$ equals $\nu^*$.

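Necessity of conditions on optimal $(w^*, \nu^*)$: Let $I = \arg\max_i K_i(\mu_i \| \nu_i^*)$. The minimax equality in (19) shows that $(w^*, \nu^*)$ is a saddle point, and therefore, $w^*$ solves the optimization problem
\[ \max_{(w_1, \ldots, w_K) \in \mathcal{P}_K} \sum_{j=1}^K w_j K_j(\mu_j \| \nu_j^*). \tag{20} \]
From this, it is easy to see that $w_i^* = 0 \ \forall i \in I^c$. To see (b), note that $\nu^*$ uniquely solves the optimization problem
\[ \min_{(\nu_1, \ldots, \nu_K) \in A^c} \sum_{j=1}^K w_j^* K_j(\mu_j \| \nu_j). \tag{21} \]
If $\nu_I^*$ is in the interior of $A_I^c$, it is easy to come up with $\nu \ne \nu^*$ in $A^c$ with a smaller value of $\sum_{j=1}^K w_j^* K_j(\mu_j \| \nu_j)$.

Now, consider the convex set
\[ \mathcal{C} := \Big\{ \nu_I \in \mathbb{R}^{|I|} : \sum_{i \in I} w_i^* K_i(\mu_i \| \nu_i) < \sum_{i \in I} w_i^* K_i(\mu_i \| \nu_i^*) \Big\} \]
(convexity of $\mathcal{C}$ follows from Assumption 1). By the separating hyperplane theorem, there exists a hyperplane $\sum_{i \in I} a_i \nu_i = b$ that separates $\mathcal{C}$ and $A_I^c$. Since $\nu_I^* \in \bar{\mathcal{C}} \cap A_I^c$, this hyperplane passes through $\nu_I^*$, and is a supporting hyperplane to both convex sets $\mathcal{C}$ and $A_I^c$. From the fact that it is a supporting hyperplane to $\mathcal{C}$ at $\nu_I^*$, we have
\[ \frac{w_i^*}{a_i} K_i'(\mu_i \| \nu_i^*) = \frac{w_j^*}{a_j} K_j'(\mu_j \| \nu_j^*) \quad \forall i, j \in I. \]
This proves (c).

Sufficiency: Let $\nu^*$ and $w^*$ be such that (a), (b), (c) hold. Note that $\sum_{i \in I} a_i \mu_i < b$ and $(A^c)_I \subset \{\nu_I : \sum_{i \in I} a_i \nu_i \ge b\}$. Then, from Theorem 2, $w_I^*$ and $\nu_I^*$ solve the following half space problem in the lower dimensional subspace restricted to coordinate set $I$:
\[ \max_{w_I \in \mathcal{P}_{|I|}} \ \inf_{\nu_I : \sum_{i \in I} a_i \nu_i \ge b} \ \sum_{i \in I} w_i K_i(\mu_i \| \nu_i). \]
In particular,
\[ \inf_{\nu_I : \sum_{i \in I} a_i \nu_i \ge b} \ \sum_{i \in I} w_i^* K_i(\mu_i \| \nu_i) = \sum_{i \in I} w_i^* K_i(\mu_i \| \nu_i^*). \]
Further, for any $w_I$, note that
\[ \inf_{\nu_I \in (A^c)_I} \sum_{i \in I} w_i K_i(\mu_i \| \nu_i) \ge \inf_{\nu_I : \sum_{i \in I} a_i \nu_i \ge b} \ \sum_{i \in I} w_i K_i(\mu_i \| \nu_i). \]
This shows that
\[ \inf_{\nu \in A^c} \sum_{j=1}^K w_j^* K_j(\mu_j \| \nu_j) = \inf_{\nu_I \in (A^c)_I} \sum_{i \in I} w_i^* K_i(\mu_i \| \nu_i) = \sum_{i \in I} w_i^* K_i(\mu_i \| \nu_i^*) = \max_i K_i(\mu_i \| \nu_i^*). \]
Now, consider any $w$ which is a feasible solution of Problem CVX. Then,
\[ \inf_{\nu \in A^c} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) \le \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i^*) \le \max_i K_i(\mu_i \| \nu_i^*). \]
This proves our claim that $w^*$, $\nu(w^*) = \nu^*$ form an optimal solution.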

3.4 $A^c$ is non-convex union of half spaces

In Sections 3.2 and 3.3, $A^c$ is convex, which allows us to explicitly characterize the solution to the lower bound problem. In this section, we consider a problem where $A^c$ is not convex. Specifically, we examine the case where $A^c$ is a union of half-spaces. Just as the single half-space problem was useful in studying the case where $A^c$ is convex, analyzing $A^c$ when it is a union of half-spaces may provide insights into a more general problem where $A^c$ is a union of convex sets. In Section 3.4.1, we restrict ourselves to two arms, both having a Gaussian distribution with known and common variance. This simple setting lends itself to elegant analysis and graphical interpretation. Extensions to general distributions, $K > 2$ arms and settings where $A^c$ is a union of convex sets are part of our ongoing research.

Let
\[ B_j \triangleq \Big\{ \nu \in \mathbb{R}^K : \sum_{k=1}^K a_{j,k} \nu_k \ge b_j \Big\}, \tag{22} \]
each $b_j \ge 0$, and let $A^c = \cup_{j=1}^m B_j$ be the union of these half-spaces. Again, suppose that $A^c \subset U$. The lower bound problem can be expressed as
\[ \max_{w \in \mathcal{P}_K} \ \inf_{\nu \in \cup_{j=1}^m B_j} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) = \max_{w \in \mathcal{P}_K} \min_j \inf_{\nu \in B_j} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i). \tag{23} \]

Remark 5. It is easy to see that the best arm identification problem is a special case of this problem. To see this, suppose arm 1 has the highest mean among the $K$ arms, i.e., $\mu_1 \ge \mu_j \ \forall j \ne 1$. We then have $A^c = \cup_{j=2}^K B_j$, where for any $j$, $B_j = \{\nu \in \mathbb{R}^K : \nu_j \ge \nu_1\}$.

Lemma 3 shows that the optimization problem in (23) has a unique solution.

Lemma 3. There is a unique $w^* \in \mathcal{P}_K$ that achieves the maximum in (23).

Proof. Denote the optimal value of (23) by $C^*$. We first show that if $q, s \in \mathcal{P}_K$ are two distinct optimal solutions and $\nu(q), \nu(s) \in A^c$, respectively, achieve the minimum in the sub-problem, then $\nu(q) \ne \nu(s)$. To see this, suppose $\nu(q) = \nu(s) = \nu \in B_j$ for some $1 \le j \le m$. Then $\nu$ achieves the minimum in the sub-problem $\inf_{\nu \in B_j} \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i)$ for both $w = q$ and $w = s$.
Hence, both $q, s$ solve the following equations:
\[ \sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) = C^*, \tag{24} \]
\[ \frac{w_i}{a_{j,i}} K_i'(\mu_i \| \nu_i) = \frac{w_1}{a_{j,1}} K_1'(\mu_1 \| \nu_1) \quad \forall i. \tag{25} \]
This is a contradiction as the above set of equations has a unique solution.

Now, suppose $q, s \in \mathcal{P}_K$ are two distinct optimal solutions of the convex program (23). Then any convex combination $z = \alpha q + (1 - \alpha) s$ is also an optimal solution. Let $\nu(z)$ achieve the minimum in the sub-problem for $z$. Then
\[ C^* = \sum_{i=1}^K z_i K_i(\mu_i \| \nu_i(z)) = \alpha \sum_{i=1}^K q_i K_i(\mu_i \| \nu_i(z)) + (1 - \alpha) \sum_{i=1}^K s_i K_i(\mu_i \| \nu_i(z)). \]
In addition, for any $\nu \in A^c$, we have $\sum_{i=1}^K w_i K_i(\mu_i \| \nu_i) \ge C^*$ for both $w = q$ and $w = s$. Then, the above equality is possible only if
\[ \sum_{i=1}^K q_i K_i(\mu_i \| \nu_i(z)) = \sum_{i=1}^K s_i K_i(\mu_i \| \nu_i(z)) = C^*. \]
This in turn implies that $\nu(z)$ achieves the minimum in the sub-problem for both $q, s$, which is a contradiction to our earlier result. Hence proved.
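Problem (23) can be explored numerically in a simple special case. The sketch below assumes two Gaussian arms with variance 1/2 and means set to zero, so that $K_i(0 \| \nu_i) = \nu_i^2$, and each $b_j > 0$. Under these assumptions the inner infimum over a half-space $\{a_{j,1}\nu_1 + a_{j,2}\nu_2 \ge b_j\}$ is a weighted least-norm problem with value $b_j^2 / (a_{j,1}^2/w_1 + a_{j,2}^2/w_2)$. The grid search, the function names and the example coefficients are illustrative choices, not the paper's method.

```python
def union_value(w, half_spaces):
    """min over j of inf_{nu in B_j} (w1 * nu1^2 + w2 * nu2^2) for strictly
    positive w, using the closed form b^2 / (a1^2 / w1 + a2^2 / w2)."""
    return min(
        b ** 2 / sum(ak ** 2 / wk for ak, wk in zip(a, w))
        for a, b in half_spaces
    )


def solve_union_two_arms(half_spaces, grid=1000):
    """Grid search for the maximizing w in (23); interior weights only."""
    best_w, best_val = None, float("-inf")
    for k in range(1, grid):
        w = (k / grid, 1 - k / grid)
        val = union_value(w, half_spaces)
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val


# Each half-space is ((a_{j,1}, a_{j,2}), b_j).
w_sym, c_sym = solve_union_two_arms([((1.0, 3.0), 1.0), ((3.0, 1.0), 1.0)])
w_one, c_one = solve_union_two_arms([((1.0, 3.0), 1.0)])
```

In the symmetric two-half-space instance both constraints are active at the optimum and the weights are equal, while with a single half-space the weights are proportional to $|a_{1,i}|$, in line with the half-space analysis of Section 3.2.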

3.4.1 Two arms Gaussian setting

To illustrate the issues that arise when $A^c$ is non-convex, consider a simple setting of two arms. Both are assumed to have a Gaussian distribution, and the variance of each arm is assumed to be 1/2. W.l.o.g. the mean of each arm is set to zero. Then, for $j = 1, 2$,
\[ B_j = \{\nu \in \mathbb{R}^2 : a_{j,1}\nu_1 + a_{j,2}\nu_2 \ge b_j\}, \tag{26} \]
and $A^c = B_1 \cup B_2$ is the union of the two half-spaces. To avoid degeneracies we assume that each $a_{j,k} \ne 0$. Further suppose that $\frac{a_{1,1}}{a_{1,2}} \ne \frac{a_{2,1}}{a_{2,2}}$, so that $A^c$ is non-convex. The lower bound problem is then given by
\[ \max_{(w_1, w_2) \in \mathcal{P}_2} \; \inf_{\nu \in A^c} \; \sum_{i=1}^{2} w_i \nu_i^2. \tag{27} \]
The following geometrical result provides useful insights towards the solution of (27).

Proposition 1. For $w_1, w_2, C > 0$, a necessary and sufficient condition for an ellipse of the form
\[ w_1\nu_1^2 + w_2\nu_2^2 = C \tag{28} \]
to be uniquely tangential to the lines
\[ a_{1,1}\nu_1 + a_{1,2}\nu_2 = b_1 \tag{29} \]
and
\[ a_{2,1}\nu_1 + a_{2,2}\nu_2 = b_2 \tag{30} \]
is that
\[ \min_{k=1,2} \frac{|a_{2,k}|}{|a_{1,k}|} < \frac{b_2}{b_1} < \max_{k=1,2} \frac{|a_{2,k}|}{|a_{1,k}|}. \tag{31} \]
Then, the tangential ellipse is specified by
\[ \frac{w_1}{C} = \frac{(a_{1,2}a_{2,1})^2 - (a_{1,1}a_{2,2})^2}{(b_2 a_{1,2})^2 - (b_1 a_{2,2})^2} \tag{32} \]
and
\[ \frac{w_2}{C} = \frac{(a_{1,2}a_{2,1})^2 - (a_{1,1}a_{2,2})^2}{(b_1 a_{2,1})^2 - (b_2 a_{1,1})^2}. \tag{33} \]
The ellipse (28) meets the line (29) at the point $\left(\frac{C a_{1,1}}{w_1 b_1}, \frac{C a_{1,2}}{w_2 b_1}\right)$, and it meets the line (30) at the point $\left(\frac{C a_{2,1}}{w_1 b_2}, \frac{C a_{2,2}}{w_2 b_2}\right)$.

Proof. A necessary and sufficient condition for the ellipse (28) to be tangential to the line (29) at a point $(\nu_1^*, \nu_2^*)$ is for $(\nu_1^*, \nu_2^*)$ to satisfy the equations of the ellipse and the line, together with the slope matching condition
\[ \frac{w_1\nu_1^*}{a_{1,1}} = \frac{w_2\nu_2^*}{a_{1,2}}. \tag{34} \]
The fact that $(\nu_1^*, \nu_2^*)$ satisfies (28) and (29) implies that the common value in (34) equals $C/b_1$. Plugging $(\nu_1^*, \nu_2^*)$ from (34) into (28), we observe that
\[ \frac{a_{1,1}^2}{w_1} + \frac{a_{1,2}^2}{w_2} = \frac{b_1^2}{C}. \]

Similarly, considering the other half-space, we get
\[ \frac{a_{2,1}^2}{w_1} + \frac{a_{2,2}^2}{w_2} = \frac{b_2^2}{C}. \]
The result follows by solving the two equations.

Theorem 4. The solution to (27) depends on the underlying parameters in the following way.

Case 1:
\[ \left(\frac{b_2}{b_1}\right)^2 \ge \left(\frac{a_{2,1}^2}{|a_{1,1}|} + \frac{a_{2,2}^2}{|a_{1,2}|}\right) \big(|a_{1,1}| + |a_{1,2}|\big)^{-1}. \tag{35} \]
In this case, (27) reduces to the half-space problem where $A^c = B_1$, so that the optimal solution to (27) is given by
\[ w_i^* = \frac{|a_{1,i}|}{|a_{1,1}| + |a_{1,2}|}, \quad i = 1, 2, \tag{36} \]
and the optimal value is $C^* = \frac{b_1^2}{(|a_{1,1}| + |a_{1,2}|)^2}$.

Case 2:
\[ \left(\frac{b_2}{b_1}\right)^2 \le \left(\frac{a_{1,1}^2}{|a_{2,1}|} + \frac{a_{1,2}^2}{|a_{2,2}|}\right)^{-1} \big(|a_{2,1}| + |a_{2,2}|\big). \]
This simply corresponds to Case 1 with $(a_{1,1}, a_{1,2}, b_1)$ interchanged with $(a_{2,1}, a_{2,2}, b_2)$.

Case 3:
\[ \left(\frac{a_{1,1}^2}{|a_{2,1}|} + \frac{a_{1,2}^2}{|a_{2,2}|}\right)^{-1} \big(|a_{2,1}| + |a_{2,2}|\big) < \left(\frac{b_2}{b_1}\right)^2 < \left(\frac{a_{2,1}^2}{|a_{1,1}|} + \frac{a_{2,2}^2}{|a_{1,2}|}\right) \big(|a_{1,1}| + |a_{1,2}|\big)^{-1}. \tag{37} \]
Here (31) holds, and the optimal $w_1^*$ and $w_2^*$ are given by (32) and (33), respectively.

Proof. Case 1: First consider the half-space problem where $A^c = B_1$. Our analysis in Section 3.2 shows that there is a unique $(w_1^*, w_2^*)$ and $(\nu_1^*, \nu_2^*)$ that solves the resulting problem, with $a_{1,1}\nu_1^* + a_{1,2}\nu_2^* = b_1$ and
\[ \mathrm{sign}(a_{1,1})\,\nu_1^* = |\nu_1^*| = \mathrm{sign}(a_{1,2})\,\nu_2^* = |\nu_2^*|, \]
so that
\[ |\nu_1^*| = |\nu_2^*| = \frac{b_1}{|a_{1,1}| + |a_{1,2}|}. \]
Further, from
\[ \frac{w_1^*\nu_1^*}{a_{1,1}} = \frac{w_2^*\nu_2^*}{a_{1,2}}, \tag{38} \]
it follows that for the half-space problem, $w_i^* \propto |a_{1,i}|$ is the optimal solution and the optimal value is $C^* = \frac{b_1^2}{(|a_{1,1}| + |a_{1,2}|)^2}$.

Returning to (27), we show that when (35) is true and $w_i^* \propto |a_{1,i}|$,
\[ \inf_{\nu \in B_2} \sum_{i=1}^{2} w_i^* K_i(\mu_i \mid \nu_i) = \inf_{\nu :\, a_{2,1}\nu_1 + a_{2,2}\nu_2 \ge b_2} \big( w_1^*\nu_1^2 + w_2^*\nu_2^2 \big) \ge C^*, \]
and hence $w_i^* \propto |a_{1,i}|$ continues to be optimal for (27). We first find the point $(\kappa_1, \kappa_2) \in B_2$ that achieves the minimum in the above optimization problem. We know that $(\kappa_1, \kappa_2)$ satisfies
\[ a_{2,1}\kappa_1 + a_{2,2}\kappa_2 = b_2, \]

and the slope matching condition
\[ \frac{w_1^*\kappa_1}{a_{2,1}} = \frac{w_2^*\kappa_2}{a_{2,2}}. \]
It follows from easy calculations that
\[ \inf_{\nu :\, a_{2,1}\nu_1 + a_{2,2}\nu_2 \ge b_2} \big( w_1^*\nu_1^2 + w_2^*\nu_2^2 \big) = \frac{b_2^2}{\frac{a_{2,1}^2}{w_1^*} + \frac{a_{2,2}^2}{w_2^*}} = \frac{b_2^2}{\left(\frac{a_{2,1}^2}{|a_{1,1}|} + \frac{a_{2,2}^2}{|a_{1,2}|}\right)\big(|a_{1,1}| + |a_{1,2}|\big)}. \]
The above expression is at least $\frac{b_1^2}{(|a_{1,1}| + |a_{1,2}|)^2}$ when (35) is true, which gives us the required result.

Case 2: Case 2 follows similarly to Case 1.

Case 3: It is easy to see that (37) implies (31). Let $(w_1^*, w_2^*)$ denote the optimal solution to (27). It is clear that the corresponding ellipse must be tangential to both the lines $a_{1,1}\nu_1 + a_{1,2}\nu_2 = b_1$ and $a_{2,1}\nu_1 + a_{2,2}\nu_2 = b_2$, since if it does not touch one of these lines, then the associated constraint can be ignored in solving (27). However, that violates (37). Therefore, the solution is provided by Proposition 1.

3.4.2 $A^c$ is a union of many half-spaces

We now provide an algorithm to solve the lower bound problem (27) for general $m$ when the number of arms is $K = 2$. Again, both arms are assumed to have a Gaussian distribution, the variance of each arm is assumed to be 1/2, and the mean of each arm is set to zero. The algorithm is outlined somewhat informally, emphasising its graphical interpretation.

Observe that in the discussion in Section 3.4.1, the ellipse $\sum_{i=1}^{2} w_i\nu_i^2 = C$ touches the line $a_{1,1}\nu_1 + a_{1,2}\nu_2 = b_1$ (i.e., they have a non-empty intersection) if and only if it touches the lines $-a_{1,1}\nu_1 + a_{1,2}\nu_2 = b_1$, $a_{1,1}\nu_1 - a_{1,2}\nu_2 = b_1$ and $-a_{1,1}\nu_1 - a_{1,2}\nu_2 = b_1$. Further, it is tangential to them at points symmetric around the axes. Thus, without loss of generality we could have restricted the analysis to $a_{1,1}, a_{1,2}, a_{2,1}, a_{2,2} > 0$.

With this in mind, consider $m$ half-spaces $(B_j : j = 1, \ldots, m)$, where
\[ B_j = \{\nu \in \mathbb{R}^2 : a_{j,1}\nu_1 + a_{j,2}\nu_2 \ge b_j\}, \tag{39} \]
and we take each $a_{j,k}$ and $b_j$ to be strictly positive. Let
\[ L_j = \{\nu \in \mathbb{R}^2 : a_{j,1}\nu_1 + a_{j,2}\nu_2 = b_j\} \tag{40} \]
for each $j$ denote the associated line.
Further, without loss of generality, suppose that
\[ \frac{a_{1,1}}{a_{1,2}} > \frac{a_{2,1}}{a_{2,2}} > \cdots > \frac{a_{m,1}}{a_{m,2}}. \]
To ensure that any two of the lines intersect in the positive quadrant (else one of the two lines is above the other line in the positive quadrant and can be ignored), we further assume that
\[ \frac{a_{j,1}}{a_{j+1,1}} > \frac{b_j}{b_{j+1}} > \frac{a_{j,2}}{a_{j+1,2}} \]
for $j = 1, \ldots, m-1$. Recall that our optimization problem is
\[ \max_{(w_1, w_2) \in \mathcal{P}_2} \; \inf_{\nu \in \bigcup_{j=1}^{m} B_j} \; \sum_{i=1}^{2} w_i\nu_i^2. \tag{41} \]
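For intuition, (41) can also be solved numerically. With zero-mean Gaussian arms of variance 1/2 and $w_1 = w$, $w_2 = 1 - w$, the infimum of $w\nu_1^2 + (1-w)\nu_2^2$ over $B_j$ equals $b_j^2 / (a_{j,1}^2/w + a_{j,2}^2/(1-w))$, the tangency value that appears in the proof of Theorem 4, so the objective is a minimum of finitely many concave functions of $w$. The sketch below maximizes it by ternary search. This is a plain numerical substitute rather than the staged procedure developed in this section, and the function names and test values are illustrative assumptions.

```python
def tangent_value(w, a1, a2, b):
    """Level C at which w*x^2 + (1-w)*y^2 = C is tangent to a1*x + a2*y = b."""
    if w <= 0.0 or w >= 1.0:
        return 0.0  # limiting value when one arm receives no weight
    return b * b / (a1 * a1 / w + a2 * a2 / (1.0 - w))


def objective(w, lines):
    """Inner infimum of (41) for weights (w, 1 - w); lines are (a1, a2, b) triples."""
    return min(tangent_value(w, a1, a2, b) for a1, a2, b in lines)


def maximize(lines, iters=200):
    """Ternary search on [0, 1]; valid because the objective is concave in w."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1, lines) < objective(m2, lines):
            lo = m1
        else:
            hi = m2
    w = 0.5 * (lo + hi)
    return w, objective(w, lines)


def case3_closed_form(line1, line2):
    """w_1 and C from (32)-(33), normalised so that w_1 + w_2 = 1."""
    a11, a12, b1 = line1
    a21, a22, b2 = line2
    num = (a12 * a21) ** 2 - (a11 * a22) ** 2
    r1 = num / ((b2 * a12) ** 2 - (b1 * a22) ** 2)  # w_1 / C
    r2 = num / ((b1 * a21) ** 2 - (b2 * a11) ** 2)  # w_2 / C
    c_opt = 1.0 / (r1 + r2)
    return r1 * c_opt, c_opt


# Two lines for which both constraints bind (Case 3 of Theorem 4).
both_bind = [(2.0, 1.0, 2.0), (1.0, 2.0, 2.0)]
w_num, c_num = maximize(both_bind)
w_cf, c_cf = case3_closed_form(*both_bind)

# Raising b_2 makes the second half-space irrelevant (Case 1 of Theorem 4).
first_binds = [(2.0, 1.0, 2.0), (1.0, 2.0, 2.6)]
w_one, c_one = maximize(first_binds)
```

In the first example the numerical maximizer agrees with (32) and (33); in the second it agrees with (36), that is, $w = a_{1,1}/(a_{1,1} + a_{1,2})$ and $C = b_1^2/(a_{1,1} + a_{1,2})^2$.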

For any $w \in [0, 1]$, let
\[ C(w) = \inf_{\nu \in \bigcup_{j=1}^{m} B_j} \big( w\nu_1^2 + (1-w)\nu_2^2 \big). \]
Our aim is to maximize $C(w)$ over $w \in [0, 1]$. Again, this is a concave programming problem, so a local optimum is a global optimum. The algorithm starts with $w = 0$ and proceeds by increasing $w$ so that $C(w)$ increases. It stops when it reaches a point where a further increase in $w$ leads to a reduction in $C(w)$. At each value of $w$ considered in the algorithm, $C(w)$ is known. The ellipse $w\nu_1^2 + (1-w)\nu_2^2 = C(w)$ has the property that it is tangential to one or more of the lines $(L_j : j \le m)$ and always lies in $\bar{A}$. This ensures that $w\nu_1^2 + (1-w)\nu_2^2 \ge C(w)$ for all $\nu \in \bigcup_{j=1}^{m} B_j$.

Algorithm: The algorithm proceeds in stages. Set $i_1 = 1$. Stage $j = 1$ starts with the ellipse tangential to $L_1$. Specifically, set $w = 0$ and $C(w) = 0$, so that the ellipse $w\nu_1^2 + (1-w)\nu_2^2 = C(w)$ is tangential to $L_1$ at the point $(\nu_1(0), \nu_2(0)) = (b_1/a_{1,1}, 0)$. Let $C(w)$, $\nu_1(w)$ and $\nu_2(w)$ be the solution to the equations
\[ w\nu_1^2 + (1-w)\nu_2^2 = C, \tag{42} \]
\[ a_{1,1}\nu_1 + a_{1,2}\nu_2 = b_1, \tag{43} \]
\[ \frac{w\nu_1}{a_{1,1}} = \frac{(1-w)\nu_2}{a_{1,2}}. \tag{44} \]
From Lemma 4 below, it follows that as $w$ increases from 0, $C(w)$ and $\nu_2(w)$ increase while $\nu_1(w)$ decreases. With increasing $w$, either the ellipse touches another line, call it $L_{i_2}$, before $w$ equals $\frac{a_{1,1}}{a_{1,1} + a_{1,2}}$, or $w$ equals this value before the ellipse touches any other line. In the latter case, the point $(w, C(w), \nu_1(w), \nu_2(w))$ is a solution to the problem where $A^c = B_1$, and hence also for $A^c = \bigcup_{j=1}^{m} B_j$, since the ellipse does not intersect the half-spaces $B_j$ for $2 \le j \le m$. Otherwise, let $(\nu_1(w), \nu_2(w))$ denote the point where the ellipse meets $L_{i_2}$, and the algorithm moves to stage $j = 2$.

At the start of any stage $j \ge 2$, the algorithm proceeds by increasing $w$ so that the ellipse remains tangential to $L_{i_j}$. Lemma 5 below ensures that at any stage $j \ge 2$, with increasing $w$, the ellipse $w\nu_1^2 + (1-w)\nu_2^2 = C(w)$ does not intersect the lines $L_k$, $k \le i_j - 1$. Again, let $(\nu_1(w), \nu_2(w))$ denote the point of intersection of the ellipse with $L_{i_j}$.
Then, $C(w)$ and $\nu_2(w)$ increase with $w$, and $\nu_1(w)$ decreases with $w$. Stage $j$ ends when either the ellipse satisfies the optimality condition for the single half-space problem where $A^c = B_{i_j}$, that is,
\[ w = \frac{a_{i_j,1}}{a_{i_j,1} + a_{i_j,2}}, \]
or, before $w$ reaches that value, the ellipse becomes tangential to another line $L_{i_{j+1}}$ for some $i_{j+1} > i_j$ with $i_{j+1} \le m$. At that point,

either the current ellipse is optimal for the problem $A^c = B_{i_j} \cup B_{i_{j+1}}$, so that
\[ \frac{a_{i_{j+1},1}}{a_{i_{j+1},1} + a_{i_{j+1},2}} \le w < \frac{a_{i_j,1}}{a_{i_j,1} + a_{i_j,2}} \]
and no further improvement by increasing $w$ is possible, or, if $w < \frac{a_{i_{j+1},1}}{a_{i_{j+1},1} + a_{i_{j+1},2}}$, then $j$ is incremented by 1 and the next stage is initiated. The algorithm clearly terminates for $j \le m$.

Some notation is needed to state Lemma 4. For $a_1, a_2, b > 0$, let $C(w), x(w), y(w)$ denote the solutions to the system of equations below for each $w \in (0, 1)$:
\[ wx^2 + (1-w)y^2 = C, \tag{45} \]
\[ a_1 x + a_2 y = b, \tag{46} \]
\[ \frac{wx}{a_1} = \frac{(1-w)y}{a_2}. \tag{47} \]
Let $C'(w)$, $x'(w)$ and $y'(w)$ denote the respective derivatives of $C(w)$, $x(w)$ and $y(w)$ with respect to $w$.

Lemma 4. For $w \in (0, 1)$,
\[ C'(w) = \frac{b^2}{\left(\frac{a_1^2}{w} + \frac{a_2^2}{1-w}\right)^2} \left( \frac{a_1^2}{w^2} - \frac{a_2^2}{(1-w)^2} \right), \tag{48} \]
and it is positive for $w < \frac{a_1}{a_1 + a_2}$, equals zero at $w = \frac{a_1}{a_1 + a_2}$ and is negative for $w > \frac{a_1}{a_1 + a_2}$. Further,
\[ x(w) = \frac{a_1 b}{w \left( \frac{a_1^2}{w} + \frac{a_2^2}{1-w} \right)}, \tag{49} \]
so that $x'(w) < 0$ and $y'(w) = -\frac{a_1}{a_2} x'(w) > 0$ for $w \in (0, 1)$.

Proof of Lemma 4: Observe first that by substituting for $wx$ and $(1-w)y$ from (47) in (45), we see that the common value in (47) equals $C(w)/b$. Then, using the expressions for $x$ and $y$ from (47) and plugging them into (46), we get
\[ C(w) = \frac{b^2}{\frac{a_1^2}{w} + \frac{a_2^2}{1-w}}. \]
(48) follows from differentiation. (49) follows by substituting for $y$ in (47) using (46). The remaining statements are obvious.

Some notation is needed for Lemma 5. For $c_1, c_2, d > 0$ and each $w \in (0, 1)$, consider a solution $C(w), x(w), y(w)$ to
\[ wx^2 + (1-w)y^2 = C, \tag{50} \]
\[ c_1 x + c_2 y = d, \tag{51} \]
\[ \frac{wx}{c_1} = \frac{(1-w)y}{c_2}. \tag{52} \]
Thus, $(x(w), y(w))$ corresponds to the point determined by the intersection of the ellipse (50) (with $C(w)$ in place of $C$) with the line (51), so that the ellipse is tangential to the line at the point of intersection.

Further, for $a_1, a_2 > 0$, assume that $\frac{a_1}{a_2} > \frac{c_1}{c_2}$. For each $w \in (0, 1)$ and $C(w)$ as above, consider the solution $\tilde{x}(w), \tilde{y}(w), s(w) > 0$ to
\[ w\tilde{x}^2 + (1-w)\tilde{y}^2 = C(w), \tag{53} \]
\[ a_1\tilde{x} + a_2\tilde{y} = s, \tag{54} \]
\[ \frac{w\tilde{x}}{a_1} = \frac{(1-w)\tilde{y}}{a_2}. \tag{55} \]
Thus the ellipse $wx^2 + (1-w)y^2 = C(w)$ intersects the line $a_1 x + a_2 y = s(w)$ at the point $(\tilde{x}(w), \tilde{y}(w))$, and $s(w)$ is chosen so that the ellipse is tangential to the line.

Lemma 5. For $w \in (0, 1)$, $s(w)$ is a decreasing function of $w$.

Lemma 5 implies that with an increase in $w$, the distance between the ellipse (50) and any line of the form $a_1 x + a_2 y = b$ for $b > s(w)$ increases.

Proof of Lemma 5: Observe, as in the proof of Lemma 4, that
\[ C(w) = \frac{d^2}{\frac{c_1^2}{w} + \frac{c_2^2}{1-w}} = \frac{s(w)^2}{\frac{a_1^2}{w} + \frac{a_2^2}{1-w}}. \]
Thus,
\[ \frac{s(w)^2}{d^2} = \frac{a_2^2}{c_2^2} \cdot \frac{\frac{a_1^2}{a_2^2} + \frac{w}{1-w}}{\frac{c_1^2}{c_2^2} + \frac{w}{1-w}}. \]
Since $\frac{a_1^2}{a_2^2} > \frac{c_1^2}{c_2^2}$, it follows that $s(w)$ is a decreasing function of $w$.

4 An asymptotically optimal algorithm

In this section, we present a δ-PAC algorithm (Algorithm 1) for the PI problem which, under some conditions, achieves asymptotically optimal mean termination time as $\delta \to 0$. Both the algorithm and its analysis closely follow the best arm identification algorithm in Garivier and Kaufmann (2016). The sampling and stopping rules used in the algorithm (described below) are inspired by the lower bound Problem LB.

Some notation: In Problem LB (with SPEF distributions replaced by the corresponding means), let $W^*(\mu)$ and $C^*(\mu)$ respectively denote the optimal solution set and optimal value. Also, let $V^*(\mu, w)$ and $g(\mu, w)$ respectively denote the optimal solution set and optimal value of the inner sub-problem.

Sampling Rule

The basic idea is to draw samples according to estimated optimal sampling ratios obtained by solving the lower bound problem with the empirical means of the parameters. In other words, if $\hat{\mu}(t)$ is the vector of empirical means of the arms at time $t$, an arm is chosen to bring the ratio of the total number of samples for all the arms closer to an optimal ratio $\hat{w}(t) \in W^*(\hat{\mu}(t))$.
But this simple strategy may erroneously give too few samples to an arm because of bad initial estimates, preventing convergence to the correct value in subsequent time-slots. This difficulty can be dealt with through forced exploration of each arm to ensure sufficiently fast convergence. This idea was used in Garivier and Kaufmann (2016) for the best arm problem, where they propose two rules, C-Tracking and D-Tracking, which ensure convergence to the correct sampling ratio. We use the D-Tracking rule they propose as the sampling rule in our algorithm. The rule can be described as follows. Let $N_i(t)$ denote the number of samples of arm $i$ at

time $t$ for all $i$ and let $\hat{w}(t) \in W^*(\hat{\mu}(t))$. If there exists an arm $i$ such that $N_i(t) < \sqrt{t} - K/2$, choose that arm. Otherwise, choose an arm that has the maximum difference between the estimated optimal ratio and the actual fraction of samples, i.e., an arm is chosen from $\arg\max_i \big( \hat{w}_i(t) - N_i(t)/t \big)$. This sampling rule has the following properties: (i) each arm gets $\Omega(\sqrt{t})$ samples, (ii) if the estimated sampling ratios converge to an optimal ratio, then the actual fraction of samples also converges to the same optimal ratio. We state below the result from Garivier and Kaufmann (2016) which shows these properties.

Lemma 6. The D-Tracking rule ensures that $\min_i N_i(t) \ge (\sqrt{t} - K/2)_+ - 1$ and that for all $\epsilon > 0$ and all $t_0$, there exists $t_\epsilon \ge t_0$ such that if
\[ \sup_{t \ge t_0} \max_i |\hat{w}_i(t) - w_i| \le \epsilon \]
for some $w \in \mathcal{P}_K$, then
\[ \sup_{t \ge t_\epsilon} \max_i \left| \frac{N_i(t)}{t} - w_i \right| \le 3(K-1)\epsilon. \]

Stopping Rule

The stopping rule uses a threshold rule that imitates the lower bound (1). It first finds the partition in which the empirical mean vector $\hat{\mu}(t)$ lies. Denote this partition by $A(t)$. If
\[ \inf_{\nu \in A^c(t)} \sum_i N_i(t) K_i(\hat{\mu}_i(t) \mid \nu_i) \ge \beta(t, \delta), \]
then it stops and declares $A(t)$ as the partition in which $\mu$ lies. Otherwise it continues to sample arms according to the D-Tracking rule. We will set the threshold $\beta(t, \delta) = \log\left(\frac{ct}{\delta}\right)$, where $c$ is some constant.

Algorithm 1 Algorithm for one parameter exponential families
  At time $t$:
    Compute weights $w(\hat{\mu}(t-1))$ and sample according to the D-Tracking rule.   (Sampling Rule)
    Let $\hat{\mu}(t) \in A(t)$.
    if $\inf_{\nu \in A^c(t)} \sum_i N_i(t) K_i(\hat{\mu}_i(t) \mid \nu_i) \ge \beta(t, \delta)$ then   (Termination Rule)
      Declare $\mu \in A(t)$.
    end if

Sample Complexity Analysis

Let $T_U(\delta)$ be the time at which Algorithm 1 terminates. Then we have the following guarantee.

Theorem 5. Suppose that $\Omega \subset U$ and Assumptions 1 and 2 hold. If Problem LB has a unique optimal solution, i.e., if $|W^*(\mu)| = 1$, then Algorithm 1 is a δ-PAC algorithm with
\[ \limsup_{\delta \to 0} \frac{E[T_U(\delta)]}{\log\left(\frac{1}{\delta}\right)} \le T^*(\mu). \]

As seen in Section 3, for threshold crossing, and when $A^c$ is a half-space or a union of half-spaces, Problem LB has a unique optimal solution. We also observed that when $A^c$ is a closed convex set and the associated $\nu^* \in A^c$ is a smooth point (with a unique supporting hyperplane), then Problem LB again has a unique optimal solution. Before we prove Theorem 5, we first prove the following continuity result.

Lemma 7. Under the conditions of Theorem 5, the function $g$ is continuous at $(\mu, w)$ for any $w \in \mathcal{P}_K$. Further, if Problem LB has a unique optimal solution, then this solution is continuous at $\mu$.

Proof. First suppose that $\overline{A^c}$ is compact. The fact that $g$ is continuous at $(\mu, w)$ follows from the continuity results for non-linear programs. Specifically, since the objective function is continuous in $\nu$ and $\overline{A^c}$ is compact, Theorem 2.1 in Fiacco and Ishizuka (1990) implies that $g$ is continuous at $(\mu, w)$. Now consider non-compact $\overline{A^c}$ and define
\[ g_n(\mu, w) = \inf_{\nu \in \overline{A^c} \cap B_n} \sum_i w_i K_i(\mu_i \mid \nu_i) \]
for each $n$, where $B_n$ is a closed Euclidean ball of radius $n$ centred at $\mu$, and $n$ is taken to be sufficiently large so that $\overline{A^c} \cap B_n$ is non-empty. Then, $g_n(\mu, w)$ is continuous in $(\mu, w)$ and decreases with $n$ to $g(\mu, w)$. Since this convergence is uniform, it follows that $g(\mu, w)$ is continuous in $(\mu, w)$.

To see that the optimal solution to Problem LB is continuous at $\mu$ if $W^*(\mu)$ is a singleton, note that the problem is equivalent to $\max_{w \in \mathcal{P}_K} g(\mu, w)$. Since $g(\mu, \cdot)$ is continuous on $\mathcal{P}_K$ and $W^*(\mu)$ is a singleton, from Theorem 2.2 in Fiacco and Ishizuka (1990), we conclude that the optimal solution is continuous at $\mu$.

Proof of Theorem 5. Without loss of generality, let $\mu \in A$. We first prove that the probability of error is at most $\delta$.
\[ P_\mu[\text{error}] \le P_\mu\Big[ \exists\, t \ge 1 : \inf_{\nu \in A} \sum_i N_i(t) K_i(\hat{\mu}_i(t) \mid \nu_i) \ge \beta(t, \delta) \Big] \]
\[ \le \sum_{t=1}^{\infty} P_\mu\Big[ \sum_i N_i(t) K_i(\hat{\mu}_i(t) \mid \mu_i) \ge \beta(t, \delta) \Big] \]
\[ \le \sum_{t=1}^{\infty} e^{K+1} \left( \frac{\beta(t, \delta)^2 \log t}{K} \right)^{K} e^{-\beta(t, \delta)} \le \delta \]
if $c$ is chosen large enough such that
\[ \sum_{t=1}^{\infty} \frac{e^{K+1}}{ct} \big( \log^2(ct) \log t \big)^{K} \le 1. \]
The third inequality above follows from Magureanu et al. (2014), extended from the Bernoulli family to SPEF.

Next, we prove the upper bound on the mean termination time. Fix an $\epsilon > 0$. From the continuity of $w^*$ at $\mu$, there exists $\xi > 0$ such that for any $\mu' \in B_\xi(\mu)$ we have $w^*(\mu') \in B_\epsilon(w^*(\mu))$. For any $T \in \mathbb{N}$, define the event
\[ E_T := \bigcap_{t=h(T)}^{T} \{ \hat{\mu}(t) \in B_\xi(\mu) \}. \]
It is easy to show (see Lemma 19 of Garivier and Kaufmann (2016)) that there exist constants $B, C$ depending on $\epsilon$ and $\mu$ such that
\[ P_\mu[E_T^c] \le B \exp\big( -C T^{1/8} \big). \]
Note that $\xi, E_T, B, C$ are all functions of $\epsilon$ and $\mu$. Now, for every $\epsilon > 0$, define
\[ C^*_\epsilon(\mu) = \inf_{\mu' \in B_{\xi(\epsilon)}(\mu),\; w' \in B_{3(K-1)\epsilon}(w^*(\mu))} g(\mu', w'). \]

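By the continuity of $w^*$ and $g$, we have $\lim_{\epsilon \to 0} C^*_\epsilon(\mu) = C^*(\mu) = (T^*(\mu))^{-1}$. From Lemma 6, for any $\epsilon > 0$, we have for every $T \ge T_\epsilon$ that on $E_T(\epsilon)$,
\[ \max_i \left| \frac{N_i(t)}{t} - w_i^*(\mu) \right| \le 3(K-1)\epsilon \quad \forall\, t > \sqrt{T}, \]
which in turn implies that
\[ g\left( \hat{\mu}(t), \frac{N(t)}{t} \right) \ge C^*_\epsilon(\mu) \quad \forall\, t > \sqrt{T}. \]
Since the termination rule in the algorithm is given by
\[ g\left( \hat{\mu}(t), \frac{N(t)}{t} \right) \ge \frac{\beta(t, \delta)}{t}, \]
for $T \ge T_\epsilon$, on $E_T(\epsilon)$, we have
\[ \min(T_U(\delta), T) \le \sqrt{T} + \sum_{t=\sqrt{T}}^{T} \mathbb{1}\left\{ C^*_\epsilon(\mu) < \frac{\beta(t, \delta)}{t} \right\} \le \sqrt{T} + \frac{\beta(T, \delta)}{C^*_\epsilon(\mu)}. \]
Now, let
\[ T_0(\delta) := \inf\left\{ T \in \mathbb{N} : \sqrt{T} + \frac{\beta(T, \delta)}{C^*_\epsilon(\mu)} < T \right\}. \]
Therefore, for any $T \ge \max\{T_\epsilon, T_0(\delta)\}$, on $E_T(\epsilon)$, we have $T_U(\delta) < T$, which gives us
\[ E[T_U(\delta)] \le \sum_{T=1}^{\infty} P[T_U(\delta) > T] \le \max\{T_\epsilon, T_0(\delta)\} + \sum_{T=1}^{\infty} P[E_T(\epsilon)^c]. \]
As shown in Garivier and Kaufmann (2016), we have
\[ T_0(\delta) = \frac{1}{C^*_\epsilon(\mu)} \Big( O\big(\log(1/\delta)\big) + o\big(\log\log(1/\delta)\big) \Big). \]
This gives us
\[ \limsup_{\delta \to 0} \frac{E[T_U(\delta)]}{\log\left(\frac{1}{\delta}\right)} \le \frac{1}{C^*_\epsilon(\mu)}. \]
Now, letting $\epsilon$ go to zero, we get
\[ \limsup_{\delta \to 0} \frac{E[T_U(\delta)]}{\log\left(\frac{1}{\delta}\right)} \le \frac{1}{\lim_{\epsilon \to 0} C^*_\epsilon(\mu)} = T^*(\mu). \]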
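The sampling and stopping rules of Algorithm 1 can be sketched in a few lines. The sketch below assumes Gaussian arms with variance 1/2, takes $A^c$ to be a union of half-spaces given as `(a, b)` pairs, and only handles the case where the empirical mean lies in $A$ (otherwise it keeps sampling, whereas Algorithm 1 would test against the other partition). Among under-sampled arms it picks the least sampled one, which is one admissible choice. The function names and the default constant `c = 1.0` are illustrative assumptions, not values from the paper.

```python
import math


def d_tracking_choice(counts, w_hat):
    """Index of the next arm to sample under the D-Tracking rule."""
    t = sum(counts)
    num_arms = len(counts)
    if t == 0:
        return 0  # nothing sampled yet: start with the first arm
    # Forced exploration: arms with fewer than sqrt(t) - K/2 samples go first.
    starved = [i for i in range(num_arms)
               if counts[i] < math.sqrt(t) - num_arms / 2.0]
    if starved:
        return min(starved, key=lambda i: counts[i])
    # Otherwise track the estimated optimal proportions w_hat.
    return max(range(num_arms), key=lambda i: w_hat[i] - counts[i] / t)


def stopping_statistic(counts, mu_hat, halfspaces):
    """inf over the union of half-spaces of sum_i N_i * (mu_hat_i - nu_i)^2."""
    best = math.inf
    for a, b in halfspaces:
        gap = b - sum(ak * mk for ak, mk in zip(a, mu_hat))
        if gap <= 0.0:
            return 0.0  # mu_hat lies in A^c; this sketch does not treat that case
        if any(n == 0 and ak != 0.0 for ak, n in zip(a, counts)):
            return 0.0  # an unsampled arm can be moved at no cost
        spread = sum(ak * ak / n for ak, n in zip(a, counts) if ak != 0.0)
        if spread > 0.0:
            best = min(best, gap * gap / spread)
    return best


def should_stop(counts, mu_hat, halfspaces, delta, c=1.0):
    """Threshold test with beta(t, delta) = log(c * t / delta)."""
    t = sum(counts)
    if t == 0:
        return False
    return stopping_statistic(counts, mu_hat, halfspaces) >= math.log(c * t / delta)


halfspaces = [((2.0, 1.0), 2.0), ((1.0, 2.0), 2.0)]
stat = stopping_statistic([50, 50], [0.0, 0.0], halfspaces)
```

With 50 samples per arm and empirical means at the origin, the statistic in this example is about 40, well above $\log(100/0.1) \approx 6.9$, so the rule would stop; with one sample per arm it would not.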


More information

GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION

GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION Chapter 4 GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION Alberto Cambini Department of Statistics and Applied Mathematics University of Pisa, Via Cosmo Ridolfi 10 56124

More information

Second Welfare Theorem

Second Welfare Theorem Second Welfare Theorem Econ 2100 Fall 2015 Lecture 18, November 2 Outline 1 Second Welfare Theorem From Last Class We want to state a prove a theorem that says that any Pareto optimal allocation is (part

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

Minimizing Cubic and Homogeneous Polynomials over Integers in the Plane

Minimizing Cubic and Homogeneous Polynomials over Integers in the Plane Minimizing Cubic and Homogeneous Polynomials over Integers in the Plane Alberto Del Pia Department of Industrial and Systems Engineering & Wisconsin Institutes for Discovery, University of Wisconsin-Madison

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

Introduction to Proofs in Analysis. updated December 5, By Edoh Y. Amiran Following the outline of notes by Donald Chalice INTRODUCTION

Introduction to Proofs in Analysis. updated December 5, By Edoh Y. Amiran Following the outline of notes by Donald Chalice INTRODUCTION Introduction to Proofs in Analysis updated December 5, 2016 By Edoh Y. Amiran Following the outline of notes by Donald Chalice INTRODUCTION Purpose. These notes intend to introduce four main notions from

More information

MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5

MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5 MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5.. The Arzela-Ascoli Theorem.. The Riemann mapping theorem Let X be a metric space, and let F be a family of continuous complex-valued functions on X. We have

More information

Topology, Math 581, Fall 2017 last updated: November 24, Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski

Topology, Math 581, Fall 2017 last updated: November 24, Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski Topology, Math 581, Fall 2017 last updated: November 24, 2017 1 Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski Class of August 17: Course and syllabus overview. Topology

More information

BALANCING GAUSSIAN VECTORS. 1. Introduction

BALANCING GAUSSIAN VECTORS. 1. Introduction BALANCING GAUSSIAN VECTORS KEVIN P. COSTELLO Abstract. Let x 1,... x n be independent normally distributed vectors on R d. We determine the distribution function of the minimum norm of the 2 n vectors

More information

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

A Parametric Simplex Algorithm for Linear Vector Optimization Problems A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear

More information

A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION

A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION O. SAVIN. Introduction In this paper we study the geometry of the sections for solutions to the Monge- Ampere equation det D 2 u = f, u

More information

Automorphism groups of wreath product digraphs

Automorphism groups of wreath product digraphs Automorphism groups of wreath product digraphs Edward Dobson Department of Mathematics and Statistics Mississippi State University PO Drawer MA Mississippi State, MS 39762 USA dobson@math.msstate.edu Joy

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Discrete Geometry. Problem 1. Austin Mohr. April 26, 2012

Discrete Geometry. Problem 1. Austin Mohr. April 26, 2012 Discrete Geometry Austin Mohr April 26, 2012 Problem 1 Theorem 1 (Linear Programming Duality). Suppose x, y, b, c R n and A R n n, Ax b, x 0, A T y c, and y 0. If x maximizes c T x and y minimizes b T

More information

B. Appendix B. Topological vector spaces

B. Appendix B. Topological vector spaces B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

Module 1. Probability

Module 1. Probability Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive

More information

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 While these notes are under construction, I expect there will be many typos. The main reference for this is volume 1 of Hörmander, The analysis of liner

More information

Lecture 4: Lower Bounds (ending); Thompson Sampling

Lecture 4: Lower Bounds (ending); Thompson Sampling CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from

More information

Elements of Convex Optimization Theory

Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

More information

1 Review Session. 1.1 Lecture 2

1 Review Session. 1.1 Lecture 2 1 Review Session Note: The following lists give an overview of the material that was covered in the lectures and sections. Your TF will go through these lists. If anything is unclear or you have questions

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

SPECTRAL PROPERTIES AND NODAL SOLUTIONS FOR SECOND-ORDER, m-point, BOUNDARY VALUE PROBLEMS

SPECTRAL PROPERTIES AND NODAL SOLUTIONS FOR SECOND-ORDER, m-point, BOUNDARY VALUE PROBLEMS SPECTRAL PROPERTIES AND NODAL SOLUTIONS FOR SECOND-ORDER, m-point, BOUNDARY VALUE PROBLEMS BRYAN P. RYNNE Abstract. We consider the m-point boundary value problem consisting of the equation u = f(u), on

More information

Irredundant Families of Subcubes

Irredundant Families of Subcubes Irredundant Families of Subcubes David Ellis January 2010 Abstract We consider the problem of finding the maximum possible size of a family of -dimensional subcubes of the n-cube {0, 1} n, none of which

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Lecture 3: Lower Bounds for Bandit Algorithms

Lecture 3: Lower Bounds for Bandit Algorithms CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Two generic principles in modern bandits: the optimistic principle and Thompson sampling

Two generic principles in modern bandits: the optimistic principle and Thompson sampling Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle

More information

UTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING

UTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING UTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING J. TEICHMANN Abstract. We introduce the main concepts of duality theory for utility optimization in a setting of finitely many economic scenarios. 1. Utility

More information

DIFFERENTIAL GEOMETRY 1 PROBLEM SET 1 SOLUTIONS

DIFFERENTIAL GEOMETRY 1 PROBLEM SET 1 SOLUTIONS DIFFERENTIAL GEOMETRY PROBLEM SET SOLUTIONS Lee: -4,--5,-6,-7 Problem -4: If k is an integer between 0 and min m, n, show that the set of m n matrices whose rank is at least k is an open submanifold of

More information

Mathematics 530. Practice Problems. n + 1 }

Mathematics 530. Practice Problems. n + 1 } Department of Mathematical Sciences University of Delaware Prof. T. Angell October 19, 2015 Mathematics 530 Practice Problems 1. Recall that an indifference relation on a partially ordered set is defined

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Facets for Node-Capacitated Multicut Polytopes from Path-Block Cycles with Two Common Nodes

Facets for Node-Capacitated Multicut Polytopes from Path-Block Cycles with Two Common Nodes Facets for Node-Capacitated Multicut Polytopes from Path-Block Cycles with Two Common Nodes Michael M. Sørensen July 2016 Abstract Path-block-cycle inequalities are valid, and sometimes facet-defining,

More information

Rose-Hulman Undergraduate Mathematics Journal

Rose-Hulman Undergraduate Mathematics Journal Rose-Hulman Undergraduate Mathematics Journal Volume 17 Issue 1 Article 5 Reversing A Doodle Bryan A. Curtis Metropolitan State University of Denver Follow this and additional works at: http://scholar.rose-hulman.edu/rhumj

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha

More information

10725/36725 Optimization Homework 2 Solutions

10725/36725 Optimization Homework 2 Solutions 10725/36725 Optimization Homework 2 Solutions 1 Convexity (Kevin) 1.1 Sets Let A R n be a closed set with non-empty interior that has a supporting hyperplane at every point on its boundary. (a) Show that

More information

The Caratheodory Construction of Measures

The Caratheodory Construction of Measures Chapter 5 The Caratheodory Construction of Measures Recall how our construction of Lebesgue measure in Chapter 2 proceeded from an initial notion of the size of a very restricted class of subsets of R,

More information

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS TSOGTGEREL GANTUMUR Abstract. After establishing discrete spectra for a large class of elliptic operators, we present some fundamental spectral properties

More information

MATH 426, TOPOLOGY. p 1.

MATH 426, TOPOLOGY. p 1. MATH 426, TOPOLOGY THE p-norms In this document we assume an extended real line, where is an element greater than all real numbers; the interval notation [1, ] will be used to mean [1, ) { }. 1. THE p

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24

More information

Centre for Mathematics and Its Applications The Australian National University Canberra, ACT 0200 Australia. 1. Introduction

Centre for Mathematics and Its Applications The Australian National University Canberra, ACT 0200 Australia. 1. Introduction ON LOCALLY CONVEX HYPERSURFACES WITH BOUNDARY Neil S. Trudinger Xu-Jia Wang Centre for Mathematics and Its Applications The Australian National University Canberra, ACT 0200 Australia Abstract. In this

More information

Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem

Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem Chapter 34 Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem This chapter proves the Gärtner-Ellis theorem, establishing an LDP for not-too-dependent processes taking values in

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

Notes on Complex Analysis

Notes on Complex Analysis Michael Papadimitrakis Notes on Complex Analysis Department of Mathematics University of Crete Contents The complex plane.. The complex plane...................................2 Argument and polar representation.........................

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

THE STRUCTURE OF 3-CONNECTED MATROIDS OF PATH WIDTH THREE

THE STRUCTURE OF 3-CONNECTED MATROIDS OF PATH WIDTH THREE THE STRUCTURE OF 3-CONNECTED MATROIDS OF PATH WIDTH THREE RHIANNON HALL, JAMES OXLEY, AND CHARLES SEMPLE Abstract. A 3-connected matroid M is sequential or has path width 3 if its ground set E(M) has a

More information

Mathematics for Economists

Mathematics for Economists Mathematics for Economists Victor Filipe Sao Paulo School of Economics FGV Metric Spaces: Basic Definitions Victor Filipe (EESP/FGV) Mathematics for Economists Jan.-Feb. 2017 1 / 34 Definitions and Examples

More information

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.

More information

Problem Set 2: Solutions Math 201A: Fall 2016

Problem Set 2: Solutions Math 201A: Fall 2016 Problem Set 2: s Math 201A: Fall 2016 Problem 1. (a) Prove that a closed subset of a complete metric space is complete. (b) Prove that a closed subset of a compact metric space is compact. (c) Prove that

More information