Tuning bandit algorithms in stochastic environments

Jean-Yves Audibert (1), Rémi Munos (2) and Csaba Szepesvári (3)

(1) CERTIS - Ecole des Ponts, 19, rue Alfred Nobel - Cité Descartes, Marne-la-Vallée - France, audibert@certis.enpc.fr
(2) INRIA Futurs Lille, SequeL project, 50 avenue Halley, Villeneuve d'Ascq, France, remi.munos@inria.fr
(3) University of Alberta, Edmonton T6G 2E8, Canada, szepesva@cs.ualberta.ca

Abstract. Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and, for the first time, the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except for some very special bandit problems, for upper confidence bound based algorithms with standard bias sequences, the regret concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of achieving much worse than logarithmic cumulative regret is also taken into account.

1 Introduction and notations

In this paper we consider stochastic multi-armed bandit problems. The original motivation of bandit problems comes from the desire to optimize efficiency in clinical trials when the decision maker can choose between treatments but initially he does not know which of the treatments is the most effective one [9]. Multi-armed bandit problems became popular with the seminal paper of Robbins [8], after which they have found applications in diverse fields, such as control, economics, statistics, or learning theory.

Formally, a $K$-armed bandit problem ($K \ge 2$) is defined by $K$ distributions, $\nu_1, \dots, \nu_K$, one for each "arm" of the bandit. Imagine a gambler playing with these $K$ slot machines. The gambler can pull the arm of any of the machines. Successive plays of arm $k$ yield a sequence of independent and identically distributed (i.i.d.) real-valued random variables $X_{k,1}, X_{k,2}, \dots$, coming from the distribution $\nu_k$. The random variable $X_{k,t}$ is the payoff (or reward) of the $k$-th arm when this arm is pulled the $t$-th time. Independence also holds for rewards across the different arms. The gambler facing the bandit problem wants to pull the arms so as to maximize his cumulative payoff. The problem is made challenging by assuming that the payoff distributions are initially unknown. Thus the gambler must use exploratory actions in order to learn the utility of the individual arms, making his decisions based on the available past information. However, exploration has to be carefully controlled since excessive exploration may lead to unnecessary losses. Hence, efficient online algorithms must find the right balance between exploration and exploitation. Since the gambler cannot use the distributions of the arms (which are not available to him) he must follow a policy, which is a mapping from the space of possible histories, $\bigcup_{t \in \mathbb{N}^+} \{1, \dots, K\}^t \times \mathbb{R}^t$, into the set $\{1, \dots, K\}$, which indexes the arms. Let $\mu_k = E[X_{k,1}]$ denote the expected reward of arm $k$.$^4$ By definition, an optimal arm is an arm having the largest expected reward. We will use $k^*$ to denote the index of such an arm. Let the optimal expected reward be $\mu^* = \max_{1 \le k \le K} \mu_k$. Further, let $T_k(t)$ denote the number of times arm $k$ is chosen by the policy during the first $t$ plays and let $I_t$ denote the arm played at time $t$. The (cumulative) regret at time $n$ is defined by

$$\hat R_n \triangleq \sum_{t=1}^{n} X_{k^*,t} - \sum_{t=1}^{n} X_{I_t, T_{I_t}(t)}.$$

$^4$ $\mathbb{N}$ denotes the set of natural numbers, including zero, and $\mathbb{N}^+$ denotes the set of positive integers.

Oftentimes, the goal is to minimize the expected (cumulative) regret of the policy, $E[\hat R_n]$. Clearly, this is equivalent to maximizing the total expected reward achieved up to time $n$. It turns out that the expected regret satisfies

$$E[\hat R_n] = \sum_{k=1}^{K} E[T_k(n)]\, \Delta_k,$$

where $\Delta_k = \mu^* - \mu_k$ is the expected loss of playing arm $k$. Hence, an algorithm that aims at minimizing the expected regret should minimize the expected sampling times of suboptimal arms.

Early papers studied stochastic bandit problems under Bayesian assumptions (e.g., [5]). Lai and Robbins [6] studied bandit problems with parametric uncertainties. They introduced an algorithm that follows what is now called the "optimism in the face of uncertainty" principle. Their algorithm computes upper confidence bounds for all the arms by maximizing the expected payoff when the parameters are varied within appropriate confidence sets derived for the parameters. Then the algorithm chooses the arm with the highest such bound. They show that the expected regret increases only logarithmically with the number of trials and prove that the regret is asymptotically the smallest possible up to a sublogarithmic factor for the considered family of distributions. Agrawal has shown how to construct such optimal policies starting from the sample-means of the arms [1]. More recently, Auer et al. considered the case when the rewards come from a bounded support, say $[0, b]$, but otherwise the reward distributions are unconstrained [3]. They have studied several policies, most notably UCB1, which constructs the Upper Confidence Bounds (UCB) for arm $k$ at time $t$ by adding the bias factor $\sqrt{\frac{2 b^2 \log t}{T_k(t-1)}}$ to its sample-mean. They have proven that the expected regret of this algorithm satisfies

$$E[\hat R_n] \le 8 \Big( \sum_{k: \mu_k < \mu^*} \frac{b^2}{\Delta_k} \Big) \log(n) + O(1). \tag{1}$$

In the same paper they propose UCB1-NORMAL, which is designed to work with normally distributed rewards only. This algorithm estimates the variance of the arms and uses these estimates to refine the bias factor. They show that for this algorithm, when the rewards are indeed normally distributed with means $\mu_k$ and variances $\sigma_k^2$,

$$E[\hat R_n] \le 8 \sum_{k: \mu_k < \mu^*} \Big( \frac{32 \sigma_k^2}{\Delta_k} + \Delta_k \Big) \log(n) + O(1). \tag{2}$$

Note that one major difference of this result and the previous one is that the regret bound for UCB1 scales with $b^2$, while the regret bound for UCB1-NORMAL scales with the variances of the arms. First, let us note that it can be proven that the scaling behavior of the regret bound with $b$ is not a proof artifact: the expected regret indeed scales with $\Omega(b^2)$. Since $b$ is typically just an a priori guess on the size of the interval containing the rewards, which might be overly conservative, it is desirable to lessen the dependence on it. Auer et al. introduced another algorithm, UCB1-Tuned, in the experimental section of their paper.
This algorithm, similarly to UCB1-NORMAL, uses the empirical estimates of the variance in the bias sequence. Although no theoretical guarantees were derived for UCB1-Tuned, this algorithm has been shown to outperform the other algorithms considered in the paper in essentially all the experiments. The superiority of this algorithm has been reconfirmed recently in the latest Pascal Challenge [4]. Intuitively, algorithms using variance estimates should work better than UCB1 when the variance of some suboptimal arm is much smaller than $b^2$, since these arms will be less often drawn: suboptimal arms are more easily spotted by algorithms using variance estimates.

In this paper we study the regret of UCB-V, which is a generic UCB algorithm that uses variance estimates in the bias sequence. In particular, the bias sequences of UCB-V take the form

$$\sqrt{\frac{2 V_{k,T_k(t-1)}\, E_{T_k(t-1),t}}{T_k(t-1)}} + c\, \frac{3 b\, E_{T_k(t-1),t}}{T_k(t-1)},$$

where $V_{k,s}$ is the empirical variance estimate for arm $k$ based on $s$ samples, and $E$ (viewed as a function of $(s, t)$) is a so-called exploration function for which a typical choice is $E_{s,t} = \zeta \log(t)$ (thus in this case, $E$ is independent of $s$). Here $\zeta, c > 0$ are tuning parameters that can be used to control the behavior of the algorithm.

One major result of the paper (Corollary 1) is a bound on the expected regret that scales in an improved fashion with $b$. In particular, we show that for a particular setting of the parameters of the algorithm,

$$E[\hat R_n] \le 10 \sum_{k: \mu_k < \mu^*} \Big( \frac{\sigma_k^2}{\Delta_k} + 2b \Big) \log(n).$$

The main difference to the bound (1) is that $b^2$ is replaced by $\sigma_k^2$, though $b$ still appears in the bound. This is indeed the major difference to the bound (2).$^5$ In order to prove this result we will prove a novel tail bound on the sample average of i.i.d. random variables with bounded support that, unlike previous similar bounds, involves the empirical variance and which may be of independent interest (Theorem 1). Otherwise, the proof of the regret bound involves the analysis of the sampling times of suboptimal arms (Theorem 2), which contains significant advances compared with the one in [3]. This way we obtain results on the expected regret for a wide class of exploration functions (Theorem 3). For the "standard" logarithmic sequences we will give lower limits on the tuning parameters: if the tuning parameters are below these limits the loss goes up considerably (Theorems 4 and 5).

$^5$ Although this is unfortunate, it is possible to show that the dependence on $b$ is unavoidable.

The second major contribution of the paper is the analysis of the risk that the studied upper confidence based policies have a regret much higher than its expected value. To the best of our knowledge no such analysis existed for this class of algorithms so far. In order to analyze this risk, we define the (cumulative) pseudo-regret at time $n$ via

$$R_n = \sum_{k=1}^{K} T_k(n)\, \Delta_k.$$

Note that the expectations of the pseudo-regret and the regret are the same: $E[R_n] = E[\hat R_n]$. The difference between the regret and the pseudo-regret comes from the randomness of the rewards. Sections 4 and 5 develop high probability bounds for the pseudo-regret. The same kind of formulae can be obtained for the cumulative regret (see Remark 2).
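To make the distinction between the regret $\hat R_n$ and the pseudo-regret $R_n$ concrete, here is a minimal Python sketch. The function names and the tiny reward table are hypothetical illustrations, not part of the paper; `rewards[k][s]` stands for $X_{k,s+1}$ and `plays` for the sequence $I_1, \dots, I_n$ (arms are 0-indexed in the code).

```python
from typing import List, Sequence


def pseudo_regret(plays: Sequence[int], means: Sequence[float]) -> float:
    """R_n = sum_k T_k(n) * Delta_k, with Delta_k = mu* - mu_k."""
    mu_star = max(means)
    counts = [0] * len(means)
    for k in plays:
        counts[k] += 1
    return sum(c * (mu_star - m) for c, m in zip(counts, means))


def regret(plays: Sequence[int], rewards: List[List[float]], best_arm: int) -> float:
    """hat R_n = sum_t X_{k*,t} - sum_t X_{I_t, T_{I_t}(t)}."""
    n = len(plays)
    # Payoff of always pulling the optimal arm: its first n rewards.
    best_total = sum(rewards[best_arm][t] for t in range(n))
    counts = [0] * len(rewards)
    collected = 0.0
    for k in plays:
        # The s-th pull of arm k receives the s-th reward of that arm.
        collected += rewards[k][counts[k]]
        counts[k] += 1
    return best_total - collected


# Hypothetical example: arm 0 is optimal (true means 0.75 and 0.25).
means = [0.75, 0.25]
rewards = [[1.0, 0.0, 1.0, 1.0], [0.0, 0.5, 0.0, 0.5]]
plays = [0, 1, 0, 0]
print(pseudo_regret(plays, means))  # depends only on the pull counts
print(regret(plays, rewards, best_arm=0))  # also depends on the realized rewards
```

The pseudo-regret only sees how often each arm was pulled, while the regret also carries the noise of the realized rewards, which is why the two can differ on a single run even though their expectations coincide.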

Interestingly, our analysis revealed some tradeoffs that we did not expect: as it turns out, if one aims for logarithmic expected regret (or, more generally, for subpolynomial regret) then the regret does not necessarily concentrate exponentially fast around its mean (Theorem 7). In fact, this is the case when with positive probability the optimal arm yields a reward smaller than the expected reward of some suboptimal arm. Take for example two arms satisfying this condition and with $\mu_1 > \mu_2$: the first arm is the optimal arm and $\Delta = \mu_1 - \mu_2 > 0$. Then the distribution of the pseudo-regret at time $n$ will have two modes, one at $\Omega(\log n)$ and the other at $\Omega(\Delta n)$. The probability mass associated with this second mode will decay polynomially with $n$, where the rate of decay depends on $\Delta$. Above the second mode the distribution decays exponentially. By increasing the exploration rate the situation can be improved. Our risk tail bound (Theorem 6) makes this dependence explicit. Of course, increasing the exploration rate increases the expected regret, hence the tradeoff between the expected regret and the risk of achieving much worse than the expected regret. One lesson is thus that if in an application risk is important then it might be better to increase the exploration rate.

In Section 5, we study a variant of the algorithm obtained by $E_{s,t} = E_s$. In particular, we show that with an appropriate choice of $E_s = E_s(\beta)$, for any $0 < \beta < 1$, for an infinite number of plays, the algorithm achieves finite cumulative regret with probability $1 - \beta$ (Theorem 8). Hence, we name this variant PAC-UCB ("Probably approximately correct UCB"). Besides, for a finite time-horizon $n$, choosing $\beta = 1/n$ then yields a logarithmic bound on the regret that fails with probability $O(1/n)$ only. This should be compared with the bound $O(1/\log(n)^a)$, $a > 0$, obtained for the standard choice $E_{s,t} = \zeta \log t$ in Corollary 2. Hence, we conjecture that knowing the time horizon might represent a significant advantage.

Due to limited space, some of the proofs are absent from this paper. All the proofs can be found in the extended version [2].

2 The UCB-V algorithm

For any $k \in \{1, \dots, K\}$ and $t \in \mathbb{N}$, let $\bar X_{k,t}$ and $V_{k,t}$ be the empirical estimates of the mean payoff and variance of arm $k$:

$$\bar X_{k,t} \triangleq \frac{1}{t} \sum_{i=1}^{t} X_{k,i} \quad \text{and} \quad V_{k,t} \triangleq \frac{1}{t} \sum_{i=1}^{t} (X_{k,i} - \bar X_{k,t})^2,$$

where by convention $\bar X_{k,0} \triangleq 0$ and $V_{k,0} \triangleq 0$. We recall that an optimal arm is an arm having the best expected reward, $k^* \in \operatorname{argmax}_{k \in \{1, \dots, K\}} \mu_k$. We denote quantities related to the optimal arm by putting $*$ in the upper index. In the following, we assume that the rewards are bounded. Without loss of generality, we may assume that all the rewards are almost surely in $[0, b]$, with $b > 0$. We summarize our assumptions on the reward sequence here:

Assumptions: Let $K > 2$, and let $\nu_1, \dots, \nu_K$ be distributions over the reals with support $[0, b]$. For $1 \le k \le K$, let $\{X_{k,t}\} \sim \nu_k$ be an i.i.d. sequence of random variables specifying the rewards for arm $k$.$^6$ Assume that the rewards of different arms are independent of each other, i.e., for any $k, k'$, $1 \le k < k' \le K$, $t \in \mathbb{N}^+$, the collections of random variables $(X_{k,1}, \dots, X_{k,t})$ and $(X_{k',1}, \dots, X_{k',t})$ are independent of each other.

$^6$ The i.i.d. assumption can be relaxed, see e.g., [7].

2.1 The algorithm

Let $c \ge 0$. Let $E = (E_{s,t})_{s \ge 0, t \ge 0}$ be nonnegative real numbers such that for any $s \ge 0$, the function $t \mapsto E_{s,t}$ is nondecreasing. We shall call $E$ (viewed as a function of $(s, t)$) the exploration function. For any arm $k$ and any nonnegative integers $s, t$, introduce

$$B_{k,s,t} \triangleq \bar X_{k,s} + \sqrt{\frac{2 V_{k,s} E_{s,t}}{s}} + c\, \frac{3 b E_{s,t}}{s} \tag{3}$$

with the convention $1/0 = +\infty$.

UCB-V policy: At time $t$, play an arm maximizing $B_{k,T_k(t-1),t}$.

Let us roughly describe the behaviour of the algorithm. At the beginning (i.e., for small $t$), every arm that has not been drawn is associated with an infinite bound which will become finite as soon as the arm is drawn. The more an arm is drawn, the closer the bound (3) gets to its first term, and thus, from the law of large numbers, to the expected reward $\mu_k$. So the procedure will hopefully tend to draw more often arms having the greatest expected rewards. Nevertheless, since the obtained rewards are stochastic it might happen that during the first draws the (unknown) optimal arm always gives low rewards. Fortunately, if the optimal arm has not been drawn too often (i.e., small $T_{k^*}(t-1)$), for appropriate choices of $E$ (when $E_{s,t}$ increases without bound in $t$ for any fixed $s$), after a while the last term of (3) will start to dominate the two other terms and will also dominate the bound associated with the arms drawn very often. Thus the optimal arm will be drawn even if the empirical mean of the obtained rewards, $\bar X_{k^*,T_{k^*}(t-1)}$, is small. More generally, such choices of $E$ lead to the exploration of arms with inferior empirical mean. This is why $E$ is referred to as the exploration function. Naturally, a high-valued exploration function also leads to drawing suboptimal arms often. Therefore the choice of $E$ is crucial in order to explore possibly optimal arms while continuing to exploit (what looks like) the optimal arm.

The actual form of $B_{k,s,t}$ comes from a novel tail bound (Theorem 1) on the sample average of i.i.d. random variables with bounded support that, unlike previous similar bounds (Bennett's and Bernstein's inequalities), involves the empirical variance.
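Before turning to the tail bound, the index (3) and the UCB-V selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exploration function is taken to be $E_{s,t} = \zeta \log t$, and the names `exploration`, `ucbv_index` and `choose_arm` as well as the default values `c=1.0`, `zeta=1.2` are hypothetical choices, not an implementation provided by the paper.

```python
import math


def exploration(t: int, zeta: float = 1.2) -> float:
    """Assumed exploration function E_{s,t} = zeta * log(t), for t >= 1."""
    return zeta * math.log(t)


def ucbv_index(mean: float, var: float, s: int, t: int, b: float,
               c: float = 1.0, zeta: float = 1.2) -> float:
    """B_{k,s,t} of equation (3), given the empirical mean and variance
    of an arm after s pulls; rewards are assumed to lie in [0, b]."""
    if s == 0:
        # Convention 1/0 = +infinity: unpulled arms get an infinite bound.
        return math.inf
    e = exploration(t, zeta)
    return mean + math.sqrt(2.0 * var * e / s) + c * 3.0 * b * e / s


def choose_arm(means, variances, counts, t, b, c=1.0, zeta=1.2) -> int:
    """UCB-V policy: play an arm maximizing B_{k,T_k(t-1),t}
    (ties are broken in favour of the lowest index)."""
    scores = [ucbv_index(m, v, s, t, b, c, zeta)
              for m, v, s in zip(means, variances, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# An arm that was never pulled wins regardless of the other arm's statistics.
print(choose_arm([0.9, 0.0], [0.0, 0.0], [3, 0], t=4, b=1.0))
```

For a fixed time $t$, the index shrinks toward the empirical mean as the pull count $s$ grows, which is the behaviour described in the paragraph above.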

Theorem 1. Let $X_1, \dots, X_t$ be i.i.d. random variables taking their values in $[0, b]$. Let $\mu = E[X_1]$ be their common expected value. Consider the empirical expectation $\bar X_t$ and variance $V_t$ defined respectively by

$$\bar X_t = \frac{\sum_{i=1}^{t} X_i}{t} \quad \text{and} \quad V_t = \frac{\sum_{i=1}^{t} (X_i - \bar X_t)^2}{t}.$$

Then for any $t \in \mathbb{N}$ and $x > 0$, with probability at least $1 - 3e^{-x}$,

$$|\bar X_t - \mu| \le \sqrt{\frac{2 V_t x}{t}} + \frac{3 b x}{t}. \tag{4}$$

Furthermore, introducing

$$\beta(x, t) = 3 \inf_{1 < \alpha \le 3} \Big( \frac{\log t}{\log \alpha} \wedge t \Big) e^{-x/\alpha}, \tag{5}$$

where $u \wedge v$ denotes the minimum of $u$ and $v$, we have for any $t \in \mathbb{N}$ and $x > 0$, with probability at least $1 - \beta(x, t)$,

$$|\bar X_s - \mu| \le \sqrt{\frac{2 V_s x}{s}} + \frac{3 b x}{s} \tag{6}$$

holds simultaneously for $s \in \{1, 2, \dots, t\}$.

Remark 1. The uniformity in time is the only difference between the two assertions of the previous theorem. When we use (6), the values of $x$ and $t$ will be such that $\beta(x, t)$ is of order of $3e^{-x}$, hence there will be no real price to pay for writing a version of (4) that is uniform in time. In particular, this means that if $1 \le S \le t$ is a random variable then (4) still holds with probability at least $1 - \beta(x, t)$ when the sample size is replaced with $S$. Note that (4) is useless for $t \le 3$ since its right-hand side is larger than $b$. For any arm $k$, time $t$ and integer $1 \le s \le t$ we may apply Theorem 1 to the rewards $X_{k,1}, \dots, X_{k,s}$, and obtain that with probability at least $1 - 3 \sum_{s=4}^{\infty} e^{-(c \wedge 1) E_{s,t}}$, we have $\mu_k \le B_{k,s,t}$. Hence, by our previous remark, at time $t$ with high probability (for a high-valued exploration function $E$) the expected reward of arm $k$ is upper bounded by $B_{k,T_k(t-1),t}$.

The user of the generic UCB-V policy has two parameters to tune: the exploration function $E$ and the positive real number $c$. A cumbersome technical analysis (not reproduced here) shows that there are essentially two interesting types of exploration functions:
- the ones in which $E_{s,t}$ depends only on $t$ (see Sections 3 and 4);
- the ones in which $E_{s,t}$ depends only on $s$ (see Section 5).

2.2 Bounds for the sampling times of suboptimal arms

The natural way of bounding the regret of UCB policies is to bound the number of times suboptimal arms are drawn. The bounds presented here significantly improve the ones used in [3]. The improvement is a necessary step to get tight bounds for the interesting case where the exploration function is logarithmic.

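Theorem 2. After $K$ plays, each arm has been pulled once. Let arm $k$ and time $n \in \mathbb{N}^+$ be fixed. For any $\tau \in \mathbb{R}$ and any integer $u > 1$, we have

$$T_k(n) \le u + \sum_{t=u+K-1}^{n} \Big( \mathbf{1}_{\{\exists s:\, u \le s \le t-1 \text{ s.t. } B_{k,s,t} > \tau\}} + \mathbf{1}_{\{\exists s^*:\, 1 \le s^* \le t-1 \text{ s.t. } \tau \ge B_{k^*,s^*,t}\}} \Big), \tag{7}$$

hence

$$E[T_k(n)] \le u + \sum_{t=u+K-1}^{n} \sum_{s=u}^{t-1} P\big( B_{k,s,t} > \tau \big) + \sum_{t=u+K-1}^{n} P\big( \exists s: 1 \le s \le t-1 \text{ s.t. } B_{k^*,s,t} \le \tau \big). \tag{8}$$

Besides we have

$$P\big( T_k(n) > u \big) \le \sum_{t=3}^{n} P\big( B_{k,u,t} > \tau \big) + P\big( \exists s: 1 \le s \le n-u \text{ s.t. } B_{k^*,s,u+s} \le \tau \big). \tag{9}$$

Even if the above statements hold for any arm, they will be only useful for suboptimal arms.

Proof. The first assertion is trivial since at the beginning all arms have an infinite UCB, which becomes finite as soon as the arm has been played once. To obtain (7), we note that

$$T_k(n) - u \le \sum_{t=u+K-1}^{n} \mathbf{1}_{\{I_t = k;\, T_k(t) > u\}} = \sum_{t=u+K-1}^{n} Z_{k,t,u},$$

where

$$Z_{k,t,u} = \mathbf{1}_{\{I_t = k;\; u \le T_k(t-1);\; 1 \le T_{k^*}(t-1);\; B_{k,T_k(t-1),t} \ge B_{k^*,T_{k^*}(t-1),t}\}} \le \mathbf{1}_{\{\exists s:\, u \le s \le t-1 \text{ s.t. } B_{k,s,t} > \tau\}} + \mathbf{1}_{\{\exists s^*:\, 1 \le s^* \le t-1 \text{ s.t. } \tau \ge B_{k^*,s^*,t}\}}.$$

Taking the expectation on both sides of (7) and using the probability union bound, we obtain (8). Finally, (9) comes from a more direct argument that uses the fact that the exploration function $E_{s,t}$ is a nondecreasing function with respect to $t$. Consider an event such that the following statements hold:

$$\forall t: 3 \le t \le n, \; B_{k,u,t} \le \tau, \qquad \forall s: 1 \le s \le n-u, \; B_{k^*,s,u+s} > \tau.$$

Then for any $1 \le s \le n-u$ and $u+s \le t \le n$,

$$B_{k^*,s,t} \ge B_{k^*,s,u+s} > \tau \ge B_{k,u,t}.$$

This implies that arm $k$ will not be pulled a $(u+1)$-th time. Therefore we have proved by contradiction that

$$\{ T_k(n) > u \} \subset \Big( \{ \exists t: 3 \le t \le n \text{ s.t. } B_{k,u,t} > \tau \} \cup \{ \exists s: 1 \le s \le n-u \text{ s.t. } B_{k^*,s,u+s} \le \tau \} \Big), \tag{10}$$

which by taking probabilities of both sides gives the announced result.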
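Theorem 2 concerns the sampling counts $T_k(n)$. As an illustration of what these counts look like in practice, the following Python sketch runs the UCB-V policy on simulated arms and returns the counts. Everything specific here is an assumption made for the example: Bernoulli rewards (so $b = 1$), the exploration function $E_t = \zeta \log t$ with $\zeta = 1.2$, $c = 1$, and the hypothetical function name `run_ucbv`.

```python
import math
import random


def run_ucbv(arm_probs, n, b=1.0, c=1.0, zeta=1.2, seed=0):
    """Simulate n plays of UCB-V on Bernoulli arms; return [T_1(n), ..., T_K(n)]."""
    rng = random.Random(seed)
    num_arms = len(arm_probs)
    counts = [0] * num_arms      # T_k(t-1)
    sums = [0.0] * num_arms      # sum of rewards of arm k
    sq_sums = [0.0] * num_arms   # sum of squared rewards of arm k

    for t in range(1, n + 1):
        e = zeta * math.log(t)   # exploration function E_t

        def index(k):
            s = counts[k]
            if s == 0:
                return math.inf  # unpulled arms have an infinite bound
            mean = sums[k] / s
            # Empirical variance with normalization 1/s, clipped at 0 for rounding.
            var = max(sq_sums[k] / s - mean * mean, 0.0)
            return mean + math.sqrt(2.0 * var * e / s) + c * 3.0 * b * e / s

        k = max(range(num_arms), key=index)
        x = 1.0 if rng.random() < arm_probs[k] else 0.0
        counts[k] += 1
        sums[k] += x
        sq_sums[k] += x * x
    return counts


# After K plays each arm has been pulled exactly once (first claim of Theorem 2).
print(run_ucbv([0.9, 0.5, 0.1], n=3))
# Longer run: inspect how the pulls are distributed across the arms.
print(run_ucbv([0.9, 0.5, 0.1], n=2000))
```

The first call illustrates the initial claim of Theorem 2 deterministically; the second depends on the random seed and is only meant for inspecting how the pulls are distributed.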

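3 Expected regret of UCB-V

In this section, we consider that the exploration function does not depend on $s$ (still, $E = (E_t)_{t \ge 0}$ is a nondecreasing function of $t$). We will see that as far as the expected regret is concerned, a natural choice of $E_t$ is the logarithmic function and that $c$ should not be taken too small if one does not want to suffer polynomial regret instead of logarithmic regret. We derive bounds on the expected regret and conclude by specifying natural constraints on $c$ and $E_t$.

Theorem 3. We have

$$E[R_n] \le \sum_{k: \Delta_k > 0} \Big\{ 1 + 8 (c \vee 1) E_n \Big( \frac{\sigma_k^2}{\Delta_k^2} + \frac{2b}{\Delta_k} \Big) + n e^{-E_n} \Big( \frac{24 \sigma_k^2}{\Delta_k^2} + \frac{4b}{\Delta_k} \Big) + \sum_{t=16 E_n}^{n} \beta\big( (c \wedge 1) E_t, t \big) \Big\} \Delta_k, \tag{11}$$

where we recall that $\beta\big( (c \wedge 1) E_t, t \big)$ is essentially of order $e^{-(c \wedge 1) E_t}$ (see (5) and Remark 1).

Proof. Let $E'_n = (c \vee 1) E_n$. We use (8) with $u$ the smallest integer larger than $8 \big( \frac{\sigma_k^2}{\Delta_k^2} + \frac{2b}{\Delta_k} \big) E'_n$ and $\tau = \mu^*$. The above choice of $u$ guarantees that for any $u \le s < t$ and $t \ge 2$,

$$\sqrt{\frac{2 [\sigma_k^2 + b \Delta_k / 2] E_t}{s}} + 3bc \frac{E_t}{s} \le \sqrt{\frac{[2 \sigma_k^2 + b \Delta_k] E'_n}{u}} + 3b \frac{E'_n}{u} \le \sqrt{\frac{[2 \sigma_k^2 + b \Delta_k] \Delta_k^2}{8 [\sigma_k^2 + 2 b \Delta_k]}} + \frac{3 b \Delta_k^2}{8 [\sigma_k^2 + 2 b \Delta_k]} = \frac{\Delta_k}{2} \Big[ \sqrt{\frac{2 \sigma_k^2 + b \Delta_k}{2 \sigma_k^2 + 4 b \Delta_k}} + \frac{3 b \Delta_k}{4 \sigma_k^2 + 8 b \Delta_k} \Big] \le \frac{\Delta_k}{2}, \tag{12}$$

since the last inequality is equivalent to $(x - 1)^2 \ge 0$ for $x = \sqrt{\frac{2 \sigma_k^2 + b \Delta_k}{2 \sigma_k^2 + 4 b \Delta_k}}$. For any $s \ge u$ and $t \ge 2$, using (12), we have

$$P(B_{k,s,t} > \mu^*) = P\Big( \bar X_{k,s} + \sqrt{\frac{2 V_{k,s} E_t}{s}} + 3bc \frac{E_t}{s} > \mu_k + \Delta_k \Big)$$
$$\le P\Big( \bar X_{k,s} + \sqrt{\frac{2 [\sigma_k^2 + b \Delta_k / 2] E_t}{s}} + 3bc \frac{E_t}{s} > \mu_k + \Delta_k \Big) + P\big( V_{k,s} \ge \sigma_k^2 + b \Delta_k / 2 \big)$$
$$\le P\big( \bar X_{k,s} - \mu_k > \Delta_k / 2 \big) + P\Big( \frac{\sum_{j=1}^{s} (X_{k,j} - \mu_k)^2}{s} - \sigma_k^2 \ge b \Delta_k / 2 \Big)$$
$$\le 2 e^{-s \Delta_k^2 / (8 \sigma_k^2 + 4 b \Delta_k / 3)}, \tag{13}$$

where in the last step we used Bernstein's inequality twice. Summing up these probabilities we obtain

$$\sum_{s=u}^{t-1} P(B_{k,s,t} > \mu^*) \le 2 \sum_{s=u}^{\infty} e^{-s \Delta_k^2 / (8 \sigma_k^2 + 4 b \Delta_k / 3)} = \frac{2 e^{-u \Delta_k^2 / (8 \sigma_k^2 + 4 b \Delta_k / 3)}}{1 - e^{-\Delta_k^2 / (8 \sigma_k^2 + 4 b \Delta_k / 3)}} \le \Big( \frac{24 \sigma_k^2}{\Delta_k^2} + \frac{4b}{\Delta_k} \Big) e^{-u \Delta_k^2 / (8 \sigma_k^2 + 4 b \Delta_k / 3)} \le \Big( \frac{24 \sigma_k^2}{\Delta_k^2} + \frac{4b}{\Delta_k} \Big) e^{-E'_n}, \tag{14}$$

where we have used that $1 - e^{-x} \ge 2x/3$ for $0 \le x \le 3/4$. By using (6) of Theorem 1 to bound the other probability in (8), we obtain that

$$E[T_k(n)] \le 1 + 8 E'_n \Big( \frac{\sigma_k^2}{\Delta_k^2} + \frac{2b}{\Delta_k} \Big) + n e^{-E'_n} \Big( \frac{24 \sigma_k^2}{\Delta_k^2} + \frac{4b}{\Delta_k} \Big) + \sum_{t=u+1}^{n} \beta\big( (c \wedge 1) E_t, t \big),$$

which by $u \ge 16 E_n$ gives the announced result.

In order to balance the terms in (11) the exploration function should be chosen to be proportional to $\log t$. For this choice, the following corollary gives an explicit bound on the expected regret:

Corollary 1. If $c = 1$ and $E_t = \zeta \log t$ for $\zeta > 1$, then there exists a constant $c_\zeta$ depending only on $\zeta$ such that for $n \ge 2$

$$E[R_n] \le c_\zeta \sum_{k: \Delta_k > 0} \Big( \frac{\sigma_k^2}{\Delta_k} + 2b \Big) \log n. \tag{15}$$

For instance, for $\zeta = 1.2$, the result holds for $c_\zeta = 10$.

Proof (Sketch of the proof). The first part, (15), follows directly from Theorem 3. Let us thus turn to the numerical result. For $n \ge K$, we have $R_n \le b(n-1)$ (since in the first $K$ rounds, the optimal arm is chosen at least once). As a consequence, the numerical bound is nontrivial only for $20 \log n < n - 1$, so we only need to check the result for $n > 91$. For $n > 91$, we bound the constant term using $1 \le \frac{\log n}{\log 91} \le a_1 \frac{2b}{\Delta_k} (\log n)$, with $a_1 = 1/(2 \log 91)$. The second term between the brackets in (11) is bounded by $a_2 \big( \frac{\sigma_k^2}{\Delta_k^2} + \frac{2b}{\Delta_k} \big) \log n$, with $a_2 = 8 \times 1.2 = 9.6$. For the third term, we use that for $n > 91$, we have $24 n^{-0.2} < a_3 \log n$, with $a_3$ [...]. By tedious computations, the fourth term can be bounded by $a_4 \frac{2b}{\Delta_k} (\log n)$, with $a_4$ [...]. This gives the desired result since $a_1 + a_2 + a_3 + a_4 \le 10$.

As promised, Corollary 1 gives a logarithmic bound on the expected regret that has a linear dependence on the range of the reward, contrary to bounds on algorithms that do not take into account the empirical variance of the reward distributions (see e.g. the bound (1) that holds for UCB1).

The previous corollary is well complemented by the following result, which essentially says that we should not use $E_t = \zeta \log t$ with $\zeta < 1$.

Theorem 4. Consider $E_t = \zeta \log t$ and let $n$ denote the total number of draws. Whatever $c$ is, if $\zeta < 1$, then there exist some reward distributions (depending on $n$) such that
- the expected number of draws of suboptimal arms using the UCB-V algorithm is polynomial in the total number of draws;
- the UCB-V algorithm suffers a polynomial loss.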
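To see how the variance-dependent bound of Corollary 1 compares with the UCB1 bound (1), the leading terms of both can be evaluated numerically. The sketch below is an illustration only: the function names are hypothetical, the $O(1)$ term of (1) is ignored, and $c_\zeta = 10$ is the constant stated for $\zeta = 1.2$.

```python
import math


def ucb1_leading_term(gaps, b, n):
    """Leading term of bound (1): 8 * sum_k (b^2 / Delta_k) * log(n).
    The O(1) remainder of (1) is deliberately left out."""
    return 8.0 * sum(b * b / d for d in gaps) * math.log(n)


def ucbv_bound(variances, gaps, b, n, c_zeta=10.0):
    """Bound (15): c_zeta * sum_k (sigma_k^2 / Delta_k + 2b) * log(n)."""
    return c_zeta * sum(v / d + 2.0 * b for v, d in zip(variances, gaps)) * math.log(n)


n = 10_000
# One suboptimal arm with a small gap and a very small variance:
# the b^2 / Delta term of UCB1 dominates.
print(ucb1_leading_term([0.01], b=1.0, n=n), ucbv_bound([0.0001], [0.01], b=1.0, n=n))
# One suboptimal arm with a large gap and maximal variance for b = 1:
# the 2b term makes the UCB-V bound the larger of the two.
print(ucb1_leading_term([0.5], b=1.0, n=n), ucbv_bound([0.25], [0.5], b=1.0, n=n))
```

The two cases show both sides of the tradeoff: the UCB-V bound is much smaller when suboptimal arms have low variance and small gaps, but because of its $2b$ term and larger constant it is not uniformly smaller than the leading term of (1).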

So far we have seen that for $c = 1$ and $\zeta > 1$ we obtain a logarithmic regret, and that the constant $\zeta$ could not be taken below 1 (whatever $c$ is) without risking polynomial regret. Now we consider the last term in $B_{k,s,t}$, which is linear in the ratio $E_t / s$, and show that this term is also necessary to obtain a logarithmic regret, since we have:

Theorem 5. Consider $E_t = \zeta \log t$. Whatever $\zeta$ is, if $c \zeta < 1/6$, there exist probability distributions of the rewards such that the UCB-V algorithm suffers a polynomial loss.

To conclude the above analysis, natural values for the constants appearing in the bound are the following ones:

$$B_{k,s,t} \triangleq \bar X_{k,s} + \sqrt{\frac{2 V_{k,s} \log t}{s}} + \frac{b \log t}{2s}.$$

This choice corresponds to the critical exploration function $E_t = \log t$ and to $c = 1/6$, that is, the minimal associated value of $c$ in view of the previous theorem. In practice, it would be unwise (or risk seeking) to use smaller constants in front of the last two terms.

4 Concentration of the regret

In real life, people are not only interested in the expected rewards that they can obtain by some policy. They also want to estimate probabilities of obtaining much less reward than expected, hence they are interested in the concentration of the regret. This section starts with the study of the concentration of the pseudo-regret, since, as we will see in Remark 2, the concentration properties of the regret follow from the concentration properties of the pseudo-regret. We still assume that the exploration function does not depend on $s$ and that $E = (E_t)_{t \ge 0}$ is nondecreasing. Introduce

$$\beta_n(t) \triangleq 3 \min_{\substack{\alpha \ge 1,\; M \in \mathbb{N} \\ s_0 = 0 < s_1 < \dots < s_M = n \\ \text{s.t. } s_{j+1} \le \alpha (s_j + 1)}} \; \sum_{j=0}^{M-1} e^{-\frac{(c \wedge 1) E_{s_j + t + 1}}{\alpha}}. \tag{16}$$

We have seen in the previous section that to obtain logarithmic expected regret, it is natural to take a logarithmic exploration function. In this case, and also when the exploration function goes to infinity faster than the logarithmic function, the complicated sum of (16), up to second order logarithmic terms, is of the order of $e^{-(c \wedge 1) E_t}$. This can be seen by considering (disregarding rounding issues) the geometric grid $s_j = \alpha^j$ with $\alpha$ close to 1. Let $\lfloor x \rfloor$ still denote the largest integer smaller than or equal to $x$. The next theorem provides a bound for the tail of the pseudo-regret.

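Theorem 6. Let

$$v_k \triangleq 8 (c \vee 1) \Big( \frac{\sigma_k^2}{\Delta_k^2} + \frac{4b}{3 \Delta_k} \Big), \qquad r_0 \triangleq \sum_{k: \Delta_k > 0} \Delta_k \big( 1 + v_k E_n \big).$$

Then, for any $x \ge 1$, we have

$$P\big( R_n > r_0 x \big) \le \sum_{k: \Delta_k > 0} \Big\{ 2n e^{-(c \vee 1) E_n x} + \beta_n\big( \lfloor v_k E_n x \rfloor \big) \Big\}, \tag{17}$$

where we recall that $\beta_n(t)$ is essentially of order $e^{-(c \wedge 1) E_t}$ (see the above discussion).

Proof (sketch of the proof). First note that

$$P\big( R_n > r_0 x \big) = P\Big\{ \sum_{k: \Delta_k > 0} \Delta_k T_k(n) > \sum_{k: \Delta_k > 0} \Delta_k (1 + v_k E_n) x \Big\} \le \sum_{k: \Delta_k > 0} P\big\{ T_k(n) > (1 + v_k E_n) x \big\}.$$

Let $E'_n = (c \vee 1) E_n$. We use (9) with $\tau = \mu^*$ and $u = \lfloor (1 + v_k E_n) x \rfloor \ge v_k E_n x$. From (13), we have

$$P(B_{k,u,t} > \mu^*) \le 2 e^{-u \Delta_k^2 / (8 \sigma_k^2 + 4 b \Delta_k / 3)} \le 2 e^{-E'_n x}.$$

To bound the other probability in (9), we use $\alpha \ge 1$ and the grid $s_0, \dots, s_M$ of $\{1, \dots, n\}$ realizing the minimum of (16) when $t = u$. Let $I_j = \{s_j + 1, \dots, s_{j+1}\}$. Then

$$P\big( \exists s: 1 \le s \le n-u \text{ s.t. } B_{k^*,s,u+s} \le \mu^* \big) \le \sum_{j=0}^{M-1} P\big( \exists s \in I_j \text{ s.t. } B_{k^*,s,s_j+u+1} \le \mu^* \big)$$
$$\le \sum_{j=0}^{M-1} P\Big( \exists s \in I_j \text{ s.t. } s (\bar X_{k^*,s} - \mu^*) + \sqrt{2 s V_{k^*,s} E_{s_j+u+1}} + 3bc E_{s_j+u+1} \le 0 \Big)$$
$$\le 3 \sum_{j=0}^{M-1} e^{-\frac{(c \wedge 1) E_{s_j+u+1}}{\alpha}} = \beta_n(u) \le \beta_n\big( \lfloor v_k E_n x \rfloor \big),$$

where the second to last inequality comes from an appropriate union bound argument (see [2] for details).

When $E_n \ge \log n$, the last term is the leading term. In particular, when $c = 1$ and $E_t = \zeta \log t$ with $\zeta > 1$, Theorem 6 leads to the following corollary, which essentially says that for any $z > \gamma \log n$ with $\gamma$ large enough, for some constant $C > 0$:

$$P\big( R_n > z \big) \le \frac{C}{z^{\zeta}}.$$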

Corollary 2. When $c = 1$ and $E_t = \zeta \log t$ with $\zeta > 1$, there exist $\kappa_1 > 0$ and $\kappa_2 > 0$ depending only on $b$, $K$, $(\sigma_k)_{k \in \{1, \dots, K\}}$, $(\Delta_k)_{k \in \{1, \dots, K\}}$ satisfying that for any $\varepsilon > 0$ there exists $\Gamma_\varepsilon > 0$ (tending to infinity when $\varepsilon$ goes to 0) such that for any $n \ge 2$ and any $z > \kappa_1 \log n$

$$P\big( R_n > z \big) \le \kappa_2 \frac{\Gamma_\varepsilon \log z}{z^{\zeta (1 - \varepsilon)}}.$$

Since the regret is expected to be of order $\log n$, the condition $z = \Omega(\log n)$ is not an essential restriction. Further, the regret concentration, although it improves with increasing $\zeta$, is pretty slow. For comparison, remember that a zero-mean martingale $M_n$ with increments bounded by 1 would satisfy $P(M_n > z) \le \exp(-2z^2/n)$. The slow concentration for UCB-V happens because the first $\Omega(\log(t))$ choices of the optimal arm can be unlucky (yielding small rewards), in which case the optimal arm will not be selected any more during the first $t$ steps. Hence, the distribution of the regret will be of a mixture form with a mode whose position scales linearly with time and which decays only at a polynomial rate, which is controlled by $\zeta$.$^7$ This reasoning relies crucially on the fact that the choices of the optimal arm can be unlucky. Hence, we have the following result:

$^7$ Note that entirely analogous results hold for UCB1.

Theorem 7. Consider $E_t = \zeta \log t$ with $c \zeta > 1$. Let $\tilde k$ denote the second optimal arm. If the essential infimum of the optimal arm is strictly larger than $\mu_{\tilde k}$, then the pseudo-regret has exponentially small tails. Inversely, if the essential infimum of the optimal arm is strictly smaller than $\mu_{\tilde k}$, then the pseudo-regret has only polynomial tails.

Remark 2. In Theorem 6 and Corollary 2, we have considered the pseudo-regret $R_n = \sum_{k=1}^{K} T_k(n) \Delta_k$ instead of the regret $\hat R_n \triangleq \sum_{t=1}^{n} X_{k^*,t} - \sum_{t=1}^{n} X_{I_t, T_{I_t}(t)}$. Our main motivation for this was to provide formulae and assumptions that are as simple as possible. The following computations explain that when the optimal arm is unique, one can obtain similar concentration bounds for the regret. Consider the interesting case when $c = 1$ and $E_t = \zeta \log t$ with $\zeta > 1$. By modifying the analysis slightly in Corollary 2, one can get that there exists $\kappa_1 > 0$ such that for any $z > \kappa_1 \log n$, with probability at least $1 - z^{-1}$, the number of draws of suboptimal arms is bounded by $Cz$ for some $C > 0$. This means that the algorithm draws an optimal arm at least $n - Cz$ times. Now if the optimal arm is unique, this means that $n - Cz$ terms cancel out in the summations of the definition of the regret. For the $Cz$ terms which remain, one can use standard Bernstein inequalities and union bounds to prove that with probability $1 - C z^{-1}$, we have $\hat R_n \le R_n + Cz$. Since the bound on the pseudo-regret is of order $z$ (Corollary 2), a similar bound holds for the regret.

5 PAC-UCB

In this section, we consider that the exploration function does not depend on $t$: $E_{s,t} = E_s$. We show that for appropriate sequences $(E_s)_{s \ge 0}$, this leads to a UCB algorithm which has nice properties with high probability (Probably Approximately Correct), hence its name. Note that in this setting, the quantity $B_{k,s,t}$ does not depend on the time $t$, so we will simply write it $B_{k,s}$. Besides, in order to simplify the discussion, we take $c = 1$.

Theorem 8. Let $\beta \in (0, 1)$. Consider a sequence $(E_s)_{s \ge 0}$ satisfying $E_s \ge 2$ and

$$4K \sum_{s \ge 7} e^{-E_s} \le \beta. \tag{18}$$

Consider $u_k$ the smallest integer such that

$$\frac{u_k}{E_{u_k}} > \frac{8 \sigma_k^2}{\Delta_k^2} + \frac{26 b}{3 \Delta_k}. \tag{19}$$

With probability at least $1 - \beta$, the PAC-UCB policy plays any suboptimal arm $k$ at most $u_k$ times.

Let $q > 1$ be a fixed parameter. A typical choice for $E_s$ is

$$E_s = \log(K s^q \beta^{-1}) \vee 2, \tag{20}$$

up to some additive constant ensuring that (18) holds. For this choice, Theorem 8 implies that for some positive constant $\kappa$, with probability at least $1 - \beta$, for any suboptimal arm $k$ (i.e., $\Delta_k > 0$), its number of plays is bounded by

$$T_{k,\beta} \triangleq \kappa \Big( \frac{\sigma_k^2}{\Delta_k^2} + \frac{b}{\Delta_k} \Big) \log\Big[ K \Big( \frac{\sigma_k^2}{\Delta_k^2} + \frac{b}{\Delta_k} \Big) \beta^{-1} \Big],$$

which is independent of the total number of plays! This directly leads to the following upper bound on the regret of the policy at time $n$:

$$\sum_{k=1}^{K} T_k(n) \Delta_k \le \sum_{k: \Delta_k > 0} T_{k,\beta} \Delta_k. \tag{21}$$

One should notice that the previous bound holds with probability at least $1 - \beta$ and on the complement set no small upper bound is possible: one can find a situation in which with probability of order $\beta$, the regret is of order $n$ (even if (21) holds with probability greater than $1 - \beta$). More formally, this means that the following bound cannot be essentially improved (unless additional assumptions are made):

$$E[R_n] = \sum_{k=1}^{K} E[T_k(n)] \Delta_k \le (1 - \beta) \sum_{k: \Delta_k > 0} T_{k,\beta} \Delta_k + \beta n.$$

As a consequence, if one is interested in having a bound on the expected regret at some fixed time $n$, one should take $\beta$ of order $1/n$ (up to possibly a logarithmic factor):

Theorem 9. Let $n \ge 7$. Consider the sequence $E_s = \log[Kn(s+1)]$. For this sequence, the PAC-UCB policy satisfies
- with probability at least $1 - \frac{4 \log(n/7)}{n}$, for any $k$ with $\Delta_k > 0$, the number of plays of arm $k$ up to time $n$ is bounded by $1 + \big( \frac{8 \sigma_k^2}{\Delta_k^2} + \frac{26 b}{3 \Delta_k} \big) \log(K n^2)$;
- the expected regret at time $n$ satisfies

$$E[R_n] \le \sum_{k: \Delta_k > 0} \Big( \frac{24 \sigma_k^2}{\Delta_k} + 30 b \Big) \log(n/3). \tag{22}$$
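The quantities in Theorem 9 are straightforward to evaluate. The following Python sketch computes the exploration sequence $E_s = \log[Kn(s+1)]$, the resulting time-independent index $B_{k,s}$ (with $c = 1$), and the high-probability bound on the number of plays of a suboptimal arm. The function names are hypothetical and the code is only an illustration of the formulas, not an implementation taken from the paper.

```python
import math


def pac_exploration(s: int, num_arms: int, horizon: int) -> float:
    """Exploration sequence of Theorem 9: E_s = log(K * n * (s + 1))."""
    return math.log(num_arms * horizon * (s + 1))


def pac_ucb_index(mean: float, var: float, s: int, b: float,
                  num_arms: int, horizon: int) -> float:
    """B_{k,s} with c = 1: it depends on the pull count s but not on the time t."""
    if s == 0:
        return math.inf  # convention 1/0 = +infinity
    e = pac_exploration(s, num_arms, horizon)
    return mean + math.sqrt(2.0 * var * e / s) + 3.0 * b * e / s


def plays_bound(var: float, gap: float, b: float,
                num_arms: int, horizon: int) -> float:
    """High-probability bound of Theorem 9 on the plays of a suboptimal arm:
    1 + (8 sigma^2 / Delta^2 + 26 b / (3 Delta)) * log(K n^2)."""
    return 1.0 + (8.0 * var / gap ** 2 + 26.0 * b / (3.0 * gap)) * math.log(
        num_arms * horizon ** 2)


# Hypothetical two-armed problem with rewards in [0, 1] and horizon n = 100.
print(plays_bound(var=0.0, gap=0.5, b=1.0, num_arms=2, horizon=100))
print(plays_bound(var=0.25, gap=0.5, b=1.0, num_arms=2, horizon=100))
```

Because the index has no dependence on $t$, an arm's bound only changes when that arm is pulled, which is what makes the number of plays of a suboptimal arm controllable independently of how long the policy runs.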

6 Open problem

When the horizon time $n$ is known, one may want to choose the exploration function $E$ depending on the value of $n$. For instance, in view of Theorems 3 and 6, one may want to take $c = 1$ and a constant exploration function $E \equiv 3 \log n$. This choice ensures logarithmic expected regret and a nice concentration property:

$$P\Big\{ R_n > 24 \sum_{k: \Delta_k > 0} \Big( \frac{\sigma_k^2}{\Delta_k} + 2b \Big) \log n \Big\} \le \frac{C}{n}. \tag{23}$$

This algorithm does not behave as the one which simply takes $E_{s,t} = 3 \log t$. Indeed the algorithm with constant exploration function $E_{s,t} = 3 \log n$ concentrates its exploration phase at the beginning of the plays, and then switches to exploitation mode. On the contrary, the algorithm which adapts to the time horizon explores and exploits during all the time interval $[0; n]$. However, in view of Theorem 7, it satisfies only

$$P\Big\{ R_n > 24 \sum_{k: \Delta_k > 0} \Big( \frac{\sigma_k^2}{\Delta_k} + 2b \Big) \log n \Big\} \le \frac{C}{(\log n)^{C}},$$

which is significantly worse than (23). The open question is: is there an algorithm that adapts to the time horizon which has a logarithmic expected regret and a concentration property similar to (23)? We conjecture that the answer is no.

References

1. R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27:1054-1078, 1995.
2. J.-Y. Audibert, R. Munos, and C. Szepesvári. Variance estimates and exploration function in multi-armed bandit. Research report 07-31, Certis - Ecole des Ponts, 2007.
3. P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
4. P. Auer, N. Cesa-Bianchi, and J. Shawe-Taylor. Exploration versus exploitation challenge. In 2nd PASCAL Challenges Workshop. Pascal Network, 2006.
5. J. C. Gittins. Multi-armed Bandit Allocation Indices. Wiley-Interscience series in systems and optimization. Wiley, Chichester, NY, 1989.
6. T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
7. T. L. Lai and S. Yakowitz. Machine learning and nonparametric bandit theory. IEEE Transactions on Automatic Control, 40:1199-1209, 1995.
8. H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527-535, 1952.
9. W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285-294, 1933.


More information

[Saxena, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

[Saxena, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY [Saena, (9): September, 0] ISSN: 77-9655 Impact Factor:.85 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Contant Stre Accelerated Life Teting Uing Rayleigh Geometric Proce

More information

Dimensional Analysis A Tool for Guiding Mathematical Calculations

Dimensional Analysis A Tool for Guiding Mathematical Calculations Dimenional Analyi A Tool for Guiding Mathematical Calculation Dougla A. Kerr Iue 1 February 6, 2010 ABSTRACT AND INTRODUCTION In converting quantitie from one unit to another, we may know the applicable

More information

Component-by-Component Construction of Low-Discrepancy Point Sets of Small Size

Component-by-Component Construction of Low-Discrepancy Point Sets of Small Size Component-by-Component Contruction of Low-Dicrepancy Point Set of Small Size Benjamin Doerr, Michael Gnewuch, Peter Kritzer, and Friedrich Pillichhammer Abtract We invetigate the problem of contructing

More information

Factor Analysis with Poisson Output

Factor Analysis with Poisson Output Factor Analyi with Poion Output Gopal Santhanam Byron Yu Krihna V. Shenoy, Department of Electrical Engineering, Neurocience Program Stanford Univerity Stanford, CA 94305, USA {gopal,byronyu,henoy}@tanford.edu

More information

Tail estimates for sums of variables sampled by a random walk

Tail estimates for sums of variables sampled by a random walk Tail etimate for um of variable ampled by a random walk arxiv:math/0608740v mathpr] 11 Oct 006 Roy Wagner April 1, 008 Abtract We prove tail etimate for variable i f(x i), where (X i ) i i the trajectory

More information

into a discrete time function. Recall that the table of Laplace/z-transforms is constructed by (i) selecting to get

into a discrete time function. Recall that the table of Laplace/z-transforms is constructed by (i) selecting to get Lecture 25 Introduction to Some Matlab c2d Code in Relation to Sampled Sytem here are many way to convert a continuou time function, { h( t) ; t [0, )} into a dicrete time function { h ( k) ; k {0,,, }}

More information