Multi-Bandit Best Arm Identification


Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric
INRIA Lille - Nord Europe, Team SequeL
{victor.gabillon,mohammad.ghavamzadeh,alessandro.lazaric}@inria.fr

Sébastien Bubeck
Department of Operations Research and Financial Engineering, Princeton University
sbubeck@princeton.edu

Abstract

We study the problem of identifying the best arm in each of the bandits in a multi-bandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper-bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.

1 Introduction

Consider a clinical problem with $M$ subpopulations, in which one should decide between $K_m$ options for treating subjects from each subpopulation $m$. A subpopulation may correspond to patients with a particular gene biomarker (or other risk categories) and the treatment options are the available treatments for a disease. The main objective here is to construct a rule, which recommends the best treatment for each of the subpopulations. These rules are usually constructed using data from clinical trials that are generally costly to run. Therefore, it is important to distribute the trial resources wisely so that the devised rule yields a good performance. Since it may take significantly more resources to find the best treatment for one subpopulation than for the others, the common strategy of enrolling patients as they arrive may not yield an overall good performance.
Moreover, applying treatment options uniformly at random in a subpopulation could not only waste trial resources, but it might also run the risk of finding a bad treatment for that subpopulation. This problem can be formulated as best arm identification over $M$ multi-armed bandits [1], which itself can be seen as the problem of pure exploration [4] over multiple bandits. In this formulation, each subpopulation is considered as a multi-armed bandit, each treatment as an arm, trying a medication on a patient as a pull, and we are asked to recommend an arm for each bandit after a given number of pulls (budget). The evaluation can be based on 1) the average over the bandits of the reward of the recommended arms, or 2) the average probability of error (not selecting the best arm), or 3) the maximum probability of error. Note that this setting is different from the standard multi-armed bandit problem in which the goal is to maximize the cumulative sum of rewards (see e.g., [13, 3]).

The pure exploration problem is about designing strategies that make the best use of the limited budget (e.g., the total number of patients that can be admitted to the clinical trial) in order to optimize the performance in a decision-making task. Audibert et al. [1] proposed two algorithms to address this problem: 1) a highly exploring strategy based on upper confidence bounds, called UCB-E, in which the optimal value of its parameter depends on some measure of the complexity of the problem, and 2) a parameter-free method based on progressively rejecting the arms which seem to be suboptimal, called Successive Rejects. They showed that both algorithms are nearly optimal since their probability of returning the wrong arm decreases at an exponential rate. Racing algorithms (e.g., [10, 12]) and action-elimination algorithms [7] address this problem under a constraint on the accuracy in identifying the best arm and they minimize the budget needed to achieve that accuracy. However, UCB-E and Successive Rejects are designed for a single bandit problem, and as we will discuss later, cannot be easily extended to the multi-bandit case studied in this paper. Deng et al. have recently proposed an active learning algorithm for resource allocation over multiple bandits [5]. However, they do not provide any theoretical analysis for their algorithm and only empirically evaluate its performance. Moreover, the target of their proposed algorithm is to minimize the maximum uncertainty in estimating the value of the arms for each bandit. Note that this is different from our target, which is to maximize the quality of the arms recommended for each bandit.

In this paper, we study the problem of best-arm identification in a multi-armed multi-bandit setting under a fixed budget constraint, and propose an algorithm, called Gap-based Exploration (GapE), to solve it. The allocation strategy implemented by GapE focuses on the gap of the arms, i.e., the difference between the mean of the arm and the mean of the best arm (in that bandit). The GapE-variance (GapE-V) algorithm extends this approach by also taking into account the variance of the arms. For both algorithms, we prove an upper-bound on the probability of error that decreases exponentially with the budget. Since both GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is rarely known in advance, we also introduce their adaptive versions. Finally, we evaluate the performance of these algorithms and compare them with the Uniform and Uniform+UCB-E strategies on a number of synthetic problems. Our empirical results indicate that 1) GapE and GapE-V have a better performance than Uniform and Uniform+UCB-E, and 2) the adaptive versions of these algorithms match the performance of their non-adaptive counterparts.
2 Problem Setup

In this section, we introduce the notation used throughout the paper and formalize the multi-bandit best arm identification problem. Let $M$ be the number of bandits and $K$ be the number of arms for each bandit (we use indices $m, p, q$ for the bandits and $k, i, j$ for the arms). Each arm $k$ of bandit $m$ is characterized by a distribution $\nu_{mk}$ bounded in $[0, b]$ with mean $\mu_{mk}$ and variance $\sigma^2_{mk}$. In the following, we assume that each bandit has a unique best arm. We denote by $\mu^*_m$ and $k^*_m$ the mean and the index of the best arm of bandit $m$ (i.e., $\mu^*_m = \max_{1 \le k \le K} \mu_{mk}$, $k^*_m = \arg\max_{1 \le k \le K} \mu_{mk}$). In each bandit $m$, we define the gap for each arm as $\Delta_{mk} = |\max_{j \ne k} \mu_{mj} - \mu_{mk}|$.

The clinical trial problem described in Sec. 1 can be formalized as a game between a stochastic multi-bandit environment and a forecaster, where the distributions $\{\nu_{mk}\}$ are unknown to the forecaster. At each round $t = 1, \ldots, n$, the forecaster pulls a bandit-arm pair $I(t) = (m, k)$ and observes a sample drawn from the distribution $\nu_{I(t)}$ independent from the past. The forecaster estimates the expected value of each arm by computing the average of the samples observed over time. Let $T_{mk}(t)$ be the number of times that arm $k$ of bandit $m$ has been pulled by the end of round $t$, then the mean of this arm is estimated as

$$\hat\mu_{mk}(t) = \frac{1}{T_{mk}(t)} \sum_{s=1}^{T_{mk}(t)} X_{mk}(s),$$

where $X_{mk}(s)$ is the $s$-th sample observed from $\nu_{mk}$. Given the previous definitions, we define the estimated gaps as $\hat\Delta_{mk}(t) = |\max_{j \ne k} \hat\mu_{mj}(t) - \hat\mu_{mk}(t)|$. At the end of round $n$, the forecaster returns for each bandit $m$ the arm with the highest estimated mean, i.e., $J_m(n) = \arg\max_k \hat\mu_{mk}(n)$, and incurs a regret

$$r(n) = \frac{1}{M} \sum_{m=1}^{M} r_m(n) = \frac{1}{M} \sum_{m=1}^{M} \big( \mu^*_m - \mu_{m J_m(n)} \big).$$

As discussed in the introduction, other performance measures can be defined for this problem. In some applications, returning the wrong arm is considered as an error independently from its regret, and thus, the objective is to minimize the average probability of error

$$e(n) = \frac{1}{M} \sum_{m=1}^{M} e_m(n) = \frac{1}{M} \sum_{m=1}^{M} \mathbb{P}\big( J_m(n) \ne k^*_m \big).$$

Finally, in problems similar to the clinical trial, a reasonable objective is to return the right treatment for all the genetic profiles and not just to have a small average probability of error. In this case, the global performance of the forecaster can be measured as

$$\ell(n) = \max_m \ell_m(n) = \max_m \mathbb{P}\big( J_m(n) \ne k^*_m \big).$$

It is interesting to note the relationship between these three performance measures:

$$\min_m \Delta_m \, e(n) \le \mathbb{E}\, r(n) \le b \, e(n) \le b \, \ell(n),$$

where $\Delta_m$ is read as the smallest gap in bandit $m$ and the expectation in the regret is w.r.t. the random samples. As a result, any algorithm minimizing the worst case probability of error, $\ell(n)$, also controls the average probability of error, $e(n)$, and the simple regret $\mathbb{E}\, r(n)$. Note that the algorithms introduced in this paper directly target the problem of minimizing $\ell(n)$.
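The three performance measures above can be made concrete with a small sketch. It assumes the true means are stored in an `(M, K)` NumPy array and that recommendations from repeated runs are available as an integer array; the function names `gaps`, `simple_regret` and `error_measures` are illustrative and do not come from the paper.

```python
import numpy as np

def gaps(mu):
    """Gap of every arm: |max_{j != k} mu[m, j] - mu[m, k]|, bandit by bandit."""
    mu = np.asarray(mu, dtype=float)
    out = np.empty_like(mu)
    for k in range(mu.shape[1]):
        others = np.delete(mu, k, axis=1)          # all arms except k
        out[:, k] = np.abs(others.max(axis=1) - mu[:, k])
    return out

def simple_regret(mu, recommended):
    """r(n): average over bandits of mu*_m - mu_{m, J_m(n)} for one run."""
    mu = np.asarray(mu, dtype=float)
    rows = np.arange(mu.shape[0])
    return float(np.mean(mu.max(axis=1) - mu[rows, recommended]))

def error_measures(mu, recommendations):
    """Empirical e(n) and l(n) from an int array of shape (runs, M)."""
    best = np.asarray(mu).argmax(axis=1)
    wrong = np.asarray(recommendations) != best    # (runs, M) booleans
    per_bandit = wrong.mean(axis=0)                # estimate of P(J_m != k*_m)
    return float(per_bandit.mean()), float(per_bandit.max())
```

Here `error_measures` returns the average error first and the maximum error second, so the second value can never be smaller than the first, in line with $e(n) \le \ell(n)$.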

Parameters: number of rounds $n$, exploration parameter $a$, maximum range $b$
Initialize: $T_{mk}(0) = 0$, $\hat\Delta_{mk}(0) = 0$ for all bandit-arm pairs $(m, k)$
for $t = 1, 2, \ldots, n$ do
    Compute $B_{mk}(t) = -\hat\Delta_{mk}(t-1) + b \sqrt{a / T_{mk}(t-1)}$ for all bandit-arm pairs $(m, k)$
    Draw $I(t) \in \arg\max_{m,k} B_{mk}(t)$
    Observe $X_{I(t)}\big(T_{I(t)}(t-1) + 1\big) \sim \nu_{I(t)}$
    Update $T_{I(t)}(t) = T_{I(t)}(t-1) + 1$ and $\hat\Delta_{mk}(t)$ for all $k$ of the selected bandit
end for
Return $J_m(n) \in \arg\max_{k \in \{1, \ldots, K\}} \hat\mu_{mk}(n)$, $\forall m \in \{1, \ldots, M\}$

Figure 1: The pseudo-code of the gap-based Exploration (GapE) algorithm.

3 The Gap-based Exploration Algorithm

Fig. 1 contains the pseudo-code of the gap-based exploration (GapE) algorithm. GapE flattens the bandit-arm structure and reduces it to a single-bandit problem with $MK$ arms. At each time step $t$, the algorithm relies on the observations up to time $t-1$ to build an index $B_{mk}(t)$ for each bandit-arm pair, and then selects the pair $I(t)$ with the highest index. The index $B_{mk}$ consists of two terms. The first term is the negative of the estimated gap for arm $k$ in bandit $m$. Similar to other upper-confidence bound (UCB) methods [3], the second part is an exploration term which forces the algorithm to pull arms that have been less explored. As a result, the algorithm tends to pull arms with a small estimated gap and a small number of pulls. The exploration parameter $a$ tunes the level of exploration of the algorithm. As shown by the theoretical analysis of Sec. 3.1, if the time horizon $n$ is known, $a$ should be set to $a = \frac{4}{9} \frac{n - MK}{H}$, where $H = \sum_{m,k} b^2 / \Delta^2_{mk}$ is the complexity of the problem (see Sec. 3.1 for further discussion). Note that GapE differs from most standard bandit strategies in the sense that the $B$-index for an arm depends explicitly on the statistics of the other arms. This feature makes the analysis of this algorithm much more involved.

As we may notice from Fig. 1, GapE resembles the UCB-E algorithm [1] designed to solve the pure exploration problem in the single-bandit setting. Nonetheless, the use of the negative estimated gap ($-\hat\Delta_{mk}$) instead of the estimated mean ($\hat\mu_{mk}$) (used by UCB-E) is crucial in the multi-bandit setting.
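The loop of Fig. 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the callback `sample(m, k)` is a hypothetical interface to the environment, and the sketch assumes that every bandit-arm pair is pulled once during the first $MK$ rounds (so that $T_{mk} \ge 1$ before any index is computed), that $n \ge MK$, and that $K \ge 2$.

```python
import numpy as np

def estimated_gaps(mu_hat):
    """|max_{j != k} mu_hat[m, j] - mu_hat[m, k]| for every pair (m, k)."""
    srt = np.sort(mu_hat, axis=1)
    best, second = srt[:, -1:], srt[:, -2:-1]
    is_best = np.arange(mu_hat.shape[1]) == mu_hat.argmax(axis=1)[:, None]
    # The empirical best arm is compared with the runner-up, all others with the best.
    return np.where(is_best, best - second, best - mu_hat)

def gape(sample, M, K, n, a, b=1.0):
    """Sketch of GapE; sample(m, k) returns one reward in [0, b] (assumed interface)."""
    T = np.zeros((M, K))                    # number of pulls per pair
    S = np.zeros((M, K))                    # sum of rewards per pair
    for t in range(n):
        if t < M * K:
            m, k = divmod(t, K)             # initial pass: one pull per pair
        else:
            B = -estimated_gaps(S / T) + b * np.sqrt(a / T)
            m, k = np.unravel_index(np.argmax(B), B.shape)
        S[m, k] += sample(m, k)
        T[m, k] += 1
    return (S / T).argmax(axis=1), T        # recommended arms J_m(n), pull counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.array([[0.5, 0.45, 0.4, 0.3], [0.5, 0.3, 0.2, 0.1]])  # illustrative
    J, T = gape(lambda m, k: float(rng.random() < means[m, k]), 2, 4, n=700, a=0.3)
    print(J, T.sum(axis=1))
```

The value `a=0.3` in the usage lines is an arbitrary illustrative choice; the text above ties the recommended value of $a$ to $n$, $M$, $K$ and the complexity $H$.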
In the single-bandit problem, since the best and second best arms have the same gap ($\Delta_{m k^*_m} = \min_{k \ne k^*_m} \Delta_{mk}$), GapE considers them equivalent and tends to pull them the same amount of time, while UCB-E tends to pull the best arm more often than the second best one. Despite this difference, the performance of both algorithms in predicting the best arm after $n$ pulls would be the same. This is due to the fact that the probability of error depends on the capability of the algorithm to distinguish optimal and suboptimal arms, and this is not affected by a different allocation over the best and second best arms as long as the number of pulls allocated to that pair is large enough w.r.t. their gap. Despite this similarity, the two approaches become completely different in the multi-bandit case. In this case, if we run UCB-E on all the $MK$ arms, it tends to pull more the arm with the highest mean over all the bandits, i.e., $k^* = \arg\max_{m,k} \mu_{mk}$. As a result, it would be accurate in predicting the best arm $k^*$ over bandits, but may have an arbitrarily bad performance in predicting the best arm for each bandit, and thus, may incur a large error $\ell(n)$. On the other hand, GapE focuses on the arms with the smallest gaps. This way, it assigns more pulls to bandits whose optimal arms are difficult to identify (i.e., bandits with arms with small gaps), and as shown in the next section, it achieves a high probability in identifying the best arm in each bandit.

3.1 Theoretical Analysis

In this section, we derive an upper-bound on the probability of error $\ell(n)$ for the GapE algorithm.

Theorem 1. If we run GapE with parameter $0 < a \le \frac{4}{9} \frac{n - MK}{H}$, then its probability of error satisfies

$$\ell(n) \le \mathbb{P}\big( \exists m : J_m(n) \ne k^*_m \big) \le 2MKn \exp\Big( -\frac{a}{64} \Big),$$

in particular for $a = \frac{4}{9} \frac{n - MK}{H}$, we have $\ell(n) \le 2MKn \exp\big( -\frac{1}{144} \frac{n - MK}{H} \big)$.

Remark 1 (Analysis of the bound). If the time horizon $n$ is known in advance, it would be possible to set the exploration parameter $a$ as a linear function of $n$, and as a result, the probability of error of GapE decreases exponentially with the time horizon. The other interesting aspect of the bound is the complexity term $H$ appearing in the optimal value of the exploration parameter $a$ (i.e., $a = \frac{4}{9} \frac{n - MK}{H}$). If we denote by $H_{mk} = b^2 / \Delta^2_{mk}$ the complexity of arm $k$ in bandit $m$, it is clear from the definition of $H$ that each arm has an additive impact on the overall complexity of the multi-bandit problem. Moreover, if we define the complexity of each bandit $m$ as $H_m = \sum_k b^2 / \Delta^2_{mk}$ (similar to the definition of complexity for UCB-E in [1]), the GapE complexity may be rewritten as $H = \sum_m H_m$. This means that the complexity of GapE is simply the sum of the complexities of all the bandits.

Remark 2 (Comparison with the static allocation strategy). The main objective of GapE is to trade off between allocating pulls according to the gaps (more precisely, according to the complexities $H_{mk}$) and the exploration needed to improve the accuracy of their estimates. If the gaps were known in advance, a nearly-optimal static allocation strategy assigns to each bandit-arm pair a number of pulls proportional to its complexity. Let us consider a strategy that pulls each arm a fixed number of times over the horizon $n$. The probability of error for this strategy may be bounded as

$$\ell_{\text{Static}}(n) \le \mathbb{P}\big( \exists m : J_m(n) \ne k^*_m \big) \le \sum_{m=1}^{M} \mathbb{P}\big( J_m(n) \ne k^*_m \big) \le \sum_{m=1}^{M} \sum_{k \ne k^*_m} \mathbb{P}\big( \hat\mu_{m k^*_m}(n) \le \hat\mu_{mk}(n) \big)$$
$$\le \sum_{m=1}^{M} \sum_{k \ne k^*_m} \exp\Big( -T_{mk}(n) \frac{\Delta^2_{mk}}{b^2} \Big) = \sum_{m=1}^{M} \sum_{k \ne k^*_m} \exp\big( -T_{mk}(n) H^{-1}_{mk} \big). \quad (1)$$

Given the constraint $\sum_{m,k} T_{mk}(n) = n$, the allocation minimizing the last term in Eq. 1 is $T^*_{mk}(n) = n H_{mk} / H$. We refer to this fixed strategy as StaticGap. Although this is not necessarily the optimal static strategy ($T^*_{mk}(n)$ minimizes an upper-bound), this allocation guarantees a probability of error smaller than $MK \exp(-n/H)$. Theorem 1 shows that, for $n$ large enough, GapE achieves the same performance as the static allocation StaticGap.

Remark 3 (Comparison with other allocation strategies). At the beginning of Sec. 3, we discussed the difference between GapE and UCB-E. Here we compare the bound reported in Theorem 1 with the performance of the Uniform and combined Uniform+UCB-E allocation strategies.
In the uniform allocation strategy, the total budget $n$ is uniformly split over all the bandits and arms. As a result, each bandit-arm pair is pulled $T_{mk}(n) = n/(MK)$ times. Using the same derivation as in Remark 2, the probability of error $\ell(n)$ for this strategy may be bounded as

$$\ell_{\text{Unif}}(n) \le \sum_{m=1}^{M} \sum_{k \ne k^*_m} \exp\Big( -\frac{n}{MK} \frac{\Delta^2_{mk}}{b^2} \Big) \le MK \exp\Big( -\frac{n}{MK \max_{m,k} H_{mk}} \Big).$$

In the Uniform+UCB-E allocation strategy, i.e., a two-level algorithm that first selects a bandit uniformly and then pulls arms within each bandit using UCB-E, the total number of pulls for each bandit $m$ is $\sum_k T_{mk}(n) = n/M$, while the number of pulls $T_{mk}(n)$ over the arms in bandit $m$ is determined by UCB-E. Thus, the probability of error of this strategy may be bounded as

$$\ell_{\text{Unif+UCB-E}}(n) \le \sum_{m=1}^{M} 2nK \exp\Big( -\frac{n/M - K}{18 H_m} \Big) \le 2nMK \exp\Big( -\frac{n/M - K}{18 \max_m H_m} \Big),$$

where the first inequality follows from Theorem 1 in [1] (recall that $H_m = \sum_k b^2 / \Delta^2_{mk}$). Let $b = 1$ (i.e., all the arms have distributions bounded in $[0, 1]$). Up to constants and multiplicative factors in front of the exponentials, and if $n$ is large enough compared to $M$ and $K$ (so as to approximate $n/M - K$ by $n/M$ and $n - MK$ by $n$), the probability of error for the three algorithms may be bounded as

$$\ell_{\text{Unif}}(n) \le \exp\Big( O\Big( \frac{-n}{MK \max_{m,k} H_{mk}} \Big) \Big), \quad \ell_{\text{U+UCBE}}(n) \le \exp\Big( O\Big( \frac{-n}{M \max_m H_m} \Big) \Big), \quad \ell_{\text{GapE}}(n) \le \exp\Big( O\Big( \frac{-n}{\sum_{m,k} H_{mk}} \Big) \Big).$$

By comparing the arguments of the exponential terms, we have the trivial sequence of inequalities

$$MK \max_{m,k} H_{mk} \ge M \max_m \sum_k H_{mk} \ge \sum_{m,k} H_{mk},$$

which implies that the upper bound on the probability of error of GapE is usually significantly smaller. This relationship, which is confirmed by the experiments reported in Sec. 4, shows that GapE is able to adapt to the complexity $H$ of the overall multi-bandit problem better than the other two allocation strategies. In fact, while the performance of the Uniform strategy depends on the most complex arm over the bandits and the strategy Unif+UCB-E is affected by the most complex bandit, the performance of GapE depends on the sum of the complexities of all the arms involved in the pure exploration problem.
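The quantities compared in Remarks 2 and 3 are easy to evaluate numerically. The sketch below computes the arm complexities $H_{mk}$, the StaticGap allocation $T^*_{mk}(n) = n H_{mk} / H$ (left real-valued, rounding is not addressed), and the three complexity terms that appear in the exponents. The function names are illustrative, and the sketch assumes the true means are known and each bandit has at least two arms.

```python
import numpy as np

def arm_complexities(mu, b=1.0):
    """H_mk = b^2 / Delta_mk^2 for an (M, K) array of true means."""
    mu = np.asarray(mu, dtype=float)
    srt = np.sort(mu, axis=1)
    best, second = srt[:, -1:], srt[:, -2:-1]
    is_best = np.arange(mu.shape[1]) == mu.argmax(axis=1)[:, None]
    delta = np.where(is_best, best - second, best - mu)   # gaps Delta_mk
    return b ** 2 / delta ** 2

def static_gap_allocation(H_mk, n):
    """StaticGap: number of pulls proportional to the arm complexity."""
    return n * H_mk / H_mk.sum()

def complexity_terms(H_mk):
    """Denominators of the exponents in the three bounds (larger is worse)."""
    M, K = H_mk.shape
    return {
        "Unif": M * K * H_mk.max(),                 # MK max_{m,k} H_mk
        "Unif+UCB-E": M * H_mk.sum(axis=1).max(),   # M max_m H_m
        "GapE": H_mk.sum(),                         # H = sum_{m,k} H_mk
    }

if __name__ == "__main__":
    means = [[0.5, 0.45, 0.4, 0.3], [0.5, 0.3, 0.2, 0.1]]   # illustrative values
    H_mk = arm_complexities(means)
    print(complexity_terms(H_mk))
    print(static_gap_allocation(H_mk, n=700).round(1))
```

Whatever the means, the three returned values are ordered as in the sequence of inequalities above, since a maximum times the number of terms always dominates the corresponding sum.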

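Proof of Theorem 1.

Step 1. Let us consider the following event:

$$\mathcal{E} = \Big\{ \forall m \in \{1, \ldots, M\}, \ \forall k \in \{1, \ldots, K\}, \ \forall t \in \{1, \ldots, n\}, \ \big| \hat\mu_{mk}(t) - \mu_{mk} \big| < bc \sqrt{\frac{a}{T_{mk}(t)}} \Big\}.$$

From Chernoff-Hoeffding's inequality and a union bound, we have $\mathbb{P}(\mathcal{E}) \ge 1 - 2MKn \exp(-2ac^2)$. Now we would like to prove that on the event $\mathcal{E}$, we find the best arm for all the bandits, i.e., $J_m(n) = k^*_m$, $\forall m \in \{1, \ldots, M\}$. Since $J_m(n)$ is the empirical best arm of bandit $m$, we should prove that for any $k \in \{1, \ldots, K\}$, $\hat\mu_{mk}(n) \le \hat\mu_{m k^*_m}(n)$. By upper-bounding the LHS and lower-bounding the RHS of this inequality, we note that it would be enough to prove $bc \sqrt{a / T_{mk}(n)} \le \Delta_{mk} / 2$ on the event $\mathcal{E}$, or equivalently, to prove that for any bandit-arm pair $(m, k)$, we have $T_{mk}(n) \ge \frac{4ab^2c^2}{\Delta^2_{mk}}$.

Step 2. In this step, we show that in GapE, for any bandits $(m, q)$ and arms $(k, j)$, and for any $t \ge MK$, the following dependence between the number of pulls of the arms holds

$$-\Delta_{mk} + (1 + d)\, b \sqrt{\frac{a}{\max\big( T_{mk}(t) - 1, 1 \big)}} \ \ge \ -\Delta_{qj} + (1 - d)\, b \sqrt{\frac{a}{T_{qj}(t)}}, \quad (2)$$

where $d \in [0, 1]$. We prove this inequality by induction.

Base step. We know that after the first $MK$ rounds of the GapE algorithm, all the arms have been pulled once, i.e., $T_{mk}(t) = 1$, $\forall m, k$, thus if $a \ge 1/(4d^2)$, the inequality (2) holds for $t = MK$.

Inductive step. Let us assume that (2) holds at time $t - 1$ and we pull arm $i$ of bandit $p$ at time $t$, i.e., $I(t) = (p, i)$. So at time $t$, the inequality (2) trivially holds for every choice of $m$, $q$, $k$, and $j$, except when $(m, k) = (p, i)$. As a result, in the inductive step, we only need to prove that the following holds for any $q \in \{1, \ldots, M\}$ and $j \in \{1, \ldots, K\}$

$$-\Delta_{pi} + (1 + d)\, b \sqrt{\frac{a}{\max\big( T_{pi}(t) - 1, 1 \big)}} \ \ge \ -\Delta_{qj} + (1 - d)\, b \sqrt{\frac{a}{T_{qj}(t)}}. \quad (3)$$

Since arm $i$ of bandit $p$ has been pulled at time $t$, we have that for any bandit-arm pair $(q, j)$

$$-\hat\Delta_{pi}(t-1) + b \sqrt{\frac{a}{T_{pi}(t-1)}} \ \ge \ -\hat\Delta_{qj}(t-1) + b \sqrt{\frac{a}{T_{qj}(t-1)}}. \quad (4)$$

To prove (3), we first prove an upper-bound for $-\hat\Delta_{pi}(t-1)$ and a lower-bound for $-\hat\Delta_{qj}(t-1)$:

$$-\hat\Delta_{pi}(t-1) \le -\Delta_{pi} + \frac{2bc}{1 - c} \sqrt{\frac{a}{T_{pi}(t) - 1}} \quad \text{and} \quad -\hat\Delta_{qj}(t-1) \ge -\Delta_{qj} - \frac{2\sqrt{2}\, bc}{1 - d} \sqrt{\frac{a}{T_{qj}(t)}}. \quad (5)$$

We report the proofs of the inequalities in (5) in App. B of [8].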
The inequality (3), and as a result, the inductive step is proved by replacing $-\hat\Delta_{pi}(t-1)$ and $-\hat\Delta_{qj}(t-1)$ in (4) from (5) and under the conditions that $d \ge \frac{2c}{1 - c}$ and $d \ge \frac{2\sqrt{2}\, c}{1 - d}$. These conditions are satisfied by $d = 1/2$ and $c = \sqrt{2}/16$.

Step 3. In order to prove the condition on $T_{mk}(n)$ in Step 1, we need to find a lower-bound on the number of pulls of all the arms at time $t = n$ (at the end). Let us assume that arm $k$ of bandit $m$ has been pulled less than $\frac{ab^2(1 - d)^2}{\Delta^2_{mk}}$ times, which indicates that $-\Delta_{mk} + (1 - d)\, b \sqrt{a / T_{mk}(n)} > 0$. From this result and (2), we have $-\Delta_{qj} + (1 + d)\, b \sqrt{a / (T_{qj}(n) - 1)} > 0$, or equivalently $T_{qj}(n) - 1 < \frac{ab^2(1 + d)^2}{\Delta^2_{qj}}$ for any pair $(q, j)$. We also know that $\sum_{q,j} T_{qj}(n) = n$. From these, we deduce that $n - MK < ab^2(1 + d)^2 \sum_{q,j} \frac{1}{\Delta^2_{qj}}$. So, if we select $a$ such that $n - MK \ge ab^2(1 + d)^2 \sum_{q,j} \frac{1}{\Delta^2_{qj}}$, we contradict the first assumption that $T_{mk}(n) < \frac{ab^2(1 - d)^2}{\Delta^2_{mk}}$, which means that $T_{mk}(n) \ge \frac{4ab^2c^2}{\Delta^2_{mk}}$ for any pair $(m, k)$, when $1 - d \ge 2c$. This concludes the proof. The condition for $a$ in the statement of the theorem comes from our choice of $a$ in this step and the values of $c$ and $d$ from the inductive step.

3.2 Extensions

In this section we propose two variants of the GapE algorithm with the objective of extending its applicability and improving its performance.

GapE with variance (GapE-V). The allocation strategy implemented by GapE focuses only on the arms with a small gap and does not take into consideration their variance. However, it is clear that the arms with a small variance, even if their gap is small, just need a few pulls to be correctly estimated. In order to take into account both the gaps and variances of the arms, we introduce the GapE-variance (GapE-V) algorithm. Let

$$\hat\sigma^2_{mk}(t) = \frac{1}{T_{mk}(t) - 1} \sum_{s=1}^{T_{mk}(t)} X^2_{mk}(s) - \hat\mu^2_{mk}(t)$$

be the estimated variance for arm $k$ of bandit $m$ at the end of round $t$. GapE-V uses the following $B$-index for each arm:

$$B_{mk}(t) = -\hat\Delta_{mk}(t-1) + \sqrt{\frac{2a\, \hat\sigma^2_{mk}(t-1)}{T_{mk}(t-1)}} + \frac{7ab}{3\big( T_{mk}(t-1) - 1 \big)}.$$

Note that the exploration term in the $B$-index now has two components: the first one depends on the empirical variance and the second one decreases as $O(1/T_{mk})$. As a result, arms with low variance will be explored much less than in the GapE algorithm. Similar to the difference between UCB [3] and UCB-V [2], while the $B$-index in GapE is motivated by Hoeffding's inequalities, the one for GapE-V is obtained using an empirical Bernstein's inequality [11, 2].

The following performance bound can be proved for the GapE-V algorithm. We report the proof of Theorem 2 in App. C of [8].

Theorem 2. If GapE-V is run with parameter $0 < a \le \frac{8}{9} \frac{n - 2MK}{H^\sigma}$, then it satisfies

$$\ell(n) \le \mathbb{P}\big( \exists m : J_m(n) \ne k^*_m \big) \le 6nMK \exp\Big( -\frac{9a}{64 \times 8} \Big),$$

in particular for $a = \frac{8}{9} \frac{n - 2MK}{H^\sigma}$, we have $\ell(n) \le 6nMK \exp\big( -\frac{1}{64} \frac{n - 2MK}{H^\sigma} \big)$.

In Theorem 2, $H^\sigma$ is the complexity of the GapE-V algorithm and is defined as

$$H^\sigma = \sum_{m=1}^{M} \sum_{k=1}^{K} \frac{\Big( \sigma_{mk} + \sqrt{\sigma^2_{mk} + (16/3)\, b\, \Delta_{mk}} \Big)^2}{\Delta^2_{mk}}.$$

Although the variance-complexity $H^\sigma$ could be larger than the complexity $H$ used in GapE, whenever the variances of the arms are small compared to the range $b$ of the distribution, we expect $H^\sigma$ to be smaller than $H$. Furthermore, if the arms have very different variances, then GapE-V is expected to better capture the complexity of each arm and allocate the pulls accordingly.
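As a rough illustration of how the variance term changes the exploration bonus, the sketch below evaluates the GapE-V index for given statistics. The function names are illustrative. The sketch assumes every pair has already been pulled at least twice, since both the variance estimate and the last term of the index divide by $T_{mk} - 1$; how the algorithm is initialized is not spelled out in this section.

```python
import numpy as np

def empirical_variance(sum_sq, mu_hat, T):
    """Variance estimate as written above: sum of squares / (T - 1) minus mean^2."""
    return sum_sq / (T - 1.0) - mu_hat ** 2

def gape_v_index(gap_hat, var_hat, T, a, b=1.0):
    """GapE-V B-index from estimated gaps, estimated variances and pull counts."""
    bernstein = np.sqrt(2.0 * a * var_hat / T)      # variance-dependent term
    range_term = 7.0 * a * b / (3.0 * (T - 1.0))    # decreases as O(1 / T)
    return -gap_hat + bernstein + range_term

if __name__ == "__main__":
    # Two arms with the same estimated gap and pull count but different variances.
    gap_hat = np.array([[0.05, 0.05]])
    var_hat = np.array([[0.25, 0.0]])
    T = np.array([[10.0, 10.0]])
    print(gape_v_index(gap_hat, var_hat, T, a=1.0))
```

With equal gaps and pull counts, the arm with the larger estimated variance receives the larger index and is therefore pulled first, which is the behavior described in the text.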
When all the gaps are the same, for instance, GapE tends to allocate pulls proportionally to the complexity $H_{mk}$ and it would perform an almost uniform allocation over bandits and arms. On the other hand, the variances of the arms could be very heterogeneous and GapE-V would adapt the allocation strategy by pulling more often the arms whose values are more uncertain.

Adaptive GapE and GapE-V. A drawback of GapE and GapE-V is that the exploration parameter $a$ should be tuned according to the complexities $H$ and $H^\sigma$ of the multi-bandit problem, which are rarely known in advance. A straightforward solution to this issue is to move to an adaptive version of these algorithms by substituting $H$ and $H^\sigma$ with suitable estimates $\hat H$ and $\hat H^\sigma$. At each step $t$ of the adaptive GapE and GapE-V algorithms, we estimate these complexities as

$$\hat H(t) = \sum_{m,k} \frac{b^2}{\text{UCB}_{\Delta_i}(t)^2}, \qquad \hat H^\sigma(t) = \sum_{m,k} \frac{\Big( \text{LCB}_{\sigma_i}(t) + \sqrt{\text{LCB}_{\sigma_i}(t)^2 + (16/3)\, b\, \text{UCB}_{\Delta_i}(t)} \Big)^2}{\text{UCB}_{\Delta_i}(t)^2},$$

where $i$ stands for the bandit-arm pair $(m, k)$ and

$$\text{UCB}_{\Delta_i}(t) = \hat\Delta_i(t-1) + \sqrt{\frac{1}{2 T_i(t-1)}} \quad \text{and} \quad \text{LCB}_{\sigma_i}(t) = \max\Big( 0, \ \hat\sigma_i(t-1) - \sqrt{\frac{2}{T_i(t-1) - 1}} \Big).$$

Similar to the adaptive version of UCB-E in [1], $\hat H$ and $\hat H^\sigma$ are lower-confidence bounds on the true complexities $H$ and $H^\sigma$. Note that the GapE and GapE-V bounds written for the optimal value of $a$ indicate an inverse relation between the complexity and the exploration. By using a lower-bound on the true $H$ and $H^\sigma$, the algorithms tend to explore arms more uniformly and this allows them to increase the accuracy of their estimated complexities. Although we do not analyze these algorithms, we empirically show in Sec. 4 that they are in fact able to match the performance of the GapE and GapE-V algorithms.

4 Numerical Simulations

In this section, we report numerical simulations of the gap-based algorithms presented in this paper, GapE and GapE-V, and their adaptive versions A-GapE and A-GapE-V, and compare them with the Unif and Unif+UCB-E algorithms introduced in Remark 3. The results of our experiments, both those in the paper and those in App. A of [8], indicate that 1) GapE successfully adapts its allocation strategy to the complexity of each bandit and outperforms the uniform allocation strategies, 2) the use of the empirical variance in GapE-V can significantly improve the performance over GapE, and 3) the adaptive versions of GapE and GapE-V that estimate the complexities $H$ and $H^\sigma$ online attain the same performance as the basic algorithms, which receive $H$ and $H^\sigma$ as an input.

Figure 2: (left) Problem 1: Comparison between GapE, adaptive GapE, and the uniform strategies. (right) Problem 2: Comparison between GapE, GapE-V, and adaptive GapE-V algorithms. Both panels plot the maximum probability of error against the parameter $\eta$.

Figure 3: Performance of the algorithms in Problem 3 (maximum probability of error against the parameter $\eta$ for Unif+UCBE, Unif+A-UCBE, Unif+UCBE-V, Unif+A-UCBE-V, GapE, A-GapE, GapE-V, and A-GapE-V).

Experimental setting. We use the following three problems in our experiments. Note that $b = 1$ and that a Rademacher distribution with parameters $(x, y)$ takes value $x$ or $y$ with probability 1/2.

Problem 1. $n = 700$, $M = 2$, $K = 4$. The arms have Bernoulli distributions with parameters: bandit 1 = (0.5, 0.45, 0.4, 0.3), bandit 2 = (0.5, 0.3, 0.2, 0.1).

Problem 2. $n = 1000$, $M = 2$, $K = 4$. The arms have Rademacher distributions with parameters $(x, y)$: bandit 1 = {(0, 1.0), (0.45, 0.45), (0.25, 0.65), (0, 0.9)} and bandit 2 = {(0.4, 0.6), (0.45, 0.45), (0.35, 0.55), (0.25, 0.65)}.

Problem 3. $n = 1400$, $M = 4$, $K = 4$. The arms have Rademacher distributions with parameters $(x, y)$: bandit 1 = {(0, 1.0), (0.45, 0.45), (0.25, 0.65), (0, 0.9)}, bandit 2 = {(0.4, 0.6), (0.45, 0.45), (0.35, 0.55), (0.25, 0.65)}, bandit 3 = {(0, 1.0), (0.45, 0.45), (0.25, 0.65), (0, 0.9)}, and bandit 4 = {(0.4, 0.6), (0.45, 0.45), (0.35, 0.55), (0.25, 0.65)}.

All the algorithms, except the uniform allocation, have an exploration parameter $a$.
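The complexities of the problems listed above follow directly from the definitions of $H_m$ and $H^\sigma$ in Sec. 3. The sketch below writes Problems 1 to 3 as arrays and evaluates both complexities per bandit; the constant and function names are illustrative, and the mean and variance of a Rademacher $(x, y)$ arm are taken to be $(x + y)/2$ and $((y - x)/2)^2$, which follows from the arm taking each value with probability 1/2.

```python
import numpy as np

# Problem 1: Bernoulli means, shape (M, K).
PROBLEM_1 = np.array([[0.5, 0.45, 0.4, 0.3],
                      [0.5, 0.3, 0.2, 0.1]])
# Problem 2: Rademacher parameters (x, y), shape (M, K, 2).
PROBLEM_2 = np.array([
    [(0, 1.0), (0.45, 0.45), (0.25, 0.65), (0, 0.9)],
    [(0.4, 0.6), (0.45, 0.45), (0.35, 0.55), (0.25, 0.65)],
])
# Problem 3 repeats the two bandits of Problem 2.
PROBLEM_3 = np.concatenate([PROBLEM_2, PROBLEM_2])

def rademacher_moments(xy):
    """Mean and variance of arms taking value x or y with probability 1/2."""
    x, y = xy[..., 0], xy[..., 1]
    return (x + y) / 2.0, ((y - x) / 2.0) ** 2

def gaps(mu):
    """Delta_mk = |max_{j != k} mu_mj - mu_mk| for an (M, K) array of means."""
    srt = np.sort(mu, axis=1)
    best, second = srt[:, -1:], srt[:, -2:-1]
    is_best = np.arange(mu.shape[1]) == mu.argmax(axis=1)[:, None]
    return np.where(is_best, best - second, best - mu)

def bandit_complexities(mu, b=1.0):
    """H_m = sum_k b^2 / Delta_mk^2 for each bandit."""
    return (b ** 2 / gaps(mu) ** 2).sum(axis=1)

def bandit_variance_complexities(mu, var, b=1.0):
    """Per-bandit share of H^sigma, using the true standard deviations."""
    d = gaps(mu)
    num = (np.sqrt(var) + np.sqrt(var + (16.0 / 3.0) * b * d)) ** 2
    return (num / d ** 2).sum(axis=1)

if __name__ == "__main__":
    print(bandit_complexities(PROBLEM_1))
    mu2, var2 = rademacher_moments(PROBLEM_2)
    print(gaps(mu2), bandit_variance_complexities(mu2, var2))
```

For Problem 1 this gives $H_1 = 925$ and $H_2 \approx 67$, and for Problem 2 every gap equals 0.05, so that all arms share the complexity $H_{mk} = 400$.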
The theoretical analysis suggests that $a$ should be proportional to $n/H$. Although $a$ could be optimized according to the bound, since the constants in the analysis are not accurate, we will run the algorithms with $a = \eta \frac{n}{H}$, where $\eta$ is a parameter which is empirically tuned (in the experiments we report four different values for $\eta$). If $H$ correctly defines the complexity of the exploration problem (i.e., the number of samples to find the best arms with high probability), $\eta$ should simply correct the inaccuracy of the constants in the analysis, and thus, the range of its nearly-optimal values should be constant across different problems. In Unif+UCB-E, UCB-E is run with the budget of $n/M$ and the same parameter $\eta$ for all the bandits. Finally, for GapE-V we set $a = \eta \frac{n}{H^\sigma}$, since we expect $H^\sigma$ to roughly capture the number of pulls necessary to solve the pure exploration problem with high probability. In Figs. 2 and 3, we report the performance $\ell(n)$, i.e., the maximum over the bandits of the probability of not identifying the best arm after $n$ rounds, of the gap-based algorithms as well as the Unif and Unif+UCB-E strategies. The results are averaged over $10^5$ runs and the error bars correspond to three times the estimated standard deviation. In all the figures the performance of Unif is reported as a horizontal dashed line.

The left panel of Fig. 2 displays the performance of Unif+UCB-E, GapE, and A-GapE in Problem 1. As expected, Unif+UCB-E has a better performance (23.9% probability of error) than Unif (29.4% probability of error), since it adapts the allocation within each bandit so as to pull more often the nearly-optimal arms. However, the two bandit problems are not equally difficult. In fact, their complexities are very different ($H_1 \simeq 925$ and $H_2 \simeq 67$), and thus, far fewer samples are needed to identify the best arm in the second bandit than in the first one. Unlike Unif+UCB-E, GapE adapts its allocation strategy to the complexities of the bandits (on average only 19% of the pulls are allocated to the second bandit), and at the same time to the arm complexities within each bandit (in the first bandit the averaged allocation of GapE is (37%, 36%, 20%, 7%)). As a result, GapE has a probability of error of 15.7%, which represents a significant improvement over Unif+UCB-E.

The right panel of Fig. 2 compares the performance of GapE, GapE-V, and A-GapE-V in Problem 2. In this problem, all the gaps are equal ($\Delta_{mk} = 0.05$), thus all the arms (and bandits) have the same complexity $H_{mk} = 400$. As a result, GapE tends to implement a nearly uniform allocation, which results in a small difference between Unif and GapE (28% and 25% probability of error, respectively). The reason why GapE is still able to improve over Unif may be explained by the difference between static and dynamic allocation strategies and it is further investigated in App. A of [8]. Unlike the gaps, the variance of the arms is extremely heterogeneous. In fact, the variance of the arms of bandit 1 is bigger than in bandit 2, thus making it harder to solve. This difference is captured by the definition of $H^\sigma$ ($H^\sigma_1 \simeq 1400 > H^\sigma_2 \simeq 600$). Note also that $H^\sigma \le H$. As discussed in Sec. 3.2, since GapE-V takes into account the empirical variance of the arms, it is able to adapt to the complexity $H^\sigma_{mk}$ of each bandit-arm pair and to focus more on uncertain arms. GapE-V improves the final accuracy by almost 10% w.r.t. GapE. From both panels of Fig. 2, we also notice that the adaptive algorithms achieve similar performance to their non-adaptive counterparts. Finally, we notice that a good choice of parameter $\eta$ for GapE-V is always close to 2 and 4 (see also [8] for additional experiments), while GapE needs $\eta$ to be tuned more carefully, particularly in Problem 2 where the large values of $\eta$ try to compensate for the fact that $H$ does not successfully capture the real complexity of the problem. This further strengthens the intuition that $H^\sigma$ is a more accurate measure of the complexity for the multi-bandit pure exploration problem.

While Problems 1 and 2 are relatively simple, we report the results of the more complicated Problem 3 in Fig. 3. The experiment is designed so that the complexity w.r.t. the variance of each bandit and within each bandit is strongly heterogeneous. In this experiment, we also introduce UCBE-V that extends UCB-E by taking into account the empirical variance similarly to GapE-V. The results confirm the previous findings and show the improvement achieved by introducing empirical estimates of the variance and allocating non-uniformly over bandits.

5 Conclusion

In this paper, we studied the problem of best arm identification in a multi-bandit multi-armed setting. We introduced a gap-based exploration algorithm, called GapE, and proved an upper-bound for its probability of error. We extended the basic algorithm to also consider the variance of the arms and proved an upper-bound for its probability of error. We also introduced adaptive versions of these algorithms that estimate the complexity of the problem online. The numerical simulations confirmed the theoretical findings that GapE and GapE-V outperform other allocation strategies, and that their adaptive counterparts are able to estimate the complexity without worsening the global performance.
Although GapE does not know the gaps, the experimental results reported in [8] indicate that it might outperform a static allocation strategy, which knows the gaps in advance, thus suggesting that an adaptive strategy could perform better than a static one. This observation asks for further investigation. Moreover, we plan to apply the algorithms introduced in this paper to the problem of rollout allocation for classification-based policy iteration in reinforcement learning [9, 6], where the goal is to identify the greedy action (arm) in each of the states (bandit) in a training set.

Acknowledgments

Experiments presented in this paper were carried out using the Grid'5000 experimental testbed. This work was supported by the Ministry of Higher Education and Research, the Nord-Pas de Calais Regional Council and FEDER through the "contrat de projets état région", the French National Research Agency (ANR) under project LAMPADA n° ANR-09-EMER-007, the European Community's Seventh Framework Programme (FP7) under a grant agreement, and the PASCAL2 European Network of Excellence.

References

[1] J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In Proceedings of the Twenty-Third Annual Conference on Learning Theory, pages 41-53.
[2] Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. Tuning bandit algorithms in stochastic environments. In Marcus Hutter, Rocco Servedio, and Eiji Takimoto, editors, Algorithmic Learning Theory, volume 4754 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg.
[3] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47.
[4] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandit problems. In Proceedings of the Twentieth International Conference on Algorithmic Learning Theory, pages 23-37.
[5] K. Deng, J. Pineau, and S. Murphy. Active learning for personalizing treatment. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[6] C. Dimitrakakis and M. Lagoudakis. Rollout sampling approximate policy iteration. Machine Learning Journal, 72(3).
[7] Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research, 7.
[8] V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck. Multi-bandit best arm identification. Technical Report, INRIA.
[9] M. Lagoudakis and R. Parr. Reinforcement learning as classification: Leveraging modern classifiers. In Proceedings of the Twentieth International Conference on Machine Learning.
[10] O. Maron and A. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In Proceedings of Advances in Neural Information Processing Systems 6.
[11] A. Maurer and M. Pontil. Empirical Bernstein bounds and sample-variance penalization. In 22nd Annual Conference on Learning Theory.
[12] V. Mnih, Cs. Szepesvári, and J.-Y. Audibert. Empirical Bernstein stopping. In Proceedings of the Twenty-Fifth International Conference on Machine Learning.
[13] H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematics Society, 58.


More information