Multiarmed Bandits With Limited Expert Advice


Satyen Kale
Yahoo Labs, New York

Abstract

We consider the problem of minimizing regret in the setting of advice-efficient multiarmed bandits with expert advice. We give an algorithm for the setting of K arms and N experts, out of which we are allowed to query and use the advice of only M experts in each round, which has a regret bound¹ of Õ(√(min{K,M}·NT/M)) after T rounds. We also prove that any algorithm for this problem must have expected regret at least Ω̃(√(min{K,M}·NT/M)), thus showing that our upper bound is nearly tight. This solves the COLT 2013 open problem of Seldin et al. [7].

1 Introduction

In many real world applications one is faced with the problem of choosing one of several actions: for example, in healthcare, a choice of treatment; in financial domains, a choice of investment. Typically in such scenarios one may utilize the advice of several domain experts to make an informed choice. Once an action is chosen, one obtains feedback for the action in terms of some loss (or reward), but no feedback for other actions is obtained. This is repeated over several rounds. Repeated decision-making in this context is modeled by the well-studied multiarmed bandits with expert advice problem [4]. In this paper, we study an important practical consideration for this setting: frequently there are costs associated with obtaining useful advice, and budget constraints imply that only a few experts may be queried for advice. This constraint on the number of experts that can be queried in any round is modeled by the advice-efficient setting of the multiarmed bandits with expert advice problem, introduced by Seldin et al. [7]. In this setting, in each round t = 1, 2, ..., T, the learner is required to pull one arm A_t from some set A of K arms. Simultaneously, an adversary sets losses l_t(a) ∈ [0,1] for each arm a ∈ A, thus generating the loss vector l_t ∈ R^A. Assisting us in this task are N experts in the set H. Each expert h can provide advice² on which arm to pull in the form of a probability distribution ξ_t^h ∈ R^A on the set of arms. This advice gives the expert h an expected loss of ξ_t^h · l_t in round t.
The catch is that we can only observe the advice of at most M experts of our choosing in each round. The goal is to choose subsets of experts in each round to query the advice of, and using their advice

---
This work was done when the author was at the IBM T. J. Watson Research Center.
¹ Here, we use the Õ(·) and Ω̃(·) notation to suppress dependence on logarithmic factors in the problem parameters.
² No assumptions are made on how this advice is chosen by the experts in each round other than that it is independent of the losses of the arms set by the adversary in that round.

play some arm A_t ∈ A (probabilistically, if desired) to minimize the expected regret with respect to the loss of the best expert, where the regret is defined as:

    Regret := Σ_{t=1}^T l_t(A_t) − min_{h∈H} Σ_{t=1}^T ξ_t^h · l_t.

In this paper we give an algorithm whose expected regret is bounded by 2√(min{K,M}·(N/M)·T·log(N)) after T rounds, based on the Multiplicative Weights (MW) forecaster for prediction with expert advice [5]. We can improve this upper bound using the PolyINF forecaster of Audibert and Bubeck [2] to 4√(min{K,M}·(N/M)·T·log(8(N/M)·min{K,M})). This matches the regret of the best known algorithms for the special cases M = 1 and M = N, and interpolates between them for intermediate values of M. This solves the COLT 2013 open problem proposed by Seldin et al. [7], and in fact gives a better regret bound than the bound conjectured in [7], which was O(√(K·(N/M)·T·log(N))). Furthermore, we also show that any algorithm for the problem must incur expected regret of Ω(√((NT/M)·min{K, M/log(K)})) on some sequence of expert advice and arm losses, thus showing that our upper bound is nearly tight: the ratio between the upper and lower bounds is always bounded by O(√(log(K)·log(N))).

2 Preliminaries

For any event E, let I[E] be the indicator random variable set to 1 if E happens and 0 otherwise. In any round t of the algorithm, let Pr_t[·] and E_t[·] denote probability and expectation respectively, conditioned on all the randomness defined up to round t − 1. For two probability distributions P and Q defined on the same space, let KL(P‖Q) and d_TV(P, Q) denote the KL-divergence and total variation distance between the two distributions respectively. Let ‖·‖_p denote the p-norm for any p ≥ 1. Without loss of generality, we may assume that each expert suggests exactly one arm to play in any round; i.e. ξ_t^h(a) = 1 for exactly one arm a ∈ A and 0 for all other arms. Call such advice vectors pure. To see this, for every expert h we can randomly round a general advice vector ξ_t^h to a pure vector by sampling some arm a_t^h ~ ξ_t^h and constructing a new advice vector ξ̂_t^h by setting ξ̂_t^h(a_t^h) = 1 and ξ̂_t^h(a) = 0 for all a ≠ a_t^h.
Note that E[ξ̂_t^h] = ξ_t^h; thus for any expert h, following the randomly rounded advice ξ̂_t^h for t = 1, 2, ..., T has the same expected cost as following the advice ξ_t^h. Since this randomized rounding trick can be applied to the advice (algorithmically for the observed advice, and conceptually for the unobserved advice), in the rest of the paper we assume that all advice vectors are pure vectors; this helps us in getting a tighter bound on the regret.
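As a concrete illustration, the randomized rounding step above can be sketched in a few lines of Python. The function name and the dict-based representation of advice vectors are ours, not from the paper:

```python
import random

def round_advice(xi, rng=random):
    """Randomly round a general advice vector xi (a dict mapping arms to
    probabilities) to a pure advice vector: sample one arm from xi and put
    all the probability mass on it. Since the expectation of the rounded
    vector equals xi, following the rounded advice has the same expected
    loss as following xi."""
    arms = list(xi)
    sampled = rng.choices(arms, weights=[xi[a] for a in arms])[0]
    return {a: 1.0 if a == sampled else 0.0 for a in arms}
```

For example, with xi = {"a1": 0.25, "a2": 0.75}, repeated calls put all the mass on "a2" about 75% of the time, so the average of the rounded vectors converges to xi.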

Let a_t^h denote the action chosen by expert h at time t, so that the loss of the expert can be rewritten as ξ_t^h · l_t = l_t(a_t^h). For any time period t and any set U ⊆ H, define the active set of arms to be the set of all arms recommended by experts in U, i.e. A_t^U = {a ∈ A : ∃h ∈ U s.t. a_t^h = a}. Note that since we are allowed to query at most M experts in any round, if U_t is the queried set of experts in round t, then |A_t^{U_t}| ≤ min{K, M}; this leads to the min{K, M} factor in the regret bound. Define K̃ := min{K, M}, the effective number of arms. Throughout the paper we also assume that N ≥ 2 and K ≥ 2: in the remaining cases we trivially get 0 regret.

3 Algorithm

The algorithm, dubbed LEXP, works as follows. Assume for simplicity³ that M divides N, and in the beginning, partition the N experts into R := N/M groups of size M arbitrarily. Run an algorithm for prediction with expert advice (such as the Multiplicative Weights (MW) forecaster of Littlestone and Warmuth [5], or the PolyINF forecaster of Audibert and Bubeck [2]) on all the N experts. In each round, this base expert learning algorithm computes a distribution over the experts. Then LEXP samples an expert from this distribution, and chooses the group of experts it belongs to to query for advice, thus ensuring that at most M experts are queried in any round. It then plays the action recommended by the chosen expert, and observes its loss. It then constructs unbiased loss estimators for all experts using the observed loss and queried advice, and passes these to the base expert learning algorithm, which updates its distribution. The loss estimators are non-zero only for experts in the chosen group; thus they can be computed for all experts and the algorithm is well-defined. The pseudo-code follows.

4 Analysis

We first prove a number of utility lemmas. The first lemma shows that the loss estimators we construct are unbiased for all experts with positive probability in the distribution (and an underestimate in general):

Lemma 1 For all rounds t and all experts h, we have E_t[Y_t^h] ≤ l_t(a_t^h), with equality holding if q_t(h) > 0.⁴ Thus, E_t[q_t(h)Y_t^h] = q_t(h)l_t(a_t^h), and unconditionally, E[Y_t^h] ≤ l_t(a_t^h).

Proof: Let h ∈ B_i. For clarity, let a = a_t^h.
If Pr_t[i, a] > 0, then by the definition of the loss estimator in (1), we have

    E_t[Y_t^h] = E_t[l̂_t^i(a)] = E_t[ l_t(a) · I[I_t = i, A_t = a] / Pr_t[i, a] ] = l_t(a) · Pr_t[i, a] / Pr_t[i, a] = l_t(a).

³ The regret bounds only change by a small constant factor if M doesn't divide N.
⁴ It is easy to see that both the MW and PolyINF forecasters always have positive probability on all experts, so if we use one of these two expert learning algorithms, then all the inequalities in this lemma are actually equalities.

Algorithm 1 Multiarmed Bandits with Limited Expert Advice Algorithm (LEXP).
1: Partition the N experts into R = N/M groups of M experts each arbitrarily. Call the groups B_1, B_2, ..., B_R, and define ℛ := {1, 2, ..., R}.
2: Run an algorithm for prediction with expert advice (such as MW or PolyINF) on all the N experts.
3: for t = 1, 2, ..., T do
4:   Let q_t be the distribution over experts generated by the base expert learning algorithm. Sample an expert H_t ~ q_t, and set I_t to be the index of the group to which H_t belongs.
5:   Query the advice of all experts in B_{I_t}.
6:   Play A_t = a_t^{H_t}, and observe its loss l_t(A_t).
7:   For every group B_i and every arm a ∈ A, define the loss estimator

       l̂_t^i(a) := { l_t(a) · I[I_t = i, A_t = a] / Pr_t[i, a]   if Pr_t[i, a] > 0
                   { 0                                            otherwise,        (1)

     where Pr_t[i, a] = Σ_{h∈B_i} q_t(h)ξ_t^h(a) is the probability of the event {I_t = i, A_t = a}, conditioned on all the randomness up to round t − 1.
8:   For all experts h ∈ B_i, define the loss estimator Y_t^h := l̂_t^i(a_t^h), and pass them to the base expert learning algorithm.
9: end for

If Pr_t[i, a] = 0, then l̂_t^i(a) = 0, and so E_t[l̂_t^i(a)] = 0 ≤ l_t(a). Thus in either case, E_t[Y_t^h] ≤ l_t(a), and E_t[q_t(h)Y_t^h] = q_t(h)l_t(a_t^h). Finally, note that if q_t(h) > 0, then Pr_t[i, a] > 0, so equality holds.

The next lemma says that the algorithm's expected loss in each round is the same as that of the base expert learning algorithm:

Lemma 2 For all rounds t we have E[l_t(A_t)] = E[Σ_{h∈H} q_t(h)Y_t^h].

Proof: We have

    E_t[l_t(A_t)] = E_t[l_t(a_t^{H_t})] = Σ_{h∈H} q_t(h)l_t(a_t^h) = E_t[Σ_{h∈H} q_t(h)Y_t^h],

by Lemma 1. Taking expectation over all the randomness up to time t − 1, the proof is complete.

The next lemma gives a bound on the variance of the estimated losses. We state this in slightly more general terms than necessary to unify the analysis of the algorithms using the MW or PolyINF forecasters as the expert learning algorithm.

Lemma 3 Fix any α ∈ [1, 2]. For all rounds t we have E[Σ_{h∈H} (q_t(h))^α (Y_t^h)²] ≤ (RK̃)^{2−α}.

Proof: Let S_t := {(i, a) ∈ ℛ × A : Pr_t[i, a] > 0}

be the set of all (group index, action) pairs that have positive probability in round t. Since in round t the algorithm only plays arms in A_t^{B_{I_t}}, and for any group B_i the set of active arms in round t, A_t^{B_i}, has size at most K̃, we conclude that |S_t| ≤ RK̃. The pair (I_t, A_t) computed by the algorithm is in S_t. Conditioning on the value of (I_t, A_t), we can upper bound Σ_{h∈H} (q_t(h))^α (Y_t^h)² as follows:

    Σ_{h∈H} (q_t(h))^α (Y_t^h)²
      = Σ_{h∈B_{I_t}} (q_t(h))^α (l̂_t^{I_t}(a_t^h))²                                  (Y_t^h = 0 for all h ∉ B_{I_t})
      = Σ_{h∈B_{I_t}} (q_t(h))^α ξ_t^h(A_t) (l_t(A_t) / Pr_t[I_t, A_t])²              (l̂_t^{I_t}(a) = 0 for all a ≠ A_t)
      ≤ (Σ_{h∈B_{I_t}} q_t(h)ξ_t^h(A_t))^α / (Pr_t[I_t, A_t])²                        (ξ_t^h(A_t), l_t(A_t) ∈ [0,1], α ≥ 1)
      = (Pr_t[I_t, A_t])^{α−2},                                                        (2)

since Pr_t[I_t, A_t] = Σ_{h∈B_{I_t}} q_t(h)ξ_t^h(A_t). Next, we have

    E_t[Σ_{h∈H} (q_t(h))^α (Y_t^h)²] = E_t[ E_t[Σ_{h∈H} (q_t(h))^α (Y_t^h)² | (I_t, A_t)] ]
      ≤ Σ_{(i,a)∈S_t} Pr_t[i, a] · (Pr_t[i, a])^{α−2}                                 (By (2))
      = Σ_{(i,a)∈S_t} (Pr_t[i, a])^{α−1}
      ≤ |S_t|^{2−α} ≤ (RK̃)^{2−α}.

The penultimate inequality follows by applying Hölder's inequality to the pair of dual norms 1/(α−1) and 1/(2−α). Taking expectation over all the randomness up to time t − 1, the proof is complete.

4.1 Analysis using the MW forecaster

The MW forecaster for prediction with expert advice takes one parameter, η. It starts with q_1 being the uniform distribution over all experts, and for any t ≥ 1, constructs the distribution q_{t+1} using the following update rule:

    q_{t+1}(h) := q_t(h) exp(−ηY_t^h) / Z_t,

where Z_t is the normalization constant required to make q_{t+1} a distribution, i.e. Σ_{h∈H} q_{t+1}(h) = 1.
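In code, this update rule is essentially a one-liner; a minimal Python sketch (representation ours):

```python
import math

def mw_update(q, Y, eta):
    """Multiplicative Weights update: q_{t+1}(h) is proportional to
    q_t(h) * exp(-eta * Y[h]), renormalized to sum to 1.
    q and Y are dicts mapping experts to probabilities and estimated losses."""
    w = {h: q[h] * math.exp(-eta * Y[h]) for h in q}
    z = sum(w.values())          # normalization constant Z_t
    return {h: wh / z for h, wh in w.items()}
```

Experts with larger estimated losses lose weight exponentially fast, while experts outside the queried group (whose estimator is 0) keep their relative weight unchanged.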

Theorem 1 Set η = √(log(N)/(TRK̃)). Then the expected regret of the algorithm using the MW forecaster is bounded by 2√(K̃(N/M)T log(N)).

Proof: The MW forecaster guarantees (see [1]) that as long as Y_t^h ≥ 0 for all t, h, we have for any expert h*:

    Σ_{t=1}^T Σ_{h∈H} q_t(h)Y_t^h ≤ Σ_{t=1}^T Y_t^{h*} + (η/2) Σ_{t=1}^T Σ_{h∈H} q_t(h)(Y_t^h)² + log(N)/η.    (3)

Now, we have for any expert h*:

    Σ_{t=1}^T E[l_t(A_t)] = Σ_{t=1}^T E[Σ_{h∈H} q_t(h)Y_t^h]                                    (By Lemma 2)
      ≤ Σ_{t=1}^T E[Y_t^{h*}] + (η/2) Σ_{t=1}^T E[Σ_{h∈H} q_t(h)(Y_t^h)²] + log(N)/η           (By (3))
      ≤ Σ_{t=1}^T l_t(a_t^{h*}) + (η/2)RK̃T + log(N)/η                                          (By Lemma 1 and Lemma 3 with α = 1)
      ≤ Σ_{t=1}^T l_t(a_t^{h*}) + 2√(T log(N) RK̃),

using η = √(log(N)/(TRK̃)); and 2√(T log(N) RK̃) = 2√((N/M)K̃T log(N)).

4.2 Analysis using the PolyINF forecaster

The PolyINF forecaster for prediction with expert advice takes two parameters, η and c > 1. It starts with q_1 being the uniform distribution over all experts, and for any t ≥ 1, constructs the distribution q_{t+1} as follows:

    q_{t+1}(h) = 1 / [η(Σ_{τ=1}^t Y_τ^h + C_{t+1})]^c,

where C_{t+1} is a constant chosen so that q_{t+1} is a distribution, i.e. Σ_{h∈H} q_{t+1}(h) = 1.

Theorem 2 Set c = log(8RK̃) and η = 2^{1−2/c}[cT(RK̃)^{1−1/c}]^{−1/2}. Then the expected regret of the algorithm using the PolyINF forecaster is bounded by 4√(K̃(N/M)T log(8RK̃)).

Proof: Audibert et al. [3] prove that for the PolyINF forecaster, as long as Y_t^h ≥ 0 for all t, h, we have for any expert h*:

    Σ_{t=1}^T Σ_{h∈H} q_t(h)Y_t^h ≤ Σ_{t=1}^T Y_t^{h*} + (cη/2) Σ_{t=1}^T Σ_{h∈H} (q_t(h))^{1+1/c} (Y_t^h)² + cN^{1/c}/(η(c−1)).    (4)

Now, we have for any expert h*:

    Σ_{t=1}^T E[l_t(A_t)] = Σ_{t=1}^T E[Σ_{h∈H} q_t(h)Y_t^h]                                              (By Lemma 2)
      ≤ Σ_{t=1}^T E[Y_t^{h*}] + (cη/2) Σ_{t=1}^T E[Σ_{h∈H} (q_t(h))^{1+1/c} (Y_t^h)²] + cN^{1/c}/(η(c−1))  (By (4))
      ≤ Σ_{t=1}^T l_t(a_t^{h*}) + (cη/2)(RK̃)^{1−1/c}T + cN^{1/c}/(η(c−1))                                 (By Lemma 1 and Lemma 3 with α = 1 + 1/c)
      ≤ Σ_{t=1}^T l_t(a_t^{h*}) + 2√(cTRK̃)·(N/(RK̃))^{1/(2c)}                                             (Using η = 2^{1−2/c}[cT(RK̃)^{1−1/c}]^{−1/2})
      ≤ Σ_{t=1}^T l_t(a_t^{h*}) + 4√(K̃(N/M)T log(8RK̃)),                                                  (Using c ≥ 2)

using c = log(8NK̃/M) = log(8RK̃).

4.3 Extension to Changing Number of Queried Experts

The algorithm and its analysis extend easily to the situation where the number of experts queried is not fixed but can change from round to round. Specifically, at time t, the learner is told the number M_t of experts that can be queried in that round. In this setting, consider the following variant of the algorithm. In each round t, the experts are re-partitioned into N/M_t groups⁵ of size M_t. The rest of the algorithm stays the same: viz. an expert is chosen from the current probability distribution over the experts, and the group it belongs to is chosen for querying for expert advice. The updates to the distribution and the loss estimators are the same as in Algorithm 1. The analysis of Algorithm 1 relies on Lemmas 1, 2 and 3, all of which concern a specific round t, and the re-partitioning doesn't affect them. Thus, we easily obtain the following bound:

Theorem 3 In the setting where in each round the number of experts that can be queried in that round is specified, the extension of Algorithm 1 which re-partitions the experts in each round t into R_t := N/M_t groups of size M_t has the following regret bound. For every round t, let K̃_t = min{K, M_t}, and let t* = arg max_t K̃_t/M_t. If the MW forecaster is used with η = √(log(N)/(TNK̃_{t*}/M_{t*})), then the expected regret is bounded by 2√(K̃_{t*}(N/M_{t*})T log(N)). If the PolyINF forecaster is used with c = log(8NK̃_{t*}/M_{t*}) and η = 2^{1−2/c}[cT(R_{t*}K̃_{t*})^{1−1/c}]^{−1/2}, then the expected regret is bounded by 4√(K̃_{t*}(N/M_{t*})T log(8NK̃_{t*}/M_{t*})).

⁵ Again, here we assume M_t divides N for convenience.
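To summarize Section 3 and the estimator (1), a single round of LEXP can be sketched in Python as follows. The function name and the dict/list data representation are ours; pure advice is represented as a dict mapping each expert to its recommended arm:

```python
import random

def lexp_round(q, groups, advice, losses, rng=random):
    """One round of LEXP (sketch). q: dict expert -> probability under the
    base algorithm; groups: list of lists partitioning the experts; advice:
    dict expert -> recommended arm (pure advice); losses: dict arm -> loss
    in [0, 1] (only the played arm's loss is read, as in the bandit setting).
    Returns the played arm and the loss estimators Y: dict expert -> float."""
    experts = list(q)
    sampled = rng.choices(experts, weights=[q[h] for h in experts])[0]
    i = next(j for j, g in enumerate(groups) if sampled in g)  # queried group
    arm = advice[sampled]                                      # played arm A_t
    # Pr_t[i, arm] = total probability of experts in group i advising this arm
    pr = sum(q[h] for h in groups[i] if advice[h] == arm)
    lhat = losses[arm] / pr                # importance-weighted loss estimate
    # Y^h is nonzero only for queried experts whose advice equals the played arm
    return arm, {h: lhat if h in groups[i] and advice[h] == arm else 0.0
                 for h in q}
```

Averaged over the algorithm's randomness, Y[h] equals the true loss of h's recommended arm whenever q(h) > 0, which is exactly the content of Lemma 1.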

5 Lower Bound

In this section, we show a lower bound on the regret of any algorithm for the multiarmed bandits with limited expert advice setting, which shows that our upper bound is nearly tight. To describe the lower bound, consider the well-studied balls-into-bins process. Here M balls are tossed randomly into K bins. In each toss a bin is chosen uniformly at random from the K bins, independently of other tosses. Define the function f(K, M) to be the expected number of balls in the bin with the maximum number of balls. It is well-known (see, for example, [6]) that f(K, M) = O(max{log(K), M/K}). With this definition, we can prove the following lower bound. Note that this lower bound doesn't follow from a similar lower bound in [8], because in their setting the experts' losses can be all uncorrelated, whereas in our setting the experts' losses are necessarily correlated because there are only K arms.

Theorem 4 For any algorithm for the multiarmed bandits with limited expert advice setting, there is a sequence of expert advice and losses for each arm so that the expected regret of the algorithm is at least Ω(√(NT/f(K,M))) = Ω(√((NT/M) min{K, M/log(K)})).

Proof: The lower bound is based on standard information-theoretic arguments (see, e.g. [4]). Let B(p) be the Bernoulli distribution with parameter p, i.e. 1 is chosen with probability p and 0 with probability 1 − p. In the following, we assume the online algorithm is deterministic: the extension to randomized algorithms is easy by conditioning on the random seed of the algorithm, since the sequence of advice and losses we construct does not depend on the algorithm. Fix the parameter ε := (1/16)√(N/(f(K,M)T)). The expert advice and the losses of the arms are generated randomly as follows. We define probability distributions over advice and losses, P_h, for all h ∈ H. Fix an h ∈ H, and define P_h as follows. In each round t, for all experts h' ∈ H, we set their advice to be a uniformly random arm in A. Recall that the arm chosen by expert h in round t is a_t^h. Conditioned on the choice of the arm a_t^h, the loss of arm a_t^h is chosen from B(1/2 − ε), and the loss of all arms a ≠ a_t^h from B(1/2), independently.
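The function f(K, M) is easy to estimate empirically; a quick Monte Carlo sketch (ours, for intuition only):

```python
import random

def max_load(K, M, trials=2000, seed=0):
    """Monte Carlo estimate of f(K, M): the expected number of balls in the
    fullest bin when M balls are tossed uniformly at random into K bins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        bins = [0] * K
        for _ in range(M):
            bins[rng.randrange(K)] += 1
        total += max(bins)
    return total / trials
```

For instance, f(1, M) = M and f(K, 1) = 1 exactly, max_load(2, 2) is close to the exact value 3/2, and the estimate always dominates the average load M/K, in line with the asymptotics f(K, M) = O(max{log(K), M/K}).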
Unconditionally, the distribution of the loss of any arm a at any time t is B(p), where

    p = (1/K)(1/2 − ε) + ((K−1)/K)(1/2) = 1/2 − ε/K.

A similar calculation shows that for all experts h' ≠ h, the distribution of the loss of their chosen arm is B(p) and thus has expectation p, while the expected loss of the arm chosen by h is 1/2 − ε. Thus the best expert is h. Let E_h denote expectation under P_h. Consider another probability distribution P_0 over advice and losses: in all rounds, all experts choose their arms in A uniformly at random as before, and all arms have loss distributed as B(p). Let E_0 denote the expectation of random variables under P_0. Before round 1, we choose an expert h ∈ H uniformly at random, and advice and losses are then generated from P_h. In round t, let S_t denote the set of experts chosen by the algorithm to query. Lemma 4 below shows that if either of the events [h ∉ S_t] or [h ∈ S_t, A_t ≠ a_t^h] happens, the algorithm suffers an expected regret of at least ε/2. Define the random variables

    L_T^h = Σ_{t=1}^T I[h ∈ S_t]   and   N_T^h = Σ_{t=1}^T I[h ∈ S_t, A_t = a_t^h].

Then to get a lower bound on the expected regret, we need to upper bound E_h[N_T^h]. To do this, we use arguments based on the KL-divergence between the distributions P_h and P_0. Specifically, for all t, let

    H_t = (G_1, l_1(A_1)), (G_2, l_2(A_2)), ..., (G_t, l_t(A_t))

denote the history up to time t; here, G_τ = {(h', a_τ^{h'}) : h' ∈ S_τ} is the set of pairs of experts and their advice for the experts queried at time τ. For convenience, we define H_0 = {}, the empty set. Note that since the algorithm is assumed to be deterministic, N_T^h is a deterministic function of the history H_T. Thus, to upper bound E_h[N_T^h] we compute an upper bound on KL(P_0(H_T)‖P_h(H_T)). Lemma 5 below shows that

    KL(P_0(H_T)‖P_h(H_T)) ≤ 6ε² E_0[N_T^h] + (4ε²/K²) E_0[L_T^h].

Thus, by Pinsker's inequality, we get

    d_TV(P_0(H_T), P_h(H_T)) ≤ √((1/2) KL(P_0(H_T)‖P_h(H_T))) ≤ √(3ε² E_0[N_T^h] + (2ε²/K²) E_0[L_T^h]).

Since N_T^h ∈ [0, T], this implies that

    E_h[N_T^h] ≤ E_0[N_T^h] + T√(3ε² E_0[N_T^h] + (2ε²/K²) E_0[L_T^h]).

By Jensen's inequality applied to the concave square root function, we get

    (1/N) Σ_{h∈H} E_h[N_T^h] ≤ (1/N) Σ_{h∈H} E_0[N_T^h] + T√(3ε² (1/N) Σ_{h∈H} E_0[N_T^h] + (2ε²/K²) (1/N) Σ_{h∈H} E_0[L_T^h])
      ≤ f(K,M)T/N + T√(3ε² f(K,M)T/N + (2ε²/K²) MT/N)    (5)
      ≤ 3T/4 + 2εT√(f(K,M)T/N).    (6)

Inequality (5) follows from Lemma 6 below, using

    E_0[L_T^h] = Σ_{t=1}^T P_0[h ∈ S_t] = Σ_{t=1}^T E_0[|S_t|]/N ≤ MT/N   and
    E_0[N_T^h] = Σ_{t=1}^T P_0[h ∈ S_t, A_t = a_t^h] ≤ Σ_{t=1}^T E_0[f(K, |S_t|)]/N ≤ f(K,M)T/N.    (7)

To obtain inequality (6), we upper bound 2ε²M/K² by ε²f(K,M), because f(K,M) is at least the expected number of balls in each bin, which equals M/K, and so f(K,M) ≥ 2M/K² for K ≥ 2. As for the f(K,M)T/N term, we bound it using the fact that f(K,M) ≤ f(2,M) for K ≥ 2 (since f is clearly monotonically decreasing in the first argument and monotonically increasing in the second), f(2,M) ≤ 3M/4 for M ≥ 2, and M ≤ N.
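The per-round Bernoulli KL bounds that feed into the displayed bound on KL(P_0(H_T)‖P_h(H_T)) — namely KL(B(p)‖B(1/2)) ≤ 4ε²/K² and KL(B(p)‖B(1/2 − ε)) ≤ 6ε² for p = 1/2 − ε/K, established in Lemma 5 — can be sanity-checked numerically; a small sketch (function names ours):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(B(p) || B(q)) between two Bernoulli distributions."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def check_kl_bounds(eps, K):
    """Check the two per-round KL bounds used in the lower bound, for the
    uniform mixture parameter p = 1/2 - eps/K."""
    p = 0.5 - eps / K
    return (kl_bernoulli(p, 0.5) <= 4 * eps**2 / K**2 and
            kl_bernoulli(p, 0.5 - eps) <= 6 * eps**2)
```

For small deviations δ, KL(B(1/2 − δ)‖B(1/2)) ≈ 2δ², so both bounds hold with room to spare for the small ε used in the proof.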

Now, taking expectation over the choice of the expert h, the expected regret of the algorithm is at least

    (ε/2)(T − (1/N) Σ_{h∈H} E_h[N_T^h]) ≥ (ε/2)(T − 3T/4 − T/8) = εT/16 = (1/256)√(NT/f(K,M)) = Ω(√((NT/M) min{K, M/log(K)})),

using the setting ε = (1/16)√(N/(f(K,M)T)) and the fact that f(K,M) = O(max{log(K), M/K}).

Lemma 4 Suppose h is the expert chosen in the beginning, and advice and losses are then generated from P_h. Then in any round t, if either of the events [h ∉ S_t] or [h ∈ S_t, A_t ≠ a_t^h] happens, the algorithm suffers an expected regret of at least ε/2.

Proof: First, recall that the expert h always incurs an expected loss of 1/2 − ε in each round. Now if h ∉ S_t, then the losses of the arms are independent of the advice of the experts in S_t, and hence their distribution conditioned on the advice of experts in S_t is B(p). Thus, the distribution of the loss of the chosen arm A_t is also B(p), which implies that the algorithm suffers an expected regret of p − (1/2 − ε) = ε(1 − 1/K) ≥ ε/2. If h ∈ S_t but A_t ≠ a_t^h, then the distribution of the loss of A_t, conditioned on the advice of the experts in S_t, is B(1/2). This implies that the algorithm suffers an expected regret of 1/2 − (1/2 − ε) = ε ≥ ε/2.

Lemma 5 We have

    KL(P_0(H_T)‖P_h(H_T)) ≤ 6ε² E_0[N_T^h] + (4ε²/K²) E_0[L_T^h].

Proof: We have

    KL(P_0(H_T)‖P_h(H_T))
      = Σ_{t=1}^T KL(P_0((G_t, l_t(A_t)) | H_{t−1}) ‖ P_h((G_t, l_t(A_t)) | H_{t−1}))    (8)
      = Σ_{t=1}^T [KL(P_0(l_t(A_t) | H_{t−1}, G_t) ‖ P_h(l_t(A_t) | H_{t−1}, G_t)) + KL(P_0(G_t | H_{t−1}) ‖ P_h(G_t | H_{t−1}))]    (9)
      = Σ_{t=1}^T KL(P_0(l_t(A_t) | H_{t−1}, G_t) ‖ P_h(l_t(A_t) | H_{t−1}, G_t))    (10)
      = Σ_{t=1}^T [P_0[h ∈ S_t, A_t = a_t^h] KL(B(p)‖B(1/2 − ε)) + P_0[h ∈ S_t, A_t ≠ a_t^h] KL(B(p)‖B(1/2)) + P_0[h ∉ S_t] KL(B(p)‖B(p))]    (11)

      ≤ Σ_{t=1}^T [P_0[h ∈ S_t, A_t = a_t^h] · 6ε² + P_0[h ∈ S_t, A_t ≠ a_t^h] · 4ε²/K²]    (12)
      ≤ 6ε² Σ_{t=1}^T P_0[h ∈ S_t, A_t = a_t^h] + (4ε²/K²) Σ_{t=1}^T P_0[h ∈ S_t]
      = 6ε² E_0[N_T^h] + (4ε²/K²) E_0[L_T^h].

Equalities (8) and (9) follow from the chain rule for relative entropy. Equality (10) follows because the distribution of G_t conditioned on H_{t−1} is identical in P_0 and P_h. Equality (11) follows because under P_0, the loss of the chosen arm always follows B(p); and under P_h, if h ∉ S_t, then the loss of the chosen arm follows B(p); if h ∈ S_t and A_t = a_t^h, then the loss of the chosen arm follows B(1/2 − ε); and if h ∈ S_t and A_t ≠ a_t^h, then the loss of the chosen arm follows B(1/2). Finally, inequality (12) follows using standard calculations for the KL-divergence between Bernoulli random variables.

Recall that f(K, M) is the expected number of balls in the bin with the maximum number of balls in an M-balls-into-K-bins process.

Lemma 6 For all t, we have P_0[h ∈ S_t] = E_0[|S_t|]/N and P_0[h ∈ S_t, A_t = a_t^h] ≤ E_0[f(K, |S_t|)]/N.

Proof: First, by the symmetry of the experts under P_0, we have

    P_0[h ∈ S_t] = (1/N) Σ_{h'∈H} E_0[I[h' ∈ S_t]] = E_0[|S_t|]/N.

Next, we have

    P_0[h ∈ S_t, A_t = a_t^h] = (1/N) Σ_{h'∈H} E_0[I[h' ∈ S_t, A_t = a_t^{h'}]] = (1/N) E_0[|{h' ∈ S_t : A_t = a_t^{h'}}|]
      ≤ (1/N) E_0[max_{a∈A} |{h' ∈ S_t : a = a_t^{h'}}|] = (1/N) E_0[ E_0[max_{a∈A} |{h' ∈ S_t : a = a_t^{h'}}| | S_t] ] = E_0[f(K, |S_t|)]/N.

The penultimate equality follows because, conditioned on the choice of S_t, the random variable max_{a∈A} |{h' ∈ S_t : a = a_t^{h'}}| is completely determined by the choice of the arms recommended by the experts h' ∈ S_t. Since these arms are chosen uniformly at random from A, independently for each expert h' ∈ S_t, we can think of the |S_t| experts in S_t as balls and the K arms in A as bins in a balls-into-bins process. Then the random variable of interest is exactly the number of balls in the bin with the maximum number of balls. The expectation of this random variable is f(K, |S_t|).

5.1 Extension to a Global Limit on Queries

In certain situations, a global limit on the number of queries made to experts over the entire run of the algorithm, rather than a per-round limit, is more natural. Then the analysis of Theorem 4 can be extended easily to give the following theorem (proved in Appendix A):

Theorem 5 In the setting of the multiarmed bandits with limited expert advice problem where there is a global limit of MT queries to experts over the T rounds, for any algorithm, there is a sequence of expert advice and losses for each arm so that the expected regret of the algorithm is at least Ω(√((NT/M) min{K, M/log(K)})).

This shows that, up to logarithmic factors, the optimal allocation of MT queries over the T rounds is the uniform allocation of M queries per round.

5.2 Extension to Changing Number of Queried Experts

The lower bound also extends to the setting of Section 4.3 where in each round t, the learner is told the number of experts that can be queried, M_t. The analysis is basically the same, with a few modifications to handle the changing number of experts to be queried. In Appendix A, we prove the following theorem:

Theorem 6 For any algorithm working in the setting where the algorithm is told the number M_t of experts that can be queried in each round t, there is a sequence of expert advice and losses for each arm so that the expected regret of the algorithm is at least Ω(√(Σ_{t=1}^T N/f(K, M_t))) = Ω(√(Σ_{t=1}^T (N/M_t) min{K, M_t/log(K)})).

6 Conclusions

In this paper, we presented near-optimal algorithms for the multiarmed bandits with limited expert advice problem, solving the COLT 2013 open problem of Seldin et al. [7]. The upper bound uses a novel grouping idea combined with a standard experts learning algorithm, whereas the lower bound uses an information-theoretic approach and a connection to the classic balls-into-bins problem to get a nearly-tight dependence on the problem parameters. The binning strategy might be useful in other contexts, such as settings where there may be a non-uniform cost associated with the advice of each expert. An interesting open question is to close the sub-logarithmic gap between the upper and lower bounds.

Acknowledgments

The author thanks Elad Hazan, Dean Foster, Rob Schapire, and Yevgeny Seldin for discussions on this problem.

References

[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing, 8(1):121–164, 2012.

[2] Jean-Yves Audibert and Sébastien Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11, 2010.

[3] Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Minimax policies for combinatorial prediction games. Journal of Machine Learning Research - Proceedings Track, 19, 2011.

[4] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48–77, 2002.

[5] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Inf. Comput., 108(2):212–261, 1994.

[6] Martin Raab and Angelika Steger. Balls into Bins - A Simple and Tight Analysis. In RANDOM, 1998.

[7] Yevgeny Seldin, Koby Crammer, and Peter Bartlett. Open Problem: Adversarial Multiarmed Bandits with Limited Advice. In COLT, 2013.

[8] Yevgeny Seldin, Peter L. Bartlett, Koby Crammer, and Yasin Abbasi-Yadkori. Prediction with limited advice and multiarmed bandits with paid observations. In ICML, 2014.

A Proofs of Extensions to Lower Bounds

In this section, we provide the missing proofs for the extensions to the lower bounds on the regret.

A.1 Global Limit on Queries: Proof of Theorem 5

First, note that since f(K,M) = O(max{log(K), M/K}), we have that f(K,M) ≤ g(K,M) := c(log(K) + M/K) for some constant c. Note that g is linear in its second argument (as opposed to f), so it is easier to manipulate. We use the exact same construction of expert advice and losses as in the proof of Theorem 4, with the choice of ε = (1/16)√(N/(g(K,M)T)). The only change that needs to be made to the proof is in inequality (7), which now becomes

    E_0[N_T^h] ≤ Σ_{t=1}^T E_0[f(K, |S_t|)]/N ≤ Σ_{t=1}^T E_0[g(K, |S_t|)]/N = E_0[g(K, Σ_{t=1}^T |S_t|) + (T−1)c log(K)]/N·? ≤ g(K,M)T/N,

where the linearity of g in its second argument and the global budget Σ_{t=1}^T |S_t| ≤ MT are used. Since, as proved in the paragraph after inequality (7), we have f(K,m) ≤ 3m/4 for all m ≥ 2, we also have that

    E_0[N_T^h] ≤ Σ_{t=1}^T E_0[f(K, |S_t|)]/N ≤ 3T/4.

Using these two bounds, we can now derive the following analogue of inequality (6):

    (1/N) Σ_{h∈H} E_h[N_T^h] ≤ 3T/4 + 2εT√(g(K,M)T/N).

The rest of the analysis goes through just as before, and yields a regret lower bound of (1/256)√(NT/g(K,M)) = Ω(√((NT/M) min{K, M/log(K)})).

A.2 Changing Number of Queried Experts: Proof of Theorem 6

We use essentially the same construction as in the proof of Theorem 4, but with one important twist: the parameter ε_t controlling the loss of the best expert changes in each round. Specifically, in round t, we set

    ε_t = (1/16) · (N/f(K,M_t)) / √(Σ_{τ=1}^T N/f(K,M_τ)),

and the loss of the arm a_t^h chosen by expert h is chosen from B(1/2 − ε_t); the losses of all other arms a ≠ a_t^h are chosen from B(1/2), as before.

We now turn to the analysis. First, we note that since Lemma 4 gives a lower bound on the expected regret in specific rounds, summing over all rounds, we conclude that the expected regret of the algorithm is at least

    Σ_{t=1}^T (ε_t/2)(1 − I[h ∈ S_t, A_t = a_t^h]).

Thus, define the random variable

    G_T^h = Σ_{t=1}^T (ε_t/2) I[h ∈ S_t, A_t = a_t^h].

To get a lower bound on the regret, we need to upper bound this random variable. Next, because ε_t changes in different rounds, we need to consider slightly different random variables:

    L̃_T^h = Σ_{t=1}^T ε_t² I[h ∈ S_t]   and   Ñ_T^h = Σ_{t=1}^T ε_t² I[h ∈ S_t, A_t = a_t^h].

With this definition, the statement of Lemma 5 extends easily to the following:

    KL(P_0(H_T)‖P_h(H_T)) ≤ 6 E_0[Ñ_T^h] + (4/K²) E_0[L̃_T^h].

Define U = Σ_{t=1}^T ε_t/2. Continuing the analysis as in the proof of Theorem 4, we use Pinsker's inequality, the fact that G_T^h ∈ [0, U], and Jensen's inequality applied to the concave square root function to conclude that

    (1/N) Σ_{h∈H} E_h[G_T^h] ≤ (1/N) Σ_{h∈H} E_0[G_T^h] + U√(3 (1/N) Σ_{h∈H} E_0[Ñ_T^h] + (2/K²) (1/N) Σ_{h∈H} E_0[L̃_T^h])
      ≤ Σ_{t=1}^T (ε_t/2) f(K,M_t)/N + U√(3 Σ_{t=1}^T ε_t² f(K,M_t)/N + (2/K²) Σ_{t=1}^T ε_t² M_t/N)    (13)
      ≤ 3U/4 + 2U√(Σ_{t=1}^T ε_t² f(K,M_t)/N).    (14)

Inequality (13) follows from Lemma 6, using the following bounds:

    E_0[L̃_T^h] = Σ_{t=1}^T ε_t² P_0[h ∈ S_t] = Σ_{t=1}^T ε_t² E_0[|S_t|]/N ≤ Σ_{t=1}^T ε_t² M_t/N,
    E_0[Ñ_T^h] = Σ_{t=1}^T ε_t² P_0[h ∈ S_t, A_t = a_t^h] ≤ Σ_{t=1}^T ε_t² E_0[f(K, |S_t|)]/N ≤ Σ_{t=1}^T ε_t² f(K,M_t)/N,
    E_0[G_T^h] = Σ_{t=1}^T (ε_t/2) P_0[h ∈ S_t, A_t = a_t^h] ≤ Σ_{t=1}^T (ε_t/2) E_0[f(K, |S_t|)]/N ≤ Σ_{t=1}^T (ε_t/2) f(K,M_t)/N.

Inequality (14) follows from the bound f(K,M_t) ≥ 2M_t/K² for K ≥ 2, and the bound f(K,M_t) ≤ 3M_t/4 for M_t ≥ 2 together with M_t ≤ N. Finally, taking expectation over the choice of the expert h, the expected regret of the

algorithm is at least

    (1/N) Σ_{h∈H} (U − E_h[G_T^h]) ≥ U/4 − 2U√(Σ_{t=1}^T ε_t² f(K,M_t)/N) ≥ U/8 = (1/256)√(Σ_{t=1}^T N/f(K,M_t)) = Ω(√(Σ_{t=1}^T (N/M_t) min{K, M_t/log(K)})),

using the definition of ε_t.
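To close, here is a compact, self-contained toy simulation of LEXP with the MW forecaster on a synthetic instance. All names, the loss model, and the instance are ours, for illustration only; it is a sketch of the algorithm of Section 3, not the paper's experimental setup (the paper has none):

```python
import math
import random

def simulate_lexp(K=3, N=8, M=4, T=3000, seed=0):
    """Toy simulation of LEXP with the MW forecaster (sketch). Experts
    recommend uniformly random arms, except expert 0, which always
    recommends arm 0; arm 0 has the lowest mean loss, so expert 0 is the
    best expert. Returns the realized regret against expert 0."""
    rng = random.Random(seed)
    groups = [list(range(i, i + M)) for i in range(0, N, M)]
    eta = math.sqrt(math.log(N) / (T * (N / M) * min(K, M)))
    q = [1.0 / N] * N
    alg_loss = best_loss = 0.0
    means = [0.25] + [0.5] * (K - 1)           # arm 0 is the good arm
    for _ in range(T):
        advice = [0] + [rng.randrange(K) for _ in range(N - 1)]
        losses = [1.0 if rng.random() < means[a] else 0.0 for a in range(K)]
        h = rng.choices(range(N), weights=q)[0]
        i = h // M                              # queried group index
        arm = advice[h]                         # played arm A_t
        alg_loss += losses[arm]
        best_loss += losses[advice[0]]
        # importance-weighted estimator, nonzero only in the queried group
        pr = sum(q[e] for e in groups[i] if advice[e] == arm)
        lhat = losses[arm] / pr
        # MW update: experts with estimator 0 keep their weight
        for e in groups[i]:
            if advice[e] == arm:
                q[e] *= math.exp(-eta * lhat)
        z = sum(q)
        q = [w / z for w in q]
    return alg_loss - best_loss
```

The realized regret should be on the order of the Theorem 1 bound 2√(K̃(N/M)T log N) (roughly 400 for these parameters), far below the trivial linear-in-T regret of a non-learning strategy.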

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

Notes on online convex optimization

Notes on online convex optimization Noes on online convex opimizaion Karl Sraos Online convex opimizaion (OCO) is a principled framework for online learning: OnlineConvexOpimizaion Inpu: convex se S, number of seps T For =, 2,..., T : Selec

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

Non-Stochastic Bandit Slate Problems

Non-Stochastic Bandit Slate Problems Non-Sochasic Bandi Slae Problems Sayen Kale Yahoo! Research Sana Clara, CA skale@yahoo-inccom Lev Reyzin Georgia Ins of echnology Alana, GA lreyzin@ccgaechedu Absrac Rober E Schapire Princeon Universiy

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

Online Learning with Partial Feedback. 1 Online Mirror Descent with Estimated Gradient

Online Learning with Partial Feedback. 1 Online Mirror Descent with Estimated Gradient Avance Course in Machine Learning Spring 2010 Online Learning wih Parial Feeback Hanous are joinly prepare by Shie Mannor an Shai Shalev-Shwarz In previous lecures we alke abou he general framework of

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H.

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H. ACE 56 Fall 5 Lecure 8: The Simple Linear Regression Model: R, Reporing he Resuls and Predicion by Professor Sco H. Irwin Required Readings: Griffihs, Hill and Judge. "Explaining Variaion in he Dependen

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin ACE 56 Fall 005 Lecure 4: Simple Linear Regression Model: Specificaion and Esimaion by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Simple Regression: Economic and Saisical Model

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters McGill COMP 765 Sept 14th, 2017 Recall: Bayes filters: Bel(x_t) = η P(z_t | x_t) ∫ P(x_t | u_t, x_{t-1}) Bel(x_{t-1}) dx_{t-1}, where z = observation, u =

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

Topics in Machine Learning Theory

Topics in Machine Learning Theory Topics in Machine Learning Theory The Adversarial Muli-armed Bandi Problem, Inernal Regre, and Correlaed Equilibria Avrim Blum 10/8/14 Plan for oday Online game playing / combining exper advice bu: Wha

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and

More information

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 Comprehensive Examination UCLA Dept. of Economics You have 4 hours to complete the exam. There are three parts to the exam. Answer all parts. Each part has

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Robotics: Methods & Algorithms Today's Topic Quick review on (Linear) Kalman Filter Kalman Filtering for Non-Linear Systems Extended Kalman Filter (EKF)

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits DOI: 0.545/mjis.07.5009 Exponenial Weighed Moving Average (EWMA) Char Under The Assumpion of Moderaeness And Is 3 Conrol Limis KALPESH S TAILOR Assisan Professor, Deparmen of Saisics, M. K. Bhavnagar Universiy,

More information

Online Learning with Queries

Online Learning with Queries Online Learning wih Queries Chao-Kai Chiang Chi-Jen Lu Absrac The online learning problem requires a player o ieraively choose an acion in an unknown and changing environmen. In he sandard seing of his

More information

Comparing Means: t-tests for One Sample & Two Related Samples

Comparing Means: t-tests for One Sample & Two Related Samples Comparing Means: -Tess for One Sample & Two Relaed Samples Using he z-tes: Assumpions -Tess for One Sample & Two Relaed Samples The z-es (of a sample mean agains a populaion mean) is based on he assumpion

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

SELBERG S CENTRAL LIMIT THEOREM ON THE CRITICAL LINE AND THE LERCH ZETA-FUNCTION. II

SELBERG S CENTRAL LIMIT THEOREM ON THE CRITICAL LINE AND THE LERCH ZETA-FUNCTION. II SELBERG S CENRAL LIMI HEOREM ON HE CRIICAL LINE AND HE LERCH ZEA-FUNCION. II ANDRIUS GRIGUIS Deparmen of Mahemaics Informaics Vilnius Universiy, Naugarduko 4 035 Vilnius, Lihuania rius.griguis@mif.vu.l

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Stability and Bifurcation in a Neural Network Model with Two Delays

Stability and Bifurcation in a Neural Network Model with Two Delays Inernaional Mahemaical Forum, Vol. 6, 11, no. 35, 175-1731 Sabiliy and Bifurcaion in a Neural Nework Model wih Two Delays GuangPing Hu and XiaoLing Li School of Mahemaics and Physics, Nanjing Universiy

More information

Games Against Nature

Games Against Nature Advanced Course in Machine Learning Spring 2010 Games Agains Naure Handous are joinly prepared by Shie Mannor and Shai Shalev-Shwarz In he previous lecures we alked abou expers in differen seups and analyzed

More information

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Kriging Models Predicing Arazine Concenraions in Surface Waer Draining Agriculural Waersheds Paul L. Mosquin, Jeremy Aldworh, Wenlin Chen Supplemenal Maerial Number

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

Stochastic models and their distributions

Stochastic models and their distributions Sochasic models and heir disribuions Couning cusomers Suppose ha n cusomers arrive a a grocery a imes, say T 1,, T n, each of which akes any real number in he inerval (, ) equally likely The values T 1,,

More information

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

Online Learning, Regret Minimization, Minimax Optimality, and Correlated Equilibrium

Online Learning, Regret Minimization, Minimax Optimality, and Correlated Equilibrium Algorihm Online Learning, Regre Minimizaion, Minimax Opimaliy, and Correlaed Equilibrium High level Las ime we discussed noion of Nash equilibrium Saic concep: se of prob Disribuions (p,q, ) such ha nobody

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

References are appeared in the last slide. Last update: (1393/08/19)

References are appeared in the last slide. Last update: (1393/08/19) SYSEM IDEIFICAIO Ali Karimpour Associae Professor Ferdowsi Universi of Mashhad References are appeared in he las slide. Las updae: 0..204 393/08/9 Lecure 5 lecure 5 Parameer Esimaion Mehods opics o be

More information

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A

More information

Ensemble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Last week we estimated population size through several methods. One assumption of all these

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

4.1 Other Interpretations of Ridge Regression

4.1 Other Interpretations of Ridge Regression CHAPTER 4 FURTHER RIDGE THEORY 4. Oher Inerpreaions of Ridge Regression In his secion we will presen hree inerpreaions for he use of ridge regression. The firs one is analogous o Hoerl and Kennard reasoning

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

Solutions from Chapter 9.1 and 9.2

Solutions from Chapter 9.1 and 9.2 Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is

More information

Ensemble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

Boosting with Online Binary Learners for the Multiclass Bandit Problem

Boosting with Online Binary Learners for the Multiclass Bandit Problem Shang-Tse Chen School of Compuer Science, Georgia Insiue of Technology, Alana, GA Hsuan-Tien Lin Deparmen of Compuer Science and Informaion Engineering Naional Taiwan Universiy, Taipei, Taiwan Chi-Jen

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED Maximum likelihood estimation is a best-fit statistical method for the estimation of the values of the parameters of a system, based on a set of observations of a random variable

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013 IMPLICI AND INVERSE FUNCION HEOREMS PAUL SCHRIMPF 1 OCOBER 25, 213 UNIVERSIY OF BRIISH COLUMBIA ECONOMICS 526 We have exensively sudied how o solve sysems of linear equaions. We know how o check wheher

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

4 Sequences of measurable functions

4 Sequences of measurable functions 4 Sequences of measurable funcions 1. Le (Ω, A, µ) be a measure space (complee, afer a possible applicaion of he compleion heorem). In his chaper we invesigae relaions beween various (nonequivalen) convergences

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

arxiv:math.fa/ v1 31 Oct 2004

arxiv:math.fa/ v1 31 Oct 2004 A sharp isoperimeric bound for convex bodies Ravi Monenegro arxiv:mah.fa/0408 v 3 Oc 2004 Absrac We consider he problem of lower bounding a generalized Minkowski measure of subses of a convex body wih

More information


Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

W = C exp(∫ 3/(t − 4) dt) = C exp(3 ln(4 − t)) = C(4 − t)³.

Math Rahman Exam Review Solutions (1) Consider the IVP: (t − 4)y″ − 3y′ + 4y = 0; y(3) = 0, y′(3) =. (a) Please determine the longest interval for which the IVP is guaranteed to have a unique solution. Solution: The discontinuities

C_t/P_t = α + β R_t/P_t + u_t. C_t = α P_t + β R_t + v_t. C_t/R_t = α P_t/R_t + β + w_t

Exercise 7 (a) C_t/P_t = α + β R_t/P_t + u_t (b) C_t = α P_t + β R_t + v_t (c) C_t/R_t = α P_t/R_t + β + w_t Assumptions about the disturbances u_t, v_t, w_t: Classical assumptions on the disturbance of one of the equations, e.g. on (b): E(v_t v_s | P,

A Dynamic Model of Economic Fluctuations

A Dynamic Model of Economic Fluctuations CHAPTER 15 A Dynamic Model of Economic Flucuaions Modified for ECON 2204 by Bob Murphy 2016 Worh Publishers, all righs reserved IN THIS CHAPTER, OU WILL LEARN: how o incorporae dynamics ino he AD-AS model

More information

Prediction with Limited Advice and Multiarmed Bandits with Paid Observations

Prediction with Limited Advice and Multiarmed Bandits with Paid Observations Predicion wi Limied Advice and Muliarmed Bandis wi Paid Observaions Yevgeny Seldin Queensland Universiy of ecnology and UC Berkeley Peer Barle UC Berkeley and Queensland Universiy of ecnology Koby Crammer

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

x_1(t), x_2(t), ..., x_n(t) is a basis for the solution space to this system, then the matrix having these solutions as columns,

Math 228- Fri Mar 24 5.6 Matrix exponentials and linear systems: The analogy between first order systems of linear differential equations (Chapter 5) and scalar linear differential equations (Chapter ) is much stronger

A New Perturbative Approach in Nonlinear Singularity Analysis

A New Perturbative Approach in Nonlinear Singularity Analysis Journal of Mahemaics and Saisics 7 (: 49-54, ISSN 549-644 Science Publicaions A New Perurbaive Approach in Nonlinear Singulariy Analysis Ta-Leung Yee Deparmen of Mahemaics and Informaion Technology, The

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :45 PM 8/8/04 Copyrigh 04 Richard T. Woodward. An inroducion o dynamic opimizaion -- Opimal Conrol and Dynamic Programming AGEC 637-04 I. Overview of opimizaion Opimizaion is

More information

Presentation Overview

Presentation Overview Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion

More information

Answers to QUIZ

Answers to QUIZ 18441 Answers o QUIZ 1 18441 1 Le P be he proporion of voers who will voe Yes Suppose he prior probabiliy disribuion of P is given by Pr(P < p) p for 0 < p < 1 You ake a poll by choosing nine voers a random,

More information

Hamilton-Jacobi Equation: Weak Solution We continue the study of the Hamilton-Jacobi equation:

Math 5 7 Fall 9 Lecture Oct. 4, 9 Hamilton-Jacobi Equation: Weak Solution We continue the study of the Hamilton-Jacobi equation: We have shown that u_t + H(Du) = 0 in R^n × (0, ∞); u = g on R^n × {t = 0}. In general we cannot

MATHEMATICS: ERDŐS AND ULAM PROC. N. A. S. of decomposition, properly speaking) contradicts the possibility of defining a countably additive real-valued

ON EQUATIONS WITH SETS AS UNKNOWNS BY PAUL ERDŐS AND S. ULAM DEPARTMENT OF MATHEMATICS, UNIVERSITY OF COLORADO, BOULDER Communicated May 27, 1968 We shall present here a number of results in set theory concerning

Sequential Importance Resampling (SIR) Particle Filter

Sequential Importance Resampling (SIR) Particle Filter Paricle Filers++ Pieer Abbeel UC Berkeley EECS Many slides adaped from Thrun, Burgard and Fox, Probabilisic Roboics 1. Algorihm paricle_filer( S -1, u, z ): 2. Sequenial Imporance Resampling (SIR) Paricle

More information

Lecture 4 Notes (Little s Theorem)

Lecture 4 Notes (Little s Theorem) Lecure 4 Noes (Lile s Theorem) This lecure concerns one of he mos imporan (and simples) heorems in Queuing Theory, Lile s Theorem. More informaion can be found in he course book, Bersekas & Gallagher,

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems Essenial Microeconomics -- 6.5: OPIMAL CONROL Consider he following class of opimizaion problems Max{ U( k, x) + U+ ( k+ ) k+ k F( k, x)}. { x, k+ } = In he language of conrol heory, he vecor k is he vecor

More information

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H. ACE 564 Spring 2006 Lecure 7 Exensions of The Muliple Regression Model: Dumm Independen Variables b Professor Sco H. Irwin Readings: Griffihs, Hill and Judge. "Dumm Variables and Varing Coefficien Models

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Least Squares Lasso solves the following optimization problem, β̂_lasso = arg min_{β ∈ R^{p+1}} (1/N) Σ_{i=1}^N (y_i − β_0 − Σ_{j=1}^p x_{ij} β_j)² + λ Σ_{j=1}^p |β_j| (1.1) for some λ ≥ 0.

Module 2 Fick's laws of diffusion

Module 2 Fick's laws of diffusion Fick's laws of diffusion and thin film solution Adolf Fick (1855) proposed: J ∝ dc/dx, i.e. J = −D (dc/dx), with J (mole/m²s) the flux, D (m²/s) the diffusion coefficient and c (mole/m³) the concentration of ions, atoms

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims Problem Se 5 Graduae Macro II, Spring 2017 The Universiy of Nore Dame Professor Sims Insrucions: You may consul wih oher members of he class, bu please make sure o urn in your own work. Where applicable,

More information

Supplementary Material

Supplementary Material Dynamic Global Games of Regime Change: Learning, Mulipliciy and iming of Aacks Supplemenary Maerial George-Marios Angeleos MI and NBER Chrisian Hellwig UCLA Alessandro Pavan Norhwesern Universiy Ocober

More information

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS LECTURE : GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS We will work wih a coninuous ime reversible Markov chain X on a finie conneced sae space, wih generaor Lf(x = y q x,yf(y. (Recall ha q

More information

Empirical Process Theory

Empirical Process Theory Empirical Process heory 4.384 ime Series Analysis, Fall 27 Reciaion by Paul Schrimpf Supplemenary o lecures given by Anna Mikusheva Ocober 7, 28 Reciaion 7 Empirical Process heory Le x be a real-valued

More information

18 Biological models with discrete time

18 Biological models with discrete time 8 Biological models wih discree ime The mos imporan applicaions, however, may be pedagogical. The elegan body of mahemaical heory peraining o linear sysems (Fourier analysis, orhogonal funcions, and so

More information

INDEPENDENT SETS IN GRAPHS WITH GIVEN MINIMUM DEGREE

INDEPENDENT SETS IN GRAPHS WITH GIVEN MINIMUM DEGREE JAMES ALEXANDER, JONATHAN CUTLER, AND TIM MINK Abstract. The enumeration of independent sets in graphs with various restrictions has been a topic of much interest

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updaes Michael Kearns AT&T Labs mkearns@research.a.com Sainder Singh AT&T Labs baveja@research.a.com Absrac We give he firs rigorous upper bounds on he error

More information