Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem

Size: px
Start display at page:

Download "Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem"

Transcription

1 Near Minimax Opimal Players for he Finie-Time 3-Exper Predicion Problem Yasin Abbasi-Yadkori Adobe Research Peer L. Barle UC Berkeley Vicor Gabillon Queensland Universiy of Technology Absrac We sudy minimax sraegies for he online predicion problem wih exper advice. I has been conjecured ha a simple adversary sraegy, called COMB, is near opimal in his game for any number of expers. Our resuls and new insighs make progress in his direcion by showing ha, up o a small addiive erm, COMB is minimax opimal in he finie-ime hree exper problem. In addiion, we provide for his seing a new near minimax opimal COMB-based learner. Prior o his work, in his problem, learners obaining he opimal muliplicaive consan in heir regre rae were known only when K = or K. We characerize, when K = 3, he regre of he game scaling as 8/(9)T ± log(t ) which gives for he firs ime he opimal consan in he leading ( T ) erm of he regre. Inroducion This paper sudies he online predicion problem wih exper advice. This is a fundamenal problem of machine learning ha has been sudied for decades, going back a leas o he work of Hannan [] (see [4] for a survey). As i sudies predicion under adversarial daa he designed algorihms are known o be robus and are commonly used as building blocks of more complicaed machine learning algorihms wih numerous applicaions. Thus, elucidaing he ye unknown opimal sraegies has he poenial o significanly improve he performance of hese higher level algorihms, in addiion o providing insigh ino a classic predicion problem. The problem is a repeaed wo-player zero-sum game beween an adversary and a learner. A each of he T rounds, he adversary decides he qualiy/gain of K expers advice, while simulaneously he learner decides o follow he advice of one of he expers. The objecive of he adversary is o maximize he regre of he learner, defined as he difference beween he oal gain of he learner and he oal gain of he bes fixed exper. Open Problems and our Main Resuls. Previously his game has been solved asympoically as boh T and K end o : asympoically he upper bound on he performance of he sae-of-hear Muliplicaive Weighs Algorihm (MWA) for he learner maches he opimal muliplicaive consan of he asympoic minimax opimal regre rae (T/) log K [3]. However, for finie K, his asympoic quaniy acually overesimaes he finie-ime value of he game. Moreover, Gravin e al. [] proved a maching lower bound (T/) log K on he regre of he classic version of MWA, addiionally showing ha he opimal learner does no belong an exended MWA family. Already, Cover [5] proved ha he value of he game is of order of T/() when K =, meaning ha he regre of a MWA learner is 47% larger ha he opimal learner in his case. Therefore he quesion of opimaliy remains open for non-asympoic K which are he ypical cases in applicaions. In sudying a relaed seing wih K = 3, where T is sampled from a geomeric disribuion wih parameer δ, Gravin e al. [9] conjecured ha, for any K, a simple adversary sraegy, called he COMB adversary, is asympoically opimal (T, or when δ ), and also excessively compeiive for finie-ime fixed T. The COMB sraegy sors he expers based on heir cumulaive gains and, wih probabiliy one half, assigns gain one o each exper in an odd posiion and gain zero 3s Conference on Neural Informaion Processing Sysems (NIPS 7), Long Beach, CA, USA.

2 o each exper in an even posiion. Wih probabiliy one half, he zeros and ones are swapped. The simpliciy and elegance of his sraegy, combined wih is almos opimal performance makes i very appealing and calls for a more exensive sudy of is properies. Our resuls and new insighs make progress in his direcion by showing ha, for any fixed T and up o small addiive erms, COMB is minimax opimal in he finie-ime hree exper problem. Addiionally and wih similar guaranees, we provide for his seing a new near minimax opimal COMB-based learner. For K = 3, he regre of a MWA learner is 39% larger han our new opimal learner. In his paper we also characerize, when K = 3, he regre of he game as 8/(9)T ± log(t ) which gives for he firs ime he opimal consan in he leading ( T ) erm of he regre. Noe ha he sae-of-he-ar non-asympoic lower bound in [5] on he value of his problem is non informaive as he lower bound for he case of K = 3 is a negaive quaniy. Relaed Works and Challenges. For he case of K = 3, Gravin e al. [9] proved he exac minimax opimaliy of a COMB-relaed adversary in he geomerical seing, i.e. where T is no fixed in advance bu raher sampled from a geomeric disribuion wih parameer δ. However he connecion beween he geomerical seing and he original finie-ime seing is no well undersood, even asympoically (possibly due o he large variance of geomeric disribuions wih small δ). Addressing his issue, in Secion 7 of [8], Gravin e al. formulae he Finie vs Geomeric Regre conjecure which saes ha he value of he game in he geomerical seing, V α, and he value of he game in he finie-ime seing, V T, verify V T = V α=/t. We resolve here he conjecure for K = 3. Analyzing he finie-ime exper problem raises new challenges compared o he geomeric seing. In he geomeric seing, a any ime (round) of he game, he expeced number of remaining rounds before he end of he game is consan (does no depend on he curren ime ). This simplifies he problem o he poin ha, when K = 3, here exiss an exacly minimax opimal adversary ha ignores he ime and he parameer δ. As noed in [9], and noiceable from solving exacly small insances of he game wih a compuer, in he finie-ime case, he exac opimal adversary seems o depend in a complex manner on ime and sae. I is herefore naural o compromise for a simpler adversary ha is opimal up o a small addiive error erm. Acually, based on he observaion of he resriced compuer-based soluions, he addiive error erm of COMB seems o vanish wih larger T. Tighly conrolling he errors made by COMB is a new challenge wih respec o [9], where he soluion o he opimaliy equaions led direcly o he exac opimal adversary. The exisence of such equaions in he geomeric seing crucially relies on he fac ha he value-o-go of a given policy in a given sae does no depend on he curren ime (because geomeric disribuions are memoryless). To conrol he errors in he finie-ime seing, our new approach solves he game by backward inducion showing he approximae greediness of COMB wih respec o iself (read Secion. for an overview of our new proof echniques and heir organizaion). We use a novel exchangeabiliy propery, new connecions o random walks and a close relaion ha we develop beween COMB and a TWIN-COMB sraegy. Addiional connecions wih new relaed opimal sraegies and random walks are used o compue he value of he game (Theorem ). We discuss in Secion 6 how our new echniques have more poenial o exend o an arbirary number of arms, han hose of [9]. Addiionally, we show how he approximae greediness of COMB wih respec o iself is key o proving ha a learner based direcly on he COMB adversary is iself quasi-minimax-opimal. This is he firs work o exend o he approximae case, approaches used o designed exacly opimal players in relaed works. In [] a probabiliy maching learner is proven opimal under he assumpion ha he adversary is limied o a fixed cumulaive loss for he bes exper. In [4] and [], he opimal learner relies on esimaing he value-o-go of he game hrough rollous of he opimal adversary s plays. The resuls in hese papers were limied o games where he opimal adversary was only playing canonical uni vecor while our resul holds for general gain vecors. Noe also ha a probabiliy maching learner is opimal in [9]. Noaion: Le [a : b] = {a, a +,..., b} wih a, b N, a b, and [a] = [ : a]. For a vecor w R n, n N, w = max k [n] w k. A vecor indexed by boh a ime and a specific elemen index k is w,k. An undiscouned Markov Decision Process (MDP) [3, 6] M is a 4-uple S, A, r, p. S is he sae space, A is he se of acions, r : S A R is he reward funcion, and he ransiion model p( s, a) gives he probabiliy disribuion over he nex sae when acion a is aken in sae s. A sae is denoed by s or s if i is aken a ime. An acion is denoed by a or a. [9] also provides an upper-bound ha is subopimal when K = 3 even afer opimizaion of is parameers.

3 The Game We consider a game, composed of T rounds, beween wo players, called a learner and an adversary. A each ime/round he learner chooses an index I [K] from a disribuion p on he K arms. Simulaneously, he adversary assigns a binary gain o each of he arms/expers, possibly a random from a disribuion Ȧ, and we denoe he vecor of hese gains by g {, } K. The adversary and he learner hen observe I and g. For simpliciy we use he noaion g [] = (g s ) s=,...,. The value of one realizaion of such a game is he cumulaive regre defined as R T = g g,i. = A sae s S = (N {}) K is a K-dimensional vecor such ha he k-h elemen is he cumulaive sum of gains deal by he adversary on arm k before he curren ime. Here he sae does no include bu is ypically denoed for a specific ime as s and compued as s = = g. This definiion is moivaed by he fac ha here exis minimax sraegies for boh players ha rely solely on he sae and ime informaion as opposed o he complee hisory of plays, g [] I []. In sae s, he se of leading expers, i.e., hose wih maximum cumulaive gain, is X(s) = {k [K] : s k = s }. We use o denoe he (possibly non-saionary) sraegy/policy used by he adversary, i.e., for any inpu sae s and ime i oupus he gain disribuion (s, ) played by he adversary a ime in sae s. Similarly we use p o denoe he sraegy of he learner. As he sae depends only on he adversary plays, we can sample a sae s a ime from. Given an adversary and a learner p, he expeced regre of he game, V T p,, is V T p, = E g[t ],I [T ] p [R T ]. The learner ries o minimize he expeced regre while he adversary ries o maximize i. The value of he game is he minimax value V T defined by V T = min max p = V T p, = max min V T p,. p In his work, we are ineresed in he search for opimal minimax sraegies, which are adversary sraegies such ha V T = min p V T p, and learner sraegies p, such ha V T = max V T p,.. Summary of our Approach o Obain he Near Greediness of COMB Mos of our maerial is new. Firs, Secion 3 recalls ha Gravin e al. [9] have shown ha he search for he opimal adversary can be resriced o he finie family of balanced sraegies (defined in he nex secion). When K = 3, he acion space of a balanced adversary is limied o seven sochasic acions (gain disribuions), denoed by Ḃ3 = {Ẇ, Ċ, V,,, {}, {3}} (see Secion 5. for heir descripion). The COMB adversary repeas he gain disribuion Ċ a each ime and in any sae. In Secion 4 we provide an explici formulaion of he problem as finding inside an MDP wih a specific reward funcion. Ineresingly, we observe ha anoher adversary, which we call TWIN- COMB and denoe by W, which repeas he disribuion Ẇ, has he same value as C (Secion 5.). To conrol he errors made by COMB, he proof uses a novel and inriguing exchangeabiliy propery (Secion 5.). This exchangeabiliy propery holds hanks o he surprising role played by he TWIN- COMB sraegy. For any disribuions Ȧ Ḃ3 here exiss a disribuion Ḋ, mixure of Ċ and Ẇ, such ha for almos all saes, playing Ȧ and hen Ḋ is he same as playing Ẇ and hen Ȧ in erms of he expeced reward and he probabiliies over he nex saes afer hese wo seps. Using Bellman operaors, his can be concisely wrien as: for any (value) funcion f : S R, in (almos) any sae s, we have ha [TȦ[TḊf]](s) = [TẆ[TȦf]](s). We solve he MDP wih a backward inducion in ime from = T. We show ha playing Ċ a ime is almos greedy wih respec o playing C in laer rounds >. The greedy error is defined as he difference of expeced reward beween always playing C and playing he bes (greedy) firs acion before playing COMB. Bounding how hese errors accumulae hrough he rounds relaes he value of COMB o he value of (Lemma 6). To illusrae he main ideas, le us firs make wo simplifying (bu unrealisic) assumpions a ime : COMB has been proven greedy w.r.. iself in rounds > and he exchangeabiliy holds in all saes. Then we would argue a ime ha by he exchangeabiliy propery, insead of opimizing he greedy 3

4 acion w.r.. COMB as maxȧ Ḃ 3 ȦĊ... Ċ, we can sudy he opimizer of maxȧ Ḃ 3 ẆȦĊ... Ċ. Then we use he inducion propery o conclude ha Ċ is he soluion of he previous opimizaion problem. Unforunaely, he exchangeabiliy propery does no hold in one specific sae denoed by s α. Wha saves us hough is ha we can direcly compue he error of greedificaion of any gain disribuion wih respec o COMB in s α and show ha i diminishes exponenially fas as T, he number of rounds remaining, increases (Lemma 7). This helps us o conrol how he errors accumulae during he inducion. From one given sae s s α a ime, firs, we use he exchangeabiliy propery once when rying o assess he qualiy of an acion Ȧ as a greedy acion w.r.. COMB. This leads us o consider he qualiy of playing Ȧ in possibly several new saes {s + } a ime + reached following TWIN-COMB in s. We use our exchangeabiliy propery repeaedly, saring from he sae s unil a subsequen sae reaches s α, say a ime α, where we can subsiue he exponenially decreasing greedy error compued a his ime α in s α. Here he subsequen saes are he saes reached afer having played TWIN-COMB repeiively saring from he sae s. If s α is never reached we use he fac ha COMB is an opimal acion everywhere else in he las round. The problem is hen o deermine a which ime α, saring from any sae a ime and following a TWIN-COMB sraegy, we hi s α for he firs ime. This is ranslaed ino a classical gambler s ruin problem, which concerns he hiing imes of a simple random walk (Secion 5.3). Similarly he value of he game is compued using he sudy of he expeced number of equalizaions of a simple random walk (Theorem 5.). 3 Solving for he Adversary Direcly In his secion, we recall he resuls from [9] ha, for arbirary K, permi us o direcly search for he minimax opimal adversary in he resriced se of balanced adversaries while ignoring he learner. Definiion. A gain disribuion Ȧ is balanced if here exiss a consan cȧ, he mean gain of Ȧ, such ha k [K], cȧ = E g Ȧ [g k ]. A balanced adversary uses exclusively balanced gain disribuions. Lemma (Claim 5 in [9]). There exiss a minimax opimal balanced adversary. Use B o denoe he se of all balanced sraegies and Ḃ o denoe he se of all balanced gain disribuions. Ineresingly, as demonsraed in [9], a balanced adversary inflics he same regre on every learner: If B, hen VT R : p, V T p, = VT. (See Lemma ) Therefore, given an adversary sraegy, we can define he value-o-go V (s) associaed wih from ime in sae s, V [ ] (s) = E s T + st E c(s,), s+ P (. s, (s, ), s = s). + s = Anoher reducion comes from he fac ha he se of balanced gain disribuions can be seen as a convex combinaion of a finie se of balanced disribuions [9, Claim and 3]. We call his limied se he aomic gain disribuions. Therefore he search for can be limied o his se. The se of convex combinaions of he m disribuions Ȧ,... Ȧ m is denoed by (Ȧ,... Ȧ m ). 4 Reformulaion as a Markovian Decision Problem In his secion we formulae, for arbirary K, he maximizaion problem over balanced adversaries as an undiscouned MDP problem S, A, r, p. The sae space S was defined in Secion and he acion space is he se of aomic balanced disribuions as discussed in Secion 3. The ransiion model is defined by p(. s, Ḋ), which is a probabiliy disribuion over saes given he curren sae s and a balanced disribuion over gains Ḋ. In his model, he ransiion dynamics are deerminisic and enirely conrolled by he adversary s acion choices. However, he adversary is forced o choose sochasic acions (balanced gain disribuions). The maximizaion problem can herefore also be hough of as designing a balanced random walk on saes so as o maximize a sum of rewards (ha are ye o be defined). Firs, we define PȦ he ransiion probabiliy operaor wih respec o a gain disribuion Ȧ. Given funcion f : S R, PȦ reurns [PȦf](s) = E[f(s ) s p(. s, Ȧ)] = E [f(s + g)]. g s,ȧ g is sampled in s according o Ȧ. Given Ȧ in s, he per-sep regre is denoed by rȧ(s) and defined as rȧ(s) = E s s,ȧ s s cȧ. 4

5 Given an adversary sraegy, saring in s a ime, he cumulaive per-sep regre is V (s) = T = E [ r (,) (s ) s + p(. s, (s, ), s = s) ]. The acion-value funcion of a (s, Ḋ) and is he expeced sum of rewards received by saring from s, aking acion Ḋ, and hen following : Q (s, Ḋ) = E [ T = r Ȧ (s ) Ȧ = Ḋ, s + p( s, Ȧ ), Ȧ + = (s +, + )]. The Bellman operaor of Ȧ, TȦ, is [TȦf](s) = rȧ(s) + [PȦf](s). wih [T (s,) V + ](s) = V (s). This per-sep regre, rȧ(s), depends on s and Ȧ and no on he ime sep. Removing he ime from he picure permis a simplified view of he problem ha leads o a naural formulaion of he exchangeabiliy propery ha is independen of he ime. Crucially, his decomposiion of he regre ino per-sep regres is such ha maximizing V (s) over adversaries is equivalen, for all ime and s, o maximizing over adversaries he original value of he game, he regre V (s) (Lemma ). Lemma. For any adversary sraegy and any sae s and ime, V (s) = V (s) + s. The proof of Lemma is in Secion 8. In he following, our focus will be on maximizing V (s) in any sae s. We now show some basic properies of he per-sep regre ha holds for an arbirary number of expers K and discuss heir implicaions. The proofs are in Secion 9. Lemma 3. Le Ȧ Ḃ, for all s,, we have r Ȧ(s). Furhermore if X(s) =, rȧ(s) =. Lemma 3 shows ha a sae s in which he reward is no zero conains a leas wo equal leading expers, X(s) >. Therefore he goal of maximizing he reward can be rephrased ino finding a policy ha visis he saes wih X(s) > as ofen as possible, while sill aking ino accoun ha he per-sep reward increases wih X(s). The se of saes wih X(s) > is called he reward wall. Lemma 4. In any sae s, wih X(s) =, for any balanced gain disribuion Ḋ such ha wih probabiliy one exacly one of he leading exper receives a gain of, rḋ(s) = maxȧ Ḃ r Ȧ(s). 5 The Case of K = 3 5. Noaions in he 3-Expers Case, he COMB and he TWIN-COMB Adversaries Firs we define he sae space in he 3-exper case. The expers are sored wih respec o heir cumulaive gains and are named in decreasing order, he leading exper, he middle exper and he lagging exper. As menioned in [9], in our search for he minimax opimal adversary, i is sufficien for any K o describe our sae only using d ij ha denoe he difference beween he cumulaive gains of consecuive sored expers i and j = i +. Here, i denoes he exper wih ih larges cumulaive gains, and hence d ij for all i < j. Therefore one noaion for a sae, ha will be used hroughou his secion, is s = (x, y) = (d, d 3 ). We disinguish four ypes of saes C, C, C 3, C 4 as deailed below in Figure. In he same figure, in he cener, he saes are represened on a d-grid. C 4 conains only he sae denoed s α = (, ). s C, d >, d 3 > s C, d =, d 3 > s C 3, d >, d 3 = s C 4, d =, d 3 = Reward Wall d d Aomic Ȧ Symbol cȧ {}{3} Ẇ / {}{3} Ċ / {3}{} V / {}{}{3} /3 {}{3}{3} /3 Figure : 4 ypes of saes (lef), heir locaion on he d grid of saes (cener) and 5 aomic Ȧ (righ) Concerning he acion space, he gain disribuions use brackes. The group of arms in he same bracke receive gains ogeher and each group receive gains wih equal probabiliy. For insance, {}{}{3} exclusively deals a gain o exper (leading exper) wih probabiliy /3, exper (middle exper) wih probabiliy /3, and exper 3 (lagging exper) wih probabiliy /3, whereas {}{3} means dealing a gain o exper alone wih probabiliy / and expers and 3 ogeher wih probabiliy /. As discussed in Secion 3, we are searching for a using mixures of aomic balanced disribuions. When K = 3 here are seven aomic disribuions, denoed by Ḃ3 = { V,,, Ċ, Ẇ, {}, {3}} and described in Figure (righ). Moreover, in Figure, we repor in deail in a able (lef) and 5

6 Disribuion of nex sae s s rċ(s) p( s, Ċ) d wih s = (x, y) C P (s = (x, y+)) = P (s = (x+, y )) = C / P (s = (x +, y)) = P (s = (x +, y )) = C 3 P (s = (x, y + )) = P (s = (x, y + )) = C 4 / P (s = (x, y + )) = P (s = (x +, y)) = 3 4 Figure : The per-sep regre and ransiion probabiliies of he gain disribuion Ċ d 3 / / an illusraion (righ) on he -D sae grid he properies of he COMB gain disribuion Ċ. The remaining aomic disribuions are similarly repored in he appendix in Figures 5 o 8. In he case of hree expers, he COMB disribuion is simply playing {}{3} in any sae. We use Ẇ o denoe he sraegy ha plays {}{3} in any sae and refer o i as he TWIN-COMB sraegy. The COMB and TWIN-COMB sraegies (as opposed o he disribuions) repea heir respecive gain disribuions in any sae and any ime. They are respecively denoed C, W. The Lemma 5 shows ha he COMB sraegy C, he TWIN-COMB sraegy W and herefore any mixure of boh, have he same expeced cumulaive per-sep regre. The proof is repored o Secion. Lemma 5. For all saes s a ime, we have V C 5. The Exchangeabiliy Propery (s) = V W Lemma 6. Le Ȧ Ḃ3, here exiss Ḋ (Ċ, Ẇ) such ha for any s s α, and for any f : S R, (s). [TȦ[TḊf]](s) = [TẆ[TȦf]](s). Proof. If Ȧ = Ẇ, Ȧ = {} or Ȧ = {3}, use Ḋ = Ẇ. If Ȧ = Ċ, use Lemma and. Case. Ȧ = V: V is equal o Ċ in C3 C 4 and if s p(. s, Ẇ) wih s C 3 hen s C 3 C 4. So when s C 3 we reuse he case Ȧ = Ċ above. When s C C, we consider wo cases. Case.. s (, ): We choose Ḋ = Ẇ which is {}{3}. If s p(. s, V) wih s C hen s C. Similarly, if s p(. s, V) wih s C hen s C C 3. Moreover Ḋ modifies similarly he coordinaes (d, d 3 ) of s C and s C 3. Therefore he effec in erms of ransiion probabiliy and reward of Ḋ is he same wheher i is done before or afer he acions chosen by V. If s p(. s, Ḋ) wih s C C hen s C C. Moreover V modifies similarly he coordinaes (d, d 3 ) of s C and s C. Therefore he effec in erms of he ransiion probabiliy of V is he same wheher i is done before or afer he acion Ḋ. In erms of reward, noice ha in he saes s C C, V has per-sep regre and using V does no make s leave or ener he reward wall. Case. s = (, ): We can chose Ḋ = Ẇ. One can check from he ables in Figures 7 and 8 ha exchangebiliy holds. Addiionally we provide an illusraion of he exchangeabiliy equaliy in he d-grid in Figure. The saring sae s = (, ), is graphically represened by. We show on he grid he effec of he gain disribuion V (in dashed red) followed (lef picure) or preceded (righ picure) by he gain disribuion Ḋ (in plain blue). The illusraion shows ha V Ḋ and Ḋ V lead o he same final saes ( ) wih equal probabiliies. The rewards are displayed on op of he picures. Their color corresponds o he acions, he probabiliies are in ialic, and he rewards are in roman. Case & 3. Ȧ = & Ȧ = : The proof is similar and is repored in Secion of he appendix. 6

7 5.3 Approximae Greediness of COMB, Minimax Players and Regre The greedy error of he gain disribuion Ḋ in sae s a ime is ɛḋ s, = max Ȧ Ḃ3 Q C (s, Ȧ) Q C (s, Ḋ). Le ɛḋ = max s S ɛḋ s, denoe he maximum greedy error of he gain disribuion Ḋ a ime. The COMB greedy error in s α is conrolled by he following lemma proved in Secion 3.. Missing proofs from his secion are in he appendix in Secion 3.. ( Lemma 7. For any [T ] and gain disribuion Ḋ {Ẇ, Ċ, V, }, ɛḋ s α, ) T. 6 The following proposiion shows how we can index he saes d 3 in he d-grid as a one dimensional line over which he TWIN COMB sraegy behaves very similarly o a simple random walk. Figure 3 (op) illusraes his random walk on he d-grid and he indexing scheme (he yellow sickers). Proposiion. Index a sae s = (x, y) by i s = x + y irrespecive of he ime. Then for any sae s s α, and s p( s, Ẇ) we have ha P (i s = i s ) = P (i s = i s +) =. Consider a random walk ha sars from sae s = s and is generaed by he TWIN-COMB sraegy, s + p(. s, Ẇ). Define he random variable T α,s = min{ N {} : s = s α }. This random variable is he number of seps of he random walk before hiing s α for he firs ime. Then, le P α (s, ) be he probabiliy ha s α is reached afer seps: P α (s, ) = P (T α,s = ). Lemma 8 conrols he COMB greedy error in s in relaion o P α (s, ). Lemma 9 derives a sae-independen upper-bound for P α (s, ). Lemma 8. For any ime [T ] and sae s, ɛċ s, P α (s, ) 6 = ( ) T d d d Figure 3: Numbering TWIN-COMB (op) & G random walks (boom) Proof. If s = s α, his is a direc applicaion of Lemma 7 as P α (s α, ) = for >. When s s α, he following proof is by inducion. Iniializaion: Le = T. A he las round only he las per-sep regre maers (for all saes s, Q C (s, Ḋ) = rḋ(s)). As s s α, s is such ha X(s) hen rḋ(s) = maxȧ Ḃ r Ȧ(s) because of Lemma 4 and Lemma 3. Therefore he saemen holds. Inducion: Le < T. We assume he saemen is rue a ime +. We disinguish wo cases. For all gain disribuions Ḋ Ḃ3, Q C (s, Ḋ) = [TḊ[TĖ V C + ]](s) = [TẆ[TḊ V C + ]](s) = [T Q C Ẇ (c) [TẆ max C Q + (., Ȧ)](s) Ȧ Ḃ3 (d) max C [TẆ Q + (., Ȧ)](s) Ȧ Ḃ3 = max Ȧ Ḃ3 (e) = max Ȧ Ḃ3 Q C Q C (s, Ȧ) (s, Ȧ) =+ = 6. + (., Ḋ)](s) [PẆP α (., ) ( ) T ](s) 6 =+ =+ 6 ( 6 ( ) T [PẆP α (., )](s) ) T [PẆP α (., )](s) ( ) T P α (s, ) 7

8 where in Ė is any disribuion in (Ċ, Ẇ) and his sep holds because of Lemma 5, holds because of he exchangeabiliy propery of Lemma 6, (c) is rue by inducion and monooniciy of Bellman operaor, in (d) he max operaors change from being specific o any nex sae s a ime + o being jus one max operaor ha has o choose a single opimal gain disribuion in sae s a ime, (e) holds by definiion as for any, (here he las equaliy holds because s s α ) [PẆP α (., )](s) = E s p(. s,ẇ)[p α (s, )] = E s p(. s,ẇ)[p (T α,s = )] = P α (s, + ). Lemma 9. For > and any s, P α (s, ). Proof. Using he connecion beween he TWIN-COMB sraegy and a simple random walk in Proposiion, a formula can be found for P α (s, ) from he classical Gambler s ruin problem, where one wans o know he probabiliy ha he gambler reaches ruin (here sae s α ) a any ime given an iniial capial in dollars (here i s as defined in Proposiion ). The gambler has an equal probabiliy o win or lose one dollar a each round and has no upper bound on( his capial during he game. Using [7] (Chaper XIV, Equaion 4.4) or [8] we have P α (s, ) = is ) +is, where he binomial coefficien is if and i s are no of he same pariy. The echnical Lemma 4 complees he proof. We now sae our main resul, connecing he value of he COMB adversary o he value of he game. Theorem. Le K = 3, he regre of COMB sraegies agains any learner p, min p V T p, C, saisfies min p V T p, C V T log (T + ). We also characerize he minimax regre of he game. Theorem. Le K = 3, for even T, we have ha ( ) T + T/ + V T T/ + 3 T log (T + ), ( ) T + T/ + 8T wih T/ + 3 T 9. In Figure 4 we inroduce a COMB-based learner ha is denoed by p C. Here a sae is represened by a vecor of 3 inegers. The hree arms/expers are ordered as () () (3), breaking ies arbirarily. We connec he value of he COMB-based learner o he value of he game. p,() (s) = V C + (s+e ()) V C (s) Theorem 3. Le K = 3, he regre of COMB-based p,() (s) = V C + (s+e ()) V C (s) learner agains any adversary, max V T p C,, saisfies p,(3) (s) = p,() (s) p,() (s) max V T p C, V T + 36 log (T + ). Figure 4: A COMB learner, p C Similarly o [] and [4], his sraegy can be efficienly compued using rollous/simulaions from he COMB adversary in order o esimae he value V C (s) of C in s a ime. 6 Discussion and Fuure Work The main objecive is o generalize our new proof echniques o higher dimensions. In our case, he MDP formulaion and all he resuls in Secion 4 already holds for general K. Ineresingly, Lemma 3 and 4 show ha he COMB disribuion is he balanced disribuion wih highes per-sep regre in all he saes s such ha X(s), for arbirary K. Then assuming an ideal exchangeabiliy propery ha gives maxȧ Ḃ ȦĊ... Ċ = maxȧ Ḃ ĊĊ... ĊȦ, a disribuion would be greedy w.r. he COMB sraegy a an early round of he game if i maximizes he per-sep regre a he las round of he game. The COMB policy specifically ends o visi almos exclusively saes X(s), saes where COMB iself is he maximizer of he per-sep regre (Lemma 3). This would give ha COMB is greedy w.r.. iself and herefore opimal. To obain his resul for larger K, we will need o exend he exchangeabiliy propery o higher K and herefore undersand how he COMB and TWIN-COMB families exend o higher dimensions. One could also borrow ideas from he link wih pde approaches made in [6]. 8

9 Acknowledgemens We graefully acknowledge he suppor of he NSF hrough gran IIS-6936 and of he Ausralian Research Council hrough an Ausralian Laureae Fellowship (FL8) and hrough he Ausralian Research Council Cenre of Excellence for Mahemaical and Saisical Froniers (ACEMS). We would like o hank Nae Eldredge for poining us o he resuls in [8] and Wouer Koolen for poining us a [9]! References [] Jacob Abernehy and Manfred K. Warmuh. Repeaed games agains budgeed adversaries. In Advances in Neural Informaion Processing Sysems (NIPS), pages 9,. [] Jacob Abernehy, Manfred K. Warmuh, and Joel Yellin. Opimal sraegies from random walks. In s Annual Conference on Learning Theory (COLT), pages , 8. [3] Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Rober E. Schapire, and Manfred K. Warmuh. How o use exper advice. Journal of he ACM (JACM), 44(3):47 485, 997. [4] Nicolò Cesa-Bianchi and Gábor Lugosi. Predicion, learning, and games. Cambridge universiy press, 6. [5] Thomas M. Cover. Behavior of sequenial predicors of binary sequences. In 4h Prague Conference on Informaion Theory, Saisical Decision Funcions, Random Processes, pages 63 7, 965. [6] Nadeja Drenska. A pde approach o mixed sraegies predicion wih exper advice. hp:// (Exended absrac). [7] Willliam Feller. An Inroducion o Probabiliy Theory and is Applicaions, volume. John Wiley & Sons, 8. [8] Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Towards opimal algorihms for predicion wih exper advice. In arxiv preprin arxiv:63.498, 4. [9] Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Towards opimal algorihms for predicion wih exper advice. In Proceedings of he Tweny-Sevenh Annual ACM-SIAM Symposium on Discree Algorihms (SODA), pages , 6. [] Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Tigh Lower Bounds for Muliplicaive Weighs Algorihmic Families. In 44h Inernaional Colloquium on Auomaa, Languages, and Programming (ICALP), volume 8, pages 48: 48:4, 7. [] Charles Miller Grinsead and James Laurie Snell. Inroducion o probabiliy. American Mahemaical Soc.,. [] James Hannan. Approximaion o bayes risk in repeaed play. Conribuions o he Theory of Games, 3:97 39, 957. [3] Ronald A. Howard. Dynamic Programming and Markov Processes. The MIT Press, Cambridge, MA, 96. [4] Haipeng Luo and Rober E. Schapire. Towards minimax online learning wih unknown ime horizon. In Proceedings of The 3s Inernaional Conference on Machine Learning (ICML), pages 6 34, 4. [5] Francesco Orabona and Dávid Pál. Opimal non-asympoic lower bound on he minimax regre of learning wih exper advice. arxiv preprin arxiv:5.76, 5. [6] Marin L. Puerman. Markov Decision Processes. Wiley, New York, 994. [7] Panelimon Sanica. Good lower and upper bounds on binomial coefficiens. Journal of Inequaliies in Pure and Applied Mahemaics, (3):3,. 9

10 [8] Remco van der Hofsad and Michael Keane. An elemenary proof of he hiing ime heorem. The American Mahemaical Monhly, 5(8): , 8. [9] Vladimir Vovk. A game of predicion wih exper advice. Journal of Compuer and Sysem Sciences (JCSS), 56():53 73, 998.

11 We use e,... e K {, } K o denoe he K canonical basis uni vecors. 7 A Shor Proof on he Regre of Balanced Adversaries Lemma. A balanced adversary inflics he same regre on any learner: If B, V T R : p, V T p, = V T. Proof. Indeed he expeced regre for a balanced adversary, E[R T ] agains any learner, can be wrien as [ ] E [R T ] = E g e I g () p, g,i p, = = E s T + s T + = E s T + s T + = E s = = E (g ) I () g,i, p s E c (s,) (3) s where is because is balanced which means ha E g s [(g ) k ] = c (s,) for each k. 8 Equivalence Beween Maximizing he Value and Maximizing he Cumulaive Per-Sep Regre Proof of Lemma. In he following he expecaion over saes are all aken wih respec o he sraegy. V (s) + s = = = E r(s ) + s (4) s s, = E s s = ( = [ ] E s + s c (s,) + s (5) s + s E s + E s s + s s s = E s T + s s T + s = E s T + s s T + = = ) E c (s,) + s (6) s s = E c (s,) + s (7) s s E c (s,) (8) s s = V (s) (9) 9 Proofs of Basic Properies of he Per-Sep Regre Firs we noice ha he per-sep regre can be wrien as rȧ(s) = PȦ ( k : k X(s) and g k = ) cȧ. () where he noaion PȦ denoes he fac ha he gain vecor g is sampled from Ȧ. Proof of Lemma 3. Sep ) rȧ(s) : We have E s s s s,ȧ

12 s r (s) Disribuion of wih s = (x, y) nex sae s p( s, ) C P (s = (x+, y)) = P (s = (x, y )) = P (s = (x, y + )) = /3 C /3 P (s = (x+, y)) = /3, P (s = (x, y )) = /3 C 3 P (s = (x+, y)) = /3, P (s = (x, y +)) = /3 C 4 /3 P (s = (x +, y)) = /3 /3 /3 /3 3 /3 /3 /3 /3 /3 4 Figure 5: The per-sep regre and ransiion probabiliies of he gain disribuion by definiion as he maximum cumulaive gain canno increase by more han in one round. Moreover c s as he adversary only deals posiive gains. Therefore rȧ(s) = E s s,ȧ s s c s. Sep ) rȧ(s): We are following he argumen in [9]. We wrie PȦ ( k : k X(s) and g k = ) PȦ (g k = ) = E g Ȧ [g k s ] (c) = cȧ, where in k s is any arm in X(s), holds because as g {, } K, for all k [K], E g s,ȧ [g k ] = PȦ(g k = ), and (c) holds because, as Ȧ is balanced, for all k, cȧ = E g s [g k ]. Therefore rȧ(s) = PȦ ( k : k X(s) and g k = ) cȧ. Sep 3) rȧ(s) = if X(s) = : This resul can be proven by following he same seps as in he previous case bu now he inequaliy urns ino an equaliy as X(s) =. Proof of Lemma 4. We have exacly wo (leading) arms ha have equal maximal cumulaive gain a ime. The adversary is designing a balanced gain disribuion on he K arms. Le p denoe he probabiliy ha boh leading arms are allocaed a gain of and le p, p, p be similarly defined. Therefore p + p + p + p =. Also, he balanced propery forces ha he expeced gains of boh leading expers are equal: cȧ = p + p = p + p, which gives p = p. A balanced gain disribuion Ȧ possesses herefore he following per-sep regre a sae s (see Equaion ): rȧ(s) = PȦ ( k : k X(s) and g k = ) cȧ = p + p + p (p + p ) = p = p. Finding he balanced gain disribuion wih maximal per-sep regre, arg maxȧ Ḃ r Ȧ(s), means maximizing p = p subjec o he consrains p + p + p + p = which is also p + p + p =. This is solved by having p = and p = p =, which is a valid balanced disribuion ha saisfies PḊ( k : k X(s) and g k = = ) =. Gain Disribuions Illusraions In his secion we repor he dynamics of he MDP for all he balanced gain disribuions in Ḃ3 and all he saes. We also illusrae hem on he d-grid of saes as inroduced in Figure. We firs repor he dynamics of he MDP grouped by gain disribuion (Secion.) and hen grouped by sae class (Secion.).. Grouped by Gain Disribuions We deail in ables and illusrae on he d sae grid, he properies of he aomic gain disribuions V,,, Ẇ in Figures 5 o 8. Noe ha he able and he illusraion for Ċ is in Figure of he main paper.. Grouped by Sae Class In Figure 9, we illusrae for each sae class he effec of each acions in erms of he nex saed reached in he d-grid. For insance if we look a he sae in class C 4, we can see ha he acions, and 3 leads us o he sae (, ).

13 s r (s) Disribuion of wih s = (x, y) nex sae s p( s, ) C P (s = (x +, y )) = P (s = (x, y + )) = P (s = (x, y)) = /3 C /3 P (s = (x+, y )) = /3, P (s = (x, y +)) = /3 C 3 P (s = (x, y)) = /3, P (s = (x, y+)) = /3 C 4 /3 P (s = (x, y + )) = /3 /3 3 /3 /3 /3 /3 4 /3 /3 /3 Figure 6: The per-sep regre and ransiion probabiliies of he gain disribuion s rẇ(s) Disribuion of nex sae s p( s, Ẇ) wih s = (x, y) C P (s = (x +, y)) = P (s = (x, y)) = C / P (s = (x +, y)) = P (s = (x +, y )) = C 3 P (s = (x +, y)) = P (s = (x, y)) = C 4 / P (s = (x, y + )) = P (s = (x +, y)) = 4 / / 3 Figure 7: The per-sep regre and ransiion probabiliies of he gain disribuion Ẇ s r V (s) Disribuion of nex sae s p( s, V) wih s = (x, y) C P (s = (x, y + )) =, P (s = (x, y )) = C / P (s = (x, y + )) =, P (s = (x, y )) = C 3 P (s = (x, y+)) =, P (s = (x, y+)) = C 4 / P (s = (x, y + )) =, P (s = (x +, y)) = 3 4 / Figure 8: The per-sep regre and ransiion probabiliies of he gain disribuion V d d Figure 9: The acions illusraed for each sae class 3

14 Proofs of Secion 5. Given wo adversary gain disribuions Ȧ, Ȧ respecively a imes, +, a wo-sep regre in sae s is rȧȧ (s) = E s s,ȧ [rȧ (s )] + rȧ (s). Proof of Lemma 5. The proof is by inducion. If = T, V C (s) = rċ(s) = rẇ(s) = V W (s) for any sae s. Now, given a ime, we assume for all imes >, for all saes s a ime, we have V C (s) = V W (s). Case. A ime, if s C or s C 4 hen we have p(. s, Ċ) = p(. s, Ẇ) by Lemma and, looking a he gain disribuions ables, rċ(s) = rẇ(s). Therefore we have, using he inducion propery, V C (s) = rċ(s) + [PĊ V C + ](s) = r Ẇ(s) + [PẆ V W + ](s) = V W (s). Case. A ime, if s C or s C 3 hen we have, V C (s) = rċ(s) + [PĊ V C + ](s) = rċẇ(s) + [PĊ[PẆ V C + ]](s) = rẇċ(s) + [PẆ[PĊ V C + ]](s) = rẇ(s) + [PẆ V W + ](s) = V W (s), where holds by inducion, and follows from he exchangeabiliy beween PẆ and PĊ (proved in Lemma ) and also he fac ha rċẇ(s) = rẇċ(s) (proved in Lemma ). The exac saemens and proofs of hese wo lemmas are repored in he nex subsecion.. Lemma and Lemma We prove a firs exchangeabiliy resul beween he COMB gain disribuion Ċ and he TWIN-COMB gain disribuion Ẇ. Lemma. If s C or s C 4, p( s, Ẇ) = p( s, Ċ). If s C or s C 3, E p( s, Ẇ) = s s,ċ E p( s, Ċ). s s,ẇ Proof. If s C or s C 4 hen we have p(. s, Ċ) = p(. s, Ẇ). This can be seen by direcly reading he ables in Figures and 5 o 8. Le us now focus on he cases s C or s C 3. To prove ha he gains disribuions Ċ and Ẇ are inverible, we show ha he order does no maer for any pair of possible oucomes of he gains of each disribuion. Recall ha he oucomes are or 3 for Ẇ and or 3 for Ċ. To follow he upcoming reasonings one can use he illusraions of he effec of he acions in Figure 9 and he d-grid in Figure. In he following we will refer o he acion as he oucome of he gain disribuion Ẇ = {}{3} ha happens by definiion half of he ime and ha means ha he exper number, ie. he leading exper receives a gain of one while he middle and lagging expers receive zero. Similarly, for insance acion 3 refers o a one deal o boh he lagging and middle expers while he leading exper receives zero. Case of he exchangeabiliy of wih 3: The acion applied o a sae s C leads o a sae s C and similarly a sae s C 3 is lead o a sae s C 3. Therefore he effec of 3 is he same wheher i is done before or afer he acion. The acion 3 applied o a s C C 3 leads o a sae s C C 3. Moreover he acion has he same effec wheher i is in C or C 3 (incremening d by one and leaving d 3 he same). Therefore he effec of is he same wheher i is done before or afer he acion 3. Case of he exchangeabiliy of wih 3: Case. d > : The acion has he same effec wheher i is in C C 3. The same goes for acion 3. Moreover d > insures ha afer applying eiher or 3 we say in C C 3. The exchangeabiliy holds. 4

15 Case. d = : One can check from he ables ha he effecs of playing hen 3 or 3 hen eiher from a sae in C or C 3 are canceling ou and he final sae is he original sae in all cases. Case of he exchangeabiliy of wih : The acion applied o a s C leads o a sae s C similarly for a sae s C 3 leads o a sae s C 3. Therefore he effec of is he same wheher i is done before or afer he acion. is an acion ha has he same effec in any sae. Therefore he effec of is he same wheher i is done before or afer he acion. Case of he exchangeabiliy of 3 wih 3: The acion 3 applied o a sae s C leads o a sae s C or C. The acion 3 applied o a sae s C 3 leads o a sae s C 3 or C 4. Moreover he acion 3 has he same effec wheher i is in C or C and also has he same effec wheher i is in C 3 or C 4. Therefore he effec of 3 is he same wheher i is done before or afer he acion 3. The acion 3 applied o a sae s C C 3 leads o a sae s C C 3. Moreover he acion 3 has he same effec wheher i is in C or C 3. Therefore he effec of 3 is he same wheher i is done before or afer he acion 3. Lemma. For all sae s we have rċẇ(s) = rẇċ(s). Proof. Firs we have rċ(s) = rẇ(s) for all saes s. Then we have [ ] E [rẇ(s { X(s )] = E ) >} = P s s,ċ s s,ċ s s,ċ( X(s ) > ) () [ ] = P s s,ẇ( X(s { X(s ) > ) = E ) >} = E [rċ(s )], () s s,ẇ s s,ẇ where This is obvious in he case s C 4 C as saed in Lemma, p(. s, Ċ) = p(. s, Ẇ). If s = (x, y) C 3 C. Le he sae s = (x, y ) be generaed wih he COMB policy Ċ and a sae s = (x, y ) be generaed wih he TWIN-COMB policy Ẇ, we have ha P (x = x ) = P (x = x ) = / so P ( X(s ) > ) = P ( X(s ) > ) and hese probabiliies are eiher equal o (if d > ) or / (if d = ). Therefore rċẇ(s) = E s s,ċ [rẇ(s )] + rċ(s) = E s s,ẇ [rċ(s )] + rẇ(s) = rẇċ(s) Proof of he Exchangeabiliy Propery End of he Proof of Lemma 6. Case. Ȧ = : Case. s C : We can chose Ḋ = {}{}{3}{3}, he disribuion ha mixes wih equal probabiliies Ċ and Ẇ. One can check from he ables ha he exchangeabiliy holds. Case.. s (, ): One can check from he able in Figures 7 and 8 ha he exchangebiliy holds. We provide a visual illusraion of he exchangeabiliy equaliy below. 5

16 Case.. s = (, ): One can check from he able ha he exchangebiliy holds. Also we provide a visual illusraion of he exchangeabiliy equaliy below. Case. s C C 3 : We can chose Ḋ = {}{}{3}{3}, he disribuion ha mixes wih probabiliy half-half Ċ and Ẇ. Case.. d > : Following a very similar reasoning as in he Case. he resul hold. Case.. d = : Case... d 3 > : One can check from he able ha he exchangebiliy holds. Also we provide a visual illusraion of he exchangeabiliy equaliy below. Case... d 3 = : One can check from he able ha he exchangebiliy holds. Also we provide a visual illusraion of he exchangeabiliy equaliy below. Case 3. Ȧ = : This case uses a very similar srucure of argumens as in he Case. 6

17 3 Proofs for Conrolling he Accumulaion of he Greedy Errors 3. The COMB Greedy Error in s α, Proof of Lemma 7 Le he value of playing he COMB sraegy from he saes s A = (, ), s B = (, ), s C = (, ) a ime be V A = V C (s A ), V B = V C (s B ) and V C = V C (s C ). Lemma 3 relaes V A o V B. Lemma 3. For all [T ] we have, V B = V A ( 6 T ). Proof. From Table, for all [T ] we have, V A for all [T ] we have, V B V A = V A + V B + + and V T B V T A formula for he value of V B V A, we have for all [T ], V B = V B + + V C +, V B = + V A + + V C +. Therefore =. Solving his recurrence V A = 3 + ( 6 T. ) The greedy acion in s α w.r. playing COMB in laer rounds is arg C maxȧ Ḃ Q 3 (s α, Ȧ) which is equal o arg maxȧ {Ċ,, } Q C (s α, Ȧ) as Ẇ, Ċ, V are equivalen in s α and Q C (s α, {}) = Q C (s α, {3}) = V C + (s α) V C (s α ) = Q C (s α, Ċ). Moreover, from Figures, 5 and 6 we have Q C (s α, Ċ) = / + ( V + A + V +)/, B C Q (s α, ) = /3 + V +, A Combining hese equaliies wih Lemma 3 leads o Lemma Bounding All he Errors Lemma 4. For any wo inegers i n of he same pariy, i ( ) n n+i n. Proof of Lemma 4. Case. i n: we have ( ) n n i ( ) n n n n+i n Here uses he fac ha he cenral binomial coefficien ( n C Q (s α, ) = /3 + V + B. n+ n n n +. ) n is he larges of he binomial coefficien of he shape ( n m) wih m an ineger, and uses he upper bound in Theorem.6 in [7] which gives ( ) n n n n+ n+. ) is non-increasing for Case. i n: We now wan o show ha he mapping Φ from i o i ( n n+i i n. Indeed his would prove ha for all i n, Φ(i) Φ( n). As we have already proven in he Case above Φ( n) n his would prove he desired inequaliy for he Case also. To prove he non-increase, we sudy he raio of he values for i and i + (he expression is zero a i + ) is (i + ) ( n n+i i ( n n+i ) = + ) ( + ) i ( + ) n ( n+i n i )! ( )! )! = n n ( + ( n+i + )! ( n i n+ n + = ( + i ) n i n+i + ) n n + n + n + n + =. n Lemma 5. For [T ] and any s, T = P ( α(s, ) T 6 ) log(t +)+4 T +. Proof of Lemma 5. Case. s = s α : We have P α (s, ) = for = and P α (s, ) = for >. Therefore he bound holds in his case. 7

18 Case. s s α : We have P α (s, ) = which gives T = P ( α(s, ) T 6 ) T =+ P ( α(s, ) ) T 6. Le T = T log (T + ) so ha we have = We have ha + =+ where is using Equaion 3. Moreover we have =T + T T T +. (3) ( ) T ( ) T + =+ T T =T + 4 T +. ( ) T (T T ) T + log (T + ) T + log (T + ) T +, where is using he definiion of T above, and is because T + T log (T + ) T + and herefore T + = T + T + T + T +. Therefore we have ( ) T P α (s, ( ) T ) =+ = ( T+ ( ) T ( ) T ) = + =+ =T + ( log (T + ) + 4) T +, where holds by Lemma 9. The opimal Bellman operaor is denoed by T and is defined as [T f](s) = maxḋ Ḃ r Ḋ(s) + [PḊf](s). If we define he opimal cumulaive per-sep regre as V (s) = max B V (s), we have ha [T V + ](s) = V (s). Lemma 6. For any sae s and any ime, we have V (s) V C (s) + T = ɛċ. Proof of Lemma 6. The proof is by backward inducion on ime. Iniializaion = T : We use Lemma 8 and Lemma 5. Inducion: We wrie V (s) = [T V + ](s) = max rḋ(s) + [PḊ V + ](s) Ḋ Ḃ max rḋ(s) + [PḊ V C + ](s) + T ɛċ Ḋ Ḃ =+ 8

19 = max Q C (s, Ḋ) + Ḋ Ḃ = Q C (s, Ċ) + Q C (s, Ċ) + = V C (s) + = =+ =+ ɛċ, =+ ɛċ ɛċ + ɛċ s, ɛċ + ɛċ where holds by inducion and holds by definiion of he errors ɛċ s, and ɛċ. Proof of Theorem 3. We have, min V T p, p C = V C T (s α) = V C T (s α) (c) (d) V (s α ) V (s α ) = ɛċ ( log (T + ) + 4) T + = T V (s α ) ( log (T + ) + 4) T + log (T + ), where we use C is balanced, Lemma, (c) Lemma 6, and (d) Lemma 8 & 5. = 4 Proofs for he Minimax Regre of he Game Proof of Theorem. A new adversary, GREEDY-COMB, is denoed G. We prove ha he value of COMB and he value of GREEDY-COMB do no differ by more han an addiive numerical consan. As we have proved ha COMB is almos minimax opimal, hen GREEDY-COMB is also almos opimal. However, compuing he value of his new sraegy is easier using he classical resuls on number of passages o he origin in random walks. The GREEDY-COMB adversary akes he same acions as he TWIN-COMB in all saes excep in sae s α where, a all ime, he gain disribuion played by GREEDY-COMB is. Lemma 7 in he appendix proves ha he value of GREEDY-COMB is no differen from he value of TWIN-COMB by more han a small consan: V T G V T C /3. This is shown by using a backward inducion over ime, accumulaing he errors ha appear in sae s α (see Lemma 3), and noing ha hese errors decrease exponenially wih T. Saring from s α, he random walk followed by GREEDY-COMB is a simple random walk on he line d 3 = as illusraed in Figure 3(boom). Therefore he value of his adversary is V T G = 3 H G, where H G is he expeced number of imes he random walk of GREEDY-COMB his he reward wall. Indeed each ime he wall is hi he GREEDY-COMB earns /3 as is gain disribuion on he wall is. We noice ha compuing H G is equivalen o compuing he expeced number of equalizaions (passages by ) in a random walk ha sars wih a value and incremens is value by + wih probabiliy half and decremens by - wih probabiliy half. Therefore, following he classic resul in [, Example.3 in Secion ], we have H G = ( T + T/+) (T/ + ) T T. Finally we have, V T log (T + ) V T C V T G + /3, where holds by Theorem 3. Moreover we have V T = max B V T V T G. Lemma 7. For all T >, V T G V T C 3. 9

20 Proof of Lemma 7. We will acually prove ha for all s, and all T, V G (s) V C (s) T ( = ) T 6 which implies he claim of he lemma by looking a he special case =, s = sα and using Lemma. The proof is by backward inducion on ime. Iniializaion = T : The wo policies G and C only differ in sae s α where we use Lemma 7. Inducion: We wrie V G (s) = Q G (s, G (s)) Q C (s, G (s)) =+ where he las inequaliy is by inducion. We disinguish wo cases. Case. s s α : inducion. 6 ( Q C (s, G (s)) = Q C 6 ( ) T, (4) (s, C (s)) = V C (s) and he Equaion 4 gives he Case. s = s α : Using Lemma 7 we have Q C (s α, G (s α )) + 6 ) T C maxȧ Ḃ Q 3 (s α, Ȧ) Q C (s α, C (s α )) = V C (s α ). 5 Proofs for he COMB-based Learner ( ) T = Q C (s α, ) + Definiion. An ordering ( ) is a bijecive funcion ha maps an order k [K] o is arm (k) [K]. For a sae s, a valid increasing ordering ( ) is an ordering such ha for all pairs of disinc orders i, j [K] wih i j and i < j we have s (i) s (j). Proposiion. Le g {, } K, for any sae s, X(s) X(s + g). Proof. If here exiss an arm k X(s) such ha g k =, hen k X(s + g) and hen X(s) X(s + g). Oherwise for all arms k X(s) we have g k = so acually X(s) X(s + g) as s = s + g. Again we have X(s) X(s + g). For a se of deacivaed arms I [K], we define X I (s) = {k [K]\I : s k = max j [K]\I s j }. Therefore we also have he following proposiion. Proposiion 3. Le g {, } K, for any sae s, for any se arms I [K], X I (s) X I (s + g). Proposiion 4. Le g {, } K, for any sae s here exiss an ordering ( ) ha is valid simulaneously for boh saes s and s + g. Proof. To consruc he ordering follow he following seps. Le he firs sep be i = and A = [K], I =. Ieraively, for each sep i, we ake k i X I (s) X I (s + g) wih X I (s) X I (s + g) (Proposiion 3). We se (i) k i and we pass o he nex ieraion (i i + ) by deacivaing index arm k i wih I i+ = I i {k i }. Noe ha by consrucion his ordering saisfies a any sep he definiing propery of valid increasing ordering for s. The following lemma, which holds for any K, will be used in showing ha p C defines a probabiliy. Lemma 8. Le g {, } K, for any sae s, hen for all, and V C (s + g) V C (s), V C (s + ) V C (s) =. Proof. The proof is by backward inducion on ime. Iniiaion: A = T + we have for any s and any g {, } K, V C T + (s + g) V C T + (s) = s + g s = (s j + g j ) s i g i min g k, k [K] where in i = arg max k [K] (s) and j = arg max k [K] (s + g), and uses s j + g j s i + g i by definiion. Noe ha he inequaliies urn ino equaliies when g =.

21 Inducion sep: Assuming ha for all > and for any s and any g {, } K, V C (s + g) (s) min k [K] g k we have, V C V C (s + g) V C (s) = ( + V C + (s + g + e ()) + V C min g k k [K] + (s + g + e () + e (3) ) ) ( + V C + (s + e ()) + V C + (s + e () + e (3) ) ) where is because, using Proposiion 4, here exiss a common ordering ( ) for boh saes s ans s + g and is by inducion. Noe ha he inequaliy urn ino equaliy when g =. We now urn o he analysis of our COMB-based learner, specifically designed for he case K = 3. Proposiion 5. In all saes and a all imes, he COMB-based learner is a probabiliy. Proof. We need o show for any sae s, any ime, 3 k= p,k(s) = and for any exper k, p,k (s). For all, for any sae s, We have moreover, 3 p,k (s) = p,() (s) + p,() (s) + p,(3) (s) k= p,() (s) + p,() (s) = p,() (s) + p,() (s) + p,() (s) p,() (s) =. = V C + (s + e ()) V C (s) + V C + (s + e ()) V C (s) = V C + (s + e ()) (V C + (s + e ()) + V C + (s + e () + e (3) ) ) + V C + (s + e ()) (V C + (s + e ()) + V C + (s + e () + e (3) ) ) = + V C + (s + e ()) V C + (s + e () + e (3) ) + V C + (s + e ()) V C + (s + e () + e (3) ) = ( V C + (s + e () + e (3) ) V C + (s + e ()) ) ( V C + (s + e () + e (3) ) V C + (s + e ()) ) min (s + e () + e (3) (s + e () )) k k [K] min (s + e () + e (3) (s + e () )) k k [K] =, where is because of Lemma 5 and is using Lemma 8. Therefore p,(3) (s) = p,() (s) p,() (s). Concerning he posiiveness of p,() (s), we have, p,() (s) = V C + (s + e ()) V C (s) = V C + (s + e ()) (V C + (s + e ()) + V C + (s + e () + e (3) ) ) = (V C + (s + e ()) V C + (s + e () + e (3) ) + ) = (V C + (s + e ()) V C + (s) + V C + (s + e () + e () + e (3) ) V C + (s + e () + e (3) ) + ) min k [K] (s + e () s) k

Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem

Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem Near Minimax Opimal Players for he Finie-Time 3-Exper Predicion Problem Yasin Abbasi-Yadkori Adobe Research Peer L. Barle UC Berkeley Vicor Gabillon Queensland Universiy of Technology Absrac We sudy minimax

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Games Against Nature

Games Against Nature Advanced Course in Machine Learning Spring 2010 Games Agains Naure Handous are joinly prepared by Shie Mannor and Shai Shalev-Shwarz In he previous lecures we alked abou expers in differen seups and analyzed

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

The Arcsine Distribution

The Arcsine Distribution The Arcsine Disribuion Chris H. Rycrof Ocober 6, 006 A common heme of he class has been ha he saisics of single walker are ofen very differen from hose of an ensemble of walkers. On he firs homework, we

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

SOLUTIONS TO ECE 3084

SOLUTIONS TO ECE 3084 SOLUTIONS TO ECE 384 PROBLEM 2.. For each sysem below, specify wheher or no i is: (i) memoryless; (ii) causal; (iii) inverible; (iv) linear; (v) ime invarian; Explain your reasoning. If he propery is no

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

BU Macro BU Macro Fall 2008, Lecture 4

BU Macro BU Macro Fall 2008, Lecture 4 Dynamic Programming BU Macro 2008 Lecure 4 1 Ouline 1. Cerainy opimizaion problem used o illusrae: a. Resricions on exogenous variables b. Value funcion c. Policy funcion d. The Bellman equaion and an

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

4 Sequences of measurable functions

4 Sequences of measurable functions 4 Sequences of measurable funcions 1. Le (Ω, A, µ) be a measure space (complee, afer a possible applicaion of he compleion heorem). In his chaper we invesigae relaions beween various (nonequivalen) convergences

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems Essenial Microeconomics -- 6.5: OPIMAL CONROL Consider he following class of opimizaion problems Max{ U( k, x) + U+ ( k+ ) k+ k F( k, x)}. { x, k+ } = In he language of conrol heory, he vecor k is he vecor

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits DOI: 0.545/mjis.07.5009 Exponenial Weighed Moving Average (EWMA) Char Under The Assumpion of Moderaeness And Is 3 Conrol Limis KALPESH S TAILOR Assisan Professor, Deparmen of Saisics, M. K. Bhavnagar Universiy,

More information

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t.

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t. Econ. 5b Spring 999 C. Sims Discree-Time Sochasic Dynamic Programming 995, 996 by Chrisopher Sims. This maerial may be freely reproduced for educaional and research purposes, so long as i is no alered,

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should

In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should Cambridge Universiy Press 978--36-60033-7 Cambridge Inernaional AS and A Level Mahemaics: Mechanics Coursebook Excerp More Informaion Chaper The moion of projeciles In his chaper he model of free moion

More information

Energy Storage Benchmark Problems

Energy Storage Benchmark Problems Energy Sorage Benchmark Problems Daniel F. Salas 1,3, Warren B. Powell 2,3 1 Deparmen of Chemical & Biological Engineering 2 Deparmen of Operaions Research & Financial Engineering 3 Princeon Laboraory

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

Mixing times and hitting times: lecture notes

Mixing times and hitting times: lecture notes Miing imes and hiing imes: lecure noes Yuval Peres Perla Sousi 1 Inroducion Miing imes and hiing imes are among he mos fundamenal noions associaed wih a finie Markov chain. A variey of ools have been developed

More information

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013 IMPLICI AND INVERSE FUNCION HEOREMS PAUL SCHRIMPF 1 OCOBER 25, 213 UNIVERSIY OF BRIISH COLUMBIA ECONOMICS 526 We have exensively sudied how o solve sysems of linear equaions. We know how o check wheher

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

Block Diagram of a DCS in 411

Block Diagram of a DCS in 411 Informaion source Forma A/D From oher sources Pulse modu. Muliplex Bandpass modu. X M h: channel impulse response m i g i s i Digial inpu Digial oupu iming and synchronizaion Digial baseband/ bandpass

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Reading from Young & Freedman: For this topic, read sections 25.4 & 25.5, the introduction to chapter 26 and sections 26.1 to 26.2 & 26.4.

Reading from Young & Freedman: For this topic, read sections 25.4 & 25.5, the introduction to chapter 26 and sections 26.1 to 26.2 & 26.4. PHY1 Elecriciy Topic 7 (Lecures 1 & 11) Elecric Circuis n his opic, we will cover: 1) Elecromoive Force (EMF) ) Series and parallel resisor combinaions 3) Kirchhoff s rules for circuis 4) Time dependence

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

Electrical and current self-induction

Electrical and current self-induction Elecrical and curren self-inducion F. F. Mende hp://fmnauka.narod.ru/works.hml mende_fedor@mail.ru Absrac The aricle considers he self-inducance of reacive elemens. Elecrical self-inducion To he laws of

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

We just finished the Erdős-Stone Theorem, and ex(n, F ) (1 1/(χ(F ) 1)) ( n

We just finished the Erdős-Stone Theorem, and ex(n, F ) (1 1/(χ(F ) 1)) ( n Lecure 3 - Kövari-Sós-Turán Theorem Jacques Versraëe jacques@ucsd.edu We jus finished he Erdős-Sone Theorem, and ex(n, F ) ( /(χ(f ) )) ( n 2). So we have asympoics when χ(f ) 3 bu no when χ(f ) = 2 i.e.

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

An Introduction to Malliavin calculus and its applications

An Introduction to Malliavin calculus and its applications An Inroducion o Malliavin calculus and is applicaions Lecure 5: Smoohness of he densiy and Hörmander s heorem David Nualar Deparmen of Mahemaics Kansas Universiy Universiy of Wyoming Summer School 214

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :45 PM 8/8/04 Copyrigh 04 Richard T. Woodward. An inroducion o dynamic opimizaion -- Opimal Conrol and Dynamic Programming AGEC 637-04 I. Overview of opimizaion Opimizaion is

More information

4.6 One Dimensional Kinematics and Integration

4.6 One Dimensional Kinematics and Integration 4.6 One Dimensional Kinemaics and Inegraion When he acceleraion a( of an objec is a non-consan funcion of ime, we would like o deermine he ime dependence of he posiion funcion x( and he x -componen of

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Continuous Time. Time-Domain System Analysis. Impulse Response. Impulse Response. Impulse Response. Impulse Response. ( t) + b 0.

Continuous Time. Time-Domain System Analysis. Impulse Response. Impulse Response. Impulse Response. Impulse Response. ( t) + b 0. Time-Domain Sysem Analysis Coninuous Time. J. Robers - All Righs Reserved. Edied by Dr. Rober Akl 1. J. Robers - All Righs Reserved. Edied by Dr. Rober Akl 2 Le a sysem be described by a 2 y ( ) + a 1

More information

Chapter 6. Systems of First Order Linear Differential Equations

Chapter 6. Systems of First Order Linear Differential Equations Chaper 6 Sysems of Firs Order Linear Differenial Equaions We will only discuss firs order sysems However higher order sysems may be made ino firs order sysems by a rick shown below We will have a sligh

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

2.7. Some common engineering functions. Introduction. Prerequisites. Learning Outcomes

2.7. Some common engineering functions. Introduction. Prerequisites. Learning Outcomes Some common engineering funcions 2.7 Inroducion This secion provides a caalogue of some common funcions ofen used in Science and Engineering. These include polynomials, raional funcions, he modulus funcion

More information

Homework 4 (Stats 620, Winter 2017) Due Tuesday Feb 14, in class Questions are derived from problems in Stochastic Processes by S. Ross.

Homework 4 (Stats 620, Winter 2017) Due Tuesday Feb 14, in class Questions are derived from problems in Stochastic Processes by S. Ross. Homework 4 (Sas 62, Winer 217) Due Tuesday Feb 14, in class Quesions are derived from problems in Sochasic Processes by S. Ross. 1. Le A() and Y () denoe respecively he age and excess a. Find: (a) P{Y

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004 ODEs II, Lecure : Homogeneous Linear Sysems - I Mike Raugh March 8, 4 Inroducion. In he firs lecure we discussed a sysem of linear ODEs for modeling he excreion of lead from he human body, saw how o ransform

More information

Failure of the work-hamiltonian connection for free energy calculations. Abstract

Failure of the work-hamiltonian connection for free energy calculations. Abstract Failure of he work-hamilonian connecion for free energy calculaions Jose M. G. Vilar 1 and J. Miguel Rubi 1 Compuaional Biology Program, Memorial Sloan-Keering Cancer Cener, 175 York Avenue, New York,

More information

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and

More information

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015 Explaining Toal Facor Produciviy Ulrich Kohli Universiy of Geneva December 2015 Needed: A Theory of Toal Facor Produciviy Edward C. Presco (1998) 2 1. Inroducion Toal Facor Produciviy (TFP) has become

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

15. Vector Valued Functions

15. Vector Valued Functions 1. Vecor Valued Funcions Up o his poin, we have presened vecors wih consan componens, for example, 1, and,,4. However, we can allow he componens of a vecor o be funcions of a common variable. For example,

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information

Two Coupled Oscillators / Normal Modes

Two Coupled Oscillators / Normal Modes Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own

More information

arxiv: v1 [math.pr] 19 Feb 2011

arxiv: v1 [math.pr] 19 Feb 2011 A NOTE ON FELLER SEMIGROUPS AND RESOLVENTS VADIM KOSTRYKIN, JÜRGEN POTTHOFF, AND ROBERT SCHRADER ABSTRACT. Various equivalen condiions for a semigroup or a resolven generaed by a Markov process o be of

More information

Solutions from Chapter 9.1 and 9.2

Solutions from Chapter 9.1 and 9.2 Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is

More information

Pade and Laguerre Approximations Applied. to the Active Queue Management Model. of Internet Protocol

Pade and Laguerre Approximations Applied. to the Active Queue Management Model. of Internet Protocol Applied Mahemaical Sciences, Vol. 7, 013, no. 16, 663-673 HIKARI Ld, www.m-hikari.com hp://dx.doi.org/10.1988/ams.013.39499 Pade and Laguerre Approximaions Applied o he Acive Queue Managemen Model of Inerne

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

) were both constant and we brought them from under the integral.

) were both constant and we brought them from under the integral. YIELD-PER-RECRUIT (coninued The yield-per-recrui model applies o a cohor, bu we saw in he Age Disribuions lecure ha he properies of a cohor do no apply in general o a collecion of cohors, which is wha

More information

Online Learning with Queries

Online Learning with Queries Online Learning wih Queries Chao-Kai Chiang Chi-Jen Lu Absrac The online learning problem requires a player o ieraively choose an acion in an unknown and changing environmen. In he sandard seing of his

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Martingales Stopping Time Processes

Martingales Stopping Time Processes IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765. Volume 11, Issue 1 Ver. II (Jan - Feb. 2015), PP 59-64 www.iosrjournals.org Maringales Sopping Time Processes I. Fulaan Deparmen

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :37 PM, 1/11/018 Copyrigh 018 Richard T. Woodward 1. An inroducion o dynamic opimiaion -- Opimal Conrol and Dynamic Programming AGEC 64-018 I. Overview of opimiaion Opimiaion

More information

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution Physics 7b: Saisical Mechanics Fokker-Planck Equaion The Langevin equaion approach o he evoluion of he velociy disribuion for he Brownian paricle migh leave you uncomforable. A more formal reamen of his

More information

MATH 4330/5330, Fourier Analysis Section 6, Proof of Fourier s Theorem for Pointwise Convergence

MATH 4330/5330, Fourier Analysis Section 6, Proof of Fourier s Theorem for Pointwise Convergence MATH 433/533, Fourier Analysis Secion 6, Proof of Fourier s Theorem for Poinwise Convergence Firs, some commens abou inegraing periodic funcions. If g is a periodic funcion, g(x + ) g(x) for all real x,

More information

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3 Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has

More information

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach 1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model

More information

18 Biological models with discrete time

18 Biological models with discrete time 8 Biological models wih discree ime The mos imporan applicaions, however, may be pedagogical. The elegan body of mahemaical heory peraining o linear sysems (Fourier analysis, orhogonal funcions, and so

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem

An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem An Opimal Approximae Dynamic Programming Algorihm for he Lagged Asse Acquisiion Problem Juliana M. Nascimeno Warren B. Powell Deparmen of Operaions Research and Financial Engineering Princeon Universiy

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

Errata (1 st Edition)

Errata (1 st Edition) P Sandborn, os Analysis of Elecronic Sysems, s Ediion, orld Scienific, Singapore, 03 Erraa ( s Ediion) S K 05D Page 8 Equaion (7) should be, E 05D E Nu e S K he L appearing in he equaion in he book does

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information