Unimodal Thompson Sampling for Graph Structured Arms

Size: px
Start display at page:

Download "Unimodal Thompson Sampling for Graph Structured Arms"

Transcription

1 Proceedings of he Thiry-Firs AAAI Conference on Arificial Inelligence AAAI-7 Unimodal Thompson Sampling for Graph Srucured Arms Sefano Paladino, Francesco Trovò, Marcello Reselli, Nicola Gai Diparimeno di Eleronica, Informazione e Bioingegneria Poliecnico di Milano, Milano, Ialy {sefano.paladino, francesco.rovo, marcello.reselli, nicola.gai}@polimi.i Absrac We sudy, o he bes of our knowledge, he firs Bayesian algorihm for unimodal Muli Armed Bandi MAB problems wih graph srucure. In his seing, each arm corresponds o a node of a graph and each edge provides a relaionship, unknown o he learner, beween wo nodes in erms of expeced reward. Furhermore, for any node of he graph here is a pah leading o he unique node providing he maximum expeced reward, along which he expeced reward is monoonically increasing. Previous resuls on his seing describe he behavior of frequenis MAB algorihms. In our paper, we design a Thompson Sampling based algorihm whose asympoic pseudo regre maches he lower bound for he considered seing. We show ha as i happens in a wide number of scenarios Bayesian MAB algorihms dramaically ouperform frequenis ones. In paricular, we provide a horough experimenal evaluaion of he performance of our and sae of he ar algorihms as he properies of he graph vary. Inroducion Muli Armed Bandi MAB algorihms Auer, Cesa- Bianchi, and Fischer have been proven o provide effecive soluions for a wide range of applicaions fiing he sequenial decisions making scenario. In his framework, a each round over a finie horizon T, he learner selecs an acion usually called arm from a finie se and observes only he reward corresponding o he choice she made. The goal of a MAB algorihm is o converge o he opimal arm, i.e., he one wih he highes expeced reward, while minimizing he loss incurred in he learning process and, herefore, is performance is measured hrough is expeced pseudo regre, defined as he difference beween he expeced reward achieved by an oracle algorihm always selecing he opimal arm and he one achieved by he considered algorihm. We focus on he so called Unimodal MAB UMAB, inroduced in Combes and Prouiere 4a, in which each arm corresponds o a node of a graph and each edge is associaed wih a relaionship specifying which node of he edge gives he larges expeced reward providing hus a parial ordering over he arm space. Furhermore, from any node here is a pah leading o he unique node wih he maximum expeced reward along which he expeced reward is Copyrigh c 7, Associaion for he Advancemen of Arificial Inelligence All righs reserved. monoonically increasing. While he graph srucure may be no necessarily known a priori by he UMAB algorihm, he relaionship defined over he edges is discovered during he learning. In he presen paper, we propose a novel algorihm relying on he Bayesian learning approach for a generic UMAB seing. Models presening a graph srucure have become more and more ineresing in las years due o he spread of social neworks. Indeed, he relaionships among he eniies of a social nework have a naural graph srucure. A pracical problem in his scenario is he argeed adverisemen problem, whose goal is o discover he par of he nework ha is ineresed in a given produc. This ask is heavily influenced by he graph srucure, since in social neworks people end o have similar characerisics o hose of heir friends i.e., neighbor nodes in he graph, herefore ineress of people in a social nework change smoohly and neighboring nodes in he graph look similar o each oher McPherson, Smih-Lovin, and Cook ; Crandall e al. 8. More specifically, an adveriser aims a finding hose users ha maximize he ad expeced revenue i.e., he produc beween click probabiliy and value per click, while a he same ime reducing he amoun of imes he adverisemen is presened o people no ineresed in is conen. Under he assumpion of unimodal expeced reward, he learner can move from low expeced rewards o high ones jus by climbing hem in he graph, prevening from he need for an exploraion over all he graph nodes. This assumpion reduces he complexiy in he search for he opimal arm, since he learning algorihm can avoid o pull he arms corresponding o some subse of non opimal nodes, reducing hus he regre. Oher applicaions migh benefi from his srucure, e.g., recommender sysems which aims a coupling iems wih hose users are likely o enjoy hem. Similarly, he use of he unimodal graph srucure migh provide more meaningful recommendaions wihou esing all he users in he social nework. Finally, noice ha unimodal problems wih a single variable, e.g., in sequenial pricing Jia and Mannor, bidding in online sponsored search aucions Edelman and Osrovsky 7 and single peak preferences economics and voing seings Mas-Collel, Whinson, and Green 995, are graph srucured problems in which he graph is a line. 457

2 Frequenis approaches for UMAB wih graph srucure are proposed in Jia and Mannor and Combes and Prouiere 4a. Jia and Mannor inroduce he GLSE algorihm wih a regre of order O T logt. However, GLSE performs beer han classical bandi algorihms only when he number of arms is ΘT. Combes and Prouiere 4a presen he algorihm based on LUCB achieving asympoic regre of OlogT and ouperforming GLSE in seings wih a few arms. To he bes of our knowledge, no Bayesian approach has been proposed for unimodal bandi seings, included he UMAB seing we sudy. However, i is well known ha Bayesian MAB algorihms he mos popular is Thompson Sampling usually suffer of same order of regre as he bes frequenis one e.g., in unsrucured seings aufmann, orda, and Munos, bu hey ouperform he frequenis mehods in a wide range of problems e.g., in bandi problems wihou srucure Chapelle and Li and in bandi problems wih budge Xia e al. 5. Furhermore, in problems wih srucure, he classical Thompson Sampling no exploiing he problem srucure may ouperform frequenis algorihms exploiing he problem srucure. For his reason, in his paper we explore Bayesian approaches for he UMAB seing. More precisely, we provide he following original conribuions: we design a novel Bayesian MAB algorihm, called and based on he algorihm; we derive a igh upper bound over he pseudo regre for, which asympoically maches he lower bound for he UMAB seing; we describe a wide experimenal campaign showing beer performance of in applicaive scenarios han hose of sae of he ar algorihms, evaluaing also how he performance of he algorihms ours and of he sae of he ar varies as he graph srucure properies vary. Relaed work Here, we menion he main works relaed o ours. Some works deal wih unimodal reward funcions in coninuous armed bandi seing Jia and Mannor ; Combes and Prouiere 4b; leinberg, Slivkins, and Upfal 8. In Jia and Mannor a successive eliminaion algorihm, called LSE, is proposed achieving regre of O T log T. In his case, assumpions over he minimum local decrease and increase of he expeced reward is required. Combes and Prouiere 4b consider sochasic bandi problems wih a coninuous se of arms and where he expeced reward is a coninuous and unimodal funcion of he arm. They propose he SP algorihm, based on he sochasic penachoomy procedure o narrow he search space. Unimodal MABs on meric spaces are sudied in leinberg, Slivkins, and Upfal 8. An applicaion dependen soluion o he recommendaion sysems which explois he similariy of he graph in social nework in argeed adverisemen has been proposed in Valko e al. 4. Similar informaion has been considered in Caron and Bhaga where he problem of cold sar users i.e., new users is sudied. Anoher ype of srucure considered in sequenial games is he one of monooniciy of he conversion rae in he price Trovòeal. 5. Ineresingly, he assumpions of monooniciy and unimodaliy are orhogonal, none of hem being a special case of he oher, herefore he resuls for monoonic seing canno be used in unimodal bandis. In Alon e al. ; Mannor and Shamir, a graph srucure of he arm feedback in an adversarial seing is sudied. More precisely, hey assume o have correlaion over rewards and no over he expeced values of arms. Problem Formulaion A learner receives in inpu a finie undireced graph MAB seing G =A, E, whose verices A = {a,...,a } wih N correspond o he arms and an edge a i,a j E exiss only if here is a direc parial order relaionship beween he expeced rewards of arms a i and a j. The leaner knows a priori he nodes and he edges i.e., she knows he graph, bu, for each edge, she does no know a priori which is he node of he edge wih he larges expeced reward i.e., she does no know he ordering relaionship. A each round over a ime horizon of T N he learner selecs an arm a i and gains he corresponding reward x i,. This reward is drawn from an i.i.d. random variable X i, i.e., we consider a sochasic MAB seing characerized by an unknown disribuion D i wih finie known suppor Ω R as cusomary in MAB seings, from now on we consider Ω [, ] and by unknown expeced value μ i := E[X i, ]. We assume ha here is a single opimal arm, i.e., here exiss a unique arm a i s.. is expeced value μ i =max i μ i and, for sake of noaion, we denoe μ i wih μ. Here we analyze a graph bandi seing wih unimodaliy propery, defined as: Definiion. A graph unimodal MAB UMAB seing G = A, E is a graph bandi seing G s.. for each sub opimal arm a i,i i i exiss a finie pah p =i = i,...,i m = i s.. μ ik < μ ik+ and a ik,a ik+ E for each k {,...,m }. This definiion assures ha if one is able o idenify a non decreasing pah in G of expeced rewards, she be able o reach he opimum arm, wihou geing suck in local opima. Noe ha he unimodaliy propery implies ha he graph G is conneced and herefore we consider only conneced graphs from here on. A policy U over a UMAB seing is a procedure able o selec a each round an arm a i by basing on he hisory h, i.e., he sequence of pas seleced arms and pas rewards gained. The pseudo regre R T U of a generic policy U over a UMAB seing is defined as: [ T ] R T U :=Tμ E, X i, where he expeced value E[ ] is aken w.r.. he sochasiciy of he gained rewards X i, and of he policy U. Le us define he neighborhood of arm a i as Ni := {j a i,a j E}, i.e., he se of each index j of he arm a j 458

3 conneced o he arm a i by an edge a i,a j E. I has been shown in Combes and Prouiere 4a ha he problem of learning in a UMAB seing presens a lower bound over he regre R T U of he following form: Theorem. Le U be a uniformly good policy, i.e., a policy s.. R T U =ot c for each c>. Given a UMAB seing G =A, E we have: lim inf T R T U logt = where Lp, q =p log p q i Ni μ μ i Lμ i,μ, + plog p q, i.e., he ullaback Leibler divergence of wo Bernoulli disribuions wih means p and q, respecively. This resul is similar o he one provided in Lai and Robbins 985, wih he only difference ha he summaion is resriced o he arms laying in he neighborhood of he opimal arm Ni and reduces o i when he opimal arm is conneced o all he ohers i.e., Ni {,...,} or he graph is compleely conneced i.e., Ni {,...,}, i. We would like o poin ou ha by relying on he assumpion of having a single maximum of he expeced rewards, we also assure ha he opimal arm neighborhood Ni is uniquely defined and, hus, he lower bound inequaliy in Equaion is well defined. The algorihm We describe he algorihm and we show ha is regre is asympoically opimal, i.e., i asympoically maches he lower bound of Theorem. The algorihm is an exension of he Thompson Sampling Thompson 9 ha explois he graph srucure and he unimodal propery of he UMAB seing. Basically, he raionale of he algorihm is o apply a simple variaion of he algorihm o only he arms associaed wih he nodes ha compose he neighborhood of he arm wih he highes empirical mean reward, called leader. The pseudo code The pseudo code of he algorihm is presened in Algorihm. The algorihm receives in inpu he graph srucure G, he ime horizon T, and a Bayesian prior π i for each expeced reward μ i. A each round, he algorihm compues he empirical expeced reward for each arm Line : ˆμ i, := S i T i, if T i, > oherwise where S i, = h= X i,h{uh =a i } is he cumulaive reward of arm a i up o round and T i, = h= {Uh = a i } is he number of imes he arm a i has been pulled up o round. Afer ha, selecs he arm denoed as he leader a l for round, i.e., he one having he maximum empirical expeced reward: a l =argmax ˆμ i,. a i A We here denoe wih { } he indicaor funcion., Algorihm : Inpu: UMAB seing G =V,E, Horizon T, Priors {π i } i= : for {,...,T} do : Compue ˆμ i, for each i {,...,} 4: Find he leader a l 5: if L l, mod N + l =hen 6: Collec reward x l, 7: else 8: Draw θ i, from π i, for each i N + l 9: Collec reward x i, where i =argmax i θ i, Once he leader has been chosen, we resric he selecion procedure o i and is neighborhood, considering only arms wih indexes in N + l := Nl {l}. Denoe wih L i, := h= {lh =i} he number of imes he arm a i has been seleced as leader before round. IfL l, is a muliple of N + l, hen he leader is pulled and reward x l, is gained Line 6. Oherwise, he algorihm is performed over arms a i s.. i N + l Lines 8 9. Basically, under he assumpion of having a prior π i,we can compue he poserior disribuion π i, for μ i afer rounds, using he informaion gahered from he rounds in which a i has been pulled. We denoe wih θ i, a sample drawn from π i,, called Thompson sample. For insance, for Bernoulli rewards and by assuming uniform priors we have ha π i, = Bea+S i,, +T i, S i,, where Beaα, β is he bea disribuion wih parameers α and β. Finally, he algorihm pulls he arm wih he larges Thompson sample θ i,n and collecs he corresponding reward x i,. See aufmann, orda, and Munos for furher deails. Remark. Assuming ha he algorihm receives in inpu he whole graph G is unnecessary. The algorihm jus requires an oracle ha, a each round, is able o reurn he neighborhood Nl of he arm which is currenly he leader a l. This is crucial in all he applicaions in which he graph is discovered by means of a series of queries and he queries have a non negligible cos e.g., in social neworks a query migh be compuaionally cosly. Finally, we remark ha he frequenis counerpar of our algorihm i.e., he algorihm requires he compuaion of he maximum node degree γ := max i Ni, hus requiring a leas an iniial analysis of he enire graph G. Finie ime analysis of Theorem. Given a UMAB seing G =A, E, he expeced pseudo regre of he algorihm saisfies, for every ε>: μ μ i R T + ε [logt +loglogt] + C, i Ni Lμ i,μ where C > is a consan depending on ε, he number of arms and he expeced rewards {μ,...,μ }. We here denoe wih he cardinaliy operaor. 459

4 Skech of proof. The complee version of he proof is repored in Paladino e al. 6. A firs, we remark ha a sraighforward applicaion of he proof provided for is no possible in he case of. Indeed, he use of frequenis upper bounds over he expeced reward in implies ha in finie ime and wih high probabiliy he bounds are ordered as he expeced values. Since we are using a Bayesian algorihm, we would require he same assurance over he Thompson samples θ i,, bu we do no have a direc bound over Pθ i, >θ i, where a i is he opimal arm in he neighborhood N + i. This fac requires o follow a compleely differen sraegy when we analyze he case in which he leader is no he opimal arm. The regre of he algorihm R T can be divided in wo pars: he one obained during hose rounds in which he opimal arm a is he leader, called R, and he summaion of he regres in he rounds in which he leader is he arm a i a, called R i. R is obained when i is he leader, hus, he algorihms behaves like Thompson Sampling resriced o he opimal arm and is neighborhood N + i, and he regre upper bound is he one derived in aufmann, orda, and Munos for he algorihm. R i is upper bounded by he expeced number of rounds he arm a i has been seleced as leader E[L i,t ] over he horizon T. Le us consider ˆL i,t defined as he number as leader when resricing he of rounds spen wih a i problem o is neighborhood N + i. E[ˆL i,t ] is an upper bound over E[L i,t ], since here is nonzero probabiliy ha he algorihm moves in anoher neighborhood. Since i i and he seing is unimodal, here exiss an opimal arm a i,i i among hose in he neighborhood Ni s.. μ i =max i ai Ni μ i and ˆμ i, ˆμ i. Thus: [ ] R i E[ ˆL i,t ]= E {ˆμ i, = max ˆμ j, } a j N + i = P ˆμ i, max ˆμ j, a j N + i = P ˆμ i, μ i Δ i ˆμ i, + μ i Δ i P ˆμ i, μ i Δ i + } {{ } R i P ˆμ i, ˆμ i, P ˆμ i, μ i + Δ i, } {{ } R i where Δ i =max i a i Ni μ i μ i is he expeced loss incurred in choosing a i insead of is bes adjacen one a i. R i can be upper bounded by a consan by relying on condiional probabiliy definiion and he Hoeffding inequaliy Hoeffding 96. Specifically, we rely on he fac ha he leader is chosen a leas imes. Upper Ll, N + l bounding R i by a consan erm requires he use of Proposiion in aufmann, orda, and Munos, which limis he expeced number of imes he opimal arm is pulled less han b imes by, where b, is a consan, and Table : Resuls concerning R % U, in he seing wih =7and = 9 and a line graph. 7 9 LUCB.8 ± ±.7.4 ±.7.68 ±.5.5 ±.7.76 ±.5 he use of a echnique already used on R i. Summing up he regre over i i and considering he hree obained bounds concludes he proof. Experimenal Evaluaion In his secion, we compare he empirical performance of he proposed algorihm wih he performance of a number of algorihms. We sudy he performance of he sae of he ar algorihm Combes and Prouiere 4a o evaluae he improvemen due o he employmen of Bayesian approaches w.r.. frequenis approaches. Furhermore, we sudy he performance of Thompson 9 o evaluae he improvemen in Bayesian approaches due o he exploiaion of he problem srucure. For compleeness, we sudy also he performance of LUCB Garivier and Cappé, being a frequenis algorihm ha is opimal for Bernoulli disribuions. Figures of meri Given a policy U, we evaluae he average and 95% confidence inervals of he following figures of meri: he pseudo regre R T U as defined in Equaion ; he lower R T U he beer he performance; he regre raio R % U, U = R T U R T U showing he raio beween he oal regre of policy U afer T rounds and he one obained wih U ; he lower R % U, U he larger he relaive improvemen of U w.r.. U. Line graphs We iniially consider he same experimenal seings, composed of line graphs, ha are sudied in Combes and Prouiere 4a. They consider graphs wih {7, 9} arms, where he arms are ordered on a line from he arm wih smalles index o he arm wih he larges index and wih Bernoulli rewards whose averages have a riangular shape wih he maximum on he arm in he middle of he line. More precisely, he minimum average is., associaed wih arms a and a 7 when =7and wih arms a and a 9 wih = 9, while he maximum average reward is μ =.9, associaed wih arm a 9 when =7and wih arm a 65 wih = 9. The averages decrease linearly from he maximum one o he minimum one. For boh he experimens, we average he regre over independen rials of lengh T = 5. We repor R U for each policy U as varies in Fig. a, for =7, and in Fig. b, for = 9. The algorihm ouperforms all he oher algorihms along he whole ime horizon, providing a significan improvemen in erms of regre w.r.. he 46

5 RU LUCB a RU LUCB Figure : Resuls for he pseudo regre R U in line graphs seings wih = 7 a and = 9 b as defined in Combes and Prouiere 4a. sae of he ar algorihms. In order o have a more precise evaluaion of he reducion of he regre w.r.. algorihm, we repor R % U, in Tab.. As also confirmed below by a more exhausive series of experimens, in line graphs he relaive improvemen of performance due o w.r.. reduces as he number of arms increases, while he relaive improvemen of performance due o w.r.. increases as he number of arms increases. Erdős-Rényi graphs To provide a horough experimenal evaluaion of he considered algorihms in seings in which he space of arms has a graph srucure, we generae graphs using he model proposed by Erdős and Rényi 959, which allows us o simulae graph srucures more complex han a simple line. An Erdős-Rényi graph is generaed by connecing nodes randomly: each edge is included in he graph wih probabiliy p, independenly from exising edges. We consider conneced graphs wih {5,,, 5,, } and wih probabiliy p {,, log,l}, where p = corresponds o have a fully conneced graph and herefore he graph srucure is useless, p = corresponds o have a number of edges ha increases linearly in he number of nodes, p = log corresponds o have a few edges w.r.. he nodes, and we use p = l o denoe line graphs hese line graphs differ from hose used for he experimenal evaluaion discussed above for he reward funcion, as discussed in wha follows. We use differen values of p in order o see how he performance of changes w.r.. he number of edges in he graph; we remark ha such an analysis is unexplored in he lieraure so far. The opimal arm is chosen randomly among he exising arms and is reward is given by a Bernoulli disribuion wih expeced value.9. The rewards of he subopimal arms are given by Bernoulli disribuions wih expeced value depending on heir disance from he opimal one. More precisely, le d i be he shores pah from he i h arm o he opimal arm and le: b d max = max i {,...,} d i be he maximum shores pah of he graph. The expeced reward of he i h arm is: μ i =.9 d.9. i d, max Table : Resuls concerning R T U T = 5 in he seing wih Erdős-Rényi graphs. 5 5 p / log/ l LUCB 4 ±.4 5 ±.5 5 ±.7 56 ±. 8 ±. ±.6 4 ±. 5 ±.7 4 ±. ± 7. 5 ± 5.8 ± 4. 7 ±. 5 ±.4 6 ±. 4 ±. LUCB 77 ±.5 7 ± ±. 59 ± 7. 4 ±. 5 ±. 56 ±.8 67 ±.5 77 ±. 76 ± ± ± 8. 9 ±. 5 ±. 7 ±. 4 ±.4 LUCB 6 ±.7 7 ± 6. 6 ± ±. 84 ±.5 ±. 7 ± ± ±.8 48 ± ± ±.7 8 ±. 7 ± ± ± 8.8 LUCB 4 ±.7 56 ± ±.5 ± ±.5 6 ± 4.4 ±. 454 ± ±. 8 ± ±.9 4 ± ±.7 8 ± ± ±. LUCB 846 ±. 4 ± 7.8 ± ± ±. 58 ± ± ± ± ± 9. 6 ± ±.7 47 ±.5 7 ± 5. 4 ± 9. 9 ± 4. LUCB 855 ±. 47 ± 6. 4 ± ± ±.4 56 ± ± ± ± ± ± ± ± ± 6.9 ± ± 4.8 i.e., he arm wih d max has a value equal o. and he expeced rewards of he arms along he pah from i o he opimal arm are evenly spaced beween. and.9. We generae differen graphs for each combinaion of and p and we run independen rials of lengh T = 5 for each graph. We average he regre over he resuls of he graphs. In Tab., we repor R T U for each combinaion of policy U,, and p. I can be observed ha he algorihm ouperforms all he oher algorihms, providing in every case he smalles regre excep for = and p = l. Below we discuss how he relaive performance of he algorihms vary as he values of he parameers and p vary. Consider he case wih p =. The performance of and are approximaely equal and he same holds for he performance of and LUCB. This is due o he fac ha he neighborhood of each node is composed by all he arms, he graphs being fully conneced, and herefore and canno ake any advanage from he srucure of he problem. We noice, however, ha and have no he same behavior and ha always performs slighly beer han. I can be observed in Fig. wih = 5 and p = ha he relaive improvemen is mainly a he beginning of he ime horizon and ha i goes o zero as increases he same holds for w.r.. LUCB. The reason behind his behavior is ha reduces he exploraion performed by in he firs rounds, forcing he algorihm o pull he leader chosen as he arm maximizing he empirical mean for a larger number of rounds. 46

6 RU LUCB RU 6 4 LUCB RU LUCB Figure : Resuls for he pseudo regre R U in he seing wih =5and p =. Consider he case wih p =. In he considered experimenal seing, he relaive performance of he algorihms does no depend on. The ordering, from he bes o he wors, over he performance of he algorihms is:,,, and finally LUCB. Surprisingly, even he dependency of he following raios on is negligible: R %, =.68 ±., R %, =.47 ±., and R %, LUCB =.68 ±.. This shows ha he relaive improvemen due o is consan w.r.. and as varies. These resuls raise he quesion wheher he relaive performance of and would be he same, excep for he numerical values, for every p consan w.r... To answer o his quesion, we consider he case in which p =., corresponding o he case in which he number of edges is linear in, bu i is smaller han he case wih p =. The resuls in erms of R T U, repored in Table show ha ouperforms for, suggesing ha, when p is consan in, may or may no ouperform depending on he specific pair p,. Consider he case wih p = log. The ordering over he performance of he algorihms changes as varies. More precisely, while keeps o be he bes algorihm for every and LUCB he wors algorihm for every, he ordering beween and changes. When performs beer han, insead when ouperforms, see Fig.. This is due o he fac ha, wih a small number of arms, exploiing he graph srucure is no sufficien for a frequenis algorihm o ouperform he performance of, while wih many arms exploiing he graph srucure even wih a frequenis algorihm is much beer han employing a general-purpose Bayesian algorihm. The raio R %, monoonically decreases as increases, from.66 when =5o.9 when =, suggesing ha exploiing he graph srucure provides ad- Table : Resuls concerning R T U T = 5 in he seing wih Erdős-Rényi graphs and p = a Figure : Resuls for he pseudo regre R U in he seing wih =5a and =b and p = log. vanages as increases. Insead, he raio R %, monoonically increases as increases, from.45 when =5o.94 when =, suggesing ha he improvemen provided by employing Bayesian approaches reduces as increases as observed above in line graphs. Consider he case wih p = l. As in he case discussed above, is ouperformed by for a small number of arms, while i ouperforms for many arms. The reason is he same above. Similarly, he raio R %, monoonically decreases as increases, from.58 when =5o.8 when =, and he raio R %, monoonically increases as increases, from.45 when =5o. when =. This confirms ha he performance of and he one of asympoically mach as increases when p = l as well as p = log. In order o invesigae he reasons behind such a behavior, we produce an addiional experimen wih he line graphs of Combes and Prouiere 4a excep ha he maximum expeced reward is se o.8 when = 7 and.65 when = 9 hus, given any edge wih erminals i and i +,wehave μ i μ i+ =.. Wha we observe deails of hese experimens and hose described below are in Paladino e al. 6 is ha, on average, ouperforms a T = 5 suggesing ha, when i is necessary o repeaedly disinguish beween hree arms ha have very similar expeced rewards, frequenis mehods may ouperform he Bayesian ones. This is no longer rue when T is much larger, e.g., T = 7, where ouperforms ineresingly, differenly from wha happens in he oher opologies, in line graphs wih very small μ i μ i+, he average R T and R T cross a number of imes during he ime horizon. Fuhermore, we evaluae how he relaive performance of w.r.. varies for μ i μ i+ {.,.,.5}, observing i improves as μ i μ i+ decreases. Finally, we evaluae wheher his behavior emerges also in Erdős-Rényi graphs in which p = c where c is a consan we use p = 5, and we observe ha ouperforms, suggesing ha line graphs wih very small μ i μ i+ are pahological insances for. b 46

7 Conclusions and Fuure Work In his paper, we focus on he Unimodal Muli Armed Bandi problem wih graph srucure in which each arm corresponds o a node of a graph and each edge is associaed wih a relaionship in erms of expeced reward beween is arms. We propose, o he bes of our knowledge, he firs Bayesian algorihm for he UMAB seing, called, which is based on he well known Thompson Sampling algorihm. We derive a igh upper bound for ha asympoically maches he lower bound for he UMAB seing, providing a non-rivial derivaion of he bound. Furhermore, we presen a horough experimenal analysis showing ha our algorihm ouperforms he sae of he ar mehods. In fuure, we will evaluae he performance of he algorihms considered in his paper wih oher classes of graphs, e.g., Barabási Alber and laices. Fuure developmen of his work may consider an analysis of he proposed algorihm in he case of ime varying environmens, i.e., he expeced reward of each arm varies over ime, assuming ha he unimodal srucure is preserved. Anoher ineresing sudy may consider he case of a coninuous decision space. References Alon, N.; Cesa-Bianchi, N.; Genile, C.; and Mansour, Y.. From bandis o expers: a ale of dominaion and independence. In Advances in Neural Informaion Processing Sysems NIPS, Auer, P.; Cesa-Bianchi, N.; and Fischer, P.. Finie ime analysis of he muliarmed bandi problem. Machine Learning 47-:5 56. Caron, S., and Bhaga, S.. Mixing bandis: A recipe for improved cold sar recommendaions in a social nework. In Proceedings of he Workshop on Social Nework Mining and Analysis SNA-DD. Chapelle, O., and Li, L.. An empirical evaluaion of hompson sampling. In Advances in Neural Informaion Processing Sysems NIPS, Combes, R., and Prouiere, A. 4a. Unimodal bandis: Regre lower bounds and opimal algorihms. In Proceedings of he Inernaional Conference on Machine Learning ICML, Combes, R., and Prouiere, A. 4b. Unimodal bandis wihou smoohness. arxiv preprin arxiv: Crandall, D.; Cosley, D.; Huenlocher, D.; leinberg, J.; and Suri, S. 8. Feedback effecs beween similariy and social influence in online communiies. In Proceedings of he Special Ineres Group on nowledge Discovery and Daa Mining SIGDD, Edelman, B., and Osrovsky, M. 7. Sraegic bidder behavior in sponsored search aucions. Decision Suppor Sysems 4:9 98. Erdős, P., and Rényi, A On random graphs I. Publicaiones Mahemaicae Debrecen 6:9 97. Garivier, A., and Cappé, O.. The kl-ucb algorihm for bounded sochasic bandis and beyond. In Proceedings of he Conference on Learning Theory COLT, Hoeffding, W. 96. Probabiliy inequaliies for sums of bounded random variables. Journal of he American Saisical Associaion 58:. Jia, Y. Y., and Mannor, S.. Unimodal bandis. In Proceedings of he Inernaional Conference on Machine Learning ICML, aufmann, E.; orda, N.; and Munos, R.. Thompson sampling: An asympoically opimal finie ime analysis. In Proceedings of he Algorihmic Learing Theory ALT, 99. leinberg, R.; Slivkins, A.; and Upfal, E. 8. Muli armed bandis in meric spaces. In Proceedings of he Symposium on Theory Of Compuing STOC, Lai, T. L., and Robbins, H Asympoically efficien adapive allocaion rules. Advances in Applied Mahemaics 6:4. Mannor, S., and Shamir, O.. From bandis o expers: On he value of side observaions. In Advances in Neural Informaion Processing Sysems NIPS, Mas-Collel, A.; Whinson, M. D.; and Green, J. R Micreconomic Theory. Oxford Universiy Press. McPherson, M.; Smih-Lovin, L.; and Cook, J. M.. Birds of a feaher: Homophily in social neworks. Annual review of sociology Paladino, S.; Trovò, F.; Reselli, M.; and Gai, N. 6. Unimodal hompson sampling for graph-srucured arms. arxiv preprin arxiv: Thompson, W. R. 9. On he likelihood ha one unknown probabiliy exceeds anoher in view of he evidence of wo samples. Biomerika 5: Trovò, F.; Paladino, S.; Reselli, M.; and Gai, N. 5. Muli armed bandi for pricing. In Proceedings of he European Workshop on Reinforcemen Learning EWRL. Valko, M.; Munos, R.; veon, B.; and ocak, T. 4. Specral bandis for smooh graph funcions. In Proceedings of he Inernaional Conference on Machine Learning ICML, Xia, Y.; Li, H.; Qin, T.; Yu, N.; and Liu, T.-Y. 5. Thompson sampling for budgeed muli armed bandis. In Proceedings of he Inernaional Join Conference on Arificial Inelligence IJCAI. 46

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Budgeted Multi Armed Bandit in Continuous Action Space

Budgeted Multi Armed Bandit in Continuous Action Space Budgeed Muli Armed Bandi in Coninuous Acion Space Francesco Trovò, Sefano aladino, Marcello Reselli, Nicola Gai 1 Absrac. Muli Armed Bandis MABs) have been widely considered in he las decade o model seings

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Conservative Contextual Linear Bandits

Conservative Contextual Linear Bandits Conservaive Conexual Linear Bandis Abbas Kazerouni Sanford Universiy abbask@sanford.edu Yasin Abbasi-Yadkori Adobe Research abbasiya@adobe.com Mohammad Ghavamzadeh DeepMind ghavamza@google.com Benjamin

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Notes on online convex optimization

Notes on online convex optimization Noes on online convex opimizaion Karl Sraos Online convex opimizaion (OCO) is a principled framework for online learning: OnlineConvexOpimizaion Inpu: convex se S, number of seps T For =, 2,..., T : Selec

More information

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Capacity Constraints

Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Capacity Constraints IJCSI Inernaional Journal of Compuer Science Issues, Vol 9, Issue 1, No 1, January 2012 wwwijcsiorg 18 Applying Geneic Algorihms for Invenory Lo-Sizing Problem wih Supplier Selecion under Sorage Capaciy

More information

Cash Flow Valuation Mode Lin Discrete Time

Cash Flow Valuation Mode Lin Discrete Time IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728,p-ISSN: 2319-765X, 6, Issue 6 (May. - Jun. 2013), PP 35-41 Cash Flow Valuaion Mode Lin Discree Time Olayiwola. M. A. and Oni, N. O. Deparmen of Mahemaics

More information

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach 1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

Conservative Contextual Linear Bandits

Conservative Contextual Linear Bandits Conservaive Conexual Linear Bandis Abbas Kazerouni, Mohammad Ghavamzadeh and Benjamin Van Roy 1 Absrac Safey is a desirable propery ha can immensely increase he applicabiliy of learning algorihms in real-world

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

3.1 More on model selection

3.1 More on model selection 3. More on Model selecion 3. Comparing models AIC, BIC, Adjused R squared. 3. Over Fiing problem. 3.3 Sample spliing. 3. More on model selecion crieria Ofen afer model fiing you are lef wih a handful of

More information

Introduction to Probability and Statistics Slides 4 Chapter 4

Introduction to Probability and Statistics Slides 4 Chapter 4 Inroducion o Probabiliy and Saisics Slides 4 Chaper 4 Ammar M. Sarhan, asarhan@mahsa.dal.ca Deparmen of Mahemaics and Saisics, Dalhousie Universiy Fall Semeser 8 Dr. Ammar Sarhan Chaper 4 Coninuous Random

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS *

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS * haper 5 HERNOFF DISTANE AND AFFINITY FOR TRUNATED DISTRIBUTIONS * 5. Inroducion In he case of disribuions ha saisfy he regulariy condiions, he ramer- Rao inequaliy holds and he maximum likelihood esimaor

More information

A Hop Constrained Min-Sum Arborescence with Outage Costs

A Hop Constrained Min-Sum Arborescence with Outage Costs A Hop Consrained Min-Sum Arborescence wih Ouage Coss Rakesh Kawara Minnesoa Sae Universiy, Mankao, MN 56001 Email: Kawara@mnsu.edu Absrac The hop consrained min-sum arborescence wih ouage coss problem

More information

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H. ACE 564 Spring 2006 Lecure 7 Exensions of The Muliple Regression Model: Dumm Independen Variables b Professor Sco H. Irwin Readings: Griffihs, Hill and Judge. "Dumm Variables and Varing Coefficien Models

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

Learning to Discover: A Bayesian Approach

Learning to Discover: A Bayesian Approach Learning o Discover: A Bayesian Approach Zheng Wen Deparmen of Elecrical Engineering Sanford Universiy Sanford, CA zhengwen@sanford.edu Branislav Kveon and Sandilya Bhamidipai Technicolor Labs Palo Alo,

More information

Games Against Nature

Games Against Nature Advanced Course in Machine Learning Spring 2010 Games Agains Naure Handous are joinly prepared by Shie Mannor and Shai Shalev-Shwarz In he previous lecures we alked abou expers in differen seups and analyzed

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time. Supplemenary Figure 1 Spike-coun auocorrelaions in ime. Normalized auocorrelaion marices are shown for each area in a daase. The marix shows he mean correlaion of he spike coun in each ime bin wih he spike

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

SOLUTIONS TO ECE 3084

SOLUTIONS TO ECE 3084 SOLUTIONS TO ECE 384 PROBLEM 2.. For each sysem below, specify wheher or no i is: (i) memoryless; (ii) causal; (iii) inverible; (iv) linear; (v) ime invarian; Explain your reasoning. If he propery is no

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Echocardiography Project and Finite Fourier Series

Echocardiography Project and Finite Fourier Series Echocardiography Projec and Finie Fourier Series 1 U M An echocardiagram is a plo of how a porion of he hear moves as he funcion of ime over he one or more hearbea cycles If he hearbea repeas iself every

More information

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem.

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem. Noes, M. Krause.. Problem Se 9: Exercise on FTPL Same model as in paper and lecure, only ha one-period govenmen bonds are replaced by consols, which are bonds ha pay one dollar forever. I has curren marke

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1. Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of

More information

Undetermined coefficients for local fractional differential equations

Undetermined coefficients for local fractional differential equations Available online a www.isr-publicaions.com/jmcs J. Mah. Compuer Sci. 16 (2016), 140 146 Research Aricle Undeermined coefficiens for local fracional differenial equaions Roshdi Khalil a,, Mohammed Al Horani

More information

5.1 - Logarithms and Their Properties

5.1 - Logarithms and Their Properties Chaper 5 Logarihmic Funcions 5.1 - Logarihms and Their Properies Suppose ha a populaion grows according o he formula P 10, where P is he colony size a ime, in hours. When will he populaion be 2500? We

More information

Graphical Event Models and Causal Event Models. Chris Meek Microsoft Research

Graphical Event Models and Causal Event Models. Chris Meek Microsoft Research Graphical Even Models and Causal Even Models Chris Meek Microsof Research Graphical Models Defines a join disribuion P X over a se of variables X = X 1,, X n A graphical model M =< G, Θ > G =< X, E > is

More information

) were both constant and we brought them from under the integral.

) were both constant and we brought them from under the integral. YIELD-PER-RECRUIT (coninued The yield-per-recrui model applies o a cohor, bu we saw in he Age Disribuions lecure ha he properies of a cohor do no apply in general o a collecion of cohors, which is wha

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN Inernaional Journal of Scienific & Engineering Research, Volume 4, Issue 10, Ocober-2013 900 FUZZY MEAN RESIDUAL LIFE ORDERING OF FUZZY RANDOM VARIABLES J. EARNEST LAZARUS PIRIYAKUMAR 1, A. YAMUNA 2 1.

More information

Mathematical Theory and Modeling ISSN (Paper) ISSN (Online) Vol 3, No.3, 2013

Mathematical Theory and Modeling ISSN (Paper) ISSN (Online) Vol 3, No.3, 2013 Mahemaical Theory and Modeling ISSN -580 (Paper) ISSN 5-05 (Online) Vol, No., 0 www.iise.org The ffec of Inverse Transformaion on he Uni Mean and Consan Variance Assumpions of a Muliplicaive rror Model

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Linear Submodular Bandits with a Knapsack Constraint

Linear Submodular Bandits with a Knapsack Constraint Proceedings of he Thirieh AAAI Conference on Arificial Inelligence (AAAI-) Linear Submodular Bandis wih a Knapsack Consrain Baosheng Yu and Meng Fang and Dacheng Tao Cenre for Quanum Compuaion and Inelligen

More information

Economics 8105 Macroeconomic Theory Recitation 6

Economics 8105 Macroeconomic Theory Recitation 6 Economics 8105 Macroeconomic Theory Reciaion 6 Conor Ryan Ocober 11h, 2016 Ouline: Opimal Taxaion wih Governmen Invesmen 1 Governmen Expendiure in Producion In hese noes we will examine a model in which

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates Biol. 356 Lab 8. Moraliy, Recruimen, and Migraion Raes (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Las week we esimaed populaion size hrough several mehods. One assumpion of all hese

More information

Stochastic Bandits with Pathwise Constraints

Stochastic Bandits with Pathwise Constraints Sochasic Bandis wih Pahwise Consrains Auhor Insiue Absrac. We consider he problem of sochasic bandis, wih he goal of maximizing a reward while saisfying pahwise consrains. The moivaion for his problem

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information

Average Number of Lattice Points in a Disk

Average Number of Lattice Points in a Disk Average Number of Laice Poins in a Disk Sujay Jayakar Rober S. Sricharz Absrac The difference beween he number of laice poins in a disk of radius /π and he area of he disk /4π is equal o he error in he

More information

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models.

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models. Technical Repor Doc ID: TR--203 06-March-203 (Las revision: 23-Februar-206) On formulaing quadraic funcions in opimizaion models. Auhor: Erling D. Andersen Convex quadraic consrains quie frequenl appear

More information

An Online Minimax Optimal Algorithm for Adversarial Multi-Armed Bandit Problem

An Online Minimax Optimal Algorithm for Adversarial Multi-Armed Bandit Problem 1 An Online Minimax Opimal Algorihm for Adversarial Muli-Armed Bandi Problem Kaan Gokcesu and Suleyman S. Koza, Senior Member, IEEE Absrac We invesigae he adversarial muli-armed bandi problem and inroduce

More information

Two Coupled Oscillators / Normal Modes

Two Coupled Oscillators / Normal Modes Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

Stability and Bifurcation in a Neural Network Model with Two Delays

Stability and Bifurcation in a Neural Network Model with Two Delays Inernaional Mahemaical Forum, Vol. 6, 11, no. 35, 175-1731 Sabiliy and Bifurcaion in a Neural Nework Model wih Two Delays GuangPing Hu and XiaoLing Li School of Mahemaics and Physics, Nanjing Universiy

More information

Chapter 2. Models, Censoring, and Likelihood for Failure-Time Data

Chapter 2. Models, Censoring, and Likelihood for Failure-Time Data Chaper 2 Models, Censoring, and Likelihood for Failure-Time Daa William Q. Meeker and Luis A. Escobar Iowa Sae Universiy and Louisiana Sae Universiy Copyrigh 1998-2008 W. Q. Meeker and L. A. Escobar. Based

More information

18 Biological models with discrete time

18 Biological models with discrete time 8 Biological models wih discree ime The mos imporan applicaions, however, may be pedagogical. The elegan body of mahemaical heory peraining o linear sysems (Fourier analysis, orhogonal funcions, and so

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

What Ties Return Volatilities to Price Valuations and Fundamentals? On-Line Appendix

What Ties Return Volatilities to Price Valuations and Fundamentals? On-Line Appendix Wha Ties Reurn Volailiies o Price Valuaions and Fundamenals? On-Line Appendix Alexander David Haskayne School of Business, Universiy of Calgary Piero Veronesi Universiy of Chicago Booh School of Business,

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Presentation Overview

Presentation Overview Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion

More information

A Dynamic Model of Economic Fluctuations

A Dynamic Model of Economic Fluctuations CHAPTER 15 A Dynamic Model of Economic Flucuaions Modified for ECON 2204 by Bob Murphy 2016 Worh Publishers, all righs reserved IN THIS CHAPTER, OU WILL LEARN: how o incorporae dynamics ino he AD-AS model

More information

Stochastic models and their distributions

Stochastic models and their distributions Sochasic models and heir disribuions Couning cusomers Suppose ha n cusomers arrive a a grocery a imes, say T 1,, T n, each of which akes any real number in he inerval (, ) equally likely The values T 1,,

More information

A Local Regret in Nonconvex Online Learning

A Local Regret in Nonconvex Online Learning Sergul Aydore Lee Dicker Dean Foser Absrac We consider an online learning process o forecas a sequence of oucomes for nonconvex models. A ypical measure o evaluae online learning policies is regre bu such

More information

Lecture 2 April 04, 2018

Lecture 2 April 04, 2018 Sas 300C: Theory of Saisics Spring 208 Lecure 2 April 04, 208 Prof. Emmanuel Candes Scribe: Paulo Orensein; edied by Sephen Baes, XY Han Ouline Agenda: Global esing. Needle in a Haysack Problem 2. Threshold

More information

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite American Journal of Operaions Research, 08, 8, 8-9 hp://wwwscirporg/journal/ajor ISSN Online: 60-8849 ISSN Prin: 60-8830 The Opimal Sopping Time for Selling an Asse When I Is Uncerain Wheher he Price Process

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information