Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multi-Player Multi-Armed Bandit Framework

Setareh Maghsudi and Sławomir Stańczak, Senior Member, IEEE
arXiv [cs.GT], July 2014

Abstract: This paper deals with the problem of efficient resource allocation in dynamic infrastructureless wireless networks. Assuming a reactive interference-limited scenario, each transmitter is allowed to select one frequency channel from a common pool together with a power level at each transmission trial. Hence, for all transmitters, not only the fading gain but also the number of interfering transmissions and their transmit powers vary over time. Due to the absence of a central controller and the time-varying network characteristics, it is highly inefficient for transmitters to acquire global channel and network knowledge. Therefore, a reasonable assumption is that transmitters have no knowledge of fading gains, interference, and network topology. Each transmitting node selfishly aims at maximizing its average reward (or minimizing its average cost), which is a function of the action of that specific transmitter as well as those of all other transmitters. This scenario is modeled as a multi-player multi-armed adversarial bandit game. In this game, multiple players receive an a priori unknown reward with an arbitrarily time-varying distribution by sequentially pulling an arm, selected from a known and finite set of arms. Since players do not know the arm with the highest average reward in advance, they attempt to minimize their so-called regret, determined by the set of players' actions, while attempting to achieve equilibrium in some sense. To this end, we design in this paper two joint power level and channel selection strategies. We prove that the gap between the average reward achieved by our approaches and that based on the best fixed strategy converges to zero asymptotically. Moreover, the empirical joint frequencies of the game converge to the set of correlated equilibria. We also characterize this set for two special cases of our designed game. We further discuss the experimental regret-testing procedure as another potential solution, which converges to Nash equilibrium. Finally, all approaches are compared through extensive numerical analysis.

Index Terms: Adversarial bandits, channel selection, equilibrium, infrastructureless wireless network, power control.

Parts of the material in this paper were presented at the IEEE Wireless Communications and Networking Conference, Shanghai, April 2013. The work was supported by the German Research Foundation (DFG) under grant STA 864/3-3. The authors are with the Fachgebiet für Informationstheorie und theoretische Informationstechnik, Technische Universität Berlin. The second author is also with the Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany (setareh.maghsudi@tu-berlin.de, slawomir.stanczak@hhi.fraunhofer.de).

I. INTRODUCTION

A. Bandit Theory and Wireless Communication

Multi-armed bandit (MAB) is a class of sequential optimization problems, to the best of our knowledge originally introduced in [1]. In the most traditional form of MAB, a player is given a set of arms (actions) and pulls an arm at each trial of the game to receive a reward. The rewards of arms are not known to the player in advance; however, upon pulling an arm, its instantaneous reward is revealed. In such an unknown setting, after playing an arm, the player may lose some reward or incur additional cost due to not playing another arm instead of the currently played arm. This can be quantified by the difference between the reward that would have been achieved had the player selected another arm and the reward of the played arm. This quantity is called regret. The player decides which arm to pull in a sequence of trials so that its accumulated regret over the game horizon is minimized. Such problems exhibit the intrinsic trade-off between exploitation (control) and exploration (learning), i.e. between playing the arm that has exhibited the best performance in the past and playing other arms to guarantee the optimal payoff in the future. An important class of bandit games is adversarial bandits, where the series of rewards generated by an arm cannot be attributed to any specific distribution function.

In recent years, bandit theory has been used in communication theory. For instance, [2] and [3] utilize the classical bandit game to model spectrum sharing in cognitive radio networks. In [4], the authors propose a cooperative spectrum sensing scheme based on bandit theory. Further, References [5], [6], and [7] use bandit theory to model relay selection, sensor scheduling and object tracking, respectively.
Channel monitoring using bandit models is investigated in [8] and [9]. Bandit models have also been used to solve the distributed resource allocation problem, as discussed in the following.

B. Distributed Resource Allocation in Infrastructureless Wireless Networks

In recent years, game theory and reinforcement learning have been widely used to solve the distributed resource allocation problem. The vast majority of game-theoretic approaches are based on cooperation (e.g. coalition formation), mechanism design (e.g. auction theory), or exchange economy (e.g. supply-demand markets). Although these approaches can be implemented in a distributed manner, such an implementation in a real network environment requires that each player at least knows its own utility function a priori. Moreover, these approaches are in general inefficient, because players have to exchange information for coordination, which increases signaling and feedback overhead. For example, most models from cooperative game theory require coordination and/or communication among players to construct coalitions [10], [11]. In wireless resource allocation using auction games, bids must be submitted to some central controller that performs the necessary computations and makes decisions [12], [13]. Finally, in supply-demand market models, prices and demands are exchanged among buyers and sellers [14], [15].

When the utility functions are not known in advance, the resource allocation problem is often solved by using learning approaches, including bandit models. A large body of literature, such as [16], [17] and [18], analyzes single-agent stochastic learning problems. Another example is [19]. In that work, network optimization is modeled as a stochastic bandit game, where at each trial multiple arms are selected by a single player and the reward is some linear combination of the rewards of the selected arms. An application of this formulation might be downlink user selection, performed by the base station. In single-agent settings, the agent learns from its previous experiences, and no information flow is required. However, this type of learning cannot generally be used in wireless networks, where multiple players act selfishly by responding to each other and their utilities are influenced by the actions of other players.
Moreover, similar to games with complete information, it is desired that players achieve equilibrium in some sense. As for multi-agent settings, most studies assume that players are able to observe each other's actions. This assumption is realistic for some spectrum sharing problems. However, it is not always applicable to general resource allocation problems, especially in power control games, where it is difficult to identify the transmit power levels of other players. In addition, the assumption that each player announces its actions (e.g. its transmit power) is not incentive compatible. As a result, a great majority of previous works focus on spectrum sharing and/or sensing, as well as channel monitoring. Furthermore, most previous studies assume that the rewards achieved by each action can be attributed to a single density distribution. However, this assumption is highly restrictive, especially for dynamic networks.

In [20], the multi-agent bandit problem is investigated. This study assumes that in case of interference, no reward is paid to the interfering users, thereby eliminating interference; depending on the utility functions, this degrades the overall performance. In addition, communication among players is necessary. Finally, no equilibrium analysis is performed. Another example is Reference [21], where opportunistic spectrum access is formulated as a multi-agent learning game. In this work, upon availability, each channel pays the same reward to all users. This is a strong restriction, since it neglects different channel qualities. Moreover, if a channel is selected by multiple users, an orthogonal spectrum access scheme is used, which is known to be sub-optimal in general. References [22] and [23] consider graphical games for an interference minimization problem with partially overlapping channels, where interference is present only between neighboring users. These works establish the convergence of the proposed learning approaches for the special case of exact potential games; nonetheless, the analysis does not hold for more general games. The authors of [24] model cooperative rate maximization in cognitive radio networks as a bandit game, and propose two approaches, depending on the availability of information. The stability of the solution, however, is not investigated. Reference [25] proposes two approaches that achieve Nash equilibrium in a multi-player cognitive environment. System verification, however, is based only on numerical approaches. References [26], [27] and [28] propose various selection schemes to achieve logarithmic regret; however, no equilibrium analysis is performed.
All of the works named above assume that the generated rewards of any given action are independent and identically distributed.

C. Our Contribution

As discussed in Section I-B, resource allocation based on machine learning has been the subject of extensive research in recent years. In short, our focus is on a resource allocation problem in an infrastructureless network. First, we model this problem as an adversarial multi-player multi-armed bandit game. With the aim of efficient management of network resources and co-channel interference mitigation, we follow an approach suggested in [29] to design two joint power control and channel selection (PC-CS, hereafter) strategies. These are adapted versions of the exponential-based weighted average [30] and follow the leader [31] strategies. Both PC-CS strategies not only result in small (that is, sublinear in time) regret for each individual player, but also guarantee the convergence of the empirical frequencies of play to the set of correlated equilibria. We further characterize this set for two special cases of our designed game. Moreover, we implement the experimental regret-testing procedure [32], which is shown to converge to the set of Nash equilibria of the game. Our work significantly extends the state of the art in this area, since it differs from the existing studies in the following crucial aspects:
• We analyze the multi-agent bandit problem and take into account the selfishness of players.
• We do not assume that the reward generating process of any given action is time-invariant. In fact, the reward functions are allowed to vary arbitrarily, which enables us to accommodate the dynamic nature of wireless channels and distributed networks.
• We do not allow any communication among players, thereby minimizing the overhead. Moreover, players do not observe each other's actions, so the developed model can be applied to a large body of resource allocation problems. An example is a power control problem with unknown power levels used by other players.
• We study a two-dimensional problem, namely the joint channel and power level selection problem, by modeling it as a multi-player multi-armed bandit game. In our model, channel qualities are taken into account, so channels pay different rewards to different users. In addition, we impose no limitations on the interference pattern.
• Our convergence analysis is valid for a wide range of games. This is in contrast to many previous works, where the game must be a potential game for the convergence analysis to hold.
• We characterize the set of correlated equilibria for two special cases of our formulated game model.

D. Paper Structure

Section II briefly reviews some concepts and results of bandit theory.
In Section III, the resource allocation game is formulated. Section IV presents a PC-CS strategy based on the exponential-based weighted average rule [30]. In Section V, another PC-CS strategy, derived from the follow the leader rule [31], is discussed. Section VI is devoted to the experimental regret-testing procedure [32]. Numerical analysis is presented in Section VII. Section VIII concludes the paper.

II. MULTI-PLAYER MULTI-ARMED BANDIT GAMES

A. Notions of Regret

The multi-player multi-armed bandit problem (MP-MAB, hereafter) is a class of sequential decision-making problems with limited information. In this game, each player $k \in \{1,\dots,K\}$ is assigned an action set including $N_k$ actions (arms), $N_k \in \mathbb{N}$. Every player selects an action at successive trials in order to receive an initially unknown reward. This reward is determined not only by the player's own actions, but also by those of other players. The action set, the played action and the reward achieved by each player are regarded as private information. The reward generating processes of arms are independent. Let $\mathcal{I}$ and $\mathcal{I}_k$ be the joint action space and the action space of player $k$, respectively. Accordingly, $I_t = (I_t^1,\dots,I_t^k,\dots,I_t^K) \in \mathcal{I}$ denotes the joint action profile of players at time $t$, with $I_t^k \in \mathcal{I}_k$ being the action of player $k$. Moreover, let $g_t^k(I_t) \in [0,1]$ be the reward achieved by player $k$ at time $t$.¹ The instantaneous regret of any player $k$ is defined as the difference between the reward of the optimal action² and that of the played action. Based on this definition, the cumulative regret of player $k$ is formally defined as follows.

Definition 1. The cumulative regret of player $k$ up to time $n$ is defined as
$$R_n^k = \max_{i=1,\dots,N_k} \sum_{t=1}^{n} g_t^k(i, I_t^{-k}) - \sum_{t=1}^{n} g_t^k(I_t^k, I_t^{-k}), \tag{1}$$
where $I_t^{-k}$ is the joint action profile of all players except $k$ at time $t$.

Each player aims at minimizing its accumulated regret, which is an instance of the well-known exploitation-exploration dilemma. The player must find a desired balance between exploiting actions that have performed well in the past (control) and exploring actions that might lead to better performance in the future (learning). Now, suppose that players use mixed strategies.
This means that, at each trial $t$, player $k$ selects a probability distribution $P_t^k = (p_{1,t}^k,\dots,p_{i,t}^k,\dots,p_{N_k,t}^k)$ over arms, and plays arm $i$ with probability $p_{i,t}^k$. In this case, we resort to the expected regret, also called external regret [33], defined as follows.

Definition 2. The external cumulative regret of player $k$ is defined as
$$R_{\mathrm{Ex}}^k := R_{\mathrm{Ex}}^k(n) = \max_{i=1,\dots,N_k} \sum_{t=1}^{n} \Big( g_t^k(i, I_t^{-k}) - \sum_{j=1}^{N_k} p_{j,t}^k\, g_t^k(j, I_t^{-k}) \Big) = \max_{i=1,\dots,N_k} \sum_{t=1}^{n} \Big( g_t^k(i, I_t^{-k}) - \bar g_t^k(P_t^k, I_t^{-k}) \Big), \tag{2}$$
where $\bar g_t^k(P_t^k, I_t^{-k}) = \sum_{j=1}^{N_k} p_{j,t}^k\, g_t^k(j, I_t^{-k})$ denotes the expected reward at round $t$ when using the mixed strategy $P_t^k$.

By definition, external regret compares the expected reward of the current mixed strategy with that of the best fixed action in hindsight. It does not, however, compare the rewards achieved by changing actions in a pairwise manner. To compare actions in pairs, internal regret [33] is introduced; it is closely related to the concept of equilibrium in games.

Definition 3. The internal cumulative regret of player $k$ is defined as
$$R_{\mathrm{In}}^k := R_{\mathrm{In}}^k(n) = \max_{i,j=1,\dots,N_k} R_{i\to j,n}^k = \max_{i,j=1,\dots,N_k} \sum_{t=1}^{n} p_{i,t}^k \big( g_t^k(j, I_t^{-k}) - g_t^k(i, I_t^{-k}) \big). \tag{3}$$
On the right-hand side of (3), $r_{i\to j,t}^k = p_{i,t}^k \big( g_t^k(j, I_t^{-k}) - g_t^k(i, I_t^{-k}) \big)$ denotes the expected regret caused at time $t$ by pulling arm $i$ instead of arm $j$. Compare (2) and (3). Since $\sum_{j} p_{j,t}^k = 1$, the external regret with respect to any fixed action $i$ equals $\sum_{j=1}^{N_k} R_{j\to i,n}^k$. Hence external regret can be bounded above by internal regret as [34]
$$R_{\mathrm{Ex}}^k = \max_{i=1,\dots,N_k} \sum_{j=1}^{N_k} R_{j\to i,n}^k \le N_k \max_{i,j=1,\dots,N_k} R_{i\to j,n}^k = N_k R_{\mathrm{In}}^k. \tag{4}$$

Remark 1. Throughout the paper, vanishing (zero-average) external and internal regret means that $\lim_{n\to\infty} \frac{1}{n} R_{\mathrm{Ex}}^k = 0$ and $\lim_{n\to\infty} \frac{1}{n} R_{\mathrm{In}}^k = 0$, respectively. In other words, we have $R_{\mathrm{Ex}}^k \in o(n)$ and $R_{\mathrm{In}}^k \in o(n)$. Note that by (4), $R_{\mathrm{In}}^k \in o(n)$ implies $R_{\mathrm{Ex}}^k \in o(n)$. We call any strategy with $R_{\mathrm{In}}^k \in o(n)$ a no-regret strategy.

¹ Note that all results can also be expressed in terms of a loss $d$, provided that the loss is related to the gain by $d = 1 - g$, $g \in [0,1]$.
² Optimality is defined in the sense of the highest instantaneous reward.
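The regret notions above can be illustrated with a short sketch. It evaluates (2) and (3) for one player and checks the bound (4). The sketch assumes the full counterfactual gain table $g_t^k(i, I_t^{-k})$ is available, which holds only in an offline simulation and never for a bandit player. The function names are illustrative and not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def external_regret(gains: Matrix, probs: Matrix) -> float:
    """Cumulative external regret of one player, eq. (2).

    gains[t][i]: reward g_t^k(i, I_t^{-k}) that action i would have earned in
                 round t against the opponents' actual joint action.
    probs[t][j]: probability p_{j,t}^k of the player's mixed strategy in round t.
    """
    n_actions = len(gains[0])
    totals = [0.0] * n_actions
    for g_t, p_t in zip(gains, probs):
        expected = sum(p * g for p, g in zip(p_t, g_t))  # bar g_t^k(P_t^k, I_t^{-k})
        for i in range(n_actions):
            totals[i] += g_t[i] - expected
    return max(totals)


def pairwise_regret(gains: Matrix, probs: Matrix) -> Matrix:
    """R[i][j] = R_{i->j,n} = sum_t p_{i,t} * (g_t(j) - g_t(i)), as in eq. (3)."""
    n_actions = len(gains[0])
    R = [[0.0] * n_actions for _ in range(n_actions)]
    for g_t, p_t in zip(gains, probs):
        for i in range(n_actions):
            for j in range(n_actions):
                R[i][j] += p_t[i] * (g_t[j] - g_t[i])
    return R


def internal_regret(gains: Matrix, probs: Matrix) -> float:
    """Cumulative internal regret, eq. (3): the largest pairwise regret (never negative)."""
    return max(max(row) for row in pairwise_regret(gains, probs))


# Two rounds, two actions: the first action always pays 1, the second pays 0,
# and the player mixes uniformly.
g = [[1.0, 0.0], [1.0, 0.0]]
p = [[0.5, 0.5], [0.5, 0.5]]
print(external_regret(g, p), internal_regret(g, p))  # 1.0 1.0
```

The diagonal of the pairwise matrix is zero, so the internal regret is never negative. Summing column $i$ of the pairwise matrix gives the external regret with respect to action $i$, which is exactly the step behind (4).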

B. Equilibrium

From the viewpoint of each player $k$, an MP-MAB is a game with two agents. One is player $k$ itself. The other is the set of all other $K-1$ players (referred to as the opponent), whose joint action profile affects the reward achieved by player $k$. We consider here the most general framework, where the opponent is non-oblivious, i.e. its series of actions depends on the actions of player $k$. It is known that a game against a non-oblivious opponent can be modeled only by adversarial bandit games [35]. As in other game-theoretic formulations, the solution is considered to be an equilibrium, most importantly a Nash or correlated equilibrium.³ In the context of game-theoretic bandits, an important result is the following theorem.

Theorem 1 [33]. Consider a $K$-player bandit game, where each player $k$ is provided with an action set of cardinality $N_k$. Denote the internal regret of player $k$ by $R_{\mathrm{In}}^k$, and the set of correlated equilibria by $C$. At time $n$, define the empirical joint distribution of the game as
$$\hat\pi_n(\mathbf{i}) = \frac{1}{n} \sum_{t=1}^{n} \mathbb{I}_{\{I_t = \mathbf{i}\}}, \qquad \mathbf{i} = (i_1,\dots,i_K) \in \prod_{k=1}^{K} \{1,\dots,N_k\}. \tag{5}$$
Then, suppose all players $k \in \{1,\dots,K\}$ play according to any strategy such that
$$\lim_{n\to\infty} \frac{1}{n} R_{\mathrm{In}}^k = 0. \tag{6}$$
In that case, the distance $\inf_{\pi \in C} \sum_{\mathbf{i}} |\hat\pi_n(\mathbf{i}) - \pi(\mathbf{i})|$ between the empirical joint distribution of plays and the set of correlated equilibria converges to $0$ almost surely.

Theorem 1 simply states the following: in an MP-MAB game, if all players play according to a strategy with vanishing internal regret (a no-regret strategy), then the empirical joint distribution of plays converges to the set of correlated equilibria. Note that the strategies used by the players are not required to be identical. Since a rational player is always interested in minimizing its regret, the assumption that every player plays according to a no-regret strategy is reasonable.

C. From Vanishing External Regret to Vanishing Internal Regret

In [34], an approach is proposed for converting any selection strategy with vanishing external regret into another version with vanishing internal regret. We describe this approach briefly.

³ These definitions are quite standard (see e.g. [36]), and thus we do not restate them here.
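A minimal sketch of (5), together with the correlated-equilibrium test that Theorem 1 refers to, follows. The paper does not restate the definition (footnote 3), so the code assumes the standard one. Under that definition, a distribution over joint profiles is an $\epsilon$-correlated equilibrium if no player can gain more than $\epsilon$ in expectation by the swap "whenever recommended action $i$, play $j$ instead". Computing the exact distance to the set $C$ in Theorem 1 would require solving a linear program. The sketch therefore only reports the largest constraint violation, which is a simpler diagnostic. The two-user payoff function at the end is a hypothetical example.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Profile = Tuple[int, ...]


def empirical_joint(plays: Sequence[Profile]) -> Dict[Profile, float]:
    """Empirical joint distribution pi_hat_n of eq. (5): the fraction of the
    n rounds in which each joint action profile was played."""
    n = len(plays)
    return {profile: count / n for profile, count in Counter(plays).items()}


def ce_violation(pi: Dict[Profile, float],
                 payoff: Callable[[int, Profile], float],
                 n_actions: Sequence[int]) -> float:
    """Largest expected gain of any swap deviation i -> j by any player k.

    Returns 0.0 if pi satisfies all correlated-equilibrium constraints;
    pi is an eps-correlated equilibrium if the returned value is <= eps.
    """
    worst = 0.0
    for k, n_k in enumerate(n_actions):
        for i, j in product(range(n_k), repeat=2):
            if i == j:
                continue
            gain = 0.0
            for profile, prob in pi.items():
                if profile[k] == i:
                    deviated = profile[:k] + (j,) + profile[k + 1:]
                    gain += prob * (payoff(k, deviated) - payoff(k, profile))
            worst = max(worst, gain)
    return worst


# Hypothetical 2-user, 2-channel game: a user earns 1 only if the users are on different channels.
def payoff(k: int, profile: Profile) -> float:
    return 1.0 if profile[0] != profile[1] else 0.0


alternating = empirical_joint([(0, 1), (1, 0)] * 5)  # users always on different channels
colliding = empirical_joint([(0, 0)] * 4)            # users always collide
print(ce_violation(alternating, payoff, [2, 2]))  # 0.0
print(ce_violation(colliding, payoff, [2, 2]))    # 1.0
```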

Consider a selection strategy (O-strategy, hereafter) that at each time $t$ assigns a probability distribution $P_t$ to the set of $N$ actions, and selects an action according to this distribution. Assume that the player starts using the O-strategy with the uniform distribution over the $N$ actions. At each time $t > 1$, the O-strategy has already selected $P_t = (p_{1,t},\dots,p_{i,t},\dots,p_{j,t},\dots,p_{N,t})$. Now, the O-strategy constructs a meta-strategy (M-strategy, hereafter) with $N(N-1)$ virtual strategies based on $P_t$. Each virtual strategy corresponds to a pair of actions $i \to j$, $i,j \in \{1,\dots,N\}$, $i \ne j$. It constructs a distribution over the $N$ actions by assigning the probability mass of action $i$ to action $j$. That is, it defines $P_t^{i\to j} = (p_{1,t},\dots,0,\dots,p_{j,t}+p_{i,t},\dots,p_{N,t})$, which has $0$ and $p_{j,t}+p_{i,t}$ in place of $p_{i,t}$ and $p_{j,t}$, respectively, while all other elements remain unchanged. Assume that the M-strategy treats these virtual strategies as actions. That is, at each time $t$, it defines a probability vector $\delta_t$ over the $N(N-1)$ virtual actions, where the probability of action $i \to j$, i.e. $\delta_{i\to j,t}$, depends on its past performance.⁴ Now, at time $t$, the O-strategy assigns a distribution $P_t$ to the $N$ actions, where $P_t = \sum_{i,j:\, i \ne j} P_t^{i\to j}\, \delta_{i\to j,t}$. Since $P_t^{i\to j}$ is itself built from $P_t$, this is a fixed-point equation in $P_t$. The constructed O-strategy has the property that its internal regret is upper-bounded by the external regret of the M-strategy over the $N(N-1)$ virtual actions played according to the probability $\delta_t$. Thus, if the M-strategy exhibits vanishing external regret, the O-strategy results in vanishing internal regret. In Sections IV and V, we use this property to design no-regret selection strategies.

III. BANDIT-THEORETICAL MODEL OF INFRASTRUCTURELESS WIRELESS NETWORKS

We consider a network consisting of $K$ transmitter-receiver pairs, indexed by $k \in \{1,\dots,K\}$. The transmitter-receiver pair $k$ is referred to as user or player $k$. Each user $k$ can access $C_k$ mutually orthogonal channels at $L_k$ quantized power levels. This implies that its strategy set includes $N_k = C_k L_k$ actions. At time $t$, each action $I_t^k = (c_t^k, l_t^k)$ consists of one channel index, which corresponds to some channel quality, and one power level.
Therefore, the joint action profile of users, $I_t$, is to be understood here as the pair $(\mathbf{c}_t, \mathbf{l}_t)$, where $\mathbf{c}_t = (c_t^1,\dots,c_t^K)$ and $\mathbf{l}_t = (l_t^1,\dots,l_t^K)$. As each channel might be accessible by multiple users, co-channel interference (or collision; we use the terms interchangeably) is likely to arise. Since users are allowed to select a new channel and to adapt their power levels at each transmission trial, the interference pattern in general changes over time. In addition, the distribution of fading coefficients might be time-varying. Acquiring channel and/or network information at the level of autonomous transmitters would therefore be extremely challenging and inefficient. Therefore, we assume the following:

(A1) Transmitters have no channel knowledge or any other side information, such as the number of users or their selected actions.
(A2) In addition, users do not coordinate their actions, which can be chosen completely asynchronously by each user.

Note that, as users do not observe each other's actions, it might be in their interest to select their actions at the beginning of trials, thereby using the remaining time for data transmission. In this paper, we model the joint channel and power level selection problem as a $K$-player adversarial bandit game, where player $k$ decides for one of the $N_k$ actions. We define the expected utility function (reward) of player $k$ to be⁵
$$G_t^k(I_t) = \log\left( \frac{l_t^k\, |h_{kk,t,c_t^k}|^2}{\sum_{q=1}^{Q_k} l_t^q\, |h_{qk,t,c_t^k}|^2 + N_0} \right) - \alpha\, l_t^k \tag{7}$$
for some given joint action profile $I_t = (\mathbf{c}_t, \mathbf{l}_t)$. In (7), $Q_k < K$ is the number of players that interfere with user $k$ in channel $c_t^k$, and the sum runs over these interferers. Throughout the paper, $|h_{uv,t,c}|^2 \in \mathbb{R}_+$ denotes the average gain of channel $c$ on the link $u \to v$ at time $t$. $N_0$ is the variance of the zero-mean additive white Gaussian noise, and $\alpha$ is the constant power price factor. The last term in (7) penalizes the use of excessive power. Following Section II, let $g_t^k(I_t) \in [0,1]$ denote the reward achieved by player $k$ at time $t$, as a function of the joint action profile $I_t$. We consider a game with noisy rewards, where $g_t^k(I_t) = G_t^k(I_t) + \epsilon_t$. Here $\epsilon_t$ is some zero-mean random variable with bounded variance, independent and identically distributed over time. As is well known, in a non-cooperative game the primary goal of each selfish player is to maximize its own accumulated reward. Formally, this can be written as
$$\underset{(c^k,\, l^k)}{\text{maximize}} \ \sum_{t=1}^{n} g_t^k(\mathbf{c}_t, \mathbf{l}_t), \tag{8}$$
where $c^k \in \{1,\dots,C_k\}$ and $l^k \in \{1,\dots,L_k\}$. By Assumptions (A1) and (A2), however, the objective function in (8) is not available to the players. For this reason, we pursue a less ambitious goal, known as regret minimization.

⁴ Note that the gains of virtual actions cannot be calculated explicitly. Later we will see that the gain achieved by any virtual action $i \to j$ is calculated based on the gains achieved by playing the true actions $i$ and $j$.
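As an illustration of (7), the following sketch evaluates the expected utility of one user for a single trial. The time index is dropped, so the average gains are simply the values passed in. As in (7), it assumes that the interferers of user $k$ are exactly the other users on channel $c^k$, and it uses base-2 logarithms (footnote 5). The array layout `gain[c][q][k]` and all numbers in the example are illustrative assumptions. The sketch does not map the result into $[0,1]$, which the reward model $g_t^k \in [0,1]$ would additionally require.

```python
import math
from typing import Sequence

Gains = Sequence[Sequence[Sequence[float]]]


def utility(k: int, channels: Sequence[int], powers: Sequence[float],
            gain: Gains, noise: float, alpha: float) -> float:
    """Expected reward G^k of eq. (7) for the joint action (c, l).

    channels[q], powers[q]: channel index c^q and transmit power l^q of user q.
    gain[c][q][k]: average gain |h_{qk,c}|^2 from transmitter q to receiver k on channel c.
    noise: noise variance N_0; alpha: power price factor.
    """
    c = channels[k]
    # Co-channel interference: every other user that picked the same channel.
    interference = sum(powers[q] * gain[c][q][k]
                       for q in range(len(channels)) if q != k and channels[q] == c)
    sinr = powers[k] * gain[c][k][k] / (interference + noise)
    return math.log2(sinr) - alpha * powers[k]  # log-SINR term minus power price


# Two users, two channels; gain[c][q][k] with illustrative values.
gain = [
    [[1.0, 0.5], [0.75, 1.0]],   # channel 0
    [[0.5, 0.25], [0.25, 2.0]],  # channel 1
]
print(utility(0, [0, 1], [1.0, 1.0], gain, noise=0.25, alpha=0.5))  # 1.5  (no interference)
print(utility(0, [0, 0], [1.0, 1.0], gain, noise=0.25, alpha=0.5))  # -0.5 (collision)
```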
More precisely, each player $k$ attempts to achieve vanishing external regret in the sense that
$$\lim_{n\to\infty} \frac{1}{n} R_{\mathrm{Ex}}^k = \lim_{n\to\infty} \frac{1}{n} \max_{i=1,\dots,N_k} \sum_{t=1}^{n} \big( g_t^k(i, I_t^{-k}) - \bar g_t^k(P_t^k, I_t^{-k}) \big) = 0. \tag{9}$$

⁵ Throughout the paper, logarithms are to base 2 unless otherwise stated.

In addition to the individual strategy of each user aiming at satisfying (9), all players should achieve some steady state, i.e. an equilibrium. Therefore, in the remainder of this paper, we develop algorithmic solutions to the resource allocation problem with a twofold objective in mind:
(i) the external regret of each user should vanish asymptotically according to (9);
(ii) the actions of all players should converge to equilibrium.
By (4), the external regret of each user is upper-bounded by $N_k$ times its internal regret. As a result, suppose all users select their actions according to some no-regret strategy. Then not only is (9) achieved by all of them (see also Remark 1), but the corresponding game also converges to equilibrium in some sense, which immediately follows from Theorem 1. In Sections IV and V, we present two internal-regret-minimizing strategies that are shown to solve the game and thereby achieve the two objectives mentioned above. Both algorithms can be applied in a fully decentralized manner by each player, since at each time they only require the set of past rewards of the respective player.

Finally, it is worth noting that the set of correlated equilibria for the general time-varying repeated game defined by (7) cannot be characterized. Nevertheless, in what follows, we characterize this set for two games defined by relaxed versions of (7). First, consider a game similar to the one defined above, with one difference: unlike in (7), the reward process is assumed to be stationary, i.e.
$$G^k(I) = \log\left( \frac{l^k\, |h_{kk,c^k}|^2}{\sum_{q=1}^{Q_k} l^q\, |h_{qk,c^k}|^2 + N_0} \right) - \alpha\, l^k, \tag{10}$$
which implies that the average channel gains are time-invariant. By the following proposition, this game has a unique correlated equilibrium.

Proposition 1. Consider a $K$-player game where the expected reward function of each player $k$ is defined by (10). This game has a unique correlated equilibrium, which places probability one on its unique pure-strategy Nash equilibrium.

Proof: See Section IX-B.

Now let the expected reward function be defined as follows:
$$G^k(I) = \log\left( \frac{l^k\, |h_{kk,c^k}|^2}{N_0} \right) - \alpha\, l^k, \tag{11}$$
which is more restricted, but simpler than (10). With this choice of expected reward function, the game can be shown to have a unique correlated equilibrium that maximizes the aggregate utility of all players, i.e. the social welfare. This result is stated formally in the following proposition.

Proposition 2. Consider a $K$-player game where each player $k$ has the expected reward function $G^k$ given by (11). This game has a unique correlated equilibrium, which places probability one on a unique pure-strategy Nash equilibrium that maximizes $\sum_{k=1}^{K} G^k$.

Proof: See Section IX-C.

IV. NO-REGRET BANDIT EXPONENTIAL-BASED WEIGHTED AVERAGE STRATEGY

The basic idea of an exponential-based weighted strategy is as follows. At every trial, each action is assigned some selection probability that is inversely proportional to the exponentially weighted accumulated regret caused by that action in the past (or directly proportional to the exponentially weighted accumulated reward) [37]. Roughly speaking, if playing an action has resulted in large regret in the past, its future selection probability is small, and vice versa. As described in Section II-A, in the bandit formulation players only observe the reward of the played action, and not those of others. Therefore, the reward of each action $i$ is estimated as [33]
$$\tilde g_t^k(i) = \begin{cases} \dfrac{g_t^k(I_t)}{p_{i,t}^k} & \text{if } I_t^k = i, \\[4pt] 0 & \text{otherwise}, \end{cases} \tag{12}$$
which is an unbiased estimate of the true reward of action $i$; that is, $\mathbb{E}[\tilde g_t^k(i)] = g_t^k(i, I_t^{-k})$. The estimated rewards are then used to calculate regrets. For example, the estimated regret of playing action $i$ instead of action $j$ up to time $t$ is
$$\tilde R_{i\to j,t}^k = \sum_{s=1}^{t-1} \tilde r_{i\to j,s}^k = \sum_{s=1}^{t-1} p_{i,s}^k \big( \tilde g_s^k(j) - \tilde g_s^k(i) \big). \tag{13}$$
Weighted average strategies exhibit vanishing external regret, but in general they yield large internal regret. As a result, even if all players play according to such strategies, the game does not converge to equilibrium. In the following, we take the bandit version of the exponentially weighted average strategy [38] and convert it, using the approach of Section II-C, into an improved version that yields small internal regret. The strategy is called the no-regret bandit exponentially weighted average strategy (NR-BEWAS), and is described in Algorithm 1.

Algorithm 1 No-Regret Bandit Exponential-Based Weighted Average Strategy (NR-BEWAS)
1: If the game horizon $n$ is known, define $\gamma$ and $\eta$ as given in Proposition 3; otherwise, as given in Proposition 4.
2: Define $\Phi(\mathbf{U}) = \frac{1}{\eta} \ln\big( \sum_{i=1}^{N_k} \exp(\eta u_i) \big)$, where $\mathbf{U} = (u_1,\dots,u_{N_k}) \in \mathbb{R}^{N_k}$.
3: Let $P_1^k = (1/N_k,\dots,1/N_k)$ (uniform distribution).
4: Select an action using $P_1^k$.
5: Play and observe the reward.
6: for $t = 2,\dots,n$ do
7: Let $P_t^k$ be the mixed strategy at time $t$, i.e. $P_t^k = (p_{1,t}^k,\dots,p_{i,t}^k,\dots,p_{j,t}^k,\dots,p_{N_k,t}^k)$.
8: Construct $P_t^{k,i\to j}$ as follows: replace $p_{i,t}^k$ in $P_t^k$ by zero, and instead increase $p_{j,t}^k$ to $p_{j,t}^k + p_{i,t}^k$; other elements remain unchanged. We obtain
$$P_t^{k,i\to j} = (p_{1,t}^k,\dots,0,\dots,p_{j,t}^k + p_{i,t}^k,\dots,p_{N_k,t}^k). \tag{14}$$
9: Define
$$\delta_{i\to j,t}^k = \frac{\exp\big( \eta\, \tilde R_{i\to j,t}^k \big)}{\sum_{m\to l:\, m \ne l} \exp\big( \eta\, \tilde R_{m\to l,t}^k \big)},$$
where $\tilde R_{i\to j,t}^k$ is calculated by using (12) and (13).
10: Given $\delta_{i\to j,t}^k$, solve the following fixed-point equation to find $\tilde P_t^k$:
$$\tilde P_t^k = \sum_{i\to j:\, i \ne j} \tilde P_t^{k,i\to j}\, \delta_{i\to j,t}^k. \tag{15}$$
11: The final probability distribution is
$$P_t^k = (1-\gamma)\, \tilde P_t^k + \frac{\gamma}{N_k}. \tag{16}$$
12: Using the final $P_t^k$, given by (16), select an action.
13: Play and observe the reward.
14: end for

As Algorithm 1 shows, NR-BEWAS has two parameters, namely $\gamma$ and $\eta$. If the game horizon $n$ is known in advance, these two parameters are constant over time ($\eta_t = \eta$ and $\gamma_t = \gamma$), and the growth rate of regret can be bounded precisely, mainly based on the results of [33]. Otherwise, they vary with time. In this case, vanishing (sublinear in time) internal regret can still be guaranteed; nevertheless, the bound might be loose. This discussion is formalized by the following propositions.
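A compact sketch of one player running Algorithm 1 with fixed $\eta$ and $\gamma$ (the known-horizon case) follows. The class name `NRBEWAS` and the helper names are hypothetical. Two implementation choices are assumptions rather than details given in the text. First, the fixed-point equation (15) is solved as the stationary distribution of the Markov chain that moves mass from $i$ to $j$ at rate $\delta_{i\to j}$, one standard way to solve it. Second, the exponential weights of step 9 are shifted by their maximum for numerical stability, which leaves $\delta$ unchanged. The per-action rewards in the usage example are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def swap_mixture(delta: Matrix, p: List[float]) -> List[float]:
    """Right-hand side of (15): sum over i != j of delta[i][j] * P^{i->j}."""
    n = len(p)
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                moved = list(p)
                moved[j] += moved[i]  # P^{i->j}: mass of action i moved onto action j
                moved[i] = 0.0
                out = [o + delta[i][j] * m for o, m in zip(out, moved)]
    return out


def fixed_point(delta: Matrix) -> List[float]:
    """Solve P = swap_mixture(delta, P) with sum(P) = 1 by Gauss-Jordan elimination.

    Balance for action m: sum_{i != m} p_i delta[i][m] = p_m * sum_{l != m} delta[m][l].
    """
    n = len(delta)
    out_rate = [sum(delta[m][l] for l in range(n) if l != m) for m in range(n)]
    a = [[delta[i][m] if i != m else -out_rate[m] for i in range(n)] + [0.0] for m in range(n)]
    a[-1] = [1.0] * n + [1.0]  # replace one redundant balance equation by sum(P) = 1
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[m][n] / a[m][m] for m in range(n)]


class NRBEWAS:
    """One player's NR-BEWAS (Algorithm 1) with fixed eta and gamma."""

    def __init__(self, n_actions: int, eta: float, gamma: float, seed: int = 0):
        self.n, self.eta, self.gamma = n_actions, eta, gamma
        self.regret = [[0.0] * n_actions for _ in range(n_actions)]  # tilde R_{i->j}, eq. (13)
        self.p = [1.0 / n_actions] * n_actions                        # P_1: uniform
        self.rng = random.Random(seed)

    def select(self) -> int:
        return self.rng.choices(range(self.n), weights=self.p)[0]

    def update(self, action: int, reward: float) -> None:
        n = self.n
        est = [0.0] * n
        est[action] = reward / self.p[action]  # eq. (12): importance-weighted estimate
        for i in range(n):                     # eq. (13): accumulate p_i * (g~_j - g~_i)
            for j in range(n):
                if i != j:
                    self.regret[i][j] += self.p[i] * (est[j] - est[i])
        top = max(self.regret[i][j] for i in range(n) for j in range(n) if i != j)
        w = [[math.exp(self.eta * (self.regret[i][j] - top)) if i != j else 0.0
              for j in range(n)] for i in range(n)]
        total = sum(map(sum, w))
        delta = [[x / total for x in row] for row in w]                    # step 9
        p_tilde = fixed_point(delta)                                       # step 10, eq. (15)
        self.p = [(1 - self.gamma) * q + self.gamma / n for q in p_tilde]  # step 11, eq. (16)


player = NRBEWAS(n_actions=3, eta=0.02, gamma=0.1, seed=1)
mean_reward = [0.2, 0.5, 0.9]  # hypothetical per-action rewards in [0, 1]
for t in range(2000):
    a = player.select()
    player.update(a, mean_reward[a])
print([round(q, 3) for q in player.p])
```

A general linear solve is used instead of power iteration. Power iteration can converge very slowly once most of the weight in $\delta$ sits on a few pairs.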

14 4 Proposiion 3. Le η = η = ln N k 2N k n 2 3 and γ = γ = N 2 k ln N k 4n yields vanishing inernal regre and we have R k In OnN 2 k ln N k 2 3. Proof: See Appendix IX-D. 3. Then Algorihm NR-BEWAS Proposiion 4. Le η = γ3 and γ Nk 2 = 3. Then Algorihm NR-BEWAS yields vanishing inernal regre; ha is we have R k In on. Proof: See Appendix IX-E. The following corollaries follow from he above proposiions and Theorem. Corollary. If all players play according o NR-BEWAS, hen he empirical join frequencies of play converge o he se of correlaed equilibria. 4. Proof: The proof is a direc consequence of Theorem and Proposiion 3 or Proposiion Corollary 2. Le ɛ-correlaed equilibrium approximae correlaed equilibrium in he sense ha ɛ> C ɛ = C. Assuming ha he game horizon is known and all players play according o NR- BEWAS, hen he minimum required number of rials o achieve ɛ-correlaed equilibrium yields max k=,...,k ɛ 3 2 O N k KN 2 k ln N k + K 2 ln K, which is proporional o ɛ 3 2 polynomially in he number of acions as well as in he number of players. and increases Proof: The proof follows from he bound of Proposiion 3 and Remark 7.6 of [33]. 6 V. NO-REGRET BANDIT FOLLOW THE PERTURBED LEADER STRATEGY Similar o he weighed-average sraegy presened in he previous secion, he sraegy follow he perurbed leader is an approach o solve online decision-making problems. In he basic version of his approach, called follow he leader [39], he acion wih he minimum regre in he pas is seleced a each rial. However, his mehod is deerminisic and herefore does no achieve vanishing regre agains non-oblivious opponens. Therefore, in follow he perurbed leader, player adds a random perurbaion o he vecor of accumulaed regres, and he acion wih he minimum perurbed regre in he pas is seleced [33]. In [4], a bandi version of his 6 Deails are omied o avoid unnecessary resaemen of exising analysis.

algorithm is constructed, where unobserved rewards are estimated. The authors show that the developed algorithm exhibits vanishing external regret. Similar to NR-BEWAS, we here modify the algorithm of [4] to ensure vanishing internal regret. The approach is called the no-regret bandit follow-the-perturbed-leader strategy (NR-BFPLS).

Algorithm 2 No-Regret Bandit Follow-the-Perturbed-Leader Strategy (NR-BFPLS)
1: Define $\epsilon_t = \epsilon_n = \left(\frac{\ln n}{N_k n}\right)^{1/3}$ and $\gamma_t = \gamma_n = \min\{1, N_k \epsilon_n\}$. Note that, unlike NR-BEWAS, here the game horizon $n$ must be known in advance.
2: Let $P^k_1 = \left(\frac{1}{N_k}, \ldots, \frac{1}{N_k}\right)$ (uniform distribution).
3: Select an action using $P^k_1$.
4: Play and observe the reward.
5: for $t = 2, \ldots, n$ do
6: Let $P^k_t$ be the mixed strategy at time $t$, i.e., $P^k_t = (p^k_{1,t}, \ldots, p^k_{i,t}, \ldots, p^k_{j,t}, \ldots, p^k_{N_k,t})$.
7: Construct $P^{k,i\to j}_t$ as follows: replace $p^k_{i,t}$ in $P^k_t$ by zero, and instead increase $p^k_{j,t}$ to $p^k_{j,t} + p^k_{i,t}$; other elements remain unchanged. We obtain $P^{k,i\to j}_t = (p^k_{1,t}, \ldots, 0, \ldots, p^k_{j,t} + p^k_{i,t}, \ldots, p^k_{N_k,t})$.
8: Calculate $\tilde{R}^k_{i\to j,t}$ using (12) and (13).
9: Define $\sigma_{i\to j,t}$ as a sum over $\tau = 1, \ldots, t$ of terms that depend on $\delta^k_{i\to j,\tau}$; it is the upper bound of the conditional variances of the random variables $\tilde{R}^k_{i\to j,t}$ [4].
10: Let $\hat{R}^k_{i\to j,t}$ be the upper-confidence estimate obtained by adding to $\tilde{R}^k_{i\to j,t}$ a term that depends on $2/N_k$, $\sigma_{i\to j,t}$ and a logarithmic factor [4].
11: Randomly select a perturbation vector $\mu_t$ with $N_k(N_k-1)$ elements from a two-sided exponential distribution with width $\epsilon_t$.
12: Consider a selection rule which selects the action $i\to j$ given by
$$\arg\max_{i\to j \in \{1, \ldots, N_k(N_k-1)\}} \left\{ \hat{R}^k_{i\to j,t} + \mu_{i\to j,t} \right\}. \tag{17}$$
Note that in our setting $\hat{R}_{i\to j}$ denotes the estimated regret of not playing action $i\to j$; hence we find the action with the largest $\hat{R}$.
13: From (17), calculate the probability $\delta^k_{i\to j,t}$ assigned to each pair $i\to j$.
14: Given $\delta^k_{i\to j,t}$, solve the following fixed-point equation to find $\tilde{P}^k_t$:
$$\tilde{P}^k_t = \sum_{i\to j:\, i\neq j} \tilde{P}^{k,i\to j}_t \, \delta^k_{i\to j,t}. \tag{18}$$
15: The final probability distribution is
$$P^k_t = (1-\gamma_t)\,\tilde{P}^k_t + \frac{\gamma_t}{N_k}. \tag{19}$$
16: Using the final $P^k_t$, given by (19), select an action.
17: Play and observe the reward.
18: end for

Algorithm 2 requires knowledge of the probability assigned to each action by the follow-the-perturbed-leader strategy at every trial. However, in contrast to NR-BEWAS, these probabilities are not assigned explicitly; therefore, we explain how to calculate them. From (17), the selection probability of virtual action $i\to j \in \{1, \ldots, N_k(N_k-1)\}$ is the probability that $\hat{R}_{i\to j,t}$ plus the perturbation $\mu_{i\to j,t}$ is larger than those of all other actions, i.e.,
$$\begin{aligned}
\Pr[I_t = i\to j] &= \Pr\left[\hat{R}_{i\to j,t} + \mu_{i\to j,t} \geq \hat{R}_{i'\to j',t} + \mu_{i'\to j',t} \;\; \forall\, i'\to j' \neq i\to j\right] \\
&= \int_{-\infty}^{\infty} \Pr\left[\hat{R}_{i\to j,t} + \mu_{i\to j,t} = m,\; \hat{R}_{i'\to j',t} + \mu_{i'\to j',t} \leq m \;\; \forall\, i'\to j' \neq i\to j\right] dm \\
&= \int_{-\infty}^{\infty} \Pr\left[\hat{R}_{i\to j,t} + \mu_{i\to j,t} = m\right] \prod_{i'\to j' \neq i\to j} \Pr\left[\hat{R}_{i'\to j',t} + \mu_{i'\to j',t} \leq m\right] dm.
\end{aligned} \tag{20}$$
Since $\mu_t$ is distributed according to a two-sided exponential distribution with width $\epsilon_n$, the terms under the integral can be calculated easily (see, e.g., [4]).

Now we are in a position to show some properties of NR-BFPLS (Algorithm 2).

Proposition 5. Let $\epsilon_t = \epsilon = \left(\frac{\ln n}{N_k n}\right)^{1/3}$ and $\gamma_t = \gamma = \min\{1, N_k \epsilon\}$. Then Algorithm 2 (NR-BFPLS) yields vanishing internal regret with $R^k_{\mathrm{In}} \leq O\left((n N_k^2 \ln N_k)^{1/2}\right)$.

Proof: By [4], we know that if the BFPL algorithm is applied to $N_k$ actions, then $R^k_{\mathrm{Ex}} \leq O\left((n N_k \ln N_k)^{1/2}\right)$. Using this, the proof proceeds along similar lines as the proof of Proposition 3 and is therefore omitted here.

Corollary 3. Assuming that the game horizon is known and all players play according to NR-BFPLS, the minimum number of trials required to achieve an $\epsilon$-correlated equilibrium is
$$\max_{k=1,\ldots,K} \frac{1}{\epsilon^2}\, O\left(N_k K N_k^2 \ln N_k + K^2 \ln K\right),$$
which is proportional to $1/\epsilon^2$ and increases polynomially in the number of actions as well as in the number of players.

Proof: The proof follows from the bound of Proposition 5 and Remark 7.6 of [33].

VI. BANDIT EXPERIMENTAL REGRET-TESTING STRATEGY
Experimental regret testing belongs to the large family of exhaustive search algorithms and is comprehensively discussed in [32] and [33] for bandit games. In this section, we briefly review this approach, and we investigate its performance later in Section VII-A. First, time is divided into periods $m = 1, 2, \ldots$ of length $T$, so that for each $m$ we have $t \in [(m-1)T + 1, mT]$. At the beginning of period $m$, any player $k$ randomly selects a mixed strategy, denoted by $P^k_m$. Moreover, a random variable $U^k_{m,t} \in \{0, 1, \ldots, N_k\}$ is defined as follows [38]. For $t \in [(m-1)T + 1, mT]$ and for each $n_k$, there are exactly $s$ values of $t$ such that $U^k_{m,t} = n_k$, and $U^k_{m,t}$ is selected to be $0$ for the remaining $T - sN_k$ trials. At time $t$, the action $I^k_t$ is determined as
$$I^k_t \begin{cases} \sim P^k_m & \text{if } U^k_{m,t} = 0, \\ = n_k & \text{if } U^k_{m,t} = n_k. \end{cases} \tag{21}$$
At the end of period $m$, player $k$ calculates the experimental regret of playing each action $n_k$ as [38]
$$\hat{r}^k_{m,n_k} = \frac{1}{T - sN_k} \sum_{t=(m-1)T+1}^{mT} g^k_t(I_t)\, \mathbb{I}\{U^k_{m,t} = 0\} - \frac{1}{s} \sum_{t=(m-1)T+1}^{mT} g^k_t(n_k, I^{-k}_t)\, \mathbb{I}\{U^k_{m,t} = n_k\}. \tag{22}$$
If the largest experimental regret does not exceed an acceptable threshold $\rho$, the player continues to play its current mixed strategy. Otherwise, another mixed strategy is selected. The procedure is summarized in Algorithm 3. It is known that if the parameters of BERTS (e.g., $T$ and $\rho$) are chosen appropriately, then, in the long run, the played mixed-strategy profile is an approximate Nash equilibrium almost all of the time. Details can be found in [33] and are therefore omitted.

Algorithm 3 Bandit Experimental Regret-Testing Strategy (BERTS) [33]
1: Set $T$ (period length), $\rho$ (acceptable regret threshold), $\xi$ (exploration parameter), and $m = 1$ (period index). Notice that for each period $m = 1, \ldots, M$, we have $t \in [(m-1)T + 1, mT]$.
2: Select a mixed strategy $P^k_m$ according to the uniform distribution on the probability simplex with $N_k$ dimensions.
3: For each $n_k \in \{1, \ldots, N_k\}$, select $s$ exploring trials at random. Exploring trials dedicated to different actions must not overlap.
4: for $t = (m-1)T + y$, where $1 \leq y \leq T$ do
5: if $t$ is an exploring trial dedicated to action $i$ then
6: play action $i$ and observe the reward.
7: else
8: select an action using $P^k_m$. Play and observe the reward.
9: end if
10: end for
11: Calculate the experimental regret of period $m$, $\hat{r}^k_{m,n_k}$, using (22).
12: if $\max_{n_k=1,\ldots,N_k} \hat{r}^k_{m,n_k} > \rho$ then
13: set $m = m + 1$ and go to line 2.
14: else
15: with probability $\xi$: set $m = m + 1$ and go to line 2; with probability $1 - \xi$: let $P^k_{m+1} = P^k_m$, set $m = m + 1$ and go to line 3.
16: end if
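The probability in (20) has no closed form in general, but it is a one-dimensional integral. The following minimal sketch evaluates it numerically for a given vector of regret estimates $\hat{R}$ and cross-checks it against direct sampling of the rule (17). It assumes that "two-sided exponential distribution with width $\epsilon$" means the Laplace density $(\epsilon/2)e^{-\epsilon|x|}$; that parametrization, the integration grid, and the helper names `fpl_selection_probs` and `fpl_select` are illustrative assumptions, not part of the paper.

```python
import numpy as np


def laplace_pdf(x, eps):
    """Assumed perturbation density (eps/2) * exp(-eps*|x|)."""
    return 0.5 * eps * np.exp(-eps * np.abs(x))


def laplace_cdf(x, eps):
    """CDF of the same two-sided exponential law, evaluated elementwise."""
    x = np.asarray(x, dtype=float)
    # Using exp(-eps*|x|) in both branches avoids overflow in the branch np.where discards
    tail = 0.5 * np.exp(-eps * np.abs(x))
    return np.where(x < 0, tail, 1.0 - tail)


def fpl_selection_probs(r_hat, eps, grid_size=20001):
    """Evaluate the integral in (20) for every virtual action with the trapezoidal rule."""
    r_hat = np.asarray(r_hat, dtype=float)
    half_width = 40.0 / eps  # Laplace tail mass beyond this is below exp(-40)
    m = np.linspace(r_hat.min() - half_width, r_hat.max() + half_width, grid_size)
    # cdf[b, :] = Pr[r_hat[b] + mu_b <= m]
    cdf = laplace_cdf(m[None, :] - r_hat[:, None], eps)
    dm = np.diff(m)
    probs = np.empty(r_hat.size)
    for a in range(r_hat.size):
        # density of action a's perturbed score times Pr[all others <= m]
        integrand = laplace_pdf(m - r_hat[a], eps) * np.prod(np.delete(cdf, a, axis=0), axis=0)
        probs[a] = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dm)
    return probs


def fpl_select(r_hat, eps, rng):
    """Selection rule (17): argmax of the estimated regret plus a Laplace perturbation."""
    mu = rng.laplace(loc=0.0, scale=1.0 / eps, size=len(r_hat))
    return int(np.argmax(np.asarray(r_hat, dtype=float) + mu))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r_hat = [0.2, 0.5, 0.1]  # hypothetical regret estimates of three virtual actions
    eps = 2.0
    exact = fpl_selection_probs(r_hat, eps)
    draws = [fpl_select(r_hat, eps, rng) for _ in range(50_000)]
    empirical = np.bincount(draws, minlength=len(r_hat)) / len(draws)
    print(exact, empirical)
```

The numerical values of $\delta^k_{i\to j,t}$ obtained this way are what step 14 of Algorithm 2 needs; the sampling routine is only a sanity check, since it draws the action without exposing its probability.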
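One period of Algorithm 3 can be sketched as follows for a single player. The sketch reads $g$ as a reward and orients the difference in (22) so that a positive value means that action $a$ earned more on its exploring trials than the current mixed strategy did on the remaining trials; this sign convention, the Dirichlet$(1,\ldots,1)$ draw used for a uniform point of the simplex, and all function names are assumptions made for illustration.

```python
import numpy as np


def draw_mixed_strategy(n_actions, rng):
    """Line 2 of Algorithm 3: a uniform draw from the probability simplex (Dirichlet(1,...,1))."""
    return rng.dirichlet(np.ones(n_actions))


def schedule_exploration(T, n_actions, s, rng):
    """Line 3: give each action a in 1..n_actions exactly s non-overlapping trials; label the rest 0."""
    if s * n_actions >= T:
        raise ValueError("need T > s * n_actions so that some trials use the mixed strategy")
    u = np.zeros(T, dtype=int)
    slots = rng.choice(T, size=s * n_actions, replace=False)
    for a in range(n_actions):
        u[slots[a * s:(a + 1) * s]] = a + 1
    return u


def experimental_regret(rewards, u, n_actions):
    """Per-action experimental regret of one period (reward convention):
    mean reward on trials dedicated to action a minus mean reward on trials with U = 0."""
    rewards = np.asarray(rewards, dtype=float)
    u = np.asarray(u)
    base = rewards[u == 0].mean()
    return np.array([rewards[u == a].mean() - base for a in range(1, n_actions + 1)])


def berts_decision(regrets, rho, xi, rng):
    """Lines 12-15: 'resample' draws a new mixed strategy, 'keep' retains the current one."""
    if regrets.max() > rho:
        return "resample"
    return "resample" if rng.random() < xi else "keep"
```

For example, with labels `u = [0, 0, 1, 2, 0, 1, 2, 0]` and observed rewards `[0.5, 0.5, 0.7, 0.4, 0.5, 0.9, 0.2, 0.5]`, action 1 averages 0.8 against a baseline of 0.5, so its experimental regret is 0.3, and a threshold $\rho = 0.25$ triggers a new draw.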

VII. NUMERICAL ANALYSIS
The numerical analysis consists of two parts. In Section VII-A, we consider a simple network and clarify the workflow of the algorithms. In Section VII-B, we consider a larger network and study the performance of the proposed game model and algorithmic solutions in comparison with some other selection strategies.

A. Part One
1) Network model: The network consists of two transmitter-receiver pairs (users). There exist two orthogonal channels, $C_1$ and $C_2$, and two power levels, $P_1$ and $P_2$. Hence, the action set of each user is $\{a_1: (C_1, P_1),\ a_2: (C_1, P_2),\ a_3: (C_2, P_1),\ a_4: (C_2, P_2)\}$. The distribution of channel gains changes at each trial. We assume that the variance of the mean values of these distributions is relatively small, which corresponds to low dynamicity.$^{7}$ The channel matrices are $H_1$ and $H_2$, where the entry $H_{l,u,v}$, $u, v, l \in \{1, 2\}$, corresponds to the link $u \to v$ through channel $l$ and gives the interval from which the mean value of the distribution of the channel gain is selected at each trial. Moreover, we assume $P_2 = 5$ and $\alpha = 3$. Except for their instantaneous rewards, no other information is revealed to the users; this information can be provided by receiver feedback to the transmitter. With these settings, it is easy to see that $((C_1, P_2), (C_2, P_2))$ is the unique pure-strategy Nash equilibrium of this game, i.e., the theoretical convergence point.
2) Results and Discussion: We investigate the performance of the selection strategies NR-BEWAS, NR-BFPLS and BERTS. The following strategies are also considered as benchmarks:
• optimal centralized action (channel and power level) assignment, which is based on global statistical channel knowledge and is performed by a central unit;
• uniformly random selection.
Figure 1 compares the average reward achieved by NR-BEWAS and NR-BFPLS with those of random and optimal selection. As the figure shows, despite being provided with only strictly limited information, both NR-BFPLS and NR-BEWAS exhibit vanishing regret, in the sense that the achieved average reward converges to that of the centralized scenario.
7 Note that this assumption is made in order to simplify the implementation; as established theoretically, all proposed procedures converge to equilibrium for arbitrarily varying distributions.

[Figure 1: average reward versus number of trials, one panel for User 1 and one for User 2; curves: Optimal, NR-BFPLS, NR-BEWAS, Random.]
Fig. 1. Performance of four selection strategies. Both NR-BEWAS and NR-BFPLS exhibit vanishing regret; that is, their average rewards converge to that of the optimal centralized selection.

[Figure 2: four panels showing the mixed strategy of User 1 at different values of T.]
Fig. 2. Evolution of the mixed strategy of User 1, applying NR-BEWAS. The horizontal axis denotes the action indices, where index $i$, $i \in \{1, 2, 3, 4\}$, stands for action $a_i$. The vertical axis shows the weight of each action in the mixed strategy, i.e., its probability of being selected. The mixed strategy of User 1 converges to $\pi_1 = (0, 1, 0, 0)$.

Figures 2 and 3 illustrate the evolution of the mixed strategies of the two users when NR-BEWAS is used. Figures 4 and 5, on the other hand, show the same quantity when actions are selected using NR-BFPLS. In both cases, the first and second users converge to $a_2: (C_1, P_2)$ and $a_4: (C_2, P_2)$, respectively, as predicted by the theory.

[Figure 3: four panels showing the mixed strategy of User 2 at different values of T.]
Fig. 3. Evolution of the mixed strategy of User 2, applying NR-BEWAS. The horizontal and vertical axes depict the indices of actions and their selection probabilities, respectively. The mixed strategy of User 2 converges to $\pi_2 = (0, 0, 0, 1)$.

[Figure 4: four panels showing the mixed strategy of User 1 at different values of T.]
Fig. 4. Evolution of the mixed strategy of User 1, applying NR-BFPLS. The horizontal and vertical axes depict the indices of actions and their selection probabilities, respectively. The mixed strategy of User 1 converges to $\pi_1 = (0, 1, 0, 0)$.

[Figure 5: four panels showing the mixed strategy of User 2 at different values of T.]
Fig. 5. Evolution of the mixed strategy of User 2, applying NR-BFPLS. The horizontal and vertical axes depict the indices of actions and their selection probabilities, respectively. The mixed strategy of User 2 converges to $\pi_2 = (0, 0, 0, 1)$.

The performance of BERTS, however, is not an explicit function of the game duration. As described before, the procedure keeps searching mixed strategies until it finds a suitable one, i.e., one that yields a regret below the selected threshold. This strategy is then played for the rest of the game. Theorem 7.8 of [33] specifies the minimum game duration that guarantees the convergence of BERTS, which is relatively long even for a small number of users and actions. Nevertheless, similar to other search-based algorithms, an acceptable strategy may also be found at early stages of the game. As a result, for relatively short games, the performance of BERTS is rather unpredictable. The other issue is the effect of the regret threshold. On the one hand, a larger threshold reduces the search time, since the set of acceptable strategies is large. On the other hand, a large regret threshold might lead to performance loss, since the user may get locked into a suboptimal strategy at an early stage, thereby incurring a large accumulated regret. It is worth noting that, due to its simplicity and despite its unpredictable performance, BERTS is an appealing approach in cases where computational effort should be minimized and convergence to a Nash equilibrium is desired.

Figure 6 summarizes the results of a few exemplary runs of BERTS. The parameters are selected as $T = 8$, $M = 5$ and $\rho = 0.6$ (see Section VI). The simulation is performed for six independent rounds. The curve on the left side of Figure 6 depicts the period $m \leq M$ at which the algorithm finds an acceptable strategy. As expected, the results exhibit no specific pattern. The four subfigures on the right depict the mixed strategies selected by BERTS in rounds 1 and 2, together with the average rewards. As the figure shows, in round 2 both users find acceptable strategies earlier than in round 1, which leads to a better average performance. It is also worth noting that for User 2, the strategy of round 1 is in essence better than that of round 2; nevertheless, it is found later. As a result, the average performance of round 2 is superior to that of round 1.

[Figure 6: left, period at which an acceptable strategy is found versus simulation round, for User 1 and User 2; right, mixed strategies (MS) of User 1 and User 2 in rounds 1 and 2, each with its average reward (A.R.).]
Fig. 6. Performance of BERTS. On the left, the vertical and horizontal axes show the period and the round number, respectively. The two curves depict the period at which a suitable mixed strategy (MS) is found in each of the 6 rounds. On the right, these mixed strategies are shown for both users in rounds 1 and 2, together with the average rewards. The horizontal and vertical axes depict the indices of actions and their selection probabilities, respectively.

B. Part Two
In this section we consider a wireless network consisting of 5 users (transmitter-receiver pairs) that compete for access to three orthogonal channels at two possible power levels (hence six actions). We compare BFPLS and BEWAS with the following selection approaches:$^{8}$
• Optimal centralized action assignment, as described in Section VII-A2.
• Centralized no-collision action selection, where no reward is assigned to users that access the same channel. Thus, users are encouraged to avoid collisions (collision-avoidance strategy). This curve can be considered as an upper bound for the performance of learning algorithms that select actions based on collision avoidance, such as [2].
• ε-greedy algorithm, where at each trial, with probability ε (exploration parameter), an action is selected uniformly at random, while with probability 1 - ε the best action so far is played. The average reward of the selected action is updated after each play [42]. For stationary environments, ε is usually time-varying and converges to zero in the limit, while in adversarial cases, ε preferably remains fixed. Here we keep ε fixed.
• Greedy approach, where at the beginning of the game some trials are reserved for exploration, in which actions are selected at random (exploration period). The length of this period is a predefined fraction of the entire game duration. Based on the rewards of the exploration period, the best possible action is selected and played for the rest of the game (exploitation period) [33]. This approach is extremely simple to implement; however, to the best of our knowledge, there is no analysis of the optimal length of the exploration period.
• Uniformly random selection.
8 As mentioned before, observing the joint action profile and/or communication among users is not required for implementing BEWAS, BFPLS and BERTS. Therefore, they cannot be compared with strategies that involve mutual observation and/or communication. A good example of such algorithms is the widely used best-response dynamics, where each player plays the best response to either the historical [] or the predicted [5] joint action profile of its opponents. Another example is the strategy suggested in [2], which combines learning and auction algorithms and in which users communicate with each other.

[Figure 7: aggregate average reward versus ln(trials); curves: Optimal, NR-BFPLS, NR-BEWAS, Greedy, No-Collision (upper bound), ε-Greedy, Random.]
Fig. 7. Aggregate average reward of BFPLS and BEWAS compared to some other selection strategies.

The numerical results are depicted in Figure 7. From this figure, we can conclude the following.

• The performance of interference-avoidance strategies is strongly influenced by the channel matrices and tends to be poor, in particular when the number of channels is smaller than the number of users. The reason is that the sum reward of multiple interfering users with limited transmit power might be larger than the maximum achievable reward of any single user.
• The performance of both BFPLS and BEWAS converges to that of the centralized approach. As expected, BFPLS converges faster than BEWAS. We point out that the convergence speed of both algorithms would be dramatically enhanced if some side information were available to the players, e.g., if users observed each other's actions, or if communication among players were allowed. It is also worth noting that although BFPLS converges faster than BEWAS, the computation of integral (20) might be involved, especially for a large number of actions [4].
• In general, the ε-greedy and greedy approaches can be implemented easily with low computational cost; nevertheless, they are inferior to BEWAS and BFPLS in terms of asymptotic performance. Basically, these approaches are more suitable for stationary environments.

VIII. CONCLUSION AND REMARKS
This paper deals with resource allocation in multi-user infrastructureless wireless networks. The problem of utility maximization has been formulated using the multi-player multi-armed bandit framework. More precisely, given no side information, the users aim at minimizing some regret, expressed in terms of the loss of reward, by selecting appropriate actions from a given space of transmit power levels and orthogonal frequency channels. Based on some recent mathematical results, we have designed two selection strategies, which not only provide vanishing regret for each player, but also guarantee the asymptotic convergence of the game to the set of correlated equilibria. We have also studied the experimental regret-testing strategy, which asymptotically converges to the set of Nash equilibria. Numerical results confirm the applicability of the game model and the proposed strategies to wireless channel selection and power control.

IX. APPENDIX
A. Some Auxiliary Results
In this section, we state some auxiliary results and material from game theory as well as bandit theory that are needed for the proofs.
1) Game Theory: Throughout this part, we consider a game $G$ consisting of a set of $K$ players, where the strategy set of each player $k \in \{1, \ldots, K\}$ is denoted by $I_k$ with a generic element $i_k = (i_k^1, \ldots, i_k^M)$. Similarly, the set of joint strategy profiles of the players is denoted by $I$ with a generic element $i = (i_1, \ldots, i_K)$, and $i_{-k}$ stands for the joint action profile of all players except for player $k$. Moreover, $g_k(i)$ stands for the utility function of player $k$.$^{9}$

Definition 4. A game $G$ is smooth if, for each $k \in \{1, \ldots, K\}$, $g_k(i)$ has continuous partial derivatives with respect to the components of $i_k$.

Definition 5. Let $\nabla g_k = \left(\frac{\partial g_k}{\partial i_k^1}, \ldots, \frac{\partial g_k}{\partial i_k^M}\right)$, and call $(\nabla g_k)_{k \in \{1,\ldots,K\}}$ the payoff gradient of a smooth game $G$. We say that the payoff gradient is strictly monotone if
$$\sum_{k=1}^{K} \left(\nabla g_k(i) - \nabla g_k(j)\right)^{T} (i_k - j_k) < 0 \tag{23}$$
holds for all $i, j \in I$ with $i \neq j$.

Theorem 2 [43]. Consider a smooth game $G$ with compact strategy sets. If the payoff gradient of $G$ is strictly monotone, then $G$ has a unique correlated equilibrium, which places probability one on a unique pure-strategy Nash equilibrium.

Definition 6. A game $G$ is a potential game if there exists a potential function $f: I \to \mathbb{R}$ such that
$$g_k(i, i_{-k}) - g_k(j, i_{-k}) = f(i, i_{-k}) - f(j, i_{-k}) \tag{24}$$
for all $i, j \in I_k$ and $k \in \{1, \ldots, K\}$.

Theorem 3 [44]. Let $G$ be a smooth potential game with a strictly concave potential function. Then a strategy profile is the unique pure-strategy Nash equilibrium if and only if it is the potential maximizer.

9 Note that, compared to the system model, some notation has been changed slightly.
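Definition 6 is stated for smooth games, but condition (24) can also be checked mechanically for a finite two-player game given by payoff matrices. The following sketch is a hypothetical helper, not taken from the paper: it builds a candidate potential that matches player 1's unilateral deviations by construction and then tests (24) for player 2.

```python
import numpy as np


def two_player_potential(g1, g2, atol=1e-12):
    """Check condition (24) for a finite two-player game.

    g1[i1, i2], g2[i1, i2]: payoffs of players 1 and 2.
    Returns (is_potential, f), where f is the candidate potential.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    # Candidate f(i1, i2) = g1(i1, i2) - g1(0, i2) + g2(0, i2) - g2(0, 0);
    # player 1's deviations (changing i1 with i2 fixed) then match g1 exactly.
    f = g1 - g1[0:1, :] + g2[0:1, :] - g2[0, 0]
    ok_player1 = np.allclose(f - f[0:1, :], g1 - g1[0:1, :], atol=atol)
    # Player 2's deviations, measured against column 0 for every row i1
    ok_player2 = np.allclose(f - f[:, 0:1], g2 - g2[:, 0:1], atol=atol)
    return bool(ok_player1 and ok_player2), f
```

A coordination game with identical payoff matrices passes the test, while matching pennies ($g_2 = -g_1$) fails it, as expected for a game without a potential.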

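Lemma 1 [43]. Let $G$ be a smooth potential game. A potential of $G$ is strictly concave if and only if the payoff gradient of $G$ is strictly monotone.

2) Bandit Theory:$^{10}$
Lemma 2. Let $R_n$ and $R_{\mathrm{Ex}}$ be given by (1) and (2), respectively. Then, for any $\delta \in (0, 1/2]$, we have
$$\Pr\left(|R_n - R_{\mathrm{Ex}}| \geq \sqrt{\frac{n}{2}\ln\frac{1}{\delta}}\right) \leq 2\delta, \tag{25}$$
from which it follows that if $R_n \in o(n)$, then $R_{\mathrm{Ex}} \in o(n)$ with arbitrarily high probability.$^{11}$

Proof: By comparing (1) and (2), it suffices to show that $\Pr\left(\left|\sum_{t=1}^{n} \left(g_t(I_t) - \bar{g}_t(P_t)\right)\right| \geq \sqrt{\frac{n}{2}\ln\frac{1}{\delta}}\right) \leq 2\delta$. To this end, define $S := \sum_{t=1}^{n} g_t(I_t)$, where $g_t(I_t) \in [0, 1]$, $1 \leq t \leq n$, are independent random variables (see also Section II-A). Further note that $\bar{S} = E[S] = \sum_{t=1}^{n} \bar{g}_t(P_t)$. Therefore, by Hoeffding's inequality [33],
$$\Pr\left(|R_n - R_{\mathrm{Ex}}| \geq \sqrt{\frac{n}{2}\ln\frac{1}{\delta}}\right) = \Pr\left(|S - \bar{S}| \geq \sqrt{\frac{n}{2}\ln\frac{1}{\delta}}\right) \leq 2\exp\left(-\frac{2}{n}\cdot\frac{n}{2}\ln\frac{1}{\delta}\right) = 2\delta, \tag{26}$$
which proves (25).

Lemma 3. Let $R_{\mathrm{Ex}}$ be given by (2). Moreover, define
$$\tilde{R}_n = \max_{i=1,\ldots,N} \sum_{t=1}^{n} \tilde{g}_t(i) - \sum_{t=1}^{n} \tilde{g}_t(P_t),$$
where $\tilde{g}_t(P_t) = \sum_{i=1}^{N} p_{i,t}\, \tilde{g}_t(i)$ and $\tilde{g}_t(i)$ is given by (12). Then we have
$$\Pr\left(|\tilde{R}_n - R_{\mathrm{Ex}}| \geq \sqrt{\frac{n}{2}\ln\frac{1}{\delta}}\right) \leq 2\delta. \tag{27}$$
Hence, for sufficiently small $\delta > 0$, $R_{\mathrm{Ex}} \in o(n)$ implies that $\tilde{R}_n \in o(n)$ with arbitrarily high probability.

10 Throughout this section and in order to simplify the notation, the player index $k$ is omitted unless ambiguity arises.
11 Here and hereafter, the statement "$X_n \in o(n)$ with arbitrarily high probability" for some nonnegative random sequence $X_n \in \mathbb{R}$ means that the probability of $X_n \notin o(n)$ can be made arbitrarily small, provided that some parameter is chosen sufficiently small.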
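Lemmas 2 and 3 both rest on Hoeffding's inequality for a sum of $n$ independent $[0,1]$-valued terms. As a rough numerical illustration, the sketch below assumes a hypothetical Bernoulli reward model (independent rewards with randomly drawn means standing in for $g_t(I_t)$) and checks that the deviation level $\sqrt{(n/2)\ln(1/\delta)}$ is reached with frequency at most $2\delta$; it also prints the level divided by $n$, whose decay is what lets $R_n \in o(n)$ carry over to $R_{\mathrm{Ex}} \in o(n)$.

```python
import numpy as np


def hoeffding_radius(n, delta):
    """Deviation level sqrt((n/2) ln(1/delta)) appearing in (25)-(27)."""
    return np.sqrt(0.5 * n * np.log(1.0 / delta))


def violation_rate(n, delta, trials, rng):
    """Fraction of runs in which |S - E[S]| reaches the Hoeffding radius.

    S sums n independent Bernoulli rewards whose means are drawn once at random
    (a hypothetical stand-in for g_t(I_t))."""
    means = rng.uniform(0.0, 1.0, size=n)
    rewards = rng.random((trials, n)) < means  # shape (trials, n), values in {0, 1}
    deviation = np.abs(rewards.sum(axis=1) - means.sum())
    return float(np.mean(deviation >= hoeffding_radius(n, delta)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hoeffding guarantees this rate is at most 2 * delta = 0.1
    print(violation_rate(n=200, delta=0.05, trials=10_000, rng=rng))
    # The radius grows like sqrt(n), so radius / n -> 0
    for n in (10**2, 10**4, 10**6):
        print(n, hoeffding_radius(n, 0.05) / n)
```

Hoeffding's bound is distribution-free and therefore loose here; the observed rate is typically far below $2\delta$.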

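Proof: Similar to the proof of Lemma 2, it follows from (2) and the definition of $\tilde{R}_n$ that it is sufficient to show that $\Pr\left(\left|\sum_{t=1}^{n} \left(\tilde{g}_t(P_t) - \bar{g}_t(P_t)\right)\right| \geq \sqrt{\frac{n}{2}\ln\frac{1}{\delta}}\right) \leq 2\delta$ for $\delta \in (0, 1/2]$. To this end, note that $\tilde{g}_t(P_t) \in [0, 1]$, $1 \leq t \leq n$, are independent random variables. Moreover, since $\tilde{g}_t(i)$ is an unbiased estimate of $g_t(i)$, we have $E\left[\sum_{t=1}^{n} \tilde{g}_t(P_t)\right] = \sum_{t=1}^{n} \bar{g}_t(P_t)$. Hence, defining $S = \sum_{t=1}^{n} \left(\bar{g}_t(P_t) - \tilde{g}_t(P_t)\right)$ and proceeding as in the proof of Lemma 2 with Hoeffding's inequality proves the lemma.

Proposition 6. Let $R_n$ be given by (1) and $\tilde{R}_n$ be defined as in Lemma 3. Then $R_n \in o(n)$ implies that $\tilde{R}_n \in o(n)$.

Proof: Lemma 2 implies that $R_n \in o(n) \Rightarrow R_{\mathrm{Ex}} \in o(n)$ with arbitrarily high probability, while by Lemma 3, we have $R_{\mathrm{Ex}} \in o(n) \Rightarrow \tilde{R}_n \in o(n)$. Therefore, if $R_n \in o(n)$, then $\tilde{R}_n \in o(n)$ with arbitrarily high probability.

Theorem 4 [33]. Let $\Phi(U) = \psi\left(\sum_{i=1}^{N} \phi(u_i)\right)$, where $U = (u_1, \ldots, u_N)$. Consider a selection strategy which, at time $t$, selects action $I_t$ according to the distribution $P_t$, whose elements $p_{i,t}$ are defined as
$$p_{i,t} = (1-\gamma_t)\,\frac{\phi'(\tilde{R}_{i,t-1})}{\sum_{k=1}^{N} \phi'(\tilde{R}_{k,t-1})} + \frac{\gamma_t}{N}, \tag{28}$$
where $\tilde{R}_{i,t-1} = \sum_{s=1}^{t-1} \left(\tilde{g}_s(i) - \tilde{g}_s(I_s)\right)$. Assume that:
A1. $\sum_{t=1}^{n} \frac{1}{\gamma_t^2} = o\left(\frac{n^2}{\ln n}\right)$;
A2. For all vectors $V_t = (v_{1,t}, \ldots, v_{N,t})$ with $|v_{i,t}| \leq N/\gamma_t$, we have
$$\lim_{n\to\infty} \frac{1}{\psi(\phi(n))} \sum_{t=1}^{n} C(V_t) = 0, \tag{29}$$
where $C(V_t) = \sup_{U \in \mathbb{R}^N} \psi'\left(\sum_{i=1}^{N} \phi(u_i)\right) \sum_{i=1}^{N} \phi''(u_i)\, v_{i,t}^2$;
A3. For all vectors $U_t = (u_{1,t}, \ldots, u_{N,t})$ with $u_{i,t} \leq t$,
$$\lim_{n\to\infty} \frac{1}{\psi(\phi(n))} \sum_{t=1}^{n} \gamma_t \sum_{i=1}^{N} \nabla_i \Phi(U_t) = 0; \tag{30}$$
A4. For all vectors $U_t = (u_{1,t}, \ldots, u_{N,t})$ with $u_{i,t} \leq t$,
$$\lim_{n\to\infty} \frac{\ln n}{\psi(\phi(n))} \sum_{t=1}^{n} \frac{1}{\gamma_t^2} \sum_{i=1}^{N} \left(\nabla_i \Phi(U_t)\right)^2 = 0. \tag{31}$$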
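The probability map (28) is simple to compute once a potential is fixed. The sketch below shows it for two common illustrative choices, an exponential potential $\phi(u) = e^{\eta u}$ and a polynomial potential $\phi(u) = (\max\{u, 0\})^p$; these choices, the parameter names `eta` and `p`, and the fallback to the uniform distribution when all weights $\phi'(\tilde{R}_{i,t-1})$ vanish are assumptions for illustration, since the statement of Theorem 4 above does not fix them. Only the ratio $\phi'(\tilde{R}_{i,t-1}) / \sum_k \phi'(\tilde{R}_{k,t-1})$ matters, so any positive rescaling of the weights leaves the result unchanged.

```python
import numpy as np


def potential_probs(R, gamma, weights_fn):
    """p_i = (1 - gamma) * w_i / sum_k w_k + gamma / N, with w_i proportional to phi'(R_i), as in (28)."""
    R = np.asarray(R, dtype=float)
    N = R.size
    w = weights_fn(R)
    total = w.sum()
    if total <= 0:  # e.g. polynomial potential with all R_i <= 0: fall back to uniform
        base = np.full(N, 1.0 / N)
    else:
        base = w / total
    return (1.0 - gamma) * base + gamma / N


def exp_weights(eta):
    """phi(u) = exp(eta*u): phi'(u) is proportional to exp(eta*u).
    Shifting by max(R) leaves the normalized ratio unchanged and avoids overflow."""
    return lambda R: np.exp(eta * (R - R.max()))


def poly_weights(p):
    """phi(u) = max(u, 0)**p with p >= 2: phi'(u) = p * max(u, 0)**(p - 1)."""
    return lambda R: p * np.maximum(R, 0.0) ** (p - 1)


if __name__ == "__main__":
    R = np.array([0.0, np.log(3.0)])  # hypothetical cumulative regret estimates
    print(potential_probs(R, gamma=0.0, weights_fn=exp_weights(1.0)))
    print(potential_probs([1.0, 3.0], gamma=0.1, weights_fn=poly_weights(2)))
```

With the exponential potential this reduces to exponentially weighted selection mixed with uniform exploration at rate $\gamma_t$; the polynomial potential instead places zero base weight on actions whose estimated regret is not positive.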


More information

OBJECTIVES OF TIME SERIES ANALYSIS

OBJECTIVES OF TIME SERIES ANALYSIS OBJECTIVES OF TIME SERIES ANALYSIS Undersanding he dynamic or imedependen srucure of he observaions of a single series (univariae analysis) Forecasing of fuure observaions Asceraining he leading, lagging

More information

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE Topics MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES 2-6 3. FUNCTION OF A RANDOM VARIABLE 3.2 PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE 3.3 EXPECTATION AND MOMENTS

More information

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution Physics 7b: Saisical Mechanics Fokker-Planck Equaion The Langevin equaion approach o he evoluion of he velociy disribuion for he Brownian paricle migh leave you uncomforable. A more formal reamen of his

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Cash Flow Valuation Mode Lin Discrete Time

Cash Flow Valuation Mode Lin Discrete Time IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728,p-ISSN: 2319-765X, 6, Issue 6 (May. - Jun. 2013), PP 35-41 Cash Flow Valuaion Mode Lin Discree Time Olayiwola. M. A. and Oni, N. O. Deparmen of Mahemaics

More information

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 A Primal-Dual Type Algorihm wih he O(/) Convergence Rae for Large Scale Consrained Convex Programs Hao Yu and Michael J. Neely Absrac This paper considers

More information

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS *

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS * haper 5 HERNOFF DISTANE AND AFFINITY FOR TRUNATED DISTRIBUTIONS * 5. Inroducion In he case of disribuions ha saisfy he regulariy condiions, he ramer- Rao inequaliy holds and he maximum likelihood esimaor

More information

Planning in POMDPs. Dominik Schoenberger Abstract

Planning in POMDPs. Dominik Schoenberger Abstract Planning in POMDPs Dominik Schoenberger d.schoenberger@sud.u-darmsad.de Absrac This documen briefly explains wha a Parially Observable Markov Decision Process is. Furhermore i inroduces he differen approaches

More information

Ordinary Differential Equations

Ordinary Differential Equations Ordinary Differenial Equaions 5. Examples of linear differenial equaions and heir applicaions We consider some examples of sysems of linear differenial equaions wih consan coefficiens y = a y +... + a

More information

Notes on online convex optimization

Notes on online convex optimization Noes on online convex opimizaion Karl Sraos Online convex opimizaion (OCO) is a principled framework for online learning: OnlineConvexOpimizaion Inpu: convex se S, number of seps T For =, 2,..., T : Selec

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

4 Sequences of measurable functions

4 Sequences of measurable functions 4 Sequences of measurable funcions 1. Le (Ω, A, µ) be a measure space (complee, afer a possible applicaion of he compleion heorem). In his chaper we invesigae relaions beween various (nonequivalen) convergences

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3 and d = c b - b c c d = c b - b c c This process is coninued unil he nh row has been compleed. The complee array of coefficiens is riangular. Noe ha in developing he array an enire row may be divided or

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

arxiv: v1 [math.gm] 4 Nov 2018

arxiv: v1 [math.gm] 4 Nov 2018 Unpredicable Soluions of Linear Differenial Equaions Mara Akhme 1,, Mehme Onur Fen 2, Madina Tleubergenova 3,4, Akylbek Zhamanshin 3,4 1 Deparmen of Mahemaics, Middle Eas Technical Universiy, 06800, Ankara,

More information

Appendix 14.1 The optimal control problem and its solution using

Appendix 14.1 The optimal control problem and its solution using 1 Appendix 14.1 he opimal conrol problem and is soluion using he maximum principle NOE: Many occurrences of f, x, u, and in his file (in equaions or as whole words in ex) are purposefully in bold in order

More information

Vanishing Viscosity Method. There are another instructive and perhaps more natural discontinuous solutions of the conservation law

Vanishing Viscosity Method. There are another instructive and perhaps more natural discontinuous solutions of the conservation law Vanishing Viscosiy Mehod. There are anoher insrucive and perhaps more naural disconinuous soluions of he conservaion law (1 u +(q(u x 0, he so called vanishing viscosiy mehod. This mehod consiss in viewing

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

A Dynamic Model of Economic Fluctuations

A Dynamic Model of Economic Fluctuations CHAPTER 15 A Dynamic Model of Economic Flucuaions Modified for ECON 2204 by Bob Murphy 2016 Worh Publishers, all righs reserved IN THIS CHAPTER, OU WILL LEARN: how o incorporae dynamics ino he AD-AS model

More information

Optimal Server Assignment in Multi-Server

Optimal Server Assignment in Multi-Server Opimal Server Assignmen in Muli-Server 1 Queueing Sysems wih Random Conneciviies Hassan Halabian, Suden Member, IEEE, Ioannis Lambadaris, Member, IEEE, arxiv:1112.1178v2 [mah.oc] 21 Jun 2013 Yannis Viniois,

More information

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS CENRALIZED VERSUS DECENRALIZED PRODUCION PLANNING IN SUPPLY CHAINS Georges SAHARIDIS* a, Yves DALLERY* a, Fikri KARAESMEN* b * a Ecole Cenrale Paris Deparmen of Indusial Engineering (LGI), +3343388, saharidis,dallery@lgi.ecp.fr

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information

Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information Disribued Ficiious Play for Opimal Behavior of Muli-Agen Sysems wih Incomplee Informaion Ceyhun Eksin and Alejandro Ribeiro arxiv:602.02066v [cs.g] 5 Feb 206 Absrac A muli-agen sysem operaes in an uncerain

More information

Reliability of Technical Systems

Reliability of Technical Systems eliabiliy of Technical Sysems Main Topics Inroducion, Key erms, framing he problem eliabiliy parameers: Failure ae, Failure Probabiliy, Availabiliy, ec. Some imporan reliabiliy disribuions Componen reliabiliy

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

18 Biological models with discrete time

18 Biological models with discrete time 8 Biological models wih discree ime The mos imporan applicaions, however, may be pedagogical. The elegan body of mahemaical heory peraining o linear sysems (Fourier analysis, orhogonal funcions, and so

More information

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3 Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has

More information

Navneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi

Navneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi Creep in Viscoelasic Subsances Numerical mehods o calculae he coefficiens of he Prony equaion using creep es daa and Herediary Inegrals Mehod Navnee Saini, Mayank Goyal, Vishal Bansal (23); Term Projec

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Sliding Mode Extremum Seeking Control for Linear Quadratic Dynamic Game

Sliding Mode Extremum Seeking Control for Linear Quadratic Dynamic Game Sliding Mode Exremum Seeking Conrol for Linear Quadraic Dynamic Game Yaodong Pan and Ümi Özgüner ITS Research Group, AIST Tsukuba Eas Namiki --, Tsukuba-shi,Ibaraki-ken 5-856, Japan e-mail: pan.yaodong@ais.go.jp

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updaes Michael Kearns AT&T Labs mkearns@research.a.com Sainder Singh AT&T Labs baveja@research.a.com Absrac We give he firs rigorous upper bounds on he error

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j =

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j = 1: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME Moving Averages Recall ha a whie noise process is a series { } = having variance σ. The whie noise process has specral densiy f (λ) = of

More information

Problem Set #3: AK models

Problem Set #3: AK models Universiy of Warwick EC9A2 Advanced Macroeconomic Analysis Problem Se #3: AK models Jorge F. Chavez December 3, 2012 Problem 1 Consider a compeiive economy, in which he level of echnology, which is exernal

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic

More information

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem.

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem. Noes, M. Krause.. Problem Se 9: Exercise on FTPL Same model as in paper and lecure, only ha one-period govenmen bonds are replaced by consols, which are bonds ha pay one dollar forever. I has curren marke

More information

Air Traffic Forecast Empirical Research Based on the MCMC Method

Air Traffic Forecast Empirical Research Based on the MCMC Method Compuer and Informaion Science; Vol. 5, No. 5; 0 ISSN 93-8989 E-ISSN 93-8997 Published by Canadian Cener of Science and Educaion Air Traffic Forecas Empirical Research Based on he MCMC Mehod Jian-bo Wang,

More information

Empirical Process Theory

Empirical Process Theory Empirical Process heory 4.384 ime Series Analysis, Fall 27 Reciaion by Paul Schrimpf Supplemenary o lecures given by Anna Mikusheva Ocober 7, 28 Reciaion 7 Empirical Process heory Le x be a real-valued

More information

ON QUANTIZATION AND COMMUNICATION TOPOLOGIES IN MULTI-VEHICLE RENDEZVOUS 1. Karl Henrik Johansson Alberto Speranzon,2 Sandro Zampieri

ON QUANTIZATION AND COMMUNICATION TOPOLOGIES IN MULTI-VEHICLE RENDEZVOUS 1. Karl Henrik Johansson Alberto Speranzon,2 Sandro Zampieri ON QUANTIZATION AND COMMUNICATION TOPOLOGIES IN MULTI-VEHICLE RENDEZVOUS 1 Karl Henrik Johansson Albero Speranzon, Sandro Zampieri Deparmen of Signals, Sensors and Sysems Royal Insiue of Technology Osquldas

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

Energy Storage Benchmark Problems

Energy Storage Benchmark Problems Energy Sorage Benchmark Problems Daniel F. Salas 1,3, Warren B. Powell 2,3 1 Deparmen of Chemical & Biological Engineering 2 Deparmen of Operaions Research & Financial Engineering 3 Princeon Laboraory

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information