1904 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 58, NO. 4, MAY 2009


Learning to Compete for Resources in Wireless Stochastic Games

Fangwen Fu, Student Member, IEEE, and Mihaela van der Schaar, Senior Member, IEEE

Abstract: In this paper, we model the various users in a wireless network (e.g., cognitive radio network) as a collection of selfish autonomous agents that strategically interact to acquire dynamically available spectrum opportunities. Our main focus is on developing solutions for wireless users to successfully compete with each other for the limited and time-varying spectrum opportunities, given the experienced dynamics in the wireless network. To analyze the interactions among users given the environment disturbance, we propose a stochastic game framework for modeling how the competition among users for spectrum opportunities evolves over time. At each stage of the stochastic game, a central spectrum moderator (CSM) auctions the available resources, and the users strategically bid for the required resources. The joint bid actions affect the resource allocation and, hence, the rewards and future strategies of all users. Based on the observed resource allocations and corresponding rewards, we propose a best-response learning algorithm that can be deployed by wireless users to improve their bidding policy at each stage. The simulation results show that by deploying the proposed best-response learning algorithm, the wireless users can significantly improve their own bidding strategies and, hence, their performance in terms of both the application quality and the incurred cost for the used resources.

Index Terms: Delay-sensitive transmission, interactive learning, multiuser resource management, reinforcement learning, stochastic games, wireless networks.

I. INTRODUCTION

DYNAMIC resource management in heterogeneous wireless networks is a challenging problem [3]. The wireless stations and radio systems that must coexist in such a network differ in their individual utility functions, transmission actions, resource demands, and capabilities.
Thus, various levels of strategic(1) interaction and adaptation are necessary to cope with the widely varying dynamics. In this paper, we focus on synthesizing new, dynamic, and informationally decentralized resource-management mechanisms to achieve high utility in competitive and heterogeneous wireless networks (including cognitive radio networks [1]-[3]). Specifically, our focus is on designing associated communication algorithms that enable self-interested autonomous wireless stations to strategically compete for the available spectrum resources in either ISM bands [1] or bands shared with licensed users, according to a priori mandated or negotiated rules. This paper is primarily concerned with the tensions and relationships among autonomous adaptation by secondary (unlicensed) users (SUs), the competition among these users, and the interaction of these users with spectrum moderators having their own goals (e.g., making money, imposing fairness rules, ensuring compliance with the Federal Communications Commission (FCC) [1] and local regulations with respect to primary (licensed) users (PUs), etc.). Unlike previous works on resource management [6], [21], [26], our main focus is on discussing how users can adapt, predict, learn, and determine how they compete for the time-varying resources, as well as how they select the associated transmission strategies, given the experienced dynamics.

Manuscript received August 28, 2007; revised April 17, 2008 and July 1, 2008. First published July 29, 2008; current version published April 22, 2009. This work was supported by the National Science Foundation under CAREER Award CCF. The review of this paper was coordinated by Prof. O. B. Akan. The authors are with the Department of Electrical Engineering, University of California at Los Angeles, Los Angeles, CA USA (fwfu@ee.ucla.edu; mihaela@ee.ucla.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier /TVT

(1) By strategic users, we mean users that are not price takers and do not have an a priori consensus on resource allocation.
In wireless networks, these dynamics can be categorized into two types: One is the disturbance due to the environment, and the other is the impact caused by competing users. The disturbance due to the environment results from variations (uncertainties) of the wireless channels or source (e.g., multimedia) characteristics. For example, the stochastic behavior of the PUs, the time-varying channel conditions experienced by the SUs, and the time-varying source traffic that needs to be transmitted by the SUs can be considered as environmental disturbances. These types of dynamics are generally modeled as stationary processes. For instance, the use of each channel by the PUs can be modeled as a two-state Markov chain with an ON-state (the channel is used by PUs) and an OFF-state (the channel is available for the SUs) [7]. The channel conditions can be modeled using a finite-state Markov model [24]. The packet arrival of the source traffic can be modeled as a Poisson process(2) [11]. Conventionally, wireless stations have only considered these environment disturbances when adapting their cross-layer strategies [12] for delay-sensitive transmission. The other type of dynamics, the impact from competing users, which is due to the noncollaborative, autonomous, and strategic SUs in the network transmitting their traffic, is less well studied in wireless communication networks. The goal of this paper is to provide solutions and associated metrics that can be used by an autonomous SU to analyze and predict the outcome of various dynamic interactions among competing SUs in dynamic multiuser communication systems

(2) Other packet arrival models can also be used in our proposed framework.

FU AND VAN DER SCHAAR: LEARNING TO COMPETE FOR RESOURCES IN WIRELESS STOCHASTIC GAMES 1905

and, based on this forecast, adapt and optimize its transmission strategy. In our considered wireless networks, the SUs are modeled as rational and strategic. We model the spectrum management as a stochastic game [22] in which the SUs simultaneously and repeatedly make their own resource bids. The competition for dynamic resources is assisted by a central coordinator (similar to that in existing wireless LAN (WLAN) standards such as the IEEE 802.11e hybrid coordination function (HCF) [13]). We refer to this coordinator as the central spectrum moderator (CSM). The role of the CSM is to allocate resources to the SUs based on the predetermined utility maximization rule.(3) In this paper, to explicitly consider the strategic behavior of autonomous SUs and the informationally decentralized nature of the competition for wireless resources, we assume that the CSM deploys an auction mechanism for dynamically allocating resources. Auction theory has extensively been studied in economics [19], and it has also recently been applied to network resource allocation [4]-[6]. Note that the role of the CSM(4) in our resource management game for our considered wireless networks will be kept to a minimum. Unlike alternative existing solutions [21], the CSM will not require knowledge of the private information of the users and will not perform complex computations for deciding the resource allocation. Its only role will be the implementation of the spectrum etiquette rules, as in [8], and ensuring that the available spectrum holes are auctioned among users. To capture the network dynamics, we allow the CSM to repeatedly auction the available spectrum opportunities based on the PUs' behaviors. Meanwhile, each SU is allowed to strategically adapt its bidding strategy based on information about the available spectrum opportunities, its source and channel characteristics, and the impact of the other SUs' bidding actions.
Using this stochastic wireless allocation framework, we develop a learning methodology for SUs to improve their policies for playing the auction game, i.e., the policies for generating the bids to compete for available resources. Specifically, during the repeated multiuser interaction, the SUs can observe partial historic information of the outcome of the auction game, through which the SUs can estimate the impact on their future rewards and then adopt their best response to effectively compete for channel opportunities. The estimation of the impact on the expected future reward can be performed using different types of interactive learning [18]. In this paper, we focus on reinforcement learning [17], [27], because this allows the SUs to improve their bidding strategy based only on the knowledge of their own past received payoffs, without knowing the bids or payoffs of the other SUs. Our proposed best-response learning algorithm is inspired by Q-learning for a single agent interacting with the environment. Unlike Q-learning, the proposed best-response learning explicitly considers the interactions and coupling among SUs in the wireless network. By deploying the best-response learning algorithm, the SUs can strategically predict the impact of current actions on future performance and then optimally make their resource bids.

This paper is organized as follows. In Section II, we introduce a stochastic game formulation for multiuser interaction in wireless networks. In Section III, we show how a one-stage auction mechanism can be used to divide the spectrum allocation among strategic SUs. In Section IV, we present the state definition, state transition model, and stage reward function for the SUs in the stochastic game.

(3) Other fairness rules can also be deployed in the CSM, such as air-time fairness, utility-based fairness, etc. [12].
(4) It should be noted that this approach can also allow for multiple CSMs to manage the spectrum by fairly dividing their responsibilities, e.g., based on their geolocation or the frequency band in which they are operating, or by competing against each other for the number of SUs that will associate with them.
In Section V, we discuss the bidding strategies of the SUs for playing the stochastic game. In Section VI, we propose a best-response learning approach for the SUs to predict their future rewards based on the observed historic information. In Section VII, we present the simulation results, followed by conclusions and future research in Section VIII.

II. STOCHASTIC GAME FORMULATION FOR DYNAMIC MULTIUSER INTERACTION

We consider a spectrum consisting of N channels, each indexed by j ∈ {1, ..., N}. The N wireless channels are originally licensed to a primary network (PN) whose users (i.e., PUs) exclusively access the channels. In the secondary network (SN), the M (M ≥ N) autonomous SUs, each indexed by i ∈ {1, ..., M} and transmitting delay-sensitive data, compete for the spectrum opportunities released by the PUs in these N channels. Although the available transmission opportunities (TxOps) for SUs depend on the access patterns of the PUs and the detection systems [2], we do not discuss the detection methods in this paper but rather rely on the existing literature for this purpose [3]. Instead, we assume that the available TxOps in each channel change over time due to the PUs joining or leaving the network and can be modeled as a two-state Markov chain, as in [7] and [10]. Our goal is to develop a general framework for multiuser interaction in the SN, where users can compete for dynamically available TxOps. Moreover, we also aim to provide solutions for SUs to improve their strategies for playing the repeated resource-management game by considering their past interactions with other SUs. The communications of the PUs are assumed to follow a synchronous slot structure. The time slot has a length of ΔT seconds. We assume that during each time slot, each channel is either exclusively occupied by PUs or there is no PU accessing the channel [7], [10]. Hence, during each time slot, the channel is in one of the following two states: the ON-state (the channel is currently used by the PUs) or the OFF-state (the channel is not used by the PUs, and hence, the SUs can use this channel).
Note that if this is an unlicensed band, the channel will always be in the OFF-state and can be utilized by the SUs at all times. The TxOp of channel j at time slot t ∈ N is denoted by y_j^t ∈ {0, 1}, where y_j^t is 0 if the channel is in the ON-state and 1 if it is in the OFF-state. In this paper, the TxOp y_j^t of channel j is modeled by a two-state Markov chain with transition probabilities p_j^FN = p(y_j^{t+1} = 0 | y_j^t = 1) and p_j^NF = p(y_j^{t+1} = 1 | y_j^t = 0). The TxOp profile of the N channels is represented by y^t = [y_1^t, ..., y_N^t]. As in [13], we assume that a polling-based medium-access protocol is deployed in the SN, which is arbitrated by a CSM.
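The two-state TxOp chain above is easy to simulate. Below is a minimal sketch (our own illustration; the function names and the seed are assumptions, not from the paper) that samples one channel's ON/OFF trajectory from the transition probabilities p_j^FN and p_j^NF, and computes the chain's long-run fraction of OFF slots.

```python
import random

def simulate_txop(p_fn, p_nf, steps, y0=1, seed=0):
    """Sample one channel's TxOp trajectory y^0, ..., y^steps.

    y = 1 is the OFF-state (free for SUs); y = 0 is the ON-state (used by
    PUs). p_fn = p(y^{t+1}=0 | y^t=1) and p_nf = p(y^{t+1}=1 | y^t=0).
    """
    rng = random.Random(seed)
    y, trace = y0, [y0]
    for _ in range(steps):
        if y == 1:
            y = 0 if rng.random() < p_fn else 1   # free -> occupied by PUs
        else:
            y = 1 if rng.random() < p_nf else 0   # occupied -> released
        trace.append(y)
    return trace

def stationary_off_fraction(p_fn, p_nf):
    """Long-run fraction of OFF slots of the two-state chain."""
    return p_nf / (p_fn + p_nf)
```

For example, with p_fn = p_nf the channel is, in the long run, free for the SUs in half of the time slots.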

Fig. 1. Conceptual overview of the multi-SU interaction in the SN.

The polling policy is only changed at the start of every time slot. For simplicity, we assume that each SU can access a single channel and that each channel can be accessed by a single SU within the time slot. The SUs can switch channels only when crossing time slots. Note that this simple medium-access model, used for illustration in this paper, can easily be extended to more sophisticated models [10], where each SU can simultaneously access multiple channels, the channels are shared by multiple SUs, etc. When using this time-division channel access, we assume that the wireless users deploy constant transmission power and experience no interference. Furthermore, we assume that the wireless users move slowly, and thus, their experienced channel conditions change slowly. During each time slot, an SU needs to first determine how to compete with the other SUs for the time-varying TxOps. This represents its external actions, since they determine the interaction between this SU and the other SUs, and the amount of resources allocated to that SU. The external action at time slot t is denoted by a_i^t ∈ A_i, where A_i is the set of possible external actions available to SU i. Based on the allocated resources, the SU determines how to transmit its traffic (application-layer data) by selecting the various strategies at different layers of the open systems interconnection (OSI) stack (e.g., through cross-layer adaptation [12]). These actions are referred to as internal actions, since they only determine the SU's utility at the current time. The internal action at time slot t is denoted by b_i^t ∈ B_i, where B_i is the set of possible internal actions available to SU i. In this paper, we propose an auction mechanism deployed in the CSM. Hence, the external action a_i^t of SU i is the bid it submits to the CSM. The auction mechanism will be detailed in Section III. The environment experienced by an SU i can be characterized by its current state s_i^t ∈ S_i, which will be discussed in Section IV.
At each time slot t, SU i generates the external action a_i^t to compete for the TxOps y^t. The competition result is ϑ_i^t, based on which SU i performs its internal action b_i^t and obtains the reward r_i^t at this time slot. After packet transmission, SU i transits to the next state s_i^{t+1} ∈ S_i. A conceptual overview of the multi-SU interactions in the repeated auctions is illustrated in Fig. 1. The repeated competition among the SUs can be modeled as a stochastic game [16], [22]. The time slot corresponds to the term "stage," which is commonly used in stochastic games. In the remainder of this paper, we interchangeably use the terms "time slot" and "stage." We define the stochastic game for SN resource allocation as ({S_i, A_i, B_i, O_i, q_i, r_i}_{i=1}^M, Y), where each SU i is associated with a tuple (S_i, A_i, B_i, O_i, q_i, r_i). Specifically, we have the following.

1) Y is a finite set of possible TxOps available for the SUs. In this paper, Y = {0, 1}^N, and y^t ∈ Y is the available TxOp at stage t, which is common information for the SUs.

2) S_i is a finite local state space of SU i. We let S := S_1 × ... × S_M be the global state space of all SUs and S_{-i} := Π_{k≠i} S_k be the global state space of the SUs other than i. At stage t, the global state is denoted by s^t = (s_1^t, ..., s_M^t) = (s_i^t, s_{-i}^t), where -i represents all the SUs other than i.

3) A_i is a finite set of external actions performed by SU i to compete for the available TxOps. The external action vector at stage t for all SUs is a^t = (a_1^t, ..., a_M^t).

4) B_i is a finite set of internal actions performed by SU i to determine the packet transmission.

5) O_i is a finite set of possible outputs from the multi-SU competition. In this paper, the output ϑ_i^t ∈ O_i is the auction result computed by the CSM for SU i at stage t. We will give the specific form of the output in Section III.

6) q_i is the state transition probability for SU i. Thus, q_i(s_i^{t+1}, y^{t+1} | s_i^t, y^t, ϑ_i^t, b_i^t) is the probability that the state of SU i transits from s_i^t to s_i^{t+1} and the TxOp transits from y^t to y^{t+1} if the competition output is ϑ_i^t and the internal action is b_i^t. The reason that the transition probability includes the common TxOp y is that the channel condition transition of SU i depends on the available TxOp.

7) r_i is the stage reward (immediate reward) received by SU i, where r_i : S_i × O_i × B_i → R. It should be noted that the reward function r_i depends on the competition output ϑ_i^t

and, hence, indirectly depends on the other SUs' external actions.

To design a stochastic game for the SN with strategic SUs, we have to consider the following: 1) what auction mechanism can be deployed to resolve the competition among SUs; 2) how the dynamic environment experienced by each SU can be modeled; and 3) how the SUs can forecast the impact of their bids made at the current time on their future performance.

Fig. 2. Information exchange between the CSM and SU i.

III. AUCTION MECHANISM: ONE-STAGE RESOURCE ALLOCATION

In this paper, we assume that the CSM is aware of the TxOp y^t and allocates (through polling the SUs) those channels with y_j^t = 1 to the SUs. To efficiently allocate the available resources (opportunities), the CSM needs to collect information about the SUs [21]. However, as mentioned in Section I, in a wireless network, the information is decentralized, and thus, the information exchange between the SUs and the CSM needs to be kept limited due to the incurred communication cost. On the other hand, the SUs competing with each other are selfish and strategic, and hence, the information they hold is private, and they may not desire to reveal this information to the CSM. Therefore, one of our key interests in this paper is to determine what information should be exchanged between the SUs and the CSM and how this information should be exchanged. In the following, we present an auction mechanism for dynamically coordinating the interactions among the SUs and discuss the computational complexity in the CSM and the communication cost between the SUs and the CSM. First, the CSM announces the auction by broadcasting the TxOp y^t. The SUs receive the announcement and determine the external action (i.e., the bid vector) a_i^t = [a_i1^t, ..., a_iN^t] ∈ R^N based on the announced information and their own private information about the environment they experience, which is discussed in detail in Section IV. Subsequently, each SU submits its bid vector to the CSM.
After receiving the bid vectors from the SUs, the CSM computes the channel allocation z_i^t = [z_i1^t, ..., z_iN^t] ∈ {0, 1}^N for each SU i based on the submitted bids. To compel the SUs to truthfully declare their bids [23], the CSM also computes the payment τ_i^t ∈ R that the SUs have to pay for the use of the resources during the current stage of the game. A negative value of the payment means the absolute value that SU i has to pay the CSM for the used resources. Hence, the competition output ϑ_i^t in this auction mechanism includes the channel allocation z_i^t and the payment τ_i^t, i.e., ϑ_i^t = (z_i^t, τ_i^t). The competition output is then transmitted back to the SUs. The computation of the channel allocation z_i^t and payment τ_i^t is described as follows. After each SU submits its bid vector, the CSM performs two computations, i.e., the channel allocation and the payment computation. Note that most existing multiuser wireless resource allocation solutions can be modeled as such repeated auctions for resources. If the resources are priced or the users may lie about their resource needs, taxes associated with the resource usage will need to be imposed [14]. Otherwise, these taxes can be considered to be zero throughout the paper. We denote the channel allocation matrix Z^t = [z_ij^t]_{M×N}, with z_ij^t being 1 if channel j is assigned to SU i, and 0 otherwise. The feasible set of channel assignments is denoted as

Z = {Z | Σ_{i=1}^M z_ij = y_j, ∀j; Σ_{j=1}^N z_ij ≤ 1, ∀i; z_ij ∈ {0, 1}}.

The channel allocation matrix without the presence of SU i is denoted Z_{-i} = [z_kj]_{(M-1)×N}, and the corresponding feasible set is

Z_{-i} = {Z_{-i} | Σ_{k=1,k≠i}^M z_kj = y_j, ∀j; Σ_{j=1}^N z_kj ≤ 1, ∀k ≠ i; z_kj ∈ {0, 1}}

where -i = {1, ..., i-1, i+1, ..., M}. During the first phase, the CSM allocates the channels to the SUs based on its adopted fairness rule, e.g., maximizing the total social welfare,(5) as

    Z^{t,opt} = arg max_{Z ∈ Z} Σ_{i=1}^M Σ_{j=1}^N z_ij a_ij^t.    (1)
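For small M and N, the allocation rule and the payment rule of this auction can be checked by brute force. The sketch below is our own illustration (the paper solves these problems via linear programming rather than enumeration, and all function names here are ours): it enumerates injective assignments of the available channels to SUs and computes each SU's payment as the welfare the others obtain with SU i present minus the best welfare they could obtain without it.

```python
from itertools import permutations

def best_assignment(bids, free_channels, users):
    """Brute-force the welfare-maximizing injective assignment of each
    available channel to one of `users`. bids[i][j] is SU i's bid on
    channel j. Returns (welfare, {channel: user})."""
    best_w, best_map = float("-inf"), None
    for perm in permutations(users, len(free_channels)):
        w = sum(bids[i][j] for i, j in zip(perm, free_channels))
        if w > best_w:
            best_w, best_map = w, dict(zip(free_channels, perm))
    return best_w, best_map

def auction(bids, y):
    """One stage of the CSM auction: welfare-maximizing allocation and a
    generalized second-price tax, assuming at least as many SUs as there
    are free channels."""
    users = list(range(len(bids)))
    free = [j for j, yj in enumerate(y) if yj == 1]
    _, alloc = best_assignment(bids, free, users)
    payments = []
    for i in users:
        # welfare the other SUs obtain under the chosen assignment
        with_i = sum(bids[k][j] for j, k in alloc.items() if k != i)
        # best welfare the other SUs could obtain if SU i were absent
        without_i, _ = best_assignment(bids, free, [k for k in users if k != i])
        payments.append(with_i - without_i)   # nonpositive: SU i pays the CSM
    return alloc, payments
```

With a single free channel this reduces to the classic second-price auction: the highest bidder wins and pays the second-highest bid.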
If the resources are priced, we will consider in this paper, for illustration, a second-price auction mechanism [19], [23] for determining the tax that needs to be paid by SU i, based on the above optimal channel assignment Z^{t,opt} = [z_ij^{t,opt}]_{M×N}. This tax is equal to

    τ_i^t = Σ_{k=1,k≠i}^M Σ_{j=1}^N z_kj^{t,opt} a_kj^t - max_{Z_{-i} ∈ Z_{-i}} Σ_{k=1,k≠i}^M Σ_{j=1}^N z_kj a_kj^t.    (2)

Note that when N = 1, the generalized auction mechanism presented above becomes the well-known second-price auction [19]. Although the optimization problems in (1) and (2) are discrete optimizations, they can efficiently be solved using linear programming. As argued in [20], the linear optimization problem can be solved in polynomial time, and hence, the CSM only requires limited computational complexity. The information exchange between the CSM and the SUs is illustrated in Fig. 2. From Fig. 2, we note that, at each stage, the CSM first broadcasts the available TxOps to all the SUs for the auction, and then each SU submits its own bid vector over all the available TxOps. After receiving the bids, the CSM computes the auction results and sends back to the users the channel allocations and the corresponding payments. The signaling required for the auction is most often implemented at the application layer. In the worst case, the amount of

(5) Note that fairness solutions other than maximizing the social welfare could be adopted, and this will not influence our proposed solution.

data communicated between the CSM and the SUs is equal to (M + 1)N + nN bits, where n is the number of bits representing the payment for each SU. The amount of data communicated by each SU to the CSM is n'N bits, where n' is the number of bits representing the bid submitted to the CSM on each channel. Compared with traditional one-stage resource allocation methods, our proposed auction mechanism has the following advantages.

1) Unlike traditional centralized resource allocation methods [30], our proposed auction mechanism is not required to know the SUs' utility functions or preferences, which are often the private information of the users and are not common knowledge. In fact, our auction mechanism only requires the SUs to submit their bid vectors for the available TxOps. The bid vector computation is performed by the SUs, not the CSM, based on their utilities, preferences, action sets, experienced environment characteristics, etc.

2) Unlike traditional decentralized resource allocation methods [28], where multiple iterations are required before convergence, our proposed auction mechanism only requires the SUs to submit the bid vectors once. Hence, our proposed auction mechanism is suitable for online resource management. Moreover, we do not assume, as in [29], that users are price takers and that there is consensus about what is a fair distribution of the resources. Instead, in the proposed framework, users are strategic and are able to determine their own bid vectors for resources based on their knowledge, utilities, preferences, etc.

IV. USER MODELING IN THE STOCHASTIC GAME FRAMEWORK

A. Definition of SU States

As discussed in Section I, each SU needs to cope with two types of uncertainties, i.e., disturbances from the environment and interactions with other SUs. The environment is characterized by the packet arrivals from the source (i.e., source/traffic characterization) connected with the transmitter and the channel conditions. In this section, we will illustrate how these disturbances can be modeled.
However, note that other models of the environment existing in the literature can be adopted. The use of a specific model will only affect the performance of the proposed solution and not the general framework for multiuser interaction proposed in this paper. For illustration, we assume that each SU i maintains a buffer with limited size X_i, which can be interpreted as a time window that specifies which packets are considered for transmission at each time based on their delay deadlines. Expired packets are dropped from the buffer. This model has extensively been used for delay-sensitive data transmission, e.g., the leaky-bucket model for video transmission [25]. The number of packets in the buffer at time slot t is denoted as v_i^t (0 ≤ v_i^t ≤ X_i). We assume that the packets arrive from the source at the beginning of each time slot, i.e., v_i^t is only updated at the beginning of a time slot. The number of packets arriving into the buffer during one time slot is a random variable independent of the time t and denoted as χ_i^t. χ_i^t follows the Poisson distribution with the average arrival rate μ_i packets per second [11]. However, note that the Poisson process is simply used for illustration purposes, and other traffic models (e.g., renewal processes, etc.) can also be used in our framework. The average number of packets arriving during one time slot is equal to μ_i ΔT [11]. The condition of channel j experienced by SU i is represented by the signal-to-noise ratio (SNR) and denoted as ρ_ij^t (in decibels). When y_j^t = 1, we assume that the channel condition of each channel can be represented by a set of discrete SNR values, i.e., ρ_ij^t ∈ {σ_ij^1, ..., σ_ij^K}. Note that the number of discrete SNR values K can be determined by SU i by trading off the complexity (a larger K leads to a larger state space) and the resulting impact on performance. When y_j^t = 0, we set ρ_ij^t equal to ∅, which means that the channel is unavailable to the SUs at that time.
As shown in [24], when y_j^{t+1} = 1, the channel condition (in terms of the SNR) can also be modeled as a finite-state Markov chain, where the transition from channel condition σ_ij^l at time t to channel condition σ_ij^k at time t+1 takes place with probability p_ij^{lk}. These transition probabilities can easily be estimated by SU i by repeatedly interacting with the channel. We denote by p_ij^k the probability that the channel condition is σ_ij^k at time t+1, knowing that y_j^t = 0 and y_j^{t+1} = 1. The probability that the channel condition transitions to ∅, knowing that y_j^{t+1} = 0, is 1, no matter in what condition channel j is at time t. Then, the combination (y_j^t, ρ_ij^t) is still a Markov chain, with the state transition probability as in (3), shown at the bottom of the page. To model the dynamics experienced by SU i at time t in the SN, we define a state s_i^t = (v_i^t, ρ_i^t) ∈ S_i, where ρ_i^t = (ρ_i1^t, ..., ρ_iN^t). The state encapsulates the current buffer state as well as the state of each channel. S_i is the set of possible states.(6) The total number of possible states for SU i is equal to |S_i| = (X_i + 1)(K + 1)^N. We will show later in this paper that the state information is sufficient for SU i to compete for resources (make the bid vector) at the current time.

(6) We assume that the channel state and the transmission buffer evolve independently as time goes by.

    p(y_j^{t+1}, ρ_ij^{t+1} | y_j^t, ρ_ij^t) =
        (1 - p_j^FN) p_ij^{lk},  if y_j^t = 1, ρ_ij^t = σ_ij^l, y_j^{t+1} = 1, ρ_ij^{t+1} = σ_ij^k
        p_j^NF p_ij^k,           if y_j^t = 0, y_j^{t+1} = 1, ρ_ij^{t+1} = σ_ij^k
        p_j^FN,                  if y_j^t = 1, ρ_ij^t = σ_ij^l, y_j^{t+1} = 0
        1 - p_j^NF,              otherwise.    (3)

B. State Transition and Stage Reward

We will now discuss the state transition process. Remember that the state of SU i includes the buffer state v_i^t and the channel state ρ_i^t. In this paper, we assume that the channel state transition is independent of the buffer state transition. In the above, we described the transition of the channel state ρ_i^t and the TxOp y^t. The buffer state transition is determined by the number of packets arriving and the channel allocation z_i^t, as well as the internal action b_i^t during that time slot. The number of packets transmitted at stage t is denoted by N_i(s_i^t, z_i^t, b_i^t). Given the channel allocation, SU i can adapt its own internal action to maximize the number of transmitted packets, i.e.,

    n_i(s_i^t, z_i^t) = max_{b_i^t ∈ B_i} N_i(s_i^t, z_i^t, b_i^t).    (4)

The optimization can be performed by a cross-layer adaptation algorithm, as in [5], [12], and [21]. Since our focus is on the multi-SU interaction, we assume that the internal action will always be performed to maximize the number of transmitted packets. We simply use n_i(s_i^t, z_i^t) to represent the number of transmitted packets and omit the internal actions in the following notations. The evolution of the buffer state is captured by v_i^{t+1} = min{(v_i^t - n_i(s_i^t, z_i^t))^+ + χ_i^t, X_i}. We define h = v_i^{t+1} - (v_i^t - n_i(s_i^t, z_i^t))^+. Based on the packet arrival model, the buffer state transition probability is computed as in (5), shown at the bottom of the page. The state transition combined with the TxOps, given the current resource allocation z_i^t, can be computed as

    q_i(s_i^{t+1}, y^{t+1} | s_i^t, y^t, z_i^t) = p_i^buf(v_i^{t+1} | v_i^t, z_i^t) Π_{j=1}^N p(y_j^{t+1}, ρ_ij^{t+1} | y_j^t, ρ_ij^t)    (6)

where the first term represents the buffer state transition, which is independent of the second term, the channel state transition. Based on the channel allocation z_i^t, the SU transmits the available packets in the buffer. In the next time slot, new packets arrive into the buffer. Newly incoming packets may lead to packets already existing in the buffer being dropped whenever the buffer is full or their delay deadline has passed.
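The truncated-Poisson buffer transition described above can be tabulated directly: all arrival mass that would overflow the buffer is lumped onto the full state X_i. A minimal sketch (the helper names are ours, and `lam` stands for the mean number of arrivals per slot, μ_i ΔT):

```python
import math

def poisson_pmf(k, lam):
    """P(chi = k) for chi ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def buffer_transition(v, n, X, lam):
    """Distribution of the next buffer state v' = min((v - n)^+ + chi, X)
    with chi ~ Poisson(lam). Returns {v': probability}."""
    base = max(v - n, 0)                   # packets left after transmission
    probs = {vp: poisson_pmf(vp - base, lam) for vp in range(base, X)}
    probs[X] = 1.0 - sum(probs.values())   # overflow mass lumped at v' = X
    return probs
```

Note that the returned distribution sums to one by construction: the full-buffer state receives the entire Poisson tail.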
Clearly, the performance of the application (e.g., video quality) improves when fewer packets are lost. Hence, we can interpret the negative value of the number of lost packets as the stage gain, which is denoted by g_i^t, i.e., g_i^t(s_i^t, y^t, z_i^t) = -((v_i^t - n_i(s_i^t, z_i^t))^+ + χ_i^t - X_i)^+. The reward at time t for SU i is expressed using the quasi-linear form r_i^t(s_i^t, ϑ_i^t) = g_i^t + τ_i^t. Note that the gain g_i^t and the payment τ_i^t depend on the states and bids of all the competing SUs in the SN. Hence, the reward is also rewritten as r_i^t(s^t, y^t, a^t).

V. BIDDING STRATEGY FOR PLAYING THE STOCHASTIC GAME

A. Best-Response Bidding Policy

In the SN, we assume that the stochastic game is played by all the SUs for an infinite number of stages. This assumption is reasonable for applications having a long duration, such as video streaming. In our network setting, we define a history of the stochastic game up to time t as h^t = {s^0, y^0, a^0, z^0, τ^0, ..., s^{t-1}, y^{t-1}, a^{t-1}, z^{t-1}, τ^{t-1}, s^t, y^t} ∈ H^t, which summarizes all the previous states, available TxOps, and the actions taken by the SUs, as well as the outcomes at each stage of the auction game, and H^t is the set of all possible histories up to time t. However, during the stochastic game, each SU i cannot observe the entire history but rather part of the history h^t. The observation of SU i is denoted as o_i^t ∈ O_i^t, and o_i^t ⊆ h^t. Note that the current state s_i^t can always be observed, i.e., s_i^t ∈ o_i^t. In this paper, we focus on the external action selection for the SUs. The external action selection for SU i to play the stochastic game is also referred to as a bidding policy π_i^t : O_i^t → A_i for SU i at time t and is defined as a mapping from the observations up to the time t into a specific action, i.e., a_i^t = π_i^t(o_i^t). Furthermore, a policy profile π_i for SU i aggregates the bidding policies about how to play the game over the entire course of the stochastic game, i.e., π_i = (π_i^0, ..., π_i^t, ...). The policy profile for all the SUs at time slot t is denoted as π = (π_1, ..., π_M) = (π_i, π_{-i}).
The policy π_i is said to be Markov if the bidding policy π_i^t is, given the current state s_i^t and the current available TxOp y^t, independent of the states, TxOps, and actions prior to time t, i.e., π_i^t(o_i^t) = π_i^t(s_i^t, y^t). The policy π_i is said to be stationary if π_i^t = π_i for all t. The reward r_i(s^t, y^t, a^t) of the stage t is discounted by the factor (α_i)^t at time t. The factor α_i (0 ≤ α_i < 1) is the discount factor determined by a specific application (for instance, for video streaming applications, this factor can be set based on the tolerable delay). The total discounted sum of rewards Q_i(s, y, π) for SU i can be calculated at time t, starting from the state profile s, assuming that all the SUs deploy stationary and Markov policies π = (π_i, π_{-i}), as in (7), shown at the bottom of the next page. The total discounted sum of rewards in (7) consists of two parts: 1) the current stage reward and 2) the expected future reward discounted by α_i. Note that SU i cannot independently determine the above value without explicitly knowing the policies and states of the other SUs. The SU maximizes the total discounted sum of future rewards to select the bidding policy, which explicitly considers

    p_i^buf(v_i^{t+1} | v_i^t, z_i^t) =
        (μ_i ΔT)^h e^{-μ_i ΔT} / h!,             if 0 ≤ h < X_i - (v_i^t - n_i(s_i^t, z_i^t))^+
        Σ_{k=h}^∞ (μ_i ΔT)^k e^{-μ_i ΔT} / k!,   if h = X_i - (v_i^t - n_i(s_i^t, z_i^t))^+    (5)

the impact of the current bid vector on the expected future rewards. We define the best response β_i for SU i to the other SUs' policies π_{-i} as

    β_i(π_{-i}) = arg max_{π_i} Q_i(s, y, (π_i, π_{-i})).    (8)

The central issue in our stochastic game is how the best-response policies can be determined by the SUs. In the repeated auction mechanism discussed in Section III, the procedure that each SU i follows to compete for the channel opportunities is illustrated in Fig. 3. In this procedure, the bidding strategy π_i^t is continuously improved by the bidding strategy improvement module. In Section V-B, we discuss the challenges involved in building such a module, and in Section VI, we develop a best-response learning algorithm that can be used to improve the bidding strategy.

B. Challenges in Selecting the Best-Response Bidding Policy

Recall that during each time slot, the CSM announces an auction based on the available TxOps, and then the SUs bid for the resources. To enable the successful deployment of this resource auction mechanism, we can prove (similar to our prior work in [21]) that the SUs have no incentive to misrepresent their information, i.e., they adhere to the truth-telling policy. We assume that at each time slot, SU i has a preference u_ij^t over channel j, which captures the benefit derived when using that channel. The preference u_ij^t is interpreted as the benefit obtained by SU i when using channel j compared to the benefit when this channel is not used. Note that this benefit also includes the expected future rewards. The optimal bid a_ij^{t,opt} that SU i can make on channel j at time t is the bid maximizing the net benefit u_ij^t + τ_i^t. In the auction discussed in Section III, the optimal bid that SU i can make is a_ij^{t,opt} = u_ij^t, i.e., the optimal bid for SU i is to announce its true preference to the CSM [21]. The proof is omitted here due to space limitations, since it is similar to that in [21]. The payment made by SU i is computed by the CSM based on the inconvenience incurred by the other SUs due to SU i during that time slot [23].
Next, we define the preference u_ij^t in the context of the stochastic game model. Using channel j, SU i obtains the immediate gain g_i(s_i^t, y^t, e_j) by transmitting the packets in its buffer, where e_j indicates that channel j is allocated to SU i during the current time slot. SU i then moves into the next state s_i', from which it may obtain the future reward Q_i(s', y', π). On the other hand, if no channel is assigned to SU i, it receives the immediate gain g_i(s_i^t, y^t, 0) and then moves into the next state s_i', from which it may obtain the future reward Q_i(s', y', π). We define the feasible set of channel assignments to SU i's opponents, given SU i's channel allocation z_i, as Z_{−i}(z_i), with

$$Z_{-i}(z_i) = \Bigl\{ Z_{-i} \,\Big|\, \textstyle\sum_{k=1, k \neq i}^{M} z_{kj} = y_j - z_{ij}\ \forall j,\ \ \sum_{j=1}^{N} z_{kj} \le 1\ \forall k \neq i,\ \ z_{kj} \in \{0, 1\} \Bigr\}.$$

The preference over the current state can then be computed as

$$u_{ij}^t(s^t, y^t) = \Bigl[ g_i(s_i^t, y^t, e_j) + \alpha_i \sum_{s' \in S} \sum_{y' \in \{0,1\}^N} q_i(s_i', y' \mid s_i^t, y^t, e_j) \sum_{Z_{-i} \in Z_{-i}(e_j)} \Bigl[ \prod_{k=1, k \neq i}^{M} q_k(s_k', y' \mid s_k^t, y^t, z_k) \Bigr] Q_i(s', y', \pi) \Bigr] - \Bigl[ g_i(s_i^t, y^t, 0) + \alpha_i \sum_{s' \in S} \sum_{y' \in \{0,1\}^N} q_i(s_i', y' \mid s_i^t, y^t, 0) \sum_{Z_{-i} \in Z_{-i}(0)} \Bigl[ \prod_{k=1, k \neq i}^{M} q_k(s_k', y' \mid s_k^t, y^t, z_k) \Bigr] Q_i(s', y', \pi) \Bigr]. \quad (9)$$

$$Q_i(s, y, \pi) = \sum_{t=0}^{\infty} (\alpha_i)^t\, r_i\bigl(s^t, y^t, \pi(s^t, y^t)\bigr) = \underbrace{r_i\bigl(s, y, \pi(s, y)\bigr)}_{\text{stage reward at time } t} + \underbrace{\alpha_i \sum_{s' \in S} \sum_{y' \in \{0,1\}^N} \Bigl[ \prod_{k=1}^{M} q_k\bigl(s_k', y' \mid s_k, y, z_k(\pi(s, y))\bigr) \Bigr] Q_i(s', y', \pi)}_{\text{expected future reward}} = \underbrace{g_i\bigl(s_i, y, z_i(\pi(s, y))\bigr) + \tau_i\bigl(\pi(s, y)\bigr)}_{\text{stage reward at time } t} + \underbrace{\alpha_i \sum_{s' \in S} \sum_{y' \in \{0,1\}^N} \Bigl[ \prod_{k=1}^{M} q_k\bigl(s_k', y' \mid s_k, y, z_k(\pi(s, y))\bigr) \Bigr] Q_i(s', y', \pi)}_{\text{expected future reward}} \quad (7)$$

Fig. 3. Procedure for SU i to play the auction game at time slot t.

From this equation, it is clear that the true value u_ij^t depends not only on SU i's own current state s_i^t but also on the other SUs' states s_{−i}^t, the channel allocations Z_{−i}(e_j) to the other users when channel j is assigned to SU i, the allocations Z_{−i}(0) when SU i is not assigned any channel, and the state transition models q_k(s_k', y' | s_k^t, y^t, z_k). However, the other SUs' states, the channel allocations, and the state transition models of the other SUs are not known to SU i, and it is thus impossible for each SU to determine its preference u_ij^t(s^t, y^t). Without knowing the other SUs' states and state transition models, SU i cannot derive its optimal bidding strategy a_ij^{t,opt} = u_ij^t(s^t, y^t). Suppose instead that SU i chooses the bid vector by only maximizing the immediate reward g_i^t + τ_i^t, i.e., the total discounted sum of rewards degenerates into Q_i(s, y, π) = g_i(s_i, y, z_i(π(s, y))) + τ_i(π(s, y)) by setting α_i = 0. Then, the preference over channel j becomes u_ij^t(s^t, y^t) = g_i(s_i^t, y^t, e_j) − g_i(s_i^t, y^t, 0). Now, since u_ij^t only depends on the state s_i^t, SU i can compute both the optimal bid vector and the optimal bidding policy. We refer to this optimal bidding policy as the myopic policy, denoted π_i^myopic, since it only takes the immediate reward into consideration and ignores the future impact. To solve the difficult problem of optimal bidding policy selection when α_i ≠ 0, an SU needs to forecast the impact of its current bidding actions on the expected future rewards discounted by α_i. This forecast can be performed by learning from past experience.

VI. INTERACTIVE LEARNING FOR PLAYING THE RESOURCE MANAGEMENT GAME

A. How to Evaluate Learning Algorithms?

Section V-B shows that an SU needs to know the other SUs' states and state transition models to derive its own optimal bidding policy. This coupling among the SUs is due to the shared nature of wireless resources. However, an SU cannot exactly know the other SUs' models and private information in wireless networks.
Thus, to improve the bidding policy, an SU can only predict the impacts of the dynamics (uncertainties) caused by the competing SUs based on its observations from past auctions. In this paper, we propose a learning algorithm for predicting these impacts. We define a learning algorithm L_i for SU i as a function taking the observation o_i^t as input and producing the bidding policy π_i^t as output.
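In code, such a learning algorithm is simply a map from the observation history to a bidding policy. The sketch below (all class and function names are ours, not from the paper) shows this interface, with the myopic policy of Section V-B as a trivial instance that learns nothing and bids the immediate gain difference g_i(e_j) − g_i(0):

```python
class LearningAlgorithm:
    """Interface for L_i: observation history o_i^t -> bidding policy pi_i^t."""

    def update(self, observation):
        """Incorporate one slot's observation (state, TxOps, bid, allocation, tax)."""
        raise NotImplementedError

    def policy(self, state, txops):
        """Return a bid vector over the N channels."""
        raise NotImplementedError


class MyopicBidder(LearningAlgorithm):
    """Baseline: bid the immediate gain difference, ignoring future rewards."""

    def __init__(self, immediate_gain):
        # immediate_gain(state, txops, j) -> gain when channel j is used
        # (j is None for "no channel assigned")
        self.g = immediate_gain

    def update(self, observation):
        pass  # the myopic policy does not learn

    def policy(self, state, txops):
        # u_ij = g_i(s_i, y, e_j) - g_i(s_i, y, 0) for each channel j
        return [self.g(state, txops, j) - self.g(state, txops, None)
                for j in range(len(txops))]


# Toy gain: a channel is worth 1 exactly when it is available.
bidder = MyopicBidder(lambda s, y, j: 0.0 if j is None else float(y[j]))
print(bidder.policy(0, [1, 0, 1]))  # -> [1.0, 0.0, 1.0]
```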

Before developing a learning algorithm, we first discuss how to evaluate the performance of a learning algorithm in terms of its impact on the SU's reward. Unlike existing multiagent learning research, which aims at achieving convergence to an equilibrium point for the interacting agents, we develop learning algorithms based on the performance of the bidding strategy in terms of the SU's reward. We denote a bidding policy generated by the learning algorithm L_i as π_i^{L_i}. An SU will learn to improve its bidding policy and its rewards from participating in the auction game. The performance of the bidding strategy π_i is defined as the time-average reward that SU i obtains in a time window of length T when it adopts π_i, i.e.,

$$V^{\pi_i}(T) = \frac{1}{T} \sum_{t=1}^{T} r_i^t. \quad (10)$$

Using this definition, the performance of two learning algorithms can easily be compared. For instance, given two algorithms L_i and L_i', if V^{π_i^{L_i}} > V^{π_i^{L_i'}}, then we say that the learning algorithm L_i is better than L_i'.

B. What Information to Learn From?

First, let us consider what information the SU can observe while playing the stochastic game in our SN. As shown in Fig. 1, at the beginning of time slot t, the SUs submit the bids a_i^t. Then, the CSM returns the channel allocations z_i^t and taxes τ_i^t. If SU i is not allowed to observe the bids, channel allocations, and payments of the other SUs, then the observation of SU i becomes o_i^t = {s_i^0, y^0, a_i^0, z_i^0, τ_i^0, ..., s_i^{t−1}, y^{t−1}, a_i^{t−1}, z_i^{t−1}, τ_i^{t−1}, s_i^t, y^t}. If the information is exchanged among the SUs or broadcast and overheard by all SUs, the information observed by SU i becomes o_i^t = {s_i^0, y^0, a^0, z^0, τ^0, ..., s_i^{t−1}, y^{t−1}, a^{t−1}, z^{t−1}, τ^{t−1}, s_i^t, y^t}. Now, the problem that needs to be solved by SU i is how it can improve its own policy for playing the game by learning from the observation o_i^t. In this paper, we assume that SU i observes the information o_i^t = {s_i^0, y^0, a_i^0, z_i^0, τ_i^0, ..., s_i^{t−1}, y^{t−1}, a_i^{t−1}, z_i^{t−1}, τ_i^{t−1}, s_i^t, y^t}.

C. What to Learn?
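The comparison rule above is a direct computation on the reward trace; a minimal sketch (function names are ours):

```python
def average_reward(rewards):
    """Time-average reward V^{pi_i}(T) = (1/T) * sum_t r_i^t, as in Eq. (10)."""
    if not rewards:
        raise ValueError("empty reward trace")
    return sum(rewards) / len(rewards)


def better_algorithm(rewards_a, rewards_b):
    """Compare two learning algorithms by the time-average reward of their policies."""
    return "A" if average_reward(rewards_a) > average_reward(rewards_b) else "B"


# Example reward traces over a window of T = 4 time slots.
print(average_reward([1.0, 0.5, 0.75, 0.75]))        # -> 0.75
print(better_algorithm([1.0, 1.0], [0.0, 0.5]))      # -> A
```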
In Section VI-A, we introduced learning as a tool to predict the impacts of the dynamics and, hence, improve the bidding policy. However, a key question is what needs to be learned. Recall that the optimal bidding policy for SU i is to generate a bid vector that represents its preferences for using the different channels. From (9), we can see that SU i needs to learn the following: 1) the state space of the other SUs, i.e., S_{−i}; 2) the current state of the other SUs, i.e., s_{−i}^t; 3) the transition probabilities of the other SUs, i.e., Π_{k≠i} q_k(s_k', y' | s_k^t, y^t, z_k); 4) the resource allocations Z_{−i}(e_j) and Z_{−i}(0); and 5) the discounted sum of rewards Q_i(s, y, π). However, SU i can only observe the information o_i^t = {s_i^0, y^0, a_i^0, z_i^0, τ_i^0, ..., s_i^{t−1}, y^{t−1}, a_i^{t−1}, z_i^{t−1}, τ_i^{t−1}, s_i^t, y^t}, from which it cannot accurately infer the other SUs' state space and transition probabilities. Moreover, capturing the exact information about the other SUs requires heavy computational and storage complexity. Instead, we allow SU i to classify the space S_{−i} into H_i classes, each of which is represented by a representative state s̃_{−i,h}, h ∈ {1, ..., H_i}. We discuss how the space S_{−i} is decomposed in Section VI-D. By dividing the state space S_{−i}, the transition probability Π_{k≠i} q_k(s_k', y' | s_k^t, y^t, z_k) is approximated by q̃_{−i}(s̃_{−i}', y' | s̃_{−i}, y, z_i), where s̃_{−i} and s̃_{−i}' are the representative states of the classes to which s_{−i} and s_{−i}' belong. This approximation is performed by aggregating all the other SUs' states into one representative state and assuming that the transition depends on the resource allocation z_i. The transition probability approximation is also discussed in Section VI-D. The discounted sum of rewards Q_i(s, y, π) is approximated by V_i((s_i, s̃_{−i}), y). Note that the classification of the state space S_{−i} and the approximation of the transition probability and the discounted sum of rewards affect the learning performance. Hence, a user can trade off increased complexity for increased performance.
After the classification, the preference computation can be approximated as

$$u_{ij}^t\bigl((s_i, \tilde{s}_{-i}), y\bigr) = \Bigl[ g_i(s_i, y, e_j) + \alpha_i \sum_{(s_i', \tilde{s}_{-i}') \in (S_i, \tilde{S}_{-i})} \sum_{y' \in \{0,1\}^N} q_i(s_i', y' \mid s_i, y, e_j)\, \tilde{q}_{-i}(\tilde{s}_{-i}', y' \mid \tilde{s}_{-i}, y, e_j)\, V_i\bigl((s_i', \tilde{s}_{-i}'), y'\bigr) \Bigr] - \Bigl[ g_i(s_i, y, 0) + \alpha_i \sum_{(s_i', \tilde{s}_{-i}') \in (S_i, \tilde{S}_{-i})} \sum_{y' \in \{0,1\}^N} q_i(s_i', y' \mid s_i, y, 0)\, \tilde{q}_{-i}(\tilde{s}_{-i}', y' \mid \tilde{s}_{-i}, y, 0)\, V_i\bigl((s_i', \tilde{s}_{-i}'), y'\bigr) \Bigr]. \quad (11)$$

In this setting, to find the approximated preference and, thus, the approximated optimal bidding policy, we need to learn the following from past observations: 1) how the space S_{−i} is classified; 2) the transition probability q̃_{−i}(s̃_{−i}', y' | s̃_{−i}, y, z_i); and 3) the approximated future rewards V_i((s_i, s̃_{−i}), y).

D. How to Learn?

In this section, we develop a learning algorithm to estimate the terms listed in Section VI-C.

1) Decomposition of the Space S_{−i}: As discussed in Section VI-B, only o_i^t = {s_i^0, y^0, a_i^0, z_i^0, τ_i^0, ..., s_i^{t−1}, y^{t−1}, a_i^{t−1}, z_i^{t−1}, τ_i^{t−1}, s_i^t, y^t} is observed. From the auction mechanism presented in Section III, we know that the value of

the tax τ_i^t is computed based on the inconvenience that SU i causes to the other SUs. In other words, a higher value of τ_i^t indicates that the network is more congested.^7 Based on the bid vector a_i^t, the channel allocation z_i^t, and the tax τ_i^t, SU i can infer the network congestion and, thus, indirectly, the resource requirements of the competing SUs. Instead of knowing the exact state space of the other SUs, SU i can classify the space S_{−i} as follows. We assume that the maximum absolute tax is Γ. We split the range [0, Γ] into [Γ_0, Γ_1), [Γ_1, Γ_2), ..., [Γ_{H_i−1}, Γ_{H_i}], with 0 = Γ_0 ≤ Γ_1 ≤ ... ≤ Γ_{H_i} = Γ. Here, we assume that the values of {Γ_1, ..., Γ_{H_i−1}} are equally spaced in the range [0, Γ]. (Note that more sophisticated selections of these values can be deployed, and this forms an interesting area for future research.) We need to consider three cases to determine the representative state s̃_{−i}^t at time t.

1) If the resource allocation z_i^t ≠ 0, then the representative state of the other SUs is chosen as

$$\tilde{s}_{-i}^t = h, \quad \text{if } |\tau_i^t| \in [\Gamma_{h-1}, \Gamma_h). \quad (12)$$

2) If the resource allocation z_i^t = 0 but y^t ≠ 0, the tax is 0. In this case, we cannot use the tax to predict the network congestion. However, we can infer that the congestion is more severe than the minimum bid for the available channels, i.e., min_{j∈{l: y_l^t ≠ 0}} {a_ij^t}. This is because, in the current stage of the auction game, only an SU i' with a_{i'j}^t ≥ a_ij^t can obtain channel j, which indicates that |τ_{i'}^t| ≥ min_{j∈{l: y_l^t ≠ 0}} {a_ij^t} if SU i' is allocated any channel. Then, the representative state of the other SUs is chosen as

$$\tilde{s}_{-i}^t = h, \quad \text{if } \min_{j \in \{l:\, y_l^t \neq 0\}} \{a_{ij}^t\} \in [\Gamma_{h-1}, \Gamma_h). \quad (13)$$

3) If the resource allocation z_i^t = 0 and y^t = 0, there is no interaction among the SUs in this time slot. Hence, s̃_{−i}^t = s̃_{−i}^{t−1}.

^7 When the CSM deploys a mechanism without a tax for resource management, the space classification for the other SUs can still be performed based on the announced information and the corresponding resource allocation.

2) Estimating the Transition Probability: To estimate the transition probability, SU i maintains a table F of size H_i × H_i × (N + 1).
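The three-case classification above can be sketched as follows. This is a toy illustration (all names and the clamping behavior are ours): the interval endpoints Γ_h are equally spaced, and the observed tax, allocation, TxOp vector, and bids are mapped to a representative state h ∈ {1, ..., H}:

```python
def representative_state(tax, z_i, y, bids, prev_state, Gamma=10.0, H=5):
    """Map one slot's observation to a representative opponent state h in {1,...,H}.

    Cases follow (12)-(13): if the SU won a channel, classify by |tax|;
    if channels were available but none was won, classify by the minimum
    bid on the available channels; otherwise keep the previous state.
    """
    width = Gamma / H  # equally spaced interval endpoints Gamma_0 ... Gamma_H

    def interval(v):
        # Clamp v into [0, Gamma] and return its 1-based interval index.
        v = min(max(v, 0.0), Gamma)
        return min(int(v // width) + 1, H)

    if any(z_i):                        # case 1: some channel was allocated
        return interval(abs(tax))
    if any(y):                          # case 2: channels available, none won
        min_bid = min(b for b, avail in zip(bids, y) if avail)
        return interval(min_bid)
    return prev_state                   # case 3: no interaction this slot


# Case 1: tax 3.5 with Gamma=10, H=5 falls in [2, 4) -> state 2.
print(representative_state(3.5, [1, 0], [1, 1], [2.0, 4.0], prev_state=1))  # -> 2
```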
Each entry f_{h,h',j} in the table F represents the number of transitions from state s̃_{−i} = h to state s̃_{−i}' = h' when the resource allocation is z_i = e_j (or 0 if j = 0). It is clear that H_i will significantly influence the complexity, memory requirements, etc., of SU i. The update of F is simply based on the observation o_i^t and the state classification in the above section. Then, we use the frequency to approximate the transition probability [15], i.e.,

$$\tilde{q}_{-i}\bigl(\tilde{s}_{-i}' = h' \mid \tilde{s}_{-i} = h, e_j\bigr) = \frac{f_{h,h',j}}{\sum_{h''} f_{h,h'',j}}. \quad (14)$$

3) Learning the Future Reward: By classifying the state space S_{−i} and estimating the transition probability, SU i can now forecast the value of the average future reward V_i((s_i, s̃_{−i}), y) using learning. Equation (7) can be approximated by (15), shown at the bottom of the page. Similar to the Q-learning established in [17], we also use the received rewards to update the estimate of the future rewards. However, the main difference between our proposed algorithm and Q-learning is that our solution explicitly considers the impacts of the other SUs' bidding actions through the state classification and the transition probability approximation. We use a 3-D table to store the values V_i((s_i, s̃_{−i}), y), with s_i ∈ S_i and s̃_{−i} ∈ S̃_{−i}. The total number of entries in V_i is |S_i| · H_i · 2^N. SU i updates the value of V_i((s_i, s̃_{−i}), y) at time t according to the rule in (16), shown at the bottom of the page, where γ_i^t ∈ [0, 1) is a learning rate factor satisfying Σ_{t=1}^∞ γ_i^t = ∞ and Σ_{t=1}^∞ (γ_i^t)^2 < ∞ [17]. In summary, the learning procedure developed for an SU is shown in Table I.

E. Complexity of Learning

In Section III, we discussed the computational complexity incurred by the CSM and the communication cost between the CSM and the SUs. In this section, we further quantify the complexity of learning in terms of the computational and storage burden. We use a floating-point operation (flop) as a measure of complexity, which will provide us an estimate of
$$Q_i\bigl((s_i^t, \tilde{s}_{-i}^t), y^t, \pi\bigr) = g_i\bigl(s_i^t, y^t, z_i(\pi(s^t, y^t))\bigr) + \tau_i\bigl(\pi(s^t, y^t)\bigr) + \alpha_i \sum_{(s_i', \tilde{s}_{-i}') \in (S_i, \tilde{S}_{-i})} \sum_{y' \in \{0,1\}^N} q_i\bigl(s_i', y' \mid s_i^t, y^t, z_i(\pi(s^t, y^t))\bigr)\, \tilde{q}_{-i}\bigl(\tilde{s}_{-i}', y' \mid \tilde{s}_{-i}^t, y^t, z_i(\pi(s^t, y^t))\bigr)\, V_i\bigl((s_i', \tilde{s}_{-i}'), y'\bigr) \quad (15)$$

$$V_i^t\bigl((s_i, \tilde{s}_{-i}), y\bigr) = \begin{cases} (1 - \gamma_i^t)\, V_i^{t-1}\bigl((s_i, \tilde{s}_{-i}), y\bigr) + \gamma_i^t\, Q_i\bigl((s_i, \tilde{s}_{-i}), y, \pi\bigr), & \text{if } (s_i, \tilde{s}_{-i}) = (s_i^t, \tilde{s}_{-i}^t),\ y = y^t \\ V_i^{t-1}\bigl((s_i, \tilde{s}_{-i}), y\bigr), & \text{otherwise} \end{cases} \quad (16)$$
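The frequency-based transition estimate (14) and the stochastic-approximation value update (16) can be sketched together. This is a simplified illustration (names are ours; the paper's V_i table is indexed by the richer tuple ((s_i, s̃_{−i}), y), collapsed here into an opaque key):

```python
class OpponentModel:
    """Frequency table f[h][h'][j] and the transition estimate of Eq. (14)."""

    def __init__(self, H, N):
        # H representative states, N channels plus the "no channel" index j = 0.
        self.f = [[[0] * (N + 1) for _ in range(H)] for _ in range(H)]

    def observe(self, h, h_next, j):
        """Count one observed transition h -> h_next under allocation e_j."""
        self.f[h][h_next][j] += 1

    def prob(self, h, h_next, j):
        """q(h' | h, e_j) = f[h][h'][j] / sum_h'' f[h][h''][j] (0 if unseen)."""
        total = sum(self.f[h][hp][j] for hp in range(len(self.f)))
        return self.f[h][h_next][j] / total if total else 0.0


def update_value(V, key, q_value, gamma):
    """One step of the update in Eq. (16): V <- (1 - gamma) V + gamma Q."""
    V[key] = (1 - gamma) * V.get(key, 0.0) + gamma * q_value
    return V[key]


m = OpponentModel(H=3, N=2)
m.observe(0, 1, 1)
m.observe(0, 1, 1)
m.observe(0, 2, 1)
print(m.prob(0, 1, 1))                 # -> 2/3, about 0.667
V = {}
print(update_value(V, ("s", "y"), 10.0, 0.5))  # -> 5.0
```

Only the visited entry is updated each slot; with a decaying learning rate γ_i^t, repeated updates average the sampled Q values, mirroring (16).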

TABLE I: LEARNING PROCEDURE

Fig. 4. Bidding strategies based on the required information.

the computational complexity required to perform the learning algorithm. In addition, based on this, we can determine how the complexity grows with an increasing number of SUs [20]. At each stage, the SU performs the classification of the other SUs' states, which, in the worst case, requires approximately N flops. The number of flops to estimate the transition probability of the other SUs' states as in (14) is approximately H_i + 1. The number of flops to learn the future reward is approximately 2|S_i|H_i + 6. Therefore, the total number of flops incurred by the SU is N + H_i + 2|S_i|H_i + 7, from which we note that the complexity of learning for each SU is proportional to the number of possible states of that SU and the number of classes into which the other SUs' state space is decomposed. To perform the learning algorithm, the SU needs to store two tables (i.e., the transition probability table and the state value table), which, in total, have H_i^2(N + 1) + 2^N |S_i| H_i entries. The storage complexity is thus also proportional to the number of possible states of that SU and the number of classes into which the other SUs' state space is decomposed.

VII. SIMULATION RESULTS

In this section, we aim at quantifying the performance of our proposed stochastic interaction and learning framework. We assume that the SUs compete for available spectrum opportunities to transmit delay-sensitive multimedia data. First, we compare the performance of various bidding strategies. Next, we quantify the performance of our proposed learning algorithm in various network environments. We only present several illustrative examples here; the same observations can be obtained using a larger number of SUs or channels.

A. Various Bidding Strategies for Dynamic Multiuser Interaction

In this section, we highlight the merits of the stochastic game framework proposed in Section II by comparing the performance of different SUs deploying different bidding strategies.
The SUs are required to submit a bid vector for the available channels and can deploy different bidding strategies to generate it.

1) Fixed bidding strategy π_i^fixed: This strategy generates a constant bid vector during each stage of the auction game, irrespective of the state that SU i is currently in and of the states the other SUs are in. In other words, π_i^fixed does not consider any of the dynamics defined in Section IV.

2) Source-aware bidding strategy π_i^source: This strategy generates various bid vectors by considering the dynamics in the source characteristics (based on the current buffer state) but not the channel dynamics.

3) Myopic bidding strategy π_i^myopic: This strategy takes into account the disturbance due to the environment as well as the impact caused by the other SUs, as discussed in Section V-B. However, it does not consider the impact on future rewards.

4) Bidding strategy based on best-response learning π_i^{L_i}: This strategy is produced using the learning algorithm proposed in Section VI. π_i^{L_i} considers the two types of dynamics defined in Section IV and the impact of the interaction on future rewards.

In terms of the required information, the above bidding strategies are illustrated in Fig. 4. For instance, the fixed bidding strategy π_i^fixed does not require information about SU i's state or the other SUs' states. The source-aware bidding strategy π_i^source considers

TABLE II: PERFORMANCE OF SU 1 AND SU 2 WITH VARIOUS BIDDING STRATEGIES IN THE TWO-SU NETWORK

Fig. 5. Accumulated packet loss and cost of SU 1 in the five scenarios. (a) Accumulated packet loss over the time slots. (b) Accumulated cost over the time slots.

the source characteristics based on the current buffer state. In contrast, the myopic bidding strategy π_i^myopic requires full information about SU i's state, and the bidding strategy based on best-response learning π_i^{L_i} additionally requires information about the states of the other SUs. In this simulation, we consider an SN that is an extension of WLANs with spectrum-agile capability [9]. In the following, we first simulate the case in which two SUs compete for the channel opportunities and then extend it to the case with multiple (five) SUs.

1) Competition Among Two SUs for Channel Opportunities: We first consider a simple illustrative network with two SUs competing for the available TxOps. The packet arrivals of the SUs are modeled using a Poisson process with the same average arrival rate of 1 Mb/s. For simplicity of illustration, the channel condition of SU 1 (SU 2) on each channel takes only three values (K = 3), which are 18, 23, and 26 dB. The transition probabilities are p_ij^{0→1} = p_ij^{0→2} = 0.4, p_ij^{0→3} = 0.2, p_{1j}^{l→1} = p_{1j}^{l→2} = 0.4, and p_{1j}^{l→3} = 0.2, ∀i, j, l. The transition probability of the availability of the channels to the SUs is p_j^{N→F} = p_j^{F→N} = 0.5. For simplicity of illustration, the environment parameters experienced by the two SUs are the same. The length of the time slot ΔT is 10^{−2} s. In this simulation, we consider five scenarios. In scenario 1, both SU 1 and SU 2 deploy the fixed bidding strategy π_i^fixed. In scenarios 2–5, SU 1 deploys the fixed bidding strategy π_1^fixed, the source-aware bidding strategy π_1^source, the myopic bidding strategy π_1^myopic, and the best-response learning-based bidding strategy π_1^{L_1}, respectively, and SU 2 always deploys the myopic bidding strategy π_2^myopic. The discount factor for the best-response learning algorithm is set to 0.8.
As discussed in Section IV-B, the stage reward is defined as r_i^t = −(g_i^t + τ_i^t), with g_i^t (τ_i^t) being the number of packets lost (the tax charged by the CSM); note that τ_i^t ≥ 0 here. This can be interpreted as the cost incurred at each stage. Similar to (10), we use the average cost over a time window T = 1000 to evaluate the performance of the bidding strategies; hence, the lower the average cost, the better the performance of the bidding strategy. The packet loss rate, average tax, and cost per time slot are presented in Table II. The accumulated packet loss and cost of SU 1 for the five scenarios are plotted in Fig. 5(a) and (b), respectively. From this simulation, comparing scenario 2 with scenario 1, we observe that when SU 2 deploys the myopic strategy against SU 1, which adopted the fixed bidding strategy, SU 2 reduces its average cost by around 42% and its average packet loss rate by around 16.6%. This significant improvement arises because SU 2 can more accurately value the channel opportunities by modeling and considering the dynamics it experiences, i.e., the source characteristics, channel conditions, and channel availability. In scenario 3, SU 1 improves its bidding strategy (i.e., it now deploys a source-aware bidding strategy) by partially considering its experienced environment, i.e., SU 1 generates its bid vector by considering only the source dynamics through

TABLE III: PERFORMANCE OF SU 1–5 WITH VARIOUS BIDDING STRATEGIES IN THE FIVE-SU NETWORK

its current buffer state. Compared with scenario 2, if SU 1 considers more information about its own state, it can further reduce its packet loss rate by an average of 4.5% and its average cost by around 5.4%. This observation verifies that information about the SU's own state improves the bidding strategy. In scenario 4, SU 1 deploys a myopic bidding strategy, which is more advanced than the source-aware bidding strategy since it considers both types of dynamics defined in Section IV (including the dynamics of the source characteristics, channel conditions, and channel availability, and the interaction with the other SUs in the auction mechanism). The significant improvement in terms of packet loss rate (reduced by 13%) and average cost (reduced by 25%), compared with scenario 2, indicates that the myopic bidding strategy provides the optimal bid vector when only current benefits are considered, as shown in Section V-B. In scenario 5, SU 1 further improves its bidding strategy using the best-response learning algorithm developed in Section VI. Using learning, SU 1 reduces the packet loss rate to 15.14% and the average cost to …% lower compared with scenario 4. This significant improvement is due to the ability of the SU to learn and forecast the future impact of its current actions. It is also worth noting that the reduction in SU 1's packet loss rate in scenarios 2–5 comes from two sources: one is the more advanced bidding strategies, which allow the SU to take into consideration more information about its own states and the other SUs' states and, based on this, better forecast the impact of various actions; the other is the increase in the amount of resources consumed by SU 1, which corresponds to a higher tax charged by the CSM, as shown in Table II. We further note that the bidding strategy deployed by SU 1 affects the performance of SU 2.
For example, comparing scenario 2 with scenario 4, the fixed bidding strategy of SU 1 in scenario 2 leads to a lower average cost (reduced by 15%) for SU 2. This is because SU 1 uses a fixed bidding strategy, which does not account for the dynamic changes in its environment, while SU 2 minimizes its current cost (the number of packets lost plus the tax) based on its current state. However, when comparing scenario 5 with scenario 4, SU 1, using learning, not only improves its prediction of the current environment dynamics but also better predicts the impact on the future cost based on its observations. This improvement leads to a higher resource allocation (and, hence, a higher tax, see Table II) for SU 1, thereby resulting in worse performance for SU 2 (i.e., its average cost is increased by 22.2%).

2) Multiple SUs Competing for Channel Opportunities: In this simulation, we consider five SUs competing for the available TxOps in the WLAN-like SN. The packet arrivals of all five SUs are modeled using a Poisson process with the same average arrival rate of 1 Mb/s. The number of channels is 3, and the channel condition of each of the five SUs on each channel takes only three values (K = 3), which are 18, 23, and 26 dB. The transition probabilities are p_ij^{0→1} = p_ij^{0→2} = 0.4, p_ij^{0→3} = 0.2, p_{1j}^{l→1} = p_{1j}^{l→2} = 0.4, and p_{1j}^{l→3} = 0.2, ∀i, j, l. The parameters of the model of the availability of the channels to the SUs are p_j^{N→F} = 0.7 and p_j^{F→N} = 0.3. The length of the time slot ΔT is also 10^{−2} s. Similar parameters are used for the five SUs to clearly illustrate the performance differences obtained with the different strategies. In this simulation, we consider only two scenarios. In scenario 1, all SUs deploy the myopic bidding strategy π_i^myopic, i = 1, 2, ..., 5, whereas in scenario 2, SU 5 deploys the multiuser learning-based bidding strategy π_5^{L_5} with a discount factor of 0.5, and the other SUs deploy the myopic bidding strategy π_i^myopic, i = 1, ..., 4. The packet loss rate and cost per time slot incurred by the SUs are presented in Table III.
The accumulated packet loss and cost of SU 5 for the two scenarios are plotted in Fig. 6(a) and (b), respectively. Similar to the two-SU network, SU 5 significantly reduces its packet loss rate (by 14.6%) and its average cost (by 16.1%) by adopting the best-response learning-based bidding strategy. Fig. 6(a) and (b) further verifies the improvement in performance for SU 5. However, the other SUs' performances decrease, since they now need to compete against a learning SU (i.e., SU 5), which is able to make better bids for the available resources.

B. Multiuser Learning and Delay Impact in a Wireless Test Bed

To validate the performance of multiuser learning and the impact of various delays in a realistic network setting, we considered two SUs competing for the available TxOps in our 802.11a-enabled wireless test bed [31]. The channel condition experienced by the SUs varied between 10 and 30 dB, and we represented this variation using ten states (K = 10). The parameters of the TxOp model are p_j^{N→F} = 0.6 and p_j^{F→N} = 0.4. The length of the time slot ΔT is also 10^{−2} s. The SUs stream delay-sensitive video traffic (e.g., the Mobile sequence encoded using an H.264 video encoder) to their own destinations with an average data rate of 1.5 Mb/s. We compare three scenarios. In scenario 1, both SUs deploy the myopic bidding strategy π_i^myopic, i = 1, 2. In scenario 2, SU 1 deploys the learning-based bidding strategy π_1^{L_1} with a discount factor of 0.5, and SU 2 deploys the myopic strategy π_2^myopic. In scenario 3, both SUs deploy the learning-based bidding strategy π_i^{L_i}, i = 1, 2. In these three scenarios, the video applications are

Fig. 6. Accumulated packet loss and cost of SU 5 in the two scenarios. (a) Accumulated packet loss over the time slots. (b) Accumulated cost over the time slots.

TABLE IV: PERFORMANCE OF SU 1 AND SU 2 WITH VARIOUS BIDDING STRATEGIES IN THE MORE REALISTIC NETWORK

considered to tolerate a delay^8 of 533 ms, which is used in some real-time video streaming applications. In scenario 4, SU 1 deploys the learning-based bidding strategy π_1^{L_1} with a discount factor of 0.5, and SU 2 deploys the myopic strategy π_2^myopic. However, in this scenario, SU 1 streams a video sequence that can only tolerate a delay of 266 ms, which is typical for video conferencing applications. Table IV shows the average video quality in terms of peak SNR (PSNR)^9 and the incurred cost for both SUs under the various scenarios. Comparing scenario 2 with scenario 1, we observe that the SU using the learning-based bidding strategy improves the received video quality by 2.2 dB and reduces the incurred cost by 9.3%. However, as the performance of SU 1 improves, this also results in worse performance for SU 2. This observation is similar to the results in Section VII-A1 and has the same explanation. In scenario 3, both SUs deploy the learning-based bidding strategies and are able to better predict the impact of their current bidding actions on the future cost based on their observations. Thus, compared with scenario 1, the performance of both SUs improves: SU 1 (SU 2) gains 1 dB (1.2 dB) in terms of PSNR and reduces its cost by 4.3% (4.0%). Compared with scenario 2, if SU 2 also deploys the learning-based approach, then SU 2 also observes its estimated future reward and will increase its bid, thereby reducing the performance of SU 1.

^8 During the simulations, for simplicity, we assume that the packets within one Group of Pictures (GOP) have the same delay deadline.

^9 PSNR is a widely adopted metric to objectively measure video quality. A PSNR difference of 1 dB is significant and can be seen by an untrained human observer.
From Table IV, we note that the PSNR of SU 1 decreases by 1.2 dB, whereas the PSNR of SU 2 increases by 2 dB. We also observe that the cost of SU 1 increases by around 5.6%, whereas the cost of SU 2 decreases by 9.1%. In scenario 4, since SU 1 streams a video application with a lower delay deadline, it has to bid more to ensure that packets with a stringent delay deadline are transmitted to the destination; hence, SU 1 incurs a higher transmission cost (increased by 41%) compared with scenario 2. Although SU 1 bids more for the limited available resources, the video quality of SU 1 is reduced by 1.8 dB due to its stringent delay deadline. Interestingly, the stringent delay deadline of SU 1's application also increases the transmission cost of SU 2 and reduces its video quality. This is because the higher bid of SU 1 on the limited resources automatically increases the bid of SU 2.

C. Learning With Imperfect Information

In this section, we consider that SU 1 deploys the learning-based bidding strategy and SU 2 deploys the myopic strategy. The environment parameters are the same as in Section VII-B. To quantify the impact of imperfect information about the environment on the SUs' performance, we assume that SU 1 uses TxOp transition probabilities (p_j^{N→F} = 0.55 and p_j^{F→N} = 0.45) that are slightly different from the true ones (i.e., p_j^{N→F} = 0.6 and p_j^{F→N} = 0.4). Table V shows the PSNRs and corresponding costs of both SUs when SU 1 has perfect or imperfect information about the TxOps. From Table V, we observe that an inaccurate model of the TxOps reduces the performance of SU 1 (i.e., the PSNR decreases by

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Resource Allocation in Visible Light Communication Networks NOMA vs. OFDMA Transmission Techniques

Resource Allocation in Visible Light Communication Networks NOMA vs. OFDMA Transmission Techniques Resource Allocaion in Visible Ligh Communicaion Neworks NOMA vs. OFDMA Transmission Techniques Eirini Eleni Tsiropoulou, Iakovos Gialagkolidis, Panagiois Vamvakas, and Symeon Papavassiliou Insiue of Communicaions

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

J. Cara, J. Juan, E. Alarcón. Abstract: The usual way to perform a forced vibration test is to fix

Lecture 2 October ε-approximation of 2-player zero-sum games

Optimization II, Winter 2009/10. Lecturer: Khaled Elbassioni. Lecture 2, October 19. 1 ε-approximation of 2-player zero-sum games. In this lecture we give a randomized fictitious play algorithm for obtaining an approximate solution
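As a sketch of the plain (deterministic) variant of the fictitious play idea mentioned in this excerpt: on matching pennies, each player best-responds to the opponent's empirical mixture of past play, and the empirical frequencies approach the (1/2, 1/2) mixed equilibrium. The payoff matrix and round count are illustrative choices:

```python
# Row player's payoffs in matching pennies (zero-sum): +1 on a match, -1 otherwise.
A = [[1, -1],
     [-1, 1]]

def fictitious_play(rounds):
    row_counts = [1, 1]  # empirical counts of each player's past actions
    col_counts = [1, 1]  # (initialized at 1 to avoid empty histories)
    for _ in range(rounds):
        # Each side best-responds to the opponent's empirical mixture so far.
        row_br = max(range(2), key=lambda i: sum(A[i][j] * col_counts[j] for j in range(2)))
        col_br = min(range(2), key=lambda j: sum(A[i][j] * row_counts[i] for i in range(2)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    n = sum(row_counts)
    return [c / n for c in row_counts]

freqs = fictitious_play(20000)  # empirical row-player frequencies, near [0.5, 0.5]
```

Play itself cycles, but for zero-sum games the time-averaged strategies converge to equilibrium (Robinson's theorem), which is what the assertion below checks.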

Solutions for Assignment 2

Faculty of Arts and Science, University of Toronto. CSC 358 - Introduction to Computer Networks, Winter 2018. Solutions for Assignment 2. Question 1 (2 Points): Go-Back-N ARQ. In this question, we review how Go-Back-N ARQ can be

Notes on Kalman Filtering

Brian Borchers and Rick Aster, November 7. Introduction: Data assimilation is the problem of merging model predictions with actual measurements of a system to produce an optimal estimate of the current
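The data-assimilation idea in this excerpt reduces, in the scalar case, to a few lines of code. This is a minimal sketch of a one-dimensional Kalman filter tracking a nearly constant state; the noise variances and the true value are illustrative, not from the cited notes:

```python
import random

def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant state observed in noise.
    q: process-noise variance, r: measurement-noise variance."""
    x, p = x0, p0
    out = []
    for z in measurements:
        p = p + q            # predict: state model is x_t = x_{t-1} + process noise
        k = p / (p + r)      # Kalman gain: how much to trust the new measurement
        x = x + k * (z - x)  # correct with the measurement residual
        p = (1 - k) * p
        out.append(x)
    return out

rng = random.Random(7)
truth = 1.0
zs = [truth + rng.gauss(0.0, 0.2) for _ in range(300)]  # noisy measurements
est = kalman_1d(zs, q=1e-4, r=0.04)
```

With a small process variance the filter effectively averages many recent measurements, so the final estimate sits much closer to the true value than any single measurement does.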

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Consider a setting similar to the N-stage newsvendor problem except that now there is a fixed re-ordering cost (> 0) for each (re-)order.

. Now define y j = log x j, and solve the iteration.

Problem 1 (Distributed Resource Allocation (ALOHA!)) (Adapted from M&U, Problem 5.11). In this problem, we study a simple distributed protocol for allocating agents to shared resources, wherein agents contend for resources

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Ashutosh Nayyar, Aditya Mahajan and Demosthenis Teneketzis. arXiv:1209.1695v1 [cs.SY] 8 Sep 2012. Abstract: A general model

5. Stochastic processes (1)

Lec05.ppt. S-38.145 - Introduction to Teletraffic Theory, Spring 2005. Contents: basic concepts, Poisson process. Stochastic processes (1): Consider some quantity in a teletraffic (or any) system. It typically evolves in time randomly

Simulating models with heterogeneous agents

Wouter J. Den Haan, London School of Economics. © by Wouter J. Den Haan. Individual agent subject to employment shocks (ε_{i,t} ∈ {0, 1}). Incomplete markets: the only way to save is through

STATE-SPACE MODELLING. A mass balance across the tank gives:

B. Lennox and N.F. Thornhill, 2009, State Space Modelling, IChemE Process Management and Control Subject Group Newsletter. Introduction: Over the past decade or so there has been an ever increasing

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011 Mainenance Models Prof Rober C Leachman IEOR 3, Mehods of Manufacuring Improvemen Spring, Inroducion The mainenance of complex equipmen ofen accouns for a large porion of he coss associaed wih ha equipmen

More information

Random Walk with Anti-Correlated Steps

John Noga, Dirk Wagner. Abstract: We conjecture the expected value of random walks with anti-correlated steps to be exactly . We support this conjecture with 2 plausibility arguments and

SUPPLEMENTARY INFORMATION

DOI: 10.1038/NCLIMATE893. Temporal resolution and DICE: Supplemental Information. Alex L. Marten and Stephen C. Newbold, National Center for Environmental Economics, US Environmental Protection

Energy Storage Benchmark Problems

Daniel F. Salas 1,3, Warren B. Powell 2,3. 1 Department of Chemical & Biological Engineering; 2 Department of Operations Research & Financial Engineering; 3 Princeton Laboratory

Block Diagram of a DCS in 411

Block-diagram labels: information source, format, A/D, from other sources, pulse modulation, multiplex, bandpass modulation, channel impulse response h, digital input, digital output, timing and synchronization, digital baseband/bandpass

Robust estimation based on the first- and third-moment restrictions of the power transformation model

International Congress on Modelling and Simulation, Adelaide, Australia, December 2013, www.mssanz.org.au/modsim2013. Robust estimation based on the first- and third-moment restrictions of the power transformation model. Nawata,

Article from Predictive Analytics and Futurism, July 2016, Issue 13

An Introduction to Incremental Learning, by Qiang Wu and Dave Snell. Machine learning provides useful tools for predictive analytics. The typical machine learning

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS

Georges SAHARIDIS, Yves DALLERY, Fikri KARAESMEN. Ecole Centrale Paris, Department of Industrial Engineering (LGI). saharidis,dallery@lgi.ecp.fr

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICIAL NEURAL NETWORK

10.1 INTRODUCTION. Amongst the research work performed, the best results of experimental work are validated with an Artificial Neural Network. From the

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3 Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has

More information

Planning in POMDPs. Dominik Schoenberger Abstract

Planning in POMDPs. Dominik Schoenberger Abstract Planning in POMDPs Dominik Schoenberger d.schoenberger@sud.u-darmsad.de Absrac This documen briefly explains wha a Parially Observable Markov Decision Process is. Furhermore i inroduces he differen approaches

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Jesús Fernández-Villaverde and Oren Levintal. In this Online Appendix, we present the Euler conditions of the model, we develop the pricing Calvo block,

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

AA WALASZEK-BABISZEWSKA, Department of Computer Engineering, Opole University of Technology

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Alpaydin Chapter, Mitchell Chapter 7. Alpaydin slides are in turquoise. Ethem Alpaydin, copyright: The MIT Press, 2010. alpaydin@boun.edu.tr, http://www.cmpe.boun.edu.tr/~ethem/i2ml2e. All other slides are based on Mitchell.

A Dynamic Model of Economic Fluctuations

CHAPTER 15: A Dynamic Model of Economic Fluctuations. Modified for ECON 2204 by Bob Murphy. © 2016 Worth Publishers, all rights reserved. IN THIS CHAPTER, YOU WILL LEARN: how to incorporate dynamics into the AD-AS model

Lecture 4 Notes (Little's Theorem)

This lecture concerns one of the most important (and simplest) theorems in queuing theory, Little's Theorem. More information can be found in the course book, Bertsekas & Gallager,
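Little's theorem itself is one line, L = λW: the mean number in the system equals the arrival rate times the mean time spent in the system. A minimal sketch, cross-checked against the closed-form M/M/1 formulas (the rates chosen are illustrative):

```python
def little_L(arrival_rate, mean_time_in_system):
    """Little's theorem: mean number in system L = lambda * W."""
    return arrival_rate * mean_time_in_system

# For a stable M/M/1 queue (lam < mu): W = 1/(mu - lam) and L = lam/(mu - lam),
# so the direct formula and Little's theorem must agree.
lam, mu = 3.0, 5.0
W = 1.0 / (mu - lam)          # mean time in system
L_direct = lam / (mu - lam)   # mean number in system
L_little = little_L(lam, W)
```

The agreement is exact because the M/M/1 expressions are themselves derived consistently with L = λW; the theorem holds far more generally, for any stationary queueing system.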

Cash Flow Valuation Model in Discrete Time

IOSR Journal of Mathematics (IOSR-JM), e-ISSN: 2278-5728, p-ISSN: 2319-765X, Vol. 6, Issue 6 (May-Jun. 2013), PP 35-41. Cash Flow Valuation Model in Discrete Time. Olayiwola M. A. and Oni, N. O., Department of Mathematics

Learning Objectives: Practice designing and simulating digital circuits including flip flops Experience state machine design procedure

Lab 4: Synchronous State Machine Design. Summary: Design and implement synchronous state machine circuits and test them with simulations in Cadence Virtuoso. Learning Objectives: Practice designing and simulating digital

On-line Adaptive Optimal Timing Control of Switched Systems

X.C. Ding, Y. Wardi and M. Egerstedt. Abstract: In this paper we consider the problem of optimizing over the switching times for a multi-modal dynamic system when

Air Traffic Forecast Empirical Research Based on the MCMC Method

Computer and Information Science, Vol. 5, No. 5; 2012. ISSN 1913-8989, E-ISSN 1913-8997. Published by Canadian Center of Science and Education. Air Traffic Forecast Empirical Research Based on the MCMC Method. Jian-bo Wang,

Competitive and Cooperative Inventory Policies in a Two-Stage Supply-Chain

(G. P. Cachon and P. H. Zipkin.) Presented by Shrutivandana Sharma. IOE 64, Supply Chain Management, Winter 2009, University of Michigan, Ann

Christos Papadimitriou & Luca Trevisan November 22, 2016

U.C. Berkeley CS170: Algorithms. Handout LN-11-22. Christos Papadimitriou & Luca Trevisan, November 22, 2016. Streaming algorithms: In this lecture and the next one we study memory-efficient algorithms that process a stream

20. Applications of the Genetic-Drift Model

1) Determining the probability of forming any particular combination of genotypes in the next generation. Example: If the parental allele frequencies are p_0 = 0.35 and q_0

Final Spring 2007

.615 Final, Spring 2007. Overview: The purpose of the final exam is to calculate the MHD β limit in a high-beta toroidal tokamak against the dangerous n = 1 external ballooning-kink mode. Effectively, this corresponds to

Stability and Bifurcation in a Neural Network Model with Two Delays

International Mathematical Forum, Vol. 6, 2011, no. 35, 1725-1731. GuangPing Hu and XiaoLing Li, School of Mathematics and Physics, Nanjing University

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562, Fall 2005. Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators, by Professor Scott H. Irwin. Required Reading: Griffiths, Hill and Judge, "Inference in the Simple

10. State Space Methods

10.1 Introduction: State space modelling was briefly introduced in chapter 1. Here more coverage is provided of state space methods before some of their uses in control system design are covered in the

Games Against Nature

Advanced Course in Machine Learning, Spring 2010. Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz. In the previous lectures we talked about experts in different setups and analyzed

Chapter 2. First Order Scalar Equations

We start our study of differential equations in the same way the pioneers in this field did. We show particular techniques to solve particular types of first order differential equations.

Linear Response Theory: The connection between QFT and experiments

Phys540.nb, p. 39. 3 Linear Response Theory: The connection between QFT and experiments. 3.1 Basic concepts and ideas. Q: How do we measure the conductivity of a metal? A: We first introduce a weak electric field E, and

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Displacement, Velocity and Acceleration. Everything in the world is moving. Nothing stays still. Motion occurs at all scales of the universe, starting from the motion of electrons in

Mean-square Stability Control for Networked Systems with Stochastic Time Delay

JOURNAL OF SIMULATION, VOL. 5, May 2017. YAO Hejun, YUAN Fushun, School of Mathematics and Statistics, Anyang Normal University, Anyang, Henan

Inventory Control of Perishable Items in a Two-Echelon Supply Chain

Journal of Industrial Engineering, University of Tehran, Special Issue, PP. 69-77. Fariborz Jolai, Elmira Gheisariha and Farnaz Nojavan

Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Capacity Constraints

IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, No. 1, January 2012, www.ijcsi.org. Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Capacity

Testing for a Single Factor Model in the Multivariate State Space Framework

Chen C.-Y., M. Chiba and M. Kobayashi, International Graduate School of Social Sciences, Yokohama National University, Japan; Faculty of Economics

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED. Maximum likelihood estimation is a best-fit statistical method for the estimation of the values of the parameters of a system, based on a set of observations of a random variable

The electromagnetic interference in case of onboard navy ships computers - a new approach

The electromagnetic interference in case of onboard navy ships computers - a new approach The elecromagneic inerference in case of onboard navy ships compuers - a new approach Prof. dr. ing. Alexandru SOTIR Naval Academy Mircea cel Bărân, Fulgerului Sree, Consanţa, soiralexandru@yahoo.com Absrac.

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

Sliding Mode Extremum Seeking Control for Linear Quadratic Dynamic Game

Sliding Mode Extremum Seeking Control for Linear Quadratic Dynamic Game Sliding Mode Exremum Seeking Conrol for Linear Quadraic Dynamic Game Yaodong Pan and Ümi Özgüner ITS Research Group, AIST Tsukuba Eas Namiki --, Tsukuba-shi,Ibaraki-ken 5-856, Japan e-mail: pan.yaodong@ais.go.jp

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

Lecture Notes 5: Investment

Zhiwei Xu (xuzhiwei@sjtu.edu.cn). Investment decisions made by firms are among the most important behaviors in the economy. As investment determines how capital accumulates over time,

Lecture 3: Exponential Smoothing

NATCOR: Forecasting & Predictive Analytics. Lecture 3: Exponential Smoothing. John Boylan, Lancaster Centre for Forecasting, Department of Management Science. Methods and Models. Forecasting Method: a (numerical) procedure
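Simple exponential smoothing, the subject of this lecture excerpt, is a one-line recursion: the smoothed level is a weighted average of the newest observation and the previous level. A minimal sketch; the demand series and α = 0.5 are illustrative:

```python
def ses(series, alpha):
    """Simple exponential smoothing: level_t = alpha*y_t + (1-alpha)*level_{t-1}.
    Returns the smoothed levels; the last one is the next-period forecast."""
    level = float(series[0])   # initialize the level at the first observation
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

demand = [10, 12, 11, 13, 12, 14]
levels = ses(demand, alpha=0.5)
forecast = levels[-1]  # one-step-ahead forecast for the next period
```

Larger α reacts faster to recent demand; smaller α smooths more aggressively. With α = 0.5 each step is simply the midpoint of the new observation and the old level.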

Introduction: Gordon Model (1962)

Gordon Model (1962): D/P = r − g, where r = constant discount rate and g = constant dividend growth rate. If rational expectations of future discount rates and dividend growth vary over time, so should the D/P ratio. Since
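The Gordon relation above follows from the constant-growth pricing formula P = D/(r − g), which rearranges to the dividend yield D/P = r − g. A minimal sketch with illustrative numbers (not from the excerpt):

```python
def gordon_price(next_dividend, r, g):
    """Gordon (1962) constant-growth model: P = D / (r - g), valid only for r > g."""
    if r <= g:
        raise ValueError("Gordon model requires r > g")
    return next_dividend / (r - g)

D, r, g = 2.0, 0.08, 0.03
P = gordon_price(D, r, g)   # 2.0 / 0.05
dividend_yield = D / P      # the model implies D/P = r - g
```

The model breaks down as g approaches r (the price diverges), which is why the guard clause rejects r ≤ g.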

Orientation. Connections between network coding and stochastic network theory. Outline. Bruce Hajek. Multicast with lost packets

Connections between network coding and stochastic network theory. Bruce Hajek. Orientation: On Thursday, Ralf Koetter discussed network coding: coding within the network. Abstract: Randomly generated coded information blocks

Expert Advice for Amateurs

Ernest K. Lai. Online Appendix - Existence of Equilibria. The analysis in this section is performed under more general payoff functions. Without taking an explicit form, the payoffs of the

Object tracking: Using HMMs to estimate the geographical location of fish

02433 - Hidden Markov Models. Martin Wæver Pedersen, Henrik Madsen. Course week 13. MWP, compiled June 8, 2011. Objective: Locate fish from tagging

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

ETHEM ALPAYDIN, The MIT Press, 2014. Lecture Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION. alpaydin@boun.edu.tr, http://www.cmpe.boun.edu.tr/~ethem/i2ml3e. CHAPTER 2: SUPERVISED LEARNING. Learning a Class

A Reinforcement Learning Approach for Collaborative Filtering

Jungkyu Lee 1, Byonghwa Oh 2, Jihoon Yang 2, and Sungyong Park 2. 1 Cyram Inc, Seoul, Korea, jklee@cyram.com. 2 Sogang University, Seoul, Korea, {mrfive,yangjh,parksy}@sogang.ac.kr

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

(Modified from Cox, General Ecology Lab Manual, McGraw Hill.) Last week we estimated population size through several methods. One assumption of all these

Matlab and Python programming: how to get started

Equipping readers with the skills to write programs to explore complex systems and discover interesting patterns from big data is one of the main goals of this book. In this chapter,

Economics 8105 Macroeconomic Theory Recitation 6

Economics 8105 Macroeconomic Theory Recitation 6 Economics 8105 Macroeconomic Theory Reciaion 6 Conor Ryan Ocober 11h, 2016 Ouline: Opimal Taxaion wih Governmen Invesmen 1 Governmen Expendiure in Producion In hese noes we will examine a model in which

More information

Policy regimes Theory

Policy regimes Theory Advanced Moneary Theory and Policy EPOS 2012/13 Policy regimes Theory Giovanni Di Barolomeo giovanni.dibarolomeo@uniroma1.i The moneary policy regime The simple model: x = - s (i - p e ) + x e + e D p

More information

Nature Neuroscience: doi:10.1038/nn. Supplementary Figure 1: Spike-count autocorrelations in time.

Normalized autocorrelation matrices are shown for each area in a dataset. The matrix shows the mean correlation of the spike count in each time bin with the spike

Numerical Dispersion

Review of Linear Numerical Stability: Numerical Dispersion. In the previous lecture, we considered the linear numerical stability of both advection and diffusion terms when approximated with several spatial and temporal

Lecture Notes 3: Quantitative Analysis in DSGE Models: New Keynesian Model

Zhiwei Xu, Email: xuzhiwei@sjtu.edu.cn. Monetary policy plays little role in the basic monetary model without price stickiness. We now turn

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME

Moving Averages. Recall that a white noise process is a series {ε_t} having variance σ². The white noise process has spectral density f(λ) =

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015

Ulrich Kohli, University of Geneva, December 2015. Needed: A Theory of Total Factor Productivity, Edward C. Prescott (1998). 1. Introduction: Total Factor Productivity (TFP) has become

3.1 More on model selection

3. More on model selection. 3. Comparing models: AIC, BIC, Adjusted R squared. 3. Overfitting problem. 3.3 Sample splitting. 3. More on model selection criteria: Often after model fitting you are left with a handful of

Presentation Overview

Action Refinement in Reinforcement Learning by Probability Smoothing, by Thomas G. Dietterich & Didac Busquets. Speaker: Kai Xu. Presentation Overview: Background; The Probability Smoothing Method; Experimental Study of Action

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Appendix to Creating Work Breaks From Available Idleness

Xu Sun and Ward Whitt, Department of Industrial Engineering and Operations Research, Columbia University, New York, NY; {xs2235,ww2040}@columbia.edu. September

Solutions to the Exam Digital Communications I given on the 11th of June 2007

Question 1 (14p). a) (2p) If X and Y are independent Gaussian variables, then E[XY] = 0 always. (Answer with TRUE or FALSE.) ANSWER: False.

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

Ensemble methods: Bagging and Boosting

Lecture 21. Ensemble methods: Bagging and Boosting. Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square. Ensemble methods: mixture of experts. Multiple base models (classifiers, regressors), each covers a different part

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin

ACE 562, Fall 2005. Lecture 4: Simple Linear Regression Model: Specification and Estimation, by Professor Scott H. Irwin. Required Reading: Griffiths, Hill and Judge, "Simple Regression: Economic and Statistical Model

OBJECTIVES OF TIME SERIES ANALYSIS

Understanding the dynamic or time-dependent structure of the observations of a single series (univariate analysis). Forecasting of future observations. Ascertaining the leading, lagging

Demodulation of Digitally Modulated Signals

Additional material for TSKS1 Digital Communication and TSKS2 Telecommunication. Mikael Olofsson, Institutionen för systemteknik, Linköpings universitet, 581 83 Linköping. November

Cooperative Ph.D. Program in School of Economic Sciences and Finance QUALIFYING EXAMINATION IN MACROECONOMICS. August 8, 2013, 8:45 a.m. to 1:00 p.m.

THERE ARE FIVE QUESTIONS. ANSWER ANY FOUR OUT OF FIVE PROBLEMS.

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Paul L. Mosquin, Jeremy Aldworth, Wenlin Chen. Supplemental Material. Number

More Digital Logic

EECS 4, Spring 2003, Lecture 2. More Digital Logic: gate delay and signal propagation; clocked circuit elements (flip-flop); writing a word to memory; simplifying digital circuits: Karnaugh maps. Low-to-high and high-to-low transitions could have different propagation delay t_p.

Two Coupled Oscillators / Normal Modes

Lecture 3, Phys 3750. Overview and Motivation: Today we take a small, but significant, step towards wave motion. We will not yet observe waves, but this step is important in its own

If x_1(t), x_2(t), ..., x_n(t) is a basis for the solution space to this system, then the matrix having these solutions as columns, X(t) = [x_1(t) x_2(t) ... x_n(t)]

Math 228, Fri Mar 24. 5.6 Matrix exponentials and linear systems: The analogy between first order systems of linear differential equations (Chapter 5) and scalar linear differential equations (Chapter 1) is much stronger

Stable Scheduling Policies for Maximizing Throughput in Generalized Constrained Queueing Systems

Prasanna Chaporkar, Student Member, IEEE, and Saswati Sarkar, Member, IEEE. Abstract: We consider a class of queueing networks

Transmitting important bits and sailing high radio waves: a decentralized cross-layer approach to cooperative video transmission

Nicholas Mastronarde, Francesco Verde, Donatella Darsena, Anna Scaglione, and Mihaela

Learning Relaying Strategies in Cellular D2D Networks with Token-Based Incentives

Nicholas Mastronarde, Viral Patel, Department of Electrical Engineering, State University of New York at Buffalo, Buffalo, NY, USA. Jie Xu,

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Constantinos Boulis. Outline of the Presentation: Introduction to the speaker adaptation problem; Maximum Likelihood Stochastic Transformations

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information
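The lasso problem in (1.1) has a closed-form solution in the simplest case of a single centered predictor with no intercept, via the soft-thresholding operator that underlies coordinate-descent lasso solvers. A minimal sketch; the objective scaling 1/(2N) and the toy data are illustrative assumptions, not taken from the excerpt:

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: the building block of lasso solvers."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_univariate(x, y, lam):
    """Minimize (1/(2N)) * sum (y_i - x_i*b)^2 + lam*|b| for a single
    centered predictor with no intercept: b = S(rho, lam) / z,
    where rho = (1/N) sum x_i*y_i and z = (1/N) sum x_i^2."""
    n = len(x)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n
    z = sum(xi * xi for xi in x) / n
    return soft_threshold(rho, lam) / z

x = [-1.0, 0.0, 1.0]
y = [-2.0, 0.0, 2.0]
b_ols = lasso_univariate(x, y, 0.0)    # no penalty: reduces to least squares
b_dead = lasso_univariate(x, y, 2.0)   # heavy penalty drives b to exactly 0
```

The exact zero at large λ is the point of the penalty: unlike ridge regression, the lasso's soft threshold sets small coefficients to zero rather than merely shrinking them.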