Power Allocation in Multi-user Cellular Networks With Deep Q Learning Approach


Fan Meng, Peng Chen and Lenan Wu

arXiv: v1 [cs.IT] 7 Dec 2018

Abstract: The model-driven power allocation (PA) algorithms in wireless cellular networks with interfering multiple-access channel (IMAC) have been investigated for decades. Nowadays, data-driven model-free machine learning-based approaches are developing rapidly in this field, and among them deep reinforcement learning (DRL) has shown great promise. Different from supervised learning, DRL takes advantage of exploration and exploitation to maximize the objective function under certain constraints. In this paper, we propose a two-step training framework. First, with off-line learning in a simulated environment, a deep Q network (DQN) is trained with a deep Q learning (DQL) algorithm that is designed to be consistent with this PA problem. Second, the DQN will be further fine-tuned with real data in an on-line training procedure. The simulation results show that the proposed DQN achieves the highest averaged sum-rate compared to DQNs trained with existing DQL schemes. With different user densities, our DQN outperforms benchmark algorithms, and thus a good generalization ability is verified.

Index Terms: Deep reinforcement learning, deep Q learning, interfering multiple-access channel, power allocation.

I. INTRODUCTION

Data transmission in wireless communication networks has experienced explosive growth in recent decades and will keep rising in the future. The user density is greatly increasing, resulting in a critical demand for more capacity and spectral efficiency. Therefore, both intra-cell and inter-cell interference management are significant to improve the overall capacity of a cellular network system. The problem of maximizing a generic sum-rate is studied in this paper; it is non-convex and NP-hard, and cannot be solved efficiently.
Various model-driven algorithms have been proposed in the literature for PA problems, such as fractional programming (FP) [1], weighted MMSE (WMMSE) [2] and some others [3], [4]. Excellent performance can be observed through theoretical analysis and numerical simulations, but serious obstacles are faced in practical deployments [5]. First, these techniques rely heavily on tractable mathematical models, which are imperfect in real communication scenarios with a specific user distribution, geographical environment, etc. Second, the computational complexities of these algorithms are high.

In recent years, machine learning (ML)-based approaches have been rapidly developed in wireless communications [6]. These algorithms are usually model-free, and are compliant with optimizations in practical communication scenarios. Additionally, with the development of graphics processing units (GPUs) and specialized chips, their execution can be both fast and energy-efficient, which lays a solid foundation for massive applications.

Two main branches of ML, supervised learning and reinforcement learning (RL) [7], are briefly introduced here. With supervised learning, a deep neural network (DNN) is trained to approximate some given optimal (or suboptimal) objective algorithm, and this has been realized in some applications [8]-[10]. However, the target algorithm is usually unavailable, and the performance of the DNN is bounded by the supervisor. Therefore, RL has received widespread attention, due to its nature of interacting with an unknown environment by exploration and exploitation. The Q learning method is the most well-studied RL algorithm, and it is exploited to cope with power allocation (PA) in [11]-[13], and some others [14].

Author affiliations: Fan Meng and Lenan Wu are with the School of Information Science and Engineering, Southeast University, Nanjing, China (mengxiaomaomao@outlook.com, wuln@seu.edu.cn). Peng Chen is with the State Key Laboratory of Millimeter Waves, Southeast University, Nanjing, China (chenpengseu@seu.edu.cn).
The DNN trained with Q learning is called a deep Q network (DQN), and it has been proposed to address the distributed downlink single-user PA problem [15]. In this paper, we extend the work in [15] and investigate the PA problem in cellular cells with multiple users. The design of the DQN model is discussed and introduced. Simulation results show that our DQN outperforms the existing DQNs and the benchmark algorithms. The contributions of this work are summarized as follows:

- A model-free two-step training framework is proposed. The DQN is first trained off-line with a DRL algorithm in simulated scenarios. Second, the learned DQN can be further dynamically optimized in real communication scenarios, with the aid of transfer learning.
- The PA problem using deep Q learning (DQL) is discussed, then a DQN-enabled approach is proposed that is trained with the current sum-rate as the reward function, including no future reward. The input features are designed to help the DQN get closer to the optimal solution.
- After centralized training, the proposed DQN is tested by distributed execution. The averaged sum-rate of the DQN outperforms the model-driven algorithms, and also shows good generalization ability in a series of benchmark simulation tests.

The remainder of this paper is organized as follows. Section II outlines the PA problem in the wireless cellular network with IMAC. In Section III our proposed DQN is introduced in detail. Then, this DQN is tested in distinct scenarios, along with benchmark algorithms, and the simulation results are analyzed in Section IV. Conclusions and discussion are given in Section V.

II. SYSTEM MODEL

The problem of PA in the cellular network with interfering multiple-access channel (IMAC) is considered. In a system with $N$ cells, a base station (BS) at the center of each cell simultaneously serves $K$ users sharing the same frequency bands. A simple network example is shown in Fig. 1. At time slot $t$, the independent channel coefficient between the $n$-th BS and user $k$ in cell $j$ is denoted by $g^t_{n,j,k}$, and can be expressed as

$$g^t_{n,j,k} = |h^t_{n,j,k}|^2 \beta^t_{n,j,k}, \tag{1}$$

where $h^t_{n,j,k}$ is the small-scale complex flat fading element, and $\beta^t_{n,j,k}$ is the large-scale fading component taking account of both the geometric attenuation and the shadow fading. Therefore, the signal to interference plus noise ratio (SINR) of this link can be described by

$$\mathrm{sinr}^t_{n,k} = \frac{g^t_{n,n,k}\, p^t_{n,k}}{\sum_{k' \neq k} g^t_{n,n,k}\, p^t_{n,k'} + \sum_{n' \in D_n} g^t_{n',n,k} \sum_{j} p^t_{n',j} + \sigma^2}, \tag{2}$$

where $D_n$ is the set of interference cells around the $n$-th cell, $p$ is the emitting power of the BS, and $\sigma^2$ denotes the additive noise power. With normalized bandwidth, the downlink rate of this link is given as

$$C^t_{n,k} = \log_2\left(1 + \mathrm{sinr}^t_{n,k}\right). \tag{3}$$

The optimization target is to maximize this generic sum-rate objective function under the maximum power constraint, and it is formulated as

$$\max_{p^t} \sum_{n}\sum_{k} C^t_{n,k} \quad \text{s.t.} \quad 0 \le p^t_{n,k} \le P_{\max}, \ \forall n, k, \tag{4}$$

where $p^t = \{p^t_{n,k} \mid \forall n, k\}$, and $P_{\max}$ denotes the maximum emitting power. We also define the sum-rate $C^t = \sum_n \sum_k C^t_{n,k}$, the rate set $\mathbf{C}^t = \{C^t_{n,k} \mid \forall n, k\}$, and the channel state information (CSI) $g^t = \{g^t_{n,j,k} \mid \forall n, j, k\}$. This problem is non-convex and NP-hard, so we propose a data-driven learning algorithm based on the DQN model in the following section.

III. DEEP Q NETWORK

A. Background

Q learning is one of the most popular RL algorithms aiming to deal with Markov decision process (MDP) problems [16]. At time instant $t$, by observing the state $s^t \in \mathcal{S}$, the agent takes action $a^t \in \mathcal{A}$ and interacts with the environment, and then receives the reward $r^t$ and the next state $s^{t+1}$. The notations $\mathcal{A}$ and $\mathcal{S}$ are the action set and the state set, respectively.
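The system model of Section II can be checked numerically. The following is a minimal NumPy sketch of (2) and (3), not the authors' simulation code; the function name `sum_rate`, the array layout `g[m, n, k]` (gain from BS m to user k of cell n) and the `neighbors` mapping standing in for $D_n$ are illustrative assumptions.

```python
import numpy as np


def sum_rate(g, p, neighbors, noise_power):
    """Sum-rate of eqs. (2)-(3) in linear units (hypothetical helper).

    g[m, n, k]   : channel gain from BS m to user k of cell n (assumed layout)
    p[n, k]      : power that BS n allocates to its own user k
    neighbors[n] : iterable of interfering cell indices, i.e. the set D_n
    """
    num_cells, num_users = p.shape
    rates = np.zeros((num_cells, num_users))
    for n in range(num_cells):
        for k in range(num_users):
            signal = g[n, n, k] * p[n, k]
            # Intra-cell interference: other users of BS n, seen through the same channel.
            intra = g[n, n, k] * (p[n].sum() - p[n, k])
            # Inter-cell interference: total power of each neighboring BS.
            inter = sum(g[m, n, k] * p[m].sum() for m in neighbors[n])
            sinr = signal / (intra + inter + noise_power)
            rates[n, k] = np.log2(1.0 + sinr)
    return rates.sum(), rates
```

Calling `sum_rate(g, p, neighbors, noise_power)` returns the scalar sum-rate together with the per-link rates, so the same routine could serve both as the objective of (4) and as a source of per-link rate features.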
Since $\mathcal{S}$ can be continuous, the DQN is proposed to combine Q learning with a flexible DNN to handle the infinite state space.

Fig. 1. An illustrative example of a multi-user cellular network with 9 cells. In each cell, a BS serves 2 users simultaneously. (Axes: X axis (km), Y axis (km); markers: BS, User.)

The cumulative discounted reward function is given as

$$R^t = \sum_{\tau=0}^{\infty} \gamma^{\tau} r^{t+\tau+1}, \tag{5}$$

where $\gamma \in [0, 1)$ is a discount factor that trades off the importance of immediate and future rewards, and $r$ denotes the reward. Under a certain policy $\pi$, the Q function of the agent with an action $a$ in state $s$ is given as

$$Q_{\pi}(s, a; \theta) = \mathbb{E}_{\pi}\left[ R^t \mid s^t = s, a^t = a \right], \tag{6}$$

where $\theta$ denotes the DQN parameters, and $\mathbb{E}[\cdot]$ is the expectation operator. Q learning is concerned with how agents ought to interact with an unknown environment so as to maximize the Q function. The maximization of (6) is equivalent to the Bellman optimality equation [17], and it is described as

$$y^t = r^t + \gamma \max_{a'} Q(s^{t+1}, a'; \theta^t), \tag{7}$$

where $y^t$ is the optimal Q value. The DQN is trained to approximate the Q function, and the standard Q learning update of the parameters $\theta$ is described as

$$\theta^{t+1} = \theta^t + \eta \left( y^t - Q(s^t, a^t; \theta^t) \right) \nabla Q(s^t, a^t; \theta^t), \tag{8}$$

where $\eta$ is the learning rate. This update resembles stochastic gradient descent, gradually updating the current value $Q(s^t, a^t; \theta^t)$ towards the target $y^t$. The experience data of the agent is stored as $(s^t, a^t, r^t, s^{t+1})$. The DQN is trained with batches of recorded data randomly sampled from the experience replay memory, which is a first-in first-out queue.

B. Discussion on DRL

In many applications such as playing video games [16], where the current strategy has a long-term impact on the cumulative reward, the DQN achieves remarkable results and beats humans. However, the discount factor is suggested to be zero in this PA problem.

Fig. 2. The solution of the DQN is determined by the CSI $g^t$, along with the downlink rate $C^{t-1}$ and transmitting power $p^{t-1}$.

The DQL aims to maximize the Q function. Let $\gamma = 0$; from (6) we have

$$\max Q = \max_{a^t \in \mathcal{A}} \mathbb{E}_{\pi}\left[ r^t \mid s^t = s, a^t = a \right]. \tag{9}$$

For a PA problem, clearly $s^t = g^t$ and $a^t = p^t$. Then we let $r^t = C^t$ and get

$$\max Q = \max_{0 \le p^t \le P_{\max}} \mathbb{E}_{\pi}\left[ C^t \mid g^t, p^t \right]. \tag{10}$$

In the execution period the policy is deterministic, and thus (10) can be written as

$$\max Q = \max_{0 \le p^t \le P_{\max}} C^t\left( g^t, p^t \right), \tag{11}$$

which is an equivalent form of (4). In this inference process we assume that $\gamma = 0$ and $r^t = C^t$, indicating that the optimal solution to (4) is identical to that of (6) under these two conditions.

As shown in Fig. 2, it is well known that the optimal solution $p^t$ of (4) is determined only by the current CSI $g^t$, and the sum-rate $C^t$ is calculated with $(g^t, p^t)$. Theoretically, the optimal power $p^t$ can be obtained using a DQN whose input is just $g^t$. In practice, the performance of a DQN designed this way is poor, since the problem is non-convex and the optimal point is hard to find. Therefore, we propose to utilize two more auxiliary features: $C^{t-1}$ and $p^{t-1}$. Since the channel can be modeled as a first-order Markov process, the solution of the last time period can help the DQN get closer to the optimum, and (11) can be rewritten as

$$\max Q = \max_{0 \le p^t \le P_{\max}} C^t\left( g^t, p^t, C^{t-1}, p^{t-1} \right). \tag{12}$$

Once $\gamma = 0$ and $r^t = C^t$, (7) simplifies to $y^t = C^t$, and the replay memory entry is also reduced to $(s^t, a^t, r^t)$. The DQN works as an estimator to predict the current sum-rate of the corresponding power levels under a certain CSI. These discussions provide good guidance for the following DQN design.

C. DQN Design in Cellular Network

In our proposed model-free two-step training framework, the DQN is first pre-trained off-line with a DRL algorithm in a simulated wireless communication system. This procedure reduces the on-line training burden, since data-driven algorithms by nature require large amounts of data. Second, with the aid of transfer learning, the learned DQN can be further dynamically fine-tuned in real scenarios.
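The simplification discussed in III-B can be made concrete with a small sketch: with $\gamma = 0$ the target in (7) no longer depends on the next state, so the update (8) becomes a plain regression of Q onto the observed reward. The code below is only an illustration under stated assumptions. It uses a linear Q function, $Q(s, a; \theta) = \theta_a \cdot s$, in place of the paper's neural network, and the helper names `td_target` and `sgd_step` are hypothetical.

```python
import numpy as np


def td_target(reward, next_q_values, gamma):
    """Target of eq. (7): y = r + gamma * max_a' Q(s', a')."""
    return reward + gamma * np.max(next_q_values)


def sgd_step(theta, state, action, target, lr):
    """One update of eq. (8) for an assumed linear model Q(s, a) = theta[a] @ s."""
    q_value = theta[action] @ state
    new_theta = theta.copy()
    # For the linear model, the gradient of Q with respect to theta[action] is the state.
    new_theta[action] += lr * (target - q_value) * state
    return new_theta


# With gamma = 0 the next-state values are ignored and the target equals the reward,
# which is why the replay memory only needs (state, action, reward).
reward = 2.0
target = td_target(reward, next_q_values=np.array([5.0, 7.0]), gamma=0.0)
theta = sgd_step(np.zeros((2, 3)), np.array([1.0, 0.0, 2.0]), action=1, target=target, lr=0.1)
```

Because the next state drops out of the target, each stored sample can be used as an ordinary supervised pair (input: state and action, label: sum-rate), which matches the description of the DQN as a sum-rate estimator.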
Since a practical wireless communication system is dynamic and influenced by unknown factors, the data-driven algorithm is believed to be a promising technique. We only outline the two-step framework here, and the remainder of this manuscript focuses mainly on the first training step.

In a given cellular network, each BS-user link is regarded as an agent, and thus a multi-agent system is studied. However, multi-agent training is difficult since it needs much more learning data, training time and DNN parameters. Therefore, centralized training is considered, and only one agent is trained by using all agents' experience replay memory. Then, this agent's learned policy is shared in the distributed execution period. For our designed DQN, the components of the replay memory are introduced as follows.

1) State: The state design for a certain agent $(n, k)$ is important, since the full environment information is redundant and irrelevant elements must be removed. The agent is assumed to have the corresponding perfect instantaneous CSI in (2), and we define the logarithmic normalized interferer set $\Gamma^t_{n,k}$ as

$$\Gamma^t_{n,k} = \Big\{ \underbrace{1, \ldots, 1}_{K-1} \Big\} \cup \left\{ \log_2\left( 1 + \frac{g^t_{n',n,k}}{g^t_{n,n,k}} \right) \,\middle|\, n' \in D_n, \forall j \right\}. \tag{13}$$

The channel amplitudes of the interferers are normalized by that of the desired link, and the logarithmic representation is preferred since channel amplitudes often vary by orders of magnitude. The cardinality of $\Gamma^t_{n,k}$ is $(|D_n| + 1)K - 1$. To further decrease the input dimension and reduce the computational complexity, the elements in $\Gamma^t_{n,k}$ are sorted in decreasing order and only the first $C$ elements are kept (here $C$ is a count, distinct from the sum-rate $C^t$). As discussed in III-B, the downlink rates $C^{t-1}_{n,k}$ and transmitting powers $p^{t-1}_{n,k}$ at the last time slot, for these remaining components and for this link itself, are the two additional parts of the input to our DQN. Therefore, the state is composed of three features: $s^t_{n,k} = \{\Gamma^t_{n,k}, C^{t-1}_{n,k}, p^{t-1}_{n,k}\}$. The cardinality of the state, i.e., the input dimension of the DQN, is $|\mathcal{S}| = 3C + 2$.

2) Action: In (4) the downlink power is a continuous variable, and is only constrained by the maximum power constraint.
Since the action space of the DQN must be finite, the possible emitting power is quantized into $|\mathcal{A}|$ levels. The allowed power set is given as

$$\mathcal{A} = \left\{ 0, P_{\min}, P_{\min} \left( \frac{P_{\max}}{P_{\min}} \right)^{\frac{1}{|\mathcal{A}| - 2}}, \ldots, P_{\max} \right\}, \tag{14}$$

where $P_{\min}$ is the non-zero minimum emitting power.

3) Reward: In some manuscripts the reward function is elaborately designed to improve the agent's transmitting rate and also mitigate the interference it causes. However, most of these reward functions are suboptimal approximations of the target function in (4). In this paper, $C^t$ is directly used as the reward function, and it is shared by all agents. In the training simulations with small or medium scale cellular networks, this simple method proves to be feasible.

TABLE I. HYPER-PARAMETER SETUP OF DQN TRAINING

Parameter | Value | Parameter | Value
Number of T per episode | 50 | Initial η | $10^{-3}$
Observe episode number | 100 | Final η | $10^{-4}$
Explore episode number | 9900 | Initial ε | 0.2
Train interval | 10 | Final ε | $10^{-4}$
Memory size | | Batch size | 256

IV. SIMULATION RESULTS

A. Simulation Configuration

A cellular network with $N = 25$ cells is simulated. At the center of each cell, a BS is deployed to synchronously serve $K = 4$ users, which are located uniformly and randomly within the cell range $r \in [R_{\min}, R_{\max}]$, where $R_{\min} = 0.01$ km and $R_{\max} = 1$ km are the inner space and half cell-to-cell distance, respectively. The small-scale fading is simulated to be Rayleigh distributed, and the Jakes model is adopted with Doppler frequency $f_d = 10$ Hz and time period $T = 20$ ms. According to the LTE standard, the large-scale fading is modeled as $\beta = \ldots \log_{10}(d) + 10 \log_{10}(z)$ dB, where $z$ is a log-normal random variable with a standard deviation of 8 dB, and $d$ is the transmitter-to-receiver distance (km). The AWGN power $\sigma^2$ is $-114$ dBm, and the emitting power constraints $P_{\min}$ and $P_{\max}$ are 5 and 38 dBm, respectively.

A four-layer feed-forward neural network (FNN) is chosen as the DQN, and the two hidden layers have 128 and 64 neurons, respectively. The activation function of the output layer is linear, and ReLU is adopted in the hidden layers. The cardinality of adjacent cells is $|D_n| = 18, \forall n$, the first $C = 16$ interferers are kept, and the number of power levels is $|\mathcal{A}| = 10$. Therefore, the input and output dimensions are 50 and 10, respectively.
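The quantized action set of (14) and the dimensions quoted above can be reproduced with a few lines. This is a sketch under stated assumptions rather than the authors' code: the helper names are hypothetical, powers are converted from dBm to watts before the geometric spacing is applied, and the state size uses $|\mathcal{S}| = 3C + 2$ from III-C.

```python
import numpy as np


def dbm_to_watt(dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((dbm - 30.0) / 10.0)


def power_levels(p_min, p_max, num_levels):
    """Action set of eq. (14): zero plus (num_levels - 1) geometrically spaced powers."""
    exponents = np.arange(num_levels - 1) / (num_levels - 2)  # 0, 1/(|A|-2), ..., 1
    nonzero = p_min * (p_max / p_min) ** exponents
    return np.concatenate(([0.0], nonzero))


def state_dimension(num_kept):
    """Input size: num_kept gains, plus rates and powers of those links and the own link."""
    return 3 * num_kept + 2


levels = power_levels(dbm_to_watt(5.0), dbm_to_watt(38.0), num_levels=10)
num_interferers = (18 + 1) * 4 - 1   # cardinality (|D_n| + 1)K - 1 with |D_n| = 18, K = 4
input_dim = state_dimension(16)      # C = 16 retained interferers
output_dim = len(levels)             # one Q value per power level
```

With the simulation settings above, the full interferer set has 75 elements before truncation, and the resulting input and output sizes agree with the stated 50 and 10. The geometric spacing makes neighboring non-zero levels differ by a constant number of decibels.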
In the off-line training period, the DQN is first randomly initialized and then trained epoch by epoch. In the first 100 episodes, the agents only take actions stochastically; they then follow an adaptive ε-greedy learning strategy [17] in the subsequent exploring period. In each episode, the large-scale fading is invariant, and thus the number of training episodes must be large enough to overcome the generalization problem. There are 50 time slots per episode, and the DQN is trained with 256 random samples from the experience replay memory every 10 time slots. The Adam algorithm [18] is adopted as the optimizer, and the learning rate η decays exponentially from $10^{-3}$ to $10^{-4}$. All training hyper-parameters are listed in Table I. Any change to these default hyper-parameters in the following simulations is stated explicitly.

The FP algorithm, WMMSE algorithm, maximum PA and random PA schemes are treated as benchmarks to evaluate our proposed DQN-based algorithm. Perfect CSI of the current moment is assumed to be known for all schemes. The simulation code will be available after formal publication.

Fig. 3. With different γ values, the recorded average rate during the training period (curves smoothed by averaging window). (Axes: Average rate (bps) versus Training Epoch; curves: γ = 0.0, 0.1, 0.3, 0.7, 0.9.)

B. Discount Factor

In this subsection, the performance under different discount factors γ is studied. We set $\gamma \in \{0.0, 0.1, 0.3, 0.7, 0.9\}$, and the average rate $C$ over the training period is shown in Fig. 3. At the same time slot, the values of $C$ with higher $\gamma \in \{0.7, 0.9\}$ are clearly lower than those with lower γ values. The trained DQNs are then tested in three cellular networks with different cell numbers. Fig. 4 shows that the DQN with γ = 0.0 achieves the highest $C$ score, while the lowest value is obtained by the one with the highest γ value. The simulation results show that a non-zero γ has a negative influence on the performance of the DQN, which is consistent with the analysis in III-B. Therefore, a zero or low discount factor value is recommended.

C. Algorithm Comparison

The DQN trained with zero γ is used, and the four benchmark algorithms stated before are tested for comparison. In a real cellular network, the user density changes over time, and the DQN must have good generalization ability against this issue. The user number per cell $K$ is assumed to be in the set {1, 2, 4, 6}. The averaged simulation results are obtained after 500 repeats. As shown in Fig. 5, the DQN achieves the highest $C$ in all testing scenarios. Although it is trained with $K = 4$, the DQN still outperforms the other algorithms in the other cases. We also note that the gap between the random/maximum PA schemes and the remaining optimization algorithms increases when $K$ becomes larger. This is mainly because the intra-cell interference gets stronger with increased user density, which indicates that the optimization of PA is more significant in cellular networks with denser users.

Fig. 4. The average rate $C$ versus cellular network scalability for trained DQNs with different γ values. (Axes: Average rate (bps) versus Number of cells, N = 25, 49, 100; curves: γ = 0.0, 0.1, 0.3, 0.7, 0.9.)

Fig. 5. The average rate $C$ versus user number per cell. Five power allocation schemes are tested. (Axes: Average rate (bps/Hz) versus User number per cell, K = 1, 2, 4, 6; schemes: DQN, FP, WMMSE, Random power, Maximal power.)

We also give an example result of one testing episode here ($K = 4$). In contrast to the averaged sum-rate values in Fig. 5, the performance of the three PA algorithms (DQN, FP, WMMSE) in Fig. 6 is not stable, and depends in particular on the specific large-scale fading effects. Additionally, in some episodes the DQN is not better than the other algorithms over time (not shown in this paper), which means that there is still potential to improve the DQN performance.

In terms of computational complexity, the time cost of the DQN grows linearly with the number of layers when a GPU is used. Meanwhile, both FP and WMMSE are iterative algorithms, and thus their time cost is not constant: it depends on the stopping criterion, the initialization and the CSI.

V. CONCLUSIONS

The PA problem in the cellular network with IMAC has been investigated, and the data-driven model-free DQL has been applied to solve it. To be consistent with the PA optimization target, the current sum-rate is used as the reward function, including no future reward. With this DQL design, the DQN simply works as an estimator that predicts the current sum-rate under all power levels for a certain CSI. Simulation results show that the DQN trained with zero γ achieves the highest average sum-rate. Then, in a series of different scenarios, the proposed DQN outperforms the benchmark algorithms, indicating that the designed DQN has good generalization abilities.
In our two-step training framework, we have realized the off-line centralized learning with simulated communication networks, and the learned DQN is tested by distributed execution. In our future work, the on-line learning will be further studied to accommodate real scenarios with specific user distributions and geographical environments.

Fig. 6. Comparisons of all five power allocation schemes over 1000 time slots (curves smoothed by averaging window). (Axes: Average rate (bps) versus Time slot; schemes: DQN, FP, WMMSE, Random power, Maximal power.)

VI. ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (Grant No. , , ), the Natural Science Foundation of Jiangsu Province (Grant No. BK ), and the Open Program of State Key Laboratory of Millimeter Waves (Southeast University, Grant No. Z201804).

REFERENCES

[1] K. Shen and W. Yu, "Fractional programming for communication systems, Part I: Power control and beamforming," IEEE Transactions on Signal Processing, vol. 66, no. 10, 2018.
[2] Q. Shi, M. Razaviyayn, Z. Q. Luo, and C. He, "An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.
[3] M. Chiang, P. Hande, T. Lan, and C. W. Tan, "Power control in wireless cellular networks," Foundations and Trends in Networking, vol. 2, no. 4.
[4] H. Zhang, L. Venturino, N. Prasad, P. Li, S. Rangarajan, and X. Wang, "Weighted sum-rate maximization in multi-cell networks via coordinated scheduling and discrete power control," IEEE Journal on Selected Areas in Communications, vol. 29, no. 6, June.
[5] Z. Qin, H. Ye, G. Y. Li, and B. F. Juang, "Deep learning in physical layer communications," CoRR, vol. abs/. [Online]. Available: http://arxiv.org/abs/
[6] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, Dec.
[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436.
[8] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, "Learning to optimize: Training deep neural networks for interference management," IEEE Transactions on Signal Processing, vol. 66, no. 20, Oct.
[9] F. Meng, P. Chen, L. Wu, and X. Wang, "Automatic modulation classification: A deep learning enabled approach," IEEE Transactions on Vehicular Technology, pp. 1-1.
[10] H. Ye, G. Y. Li, and B. Juang, "Power of deep learning for channel estimation and signal detection in OFDM systems," IEEE Wireless Communications Letters, vol. 7, no. 1, Feb.
[11] R. Amiri, H. Mehrpouyan, L. Fridman, R. K. Mallik, A. Nallanathan, and D. Matolak, "A machine learning approach for power allocation in HetNets considering QoS," CoRR, vol. abs/. [Online]. Available: http://arxiv.org/abs/
[12] E. Ghadimi, F. D. Calabrese, G. Peters, and P. Soldati, "A reinforcement learning approach to power control and rate adaptation in cellular networks," in 2017 IEEE International Conference on Communications (ICC), May 2017.
[13] F. D. Calabrese, L. Wang, E. Ghadimi, G. Peters, L. Hanzo, and P. Soldati, "Learning radio resource management in RANs: Framework, opportunities, and challenges," IEEE Communications Magazine, vol. 56, no. 9, Sep.
[14] L. Xiao, D. Jiang, D. Xu, H. Zhu, Y. Zhang, and V. Poor, "Two-dimensional anti-jamming mobile communication based on reinforcement learning," IEEE Transactions on Vehicular Technology, pp. 1-1.
[15] Y. S. Nasir and D. Guo, "Deep reinforcement learning for distributed dynamic power allocation in wireless networks," CoRR, vol. abs/. [Online]. Available: http://arxiv.org/abs/
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529.
[17] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
[18] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/. [Online]. Available: http://arxiv.org/abs/


Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Dimitri Solomatine. D.P. Solomatine. Data-driven modelling (part 2). 2

Dimitri Solomatine. D.P. Solomatine. Data-driven modelling (part 2). 2 Daa-driven modelling. Par. Daa-driven Arificial di Neural modelling. Newors Par Dimiri Solomaine Arificial neural newors D.P. Solomaine. Daa-driven modelling par. 1 Arificial neural newors ANN: main pes

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Control of Stochastic Systems - P.R. Kumar

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Control of Stochastic Systems - P.R. Kumar CONROL OF SOCHASIC SYSEMS P.R. Kumar Deparmen of Elecrical and Compuer Engineering, and Coordinaed Science Laboraory, Universiy of Illinois, Urbana-Champaign, USA. Keywords: Markov chains, ransiion probabiliies,

More information

Adaptive Noise Estimation Based on Non-negative Matrix Factorization

Adaptive Noise Estimation Based on Non-negative Matrix Factorization dvanced cience and Technology Leers Vol.3 (ICC 213), pp.159-163 hp://dx.doi.org/1.14257/asl.213 dapive Noise Esimaion ased on Non-negaive Marix Facorizaion Kwang Myung Jeon and Hong Kook Kim chool of Informaion

More information
