Reinforcement Learning with a Gaussian Mixture Model


Alejandro Agostini, Member, IEEE, and Enric Celaya

Abstract: Recent approaches to Reinforcement Learning (RL) with function approximation include Neural Fitted Q Iteration and the use of Gaussian Processes. They belong to the class of fitted value iteration algorithms, which use a set of support points to fit the value function in a batch iterative process. These techniques make efficient use of a reduced number of samples by reusing them as needed, and are appropriate for applications where the cost of experiencing a new sample is higher than that of storing and reusing it; but this comes at the expense of increased computational effort, since these algorithms are not incremental. On the other hand, non-parametric models for function approximation, like Gaussian Processes, are preferred over parametric ones due to their greater flexibility. A further advantage of using Gaussian Processes for function approximation is that they make it possible to quantify the uncertainty of the estimation at each point. In this paper, we propose a new approach for RL in continuous domains based on probability density estimations. Our method combines the best features of the previous methods: it is non-parametric and provides an estimation of the variance of the approximated function at any point of the domain. In addition, our method is simple, incremental, and computationally efficient. All these features make this approach more appealing than Gaussian Processes and fitted value iteration algorithms in general.

I. INTRODUCTION

A crucial issue in Reinforcement Learning (RL) is how to deal with problems whose state and action spaces are continuous, or discrete but very large. In these cases, the application of classical tabular methods to store the Q-value of each possible state-action pair (or the value of each state, in model-based approaches) becomes infeasible. Moreover, if the number of states is too large, learning about them by visiting them all turns out to be impossible, so that it is necessary to infer the value of a state from the values of similar ones for which experiences have been collected.
To achieve this, RL must be used with some form of function approximation providing the necessary compactness in the representation and appropriate generalization over states and actions. In general, function approximation methods can be classified as parametric or non-parametric [1]. Parametric methods include neural nets, polynomials, and combinations of radial basis functions, among others. They define parameterized families of functions with a finite number of parameters (which, in the discrete case, is much smaller than the number of states), and try to find the values of the parameters for which the function best represents the available data. Parametric methods have been extensively used since they allow the application of gradient techniques for parameter optimization. One difficulty with parametric models resides in the selection of the parameterized family: if it is too restrictive, it may not be able to model the data with the necessary accuracy; if it is too general, there is a risk of overfitting the data and providing poor generalization. Non-parametric function approximators, instead, do not fix in advance the number or the nature of the parameters (despite their name, non-parametric approximators usually have parameters, but their number is not upper bounded), so that they can be endowed with unrestricted function approximation capabilities. Some examples are Gaussian Processes, tree-based methods, and Mixtures of Gaussians with a variable number of units. Since, in general, in a complex RL problem it is not possible to guess what kind of function representation will work, the more flexible non-parametric methods are preferred. In the last years, different non-parametric function approximators for RL have been proposed.

A. Agostini and E. Celaya are with the Institut de Robòtica i Informàtica Industrial (UPC - CSIC), c./ Llorens i Artigas 4-6, Barcelona, Spain. e-mails: agostini@iri.upc.edu and celaya@iri.upc.edu. This work was partially supported by the Spanish Ministry of Science and Innovation under project MIPRCV, Consolider Ingenio 2010 (CSD).
In [2], Rasmussen and Kuss proposed the use of Gaussian Processes (GP) for RL: using a model-based approach, a number of GPs (one for each dimension of the state space) is used to model the system dynamics, and a further GP represents the value function. In [3], the approach is extended to online learning using Bayesian active learning. An alternative application of GPs to RL is that of Engel et al. [4], who use a GP to directly represent the Q-function in a model-free approach. One benefit of using GPs for function approximation is that, besides providing the expected value of the function, they also provide its variance, which makes it possible to quantify the uncertainty of the predicted value. As pointed out in [5], this information may be very useful to direct the exploration in RL. All of these GP-based RL algorithms fall in the class of the so-called fitted value iteration algorithms [6], which, in order to approximate the desired function, take a finite number of samples, or support points, and try to fit the function to them in a batch iterative process. Fitted value iteration has been used with both parametric and non-parametric function approximators. For example, Ernst et al. [7], based on the previous work of Ormoneit and Sen [8] on kernel-based RL, proposed the fitted Q Iteration algorithm using (non-parametric) randomized trees for function approximation, while Riedmiller [9], [10] proposed Neural Fitted Q Iteration using a (parametric) multi-layer neural net. The main idea of fitted value iteration is to reuse the set of samples as much as needed to get all possible information from them. This allows learning with a minimal number of interactions with the real system, but it does not imply a lesser number of

function updates. Thus, it is appropriate when acquiring new data is more costly than just storing them for future use. However, assuming that data can be obtained at low cost, for example by simulation, this advantage disappears, and can even become a disadvantage when the dynamics evolves with time, since old data would no longer be valid. A key issue in fitted value iteration algorithms is the generation of the set of representative samples of the function to be approximated. When the model of the problem is known, they can be obtained by uniform sampling through the state space as in [7]. When the model is not available, samples can be generated by interacting with the system with random actions, but this strategy may not work when the complexity of the problem is such that reaching the interesting regions of the state space requires a long chain of lucky actions. For this reason, Riedmiller [9] uses a greedy heuristic, which consists in exploiting the policy learned in the previous learning stages to generate new samples for the next iterations of the algorithm. Even so, when the problem rises in complexity, he finds it necessary to use what he calls the hint-to-goal heuristic to provide specific exemplars within the goal region. Note that, in principle, fitted value iteration algorithms are not incremental, in the sense that each time new samples are introduced, the function approximation process must be repeated from scratch for the new dataset, which is computationally inefficient. In this paper, we propose an approach to RL for continuous state-action spaces with function approximation based on probability density estimations. The idea is to represent the density distribution of the observed samples in the joint space of states, actions, and q-values. To represent this density distribution we use a Gaussian Mixture Model with a variable number of units, so that the function approximation is non-parametric, which makes it general. With this approach, it is possible to obtain, for each given state and action, the probability distribution of q(s, a) as the conditional probability p(q|s, a). From this distribution we can obtain the value of Q(s, a) as the expected value of q(s, a).
Furthermore, we can obtain the variance of q(s, a) and estimate its confidence, so that our approach also presents what has been argued to be an important feature of GPs [4], [3]. The Gaussian Mixture Model can be updated with an incremental, low-complexity version of the Expectation-Maximization algorithm, which makes this approach more appealing than GPs and fitted value iteration algorithms in general. As a further benefit of using density estimations, it is possible, by marginalization over the state-action variables, to obtain the local sampling density at a point (s, a), which, in stochastic problems, may be used to evaluate how reliable the estimation is at this point. The rest of the paper is organized as follows: Section II briefly reviews the basics of RL. Section III introduces the GMM for multivariate density estimation and the EM algorithm in its batch version. In Section IV we define the on-line EM algorithm for the GMM. In Section V we develop our RL algorithm using density estimation of the Q-value function. Section VI shows the feasibility of the approach with an example, and Section VII concludes the paper.

II. REINFORCEMENT LEARNING

Reinforcement Learning is a paradigm in which an agent has to learn an optimal action policy by interacting with its environment [11]. The task is formally modelled as the solution of a Markov decision process in which, at each time step, the agent observes the current state of the environment, s_t, and chooses an allowed action a_t using some action policy, a_t = π(s_t). In response to this action, the environment changes to state s_{t+1} and produces an instantaneous reward r_t = r(s_t, a_t). Using the information collected in this way, the agent must find the policy that maximizes the expected sum of discounted rewards, also called return, defined as:

R = \sum_{t=0}^{\infty} \gamma^t r_t,   (1)

where γ is the discount rate, with values in [0, 1], that regulates the importance of future rewards with respect to immediate ones. One of the most popular algorithms used in RL is Q-Learning [12], which uses an action-value function Q(s, a) to estimate the maximum expected return that can be obtained by executing action a in situation s and acting optimally thereafter.
Q-learning uses the Bellman equation [13] to estimate sample values for Q(s, a), which we denote by q(s, a):

q(s_t, a_t) = r(s_t, a_t) + \gamma \max_a Q(s_{t+1}, a),   (2)

where \max_a Q(s_{t+1}, a) is the estimated maximum expected return corresponding to the next observed situation s_{t+1}. At a given stage of the learning, the temporary policy can be derived from the estimated Q-function as

\pi(s) = \arg\max_a Q(s, a).   (3)

In actor/critic architectures, a policy function (called the actor) is learned and explicitly stored, so that actions are directly decided by the actor and do not need to be computed through the maximization in (3). Despite this computational advantage, the learning of an actor may slow down convergence, since then the learning of the Q-function must be done on-policy instead of off-policy, and both functions, actor and critic, must adapt to each other to reach convergence. In our implementation we avoid the use of an actor, and thus we must face the problem of maximizing the Q(s, a) function in (3). The basic formulation of Q-learning assumes discrete state-action spaces, and the Q-function is stored in a tabular representation. For continuous domains, function approximation is required to represent the Q-function and generalize between similar situations. In the next sections we present our proposal for function approximation using density estimations.
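For reference, in the discrete tabular case the Bellman sample of eq. (2) and the greedy policy of eq. (3) reduce to a few lines. The following sketch is illustrative only: the two-state MDP is hypothetical, and the learning rate α it uses belongs to tabular Q-learning, not to the continuous-domain method developed in this paper.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward the
    Bellman sample q = r + gamma * max_a' Q[s_next, a'] (eq. (2))."""
    q_sample = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (q_sample - Q[s, a])
    return Q

def greedy_policy(Q, s):
    """Greedy action of eq. (3): argmax over the action row."""
    return int(np.argmax(Q[s]))

# Tiny 2-state, 2-action example (hypothetical MDP).
Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

In continuous spaces this table is exactly what becomes infeasible, motivating the density-based approximation of the following sections.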

III. DENSITY ESTIMATION WITH A GAUSSIAN MIXTURE MODEL

A Gaussian Mixture Model [14] is a weighted sum of multivariate Gaussian probability density functions, and is used to represent general probability density functions in multidimensional spaces. It is assumed that the samples of the distribution to be represented have been generated through the following process: first, one Gaussian is randomly selected with a priori given probabilities, and then a sample is randomly generated with the probability distribution of the selected Gaussian. According to this, the probability density function of generating sample x is:

p(x; \Theta) = \sum_{i=1}^{K} \alpha_i \, N(x; \mu_i, \Sigma_i),   (4)

where K is the number of Gaussians of the mixture; α_i, usually denoted as the mixing parameter, is the prior probability, P(i), of Gaussian i generating a sample; N(x; μ_i, Σ_i) is the multidimensional Gaussian function with mean vector μ_i and covariance matrix Σ_i; and Θ = {{α_1, μ_1, Σ_1}, ..., {α_K, μ_K, Σ_K}} is the whole set of parameters of the mixture. By allowing the adaptation of the number K of Gaussians in the mixture, any smooth density distribution can be approximated arbitrarily closely [15]. The parameters of the model can be estimated using a maximum-likelihood estimator (MLE). Given a set of samples X = {x_t ; t = 1, ..., N}, the likelihood function is given by

L[X; \Theta] = \prod_{t=1}^{N} p(x_t; \Theta).   (5)

The maximum-likelihood estimation of the model parameters is the Θ that maximizes the likelihood (5) for the data set X. Direct computation of the MLE requires complete information about which mixture component generated each instance. Since this information is missing, the EM algorithm, described in the next section, is often used.

A. The Expectation-Maximization algorithm

The Expectation-Maximization (EM) algorithm [16] is a general tool that permits estimating the parameters that maximize the likelihood function (5) for a broad class of problems with missing data. The EM method first produces an estimation of the expected values of the missing data using initial values of the parameters to be estimated (E step), and then computes the MLE of the parameters given the expected values of the missing data (M step). This process is repeated iteratively until a convergence criterion is fulfilled.
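The mixture density of eq. (4), which EM fits to the data, can be evaluated directly from the parameters Θ. A small sketch (illustrative parameters and hypothetical function names, not part of the paper's implementation):

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density N(x; mu, cov)."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

def gmm_pdf(x, alphas, mus, covs):
    """Mixture density of eq. (4): p(x) = sum_i alpha_i N(x; mu_i, Sigma_i)."""
    return sum(a * gaussian_pdf(x, m, c)
               for a, m, c in zip(alphas, mus, covs))

# Two-component 1-D mixture (illustrative parameters).
alphas = [0.5, 0.5]
mus = [np.array([0.0]), np.array([4.0])]
covs = [np.eye(1), np.eye(1)]
p = gmm_pdf(np.array([0.0]), alphas, mus, covs)
```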
In this section we briefly describe how EM is applied to the specific case of a GMM. The process starts with an initialization of the mean vectors and covariance matrices of the Gaussians. The E step consists in obtaining the probability P(i|x_t) of each component i having generated instance x_t, which we denote by w_{t,i}:

w_{t,i} = P(i|x_t) = \frac{P(i) p(x_t|i)}{\sum_{j=1}^{K} P(j) p(x_t|j)} = \frac{\alpha_i N(x_t; \mu_i, \Sigma_i)}{\sum_{j=1}^{K} \alpha_j N(x_t; \mu_j, \Sigma_j)},   (6)

where t = 1, ..., N and i = 1, ..., K. The maximization step consists in computing the MLE using the estimated w_{t,i}. It can be shown [17] that, for the case of a GMM, the mixing parameters, means, and covariances are given by

\alpha_i = \frac{1}{N} \sum_{t=1}^{N} w_{t,i},   (7)

\mu_i = \frac{\sum_{t=1}^{N} w_{t,i} \, x_t}{\sum_{t=1}^{N} w_{t,i}},   (8)

\Sigma_i = \frac{\sum_{t=1}^{N} w_{t,i} (x_t - \mu_i)(x_t - \mu_i)^T}{\sum_{t=1}^{N} w_{t,i}}.   (9)

IV. ON-LINE EM

Estimating a probability density function by means of the EM algorithm involves the iteration of E and M steps on the complete set of available data; that is, the mode of operation of EM is batch. However, in RL, sample data are not all available at once: they arrive sequentially and must be used online to improve the policy that will allow an efficient exploration-exploitation strategy. This prevents the use of the off-line EM algorithm, and requires an on-line, incremental version of it. Several incremental EM algorithms have been proposed for the Gaussian Mixture Model applied to clustering or classification of stationary data [18], [19]. The approach proposed in [18] is not strictly an on-line EM algorithm. It applies the conventional batch EM algorithm onto separate data streams corresponding to successive episodes. For each new stream, a new GMM model is trained in batch mode and then merged with the previous model. The number of components of each new GMM is defined using the Bayesian Information Criterion, and the merging process involves similarity comparisons between Gaussians. This method involves many computationally expensive processes at each episode and tends to generate more components than actually needed. The applicability of this method to RL seems limited, not only for its computational cost, but also because, due to the non-stationarity of the Q-estimation, old data should not be taken as equally valid during the whole process.
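The batch E and M steps of eqs. (6)-(9) can be sketched numerically. The following is an illustrative one-dimensional reduction of the multivariate case, with hypothetical data and initializations, not the paper's implementation:

```python
import numpy as np

def em_step(X, alphas, mus, sigmas):
    """One batch EM iteration for a 1-D GMM (eqs. (6)-(9))."""
    N, K = len(X), len(alphas)
    # E step: responsibilities w[t, i] (eq. (6)).
    w = np.zeros((N, K))
    for i in range(K):
        w[:, i] = alphas[i] * np.exp(-0.5 * ((X - mus[i]) / sigmas[i]) ** 2) / sigmas[i]
    w /= w.sum(axis=1, keepdims=True)
    # M step: re-estimate parameters (eqs. (7)-(9)).
    Wi = w.sum(axis=0)
    alphas = Wi / N
    mus = (w * X[:, None]).sum(axis=0) / Wi
    sigmas = np.sqrt((w * (X[:, None] - mus) ** 2).sum(axis=0) / Wi)
    return alphas, mus, sigmas

# Two well-separated clusters (illustrative data).
X = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
a, m, s = np.array([0.5, 0.5]), np.array([-1.0, 6.0]), np.array([1.0, 1.0])
for _ in range(20):
    a, m, s = em_step(X, a, m, s)
```

Each iteration alternates the soft assignment (6) with the weighted re-estimation (7)-(9); with these data the means converge to the two cluster centers.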
The work of [19] performs incremental updating of the density model using no historical data and assuming that consecutive data vary smoothly. The method maintains two GMMs: the current GMM estimation, and a previous GMM of the same complexity after which no model updating (i.e. no change in the number of Gaussians) has been done.

By comparing the current GMM with the historical one, it is determined whether new Gaussians are generated or some Gaussians are merged together. Two observed shortcomings of the algorithm are that the system fails when new data are well explained by the historical GMM, and when consecutive data violate the condition of smooth variation. In [20], an on-line EM algorithm is presented for the Normalized Gaussian Network (NGnet), a model closely related to the GMM. This algorithm is based on the works of [21], [22]. In [21] a method for the incremental adaptation of the model parameters using a forgetting factor and cumulative statistics is proposed, while in [22] the method of [21] is evaluated and contrasted with an incremental version which performs steps of EM over a fixed set of samples in an incremental way. The method proposed in [20] uses foundations of both works to elaborate an on-line learning algorithm that trains the NGnet for regression, where weighted averages of the model parameters are calculated using a learning rate that implicitly incorporates a forgetting factor to deal with non-stationarities. Inspired by this work, we developed an on-line EM algorithm for the GMM. Our approach uses cumulative statistics whose updating involves a forgetting factor explicitly.

A. On-line EM for the GMM

In the on-line EM approach, an E step and an M step are performed after the observation of each individual sample. The E step does not differ from the batch version (equation (6)), except that it is only computed for the new sample. For the M step, the parameters of all mixture components are updated with the new sample. For this, we define the following time-discounted weighted sums:

W_{t,i} = [[1]]_{t,i},   (10)

X_{t,i} = [[x]]_{t,i},   (11)

(XX)_{t,i} = [[x x^T]]_{t,i},   (12)

where we use the notation:

[[f(x)]]_{t,i} = \sum_{\tau=1}^{t} \left( \prod_{s=\tau+1}^{t} \lambda_s \right) f(x_\tau) \, w_{\tau,i},   (13)

where λ_t ∈ [0, 1] is a time-dependent discount factor introduced to forget the effect of old, possibly outdated values. Observe that for low values of λ_t the influence of old data decreases progressively, so that they are forgotten along time. This forgetting effect on old data is attenuated when λ_t approaches 1: in this case, old and new data have the same influence in the sum.
As learning proceeds and data values become more stable, forgetting them is no longer required, and λ_t can be made to progressively approach 1 to allow convergence. The sum W_{t,i} can be interpreted as the accumulated number of samples (composed of weights w_{t,i}) attributed to unit i along time, with forgetting. Similarly, X_{t,i} corresponds to the weighted sum with forgetting of the sample vectors x_τ attributed to unit i, which is used to derive the mean vector μ_i. In the same way, (XX)_{t,i} is the weighted sum with forgetting of the matrices obtained as the products x_τ x_τ^T of the sample vectors attributed to unit i, which will be used to find the covariance matrix Σ_i. From (13), we obtain the recursive formula:

[[f(x)]]_{t,i} = \lambda_t [[f(x)]]_{t-1,i} + f(x_t) w_{t,i}.   (14)

When a new sample x_t arrives, the accumulators (10), (11), and (12) are updated with the incremental formula (14), and new estimators for the GMM parameters are obtained as:

\alpha_i(t) = \frac{W_{t,i}}{\sum_{j=1}^{K} W_{t,j}},   (15)

\mu_i(t) = \frac{X_{t,i}}{W_{t,i}},   (16)

\Sigma_i(t) = \frac{(XX)_{t,i}}{W_{t,i}} - \mu_i(t) \mu_i(t)^T.   (17)

If the number K of Gaussians in the mixture is fixed, the GMM is a parametric function approximation method whose approximation capabilities are determined by K. Since we cannot determine the most appropriate K beforehand, we allow the number of Gaussians to be incremented on-line by a process of unit generation, so that the function approximation method as a whole becomes non-parametric. The process of unit generation is explained in Section V-B.

B. Weight-Dependent Forgetting

The factors λ_t in (13) were introduced by [20] with the purpose of progressively replacing (forgetting) old data by new, more reliable values. The effect of this is clearly seen in the incremental formula (14), which shows how, at each time step, all past data are multiplied by λ_t, and this is done for all units, no matter how much weight w_{t,i} is attributed to each of them. We observe that the real effect of applying (14) to units with low activation w_{t,i} is not to replace their past values by the new one but, essentially, to decrease their values by a factor λ_t.
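The accumulator recursion (14) and the estimators (15)-(17) introduced above can be sketched for a single unit. The class below is an illustrative one-dimensional reduction (the class name and scalar statistics are ours, not the paper's implementation):

```python
class OnlineGMMUnit:
    """Sufficient statistics of one mixture unit, updated with the
    time-discounted sums of eqs. (10)-(14) (1-D case for brevity)."""

    def __init__(self, W=0.1, X=0.0, XX=0.0):
        self.W, self.X, self.XX = W, X, XX   # [[1]], [[x]], [[x x^T]]

    def update(self, x, w, lam):
        # Recursive update of eq. (14): discount old sums, add new sample.
        self.W = lam * self.W + w
        self.X = lam * self.X + w * x
        self.XX = lam * self.XX + w * x * x

    def mean(self):
        return self.X / self.W                      # eq. (16)

    def var(self):
        return self.XX / self.W - self.mean() ** 2  # eq. (17)

# Feed one unit repeated samples at x = 2 with full activation.
unit = OnlineGMMUnit()
for _ in range(200):
    unit.update(x=2.0, w=1.0, lam=0.95)
```

Under the weight-dependent forgetting of eq. (19) introduced below, `lam` would be replaced by `lam ** w` in the three update lines.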
It can be seen that this is exactly the case when setting w_{t,i} = 0 in equation (14), which yields:

[[f(x)]]_{t,i} = \lambda_t [[f(x)]]_{t-1,i},   (18)

showing that the accumulators of units that are seldom activated will systematically decay to 0. This situation is particularly harmful in the case of online RL, for which it is very likely that highly valued regions of the state-action space will be sampled much more frequently than less promising ones, so that, in the long term, units covering low-valued regions will get their statistics lost. This can be avoided by modifying the updating formula (14) in this way:

[[f(x)]]_{t,i} = \lambda_t^{w_{t,i}} [[f(x)]]_{t-1,i} + f(x_t) w_{t,i}.   (19)

With this updating formula, the amount by which old data are forgotten is regulated by the amount w_{t,i} in which a new value is added to the sum, so that data are always replaced,

instead of simply forgotten. Effectively, if we now make w_{t,i} = 0 in (19), what we get is:

[[f(x)]]_{t,i} = [[f(x)]]_{t-1,i},   (20)

so that the values of the statistics of the inactive units remain unchanged. On the other hand, in the case of full activation of unit i, i.e., if w_{t,i} = 1, the effect of the new updating formula is exactly the same as that of (14). Therefore, we will prefer the updating formula (19) to keep better track of less sampled regions, noting that, by doing this, the definition given in (13) no longer holds.

V. THE GMM FOR Q-LEARNING

In this section, we describe how the GMM can be used for function approximation to estimate the expected Q-value, as well as its variance, at each point of the state-action space by means of a single representation of the probability density function in the joint space of states, actions, and Q-values:

p(s, a, q) = \sum_{i=1}^{K} \alpha_i \, N(s, a, q; \mu_i, \Sigma_i).   (21)

In online Q-learning, each sample is of the form x_t = (s_t, a_t, q(s_t, a_t)), corresponding to the visited state s_t, the executed action a_t, and the estimated value of q(s_t, a_t) as given by eq. (2). To obtain this estimation we need to evaluate \max_a Q(s_{t+1}, a), where Q(s, a) is defined as the expected value of q given s and a for the joint probability distribution (21) provided by the GMM:

Q(s, a) = E[q | s, a] = \mu(q | s, a).   (22)

To compute this, we must first obtain the distribution p(q | s, a). Decomposing the means μ_i and covariances Σ_i in the following way:

\mu_i = \begin{pmatrix} \mu_i^{(s,a)} \\ \mu_i^{q} \end{pmatrix},   (23)

\Sigma_i = \begin{pmatrix} \Sigma_i^{(s,a)(s,a)} & \Sigma_i^{(s,a),q} \\ \Sigma_i^{q,(s,a)} & \Sigma_i^{qq} \end{pmatrix},   (24)

the probability distribution of q, for a given state s and action a, can then be expressed as:

p(q | s, a) = \sum_{i=1}^{K} \beta_i(s, a) \, N(q; \mu_i(q | s, a), \sigma_i^2(q)),   (25)

where

\mu_i(q | s, a) = \mu_i^{q} + \Sigma_i^{q,(s,a)} \left( \Sigma_i^{(s,a)(s,a)} \right)^{-1} \left( (s, a) - \mu_i^{(s,a)} \right),   (26)

\sigma_i^2(q) = \Sigma_i^{qq} - \Sigma_i^{q,(s,a)} \left( \Sigma_i^{(s,a)(s,a)} \right)^{-1} \Sigma_i^{(s,a),q},   (27)

\beta_i(s, a) = \frac{\alpha_i N(s, a; \mu_i^{(s,a)}, \Sigma_i^{(s,a)(s,a)})}{\sum_{j=1}^{K} \alpha_j N(s, a; \mu_j^{(s,a)}, \Sigma_j^{(s,a)(s,a)})}.   (28)

From (25) we can obtain the conditional mean and variance, μ(q|s,a) and σ²(q|s,a), of the mixture at point (s, a) as:

\mu(q | s, a) = \sum_{i=1}^{K} \beta_i(s, a) \, \mu_i(q | s, a),   (29)

\sigma^2(q | s, a) = \sum_{i=1}^{K} \beta_i(s, a) \left( \sigma_i^2(q) + (\mu_i(q | s, a) - \mu(q | s, a))^2 \right).   (30)

Equation (29) is the estimated Q value for a given state and action, while (30) is its estimated variance.
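The computations in eqs. (26)-(30) amount to standard Gaussian conditioning plus a β-weighted combination. A sketch with a scalar u standing in for the state-action pair (s, a) (illustrative only; the function names and the scalar reduction are ours):

```python
import numpy as np

def conditional_q(mu, cov, u):
    """Condition one joint Gaussian over (u, q) on the input part u,
    giving mu_i(q|u) and sigma_i^2(q) as in eqs. (26)-(27)."""
    mu_u, mu_q = mu[0], mu[1]
    Suu, Suq, Squ, Sqq = cov[0, 0], cov[0, 1], cov[1, 0], cov[1, 1]
    mean = mu_q + Squ / Suu * (u - mu_u)
    var = Sqq - Squ * Suq / Suu
    return mean, var

def mixture_q(u, alphas, mus, covs):
    """Mixture conditional mean and variance, eqs. (28)-(30)."""
    dens = np.array([a * np.exp(-0.5 * (u - m[0]) ** 2 / c[0, 0])
                     / np.sqrt(2 * np.pi * c[0, 0])
                     for a, m, c in zip(alphas, mus, covs)])
    betas = dens / dens.sum()                                # eq. (28)
    conds = [conditional_q(m, c, u) for m, c in zip(mus, covs)]
    mean = sum(b * mq for b, (mq, _) in zip(betas, conds))   # eq. (29)
    var = sum(b * (vq + (mq - mean) ** 2)
              for b, (mq, vq) in zip(betas, conds))          # eq. (30)
    return mean, var

# Single unit with correlated (u, q): q tracks u on average.
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
m, v = mixture_q(1.0, [1.0], [mu], [cov])
```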
Our purpose was to find the maximum of Q(s, a) over all actions for a given s, in order to compute (2). Unfortunately, this is hard to do analytically, but an approximate value can be obtained by numerical techniques. In our implementation, we take the simple approach of computing Q(s, a) for a finite number of actions, and then taking the largest Q value as the approximated maximum:

\max_a Q(s, a) \approx \max_{a \in A} Q(s, a),   (31)

where A is the set of actions that we take into consideration to find the approximated maximum.

A. Action Selection with Exploration

Action selection in RL must address the exploration/exploitation tradeoff. If we want just to exploit what has been learnt so far, with no exploration, we must execute the action a_g corresponding to the greedy policy given by eq. (3), which in our case is computed as:

a_g = \pi_g(s) = \arg\max_{a \in A} Q(s, a).   (32)

However, during learning, an exploration strategy is necessary that guarantees that no action is excluded from execution in any state. Two well-known ways to achieve this are ε-greedy and Boltzmann exploration. According to [23], these strategies are in the family of undirected exploration methods, meaning that exploration is based on randomness, and no exploration-specific knowledge is used to guide it. It is claimed that directed exploration techniques are often more efficient than undirected ones, so we propose a more directed method of exploration that takes into account the prediction error for each action, which is captured in the variance of the Q-values. For this, we define:

Q_{rand}(s, a) = Q(s, a) + \Delta_Q(\sigma^2(q | s, a)),   (33)

where \Delta_Q(\sigma^2(q | s, a)) is a value taken at random from a normal probability distribution with 0 mean and variance σ²(q|s,a). Then, action selection with exploration is made according to:

a_{explr} = \arg\max_{a \in A} Q_{rand}(s, a) + a_{rand},   (34)

where a_{rand} is an appropriately sized random perturbation of the action, introduced to allow the execution of arbitrary actions and not just those contained in A.
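The selection rule of eqs. (33)-(34) can be sketched as follows (an illustrative sketch with hypothetical action values; the `action_noise` parameter stands in for the perturbation a_rand):

```python
import numpy as np

def select_action(actions, q_means, q_vars, rng, action_noise=0.0):
    """Variance-aware selection of eqs. (33)-(34): perturb each Q(s, a)
    by a sample from N(0, sigma^2(q|s, a)), pick the argmax, and add a
    small random perturbation to the chosen action."""
    q_rand = q_means + rng.normal(0.0, np.sqrt(q_vars))   # eq. (33)
    best = int(np.argmax(q_rand))                         # eq. (34)
    return actions[best] + rng.uniform(-action_noise, action_noise)

rng = np.random.default_rng(0)
actions = np.array([-1.0, 0.0, 1.0])
# A confident high-valued action vs. equally confident low-valued ones.
a = select_action(actions, q_means=np.array([5.0, 0.0, 0.0]),
                  q_vars=np.array([1e-6, 1e-6, 1e-6]), rng=rng)
```

With small variances the rule behaves greedily; as the variance of an action's Q-estimate grows, so does its chance of winning the argmax.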

With this form of exploration, all actions always have a chance of getting Q_rand above their competitors, and hence a probability of being selected. Usually, higher-valued actions will have more chances of getting the highest Q_rand. However, a low-valued action may eventually receive a high Q_rand that surpasses the values of other actions. This provides a balance between exploration and exploitation that tends to take the greedy action when we are rather certain that it will result in a larger value, and increases the probability of exploring a non-greedy action when its predicted outcome is uncertain.

B. Unit Generation

The GMM is initialized with a small number of units, selected according to the expected complexity of the problem. However, if during training the model is found to be insufficient to represent the sample distribution with the required accuracy, it may be upgraded by generating new units. Since our main interest is to accurately represent the Q function, the generation of a new Gaussian is determined by the failure of the current GMM to account for an actually observed q value. Thus, a new Gaussian is generated when the two following conditions are satisfied:

1) The estimation error of the observed q value is larger than a predefined value δ:

(q(s, a) - \mu(q | s, a))^2 \geq \delta.   (35)

2) Units close to the experienced point have been sufficiently updated. We consider a unit i close to the point x = (s, a, q) if the Mahalanobis distance D_M^{(i)}, with covariance matrix Σ_i, between the unit mean and the point is less than 1,

I = \{ i \in \{1, ..., K\} \mid D_M^{(i)}(x, \mu_i) < 1 \},   (36)

and thus the criterion can be expressed as:

\sum_{\tau=1}^{t} w_{\tau,i} > N_{conf}, \quad \forall i \in I.   (37)

The purpose of this condition is to avoid the premature generation of new units in a region before the system has had the opportunity to adapt to the data in that region. Whenever both criteria are fulfilled, a Gaussian is generated with parameters given by:

W_{K+1} = 1,   (38)

\mu_{K+1}(s, a, q) = (s_t, a_t, q(s_t, a_t)),   (39)

\Sigma_{K+1} = C \, \mathrm{diag}\{d_1, ..., d_D, d_a, d_q\},   (40)

where d_i is the total range size of variable i, D is the dimension of the state space, and C is a positive value that sizes the variances of the new Gaussian.
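The two generation conditions (35)-(37) and the initialization (38)-(40) can be sketched as follows. This is an illustrative simplification: units are plain dictionaries, the stored scalar `W` stands in for the activation sum of eq. (37), and the test compares the squared Mahalanobis distance against 1 (equivalent to D_M < 1):

```python
import numpy as np

def should_generate(x, q_pred, q_obs, units, delta, n_conf):
    """Unit-generation test of Section V-B: large estimation error
    (eq. (35)) and all nearby units sufficiently updated (eqs. (36)-(37))."""
    if (q_obs - q_pred) ** 2 < delta:                 # eq. (35) not met
        return False
    for u in units:
        diff = x - u["mu"]
        d2 = diff @ np.linalg.inv(u["cov"]) @ diff    # squared Mahalanobis
        if d2 < 1.0 and u["W"] <= n_conf:             # a close, immature unit
            return False
    return True

def new_unit(x, d_ranges, C):
    """Parameters of the generated Gaussian, eqs. (38)-(40)."""
    return {"W": 1.0, "mu": x.copy(), "cov": C * np.diag(d_ranges)}

# One mature unit near the sample point; error above threshold.
units = [{"mu": np.zeros(3), "cov": np.eye(3), "W": 50.0}]
x = np.array([0.1, 0.0, 0.2])
gen = should_generate(x, q_pred=0.0, q_obs=2.0, units=units,
                      delta=1.0, n_conf=10.0)
u_new = new_unit(x, d_ranges=np.array([2.0, 2.0, 2.0]), C=0.5)
```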
VI. EXPERIMENTS

A classical benchmark problem for RL, the control of an inverted pendulum with limited torque [24], has been selected to test our algorithm. We addressed the problem of the swing-up and stabilization of the pendulum. The task consists in swinging the pendulum until reaching the upright position, and then staying there indefinitely. The optimal policy for this problem is not trivial to find since, due to the limited torques available, the controller has to swing the pendulum several times back and forth until its kinetic energy is large enough to overcome the load torque and reach the upright position, and then stabilize the pendulum there. The state space of this problem is two-dimensional and is formed by the angular position θ and the angular velocity θ': s = (θ, θ'), where θ takes values in the interval [-π, π], and θ' is limited to the interval [-8, 8] s^{-1}. The Gaussians of the mixture model are four-dimensional, and the GMM provides estimations of the probability densities in the joint space x = (θ, θ', a, q). As the reward signal we simply take the height of the tip of the pendulum, h = cos(θ), which ranges in the interval [-1, 1], and a constant discount coefficient γ is used in equation (2). We initialize the model with 20 Gaussians with random initial means μ_i for all dimensions except the q dimension, which is initialized to the maximum possible Q value to favor exploration of unvisited regions. The initial covariance matrices Σ_i are diagonal, and the variance of each variable is set to the range of that variable. The initial number of samples W_i of each Gaussian is set to 0.1. This small value makes the component have a small influence in the estimation while there has been no, or little, updating. The discount factor λ_t for the weighted sums with forgetting (Section IV) takes values from the equation

\lambda_t = 1 - 1/(a t + b),   (41)

where b regulates the initial value λ_0, and a determines its growth rate toward 1. In our experiments we set a = 0.001, and b = 1000 in the case of using the updating formula (14), and b = 10 in the case of using the updating formula (19), to compensate for the effect of the exponent w_{t,i} < 1. For the experiments, we adopt the set-up of [25]: we run episodes of 7 seconds with actuation intervals of 0.01 seconds.
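The schedule of eq. (41) can be written directly; the following sketch uses the a and b values reported for updating formula (14):

```python
def lambda_schedule(t, a=0.001, b=1000.0):
    """Discount-factor schedule of eq. (41): lambda_t = 1 - 1/(a*t + b),
    growing toward 1 so that forgetting fades out as learning stabilizes."""
    return 1.0 - 1.0 / (a * t + b)

lam0 = lambda_schedule(0)        # initial value, fixed by b
lam_late = lambda_schedule(10**6)
```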
At the beginning of each episode, the pendulum is randomly placed inside an arch centered on the upright position. This can be seen as a form of the hint-to-goal heuristic used in [9] and also in [3]. The length of the arch is steadily incremented with each episode until it covers the whole range, thus allowing any arbitrary initial position. To evaluate the performance of the learning system, we run 10 independent experiments of 120 episodes each. At the end of each episode, a 7-second test exploiting the policy learned so far is done, and the total accumulated reward is computed. Figure 1 shows the result of averaging the results of the 10 experiments using the updating formula (14). It can be observed that, although acceptable control is reached, there is some instability that persists even in the final episodes. This is caused by sporadic periods in which, after having

learned to correctly swing up and stabilize the pendulum, the system unlearns it and must re-learn it to recover the right policy. This effect is a consequence of the forgetting of the function approximation in low-valued regions, caused by the biased sampling that occurs when the system keeps learning after the right policy has already been found. It is to correct this effect that we introduced the updating formula (19) for weight-dependent forgetting. The results obtained using this formula are shown in Figure 2. It can be seen that in this case convergence is much faster and the behavior much more stable, which demonstrates the effectiveness of the approach. To illustrate the performance reached, Figure 3 shows a stroboscopic sequence of the pendulum starting from the initial position of the pendulum hanging down. Figures 4 and 5 show two projections of the Gaussians of a typical GMM obtained for this problem after training. It can be seen that they are not equally distributed over the whole configuration space, but concentrated in the most common trajectories of the system, making an efficient use of resources.

Fig. 1. Average of the accumulated reward per episode over 10 experiments, with uniform forgetting (λ_t).
Fig. 2. Average of the accumulated reward per episode over 10 experiments, with weight-dependent forgetting (λ_t^{w_{t,i}}).
Fig. 3. A stroboscopic sequence obtained by placing the pendulum in the downright position.
Fig. 4. Projection of the Gaussians of the GMM onto the state space.
Fig. 5. Projection of the Gaussians of the GMM onto the (θ, q) space.

VII. CONCLUSIONS

We have shown that estimating a probability density function in the joint space of states, actions, and q-values provides a useful tool for RL in continuous domains. The probability density function captures all the information available to the RL agent: in the first place, it provides a function approximation for the action-value function Q(s, a) as the mean of the sample values q(s, a); in the second place, as in the case of using GPs, the probability density function provides not only the mean value of q(s, a), but the full probability distribution of its possible values, and in particular its variance. We use this information to direct the exploration, a possibility suggested in [5], [4] but not implemented until now. Finally, the probability density in the joint space can be marginalized to obtain the density of samples in the state-action space, which may be used to measure the confidence we may have in the estimation at each point. To represent the probability density function we use a GMM with a variable number of units. This provides a general, non-parametric function approximation tool. By using an on-line version of the EM algorithm, the training of the GMM can be done incrementally and, thanks to the simplicity of the GMM, the update process is computationally efficient. The feasibility of the method is demonstrated on a standard benchmark for RL, the swing-up and balance of an inverted pendulum with limited torque, with good results. We believe that the simplicity and expressiveness of this approach make it a promising alternative for RL in continuous domains.

REFERENCES

[1] C. Stone, "Optimal global rates of convergence for nonparametric regression," The Annals of Statistics, vol. 10, no. 4, 1982.
[2] C. Rasmussen and M. Kuss, "Gaussian processes in reinforcement learning," Advances in Neural Information Processing Systems, vol. 16, 2004.
[3] M. Deisenroth, C. Rasmussen, and J. Peters, "Gaussian process dynamic programming," Neurocomputing, vol. 72, no.
7-9, 2009.
[4] Y. Engel, S. Mannor, and R. Meir, "Reinforcement learning with Gaussian processes," in ICML '05: Proceedings of the 22nd International Conference on Machine Learning. New York, NY, USA: ACM, 2005.
[5] Y. Engel, S. Mannor, and R. Meir, "Bayes meets Bellman: The Gaussian process approach to temporal difference learning," in Proc. of the 20th International Conference on Machine Learning, 2003.
[6] G. J. Gordon, "Stable function approximation in dynamic programming," in ICML, 1995.
[7] D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, 2005.
[8] D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, vol. 49, no. 2-3, 2002.
[9] M. Riedmiller, "Neural Reinforcement Learning to Swing-up and Balance a Real Pole," in Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, 2005.
[10] M. Riedmiller, "Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method," Lecture Notes in Computer Science, vol. 3720, 2005.
[11] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[12] C. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, 1992.
[13] R. Bellman and S. Dreyfus, Applied Dynamic Programming. Princeton, New Jersey: Princeton University Press, 1962.
[14] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.
[15] M. Figueiredo, "On Gaussian radial basis function approximations: Interpretation, extensions, and learning strategies," in Proceedings of the International Conference on Pattern Recognition, vol. 2, 2000.
[16] A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977.
[17] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York, USA: John Wiley and Sons, Inc., 2001.
[18] M. Song and H.
Wng, Hghly effcent ncrementl estmton of Gussn mxture models for onlne dt strem clusterng, n Proceedngs of SPIE: Intellgent Computng: Theory nd Applctons III, Orlndo, FL, USA, 2005, pp [19] O. Arndjelovc nd R. Cpoll, Incrementl lernng of temporllycoherent Gussn mxture models, n Techncl Ppers - Socety of Mnufcturng Engneers (SME), [20] M.-A. Sto nd S. Ish, On-lne em lgorthm for the normlzed Gussn network, Neurl Comput., vol. 12, no. 2, pp , [21] S. J. Nowln, Soft compettve dptton: neurl network lernng lgorthms bsed on fttng sttstcl mxtures, Ph.D. dssertton, Pttsburgh, PA, USA, [22] R. Nel nd G. Hnton, A vew of the em lgorthm tht justfes ncrementl, sprse, nd other vrnts, n Proceedngs of the NATO Advnced Study Insttute on Lernng n grphcl models. Norwell, MA, USA: Kluwer Acdemc Publshers, 1998, pp [23] S. Thrun, The role of explorton n lernng control, n Hndbook for Intellgent Control: Neurl, Fuzzy nd Adptve Approches, D. Whte nd D. Sofge, Eds. Florence, Kentucky 41022: Vn Nostrnd Renhold, [24] K. Doy, Renforcement lernng n contnuous tme nd spce, Neurl Comput., vol. 12, no. 1, pp , [25] M.-. Sto nd S. Ish, Renforcement lernng bsed on on-lne em lgorthm, n Proceedngs of the 1998 conference on Advnces n neurl nformton processng systems (NIPS 99). Cmbrdge, MA, USA: MIT Press, 1999, pp
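The conclusions above describe how a GMM over the joint vector (s, a, q) yields the action-value estimate Q(s, a), its variance, and the marginal sample density at (s, a). As a minimal sketch of that conditioning step (not the paper's actual implementation; the function names and NumPy formulation are ours), each Gaussian unit contributes a conditional mean and variance via the standard Gaussian conditioning formulas, which are then mixed with responsibilities computed from the marginal densities:

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = len(mean)
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def q_estimate(x, weights, means, covs):
    """Conditional mean and variance of q given x = (s, a), plus the
    marginal density p(x), from a GMM over z = (s, a, q) with q last."""
    d = len(x)  # dimension of the (s, a) part
    w, ms, vs = [], [], []
    for pi_i, mu, S in zip(weights, means, covs):
        mu_x, mu_q = mu[:d], mu[d]
        S_xx, S_xq, S_qq = S[:d, :d], S[:d, d], S[d, d]
        gain = np.linalg.solve(S_xx, S_xq)       # Sigma_xx^{-1} Sigma_xq
        w.append(pi_i * gaussian_pdf(x, mu_x, S_xx))
        ms.append(mu_q + gain @ (x - mu_x))      # conditional mean of unit i
        vs.append(S_qq - S_xq @ gain)            # conditional variance of unit i
    w, ms, vs = map(np.array, (w, ms, vs))
    p_x = w.sum()                # marginal sample density at (s, a)
    w = w / p_x                  # responsibilities of each unit
    q_mean = w @ ms              # Q(s, a): mixture of conditional means
    q_var = w @ (vs + ms ** 2) - q_mean ** 2  # law of total variance
    return q_mean, q_var, p_x
```

The returned variance can drive exploration (prefer actions whose q-estimate is uncertain), while p(x) indicates how well-supported the estimate is by past samples, as discussed in the conclusions.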


More information

Lesson 2. Thermomechanical Measurements for Energy Systems (MENR) Measurements for Mechanical Systems and Production (MMER)

Lesson 2. Thermomechanical Measurements for Energy Systems (MENR) Measurements for Mechanical Systems and Production (MMER) Lesson 2 Thermomechncl Mesurements for Energy Systems (MEN) Mesurements for Mechncl Systems nd Producton (MME) 1 A.Y. 2015-16 Zccr (no ) Del Prete A U The property A s clled: «mesurnd» the reference property

More information

ON SIMPSON S INEQUALITY AND APPLICATIONS. 1. Introduction The following inequality is well known in the literature as Simpson s inequality : 2 1 f (4)

ON SIMPSON S INEQUALITY AND APPLICATIONS. 1. Introduction The following inequality is well known in the literature as Simpson s inequality : 2 1 f (4) ON SIMPSON S INEQUALITY AND APPLICATIONS SS DRAGOMIR, RP AGARWAL, AND P CERONE Abstrct New neultes of Smpson type nd ther pplcton to udrture formule n Numercl Anlyss re gven Introducton The followng neulty

More information

2D1431 Machine Learning Lab 3: Reinforcement Learning

2D1431 Machine Learning Lab 3: Reinforcement Learning 2D1431 Mchine Lerning Lb 3: Reinforcement Lerning Frnk Hoffmnn modified by Örjn Ekeberg December 7, 2004 1 Introduction In this lb you will lern bout dynmic progrmming nd reinforcement lerning. It is ssumed

More information

Linear and Nonlinear Optimization

Linear and Nonlinear Optimization Lner nd Nonlner Optmzton Ynyu Ye Deprtment of Mngement Scence nd Engneerng Stnford Unversty Stnford, CA 9430, U.S.A. http://www.stnford.edu/~yyye http://www.stnford.edu/clss/msnde/ Ynyu Ye, Stnford, MS&E

More information

Reproducing Kernel Hilbert Space for. Penalized Regression Multi-Predictors: Case in Longitudinal Data

Reproducing Kernel Hilbert Space for. Penalized Regression Multi-Predictors: Case in Longitudinal Data Interntonl Journl of Mthemtcl Anlyss Vol. 8, 04, no. 40, 95-96 HIKARI Ltd, www.m-hkr.com http://dx.do.org/0.988/jm.04.47 Reproducng Kernel Hlbert Spce for Penlzed Regresson Mult-Predctors: Cse n Longudnl

More information

AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio

AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio 7 CHANGE-POINT METHODS FOR OVERDISPERSED COUNT DATA THESIS Brn A. Wlken, Cptn, Unted Sttes Ar Force AFIT/GOR/ENS/7-26 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wrght-Ptterson

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information

The Dynamic Multi-Task Supply Chain Principal-Agent Analysis

The Dynamic Multi-Task Supply Chain Principal-Agent Analysis J. Servce Scence & Mngement 009 : 9- do:0.46/jssm.009.409 Publshed Onlne December 009 www.scp.org/journl/jssm) 9 he Dynmc Mult-sk Supply Chn Prncpl-Agent Anlyss Shnlng LI Chunhu WANG Dol ZHU Mngement School

More information

8. INVERSE Z-TRANSFORM

8. INVERSE Z-TRANSFORM 8. INVERSE Z-TRANSFORM The proce by whch Z-trnform of tme ere, nmely X(), returned to the tme domn clled the nvere Z-trnform. The nvere Z-trnform defned by: Computer tudy Z X M-fle trn.m ued to fnd nvere

More information