The Essential Dynamics Algorithm: Essential Results
Massachusetts Institute of Technology, Artificial Intelligence Laboratory
Martin C. Martin
AI Memo, May
Massachusetts Institute of Technology, Cambridge, MA 02139 USA
Abstract

This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. A transform of the stochastic MDP into a deterministic one is presented which captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously does policy search against it. Bounds on the error of this approximation are proven, and experimental results in a bicycle riding domain are presented. The algorithm learns near optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. All code used in the experiments is available on the project's web site.

This work was funded by DARPA as part of the "Natural Tasking of Robots Based on Human Interaction Cues" project under contract number DABT C-1010.
1 Introduction

There is currently much interest in the problem of learning in stochastic Markov decision processes (MDPs) with continuous state and action spaces [2, 9, 10]. For such domains, especially when the state or action spaces are of high dimension, the value and Q-functions may be quite complicated and difficult to approximate. However, there may be relatively simple policies which perform well. This has led to recent interest in policy search algorithms, in which the reinforcement signal is used to modify the policy directly [5, 6, 10].

For many problems, a positive reward is only achieved at the end of a task if the agent reaches a goal state. For complex problems, the probability that an initial, random policy would reach such a state could be vanishingly small. A widely used methodology to overcome this is shaping [1, 3, 4, 8]. Shaping is the introduction of small rewards to reward partial progress toward the goal. A shaping function eases the problem of backing up rewards, since actions are rewarded or punished sooner.

When a policy changes, estimating the resulting change in value can be difficult, requiring the new policy to interact with the MDP for many episodes. In this paper we introduce a method of transforming a stochastic MDP into a deterministic one. Under certain conditions on the original MDP, and given a shaping reward of the proper form, the deterministic MDP can be used to estimate the value of any policy with respect to the original MDP. This leads to an online algorithm for policy search: simultaneously estimate the parameters of a model for the transformed, deterministic MDP, and use this model to estimate both the value of a policy and the gradient of that value with respect to the policy parameters. Then, using these estimates, perform gradient descent search on the policy parameters. Since the transformation captures what is important about the original MDP for planning, we call our method the essential dynamics algorithm.

The next section gives an overview of the technique, developing the intuitions behind it. In section 3 we describe the mathematical foundations of the algorithm, including bounds on the difference between values in the original and transformed MDPs.
Section 4 describes an application of this technique to learning to ride a bicycle. The last section discusses these results, comparing them to previous work. On the bicycle riding task, given the simulator, the only domain knowledge needed is a shaping reward that decreases as lean angle increases, and as angle to goal increases. Compared to previous work on this problem, a near optimal policy is found in dramatically less simulated time, and with less domain knowledge.

2 Overview of the Essential Dynamics Algorithm

In the essential dynamics algorithm we learn a model of how state evolves with time, and then use this model to compute the value of the current policy. In addition, if the policy and model are from a parameterized family, we can compute the gradient of the value with respect to the parameters.

In putting this plan into practice, one difficulty is that state transitions are stochastic, so that expected rewards must be computed. One way to compute them is to generate many trajectories and average over them, but this can be very time consuming. Instead we might be tempted to estimate only the mean of the state at each future time, and use the reward associated with that. However, we can do better. If the reward is quadratic, the expected reward is particularly simple. Given knowledge of the state at time $t$, we can then talk about the distribution of possible states at some later time. For a given distribution of states $p(s)$, let $\bar{s}$ denote the expected state. Then

$$E[r(s)] = \int \left( a (s - \bar{s})^2 + b (s - \bar{s}) + c \right) p(s)\, ds = a\,\mathrm{var}(s) + b(\bar{s} - \bar{s}) + c = a\,\mathrm{var}(s) + c \qquad (1)$$

where $a$, $b$ and $c$ depend on $\bar{s}$.
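As a quick numerical check of Eq. (1), the following Python sketch averages a quadratic reward over a handful of sample states and compares the result with $a\,\mathrm{var}(s) + c$. The function names and the sample values are illustrative assumptions, not part of the memo. The identity is exact here because the sample mean and the population variance of the same samples are used on both sides.

```python
def quadratic_reward(s, s_bar, a, b, c):
    """Reward written as a quadratic in the deviation from the mean state."""
    d = s - s_bar
    return a * d * d + b * d + c


def mean(values):
    return sum(values) / len(values)


def population_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def expected_reward_by_averaging(samples, a, b, c):
    """Left-hand side of Eq. (1): average the reward over the sample states."""
    s_bar = mean(samples)
    return mean([quadratic_reward(s, s_bar, a, b, c) for s in samples])


def expected_reward_closed_form(samples, a, c):
    """Right-hand side of Eq. (1): only the variance and c are needed."""
    return a * population_variance(samples) + c


if __name__ == "__main__":
    states = [0.0, 1.0, 2.0, 3.0]  # hypothetical sample of future states
    print(expected_reward_by_averaging(states, a=-2.0, b=5.0, c=1.0))
    print(expected_reward_closed_form(states, a=-2.0, c=1.0))
```

The linear coefficient $b$ drops out of the average, which is why the model only needs to track the mean and the variance of the state.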
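Suppose the policy depends on a vector of parameters $\theta$. When interacting with the MDP, at every time $t$, after having taken action $a_{t-1}$ in state $s_{t-1}$ and arriving in state $s_t$:

1. $\tilde{\mu}(s_{t-1}, a_{t-1}) \to s_t$
2. $\tilde{\nu}(s_{t-1}, a_{t-1}) \to (s_t - \tilde{\mu}(s_{t-1}, a_{t-1}))^2$
3. $\tilde{s}_t = s_t$
4. $\tilde{\sigma}_t = 0$
5. $\tilde{V} = 0$
6. For every $\tau$ in $t+1 \ldots t+n$:
   a. $\tilde{s}_\tau = \tilde{\mu}(\tilde{s}_{\tau-1}, \pi(\tilde{s}_{\tau-1}))$
   b. $\tilde{\sigma}_\tau = \tilde{\nu}(\tilde{s}_{\tau-1}, \pi(\tilde{s}_{\tau-1})) + \tilde{\sigma}_{\tau-1} \left( \tilde{\mu}'_\pi(\tilde{s}_{\tau-1}) \right)^2$
   c. $\tilde{r}_\tau = r(\tilde{s}_\tau) + \frac{1}{2} r''(\tilde{s}_\tau)\, \tilde{\sigma}_\tau$
   d. $\tilde{V} = \tilde{V} + \gamma^{\tau - t}\, \tilde{r}_\tau$
7. Update the policy in the direction that increases $\tilde{V}$: $\theta = \theta + \alpha \frac{\partial \tilde{V}}{\partial \theta}$

Figure 1: The essential dynamics algorithm for a one dimensional state space. The notation $f(x) \to a$ means adjust the parameters that determine $f$ to make $f(x)$ closer to $a$, e.g. by gradient descent. $\tilde{\mu}'_\pi$ is the derivative of $\tilde{\mu}(s, \pi(s))$ with respect to $s$.

Thus, to calculate the expected reward, we don't need to know the full state distribution, but simply its mean and variance. Our model should therefore describe how the mean and variance evolve over time. If the state transitions are smooth, they can be approximated by a Taylor series. Let $\pi$ be the current policy, and let $\mu_\pi(s)$ denote the expected state that results from taking action $\pi(s)$ in state $s$. If $\bar{s}_t$ denotes the mean state at time $t$, and $\sigma_t$ the variance, and if state transitions were deterministic, then to first order we would have

$$\bar{s}_{t+1} \approx \mu_\pi(\bar{s}_t), \qquad \sigma_{t+1} \approx \left( \frac{d\mu_\pi}{ds}(\bar{s}_t) \right)^2 \sigma_t$$

where $\mu_\pi'$ is the derivative of $\mu_\pi$ with respect to state. For stochastic state transitions, let $\nu_\pi(s)$ be the variance of the state that results from taking action $\pi(s)$ in state $s$. It turns out that the variance at the next time step is simply $\nu_\pi(s)$ plus the transformed variance from above, leading to

$$\bar{s}_{t+1} \approx \mu_\pi(\bar{s}_t), \qquad \sigma_{t+1} \approx \nu_\pi(\bar{s}_t) + \left( \frac{d\mu_\pi}{ds}(\bar{s}_t) \right)^2 \sigma_t \qquad (2)$$

Thus, we learn estimates $\tilde{\mu}$ and $\tilde{\nu}$ of $\mu$ and $\nu$ respectively, use Eq. (2) to estimate the mean and variance of future states, and Eq. (1) to calculate the expected reward. The resulting algorithm, which we call the essential dynamics algorithm, is presented in Figure 1.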
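The value estimate in step 6 of Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the memo's released code: the model functions `mu` and `nu`, the `policy` and the `reward` are passed in as plain callables, derivatives are taken by central finite differences rather than analytically, and all names are hypothetical. The model-fitting steps 1 and 2 and the policy update of step 7 are left out.

```python
def central_diff(f, x, h=1e-4):
    """First derivative of a scalar function by central differences."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_diff(f, x, h=1e-3):
    """Second derivative of a scalar function by central differences."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def estimate_value(s_t, mu, nu, policy, reward, horizon, gamma):
    """Propagate mean and variance for `horizon` steps and sum discounted rewards.

    mu(s, a): estimated mean of the next state
    nu(s, a): estimated variance of the next state
    """
    def closed_loop(s):
        # Next mean state when the policy chooses the action.
        return mu(s, policy(s))

    s, var, value = s_t, 0.0, 0.0  # the observed state has zero variance
    for k in range(1, horizon + 1):
        slope = central_diff(closed_loop, s)
        # Step 6b uses the previous mean state, so update the variance first.
        var = nu(s, policy(s)) + var * slope ** 2
        s = closed_loop(s)  # step 6a
        r = reward(s) + 0.5 * second_diff(reward, s) * var  # step 6c
        value += gamma ** k * r  # step 6d
    return value


if __name__ == "__main__":
    # Hypothetical linear system with constant noise and a quadratic reward.
    v = estimate_value(
        s_t=1.0,
        mu=lambda s, a: 0.5 * s + a,
        nu=lambda s, a: 0.1,
        policy=lambda s: 0.0,
        reward=lambda s: -s * s,
        horizon=2,
        gamma=1.0,
    )
    print(v)
```

For linear dynamics and a quadratic reward, as in this toy case, the propagated mean and variance are exact, so the sketch reproduces the true expected reward at each step. A gradient with respect to policy parameters could be obtained by differentiating this function, for example with finite differences or an automatic differentiation library.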
The next section gives a formal derivation of the algorithm, and proves error bounds on the estimated state, variance, reward and value for the general n-dimensional case, where the reward is only approximately quadratic.

3 Derivation of the Essential Dynamics Algorithm

A Markov Decision Process (MDP) is a tuple $\langle S, D, A, P_{s,a}, r, \gamma \rangle$ where: $S$ is a set of states; $D: S \to \mathbb{R}$ is the initial-state distribution; $A$ is a set of actions; $P_{s,a}: S \to \mathbb{R}$ are the transition probabilities; $r: S \times A \to \mathbb{R}$ is the reward; and $\gamma$ is the discount factor. This paper is concerned with continuous state and action spaces; in particular we assume $S = \mathbb{R}^{n_s}$ and $A = \mathbb{R}^{n_a}$. We use subscripts to denote time and superscripts to denote components of vectors and matrices. Thus, $s_t^i$ denotes the $i$th component of the vector $s$ at time $t$.

A (deterministic) policy is a mapping from a state to the action to be taken in that state, $\pi: S \to A$. Given a policy and a distribution $P_t$ of states at time $t$, such as the initial state distribution or the observed state, the distribution of states at future times is defined by the recursive relation

$$P_{\tau+1}(s) = \int_S P_{s', \pi(s')}(s)\, P_\tau(s')\, ds' \quad \text{for } \tau \ge t.$$

Given such a distribution, we can define the expectation and the covariance matrix of a random vector $x$ with respect to it, which we denote $E_t[x]$ and $\mathrm{cov}_t(x)$ respectively. Thus, $E_t[x] = \int_S x\, P_t(x)\, dx$ and $\mathrm{cov}_t^{i,j}(x) = E_t[(x^i - E_t[x^i])(x^j - E_t[x^j])]$. When $P_t$ is zero except for a single state $s_t$, we introduce $E[x \mid s_t]$ as a synonym for $E_t[x]$ which makes the distribution explicit.

Given an MDP, we define the limited horizon value function for a given policy as

$$V^\pi(s_t) = \sum_{\tau = t}^{t+n} \gamma^{\tau - t} E_\tau[r(s_\tau, \pi(s_\tau))]$$

where the probability density at time $t$ is zero except for state $s_t$. Also given a policy, we define two functions, the mean $\mu_\pi(s)$ and covariance matrix $\nu_\pi(s)$ of the next state. Thus, $\mu_\pi(s_t) = E[s_{t+1} \mid s_t]$ and $\nu_\pi(s_t) = E[(s_{t+1} - \mu_\pi(s_t))(s_{t+1} - \mu_\pi(s_t))^T \mid s_t]$.

In policy search, we have a fixed set of policies $\Pi$ and we try to find one that results in a value function with high value. We transform the stochastic MDP $M$ to a deterministic one $M' = \langle S', s_0', A', f', r', \gamma' \rangle$ as follows.
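A state in the new MDP is an ordered pair consisting of a state from $S$ and a covariance matrix, denoted $(s, \Sigma)$. The new initial state is $s_0' = (E_D[s], \mathrm{cov}_D[s])$. The new action space is the set of all possible policies for $M$, that is $A' = \{ \pi \mid \pi: S \to A \}$. The state transition probabilities are replaced with a (deterministic) state transition function $f'(s_t', a_t')$, which gives the unique successor state that results from taking action $a_t' = \pi$ in state $s_t' = (s_t, \Sigma_t)$. We set

$$f'(s_t', a_t') = f'(s_t, \Sigma_t, \pi) = \left( \mu_\pi(s_t),\; \nu_\pi(s_t) + (\nabla\mu_\pi)\, \Sigma_t\, (\nabla\mu_\pi)^T \right).$$

The reward is $r'(s, \Sigma, \pi) = r(s) + \frac{1}{2} \mathrm{tr}(\nabla^2 r(s)\, \Sigma)$, where $\nabla^2 r(s)$, with elements $\partial^2 r(s) / \partial s^i \partial s^j$, denotes the matrix of second derivatives of $r$ with respect to each state variable. Finally, $\gamma' = \gamma$.

The strength of the method comes from the theorems below, which state that the above transformation approximately captures the dynamics of the original probabilistic MDP to the extent that the original dynamics are smooth. The first theorem bounds the error in approximating state, the second in covariance, the third in reward and the fourth in value.

Theorem 1. Fix a time $t$, a policy $\pi$, and a distribution of states $P_t$. Choose $M_\mu$ and $M$ such that $\left\| \partial^2 \mu_\pi^i(s) / \partial s^j \partial s^k \right\|_F < M_\mu$ (the norm being taken over all $i, j, k = 1, \ldots, n_s$), $\| \nabla\mu_\pi(\tilde{s}_t) \|_F < M$ and $\| \mathrm{cov}_t(s_t, s_t) \|_F < M$, where $\| \cdot \|_F$ denotes the Frobenius norm. Let $\tilde{s}_t$ be given, and define $\tilde{s}_{t+1} = \mu_\pi(\tilde{s}_t)$, $\varepsilon_t^s = E_t[s_t] - \tilde{s}_t$ and $\varepsilon_{t+1}^s = E_t[s_{t+1}] - \tilde{s}_{t+1}$. Then

$$\| \varepsilon_{t+1}^s \| < \left( \| \varepsilon_t^s \| + M_\mu \right) \left( \tfrac{3}{2} M + \tfrac{1}{2} \| \varepsilon_t^s \|^2 \right).$$

Theorem 2. Suppose $M_\nu$ and $M$ are chosen so that $\| \nabla\nu^{i,j}(s) \| < M_\nu$, $\left\| E_t[(s_t - E_t[s_t])^k] \right\|_F < M$ for $k = 1, 2, 3, 4$, $\| \tilde{s}_{t+1} \| = \| \mu_\pi(\tilde{s}_t) \| < M$, and all the conditions of Theorem 1 hold. Let $\tilde{\Sigma}_t$ be given, and define $\tilde{\Sigma}_{t+1}^{i,j} = \nu^{i,j}(\tilde{s}_t) + (\nabla\mu^i(\tilde{s}_t))^T\, \tilde{\Sigma}_t\, (\nabla\mu^j(\tilde{s}_t))$. Let $\varepsilon_t^\Sigma = \mathrm{cov}_t(s_t, s_t) - \tilde{\Sigma}_t$, and similarly for $\varepsilon_{t+1}^\Sigma$. Then

$$\| \varepsilon_{t+1}^\Sigma \|_F < \left( \| \varepsilon_t^\Sigma \|_F + \| \varepsilon_t^s \| + M_\mu + M_\nu \right) M^2 \left( 10 + O(\| \varepsilon_t^s \|) \right).$$

Theorem 3. Suppose $\left\| \partial^3 r(s) / \partial s^i \partial s^j \partial s^k \right\|_F < M_r$ (over all $i, j, k = 1, \ldots, n_s$), $\left\| \partial^2 r(\tilde{s}_t) / \partial s^i \partial s^j \right\|_F < M$ (over all $i, j = 1, \ldots, n_s$), $\| \nabla r(\tilde{s}_t) \| < M$, and the conditions of the previous two theorems hold. Let $\varepsilon_t^r = E_t[r(s_t)] - r'(\tilde{s}_t)$. Then

$$E_t[r(s_t)] = r'(\tilde{s}_t) + \varepsilon_t^r = r(\tilde{s}_t) + \tfrac{1}{2} \mathrm{tr}(\nabla^2 r\, \tilde{\Sigma}_t) + \varepsilon_t^r$$

where

$$| \varepsilon_t^r | < \left( \| \varepsilon_t^\Sigma \|_F + \| \varepsilon_t^s \| + M_r \right) \left( 2M + O(\| \varepsilon_t^s \|) \right).$$

Theorem 4. Fix a time $t$, a policy $\pi$, and a distribution of states $P_t$. Let $\tilde{s}_t$ and $\tilde{\Sigma}_t$ be given, and define $\tilde{s}_\tau$ and $\tilde{\Sigma}_\tau$ for $\tau = t+1 \ldots t+n$ recursively as in Theorems 1 and 2 above. Let $M_{\varepsilon r}$ be an upper bound for $| \varepsilon_\tau^r |$ for all $\tau \in [t, t+n]$. Then under the conditions of the above three theorems, $E[V(s_t)] = V'(\tilde{s}_t) + \varepsilon_t^V$ where

$$| \varepsilon_t^V | < M_{\varepsilon r} \frac{1 - \gamma^{n+1}}{1 - \gamma}.$$

Proof: First, some preliminaries. In the first three theorems, which deal only with a single transition and a single distribution of states at time $t$, namely $P_t$, let $\bar{x} = E_{P_t}[x]$ for any random variable $x$. Note that for any vector $x$ and square matrices $A$ and $B$, $x^T A x = \mathrm{tr}(A (x x^T))$ where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $\mathrm{tr}(AB) \le \| A \|_F \| B \|_F$, and $\| x x^T \|_F = \| x \|^2$. In the statement of Theorem 2, $E_t[(s_t - \bar{s}_t)^3]$ is a three dimensional matrix whose $i, j, k$ element is $E_t[(s_t^i - \bar{s}_t^i)(s_t^j - \bar{s}_t^j)(s_t^k - \bar{s}_t^k)]$. Similarly, $E_t[(s_t - \bar{s}_t)^4]$ is a four dimensional matrix, and if all of its elements are finite, then the lower powers must also be finite. The Frobenius norm of such matrices is simply the square root of the sum of the squares of all their elements.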
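Also, if $a$, $b$, $c$ and $d$ are real numbers that are greater than zero, then $ab + cd < (a + c)(b + d)$. Note that, since $\mu_\pi$ is a vector valued function, $\nabla\mu_\pi(s)$ is a matrix. Since $\mu_\pi^i$, the $i$th component of $\mu_\pi$, is a real valued function, $\nabla\mu_\pi^i(s) \in \mathbb{R}^{n_s}$. Because $\nu(s)$ is a matrix, $\nabla\nu^{i,j}(s) \in \mathbb{R}^{n_s}$. Let $H^i(x)$, with elements $\partial^2 \mu_\pi^i(x) / \partial s^j \partial s^k$, denote the matrix of second partial derivatives of $\mu_\pi^i$, evaluated at $x$. For any $s$, let $\delta_1 = s - \bar{s}_t$, $\delta_2 = \bar{s}_t - \tilde{s}_t$ and $\delta = \delta_1 + \delta_2 = s - \tilde{s}_t$. Thus, $E_{P_t}[\delta] = \delta_2$ and

$$E_t[\delta \delta^T] = E_t[\delta_1 \delta_1^T] + \delta_2 \delta_2^T = \Sigma_t + \delta_2 \delta_2^T = \tilde{\Sigma}_t + \varepsilon_t^\Sigma + \delta_2 \delta_2^T.$$

Note that $\delta_2 = \varepsilon_t^s$.

Proof of Theorem 1: Expand $\mu_\pi^i(s)$ using a first order Taylor series with the Lagrange form of the remainder, namely

$$\mu_\pi^i(s) = \mu_\pi^i(\tilde{s}_t) + \nabla\mu_\pi^i(\tilde{s}_t)^T (s - \tilde{s}_t) + \tfrac{1}{2} (s - \tilde{s}_t)^T H^i(x) (s - \tilde{s}_t) \qquad (3)$$

i.e.

$$\mu_\pi^i(s) = \mu_\pi^i(\tilde{s}_t) + \nabla\mu_\pi^i(\tilde{s}_t)^T \delta + \tfrac{1}{2} \delta^T H^i(x)\, \delta \qquad (4)$$

for some $x$ on the line joining $s$ and $\tilde{s}_t$. Then

$$E_{P_t}[s_{t+1}^i] - \tilde{s}_{t+1}^i = E_{P_t}[\mu_\pi^i(s_t)] - \mu_\pi^i(\tilde{s}_t) = \nabla\mu_\pi^i(\tilde{s}_t)^T \delta_2 + \tfrac{1}{2} \mathrm{tr}\left( H^i(x) (\Sigma_t + \delta_2 \delta_2^T) \right).$$

So

$$\| \varepsilon_{t+1}^s \| < M \| \varepsilon_t^s \| + \tfrac{1}{2} M_\mu \left( M + \| \varepsilon_t^s \|^2 \right) < \left( \| \varepsilon_t^s \| + M_\mu \right) \left( M + \tfrac{1}{2} ( M + \| \varepsilon_t^s \|^2 ) \right).$$

Proof of Theorem 2: Let $M_k' = \left\| E_t[(s_t - \tilde{s}_t)^k] \right\|_F$. By the mean value theorem, $\nu^{i,j}(s) = \nu^{i,j}(\tilde{s}_t) + \nabla\nu^{i,j}(x)^T \delta$ for some $x$ on the line joining $s$ and $\tilde{s}_t$. Also, $\nu^{i,j}(s_t) = E[s_{t+1}^i s_{t+1}^j \mid s_t] - \mu^i(s_t)\mu^j(s_t)$, so that

$$\mathrm{cov}_{P_t}(s_{t+1}^i, s_{t+1}^j) = E[s_{t+1}^i s_{t+1}^j] - \bar{s}_{t+1}^i \bar{s}_{t+1}^j = E_{P_t}\left[ E[s_{t+1}^i s_{t+1}^j \mid s_t] \right] - \bar{s}_{t+1}^i \bar{s}_{t+1}^j$$
$$= \nu^{i,j}(\tilde{s}_t) + E_{P_t}[\nabla\nu^{i,j}(x)^T \delta] + E_{P_t}[\mu^i(s_t)\mu^j(s_t)] - \bar{s}_{t+1}^i \bar{s}_{t+1}^j. \qquad (5)$$

The second term is an error term, call it $\varepsilon'^{\,i,j}$. We have $| \varepsilon' | < M_\nu M_1'$. For the third term, we expand both $\mu^i$ and $\mu^j$ using Eq. (4) and multiply out the terms, obtaining

$$E_{P_t}[\mu^i(s_t)\mu^j(s_t)] = \mu^i(\tilde{s}_t)\mu^j(\tilde{s}_t) + \mu^i(\tilde{s}_t) \nabla\mu^j(\tilde{s}_t)^T \delta_2 + \mu^j(\tilde{s}_t) \nabla\mu^i(\tilde{s}_t)^T \delta_2 + \nabla\mu^i(\tilde{s}_t)^T (\tilde{\Sigma}_t + \varepsilon_t^\Sigma + \delta_2 \delta_2^T) \nabla\mu^j(\tilde{s}_t)$$
$$+ \tfrac{1}{2} \mu^i(\tilde{s}_t) E_{P_t}[\delta^T H^j(x) \delta] + \tfrac{1}{2} \mu^j(\tilde{s}_t) E_{P_t}[\delta^T H^i(x) \delta] + \tfrac{1}{2} \nabla\mu^i(\tilde{s}_t)^T E_{P_t}[\delta\, \delta^T H^j(x) \delta] + \tfrac{1}{2} \nabla\mu^j(\tilde{s}_t)^T E_{P_t}[\delta\, \delta^T H^i(x) \delta] + \tfrac{1}{4} E_{P_t}[\delta^T H^i(x) \delta\; \delta^T H^j(x) \delta].$$

All terms other than the first and the one involving $\tilde{\Sigma}_t$ are error terms; call their sum $\varepsilon''^{\,i,j}$. That is,

$$E_{P_t}[\mu^i(s_t)\mu^j(s_t)] = \mu^i(\tilde{s}_t)\mu^j(\tilde{s}_t) + \nabla\mu^i(\tilde{s}_t)^T \tilde{\Sigma}_t \nabla\mu^j(\tilde{s}_t) + \varepsilon''^{\,i,j}$$

where

$$| \varepsilon'' | < 2 \| \tilde{s}_{t+1} \| \| \nabla\mu(\tilde{s}_t) \| \| \varepsilon_t^s \| + \| \nabla\mu(\tilde{s}_t) \|^2 \left( \| \varepsilon_t^s \|^2 + \| \varepsilon_t^\Sigma \|_F \right) + \| \tilde{s}_{t+1} \| M_\mu M_2' + \| \nabla\mu(\tilde{s}_t) \| M_\mu M_3' + \tfrac{1}{4} M_\mu^2 M_4'.$$

Lastly let $\varepsilon'''^{\,i,j} = \mu^i(\tilde{s}_t)\mu^j(\tilde{s}_t) - \bar{s}_{t+1}^i \bar{s}_{t+1}^j$. Then

$$\varepsilon'''^{\,i,j} = \mu^i(\tilde{s}_t)\mu^j(\tilde{s}_t) - (\mu^i(\tilde{s}_t) + \varepsilon_{t+1}^{s,i})(\mu^j(\tilde{s}_t) + \varepsilon_{t+1}^{s,j}) = -\mu^i(\tilde{s}_t)\varepsilon_{t+1}^{s,j} - \mu^j(\tilde{s}_t)\varepsilon_{t+1}^{s,i} - \varepsilon_{t+1}^{s,i}\varepsilon_{t+1}^{s,j}$$

so that, by Theorem 1, $| \varepsilon''' | < 2 \| \tilde{s}_{t+1} \| \| \varepsilon_{t+1}^s \| + \| \varepsilon_{t+1}^s \|^2$. Substituting into Eq. (5), we obtain:

$$\mathrm{cov}_{P_t}(s_{t+1}^i, s_{t+1}^j) = \nu^{i,j}(\tilde{s}_t) + \varepsilon'^{\,i,j} + \nabla\mu^i(\tilde{s}_t)^T \tilde{\Sigma}_t \nabla\mu^j(\tilde{s}_t) + \varepsilon''^{\,i,j} + \varepsilon'''^{\,i,j}.$$

Thus, $\varepsilon_{t+1}^\Sigma = \varepsilon' + \varepsilon'' + \varepsilon'''$ and

$$\| \varepsilon_{t+1}^\Sigma \|_F < M_\nu M_1' + 2M^2 \| \varepsilon_t^s \| + M^2 \left( \| \varepsilon_t^s \|^2 + \| \varepsilon_t^\Sigma \|_F \right) + M M_\mu M_2' + M M_\mu M_3' + \tfrac{1}{4} M_\mu^2 M_4'$$
$$+ 2M \left( \| \varepsilon_t^s \| + M_\mu \right) \left( \tfrac{3}{2} M + \tfrac{1}{2} \| \varepsilon_t^s \|^2 \right) + \left( \| \varepsilon_t^s \| + M_\mu \right)^2 \left( \tfrac{3}{2} M + \tfrac{1}{2} \| \varepsilon_t^s \|^2 \right)^2.$$

Each term has at least one of the small bounds $\| \varepsilon_t^s \|$, $\| \varepsilon_t^\Sigma \|_F$, $M_\mu$ or $M_\nu$. Using the inequality from the preliminaries, we can factor them out. The four $M_k'$ are bounded by $M + O(\| \varepsilon_t^s \|)$, as can be shown using the binomial theorem, e.g.

$$E_t[\delta^3] = E_t[(\delta_1 + \delta_2)^3] = E_t[\delta_1^3] + 3 E_t[\delta_1^2]\, \delta_2 + 3 E_t[\delta_1]\, \delta_2^2 + \delta_2^3 = E_t[\delta_1^3] + O(\| \varepsilon_t^s \|).$$

Proof of Theorem 3: Expand $r(s)$ using a second order Taylor series with the Lagrange form of the remainder, namely

$$r(s) = r(\tilde{s}_t) + \nabla r(\tilde{s}_t)^T \delta + \tfrac{1}{2} \delta^T \nabla^2 r(\tilde{s}_t)\, \delta + \tfrac{1}{6} \sum_{i,j,k=1}^{n_s} \frac{\partial^3 r(x)}{\partial s^i \partial s^j \partial s^k}\, \delta^i \delta^j \delta^k. \qquad (6)$$

Call the last term $\varepsilon'$. Thus,

$$E_t[r(s)] = r(\tilde{s}_t) + \nabla r(\tilde{s}_t)^T \delta_2 + \tfrac{1}{2} \mathrm{tr}\left( \nabla^2 r(\tilde{s}_t) (\tilde{\Sigma}_t + \varepsilon_t^\Sigma + \delta_2 \delta_2^T) \right) + E_t[\varepsilon'] = r'(\tilde{s}_t) + \varepsilon_t^r$$

and

$$| \varepsilon_t^r | < \| \nabla r(\tilde{s}_t) \| \| \varepsilon_t^s \| + \tfrac{1}{2} \left( \| \varepsilon_t^\Sigma \|_F + \| \varepsilon_t^s \|^2 \right) M + \tfrac{1}{6} M_r M_3' < \left( \| \varepsilon_t^s \| + \| \varepsilon_t^\Sigma \|_F + M_r \right) \left( M + \tfrac{1}{2} M + \tfrac{1}{2} M \| \varepsilon_t^s \| + \tfrac{1}{6} M_3' \right).$$

Proof of Theorem 4: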
$$E[V(s_t)] = \sum_{\tau = t}^{t+n} \gamma^{\tau - t} E_\tau[r(s_\tau)] = \sum_{\tau = t}^{t+n} \gamma^{\tau - t} \left( r'(\tilde{s}_\tau) + \varepsilon_\tau^r \right) = V'(\tilde{s}_t) + \sum_{\tau = t}^{t+n} \gamma^{\tau - t} \varepsilon_\tau^r.$$

So,

$$| \varepsilon_t^V | = \left| \sum_{\tau = t}^{t+n} \gamma^{\tau - t} \varepsilon_\tau^r \right| < M_{\varepsilon r} \sum_{\tau = t}^{t+n} \gamma^{\tau - t} = M_{\varepsilon r} \frac{1 - \gamma^{n+1}}{1 - \gamma}.$$

The above theorems state that as long as $\| \varepsilon_t^s \|$, $\| \varepsilon_t^\Sigma \|_F$, $M_\mu$, $M_\nu$ and $M_r$ are small and $M$ is finite, and given a good estimate of the mean and covariance of the state at some time, the transformed MDP will result in good estimates at later times, and hence the reward and value functions will also be good estimates. Note that no particular distribution of states is assumed, only that, essentially, the first four moments are bounded at every time. The most unusual conditions are that the reward $r$ be roughly quadratic, and that the value function include only a limited number of future rewards. This motivates the use of shaping rewards.

4 Experiments

The code used for all experiments in this paper is available from martin/research.html.

The essential dynamics algorithm was applied to Randløv and Alstrøm's bicycle riding task [8], with the objective of riding a bicycle to a goal 1km away. The five state variables were simply the lean angle, the handlebar angle, their time derivatives, and the angle to the goal. The two actions were the torque to apply to the handlebars and the horizontal displacement of the rider's center of mass from the bicycle's center line. The stochasticity of state transitions came from a uniform random number added to the rider's displacement. If the lean angle exceeded $\pi/15$, the bicycle fell over and the run terminated.

If the variance of the state is not too large at every time step, then the variance term in the transformed reward can simply be considered another form of error, and only $\tilde{\mu}$ need be estimated. This was done here. A continuous time formulation was used where, instead of estimating the value of the state variables at a next time, their derivatives were estimated. The model was of the form $\tilde{\mu}_w(s, a) = w \cdot \phi(s, a)$, where $\phi(s, a)$ was a vector of features and $w$ was a vector of weights. The features were simply the state and action variables themselves. The derivative of each state variable was estimated using gradient descent on $w$ with the error measure $\mathrm{err} = | \dot{s} - w \cdot \phi(s, a) |$ and a learning rate of 1.0.

This error measure was found to work better than the more traditional squared error. The squared error is minimized by the mean of the observed values, whereas the absolute value is minimized by the median [7]. The median is a more robust estimate of central tendency, i.e. less susceptible to outliers, and therefore may be a better choice in many practical situations. Model estimation was done online, simultaneous with policy search.

In the continuous formulation, the value function is the time integral of the reward times the discount factor. The future state was estimated using Euler integration [7]. While the bicycle simulator also used Euler integration, these choices were unrelated. In fact, $\Delta t = 0.01$ for the bicycle simulator and $\Delta t = \ldots$ for integrating the estimated reward. It was integrated for 30 time steps.
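The model estimation described above can be illustrated with a short Python sketch. It is an assumption-laden toy, not the memo's code: it fits a linear model $w \cdot \phi$ by stepping along the negative gradient of the absolute error, which moves the weights by the sign of the residual times the features, and it integrates a derivative model with Euler steps. All function names, feature values and step sizes are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def predict(w, phi):
    """Linear model output, the dot product of weights and features."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def abs_error_step(w, phi, target, learning_rate):
    """One gradient descent step on err = |target - w . phi|.

    The gradient of the absolute error with respect to w is
    -sign(target - w . phi) * phi, so the update adds the signed features.
    """
    direction = sign(target - predict(w, phi))
    return [wi + learning_rate * direction * pi for wi, pi in zip(w, phi)]


def euler_rollout(s0, derivative, dt, steps):
    """Integrate ds/dt = derivative(s) with fixed-step Euler integration."""
    trajectory = [s0]
    s = s0
    for _ in range(steps):
        s = s + dt * derivative(s)
        trajectory.append(s)
    return trajectory


if __name__ == "__main__":
    w = abs_error_step([0.0, 0.0], phi=[1.0, 2.0], target=3.0, learning_rate=0.1)
    print(w)
    print(euler_rollout(1.0, lambda s: -s, dt=0.1, steps=2))
```

Because the update depends only on the sign of the residual, a single outlying observation moves the weights no further than any other sample, which is one way to see the robustness argument made for the median.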
[Figure 2: The left graph shows length of episode (sec) vs. training time (simulated sec) for 10 runs. The dashed line indicates the optimal policy. Stable riding was achieved within 200 simulated seconds. The right graph shows angle to goal (radians) vs. time (sec) for a single episode starting after 3000 simulated seconds of training.]

The shaping reward was the square of the angle to goal plus 10 times the square of the lean angle. The policy was a weighted sum of features, with a small Gaussian added for exploration, $\pi(s) = \theta \cdot \phi(s) + N(0, 0.05)$. The features were simply the state variables themselves.

When the model is poor or the policy parameters are far from a local optimum, $\partial \tilde{V} / \partial \theta$ can be quite large, resulting in a large gradient descent step which may overshoot its region of applicability. This can be addressed by reducing the learning rate, but then learning becomes interminably slow. Thus, the gradient descent rule was modified to

$$\Delta\theta = \alpha \frac{\partial \tilde{V} / \partial \theta}{\beta + \| \partial \tilde{V} / \partial \theta \|}.$$

Near an optimum, when $\| \partial \tilde{V} / \partial \theta \| \ll \beta$, this reduces to the usual rule with a learning rate of $\alpha / \beta$. In this experiment, $\alpha = 0.01$ and $\beta = 1.0$.

A graph of episode time vs. learning time is shown in Figure 2. After falling over between 40 and 60 times, the controller was able to ride to the goal or the time limit without falling over. After a single such episode, it consistently rode directly to the goal in a near minimum amount of time. The resulting policy was essentially an optimal policy.

5 Discussion

For learning and planning in complex worlds with continuous, high dimensional state and action spaces, the goal is not so much to converge on a perfect solution, but to find a good solution within a reasonable time. Such problems often use a shaping reward to accelerate learning. For a large class of such problems, this paper proposes approximating the problem's dynamics in such a way that the mean and covariance of the future state can be estimated from the observed current state. We have shown that, under certain conditions, the rewards in the approximate MDP are close to those in the original, with an error that grows boundedly as time increases.
Thus, if the rewards are only summed for a limited number of steps ahead, the resulting value will approximate the value of the original system. Learning in this transformed problem is considerably easier than in the original, and both model estimation and policy search can be achieved online.

The simulation of bicycle riding is a good example of a problem where the value function is complex and hard to approximate, yet simple policies produce near optimal solutions. Using a traditional value function approximation approach, Randløv needed to augment the state with the second derivative of the lean angle (Ω) and provide shaping rewards [8]. The resulting algorithm took 1700 episodes to ride stably, and 4200 episodes to get to the goal for the first time. The resulting policies tended to ride in circles and precess toward the goal, riding roughly 7km to get to a goal 1km away.

In contrast, when the action is a weighted sum of (very simple) features, random search can find near optimal policies. This was tested experimentally; 0.55% of random policies consistently reached the goal when Ω was included in the state, and 0.30% did when it wasn't.¹ What's more, over half of these policies had a path length within 1% of the best reported solution. Policies that rode stably but not to the goal were obtained 0.89% and 0.4% of the time respectively. Thus, a random search of policies needs only a few hundred episodes to find a near optimal policy.

The essential dynamics algorithm consistently finds such near optimal policies, and the author is aware of only one other algorithm which does, the PEGASUS algorithm of [5]. The experiments in this paper took 40 to 60 episodes to ride stably, that is, to the goal or until the time limit without falling over. After a single such episode, the policy consistently rode directly to the goal in a near minimum amount of time. In contrast, PEGASUS used at least 450 episodes to evaluate each policy.² One reasonable initial policy is to always apply zero torque to the handlebars and zero displacement of body position. This falls over in an average of 1.74 seconds, so PEGASUS would need 780 simulated seconds to evaluate such a policy. The essential dynamics algorithm learns to ride stably in approximately 200 simulated seconds, and in the second 780 simulated seconds will have found a near optimal policy.

This was achieved using very little domain knowledge. Ω was not needed in the state, and the features were trivial. The essential dynamics algorithm can be used for online learning, or can learn from trajectories provided by other policies, that is, it can learn by watching. In the bicycle experiments, the essential dynamics algorithm needed many times more computing power per simulated second than PEGASUS, although it was still faster than real time on a 1GHz mobile Pentium III, and therefore could presumably be used for learning on a real bicycle.

The experiments in section 4 added the square of the lean angle to the shaping reward, but did not use any information about dynamics (i.e. velocities or accelerations), nor about the handlebars. In fact, the shaping reward simply corresponded to the common sense advice "stay upright and head toward the goal."

However, these advantages do not come without drawbacks.
The essential dynamics algorithm only does policy search in an approximation to the original MDP, so an optimal policy for this approximate MDP won't, in general, be optimal for the original MDP. The theorems in section 3 give bounds on this error, and for bicycle riding this error is small.

Conclusion

This paper has presented an algorithm for online policy search in MDPs with continuous state and action spaces. A stochastic MDP is transformed to a deterministic MDP which captures the essential dynamics of the original. Policy search can then be performed in this transformed MDP. Error bounds were given and the technique was applied to a simulation of bicycle riding. The algorithm found near optimal solutions with less domain knowledge and orders of magnitude less time than existing techniques.

Acknowledgements

The author would like to thank Leslie Kaelbling, Ali Rahimi and especially Kevin Murphy for enlightening comments and discussions of this work.

Footnotes

1. Our experiment contained two conditions, namely with or without Ω in the state, resulting in 5 or 6 state variables. The features were the state variables themselves, state and action variables were scaled to roughly the range [-1, +1], weights were chosen uniformly from [-2, +2], and each policy was run 30 times. In 100,000 policies per condition, 549 (0.55%) reached the goal all 30 times when Ω was included, and 300 (0.30%) when it wasn't. For such policies, the median riding distance was 1009m and 1008m respectively. The code used is available on the web site.

2. [5] evaluated a given policy by simulating it 30 times. The derivative with respect to each of the 15 weights was evaluated using finite differences, requiring another 30 simulations per weight, for a total of 15 × 30 = 450 simulations. Often, the starting weights at a given stage were evaluated during the previous stage, so only the derivatives need to be calculated.
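The cost figures quoted in the Discussion and in the footnotes can be rechecked with a few lines of Python. This is only an arithmetic sketch using the numbers stated in the text (15 weights, 30 simulations per finite-difference estimate, 1.74 seconds per episode for the zero-action policy, and the random search counts). The function names are hypothetical.

```python
def pegasus_evaluation_cost(num_weights, sims_per_estimate, seconds_per_episode):
    """Episodes and simulated seconds needed for one finite-difference gradient."""
    episodes = num_weights * sims_per_estimate
    return episodes, episodes * seconds_per_episode


def success_rate_percent(successes, trials):
    """Fraction of random policies that always reached the goal, in percent."""
    return 100.0 * successes / trials


if __name__ == "__main__":
    episodes, seconds = pegasus_evaluation_cost(15, 30, 1.74)
    print(episodes, seconds)                   # 450 episodes, about 783 seconds
    print(success_rate_percent(549, 100000))   # about 0.55 percent with Ω
    print(success_rate_percent(300, 100000))   # about 0.30 percent without Ω
```

The product of 450 episodes and 1.74 seconds is roughly 783 simulated seconds, which the text rounds to 780.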
References

[1] Colombetti, M. & Dorigo, M. (1994) Training agents to perform sequential behavior. In Adaptive Behavior, 2(3).
[2] Forbes, J. & Andre, D. (2000) Real-time reinforcement learning in continuous domains. In AAAI Spring Symposium on Real-Time Autonomous Systems.
[3] Mataric, M.J. (1994) Reward functions for accelerated learning. In W.W. Cohen and H. Hirsh (eds.) Proc. 11th Intl. Conf. on Machine Learning.
[4] Ng, A. et al. (1999) Policy invariance under reward transformations: Theory and application to reward shaping. In Proc. 16th Intl. Conf. on Machine Learning.
[5] Ng, A. & Jordan, M. (2000) PEGASUS: A policy search method for large MDPs and POMDPs. In Uncertainty in Artificial Intelligence (UAI), Proc. of the Sixteenth Conf.
[6] Peshkin, L. et al. (2000) Learning to Cooperate via Policy Search. In Uncertainty in Artificial Intelligence (UAI), Proc. of the Sixteenth Conf.
[7] Press, W.H. et al. (1992) Numerical Recipes: The Art of Scientific Computing. Cambridge University Press.
[8] Randløv, J. (2000) Shaping in Reinforcement Learning by Changing the Physics of the Problem. In Proc. Intl. Conf. on Machine Learning.
[9] Santamaría, J.C. et al. (1998) Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces. In Adaptive Behavior, 6(2).
[10] Strens, M. J. A. & Moore, A.W. (2002) Policy Search using Paired Comparisons. In Journal of Machine Learning Research, v. 3.
More informationarxiv: v1 [cs.gt] 15 Jan 2019
Model and algorthm for tme-content rk-aware Markov game Wenje Huang, Pham Vet Ha and Wllam B. Hakell January 16, 2019 arxv:1901.04882v1 [c.gt] 15 Jan 2019 Abtract In th paper, we propoe a model for non-cooperatve
More informationIntroduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:
CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and
More informationImprovements on Waring s Problem
Imrovement on Warng Problem L An-Png Bejng 85, PR Chna al@nacom Abtract By a new recurve algorthm for the auxlary equaton, n th aer, we wll gve ome mrovement for Warng roblem Keyword: Warng Problem, Hardy-Lttlewood
More informationSolution Methods for Time-indexed MIP Models for Chemical Production Scheduling
Ian Davd Lockhart Bogle and Mchael Farweather (Edtor), Proceedng of the 22nd European Sympoum on Computer Aded Proce Engneerng, 17-2 June 212, London. 212 Elever B.V. All rght reerved. Soluton Method for
More informationTeam. Outline. Statistics and Art: Sampling, Response Error, Mixed Models, Missing Data, and Inference
Team Stattc and Art: Samplng, Repone Error, Mxed Model, Mng Data, and nference Ed Stanek Unverty of Maachuett- Amhert, USA 9/5/8 9/5/8 Outlne. Example: Doe-repone Model n Toxcology. ow to Predct Realzed
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationNumerical Heat and Mass Transfer
Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationErratum: A Generalized Path Integral Control Approach to Reinforcement Learning
Journal of Machne Learnng Research 00-9 Submtted /0; Publshed 7/ Erratum: A Generalzed Path Integral Control Approach to Renforcement Learnng Evangelos ATheodorou Jonas Buchl Stefan Schaal Department of
More informationInformation Acquisition in Global Games of Regime Change (Online Appendix)
Informaton Acquton n Global Game of Regme Change (Onlne Appendx) Mchal Szkup and Iabel Trevno Augut 4, 05 Introducton Th appendx contan the proof of all the ntermedate reult that have been omtted from
More informationStatistical Properties of the OLS Coefficient Estimators. 1. Introduction
ECOOMICS 35* -- OTE 4 ECO 35* -- OTE 4 Stattcal Properte of the OLS Coeffcent Etmator Introducton We derved n ote the OLS (Ordnary Leat Square etmator ˆβ j (j, of the regreon coeffcent βj (j, n the mple
More informationMMA and GCMMA two methods for nonlinear optimization
MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons
More informationRoot Locus Techniques
Root Locu Technque ELEC 32 Cloed-Loop Control The control nput u t ynthezed baed on the a pror knowledge of the ytem plant, the reference nput r t, and the error gnal, e t The control ytem meaure the output,
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationChapter 7 Four-Wave Mixing phenomena
Chapter 7 Four-Wave Mx phenomena We wll dcu n th chapter the general nonlnear optcal procee wth four nteract electromagnetc wave n a NLO medum. Frt note that FWM procee are allowed n all meda (nveron or
More informationKinetic-Energy Density-Functional Theory on a Lattice
h an open acce artcle publhed under an ACS AuthorChoce Lcene, whch permt copyng and redtrbuton of the artcle or any adaptaton for non-commercal purpoe. Artcle Cte h: J. Chem. heory Comput. 08, 4, 407 4087
More information4DVAR, according to the name, is a four-dimensional variational method.
4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationA METHOD TO REPRESENT THE SEMANTIC DESCRIPTION OF A WEB SERVICE BASED ON COMPLEXITY FUNCTIONS
UPB Sc Bull, Sere A, Vol 77, I, 5 ISSN 3-77 A METHOD TO REPRESENT THE SEMANTIC DESCRIPTION OF A WEB SERVICE BASED ON COMPLEXITY FUNCTIONS Andre-Hora MOGOS, Adna Magda FLOREA Semantc web ervce repreent
More informationThis appendix presents the derivations and proofs omitted from the main text.
Onlne Appendx A Appendx: Omtted Dervaton and Proof Th appendx preent the dervaton and proof omtted from the man text A Omtted dervaton n Secton Mot of the analy provded n the man text Here, we formally
More informationIntroduction. Modeling Data. Approach. Quality of Fit. Likelihood. Probabilistic Approach
Introducton Modelng Data Gven a et of obervaton, we wh to ft a mathematcal model Model deend on adutable arameter traght lne: m + c n Polnomal: a + a + a + L+ a n Choce of model deend uon roblem Aroach
More informationPhysics 111. CQ1: springs. con t. Aristocrat at a fixed angle. Wednesday, 8-9 pm in NSC 118/119 Sunday, 6:30-8 pm in CCLIR 468.
c Announcement day, ober 8, 004 Ch 8: Ch 10: Work done by orce at an angle Power Rotatonal Knematc angular dplacement angular velocty angular acceleraton Wedneday, 8-9 pm n NSC 118/119 Sunday, 6:30-8 pm
More informationOne-sided finite-difference approximations suitable for use with Richardson extrapolation
Journal of Computatonal Physcs 219 (2006) 13 20 Short note One-sded fnte-dfference approxmatons sutable for use wth Rchardson extrapolaton Kumar Rahul, S.N. Bhattacharyya * Department of Mechancal Engneerng,
More informationEstimation of a proportion under a certain two-stage sampling design
Etmaton of a roorton under a certan two-tage amng degn Danutė Kraavcatė nttute of athematc and nformatc Lthuana Stattc Lthuana Lthuana e-ma: raav@tmt Abtract The am of th aer to demontrate wth exame that
More informationPhysics 120. Exam #1. April 15, 2011
Phyc 120 Exam #1 Aprl 15, 2011 Name Multple Choce /16 Problem #1 /28 Problem #2 /28 Problem #3 /28 Total /100 PartI:Multple Choce:Crclethebetanwertoeachqueton.Anyothermark wllnotbegvencredt.eachmultple
More informationCHAPTER 9 LINEAR MOMENTUM, IMPULSE AND COLLISIONS
CHAPTER 9 LINEAR MOMENTUM, IMPULSE AND COLLISIONS 103 Phy 1 9.1 Lnear Momentum The prncple o energy conervaton can be ued to olve problem that are harder to olve jut ung Newton law. It ued to decrbe moton
More informationConfidence intervals for the difference and the ratio of Lognormal means with bounded parameters
Songklanakarn J. Sc. Technol. 37 () 3-40 Mar.-Apr. 05 http://www.jt.pu.ac.th Orgnal Artcle Confdence nterval for the dfference and the rato of Lognormal mean wth bounded parameter Sa-aat Nwtpong* Department
More informationand decompose in cycles of length two
Permutaton of Proceedng of the Natona Conference On Undergraduate Reearch (NCUR) 006 Domncan Unverty of Caforna San Rafae, Caforna Apr - 4, 007 that are gven by bnoma and decompoe n cyce of ength two Yeena
More informationAssortment Optimization under MNL
Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.
More informationCanonical transformations
Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationNote on EM-training of IBM-model 1
Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are
More informationEEE 241: Linear Systems
EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationResearch Article Runge-Kutta Type Methods for Directly Solving Special Fourth-Order Ordinary Differential Equations
Hndaw Publhng Corporaton Mathematcal Problem n Engneerng Volume 205, Artcle ID 893763, page http://dx.do.org/0.55/205/893763 Reearch Artcle Runge-Kutta Type Method for Drectly Solvng Specal Fourth-Order
More informationDiscrete Simultaneous Perturbation Stochastic Approximation on Loss Function with Noisy Measurements
0 Amercan Control Conference on O'Farrell Street San Francco CA USA June 9 - July 0 0 Dcrete Smultaneou Perturbaton Stochatc Approxmaton on Lo Functon wth Noy Meaurement Q Wang and Jame C Spall Abtract
More informationA NUMERICAL MODELING OF MAGNETIC FIELD PERTURBATED BY THE PRESENCE OF SCHIP S HULL
A NUMERCAL MODELNG OF MAGNETC FELD PERTURBATED BY THE PRESENCE OF SCHP S HULL M. Dennah* Z. Abd** * Laboratory Electromagnetc Sytem EMP BP b Ben-Aknoun 606 Alger Algera ** Electronc nttute USTHB Alger
More informationExtended Prigogine Theorem: Method for Universal Characterization of Complex System Evolution
Extended Prgogne Theorem: Method for Unveral Characterzaton of Complex Sytem Evoluton Sergey amenhchkov* Mocow State Unverty of M.V. Lomonoov, Phycal department, Rua, Mocow, Lennke Gory, 1/, 119991 Publhed
More informationForesighted Resource Reciprocation Strategies in P2P Networks
Foreghted Reource Recprocaton Stratege n PP Networ Hyunggon Par and Mhaela van der Schaar Electrcal Engneerng Department Unverty of Calforna Lo Angele (UCLA) Emal: {hgpar mhaela@ee.ucla.edu Abtract We
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More informationOn the SO 2 Problem in Thermal Power Plants. 2.Two-steps chemical absorption modeling
Internatonal Journal of Engneerng Reearch ISSN:39-689)(onlne),347-53(prnt) Volume No4, Iue No, pp : 557-56 Oct 5 On the SO Problem n Thermal Power Plant Two-tep chemcal aborpton modelng hr Boyadjev, P
More informationA Kernel Particle Filter Algorithm for Joint Tracking and Classification
A Kernel Partcle Flter Algorthm for Jont Tracng and Clafcaton Yunfe Guo Donglang Peng Inttute of Informaton and Control Automaton School Hangzhou Danz Unverty Hangzhou Chna gyf@hdueducn Huaje Chen Ane
More information728. Mechanical and electrical elements in reduction of vibrations
78. Mechancal and electrcal element n reducton of vbraton Katarzyna BIAŁAS The Slean Unverty of Technology, Faculty of Mechancal Engneerng Inttute of Engneerng Procee Automaton and Integrated Manufacturng
More informationTAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES
TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES SVANTE JANSON Abstract. We gve explct bounds for the tal probabltes for sums of ndependent geometrc or exponental varables, possbly wth dfferent
More informationLecture 21: Numerical methods for pricing American type derivatives
Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)
More informationA New Inverse Reliability Analysis Method Using MPP-Based Dimension Reduction Method (DRM)
roceedng of the ASME 007 Internatonal Degn Engneerng Techncal Conference & Computer and Informaton n Engneerng Conference IDETC/CIE 007 September 4-7, 007, La Vega, eada, USA DETC007-35098 A ew Inere Relablty
More informationMultilayer Perceptrons and Backpropagation. Perceptrons. Recap: Perceptrons. Informatics 1 CG: Lecture 6. Mirella Lapata
Multlayer Perceptrons and Informatcs CG: Lecture 6 Mrella Lapata School of Informatcs Unversty of Ednburgh mlap@nf.ed.ac.uk Readng: Kevn Gurney s Introducton to Neural Networks, Chapters 5 6.5 January,
More informationNON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS
IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More informationMLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012
MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:
More information2. SINGLE VS. MULTI POLARIZATION SAR DATA
. SINGLE VS. MULTI POLARIZATION SAR DATA.1 Scatterng Coeffcent v. Scatterng Matrx In the prevou chapter of th document, we dealt wth the decrpton and the characterzaton of electromagnetc wave. A t wa hown,
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationA New Virtual Indexing Method for Measuring Host Connection Degrees
A New Vrtual Indexng Method for Meaurng ot Connecton Degree Pnghu Wang, Xaohong Guan,, Webo Gong 3, and Don Towley 4 SKLMS Lab and MOE KLINNS Lab, X an Jaotong Unverty, X an, Chna Department of Automaton
More informationA new construction of 3-separable matrices via an improved decoding of Macula s construction
Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationPROBABILITY-CONSISTENT SCENARIO EARTHQUAKE AND ITS APPLICATION IN ESTIMATION OF GROUND MOTIONS
PROBABILITY-COSISTET SCEARIO EARTHQUAKE AD ITS APPLICATIO I ESTIATIO OF GROUD OTIOS Q-feng LUO SUARY Th paper preent a new defnton of probablty-content cenaro earthquae PCSE and an evaluaton method of
More informationDifference Equations
Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1
More informationLecture 14: Bandits with Budget Constraints
IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed
More informationNotes on Frequency Estimation in Data Streams
Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to
More informationSupervised Learning. Neural Networks and Back-Propagation Learning. Credit Assignment Problem. Feedforward Network. Adaptive System.
Part 7: Neura Networ & earnng /2/05 Superved earnng Neura Networ and Bac-Propagaton earnng Produce dered output for tranng nput Generaze reaonaby & appropratey to other nput Good exampe: pattern recognton
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationJoint Source Coding and Higher-Dimension Modulation
Jont Codng and Hgher-Dmenon Modulaton Tze C. Wong and Huck M. Kwon Electrcal Engneerng and Computer Scence Wchta State Unvert, Wchta, Kana 676, USA {tcwong; huck.kwon}@wchta.edu Abtract Th paper propoe
More informationLecture 10 Support Vector Machines. Oct
Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron
More information4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA
4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected
More informationFirst Year Examination Department of Statistics, University of Florida
Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve
More informationTransfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system
Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng
More information