The Essential Dynamics Algorithm: Essential Results


massachusetts institute of technology, artificial intelligence laboratory

The Essential Dynamics Algorithm: Essential Results

Martin C. Martin

AI Memo, May
massachusetts institute of technology, cambridge, ma 02139 usa

Abstract

This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces, which trades speed for accuracy. A transform of the stochastic MDP into a deterministic one is presented which captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously does policy search against it. Bounds on the error of this approximation are proven, and experimental results in a bicycle riding domain are presented. The algorithm learns near optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. All code used in the experiments is available on the project web site.

This work was funded by DARPA as part of the "Natural Tasking of Robots Based on Human Interaction Cues" project under contract number DABT C-1010.

1 Introduction

There is currently much interest in the problem of learning in stochastic Markov decision processes (MDPs) with continuous state and action spaces [2, 9, 10]. For such domains, especially when the state or action spaces are of high dimension, the value and Q-functions may be quite complicated and difficult to approximate. However, there may be relatively simple policies which perform well. This has led to recent interest in policy search algorithms, in which the reinforcement signal is used to modify the policy directly [5, 6, 10].

For many problems, a positive reward is only achieved at the end of a task, if the agent reaches a goal state. For complex problems, the probability that an initial, random policy would reach such a state could be vanishingly small. A widely used methodology to overcome this is shaping [1, 3, 4, 8]. Shaping is the introduction of small rewards to reward partial progress toward the goal. A shaping function eases the problem of backing up rewards, since actions are rewarded or punished sooner.

When a policy changes, estimating the resulting change in value can be difficult, requiring the new policy to interact with the MDP for many episodes. In this paper we introduce a method of transforming a stochastic MDP into a deterministic one. Under certain conditions on the original MDP, and given a shaping reward of the proper form, the deterministic MDP can be used to estimate the value of any policy with respect to the original MDP. This leads to an online algorithm for policy search: simultaneously estimate the parameters of a model for the transformed, deterministic MDP, and use this model to estimate both the value of a policy and the gradient of that value with respect to the policy parameters. Then, using these estimates, perform gradient descent search on the policy parameters. Since the transformation captures what is important about the original MDP for planning, we call our method the essential dynamics algorithm.

The next section gives an overview of the technique, developing the intuition behind it. In Section 3 we describe the mathematical foundation of the algorithm, including bounds on the difference between values in the original and transformed MDPs. Section 4 describes an application of this technique to learning to ride a bicycle. The last section discusses these results, comparing them to previous work. On the bicycle riding task, given the simulator, the only domain knowledge needed is a shaping reward that decreases as the lean angle increases and as the angle to the goal increases. Compared to previous work on this problem, a near optimal policy is found in dramatically less simulated time, and with less domain knowledge.

2 Overview of the Essential Dynamics Algorithm

In the essential dynamics algorithm we learn a model of how states evolve with time, and then use this model to compute the value of the current policy. In addition, if the policy and model are from a parameterized family, we can compute the gradient of the value with respect to the parameters.

In putting this plan into practice, one difficulty is that state transitions are stochastic, so that expected rewards must be computed. One way to compute them is to generate many trajectories and average over them, but this can be very time consuming. Instead we might be tempted to estimate only the mean of the state at each future time, and use the reward associated with that. However, we can do better. If the reward is quadratic, the expected reward is particularly simple. Given knowledge of the state at time t, we can then talk about the distribution of possible states at some later time. For a given distribution of states, let \bar{s} denote the expected state. Then

    E[r(s)] = ∫ ( a (s - \bar{s})^2 + b (s - \bar{s}) + c ) p(s) ds = a Var(s) + b (\bar{s} - \bar{s}) + c = a Var(s) + c,    (1)

where a, b and c depend on \bar{s}.
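As a concrete illustration of Eq. (1), and not part of the original memo, the following Python snippet checks numerically that for a reward that is quadratic about the mean state, the expected reward depends on the state distribution only through its mean and variance. The coefficients and the Gaussian distribution used here are arbitrary choices for the check.

```python
import numpy as np

# Quadratic reward about the mean state s_bar: r(s) = a*(s - s_bar)**2 + b*(s - s_bar) + c.
# Eq. (1) says E[r(s)] = a*Var(s) + c, independent of b.
rng = np.random.default_rng(0)
a, b, c = -3.0, 0.7, 2.0            # arbitrary coefficients for the check
s_bar, sigma = 1.5, 0.4             # mean and standard deviation of the state distribution

samples = rng.normal(s_bar, sigma, size=1_000_000)
reward = a * (samples - s_bar) ** 2 + b * (samples - s_bar) + c

print(reward.mean())                # Monte Carlo estimate of E[r(s)]
print(a * sigma ** 2 + c)           # closed form from Eq. (1): a*Var(s) + c
```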

Suppose the policy depends on a vector of parameters θ. When interacting with the MDP, at every time t, after having taken action a_{t-1} in state s_{t-1} and arriving in state s_t:

1. \tilde{μ}(s_{t-1}, a_{t-1}) ⇒ s_t
2. \tilde{ν}(s_{t-1}, a_{t-1}) ⇒ (s_t - \tilde{μ}(s_{t-1}, a_{t-1}))^2
3. \tilde{s}_t = s_t
4. \tilde{σ}^2_t = 0
5. \tilde{V} = 0
6. For every τ in t+1 .. t+n:
   a. \tilde{s}_τ = \tilde{μ}(\tilde{s}_{τ-1}, π(\tilde{s}_{τ-1}))
   b. \tilde{σ}^2_τ = \tilde{ν}(\tilde{s}_{τ-1}, π(\tilde{s}_{τ-1})) + \tilde{σ}^2_{τ-1} (\tilde{μ}'_π(\tilde{s}_{τ-1}))^2
   c. \tilde{r}_τ = r(\tilde{s}_τ) + ½ r''(\tilde{s}_τ) \tilde{σ}^2_τ
   d. \tilde{V} = \tilde{V} + γ^{τ-t} \tilde{r}_τ
7. Update the policy in the direction that increases \tilde{V}: θ = θ + α ∂\tilde{V}/∂θ.

Figure 1: The essential dynamics algorithm for a one dimensional state space. The notation f(x) ⇒ a means adjust the parameters that determine f to make f(x) closer to a, e.g. by gradient descent. \tilde{μ}'_π is the derivative of \tilde{μ}(s, π(s)) with respect to s.

Thus, to calculate the expected reward, we don't need to know the full state distribution, but simply its mean and variance. Thus, our model should describe how the mean and variance evolve over time. If the state transitions are smooth, they can be approximated by a Taylor series. Let π be the current policy, and let μ_π(s) denote the expected state that results from taking action π(s) in state s. If \bar{s}_t denotes the mean state at time t, and σ^2_t the variance, and if state transitions were deterministic, then to first order we would have

    \bar{s}_{t+1} ≈ μ_π(\bar{s}_t),    σ^2_{t+1} ≈ (dμ_π(\bar{s}_t)/ds)^2 σ^2_t,

where μ'_π is the derivative of μ_π with respect to the state. For stochastic state transitions, let ν_π(s) be the variance of the state that results from taking action π(s) in state s. It turns out that the variance at the next time step is simply ν_π(\bar{s}_t) plus the transformed variance from above, leading to

    \bar{s}_{t+1} ≈ μ_π(\bar{s}_t),    σ^2_{t+1} ≈ ν_π(\bar{s}_t) + (dμ_π(\bar{s}_t)/ds)^2 σ^2_t.    (2)

Thus, we learn estimates \tilde{μ} and \tilde{ν} of μ and ν respectively, use Eq. (2) to estimate the mean and variance of future states, and Eq. (1) to calculate the expected reward. The resulting algorithm, which we call the essential dynamics algorithm, is presented in Figure 1.
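To make Figure 1 and Eq. (2) concrete, here is a minimal Python sketch (not code from the memo) of the forward rollout and policy update for a one dimensional state. It assumes the caller supplies the learned model mu_t(s, a) and nu_t(s, a), the parameterized policy policy(theta, s), the shaping reward r(s) and its second derivative r2(s), with theta a float NumPy array; all of these names are illustrative. The model-fitting steps 1 and 2 of the figure are omitted, and a finite-difference gradient stands in for the analytic ∂\tilde{V}/∂θ of step 7.

```python
import numpy as np

def value_estimate(theta, s_t, mu_t, nu_t, policy, r, r2, gamma=0.99, n=30):
    """Steps 3-6 of Figure 1: roll the learned one dimensional model forward n steps,
    propagating the mean and variance (Eq. 2) and summing expected rewards (Eq. 1)."""
    s, var, V = s_t, 0.0, 0.0
    for k in range(1, n + 1):
        a = policy(theta, s)
        eps = 1e-4  # numerical derivative of mu~(s, pi(s)) with respect to s (mu'_pi in the figure)
        dmu = (mu_t(s + eps, policy(theta, s + eps))
               - mu_t(s - eps, policy(theta, s - eps))) / (2 * eps)
        var = nu_t(s, a) + var * dmu ** 2                # step 6b
        s = mu_t(s, a)                                   # step 6a
        V += gamma ** k * (r(s) + 0.5 * r2(s) * var)     # steps 6c and 6d
    return V

def policy_update(theta, s_t, mu_t, nu_t, policy, r, r2, alpha=0.01):
    """Step 7: move theta in the direction that increases V~ (finite-difference gradient)."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = 1e-4
        grad[i] = (value_estimate(theta + d, s_t, mu_t, nu_t, policy, r, r2)
                   - value_estimate(theta - d, s_t, mu_t, nu_t, policy, r, r2)) / 2e-4
    return theta + alpha * grad
```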

The next section gives a formal derivation of the algorithm, and proves error bounds on the estimated state, variance, reward and value for the general n-dimensional case, where the reward is only approximately quadratic.

3 Derivation of the Essential Dynamics Algorithm

A Markov Decision Process (MDP) is a tuple ⟨S, D, A, P_{s,a}, r, γ⟩ where: S is a set of states; D is the initial-state distribution; A is a set of actions; P_{s,a} are the transition probabilities; r: S × A → ℝ is the reward; and γ is the discount factor. This paper is concerned with continuous state and action spaces; in particular we assume S = ℝ^{n_s} and A = ℝ^{n_a}. We use subscripts to denote time and superscripts to denote components of vectors and matrices. Thus, s^i_t denotes the i-th component of the vector s at time t. A (deterministic) policy is a mapping from a state to the action to be taken in that state, π: S → A.

Given a policy and a distribution P_t of states at time t, such as the initial state distribution or the observed state, the distribution of states at future times is defined by the recursive relation

    P_{τ+1}(s) = ∫_S P_{s', π(s')}(s) P_τ(s') ds'    for τ > t.

Given such a distribution, we can define the expectation and the covariance matrix of a random vector x with respect to it, which we denote E_t[x] and cov_t(x) respectively. Thus, E_t[x] = ∫ x P_t(x) dx and cov_t(x)^{i,j} = E_t[(x^i - E_t[x^i])(x^j - E_t[x^j])]. When P_t is zero except for a single state s_t, we introduce E[x | s_t] as a synonym for E_t[x] which makes the distribution explicit.

Given an MDP, we define the limited horizon value function for a given policy as

    V_π(s_t) = ∑_{τ=t}^{t+n} γ^{τ-t} E[r(s_τ, π(s_τ))],

where the probability density at time t is zero except for state s_t. Also given a policy, we define two functions, the mean μ_π(s) and covariance matrix ν_π(s) of the next state. Thus, μ_π(s_t) = E[s_{t+1} | s_t] and ν_π(s_t) = E[(s_{t+1} - μ_π(s_t))(s_{t+1} - μ_π(s_t))^T | s_t]. In policy search, we have a fixed set of policies Π and we try to find one that results in a value function with high value.

We transform the stochastic MDP M to a deterministic one M' = ⟨S', s'_0, A', f', r', γ'⟩ as follows. A state in the new MDP is an ordered pair consisting of a state from S and a covariance matrix, denoted (s, Σ). The new initial state is s'_0 = (E_D[s], cov_D(s)). The new action space is the set of all possible policies for M, that is, A' = {π | π: S → A}. The state transition probabilities are replaced with a (deterministic) state transition function f'(s'_t, a'_t), which gives the unique successor state that results from taking action a'_t = π in state s'_t = (s_t, Σ_t). We set

    f'(s'_t, a'_t) = f'(s_t, Σ_t, π) = ( μ_π(s_t), ν_π(s_t) + (∇μ_π(s_t)) Σ_t (∇μ_π(s_t))^T ).

The reward is r'(s, Σ, π) = r(s) + ½ tr( (∇^2 r(s)) Σ ), where ∇^2 r(s) denotes the matrix of second derivatives of r with respect to each state variable. Finally, γ' = γ.
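As an illustration only, and not code from the memo, the transformed transition f' and reward r' defined above can be written directly in Python. The callables mu_pi, nu_pi, J_mu_pi (the Jacobian of mu_pi) and hess_r (the Hessian of r) are assumed to be supplied by the user, and their names are hypothetical.

```python
import numpy as np

def f_prime(s, Sigma, mu_pi, nu_pi, J_mu_pi):
    """Deterministic transition of the transformed MDP M':
    (s, Sigma) -> (mu_pi(s), nu_pi(s) + J Sigma J^T), with J the Jacobian of mu_pi at s."""
    J = J_mu_pi(s)
    return mu_pi(s), nu_pi(s) + J @ Sigma @ J.T

def r_prime(s, Sigma, r, hess_r):
    """Transformed reward r'(s, Sigma) = r(s) + 0.5 * tr(hess_r(s) Sigma)."""
    return r(s) + 0.5 * np.trace(hess_r(s) @ Sigma)
```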

The strength of the method comes from the theorems below, which state that the above transform approximately captures the dynamics of the original probabilistic MDP, to the extent that the original dynamics are smooth. The first theorem bounds the error in approximating states, the second in covariances, the third in rewards and the fourth in values.

Theorem 1  Fix a time t, a policy π, and a distribution of states P_t. Choose M_μ and M such that

    ∑_{j,k=1}^{n_s} | ∂^2 μ^i_π(s) / ∂s^j ∂s^k | < M for each i,    ‖∇μ_π(\bar{s}_t)‖ < M_μ,    and    ‖cov_t(s_t, s_t)‖_F < M,

where ‖·‖_F denotes the Frobenius norm. Let \bar{s}_t be given, and define \bar{s}_{t+1} = μ_π(\bar{s}_t), ε_t = ‖E_t[s_t] - \bar{s}_t‖ and ε_{t+1} = ‖E_t[s_{t+1}] - \bar{s}_{t+1}‖. Then

    ε_{t+1} < (ε_t + M_μ) (3/2) M ε_t.

Theorem 2  Suppose M_ν and M are chosen so that

    ∑_{i,j,k=1}^{n_s} | ∂ν^{i,j}_π(s) / ∂s^k | < M_ν,    ‖E_t[(s_t - E_t[s_t])^k]‖_F < M for k = 1, 2, 3, 4,    ‖\bar{s}_{t+1}‖ = ‖μ_π(\bar{s}_t)‖ < M,

and all the conditions of Theorem 1 hold. Let Σ_t be given, and define

    Σ^{i,j}_{t+1} = ν^{i,j}_π(\bar{s}_t) + (∇μ^i_π(\bar{s}_t))^T Σ_t (∇μ^j_π(\bar{s}_t)).

Let ε^Σ_t = cov_t(s_t, s_t) - Σ_t, and similarly for ε^Σ_{t+1}. Then

    ‖ε^Σ_{t+1}‖_F ≤ ( ‖ε^Σ_t‖_F + ε_t + M_μ + M_ν ) M ( 10 + O(ε_t) ).

Theorem 3  Suppose

    ∑_{i,j,k=1}^{n_s} | ∂^3 r(s) / ∂s^i ∂s^j ∂s^k | < M_r,    ∑_{i,j=1}^{n_s} | ∂^2 r(\bar{s}_t) / ∂s^i ∂s^j | < M,    ‖∇r(\bar{s}_t)‖ < M,

and the conditions of the previous two theorems hold. Let ε^r_t = E_t[r(s_t)] - r'(\bar{s}_t). Then

    E_t[r(s_t)] = r'(\bar{s}_t) + ε^r_t = r(\bar{s}_t) + ½ tr( ∇^2 r(\bar{s}_t) Σ_t ) + ε^r_t,    where    |ε^r_t| < ( ‖ε^Σ_t‖_F + ε_t + M_r ) M + O(ε_t).

Theorem 4  Fix a time t and a policy π, and a distribution of states P_t. Let \bar{s}_t and Σ_t be given, and define \bar{s}_τ and Σ_τ for τ = t+1, ..., t+n recursively as in Theorems 1 and 2 above. Let M_{ε^r} be an upper bound for |ε^r_τ| for all τ in [t, t+n]. Then, under the conditions of the above three theorems, E[V(s_t)] = V'(\bar{s}_t) + ε^V, where

    |ε^V| < M_{ε^r} (1 - γ^{n+1}) / (1 - γ).

Proof: First, some preliminaries. In the first three theorems, which deal only with a single transition and a single distribution of states at time t, namely P_t, let \bar{x} = E_{P_t}[x] for any random variable x. Note that for any vector x and square matrices A and B, x^T A x = tr( A (x x^T) ), where tr(·) denotes the trace of a matrix, tr(AB) ≤ ‖A‖_F ‖B‖_F, and ‖x x^T‖_F = ‖x‖^2. In the statement of Theorem 2, E_t[(s_t - \bar{s}_t)^3] is a three dimensional matrix whose i, j, k element is E_t[(s^i_t - \bar{s}^i_t)(s^j_t - \bar{s}^j_t)(s^k_t - \bar{s}^k_t)]. Similarly, E_t[(s_t - \bar{s}_t)^4] is a four dimensional matrix, and if all of its elements are finite, then the lower powers must also be finite. The Frobenius norm of such matrices is simply the square root of the sum of the squares of all their elements. Also, if a, b, c and d are real numbers that are greater than zero, then ab + cd < (a + c)(b + d). Note that, since μ_π is a vector valued function, ∇μ_π(s) is a matrix. Since μ^i_π, the i-th component of μ_π, is a real valued function, ∇μ^i_π(s) is a vector in ℝ^{n_s}. Because ν(s) is a matrix, ∇ν^{i,j}(s) is a vector in ℝ^{n_s}. Let ∇^2 μ^i_π(x) denote the matrix of second partial derivatives of μ^i_π, evaluated at x.

For any s, let Δ_1 = E_t[s_t] - \bar{s}_t, Δ_2 = s_t - E_t[s_t], and Δ = Δ_1 + Δ_2 = s_t - \bar{s}_t. Thus, E_{P_t}[Δ_2] = 0 and

    E_t[Δ Δ^T] = E_t[Δ_1 Δ_1^T] + E_t[Δ_2 Δ_2^T] = Δ_1 Δ_1^T + Σ_t + ε^Σ_t.

Note that ‖Δ_1‖ = ε_t.

Proof of Theorem 1: Expand μ^i_π(s) using a first order Taylor series with the Lagrange form of the remainder, namely

    μ^i_π(s) = μ^i_π(\bar{s}_t) + ∇μ^i_π(\bar{s}_t)^T (s - \bar{s}_t) + ½ (s - \bar{s}_t)^T ∇^2 μ^i_π(x) (s - \bar{s}_t)    (3)

for some x on the line joining s and \bar{s}_t. Then

    E_{P_t}[s^i_{t+1}] - \bar{s}^i_{t+1} = E_{P_t}[μ^i_π(s_t)] - μ^i_π(\bar{s}_t) = ∇μ^i_π(\bar{s}_t)^T Δ_1 + ½ tr( ∇^2 μ^i_π(x) ( Σ_t + ε^Σ_t + Δ_1 Δ_1^T ) ).    (4)

So

    ε_{t+1} < M_μ ε_t + ½ M ( M + ε_t ) ε_t < ( ε_t + M_μ ) (3/2) M ε_t.

Proof of Theorem 2: Let M'_k = ‖E_t[(s_t - \bar{s}_t)^k]‖_F. By the mean value theorem,

    ν^{i,j}(s) = ν^{i,j}(\bar{s}_t) + ∇ν^{i,j}(x)^T (s - \bar{s}_t)

for some x on the line joining s and \bar{s}_t. Also, ν^{i,j}(s_t) = E[s^i_{t+1} s^j_{t+1} | s_t] - μ^i(s_t) μ^j(s_t), so that

    cov_{P_t}(s^i_{t+1}, s^j_{t+1}) = E[s^i_{t+1} s^j_{t+1}] - \bar{s}^i_{t+1} \bar{s}^j_{t+1}
                                   = E_{P_t}[ E[s^i_{t+1} s^j_{t+1} | s_t] ] - \bar{s}^i_{t+1} \bar{s}^j_{t+1}
                                   = ν^{i,j}(\bar{s}_t) + E_{P_t}[ ∇ν^{i,j}(x)^T (s_t - \bar{s}_t) ] + E_{P_t}[ μ^i(s_t) μ^j(s_t) ] - \bar{s}^i_{t+1} \bar{s}^j_{t+1}.    (5)

The second term is an error term, call it ε'^{i,j}. We have |ε'^{i,j}| < M_ν M'_1. For the third term, we expand both μ^i and μ^j using Eq. (4) and multiply out the terms, obtaining

    E_{P_t}[ μ^i(s_t) μ^j(s_t) ] = μ^i(\bar{s}_t) μ^j(\bar{s}_t) + μ^i(\bar{s}_t) ∇μ^j(\bar{s}_t)^T Δ_1 + μ^j(\bar{s}_t) ∇μ^i(\bar{s}_t)^T Δ_1
                                 + ∇μ^i(\bar{s}_t)^T ( Σ_t + ε^Σ_t + Δ_1 Δ_1^T ) ∇μ^j(\bar{s}_t) + (terms involving ∇^2 μ^i_π(x) and ∇^2 μ^j_π(x)).

All terms other than the first and the one involving Σ_t are error terms; call their sum ε''^{i,j}. That is,

    E_{P_t}[ μ^i(s_t) μ^j(s_t) ] = μ^i(\bar{s}_t) μ^j(\bar{s}_t) + ∇μ^i(\bar{s}_t)^T Σ_t ∇μ^j(\bar{s}_t) + ε''^{i,j}.

Lastly, let ε'''^{i,j} = μ^i(\bar{s}_t) μ^j(\bar{s}_t) - \bar{s}^i_{t+1} \bar{s}^j_{t+1}. By Theorem 1,

    ε'''^{i,j} = μ^i(\bar{s}_t) μ^j(\bar{s}_t) - ( μ^i(\bar{s}_t) + ε^i_{t+1} )( μ^j(\bar{s}_t) + ε^j_{t+1} )
               = - μ^i(\bar{s}_t) ε^j_{t+1} - μ^j(\bar{s}_t) ε^i_{t+1} - ε^i_{t+1} ε^j_{t+1},

so that |ε'''^{i,j}| < ( |μ^i(\bar{s}_t)| + |μ^j(\bar{s}_t)| + ε_{t+1} ) ε_{t+1}. Substituting into Eq. (5), we obtain

    cov_{P_t}( s^i_{t+1}, s^j_{t+1} ) = ν^{i,j}(\bar{s}_t) + ε'^{i,j} + ∇μ^i(\bar{s}_t)^T Σ_t ∇μ^j(\bar{s}_t) + ε''^{i,j} + ε'''^{i,j},

so that ε^Σ_{t+1} = ε' + ε'' + ε''' and

    ‖ε^Σ_{t+1}‖_F < M_ν M'_1 + M ε_t + M ε_t ( ε_t + ‖ε^Σ_t‖_F ) + M M_μ M'_2 + M M_μ M'_3 + ¼ M_μ^2 M'_4
                  + ( ε_t + M_μ ) (3/2) M ε_t ( M + ( ε_t + M_μ ) (3/2) M ε_t ).

Each term has at least one of the small bounds ε_t, ‖ε^Σ_t‖_F, M_μ or M_ν. Using the inequality from the preliminaries, we can factor them out. The four M'_k are bounded by M + O(ε_t), as can be shown using the binomial theorem, e.g.

    E_t[ (Δ_1 + Δ_2)^3 ] = E_t[ Δ_2^3 ] + 3 Δ_1 E_t[ Δ_2^2 ] + 3 Δ_1^2 E_t[ Δ_2 ] + Δ_1^3 = E_t[ Δ_2^3 ] + O(ε_t).

Proof of Theorem 3: Expand r(s) using a second order Taylor series with the Lagrange form of the remainder, namely

    r(s) = r(\bar{s}_t) + ∇r(\bar{s}_t)^T ( s - \bar{s}_t ) + ½ ( s - \bar{s}_t )^T ∇^2 r(\bar{s}_t) ( s - \bar{s}_t )
           + (1/6) ∑_{i,j,k=1}^{n_s} ( ∂^3 r(x) / ∂s^i ∂s^j ∂s^k ) ( s^i - \bar{s}^i_t )( s^j - \bar{s}^j_t )( s^k - \bar{s}^k_t )    (6)

for some x on the line joining s and \bar{s}_t. Call the last term ε'. Thus,

    E_t[ r(s_t) ] = r(\bar{s}_t) + ∇r(\bar{s}_t)^T Δ_1 + ½ tr( ∇^2 r(\bar{s}_t) ( Σ_t + ε^Σ_t + Δ_1 Δ_1^T ) ) + E_t[ ε' ] = r'(\bar{s}_t) + ε^r_t

and

    |ε^r_t| < ‖∇r(\bar{s}_t)‖ ε_t + ½ ( ‖ε^Σ_t‖_F + ε_t^2 ) M + (1/6) M_r M'_3 < ( ε_t + ‖ε^Σ_t‖_F + M_r ) ( M + ½ M ε_t + (1/6) M'_3 ).

Proof of Theorem 4: We have

    E[ V(s_t) ] = ∑_{τ=t}^{t+n} γ^{τ-t} E_τ[ r(s_τ) ] = ∑_{τ=t}^{t+n} γ^{τ-t} ( r'(\bar{s}_τ) + ε^r_τ ) = V'(\bar{s}_t) + ∑_{τ=t}^{t+n} γ^{τ-t} ε^r_τ.

So,

    |ε^V| ≤ ∑_{τ=t}^{t+n} γ^{τ-t} |ε^r_τ| ≤ M_{ε^r} ∑_{τ=t}^{t+n} γ^{τ-t} = M_{ε^r} (1 - γ^{n+1}) / (1 - γ).

The above theorems state that as long as ε_t, ‖ε^Σ_t‖_F, M_μ, M_ν and M_r are small and M is finite, and given a good estimate of the mean and covariance of the state at some time, the transformed MDP will result in good estimates at later times, and hence the reward and value functions will also be good estimates. Note that no particular distribution of states is assumed, only that, essentially, the first four moments are bounded at every time. The most unusual conditions are that the reward r be roughly quadratic, and that the value function include only a limited number of future rewards. This motivates the use of shaping rewards.

4 Experiments

The code used for all experiments in this paper is available from martin/research.html. The essential dynamics algorithm was applied to Randløv and Alstrøm's bicycle riding task [8], with the objective of riding a bicycle to a goal 1 km away. The five state variables were simply the lean angle, the handlebar angle, their time derivatives, and the angle to the goal. The two actions were the torque to apply to the handlebars and the horizontal displacement of the rider's center of mass from the bicycle's center line. The stochasticity of state transitions came from a uniform random number added to the rider displacement. If the lean angle exceeded π/15, the bicycle fell over and the run terminated.

If the variance of the state is not too large at every time step, then the variance term in the transformed reward can simply be considered another form of error, and only μ need be estimated. This was done here. A continuous time formulation was used where, instead of estimating the values of the state variables at the next time, their derivatives were estimated. The model was of the form

    \tilde{μ}_w(s, a) = w · φ(s, a),

where φ(s, a) was a vector of features and w was a vector of weights. The features were simply the state and action variables themselves. The derivative of each state variable was estimated using gradient descent on w with the error measure err = | ṡ - w · φ(s, a) | and a learning rate of 1.0. This error measure was found to work better than the more traditional squared error. The squared error is minimized by the mean of the observed values, whereas the absolute value is minimized by the median [7]. The median is a more robust estimate of central tendency, i.e. less susceptible to outliers, and therefore may be a better choice in many practical situations. Model estimation was done online, simultaneous with policy search.

In the continuous formulation, the value function is the time integral of the reward times the discount factor. The future state was estimated using Euler integration [7]. While the bicycle simulator also used Euler integration, these choices were unrelated. In fact, Δt was 0.01 for the bicycle simulator, and the step size for integrating the estimated reward was chosen separately. It was integrated for 30 time steps.
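As an illustration of the model-estimation choices just described, and not code from the memo, the following Python sketch implements a linear derivative model, its online update under the absolute-error (median-seeking) criterion, and an Euler step of the learned dynamics. The array shapes and helper names are assumptions.

```python
import numpy as np

def features(s, a):
    """Features phi(s, a): simply the state and action variables themselves."""
    return np.concatenate([s, a])

def model_update(W, s, a, s_dot_observed, lr=1.0):
    """One online update of the linear derivative model mu_w(s, a) = W @ phi(s, a),
    using gradient descent on the absolute error |s_dot - W @ phi| (minimized by the median)."""
    phi = features(s, a)
    err = s_dot_observed - W @ phi
    return W + lr * np.outer(np.sign(err), phi)

def euler_step(W, s, a, dt):
    """Euler integration of the estimated state derivative."""
    return s + dt * (W @ features(s, a))
```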

Figure 2: The left graph shows length of episode (sec) vs. training time (sec) for 10 runs. The dashed line indicates the optimal policy. Stable riding was achieved within 200 simulated seconds. The right graph shows angle to goal (radians) vs. simulated time (sec) for a single episode starting after 3000 simulated seconds of training.

The shaping reward was the square of the angle to the goal plus 10 times the square of the lean angle. The policy was a weighted sum of features, with a small Gaussian added for exploration, π(s) = θ · φ(s) + N(0, 0.05). The features were simply the state variables themselves.

When the model is poor or the policy parameters are far from a local optimum, ∂\tilde{V}/∂θ can be quite large, resulting in a large gradient descent step which may overshoot its region of applicability. This can be addressed by reducing the learning rate, but then learning becomes interminably slow. Thus, the gradient descent rule was modified to

    Δθ = α (∂\tilde{V}/∂θ) / ( β + ‖∂\tilde{V}/∂θ‖ ).

Near an optimum, when ‖∂\tilde{V}/∂θ‖ ≪ β, this reduces to the usual rule with a learning rate of α/β. In these experiments, α = 0.01 and β = 1.0.

A graph of episode time vs. learning time is shown in Figure 2. After falling over between 40 and 60 times, the controller was able to ride to the goal or to the time limit without falling over. After a single such episode, it consistently rode directly to the goal in a near minimum amount of time. The resulting policy was essentially an optimal policy.

5 Discussion

For learning and planning in complex worlds with continuous, high dimensional state and action spaces, the goal is not so much to converge on a perfect solution, but to find a good solution within a reasonable time. Such problems often use a shaping reward to accelerate learning. For a large class of such problems, this paper proposes approximating the problem dynamics in such a way that the mean and covariance of the future state can be estimated from the observed current state. We have shown that, under certain conditions, the rewards in the approximate MDP are close to those in the original, with an error that grows boundedly as time increases. Thus, if the rewards are only summed for a limited number of steps ahead, the resulting value will approximate the value of the original system. Learning in this transformed problem is considerably easier than in the original, and both model estimation and policy search can be achieved online.

The simulation of bicycle riding is a good example of a problem where the value function is complex and hard to approximate, yet simple policies produce near optimal solutions. Using a traditional value function approximation approach, Randløv needed to augment the state with the second derivative of the lean angle (Ω) and provide shaping rewards [8]. The resulting algorithm took 1700 episodes to ride stably, and 400 episodes to get to the goal for the first time. The resulting policies tended to ride in circles and precess toward the goal, riding roughly 7 km to get to a goal 1 km away.

In contrast, when the action is a weighted sum of (very simple) features, random search can find near optimal policies. This was tested experimentally; 0.55% of random policies consistently reached the goal when Ω was included in the state, and 0.30% did when it wasn't.¹ What is more, over half of these policies had a path length within 1% of the best reported solution. Policies that rode stably but not to the goal were obtained 0.89% and 0.4% of the time respectively. Thus, a random search of policies needs only a few hundred episodes to find a near optimal policy. The essential dynamics algorithm consistently finds such near optimal policies, and the author is aware of only one other algorithm which does, the PEGASUS algorithm of [5].

The experiments in this paper took 40 to 60 episodes to ride stably, that is, to the goal or until the time limit without falling over. After a single such episode, the policy consistently rode directly to the goal in a near minimum amount of time. In contrast, PEGASUS used at least 450 episodes to evaluate each policy.² One reasonable initial policy is to always apply zero torque to the handlebars and zero displacement of body position. This falls over in an average of 1.74 seconds, so PEGASUS would need 780 simulated seconds to evaluate such a policy. The essential dynamics algorithm learns to ride stably in approximately 200 simulated seconds, and in the second 780 simulated seconds will have found a near optimal policy.

This was achieved using very little domain knowledge. Ω was not needed in the state, and the features were trivial. The essential dynamics algorithm can be used for online learning, or can learn from trajectories provided by other policies; that is, it can learn by watching. In the bicycle experiments, the essential dynamics algorithm needed many times more computing power per simulated second than PEGASUS, although it was still faster than real time on a 1 GHz mobile Pentium III, and therefore could presumably be used for learning on a real bicycle. The experiments in Section 4 added the square of the lean angle to the shaping reward, but did not use any information about dynamics (i.e. velocities or accelerations), nor about the handlebars. In fact, the shaping reward simply corresponded to the common sense advice "stay upright and head toward the goal."

However, these advantages do not come without drawbacks. The essential dynamics algorithm only does policy search in an approximation to the original MDP, so an optimal policy for this approximate MDP won't, in general, be optimal for the original MDP. The theorems in Section 3 give bounds on this error, and for bicycle riding this error is small.

Conclusion

This paper has presented an algorithm for online policy search in MDPs with continuous state and action spaces. A stochastic MDP is transformed into a deterministic MDP which captures the essential dynamics of the original. Policy search can then be performed in this transformed MDP. Error bounds were given and the technique was applied to a simulation of bicycle riding. The algorithm found near optimal solutions with less domain knowledge and orders of magnitude less time than existing techniques.

Acknowledgements

The author would like to thank Leslie Kaelbling, Ali Rahimi and especially Kevin Murphy for enlightening comments and discussions of this work.

1. Our experiments contained two conditions, namely with or without Ω in the state, resulting in 5 or 6 state variables. The features were the state variables themselves, state and action variables were scaled to roughly the range [-1, +1], weights were chosen uniformly from [-2, +2], and each policy was run 30 times. In 100,000 policies per condition, 549 (0.55%) reached the goal all 30 times when Ω was included, and 300 (0.30%) when it wasn't. For such policies, the median riding distance was 1009 m and 1008 m respectively. The code used is available on the web site.

2. [5] evaluated a given policy by simulating it 30 times. The derivative with respect to each of the 15 weights was evaluated using finite differences, requiring another 30 simulations per weight, for a total of 15 × 30 = 450 simulations. Often, the starting weights at a given stage were evaluated during the previous stage, so only the derivatives need to be calculated.
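The random-search baseline described in footnote 1 can be sketched as follows in Python. This is an illustration only, not the author's code: run_episode(Theta) is a hypothetical simulator hook that runs one episode with the linear policy a = Theta @ s and returns whether the goal was reached and the distance ridden.

```python
import numpy as np

def random_policy_search(run_episode, n_state, n_action,
                         n_policies=100_000, n_trials=30, seed=0):
    """Sample linear policies with weights uniform in [-2, +2] and keep those
    that reach the goal on every trial, recording their median riding distance."""
    rng = np.random.default_rng(seed)
    winners = []
    for _ in range(n_policies):
        Theta = rng.uniform(-2.0, 2.0, size=(n_action, n_state))
        results = [run_episode(Theta) for _ in range(n_trials)]
        if all(reached for reached, _ in results):
            winners.append((Theta, np.median([dist for _, dist in results])))
    return winners
```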

References

[1] Colombetti, M. & Dorigo, M. (1994) Training agents to perform sequential behavior. Adaptive Behavior, 2(3).
[2] Forbes, J. & Andre, D. (2000) Real-time reinforcement learning in continuous domains. In AAAI Spring Symposium on Real-Time Autonomous Systems.
[3] Mataric, M. J. (1994) Reward functions for accelerated learning. In W. W. Cohen and H. Hirsch (eds.), Proc. 11th Intl. Conf. on Machine Learning.
[4] Ng, A. et al. (1999) Policy invariance under reward transformations: Theory and application to reward shaping. In Proc. 16th Intl. Conf. on Machine Learning.
[5] Ng, A. & Jordan, M. (2000) PEGASUS: A policy search method for large MDPs and POMDPs. In Uncertainty in Artificial Intelligence (UAI), Proc. of the Sixteenth Conf.
[6] Peshkin, L. et al. (2000) Learning to Cooperate via Policy Search. In Uncertainty in Artificial Intelligence (UAI), Proc. of the Sixteenth Conf.
[7] Press, W. H. et al. (1992) Numerical Recipes: The Art of Scientific Computing. Cambridge University Press.
[8] Randløv, J. (2000) Shaping in Reinforcement Learning by Changing the Physics of the Problem. In Proc. Intl. Conf. on Machine Learning.
[9] Santamaría, J. C. et al. (1998) Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces. Adaptive Behavior, 6(2), 1998.
[10] Strens, M. J. A. & Moore, A. W. (2002) Policy Search using Paired Comparisons. Journal of Machine Learning Research, vol. 3.
