Natural Temporal Difference Learning


Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

William Dabney and Philip S. Thomas
School of Computer Science, University of Massachusetts Amherst, 140 Governors Dr., Amherst, MA

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

In this paper we investigate the application of natural gradient descent to Bellman error based reinforcement learning algorithms. This combination is interesting because natural gradient descent is invariant to the parameterization of the value function. This invariance property means that natural gradient descent adapts its update directions to correct for poorly conditioned representations. We present and analyze quadratic and linear time natural temporal difference learning algorithms, and prove that they are covariant. We conclude with experiments which suggest that the natural algorithms can match or outperform their non-natural counterparts using linear function approximation, and drastically improve upon their non-natural counterparts when using non-linear function approximation.

Introduction

Much recent research has focused on problems with continuous actions. For these problems, a significant leap in performance occurred when Kakade (2002) suggested the application of natural gradients (Amari 1998) to policy gradient algorithms. This suggestion has resulted in many successful natural gradient based policy search algorithms (Morimura, Uchibe, and Doya 2005; Peters and Schaal 2008; Bhatnagar et al. 2009; Degris, Pilarski, and Sutton 2012). Despite the successful applications of natural gradients to reinforcement learning in the context of policy search, it has not been applied to Bellman-error based algorithms like residual gradient and Sarsa(λ), which are the de facto algorithms for problems with discrete action sets. A common complaint is that these Bellman-error based algorithms learn slowly when using function approximation. Natural gradients are a quasi-Newton approach that is known to speed up gradient descent, and thus the synthesis of natural gradients with TD has the potential to improve upon this drawback of reinforcement learning. Additionally, we show in the appendix that the natural TD methods are covariant, which makes them more robust to the choice of representation than ordinary TD methods.

In this paper we provide a simple quadratic-time natural temporal difference learning algorithm, show how the idea of compatible function approximation can be leveraged to achieve linear time complexity, and prove that our algorithms are covariant. We conclude with empirical comparisons on three canonical domains (mountain car, cart-pole balancing, and acrobot) and one novel challenging domain (playing Tic-tac-toe using handwritten letters as input). When not otherwise specified, we assume the notation of Sutton and Barto (1998).

Residual Gradient

The residual gradient (RG) algorithm is the direct application of stochastic gradient descent to the problem of minimizing the mean squared Bellman error (MSBE) (Baird 1995). It is given by the following update equations:

δ_t = r_t + γ Q_θ(s_{t+1}, a_{t+1}) − Q_θ(s_t, a_t),   (1)

θ_{t+1} = θ_t − α_t δ_t ∇δ_t,   (2)

where Q_θ : S × A → ℝ is a function approximator with parameter vector θ. Residual gradient only follows unbiased estimates of the gradient of the MSBE if it uses double sampling or when the domain has deterministic state transitions (Sutton and Barto 1998). In this paper we evaluate using standard reinforcement learning domains with deterministic transitions, so the above formulation of RG is unbiased.
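To make the update concrete, here is a minimal sketch of one residual-gradient step implementing Eqs. (1) and (2). The callables q and grad_q, which return Q_θ(s, a) and its gradient with respect to θ, are illustrative placeholders rather than anything specified in the paper.

```python
import numpy as np

def residual_gradient_step(theta, q, grad_q, s, a, r, s_next, a_next, gamma, alpha):
    """One residual-gradient (RG) update, following Eqs. (1)-(2).

    q(theta, s, a) returns the scalar estimate Q_theta(s, a);
    grad_q(theta, s, a) returns its gradient with respect to theta.
    """
    delta = r + gamma * q(theta, s_next, a_next) - q(theta, s, a)            # Eq. (1)
    grad_delta = gamma * grad_q(theta, s_next, a_next) - grad_q(theta, s, a)
    return theta - alpha * delta * grad_delta                                 # Eq. (2)
```

With linear function approximation, grad_q(theta, s, a) simply returns the feature vector φ(s, a), so ∇δ_t reduces to γφ(s_{t+1}, a_{t+1}) − φ(s_t, a_t).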
One significant drawback of residual gradient is that it is not covariant. Consider the algorithm at two different levels, as depicted in Figure 1. At one level we can consider how it moves through the space of possible Q functions. At another level, we can consider how it moves through two different parameter spaces, each corresponding to a different representation of Q. Although these two representations may produce different update directions in parameter space, we would expect a good algorithm to result in both representations producing the same update direction in the space of Q functions.¹ Such an algorithm would be called covariant. Because residual gradient is not covariant, the choice of how to represent Q_θ influences the direction that RG moves in the space of Q functions. Other temporal difference (TD) learning algorithms like Sarsa(λ) and TDC (Sutton et al. 2009) are also not covariant.

¹ For technical correctness, we must assume that both representations can represent the same set of Q functions.

Figure 1: Q-space denotes the space of possible Q functions, while θ-space and h-space denote two different parameter spaces. The circles denote different locations in θ-space and h-space that correspond to the same Q function. The blue and red arrows denote possible directions that a non-covariant algorithm might attempt to change the parameters, which correspond to different directions in Q-space. The purple arrow denotes the update direction that a covariant algorithm might produce, regardless of the parameterization of Q.

Natural gradients can be viewed as a way to correct the direction of an update to account for a particular parameterization. Although natural gradients do not always result in covariant updates, they frequently do (Bagnell and Schneider 2003). Formally, consider the direction of steepest ascent of a function, L(θ), where L : ℝⁿ → ℝ. If we assume that θ resides in Euclidean space, then the gradient, ∇L(θ), gives the direction of steepest ascent. However, if we assume that θ resides in a Riemannian space with metric tensor G(θ), then the direction of steepest ascent is given by G(θ)⁻¹ ∇L(θ) (Amari 1998).

Natural Residual Gradient

In this section we describe how natural gradient descent can be applied to the residual gradient algorithm. The natural RG update is

θ_{t+1} = θ_t + α_t G(θ_t)⁻¹ δ_t g_t,   (3)

where G(θ_t) is the metric tensor for the parameter space and g_t = ∇Q_θ(s_t, a_t) − γ ∇Q_θ(s_{t+1}, a_{t+1}). In most reinforcement learning applications of natural gradients, the metric tensor is used to correct for the parameterization of a probability distribution. In these cases the Fisher information matrix is a natural choice for the metric tensor (Amari and Douglas 1998). However, we are using natural gradients to correct for the parameterization of a value function, which is not a distribution. For a related application, Amari (1998) suggests a transformation of a parameterized function to a parameterized probability distribution. Using this transformation, the Fisher information matrix is

G(θ_t) = E[δ_t² g_t g_t⊤].   (4)

In the appendix we prove that the class of metric tensors to which Equation 4 belongs all result in covariant gradient algorithms.

Algorithms

Quadratic Computational Complexity

A straightforward implementation of the natural residual gradient algorithm would maintain an estimate of G(θ) and compute G(θ)⁻¹ at each time step. Due to the matrix inversion, this naïve algorithm has per-time-step computational complexity O(|θ|³), where we ignore the complexity of differentiating Q_θ. This can be improved to O(|θ|²) using the Sherman-Morrison formula to maintain an estimate of G(θ_t)⁻¹ directly. The resulting quadratic time natural algorithm is given by Algorithm 1, where {α_t} is a step size schedule satisfying Σ_t α_t = ∞ and Σ_t α_t² < ∞.

Algorithm 1: Natural Residual Gradient
  Initialize G₀⁻¹ = I, θ₀ = 0
  For each step t:
    δ_t = r_t + γ Q_θ(s_{t+1}, a_{t+1}) − Q_θ(s_t, a_t)
    g_t = ∇Q_θ(s_t, a_t) − γ ∇Q_θ(s_{t+1}, a_{t+1})
    G_t⁻¹ = G_{t−1}⁻¹ − (δ_t² G_{t−1}⁻¹ g_t g_t⊤ G_{t−1}⁻¹) / (1 + δ_t² g_t⊤ G_{t−1}⁻¹ g_t)
    θ_{t+1} = θ_t + α_t δ_t G_t⁻¹ g_t
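As a concrete illustration, the following is a minimal sketch of one step of Algorithm 1, with the inverse metric tensor maintained via the Sherman-Morrison formula; the q and grad_q callables are the same illustrative placeholders as before, not part of the paper's specification.

```python
import numpy as np

def natural_rg_step(theta, G_inv, q, grad_q, s, a, r, s_next, a_next, gamma, alpha):
    """One step of Algorithm 1 (quadratic-time Natural Residual Gradient).

    G_inv is the running estimate of G(theta)^{-1}, initialized to the identity
    and updated with the Sherman-Morrison formula for the rank-one term
    (delta_t g_t)(delta_t g_t)^T, so no explicit matrix inversion is needed.
    """
    delta = r + gamma * q(theta, s_next, a_next) - q(theta, s, a)
    g = grad_q(theta, s, a) - gamma * grad_q(theta, s_next, a_next)
    Gg = G_inv @ g
    G_inv = G_inv - (delta ** 2) * np.outer(Gg, Gg) / (1.0 + (delta ** 2) * (g @ Gg))
    theta = theta + alpha * delta * (G_inv @ g)
    return theta, G_inv
```

Starting from G_inv = I, the first few updates closely follow the ordinary (non-natural) residual gradient direction, which is why the quadratic-time algorithm still makes meaningful progress before the metric tensor estimate has formed.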
Linear Computational Complexity

To achieve linear computational complexity, we leverage the idea of compatible function approximation.² We begin by estimating the TD-error, δ_t, with a linear function approximator w⊤(δ_t g_t), where w are the tunable parameters of the linear function approximator and δ_t g_t are the compatible features. Specifically, we search for a w that is a local minimum of the loss function L:

L(w) = E[(1 − δ_t w⊤ g_t)²].   (5)

At a local minimum of L, ∂L(w)/∂w = 0, so

E[(1 − δ_t w⊤ g_t) δ_t g_t] = 0,   (6)

E[δ_t g_t] = E[δ_t² g_t g_t⊤] w.   (7)

Notice that the left hand side of Eq. 7 is the expected update to θ in the non-natural algorithms. We can therefore write the expected update to θ as

θ_{t+1} = θ_t + α_t E[δ_t g_t] = θ_t + α_t E[δ_t² g_t g_t⊤] w.   (8)

Therefore the expected natural residual gradient update is

θ_{t+1} = θ_t + α_t G(θ)⁻¹ E[δ_t g_t],   (9)
        = θ_t + α_t w.   (10)

The challenge remains that a locally optimal w must be attained. For this we propose a two-timescale approach identical to that of Bhatnagar et al. (2009). That is, we perform stochastic gradient descent on L(w) using a step size schedule {β_t} that decays faster than the step size schedule {α_t} for updates to θ. The resulting linear-complexity two-timescale natural algorithm is given by Algorithm 2.

² The compatible features that we present are compatible with Q_θ, whereas the compatible features originally defined by Sutton et al. (2000) are compatible with a parameterized policy. Although related, these two types of compatible features are not the same.

Algorithm 2: Natural Linear-Time Residual Gradient
  Initialize w₀ = 0, θ₀ = 0
  For each step t:
    δ_t = r_t + γ Q_θ(s_{t+1}, a_{t+1}) − Q_θ(s_t, a_t)
    g_t = ∇Q_θ(s_t, a_t) − γ ∇Q_θ(s_{t+1}, a_{t+1})
    w_{t+1} = w_t + β_t (1 − δ_t w_t⊤ g_t) δ_t g_t
    θ_{t+1} = θ_t + α_t w_{t+1}
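The following sketch mirrors Algorithm 2: the compatible-feature weights w are updated by stochastic gradient descent on L(w) in Eq. (5) on the fast timescale β_t, and θ simply follows w on the slow timescale α_t, as in Eq. (10). Again, q and grad_q are illustrative placeholders.

```python
import numpy as np

def natural_rg_linear_step(theta, w, q, grad_q, s, a, r, s_next, a_next,
                           gamma, alpha, beta):
    """One step of Algorithm 2 (linear-time Natural Residual Gradient)."""
    delta = r + gamma * q(theta, s_next, a_next) - q(theta, s, a)
    g = grad_q(theta, s, a) - gamma * grad_q(theta, s_next, a_next)
    # Fast timescale: stochastic gradient step on L(w) = E[(1 - delta w^T g)^2].
    w = w + beta * (1.0 - delta * (w @ g)) * delta * g
    # Slow timescale: theta follows the compatible-feature weights (Eq. 10).
    theta = theta + alpha * w
    return theta, w
```

Because θ only moves along w, the algorithm produces little useful learning until w has become a reasonable estimate of the natural gradient direction, which is the initial slowness discussed in the experiments.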

The convergence properties of these two-timescale algorithms have been well studied, and they have been shown to converge under appropriate assumptions (Bhatnagar et al. 2009; Kushner and Yin 2003). To summarize, with certain smoothness assumptions, if

Σ_{t=0}^∞ α_t = Σ_{t=0}^∞ β_t = ∞;   Σ_{t=0}^∞ α_t² < ∞;   Σ_{t=0}^∞ β_t² < ∞;   β_t = o(α_t),

then, since β_t → 0 faster than α_t, θ converges as though it were following the true expected natural gradient. As a result, the linear complexity algorithms maintain the convergence guarantees of their non-natural counterparts. Unfortunately, unlike compatible function approximation for natural policy gradient algorithms (Bhatnagar et al. 2009), it is not clear how a useful baseline could be added to the stochastic gradient descent updates of w. The baseline, b, would have to satisfy E[b δ_t g_t] = 0, which is not even satisfied by a constant non-zero b.

Extensions

The metric tensor that we derived for RG can be applied to other similar algorithms. For example, Sarsa(λ) is not a gradient method; however, in many ways it is similar to residual gradient. We therefore propose the use of G(θ), derived for RG, with Sarsa(λ). Although not as principled as its use with RG, in both cases it corrects for the curvature of the squared Bellman error and the parameterization of Q. This straightforward extension gives us the algorithm for Natural Sarsa(λ) (Algorithm 3), and a linear time Natural Sarsa(λ) algorithm can be defined similarly to Algorithm 2.

Algorithm 3: Natural Sarsa(λ)
  Initialize G₀⁻¹ = I, e₀ = 0, θ₀ = 0
  For each step t:
    δ_t = r_t + γ Q_θ(s_{t+1}, a_{t+1}) − Q_θ(s_t, a_t)
    g_t = ∇Q_θ(s_t, a_t)
    e_t = γλ e_{t−1} + g_t
    G_t⁻¹ = G_{t−1}⁻¹ − (δ_t² G_{t−1}⁻¹ g_t g_t⊤ G_{t−1}⁻¹) / (1 + δ_t² g_t⊤ G_{t−1}⁻¹ g_t)
    θ_{t+1} = θ_t + α_t δ_t G_t⁻¹ e_t

Another temporal difference learning algorithm which is closely related to residual gradient is the TDC algorithm (Sutton et al. 2009). TDC is a linear time gradient descent algorithm for TD-learning with linear function approximation, and supports off-policy learning. The TDC algorithm is given by

θ_{t+1} = θ_t + α_t δ_t φ_t − α_t γ φ_{t+1} (φ_t⊤ w_t),   (11)

w_{t+1} = w_t + β_t (δ_t − φ_t⊤ w_t) φ_t,   (12)

where φ_t = ∇Q_θ(s_t, a_t) are the basis functions of the linear function approximation. TDC minimizes the mean squared projected Bellman error (MSPBE) using a projection operator that minimizes the value function approximation error. With a different projection operator the same derivation results in the standard residual gradient algorithm. Applying the TD metric tensor we get Natural TDC (Algorithm 4).

Algorithm 4: Natural TDC
  Initialize G₀⁻¹ = I, θ₀ = 0, w₀ = 0
  For each step t:
    δ_t = r_t + γ Q_θ(s_{t+1}, a_{t+1}) − Q_θ(s_t, a_t)
    g_t = φ_t − γ φ_{t+1}
    G_t⁻¹ = G_{t−1}⁻¹ − (δ_t² G_{t−1}⁻¹ g_t g_t⊤ G_{t−1}⁻¹) / (1 + δ_t² g_t⊤ G_{t−1}⁻¹ g_t)
    θ_{t+1} = θ_t + α_t G_t⁻¹ (δ_t φ_t − γ φ_{t+1} (φ_t⊤ w_t))
    w_{t+1} = w_t + β_t (δ_t − φ_t⊤ w_t) φ_t
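To make the extensions concrete, here is a sketch of one step of Natural Sarsa(λ) following Algorithm 3; Natural TDC (Algorithm 4) reuses the same inverse-metric recursion with g_t = φ_t − γφ_{t+1} and the TDC updates for θ and w. The q and grad_q callables remain illustrative placeholders.

```python
import numpy as np

def natural_sarsa_step(theta, e, G_inv, q, grad_q, s, a, r, s_next, a_next,
                       gamma, lam, alpha):
    """One step of Algorithm 3 (Natural Sarsa(lambda)).

    e is the eligibility trace; G_inv is the running inverse metric tensor,
    updated with the same Sherman-Morrison recursion as Algorithm 1, and the
    final update is applied along the trace rather than along g alone.
    """
    delta = r + gamma * q(theta, s_next, a_next) - q(theta, s, a)
    g = grad_q(theta, s, a)
    e = gamma * lam * e + g
    Gg = G_inv @ g
    G_inv = G_inv - (delta ** 2) * np.outer(Gg, Gg) / (1.0 + (delta ** 2) * (g @ Gg))
    theta = theta + alpha * delta * (G_inv @ e)
    return theta, e, G_inv
```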
Experimental Results

Our goal is to show that natural TD methods improve upon their non-natural counterparts, not to promote one TD method over another. So, we focus our experiments on comparing the quadratic and linear time natural variants of temporal difference learning algorithms with the original TD algorithms they build upon. To evaluate the performance of natural residual gradient and natural Sarsa(λ), we performed experiments on two canonical domains, mountain car and cart-pole balancing, as well as one new challenging domain that we call visual Tic-tac-toe. We used an ε-greedy policy for all TD-learning algorithms. TDC is not a control algorithm, and thus to evaluate the performance of natural TDC we generate experience from a fixed policy in the acrobot domain and measure the mean squared error (MSE) of the learned value function compared with Monte Carlo rollouts of the fixed policy.

For mountain car, cart-pole balancing, and acrobot we used linear function approximation with a third-order Fourier basis (Konidaris et al. 2012). On visual Tic-tac-toe we used a fully-connected feed-forward artificial neural network with one hidden layer of 20 nodes. This allows us to show the benefits of natural gradients when the value function parameterization is non-linear and more complex. We optimized the algorithm parameters for all experiments using a randomized search as suggested by Bergstra and Bengio (2012). We selected the hyper-parameters that resulted in the largest mean discounted return over 20 episodes for mountain car, 50 episodes for cart-pole balancing, and 100,000 episodes for visual Tic-tac-toe. Each parameter set was tested 10 times and the performance averaged. For mountain car and cart pole each algorithm's performance is an average over 50 and 30 trials respectively, with standard deviations shown in the shaded regions. For visual Tic-tac-toe and acrobot, algorithm performance is averaged over 10 trials, again with standard deviations shown by the shaded regions. For the Sarsa(λ) experiments we include results for Natural Actor-Critic (Peters and Schaal 2008), to provide a comparison with another approach to applying natural gradients to reinforcement learning. However, for these experiments we do not include the standard deviations because they make the figures much harder to read. We used a soft-max policy with Natural Actor-Critic (NAC).
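For reference, the sketch below shows one way to build third-order Fourier basis features of the kind used here, with the state rescaled to [0, 1]^d; stacking one copy of the features per discrete action is our assumption about how state-action features were formed, not a detail given in the paper.

```python
import itertools
import numpy as np

def fourier_features(state, order=3):
    """Fourier basis features cos(pi * c . s) for every integer coefficient
    vector c with entries in {0, ..., order}; state must lie in [0, 1]^d."""
    coeffs = np.array(list(itertools.product(range(order + 1), repeat=len(state))))
    return np.cos(np.pi * coeffs @ np.asarray(state, dtype=float))

def state_action_features(state, action, n_actions, order=3):
    """Stack one copy of the state features per discrete action (our assumption)."""
    base = fourier_features(state, order)
    phi = np.zeros(n_actions * base.size)
    phi[action * base.size:(action + 1) * base.size] = base
    return phi
```

For the two-dimensional mountain car state this gives (3 + 1)² = 16 state features per action.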

Figure 2: Mountain Car (Residual Gradient). Figure 3: Mountain Car (Sarsa(λ)). Figure 4: Car Pole (Residual Gradient); same legend as Figure 2.

Mountain Car

Mountain car is a simple simulation of an underpowered car stuck in a valley; full details of the domain can be found in the work of Sutton and Barto (1998). Figures 2 and 3 give the results for each algorithm on mountain car. The linear time natural residual gradient and Sarsa(λ) algorithms take longer to learn good policies than the quadratic time natural algorithms. One reason for the slower initial learning of the linear algorithms is that they must first build up an estimate of the w vector before updates to the value function parameters become meaningful. Out of all the algorithms we found that the quadratic time Natural Sarsa(λ) algorithm performed the best in mountain car, reaching the best policy after just two episodes.

Cart Pole Balancing

Cart pole balancing simulates a cart on a short one dimensional track with a pole attached by a rotational hinge, and is also referred to as the inverted pendulum problem. There are many varieties of the cart pole balancing domain, and we refer the reader to Barto, Sutton, and Anderson (1983) for complete details. Figures 4 and 5 give the results for each algorithm on cart pole balancing. In the cart pole balancing domain the two quadratic algorithms, Natural Sarsa(λ) and Natural RG, perform the best. Again, the linear algorithm takes a slower start as it builds up an estimate of w, but converges well above the non-natural algorithms and very close to the quadratic ones. Natural Sarsa(λ) reaches a near optimal policy within the first couple of episodes, and compares favorably with the heavily optimized Sarsa(λ), which does not even reach the same level of performance after 100 episodes.

Visual Tic-Tac-Toe

Visual Tic-Tac-Toe is a novel challenging decision problem in which the agent plays Tic-tac-toe (noughts and crosses) against an opponent that makes random legal moves. The game board is a 3×3 grid of handwritten letters (X, O, and B for blank) from the UCI Letter Recognition Data Set (Slate 1991), examples of which are shown in Figure 8. At every step of the episode, each letter of the game board is drawn randomly with replacement from the set of available handwritten letters (787 X's, 753 O's, and 766 B's). Thus, it is easily possible for the agent to never see the same handwritten X, O, or B letter in a given episode. The agent's state features are the 16 integer-valued attributes for each of the letters on the board. Details of the data set and the attributes can be found in the UCI repository.

Figure 5: Car Pole (Sarsa(λ)). Figure 6: Visual Tic-Tac-Toe experiments. Figure 7: Acrobot experiments (TDC). Figure 8: Visual Tic-Tac-Toe example letters.

There are nine possible actions available to the agent, but attempting to play on a non-blank square is considered an illegal move and results in the agent losing its turn. This is particularly challenging because blank squares are marked by a B, making recognizing legal moves challenging in and of itself. The opponent only plays legal moves, but chooses randomly among them. The reward is 100 for winning, −100 for losing, and 0 otherwise.

Figure 6 gives the results comparing Natural-LT Sarsa and Sarsa(λ) on the visual Tic-tac-toe domain using the artificial neural network described previously. These results show linear natural Sarsa(λ) in a setting where it is able to account for the shape of a more complex value function parameterization, and thus confer greater improvement in convergence speed over non-natural algorithms. We do not compare quadratic time algorithms due to computational limits.

Acrobot

Acrobot is another commonly studied reinforcement learning task in which the agent controls a two-link underactuated robot by applying torque to the lower joint, with the goal of raising the top of the lower link above a certain point. See Sutton and Barto (1998) for a full specification of the domain and its equations of motion. To evaluate the off-policy Natural TDC algorithm we first generated a fixed policy by online training of a hand-tuned Sarsa(λ) agent for 200 episodes. We then trained TDC and Natural TDC in acrobot following the previously learned fixed policy. We evaluated an algorithm's learned value function every 100 episodes by sampling states and actions randomly and computing the true expected undiscounted return using Monte Carlo rollouts following the fixed policy. Figure 7 shows the MSE between the learned values and the true expected return. Natural TDC clearly outperforms TDC, and in this experiment converged to a much lower MSE. Additionally, we found TDC to be sensitive to the step-sizes used, and saw that Natural TDC was much less sensitive to these parameters. These results show that the benefits of natural temporal difference learning, already observed in the context of control learning, extend to TD-learning for value function estimation as well.
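The evaluation described above can be sketched as follows; the env, policy, and q_learned callables are hypothetical stand-ins for the acrobot simulator, the fixed policy, and the value function under evaluation, and are not part of the paper.

```python
import numpy as np

def value_estimation_mse(q_learned, policy, env, n_points=100, horizon=1000):
    """Estimate the MSE between learned action-values and undiscounted Monte
    Carlo returns of a fixed policy, roughly as in the acrobot evaluation."""
    errors = []
    for _ in range(n_points):
        s0, a0 = env.sample_state_action()     # random state-action pair
        state, action, ret = s0, a0, 0.0
        for _ in range(horizon):               # one Monte Carlo rollout
            state, reward, done = env.step(state, action)
            ret += reward                      # undiscounted return
            if done:
                break
            action = policy(state)
        errors.append((q_learned(s0, a0) - ret) ** 2)
    return float(np.mean(errors))
```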

Discussion and Conclusion

We have presented the natural residual gradient algorithm and proved that it is covariant. We suggested that the temporal difference learning metric tensor, derived for natural residual gradient, can be used to create other natural temporal difference learning algorithms like natural Sarsa(λ) and natural TDC. The resulting algorithms begin with the identity matrix as their estimate of the (inverse) metric tensor. This means that before an estimate of the (inverse) metric tensor has been formed, they still provide meaningful updates: they follow estimates of the non-natural gradient. We showed how the concept of compatible function approximation can be leveraged to create linear-time natural residual gradient and natural Sarsa(λ) algorithms. However, unlike the quadratic-time variants, these linear-time variants do not provide meaningful updates until the natural gradient has been estimated. As a result, learning is initially slower using the linear-time algorithms. In our empirical studies, the natural variants of all three algorithms outperformed their non-natural counterparts on all three domains. Additionally, the quadratic-time variants learn faster initially, as expected. Lastly, we showed empirically that the benefits of natural gradients are amplified when using non-linear function approximation.

Appendix A: Proof of Covariance Theorem

The following theorem and its proof closely follow and extend the foundations laid by Bagnell and Schneider (2003) and later clarified by Peters and Schaal (2008) when proving that the natural policy gradient is covariant. No algorithm can be covariant for all parameterizations. Thus, constraints on the parameterized functions that we consider are required.

Property 1. Functions g : Φ × X → ℝ and h : Θ × X → ℝ are two instantaneous loss functions parameterized by φ ∈ Φ and θ ∈ Θ respectively. These correspond to the loss functions ĝ(φ) = E_{x∈X}[g(φ, x)] and ĥ(θ) = E_{x∈X}[h(θ, x)]. For brevity, hereafter we suppress the x inputs to g and h. There exists a differentiable function, Ψ : Φ → Θ, such that for some φ ∈ Φ we have g(φ) = h(Ψ(φ)), and the Jacobian of Ψ is full rank.

Definition 1. Algorithm A is covariant if, for all g, h, Ψ, and φ satisfying Property 1,

g(φ + Δφ) = h(Ψ(φ) + Δθ),   (13)

where φ + Δφ and Ψ(φ) + Δθ are the parameters after an update of algorithm A.

Lemma 1. An algorithm A is covariant for sufficiently small step-sizes if

Δθ = J_{Ψ(φ)} Δφ.   (14)

Proof. Let J_{Ψ(φ)} be the Jacobian of Ψ(φ), i.e., J_{Ψ(φ)} = ∂Ψ(φ)/∂φ. As such, it maps tangent vectors of h to tangent vectors of g, such that

∇g(φ) = J_{Ψ(φ)}⊤ ∇h(Ψ(φ)),   (15)

when g(φ) = h(Ψ(φ)), as J_{Ψ(φ)} is a tangent map (Lee 2003, p. 63). Taking the first order Taylor expansion of both sides of (13), we obtain

h(Ψ(φ)) + ∇h(Ψ(φ))⊤ Δθ + O(‖Δθ‖²) = g(φ) + ∇g(φ)⊤ Δφ + O(‖Δφ‖²).

For small step-sizes, α > 0, the squared norms become negligible, and because g(φ) = h(Ψ(φ)), this simplifies to

∇h(Ψ(φ))⊤ Δθ = ∇g(φ)⊤ Δφ = (J_{Ψ(φ)}⊤ ∇h(Ψ(φ)))⊤ Δφ = ∇h(Ψ(φ))⊤ J_{Ψ(φ)} Δφ.   (16)

Notice that (16) is satisfied by Δθ = J_{Ψ(φ)} Δφ, and thus if this equality holds then A is covariant.

Theorem 1. The natural gradient update Δθ = −α G_θ⁻¹ ∇h(θ) is covariant when the metric tensor G_θ is given by

G_θ = E_{x∈X}[∇h(θ) ∇h(θ)⊤].   (17)

Proof. First, notice that the metric tensor G_φ is equivalent to G_θ with J_{Ψ(φ)} appearing twice as a factor:

G_φ = E_{x∈X}[∇g(φ) ∇g(φ)⊤]
    = E_{x∈X}[(J_{Ψ(φ)}⊤ ∇h(Ψ(φ))) (J_{Ψ(φ)}⊤ ∇h(Ψ(φ)))⊤]
    = E_{x∈X}[J_{Ψ(φ)}⊤ ∇h(Ψ(φ)) ∇h(Ψ(φ))⊤ J_{Ψ(φ)}]
    = J_{Ψ(φ)}⊤ E_{x∈X}[∇h(Ψ(φ)) ∇h(Ψ(φ))⊤] J_{Ψ(φ)}
    = J_{Ψ(φ)}⊤ G_θ J_{Ψ(φ)}.   (18)

We show that the right hand side of (14) is equal to the left, which, by Lemma 1, implies that the natural gradient update is covariant:

J_{Ψ(φ)} Δφ = −J_{Ψ(φ)} α G_φ⁻¹ ∇g(φ)
            = −J_{Ψ(φ)} α G_φ⁺ ∇g(φ)   (19)
            = −α J_{Ψ(φ)} (J_{Ψ(φ)}⊤ G_θ J_{Ψ(φ)})⁺ J_{Ψ(φ)}⊤ ∇h(Ψ(φ))
            = −α J_{Ψ(φ)} J_{Ψ(φ)}⁺ G_θ⁺ (J_{Ψ(φ)}⊤)⁺ J_{Ψ(φ)}⊤ ∇h(Ψ(φ)).

Since J_{Ψ(φ)} is full rank, J_{Ψ(φ)}⁺ is a left inverse, and thus

J_{Ψ(φ)} Δφ = −α G_θ⁻¹ ∇h(Ψ(φ)) = Δθ.
Notice that, unlike the proof that the natural actor-critic using LSTD is covariant (Peters and Schaal 2008), our proof does not assume that J_{Ψ(φ)} is invertible. Our proof is therefore more general, since it allows the dimensions of φ and θ to differ.
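As an informal sanity check of Theorem 1 (not part of the paper), the snippet below verifies numerically that, for a toy quadratic loss and a linear reparameterization θ = Ψ(φ) = Aφ, the natural-gradient updates computed in the two parameter spaces agree once mapped through the Jacobian, i.e., J_{Ψ(φ)} Δφ = Δθ. The small ridge term added to the rank-one metric tensor is purely for numerical invertibility in this toy example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

H = np.diag(rng.uniform(0.5, 2.0, size=n))   # toy loss h(theta) = 0.5 theta^T H theta
A = rng.normal(size=(n, n))                  # Jacobian of Psi(phi) = A phi (full rank a.s.)
theta = rng.normal(size=n)
phi = np.linalg.solve(A, theta)              # so that Psi(phi) = theta

grad_theta = H @ theta                       # gradient of h at theta
grad_phi = A.T @ grad_theta                  # chain rule: gradient of g = h(Psi(.)) at phi

G_theta = np.outer(grad_theta, grad_theta) + 1e-6 * np.eye(n)  # Eq. (17), ridge for invertibility
G_phi = A.T @ G_theta @ A                                       # Eq. (18)

alpha = 0.1
delta_theta = -alpha * np.linalg.solve(G_theta, grad_theta)     # natural update in theta-space
delta_phi = -alpha * np.linalg.solve(G_phi, grad_phi)           # natural update in phi-space

print(np.allclose(A @ delta_phi, delta_theta))                  # True: the updates agree
```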

References

Amari, S., and Douglas, S. 1998. Why natural gradient? In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), volume 2.

Amari, S. 1998. Natural gradient works efficiently in learning. Neural Computation 10.

Bagnell, J. A., and Schneider, J. 2003. Covariant policy search. In Proceedings of the International Joint Conference on Artificial Intelligence.

Baird, L. 1995. Residual algorithms: reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning.

Barto, A. G.; Sutton, R. S.; and Anderson, C. W. 1983. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13(5).

Bergstra, J., and Bengio, Y. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research.

Bhatnagar, S.; Sutton, R. S.; Ghavamzadeh, M.; and Lee, M. 2009. Natural actor-critic algorithms. Automatica 45(11).

Degris, T.; Pilarski, P. M.; and Sutton, R. S. 2012. Model-free reinforcement learning with continuous action in practice. In Proceedings of the 2012 American Control Conference.

Kakade, S. 2002. A natural policy gradient. In Advances in Neural Information Processing Systems, volume 14.

Konidaris, G. D.; Kuindersma, S. R.; Grupen, R. A.; and Barto, A. G. 2012. Robot learning from demonstration by constructing skill trees. The International Journal of Robotics Research 31.

Kushner, H. J., and Yin, G. 2003. Stochastic Approximation and Recursive Algorithms and Applications. Springer.

Lee, J. M. 2003. Introduction to Smooth Manifolds. Springer.

Morimura, T.; Uchibe, E.; and Doya, K. 2005. Utilizing the natural gradient in temporal difference reinforcement learning with eligibility traces. In International Symposium on Information Geometry and its Application.

Peters, J., and Schaal, S. 2008. Natural actor-critic. Neurocomputing 71.

Slate, D. 1991. Letter Recognition Data Set. UCI Machine Learning Repository.

Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Sutton, R. S.; McAllester, D.; Singh, S.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12.

Sutton, R. S.; Maei, H. R.; Precup, D.; Bhatnagar, S.; Silver, D.; Szepesvári, C.; and Wiewiora, E. 2009. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM.
