Policy Gradient Methods for Reinforcement Learning with Function Approximation


Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
AT&T Labs Research, 180 Park Avenue, Florham Park, NJ

Abstract

Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

Large applications of reinforcement learning (RL) require the use of generalizing function approximators such as neural networks, decision trees, or instance-based methods. The dominant approach for the last decade has been the value-function approach, in which all function approximation effort goes into estimating a value function, with the action-selection policy represented implicitly as the "greedy" policy with respect to the estimated values (e.g., as the policy that selects in each state the action with highest estimated value). The value-function approach has worked well in many applications, but has several limitations. First, it is oriented toward finding deterministic policies, whereas the optimal policy is often stochastic, selecting different actions with specific probabilities (e.g., see Singh, Jaakkola, and Jordan, 1994). Second, an arbitrarily small change in the estimated value of an action can cause it to be, or not be, selected. Such discontinuous changes have been identified as a key obstacle to establishing convergence assurances for algorithms following the value-function approach (Bertsekas and Tsitsiklis, 1996). For example, Q-learning, Sarsa, and dynamic programming methods have all been shown unable to converge to any policy for simple MDPs and simple function approximators (Gordon, 1995, 1996; Baird, 1995; Tsitsiklis and Van Roy, 1996; Bertsekas and Tsitsiklis, 1996). This can occur even if the best approximation is found at each step before changing the policy, and whether the notion of "best" is in the mean-squared-error sense or the slightly different senses of residual-gradient, temporal-difference, and dynamic-programming methods.

In this paper we explore an alternative approach to function approximation in RL. Rather than approximating a value function and using that to compute a deterministic policy, we approximate a stochastic policy directly using an independent function approximator with its own parameters. For example, the policy might be represented by a neural network whose input is a representation of the state, whose output is action selection probabilities, and whose weights are the policy parameters. Let θ denote the vector of policy parameters and ρ the performance of the corresponding policy (e.g., the average reward per step). Then, in the policy gradient approach, the policy parameters are updated approximately proportional to the gradient:

    Δθ ≈ α ∂ρ/∂θ,    (1)

where α is a positive-definite step size. If the above can be achieved, then θ can usually be assured to converge to a locally optimal policy in the performance measure ρ. Unlike the value-function approach, here small changes in θ can cause only small changes in the policy and in the state-visitation distribution.

In this paper we prove that an unbiased estimate of the gradient (1) can be obtained from experience using an approximate value function satisfying certain properties. Williams's (1988, 1992) REINFORCE algorithm also finds an unbiased estimate of the gradient, but without the assistance of a learned value function. REINFORCE learns much more slowly than RL methods using value functions and has received relatively little attention. Learning a value function and using it to reduce the variance of the gradient estimate appears to be essential for rapid learning. Jaakkola, Singh, and Jordan (1995) proved a result very similar to ours for the special case of function approximation corresponding to a tabular POMDP. Our result strengthens theirs and generalizes it to arbitrary differentiable function approximators. Our result also suggests a way of proving the convergence of a wide variety of algorithms based on "actor-critic" or policy-iteration architectures (e.g., Barto, Sutton, and Anderson, 1983; Sutton, 1984; Kimura and Kobayashi, 1998). In this paper we take the first step in this direction by proving for the first time that a version of policy iteration with general differentiable function approximation is convergent to a locally optimal policy. Baird and Moore (1999) obtained a weaker but superficially similar result for their VAPS family of methods. Like policy-gradient methods, VAPS includes separately parameterized policy and value functions updated by gradient methods. However, VAPS methods do not climb the gradient of performance (expected long-term reward), but of a measure combining performance and value-function accuracy. As a result, VAPS does not converge to a locally optimal policy except in the case that no weight is put upon value-function accuracy, in which case VAPS degenerates to REINFORCE. Similarly, Gordon's (1995) fitted value iteration is also convergent and value-based, but does not find a locally optimal policy.

1 Policy Gradient Theorem

We consider the standard reinforcement learning framework (see, e.g., Sutton and Barto, 1998), in which a learning agent interacts with a Markov decision process (MDP). The state, action, and reward at each time t ∈ {0, 1, 2, ...} are denoted s_t ∈ S, a_t ∈ A, and r_t ∈ ℝ respectively. The environment's dynamics are characterized by state transition probabilities P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a} and expected rewards R^a_s = E{r_{t+1} | s_t = s, a_t = a}, for all s, s' ∈ S, a ∈ A. The agent's decision-making procedure at each time is characterized by a policy, π(s, a, θ) = Pr{a_t = a | s_t = s, θ}, for all s ∈ S, a ∈ A, where θ ∈ ℝ^l, for l ≪ |S|, is a parameter vector. We assume that π is differentiable with respect to its parameter, i.e., that ∂π(s,a)/∂θ exists. We also usually write just π(s, a) for π(s, a, θ).
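As a concrete illustration of a differentiable policy parameterization and of the update rule (1), the following minimal Python sketch (not from the paper; the feature matrix, step size, and function names are illustrative assumptions) implements a Gibbs/softmax policy π(s, a, θ) over state-action features and a single gradient-ascent step on θ.

```python
import numpy as np

def softmax_policy(theta, phi_s):
    """Gibbs/softmax policy pi(s, a, theta) over state-action features.

    phi_s: array of shape (num_actions, num_features), one feature vector per action.
    Returns a vector of action probabilities for the given state.
    """
    prefs = phi_s @ theta
    prefs -= prefs.max()          # subtract the max preference for numerical stability
    expp = np.exp(prefs)
    return expp / expp.sum()

def policy_gradient_step(theta, grad_rho_estimate, alpha=0.01):
    """One application of update (1): theta <- theta + alpha * estimate of d(rho)/d(theta)."""
    return theta + alpha * grad_rho_estimate

# Illustrative use: 3 actions, 4 features, arbitrary numbers.
rng = np.random.default_rng(0)
theta = np.zeros(4)
phi_s = rng.normal(size=(3, 4))
print(softmax_policy(theta, phi_s))   # uniform probabilities when theta = 0
```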

3 With function pproximtion, two wy of formulting the gent objective re ueful. One i the verge rewrd formultion, in which policie re rnked ccording to their long-term expected rewrd per tep, ρ(π): 1 ρ(π) = lim n n E {r 1 + r r n π} = d π () π(, )R, where d π () = lim t Pr{ t = 0,π} i the ttionry ditribution of tte under π, which we ume exit nd i independent of 0 for ll policie. In the verge rewrd formultion, the vlue of tte ction pir given policy i defined Q π (, ) = E {r t ρ(π) 0 =, 0 =, π}, S, A. t=1 The econd formultion we cover i tht in which there i deignted trt tte 0, nd we cre only bout the long-term rewrd obtined from it. We will give our reult only once, but they will pply to thi formultion well under the definition { } { } ρ(π) =E γ t 1 r t 0,π nd Q π (, ) =E γ k 1 r t+k t =, t =, π. t=1 where γ 0, 1] i dicount rte (γ = 1 i llowed only in epiodic tk). In thi formultion, we define d π () dicounted weighting of tte encountered trting t 0 nd then following π: d π () = t=0 γt Pr{ t = 0,π}. Our firt reult concern the grdient of the performnce metric with repect to the policy prmeter: Theorem 1 (Policy Grdient). For ny MDP, in either the verge-rewrd or trt-tte formultion, Proof: See the ppendix. = d π () k=1 Q π (, ). (2) Mrbch nd Titikli (1998) decribe relted but different expreion for the grdient in term of the tte-vlue function, citing Jkkol, Singh, nd Jordn (1995) nd Co nd Chen (1997). In both tht expreion nd our, the key point i tht their re no term of the form dπ () : the effect of policy chnge on the ditribution of tte doe not pper. Thi i convenient for pproximting the grdient by mpling. For exmple, if w mpled from the ditribution obtined by following π, then π(,) Q π (, ) would be n unbied etimte of. Of coure, Q π (, ) i lo not normlly known nd mut be etimted. One pproch i to ue the ctul return, R t = k=1 r t+k ρ(π) (or R t = k=1 γk 1 r t+k in the trt-tte formultion) n pproximtion for ech Q π ( t, t ). Thi led to Willim epiodic REINFORCE lgorithm, θ t π(t,t) 1 R t π( (the 1 t, t) π( t, t) correct for the overmpling of ction preferred by π), which i known to follow in expected vlue (Willim, 1988, 1992). 2 Policy Grdient with Approximtion Now conider the ce in which Q π i pproximted by lerned function pproximtor. If the pproximtion i ufficiently good, we might hope to ue it in plce of Q π in (2) nd till point roughly in the direction of the grdient. For exmple, Jkkol,
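The episodic REINFORCE estimator just described can be sketched as follows. This is an illustrative Python sketch for the start-state formulation, not the paper's own code; the environment interface (reset()/step(a) returning (next_state, reward, done)) and the feature map phi are assumptions, and a softmax policy is assumed so that (∂π/∂θ)/π equals φ_{sa} − Σ_b π(s,b) φ_{sb}.

```python
import numpy as np

def softmax_probs(theta, phi_s):
    """Action probabilities for a softmax policy over features phi_s (num_actions x num_features)."""
    prefs = phi_s @ theta
    prefs -= prefs.max()
    expp = np.exp(prefs)
    return expp / expp.sum()

def reinforce_episode(env, theta, phi, gamma=0.99, alpha=0.01):
    """One episode of Williams's episodic REINFORCE (start-state formulation).

    Follows pi for a whole episode, then for each visited (s_t, a_t) accumulates
    R_t * (dpi/dtheta)/pi, which for a softmax policy is
    R_t * (phi(s_t)[a_t] - sum_b pi(s_t,b) phi(s_t)[b]).
    `env` and `phi(s)` (state-action feature matrix) are assumptions.
    """
    states, actions, rewards = [], [], []
    s, done = env.reset(), False
    while not done:
        probs = softmax_probs(theta, phi(s))
        a = np.random.choice(len(probs), p=probs)
        next_s, r, done = env.step(a)
        states.append(s); actions.append(a); rewards.append(r)
        s = next_s

    grad = np.zeros_like(theta)
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G                        # return R_t from time t onward
        feats = phi(states[t])
        probs = softmax_probs(theta, feats)
        grad_log_pi = feats[actions[t]] - probs @ feats   # (dpi/dtheta)/pi for the softmax policy
        grad += grad_log_pi * G
    return theta + alpha * grad                           # update (1) with this episode's estimate
```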

2 Policy Gradient with Approximation

Now consider the case in which Q^π is approximated by a learned function approximator. If the approximation is sufficiently good, we might hope to use it in place of Q^π in (2) and still point roughly in the direction of the gradient. For example, Jaakkola, Singh, and Jordan (1995) proved that for the special case of function approximation arising in a tabular POMDP one could assure positive inner product with the gradient, which is sufficient to ensure improvement for moving in that direction. Here we extend their result to general function approximation and prove equality with the gradient.

Let f_w : S × A → ℝ be our approximation to Q^π, with parameter w. It is natural to learn f_w by following π and updating w by a rule such as Δw_t ∝ ∂/∂w [Q̂^π(s_t,a_t) − f_w(s_t,a_t)]² ∝ [Q̂^π(s_t,a_t) − f_w(s_t,a_t)] ∂f_w(s_t,a_t)/∂w, where Q̂^π(s_t,a_t) is some unbiased estimator of Q^π(s_t,a_t), perhaps R_t. When such a process has converged to a local optimum, then

    Σ_s d^π(s) Σ_a π(s,a) [Q^π(s,a) − f_w(s,a)] ∂f_w(s,a)/∂w = 0.    (3)

Theorem 2 (Policy Gradient with Function Approximation). If f_w satisfies (3) and is compatible with the policy parameterization in the sense that

    ∂f_w(s,a)/∂w = ∂π(s,a)/∂θ · 1/π(s,a),    (4)

then

    ∂ρ/∂θ = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ f_w(s,a).    (5)

Proof: Combining (3) and (4) gives

    Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ [Q^π(s,a) − f_w(s,a)] = 0,    (6)

which tells us that the error in f_w(s,a) is orthogonal to the gradient of the policy parameterization. Because the expression above is zero, we can subtract it from the policy gradient theorem (2) to yield

    ∂ρ/∂θ = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ Q^π(s,a) − Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ [Q^π(s,a) − f_w(s,a)]
          = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ [Q^π(s,a) − Q^π(s,a) + f_w(s,a)]
          = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ f_w(s,a).    ∎
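A minimal sketch of the critic update described above (illustrative, not from the paper): w is adjusted by stochastic gradient descent on [Q̂^π(s_t,a_t) − f_w(s_t,a_t)]², using a linear f_w(s,a) = wᵀψ(s,a) so that ∂f_w/∂w = ψ(s,a). The feature map psi and the sample stream are assumptions; condition (3) is what this process satisfies at convergence when the samples are generated by following π.

```python
import numpy as np

def critic_update(w, psi_sa, q_hat, beta=0.05):
    """One stochastic-gradient step on [Q_hat - f_w(s,a)]^2 for a linear f_w(s,a) = w . psi(s,a).

    psi_sa: feature vector psi(s,a) (equal to df_w/dw in the linear case).
    q_hat:  an unbiased sample of Q^pi(s,a), e.g. the observed return R_t.
    """
    f_sa = w @ psi_sa
    return w + beta * (q_hat - f_sa) * psi_sa

# Illustrative use: fit w from a stream of (psi(s_t,a_t), R_t) samples gathered while following pi.
rng = np.random.default_rng(1)
w = np.zeros(4)
for _ in range(1000):
    psi_sa = rng.normal(size=4)                                       # stand-in for psi(s_t, a_t)
    q_hat = psi_sa @ np.array([1.0, -0.5, 0.0, 2.0]) + rng.normal()   # noisy stand-in return
    w = critic_update(w, psi_sa, q_hat)
```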

3 Application to Deriving Algorithms and Advantages

Given a policy parameterization, Theorem 2 can be used to derive an appropriate form for the value-function parameterization. For example, consider a policy that is a Gibbs distribution in a linear combination of features:

    π(s,a) = e^{θᵀφ_{sa}} / Σ_b e^{θᵀφ_{sb}},    for all s ∈ S, a ∈ A,

where each φ_{sa} is an l-dimensional feature vector characterizing state-action pair s, a. Meeting the compatibility condition (4) requires that

    ∂f_w(s,a)/∂w = ∂π(s,a)/∂θ · 1/π(s,a) = φ_{sa} − Σ_b π(s,b) φ_{sb},

so that the natural parameterization of f_w is

    f_w(s,a) = wᵀ [ φ_{sa} − Σ_b π(s,b) φ_{sb} ].

In other words, f_w must be linear in the same features as the policy, except normalized to be mean zero for each state. Other algorithms can easily be derived for a variety of nonlinear policy parameterizations, such as multi-layer backpropagation networks.

The careful reader will have noticed that the form given above for f_w requires that it have zero mean for each state: Σ_a π(s,a) f_w(s,a) = 0, for all s ∈ S. In this sense it is better to think of f_w as an approximation of the advantage function, A^π(s,a) = Q^π(s,a) − V^π(s) (much as in Baird, 1993), rather than of Q^π. Our convergence requirement (3) is really that f_w get the relative value of the actions correct in each state, not the absolute value, nor the variation from state to state. Our results can be viewed as a justification for the special status of advantages as the target for value function approximation in RL. In fact, our (2), (3), and (5) can all be generalized to include an arbitrary function of state added to the value function or its approximation. For example, (5) can be generalized to ∂ρ/∂θ = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ [f_w(s,a) + v(s)], where v : S → ℝ is an arbitrary function. (This follows immediately because Σ_a ∂π(s,a)/∂θ = 0, for all s ∈ S.) The choice of v does not affect any of our theorems, but can substantially affect the variance of the gradient estimators. The issues here are entirely analogous to those in the use of reinforcement baselines in earlier work (e.g., Williams, 1992; Dayan, 1991; Sutton, 1984). In practice, v should presumably be set to the best available approximation of V^π. Our results establish that that approximation process can proceed without affecting the expected evolution of f_w and π.

4 Convergence of Policy Iteration with Function Approximation

Given Theorem 2, we can prove for the first time that a form of policy iteration with function approximation is convergent to a locally optimal policy.

Theorem 3 (Policy Iteration with Function Approximation). Let π and f_w be any differentiable function approximators for the policy and value function respectively that satisfy the compatibility condition (4) and for which max_{θ,s,a,i,j} |∂²π(s,a)/∂θ_i∂θ_j| < B < ∞. Let {α_k}_{k=0}^∞ be any step-size sequence such that lim_{k→∞} α_k = 0 and Σ_k α_k = ∞. Then, for any MDP with bounded rewards, the sequence {(θ_k, w_k)}, defined by any θ_0, π_k = π(·,·,θ_k), and

    w_k = w such that Σ_s d^{π_k}(s) Σ_a π_k(s,a) [Q^{π_k}(s,a) − f_w(s,a)] ∂f_w(s,a)/∂w = 0,

    θ_{k+1} = θ_k + α_k Σ_s d^{π_k}(s) Σ_a ∂π_k(s,a)/∂θ f_{w_k}(s,a),

converges such that lim_{k→∞} ∂ρ(π_k)/∂θ = 0.

Proof: Our Theorem 2 assures that the θ_k update is in the direction of the gradient. The bounds on ∂²π(s,a)/∂θ_i∂θ_j and on the MDP's rewards together assure us that ∂²ρ/∂θ_i∂θ_j is also bounded. These, together with the step-size requirements, are the necessary conditions to apply Proposition 3.5 from page 96 of Bertsekas and Tsitsiklis (1996), which assures convergence to a local optimum. ∎
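For the Gibbs/softmax policy above, the compatible features and a sampled actor-critic step in the spirit of Theorem 3 can be sketched as follows. This is an illustrative Python sketch; the feature matrix phi_s, the sampling of (s, a), and the step sizes are assumptions. The actor step uses the fact that the sum in (5) is an expectation, over s ~ d^π and a ~ π, of ((∂π/∂θ)/π) f_w(s,a).

```python
import numpy as np

def softmax_pi(theta, phi_s):
    """pi(s, ., theta) for a Gibbs policy; phi_s has shape (num_actions, num_features)."""
    prefs = phi_s @ theta
    prefs -= prefs.max()
    expp = np.exp(prefs)
    return expp / expp.sum()

def compatible_features(theta, phi_s, a):
    """psi(s,a) = phi_sa - sum_b pi(s,b) phi_sb, i.e. (dpi/dtheta)/pi for the Gibbs policy."""
    probs = softmax_pi(theta, phi_s)
    return phi_s[a] - probs @ phi_s

def actor_critic_step(theta, w, phi_s, a, q_hat, alpha=0.01, beta=0.05):
    """One sampled update with a compatible critic f_w(s,a) = w . psi(s,a).

    Critic: move w toward the sampled return q_hat (a sample-based step toward condition (3)).
    Actor:  move theta along psi(s,a) * f_w(s,a), a single-sample estimate of the sum in (5).
    """
    psi = compatible_features(theta, phi_s, a)
    f_sa = w @ psi
    w_new = w + beta * (q_hat - f_sa) * psi
    theta_new = theta + alpha * psi * f_sa
    return theta_new, w_new
```

Note that Theorem 3 assumes w_k is fitted to convergence (satisfying (3)) for each policy π_k before θ is updated; the per-sample interleaving above is the usual practical approximation to that idealized loop.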

Acknowledgements

The authors wish to thank Martha Steenstrup and Doina Precup for comments, and Michael Kearns for insights into the notion of optimal policy under function approximation.

References

Baird, L. C. (1993). Advantage updating. Wright Laboratory Technical Report WL-TR.
Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann.
Baird, L. C., Moore, A. W. (1999). Gradient descent for general reinforcement learning. NIPS 11. MIT Press.
Barto, A. G., Sutton, R. S., Anderson, C. W. (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13:835.
Bertsekas, D. P., Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
Cao, X.-R., Chen, H.-F. (1997). Perturbation realization, potentials, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control 42(10).
Dayan, P. (1991). Reinforcement comparison. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton (eds.), Connectionist Models: Proceedings of the 1990 Summer School. Morgan Kaufmann.
Gordon, G. J. (1995). Stable function approximation in dynamic programming. Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann.
Gordon, G. J. (1996). Chattering in SARSA(λ). CMU Learning Lab Technical Report.
Jaakkola, T., Singh, S. P., Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. NIPS 7. Morgan Kaufmann.
Kimura, H., Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann.
Marbach, P., Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Technical Report LIDS-P-2411, Massachusetts Institute of Technology.
Singh, S. P., Jaakkola, T., Jordan, M. I. (1994). Learning without state-estimation in partially observable Markovian decision problems. Proceedings of the Eleventh International Conference on Machine Learning. Morgan Kaufmann.
Sutton, R. S. (1984). Temporal Credit Assignment in Reinforcement Learning. Ph.D. thesis, University of Massachusetts, Amherst.
Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Tsitsiklis, J. N., Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning 22.
Williams, R. J. (1988). Toward a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, Northeastern University, College of Computer Science.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8.

Appendix: Proof of Theorem 1

We prove the theorem first for the average-reward formulation and then for the start-state formulation.

    ∂V^π(s)/∂θ ≝ ∂/∂θ Σ_a π(s,a) Q^π(s,a),    for all s ∈ S
               = Σ_a [ ∂π(s,a)/∂θ Q^π(s,a) + π(s,a) ∂Q^π(s,a)/∂θ ]
               = Σ_a [ ∂π(s,a)/∂θ Q^π(s,a) + π(s,a) ∂/∂θ ( R^a_s − ρ(π) + Σ_{s'} P^a_{ss'} V^π(s') ) ]
               = Σ_a [ ∂π(s,a)/∂θ Q^π(s,a) + π(s,a) ( −∂ρ/∂θ + Σ_{s'} P^a_{ss'} ∂V^π(s')/∂θ ) ]

Therefore,

    ∂ρ/∂θ = Σ_a [ ∂π(s,a)/∂θ Q^π(s,a) + π(s,a) Σ_{s'} P^a_{ss'} ∂V^π(s')/∂θ ] − ∂V^π(s)/∂θ.

Summing both sides over the stationary distribution d^π,

    Σ_s d^π(s) ∂ρ/∂θ = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ Q^π(s,a) + Σ_s d^π(s) Σ_a π(s,a) Σ_{s'} P^a_{ss'} ∂V^π(s')/∂θ − Σ_s d^π(s) ∂V^π(s)/∂θ,

but since d^π is stationary,

    ∂ρ/∂θ = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ Q^π(s,a) + Σ_{s'} d^π(s') ∂V^π(s')/∂θ − Σ_s d^π(s) ∂V^π(s)/∂θ
          = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ Q^π(s,a).    ∎

For the start-state formulation:

    ∂V^π(s)/∂θ ≝ ∂/∂θ Σ_a π(s,a) Q^π(s,a),    for all s ∈ S
               = Σ_a [ ∂π(s,a)/∂θ Q^π(s,a) + π(s,a) ∂Q^π(s,a)/∂θ ]
               = Σ_a [ ∂π(s,a)/∂θ Q^π(s,a) + π(s,a) ∂/∂θ ( R^a_s + Σ_{s'} γ P^a_{ss'} V^π(s') ) ]
               = Σ_a [ ∂π(s,a)/∂θ Q^π(s,a) + π(s,a) Σ_{s'} γ P^a_{ss'} ∂V^π(s')/∂θ ]    (7)
               = Σ_x Σ_{k=0}^∞ γ^k Pr(s → x, k, π) Σ_a ∂π(x,a)/∂θ Q^π(x,a),

after several steps of unrolling (7), where Pr(s → x, k, π) is the probability of going from state s to state x in k steps under policy π. It is then immediate that

    ∂ρ/∂θ = ∂/∂θ E{ Σ_{t=1}^∞ γ^{t−1} r_t | s_0, π } = ∂V^π(s_0)/∂θ
          = Σ_s Σ_{k=0}^∞ γ^k Pr(s_0 → s, k, π) Σ_a ∂π(s,a)/∂θ Q^π(s,a)
          = Σ_s d^π(s) Σ_a ∂π(s,a)/∂θ Q^π(s,a).    ∎
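As a quick numerical sanity check of Theorem 1 (not part of the paper), the following illustrative Python script builds a small random MDP, computes d^π, Q^π, and ρ(π) exactly for a tabular softmax policy, and compares the gradient given by (2) with a finite-difference gradient of ρ. The MDP, the policy parameterization, and all names are assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 3
P = rng.random((nS, nA, nS)); P /= P.sum(axis=2, keepdims=True)   # P[s,a,s'] transition probabilities
R = rng.random((nS, nA))                                          # expected rewards R^a_s

def pi_of(theta):
    """Tabular softmax policy: pi[s,a] = exp(theta[s,a]) / sum_b exp(theta[s,b])."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def avg_reward_terms(theta):
    """Return rho(pi), stationary distribution d^pi, differential Q^pi, and pi."""
    pi = pi_of(theta)
    P_pi = np.einsum('sa,sap->sp', pi, P)          # state-to-state transition matrix under pi
    r_pi = (pi * R).sum(axis=1)                    # expected one-step reward per state under pi
    # stationary distribution: d = d P_pi with sum(d) = 1
    A = np.vstack([P_pi.T - np.eye(nS), np.ones(nS)])
    b = np.zeros(nS + 1); b[-1] = 1.0
    d = np.linalg.lstsq(A, b, rcond=None)[0]
    rho = d @ r_pi
    # differential state values: (I - P_pi) V = r_pi - rho (any particular solution suffices)
    V = np.linalg.lstsq(np.eye(nS) - P_pi, r_pi - rho, rcond=None)[0]
    Q = R - rho + P @ V                            # Q[s,a] = R^a_s - rho + sum_s' P[s,a,s'] V(s')
    return rho, d, Q, pi

theta = rng.normal(size=(nS, nA))
rho, d, Q, pi = avg_reward_terms(theta)

# Eq. (2) specialized to the tabular softmax:
# d(rho)/d(theta[s,a]) = d(s) * pi(s,a) * (Q(s,a) - V(s)), with V(s) = sum_a pi(s,a) Q(s,a).
V_pi = (pi * Q).sum(axis=1, keepdims=True)
grad_theorem = d[:, None] * pi * (Q - V_pi)

# finite-difference check of d(rho)/d(theta)
eps = 1e-5
grad_fd = np.zeros_like(theta)
for s in range(nS):
    for a in range(nA):
        tp, tm = theta.copy(), theta.copy()
        tp[s, a] += eps; tm[s, a] -= eps
        grad_fd[s, a] = (avg_reward_terms(tp)[0] - avg_reward_terms(tm)[0]) / (2 * eps)

print(np.max(np.abs(grad_theorem - grad_fd)))   # should be on the order of 1e-6 or smaller
```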
