Policy Gradient Methods for Reinforcement Learning with Function Approximation
Policy Gradient Methods for Reinforcement Learning with Function Approximation

Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
AT&T Labs Research, 180 Park Avenue, Florham Park, NJ

Abstract

Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

Large applications of reinforcement learning (RL) require the use of generalizing function approximators such as neural networks, decision trees, or instance-based methods. The dominant approach for the last decade has been the value-function approach, in which all function approximation effort goes into estimating a value function, with the action-selection policy represented implicitly as the "greedy" policy with respect to the estimated values (e.g., as the policy that selects in each state the action with highest estimated value). The value-function approach has worked well in many applications, but has several limitations. First, it is oriented toward finding deterministic policies, whereas the optimal policy is often stochastic, selecting different actions with specific probabilities (e.g., see Singh, Jaakkola, and Jordan, 1994). Second, an arbitrarily small change in the estimated value of an action can cause it to be, or not be, selected. Such discontinuous changes have been identified as a key obstacle to establishing convergence assurances for algorithms following the value-function approach (Bertsekas and Tsitsiklis, 1996).
For example, Q-learning, Sarsa, and dynamic programming methods have all been shown unable to converge to any policy for simple MDPs and simple function approximators (Gordon, 1995, 1996; Baird, 1995; Tsitsiklis and van Roy, 1996; Bertsekas and Tsitsiklis, 1996). This can occur even if the best approximation is found at each step before changing the policy, and whether the notion of best is in the mean-squared-error sense or the slightly different senses of residual-gradient, temporal-difference, and dynamic-programming methods.
In this paper we explore an alternative approach to function approximation in RL. Rather than approximating a value function and using that to compute a deterministic policy, we approximate a stochastic policy directly using an independent function approximator with its own parameters. For example, the policy might be represented by a neural network whose input is a representation of the state, whose output is action selection probabilities, and whose weights are the policy parameters. Let θ denote the vector of policy parameters and ρ the performance of the corresponding policy (e.g., the average reward per step). Then, in the policy gradient approach, the policy parameters are updated approximately proportional to the gradient:

\[ \Delta\theta \approx \alpha \frac{\partial \rho}{\partial \theta}, \tag{1} \]

where α is a positive-definite step size. If the above can be achieved, then θ can usually be assured to converge to a locally optimal policy in the performance measure ρ. Unlike the value-function approach, here small changes in θ can cause only small changes in the policy and in the state-visitation distribution.

In this paper we prove that an unbiased estimate of the gradient (1) can be obtained from experience using an approximate value function satisfying certain properties. Williams's (1988, 1992) REINFORCE algorithm also finds an unbiased estimate of the gradient, but without the assistance of a learned value function. REINFORCE learns much more slowly than RL methods using value functions and has received relatively little attention. Learning a value function and using it to reduce the variance of the gradient estimate appears to be essential for rapid learning. Jaakkola, Singh, and Jordan (1995) proved a result very similar to ours for the special case of function approximation corresponding to a tabular POMDP. Our result strengthens theirs and generalizes it to arbitrary differentiable function approximators. Our result also suggests a way of proving the convergence of a wide variety of algorithms based on "actor-critic" or policy-iteration architectures (e.g., Barto, Sutton, and Anderson, 1983; Sutton, 1984; Kimura and Kobayashi, 1998).
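As a minimal, entirely hypothetical illustration of update (1), the sketch below performs stochastic gradient ascent on ρ for a two-armed bandit with a tabular softmax policy, using the Williams-style sample r · ∇θ log π(a) as an unbiased estimate of the gradient; the problem, constants, and names are our own, not the paper's:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

rng = np.random.default_rng(0)
theta = np.zeros(2)                  # policy parameters, one per action
alpha = 0.1                          # step size
mean_reward = np.array([1.0, 0.0])   # arm 0 is the better arm

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = mean_reward[a]
    # gradient of log pi(a) for a softmax policy: indicator(a) - pi
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    # stochastic ascent on rho: r * grad(log pi) samples the gradient (1)
    theta += alpha * r * grad_log_pi

print(softmax(theta))  # probability of the better arm approaches 1
```

For the softmax parameterization, ∂ log π(a)/∂θ_b = 1[a = b] − π(b), which is what the two lines building `grad_log_pi` compute.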
In this paper we take the first steps in this direction by proving for the first time that a version of policy iteration with general differentiable function approximation is convergent to a locally optimal policy. Baird and Moore (1999) obtained a weaker but superficially similar result for their VAPS family of methods. Like policy-gradient methods, VAPS includes separately parameterized policy and value functions updated by gradient methods. However, VAPS methods do not climb the gradient of performance (expected long-term reward), but of a measure combining performance and value-function accuracy. As a result, VAPS does not converge to a locally optimal policy, except in the case that no weight is put upon value-function accuracy, in which case VAPS degenerates to REINFORCE. Similarly, Gordon's (1995) fitted value iteration is also convergent and value-based, but does not find a locally optimal policy.

1 Policy Gradient Theorem

We consider the standard reinforcement learning framework (see, e.g., Sutton and Barto, 1998), in which a learning agent interacts with a Markov decision process (MDP). The state, action, and reward at each time t ∈ {0, 1, 2, ...} are denoted s_t ∈ S, a_t ∈ A, and r_t ∈ ℝ respectively. The environment's dynamics are characterized by state transition probabilities, P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a}, and expected rewards R^a_s = E{r_{t+1} | s_t = s, a_t = a}, ∀s, s' ∈ S, a ∈ A. The agent's decision-making procedure at each time is characterized by a policy, π(s, a, θ) = Pr{a_t = a | s_t = s, θ}, ∀s ∈ S, a ∈ A, where θ ∈ ℝ^l, for l ≪ |S|, is a parameter vector. We assume that π is differentiable with respect to its parameters, i.e., that ∂π(s,a)/∂θ exists. We also usually write just π(s, a) for π(s, a, θ).
With function approximation, two ways of formulating the agent's objective are useful. One is the average reward formulation, in which policies are ranked according to their long-term expected reward per step, ρ(π):

\[ \rho(\pi) = \lim_{n \to \infty} \frac{1}{n}\, E\{ r_1 + r_2 + \cdots + r_n \mid \pi \} = \sum_s d^\pi(s) \sum_a \pi(s,a)\, R^a_s, \]

where d^π(s) = lim_{t→∞} Pr{s_t = s | s_0, π} is the stationary distribution of states under π, which we assume exists and is independent of s_0 for all policies. In the average reward formulation, the value of a state–action pair given a policy is defined as

\[ Q^\pi(s,a) = \sum_{t=1}^{\infty} E\{ r_t - \rho(\pi) \mid s_0 = s,\ a_0 = a,\ \pi \}, \quad \forall s \in S,\ a \in A. \]

The second formulation we cover is that in which there is a designated start state s_0, and we care only about the long-term reward obtained from it. We will give our results only once, but they will apply to this formulation as well under the definitions

\[ \rho(\pi) = E\Big\{ \sum_{t=1}^{\infty} \gamma^{t-1} r_t \,\Big|\, s_0, \pi \Big\} \quad \text{and} \quad Q^\pi(s,a) = E\Big\{ \sum_{k=1}^{\infty} \gamma^{k-1} r_{t+k} \,\Big|\, s_t = s,\ a_t = a,\ \pi \Big\}, \]

where γ ∈ [0, 1] is a discount rate (γ = 1 is allowed only in episodic tasks). In this formulation, we define d^π(s) as a discounted weighting of states encountered starting at s_0 and then following π: d^π(s) = Σ_{t=0}^∞ γ^t Pr{s_t = s | s_0, π}.

Our first result concerns the gradient of the performance metric with respect to the policy parameter:

Theorem 1 (Policy Gradient). For any MDP, in either the average-reward or start-state formulations,

\[ \frac{\partial \rho}{\partial \theta} = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a). \tag{2} \]

Proof: See the appendix.

Marbach and Tsitsiklis (1998) describe a related but different expression for the gradient in terms of the state-value function, citing Jaakkola, Singh, and Jordan (1995) and Cao and Chen (1997). In both that expression and ours, the key point is that there are no terms of the form ∂d^π(s)/∂θ: the effect of policy changes on the distribution of states does not appear. This is convenient for approximating the gradient by sampling. For example, if s was sampled from the distribution obtained by following π, then Σ_a ∂π(s,a)/∂θ Q^π(s,a) would be an unbiased estimate of ∂ρ/∂θ. Of course, Q^π(s,a) is also not normally known and must be estimated. One approach is to use the actual returns, R_t = Σ_{k=1}^∞ r_{t+k} − ρ(π) (or R_t = Σ_{k=1}^∞ γ^{k−1} r_{t+k} in the start-state formulation) as an approximation for each Q^π(s_t, a_t).
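The claim of Theorem 1, and the absence of ∂d^π(s)/∂θ terms in (2), can be checked numerically. The sketch below is our own illustration (not code from the paper): it builds a small random two-state MDP, computes Q^π and the discounted weighting d^π exactly in the start-state formulation, and compares the right-hand side of (2) against a finite-difference estimate of ∂ρ/∂θ for a tabular softmax policy:

```python
import numpy as np

nS, nA, gamma, s0 = 2, 2, 0.9, 0
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(nS), size=(nA, nS))  # P[a, s] = next-state distribution
R = rng.normal(size=(nS, nA))                  # expected rewards R_s^a

def pi_of(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)    # pi[s, a]

def evaluate(theta):
    pi = pi_of(theta)
    P_pi = np.einsum('sa,asn->sn', pi, P)      # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    Q = R + gamma * np.einsum('asn,n->sa', P, V)
    # discounted weighting: d(s) = sum_t gamma^t Pr{s_t = s | s0}
    d = np.linalg.solve(np.eye(nS) - gamma * P_pi.T, np.eye(nS)[s0])
    return V[s0], pi, Q, d                     # rho = V(s0) in this formulation

theta = rng.normal(size=(nS, nA))
rho, pi, Q, d = evaluate(theta)

# Right-hand side of (2); for a tabular softmax,
# d pi(s,a) / d theta[s,b] = pi(s,a) * (1[a=b] - pi(s,b)).
grad = np.zeros((nS, nA))
for s in range(nS):
    for b in range(nA):
        dpi = pi[s] * ((np.arange(nA) == b) - pi[s, b])
        grad[s, b] = d[s] * (dpi * Q[s]).sum()

# Central finite differences of rho with respect to theta
eps, fd = 1e-6, np.zeros_like(theta)
for s in range(nS):
    for b in range(nA):
        tp, tm = theta.copy(), theta.copy()
        tp[s, b] += eps
        tm[s, b] -= eps
        fd[s, b] = (evaluate(tp)[0] - evaluate(tm)[0]) / (2 * eps)

print(np.max(np.abs(grad - fd)))  # the two gradients agree
```

Note that `grad` never differentiates `d`: exactly as the text says, the effect of the policy change on the state distribution does not appear, yet the result matches the full derivative of ρ.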
This leads to Williams's episodic REINFORCE algorithm,

\[ \Delta\theta_t \propto \frac{\partial \pi(s_t,a_t)}{\partial \theta}\, R_t\, \frac{1}{\pi(s_t,a_t)} \]

(the 1/π(s_t,a_t) corrects for the oversampling of actions preferred by π), which is known to follow ∂ρ/∂θ in expected value (Williams, 1988, 1992).

2 Policy Gradient with Approximation

Now consider the case in which Q^π is approximated by a learned function approximator. If the approximation is sufficiently good, we might hope to use it in place of Q^π in (2) and still point roughly in the direction of the gradient. For example, Jaakkola, Singh, and Jordan (1995) proved that for the special case of function approximation arising in a tabular POMDP one could assure positive inner product with the gradient, which is sufficient to ensure improvement for moving in that direction. Here we extend their result to general function approximation and prove equality with the gradient. Let f_w : S × A → ℝ be our approximation to Q^π, with parameter w. It is natural to learn f_w by following π and updating w by a rule such as

\[ \Delta w_t \propto \frac{\partial}{\partial w}\big[\hat Q^\pi(s_t,a_t) - f_w(s_t,a_t)\big]^2 \propto \big[\hat Q^\pi(s_t,a_t) - f_w(s_t,a_t)\big]\,\frac{\partial f_w(s_t,a_t)}{\partial w}, \]

where \(\hat Q^\pi(s_t,a_t)\) is some unbiased estimator of Q^π(s_t,a_t), perhaps R_t. When such a process has converged to a local optimum, then

\[ \sum_s d^\pi(s) \sum_a \pi(s,a)\,\big[Q^\pi(s,a) - f_w(s,a)\big]\,\frac{\partial f_w(s,a)}{\partial w} = 0. \tag{3} \]

Theorem 2 (Policy Gradient with Function Approximation). If f_w satisfies (3) and is compatible with the policy parameterization in the sense that

\[ \frac{\partial f_w(s,a)}{\partial w} = \frac{\partial \pi(s,a)}{\partial \theta}\,\frac{1}{\pi(s,a)}, \tag{4} \]

then

\[ \frac{\partial \rho}{\partial \theta} = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, f_w(s,a). \tag{5} \]

Proof: Combining (3) and (4) gives

\[ \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\,\big[Q^\pi(s,a) - f_w(s,a)\big] = 0, \tag{6} \]

which tells us that the error in f_w(s,a) is orthogonal to the gradient of the policy parameterization. Because the expression above is zero, we can subtract it from the policy gradient theorem (2) to yield

\[ \frac{\partial \rho}{\partial \theta} = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) \;-\; \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\,\big[Q^\pi(s,a) - f_w(s,a)\big] \]
\[ = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\,\big[Q^\pi(s,a) - Q^\pi(s,a) + f_w(s,a)\big] \]
\[ = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, f_w(s,a). \qquad \text{Q.E.D.} \]

3 Application to Deriving Algorithms and Advantages

Given a policy parameterization, Theorem 2 can be used to derive an appropriate form for the value-function parameterization. For example, consider a policy that is a Gibbs distribution in a linear combination of features:

\[ \pi(s,a) = \frac{e^{\theta^T \phi_{sa}}}{\sum_b e^{\theta^T \phi_{sb}}}, \quad \forall s \in S,\ a \in A, \]

where each φ_{sa} is an l-dimensional feature vector characterizing the state–action pair s, a. Meeting the compatibility condition (4) requires that

\[ \frac{\partial f_w(s,a)}{\partial w} = \frac{\partial \pi(s,a)}{\partial \theta}\,\frac{1}{\pi(s,a)} = \phi_{sa} - \sum_b \pi(s,b)\,\phi_{sb}, \]

so that the natural parameterization of f_w is

\[ f_w(s,a) = w^T \Big[\phi_{sa} - \sum_b \pi(s,b)\,\phi_{sb}\Big]. \]

In other words, f_w must be linear in the same features as the policy, except normalized to be mean zero for each state. Other algorithms can easily be derived for a variety of nonlinear policy parameterizations, such as multi-layer backpropagation networks.

The careful reader will have noticed that the form given above for f_w requires that it have zero mean for each state: Σ_a π(s,a) f_w(s,a) = 0, ∀s ∈ S. In this sense it is better to think of f_w as an approximation of the advantage function, A^π(s,a) = Q^π(s,a) − V^π(s) (much as in Baird, 1993), rather than of Q^π. Our convergence requirement (3) is really that f_w get the relative value of the actions correct in each state, not the absolute value, nor the variation from state to state. Our results can be viewed as a justification for the special status of advantages as the target for value function approximation in RL. In fact, our (2), (3), and (5) can all be generalized to include an arbitrary function of state added to the value function or its approximation. For example, (5) can be generalized to

\[ \frac{\partial \rho}{\partial \theta} = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\,\big[f_w(s,a) + v(s)\big], \]

where v : S → ℝ is an arbitrary function. (This follows immediately because Σ_a ∂π(s,a)/∂θ = 0, ∀s ∈ S.) The choice of v does not affect any of our theorems, but can substantially affect the variance of the gradient estimators. The issues here are entirely analogous to those in the use of reinforcement baselines in earlier work (e.g., Williams, 1992; Dayan, 1991; Sutton, 1984). In practice, v should presumably be set to the best available approximation of V^π. Our results establish that that approximation process can proceed without affecting the expected evolution of f_w and π.

4 Convergence of Policy Iteration with Function Approximation

Given Theorem 2, we can prove for the first time that a form of policy iteration with function approximation is convergent to a locally optimal policy.

Theorem 3 (Policy Iteration with Function Approximation). Let π and f_w be any differentiable function approximators for the policy and value function respectively that satisfy the compatibility condition (4) and for which max_{θ,s,a,i,j} |∂²π(s,a)/∂θ_i ∂θ_j| < B < ∞. Let {α_k}_{k=0}^∞ be any step-size sequence such that lim_{k→∞} α_k = 0 and Σ_k α_k = ∞. Then, for any MDP with bounded rewards, the sequence {ρ(π_k)}, defined by any θ_0, π_k = π(·,·,θ_k), and

\[ w_k = w \ \text{such that}\ \sum_s d^{\pi_k}(s) \sum_a \pi_k(s,a)\,\big[Q^{\pi_k}(s,a) - f_w(s,a)\big]\,\frac{\partial f_w(s,a)}{\partial w} = 0, \]

\[ \theta_{k+1} = \theta_k + \alpha_k \sum_s d^{\pi_k}(s) \sum_a \frac{\partial \pi_k(s,a)}{\partial \theta}\, f_{w_k}(s,a), \]

converges such that lim_{k→∞} ∂ρ(π_k)/∂θ = 0.

Proof: Our Theorem 2 assures that the θ_k update is in the direction of the gradient. The bounds on ∂²π(s,a)/∂θ_i ∂θ_j and on the MDP's rewards together assure us that ∂²ρ/∂θ_i ∂θ_j is also bounded. These, together with the step-size requirements, are the necessary conditions to apply Proposition 3.5 from page 96 of Bertsekas and Tsitsiklis (1996), which assures convergence to a local optimum. Q.E.D.
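For the Gibbs policy of Section 3, the compatibility condition (4) says that the compatible critic's feature vector is ∇_θ log π(s,a) = φ_sa − Σ_b π(s,b) φ_sb. The sketch below is our own illustration with made-up random features for a single state: it checks that identity by finite differences and confirms the zero-mean property Σ_a π(s,a) f_w(s,a) = 0, which is why f_w behaves like an advantage:

```python
import numpy as np

rng = np.random.default_rng(2)
nA, l = 3, 4
phi = rng.normal(size=(nA, l))   # feature vectors phi_sa for one fixed state s
theta = rng.normal(size=l)
w = rng.normal(size=l)

def pi_of(theta):
    logits = phi @ theta
    z = np.exp(logits - logits.max())
    return z / z.sum()

pi = pi_of(theta)
phi_bar = pi @ phi               # sum_b pi(s,b) phi_sb

# Compatible critic: f_w(s,a) = w^T (phi_sa - phi_bar); its gradient in w
# equals grad_theta log pi(s,a), which is exactly condition (4).
f = (phi - phi_bar) @ w

# Zero mean under pi: sum_a pi(s,a) f_w(s,a) = 0
print(abs(pi @ f))

# Check (4): d(log pi(s,a))/d(theta) == phi_sa - phi_bar, by central differences
eps = 1e-6
for a in range(nA):
    num = np.zeros(l)
    for i in range(l):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        num[i] = (np.log(pi_of(tp)[a]) - np.log(pi_of(tm)[a])) / (2 * eps)
    assert np.allclose(num, phi[a] - phi_bar, atol=1e-5)
```

The zero mean holds for any w, since Σ_a π(s,a)(φ_sa − φ̄_s) = φ̄_s − φ̄_s = 0; this is the numerical counterpart of the observation that f_w approximates A^π rather than Q^π.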
Acknowledgements

The authors wish to thank Martha Steenstrup and Doina Precup for comments, and Michael Kearns for insights into the notion of optimal policy under function approximation.

References

Baird, L. C. (1993). Advantage Updating. Wright Laboratory Technical Report WL-TR.
Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann.
Baird, L. C., Moore, A. W. (1999). Gradient descent for general reinforcement learning. NIPS 11. MIT Press.
Barto, A. G., Sutton, R. S., Anderson, C. W. (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13:834–846.
Bertsekas, D. P., Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
Cao, X.-R., Chen, H.-F. (1997). Perturbation realization, potentials, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control 42(10).
Dayan, P. (1991). Reinforcement comparison. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton (eds.), Connectionist Models: Proceedings of the 1990 Summer School. Morgan Kaufmann.
Gordon, G. J. (1995). Stable function approximation in dynamic programming. Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann.
Gordon, G. J. (1996). Chattering in SARSA(λ). CMU Learning Lab Technical Report.
Jaakkola, T., Singh, S. P., Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. NIPS 7. Morgan Kaufmann.
Kimura, H., Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann.
Marbach, P., Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Technical Report LIDS-P-2411, Massachusetts Institute of Technology.
Singh, S. P., Jaakkola, T., Jordan, M. I. (1994). Learning without state-estimation in partially observable Markovian decision problems. Proceedings of the Eleventh International Conference on Machine Learning. Morgan Kaufmann.
Sutton, R. S. (1984). Temporal Credit Assignment in Reinforcement Learning. Ph.D. thesis, University of Massachusetts, Amherst.
Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Tsitsiklis, J. N., Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning 22:59–94.
Williams, R. J. (1988). Toward a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, Northeastern University, College of Computer Science.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8:229–256.

Appendix: Proof of Theorem 1

We prove the theorem first for the average-reward formulation and then for the start-state formulation. By definition,

\[ V^\pi(s) \stackrel{\text{def}}{=} \sum_a \pi(s,a)\, Q^\pi(s,a), \quad \forall s \in S. \]

Differentiating,

\[ \frac{\partial V^\pi(s)}{\partial \theta} = \sum_a \Big[ \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) + \pi(s,a)\, \frac{\partial Q^\pi(s,a)}{\partial \theta} \Big] \]
\[ = \sum_a \Big[ \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) + \pi(s,a)\, \frac{\partial}{\partial \theta}\Big( R^a_s - \rho(\pi) + \sum_{s'} P^a_{ss'}\, V^\pi(s') \Big) \Big] \]
\[ = \sum_a \Big[ \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) + \pi(s,a)\, \Big( -\frac{\partial \rho}{\partial \theta} + \sum_{s'} P^a_{ss'}\, \frac{\partial V^\pi(s')}{\partial \theta} \Big) \Big]. \]

Therefore,

\[ \frac{\partial \rho}{\partial \theta} = \sum_a \Big[ \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) + \pi(s,a) \sum_{s'} P^a_{ss'}\, \frac{\partial V^\pi(s')}{\partial \theta} \Big] - \frac{\partial V^\pi(s)}{\partial \theta}. \]

Summing both sides over the stationary distribution d^π,

\[ \sum_s d^\pi(s)\, \frac{\partial \rho}{\partial \theta} = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) + \sum_s d^\pi(s) \sum_a \pi(s,a) \sum_{s'} P^a_{ss'}\, \frac{\partial V^\pi(s')}{\partial \theta} - \sum_s d^\pi(s)\, \frac{\partial V^\pi(s)}{\partial \theta}, \]

but since d^π is stationary, Σ_s d^π(s) Σ_a π(s,a) P^a_{ss'} = d^π(s'), so the last two terms cancel and

\[ \frac{\partial \rho}{\partial \theta} = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a). \qquad \text{Q.E.D.} \]

For the start-state formulation:

\[ \frac{\partial V^\pi(s)}{\partial \theta} \stackrel{\text{def}}{=} \frac{\partial}{\partial \theta} \sum_a \pi(s,a)\, Q^\pi(s,a) = \sum_a \Big[ \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) + \pi(s,a)\, \frac{\partial}{\partial \theta}\Big( R^a_s + \sum_{s'} \gamma P^a_{ss'}\, V^\pi(s') \Big) \Big] \]
\[ = \sum_a \Big[ \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) + \pi(s,a) \sum_{s'} \gamma P^a_{ss'}\, \frac{\partial V^\pi(s')}{\partial \theta} \Big] \tag{7} \]
\[ = \sum_x \sum_{k=0}^{\infty} \gamma^k \Pr(s \to x, k, \pi) \sum_a \frac{\partial \pi(x,a)}{\partial \theta}\, Q^\pi(x,a), \]

after several steps of unrolling (7), where Pr(s → x, k, π) is the probability of going from state s to state x in k steps under policy π. It is then immediate that

\[ \frac{\partial \rho}{\partial \theta} = \frac{\partial}{\partial \theta}\, E\Big\{ \sum_{t=1}^{\infty} \gamma^{t-1} r_t \,\Big|\, s_0, \pi \Big\} = \frac{\partial V^\pi(s_0)}{\partial \theta} = \sum_s \sum_{k=0}^{\infty} \gamma^k \Pr(s_0 \to s, k, \pi) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a) = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, Q^\pi(s,a). \qquad \text{Q.E.D.} \]
More informationFatigue Failure of an Oval Cross Section Prismatic Bar at Pulsating Torsion ( )
World Engineering & Applied Science Journl 6 (): 7-, 5 ISS 79- IDOSI Publiction, 5 DOI:.59/idoi.wej.5.6.. Ftigue Filure of n Ovl Cro Section Primtic Br t Pulting Torion L.Kh. Tlybly nd.m. giyev Intitute
More informationMath 1B, lecture 4: Error bounds for numerical methods
Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the
More informationCMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature
CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy
More informationConstruction of Gauss Quadrature Rules
Jim Lmbers MAT 772 Fll Semester 2010-11 Lecture 15 Notes These notes correspond to Sections 10.2 nd 10.3 in the text. Construction of Guss Qudrture Rules Previously, we lerned tht Newton-Cotes qudrture
More informationBest Approximation in the 2-norm
Jim Lmbers MAT 77 Fll Semester 1-11 Lecture 1 Notes These notes correspond to Sections 9. nd 9.3 in the text. Best Approximtion in the -norm Suppose tht we wish to obtin function f n (x) tht is liner combintion
More informationVSS CONTROL OF STRIP STEERING FOR HOT ROLLING MILLS. M.Okada, K.Murayama, Y.Anabuki, Y.Hayashi
V ONTROL OF TRIP TEERING FOR OT ROLLING MILL M.Okd.Murym Y.Anbuki Y.yhi Wet Jpn Work (urhiki Ditrict) JFE teel orportion wkidori -chome Mizuhim urhiki 7-85 Jpn Abtrct: trip teering i one of the mot eriou
More information1.9 C 2 inner variations
46 CHAPTER 1. INDIRECT METHODS 1.9 C 2 inner vritions So fr, we hve restricted ttention to liner vritions. These re vritions of the form vx; ǫ = ux + ǫφx where φ is in some liner perturbtion clss P, for
More informationLecture 19: Continuous Least Squares Approximation
Lecture 19: Continuous Lest Squres Approximtion 33 Continuous lest squres pproximtion We begn 31 with the problem of pproximting some f C[, b] with polynomil p P n t the discrete points x, x 1,, x m for
More informationAn Application of the Generalized Shrunken Least Squares Estimator on Principal Component Regression
An Appliction of the Generlized Shrunken Let Squre Etimtor on Principl Component Regreion. Introduction Profeor Jnn-Huei Jinn Deprtment of Sttitic Grnd Vlley Stte Univerity Allendle, MI 0 USA Profeor Chwn-Chin
More informationSolutions Problem Set 2. Problem (a) Let M denote the DFA constructed by swapping the accept and non-accepting state in M.
Solution Prolem Set 2 Prolem.4 () Let M denote the DFA contructed y wpping the ccept nd non-ccepting tte in M. For ny tring w B, w will e ccepted y M, tht i, fter conuming the tring w, M will e in n ccepting
More informationLow-order simultaneous stabilization of linear bicycle models at different forward speeds
203 Americn Control Conference (ACC) Whington, DC, USA, June 7-9, 203 Low-order imultneou tbiliztion of liner bicycle model t different forwrd peed A. N. Gündeş nd A. Nnngud 2 Abtrct Liner model of bicycle
More informationAnytime algorithms for multiagent decision making using coordination graphs
Anytime lgorithms for multigent decision mking using coordintion grphs N. Vlssis R. Elhorst J. R. Kok Informtics Institute, University of Amsterdm, The Netherlnds {vlssis,reinhrst,jellekok}@science.uv.nl
More informationTests for the Ratio of Two Poisson Rates
Chpter 437 Tests for the Rtio of Two Poisson Rtes Introduction The Poisson probbility lw gives the probbility distribution of the number of events occurring in specified intervl of time or spce. The Poisson
More informationREINFORCEMENT learning (RL) was originally studied
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 3, MARCH 2015 385 Multiobjective Reinforcement Lerning: A Comprehensive Overview Chunming Liu, Xin Xu, Senior Member, IEEE, nd
More informationThe Fundamental Theorem of Calculus. The Total Change Theorem and the Area Under a Curve.
Clculus Li Vs The Fundmentl Theorem of Clculus. The Totl Chnge Theorem nd the Are Under Curve. Recll the following fct from Clculus course. If continuous function f(x) represents the rte of chnge of F
More informationRiemann is the Mann! (But Lebesgue may besgue to differ.)
Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >
More informationDecision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees
CS 188: Artificil Intelligence Fll 2011 Decision Networks ME: choose the ction which mximizes the expected utility given the evidence mbrell Lecture 17: Decision Digrms 10/27/2011 Cn directly opertionlize
More informationChapters 4 & 5 Integrals & Applications
Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions
More informationGenetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary
Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed
More informationNear-Bayesian Exploration in Polynomial Time
J. Zico Kolter kolter@cs.stnford.edu Andrew Y. Ng ng@cs.stnford.edu Computer Science Deprtment, Stnford University, CA 94305 Abstrct We consider the explortion/exploittion problem in reinforcement lerning
More informationA Signal-Level Fusion Model for Image-Based Change Detection in DARPA's Dynamic Database System
SPIE Aerosense 001 Conference on Signl Processing, Sensor Fusion, nd Trget Recognition X, April 16-0, Orlndo FL. (Minor errors in published version corrected.) A Signl-Level Fusion Model for Imge-Bsed
More informationThe steps of the hypothesis test
ttisticl Methods I (EXT 7005) Pge 78 Mosquito species Time of dy A B C Mid morning 0.0088 5.4900 5.5000 Mid Afternoon.3400 0.0300 0.8700 Dusk 0.600 5.400 3.000 The Chi squre test sttistic is the sum of
More informationCSCI565 - Compiler Design
CSCI565 - Compiler Deign Spring 6 Due Dte: Fe. 5, 6 t : PM in Cl Prolem [ point]: Regulr Expreion nd Finite Automt Develop regulr expreion (RE) tht detet the longet tring over the lphet {-} with the following
More informationJonathan Mugan. July 15, 2013
Jonthn Mugn July 15, 2013 Imgine rt in Skinner box. The rt cn see screen of imges, nd dot in the lower-right corner determines if there will be shock. Bottom-up methods my not find this dot, but top-down
More informationELECTRICAL CIRCUITS 10. PART II BAND PASS BUTTERWORTH AND CHEBYSHEV
45 ELECTRICAL CIRCUITS 0. PART II BAND PASS BUTTERWRTH AND CHEBYSHEV Introduction Bnd p ctive filter re different enough from the low p nd high p ctive filter tht the ubject will be treted eprte prt. Thi
More informationMATH34032: Green s Functions, Integral Equations and the Calculus of Variations 1
MATH34032: Green s Functions, Integrl Equtions nd the Clculus of Vritions 1 Section 1 Function spces nd opertors Here we gives some brief detils nd definitions, prticulrly relting to opertors. For further
More informationError Estimation of Practical Convolution Discrete Gaussian Sampling
Error Etimtion of Prcticl Convolution Dicrete Guin Smpling Zhongxing Zheng, Xioyun Wng,3, Gungwu Xu 4, Chunhun Zho Deprtment of Computer Science nd Technology, Tinghu Univerity, Beijing 00084, Chin Intitute
More informationEstimation of Regions of Attraction of Spin Modes
7 TH EUROPEAN CONFERENCE FOR AEROSPACE SCIENCES (EUCASS) Etimtion of Region of Attrction of Spin Mode Alexnder Khrbrov, Mri Sidoryuk, nd Dmitry Igntyev Centrl Aerohydrodynmic Intitute (TAGI), Zhukovky,
More informationSection 11.5 Estimation of difference of two proportions
ection.5 Estimtion of difference of two proportions As seen in estimtion of difference of two mens for nonnorml popultion bsed on lrge smple sizes, one cn use CLT in the pproximtion of the distribution
More informationSTOCHASTIC REGULAR LANGUAGE: A MATHEMATICAL MODEL FOR THE LANGUAGE OF SEQUENTIAL ACTIONS FOR DECISION MAKING UNDER UNCERTAINTY
Interntionl Journl of Mthemtic nd Computer Appliction Reerch (IJMCAR) ISSN 49-6955 Vol. 3, Iue, Mr 3, -8 TJPRC Pvt. Ltd. STOCHASTIC REGULAR LANGUAGE: A MATHEMATICAL MODEL FOR THE LANGUAGE OF SEQUENTIAL
More information4.4 Areas, Integrals and Antiderivatives
. res, integrls nd ntiderivtives 333. Ares, Integrls nd Antiderivtives This section explores properties of functions defined s res nd exmines some connections mong res, integrls nd ntiderivtives. In order
More informationApplying Q-Learning to Flappy Bird
Applying Q-Lerning to Flppy Bird Moritz Ebeling-Rump, Mnfred Ko, Zchry Hervieux-Moore Abstrct The field of mchine lerning is n interesting nd reltively new re of reserch in rtificil intelligence. In this
More informationMulti-objective optimization of dielectric layer photonic crystal filter
Optic Applict, Vol. XLVII, No. 1, 017 DOI: 10.577/o170103 Multi-objective optimiztion of dielectric lyer photonic crystl filter HONGWEI YANG *, CUIYING HUANG, SHANSHAN MENG College of Applied Sciences,
More informationAn approximation to the arithmetic-geometric mean. G.J.O. Jameson, Math. Gazette 98 (2014), 85 95
An pproximtion to the rithmetic-geometric men G.J.O. Jmeson, Mth. Gzette 98 (4), 85 95 Given positive numbers > b, consider the itertion given by =, b = b nd n+ = ( n + b n ), b n+ = ( n b n ) /. At ech
More informationApplicability of Matrix Inverse in Simple Model of Economics An Analysis
IOSR Journl of Mthemtic IOSR-JM e-issn: 78-578, p-issn: 39-765X. Volume, Iue 5 Ver. VI Sep-Oct. 4, PP 7-34 pplicility of Mtrix Invere in Simple Moel of Economic n nlyi Mr. nupm Srm Deprtment of Economic
More informationNew Expansion and Infinite Series
Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University
More information