Reactive Self-rescue Control for Autonomous Mobile Robot Based on Reinforcement Learning
Journal of Shanghai Jiaotong University, Vol. 43, No. 11, Nov. 2009

CLC number: TP242    Document code: A

Reactive Self-rescue Control for Autonomous Mobile Robot Based on Reinforcement Learning

WANG Zhong-wei, CAO Qi-xin, LUAN Nan, ZHANG Lei
(Research Institute of Robotics, Shanghai Jiaotong University, Shanghai, China)

Abstract: A control technique to achieve self-rescue of an autonomous mobile robot from an obstacle environment based on reinforcement learning is proposed. The motion control strategy for self-rescue is obtained online through interaction between the mobile robot and the obstacle environment, so the self-rescue control technique helps the autonomous mobile robot overcome the obstacle environment independently and avoid the losses caused by rescue activity and task failure. Prior knowledge of the working environment is applied to direct the design of a heuristic reward function for the reinforcement learning system, which guarantees the correct direction of searching and learning the control strategy. Simulation experiments indicate that it is feasible to achieve self-rescue with a self-learned control strategy.

Key words: autonomous mobile robot; reactive control; Q-learning; heuristic reward function

Foundation item: National High Technology Research and Development Program of China (863 Program) (2006AA040203)
Brooks [7] proposed the behavior-based reactive control architecture for mobile robots, and reactive controllers based on fuzzy logic [8], spiking neural networks [9] and dynamic fuzzy neural networks [10] have since been applied to mobile robot control. Reinforcement learning (RL) enables an agent to learn a control strategy online through interaction with its environment [11]; with a heuristic reward function, prior knowledge of the task can be embedded to accelerate RL [12-13]. This paper applies Q-learning with a heuristic reward function to the self-rescue control of an autonomous mobile robot trapped by environmental obstacles.

1 Problem Description

The self-rescue problem is modeled on the mountain-car task: the robot, with state space S and action set A, must escape the valley and reach the goal region G (Fig. 1). Because the car's engine is too weak to climb the slope directly, it must first back away from the goal to build up momentum.

Fig. 1 Schematic of the mountain car trapped by environmental obstacles [11]

The dynamics of the mountain car are [11]:

    x_{t+1} = bound[x_t + v_{t+1}]
    v_{t+1} = bound[v_t + 0.001u_t - g cos(3x_t)]                (1)

where x is the position and v the velocity of the car; the control u takes values in {+1, -1, 0} (full throttle forward, full throttle backward, zero throttle); and g = 0.0025 is the gravity-related coefficient. The operator bound[·] clips each variable to its admissible range: x ∈ [-1.2, 0.5], with v_{t+1} reset to 0 when x reaches the left bound -1.2, and the task succeeding when x ≥ 0.5; v ∈ [-0.07, 0.07].

2 Q-learning with a Heuristic Reward Function

Q-learning [11] allows the agent to learn the action-value function Q without a model of the environment. At time step t, the agent executes action a_t in state s_t, moves to state s_{t+1}, and receives reward r_t, which reflects the effect of taking a_t in s_t. The estimate of Q(s_t, a_t) is updated from the reward and the value of the successor state:

    Q_{t+1}(s_t, a_t) = r_t + γ max_{a∈A} Q_t(s_{t+1}, a)        (2)
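The update rule of Eq. (1) can be sketched as one simulation step. This is a minimal sketch assuming the standard mountain-car formulation of [11] (throttle gain 0.001, g = 0.0025, x ∈ [-1.2, 0.5], v ∈ [-0.07, 0.07]); the function name is illustrative.

```python
import math

def mountain_car_step(x, v, u, g=0.0025):
    """One step of the mountain-car dynamics of Eq. (1).

    x: position in [-1.2, 0.5]; v: velocity in [-0.07, 0.07];
    u: throttle in {+1, -1, 0}.
    Returns (x_next, v_next, reached_goal).
    """
    v_next = v + 0.001 * u - g * math.cos(3 * x)
    v_next = max(-0.07, min(0.07, v_next))   # bound[.] on the velocity
    x_next = x + v_next
    if x_next <= -1.2:                       # left bound: car stops against it
        x_next, v_next = -1.2, 0.0
    return x_next, v_next, x_next >= 0.5     # task succeeds when x >= 0.5
```

From the valley, a single full-throttle step barely moves the car, which is why momentum must be accumulated over many steps.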
Equation (2) converges to the optimal Q values, from which the agent extracts its control strategy. Sutton and Barto [11] use the reward function

    r(t) = -1,  x < 0.5                                          (3)

i.e., a penalty of -1 at every step until the car reaches the goal. This reward is sparse: it gives the learner no guidance before the goal is first reached, so learning is slow. Prior knowledge of the task can instead be encoded in a heuristic reward function (Fig. 2): whenever the car moves in the direction of its throttle it is making useful progress, and moving toward the goal deserves more reward than backing up to gather momentum. This yields

    r(t) = +2,  if (x_{t+1} - x_t)u > 0 and v(t) ≥ 0
           +1,  if (x_{t+1} - x_t)u > 0 and v(t) < 0             (4)
           -1,  otherwise

The heuristic reward keeps the search for the control strategy pointed in the correct direction. The Q-learning procedure based on it is:

(1) Initialize the Q table.
(2) Observe the current state s_t and select an action: with probability 1 - ε, the action maximizing Q; with probability ε, a random action.
(3) Execute a_t and observe the successor state s_{t+1}.
(4) Compute r_t by Eq. (4) and update Q:

    Q_t(s_t, a_t) = (1 - α_t) Q_{t-1}(s_t, a_t) + α_t [r_t + γ V(s_{t+1})]

where α_t is the learning rate and V(s_{t+1}) = max_{a∈A} Q_{t-1}(s_{t+1}, a). Return to step (2).

Fig. 2 Framework of Q-learning based on the heuristic reward function

3 Simulation Experiments

The Q-learning controller was simulated with learning rate α = 0.5 and discount factor γ = 0.95; the position x and velocity v were discretized to index the Q table. Fig. 3 shows the variation of the position and velocity of the mountain car under the two reward functions; training was run for 200 trials, each terminated if the car did not reach G within 1 000 steps.
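Steps (1)-(4) together with the heuristic reward of Eq. (4) can be sketched as follows. α and γ are the paper's values; the ε value, the discretization grid sizes, and all function names are illustrative assumptions, since the paper's exact settings are not given here.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.5, 0.95, 0.1   # alpha, gamma from the paper; epsilon assumed
ACTIONS = (+1, -1, 0)
Q = defaultdict(float)                   # step (1): Q table, zero-initialized

def heuristic_reward(x, x_next, u, v):
    """Heuristic reward of Eq. (4)."""
    if (x_next - x) * u > 0:             # car moved in the direction of the throttle
        return 2.0 if v >= 0 else 1.0
    return -1.0

def discretize(x, v, nx=20, nv=20):
    """Map (x, v) to grid indices; 20x20 grid is an assumed discretization."""
    i = int((x + 1.2) / 1.7 * (nx - 1))
    j = int((v + 0.07) / 0.14 * (nv - 1))
    return i, j

def choose_action(s):
    """Step (2): epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[s, a])

def update(s, a, r, s_next):
    """Step (4): Q_t = (1 - alpha) Q_{t-1} + alpha [r + gamma V(s_{t+1})]."""
    v_next = max(Q[s_next, b] for b in ACTIONS)   # V(s_{t+1})
    Q[s, a] = (1 - ALPHA) * Q[s, a] + ALPHA * (r + GAMMA * v_next)
```

One training step then reads: `s = discretize(x, v); a = choose_action(s); ...simulate...; update(s, a, heuristic_reward(x, x_next, a, v), discretize(x_next, v_next))`.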
Fig. 3 Variation curves of the position and velocity of the mountain car with different reward functions

With the heuristic reward function, the Q values converge markedly faster than with the standard reward of Eq. (3), and the learned strategy drives the car out of the valley reliably: the heuristic reward supplies informative feedback at every step, whereas the standard reward only distinguishes success from failure. The number of trials n that the car needs before it first overcomes the obstacle depends on the parameters of the control algorithm; Fig. 4 plots this relationship, with panels (a)-(c) corresponding to parameter values 0.1, 0.3 and 0.5, and each point averaged over 20 runs.

Fig. 4 Relationship between the number n of trials the mountain car needs to overcome the obstacle and the parameters of the control algorithm

4 Conclusion

Q-learning with a heuristic reward function enables an autonomous mobile robot to learn a self-rescue control strategy online through interaction with the obstacle environment. The heuristic reward, designed from prior knowledge of the working environment, guarantees the correct search direction and accelerates convergence; the simulation results show that self-rescue by a self-learned control strategy is feasible.
References

[1] Tsalatsanis A, Valavanis K, Tsourveloudis N. Mobile robot navigation using sonar and range measurements from uncalibrated cameras [J]. Journal of Intelligent and Robotic Systems, 2007, 48(2).
[2] Mata M, Armingol J M, Fernández J, et al. Object learning and detection using evolutionary deformable models for mobile robot navigation [J]. Robotica, 2008, 26(1).
[3] Lan G P, Ma S G. Development of a novel crawler mechanism with polymorphic locomotion [J]. Advanced Robotics, 2007, 21(3/4).
[4] DENG Zong-quan, GAO Hai-bo, WANG Shao-chun, et al. Analysis of climbing obstacle capability of lunar rover with planetary wheel [J]. Journal of Beijing University of Aeronautics and Astronautics, 2004, 30(3).
[5] XIANG Xian-bo, XU Guo-hua, CAI Tao, et al. Intelligent self-rescue system for underwater vehicles [J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2006, 34(7).
[6] Wang Z W, Cao Q X, Luan N, et al. Development of new pipeline maintenance system for repairing early-built offshore oil pipelines [C]// IEEE ICIT. Piscataway, United States: IEEE, 2008: 126.
[7] Brooks R A. A robust layered control system for a mobile robot [J]. IEEE Journal of Robotics and Automation, 1986, 2(1).
[8] Faress K N, El Hagry M T, El Kosy A A. Trajectory tracking control for a wheeled mobile robot using fuzzy logic controller [J]. WSEAS Transactions on Systems, 2005, 4(7).
[9] Wang X Q, Hou Z G, Zou A M, et al. A behavior controller based on spiking neural networks for mobile robots [J]. Neurocomputing, 2008, 71(4-6).
[10] Er M J, Tan T P, Loh S Y. Control of a mobile robot using generalized dynamic fuzzy neural networks [J]. Microprocessors and Microsystems, 2004, 28(9).
[11] Sutton R S, Barto A G. Reinforcement learning: An introduction [M]. Cambridge, MA: MIT Press, 1998.
[12] Mataric M J. Reward functions for accelerated learning [C]// Proceedings of the Eleventh International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 1994.
[13] WEI Ying-zi, ZHAO Ming-yang. Design and convergence analysis of a heuristic reward function for reinforcement learning algorithms [J]. Computer Science, 2005, 32(3).