Reactive Self2re scue Control for Autonomous Mobile Ro bot Ba sed on Reinforcement Learning

Size: px
Start display at page:

Download "Reactive Self2re scue Control for Autonomous Mobile Ro bot Ba sed on Reinforcement Learning"

Transcription

1 J OU RNAL OF SHAN GHA I J IAO TON G UNIV ERSIT Y Vol. 43 No. 11 Nov : (2009) ,,, (, ) :.,,.,.. : ; ; Q ; : TP 242 : A Reactive Self2re scue Control for Autonomous Mobile Ro bot Ba sed on Reinforcement Learning W A N G Zhon g2w ei, CA O Qi2x i n, L UA N N an, Z H A N G L ei ( Research Instit ute of Robotics, Shanghai Jiaotong U niversity, Shanghai , China) Abstract : A control technique to achieve self2rescue of autonomous mobile robot f rom obstacle environment based o n reinforcement learning was p ropo sed. Motio n co nt rol st rategy of self2rescue was got o n line t hrough t he interaction between t he mobile robot and the obstacle environment, so t he self2rescue control technique help s to overcome obstacle environment by t he autonomous mobile robot independently and avoid t he loss f rom rescue activity and task failure. Prior knowledge of working environment was applied to di2 rect the design of heuristic reward f unction for the reinforcement learning system, which guarantees t he correct direction of searching and learning control strategy. The simulation experiment s indicate that it is feasible to achieve self2rescue by self learning co nt rol st rategy. Key words : autonomous mobile robot ; reactive control ; Q2learning ; heuristic reward f unction,,,,.,,.,., [ 122 ], [324 ], [ 526 ], : : ( ), (863) (2006AA040203) : (19782),,,,. (19602),,,, ( Tel. ) : ; E2mail edu. cn.

2 ,,., Brooks [7 ],,,.,, [8 ] [9 ] [10 ].,,. ( Reinforcement Learning,RL ) Agent, [11 ],., RL,,,.,,, RL, RL [12213 ]., RL,,,. 1 1.,, S, A, G., S G,.,,, G,,, G. 1 Fig. 1 Schematic of mountain2car being in environmental obstacles [11 ] x t+1 = bound[ x t + vt+1 ] vt+1 = bound[ vt u - gcos 3 x t ] (1), x v. u 3, u { + 1, - 1, 0}, 3. g = ; G S A x (1) bound[ ] x v : x , x = , vt + 1 = 0 ; x 0. 5,. v [ ,0. 07 ]. 2,,,, Q Q [11 ],, Agent. Q : t, Agent at, st s t + 1, rt, rt s t + 1 at s t. Q,, Q, Q s t at,, Qt+1 (st, at) = rt + max a t { Q(st+1, at) at A} (2)

3 11, : 1753 Q, Q Agent Q,.,,,., Sutton [11 ], r( t) = - 1, x < , x 0. 5 (3),.,,,.,,.,,,,, 2.,, ;,, + 2 ;, + 1. : r( t) = + 2, ( x t+1 - x t) u 0, v ( t) 0 + 1, ( x t+1 - x t) u 0, v ( t) < 0-1, (4),,.,,,. Q : (1). (2) st. Q, 1 - a t ; a t. (3) st + 1, (4) rt. (4) rt, Q, Qt ( st, at) = (1 - at) Qt - 1 (st, at) + at [ rt + gv ( st + 1 ) ] s = st, a = at Qt - 1 ( st, at),v ( st + 1 ) = max a A { Qt - 1 ( st + 1, a) }. 3 2 Q2 Fig. 2 Framework of Q2learning based on heuristic reward f unction 1 3, :,. :, Q Q., RL. 2 Q : = 0. 5, = 0. 95, = x = v = Q., 3 (a) 200 ( : G 1 000,,, ). 3 ( b).

4 (a) (b) 3 Fig. 3 Variation curve of position and velocity of mountain car with different reward f unction, Q, ; Q,,,. : Q, Q,,..,,,,. x = ( n), n, 4. (a) = 0. 1 (b) = 0. 3 (c) = 0. 5 (d) = n Fig. 4 Relationship between trials number of mountain car s overcoming obstacle and parameters of control algorithm, 20.,. 4 Q, Q,.

5 11, : 1755 : Q,. : [ 1 ] Tsalatsanis A, Valavanis K, Tsourveloudis N. Mobile robot navigation using sonar and range meas2 urement s f rom uncalibrated cameras [J ]. Journal of Intelligent and Robotic Systems, 2007, 48 ( 2) : [ 2 ] Mata M, Armingol J M, Fernndez J, et al. Object learning and detection using evolutionary deformable models for mobile robot navigation [J ]. Robotica, 2008, 26 (1) : [ 3 ] Lan G P, Ma S G. Development of a novel crawler mechanism with polymorphic locomotion [J ]. vanced Robotics, 2007, 21 (3/ 4) : Ad2 [ 4 ],,,. [J ].,2004,30 (3) : DEN G Zong2quan, GAO Hai2bo, WAN G Shao2chun, et al. Analysis of climbing obstacle capability of lunar rover with planetary wheel [J ]. Journal of Beijing University of Aeronautics and Astronautics, 2004, 30 (3) : [ 5 ],,,. [J ]. ( ), 2006, 34 (7) : XIAN G Xian2bo, XU Guo2hua, CAI Tao, et al. In2 telligent self2rescue system for underwater vehicles [J ]. Journal of Huazhong University of Science and Technology ( Natural Science Edition), 2006, 34 (7) : [ 6 ] Wang Z W, Cao Q X, L uan N, et al. Development of new pipeline maintenance system for repairing ear2 ly2built off shore oil pipelines [ C]/ / IEEE ICIT Piscataway, United States : IEEE Publishers, 2008 : 126. [ 7 ] Brooks R A. Robust layered control system for a mo2 bile robot [J ]. IEEE Journal of Robotics and Automa2 tion, 1986, 2 (1) : [ 8 ] Faress K N, EI Hagry M T, EI Kosy A A. Trajecto2 ry tracking control for a wheeled mobile robot using f uzzy logic controller [J ]. WSEAS Transactions on Systems, 2005, 4 (7) : [ 9 ] Wang X Q, Hou Z G, Zou A M, et al. A behavior controller based on spiking neural networks for mo2 bile robot s [J ]. Neurocomputing, 2008, 71 ( 426 ) : [10 ] Er M J, Tan T P, Loh S Y. Control of a mobile robot using generalized dynamic f uzzy neural net2 works [J ]. Microprocessors and Microsystems, 2004, 28 (9) : [11 ] Sutton R S, Barto A G. Reinforcement Learning : An Introduction [ M ]. Cambridge, MA : MIT Press, [12 ] Mataric M J. Reward functions for accelerated learn2 ing [ C]/ / Proceedings of the Eleventh International Conference on Machine Learning. San Francisco, CA : Morgan Kauf mann Publishers, 1994 : [13 ],. [J ].,2005,32 (3) : WEI Ying2zi, ZHAO Ming2yang. Design and conver2 gence analysis of a heuristic reward f unction for rein2 forcement learning algorithms [J ]. Computer Science, 2005, 32 (3) : Zernike CT,, (, ) : Zernike CT. Zernike,,.,,.

Replacing eligibility trace for action-value learning with function approximation

Replacing eligibility trace for action-value learning with function approximation Replacing eligibility trace for action-value learning with function approximation Kary FRÄMLING Helsinki University of Technology PL 5500, FI-02015 TKK - Finland Abstract. The eligibility trace is one

More information

ilstd: Eligibility Traces and Convergence Analysis

ilstd: Eligibility Traces and Convergence Analysis ilstd: Eligibility Traces and Convergence Analysis Alborz Geramifard Michael Bowling Martin Zinkevich Richard S. Sutton Department of Computing Science University of Alberta Edmonton, Alberta {alborz,bowling,maz,sutton}@cs.ualberta.ca

More information

Learning Control Under Uncertainty: A Probabilistic Value-Iteration Approach

Learning Control Under Uncertainty: A Probabilistic Value-Iteration Approach Learning Control Under Uncertainty: A Probabilistic Value-Iteration Approach B. Bischoff 1, D. Nguyen-Tuong 1,H.Markert 1 anda.knoll 2 1- Robert Bosch GmbH - Corporate Research Robert-Bosch-Str. 2, 71701

More information

Improving estimations of a robot s position and attitude w ith accelerom eter enhanced odometry

Improving estimations of a robot s position and attitude w ith accelerom eter enhanced odometry 4 2 Vol. 4. 2 2009 4 CAA I Transactions on Intelligent System s Ap r. 2009 1, 2, 3 (1., 100094 2., 150080 3. 304 2, 046012) :,,. Gyrodometry,,,,,. 1 /3, 1 /6,. : Gyrodometry : TP242 : A : 167324785 (2009)

More information

Optimal algorithm and application for point to point iterative learning control via updating reference trajectory

Optimal algorithm and application for point to point iterative learning control via updating reference trajectory 33 9 2016 9 DOI: 10.7641/CTA.2016.50970 Control Theory & Applications Vol. 33 No. 9 Sep. 2016,, (, 214122) :,.,,.,,,.. : ; ; ; ; : TP273 : A Optimal algorithm and application for point to point iterative

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

An application of the temporal difference algorithm to the truck backer-upper problem

An application of the temporal difference algorithm to the truck backer-upper problem ESANN 214 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 23-25 April 214, i6doc.com publ., ISBN 978-28741995-7. Available

More information

Open Theoretical Questions in Reinforcement Learning

Open Theoretical Questions in Reinforcement Learning Open Theoretical Questions in Reinforcement Learning Richard S. Sutton AT&T Labs, Florham Park, NJ 07932, USA, sutton@research.att.com, www.cs.umass.edu/~rich Reinforcement learning (RL) concerns the problem

More information

RL 3: Reinforcement Learning

RL 3: Reinforcement Learning RL 3: Reinforcement Learning Q-Learning Michael Herrmann University of Edinburgh, School of Informatics 20/01/2015 Last time: Multi-Armed Bandits (10 Points to remember) MAB applications do exist (e.g.

More information

Reinforcement Learning In Continuous Time and Space

Reinforcement Learning In Continuous Time and Space Reinforcement Learning In Continuous Time and Space presentation of paper by Kenji Doya Leszek Rybicki lrybicki@mat.umk.pl 18.07.2008 Leszek Rybicki lrybicki@mat.umk.pl Reinforcement Learning In Continuous

More information

An Adaptive Clustering Method for Model-free Reinforcement Learning

An Adaptive Clustering Method for Model-free Reinforcement Learning An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at

More information

In: Proc. BENELEARN-98, 8th Belgian-Dutch Conference on Machine Learning, pp 9-46, 998 Linear Quadratic Regulation using Reinforcement Learning Stephan ten Hagen? and Ben Krose Department of Mathematics,

More information

Enhancing a Model-Free Adaptive Controller through Evolutionary Computation

Enhancing a Model-Free Adaptive Controller through Evolutionary Computation Enhancing a Model-Free Adaptive Controller through Evolutionary Computation Anthony Clark, Philip McKinley, and Xiaobo Tan Michigan State University, East Lansing, USA Aquatic Robots Practical uses autonomous

More information

Value Function Approximation through Sparse Bayesian Modeling

Value Function Approximation through Sparse Bayesian Modeling Value Function Approximation through Sparse Bayesian Modeling Nikolaos Tziortziotis and Konstantinos Blekas Department of Computer Science, University of Ioannina, P.O. Box 1186, 45110 Ioannina, Greece,

More information

Multi-robot Formation Control Using Reinforcement Learning Method

Multi-robot Formation Control Using Reinforcement Learning Method Multi-robot Formation Control Using Reinforcement Learning Metho Guoyu Zuo, Jiatong Han, an Guansheng Han School of Electronic Information & Control Engineering, Beijing University of Technology, Beijing

More information

Reinforcement Learning for Continuous. Action using Stochastic Gradient Ascent. Hajime KIMURA, Shigenobu KOBAYASHI JAPAN

Reinforcement Learning for Continuous. Action using Stochastic Gradient Ascent. Hajime KIMURA, Shigenobu KOBAYASHI JAPAN Reinforcement Learning for Continuous Action using Stochastic Gradient Ascent Hajime KIMURA, Shigenobu KOBAYASHI Tokyo Institute of Technology, 4259 Nagatsuda, Midori-ku Yokohama 226-852 JAPAN Abstract:

More information

Reinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Reinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it

More information

Trajectory Planning Using High-Order Polynomials under Acceleration Constraint

Trajectory Planning Using High-Order Polynomials under Acceleration Constraint Journal of Optimization in Industrial Engineering (07) -6 Trajectory Planning Using High-Order Polynomials under Acceleration Constraint Hossein Barghi Jond a,*,vasif V. Nabiyev b, Rifat Benveniste c a

More information

Partially observable Markov decision processes. Department of Computer Science, Czech Technical University in Prague

Partially observable Markov decision processes. Department of Computer Science, Czech Technical University in Prague Partially observable Markov decision processes Jiří Kléma Department of Computer Science, Czech Technical University in Prague https://cw.fel.cvut.cz/wiki/courses/b4b36zui/prednasky pagenda Previous lecture:

More information

A Sliding Mode Control based on Nonlinear Disturbance Observer for the Mobile Manipulator

A Sliding Mode Control based on Nonlinear Disturbance Observer for the Mobile Manipulator International Core Journal of Engineering Vol.3 No.6 7 ISSN: 44-895 A Sliding Mode Control based on Nonlinear Disturbance Observer for the Mobile Manipulator Yanna Si Information Engineering College Henan

More information

self-driving car technology introduction

self-driving car technology introduction self-driving car technology introduction slide 1 Contents of this presentation 1. Motivation 2. Methods 2.1 road lane detection 2.2 collision avoidance 3. Summary 4. Future work slide 2 Motivation slide

More information

An online kernel-based clustering approach for value function approximation

An online kernel-based clustering approach for value function approximation An online kernel-based clustering approach for value function approximation N. Tziortziotis and K. Blekas Department of Computer Science, University of Ioannina P.O.Box 1186, Ioannina 45110 - Greece {ntziorzi,kblekas}@cs.uoi.gr

More information

Function Approximation for Continuous Constrained MDPs

Function Approximation for Continuous Constrained MDPs Function Approximation for Continuous Constrained MDPs Aditya Undurti, Alborz Geramifard, Jonathan P. How Abstract In this work we apply function approximation techniques to solve continuous, constrained

More information

THE system is controlled by various approaches using the

THE system is controlled by various approaches using the , 23-25 October, 2013, San Francisco, USA A Study on Using Multiple Sets of Particle Filters to Control an Inverted Pendulum Midori Saito and Ichiro Kobayashi Abstract The dynamic system is controlled

More information

An Introduction to Reinforcement Learning

An Introduction to Reinforcement Learning An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@csa.iisc.ernet.in Department of Computer Science and Automation Indian Institute of Science August 2014 What is Reinforcement

More information

Unifying Behavior-Based Control Design and Hybrid Stability Theory

Unifying Behavior-Based Control Design and Hybrid Stability Theory 9 American Control Conference Hyatt Regency Riverfront St. Louis MO USA June - 9 ThC.6 Unifying Behavior-Based Control Design and Hybrid Stability Theory Vladimir Djapic 3 Jay Farrell 3 and Wenjie Dong

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

On-line Reinforcement Learning for Nonlinear Motion Control: Quadratic and Non-Quadratic Reward Functions

On-line Reinforcement Learning for Nonlinear Motion Control: Quadratic and Non-Quadratic Reward Functions Preprints of the 19th World Congress The International Federation of Automatic Control Cape Town, South Africa. August 24-29, 214 On-line Reinforcement Learning for Nonlinear Motion Control: Quadratic

More information

Neural Map. Structured Memory for Deep RL. Emilio Parisotto

Neural Map. Structured Memory for Deep RL. Emilio Parisotto Neural Map Structured Memory for Deep RL Emilio Parisotto eparisot@andrew.cmu.edu PhD Student Machine Learning Department Carnegie Mellon University Supervised Learning Most deep learning problems are

More information

Attitude control of a hopping robot: a feasibility study

Attitude control of a hopping robot: a feasibility study Attitude control of a hopping robot: a feasibility study Tammepõld, R. and Kruusmaa, M. Tallinn University of Technology, Estonia Fiorini, P. University of Verona, Italy Summary Motivation Small mobile

More information

Short Course: Multiagent Systems. Multiagent Systems. Lecture 1: Basics Agents Environments. Reinforcement Learning. This course is about:

Short Course: Multiagent Systems. Multiagent Systems. Lecture 1: Basics Agents Environments. Reinforcement Learning. This course is about: Short Course: Multiagent Systems Lecture 1: Basics Agents Environments Reinforcement Learning Multiagent Systems This course is about: Agents: Sensing, reasoning, acting Multiagent Systems: Distributed

More information

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti 1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early

More information

Transfer of Neuroevolved Controllers in Unstable Domains

Transfer of Neuroevolved Controllers in Unstable Domains In Proceedings of the Genetic Evolutionary Computation Conference (GECCO 24) Transfer of Neuroevolved Controllers in Unstable Domains Faustino J. Gomez and Risto Miikkulainen Department of Computer Sciences,

More information

Terramechanics Based Analysis and Motion Control of Rovers on Simulated Lunar Soil

Terramechanics Based Analysis and Motion Control of Rovers on Simulated Lunar Soil ICRA '07 Space Robotics Workshop 14 April, 2007 Terramechanics Based Analysis and Motion Control of Rovers on Simulated Lunar Soil Kazuya Yoshida and Keiji Nagatani Dept. Aerospace Engineering Graduate

More information

Two-stage Pedestrian Detection Based on Multiple Features and Machine Learning

Two-stage Pedestrian Detection Based on Multiple Features and Machine Learning 38 3 Vol. 38, No. 3 2012 3 ACTA AUTOMATICA SINICA March, 2012 1 1 1, (Adaboost) (Support vector machine, SVM). (Four direction features, FDF) GAB (Gentle Adaboost) (Entropy-histograms of oriented gradients,

More information

An Optimal Tracking Approach to Formation Control of Nonlinear Multi-Agent Systems

An Optimal Tracking Approach to Formation Control of Nonlinear Multi-Agent Systems AIAA Guidance, Navigation, and Control Conference 13-16 August 212, Minneapolis, Minnesota AIAA 212-4694 An Optimal Tracking Approach to Formation Control of Nonlinear Multi-Agent Systems Ali Heydari 1

More information

CS599 Lecture 1 Introduction To RL

CS599 Lecture 1 Introduction To RL CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming

More information

Dialogue management: Parametric approaches to policy optimisation. Dialogue Systems Group, Cambridge University Engineering Department

Dialogue management: Parametric approaches to policy optimisation. Dialogue Systems Group, Cambridge University Engineering Department Dialogue management: Parametric approaches to policy optimisation Milica Gašić Dialogue Systems Group, Cambridge University Engineering Department 1 / 30 Dialogue optimisation as a reinforcement learning

More information

Variable Geometry Single-Tracked Mechanism for a Rescue Robot

Variable Geometry Single-Tracked Mechanism for a Rescue Robot Proceedings of the 005 IEEE International Workshop on Safety, Security and Rescue Robotics Kobe, Japan, June 005 Variable Geometry Single-Tracked Mechanism for a Rescue Robot Sung Kyun Lim 373- Guseong-dong,

More information

Analysis and Design of Hybrid AI/Control Systems

Analysis and Design of Hybrid AI/Control Systems Analysis and Design of Hybrid AI/Control Systems Glen Henshaw, PhD (formerly) Space Systems Laboratory University of Maryland,College Park 13 May 2011 Dynamically Complex Vehicles Increased deployment

More information

On the average diameter of directed loop networks

On the average diameter of directed loop networks 34 2 Vol.34 No. 2 203 2 Journal on Communications February 203 doi:0.3969/j.issn.000-436x.203.02.07 2. 243002 2. 24304 L- 4 a b p q 2 L- TP393.0 B 000-436X(203)02-038-09 On the average diameter of directed

More information

Visual Anomaly Detection from Small Samples for Mobile Robots. *The University of Tokyo, Japan Graduate School of Information Science and Technology

Visual Anomaly Detection from Small Samples for Mobile Robots. *The University of Tokyo, Japan Graduate School of Information Science and Technology Visual Anomaly Detection from Small Samples for Mobile Robots Hiroharu Kato*, Tatsuya Harada *,**, Yasuo Kuniyoshi* *The University of Tokyo, Japan Graduate School of Information Science and Technology

More information

Consensus Tracking for Multi-Agent Systems with Nonlinear Dynamics under Fixed Communication Topologies

Consensus Tracking for Multi-Agent Systems with Nonlinear Dynamics under Fixed Communication Topologies Proceedings of the World Congress on Engineering and Computer Science Vol I WCECS, October 9-,, San Francisco, USA Consensus Tracking for Multi-Agent Systems with Nonlinear Dynamics under Fixed Communication

More information

Con struction and applica tion of m odeling tendency of land type tran sition ba sed on spa tia l adjacency

Con struction and applica tion of m odeling tendency of land type tran sition ba sed on spa tia l adjacency 29 1 2009 1 ACTA ECOLOGICA SIN ICA Vol. 29, No. 1 Jan., 2009 1, 2, 3, 1, 3 (1., 250014; 2. ( ), 100083; 3., 100101) :,, 2000 2005,,,,, : ; ; ; : 100020933 (2009) 0120337207 : F323. 1, Q147, S126 : A Con

More information

Animal learning theory

Animal learning theory Animal learning theory Based on [Sutton and Barto, 1990, Dayan and Abbott, 2001] Bert Kappen [Sutton and Barto, 1990] Classical conditioning: - A conditioned stimulus (CS) and unconditioned stimulus (US)

More information

Research on Balance of Unmanned Aerial Vehicle with Intelligent Algorithms for Optimizing Four-Rotor Differential Control

Research on Balance of Unmanned Aerial Vehicle with Intelligent Algorithms for Optimizing Four-Rotor Differential Control 2019 2nd International Conference on Computer Science and Advanced Materials (CSAM 2019) Research on Balance of Unmanned Aerial Vehicle with Intelligent Algorithms for Optimizing Four-Rotor Differential

More information

STEADY-STATE MEAN SQUARE PERFORMANCE OF A SPARSIFIED KERNEL LEAST MEAN SQUARE ALGORITHM.

STEADY-STATE MEAN SQUARE PERFORMANCE OF A SPARSIFIED KERNEL LEAST MEAN SQUARE ALGORITHM. STEADY-STATE MEAN SQUARE PERFORMANCE OF A SPARSIFIED KERNEL LEAST MEAN SQUARE ALGORITHM Badong Chen 1, Zhengda Qin 1, Lei Sun 2 1 Institute of Artificial Intelligence and Robotics, Xi an Jiaotong University,

More information

Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning

Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning Vivek Veeriah Dept. of Computing Science University of Alberta Edmonton, Canada vivekveeriah@ualberta.ca Harm van Seijen

More information

Model Reference Adaptive Control of Underwater Robotic Vehicle in Plane Motion

Model Reference Adaptive Control of Underwater Robotic Vehicle in Plane Motion Proceedings of the 11th WSEAS International Conference on SSTEMS Agios ikolaos Crete Island Greece July 23-25 27 38 Model Reference Adaptive Control of Underwater Robotic Vehicle in Plane Motion j.garus@amw.gdynia.pl

More information

Generalization and Function Approximation

Generalization and Function Approximation Generalization and Function Approximation 0 Generalization and Function Approximation Suggested reading: Chapter 8 in R. S. Sutton, A. G. Barto: Reinforcement Learning: An Introduction MIT Press, 1998.

More information

Methods for Interpretable Treatment Regimes

Methods for Interpretable Treatment Regimes Methods for Interpretable Treatment Regimes John Sperger University of North Carolina at Chapel Hill May 4 th 2018 Sperger (UNC) Interpretable Treatment Regimes May 4 th 2018 1 / 16 Why Interpretable Methods?

More information

STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING. Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo

STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING. Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo Department of Electrical and Computer Engineering, Nagoya Institute of Technology

More information

The Application of The Steepest Gradient Descent for Control Design of Dubins Car for Tracking a Desired Path

The Application of The Steepest Gradient Descent for Control Design of Dubins Car for Tracking a Desired Path J. Math. and Its Appl. ISSN: 1829-605X Vol. 4, No. 1, May 2007, 1 8 The Application of The Steepest Gradient Descent for Control Design of Dubins Car for Tracking a Desired Path Miswanto 1, I. Pranoto

More information

The convergence limit of the temporal difference learning

The convergence limit of the temporal difference learning The convergence limit of the temporal difference learning Ryosuke Nomura the University of Tokyo September 3, 2013 1 Outline Reinforcement Learning Convergence limit Construction of the feature vector

More information

Target Tracking and Obstacle Avoidance for Multi-agent Systems

Target Tracking and Obstacle Avoidance for Multi-agent Systems International Journal of Automation and Computing 7(4), November 2010, 550-556 DOI: 10.1007/s11633-010-0539-z Target Tracking and Obstacle Avoidance for Multi-agent Systems Jing Yan 1 Xin-Ping Guan 1,2

More information

Prof. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be

Prof. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING AN INTRODUCTION Prof. Dr. Ann Nowé Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING WHAT IS IT? What is it? Learning from interaction Learning about, from, and while

More information

DYNAMICS AND CONTROL OF PIEZOELECTRIC- BASED STEWART PLATFORM FOR VIBRATION ISOLA- TION

DYNAMICS AND CONTROL OF PIEZOELECTRIC- BASED STEWART PLATFORM FOR VIBRATION ISOLA- TION DYNAMICS AND CONTROL OF PIEZOELECTRIC- BASED STEWART PLATFORM FOR VIBRATION ISOLA- TION Jinjun Shan, Xiaobu Xue, Ziliang Kang Department of Earth and Space Science and Engineering, York University 47 Keele

More information

Reinforcement. Function Approximation. Learning with KATJA HOFMANN. Researcher, MSR Cambridge

Reinforcement. Function Approximation. Learning with KATJA HOFMANN. Researcher, MSR Cambridge Reinforcement Learning with Function Approximation KATJA HOFMANN Researcher, MSR Cambridge Representation and Generalization in RL Focus on training stability Learning generalizable value functions Navigating

More information

RL 14: POMDPs continued

RL 14: POMDPs continued RL 14: POMDPs continued Michael Herrmann University of Edinburgh, School of Informatics 06/03/2015 POMDPs: Points to remember Belief states are probability distributions over states Even if computationally

More information

An Introduction to Reinforcement Learning

An Introduction to Reinforcement Learning An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@cse.iitb.ac.in Department of Computer Science and Engineering Indian Institute of Technology Bombay April 2018 What is Reinforcement

More information

REINFORCEMENT learning (RL) is a machine learning

REINFORCEMENT learning (RL) is a machine learning 146 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 22, NO. 1, JANUARY 214 Kernel-Based Approximate Dynamic Programming for Real-Time Online Learning Control: An Experimental Study Xin Xu, Senior

More information

Vol. 22 No. 324 Aug JOU RNAL OF EXPERIMEN TAL MECHANICS : (2007) 03 & : O34 ; X593 : A. , Tewanim [3 6 ], [12 ] 1.

Vol. 22 No. 324 Aug JOU RNAL OF EXPERIMEN TAL MECHANICS : (2007) 03 & : O34 ; X593 : A. , Tewanim [3 6 ], [12 ] 1. 22 324 2007 8 JOU RNAL OF EXPERIMEN TAL MECHANICS Vol. 22 No. 324 Aug. 2007 :100124888 (2007) 03 &0420429206 3,,,, (,, 230027) :,,,,,,,, 25dB : ; ; ; : O34 ; X593 : A 0 1911 [1 ], ( ),,,,,,,,,,,,,,,, Tewanim,

More information

Machine Learning I Reinforcement Learning

Machine Learning I Reinforcement Learning Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:

More information

Optimal Convergence in Multi-Agent MDPs

Optimal Convergence in Multi-Agent MDPs Optimal Convergence in Multi-Agent MDPs Peter Vrancx 1, Katja Verbeeck 2, and Ann Nowé 1 1 {pvrancx, ann.nowe}@vub.ac.be, Computational Modeling Lab, Vrije Universiteit Brussel 2 k.verbeeck@micc.unimaas.nl,

More information

Using Expectation-Maximization for Reinforcement Learning

Using Expectation-Maximization for Reinforcement Learning NOTE Communicated by Andrew Barto and Michael Jordan Using Expectation-Maximization for Reinforcement Learning Peter Dayan Department of Brain and Cognitive Sciences, Center for Biological and Computational

More information

Regularization and Feature Selection in. the Least-Squares Temporal Difference

Regularization and Feature Selection in. the Least-Squares Temporal Difference Regularization and Feature Selection in Least-Squares Temporal Difference Learning J. Zico Kolter kolter@cs.stanford.edu Andrew Y. Ng ang@cs.stanford.edu Computer Science Department, Stanford University,

More information

Reinforcement learning an introduction

Reinforcement learning an introduction Reinforcement learning an introduction Prof. Dr. Ann Nowé Computational Modeling Group AIlab ai.vub.ac.be November 2013 Reinforcement Learning What is it? Learning from interaction Learning about, from,

More information

Reading Response: Due Wednesday. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

Reading Response: Due Wednesday. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Reading Response: Due Wednesday R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Another Example Get to the top of the hill as quickly as possible. reward = 1 for each step where

More information

1373. Structural synthesis for broken strands repair operation metamorphic mechanism of EHV transmission lines

1373. Structural synthesis for broken strands repair operation metamorphic mechanism of EHV transmission lines 1373. Structural synthesis for broken strands repair operation metamorphic mechanism of EHV transmission lines Q. Yang 1 H. G. Wang 2 S. J. Li 3 1 3 College of Mechanical Engineering and Automation Northeastern

More information

Awareness-Based Decision Making for Search and Tracking

Awareness-Based Decision Making for Search and Tracking 28 American Control Conference Westin Seattle Hotel, Seattle, Washington, USA June 11-13, 28 ThC7.6 Awareness-Based Decision Making for Search and Tracking Y. Wang, I. I. Hussein, R. Scott Erwin Abstract

More information

Opposition-Based Reinforcement Learning in the Management of Water Resources

Opposition-Based Reinforcement Learning in the Management of Water Resources Proceedings of the 7 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 7) Opposition-Based Reinforcement Learning in the Management of Water Resources M. Mahootchi, H.

More information

Lecture 3: The Reinforcement Learning Problem

Lecture 3: The Reinforcement Learning Problem Lecture 3: The Reinforcement Learning Problem Objectives of this lecture: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

Applying Particle Swarm Optimization to Adaptive Controller Leandro dos Santos Coelho 1 and Fabio A. Guerra 2

Applying Particle Swarm Optimization to Adaptive Controller Leandro dos Santos Coelho 1 and Fabio A. Guerra 2 Applying Particle Swarm Optimization to Adaptive Controller Leandro dos Santos Coelho 1 and Fabio A. Guerra 2 1 Production and Systems Engineering Graduate Program, PPGEPS Pontifical Catholic University

More information

A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games

A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games International Journal of Fuzzy Systems manuscript (will be inserted by the editor) A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games Mostafa D Awheda Howard M Schwartz Received:

More information

A Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty

A Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty 2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of

More information

Formation Control of Mobile Robots with Obstacle Avoidance using Fuzzy Artificial Potential Field

Formation Control of Mobile Robots with Obstacle Avoidance using Fuzzy Artificial Potential Field Formation Control of Mobile Robots with Obstacle Avoidance using Fuzzy Artificial Potential Field Abbas Chatraei Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad,

More information

A Control Lyapunov Function Approach to Multiagent Coordination

A Control Lyapunov Function Approach to Multiagent Coordination IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL. 18, NO. 5, OCTOBER 2002 847 A Control Lyapunov Function Approach to Multiagent Coordination Petter Ögren, Magnus Egerstedt, Member, IEEE, and Xiaoming

More information

Chapter 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

Chapter 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Chapter 7: Eligibility Traces R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Midterm Mean = 77.33 Median = 82 R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction

More information

REINFORCEMENT LEARNING

REINFORCEMENT LEARNING REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents

More information

Neuromorphic Sensing and Control of Autonomous Micro-Aerial Vehicles

Neuromorphic Sensing and Control of Autonomous Micro-Aerial Vehicles Neuromorphic Sensing and Control of Autonomous Micro-Aerial Vehicles Commercialization Fellowship Technical Presentation Taylor Clawson Ithaca, NY June 12, 2018 About Me: Taylor Clawson 3 rd year PhD student

More information

Ensembles of Extreme Learning Machine Networks for Value Prediction

Ensembles of Extreme Learning Machine Networks for Value Prediction Ensembles of Extreme Learning Machine Networks for Value Prediction Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas, Joan Vila-Francés, José D. Martín-Guerrero IDAL, Intelligent

More information

NONLINEAR PATH CONTROL FOR A DIFFERENTIAL DRIVE MOBILE ROBOT

NONLINEAR PATH CONTROL FOR A DIFFERENTIAL DRIVE MOBILE ROBOT NONLINEAR PATH CONTROL FOR A DIFFERENTIAL DRIVE MOBILE ROBOT Plamen PETROV Lubomir DIMITROV Technical University of Sofia Bulgaria Abstract. A nonlinear feedback path controller for a differential drive

More information

Scheduled Perception for Energy-Efficient Path Following

Scheduled Perception for Energy-Efficient Path Following Scheduled Perception for Energy-Efficient Path Following Peter Ondrúška, Corina Gurău, Letizia Marchegiani, Chi Hay Tong and Ingmar Posner Abstract This paper explores the idea of reducing a robot s energy

More information

ELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems

ELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems ELEC4631 s Lecture 2: Dynamic Control Systems 7 March 2011 Overview of dynamic control systems Goals of Controller design Autonomous dynamic systems Linear Multi-input multi-output (MIMO) systems Bat flight

More information

Advanced Probes for Planetary Surface and Subsurface Exploration

Advanced Probes for Planetary Surface and Subsurface Exploration Workshop on Space Robotics, ICRA 2011 Advanced Probes for Planetary Surface and Subsurface Exploration Takashi Kubota (JAXA/ISAS/JSPEC) Hayato Omori, Taro Nakamura (Chuo Univ.) JAXA Space Exploration Program

More information

arxiv: v1 [cs.ai] 5 Nov 2017

arxiv: v1 [cs.ai] 5 Nov 2017 arxiv:1711.01569v1 [cs.ai] 5 Nov 2017 Markus Dumke Department of Statistics Ludwig-Maximilians-Universität München markus.dumke@campus.lmu.de Abstract Temporal-difference (TD) learning is an important

More information

1. 1 M oo 3-TiO 2gSiO 2. Perk in E lm er2l am bda 35 UV 2V is Spectrom eter : E2m ail: tp ṫ tj. cn

1. 1 M oo 3-TiO 2gSiO 2. Perk in E lm er2l am bda 35 UV 2V is Spectrom eter : E2m ail: tp ṫ tj. cn V ol. 25 N o. 6 2 0 0 4 6 CHEM ICAL JOU RNAL O F CH IN ESE UN IV ERS IT IES 11151119 M oo 3-TiO 2gSiO 2,,,,, (, 300072) M oo 32T io 2gSiO 2, XRD, BET, TPR, IR, UVDR S TPD,,,, ; T io 2 SiO 2 M oo 3,,. 100,

More information

Value Function Approximation in Reinforcement Learning using the Fourier Basis

Value Function Approximation in Reinforcement Learning using the Fourier Basis Value Function Approximation in Reinforcement Learning using the Fourier Basis George Konidaris Sarah Osentoski Technical Report UM-CS-28-19 Autonomous Learning Laboratory Computer Science Department University

More information

Progress In Electromagnetics Research M, Vol. 21, 33 45, 2011

Progress In Electromagnetics Research M, Vol. 21, 33 45, 2011 Progress In Electromagnetics Research M, Vol. 21, 33 45, 211 INTERFEROMETRIC ISAR THREE-DIMENSIONAL IMAGING USING ONE ANTENNA C. L. Liu *, X. Z. Gao, W. D. Jiang, and X. Li College of Electronic Science

More information

Markov Models and Reinforcement Learning. Stephen G. Ware CSCI 4525 / 5525

Markov Models and Reinforcement Learning. Stephen G. Ware CSCI 4525 / 5525 Markov Models and Reinforcement Learning Stephen G. Ware CSCI 4525 / 5525 Camera Vacuum World (CVW) 2 discrete rooms with cameras that detect dirt. A mobile robot with a vacuum. The goal is to ensure both

More information

Reinforcement Learning, Neural Networks and PI Control Applied to a Heating Coil

Reinforcement Learning, Neural Networks and PI Control Applied to a Heating Coil Reinforcement Learning, Neural Networks and PI Control Applied to a Heating Coil Charles W. Anderson 1, Douglas C. Hittle 2, Alon D. Katz 2, and R. Matt Kretchmar 1 1 Department of Computer Science Colorado

More information

Decentralized Control of Nonlinear Multi-Agent Systems Using Single Network Adaptive Critics

Decentralized Control of Nonlinear Multi-Agent Systems Using Single Network Adaptive Critics Decentralized Control of Nonlinear Multi-Agent Systems Using Single Network Adaptive Critics Ali Heydari Mechanical & Aerospace Engineering Dept. Missouri University of Science and Technology Rolla, MO,

More information

The Optimal Reward Baseline for Gradient Based Reinforcement Learning

The Optimal Reward Baseline for Gradient Based Reinforcement Learning 538 WEAVER& TAO UAI2001 The Optimal Reward Baseline for Gradient Based Reinforcement Learning Lex Weaver Department of Computer Science Australian National University ACT AUSTRALIA 0200 Lex. Weaver@cs.anu.edu.au

More information

QUICR-learning for Multi-Agent Coordination

QUICR-learning for Multi-Agent Coordination QUICR-learning for Multi-Agent Coordination Adrian K. Agogino UCSC, NASA Ames Research Center Mailstop 269-3 Moffett Field, CA 94035 adrian@email.arc.nasa.gov Kagan Tumer NASA Ames Research Center Mailstop

More information

Variable Structure Control of Pendulum-driven Spherical Mobile Robots

Variable Structure Control of Pendulum-driven Spherical Mobile Robots rd International Conerence on Computer and Electrical Engineering (ICCEE ) IPCSIT vol. 5 () () IACSIT Press, Singapore DOI:.776/IPCSIT..V5.No..6 Variable Structure Control o Pendulum-driven Spherical Mobile

More information

Face Recognition Using Global Gabor Filter in Small Sample Case *

Face Recognition Using Global Gabor Filter in Small Sample Case * ISSN 1673-9418 CODEN JKYTA8 E-mail: fcst@public2.bta.net.cn Journal of Frontiers of Computer Science and Technology http://www.ceaj.org 1673-9418/2010/04(05)-0420-06 Tel: +86-10-51616056 DOI: 10.3778/j.issn.1673-9418.2010.05.004

More information

A Sliding Mode Controller Using Neural Networks for Robot Manipulator

A Sliding Mode Controller Using Neural Networks for Robot Manipulator ESANN'4 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), 8-3 April 4, d-side publi., ISBN -9337-4-8, pp. 93-98 A Sliding Mode Controller Using Neural Networks for Robot

More information

Development of a Deep Recurrent Neural Network Controller for Flight Applications

Development of a Deep Recurrent Neural Network Controller for Flight Applications Development of a Deep Recurrent Neural Network Controller for Flight Applications American Control Conference (ACC) May 26, 2017 Scott A. Nivison Pramod P. Khargonekar Department of Electrical and Computer

More information

On and Off-Policy Relational Reinforcement Learning

On and Off-Policy Relational Reinforcement Learning On and Off-Policy Relational Reinforcement Learning Christophe Rodrigues, Pierre Gérard, and Céline Rouveirol LIPN, UMR CNRS 73, Institut Galilée - Université Paris-Nord first.last@lipn.univ-paris13.fr

More information

Cooperative Control and Mobile Sensor Networks

Cooperative Control and Mobile Sensor Networks Cooperative Control and Mobile Sensor Networks Cooperative Control, Part I, A-C Naomi Ehrich Leonard Mechanical and Aerospace Engineering Princeton University and Electrical Systems and Automation University

More information