Reactive Self-rescue Control for Autonomous Mobile Robot Based on Reinforcement Learning



JOURNAL OF SHANGHAI JIAOTONG UNIVERSITY
Vol. 43 No. 11, Nov. 2009

CLC number: TP 242    Document code: A

WANG Zhong-wei, CAO Qi-xin, LUAN Nan, ZHANG Lei
(Research Institute of Robotics, Shanghai Jiaotong University, Shanghai, China)

Abstract: A control technique based on reinforcement learning is proposed that lets an autonomous mobile robot rescue itself from an obstacle environment. The motion control strategy for self-rescue is obtained online through interaction between the mobile robot and the obstacle environment, so the robot can overcome the obstacle independently and avoid the losses caused by rescue operations and task failure. Prior knowledge of the working environment is used to direct the design of a heuristic reward function for the reinforcement learning system, which guarantees the correct direction of searching for and learning the control strategy. Simulation experiments indicate that self-rescue through a self-learned control strategy is feasible.

Key words: autonomous mobile robot; reactive control; Q-learning; heuristic reward function

Funding: 863 Program (2006AA040203)

[Opening of the introduction: body text not recoverable from the extraction. Surviving citations: [1-2], [3-4], [5-6].]

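[Introduction, continued: body text not recoverable from the extraction. Surviving citations: Brooks [7]; [8], [9], [10]; reinforcement learning (RL) and the agent [11]; RL [12-13].]

1 [section title lost]

[Body text lost. The section introduces the mountain-car task of Fig. 1, in which the points S, A and G are marked.]

Fig. 1 Schematic of the mountain car in an obstacle environment [11]

$$ x_{t+1} = \mathrm{bound}[\,x_t + v_{t+1}\,] $$
$$ v_{t+1} = \mathrm{bound}[\,v_t + \ldots\,u - g\cos(3x_t)\,] \qquad (1) $$

Here x is the position and v the velocity of the car. The control input u takes 3 values, $u \in \{+1, -1, 0\}$. The coefficient multiplying u and the value of g are missing from the extracted text. The operator bound[·] clips x and v to their admissible ranges: at the lower limit of x (value lost) the position is held at that limit and $v_{t+1} = 0$; the upper limit of x is 0.5; v lies in an interval whose upper limit is 0.07 (lower limit lost).

2 [section title lost]

[Body text lost. The section introduces Q-learning [11] for the agent.] At time t the agent takes action $a_t$, the state changes from $s_t$ to $s_{t+1}$, and the agent receives the reward $r_t$ for taking $a_t$ in $s_t$. The Q value of the pair $(s_t, a_t)$ is then

$$ Q_{t+1}(s_t, a_t) = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) \qquad (2) $$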
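The dynamics of Eq. (1) can be sketched in a few lines of Python. Because the numeric constants did not survive in the text, this sketch assumes the standard mountain-car values from Sutton and Barto [11]: a thrust coefficient of 0.001, g = 0.0025, x in [-1.2, 0.5] and v in [-0.07, 0.07]. The names `bound`, `step`, `THRUST` and `G` are illustrative and are not taken from the paper.

```python
import math

# Assumed constants (standard mountain-car values from Sutton and Barto);
# the paper's own numbers are missing from the extracted text.
X_MIN, X_MAX = -1.2, 0.5
V_MIN, V_MAX = -0.07, 0.07
THRUST = 0.001   # assumed coefficient of the control input u
G = 0.0025       # assumed value of g in Eq. (1)


def bound(value, low, high):
    """Clip value to the closed interval [low, high]."""
    return max(low, min(high, value))


def step(x, v, u):
    """Advance the car by one time step under Eq. (1), with u in {+1, -1, 0}."""
    v_next = bound(v + THRUST * u - G * math.cos(3.0 * x), V_MIN, V_MAX)
    x_next = bound(x + v_next, X_MIN, X_MAX)
    if x_next == X_MIN:
        v_next = 0.0  # velocity is reset to zero at the left boundary
    return x_next, v_next
```

Calling `step(-0.5, 0.0, 0)` from rest near the valley floor yields a small negative velocity, which illustrates why the car cannot simply drive up the slope and has to build momentum first.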

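[Body text lost. The surviving fragments refer to the Q table and the agent.]

For this task Sutton [11] uses the reward function

$$ r(t) = \begin{cases} -1, & x < \text{[value lost]} \\ \text{[value lost]}, & x \ge 0.5 \end{cases} \qquad (3) $$

[Body text lost. The surviving fragments assign a reward of +2 to one case and +1 to another.] The heuristic reward function is

$$ r(t) = \begin{cases} +2, & (x_{t+1} - x_t)\,u \ge 0,\ v(t) \ge 0 \\ +1, & (x_{t+1} - x_t)\,u \ge 0,\ v(t) < 0 \\ -1, & \text{otherwise} \end{cases} \qquad (4) $$

The Q-learning procedure has four steps:
(1) [text lost].
(2) Observe the current state $s_t$ [remaining text lost; it refers to Q and to the quantities $1 - \alpha_t$ and $\alpha_t$].
(3) Observe the next state $s_{t+1}$ and compute $r_t$ from Eq. (4).
(4) Use $r_t$ to update Q:

$$ Q_t(s, a) = \begin{cases} (1 - \alpha_t)\,Q_{t-1}(s_t, a_t) + \alpha_t\,[\,r_t + \gamma V(s_{t+1})\,], & s = s_t,\ a = a_t \\ Q_{t-1}(s, a), & \text{otherwise} \end{cases} $$

where $V(s_{t+1}) = \max_{a \in A} Q_{t-1}(s_{t+1}, a)$.

3 [section title lost]

Fig. 2 Framework of Q-learning based on heuristic reward function

[Body text lost. The surviving fragments refer to Fig. 1, Fig. 3, Q-learning and RL.] Parameter settings of the Q-learning algorithm: [symbol lost] = 0.5, [symbol lost] = 0.95, [symbol lost] = [value lost]; initial state x = [value lost], v = [value lost]. Fig. 3(a) and Fig. 3(b) are discussed here; the surviving numbers are 200 and, in a parenthetical note that mentions the goal G, 1 000.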
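The heuristic reward of Eq. (4) and the Q update above can be sketched as follows. Two points are assumptions rather than facts from the paper: the comparison signs in Eq. (4) are read as "greater than or equal to", and the surviving values 0.5 and 0.95 are mapped to the learning rate and the discount factor. The function names and the dictionary-based Q table are hypothetical.

```python
from collections import defaultdict

ACTIONS = (+1, -1, 0)  # control input u from Eq. (1)


def heuristic_reward(x, x_next, u, v):
    """Reward of Eq. (4); the >= comparisons are an assumed reading."""
    if (x_next - x) * u >= 0:
        # Control input agrees with the direction of motion (or is zero).
        return 2.0 if v >= 0 else 1.0
    return -1.0


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.95):
    """One tabular update: Q <- (1 - alpha) Q + alpha [r + gamma V(s_next)]."""
    v_next = max(q[(s_next, b)] for b in ACTIONS)  # V(s_next) = max_a Q(s_next, a)
    q[(s, a)] = (1.0 - alpha) * q[(s, a)] + alpha * (r + gamma * v_next)
    return q[(s, a)]


# Unseen (state, action) pairs start at 0; this initial value is an assumption.
q_table = defaultdict(float)
```

Under this reading a zero control input always satisfies the first condition, so coasting earns a positive reward. Whether the paper intends that behaviour cannot be confirmed from the surviving text.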

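Fig. 3 Variation curves of the position and velocity of the mountain car with different reward functions: panels (a) and (b) [panel titles lost]

[Body text lost. The surviving fragments compare Q-learning under the two reward functions and introduce a parameter that is a function of the trial number n, studied in Fig. 4.]

Fig. 4 Relationship between the number of trials the mountain car needs to overcome the obstacle and the parameters of the control algorithm: (a) [symbol lost] = 0.1; (b) [symbol lost] = 0.3; (c) [symbol lost] = 0.5; (d) [symbol lost] = [function of n, lost]

[Body text lost. The only surviving number is 20.]

4 [section title lost]

[Body text lost. The surviving fragments refer to Q-learning.]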
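A comparison of the kind shown in Fig. 3 and Fig. 4 can be set up with a short tabular Q-learning loop that records how many steps each trial takes. This is a minimal sketch under several assumptions that are not confirmed by the surviving text: the standard mountain-car constants from [11], a start state of x = -0.5 and v = 0, a 20 x 20 state grid, epsilon-greedy exploration with epsilon = 0.1, a constant -1 step penalty standing in for Eq. (3), and the reading of 200 as the number of trials and 1 000 as the step limit per trial. All function and parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict

ACTIONS = (+1, -1, 0)


def step(x, v, u):
    """Mountain-car dynamics of Eq. (1) with assumed standard constants."""
    v = max(-0.07, min(0.07, v + 0.001 * u - 0.0025 * math.cos(3.0 * x)))
    x = max(-1.2, min(0.5, x + v))
    return (x, 0.0) if x == -1.2 else (x, v)


def reward(x, x_next, u, v, heuristic):
    """Eq. (4) if heuristic is True, else a constant step penalty."""
    if not heuristic:
        return -1.0
    if (x_next - x) * u >= 0:  # assumed reading of the lost comparison signs
        return 2.0 if v >= 0 else 1.0
    return -1.0


def cell(x, v, bins=20):
    """Map a continuous state to a grid cell (discretization is an assumption)."""
    i = min(bins - 1, int((x + 1.2) / 1.7 * bins))
    j = min(bins - 1, int((v + 0.07) / 0.14 * bins))
    return i, j


def run(heuristic, trials=200, max_steps=1000, alpha=0.5, gamma=0.95,
        epsilon=0.1, seed=0):
    """Return the number of steps used in each trial."""
    rng = random.Random(seed)
    q = defaultdict(float)
    steps_per_trial = []
    for _ in range(trials):
        x, v, n = -0.5, 0.0, 0  # assumed start state
        while x < 0.5 and n < max_steps:
            s = cell(x, v)
            if rng.random() < epsilon:
                u = rng.choice(ACTIONS)  # explore
            else:
                u = max(ACTIONS, key=lambda a: q[(s, a)])  # exploit
            x_next, v_next = step(x, v, u)
            r = reward(x, x_next, u, v, heuristic)
            s_next = cell(x_next, v_next)
            # No bootstrapping from the goal state.
            best_next = 0.0 if x_next >= 0.5 else max(
                q[(s_next, a)] for a in ACTIONS)
            q[(s, u)] = (1 - alpha) * q[(s, u)] + alpha * (r + gamma * best_next)
            x, v, n = x_next, v_next, n + 1
        steps_per_trial.append(n)
    return steps_per_trial
```

Running `run(True)` and `run(False)` with the same seed gives two step-count curves that can be plotted against the trial index. Changing `alpha` to 0.1, 0.3 or 0.5 would give a rough analogue of the Fig. 4 panels, if the varied parameter there is indeed the learning rate.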

References:
[1] Tsalatsanis A, Valavanis K, Tsourveloudis N. Mobile robot navigation using sonar and range measurements from uncalibrated cameras [J]. Journal of Intelligent and Robotic Systems, 2007, 48(2).
[2] Mata M, Armingol J M, Fernández J, et al. Object learning and detection using evolutionary deformable models for mobile robot navigation [J]. Robotica, 2008, 26(1).
[3] Lan G P, Ma S G. Development of a novel crawler mechanism with polymorphic locomotion [J]. Advanced Robotics, 2007, 21(3/4).
[4] DENG Zong-quan, GAO Hai-bo, WANG Shao-chun, et al. Analysis of climbing obstacle capability of lunar rover with planetary wheel [J]. Journal of Beijing University of Aeronautics and Astronautics, 2004, 30(3).
[5] XIANG Xian-bo, XU Guo-hua, CAI Tao, et al. Intelligent self-rescue system for underwater vehicles [J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2006, 34(7).
[6] Wang Z W, Cao Q X, Luan N, et al. Development of new pipeline maintenance system for repairing early-built offshore oil pipelines [C]// IEEE ICIT. Piscataway, United States: IEEE Publishers, 2008: 1-6.
[7] Brooks R A. Robust layered control system for a mobile robot [J]. IEEE Journal of Robotics and Automation, 1986, 2(1).
[8] Faress K N, El Hagry M T, El Kosy A A. Trajectory tracking control for a wheeled mobile robot using fuzzy logic controller [J]. WSEAS Transactions on Systems, 2005, 4(7).
[9] Wang X Q, Hou Z G, Zou A M, et al. A behavior controller based on spiking neural networks for mobile robots [J]. Neurocomputing, 2008, 71(4-6).
[10] Er M J, Tan T P, Loh S Y. Control of a mobile robot using generalized dynamic fuzzy neural networks [J]. Microprocessors and Microsystems, 2004, 28(9).
[11] Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: MIT Press.
[12] Mataric M J. Reward functions for accelerated learning [C]// Proceedings of the Eleventh International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann Publishers, 1994.
[13] WEI Ying-zi, ZHAO Ming-yang. Design and convergence analysis of a heuristic reward function for reinforcement learning algorithms [J]. Computer Science, 2005, 32(3).

Replacing eligibility trace for action-value learning with function approximation

Kary FRÄMLING, Helsinki University of Technology, PL 5500, FI-02015 TKK, Finland. Abstract. The eligibility trace is one