Robust Low Torque Biped Walking Using Differential Dynamic Programming With a Minimax Criterion

J. Morimoto and C. Atkeson

Human Information Science Laboratories, ATR International, Department 3, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, JAPAN 619-0288, xmorimo@atr.co.jp
The Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, USA 15213, xmorimo@cs.cmu.edu, cga@cmu.edu

ABSTRACT

We developed a control policy design method for robust low torque biped walking by using differential dynamic programming with a minimax criterion. As an example, we applied our method to a simulated five-link biped robot. The results show lower joint torques from the optimal control policy than from a hand-tuned PD servo controller. The results also show that the simulated biped robot can walk successfully under unknown disturbances that cause controllers generated by standard differential dynamic programming and the hand-tuned PD servo to fail. Future work will implement these controllers on a robot we are currently developing.

Recent humanoid robots using Zero Moment Point (ZMP) control strategies have demonstrated impressive biped walking [7, 11, 8]. However, robots using ZMP control are often neither robust nor energy efficient, and they can generate large joint torques. McGeer [9] demonstrated that passive dynamic walking is possible: his robots walked down a slight incline without applying any torques at the joints. Recently, several studies have explored how to generate energy efficient biped walking [1, 10]. Many studies using optimization methods [4, 2] focus on finding optimal biped walking trajectories, but do not provide control laws to cope with disturbances. Our strategy is to use differential dynamic programming [3, 6], an optimization method, to find both a low torque biped walk and a policy, or control law, that handles deviations from the nominal trajectory. We use a minimax criterion to ensure that the policy is robust.
1 BIPED ROBOT MODEL

In this paper, we use a simulated five-link biped robot (Fig. 1) to explore our approach. The kinematic and dynamic parameters of the simulated robot are chosen to match those of a biped robot that we are currently developing (Fig. 2) and that we will use to explore our approach further. The height and total weight of the robot are about 0.4 [m] and 2.0 [kg], respectively. Table 1 shows the parameters of the robot model.

Table 1: Physical parameters of the robot model

                           link1   link2   link3   link4   link5
mass [kg]                  0.05    0.43    1.0     0.43    0.05
length [m]                 0.2     0.2     0.01    0.2     0.2
inertia [x10^-4 kg m^2]    1.75    4.29    4.33    4.29    1.75

Figure 1: Five link robot model (links 1 to 5, joints 1 to 4, and the ankle)

Figure 2: Real robot

We can represent the forward dynamics of the biped robot as

$$\dot{x} = f(x) + b(x)u, \tag{1}$$

where $x = \{\theta_1, \ldots, \theta_5, \dot{\theta}_1, \ldots, \dot{\theta}_5\}$ denotes the input state vector and $u = \{\tau_1, \ldots, \tau_4\}$ denotes the control command (each torque $\tau_i$ is applied to joint $i$, Fig. 1). In the minimax optimization case, we explicitly represent the existence of the disturbance as

$$\dot{x} = f(x) + b(x)u + b_w(x)w, \tag{2}$$

where $w = \{w_0, w_1, w_2, w_3, w_4\}$ denotes the disturbance ($w_0$ is applied to the ankle, and $w_i$ ($i = 1, \ldots, 4$) is applied to joint $i$, Fig. 1).

2 OPTIMIZATION CRITERION AND METHOD

We use the following objective function, which is designed to reward energy efficiency and enforce periodicity of the trajectory:

$$J_{opt} = \sum_{i=0}^{N-1} L(x_i, u_i) + \Phi(x_0, x_N), \tag{3}$$

which is applied for half the walking cycle, from one heelstrike to the next heelstrike. This criterion sums the squared deviations from a nominal trajectory, the squared control magnitudes, and the squared deviations from a desired velocity of the center of mass:

$$L(x_i, u_i) = (x_i - x_i^d)^T Q (x_i - x_i^d) + u_i^T R u_i + (v_i(x_i) - v_i^d)^T S (v_i(x_i) - v_i^d), \tag{4}$$

where $x_i$ is the state vector at the $i$-th time step, $x_i^d$ is the nominal state vector at the $i$-th time step (taken from a trajectory generated by a hand-designed walking controller), $v_i$ denotes the velocity of the center of mass at the $i$-th time step, and $v_i^d$ denotes the desired velocity of the center of mass at the $i$-th time step. The term $(x_i - x_i^d)^T Q (x_i - x_i^d)$ encourages the robot to follow the nominal trajectory, the term $u_i^T R u_i$ discourages large control outputs, and the term $(v_i(x_i) - v_i^d)^T S (v_i(x_i) - v_i^d)$ encourages the robot to achieve the desired velocity. In addition, penalties on the initial ($x_0$) and final ($x_N$) states are applied:

$$\Phi(x_0, x_N) = F(x_0) + \Phi_N(x_0, x_N). \tag{5}$$

The term $F(x_0)$ penalizes an initial state in which the foot is not on the ground:

$$F(x_0) = F_h^T(x_0) P_0 F_h(x_0), \tag{6}$$

where $F_h(x_0)$ denotes the height of the swing foot at the initial state $x_0$. The term $\Phi_N(x_0, x_N)$ is used to help generate periodic trajectories:

$$\Phi_N(x_0, x_N) = (x_N - H(x_0))^T P_N (x_N - H(x_0)), \tag{7}$$

where $x_N$ denotes the terminal state, $x_0$ denotes the initial state, and the term $(x_N - H(x_0))^T P_N (x_N - H(x_0))$ is a measure of terminal control accuracy. The function $H()$ represents the coordinate change caused by the exchange of the support leg and the swing leg, and the velocity change caused by the swing foot touching the ground (Appendix B).

Dynamic programming provides a methodology to develop planners and controllers for nonlinear systems. However, general dynamic programming is computationally intractable. We use differential dynamic programming (DDP), a second order local trajectory optimization method, to generate locally optimal plans and local models of the value function [3, 6]. This method also gives us a local policy, or feedback controller, to correct errors from the planned trajectory. We introduce a robust DDP method realized by adding a minimax term to the criterion (Appendix A).
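As a minimal sketch of how the criterion in equations (3) to (7) could be evaluated for a given trajectory, the Python below assumes NumPy arrays for states and torques. The function names, the argument layout, and the callables `foot_height` (standing in for $F_h$) and `H` (standing in for the leg-exchange map) are hypothetical and are not taken from the paper.

```python
import numpy as np

def running_cost(x, x_d, u, v, v_d, Q, R, S):
    """Equation (4): tracking, torque and CoM-velocity terms at one time step."""
    dx = x - x_d
    dv = np.atleast_1d(v - v_d)
    return float(dx @ Q @ dx + u @ R @ u + dv @ S @ dv)

def boundary_penalty(x0, xN, foot_height, H, P0, PN):
    """Equations (5)-(7): swing-foot height penalty plus periodicity penalty.

    foot_height and H are user-supplied callables (assumed interfaces).
    """
    h = np.atleast_1d(foot_height(x0))
    e = xN - H(x0)
    return float(h @ P0 @ h + e @ PN @ e)

def objective(xs, us, x_ds, vs, v_d, Q, R, S, foot_height, H, P0, PN):
    """Equation (3): running costs for i = 0..N-1 plus the boundary penalty.

    xs holds N+1 states; us, x_ds and vs hold N entries each.
    """
    N = len(us)
    total = sum(
        running_cost(xs[i], x_ds[i], us[i], vs[i], v_d, Q, R, S)
        for i in range(N)
    )
    return total + boundary_penalty(xs[0], xs[N], foot_height, H, P0, PN)
```

With the weights reported in Section 3 ($Q = 0.25 I_{10}$, $R = 3.0 I_4$, $S = 0.3 I_1$), a step that tracks the nominal state and velocity exactly while applying unit torque at all four joints contributes $3.0 \times 4 = 12$ to the sum.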
We use a modified objective function:

$$J_{minmax} = J_{opt} - \sum_{i=0}^{N-1} w_i^T G w_i, \tag{8}$$

where $w_i$ denotes the disturbance vector at the $i$-th time step, and the term $w_i^T G w_i$ rewards coping with large disturbances. This explicit representation of the disturbance $w$ is what makes the controller robust.

2.1 Neighboring Extremal Method

Differential dynamic programming finds a locally optimal trajectory $x_i^{opt}$ and the corresponding control trajectory $u_i^{opt}$. When we apply our control algorithm to a real robot, we usually need a feedback controller to cope with unknown disturbances or modeling errors. Fortunately, DDP provides a local policy along the optimized trajectory:

$$u^{opt}(x_i, i) = u_i^{opt} + K_i (x_i - x_i^{opt}), \tag{9}$$

where $K_i$ is a time dependent gain matrix given by taking the derivative of the optimal policy with respect to the state [3, 6].
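The two formulas above can be sketched in a few lines of Python. This is an illustration only: the function names and the list-of-arrays storage for the nominal trajectory and gains are assumptions, and the subtraction in the objective follows the sign convention of the reward in equation (14) of Appendix A.

```python
import numpy as np

def minimax_objective(j_opt, ws, G):
    """Equation (8): subtract the disturbance reward from J_opt.

    ws is a sequence of disturbance vectors w_i (one per time step).
    """
    return j_opt - sum(float(w @ G @ w) for w in ws)

def local_policy(x, i, x_opt, u_opt, K):
    """Equation (9): nominal torque plus time-indexed linear feedback.

    x_opt, u_opt and K are sequences indexed by the time step i.
    """
    return u_opt[i] + K[i] @ (x - x_opt[i])
```

Because the gain $K_i$ is indexed by the time step, a controller built this way needs to know where it is in the step cycle.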

3 RESULTS

We compare the optimized controller with a hand-tuned PD servo controller, which is also the source of the initial and nominal trajectories in the optimization process. We set the parameters for the optimization process as $Q = 0.25 I_{10}$, $R = 3.0 I_4$, $S = 0.3 I_1$, and desired velocity $v^d = 0.4$ [m/s] in equation (4), $P_0 = 1000000.0 I_1$ in equation (6), and $P_N = \mathrm{diag}\{10000.0, 10000.0, 10000.0, 10000.0, 10000.0, 10.0, 10.0, 10.0, 5.0, 2.5\}$ in equation (7), where $I_N$ denotes the $N$-dimensional identity matrix. Each parameter was chosen to give the best results in terms of both robustness and energy efficiency. The results in Table 2 show that the controller generated by the optimization process roughly halved the control cost of the trajectory compared to the original hand-tuned PD servo controller. Note that we define the control cost as $\sum_{i=0}^{N-1} \|u_i\|^2$, where $u_i$ is the control output (torque) vector at the $i$-th time step.

Table 2: One step control cost (average over 100 steps)

                           DDP     PD servo
control cost [(N m)^2]     11.7    24.8

To test robustness, we assume that there is unknown viscous friction at each joint:

$$\tau_j^d = -\mu_j \dot{\theta}_j \quad (j = 1, \ldots, 4), \tag{10}$$

where $\mu_j$ denotes the viscous friction coefficient at joint $j$, and an unknown disturbing torque at the ankle:

$$\tau_{ankle}^d = \tau_c, \tag{11}$$

where $\tau_c$ denotes a constant disturbing torque. We considered three disturbance conditions: 1) viscous friction, 2) constant ankle torque, and 3) friction and ankle torque. We used two levels of disturbance in the simulation, with the higher level being about 3 times larger than the base level (Table 3).

Table 3: Parameters of the disturbance

         mu_2, mu_3 (hip joints)   mu_1, mu_4 (knee joints)   tau_c (ankle)
base     0.01                      0.04                       0.003
large    0.03                      0.15                       0.01

All methods could handle the base level disturbances. Both the standard and the minimax DDP generated much lower control cost than the hand-tuned PD servo controller (Table 4).
However, because the minimax DDP is more conservative in taking advantage of the plant dynamics, it has a slightly higher control cost than the standard DDP. We also found that the friction disturbance increased the control cost, as would be expected. Note that we used the same parameters as in the previous experiment for both the standard DDP and the minimax DDP (i.e., $Q$, $R$, $S$, $v^d$, $P_0$, $P_N$). For the minimax DDP, we set the disturbance reward parameter in equation (8) to $G = \mathrm{diag}\{5.0, 20.0, 20.0, 20.0, 20.0\}$ ($G$ with smaller elements generates more conservative but more robust trajectories).
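The disturbance conditions of equations (10) and (11) can be sketched as a small lookup plus one function. The sketch below uses the coefficients of Table 3; everything else is an assumption made for illustration: the names `DISTURBANCE_LEVELS` and `disturbance_torques`, the ordering of the array as joints 1 to 4, and the choice that friction opposes the joint velocity passed in.

```python
import numpy as np

# Table 3 values, ordered as joints 1..4 (joints 1 and 4: knees, 2 and 3: hips).
DISTURBANCE_LEVELS = {
    "base": {"mu": np.array([0.04, 0.01, 0.01, 0.04]), "tau_c": 0.003},
    "large": {"mu": np.array([0.15, 0.03, 0.03, 0.15]), "tau_c": 0.01},
}

def disturbance_torques(joint_velocities, level, friction=True, ankle=True):
    """Return (joint friction torques, ankle torque) for one test condition.

    friction=True, ankle=False  -> condition 1
    friction=False, ankle=True  -> condition 2
    friction=True, ankle=True   -> condition 3
    """
    params = DISTURBANCE_LEVELS[level]
    if friction:
        # Assumed sign: viscous friction opposes the joint velocity.
        tau_joint = -params["mu"] * np.asarray(joint_velocities, dtype=float)
    else:
        tau_joint = np.zeros(4)
    tau_ankle = params["tau_c"] if ankle else 0.0
    return tau_joint, tau_ankle
```

Keeping the two levels in one table makes it straightforward to repeat the same simulated trial under each of the three conditions.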

Only the minimax DDP controller could cope with the higher level of disturbance when the friction disturbance was present. Figure 3 shows trajectories for the three methods with both the friction and the ankle torque disturbances. The robot fell down before achieving 50 steps with both the standard DDP controller and the hand-tuned PD servo controller. The bottom of Figure 3 shows 50 successful steps of the robot with the minimax DDP. Table 5 shows the number of steps before the robot fell down. We terminated a trial when the robot achieved 100 steps. We found that the failed trials were mainly caused by the friction disturbance.

Table 4: One step control cost [(N m)^2] with the base setting (averaged over 100 steps)

                               standard DDP   minimax DDP   PD servo
1) friction                    14.3           15.8          26.3
2) ankle torque                11.7           12.7          25.0
3) friction & ankle torque     14.4           16.5          26.5

Figure 3: Biped walk trajectories with the three different methods (panels: hand-tuned PD servo, standard DDP, minimax DDP)

Table 5: Number of steps with large disturbances

                               standard DDP   minimax DDP   PD servo
1) friction                    25             100           47
2) ankle torque                100            100           100
3) friction & ankle torque     16             100           33

4 DISCUSSION

In this study, we developed an optimization method to generate biped walking trajectories by using Differential Dynamic Programming (DDP). Both standard DDP and minimax DDP generated low torque biped trajectories. We showed that the minimax DDP controller was more robust than the controllers provided by standard DDP and the hand-tuned PD servo. DDP provides a feedback controller, which is important in coping with unknown disturbances and modeling errors. However, as shown in equation (9), the feedback controller depends on time; developing a time independent feedback controller is a future goal.

APPENDIX

A Calculation of The Minimax DDP

Here, we show the update rule of minimax DDP. Minimax DDP can be derived as an extension of standard DDP [3, 6]. The difference is that the proposed method has an additional disturbance variable $w$ to explicitly represent the existence of disturbances. This representation of the disturbance makes the optimized trajectories and policies robust. The total return $R$ obtained by using control output $u_i$ and disturbance $w_i$ at the state $x_i$ is given by

$$R(x_i, u_i, w_i) = L(x_i, u_i, w_i) + \sum_{j=i+1}^{N-1} L(x_j, u_j, w_j) + \Phi(x_0, x_N) = L(x_i, u_i, w_i) + V(x_{i+1}), \tag{12}$$

where the value function $V$ is defined as

$$V(x_i) = \min_{u_i} \max_{w_i} \sum_{j=i}^{N-1} L(x_j, u_j, w_j) + \Phi(x_0, x_N), \tag{13}$$

and the reward $L$ is defined as

$$L(x_i, u_i, w_i) = (x_i - x_i^d)^T Q (x_i - x_i^d) + u_i^T R u_i - w_i^T G w_i + (v_i(x_i) - v_i^d)^T S (v_i(x_i) - v_i^d). \tag{14}$$

The meaning of each parameter in equation (14) is described in Section 2. We then expand the return $R$ to second order in terms of $\delta u$, $\delta w$ and $\delta x$ about the nominal solution:

$$R(x_i, u_i, w_i) = Z(i) + Z_x(i)\delta x_i + Z_u(i)\delta u_i + Z_w(i)\delta w_i + \frac{1}{2} \begin{bmatrix} \delta x_i^T & \delta u_i^T & \delta w_i^T \end{bmatrix} \begin{bmatrix} Z_{xx}(i) & Z_{xu}(i) & Z_{xw}(i) \\ Z_{ux}(i) & Z_{uu}(i) & Z_{uw}(i) \\ Z_{wx}(i) & Z_{wu}(i) & Z_{ww}(i) \end{bmatrix} \begin{bmatrix} \delta x_i \\ \delta u_i \\ \delta w_i \end{bmatrix}, \tag{15}$$

where $Z(i) = L(x_i, u_i, w_i) + V(x_{i+1})$. Here, $\delta u_i$ and $\delta w_i$ must be chosen to minimize and maximize the return $R(x_i, u_i, w_i)$, respectively, i.e.,

$$\begin{aligned} \delta u_i &= -Z_{uu}^{-1}(i)\,[Z_{ux}(i)\delta x_i + Z_{uw}(i)\delta w_i + Z_u(i)], \\ \delta w_i &= -Z_{ww}^{-1}(i)\,[Z_{wx}(i)\delta x_i + Z_{wu}(i)\delta u_i + Z_w(i)]. \end{aligned} \tag{16}$$

By solving (16), we can derive both $\delta u_i$ and $\delta w_i$.
After updating the control output $u_i$ and the disturbance $w_i$ with the derived $\delta u_i$ and $\delta w_i$, the value function $V(x_i)$, its first order derivative $V_x(x_i)$, and its second order derivative $V_{xx}(x_i)$ are given as

$$\begin{aligned} V(x_i) &= V(x_{i+1}) - Z_u(i) Z_{uu}^{-1}(i) Z_u(i) - Z_w(i) Z_{ww}^{-1}(i) Z_w(i), \\ V_x(x_i) &= Z_x(i) - Z_u(i) Z_{uu}^{-1}(i) Z_{ux}(i) - Z_w(i) Z_{ww}^{-1}(i) Z_{wx}(i), \\ V_{xx}(x_i) &= Z_{xx}(i) - Z_{xu}(i) Z_{uu}^{-1}(i) Z_{ux}(i) - Z_{xw}(i) Z_{ww}^{-1}(i) Z_{wx}(i). \end{aligned} \tag{17}$$
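One way to read equations (16) and (17) is as a single backward step at time index $i$. The Python sketch below is an assumption-laden illustration rather than the authors' implementation: it stores the derivatives of $Z$ in a dictionary with hypothetical keys (`"u"`, `"uu"`, `"uw"`, and so on), solves the two coupled conditions of (16) as one stacked linear system, and transcribes (17) in the form given above.

```python
import numpy as np

def solve_saddle_step(Z, dx):
    """Solve the two coupled conditions of equation (16) for (du, dw).

    Stacks them as [[Zuu, Zuw], [Zwu, Zww]] [du; dw] = -([Zux; Zwx] dx + [Zu; Zw]).
    """
    nu = Z["u"].size
    A = np.block([[Z["uu"], Z["uw"]], [Z["wu"], Z["ww"]]])
    b = -(np.vstack([Z["ux"], Z["wx"]]) @ dx + np.concatenate([Z["u"], Z["w"]]))
    sol = np.linalg.solve(A, b)
    return sol[:nu], sol[nu:]

def value_update(Z, V_next):
    """Equation (17): updates of V, V_x and V_xx at time step i."""
    Zuu_inv = np.linalg.inv(Z["uu"])
    Zww_inv = np.linalg.inv(Z["ww"])
    V = V_next - Z["u"] @ Zuu_inv @ Z["u"] - Z["w"] @ Zww_inv @ Z["w"]
    Vx = Z["x"] - Z["u"] @ Zuu_inv @ Z["ux"] - Z["w"] @ Zww_inv @ Z["wx"]
    Vxx = Z["xx"] - Z["xu"] @ Zuu_inv @ Z["ux"] - Z["xw"] @ Zww_inv @ Z["wx"]
    return V, Vx, Vxx
```

Solving the stacked system returns a pair $(\delta u_i, \delta w_i)$ that satisfies both lines of (16) at once, which is convenient when $Z_{uw}$ is nonzero. For the step to be a minimum in $u$ and a maximum in $w$, $Z_{uu}$ would need to be positive definite and $Z_{ww}$ negative definite; the sketch does not check this.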

B Ground Contact Model

The function $H()$ in equation (7) includes the mapping (velocity change) caused by the ground contact. To derive the first derivative of the value function $V_x(x_N)$ and the second derivative $V_{xx}(x_N)$, where $x_N$ denotes the terminal state, the function $H()$ must be analytic. We therefore used an analytical ground contact model [5]:

$$\dot{\theta}^{+} - \dot{\theta}^{-} = M^{-1}(\theta) D(\theta) f\,t, \tag{18}$$

where $\theta$ denotes the joint angles of the robot, $\dot{\theta}^{-}$ denotes the angular velocities before ground contact, $\dot{\theta}^{+}$ denotes the angular velocities after ground contact, $M$ denotes the inertia matrix, $D$ denotes the Jacobian matrix that converts the ground contact force $f$ to the torque at each joint, and $t$ denotes the time step of the simulation.

REFERENCES

[1] F. Asano, M. Yamakita, and K. Furuta. Virtual passive dynamic walking and energy-based control laws. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2000.
[2] C. Chevallereau and Y. Aoustin. Optimal running trajectories for a biped. In 2nd International Conference on Climbing and Walking Robots, pages 559-570, 1999.
[3] P. Dyer and S. R. McReynolds. The Computation and Theory of Optimal Control. Academic Press, New York, NY, 1970.
[4] M. Hardt, J. Helton, and K. Kreutz-Delgado. Optimal biped walking with a complete dynamical model. In Proceedings of the 38th IEEE Conference on Decision and Control, pages 2999-3004, 1999.
[5] Y. Hurmuzlu and D. B. Marghitu. Rigid body collisions of planar kinematic chains with multiple contact points. International Journal of Robotics Research, 13(1):82-92, 1994.
[6] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, New York, NY, 1970.
[7] S. Kagami, K. Nishiwaki, J. J. Kuffner, Y. Kuniyoshi, M. Inaba, and H. Inoue. Design and implementation of software research platform for humanoid robotics: H7. In International Conference on Humanoid Robots, pages 253-258, 2001.
[8] Y. Kuroki, T. Ishida, J. Yamaguchi, M. Fujita, and T. Doi. A small biped entertainment robot. In International Conference on Humanoid Robots, pages 181-186, 2001.
[9] T. McGeer. Passive dynamic walking. International Journal of Robotics Research, 9(2):62-82, 1990.
[10] S. Miyakoshi, G. Cheng, and Y. Kuniyoshi. Transferring human biped walking function to a machine: towards the realization of a biped bike. In 4th International Conference on Climbing and Walking Robots, pages 763-770, 2001.
[11] K. Yokoi, F. Kanehiro, K. Kaneko, K. Fujiwara, S. Kajita, and H. Hirukawa. A Honda humanoid robot controlled by AIST software. In International Conference on Humanoid Robots, pages 259-264, 2001.