
Underactuated Robotics: Learning, Planning, and Control for Efficient and Agile Machines
Course Notes for MIT 6.832

Russ Tedrake
Massachusetts Institute of Technology

© Russ Tedrake, 2009


Contents

Preface

Fully Actuated vs. Underactuated Systems
    Motivation
        Honda's ASIMO vs. Passive Dynamic Walkers
        Birds vs. modern aircraft
        The common theme
    Definitions
    Feedback Linearization
    Input and State Constraints
    Underactuated robotics
    Goals for the course

Part I: Nonlinear Dynamics and Control

The Simple Pendulum
    Introduction
    Nonlinear Dynamics w/ a Constant Torque
        The Overdamped Pendulum
        The Undamped Pendulum w/ Zero Torque
        The Undamped Pendulum w/ a Constant Torque
        The Damped Pendulum
    The Torque-limited Simple Pendulum

The Acrobot and Cart-Pole
    Introduction
    The Acrobot
        Equations of Motion
    Cart-Pole
        Equations of Motion
    Balancing
        Linearizing the Manipulator Equations
        Controllability of Linear Systems
        LQR Feedback
    Partial Feedback Linearization
        PFL for the Cart-Pole System
        General Form
    Swing-Up Control
        Energy Shaping
        Simple Pendulum
        Cart-Pole
        Acrobot
    Discussion
    Other Model Systems

Manipulation
    Introduction
    Dynamics of Manipulation
        Form Closure
        Force Closure
    Active research topics
    Ground reaction forces in Walking
        ZMP
        Underactuation in Walking

Walking
    Limit Cycles
    Poincaré Maps
    The Ballistic Walker
    The Rimless Wheel
        Stance Dynamics
        Foot Collision
        Return Map
        Fixed Points and Stability
    The Compass Gait
    The Kneed Walker
    Numerical Analysis
        Finding Limit Cycles
        Local Stability of Limit Cycle

Running
    Introduction
    Comparative Biomechanics
    Raibert hoppers
    Spring-loaded inverted pendulum (SLIP)
        Flight phase
        Stance phase
        Transitions
        Approximate solution
    Koditschek's Simplified Hopper
    Lateral Leg Spring (LLS)

Flight
    Flat Plate Theory
    Simplest Glider Model
    Perching
    Swimming and Flapping Flight
        Swimming
        The Aerodynamics of Flapping Flight

Model Systems with Stochasticity
    Stochastic Dynamics
        The Master Equation
        Continuous Time, Continuous Space
        Discrete Time, Discrete Space
    Stochastic Stability
    Walking on Rough Terrain
    State Estimation
    System Identification

Part II: Optimal Control and Motion Planning

Dynamic Programming
    Introduction to Optimal Control
    Finite Horizon Problems
        Additive Cost
    Dynamic Programming in Discrete Time
        Discrete-State, Discrete-Action
        Continuous-State, Discrete-Action
        Continuous-State, Continuous-Actions
    Infinite Horizon Problems
    Value Iteration
    Value Iteration w/ Function Approximation
        Special case: Barycentric interpolation
    Detailed Example: the double integrator
        Pole placement
        The optimal control approach
        The minimum-time problem
        The quadratic regulator
    Detailed Example: The Simple Pendulum

Analytical Optimal Control with the Hamilton-Jacobi-Bellman Sufficiency Theorem
    Introduction
    Dynamic Programming in Continuous Time
    Infinite-Horizon Problems
    The Hamilton-Jacobi-Bellman
    Examples

Analytical Optimal Control with Pontryagin's Minimum Principle
    Introduction
    Necessary conditions for optimality
    Pontryagin's minimum principle
    Derivation sketch using calculus of variations
    Examples

Trajectory Optimization
    The Policy Space
    Nonlinear optimization
        Gradient Descent
        Sequential Quadratic Programming
    Shooting Methods
        Computing the gradient with Backpropagation through time (BPTT)
        Computing the gradient w/ Real-Time Recurrent Learning (RTRL)
        BPTT vs. RTRL
    Direct Collocation
    LQR trajectory stabilization
        Linearizing along trajectories
        Linear Time-Varying (LTV) LQR
    Iterative LQR
    Real-time planning (aka receding horizon control)

Feasible Motion Planning
    Artificial Intelligence via Search
    Motion Planning as Search
    Configuration Space
    Sampling-based Planners
    Rapidly-Exploring Randomized Trees (RRTs)
    Proximity Metrics
    Reachability-Guided RRTs
    Performance
    Probabilistic Roadmaps
    Discrete Search Algorithms
    Global policies from local policies
    Real-time Planning
    Multi-query Planning
    Probabilistic Roadmaps
    Feedback Motion Planning

Stochastic Optimal Control
    Essentials
    Implications of Stochasticity
    Markov Decision Processes
    Dynamic Programming Methods
    Policy Gradient Methods

Model-free Value Methods
    Introduction
    Policy Evaluation for known Markov Chains
        Monte Carlo Evaluation
        Bootstrapping
        A Continuum of Updates
        The TD(λ) Algorithm
        TD(λ) with function approximators
        LSTD
    Off-policy evaluation
        Q functions
        TD for Q with function approximation
        Importance Sampling
        LSTDQ
    Policy Improvement
        Sarsa(λ)
        Q(λ)
        LSPI
    Case Studies: Checkers and Backgammon

Model-free Policy Search
    Introduction
    Stochastic Gradient Descent
    The Weight Perturbation Algorithm
        Performance of Weight Perturbation
        Weight Perturbation with an Estimated Baseline
    The REINFORCE Algorithm
        Optimizing a stochastic function
        Adding noise to the outputs
        Episodic REINFORCE
        Infinite-horizon REINFORCE
        LTI REINFORCE
    Better baselines with Importance sampling

Actor-Critic Methods
    Introduction
    Pitfalls of RL
        Value methods
        Policy Gradient methods
    Actor-Critic Methods
    Case Study: Toddler

Part III: Applications and Extensions

Learning Case Studies and Course Wrap-up
    Learning Robots
        Ng's Helicopters
        Schaal and Atkeson
        AIBO
        UNH Biped
        Morimoto
        Heaving Foil
    Optima for Animals
        Bone geometry
        Bounding flight
        Preferred walking and running speeds/transitions
    RL in the Brain
        Bird-song
        Dopamine TD
    Course Wrap-up

Part IV: Appendix

A  Robotics Preliminaries
    A.1  Deriving the equations of motion (an example)
    A.2  The Manipulator Equations

B  Machine Learning Preliminaries
    B.1  Function Approximation

Preface

This book is about building robots that move with speed, efficiency, and grace. The author believes that this can only be achieved through a tight coupling between mechanical design, passive dynamics, and nonlinear control synthesis. Therefore, these notes contain selected material from dynamical systems theory, as well as linear and nonlinear control. These notes also reflect a deep belief that computational algorithms play an essential role in finding and optimizing solutions to complex dynamics and control problems. Algorithms play an increasingly central role in modern control theory; nowadays even rigorous mathematicians use algorithms to develop mathematical proofs. Therefore, the notes also cover selected material from optimization theory, motion planning, and machine learning.

Although the material in the book comes from many sources, the presentation is targeted very specifically at a handful of robotics problems. Concepts are introduced only when and if they can help progress our capabilities in robotics. I hope that the result is a broad but reasonably self-contained and readable manuscript that will be of use to any robotics practitioner.

Organization

The material in these notes is organized into two main parts: nonlinear dynamics and control, which introduces a series of increasingly complex dynamical systems and the associated control ideas, and optimal control and motion planning, which introduces a series of general derivations and algorithms that can be applied to many, if not all, of the problems introduced in the first part of the book. This second part of the book is organized by technique, which is perhaps the most logical order when using the book as a reference. In teaching the course, however, I take a spiral trajectory through the material, introducing robot dynamics and control problems one at a time, and introducing only the techniques that are required to solve that particular problem. Finally, a third part of the book puts it all together through a few more complicated case studies and examples.

Exercises

The exercises in these notes come in a few varieties. The standard exercises are intended to be straightforward extensions of the material presented in the chapter. Some exercises are labeled as MATLAB exercises: these are computational investigations, which sometimes involve existing code that will help get you started. Finally, some exercises are labeled as CHALLENGE problems. These are problems that I have not yet seen or found the answers to, but which I would very much like to solve. I cannot guarantee that they are unsolved in the literature, but the intention is to identify some problems which would advance the state of the art.

Russ Tedrake, 2009

© Russ Tedrake, 2009


Notation

Dynamics and System Identification:
    q          Generalized coordinates (e.g., joint space)
    x          State space (x = [q^T, q̇^T]^T)
    u          Controllable inputs
    ᾱ          Time trajectory of α
    s_i ∈ S    s_i is a particular state in the set of all states S (for a discrete-state system)
    a_i ∈ A    a_i is a particular action from the set of all actions A (for a discrete-action system)
    w          Uncontrollable inputs (disturbances)
    y          Outputs
    v          Measurement errors
    z          Observations
    H          Mass/Inertial Matrix
    C          Coriolis Matrix
    G          Gravity and potential terms
    f          First-order plant dynamics
    T          Kinetic Energy
    U          Potential Energy
    L          Lagrangian (L = T - U)

Learning and Optimal Control:
    π              Control policy
    π*             Optimal control policy
    α, β, γ, ...   Parameters
    g              Instantaneous cost
    h              Terminal cost
    J              Long-term cost / cost-to-go function (value function)
    J*             Optimal cost-to-go function
    e              Eligibility vector
    η              Learning rate

Basic Math:
    E[z], μ_z      Expected value of z
    σ_z^2          Variance of z
    σ_xy, C_xy     Scalar covariance (and covariance matrix) between x and y
    ∂f/∂x          Vector gradient: the m × n matrix whose (i, j) entry is ∂f_i/∂x_j
    δ(z)           Continuous delta-function, defined via ∫ δ(z) dz = 1
    δ[z]           Discrete delta-function, equals 1 when z = 0, zero otherwise
    δ_ij           Shorthand for δ[i - j]
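As a quick illustration of how these symbols fit together, the sketch below writes out the undamped simple pendulum of Chapter 2 in this notation. The specific modeling choices (a point mass m on a massless rod of length l, with the joint angle θ measured from the downward vertical) are assumptions made for this example, not part of the notation table itself.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Illustrative example only: the undamped simple pendulum of Chapter 2,
% assuming a point mass m on a massless rod of length l, with the joint
% angle \theta measured from the downward vertical.
\begin{align*}
  q &= \theta, &
  x &= \begin{bmatrix} q \\ \dot q \end{bmatrix}, &
  u &= \text{applied torque}, \\
  T &= \tfrac{1}{2} m l^2 \dot\theta^2, &
  U &= -m g l \cos\theta, &
  L &= T - U, \\
  H(q) &= m l^2, &
  C(q, \dot q) &= 0, &
  G(q) &= m g l \sin\theta.
\end{align*}

% Substituting into the manipulator form H(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = u
% recovers the familiar pendulum dynamics:
\begin{equation*}
  m l^2 \ddot\theta + m g l \sin\theta = u.
\end{equation*}

\end{document}
```

The same pattern (identify q, form x, write T, U, and L, then read off H, C, and G) is what Appendix A.2 formalizes as the manipulator equations.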

MIT OpenCourseWare
Underactuated Robotics, Spring 2009
For information about citing these materials or our Terms of Use, visit:
