Robotics: Science & Systems [Topic 6: Control]
Prof. Sethu Vijayakumar
Course webpage: http://wcms.inf.ed.ac.uk/ipab/rss

Control Theory

Concerns controlled systems of the form

$\dot{x} = f(x, u)$ (system dynamics)

and a controller of the form

$u = \pi(x, t)$ (control policy).

We will neglect stochasticity here. When analyzing a given controller $\pi$, one analyzes the closed-loop system described by the differential equation

$\dot{x} = f(x, \pi(x, t))$

(e.g., analysis for convergence and stability).
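To make the closed-loop view concrete, here is a minimal Python sketch (not from the slides): it simulates $\dot{x} = f(x, \pi(x))$ for a damped pendulum under a hypothetical PD controller, using forward-Euler integration.

```python
import numpy as np

def f(x, u):
    """Pendulum dynamics: x = (theta, theta_dot), torque input u."""
    theta, theta_dot = x
    return np.array([theta_dot, -9.81 * np.sin(theta) - 0.1 * theta_dot + u])

def pi(x):
    """Hypothetical PD controller driving theta to 0."""
    theta, theta_dot = x
    return -20.0 * theta - 5.0 * theta_dot

x = np.array([0.5, 0.0])     # initial state: 0.5 rad offset
dt = 0.001
for _ in range(5000):        # forward-Euler integration of the closed loop
    x = x + dt * f(x, pi(x))
print(x)                     # x approaches (0, 0) since the loop is stable
```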

Cart Pole Example

Core Topics in Control Theory

Stability: analyze the stability of a closed-loop system (eigenvalue analysis or the Lyapunov function method).
Controllability: analyze which dimensions (DoFs) of the system can, in principle, actually be controlled.
Transfer function: analyze the closed-loop transfer function, i.e., how frequencies are transmitted through the system (→ Laplace transformation).
Controller design: find a controller with desired stability and/or transfer function properties.
Optimal control: define a cost function on the system behavior; optimize a controller to minimize costs.

Control Theory References

Robert F. Stengel: Optimal Control and Estimation. Online lectures: http://www.princeton.edu/~stengel/mae546lectures.html (esp. lectures 3, 4 and 7-9)
From robotics lectures:
Stefan Schaal's lecture Introduction to Robotics: http://wwwclmc.usc.edu/teaching/teachingintroductiontoroboticssyllabus
Drew Bagnell's lecture on Adaptive Control and Reinforcement Learning: http://robotwhisperer.org/acrls11/

Optimal Control (Discrete Time)

Given a controlled dynamic system

$x_{t+1} = f(x_t, u_t)$

we define a cost function

$J = \sum_{t=0}^{T} c(x_t, u_t)$

Dynamic Programming & Bellman Principle

"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."

$V(\text{state}) = \min_{\text{edge}} \big[ c(\text{edge}) + V(\text{next state}) \big]$
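As an illustration, the following Python toy example (graph and edge costs invented here, not from the slides) computes V by sweeping the Bellman recursion over a small graph until the values settle:

```python
# Toy Bellman recursion: V(state) = min over edges of [c(edge) + V(next state)]
graph = {  # state -> list of (edge_cost, next_state); 'G' is the goal
    'A': [(1.0, 'B'), (4.0, 'C')],
    'B': [(2.0, 'C'), (6.0, 'G')],
    'C': [(1.0, 'G')],
    'G': [],
}
V = {s: (0.0 if s == 'G' else float('inf')) for s in graph}
for _ in range(len(graph)):          # enough sweeps for costs to propagate
    for s, edges in graph.items():
        if edges:
            V[s] = min(c + V[s2] for c, s2 in edges)
print(V)  # {'A': 4.0, 'B': 3.0, 'C': 1.0, 'G': 0.0}
```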

Optimal Control (Discrete Time)

Define the value function or optimal cost-to-go function

$V_t(x) = \min_{u_t,\dots,u_T} \sum_{s=t}^{T} c(x_s, u_s)$, subject to the dynamics with $x_t = x$.

Bellman equation:

$V_t(x) = \min_u \big[ c(x, u) + V_{t+1}(f(x, u)) \big]$

The argmin gives the optimal control signal:

$\pi_t^*(x) = \operatorname{argmin}_u \big[ c(x, u) + V_{t+1}(f(x, u)) \big]$

Derivation: see the sketch below.
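The derivation itself did not survive transcription; the standard argument it presumably followed is the unrolling of the minimization:

```latex
% Reconstruction of the standard derivation (the slide's own equations
% were lost in transcription).
\begin{align*}
V_t(x) &= \min_{u_t,\dots,u_T} \Big[ c(x_t, u_t) + \sum_{s=t+1}^{T} c(x_s, u_s) \Big]
  \qquad (x_t = x,\; x_{s+1} = f(x_s, u_s)) \\
 &= \min_{u_t} \Big[ c(x, u_t)
    + \min_{u_{t+1},\dots,u_T} \sum_{s=t+1}^{T} c(x_s, u_s) \Big] \\
 &= \min_{u} \big[ c(x, u) + V_{t+1}(f(x, u)) \big]
\end{align*}
```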

Optimal Control (Continuous Time)

Given a controlled dynamic system

$\dot{x} = f(x, u)$

we define a cost function with horizon T:

$J = \int_0^T c(x(t), u(t))\, dt + \phi(x(T))$

where $\phi$ is a terminal cost.

Hamilton-Jacobi-Bellman Equation (Continuous Time)

Define the value function or optimal cost-to-go function

$V(x, t) = \min_{u(\cdot)} \Big[ \int_t^T c(x(s), u(s))\, ds + \phi(x(T)) \Big]$, with $x(t) = x$.

Bellman equation (HJB):

$-\frac{\partial V}{\partial t} = \min_u \Big[ c(x, u) + \frac{\partial V}{\partial x} f(x, u) \Big]$

The argmin gives the optimal control signal:

$\pi^*(x, t) = \operatorname{argmin}_u \Big[ c(x, u) + \frac{\partial V}{\partial x} f(x, u) \Big]$

Derivation: see the sketch below.
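Again the slide's derivation is missing; the standard sketch discretizes time with a small step dt and Taylor-expands V:

```latex
% Standard sketch (reconstruction): discrete Bellman equation over a
% small time step dt, then Taylor expansion of V around (x, t).
\begin{align*}
V(x, t) &= \min_u \big[ c(x, u)\, dt + V(x + f(x, u)\, dt,\; t + dt) \big] \\
        &\approx \min_u \Big[ c(x, u)\, dt + V(x, t)
           + \frac{\partial V}{\partial x} f(x, u)\, dt
           + \frac{\partial V}{\partial t}\, dt \Big] \\
\Rightarrow\;
 -\frac{\partial V}{\partial t} &= \min_u \Big[ c(x, u)
           + \frac{\partial V}{\partial x} f(x, u) \Big]
 \qquad \text{(subtract } V(x,t)\text{, divide by } dt\text{, let } dt \to 0\text{)}
\end{align*}
```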

Infinite Horizon Case

$J = \int_0^\infty c(x(t), u(t))\, dt$

This cost function is stationary (time-invariant)! The HJB and Bellman equations remain the same, but with a stationary value function independent of t:

$0 = \min_u \Big[ c(x, u) + \frac{\partial V}{\partial x} f(x, u) \Big]$

The argmin gives the optimal control signal:

$\pi^*(x) = \operatorname{argmin}_u \Big[ c(x, u) + \frac{\partial V}{\partial x} f(x, u) \Big]$

Infinite Horizon Examples

Cart Pole Balancing Problem: you might define a cost such as $c(x, u) = \|x - x^*\|^2 + \rho \|u\|^2$, penalizing deviation from the upright state $x^*$ and control effort.

Reference Following: penalize deviation from a reference trajectory $r(t)$, e.g., $c(x, u) = \|x - r(t)\|^2 + \rho \|u\|^2$.

Several problems in control can be framed this way!

Comments

The Bellman equation is fundamental in optimal control theory, but also in Reinforcement Learning.
The HJB equation is a differential equation for $V(x, t)$ which is in general hard to solve.
The (time-discretized) Bellman equation can be solved by Dynamic Programming starting backward: compute $V_T$, then $V_{T-1}$, and so on down to $V_0$. But it might still be hard or infeasible to represent the functions $V_t(x)$ over continuous $x$! (A toy illustration follows below.)
Both become significantly simpler under linear dynamics and quadratic costs → Riccati Equation.
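To illustrate both points (backward DP, and the cost of representing $V_t$ over continuous $x$), here is a toy Python sketch; the dynamics, grids, and costs are invented for illustration, not taken from the slides. It tabulates $V_t$ on a state grid for a scalar system:

```python
import numpy as np

# Toy backward DP for x_{t+1} = x + dt*u with cost dt*(x^2 + u^2);
# V_t(x) is represented by its values on a state grid.
xs = np.linspace(-2.0, 2.0, 201)         # state grid
us = np.linspace(-1.0, 1.0, 21)          # control grid
T, dt = 50, 0.1
V = xs ** 2                              # terminal cost V_T(x) = x^2
for t in range(T):                       # backward sweep: V_{T-1}, V_{T-2}, ...
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        x_next = np.clip(x + dt * us, xs[0], xs[-1])             # f(x, u) for all u
        q = dt * (x ** 2 + us ** 2) + np.interp(x_next, xs, V)   # c + V_{t+1}
        V_new[i] = q.min()               # Bellman backup
    V = V_new
print(np.interp(0.5, xs, V))             # approximate cost-to-go from x = 0.5
```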

Linear-Quadratic Optimal Control

Linear dynamics: $\dot{x} = A x + B u$
Quadratic costs: $c(x, u) = x^\top Q x + u^\top R u$, with terminal cost $\phi(x) = x^\top F x$.

Note: the dynamics neglect a constant term; the costs neglect linear and constant terms. This is because:
constant costs are irrelevant;
linear cost terms can be removed by redefining x or u;
a constant dynamic term only introduces a constant drift.

LQ Control as Local Approximation

LQ control is important also for controlling non-LQ systems in the neighbourhood of a desired state!

Let $x^*$ be such a desired state (e.g., cart-pole: $x^* = (0, 0, 0, 0)$):
linearize the dynamics around $x^*$
use a 2nd-order approximation of the costs around $x^*$
control the system locally as if it were LQ
re-estimate for a new desired state

(A sketch of this recipe follows below.)
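A minimal Python sketch of this recipe; the dynamics f below is a toy pendulum-like stand-in (not the cart-pole from the slides) and Q, R are chosen ad hoc. It linearizes by finite differences around $x^*$ and computes the stationary LQ gain:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def f(x, u):
    """Toy nonlinear dynamics x_dot = f(x, u) (inverted-pendulum-like)."""
    return np.array([x[1], np.sin(x[0]) + u[0]])

def linearize(f, x_star, u_star, eps=1e-6):
    """Finite-difference Jacobians A = df/dx, B = df/du at (x*, u*)."""
    n, m = len(x_star), len(u_star)
    f0 = f(x_star, u_star)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x_star + dx, u_star) - f0) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x_star, u_star + du) - f0) / eps
    return A, B

x_star, u_star = np.zeros(2), np.zeros(1)    # desired state (an equilibrium here)
A, B = linearize(f, x_star, u_star)
Q, R = np.eye(2), np.eye(1)                  # 2nd-order cost approximation (ad hoc)
P = solve_continuous_are(A, B, Q, R)         # value function V(x) = x' P x
K = np.linalg.solve(R, B.T @ P)              # local LQ feedback: u = u* - K (x - x*)
print(K)
```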

Riccati Diff. Eq. = HJB Eq. in the LQ Case

In the Linear-Quadratic (LQ) case, the value function is always a quadratic function of $x$!

Let $V(x, t) = x^\top P(t) x$. Substituting into the HJB equation yields the Riccati differential equation:

$-\dot{P} = A^\top P + P A - P B R^{-1} B^\top P + Q$

with optimal control $u^* = -R^{-1} B^\top P x$.

Riccati Differential Equation

This is a differential equation for the matrix P(t) describing the quadratic value function. If we solve it backward with the finite-horizon terminal constraint P(T) = F, we have solved the optimal control problem. (A numerical sketch follows below.)
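A minimal numerical sketch (matrices chosen as a placeholder double integrator, not from the slides): integrate the Riccati ODE backward from P(T) = F with explicit Euler.

```python
import numpy as np

# Integrate -P_dot = A'P + PA - P B R^{-1} B' P + Q backward from P(T) = F.
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # placeholder double-integrator dynamics
B = np.array([[0.0], [1.0]])
Q, R, F = np.eye(2), np.eye(1), np.eye(2)

T, dt = 5.0, 1e-3
P = F.copy()                              # terminal condition P(T) = F
for _ in range(int(T / dt)):              # step backward in time
    Pdot = -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q)
    P = P - dt * Pdot                     # Euler step from P(t) to P(t - dt)
print(P)                                  # approaches the ARE solution as T grows
```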

Riccati Equations

Finite horizon, continuous time: Riccati differential equation $-\dot{P} = A^\top P + P A - P B R^{-1} B^\top P + Q$
Infinite horizon, continuous time: Algebraic Riccati Equation (ARE) $0 = A^\top P + P A - P B R^{-1} B^\top P + Q$
Finite horizon, discrete time: $P_t = Q + A^\top \big( P_{t+1} - P_{t+1} B\, (R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} \big) A$
Infinite horizon, discrete time: the same recursion with the stationary solution $P_t = P_{t+1} = P$ (discrete-time ARE)

Example: 1D Point Mass

Dynamics: a point mass of mass $m$ with position $q$ and force input $u$, $\ddot{q} = u/m$; in state-space form $x = (q, \dot{q})$,

$\dot{x} = A x + B u, \quad A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 \\ 1/m \end{pmatrix}$

Costs: $c(x, u) = x^\top Q x + \rho\, u^2$

Plugging these into the Algebraic Riccati Equation gives scalar equations for the entries of $P$.

Example: 1D Point Mass (cont'd)

The resulting Algebraic Riccati Equation is usually solved numerically (e.g., are, care or dare in Octave).
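In Python, SciPy's solvers play the role of Octave's care/dare; a minimal sketch for this point mass, taking m = 1 and ρ = 1 for concreteness:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# 1D point mass with m = 1: state x = (q, q_dot), dynamics q_ddot = u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                 # state cost x' Q x
R = np.array([[1.0]])         # control cost rho * u^2 with rho = 1

P = solve_continuous_are(A, B, Q, R)   # solves 0 = A'P + PA - P B R^{-1} B' P + Q
K = np.linalg.solve(R, B.T @ P)        # optimal feedback u = -K x
print(P)                               # [[sqrt(3), 1], [1, sqrt(3)]]
print(K)                               # [[1.0, 1.732...]]
```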

Relations to Other Topics

[Diagram relating Inverse Kinematics, Operational Space Control, Optimal Control, Trajectory Optimization, and Reinforcement Learning along three axes: 1-step vs. T-step optimal, with vs. without control constraints, and model known vs. adaptive.]

Optimal Control: Relations to Other Topics

[Diagram placing Optimal Control relative to Inverse Kinematics, Optimal Operational Space Control, Trajectory Optimisation, and Reinforcement Learning.]

For more details refer to...

Todorov, E.: Optimal control theory. In: Bayesian Brain: Probabilistic Approaches to Neural Coding, Doya, K. et al. (eds.), chap. 12, pp. 269-298, MIT Press (2006).