
1 Robotics: Science & Systems [Topic 6: Control] Prof. Sethu Vijayakumar Course webpage:

2 Control Theory Concerns controlled systems of the form ẋ = f(x, u) and a controller of the form u = π(x, t). We will neglect stochasticity here. When analyzing a given controller π, one analyzes the closed-loop system described by the differential equation ẋ = f(x, π(x, t)) (e.g., analysis of convergence and stability).

3 Cart Pole Example

4 Core Topics in Control Theory
Stability: analyze the stability of a closed-loop system (eigenvalue analysis or the Lyapunov function method).
Controllability: analyze which dimensions (DoFs) of the system can in principle be controlled.
Transfer function: analyze the closed-loop transfer function, i.e., how frequencies are transmitted through the system (Laplace transformation).
Controller design: find a controller with desired stability and/or transfer function properties.
Optimal control: define a cost function on the system behavior and optimize a controller to minimize cost.
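As a minimal illustration of the eigenvalue-based stability test, the following sketch (a hypothetical double integrator with an assumed feedback gain, not an example from the lecture) checks whether the closed-loop matrix A - BK has only eigenvalues with negative real part:

```python
import numpy as np

# Hypothetical double integrator x_dot = A x + B u with linear feedback u = -K x.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
K = np.array([[1.0, 2.0]])            # assumed feedback gain

A_cl = A - B @ K                      # closed-loop dynamics x_dot = (A - B K) x
eigs = np.linalg.eigvals(A_cl)
print("closed-loop eigenvalues:", eigs)
print("asymptotically stable:", bool(np.all(eigs.real < 0)))
```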

5 Control Theory References
Robert F. Stengel: Optimal Control and Estimation. Online lectures: (esp. lectures 3, 4 and 7-9)
From robotics lectures: Stefan Schaal's lecture Introduction to Robotics:
Drew Bagnell's lecture on Adaptive Control and Reinforcement Learning

6 Optimal Control (Discrete Time) Given a controlled dynamic system x_{t+1} = f(x_t, u_t), we define a cost function J = Σ_{t=0}^{T} c(x_t, u_t).

7 Dynamic Programming & Bellman Principle "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." V(state) = min_edge [ c(edge) + V(next state) ]
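To make the recursion concrete, here is a small self-contained sketch (on a hypothetical weighted graph, not one from the lecture) that computes the cost-to-go V(state) by exactly this backward rule:

```python
# Minimal dynamic-programming sketch on a hypothetical weighted graph.
# edges[s] lists (cost, next_state); "goal" is the terminal state with V = 0.
edges = {
    "start": [(1.0, "a"), (4.0, "b")],
    "a":     [(2.0, "b"), (6.0, "goal")],
    "b":     [(1.0, "goal")],
}

V = {"goal": 0.0}

def value(state):
    """Cost-to-go: V(state) = min over outgoing edges of [ c(edge) + V(next) ]."""
    if state not in V:
        V[state] = min(c + value(nxt) for c, nxt in edges[state])
    return V[state]

print(value("start"))   # -> 4.0, achieved by start -> a -> b -> goal
```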

8 Optimal Control (Discrete Time) Define the value function or optimal cost-to-go function V_t(x) = min_{u_{t:T}} Σ_{s=t}^{T} c(x_s, u_s). Bellman equation: V_t(x) = min_u [ c(x, u) + V_{t+1}(f(x, u)) ], with V_T(x) = min_u c(x, u). The argmin gives the optimal control signal: π*_t(x) = argmin_u [ c(x, u) + V_{t+1}(f(x, u)) ]. Derivation:

9 Optimal Control (Continuous Time) Given a controlled dynamic system ẋ = f(x, u), we define a cost function with horizon T: J = ∫_0^T c(x(t), u(t)) dt + φ(x(T)).

10 Hamilton-Jacobi-Bellman Equation (Continuous Time) Define the value function or optimal cost-to-go function V(x, t) = min_{u(·)} [ ∫_t^T c(x(s), u(s)) ds + φ(x(T)) ]. Hamilton-Jacobi-Bellman equation: -∂V/∂t = min_u [ c(x, u) + (∂V/∂x) f(x, u) ]. The argmin gives the optimal control signal: π*(x, t) = argmin_u [ c(x, u) + (∂V/∂x) f(x, u) ]. Derivation:
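The derivation referred to above is not reproduced in this transcription; the standard dynamic-programming argument over an infinitesimal time step dt runs as follows:

\[ V(x,t) = \min_u \big[ c(x,u)\,dt + V(x + f(x,u)\,dt,\; t + dt) \big] \]

Expanding the right-hand side to first order in dt,

\[ V(x,t) = \min_u \Big[ c(x,u)\,dt + V(x,t) + \tfrac{\partial V}{\partial x} f(x,u)\,dt + \tfrac{\partial V}{\partial t}\,dt \Big], \]

and cancelling V(x,t) and dividing by dt yields the HJB equation

\[ -\tfrac{\partial V}{\partial t} = \min_u \Big[ c(x,u) + \tfrac{\partial V}{\partial x} f(x,u) \Big]. \]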

11 Infinite Horizon Case Consider the cost function J = ∫_0^∞ c(x(t), u(t)) dt. This cost function is stationary (time invariant)! The HJB and Bellman equations remain the same, but with a stationary value function V(x) independent of t: 0 = min_u [ c(x, u) + (∂V/∂x) f(x, u) ]. The argmin gives the optimal control signal: π*(x) = argmin_u [ c(x, u) + (∂V/∂x) f(x, u) ].

12 Infinite Horizon Examples Cart Pole Balancing Problem: you might define a cost that penalizes deviation from the upright state and control effort. Reference Following: penalize deviation from a reference and control effort. Several problems in control can be framed this way!
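The cost expressions on this slide are not reproduced above; typical stationary choices (assumptions for illustration, not necessarily the exact expressions used in the lecture) are

\[ c(x,u) = \|x - x^*\|^2 + \rho\,\|u\|^2 \quad \text{(balancing about a target state } x^*\text{)}, \]
\[ c(x,u) = \|\phi(x) - y^*\|^2 + \rho\,\|u\|^2 \quad \text{(following a fixed reference } y^* \text{ in a task space } \phi\text{)}. \]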

13 Comments
The Bellman equation is fundamental in optimal control theory, but also in Reinforcement Learning.
The HJB equation is a differential equation for V(x, t) which is in general hard to solve.
The (time-discretized) Bellman equation can be solved by Dynamic Programming, starting backward from V_T and iterating V_t(x) = min_u [ c(x, u) + V_{t+1}(f(x, u)) ] down to V_0. But it might still be hard or infeasible to represent the functions V_t(x) over continuous x!
Both become significantly simpler under linear dynamics and quadratic costs: the Riccati equation.

14 Linear-Quadratic Optimal Control
Linear Dynamics: ẋ = A x + B u
Quadratic Costs: c(x, u) = x^T Q x + u^T R u, terminal cost φ(x) = x^T F x
Note: the dynamics neglect a constant term; the costs neglect linear and constant terms. This is because constant costs are irrelevant, linear cost terms can be removed by redefining x or u, and a constant dynamics term only introduces a constant drift.

15 LQ Control as Local Approximation LQ control is important also for controlling non-LQ systems in the neighbourhood of a desired state! Let x* be such a desired state (e.g., cart-pole: x* = (0, 0, 0, 0)): linearize the dynamics around x*, use a 2nd-order approximation of the costs around x*, control the system locally as if it were LQ, and re-estimate for a new desired state!
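A sketch of this recipe for the cart-pole, assuming the standard frictionless cart-pole linearization about the upright state with hypothetical masses, pole length and cost weights (not the exact model or weights used in the lecture):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical parameters: cart mass M, pole mass m, pole length l, gravity g.
M, m, l, g = 1.0, 0.1, 0.5, 9.81

# Linearization of the frictionless cart-pole about x* = (0, 0, 0, 0);
# state x = (cart position, pole angle from upright, cart velocity, angular velocity).
A = np.array([[0, 0,                     1, 0],
              [0, 0,                     0, 1],
              [0, -m * g / M,            0, 0],
              [0, (M + m) * g / (M * l), 0, 0]])
B = np.array([[0], [0], [1 / M], [-1 / (M * l)]])

Q = np.diag([1.0, 10.0, 0.1, 0.1])   # assumed state cost weights
R = np.array([[0.1]])                # assumed control cost weight

P = solve_continuous_are(A, B, Q, R)     # infinite-horizon algebraic Riccati equation
K = np.linalg.inv(R) @ B.T @ P           # optimal local feedback u = -K (x - x*)
print(K)
```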

16 Riccati Diff. Eq. = HJB Eq. in LQ Case In the Linear-Quadratic (LQ) case, the value function is always a quadratic function of x! Let V(x, t) = x^T P(t) x; then the HJB equation becomes the Riccati differential equation: -Ṗ = Q + A^T P + P A - P B R^{-1} B^T P, with optimal control u*(x, t) = -R^{-1} B^T P(t) x.

17 Riccati Differential Equation This is a differential equation for the matrix P(t) describing the quadratic value function. If we solve it with the finite-horizon constraint P(T) = F, we have solved the optimal control problem.
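A minimal numerical sketch of this backward integration, for a hypothetical double integrator with assumed cost weights:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical 1D double integrator: x = (position, velocity), x_dot = A x + B u.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1])        # assumed running state cost
R = np.array([[0.1]])          # assumed control cost
F = np.diag([10.0, 1.0])       # assumed terminal cost weight, P(T) = F
T = 5.0

def riccati_rhs(t, p_flat):
    """Riccati ODE: -dP/dt = Q + A^T P + P A - P B R^-1 B^T P."""
    P = p_flat.reshape(2, 2)
    dP = -(Q + A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P)
    return dP.ravel()

# Integrate backward in time from t = T (where P(T) = F) down to t = 0.
sol = solve_ivp(riccati_rhs, (T, 0.0), F.ravel())
P0 = sol.y[:, -1].reshape(2, 2)
print("P(0) =\n", P0)          # time-varying gain at t = 0: K(0) = R^-1 B^T P(0)
```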

18 Riccati Equations
Finite Horizon, Continuous Time: Riccati differential equation
Infinite Horizon, Continuous Time: Algebraic Riccati Equation (ARE)
Finite Horizon, Discrete Time: Riccati difference equation
Infinite Horizon, Discrete Time: discrete-time Algebraic Riccati Equation
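In standard notation (for dynamics ẋ = A x + B u or x_{t+1} = A x_t + B u_t, cost weights Q, R and terminal weight F), the four cases read:

\[ \text{continuous, finite horizon:}\quad -\dot P = Q + A^\top P + P A - P B R^{-1} B^\top P, \qquad P(T) = F \]
\[ \text{continuous, infinite horizon (ARE):}\quad 0 = Q + A^\top P + P A - P B R^{-1} B^\top P \]
\[ \text{discrete, finite horizon:}\quad P_{t-1} = Q + A^\top \big[ P_t - P_t B (R + B^\top P_t B)^{-1} B^\top P_t \big] A, \qquad P_T = F \]
\[ \text{discrete, infinite horizon:}\quad P = Q + A^\top \big[ P - P B (R + B^\top P B)^{-1} B^\top P \big] A \]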

19 Example: 1D Point Mass Dynamics: a point mass at position q with q̈ = u/m, i.e., state x = (q, q̇) and ẋ = A x + B u with A = [[0, 1], [0, 0]], B = (0, 1/m)^T. Costs: quadratic in the state and the control. Algebraic Riccati Equation: 0 = Q + A^T P + P A - P B R^{-1} B^T P.

20 Example: 1D Point Mass (cont'd) The Algebraic Riccati Equation for this system is usually solved numerically (e.g., with are, care or dare in Octave).
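An equivalent numerical solution in Python (scipy's solve_continuous_are plays the role of Octave's care; the mass and cost weights below are assumptions for illustration):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

m = 1.0                                   # assumed point mass
A = np.array([[0.0, 1.0], [0.0, 0.0]])    # state x = (q, q_dot), with q_ddot = u / m
B = np.array([[0.0], [1.0 / m]])
Q = np.diag([1.0, 0.0])                   # assumed: penalize position error only
R = np.array([[1.0]])                     # assumed control cost

P = solve_continuous_are(A, B, Q, R)      # stationary quadratic value function x^T P x
K = np.linalg.inv(R) @ B.T @ P            # optimal state feedback u = -K x
print("P =\n", P, "\nK =", K)
```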

21 Relations to Other Topics [Diagram relating Inverse Kinematics, Operational Space Control, Trajectory Optimization, Optimal Control and Reinforcement Learning along the distinctions: 1-step optimal vs. T-step optimal, no control vs. control constraints, and model known vs. adaptive.]

22 Optimal Control: Relations to Other Topics [Diagram relating Optimal Control to Inverse Kinematics, Optimal Operational Space Control, Trajectory Optimisation and Reinforcement Learning.]

23 For more details refer to... Todorov, E.: Optimal control theory. In: Bayesian Brain: Probabilistic Approaches to Neural Coding, Doya, K. et al. (eds.), chap. 12, MIT Press (2006).
