Nonlinear Optimization for Optimal Control, Part 2
Pieter Abbeel, UC Berkeley EECS

Outline
- From linear to nonlinear
- Model-predictive control (MPC)
- POMDPs

From Linear to Nonlinear

We know how to solve (assuming g_t, U_t, X_t convex):

  (1)  min_{u,x}  sum_t g_t(x_t, u_t)
       s.t.  x_{t+1} = A_t x_t + B_t u_t,   u_t in U_t,   x_t in X_t

How about nonlinear dynamics x_{t+1} = f(x_t, u_t)?

Shooting methods (feasible). Iterate for i = 1, 2, 3, ...:
- Execute the current control sequence (from solving (1))
- Linearize around the resulting trajectory
- Solve (1) for the current linearization

Collocation methods (infeasible). Iterate for i = 1, 2, 3, ...:
- (no execution)
- Linearize around the current solution of (1)
- Solve (1) for the current linearization

Sequential Quadratic Programming (SQP) = either of the above methods, but instead of plain linearization: linearize the equality constraints and use a convex-quadratic approximation of the objective function.

Example: Shooting
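
To make the shooting loop concrete, here is a minimal runnable sketch (not from the lecture). It assumes an unconstrained problem with quadratic cost, so "Solve (1) for the current linearization" reduces to a time-varying LQR backward pass (the iterative-LQR view discussed below); the pendulum dynamics, cost weights, horizon, and the fixed 0.5 step size are all illustrative choices.

```python
import numpy as np

def f(x, u):
    # Damped pendulum, Euler-discretized; state x = (angle, angular velocity).
    dt = 0.05
    return np.array([x[0] + dt * x[1],
                     x[1] + dt * (-9.8 * np.sin(x[0]) - 0.1 * x[1] + u[0])])

def linearize(x, u, eps=1e-5):
    # Finite-difference Jacobians A = df/dx, B = df/du at (x, u).
    A = np.zeros((2, 2)); B = np.zeros((2, 1))
    for i in range(2):
        d = np.zeros(2); d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    d = np.array([eps])
    B[:, 0] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B

Q, R, T = np.eye(2), 0.1 * np.eye(1), 60
x0 = np.array([1.0, 0.0])
us = np.zeros((T, 1))                      # initial control sequence

for _ in range(15):                        # "iterate for i = 1, 2, 3, ..."
    # Execute: roll the controls through the nonlinear dynamics (feasible).
    xs = np.zeros((T + 1, 2)); xs[0] = x0
    for t in range(T):
        xs[t + 1] = f(xs[t], us[t])
    # Linearize around the resulting trajectory and solve (1) for the current
    # linearization: a Riccati backward pass with affine terms from the cost.
    P, p = Q.copy(), Q @ xs[T]
    ks, Ks = np.zeros((T, 1)), np.zeros((T, 1, 2))
    for t in reversed(range(T)):
        A, B = linearize(xs[t], us[t])
        Quu = R + B.T @ P @ B
        Qux = B.T @ P @ A
        Qu = R @ us[t] + B.T @ p
        ks[t] = np.linalg.solve(Quu, Qu)
        Ks[t] = np.linalg.solve(Quu, Qux)
        P = Q + A.T @ P @ A - Qux.T @ np.linalg.solve(Quu, Qux)
        p = Q @ xs[t] + A.T @ p - Qux.T @ ks[t]
    # Forward pass: re-execute with the updated feedback law (0.5 is a
    # crude fixed step size standing in for a line search).
    x, new_us = x0.copy(), np.zeros((T, 1))
    for t in range(T):
        new_us[t] = us[t] - 0.5 * ks[t] - Ks[t] @ (x - xs[t])
        x = f(x, new_us[t])
    us = new_us

print("final state after optimization:", x)
```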

Example: Collocation

Practical Benefits and Issues with Shooting
+ At all times the sequence of controls is meaningful, and the objective function being optimized directly corresponds to the current control sequence.
- For unstable systems, a feedback controller needs to run during the forward simulation. Why? The open-loop sequence of control inputs computed for the linearized system will not be perfect for the nonlinear system; if the nonlinear system is unstable, open-loop execution would give poor performance. Fixes:
  - Run model predictive control for the forward simulation.
  - Compute a linear feedback controller from the 2nd-order Taylor expansion at the optimum (exercise: work out the details!).

Practical Benefits and Issues with Collocation
+ Can initialize with an infeasible trajectory. Hence, if you have a rough idea of a sequence of states that would form a reasonable solution, you can initialize with this sequence of states without needing to know a control sequence that would lead through them, and without needing to make them consistent with the dynamics.
- The sequence of control inputs and states might never converge onto a feasible sequence.

Iterative LQR versus Sequential Convex Programming
- Both can solve the nonlinear problem: (1) with the linear dynamics replaced by x_{t+1} = f(x_t, u_t).
- Iterative LQR can be run either as a shooting method or as a collocation method; it is just a different way of executing "Solve (1) for the current linearization." In the shooting case, the sequence of linear feedback controllers found can be used for (closed-loop) execution.
- Iterative LQR might need some outer iterations, adjusting the log-barrier parameter t.
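
For contrast with the shooting sketch above, here is a sketch of a collocation iteration on the same illustrative pendulum: all states and controls are decision variables, the linearized dynamics enter as equality constraints, and each subproblem is an equality-constrained QP solved via its KKT system. The straight-line state initialization is deliberately infeasible; whether the iterates converge onto a feasible sequence is exactly the caveat noted above.

```python
import numpy as np

def f(x, u):
    # Same illustrative damped pendulum as in the shooting sketch.
    dt = 0.05
    return np.array([x[0] + dt * x[1],
                     x[1] + dt * (-9.8 * np.sin(x[0]) - 0.1 * x[1] + u[0])])

def linearize(x, u, eps=1e-5):
    A = np.zeros((2, 2)); B = np.zeros((2, 1))
    for i in range(2):
        d = np.zeros(2); d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    d = np.array([eps])
    B[:, 0] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B

n, m, T = 2, 1, 40
x0 = np.array([1.0, 0.0])
# Infeasible initialization: straight-line states, zero controls.
xs = np.linspace(x0, np.zeros(n), T + 1)
us = np.zeros((T, m))

Q, R = np.eye(n), 0.1 * np.eye(m)
nz = T * n + T * m                       # variables: x_1..x_T, u_0..u_{T-1}
H = np.zeros((nz, nz))                   # block-diagonal quadratic cost
for t in range(T):
    H[t*n:(t+1)*n, t*n:(t+1)*n] = Q
    H[T*n + t*m:T*n + (t+1)*m, T*n + t*m:T*n + (t+1)*m] = R

for _ in range(15):                      # linearize, solve, repeat (no rollout)
    C = np.zeros((T * n, nz)); d = np.zeros(T * n)
    for t in range(T):
        A, B = linearize(xs[t], us[t])
        r = slice(t * n, (t + 1) * n)
        C[r, t*n:(t+1)*n] = np.eye(n)                    # x_{t+1} term
        C[r, T*n + t*m:T*n + (t+1)*m] = -B               # -B_t u_t term
        d[r] = f(xs[t], us[t]) - A @ xs[t] - B @ us[t]
        if t == 0:
            d[r] += A @ x0                               # x_0 is fixed
        else:
            C[r, (t-1)*n:t*n] = -A                       # -A_t x_t term
    # KKT system for:  min 0.5 z' H z   s.t.  C z = d
    KKT = np.block([[H, C.T], [C, np.zeros((T * n, T * n))]])
    z = np.linalg.solve(KKT, np.concatenate([np.zeros(nz), d]))[:nz]
    xs = np.vstack([x0, z[:T*n].reshape(T, n)])
    us = z[T*n:].reshape(T, m)

defect = np.abs(xs[1:] - np.array([f(xs[t], us[t]) for t in range(T)]))
print("max dynamics defect:", defect.max())   # small iff a feasible sequence was reached
```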

Outline
- From linear to nonlinear
- Model-predictive control (MPC)  (for an entire semester course on MPC, see Francesco Borrelli)
- POMDPs

Model Predictive Control
Given: the current state and the model and costs defining the optimization problem.
For k = 0, 1, 2, ..., T:
- Solve the optimal control problem over the horizon, starting from the current state
- Execute u_k
- Observe the resulting state x_{k+1}
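
A minimal sketch of the receding-horizon loop. Assumptions (not from the lecture): linear dynamics, quadratic cost, no constraints, so each "Solve" step is a finite-horizon Riccati recursion; the lecture's version solves a constrained, possibly nonlinear problem at each step. For this time-invariant system the re-solve returns the same gain every step; the structure of the loop is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative double integrator
B = np.array([[0.0], [0.1]])
Q, R, h = np.eye(2), np.eye(1), 20       # stage costs, horizon length

def solve_horizon(P_term):
    # Finite-horizon LQR over h steps; returns the first-step gain K_0.
    P = P_term.copy()
    for _ in range(h):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

x = np.array([1.0, 0.0])
for k in range(50):
    K0 = solve_horizon(Q)                             # Solve over the horizon
    u = -K0 @ x                                       # Execute u_k (first control only)
    x = A @ x + B @ u + rng.normal(0, 0.01, size=2)   # Observe the resulting state
print("state after MPC run:", x)
```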

Initialization
- Initializing with the solution from iteration k-1 can make the solver very fast.
- This can be done most conveniently with an infeasible-start Newton method.

Terminal Cost
- Re-solving over the full horizon can be computationally too expensive, given the frequency at which one might want to do control.
- Instead, solve over a shortened horizon of h steps and add an estimate of the cost-to-go as a terminal cost.
- If using iterative LQR: can use the quadratic value function found for time t+h.
- If using nonlinear optimization for the open-loop control sequence: can find a quadratic approximation from the Hessian at the solution (exercise: try to derive it!).
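
Both ideas in a small sketch (shapes and matrices are illustrative; U_prev stands in for the previous iteration's solution):

```python
import numpy as np

# 1. Warm start: shift the previous horizon's control sequence one step
#    forward and repeat the last control as the initial guess.
h, m = 20, 1
U_prev = np.zeros((h, m))                      # last solve's controls (stand-in)
U_init = np.vstack([U_prev[1:], U_prev[-1:]])  # initial guess for this solve

# 2. Terminal cost: cap the shortened horizon with a cost-to-go estimate,
#    e.g. the quadratic value x' P x found by running the Riccati recursion
#    to convergence for the linear(ized) system.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
P = Q.copy()
for _ in range(1000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
# Use x_{t+h}' P x_{t+h} as the terminal term of the h-step problem.
```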

Car Control with MPC
Video: Prof. Francesco Borrelli (M.E.) and collaborators, http://video.google.com/videoplay?docid=-8338487882440308275

Outline
- From linear to nonlinear
- Model-predictive control (MPC)
- POMDPs

POMDP Examples
- Localization/navigation → coastal navigation
- SLAM + robot execution → active exploration of unknown areas
- Needle steering → maximize probability of success
- Ghostbusters (CS 188) → can choose to sense or bust while navigating a maze with ghosts
- A certainty-equivalent solution does not always do well.

Robotic Needle Steering
[from van den Berg, Patil, Alterovitz, Abbeel, Goldberg, WAFR 2010]

Robotic Needle Steering
[from van den Berg, Patil, Alterovitz, Abbeel, Goldberg, WAFR 2010]

POMDP: Partially Observable Markov Decision Process
Belief state B_t, with B_t(x) = P(x_t = x | z_0, ..., z_t, u_0, ..., u_{t-1}).
If the control input is u_t and the observation is z_{t+1}, then

  B_{t+1}(x') ∝ P(z_{t+1} | x') · Σ_x B_t(x) P(x' | x, u_t)

(normalized so that the updated belief sums to one).
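
The update in code, on a toy 1-D grid with illustrative motion and observation models:

```python
import numpy as np

n = 10
B = np.full(n, 1.0 / n)                      # B_t: uniform prior over n cells

# P(x' | x, u): move one cell right with prob 0.8, stay with prob 0.2.
trans = 0.2 * np.eye(n) + 0.8 * np.roll(np.eye(n), 1, axis=1)
trans[-1, 0] = 0.0; trans[-1, -1] = 1.0      # don't wrap past the boundary

# P(z_{t+1} | x'): sensor reports "near cell 7" (triangular likelihood).
lik = np.maximum(0.0, 1.0 - 0.4 * np.abs(np.arange(n) - 7))

B_pred = B @ trans                           # sum_x B_t(x) P(x' | x, u_t)
B_new = lik * B_pred                         # multiply by P(z_{t+1} | x')
B_new /= B_new.sum()                         # normalize
print(B_new.round(3))
```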

POMDP Solution Methods
Value iteration:
- Perform value iteration on the belief state space.
- The belief space is high-dimensional, so this is usually impractical.
Approximate the belief with a Gaussian:
- Just keep track of the mean and covariance.
- Using an (extended or unscented) Kalman filter, the dynamics model, and the observation model, we get a nonlinear system equation for our new state variables (μ_t, Σ_t).
- Can now run any of the nonlinear optimization methods for optimal control.

Example: Nonlinear Optimization for Control in Belief Space using Gaussian Approximations [van den Berg, Patil, Alterovitz, ISRR 2011]
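
A sketch of such belief dynamics built from an EKF step. The dynamics, observation model, and noise covariances below are illustrative; setting the innovation to zero (i.e. assuming the maximum-likelihood observation will be received) is a common device to make the belief dynamics deterministic for optimization.

```python
import numpy as np

def f(x, u):
    # Nonlinear dynamics (illustrative): planar point robot, heading input.
    return x + 0.1 * np.array([np.cos(u[0]), np.sin(u[0])])

def obs(x):
    # Observation model (illustrative): range to a beacon at the origin.
    return np.array([np.linalg.norm(x)])

def jac(fun, x, eps=1e-5):
    # Finite-difference Jacobian of fun at x.
    y = fun(x)
    J = np.zeros((y.size, x.size))
    for i in range(x.size):
        d = np.zeros(x.size); d[i] = eps
        J[:, i] = (fun(x + d) - fun(x - d)) / (2 * eps)
    return J

W = 1e-3 * np.eye(2)   # process noise covariance (illustrative)
V = 1e-2 * np.eye(1)   # observation noise covariance (illustrative)

def belief_dynamics(mu, Sigma, u):
    # One EKF step viewed as a deterministic map (mu, Sigma) -> (mu', Sigma').
    A = jac(lambda x: f(x, u), mu)                       # predict
    mu_p = f(mu, u)
    S_p = A @ Sigma @ A.T + W
    H = jac(obs, mu_p)                                   # update; with the ML
    K = S_p @ H.T @ np.linalg.inv(H @ S_p @ H.T + V)     # observation assumed,
    return mu_p, (np.eye(2) - K @ H) @ S_p               # only Sigma changes

mu, Sigma = np.array([1.0, 0.5]), 0.1 * np.eye(2)
mu, Sigma = belief_dynamics(mu, Sigma, np.array([0.3]))
print(mu, np.diag(Sigma))
```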

Example: Nonlinear Optimization for Control in Belief Space using Gaussian Approximations [van den Berg, Patil, Alterovitz, ISRR 2011]

Linear Gaussian System with Quadratic Cost: Separation Principle
Very special case:
- Linear Gaussian dynamics
- Linear Gaussian observation model
- Quadratic cost
Fact: the optimal control policy in belief space for the above system consists of running
- the optimal feedback controller for the same system when the state is fully observed, which we know from earlier lectures is a time-varying linear feedback controller easily found by value iteration, and
- a Kalman filter, which feeds its state estimate into the feedback controller.
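
A sketch of the separation principle on a toy linear-Gaussian system (all matrices and noise levels are illustrative): the LQR gain is designed as if the state were fully observed, and a Kalman filter supplies the estimate it acts on.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])                 # observe position only
Q, R = np.eye(2), np.eye(1)                # control cost weights
W, V = 1e-3 * np.eye(2), 1e-2 * np.eye(1)  # process / observation noise

# LQR gain by iterating the Riccati recursion (value iteration).
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# Closed loop: the Kalman filter estimate x_hat drives u = -K x_hat.
x = np.array([1.0, 0.0]); x_hat = np.zeros(2); S = np.eye(2)
for t in range(100):
    u = -K @ x_hat
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W)
    z = C @ x + rng.multivariate_normal(np.zeros(1), V)
    # Kalman predict + update
    x_hat = A @ x_hat + B @ u
    S = A @ S @ A.T + W
    G = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)
    x_hat = x_hat + G @ (z - C @ x_hat)
    S = (np.eye(2) - G @ C) @ S
print("true state:", x, " estimate:", x_hat)
```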