Lecture Notes: (Stochastic) Optimal Control

Marc Toussaint
Machine Learning & Robotics group, TU Berlin
Franklinstr. 28/29, FR 6-9, 10587 Berlin, Germany

July 1, 2010

Disclaimer: These notes are not meant to be a complete or comprehensive survey on Stochastic Optimal Control. This is more of a personal script which I use to keep an overview over control methods and their derivations. One point of these notes is to fix a consistent notation and provide a coherent overview for these specific methods.

Contents

1 Notation
2 Stochastic optimal control (discrete time)
  2.1 The Linear-Quadratic-Gaussian (LQG) case
  2.2 Message passing in LQG
  2.3 Special case: kinematic control
  2.4 Special case: multiple kinematic task variables
  2.5 Special case: pseudo-dynamic process
  2.6 Special case: dynamic process
  2.7 Special case: multiple dynamic/kinematic task variables
3 The optimization view on classical control laws
  3.1 General quadratic loss function and constraints
  3.2 Ridge regression
  3.3 Motion rate control (pseudo-inverse kinematics)
  3.4 Regularized inverse kinematics (singularity robust motion rate control)
  3.5 Multiple regularized task variables
  3.6 Multiple prioritized task variables (prioritized inverse kinematics)
  3.7 Optimal dynamic control (incl. operational space control)

1 Notation

$x$: system state (can be $q$ or $(q, \dot q)$)
$q \in \mathbb{R}^n$: robot posture (vector of joint angles)
$y \in \mathbb{R}^d$: a task variable (e.g., endeffector position)
$\phi : q \mapsto y$: differentiable kinematic function
$J(q) = \frac{\partial \phi}{\partial q} \in \mathbb{R}^{d \times n}$: task Jacobian in posture $q$

We define a Gaussian over $x$ with mean $a$ and covariance matrix $A$ as the function

    $\mathcal N(x \mid a, A) = \frac{1}{|2\pi A|^{1/2}} \exp\{-\tfrac{1}{2}(x-a)^\top A^{-1}(x-a)\}$    (1)

with property $\mathcal N(x \mid a, A) = \mathcal N(a \mid x, A)$. We also define the canonical representation

    $\mathcal N[x \mid a, A] = \frac{\exp\{-\tfrac{1}{2} a^\top A^{-1} a\}}{|2\pi A^{-1}|^{1/2}} \exp\{-\tfrac{1}{2} x^\top A x + x^\top a\}$    (2)

with properties $\mathcal N[x \mid a, A] = \mathcal N(x \mid A^{-1}a, A^{-1})$ and $\mathcal N(x \mid a, A) = \mathcal N[x \mid A^{-1}a, A^{-1}]$.

The product of two Gaussians can be expressed as

    $\mathcal N[x \mid a, A]\;\mathcal N[x \mid b, B] = \mathcal N[x \mid a+b,\, A+B]\;\mathcal N(A^{-1}a \mid B^{-1}b,\, A^{-1}+B^{-1})$    (3)
    $\mathcal N(x \mid a, A)\;\mathcal N(x \mid b, B) = \mathcal N[x \mid A^{-1}a + B^{-1}b,\, A^{-1}+B^{-1}]\;\mathcal N(a \mid b,\, A+B)$    (4)
    $\mathcal N(x \mid a, A)\;\mathcal N[x \mid b, B] = \mathcal N[x \mid A^{-1}a + b,\, A^{-1}+B]\;\mathcal N(a \mid B^{-1}b,\, A+B^{-1})$    (5)

Linear transformations in $x$ imply the following identities:

    $\mathcal N(Fx+f \mid a, A) = \tfrac{1}{|F|}\,\mathcal N(x \mid F^{-1}(a-f),\, F^{-1} A F^{-\top})$    (6)
    $\phantom{\mathcal N(Fx+f \mid a, A)} = \tfrac{1}{|F|}\,\mathcal N[x \mid F^\top A^{-1}(a-f),\, F^\top A^{-1} F]$    (7)
    $\mathcal N[Fx+f \mid a, A] = \tfrac{1}{|F|}\,\mathcal N[x \mid F^\top(a - Af),\, F^\top A F]$    (8)

The joint Gaussian of two linearly dependent Gaussian variables reads

    $\mathcal N(x \mid a, A)\;\mathcal N(y \mid b + Fx, B) = \mathcal N\left( \begin{pmatrix} x \\ y \end{pmatrix} \,\middle|\, \begin{pmatrix} a \\ b + Fa \end{pmatrix},\; \begin{pmatrix} A & A F^\top \\ F A & B + F A F^\top \end{pmatrix} \right)$    (9)

(See the lecture notes on Gaussian identities for more identities.)

Let us collect some matrix identities which we will need throughout. The Woodbury identity

    $(J^\top B^{-1} J + W)^{-1} J^\top B^{-1} = W^{-1} J^\top (J W^{-1} J^\top + B)^{-1}$    (10)

holds for any positive definite $B$ and $W$. Further we have the identity

    $I_n - (J^\top B^{-1} J + W)^{-1} J^\top B^{-1} J = (J^\top B^{-1} J + W)^{-1} W$.    (11)

We define the pseudo-inverse of $J$ w.r.t. $W$, and the similar quantity w.r.t. a task-space metric $B^{-1}$, as

    $J^\sharp_W = W^{-1} J^\top (J W^{-1} J^\top)^{-1}, \qquad J^\sharp = (J^\top B^{-1} J)^{-1} J^\top B^{-1}$.    (12)
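The first property below (2) can be checked numerically in a few lines (a minimal numpy sketch of mine, not part of the derivations; the function names are hypothetical):

    import numpy as np
    rng = np.random.default_rng(0)

    def N(x, a, A):
        # moment form N(x | a, A), eq. (1)
        diff = x - a
        return np.exp(-0.5 * diff @ np.linalg.solve(A, diff)) / np.sqrt(np.linalg.det(2 * np.pi * A))

    def N_can(x, a, A):
        # canonical form N[x | a, A], eq. (2)
        return (np.exp(-0.5 * a @ np.linalg.solve(A, a))
                / np.sqrt(np.linalg.det(2 * np.pi * np.linalg.inv(A)))
                * np.exp(-0.5 * x @ A @ x + x @ a))

    x, a = rng.normal(size=3), rng.normal(size=3)
    G = rng.normal(size=(3, 3)); A = G @ G.T + 3 * np.eye(3)   # random s.p.d. matrix
    # property below (2): N[x | a, A] = N(x | A^-1 a, A^-1)
    print(np.allclose(N_can(x, a, A), N(x, np.linalg.solve(A, a), np.linalg.inv(A))))   # True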

2 Stochastic optimal control (discrete time)

We assume a framework that is basically the same as for Markov Decision Processes, but with a slight change in notation: instead of maximizing rewards we minimize costs; instead of an action $a_t$ we refer to the control $u_t$; instead of $V(x)$ we denote the optimal value function by $J_t(x)$. Cost and dynamics are generally non-stationary.

Consider a discrete time stochastic controlled system

    $x_{t+1} = f_t(x_t, u_t) + \xi_t, \qquad \xi_t \sim \mathcal N(0, Q_t)$    (13)

with the state $x_t \in \mathbb{R}^n$, the control signal $u_t \in \mathbb{R}^m$, and Gaussian noise $\xi_t$ of covariance $Q_t$. An alternative notation is

    $P(x_{t+1} \mid u_t, x_t) = \mathcal N(x_{t+1} \mid f_t(x_t, u_t), Q_t)$.    (14)

For a given state-control sequence $x_{0:T}, u_{0:T}$ we define the cost

    $C(x_{0:T}, u_{0:T}) = \sum_{t=0}^T c_t(x_t, u_t)$.    (15)

Unlike the reward function in stationary MDPs, this cost function is typically not stationary. For instance, the cost might focus on the final state $x_T$ relative to a goal state. In consequence, optimal policies are also non-stationary. Just as in the MDP case, the value function obeys the Bellman optimality equation

    $J_t(x) = \min_u \Big[ c_t(x, u) + \int_{x'} P(x' \mid u, x)\; J_{t+1}(x')\, dx' \Big]$.    (16)

There are two versions of stochastic optimal control problems: The open-loop control problem is to find a control sequence $u_{0:T}$ that minimizes the expected cost. The closed-loop (feedback) control problem is to find a control policy $\pi_t : x_t \mapsto u_t$ (that exploits the true state observation in each time step and maps it to a feedback control signal) that minimizes the expected cost.
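For a discrete (tabular) toy version of (16), the backward recursion is a few lines of code (a sketch under my own conventions; nothing here is specific to the robotics setting):

    import numpy as np

    def backward_induction(P, c, T):
        # P[u]: (n, n) matrix with P[u][x, x'] = P(x' | u, x)
        # c[t]: (n, m) array with c[t][x, u] = c_t(x, u)
        # returns value tables J[t][x] and greedy policies pi[t][x]
        n, m = c[0].shape
        J = np.zeros((T + 1, n))
        pi = np.zeros((T, n), dtype=int)
        J[T] = c[T].min(axis=1)                      # terminal step
        for t in range(T - 1, -1, -1):
            Q = c[t] + np.stack([P[u] @ J[t + 1] for u in range(m)], axis=1)
            J[t] = Q.min(axis=1)                     # Bellman optimality, eq. (16)
            pi[t] = Q.argmin(axis=1)                 # non-stationary optimal policy
        return J, pi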

2.1 The Linear-Quadratic-Gaussian (LQG) case

Consider a linear control process with Gaussian noise,

    $P(x_{t+1} \mid x_t, u_t) = \mathcal N(x_{t+1} \mid A_t x_t + a_t + B_t u_t, Q_t)$,    (17)

and quadratic costs,

    $c_t(x_t, u_t) = x_t^\top R_t x_t - 2 r_t^\top x_t + u_t^\top H_t u_t$.    (18)

The general LQG case is specified by the matrices and vectors $A_{0:T}, a_{0:T}, B_{0:T}, Q_{0:T}, R_{0:T}, r_{0:T}, H_{0:T}$. With a proper choice of $R_t$ and $r_t$ this corresponds to the problem of tracking an arbitrary desired trajectory $x^*_t$, where the cost is quadratic in $(x_t - x^*_t)$.

The LQG case allows us to derive an exact backward recursion, called the Riccati equation, for the computation of the value function. The value function will always be a quadratic form of the state. Let us assume that we know the value function $J_{t+1}(x)$ at time $t+1$ and that it has the form

    $J_{t+1}(x) = x^\top V_{t+1} x - 2 v_{t+1}^\top x$.    (19)

Then

    $J_t(x) = \min_u \Big[ x^\top R_t x - 2 r_t^\top x + u^\top H_t u + \int_y \mathcal N(y \mid A_t x + a_t + B_t u, Q_t)\,\big( y^\top V_{t+1} y - 2 v_{t+1}^\top y \big)\, dy \Big]$.    (20)

CONVENTION: For the remainder of this section we drop the subscript $t$ for $A, a, B, Q, R, r, H$; wherever it is missing we refer to time $t$.

The expectation of a quadratic form under a Gaussian is

    $E_{\mathcal N(y \mid a, A)}\{ y^\top V y - 2 v^\top y \} = a^\top V a - 2 v^\top a + \mathrm{tr}(V A)$.    (21)

So we have

    $J_t(x) = \min_u \big[ x^\top R x - 2 r^\top x + u^\top H u + (Ax+a+Bu)^\top V_{t+1} (Ax+a+Bu) - 2 v_{t+1}^\top (Ax+a+Bu) + \mathrm{tr}(V_{t+1} Q) \big]$
    $\phantom{J_t(x)} = \min_u \big[ x^\top (R + A^\top V_{t+1} A) x - 2 \big( r + A^\top (v_{t+1} - V_{t+1} a) \big)^\top x + u^\top (H + B^\top V_{t+1} B) u + 2 u^\top B^\top \big( V_{t+1}(Ax+a) - v_{t+1} \big) + a^\top V_{t+1} a - 2 v_{t+1}^\top a + \mathrm{tr}(V_{t+1} Q) \big]$.    (22)

Minimizing w.r.t. $u$ by setting the gradient to zero,

    $0 = 2 (H + B^\top V_{t+1} B)\, u + 2 B^\top \big( V_{t+1}(Ax+a) - v_{t+1} \big)$    (23)
    $u^*_t(x) = -(H + B^\top V_{t+1} B)^{-1} B^\top \big( V_{t+1}(Ax+a) - v_{t+1} \big)$,    (24)

and substituting back,

    $J_t(x) = x^\top (R + A^\top V_{t+1} A) x - 2 \big( r + A^\top (v_{t+1} - V_{t+1} a) \big)^\top x - \big( V_{t+1}(Ax+a) - v_{t+1} \big)^\top B (H + B^\top V_{t+1} B)^{-1} B^\top \big( V_{t+1}(Ax+a) - v_{t+1} \big) + a^\top V_{t+1} a - 2 v_{t+1}^\top a + \mathrm{tr}(V_{t+1} Q)$,    (25)

we can read off the recursion

    $J_t(x) = x^\top V_t x - 2 x^\top v_t + \text{terms independent of } x$,    (26)
    $V_t = R + A^\top V_{t+1} A - K V_{t+1} A, \qquad v_t = r + A^\top (v_{t+1} - V_{t+1} a) - K (v_{t+1} - V_{t+1} a)$,    (27)
    $K := A^\top V_{t+1} \big( V_{t+1} + (B H^{-1} B^\top)^{-1} \big)^{-1}$

(equivalently, $K = A^\top V_{t+1} B (H + B^\top V_{t+1} B)^{-1} B^\top$, which avoids inverting $B H^{-1} B^\top$ when it is singular). The recursion (27) for $V_t$ is called the Riccati equation. Initialized with $V_T = R_T$ and $v_T = r_T$, this gives a backward recursion to compute the value function $J_t$ at each time step. Equation (24) also gives the optimal control policy. Note that the optimal control and path are independent of the process noise $Q$.
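In code, the recursion (24), (26), (27) is a direct transcription (a minimal numpy sketch; for brevity I keep $A, a, B, R, r, H$ time-invariant and use the second form of $K$ given above):

    import numpy as np

    def lqg_backward(A, a, B, R, r, H, T):
        # Riccati recursion (26), (27), initialized with V_T = R_T, v_T = r_T;
        # returns V, v and the optimal feedback controls u_t(x) of eq. (24).
        # Note that the process noise Q does not enter.
        V = [None] * (T + 1); v = [None] * (T + 1)
        V[T], v[T] = R, r
        for t in range(T - 1, -1, -1):
            Hbar = np.linalg.inv(H + B.T @ V[t + 1] @ B)
            K = A.T @ V[t + 1] @ B @ Hbar @ B.T
            V[t] = R + A.T @ V[t + 1] @ A - K @ V[t + 1] @ A
            v[t] = r + (A.T - K) @ (v[t + 1] - V[t + 1] @ a)
        def u_opt(t, x):                             # eq. (24), valid for t < T
            Hbar = np.linalg.inv(H + B.T @ V[t + 1] @ B)
            return -Hbar @ B.T @ (V[t + 1] @ (A @ x + a) - v[t + 1])
        return V, v, u_opt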

2.2 Message passing in LQG

We may translate costs to probabilities by introducing a binary random variable $z_t$ dependent on $x_t$ and $u_t$,

    $P(z_t = 1 \mid u_t, x_t) = \exp\{ -c_t(x_t, u_t) \}$.    (28)

In the LQG case we can simplify this to

    $P(z_t = 1 \mid x_t) \propto \mathcal N[x_t \mid r_t, R_t]$,    (29)
    $P(u_t) = \mathcal N(u_t \mid 0, H^{-1})$.    (30)

What is the posterior over the state trajectory $x_{0:T}$ conditioned on the fact that we permanently observe $z_t = 1$? Since we can integrate out the control $u_t$, this is a simple Markov process with continuous state and Gaussian observation,

    $P(x_{t+1} \mid x_t) = \int_u du\; \mathcal N(x_{t+1} \mid A x_t + a + B u, Q)\; \mathcal N(u \mid 0, H^{-1})$    (31)
    $\phantom{P(x_{t+1} \mid x_t)} = \mathcal N(x_{t+1} \mid A x_t + a,\; Q + B H^{-1} B^\top)$.    (32)

Inference is a standard forward-backward process just as in Kalman smoothing. The messages read

    $\mu_{x_{t-1} \to x_t}(x_t) = \mathcal N(x_t \mid s_t, S_t)$,
        $s_t = a_{t-1} + A_{t-1} (S_{t-1}^{-1} + R_{t-1})^{-1} (S_{t-1}^{-1} s_{t-1} + r_{t-1})$,
        $S_t = Q_{t-1} + B_{t-1} H_{t-1}^{-1} B_{t-1}^\top + A_{t-1} (S_{t-1}^{-1} + R_{t-1})^{-1} A_{t-1}^\top$    (33)

    $\mu_{x_{t+1} \to x_t}(x_t) = \mathcal N(x_t \mid v_t, V_t)$,
        $v_t = -A_t^{-1} a_t + A_t^{-1} (V_{t+1}^{-1} + R_{t+1})^{-1} (V_{t+1}^{-1} v_{t+1} + r_{t+1})$,
        $V_t = A_t^{-1} \big[ Q_t + B_t H_t^{-1} B_t^\top + (V_{t+1}^{-1} + R_{t+1})^{-1} \big] A_t^{-\top}$    (34)

    $\mu_{z_t \to x_t}(x_t) = \mathcal N[x_t \mid r_t, R_t]$.    (35)

The potentials $(\bar v_t, \bar V_t)$ which define the backward message can also be expressed in a different way: let us define

    $\bar V_t = V_t^{-1} + R_t$    (36)
    $\bar v_t = V_t^{-1} v_t + r_t$,    (37)

which corresponds to a backward message (in canonical representation) which has the cost message already absorbed. Using a special case of the Woodbury identity,

    $(A^{-1} + B)^{-1} = A - A (A + B^{-1})^{-1} A$,    (38)

the backward messages can be rewritten as

    $V_t^{-1} = A^\top \big[ \bar V_{t+1}^{-1} + Q + B H^{-1} B^\top \big]^{-1} A = A^\top \bar V_{t+1} A - \bar K \bar V_{t+1} A$, $\quad \bar K := A^\top \bar V_{t+1} \big( \bar V_{t+1} + (Q + B H^{-1} B^\top)^{-1} \big)^{-1}$    (39)
    $V_t^{-1} v_t = A^\top (\bar v_{t+1} - \bar V_{t+1} a_t) - \bar K (\bar v_{t+1} - \bar V_{t+1} a_t)$    (40)
    $\bar V_t = R_t + (A^\top - \bar K)\, \bar V_{t+1} A$    (41)
    $\bar v_t = r_t + (A^\top - \bar K)\, (\bar v_{t+1} - \bar V_{t+1} a_t)$.    (42)

These correspond exactly to the Riccati equations (26), (27), except for the dependence on $Q$, which here interacts directly with the control cost metric $H$. Yet another way to write them is

    $\bar V_t = R_t + A^\top \big[ I - \bar V_{t+1} \big( \bar V_{t+1} + (Q + B H^{-1} B^\top)^{-1} \big)^{-1} \big] \bar V_{t+1} A$    (43)
    $\bar v_t = r_t + A^\top \big[ I - \bar V_{t+1} \big( \bar V_{t+1} + (Q + B H^{-1} B^\top)^{-1} \big)^{-1} \big] (\bar v_{t+1} - \bar V_{t+1} a_t)$.    (44)
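The forward recursion (33) is again a direct transcription (a sketch with my own naming; time-invariant $A, a, B, Q, H$ for brevity, and the task messages $(r_t, R_t)$ given as lists):

    import numpy as np

    def forward_messages(A, a, B, Q, H, r, R, s0, S0, T):
        # mu_{x_{t-1} -> x_t}(x_t) = N(x_t | s_t, S_t), eq. (33);
        # r[t], R[t] parameterize the task message N[x_t | r_t, R_t], eq. (35)
        Qeff = Q + B @ np.linalg.inv(H) @ B.T        # control integrated out, eq. (32)
        s, S = [s0], [S0]
        for t in range(1, T + 1):
            P = np.linalg.inv(np.linalg.inv(S[t-1]) + R[t-1])
            m = P @ (np.linalg.solve(S[t-1], s[t-1]) + r[t-1])
            s.append(a + A @ m)
            S.append(Qeff + A @ P @ A.T)
        return s, S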

Proof. Since all factors are pairwise, we can use the expression (??) for the messages. We have

    $\mu_{x_{t-1} \to x_t}(x_t) = \int_{x_{t-1}} dx_{t-1}\; P(x_t \mid x_{t-1})\; \mu_{x_{t-2} \to x_{t-1}}(x_{t-1})\; \mu_{z_{t-1} \to x_{t-1}}(x_{t-1})$
    $= \int_{x_{t-1}} dx_{t-1}\; \mathcal N(x_t \mid A_{t-1} x_{t-1} + a_{t-1},\, \tilde Q)\; \mathcal N(x_{t-1} \mid s_{t-1}, S_{t-1})\; \mathcal N[x_{t-1} \mid r_{t-1}, R_{t-1}]$

where we abbreviate $\tilde Q = Q_{t-1} + B_{t-1} H_{t-1}^{-1} B_{t-1}^\top$. Using the product rule (4) on the last two terms gives a Gaussian $\mathcal N(s_{t-1} \mid R_{t-1}^{-1} r_{t-1},\, S_{t-1} + R_{t-1}^{-1})$ independent of $x_t$, which we can subsume in the normalization. What remains is

    $\mu_{x_{t-1} \to x_t}(x_t) \propto \int_{x_{t-1}} dx_{t-1}\; \mathcal N(x_t \mid A_{t-1} x_{t-1} + a_{t-1},\, \tilde Q)\; \mathcal N[x_{t-1} \mid S_{t-1}^{-1} s_{t-1} + r_{t-1},\, S_{t-1}^{-1} + R_{t-1}]$
    $= \int_{x_{t-1}} dx_{t-1}\; \mathcal N(x_t \mid A_{t-1} x_{t-1} + a_{t-1},\, \tilde Q)\; \mathcal N\big(x_{t-1} \mid (S_{t-1}^{-1} + R_{t-1})^{-1} (S_{t-1}^{-1} s_{t-1} + r_{t-1}),\, (S_{t-1}^{-1} + R_{t-1})^{-1}\big)$
    $= \mathcal N\big(x_t \mid A_{t-1} (S_{t-1}^{-1} + R_{t-1})^{-1} (S_{t-1}^{-1} s_{t-1} + r_{t-1}) + a_{t-1},\; \tilde Q + A_{t-1} (S_{t-1}^{-1} + R_{t-1})^{-1} A_{t-1}^\top\big)$,

which gives the messages as in (33). For comparison we also give the canonical representation. Let $\bar S_{t-1} = S_{t-1}^{-1} + R_{t-1}$ and $\bar s_{t-1} = S_{t-1}^{-1} s_{t-1} + r_{t-1}$; then, using (38),

    $S_t = \tilde Q + A_{t-1} \bar S_{t-1}^{-1} A_{t-1}^\top = A_{t-1} \big\{ A_{t-1}^{-1} \tilde Q A_{t-1}^{-\top} + \bar S_{t-1}^{-1} \big\} A_{t-1}^\top$
    $S_t^{-1} = A_{t-1}^{-\top} \big\{ \bar S_{t-1} - \bar S_{t-1} \big( \bar S_{t-1} + A_{t-1}^\top \tilde Q^{-1} A_{t-1} \big)^{-1} \bar S_{t-1} \big\} A_{t-1}^{-1}$
    $S_t^{-1} s_t = A_{t-1}^{-\top} \big[ I - \bar S_{t-1} \big( \bar S_{t-1} + A_{t-1}^\top \tilde Q^{-1} A_{t-1} \big)^{-1} \big] \big( \bar s_{t-1} + \bar S_{t-1} A_{t-1}^{-1} a_{t-1} \big)$.

We repeat the derivation for $\mu_{x_{t+1} \to x_t}(x_t)$:

    $\mu_{x_{t+1} \to x_t}(x_t) = \int_{x_{t+1}} dx_{t+1}\; P(x_{t+1} \mid x_t)\; \mu_{x_{t+2} \to x_{t+1}}(x_{t+1})\; \mu_{z_{t+1} \to x_{t+1}}(x_{t+1})$
    $= \int_{x_{t+1}} dx_{t+1}\; \mathcal N(x_{t+1} \mid A_t x_t + a_t,\, Q_t + B_t H_t^{-1} B_t^\top)\; \mathcal N(x_{t+1} \mid v_{t+1}, V_{t+1})\; \mathcal N[x_{t+1} \mid r_{t+1}, R_{t+1}]$
    $\propto \int_{x_{t+1}} dx_{t+1}\; \mathcal N(x_{t+1} \mid A_t x_t + a_t,\, Q_t + B_t H_t^{-1} B_t^\top)\; \mathcal N[x_{t+1} \mid V_{t+1}^{-1} v_{t+1} + r_{t+1},\, V_{t+1}^{-1} + R_{t+1}]$
    $= \mathcal N\big(A_t x_t + a_t \mid (V_{t+1}^{-1} + R_{t+1})^{-1} (V_{t+1}^{-1} v_{t+1} + r_{t+1}),\; Q_t + B_t H_t^{-1} B_t^\top + (V_{t+1}^{-1} + R_{t+1})^{-1}\big)$
    $= \mathcal N\big(x_t \mid -A_t^{-1} a_t + A_t^{-1} (V_{t+1}^{-1} + R_{t+1})^{-1} (V_{t+1}^{-1} v_{t+1} + r_{t+1}),\; A_t^{-1} \big[ Q_t + B_t H_t^{-1} B_t^\top + (V_{t+1}^{-1} + R_{t+1})^{-1} \big] A_t^{-\top}\big)$.

For this backward message it is instructive to derive the canonical representation. With $\bar V_{t+1} = V_{t+1}^{-1} + R_{t+1}$ and $\bar v_{t+1} = V_{t+1}^{-1} v_{t+1} + r_{t+1}$ as in (36), (37), and again using (38),

    $V_t^{-1} = A^\top \big[ \bar V_{t+1}^{-1} + Q + B H^{-1} B^\top \big]^{-1} A = A^\top \big\{ \bar V_{t+1} - \bar V_{t+1} \big( \bar V_{t+1} + (Q + B H^{-1} B^\top)^{-1} \big)^{-1} \bar V_{t+1} \big\} A$
    $V_t^{-1} v_t = A^\top (\bar v_{t+1} - \bar V_{t+1} a_t) - A^\top \bar V_{t+1} \big( \bar V_{t+1} + (Q + B H^{-1} B^\top)^{-1} \big)^{-1} (\bar v_{t+1} - \bar V_{t+1} a_t)$,

which proves (39) and (40).

2.3 Special case: kinematic control

We assume a kinematic control problem (we write $q$ instead of $x$). The process is simply

    $q_{t+1} = q_t + u_t + \xi$.    (45)

This means we have

    $A_t = B_t = I, \qquad a_t = 0$.    (46)

2.4 Special case: multiple kinematic task variables

Let $\phi_i : q \mapsto y_i$ be a kinematic mapping to a task variable $y_i$ and $J_i(q)$ its Jacobian. We assume we are given targets $y^*_{i,0:T}$ in the task space and (time-dependent) precisions $\varrho_{i,t} = C_{i,t}^{-1}$ by which we want to follow the task targets (below we drop the time index and write $C_i^{-1}$). We have

    $c_t(q_t, u_t) = u_t^\top H u_t + \sum_{i=1}^m \| y^*_{i,t} - \phi_i(q_t) \|^2_{C_i^{-1}}$
    $\approx u_t^\top H u_t + \sum_{i=1}^m \| y^*_{i,t} - \phi_i(\hat q_t) + J_i \hat q_t - J_i q_t \|^2_{C_i^{-1}}, \qquad J_i = J_i(\hat q_t)$
    $= u_t^\top H u_t + \sum_{i=1}^m \big[ q_t^\top J_i^\top C_i^{-1} J_i q_t - 2 \big( y^*_{i,t} - \phi_i(\hat q_t) + J_i \hat q_t \big)^\top C_i^{-1} J_i q_t \big] + \text{const}$,    (47)

where $\hat q_t$ is the point of linearization. Comparing with the cost (18), we can read off

    $R_t = \sum_{i=1}^m J_i^\top C_i^{-1} J_i$    (48)
    $r_t = \sum_{i=1}^m J_i^\top C_i^{-1} \big( y^*_{i,t} - \phi_i(\hat q_t) + J_i \hat q_t \big)$.    (49)

Note: the product of a forward message with a task message corresponds to the classical optimal control for multiple regularized task variables (92),

    $\mathcal N(q_t \mid s_t, S_t)\; \mathcal N[q_t \mid r_t, R_t] \propto \mathcal N(q_t \mid b, B)$    (50)
    $b = (R_t + S_t^{-1})^{-1} (r_t + S_t^{-1} s_t)$.    (51)
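Assembling (48), (49) in code is straightforward (a sketch; `phis` and `jacs` are callables for $\phi_i$ and $J_i$, `C_invs` the task precisions; all names are mine):

    import numpy as np

    def task_cost_terms(q_hat, phis, jacs, y_targets, C_invs):
        # R_t and r_t of eqs. (48), (49), linearized at q_hat
        n = len(q_hat)
        R = np.zeros((n, n)); r = np.zeros(n)
        for phi, jac, y_star, Ci in zip(phis, jacs, y_targets, C_invs):
            J = jac(q_hat)
            R += J.T @ Ci @ J
            r += J.T @ Ci @ (y_star - phi(q_hat) + J @ q_hat)
        return R, r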

2.5 Special case: pseudo-dynamic process

We replace $x$ by $(q_t, \dot q_t)$ and assume $u_t$ corresponds directly to accelerations:

    $P(q_{t+1} \mid q_t, \dot q_{t+1}) = \mathcal N(q_{t+1} \mid q_t + \tau \dot q_{t+1},\, W^{-1}), \qquad P(\dot q_{t+1} \mid \dot q_t, u_t) = \mathcal N(\dot q_{t+1} \mid \dot q_t + \tau u_t,\, Q)$    (52)

    $\begin{pmatrix} q_{t+1} \\ \dot q_{t+1} \end{pmatrix} = \begin{pmatrix} I & \tau I \\ 0 & I \end{pmatrix} \begin{pmatrix} q_t \\ \dot q_t \end{pmatrix} + \begin{pmatrix} \tau^2 I \\ \tau I \end{pmatrix} u_t + \xi, \qquad \langle \xi \xi^\top \rangle = \begin{pmatrix} W^{-1} & 0 \\ 0 & Q \end{pmatrix}$    (53)

i.e., in the notation of (17),

    $A_t = \begin{pmatrix} I & \tau I \\ 0 & I \end{pmatrix}, \qquad B_t = \begin{pmatrix} \tau^2 I \\ \tau I \end{pmatrix}, \qquad a_t = 0$.    (54)

The general recursion can be evaluated block-wise, which might be computationally more efficient than the general Riccati recursion, but I'm not sure. Writing

    $V_{t+1} = \begin{pmatrix} V^1 & V^2 \\ V^3 & V^4 \end{pmatrix}, \qquad v_{t+1} = \begin{pmatrix} v^1 \\ v^2 \end{pmatrix}$,    (55)

the blocks entering (24), (26), (27) are

    $H + B^\top V_{t+1} B = H + \tau^4 V^1 + \tau^3 (V^2 + V^3) + \tau^2 V^4, \quad B^\top V_{t+1} A = \big( \tau^2 V^1 + \tau V^3,\;\; \tau^3 V^1 + \tau^2 (V^2 + V^3) + \tau V^4 \big), \quad B^\top v_{t+1} = \tau^2 v^1 + \tau v^2$,    (56)

so that for $x = (q, \dot q)$ the optimal control (24) becomes

    $u^*_t(x) = -\big( H + \tau^4 V^1 + \tau^3 (V^2 + V^3) + \tau^2 V^4 \big)^{-1} \big[ (\tau^2 V^1 + \tau V^3)\, q + \big( \tau^3 V^1 + \tau^2 (V^2 + V^3) + \tau V^4 \big)\, \dot q - \tau^2 v^1 - \tau v^2 \big]$    (57)

and the Riccati updates (26), (27) specialize by inserting these blocks together with

    $A^\top V_{t+1} A = \begin{pmatrix} V^1 & \tau V^1 + V^2 \\ \tau V^1 + V^3 & \tau^2 V^1 + \tau (V^2 + V^3) + V^4 \end{pmatrix}$.    (58)

2.6 Special case: dynamic process

We replace $x$ by $(q_t, \dot q_t)$, now with dynamics $M \ddot q = u + F$:

    $P(q_{t+1} \mid q_t, \dot q_{t+1}) = \mathcal N(q_{t+1} \mid q_t + \tau \dot q_{t+1},\, W^{-1})$    (59)
    $P(\dot q_{t+1} \mid \dot q_t, u_t) = \mathcal N(\dot q_{t+1} \mid \dot q_t + \tau M^{-1} (u_t + F),\, Q)$    (60)

    $\begin{pmatrix} q_{t+1} \\ \dot q_{t+1} \end{pmatrix} = \begin{pmatrix} I & \tau I \\ 0 & I \end{pmatrix} \begin{pmatrix} q_t \\ \dot q_t \end{pmatrix} + \begin{pmatrix} \tau^2 M^{-1} \\ \tau M^{-1} \end{pmatrix} (u_t + F) + \xi, \qquad \langle \xi \xi^\top \rangle = \begin{pmatrix} W^{-1} & 0 \\ 0 & Q \end{pmatrix}$    (61)

    $A_t = \begin{pmatrix} I & \tau I \\ 0 & I \end{pmatrix}, \qquad B_t = \begin{pmatrix} \tau^2 M^{-1} \\ \tau M^{-1} \end{pmatrix}, \qquad a_t = \begin{pmatrix} \tau^2 M^{-1} F \\ \tau M^{-1} F \end{pmatrix}$.    (62)

Again the recursion can be evaluated block-wise, which might be computationally more efficient than the general Riccati recursion, but I'm not sure. The blocks are as in (56), with each $\tau^k$-term sandwiched by $M^{-\top} \cdot M^{-1}$; e.g.,

    $H + B^\top V_{t+1} B = H + M^{-\top} \big( \tau^4 V^1 + \tau^3 (V^2 + V^3) + \tau^2 V^4 \big) M^{-1}$,    (63)

and with $a_t \ne 0$ the optimal control (24) reads

    $u^*_t(x) = -\big( H + B^\top V_{t+1} B \big)^{-1} B^\top \big( V_{t+1} (A x + a) - v_{t+1} \big)$.    (64, 65)
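Building the stacked quantities (61), (62) in code is a one-liner per matrix (a sketch; `Minv`, `F`, `W_inv`, `Q` are the model matrices at the current linearization point, names mine):

    import numpy as np

    def dynamic_process_matrices(tau, Minv, F, W_inv, Q):
        # A, B, a and the stacked noise covariance of eqs. (61), (62)
        n = Minv.shape[0]
        I, Z = np.eye(n), np.zeros((n, n))
        A = np.block([[I, tau * I], [Z, I]])
        B = np.vstack([tau**2 * Minv, tau * Minv])
        a = B @ F                                    # a = (tau^2 M^-1 F, tau M^-1 F)
        Qfull = np.block([[W_inv, Z], [Z, Q]])
        return A, B, a, Qfull

Together with the task terms (67), (68) below, these are exactly the quantities consumed by the Riccati sketch in section 2.1.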

2.7 Special case: multiple dynamic/kinematic task variables

As before, we have access to kinematic functions $\phi_i(q)$ and Jacobians $J_i(q)$. We are given task targets $x^*_{i,0:T}$ and $\dot x^*_{i,0:T}$ and want to follow them with (time-dependent) precisions $\varrho_{i,t}$ and $\nu_{i,t}$. We have

    $c(q_t, \dot q_t, u_t) = \sum_{i=1}^m \varrho_{i,t} \| x^*_{i,t} - \phi_i(q_t) \|^2 + \sum_{i=1}^m \nu_{i,t} \| \dot x^*_{i,t} - J_i \dot q_t \|^2 + u_t^\top H_t u_t$    (66)
    $\approx \sum_{i=1}^m \varrho_{i,t} \big[ q_t^\top J_i^\top J_i q_t - 2 \big( x^*_{i,t} - \phi_i(\hat q_t) + J_i \hat q_t \big)^\top J_i q_t \big] + \sum_{i=1}^m \nu_{i,t} \big[ \dot q_t^\top J_i^\top J_i \dot q_t - 2 (\dot x^*_{i,t})^\top J_i \dot q_t \big] + u_t^\top H_t u_t + \text{const}$,

so that, comparing with (18) for the stacked state $(q_t, \dot q_t)$,

    $R_t = \begin{pmatrix} \sum_i \varrho_{i,t} J_i^\top J_i & 0 \\ 0 & \sum_i \nu_{i,t} J_i^\top J_i \end{pmatrix}$    (67)
    $r_t = \begin{pmatrix} \sum_i \varrho_{i,t} J_i^\top \big( x^*_{i,t} - \phi_i(\hat q_t) + J_i \hat q_t \big) \\ \sum_i \nu_{i,t} J_i^\top \dot x^*_{i,t} \end{pmatrix}$.    (68)

3 The optimization view on classical control laws

In this section we first review classical control laws as minimization of a basic loss function. Since this loss function has a Bayesian interpretation, it will be straightforward to develop also a Bayesian view on these control laws. The Bayesian inference approach can then be generalized to what we actually aim for: motion planning in temporal probabilistic models.

3.1 General quadratic loss function and constraints

Let $y \in \mathbb{R}^d$ and $q \in \mathbb{R}^n$. Given $y$, consider the problem of finding $q$ that minimizes

    $L = \| y - Jq \|^2_{C^{-1}} + \| q \|^2_W + 2 h^\top W q$,    (69)

where $\| q \|^2_W = q^\top W q$ denotes a norm, and $C$ and $W$ are symmetric positive definite matrices. This loss function can be interpreted as follows: the first term measures how well a constraint $y = Jq$ is fulfilled relative to a covariance matrix $C$, the second term measures the distance of $q$ from $0$ with metric $W$, the third term measures the scalar product between $q$ and $h$ w.r.t. $W$.

The solution can easily be found by taking the derivative

    $\frac{\partial L}{\partial q} = -2 (y - Jq)^\top C^{-1} J + 2 q^\top W + 2 h^\top W$

and setting it to zero:

    $q = (J^\top C^{-1} J + W)^{-1} (J^\top C^{-1} y - W h)$.    (70)

Using the Woodbury and related identities and definitions given in section 1, we can rewrite the solution in several forms:

    $q = (J^\top C^{-1} J + W)^{-1} (J^\top C^{-1} y - W h)$    (70)
    $= (J^\top C^{-1} J + W)^{-1} J^\top C^{-1} y - \big[ I_n - (J^\top C^{-1} J + W)^{-1} J^\top C^{-1} J \big] h$    (71)
    $= (J^\top C^{-1} J + W)^{-1} J^\top C^{-1} (y + Jh) - h$    (72)
    $= W^{-1} J^\top (J W^{-1} J^\top + C)^{-1} y - \big[ I_n - W^{-1} J^\top (J W^{-1} J^\top + C)^{-1} J \big] h$    (73)
    $= W^{-1} J^\top (J W^{-1} J^\top + C)^{-1} (y + Jh) - h$.    (74)

This also allows us to properly derive the following limits:

    $C \to 0$:  $q = J^\sharp_W y - (I_n - J^\sharp_W J)\, h = J^\sharp_W (y + Jh) - h$    (75)
    $W \to 0$:  $q = J^\sharp y - (I_n - J^\sharp J)\, h = J^\sharp (y + Jh) - h$, with $J^\sharp = (J^\top C^{-1} J)^{-1} J^\top C^{-1}$
    $W = \lambda I_n$:  $q = J^\top (J J^\top + \lambda C)^{-1} y - \big[ I_n - J^\top (J J^\top + \lambda C)^{-1} J \big] h$    (76)
    $C = \sigma I_d$:  $q = (J^\top J + \sigma W)^{-1} J^\top y - \big[ I_n - (J^\top J + \sigma W)^{-1} J^\top J \big] h$

These limits can be interpreted as follows. $C \to 0$: we need to fulfill the constraint $y = Jq$ exactly. $C = \sigma I_d$: we use a standard squared error measure for $y - Jq$. $W \to 0$: we do not care about the norm $\| q \|_W$ (i.e., no regularization); but interestingly, the cost term $2 h^\top W q$ has a nullspace effect also in this limit. $W = \lambda I_n$: we use a standard ridge as regularizer.

The first of these limits is perhaps the most important. It corresponds to a hard constraint; that is, (75) is the solution to

    $\mathrm{argmin}_q\; \| q \|^2_W + 2 h^\top W q \quad \text{such that} \quad y = Jq$.    (77)

The loss function (69) has many applications, as we discuss in the following.
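The equivalence of the solution forms (70)-(74), which rests on the Woodbury identity (10), is easy to confirm numerically (a throwaway check of mine, names hypothetical):

    import numpy as np
    rng = np.random.default_rng(1)
    n, d = 7, 3
    J = rng.normal(size=(d, n)); y = rng.normal(size=d); h = rng.normal(size=n)
    W = np.eye(n); C = 0.01 * np.eye(d)
    Ci = np.linalg.inv(C); Wi = np.linalg.inv(W)
    q70 = np.linalg.solve(J.T @ Ci @ J + W, J.T @ Ci @ y - W @ h)         # eq. (70)
    q74 = Wi @ J.T @ np.linalg.solve(J @ Wi @ J.T + C, y + J @ h) - h     # eq. (74)
    print(np.allclose(q70, q74))   # True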

3.2 Ridge regression

Let us first give an off-topic example from machine learning: In ridge regression, when we have $d$ samples of $n$-dimensional inputs and 1D outputs, we have the minimization problem

    $L = \| y - X\beta \|^2 + \lambda \| \beta \|^2$

with an input data matrix $X \in \mathbb{R}^{d \times n}$, an output data vector $y \in \mathbb{R}^d$, and a regressor $\beta \in \mathbb{R}^n$. The first term measures the standard squared error (with uniform output covariance $C = I_d$), the second is a regularizer (or stabilizer) as introduced by Tikhonov. The special form $\lambda \| \beta \|^2$ of the regularizer is called ridge. The solution is given by equation (76) when replacing the notation according to $q \to \beta$, $y \to y$, $J \to X$, $C \to I_d$, $W \to \lambda I_n$, $h \to 0$:

    $\beta = X^\top (X X^\top + \lambda I_d)^{-1} y$.

In the Bayesian interpretation of ridge regression, the ridge $\lambda \| \beta \|^2$ defines a prior $\propto \exp\{-\tfrac{\lambda}{2} \| \beta \|^2\}$ over the regressor $\beta$, and the above equation gives the MAP $\beta$. Since ridge regression has a Bayesian interpretation, standard motion rate control, as discussed shortly, will also have a Bayesian interpretation.

3.3 Motion rate control (pseudo-inverse kinematics)

Consider a robot with $n$ DoFs and a $d$-dimensional task space with $d < n$ (e.g., an endeffector state). The current joint state is $q \in \mathbb{R}^n$. In a given state we can compute the end-effector Jacobian $J$, and we are given a joint space potential $H(q)$. We would like to compute joint velocities $\dot q$ which fulfill the task constraint $\dot y = J \dot q$ while minimizing the absolute joint velocity $\| \dot q \|_W$ and, in the nullspace, following the negative gradient of the potential. With $h = W^{-1} \partial_q H(q)^\top$ in (69), the problem and its solution are

    (problem)  $\dot q = \mathrm{argmin}_{\dot q}\; \| \dot q \|^2_W + 2\, \partial_q H(q)\, \dot q \quad$ such that $\quad J \dot q = \dot y$    (78)
    (solution)  $\dot q = J^\sharp_W \dot y - (I_n - J^\sharp_W J)\, W^{-1} \partial_q H(q)^\top$.    (79)

The solution was taken from (75) by replacing the notation according to $q \to \dot q$, $y \to \dot y$. Note that we have derived pseudo-inverse kinematics from a basic constrained quadratic optimization problem.

Let us repeat this briefly for the case when time is discretized. We can formulate the problem as

    $q_t = \mathrm{argmin}_{q_t}\; \| q_t - q_{t-1} \|^2_W + 2 h^\top W q_t \quad$ such that $\quad \phi(q_t) = y^*_t$.    (80)

Generally, the constraint $\phi(q_t) = y^*_t$ is non-linear. We linearize it at $q_{t-1}$ and get the simpler problem and its solution

    (problem)  $q_t = \mathrm{argmin}_{q_t}\; \| q_t - q_{t-1} \|^2_W + 2 h^\top W q_t \quad$ such that $\quad J (q_t - q_{t-1}) = y^*_t - \phi(q_{t-1})$    (81)
    (solution)  $q_t = q_{t-1} + J^\sharp_W \big[ y^*_t - \phi(q_{t-1}) \big] - (I_n - J^\sharp_W J)\, h$.    (82)

The solution was taken from (75) by replacing the notation according to $q \to (q_t - q_{t-1})$, $y \to (y^*_t - \phi(q_{t-1}))$.

3.4 Regularized inverse kinematics (singularity robust motion rate control)

Under some conditions motion rate control is infeasible, for instance when the arm cannot be stretched further to reach a desired endeffector position. In this case the computation of the pseudo-inverse $J^\sharp_W$ becomes singular. Classical control developed the singularity-robust pseudo-inverse (Nakamura & Hanafusa, 1986), which can be interpreted as regularizing the computation of the pseudo-inverse, or as relaxing the hard task constraint. In our framework this corresponds to not taking the limit $C \to 0$. This regularized inverse kinematics is given as

    (problem)  $\dot q = \mathrm{argmin}_{\dot q}\; \| \dot q \|^2_W + \| \dot y - J \dot q \|^2_{C^{-1}} + 2 h^\top W \dot q$    (83)
    (solution)  $\dot q = \bar J_W \dot y - (I_n - \bar J_W J)\, h$,    (84)
    $\bar J_W := (J^\top C^{-1} J + W)^{-1} J^\top C^{-1} = W^{-1} J^\top (J W^{-1} J^\top + C)^{-1}$.    (85)

The solution was taken from (71) by replacing the notation according to $q \to \dot q$, $y \to \dot y$. Note that $\bar J_W$ is a regularization of $J^\sharp_W$ (defined in (12)). Equations (71-74) give many interesting alternatives to write this control law. The linearized ($\hat\phi$ is linearized at $q_{t-1}$ as above) time-discretized version is:

    (problem)  $q_t = \mathrm{argmin}_{q_t}\; \| q_t - q_{t-1} \|^2_W + \| y^*_t - \hat\phi(q_t) \|^2_{C^{-1}} + 2 h^\top W q_t$    (86)
    (solution)  $q_t = q_{t-1} + \bar J_W \big[ y^*_t - \phi(q_{t-1}) \big] - (I_n - \bar J_W J)\, h$.    (87)
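A single regularized IK step (87) in code (a minimal sketch; `phi` and `jac` are callables for the task map and its Jacobian, names mine):

    import numpy as np

    def regularized_ik_step(q, phi, jac, y_star, W, C, h=None):
        # q_t from q_{t-1} via eqs. (85), (87)
        n = len(q)
        h = np.zeros(n) if h is None else h
        J = jac(q)
        Wi = np.linalg.inv(W)
        J_bar = Wi @ J.T @ np.linalg.inv(J @ Wi @ J.T + C)                  # eq. (85)
        return q + J_bar @ (y_star - phi(q)) - (np.eye(n) - J_bar @ J) @ h  # eq. (87)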

3.5 Multiple regularized task variables

Assume we have $m$ task variables $y_1, \ldots, y_m$, where the $i$-th variable $y_i \in \mathbb{R}^{d_i}$ is $d_i$-dimensional. Also assume that we regularize w.r.t. each task variable, that is, we have different error metrics $C_i^{-1}$ in each task space. We want to follow all of the tasks and express this as the optimization problem and its solution

    (problem)  $\dot q = \mathrm{argmin}_{\dot q}\; \| \dot q \|^2_W + \sum_{i=1}^m \| \dot y_i - J_i \dot q \|^2_{C_i^{-1}} + 2 h^\top W \dot q$    (88)
    (solution)  $\dot q = \Big[ \sum_{i=1}^m J_i^\top C_i^{-1} J_i + W \Big]^{-1} \Big[ \sum_{i=1}^m J_i^\top C_i^{-1} \dot y_i - W h \Big]$.    (89)

The solution was taken from (70) in the following way: we can collect all task variables into one big task vector,

    $y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}, \qquad J = \begin{pmatrix} J_1 \\ \vdots \\ J_m \end{pmatrix}, \qquad C = \mathrm{diag}(C_1, \ldots, C_m)$    (90)
    $J^\top C^{-1} J = \sum_{i=1}^m J_i^\top C_i^{-1} J_i, \qquad J^\top C^{-1} y = \sum_{i=1}^m J_i^\top C_i^{-1} \dot y_i$.

And the linearized time-discretized version:

    (problem)  $q_t = \mathrm{argmin}_{q_t}\; \| q_t - q_{t-1} \|^2_W + \sum_{i=1}^m \| y^*_{i,t} - \hat\phi_i(q_t) \|^2_{C_i^{-1}} + 2 h^\top W q_t$    (91)
    (solution)  $q_t = q_{t-1} + \Big[ \sum_{i=1}^m J_i^\top C_i^{-1} J_i + W \Big]^{-1} \Big[ \sum_{i=1}^m J_i^\top C_i^{-1} \big( y^*_{i,t} - \phi_i(q_{t-1}) \big) - W h \Big]$.    (92)

3.6 Multiple prioritized task variables (prioritized inverse kinematics)

The case of multiple task variables is classically not addressed by regularizing all of them, but by imposing a hierarchy on them (Nakamura et al., 1987; Baerlocher & Boulic, 2004). Let us first explain the classical prioritized inverse kinematics: the control law is based on standard motion rate control but iteratively projects each desired task rate $\dot y_i$ into the remaining nullspace of all higher-level control signals. Initializing the nullspace projection with $N_0 = I$ and $\dot q_0 = 0$, the control law is defined by iterating, for $i = 1, \ldots, m$,

    $\hat J_i = J_i N_{i-1}, \qquad \dot q_i = \dot q_{i-1} + \hat J_i^\sharp (\dot y_i - J_i \dot q_{i-1}), \qquad N_i = N_{i-1} - \hat J_i^\sharp \hat J_i$.    (93)

We call $\hat J_i$ a nullspace Jacobian; it has the property that $\hat J_i^\sharp$ projects to changes in $q$ that do not change the task variables $y_{1:i-1}$ with higher priority. An additional nullspace movement $h$ in the remaining nullspace of all control signals can also be included when defining the final control law as

    $\dot q = \dot q_m + N_m h$.    (94)

In effect, the first task rate $\dot y_1$ is guaranteed to be fulfilled exactly. The second, $\dot y_2$, is guaranteed to be fulfilled as well as possible given that $\dot y_1$ must be fulfilled, et cetera.

This hierarchical projection of tasks can also be derived by starting with the regularized task variables as in problem (88) and then iteratively taking the limit $C_i \to 0$, starting with $i = 1$ up to $i = m$. More formally, the iterative limit corresponds to $C_i = \epsilon^{m-i+1} I_{d_i}$ and $\epsilon \to 0$. For $m = 2$ task variables one can prove the equivalence between prioritized inverse kinematics and this hierarchical limit of the MAP motion exactly (by directly applying the Woodbury identity). For $m > 2$ we could not find an elegant proof, but we numerically confirmed this limit for up to $m = 4$.

Non-zero task variances $C_i$ can again be interpreted as regularizers. Note that without regularizers the standard prioritized inverse kinematics is numerically brittle: handling many control signals (e.g., the over-determined case $\sum_i d_i > n$) is problematic, since the nullspace-projected Jacobians $\hat J_i$ become singular (with rank $< d_i$). For non-zero regularizations $C_i$, the computations in equation (92) are numerically robust.
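The iteration (93), (94) in code (a sketch; I use `np.linalg.pinv` inside the $W$-pseudo-inverse since, as just noted, the projected Jacobians can lose rank):

    import numpy as np

    def prioritized_ik(J_list, ydot_list, W, h):
        # classical prioritized inverse kinematics, eqs. (93), (94)
        n = W.shape[0]
        Wi = np.linalg.inv(W)
        N = np.eye(n); qdot = np.zeros(n)
        for J, ydot in zip(J_list, ydot_list):
            J_hat = J @ N
            J_sharp = Wi @ J_hat.T @ np.linalg.pinv(J_hat @ Wi @ J_hat.T)   # eq. (12)
            qdot = qdot + J_sharp @ (ydot - J @ qdot)
            N = N - J_sharp @ J_hat
        return qdot + N @ h                          # eq. (94)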

3.7 Optimal dynamic control (incl. operational space control)

Consider a robot with dynamics $\ddot q = M^{-1} (u + F)$, where $M$ is some generalized mass matrix, $F$ subsumes external (also Coriolis and gravitational) forces, and $u$ is the $n$-dimensional torque control signal. We want to compute a control signal $u$ which generates an acceleration $\ddot q$ such that a general task constraint $\ddot y = J \ddot q + \dot J \dot q$ remains fulfilled, while also minimizing the absolute norm $\| u \|_H$ of the control. The problem and its solution can be written as

    (problem)  $u = \mathrm{argmin}_u\; \| u \|^2_H + 2 h^\top H u \quad$ such that $\quad \ddot y^* - \dot J \dot q - J M^{-1} (u + F) = 0$    (95)
    (solution)  $u = T^\sharp_H \big( \ddot y^* - \dot J \dot q - T F \big) - (I_n - T^\sharp_H T)\, h, \quad$ with $T = J M^{-1}$.    (96)

The solution was taken from (75) by replacing the notation according to $q \to u$, $y \to (\ddot y^* - \dot J \dot q - J M^{-1} F)$, $J \to J M^{-1}$. For $h = 0$ this solution is identical to Theorem 1 in (Peters et al., 2005). Peters et al. discuss in detail stability issues and important special cases of this control scheme. A common special case is $H = M^{-1}$, which is called operational space control.

References

Baerlocher, P., & Boulic, R. (2004). An inverse kinematic architecture enforcing an arbitrary number of strict priority levels. The Visual Computer.

Nakamura, Y., & Hanafusa, H. (1986). Inverse kinematic solutions with singularity robustness for robot manipulator control. Journal of Dynamic Systems, Measurement and Control, 108.

Nakamura, Y., Hanafusa, H., & Yoshikawa, T. (1987). Task-priority based redundancy control of robot manipulators. Int. Journal of Robotics Research, 6.

Peters, J., Mistry, M., Udwadia, F. E., Cory, R., Nakanishi, J., & Schaal, S. (2005). A unifying framework for the control of robotics systems. IEEE Int. Conf. on Intelligent Robots and Systems (IROS 2005) (pp. 1824-1831).