Real-Time Stochastic Control and Decision Making: From Theory to Algorithms and Applications
Evangelos A. Theodorou
Autonomous Control and Decision Systems Lab
Challenges in Control
- Uncertainty: stochastic uncertainty; parametric uncertainty; stochastic and parametric uncertainty; total uncertainty
- Scalability: state space; control parameterization
- Computational efficiency: real time; few data/interactions
- Optimality: global optimality
First Principles
- Stochastic Control Theory: Richard Bellman, Lev Pontryagin
- Statistical Physics: Richard Feynman, Mark Kac
- Machine Learning
- Parallel Computation
[Figure: vehicle velocity (m/s) and control norm ||(u_x, u_y)|| over time for the 6 m/s and 8 m/s target runs.]
[Diagram: research areas and their connections.]
- Theoretical areas: Control Theory & Dynamical Systems; Machine Learning; Physics & Mechanics; Neuroscience & Bio-inspired Robotics
- Methods: Linearization & Uncertainty Representation; Stochastic Control (SC): Sampling, Feynman-Kac & Information Theoretic
- Applications: Autonomy; Aerospace Engineering; Robotics; Application to Neuroscience and Bio-control
Dynamic Programming
Problem formulation, cost under minimization:
$$V(\mathbf{x},t_i) = \min_{\mathbf{u}} \mathbb{E}_Q\!\left[\phi(\mathbf{x},t_N) + \int_{t_i}^{t_N} \mathcal{L}(\mathbf{x},\mathbf{u},t)\,dt\right]$$
Running cost:
$$\mathcal{L}(\mathbf{x},\mathbf{u},t) = q_0(\mathbf{x},t) + q_1(\mathbf{x},t)^\top \mathbf{u} + \tfrac{1}{2}\mathbf{u}^\top R\,\mathbf{u}$$
Controlled diffusion dynamics (drift = passive dynamics + control; $B(\mathbf{x})\,d\omega$ is the noise):
$$d\mathbf{x} = \mathbf{F}(\mathbf{x},\mathbf{u})\,dt + B(\mathbf{x})\,d\omega, \qquad \mathbf{F}(\mathbf{x},\mathbf{u}) = \mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x})\,\mathbf{u}$$
$\mathbb{E}_Q$: expectation w.r.t. the controlled dynamics. $\mathbb{E}_P$: expectation w.r.t. the uncontrolled dynamics.
Bellman principle (discrete): Cost-to-go(current state) = min[ Cost-to-reach(next state) + Cost-to-go(next state) ].
[Figure: four-stage graph from Start to Goal with node values V(x_1), ..., V(x_8) and edge costs L_{i,j}, illustrating the backward recursion.]
Two takeaways: 1) dynamic programming is a backward process; 2) in continuous time and state it becomes a partial differential equation that is hard to solve.
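To make the discrete Bellman principle concrete, here is a minimal backward-recursion sketch in Python on a small stage graph; the node names and edge costs are hypothetical, not the ones from the figure.

```python
# Minimal backward dynamic programming on a stage graph (hypothetical costs).
# cost_to_go[n] = min over successors m of (edge_cost[n][m] + cost_to_go[m])

edges = {  # node -> {successor: cost to reach that successor}
    "start": {"a": 2.0, "b": 5.0},
    "a": {"goal": 4.0},
    "b": {"goal": 2.0},
    "goal": {},
}

cost_to_go = {"goal": 0.0}
# Process nodes in reverse topological order (backward through the stages).
for node in ["a", "b", "start"]:
    cost_to_go[node] = min(c + cost_to_go[m] for m, c in edges[node].items())

print(cost_to_go["start"])  # 6.0: start -> a -> goal (2 + 4) beats start -> b -> goal (5 + 2)
```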
Making Control Linear
Hamilton-Jacobi-Bellman equation:
$$-\partial_t V = q_0 + (\nabla_x V)^\top \mathbf{f} - \tfrac{1}{2}(\nabla_x V)^\top G R^{-1} G^\top (\nabla_x V) + \tfrac{1}{2}\operatorname{tr}\!\big((\nabla_{xx} V)\, B B^\top\big)$$
Desirability:
$$\Psi(\mathbf{x},t) = \exp\!\left(-\tfrac{1}{\lambda} V(\mathbf{x},t)\right)$$
Noise regulates control authority (assumption):
$$\lambda\, G(\mathbf{x}) R^{-1} G(\mathbf{x})^\top = B(\mathbf{x}) B(\mathbf{x})^\top$$
Under this assumption the HJB equation becomes the linear backward Chapman-Kolmogorov equation:
$$-\partial_t \Psi = -\tfrac{1}{\lambda}\, q_0\, \Psi + \mathbf{f}^\top (\nabla_x \Psi) + \tfrac{1}{2}\operatorname{tr}\!\big((\nabla_{xx} \Psi)\, B B^\top\big)$$
Statistical physics, Feynman-Kac: sample from the uncontrolled dynamics
$$d\mathbf{x} = \mathbf{f}(\mathbf{x})\,dt + B(\mathbf{x})\,d\omega$$
and evaluate the expectation
$$\Psi(\mathbf{x},t_i) = \mathbb{E}_P\!\left[\exp\!\left(-\tfrac{1}{\lambda}\int_{t_i}^{t_N} q_0(\mathbf{x})\,dt\right)\Psi(\mathbf{x},t_N)\right], \qquad \Psi(\mathbf{x},t_N) = \exp\!\left(-\tfrac{1}{\lambda}\,\phi(\mathbf{x},t_N)\right)$$
This gives the value function as a lower bound on the cost of any controlled process:
$$V(\mathbf{x},t_i) = -\lambda \log \mathbb{E}_P\!\left[\exp\!\left(-\tfrac{1}{\lambda}\Big(\phi(\mathbf{x},t_N) + \int_{t_i}^{t_N} q_0(\mathbf{x},t)\,dt\Big)\right)\right] \le \mathbb{E}_Q\!\left[\phi(\mathbf{x},t_N) + \int_{t_i}^{t_N} \mathcal{L}(\mathbf{x},\mathbf{u},t)\,dt\right]$$
Find optimal controls:
$$\mathbf{u}_{PI}(\mathbf{x},t) = R^{-1}\!\left(-q_1(\mathbf{x},t) + \lambda\, G(\mathbf{x})^\top \frac{\nabla_x \Psi(\mathbf{x},t)}{\Psi(\mathbf{x},t)}\right)$$
The optimal controls push the dynamics to states with high desirability.
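A minimal Monte Carlo sketch of the Feynman-Kac evaluation above, in Python with NumPy: it rolls out the uncontrolled dynamics with Euler-Maruyama and averages the exponentiated path cost. The scalar dynamics f, noise scale B, and costs q and phi are hypothetical placeholders.

```python
import numpy as np

def desirability(x0, f, q, phi, B=1.0, lam=1.0, dt=0.01, T=1.0, n_samples=10_000, seed=0):
    """Monte Carlo estimate of Psi(x0, 0) = E_P[exp(-(int q dt + phi(x_T)) / lam)]."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = np.full(n_samples, x0)
    path_cost = np.zeros(n_samples)
    for _ in range(n_steps):
        path_cost += q(x) * dt
        # Euler-Maruyama step of the *uncontrolled* dynamics dx = f(x) dt + B dw
        x += f(x) * dt + B * np.sqrt(dt) * rng.standard_normal(n_samples)
    return np.mean(np.exp(-(path_cost + phi(x)) / lam))

# Hypothetical 1-D example: stable linear drift, quadratic state and terminal cost.
psi = desirability(x0=1.0, f=lambda x: -x, q=lambda x: 0.5 * x**2, phi=lambda x: x**2)
print(psi)  # desirability at x0; the value function is V(x0, 0) = -lam * log(psi)
```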
Generalized Path Integral Control
Path integral control, local controls:
$$\mathbf{u}_L(\mathbf{x}_{t_i},t_i) = R^{-1} G(\mathbf{x}_{t_i})^\top \big(G(\mathbf{x}_{t_i}) R^{-1} G(\mathbf{x}_{t_i})^\top\big)^{-1} G(\mathbf{x}_{t_i})\, d\mathbf{w}(t_i)$$
Trajectory probability:
$$\mathbb{P}(\vec{x}) = \frac{e^{-\frac{1}{\lambda} S(\vec{x})}}{\int e^{-\frac{1}{\lambda} S(\vec{x})}\, d\vec{x}}$$
Intuition: sample $M$ noise realizations $d\mathbf{w}^{(1)}_{t_0}, \dots, d\mathbf{w}^{(M)}_{t_0}$ from the uncontrolled dynamics, score the resulting trajectories from Start to Goal by their path costs $S_1(\vec{x}), \dots, S_M(\vec{x})$, and average the noise with exponentiated-cost weights:
$$\mathbf{u} = \sum_i P_i\, d\mathbf{w}^{(i)}_{t_0}, \qquad P_i = \frac{\exp\!\big(-\tfrac{1}{\lambda} S_i(\vec{x})\big)}{\sum_i \exp\!\big(-\tfrac{1}{\lambda} S_i(\vec{x})\big)}$$
E. Theodorou, J. Buchli, S. Schaal. AISTATS 2009; JMLR 2010.
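A minimal sketch of the weighted-noise average above (Python/NumPy): sample noise sequences, roll out the uncontrolled dynamics, and softmax-average the first-step noise by path cost. The scalar dynamics and costs are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n_steps, dt, lam = 512, 50, 0.02, 1.0

def path_cost(dw, x0=1.0):
    # Uncontrolled scalar rollout dx = -x dt + dw with hypothetical costs.
    x, S = x0, 0.0
    for k in range(n_steps):
        S += 0.5 * x**2 * dt       # running state cost q(x)
        x += -x * dt + dw[k]       # Euler-Maruyama step
    return S + x**2                # terminal cost phi(x)

dw = np.sqrt(dt) * rng.standard_normal((M, n_steps))   # M noise realizations
S = np.array([path_cost(dw[i]) for i in range(M)])
P = np.exp(-(S - S.min()) / lam)   # exponentiated costs (min subtracted for numerics)
P /= P.sum()
u0 = P @ dw[:, 0]                  # u = sum_i P_i dw_i at t_0, as on the slide
print(u0)
```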
Iterative Path Integral Control
Sampling with the uncontrolled dynamics:
$$\Psi(\mathbf{x},t_i) = \int \exp\!\left(-\tfrac{1}{\lambda}\int_{t_i}^{t_N} q(\mathbf{x}_t,t)\,dt\right) \Psi(\mathbf{x},t_N)\, dP$$
Sampling with the controlled dynamics:
$$\Psi(\mathbf{x},t_i) = \int \exp\!\left(-\tfrac{1}{\lambda}\int_{t_i}^{t_N} q(\mathbf{x}_t,t)\,dt\right) \Psi(\mathbf{x},t_N)\, \frac{dP}{dQ}\, dQ$$
where $dP/dQ$ is the Radon-Nikodym derivative between the uncontrolled and controlled path measures.
Find optimal controls:
$$\mathbf{u}_{PI}(\mathbf{x},t) = R^{-1}\!\left(-q_1(\mathbf{x},t) + \lambda\, G(\mathbf{x})^\top \frac{\nabla_x \Psi(\mathbf{x},t)}{\Psi(\mathbf{x},t)}\right)$$
Iterative path integral control:
$$\mathbf{u}_{new}(t_k) = \mathbf{u}_{old}(t_k) + \sum_i P_i\, d\mathbf{w}^{(i)}(t_k), \qquad P_i = \frac{\exp\!\big(-\tfrac{1}{\lambda}\big(S(\vec{x}_i) + \text{correction}\big)\big)}{\sum_i \exp\!\big(-\tfrac{1}{\lambda}\big(S(\vec{x}_i) + \text{correction}\big)\big)}$$
where the correction term accounts for sampling around the current control instead of the uncontrolled dynamics.
E. Theodorou, J. Buchli, S. Schaal. JMLR 2010. E. Theodorou, E. Todorov. CDC 2012.
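A minimal iterative-update sketch (Python/NumPy) in the spirit of the rule above: rollouts are sampled around the current control sequence and the whole sequence is refined with cost-weighted noise. The dynamics, costs, and the simple quadratic importance correction used here are hypothetical placeholders, not the exact correction from the papers.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_steps, dt, lam, iters = 256, 50, 0.02, 1.0, 20
u = np.zeros(n_steps)                    # current control sequence u_old

def state_cost(x0, u_seq, dw):
    # Rollout of hypothetical dynamics dx = (-x + u) dt + dw; returns state cost.
    x, S = x0, 0.0
    for k in range(n_steps):
        S += 0.5 * (x - 1.0) ** 2 * dt   # hypothetical objective: track x = 1
        x += (-x + u_seq[k]) * dt + dw[k]
    return S

for _ in range(iters):
    dw = np.sqrt(dt) * rng.standard_normal((M, n_steps))
    S = np.array([state_cost(0.0, u, dw[i]) for i in range(M)])
    # Simplified stand-in for the Radon-Nikodym "correction" from sampling
    # around the current control u rather than the uncontrolled dynamics.
    S += 0.5 * np.sum(u**2) * dt + dw @ u
    P = np.exp(-(S - S.min()) / lam)
    P /= P.sum()
    u = u + (P @ dw) / dt   # u_new(t_k) = u_old(t_k) + sum_i P_i dw_i(t_k), scaled by 1/dt

print(u[:5])                # refined control sequence, first few steps
```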
Overview
Optimal Control Theory: constrained optimization -> dynamic programming -> Hamilton-Jacobi-Bellman PDE for $V(\mathbf{x},t)$ -> exponential transformation $V(\mathbf{x},t) = -\lambda \log \Psi(\mathbf{x},t)$ -> backward Chapman-Kolmogorov PDE for $\Psi(\mathbf{x},t)$ -> Feynman-Kac lemma -> fundamental bound.
E. Theodorou, E. Todorov. CDC 2012. E. Theodorou, Entropy 2015.
Information Theoretic View
Free energy:
$$F = -\lambda \log \int \exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP(\omega)$$
Generalized (relative) entropy:
$$\mathrm{KL}(Q \,\|\, P) = \int \frac{dQ(\omega)}{dP(\omega)} \log \frac{dQ(\omega)}{dP(\omega)}\, dP(\omega)$$
Fundamental inequality:
$$-\lambda \log \int \exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP(\omega) \;\le\; \mathbb{E}_Q\big[J(\mathbf{x})\big] + \lambda\, \mathrm{KL}(Q \,\|\, P)$$
In thermodynamic language: Free Energy <= Work + Temperature x Generalized Entropy, with $\lambda$ playing the role of temperature.
Optimal measure (attains the bound):
$$dQ^* = \frac{\exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP}{\int \exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP}$$
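A small numerical check of the bound on a discrete space (Python/NumPy), under the assumption that the path measures can be illustrated with finite distributions: the free energy lower-bounds $\mathbb{E}_Q[J] + \lambda\,\mathrm{KL}(Q\|P)$ for random Q, with equality at the Gibbs (optimal) measure.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n = 0.7, 6
P = rng.random(n); P /= P.sum()          # base (uncontrolled) measure
J = rng.random(n) * 5.0                  # hypothetical path costs

free_energy = -lam * np.log(np.sum(np.exp(-J / lam) * P))

def rhs(Q):
    """E_Q[J] + lam * KL(Q || P)."""
    return np.sum(Q * J) + lam * np.sum(Q * np.log(Q / P))

for _ in range(5):                       # the bound holds for arbitrary Q
    Q = rng.random(n); Q /= Q.sum()
    assert free_energy <= rhs(Q) + 1e-12

Q_star = np.exp(-J / lam) * P            # optimal (Gibbs) measure dQ*
Q_star /= Q_star.sum()
print(free_energy, rhs(Q_star))          # equal: the optimal measure attains the bound
```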
Non-Classical View
Uncontrolled dynamics:
$$P: \quad d\mathbf{x} = \mathbf{f}(\mathbf{x})\,dt + \sqrt{\lambda}\, B(\mathbf{x})\, d\omega^{(0)}$$
Controlled dynamics:
$$Q: \quad d\mathbf{x} = \mathbf{f}(\mathbf{x})\,dt + B(\mathbf{x})\big(\mathbf{u}\,dt + \sqrt{\lambda}\, d\omega^{(1)}\big)$$
Fundamental relationship:
$$-\lambda \log \mathbb{E}_P\!\left[e^{-\frac{1}{\lambda} J(\mathbf{x})}\right] \;\le\; \mathbb{E}_Q\big[J(\mathbf{x})\big] + \lambda\, \mathrm{KL}(Q \,\|\, P)$$
Radon-Nikodym derivative:
$$\frac{dQ}{dP} = \exp\!\left(\frac{1}{2\lambda}\int_{t_i}^{t_N} \mathbf{u}^\top \mathbf{u}\,dt + \frac{1}{\sqrt{\lambda}}\int_{t_i}^{t_N} \mathbf{u}^\top d\mathbf{w}^{(1)}(t)\right)$$
Fundamental relationship (Free Energy <= Cost Function):
$$\xi(\mathbf{x},t_i) = -\lambda \log \mathbb{E}_P\!\left[e^{-\frac{1}{\lambda} J(\mathbf{x})}\right] \;\le\; \mathbb{E}_Q\!\left[J(\mathbf{x}) + \int_{t_i}^{t_N} \tfrac{1}{2}\,\mathbf{u}^\top \mathbf{u}\,dt\right]$$
Consequences: there is a lower bound on the cost function; the question becomes how to attain this lower bound; the formulation permits generalizations to other classes of stochastic dynamics.
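The step from the generic bound to the control-energy form follows by evaluating the relative entropy with the Radon-Nikodym derivative above; a short derivation of this step:
$$\lambda\,\mathrm{KL}(Q\|P) = \lambda\,\mathbb{E}_Q\!\left[\log\frac{dQ}{dP}\right] = \lambda\,\mathbb{E}_Q\!\left[\frac{1}{2\lambda}\int_{t_i}^{t_N}\mathbf{u}^\top\mathbf{u}\,dt + \frac{1}{\sqrt{\lambda}}\int_{t_i}^{t_N}\mathbf{u}^\top d\mathbf{w}^{(1)}(t)\right] = \mathbb{E}_Q\!\left[\int_{t_i}^{t_N}\tfrac{1}{2}\,\mathbf{u}^\top\mathbf{u}\,dt\right],$$
since the Ito integral against $d\mathbf{w}^{(1)}$ (a Q-Brownian motion) has zero mean under $Q$. Substituting into the generic bound gives the free-energy/cost-function relationship above.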
Non-Classical View
Basic relationship:
$$\xi(\mathbf{x},t_i) = -\lambda \log \int \exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP(\omega) \;\le\; \mathbb{E}_Q\!\left[J(\mathbf{x}) + \int_{t_i}^{t_N} \tfrac{1}{2}\,\mathbf{u}^\top R\, \mathbf{u}\,dt\right]$$
The exponentiated free energy $\Psi(\mathbf{x},t) = \exp\!\left(-\tfrac{1}{\lambda}\,\xi(\mathbf{x},t)\right)$ satisfies the backward Chapman-Kolmogorov PDE:
$$-\partial_t \Psi = -\tfrac{1}{\lambda}\, q_0\, \Psi + \mathbf{f}^\top (\nabla_x \Psi) + \tfrac{1}{2}\operatorname{tr}\!\big((\nabla_{xx} \Psi)\, B B^\top\big)$$
Applying the exponential transformation $\xi(\mathbf{x},t) = -\lambda \log \Psi(\mathbf{x},t)$ turns this linear PDE into a Hamilton-Jacobi-Bellman PDE:
$$-\partial_t \xi = q_0 + (\nabla_x \xi)^\top \mathbf{f} - \tfrac{1}{2\lambda}(\nabla_x \xi)^\top B B^\top (\nabla_x \xi) + \tfrac{1}{2}\operatorname{tr}\!\big((\nabla_{xx} \xi)\, B B^\top\big)$$
$\xi(\mathbf{x},t)$ is a value function; $\Psi(\mathbf{x},t)$ is a desirability function.
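A quick symbolic check of the exponential transformation in one dimension (Python/SymPy; our verification, not from the slides): substitute $\Psi = e^{-\xi/\lambda}$ into the linear Chapman-Kolmogorov equation and confirm that the quadratic HJB term appears.

```python
import sympy as sp

x, t, lam = sp.symbols("x t lambda", positive=True)
xi = sp.Function("xi")(x, t)                     # value function, 1-D
f = sp.Function("f")(x)
B = sp.Function("B")(x)
q0 = sp.Function("q0")(x)

Psi = sp.exp(-xi / lam)                          # exponential transformation

# Residual of the backward Chapman-Kolmogorov PDE for Psi:
#   -Psi_t = -(q0/lam) Psi + f Psi_x + (1/2) B^2 Psi_xx
ck = -sp.diff(Psi, t) + (q0 / lam) * Psi - f * sp.diff(Psi, x) \
     - sp.Rational(1, 2) * B**2 * sp.diff(Psi, x, 2)

# Rescaling by lam/Psi recovers the HJB residual for xi, including the
# quadratic term -(B^2 / (2 lam)) xi_x^2 produced by the transformation.
hjb = sp.expand(sp.simplify(lam * ck / Psi))
print(hjb)
```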
Overview
Two routes to the same structure:
- Optimal Control Theory: constrained optimization -> dynamic programming -> Hamilton-Jacobi-Bellman PDE for $V(\mathbf{x},t)$ -> transformation $\Psi(\mathbf{x},t) = \exp\!\left(-\tfrac{1}{\lambda}V(\mathbf{x},t)\right)$, i.e. $V(\mathbf{x},t) = -\lambda\log\Psi(\mathbf{x},t)$ -> backward Chapman-Kolmogorov PDE for $\Psi(\mathbf{x},t)$ -> Feynman-Kac lemma -> fundamental bound.
- Information Theory: free energy & relative entropy -> Jensen's inequality & Girsanov theorem -> fundamental bound $\xi(\mathbf{x},t) = -\lambda\log\Psi(\mathbf{x},t)$ -> Feynman-Kac lemma -> backward Chapman-Kolmogorov PDE for $\Psi(\mathbf{x},t)$ -> Hamilton-Jacobi-Bellman PDE.
E. Theodorou, E. Todorov. CDC 2012. E. Theodorou, Entropy 2015.
Non-Classical View
Fundamental relationship:
$$-\lambda \log \int \exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP(\omega) \;\le\; \mathbb{E}_Q\big[J(\mathbf{x})\big] + \lambda\, \mathrm{KL}(Q \,\|\, P)$$
Uncontrolled dynamics: $P: \; d\mathbf{x} = \mathbf{F}(\mathbf{x},0)\,dt + C(\mathbf{x})\,d\mathbf{w}^{(0)}$
Controlled dynamics: $Q: \; d\mathbf{x} = \mathbf{F}(\mathbf{x},\mathbf{u})\,dt + C(\mathbf{x})\,d\mathbf{w}^{(1)}$
Optimal measure:
$$dQ^* = \frac{\exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP}{\int \exp\!\left(-\tfrac{1}{\lambda} J(\mathbf{x})\right) dP}$$
New optimal control formulation: push the controlled measure toward the optimal one,
$$\mathbf{u}^* = \operatorname*{argmin}_{\mathbf{u}} \mathrm{KL}\big(Q^* \,\|\, Q(\mathbf{u})\big),$$
evaluated through the Radon-Nikodym derivatives $dQ^*/dP$ and $dP/dQ(\mathbf{u})$.
Control parameterization (piecewise constant with step $\Delta t$):
$$\mathbf{u}_t = \mathbf{u}_j \quad \text{for } j\,\Delta t \le t < (j+1)\,\Delta t, \qquad j = 0, 1, \dots$$
Generalized importance sampling:
$$\mathbb{E}_P\big[f(\mathbf{x})\big] = \mathbb{E}_{Q_{m,\nu}}\!\left[f(\mathbf{x})\, \frac{dP}{dQ_{m,\nu}}\right]$$
G. Williams et al. ICRA 2016.
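A minimal sketch of the generalized importance-sampling identity (Python/NumPy), under the assumption that the path measures can be illustrated with Gaussians: expectations under the base measure P are recovered from samples of a shifted and scaled sampling distribution $Q_{m,\nu}$ by reweighting with the likelihood ratio $dP/dQ_{m,\nu}$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Estimate E_P[f(x)] for P = N(0, 1) by sampling from a shifted/scaled
# Gaussian Q_{m,nu} = N(m, nu^2) and reweighting with dP/dQ.
m, nu = 1.5, 2.0
f = lambda x: np.exp(-x**2)           # hypothetical integrand

x = rng.normal(m, nu, size=200_000)   # samples from Q_{m,nu}
# log(dP/dQ) = log N(x; 0, 1) - log N(x; m, nu^2), constants cancel except log(nu)
log_ratio = -0.5 * x**2 + 0.5 * ((x - m) / nu) ** 2 + np.log(nu)
estimate = np.mean(f(x) * np.exp(log_ratio))

print(estimate)                       # ~ E_P[f] = 1/sqrt(3) ~= 0.5774 for this f
```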
MPPI with Offline Model Learning (7-8 m/sec)
Applications in Robotics: High-Speed MPPI with Offline Learned Dynamics
[Figure: paths taken (X-Y position, m) for the 6 m/s and 8 m/s targets; multi-step prediction errors over a 2.5 s horizon: position error (m), heading error (rad), velocity error (m/s), heading-rate error (rad/s).]
Learned (controlled) dynamics model, per output dimension $i$:
$$f_i(\mathbf{x}) \sim \mathcal{N}\big(\mu_i(\mathbf{x}),\, \sigma_i^2(\mathbf{x})\big)$$
Mean:
$$\mu_i(\mathbf{x}) = \sigma_n^{-2}\, \phi(\mathbf{x})^\top A_i^{-1} \Phi(X)\, Y_i$$
Variance:
$$\sigma_i^2(\mathbf{x}) = \phi(\mathbf{x})^\top A_i^{-1} \phi(\mathbf{x}), \qquad A_i = \sigma_n^{-2}\, \Phi(X) \Phi(X)^\top + \Sigma_p^{-1}$$
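A minimal sketch of these predictive equations as standard Bayesian linear regression in feature space (Python/NumPy); the random cosine features and the 1-D regression target are hypothetical stand-ins for whatever basis and data the learned vehicle model used.

```python
import numpy as np

rng = np.random.default_rng(5)

def make_features(n_feat=50, dim=1, seed=0):
    """Hypothetical random cosine features phi(x), a stand-in for a learned basis."""
    r = np.random.default_rng(seed)
    W, b = r.normal(size=(n_feat, dim)), r.uniform(0, 2 * np.pi, n_feat)
    return lambda x: np.cos(x @ W.T + b)          # (N, dim) -> (N, n_feat)

phi = make_features()
X = rng.uniform(-3, 3, (200, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)   # hypothetical dynamics target

sig_n, Sig_p = 0.1, np.eye(50)
Phi = phi(X)                                      # (N, n_feat): rows are phi(x_k)^T
A = Phi.T @ Phi / sig_n**2 + np.linalg.inv(Sig_p) # A = sig_n^-2 Phi Phi^T + Sig_p^-1
A_inv = np.linalg.inv(A)

x_star = np.array([[0.5]])
p = phi(x_star)[0]
mu = p @ A_inv @ (Phi.T @ Y) / sig_n**2           # predictive mean mu(x*)
var = p @ A_inv @ p                               # predictive variance sigma^2(x*)
print(mu, var)                                    # sample f(x*) ~ N(mu, var) in rollouts
```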
Applications in Robotics: MPPI with Online Model Learning (8-9 m/sec)
Model Predictive Path Integral Control using Artificial Neural Networks
Applications in Multi-vehicle Systems
Applications in Robotics: Comparisons
[Figure, left: average running cost vs. number of rollouts (log scale) for MPPI at four parameter settings (50, 100, 150, 300), compared against the DDP solution. Right: cornering-maneuver trajectories for MPC-DDP vs. MPPI.]
Applications in Multi-vehicle Systems
[Figure, left: success rate (%) vs. rollouts (thousands) with static obstacles, moving obstacles, and 9 quadrotors. Right: real-time CUDA optimization times; optimization iteration time (ms) vs. number of rollouts for 1, 3, and 9 quadrotors.]
Learning and Optimization in Different Time Scales
Two policy parameterizations: a state-time form $\mathbf{u}(\mathbf{x},t,\theta) = \Phi(\mathbf{x},t)\,\theta$, with path-integral parameter updates $\theta^{(k)}_{PI}(t_i), \theta^{(k)}_{PI}(t_{i+1}), \ldots, \theta^{(k)}_{PI}(t_N)$ along the horizon, and a separable form $\mathbf{u}(\mathbf{x},t,\theta) = \Phi(\mathbf{x})\,\theta(t)$.
Applications in Robotics: Manipulation
Nonlinear Feynman-Kac
Cost function (with an L1-type control penalty, $D(\cdot)$ diagonal):
$$J\big(t, \mathbf{x}; \mathbf{u}(\cdot)\big) = \mathbb{E}\!\left[g(\mathbf{x}(T)) + \int_t^T \Big(q_0(s,\mathbf{x}(s)) + q_1(s,\mathbf{x}(s))^\top D\big(\mathrm{sgn}(\mathbf{u}(s))\big)\, \mathbf{u}(s)\Big)\, ds\right]$$
Stochastic dynamics:
$$d\mathbf{x}(t) = f(t,\mathbf{x}(t))\,dt + G(t,\mathbf{x}(t))\,\mathbf{u}(t)\,dt + \sigma(t,\mathbf{x}(t))\,d\mathbf{w}_t$$
Hamilton-Jacobi-Bellman equation:
$$v_t + \inf_{\mathbf{u}\in\mathcal{U}} \left\{ \tfrac{1}{2}\operatorname{tr}(v_{xx}\sigma\sigma^\top) + v_x^\top f + v_x^\top G \mathbf{u} + q_1^\top D(\mathrm{sgn}(\mathbf{u}))\,\mathbf{u} + q_0 \right\} = 0$$
Carrying out the minimization element-wise over the box constraints $u_i \in [u_i^{\min}, u_i^{\max}]$ gives:
$$v_t + \tfrac{1}{2}\operatorname{tr}(v_{xx}\sigma\sigma^\top) + v_x^\top f + q_0 + \sum_i \min\!\Big( (v_x^\top G + q_1^\top)_i\, u_i^{\max},\; 0,\; (v_x^\top G - q_1^\top)_i\, u_i^{\min} \Big) = 0$$
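A small sketch (Python/NumPy; our reading of the element-wise minimization) that evaluates the per-dimension min term and recovers the corresponding bang-off-bang control; the values of a, q1, and the bounds are hypothetical numbers.

```python
import numpy as np

def hjb_min_term(a, q1, u_min, u_max):
    """Element-wise inf over u_i in [u_min_i, u_max_i] of a_i*u_i + q1_i*|u_i|.

    Here a = (v_x^T G) per channel. Returns (value, minimizing control):
    min((a + q1) u_max, 0, (a - q1) u_min), matching the reduced HJB term.
    """
    cands = np.stack([(a + q1) * u_max, np.zeros_like(a), (a - q1) * u_min])
    u_cands = np.stack([u_max, np.zeros_like(a), u_min])
    idx = np.argmin(cands, axis=0)
    cols = np.arange(len(a))
    return cands[idx, cols], u_cands[idx, cols]

# Hypothetical numbers: two control channels with L1 cost weight q1 and box bounds.
a = np.array([-3.0, 0.5])        # v_x^T G per channel
q1 = np.array([1.0, 1.0])
val, u_star = hjb_min_term(a, q1, u_min=np.array([-2.0, -2.0]), u_max=np.array([2.0, 2.0]))
print(val, u_star)               # channel 1 saturates at u_max; channel 2 stays off (u = 0)
```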
Nonlinear Feynman-Kac
Forward SDE:
$$dX_s = b(s, X_s)\,ds + \sigma(s, X_s)\,dW_s$$
Backward SDE:
$$dY_s = -h\big(s, X_s^{t,x}, Y_s, Z_s\big)\,ds + Z_s^\top\, dW_s$$
Identifications with the control problem (assuming $\sigma$ invertible, so that $v_x^\top G = \mathbf{z}^\top \sigma^{-1} G$ with $\mathbf{z} = \sigma^\top v_x$):
$$b(t,\mathbf{x}) = f(t,\mathbf{x}), \qquad h(t,\mathbf{x},\mathbf{z}) = q_0(t,\mathbf{x}) + \sum_i \min\!\Big( (\mathbf{z}^\top \sigma^{-1} G + q_1^\top)_i\, u_i^{\max},\; 0,\; (\mathbf{z}^\top \sigma^{-1} G - q_1^\top)_i\, u_i^{\min} \Big)$$
Importance forward SDE (modified drift):
$$d\tilde{X}_s = \big[b(s, \tilde{X}_s) + \sigma(s, \tilde{X}_s)\, K_s\big]\,ds + \sigma(s, \tilde{X}_s)\,dW_s$$
Compensated backward SDE:
$$d\tilde{Y}_s = \big[-h(s, \tilde{X}_s, \tilde{Y}_s, \tilde{Z}_s) + \tilde{Z}_s^\top K_s\big]\,ds + \tilde{Z}_s^\top\, dW_s$$
The pair still solves the same PDE,
$$v_t + \tfrac{1}{2}\operatorname{tr}(v_{xx}\sigma\sigma^\top) + v_x^\top (b + \sigma K) + h(t, \mathbf{x}, v, \sigma^\top v_x) - v_x^\top \sigma K = 0,$$
which reduces to the original equation since the $v_x^\top \sigma K$ terms cancel.
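Why the compensation works: applying Ito's rule to $Y_s = v(s, \tilde{X}_s)$ along the importance-sampled forward SDE and using the PDE above (a short sketch of the omitted step, with $\tilde{Z}_s = \sigma^\top v_x$):
$$dY_s = \Big[ v_t + v_x^\top \big(b + \sigma K_s\big) + \tfrac{1}{2}\operatorname{tr}\big(v_{xx}\sigma\sigma^\top\big) \Big] ds + v_x^\top \sigma\, dW_s = \big[ -h(s, \tilde{X}_s, \tilde{Y}_s, \tilde{Z}_s) + \tilde{Z}_s^\top K_s \big]\, ds + \tilde{Z}_s^\top\, dW_s,$$
so the drift change $\sigma K_s$ in the forward process is exactly compensated by the $\tilde{Z}_s^\top K_s$ term in the backward process, leaving the value function unchanged.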
Nonlinear Feynman-Kac: Results
[Figure: cost per iteration (mean + 3 std. dev.); mean of the controlled system trajectories at each iteration; final angle and angular-velocity distributions with cost mean and variance, comparing deterministic open-loop, deterministic feedback, and stochastic feedback solutions.]
Extensions: stochastic cooperative games; risk-sensitive stochastic control; control-affine and state/control-multiplicative noise dynamics.
Summary
Stochastic control at the intersection of: Statistical Physics (Richard Feynman, Mark Kac); Control Theory (Richard Bellman, Lev Pontryagin); Machine Learning; Parallel and Neuromorphic Computation.
[Figure: vehicle velocity (m/s) and control norm ||(u_x, u_y)|| over time for the 6 m/s and 8 m/s target runs.]
Autonomous Control and Decision Systems Lab
[Cartoon: lab members' speech bubbles: "My world is delayed." "My world is quantum mechanical." "My world is infinite dimensional." "My world is variational." "My world is parallel." "My world is probabilistic." "My world is multiple." "What are these guys talking about?"]
Collaborators. Sponsors.
Thanks!