Bayesian Decision Theory in Sensorimotor Control


Bayesian Decision Theory in Sensorimotor Control
Matthias Freiberger, Martin Öttl
Signal Processing and Speech Communication Laboratory, TU Graz - Advanced Signal Processing

Outline
- Introduction: definition, challenges of sensorimotor control, Bayesian integration in motor control, cost functions
- Optimal Estimation
- Optimal Feedback Control: introduction, Linear Quadratic Gaussian framework (LQG), LQG + Kalman filter, LQG with multiplicative noise, minimal intervention principle, hierarchical optimal controller
- Conclusion
- References

Intro - What is sensorimotor control?
sen·so·ri·mo·tor (adj.): of, relating to, or involving both sensory and motor activity: sensorimotor nerve centers; sensorimotor pathways. (The American Heritage Dictionary of the English Language, Fourth Edition)
Movement is the only way for humans to interact with the world. All communication, including speech, sign language, gestures and writing, is mediated by the motor system. (http://www.pom.cam.ac.uk/research/sensorimotor.html)

Intro - What is sensorimotor control?
We want to understand and describe, by applying methods from computer science and control theory, how:
- human beings are able to return a tennis ball,
- or grab a bottle of water and drink,
- birds of prey are capable of catching a mouse in flight,
- basically, how any kind of physical interaction with the environment is performed by biological systems, pursuing a certain objective while permanently performing corrections based on sensory input.

Intro - Challenges
- Action selection is a fundamental decision process.
- The CNS constantly sends motor commands to the muscles.
- At each point in time, the appropriate motor command needs to be selected.
- Knowledge about the environment needs to be combined with actual observation data and knowledge about the cost/reward of currently possible actions to make optimal decisions.

Intro - Schematic Control Flow
[Figure: schematic control flow of the sensorimotor loop]

Intro - Uncertainty of the human sensorium
- The human sensorium is plagued by noise.
- Muscle output is noisy as well.
- Therefore, the state of the environment/body needs to be estimated.
- Additionally, the cost of each movement shall be minimized.
- Bayesian statistics comes in as a powerful way to deal with the uncertainty of the human sensorium.

Intro - Bayesian integration
- The CNS needs to integrate prior knowledge about the environment with knowledge obtained from sensory data to estimate the state of the environment optimally.
- Example: when estimating the bounce location of a tennis ball, the ball might be more likely to bounce at the edges of the court.

Intro - Bayesian Cue Combination
- Combination of sensor signals for better estimates.
- Combination of different sensor modalities (e.g. vision and proprioception).
- Combination of signals of the same modality (several visual cues into a stereo image, ...).
- Cues need to be weighted against each other.

Intro - Bayesian Cue Combination
Given a set of observations from different cues $d_1, d_2, d_3, \dots, d_n$, and under the assumption that the cues are conditionally independent given the state $s$, we can rewrite the likelihood $P(d_1, d_2, d_3, \dots, d_n \mid s)$ as
$$P(d_1, d_2, d_3, \dots, d_n \mid s) = \prod_{k=1}^{n} P(d_k \mid s) \qquad (1)$$
Therefore we can rewrite the corresponding posterior probability:
$$P(s \mid d_1, d_2, d_3, \dots, d_n) = \frac{P(s) \prod_{k=1}^{n} P(d_k \mid s)}{P(d_1, d_2, d_3, \dots, d_n)} \qquad (2)$$
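
To make eq. (2) concrete, here is a minimal Python sketch (not from the slides; all numbers are illustrative assumptions) for the classic Gaussian special case: with Gaussian cue likelihoods and a Gaussian prior, the posterior over $s$ is again Gaussian, and each cue is weighted by its inverse variance, which is exactly the weighting the previous slide refers to.

```python
import numpy as np

# Minimal sketch (not from the slides): fusing independent Gaussian cues.
# Each cue k reports d_k = s + noise, noise ~ N(0, var_k); prior s ~ N(mu0, var0).
# The posterior is Gaussian: precision = sum of precisions,
# mean = precision-weighted average of prior mean and cue values.

def fuse_gaussian_cues(mu0, var0, cues):
    """cues: list of (d_k, var_k) pairs. Returns posterior (mean, variance)."""
    precision = 1.0 / var0
    weighted = mu0 / var0
    for d_k, var_k in cues:
        precision += 1.0 / var_k
        weighted += d_k / var_k
    return weighted / precision, 1.0 / precision

# Example: vision is more reliable than proprioception, so it gets more weight.
mean, var = fuse_gaussian_cues(mu0=0.0, var0=4.0,
                               cues=[(1.0, 0.25),   # visual cue
                                     (2.0, 1.0)])   # proprioceptive cue
print(mean, var)
```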

Intro - Cost Functions
- Model how good or bad the outcome of a particular movement is.
- It seems reasonable to minimize consumed energy and strain on the muscles.
- Several cost functions have been proposed (smoothness, precision).
- The CNS also adapts very well to external cost functions.

Intro - Cost Functions
- The actual cost function of human movement can be inferred using indifference lines.
- A utility function can be found from these lines: compare points across lines and assign utilities to the lines.


Intro - Cost Functions
Given a set of possible actions $X$ and a set of possible outcomes $O$, as well as a utility function $U : O \to \mathbb{R}$, for any $x \in X$ we can compute the expected utility
$$E\{U \mid x\} = \sum_{o \in O} P(o \mid x)\, U(o) \qquad (3)$$
Therefore, the optimal decision with respect to the utility function $U(o)$ is considered to be the one that maximizes the expected utility $E\{U \mid x\}$.
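
A minimal sketch of eq. (3) and the resulting decision rule; the action set, outcome probabilities, and utilities below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: choosing the action that maximizes expected utility, eq. (3).
# Rows of P index actions x, columns index outcomes o.
P = np.array([[0.7, 0.2, 0.1],    # P(o | x) for action x1
              [0.1, 0.6, 0.3]])   # P(o | x) for action x2
U = np.array([10.0, 2.0, -5.0])   # utility U(o) of each outcome

expected_utility = P @ U          # E{U | x} for every action at once
best_action = int(np.argmax(expected_utility))
print(expected_utility, best_action)
```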


Optimal Estimation - Intro
Until now: find the optimal action among a finite set of actions. But the world is continuous...
- The actual continuous state of our body parts has to be estimated permanently, and optimal actions according to the state estimate need to be found.
In control terms: we need to build ourselves an observer, which estimates the inner state (e.g. the position and velocity) of our limbs.

Optimal Estimation - Experiment
Experiment setup:
- Test subjects had to estimate the location of their thumb after moving their arm.
- A resistive or assistive force was added by torque motors.
- The hand was constrained to move on a straight line.
- The arm was illuminated for 2 s to give an initial state.
- After that, participants had to rely solely on proprioception.

Optimal Estimation - Experiment
[Figure: experiment setup]

Optimal Estimation - Models
- A system that mimics the behavior of a natural process is called an internal model.
- Internal models are an important concept in motor control.
- Basically, two classes of internal models can be distinguished: forward models and backward models.

Optimal Estimation - Internal models: forward vs. backward
Forward models:
- mimic the causal flow of a process by predicting its next state;
- come up naturally, since delays in most sensorimotor loops are large and feedback control may be too slow for rapid movements;
- are a key ingredient in systems that use motor outflow (efference copy).
Backward models:
- estimate the appropriate motor command that caused a particular state transition.

Optimal Estimation - Internal models: forward vs. backward
How do we optimally model our limbs now?
- Wolpert et al. used a forward model incorporating a correction term for the given problem.
- State estimation for a system containing noise is a complex task.
- We will follow an intuitive approach by modeling an observer for a deterministic system first.
- From our deterministic observer, we will perform the transition to a probabilistic observer (Kalman filter).

Optimal Estimation - The Plant
Model the arm as a damped mass system.
State model:
$$\dot{x} = Ax + bu \quad \text{(state update equation)}$$
$$y = c^T x + du \quad \text{(model for sensory output)}$$
State variables:
- $x_1$: position of the mass (hand)
- $x_2$: velocity of the mass (hand)
- $u(t)$: applied force
- $y(t)$: sensory output

Optimal Estimation - The Plant
Model parameters:
$$A = \begin{pmatrix} 0 & 1 \\ 0 & -\frac{\beta}{m} \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ \frac{1}{m} \end{pmatrix}, \qquad c = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad d = 0$$
where $m$ is the mass of the hand and $\beta$ is the damping parameter.
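
A minimal numpy sketch of this plant; the values of $m$ and $\beta$ are illustrative assumptions. The same matrices are reused in the observer sketches below.

```python
import numpy as np

# Minimal sketch of the damped-mass plant above; m and beta are illustrative
# assumptions, not taken from the slides.
m, beta = 1.0, 0.5
A = np.array([[0.0, 1.0],
              [0.0, -beta / m]])   # state matrix of the damped mass
b = np.array([[0.0],
              [1.0 / m]])          # force enters the velocity equation
c = np.array([[1.0, 0.0]])         # we sense position only
d = 0.0

def plant_deriv(x, u):
    """xdot = A x + b u for the damped mass (x = [position, velocity])."""
    return A @ x + b.flatten() * u

print(plant_deriv(np.array([0.0, 1.0]), u=0.0))  # coasting: xdot = [1, -0.5]
```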

Optimal Estimation - Luenberger Observer
Observer model: the plant $\dot{x} = Ax + bu$ with input $u(t)$ and output $y(t)$ is mirrored by an observer.
Ansatz for the Luenberger observer:
$$\dot{\hat{x}} = \hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y \qquad (4)$$

Optimal Estimation - Luenberger Observer
Derivation. Error constraint:
$$e(t) = x(t) - \hat{x}(t), \qquad \lim_{t \to \infty} e(t) = 0$$
$$\dot{e} = \dot{x} - \dot{\hat{x}} = (Ax + bu) - (\hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y)$$
Set $y = c^T x$ and rearrange the equation:
$$\dot{e} = (A - \hat{b}_2 c^T)\,x - \hat{A}\hat{x} + (b - \hat{b}_1)\,u$$

Optimal Estimation - Luenberger Observer
$$\dot{e} = (A - \hat{b}_2 c^T)\,x - \hat{A}\hat{x} + (b - \hat{b}_1)\,u$$
The error shall be independent of the input, so set $\hat{b}_1 = b$:
$$\dot{e} = (A - \hat{b}_2 c^T)\,x - \hat{A}\hat{x}$$
Choose $\hat{A} = A - \hat{b}_2 c^T$ and obtain for the error
$$\dot{e} = (A - \hat{b}_2 c^T)\,e$$
Final model:
$$\dot{\hat{x}} = (A - \hat{b}_2 c^T)\,\hat{x} + bu + \hat{b}_2 y$$

Optimal Estimation - Luenberger Observer
$$\dot{\hat{x}} = (A - \hat{b}_2 c^T)\,\hat{x} + \hat{b}_1 u + \hat{b}_2 y$$
Rewrite $\hat{b}_2 = \hat{b}$ and $c^T \hat{x} = \hat{y}$:
$$\dot{\hat{x}} = A\hat{x} - \hat{b}\hat{y} + \hat{b}y + bu$$
Collect terms:
$$\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$$

Optimal Estimation - Luenberger Observer
Where are our models now?
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{\hat{b}(y - \hat{y})}_{\text{sensory correction}}$$
- The forward model takes the actual state estimate and tries to predict the further evolution of the state.
- The difference between the actual sensory feedback $y$ and the prediction $\hat{y}$, weighted by $\hat{b}$, is used to update the state estimate.
How to choose $\hat{b}$? For deterministic systems: choose $\hat{b}$ such that $(A - \hat{b}c^T)$ is asymptotically stable.
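
A minimal simulation sketch of this observer for the damped-mass plant; the gain $\hat{b}$ and all constants are illustrative assumptions, with $\hat{b}$ chosen so that $A - \hat{b}c^T$ is asymptotically stable.

```python
import numpy as np

# Minimal sketch: simulate the damped-mass plant and the Luenberger observer
# side by side with Euler steps and watch the error e = x - xhat decay.
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
c = np.array([1.0, 0.0])
bhat = np.array([2.0, 1.0])       # chosen so A - bhat c^T is stable

x = np.array([1.0, 0.0])          # true state: starts at position 1
xhat = np.zeros(2)                # observer starts with no knowledge

for k in range(5000):
    u = np.sin(0.002 * k)         # arbitrary input signal
    y = c @ x                     # measured output (position)
    yhat = c @ xhat
    x = x + dt * (A @ x + b * u)
    xhat = xhat + dt * (A @ xhat + b * u + bhat * (y - yhat))

print(np.linalg.norm(x - xhat))   # should be near zero
```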

Optimal Estimation - Probabilistic Observer
The real world can be mean and difficult: noise is everywhere. Circuits are plagued by noise, and so are radio transmissions and even our body.
[Diagram: plant $\dot{x} = Ax + bu$ with input $u(t)$ and output $y(t)$, observed by $\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$]

Optimal Estimation - Probabilistic Observer
Now add noise to the plant:
[Diagram: plant $\dot{x} = Ax + bu + w$ with input $u(t)$, noise $v(t)$ on the output $y(t)$ - which observer do we use now?]

Optimal Estimation - Probabilistic Observer
Stochastic model:
$$\dot{x} = Ax + bu + w \quad \text{(state update equation)}$$
$$y = Cx + v \quad \text{(model for sensory output)}$$
with $w(t)$ motor noise and $v(t)$ sensory noise.
- $w$ and $v$ are random variables.
- Therefore, the state vector $x$ is a vector of RVs as well.
- This means that we need a Bayesian estimator to estimate the mean $\hat{x}$ and covariance matrix $P$ of the state.

Optimal Estimation - Probabilistic Observer
Some simplifications: we assume that the noise is additive white Gaussian noise, uncorrelated with the initial state $x_0$:
$$w(t) \sim \mathcal{N}(0, Q_c), \qquad v(t) \sim \mathcal{N}(0, R_c)$$
It can be shown that in this case the minimum-variance estimator is the Kalman filter.

Optimal Estimation - Kalman Filter
Model for a Kalman filter:
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{K_t(y - \hat{y})}_{\text{sensory correction}} \qquad (5)$$
Computation of $K$ and $P$:
$$K_t = P_t C^T R_c^{-1} \quad \text{(Kalman gain matrix)}$$
$$\dot{P}_t = A P_t + P_t A^T + Q_c - K_t R_c K_t^T \quad \text{(update rule for } P\text{)}$$
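
A minimal Euler-integration sketch of eq. (5) together with the gain and covariance updates, again for the damped-mass plant; noise covariances, step size and duration are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: Euler-integrate the filter equations above. The gain K_t
# is recomputed from P_t at every step; all constants are illustrative.
rng = np.random.default_rng(0)
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
C = np.array([[1.0, 0.0]])            # we observe position
Qc = 0.01 * np.eye(2)                 # motor (process) noise covariance
Rc = np.array([[0.05]])               # sensory noise covariance

x = np.array([1.0, 0.0]); xhat = np.zeros(2); P = np.eye(2)
for k in range(5000):
    u = np.sin(0.002 * k)
    w = rng.multivariate_normal(np.zeros(2), Qc)
    v = rng.multivariate_normal(np.zeros(1), Rc)
    y = C @ x + v
    K = P @ C.T @ np.linalg.inv(Rc)   # K_t = P_t C^T R_c^{-1}
    # dP/dt = A P + P A^T + Q_c - K R_c K^T
    P = P + dt * (A @ P + P @ A.T + Qc - K @ Rc @ K.T)
    x = x + dt * (A @ x + b * u) + np.sqrt(dt) * w
    xhat = xhat + dt * (A @ xhat + b * u + K @ (y - C @ xhat))
print(np.abs(x - xhat))
```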

Optimal Estimation - Experiment results
[Figure: experimental results of the test subjects (GAM)]

Optimal Estimation - Experiment results
[Figure: Kalman filter predictions]

Optimal Estimation - Experiment results
[Figure: test subjects (GAM) vs. Kalman filter]

Optimal Estimation - Experiment conclusions
- The curves are quite similar.
- The noticeable peak at 1 s seems to reflect a tradeoff between the forward model and the backward model.
- The variance jitters in the experiment for changing forces; no force-dependent change in variance is predicted by the Kalman filter.
- The experiment provides support for the use of forward models with sensory correction.


Optimal Feedback Control - Introduction

Intro - Markov Decision Process (MDP)
Some notation:
- $x \in X$: state of the Markov process
- $u \in U(x)$: action / control in state $x$
- $p(x' \mid x, u)$: control-dependent transition probability distribution
- $l(x, u) \geq 0$: immediate cost for choosing control $u$ in state $x$
[Figure: shortest-path problem on a graph; edges carry immediate costs, nodes carry cumulative costs-to-go, decreasing to 0 at the target]

Intro - MDP - First exit formulation (1)
Goal: find for each state a control law / policy $u = \pi(x) \in U(x)$ which moves the trajectory towards a terminal state $x_T$.
- Each trajectory should incur the lowest total cost $v^\pi(x)$.
- $v^\pi(x)$ is also called the cost-to-go.
- The cost at a terminal state is $v^\pi(x) = q_T(x)$.

Intro - MDP - First exit formulation (2)
Cost-to-go as a path sum:
$$v^\pi(x) = E\left\{ q_T(x_{t_{\text{first}}}) + \sum_{k=0}^{t_{\text{first}}-1} l(x_k, \pi_k(x_k)) \;\middle|\; x_0 = x,\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v^\pi(x) = H[x, \pi(x), v^\pi(\cdot)]$
- optimal cost-to-go: $v^*(x) = \min_{u \in U(x)} H[x, u, v^*(\cdot)]$
- optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in U(x)} H[x, u, v^*(\cdot)]$

Intro - MDP - Finite horizon formulation
All trajectories end at $t = N$.
Cost-to-go as a path sum:
$$v_t^\pi(x) = E\left\{ q_T(x_N) + \sum_{k=t}^{N-1} l(x_k, \pi_k(x_k)) \;\middle|\; x_t = x,\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v_t^\pi(x) = H[x, \pi_t(x), v_{t+1}^\pi(\cdot)]$
- optimal cost-to-go: $v_t^*(x) = \min_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$
- optimal policy: $\pi_t^*(x) = \operatorname{argmin}_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$

Intro - MDP - Infinite horizon discounted cost formulation
Trajectories continue forever; future costs are exponentially discounted with $\alpha < 1$ to ensure a finite cost-to-go.
Cost-to-go as a path sum:
$$v^\pi(x) = E\left\{ \sum_{k=0}^{\infty} \alpha^k\, l(x_k, \pi(x_k)) \;\middle|\; x_0 = x,\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H_\alpha[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + \alpha\, E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v^\pi(x) = H_\alpha[x, \pi(x), v^\pi(\cdot)]$
- optimal cost-to-go: $v^*(x) = \min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$
- optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$

Intro - MDP - Infinite horizon average cost formulation
Trajectories continue forever; there is no discounting, and therefore the resulting cost-to-go is infinite.
Average cost-to-go:
$$c^\pi = \lim_{N \to \infty} \frac{1}{N} v_0^{\pi,N}(x)$$
Differential cost-to-go:
$$\tilde{v}^\pi(x) = v_0^{\pi,N}(x) - N c^\pi$$
Bellman equations:
- policy-specific cost-to-go: $c^\pi + \tilde{v}^\pi(x) = H[x, \pi(x), \tilde{v}^\pi(\cdot)]$
- optimal cost-to-go: $c^* + \tilde{v}^*(x) = \min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$
- optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$

Intro - MDP - Solution
Algorithms for calculating an optimal cost-to-go (a value-iteration sketch follows below):
- value iteration
- policy iteration
- linear programming
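
A minimal value-iteration sketch for the discounted formulation; the toy MDP (4 states, 2 actions, random transition table) is an illustrative assumption.

```python
import numpy as np

# Minimal value-iteration sketch on a toy 4-state, 2-action MDP; the
# transition table P and cost l are illustrative, not from the slides.
# Bellman backup: v*(x) = min_u [ l(x,u) + alpha * E{v*(x')} ].
n_states, n_actions, alpha = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x,u,x']
l = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # l(x,u)

v = np.zeros(n_states)
for _ in range(1000):
    q = l + alpha * P @ v          # Hamiltonian H[x,u,v] for all (x,u)
    v_new = q.min(axis=1)          # Bellman backup
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
policy = q.argmin(axis=1)          # optimal policy
print(v, policy)
```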

Intro - Continuous-time stochastic system (1)
System dynamics:
$$dx = f(x, u)\, dt + G(x, u)\, d\xi$$
- $x(t) \in \mathbb{R}^n$: state vector
- $u(t) \in \mathbb{R}^m$: control vector
- $\xi(t) \in \mathbb{R}^k$: Brownian motion (integral of white noise)
Interpretation:
$$x(t) - x(0) = \int_0^t f(x(s), u(s))\, ds + \int_0^t G(x(s), u(s))\, d\xi(s)$$
The last integral is an Ito integral.

Intro - Continuous-time stochastic system (2)
Ito integral: for a square-integrable function $g(t)$, the Ito integral is defined as
$$\int_0^t g(s)\, d\xi(s) = \lim_{n \to \infty} \sum_{k=0}^{n-1} g(s_k)\,\big(\xi(s_{k+1}) - \xi(s_k)\big)$$
with $0 = s_0 < s_1 < \dots < s_n = t$.

Intro - Continuous-time stochastic system (3)
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + f(x, u)^T v_x(x) + \tfrac{1}{2} \operatorname{tr}\!\left(\Sigma(x, u)\, v_{xx}(x)\right)$$
where $\Sigma(x, u) = G(x, u)\, G(x, u)^T$ is the noise covariance.
Hamilton-Jacobi-Bellman (HJB) equations for the optimal cost-to-go:
- first exit: $0 = \min_u H[x, u, v^*(\cdot)]$ with $v^*(x_T) = q_T(x)$
- finite horizon: $-v_t^*(x, t) = \min_u H[x, u, v^*(\cdot, t)]$ with $v^*(x, T) = q_T(x)$
- discounted: $\tfrac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)]$, where the discounted cost-to-go is $v^\pi(x) = E\left\{ \int_0^\infty \exp(-t/\tau)\, l(x(t), u(t))\, dt \right\}$
- average: $c^* = \min_u H[x, u, \tilde{v}^*(\cdot)]$

Intro - Inverse pendulum example (1)
Task: find the optimal control law for the inverse pendulum
$$\ddot{\theta} = k \sin(\theta) + u$$
with force of gravity $k$, angle $\theta$, torque $u$.
- state-dependent cost: $q(\theta) = 1 - \exp(-2\theta^2)$
- control-dependent cost: $\tfrac{r}{2} u^2$
- overall cost per step: $l(x, u) = q(\theta) + \tfrac{r}{2} u^2$
[Figure: pendulum mechanics and the state-dependent cost over the $(x_1 = \theta,\ x_2 = \dot{\theta})$ plane]

Intro - Inverse pendulum example (2)
Stochastic dynamics ($\ddot{\theta} = k \sin(\theta) + u$):
$$dx = (a(x) + Bu)\, dt + G\, d\xi$$
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}, \qquad a(x) = \begin{bmatrix} x_2 \\ k \sin(x_1) \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad G = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}$$
Discounted HJB equation (from above):
$$\tfrac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)] = \min_u \left[ l(x, u) + f(x, u)^T v_x^*(x) + \tfrac{1}{2} \operatorname{tr}\!\left(\Sigma(x, u)\, v_{xx}^*(x)\right) \right]$$
HJB for the inverse pendulum:
$$\tfrac{1}{\tau} v^*(x) = \min_u \left[ q(x) + \tfrac{r}{2} u^2 + (a(x) + Bu)^T v_x^*(x) + \tfrac{1}{2} \operatorname{tr}\!\left(G G^T v_{xx}^*(x)\right) \right]$$

Intro - Inverse pendulum example (3)
The optimal control law minimizes the Hamiltonian: differentiate the Hamiltonian with respect to $u$ and set it to zero:
$$r u + B^T v_x^*(x) = 0 \quad \Rightarrow \quad u = -\tfrac{1}{r} B^T v_x^*(x) = -\tfrac{1}{r}\, v_{x_2}^*(x)$$
Remarks:
- $v_x$ is also called the costate vector.
- The optimal control law depends on the multiplication of a matrix containing the system dynamics and energy costs with the costate vector.

Intro - Inverse pendulum example (4)
Calculation of the costate vector: insert the optimal control law into the HJB equation:
$$\tfrac{1}{\tau} v^*(x) = q(x) - \tfrac{1}{2r}\, (v_{x_2}^*)^2 + x_2\, v_{x_1}^* + k \sin(x_1)\, v_{x_2}^* + \tfrac{\sigma^2}{2}\, v_{x_2 x_2}^*$$
- construct an MDP (discretize the state space, approximate derivatives with finite differences, ...)
- solve the MDP
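
A minimal discretize-and-solve sketch of exactly these two steps; grid resolution, $k$, $r$, $\tau$ and the time step are illustrative assumptions, and the noise term is dropped ($\sigma \to 0$) for brevity.

```python
import numpy as np

# Minimal discretize-and-solve sketch for the pendulum (all constants are
# illustrative). We Euler-discretize the dynamics, treat l(x,u)*dt as step
# cost, and discount by exp(-dt/tau), then run value iteration on the grid.
k, r, tau, dt = 1.0, 0.1, 1.0, 0.05
thetas = np.linspace(-np.pi, np.pi, 51)      # grid over x1 = theta
omegas = np.linspace(-8.0, 8.0, 51)          # grid over x2 = theta-dot
controls = np.linspace(-8.0, 8.0, 21)
gamma = np.exp(-dt / tau)

TH, OM = np.meshgrid(thetas, omegas, indexing="ij")
v = np.zeros_like(TH)

def grid_index(grid, values):
    """Snap continuous successor states (approximately) onto the grid."""
    return np.clip(np.searchsorted(grid, values), 0, len(grid) - 1)

for _ in range(300):
    best = np.full_like(v, np.inf)
    for u in controls:
        th_next = TH + dt * OM                       # Euler step of x1
        om_next = OM + dt * (k * np.sin(TH) + u)     # Euler step of x2
        th_next = (th_next + np.pi) % (2 * np.pi) - np.pi  # wrap the angle
        i = grid_index(thetas, th_next)
        j = grid_index(omegas, om_next)
        cost = (1 - np.exp(-2 * TH**2) + 0.5 * r * u**2) * dt
        best = np.minimum(best, cost + gamma * v[i, j])
    v = best

print(v.min(), v.max())   # cost-to-go is lowest near the upright state
```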

Intro - Inverse pendulum example (5)
[Figure: three panels over the $(x_1 = \theta,\ x_2 = \dot{\theta})$ plane: the state-dependent cost $q(x)$, the optimal cost-to-go $v^*(x)$, and the optimal policy $u = -\tfrac{1}{r}\, v_{x_2}^*(x)$]

Optimal Feedback Control - Linear Quadratic Gaussian Framework (LQG)

LQG - Linear Quadratic Gaussian framework
In most cases an optimal control law can't be obtained in closed form; one exception is the LQG system.
LQG properties:
- linear dynamics
- quadratic costs
- additive Gaussian noise (if present)
Here the Hamiltonian can be minimized analytically.

LQG - Continuous-time stochastic system
Continuous-time LQG:
- dynamics: $dx = (Ax + Bu)\, dt + G\, d\xi$
- cost rate: $l(x, u) = \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x$
- final cost: $h(x) = \tfrac{1}{2} x^T Q_f x$
with $R$ the control cost matrix (symmetric positive definite), $Q$ the state cost matrix (symmetric), and $Q_f$ the final state cost matrix (symmetric).

LQG - Derivation cont.-time stochastic system (1)
Guess for the optimal value function:
$$v(x, t) = \tfrac{1}{2} x^T V(t)\, x + a(t), \qquad V(t) \text{ symmetric}$$
Derivatives:
$$v_t(x, t) = \tfrac{1}{2} x^T \dot{V}(t)\, x + \dot{a}(t), \qquad v_x(x, t) = V(t)\, x, \qquad v_{xx}(x, t) = V(t)$$
Substitution into the finite-horizon HJB:
$$-\tfrac{1}{2} x^T \dot{V}(t)\, x - \dot{a}(t) = \min_u \left\{ \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x + (Ax + Bu)^T V(t)\, x + \tfrac{1}{2} \operatorname{tr}(G G^T V(t)) \right\}$$
Remember:
$$\frac{\partial\, (x^T A x)}{\partial x} = (A + A^T)\, x, \qquad \frac{\partial\, (x^T a)}{\partial x} = \frac{\partial\, (a^T x)}{\partial x} = a$$

LQG - Derivation cont.-time stochastic system (2)
Analytically found minimum:
$$u = -R^{-1} B^T V(t)\, x$$
Using this $u$, the control-dependent part of the HJB becomes
$$\tfrac{1}{2} u^T R u + (Bu)^T V(t)\, x = -\tfrac{1}{2} x^T V(t) B R^{-1} B^T V(t)\, x$$
Simplifications ($V$ is symmetric):
$$x^T A^T V x = x^T V^T A x = x^T V A x \quad \Rightarrow \quad 2\, x^T A^T V x = x^T A^T V x + x^T V A x$$

LQG - Derivation cont.-time stochastic system (3)
Regrouping the HJB equation yields
$$-\tfrac{1}{2} x^T \dot{V}(t)\, x - \dot{a}(t) = \tfrac{1}{2} x^T \left( Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t) \right) x + \tfrac{1}{2} \operatorname{tr}(G G^T V(t))$$
Our guess of the optimal value function is correct iff both of the following equations hold; the first one is called the continuous-time Riccati equation:
$$-\dot{V}(t) = Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t)$$
$$-\dot{a}(t) = \tfrac{1}{2} \operatorname{tr}(G G^T V(t))$$

LQG - Derivation cont.-time stochastic system (4)
Boundary conditions (we used the finite-horizon HJB):
$$v(x, t_f) = \tfrac{1}{2} x^T V(t_f)\, x + a(t_f) = h(x) \quad \Rightarrow \quad V(t_f) = Q_f, \qquad a(t_f) = 0$$
$V(t)$ and $a(t)$ can be obtained by using the boundary conditions and integrating $\dot{V}(t) = \dots$ and $\dot{a}(t) = \dots$ backward in time.
Optimal control law (repeated from above):
$$u = -R^{-1} B^T V(t)\, x$$
The control law is independent of the noise.
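
A minimal sketch of this backward pass: Euler-integrate the continuous-time Riccati equation from $V(t_f) = Q_f$ back to $t = 0$ and read off the time-varying feedback gain. All matrices and the horizon are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: integrate -dV/dt = Q + A^T V + V A - V B R^-1 B^T V
# backward from V(t_f) = Q_f with explicit Euler steps (illustrative values).
m, beta = 1.0, 0.5
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
B = np.array([[0.0], [1.0 / m]])
Q = np.eye(2); R = np.array([[0.1]]); Qf = 10.0 * np.eye(2)
t_f, dt = 1.0, 0.001
n_steps = int(t_f / dt)

V = Qf.copy()
gains = []                         # L(t) = R^-1 B^T V(t), stored backward
for _ in range(n_steps):
    dV = Q + A.T @ V + V @ A - V @ B @ np.linalg.inv(R) @ B.T @ V
    V = V + dt * dV                # stepping backward in time: V(t - dt)
    gains.append(np.linalg.inv(R) @ B.T @ V)
gains.reverse()                    # gains[0] now corresponds to t = 0

print(gains[0])                    # feedback gain u = -L x at movement start
```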

LQG - Discrete-time stochastic system (1)
In practice one usually uses discrete-time systems.
Discrete-time LQG:
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t$
- cost rate: $l(x, u) = \tfrac{1}{2} u_t^T R u_t + \tfrac{1}{2} x_t^T Q x_t$
- final cost: $h(x) = \tfrac{1}{2} x_{t_f}^T Q_f x_{t_f}$
Optimal control law: $u_t = -L_t x_t$
Control gain: $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$

LQG - Discrete-time stochastic system (2)
Discrete-time Riccati equation:
$$V_t = Q_t + A^T V_{t+1}(A - B L_t)$$
Solving the above equations:
- The control gain is independent of the state sequence and can be computed offline.
- $V_t$ is computed by initializing $V_{t_f} = Q_f$ and iterating the Riccati equation backward in time.
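
A minimal sketch of the offline backward pass and a closed-loop rollout; the matrices and horizon are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: the offline backward pass for the discrete-time LQG gains
# L_t, followed by a closed-loop simulation (all values illustrative).
A = np.array([[1.0, 0.01], [0.0, 0.995]])   # e.g. a discretized damped mass
B = np.array([[0.0], [0.01]])
Q = np.eye(2); R = np.array([[0.1]]); Qf = 10.0 * np.eye(2)
N = 300                                      # number of time steps

V = Qf.copy()
L = [None] * N
for t in reversed(range(N)):
    L[t] = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)   # control gain L_t
    V = Q + A.T @ V @ (A - B @ L[t])                       # Riccati recursion

x = np.array([[1.0], [0.0]])                 # simulate the closed loop
for t in range(N):
    x = A @ x - B @ (L[t] @ x)               # u_t = -L_t x_t
print(x.ravel())                             # state should be driven near 0
```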

Optimal Feedback Control - LQG + Kalman Filter

LQG + Kalman Filter - Overview
- The controller generates the motor command $u_t$ and needs the current state estimate $\hat{x}_t$.
- The estimator compensates sensory delays by using the efference copy of $u_t$.
- Controller and estimator operate in a loop and can therefore generate motor commands even when sensory data become unavailable.
[Diagram: the controller (LQG) sends the motor command $u_t$ to the biomechanical plant (state $x_t$, process noise $\xi_t$) and an efference copy to the estimator (Kalman filter); the sensory apparatus returns sensory data $y_t$ (measurement noise $\omega_t$) to the estimator, which feeds the estimated state $\hat{x}_t$ back to the controller]

LQG + Kalman Filter - System model
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t$
- feedback: $y_t = H x_t + \omega_t$
- cost per step: $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $\xi_t$ process noise (Gaussian, zero mean, covariance $\Omega^\xi$), $\omega_t$ measurement noise (Gaussian, zero mean, covariance $\Omega^\omega$), and $H$ the observation matrix.

LQG + Kalman Filter - Controller/Estimator
Kalman filter:
- state estimate: $\hat{x}_{t+1} = A\hat{x}_t + B u_t + K_t (y_t - H\hat{x}_t)$
- filter gain: $K_t = A \Sigma_t H^T (H \Sigma_t H^T + \Omega^\omega)^{-1}$
- estimation error covariance: $\Sigma_{t+1} = \Omega^\xi + (A - K_t H)\, \Sigma_t A^T$
- $\hat{x}_0$ and $\Sigma_0$ are given.
Linear-quadratic regulator (LQR):
- control law: $u_t = -L_t \hat{x}_t$
- control gain: $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$
- Riccati equation: $V_t = Q_t + A^T V_{t+1}(A - B L_t)$
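
A minimal sketch of the complete loop: the LQR backward pass runs offline, then the forward simulation feeds the Kalman filter only the noisy measurements $y_t$ and the efference copy $u_t$. All matrices and noise levels are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the full LQG loop (all values illustrative).
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.01], [0.0, 0.995]])
B = np.array([[0.0], [0.01]])
H = np.array([[1.0, 0.0]])                   # we measure position only
Q = np.eye(2); R = np.array([[0.1]]); N = 300
Om_xi = 1e-6 * np.eye(2); Om_om = 1e-4 * np.eye(1)

V = 10.0 * np.eye(2); L = [None] * N         # backward pass for L_t
for t in reversed(range(N)):
    L[t] = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    V = Q + A.T @ V @ (A - B @ L[t])

x = np.array([[1.0], [0.0]]); xhat = np.zeros((2, 1)); Sig = np.eye(2)
for t in range(N):                           # forward pass: estimate + control
    u = -L[t] @ xhat                         # controller uses the estimate
    y = H @ x + rng.multivariate_normal(np.zeros(1), Om_om).reshape(1, 1)
    K = A @ Sig @ H.T @ np.linalg.inv(H @ Sig @ H.T + Om_om)
    xhat = A @ xhat + B @ u + K @ (y - H @ xhat)
    Sig = Om_xi + (A - K @ H) @ Sig @ A.T
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Om_xi).reshape(2, 1)
print(np.hstack([x, xhat]))                  # true state vs. estimate
```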

Optimal Feedback Control - LQG Multiplicative Noise

LQG Multiplicative Noise - Overview
Motivation:
- Fitts's law: faster movements are less accurate, which suggests that noise is control-dependent.
- The standard deviation of muscle force is well approximated by a linear function of the mean force.
- No explicit smoothness cost formulation is necessary to achieve smooth trajectories.

LQG Multiplicative Noise - Definition
System model:
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t + \sum_{i=1}^{c} \varepsilon_t^i C_i u_t$
- feedback: $y_t = H x_t + \omega_t + \sum_{i=1}^{d} \epsilon_t^i D_i x_t$
- cost per step: $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $C_i$ scaling matrices for control-dependent system noise, $\varepsilon_t^i$ the $i$-th control-dependent noise component (Gaussian, zero mean, covariance $\Omega^\varepsilon = I$), $D_i$ scaling matrices for state-dependent observation noise, and $\epsilon_t^i$ the $i$-th state-dependent noise component (Gaussian, zero mean, covariance $\Omega^\epsilon = I$).

LQG Multiplicative Noise - Controller/Estimator
Estimator:
$$\hat{x}_{t+1} = (A - B L_t)\,\hat{x}_t + K_t (y_t - H\hat{x}_t) + \eta_t$$
Controller:
$$u_t = -L_t \hat{x}_t$$
Properties:
- internal noise $\eta_t$ is also considered
- the independence of estimation and control no longer holds
- $K_t$ and $L_t$ are calculated offline; for the equations see [4]

LQG Multiplicative Noise - Algorithm
Algorithm to calculate $K_t$ and $L_t$ (a structural sketch follows below):
1. initialize the filter gains $K_1, \dots, K_{n-1}$ with zero or with the Kalman filter gain
2. calculate the control gains $L_t$ backward in time
3. calculate the filter gains $K_t$ in a forward pass through time
4. repeat 2. and 3. until convergence
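
A structural sketch of this alternation only: the actual multiplicative-noise-aware updates couple $K_t$ and $L_t$ and are given in [4]. The gain formulas below are the standard LQG ones, standing in as placeholders, so with them the sweeps converge immediately; with the coupled updates of [4] the repetition is essential.

```python
import numpy as np

# Structural sketch only of steps 1-4 above. The gain formulas here are the
# *standard* LQG ones used as placeholders; the real algorithm replaces them
# with the coupled multiplicative-noise updates from [4].
def alternate_gains(A, B, H, Q, R, Om_xi, Om_om, Sig0, N, sweeps=10):
    K = [np.zeros((A.shape[0], H.shape[0]))] * N     # step 1: init filter gains
    L = [None] * N
    for _ in range(sweeps):                          # step 4: repeat to converge
        V = Q.copy()                                 # step 2: backward pass for L_t
        for t in reversed(range(N)):                 # (V(t_f) = Q as placeholder)
            L[t] = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
            V = Q + A.T @ V @ (A - B @ L[t])
        Sig = Sig0.copy()                            # step 3: forward pass for K_t
        for t in range(N):
            K[t] = A @ Sig @ H.T @ np.linalg.inv(H @ Sig @ H.T + Om_om)
            Sig = Om_xi + (A - K[t] @ H) @ Sig @ A.T
    return K, L

K, L = alternate_gains(A=np.array([[1.0, 0.01], [0.0, 0.995]]),
                       B=np.array([[0.0], [0.01]]), H=np.array([[1.0, 0.0]]),
                       Q=np.eye(2), R=np.array([[0.1]]), Om_xi=1e-6 * np.eye(2),
                       Om_om=1e-4 * np.eye(1), Sig0=np.eye(2), N=100)
print(L[0], K[0])
```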

LQG Multiplicative Noise - Example (1)
Task:
- 1-D positioning of a point mass from start position $p(0) = 0$ to target position $p^*$
- time step $\Delta$; duration $t_{end} = 0.3\,s$
- minimal energy consumption
Dynamics (mechanics):
$$p(t + \Delta) = p(t) + \dot{p}(t)\,\Delta, \qquad \dot{p}(t + \Delta) = \dot{p}(t) + f(t)\,\Delta/m$$
Dynamics (muscle, with time constants $\tau_1$, $\tau_2$):
$$f(t + \Delta) = f(t)\,(1 - \Delta/\tau_2) + g(t)\,\Delta/\tau_2$$
$$g(t + \Delta) = g(t)\,(1 - \Delta/\tau_1) + u(t)\,(1 + \sigma_c \varepsilon_t)\,\Delta/\tau_1$$
[Diagram: biomechanical plant; the command $u(t)$ passes through a muscle-like low-pass filter to produce the force $f(t)$ acting on the mass $m = 1\,kg$ at position $p(t)$, with target $p^*$]

LQG Multiplicative Noise - Example (2)
Dynamics, matrix formulation:
$$x_{t+1} = A x_t + B u_t + \varepsilon_t C_1 u_t, \qquad x_t = \begin{bmatrix} p(t) & \dot{p}(t) & f(t) & g(t) & p^* \end{bmatrix}^T$$
$$A = \begin{bmatrix} 1 & \Delta & 0 & 0 & 0 \\ 0 & 1 & \Delta/m & 0 & 0 \\ 0 & 0 & 1 - \Delta/\tau_2 & \Delta/\tau_2 & 0 \\ 0 & 0 & 0 & 1 - \Delta/\tau_1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \Delta/\tau_1 \\ 0 \end{bmatrix}, \qquad C_1 = B \sigma_c$$
Feedback, matrix formulation:
$$y_t = H x_t + \omega_t, \qquad H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$
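
A minimal numpy sketch that builds this augmented system and rolls it out with control-dependent noise; $\Delta$, $\tau_1$, $\tau_2$, $\sigma_c$ and the open-loop command are illustrative assumptions ($m = 1\,kg$ and $t_{end} = 0.3\,s$ are from the slides).

```python
import numpy as np

# Minimal sketch of the augmented point-mass system above; Delta, tau1, tau2,
# sigma_c and the constant command are illustrative assumptions.
rng = np.random.default_rng(2)
m, Delta, tau1, tau2, sigma_c = 1.0, 0.01, 0.04, 0.04, 0.5
p_star, n_steps = 0.1, int(0.3 / Delta)

A = np.array([[1, Delta, 0,                0,                0],
              [0, 1,     Delta / m,        0,                0],
              [0, 0,     1 - Delta / tau2, Delta / tau2,     0],
              [0, 0,     0,                1 - Delta / tau1, 0],
              [0, 0,     0,                0,                1]], dtype=float)
B = np.array([[0], [0], [0], [Delta / tau1], [0]])
C1 = B * sigma_c                       # control-dependent noise scaling
H = np.eye(3, 5)                       # observe p, p-dot, f

x = np.array([[0], [0], [0], [0], [p_star]], dtype=float)
for t in range(n_steps):
    u = np.array([[1.0]])              # placeholder open-loop command
    eps = rng.standard_normal()        # one control-dependent noise component
    x = A @ x + B @ u + eps * (C1 @ u) # x_{t+1} = A x + B u + eps C1 u
print(x.ravel())
```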

LQG Multiplicative Noise - Example (3)
Total cost:
$$\underbrace{(p(t_{end}) - p^*)^2}_{(1)} + \underbrace{(w_v\,\dot{p}(t_{end}))^2}_{(2)} + \underbrace{(w_f\, f(t_{end}))^2}_{(3)} + \underbrace{\frac{r}{n-1} \sum_{k=1}^{n-1} u^2(k\Delta)}_{(4)}$$
- (1) penalizes deviations from the target position
- (2)+(3) enforce that the movement must be finished at $t_{end}$
- (4) ensures energy minimization
- $w_v$, $w_f$ and $r$ are the corresponding weights

LQG Multiplicative Noise - Example (4)
Cost per step, matrix formulation:
- We define $p = \begin{bmatrix} 1 & 0 & 0 & 0 & -1 \end{bmatrix}^T$ and can write $p(t_{end}) - p^* = p^T x_t$; term (1) can therefore be expressed as $x_t^T (p\, p^T)\, x_t$.
- For terms (2) and (3) we use $v = \begin{bmatrix} 0 & w_v & 0 & 0 & 0 \end{bmatrix}^T$ and $f = \begin{bmatrix} 0 & 0 & w_f & 0 & 0 \end{bmatrix}^T$.
This leads to $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$ with $Q_{1,\dots,n-1} = 0$, $Q_n = p\,p^T + v\,v^T + f\,f^T$ and $R = r$.

LQG Multiplicative Noise - Example (5)
Resulting trajectories:
- smooth trajectories without modeling smoothness in the costs
- the system can be unstable, but this was not encountered in the problems the author deals with

Optimal Feedback Control - Minimal Intervention Principle

Minimal Intervention Principle - Definition
Definition: ignore task-irrelevant deviations.
Simple example:
- $x_1$, $x_2$ are uncoupled state variables
- the states are driven by the controls $u_1$, $u_2$
- the noise is control-multiplicative
- the initial state is sampled from a circular Gaussian

Minimal Intervention Principle - Example 1
Task: achieve $x_1 + x_2 = \text{target}$ using small $u_1$, $u_2$.
Optimum:
- $u_1 = u_2$
- the control law depends on $x_1 + x_2$
- $u_1$, $u_2$ form a motor synergy
Result: the black ellipse shows the distribution of final states.

Minimal Intervention Principle - Example 2
Alternative control law: $x_1 = x_2 = \text{target}/2$.
Results:
- the gray circle shows the distribution of final states
- the variance in the redundant direction is reduced
- the variance in the task-relevant direction is increased
- the control signals are increased (hence not optimal)

Optimal Feedback Control - Hierarchical Optimal Controller

Hierarchical Optimal Controller - Overview
Principle:
- the low-level controller generates an abstract representation $y(x)$ of the state $x$
- the high-level controller generates commands $v(y)$ to change $y$
- the low-level controller computes energy-efficient controls $u(v, x)$ consistent with $v$
Comparison with example 1 of the minimal intervention principle:
$$y = x_1 + x_2, \qquad v = f(y), \qquad u = \begin{bmatrix} v & v \end{bmatrix}^T$$

Optimal Feedback Control - Conclusion

Conclusion - Summary
We talked about...
- Markov Decision Processes (MDP) and cost-to-go formulations
- continuous-time stochastic systems
- the Linear Quadratic Gaussian framework (LQG)
- LQG + Kalman filter
- LQG with multiplicative noise
- the minimal intervention principle
- hierarchical optimal controllers


References
[1] Konrad P. Körding, Daniel M. Wolpert, "Bayesian decision theory in sensorimotor control," Trends in Cognitive Sciences, vol. 10, no. 7, pp. 319-326, July 2006, doi:10.1016/j.tics.2006.05.003.
[2] Konrad P. Körding, Daniel M. Wolpert, "Bayesian integration in sensorimotor learning," Nature, vol. 427, pp. 244-247, January 2004.
[3] Emanuel Todorov, "Optimality principles in sensorimotor control," Nature Neuroscience, vol. 7, no. 9, pp. 907-915, September 2004, doi:10.1038/nn1309.
[4] Emanuel Todorov, "Stochastic Optimal Control and Estimation Methods Adapted to the Noise Characteristics of the Sensorimotor System," Neural Computation, vol. 17, no. 5, pp. 1084-1108, May 2005, doi:10.1162/0899766053491887.
[5] Kenji Doya et al., Bayesian Brain, chapter "Optimal Control Theory," MIT Press, 2006.
[6] Emanuel Todorov, lecture "Intelligent control through learning and optimization," http://www.cs.washington.edu/homes/todorov/courses/amath579/, accessed 14 May 2012.