Bayesian Decision Theory in Sensorimotor Control


Bayesian Decision Theory in Sensorimotor Control
Matthias Freiberger, Martin Öttl
Signal Processing and Speech Communication Laboratory, TU Graz - Advanced Signal Processing

Outline
- Introduction: definition, challenges of sensorimotor control, Bayesian integration in motor control, cost functions
- Optimal Estimation
- Optimal Feedback Control: introduction, Linear Quadratic Gaussian framework (LQG), LQG + Kalman filter, LQG with multiplicative noise, minimal intervention principle, hierarchical optimal controller
- Conclusion
- References

Intro - What is sensorimotor control?
sen·so·ri·mo·tor (adj.): of, relating to, or involving both sensory and motor activity: sensorimotor nerve centers; sensorimotor pathways. (The American Heritage Dictionary of the English Language, Fourth Edition)
Movement is the only way for humans to interact with the world. All communication, including speech, sign language, gestures and writing, is mediated by the motor system. (http://www.pom.cam.ac.uk/research/sensorimotor.html)

Intro - What is sensorimotor control?
We want to understand and describe, by applying methods from computer science and control theory, how:
- human beings are able to return a tennis ball,
- or grab a bottle of water and drink,
- birds of prey are capable of catching a mouse in flight,
- basically, how any kind of physical interaction with the environment is performed by biological systems, pursuing a certain objective while permanently performing corrections based on sensory input.

Intro - Challenges
- Action selection is a fundamental decision process.
- The CNS constantly sends motor commands to the muscles.
- At each point in time, the appropriate motor command needs to be selected.
- Knowledge about the environment needs to be combined with actual observation data and knowledge about the cost/reward of currently possible actions to make optimal decisions.

Intro - Schematic Control Flow
[Figure: schematic control flow of the sensorimotor loop]

Intro - Uncertainty of the human sensorium
- The human sensorium is plagued by noise.
- Muscle output is noisy as well.
- Therefore, the state of the environment/body needs to be estimated.
- Additionally, the cost of each movement shall be minimized.
- Bayesian statistics comes in as a powerful way to deal with the uncertainty of the human sensorium.

Intro - Bayesian integration
- The CNS needs to integrate prior knowledge about the environment with knowledge obtained from sensory data to estimate the state of the environment optimally.
- Example: when estimating the bounce location of a tennis ball, the ball might be more likely to bounce at the edges of the court.

Intro - Bayesian Cue Combination
- Combination of sensor signals for better estimates.
- Combination of different sensor modalities (e.g. vision and proprioception).
- Combination of signals of the same modality (several visual cues into a stereo image, ...).
- Cues need to be weighted against each other.

Intro - Bayesian Cue Combination
Given a set of observations from different cues $d_1, d_2, d_3, \dots, d_n$, and under the assumption that the cues are conditionally independent given the state $s$, we can rewrite the likelihood $P(d_1, d_2, d_3, \dots, d_n \mid s)$ as
$$P(d_1, d_2, d_3, \dots, d_n \mid s) = \prod_{k=1}^{n} P(d_k \mid s) \qquad (1)$$
Therefore we can rewrite the corresponding posterior probability:
$$P(s \mid d_1, d_2, d_3, \dots, d_n) = \frac{P(s) \prod_{k=1}^{n} P(d_k \mid s)}{P(d_1, d_2, d_3, \dots, d_n)} \qquad (2)$$
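
To make eq. (2) concrete, here is a minimal Python sketch (not from the slides; all numbers are illustrative assumptions) for the classic Gaussian special case: with Gaussian cue likelihoods and a Gaussian prior, the posterior over $s$ is again Gaussian, and each cue is weighted by its inverse variance, which is exactly the weighting the previous slide refers to.

```python
import numpy as np

# Minimal sketch (not from the slides): fusing independent Gaussian cues.
# Each cue k reports d_k = s + noise, noise ~ N(0, var_k); prior s ~ N(mu0, var0).
# The posterior is Gaussian: precision = sum of precisions,
# mean = precision-weighted average of prior mean and cue values.

def fuse_gaussian_cues(mu0, var0, cues):
    """cues: list of (d_k, var_k) pairs. Returns posterior (mean, variance)."""
    precision = 1.0 / var0
    weighted = mu0 / var0
    for d_k, var_k in cues:
        precision += 1.0 / var_k
        weighted += d_k / var_k
    return weighted / precision, 1.0 / precision

# Example: vision is more reliable than proprioception, so it gets more weight.
mean, var = fuse_gaussian_cues(mu0=0.0, var0=4.0,
                               cues=[(1.0, 0.25),   # visual cue
                                     (2.0, 1.0)])   # proprioceptive cue
print(mean, var)
```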

Intro - Cost Functions
- Model how good or bad the outcome of a particular movement is.
- It seems reasonable to minimize consumed energy and strain on the muscles.
- Several cost functions have been proposed (smoothness, precision).
- The CNS also adapts very well to external cost functions.

Intro - Cost Functions
- The actual cost function of human movement can be inferred using indifference lines.
- A utility function can be found from these lines: compare points across lines and assign utilities to the lines.


Intro - Cost Functions
Given a set of possible actions $X$ and a set of possible outcomes $O$, as well as a utility function $U : O \to \mathbb{R}$, for any $x \in X$ we can compute the expected utility
$$E\{U \mid x\} = \sum_{o \in O} P(o \mid x)\, U(o) \qquad (3)$$
Therefore, the optimal decision with respect to the utility function $U(o)$ is considered to be the one that maximizes the expected utility $E\{U \mid x\}$.
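
A minimal sketch of eq. (3) and the resulting decision rule; the action set, outcome probabilities, and utilities below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: choosing the action that maximizes expected utility, eq. (3).
# Rows of P index actions x, columns index outcomes o.
P = np.array([[0.7, 0.2, 0.1],    # P(o | x) for action x1
              [0.1, 0.6, 0.3]])   # P(o | x) for action x2
U = np.array([10.0, 2.0, -5.0])   # utility U(o) of each outcome

expected_utility = P @ U          # E{U | x} for every action at once
best_action = int(np.argmax(expected_utility))
print(expected_utility, best_action)
```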


Optimal Estimation - Intro
Until now: find the optimal action among a finite set of actions. But the world is continuous...
- The actual continuous state of our body parts has to be estimated permanently, and optimal actions according to the state estimate need to be found.
In control terms: we need to build ourselves an observer, which estimates the inner state (e.g. the position and velocity) of our limbs.

Optimal Estimation - Experiment
Experiment setup:
- Test subjects had to estimate the location of their thumb after moving their arm.
- A resistive or assistive force was added by torque motors.
- The hand was constrained to move on a straight line.
- The arm was illuminated for 2 s to give an initial state.
- After that, participants had to rely solely on proprioception.

Optimal Estimation - Experiment
[Figure: experiment setup]

Optimal Estimation - Models
- A system that mimics the behavior of a natural process is called an internal model.
- Internal models are an important concept in motor control.
- Basically, two classes of internal models can be distinguished: forward models and backward models.

Optimal Estimation - Internal models: forward vs. backward
Forward models:
- mimic the causal flow of a process by predicting its next state;
- come up naturally, since delays in most sensorimotor loops are large and feedback control may be too slow for rapid movements;
- are a key ingredient in systems that use motor outflow (efference copy).
Backward models:
- estimate the appropriate motor command that caused a particular state transition.

Optimal Estimation - Internal models: forward vs. backward
How do we optimally model our limbs now?
- Wolpert et al. used a forward model incorporating a correction term for the given problem.
- State estimation for a system containing noise is a complex task.
- We will follow an intuitive approach by modeling an observer for a deterministic system first.
- From our deterministic observer, we will perform the transition to a probabilistic observer (Kalman filter).

Optimal Estimation - The Plant
Model the arm as a damped mass system.
State model:
$$\dot{x} = Ax + bu \quad \text{(state update equation)}$$
$$y = c^T x + du \quad \text{(model for sensory output)}$$
State variables:
- $x_1$: position of the mass (hand)
- $x_2$: velocity of the mass (hand)
- $u(t)$: applied force
- $y(t)$: sensory output

Optimal Estimation - The Plant
Model parameters:
$$A = \begin{pmatrix} 0 & 1 \\ 0 & -\frac{\beta}{m} \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ \frac{1}{m} \end{pmatrix}, \qquad c = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad d = 0$$
where $m$ is the mass of the hand and $\beta$ is the damping parameter.
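
A minimal numpy sketch of this plant; the values of $m$ and $\beta$ are illustrative assumptions. The same matrices are reused in the observer sketches below.

```python
import numpy as np

# Minimal sketch of the damped-mass plant above; m and beta are illustrative
# assumptions, not taken from the slides.
m, beta = 1.0, 0.5
A = np.array([[0.0, 1.0],
              [0.0, -beta / m]])   # state matrix of the damped mass
b = np.array([[0.0],
              [1.0 / m]])          # force enters the velocity equation
c = np.array([[1.0, 0.0]])         # we sense position only
d = 0.0

def plant_deriv(x, u):
    """xdot = A x + b u for the damped mass (x = [position, velocity])."""
    return A @ x + b.flatten() * u

print(plant_deriv(np.array([0.0, 1.0]), u=0.0))  # coasting: xdot = [1, -0.5]
```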

Optimal Estimation - Luenberger Observer
Observer model: the plant $\dot{x} = Ax + bu$ with input $u(t)$ and output $y(t)$ is mirrored by an observer.
Ansatz for the Luenberger observer:
$$\dot{\hat{x}} = \hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y \qquad (4)$$

Optimal Estimation - Luenberger Observer
Derivation. Error constraint:
$$e(t) = x(t) - \hat{x}(t), \qquad \lim_{t \to \infty} e(t) = 0$$
$$\dot{e} = \dot{x} - \dot{\hat{x}} = (Ax + bu) - (\hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y)$$
Set $y = c^T x$ and rearrange the equation:
$$\dot{e} = (A - \hat{b}_2 c^T)\,x - \hat{A}\hat{x} + (b - \hat{b}_1)\,u$$

Optimal Estimation - Luenberger Observer
$$\dot{e} = (A - \hat{b}_2 c^T)\,x - \hat{A}\hat{x} + (b - \hat{b}_1)\,u$$
The error shall be independent of the input, so set $\hat{b}_1 = b$:
$$\dot{e} = (A - \hat{b}_2 c^T)\,x - \hat{A}\hat{x}$$
Choose $\hat{A} = A - \hat{b}_2 c^T$ and obtain for the error
$$\dot{e} = (A - \hat{b}_2 c^T)\,e$$
Final model:
$$\dot{\hat{x}} = (A - \hat{b}_2 c^T)\,\hat{x} + bu + \hat{b}_2 y$$

Optimal Estimation - Luenberger Observer
$$\dot{\hat{x}} = (A - \hat{b}_2 c^T)\,\hat{x} + \hat{b}_1 u + \hat{b}_2 y$$
Rewrite $\hat{b}_2 = \hat{b}$ and $c^T \hat{x} = \hat{y}$:
$$\dot{\hat{x}} = A\hat{x} - \hat{b}\hat{y} + \hat{b}y + bu$$
Collect terms:
$$\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$$

Optimal Estimation - Luenberger Observer
Where are our models now?
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{\hat{b}(y - \hat{y})}_{\text{sensory correction}}$$
- The forward model takes the actual state estimate and tries to predict the further evolution of the state.
- The difference between the actual sensory feedback $y$ and the prediction $\hat{y}$, weighted by $\hat{b}$, is used to update the state estimate.
How to choose $\hat{b}$? For deterministic systems: choose $\hat{b}$ such that $(A - \hat{b}c^T)$ is asymptotically stable.
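
A minimal simulation sketch of this observer for the damped-mass plant; the gain $\hat{b}$ and all constants are illustrative assumptions, with $\hat{b}$ chosen so that $A - \hat{b}c^T$ is asymptotically stable.

```python
import numpy as np

# Minimal sketch: simulate the damped-mass plant and the Luenberger observer
# side by side with Euler steps and watch the error e = x - xhat decay.
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
c = np.array([1.0, 0.0])
bhat = np.array([2.0, 1.0])       # chosen so A - bhat c^T is stable

x = np.array([1.0, 0.0])          # true state: starts at position 1
xhat = np.zeros(2)                # observer starts with no knowledge

for k in range(5000):
    u = np.sin(0.002 * k)         # arbitrary input signal
    y = c @ x                     # measured output (position)
    yhat = c @ xhat
    x = x + dt * (A @ x + b * u)
    xhat = xhat + dt * (A @ xhat + b * u + bhat * (y - yhat))

print(np.linalg.norm(x - xhat))   # should be near zero
```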

Optimal Estimation - Probabilistic Observer
The real world can be mean and difficult: noise is everywhere. Circuits are plagued by noise, and so are radio transmissions and even our body.
[Diagram: plant $\dot{x} = Ax + bu$ with input $u(t)$ and output $y(t)$, observed by $\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$]

Optimal Estimation - Probabilistic Observer
Now add noise to the plant:
[Diagram: plant $\dot{x} = Ax + bu + w$ with input $u(t)$, noise $v(t)$ on the output $y(t)$ - which observer do we use now?]

Optimal Estimation - Probabilistic Observer
Stochastic model:
$$\dot{x} = Ax + bu + w \quad \text{(state update equation)}$$
$$y = Cx + v \quad \text{(model for sensory output)}$$
with $w(t)$ motor noise and $v(t)$ sensory noise.
- $w$ and $v$ are random variables.
- Therefore, the state vector $x$ is a vector of RVs as well.
- This means that we need a Bayesian estimator to estimate the mean $\hat{x}$ and covariance matrix $P$ of the state.

Optimal Estimation - Probabilistic Observer
Some simplifications: we assume that the noise is additive white Gaussian noise, uncorrelated with the initial state $x_0$:
$$w(t) \sim \mathcal{N}(0, Q_c), \qquad v(t) \sim \mathcal{N}(0, R_c)$$
It can be shown that in this case the minimum-variance estimator is the Kalman filter.

Optimal Estimation - Kalman Filter
Model for a Kalman filter:
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{K_t(y - \hat{y})}_{\text{sensory correction}} \qquad (5)$$
Computation of $K$ and $P$:
$$K_t = P_t C^T R_c^{-1} \quad \text{(Kalman gain matrix)}$$
$$\dot{P}_t = A P_t + P_t A^T + Q_c - K_t R_c K_t^T \quad \text{(update rule for } P\text{)}$$
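
A minimal Euler-integration sketch of eq. (5) together with the gain and covariance updates, again for the damped-mass plant; noise covariances, step size and duration are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: Euler-integrate the filter equations above. The gain K_t
# is recomputed from P_t at every step; all constants are illustrative.
rng = np.random.default_rng(0)
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
C = np.array([[1.0, 0.0]])            # we observe position
Qc = 0.01 * np.eye(2)                 # motor (process) noise covariance
Rc = np.array([[0.05]])               # sensory noise covariance

x = np.array([1.0, 0.0]); xhat = np.zeros(2); P = np.eye(2)
for k in range(5000):
    u = np.sin(0.002 * k)
    w = rng.multivariate_normal(np.zeros(2), Qc)
    v = rng.multivariate_normal(np.zeros(1), Rc)
    y = C @ x + v
    K = P @ C.T @ np.linalg.inv(Rc)   # K_t = P_t C^T R_c^{-1}
    # dP/dt = A P + P A^T + Q_c - K R_c K^T
    P = P + dt * (A @ P + P @ A.T + Qc - K @ Rc @ K.T)
    x = x + dt * (A @ x + b * u) + np.sqrt(dt) * w
    xhat = xhat + dt * (A @ xhat + b * u + K @ (y - C @ xhat))
print(np.abs(x - xhat))
```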

Optimal Estimation - Experiment results
[Figure: experimental results of the test subjects (GAM)]

Optimal Estimation - Experiment results
[Figure: Kalman filter predictions]

Optimal Estimation - Experiment results
[Figure: test subjects (GAM) vs. Kalman filter]

Optimal Estimation - Experiment conclusions
- The curves are quite similar.
- The noticeable peak at 1 s seems to reflect a tradeoff between the forward model and the backward model.
- The variance jitters in the experiment for changing forces; no force-dependent change in variance is predicted by the Kalman filter.
- The experiment provides support for the use of forward models with sensory correction.


Optimal Feedback Control - Introduction

Intro - Markov Decision Process (MDP)
Some notation:
- $x \in X$: state of the Markov process
- $u \in U(x)$: action / control in state $x$
- $p(x' \mid x, u)$: control-dependent transition probability distribution
- $l(x, u) \geq 0$: immediate cost for choosing control $u$ in state $x$
[Figure: shortest-path problem on a graph; edges carry immediate costs, nodes carry cumulative costs-to-go, decreasing to 0 at the target]

Intro - MDP - First exit formulation (1)
Goal: find for each state a control law / policy $u = \pi(x) \in U(x)$ which moves the trajectory towards a terminal state $x_T$.
- Each trajectory should incur the lowest total cost $v^\pi(x)$.
- $v^\pi(x)$ is also called the cost-to-go.
- The cost at a terminal state is $v^\pi(x) = q_T(x)$.

Intro - MDP - First exit formulation (2)
Cost-to-go as a path sum:
$$v^\pi(x) = E\left\{ q_T(x_{t_{\text{first}}}) + \sum_{k=0}^{t_{\text{first}}-1} l(x_k, \pi_k(x_k)) \;\middle|\; x_0 = x,\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v^\pi(x) = H[x, \pi(x), v^\pi(\cdot)]$
- optimal cost-to-go: $v^*(x) = \min_{u \in U(x)} H[x, u, v^*(\cdot)]$
- optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in U(x)} H[x, u, v^*(\cdot)]$

Intro - MDP - Finite horizon formulation
All trajectories end at $t = N$.
Cost-to-go as a path sum:
$$v_t^\pi(x) = E\left\{ q_T(x_N) + \sum_{k=t}^{N-1} l(x_k, \pi_k(x_k)) \;\middle|\; x_t = x,\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v_t^\pi(x) = H[x, \pi_t(x), v_{t+1}^\pi(\cdot)]$
- optimal cost-to-go: $v_t^*(x) = \min_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$
- optimal policy: $\pi_t^*(x) = \operatorname{argmin}_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$

Intro - MDP - Infinite horizon discounted cost formulation
Trajectories continue forever; future costs are exponentially discounted with $\alpha < 1$ to ensure a finite cost-to-go.
Cost-to-go as a path sum:
$$v^\pi(x) = E\left\{ \sum_{k=0}^{\infty} \alpha^k\, l(x_k, \pi(x_k)) \;\middle|\; x_0 = x,\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H_\alpha[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + \alpha\, E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v^\pi(x) = H_\alpha[x, \pi(x), v^\pi(\cdot)]$
- optimal cost-to-go: $v^*(x) = \min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$
- optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$

Intro - MDP - Infinite horizon average cost formulation
Trajectories continue forever; there is no discounting, and therefore the resulting cost-to-go is infinite.
Average cost-to-go:
$$c^\pi = \lim_{N \to \infty} \frac{1}{N} v_0^{\pi,N}(x)$$
Differential cost-to-go:
$$\tilde{v}^\pi(x) = v_0^{\pi,N}(x) - N c^\pi$$
Bellman equations:
- policy-specific cost-to-go: $c^\pi + \tilde{v}^\pi(x) = H[x, \pi(x), \tilde{v}^\pi(\cdot)]$
- optimal cost-to-go: $c^* + \tilde{v}^*(x) = \min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$
- optimal policy: $\pi^*(x) = \operatorname{argmin}_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$

Intro - MDP - Solution
Algorithms for calculating an optimal cost-to-go (a value-iteration sketch follows below):
- value iteration
- policy iteration
- linear programming
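
A minimal value-iteration sketch for the discounted formulation; the toy MDP (4 states, 2 actions, random transition table) is an illustrative assumption.

```python
import numpy as np

# Minimal value-iteration sketch on a toy 4-state, 2-action MDP; the
# transition table P and cost l are illustrative, not from the slides.
# Bellman backup: v*(x) = min_u [ l(x,u) + alpha * E{v*(x')} ].
n_states, n_actions, alpha = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x,u,x']
l = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # l(x,u)

v = np.zeros(n_states)
for _ in range(1000):
    q = l + alpha * P @ v          # Hamiltonian H[x,u,v] for all (x,u)
    v_new = q.min(axis=1)          # Bellman backup
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
policy = q.argmin(axis=1)          # optimal policy
print(v, policy)
```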

Intro - Continuous-time stochastic system (1)
System dynamics:
$$dx = f(x, u)\, dt + G(x, u)\, d\xi$$
- $x(t) \in \mathbb{R}^n$: state vector
- $u(t) \in \mathbb{R}^m$: control vector
- $\xi(t) \in \mathbb{R}^k$: Brownian motion (integral of white noise)
Interpretation:
$$x(t) - x(0) = \int_0^t f(x(s), u(s))\, ds + \int_0^t G(x(s), u(s))\, d\xi(s)$$
The last integral is an Ito integral.

Intro - Continuous-time stochastic system (2)
Ito integral: for a square-integrable function $g(t)$, the Ito integral is defined as
$$\int_0^t g(s)\, d\xi(s) = \lim_{n \to \infty} \sum_{k=0}^{n-1} g(s_k)\,\big(\xi(s_{k+1}) - \xi(s_k)\big)$$
with $0 = s_0 < s_1 < \dots < s_n = t$.

Intro - Continuous-time stochastic system (3)
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\text{def}}{=} l(x, u) + f(x, u)^T v_x(x) + \tfrac{1}{2} \operatorname{tr}\!\left(\Sigma(x, u)\, v_{xx}(x)\right)$$
where $\Sigma(x, u) = G(x, u)\, G(x, u)^T$ is the noise covariance.
Hamilton-Jacobi-Bellman (HJB) equations for the optimal cost-to-go:
- first exit: $0 = \min_u H[x, u, v^*(\cdot)]$ with $v^*(x_T) = q_T(x)$
- finite horizon: $-v_t^*(x, t) = \min_u H[x, u, v^*(\cdot, t)]$ with $v^*(x, T) = q_T(x)$
- discounted: $\tfrac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)]$, where the discounted cost-to-go is $v^\pi(x) = E\left\{ \int_0^\infty \exp(-t/\tau)\, l(x(t), u(t))\, dt \right\}$
- average: $c^* = \min_u H[x, u, \tilde{v}^*(\cdot)]$

Intro - Inverse pendulum example (1)
Task: find the optimal control law for the inverse pendulum
$$\ddot{\theta} = k \sin(\theta) + u$$
with force of gravity $k$, angle $\theta$, torque $u$.
- state-dependent cost: $q(\theta) = 1 - \exp(-2\theta^2)$
- control-dependent cost: $\tfrac{r}{2} u^2$
- overall cost per step: $l(x, u) = q(\theta) + \tfrac{r}{2} u^2$
[Figure: pendulum mechanics and the state-dependent cost over the $(x_1 = \theta,\ x_2 = \dot{\theta})$ plane]

Intro - Inverse pendulum example (2)
Stochastic dynamics ($\ddot{\theta} = k \sin(\theta) + u$):
$$dx = (a(x) + Bu)\, dt + G\, d\xi$$
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}, \qquad a(x) = \begin{bmatrix} x_2 \\ k \sin(x_1) \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad G = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}$$
Discounted HJB equation (from above):
$$\tfrac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)] = \min_u \left[ l(x, u) + f(x, u)^T v_x^*(x) + \tfrac{1}{2} \operatorname{tr}\!\left(\Sigma(x, u)\, v_{xx}^*(x)\right) \right]$$
HJB for the inverse pendulum:
$$\tfrac{1}{\tau} v^*(x) = \min_u \left[ q(x) + \tfrac{r}{2} u^2 + (a(x) + Bu)^T v_x^*(x) + \tfrac{1}{2} \operatorname{tr}\!\left(G G^T v_{xx}^*(x)\right) \right]$$

Intro - Inverse pendulum example (3)
The optimal control law minimizes the Hamiltonian: differentiate the Hamiltonian with respect to $u$ and set it to zero:
$$r u + B^T v_x^*(x) = 0 \quad \Rightarrow \quad u = -\tfrac{1}{r} B^T v_x^*(x) = -\tfrac{1}{r}\, v_{x_2}^*(x)$$
Remarks:
- $v_x$ is also called the costate vector.
- The optimal control law depends on the multiplication of a matrix containing the system dynamics and energy costs with the costate vector.

Intro - Inverse pendulum example (4)
Calculation of the costate vector: insert the optimal control law into the HJB equation:
$$\tfrac{1}{\tau} v^*(x) = q(x) - \tfrac{1}{2r}\, (v_{x_2}^*)^2 + x_2\, v_{x_1}^* + k \sin(x_1)\, v_{x_2}^* + \tfrac{\sigma^2}{2}\, v_{x_2 x_2}^*$$
- construct an MDP (discretize the state space, approximate derivatives with finite differences, ...)
- solve the MDP
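
A minimal discretize-and-solve sketch of exactly these two steps; grid resolution, $k$, $r$, $\tau$ and the time step are illustrative assumptions, and the noise term is dropped ($\sigma \to 0$) for brevity.

```python
import numpy as np

# Minimal discretize-and-solve sketch for the pendulum (all constants are
# illustrative). We Euler-discretize the dynamics, treat l(x,u)*dt as step
# cost, and discount by exp(-dt/tau), then run value iteration on the grid.
k, r, tau, dt = 1.0, 0.1, 1.0, 0.05
thetas = np.linspace(-np.pi, np.pi, 51)      # grid over x1 = theta
omegas = np.linspace(-8.0, 8.0, 51)          # grid over x2 = theta-dot
controls = np.linspace(-8.0, 8.0, 21)
gamma = np.exp(-dt / tau)

TH, OM = np.meshgrid(thetas, omegas, indexing="ij")
v = np.zeros_like(TH)

def grid_index(grid, values):
    """Snap continuous successor states (approximately) onto the grid."""
    return np.clip(np.searchsorted(grid, values), 0, len(grid) - 1)

for _ in range(300):
    best = np.full_like(v, np.inf)
    for u in controls:
        th_next = TH + dt * OM                       # Euler step of x1
        om_next = OM + dt * (k * np.sin(TH) + u)     # Euler step of x2
        th_next = (th_next + np.pi) % (2 * np.pi) - np.pi  # wrap the angle
        i = grid_index(thetas, th_next)
        j = grid_index(omegas, om_next)
        cost = (1 - np.exp(-2 * TH**2) + 0.5 * r * u**2) * dt
        best = np.minimum(best, cost + gamma * v[i, j])
    v = best

print(v.min(), v.max())   # cost-to-go is lowest near the upright state
```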

Intro - Inverse pendulum example (5)
[Figure: three panels over the $(x_1 = \theta,\ x_2 = \dot{\theta})$ plane: the state-dependent cost $q(x)$, the optimal cost-to-go $v^*(x)$, and the optimal policy $u = -\tfrac{1}{r}\, v_{x_2}^*(x)$]

Optimal Feedback Control - Linear Quadratic Gaussian Framework (LQG)

LQG - Linear Quadratic Gaussian framework
In most cases an optimal control law can't be obtained in closed form; one exception is the LQG system.
LQG properties:
- linear dynamics
- quadratic costs
- additive Gaussian noise (if present)
Here the Hamiltonian can be minimized analytically.

LQG - Continuous-time stochastic system
Continuous-time LQG:
- dynamics: $dx = (Ax + Bu)\, dt + G\, d\xi$
- cost rate: $l(x, u) = \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x$
- final cost: $h(x) = \tfrac{1}{2} x^T Q_f x$
with $R$ the control cost matrix (symmetric positive definite), $Q$ the state cost matrix (symmetric), and $Q_f$ the final state cost matrix (symmetric).

LQG - Derivation cont.-time stochastic system (1)
Guess for the optimal value function:
$$v(x, t) = \tfrac{1}{2} x^T V(t)\, x + a(t), \qquad V(t) \text{ symmetric}$$
Derivatives:
$$v_t(x, t) = \tfrac{1}{2} x^T \dot{V}(t)\, x + \dot{a}(t), \qquad v_x(x, t) = V(t)\, x, \qquad v_{xx}(x, t) = V(t)$$
Substitution into the finite-horizon HJB:
$$-\tfrac{1}{2} x^T \dot{V}(t)\, x - \dot{a}(t) = \min_u \left\{ \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x + (Ax + Bu)^T V(t)\, x + \tfrac{1}{2} \operatorname{tr}(G G^T V(t)) \right\}$$
Remember:
$$\frac{\partial\, (x^T A x)}{\partial x} = (A + A^T)\, x, \qquad \frac{\partial\, (x^T a)}{\partial x} = \frac{\partial\, (a^T x)}{\partial x} = a$$

LQG - Derivation cont.-time stochastic system (2)
Analytically found minimum:
$$u = -R^{-1} B^T V(t)\, x$$
Using this $u$, the control-dependent part of the HJB becomes
$$\tfrac{1}{2} u^T R u + (Bu)^T V(t)\, x = -\tfrac{1}{2} x^T V(t) B R^{-1} B^T V(t)\, x$$
Simplifications ($V$ is symmetric):
$$x^T A^T V x = x^T V^T A x = x^T V A x \quad \Rightarrow \quad 2\, x^T A^T V x = x^T A^T V x + x^T V A x$$

LQG - Derivation cont.-time stochastic system (3)
Regrouping the HJB equation yields
$$-\tfrac{1}{2} x^T \dot{V}(t)\, x - \dot{a}(t) = \tfrac{1}{2} x^T \left( Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t) \right) x + \tfrac{1}{2} \operatorname{tr}(G G^T V(t))$$
Our guess of the optimal value function is correct iff both of the following equations hold; the first one is called the continuous-time Riccati equation:
$$-\dot{V}(t) = Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t)$$
$$-\dot{a}(t) = \tfrac{1}{2} \operatorname{tr}(G G^T V(t))$$

LQG - Derivation cont.-time stochastic system (4)
Boundary conditions (we used the finite-horizon HJB):
$$v(x, t_f) = \tfrac{1}{2} x^T V(t_f)\, x + a(t_f) = h(x) \quad \Rightarrow \quad V(t_f) = Q_f, \qquad a(t_f) = 0$$
$V(t)$ and $a(t)$ can be obtained by using the boundary conditions and integrating $\dot{V}(t) = \dots$ and $\dot{a}(t) = \dots$ backward in time.
Optimal control law (repeated from above):
$$u = -R^{-1} B^T V(t)\, x$$
The control law is independent of the noise.
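
A minimal sketch of this backward pass: Euler-integrate the continuous-time Riccati equation from $V(t_f) = Q_f$ back to $t = 0$ and read off the time-varying feedback gain. All matrices and the horizon are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: integrate -dV/dt = Q + A^T V + V A - V B R^-1 B^T V
# backward from V(t_f) = Q_f with explicit Euler steps (illustrative values).
m, beta = 1.0, 0.5
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
B = np.array([[0.0], [1.0 / m]])
Q = np.eye(2); R = np.array([[0.1]]); Qf = 10.0 * np.eye(2)
t_f, dt = 1.0, 0.001
n_steps = int(t_f / dt)

V = Qf.copy()
gains = []                         # L(t) = R^-1 B^T V(t), stored backward
for _ in range(n_steps):
    dV = Q + A.T @ V + V @ A - V @ B @ np.linalg.inv(R) @ B.T @ V
    V = V + dt * dV                # stepping backward in time: V(t - dt)
    gains.append(np.linalg.inv(R) @ B.T @ V)
gains.reverse()                    # gains[0] now corresponds to t = 0

print(gains[0])                    # feedback gain u = -L x at movement start
```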

LQG - Discrete-time stochastic system (1)
In practice one usually uses discrete-time systems.
Discrete-time LQG:
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t$
- cost rate: $l(x, u) = \tfrac{1}{2} u_t^T R u_t + \tfrac{1}{2} x_t^T Q x_t$
- final cost: $h(x) = \tfrac{1}{2} x_{t_f}^T Q_f x_{t_f}$
Optimal control law: $u_t = -L_t x_t$
Control gain: $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$

LQG - Discrete-time stochastic system (2)
Discrete-time Riccati equation:
$$V_t = Q_t + A^T V_{t+1}(A - B L_t)$$
Solving the above equations:
- The control gain is independent of the state sequence and can be computed offline.
- $V_t$ is computed by initializing $V_{t_f} = Q_f$ and iterating the Riccati equation backward in time.
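
A minimal sketch of the offline backward pass and a closed-loop rollout; the matrices and horizon are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: the offline backward pass for the discrete-time LQG gains
# L_t, followed by a closed-loop simulation (all values illustrative).
A = np.array([[1.0, 0.01], [0.0, 0.995]])   # e.g. a discretized damped mass
B = np.array([[0.0], [0.01]])
Q = np.eye(2); R = np.array([[0.1]]); Qf = 10.0 * np.eye(2)
N = 300                                      # number of time steps

V = Qf.copy()
L = [None] * N
for t in reversed(range(N)):
    L[t] = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)   # control gain L_t
    V = Q + A.T @ V @ (A - B @ L[t])                       # Riccati recursion

x = np.array([[1.0], [0.0]])                 # simulate the closed loop
for t in range(N):
    x = A @ x - B @ (L[t] @ x)               # u_t = -L_t x_t
print(x.ravel())                             # state should be driven near 0
```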

Optimal Feedback Control - LQG + Kalman Filter

LQG + Kalman Filter - Overview
- The controller generates the motor command $u_t$ and needs the current state estimate $\hat{x}_t$.
- The estimator compensates sensory delays by using the efference copy of $u_t$.
- Controller and estimator operate in a loop and can therefore generate motor commands even when sensory data become unavailable.
[Diagram: the controller (LQG) sends the motor command $u_t$ to the biomechanical plant (state $x_t$, process noise $\xi_t$) and an efference copy to the estimator (Kalman filter); the sensory apparatus returns sensory data $y_t$ (measurement noise $\omega_t$) to the estimator, which feeds the estimated state $\hat{x}_t$ back to the controller]

LQG + Kalman Filter - System model
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t$
- feedback: $y_t = H x_t + \omega_t$
- cost per step: $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $\xi_t$ process noise (Gaussian, zero mean, covariance $\Omega^\xi$), $\omega_t$ measurement noise (Gaussian, zero mean, covariance $\Omega^\omega$), and $H$ the observation matrix.

LQG + Kalman Filter - Controller/Estimator
Kalman filter:
- state estimate: $\hat{x}_{t+1} = A\hat{x}_t + B u_t + K_t (y_t - H\hat{x}_t)$
- filter gain: $K_t = A \Sigma_t H^T (H \Sigma_t H^T + \Omega^\omega)^{-1}$
- estimation error covariance: $\Sigma_{t+1} = \Omega^\xi + (A - K_t H)\, \Sigma_t A^T$
- $\hat{x}_0$ and $\Sigma_0$ are given.
Linear-quadratic regulator (LQR):
- control law: $u_t = -L_t \hat{x}_t$
- control gain: $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$
- Riccati equation: $V_t = Q_t + A^T V_{t+1}(A - B L_t)$
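
A minimal sketch of the complete loop: the LQR backward pass runs offline, then the forward simulation feeds the Kalman filter only the noisy measurements $y_t$ and the efference copy $u_t$. All matrices and noise levels are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the full LQG loop (all values illustrative).
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.01], [0.0, 0.995]])
B = np.array([[0.0], [0.01]])
H = np.array([[1.0, 0.0]])                   # we measure position only
Q = np.eye(2); R = np.array([[0.1]]); N = 300
Om_xi = 1e-6 * np.eye(2); Om_om = 1e-4 * np.eye(1)

V = 10.0 * np.eye(2); L = [None] * N         # backward pass for L_t
for t in reversed(range(N)):
    L[t] = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    V = Q + A.T @ V @ (A - B @ L[t])

x = np.array([[1.0], [0.0]]); xhat = np.zeros((2, 1)); Sig = np.eye(2)
for t in range(N):                           # forward pass: estimate + control
    u = -L[t] @ xhat                         # controller uses the estimate
    y = H @ x + rng.multivariate_normal(np.zeros(1), Om_om).reshape(1, 1)
    K = A @ Sig @ H.T @ np.linalg.inv(H @ Sig @ H.T + Om_om)
    xhat = A @ xhat + B @ u + K @ (y - H @ xhat)
    Sig = Om_xi + (A - K @ H) @ Sig @ A.T
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Om_xi).reshape(2, 1)
print(np.hstack([x, xhat]))                  # true state vs. estimate
```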

Optimal Feedback Control - LQG Multiplicative Noise

LQG Multiplicative Noise - Overview
Motivation:
- Fitts's law: faster movements are less accurate, which suggests that noise is control-dependent.
- The standard deviation of muscle force is well approximated by a linear function of the mean force.
- No explicit smoothness cost formulation is necessary to achieve smooth trajectories.

LQG Multiplicative Noise - Definition
System model:
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t + \sum_{i=1}^{c} \varepsilon_t^i C_i u_t$
- feedback: $y_t = H x_t + \omega_t + \sum_{i=1}^{d} \epsilon_t^i D_i x_t$
- cost per step: $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $C_i$ scaling matrices for control-dependent system noise, $\varepsilon_t^i$ the $i$-th control-dependent noise component (Gaussian, zero mean, covariance $\Omega^\varepsilon = I$), $D_i$ scaling matrices for state-dependent observation noise, and $\epsilon_t^i$ the $i$-th state-dependent noise component (Gaussian, zero mean, covariance $\Omega^\epsilon = I$).

LQG Multiplicative Noise - Controller/Estimator
Estimator:
$$\hat{x}_{t+1} = (A - B L_t)\,\hat{x}_t + K_t (y_t - H\hat{x}_t) + \eta_t$$
Controller:
$$u_t = -L_t \hat{x}_t$$
Properties:
- internal noise $\eta_t$ is also considered
- the independence of estimation and control no longer holds
- $K_t$ and $L_t$ are calculated offline; for the equations see [4]

LQG Multiplicative Noise - Algorithm
Algorithm to calculate $K_t$ and $L_t$ (a structural sketch follows below):
1. initialize the filter gains $K_1, \dots, K_{n-1}$ with zero or with the Kalman filter gain
2. calculate the control gains $L_t$ backward in time
3. calculate the filter gains $K_t$ in a forward pass through time
4. repeat 2. and 3. until convergence
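
A structural sketch of this alternation only: the actual multiplicative-noise-aware updates couple $K_t$ and $L_t$ and are given in [4]. The gain formulas below are the standard LQG ones, standing in as placeholders, so with them the sweeps converge immediately; with the coupled updates of [4] the repetition is essential.

```python
import numpy as np

# Structural sketch only of steps 1-4 above. The gain formulas here are the
# *standard* LQG ones used as placeholders; the real algorithm replaces them
# with the coupled multiplicative-noise updates from [4].
def alternate_gains(A, B, H, Q, R, Om_xi, Om_om, Sig0, N, sweeps=10):
    K = [np.zeros((A.shape[0], H.shape[0]))] * N     # step 1: init filter gains
    L = [None] * N
    for _ in range(sweeps):                          # step 4: repeat to converge
        V = Q.copy()                                 # step 2: backward pass for L_t
        for t in reversed(range(N)):                 # (V(t_f) = Q as placeholder)
            L[t] = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
            V = Q + A.T @ V @ (A - B @ L[t])
        Sig = Sig0.copy()                            # step 3: forward pass for K_t
        for t in range(N):
            K[t] = A @ Sig @ H.T @ np.linalg.inv(H @ Sig @ H.T + Om_om)
            Sig = Om_xi + (A - K[t] @ H) @ Sig @ A.T
    return K, L

K, L = alternate_gains(A=np.array([[1.0, 0.01], [0.0, 0.995]]),
                       B=np.array([[0.0], [0.01]]), H=np.array([[1.0, 0.0]]),
                       Q=np.eye(2), R=np.array([[0.1]]), Om_xi=1e-6 * np.eye(2),
                       Om_om=1e-4 * np.eye(1), Sig0=np.eye(2), N=100)
print(L[0], K[0])
```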

LQG Multiplicative Noise - Example (1)
Task:
- 1-D positioning of a point mass from start position $p(0) = 0$ to target position $p^*$
- time step $\Delta$; duration $t_{end} = 0.3\,s$
- minimal energy consumption
Dynamics (mechanics):
$$p(t + \Delta) = p(t) + \dot{p}(t)\,\Delta, \qquad \dot{p}(t + \Delta) = \dot{p}(t) + f(t)\,\Delta/m$$
Dynamics (muscle, with time constants $\tau_1$, $\tau_2$):
$$f(t + \Delta) = f(t)\,(1 - \Delta/\tau_2) + g(t)\,\Delta/\tau_2$$
$$g(t + \Delta) = g(t)\,(1 - \Delta/\tau_1) + u(t)\,(1 + \sigma_c \varepsilon_t)\,\Delta/\tau_1$$
[Diagram: biomechanical plant; the command $u(t)$ passes through a muscle-like low-pass filter to produce the force $f(t)$ acting on the mass $m = 1\,kg$ at position $p(t)$, with target $p^*$]

LQG Multiplicative Noise - Example (2)
Dynamics, matrix formulation:
$$x_{t+1} = A x_t + B u_t + \varepsilon_t C_1 u_t, \qquad x_t = \begin{bmatrix} p(t) & \dot{p}(t) & f(t) & g(t) & p^* \end{bmatrix}^T$$
$$A = \begin{bmatrix} 1 & \Delta & 0 & 0 & 0 \\ 0 & 1 & \Delta/m & 0 & 0 \\ 0 & 0 & 1 - \Delta/\tau_2 & \Delta/\tau_2 & 0 \\ 0 & 0 & 0 & 1 - \Delta/\tau_1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \Delta/\tau_1 \\ 0 \end{bmatrix}, \qquad C_1 = B \sigma_c$$
Feedback, matrix formulation:
$$y_t = H x_t + \omega_t, \qquad H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$
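
A minimal numpy sketch that builds this augmented system and rolls it out with control-dependent noise; $\Delta$, $\tau_1$, $\tau_2$, $\sigma_c$ and the open-loop command are illustrative assumptions ($m = 1\,kg$ and $t_{end} = 0.3\,s$ are from the slides).

```python
import numpy as np

# Minimal sketch of the augmented point-mass system above; Delta, tau1, tau2,
# sigma_c and the constant command are illustrative assumptions.
rng = np.random.default_rng(2)
m, Delta, tau1, tau2, sigma_c = 1.0, 0.01, 0.04, 0.04, 0.5
p_star, n_steps = 0.1, int(0.3 / Delta)

A = np.array([[1, Delta, 0,                0,                0],
              [0, 1,     Delta / m,        0,                0],
              [0, 0,     1 - Delta / tau2, Delta / tau2,     0],
              [0, 0,     0,                1 - Delta / tau1, 0],
              [0, 0,     0,                0,                1]], dtype=float)
B = np.array([[0], [0], [0], [Delta / tau1], [0]])
C1 = B * sigma_c                       # control-dependent noise scaling
H = np.eye(3, 5)                       # observe p, p-dot, f

x = np.array([[0], [0], [0], [0], [p_star]], dtype=float)
for t in range(n_steps):
    u = np.array([[1.0]])              # placeholder open-loop command
    eps = rng.standard_normal()        # one control-dependent noise component
    x = A @ x + B @ u + eps * (C1 @ u) # x_{t+1} = A x + B u + eps C1 u
print(x.ravel())
```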

LQG Multiplicative Noise - Example (3)
Total cost:
$$\underbrace{(p(t_{end}) - p^*)^2}_{(1)} + \underbrace{(w_v\,\dot{p}(t_{end}))^2}_{(2)} + \underbrace{(w_f\, f(t_{end}))^2}_{(3)} + \underbrace{\frac{r}{n-1} \sum_{k=1}^{n-1} u^2(k\Delta)}_{(4)}$$
- (1) penalizes deviations from the target position
- (2)+(3) enforce that the movement must be finished at $t_{end}$
- (4) ensures energy minimization
- $w_v$, $w_f$ and $r$ are the corresponding weights

LQG Multiplicative Noise - Example (4)
Cost per step, matrix formulation:
- We define $p = \begin{bmatrix} 1 & 0 & 0 & 0 & -1 \end{bmatrix}^T$ and can write $p(t_{end}) - p^* = p^T x_t$; term (1) can therefore be expressed as $x_t^T (p\, p^T)\, x_t$.
- For terms (2) and (3) we use $v = \begin{bmatrix} 0 & w_v & 0 & 0 & 0 \end{bmatrix}^T$ and $f = \begin{bmatrix} 0 & 0 & w_f & 0 & 0 \end{bmatrix}^T$.
This leads to $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$ with $Q_{1,\dots,n-1} = 0$, $Q_n = p\,p^T + v\,v^T + f\,f^T$ and $R = r$.

LQG Multiplicative Noise - Example (5)
Resulting trajectories:
- smooth trajectories without modeling smoothness in the costs
- the system can be unstable, but this was not encountered in the problems the author deals with

Optimal Feedback Control - Minimal Intervention Principle

Minimal Intervention Principle - Definition
Definition: ignore task-irrelevant deviations.
Simple example:
- $x_1$, $x_2$ are uncoupled state variables
- the states are driven by the controls $u_1$, $u_2$
- the noise is control-multiplicative
- the initial state is sampled from a circular Gaussian

Minimal Intervention Principle - Example 1
Task: achieve $x_1 + x_2 = \text{target}$ using small $u_1$, $u_2$.
Optimum:
- $u_1 = u_2$
- the control law depends on $x_1 + x_2$
- $u_1$, $u_2$ form a motor synergy
Result: the black ellipse shows the distribution of final states.

Minimal Intervention Principle - Example 2
Alternative control law: $x_1 = x_2 = \text{target}/2$.
Results:
- the gray circle shows the distribution of final states
- the variance in the redundant direction is reduced
- the variance in the task-relevant direction is increased
- the control signals are increased (hence not optimal)

Optimal Feedback Control - Hierarchical Optimal Controller

Hierarchical Optimal Controller - Overview
Principle:
- the low-level controller generates an abstract representation $y(x)$ of the state $x$
- the high-level controller generates commands $v(y)$ to change $y$
- the low-level controller computes energy-efficient controls $u(v, x)$ consistent with $v$
Comparison with example 1 of the minimal intervention principle:
$$y = x_1 + x_2, \qquad v = f(y), \qquad u = \begin{bmatrix} v & v \end{bmatrix}^T$$

Optimal Feedback Control - Conclusion

Conclusion - Summary
We talked about...
- Markov Decision Processes (MDP) and cost-to-go formulations
- continuous-time stochastic systems
- the Linear Quadratic Gaussian framework (LQG)
- LQG + Kalman filter
- LQG with multiplicative noise
- the minimal intervention principle
- hierarchical optimal controllers


References
[1] Konrad P. Körding, Daniel M. Wolpert, "Bayesian decision theory in sensorimotor control," Trends in Cognitive Sciences, vol. 10, no. 7, pp. 319-326, July 2006, doi:10.1016/j.tics.2006.05.003.
[2] Konrad P. Körding, Daniel M. Wolpert, "Bayesian integration in sensorimotor learning," Nature, vol. 427, pp. 244-247, January 2004.
[3] Emanuel Todorov, "Optimality principles in sensorimotor control," Nature Neuroscience, vol. 7, no. 9, pp. 907-915, September 2004, doi:10.1038/nn1309.
[4] Emanuel Todorov, "Stochastic Optimal Control and Estimation Methods Adapted to the Noise Characteristics of the Sensorimotor System," Neural Computation, vol. 17, no. 5, pp. 1084-1108, May 2005, doi:10.1162/0899766053491887.
[5] Kenji Doya et al., Bayesian Brain, chapter "Optimal Control Theory," MIT Press, 2006.
[6] Emanuel Todorov, lecture "Intelligent control through learning and optimization," http://www.cs.washington.edu/homes/todorov/courses/amath579/, accessed 14 May 2012.