Bayesian Decision Theory in Sensorimotor Control


1 Bayesian Decision Theory in Sensorimotor Control
Matthias Freiberger, Martin Öttl
Signal Processing and Speech Communication Laboratory, TU Graz
Advanced Signal Processing

2 Outline
Introduction: definition, challenges of sensorimotor control, Bayesian integration in motor control, cost functions
Optimal Estimation
Optimal Feedback Control: introduction, Linear Quadratic Gaussian framework (LQG), LQG + Kalman filter, LQG with multiplicative noise, minimal intervention principle, hierarchical optimal controller
Conclusion
References

3 Intro - What is sensorimotor control? sen-so-ri-mo-tor: (adj.) Of, relating to, or involving both sensory and motor activity: sensorimotor nerve centers; sensorimotor pathways. (The American Heritage Dictionary of the English Language, Fourth Edition) Movement is the only way for humans to interact with the world. All communication, including speech, sign language, gestures and writing, is mediated by the motor system.

4 Intro - What is sensorimotor control? We want to understand and describe, by applying methods from computer science and control theory, how human beings are able to return a tennis ball, or grab a bottle of water and drink, how birds of prey are capable of catching a mouse in flight; basically, how any kind of physical interaction with the environment is performed by biological systems that pursue a certain objective while permanently performing corrections based on sensory input.

5 Intro - Challenges Action selection is a fundamental decision process. The CNS constantly sends motor commands to the muscles, and at each point in time the appropriate motor command needs to be selected. Knowledge about the environment needs to be combined with actual observation data and with knowledge about the cost/reward of the currently possible actions to make optimal decisions.

6 Intro - Schematic Control Flow (figure)

7 Intro - Uncertainty of the human sensorium The human sensorium is plagued by noise, and muscle output is noisy as well. Therefore the state of the environment/body needs to be estimated. Additionally, the cost of each movement shall be minimized. Bayesian statistics come in as a powerful way to deal with the uncertainty of the human sensorium.

8 Intro - Bayesian integration The CNS needs to integrate prior knowledge about the environment with knowledge obtained from sensory data to estimate the state of the environment optimally. When estimating the bounce location of a tennis ball, for example, the ball might be more likely to bounce near the edges of the court.

9 Intro - Bayesian Cue Combination Combination of sensor signals yields better estimates: combination of different sensor modalities (e.g. vision and proprioception), or combination of signals of the same modality (several visual cues into a stereo image, ...). Cues need to be weighted against each other.

10 Intro - Bayesian Cue Combination Given a set of observations from different cues $d_1, d_2, d_3, \ldots, d_n$, under the assumption that the cues are independent of each other given the state $s$, we can rewrite the likelihood $P(d_1, d_2, d_3, \ldots, d_n \mid s)$ as
$$P(d_1, d_2, d_3, \ldots, d_n \mid s) = \prod_{k=1}^{n} P(d_k \mid s) \qquad (1)$$
Therefore we can rewrite the corresponding posterior probability:
$$P(s \mid d_1, d_2, d_3, \ldots, d_n) = \frac{P(s) \prod_{k=1}^{n} P(d_k \mid s)}{P(d_1, d_2, d_3, \ldots, d_n)} \qquad (2)$$
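To make equations (1) and (2) concrete, here is a minimal numerical sketch (Python/NumPy, not part of the original slides). It assumes Gaussian cue likelihoods and a Gaussian prior, for which the posterior is again Gaussian with precision-weighted mean; all means and variances are invented for illustration.

```python
import numpy as np

# Minimal sketch of Bayesian cue combination with Gaussian likelihoods.
# Multiplying a Gaussian prior with independent Gaussian cue likelihoods
# yields a Gaussian posterior whose precision (1/variance) is the sum of
# the precisions and whose mean is the precision-weighted average.
prior_mean, prior_var = 0.0, 4.0         # prior over state s (made up)
cue_means = np.array([1.2, 0.8, 1.0])    # observations d_k (made up)
cue_vars = np.array([0.5, 2.0, 1.0])     # sensory noise per cue (made up)

precisions = np.concatenate(([1.0 / prior_var], 1.0 / cue_vars))
means = np.concatenate(([prior_mean], cue_means))

post_var = 1.0 / precisions.sum()
post_mean = post_var * (precisions * means).sum()
print(f"posterior mean {post_mean:.3f}, variance {post_var:.3f}")
```

Note how the most reliable cue (smallest variance) dominates the estimate, which is exactly the weighting-against-each-other mentioned on the previous slide.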

11 Intro - Cost Functions Cost functions model how good or bad the outcome of a particular movement is. It seems reasonable to minimize consumed energy and strain on muscles. Several cost functions have been proposed (smoothness, precision). The CNS also adapts very well to external cost functions.

12 Intro - Cost Functions The actual cost function of human movement can be inferred using indifference lines. A utility function can be found from these lines by comparing points across lines and assigning utilities to the lines.

13 Intro - Cost Functions (figure)

14 Intro - Cost Functions Given a set of possible actions $X$ and a set of possible outcomes $O$, as well as a utility function $U(o) : O \to \mathbb{R}$, for any $x \in X$ we can compute the expected utility
$$E\{U\} = \sum_{o \in O} P(o \mid x)\, U(o) \qquad (3)$$
Therefore the optimal decision with respect to the cost function $U(o)$ is considered to be the one which maximizes the expected utility $E\{U\}$.
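A small sketch of expected-utility maximization over a finite action set, following equation (3); outcome probabilities and utilities are invented for illustration.

```python
import numpy as np

# Pick the action x maximizing E{U} = sum_o P(o|x) U(o).
utility = np.array([10.0, 0.0, -5.0])   # U(o) for three outcomes (made up)
p_outcome_given_action = np.array([
    [0.7, 0.2, 0.1],                    # P(o | x=0)
    [0.4, 0.5, 0.1],                    # P(o | x=1)
    [0.2, 0.2, 0.6],                    # P(o | x=2)
])

expected_utility = p_outcome_given_action @ utility
best_action = int(np.argmax(expected_utility))
print(expected_utility, "-> optimal action:", best_action)
```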


16 Optimal Estimation Intro Until now: find the optimal action among a finite set of actions. But the world is continuous... The actual continuous state of our body parts has to be estimated permanently, and optimal actions according to the state estimate need to be found. In control terms: we need to model an observer which estimates the inner state (e.g. the position and velocity) of our limbs.

17 Optimal Estimation Experiment Experiment setup: Test subjects had to estimate the location of their thumb after moving their arm. A resistive or assistive force was applied by torque motors. The hand was constrained to move on a straight line. The arm was illuminated for 2 s to give an initial state; after that, participants had to rely solely on proprioception.

18 Optimal Estimation Experiment (figure: experiment setup)

19 Optimal Estimation Models A system that mimics the behavior of a natural process is called an internal model. Internal models are an important concept in motor control. Basically, two classes of internal models can be distinguished: forward models and backward models.

20 Optimal Estimation Internal models: forward vs. backward Forward models mimic the causal flow of a process by predicting its next state. They come up naturally since delays in most sensorimotor loops are large, so feedback control may be too slow for rapid movements; they are a key ingredient in systems that use motor outflow (efference copy). Backward models estimate the motor command which caused a particular state transition.

21 Optimal Estimation Internal models: forward vs. backward How do we optimally model our limbs now? Wolpert et al. used a forward model incorporating a correction term for the given problem. State estimation for a system containing noise is a complex task. We will follow an intuitive approach by modeling an observer for a deterministic system first; from our deterministic observer, we will then perform the transition to a probabilistic observer (Kalman filter).

22 Optimal Estimation - The Plant Model the arm as a damped mass system. State model: $\dot{x} = Ax + bu$ (state update equation), $y = c^T x + du$ (model for sensory output). State variables: $x_1$ position of the mass (hand), $x_2$ velocity of the mass (hand); $u(t)$ is the applied force, $y(t)$ the sensory output.

23 Optimal Estimation - The Plant Model parameters:
$$A = \begin{pmatrix} 0 & 1 \\ 0 & -\frac{\beta}{m} \end{pmatrix}, \quad b = \begin{pmatrix} 0 \\ \frac{1}{m} \end{pmatrix}, \quad c = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad d = 0$$
with $m$ the mass of the hand and $\beta$ the damping parameter.
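As an illustration of this plant, a simple Euler simulation could look as follows; the mass, damping, and input pulse are placeholder values, not parameters from the experiment.

```python
import numpy as np

# Euler simulation of the damped mass plant x' = Ax + bu, y = c^T x.
m, beta = 1.0, 0.5
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
c = np.array([1.0, 0.0])

dt, T = 0.001, 2.0
x = np.zeros(2)                         # state: [position, velocity]
for t in np.arange(0.0, T, dt):
    u = 1.0 if t < 0.5 else 0.0         # brief force pulse, then coasting
    x = x + dt * (A @ x + b * u)
    y = c @ x                           # sensory output: position
print("final position", x[0], "final velocity", x[1])
```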

24 Optimal Estimation - Luenberger Observer Observer model (block diagram: plant $\dot{x} = Ax + bu$ with input $u(t)$ and output $y(t)$, feeding an observer). Ansatz for the Luenberger observer:
$$\dot{\hat{x}} = \hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y \qquad (4)$$

25 Optimal Estimation - Luenberger Observer Derivation Error constraint: $e(t) = x(t) - \hat{x}(t)$ with $\lim_{t \to \infty} e(t) = 0$.
$$\dot{e} = \dot{x} - \dot{\hat{x}} = (Ax + bu) - (\hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y)$$
Set $y = c^T x$ and rearrange the equation:
$$\dot{e} = (A - \hat{b}_2 c^T)x - \hat{A}\hat{x} + (b - \hat{b}_1)u$$

26 Optimal Estimation - Luenberger Observer Derivation
$$\dot{e} = (A - \hat{b}_2 c^T)x - \hat{A}\hat{x} + (b - \hat{b}_1)u$$
The error shall be independent of the input, so set $\hat{b}_1 = b$:
$$\dot{e} = (A - \hat{b}_2 c^T)x - \hat{A}\hat{x}$$
Choose $\hat{A} = A - \hat{b}_2 c^T$ and get for the error
$$\dot{e} = (A - \hat{b}_2 c^T)e$$
Final model: $\dot{\hat{x}} = (A - \hat{b}_2 c^T)\hat{x} + bu + \hat{b}_2 y$

27 Optimal Estimation - Luenberger Observer Derivation
$$\dot{\hat{x}} = (A - \hat{b}_2 c^T)\hat{x} + \hat{b}_1 u + \hat{b}_2 y$$
Rewrite with $\hat{b}_2 = \hat{b}$ and $c^T \hat{x} = \hat{y}$:
$$\dot{\hat{x}} = A\hat{x} - \hat{b}\hat{y} + \hat{b}y + bu$$
Combining terms:
$$\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$$

28 Optimal Estimation - Luenberger Observer Where are our models now?
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{\hat{b}(y - \hat{y})}_{\text{sensory correction}}$$
The forward model takes the current state estimate and predicts the further evolution of the state. The difference between the actual sensory feedback $y$ and the prediction $\hat{y}$, weighted by $\hat{b}$, is used to update the state estimate. How to choose $\hat{b}$? For deterministic systems: choose $\hat{b}$ such that $(A - \hat{b}c^T)$ is asymptotically stable.
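A minimal sketch of the full observer loop for the damped mass plant. The gain $\hat{b}$ below is hand-picked so that $A - \hat{b}c^T$ is stable; systematic pole placement would be the usual choice.

```python
import numpy as np

# Luenberger observer running alongside the plant:
# x_hat' = A x_hat + b u + b_hat (y - y_hat).
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
c = np.array([1.0, 0.0])
b_hat = np.array([4.0, 4.0])            # observer gain (hand-tuned)
assert np.all(np.linalg.eigvals(A - np.outer(b_hat, c)).real < 0)

x = np.array([1.0, 0.0])                # true state (unknown to observer)
x_hat = np.zeros(2)                     # observer starts with a wrong estimate
for _ in range(int(2.0 / dt)):
    u = 0.0
    y = c @ x                           # measured output
    y_hat = c @ x_hat                   # predicted output
    x = x + dt * (A @ x + b * u)
    x_hat = x_hat + dt * (A @ x_hat + b * u + b_hat * (y - y_hat))
print("estimation error:", x - x_hat)   # converges toward zero
```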

29 Optimal Estimation - Probabilistic Observer The real world can be mean and difficult: noise is everywhere. Circuits are plagued by noise, so are radio transmissions, and even our body. (Block diagram: plant $\dot{x} = Ax + bu$ observed by $\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$.)

30 Optimal Estimation - Probabilistic Observer With noise $w(t)$ entering the plant, $\dot{x} = Ax + bu + w$, and noise $v(t)$ corrupting the measured output $y(t)$, what does the observer look like?

31 Optimal Estimation - Probabilistic Observer Stochastic model:
$$\dot{x} = Ax + bu + w \quad \text{(state update equation)}$$
$$y = Cx + v \quad \text{(model for sensory output)}$$
with $w(t)$ motor noise and $v(t)$ sensory noise. $w$ and $v$ are random variables; therefore the state vector $x$ is a vector of RVs as well. This means that we need a Bayesian estimator to estimate the mean $\bar{x}$ and covariance matrix $P$ of an RV $X$.

32 Optimal Estimation - Probabilistic Observer Some simplifications: we assume that our noise is additive white Gaussian noise, uncorrelated with the initial state $x_0$:
$$w(t) \sim \mathcal{N}(0, Q_c), \qquad v(t) \sim \mathcal{N}(0, R_c)$$
It can be shown that in this case the minimum variance estimator is the Kalman filter.

33 Optimal Estimation - Kalman Filter Model for a Kalman filter:
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{K_t(y - \hat{y})}_{\text{sensory correction}} \qquad (5)$$
Computation of $K$ and $P$:
$$K_t = P_t C^T R_c^{-1} \quad \text{(Kalman gain matrix)}$$
$$\dot{P}_t = -K_t R_c K_t^T + A P_t + P_t A^T + Q_c \quad \text{(update rule for } P\text{)}$$
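The slide states the continuous-time filter; as a sketch, the discrete-time predict/update form of the same forward-model-plus-correction idea applied to the discretized damped-mass plant might look like this. Noise covariances and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, m, beta = 0.01, 1.0, 0.5
A = np.eye(2) + dt * np.array([[0.0, 1.0], [0.0, -beta / m]])
b = dt * np.array([0.0, 1.0 / m])
C = np.array([[1.0, 0.0]])              # only position is sensed
Q = 1e-4 * np.eye(2)                    # motor (process) noise covariance
R = np.array([[1e-2]])                  # sensory noise covariance

x = np.zeros(2); x_hat = np.zeros(2); P = np.eye(2)
for k in range(200):
    u = 1.0
    x = A @ x + b * u + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)
    # predict (forward model)
    x_hat = A @ x_hat + b * u
    P = A @ P @ A.T + Q
    # update (sensory correction with Kalman gain K)
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    x_hat = x_hat + K @ (y - C @ x_hat)
    P = (np.eye(2) - K @ C) @ P
print("true:", x, "estimate:", x_hat)
```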

34 Optimal Estimation - Experiment results (figure: test subjects, GAM)

35 Optimal Estimation - Experiment results (figure: Kalman filter)

36 Optimal Estimation - Experiment results (figure: test subjects (GAM) vs. Kalman filter)

37 Optimal Estimation - Experiment conclusions The test-subject and Kalman filter curves are quite similar. The noticeable peak at 1 s seems to reflect a tradeoff between forward model and backward model. The experimental variance jitters as the forces change, whereas the Kalman filter predicts no force-dependent change in variance. Overall, the experiment provides support for the use of forward models applying sensory correction.


39 Optimal Feedback Control Introduction

40 Intro Markov Decision Process (MDP) Some notation: $x \in X$ state of the Markov process; $u \in U(x)$ action / control in state $x$; $p(x' \mid x, u)$ control-dependent transition probability distribution; $\ell(x, u) \geq 0$ immediate cost for choosing control $u$ in state $x$. (Figure: shortest-path problem with immediate and cumulative costs and a target state.)

41 Intro MDP First exit formulation (1) Goal: find for each state a control law / policy $u = \pi(x) \in U(x)$ which moves the trajectory towards a terminal state $x_T$. Each trajectory should cause the lowest total cost $v^\pi(x)$; $v^\pi(x)$ is also called the cost-to-go. The cost at a terminal state is $v^\pi(x) = q_T(x)$.

42 Intro MDP First exit formulation (2) Cost-to-go as a path sum:
$$v^\pi(x) = E_{\substack{x_0 = x \\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k))}} \left\{ q_T(x_{t_{\mathrm{first}}}) + \sum_{k=0}^{t_{\mathrm{first}}-1} \ell(x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} \ell(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations: policy-specific cost-to-go $v^\pi(x) = H[x, \pi(x), v^\pi(\cdot)]$; optimal cost-to-go $v^*(x) = \min_{u \in U(x)} H[x, u, v^*(\cdot)]$; optimal policy $\pi^*(x) = \arg\min_{u \in U(x)} H[x, u, v^*(\cdot)]$.

43 Intro MDP Finite horizon formulation All trajectories end at $t = N$. Cost-to-go as a path sum:
$$v_t^\pi(x) = E_{\substack{x_t = x \\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k))}} \left\{ q_T(x_N) + \sum_{k=t}^{N-1} \ell(x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} \ell(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations: policy-specific cost-to-go $v_t^\pi(x) = H[x, \pi_t(x), v_{t+1}^\pi(\cdot)]$; optimal cost-to-go $v_t^*(x) = \min_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$; optimal policy $\pi_t^*(x) = \arg\min_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$.

44 Intro MDP Infinite horizon discounted cost formulation Trajectories continue forever; future costs are exponentially discounted with $\alpha < 1$ to ensure a finite cost-to-go. Cost-to-go as a path sum:
$$v^\pi(x) = E_{\substack{x_0 = x \\ x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k))}} \left\{ \sum_{k=0}^{\infty} \alpha^k \ell(x_k, \pi(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H_\alpha[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} \ell(x, u) + \alpha\, E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations: policy-specific cost-to-go $v^\pi(x) = H_\alpha[x, \pi(x), v^\pi(\cdot)]$; optimal cost-to-go $v^*(x) = \min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$; optimal policy $\pi^*(x) = \arg\min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$.

45 Intro MDP Infinite horizon average cost formulation Trajectories continue forever; there is no discounting and therefore the resulting cost-to-go is infinite. Average cost-to-go: $c^\pi = \lim_{N \to \infty} \frac{1}{N} v_0^{\pi,N}(x)$. Differential cost-to-go: $\tilde{v}^\pi(x) = v_0^{\pi,N}(x) - N c^\pi$. Bellman equations: policy-specific cost-to-go $c^\pi + \tilde{v}^\pi(x) = H[x, \pi(x), \tilde{v}^\pi(\cdot)]$; optimal cost-to-go $c^* + \tilde{v}^*(x) = \min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$; optimal policy $\pi^*(x) = \arg\min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$.

46 Intro MDP Solution Algorithms for calculating an optimal cost-to-go: value iteration, policy iteration, linear programming. A value iteration sketch follows below.
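A toy value-iteration sketch for a first-exit MDP: a five-state chain where state 4 is the terminal target, moves can slip, and every non-terminal step costs 1. All numbers are invented; the point is the repeated Bellman backup $v(x) \leftarrow \min_u H[x, u, v(\cdot)]$.

```python
import numpy as np

n_states, target = 5, 4
actions = [-1, +1]
slip = 0.1                              # chance the move goes the other way

def step_dist(s, a):
    """Transition distribution p(s'|s,a) as a dict, clamped to the chain."""
    intended = min(max(s + a, 0), n_states - 1)
    slipped = min(max(s - a, 0), n_states - 1)
    return {intended: 1.0 - slip, slipped: slip}

v = np.zeros(n_states)
for _ in range(100):                    # iterate Bellman backup to convergence
    v_new = v.copy()
    for s in range(n_states):
        if s == target:
            continue                    # terminal state: v(target) = q_T = 0
        v_new[s] = min(
            1.0 + sum(p * v[s2] for s2, p in step_dist(s, a).items())
            for a in actions            # immediate cost l(x,u) = 1 per step
        )
    if np.allclose(v_new, v):
        break
    v = v_new
print("optimal cost-to-go:", v)
```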

47 Intro Continuous-time stochastic system (1) System dynamics:
$$dx = f(x, u)\, dt + G(x, u)\, d\xi$$
with $x(t) \in \mathbb{R}^n$ state vector, $u(t) \in \mathbb{R}^m$ control vector, $\xi(t) \in \mathbb{R}^k$ Brownian motion (integral of white noise). Interpretation:
$$x(t) - x(0) = \int_0^t f(x(s), u(s))\, ds + \int_0^t G(x(s), u(s))\, d\xi(s)$$
The last integral is an Ito integral.

48 Intro Continuous-time stochastic system (2) Ito integral: an Ito integral for a square integrable function $g(t)$ is defined as
$$\int_0^t g(s)\, d\xi(s) = \lim_{n \to \infty} \sum_{k=0}^{n-1} g(s_k)\,(\xi(s_{k+1}) - \xi(s_k))$$
with $0 = s_0 < s_1 < \cdots < s_n = t$.
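In simulation, such a system is usually discretized with the Euler-Maruyama scheme: over a small step $dt$ the Ito integral contributes $G\sqrt{dt}\,\mathcal{N}(0,1)$, matching the limit-of-sums definition above. A scalar sketch with an invented drift term:

```python
import numpy as np

# Euler-Maruyama discretization of dx = f(x,u) dt + G dxi.
rng = np.random.default_rng(1)
f = lambda x, u: -x + u                 # drift (illustrative choice)
G, dt, n_steps = 0.3, 0.001, 5000

x = 0.0
for _ in range(n_steps):
    u = 1.0
    x += f(x, u) * dt + G * np.sqrt(dt) * rng.standard_normal()
print("x(T) ~", x)                      # fluctuates around the fixed point 1.0
```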

49 Intro Continuous-time stochastic system (3) Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} \ell(x, u) + f(x, u)^T v_x(x) + \frac{1}{2} \operatorname{tr}(\Sigma(x, u)\, v_{xx}(x))$$
where $\Sigma(x, u) = G(x, u)\, G(x, u)^T$ is the noise covariance. Hamilton-Jacobi-Bellman (HJB) equations for the optimal cost-to-go:
first exit: $0 = \min_u H[x, u, v^*(\cdot)]$, with boundary condition $v^*(x_T) = q_T(x_T)$;
finite horizon: $-v_t^*(x, t) = \min_u H[x, u, v^*(\cdot, t)]$, with $v^*(x, T) = q_T(x)$;
discounted: $\frac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)]$, with discounted cost-to-go $v^\pi(x) = E\left\{ \int_0^\infty \exp(-t/\tau)\, \ell(x(t), u(t))\, dt \right\}$;
average: $c^* = \min_u H[x, u, \tilde{v}^*(\cdot)]$.

50 Intro Inverted pendulum example (1) Task: find the optimal control law for an inverted pendulum with dynamics $\ddot{\theta} = k \sin(\theta) + u$ (gravity constant $k$, angle $\theta$, torque $u$); state-dependent cost $q(\theta) = 1 - \exp(-2\theta^2)$; control-dependent cost $\frac{r}{2} u^2$; overall cost per step $\ell(x, u) = q(\theta) + \frac{r}{2} u^2$. (Figures: mechanics sketch and state-dependent cost over the state space $(\theta, \dot{\theta})$.)

51 Intro Inverted pendulum example (2) Stochastic dynamics: from $\ddot{\theta} = k \sin(\theta) + u$,
$$dx = (a(x) + Bu)\, dt + G\, d\xi$$
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}, \quad a(x) = \begin{bmatrix} x_2 \\ k \sin(x_1) \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad G = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}$$
Discounted HJB equation (from above):
$$\frac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)] = \min_u \left[ \ell(x, u) + f(x, u)^T v_x^*(x) + \frac{1}{2} \operatorname{tr}(\Sigma(x, u)\, v_{xx}^*(x)) \right]$$
HJB for the inverted pendulum:
$$\frac{1}{\tau} v^*(x) = \min_u \left[ q(x) + \frac{r}{2} u^2 + (a(x) + Bu)^T v_x^*(x) + \frac{1}{2} \operatorname{tr}(G G^T v_{xx}^*(x)) \right]$$

52 Intro Inverted pendulum example (3) The optimal control law minimizes the Hamiltonian: differentiate the Hamiltonian with respect to $u$ and set it to zero:
$$ru + B^T v_x^*(x) = 0 \quad \Rightarrow \quad u^* = -\frac{1}{r} B^T v_x^*(x) = -\frac{1}{r} v_{x_2}^*(x)$$
Remarks: $v_x$ is also called the costate vector; the optimal control law is the product of a matrix containing system dynamics and energy costs with the costate vector.

53 Intro Inverted pendulum example (4) Calculation of the costate vector: insert the optimal control law into the HJB equation,
$$\frac{1}{\tau} v^*(x) = q(x) - \frac{1}{2r} v_{x_2}^*(x)^2 + x_2\, v_{x_1}^*(x) + k \sin(x_1)\, v_{x_2}^*(x) + \frac{\sigma^2}{2} v_{x_2 x_2}^*(x)$$
then construct an MDP (discretize the state space, approximate derivatives with finite differences, ...) and solve the MDP.

54 Intro Inverted pendulum example (5) (Figures: state-dependent cost $q(x)$, optimal cost-to-go $v^*(x)$, and optimal policy $u^* = -\frac{1}{r} v_{x_2}^*(x)$ over the state space.)

55 Optimal Feedback Control Linear Quadratic Gaussian Framework (LQG)

56 LQG Linear Quadratic Gaussian framework In most cases an optimal control law cannot be obtained in closed form; one exception is the LQG system. LQG properties: linear dynamics, quadratic costs, additive Gaussian noise (if present). Here the Hamiltonian can be minimized analytically.

57 LQG Continuous-time stochastic system Continuous-time LQG:
dynamics $dx = (Ax + Bu)\, dt + G\, d\xi$
cost rate $\ell(x, u) = \frac{1}{2} u^T R u + \frac{1}{2} x^T Q x$
final cost $h(x) = \frac{1}{2} x^T Q_f x$
with $R$ the control cost matrix (symmetric positive definite), $Q$ the state cost matrix (symmetric), and $Q_f$ the final state cost matrix (symmetric).

58 LQG Derivation cont.-time stochastic system (1) Guess for the optimal value function: $v(x, t) = \frac{1}{2} x^T V(t) x + a(t)$, with $V(t)$ symmetric. Derivatives:
$$v_t(x, t) = \frac{1}{2} x^T \dot{V}(t) x + \dot{a}(t), \quad v_x(x, t) = V(t)\, x, \quad v_{xx}(x, t) = V(t)$$
Substitution into the finite horizon HJB:
$$-\frac{1}{2} x^T \dot{V}(t) x - \dot{a}(t) = \min_u \left\{ \frac{1}{2} u^T R u + \frac{1}{2} x^T Q x + (Ax + Bu)^T V(t) x + \frac{1}{2} \operatorname{tr}(G G^T V(t)) \right\}$$
Remember: $\frac{\partial\, x^T A x}{\partial x} = (A + A^T)\,x$ and $\frac{\partial\, a^T x}{\partial x} = a$.

59 LQG Derivation cont.-time stochastic system (2) Analytically found minimum: $u = -R^{-1} B^T V(t)\, x$. Using this $u$, the control-dependent part of the HJB becomes
$$\frac{1}{2} u^T R u + (Bu)^T V(t) x = -\frac{1}{2} x^T V(t) B R^{-1} B^T V(t)\, x$$
Simplifications ($V$ is symmetric):
$$x^T A^T V x = x^T V^T A x = x^T V A x \quad \Rightarrow \quad 2 x^T A^T V x = x^T A^T V x + x^T V A x$$

60 LQG Derivation cont.-time stochastic system (3) Regrouping the HJB equation yields
$$-\frac{1}{2} x^T \dot{V}(t) x - \dot{a}(t) = \frac{1}{2} x^T \left( Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t) \right) x + \frac{1}{2} \operatorname{tr}(G G^T V(t))$$
Our guess of the optimal value function is correct iff both of the following equations hold; the first one is called the continuous-time Riccati equation:
$$-\dot{V}(t) = Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t)$$
$$-\dot{a}(t) = \frac{1}{2} \operatorname{tr}(G G^T V(t))$$

61 LQG Derivation cont.-time stochastic system (4) Boundary conditions (we used the finite horizon HJB):
$$v(x, t_f) = \frac{1}{2} x^T V(t_f) x + a(t_f) = h(x) \quad \Rightarrow \quad V(t_f) = Q_f, \quad a(t_f) = 0$$
$V(t)$ and $a(t)$ can be obtained by using the boundary conditions and integrating $\dot{V}(t)$ and $\dot{a}(t)$ backward in time. Optimal control law (repeated from above): $u = -R^{-1} B^T V(t)\, x$; the control law is independent of the noise.

62 LQG Discrete-time stochastic system (1) In practice one usually uses discrete-time systems. Discrete-time LQG:
dynamics $x_{t+1} = A x_t + B u_t + \xi_t$
cost rate $\ell(x, u) = \frac{1}{2} u_t^T R u_t + \frac{1}{2} x_t^T Q x_t$
final cost $h(x) = \frac{1}{2} x_{t_f}^T Q_f x_{t_f}$
Optimal control law: $u_t = -L_t x_t$ with control gain $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$.

63 LQG Discrete-time stochastic system (2) Discrete-time Riccati equation:
$$V_t = Q_t + A^T V_{t+1} (A - B L_t)$$
Solving the above equations: the control gain is independent of the state sequence and can be computed offline; $V_t$ is computed by initializing $V_{t_f} = Q_f$ and iterating the Riccati equation backward in time, as sketched below.
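A sketch of this backward recursion; the double-integrator system matrices and weights are placeholders, not the arm model from the experiments.

```python
import numpy as np

# Backward Riccati recursion for the discrete-time control gains:
# V_{t_f} = Q_f, then L_t = (R + B'V_{t+1}B)^{-1} B'V_{t+1}A and
# V_t = Q + A'V_{t+1}(A - B L_t).
def lqr_gains(A, B, Q, R, Q_f, n_steps):
    V = Q_f
    gains = []
    for _ in range(n_steps):
        L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
        V = Q + A.T @ V @ (A - B @ L)
        gains.append(L)
    return gains[::-1]                  # reorder so gains[0] = L_0

A = np.array([[1.0, 0.1], [0.0, 1.0]])  # double integrator, dt = 0.1
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.1]); R = np.array([[0.01]]); Q_f = 10 * np.eye(2)
L = lqr_gains(A, B, Q, R, Q_f, n_steps=50)
print("first control gain L_0 =", L[0])
```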

64 Optimal Feedback Control LQG + Kalman Filter

65 LQG + Kalman Filter Overview The controller generates the motor command $u_t$ and needs the current state estimate $\hat{x}_t$; the estimator compensates sensory delays by using the efference copy $u_t$. Controller and estimator operate in a loop and can therefore generate motor commands even when sensory data become unavailable. (Block diagram: the controller (LQG) sends the motor command $u_t$ to the biomechanical plant, subject to process noise $\xi_t$, and an efference copy to the estimator (Kalman filter); the sensory apparatus returns sensory data $y_t$, subject to measurement noise $\omega_t$, to the estimator, which feeds the estimated state $\hat{x}_t$ back to the controller.)

66 LQG + Kalman Filter System model
dynamics $x_{t+1} = A x_t + B u_t + \xi_t$
feedback $y_t = H x_t + \omega_t$
cost per step $\ell(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $\xi_t$ process noise (Gaussian with zero mean and covariance $\Omega^\xi$), $\omega_t$ measurement noise (Gaussian with zero mean and covariance $\Omega^\omega$), and $H$ the observation matrix.

67 LQG + Kalman Filter Controller/Estimator Kalman filter:
state estimate $\hat{x}_{t+1} = A \hat{x}_t + B u_t + K_t (y_t - H \hat{x}_t)$
filter gain $K_t = A \Sigma_t H^T (H \Sigma_t H^T + \Omega^\omega)^{-1}$
estimation error covariance $\Sigma_{t+1} = \Omega^\xi + (A - K_t H) \Sigma_t A^T$
$\hat{x}_0$ and $\Sigma_0$ are given. Linear-quadratic regulator (LQR):
control law $u_t = -L_t \hat{x}_t$
control gain $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$
Riccati equation $V_t = Q_t + A^T V_{t+1} (A - B L_t)$
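Putting estimator and controller together, a minimal closed-loop sketch under the stated assumptions (additive noise only; all matrices are placeholders): the control gains come from a backward pass as above, then plant, filter, and regulator run forward in a loop.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])              # observe position only
Om_xi = 1e-4 * np.eye(2); Om_om = np.array([[1e-2]])
Q = np.eye(2); R = np.array([[0.1]]); n = 100

# backward pass: control gains L_t (final cost taken as Q here)
V, Ls = Q, []
for _ in range(n):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    V = Q + A.T @ V @ (A - B @ L)
    Ls.append(L)
Ls = Ls[::-1]

# forward pass: plant, Kalman filter, and LQR in a loop
x = np.array([1.0, 0.0]); x_hat = np.zeros(2); Sig = np.eye(2)
for t in range(n):
    u = -Ls[t] @ x_hat                  # controller uses the estimate
    y = H @ x + rng.multivariate_normal(np.zeros(1), Om_om)
    K = A @ Sig @ H.T @ np.linalg.inv(H @ Sig @ H.T + Om_om)
    x_hat = A @ x_hat + B @ u + K @ (y - H @ x_hat)
    Sig = Om_xi + (A - K @ H) @ Sig @ A.T
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Om_xi)
print("final state:", x)
```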

68 Optimal Feedback Control LQG Multiplicative Noise

69 LQG Multiplicative Noise Overview Motivation: Fitts' law (faster movements are less accurate) suggests that the noise is control dependent; the standard deviation of muscle force is well approximated by a linear function of the mean force; and no explicit smoothness cost formulation is necessary to achieve smooth trajectories.

70 LQG Multiplicative Noise Definition System model
dynamics $x_{t+1} = A x_t + B u_t + \xi_t + \sum_{i=1}^{c} \varepsilon_t^i C_i u_t$
feedback $y_t = H x_t + \omega_t + \sum_{i=1}^{d} \epsilon_t^i D_i x_t$
cost per step $\ell(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $C_i$ scaling matrices for the control-dependent system noise, $\varepsilon_t^i$ the $i$-th control-dependent noise component (Gaussian with zero mean and covariance $\Omega^\varepsilon = I$), $D_i$ scaling matrices for the state-dependent observation noise, and $\epsilon_t^i$ the $i$-th state-dependent noise component (Gaussian with zero mean and covariance $\Omega^\epsilon = I$).

71 LQG Multiplicative Noise Controller/Estimator
Estimator: $\hat{x}_{t+1} = (A - B L_t) \hat{x}_t + K_t (y_t - H \hat{x}_t) + \eta_t$
Controller: $u_t = -L_t \hat{x}_t$
Properties: the estimator also considers internal noise $\eta_t$; the independence of estimation and control no longer holds; $K_t$ and $L_t$ are calculated offline (for the equations see [4]).

72 LQG Multiplicative Noise Algorithm Algorithm to calculate $K_t$ and $L_t$:
1. initialize the filter gains $K_1 \ldots K_{n-1}$ with zero or with the Kalman filter gain
2. calculate the control gains $L_t$ backward in time
3. calculate the filter gains $K_t$ in a forward pass through time
4. repeat 2. and 3. until convergence

73 LQG Multiplicative Noise Example (1) Task: 1-D positioning of a point mass from start position $p(0) = 0$ to target position $p^*$; time step $\Delta$; duration $t_{end} = 0.3\,\mathrm{s}$; minimal energy consumption. Dynamics (mechanics, $m = 1\,\mathrm{kg}$):
$$p(t + \Delta) = p(t) + \dot{p}(t)\,\Delta, \qquad \dot{p}(t + \Delta) = \dot{p}(t) + f(t)\,\Delta / m$$
Dynamics of the muscle-like low-pass filter (time constants $\tau_1$, $\tau_2$) turning the control $u(t)$ into the force $f(t)$ acting on the biomechanical plant:
$$f(t + \Delta) = f(t)(1 - \Delta/\tau_2) + g(t)\,\Delta/\tau_2$$
$$g(t + \Delta) = g(t)(1 - \Delta/\tau_1) + u(t)(1 + \sigma_c \varepsilon_t)\,\Delta/\tau_1$$

74 LQG Multiplicative Noise Example (2) Dynamics in matrix formulation: $x_{t+1} = A x_t + B u_t + \varepsilon_t C_1 u_t$ with state $x_t = [\,p(t)\;\; \dot{p}(t)\;\; f(t)\;\; g(t)\;\; p^*\,]^T$ and
$$A = \begin{bmatrix} 1 & \Delta & 0 & 0 & 0 \\ 0 & 1 & \Delta/m & 0 & 0 \\ 0 & 0 & 1 - \Delta/\tau_2 & \Delta/\tau_2 & 0 \\ 0 & 0 & 0 & 1 - \Delta/\tau_1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \Delta/\tau_1 \\ 0 \end{bmatrix}, \quad C_1 = B \sigma_c$$
Feedback in matrix formulation: $y_t = H x_t + \omega_t$, where $H$ selects the sensed components of the state.
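A sketch of how these matrices could be assembled. The time step, time constants, and noise scale are placeholders, and the assumption that $H$ exposes position, velocity, and force is ours, since the transcription does not preserve the original values.

```python
import numpy as np

dt, m = 0.01, 1.0                       # placeholder time step and mass
tau1, tau2, sigma_c = 0.04, 0.04, 0.5   # placeholder constants

A = np.array([
    [1.0, dt,  0.0,           0.0,            0.0],
    [0.0, 1.0, dt / m,        0.0,            0.0],
    [0.0, 0.0, 1 - dt / tau2, dt / tau2,      0.0],
    [0.0, 0.0, 0.0,           1 - dt / tau1,  0.0],
    [0.0, 0.0, 0.0,           0.0,            1.0],  # target p* is static
])
B = np.array([[0.0], [0.0], [0.0], [dt / tau1], [0.0]])
C1 = sigma_c * B                        # control-dependent noise scaling
H = np.hstack([np.eye(3), np.zeros((3, 2))])  # assumed: sense p, p_dot, f

x0 = np.array([0.0, 0.0, 0.0, 0.0, 0.1])      # start at 0, target 0.1 m
print(A.shape, B.shape, C1.shape, H.shape)
```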

75 LQG Multiplicative Noise Example (3) Total cost:
$$\underbrace{(p(t_{end}) - p^*)^2}_{(1)} + \underbrace{(w_v\, \dot{p}(t_{end}))^2}_{(2)} + \underbrace{(w_f\, f(t_{end}))^2}_{(3)} + \underbrace{\frac{r}{n-1} \sum_{k=1}^{n-1} u^2(k\Delta)}_{(4)}$$
(1) penalizes deviations from the target position; (2) + (3) enforce that the movement is finished at $t_{end}$; (4) ensures energy minimization; $w_v$, $w_f$ and $r$ are the corresponding weights.

76 LQG Multiplicative Noise Example (4) Cost per step in matrix formulation: we define $p = [\,1\;\; 0\;\; 0\;\; 0\;\; {-1}\,]^T$ so that $p(t_{end}) - p^* = p^T x_t$; therefore term (1) can be expressed as $x_t^T (p p^T) x_t$. For terms (2) and (3) we use $v = [\,0\;\; w_v\;\; 0\;\; 0\;\; 0\,]^T$ and $f = [\,0\;\; 0\;\; w_f\;\; 0\;\; 0\,]^T$. That leads to $\ell(x, u) = x_t^T Q_t x_t + u_t^T R u_t$ with $Q_{1,\ldots,n-1} = 0$, $Q_n = p p^T + v v^T + f f^T$ and $R = r$.

77 LQG Multiplicative Noise Example (5) Resulting trajectories: smooth trajectories arise without modeling smoothness in the costs. The system can become unstable, but this was not encountered in the problems the author deals with.

78 Optimal Feedback Control Minimal Intervention Principle

79 Minimal Intervention Principle Definition Definition: ignore task-irrelevant deviations. Simple example: $x_1$, $x_2$ are uncoupled state variables; the states are driven by controls $u_1$, $u_2$ with multiplicative control noise; the initial state is sampled from a circular Gaussian.

80 Minimal Intervention Principle Example 1 Task: achieve $x_1 + x_2 = \text{target}$ using small $u_1$, $u_2$. Optimum: $u_1 = u_2$; the control law depends only on $x_1 + x_2$; $u_1$, $u_2$ form a motor synergy. Result: the black ellipse shows the distribution of final states.

81 Minimal Intervention Principle Example 2 Alternative control law: drive $x_1 = x_2 = \text{target}/2$. Results: the gray circle shows the distribution of final states; the variance in the redundant direction is reduced, but the variance in the task-relevant direction is increased and the control signals are larger (hence not optimal). A Monte Carlo sketch of both strategies follows below.
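An illustrative Monte Carlo sketch of the two strategies under multiplicative control noise; the gains, noise level, and horizon are invented, but the simulation typically reproduces the qualitative result above: the synergy keeps task error small while leaving the redundant direction uncorrected.

```python
import numpy as np

rng = np.random.default_rng(3)
target, sigma, gain, n_steps, n_trials = 1.0, 0.5, 0.2, 50, 2000

def run(strategy):
    x = rng.normal(0.0, 0.3, size=(n_trials, 2))  # circular Gaussian start
    for _ in range(n_steps):
        if strategy == "synergy":                 # correct only x1 + x2
            err = target - x.sum(axis=1)
            u = gain * np.stack([err, err], axis=1) / 2.0
        else:                                     # pin each state to target/2
            u = gain * (target / 2.0 - x)
        x += u * (1.0 + sigma * rng.standard_normal(u.shape))
    return x

for name in ("synergy", "separate"):
    x = run(name)
    task_err = x.sum(axis=1) - target
    print(f"{name:9s} task std {task_err.std():.3f}  "
          f"redundant std {(x[:, 0] - x[:, 1]).std():.3f}")
```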

82 Optimal Feedback Control Hierarchical Optimal Controller

83 Hierarchical Optimal Controller Overview Principle: a low-level controller generates an abstract representation $y(x)$ of the state $x$; a high-level controller generates commands $v(y)$ to change $y$; the low-level controller computes energy-efficient controls $u(v, x)$ consistent with $v$. Comparison with example 1 of the minimal intervention principle: $y = x_1 + x_2$, $v = f(y)$, $u = [\,v\;\; v\,]^T$.

84 Optimal Feedback Control Conclusion

85 Conclusion Summary We talked about: Markov decision processes (MDP) and cost-to-go formulations, continuous-time stochastic systems, the Linear Quadratic Gaussian framework (LQG), LQG + Kalman filter, LQG with multiplicative noise, the minimal intervention principle, and the hierarchical optimal controller.


87 References
[1] Konrad P. Körding and Daniel M. Wolpert, "Bayesian decision theory in sensorimotor control," Trends in Cognitive Sciences, vol. 10, no. 7, July 2006.
[2] Konrad P. Körding and Daniel M. Wolpert, "Bayesian integration in sensorimotor learning," Nature, vol. 427, January 2004.
[3] Emanuel Todorov, "Optimality principles in sensorimotor control," Nature Neuroscience, vol. 7, no. 9, September 2004, doi:10.1038/nn1309.
[4] Emanuel Todorov, "Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system," Neural Computation, vol. 17, no. 5, May 2005.
[5] Kenji Doya et al., Bayesian Brain, chapter "Optimal Control Theory," MIT Press, 2006.
[6] Emanuel Todorov, lecture "Intelligent control through learning and optimization," accessed 14 May 2012.


More information

The Kalman Filter ImPr Talk

The Kalman Filter ImPr Talk The Kalman Filter ImPr Talk Ged Ridgway Centre for Medical Image Computing November, 2006 Outline What is the Kalman Filter? State Space Models Kalman Filter Overview Bayesian Updating of Estimates Kalman

More information

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER KRISTOFFER P. NIMARK The Kalman Filter We will be concerned with state space systems of the form X t = A t X t 1 + C t u t 0.1 Z t

More information

Lecture Notes: (Stochastic) Optimal Control

Lecture Notes: (Stochastic) Optimal Control Lecture Notes: (Stochastic) Optimal ontrol Marc Toussaint Machine Learning & Robotics group, TU erlin Franklinstr. 28/29, FR 6-9, 587 erlin, Germany July, 2 Disclaimer: These notes are not meant to be

More information

Topic # /31 Feedback Control Systems. Analysis of Nonlinear Systems Lyapunov Stability Analysis

Topic # /31 Feedback Control Systems. Analysis of Nonlinear Systems Lyapunov Stability Analysis Topic # 16.30/31 Feedback Control Systems Analysis of Nonlinear Systems Lyapunov Stability Analysis Fall 010 16.30/31 Lyapunov Stability Analysis Very general method to prove (or disprove) stability of

More information

Nonlinear Observer Design for Dynamic Positioning

Nonlinear Observer Design for Dynamic Positioning Author s Name, Company Title of the Paper DYNAMIC POSITIONING CONFERENCE November 15-16, 2005 Control Systems I J.G. Snijders, J.W. van der Woude Delft University of Technology (The Netherlands) J. Westhuis

More information

Nonlinear Model Predictive Control Tools (NMPC Tools)

Nonlinear Model Predictive Control Tools (NMPC Tools) Nonlinear Model Predictive Control Tools (NMPC Tools) Rishi Amrit, James B. Rawlings April 5, 2008 1 Formulation We consider a control system composed of three parts([2]). Estimator Target calculator Regulator

More information

ECE557 Systems Control

ECE557 Systems Control ECE557 Systems Control Bruce Francis Course notes, Version.0, September 008 Preface This is the second Engineering Science course on control. It assumes ECE56 as a prerequisite. If you didn t take ECE56,

More information

Reinforcement Learning with Reference Tracking Control in Continuous State Spaces

Reinforcement Learning with Reference Tracking Control in Continuous State Spaces Reinforcement Learning with Reference Tracking Control in Continuous State Spaces Joseph Hall, Carl Edward Rasmussen and Jan Maciejowski Abstract The contribution described in this paper is an algorithm

More information

UCLA Chemical Engineering. Process & Control Systems Engineering Laboratory

UCLA Chemical Engineering. Process & Control Systems Engineering Laboratory Constrained Innite-time Optimal Control Donald J. Chmielewski Chemical Engineering Department University of California Los Angeles February 23, 2000 Stochastic Formulation - Min Max Formulation - UCLA

More information

EL 625 Lecture 10. Pole Placement and Observer Design. ẋ = Ax (1)

EL 625 Lecture 10. Pole Placement and Observer Design. ẋ = Ax (1) EL 625 Lecture 0 EL 625 Lecture 0 Pole Placement and Observer Design Pole Placement Consider the system ẋ Ax () The solution to this system is x(t) e At x(0) (2) If the eigenvalues of A all lie in the

More information

Computational Issues in Nonlinear Dynamics and Control

Computational Issues in Nonlinear Dynamics and Control Computational Issues in Nonlinear Dynamics and Control Arthur J. Krener ajkrener@ucdavis.edu Supported by AFOSR and NSF Typical Problems Numerical Computation of Invariant Manifolds Typical Problems Numerical

More information

Theory and Implementation of Biomimetic Motor Controllers

Theory and Implementation of Biomimetic Motor Controllers Theory and Implementation of Biomimetic Motor Controllers Thesis submitted for the degree of Doctor of Philosophy by Yuval Tassa Submitted to the Senate of the Hebrew University of Jerusalem February 2011

More information

Robust control and applications in economic theory

Robust control and applications in economic theory Robust control and applications in economic theory In honour of Professor Emeritus Grigoris Kalogeropoulos on the occasion of his retirement A. N. Yannacopoulos Department of Statistics AUEB 24 May 2013

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Observability for deterministic systems and high-gain observers

Observability for deterministic systems and high-gain observers Observability for deterministic systems and high-gain observers design. Part 1. March 29, 2011 Introduction and problem description Definition of observability Consequences of instantaneous observability

More information

EE221A Linear System Theory Final Exam

EE221A Linear System Theory Final Exam EE221A Linear System Theory Final Exam Professor C. Tomlin Department of Electrical Engineering and Computer Sciences, UC Berkeley Fall 2016 12/16/16, 8-11am Your answers must be supported by analysis,

More information

Neural Networks Lecture 10: Fault Detection and Isolation (FDI) Using Neural Networks

Neural Networks Lecture 10: Fault Detection and Isolation (FDI) Using Neural Networks Neural Networks Lecture 10: Fault Detection and Isolation (FDI) Using Neural Networks H.A. Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011.

More information