Bayesian Decision Theory in Sensorimotor Control
1 Bayesian Decision Theory in Sensorimotor Control. Matthias Freiberger, Martin Öttl, Signal Processing and Speech Communication Laboratory, Advanced Signal Processing (page 1/88)
2 Outline
- Introduction: definition; challenges of sensorimotor control; Bayesian integration in motor control; cost functions
- Optimal Estimation
- Optimal Feedback Control: introduction; Linear Quadratic Gaussian framework (LQG); LQG + Kalman filter; LQG with multiplicative noise; minimal intervention principle; hierarchical optimal controller
- Conclusion
- References
3 Intro - What is sensorimotor control? sensorimotor: (adj.) Of, relating to, or involving both sensory and motor activity: sensorimotor nerve centers; sensorimotor pathways. (The American Heritage Dictionary of the English Language, Fourth Edition) Movement is the only way for humans to interact with the world. All communication, including speech, sign language, gestures and writing, is mediated by the motor system.
4 Intro - What is sensorimotor control? We want to understand and describe, by applying methods from computer science and control theory, how:
- human beings are able to return a tennis ball, or grab a bottle of water and drink,
- birds of prey are capable of catching a mouse in flight,
- basically, how any kind of physical interaction with the environment is performed by biological systems, pursuing a certain objective while permanently making corrections based on sensory input.
5 Intro - Challenges. Action selection is a fundamental decision process. The CNS constantly sends motor commands to the muscles; at each point in time, the appropriate motor command needs to be selected. Knowledge about the environment needs to be combined with actual observation data and with knowledge about the cost/reward of currently possible actions to make optimal decisions.
6 Intro - Schematic Control Flow (figure)
7 Intro - Uncertainty of the human sensorium. The human sensorium is plagued by noise, and muscle output is noisy as well. Therefore the state of the environment/body needs to be estimated. Additionally, the cost of each movement shall be minimized. Bayesian statistics comes in as a powerful way to deal with the uncertainty of the human sensorium.
8 Intro - Bayesian integration. The CNS needs to integrate prior knowledge about the environment with knowledge obtained from sensory data to estimate the state of the environment optimally. When estimating the bounce location of a tennis ball, for example, the ball might be more likely to bounce at the edges of the court.
9 Intro - Bayesian Cue Combination. Combination of sensor signals yields better estimates: combination of different sensory modalities (e.g. vision and proprioception), or combination of signals of the same modality (several visual cues into a stereo image, ...). Cues need to be weighted against each other.
10 Intro - Bayesian Cue Combination. Given a set of observations from different cues $d_1, d_2, \dots, d_n$, under the assumption that the cues are conditionally independent given the state $s$, we can rewrite the likelihood $P(d_1, d_2, \dots, d_n \mid s)$ as
$$P(d_1, d_2, \dots, d_n \mid s) = \prod_{k=1}^{n} P(d_k \mid s) \qquad (1)$$
Therefore we can rewrite the corresponding posterior probability:
$$P(s \mid d_1, d_2, \dots, d_n) = \frac{P(s) \prod_{k=1}^{n} P(d_k \mid s)}{P(d_1, d_2, \dots, d_n)} \qquad (2)$$
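For the common special case of Gaussian cues with a flat prior, eq. (2) reduces to a precision-weighted average. A minimal sketch; the cue values and noise variances below are made-up toy numbers, not from the slides:

```python
import numpy as np

def combine_cues(cues, variances):
    """Precision-weighted fusion of independent Gaussian cues.

    With a flat prior, the posterior over the state s is Gaussian with
    precision equal to the sum of the cue precisions.
    """
    precisions = 1.0 / np.asarray(variances, dtype=float)
    post_var = 1.0 / precisions.sum()
    post_mean = post_var * (precisions * np.asarray(cues, dtype=float)).sum()
    return post_mean, post_var

# Vision reports 10.0 (low noise), proprioception 12.0 (high noise);
# the fused estimate is pulled toward the more reliable cue.
mean, var = combine_cues([10.0, 12.0], [1.0, 4.0])
```

The reliable cue dominates: here the posterior mean is 10.4 with variance 0.8, closer to the visual estimate.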
11 Intro - Cost Functions. Cost functions model how good or bad the outcome of a particular movement is. It seems reasonable to minimize consumed energy and strain on the muscles. Several cost functions have been proposed (smoothness, precision). The CNS also adapts very well to external cost functions.
12 Intro - Cost Functions. The actual cost function of human movement can be inferred using indifference lines. A utility function can be found from these lines: compare points from the lines and assign utilities to the lines.
13 Intro - Cost Functions (figure)
14 Intro - Cost Functions. Given a set of possible actions $X$ and a set of possible outcomes $O$, as well as a utility function $U(o): O \to \mathbb{R}$, for any $x \in X$ we can compute the expected utility
$$E\{U\} = \sum_{o \in O} P(o \mid x)\, U(o) \qquad (3)$$
Therefore the optimal decision with respect to the utility function $U(o)$ is considered to be the one which maximizes the expected utility $E\{U\}$.
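Eq. (3) can be evaluated directly once $P(o \mid x)$ and $U(o)$ are tabulated. A small sketch with invented toy numbers (two actions, three outcomes):

```python
import numpy as np

# Utilities of three possible outcomes (toy values).
U = np.array([1.0, 0.0, -2.0])

# P[i, j] = P(outcome j | action i); each row sums to one.
P = np.array([[0.8, 0.1, 0.1],
              [0.5, 0.5, 0.0]])

# Eq. (3): one expected utility per action, then pick the maximizer.
expected_utility = P @ U
best_action = int(np.argmax(expected_utility))
```

Action 0 wins here (expected utility 0.6 versus 0.5): it risks the bad outcome, but rarely enough that its expectation is still higher.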
15 Outline
- Introduction: definition; challenges of sensorimotor control; Bayesian integration in motor control; cost functions
- Optimal Estimation
- Optimal Feedback Control: introduction; Linear Quadratic Gaussian framework (LQG); LQG + Kalman filter; LQG with multiplicative noise; minimal intervention principle; hierarchical optimal controller
- Conclusion
- References
16 Optimal Estimation - Intro. Until now: find the optimal action among a finite set of actions. But the world is continuous... The actual continuous state of our body parts has to be estimated permanently, and optimal actions according to the state estimate need to be found. In control terms: we need to model an observer, which estimates the inner state (e.g. the position and velocity) of our limbs.
17 Optimal Estimation - Experiment. Experiment setup: test subjects had to estimate the location of their thumb after moving their arm. A resistive or assistive force was added by torque motors. The hand is constrained to move on a straight line. The arm is illuminated for 2 s to give an initial state; after that, participants have to rely solely on proprioception.
18 Optimal Estimation - Experiment. Experiment setup (figure)
19 Optimal Estimation - Models. A system that mimics the behavior of a natural process is called an internal model. Internal models are an important concept in motor control. Basically, two classes of internal models can be distinguished: forward models and backward models.
20 Optimal Estimation - Internal models: forward vs. backward. Forward models mimic the causal flow of a process by predicting its next state. They arise naturally, since delays in most sensorimotor loops are large and feedback control may be too slow for rapid movements. They are a key ingredient in systems that use motor outflow (efference copy). Backward models estimate the appropriate motor command that caused a particular state transition.
21 Optimal Estimation - Internal models: forward vs. backward. How do we optimally model our limbs now? Wolpert et al. used a forward model incorporating a correction term for the given problem. State estimation for a system containing noise is a complex task, so we will follow an intuitive approach by modeling an observer for a deterministic system first. From our deterministic observer, we will then perform the transition to a probabilistic observer (the Kalman filter).
22 Optimal Estimation - The Plant. Model the arm as a damped mass system.
State model:
$$\dot{x} = Ax + bu \quad \text{(state update equation)}$$
$$y = c^T x + du \quad \text{(model for sensory output)}$$
State variables: $x_1$ position of the mass (hand), $x_2$ velocity of the mass (hand); $u(t)$ applied force, $y(t)$ sensory output.
23 Optimal Estimation - The Plant. Model parameters:
$$A = \begin{pmatrix} 0 & 1 \\ 0 & -\beta/m \end{pmatrix}, \quad b = \begin{pmatrix} 0 \\ 1/m \end{pmatrix}, \quad c = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad d = 0$$
with $m$ the mass of the hand and $\beta$ the damping parameter.
24 Optimal Estimation - Luenberger Observer. Observer model: the plant $\dot{x} = Ax + bu$ produces output $y(t)$; both $u(t)$ and $y(t)$ are fed to the observer.
Ansatz for the Luenberger observer:
$$\dot{\hat{x}} = \hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y \qquad (4)$$
25 Optimal Estimation - Luenberger Observer Derivation.
Error constraint: $e(t) = x(t) - \hat{x}(t)$, $\lim_{t \to \infty} e(t) = 0$.
$$\dot{e} = \dot{x} - \dot{\hat{x}} = (Ax + bu) - (\hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y)$$
Set $y = c^T x$ and rearrange the equation:
$$\dot{e} = (A - \hat{b}_2 c^T)x - \hat{A}\hat{x} + (b - \hat{b}_1)u$$
26 Optimal Estimation - Luenberger Observer Derivation.
$$\dot{e} = (A - \hat{b}_2 c^T)x - \hat{A}\hat{x} + (b - \hat{b}_1)u$$
The error shall be independent of the input, so set $\hat{b}_1 = b$:
$$\dot{e} = (A - \hat{b}_2 c^T)x - \hat{A}\hat{x}$$
Choose $\hat{A} = A - \hat{b}_2 c^T$ and obtain for the error
$$\dot{e} = (A - \hat{b}_2 c^T)e$$
Final model: $\dot{\hat{x}} = (A - \hat{b}_2 c^T)\hat{x} + bu + \hat{b}_2 y$
27 Optimal Estimation - Luenberger Observer Derivation.
$$\dot{\hat{x}} = (A - \hat{b}_2 c^T)\hat{x} + \hat{b}_1 u + \hat{b}_2 y$$
Rewrite $\hat{b}_2 = \hat{b}$ and $c^T \hat{x} = \hat{y}$:
$$\dot{\hat{x}} = A\hat{x} - \hat{b}\hat{y} + \hat{b}y + bu$$
Collecting terms:
$$\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$$
28 Optimal Estimation - Luenberger Observer. Where are our models now?
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{\hat{b}(y - \hat{y})}_{\text{sensory correction}}$$
The forward model takes the current state estimate and predicts the further evolution of the state. The difference between the actual sensory feedback $y$ and the prediction $\hat{y}$, weighted by $\hat{b}$, is used to update the state estimate. How to choose $\hat{b}$? For deterministic systems: choose $\hat{b}$ such that $(A - \hat{b}c^T)$ is asymptotically stable.
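The observer above can be simulated directly. A minimal sketch with Euler discretization, assuming toy values for $m$, $\beta$ and the time step, and a hand-picked (not optimized) gain $\hat{b}$ that makes $A - \hat{b}c^T$ stable:

```python
import numpy as np

# Damped-mass plant from the slides, with assumed toy parameters.
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
c = np.array([1.0, 0.0])

# Hand-picked observer gain: A - b_hat c^T has eigenvalues with
# real part -2.25, so the error dynamics are asymptotically stable.
b_hat = np.array([4.0, 4.0])

x = np.array([0.0, 0.0])        # true state: position, velocity
x_hat = np.array([0.3, -0.2])   # deliberately wrong initial estimate
for _ in range(5000):            # 5 s of simulated time
    u = 1.0                      # constant applied force
    y = c @ x                    # noiseless sensory output
    y_hat = c @ x_hat
    x = x + dt * (A @ x + b * u)
    # forward model + sensory correction:
    x_hat = x_hat + dt * (A @ x_hat + b * u + b_hat * (y - y_hat))

estimation_error = np.linalg.norm(x - x_hat)
```

After a few time constants the estimation error has decayed to essentially zero, even though the initial estimate was wrong; only the gain's stability mattered, not its exact value.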
29 Optimal Estimation - Probabilistic Observer. The real world can be mean and difficult: noise is everywhere. Circuits are plagued by noise, and so are radio transmissions and even our body. (figure: plant $\dot{x} = Ax + bu$ in a loop with the observer $\dot{\hat{x}} = A\hat{x} + bu + \hat{b}(y - \hat{y})$)
30 Optimal Estimation - Probabilistic Observer. With noise $w$ entering the dynamics and $v$ corrupting the output, the plant becomes $\dot{x} = Ax + bu + w$. What should the observer look like now? (figure: noisy plant with the observer still to be determined)
31 Optimal Estimation - Probabilistic Observer. Stochastic model:
$$\dot{x} = Ax + bu + w \quad \text{(state update equation)}$$
$$y = Cx + v \quad \text{(model for sensory output)}$$
with $w(t)$ motor noise and $v(t)$ sensory noise. $w$ and $v$ are random variables; therefore the state vector $x$ is a vector of random variables as well. This means that we need a Bayesian estimator to estimate the mean $\bar{x}$ and covariance matrix $P$ of the random variable $x$.
32 Optimal Estimation - Probabilistic Observer. Some simplifications: we assume that the noise is additive white Gaussian noise, uncorrelated with the initial state $x_0$: $w(t) \sim \mathcal{N}(0, Q_c)$, $v(t) \sim \mathcal{N}(0, R_c)$. It can be shown that in this case the minimum-variance estimator is the Kalman filter.
33 Optimal Estimation - Kalman Filter. Model for a Kalman filter:
$$\dot{\hat{x}} = \underbrace{A\hat{x} + bu}_{\text{forward model}} + \underbrace{K_t(y - \hat{y})}_{\text{sensory correction}} \qquad (5)$$
Computation of $K$ and $P$:
$$K_t = P_t C^T R_c^{-1} \quad \text{(Kalman gain matrix)}$$
$$\dot{P}_t = -K_t R_c K_t^T + A P_t + P_t A^T + Q_c \quad \text{(update rule for } P\text{)}$$
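As a practical stand-in for the continuous-time filter above, here is a sketch of the standard discrete-time Kalman filter applied to a Euler-discretized damped-mass plant; the model constants and noise covariances are assumed toy values, not the experiment's:

```python
import numpy as np

# Euler-discretized damped mass (toy parameters).
dt, m, beta = 0.01, 1.0, 0.5
A = np.eye(2) + dt * np.array([[0.0, 1.0], [0.0, -beta / m]])
B = dt * np.array([[0.0], [1.0 / m]])
C = np.array([[1.0, 0.0]])          # only position is sensed
Q = 1e-4 * np.eye(2)                # process (motor) noise covariance
R = np.array([[1e-2]])              # measurement (sensory) noise covariance

rng = np.random.default_rng(0)
x = np.zeros(2)                     # true state
x_hat, P = np.zeros(2), np.eye(2)   # estimate and its covariance
for _ in range(500):
    u = 1.0
    # simulate the noisy plant:
    x = A @ x + (B * u).ravel() + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x + rng.normal(0.0, np.sqrt(R[0, 0]), 1)
    # predict (forward model):
    x_hat = A @ x_hat + (B * u).ravel()
    P = A @ P @ A.T + Q
    # correct (sensory correction with the Kalman gain):
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    x_hat = x_hat + (K @ (y - C @ x_hat)).ravel()
    P = (np.eye(2) - K @ C) @ P

position_error = abs(x[0] - x_hat[0])
```

The predict/correct split mirrors the forward model plus sensory correction decomposition of eq. (5); the gain trades off the two according to the noise covariances.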
34 Optimal Estimation - Experiment results: test subjects (GAM) (figure)
35 Optimal Estimation - Experiment results: Kalman filter (figure)
36 Optimal Estimation - Experiment results: test subjects (GAM) vs. Kalman filter (figure)
37 Optimal Estimation - Experiment conclusions. The Kalman filter curves and the subjects' curves are quite similar. The noticeable peak at 1 s seems to reflect a tradeoff between the forward model and the backward model. The experimental variance jitters for changing forces, whereas no force-dependent change in variance is predicted by the Kalman filter. Overall, the experiment provides support for the use of forward models applying sensory correction.
38 Outline
- Introduction: definition; challenges of sensorimotor control; Bayesian integration in motor control; cost functions
- Optimal Estimation
- Optimal Feedback Control: introduction; Linear Quadratic Gaussian framework (LQG); LQG + Kalman filter; LQG with multiplicative noise; minimal intervention principle; hierarchical optimal controller
- Conclusion
- References
39 Optimal Feedback Control - Introduction
40 Intro - Markov Decision Process (MDP). Some notation:
- $x \in X$: state of the Markov process
- $u \in U(x)$: action / control in state $x$
- $p(x' \mid x, u)$: control-dependent transition probability distribution
- $l(x, u) \ge 0$: immediate cost for choosing control $u$ in state $x$
(figure: shortest-path problem with immediate costs per transition and a cumulative cost of 5 to the target)
41 Intro - MDP First exit formulation (1). Goal: find for each state a control law / policy $u = \pi(x) \in U(x)$ which moves the trajectory towards a terminal state $x_T$. Each trajectory should cause the lowest total cost $v^\pi(x)$; $v^\pi(x)$ is also called the cost-to-go. The cost at a terminal state is $v^\pi(x) = q_T(x)$.
42 Intro - MDP First exit formulation (2). Cost-to-go as a path sum:
$$v^\pi(x) = E_{x_0 = x,\; x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k))}\left\{ q_T(x_{t_{\mathrm{first}}}) + \sum_{k=0}^{t_{\mathrm{first}}-1} l(x_k, \pi_k(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} l(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v^\pi(x) = H[x, \pi(x), v^\pi(\cdot)]$
- optimal cost-to-go: $v^*(x) = \min_{u \in U(x)} H[x, u, v^*(\cdot)]$
- optimal policy: $\pi^*(x) = \arg\min_{u \in U(x)} H[x, u, v^*(\cdot)]$
43 Intro - MDP Finite horizon formulation. All trajectories end at $t = N$. Cost-to-go as a path sum:
$$v_t^\pi(x) = E_{x_t = x,\; x_{k+1} \sim p(\cdot \mid x_k, \pi_k(x_k))}\left\{ q_T(x_N) + \sum_{k=t}^{N-1} l(x_k, \pi_k(x_k)) \right\}$$
Hamiltonian as before: $H[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} l(x, u) + E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$
Bellman equations:
- policy-specific cost-to-go: $v_t^\pi(x) = H[x, \pi_t(x), v_{t+1}^\pi(\cdot)]$
- optimal cost-to-go: $v_t^*(x) = \min_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$
- optimal policy: $\pi_t^*(x) = \arg\min_{u \in U(x)} H[x, u, v_{t+1}^*(\cdot)]$
44 Intro - MDP Infinite horizon discounted cost formulation. Trajectories continue forever; future costs are exponentially discounted with $\alpha < 1$ to ensure a finite cost-to-go. Cost-to-go as a path sum:
$$v^\pi(x) = E_{x_0 = x,\; x_{k+1} \sim p(\cdot \mid x_k, \pi(x_k))}\left\{ \sum_{k=0}^{\infty} \alpha^k\, l(x_k, \pi(x_k)) \right\}$$
Definition of the Hamiltonian:
$$H_\alpha[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} l(x, u) + \alpha\, E_{x' \sim p(\cdot \mid x, u)}\{v(x')\}$$
Bellman equations:
- policy-specific cost-to-go: $v^\pi(x) = H_\alpha[x, \pi(x), v^\pi(\cdot)]$
- optimal cost-to-go: $v^*(x) = \min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$
- optimal policy: $\pi^*(x) = \arg\min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]$
45 Intro - MDP Infinite horizon average cost formulation. Trajectories continue forever; there is no discounting, and therefore the resulting cost-to-go is infinite.
Average cost-to-go: $c^\pi = \lim_{N \to \infty} \frac{1}{N} v_0^{\pi,N}(x)$
Differential cost-to-go: $\tilde{v}^\pi(x) = v_0^{\pi,N}(x) - N c^\pi$
Bellman equations:
- policy-specific cost-to-go: $c^\pi + \tilde{v}^\pi(x) = H[x, \pi(x), \tilde{v}^\pi(\cdot)]$
- optimal cost-to-go: $c^* + \tilde{v}^*(x) = \min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$
- optimal policy: $\pi^*(x) = \arg\min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]$
46 Intro - MDP Solution. Algorithms for calculating an optimal cost-to-go: value iteration, policy iteration, linear programming.
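Value iteration is the simplest of the three: repeatedly apply the Bellman backup until the cost-to-go stops changing. A sketch on a tiny deterministic shortest-path MDP in the first-exit formulation (states on a line, unit step cost; the setup is illustrative, not the slides' example):

```python
import numpy as np

# Four states 0..3 on a line; state 3 is terminal with q_T = 0;
# every move costs l(x, u) = 1.
n_states, target = 4, 3
v = np.zeros(n_states)
for _ in range(100):
    v_new = v.copy()
    for s in range(n_states):
        if s == target:
            continue  # terminal state keeps its terminal cost
        moves = [max(s - 1, 0), min(s + 1, n_states - 1)]
        # Bellman backup: v(x) <- min_u [ l(x,u) + v(x') ]
        v_new[s] = min(1.0 + v[s2] for s2 in moves)
    if np.allclose(v_new, v):
        break  # converged to the optimal cost-to-go
    v = v_new
```

On this chain the converged cost-to-go is just the distance to the target, [3, 2, 1, 0], and the greedy policy with respect to it always steps toward state 3.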
47 Intro - Continuous-time stochastic system (1). System dynamics:
$$dx = f(x, u)\, dt + G(x, u)\, d\xi$$
with $x(t) \in \mathbb{R}^n$ the state vector, $u(t) \in \mathbb{R}^m$ the control vector, and $\xi(t) \in \mathbb{R}^k$ Brownian motion (the integral of white noise).
Interpretation:
$$x(t) - x(0) = \int_0^t f(x(s), u(s))\, ds + \int_0^t G(x(s), u(s))\, d\xi(s)$$
The last integral is an Itô integral.
48 Intro - Continuous-time stochastic system (2). The Itô integral of a square-integrable function $g(t)$ is defined as
$$\int_0^t g(s)\, d\xi(s) = \lim_{n \to \infty} \sum_{k=0}^{n-1} g(s_k)\big(\xi(s_{k+1}) - \xi(s_k)\big)$$
with $0 = s_0 < s_1 < \dots < s_n = t$.
49 Intro - Continuous-time stochastic system (3). Definition of the Hamiltonian:
$$H[x, u, v(\cdot)] \stackrel{\mathrm{def}}{=} l(x, u) + f(x, u)^T v_x(x) + \tfrac{1}{2}\,\mathrm{tr}\big(\Sigma(x, u)\, v_{xx}(x)\big)$$
where $\Sigma(x, u) = G(x, u)\, G(x, u)^T$ is the noise covariance.
Hamilton-Jacobi-Bellman (HJB) equations for the optimal cost-to-go:
- first exit: $0 = \min_u H[x, u, v^*(\cdot)]$, with $v^*(x) = q_T(x)$ at terminal states
- finite horizon: $-v_t^*(x, t) = \min_u H[x, u, v^*(\cdot, t)]$, with $v^*(x, T) = q_T(x)$
- discounted: $\frac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)]$, where the discounted cost-to-go is $v^\pi(x) = E\left\{\int_0^\infty \exp(-t/\tau)\, l(x(t), u(t))\, dt\right\}$
- average: $c^* = \min_u H[x, u, \tilde{v}^*(\cdot)]$
50 Intro - Inverse pendulum example (1). Task: find the optimal control law for the inverse pendulum
$$\ddot{\theta} = k \sin(\theta) + u$$
with gravity constant $k$, angle $\theta$, and torque $u$. State-dependent cost: $q(\theta) = 1 - \exp(-2\theta^2)$; control-dependent cost: $\frac{r}{2} u^2$; overall cost per step: $l(x, u) = q(\theta) + \frac{r}{2} u^2$.
(figures: pendulum mechanics; state-dependent cost over the $(x_1, x_2) = (\theta, \dot{\theta})$ plane)
51 Intro - Inverse pendulum example (2). Stochastic dynamics: $\ddot{\theta} = k \sin(\theta) + u$ becomes
$$dx = (a(x) + Bu)\, dt + G\, d\xi$$
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}, \quad a(x) = \begin{bmatrix} x_2 \\ k \sin(x_1) \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad G = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}$$
Discounted HJB equation (from above):
$$\tfrac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)] = \min_u \left[ l(x, u) + f(x, u)^T v_x^*(x) + \tfrac{1}{2}\,\mathrm{tr}\big(\Sigma(x, u)\, v_{xx}^*(x)\big) \right]$$
HJB for the inverse pendulum:
$$\tfrac{1}{\tau} v^*(x) = \min_u \left[ q(x) + \tfrac{r}{2} u^2 + (a(x) + Bu)^T v_x^*(x) + \tfrac{1}{2}\,\mathrm{tr}\big(GG^T v_{xx}^*(x)\big) \right]$$
52 Intro - Inverse pendulum example (3). The optimal control law minimizes the Hamiltonian: differentiate the Hamiltonian with respect to $u$ and set it to zero:
$$r u + B^T v_x^*(x) = 0 \quad \Rightarrow \quad u = -\tfrac{1}{r} B^T v_x^*(x) = -\tfrac{1}{r} v_{x_2}^*(x)$$
Remarks: $v_x$ is also called the costate vector; the optimal control law is the product of a matrix containing system dynamics and energy costs with the costate vector.
53 Intro - Inverse pendulum example (4). Calculation of the costate vector: insert the optimal control law into the HJB equation:
$$\tfrac{1}{\tau} v^*(x) = q(x) - \tfrac{1}{2r} \big(v_{x_2}^*\big)^2 + x_2\, v_{x_1}^* + k \sin(x_1)\, v_{x_2}^* + \tfrac{\sigma^2}{2}\, v_{x_2 x_2}^*$$
Then construct an MDP (discretize the state space, approximate derivatives with finite differences, ...) and solve the MDP.
54 Intro - Inverse pendulum example (5). (figures: state-dependent cost $q(x)$; optimal cost-to-go $v^*(x)$; optimal policy $u = -\tfrac{1}{r} v_{x_2}^*(x)$, each plotted over the $(\theta, \dot{\theta})$ plane)
55 Optimal Feedback Control - Linear Quadratic Gaussian Framework (LQG)
56 LQG - Linear Quadratic Gaussian framework. In most cases an optimal control law cannot be obtained in closed form; one exception is the LQG system. LQG properties: linear dynamics, quadratic costs, additive Gaussian noise (if present). Here the Hamiltonian can be minimized analytically.
57 LQG - Continuous-time stochastic system. Continuous-time LQG:
- dynamics: $dx = (Ax + Bu)\, dt + G\, d\xi$
- cost rate: $l(x, u) = \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x$
- final cost: $h(x) = \tfrac{1}{2} x^T Q_f x$
with $R$ the control cost matrix (symmetric positive definite), $Q$ the state cost matrix (symmetric), and $Q_f$ the final state cost matrix (symmetric).
58 LQG - Derivation, continuous-time stochastic system (1). Guess for the optimal value function: $v(x, t) = \tfrac{1}{2} x^T V(t)\, x + a(t)$, with $V(t)$ symmetric.
Derivatives:
$$v_t(x, t) = \tfrac{1}{2} x^T \dot{V}(t)\, x + \dot{a}(t), \quad v_x(x, t) = V(t)\, x, \quad v_{xx}(x, t) = V(t)$$
Substitution into the finite-horizon HJB:
$$-\tfrac{1}{2} x^T \dot{V}(t)\, x - \dot{a}(t) = \min_u \left\{ \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x + (Ax + Bu)^T V(t)\, x + \tfrac{1}{2}\,\mathrm{tr}\big(GG^T V(t)\big) \right\}$$
Remember: $\frac{\partial}{\partial x} x^T A x = (A + A^T)\, x$ and $\frac{\partial}{\partial x} a^T x = a$.
59 LQG - Derivation, continuous-time stochastic system (2). Analytically found minimum: $u = -R^{-1} B^T V(t)\, x$.
Using this $u$, the control-dependent part of the HJB becomes
$$\tfrac{1}{2} u^T R u + (Bu)^T V(t)\, x = -\tfrac{1}{2} x^T V(t) B R^{-1} B^T V(t)\, x$$
Simplification ($V$ symmetric): since a scalar equals its transpose, $x^T A^T V x = x^T V A x$, and therefore $x^T A^T V x = \tfrac{1}{2} x^T (A^T V + V A)\, x$.
60 LQG - Derivation, continuous-time stochastic system (3). Regrouping the HJB equation yields
$$-\tfrac{1}{2} x^T \dot{V}(t)\, x - \dot{a}(t) = \tfrac{1}{2} x^T \big( Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t) \big)\, x + \tfrac{1}{2}\,\mathrm{tr}\big(GG^T V(t)\big)$$
Our guess of the optimal value function is correct iff both of the following equations hold; the first one is called the continuous-time Riccati equation:
$$-\dot{V}(t) = Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t)$$
$$-\dot{a}(t) = \tfrac{1}{2}\,\mathrm{tr}\big(GG^T V(t)\big)$$
61 LQG - Derivation, continuous-time stochastic system (4). Boundary conditions (we used the finite-horizon HJB):
$$v(x, t_f) = \tfrac{1}{2} x^T V(t_f)\, x + a(t_f) = h(x) \quad \Rightarrow \quad V(t_f) = Q_f, \quad a(t_f) = 0$$
$V(t)$ and $a(t)$ can be obtained by using the boundary conditions and integrating $\dot{V}(t)$ and $\dot{a}(t)$ backward in time.
Optimal control law (repeated from above): $u = -R^{-1} B^T V(t)\, x$; note that the control law is independent of the noise.
62 LQG - Discrete-time stochastic system (1). In practice one usually uses discrete-time systems.
Discrete-time LQG:
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t$
- cost rate: $l(x, u) = \tfrac{1}{2} u_t^T R u_t + \tfrac{1}{2} x_t^T Q x_t$
- final cost: $h(x) = \tfrac{1}{2} x_{t_f}^T Q_f x_{t_f}$
Optimal control law: $u_t = -L_t x_t$, with control gain $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$.
63 LQG - Discrete-time stochastic system (2). Discrete-time Riccati equation:
$$V_t = Q_t + A^T V_{t+1} (A - B L_t)$$
Solving the above equations: the control gain is independent of the state sequence and can be computed offline; $V_t$ is computed by initializing $V_{t_f} = Q_f$ and iterating the Riccati equation backward in time.
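The offline backward pass can be sketched in a few lines. The system below is a toy Euler-discretized damped mass with assumed cost weights, not the slides' example; the gain and Riccati formulas are the ones stated above:

```python
import numpy as np

# Toy Euler-discretized damped mass.
dt, m, beta = 0.01, 1.0, 0.5
A = np.eye(2) + dt * np.array([[0.0, 1.0], [0.0, -beta / m]])
B = dt * np.array([[0.0], [1.0 / m]])
Q = np.zeros((2, 2))            # no running state cost
R = np.array([[1e-4]])          # small control (energy) cost
Q_f = np.diag([1.0, 0.1])       # final cost on position and velocity
T = 200                          # horizon in steps

# Backward pass: V_{t_f} = Q_f, then iterate the Riccati equation.
V = Q_f
gains = []
for _ in range(T):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    V = Q + A.T @ V @ (A - B @ L)
    gains.append(L)
gains.reverse()                  # gains[t] is now L_t

# Forward simulation of the closed loop u_t = -L_t x_t,
# starting displaced from the origin.
x = np.array([1.0, 0.0])
for L in gains:
    u = -(L @ x)
    x = A @ x + (B @ u).ravel()
```

Because the gains depend only on the model and cost matrices, the whole sequence $L_0, \dots, L_{T-1}$ is computed before the movement starts; with this cost the mass ends near the origin at the final step.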
64 Optimal Feedback Control - LQG + Kalman Filter
65 LQG + Kalman Filter - Overview. The controller generates the motor command $u_t$ and needs the current state estimate $\hat{x}_t$. The estimator compensates sensory delays by using the efference copy of $u_t$. Controller and estimator operate in a loop and can therefore generate motor commands even when sensory data become unavailable.
(figure: controller (LQG) and estimator (Kalman filter) in a loop; the controller sends the motor command $u_t$ and its efference copy, the biomechanical plant with state $x_t$ is driven by process noise, and the sensory apparatus returns sensory data $y_t$ corrupted by measurement noise)
66 LQG + Kalman Filter - System model.
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t$
- feedback: $y_t = H x_t + \omega_t$
- cost per step: $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $\xi_t$ process noise (Gaussian, zero mean, covariance $\Omega^\xi$), $\omega_t$ measurement noise (Gaussian, zero mean, covariance $\Omega^\omega$), and $H$ the observation matrix.
67 LQG + Kalman Filter - Controller/Estimator.
Kalman filter:
- state estimate: $\hat{x}_{t+1} = A \hat{x}_t + B u_t + K_t (y_t - H \hat{x}_t)$
- filter gain: $K_t = A \Sigma_t H^T (H \Sigma_t H^T + \Omega^\omega)^{-1}$
- estimation error covariance: $\Sigma_{t+1} = \Omega^\xi + (A - K_t H)\, \Sigma_t A^T$
- $\hat{x}_0$ and $\Sigma_0$ are given.
Linear-quadratic regulator (LQR):
- control law: $u_t = -L_t \hat{x}_t$
- control gain: $L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A$
- Riccati equation: $V_t = Q_t + A^T V_{t+1} (A - B L_t)$
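Putting the two pieces together, a sketch of the certainty-equivalence loop above: the LQR acts on the Kalman-filter estimate, and the estimator uses the efference copy of $u_t$. The double-integrator model and the noise levels are assumed toy values, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])   # toy double integrator
B = np.array([[0.0], [dt]])
H = np.array([[1.0, 0.0]])              # only position is observed
Om_xi = 1e-5 * np.eye(2)                # process noise covariance
Om_om = np.array([[1e-2]])              # measurement noise covariance
Q, R = np.diag([1.0, 0.1]), np.array([[1e-3]])

# Offline: iterate the Riccati equation to a (near) steady-state gain L.
V = Q.copy()
for _ in range(500):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    V = Q + A.T @ V @ (A - B @ L)

x, x_hat, Sigma = np.array([1.0, 0.0]), np.zeros(2), np.eye(2)
for _ in range(1000):
    u = -(L @ x_hat)                    # controller uses the estimate
    y = H @ x + rng.normal(0.0, np.sqrt(Om_om[0, 0]), 1)
    # estimator: efference copy of u plus sensory correction
    K = A @ Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + Om_om)
    x_hat = A @ x_hat + (B @ u).ravel() + (K @ (y - H @ x_hat)).ravel()
    Sigma = Om_xi + (A - K @ H) @ Sigma @ A.T
    # true plant with process noise
    x = A @ x + (B @ u).ravel() + rng.multivariate_normal(np.zeros(2), Om_xi)
```

Even though the controller never sees the true state, the regulator drives the true position close to zero: with additive noise, estimation and control can be designed separately.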
68 Optimal Feedback Control - LQG Multiplicative Noise
69 LQG Multiplicative Noise - Overview. Motivation: Fitts' law (faster movements are less accurate) suggests that noise is control-dependent. The standard deviation of muscle force can be well approximated by a linear function of the mean force. With such noise, no explicit smoothness cost formulation is necessary to achieve smooth trajectories.
70 LQG Multiplicative Noise - Definition. System model:
- dynamics: $x_{t+1} = A x_t + B u_t + \xi_t + \sum_{i=1}^{c} \varepsilon_t^i C_i u_t$
- feedback: $y_t = H x_t + \omega_t + \sum_{i=1}^{d} \epsilon_t^i D_i x_t$
- cost per step: $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$
with $C_i$ scaling matrices for control-dependent system noise, $\varepsilon_t^i$ the $i$-th control-dependent noise component (Gaussian, zero mean, covariance $\Omega^\varepsilon = I$), $D_i$ scaling matrices for state-dependent observation noise, and $\epsilon_t^i$ the $i$-th state-dependent noise component (Gaussian, zero mean, covariance $\Omega^\epsilon = I$).
71 LQG Multiplicative Noise - Controller/Estimator.
Estimator: $\hat{x}_{t+1} = (A - B L_t)\, \hat{x}_t + K_t (y_t - H \hat{x}_t) + \eta_t$
Controller: $u_t = -L_t \hat{x}_t$
Properties: internal noise $\eta_t$ is also considered; the independence of estimation and control no longer holds; $K_t$ and $L_t$ are calculated offline (for the equations see [4]).
72 LQG Multiplicative Noise - Algorithm. Algorithm to calculate $K_t$ and $L_t$:
1. initialize the filter gains $K_1, \dots, K_{n-1}$ with zero or with the Kalman filter gain
2. calculate the control gains $L_t$ backward in time
3. calculate the filter gains $K_t$ in a forward pass through time
4. repeat 2 and 3 until convergence
73 LQG Multiplicative Noise - Example (1). Task: 1-D positioning of a point mass from start position $p(0) = 0$ to target position $p^*$; time step $\Delta$; duration $t_{\mathrm{end}} = 0.3\,\mathrm{s}$; minimal energy consumption.
Mechanics:
$$p(t + \Delta) = p(t) + \dot{p}(t)\,\Delta, \qquad \dot{p}(t + \Delta) = \dot{p}(t) + f(t)\,\Delta / m$$
(figure: biomechanical plant with $m = 1\,\mathrm{kg}$; the control $u(t)$ passes through a muscle-like low-pass filter to produce the force $f(t)$)
Muscle dynamics (time constants $\tau_1$, $\tau_2$):
$$f(t + \Delta) = f(t)\,(1 - \Delta/\tau_2) + g(t)\,\Delta/\tau_2$$
$$g(t + \Delta) = g(t)\,(1 - \Delta/\tau_1) + u(t)\,(1 + \sigma_c \varepsilon_t)\,\Delta/\tau_1$$
74 LQG Multiplicative Noise - Example (2). Dynamics in matrix formulation:
$$x_{t+1} = A x_t + B u_t + \varepsilon_t C_1 u_t, \qquad x_t = \begin{bmatrix} p(t) & \dot{p}(t) & f(t) & g(t) & p^* \end{bmatrix}^T$$
$$A = \begin{bmatrix} 1 & \Delta & 0 & 0 & 0 \\ 0 & 1 & \Delta/m & 0 & 0 \\ 0 & 0 & 1 - \Delta/\tau_2 & \Delta/\tau_2 & 0 \\ 0 & 0 & 0 & 1 - \Delta/\tau_1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \Delta/\tau_1 \\ 0 \end{bmatrix}, \quad C_1 = B \sigma_c$$
Feedback in matrix formulation: $y_t = H x_t + \omega_t$.
75 LQG Multiplicative Noise - Example (3). Total cost:
$$\underbrace{(p(t_{\mathrm{end}}) - p^*)^2}_{(1)} + \underbrace{(w_v\, \dot{p}(t_{\mathrm{end}}))^2}_{(2)} + \underbrace{(w_f\, f(t_{\mathrm{end}}))^2}_{(3)} + \underbrace{\frac{r}{n-1} \sum_{k=1}^{n-1} u^2(k\Delta)}_{(4)}$$
(1) penalizes deviations from the target position; (2) + (3) enforce that the movement must be finished at $t_{\mathrm{end}}$; (4) ensures energy minimization; $w_v$, $w_f$ and $r$ are the corresponding weights.
76 LQG Multiplicative Noise - Example (4). Cost per step in matrix formulation: define $p = \begin{bmatrix} 1 & 0 & 0 & 0 & -1 \end{bmatrix}^T$, so that $p(t_{\mathrm{end}}) - p^* = p^T x_t$; term (1) can then be expressed as $x_t^T (p p^T)\, x_t$. For terms (2) and (3) we use $v = \begin{bmatrix} 0 & w_v & 0 & 0 & 0 \end{bmatrix}^T$ and $f = \begin{bmatrix} 0 & 0 & w_f & 0 & 0 \end{bmatrix}^T$. This leads to $l(x, u) = x_t^T Q_t x_t + u_t^T R u_t$ with $Q_{1,\dots,n-1} = 0$, $Q_n = p p^T + v v^T + f f^T$ and $R = r$.
77 LQG Multiplicative Noise - Example (5). Resulting trajectories: smooth trajectories emerge without modeling smoothness in the costs. The system can be unstable, but this was not encountered in the problems the author is dealing with.
78 Optimal Feedback Control - Minimal Intervention Principle
79 Minimal Intervention Principle - Definition. Definition: ignore task-irrelevant deviations.
Simple example: $x_1$, $x_2$ are uncoupled state variables; the states are driven by controls $u_1$, $u_2$ with control-multiplicative noise; the initial state is sampled from a circular Gaussian.
80 Minimal Intervention Principle - Example 1. Task: achieve $x_1 + x_2 = \text{target}$ using small $u_1$, $u_2$.
Optimum: $u_1 = u_2$; the control law depends only on $x_1 + x_2$; $u_1$, $u_2$ form a motor synergy.
Result: the black ellipse shows the distribution of final states.
81 Minimal Intervention Principle - Example 2. Alternative control law: drive $x_1 = x_2 = \text{target}/2$.
Results: the gray circle shows the distribution of final states; the variance in the redundant direction is reduced, while the variance in the task-relevant direction is increased; the control signals are increased (hence not optimal).
82 Optimal Feedback Control - Hierarchical Optimal Controller
83 Hierarchical Optimal Controller - Overview. Principle: the low-level controller generates an abstract representation $y(x)$ of the state $x$; the high-level controller generates commands $v(y)$ to change $y$; the low-level controller computes energy-efficient controls $u(v, x)$ consistent with $v$.
Comparison with example 1 of the minimal intervention principle: $y = x_1 + x_2$, $v = f(y)$, $u = \begin{bmatrix} v & v \end{bmatrix}^T$.
84 Optimal Feedback Control - Conclusion
85 Conclusion - Summary. We talked about:
- Markov decision processes (MDP) and cost-to-go formulations
- continuous-time stochastic systems
- the Linear Quadratic Gaussian framework (LQG)
- LQG + Kalman filter
- LQG with multiplicative noise
- the minimal intervention principle
- the hierarchical optimal controller
86 Outline
- Introduction: definition; challenges of sensorimotor control; Bayesian integration in motor control; cost functions
- Optimal Estimation
- Optimal Feedback Control: introduction; Linear Quadratic Gaussian framework (LQG); LQG + Kalman filter; LQG with multiplicative noise; minimal intervention principle; hierarchical optimal controller
- Conclusion
- References
87 References
[1] Konrad P. Körding, Daniel M. Wolpert, "Bayesian decision theory in sensorimotor control," Trends in Cognitive Sciences, vol. 10, no. 7, July 2006.
[2] Konrad P. Körding, Daniel M. Wolpert, "Bayesian integration in sensorimotor learning," Nature, vol. 427, January 2004.
[3] Emanuel Todorov, "Optimality principles in sensorimotor control," Nature Neuroscience, vol. 7, no. 9, September 2004.
[4] Emanuel Todorov, "Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system," Neural Computation, vol. 17, no. 5, May 2005.
[5] Kenji Doya et al., Bayesian Brain, chapter "Optimal Control Theory," MIT Press, 2006.
[6] Emanuel Todorov, lecture "Intelligent control through learning and optimization," accessed 14 May 2012.
More information6 OUTPUT FEEDBACK DESIGN
6 OUTPUT FEEDBACK DESIGN When the whole sate vector is not available for feedback, i.e, we can measure only y = Cx. 6.1 Review of observer design Recall from the first class in linear systems that a simple
More informationStochastic and Adaptive Optimal Control
Stochastic and Adaptive Optimal Control Robert Stengel Optimal Control and Estimation, MAE 546 Princeton University, 2018! Nonlinear systems with random inputs and perfect measurements! Stochastic neighboring-optimal
More informationSubject: Optimal Control Assignment-1 (Related to Lecture notes 1-10)
Subject: Optimal Control Assignment- (Related to Lecture notes -). Design a oil mug, shown in fig., to hold as much oil possible. The height and radius of the mug should not be more than 6cm. The mug must
More informationLecture 5 Linear Quadratic Stochastic Control
EE363 Winter 2008-09 Lecture 5 Linear Quadratic Stochastic Control linear-quadratic stochastic control problem solution via dynamic programming 5 1 Linear stochastic system linear dynamical system, over
More informationPontryagin s maximum principle
Pontryagin s maximum principle Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2012 Emo Todorov (UW) AMATH/CSE 579, Winter 2012 Lecture 5 1 / 9 Pontryagin
More informationLecture 5: Control Over Lossy Networks
Lecture 5: Control Over Lossy Networks Yilin Mo July 2, 2015 1 Classical LQG Control The system: x k+1 = Ax k + Bu k + w k, y k = Cx k + v k x 0 N (0, Σ), w k N (0, Q), v k N (0, R). Information available
More information6.241 Dynamic Systems and Control
6.241 Dynamic Systems and Control Lecture 24: H2 Synthesis Emilio Frazzoli Aeronautics and Astronautics Massachusetts Institute of Technology May 4, 2011 E. Frazzoli (MIT) Lecture 24: H 2 Synthesis May
More informationPartially Observable Markov Decision Processes (POMDPs)
Partially Observable Markov Decision Processes (POMDPs) Sachin Patil Guest Lecture: CS287 Advanced Robotics Slides adapted from Pieter Abbeel, Alex Lee Outline Introduction to POMDPs Locally Optimal Solutions
More informationOptimal Control. Lecture 18. Hamilton-Jacobi-Bellman Equation, Cont. John T. Wen. March 29, Ref: Bryson & Ho Chapter 4.
Optimal Control Lecture 18 Hamilton-Jacobi-Bellman Equation, Cont. John T. Wen Ref: Bryson & Ho Chapter 4. March 29, 2004 Outline Hamilton-Jacobi-Bellman (HJB) Equation Iterative solution of HJB Equation
More informationOPTIMAL CONTROL. Sadegh Bolouki. Lecture slides for ECE 515. University of Illinois, Urbana-Champaign. Fall S. Bolouki (UIUC) 1 / 28
OPTIMAL CONTROL Sadegh Bolouki Lecture slides for ECE 515 University of Illinois, Urbana-Champaign Fall 2016 S. Bolouki (UIUC) 1 / 28 (Example from Optimal Control Theory, Kirk) Objective: To get from
More informationLinearly-Solvable Stochastic Optimal Control Problems
Linearly-Solvable Stochastic Optimal Control Problems Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2014 Emo Todorov (UW) AMATH/CSE 579, Winter 2014
More informationHJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011
Department of Probability and Mathematical Statistics Faculty of Mathematics and Physics, Charles University in Prague petrasek@karlin.mff.cuni.cz Seminar in Stochastic Modelling in Economics and Finance
More informationDeterministic Dynamic Programming
Deterministic Dynamic Programming 1 Value Function Consider the following optimal control problem in Mayer s form: V (t 0, x 0 ) = inf u U J(t 1, x(t 1 )) (1) subject to ẋ(t) = f(t, x(t), u(t)), x(t 0
More informationEE C128 / ME C134 Final Exam Fall 2014
EE C128 / ME C134 Final Exam Fall 2014 December 19, 2014 Your PRINTED FULL NAME Your STUDENT ID NUMBER Number of additional sheets 1. No computers, no tablets, no connected device (phone etc.) 2. Pocket
More information1 Kalman Filter Introduction
1 Kalman Filter Introduction You should first read Chapter 1 of Stochastic models, estimation, and control: Volume 1 by Peter S. Maybec (available here). 1.1 Explanation of Equations (1-3) and (1-4) Equation
More informationOptimal Control. McGill COMP 765 Oct 3 rd, 2017
Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps
More informationPath Integral Stochastic Optimal Control for Reinforcement Learning
Preprint August 3, 204 The st Multidisciplinary Conference on Reinforcement Learning and Decision Making RLDM203 Path Integral Stochastic Optimal Control for Reinforcement Learning Farbod Farshidian Institute
More informationRobotics: Science & Systems [Topic 6: Control] Prof. Sethu Vijayakumar Course webpage:
Robotics: Science & Systems [Topic 6: Control] Prof. Sethu Vijayakumar Course webpage: http://wcms.inf.ed.ac.uk/ipab/rss Control Theory Concerns controlled systems of the form: and a controller of the
More informationOptimal Control with Learned Forward Models
Optimal Control with Learned Forward Models Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt 1 Where we are? Reinforcement Learning Data = {(x i, u i, x i+1, r i )}} x u xx r u xx V (x) π (u x) Now V
More informationPartially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS
Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Many slides adapted from Jur van den Berg Outline POMDPs Separation Principle / Certainty Equivalence Locally Optimal
More informationA Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley
A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study
More information= m(0) + 4e 2 ( 3e 2 ) 2e 2, 1 (2k + k 2 ) dt. m(0) = u + R 1 B T P x 2 R dt. u + R 1 B T P y 2 R dt +
ECE 553, Spring 8 Posted: May nd, 8 Problem Set #7 Solution Solutions: 1. The optimal controller is still the one given in the solution to the Problem 6 in Homework #5: u (x, t) = p(t)x k(t), t. The minimum
More informationLQR, Kalman Filter, and LQG. Postgraduate Course, M.Sc. Electrical Engineering Department College of Engineering University of Salahaddin
LQR, Kalman Filter, and LQG Postgraduate Course, M.Sc. Electrical Engineering Department College of Engineering University of Salahaddin May 2015 Linear Quadratic Regulator (LQR) Consider a linear system
More informationOptimal control and estimation
Automatic Control 2 Optimal control and estimation Prof. Alberto Bemporad University of Trento Academic year 2010-2011 Prof. Alberto Bemporad (University of Trento) Automatic Control 2 Academic year 2010-2011
More informationCALIFORNIA INSTITUTE OF TECHNOLOGY Control and Dynamical Systems. CDS 110b
CALIFORNIA INSTITUTE OF TECHNOLOGY Control and Dynamical Systems CDS 110b R. M. Murray Kalman Filters 25 January 2006 Reading: This set of lectures provides a brief introduction to Kalman filtering, following
More informationReinforcement Learning In Continuous Time and Space
Reinforcement Learning In Continuous Time and Space presentation of paper by Kenji Doya Leszek Rybicki lrybicki@mat.umk.pl 18.07.2008 Leszek Rybicki lrybicki@mat.umk.pl Reinforcement Learning In Continuous
More informationLecture 6: Multiple Model Filtering, Particle Filtering and Other Approximations
Lecture 6: Multiple Model Filtering, Particle Filtering and Other Approximations Department of Biomedical Engineering and Computational Science Aalto University April 28, 2010 Contents 1 Multiple Model
More informationPractical numerical methods for stochastic optimal control of biological systems in continuous time and space
Practical numerical methods for stochastic optimal control of biological systems in continuous time and space Alex Simpkins, and Emanuel Todorov Abstract In previous studies it has been suggested that
More informationTopic # Feedback Control Systems
Topic #17 16.31 Feedback Control Systems Deterministic LQR Optimal control and the Riccati equation Weight Selection Fall 2007 16.31 17 1 Linear Quadratic Regulator (LQR) Have seen the solutions to the
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More informationStochastic Optimal Control and Estimation Methods Adapted to the Noise Characteristics of the Sensorimotor System
LETTER Communicated by Tamar Flash Stochastic Optimal Control and Estimation Methods Adapted to the Noise Characteristics of the Sensorimotor System Emanuel Todorov todorov@cogsci.ucsd.edu Department of
More informationStochastic optimal control methods for uncertain predictive reaching movements
Stochastic optimal control methods for uncertain predictive reaching movements Alex Simpkins, Dan Liu, and Emo Todorov Abstract People learn from the varying environment, and adapt their control strategy
More informationFinal Exam Solutions
EE55: Linear Systems Final Exam SIST, ShanghaiTech Final Exam Solutions Course: Linear Systems Teacher: Prof. Boris Houska Duration: 85min YOUR NAME: (type in English letters) I Introduction This exam
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of
More informationState estimation and the Kalman filter
State estimation and the Kalman filter PhD, David Di Ruscio Telemark university college Department of Technology Systems and Control Engineering N-3914 Porsgrunn, Norway Fax: +47 35 57 52 50 Tel: +47 35
More informationECE7850 Lecture 7. Discrete Time Optimal Control and Dynamic Programming
ECE7850 Lecture 7 Discrete Time Optimal Control and Dynamic Programming Discrete Time Optimal control Problems Short Introduction to Dynamic Programming Connection to Stabilization Problems 1 DT nonlinear
More informationOrganization. I MCMC discussion. I project talks. I Lecture.
Organization I MCMC discussion I project talks. I Lecture. Content I Uncertainty Propagation Overview I Forward-Backward with an Ensemble I Model Reduction (Intro) Uncertainty Propagation in Causal Systems
More informationD(s) G(s) A control system design definition
R E Compensation D(s) U Plant G(s) Y Figure 7. A control system design definition x x x 2 x 2 U 2 s s 7 2 Y Figure 7.2 A block diagram representing Eq. (7.) in control form z U 2 s z Y 4 z 2 s z 2 3 Figure
More informationMiscellaneous. Regarding reading materials. Again, ask questions (if you have) and ask them earlier
Miscellaneous Regarding reading materials Reading materials will be provided as needed If no assigned reading, it means I think the material from class is sufficient Should be enough for you to do your
More informationState Regulator. Advanced Control. design of controllers using pole placement and LQ design rules
Advanced Control State Regulator Scope design of controllers using pole placement and LQ design rules Keywords pole placement, optimal control, LQ regulator, weighting matrixes Prerequisites Contact state
More informationOptimal Control. Quadratic Functions. Single variable quadratic function: Multi-variable quadratic function:
Optimal Control Control design based on pole-placement has non unique solutions Best locations for eigenvalues are sometimes difficult to determine Linear Quadratic LQ) Optimal control minimizes a quadratic
More informationSYSTEMTEORI - KALMAN FILTER VS LQ CONTROL
SYSTEMTEORI - KALMAN FILTER VS LQ CONTROL 1. Optimal regulator with noisy measurement Consider the following system: ẋ = Ax + Bu + w, x(0) = x 0 where w(t) is white noise with Ew(t) = 0, and x 0 is a stochastic
More informationLecture 6: Bayesian Inference in SDE Models
Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs
More informationCALIFORNIA INSTITUTE OF TECHNOLOGY Control and Dynamical Systems. CDS 110b
CALIFORNIA INSTITUTE OF TECHNOLOGY Control and Dynamical Systems CDS 110b R. M. Murray Kalman Filters 14 January 2007 Reading: This set of lectures provides a brief introduction to Kalman filtering, following
More informationEL2520 Control Theory and Practice
EL2520 Control Theory and Practice Lecture 8: Linear quadratic control Mikael Johansson School of Electrical Engineering KTH, Stockholm, Sweden Linear quadratic control Allows to compute the controller
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationOptimization-Based Control
Optimization-Based Control Richard M. Murray Control and Dynamical Systems California Institute of Technology DRAFT v1.7a, 19 February 2008 c California Institute of Technology All rights reserved. This
More informationProblem 1 Cost of an Infinite Horizon LQR
THE UNIVERSITY OF TEXAS AT SAN ANTONIO EE 5243 INTRODUCTION TO CYBER-PHYSICAL SYSTEMS H O M E W O R K # 5 Ahmad F. Taha October 12, 215 Homework Instructions: 1. Type your solutions in the LATEX homework
More informationLecture 1: Pragmatic Introduction to Stochastic Differential Equations
Lecture 1: Pragmatic Introduction to Stochastic Differential Equations Simo Särkkä Aalto University, Finland (visiting at Oxford University, UK) November 13, 2013 Simo Särkkä (Aalto) Lecture 1: Pragmatic
More informationMachine Learning 4771
Machine Learning 4771 Instructor: ony Jebara Kalman Filtering Linear Dynamical Systems and Kalman Filtering Structure from Motion Linear Dynamical Systems Audio: x=pitch y=acoustic waveform Vision: x=object
More informationRobotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning
Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Stefan Schaal Max-Planck-Institute for Intelligent Systems Tübingen, Germany & Computer Science, Neuroscience, &
More informationHierarchy. Will Penny. 24th March Hierarchy. Will Penny. Linear Models. Convergence. Nonlinear Models. References
24th March 2011 Update Hierarchical Model Rao and Ballard (1999) presented a hierarchical model of visual cortex to show how classical and extra-classical Receptive Field (RF) effects could be explained
More informationAutomatic Control II Computer exercise 3. LQG Design
Uppsala University Information Technology Systems and Control HN,FS,KN 2000-10 Last revised by HR August 16, 2017 Automatic Control II Computer exercise 3 LQG Design Preparations: Read Chapters 5 and 9
More informationGaussians. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics
Gaussians Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Outline Univariate Gaussian Multivariate Gaussian Law of Total Probability Conditioning
More informationMarkov Chain Monte Carlo Methods for Stochastic
Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013
More informationLecture 9 Nonlinear Control Design
Lecture 9 Nonlinear Control Design Exact-linearization Lyapunov-based design Lab 2 Adaptive control Sliding modes control Literature: [Khalil, ch.s 13, 14.1,14.2] and [Glad-Ljung,ch.17] Course Outline
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationMATH4406 (Control Theory) Unit 1: Introduction Prepared by Yoni Nazarathy, July 21, 2012
MATH4406 (Control Theory) Unit 1: Introduction Prepared by Yoni Nazarathy, July 21, 2012 Unit Outline Introduction to the course: Course goals, assessment, etc... What is Control Theory A bit of jargon,
More informationHomework Solution # 3
ECSE 644 Optimal Control Feb, 4 Due: Feb 17, 4 (Tuesday) Homework Solution # 3 1 (5%) Consider the discrete nonlinear control system in Homework # For the optimal control and trajectory that you have found
More informationOptimization-Based Control
Optimization-Based Control Richard M. Murray Control and Dynamical Systems California Institute of Technology DRAFT v2.1a, January 3, 2010 c California Institute of Technology All rights reserved. This
More informationAlberto Bressan. Department of Mathematics, Penn State University
Non-cooperative Differential Games A Homotopy Approach Alberto Bressan Department of Mathematics, Penn State University 1 Differential Games d dt x(t) = G(x(t), u 1(t), u 2 (t)), x(0) = y, u i (t) U i
More informationStochastic Optimal Control!
Stochastic Control! Robert Stengel! Robotics and Intelligent Systems, MAE 345, Princeton University, 2015 Learning Objectives Overview of the Linear-Quadratic-Gaussian (LQG) Regulator Introduction to Stochastic
More informationA Crash Course on Kalman Filtering
A Crash Course on Kalman Filtering Dan Simon Cleveland State University Fall 2014 1 / 64 Outline Linear Systems Probability State Means and Covariances Least Squares Estimation The Kalman Filter Unknown
More informationReinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it
More informationTime Series Analysis
Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture State space models, 1st part: Model: Sec. 10.1 The
More informationELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems
ELEC4631 s Lecture 2: Dynamic Control Systems 7 March 2011 Overview of dynamic control systems Goals of Controller design Autonomous dynamic systems Linear Multi-input multi-output (MIMO) systems Bat flight
More informationStatic and Dynamic Optimization (42111)
Static and Dynamic Optimization (421) Niels Kjølstad Poulsen Build. 0b, room 01 Section for Dynamical Systems Dept. of Applied Mathematics and Computer Science The Technical University of Denmark Email:
More informationEECS C128/ ME C134 Final Wed. Dec. 15, am. Closed book. Two pages of formula sheets. No calculators.
Name: SID: EECS C28/ ME C34 Final Wed. Dec. 5, 2 8- am Closed book. Two pages of formula sheets. No calculators. There are 8 problems worth points total. Problem Points Score 2 2 6 3 4 4 5 6 6 7 8 2 Total
More informationSolution of Stochastic Optimal Control Problems and Financial Applications
Journal of Mathematical Extension Vol. 11, No. 4, (2017), 27-44 ISSN: 1735-8299 URL: http://www.ijmex.com Solution of Stochastic Optimal Control Problems and Financial Applications 2 Mat B. Kafash 1 Faculty
More informationECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering
ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:
More informationUniformly Uniformly-ergodic Markov chains and BSDEs
Uniformly Uniformly-ergodic Markov chains and BSDEs Samuel N. Cohen Mathematical Institute, University of Oxford (Based on joint work with Ying Hu, Robert Elliott, Lukas Szpruch) Centre Henri Lebesgue,
More informationLecture 19 Observability and state estimation
EE263 Autumn 2007-08 Stephen Boyd Lecture 19 Observability and state estimation state estimation discrete-time observability observability controllability duality observers for noiseless case continuous-time
More informationKalman Filter. Man-Wai MAK
Kalman Filter Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S. Gannot and A. Yeredor,
More informationLearning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods
Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods Yaakov Engel Joint work with Peter Szabo and Dmitry Volkinshtein (ex. Technion) Why use GPs in RL? A Bayesian approach
More informationThe Kalman Filter ImPr Talk
The Kalman Filter ImPr Talk Ged Ridgway Centre for Medical Image Computing November, 2006 Outline What is the Kalman Filter? State Space Models Kalman Filter Overview Bayesian Updating of Estimates Kalman
More informationECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form
ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER KRISTOFFER P. NIMARK The Kalman Filter We will be concerned with state space systems of the form X t = A t X t 1 + C t u t 0.1 Z t
More informationLecture Notes: (Stochastic) Optimal Control
Lecture Notes: (Stochastic) Optimal ontrol Marc Toussaint Machine Learning & Robotics group, TU erlin Franklinstr. 28/29, FR 6-9, 587 erlin, Germany July, 2 Disclaimer: These notes are not meant to be
More informationTopic # /31 Feedback Control Systems. Analysis of Nonlinear Systems Lyapunov Stability Analysis
Topic # 16.30/31 Feedback Control Systems Analysis of Nonlinear Systems Lyapunov Stability Analysis Fall 010 16.30/31 Lyapunov Stability Analysis Very general method to prove (or disprove) stability of
More informationNonlinear Observer Design for Dynamic Positioning
Author s Name, Company Title of the Paper DYNAMIC POSITIONING CONFERENCE November 15-16, 2005 Control Systems I J.G. Snijders, J.W. van der Woude Delft University of Technology (The Netherlands) J. Westhuis
More informationNonlinear Model Predictive Control Tools (NMPC Tools)
Nonlinear Model Predictive Control Tools (NMPC Tools) Rishi Amrit, James B. Rawlings April 5, 2008 1 Formulation We consider a control system composed of three parts([2]). Estimator Target calculator Regulator
More informationECE557 Systems Control
ECE557 Systems Control Bruce Francis Course notes, Version.0, September 008 Preface This is the second Engineering Science course on control. It assumes ECE56 as a prerequisite. If you didn t take ECE56,
More informationReinforcement Learning with Reference Tracking Control in Continuous State Spaces
Reinforcement Learning with Reference Tracking Control in Continuous State Spaces Joseph Hall, Carl Edward Rasmussen and Jan Maciejowski Abstract The contribution described in this paper is an algorithm
More informationUCLA Chemical Engineering. Process & Control Systems Engineering Laboratory
Constrained Innite-time Optimal Control Donald J. Chmielewski Chemical Engineering Department University of California Los Angeles February 23, 2000 Stochastic Formulation - Min Max Formulation - UCLA
More informationEL 625 Lecture 10. Pole Placement and Observer Design. ẋ = Ax (1)
EL 625 Lecture 0 EL 625 Lecture 0 Pole Placement and Observer Design Pole Placement Consider the system ẋ Ax () The solution to this system is x(t) e At x(0) (2) If the eigenvalues of A all lie in the
More informationComputational Issues in Nonlinear Dynamics and Control
Computational Issues in Nonlinear Dynamics and Control Arthur J. Krener ajkrener@ucdavis.edu Supported by AFOSR and NSF Typical Problems Numerical Computation of Invariant Manifolds Typical Problems Numerical
More informationTheory and Implementation of Biomimetic Motor Controllers
Theory and Implementation of Biomimetic Motor Controllers Thesis submitted for the degree of Doctor of Philosophy by Yuval Tassa Submitted to the Senate of the Hebrew University of Jerusalem February 2011
More informationRobust control and applications in economic theory
Robust control and applications in economic theory In honour of Professor Emeritus Grigoris Kalogeropoulos on the occasion of his retirement A. N. Yannacopoulos Department of Statistics AUEB 24 May 2013
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationObservability for deterministic systems and high-gain observers
Observability for deterministic systems and high-gain observers design. Part 1. March 29, 2011 Introduction and problem description Definition of observability Consequences of instantaneous observability
More informationEE221A Linear System Theory Final Exam
EE221A Linear System Theory Final Exam Professor C. Tomlin Department of Electrical Engineering and Computer Sciences, UC Berkeley Fall 2016 12/16/16, 8-11am Your answers must be supported by analysis,
More informationNeural Networks Lecture 10: Fault Detection and Isolation (FDI) Using Neural Networks
Neural Networks Lecture 10: Fault Detection and Isolation (FDI) Using Neural Networks H.A. Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011.
More information