An efficient approach to stochastic optimal control. Bert Kappen SNN Radboud University Nijmegen the Netherlands


1 An efficient approach to stochastic optimal control
Bert Kappen, SNN, Radboud University Nijmegen, the Netherlands

2 Examples of control tasks Motor control Bert Kappen Pascal workshop, May

3 Examples of control tasks
Foraging

4 Examples of control tasks
Collaborating agents

5 Stochastic optimal control theory
Control: how to act (now) to optimize future rewards
- optimal solution is noise dependent
- computation is intractable
- tractable approaches are unimodal (LQ, deterministic)

6 Outline
- Control theory
- Path integral control theory
- Spontaneous symmetry breaking, timing of decisions
- Agents
- Summary
If time permits: Learning and neural implementation

7 Discrete time control
Consider the control of a discrete time dynamical system:
$$x_{t+1} = f(t, x_t, u_t), \quad t = 0, 1, \ldots, T \qquad (1)$$
Here $x_t$ is an $n$-dimensional vector describing the state of the system and $u_t$ is an $m$-dimensional vector that specifies the control or action at time $t$. Note that Eq. 1 describes noiseless dynamics. If we specify $x$ at $t = 0$ as $x_0$ and we specify a sequence of controls $u_{0:T} = u_0, u_1, \ldots, u_T$, we can compute the future states $x_1, \ldots, x_{T+1}$ recursively from Eq. 1.
Define a cost function that assigns a cost to each sequence of controls:
$$C(x_0, u_{0:T}) = \sum_{t=0}^{T} R(t, x_t, u_t) \qquad (2)$$
$R(t, x, u)$ is the cost associated with taking action $u$ at time $t$ in state $x$.
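As a minimal sketch of Eq. 1 and Eq. 2, the following rolls a noiseless system forward under a given control sequence and accumulates the cost. The scalar dynamics, the quadratic cost and the name `rollout_cost` are illustrative assumptions, not part of the slides.

```python
def rollout_cost(f, R, x0, controls):
    """Apply Eq. 1 recursively and sum the costs of Eq. 2.

    controls is the sequence u_0, ..., u_T; returns (states, total cost),
    where states holds x_0, ..., x_{T+1}.
    """
    states = [x0]
    cost = 0.0
    x = x0
    for t, u in enumerate(controls):
        cost += R(t, x, u)   # cost of acting with u in state x at time t
        x = f(t, x, u)       # noiseless transition x_{t+1} = f(t, x_t, u_t)
        states.append(x)
    return states, cost


# Hypothetical example: x_{t+1} = x_t + u_t with R = x^2 + u^2.
f = lambda t, x, u: x + u
R = lambda t, x, u: x * x + u * u
states, cost = rollout_cost(f, R, 1.0, [-0.5, -0.25, 0.0])
```

Any other choice of `f` and `R` with the same call signature can be substituted.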

8 Discrete time control
The problem of optimal control is to find the sequence $u_{0:T}$ that minimizes $C(x_0, u_{0:T})$.
The problem has a standard solution, known as dynamic programming. Introduce the optimal cost-to-go:
$$J(t, x_t) = \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s) \qquad (3)$$
which solves the optimal control problem from an intermediate time $t$ until the fixed end time $T$, starting at an arbitrary location $x_t$. The minimum of Eq. 2 is given by $J(0, x_0)$.

9 Discrete time control
One can recursively compute $J(t, x)$ from $J(t + 1, x)$ for all $x$ in the following way:
$$J(T + 1, x) = 0$$
$$\begin{aligned}
J(t, x_t) &= \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s) \\
&= \min_{u_t} \left( R(t, x_t, u_t) + \min_{u_{t+1:T}} \sum_{s=t+1}^{T} R(s, x_s, u_s) \right) \\
&= \min_{u_t} \left( R(t, x_t, u_t) + J(t + 1, x_{t+1}) \right)
\end{aligned}$$
The minimizers $u_{0:T}$ give the optimal control path.
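The backward recursion can be sketched on a small problem with finitely many states and controls. The integer state grid, the three controls and the cost below are hypothetical choices made only so that the recursion can be checked against exhaustive search over all control sequences.

```python
from itertools import product

STATES = range(-3, 4)     # hypothetical finite state space
CONTROLS = (-1, 0, 1)     # hypothetical finite control set
T = 3                     # controls are applied at t = 0, ..., T


def f(t, x, u):
    """Noiseless dynamics, clipped so the state stays on the grid."""
    return max(-3, min(3, x + u))


def R(t, x, u):
    """Illustrative cost: distance from the origin plus control effort."""
    return x * x + abs(u)


def dynamic_programming():
    """Compute J(t, x) backwards in time, starting from J(T + 1, x) = 0."""
    J = {T + 1: {x: 0.0 for x in STATES}}
    policy = {}
    for t in range(T, -1, -1):
        J[t], policy[t] = {}, {}
        for x in STATES:
            best_u = min(CONTROLS, key=lambda u: R(t, x, u) + J[t + 1][f(t, x, u)])
            policy[t][x] = best_u
            J[t][x] = R(t, x, best_u) + J[t + 1][f(t, x, best_u)]
    return J, policy


def brute_force(x0):
    """Minimize Eq. 2 directly by enumerating every control sequence."""
    best = float("inf")
    for controls in product(CONTROLS, repeat=T + 1):
        x, cost = x0, 0.0
        for t, u in enumerate(controls):
            cost += R(t, x, u)
            x = f(t, x, u)
        best = min(best, cost)
    return best


J, policy = dynamic_programming()
```

The recursion visits each (time, state, control) triple once, whereas the exhaustive search grows exponentially with the horizon.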

10 Continuous limit
The discrete time recursion, with $R\,dt$ the cost accumulated over the interval $dt$, is:
$$J(t, x_t) = \min_{u_t} \left( R(t, x_t, u_t)\,dt + J(t + dt, x_{t+dt}) \right)$$
In the limit of continuous time we get
$$J(t + dt, x_{t+dt}) = J(t, x_t) + dt\,\partial_t J(t, x_t) + dx\,\partial_x J(t, x_t), \qquad dx = f(x, u, t)\,dt$$
Thus,
$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(x, t) \right)$$
with boundary condition $J(x, T) = R(T, x) = \phi(x)$.

11 Example: Bang-bang control
The spring force $F = -z$ points towards the rest position. Control force $u$.
Newton's law $F = m\ddot{z}$ with $m = 1$:
$$\ddot{z} = -z + u$$
Control problem: Given initial position and velocity $z_i = \dot{z}_i = 0$ at time $t = 0$, find the control path $-1 < u(0 \to T) < 1$ such that $z(T)$ is maximal.

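12 Example: Bang-bang control
Introduce $x_1 = z$, $x_2 = \dot{z}$, then
$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -x_1 + u$$
The end cost is $\phi(x) = -x_1$ and $R(x, u, t) = 0$. The HJB takes the form:
$$-\partial_t J = \min_u \left( x_2 \frac{\partial J}{\partial x_1} - x_1 \frac{\partial J}{\partial x_2} + u \frac{\partial J}{\partial x_2} \right) = x_2 \frac{\partial J}{\partial x_1} - x_1 \frac{\partial J}{\partial x_2} - \left| \frac{\partial J}{\partial x_2} \right|$$
$$u = -\mathrm{sign}\left( \frac{\partial J}{\partial x_2} \right)$$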

13 Example: Bang-bang control
The solution is
$$J(t, x_1, x_2) = -\cos(t - T)\,x_1 + \sin(t - T)\,x_2 + \alpha(t)$$
$$u(t, x_1, x_2) = -\mathrm{sign}(\sin(t - T))$$
As an example consider $T = 2\pi$. Then the optimal control is
$$u = -1 \ \text{ for } 0 < t < \pi, \qquad u = 1 \ \text{ for } \pi < t < 2\pi$$
[Figure: $x_1$ and $x_2$ versus $t$.]
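A quick numerical check of this example, as a sketch: integrate $\ddot{z} = -z + u$ from rest with the bang-bang control for $T = 2\pi$ and compare the final position with the one reached under a constant push $u = 1$. The integrator (semi-implicit Euler) and the step count are assumptions made for illustration.

```python
import math


def simulate(u_of_t, T=2 * math.pi, steps=20000):
    """Integrate z'' = -z + u(t) from z = z' = 0 and return z(T)."""
    dt = T / steps
    z, v = 0.0, 0.0
    for k in range(steps):
        t = k * dt
        v += (-z + u_of_t(t)) * dt   # update the velocity first
        z += v * dt                  # then the position (semi-implicit Euler)
    return z


T = 2 * math.pi
bang_bang = lambda t: -math.copysign(1.0, math.sin(t - T))  # u = -sign(sin(t - T))
constant = lambda t: 1.0                                    # naive constant push

z_bang = simulate(bang_bang)
z_const = simulate(constant)
```

Solving the two phases by hand gives $z(2\pi) = 4$ for the bang-bang control and $z(2\pi) = 0$ for the constant push, which the simulation reproduces up to discretization error.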

14 Stochastic optimal control
Consider a stochastic dynamical system
$$dx = f(t, x, u)\,dt + d\xi$$
where $d\xi$ is Gaussian noise with $\langle d\xi^2 \rangle = \nu\,dt$.
The cost becomes an expectation:
$$C(t, x, u(t \to T)) = \left\langle \phi(x(T)) + \int_t^T d\tau\, R(\tau, x(\tau), u(\tau)) \right\rangle$$
over all stochastic trajectories starting at $x$ with control path $u(t \to T)$.
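The expectation can be approximated by averaging over simulated trajectories. The sketch below uses an Euler-Maruyama discretization; the function name `expected_cost`, the feedback form `policy(t, x)` and the test problem are assumptions chosen so that the answer is known in closed form.

```python
import math
import random


def expected_cost(f, R, phi, policy, x0, t0, T, nu,
                  n_steps=20, n_paths=10000, seed=0):
    """Monte Carlo estimate of <phi(x(T)) + int_t^T R dtau> under dx = f dt + dxi."""
    rng = random.Random(seed)
    dt = (T - t0) / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            t = t0 + k * dt
            u = policy(t, x)
            cost += R(t, x, u) * dt
            # Gaussian increment with variance nu * dt
            x += f(t, x, u) * dt + rng.gauss(0.0, math.sqrt(nu * dt))
        total += cost + phi(x)
    return total / n_paths


# Hypothetical check: f = u, zero control, R = u^2 / 2, phi = x^2.
# Then x(T) is Gaussian with mean x0 and variance nu * T, so the cost is x0^2 + nu * T.
estimate = expected_cost(
    f=lambda t, x, u: u,
    R=lambda t, x, u: 0.5 * u * u,
    phi=lambda x: x * x,
    policy=lambda t, x: 0.0,
    x0=1.0, t0=0.0, T=1.0, nu=1.0,
)
```

With these settings the estimate should be close to 2, up to sampling error.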

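15 Stochastic optimal control
We obtain a similar discrete time recursion:
$$J(t, x_t) = \min_{u_t} \left( R(t, x_t, u_t)\,dt + \left\langle J(t + dt, x_{t+dt}) \right\rangle \right)$$
In the limit of continuous time we get
$$\left\langle J(t + dt, x_{t+dt}) \right\rangle = J(t, x_t) + dt\,\partial_t J(t, x_t) + \langle dx \rangle\,\partial_x J(t, x_t) + \frac{1}{2} \langle dx^2 \rangle\,\partial_x^2 J(t, x_t)$$
$$\langle dx \rangle = f(x, u, t)\,dt, \qquad \langle dx^2 \rangle = \nu\,dt$$
Thus,
$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(x, t) + \frac{1}{2} \nu\,\partial_x^2 J(x, t) \right)$$
with boundary condition $J(x, T) = \phi(x)$.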

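16 Path integral control
Consider the special case:
$$f(t, x, u) = f(t, x) + u, \qquad R(t, x, u) = V(t, x) + \frac{1}{2} u^2$$
then
$$-\partial_t J = \min_u \left( \frac{1}{2} u^2 + V + (f + u)\,\partial_x J + \frac{1}{2} \nu\,\partial_x^2 J \right) = -\frac{1}{2} (\partial_x J)^2 + V + f\,\partial_x J + \frac{1}{2} \nu\,\partial_x^2 J$$
$$u = -\partial_x J(x, t)$$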

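17 Solution
1. Define $\Psi(x, t) = \exp(-J(x, t)/\nu)$, then
$$-\partial_t \Psi = -\frac{V}{\nu} \Psi + f\,\partial_x \Psi + \frac{1}{2} \nu\,\partial_x^2 \Psi = H\Psi, \qquad \Psi(x, T) = \exp(-\phi(x)/\nu)$$
2. Define the conditional probability $\rho(y, \tau | x, t)$, $\tau \ge t$, through a diffusion equation:
$$\partial_\tau \rho = -\frac{V}{\nu} \rho - \partial_y (f \rho) + \frac{1}{2} \nu\,\partial_y^2 \rho = H^\dagger \rho, \qquad \rho(y, t | x, t) = \delta(y - x)$$
3. By construction, $\int dy\, \rho(y, \tau | x, t)\, \Psi(y, \tau)$ is independent of $\tau$.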

18 4. Evaluate at $\tau = t$ and $\tau = T$:
$$\Psi(x, t) = \int dy\, \rho(y, T | x, t) \exp(-\phi(y)/\nu)$$
$\Psi$ gives $J$ gives $u$.
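To illustrate the chain "Ψ gives J gives u", consider the simplest case $f = 0$, $V = 0$, where $\rho(y, T | x, t)$ is a Gaussian with mean $x$ and variance $\nu (T - t)$. With a quadratic end cost $\phi(y) = \frac{1}{2} a y^2$ (an assumption made here so the integral has a closed form) the Gaussian integral gives $\Psi = (1 + a(T - t))^{-1/2} \exp\left(-a x^2 / (2\nu(1 + a(T - t)))\right)$. The sketch below compares this with a Monte Carlo average over end points; all function names are hypothetical.

```python
import math
import random


def psi_monte_carlo(x, t, T, nu, phi, n_samples=20000, seed=1):
    """Estimate Psi(x, t) = <exp(-phi(y)/nu)> with y drawn from free diffusion."""
    rng = random.Random(seed)
    sigma = math.sqrt(nu * (T - t))
    total = 0.0
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, sigma)
        total += math.exp(-phi(y) / nu)
    return total / n_samples


def psi_exact_quadratic(x, t, T, nu, a):
    """Closed-form Psi for phi(y) = a y^2 / 2 with f = 0 and V = 0."""
    s = 1.0 + a * (T - t)
    return math.exp(-a * x * x / (2.0 * nu * s)) / math.sqrt(s)


def cost_to_go_quadratic(x, t, T, nu, a):
    """J = -nu log Psi."""
    return -nu * math.log(psi_exact_quadratic(x, t, T, nu, a))


def control_quadratic(x, t, T, a):
    """u = -dJ/dx for the quadratic end cost."""
    return -a * x / (1.0 + a * (T - t))


a, nu = 2.0, 0.5
psi_mc = psi_monte_carlo(1.0, 0.0, 1.0, nu, lambda y: 0.5 * a * y * y)
psi_ex = psi_exact_quadratic(1.0, 0.0, 1.0, nu, a)
```

In this linear-quadratic case the control does not depend on $\nu$; the noise only enters $J$ through the additive term $\frac{1}{2} \nu \log(1 + a(T - t))$.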

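19 An example: double slit
$$dx = u\,dt + d\xi$$
$$C = \left\langle \frac{1}{2} x(T)^2 + \int_0^T d\tau \left( \frac{1}{2} u(\tau)^2 + V(x(\tau), \tau) \right) \right\rangle$$
$V(x, t = 1)$ implements a slit at an intermediate time $t = 1$.
$$\Psi(x, t) = \int dy\, \rho(y, T | x, t)\, \Psi(y, T)$$
can be solved in closed form.
[Figure: $J$ versus $x$ at several times $t$, including $t = 0$, $t = 0.99$ and $t = 1.01$.]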

20 The delayed choice
[Figure: sample trajectories, $x$ versus $t$.]
Obstacle avoidance requires a mechanism that determines when to decide. We take $V = 0$, $f = 0$ and $\phi(x) = \infty$ for all $x$, except for two narrow slits of infinitesimal size $\epsilon$ at $x = \pm 1$.

21 The delayed choice
We can compute $J$ exactly; it is given by
$$J(x, T) = -\nu \log \int dy\, \rho(y | x)\, e^{-\phi(y)/\nu} = \frac{1}{T} \left( \frac{1}{2} x^2 - \nu T \log 2 \cosh \frac{x}{\nu T} \right)$$
where $T$ is the time to reach the slits. The expression between brackets is a typical free energy with temperature $\nu T$.
Symmetry breaking at $\nu T = 1$ separates two qualitatively different behaviours.
[Figure: $J(x, t)$ versus $x$ for $T = 2$, $T = 1$ and $T = 0.5$.]
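The symmetry breaking can be read off from the curvature of $J$ at $x = 0$. The sketch below evaluates the expression for $J$ (terms independent of $x$ are dropped, as in the formula above), the resulting control $u = -\partial_x J = (\tanh(x/\nu T) - x)/T$, and the curvature $(1 - 1/\nu T)/T$; the function names are illustrative.

```python
import math


def J(x, T, nu):
    """Cost-to-go for two slits at x = +1 and x = -1, up to x-independent terms."""
    return (0.5 * x * x - nu * T * math.log(2.0 * math.cosh(x / (nu * T)))) / T


def u(x, T, nu):
    """Optimal control u = -dJ/dx."""
    return (math.tanh(x / (nu * T)) - x) / T


def curvature_at_zero(T, nu):
    """Second derivative of J at x = 0; its sign flips at nu * T = 1."""
    return (1.0 - 1.0 / (nu * T)) / T


far = curvature_at_zero(T=2.0, nu=1.0)     # nu * T > 1: single minimum at x = 0
near = curvature_at_zero(T=0.5, nu=1.0)    # nu * T < 1: x = 0 is a local maximum
```

For $\nu T > 1$ the control steers towards $x = 0$, so the choice between the slits is postponed; for $\nu T < 1$ it steers towards one of the two slits.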

22 The delayed choice
[Figure: sample trajectories for the stochastic and the deterministic case.]
The timing of the decision, that is, when the automaton decides to go left or right, is the consequence of spontaneous symmetry breaking.

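23 The diffusion process
$\rho(y, \tau | x, t)$ satisfies the diffusion equation:
$$\partial_\tau \rho = -\frac{V}{\nu} \rho - \partial_y (f \rho) + \frac{1}{2} \nu\,\partial_y^2 \rho, \qquad \tau \in [t, T], \qquad \rho(y, t | x, t) = \delta(y - x)$$
and can be sampled as
$$dy = f(y, t)\,dt + d\xi$$
$$y = y + dy, \quad \text{with probability } 1 - V(y, t)\,dt/\nu$$
$$y = \dagger \ \text{(the particle is removed)}, \quad \text{with probability } V(y, t)\,dt/\nu$$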
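A minimal sketch of this sampling scheme: each particle diffuses and is removed with probability $V\,dt/\nu$ per step, and $\Psi$ is estimated as the average of $\exp(-\phi(y)/\nu)$ over the particles, with removed particles contributing zero. The name `estimate_psi` and the constant-$V$ test case are assumptions; the step size must be small enough that $V\,dt/\nu \le 1$.

```python
import math
import random


def estimate_psi(x, t, T, nu, f, V, phi, n_steps=50, n_particles=10000, seed=2):
    """Estimate Psi(x, t) by forward diffusion with particle removal."""
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    total = 0.0
    for _ in range(n_particles):
        y, alive = x, True
        for k in range(n_steps):
            tau = t + k * dt
            # Remove the particle with probability V dt / nu.
            if rng.random() < V(y, tau) * dt / nu:
                alive = False
                break
            # Otherwise take a drift-plus-noise step.
            y += f(y, tau) * dt + rng.gauss(0.0, math.sqrt(nu * dt))
        if alive:
            total += math.exp(-phi(y) / nu)
    return total / n_particles


# Hypothetical check: constant V = 1, f = 0, phi = 0 gives Psi = exp(-(T - t) / nu).
psi_const = estimate_psi(
    0.0, 0.0, 1.0, 1.0,
    f=lambda y, tau: 0.0,
    V=lambda y, tau: 1.0,
    phi=lambda y: 0.0,
)
```

An equivalent variant keeps every particle and multiplies its weight by $\exp(-V\,dt/\nu)$ at each step instead of removing it.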

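24 The diffusion process
The diffusion process can be written as a path integral:
$$\rho(y, T | x, t) = \int [dx]_x^y \exp\left( -\frac{1}{\nu} S_{\text{path}}(x(t \to T)) \right)$$
$$S_{\text{path}}(x(t \to T)) = \int_t^T d\tau \left( \frac{1}{2} \left( \dot{x}(\tau) - f(x(\tau), \tau) \right)^2 + V(x(\tau), \tau) \right)$$
[Figure: a path from $x$ at time $t$ to $y$ at time $t_f$.]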

25 The path integral formulation
$$\Psi(x, t) = \int dy\, \rho(y, T | x, t) \exp\left( -\frac{\phi(y)}{\nu} \right) = \int [dx]_x \exp\left( -\frac{1}{\nu} S(x(t \to T)) \right)$$
$$S(x(t \to T)) = S_{\text{path}}(x(t \to T)) + \phi(x(T))$$
$\Psi$ is a partition sum and $J = -\nu \log \Psi$ can therefore be interpreted as a free energy. $S$ is the energy of a path and $\nu$ the temperature.
The corresponding probability distribution is
$$p(x(t \to T) | x, t) = \frac{1}{\Psi(x, t)} \exp\left( -\frac{1}{\nu} S(x(t \to T)) \right)$$

26 Gibbs sampling
Sample paths $x_{0:n}$ from
$$p(x_{0:n}) \propto \exp(-S(x_{0:n})/\nu)$$
End cost $\phi(x_n)$ centered on target. Path cost $V(x)$ for obstacles.
$$J(x, t) = -\nu \log \left( \frac{1}{N} \sum_{i=1}^{N} \exp(-S(x^i_{0:n})/\nu) \right)$$
$$u\,dt = \frac{\exp(J/\nu)}{N} \sum_{i=1}^{N} \exp(-S(x^i_{0:n})/\nu)\, d\xi^i$$
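The two estimators can be sketched with paths drawn from the uncontrolled diffusion $dx = d\xi$ (an assumption: $f = 0$, and plain forward sampling is used here instead of Gibbs sampling). In that case the kinetic part of the action is accounted for by the sampling itself, so the weight of a path only contains the path cost $V$ and the end cost $\phi$, and $d\xi^i$ is the first noise increment of path $i$. Names and parameter values are illustrative.

```python
import math
import random


def path_integral_control(x, t, T, nu, V, phi, n_steps=10, n_paths=40000, seed=3):
    """Return Monte Carlo estimates of J(x, t) and u(x, t) from N sampled paths."""
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    weights, first_noise = [], []
    for _ in range(n_paths):
        y, S = x, 0.0
        dxi_first = 0.0
        for k in range(n_steps):
            dxi = rng.gauss(0.0, math.sqrt(nu * dt))
            if k == 0:
                dxi_first = dxi          # noise applied at the current state
            S += V(y, t + k * dt) * dt   # accumulate the path cost
            y += dxi
        S += phi(y)                      # add the end cost
        weights.append(math.exp(-S / nu))
        first_noise.append(dxi_first)
    mean_w = sum(weights) / n_paths
    J = -nu * math.log(mean_w)
    # u dt = exp(J / nu) / N * sum_i exp(-S_i / nu) dxi_i, with exp(J / nu) = 1 / mean_w
    u_dt = sum(w * d for w, d in zip(weights, first_noise)) / (n_paths * mean_w)
    return J, u_dt / dt


# Hypothetical check with V = 0 and phi(y) = y^2 / 2, for which
# J = x^2 / (2 s) + (nu / 2) log s and u = -x / s with s = 1 + (T - t).
J_est, u_est = path_integral_control(
    1.0, 0.0, 1.0, 1.0, V=lambda y, tau: 0.0, phi=lambda y: 0.5 * y * y
)
```

The estimate of $u$ is noisier than that of $J$, because it averages increments of size $\sqrt{\nu\,dt}$ to recover a quantity of size $dt$.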

27 Coordination of agents
$n$ agents with independent dynamics
$$dx_\alpha = (f_\alpha(x_\alpha, t) + u_\alpha)\,dt + d\xi_\alpha, \qquad \alpha = 1, \ldots, n$$
should coordinate their actions to minimize a cost at a future time $t = T$:
$$\phi(y_1, \ldots, y_n), \qquad y_\alpha \in \{z_1, \ldots, z_k\}$$
and $\phi = \infty$ elsewhere.

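28 Coordination of agents
Then,
$$\Psi(x_1, \ldots, x_n, t) = \int dy_1 \ldots dy_n \prod_\alpha \rho(y_\alpha, T | x_\alpha, t) \exp(-\phi(y_1, \ldots, y_n)/\nu) = \sum_{\vec{y}} \exp(-E(\vec{y} | \vec{x}, t)/\nu)$$
so that $E(\vec{y} | \vec{x}, t) = \phi(\vec{y}) - \nu \sum_\alpha \log \rho(y_\alpha, T | x_\alpha, t)$, and
$$p(\vec{y}) = \frac{1}{Z} \exp(-E(\vec{y} | \vec{x}, t)/\nu)$$
$$u_\alpha(\vec{x}, t) = -\partial_{x_\alpha} J = \nu \left\langle \frac{\partial \log \rho(y_\alpha, T | x_\alpha, t)}{\partial x_\alpha} \right\rangle$$
with $\vec{x} = (x_1, \ldots, x_n)$, $\vec{y} = (y_1, \ldots, y_n)$ and the average taken over $p(\vec{y})$. $E$ has a graphical model structure if $\phi$ has.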

29 Pseudo code
Loop:
1. Compute the cost and its log derivative for each agent to move to each target:
$$\rho(z_i, T | x_\alpha, t), \qquad i = 1, \ldots, k, \quad \alpha = 1, \ldots, n$$
This path integral can be estimated using MC sampling or variational approximation.
2. Compute $u_\alpha$ using graphical model inference in $p(\vec{y})$ (exact, BP, MF).

30 A simple 1d example
Intrinsic dynamics $f_\alpha = 0$, $V(x_1, \ldots, x_n) = 0$:
$$p(y_\alpha, T | x_\alpha, t) \propto \exp\left( -\frac{(y_\alpha - x_\alpha)^2}{2\nu(T - t)} \right)$$
End cost $\phi(y_1, \ldots, y_n) = \sum_{j=1}^{k} (n_j(\vec{y}) - n_j)^2$, with $n_j(\vec{y})$ the number of agents that go to target $j$.
The optimal control for agent $\alpha$ is
$$u_\alpha = \frac{1}{T - t} \left( \langle y_\alpha \rangle - x_\alpha \right)$$
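For a handful of agents the expectation $\langle y_\alpha \rangle$ can be computed exactly by enumerating all $k^n$ assignments of agents to targets. The sketch below does this for the end cost above; the function names, the two-agent configuration and the parameter values are hypothetical.

```python
import math
from itertools import product


def expected_targets(x, targets, desired, nu, time_left):
    """Return <y_alpha> for every agent by exact enumeration over assignments."""
    n, k = len(x), len(targets)
    weights = {}
    for assign in product(range(k), repeat=n):
        counts = [assign.count(j) for j in range(k)]
        # End cost: squared mismatch between actual and desired occupancy.
        phi = sum((c - d) ** 2 for c, d in zip(counts, desired))
        # Log of the product of Gaussian transition densities (up to a constant).
        log_rho = sum(-(targets[j] - xa) ** 2 / (2.0 * nu * time_left)
                      for j, xa in zip(assign, x))
        weights[assign] = math.exp(log_rho - phi / nu)
    Z = sum(weights.values())
    return [sum(w * targets[a[i]] for a, w in weights.items()) / Z for i in range(n)]


def controls(x, targets, desired, nu, time_left):
    """u_alpha = (<y_alpha> - x_alpha) / (T - t)."""
    y_mean = expected_targets(x, targets, desired, nu, time_left)
    return [(ym - xa) / time_left for ym, xa in zip(y_mean, x)]


# Two agents, two targets, one agent wanted at each target.
y_mean = expected_targets([-0.2, 0.2], [-1.0, 1.0], [1, 1], nu=0.1, time_left=1.0)
u = controls([-0.2, 0.2], [-1.0, 1.0], [1, 1], nu=0.1, time_left=1.0)
```

Exact enumeration scales as $k^n$; it stands in here for the approximate inference methods (BP, MF) mentioned in the pseudo code.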

31 A simple 1d example
[Figure: (a) agent predicted target $\langle y_\alpha \rangle$ versus $t$; (b) agent position $x$ versus $t$.]

32 A simple 1d example
[Figure, control cost: cost difference versus noise for greedy control (red), MF control (blue) and BP control (green).]
[Figure, CPU time: CPU time versus number of agents for exact control (black), MF control (blue), BP control (green) and greedy control (red).]

33 Nonlinear Coordination
Agents $a = 1, \ldots, n$ in 2D:
$$dx_a(t) = v_a(t) \cos \varphi_a(t)\,dt$$
$$dy_a(t) = v_a(t) \sin \varphi_a(t)\,dt$$
$$dv_a(t) = u_a(t)\,dt + d\xi_a(t)$$
$$d\varphi_a(t) = \omega_a(t)\,dt + d\zeta_a(t)$$
Initial states O: $v_a(0) = 0$, $\varphi_a(0) = 0$
Targets X: $v_a(T) = 0$, $\varphi_a(T) = 0$
Sample paths specified at $t_i = t + i\,dt$, $i = 0, \ldots, 6$, $dt = (T - t)/6$
[Figure: example of 10 agents and 10 targets.]
[Figure: sample paths.]

34 Computation Time
Inference methods: Junction Tree (JT) and MF (100 sample paths per agent-target)
[Figure: CPU time (s) versus number of agents (# agents = # targets).]
- JT: exponential in number of agents (intractable for # agents > 10)
- MF: polynomial in number of agents

35 Summary
A restricted class of control problems can be reformulated in statistical physics language.
- path integrals
- symmetry breaking
- efficient computation (MCMC, BP, MF, EP)
- coordination of agents
Future:
- Robotics in dynamical environment
- Learning/exploration

36 Further reading
- H.J. Kappen, Physical Review Letters (2005)
- H.J. Kappen, Journal of Statistical Mechanics: Theory and Experiment, November 2005, P11011
- W. Wiegerinck, B. van den Broek, H.J. Kappen, Proceedings UAI (2006)
- H.J. Kappen, 9th Granada Seminar on Computational Physics: Computational and Mathematical Modeling of Cooperative Behavior in Neural Systems, American Institute of Physics (2007)

37 Learning
Model-based: first learn a model and then do optimal control.
Model-free: interleave learning and optimal control
- more natural biologically and for AI
- problem of exploration-exploitation: intermediate control is suboptimal
Control theory does not address exploration.
RL/actor critic approach: exploration = exploitation + noise
PI control: exploration is forward diffusion.
$$\Psi(x_i, 0) = \exp\left( -\frac{dt}{\lambda} \sum_{j=i}^{i+n} V(x_j) \right)$$
with $T = n\,dt$ and $x_j$, $j = i, i + 1, \ldots, i + n$ the states visited after state $x_i$.
Exploration can be optimized as in importance sampling.

38 Learning
[Figure. Left: histogram of visited points versus $x$. Right: $V$ and $J$ versus $x$ for $T = 3$ and $T = 10$, with curves labelled T*V, J mc and J lp.]
Sampling of $J(x)$ with one trajectory of $N = 8000$ iterations starting at $x = 0$. Left: The diffusion process $dx = d\xi$ explores the area between $x = -7.5$ and $x = 6$. Shown is a histogram of the points visited (300 bins). In each bin $x$, an estimate of $\psi(x)$ is made by averaging all $\psi(x_i)$ with $x_i$ from bin $x$. Right: $V(x)$ and $J_T(x)/T$ versus $x$ for $T = 3$ and $T = 10$.

39 A neural implementation/thinking ahead
Topological map represents space $x$. Neuron $i$ is active when the animal is at $x$.
$$\frac{d\rho_i}{dt} = -\frac{V_i}{\lambda} \rho_i(t) + \frac{\nu}{2} \sum_j D_{ij}\, \rho_j(t)$$
with $D$ the diffusion matrix $D_{ii} = -2$, $D_{i,i+1} = D_{i,i-1} = 1$ and all other entries of $D$ zero. $V_i$ is the immediate reward at location $i$. Some mechanism ensures $\sum_i \rho_i(t) = 1$.
[Figure: $\rho$ for several values of $T$, including $T = 0.1$ and $T = 5$.]
Thinking ahead. When the animal is at $x_1$ it can start the diffusion dynamics to anticipate what will happen in the future.
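The network dynamics can be sketched with a forward Euler integration on a one-dimensional chain of neurons. The slide leaves open which mechanism keeps $\sum_i \rho_i = 1$; here the activity is simply renormalized after every step, which is an assumption, as are the chain length, the values of $V_i$ and the step size.

```python
def think_ahead(V, start, nu=1.0, lam=1.0, T=5.0, dt=0.01):
    """Integrate d rho_i/dt = -(V_i / lam) rho_i + (nu / 2) sum_j D_ij rho_j."""
    n = len(V)
    rho = [0.0] * n
    rho[start] = 1.0                      # all activity at the animal's location
    for _ in range(int(round(T / dt))):
        new = []
        for i in range(n):
            left = rho[i - 1] if i > 0 else 0.0
            right = rho[i + 1] if i < n - 1 else 0.0
            # D_ii = -2 and D_{i,i+1} = D_{i,i-1} = 1 give a discrete Laplacian.
            drho = -V[i] / lam * rho[i] + 0.5 * nu * (left - 2.0 * rho[i] + right)
            new.append(rho[i] + dt * drho)
        total = sum(new)
        rho = [r / total for r in new]    # assumed normalization mechanism
    return rho


# Hypothetical map of 21 locations with V = 1 on the left half and V = 0 elsewhere.
V = [1.0 if i < 10 else 0.0 for i in range(21)]
rho = think_ahead(V, start=10)
```

Activity decays where $V_i$ is large, so after some time the normalized activity concentrates on the side of the map where $V_i = 0$.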


More information

Recitation 9: Loopy BP

Recitation 9: Loopy BP Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 204 Recitation 9: Loopy BP General Comments. In terms of implementation,

More information

This is a Gaussian probability centered around m = 0 (the most probable and mean position is the origin) and the mean square displacement m 2 = n,or

This is a Gaussian probability centered around m = 0 (the most probable and mean position is the origin) and the mean square displacement m 2 = n,or Physics 7b: Statistical Mechanics Brownian Motion Brownian motion is the motion of a particle due to the buffeting by the molecules in a gas or liquid. The particle must be small enough that the effects

More information

Introduction. Stochastic Processes. Will Penny. Stochastic Differential Equations. Stochastic Chain Rule. Expectations.

Introduction. Stochastic Processes. Will Penny. Stochastic Differential Equations. Stochastic Chain Rule. Expectations. 19th May 2011 Chain Introduction We will Show the relation between stochastic differential equations, Gaussian processes and methods This gives us a formal way of deriving equations for the activity of

More information

Math 211. Substitute Lecture. November 20, 2000

Math 211. Substitute Lecture. November 20, 2000 1 Math 211 Substitute Lecture November 20, 2000 2 Solutions to y + py + qy =0. Look for exponential solutions y(t) =e λt. Characteristic equation: λ 2 + pλ + q =0. Characteristic polynomial: λ 2 + pλ +

More information

Strong Markov property of determinantal processes associated with extended kernels

Strong Markov property of determinantal processes associated with extended kernels Strong Markov property of determinantal processes associated with extended kernels Hideki Tanemura Chiba university (Chiba, Japan) (November 22, 2013) Hideki Tanemura (Chiba univ.) () Markov process (November

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

4 The Continuous Time Fourier Transform

4 The Continuous Time Fourier Transform 96 4 The Continuous Time ourier Transform ourier (or frequency domain) analysis turns out to be a tool of even greater usefulness Extension of ourier series representation to aperiodic signals oundation

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

arxiv: v1 [cs.lg] 20 Sep 2010

arxiv: v1 [cs.lg] 20 Sep 2010 Approximate Inference and Stochastic Optimal Control Konrad Rawlik 1, Marc Toussaint 2, and Sethu Vijayakumar 1 1 Statistical Machine Learning and Motor Control Group, University of Edinburgh 2 Machine

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Mortality Surface by Means of Continuous Time Cohort Models

Mortality Surface by Means of Continuous Time Cohort Models Mortality Surface by Means of Continuous Time Cohort Models Petar Jevtić, Elisa Luciano and Elena Vigna Longevity Eight 2012, Waterloo, Canada, 7-8 September 2012 Outline 1 Introduction Model construction

More information

0.3.4 Burgers Equation and Nonlinear Wave

0.3.4 Burgers Equation and Nonlinear Wave 16 CONTENTS Solution to step (discontinuity) initial condition u(x, 0) = ul if X < 0 u r if X > 0, (80) u(x, t) = u L + (u L u R ) ( 1 1 π X 4νt e Y 2 dy ) (81) 0.3.4 Burgers Equation and Nonlinear Wave

More information

First order differential equations

First order differential equations First order differential equations Samy Tindel Purdue University Differential equations and linear algebra - MA 262 Taken from Differential equations and linear algebra by Goode and Annin Samy T. First

More information

ONR MURI AIRFOILS: Animal Inspired Robust Flight with Outer and Inner Loop Strategies. Calin Belta

ONR MURI AIRFOILS: Animal Inspired Robust Flight with Outer and Inner Loop Strategies. Calin Belta ONR MURI AIRFOILS: Animal Inspired Robust Flight with Outer and Inner Loop Strategies Provable safety for animal inspired agile flight Calin Belta Hybrid and Networked Systems (HyNeSs) Lab Department of

More information

Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory

More information

Dynamical systems with Gaussian and Levy noise: analytical and stochastic approaches

Dynamical systems with Gaussian and Levy noise: analytical and stochastic approaches Dynamical systems with Gaussian and Levy noise: analytical and stochastic approaches Noise is often considered as some disturbing component of the system. In particular physical situations, noise becomes

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15

More information

2.152 Course Notes Contraction Analysis MIT, 2005

2.152 Course Notes Contraction Analysis MIT, 2005 2.152 Course Notes Contraction Analysis MIT, 2005 Jean-Jacques Slotine Contraction Theory ẋ = f(x, t) If Θ(x, t) such that, uniformly x, t 0, F = ( Θ + Θ f x )Θ 1 < 0 Θ(x, t) T Θ(x, t) > 0 then all solutions

More information

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles

More information

Autonomous Helicopter Flight via Reinforcement Learning

Autonomous Helicopter Flight via Reinforcement Learning Autonomous Helicopter Flight via Reinforcement Learning Authors: Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, Shankar Sastry Presenters: Shiv Ballianda, Jerrolyn Hebert, Shuiwang Ji, Kenley Malveaux, Huy

More information

Physics 202 Laboratory 3. Root-Finding 1. Laboratory 3. Physics 202 Laboratory

Physics 202 Laboratory 3. Root-Finding 1. Laboratory 3. Physics 202 Laboratory Physics 202 Laboratory 3 Root-Finding 1 Laboratory 3 Physics 202 Laboratory The fundamental question answered by this week s lab work will be: Given a function F (x), find some/all of the values {x i }

More information

Linearly-Solvable Stochastic Optimal Control Problems

Linearly-Solvable Stochastic Optimal Control Problems Linearly-Solvable Stochastic Optimal Control Problems Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2014 Emo Todorov (UW) AMATH/CSE 579, Winter 2014

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

Metric Spaces. Exercises Fall 2017 Lecturer: Viveka Erlandsson. Written by M.van den Berg

Metric Spaces. Exercises Fall 2017 Lecturer: Viveka Erlandsson. Written by M.van den Berg Metric Spaces Exercises Fall 2017 Lecturer: Viveka Erlandsson Written by M.van den Berg School of Mathematics University of Bristol BS8 1TW Bristol, UK 1 Exercises. 1. Let X be a non-empty set, and suppose

More information

Reflected Brownian Motion

Reflected Brownian Motion Chapter 6 Reflected Brownian Motion Often we encounter Diffusions in regions with boundary. If the process can reach the boundary from the interior in finite time with positive probability we need to decide

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

This homework will not be collected or graded. It is intended to help you practice for the final exam. Solutions will be posted.

This homework will not be collected or graded. It is intended to help you practice for the final exam. Solutions will be posted. 6.003 Homework #14 This homework will not be collected or graded. It is intended to help you practice for the final exam. Solutions will be posted. Problems 1. Neural signals The following figure illustrates

More information

5. Sum-product algorithm

5. Sum-product algorithm Sum-product algorithm 5-1 5. Sum-product algorithm Elimination algorithm Sum-product algorithm on a line Sum-product algorithm on a tree Sum-product algorithm 5-2 Inference tasks on graphical models consider

More information

Supporting Information

Supporting Information Supporting Information A: Calculation of radial distribution functions To get an effective propagator in one dimension, we first transform 1) into spherical coordinates: x a = ρ sin θ cos φ, y = ρ sin

More information

MATH 220: Problem Set 3 Solutions

MATH 220: Problem Set 3 Solutions MATH 220: Problem Set 3 Solutions Problem 1. Let ψ C() be given by: 0, x < 1, 1 + x, 1 < x < 0, ψ(x) = 1 x, 0 < x < 1, 0, x > 1, so that it verifies ψ 0, ψ(x) = 0 if x 1 and ψ(x)dx = 1. Consider (ψ j )

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Many slides adapted from Jur van den Berg Outline POMDPs Separation Principle / Certainty Equivalence Locally Optimal

More information

Lecture 3. Dynamical Systems in Continuous Time

Lecture 3. Dynamical Systems in Continuous Time Lecture 3. Dynamical Systems in Continuous Time University of British Columbia, Vancouver Yue-Xian Li November 2, 2017 1 3.1 Exponential growth and decay A Population With Generation Overlap Consider a

More information

Open quantum random walks: bi-stability and ballistic diffusion. Open quantum brownian motion

Open quantum random walks: bi-stability and ballistic diffusion. Open quantum brownian motion Open quantum random walks: bi-stability and ballistic diffusion Open quantum brownian motion with Michel Bauer and Antoine Tilloy Autrans, July 2013 Different regimes in «open quantum random walks»: Open

More information

C.-H. Lamarque. University of Lyon/ENTPE/LGCB & LTDS UMR CNRS 5513

C.-H. Lamarque. University of Lyon/ENTPE/LGCB & LTDS UMR CNRS 5513 Nonlinear Dynamics of Smooth and Non-Smooth Systems with Application to Passive Controls 3rd Sperlonga Summer School on Mechanics and Engineering Sciences on Dynamics, Stability and Control of Flexible

More information

Bayesian Learning in Undirected Graphical Models

Bayesian Learning in Undirected Graphical Models Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ and Center for Automated Learning and

More information