An efficient approach to stochastic optimal control. Bert Kappen SNN Radboud University Nijmegen the Netherlands
1 An efficient approach to stochastic optimal control. Bert Kappen, SNN, Radboud University Nijmegen, the Netherlands
2 Examples of control tasks Motor control Bert Kappen Pascal workshop, May
3 Examples of control tasks Foraging
4 Examples of control tasks Collaborating agents
5 Stochastic optimal control theory
Control: how to act (now) to optimize future rewards
- optimal solution is noise dependent
- computation is intractable
- tractable approaches are unimodal (LQ, deterministic)
6 Outline
- Control theory
- Path integral control theory
- Spontaneous symmetry breaking, timing of decisions
- Agents
- Summary
If time permits: Learning and neural implementation
7 Discrete time control
Consider the control of a discrete time dynamical system:
$$x_{t+1} = f(t, x_t, u_t), \qquad t = 0, 1, \ldots, T \qquad (1)$$
$x_t$ is an $n$-dimensional vector describing the state of the system and $u_t$ is an $m$-dimensional vector that specifies the control or action at time $t$. Note that Eq. 1 describes noiseless dynamics. If we specify $x$ at $t = 0$ as $x_0$ and we specify a sequence of controls $u_{0:T} = u_0, u_1, \ldots, u_T$, we can compute the future states of the system $x_1, \ldots, x_{T+1}$ recursively from Eq. 1.
Define a cost function that assigns a cost to each sequence of controls:
$$C(x_0, u_{0:T}) = \sum_{t=0}^{T} R(t, x_t, u_t) \qquad (2)$$
$R(t, x, u)$ is the cost associated with taking action $u$ at time $t$ in state $x$.
8 Discrete time control
The problem of optimal control is to find the sequence $u_{0:T}$ that minimizes $C(x_0, u_{0:T})$.
The problem has a standard solution, which is known as dynamic programming. Introduce the optimal cost-to-go:
$$J(t, x_t) = \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s) \qquad (3)$$
which solves the optimal control problem from an intermediate time $t$ until the fixed end time $T$, starting at an arbitrary location $x_t$. The minimum of Eq. 2 is given by $J(0, x_0)$.
9 Discrete time control
One can recursively compute $J(t, x)$ from $J(t+1, x)$ for all $x$ in the following way:
$$J(T+1, x) = 0$$
$$J(t, x_t) = \min_{u_{t:T}} \sum_{s=t}^{T} R(s, x_s, u_s) = \min_{u_t} \left( R(t, x_t, u_t) + \min_{u_{t+1:T}} \sum_{s=t+1}^{T} R(s, x_s, u_s) \right) = \min_{u_t} \left( R(t, x_t, u_t) + J(t+1, x_{t+1}) \right)$$
The minimizers $u_{0:T}$ give the optimal control path.
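The backward recursion can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the finite state grid, the three-valued control set, and the toy dynamics `f` and cost `R` are all assumptions chosen so that the result can be checked by hand.

```python
import numpy as np

def dynamic_programming(f, R, states, controls, T):
    """Backward recursion J(t,x) = min_u [R(t,x,u) + J(t+1, f(t,x,u))], with J(T+1,x) = 0.

    `states` and `controls` are finite lists; f must map back into `states`.
    """
    index = {x: i for i, x in enumerate(states)}
    J = np.zeros((T + 2, len(states)))            # row T+1 stays zero
    policy = np.zeros((T + 1, len(states)), dtype=int)
    for t in range(T, -1, -1):                    # t = T, T-1, ..., 0
        for i, x in enumerate(states):
            costs = [R(t, x, u) + J[t + 1, index[f(t, x, u)]] for u in controls]
            policy[t, i] = int(np.argmin(costs))
            J[t, i] = costs[policy[t, i]]
    return J, policy

# Hypothetical toy problem: walk on the integers -5..5, pay x^2 + |u| per step.
states = list(range(-5, 6))
controls = [-1, 0, 1]
f = lambda t, x, u: max(-5, min(5, x + u))
R = lambda t, x, u: x ** 2 + abs(u)

J, policy = dynamic_programming(f, R, states, controls, T=3)
J_start = J[0, states.index(2)]   # optimal cost-to-go from x_0 = 2
```

Starting from $x_0 = 2$ the cheapest plan moves to 0 in two steps and then stays there, which costs $(4+1) + (1+1) + 0 + 0 = 7$, and `J_start` reproduces this value.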
10 Continuous limit
The discrete time recursion, with the cost of one time step written as $R\,dt$, is:
$$J(t, x_t) = \min_{u_t} \left( R(t, x_t, u_t)\,dt + J(t+dt, x_{t+dt}) \right)$$
In the limit of continuous time we get
$$J(t+dt, x_{t+dt}) = J(t, x_t) + dt\,\partial_t J(t, x_t) + dx\,\partial_x J(t, x_t), \qquad dx = f(x, u, t)\,dt$$
Thus,
$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(x, t) \right)$$
with boundary condition $J(x, T) = R(T, x) = \phi(x)$.
11 Example: Bang-bang control
The spring force $F = -z$ acts towards the rest position. Control force $u$. Newton's law $F = m\ddot z$ with $m = 1$:
$$\ddot z = -z + u$$
Control problem: Given initial position and velocity $z_i = \dot z_i = 0$ at time $t = 0$, find the control path $-1 < u(0 \to T) < 1$ such that $z(T)$ is maximal.
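12 Example: Bang-bang control
Introduce $x_1 = z$, $x_2 = \dot z$, then
$$\dot x_1 = x_2, \qquad \dot x_2 = -x_1 + u$$
The end cost is $\phi(x) = -x_1$ and $R(x, u, t) = 0$. The HJB takes the form:
$$-\partial_t J = \min_u \left( x_2 \frac{\partial J}{\partial x_1} + (-x_1 + u) \frac{\partial J}{\partial x_2} \right) = x_2 \frac{\partial J}{\partial x_1} - x_1 \frac{\partial J}{\partial x_2} - \left| \frac{\partial J}{\partial x_2} \right|, \qquad u = -\mathrm{sign}\left( \frac{\partial J}{\partial x_2} \right)$$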
13 Example: Bang-bang control
The solution is
$$J(t, x_1, x_2) = -\cos(t - T)\,x_1 + \sin(t - T)\,x_2 + \alpha(t)$$
$$u(t, x_1, x_2) = -\mathrm{sign}(\sin(t - T))$$
As an example consider $T = 2\pi$. Then, the optimal control is
$$u = -1, \quad 0 < t < \pi; \qquad u = 1, \quad \pi < t < 2\pi$$
[Figure: plot with axis labels $x_1$, $x$ and $t$]
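The bang-bang solution for $T = 2\pi$ can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the talk: it integrates $\ddot z = -z + u$ from rest with a hand-written RK4 step, holds the control constant within each step, and compares the switching control $u = -\mathrm{sign}(\sin(t - T))$ with the constant control $u = 1$. The function name `simulate_spring` and the step count are hypothetical choices.

```python
import numpy as np

def simulate_spring(control, T, n_steps=20000):
    """Integrate z'' = -z + u(t) from z = z' = 0 with RK4; u is held constant per step."""
    dt = T / n_steps
    x = np.zeros(2)                        # x[0] = z, x[1] = dz/dt

    def rhs(x, u):
        return np.array([x[1], -x[0] + u])

    for i in range(n_steps):
        u = control((i + 0.5) * dt)        # control sampled at the step midpoint
        k1 = rhs(x, u)
        k2 = rhs(x + 0.5 * dt * k1, u)
        k3 = rhs(x + 0.5 * dt * k2, u)
        k4 = rhs(x + dt * k3, u)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

T = 2 * np.pi
z_bang = simulate_spring(lambda t: -np.sign(np.sin(t - T)), T)[0]
z_const = simulate_spring(lambda t: 1.0, T)[0]
```

Solving the two constant-force segments by hand gives $z(\pi) = -2$ and then $z(2\pi) = 4$ for the switching control, whereas the constant control $u = 1$ returns the mass to $z(2\pi) = 0$. Pushing against the target direction first is what pumps energy into the oscillation.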
14 Stochastic optimal control
Consider a stochastic dynamical system
$$dx = f(t, x, u)\,dt + d\xi$$
with $d\xi$ Gaussian noise, $\langle d\xi^2 \rangle = \nu\,dt$. The cost becomes an expectation:
$$C(t, x, u(t \to T)) = \left\langle \phi(x(T)) + \int_t^T d\tau\, R(\tau, x(\tau), u(\tau)) \right\rangle$$
over all stochastic trajectories starting at $x$ with control path $u(t \to T)$.
15 Stochastic optimal control
We obtain a similar discrete time recursion:
$$J(t, x_t) = \min_{u_t} \left( R(t, x_t, u_t)\,dt + \langle J(t+dt, x_{t+dt}) \rangle \right)$$
In the limit of continuous time we get
$$\langle J(t+dt, x_{t+dt}) \rangle = J(t, x_t) + dt\,\partial_t J(t, x_t) + \langle dx \rangle\,\partial_x J(t, x_t) + \tfrac{1}{2} \langle dx^2 \rangle\,\partial_x^2 J(t, x_t)$$
$$\langle dx \rangle = f(x, u, t)\,dt, \qquad \langle dx^2 \rangle = \nu\,dt$$
Thus,
$$-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(x, t) + \tfrac{1}{2} \nu\,\partial_x^2 J(x, t) \right)$$
with boundary condition $J(x, T) = \phi(x)$.
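16 Path integral control
Consider the special case:
$$f(t, x, u) = f(t, x) + u, \qquad R(t, x, u) = V(t, x) + \tfrac{1}{2} u^2$$
then
$$-\partial_t J = \min_u \left( \tfrac{1}{2} u^2 + V + (f + u)\,\partial_x J + \tfrac{1}{2} \nu\,\partial_x^2 J \right) = -\tfrac{1}{2} (\partial_x J)^2 + V + f\,\partial_x J + \tfrac{1}{2} \nu\,\partial_x^2 J$$
$$u = -\partial_x J(x, t)$$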
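The minimization over $u$ in this special case is a one-line calculation, spelled out here as a worked step. Only the two terms that depend on $u$ matter, namely the quadratic control cost and the drift term $u\,\partial_x J$:

```latex
\begin{aligned}
\frac{\partial}{\partial u}\Big( \tfrac{1}{2} u^2 + u\,\partial_x J \Big) = u + \partial_x J = 0
  \quad &\Rightarrow \quad u = -\partial_x J, \\
\Big( \tfrac{1}{2} u^2 + u\,\partial_x J \Big)\Big|_{u = -\partial_x J}
  &= \tfrac{1}{2} (\partial_x J)^2 - (\partial_x J)^2 = -\tfrac{1}{2} (\partial_x J)^2 .
\end{aligned}
```

Substituting the minimizer back is what produces the nonlinear term $-\tfrac{1}{2}(\partial_x J)^2$ in the HJB equation.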
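17 Solution
1. Define $\Psi(x, t) = \exp(-J(x, t)/\nu)$, then
$$-\partial_t \Psi = \left( -\frac{V}{\nu} + f\,\partial_x + \frac{\nu}{2}\,\partial_x^2 \right) \Psi = H \Psi, \qquad \Psi(x, T) = \exp(-\phi(x)/\nu)$$
2. Define the conditional probability $\rho(y, \tau | x, t)$, $\tau > t$, through a diffusion equation:
$$\partial_\tau \rho = -\frac{V}{\nu} \rho - \partial_y (f \rho) + \frac{\nu}{2}\,\partial_y^2 \rho = H^\dagger \rho, \qquad \rho(y, t | x, t) = \delta(y - x)$$
3. By construction, $\int dy\, \rho(y, \tau | x, t)\, \Psi(y, \tau)$ is independent of $\tau$.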
18 Solution
4. Evaluate at $\tau = t$ and $\tau = T$:
$$\Psi(x, t) = \int dy\, \rho(y, T | x, t) \exp(-\phi(y)/\nu)$$
$\Psi$ gives $J$, which gives $u$.
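The final formula says that $\Psi$ is an expectation of $\exp(-\phi/\nu)$ over end points of the uncontrolled diffusion, so it can be estimated by sampling. The sketch below is an illustration, not the author's code, and restricts itself to the simplest case $f = 0$, $V = 0$, where $\rho(y, T | x, t)$ is a Gaussian with mean $x$ and variance $\nu (T - t)$. The quadratic end cost $\phi(y) = a y^2 / 2$ and all parameter values are assumptions chosen because the Gaussian integral then has a closed form to compare against.

```python
import numpy as np

def psi_monte_carlo(x, t, T, nu, phi, n_samples=200_000, seed=0):
    """Estimate Psi(x,t) = E[exp(-phi(y)/nu)] with y ~ N(x, nu (T - t)) (case f = 0, V = 0)."""
    rng = np.random.default_rng(seed)
    y = x + np.sqrt(nu * (T - t)) * rng.standard_normal(n_samples)
    return np.mean(np.exp(-phi(y) / nu))

def psi_exact_quadratic(x, t, T, nu, a):
    """Closed-form Gaussian integral for the assumed end cost phi(y) = a y^2 / 2."""
    s = 1.0 + a * (T - t)
    return np.exp(-a * x ** 2 / (2.0 * nu * s)) / np.sqrt(s)

x, t, T, nu, a = 1.0, 0.0, 1.0, 0.5, 2.0
psi_mc = psi_monte_carlo(x, t, T, nu, lambda y: 0.5 * a * y ** 2)
psi_ex = psi_exact_quadratic(x, t, T, nu, a)
J_mc, J_ex = -nu * np.log(psi_mc), -nu * np.log(psi_ex)
```

In this linear-quadratic case $J = -\nu \log \Psi$ is quadratic in $x$, so the control $u = -\partial_x J = -a x / (1 + a (T - t))$ is linear in the state and does not depend on $\nu$.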
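19 An example: double slit
$$dx = u\,dt + d\xi$$
$$C = \left\langle \tfrac{1}{2} x(T)^2 + \int_0^T d\tau \left( \tfrac{1}{2} u(\tau)^2 + V(x(\tau), \tau) \right) \right\rangle$$
$V(x, t = 1)$ implements a slit at an intermediate time $t = 1$.
$$\Psi(x, t) = \int dy\, \rho(y, T | x, t)\, \Psi(y, T)$$
can be solved in closed form.
[Figure: $J$ versus $x$ at $t = 0$, $t = 0.99$, $t = 1.01$ and one further time]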
20 The delayed choice
[Figure: $x$ versus $t$]
Obstacle avoidance requires a mechanism that determines when to decide. We take $V = 0$ and $f = 0$ and $\phi(x) = \infty$ for all $x$, except for two narrow slits of infinitesimal size $\epsilon$ at $x = \pm 1$.
21 The delayed choice
$J$ can be computed exactly and is given by
$$J(x, T) = -\nu \log \int dy\, \rho(y | x)\, e^{-\phi(y)/\nu} = \frac{1}{T} \left( \tfrac{1}{2} x^2 - \nu T \log 2 \cosh \frac{x}{\nu T} \right)$$
where $T$ is the time to reach the slits. The expression between brackets is a typical free energy with temperature $\nu T$.
[Figure: $J(x, t)$ versus $x$ for $T = 2$, $T = 1$, $T = 0.5$]
Symmetry breaking at $\nu T = 1$ separates two qualitatively different behaviours.
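The symmetry breaking at $\nu T = 1$ can be read off from the curvature of $J$ at $x = 0$. The sketch below is an illustration with hypothetical function names: it evaluates the closed-form $J$, the control $u = -\partial_x J = (\tanh(x / \nu T) - x)/T$ obtained by differentiating it, and the second derivative at the origin, $(1 - 1/(\nu T))/T$.

```python
import numpy as np

def cost_to_go(x, T, nu):
    """J(x,T) = (x^2/2 - nu T log(2 cosh(x/(nu T)))) / T, with T the time to the slits."""
    return (0.5 * x ** 2 - nu * T * np.log(2.0 * np.cosh(x / (nu * T)))) / T

def optimal_control(x, T, nu):
    """u = -dJ/dx = (tanh(x/(nu T)) - x) / T."""
    return (np.tanh(x / (nu * T)) - x) / T

def curvature_at_origin(T, nu):
    """d^2 J / dx^2 at x = 0; its sign flips at nu T = 1."""
    return (1.0 - 1.0 / (nu * T)) / T

nu = 1.0
far = curvature_at_origin(2.0, nu)    # nu T > 1: single minimum at x = 0
near = curvature_at_origin(0.5, nu)   # nu T < 1: x = 0 is a local maximum
```

For $\nu T > 1$ the curvature is positive and the controller steers towards the midpoint between the slits; for $\nu T < 1$ the origin becomes unstable and the control pushes towards one of the slits at $x = \pm 1$. The decision is thereby delayed until the time-to-go drops below $1/\nu$.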
22 The delayed choice
[Figure: two panels, stochastic and deterministic]
The timing of the decision, that is, when the automaton decides to go left or right, is the consequence of spontaneous symmetry breaking.
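23 The diffusion process
$\rho(y, \tau | x, t)$ satisfies the diffusion equation, for $t \le \tau \le T$:
$$\partial_\tau \rho = -\frac{V}{\nu} \rho - \partial_y (f \rho) + \frac{\nu}{2}\,\partial_y^2 \rho, \qquad \rho(y, t | x, t) = \delta(y - x)$$
and can be sampled as
$$dy = f(y, t)\,dt + d\xi$$
- $y = y + dy$, with probability $1 - V(y, t)\,dt/\nu$
- $y = \dagger$ (the particle is annihilated), with probability $V(y, t)\,dt/\nu$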
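The sampling rule translates into a short particle simulation. The sketch below is an assumption-laden illustration rather than the author's implementation: it uses an Euler discretization, decides annihilation from $V$ at the current position before moving the particle, and keeps a boolean `alive` mask instead of deleting particles. The names and default sizes are hypothetical.

```python
import numpy as np

def sample_killed_diffusion(x0, t, T, nu, f, V, n_particles=20000, n_steps=100, seed=0):
    """Euler sampling of dy = f dt + dxi with annihilation probability V dt / nu per step.

    Returns the final positions and a boolean mask of the surviving particles.
    """
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    y = np.full(n_particles, float(x0))
    alive = np.ones(n_particles, dtype=bool)
    for k in range(n_steps):
        tau = t + k * dt
        kill = rng.random(n_particles) < V(y, tau) * dt / nu
        alive &= ~kill                     # once annihilated, a particle stays dead
        y = y + f(y, tau) * dt + np.sqrt(nu * dt) * rng.standard_normal(n_particles)
    return y, alive

# Sanity check with constant V = 0.5 and f = 0: survival should be close to exp(-V (T - t) / nu).
y, alive = sample_killed_diffusion(
    x0=0.0, t=0.0, T=1.0, nu=1.0,
    f=lambda y, tau: np.zeros_like(y),
    V=lambda y, tau: 0.5 * np.ones_like(y),
)
survival = alive.mean()
```

With an end cost $\phi$, the estimate of $\Psi(x, t)$ is then the average of `alive * np.exp(-phi(y) / nu)` over all particles, so annihilated particles contribute zero.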
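24 The diffusion process
The diffusion process can be written as a path integral:
$$\rho(y, T | x, t) = \int [dx]_x^y \exp\left( -\frac{1}{\nu} S_{\mathrm{path}}(x(t \to T)) \right)$$
$$S_{\mathrm{path}}(x(t \to T)) = \int_t^T d\tau \left( \tfrac{1}{2} \left( \dot x(\tau) - f(x(\tau), \tau) \right)^2 + V(x(\tau), \tau) \right)$$
[Figure: a path from $x$ at time $t$ to $y$ at time $t_f$]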
25 The path integral formulation
$$\Psi(x, t) = \int dy\, \rho(y, T | x, t) \exp\left( -\frac{\phi(y)}{\nu} \right) = \int [dx]_x \exp\left( -\frac{1}{\nu} S(x(t \to T)) \right)$$
$$S(x(t \to T)) = S_{\mathrm{path}}(x(t \to T)) + \phi(x(T))$$
$\Psi$ is a partition sum and $J = -\nu \log \Psi$ can therefore be interpreted as a free energy. $S$ is the energy of a path and $\nu$ the temperature.
The corresponding probability distribution is
$$p(x(t \to T) | x, t) = \frac{1}{\Psi(x, t)} \exp\left( -\frac{1}{\nu} S(x(t \to T)) \right)$$
26 Gibbs sampling
Sample paths $x_{0:n}$ from $p(x_{0:n}) \propto \exp(-S(x_{0:n})/\nu)$.
End cost $\phi(x_n)$ centered on target. Path cost $V(x)$ for obstacles.
$$J(x, t) = -\nu \log \left( \frac{1}{N} \sum_i^N \exp(-S(x^i_{0:n})/\nu) \right)$$
$$u\,dt = \frac{\exp(J/\nu)}{N} \sum_i^N \exp(-S(x^i_{0:n})/\nu)\, d\xi_i$$
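The two sample averages for $J$ and $u\,dt$ can be sketched as follows. This is one possible reading, labelled as an assumption: paths are drawn from the uncontrolled diffusion with $f = 0$, so the kinetic part of the action is accounted for by the sampling itself and $S$ accumulates only $V\,dt$ along the path plus the end cost $\phi$; $d\xi_i$ is taken to be the first noise increment of path $i$. The function name, the quadratic test cost and the sample sizes are hypothetical.

```python
import numpy as np

def path_integral_control(x, t, T, nu, V, phi, n_paths=200_000, n_steps=20, seed=0):
    """Estimate J(x,t) and u(x,t) from uncontrolled sample paths (f = 0 assumed).

    J    = -nu log( (1/N) sum_i exp(-S_i / nu) )
    u dt = exp(J/nu) / N * sum_i exp(-S_i / nu) dxi_i,  dxi_i = first noise increment
    """
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    xs = np.full(n_paths, float(x))
    S = np.zeros(n_paths)
    first_noise = None
    for k in range(n_steps):
        S += V(xs, t + k * dt) * dt                 # path cost accumulated along the way
        dxi = np.sqrt(nu * dt) * rng.standard_normal(n_paths)
        if k == 0:
            first_noise = dxi
        xs = xs + dxi
    S += phi(xs)                                    # end cost
    w = np.exp(-S / nu)
    J = -nu * np.log(np.mean(w))
    u = np.exp(J / nu) * np.mean(w * first_noise) / dt
    return J, u

# Check against the linear-quadratic case V = 0, phi(y) = a y^2 / 2 (an assumed test problem).
a, x0, nu = 1.0, 1.0, 1.0
J_est, u_est = path_integral_control(
    x0, 0.0, 1.0, nu,
    V=lambda xs, tau: np.zeros_like(xs),
    phi=lambda y: 0.5 * a * y ** 2,
)
J_exact = a * x0 ** 2 / (2 * (1 + a)) + 0.5 * nu * np.log(1 + a)
u_exact = -a * x0 / (1 + a)
```

The control estimate is much noisier than the estimate of $J$, because it divides a weighted mean of increments of size $\sqrt{\nu\,dt}$ by $dt$. This is why many paths are used here for a single control value.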
27 Coordination of agents
$n$ agents with independent dynamics
$$dx_\alpha = \left( f_\alpha(x_\alpha, t) + u_\alpha \right) dt + d\xi_\alpha, \qquad \alpha = 1, \ldots, n$$
should coordinate their actions to minimize a cost at a future time $t = T$: $\phi(y_1, \ldots, y_n)$ with $y_\alpha \in \{z_1, \ldots, z_k\}$ and $\phi = \infty$ elsewhere.
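28 Coordination of agents
Then,
$$\Psi(x_1, \ldots, x_n, t) = \int dy_1 \ldots dy_n \prod_\alpha \rho(y_\alpha, T | x_\alpha, t) \exp(-\phi(y_1, \ldots, y_n)/\nu) = \sum_{\vec y} \exp(-E(\vec y | \vec x, t)/\nu)$$
$$p(\vec y) = \frac{1}{Z} \exp(-E(\vec y | \vec x, t)/\nu)$$
$$u_\alpha(\vec x, t) = -\partial_{x_\alpha} J = \nu \left\langle \frac{\partial \log \rho(y_\alpha, T | x_\alpha, t)}{\partial x_\alpha} \right\rangle$$
with $\vec x = (x_1, \ldots, x_n)$, $\vec y = (y_1, \ldots, y_n)$ and $\langle \cdot \rangle$ the expectation under $p(\vec y)$. $E$ has a graphical model structure if $\phi$ has.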
29 Pseudo code
Loop:
1. Compute the cost and its log derivative for each agent to move to each target: $\rho(z_i, T | x_\alpha, t)$, $i = 1, \ldots, k$, $\alpha = 1, \ldots, n$. This path integral can be estimated using MC sampling or variational approximation.
2. Compute $u_\alpha$ using graphical model inference in $p(\vec y)$ (exact, BP, MF).
30 A simple 1d example
Intrinsic dynamics $f_\alpha = 0$, $V(x_1, \ldots, x_n) = 0$:
$$p(y_\alpha, T | x_\alpha, t) \propto \exp\left( -\frac{(y_\alpha - x_\alpha)^2}{2 \nu (T - t)} \right)$$
End cost $\phi(y_1, \ldots, y_n) = \sum_{j=1}^{k} (n_j(\vec y) - n_j)^2$, with $n_j(\vec y)$ the number of agents that go to target $j$.
Optimal control for agent $\alpha$ is
$$u_\alpha = \frac{1}{T - t} \left( \langle y_\alpha \rangle - x_\alpha \right)$$
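For a handful of agents the expectation $\langle y_\alpha \rangle$ can be computed exactly by enumerating all $k^n$ assignments of agents to targets, which gives a reference for the approximate inference methods. The sketch below is an illustration with hypothetical names; it reads $n_j$ as the desired number of agents at target $j$, which is an interpretation not spelled out on the slide.

```python
import itertools
import numpy as np

def coordination_control(x, targets, desired, t, T, nu):
    """Exact u_alpha = (<y_alpha> - x_alpha) / (T - t) by enumerating all assignments.

    x: agent positions (n,); targets: target positions (k,);
    desired: desired number of agents per target (k,), an assumed reading of n_j.
    """
    x = np.asarray(x, float)
    targets = np.asarray(targets, float)
    desired = np.asarray(desired)
    n, k = len(x), len(targets)
    total, expected = 0.0, np.zeros(n)
    for assign in itertools.product(range(k), repeat=n):
        y = targets[list(assign)]
        counts = np.bincount(assign, minlength=k)          # n_j(y)
        phi = np.sum((counts - desired) ** 2)              # end cost
        log_rho = -np.sum((y - x) ** 2) / (2.0 * nu * (T - t))
        w = np.exp(log_rho - phi / nu)                     # unnormalised p(y)
        total += w
        expected += w * y
    expected /= total
    return (expected - x) / (T - t), expected

# Two agents at -0.5 and 0.5, two targets at -1 and 1, one agent wanted at each target.
u, y_mean = coordination_control([-0.5, 0.5], [-1.0, 1.0], [1, 1], t=0.0, T=1.0, nu=0.5)
```

In this symmetric configuration each agent is steered towards its nearer target and the two controls are mirror images of each other. The enumeration is exponential in the number of agents, so it is only practical for small $n$.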
31 A simple 1d example
[Figure: (a) agent predicted target $\langle y_\alpha \rangle$ versus $t$; (b) agent position $x$ versus $t$]
32 A simple 1d example
[Figure: cost difference versus noise. Control cost: greedy control (red), MF control (blue), BP control (green)]
[Figure: CPU time versus number of agents. CPU time: exact control (black), MF control (blue), BP control (green), greedy control (red)]
33 Nonlinear Coordination
Agents $a = 1, \ldots, n$ in 2D:
$$dx_a(t) = v_a(t) \cos \varphi_a(t)\,dt$$
$$dy_a(t) = v_a(t) \sin \varphi_a(t)\,dt$$
$$dv_a(t) = u_a(t)\,dt + d\xi_a(t)$$
$$d\varphi_a(t) = \omega_a(t)\,dt + d\zeta_a(t)$$
- Initial states (marked O): $v_a(0) = 0$, $\varphi_a(0) = 0$
- Targets (marked X): $v_a(T) = 0$, $\varphi_a(T) = 0$
- Sample paths specified at $t_i = t + i\,dt$, $i = 0, \ldots, 6$, $dt = (T - t)/6$
[Figure: example of 10 agents and 10 targets, with sample paths]
34 Computation Time
Inference methods: Junction Tree (JT) and MF (100 sample paths per agent-target)
[Figure: CPU time (s) versus number of agents (# agents = # targets)]
- JT: exponential in number of agents (intractable for # agents > 10)
- MF: polynomial in number of agents
35 Summary
A restricted class of control problems can be reformulated in statistical physics language.
- path integrals
- symmetry breaking
- efficient computation (MCMC, BP, MF, EP)
- coordination of agents
Future:
- Robotics in dynamical environment
- Learning/exploration
36 Further reading
- H.J. Kappen, Physical Review Letters (2005)
- H.J. Kappen, Journal of Statistical Mechanics: Theory and Experiment, November 2005, P11011
- W. Wiegerinck, B. van den Broek, H.J. Kappen, Proceedings UAI (2006)
- H.J. Kappen, 9th Granada Seminar on Computational Physics: Computational and Mathematical Modeling of Cooperative Behavior in Neural Systems, American Institute of Physics (2007)
37 Learning
Model-based: first learn a model and then do optimal control.
Model-free: interleave learning and optimal control
- more natural biologically and for AI
- problem of exploration-exploitation: intermediate control is suboptimal; control theory does not address exploration
RL/actor critic approach: exploration = exploitation + noise
PI control: exploration is forward diffusion.
$$\Psi(x_i, 0) = \exp\left( -\frac{dt}{\lambda} \sum_{j=i}^{i+n} V(x_j) \right)$$
with $T = n\,dt$ and $x_j$, $j = i, i+1, \ldots, i+n$ the states visited after state $x_i$.
Exploration can be optimized as in importance sampling.
38 Learning
[Figure, left: histogram of visited points versus $x$. Right: two panels for $T = 3$ and $T = 10$ with curves $V$, $T \cdot V$, $J_{mc}$, $J_{lp}$ versus $x$]
Sampling of $J(x)$ with one trajectory of $N = 8000$ iterations starting at $x = 0$. Left: The diffusion process $dx = d\xi$ explores the area between $x = -7.5$ and $x = 6$. Shown is a histogram of the points visited (300 bins). In each bin $x$, an estimate of $\psi(x)$ is made by averaging all $\psi(x_i)$ with $x_i$ from bin $x$. Right: $V(x)$ and $J_T(x)/T$ versus $x$ for $T = 3$ and $T = 10$.
39 A neural implementation/thinking ahead
A topological map represents space $x$. Neuron $i$ is active when the animal is at location $i$.
$$\frac{d\rho_i}{dt} = -\frac{V_i}{\lambda} \rho_i(t) + \frac{\nu}{2} \sum_j D_{ij} \rho_j(t)$$
with $D$ the diffusion matrix, $D_{ii} = -2$, $D_{i,i+1} = D_{i,i-1} = 1$ and all other entries of $D$ zero. $V_i$ is the immediate reward at location $i$. Some mechanism ensures $\sum_i \rho_i(t) = 1$.
[Figure: $\rho$ for $T = 0.1$, $T = 5$ and one further value of $T$]
Thinking ahead. When the animal is at $x_1$ it can start the diffusion dynamics to anticipate what will happen in the future.
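The population dynamics can be integrated directly. The sketch below is an illustration under stated assumptions: it uses explicit Euler steps, a tridiagonal $D$ with $-2$ on the diagonal and $1$ on the off-diagonals, and plain division by the sum as a stand-in for the unspecified mechanism that keeps $\sum_i \rho_i = 1$. The function name and the step-shaped $V$ are hypothetical.

```python
import numpy as np

def think_ahead(rho0, V, lam, nu, T, dt=1e-3):
    """Euler integration of d rho_i/dt = -(V_i/lam) rho_i + (nu/2) sum_j D_ij rho_j.

    The activity is renormalised after every step so that sum_i rho_i = 1
    (an assumed stand-in for the normalising mechanism).
    """
    n = len(rho0)
    D = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    V = np.asarray(V, float)
    rho = np.asarray(rho0, float).copy()
    for _ in range(int(round(T / dt))):
        rho = rho + dt * (-(V / lam) * rho + 0.5 * nu * (D @ rho))
        rho = rho / rho.sum()
    return rho

n = 21
rho0 = np.zeros(n)
rho0[n // 2] = 1.0                                   # animal starts in the middle of the map
flat = think_ahead(rho0, np.zeros(n), lam=1.0, nu=1.0, T=5.0)
V_step = np.where(np.arange(n) > n // 2, 5.0, 0.0)   # large V on the right half
biased = think_ahead(rho0, V_step, lam=1.0, nu=1.0, T=5.0)
```

With $V = 0$ the activity simply spreads symmetrically around the starting location. With a large $V$ on the right half, the term $-V_i \rho_i / \lambda$ suppresses activity there, so after normalisation most of the mass sits on the left.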
Stochastic optimal control theory
Stochastic optimal control theory Bert Kappen SNN Radboud University Nijmegen the Netherlands July 5, 2008 Bert Kappen Introduction Optimal control theory: Optimize sum of a path cost and end cost. Result
More informationStochastic Optimal Control in Continuous Space-Time Multi-Agent Systems
Stochastic Optimal Control in Continuous Space-Time Multi-Agent Systems Wim Wiegerinck Bart van den Broek Bert Kappen SNN, Radboud University Nijmegen 6525 EZ Nijmegen, The Netherlands {w.wiegerinck,b.vandenbroek,b.kappen}@science.ru.nl
More informationA path integral approach to agent planning
A path integral approach to agent planning Hilbert J. Kappen Department of Biophysics Radboud University Nijmegen, The Netherlands b.kappen@science.ru.nl Wim Wiegerinck B. van den Broek Department of Biophysics
More informationStochastic optimal control theory
Stochastic optimal control theory ICML, Helsinki 8 tutorial H.J. Kappen, Radboud University, Nijmegen, the Netherlands July 4, 8 Abstract Control theory is a mathematical description of how to act optimally
More informationAn introduction to stochastic control theory, path integrals and reinforcement learning
An introduction to stochastic control theory, path integrals and reinforcement learning Hilbert J. Kappen Department of Biophysics, Radboud University, Geert Grooteplein 21, 6525 EZ Nijmegen Abstract.
More informationLatent state estimation using control theory
Latent state estimation using control theory Bert Kappen SNN Donders Institute, Radboud University, Nijmegen Gatsby Unit, UCL London August 3, 7 with Hans Christian Ruiz Bert Kappen Smoothing problem Given
More informationarxiv: v3 [math.oc] 18 Jan 2012
Optimal control as a graphical model inference problem Hilbert J. Kappen Vicenç Gómez Manfred Opper arxiv:0901.0633v3 [math.oc] 18 Jan 01 Abstract We reformulate a class of non-linear stochastic optimal
More informationPath Integral Stochastic Optimal Control for Reinforcement Learning
Preprint August 3, 204 The st Multidisciplinary Conference on Reinforcement Learning and Decision Making RLDM203 Path Integral Stochastic Optimal Control for Reinforcement Learning Farbod Farshidian Institute
More informationReinforcement learning
Reinforcement learning Based on [Kaelbling et al., 1996, Bertsekas, 2000] Bert Kappen Reinforcement learning Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error
More informationGaussian processes for inference in stochastic differential equations
Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017
More informationStochastic and Adaptive Optimal Control
Stochastic and Adaptive Optimal Control Robert Stengel Optimal Control and Estimation, MAE 546 Princeton University, 2018! Nonlinear systems with random inputs and perfect measurements! Stochastic neighboring-optimal
More informationOptimal Control. Quadratic Functions. Single variable quadratic function: Multi-variable quadratic function:
Optimal Control Control design based on pole-placement has non unique solutions Best locations for eigenvalues are sometimes difficult to determine Linear Quadratic LQ) Optimal control minimizes a quadratic
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by
More informationPDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a preprint version which may differ from the publisher's version. For additional information about this
More informationReinforcement Learning
1 Reinforcement Learning Chris Watkins Department of Computer Science Royal Holloway, University of London July 27, 2015 2 Plan 1 Why reinforcement learning? Where does this theory come from? Markov decision
More informationMessage passing and approximate message passing
Message passing and approximate message passing Arian Maleki Columbia University 1 / 47 What is the problem? Given pdf µ(x 1, x 2,..., x n ) we are interested in arg maxx1,x 2,...,x n µ(x 1, x 2,..., x
More informationDeterministic Dynamic Programming
Deterministic Dynamic Programming 1 Value Function Consider the following optimal control problem in Mayer s form: V (t 0, x 0 ) = inf u U J(t 1, x(t 1 )) (1) subject to ẋ(t) = f(t, x(t), u(t)), x(t 0
More informationRobotics. Control Theory. Marc Toussaint U Stuttgart
Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationControlled Diffusions and Hamilton-Jacobi Bellman Equations
Controlled Diffusions and Hamilton-Jacobi Bellman Equations Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2014 Emo Todorov (UW) AMATH/CSE 579, Winter
More informationClosed-Loop Impulse Control of Oscillating Systems
Closed-Loop Impulse Control of Oscillating Systems A. N. Daryin and A. B. Kurzhanski Moscow State (Lomonosov) University Faculty of Computational Mathematics and Cybernetics Periodic Control Systems, 2007
More informationLecture Note 13:Continuous Time Switched Optimal Control: Embedding Principle and Numerical Algorithms
ECE785: Hybrid Systems:Theory and Applications Lecture Note 13:Continuous Time Switched Optimal Control: Embedding Principle and Numerical Algorithms Wei Zhang Assistant Professor Department of Electrical
More informationL 0 methods. H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands. December 5, 2011.
L methods H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands December 5, 2 Bert Kappen Outline George McCullochs model The Variational Garrote Bert Kappen L methods
More informationEN Applied Optimal Control Lecture 8: Dynamic Programming October 10, 2018
EN530.603 Applied Optimal Control Lecture 8: Dynamic Programming October 0, 08 Lecturer: Marin Kobilarov Dynamic Programming (DP) is conerned with the computation of an optimal policy, i.e. an optimal
More informationLecture 6: Bayesian Inference in SDE Models
Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationProf. Krstic Nonlinear Systems MAE281A Homework set 1 Linearization & phase portrait
Prof. Krstic Nonlinear Systems MAE28A Homework set Linearization & phase portrait. For each of the following systems, find all equilibrium points and determine the type of each isolated equilibrium. Use
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationSolution of Stochastic Optimal Control Problems and Financial Applications
Journal of Mathematical Extension Vol. 11, No. 4, (2017), 27-44 ISSN: 1735-8299 URL: http://www.ijmex.com Solution of Stochastic Optimal Control Problems and Financial Applications 2 Mat B. Kafash 1 Faculty
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationReinforcement Learning In Continuous Time and Space
Reinforcement Learning In Continuous Time and Space presentation of paper by Kenji Doya Leszek Rybicki lrybicki@mat.umk.pl 18.07.2008 Leszek Rybicki lrybicki@mat.umk.pl Reinforcement Learning In Continuous
More informationOptimal control as a graphical model inference problem
DOI 10.1007/s10994-012-5278-7 Optimal control as a graphical model inference problem Hilbert J. Kappen Vicenç Gómez Manfred Opper Received: 3 December 2010 / Accepted: 11 January 2012 The Author(s) 2012.
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture
More informationPolicy Search for Path Integral Control
Policy Search for Path Integral Control Vicenç Gómez 1,2, Hilbert J Kappen 2, Jan Peters 3,4, and Gerhard Neumann 3 1 Universitat Pompeu Fabra, Barcelona Department of Information and Communication Technologies,
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationPlanning by Probabilistic Inference
Planning by Probabilistic Inference Hagai Attias Microsoft Research 1 Microsoft Way Redmond, WA 98052 Abstract This paper presents and demonstrates a new approach to the problem of planning under uncertainty.
More informationTowards a Bayesian model for Cyber Security
Towards a Bayesian model for Cyber Security Mark Briers (mbriers@turing.ac.uk) Joint work with Henry Clausen and Prof. Niall Adams (Imperial College London) 27 September 2017 The Alan Turing Institute
More informationAnimal learning theory
Animal learning theory Based on [Sutton and Barto, 1990, Dayan and Abbott, 2001] Bert Kappen [Sutton and Barto, 1990] Classical conditioning: - A conditioned stimulus (CS) and unconditioned stimulus (US)
More informationProbabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013
School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two
More informationNonlinear and robust MPC with applications in robotics
Nonlinear and robust MPC with applications in robotics Boris Houska, Mario Villanueva, Benoît Chachuat ShanghaiTech, Texas A&M, Imperial College London 1 Overview Introduction to Robust MPC Min-Max Differential
More informationA Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games
International Journal of Fuzzy Systems manuscript (will be inserted by the editor) A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games Mostafa D Awheda Howard M Schwartz Received:
More informationTheoretical Tutorial Session 2
1 / 36 Theoretical Tutorial Session 2 Xiaoming Song Department of Mathematics Drexel University July 27, 216 Outline 2 / 36 Itô s formula Martingale representation theorem Stochastic differential equations
More informationLecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications
Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section
More informationLecture 7 Unconstrained nonlinear programming
Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,
More informationLinear Differential Equations. Problems
Chapter 1 Linear Differential Equations. Problems 1.1 Introduction 1.1.1 Show that the function ϕ : R R, given by the expression ϕ(t) = 2e 3t for all t R, is a solution of the Initial Value Problem x =
More informationUncertainty quantification and systemic risk
Uncertainty quantification and systemic risk Josselin Garnier (Université Paris Diderot) with George Papanicolaou and Tzu-Wei Yang (Stanford University) February 3, 2016 Modeling systemic risk We consider
More informationLinear SPDEs driven by stationary random distributions
Linear SPDEs driven by stationary random distributions aluca Balan University of Ottawa Workshop on Stochastic Analysis and Applications June 4-8, 2012 aluca Balan (University of Ottawa) Linear SPDEs with
More informationSmoluchowski Diffusion Equation
Chapter 4 Smoluchowski Diffusion Equation Contents 4. Derivation of the Smoluchoswki Diffusion Equation for Potential Fields 64 4.2 One-DimensionalDiffusoninaLinearPotential... 67 4.2. Diffusion in an
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationUNIVERSITY OF MANITOBA
DATE: May 8, 2015 Question Points Score INSTRUCTIONS TO STUDENTS: This is a 6 hour examination. No extra time will be given. No texts, notes, or other aids are permitted. There are no calculators, cellphones
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationLecture 12: Detailed balance and Eigenfunction methods
Miranda Holmes-Cerfon Applied Stochastic Analysis, Spring 2015 Lecture 12: Detailed balance and Eigenfunction methods Readings Recommended: Pavliotis [2014] 4.5-4.7 (eigenfunction methods and reversibility),
More informationCS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling
CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy
More informationData Structures for Efficient Inference and Optimization
Data Structures for Efficient Inference and Optimization in Expressive Continuous Domains Scott Sanner Ehsan Abbasnejad Zahra Zamani Karina Valdivia Delgado Leliane Nunes de Barros Cheng Fang Discrete
More informationHigher-Order Dynamics in Asset-Pricing Models with Recursive Preferences
Higher-Order Dynamics in Asset-Pricing Models with Recursive Preferences Walt Pohl Karl Schmedders Ole Wilms Dept. of Business Administration, University of Zurich Becker Friedman Institute Computational
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationElectrodynamics Exam Solutions
Electrodynamics Exam Solutions Name: FS 215 Prof. C. Anastasiou Student number: Exercise 1 2 3 4 Total Max. points 15 15 15 15 6 Points Visum 1 Visum 2 The exam lasts 18 minutes. Start every new exercise
More informationExistence and Comparisons for BSDEs in general spaces
Existence and Comparisons for BSDEs in general spaces Samuel N. Cohen and Robert J. Elliott University of Adelaide and University of Calgary BFS 2010 S.N. Cohen, R.J. Elliott (Adelaide, Calgary) BSDEs
More informationReal Time Stochastic Control and Decision Making: From theory to algorithms and applications
Real Time Stochastic Control and Decision Making: From theory to algorithms and applications Evangelos A. Theodorou Autonomous Control and Decision Systems Lab Challenges in control Uncertainty Stochastic
More informationMethods of Data Analysis Random numbers, Monte Carlo integration, and Stochastic Simulation Algorithm (SSA / Gillespie)
Methods of Data Analysis Random numbers, Monte Carlo integration, and Stochastic Simulation Algorithm (SSA / Gillespie) Week 1 1 Motivation Random numbers (RNs) are of course only pseudo-random when generated
More informationConnections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN
Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables Revised submission to IEEE TNN Aapo Hyvärinen Dept of Computer Science and HIIT University
More informationHybrid Automata and ɛ-analysis on a Neural Oscillator
Hybrid Automata and ɛ-analysis on a Neural Oscillator A. Casagrande 1 T. Dreossi 2 C. Piazza 2 1 DMG, University of Trieste, Italy 2 DIMI, University of Udine, Italy Intuitively... Motivations: Reachability
More informationKolmogorov Equations and Markov Processes
Kolmogorov Equations and Markov Processes May 3, 013 1 Transition measures and functions Consider a stochastic process {X(t)} t 0 whose state space is a product of intervals contained in R n. We define
More informationRecitation 9: Loopy BP
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 204 Recitation 9: Loopy BP General Comments. In terms of implementation,