Stochastic Gradient Descent in Continuous Time

Size: px
Start display at page:

Download "Stochastic Gradient Descent in Continuous Time"

Transcription

1 Stochastic Gradient Descent in Continuous Time Justin Sirignano University of Illinois at Urbana Champaign with Konstantinos Spiliopoulos (Boston University) 1 / 27

2 We consider a diffusion X t X = R m : dx t = f (X t )dt + σdw t. The goal is to statistically estimate a model f (x, θ) for f (x) where θ R n. f (x, θ) and f (x) may be non-convex W t is a standard Brownian motion. The diffusion term W t represents any random behavior of the system or environment. 2 / 27

3 The parameter update satisfies the SDE: dθ t = α t [ θ f (X t ; θ t )(σσ T ) 1 dx t θ f (X t, θ t )(σσ T ) 1 f (X t, θ t )dt ] α t is the learning rate Can be used for both: Statistical estimation given previously observed data Online learning (i.e., statistical estimation in real-time as data becomes available) If m = 1 and σ = 1: dθ t = α t [ θ f (X t ; θ t )dx t θ f (X t, θ t )f (X t, θ t )dt ] 3 / 27

4 Why is Stochastic Gradient Descent in Continuous Time useful? Physics and engineering models are typically in continuous time. It therefore makes sense to also develop the statistical learning updates in continuous time. Continuous-time dynamics are oftentimes much simpler than discrete dynamics at longer time intervals. 4 / 27

5 Although stochastic gradient descent in continuous time must ultimately be discretized for numerical implementation, the continuous-time framework still has significant numerical advantages. Continuous-time stochastic gradient descent allows for the control and reduction of numerical error due to discretization Example 1: Higher-order numerical schemes for numerical solution of SDE Example 2: Non-uniform time step sizes. If convergence is slow, the time step size may be adaptively decreased. In contrast, discrete-time stochastic gradient descent uses fixed discrete steps and cannot do this. 5 / 27

6 Overview of Result Assume X t is ergodic and has a unique invariant measure π(dx). Define: h(θ) = X h(x, θ)π(dx) Define the natural objective function: g(x, θ) = 1 2 f (x, θ) f (x) 2 σσ T We show that lim ḡ(θ t) = 0, almost surely. t 6 / 27

7 Assumptions Assume that 0 α t dt =, 0 αt 2 dt < and that 0 α s ds <. The condition 0 α s ds < follows immediately from the other two restrictions for the learning rate if it is chosen to be a monotonic function of t. A standard choice is α t = 1 C+t for some constant 0 < C <. Polynomial bounds on g and f is Lipschitz (see our paper for details) 7 / 27

8 Related Literature Extensive research on stochastic gradient descent in discrete time. Relatively little research for continuous time Bertsekas and Tsitsiklis (2000) prove convergence of stochastic gradient descent in discrete time in the absence of the X process. The X term introduces correlation across times, and this correlation does not disappear as t Unlike in Bertsekas and Tsitsiklis (2000) where parameter updates are unbiased and noise is i.i.d., the X process causes parameter updates to be biased and correlated across times. This complicates the analysis. 8 / 27

9 ODE method : proves discrete-time stochastic gradient descent converges to the solution of an ODE which itself converges to a limiting point, Kushner and Yin (2003), Benveniste, Metivier and Priouret (2012) Requires the strong assumption that the iterates (i.e., the model parameters which are being learned) remain in a bounded set with probability one. Proving that the iterates remain in a bounded set with probability one can be challenging to show and, moreover, may not necessarily be true for all models. 9 / 27

10 Proof Approach Consider the cycles of random times where for k = 1, 2, 0 = σ 0 τ 1 σ 1 τ 2 σ 2... τ k = inf{t > σ k 1 : ḡ(θ t ) κ} σ k = sup{t > τ k : ḡ(θ τ k ) ḡ(θ s ) 2 ḡ(θ τk ) 2 for all s [τ k, t] and t τ k α s ds λ} The purpose of these random times is to control the periods of time where ḡ(θ ) is close to zero and away from zero. Let us next define the random time intervals I k = [τ k, σ k ) and J k = [σ k 1, τ k ). Notice that for every t J k we have ḡ(θ t ) < κ. 10 / 27

11 Suppose that there are an infinite number of intervals I k = [τ k, σ k ). There is a fixed constant γ = γ(κ) > 0 such that for k large enough, one has Then, ḡ(θ t ). ḡ(θ σk ) ḡ(θ τk ) γ However, ḡ 0. Therefore (by contradiction) there are a finite number of intervals I k. 11 / 27

12 + + + σk ḡ(θ σk ) ḡ(θ τk ) = σk σk τ k σk τ k α s ḡ(θ s ) 2 ds α s ḡ(θs ), θ f (X s, θ s )σ 1 dw s τ k αs 2 [ ] 2 tr ( θ f (X s, θ s )σ 1 )( θ f (X s, θ s )σ 1 ) T θ θ ḡ(θ s ) τ k α s θ ḡ(θ s ), θ ḡ(θ s ) θ g(x s, θ s ) ds ds Recall that 0 α t dt = and 0 α 2 t dt < (Ex: α t = 1 1+t ). 12 / 27

13 Most Difficult Term σk τ k α s θ ḡ(θ s ), θ ḡ(θ s ) θ g(x s, θ s ) ds Rewrite this term using an associated Poisson equation. Assume: X G(x, θ)π(dx) = 0 Let L x be the generator for the X process. Then the Poisson equation L x u(x, θ) = G(x, θ) has a unique solution (with some nice properties). 13 / 27

14 Numerical Examples Ornstein-Uhlenbeck (OU) process Multi-dimensional OU process Burger s equation Reinforcement learning 14 / 27

15 The Ornstein-Uhlenbeck (OU) process X t R satisfies the stochastic differential equation: dx t = c(m X t )dt + dw t. We use continuous stochastic gradient descent to learn the parameters θ = (c, m) R 2. f (x, θ) = c(m x) and f (x) = f (x, θ ) 15 / 27

16 We study 10, 500 cases. For each case, a different θ is randomly generated in the range [1, 2] [1, 2]. For each case, we solve for the parameter θ t over the time period [0, T ] for T = To summarize: For cases n = 1 to 10,500 Generate a random θ in [1, 2] [1, 2] Simulate a single path of X t given θ and simultaneously solve for the path of θ t on [0, T ] 16 / 27

17 Figure: Mean error in percent plotted against time. Time is in log scale. 17 / 27

18 Figure: Mean squared error plotted against time. Time is in log scale. 18 / 27

19 The multidimensional Ornstein-Uhlenbeck process X t R d satisfies the stochastic differential equation: dx t = (M AX t )dt + dw t. We use continuous stochastic gradient descent to learn the parameters θ = (M, A) R d R d d. 19 / 27

20 Figure: Mean error in percent plotted against time. Time is in log scale. 20 / 27

21 Figure: Mean squared error plotted against time. Time is in log scale. 21 / 27

22 The stochastic Burger s equation is: u t (t, x) = θ 2 u u(t, x) u x 2 x (t, x) + σ 2 W (t, x), t x where x [0, 1] and W (t, x) is a Brownian sheet. 22 / 27

23 Error/Time Maximum Error % quantile of error Mean squared error Mean Error in percent Maximum error in percent % quantile of error in percent Table: Error at different times for the estimate θ t of θ across 525 cases. The error is θ n t θ,n where n represents the n-th case. The error in percent is 100 θ n t θ,n / θ,n. 23 / 27

24 Figure: Mean error in percent plotted against time. Time is in log scale. 24 / 27

25 Figure: Mean squared error plotted against time. Time is in log scale. 25 / 27

26 We consider the classic reinforcement learning problem of balancing a pole on a moving cart. The goal is to balance a pole on a cart and to keep the cart from moving outside the boundaries via applying a force of ±10 Newtons. The position x of the cart, the velocity ẋ of the cart, angle of the pole β, and angular velocity β of the pole are observed. The dynamics of (x, ẋ, β, β) satisfy a set of ODEs 26 / 27

27 Reward/Episode Maximum Reward % quantile of reward Mean reward % quantile of reward Minimum reward Table: Reward at the k-th episode across the 525 cases using continuous stochastic gradient descent to learn the model dynamics. 27 / 27

On the fast convergence of random perturbations of the gradient flow.

On the fast convergence of random perturbations of the gradient flow. On the fast convergence of random perturbations of the gradient flow. Wenqing Hu. 1 (Joint work with Chris Junchi Li 2.) 1. Department of Mathematics and Statistics, Missouri S&T. 2. Department of Operations

More information

A random perturbation approach to some stochastic approximation algorithms in optimization.

A random perturbation approach to some stochastic approximation algorithms in optimization. A random perturbation approach to some stochastic approximation algorithms in optimization. Wenqing Hu. 1 (Presentation based on joint works with Chris Junchi Li 2, Weijie Su 3, Haoyi Xiong 4.) 1. Department

More information

Quasi Stochastic Approximation American Control Conference San Francisco, June 2011

Quasi Stochastic Approximation American Control Conference San Francisco, June 2011 Quasi Stochastic Approximation American Control Conference San Francisco, June 2011 Sean P. Meyn Joint work with Darshan Shirodkar and Prashant Mehta Coordinated Science Laboratory and the Department of

More information

Introduction to Random Diffusions

Introduction to Random Diffusions Introduction to Random Diffusions The main reason to study random diffusions is that this class of processes combines two key features of modern probability theory. On the one hand they are semi-martingales

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Controlled Diffusions and Hamilton-Jacobi Bellman Equations

Controlled Diffusions and Hamilton-Jacobi Bellman Equations Controlled Diffusions and Hamilton-Jacobi Bellman Equations Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2014 Emo Todorov (UW) AMATH/CSE 579, Winter

More information

Almost Sure Convergence of Two Time-Scale Stochastic Approximation Algorithms

Almost Sure Convergence of Two Time-Scale Stochastic Approximation Algorithms Almost Sure Convergence of Two Time-Scale Stochastic Approximation Algorithms Vladislav B. Tadić Abstract The almost sure convergence of two time-scale stochastic approximation algorithms is analyzed under

More information

Partial Differential Equations with Applications to Finance Seminar 1: Proving and applying Dynkin s formula

Partial Differential Equations with Applications to Finance Seminar 1: Proving and applying Dynkin s formula Partial Differential Equations with Applications to Finance Seminar 1: Proving and applying Dynkin s formula Group 4: Bertan Yilmaz, Richard Oti-Aboagye and Di Liu May, 15 Chapter 1 Proving Dynkin s formula

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Outline. A Central Limit Theorem for Truncating Stochastic Algorithms

Outline. A Central Limit Theorem for Truncating Stochastic Algorithms Outline A Central Limit Theorem for Truncating Stochastic Algorithms Jérôme Lelong http://cermics.enpc.fr/ lelong Tuesday September 5, 6 1 3 4 Jérôme Lelong (CERMICS) Tuesday September 5, 6 1 / 3 Jérôme

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 15. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Fast-slow systems with chaotic noise

Fast-slow systems with chaotic noise Fast-slow systems with chaotic noise David Kelly Ian Melbourne Courant Institute New York University New York NY www.dtbkelly.com May 1, 216 Statistical properties of dynamical systems, ESI Vienna. David

More information

Interest Rate Models:

Interest Rate Models: 1/17 Interest Rate Models: from Parametric Statistics to Infinite Dimensional Stochastic Analysis René Carmona Bendheim Center for Finance ORFE & PACM, Princeton University email: rcarmna@princeton.edu

More information

I forgot to mention last time: in the Ito formula for two standard processes, putting

I forgot to mention last time: in the Ito formula for two standard processes, putting I forgot to mention last time: in the Ito formula for two standard processes, putting dx t = a t dt + b t db t dy t = α t dt + β t db t, and taking f(x, y = xy, one has f x = y, f y = x, and f xx = f yy

More information

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( ) Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio (2014-2015) Etienne Tanré - Olivier Faugeras INRIA - Team Tosca November 26th, 2014 E. Tanré (INRIA - Team Tosca) Mathematical

More information

Computing ergodic limits for SDEs

Computing ergodic limits for SDEs Computing ergodic limits for SDEs M.V. Tretyakov School of Mathematical Sciences, University of Nottingham, UK Talk at the workshop Stochastic numerical algorithms, multiscale modelling and high-dimensional

More information

Large deviations and averaging for systems of slow fast stochastic reaction diffusion equations.

Large deviations and averaging for systems of slow fast stochastic reaction diffusion equations. Large deviations and averaging for systems of slow fast stochastic reaction diffusion equations. Wenqing Hu. 1 (Joint work with Michael Salins 2, Konstantinos Spiliopoulos 3.) 1. Department of Mathematics

More information

Stochastic Volatility and Correction to the Heat Equation

Stochastic Volatility and Correction to the Heat Equation Stochastic Volatility and Correction to the Heat Equation Jean-Pierre Fouque, George Papanicolaou and Ronnie Sircar Abstract. From a probabilist s point of view the Twentieth Century has been a century

More information

Stochastic Composition Optimization

Stochastic Composition Optimization Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators

More information

Lecture 4: Introduction to stochastic processes and stochastic calculus

Lecture 4: Introduction to stochastic processes and stochastic calculus Lecture 4: Introduction to stochastic processes and stochastic calculus Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London

More information

Lecture 1: Pragmatic Introduction to Stochastic Differential Equations

Lecture 1: Pragmatic Introduction to Stochastic Differential Equations Lecture 1: Pragmatic Introduction to Stochastic Differential Equations Simo Särkkä Aalto University, Finland (visiting at Oxford University, UK) November 13, 2013 Simo Särkkä (Aalto) Lecture 1: Pragmatic

More information

Bridging the Gap between Center and Tail for Multiscale Processes

Bridging the Gap between Center and Tail for Multiscale Processes Bridging the Gap between Center and Tail for Multiscale Processes Matthew R. Morse Department of Mathematics and Statistics Boston University BU-Keio 2016, August 16 Matthew R. Morse (BU) Moderate Deviations

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

Theoretical Tutorial Session 2

Theoretical Tutorial Session 2 1 / 36 Theoretical Tutorial Session 2 Xiaoming Song Department of Mathematics Drexel University July 27, 216 Outline 2 / 36 Itô s formula Martingale representation theorem Stochastic differential equations

More information

Introduction to multiscale modeling and simulation. Explicit methods for ODEs : forward Euler. y n+1 = y n + tf(y n ) dy dt = f(y), y(0) = y 0

Introduction to multiscale modeling and simulation. Explicit methods for ODEs : forward Euler. y n+1 = y n + tf(y n ) dy dt = f(y), y(0) = y 0 Introduction to multiscale modeling and simulation Lecture 5 Numerical methods for ODEs, SDEs and PDEs The need for multiscale methods Two generic frameworks for multiscale computation Explicit methods

More information

Towards stability and optimality in stochastic gradient descent

Towards stability and optimality in stochastic gradient descent Towards stability and optimality in stochastic gradient descent Panos Toulis, Dustin Tran and Edoardo M. Airoldi August 26, 2016 Discussion by Ikenna Odinaka Duke University Outline Introduction 1 Introduction

More information

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron CS446: Machine Learning, Fall 2017 Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron Lecturer: Sanmi Koyejo Scribe: Ke Wang, Oct. 24th, 2017 Agenda Recap: SVM and Hinge loss, Representer

More information

p 1 ( Y p dp) 1/p ( X p dp) 1 1 p

p 1 ( Y p dp) 1/p ( X p dp) 1 1 p Doob s inequality Let X(t) be a right continuous submartingale with respect to F(t), t 1 P(sup s t X(s) λ) 1 λ {sup s t X(s) λ} X + (t)dp 2 For 1 < p

More information

Advanced computational methods X Selected Topics: SGD

Advanced computational methods X Selected Topics: SGD Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety

More information

arxiv: v2 [cs.sy] 27 Sep 2016

arxiv: v2 [cs.sy] 27 Sep 2016 Analysis of gradient descent methods with non-diminishing, bounded errors Arunselvan Ramaswamy 1 and Shalabh Bhatnagar 2 arxiv:1604.00151v2 [cs.sy] 27 Sep 2016 1 arunselvan@csa.iisc.ernet.in 2 shalabh@csa.iisc.ernet.in

More information

8.3 Partial Fraction Decomposition

8.3 Partial Fraction Decomposition 8.3 partial fraction decomposition 575 8.3 Partial Fraction Decomposition Rational functions (polynomials divided by polynomials) and their integrals play important roles in mathematics and applications,

More information

The concentration of a drug in blood. Exponential decay. Different realizations. Exponential decay with noise. dc(t) dt.

The concentration of a drug in blood. Exponential decay. Different realizations. Exponential decay with noise. dc(t) dt. The concentration of a drug in blood Exponential decay C12 concentration 2 4 6 8 1 C12 concentration 2 4 6 8 1 dc(t) dt = µc(t) C(t) = C()e µt 2 4 6 8 1 12 time in minutes 2 4 6 8 1 12 time in minutes

More information

Exercises. T 2T. e ita φ(t)dt.

Exercises. T 2T. e ita φ(t)dt. Exercises. Set #. Construct an example of a sequence of probability measures P n on R which converge weakly to a probability measure P but so that the first moments m,n = xdp n do not converge to m = xdp.

More information

Lecture 12: Detailed balance and Eigenfunction methods

Lecture 12: Detailed balance and Eigenfunction methods Miranda Holmes-Cerfon Applied Stochastic Analysis, Spring 2015 Lecture 12: Detailed balance and Eigenfunction methods Readings Recommended: Pavliotis [2014] 4.5-4.7 (eigenfunction methods and reversibility),

More information

Linear stochastic approximation driven by slowly varying Markov chains

Linear stochastic approximation driven by slowly varying Markov chains Available online at www.sciencedirect.com Systems & Control Letters 50 2003 95 102 www.elsevier.com/locate/sysconle Linear stochastic approximation driven by slowly varying Marov chains Viay R. Konda,

More information

AN EXTREME-VALUE ANALYSIS OF THE LIL FOR BROWNIAN MOTION 1. INTRODUCTION

AN EXTREME-VALUE ANALYSIS OF THE LIL FOR BROWNIAN MOTION 1. INTRODUCTION AN EXTREME-VALUE ANALYSIS OF THE LIL FOR BROWNIAN MOTION DAVAR KHOSHNEVISAN, DAVID A. LEVIN, AND ZHAN SHI ABSTRACT. We present an extreme-value analysis of the classical law of the iterated logarithm LIL

More information

Reinforcement Learning In Continuous Time and Space

Reinforcement Learning In Continuous Time and Space Reinforcement Learning In Continuous Time and Space presentation of paper by Kenji Doya Leszek Rybicki lrybicki@mat.umk.pl 18.07.2008 Leszek Rybicki lrybicki@mat.umk.pl Reinforcement Learning In Continuous

More information

Ergodic Subgradient Descent

Ergodic Subgradient Descent Ergodic Subgradient Descent John Duchi, Alekh Agarwal, Mikael Johansson, Michael Jordan University of California, Berkeley and Royal Institute of Technology (KTH), Sweden Allerton Conference, September

More information

Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Local vs. Nonlocal Diffusions A Tale of Two Laplacians

Local vs. Nonlocal Diffusions A Tale of Two Laplacians Local vs. Nonlocal Diffusions A Tale of Two Laplacians Jinqiao Duan Dept of Applied Mathematics Illinois Institute of Technology Chicago duan@iit.edu Outline 1 Einstein & Wiener: The Local diffusion 2

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

Continuous dependence estimates for the ergodic problem with an application to homogenization

Continuous dependence estimates for the ergodic problem with an application to homogenization Continuous dependence estimates for the ergodic problem with an application to homogenization Claudio Marchi Bayreuth, September 12 th, 2013 C. Marchi (Università di Padova) Continuous dependence Bayreuth,

More information

Research Article Existence and Uniqueness Theorem for Stochastic Differential Equations with Self-Exciting Switching

Research Article Existence and Uniqueness Theorem for Stochastic Differential Equations with Self-Exciting Switching Discrete Dynamics in Nature and Society Volume 211, Article ID 549651, 12 pages doi:1.1155/211/549651 Research Article Existence and Uniqueness Theorem for Stochastic Differential Equations with Self-Exciting

More information

Sequential and reinforcement learning: Stochastic Optimization I

Sequential and reinforcement learning: Stochastic Optimization I 1 Sequential and reinforcement learning: Stochastic Optimization I Sequential and reinforcement learning: Stochastic Optimization I Summary This session describes the important and nowadays framework of

More information

Lower Tail Probabilities and Related Problems

Lower Tail Probabilities and Related Problems Lower Tail Probabilities and Related Problems Qi-Man Shao National University of Singapore and University of Oregon qmshao@darkwing.uoregon.edu . Lower Tail Probabilities Let {X t, t T } be a real valued

More information

Branching Processes II: Convergence of critical branching to Feller s CSB

Branching Processes II: Convergence of critical branching to Feller s CSB Chapter 4 Branching Processes II: Convergence of critical branching to Feller s CSB Figure 4.1: Feller 4.1 Birth and Death Processes 4.1.1 Linear birth and death processes Branching processes can be studied

More information

Homogenization for chaotic dynamical systems

Homogenization for chaotic dynamical systems Homogenization for chaotic dynamical systems David Kelly Ian Melbourne Department of Mathematics / Renci UNC Chapel Hill Mathematics Institute University of Warwick November 3, 2013 Duke/UNC Probability

More information

Fast-slow systems with chaotic noise

Fast-slow systems with chaotic noise Fast-slow systems with chaotic noise David Kelly Ian Melbourne Courant Institute New York University New York NY www.dtbkelly.com May 12, 215 Averaging and homogenization workshop, Luminy. Fast-slow systems

More information

Regular Variation and Extreme Events for Stochastic Processes

Regular Variation and Extreme Events for Stochastic Processes 1 Regular Variation and Extreme Events for Stochastic Processes FILIP LINDSKOG Royal Institute of Technology, Stockholm 2005 based on joint work with Henrik Hult www.math.kth.se/ lindskog 2 Extremes for

More information

{σ x >t}p x. (σ x >t)=e at.

{σ x >t}p x. (σ x >t)=e at. 3.11. EXERCISES 121 3.11 Exercises Exercise 3.1 Consider the Ornstein Uhlenbeck process in example 3.1.7(B). Show that the defined process is a Markov process which converges in distribution to an N(0,σ

More information

Linear Systems Theory

Linear Systems Theory ME 3253 Linear Systems Theory Review Class Overview and Introduction 1. How to build dynamic system model for physical system? 2. How to analyze the dynamic system? -- Time domain -- Frequency domain (Laplace

More information

Asymptotical distribution free test for parameter change in a diffusion model (joint work with Y. Nishiyama) Ilia Negri

Asymptotical distribution free test for parameter change in a diffusion model (joint work with Y. Nishiyama) Ilia Negri Asymptotical distribution free test for parameter change in a diffusion model (joint work with Y. Nishiyama) Ilia Negri University of Bergamo (Italy) ilia.negri@unibg.it SAPS VIII, Le Mans 21-24 March,

More information

Lecture 12: Detailed balance and Eigenfunction methods

Lecture 12: Detailed balance and Eigenfunction methods Lecture 12: Detailed balance and Eigenfunction methods Readings Recommended: Pavliotis [2014] 4.5-4.7 (eigenfunction methods and reversibility), 4.2-4.4 (explicit examples of eigenfunction methods) Gardiner

More information

Asymptotic and finite-sample properties of estimators based on stochastic gradients

Asymptotic and finite-sample properties of estimators based on stochastic gradients Asymptotic and finite-sample properties of estimators based on stochastic gradients Panagiotis (Panos) Toulis panos.toulis@chicagobooth.edu Econometrics and Statistics University of Chicago, Booth School

More information

Stochastic contraction BACS Workshop Chamonix, January 14, 2008

Stochastic contraction BACS Workshop Chamonix, January 14, 2008 Stochastic contraction BACS Workshop Chamonix, January 14, 2008 Q.-C. Pham N. Tabareau J.-J. Slotine Q.-C. Pham, N. Tabareau, J.-J. Slotine () Stochastic contraction 1 / 19 Why stochastic contraction?

More information

Notes on policy gradients and the log derivative trick for reinforcement learning

Notes on policy gradients and the log derivative trick for reinforcement learning Notes on policy gradients and the log derivative trick for reinforcement learning David Meyer dmm@{1-4-5.net,uoregon.edu,brocade.com,...} June 3, 2016 1 Introduction The log derivative trick 1 is a widely

More information

Mathematical analysis of an Adaptive Biasing Potential method for diffusions

Mathematical analysis of an Adaptive Biasing Potential method for diffusions Mathematical analysis of an Adaptive Biasing Potential method for diffusions Charles-Edouard Bréhier Joint work with Michel Benaïm (Neuchâtel, Switzerland) CNRS & Université Lyon 1, Institut Camille Jordan

More information

Dynamical systems with Gaussian and Levy noise: analytical and stochastic approaches

Dynamical systems with Gaussian and Levy noise: analytical and stochastic approaches Dynamical systems with Gaussian and Levy noise: analytical and stochastic approaches Noise is often considered as some disturbing component of the system. In particular physical situations, noise becomes

More information

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Tianyu Wang: tiw161@eng.ucsd.edu Yongxi Lu: yol070@eng.ucsd.edu

More information

Continuous Time Approximations to GARCH(1, 1)-Family Models and Their Limiting Properties

Continuous Time Approximations to GARCH(1, 1)-Family Models and Their Limiting Properties Communications for Statistical Applications and Methods 214, Vol. 21, No. 4, 327 334 DOI: http://dx.doi.org/1.5351/csam.214.21.4.327 Print ISSN 2287-7843 / Online ISSN 2383-4757 Continuous Time Approximations

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Adaptive timestepping for SDEs with non-globally Lipschitz drift

Adaptive timestepping for SDEs with non-globally Lipschitz drift Adaptive timestepping for SDEs with non-globally Lipschitz drift Mike Giles Wei Fang Mathematical Institute, University of Oxford Workshop on Numerical Schemes for SDEs and SPDEs Université Lille June

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

A System Theoretic Perspective of Learning and Optimization

A System Theoretic Perspective of Learning and Optimization A System Theoretic Perspective of Learning and Optimization Xi-Ren Cao* Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong eecao@ee.ust.hk Abstract Learning and optimization

More information

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method ORIE 4741: Learning with Big Messy Data Proximal Gradient Method Professor Udell Operations Research and Information Engineering Cornell November 13, 2017 1 / 31 Announcements Be a TA for CS/ORIE 1380:

More information

The Smoluchowski-Kramers Approximation: What model describes a Brownian particle?

The Smoluchowski-Kramers Approximation: What model describes a Brownian particle? The Smoluchowski-Kramers Approximation: What model describes a Brownian particle? Scott Hottovy shottovy@math.arizona.edu University of Arizona Applied Mathematics October 7, 2011 Brown observes a particle

More information

10. Unconstrained minimization

10. Unconstrained minimization Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation

More information

LANGEVIN THEORY OF BROWNIAN MOTION. Contents. 1 Langevin theory. 1 Langevin theory 1. 2 The Ornstein-Uhlenbeck process 8

LANGEVIN THEORY OF BROWNIAN MOTION. Contents. 1 Langevin theory. 1 Langevin theory 1. 2 The Ornstein-Uhlenbeck process 8 Contents LANGEVIN THEORY OF BROWNIAN MOTION 1 Langevin theory 1 2 The Ornstein-Uhlenbeck process 8 1 Langevin theory Einstein (as well as Smoluchowski) was well aware that the theory of Brownian motion

More information

Stochastic Differential Equations

Stochastic Differential Equations Chapter 5 Stochastic Differential Equations We would like to introduce stochastic ODE s without going first through the machinery of stochastic integrals. 5.1 Itô Integrals and Itô Differential Equations

More information

STATISTICS 385: STOCHASTIC CALCULUS HOMEWORK ASSIGNMENT 4 DUE NOVEMBER 23, = (2n 1)(2n 3) 3 1.

STATISTICS 385: STOCHASTIC CALCULUS HOMEWORK ASSIGNMENT 4 DUE NOVEMBER 23, = (2n 1)(2n 3) 3 1. STATISTICS 385: STOCHASTIC CALCULUS HOMEWORK ASSIGNMENT 4 DUE NOVEMBER 23, 26 Problem Normal Moments (A) Use the Itô formula and Brownian scaling to check that the even moments of the normal distribution

More information

Path integrals for classical Markov processes

Path integrals for classical Markov processes Path integrals for classical Markov processes Hugo Touchette National Institute for Theoretical Physics (NITheP) Stellenbosch, South Africa Chris Engelbrecht Summer School on Non-Linear Phenomena in Field

More information

Homogenization with stochastic differential equations

Homogenization with stochastic differential equations Homogenization with stochastic differential equations Scott Hottovy shottovy@math.arizona.edu University of Arizona Program in Applied Mathematics October 12, 2011 Modeling with SDE Use SDE to model system

More information

Error Functions & Linear Regression (1)

Error Functions & Linear Regression (1) Error Functions & Linear Regression (1) John Kelleher & Brian Mac Namee Machine Learning @ DIT Overview 1 Introduction Overview 2 Univariate Linear Regression Linear Regression Analytical Solution Gradient

More information

Statistical Sparse Online Regression: A Diffusion Approximation Perspective

Statistical Sparse Online Regression: A Diffusion Approximation Perspective Statistical Sparse Online Regression: A Diffusion Approximation Perspective Jianqing Fan Wenyan Gong Chris Junchi Li Qiang Sun Princeton University Princeton University Princeton University University

More information

Nonlinear representation, backward SDEs, and application to the Principal-Agent problem

Nonlinear representation, backward SDEs, and application to the Principal-Agent problem Nonlinear representation, backward SDEs, and application to the Principal-Agent problem Ecole Polytechnique, France April 4, 218 Outline The Principal-Agent problem Formulation 1 The Principal-Agent problem

More information

Lecture 4: Ito s Stochastic Calculus and SDE. Seung Yeal Ha Dept of Mathematical Sciences Seoul National University

Lecture 4: Ito s Stochastic Calculus and SDE. Seung Yeal Ha Dept of Mathematical Sciences Seoul National University Lecture 4: Ito s Stochastic Calculus and SDE Seung Yeal Ha Dept of Mathematical Sciences Seoul National University 1 Preliminaries What is Calculus? Integral, Differentiation. Differentiation 2 Integral

More information

Stochastic differential equation models in biology Susanne Ditlevsen

Stochastic differential equation models in biology Susanne Ditlevsen Stochastic differential equation models in biology Susanne Ditlevsen Introduction This chapter is concerned with continuous time processes, which are often modeled as a system of ordinary differential

More information

GARCH processes continuous counterparts (Part 2)

GARCH processes continuous counterparts (Part 2) GARCH processes continuous counterparts (Part 2) Alexander Lindner Centre of Mathematical Sciences Technical University of Munich D 85747 Garching Germany lindner@ma.tum.de http://www-m1.ma.tum.de/m4/pers/lindner/

More information

A Short Introduction to Diffusion Processes and Ito Calculus

A Short Introduction to Diffusion Processes and Ito Calculus A Short Introduction to Diffusion Processes and Ito Calculus Cédric Archambeau University College, London Center for Computational Statistics and Machine Learning c.archambeau@cs.ucl.ac.uk January 24,

More information

Introduction to Diffusion Processes.

Introduction to Diffusion Processes. Introduction to Diffusion Processes. Arka P. Ghosh Department of Statistics Iowa State University Ames, IA 511-121 apghosh@iastate.edu (515) 294-7851. February 1, 21 Abstract In this section we describe

More information

HJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011

HJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011 Department of Probability and Mathematical Statistics Faculty of Mathematics and Physics, Charles University in Prague petrasek@karlin.mff.cuni.cz Seminar in Stochastic Modelling in Economics and Finance

More information

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016 Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic

More information

Inverse Brascamp-Lieb inequalities along the Heat equation

Inverse Brascamp-Lieb inequalities along the Heat equation Inverse Brascamp-Lieb inequalities along the Heat equation Franck Barthe and Dario Cordero-Erausquin October 8, 003 Abstract Adapting Borell s proof of Ehrhard s inequality for general sets, we provide

More information

LAN property for sde s with additive fractional noise and continuous time observation

LAN property for sde s with additive fractional noise and continuous time observation LAN property for sde s with additive fractional noise and continuous time observation Eulalia Nualart (Universitat Pompeu Fabra, Barcelona) joint work with Samy Tindel (Purdue University) Vlad s 6th birthday,

More information

Exercises in stochastic analysis

Exercises in stochastic analysis Exercises in stochastic analysis Franco Flandoli, Mario Maurelli, Dario Trevisan The exercises with a P are those which have been done totally or partially) in the previous lectures; the exercises with

More information

5 Applying the Fokker-Planck equation

5 Applying the Fokker-Planck equation 5 Applying the Fokker-Planck equation We begin with one-dimensional examples, keeping g = constant. Recall: the FPE for the Langevin equation with η(t 1 )η(t ) = κδ(t 1 t ) is = f(x) + g(x)η(t) t = x [f(x)p

More information

Stochastic Differential Equations.

Stochastic Differential Equations. Chapter 3 Stochastic Differential Equations. 3.1 Existence and Uniqueness. One of the ways of constructing a Diffusion process is to solve the stochastic differential equation dx(t) = σ(t, x(t)) dβ(t)

More information

Perturbed Proximal Gradient Algorithm

Perturbed Proximal Gradient Algorithm Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Convex Optimization Lecture 16

Convex Optimization Lecture 16 Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean

More information

Before you begin read these instructions carefully.

Before you begin read these instructions carefully. MATHEMATICAL TRIPOS Part IA Wednesday, 4 June, 2014 9:00 am to 12:00 pm PAPER 2 Before you begin read these instructions carefully. The examination paper is divided into two sections. Each question in

More information

First passage time for Brownian motion and piecewise linear boundaries

First passage time for Brownian motion and piecewise linear boundaries To appear in Methodology and Computing in Applied Probability, (2017) 19: 237-253. doi 10.1007/s11009-015-9475-2 First passage time for Brownian motion and piecewise linear boundaries Zhiyong Jin 1 and

More information

Stochastic Modelling in Climate Science

Stochastic Modelling in Climate Science Stochastic Modelling in Climate Science David Kelly Mathematics Department UNC Chapel Hill dtbkelly@gmail.com November 16, 2013 David Kelly (UNC) Stochastic Climate November 16, 2013 1 / 36 Why use stochastic

More information

Distributed and Recursive Parameter Estimation in Parametrized Linear State-Space Models

Distributed and Recursive Parameter Estimation in Parametrized Linear State-Space Models 1 Distributed and Recursive Parameter Estimation in Parametrized Linear State-Space Models S. Sundhar Ram, V. V. Veeravalli, and A. Nedić Abstract We consider a network of sensors deployed to sense a spatio-temporal

More information

Qualitative behaviour of numerical methods for SDEs and application to homogenization

Qualitative behaviour of numerical methods for SDEs and application to homogenization Qualitative behaviour of numerical methods for SDEs and application to homogenization K. C. Zygalakis Oxford Centre For Collaborative Applied Mathematics, University of Oxford. Center for Nonlinear Analysis,

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

Stochastic Subgradient Method

Stochastic Subgradient Method Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)

More information

Hypoelliptic multiscale Langevin diffusions and Slow fast stochastic reaction diffusion equations.

Hypoelliptic multiscale Langevin diffusions and Slow fast stochastic reaction diffusion equations. Hypoelliptic multiscale Langevin diffusions and Slow fast stochastic reaction diffusion equations. Wenqing Hu. 1 (Joint works with Michael Salins 2 and Konstantinos Spiliopoulos 3.) 1. Department of Mathematics

More information