A Central Limit Theorem for Truncating Stochastic Algorithms


A Central Limit Theorem for Truncating Stochastic Algorithms
Jérôme Lelong (CERMICS) — http://cermics.enpc.fr/~lelong
Tuesday, September 5, 2006

General Framework

Let $u \colon \theta \in \mathbb{R}^d \mapsto u(\theta) \in \mathbb{R}^d$ be a continuous function defined as an expectation on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$:
$$u \colon \mathbb{R}^d \to \mathbb{R}^d, \qquad \theta \mapsto \mathbb{E}[U(\theta, Z)],$$
where $Z$ is a random variable in $\mathbb{R}^m$ and $U$ is a measurable function defined on $\mathbb{R}^d \times \mathbb{R}^m$.

Hypothesis 1 (convexity). There exists a unique $\theta^\star \in \mathbb{R}^d$ such that $u(\theta^\star) = 0$, and for all $\theta \in \mathbb{R}^d$ with $\theta \neq \theta^\star$, $(\theta - \theta^\star) \cdot u(\theta) > 0$.

Remark: if $u$ is the gradient of a strictly convex function, then $u$ satisfies Hypothesis 1.

Problem: how do we find the root $\theta^\star$ of $u$?

For sub-linear functions

Assume that for all $\theta \in \mathbb{R}^d$,
$$\mathbb{E}\big[|U(\theta, Z)|^2\big] \le K(1 + |\theta|^2). \qquad (1)$$
We define, for $\theta_0 \in \mathbb{R}^d$,
$$\theta_{n+1} = \theta_n - \gamma_{n+1}\, U(\theta_n, Z_{n+1}), \qquad (2)$$
where $(Z_n)_n$ is i.i.d. with the law of $Z$, and $\gamma_n > 0$, $\gamma_n \downarrow 0$, $\sum_i \gamma_i = \infty$, $\sum_i \gamma_i^2 < \infty$.

Theorem 1 (Robbins-Monro). Assume Hypothesis 1 and that (1) holds. Then the sequence (2) converges a.s. to $\theta^\star$.

For fast growing functions: an intuitive approach

Condition (1) is rarely satisfied in practice. Consider an increasing sequence of compact sets $(K_j)_j$ such that $\bigcup_{j=0}^{\infty} K_j = \mathbb{R}^d$, and $(Z_n)_n$ and $(\gamma_n)_n$ as defined previously. To prevent the algorithm from blowing up: at each step, $\theta_n$ should remain in the current compact set; if this is not the case, reset the algorithm and consider a larger compact set. This idea is due to Chen (see [Chen and Zhu, 1986]).
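Recursion (2) is only a few lines of code. Below is a minimal sketch (not from the talk; the function names and the toy choice $U(\theta, z) = \theta - z$ with $Z \sim \mathcal{N}(1, 1)$ are mine), for which $u(\theta) = \theta - 1$ satisfies Hypothesis 1 and the sub-linearity condition (1), with root $\theta^\star = 1$:

```python
import numpy as np

def robbins_monro(U, sample_z, theta0, n_iter, gamma=1.0, alpha=1.0, seed=0):
    """Iterate theta_{n+1} = theta_n - gamma_{n+1} U(theta_n, Z_{n+1}), cf. (2),
    with gamma_n = gamma / n**alpha (so sum gamma_n = inf, sum gamma_n**2 < inf)."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    for n in range(1, n_iter + 1):
        theta -= (gamma / n**alpha) * U(theta, sample_z(rng))
    return theta

# Toy example: U(theta, z) = theta - z with Z ~ N(1, 1), so u(theta) = theta - 1
# and the unique root is theta* = 1.
theta_hat = robbins_monro(lambda t, z: t - z,
                          lambda rng: rng.normal(1.0, 1.0),
                          theta0=5.0, n_iter=20_000)
print(theta_hat)  # close to 1
```

With $\gamma_n = 1/n$ this toy iteration reduces exactly to the running sample mean of the $Z_i$, which makes the almost-sure convergence easy to see.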

It is often more convenient to rewrite (3) as follows where θ n+1 = θ n γ n+1 u(θ n ) γ n+1 δm n+1 }{{}}{{} noise term Newton algorithm } {{ } standard Robbins Monro algorithm + γ n+1 p n+1 }{{} truncation term δm n+1 = U(θ n, Z n+1 ) u(θ n ), { u(θn ) + δm n+1 + 1 γ and p n+1 = n+1 (θ θ n ) if θ n+ 1 / K σ n, otherwise. δm n is a martingale increment, p n is the truncation term. (4) a.s convergence of Chen s procedure Hypothesis (integrability) For all p >, the series n γ n+1δm n+1 1 θn p converges a.s. Hypothesis is satisfied as soon as u and θ E[ U(θ, Z) ] are bounded on any compact sets (or continuous). Hint Theorem Under Hypotheses 1 and, the sequence (θ n ) n defined by (3) converges a.s. to θ and the sequence (σ n ) n is a.s. finite. A proof of this theorem can be found in [Delyon, 1996]. Jérôme Lelong (CERMICS) Tuesday September 5, 6 9 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 1 / 3 1 3 4 General Problem Option pricing problem in a Brownian driven model (no jump): compute E[ψ(G)] by a MC method with G N(, I d ). One way of reducing the variance is to perform importance sampling techniques. For all θ R d, [ ] θ θ G E[ψ(G)] = E ψ(g + θ)e. (5) Minimise[ v(θ) = E ψ(g + θ) e θ G θ ] = E [ψ(g) e [ v(θ) = E (θ G)ψ(G) e θ G+ θ θ θ G+ ]. ]. (6) Jérôme Lelong (CERMICS) Tuesday September 5, 6 11 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 1 / 3

Analysis of the problem To use the previous algorithms, we set u = v and Z = G. v is strictly convex, hence Hypothesis 1 is satisfied. v is not sub-linear. Cannot use a standard S.A. Theorem Hypothesis holds even if ψ is of an exponential type. Theorem holds and provides a way to compute θ. Theorem More details in [Arouna, 4]. Example 1 3 4 Jérôme Lelong (CERMICS) Tuesday September 5, 6 13 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 14 / 3 A CLT for Standard S.A. I First, let us consider the standard algorithm θ n+1 = θ n γ n+1 u(θ n ) γ n+1 δm n+1 with γ n = γ α (n+1), 1/ < α 1. For n, we define the renormalised error n = θ n θ γn. A CLT for Standard S.A. II Hypothesis 3 There exists a function y : R d R d d satisfying lim x y(x) = and a symmetric definite positive matrix A such that u(θ) = A(θ θ ) + y(θ θ )(θ θ ). There exists a real number ρ > such that ( κ = sup E δm n +ρ) <. n There exists a symmetric definite positive matrix Σ such that E(δM n δm n F n 1) P n Σ. Jérôme Lelong (CERMICS) Tuesday September 5, 6 15 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 16 / 3

A CLT for Standard S.A. III Why we cannot deduce a CLT for truncating S.A. from the CLT for Standard S.A. Under Hypotheses 1 and 3 and if the algorithm converges a.s., then n L N(, V ). A proof of this result can be found in [Kushner and Yin, 3], [Benveniste et al., 199], [Bouton, 1985], [Duflo, 1997] for instance. The number of projections is a.s. finite. ω A, P(A) = 1, N(ω), n > N(ω), p n (ω) =. N is r.v. a.s. finite but not bounded. Hence, one cannot use a time shifting argument. Random time shifting does not preserve convergence in distribution. If X n L X and τ < a.s., one can show trivial examples where X n+τ does not converge. Choose for instance τ and τ independent r.v. on {, 1} with parameter 1/ and set X n := ( 1) n (τ τ ). Jérôme Lelong (CERMICS) Tuesday September 5, 6 17 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 18 / 3 A CLT for Randomly Truncating S.A. Consider Chen s Algorithm Definition. Theorem 3 Assume Hypotheses 1 and 3. If there exists η > such that n d(θ, K n ) > η, then n L N(, V ). if 1/ < α < 1 with V = if α = 1 and γa I > with V = γ exp ( At)Σ exp ( At)dt. (( ) ) (( ) ) I I exp γa t Σ exp γa t dt. A functional CLT for Chen s algorithm I We introduce a sequence of interpolating times {t n (u); u, n } t n (u) = sup { with the convention sup =. k ; } n+k γ i u. (7) i=n + + + + + + + γn γn + γ n+1 T Ptn(T) γ i= n+i n () = n and n (t) = n+tn(t)+1 for t. [ n+p For t i=n γ i, n+p+1 i=n γ i [, t n (t) = p and n (t) = n+p+1. Jérôme Lelong (CERMICS) Tuesday September 5, 6 19 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 / 3

A functional CLT for Chen s algorithm II We introduce W n (.) W n () = and W n (t) = Theorem 4 Assume the Hypotheses of Theorem 3. ( n ( ), W n ( )) D D === (, W) n+t n(t)+1 i=n+1 γi δm i for t >. (8) on any finite time interval where is a stationary Ornstein Uhlenbeck process of initial law N(, V ) and W a Wiener process F,W measurable with covariance matrix Σ. Averaging Algorithms I γ (n+1) α with 1/ < α < 1. For any t >, we We restrict to γ n = introduce a moving window average of the iterates ˆθ n (t) = γ n t We study the renormalised error n+ t γn i=n θ i. (9) ˆ n (t) = ˆθ n (t) θ n+ γn t γn = (θ i θ ). (1) γn t We need to characterise the limit law of ( n, n+1,..., n+p ) when (n, p). i=n Jérôme Lelong (CERMICS) Tuesday September 5, 6 1 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 / 3 A CLT for Averaging Algorithms Theorem 5 Under the Hypotheses of Theorem 3, where ˆ n (t) L n N(, ˆV ) ˆV = 1 t A 1 ΣA 1 + A (e At I)V + V A (e At I) t 1 3 4 Jérôme Lelong (CERMICS) Tuesday September 5, 6 3 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 4 / 3

Major steps Donsker s Theorem for martingales increments n is tight in R. Tightness Use a localisation technique to prove that (sup t [,T] n (t) ) n is tight. Prove a Donsker Theorem for martingales increment. n ( ) satisfies Aldous criteria. Tightness in D Theorem (W n ( ), n ( )) n is tight in D D and converges in law to (W, ) where W is a Wiener process with respect to the smallest σ algebra that measures (W( ), ( )) with covariance matrix Σ and is the stationary solution of d (t) = Q (t)dt dw(t). Theorem 6 Let (M n (t)) n be a sequence of martingales. Assume that (M n ( )) n is tight in D and satisfies a C tightness criteria. (M n (t)) n is a uniformly square integrable family for each t. M n t P n t Then, M n ( ) ==== D[,T] B.M. n W n (t) satisfies Theorem 6. Jérôme Lelong (CERMICS) Tuesday September 5, 6 5 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 6 / 3 Comments on Hypothesis M n is a martingale n M n = γ i δm i 1 θi 1 p, i=1 E(M n F n 1 ) = M n 1 + γ n 1 θn 1 pe(δm n F n 1 ). if sup n M n < then M n converges a.s. n M n = γi E(δM i F i 1)1 θi 1 p. i=1 E(δMi F i 1 ) = E ( U(θ, Z) ) θ=θ i 1 u(θ i 1 ) E ( U(θ, Z) ) θ=θ i 1. Jérôme Lelong (CERMICS) Tuesday September 5, 6 7 / 3 Let τ and τ be independent r.v. on {, 1} with parameter 1/. We set X n := ( 1) n (τ τ ). τ τ is symmetric. Hence, X n is constant in law. E(e iuxn+τ ) = E(e iu( 1)n+τ (τ τ ) ), = 1 ( ) E(e iu( 1)n+τ τ ) + E(e iu( 1)n+τ (τ 1) ), = 1 ( ) 1 + e iu( 1)n+11 + e iu( 1)n ( 1) + 1, 4 = 1 ( 1 + e iu( 1)n+1). Hence, (X n+τ ) n does not converge in distribution. Jérôme Lelong (CERMICS) Tuesday September 5, 6 8 / 3

( Basket Option 1 N N i= Si T K ) + Basket Option Figure: Evolution of the drift vector Figure: Importance Sampling Figure: Standard MC basket size : 6 initial value : 11 11 1 8 9 1 maturity time : 1 strike : 1 interest rate :. volatility :. step size : 6 Sample Number : 5 1.3.9.5.1.3 1.8 1.6 1.4 1. 1..8.6.4. 1e3 e3 3e3 4e3 5e3 1.8 1.6 1.4 1. 1..8.6.4. 1e3 e3 3e3 4e3 5e3 1e3 e3 3e3 4e3 5e3 variance =.87959 variance = 11.896 Jérôme Lelong (CERMICS) Tuesday September 5, 6 9 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 3 / 3 Atlas Option Consider a basket of 16 stocks. At maturity time, remove the stocks with the three best and the three worst performances. It pays off 15% of the average of the remaining stocks. Atlas Option Figure: Importance Sampling Figure: Standard MC Figure: Evolution of the drift vector 1..8 1..8 basket size : 16 maturity time : 1 interest rate :. volatility :. step size :.1 Sample Number :.8.6.4..6.4. 4 6 8 1 1 14 16 18.6.4. 4 6 8 1 1 14 16 18 variance =.7596 variance =.615 4 6 8 1 1 14 16 18 Jérôme Lelong (CERMICS) Tuesday September 5, 6 31 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 3 / 3

Arouna, B. (Winter 3/4). Robbins-monro algorithms and variance reduction in finance. The Journal of Computational Finance, 7(). Benveniste, A., Métivier, M., and Priouret, P. (199). Adaptive algorithms and stochastic approximations, volume of Applications of Mathematics (New York). Springer-Verlag, Berlin. Translated from the French by Stephen S. Wilson. Bouton, C. (1985). Approximation Gaussienne d algorithmes stochastiques à dynamique Markovienne. PhD thesis, Université Pierre et Marie Curie - Paris 6. Delyon, B. (1996). General results on the convergence of stochastic algorithms. IEEE Transactions on Automatic Control, 41(9):145 155. Duflo, M. (1997). Random Iterative Models. Springer-Verlag Berlin and New York. Kushner, H. and Yin, G. (3). Stochastic Approximation and Recursive Algorthims and Applications. Springer-Verlag New York, second edition. Chen, H. and Zhu, Y. (1986). Stochastic Approximation Procedure with randomly varying truncations. Scientia Sinica Series. Jérôme Lelong (CERMICS) Tuesday September 5, 6 3 / 3 Jérôme Lelong (CERMICS) Tuesday September 5, 6 3 / 3