A Central Limit Theorem for Truncating Stochastic Algorithms
Jérôme Lelong (CERMICS)
http://cermics.enpc.fr/~lelong
Tuesday, September 5, 2006

General Framework

Let $u : \theta \in \mathbb{R}^d \mapsto u(\theta) \in \mathbb{R}^d$ be a continuous function defined as an expectation on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$:
\[ u : \mathbb{R}^d \longrightarrow \mathbb{R}^d, \qquad \theta \longmapsto \mathbb{E}[U(\theta, Z)]. \]
Here $Z$ is a random variable in $\mathbb{R}^m$ and $U$ is a measurable function defined on $\mathbb{R}^d \times \mathbb{R}^m$.

Hypothesis 1 (convexity). There exists $\theta^\star \in \mathbb{R}^d$ such that $u(\theta^\star) = 0$ and, for all $\theta \neq \theta^\star$,
\[ \langle \theta - \theta^\star,\, u(\theta) \rangle > 0. \]

Remark: if $u$ is the gradient of a strictly convex function, then $u$ satisfies Hypothesis 1.

Problem: how do we find the root $\theta^\star$ of $u$?
For sub-linear functions

Assume that for all $\theta \in \mathbb{R}^d$,
\[ \mathbb{E}\big[|U(\theta, Z)|^2\big] \le K(1 + |\theta|^2). \tag{1} \]
We define, for $\theta_0 \in \mathbb{R}^d$,
\[ \theta_{n+1} = \theta_n - \gamma_{n+1}\, U(\theta_n, Z_{n+1}), \tag{2} \]
where $(Z_n)_n$ is i.i.d. with the law of $Z$, and $\gamma_n > 0$, $\gamma_n \downarrow 0$, $\sum_i \gamma_i = \infty$ and $\sum_i \gamma_i^2 < \infty$.

Theorem 1 (Robbins-Monro). Assume Hypothesis 1 and that (1) holds. Then the sequence defined by (2) converges a.s. to $\theta^\star$.

For fast-growing functions: an intuitive approach

Condition (1) is rarely satisfied in practice. Consider an increasing sequence of compact sets $(K_j)_j$ such that $\bigcup_{j \ge 0} K_j = \mathbb{R}^d$, and $(Z_n)_n$, $(\gamma_n)_n$ as defined previously. To prevent the algorithm from blowing up, at each step $\theta_n$ must remain in a given compact set; if it does not, reset the algorithm and move to a larger compact set. This procedure is due to Chen (see [Chen and Zhu, 1986]).

For fast-growing functions: mathematical approach

For $\theta_0 \in K_0$ and $\sigma_0 = 0$, we define $(\theta_n)_n$ and $(\sigma_n)_n$ by
\[ \theta_{n+\frac12} = \theta_n - \gamma_{n+1}\, U(\theta_n, Z_{n+1}), \tag{3} \]
\[ \text{if } \theta_{n+\frac12} \in K_{\sigma_n}:\ \theta_{n+1} = \theta_{n+\frac12} \text{ and } \sigma_{n+1} = \sigma_n; \qquad \text{if } \theta_{n+\frac12} \notin K_{\sigma_n}:\ \theta_{n+1} = \theta_0 \text{ and } \sigma_{n+1} = \sigma_n + 1. \]
$\sigma_n$ counts the number of truncations up to time $n$. Let $\mathcal{F}_n = \sigma(Z_k;\ k \le n)$; $\theta_n$ is $\mathcal{F}_n$-measurable and $Z_{n+1}$ is independent of $\mathcal{F}_n$.
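A minimal sketch of the truncated procedure (3). The one-dimensional mean function $u(\theta) = \theta^3 + \theta$ (fast growing, so Theorem 1 does not apply), the Gaussian noise, the compact sets $K_j$ and the step sizes below are all illustrative choices, not taken from the talk.

```python
import numpy as np

def chen_truncated_sa(sample_U, theta0, gamma, radius, n_iter, rng):
    """Chen's randomly truncated procedure (3) in dimension one.

    sample_U(theta, rng): one noisy observation U(theta, Z).
    radius(j): radius of the compact set K_j = {|theta| <= radius(j)}.
    """
    theta, sigma = float(theta0), 0
    for n in range(1, n_iter + 1):
        cand = theta - gamma(n) * sample_U(theta, rng)
        if abs(cand) <= radius(sigma):
            theta = cand                       # stay in K_sigma
        else:                                  # truncation: reset, enlarge the set
            theta, sigma = float(theta0), sigma + 1
    return theta, sigma

# Toy example: u(theta) = theta**3 + theta has the unique root theta* = 0.
rng = np.random.default_rng(0)
theta, n_trunc = chen_truncated_sa(
    sample_U=lambda th, rng: th**3 + th + rng.standard_normal(),
    theta0=0.5,
    gamma=lambda n: 1.0 / n,
    radius=lambda j: 2.0 * 2**j,
    n_iter=20_000,
    rng=rng,
)
print(theta, n_trunc)   # theta near 0; n_trunc truncations occurred along the way
```

The candidate step is an ordinary Robbins-Monro update; the only extra logic is the membership test against the current compact set, exactly as in (3).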
It is often more convenient to rewrite (3) as
\[ \theta_{n+1} = \underbrace{\theta_n - \gamma_{n+1}\, u(\theta_n)}_{\text{Newton algorithm}} - \underbrace{\gamma_{n+1}\, \delta M_{n+1}}_{\text{noise term}} + \underbrace{\gamma_{n+1}\, p_{n+1}}_{\text{truncation term}}, \tag{4} \]
where the first two groups of terms form the standard Robbins-Monro algorithm and
\[ \delta M_{n+1} = U(\theta_n, Z_{n+1}) - u(\theta_n), \qquad p_{n+1} = \begin{cases} u(\theta_n) + \delta M_{n+1} + \frac{1}{\gamma_{n+1}}(\theta_0 - \theta_n) & \text{if } \theta_{n+\frac12} \notin K_{\sigma_n}, \\ 0 & \text{otherwise.} \end{cases} \]
$\delta M_n$ is a martingale increment; $p_n$ is the truncation term.

a.s. convergence of Chen's procedure

Hypothesis 2 (integrability). For all $p > 0$, the series $\sum_n \gamma_{n+1}\, \delta M_{n+1}\, \mathbf{1}_{|\theta_n| \le p}$ converges a.s.

Hypothesis 2 is satisfied as soon as $u$ and $\theta \mapsto \mathbb{E}[|U(\theta, Z)|^2]$ are bounded on any compact set (e.g. continuous).

Theorem 2. Under Hypotheses 1 and 2, the sequence $(\theta_n)_n$ defined by (3) converges a.s. to $\theta^\star$, and the sequence $(\sigma_n)_n$ is a.s. finite.

A proof of this theorem can be found in [Delyon, 1996].

General Problem

Option-pricing problem in a Brownian-driven model (no jumps): compute $\mathbb{E}[\psi(G)]$ by a Monte Carlo method, with $G \sim N(0, I_d)$. One way of reducing the variance is to use importance sampling. For all $\theta \in \mathbb{R}^d$,
\[ \mathbb{E}[\psi(G)] = \mathbb{E}\Big[\psi(G + \theta)\, e^{-\theta \cdot G - \frac{|\theta|^2}{2}}\Big]. \tag{5} \]
Minimise the second moment
\[ v(\theta) = \mathbb{E}\Big[\psi(G + \theta)^2\, e^{-2\theta \cdot G - |\theta|^2}\Big] = \mathbb{E}\Big[\psi(G)^2\, e^{-\theta \cdot G + \frac{|\theta|^2}{2}}\Big], \]
\[ \nabla v(\theta) = \mathbb{E}\Big[(\theta - G)\, \psi(G)^2\, e^{-\theta \cdot G + \frac{|\theta|^2}{2}}\Big]. \tag{6} \]
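Identity (5) can be checked numerically in one dimension. The payoff $\psi$ (a far out-of-the-money call-type payoff) and the drift $\theta$ below are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Identity (5) in 1D: E[psi(G)] = E[psi(G + theta) * exp(-theta*G - theta^2/2)].
rng = np.random.default_rng(123)
psi = lambda g: np.maximum(np.exp(g) - 3.0, 0.0)   # rarely in the money under N(0,1)

G = rng.standard_normal(200_000)
plain = psi(G)                                      # standard Monte Carlo samples

theta = 1.2                                         # drift pushing samples into the money
shifted = psi(G + theta) * np.exp(-theta * G - theta**2 / 2)

print(plain.mean(), shifted.mean())                 # both estimate E[psi(G)]
print(plain.var(), shifted.var())                   # the shifted estimator has smaller variance
```

Both estimators are unbiased by (5); the variance of the shifted one is $v(\theta) - \mathbb{E}[\psi(G)]^2$, which motivates minimising $v$.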
Analysis of the problem

To use the previous algorithms, we set $u = \nabla v$ and $Z = G$.
- $v$ is strictly convex, hence Hypothesis 1 is satisfied.
- $\nabla v$ is not sub-linear, so one cannot use the standard stochastic approximation algorithm of Theorem 1.
- Hypothesis 2 holds even if $\psi$ is of exponential type; Theorem 2 applies and provides a way to compute $\theta^\star$.

More details can be found in [Arouna, 2004].

A CLT for Standard S.A. I

First, let us consider the standard algorithm
\[ \theta_{n+1} = \theta_n - \gamma_{n+1}\, u(\theta_n) - \gamma_{n+1}\, \delta M_{n+1} \]
with $\gamma_n = \frac{\gamma}{(n+1)^\alpha}$, $1/2 < \alpha \le 1$. For $n \ge 0$, we define the renormalised error
\[ \Delta_n = \frac{\theta_n - \theta^\star}{\sqrt{\gamma_n}}. \]

A CLT for Standard S.A. II

Hypothesis 3.
- There exist a function $y : \mathbb{R}^d \to \mathbb{R}^{d \times d}$ satisfying $\lim_{x \to 0} y(x) = 0$ and a symmetric positive definite matrix $A$ such that $u(\theta) = A(\theta - \theta^\star) + y(\theta - \theta^\star)(\theta - \theta^\star)$.
- There exists a real number $\rho > 0$ such that $\kappa = \sup_n \mathbb{E}\big(|\delta M_n|^{2+\rho}\big) < \infty$.
- There exists a symmetric positive definite matrix $\Sigma$ such that $\mathbb{E}\big(\delta M_n\, \delta M_n^T \mid \mathcal{F}_{n-1}\big) \xrightarrow{\mathbb{P}} \Sigma$ as $n \to \infty$.
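Combining the gradient representation (6) with Chen's truncated procedure (3) gives an adaptive way to compute $\theta^\star$, in the spirit of [Arouna, 2004]. A one-dimensional sketch, in which the payoff $\psi$, the compact sets and the step sizes are illustrative assumptions; note how heavy-tailed the gradient samples are, which is exactly why the truncations matter.

```python
import numpy as np

# Search for the variance-minimising drift using the 1D version of (6):
#   grad v(theta) = E[(theta - G) psi(G)^2 exp(-theta*G + theta^2/2)].
# Compact sets K_j = [-(2 + j), 2 + j]; gamma_n = 1/n.
rng = np.random.default_rng(42)
psi = lambda g: np.maximum(np.exp(g) - 3.0, 0.0)

theta0 = 0.0
theta, sigma = theta0, 0
for n in range(1, 100_001):
    g = rng.standard_normal()
    grad = (theta - g) * psi(g) ** 2 * np.exp(-theta * g + theta**2 / 2)
    cand = theta - grad / n
    if abs(cand) <= 2.0 + sigma:              # candidate stays in K_sigma
        theta = cand
    else:                                     # truncation: reset and enlarge
        theta, sigma = theta0, sigma + 1
print(theta, sigma)                           # theta approximates the optimal drift
```

Early iterations produce occasional enormous gradient samples (the plain recursion (2) can jump far away after one such draw); the truncation step absorbs them, and the drift then settles near the minimiser of $v$.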
A CLT for Standard S.A. III

Under Hypotheses 1 and 3, and if the algorithm converges a.s., then
\[ \Delta_n \xrightarrow{\ \mathcal{L}\ } N(0, V). \]
A proof of this result can be found in [Kushner and Yin, 2003], [Benveniste et al., 1990], [Bouton, 1985] or [Duflo, 1997], for instance.

Why we cannot deduce a CLT for truncating S.A. from the CLT for standard S.A.

- The number of projections is a.s. finite: there exists $A$ with $\mathbb{P}(A) = 1$ such that for all $\omega \in A$ there is $N(\omega)$ with $p_n(\omega) = 0$ for all $n > N(\omega)$. However, $N$ is an a.s. finite but unbounded random variable, so one cannot use a time-shifting argument.
- Random time shifting does not preserve convergence in distribution: even if $X_n \xrightarrow{\mathcal{L}} X$ and $\tau < \infty$ a.s., there are simple examples where $(X_{n+\tau})_n$ does not converge. Choose for instance $\tau$ and $\tau'$ independent r.v. on $\{0, 1\}$ with parameter $1/2$ and set $X_n := (-1)^n(\tau - \tau')$.

A CLT for Randomly Truncating S.A.

Consider Chen's algorithm as defined by (3).

Theorem 3. Assume Hypotheses 1 and 3, and that there exists $\eta > 0$ such that $\inf_n d(\theta^\star, \partial K_n) > \eta$. Then $\Delta_n \xrightarrow{\mathcal{L}} N(0, V)$, with
\[ V = \int_0^\infty e^{-At}\, \Sigma\, e^{-A^T t}\, dt \quad \text{if } 1/2 < \alpha < 1, \]
\[ V = \gamma \int_0^\infty e^{-\left(\gamma A - \frac{I}{2}\right)t}\, \Sigma\, e^{-\left(\gamma A^T - \frac{I}{2}\right)t}\, dt \quad \text{if } \alpha = 1 \text{ and } \gamma A - \tfrac{I}{2} > 0. \]

A functional CLT for Chen's algorithm I

We introduce a sequence of interpolating times $\{t_n(u);\ u \ge 0,\ n \ge 0\}$,
\[ t_n(u) = \sup\Big\{ k \ge 0;\ \sum_{i=n}^{n+k} \gamma_i \le u \Big\}, \tag{7} \]
with the convention $\sup \emptyset = 0$. We set $\Delta_n(0) = \Delta_n$ and $\Delta_n(t) = \Delta_{n + t_n(t) + 1}$ for $t > 0$: for $t \in \big[\sum_{i=n}^{n+p} \gamma_i,\ \sum_{i=n}^{n+p+1} \gamma_i\big)$, we have $t_n(t) = p$ and $\Delta_n(t) = \Delta_{n+p+1}$.
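The parity effect behind the time-shifting counterexample can be checked numerically. The test point $u = 2$ below is an arbitrary illustrative choice.

```python
import numpy as np

# Counterexample: X_n = (-1)^n (tau - tau') with tau, tau' independent
# Bernoulli(1/2). Each X_n has the same law, yet the characteristic function
# of X_{n+tau} depends on the parity of n, so (X_{n+tau})_n does not converge
# in distribution.
rng = np.random.default_rng(7)
N = 400_000
tau = rng.integers(0, 2, N)
tau_p = rng.integers(0, 2, N)
u = 2.0

def char_fun(n):
    x = (-1.0) ** (n + tau) * (tau - tau_p)   # samples of X_{n+tau}
    return np.exp(1j * u * x).mean()

# Closed form: E[exp(iu X_{n+tau})] = (1 + exp(iu(-1)^(n+1))) / 2.
print(char_fun(0), (1 + np.exp(-1j * u)) / 2)
print(char_fun(1), (1 + np.exp(+1j * u)) / 2)
```

The two empirical values match the closed-form expression and differ from each other, exhibiting the oscillation between even and odd $n$.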
A functional CLT for Chen's algorithm II

We introduce $W_n(\cdot)$: $W_n(0) = 0$ and
\[ W_n(t) = \sum_{i=n+1}^{n + t_n(t) + 1} \sqrt{\gamma_i}\, \delta M_i \quad \text{for } t > 0. \tag{8} \]

Theorem 4. Assume the hypotheses of Theorem 3. Then, on any finite time interval,
\[ (\Delta_n(\cdot), W_n(\cdot)) \xrightarrow{\ \mathcal{D}\ } (\Delta, W) \quad \text{in } D \times D, \]
where $\Delta$ is a stationary Ornstein-Uhlenbeck process with initial law $N(0, V)$ and $W$ is a Wiener process, $\mathcal{F}^{\Delta, W}$-measurable, with covariance matrix $\Sigma$.

Averaging Algorithms I

We restrict to $\gamma_n = \frac{\gamma}{(n+1)^\alpha}$ with $1/2 < \alpha < 1$. For any $t > 0$, we introduce a moving-window average of the iterates
\[ \hat\theta_n(t) = \frac{\gamma_n}{t} \sum_{i=n}^{n + \lfloor t/\gamma_n \rfloor} \theta_i. \tag{9} \]
We study the renormalised error
\[ \hat\Delta_n(t) = \frac{\hat\theta_n(t) - \theta^\star}{\sqrt{\gamma_n}} = \frac{\sqrt{\gamma_n}}{t} \sum_{i=n}^{n + \lfloor t/\gamma_n \rfloor} (\theta_i - \theta^\star). \tag{10} \]
We need to characterise the limit law of $(\Delta_n, \Delta_{n+1}, \dots, \Delta_{n+p})$ as $n, p \to \infty$.

A CLT for Averaging Algorithms

Theorem 5. Under the hypotheses of Theorem 3,
\[ \hat\Delta_n(t) \xrightarrow[n \to \infty]{\ \mathcal{L}\ } N(0, \hat V), \]
where
\[ \hat V = \frac{1}{t}\, A^{-1} \Sigma A^{-1} + \frac{A^{-2}\big(e^{-At} - I\big)V + V\big(e^{-At} - I\big)A^{-2}}{t^2}. \]
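The moving-window average (9) is straightforward to simulate. A sketch on a linear toy problem, assuming $u(\theta) = \theta$ (so $\theta^\star = 0$ and $A = 1$), unit Gaussian noise and $\alpha = 2/3 \in (1/2, 1)$; all of these choices are illustrative.

```python
import numpy as np

# Standard S.A. iterates on u(theta) = theta with gamma_n = (n+1)^(-2/3).
rng = np.random.default_rng(1)
gamma = lambda n: (n + 1.0) ** (-2.0 / 3.0)

n_total = 60_000
theta = np.empty(n_total + 1)
theta[0] = 1.0
for n in range(n_total):
    theta[n + 1] = theta[n] - gamma(n) * (theta[n] + rng.standard_normal())

def window_average(n, t):
    """theta_hat_n(t) = (gamma_n / t) * sum_{i=n}^{n + floor(t/gamma_n)} theta_i, as in (9)."""
    g = gamma(n)
    p = int(t / g)
    return g / t * theta[n : n + p + 1].sum()

n0, t = 20_000, 5.0
print(window_average(n0, t), theta[n0])   # averaged iterate vs raw iterate
```

The window contains about $t/\gamma_n$ iterates, so the averaged estimate is typically closer to $\theta^\star$ than the raw iterate, consistent with the variance reduction in Theorem 5.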
Major steps

- Tightness: $\Delta_n$ is tight in $\mathbb{R}^d$. Use a localisation technique to prove that $\big(\sup_{t \in [0,T]} |\Delta_n(t)|\big)_n$ is tight, and check that $\Delta_n(\cdot)$ satisfies Aldous' criterion, which gives tightness in $D$.
- Prove a Donsker theorem for martingale increments.

Theorem. $(W_n(\cdot), \Delta_n(\cdot))_n$ is tight in $D \times D$ and converges in law to $(W, \Delta)$, where $W$ is a Wiener process with covariance matrix $\Sigma$ with respect to the smallest $\sigma$-algebra that measures $(W(\cdot), \Delta(\cdot))$, and $\Delta$ is the stationary solution of
\[ d\Delta(t) = -Q\, \Delta(t)\, dt - dW(t). \]

Theorem 6. Let $(M_n(t))_n$ be a sequence of martingales. Assume that
- $(M_n(\cdot))_n$ is tight in $D$ and satisfies a C-tightness criterion,
- $(M_n(t))_n$ is a uniformly square-integrable family for each $t$,
- $\langle M_n \rangle_t \xrightarrow{\mathbb{P}} t$ as $n \to \infty$.
Then $M_n(\cdot)$ converges in law in $D[0,T]$ to a Brownian motion. $W_n(\cdot)$ satisfies the assumptions of Theorem 6.

Comments on Hypothesis 2

The process
\[ M_n = \sum_{i=1}^n \gamma_i\, \delta M_i\, \mathbf{1}_{|\theta_{i-1}| \le p} \]
is a martingale, since
\[ \mathbb{E}(M_n \mid \mathcal{F}_{n-1}) = M_{n-1} + \gamma_n\, \mathbf{1}_{|\theta_{n-1}| \le p}\, \mathbb{E}(\delta M_n \mid \mathcal{F}_{n-1}) = M_{n-1}. \]
If $\sup_n \mathbb{E}\langle M \rangle_n < \infty$, then $M_n$ converges a.s., where
\[ \langle M \rangle_n = \sum_{i=1}^n \gamma_i^2\, \mathbb{E}\big(|\delta M_i|^2 \mid \mathcal{F}_{i-1}\big)\, \mathbf{1}_{|\theta_{i-1}| \le p} \]
and
\[ \mathbb{E}\big(|\delta M_i|^2 \mid \mathcal{F}_{i-1}\big) = \mathbb{E}\big(|U(\theta, Z)|^2\big)\big|_{\theta = \theta_{i-1}} - |u(\theta_{i-1})|^2 \le \mathbb{E}\big(|U(\theta, Z)|^2\big)\big|_{\theta = \theta_{i-1}}. \]

Detail of the counterexample. Let $\tau$ and $\tau'$ be independent r.v. on $\{0, 1\}$ with parameter $1/2$, and set $X_n := (-1)^n(\tau - \tau')$. Since $\tau - \tau'$ is symmetric, $X_n$ is constant in law. However,
\[ \mathbb{E}\big(e^{iuX_{n+\tau}}\big) = \mathbb{E}\big(e^{iu(-1)^{n+\tau}(\tau - \tau')}\big) = \frac{1}{2}\Big( \mathbb{E}\big(e^{-iu(-1)^{n}\tau'}\big) + \mathbb{E}\big(e^{iu(-1)^{n+1}(1 - \tau')}\big) \Big) = \frac{1}{4}\Big( 1 + e^{iu(-1)^{n+1}} + e^{iu(-1)^{n+1}} + 1 \Big) = \frac{1}{2}\Big( 1 + e^{iu(-1)^{n+1}} \Big). \]
This depends on the parity of $n$; hence $(X_{n+\tau})_n$ does not converge in distribution.
Numerical results

Basket option. Payoff: $\big(\frac{1}{N}\sum_{i=1}^N S_T^i - K\big)_+$.
[Figure: evolution of the drift vector. Figure: importance sampling, variance = .87959. Figure: standard MC, variance = 11.896.]

Atlas option. Consider a basket of 16 stocks. At maturity, remove the stocks with the three best and the three worst performances; the option pays off 15% of the average of the remaining stocks.
[Figure: importance sampling, variance = .7596. Figure: standard MC, variance = .615. Figure: evolution of the drift vector.]
References

Arouna, B. (Winter 2003/2004). Robbins-Monro algorithms and variance reduction in finance. The Journal of Computational Finance, 7(2).

Benveniste, A., Métivier, M., and Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations, Applications of Mathematics (New York). Springer-Verlag, Berlin. Translated from the French by Stephen S. Wilson.

Bouton, C. (1985). Approximation Gaussienne d'algorithmes stochastiques à dynamique Markovienne. PhD thesis, Université Pierre et Marie Curie - Paris 6.

Chen, H. and Zhu, Y. (1986). Stochastic approximation procedure with randomly varying truncations. Scientia Sinica Series.

Delyon, B. (1996). General results on the convergence of stochastic algorithms. IEEE Transactions on Automatic Control, 41(9):1245-1255.

Duflo, M. (1997). Random Iterative Models. Springer-Verlag, Berlin and New York.

Kushner, H. and Yin, G. (2003). Stochastic Approximation and Recursive Algorithms and Applications. Springer-Verlag, New York, second edition.