Mathematical Methods for Neurosciences
ENS - Master MVA / Paris 6 - Master Maths-Bio (2014-2015)
Etienne Tanré - Olivier Faugeras, INRIA - Team Tosca
October 22nd, 2014
Outline
1. Introduction and Definitions
2. On the convergence rate of Monte Carlo methods
3. Simulation Methods of Classical Laws
4. Variance Reduction
5. Low discrepancy sequences
1. Introduction and Definitions
Motivation
- What is really random?
- Stochastic models in general
- Sources of noise in neuronal activities
- Monte Carlo methods
- Efficiency
Noise in neuronal activity
- Thermal noise
- Channel noise
- Electrical noise
- Synaptic noise
Two approaches: a strong link between deterministic and stochastic formulations

Toy example (U, U_1, U_2, U_3 i.i.d. uniform on (0,1)):
A = \int_0^1 f(\theta) d\theta = E[f(U)]
\tilde{A} = \int_{[0,1]^3} f(\theta_1, \theta_2, \theta_3) d\theta_1 d\theta_2 d\theta_3 = E[f(U_1, U_2, U_3)]

Heat equation and Brownian motion: the backward heat equation
\partial_t u(t, x) + (1/2) \partial^2_{xx} u(t, x) = 0, with terminal condition u(T, x) = \Psi(x),
has the probabilistic representation u(t, x) = E[\Psi(W_T) | W_t = x].
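The one-dimensional toy example above can be sketched in a few lines of Python; the integrand f(x) = x^2 below is an arbitrary illustrative choice (its integral over [0,1] is 1/3), and the 1/\sqrt{n} error rate is the subject of the next section.

```python
import random

def mc_integral(f, n, seed=0):
    """Estimate \int_0^1 f(x) dx as the empirical mean of f(U_i), U_i i.i.d. uniform."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

# Example: \int_0^1 x^2 dx = 1/3
estimate = mc_integral(lambda x: x * x, n=100_000)
print(estimate)  # close to 0.3333, with an error of order 1/sqrt(n)
```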
Probability space

σ-field: Consider an arbitrary nonempty space Ω. A class F of subsets of Ω is called a σ-field if
- Ω ∈ F
- A ∈ F implies A^c ∈ F
- A_1, A_2, ... ∈ F implies \bigcup_{k ∈ N} A_k ∈ F

Probability measure: Consider a real-valued function P defined on a σ-field F of Ω. P is a probability measure if it satisfies
1. 0 ≤ P(A) ≤ 1 for A ∈ F
2. P(∅) = 0, P(Ω) = 1
3. if A_1, A_2, ... is a disjoint sequence of F-sets, then P(\bigcup_{k=1}^∞ A_k) = \sum_{k=1}^∞ P(A_k)
Limit sets

Definition: For a sequence A_1, A_2, ... of sets, we define
lim sup_n A_n = \bigcap_{n=1}^∞ \bigcup_{k=n}^∞ A_k,  lim inf_n A_n = \bigcup_{n=1}^∞ \bigcap_{k=n}^∞ A_k.
If lim sup_n A_n = lim inf_n A_n, we write lim A_n.

Theorem:
1. For each sequence (A_n), P(lim inf A_n) ≤ lim inf_n P(A_n) ≤ lim sup_n P(A_n) ≤ P(lim sup A_n)
2. If A_n → A, then P(A_n) → P(A)
Borel-Cantelli lemmas

Theorem (First Borel-Cantelli lemma): If \sum_n P(A_n) converges, then P(lim sup_n A_n) = 0.

Theorem (Second Borel-Cantelli lemma): If (A_n) is an independent sequence of events and \sum_n P(A_n) diverges, then P(lim sup_n A_n) = 1.
Random variables

Definition (Random variable): A random variable on a probability space (Ω, F, P) is a real-valued, F-measurable function X = X(ω).

Definition (Cumulative distribution function): The cumulative distribution function F_X of a random variable X is F_X(y) := P(X ≤ y).

Properties:
- F_X characterizes the law of X
- F_X is non-decreasing
- F_X is right-continuous
- lim_{y → −∞} F_X(y) = 0 and lim_{y → +∞} F_X(y) = 1
Expected value

The expectation is built up from indicators to general real random variables:
- E(1_A) = P(A)
- E(\sum_i a_i 1_{A_i}) = \sum_i a_i P(A_i)
- if X ≥ 0, E(X) = sup { E(Y) : Y ≤ X, Y = \sum_i a_i 1_{A_i} }
- in general, E(X) = E(X^+) − E(X^−)
Convergence of sequences of random variables

Convergence in distribution: A sequence X_1, X_2, ... of random variables is said to converge in distribution to a random variable X if
lim_{n → ∞} F_{X_n}(x) = F_X(x)
for every x such that F_X is continuous at x.

Properties:
- E(f(X_n)) → E f(X) for all bounded, continuous functions f;
- E(f(X_n)) → E f(X) for all bounded, Lipschitz functions f;
- lim sup E(f(X_n)) ≤ E f(X) for every upper semi-continuous function f bounded from above;
- lim inf E(f(X_n)) ≥ E f(X) for every lower semi-continuous function f bounded from below;
- lim sup P(X_n ∈ C) ≤ P(X ∈ C) for all closed sets C;
- lim inf P(X_n ∈ U) ≥ P(X ∈ U) for all open sets U.
Convergence in distribution does not imply convergence of the probability density functions.
Convergence of sequences of random variables (2)

Convergence in probability: A sequence X_1, X_2, ... of random variables is said to converge in probability to a random variable X if, for every ε > 0, lim_{n → ∞} P(|X_n − X| ≥ ε) = 0.

Almost sure convergence: A sequence X_1, X_2, ... of random variables is said to converge almost surely to a random variable X if P(lim_{n → ∞} X_n = X) = 1.

Convergence in mean: A sequence X_1, X_2, ... of random variables is said to converge in r-th mean (r ≥ 1) to a random variable X if lim_{n → ∞} E(|X_n − X|^r) = 0.
Implications
- X_n → X a.s. implies X_n → X in probability
- X_n → X in probability implies there is a subsequence X_{φ(n)} → X a.s.
- X_n → X in probability implies X_n → X in distribution
- X_n → X in L^r implies X_n → X in probability
- if r ≥ s ≥ 1, X_n → X in L^r implies X_n → X in L^s
Independence

Independent events: Events A and B are independent if P(A ∩ B) = P(A)P(B). More generally, a finite collection A_1, ..., A_n of events is independent if
P(A_{k_1} ∩ ... ∩ A_{k_j}) = P(A_{k_1}) ... P(A_{k_j})
for 2 ≤ j ≤ n and 1 ≤ k_1 < ... < k_j ≤ n.

Independent random variables: Random variables X and Y are independent if the σ-fields σ(X) and σ(Y) are independent (i.e. for all A ∈ σ(X) and B ∈ σ(Y), A and B are independent).
Conditional probability

A first definition: P(A | B) = P(A ∩ B) / P(B) if P(B) ≠ 0.

If B_1, B_2, ... is a countable partition of Ω, set f(ω) = P(A | B_i) if ω ∈ B_i. The random variable f is the conditional probability of A given G = σ(B_1, B_2, ...).

Remark: It is necessary to extend the definition to nonempty sets B_i such that P(B_i) = 0.
Conditional expectation

Definition: Let X be an integrable random variable on (Ω, F, P) and G a σ-field in F. There exists a random variable E(X | G), called the conditional expected value of X given G, having the two properties:
(i) E(X | G) is G-measurable and integrable;
(ii) E[1_G E(X | G)] = E[1_G X] for all G ∈ G.
2. On the convergence rate of Monte Carlo methods
Strong Law of Large Numbers

Theorem (Law of Large Numbers): Consider a random variable X such that E(|X|) < ∞, and denote its mean by µ := E(X). Let X_1, ..., X_n be independent random variables with the same law as X. Then
(X_1 + ... + X_n) / n → µ almost surely as n → ∞.
Central Limit Theorem

Theorem: Consider a random variable X such that E(X^2) < ∞. Denote the mean and variance by µ := E(X) and σ^2 := E[(X − E(X))^2] = E(X^2) − (E(X))^2. Let X_1, ..., X_n be i.i.d. random variables with the same law as X. Then
(\sqrt{n} / σ) ((X_1 + ... + X_n)/n − µ) → N(0, 1) in distribution as n → ∞.
Confidence intervals

Let µ = E(X). An estimator of µ is µ̂_n := (1/n)(X_1 + ... + X_n). Assume n is large enough to be in the asymptotic regime. Then
P[(\sqrt{n}/σ)(µ̂_n − µ) ∈ A] ≈ P(G ∈ A), where G ~ N(0, 1).
For any level α there exists y_α such that P(|G| ≤ y_α) = α.

An example of the size of the confidence interval: for α = 95%, y_α = 1.96, and
P(µ ∈ [µ̂_n − 1.96 σ/\sqrt{n}, µ̂_n + 1.96 σ/\sqrt{n}]) ≈ 95%.
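A minimal sketch of this confidence interval in Python, where σ is estimated from the sample itself (a common practical substitute when σ is unknown); the test variable X = U^2, with mean 1/3, is an illustrative choice.

```python
import math
import random

def confidence_interval_95(sample):
    """95% asymptotic confidence interval for the mean, from the CLT."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sigma^2 estimated from the sample
    half = 1.96 * math.sqrt(var / n)
    return mean - half, mean + half

rng = random.Random(42)
sample = [rng.random() ** 2 for _ in range(100_000)]  # X = U^2, so mu = 1/3
lo, hi = confidence_interval_95(sample)
print((lo, hi))  # a narrow interval around 1/3
```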
A non-asymptotic estimate

Theorem (Berry-Esseen): Let (X_i)_{i ≥ 1} be a sequence of independent and identically distributed random variables with zero mean. Denote by σ the common standard deviation and suppose that E|X_1|^3 < +∞. Then
ε_N := sup_{x ∈ R} | P[(X_1 + ... + X_N)/(σ \sqrt{N}) ≤ x] − \int_{−∞}^x e^{−u^2/2}/\sqrt{2π} du | ≤ C E|X_1|^3 / (σ^3 \sqrt{N}).
In addition, 0.398 ≤ C ≤ 0.8. For a proof, see, e.g., Shiryayev (1984).
A more precise result

We now give a result slightly more precise than the Berry-Esseen theorem: the estimate is non-uniform in x. See Petrov (1975) for a proof and extensions.

Theorem (Bikelis): Let (X_i)_{i ≥ 1} be independent real random variables, not necessarily identically distributed, with E X_i = 0 for all i. Suppose there exists 0 < δ ≤ 1 such that E|X_i|^{2+δ} < +∞ for all i. Set
σ_i^2 := E X_i^2,  B_N := \sum_{i=1}^N σ_i^2,  F_N(x) := P[\sum_{i=1}^N X_i / \sqrt{B_N} ≤ x].
Denote by Φ the distribution function of the Gaussian law with zero mean and unit variance. There exists a universal constant A in (1/\sqrt{2π}, 1), independent of N and of the sequence (X_i)_{i ≥ 1}, such that for all x
|F_N(x) − Φ(x)| ≤ (A / (B_N^{1+δ/2} (1 + |x|)^{2+δ})) \sum_{i=1}^N E|X_i|^{2+δ}.
3. Simulation Methods of Classical Laws
Uniform law

Goal: generate a sequence U_1, ..., U_n of i.i.d. uniform random variables on (0, 1).

Expected empirical properties, for a, b ∈ [0, 1]:
(1/n) \sum_{i=1}^n 1_{a ≤ U_i ≤ b} → b − a,
(1/n) \sum_i (U_{2i+1} − 1/2)(U_{2i+2} − 1/2) → 0,  etc.

Congruential generator: fix N_max ∈ N and a seed n_0 ∈ N, and set
n_{k+1} = a n_k + b (mod N_max),  u_k = n_k / N_max.
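A sketch of such a congruential generator in Python. The constants a, b, N_max below are one classical illustrative choice, not values prescribed by the slide; the empirical mean of the output is a first, crude uniformity check.

```python
def lcg(n, a=1664525, b=1013904223, n_max=2 ** 32, seed=1):
    """Congruential generator: n_{k+1} = a*n_k + b (mod N_max), u_k = n_k / N_max."""
    nk = seed
    us = []
    for _ in range(n):
        nk = (a * nk + b) % n_max
        us.append(nk / n_max)
    return us

u = lcg(100_000)
print(sum(u) / len(u))  # close to 1/2 if the generator behaves uniformly
```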
Remarks
1. The sequence (u_k)_{k ≥ 1} mimics a sequence of independent random variables uniformly distributed on (0, 1), but it is a deterministic sequence.
2. It allows us to compare several methods on the same random events.
3. These sequences are periodic.
4. The period must be as long as possible.
5. A good choice: the Mersenne Twister.

Do not forget: if you want to use a software package or a given language for stochastic numerical methods, check the quality of its built-in uniform random generator, or download a good one.
Inverting the cumulative distribution function

Proposition: Let X be a random variable with cumulative distribution function F_X, and let U be a random variable uniformly distributed on (0, 1). Then F_X^{-1}(U) has the same law as X, where F_X^{-1}(u) := inf{x : F_X(x) ≥ u} is the generalized inverse of F_X.
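A sketch of the method for a law whose inverse CDF is explicit, the exponential law of parameter λ: F(x) = 1 − e^{−λx}, hence F^{-1}(u) = −log(1 − u)/λ. The value λ = 2 is an arbitrary illustration.

```python
import math
import random

def exp_inverse_cdf(u, lam):
    """F(x) = 1 - exp(-lam*x) on [0, inf), hence F^{-1}(u) = -log(1 - u)/lam."""
    return -math.log(1.0 - u) / lam

rng = random.Random(0)
lam = 2.0
xs = [exp_inverse_cdf(rng.random(), lam) for _ in range(100_000)]
print(sum(xs) / len(xs))  # close to E(X) = 1/lam = 0.5
```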
Rejection procedure

Principle: our aim is to estimate E[FG], with 0 ≤ G ≤ 1 almost surely. The idea: take U uniform on (0, 1), independent of (F, G), so that G = P(U ≤ G | F, G) and
E[FG] = E[F 1_{U ≤ G}] = E[F | U ≤ G] E[G].
Simulation of a random variable with a rejection procedure

Let X be a r.v. with density f, which we do not know how to simulate, and Y a r.v. with density g, which we do know how to simulate. Assumption: 0 ≤ f(x) ≤ C g(x) for all x ∈ R. Set
h(x) := (f(x) / (C g(x))) 1_{g(x) > 0}.
Then, with U uniform on (0, 1) independent of Y:
E[φ(X)] = \int φ(x) f(x) dx = \int φ(x) (f(x)/g(x)) g(x) dx = E[φ(Y) f(Y)/g(Y)]
= C E[φ(Y) f(Y)/(C g(Y))] = C E[φ(Y) h(Y)]
= C E[φ(Y) 1_{U ≤ h(Y)}] = C E[φ(Y) | U ≤ h(Y)] P[U ≤ h(Y)].
P [U h(y )] = E [h(y )] f (y) f (y) = Cg(y) g(y)dy = C dy = 1 C E [ϕ(x)] = CE [ϕ(y ) U < h(y )] P [U h(y )] = E [ϕ(y ) U < h(y )] E. Tanré (INRIA - Team Tosca) Mathematical Methods for Neurosciences October 22nd, 2014 30 / 50
Algorithm
1. Generate Y.
2. Compute, for this realisation, f(Y) / (C g(Y)).
3. Generate a random variable U, independent of Y, with uniform law on (0, 1).
4. If U ≤ f(Y)/(C g(Y)), accept the realisation: X = Y.
5. Else (if U > f(Y)/(C g(Y))), reject the realisation and start again from the first step.

Remark: you have to wait a random number of trials to obtain each realisation. The probability of acceptance is 1/C: the smaller C is, the better the algorithm.
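The algorithm above, sketched for an illustrative target not taken from the slides: the Beta(2, 2) density f(x) = 6x(1 − x) on (0, 1), with the uniform density as proposal g and C = 3/2 = max f.

```python
import random

def rejection_sample(rng, f, g_sample, g, C):
    """One draw from density f, given a sampler for the proposal density g with f <= C*g."""
    while True:
        y = g_sample(rng)            # step 1: generate Y
        u = rng.random()             # step 3: U uniform, independent of Y
        if u <= f(y) / (C * g(y)):   # step 4: accept with probability f(Y)/(C g(Y))
            return y                 # step 5 otherwise: loop and retry

rng = random.Random(1)
f = lambda x: 6.0 * x * (1.0 - x)    # Beta(2,2) density on (0,1)
xs = [rejection_sample(rng, f, lambda r: r.random(), lambda x: 1.0, C=1.5)
      for _ in range(50_000)]
print(sum(xs) / len(xs))  # close to the Beta(2,2) mean 1/2
```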
Gaussian r.v.: the Box-Muller procedure

Algorithm: generate (U_1, U_2), two independent random variables uniformly distributed on (0, 1), and set
G_1 = \sqrt{−2 log(U_1)} cos(2π U_2),  G_2 = \sqrt{−2 log(U_1)} sin(2π U_2).
Then G_1 and G_2 are two independent Gaussian N(0, 1) random variables.
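A direct Python sketch of Box-Muller; the only liberty taken is mapping the uniform into (0, 1] so that the logarithm is always defined.

```python
import math
import random

def box_muller(rng):
    """Two independent N(0,1) variables from two independent uniforms."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is defined
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

rng = random.Random(0)
gs = [g for _ in range(50_000) for g in box_muller(rng)]
mean = sum(gs) / len(gs)
second_moment = sum(g * g for g in gs) / len(gs)
print(mean, second_moment)  # close to 0 and 1
```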
Gaussian r.v.: the Marsaglia polar procedure

Algorithm: generate (X_1, X_2), a point uniformly distributed in the unit disc, compute S = X_1^2 + X_2^2, and set
G_1 = X_1 \sqrt{−2 log(S)/S},  G_2 = X_2 \sqrt{−2 log(S)/S}.
Then G_1 and G_2 are two independent Gaussian N(0, 1) random variables.
Marsaglia Polar Procedure (2)

Remark: to generate (X_1, X_2) uniformly distributed on the unit disc, we use a rejection procedure:
1. Generate U_1 ~ U(0, 1) and U_2 ~ U(0, 1), independent.
2. Set Y_1 = 2U_1 − 1, Y_2 = 2U_2 − 1.
3. If Y_1^2 + Y_2^2 ≤ 1: ACCEPTANCE, (X_1, X_2) = (Y_1, Y_2). Else: REJECTION, go back to the first step.
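The two slides above combine into the following sketch; the guard s > 0 is a small practical addition (it excludes the zero-probability event log(0)).

```python
import math
import random

def marsaglia_polar(rng):
    """Two independent N(0,1) variables; (X1, X2) uniform on the unit disc by rejection."""
    while True:
        y1 = 2.0 * rng.random() - 1.0
        y2 = 2.0 * rng.random() - 1.0
        s = y1 * y1 + y2 * y2
        if 0.0 < s <= 1.0:  # acceptance; s = 0 excluded to keep log(s) defined
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return y1 * factor, y2 * factor

rng = random.Random(0)
gs = [g for _ in range(50_000) for g in marsaglia_polar(rng)]
mean = sum(gs) / len(gs)
second_moment = sum(g * g for g in gs) / len(gs)
print(mean, second_moment)  # close to 0 and 1
```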
4. Variance Reduction
Context

Notation: µ = E(Ψ(Z)),  µ̂_N = (1/N) \sum_{i=1}^N Ψ(Z_i).

Analysis of the error: the Central Limit Theorem shows that the error µ − µ̂_N is of order σ / \sqrt{N}, where σ^2 = Var(Ψ(Z)). Our aim is to obtain the best accuracy.
Comparison of methods

We want to approximate µ = E(X), and we also know that µ = E(Y). We therefore have two ways to approximate µ:
µ ≈ (1/N_X) \sum_{i=1}^{N_X} X_i,  µ ≈ (1/N_Y) \sum_{i=1}^{N_Y} Y_i.

Constraint: the best method should give the result with the best accuracy in a fixed execution time (say A).
Comparison of methods (2)
- each simulation of X takes a time T_X, each simulation of Y a time T_Y;
- in time A we can generate N_X = A/T_X samples (and N_Y = A/T_Y);
- the accuracy is σ_X / \sqrt{N_X} = \sqrt{σ_X^2 T_X / A};
- so we have to compare σ_X^2 T_X and σ_Y^2 T_Y.
Variance Reduction

Importance sampling: you want to estimate E[φ(X)] with a Monte Carlo procedure, where X has density g:
E[φ(X)] = \int φ(x) g(x) dx.
You choose a random variable Y with density f:
E[φ(X)] = \int (φ(x) g(x) / f(x)) f(x) dx = E[φ(Y) g(Y) / f(Y)].

Control variables: E(f(X)) = E(f(X) − h(X)) + E(h(X)).
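An importance-sampling sketch on an illustrative problem (not from the slides): estimate E[φ(U)] with φ(x) = x^9 and U uniform (g = 1), using the proposal density f(x) = 5x^4 on (0, 1), simulated by inverse CDF (F^{-1}(u) = u^{1/5}). The proposal puts more mass where φ is large, so the weighted estimator has a much smaller variance.

```python
import random

rng = random.Random(0)
N = 50_000
phi = lambda x: x ** 9  # E[phi(U)] = \int_0^1 x^9 dx = 0.1

# Plain Monte Carlo with the original density g = 1
plain = [phi(rng.random()) for _ in range(N)]

# Importance sampling: Y with density f(x) = 5x^4, estimator phi(Y) * g(Y)/f(Y)
weighted = []
for _ in range(N):
    y = (1.0 - rng.random()) ** 0.2          # inverse CDF of f, y in (0, 1]
    weighted.append(phi(y) * 1.0 / (5.0 * y ** 4))

mean_plain = sum(plain) / N
mean_is = sum(weighted) / N
var_plain = sum((x - mean_plain) ** 2 for x in plain) / N
var_is = sum((x - mean_is) ** 2 for x in weighted) / N
print(mean_plain, mean_is)  # both close to 0.1
print(var_plain, var_is)    # the importance-sampling variance is much smaller
```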
Variance Reduction (2)

Antithetic variables: you want to estimate I = \int_0^1 g(x) dx. Thanks to the change of variable y = 1 − x,
I = \int_0^1 g(1 − y) dy = (1/2) \int_0^1 (g(x) + g(1 − x)) dx = (1/2) E(g(U) + g(1 − U)).
Antithetic Variables

Theorem: Let g be a monotone function, T a non-increasing function, and (X_i, i ∈ N) a sequence of i.i.d. random variables such that X_i and T(X_i) have the same law. Then
Var[ (1/(2n)) \sum_{i=1}^n (g(X_i) + g(T(X_i))) ] ≤ Var[ (1/(2n)) \sum_{i=1}^{2n} g(X_i) ].
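A sketch with T(x) = 1 − x and the illustrative monotone integrand g = exp, so that I = \int_0^1 e^x dx = e − 1; both estimators below use the same number (2n) of evaluations of g.

```python
import math
import random

rng = random.Random(0)
g = math.exp              # monotone on [0, 1]; I = e - 1
n = 20_000

us = [rng.random() for _ in range(2 * n)]

# Plain estimator: 2n independent evaluations of g
plain = sum(g(u) for u in us) / (2 * n)

# Antithetic estimator with T(x) = 1 - x: n uniforms, 2n evaluations of g
anti = sum(g(u) + g(1.0 - u) for u in us[:n]) / (2 * n)

print(plain, anti)  # both close to e - 1 = 1.71828...
```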
Stratification

Idea: X is a r.v. with values in E, and you want to estimate I = E[g(X)]. You split the space E into E_1, E_2, ..., E_k:
I = \sum_{i=1}^k E[g(X) | X ∈ E_i] P[X ∈ E_i] = \sum_{i=1}^k p_i I_i,
Î = \sum_{i=1}^k p_i Î_i,  with σ_i^2 = Var(g(X) | X ∈ E_i).
You use a Monte Carlo procedure with n_i realisations to estimate each I_i:
Var(Î) = \sum_{i=1}^k p_i^2 σ_i^2 / n_i.
Under the constraint n = \sum_i n_i, the optimal choice of n_i is
n_i = n p_i σ_i / \sum_{j=1}^k p_j σ_j,
which gives Var(Î) = (1/n) (\sum_{i=1}^k p_i σ_i)^2.
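A sketch of stratification on [0, 1] with k equal strata (p_i = 1/k) and proportional allocation n_i = n/k, again on the illustrative integrand g = exp with exact value e − 1.

```python
import math
import random

def stratified_estimate(g, k, n, rng):
    """Estimate \int_0^1 g with k equal strata E_i = [i/k, (i+1)/k), p_i = 1/k,
    and proportional allocation n_i = n/k."""
    per_stratum = n // k
    total = 0.0
    for i in range(k):
        a = i / k
        s = sum(g(a + rng.random() / k) for _ in range(per_stratum))
        total += (1.0 / k) * (s / per_stratum)  # p_i * Ihat_i
    return total

rng = random.Random(0)
est = stratified_estimate(math.exp, k=10, n=10_000, rng=rng)
print(est)  # close to e - 1, with a much smaller variance than plain Monte Carlo
```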
5. Low discrepancy sequences
Low discrepancy Sequences

Using sequences of points more regular than random points may sometimes improve Monte Carlo methods. We look for deterministic sequences (x_i, i ≥ 1) in [0, 1]^d such that
\int_{[0,1]^d} f(x) dx ≈ (1/n) (f(x_1) + ... + f(x_n))
for all functions f in a large enough set.

Definition: these methods with deterministic sequences are called quasi Monte Carlo methods.

One can find sequences such that the speed of convergence of the previous approximation is of order K log(n)^d / n.
Low discrepancy Sequences (2)

Definition (Uniformly distributed sequences): For y, z ∈ [0, 1]^d, we say that y ≤ z if y_i ≤ z_i for all i = 1, ..., d. A sequence (x_i, i ≥ 1) is said to be uniformly distributed on [0, 1]^d if one of the following equivalent properties is fulfilled:
1. For all y = (y_1, ..., y_d) ∈ [0, 1]^d, lim_{n → +∞} (1/n) \sum_{k=1}^n 1_{x_k ∈ [0,y]} = Volume([0, y]).
2. Let D*_n(x) = sup_{y ∈ [0,1]^d} | (1/n) \sum_{k=1}^n 1_{x_k ∈ [0,y]} − Volume([0, y]) | be the discrepancy of the sequence; then lim_{n → ∞} D*_n(x) = 0.
3. For every (bounded) continuous function f on [0, 1]^d, lim_{n → +∞} (1/n) \sum_{k=1}^n f(x_k) = \int_{[0,1]^d} f(x) dx.
Low discrepancy Sequences (3)

Remark: if (U_n)_{n ≥ 1} is a sequence of independent random variables with uniform law on [0, 1], the random sequence (U_n(ω), n ≥ 1) is almost surely uniformly distributed. The discrepancy fulfills an iterated logarithm law:
lim sup_n \sqrt{2n / log(log n)} D*_n(U) = 1 a.s.

Lower bound for the discrepancy (Roth's theorem): the discrepancy of any infinite sequence satisfies
D*_n > C_d (log n)^{(d−1)/2} / n, for d ≥ 3,
for an infinite number of values of n, where C_d is a constant which depends on d only.
Low discrepancy Sequences (4)

Koksma-Hlawka inequality: let g be a function of finite variation in the sense of Hardy and Krause, and denote by V(g) its variation. Then, for N ≥ 1,
| (1/N) \sum_{k=1}^N g(x_k) − \int_{[0,1]^d} g(u) du | ≤ V(g) D*_N(x).

Finite variation in the sense of Hardy and Krause: if the function g is d times continuously differentiable, the variation V(g) is given by
V(g) = \sum_{k=1}^d \sum_{1 ≤ i_1 < ... < i_k ≤ d} \int_{ { x ∈ [0,1]^d : x_j = 1 for j ∉ {i_1, ..., i_k} } } | ∂^k g(x) / (∂x_{i_1} ... ∂x_{i_k}) | dx_{i_1} ... dx_{i_k}.
Popular Quasi Monte Carlo Sequences
1. Faure sequences
2. Halton sequences
3. Sobol sequences
4. van der Corput sequences

An upper bound: for such sequences, we obtain an upper bound on the discrepancy:
D*_n ≤ C (log n)^d / n.

Remark:
- For small d: deterministic methods.
- For moderate d: quasi Monte Carlo methods.
- For large d: Monte Carlo methods.
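Two of the sequences above are easy to sketch: the van der Corput sequence (radical inverse of the index in a given base) and the Halton sequence, which uses one van der Corput coordinate per dimension with pairwise coprime bases. The 2-d integrand xy below is an illustrative test case.

```python
def van_der_corput(k, base):
    """k-th point of the van der Corput sequence: the radical inverse of k in the base."""
    x, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, base)
        denom *= base
        x += digit / denom
    return x

def halton(n, bases=(2, 3)):
    """First n Halton points in dimension len(bases), bases pairwise coprime."""
    return [tuple(van_der_corput(k, b) for b in bases) for k in range(1, n + 1)]

pts = halton(1000)
qmc_estimate = sum(x * y for x, y in pts) / len(pts)
print(qmc_estimate)  # close to \int_{[0,1]^2} xy dx dy = 1/4
```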
Low discrepancy Sequences (5)

Figure: Halton points. Figure: (Pseudo) uniform points.