Extrema of log-correlated random variables: Principles and Examples
Louis-Pierre Arguin, Université de Montréal & City University of New York
Introductory School, IHP Trimester. CIRM, January 5-9 2014
Acknowledgements

Thank you very much to the organizers for the invitation! Much of what I know on the topic I learned from my collaborators: Anton Bovier, Nicola Kistler, Olivier Zindy, David Belius; and my students: Samuel April, Jean-Sébastien Turcotte and Frédéric Ouimet. I am grateful for all the discussions and insights on the subject. There are many outstanding papers on the subject, and I will not be able to reference everybody on the slides. See my webpage http://www.dms.umontreal.ca/~arguinlp/recherche.html for the slides and detailed complementary references.
What is the Statistics of Extremes?

The statistics of extremes, or extreme value theory, deals in probability with questions about the maxima of a collection of random variables. Consider $N$ random variables $(X_i, i = 1, \dots, N)$ on a probability space $(\Omega, \mathcal{F}, P)$. In the limit $N \to \infty$:
- What can be said about the r.v. $\max_{i=1,\dots,N} X_i$? (Law of the maximum)
- What can be said about the joint law of the reordered collection $X_{(1)} \ge X_{(2)} \ge X_{(3)} \ge \dots$? (Order statistics or extremal process)

In this mini-course, we will mostly focus on the law of the maximum.
Statistics of Extremes

To keep in mind: $\max_{i=1,\dots,N} X_i$ is a functional of the process $(X_i, i = 1, \dots, N)$, like the sum. Our objectives are similar in spirit to the limit theorems for a sum of random variables $\sum_{i=1}^N X_i$ in the limit $N \to \infty$:
- Order of magnitude of the maximum: analogue of the Law of Large Numbers
- Fluctuations of the maximum: analogue of the Central Limit Theorem

Ultimately, we want to answer the following question.

Problem. Find $a_N$ and $b_N$ such that
$$\frac{\max_{i \le N} X_i - a_N}{b_N}$$
converges in law in the limit $N \to \infty$. Identify the limit.
Statistics of Extremes: A brief history

Earlier works on the theory of extreme values focused on the case where $(X_i, i \le N)$ are IID or weakly correlated r.v. In this case we have a complete answer to the question: there are only three possible limit laws, Fréchet, Weibull, or Gumbel.
- 1925: Tippett studies the largest values from samples of Gaussians.
- 1927: Fréchet studies distributions other than Gaussian and obtains the Fréchet limit law.
- 1928: Fisher & Tippett find two other limit laws.
- 1936: von Mises finds sufficient conditions for convergence to the 3 classes.
- 1943: Gnedenko finds necessary and sufficient conditions.
- 1958: Gumbel writes the first book, Statistics of Extremes.

[Figure: Gumbel, 1891-1966]
Statistics of Extremes: Motivation

The theory of extreme values for IID r.v. has applications in meteorology (floods, droughts, etc.). One goal of today's probability theory: find other classes of limit laws for the maximum when the r.v.'s are STRONGLY CORRELATED.

What are the motivations to look at strongly correlated r.v.?
- Finance: evidence of slowly decaying correlations for volatility.
- Physics: the behavior of systems in statistical physics is determined by the states of lowest energies, and states are often correlated through the environment. Ex: spin glasses, polymers, growth models (KPZ, random matrices).
- Mathematics: the distribution of prime numbers seems to exhibit features of strongly correlated r.v. (Lecture 3).

Of course, there are many correlation structures that can be studied. We will focus on one class: LOG-CORRELATED models.
Outline

Lecture 1
1. Warm-up: Extrema of IID r.v.
2. Log-correlated Gaussian fields (LGF): Branching Random Walk (BRW) and 2D Gaussian Free Field (2DGFF)
3. Three fundamental properties
4. First order of the maximum (~ LLN)

Lecture 2
Intermezzo: Relations to statistical physics
5. Second order of the maximum (~ refined LLN)
6. A word on convergence and order statistics

Lecture 3: Universality class of LGF
7. The maxima of characteristic polynomials of unitary matrices
8. The maxima of the Riemann zeta function
General Setup

When dealing with correlations, it is convenient to index the r.v.'s by points in a metric space, say $V_n$ with metric $d$. Choice of parametrization: $(X_n(v), v \in V_n)$ with
- $\#V_n = 2^n$
- $E[X_n(v)] = 0$ for all $v \in V_n$
- $E[X_n(v)^2] = \sigma^2 n$

For simplicity, assume that $(X_n(v), v \in V_n)$ is a Gaussian process. Technical advantages: the covariance encodes the law, and comparison arguments (Slepian's Lemma) may simplify some proofs. The principles that we will discuss hold (or are expected to hold) in general.
1. Warm-up: The maximum of IID variables

Consider $(X_i, i = 1, \dots, 2^n)$ IID Gaussians of variance $\sigma^2 n$. In this case it is easy to find $a_n$ and $b_n$ such that
$$P\left(\frac{\max_i X_i - a_n}{b_n} \le x\right)$$
converges. Note that $a_n$ and $b_n$ are defined up to constants, additive and multiplicative respectively. We obviously have
$$P\left(\frac{\max_i X_i - a_n}{b_n} \le x\right) = P(X_1 \le b_n x + a_n)^{2^n} = \left(1 - P(X_1 > b_n x + a_n)\right)^{2^n}.$$
We need to establish convergence of $2^n P(X_1 > b_n x + a_n)$. This is more refined than a large deviation estimate.
1. Warm-up: Extrema of IID variables

Proposition. Consider $(X_i, i = 1, \dots, 2^n)$ IID Gaussians of variance $\sigma^2 n$. Then, with $c = \sqrt{2 \log 2}\,\sigma$ and
$$a_n = cn - \frac{\sigma^2}{2c} \log n,$$
we have
$$P\left(\max_i X_i - a_n \le x\right) \to \exp\left(-e^{-cx/\sigma^2}\right) \quad \text{(Gumbel distribution).}$$
In other words,
$$\max_{i \le 2^n} X_i = \underbrace{cn - \frac{\sigma^2}{2c} \log n}_{\text{deterministic order}} + \underbrace{G}_{\text{fluctuation}}.$$
We refer to $cn$ as the first order of the maximum, and to $-\frac{\sigma^2}{2c} \log n$ as the second order. Our goal: establish a general method to prove similar results for log-correlated fields.
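The Proposition is easy to probe numerically. A minimal simulation sketch, assuming numpy is available; the values of sigma, n and the seed below are illustrative choices, not from the slides.

```python
import numpy as np

# Empirical check: the maximum of 2^n IID N(0, sigma^2 n) variables sits
# within O(1) of a_n = c*n - sigma^2/(2c) log n, with c = sqrt(2 log 2) sigma.
# sigma, n and the seed are illustrative choices.
rng = np.random.default_rng(0)
sigma, n = 1.0, 16                    # 2^16 = 65536 variables
c = np.sqrt(2 * np.log(2)) * sigma
a_n = c * n - sigma**2 / (2 * c) * np.log(n)

X = rng.normal(0.0, sigma * np.sqrt(n), size=2**n)  # variance sigma^2 * n
M = X.max()
print(M, a_n)    # M - a_n is an O(1), Gumbel-like fluctuation
```

Repeating the experiment over many seeds, the histogram of $M - a_n$ stabilizes to the (shifted) Gumbel law of the Proposition.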
2. Log-correlated Gaussian Fields

[Figure: sample of a 2D log-correlated Gaussian field on $V_n$, with two marked points $v$, $v'$.]
Log-correlated Gaussian fields

A Gaussian field $(X_n(v), v \in V_n)$ is log-correlated if the covariance decays slowly (logarithmically) with the distance:
$$E[X_n(v) X_n(v')] \approx -\log \frac{d(v, v')}{2^n}.$$
This is to be compared with a polynomial decay $d(v, v')^{-\alpha}$ or an exponential decay $e^{-d(v, v')}$. It implies that there is an exponential number of points whose correlation with $v$ is of the order of the variance. Precisely, for $0 < r < 1$ and a given $v \in V_n$,
$$\#\left\{v' \in V_n : E[X_n(v) X_n(v')] \ge r\, E[X_n(v)^2]\right\} \approx 2^{(1-r)n}.$$
The correlations do not have to be exactly logarithmic: approximate or asymptotic log-correlations are enough.
Example 1: Branching Random Walk

$V_n$: leaves of a binary tree of depth $n$. Let $(Y_l)$ be IID $\mathcal{N}(0, \sigma^2)$ attached to the edges, and let
$$X_n(v) = \sum_{l=1}^n Y_l(v)$$
be the sum of the edge variables on the path from the root to the leaf $v$.

Variance: $E[X_n(v)^2] = \sum_{l=1}^n E[Y_l(v)^2] = \sigma^2 n$.

Covariance: $E[X_n(v) X_n(v')] = \sum_{l=1}^{v \wedge v'} E[Y_l(v)^2] = \sigma^2 (v \wedge v')$, where $v \wedge v'$ is the depth of the common ancestor of $v$ and $v'$ (the branching scale).

For any $0 \le r \le 1$,
$$\#\left\{v' \in V_n : E[X_n(v) X_n(v')] \ge r\, E[X_n(v)^2]\right\} \approx 2^{(1-r)n}.$$

[Figure: binary tree with edge variables $Y_1(v), Y_2(v), Y_3(v)$ and two leaves $v, v'$ branching at depth $v \wedge v'$.]
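The covariance structure $E[X_n(v) X_n(v')] = \sigma^2 (v \wedge v')$ is easy to verify by simulation. A minimal sketch, assuming numpy; the depth, the leaf pair and the sample size are illustrative choices:

```python
import numpy as np

# Simulate a BRW on the binary tree of depth n and estimate the covariance of
# two leaves; it should equal sigma^2 times the number of shared edges.
rng = np.random.default_rng(1)
n, sigma, samples = 8, 1.0, 20000

def brw_leaves():
    """One BRW sample: X_n(v) for all 2^n leaves v, built level by level."""
    X = np.zeros(1)
    for _ in range(n):
        # each node splits in two; children get fresh N(0, sigma^2) edges
        X = np.repeat(X, 2) + rng.normal(0, sigma, size=2 * len(X))
    return X

def overlap(v, w):
    """v ^ w: length of the common binary prefix (depth of common ancestor)."""
    l = 0
    while l < n and (v >> (n - 1 - l)) == (w >> (n - 1 - l)):
        l += 1
    return l

v, w = 0b00010110, 0b00011001          # two leaves sharing 4 edges
prods = np.empty(samples)
for i in range(samples):
    X = brw_leaves()
    prods[i] = X[v] * X[w]
cov = prods.mean()
print(overlap(v, w), cov)              # empirical covariance ~ sigma^2 * 4
```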
Example 2: 2D Gaussian Free Field

$V_n$: square box in $\mathbb{Z}^2$ with $2^n$ points. $(X_n(v), v \in V_n)$ is the Gaussian field with
$$E[X_n(v) X_n(v')] = E_v\left[\sum_{k=0}^{\tau_{V_n}} \mathbf{1}_{\{S_k = v'\}}\right],$$
where $(S_k)_{k \ge 0}$ is SRW starting at $v$ and $\tau_{V_n}$ is its exit time of $V_n$.

Log-correlations: for $v, v' \in V_n$ far from the boundary,
$$E[X_n(v)^2] = \sigma^2 n + O(1), \qquad E[X_n(v) X_n(v')] = \frac{1}{\pi} \log \frac{2^n}{\|v - v'\|^2} + O(1),$$
where $\sigma^2 = \frac{\log 2}{\pi}$.
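The covariance above is exactly the Green function of the killed walk, $G = \sum_{k \ge 0} P^k = (I - P)^{-1}$, which gives a direct (if brute-force) way to sample the field on a small box. A sketch, assuming numpy; the box size is an illustrative choice, far too small to see the asymptotics sharply:

```python
import numpy as np

# Covariance of the 2DGFF on an m x m box = Green function of SRW killed on
# exiting the box: G = (I - P)^{-1}, where P is the transition matrix inside
# the box (steps leaving the box are killed).  Sample via a Cholesky factor.
m = 15
sites = [(i, j) for i in range(m) for j in range(m)]
idx = {s: k for k, s in enumerate(sites)}
P = np.zeros((m * m, m * m))
for (i, j), k in idx.items():
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if (i + di, j + dj) in idx:
            P[k, idx[(i + di, j + dj)]] = 0.25
G = np.linalg.inv(np.eye(m * m) - P)       # G[u, w] = E_u[# visits to w]
G = (G + G.T) / 2                          # symmetrize floating-point roundoff

rng = np.random.default_rng(2)
X = np.linalg.cholesky(G) @ rng.normal(size=m * m)   # one GFF sample

center, corner = idx[(m // 2, m // 2)], idx[(1, 1)]
print(G[center, center], G[corner, corner])  # variance is largest in the bulk
```

The diagonal of $G$ is largest in the bulk and the off-diagonal entries decay with the distance, in line with the log-correlation formula.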
3. Fundamental Properties

There are three fundamental properties of log-correlated random variables. They are well illustrated by the case of branching random walk.

1. Multiscale decomposition:
$$X_n(v) = \sum_{l=1}^n Y_l(v).$$
Define $X_k(v) = \sum_{l=1}^k Y_l(v)$, $1 \le k \le n$.

2. Self-similarity of scales: for a given $v$, $(X_n(v') - X_l(v'),\ v' \wedge v \ge l)$ is a BRW on the $2^{n-l}$ points $\{v' : v' \wedge v \ge l\}$.

3. Dichotomy of scales:
$$E[Y_l(v) Y_l(v')] = \begin{cases} \sigma^2 & \text{if } l \le v \wedge v', \\ 0 & \text{if } l > v \wedge v'. \end{cases}$$

[Figure: binary tree with edge variables $Y_1(v), Y_2(v), Y_3(v)$ and leaves $v, v'$.]
Fundamental Properties

We now verify the properties for the 2DGFF $(X_n(v), v \in V_n)$.

Reminder. It is good to see $(X_n(v), v \in V_n)$ as vectors in a Gaussian Hilbert space:
- $E[X_n(v)^2]$ is the square norm of the vector;
- $E[X_n(v) X_n(v')]$ is the inner product.

For $B \subset V_n$, the conditional expectation of $X_n(v)$ given $\{X_n(v'), v' \in B\}$,
$$E[X_n(v) \mid \{X_n(v'), v' \in B\}] = \sum_{v' \in B} a_{vv'} X_n(v'),$$
is the projection on the subspace spanned by the $X_n(v')$, $v' \in B$. In particular, it is a linear combination of the $X_n(v')$, hence also Gaussian.

Orthogonal decomposition:
$$X_n(v) = \left(X_n(v) - E[X_n(v) \mid \{X_n(v'), v' \in B\}]\right) + E[X_n(v) \mid \{X_n(v'), v' \in B\}].$$
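For a Gaussian vector, this projection is a concrete matrix computation: the coefficients are $a = \Sigma_{vB} \Sigma_{BB}^{-1}$, and the residual is orthogonal to the conditioning variables. A minimal numerical sketch, with an arbitrary (not log-correlated) covariance chosen for illustration:

```python
import numpy as np

# Conditional expectation of a Gaussian as an orthogonal projection:
# E[X_v | X_B] = Sigma_{vB} Sigma_{BB}^{-1} X_B, and the residual
# X_v - E[X_v | X_B] is uncorrelated with every X_B.
# The covariance matrix here is an arbitrary illustrative choice.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + np.eye(5)        # a generic positive-definite covariance

v, B = 0, [1, 2, 3]
a = Sigma[v, B] @ np.linalg.inv(Sigma[np.ix_(B, B)])  # projection coefficients

# Orthogonality: Cov(X_v - a.X_B, X_B) = Sigma_{vB} - a Sigma_{BB} = 0.
resid_cov = Sigma[v, B] - a @ Sigma[np.ix_(B, B)]
print(resid_cov)                   # numerically zero
```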
Fundamental Properties 1: Multiscale decomposition

$$X_n(v) = \sum_{l=1}^n Y_l(v).$$
Consider $B_l(v)$, a ball around $v$ containing $2^{n-l}$ points. Define $\mathcal{F}_l = \sigma\{X_n(v') : v' \notin B_l(v)\}$ and
$$X_l(v) = E[X_n(v) \mid \mathcal{F}_l], \quad l < n.$$
Then $(X_l(v), l \le n)$ is a martingale.

Lemma (Multiscale). The increments $Y_l(v) = X_l(v) - X_{l-1}(v)$, $l = 1, \dots, n$, are independent Gaussians.

[Figure: the box $V_n$ with the ball $B_l(v)$ of $2^{n-l}$ points around $v$.]
Fundamental Properties 2: Self-Similarity

Lemma (Self-Similarity). For a given $v$, $(X_n(v') - X_l(v'),\ v' \wedge v \ge l)$ has the original law on $\{v' : v' \wedge v \ge l\}$.

If $B \subset V_n$, write $X_B(v) = E[X_n(v) \mid \{X_n(v'), v' \notin B\}]$. Then
$$\left(X_n(v) - X_B(v),\ v \in B\right)$$
is a GFF on $B$. In our case, the sets $B$ are the neighborhoods $B_l(v)$ containing $2^{n-l}$ points:
$$E[(X_n(v) - X_l(v))^2] = \sigma^2 (n - l) + O(1).$$
The $Y_l$'s have variance $\sigma^2 (1 + o(1))$: linearity of scales!

Warning! If $v' \in B_l(v)$, it is not true that $X_l(v') = X_l(v)$ (as in BRW)... but close!

[Figure: the box $V_n$ with the ball $B_l(v)$ of $2^{n-l}$ points.]
Fundamental Properties 3: Dichotomy

$$E[Y_l(v) Y_l(v')] = \begin{cases} \sigma^2 & \text{if } l \le v \wedge v', \\ 0 & \text{if } l > v \wedge v'. \end{cases}$$
Define $v \wedge v' :=$ the greatest $l$ such that $B_l(v) \cap B_l(v') \ne \emptyset$.

Lemma (Gibbs-Markov Property). For $B \subset V_n$,
$$E[X_n(v) \mid \{X_n(v'), v' \in B^c\}] = \sum_{u \in \partial B} p_u(v)\, X_n(u).$$
This implies that $X_n(v) - X_l(v) = \sum_{k=l+1}^n Y_k(v)$ is independent of $Y_l(v')$ for all $l$ such that $v \wedge v' < l$.

[Figure: the box $V_n$ with the balls around $v$ and $v'$.]
Fundamental Properties 3: Dichotomy

$$E[Y_l(v) Y_l(v')] = \begin{cases} \sigma^2 & \text{if } l \le v \wedge v', \\ 0 & \text{if } l > v \wedge v'. \end{cases}$$

Lemma (Markov Property). For $B \subset V_n$,
$$E[X_n(v) \mid \{X_n(v'), v' \notin B\}] = \sum_{u \in \partial B} p_u(v)\, X_n(u).$$
This implies that $X_n(v) - X_l(v) = \sum_{k=l+1}^n Y_k(v)$ is independent of $Y_l(v')$ for all $l$ such that $v \wedge v' < l$. The decoupling is not exact at the branching point, but it is for larger scales soon after.

[Figure: the box $V_n$ with the balls around $v$ and $v'$.]
Fundamental Properties 3: Splitting

$$E[Y_l(v) Y_l(v')] = \begin{cases} \sigma^2 & \text{if } l \le v \wedge v', \\ 0 & \text{if } l > v \wedge v'. \end{cases}$$

Lemma. For all $l$ such that $\|v - v'\|^2 < 2^{n-l}$ (the $l$-neighborhoods touch),
$$E[(X_l(v) - X_l(v'))^2] = O(1).$$
This implies $E[X_l(v) X_l(v')] = E[X_l(v)^2] + O(1) = \sigma^2 l + O(1)$. Thus $E[Y_l(v) Y_l(v')] = \sigma^2 + o(1)$.

[Figure: the box $V_n$ with the touching balls around $v$ and $v'$.]
Lecture goals

For the remaining part of the lectures, our specific goal is to prove the deterministic orders of the maximum using the 3 properties.

Theorem.
1. First order:
$$\lim_{n \to \infty} \frac{\max_{v \in V_n} X_n(v)}{n} = \sqrt{2 \log 2}\,\sigma =: c \quad \text{in probability.}$$
2. Second order:
$$\lim_{n \to \infty} \frac{\max_{v \in V_n} X_n(v) - cn}{\log n} = -\frac{3}{2} \frac{\sigma^2}{c} \quad \text{in probability.}$$
In other words, with large probability,
$$\max_{v \in V_n} X_n(v) = cn - \frac{3}{2} \frac{\sigma^2}{c} \log n + O(\varepsilon \log n).$$

Lectures 1 and 2: 2DGFF (BRW as a guide). Lecture 3: toy model of the Riemann zeta function.
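Both orders are already visible in a single moderately large BRW sample. A simulation sketch, using the BRW of Example 1; the depth and the seed are illustrative choices, and the sample maximum fluctuates around the second-order prediction by an $O(1)$ amount:

```python
import numpy as np

# One BRW sample of depth n: compare max X_n(v) with the first order c*n
# and with the refined prediction c*n - (3/2)(sigma^2/c) log n.
# n, sigma and the seed are illustrative choices.
rng = np.random.default_rng(4)
n, sigma = 16, 1.0
c = np.sqrt(2 * np.log(2)) * sigma

X = np.zeros(1)
for _ in range(n):                     # grow the tree one level at a time
    X = np.repeat(X, 2) + rng.normal(0, sigma, size=2 * len(X))

m = X.max()
pred = c * n - 1.5 * sigma**2 / c * np.log(n)
print(m / n, c)      # first order: m/n is already close to c
print(m, pred)       # second order: m is close to pred, up to O(1)
```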
4. The first order of the maximum

$$\lim_{n \to \infty} \frac{\max_{v \in V_n} X_n(v)}{n} = \sqrt{2 \log 2}\,\sigma.$$

[Figure: two paths with increments $W_1(v), W_2(v), W_3(v)$ and $W_3(v')$, branching at scale $\frac{1}{K} n < v \wedge v' \le \frac{2}{K} n$.]
First order of the maximum

Let $(X_n(v), v \in V_n)$ be a Gaussian field with $\#V_n = 2^n$ and $E[X_n(v)^2] = \sigma^2 n$.

Theorem (First order of the maximum). If $(X_n(v), v \in V_n)$ satisfies the three properties (multiscale, self-similarity, splitting), then
$$\lim_{n \to \infty} \frac{\max_{v \in V_n} X_n(v)}{n} = \underbrace{\sqrt{2 \log 2}\,\sigma}_{= c} \quad \text{in probability.}$$
This was shown by Biggins '77 for the BRW, and by Bolthausen, Deuschel & Giacomin (2001) for the GFF. We follow here the general method of Kistler (2013).

1. Upper bound: $P\left(\max_{v \in V_n} X_n(v) > (c + \delta)n\right) \to 0$.
2. Lower bound: $P\left(\max_{v \in V_n} X_n(v) > (c - \delta)n\right) \to 1$.

[Figure: density of the maximum with the thresholds $(c - \delta)n$ and $(c + \delta)n$.]
Upper bound: Plain Markov

This is the easy part. Consider the number of exceedances of a level $a$:
$$N_n(a) = \#\{v \in V_n : X_n(v) > a\}.$$
Clearly, by Markov's inequality (or the union bound),
$$P\left(\max_{v \in V_n} X_n(v) > a\right) = P(N_n(a) \ge 1) \le E[N_n(a)].$$
Note that correlations play no role here! By a Gaussian estimate, with $a = (c + \delta)n$,
$$E[N_n(a)] = 2^n P(X_n(v) > a) \le 2^n e^{-(c + \delta)^2 n / (2\sigma^2)} \le e^{-\sqrt{2 \log 2}\,\delta n / \sigma},$$
which goes to zero exponentially fast as $n \to \infty$. The constant $c = \sqrt{2 \log 2}\,\sigma$ is designed exactly to counterbalance the entropy: $2^n e^{-c^2 n / (2\sigma^2)} = 1$.
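The counterbalancing of entropy by the Gaussian tail can be checked numerically with the standard library alone. A sketch; the value of delta is an illustrative choice:

```python
import math

# First-moment bound: E[N_n(a)] = 2^n P(X_n(v) > (c + delta) n), where
# X_n(v) ~ N(0, sigma^2 n), so P(X > a) = erfc(a / (sigma sqrt(2n))) / 2.
# It decays at least like exp(-sqrt(2 log 2) delta n / sigma).
sigma, delta = 1.0, 0.1
c = math.sqrt(2 * math.log(2)) * sigma

def first_moment(n):
    a = (c + delta) * n
    return 2**n * 0.5 * math.erfc(a / (sigma * math.sqrt(2 * n)))

for n in (20, 40, 80):
    print(n, first_moment(n), math.exp(-c * delta * n / sigma**2))
```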
Lower bound: Multiscale second moment

The only tool at our disposal to get a lower bound on the right tail of a nonnegative random variable is the Paley-Zygmund inequality:
$$P(N \ge 1) \ge \frac{E[N]^2}{E[N^2]}.$$
We would like to show that for $a = (c - \delta)n$,
$$P(N_n(a) \ge 1) \ge \frac{E[N_n(a)]^2}{E[N_n(a)^2]} \to 1.$$
The correlations play a role in the denominator. Good news: we need to find an upper bound on
$$E[N_n(a)^2] = \sum_{v, v' \in V_n} P\left(X_n(v) > a,\ X_n(v') > a\right).$$
If the r.v. were independent,
$$E[N_n(a)^2] = \sum_{v \ne v'} P(X_n(v) > a)^2 + \sum_{v \in V_n} P(X_n(v) > a) = E[N_n(a)]^2 + \sum_{v \in V_n} P(X_n(v) > a)\left(1 - P(X_n(v) > a)\right) \le \underbrace{E[N_n(a)]^2}_{\text{dominant for } a \text{ small!}} + E[N_n(a)].$$
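For independent exceedances, $N$ is Binomial and the Paley-Zygmund bound can be checked in closed form. A minimal sketch; M and p are illustrative choices:

```python
# Paley-Zygmund: P(N >= 1) >= E[N]^2 / E[N^2], checked exactly for
# N ~ Binomial(M, p), the number of exceedances among M independent
# variables each crossing the level with probability p.
M, p = 10**4, 3e-4
EN = M * p                               # E[N]
EN2 = M * p * (1 - p) + (M * p)**2       # E[N^2] = Var(N) + E[N]^2
pz_bound = EN**2 / EN2
p_at_least_one = 1 - (1 - p)**M          # P(N >= 1), exactly
print(pz_bound, p_at_least_one)
```

As $E[N] \to \infty$ with the summands staying weakly correlated, the ratio tends to 1, which is what the multiscale argument on the following slides engineers for the modified count.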
Lower bound: Multiscale second moment

Use the multiscale decomposition (Prop. 1); $K$ scales (large but fixed) suffice:
$$X_n(v) = \sum_{k=1}^K \underbrace{\sum_{\frac{k-1}{K} n < l \le \frac{k}{K} n} Y_l(v)}_{:= W_k(v)}.$$
By Props. 1 and 2, $(W_k(v), k = 1, \dots, K)$ are IID $\mathcal{N}(0, \sigma^2 n / K)$.

Define a modified number of exceedances:
$$\tilde{N}_n(a) = \#\left\{v \in V_n : W_k(v) > \frac{a}{K},\ k = 1, \dots, K\right\}.$$
Since the first order is linear in the scales, this is a good choice. Note that
$$P(N_n(a) \ge 1) \ge P(\tilde{N}_n(a) \ge 1).$$

[Figure: a path split into blocks $W_1(v), W_2(v)$ at scales $k = 1, 2$.]
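Along a fixed path, the coarse increments are just block sums of the fine scales, so Props. 1 and 2 are easy to see numerically. A sketch; n, K and the sample size are illustrative choices:

```python
import numpy as np

# Coarse increments W_k(v) along one fixed BRW path v: the n edge weights
# are IID N(0, sigma^2), and grouping them into K blocks of n/K scales
# gives IID W_k ~ N(0, sigma^2 n / K).
rng = np.random.default_rng(5)
n, K, sigma, samples = 12, 4, 1.0, 20000

Y = rng.normal(0, sigma, size=(samples, n))     # fine scales Y_1, ..., Y_n
W = Y.reshape(samples, K, n // K).sum(axis=2)   # W_k, k = 1, ..., K

print(W.var(axis=0))                            # each ~ sigma^2 n/K = 3
print(np.corrcoef(W.T)[0, 1])                   # blocks are uncorrelated
```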
Lower bound: Multiscale second moment

Not much is lost in dropping $W_1$:
$$X_n(v) = \underbrace{W_1(v)}_{> -\delta n} + \underbrace{\sum_{k=2}^K W_k(v)}_{> a \frac{K-1}{K}} > (c - \delta)n,$$
and $P(W_1(v) > -\delta n) \to 1$ since $\mathrm{Var}(W_1) = \sigma^2 n / K$. This step is crucial and not only technical. Accordingly, redefine
$$\tilde{N}_n(a) = \#\left\{v \in V_n : W_k(v) > \frac{a}{K},\ k = 2, \dots, K\right\}.$$
It remains to show that for $a = (c - \delta)n$,
$$P(\tilde{N}_n(a) \ge 1) \ge \frac{E[\tilde{N}_n(a)]^2}{E[\tilde{N}_n(a)^2]} \to 1.$$

[Figure: a path split into blocks $W_1(v), W_2(v)$ at scales $k = 1, 2$.]
Lower bound: Multiscale second moment

The second moment for these exceedances is
$$E[\tilde{N}_n(a)^2] = \sum_{k=1}^K \sum_{\substack{v, v':\ \frac{k-1}{K} n < v \wedge v' \le \frac{k}{K} n}} \prod_{j \ge 2} P\left(W_j(v) > \tfrac{a}{K},\ W_j(v') > \tfrac{a}{K}\right).$$
We expect the dominant term to be $k = 1$ (the most independence). For $v, v'$ with $v \wedge v' \le n/K$, Prop. 3 (splitting) gives
$$\prod_{j \ge 2} P\left(W_j(v) > \tfrac{a}{K},\ W_j(v') > \tfrac{a}{K}\right) = \prod_{j \ge 2} P\left(W_j(v) > \tfrac{a}{K}\right)^2,$$
and
$$\#\{v, v' : v \wedge v' \le n/K\} = 2^{2n} - 2^n \cdot 2^{n - n/K} = 2^{2n}(1 + o(1)).$$
But $E[\tilde{N}_n(a)]^2 = 2^{2n} \prod_{j \ge 2} P\left(W_j(v) > \tfrac{a}{K}\right)^2$, so
$$E[\tilde{N}_n(a)^2] = (1 + o(1)) E[\tilde{N}_n(a)]^2 + \underbrace{\dots}_{k > 1:\ \text{dominant?}}$$
Lower bound: Multiscale second moment

$$\sum_{k > 1} \sum_{\substack{v, v':\ \frac{k-1}{K} n < v \wedge v' \le \frac{k}{K} n}} \prod_{j \ge 2} P\left(W_j(v) > \tfrac{a}{K},\ W_j(v') > \tfrac{a}{K}\right).$$
Since we need an upper bound, we can drop conditions in the probability. Take $v \wedge v' = l$ for $\frac{k-1}{K} n < l \le \frac{k}{K} n$:
$$\prod_{j \ge 2} P\left(W_j(v) > \tfrac{a}{K},\ W_j(v') > \tfrac{a}{K}\right) \le \prod_{2 \le j \le k} P\left(W_j(v) > \tfrac{a}{K}\right) \prod_{j \ge k+1} P\left(W_j(v) > \tfrac{a}{K}\right)^2,$$
using Prop. 3 (splitting): for the scales above the branching point ($j \ge k+1$), $Y_j(v)$ is independent of $Y_j(v')$.

[Figure: paths of $v$ and $v'$ with $\frac{1}{K} n < v \wedge v' \le \frac{2}{K} n$ and blocks $W_1(v), W_2(v), W_3(v), W_3(v')$.]
Lower bound: Multiscale second moment

$$\sum_{k > 1} \sum_{\substack{v, v':\ \frac{k-1}{K} n < v \wedge v' \le \frac{k}{K} n}} \prod_{j \ge 2} P\left(W_j(v) > \tfrac{a}{K},\ W_j(v') > \tfrac{a}{K}\right).$$
Since we need an upper bound, we can drop conditions in the probability. Take $v \wedge v' = l$ for $\frac{k-1}{K} n < l \le \frac{k}{K} n$; there are at most $2^n \cdot 2^{n - \frac{k-1}{K} n}$ such pairs, each contributing at most
$$\prod_{2 \le j \le k} P\left(W_j(v) > \tfrac{a}{K}\right) \prod_{j \ge k+1} P\left(W_j(v) > \tfrac{a}{K}\right)^2.$$
The $k$-th term of the sum is therefore at most $E[\tilde{N}_n(a)]^2$ times
$$2^{-\frac{k-1}{K} n} \prod_{j=2}^{k} P\left(W_j(v) > \tfrac{a}{K}\right)^{-1} \le 2^{-\frac{k-1}{K} n} \cdot 2^{\frac{k-1}{K} n (1 - \delta)^2}.$$
This goes to 0 exponentially fast!

[Figure: paths of $v$ and $v'$ with $\frac{1}{K} n < v \wedge v' \le \frac{2}{K} n$ and blocks $W_1(v), W_2(v), W_3(v), W_3(v')$.]
Outline

Lecture 1
1. Warm-up: Extrema of IID r.v.
2. Log-correlated Gaussian fields (LGF): Branching Random Walk (BRW) and 2D Gaussian Free Field (2DGFF)
3. Three fundamental properties
4. First order of the maximum (~ LLN)

Lecture 2
Intermezzo: Relations to statistical physics
5. Second order of the maximum (~ refined LLN)
6. A word on convergence and order statistics

Lecture 3: Universality class of LGF
7. The maxima of characteristic polynomials of unitary matrices
8. The maxima of the Riemann zeta function