Inhomogeneous circular laws for random matrices with non-identically distributed entries

Inhomogeneous circular laws for random matrices with non-identically distributed entries Nick Cook with Walid Hachem (Telecom ParisTech), Jamal Najim (Paris-Est) and David Renfrew (SUNY Binghamton/IST Austria) UC Davis, 2017/04/19

The Circular Law Empirical spectral distribution (ESD) for M M n (C) with eigenvalues λ 1,..., λ n C: µ M = 1 n δ λi. iid matrix model: Atom variable: ξ C with E ξ = 0, E ξ 2 = 1. For n 1 let X n be n n with iid entries ξ (n) d = ξ. Theorem (Mehta 67, Girko 84, Edelman 97, Bai 97, Götze Tikhomirov 05, 07, Pan Zhou 07, Tao Vu 07, Tao Vu 08) Almost surely, µ 1 n X n converges weakly to 1 π 1 B(0,1)dxdy as n. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 2 / 24

Applications: stability for mean field models of dynamical systems Model a neural network with a nonlinear system of ODEs: ẋ i (t) = x i (t) + Y tanh(x j (t)) where x i is the membrane potential for the ith neuron and tanh(x i ) is its firing rate. Sompolinsky, Crisanti, Sommers 88: modeled the synaptic matrix Y with a random matrix. (Similar to approach by [May 72] in ecology.) More recent work [Rajan, Abbott 06], [Tao 09], [Aljadeff, Renfrew, Stern 14] has tried to incorporate other structural features of neural networks into the distribution of Y. Desirable to consider Y with a general, possibly sparse variance profile, i.e. Y n = 1 ( 1 A n X n = n σ ξ ). n j=1 N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 3 / 24

Simulated ESDs for Y n = ( 1 n σ ξ ). 0.4 0.3 0.2 0.1 0 0.1 0.2 n = 2000 ξ {± 1 2 ± i 1 2 } uniform. σ = σ( i n, j n ), with σ(x, y) = 1( x y 0.05) 0.3 0.4 0.4 0.3 0.2 0.1 0 0.1 0.2 0.3 0.4. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 4 / 24

( ) Simulated ESDs for Y n = n 1 σ ξ 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 n = 2000 ξ {± 1 2 ± i 1 2 } uniform. σ = σ( i n, j n ), with σ(x, y) = (x + y) 2 1( x y 0.1) 0.8 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 0.8. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 5 / 24

( ) Simulated ESDs for Y n = n 1 σ ξ 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 n = 2001 ξ {± 1 2 ± i 1 2 } uniform. 0 1 n/3 1 n/3 A n = (σ ) = 1 n/3 0 0. 1 n/3 0 0 0.8 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 0.8. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 6 / 24

The model and assumptions Inputs: Y n = 1 ( 1 A n X n = n n σ (n) ) ξ (n). Atom variable ξ C with E ξ = 0, E ξ 2 = 1, E ξ 4+ε <. X n = (ξ (n) ) sequence of iid matrices with atom variable ξ. Standard deviation profiles A n = (σ (n) ) with uniformly bounded entries, i.e. σ (n) [0, 1]. Additionally assume A n is robustly irreducible for all n. Assumptions allow for vanishing variances (σ (n) = 0 for (say) 99% of entries). N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 7 / 24

Results Y n = 1 n A n X n = ( 1 n σ (n) ξ (n) E ξ 4+ε <, σ (n) [0, 1], A n robustly irreducible. ), with Theorem (C., Hachem, Najim, Renfrew 16) (Abridged) In the above setup, for each n, A n determines a deterministic, compactly supported, rotationally invariant probability measure µ n over C such that µ Yn µ n in probability, i.e. f dµ Yn f dµ n 0 in probability f C b (C). C C Refer to measures µ n as deterministic equivalents for µ Yn. Recent work of Alt, Erdős and Krüger gives a local version under stronger hypotheses (ξ having bounded density and all moments, σ (n) σ min > 0). N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 8 / 24

Results Y n = 1 n A n X n = ( 1 E ξ 4+ε <, n σ (n) ξ (n) ), with σ (n) uniformly bounded, A n robustly irreducible. V n = ( 1 n σ2 ) doubly stochastic. Theorem (Circular law for doubly stochastic variance profile) With Y n as above, µ Yn 1 π 1 B(0,1)dxdy in probability. Analogue of a result of Anderson and Zeitouni 06 for Hermitian random matrices. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 9 / 24

How does A n determine µ n? Through the Master Equations For s > 0 consider the following system in unknowns q, q R n : ME(s) : q i = q i = (V T n q) i s 2 + (V n q) i (V T n q) i, (V n q) i s 2 + (V n q) i (V T n q) i, q i = Can show that if V n is irreducible, s (0, ρ(v n )), unique non-trivial solution (q(s), q(s)) R 2n 0. µ n is the radially symmetric probability measure on C with µ n (B(0, s)) = 1 1 n q(s)t V n q(s) s (0, ) where we set q(s) = q(s) = 0 for s ρ(v n ). q i. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 10 / 24

Plot of F n (s) = 1 1 n q(s)t V n q(s) (curve) vs empirical realization (+) 1 0.8 0.6 0.4 0.2 n = 2001 ξ {± 1 2 ± i 1 2 } uniform. 0 1 n/3 1 n/3 A n = (σ ) = 1 n/3 0 0. 1 n/3 0 0 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 11 / 24

Useful transforms for proving convergence of measures For sums of independent scalar r.v. s use the Fourier transform (characteristic function). For µ supported on R (such as ESDs of Hermitian matrices) we have the Stieltjes transform s µ : C + C +, s µ (w) = R dµ(x) x w, µ = lim 1 t 0 π Im s µ( + it) = lim µ Cauchy(t). t 0 For µ supported on C (such as ESDs of non-hermitian matrices) we have the log transform (or Coulomb potential) U µ (z) = log λ z dµ(λ), µ = 1 2π U µ. C N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 12 / 24

Girko s Hermitization approach log transform for probability measure µ on C: U µ (z) = log λ z µ(dλ), µ = 1 2π U µ. For the ESD of Y n = 1 n A n X n, C U µyn (z) = 1 log λ i (Y n ) z = 1 n n log det(y n z) = 1 2n log det(y n z ) = log x dµ Y z n (x) R ( ) where Yn z 0 Y n z := is a 2n 2n Hermitian matrix with Yn z 0 eigenvalues {±s i (Y n z)} n R.. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 13 / 24

Girko s Hermitization approach Y z n = 0 Y n z Y n z 0, U µyn (z) = log x dµ Y z n (x). R Two steps. For a.e. z C: 1 Find deterministic equivalents ν z n µ Y z n. 2 Prove these measures uniformly (in n) integrate log. Now we can use the Stieltjes transform sn(w) z := R dµ Y z n (x) x w = 1 2n 2 where R z n (w) := (Y z n w) 1 is the resolvent of Y z n. 1 λ i (Yn z ) w = 1 2n Tr Rz n (w) N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 14 / 24

Deterministic equivalents for resolvents R z n (w): Complex Gaussian case I 2n = Rn z (w) 1 Rn z (w) = w Yn z Y n z S w T T = ws z T + Y T S 1 = w E S ii z E T ii + (Gaussian IBP) = w E S ii z E T ii + (Resolvent derivative formula) (Cauchy Schwarz & Poincaré) = w E S ii z E T ii 1 n E Y Tji j=1 j=1 1 n σ2 E Y T ji σe 2 S ii Sjj j=1 w E S ii z E T ii 1 n (E S ii) σ(e 2 S jj ). j=1 N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 15 / 24

Deterministic equivalents for resolvents Rn z (w) = (Yn z w) 1 = ( S T) T S Similarly obtain equations for E S ii, E S ii, E T ii, E T ii, 1 i n from diagonal entries of the other three blocks. Eventually reduce to a perturbed cubic system of 2n equations: E S ii = E S ii = (Vn T s) i + w z 2 [(Vn T s) i + w][(v n s) i + w] + E n, (V n s) i + w z 2 [(Vn T s) i + w][(v n s) i + w] + E n, where s := (E S ii ) n, s := (E S ii ) n, and E n, E n are small errors. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 16 / 24

Deterministic equivalents for resolvents Rn z (w) = (Yn z w) 1 = ( S T) T S Stability analysis s, s are close to solutions p( z, w), p( z, w) of the unperturbed system (the Schwinger Dyson loop equations): (Vn T p) i + w p i = z 2 [(Vn T p) i + w][(v n p) i + w], LE( z, w) : (V n p) i + w p i = z 2 [(Vn T p) i + w][(v n p) i + w]. In particular: E sn(w) z = E 1 2n Tr Rz n (w) = E 1 2n 1 2n ( S ii + ( p i + ) S ii ) p i = 1 n p i. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 17 / 24

Deterministic equivalents for resolvents Rn z (w) = (Yn z w) 1 = ( S T) T S E s z n(w) 1 n p i. Extend to the non-gaussian case by a Lindeberg swapping argument, and remove the expectation with concentration of measure. Get s z n(w) 1 n p i a.s. in the general case (and we can quantify the error). RHS is the Stieltjes transform of a probability measure ν z n on R, so µ Y z n ν z n a.s. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 18 / 24

Integrability of log We obtained deterministic equivalents ν z n for the Hermitized ESDs µ Y z n. Remains to show for a.e. z C, µ Y z n and νn z uniformly integrate log, i.e. ε > 0 T > 0 : log x dµ Y z n (x) ε log x T with probability 1 ε, and similarly for ν z n. Singularity of log at is easy to handle. Two-step approach to singularity at 0: 1 (Wegner estimate) Use bounds on the Stieltjes transform to show µ Y z n ([ t, t]) = O(t) for t n c. 2 (Invertibility) Prove λ min (Y z n ) = s min (Y n z) n C w.h.p. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 19 / 24

Integrability of log: Wegner estimates Stieltjes transform controls the density of eigenvalues in short intervals: 1 t µ Yn z ([ t, t]) µ Yn z Cauchy(t) = Im sz n(it). From the loop equations: Im sn(it) z 1 Im p i ( z, it). n Key Proposition Let z 0, t > 0, and let p( z, it), p( z, it) be solutions to LE( z, it). If A n is robustly irreducible, then 1 n Im p i ( z, it) K for some constant K < independent of n, t. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 20 / 24

Integrability of log Invertibility of structured random matrices Want to show s min (Y n z) n C with high probability (w.h.p.). Recall s min (M) = 0 iff M is singular, and otherwise s min (M) = 1/ M 1. Say a random matrix M is well-invertible w.h.p. if { P M 1 n α} = O(n β ) for some constants α, β > 0. Question: For what choices of n n matrices A = (a ), B = (b ) is Y = 1 ( 1 ) A X + B = n a ξ + b n well-invertible w.h.p.? (For convergence of ESDs we are interested in the shifts B = zi n.) N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 21 / 24

Invertibility of 1 n A X + B = ( 1 n a ξ + b ) Condition number X X 1 for iid matrices is well-studied (von Neumann et al. 40s, Edelman 88, Sankar Spielman Teng 06, Tao Vu 05, 07, Rudelson 05, Rudelson Vershynin 07). Norm: X = O( n) w.h.p. (folklore / Bai Yin) Norm of the inverse: { P X 1 n α(β)} n β Tao Vu 05, 07 { P X 1 } n/ε ε + e cn Rudelson Vershynin 07 (ξ sub-gaussian). So X X 1 = n O(1) w.h.p. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 22 / 24

Invertibility of 1 n A X + B = ( 1 n a ξ + b ) Structured matrices: 1 n A X + B is well-invertible w.h.p. when a σ 0 > 0, B n O(1) Bordenave Chafaï 11 A is broadly connected, B = O(1) Rudelson Zeitouni 12 for ξ N R (0, 1), C. 16 general case. The above give bounds that are uniform in the shift B, i.e. { ( } 1 1 sup P n A X + B) B: B n nα = O(n β ). O(1) Can we further relax hypotheses on A if B is well-invertible (such as B = zi n for fixed z 0)? N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 23 / 24

Well-invertibility under diagonal perturbation Theorem (C. 16) Let Y = 1 n A X + Z, where 1 X = (ξ ) iid matrix with E ξ = 0, E ξ 2 = 1, E ξ 4+ε = γ < ; 2 A [0, 1] n n arbitrary; 3 Z = diag(z i ) n, z i [r, R] (0, + ) for all i [n]. There exist α, β > 0, C < depending only on ε, γ, r, R such that P ( Y 1 n α) Cn β. Note: Proof gives α = twr[o ε (1) exp((γ/r) O(1) )]. The tower exponential is due to use of Szemerédi s regularity lemma. Conjecture Above holds if Z is any matrix with s i (Z) [r, R] for all i [n]. N. Cook, W. Hachem, J. Najim, D. Renfrew (Stanford) UC Davis, 2017/04/19 24 / 24