SPECTRAL MEASURE OF LARGE RANDOM HANKEL, MARKOV AND TOEPLITZ MATRICES


WŁODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG

Abstract. We study the limiting spectral measure of large symmetric random matrices of linear algebraic structure. For Hankel and Toeplitz matrices generated by i.i.d. random variables {X_k} of unit variance, and for symmetric Markov matrices generated by i.i.d. random variables {X_ij}_{j>i} of zero mean and unit variance, scaling the eigenvalues by √n we prove the almost sure, weak convergence of the spectral measures to universal, non-random, symmetric distributions γ_H, γ_M, and γ_T of unbounded support. The moments of γ_H and γ_T are the sums of volumes of solids related to Eulerian numbers, whereas γ_M has a bounded smooth density given by the free convolution of the semi-circle and normal densities. For symmetric Markov matrices generated by i.i.d. random variables {X_ij}_{j>i} of mean m and finite variance, scaling the eigenvalues by n we prove the almost sure, weak convergence of the spectral measures to the atomic measure at −m. If m = 0 and the fourth moment is finite, we prove that the spectral norm of M_n scaled by √(2n log n) converges almost surely to 1.

1. Introduction and main results

For a symmetric n × n matrix A, let λ_j(A), 1 ≤ j ≤ n, denote the eigenvalues of the matrix A, written in non-increasing order. The spectral measure of A, denoted ˆµ(A), is the empirical distribution of its eigenvalues, namely

ˆµ(A) = (1/n) Σ_{j=1}^n δ_{λ_j(A)}

(so when A is a random matrix, ˆµ(A) is a random measure on (R, B)).

Large dimensional random matrices are of much interest in statistics, where they play a pivotal role in multivariate analysis. In his seminal paper, Wigner (Wigner 1958) proved that the spectral measure of a wide class of symmetric random matrices of dimension n converges, as n → ∞, to the semi-circle law (also called the Sato-Tate measure, see (Serre 1997) and the references therein). Much work has since been done on related random matrix ensembles, either composed of (nearly) independent entries, or drawn according to weighted Haar measures on classical (e.g. orthogonal, unitary, symplectic) groups. The limiting behavior of the spectrum of such matrices and their compositions is of considerable interest for mathematical physics (see (Pastur & Vasilchuk 2000) and the references therein). In addition, such random matrices play an important role in the studies of operator algebras initiated by Voiculescu, known now as free (non-commutative) probability theory (see (Hiai & Petz 2000) and the many references therein).

Date: July 25, 2003; Revised: December 22, 2004. Research partially supported by NSF grants #INT-0332062, #DMS-007233, #DMS-03085.
AMS (2000) Subject Classification: Primary 15A52; Secondary 60F99, 62H10, 60F10.
Keywords: random matrix theory, spectral measure, free convolution, Eulerian numbers.
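The limit laws summarized in the abstract are easy to probe numerically. The following is a minimal simulation sketch (assuming Python with numpy; all function names are ours, not the paper's) that draws one sample of each ensemble, defined in (1.1)–(1.3) below, from standard normal entries, forms the scaled spectral measure ˆµ(A/√n), and prints its empirical second and fourth moments. For large n these should be close to m_2 = 1, m_4 = 8/3 for the Toeplitz case, and to what the moment formulas of Section 3 give for k = 1, 2 in the other two cases (roughly m_2 = 1, m_4 = 2 for Hankel and m_2 = 2, m_4 = 9 for Markov).

```python
import numpy as np

def spectral_measure(A):
    # eigenvalues of a symmetric matrix A; their empirical distribution
    # is the spectral measure mu-hat(A) defined above
    return np.linalg.eigvalsh(A)

def toeplitz_T(x, n):
    # T_n = [X_{|i-j|}], cf. (1.2)
    i = np.arange(n)
    return x[np.abs(i[:, None] - i[None, :])]

def hankel_H(x, n):
    # H_n = [X_{i+j-1}] with 1-based i, j, cf. (1.1)
    i = np.arange(n)
    return x[i[:, None] + i[None, :] + 1]

def markov_M(n, rng):
    # M_n = X_n - D_n, cf. (1.3): symmetric X_n with zero diagonal,
    # D_n = diag(row sums of X_n), so every row of M_n sums to zero
    X = np.triu(rng.standard_normal((n, n)), 1)
    X = X + X.T
    return X - np.diag(X.sum(axis=1))

rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(2 * n)      # i.i.d. entries, mean 0, variance 1
samples = {"Toeplitz": toeplitz_T(x, n),
           "Hankel": hankel_H(x, n),
           "Markov": markov_M(n, rng)}
for name, A in samples.items():
    ev = spectral_measure(A / np.sqrt(n))
    print(name, "m2 ~", round(np.mean(ev ** 2), 3),
                "m4 ~", round(np.mean(ev ** 4), 3))
```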

The study of large random matrices is also related to interesting questions of combinatorics, geometry and algebra (see the review (Fulton 2000), or for example (Speicher 1997)). In his recent review paper (Bai 1999), Bai proposes the study of large random matrix ensembles with certain additional linear structure. In particular, the properties of the spectral measures of random Hankel, Markov and Toeplitz matrices with independent entries are listed among the unsolved random matrix problems posed in (Bai 1999, Section 6). We shall provide here the solution of these three problems.

We note in passing that Hankel matrices arise for example in polynomial regression, as the covariance for the least squares parameter estimation of the model Σ_{i=0}^p b_i x^i, observed at x = x_1, ..., x_n in the presence of additive noise (see (Sen & Srivastava 1990, page 36)). Toeplitz matrices appear as the covariance of stationary processes, in shift-invariant linear filtering, and in many aspects of combinatorics, time series and harmonic analysis. See (Grenander & Szegő 1984) for classical results on deterministic Toeplitz matrices, or (Diaconis 2003) and the references therein for their applications to certain random matrices. The infinitesimal generators of continuous time Markov processes on finite state spaces are given by matrices with row-sums zero (which we call Markov matrices). Such matrices also play an important role in graph theory, as the Laplacian matrix of each graph is of this form, with its eigenvalues related to numerous graph invariants, see (Mohar 1991).

We next specify the corresponding ensembles of random matrices studied here. Let {X_k : k = 0, 1, 2, ...} be a sequence of i.i.d. real-valued random variables. For n ∈ N, define the random n × n Hankel matrix H_n = [X_{i+j−1}]_{1≤i,j≤n},

(1.1) \[ H_n = \begin{pmatrix} X_1 & X_2 & \cdots & X_{n-1} & X_n \\ X_2 & X_3 & \cdots & X_n & X_{n+1} \\ X_3 & X_4 & \cdots & X_{n+1} & X_{n+2} \\ \vdots & & & & \vdots \\ X_{n-1} & X_n & \cdots & X_{2n-3} & X_{2n-2} \\ X_n & X_{n+1} & \cdots & X_{2n-2} & X_{2n-1} \end{pmatrix} \]

and the random n × n Toeplitz matrix T_n = [X_{|i−j|}]_{1≤i,j≤n},

(1.2) \[ T_n = \begin{pmatrix} X_0 & X_1 & X_2 & \cdots & X_{n-2} & X_{n-1} \\ X_1 & X_0 & X_1 & \cdots & & X_{n-2} \\ X_2 & X_1 & X_0 & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & & X_2 \\ X_{n-2} & & & & X_0 & X_1 \\ X_{n-1} & X_{n-2} & \cdots & X_2 & X_1 & X_0 \end{pmatrix} \]

The limiting spectral distribution for the Toeplitz matrix T_n is as follows.

Theorem 1.1. Let {X_k : k = 0, 1, 2, ...} be a sequence of i.i.d. real-valued random variables with Var(X_1) = 1. Then with probability one, ˆµ(T_n/√n) converges weakly as n → ∞ to a non-random symmetric probability measure γ_T which does not depend on the distribution of X_1, and has unbounded support.

The spectrum of non-random Toeplitz matrices, the rows of which are typically absolutely summable, is well approximated by its counterpart for circulant matrices (cf. (Grenander & Szegő 1984, page 84)). In contrast, note that the limiting distribution γ_T is not normal, as the calculation shows that its fourth moment is

m_4 = 8/3. This differs from the analogous results for random circulant matrices, see (Bose & Mitra 2002), a fact that has also been independently noticed in the references (Bose, Chatterjee & Gangopadhyay 2003) and (Hammond & Miller 2003).

Our next result gives the limiting spectral distribution for the Hankel matrix H_n.

Theorem 1.2. Let {X_k : k = 0, 1, 2, ...} be a sequence of i.i.d. real-valued random variables with Var(X_1) = 1. Then with probability one, ˆµ(H_n/√n) converges weakly as n → ∞ to a non-random symmetric probability measure γ_H which does not depend on the distribution of X_1, has unbounded support, and is not unimodal. (Recall that a symmetric distribution ν is said to be unimodal if the function x ↦ ν((−∞, x]) is convex for x < 0.)

Remark 1.1. Theorems 1.1 and 1.2 fall short of establishing that the limiting distributions have smooth densities and that the density of γ_H is bimodal. Simulations suggest that these properties are likely to be true, see Figure 1.

Remark 1.2. Consider the empirical distribution of singular values of the non-symmetric random n × n Toeplitz matrix R_n = [X_{i−j}]_{1≤i,j≤n}. It follows from Theorem 1.2 that as n → ∞, with probability one ˆµ((R_n R_n^T)^{1/2}/√n) → ν weakly, where ν([0, x]) = γ_H([−x, x]), x > 0. Indeed, let J_n = [1_{i+j=n+1}]_{1≤i,j≤n}, noting that J_n R_n^T is the Hankel matrix H_n for {X_{k−n} : k = 0, 1, ...}, to which Theorem 1.2 applies. Since J_n² = I_n, and both J_n and J_n R_n^T are symmetric, we have R_n R_n^T = (R_n J_n)(J_n R_n^T) = H_n². Thus the singular values of the matrix R_n are the absolute values of the (real) eigenvalues of the symmetric Hankel matrix H_n.

We now turn to the Markov matrices M_n. Let {X_ij : j > i ≥ 1} be an infinite upper triangular array of i.i.d. random variables and define X_ji = X_ij for j > i. Let M_n be the random n × n symmetric matrix given by

(1.3)  M_n = X_n − D_n,

where X_n = [X_ij]_{1≤i,j≤n} and D_n = diag(Σ_{j=1}^n X_ij)_{1≤i≤n} is a diagonal matrix, so each of the rows of M_n has a zero sum (note that the values of X_ii are irrelevant for M_n), that is

\[ M_n = \begin{pmatrix} -\sum_{j=2}^{n} X_{1j} & X_{12} & X_{13} & \cdots & X_{1n} \\ X_{21} & -\sum_{j\neq 2} X_{2j} & X_{23} & \cdots & X_{2n} \\ \vdots & & \ddots & & \vdots \\ X_{k1} & X_{k2} & \cdots & -\sum_{j\neq k} X_{kj} & \cdots \; X_{kn} \\ \vdots & & & & \vdots \\ X_{n1} & X_{n2} & \cdots & & -\sum_{j\neq n} X_{nj} \end{pmatrix} \]

Wigner's classical result says that ˆµ(X_n/√n) converges weakly as n → ∞ to the (standard) semi-circle law with the density √(4 − x²)/(2π) on (−2, 2). For normal X_n and a normal i.i.d. diagonal D_n independent of X_n, the weak limit of ˆµ((X_n − D_n)/√n) is the free convolution of the semi-circle and standard normal measures, see (Pastur & Vasilchuk 2000) and the references therein (see also (Biane 1997) for the definition and properties of the free convolution). This predicted result holds also for the Markov matrix M_n, but the problem is non-trivial because D_n strongly depends on X_n.

Theorem 1.3. Let {X_ij : j > i ≥ 1} be a collection of i.i.d. random variables with IE X_12 = 0 and Var(X_12) = 1. With probability one, ˆµ(M_n/√n) converges

weakly as n → ∞ to the free convolution γ_M of the semi-circle and standard normal measures. This measure γ_M is a non-random symmetric probability measure with a smooth bounded density, does not depend on the distribution of X_12, and has unbounded support.

If the mean of X_ij is not zero, the following result is relevant.

Theorem 1.4. Let {X_ij : i, j ∈ N, j > i ≥ 1} be a collection of i.i.d. random variables with IE X_12 = m and IE X_12² < ∞. Then ˆµ(M_n/n) converges weakly to δ_{−m} as n → ∞.

Turning to the asymptotics of the spectral norm ||M_n|| := max{λ_1(M_n), −λ_n(M_n)} of the symmetric matrix M_n, that is, the largest absolute value of its eigenvalues, we have:

Theorem 1.5. Let {X_ij : i, j ∈ N, j > i ≥ 1} be a collection of i.i.d. random variables with IE X_12 = 0, Var(X_12) = 1, and IE X_12⁴ < ∞. Then

lim_{n→∞} ||M_n|| / √(2n log n) = 1  a.s.

If the mean of X_ij is not zero, the following result is relevant.

Corollary 1.6. Suppose IE X_12 = m and IE X_12⁴ < ∞. Then

lim_{n→∞} ||M_n|| / n = |m|  a.s.

Theorem 1.5 reveals a scaling in n that differs from that of the spectral norm of Wigner's ensemble, where under the same conditions, almost surely,

(1.4)  lim_{n→∞} ||X_n|| / √n = 2

(cf. (Bai 1999, Theorem 2.2)). As shown in Section 2 en route to proving Theorems 1.4, 1.5 and Corollary 1.6, this is due to the domination of the diagonal terms of M_n in determining its spectral norm.

Remark 1.3. The asymptotics of the spectral norms of the random Toeplitz matrices T_n and Hankel matrices H_n are not addressed in this work.

Theorems 1.4, 1.5 and Corollary 1.6 are proved in Section 2. The proofs of Theorems 1.1 and 1.2, which are similar to each other, ultimately rely on the method of moments and the well known relation ∫ x^k ˆµ(A)(dx) = n^{−1} tr(A^k) for an n × n symmetric matrix A. We begin in Section 3 by introducing the combinatorial structures which describe the moments of the limiting distributions. (Proofs of the properties of the limiting distributions are postponed to the Appendix.) Then in Section 4.1 we use truncation arguments to reduce the theorems to the case when the expected values of the moments of the spectral measures are finite. In Section 4.2 we show that under suitable integrability assumptions the expected values of the moments of the spectral measures converge to the corresponding expressions from Section 3 as the size of the matrix n → ∞. Representing the moments as traces, we use independence of the entries and combinatorial arguments to discard the irrelevant terms in the expansions (4.7) and (4.12). In Section 4.4 we show that

HANKEL, MARKOV, TOEPLITZ MATRICES 5 the moments of the spectral measures are concentrated around their means, which allows us to conclude the proofs in Section 45 The proof of Theorem 3 follows a similar plan, with truncation argument in Section 4, followed by combinatorial analysis of expansion (47) for the traces and concentration of moments in Section 46 2 Proofs of Theorems 4, 5 and Corollary 6 We need the following result, which follows by Chebyshev s inequality from Sakhanenko (Sakhanenko 985, Section 6, Theorem 5), or (Sakhanenko 99, Section 5, Corollary 5) Lemma 2 (Sakhanenko) Let {ξ i ; i =, 2, } be a sequence of independent random variables with mean zero and IEξ i 2 = σi 2 If IE ξ i p < for some p > 2, then there exists a constant C > 0 and {η i, i =, 2, }, a sequence of independent normally distributed random variables with η i N(0, σi 2 ) such that IP( max S C k T k > x) k n + x p n IE ξ i p for any n and x > 0, where S k = k ξ i and T k = k η i Proof of Theorem 5 Hereafter let b(n) = 2n log n denote the normalization function for Theorem 5 It follows from (3) that M n D n X n So, by (4) and the definition of D n, it suffices to show that as n, (2) (22) W n := n max b(n) { n X ij } as We first show the upper bound, that is, lim sup W n as n Note that {X ij ; j } is a sequence of iid random variables for each i By Lemma 2 and the condition that IE X 2 4 <, for each i, there exists a sequence of independent standard normals {Y ij ; j } such that (23) ( n max IP n max k= k (X ij Y ij ) ) > x Cn x 4 for all x > 0 and n, where C is a constant which does not depend on n and x (note that two sequences {Y ij ; j } for different values of i are not independent of each other) We claim that (24) as n First, U n := n max b(n) { n (X ij Y ij ) } 0 as 2 max m+ U k k=2 m b(2 m ) 2 max m+ 2 max m+ k { (X ij Y ij ) } k=

6 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG By (23), for any ε > 0, ( 2 m+ ) ( IP max U k ε 2 m+ 2 m+ k IP max k=2 m k= ) (X ij Y ij ) εb(2 m ) C ε m 2 for some constant C ε depending only on ε Since ε > 0 is arbitrary, by the Borel- Cantelli lemma, max 2m+ k=2 m U k 0 as as m, which implies (24) Let V n = n max b(n) n Y ij By the definitions in (2) and (24), we have that W n U n + V n, so by (24) we get (22) as soon as we show that lim sup n V n To this end, fix δ > 0 and α > /δ Then, (25) IP ( (m+) α max n=m α ) V n + δ (m + ) α IP ( (m+) α max n= n ) Y j ( + δ)b(m α ) ( (m+) α ) 2(m + ) α IP Y j ( + δ)b(m α ), where Levy s inequality is used in the second step Since Y ij s are independent standard normals, ξ := (m + ) α/2 (m+) α Y j is a standard normal random variable Thus, by the well known normal tail estimate (26) we see that x 2 2π + x 2 e x /2 IP(ξ > x) 2 2π x e x /2 for x > 0, ( ) IP ξ ( + δ)(m + ) α/2 b(m α ) α(+δ) Ĉδm for some constant Ĉδ > 0 Consequently, for some C δ > 0 and all m, by (25), ( (m+) α ) IP V n + δ C δm αδ max n=m α With αδ >, we have by the Borel-Cantelli lemma that, { (m+) α } lim sup max V m n=m α n + δ as It follows that lim sup n V n + δ as and taking δ 0 we obtain (22) We next prove that (27) lim inf n W n as To this end, fixing /3 > ε > δ > 0, let n ε := [n ε ] + Then, (28) W n n ε max b(n) n X ij b(n) n ε max n X ij b(n) j=n ε+ n ε max nε X ij =: V n, V n,2

HANKEL, MARKOV, TOEPLITZ MATRICES 7 By (22), lim sup n W nε as Thus, with b(n ε )/b(n) 0 as n, we have that (29) b(n ε ) V n,2 = W nε b(n) 0 as Since {X ij ; i n ε, n ε < j n} are iid for any n, it follows that ( n n ε nε (20) IP(V n, 3δ) = IP X j ( 3δ)b(n)) With b(n) n, by Lemma 2 there exists a sequence of independent standard normals {Y j } such that for some C = C(δ) < and all n (2) ( n n ε IP n n ε X j ) Y j δb(n) Cn Further, by the left inequality of (26) we have that for all n sufficiently large, IP ( n n ε Y j ( 2δ)b(n) ) IP( Y ( δ) 2 log n) 2n ( δ) Combining this bound with (2) and (20) we get that for all n large enough IP(V n, 3δ) ( 2n ( δ) + Cn ) n ε ( n ( δ) ) n ε e nε δ Recall that ε > δ, implying that n IP(V n, 3δ) < By the Borel-Cantelli lemma, lim inf n, 3δ as n This together with (28) and (29) implies that almost surely lim inf n W n 3δ, and the lower bound (27) follows by taking δ 0 Proof of Corollary 6 Let M n denote the Markov matrix obtained when X ij = X ij IEX ij replace X ij in (3) Obviously, (22) M n = M n + Y n, where Y n = [Y ij ] is the n n matrix with Y ij = m nm i=j Clearly, λ (Y n ) = 0, λ 2 (Y n ) = = λ n (Y n ) = nm, so Y n = n m By (22) and Theorem 5, we have that M n n Y n n M n 0 n as n This implies that M n /n m as In the context of this paper, the next lemma is very handy for truncation purposes Lemma 22 Let {X ij : j > i } be an infinite triangular array of iid random variables with IEX 2 = 0 and Var(X 2 ) = σ 2 Let X ji = X ij for i < j and set X ii = 0 for all i Then, n n n 2 ( X ij ) 2 σ 2 as as n

8 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG Proof Define (23) Then n 2 U n := n j<k n n n ( X ij ) 2 = n 2 n X ij X ik n Xij 2 + 2 n 2 U n By the strong Law of Large Numbers, the first term on the right hand side converges almost surely to σ 2, so it suffices to show that (24) U n n 2 0 as To this end, denote by F k the σ-algebra generated by the random variables {X ij, i, j k} Noting that U n+ U n = n n X (n+)j X (n+)k + X ij X i(n+), j<k n it is easy to verify that {U n : n } is a martingale for the filtration {F n : n } Further, the n 2 (n )/2 terms in the sum (23) are uncorrelated Indeed, if i i and j < k, j < k then IE(X ij X ik X i j X i k ) = 0 as at least one of the four variables in this product must be independent of the others Thus, IE(Un) 2 σ 4 n 2 (n )/2 for any n 2, and by Doob s sub-martingale inequality IP( max U i m 4 ε) IE(U m 2 ) 2 i m 2 m 8 ε 2 σ4 m 2 ε 2 It follows by the Borel-Cantelli Lemma, that almost surely Z m := m 4 max U i 0, i m 2 as m Since n 2 U n (m/(m )) 4 Z m whenever (m ) 2 n m 2, m 2, we thus get (24) Let d BL denote the bounded Lipschitz metric (25) d BL (µ, ν) = sup{ fdµ fdν : f + f L }, where f = sup b f(x), f L = sup x y f(x) f(y) / x y It is well known, see(dudley 2002, Section 3), that d BL is a metric for the weak convergence of measures For the spectral measures of n n symmetric real matrices A, B we have d BL (ˆµ(A), ˆµ(B)) sup{ n f(λ j (A)) f(λ j (B)) : f L } n n n λ j (A) λ j (B) By Lidskii s theorem (Lidskiĭ 950), see also (Bai 999, Lemma 23), n λ j (A) λ j (B) 2 tr((b A) 2 ),

so

(2.16)  d²_BL(ˆµ(A), ˆµ(B)) ≤ n^{−1} tr((B − A)²).

Proof of Theorem 1.4. We use the notation from the proof of Corollary 1.6 and write σ² = Var(X_12). By (2.12) and (2.16), the bounded Lipschitz metric (2.15) satisfies

(2.17)  d_BL(ˆµ(M_n/n), ˆµ(Y_n/n)) ≤ (n^{−3} tr(M̄_n²))^{1/2}.

Note that {X̄_ij ; i ≠ j} are i.i.d. random variables with mean zero and finite variance. By the classical strong Law of Large Numbers and Lemma 2.2,

(2.18)  n^{−2} tr(M̄_n²) = (2/n²) Σ_{i<j≤n} X̄_ij² + n^{−2} Σ_{i=1}^n (Σ_{j=1}^n X̄_ij)² → 2σ²  a.s.

as n → ∞. Recall that all but one of the eigenvalues of Y_n are −nm, hence ˆµ(Y_n/n) converges weakly to δ_{−m}. Combining this with (2.17) and (2.18), we have that almost surely, ˆµ(M_n/n) converges weakly to δ_{−m}.

3. The limiting distributions γ_H, γ_M, and γ_T

3.1. Moments. For a probability measure γ on (R, B), denote its moments by m_k(γ) = ∫ x^k γ(dx). The probability measures γ_H, γ_M, and γ_T will be determined from their moments. It turns out that the odd moments are zero, and the even moments are sums of numbers labeled by the pair partitions of {1, ..., 2k}. It is convenient to index the pair partitions by partition words w; these are words of length |w| = 2k with k pairs of letters such that the first occurrences of each of the k letters are in alphabetic order. In the case k = 2 we have 3 such partition words, which correspond to the pair partitions

aabb ↔ {1, 2}{3, 4},   abba ↔ {1, 4}{2, 3},   abab ↔ {1, 3}{2, 4}

of {1, 2, 3, 4}. Recall that the number of pair partitions of {1, ..., 2k} is 1 · 3 · · · (2k − 1).

Definition 3.1. For a partition word w, we define its height h(w) as the number of encapsulated partition sub-words, i.e., substrings of the form x w₁ x, where x is a single letter and w₁ is either a partition word or the empty word.

For example, h(abcabc) = 0, h(abcbca) = h(abccab) = 1, while h(aabbcc) = h(abccba) = 3 (the encapsulating pairs of letters are underlined). In the terminology of (Bożejko & Speicher 1996), h assigns to a pair partition the number of connected blocks which are of cardinality 2. These connected blocks of cardinality 2 are the pairs of letters underlined in the previous examples.
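Definition 3.1 is easy to mechanize, which is convenient both for checking the examples just given and, later, for evaluating moment formulas such as (3.1) in the next subsection. The following sketch (Python, standard library only; the helper names are ours) encodes a pair partition by its list of paired positions and computes h(w) as the number of pairs whose interior positions are all paired among themselves, i.e., the connected blocks of cardinality 2.

```python
def word_to_pairs(w):
    # "abccab" -> [(0, 4), (1, 5), (2, 3)]: paired positions of each letter
    first, pairs = {}, []
    for i, ch in enumerate(w):
        if ch in first:
            pairs.append((first.pop(ch), i))
        else:
            first[ch] = i
    return pairs

def height(pairs):
    # h(w): number of pairs whose interior positions are all paired
    # with other interior positions (connected blocks of cardinality 2)
    partner = {}
    for a, b in pairs:
        partner[a], partner[b] = b, a
    h = 0
    for p in pairs:
        lo, hi = min(p), max(p)
        if all(lo < partner[l] < hi for l in range(lo + 1, hi)):
            h += 1
    return h

for w in ["abcabc", "abcbca", "abccab", "aabbcc", "abccba"]:
    print(w, height(word_to_pairs(w)))   # 0, 1, 1, 3, 3 as in Definition 3.1

def pair_partitions(points):
    # all pair partitions of the list `points`
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for idx, j in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for sub in pair_partitions(remaining):
            yield [(first, j)] + sub

for k in (1, 2, 3):
    parts = list(pair_partitions(list(range(2 * k))))
    print(2 * k, len(parts), sum(2 ** height(p) for p in parts))
    # counts 1, 3, 15 are (2k-1)!!; the sums 2, 9, 56 are, by formula (3.1)
    # below, the even moments m_2, m_4, m_6 of gamma_M
```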

In Proposition B.2 we show that the even moments of the free convolution γ_M of the semi-circle and standard normal measures are given by

(3.1)  m_{2k}(γ_M) = Σ_{w : |w| = 2k} 2^{h(w)}.

For the Toeplitz and Hankel cases, with each partition word w we associate a system of linear equations which determines a cross-section of the unit hypercube, and define the corresponding volume p(w). We have to consider these two cases separately.

3.2. Toeplitz volumes. Let w[j] denote the letter in position j of the word w. For example, if w = abab then w[1] = a, w[2] = b, w[3] = a, w[4] = b. To every partition word w we associate the following system of equations in the unknowns x_0, x_1, ..., x_{2k}:

(3.2)
x_1 − x_0 + x_{m_1} − x_{m_1 − 1} = 0   if m_1 > 1 is such that w[1] = w[m_1],
x_2 − x_1 + x_{m_2} − x_{m_2 − 1} = 0   if there is m_2 > 2 such that w[2] = w[m_2],
...
x_i − x_{i−1} + x_{m_i} − x_{m_i − 1} = 0   if there is m_i > i such that w[i] = w[m_i],
...
x_{2k−1} − x_{2k−2} + x_{2k} − x_{2k−1} = 0   if w[2k − 1] = w[2k].

Although we list 2k equations, in fact k of them are empty. Informally, the left hand sides of the equations are formed by adding the differences over the same letter when the variables are written in the spaces between the letters. For example, writing the variables between the letters of the word w = ababcc we get

(3.3)  x_0 a x_1 b x_2 a x_3 b x_4 c x_5 c x_6.

The corresponding system of equations is

(3.4)
x_1 − x_0 + x_3 − x_2 = 0,
x_2 − x_1 + x_4 − x_3 = 0,
x_5 − x_4 + x_6 − x_5 = 0.

Since in every partition word w of length 2k there are exactly k distinct letters, this is a system of k equations in 2k + 1 unknowns. We solve it for the variables that follow the last occurrence of a letter, leaving us with k + 1 undetermined variables: x_0, and the k variables that follow the first occurrence of each letter. We then require that the dependent variables lie in the interval I = [0, 1]. This determines a cross-section of the cube I^{k+1} in the remaining undetermined k + 1 coordinates, the volume of which we denote by p_T(w). For example, if w = abab, solving the first pair of equations (3.4) for x_3 = x_0 − x_1 + x_2, x_4 = x_0 defines the solid {x_0 − x_1 + x_2 ∈ I} ∩ {x_0 ∈ I} ⊂ I³, which has the (Eulerian) volume p_T(abab) = 4/3! = 2/3.

We define the measure γ_T as the symmetric measure with even moments

(3.5)  m_{2k}(γ_T) = Σ_{w : |w| = 2k} p_T(w).

From Proposition 4.5 below it follows that (3.5) indeed defines a positive definite sequence of numbers, so that these are indeed the even moments of a probability measure. Since m_{2k} is at most the number (2k − 1)!! of words of length 2k, these moments determine the limiting distribution γ_T uniquely.

3.3. Hankel volumes. We proceed similarly to the Toeplitz case. With each partition word w we associate the following system of equations in the unknowns x_0, x_1, ..., x_{2k}:

(3.6)
x_1 + x_0 = x_{m_1} + x_{m_1 − 1}   if m_1 > 1 is such that w[1] = w[m_1],
x_2 + x_1 = x_{m_2} + x_{m_2 − 1}   if there is m_2 > 2 such that w[2] = w[m_2],
...
x_i + x_{i−1} = x_{m_i} + x_{m_i − 1}   if there is m_i > i such that w[i] = w[m_i],
...
x_{2k−1} + x_{2k−2} = x_{2k} + x_{2k−1}   if w[2k − 1] = w[2k].

Informally, the equations are formed by equating the sums of the variables at the same letter. For example, the word abab with the variables written as in (3.3) gives rise to the system of equations

(3.7)
x_1 + x_0 = x_3 + x_2,
x_2 + x_1 = x_4 + x_3.

As in the Toeplitz case, since there are exactly k distinct letters in the word, this is a system of k equations in 2k + 1 unknowns. We solve it for the variables that precede the first occurrence of a letter, leaving us with k undetermined variables x_{α_1}, ..., x_{α_k} = x_{2k−1} that precede the second occurrence of each letter, and with the (k + 1)-th undetermined variable x_{2k}. We add to the system (3.6) one more equation: x_0 = x_{2k}. As previously, we require that the dependent variables are in the interval I = [0, 1]. This determines a cross-section of the cube I^{k+1} in the remaining k + 1 coordinates, with the volume which we denote by p_H(w). Due to the additional constraint x_{2k} = x_0, this volume might be zero. For example, equations (3.7) have the solutions x_0 = 2x_2 − x_4, x_1 = x_3 − x_2 + x_4 with undetermined variables x_2, x_3, x_4. The equation x_0 = x_4 gives the additional relation x_4 = x_2, and reduces the dimension of the solid {2x_2 − x_4 ∈ I} ∩ {x_3 − x_2 + x_4 ∈ I} ∩ {x_4 = x_2} ⊂ I³ to 2. Thus the corresponding volume is p_H(abab) = 0.

We define the measure γ_H as the symmetric measure with even moments

(3.8)  m_{2k}(γ_H) = Σ_{w : |w| = 2k} p_H(w).

From Proposition 4.7 below it follows that (3.8) indeed defines a positive definite sequence of numbers, so that these are indeed the even moments of a probability measure. Since m_{2k} is at most the number (2k − 1)!! of words of length 2k, these moments determine the limiting distribution γ_H uniquely.
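The two worked examples above admit a quick Monte Carlo sanity check (again a Python/numpy sketch under our own variable names): the Toeplitz volume p_T(abab) is the probability that x_0 − x_1 + x_2 falls in I for independent uniforms, while in the Hankel case the extra circuit condition x_0 = x_{2k} collapses the solid to a lower-dimensional set, so the corresponding probability is zero.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10 ** 6

# Toeplitz, w = abab: undetermined x0, x1, x2 ~ U[0,1]; by (3.4),
# x3 = x0 - x1 + x2 and x4 = x0, so only x3 in [0,1] is a real constraint.
x0, x1, x2 = rng.random((3, N))
x3 = x0 - x1 + x2
print("p_T(abab) ~", np.mean((0 <= x3) & (x3 <= 1)), "(exact: 4/3! = 2/3)")

# Hankel, w = abab: undetermined x2, x3, x4 ~ U[0,1]; by (3.7),
# x0 = 2*x2 - x4 and x1 = x3 - x2 + x4, and the condition x0 = x4
# forces x4 = x2, an event of probability zero for continuous uniforms.
x2, x3, x4 = rng.random((3, N))
x0, x1 = 2 * x2 - x4, x3 - x2 + x4
inside = (0 <= x0) & (x0 <= 1) & (0 <= x1) & (x1 <= 1)
print("p_H(abab) ~", np.mean(inside & (np.abs(x0 - x4) < 1e-12)), "(exact: 0)")
```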

2 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG 34 Relation to Eulerian numbers The Eulerian numbers A n,m are often defined by their generating function or by the combinatorial description as the number of permutations σ of {,, n} with σ i > σ i for exactly m choices of i =, 2,, n (taking σ 0 = 0) The geometric interpretation is that A n,m /n! is the volume of a solid cut out of the cube I n by the set {x + + x n [m, m]}, see (Tanny 973) Converting any m of the coordinates x to x, we get that A n,m /n! is the volume of a solid cut out of the cube I n by the set {(x,, x n ) R n : x + x 2 + + x n m (x n m+ + + x n ) I} The solids we encountered in the formula for the 2k-th moments are the intersections of solids of this latter form, with odd values of n, each having m = (n )/2, and with various subsets of the coordinates entering the expression Another interesting representation is ( ) Vol {(x,, x n ) I n : x + x 2 + + x n m (x n m+ + + x n ) I} = 2 ( ) n+ sin t cos((n + 2m)t) dt π 0 t This follows from the integral representation of Eulerian numbers in (Nicolas 992) Remark 3 One can verify that the probabilities p T (w) and p H (w) are rational numbers, and hence so are m 2k (γ T ) and m 2k (γ H ), defined by formulas (35) and (38) (for details, cf (Bryc, Dembo & Jiang 2003)) 4 Proofs of Theorems, 2 and 3 4 Truncation and centering We first reduce Theorems, 2 and 3 to the case of bounded iid random variables, and in case of Theorems and 2, also allow for centering of these variables Proposition 4 (i) If Theorem or Theorem 2 holds true for all bounded independent iid sequences {X j } with mean zero and variance, then it holds true for all square-integrable iid sequences {X j } with variance (ii) If Theorem 3 holds true for all bounded independent iid collections {X ij } with mean zero and variance, then it holds true for all squareintegrable iid collections {X ij } with mean zero and variance Proof Without loss of generality, we may assume that IE(X ) = 0 in Theorems and 2 Indeed, from the rank inequality, (Bai 999, Lemma 22) it follows that subtracting a rank matrix of the means IE(X ) from matrices T n and H n does not affect the asymptotic distribution of the eigenvalues For a fixed u > 0, denote and let m(u) = IEX I { X >u}, σ 2 (u) = IEX 2 I { X u} m 2 (u) Clearly, σ 2 (u) and since IE(X ) = 0, IE(X 2 ) =, we have m(u) 0 and σ(u) as u Let X = X I { X >u} m(u)

HANKEL, MARKOV, TOEPLITZ MATRICES 3 Notice that σ 2 (u) = IE(X X ) 2, therefore the bounded random variable X = X X σ(u) has mean zero and variance Denote by T n, H n the corresponding Toeplitz and Hankel matrices constructed from the independent bounded random variables X j := X j X j σ(u) distributed as X By the triangle inequality for d BL (, ) and (26), d 2 BL(ˆµ(T n / n), ˆµ(T n/ n)) 2d 2 BL(ˆµ(T n / n), ˆµ(σ(u)T n/ n)) + 2d 2 BL(ˆµ(T n/ n), ˆµ(σ(u)T n/ n)) 2 n 2 tr((t n σ(u)t n) 2 ) + 2 n 2 ( σ(u))2 tr((t n) 2 ) It is easy to verify that IE( X 2 ) = σ 2 (u) 2m(u) 2 and that with probability one n 2 tr((t n σ(u)t n) 2 ) = n X 0 2 + 2 n ( j ) (4) X j 2 IE( n n X 2 ), as n (for example, sandwiching the coefficients j/n between the piecewise constant l lj/n and l lj/n allows for applying the strong Law of Large Numbers, with the resulting non-random bounds converging to IE( X 2 ) as l ) Similarly, (42) n 2 tr((t n) 2 ) = n (X 0) 2 + 2 n n ( j ) (X n j) 2 IE((X ) 2 ) For large u, both m(u) and σ(u) are arbitrarily small So, in view of (4) and (42), with probability one the limiting distance in the bounded Lipschitz metric d BL between ˆµ(T n / n) and ˆµ(T n/ n) is arbitrarily small, for all u sufficiently large Thus, if the conclusion of Theorem holds true for all sequences of independent bounded random variables {X j }, with the same limiting distribution γ T, then ˆµ(T n / n) must have the same weak limit with probability one Similarly, we have d 2 BL(ˆµ(H n / n), ˆµ(H n/ n)) 2 n 2 tr((h n σ(u)h n) 2 ) + 2 n 2 ( σ(u))2 tr((h n) 2 ) By the same argument as before, with probability one 2n ( n 2 tr((h n σ(u)h n) 2 ) = n j=0 j n n ) X j 2 IE( X 2 ), and n 2 tr((h n) 2 ) IE((X ) 2 ) Therefore, with probability one the limiting d BL - distance between ˆµ(H n / n) and ˆµ(H n/ n) is arbitrarily small for large enough u Similarly, denoting by M n, M n the corresponding Markov matrices constructed from the independent bounded random variables X ij and X ij := X ij X ij σ(u), we have d 2 BL(ˆµ(M n / n), ˆµ(M n/ n)) 2 n 2 tr( M 2 n) + 2 n 2 ( σ(u))2 tr((m n) 2 )

4 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG By (28), with probability one n 2 tr((m n) 2 ) 2 and n 2 tr( M 2 n) 2IE( X 2) 2 Therefore, with probability one, the limiting d BL -distance between ˆµ(M n / n) and ˆµ(M n/ n) is arbitrarily small for large enough u 42 Combinatorics for Hankel and Toeplitz cases For k, n N, consider circuits in {,, n} of length L(π) = k, ie, mappings π : {0,,, k} {, 2,, n}, such that π(0) = π(k) Let s : N 2 N be one of the following two functions: s T (x, y) = x y, or s H (x, y) = x + y We will use s to match (ie pair) the edges (π(i ), π(i)) of a circuit π The main property of the symmetric function s is that for a fixed value of s(m, n), every initial point m of an edge determines uniquely a finite number (here, at most 2) of the other end-points: if k, m N, then (43) #{y N : s(m, y) = k} 2 For a fixed s as above, we will say that circuit π is s-matched, or has selfmatched edges, if for every i L(π) there is j i such that s(π(i ), π(i)) = s(π(j ), π(j)) We will say that a circuit π has an edge of order 3, if there are at least three different edges in π with the same s-value The following proposition says that generically self-matched circuits have only pair-matches Proposition 42 Fix r N Let N denote the number of s-matched circuits in {,, n} of length r with at least one edge of order 3 Then there is a constant C r such that N C r n (r+)/2 In particular, as n we have N n +r/2 0 Proof Either r = 2k is an even number, or r = 2k is an odd number In both cases, if an s-matched circuit has an edge of order 3, then the total number of distinct s-values {s(π(i ), π(i)) : i L(π)} is at most k We can think of constructing each such circuit from the left to the right First, we choose the locations for the s-matches along {,, r} This can be done in at most r! ways Once these locations are fixed, we proceed along the circuit There are n possible choices for the initial point π(0) There are at most n choices for each new s-value, and there are at most 2 ways to complete the edge for each repeat of the already encountered s-value Therefore there are at most r! n n k 2 r+ k C r n k such circuits We say that a set of circuits π, π 2, π 3, π 4 is matched if each edge of any one of these circuits is either self-matched ie, there is another edge of the same circuit with equal s-value, or is cross-matched, ie, there is an edge of the other circuit with the same s-value (or both) The following bound will be used to prove almost sure convergence of moments Proposition 43 Fix r N Let N denote the number of matched quadruples of circuits in {,, n} of length r such that none of them is self-matched Then there is a constant C r such that N C r n 2r+2

HANKEL, MARKOV, TOEPLITZ MATRICES 5 Proof First observe that there are at most 2r distinct s-values in the 4r edges of a matched quadruples of circuits of length r Further, the number of quadruples of such circuits for which there are exactly u distinct s-values is at most C r,u n u+4 Indeed, order the edges (π j (i ), π j (i)), of such quadruples starting at j =, i =, then i = 2,, r, followed by j = 2, i = and then i = 2,, r, etc There are at most u 4r possible allocations of the distinct s-values to these 4r edges, at most n 4 choices for the starting points π (0), π 2 (0), π 3 (0), and π 4 (0) of the circuits and at most n u for the values of π j (i) at those (j, i) for which (π j (i ), π j (i)) is the leftmost occurrence of one of the distinct s-values Once these choices are made, we proceed to sequentially determine the mapping π (i) from i = 0 to i = r, followed by the mappings π 2, π 3, π 4, noting that by (43) at most 2 4r u 4 quadruples can be produced per such choice Recall that the number of possible partitions P of the 4r edges of our quadruple of circuits into P distinct groups of s-matching edges, with at least two edges in each group, is independent of n Thus, by the preceding bound it suffices to show that for each partition P with P {2r, 2r} such that each circuit shares at least one s-value with some other circuit, there correspond at most Cn 2r+2 matched quadruples of circuits in {,, n} To this end, note that P = 2r implies that each s-value is shared by exactly two edges, while when P = 2r we also have either two s-values shared by three edges each or one s-value shared by four edges (but not both) Fixing hereafter a specific partition P of this type, it is not hard to check that upon re-ordering our four circuits we have an s-value that is assigned to exactly one edge of the circuit π, denoted hereafter (π (i ), π (i )), and in case P = 2r, we also have another s-value that does not appear in π and is assigned to exactly one edge of π 2, denoted hereafter (π 2 (j ), π 2 (j )) (Though this property may not hold for all ordering of the four circuits, an inspection of all possible graphs of cross-matches shows that it must hold for some order) We are now ready to improve our counting bound for the case of P = 2r, by the following dynamic construction of π : First choose one of the n possible values for the initial value π (0), and continue filling in the values of π (i), i =, 2,, i Then, starting at π (r) = π (0), sequentially choose the values of π (r ), π (r 2),, π (i ), thus completing the entire circuit π This is done in accordance with the s-matches determined by P, so there are n ways to complete an edge that has no s-match among the edges already constructed, while by (43) if an edge is matching one of the edges already available, then it can be completed in at most 2 ways Since this procedure determines uniquely the edge (π (i ), π (i )) and hence the s-value assigned to it, it reduces to 2r 2 the number of s-matches that can each independently assume O(n) values Consequently, the number of quadruples of circuits corresponding to P is at most Cn 2r+2 In case P = 2r, we first construct π by the preceding dynamic construction while determining the s-value for the edge (π (i ), π (i )) out of the circuit condition for π Then, we repeat the dynamic construction for π 2, keeping it in accordance with the s-values determined already by edges of π and uniquely determining the edge (π 2 (j ), π 2 (j )) and hence the s-value assigned to it, by the circuit condition for π 2 
Thus, we have again reduced the total number of s-matches

6 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG that can each independently assume O(n) values to 2r 2, and consequently, the number of quadruples of circuits corresponding to P is again at most Cn 2r+2 The next result deals only with the slope matching function s T (x, y) = x y Proposition 44 Fix k N Let N be the number of s T -matched circuits π in {,, n} of length 2k with at least one pair of s T -matched edges (π(i ), π(i)) and (π(j ), π(j)) such that π(i) π(i )+π(j) π(j ) 0 Then, as n we have n (k+) N 0 Proof By Proposition 42, we may and shall consider throughout path π in {,, n} of length 2k for which the absolute values of the slopes π(i) π(i ) take exactly k distinct non-zero values and, for π to be a circuit, the sum of all 2k slopes is zero Let P denote a partition of the 2k slopes to s T -matching pairs, indicating also whether each slope is negative or positive, with m(p) denoting the number of such pairs for which both slopes are positive Observe that if under P both slopes of some s T -matching pair are negative, then necessarily m(p), for otherwise the sum of all slopes will not be zero for any path corresponding to P Thus, it suffices to show that at most n k circuits π correspond to each P with m = m(p) Indeed, fixing such P, there are at most n ways to choose π(0) and n k m ways to choose the k m pairs of slopes for which at least one slope in each pair is negative The remaining m pairs of s T -matching positive slopes are to be chosen among {,, n} subject to a specified sum (due to the circuit condition) Since there are at most n m ways for doing so, the proof is complete 43 Moments of the average spectral measure Proposition 45 Suppose {X j } is a sequence of bounded iid random variables such that IE(X ) = 0, IE(X 2 ) = Then for k N (44) lim IEtr(T2k n nk+ n ) = p T (w), and (45) lim n w: w =2k IEtr(T2k nk+/2 n ) = 0 Proof For a circuit π : {0,,, r} {, 2,, n} write r (46) X π = X π(i) π(i ) Then (47) IEtr(T r n) = π IEX π, where the sum is over all circuits in {,, n} of length r By Hölder s inequality, for any finite set Π of circuits of length r (48) π Π IEX π IE( X r )#Π Since X r is bounded, we can use the bound (48) to discard the non-generic circuits from the sum in (47) To this end, note that since the random variables

HANKEL, MARKOV, TOEPLITZ MATRICES 7 {X j } are independent and have mean zero, the term IEX π vanishes for every circuit π with at least one unpaired X j Since T n is a symmetric matrix, by (46) paired variables correspond to the slopes of the circuit π which are equal in absolute value Hence, the only circuits that make a non-zero contribution to (47) are those with matched absolute values of the slopes This fits the formalism of Section 42 with the matching function s T (x, y) = x y If r = 2k > 0 is odd then each s T -matched circuit π of length r must have an edge of order 3 From (48) and Proposition 42 we get IEtr(T 2k n ) Cn k, proving (45) When r = 2k is an even number, let Π be the set of all circuits π : {0,,, 2k} {,, n} with the set of slopes {π(i) π(i ) : i =,, 2k} consisting of k distinct non-negative integers s,, s k and their counterparts s,, s k From (48) and Proposition 44 it follows that lim n n k+ IEtr(Tr n) IEX π = 0 π Π Moreover, for every circuit π Π, if X j enters the product X π then it occurs in it exactly twice, resulting with IEX π =, and consequently with π Π IEX π = #Π Therefore, the following lemma completes the proof of (44), and with it, that of Proposition 45 Lemma 46 lim n n k+ #Π = w p T (w), where the sum is over the finite set of partition words w of length 2k Proof The circuits in Π can be labeled by the partition words w of length 2k which list the positions of the pairs of s T -matches along {,, 2k} This generates the partition Π = w Π(w) into the corresponding equivalence classes To every such partition word w we can assign n k+ paths π(i) = x i, i = 0,, 2k obtained by solving the system of equations (32), with values, 2,, n for each of the k + undetermined variables, and the remaining k values computed from the equations (which represent the relevant s T -matches for any π Π(w)) Some of these paths will fail to be in the admissible range {,, n} Let p n (w) be the fraction of the n k+ paths that stay within the admissible range {,, n}, noting that by Proposition 42, p n (w) n (k+) #Π(w) 0 Interpreting the undetermined variables x j as the discrete uniform independent random variables with values {, 2,, n}, p n (w) becomes the probability that the computed values stay within the prescribed range As n, the k + undetermined variables x j /n converge in law to independent uniform U[0, ] random variables U j Since p n (w) is the probability of the (independent of n) event A w that the solution of (32) starting with x j /n {/n, 2/n,, } has all the dependent variables in (0, ], it follows that p n (w) converges to p T (w), the probability of the event A w that the corresponding sums of independent uniform U[0, ] random variables take their values in the interval [0, ] Next we give the Hankel version of Proposition 45

8 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG Proposition 47 Let {X j } be a sequence of bounded iid random variables such that IE(X ) = 0, IE(X 2 ) = For k N, (49) lim IEtr(H2k n nk+ n ) = p H (w), w: w =2k and (40) lim IEtr(H2k n nk+/2 n ) = 0 Proof We mimic the procedure for the Toeplitz case For a circuit π : {0,,, r} {, 2,, n} write r (4) X π = X π(i)+π(i ) As previously, (42) IEtr(H r n) = IEX π, π where the sum is over all circuits in {,, n} of length r, and by Hölder s inequality, we again have the bound (48), which for bounded X r we use to discard the non-generic circuits from the sum in (42) To this end, with the random variables X j independent and of mean zero, the term IEX π vanishes for every circuit π with at least one unpaired X j By (4), in the current setting paired variables correspond to an s H -matching in the circuit π Hence, only s H -matched circuits (in the formalism of Section 42) can make a non-zero contribution to (42) If r = 2k > 0 is odd then each s H -matched circuit π of length r must have an edge of order 3 From (48) and Proposition 42 we get IEtr(H 2k n ) Cn k, proving (40) When r = 2k is an even number, let Π be the set of all circuits π : {0,,, 2k} {,, n} with the s H -values consisting of k distinct numbers Recall that IEX π = for any π Π (see (4)) Further, with any s H -matched circuit not in Π having an edge of order 3, it follows from (48) and Proposition 42 that lim n n k+ IEtr(Hr n) #Π = 0 Therefore, the following lemma completes the proof of (49), and with it, that of Proposition 47 Lemma 48 lim n #Π = nk+ w: w =2k p H (w) Proof Similarly to the proof of Lemma 46, label the circuits in Π by the partition words w which list the positions of the pairs of s H -matches along {,, 2k}, with the corresponding partition Π = w Π(w) into equivalence classes To every such partition word w we can assign n k+ paths π(i) = x i, i = 0,, 2k obtained by solving the system of equations (36), with values, 2,, n for each of the k+ undetermined variables, and the remaining k values computed from the equations Some of these paths will fail to be a circuit, and some will fail to stay in the admissible range {,, n} Let p n (w) denote the fraction of the paths that stay within the admissible range {,, n} and are circuits, noting that p n (w) n (k+) #Π(w) 0 by

HANKEL, MARKOV, TOEPLITZ MATRICES 9 Proposition 42 Thus, p n (w) is the probability of the event A w that the solution of (36) starting with the undetermined variables x j that are independent discrete uniform random variables on the set {/n, 2/n,, }, stays within (0, ] and satisfies the additional condition x 0 = x 2k It follows that as n, the probabilities p n (w) converge to p H (w), the probability of the event A w with the undetermined variables now being independent and uniformly distributed on [0, ] 44 Concentration of moments of the spectral measure Proposition 49 Let {X j } be a sequence of bounded iid random variables such that IE(X ) = 0 and IE(X 2 ) = Fix r N Then there is C r < such that for all n N we have IE[(tr(T r n) IEtr(T r n)) 4 ] C r n 2r+2 and IE[(tr(H r n) IEtr(H r n)) 4 ] C r n 2r+2 Proof The argument again relies on the enumeration of paths Since both proofs are very similar, we analyze only the Hankel case Using the circuit notation of (4) we have that 4 (43) IE[(tr(H r n) IEtr(H r n)) 4 ] = IE[ (X πj IE(X πj ))], π,π 2,π 3,π 4 where the sum is taken over all circuits π j, j =,, 4 on {,, n} of length r each With the random variables X j independent and of mean zero, any circuit π k which is not matched together with the remaining three circuits has IE(X πk ) = 0 and 4 ( IE[ (X πj IE(X πj ))] = IE[X πk Xπj IE(X πj ) ) ] = 0 Further, if one of the circuits, say π, is only self-matched, ie, has no cross-matched edge, then obviously 4 4 ( IE[ (X πj IE(X πj ))] = IE[X π IE(X πj )]IE[ Xπj IE(X πj ) ) ] = 0 Therefore, it suffices to take the sum in (43) over all s H -matched quadruples of circuits on {,, n}, such that none of them is self-matched By Proposition 43, there are at most C r n 2r+2 such quadruples of circuits, and with X (hence X π ) bounded, this completes the proof j k 45 Proofs of the Hankel and Toeplitz cases Proof of Theorem Proposition 4(i) implies that without loss of generality we may assume that the random variables {X j } are centered and bounded By Proposition 45 the odd moments of the average measure IE(ˆµ(T n / n)) converge to 0, and the even moments converge to m 2k of (35) By Chebyshev s inequality we have from Proposition 49 that for any δ > 0 and k, n N, [ IP x k dˆµ(t n / n) x k die(ˆµ(t n / n)) ] > δ C k δ 4 n 2 Thus, by the Borel-Cantelli lemma, with probability one x k dˆµ(t n / n) x k dγ T as n, for every k N In particular, with probability one, the random measures {ˆµ(T n / n)} are tight, and since the moments determine γ T uniquely, we have the weak convergence of ˆµ(T n / n) to γ T j=2

20 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG Since the moments do not depend on the distribution of the iid sequence {X j }, the limiting distribution γ T does not depend on the distribution of X either, and is symmetric as all its odd moments are zero By Proposition A, it has unbounded support Proof of Theorem 2 We follow the same line of reasoning as in the proof of Theorem, starting by assuming without loss of generality that {X j } is a sequence of centered and bounded random variables, in view of Proposition 4(i) Then, by Proposition 47, as n the odd moments of the average measure IE(ˆµ(H n / n)) converge to 0, and the even moments converge to m 2k of (38), whereas from Proposition 49 we conclude that with probability one the same applies to the moments of ˆµ(H n / n) The almost surely convergence x k dˆµ(h n / n) x k dγ H as n, for all k N, implies tightness of ˆµ(H n / n) and its weak convergence to the nonrandom measure γ H Since its moments do not depend on the distribution of the iid sequence {X j }, so does the limiting distribution γ H, which is symmetric since all its odd moments are zero By Proposition A2 it has unbounded support, and is not unimodal 46 Markov matrices with centered entries In view of Proposition 4(iii) we may and shall assume hereafter without loss of generality that the random variables X ij are bounded Our proof of Theorem 3 follows a similar outline as that used in proving Theorems and 2, where the combinatorial arguments used here rely on matrix decomposition Starting with some notation we shall use throughout the proof, let Γ n be a graph whose vertices are two-element subsets of {,, n} with the edges between vertices a and b if the sets overlap, a b We indicate that (a, b) is an edge of Γ n by writing a b, and for a Γ n let a = {a, a + } with a < a + n The main tool in the Markov case is the following decomposition M n = a Γ n X a Q a,a, where X a := X a +,a and Q a,b is the n n matrix defined for vertices a, b of Γ n by if i = a +, j = b +, or i = a, j = b, Q a,b [i, j] = if i = a +, j = b, or i = a, j = b +, 0 otherwise Let t a,b = tr(q a,b ) It is straightforward to check that 2 if a = b, if a b and a t a,b = = b or a + = b +, if a = b + or a + = b, 0 otherwise From this, we see that t a,b = t b,a Since it is easy to check that Q a,b Q c,d = t b,c Q a,d, we get (44) tr (Q a,a Q a2,a 2 Q ar,a r ) = where for convenience we identified a r+ with a r t aj,a j+,

HANKEL, MARKOV, TOEPLITZ MATRICES 2 For a circuit π = (a a r a ) of length r in Γ n let r r (45) X π = X aj It follows from (44) and (45) that t aj,a j+ (46) tr(m r n) = X π, π where the sum is over all circuits of length r in Γ n, leading to the Markov analog of the path expansion (47), (47) IEtr(M r n) = π IEX π We say that a circuit π = (a a r a ) of length r in Γ n is vertexmatched if for each i =,, r there exists some j i such that a i = a j, and that it has a match of order 3 if some value is repeated at least three times among (a j, j =,, r) Note that the only non-vanishing terms in (47) come from vertex-matched circuits In analogy with Proposition 42, we show next that generically vertexmatched circuits have only double repeats, and consequently, the odd moments of IEˆµ(M n / n) converge to zero as n Proposition 40 Fix r N Let N denote the number of vertex-matched circuits in Γ n with r vertices which have at least one match of order 3 Then there is a constant C r such that for all n N N C r n (r+)/2 Proof Either r = 2k is even, or r = 2k is odd In both cases, the total number of different vertices per path is at most k Since a a 2 a r, there are at most n 2 /2 choices for a, and then at most 4n choices for each of the remaining k 2 distinct values of a j, and choice for each repeated value Thus N 4 r n 2 n k 2 = Cn k Corollary 4 Suppose {X ij ; j i } are bounded iid random variables such that IE(X 2 ) = 0, IE(X 2 2) = Then, (48) lim n IEtr(M2k nk+/2 n ) = 0 Proof If IEX π is non-zero, then all the vertices of the path a a 2 a 2k must be repeated at least twice So for an odd number of vertices, there must be a vertex which is repeated at least 3 times Thus, by Proposition 40 and the boundedness of X ij and of t a,b, IEtr(M 2k n ) C k n k, and (48) follows Let W n = n /2 Z n + X n + ξi n, where X n is a symmetric n n matrix with iid standard normal random variables (except for the symmetry constraint), Z n = diag(z ii ) i n, with iid standard normal variables Z ii that are independent of X n and ξ is a standard normal, independent of all other variables A direct combinatorial evaluation of the even moments of IEˆµ(M n / n) is provided in (Bryc,

22 W LODZIMIERZ BRYC, AMIR DEMBO, AND TIEFENG JIANG Dembo & Jiang 2003) We follow here an alternative, shorter proof, proposed to us by O Zeitouni The key step, provided by our next lemma, replaces the even moments by those of the better understood matrix ensemble W n Lemma 42 Suppose {X ij ; j i } is a collection of bounded iid random variables such that IE(X 2 ) = 0, IE(X 2 2) = Then, for every k N, (49) lim n n (k+) [IEtr(M 2k n ) IEtr(Wn 2k )] = 0 Proof First observe that by Proposition 40, we may and shall assume without loss of generality that {X ij } is a collection of iid standard normal random variables, subject to the symmetry constraint X ij = X ji (as such a change affects n (k+) IEtr(M 2k n ) by at most C k n ) Recall the representation M n = X n D n of (3) and let M (n) (n) n = X n D n+ where D n+ is obtained by omitting the last row and column of the diagonal matrix D n+ which is an independent copy of D n+ (n) that is independent of X n Observe that the diagonal entries of D n+ are jointly normal, of zero mean, variance n + and such that the covariance of each pair is (n) Therefore, with D n+ independent of X n, for each n, the distribution of M n is exactly the same as that of W n Consequently, (49) is equivalent to (420) lim n n (k+) IE[tr(M 2k n ) tr( M 2k n )] = 0 The first step in proving (420) is to note that by a path expansion similar to (47) we have that (42) IE[tr(M 2k n ) tr( M 2k n )] = π [IEM π IE M π ], where now the sum is over all circuits π : {0,, 2k} {,, n}, and M π = 2k M π(i ),π(i) with the corresponding expression for M π Set each word w of length 2k to be a circuit by assigning w[0] = w[2k] and let Π(w) denote the collection of circuits π such that the distinct letters of w are in a one to one correspondence with the distinct values of π Let v = v(w) be the number of distinct letters in the word w, noting that #Π(w) n v(w) and that IEM π IE M π = f n (w) is independent of the specific choice of π Π(w) Hence, taking the letters of w to be from the set of numbers {, 2,, 2k} with the convention that w(i) = w[i], we identify w as a representative of π Π(w) (recall w[0] = w[2k]) For example, w = abbc of v(w) = 3 distinct letters becomes w = 223 which we identify with the circuit π Π(w) of length 4 consisting of the edges {, 2}, {2, 2}, {2, 3} and {3, } In view of (42), we thus establish (420) by showing that for any w, some C w < and all n, (422) f n (w) = IEM w IE M w C w n k v(w)+/2 Let q = q(w) be the number of indices i 2k for which w[i] = w[i ] (for example, q(223) = ) It is clear from the definition of M n and M n that f n (w) 0 only if q(w) Let u = u(w) count the number of edges of distinct endpoints in w, namely, with {w[i ], w[i]} Γ n, which appear exactly once along the circuit w (for example, u(223) = 3) Then, by independence and centering we