Exchangeable random arrays

Tim Austin

Notes for IAS workshop, June 2012

Abstract

Recommended reading: [Ald85, Aus08, DJ07, Ald10]. Of these, [Aus08] and [DJ07] give special emphasis to the connection with limit objects for finite graphs. Within probability theory, Chapter 7 of the textbook [Kal05] offers a definitive account.

1 Exchangeable sequences and arrays

Objects of the theory

Exchangeability theory is concerned with families of random variables whose joint distribution is unchanged when they are permuted by some group of permutations. For particular groups of permutations, it seeks to describe all possible joint distributions that have this property.

Note that when the index set is a discrete group $\Gamma$ and the permutations are given by the regular representation of $\Gamma$ on itself, this task would encompass the whole of ergodic theory. But in some special cases, in which the relevant group of permutations is very large relative to the index set, one can deduce rather more complete results on the distributions of such families than in general ergodic theory.

Following the usual convention among probabilists, our random variables will be defined on some probability space that is kept hidden in the background. We will always assume implicitly that it is rich enough to support new independent random variables whenever we need them.

Basic objects of the theory:

Exchangeable sequences: Let $E$ be $\{0, 1\}$, $\mathbb{R}$, or any other sensible space (let's say Polish, i.e. complete separable metric, for definiteness). A sequence of $E$-valued random variables $(X_n)_{n \in \mathbb{N}}$ is exchangeable if

$$(X_n)_{n \in \mathbb{N}} \overset{\mathrm{law}}{=} (X_{\sigma(n)})_{n \in \mathbb{N}} \quad \text{for any permutation } \sigma : \mathbb{N} \to \mathbb{N}.$$

Note that this is really an assertion about the measure $\mu$ on $E^{\mathbb{N}}$ which is the joint law of the r.v.s $(X_n)_n$: it is invariant under the action of the permutation group $\mathrm{Sym}(\mathbb{N})$ on $E^{\mathbb{N}}$ by the permutation of coordinates. When $E = \{0, 1\}$ these were studied by de Finetti in the 1930s; for more general $E$ by Hewitt and Savage in the 1950s.

Exchangeable arrays: More generally, for any $k \geq 1$ we can consider an array of $E$-valued r.v.s $(X_e)_{e \in \mathbb{N}^{(k)}}$ indexed by the set $\mathbb{N}^{(k)}$ of size-$k$ subsets of $\mathbb{N}$, and say it is exchangeable if

$$(X_e)_{e \in \mathbb{N}^{(k)}} \overset{\mathrm{law}}{=} (X_{\sigma(e)})_{e \in \mathbb{N}^{(k)}} \quad \text{for any permutation } \sigma,$$

where if $e = \{n_1, \ldots, n_k\}$ then $\sigma(e) := \{\sigma(n_1), \ldots, \sigma(n_k)\}$. So now exchangeability is an assertion about the law $\mu$ on $E^{\mathbb{N}^{(k)}}$. Exchangeable sequences are the case $k = 1$. General arrays were studied by Hoover [Hoo79, Hoo82], Aldous [Ald81, Ald82, Ald85], Fremlin and Talagrand [FT85] and Kallenberg [Kal89, Kal92]. Important ideas were also suggested (though not published) by Kingman, who had previously studied random partitions with a similar symmetry and proved his related paint-box theorem: see [Kin78b, Kin78a].

Why are these important?

Exchangeable random structures are important because they are the natural output of sampling at random from discrete structures.

Example 1. Most simply, for any $E$ and any probability measure $\nu$ on $E$, the product measure $\nu^{\otimes \mathbb{N}^{(k)}}$ is always the law of an $E$-valued exchangeable array.

Example 2. Suppose now that $E$ is arbitrary and that $\mathcal{P} = \{\nu_1, \ldots, \nu_m\}$ is a finite set of probability measures on $E$. Choose an $E$-valued array $(X_e)_{e \in \mathbb{N}^{(k)}}$ as follows: first pick $l \in \{1, 2, \ldots, m\}$ uniformly at random, and then conditionally on this choose $(X_e)_{e \in \mathbb{N}^{(k)}}$ i.i.d. from $\nu_l$. (This clearly agrees with the previous example when $m = 1$.)

Example 3. Let the value space be $\{0, 1\}$ and let $G = (V, E)$ be a finite graph, possibly with loops (in this example $E$ denotes the edge set of $G$). Let $(V_n)_{n \in \mathbb{N}}$ be a random sequence of vertices sampled independently from the uniform distribution on $V$, and now define $(X_e)_{e \in \mathbb{N}^{(2)}}$ by letting

$$X_{ij} := \begin{cases} 1 & \text{if } V_i V_j \in E, \\ 0 & \text{else} \end{cases}$$

(so there is no additional randomness once the $V_n$ have been chosen). Note that in general the notation $ij$ stands for the unordered pair $\{i, j\}$.

One can easily find many ways to generalize these examples further (for instance, how could Example 3 be made to give a random array with a different space $E$?). We will soon introduce a general framework for discussing these.

Main questions

First question: Once a suitably general notion of sampling has been defined, is it the only way one can produce an exchangeable sequence or array? (Spoiler: Yes.)

Second question: To what extent does the law of the exchangeable sequence remember the data that it was sampled from, and to what extent are properties of each reflected in the other?

2 Statement of the main theorem

In order to generalize Examples 1-3 above, one must first observe that to construct a $k$-set exchangeable random array, randomness can be introduced at any level between $0$ and $k$. In order to make this formal, consider first the uniform random array: this is simply the product of copies of Lebesgue measure on the space

$$[0, 1] \times [0, 1]^{\mathbb{N}} \times [0, 1]^{\mathbb{N}^{(2)}} \times \cdots \times [0, 1]^{\mathbb{N}^{(k)}}.$$

Alternatively, this is the law of a family of random variables $(U_a)_{a \subseteq \mathbb{N},\, |a| \leq k}$ which are i.i.d. $\mathrm{U}[0, 1]$.

Now consider also any measurable function

$$f : [0, 1] \times [0, 1]^k \times [0, 1]^{\binom{k}{2}} \times \cdots \times [0, 1]^{\binom{k}{k-1}} \times [0, 1] \to E$$

which is symmetric under the action of the permutation group $\mathrm{Sym}(k)$:

$$f(x_\emptyset, (x_i)_i, (x_{ij})_{ij}, \ldots, x_{[k]}) = f(x_\emptyset, (x_{\sigma(i)})_i, (x_{\sigma(i)\sigma(j)})_{ij}, \ldots, x_{[k]}) \quad \forall \sigma \in \mathrm{Sym}(k).$$

Such a function will be referred to as middle-symmetric. Then we may obtain an exchangeable random array as follows: let the $U_a$ for $a \subseteq \mathbb{N}$, $|a| \leq k$ be uniform i.i.d. as above, and set

$$X_e := f\big(U_\emptyset, (U_i)_{i \in e}, (U_a)_{a \in e^{(2)}}, \ldots, (U_a)_{a \in e^{(k-1)}}, U_e\big) \quad \text{for } e \in \mathbb{N}^{(k)}$$

(this is well-defined owing to the symmetry assumption on $f$).

Definition 2.1. This is the exchangeable random array driven by $f$. We denote its law by $\mathrm{Samp}(f)$.

Let's revisit the previous examples:

Example 1. By abstract measure theory, any Borel probability measure $\nu$ on $E$ is the pushforward of $\mathrm{U}[0, 1]$ under some measurable $f_0 : [0, 1] \to E$: that is, $\mathrm{law}(f_0) = \nu$. So now let

$$f(x_\emptyset, (x_i)_i, (x_{ij})_{ij}, \ldots, x_{[k]}) = f_0(x_{[k]}).$$

This example is using only level-$k$ randomness.

Example 2. Building on the above, let $f_l : [0, 1] \to E$ be such that $\mathrm{law}(f_l) = \nu_l$ for $1 \leq l \leq m$, and also let $\mathcal{P} = (I_1, \ldots, I_m)$ be a partition of $[0, 1]$ into $m$ equal subintervals. Now let

$$f(x_\emptyset, (x_i)_i, (x_{ij})_{ij}, \ldots, x_{[k]}) = f_l(x_{[k]}) \quad \text{whenever } x_\emptyset \in I_l.$$

This example is using the randomness from levels $0$ and $k$.

Example 3. Finally, let $\mathcal{P} = (I_v)_{v \in V}$ be a partition of $[0, 1]$ into $|V|$-many equal subintervals, and now define

$$f(x_\emptyset, x_1, x_2, x_{12}) := \begin{cases} 1 & \text{if } (x_1, x_2) \in I_u \times I_v \text{ for some } uv \in E, \\ 0 & \text{else.} \end{cases}$$

So this example (with $k = 2$) is using the randomness from level $1$.

Theorem 2.2. Any exchangeable array $(X_e)_{e \in \mathbb{N}^{(k)}}$ has law equal to $\mathrm{Samp}(f)$ for some $f$ as above.

This is due to de Finetti ($k = 1$, $E = \{0, 1\}$), Hewitt and Savage ($k = 1$), Hoover and separately Aldous ($k = 2$, different proofs), and Kallenberg (all $k$). (Aldous partly attributes his proof to Kingman.)
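To make Definition 2.1 concrete, here is a minimal Python sketch for $k = 2$, restricted to the finite index set $\{0, \ldots, n-1\}$. The names `sample_array` and `graph_driver` are hypothetical and chosen only for this illustration; `graph_driver` builds the driving function of Example 3 under the assumption that the vertices of $G$ are labelled $0, \ldots, |V|-1$.

```python
import random
from itertools import combinations

def sample_array(f, n, rng):
    """Sample (X_ij) for 0 <= i < j < n from Samp(f) in the case k = 2.

    f takes (u_empty, u_i, u_j, u_ij) and is assumed symmetric in (u_i, u_j).
    """
    u_empty = rng.random()                       # level-0 randomness
    u_vertex = [rng.random() for _ in range(n)]  # level-1 randomness
    x = {}
    for i, j in combinations(range(n), 2):
        u_edge = rng.random()                    # level-2 randomness
        x[(i, j)] = f(u_empty, u_vertex[i], u_vertex[j], u_edge)
    return x

def graph_driver(num_vertices, edges):
    """Driving function of Example 3 for a graph on {0, ..., num_vertices - 1}."""
    edge_set = {frozenset(e) for e in edges}     # a loop (v, v) becomes frozenset({v})
    def f(u_empty, x1, x2, x12):
        # I_v = [v/|V|, (v+1)/|V|): find the subintervals containing x1 and x2
        u = min(int(x1 * num_vertices), num_vertices - 1)
        v = min(int(x2 * num_vertices), num_vertices - 1)
        return 1 if frozenset((u, v)) in edge_set else 0
    return f
```

For instance, `sample_array(graph_driver(3, [(0, 1), (1, 2)]), 6, random.Random(0))` samples a 6-vertex random graph from a path on three vertices. As in Example 3, this driver ignores `u_empty` and `x12`, so only level-1 randomness is used.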

We will prove the case $k = 2$, $E = \{0, 1\}$ later. That case already contains the main difficulties, except that the general case requires an induction on $k$ which needs some careful management. That will be left as an exercise, or see [Aus08].

3 Link to limit objects

Before giving the proof, let's see how the theorem above connects to limit objects. Suppose that $(V_n, E_n)$ is a sequence of graphs, and for each $n$ let $\mu_n$ be the law of the exchangeable random array produced by sampling from $(V_n, E_n)$. The following facts are now easy exercises in the definitions:

• The sequence $(V_n, E_n)$ is Cauchy in the sense of left homomorphism-densities if and only if the sequence $\mu_n$ is Cauchy in the vague topology (called "weak" in some literature, or "weak*" in functional analysis).

• If $W : [0, 1]^2 \to [0, 1]$ is a Lovász-Szegedy limit object for the sequence $(V_n, E_n)$, then the limit of the measures $\mu_n$ is the exchangeable measure $\mu$ on $\{0, 1\}^{\mathbb{N}^{(2)}}$ driven by the function

$$f(u_\emptyset, u_1, u_2, u_{12}) := 1_{\{u_{12} \geq 1 - W(u_1, u_2)\}}:$$

that is, it does not depend on $u_\emptyset$, and above each pair $(u_1, u_2)$ it is the indicator of the interval $[1 - W(u_1, u_2), 1]$, whose length is $W(u_1, u_2)$.

• On the other hand, if the $\mu_n$ converge vaguely to $\mu = \mathrm{Samp}(f)$, for some $f : [0, 1] \times [0, 1]^2 \times [0, 1] \to \{0, 1\}$ which does not depend on the first coordinate, then the limit object for the sequence $(V_n, E_n)$ is represented by

$$W(u_1, u_2) := \int_0^1 f(u_1, u_2, u_{12}) \, du_{12}. \tag{1}$$

(In fact, one can show that since the $\mu_n$ all come from sampling single graphs, we can always assume that their limit is driven by an $f$ that does not depend on the first coordinate, but that fact requires a slightly more sophisticated appeal to some ideas of ergodic theory.)

Modulo the proof that $f$ can be independent of $u_\emptyset$ in the third point above, this shows that the basic existence of limit objects for left-Cauchy graph sequences is a fairly simple consequence of Theorem 2.2. On the other hand, if one knows about the theory of limit objects then these can be used to prove Theorem 2.2, although that wasn't the original proof. We will remark on that approach below.

It also follows from these remarks that two sufficient conditions for $\mathrm{Samp}(f) = \mathrm{Samp}(f')$ are the following: that

$$f(u_\emptyset, u_1, u_2, u_{12}) = f'(\varphi(u_\emptyset), \varphi'(u_1), \varphi'(u_2), \varphi''(u_{12}))$$

for some Lebesgue-measure-preserving measurable bijections $\varphi, \varphi', \varphi'' : [0, 1] \to [0, 1]$, or that

$$\int_0^1 f(u_\emptyset, u_1, u_2, u_{12}) \, du_{12} = \int_0^1 f'(u_\emptyset, u_1, u_2, u_{12}) \, du_{12} \quad \forall u_\emptyset, u_1, u_2 \in [0, 1].$$

The first of these conditions appears in just the same way as in the theory of limit objects. The second is obvious from the definition of $\mathrm{Samp}(f)$, and simply disappears once one works instead with the function $W$ defined from $f$ as above.

To give an explicit example, if $k = 2$ and $E = \{0, 1\}$ then Theorem 2.2 may be expressed more concretely by asserting that there is some measurable $W : [0, 1] \times [0, 1]^2 \to [0, 1]$, symmetric in the last two coordinates, such that

$$\mathrm{P}(X_{i_1 j_1} = \cdots = X_{i_m j_m} = 1) = \mathrm{E} \prod_{r=1}^m W(U_\emptyset, U_{i_r}, U_{j_r}) \tag{2}$$

for any distinct pairs $i_1 j_1, \ldots, i_m j_m$, where the right-hand expectation $\mathrm{E}$ is over the independent $\mathrm{U}[0, 1]$-r.v.s $U_\emptyset$ and $U_i$, $i \in \mathbb{N}$. (This equation is enough to determine the joint law of $(X_{ij})_{ij}$, because the indicator of any finite-dimensional event in $\{0, 1\}^{\mathbb{N}^{(2)}}$ may be expressed as a linear combination of indicators of events of the above form.)

It turns out that the combination of the two conditions above is still not necessary for $\mathrm{Samp}(f) = \mathrm{Samp}(f')$: for some $f$ there are other, non-invertible probability-preserving transformations of $[0, 1]$ that do not change $\mathrm{Samp}(f)$. This issue is analysed carefully in [Kal89] for two-dimensional arrays and in Section 7.6 of Kallenberg [Kal05] for the general case. One can also make this analysis via the corresponding result for uniqueness of limit objects of graph sequences. I won't go into it further here.
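Equation (2) can be checked numerically in a simple case. The sketch below is only an illustration under stated assumptions: the kernel $W(u_\emptyset, x, y) = xy$ is a hypothetical choice, not taken from the notes, and the function names are invented here. It compares the empirical probability that three edges of a triangle are all present (sampling via the indicator driving function from Section 3) with a Monte Carlo value of the right-hand expectation. For this particular $W$ both should be close to $\mathrm{E}[U_1^2 U_2^2 U_3^2] = (1/3)^3 = 1/27$.

```python
import random

def W(u_empty, x, y):
    # Hypothetical kernel: symmetric in (x, y) and independent of u_empty.
    return x * y

def f(u_empty, x, y, u_edge):
    # Driving function built from W: indicator of the interval [1 - W, 1].
    return 1 if u_edge >= 1.0 - W(u_empty, x, y) else 0

def triangle_estimates(trials, rng):
    """Return (empirical P(X_12 = X_13 = X_23 = 1), Monte Carlo E[prod W])."""
    hits = 0
    total = 0.0
    for _ in range(trials):
        u0 = rng.random()
        u1, u2, u3 = rng.random(), rng.random(), rng.random()
        pairs = [(u1, u2), (u1, u3), (u2, u3)]
        # Left-hand side of (2): draw each edge with a fresh level-2 variable.
        if all(f(u0, a, b, rng.random()) for a, b in pairs):
            hits += 1
        # Right-hand side of (2): product of W over the same three pairs.
        prod = 1.0
        for a, b in pairs:
            prod *= W(u0, a, b)
        total += prod
    return hits / trials, total / trials
```

Calling `triangle_estimates(100000, random.Random(0))` returns two estimates that agree up to Monte Carlo error; the agreement reflects the fact that, conditionally on the level-0 and level-1 variables, the edges are independent with probabilities given by $W$.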

Let us remark that in general there is some benefit to working with $f$, rather than its averaged function $W$ as defined in (1): this doesn't appear when $E = \{0, 1\}$, but for general spaces $E$ one would have to replace $W$ with a function that takes values among probability measures on $E$, and in this case it seems easier to work in the $f$ language.

4 Proof for $k = 2$, $E = \{0, 1\}$

Suppose that $(X_{ij})_{ij \in \mathbb{N}^{(2)}}$ is an exchangeable random graph (i.e., a $\{0, 1\}$-valued symmetric array). Note our convention that $ij \in \mathbb{N}^{(2)}$ implies $i \neq j$. We wish to show that there is a function $f : [0, 1] \times [0, 1]^2 \times [0, 1] \to \{0, 1\}$ which is middle-symmetric and such that

$$(X_{ij})_{ij \in \mathbb{N}^{(2)}} \overset{\mathrm{law}}{=} (f(U_\emptyset, U_i, U_j, U_{ij}))_{ij \in \mathbb{N}^{(2)}}$$

where $U_\emptyset$, $U_i$ and $U_{ij}$ for $i, j \in \mathbb{N}$ are all i.i.d. $\mathrm{U}[0, 1]$.

We will give the classical proof, essentially following Aldous [Ald82, Ald85]. It assumes de Finetti's Theorem (that is, the case $k = 1$ of our main theorem): for any nice space $E$, if $(Y_i)_i$ is an exchangeable sequence of $E$-valued r.v.s then there is a function $g : [0, 1] \times [0, 1] \to E$ such that

$$(Y_i)_i \overset{\mathrm{law}}{=} (g(U_\emptyset, U_i))_i$$

where $U_\emptyset$ and $U_i$, $i \in \mathbb{N}$, are i.i.d. $\mathrm{U}[0, 1]$.

For simplicity, we will also prove Theorem 2.2 as it is formulated in equation (2).

First reduction: using de Finetti

To prove the desired representation, it will in fact suffice to show that there are some auxiliary nice space $Z$, an exchangeable sequence of $Z$-valued r.v.s $(Y_i)_i$, and a symmetric function $W : Z \times Z \to [0, 1]$, such that

$$\mathrm{P}(X_{i_1 j_1} = \cdots = X_{i_m j_m} = 1) = \mathrm{E}_{(Y_i)_i} \prod_{r=1}^m W(Y_{i_r}, Y_{j_r}) \tag{3}$$

for any distinct edges $i_1 j_1, \ldots, i_m j_m$, where $\mathrm{E}_{(Y_i)_i}$ denotes expectation in the r.v.s $(Y_i)_i$.

Indeed, once we know this, de Finetti's Theorem applied to $(Y_i)_i$ gives $g : [0, 1] \times [0, 1] \to Z$ such that $(Y_i)_i \overset{\mathrm{law}}{=} (g(U_\emptyset, U_i))_i$, and so combining with the above we have

$$\mathrm{P}(X_{i_1 j_1} = \cdots = X_{i_m j_m} = 1) = \mathrm{E}_{U_\emptyset, (U_i)_i} \prod_{r=1}^m W(g(U_\emptyset, U_{i_r}), g(U_\emptyset, U_{j_r})),$$

which is the desired conclusion (2) with the function $\widetilde{W}(u_\emptyset, u_1, u_2) := W(g(u_\emptyset, u_1), g(u_\emptyset, u_2))$, which is symmetric in $(u_1, u_2)$.

Second reduction: abstract nonsense

Secondly, the existence of a suitable exchangeable sequence as above now follows if we instead produce an exchangeable sequence of $Z$-valued r.v.s $(Y_i)_i$ which is coupled to the array $(X_{ij})_{ij}$ in such a way that

(i) the whole enlarged array $(Y_i, X_{ij})_{i,j}$ is still exchangeable, i.e.

$$(Y_i, X_{ij})_{(i,j) \in \mathbb{N} \times \mathbb{N},\, i \neq j} \overset{\mathrm{law}}{=} (Y_{\pi(i)}, X_{\pi(i)\pi(j)})_{(i,j) \in \mathbb{N} \times \mathbb{N},\, i \neq j}$$

for any permutation $\pi : \mathbb{N} \to \mathbb{N}$ (so, formally, this array is now indexed by directed edges $(i, j)$), and

(ii) the r.v.s $X_{ij}$ are conditionally independent over the r.v.s $Y_i$, in the following quite specific sense:

$$\mathrm{E}\big(X_{i_1 j_1} X_{i_2 j_2} \cdots X_{i_m j_m} \,\big|\, (Y_i)_{i \in \mathbb{N}}\big) = \prod_{r=1}^m \mathrm{E}\big(X_{i_r j_r} \,\big|\, Y_{i_r}, Y_{j_r}\big) \tag{4}$$

for any family of distinct pairs $i_r j_r \in \mathbb{N}^{(2)}$, $1 \leq r \leq m$. (Actually, this amounts to conditional independence over $(Y_i)_i$, and also the assertion that when $X_{i_1 j_1}$ is conditioned on the whole sequence $(Y_i)_i$, it actually depends only on the two values $Y_{i_1}$ and $Y_{j_1}$.)

To see that (4) is enough, simply define $W : Z \times Z \to [0, 1]$ by

$$W(z_1, z_2) := \mathrm{E}(X_{ij} \mid Y_i = z_1, Y_j = z_2) = \mathrm{P}(X_{ij} = 1 \mid Y_i = z_1, Y_j = z_2)$$

(in the sense of regular conditional probabilities, so this is well-defined up to agreement for a.e. $(z_1, z_2)$ according to the law of a pair $(Y_i, Y_j)$, $i \neq j$). This function is symmetric and does not depend on the choice of the pair $ij$, by exchangeability. Substituting it into (4) gives

$$\mathrm{P}\big(X_{i_1 j_1} = X_{i_2 j_2} = \cdots = X_{i_m j_m} = 1 \,\big|\, (Y_i)_{i \in \mathbb{N}}\big) = \mathrm{E}\big(X_{i_1 j_1} X_{i_2 j_2} \cdots X_{i_m j_m} \,\big|\, (Y_i)_{i \in \mathbb{N}}\big) = \prod_{r=1}^m W(Y_{i_r}, Y_{j_r}) \tag{5}$$

(where the first equality holds simply because each $X_{ij}$ is $\{0, 1\}$-valued). Now taking expectations in $(Y_i)_i$ recovers (3).

The main step

Finally this brings us to the main step: somehow constructing the auxiliary exchangeable variables $Y_i$ such that (4) holds. These also emerge from a trick, but a slightly less obvious one.

Firstly, exchangeability of the family $(X_{ij})_{ij \in \mathbb{N}^{(2)}}$ implies that also

$$(X_{ij})_{ij \in \mathbb{N}^{(2)}} \overset{\mathrm{law}}{=} (X_{\gamma(i)\gamma(j)})_{ij \in \mathbb{N}^{(2)}}$$

where $\gamma : \mathbb{N} \to \mathbb{N}$ is any injection (not necessarily a permutation). This is because the distributions of these two arrays are determined by their finite-dimensional marginals, and for any finite collection $i_1, i_2, \ldots, i_m \in \mathbb{N}$ we can find a genuine permutation $\sigma : \mathbb{N} \to \mathbb{N}$ such that $\sigma(i_r) = \gamma(i_r)$ for all $r$ (but $\sigma$ and $\gamma$ differ elsewhere if necessary).

Now, of course, our previous choice of $\mathbb{N}^{(2)}$ as index set was rather arbitrary; we could have used, say, $\mathbb{Z}^{(2)}$ instead. But if we switch to indexing by $\mathbb{Z}^{(2)}$, we now rediscover the original array $(X_{ij})_{ij \in \mathbb{N}^{(2)}}$ sitting inside the new array $(X_{ij})_{ij \in \mathbb{Z}^{(2)}}$, in the sense that this sub-array of the new array has the same distribution as the $\mathbb{N}^{(2)}$-indexed array that we started with. This is simply because we can let $\gamma : \mathbb{Z} \to \mathbb{Z}$ be an injection with image equal to $\mathbb{N}$ and apply the reasoning above.

This completely trivial observation is important, because it provides a large collection of extra random variables with which to synthesize our $Y_i$'s. Indeed, let

$$Z := \{0, 1\}^{(\mathbb{N}^c)^{(2)}} \times \{0, 1\}^{\mathbb{N}^c}$$

where $\mathbb{N}^c := \mathbb{Z} \setminus \mathbb{N}$. So $Z$ is an infinite-dimensional product, to be thought of with its product topology and hence as a Cantor set. Now define the random variables $Y_i$ for $i \in \mathbb{N}$ by

$$Y_i = \big((X_{i'j'})_{i'j' \in (\mathbb{N}^c)^{(2)}},\ (X_{i'i})_{i' \in \mathbb{N}^c}\big) \in Z. \tag{6}$$

Thus, for each $i \in \mathbb{N}$, $Y_i$ simply records all of the values $X_{i'j'}$ where $i'j'$ is either an edge in $\mathbb{N}^c$, or an edge that joins $i$ to $\mathbb{N}^c$.

It's now an immediate consequence of the exchangeability of the whole $\mathbb{Z}^{(2)}$-indexed array that the new collection of r.v.s $(X_{ij}, Y_i)_{(i,j) \in \mathbb{N} \times \mathbb{N},\, i \neq j}$ (which is just our $\mathbb{Z}^{(2)}$-array indexed in a new way) has distribution that is invariant under permutations of $\mathbb{N}$. The proof will therefore be completed by the following claim:

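Claim. With the $Y_i$'s constructed as above one has

$$\mathrm{E}\big(X_{i_1 j_1} X_{i_2 j_2} \cdots X_{i_m j_m} \,\big|\, (Y_i)_{i \in \mathbb{N}}\big) = \prod_{r=1}^m \mathrm{E}\big(X_{i_r j_r} \,\big|\, Y_{i_r}, Y_{j_r}\big)$$

for any family of distinct pairs $i_r j_r \in \mathbb{N}^{(2)}$, $1 \leq r \leq m$.

Proof of the claim. By induction on $m$, it suffices to prove that

$$\mathrm{E}\big(X_{i_1 j_1} X_{i_2 j_2} \cdots X_{i_m j_m} \,\big|\, (Y_i)_{i \in \mathbb{N}}\big) = \mathrm{E}\big(X_{i_1 j_1} X_{i_2 j_2} \cdots X_{i_{m-1} j_{m-1}} \,\big|\, (Y_i)_{i \in \mathbb{N}}\big) \, \mathrm{E}\big(X_{i_m j_m} \,\big|\, Y_{i_m}, Y_{j_m}\big)$$

for any family of distinct pairs $i_r j_r \in \mathbb{N}^{(2)}$, $1 \leq r \leq m$. On the other hand, by the identity of iterated conditional expectations we have

$$\mathrm{E}\big(X_{i_1 j_1} \cdots X_{i_m j_m} \,\big|\, (Y_i)_{i \in \mathbb{N}}\big) = \mathrm{E}\Big(X_{i_1 j_1} \cdots X_{i_{m-1} j_{m-1}} \, \mathrm{E}\big(X_{i_m j_m} \,\big|\, X_{i_1 j_1}, \ldots, X_{i_{m-1} j_{m-1}}, (Y_i)_{i \in \mathbb{N}}\big) \,\Big|\, (Y_i)_{i \in \mathbb{N}}\Big),$$

so it is enough to show that

$$\mathrm{E}\big(X_{i_m j_m} \,\big|\, X_{i_1 j_1}, \ldots, X_{i_{m-1} j_{m-1}}, (Y_i)_{i \in \mathbb{N}}\big) = \mathrm{E}\big(X_{i_m j_m} \,\big|\, Y_{i_m}, Y_{j_m}\big),$$

since the right-hand side here is measurable with respect to $(Y_i)_{i \in \mathbb{N}}$ and so can be taken outside the outer conditional expectation.

Let $\mathcal{F}_1$ be the σ-algebra generated by all the random variables $X_{i_1 j_1}, \ldots, X_{i_{m-1} j_{m-1}}$ and $(Y_i)_{i \in \mathbb{N}}$, and $\mathcal{F}_2$ the σ-algebra generated by just $Y_{i_m}$ and $Y_{j_m}$. Hence $\mathcal{F}_2 \subseteq \mathcal{F}_1$, and by another appeal to iterated conditional expectation we know that

$$\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_2) = \mathrm{E}\big(\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1) \,\big|\, \mathcal{F}_2\big).$$

We wish to show that $\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_2) = \mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1)$. Since the operator $\mathrm{E}(\,\cdot \mid \mathcal{F}_2)$ is an orthogonal projection on $L^2(\Omega, \mathcal{F}, \mathrm{P})$, it follows that the $L^2$-norm of the random variable $\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1)$ cannot increase under this projection, i.e.

$$\|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_2)\|_2^2 \leq \|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1)\|_2^2, \tag{7}$$

and that we have the desired equality if and only if in fact

$$\|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_2)\|_2^2 = \|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1)\|_2^2 \tag{8}$$

(by Pythagoras, the difference of the two sides of (7) is the squared $L^2$-norm of $\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1) - \mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_2)$).

This conclusion now follows by one last trick: another appeal to exchangeability will prove that the inequality in (7) can be reversed, and hence we must have the desired equality.

To perform this trick, let $T \subset \mathbb{N}^c$ be a further subset such that $T$ and $\mathbb{N}^c \setminus T$ are both infinite. Given this infinitude, we may now choose an injection $\gamma : \mathbb{Z} \to \mathbb{Z}$ with the following properties:

• $\gamma(i_m) = i_m$ and $\gamma(j_m) = j_m$,

• $\gamma(\mathbb{N}^c) = T$, and

• $\gamma(\mathbb{N} \setminus \{i_m, j_m\}) = \mathbb{N}^c \setminus T$ (so all indices except $i_m$ and $j_m$ end up somewhere in $\mathbb{N}^c$).

After applying this map to the indices, the r.v.s $X_{i_r j_r}$ are sent to $X_{\gamma(i_r)\gamma(j_r)}$, and the r.v.s $Y_i$ are replaced by

$$Y_{\gamma(i)} := \big((X_{i'j'})_{i'j' \in T^{(2)}},\ (X_{i'\gamma(i)})_{i' \in T}\big)$$

(recall (6)), and the joint exchangeability of all our r.v.s promises that

$$\mathrm{E}\big(X_{i_m j_m} \,\big|\, X_{i_1 j_1}, \ldots, X_{i_{m-1} j_{m-1}}, (Y_i)_{i \in \mathbb{N}}\big) \overset{\mathrm{law}}{=} \mathrm{E}\big(X_{i_m j_m} \,\big|\, X_{\gamma(i_1)\gamma(j_1)}, \ldots, X_{\gamma(i_{m-1})\gamma(j_{m-1})}, (Y_{\gamma(i)})_{i \in \mathbb{N}}\big).$$

Taking $L^2$-norms, this implies

$$\|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1)\|_2^2 = \|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_3)\|_2^2,$$

where $\mathcal{F}_3$ is the σ-algebra generated by $X_{\gamma(i_1)\gamma(j_1)}, \ldots, X_{\gamma(i_{m-1})\gamma(j_{m-1})}$ and $(Y_{\gamma(i)})_{i \in \mathbb{N}}$.

But upon unravelling the definition of $\gamma$, one sees that $\mathcal{F}_3$ is generated by r.v.s of the form $X_{i'j'}$ where $i'j'$ is either an edge in $\mathbb{N}^c$ or is of the form $i'i_m$ or $i'j_m$ for some $i' \in \mathbb{N}^c$. This is because $\gamma(\mathbb{Z}) = \mathbb{N}^c \cup \{i_m, j_m\}$, and the edge $i_m j_m$ is distinct from $i_r j_r$ for $r \leq m - 1$. Recalling the definition in (6), this particular subcollection of the random variables $X_{i'j'}$ is determined by the various coordinates appearing in $Y_{i_m}$ and $Y_{j_m}$. Therefore one has the inclusion $\mathcal{F}_3 \subseteq \mathcal{F}_2$.

This is the other way around from our previous inclusion, $\mathcal{F}_1 \supseteq \mathcal{F}_2$, and so arguing as before we have

$$\|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_3)\|_2^2 \leq \|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_2)\|_2^2.$$

Since this left-hand side is equal to $\|\mathrm{E}(X_{i_m j_m} \mid \mathcal{F}_1)\|_2^2$, we deduce the desired equality of r.v.s. This completes the proof of the claim, and the theorem.

Exercise. Run through the above proof and check what modifications are needed to work with general spaces $E$.

Exercise. Using the above proof and the previous exercise as a guide, write out a proof of de Finetti's Theorem itself along similar lines.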

More complicated exercise. Generalize the above proof to $k$-dimensional arrays. (Hint: the appeal we made to de Finetti's Theorem becomes an induction on $k$.)

Philosophical remarks

This proof does look rather like magic, because it's hard to locate where we did anything nontrivial. Perhaps the first place one should point to is equality (8). After the early abstract-nonsense steps, the essence of this theorem is that conditioning $X_{i_m j_m}$ on all the other random variables that gave rise to $\mathcal{F}_1$ is the same as conditioning on only $Y_{i_m}$ and $Y_{j_m}$. Then the key realization is that this assertion can be made quantitative, in that it requires only the equality of the $L^2$-norms appearing in (7).

This is the closest point of contact between the proof above and (versions of) the Szemerédi regularity lemma: there, too, the key realization is that one has found a sufficiently regular partition if and only if one cannot substantially increase a certain $L^2$-norm by using a slightly finer partition. In fact, the $L^2$-norms in question more-or-less correspond as well. In our setting, it is the norm of a conditional expectation of the edge-indicating r.v. $X_{ij}$ onto a σ-algebra generated by the one-dimensional array $Y_i$. For the regularity lemma, it is the norm of a conditional expectation of the adjacency matrix (i.e., indicator function) of a fixed finite graph onto a partition of the edges generated by a partition of the vertices.

What's very different in the above proof is that we have infinitely many random variables to play with, and so (i) that approximate equality gets tightened up to an exact equality, and (ii) many of the manipulations that must be done in the finitary world to keep careful track of various errors are now hidden inside well-known results of probability. (Most obviously, the fact that one must eventually find a regular partition in the regularity lemma now corresponds to the proof that one can even define a conditional expectation operator onto a non-finite σ-subalgebra: classically, that proof uses either the $L^2$-martingale convergence theorem, or an analogous argument guaranteeing the existence of an orthogonal projection operator onto a closed subspace of $L^2(\mathrm{P})$, which also boils down to showing that some sequence converges in $L^2$.)

Exercise for enthusiasts. Mentally run through the above proof, but replace $\mathbb{N}$ with $\{1, 2, \ldots, M\}$ and $\mathbb{Z}$ with $\{-M, \ldots, M\}$ for some enormous value of $M$, and assume the resulting finite random array is obtained by sampling from some unknown, modestly large finite graph. Try to see why the conditional independence given by the above Claim really does correspond to a certain relative quasirandomness in the finite world, and why an approximate version of equality (8) can be deduced if one knows that a certain not-too-large partition of the vertex set of the driving graph is regular.
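The finitary $L^2$-norm mentioned in the philosophical remarks can be computed directly. The following Python sketch is an illustration only: the name `energy` and the choice of the 4-cycle as a test graph are assumptions made here, not part of the notes. It evaluates the squared $L^2$-norm of the adjacency matrix averaged over the blocks $A \times B$ of a vertex partition; by the projection argument behind (7), this quantity cannot decrease when the partition is refined.

```python
from itertools import product

def energy(adj, partition):
    """Squared L^2-norm of the adjacency matrix averaged over blocks A x B.

    adj is a symmetric 0/1 matrix (list of lists); partition is a list of
    disjoint non-empty lists of vertices covering range(len(adj)).
    """
    n = len(adj)
    total = 0.0
    for part_a, part_b in product(partition, repeat=2):
        edges = sum(adj[a][b] for a in part_a for b in part_b)
        density = edges / (len(part_a) * len(part_b))
        # weight of the block A x B under the uniform measure on pairs
        total += (len(part_a) * len(part_b) / n**2) * density**2
    return total

# Hypothetical test graph: the 4-cycle 0-1-2-3-0.
cycle4 = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]
```

For `cycle4`, the trivial partition `[[0, 1, 2, 3]]` has energy 0.25, the bipartition `[[0, 2], [1, 3]]` has energy 0.5, and the partition into singletons also has energy 0.5. Refining the bipartition further gains nothing, which is the finite analogue of the exact equality (8).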

It is also possible to give a proof of the Aldous-Hoover Theorem that uses the limit-object machinery for sequences of dense finite graphs. To do this, one lets $(X_{ij})_{ij}$ be a realization of our exchangeable random array, and considers the finite graphs on $\{1, 2, \ldots, n\}$ whose adjacency matrices are the sub-arrays $(X_{ij})_{ij \in \{1, 2, \ldots, n\}^{(2)}}$. Using exchangeability one can then show that P-a.s. these finite graphs converge in the sense of left homomorphism-densities to some limit object $f^\omega : [0, 1]^2 \to [0, 1]$, which may still be random (as indicated by the superscript $\omega$). In probabilistic language, this is the assertion of convergence of empirical distributions for exchangeable random arrays; the proof of this convergence, on the other hand, can be made in many ways, for example using the ergodic theorem for actions of the permutation group $\mathrm{Sym}(\mathbb{N})$. Finally, by more abstract nonsense one can convert the dependence of $f^\omega$ on $\omega$ into dependence on an extra copy of the unit interval, and so obtain a function $g : [0, 1] \times [0, 1]^2 \to [0, 1]$, and some similar manipulations to the first steps above finally convert this into the driving function $f$.

References

[Ald81] David J. Aldous. Representations for partially exchangeable arrays of random variables. J. Multivariate Anal., 11(4), 1981.

[Ald82] David J. Aldous. On exchangeability and conditional independence. In Exchangeability in probability and statistics (Rome, 1981). North-Holland, Amsterdam, 1982.

[Ald85] David J. Aldous. Exchangeability and related topics. In École d'été de probabilités de Saint-Flour, XIII 1983, volume 1117 of Lecture Notes in Math. Springer, Berlin, 1985.

[Ald10] David J. Aldous. More uses of exchangeability: representations of complex random structures. In N. H. Bingham and C. M. Goldie, editors, Probability and Mathematical Genetics: Papers in Honour of Sir John Kingman. Cambridge University Press, 2010.

[Aus08] Tim Austin. On exchangeable random variables and the statistics of large graphs and hypergraphs. Probability Surveys, 5:80-145, 2008.

[DJ07] Persi Diaconis and Svante Janson. Graph limits and exchangeable random graphs. Preprint, 2007.

[FT85] D. H. Fremlin and M. Talagrand. Subgraphs of random graphs. Trans. Amer. Math. Soc., 291(2), 1985.

[Hoo79] David N. Hoover. Relations on probability spaces and arrays of random variables. 1979.

[Hoo82] David N. Hoover. Row-columns exchangeability and a generalized model for exchangeability. In Exchangeability in probability and statistics (Rome, 1981). North-Holland, Amsterdam, 1982.

[Kal89] Olav Kallenberg. On the representation theorem for exchangeable arrays. J. Multivariate Anal., 30(1), 1989.

[Kal92] Olav Kallenberg. Symmetries on random arrays and set-indexed processes. J. Theoret. Probab., 5(4), 1992.

[Kal05] Olav Kallenberg. Probabilistic symmetries and invariance principles. Probability and its Applications (New York). Springer, New York, 2005.

[Kin78a] J. F. C. Kingman. The representation of partition structures. J. London Math. Soc. (2), 18(2), 1978.

[Kin78b] J. F. C. Kingman. Uses of exchangeability. Ann. Probability, 6(2), 1978.

Department of Mathematics
Brown University
Providence, RI 02912, USA
timaustin

