The symmetry in the martingale inequality

Statistics & Probability Letters 56 2002 83 9 The symmetry in the martingale inequality Sungchul Lee a; ;, Zhonggen Su a;b;2 a Department of Mathematics, Yonsei University, Seoul 20-749, South Korea b Department of Mathematics, Zhejiang University, Hangzhou 30028, People s Republic of China Received April 200; received in revised form Augus00 Abstract In this paper, we establish a martingale inequality and develop the symmetry argument to use this martingale inequality. We apply this to the length of the longest increasing subsequences and the independence number of sparse random graphs. c 2002 Elsevier Science B.V. All rights reserved. MSC: primary 60D05; 60F05; secondary 60K35; 05C05; 90C27 Keywords: Bounded martingale inequality; Longest increasing subsequence; Independence number. Introduction and main results A common feature in many probabilistic arguments is to show that with high probability a random variable is concentrated on its mean. The usual way to do this is via either the martingale inequality, the isoperimetric inequality, or the log-sobolev inequality. See Godbole and Hitczenko 998, Janson et al. 2000, McDiarmid 997, 989, Steele 997, Talagrand 995 and Vu 200 for various extensions and beautiful applications. In this paper, we establish a martingale inequality and develop the symmetry argument to use this martingale inequality. We apply this to the length of the longest increasing subsequences and the independence number of sparse random graphs. To motivate the discussion below, let us begin with the well-known Azuma s inequality. Given a probability space ; F;P and a ltration F 0 ={ ;} F F n =F, an integrable random Corresponding author. E-mail addresses: sungchul@yonsei.ac.kr S. Lee, zgsu@mail.hz.zj.cn, zgsu200@yahoo.com Z. Su. This work was supported by the BK2 project of the Department of Mathematics, Yonsei University, the interdisciplinary research program of KOSEF 999-2-03-00-5, and Com 2 MaC in POSTECH. 2 This work was supported by the BK2 project of the Department of Mathematics, Yonsei University, and by NSFC 007072. 067-752/02/$ - see front matter c 2002 Elsevier Science B.V. All rights reserved. PII: S 067-75200077-8

84 S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 variable X L ; F;P can be written as X EX = EX F k EX F k := k= d k : k= Here d k is a martingale dierence. If there exist constants c k 0 such that d k 6 c k a.s. for each k 6 n, then for every t 0, PX EX + t 6 exp 2 n ; k= c2 k PX 6 EX t 6 exp 2 n : k= c2 k The above result appears in Azuma 967 and is often called Azuma s inequality, although it was actually earlier given by Hoeding 963. In most applications, X is a function of n-independent possibly vector valued random variables ; 2 ;:::; n and the ltration is F k = ; 2 ;:::; k : In this case, we let { ; 2 ;:::; n} be an independent copy of { ; 2 ;:::; n } and dene. k = X ;:::; k ; k ; k+;:::; n X ;:::; k ; k; k+;:::; n;.2 d k = E k F k : By denition, k is the change in the value of X resulting from a change only in one coordinate. So, if k 6 c k a.s., then d k 6 c k a.s. and we can apply Azuma s inequality to obtain a tail bound for X. However, in many cases c k grows too rapidly that Azuma s inequality does not provide any reasonable tail bound. A detailed analysis on various problems in our paper shows that our k s are much smaller than c k most of the time and from this observation we can improve Azuma s inequality and obtain a reasonable tail bound for X. Our result is the following. Theorem. Let X be an integrable random variable dened on a probability space ; F; P which is in fact a function of n-independent random variables ; 2 ;:::; n. We dene F k ; k ;d k by..3. Assume that there exists a positive and nite constant c such that for all k 6 n.3 k 6 c a:s: and there exist 0 p k such that for each k 6 n.4 P0 k 6 c F k 6 p k a:s:.5 Then; for every t 0 PX EX + t 6 exp 2c 2 n k= p ;.6 k +2ct=3 PX 6 EX t 6 exp 2c 2 n k= p :.7 k +2ct=3

S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 85 Proof. We prove here.6 with c=. From this one can easily get.6. The argument for.7 is the same. Let p = n n k= p k and let gx=e x x=x 2 with g0 = 2. Since d k = E k F k ; by Jensen s inequality we have for any s 0 Ee sd k F k =Ee se k F k F k 6 Ee s k F k = E + s k + s 2 2 kgs k F k : Since g is increasing and since k 6 a.s.; by.5 we have Ee sd k F k 6 E + s k + s 2 2 kgs k F k 6 +s 2 gse 2 k F k 6 +s 2 gsp k 6 e s2 gsp k a:s: By Markov s inequality; then we have for any s 0 PX EX + t 6 e st sx EX Ee 6 e st Ee s n k= d k 6 e st Ee s n k= d k Ee sdn F n 6 e st Ee s n k= d k e s2 gsp n 6 6 e st e s2 gs n k= p k =e st+es sn p : Letting x =+x log + x x we note that x x 2 =2 + x=3. With the choice of s = log + t=n p we have then PX EX + t 6 exp n pt=n p 6 exp 2 n k= p : k +2t=3 In this paper we show a way, we call a symmetry argument, to use this inequality to obtain a reasonably good tail bound for X. More precisely, we show how to estimate the term n k= p k in Theorem. In some cases this is easy. See Luczak and McDiarmid 200 for an example. However, in some other cases this is not an easy job at all. Actually, in his very beautiful survey McDiarmid 997 introduced Talagrand s isoperimetric inequality by showing that the isoperimetric inequality gives a reasonable tail bound for the longest increasing subsequence whereas the martingale inequality

86 S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 does not. We were suspicious about this and this was one of our motivations of this research. What we nd in this paper is that by the symmetry argument we can control the term n k= p k eectively and surprisingly by the above martingale inequality we can provide a comparable tail bound for the longest increasing subsequence. In Section 2, we tell this story. In Section 3, we again use the martingale inequality and the symmetry, and provide a tail bound for the independence number. Compared to the isoperimetric inequality and the log-sobolev inequality, the martingale inequality is rather elementary. Therefore, if one knows the symmetry argument in this paper, in many cases one can use the elementary martingale inequality to obtain a reasonably good tail bound. 2. The longest increasing subsequence Consider the symmetric group S n of permutations on the numbers ; 2;:::;n, equipped with the uniform probability measure. Given a permutation =;2;:::;n, an increasing subsequence i ;i 2 ;:::;i k is a subsequence of ; 2;:::;n such that i i 2 i k ; i i 2 i n : We write L n for the length of the longest increasing subsequences of. It turns out that L n provides an entry to a rich and diverse circle of mathematical ideas. The recent surveys Aldous and Diaconis 999, Deif000 are useful references for the properties of L n, various associated results and some of the history. We will present below only what we need for the proof of Theorem 2. Let U i =X i ;Y i, i =; 2;:::;n, be a sequence of iid uniform sample on the unit square [0; ] 2. U i ;U i2 ;:::;U ik is called a monotone increasing chain of height k if X ij X ij+ ;Y ij Y ij+ for j =; 2;:::;k : Note that by denition we do not require i j i j+ and hence U i ;U i2 ;:::;U ik is in general not a subsequence of U ;U 2 ;:::;U n. Dene L n U to be the maximum height of the chains in the sample U ;U 2 ;:::;U n. A key observation, due to Hammersley 972, is that L n has the same distribution as L n U. In fact, based on this equivalent formulation Hammersley 972 rst proved that there is a constant c 2 such that L n n c 2 in probability and in mean: The constant c 2 is now known to be 2. See Aldous and Diaconis 999 for details. Our main result regarding the longest increasing subsequence is as follows. Theorem 2. Given any 0; for all suciently large n and any t 0 P L n EL n t 6 2 exp 6 + : 2. n +2t=3

S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 87 Remark. Talagrand 995 rst obtained as an application of his isoperimetric inequality that for all t 0; PL n M n + t 6 2 exp ; PL n 6 M n t 6 2 exp t2 : 2.2 4M n + t 4M n where M n denotes the median of L n. Recently; Boucheron et al. 2000 used the log-sobolev inequality to improve the Talagrand constants as follows: PL n EL n + t 6 exp ; PL n 6 EL n t 6 exp t2 : 2.3 2EL n +2t=3 2EL n Comparing 2. with 2.2 and 2.3; and noting M n = n 2asn ; our elementary martingale argument provides the same order of bounds for L n. We also remark that Baik et al. 999 proved that Var L n n =3 and L n 2 n=n =6 converges in distribution. Proof. By Hammersley s equivalent formulation it suces to show that the theorem holds for L n U instead of L n. Let {U ;U 2 ;:::;U n} be an independent copy of {U ;U 2 ;:::;U n }. It is easy to see that; letting k = L n U ;:::;U k ;U k ;U k+;:::;u n L n U ;:::;U k ;U k;u k+;:::;u n; k takes values only +; 0; and. Moreover; since E k F k = 0 where F k = U ;U 2 ;:::; U k ; we have P k =+ F k =P k = F k : Denote by A j the event that any of the highest chains of U ;:::;U n contains U j. Since we are only concerned with the relative position of U j in the n sample points instead of the order; by symmetry each A j occurs with equal probability. Similarly; given U ;U 2 ;:::;U k ; each A j under the point conguration U ;:::;U k ;U k ;U k+ ;:::;U n occurs with equal probability for j = k; k +;:::;n. Therefore; since k = + implies that A k happens; we have P k =+ F k 6 PA k F k = PA j F k n k + = n k + E A j F k : Note that n A j is just the number of points of U k ;U k+ ;:::;U n; which are common to any highest chains of U ;:::;U k ;U k ;U k+ ;:::;U n and hence if we collect the points U j or U j with A j =; this itself forms a chain. Therefore; by the denition of chain we have E A j F k 6 EL n k+ U k ;U k+ ;:::;U n a:s:

88 S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 Letting p k =2EL n k+ U k ;U k+ ;:::;U n =n k +; we have P k =+ F k 6 n k + E A j F k 6 n k + EL n k+u k ;U k+ ;:::;U n = p k 2 : Now we apply Theorem to obtain P L n U EL n U t 6 2 exp 4 n k= EL ku=k +2t=3 Since EL n U= n 2asn, n =2 n k= EL ku=k 4 and hence the theorem follows. Next let us turn to the case of d-dimensional chains. Let U i =Xi ;Xi 2 ;:::;Xi d, i =; 2;:::;n,be a sequence of iid uniform sample on the unit cube [0; ] d. U i ;U i2 ;:::;U ik is a monotone increasing chain of height k if X i j X i j+ ;X 2 i j X 2 i j+ ;:::;X d i j X d i j+ for j =; 2;:::;k Dene L n;d U to be the maximum height of the chains in the sample U ;U 2 ;:::;U n. By an elementary calculation Bollobas and Winkler 988 proved that there is a constant c d such that L n;d U c n =d d in probability and in mean The exact values c d for d 2 are not known. However, there is a conjecture on c d ; c d = d k=0 =k!. See Steele 995 for details. With ingenuity and endeavor, Bollobas and Brightwell 992 investigated the speed of convergence of EL n;d U=n =d to c d and the concentration of L n;d U about its expectation by using the martingale inequality. Here is a better concentration bound. We skip its proof which is the same as that of Theorem 2. Theorem 3. Given any 0; for all suciently large n and any t 0 P L n;d U EL n;d U t 6 2 exp : 4dc d + n =d +2t=3 : 3. The independence number Given a complete graph K n and 0 p, we dene a random graph Gn; p by selecting each edge in K n with probability p independently. More specically, rst we name the vertices in K n by k, 6 k 6 n. Let U ij,6i j6m, m = n, be the independent uniform weight assigned to the 2 edge between i and j. Then, we keep the edges with U 6 p and remove the edges with U p. A subset A of vertices in K n is independent in Gn; p if no two vertices from A are adjacent in Gn; p. The independence number Gn; p is the size of the largest independent set. We write n p for Gn; p below.

S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 89 For a xed p, n p is remarkably concentrated; there exists k = kn; p such that n p=k or k + with high probability. See Shamir and Spencer 987 for details. In this case, our method does not produce any new result. For p=s=n with s 0 xed, Bollobas and Thomason 985 made very careful analysis of n s=n. Let { s } fs = sup 0: lim P n n = : n n In the range 0 s6, there is an explicit formula for fs. Although the formula is far from being pretty, it does enable one to compute particular values of fs. In the range s, there is a rather weak low bound for fs; fs s log s s+=s 2. These results directly imply that in sparse random graphs the size of the largest independent sets is proportional to n, the size of the graph. So, it is natural to believe that the limit of the rate E n s=n=n exists. However, no rigorous proof is available. Frieze 990 proved that by a large deviation inequality of Talagrand-type, for any xed 0 and for suciently large n n 0 and s s o s n sn 6 n 3. n s with high probability, and moreover s E n sn 6 n n s ; 3.2 where s= 2 log s log log s log 2 + : s However, his method cannot provide an explicit concentration inequality for the independence number. Here we apply Theorem to obtain the following. Theorem 4. Let n = n s=n. Then; for any xed 0 and for all suciently large n n and s s P n E n t 6 2 exp : 4 + s+=sn +2t=3 Remark. For large s this provides a much better tail bound than the bound obtained by simply applying Azuma s inequality. Boucheron et al. 2000 also obtained similar bounds for n by using the log-sobolev inequality. Proof. We use the vertex-expose martingale argument. First; we name the vertices in K n by k; 6 k 6 n. Let U ij ; 6 i j6m; m = n ; be the uniform weight assigned to the edge between i 2 and j and let {U ij} be an independent copy of {U ij }. For each 6 k 6 n; let G k n; s=n bethe random graph generated by replacing the weight U adjacent to the vertex k by U ; and write ;:::;k ;k ;k +;:::;n for the corresponding independence number. Then; we let k = ;:::;k ;k;k +;:::;n ;:::;k ;k ;k +;:::;n:

90 S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 It is easy to see that k takes only +; 0; and. Moreover; since E k F k = 0 where F k = U ij ; 6 i j6 k; we have P k =+ F k =P k = F k : Denote by A k the event that any of the largest independent sets of Gn; s=n contains the vertex k. By symmetry; given U ij ; 6 i j6k ; each A j ;k6j6n; occurs equally likely. More precisely; for k 6 j 6 n PA k F k =PA j F k : Thus; we have P k =+ F k 6 PA k F k = PA j F k n k + = n k + E A j F k : Note that n A j is just the number of vertices which are common to any largest independent sets of Gn; s=n and hence if we collect the vertices j with A j =; this forms an independent set of the random graph on {k; k +;:::;n} where each edge appears with probability s=n. Therefore; by the denition of the independent set we have E A j F k s 6 E n k+ a:s: n Now a direct application of Theorem gives P n E n t 6 2 exp 4 n k= E : 3.3 ks=n=k +2t=3 Given 0, in view of 3.2, for k n 0, s s 0 we have s E k sk 6 k k s : Thus, since k s=n 6 k, it easily follows that for all suciently large n and s E k s n = E k s=n + E k s=n k k k k= k6n k n 6 n + E k s=k k k n 6 n + s+ n: s Now the theorem follows from 3.3.

S. Lee, Z. Su / Statistics & Probability Letters 56 2002 83 9 9 References Aldous, D.J., Diaconis, P., 999. Longest increasing subsequences: From patience sorting to the Baik Deift Johansson theorem. Bull. Amer. Math. Soc. 36, 43 432. Azuma, K., 967. Weighted sums of certain dependent variables. Tôhoku Math. J. 3, 357 367. Baik, J., Deift, P., Johansson, K., 999. On the distribution of the length of the longest increasing subsequence of random permutations. J. Amer. Math. Soc. 2, 9 78. Bollobas, B., Brightwell, G., 992. The height of a random partial order: concentration of measure. Ann. Appl. Probab. 2, 009 08. Bollobas, B., Thomason, A.G., 985. Random graphs of small order. In: Karonski, M., Rucinski, A. Eds., Random graphs 83, Proceedings, Poznan, 983. North-Holland, Amsterdam, New York, pp. 47 97. Bollobas, B., Winkler, P.M., 988. The longest chain among random points in Euclidean space. Proc. Amer. Math. Soc. 03, 347 353. Boucheron, S., Lugosi, G., Massart, P., 2000. A sharp concentration inequality with applications. Random Struct. Alg. 6, 277 292. Deift, P., 2000. Integrable systems and combinatorical theory. Notices AMS. 47, 63 640. Frieze, A.M., 990. On the independence number of random graphs. Discrete Math. 8, 7 75. Godbole, A., Hitczenko, P., 998. Beyond the method of bounded dierences. DIAMCS ser., Discrete Mathematics and Theoretical Computer Science 4, 43 57. Hammersley, J.M., 972. A few seedlings of research. Proceedings of the Sixth Berkeley Symposium on Mathematical and Statistical Probability, University of California Press, Berkeley, CA, pp. 345 394. Hoeding, W., 963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 3 20. Janson, S., Luczak, T., Rucinski, A., 2000. Random Graphs. Wiley, New York. Luczak, M.J., McDiarmid, C., 200. Bisecting sparse random graphs. Random Struct. Algebra 8, 3 38. McDiarmid, C., 989. On the Method of Bounded Dierences. Surveys in Combinatorics. Cambridge University Press, Cambridge, pp. 48 88. McDiarmid, C., 997. Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. Eds., Probabilistic methods for algorithmic discrete mathematics. Springer, New York, pp. 95 248. Shamir, E., Spencer, J., 987. Sharp concentration of chromatic numbers on random graphs G n;p. Combinatorica 7, 24 29. Steele, J.M., 995. Variations on the long increasing subsequence theme of Erdős and Szekeres. In: Aldous, D., Diaconis, P., Steele, J.M. Eds., Discrete probability and algorithms, Volumes in Mathematics and its applications, Vol. 72. Springer, New York, pp. 3. Steele, J.M., 997. Probability Theory and Combinatorial Optimization. SIAM, Philadelphia. Talagrand, M., 995. Concentration of measure and isoperimetric inequalities in product spaces. Publ. Math. del s IHES 8, 73 205. Vu, V.H., 200. Concentration of non-lipschitz functions and applications, preprint.