U-statistics on network-structured data with kernels of degree larger than one


Journal of Machine Learning Research 47:37–48, 2014. Statistically Sound Data Mining

Yuyi Wang, Christos Pelekis, Jan Ramon
Computer Science Department, KU Leuven, Belgium

Editors: Wilhelmiina Hämäläinen, François Petitjean, and Geoff Webb

Abstract

Most analysis of U-statistics assumes that data points are independent or stationary. However, when we analyze network data, these two assumptions no longer hold. We first define the problem of weighted U-statistics on networked data by extending previous work. We analyze their variance using Hoeffding's decomposition and also give exponential concentration inequalities. Two efficiently solvable linear programs are proposed to find estimators with minimum worst-case variance or with tighter concentration inequalities.

1. Introduction

Nowadays there is a plethora of real-world datasets which are network-structured. These are examples of relational databases, i.e. data samples are relations between objects, and so exhibit dependencies. A typical example is the web which, due to the explosion in social networks and the expansion of e-commerce, is generating an immense amount of network-structured data. Therefore we need statistical methods that permit us to mine and learn from this type of dataset.

An example of a statistical method that generates unbiased estimators of minimum variance involves the notion of U-statistics. U-statistics are a class of measures, proposed by W. Hoeffding (Hoeffding, 1948), which can usually be written as averages over functions on elements or tuples of elements of samples, e.g., the sample mean, sample variance, sample moments, Kendall's τ (see (Kendall, 1938)), Wilcoxon's signed-rank sum (see (Wilcoxon, 1945)), etc. Most analysis of U-statistics assumes that data points are independently distributed. However, when we consider networked data points, this assumption does not hold any more; two or more examples may share some common object.
In our previous work (Wang et al., 2014), we provided a statistical theory of learning from networked training examples. This work generalizes the results and extends the ideas further. A crucial assumption in our previous work was that every (perhaps correlated) data point is used only once. In contrast to this, U-statistics (see e.g. (Lee, 1990)) are a class of measures that allows us to repeatedly use data points. For example, the rank correlation estimator of Kendall (Kendall's τ) compares every data point to all other points. When we consider U-statistics on networked data points, data points are repeatedly used if the degree, d, of the kernel of the U-statistic is greater than 1 (the case d = 1 has been discussed in (Wang et al., 2014)). Different data points may also be correlated. In this work we address the problem of how to design U-statistics, on networked data points, that exhibit small variance and small probability of deviation from their mean.

© 2014 Y. Wang, C. Pelekis & J. Ramon.

There is a vast literature on U-statistics for dependent random variables. However, most of the work focuses on providing central limit theorems and related results for dependent stationary sequences of random variables. For example, in (Khashimov, 1988; Hsing and Wu, 2004; Lee, 1990; Khashimov, 1994; Kim et al., 2011) the authors discuss U-statistics on several types of stationary sequences, such as weakly dependent stationary sequences, m-dependent stationary sequences, absolutely regular processes and random variables with mixing conditions. The assumptions made in those works are not suitable for the networked random variables discussed in this paper. Our contribution is not only to analyze the variance and provide concentration bounds of U-statistics on networked random variables but also to design good U-statistics for this type of networked data.

In addition, there exists literature on weighted U-statistics. In (Ha et al., 2014), the authors analyze the asymptotic behavior of weighted U-statistics with i.i.d. data points. In (Nasari, 2012), the author considers incomplete U-statistics, which are similar to our setting, but the attention is focused on asymptotic results under the assumption of i.i.d. data points. In (O'Neil and Redner, 1993), it is shown that non-normal limits can occur for some choices of weights. In (Rifi and Utzet, 2000), one can find a sufficient condition for the convergence of weighted U-statistics. In (Hsing and Wu, 2004), the authors consider weighted U-statistics for stationary processes. Our results differ from the above in that we do not assume independence and our attention is focused on different aspects.

The rest of the paper is organized as follows. In Section 2, we define a weighted version of U-statistics on networked random variables and state the basic questions we are interested in.
In Section 3, we bound the variance of the U-statistics by employing Hoeffding's decomposition. Subsequently, in Section 4, we formulate a linear program that allows us to obtain a concentration inequality for weighted U-statistics. In Section 5, we minimize the worst-case variance using a convex program. Finally, in Section 6, we draw conclusions with some remarks and comments on possible future work.

2. Preliminaries

In this section, we give a formal definition of the problem that is addressed in this paper. Let $G = (V^{(1)} \cup V^{(2)}, E)$ be a bipartite graph¹ and assume that we are given two sets of i.i.d. random variables that are indexed using the vertices of $G$. That is, let $\mathcal{X}^{(1)} = \{\varphi_v\}_{v \in V^{(1)}}$ be a set of i.i.d., vector-valued random variables associated to $V^{(1)}$ and let $\mathcal{X}^{(2)} = \{\psi_u\}_{u \in V^{(2)}}$ be a set of i.i.d. random variables associated to $V^{(2)}$. Fix any enumeration $\{e_1, \ldots, e_n\}$ of the edge set $E$. To every edge $e_i = (v, u) \in E$, with $v \in V^{(1)}$ and $u \in V^{(2)}$, we associate a pair of random variables by setting $X_i = (\varphi_v, \psi_u) \in \mathcal{X}^{(1)} \times \mathcal{X}^{(2)}$. We will denote by $X_i^{(1)}$ the first coordinate of $X_i$ and by $X_i^{(2)}$ the second coordinate of $X_i$. Similarly, $e_i^{(l)}$, $l = 1, 2$, will denote the vertex of $e_i$ that lies in $V^{(l)}$. We will refer to the set $X = \{X_i\}_{i=1}^n$ as a set of $G$-networked random variables.

In addition, for $S \subseteq \{1, 2\}$, we will denote by $X_i^{(S)} = (X_i^{(s)})_{s \in S}$ the (sub)vector formed by the coordinates of $X_i$ that correspond to $S$. In particular $X_i^{(\emptyset)} = \emptyset$. For $S, T \subseteq \{1, 2\}$, we will denote by $X_i^{(S)} \oplus X_j^{(T)}$ the (sub)vector $Y \in \mathcal{X}^{(S \cup T)}$ for which $Y^{(S)} = X_i^{(S)}$ and $Y^{(T)} = X_j^{(T)}$.

¹ We remark that our results can be extended to k-partite hypergraphs but, in order to keep the formalism simple, we present here the case k = 2.

Let $f(\cdot, \cdot)$ be a real-valued function such that if $e_i$ and $e_j$ are disjoint edges in $E$ (henceforth denoted $e_i \cap e_j = \emptyset$) then $\mathbb{E}[f(X_i, X_j)] = \mu$ and $\mathbb{E}[f^2(X_i, X_j)] - \mu^2 = \sigma^2$. Such a function $f(\cdot, \cdot)$ appears, for example, in Kendall's τ rank correlation coefficient (see Example 1 below). Let us illustrate the above definitions with an example.

Example 1 Let the vertex set $V^{(1)}$ represent a set of persons and $V^{(2)}$ represent a set of films. For every person $v \in V^{(1)}$ and every film $u \in V^{(2)}$ we join the corresponding vertices with an edge if and only if person $v$ has seen the film $u$. The result is a bipartite graph, $G = (V^{(1)} \cup V^{(2)}, E)$. An instance of such a graph can be found in Figure 1. Suppose that for every person $v \in V^{(1)}$ there is a feature vector, $\varphi_v$, that contains information on, say, the gender, age, nationality, etc., of person $v$ and that for every film $u \in V^{(2)}$ there is a feature vector, $\psi_u$, containing information on, say, scenography, actor popularity, etc., of film $u$. Thus, every edge $e_i = (v, u) \in E$ is associated to the vector $X_i = (\varphi_v, \psi_u)$. Now suppose that we have two functions, $S_1(\cdot)$, $S_2(\cdot)$, that take values in $[0, 1]$ and are such that $S_k(X_i)$, $k = 1, 2$, represents a rating/certificate that is given to a specific characteristic of film $u$ by person $v$. If $e_i, e_j$ are such that $e_i \cap e_j = \emptyset$, define the function $f(X_i, X_j)$ by setting

$$f(X_i, X_j) = (-1)^{I\{S_1(X_i) > S_1(X_j)\} + I\{S_2(X_i) > S_2(X_j)\}},$$

where $I\{\cdot\}$ denotes the indicator function. Thus $f(X_i, X_j)$ is equal to 1 if the orderings on both ratings agree and equal to $-1$ otherwise. Kendall's τ-coefficient (see (Kendall, 1938)) is defined as

$$\tau = \frac{2}{n(n-1)} \sum_{e_i, e_j} f(X_i, X_j),$$

where the sum runs over all pairs of disjoint edges, $e_i, e_j$. Note that the fact that the function $f(\cdot, \cdot)$ is defined only for disjoint edges implies that $\tau$ is an unbiased estimator.
For a fixed bipartite graph, $G = (V, E)$, let us denote by

$$E_0 = \{(i, j) : e_i, e_j \in E \text{ and } e_i \cap e_j = \emptyset\}$$

the set consisting of all pairs that are indices of disjoint edges from $E$ (as an example, see Fig. 1). Suppose that we are given a function $w : E_0 \to [0, +\infty)$ of nonnegative weights on the indices of pairs of disjoint edges from $E$. Set

$$U(f, w) = \frac{1}{\|w\|} \sum_{(i,j) \in E_0} w_{i,j}\, f(X_i, X_j), \tag{1}$$

where $\|w\| = \sum_{(i,j) \in E_0} w_{i,j}$. We will refer to $U(f, w)$ as a weighted U-statistic of $f$. Note that, by definition, $U(f, w)$ is an unbiased estimator of $\mu$, or, more formally

$$\mathbb{E}[U(f, w)] = \mu = \mathbb{E}[f] \quad \text{for all } f, \tag{2}$$

and that, in order to guarantee this condition, it is important to sum over disjoint edges in Eq. (1). Hence the means of $U(f, w)$ and $f$ are the same, but the same may not be true for the variance. Our attention in this paper (see Sections 3 and 5) is focused on analyzing the variance of $U(f, w)$. The function $f(\cdot, \cdot)$ will be called the kernel of the U-statistic and will be considered as fixed throughout the paper; hence from now on we will denote $U(f, w)$ by $U(w)$. Because the kernel associates a real number to two vectors $X_i, X_j \in \mathcal{X}^{(1)} \times \mathcal{X}^{(2)}$, we say that its degree is two.
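As a rough illustration of Eq. (1), the following Python sketch builds $E_0$ and evaluates a weighted U-statistic. Several things here are assumptions made only for illustration and are not taken from the paper: the edge list `EDGES` is read off from the pairs listed in the caption of Fig. 1, the features `phi` and `psi` are random scalars, and `concordance_kernel` is a stand-in for the kernel of Example 1 in which the two ratings are simply the two coordinates of $X_i$.

```python
import itertools
import random

# Edges of a six-edge bipartite graph, read off from the pairs listed in the
# caption of Fig. 1 (assumption: vertices 1-4 on one side, 5-7 on the other).
EDGES = [(1, 5), (1, 6), (2, 6), (2, 7), (3, 7), (4, 7)]


def disjoint_pairs(edges):
    """Return E_0 as index pairs (i, j), i < j, of vertex-disjoint edges."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(edges)), 2)
        if not set(edges[i]) & set(edges[j])
    ]


def weighted_u_statistic(samples, pairs, weights, kernel):
    """U(f, w) = (1 / ||w||) * sum over (i, j) in E_0 of w_ij * f(X_i, X_j)."""
    total_weight = sum(weights[p] for p in pairs)
    weighted_sum = sum(
        weights[p] * kernel(samples[p[0]], samples[p[1]]) for p in pairs
    )
    return weighted_sum / total_weight


def concordance_kernel(x, y):
    """Hypothetical kernel: +1 if both coordinates are ordered alike, else -1."""
    return 1.0 if (x[0] > y[0]) == (x[1] > y[1]) else -1.0


rng = random.Random(0)
phi = {v: rng.random() for v in (1, 2, 3, 4)}   # hypothetical features on V^(1)
psi = {u: rng.random() for u in (5, 6, 7)}      # hypothetical features on V^(2)
samples = [(phi[v], psi[u]) for v, u in EDGES]  # X_i = (phi_v, psi_u)

E0 = disjoint_pairs(EDGES)
equal_weights = {p: 1.0 for p in E0}
u_value = weighted_u_statistic(samples, E0, equal_weights, concordance_kernel)
```

Only unordered pairs are stored, which is enough for a symmetric kernel; a non-symmetric kernel would also need each pair in reversed order.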

Figure 1: A bipartite graph. It contains nine pairs of disjoint edges: ({1, 5}, {2, 6}), ({1, 5}, {2, 7}), ({1, 5}, {3, 7}), ({1, 5}, {4, 7}), ({1, 6}, {2, 7}), ({1, 6}, {3, 7}), ({1, 6}, {4, 7}), ({2, 6}, {3, 7}), ({2, 6}, {4, 7}). If the kernel is not symmetric then for each pair, say ({1, 5}, {2, 6}), we have to include also the pair consisting of the same edges but written in reversed order, i.e., ({2, 6}, {1, 5}).

In classical U-statistics (see (Hoeffding, 1948)), the variables $\{X_i\}_{i=1}^n$ are i.i.d. and all $w_{i,j}$ are equal to 1. By introducing weights in the above definition we will be able to obtain estimators that exhibit small variance and improved bounds on the probability of deviation from the mean. Notice that the networked variables $\{X_i\}_{i=1}^n$ are not independent anymore, because two or more random variables may share the first or the second coordinate. For example, if $e_i^{(1)} = e_j^{(1)} = v$, then $X_i^{(1)} = X_j^{(1)} = \varphi_v$. In this paper we shall be interested in the following basic questions:

- Can we find a sharp upper bound on the variance of $U(w)$?
- How can we bound the probability of deviation $\Pr[U(w) - \mu \ge t]$ for every fixed $t > 0$?
- How can we design a good (low variance and/or small probability of deviation) statistic $U(w)$ by suitably choosing the weight function $w$?

We investigate these questions in the subsequent sections.

3. Hoeffding's decomposition

In this section, we apply a technique, which is known as Hoeffding's decomposition, to weighted U-statistics of networked random variables. We begin by describing this well-known technique (see (Hoeffding, 1948)). Fix two independent random variables, say $X_i$ and $X_j$, for which the corresponding edges are disjoint, i.e. $e_i \cap e_j = \emptyset$. For any two subsets $S, T \subseteq \{1, 2\}$, we define $\mu_{(S,T)}\big(X_i^{(S)}, X_j^{(T)}\big)$ recursively via the following formula:

$$\mu_{(S,T)}\big(X_i^{(S)}, X_j^{(T)}\big) = \mathbb{E}\big[f(X_i, X_j) \mid X_i^{(S)}, X_j^{(T)}\big] - \sum_{(W,Z) \prec (S,T)} \mu_{(W,Z)}\big(X_i^{(W)}, X_j^{(Z)}\big),$$

where $(W, Z) \prec (S, T)$ means, by definition, $W \subseteq S$, $Z \subseteq T$ but $(W, Z) \ne (S, T)$, and $\mathbb{E}\big[f(X_i, X_j) \mid X_i^{(S)}, X_j^{(T)}\big]$ denotes the conditional expectation of $f(X_i, X_j)$, given $X_i^{(S)}$ and $X_j^{(T)}$. Hoeffding's decomposition allows one to express $f(X_i, X_j)$ in terms of the functions $\mu_{(S,T)}\big(X_i^{(S)}, X_j^{(T)}\big)$, or, more formally

$$f(X_i, X_j) = \mathbb{E}[f(X_i, X_j) \mid X_i, X_j] = \sum_{S \subseteq \{1,2\},\, T \subseteq \{1,2\}} \mu_{(S,T)}\big(X_i^{(S)}, X_j^{(T)}\big).$$

It is a well-known result, and in fact not so difficult to see, that the covariance of $\mu_{(S,T)}\big(X_i^{(S)}, X_j^{(T)}\big)$ and $\mu_{(W,Z)}\big(X_i^{(W)}, X_j^{(Z)}\big)$ is zero for $(S, T) \ne (W, Z)$, i.e., they are uncorrelated. This implies that

$$\sigma^2 = \sum_{S \subseteq \{1,2\},\, T \subseteq \{1,2\}} \sigma^2_{(S,T)} - \mu^2, \tag{3}$$

where $\sigma^2 = \mathbb{E}[f^2(X_i, X_j)] - \mu^2$ and $\sigma^2_{(S,T)} = \mathbb{E}\big[\mu^2_{(S,T)}\big(X_i^{(S)}, X_j^{(T)}\big)\big]$. In other words, the variance of $f$ can be partitioned into a sum of variance-components, where every component corresponds to a pair of subsets of $\{1, 2\}$. Therefore, Hoeffding's decomposition allows us to write the function $f(X_i, X_j)$ as a sum of several uncorrelated functions.

This decomposition significantly simplifies the analysis of the variance of U-statistics based on an i.i.d. sample. To see this, let $\{X_i\}_{i=1}^n$ be i.i.d. and suppose that $i, j, k$ are three different indices. Consider the U-statistics that are defined on $\{X_i\}_{i=1}^n$ with all weights being equal to 1. We want to find upper bounds on the variance of $U(w)$. Since the variance of $U(w)$ equals $\mathbb{E}[U(w)^2] - \mu^2$ (see also Eq. (1)), we have to find upper bounds on expressions of the form $\mathbb{E}[f(X_i, X_j) f(X_i, X_k)]$ and then add them up. Note that $\mathbb{E}[f(X_i, X_j) f(X_i, X_k)] - \mu^2$ is the covariance of $f(X_i, X_j)$ and $f(X_i, X_k)$. Now, in case one uses an i.i.d. sample, it can be shown that

$$\mathbb{E}[f(X_i, X_j) f(X_i, X_k)] - \mu^2 = \sigma^2_{(\{1\},\emptyset)} + \sigma^2_{(\{2\},\emptyset)} + \sigma^2_{(\{1,2\},\emptyset)}.$$

Thus, the variance of $U$ decomposes into a sum of smaller variance-components. We remark that in the classical analysis of the variance of U-statistics using an i.i.d. sample we often assume that the kernel $f$ is symmetric, i.e. $f(X_i, X_j) = f(X_j, X_i)$.
The symmetry guarantees that the covariance of every possible pair, $f(X_i, X_j)$, $f(X_m, X_l)$, can always be expressed as a sum of several variance-components.

However, the classical variance analysis using Hoeffding's decomposition cannot be directly applied to the case of networked random variables, due to dependence. To see this suppose that we have four different edges, say $e_1, e_2, e_3, e_4$, such that $e_1$ and $e_3$ intersect in $V^{(1)}$, i.e. $e_1^{(1)} = e_3^{(1)}$, and $e_2$ and $e_3$ intersect in $V^{(2)}$, i.e. $e_2^{(2)} = e_3^{(2)}$. Then, using the fact that the functions $\mu_{(\cdot,\cdot)}\big(X_i^{(\cdot)}, X_j^{(\cdot)}\big)$ are uncorrelated and some algebra, one can show that

$$\begin{aligned}
\mathbb{E}[f(X_1, X_2) f(X_3, X_4)] - \mu^2
&= \mathbb{E}\Big[\mu^2_{(\{1\},\emptyset)}\big(X_1^{(1)}\big) + \mu_{(\{2\},\emptyset)}\big(X_2^{(2)}\big)\,\mu_{(\emptyset,\{2\})}\big(X_2^{(2)}\big)\Big] \\
&\quad + \mathbb{E}\Big[\mu_{(\{1,2\},\emptyset)}\big(X_1^{(1)}, X_2^{(2)}\big)\,\mu_{(\{1\},\{2\})}\big(X_1^{(1)}, X_2^{(2)}\big)\Big] \\
&= \sigma^2_{(\{1\},\emptyset)} + \mathbb{E}\Big[\mu_{(\{2\},\emptyset)}\big(X_2^{(2)}\big)\,\mu_{(\emptyset,\{2\})}\big(X_2^{(2)}\big)\Big] \\
&\quad + \mathbb{E}\Big[\mu_{(\{1,2\},\emptyset)}\big(X_1^{(1)}, X_2^{(2)}\big)\,\mu_{(\{1\},\{2\})}\big(X_1^{(1)}, X_2^{(2)}\big)\Big].
\end{aligned}$$

Note that the second and the third term of the last expression do not decompose further into variance-components, i.e. into a sum of expressions of the form $\mathbb{E}\big[\mu^2_{(S,T)}\big(X_i^{(S)}, X_j^{(T)}\big)\big] = \sigma^2_{(S,T)}$. If we additionally assume that the kernel is symmetric, then the second term can be written in the form $\mathbb{E}\big[\mu_{(\{2\},\emptyset)}\big(X_2^{(2)}\big)\,\mu_{(\emptyset,\{2\})}\big(X_2^{(2)}\big)\big] = \sigma^2_{(\{2\},\emptyset)}$, but the third term cannot.

Recall that we are interested in finding a sharp bound on the variance of U-statistics on networked examples. Recall further that the variance of weighted U-statistics is related to the covariance of $f(X_i, X_j)$ and $f(X_m, X_l)$, where $(e_i, e_j), (e_m, e_l) \in E_0$. In order to formally capture this relation, we will need the following definition.

Definition 1 (overlap index matrix) Given a set of edges $E = \{e_i\}_{i=1}^n$ of a bipartite graph $G$, we define the overlap matrix of $E$, denoted $J^E$, to be the $n \times n$ matrix whose $(i, j)$ entry equals

$$J^E_{i,j} = \{l \in \{1, 2\} \mid e_i^{(l)} = e_j^{(l)}\}.$$

In other words, given two edges $e_i, e_j$ from $E$, $J^E_{i,j}$ tells us the part of the graph on which they intersect. Note that $J^E_{i,j}$ is a subset of $\{1, 2\}$. For example, in the graph of Fig. 1, if $e_1 = \{1, 5\}$ and $e_2 = \{1, 6\}$ then $J^E_{1,2} = \{1\}$, while if $e_1 = \{1, 5\}$ and $e_3 = \{2, 6\}$, then $J^E_{1,3} = \emptyset$. If it is clear from the context, we will drop $E$ from $J^E$ and write $J$ instead.

Let $\{X_i\}_{i=1}^n$ be a set of $G$-networked random variables associated to $E = \{e_i\}_{i=1}^n$. Fix two pairs of edges, say $(e_i, e_j)$ and $(e_m, e_l)$, such that $e_i \cap e_j = \emptyset$ and $e_m \cap e_l = \emptyset$.
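One can show that the covariance of $f(X_i, X_j)$ and $f(X_m, X_l)$, i.e., the quantity $\Sigma(i, j; m, l) = \mathbb{E}[f(X_i, X_j) f(X_m, X_l)] - \mu^2$, is equal to

$$\sum \mathbb{E}\Big[\mu_{(S \cup W,\, T \cup Z)}\big(X_i^{(S \cup W)}, X_j^{(T \cup Z)}\big)\; \mu_{(S \cup Z,\, T \cup W)}\big(X_i^{(S)} \oplus X_j^{(Z)},\; X_j^{(T)} \oplus X_i^{(W)}\big)\Big], \tag{4}$$

where $\oplus$ combines subvectors as defined in Section 2 and the sum runs over all quadruples $(S, T, W, Z)$, not all empty, such that $S \subseteq J_{i,m}$, $T \subseteq J_{j,l}$, $W \subseteq J_{i,l}$, $Z \subseteq J_{j,m}$ (the all-empty quadruple would contribute $\mu^2_{(\emptyset,\emptyset)} = \mu^2$, which is exactly the term subtracted in the covariance). Now, using the Cauchy–Schwarz inequality (the following expected values can be treated as inner products) it is easy to see that

$$\begin{aligned}
&\mathbb{E}\Big[\mu_{(S \cup W,\, T \cup Z)}\big(X_1^{(S \cup W)}, X_2^{(T \cup Z)}\big)\; \mu_{(S \cup Z,\, T \cup W)}\big(X_1^{(S)} \oplus X_2^{(Z)},\; X_2^{(T)} \oplus X_1^{(W)}\big)\Big] \\
&\quad \le \sqrt{\mathbb{E}\Big[\mu^2_{(S \cup W,\, T \cup Z)}\big(X_1^{(S \cup W)}, X_2^{(T \cup Z)}\big)\Big]\; \mathbb{E}\Big[\mu^2_{(S \cup Z,\, T \cup W)}\big(X_1^{(S)} \oplus X_2^{(Z)},\; X_2^{(T)} \oplus X_1^{(W)}\big)\Big]} \\
&\quad = \sigma_{(S \cup W,\, T \cup Z)}\, \sigma_{(S \cup Z,\, T \cup W)}.
\end{aligned}$$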
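The overlap index of Definition 1, and the index set of the sum in Eq. (4), can be sketched in Python as follows. This is only an illustration under stated assumptions: the edge list `EDGES` is read off from the caption of Fig. 1, each edge is stored as a tuple whose first entry lies in $V^{(1)}$ and second in $V^{(2)}$, and the helper names (`overlap_index`, `quadruples`) are hypothetical. The enumeration includes the all-empty quadruple, which corresponds to the $\mu^2$ term.

```python
from itertools import combinations, product

# Edge list read off from the caption of Fig. 1 (assumption), 0-indexed below;
# each edge is (vertex in V^(1), vertex in V^(2)).
EDGES = [(1, 5), (1, 6), (2, 6), (2, 7), (3, 7), (4, 7)]


def overlap_index(e_i, e_j):
    """J_{i,j}: the coordinates l in {1, 2} on which e_i and e_j share a vertex."""
    return frozenset(l for l in (1, 2) if e_i[l - 1] == e_j[l - 1])


def overlap_matrix(edges):
    """The n x n overlap index matrix J of Definition 1."""
    return [[overlap_index(a, b) for b in edges] for a in edges]


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def quadruples(J, i, j, m, l):
    """All (S, T, W, Z) with S in J_im, T in J_jl, W in J_il, Z in J_jm (as subsets)."""
    return list(
        product(
            subsets(J[i][m]), subsets(J[j][l]), subsets(J[i][l]), subsets(J[j][m])
        )
    )


J = overlap_matrix(EDGES)
# Pairs ({1,5},{2,6}) and ({1,6},{2,7}): indices (0, 2) and (1, 3).
terms = quadruples(J, 0, 2, 1, 3)
```

For these two pairs, $J_{i,m} = \{1\}$, $J_{j,l} = \{1\}$, $J_{i,l} = \emptyset$ and $J_{j,m} = \{2\}$, so the enumeration has $2 \cdot 2 \cdot 1 \cdot 2 = 8$ quadruples.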

Summarizing, we can deduce the following bound on the variance of $U(w)$:

Theorem 2 The variance of $U(w)$, i.e. the quantity $\mathbb{E}[U(w)^2] - \mu^2$, is at most

$$\frac{1}{\|w\|^2} \sum w_{i,j}\, w_{m,l} \sum{}' \sigma_{(S \cup W,\, T \cup Z)}\, \sigma_{(S \cup Z,\, T \cup W)},$$

where the inner sum $\sum'$ is as in Eq. (4) and the outer sum runs over all quadruples $(i, j, m, l)$ for which $e_i \cap e_j = \emptyset$ and $e_m \cap e_l = \emptyset$.

This bound is tight because it is possible to choose a kernel whose Hoeffding decomposition ensures that equality is attained in the Cauchy–Schwarz inequality, i.e. so that $\mu_{(S \cup W,\, T \cup Z)}$ and $\mu_{(S \cup Z,\, T \cup W)}$ are linearly dependent. If we give every term $f(X_i, X_j)$ the same weight, the variance may not be minimal (see Section 5), and the same holds for the bound on the probability of deviation from the mean (see Section 4).

4. Concentration inequalities: a linear program method

In this section, we consider bounded kernels of degree two, i.e. functions, $f(\cdot, \cdot)$, that satisfy $|f - \mu| \le M$, for some threshold $M > 0$. We are interested in obtaining concentration bounds for U-statistics with kernels of that form. We would like to find a weight function for which the corresponding weighted U-statistics give sharp concentration inequalities.

One way to get a bound is by applying Hoeffding's inequality as follows. Let us consider U-statistics that are based on a matching in the graph $G$. A matching in a hypergraph is a collection of disjoint edges and so, in the case of networked examples, it corresponds to an independent sample. If we use an independent sample of size $\alpha_G$ (the matching number of $G$), i.e., if we set

$$U_{\mathrm{ind}} = \binom{\alpha_G}{2}^{-1} \sum_{\{i, j \in E' \,:\, i < j\}} f(X_i, X_j),$$

where $E'$ is a maximum matching of $G$, then by Hoeffding's result (see (Hoeffding, 1963, 1948)) we can conclude that if $\alpha_G \ge 2$, then

$$\Pr[U_{\mathrm{ind}} - \mu \ge t] \le \exp\left(-\frac{\alpha_G t^2}{4M^2}\right). \tag{5}$$

This bound may be sharp. However, it has two disadvantages:

1. It is difficult to find a large matching in a k-partite hypergraph when $k \ge 3$ (see (Garey and Johnson, 1979)), so the bound cannot be computed efficiently in more general graphs.
2. This method may lose some information of the sample since we remove some random variables from the sample.
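For the bipartite case ($k = 2$) a maximum matching is easy to compute, so the matching-based bound can be evaluated directly. The sketch below is a minimal illustration, not the paper's procedure: it uses a standard augmenting-path search (Kuhn's algorithm) on the edge list read off from the caption of Fig. 1, and it takes Eq. (5) in the form $\exp(-\alpha_G t^2 / (4M^2))$. The function names are hypothetical.

```python
import math

# Edge list read off from the caption of Fig. 1 (assumption):
# left vertices 1-4, right vertices 5-7.
EDGES = [(1, 5), (1, 6), (2, 6), (2, 7), (3, 7), (4, 7)]


def max_matching(edges):
    """Maximum matching of a bipartite graph via augmenting paths.

    Returns (matching number, list of matched edges)."""
    adjacency = {}
    for v, u in edges:
        adjacency.setdefault(v, []).append(u)
    match_right = {}  # right vertex u -> left vertex v currently matched to it

    def try_augment(v, seen):
        # Try to match v, re-routing previously matched left vertices if needed.
        for u in adjacency[v]:
            if u in seen:
                continue
            seen.add(u)
            if u not in match_right or try_augment(match_right[u], seen):
                match_right[u] = v
                return True
        return False

    size = sum(try_augment(v, set()) for v in adjacency)
    return size, sorted((v, u) for u, v in match_right.items())


def matching_bound(alpha, t, M):
    """Right-hand side of Eq. (5), assumed to read exp(-alpha * t^2 / (4 M^2))."""
    return math.exp(-alpha * t * t / (4.0 * M * M))


alpha_G, matching = max_matching(EDGES)
bound = matching_bound(alpha_G, t=1.0, M=1.0)
```

On this graph the right side has only three vertices, so no matching can contain more than three edges.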

Notice that finding a maximum matching in a hypergraph can be represented as an integer program. Integer programs are in general difficult to solve. In contrast to this, linear programs are much easier. With this in mind, and in order to avoid dealing with the aforementioned disadvantages, we formulate a linear program (LP) and use the solutions of the linear program as the weights of weighted U-statistics. This will require the notion of a vertex-bounded weight function. For a given bipartite graph, $G = (V, E)$, recall the definition of the set $E_0 = \{(i, j) : e_i, e_j \in E \text{ and } e_i \cap e_j = \emptyset\}$.

Definition 3 A weight function $w$ on $E_0$ is vertex-bounded if $w_{i,j} \ge 0$ for all pairs $(i, j) \in E_0$ and for all $v$, we have

$$\sum_{\{(i,j) \in E_0 \,:\, v \in e_i \text{ or } v \in e_j\}} w_{i,j} \le 1.$$

Our main result is the following concentration bound on vertex-bounded weighted U-statistics:

Theorem 4 Let $X = \{X_i\}_{i=1}^n$ be a given set of networked random variables. If $w$ is a vertex-bounded weight function, then the estimator $U(w)$ satisfies

$$\mathbb{E}[U(w)^2] - \mu^2 \le \frac{\sigma^2}{\|w\|}$$

and, if $|f - \mu| \le M$, then for any $t > 0$ we have

$$\Pr[U(w) - \mu \ge t] \le \exp\left(-\frac{\|w\|\, t^2}{2M^2}\right), \tag{6}$$

where $\|w\|$ is the sum of all weights $w_{i,j}$, with $(i, j) \in E_0$.

This theorem is an analogue of Theorems 18 and 23 in (Wang et al., 2014). Thus, in order to minimize the bounds of the previous theorem, one has to maximize $\|w\|$. This leads to the following linear program.

$$\begin{aligned}
\max_{w} \quad & \sum_{(i,j) \in E_0} w_{i,j} && (7) \\
\text{s.t.} \quad & \forall v : \sum_{\{(i,j) \in E_0 \,:\, v \in e_i \text{ or } v \in e_j\}} w_{i,j} \le 1 && (8) \\
& \forall (i, j) \in E_0 : w_{i,j} \ge 0. && (9)
\end{aligned}$$

We call the optimal objective value of this linear program the $s$-value. Optimal weights $w$ of this linear program will be referred to as $s$-weights. Since the (suitably scaled) weight function corresponding to $U_{\mathrm{ind}}$ is vertex-bounded, it follows that $s \ge \alpha_G / 2$, when the matching number satisfies $\alpha_G \ge 2$. This shows that the bound given in Eq. (6) is no larger than the bound in Eq. (5). If the set of networked examples $\{X_i\}_{i=1}^n$ consists of i.i.d. random variables, then $s = n/2$ provided $n \ge 2$. We remark that the bounds given in Theorem 4 have the advantage that the quantity $s$ can be computed efficiently, in polynomial time.
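In practice the linear program (7)–(9) would be handed to an LP solver. As a solver-free illustration, the sketch below instead certifies the optimal value on a small graph by LP duality, using exact rational arithmetic. The assumptions are the following: the edge list is read off from the caption of Fig. 1, the candidate weights `w_opt` and the dual certificate `y_cert` were found by hand for this graph, and the helper names are hypothetical. The dual assigns $y_v \ge 0$ to each vertex so that every pair in $E_0$ touches vertices of total dual weight at least 1; a vertex-bounded $w$ and a dual-feasible $y$ with equal totals are both optimal.

```python
from fractions import Fraction
from itertools import combinations

# Edge list read off from the caption of Fig. 1 (assumption), 0-indexed below.
EDGES = [(1, 5), (1, 6), (2, 6), (2, 7), (3, 7), (4, 7)]
VERTICES = sorted({v for e in EDGES for v in e})
E0 = [
    (i, j)
    for i, j in combinations(range(len(EDGES)), 2)
    if not set(EDGES[i]) & set(EDGES[j])
]


def touches(pair, v):
    """True if vertex v lies on e_i or e_j for pair = (i, j)."""
    i, j = pair
    return v in EDGES[i] or v in EDGES[j]


def is_vertex_bounded(w):
    """Constraints (8) and (9) of the linear program."""
    return all(x >= 0 for x in w.values()) and all(
        sum(w[p] for p in E0 if touches(p, v)) <= 1 for v in VERTICES
    )


def is_dual_feasible(y):
    """Dual constraints: y >= 0 and every pair is covered with total weight >= 1."""
    return all(x >= 0 for x in y.values()) and all(
        sum(y[v] for v in VERTICES if touches(p, v)) >= 1 for p in E0
    )


half = Fraction(1, 2)
# Hand-picked candidate: weight 1/2 on ({1,5},{2,6}), ({1,5},{3,7}), ({2,6},{4,7}).
w_opt = {p: Fraction(0) for p in E0}
for p in [(0, 2), (0, 4), (2, 5)]:
    w_opt[p] = half
# Hand-picked dual certificate: weight 1/2 on vertices 1, 6 and 7.
y_cert = {v: (half if v in (1, 6, 7) else Fraction(0)) for v in VERTICES}

primal_value = sum(w_opt.values())
dual_value = sum(y_cert.values())

# With all weights equal, the common weight is limited by the busiest vertex.
busiest = max(sum(touches(p, v) for p in E0) for v in VERTICES)
equal_value = Fraction(len(E0), busiest)
```

Both certificates have total 3/2, so the optimal objective value for this graph is 3/2, whereas forcing all nine weights to be equal yields only 9/8.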

Note that the bounds in Theorem 4 depend on $\|w\|$ but they do not depend on the function $f$. Note also that the concentration inequality of Theorem 4 is an analogue of the well-known Hoeffding's inequality (see (Hoeffding, 1963)). In fact, using similar ideas, one can also show analogues of other well-known exponential inequalities, e.g., Chernoff's or Bernstein's.

Now suppose that we use equal weights, i.e., we consider the following U-statistic:

$$U_{\mathrm{eqw}} = \frac{1}{|E_0|} \sum_{(i,j) \in E_0} f(X_i, X_j).$$

Then we should replace the last constraint (9) with a constraint of the form:

$$w_{i,j} = t \ge 0, \quad \text{for all } (i, j) \in E_0. \tag{10}$$

Since we add more constraints to the LP, it follows that the optimal objective value of the new linear program cannot exceed the $s$-value. This implies that the corresponding bounds on $U_{\mathrm{eqw}}$ cannot be smaller than those of an $s$-weighted U-statistic. The following example shows that the difference between the optimal objective values may be large:

Example 2 Consider the graph in Fig. 1. If we give the same weight to all pairs of disjoint edges, then $\sum w_{i,j} = \frac{9}{8}$. If we use an $s$-weight function, then $\sum w_{i,j} = \frac{3}{2} > \frac{9}{8}$.

The idea of using linear programs in order to obtain concentration bounds on sums of dependent random variables appears already in a paper by Svante Janson (Janson, 2004). However, Janson's bound involves the optimal objective value of a linear program that is known to be computationally hard. In (Wang et al., 2014) one can find concentration bounds on sums of network-structured random variables that improve Janson's bound and involve the optimal objective value of linear programs that can be computed efficiently.

5. Minimum variance: a convex programming method

From the variance point of view, the $s$-weight may not be the optimal solution. In this section we formulate a convex program which we use to minimize the worst-case variance of a U-statistic on a set of networked variables. To simplify our discussion, we only consider symmetric kernels and will provide remarks for the case of general kernels.
Given a bipartite graph, and using the version of Hoeffding's decomposition that has been described above, we see that the variance of $U(w)$ depends on the values of $\sigma_{(S,T)}$, one for each pair $(S, T)$; here $\sigma_{(\emptyset,\emptyset)}$ has no effect and the total variance $\sigma^2$ is fixed. Since we assume that the kernel is symmetric, two symmetric variance-components, e.g. $\sigma_{(\{1\},\emptyset)}$ and $\sigma_{(\emptyset,\{1\})}$, should be the same. In practice, we usually do not know the values of $\sigma_{(S,T)}$. Nevertheless, for every weight function $w$ one can find a tight upper bound for $\mathrm{var}(U(w))$ by maximizing $w^{\top} \Sigma w$ as a function of the variance components $\{\sigma_{(S,T)}\}_{S,T \subseteq \{1,2\}}$, where $\Sigma$ is a covariance matrix defined by Eq. (4) (its row index is $(i, j)$ and column index is $(m, l)$). We can see that when the structure, i.e. $G$, of networked random variables is given, the covariance matrix is determined by the values of $\sigma_{(S,T)}$. We will call a covariance matrix, $\Sigma$, for which $w^{\top} \Sigma w$ is maximum a worst-case covariance matrix and the corresponding variance $\mathrm{var}(U(w))$ the worst-case variance. A natural problem is to find the weight function, $w$, for which the worst-case variance is minimal. We do this by formulating a convex program. We begin with some lemmas that allow us to simplify this convex program.

Lemma 5 For any fixed weight $w$, there exists some $\{\sigma_{(S,T)}\}_{S,T \subseteq \{1,2\}}$ which results in a worst-case covariance matrix (and equivalently worst-case variance) such that for all $S, T \subseteq \{1, 2\}$ for which $|T| + |S| \ge 2$ we have $\sigma_{(S,T)} = 0$.

The previous lemma holds true for non-symmetric kernels as well and should be compared with Lemma 16 in (Wang et al., 2014); its proof is also similar to the proof of Lemma 16. This result implies that we only need to consider worst-case covariance matrices for which all variance-components are zero except $\{\sigma_{(\{q\},\emptyset)}\}_{q \in \{1,2\}}$ and $\{\sigma_{(\emptyset,\{q\})}\}_{q \in \{1,2\}}$. Note that in case the kernel, $f$, is symmetric then we have $\sigma_{(\{q\},\emptyset)} = \sigma_{(\emptyset,\{q\})}$ for every $q \in \{1, 2\}$. We can show one more lemma which further simplifies our problem.

Lemma 6 Suppose that the weight function is fixed. If the kernel $f$ is symmetric, then the worst-case variance is attained when $\sigma^2_{(\{q\},\emptyset)} = \sigma^2_{(\emptyset,\{q\})} = \frac{\sigma^2}{2}$ for some $q \in \{1, 2\}$.

For a general kernel $f$, the worst-case variance-components can be obtained by the method of Lagrange multipliers. Lemma 6 should also be compared with the remarks after Lemma 16 in (Wang et al., 2014). Consequently, we can formulate the following optimization problem:

$$\begin{aligned}
\min_{w;\,t} \quad & t \\
\text{s.t.} \quad & \forall q \in \{1, 2\} : \sum_{(i,j) \in E_0,\, (m,l) \in E_0} w_{i,j}\, w_{m,l}\, I_4 \le t, \\
& \sum_{(i,j) \in E_0} w_{i,j} = 1, \\
& \forall (i, j) \in E_0 : w_{i,j} \ge 0,
\end{aligned}$$

where $I_4 = I\{q \in J_{i,m}\} + I\{q \in J_{i,l}\} + I\{q \in J_{j,m}\} + I\{q \in J_{j,l}\}$ and $I\{\cdot\}$ denotes the indicator function. This convex program is an analogue of program (7) in (Wang et al., 2014). Solving this convex quadratically constrained linear program, we can get weights which minimize the worst-case variance. Note that these weights may not be unique, but they form a convex region. By construction, these weights correspond to U-statistics whose worst-case variance is no larger than that of the U-statistics corresponding to the $s$-weight. If the variables $\{X_i\}_{i=1}^n$ are i.i.d. then optimal solutions of the above optimization problem satisfy $t = s = n/2$, provided $n \ge 2$.

6. Conclusion

We considered the problem of how to analyze the quality of U-statistics on network data and how to design good estimators using weights.
The analysis of variance based on Hoeffding's decomposition was generalized. We obtained a Hoeffding-type concentration bound for weighted U-statistics and, in order to minimize the bound, we used a linear program which can be solved efficiently. We also considered the worst-case variance, whose minimization results in a convex quadratically constrained linear program. Though we only considered bipartite graphs and kernels of degree 2 in this paper, the results are valid for general k-partite hypergraphs and kernels of any degree d. A possible direction for future work is to extend our results to V-statistics, a class of biased estimators that are closely related to U-statistics.

Acknowledgements

This work was supported by the European Research Council, Starting Grant MiGraNT: Mining Graphs and Networks, a Theory-based approach. We thank the anonymous referees for careful reading and suggestions that have improved the presentation.

References

Michael R. Garey and David S. Johnson. Computers and Intractability, a Guide to the Theory of NP-Completeness. W. H. Freeman & Company, 1979.

Hyung-Tae Ha, Mei Ling Huang, and De Li Li. A remark on strong law of large numbers for weighted U-statistics. Acta Math. Sin. (Engl. Ser.), 30(9), 2014.

Wassily Hoeffding. A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, 1948.

Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, pages 13–30, 1963.

Tailen Hsing and Wei Biao Wu. On weighted U-statistics for stationary processes. The Annals of Probability, 32(2), 2004.

Svante Janson. Large deviations for sums of partly dependent random variables. Random Structures & Algorithms, 24(3), 2004.

Maurice G. Kendall. A New Measure of Rank Correlation. Biometrika, 30(1/2):81–93, 1938.

Sh A Khashimov. On the Asymptotic Distribution of the Generalized U-Statistics for Dependent Variables. Theory of Probability & Its Applications, 32(2), 1988.

Sh A Khashimov. The central limit theorem for generalized U-statistics for weakly dependent vectors. Theory of Probability & Its Applications, 38(3), 1994.

Tae Yoon Kim, Zhi-Ming Luo, and Chiho Kim. The central limit theorem for degenerate variable U-statistics under dependence. Journal of Nonparametric Statistics, 23(3), 2011.

Justin Lee. U-statistics: Theory and Practice. CRC Press, 1990.

Masoud M Nasari. Strong law of large numbers for weighted U-statistics: Application to incomplete U-statistics. Statistics & Probability Letters, 82(6), 2012.

Kevin A. O'Neil and Richard A. Redner. Asymptotic Distributions of Weighted U-Statistics of Degree 2. The Annals of Probability, 21(2), 1993.

Mohamed Rifi and Frederic Utzet. On the Asymptotic Behavior of Weighted U-Statistics. Journal of Theoretical Probability, 13(1), 2000.

Yuyi Wang, Jan Ramon, and Zheng-Chu Guo. Learning from networked examples. Submitted to Journal of Machine Learning Research, 2014.

Frank Wilcoxon. Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1(6):80–83, 1945.
