Supplement to Clustering with Statistical Error Control

Michael Vogt, University of Bonn
Matthias Schmid, University of Bonn

In this supplement, we provide the proofs that are omitted in the paper. In particular, we derive Theorems 4.1–4.3 from Section 4. Throughout the supplement, we use the symbol $C$ to denote a universal real constant which may take a different value on each occurrence.

Auxiliary results

In the proofs of Theorems 4.1–4.3, we frequently make use of the following uniform convergence result.

Lemma S.1. Let $Z_s = \{Z_{st} : 1 \le t \le T\}$ be sequences of real-valued random variables for $1 \le s \le S$ with the following properties: (i) for each $s$, the random variables in $Z_s$ are independent of each other, and (ii) $E[Z_{st}] = 0$ and $E[|Z_{st}|^{\phi}] \le C < \infty$ for some $\phi > 2$ and $C > 0$ that depend neither on $s$ nor on $T$. Suppose that $S = T^q$ with $0 \le q < \phi/2 - 1$. Then
\[ P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st} \Big| > T^{\eta} \Big) = o(1), \]
where the constant $\eta > 0$ can be chosen as small as desired.

Proof of Lemma S.1. Define $\tau_{S,T} = (ST)^{1/\{(2+\delta)(q+1)\}}$ with some sufficiently small $\delta > 0$. In particular, let $\delta > 0$ be so small that $(2+\delta)(q+1) < \phi$. Moreover, set
\[ Z_{st}^{\le} = Z_{st} 1(|Z_{st}| \le \tau_{S,T}) - E\big[ Z_{st} 1(|Z_{st}| \le \tau_{S,T}) \big], \]
\[ Z_{st}^{>} = Z_{st} 1(|Z_{st}| > \tau_{S,T}) - E\big[ Z_{st} 1(|Z_{st}| > \tau_{S,T}) \big], \]
and write $Z_{st} = Z_{st}^{\le} + Z_{st}^{>}$.
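The statement of Lemma S.1 can be illustrated numerically. The following is a minimal sketch, not part of the proof: it assumes uniformly distributed $Z_{st}$ (which have mean zero, unit variance and all moments), and the values of $T$, $q$ and $\eta$ are illustrative choices rather than quantities from the paper.

```python
import math
import random


def max_normalized_sum(S, T, rng):
    """Return max over s of |T^{-1/2} * sum_t Z_st| for i.i.d. centred Z_st."""
    half_width = math.sqrt(3.0)  # Uniform(-sqrt(3), sqrt(3)) has mean 0 and variance 1
    worst = 0.0
    for _ in range(S):
        total = sum(rng.uniform(-half_width, half_width) for _ in range(T))
        worst = max(worst, abs(total) / math.sqrt(T))
    return worst


rng = random.Random(0)
T = 400
q = 0.5                # number of sequences grows like S = T^q
S = round(T ** q)      # 20 sequences
eta = 0.3              # illustrative exponent (assumption, not from the paper)
stat = max_normalized_sum(S, T, rng)
threshold = T ** eta
print(f"max statistic = {stat:.3f}, T^eta = {threshold:.3f}")
```

In runs of this kind the maximum of the normalized sums stays well below $T^{\eta}$, which is the event whose probability Lemma S.1 shows to tend to one.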

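In what follows, we show that
\[ P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{>} \Big| > C T^{\eta} \Big) = o(1), \tag{S.1} \]
\[ P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} \Big| > C T^{\eta} \Big) = o(1) \tag{S.2} \]
for any fixed constant $C > 0$. Combining (S.1) and (S.2) immediately yields the statement of Lemma S.1.

We start with the proof of (S.1): It holds that
\[ P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{>} \Big| > C T^{\eta} \Big) \le Q_1^{>} + Q_2^{>}, \]
where
\begin{align*}
Q_1^{>} &:= \sum_{s=1}^S P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st} 1(|Z_{st}| > \tau_{S,T}) \Big| > C T^{\eta} \Big) \\
&\le \sum_{s=1}^S P\big( |Z_{st}| > \tau_{S,T} \text{ for some } 1 \le t \le T \big) \\
&\le \sum_{s=1}^S \sum_{t=1}^T P\big( |Z_{st}| > \tau_{S,T} \big)
\le \sum_{s=1}^S \sum_{t=1}^T E\Big[ \frac{|Z_{st}|^{\phi}}{\tau_{S,T}^{\phi}} \Big]
\le \frac{C S T}{\tau_{S,T}^{\phi}} = o(1)
\end{align*}
and
\[ Q_2^{>} := \sum_{s=1}^S P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T E\big[ Z_{st} 1(|Z_{st}| > \tau_{S,T}) \big] \Big| > C T^{\eta} \Big) = 0 \]
for $S$ and $T$ sufficiently large, since
\[ \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T E\big[ Z_{st} 1(|Z_{st}| > \tau_{S,T}) \big] \Big| \le \frac{1}{\sqrt{T}} \sum_{t=1}^T E\Big[ \frac{|Z_{st}|^{\phi}}{\tau_{S,T}^{\phi-1}} 1(|Z_{st}| > \tau_{S,T}) \Big] \le \frac{C \sqrt{T}}{\tau_{S,T}^{\phi-1}} = o(T^{\eta}). \]
This yields (S.1).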

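We next turn to the proof of (S.2): We apply the crude bound
\[ P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} \Big| > C T^{\eta} \Big) \le \sum_{s=1}^S P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} \Big| > C T^{\eta} \Big) \]
and show that for any $1 \le s \le S$,
\[ P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} \Big| > C T^{\eta} \Big) \le C_0 T^{-\rho}, \tag{S.3} \]
where $C_0$ is a fixed constant and $\rho > 0$ can be chosen as large as desired by picking $\eta$ slightly larger than $1/2 - 1/(2+\delta)$. Since $S = O(T^q)$, this immediately implies (S.2). To prove (S.3), we make use of the following facts:

(i) For a random variable $Z$ and $\lambda > 0$, Markov's inequality says that
\[ P(\pm Z > \delta) \le \frac{E \exp(\pm \lambda Z)}{\exp(\lambda \delta)}. \]

(ii) Since $|Z_{st}^{\le}/\sqrt{T}| \le 2\tau_{S,T}/\sqrt{T}$, it holds that $\lambda_{S,T} |Z_{st}^{\le}/\sqrt{T}| \le 1/2$, where we set $\lambda_{S,T} = \sqrt{T}/(4\tau_{S,T})$. As $\exp(x) \le 1 + x + x^2$ for $|x| \le 1/2$, this implies that
\[ E\Big[ \exp\Big( \pm \lambda_{S,T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] \le 1 + \frac{\lambda_{S,T}^2}{T} E\big[ (Z_{st}^{\le})^2 \big] \le \exp\Big( \frac{\lambda_{S,T}^2}{T} E\big[ (Z_{st}^{\le})^2 \big] \Big). \]

(iii) By definition of $\lambda_{S,T}$, it holds that
\[ \lambda_{S,T} = \frac{\sqrt{T}}{4 (ST)^{1/\{(2+\delta)(q+1)\}}} = \frac{\sqrt{T}}{4 (T^{q+1})^{1/\{(2+\delta)(q+1)\}}} = \frac{T^{1/2 - 1/(2+\delta)}}{4}. \]

Using (i)–(iii) and writing $E(Z_{st}^{\le})^2 \le C_Z < \infty$, we obtain that
\begin{align*}
P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} \Big| > C T^{\eta} \Big)
&\le P\Big( \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} > C T^{\eta} \Big) + P\Big( -\frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} > C T^{\eta} \Big) \\
&\le \exp(-\lambda_{S,T} C T^{\eta}) \Big\{ E\Big[ \exp\Big( \lambda_{S,T} \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} \Big) \Big] + E\Big[ \exp\Big( -\lambda_{S,T} \frac{1}{\sqrt{T}} \sum_{t=1}^T Z_{st}^{\le} \Big) \Big] \Big\} \\
&= \exp(-\lambda_{S,T} C T^{\eta}) \Big\{ \prod_{t=1}^T E\Big[ \exp\Big( \lambda_{S,T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] + \prod_{t=1}^T E\Big[ \exp\Big( -\lambda_{S,T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] \Big\} \\
&\le 2 \exp(-\lambda_{S,T} C T^{\eta}) \prod_{t=1}^T \exp\Big( \frac{\lambda_{S,T}^2}{T} E\big[ (Z_{st}^{\le})^2 \big] \Big) \\
&\le 2 \exp\big( C_Z \lambda_{S,T}^2 - C \lambda_{S,T} T^{\eta} \big) \\
&= 2 \exp\Big( \frac{C_Z}{16} T^{1 - 2/(2+\delta)} - \frac{C}{4} T^{\eta + 1/2 - 1/(2+\delta)} \Big) \le C_0 T^{-\rho},
\end{align*}
where $\rho > 0$ can be chosen arbitrarily large if we pick $\eta$ slightly larger than $1/2 - 1/(2+\delta)$.

Proof of Theorem 4.1

We first prove that
\[ P\big( \widehat{\mathcal{H}}^{[K_0]} \le q(\alpha) \big) = (1 - \alpha) + o(1). \tag{S.4} \]
To do so, we derive a stochastic expansion of the individual statistics $\widehat{\Delta}_i^{[K_0]}$.

Lemma S.2. It holds that
\[ \widehat{\Delta}_i^{[K_0]} = \Delta_i^{[K_0]} + R_i^{[K_0]}, \]
where
\[ \Delta_i^{[K_0]} = \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa \]
and the remainder $R_i^{[K_0]}$ has the property that
\[ P\Big( \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| > p^{-\xi} \Big) = o(1) \tag{S.5} \]
for some $\xi > 0$.

The proof of Lemma S.2 as well as those of the subsequent Lemmas S.3–S.5 are postponed until the proof of Theorem 4.1 is complete. With the help of Lemma S.2, we can bound the probability of interest
\[ P_{\alpha} := P\big( \widehat{\mathcal{H}}^{[K_0]} \le q(\alpha) \big) = P\Big( \max_{1 \le i \le n} \widehat{\Delta}_i^{[K_0]} \le q(\alpha) \Big) \]
as follows: Since
\[ \max_{1 \le i \le n} \Delta_i^{[K_0]} - \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| \le \max_{1 \le i \le n} \widehat{\Delta}_i^{[K_0]} \le \max_{1 \le i \le n} \Delta_i^{[K_0]} + \max_{1 \le i \le n} \big| R_i^{[K_0]} \big|, \]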

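it holds that
\[ P_{\alpha}^{<} \le P_{\alpha} \le P_{\alpha}^{>}, \]
where
\[ P_{\alpha}^{<} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) - \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| \Big), \qquad P_{\alpha}^{>} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) + \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| \Big). \]
As the remainder $R_i^{[K_0]}$ has the property (S.5), we further obtain that
\[ P_{\alpha}^{\ll} + o(1) \le P_{\alpha} \le P_{\alpha}^{\gg} + o(1), \tag{S.6} \]
where
\[ P_{\alpha}^{\ll} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) - p^{-\xi} \Big), \qquad P_{\alpha}^{\gg} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) + p^{-\xi} \Big). \]
With the help of strong approximation theory, we can derive the following result on the asymptotic behaviour of the probabilities $P_{\alpha}^{\ll}$ and $P_{\alpha}^{\gg}$.

Lemma S.3. It holds that
\[ P_{\alpha}^{\ll} = (1 - \alpha) + o(1), \qquad P_{\alpha}^{\gg} = (1 - \alpha) + o(1). \]

Together with (S.6), this immediately yields that $P_{\alpha} = (1 - \alpha) + o(1)$, thus completing the proof of (S.4).

We next show that for any $K < K_0$,
\[ P\big( \widehat{\mathcal{H}}^{[K]} \le q(\alpha) \big) = o(1). \tag{S.7} \]
Consider a fixed $K < K_0$ and let $S \in \{ \widehat{G}_k^{[K]} : 1 \le k \le K \}$ be any cluster with the following property:
\[ \#S \ge \underline{n} := \min_{1 \le k \le K_0} \#G_k, \text{ and } S \text{ contains elements from at least two different classes } G_{k_1} \text{ and } G_{k_2}. \tag{S.8} \]
It is not difficult to see that a cluster with the property (S.8) must always exist under our conditions. By $\mathcal{C} \subseteq \{ \widehat{G}_k^{[K]} : 1 \le k \le K \}$, we denote the collection of clusters that have the property (S.8). With this notation at hand, we can derive the following stochastic expansion of the individual statistics $\widehat{\Delta}_i^{[K]}$.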

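Lemma S.4. For any $i \in S$ and $S \in \mathcal{C}$, it holds that
\[ \widehat{\Delta}_i^{[K]} = \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^p d_{ij}^2 + R_i^{[K]}, \]
where $d_{ij} = \mu_{ij} - (\#S)^{-1} \sum_{i' \in S} \mu_{i'j}$ and the remainder $R_i^{[K]}$ has the property that
\[ P\Big( \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_i^{[K]} \big| > p^{1/2 - \xi} \Big) = o(1) \tag{S.9} \]
for some small $\xi > 0$.

Using (S.9) and the fact that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \widehat{\Delta}_i^{[K]} \ge \max_{S \in \mathcal{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^p d_{ij}^2 \Big\} - \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_i^{[K]} \big|, \]
we obtain that
\begin{align*}
P\big( \widehat{\mathcal{H}}^{[K]} \le q(\alpha) \big)
&= P\Big( \max_{1 \le i \le n} \widehat{\Delta}_i^{[K]} \le q(\alpha) \Big)
\le P\Big( \max_{S \in \mathcal{C}} \max_{i \in S} \widehat{\Delta}_i^{[K]} \le q(\alpha) \Big) \\
&\le P\Big( \max_{S \in \mathcal{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^p d_{ij}^2 \Big\} \le q(\alpha) + \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_i^{[K]} \big| \Big) \\
&\le P\Big( \max_{S \in \mathcal{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^p d_{ij}^2 \Big\} \le q(\alpha) + p^{1/2 - \xi} \Big) + o(1). \tag{S.10}
\end{align*}
The arguments from the proof of Lemma S.3, in particular (S.22), imply that $q(\alpha) \le C \sqrt{\log n}$ for some fixed constant $C > 0$ and sufficiently large $n$. Moreover, we can prove the following result.

Lemma S.5. It holds that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \Big\{ \frac{1}{\sqrt{p}} \sum_{j=1}^p d_{ij}^2 \Big\} \ge c \sqrt{p} \]
for some fixed constant $c > 0$.

Since $q(\alpha) \le C \sqrt{\log n}$ and $\sqrt{\log n}/\sqrt{p} = o(1)$ by (C3), Lemma S.5 allows us to infer that
\[ P\Big( \max_{S \in \mathcal{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^p d_{ij}^2 \Big\} \le q(\alpha) + p^{1/2 - \xi} \Big) = o(1). \]
Together with (S.10), this yields that $P( \widehat{\mathcal{H}}^{[K]} \le q(\alpha) ) = o(1)$.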
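The critical value $q(\alpha)$ that appears throughout the proof of Theorem 4.1 is described in the proof of Lemma S.3 as the $(1-\alpha)$-quantile of $\max_{1 \le i \le n} \Delta_i^{[K_0]}$ under Gaussian errors. The following is a minimal simulation sketch of that quantile. It assumes $\kappa = \sqrt{2}$, the standard deviation of $\varepsilon_{ij}^2/\sigma^2$ for Gaussian $\varepsilon_{ij}$, so that each statistic is a centred and scaled chi-square variable; the function names, sample sizes and number of replications are hypothetical choices for illustration.

```python
import math
import random


def simulate_max_statistic(n, p, rng):
    """Draw max_i (chi2_p - p) / sqrt(2 p): the maximum of n Gaussian-error statistics."""
    best = -math.inf
    for _ in range(n):
        chi2 = rng.gammavariate(p / 2.0, 2.0)  # chi-square with p degrees of freedom
        best = max(best, (chi2 - p) / math.sqrt(2.0 * p))
    return best


def critical_value(alpha, n, p, reps=2000, seed=0):
    """Monte Carlo approximation of the (1 - alpha)-quantile of the maximum statistic."""
    rng = random.Random(seed)
    draws = sorted(simulate_max_statistic(n, p, rng) for _ in range(reps))
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return draws[index]


q_alpha = critical_value(alpha=0.05, n=50, p=200)
print(f"simulated q(0.05) for n=50, p=200: {q_alpha:.3f}")
```

The simulated value grows slowly with $n$, in line with the bound $q(\alpha) \le C\sqrt{\log n}$ used above.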

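Proof of Lemma S.2. Let $n_k = \#G_k$ and write $\bar{\varepsilon}_i = p^{-1} \sum_{j=1}^p \varepsilon_{ij}$ along with $\bar{\mu}_i = p^{-1} \sum_{j=1}^p \mu_{ij}$. Since
\[ P\Big( \big\{ \widehat{G}_k^{[K_0]} : 1 \le k \le K_0 \big\} = \big\{ G_k : 1 \le k \le K_0 \big\} \Big) \to 1 \]
by (3.1), we can ignore the estimation error in the clusters $\widehat{G}_k^{[K_0]}$ and replace them by the true classes $G_k$. For $i \in G_k$, we thus get
\[ \widehat{\Delta}_i^{[K_0]} = \Delta_i^{[K_0]} + R_{i,A}^{[K_0]} + R_{i,B}^{[K_0]} - R_{i,C}^{[K_0]} + R_{i,D}^{[K_0]}, \]
where
\begin{align*}
R_{i,A}^{[K_0]} &= \Big( \frac{\kappa}{\widehat{\kappa}} - 1 \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa, \\
R_{i,B}^{[K_0]} &= \frac{1}{\widehat{\kappa}} \Big( \frac{1}{\widehat{\sigma}^2} - \frac{1}{\sigma^2} \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij}^2, \\
R_{i,C}^{[K_0]} &= \frac{2}{\widehat{\kappa} \widehat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \Big\{ \bar{\varepsilon}_i + \frac{1}{n_k} \sum_{i' \in G_k} ( \varepsilon_{i'j} - \bar{\varepsilon}_{i'} ) \Big\}, \\
R_{i,D}^{[K_0]} &= \frac{1}{\widehat{\kappa} \widehat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \bar{\varepsilon}_i + \frac{1}{n_k} \sum_{i' \in G_k} ( \varepsilon_{i'j} - \bar{\varepsilon}_{i'} ) \Big\}^2.
\end{align*}
We now show that $\max_{i \in G_k} | R_{i,\ell}^{[K_0]} | = o_p(p^{-\xi})$ for any $k$ and $\ell = A, \ldots, D$. This implies that $\max_{1 \le i \le n} | R_{i,\ell}^{[K_0]} | = \max_{1 \le k \le K_0} \max_{i \in G_k} | R_{i,\ell}^{[K_0]} | = o_p(p^{-\xi})$ for $\ell = A, \ldots, D$, which in turn yields the statement of Lemma S.2. Throughout the proof, we use the symbol $\eta > 0$ to denote a sufficiently small constant which results from applying Lemma S.1.

By assumption, $\widehat{\sigma}^2 = \sigma^2 + O_p(p^{-(1/2+\delta)})$ and $\widehat{\kappa} = \kappa + O_p(p^{-\delta})$ for some $\delta > 0$. Applying Lemma S.1 and choosing $\xi > 0$ such that $\xi < \delta - \eta$, we obtain that
\[ \max_{i \in G_k} \big| R_{i,A}^{[K_0]} \big| \le \Big| \frac{\kappa}{\widehat{\kappa}} - 1 \Big| \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa \Big| = \Big| \frac{\kappa}{\widehat{\kappa}} - 1 \Big| O_p(p^{\eta}) = O_p(p^{-(\delta - \eta)}) = o_p(p^{-\xi}) \]
and
\[ \max_{i \in G_k} \big| R_{i,B}^{[K_0]} \big| \le \frac{1}{\widehat{\kappa}} \Big| \frac{1}{\widehat{\sigma}^2} - \frac{1}{\sigma^2} \Big| \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \{ \varepsilon_{ij}^2 - \sigma^2 \} + \sigma^2 \sqrt{p} \Big| \le \frac{1}{\widehat{\kappa}} \Big| \frac{\sigma^2 - \widehat{\sigma}^2}{\widehat{\sigma}^2 \sigma^2} \Big| \big\{ O_p(p^{\eta}) + \sigma^2 \sqrt{p} \big\} = o_p(p^{-\xi}). \]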

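We next show that
\[ \max_{i \in G_k} \big| R_{i,C}^{[K_0]} \big| = o_p(p^{-1/4}). \tag{S.11} \]
To do so, we work with the decomposition $R_{i,C}^{[K_0]} = \{ 2/(\widehat{\kappa} \widehat{\sigma}^2) \} \{ R_{i,C,1}^{[K_0]} + R_{i,C,2}^{[K_0]} - R_{i,C,3}^{[K_0]} \}$, where
\begin{align*}
R_{i,C,1}^{[K_0]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \bar{\varepsilon}_i, \\
R_{i,C,2}^{[K_0]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \, \frac{1}{n_k} \sum_{i' \in G_k} \varepsilon_{i'j}, \\
R_{i,C,3}^{[K_0]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \, \frac{1}{n_k} \sum_{i' \in G_k} \bar{\varepsilon}_{i'}.
\end{align*}
With the help of Lemma S.1, we obtain that
\[ \max_{i \in G_k} \big| R_{i,C,1}^{[K_0]} \big| \le \frac{1}{\sqrt{p}} \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \Big|^2 = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big). \tag{S.12} \]
Moreover, under our conditions on the relative growth of $p$ and $n_k$,
\[ \max_{i \in G_k} \big| R_{i,C,2}^{[K_0]} \big| = O_p(n_k^{-1/4}), \tag{S.13} \]
since
\[ R_{i,C,2}^{[K_0]} = \frac{1}{n_k} \frac{1}{\sqrt{p}} \sum_{j=1}^p \{ \varepsilon_{ij}^2 - \sigma^2 \} + \frac{\sigma^2 \sqrt{p}}{n_k} + \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \]
and
\[ \max_{i \in G_k} \Big| \frac{1}{n_k} \frac{1}{\sqrt{p}} \sum_{j=1}^p \{ \varepsilon_{ij}^2 - \sigma^2 \} \Big| = O_p\Big( \frac{p^{\eta}}{n_k} \Big), \tag{S.14} \]
\[ \max_{i \in G_k} \Big| \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \Big| = O_p(n_k^{-1/4}). \tag{S.15} \]
(S.14) is an immediate consequence of Lemma S.1. (S.15) follows upon observing that for any constant $C_0 > 0$,
\begin{align*}
& P\Big( \max_{i \in G_k} \Big| \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \Big| > \frac{C_0}{n_k^{1/4}} \Big) \\
&\quad \le \sum_{i \in G_k} P\Big( \Big| \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \Big| > \frac{C_0}{n_k^{1/4}} \Big) \\
&\quad \le \sum_{i \in G_k} E\Big[ \Big\{ \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \Big\}^4 \Big] \Big/ \Big\{ \frac{C_0}{n_k^{1/4}} \Big\}^4 \\
&\quad = \sum_{i \in G_k} \Big\{ \frac{1}{n_k^4 p^2} \sum_{\substack{i_1, \ldots, i_4 \in G_k \\ i_1, \ldots, i_4 \ne i}} \sum_{j_1, \ldots, j_4 = 1}^p E\big[ \varepsilon_{i j_1} \cdots \varepsilon_{i j_4} \varepsilon_{i_1 j_1} \cdots \varepsilon_{i_4 j_4} \big] \Big\} \Big/ \Big\{ \frac{C_0}{n_k^{1/4}} \Big\}^4
\le \frac{C}{C_0^4},
\end{align*}
the last inequality resulting from the fact that the mean $E[\varepsilon_{i j_1} \cdots \varepsilon_{i j_4} \varepsilon_{i_1 j_1} \cdots \varepsilon_{i_4 j_4}]$ can only be non-zero if some of the index pairs $(i_\ell, j_\ell)$ for $\ell = 1, \ldots, 4$ are identical. Finally, with the help of Lemma S.1, we get that
\[ \max_{i \in G_k} \big| R_{i,C,3}^{[K_0]} \big| \le \Big| \frac{1}{n_k} \sum_{i' \in G_k} \bar{\varepsilon}_{i'} \Big| \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \Big| = O_p\Big( \frac{p^{\eta}}{\sqrt{n_k p}} \Big). \tag{S.16} \]
Combining (S.12), (S.13) and (S.16), we arrive at the statement (S.11) on the remainder $R_{i,C}^{[K_0]}$.

We finally show that
\[ \max_{i \in G_k} \big| R_{i,D}^{[K_0]} \big| = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big). \tag{S.17} \]
For the proof, we write $R_{i,D}^{[K_0]} = \{ 1/(\widehat{\kappa} \widehat{\sigma}^2) \} \{ R_{i,D,1}^{[K_0]} + R_{i,D,2}^{[K_0]} \}$, where
\[ R_{i,D,1}^{[K_0]} = \frac{1}{\sqrt{p}} \Big( \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \Big)^2, \qquad R_{i,D,2}^{[K_0]} = \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \frac{1}{n_k} \sum_{i' \in G_k} ( \varepsilon_{i'j} - \bar{\varepsilon}_{i'} ) \Big\}^2. \]
With the help of Lemma S.1, we obtain that
\[ \max_{i \in G_k} \big| R_{i,D,1}^{[K_0]} \big| = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big). \tag{S.18} \]
Moreover, straightforward calculations yield that
\[ \max_{i \in G_k} \big| R_{i,D,2}^{[K_0]} \big| = O_p\Big( \frac{\sqrt{p}}{n_k} \Big). \tag{S.19} \]
(S.17) now follows upon combining (S.18) and (S.19).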

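Proof of Lemma S.3. We make use of the following three results:

(R1) Let $\{ W_i : 1 \le i \le n \}$ be independent random variables with a standard normal distribution and define $a_n = 1/\sqrt{2 \log n}$ together with
\[ b_n = \sqrt{2 \log n} - \frac{\log \log n + \log(4\pi)}{2 \sqrt{2 \log n}}. \]
Then for any $w \in \mathbb{R}$,
\[ \lim_{n \to \infty} P\Big( \max_{1 \le i \le n} W_i \le a_n w + b_n \Big) = \exp(-\exp(-w)). \]
In particular, for $w_{\alpha}(\pm\varepsilon) = -\log(-\log(1 - \alpha \pm \varepsilon))$, we get
\[ \lim_{n \to \infty} P\Big( \max_{1 \le i \le n} W_i \le a_n w_{\alpha}(\pm\varepsilon) + b_n \Big) = 1 - \alpha \pm \varepsilon. \]

The next result is known as Khintchine's Theorem.

(R2) Let $F_n$ be distribution functions and $G$ a non-degenerate distribution function. Moreover, let $\alpha_n > 0$ and $\beta_n \in \mathbb{R}$ be such that
\[ F_n(\alpha_n x + \beta_n) \to G(x) \]
for any continuity point $x$ of $G$. Then there are constants $\alpha_n' > 0$ and $\beta_n' \in \mathbb{R}$ as well as a non-degenerate distribution function $G^*$ such that
\[ F_n(\alpha_n' x + \beta_n') \to G^*(x) \]
at any continuity point $x$ of $G^*$ if and only if
\[ \alpha_n^{-1} \alpha_n' \to \alpha^*, \qquad \alpha_n^{-1} (\beta_n' - \beta_n) \to \beta^* \qquad \text{and} \qquad G^*(x) = G(\alpha^* x + \beta^*). \]

The final result exploits strong approximation theory and is a direct consequence of the so-called KMT Theorems; see Komlós et al. (1975, 1976):

(R3) Write $\Delta_i^{[K_0]} = p^{-1/2} \sum_{j=1}^p X_{ij}$ with $X_{ij} = \{ \varepsilon_{ij}^2/\sigma^2 - 1 \}/\kappa$ and let $F$ denote the distribution function of $X_{ij}$. It is possible to construct i.i.d. random variables $\{ \widetilde{X}_{ij} : 1 \le i \le n, 1 \le j \le p \}$ with the distribution function $F$ and independent standard normal random variables $\{ Z_{ij} : 1 \le i \le n, 1 \le j \le p \}$ such that $\widetilde{\Delta}_i^{[K_0]} = p^{-1/2} \sum_{j=1}^p \widetilde{X}_{ij}$ and $\Delta_i^{*} = p^{-1/2} \sum_{j=1}^p Z_{ij}$ have the following property:
\[ P\Big( \big| \widetilde{\Delta}_i^{[K_0]} - \Delta_i^{*} \big| > C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big) \le p^{1 - \frac{\theta/2}{2+\delta}} \]
for some arbitrarily small but fixed $\delta > 0$ and some constant $C > 0$ that does not depend on $i$, $p$ and $n$.

We now proceed as follows:

(i) We show that for any $w \in \mathbb{R}$,
\[ P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le a_n w + b_n \Big) \to \exp(-\exp(-w)). \tag{S.20} \]
This in particular implies that
\[ P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha \pm \varepsilon) \Big) \to 1 - \alpha \pm \varepsilon, \tag{S.21} \]
where $w_n(\alpha \pm \varepsilon) = a_n w_{\alpha}(\pm\varepsilon) + b_n$ with $a_n$, $b_n$ and $w_{\alpha}(\pm\varepsilon)$ as defined in (R1). The proof of (S.20) is postponed until the arguments for Lemma S.3 are complete.

(ii) The statement (S.21) in particular holds in the special case that $\varepsilon_{ij} \sim N(0, \sigma^2)$. In this case, $q(\alpha)$ is the $(1-\alpha)$-quantile of $\max_{1 \le i \le n} \Delta_i^{[K_0]}$. Hence, we have
\begin{align*}
P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha - \varepsilon) \Big) &\to 1 - \alpha - \varepsilon, \\
P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) \Big) &= 1 - \alpha, \\
P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha + \varepsilon) \Big) &\to 1 - \alpha + \varepsilon,
\end{align*}
which implies that
\[ w_n(\alpha - \varepsilon) \le q(\alpha) \le w_n(\alpha + \varepsilon) \tag{S.22} \]
for sufficiently large $n$.

(iii) Since $p^{-\xi}/a_n = p^{-\xi} \sqrt{2 \log n} = o(1)$ by (C3), we can use (S.20) together with (R2) to obtain that
\[ P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha \pm \varepsilon) \pm p^{-\xi} \Big) \to 1 - \alpha \pm \varepsilon. \tag{S.23} \]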
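The Gumbel limit in (R1) can be checked numerically, because the distribution of the maximum of $n$ independent standard normal variables is available in closed form as $\Phi(a_n w + b_n)^n$. The sketch below is only an illustration of (R1); the function names and the choice $n = 10^6$ are assumptions made for the example. Convergence is known to be slow, so the agreement is only approximate.

```python
import math


def gumbel_constants(n):
    """Normalising constants a_n and b_n as defined in (R1)."""
    root = math.sqrt(2.0 * math.log(n))
    a_n = 1.0 / root
    b_n = root - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * root)
    return a_n, b_n


def prob_max_below(n, w):
    """Exact P(max of n i.i.d. N(0,1) <= a_n * w + b_n) = Phi(a_n * w + b_n)^n."""
    a_n, b_n = gumbel_constants(n)
    x = a_n * w + b_n
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - Phi(x), stable in the tail
    return math.exp(n * math.log1p(-upper_tail))


def gumbel_cdf(w):
    """Limit distribution exp(-exp(-w))."""
    return math.exp(-math.exp(-w))


for w in (-1.0, 0.0, 1.0):
    print(w, round(prob_max_below(10**6, w), 4), round(gumbel_cdf(w), 4))
```

Using `erfc` and `log1p` avoids the loss of precision that would occur when raising a number very close to one to the power $n$.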

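(iv) As $w_n(\alpha - \varepsilon) - p^{-\xi} \le q(\alpha) - p^{-\xi} \le q(\alpha) + p^{-\xi} \le w_n(\alpha + \varepsilon) + p^{-\xi}$ for sufficiently large $n$, it holds that
\begin{align*}
P_{\alpha,\varepsilon}^{\ll} := P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha - \varepsilon) - p^{-\xi} \Big)
&\le P_{\alpha}^{\ll} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) - p^{-\xi} \Big) \\
&\le P_{\alpha}^{\gg} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) + p^{-\xi} \Big) \\
&\le P_{\alpha,\varepsilon}^{\gg} := P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha + \varepsilon) + p^{-\xi} \Big)
\end{align*}
for large $n$. Moreover, since $P_{\alpha,\varepsilon}^{\ll} \to 1 - \alpha - \varepsilon$ and $P_{\alpha,\varepsilon}^{\gg} \to 1 - \alpha + \varepsilon$ for any fixed $\varepsilon > 0$ by (S.23), we can conclude that $P_{\alpha}^{\ll} = (1 - \alpha) + o(1)$ and $P_{\alpha}^{\gg} = (1 - \alpha) + o(1)$, which is the statement of Lemma S.3.

It remains to prove (S.20): Using the notation from (R3) and the shorthand $w_n = a_n w + b_n$, we can write
\[ P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n \Big) = P\Big( \max_{1 \le i \le n} \widetilde{\Delta}_i^{[K_0]} \le w_n \Big) = \prod_{i=1}^n P\big( \widetilde{\Delta}_i^{[K_0]} \le w_n \big) = \prod_{i=1}^n \pi_i \tag{S.24} \]
with $\pi_i = P( \widetilde{\Delta}_i^{[K_0]} \le w_n )$. The probabilities $\pi_i$ can be decomposed into two parts as follows:
\[ \pi_i = P\big( \Delta_i^{*} \le w_n + \{ \Delta_i^{*} - \widetilde{\Delta}_i^{[K_0]} \} \big) = \pi_i^{\le} + \pi_i^{>}, \]
where
\begin{align*}
\pi_i^{\le} &= P\Big( \Delta_i^{*} \le w_n + \{ \Delta_i^{*} - \widetilde{\Delta}_i^{[K_0]} \}, \ \big| \Delta_i^{*} - \widetilde{\Delta}_i^{[K_0]} \big| \le C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big), \\
\pi_i^{>} &= P\Big( \Delta_i^{*} \le w_n + \{ \Delta_i^{*} - \widetilde{\Delta}_i^{[K_0]} \}, \ \big| \Delta_i^{*} - \widetilde{\Delta}_i^{[K_0]} \big| > C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big).
\end{align*}
With the help of (R3) and the assumption that $n \ll p^{(\theta/4) - 1}$, we can show that
\[ \prod_{i=1}^n \pi_i = \prod_{i=1}^n \pi_i^{\le} + R_n, \tag{S.25} \]
where $R_n$ is a non-negative remainder term with
\[ R_n \le \sum_{i=1}^n P\Big( \big| \widetilde{\Delta}_i^{[K_0]} - \Delta_i^{*} \big| > C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big) \le n \, p^{1 - \frac{\theta/2}{2+\delta}} = o(1). \]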

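Moreover, the probabilities $\pi_i^{\le}$ can be bounded by
\begin{align*}
\pi_i^{\le} &\le P\Big( \Delta_i^{*} \le w_n + C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big), \\
\pi_i^{\le} &\ge P\Big( \Delta_i^{*} \le w_n - C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big) - p^{1 - \frac{\theta/2}{2+\delta}},
\end{align*}
the second line making use of (R3). From this, we obtain that
\[ \underline{\Pi}_n + o(1) \le \prod_{i=1}^n \pi_i^{\le} \le \overline{\Pi}_n, \tag{S.26} \]
where
\[ \overline{\Pi}_n = \prod_{i=1}^n P\Big( \Delta_i^{*} \le w_n + C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big), \qquad \underline{\Pi}_n = \prod_{i=1}^n P\Big( \Delta_i^{*} \le w_n - C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big). \]
By combining (S.24)–(S.26), we arrive at the intermediate result that
\[ \underline{\Pi}_n + o(1) \le P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n \Big) \le \overline{\Pi}_n + o(1). \tag{S.27} \]
Since $p^{\frac{1}{2+\delta} - \frac{1}{2}}/a_n = p^{\frac{1}{2+\delta} - \frac{1}{2}} \sqrt{2 \log n} = o(1)$, we can use (R1) together with (R2) to show that
\[ \overline{\Pi}_n \to \exp(-\exp(-w)) \qquad \text{and} \qquad \underline{\Pi}_n \to \exp(-\exp(-w)). \tag{S.28} \]
Plugging (S.28) into (S.27) immediately yields that
\[ P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n \Big) \to \exp(-\exp(-w)), \]
which completes the proof.

Proof of Lemma S.4. We use the notation $n_S = \#S$ along with $\bar{\varepsilon}_i = p^{-1} \sum_{j=1}^p \varepsilon_{ij}$, $\bar{\mu}_i = p^{-1} \sum_{j=1}^p \mu_{ij}$ and $\bar{d}_i = p^{-1} \sum_{j=1}^p d_{ij}$. For any $i \in S$ and $S \in \mathcal{C}$, we can write
\[ \widehat{\Delta}_i^{[K]} = \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^p d_{ij}^2 + R_{i,A}^{[K]} + R_{i,B}^{[K]} + R_{i,C}^{[K]} + R_{i,D}^{[K]} - R_{i,E}^{[K]} + R_{i,F}^{[K]} + R_{i,G}^{[K]}, \]
where
\begin{align*}
R_{i,A}^{[K]} &= \Big( \frac{1}{\widehat{\kappa} \widehat{\sigma}^2} - \frac{1}{\kappa \sigma^2} \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^p d_{ij}^2, \\
R_{i,B}^{[K]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa, \\
R_{i,C}^{[K]} &= \Big( \frac{\kappa}{\widehat{\kappa}} - 1 \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa, \\
R_{i,D}^{[K]} &= \frac{1}{\widehat{\kappa}} \Big( \frac{1}{\widehat{\sigma}^2} - \frac{1}{\sigma^2} \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij}^2, \\
R_{i,E}^{[K]} &= \frac{2}{\widehat{\kappa} \widehat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \Big\{ \bar{\varepsilon}_i + \frac{1}{n_S} \sum_{i' \in S} ( \varepsilon_{i'j} - \bar{\varepsilon}_{i'} ) \Big\}, \\
R_{i,F}^{[K]} &= \frac{1}{\widehat{\kappa} \widehat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \bar{\varepsilon}_i + \frac{1}{n_S} \sum_{i' \in S} ( \varepsilon_{i'j} - \bar{\varepsilon}_{i'} ) \Big\}^2, \\
R_{i,G}^{[K]} &= \frac{2}{\widehat{\kappa} \widehat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^p \Big\{ \varepsilon_{ij} - \bar{\varepsilon}_i - \frac{1}{n_S} \sum_{i' \in S} ( \varepsilon_{i'j} - \bar{\varepsilon}_{i'} ) \Big\} d_{ij}.
\end{align*}
We now show that $\max_{S \in \mathcal{C}} \max_{i \in S} | R_{i,\ell}^{[K]} | = o_p(p^{1/2 - \xi})$ for $\ell = A, \ldots, G$. This immediately yields the statement of Lemma S.4. Throughout the proof, $\eta > 0$ denotes a sufficiently small constant that results from applying Lemma S.1.

With the help of Lemma S.1 and our assumptions on $\widehat{\sigma}^2$ and $\widehat{\kappa}$, it is straightforward to see that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_{i,\ell}^{[K]} \big| \le \max_{1 \le i \le n} \big| R_{i,\ell}^{[K]} \big| = o_p(p^{1/2 - \xi}) \]
for $\ell = A, B, C, D$ with some sufficiently small $\xi > 0$. We next show that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_{i,E}^{[K]} \big| = O_p(p^{\eta}). \tag{S.29} \]
To do so, we write $R_{i,E}^{[K]} = \{ 2/(\widehat{\kappa} \widehat{\sigma}^2) \} \{ R_{i,E,1}^{[K]} + R_{i,E,2}^{[K]} - R_{i,E,3}^{[K]} \}$, where
\begin{align*}
R_{i,E,1}^{[K]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \bar{\varepsilon}_i, \\
R_{i,E,2}^{[K]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \, \frac{1}{n_S} \sum_{i' \in S} \varepsilon_{i'j}, \\
R_{i,E,3}^{[K]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \, \frac{1}{n_S} \sum_{i' \in S} \bar{\varepsilon}_{i'}.
\end{align*}
Lemma S.1 yields that $\max_{S \in \mathcal{C}} \max_{i \in S} | R_{i,E,1}^{[K]} | \le \max_{1 \le i \le n} | R_{i,E,1}^{[K]} | = O_p(p^{2\eta}/\sqrt{p})$. Moreover, it holds that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_{i,E,2}^{[K]} \big| = O_p(p^{\eta}), \]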

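since
\[ R_{i,E,2}^{[K]} = \frac{1}{n_S} \frac{1}{\sqrt{p}} \sum_{j=1}^p \{ \varepsilon_{ij}^2 - \sigma^2 \} + \frac{\sigma^2 \sqrt{p}}{n_S} + \frac{1}{n_S} \sum_{i' \in S, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \]
and
\begin{align*}
\max_{S \in \mathcal{C}} \max_{i \in S} \Big| \frac{1}{n_S} \frac{1}{\sqrt{p}} \sum_{j=1}^p \{ \varepsilon_{ij}^2 - \sigma^2 \} \Big| &\le \frac{1}{\underline{n}} \max_{1 \le i \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \{ \varepsilon_{ij}^2 - \sigma^2 \} \Big| = O_p\Big( \frac{p^{\eta}}{\underline{n}} \Big), \\
\max_{S \in \mathcal{C}} \max_{i \in S} \Big| \frac{1}{n_S} \sum_{i' \in S, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \Big| &\le \max_{1 \le i < i' \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \varepsilon_{i'j} \Big| = O_p(p^{\eta}),
\end{align*}
which follows upon applying Lemma S.1. Finally,
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_{i,E,3}^{[K]} \big| \le \frac{1}{\sqrt{p}} \Big\{ \max_{1 \le i \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \varepsilon_{ij} \Big| \Big\}^2 = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big), \]
which can again be seen by applying Lemma S.1. Putting everything together, we arrive at (S.29). Similar arguments show that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_{i,F}^{[K]} \big| = O_p(p^{\eta}) \tag{S.30} \]
as well.

To analyze the term $R_{i,G}^{[K]}$, we denote the signal vector of the group $G_k$ by $m_k = (m_{1,k}, \ldots, m_{p,k})^{\top}$ and write
\[ \frac{1}{n_S} \sum_{i' \in S} \mu_{i'j} = \sum_{k=1}^{K_0} \lambda_{S,k} m_{j,k} \]
with $\lambda_{S,k} = \#(S \cap G_k)/n_S$. With this notation, we get
\[ R_{i,G}^{[K]} = \Big\{ \frac{2}{\widehat{\kappa} \widehat{\sigma}^2} \Big\} \big\{ R_{i,G,1}^{[K]} - R_{i,G,2}^{[K]} - R_{i,G,3}^{[K]} - R_{i,G,4}^{[K]} + R_{i,G,5}^{[K]} \big\}, \]
where
\begin{align*}
R_{i,G,1}^{[K]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \mu_{ij} \varepsilon_{ij}, \\
R_{i,G,2}^{[K]} &= \sum_{k=1}^{K_0} \lambda_{S,k} \frac{1}{\sqrt{p}} \sum_{j=1}^p m_{j,k} \varepsilon_{ij}, \\
R_{i,G,3}^{[K]} &= \frac{1}{\sqrt{p}} \sum_{j=1}^p \bar{\varepsilon}_i d_{ij}, \\
R_{i,G,4}^{[K]} &= \frac{1}{n_S} \sum_{i' \in S} \frac{1}{\sqrt{p}} \sum_{j=1}^p \mu_{ij} \varepsilon_{i'j}, \\
R_{i,G,5}^{[K]} &= \frac{1}{n_S} \sum_{i' \in S} \sum_{k=1}^{K_0} \lambda_{S,k} \frac{1}{\sqrt{p}} \sum_{j=1}^p m_{j,k} \varepsilon_{i'j}.
\end{align*}
With the help of Lemma S.1, it can be shown that $\max_{S \in \mathcal{C}} \max_{i \in S} | R_{i,G,\ell}^{[K]} | = O_p(p^{\eta})$ for $\ell = 1, \ldots, 5$. For example, it holds that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_{i,G,4}^{[K]} \big| \le \max_{1 \le i, i' \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^p \mu_{ij} \varepsilon_{i'j} \Big| = O_p(p^{\eta}). \]
As a result, we obtain that
\[ \max_{S \in \mathcal{C}} \max_{i \in S} \big| R_{i,G}^{[K]} \big| = O_p(p^{\eta}). \tag{S.31} \]
This completes the proof.

Proof of Lemma S.5. Let $S \in \mathcal{C}$. In particular, suppose that $S \cap G_{k_1} \ne \emptyset$ and $S \cap G_{k_2} \ne \emptyset$ for some $k_1 \ne k_2$. We show the following claim: there exists some $i \in S$ such that
\[ \frac{1}{\sqrt{p}} \sum_{j=1}^p d_{ij}^2 \ge c \sqrt{p}, \tag{S.32} \]
where $c = (\delta_0/2)^2$ with $\delta_0$ defined in assumption (C2). From this, the statement of Lemma S.5 immediately follows.

For the proof of (S.32), we denote the Euclidean distance between vectors $v = (v_1, \ldots, v_p)^{\top}$ and $w = (w_1, \ldots, w_p)^{\top}$ by $d(v, w) = ( \sum_{j=1}^p |v_j - w_j|^2 )^{1/2}$. Moreover, as in Lemma S.4, we use the notation
\[ \frac{1}{n_S} \sum_{i' \in S} \mu_{i'j} = \sum_{k=1}^{K_0} \lambda_{S,k} m_{j,k}, \]
where $n_S = \#S$, $\lambda_{S,k} = \#(S \cap G_k)/n_S$ and $m_k = (m_{1,k}, \ldots, m_{p,k})^{\top}$ is the signal vector of the class $G_k$. Take any $i \in S \cap G_{k_1}$. If
\[ d\Big( \mu_i, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) = d\Big( m_{k_1}, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) \ge \frac{\delta_0 \sqrt{p}}{2}, \]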

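the proof is finished, as (S.32) is satisfied for $i$. Next consider the case that
\[ d\Big( m_{k_1}, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) < \frac{\delta_0 \sqrt{p}}{2}. \]
By assumption (C2), it holds that $d(m_k, m_{k'}) \ge \delta_0 \sqrt{p}$ for $k \ne k'$. Hence, by the triangle inequality,
\begin{align*}
\delta_0 \sqrt{p} \le d( m_{k_1}, m_{k_2} )
&\le d\Big( m_{k_1}, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) + d\Big( \sum_{k=1}^{K_0} \lambda_{S,k} m_k, m_{k_2} \Big) \\
&< \frac{\delta_0 \sqrt{p}}{2} + d\Big( \sum_{k=1}^{K_0} \lambda_{S,k} m_k, m_{k_2} \Big),
\end{align*}
implying that
\[ d\Big( \sum_{k=1}^{K_0} \lambda_{S,k} m_k, m_{k_2} \Big) > \frac{\delta_0 \sqrt{p}}{2}. \]
This shows that the claim (S.32) is fulfilled for any $i \in S \cap G_{k_2}$.

Proof of Theorem 4.2

By Theorem 4.1,
\begin{align*}
P\big( \widehat{K}_0 > K_0 \big)
&= P\big( \widehat{\mathcal{H}}^{[K]} > q(\alpha) \text{ for all } K \le K_0 \big) \\
&= P\big( \widehat{\mathcal{H}}^{[K_0]} > q(\alpha) \big) - P\big( \widehat{\mathcal{H}}^{[K_0]} > q(\alpha), \ \widehat{\mathcal{H}}^{[K]} \le q(\alpha) \text{ for some } K < K_0 \big) \\
&= P\big( \widehat{\mathcal{H}}^{[K_0]} > q(\alpha) \big) + o(1) = \alpha + o(1)
\end{align*}
and
\[ P\big( \widehat{K}_0 < K_0 \big) = P\big( \widehat{\mathcal{H}}^{[K]} \le q(\alpha) \text{ for some } K < K_0 \big) \le \sum_{K=1}^{K_0 - 1} P\big( \widehat{\mathcal{H}}^{[K]} \le q(\alpha) \big) = o(1). \]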

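Moreover,
\begin{align*}
P\Big( \big\{ \widehat{G}_k : 1 \le k \le \widehat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\} \Big)
&= P\Big( \big\{ \widehat{G}_k : 1 \le k \le \widehat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \widehat{K}_0 = K_0 \Big) \\
&\quad + P\Big( \big\{ \widehat{G}_k : 1 \le k \le \widehat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \widehat{K}_0 \ne K_0 \Big) \\
&= \alpha + o(1),
\end{align*}
since
\begin{align*}
P\Big( \big\{ \widehat{G}_k : 1 \le k \le \widehat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \widehat{K}_0 = K_0 \Big)
&= P\Big( \big\{ \widehat{G}_k^{[K_0]} : 1 \le k \le K_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \widehat{K}_0 = K_0 \Big) \\
&\le P\Big( \big\{ \widehat{G}_k^{[K_0]} : 1 \le k \le K_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\} \Big) = o(1)
\end{align*}
by the consistency property (3.1) and
\[ P\Big( \big\{ \widehat{G}_k : 1 \le k \le \widehat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \widehat{K}_0 \ne K_0 \Big) = P\big( \widehat{K}_0 \ne K_0 \big) = \alpha + o(1). \]

Proof of Theorem 4.3

With the help of Lemma S.1, we can show that
\[ \widehat{\rho}(i, i') = 2\sigma^2 + \frac{1}{p} \sum_{j=1}^p ( \mu_{ij} - \mu_{i'j} )^2 + o_p(1) \tag{S.33} \]
uniformly over $i$ and $i'$. This together with (C2) allows us to prove the following claim:
\[ \text{With probability tending to 1, the indices } i_1, \ldots, i_K \text{ belong to } K \text{ different classes in the case that } K \le K_0 \text{ and to } K_0 \text{ different classes in the case that } K > K_0. \tag{S.34} \]
Now let $K = K_0$. With the help of (S.33) and (S.34), the starting values $C_1^{[K_0]}, \ldots, C_{K_0}^{[K_0]}$ can be shown to have the property that
\[ P\Big( \big\{ C_k^{[K_0]} : 1 \le k \le K_0 \big\} = \big\{ G_k : 1 \le k \le K_0 \big\} \Big) \to 1. \tag{S.35} \]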
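The separation expressed by (S.33) can be illustrated with a small simulation. The sketch below assumes that the distance between two subjects is the average squared difference of their observation vectors, and it ignores the random intercept and demeaning steps of the paper; the two signal vectors, the dimension and the noise level are hypothetical values chosen only for illustration.

```python
import random


def mean_squared_distance(y1, y2):
    """Average squared difference p^{-1} * sum_j (y1_j - y2_j)^2."""
    return sum((a - b) ** 2 for a, b in zip(y1, y2)) / len(y1)


def draw_row(signal, sigma, rng):
    """One observation vector: signal plus independent N(0, sigma^2) noise."""
    return [m + rng.gauss(0.0, sigma) for m in signal]


rng = random.Random(1)
p, sigma = 2000, 1.0
m1 = [0.0] * p                                        # hypothetical signal of class 1
m2 = [1.0 if j % 2 == 0 else -1.0 for j in range(p)]  # hypothetical signal of class 2

row_a = draw_row(m1, sigma, rng)
row_b = draw_row(m1, sigma, rng)
row_c = draw_row(m2, sigma, rng)

within = mean_squared_distance(row_a, row_b)    # close to 2 * sigma^2
between = mean_squared_distance(row_a, row_c)   # close to 2 * sigma^2 + 1
print(f"within-class distance  = {within:.3f}")
print(f"between-class distance = {between:.3f}")
```

Distances within a class concentrate around $2\sigma^2$, while distances across classes are shifted upwards by the average squared signal difference, which is what allows the starting values to be separated into different classes.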

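Together with Lemma S.1, (S.35) yields that
\[ \widehat{\rho}_k(i) = \sigma^2 + \frac{1}{p} \sum_{j=1}^p ( \mu_{ij} - m_{j,k} )^2 + o_p(1) \]
uniformly over $i$ and $k$. Combined with (C2), this in turn implies that the $k$-means algorithm converges already after the first iteration step with probability tending to 1 and $\widehat{G}_k^{[K_0]}$ are consistent estimators of the classes $G_k$ in the sense of (3.1).

Proof of (3.6)

Suppose that (C1)–(C3) along with (3.5) are satisfied. As already noted in Section 3.4, the $k$-means estimators $\{ \widehat{G}_k^{A} : 1 \le k \le K \}$ can be shown to satisfy (3.4), that is,
\[ P\big( \widehat{G}_k^{A} \subseteq G_{k'} \text{ for some } 1 \le k' \le K_0 \big) \to 1 \tag{S.36} \]
for any $k = 1, \ldots, K$. This can be proven by very similar arguments as the consistency property (3.1). We thus omit the details.

Let $E^{A}$ be the event that
\[ \widehat{G}_k^{A} \subseteq G_{k'} \text{ for some } 1 \le k' \le K_0 \]
holds for all clusters $\widehat{G}_k^{A}$ with $k = 1, \ldots, K$. $E^{A}$ can be regarded as the event that the partition $\{ \widehat{G}_k^{A} : 1 \le k \le K \}$ is a refinement of the class structure $\{ G_k : 1 \le k \le K_0 \}$. By (S.36), the event $E^{A}$ occurs with probability tending to 1. Now consider the estimator
\[ \widehat{\sigma}^2_{\mathrm{RSS}} = \frac{1}{n \lfloor p/2 \rfloor} \sum_{k=1}^K \sum_{i \in \widehat{G}_k^{A}} \Big\| \widehat{Y}_i^{B} - \frac{1}{\#\widehat{G}_k^{A}} \sum_{i' \in \widehat{G}_k^{A}} \widehat{Y}_{i'}^{B} \Big\|^2. \]
Since the random variables $\widehat{Y}_i^{B}$ are independent of the estimators $\widehat{G}_k^{A}$, it is not difficult to verify the following: for any $\delta > 0$, there exists a constant $C_{\delta} > 0$ (that does not depend on $\{ \widehat{G}_k^{A} : 1 \le k \le K \}$) such that on the event $E^{A}$,
\[ P\Big( \big| \widehat{\sigma}^2_{\mathrm{RSS}} - \sigma^2 \big| \ge \frac{C_{\delta}}{p} \ \Big| \ \big\{ \widehat{G}_k^{A} : 1 \le k \le K \big\} \Big) \le \delta. \]
From this, the first statement of (3.6) easily follows. The second statement can be obtained by similar arguments.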

References

Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent RV's, and the sample DF. I. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 32 111–131.

Komlós, J., Major, P. and Tusnády, G. (1976). An approximation of partial sums of independent RV's, and the sample DF. II. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 34 33–58.