Supplement to Clustering with Statistical Error Control

Michael Vogt (University of Bonn) and Matthias Schmid (University of Bonn)

In this supplement, we provide the proofs that are omitted in the paper. In particular, we derive Theorems 4.1–4.3 from Section 4. Throughout the supplement, we use the symbol $C$ to denote a universal real constant which may take a different value on each occurrence.

Auxiliary results

In the proofs of Theorems 4.1–4.3, we frequently make use of the following uniform convergence result.

Lemma S.1. Let $Z_s = \{Z_{st} : 1 \le t \le T\}$ be sequences of real-valued random variables for $1 \le s \le S$ with the following properties: (i) for each $s$, the random variables in $Z_s$ are independent of each other, and (ii) $E[Z_{st}] = 0$ and $E[|Z_{st}|^{\phi}] \le C < \infty$ for some $\phi > 2$ and $C > 0$ that depend neither on $s$ nor on $t$. Suppose that $S = T^q$ with $0 \le q < \phi/2 - 1$. Then
$$P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st} \Big| > T^{\eta} \Big) = o(1),$$
where the constant $\eta > 0$ can be chosen as small as desired.

Proof of Lemma S.1. Define $\tau_{S,T} = (ST)^{1/\{(2+\delta)(q+1)\}}$ with some sufficiently small $\delta > 0$. In particular, let $\delta > 0$ be so small that $(2+\delta)(q+1) < \phi$. Moreover, set
$$Z_{st}^{\le} = Z_{st} 1(|Z_{st}| \le \tau_{S,T}) - E\big[ Z_{st} 1(|Z_{st}| \le \tau_{S,T}) \big],$$
$$Z_{st}^{>} = Z_{st} 1(|Z_{st}| > \tau_{S,T}) - E\big[ Z_{st} 1(|Z_{st}| > \tau_{S,T}) \big]$$
and write $Z_{st} = Z_{st}^{\le} + Z_{st}^{>}$.
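The truncation step in the proof of Lemma S.1 can be illustrated numerically. The following is a minimal sketch rather than part of the original argument: the function name `truncate_and_center`, the Gaussian sample and the level `tau=1.5` are hypothetical choices, and sample means stand in for the expectations used in the proof.

```python
import random

def truncate_and_center(z, tau):
    """Split z into a centered 'small' part and a centered 'large' part.

    Assumption of this sketch: the expectations E[Z 1(|Z| <= tau)] and
    E[Z 1(|Z| > tau)] are replaced by sample means over z.
    """
    small = [x if abs(x) <= tau else 0.0 for x in z]
    large = [x if abs(x) > tau else 0.0 for x in z]
    mean_small = sum(small) / len(z)
    mean_large = sum(large) / len(z)
    z_le = [x - mean_small for x in small]  # bounded by 2 * tau in absolute value
    z_gt = [x - mean_large for x in large]  # carries the rare large values
    return z_le, z_gt

rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(1000)]
mean_raw = sum(raw) / len(raw)
z = [x - mean_raw for x in raw]  # centered sample, mimicking E[Z_st] = 0
z_le, z_gt = truncate_and_center(z, tau=1.5)
```

Because the sample is centered, the two parts add up to the original values (up to floating-point error), and the truncated part is bounded by twice the truncation level. That bound is what makes the exponential inequality in the proof applicable.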

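In what follows, we show that
$$P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{>} \Big| > C T^{\eta} \Big) = o(1), \tag{S.1}$$
$$P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{\le} \Big| > C T^{\eta} \Big) = o(1) \tag{S.2}$$
for any fixed constant $C > 0$. Combining (S.1) and (S.2) immediately yields the statement of Lemma S.1.

We start with the proof of (S.1): It holds that
$$P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{>} \Big| > C T^{\eta} \Big) \le Q_1^{>} + Q_2^{>},$$
where
$$Q_1^{>} := \sum_{s=1}^{S} P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st} 1(|Z_{st}| > \tau_{S,T}) \Big| > \frac{C}{2} T^{\eta} \Big) \le \sum_{s=1}^{S} P\big( |Z_{st}| > \tau_{S,T} \text{ for some } 1 \le t \le T \big)$$
$$\le \sum_{s=1}^{S} \sum_{t=1}^{T} P\big( |Z_{st}| > \tau_{S,T} \big) \le \sum_{s=1}^{S} \sum_{t=1}^{T} E\Big[ \frac{|Z_{st}|^{\phi}}{\tau_{S,T}^{\phi}} \Big] \le \frac{C S T}{\tau_{S,T}^{\phi}} = o(1)$$
and
$$Q_2^{>} := \sum_{s=1}^{S} P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} E\big[ Z_{st} 1(|Z_{st}| > \tau_{S,T}) \big] \Big| > \frac{C}{2} T^{\eta} \Big) = 0$$
for $S$ and $T$ sufficiently large, since
$$\Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} E\big[ Z_{st} 1(|Z_{st}| > \tau_{S,T}) \big] \Big| \le \frac{1}{\sqrt{T}} \sum_{t=1}^{T} E\Big[ \frac{|Z_{st}|^{\phi}}{\tau_{S,T}^{\phi-1}} 1(|Z_{st}| > \tau_{S,T}) \Big] \le \frac{C \sqrt{T}}{\tau_{S,T}^{\phi-1}} = o(T^{\eta}).$$
This yields (S.1).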

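We next turn to the proof of (S.2): We apply the crude bound
$$P\Big( \max_{1 \le s \le S} \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{\le} \Big| > C T^{\eta} \Big) \le \sum_{s=1}^{S} P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{\le} \Big| > C T^{\eta} \Big)$$
and show that for any $1 \le s \le S$,
$$P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{\le} \Big| > C T^{\eta} \Big) \le C_0 T^{-\rho}, \tag{S.3}$$
where $C_0$ is a fixed constant and $\rho > 0$ can be chosen as large as desired by picking $\eta$ slightly larger than $1/2 - 1/(2+\delta)$. Since $S = O(T^q)$, this immediately implies (S.2).

To prove (S.3), we make use of the following facts:

(i) For a random variable $Z$ and $\lambda > 0$, Markov's inequality says that
$$P(\pm Z > \delta) \le \frac{E \exp(\pm \lambda Z)}{\exp(\lambda \delta)}.$$

(ii) Since $|Z_{st}^{\le} / \sqrt{T}| \le 2 \tau_{S,T} / \sqrt{T}$, it holds that $\lambda_{S,T} |Z_{st}^{\le} / \sqrt{T}| \le 1/2$, where we set $\lambda_{S,T} = \sqrt{T} / (4 \tau_{S,T})$. As $\exp(x) \le 1 + x + x^2$ for $|x| \le 1/2$, this implies that
$$E\Big[ \exp\Big( \pm \lambda_{S,T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] \le 1 + \frac{\lambda_{S,T}^2}{T} E\big[ (Z_{st}^{\le})^2 \big] \le \exp\Big( \frac{\lambda_{S,T}^2}{T} E\big[ (Z_{st}^{\le})^2 \big] \Big).$$

(iii) By definition of $\lambda_{S,T}$, it holds that
$$\lambda_{S,T} = \frac{\sqrt{T}}{4 (ST)^{1/\{(2+\delta)(q+1)\}}} = \frac{\sqrt{T}}{4 T^{(q+1)/\{(2+\delta)(q+1)\}}} = \frac{T^{1/2 - 1/(2+\delta)}}{4}.$$

Using (i)–(iii) and writing $E[(Z_{st}^{\le})^2] \le C_Z < \infty$, we obtain that
$$P\Big( \Big| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{\le} \Big| > C T^{\eta} \Big) \le P\Big( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{\le} > C T^{\eta} \Big) + P\Big( -\frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_{st}^{\le} > C T^{\eta} \Big)$$
$$\le \exp\big( -\lambda_{S,T} C T^{\eta} \big) \Big\{ E\Big[ \exp\Big( \lambda_{S,T} \sum_{t=1}^{T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] + E\Big[ \exp\Big( -\lambda_{S,T} \sum_{t=1}^{T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] \Big\}$$
$$= \exp\big( -\lambda_{S,T} C T^{\eta} \big) \Big\{ \prod_{t=1}^{T} E\Big[ \exp\Big( \lambda_{S,T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] + \prod_{t=1}^{T} E\Big[ \exp\Big( -\lambda_{S,T} \frac{Z_{st}^{\le}}{\sqrt{T}} \Big) \Big] \Big\}$$
$$\le 2 \exp\big( -\lambda_{S,T} C T^{\eta} \big) \prod_{t=1}^{T} \exp\Big( \frac{\lambda_{S,T}^2}{T} E\big[ (Z_{st}^{\le})^2 \big] \Big) \le 2 \exp\big( C_Z \lambda_{S,T}^2 - C \lambda_{S,T} T^{\eta} \big)$$
$$= 2 \exp\Big( \frac{C_Z}{16} T^{1 - 2/(2+\delta)} - \frac{C}{4} T^{1/2 - 1/(2+\delta)} T^{\eta} \Big) \le C_0 T^{-\rho},$$
where $\rho > 0$ can be chosen arbitrarily large if we pick $\eta$ slightly larger than $1/2 - 1/(2+\delta)$.

Proof of Theorem 4.1

We first prove that
$$P\big( \hat{H}^{[K_0]} \le q(\alpha) \big) = (1 - \alpha) + o(1). \tag{S.4}$$
To do so, we derive a stochastic expansion of the individual statistics $\hat{\Delta}_i^{[K_0]}$.

Lemma S.2. It holds that
$$\hat{\Delta}_i^{[K_0]} = \Delta_i^{[K_0]} + R_i^{[K_0]},$$
where
$$\Delta_i^{[K_0]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa$$
and the remainder $R_i^{[K_0]}$ has the property that
$$P\Big( \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| > p^{-\xi} \Big) = o(1) \tag{S.5}$$
for some $\xi > 0$.

The proof of Lemma S.2 as well as those of the subsequent Lemmas S.3–S.5 are postponed until the proof of Theorem 4.1 is complete.

With the help of Lemma S.2, we can bound the probability of interest
$$P_{\alpha} := P\big( \hat{H}^{[K_0]} \le q(\alpha) \big) = P\Big( \max_{1 \le i \le n} \hat{\Delta}_i^{[K_0]} \le q(\alpha) \Big)$$
as follows: Since
$$\max_{1 \le i \le n} \Delta_i^{[K_0]} - \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| \le \max_{1 \le i \le n} \hat{\Delta}_i^{[K_0]} \le \max_{1 \le i \le n} \Delta_i^{[K_0]} + \max_{1 \le i \le n} \big| R_i^{[K_0]} \big|,$$
it holds that $P_{\alpha}^{<} \le P_{\alpha} \le P_{\alpha}^{>}$, where
$$P_{\alpha}^{<} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) - \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| \Big), \qquad P_{\alpha}^{>} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) + \max_{1 \le i \le n} \big| R_i^{[K_0]} \big| \Big).$$
As the remainder $R_i^{[K_0]}$ has the property (S.5), we further obtain that
$$P_{\alpha}^{\ll} + o(1) \le P_{\alpha} \le P_{\alpha}^{\gg} + o(1), \tag{S.6}$$
where
$$P_{\alpha}^{\ll} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) - p^{-\xi} \Big), \qquad P_{\alpha}^{\gg} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) + p^{-\xi} \Big).$$
With the help of strong approximation theory, we can derive the following result on the asymptotic behaviour of the probabilities $P_{\alpha}^{\ll}$ and $P_{\alpha}^{\gg}$.

Lemma S.3. It holds that
$$P_{\alpha}^{\ll} = (1 - \alpha) + o(1), \qquad P_{\alpha}^{\gg} = (1 - \alpha) + o(1).$$

Together with (S.6), this immediately yields that $P_{\alpha} = (1 - \alpha) + o(1)$, thus completing the proof of (S.4).

We next show that for any $K < K_0$,
$$P\big( \hat{H}^{[K]} \le q(\alpha) \big) = o(1). \tag{S.7}$$
Consider a fixed $K < K_0$ and let $S \in \{ \hat{G}_k^{[K]} : 1 \le k \le K \}$ be any cluster with the following property:
$$\#S \ge \underline{n} := \min_{1 \le k \le K_0} \#G_k, \text{ and } S \text{ contains elements from at least two different classes } G_{k_1} \text{ and } G_{k_2}. \tag{S.8}$$
It is not difficult to see that a cluster with the property (S.8) must always exist under our conditions. By $\mathscr{C} \subseteq \{ \hat{G}_k^{[K]} : 1 \le k \le K \}$, we denote the collection of clusters that have the property (S.8). With this notation at hand, we can derive the following stochastic expansion of the individual statistics $\hat{\Delta}_i^{[K]}$.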

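Lemma S.4. For any $i \in S$ and $S \in \mathscr{C}$, it holds that
$$\hat{\Delta}_i^{[K]} = \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 + R_i^{[K]},$$
where $d_{ij} = \mu_{ij} - (\#S)^{-1} \sum_{i' \in S} \mu_{i'j}$ and the remainder $R_i^{[K]}$ has the property that
$$P\Big( \max_{S \in \mathscr{C}} \max_{i \in S} \big| R_i^{[K]} \big| > p^{1/2 - \xi} \Big) = o(1) \tag{S.9}$$
for some small $\xi > 0$.

Using (S.9) and the fact that
$$\max_{S \in \mathscr{C}} \max_{i \in S} \hat{\Delta}_i^{[K]} \ge \max_{S \in \mathscr{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 \Big\} - \max_{S \in \mathscr{C}} \max_{i \in S} \big| R_i^{[K]} \big|,$$
we obtain that
$$P\big( \hat{H}^{[K]} \le q(\alpha) \big) = P\Big( \max_{1 \le i \le n} \hat{\Delta}_i^{[K]} \le q(\alpha) \Big) \le P\Big( \max_{S \in \mathscr{C}} \max_{i \in S} \hat{\Delta}_i^{[K]} \le q(\alpha) \Big)$$
$$\le P\Big( \max_{S \in \mathscr{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 \Big\} - \max_{S \in \mathscr{C}} \max_{i \in S} \big| R_i^{[K]} \big| \le q(\alpha) \Big)$$
$$\le P\Big( \max_{S \in \mathscr{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 \Big\} \le q(\alpha) + p^{1/2 - \xi} \Big) + o(1). \tag{S.10}$$
The arguments from the proof of Lemma S.3, in particular (S.22), imply that $q(\alpha) \le C \sqrt{\log n}$ for some fixed constant $C > 0$ and sufficiently large $n$. Moreover, we can prove the following result.

Lemma S.5. It holds that
$$\max_{S \in \mathscr{C}} \max_{i \in S} \Big\{ \frac{1}{\sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 \Big\} \ge c \sqrt{p}$$
for some fixed constant $c > 0$.

Since $q(\alpha) \le C \sqrt{\log n}$ and $\sqrt{\log n / p} = o(1)$ by (C3), Lemma S.5 allows us to infer that
$$P\Big( \max_{S \in \mathscr{C}} \max_{i \in S} \Big\{ \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 \Big\} \le q(\alpha) + p^{1/2 - \xi} \Big) = o(1).$$
Together with (S.10), this yields that $P( \hat{H}^{[K]} \le q(\alpha) ) = o(1)$.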
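To make the leading term of Lemma S.2 concrete, the following sketch simulates the statistics $\Delta_i^{[K_0]}$ and their maximum for Gaussian errors. It is an illustration under stated assumptions, not the authors' implementation: the function name `delta_statistics`, the sample sizes and the seed are hypothetical, and the value $\kappa = \sqrt{2}$ is the standard deviation of $\varepsilon_{ij}^2 / \sigma^2$ in the Gaussian case only.

```python
import math
import random

def delta_statistics(n, p, sigma=1.0, seed=0):
    """Simulate Delta_i = p^{-1/2} * sum_j {eps_ij^2 / sigma^2 - 1} / kappa.

    Assumption of this sketch: eps_ij ~ N(0, sigma^2), so that
    kappa = sd(eps^2 / sigma^2) = sqrt(2).
    """
    rng = random.Random(seed)
    kappa = math.sqrt(2.0)
    deltas = []
    for _ in range(n):
        total = 0.0
        for _ in range(p):
            eps = rng.gauss(0.0, sigma)
            total += (eps * eps / sigma ** 2 - 1.0) / kappa
        deltas.append(total / math.sqrt(p))
    return deltas

deltas = delta_statistics(n=200, p=400)
h_stat = max(deltas)  # maximum statistic under the correct number of clusters
mean_delta = sum(deltas) / len(deltas)
var_delta = sum((d - mean_delta) ** 2 for d in deltas) / (len(deltas) - 1)
```

Each simulated $\Delta_i$ has mean zero and unit variance by construction, which is the normalization that the strong approximation argument in the proof of Lemma S.3 relies on.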

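Proof of Lemma S.2. Let $n_k = \#G_k$ and write $\bar{\varepsilon}_i = p^{-1} \sum_{j=1}^{p} \varepsilon_{ij}$ along with $\bar{\mu}_i = p^{-1} \sum_{j=1}^{p} \mu_{ij}$. Since
$$P\Big( \big\{ \hat{G}_k^{[K_0]} : 1 \le k \le K_0 \big\} = \big\{ G_k : 1 \le k \le K_0 \big\} \Big) \to 1$$
by (3.1), we can ignore the estimation error in the clusters $\hat{G}_k^{[K_0]}$ and replace them by the true classes $G_k$. For $i \in G_k$, we thus get
$$\hat{\Delta}_i^{[K_0]} = \Delta_i^{[K_0]} + R_{i,A}^{[K_0]} + R_{i,B}^{[K_0]} - R_{i,C}^{[K_0]} + R_{i,D}^{[K_0]},$$
where
$$R_{i,A}^{[K_0]} = \Big( \frac{\kappa}{\hat{\kappa}} - 1 \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa,$$
$$R_{i,B}^{[K_0]} = \frac{1}{\hat{\kappa}} \Big( \frac{1}{\hat{\sigma}^2} - \frac{1}{\sigma^2} \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij}^2,$$
$$R_{i,C}^{[K_0]} = \frac{2}{\hat{\kappa} \hat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \Big\{ \bar{\varepsilon}_i + \frac{1}{n_k} \sum_{i' \in G_k} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}) \Big\},$$
$$R_{i,D}^{[K_0]} = \frac{1}{\hat{\kappa} \hat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \bar{\varepsilon}_i + \frac{1}{n_k} \sum_{i' \in G_k} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}) \Big\}^2.$$
We now show that $\max_{i \in G_k} | R_{i,l}^{[K_0]} | = o_p(p^{-\xi})$ for any $k$ and $l = A, \ldots, D$. This implies that $\max_{1 \le i \le n} | R_{i,l}^{[K_0]} | = \max_{1 \le k \le K_0} \max_{i \in G_k} | R_{i,l}^{[K_0]} | = o_p(p^{-\xi})$ for $l = A, \ldots, D$, which in turn yields the statement of Lemma S.2. Throughout the proof, we use the symbol $\eta > 0$ to denote a sufficiently small constant which results from applying Lemma S.1.

By assumption, $\hat{\sigma}^2 = \sigma^2 + O_p(p^{-(1/2 + \delta)})$ and $\hat{\kappa} = \kappa + O_p(p^{-\delta})$ for some $\delta > 0$. Applying Lemma S.1 and choosing $\xi > 0$ such that $\xi < \delta - \eta$, we obtain that
$$\max_{i \in G_k} \big| R_{i,A}^{[K_0]} \big| \le \Big| \frac{\kappa}{\hat{\kappa}} - 1 \Big| \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa \Big| = \Big| \frac{\kappa}{\hat{\kappa}} - 1 \Big| O_p(p^{\eta}) = O_p(p^{-(\delta - \eta)}) = o_p(p^{-\xi})$$
and
$$\max_{i \in G_k} \big| R_{i,B}^{[K_0]} \big| \le \Big| \frac{1}{\hat{\kappa}} \Big( \frac{1}{\hat{\sigma}^2} - \frac{1}{\sigma^2} \Big) \Big| \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \big\{ \varepsilon_{ij}^2 - \sigma^2 \big\} + \sigma^2 \sqrt{p} \Big| \le \Big| \frac{1}{\hat{\kappa}} \Big( \frac{1}{\hat{\sigma}^2} - \frac{1}{\sigma^2} \Big) \Big| \big\{ O_p(p^{\eta}) + \sigma^2 \sqrt{p} \big\} = o_p(p^{-\xi}).$$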

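We next show that
$$\max_{i \in G_k} \big| R_{i,C}^{[K_0]} \big| = o_p(p^{-1/4}). \tag{S.11}$$
To do so, we work with the decomposition $R_{i,C}^{[K_0]} = \{ 2 / (\hat{\kappa} \hat{\sigma}^2) \} \{ R_{i,C,1}^{[K_0]} + R_{i,C,2}^{[K_0]} - R_{i,C,3}^{[K_0]} \}$, where
$$R_{i,C,1}^{[K_0]} = \bar{\varepsilon}_i \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij}, \qquad R_{i,C,2}^{[K_0]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \frac{1}{n_k} \sum_{i' \in G_k} \varepsilon_{i'j}, \qquad R_{i,C,3}^{[K_0]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \frac{1}{n_k} \sum_{i' \in G_k} \bar{\varepsilon}_{i'}.$$
With the help of Lemma S.1, we obtain that
$$\max_{i \in G_k} \big| R_{i,C,1}^{[K_0]} \big| \le \frac{1}{\sqrt{p}} \Big\{ \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \Big| \Big\}^2 = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big). \tag{S.12}$$
Moreover,
$$\max_{i \in G_k} \big| R_{i,C,2}^{[K_0]} \big| = O_p(n_k^{-1/4}), \tag{S.13}$$
since $p \ll n_k$ and
$$R_{i,C,2}^{[K_0]} = \frac{1}{n_k} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \big\{ \varepsilon_{ij}^2 - \sigma^2 \big\} + \frac{\sigma^2 \sqrt{p}}{n_k} + \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j}$$
with
$$\max_{i \in G_k} \Big| \frac{1}{n_k} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \big\{ \varepsilon_{ij}^2 - \sigma^2 \big\} \Big| = O_p\Big( \frac{p^{\eta}}{n_k} \Big), \tag{S.14}$$
$$\max_{i \in G_k} \Big| \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j} \Big| = O_p(n_k^{-1/4}). \tag{S.15}$$
(S.14) is an immediate consequence of Lemma S.1. (S.15) follows upon observing that for any constant $C_0 > 0$,
$$P\Big( \max_{i \in G_k} \Big| \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j} \Big| > C_0 n_k^{-1/4} \Big) \le \sum_{i \in G_k} P\Big( \Big| \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j} \Big| > C_0 n_k^{-1/4} \Big)$$
$$\le \sum_{i \in G_k} E\Big[ \Big\{ \frac{1}{n_k} \sum_{i' \in G_k, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j} \Big\}^4 \Big] \Big/ \big\{ C_0 n_k^{-1/4} \big\}^4$$
$$= \sum_{i \in G_k} \Big\{ \frac{1}{n_k^4 p^2} \sum_{\substack{i_1', \ldots, i_4' \in G_k \\ i_1', \ldots, i_4' \ne i}} \sum_{j_1, \ldots, j_4 = 1}^{p} E\big[ \varepsilon_{i j_1} \cdots \varepsilon_{i j_4} \varepsilon_{i_1' j_1} \cdots \varepsilon_{i_4' j_4} \big] \Big\} \Big/ \big\{ C_0 n_k^{-1/4} \big\}^4 \le \frac{C}{C_0^4},$$
the last inequality resulting from the fact that the mean $E[ \varepsilon_{i j_1} \cdots \varepsilon_{i j_4} \varepsilon_{i_1' j_1} \cdots \varepsilon_{i_4' j_4} ]$ can only be non-zero if some of the index pairs $(i_l', j_l)$ for $l = 1, \ldots, 4$ are identical. Finally, with the help of Lemma S.1, we get that
$$\max_{i \in G_k} \big| R_{i,C,3}^{[K_0]} \big| \le \Big| \frac{1}{n_k} \sum_{i' \in G_k} \bar{\varepsilon}_{i'} \Big| \max_{i \in G_k} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \Big| = O_p\Big( \frac{p^{\eta}}{\sqrt{n_k p}} \Big). \tag{S.16}$$
Combining (S.12), (S.13) and (S.16), we arrive at the statement (S.11) on the remainder $R_{i,C}^{[K_0]}$.

We finally show that
$$\max_{i \in G_k} \big| R_{i,D}^{[K_0]} \big| = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big). \tag{S.17}$$
For the proof, we write $R_{i,D}^{[K_0]} = \{ 1 / (\hat{\kappa} \hat{\sigma}^2) \} \{ R_{i,D,1}^{[K_0]} + R_{i,D,2}^{[K_0]} \}$, where
$$R_{i,D,1}^{[K_0]} = \frac{1}{\sqrt{p}} \Big( \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \Big)^2, \qquad R_{i,D,2}^{[K_0]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \frac{1}{n_k} \sum_{i' \in G_k} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}) \Big\}^2.$$
With the help of Lemma S.1, we obtain that
$$\max_{i \in G_k} \big| R_{i,D,1}^{[K_0]} \big| = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big). \tag{S.18}$$
Moreover, straightforward calculations yield that
$$\max_{i \in G_k} \big| R_{i,D,2}^{[K_0]} \big| = O_p\Big( \frac{\sqrt{p}}{n_k} \Big). \tag{S.19}$$
(S.17) now follows upon combining (S.18) and (S.19).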

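Proof of Lemma S.3. We make use of the following three results:

(R1) Let $\{ W_i : 1 \le i \le n \}$ be independent random variables with a standard normal distribution and define $a_n = 1 / \sqrt{2 \log n}$ together with
$$b_n = \sqrt{2 \log n} - \frac{\log \log n + \log(4\pi)}{2 \sqrt{2 \log n}}.$$
Then for any $w \in \mathbb{R}$,
$$\lim_{n \to \infty} P\Big( \max_{1 \le i \le n} W_i \le a_n w + b_n \Big) = \exp(-\exp(-w)).$$
In particular, for $w(\alpha \pm \varepsilon) = -\log(-\log(1 - \alpha \pm \varepsilon))$, we get
$$\lim_{n \to \infty} P\Big( \max_{1 \le i \le n} W_i \le a_n w(\alpha \pm \varepsilon) + b_n \Big) = 1 - \alpha \pm \varepsilon.$$

The next result is known as Khintchine's Theorem.

(R2) Let $F_n$ be distribution functions and $G$ a non-degenerate distribution function. Moreover, let $\alpha_n > 0$ and $\beta_n \in \mathbb{R}$ be such that
$$F_n(\alpha_n x + \beta_n) \to G(x)$$
for any continuity point $x$ of $G$. Then there are constants $\alpha_n' > 0$ and $\beta_n' \in \mathbb{R}$ as well as a non-degenerate distribution function $G_*$ such that
$$F_n(\alpha_n' x + \beta_n') \to G_*(x)$$
at any continuity point $x$ of $G_*$ if and only if
$$\alpha_n^{-1} \alpha_n' \to \alpha_*, \qquad \frac{\beta_n' - \beta_n}{\alpha_n} \to \beta_* \qquad \text{and} \qquad G_*(x) = G(\alpha_* x + \beta_*).$$

The final result exploits strong approximation theory and is a direct consequence of the so-called KMT Theorems; see Komlós et al. (1975, 1976):

(R3) Write $\Delta_i^{[K_0]} = p^{-1/2} \sum_{j=1}^{p} X_{ij}$ with $X_{ij} = \{ \varepsilon_{ij}^2 / \sigma^2 - 1 \} / \kappa$ and let $F$ denote the distribution function of $X_{ij}$. It is possible to construct i.i.d. random variables $\{ \tilde{X}_{ij} : 1 \le i \le n, 1 \le j \le p \}$ with the distribution function $F$ and independent standard normal random variables $\{ Z_{ij} : 1 \le i \le n, 1 \le j \le p \}$ such that $\tilde{\Delta}_i^{[K_0]} = p^{-1/2} \sum_{j=1}^{p} \tilde{X}_{ij}$ and $\Delta_i^{*} = p^{-1/2} \sum_{j=1}^{p} Z_{ij}$ have the following property:
$$P\Big( \big| \tilde{\Delta}_i^{[K_0]} - \Delta_i^{*} \big| > C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big) \le p^{1 - \frac{\theta/2}{2+\delta}}$$
for some arbitrarily small but fixed $\delta > 0$ and some constant $C > 0$ that does not depend on $i$, $p$ and $n$.

We now proceed as follows:

(i) We show that for any $w \in \mathbb{R}$,
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le a_n w + b_n \Big) \to \exp(-\exp(-w)). \tag{S.20}$$
This in particular implies that
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha \pm \varepsilon) \Big) \to 1 - \alpha \pm \varepsilon, \tag{S.21}$$
where $w_n(\alpha \pm \varepsilon) = a_n w(\alpha \pm \varepsilon) + b_n$ with $a_n$, $b_n$ and $w(\alpha \pm \varepsilon)$ as defined in (R1). The proof of (S.20) is postponed until the arguments for Lemma S.3 are complete.

(ii) The statement (S.21) in particular holds in the special case that $\varepsilon_{ij} \sim N(0, \sigma^2)$. In this case, $q(\alpha)$ is the $(1-\alpha)$-quantile of $\max_{1 \le i \le n} \Delta_i^{[K_0]}$. Hence, we have
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha - \varepsilon) \Big) \to 1 - \alpha - \varepsilon,$$
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) \Big) = 1 - \alpha,$$
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha + \varepsilon) \Big) \to 1 - \alpha + \varepsilon,$$
which implies that
$$w_n(\alpha - \varepsilon) \le q(\alpha) \le w_n(\alpha + \varepsilon) \tag{S.22}$$
for sufficiently large $n$.

(iii) Since $p^{-\xi} / a_n = p^{-\xi} \sqrt{2 \log n} = o(1)$ by (C3), we can use (S.20) together with (R2) to obtain that
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha \pm \varepsilon) \pm p^{-\xi} \Big) \to 1 - \alpha \pm \varepsilon. \tag{S.23}$$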
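The Gumbel limit in (R1) can be checked numerically without simulation, because the maximum of $n$ independent standard normal variables has the exact distribution function $\Phi(x)^n$. The sketch below is an illustration only; the function names `gumbel_constants`, `prob_max_below`, `gumbel_cdf` and `w_level` are hypothetical helpers, not notation from the paper.

```python
import math

def gumbel_constants(n):
    """Norming constants a_n and b_n from (R1)."""
    root = math.sqrt(2.0 * math.log(n))
    a_n = 1.0 / root
    b_n = root - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * root)
    return a_n, b_n

def prob_max_below(n, w):
    """Exact P(max_i W_i <= a_n * w + b_n) for n i.i.d. standard normals."""
    a_n, b_n = gumbel_constants(n)
    x = a_n * w + b_n
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # P(W > x)
    return math.exp(n * math.log1p(-tail))      # (1 - tail)^n, computed stably

def gumbel_cdf(w):
    """Limit distribution function exp(-exp(-w))."""
    return math.exp(-math.exp(-w))

def w_level(prob):
    """Solve exp(-exp(-w)) = prob for w, i.e. w = -log(-log(prob))."""
    return -math.log(-math.log(prob))

err_small_n = abs(prob_max_below(10 ** 3, 0.0) - gumbel_cdf(0.0))
err_large_n = abs(prob_max_below(10 ** 6, 0.0) - gumbel_cdf(0.0))
```

The approximation error shrinks only slowly as $n$ grows, which is the well-known logarithmic rate of convergence for normal maxima. The helper `w_level(1 - alpha)` corresponds to $w(\alpha)$ in (R1), the argument at which the limit distribution equals $1 - \alpha$.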

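(iv) As $w_n(\alpha - \varepsilon) - p^{-\xi} \le q(\alpha) - p^{-\xi} \le q(\alpha) + p^{-\xi} \le w_n(\alpha + \varepsilon) + p^{-\xi}$ for sufficiently large $n$, it holds that
$$P_{\alpha,\varepsilon}^{\ll} := P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha - \varepsilon) - p^{-\xi} \Big) \le P_{\alpha}^{\ll} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) - p^{-\xi} \Big)$$
$$\le P_{\alpha}^{\gg} = P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le q(\alpha) + p^{-\xi} \Big) \le P_{\alpha,\varepsilon}^{\gg} := P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n(\alpha + \varepsilon) + p^{-\xi} \Big)$$
for large $n$. Moreover, since $P_{\alpha,\varepsilon}^{\ll} \to 1 - \alpha - \varepsilon$ and $P_{\alpha,\varepsilon}^{\gg} \to 1 - \alpha + \varepsilon$ for any fixed $\varepsilon > 0$ by (S.23), we can conclude that $P_{\alpha}^{\ll} = (1 - \alpha) + o(1)$ and $P_{\alpha}^{\gg} = (1 - \alpha) + o(1)$, which is the statement of Lemma S.3.

It remains to prove (S.20): Using the notation from (R3) and the shorthand $w_n = a_n w + b_n$, we can write
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n \Big) = P\Big( \max_{1 \le i \le n} \tilde{\Delta}_i^{[K_0]} \le w_n \Big) = \prod_{i=1}^{n} P\big( \tilde{\Delta}_i^{[K_0]} \le w_n \big) = \prod_{i=1}^{n} \pi_i \tag{S.24}$$
with $\pi_i = P( \tilde{\Delta}_i^{[K_0]} \le w_n )$. The probabilities $\pi_i$ can be decomposed into two parts as follows:
$$\pi_i = P\Big( \Delta_i^{*} \le w_n + \big\{ \Delta_i^{*} - \tilde{\Delta}_i^{[K_0]} \big\} \Big) = \pi_i^{\le} + \pi_i^{>},$$
where
$$\pi_i^{\le} = P\Big( \Delta_i^{*} \le w_n + \big\{ \Delta_i^{*} - \tilde{\Delta}_i^{[K_0]} \big\}, \ \big| \Delta_i^{*} - \tilde{\Delta}_i^{[K_0]} \big| \le C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big),$$
$$\pi_i^{>} = P\Big( \Delta_i^{*} \le w_n + \big\{ \Delta_i^{*} - \tilde{\Delta}_i^{[K_0]} \big\}, \ \big| \Delta_i^{*} - \tilde{\Delta}_i^{[K_0]} \big| > C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big).$$
With the help of (R3) and the assumption that $n \ll p^{(\theta/4) - 1}$, we can show that
$$\prod_{i=1}^{n} \pi_i = \prod_{i=1}^{n} \pi_i^{\le} + R_n, \tag{S.25}$$
where $R_n$ is a non-negative remainder term with
$$R_n \le n \, p^{1 - \frac{\theta/2}{2+\delta}} = o(1).$$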

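Moreover, the probabilities $\pi_i^{\le}$ can be bounded by
$$\pi_i^{\le} \le P\Big( \Delta_i^{*} \le w_n + C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big),$$
$$\pi_i^{\le} \ge P\Big( \Delta_i^{*} \le w_n - C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big) - p^{1 - \frac{\theta/2}{2+\delta}},$$
the second line making use of (R3). From this, we obtain that
$$\underline{\Pi}_n + o(1) \le \prod_{i=1}^{n} \pi_i^{\le} \le \overline{\Pi}_n + o(1), \tag{S.26}$$
where
$$\overline{\Pi}_n = \prod_{i=1}^{n} P\Big( \Delta_i^{*} \le w_n + C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big), \qquad \underline{\Pi}_n = \prod_{i=1}^{n} P\Big( \Delta_i^{*} \le w_n - C p^{\frac{1}{2+\delta} - \frac{1}{2}} \Big).$$
By combining (S.24)–(S.26), we arrive at the intermediate result that
$$\underline{\Pi}_n + o(1) \le P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n \Big) \le \overline{\Pi}_n + o(1). \tag{S.27}$$
Since $p^{\frac{1}{2+\delta} - \frac{1}{2}} / a_n = p^{\frac{1}{2+\delta} - \frac{1}{2}} \sqrt{2 \log n} = o(1)$, we can use (R1) together with (R2) to show that
$$\underline{\Pi}_n \to \exp(-\exp(-w)) \qquad \text{and} \qquad \overline{\Pi}_n \to \exp(-\exp(-w)). \tag{S.28}$$
Plugging (S.28) into (S.27) immediately yields that
$$P\Big( \max_{1 \le i \le n} \Delta_i^{[K_0]} \le w_n \Big) \to \exp(-\exp(-w)),$$
which completes the proof.

Proof of Lemma S.4. We use the notation $n_S = \#S$ along with $\bar{\varepsilon}_i = p^{-1} \sum_{j=1}^{p} \varepsilon_{ij}$, $\bar{\mu}_i = p^{-1} \sum_{j=1}^{p} \mu_{ij}$ and $\bar{d}_i = p^{-1} \sum_{j=1}^{p} d_{ij}$. For any $i \in S$ and $S \in \mathscr{C}$, we can write
$$\hat{\Delta}_i^{[K]} = \frac{1}{\kappa \sigma^2 \sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 + R_{i,A}^{[K]} + R_{i,B}^{[K]} + R_{i,C}^{[K]} + R_{i,D}^{[K]} - R_{i,E}^{[K]} + R_{i,F}^{[K]} + R_{i,G}^{[K]},$$
where
$$R_{i,A}^{[K]} = \Big( \frac{1}{\hat{\kappa} \hat{\sigma}^2} - \frac{1}{\kappa \sigma^2} \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^{p} d_{ij}^2,$$
$$R_{i,B}^{[K]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa,$$
$$R_{i,C}^{[K]} = \Big( \frac{\kappa}{\hat{\kappa}} - 1 \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \frac{\varepsilon_{ij}^2}{\sigma^2} - 1 \Big\} \Big/ \kappa,$$
$$R_{i,D}^{[K]} = \frac{1}{\hat{\kappa}} \Big( \frac{1}{\hat{\sigma}^2} - \frac{1}{\sigma^2} \Big) \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij}^2,$$
$$R_{i,E}^{[K]} = \frac{2}{\hat{\kappa} \hat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \Big\{ \bar{\varepsilon}_i + \frac{1}{n_S} \sum_{i' \in S} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}) \Big\},$$
$$R_{i,F}^{[K]} = \frac{1}{\hat{\kappa} \hat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \bar{\varepsilon}_i + \frac{1}{n_S} \sum_{i' \in S} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}) \Big\}^2,$$
$$R_{i,G}^{[K]} = \frac{2}{\hat{\kappa} \hat{\sigma}^2} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \Big\{ \varepsilon_{ij} - \bar{\varepsilon}_i - \frac{1}{n_S} \sum_{i' \in S} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}) \Big\} d_{ij}.$$
We now show that $\max_{S \in \mathscr{C}} \max_{i \in S} | R_{i,l}^{[K]} | = o_p(p^{1/2 - \xi})$ for $l = A, \ldots, G$. This immediately yields the statement of Lemma S.4. Throughout the proof, $\eta > 0$ denotes a sufficiently small constant that results from applying Lemma S.1.

With the help of Lemma S.1 and our assumptions on $\hat{\sigma}^2$ and $\hat{\kappa}$, it is straightforward to see that $\max_{S \in \mathscr{C}} \max_{i \in S} | R_{i,l}^{[K]} | \le \max_{1 \le i \le n} | R_{i,l}^{[K]} | = o_p(p^{1/2 - \xi})$ for $l = A, B, C, D$ with some sufficiently small $\xi > 0$. We next show that
$$\max_{S \in \mathscr{C}} \max_{i \in S} \big| R_{i,E}^{[K]} \big| = O_p(p^{\eta}). \tag{S.29}$$
To do so, we write $R_{i,E}^{[K]} = \{ 2 / (\hat{\kappa} \hat{\sigma}^2) \} \{ R_{i,E,1}^{[K]} + R_{i,E,2}^{[K]} - R_{i,E,3}^{[K]} \}$, where
$$R_{i,E,1}^{[K]} = \bar{\varepsilon}_i \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij}, \qquad R_{i,E,2}^{[K]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \frac{1}{n_S} \sum_{i' \in S} \varepsilon_{i'j}, \qquad R_{i,E,3}^{[K]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \frac{1}{n_S} \sum_{i' \in S} \bar{\varepsilon}_{i'}.$$
Lemma S.1 yields that $\max_{S \in \mathscr{C}} \max_{i \in S} | R_{i,E,1}^{[K]} | \le \max_{1 \le i \le n} | R_{i,E,1}^{[K]} | = O_p(p^{2\eta} / \sqrt{p})$. Moreover, it holds that
$$\max_{S \in \mathscr{C}} \max_{i \in S} \big| R_{i,E,2}^{[K]} \big| = O_p(p^{\eta}),$$
since
$$R_{i,E,2}^{[K]} = \frac{1}{n_S} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \big\{ \varepsilon_{ij}^2 - \sigma^2 \big\} + \frac{\sigma^2 \sqrt{p}}{n_S} + \frac{1}{n_S} \sum_{i' \in S, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j}$$
and
$$\max_{S \in \mathscr{C}} \max_{i \in S} \Big| \frac{1}{n_S} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \big\{ \varepsilon_{ij}^2 - \sigma^2 \big\} \Big| \le \frac{1}{\underline{n}} \max_{1 \le i \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \big\{ \varepsilon_{ij}^2 - \sigma^2 \big\} \Big| = O_p\Big( \frac{p^{\eta}}{\underline{n}} \Big),$$
$$\max_{S \in \mathscr{C}} \max_{i \in S} \Big| \frac{1}{n_S} \sum_{i' \in S, \, i' \ne i} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j} \Big| \le \max_{1 \le i < i' \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \varepsilon_{i'j} \Big| = O_p(p^{\eta}),$$
which follows upon applying Lemma S.1. Finally,
$$\max_{S \in \mathscr{C}} \max_{i \in S} \big| R_{i,E,3}^{[K]} \big| \le \frac{1}{\sqrt{p}} \Big\{ \max_{1 \le i \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \varepsilon_{ij} \Big| \Big\}^2 = O_p\Big( \frac{p^{2\eta}}{\sqrt{p}} \Big),$$
which can again be seen by applying Lemma S.1. Putting everything together, we arrive at (S.29). Similar arguments show that
$$\max_{S \in \mathscr{C}} \max_{i \in S} \big| R_{i,F}^{[K]} \big| = O_p(p^{\eta}) \tag{S.30}$$
as well.

To analyze the term $R_{i,G}^{[K]}$, we denote the signal vector of the group $G_k$ by $m_k = (m_{1,k}, \ldots, m_{p,k})^{\top}$ and write
$$\frac{1}{n_S} \sum_{i' \in S} \mu_{i'j} = \sum_{k=1}^{K_0} \lambda_{S,k} m_{j,k}$$
with $\lambda_{S,k} = \#(S \cap G_k) / n_S$. With this notation, we get
$$R_{i,G}^{[K]} = \Big\{ \frac{2}{\hat{\kappa} \hat{\sigma}^2} \Big\} \Big\{ R_{i,G,1}^{[K]} - R_{i,G,2}^{[K]} - R_{i,G,3}^{[K]} - R_{i,G,4}^{[K]} + R_{i,G,5}^{[K]} \Big\},$$
where
$$R_{i,G,1}^{[K]} = \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \mu_{ij} \varepsilon_{ij},$$
$$R_{i,G,2}^{[K]} = \sum_{k=1}^{K_0} \lambda_{S,k} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} m_{j,k} \varepsilon_{ij},$$
$$R_{i,G,3}^{[K]} = \frac{1}{\sqrt{p}} \bar{\varepsilon}_i \sum_{j=1}^{p} d_{ij},$$
$$R_{i,G,4}^{[K]} = \frac{1}{n_S} \sum_{i' \in S} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \mu_{ij} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}),$$
$$R_{i,G,5}^{[K]} = \frac{1}{n_S} \sum_{i' \in S} \sum_{k=1}^{K_0} \lambda_{S,k} \frac{1}{\sqrt{p}} \sum_{j=1}^{p} m_{j,k} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}).$$
With the help of Lemma S.1, it can be shown that $\max_{S \in \mathscr{C}} \max_{i \in S} | R_{i,G,l}^{[K]} | = O_p(p^{\eta})$ for $l = 1, \ldots, 5$. For example, it holds that
$$\max_{S \in \mathscr{C}} \max_{i \in S} \big| R_{i,G,4}^{[K]} \big| \le \max_{1 \le i, i' \le n} \Big| \frac{1}{\sqrt{p}} \sum_{j=1}^{p} \mu_{ij} (\varepsilon_{i'j} - \bar{\varepsilon}_{i'}) \Big| = O_p(p^{\eta}).$$
As a result, we obtain that
$$\max_{S \in \mathscr{C}} \max_{i \in S} \big| R_{i,G}^{[K]} \big| = O_p(p^{\eta}). \tag{S.31}$$
This completes the proof.

Proof of Lemma S.5. Let $S \in \mathscr{C}$. In particular, suppose that $S \cap G_{k_1} \ne \emptyset$ and $S \cap G_{k_2} \ne \emptyset$ for some $k_1 \ne k_2$. We show the following claim: there exists some $i \in S$ such that
$$\frac{1}{\sqrt{p}} \sum_{j=1}^{p} d_{ij}^2 \ge c \sqrt{p}, \tag{S.32}$$
where $c = (\delta_0 / 2)^2$ with $\delta_0$ defined in assumption (C2). From this, the statement of Lemma S.5 immediately follows.

For the proof of (S.32), we denote the Euclidean distance between vectors $v = (v_1, \ldots, v_p)^{\top}$ and $w = (w_1, \ldots, w_p)^{\top}$ by $d(v, w) = ( \sum_{j=1}^{p} | v_j - w_j |^2 )^{1/2}$. Moreover, as in Lemma S.4, we use the notation
$$\frac{1}{n_S} \sum_{i' \in S} \mu_{i'j} = \sum_{k=1}^{K_0} \lambda_{S,k} m_{j,k},$$
where $n_S = \#S$, $\lambda_{S,k} = \#(S \cap G_k) / n_S$ and $m_k = (m_{1,k}, \ldots, m_{p,k})^{\top}$ is the signal vector of the class $G_k$.

Take any $i \in S \cap G_{k_1}$. If
$$d\Big( \mu_i, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) = d\Big( m_{k_1}, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) \ge \frac{\delta_0 \sqrt{p}}{2},$$
the proof is finished, as (S.32) is satisfied for $i$. Next consider the case that
$$d\Big( m_{k_1}, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) < \frac{\delta_0 \sqrt{p}}{2}.$$
By assumption (C2), it holds that $d(m_k, m_{k'}) \ge \delta_0 \sqrt{p}$ for $k \ne k'$. Hence, by the triangle inequality,
$$\delta_0 \sqrt{p} \le d(m_{k_1}, m_{k_2}) \le d\Big( m_{k_1}, \sum_{k=1}^{K_0} \lambda_{S,k} m_k \Big) + d\Big( \sum_{k=1}^{K_0} \lambda_{S,k} m_k, m_{k_2} \Big) < \frac{\delta_0 \sqrt{p}}{2} + d\Big( \sum_{k=1}^{K_0} \lambda_{S,k} m_k, m_{k_2} \Big),$$
implying that
$$d\Big( \sum_{k=1}^{K_0} \lambda_{S,k} m_k, m_{k_2} \Big) > \frac{\delta_0 \sqrt{p}}{2}.$$
This shows that the claim (S.32) is fulfilled for any $i \in S \cap G_{k_2}$.

Proof of Theorem 4.2

By Theorem 4.1,
$$P\big( \hat{K}_0 > K_0 \big) = P\big( \hat{H}^{[K]} > q(\alpha) \text{ for all } K \le K_0 \big)$$
$$= P\big( \hat{H}^{[K_0]} > q(\alpha) \big) - P\big( \hat{H}^{[K_0]} > q(\alpha), \ \hat{H}^{[K]} \le q(\alpha) \text{ for some } K < K_0 \big)$$
$$= P\big( \hat{H}^{[K_0]} > q(\alpha) \big) + o(1) = \alpha + o(1)$$
and
$$P\big( \hat{K}_0 < K_0 \big) = P\big( \hat{H}^{[K]} \le q(\alpha) \text{ for some } K < K_0 \big) \le \sum_{K=1}^{K_0 - 1} P\big( \hat{H}^{[K]} \le q(\alpha) \big) = o(1).$$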

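Moreover,
$$P\Big( \big\{ \hat{G}_k : 1 \le k \le \hat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\} \Big) = P\Big( \big\{ \hat{G}_k : 1 \le k \le \hat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \hat{K}_0 = K_0 \Big)$$
$$+ P\Big( \big\{ \hat{G}_k : 1 \le k \le \hat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \hat{K}_0 \ne K_0 \Big) = \alpha + o(1),$$
since
$$P\Big( \big\{ \hat{G}_k : 1 \le k \le \hat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \hat{K}_0 = K_0 \Big) = P\Big( \big\{ \hat{G}_k^{[K_0]} : 1 \le k \le K_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \hat{K}_0 = K_0 \Big)$$
$$\le P\Big( \big\{ \hat{G}_k^{[K_0]} : 1 \le k \le K_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\} \Big) = o(1)$$
by the consistency property (3.1) and
$$P\Big( \big\{ \hat{G}_k : 1 \le k \le \hat{K}_0 \big\} \ne \big\{ G_k : 1 \le k \le K_0 \big\}, \ \hat{K}_0 \ne K_0 \Big) = P\big( \hat{K}_0 \ne K_0 \big) = \alpha + o(1).$$

Proof of Theorem 4.3

With the help of Lemma S.1, we can show that
$$\hat{\rho}(i, i') = 2 \sigma^2 + \frac{1}{p} \sum_{j=1}^{p} \big( \mu_{ij} - \mu_{i'j} \big)^2 + o_p(1) \tag{S.33}$$
uniformly over $i$ and $i'$. This together with (C2) allows us to prove the following claim:

With probability tending to 1, the indices $i_1, \ldots, i_K$ belong to $K$ different classes in the case that $K \le K_0$ and to $K_0$ different classes in the case that $K > K_0$. (S.34)

Now let $K = K_0$. With the help of (S.33) and (S.34), the starting values $C_1^{[K_0]}, \ldots, C_{K_0}^{[K_0]}$ can be shown to have the property that
$$P\Big( \big\{ C_k^{[K_0]} : 1 \le k \le K_0 \big\} = \big\{ G_k : 1 \le k \le K_0 \big\} \Big) \to 1. \tag{S.35}$$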

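Together with Lemma S.1, (S.35) yields that
$$\hat{\rho}\big( i, C_k^{[K_0]} \big) = \sigma^2 + \frac{1}{p} \sum_{j=1}^{p} \big( \mu_{ij} - m_{j,k} \big)^2 + o_p(1)$$
uniformly over $i$ and $k$. Combined with (C2), this in turn implies that the $k$-means algorithm converges already after the first iteration step with probability tending to 1 and $\hat{G}_k^{[K_0]}$ are consistent estimators of the classes $G_k$ in the sense of (3.1).

Proof of (3.6)

Suppose that (C1)–(C3) along with (3.5) are satisfied. As already noted in Section 3.4, the $k$-means estimators $\{ \hat{G}_k^{A} : 1 \le k \le K \}$ can be shown to satisfy (3.4), that is,
$$P\big( \hat{G}_k^{A} \subseteq G_{k'} \text{ for some } 1 \le k' \le K_0 \big) \to 1 \tag{S.36}$$
for any $k = 1, \ldots, K$. This can be proven by very similar arguments as the consistency property (3.1). We thus omit the details.

Let $E^{A}$ be the event that
$$\hat{G}_k^{A} \subseteq G_{k'} \text{ for some } 1 \le k' \le K_0$$
holds for all clusters $\hat{G}_k^{A}$ with $k = 1, \ldots, K$. $E^{A}$ can be regarded as the event that the partition $\{ \hat{G}_k^{A} : 1 \le k \le K \}$ is a refinement of the class structure $\{ G_k : 1 \le k \le K_0 \}$. By (S.36), the event $E^{A}$ occurs with probability tending to 1. Now consider the estimator
$$\hat{\sigma}^2_{\mathrm{RSS}} = \frac{1}{n \lfloor p/2 \rfloor} \sum_{k=1}^{K} \sum_{i \in \hat{G}_k^{A}} \Big\| \hat{Y}_i^{B} - \frac{1}{\#\hat{G}_k^{A}} \sum_{i' \in \hat{G}_k^{A}} \hat{Y}_{i'}^{B} \Big\|^2.$$
Since the random variables $\hat{Y}_i^{B}$ are independent of the estimators $\hat{G}_k^{A}$, it is not difficult to verify the following: for any $\delta > 0$, there exists a constant $C_{\delta} > 0$ (that does not depend on $\{ \hat{G}_k^{A} : 1 \le k \le K \}$) such that on the event $E^{A}$,
$$P\Big( \big| \hat{\sigma}^2_{\mathrm{RSS}} - \sigma^2 \big| \ge \frac{C_{\delta}}{p} \ \Big| \ \big\{ \hat{G}_k^{A} : 1 \le k \le K \big\} \Big) \le \delta.$$
From this, the first statement of (3.6) easily follows. The second statement can be obtained by similar arguments.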
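The residual-sum-of-squares idea behind $\hat{\sigma}^2_{\mathrm{RSS}}$ can be sketched in a few lines. This is a hedged illustration under assumptions, not the authors' code: the function name `sigma2_rss` and the toy data are hypothetical, the rows of `y_b` stand for the vectors $\hat{Y}_i^{B}$ (taken as given), their length plays the role of $\lfloor p/2 \rfloor$, and `clusters` stands for a partition $\{ \hat{G}_k^{A} \}$ obtained from the other half of the features.

```python
import random

def sigma2_rss(y_b, clusters):
    """Within-cluster residual sum of squares, scaled by n * p_B.

    y_b      : list of n rows, each a list of p_B feature values (part B)
    clusters : list of lists of row indices forming a partition of range(n)
    """
    n = len(y_b)
    p_b = len(y_b[0])
    rss = 0.0
    for members in clusters:
        size = len(members)
        for j in range(p_b):
            center = sum(y_b[i][j] for i in members) / size  # cluster mean of feature j
            rss += sum((y_b[i][j] - center) ** 2 for i in members)
    return rss / (n * p_b)

# Toy data (assumption): two true classes with constant signals 0 and 3, unit noise.
rng = random.Random(1)
n, p_b = 60, 200
signal = [0.0 if i < 30 else 3.0 for i in range(n)]
y_b = [[signal[i] + rng.gauss(0.0, 1.0) for _ in range(p_b)] for i in range(n)]

# A refinement of the true class structure: the first class is split in two.
clusters = [list(range(0, 20)), list(range(20, 30)), list(range(30, 60))]
estimate = sigma2_rss(y_b, clusters)
```

Because every cluster lies inside a single true class, the signal cancels in the within-cluster residuals, so the estimate is close to the noise variance. It is slightly biased downwards, by the factor $(n - K)/n$ in this toy setting.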

References

Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent RV's, and the sample DF. I. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 32, 111–131.

Komlós, J., Major, P. and Tusnády, G. (1976). An approximation of partial sums of independent RV's, and the sample DF. II. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 34, 33–58.