On Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates

Journal of Statistical Theory and Applications, Volume …, Number 4

Majid Mojirsheibani and Zahra Montazeri

Abstract. This article deals with the two-group classification problem, where the class conditional probability $\pi^*(z) = P\{Y = 1 \mid Z = z\}$ belongs to a known class of functions $\mathcal{F}$ which is totally bounded with respect to the supremum norm. Given an $\epsilon$-cover $\mathcal{F}_\epsilon$ of $\mathcal{F}$, we consider kernel regression methods for constructing classifiers using members of $\mathcal{F}_\epsilon$. A Horvitz-Thompson-type inverse weighting approach will be used to handle the presence of incomplete covariates in the data. Conditions under which the resulting classifiers are strongly consistent are also given.

Key Words and Phrases. Classification, consistency, empirical process, covering number.

AMS 2000 Subject Classifications. 62H30.

Department of Mathematics, California State University Northridge, CA 91330, USA. majid.mojirsheibani@csun.edu

Department of Epidemiology and Community Medicine, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5. zmontaze@uottawa.ca

1 Introduction

Consider the following standard two-group classification problem. Let $(Z, Y)$ be a random pair, where $Z \in \mathbb{R}^s$ is a random vector of covariates (or predictors) and $Y \in \{0, 1\}$ has to be predicted based on the vector $Z$. More precisely, one would like to find a function (a classifier) $g: \mathbb{R}^s \to \{0, 1\}$ for which the misclassification error probability $L(g) = P\{g(Z) \neq Y\}$ is as small as possible. The best classifier, called the Bayes classifier and denoted by $g_B$, is given by

$$g_B(z) = \begin{cases} 1 & \text{if } \pi^*(z) > 1/2, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where

$$\pi^*(z) = P\{Y = 1 \mid Z = z\} = E(Y \mid Z = z). \tag{2}$$

(For a proof of this fact see, for example, Devroye et al. (1996; Chapter 2).) The error of this classifier will be denoted by $L^*$ throughout this paper, i.e.,

$$L^* := P\{g_B(Z) \neq Y\}. \tag{3}$$

In passing, we also note that if $Z$ and $Y$ are independent, then $\pi^*(z)$ is a constant function of $z$ and is in fact equal to $P\{Y = 1\}$. In this extreme case, $g_B(Z)$ is either always 1 or always 0. On the other hand, if $Y = I\{Z \in B\}$ for some $B \subset \mathbb{R}^s$, then $\pi^*(z) = I\{z \in B\}$ and also $g_B(z) = I\{z \in B\}$.

In practice one does not know the underlying probability distribution of the pair $(Z, Y)$, and therefore finding $g_B$ is virtually impossible. However, in statistics, one usually has access to a training sample $(Z_1, Y_1), (Z_2, Y_2), \ldots, (Z_n, Y_n)$ drawn from the distribution of $(Z, Y)$. The goal is then to construct a data-based classification rule $g_n$ whose conditional error rate

$$L(g_n) = P\{g_n(Z) \neq Y \mid (Z_i, Y_i),\ i = 1, \ldots, n\}$$

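is in some sense small. A desirable property for a data-based classifier is consistency: a classifier $g_n$ is said to be consistent if $L(g_n)$ converges to $L(g_B)$ in probability. If the convergence holds almost surely, then $g_n$ is said to be strongly consistent.

Next, let $\mathcal{F}$ be a given class of functions $\pi: \mathbb{R}^s \to [0, 1]$. For any real-valued function $f$ on $\mathbb{R}^s$, let $\|f\|_\infty = \sup_{z \in \mathbb{R}^s} |f(z)|$ be its usual supremum norm and put

$$B(\pi, \epsilon) = \{h: \mathbb{R}^s \to [0, 1] \mid \|\pi - h\|_\infty < \epsilon\},$$

i.e., $B(\pi, \epsilon)$ is the open ball of functions, centered at $\pi$, with $\|\cdot\|_\infty$-radius $\epsilon > 0$. Suppose that the finite set of functions $\mathcal{F}_\epsilon = \{\pi_1, \ldots, \pi_{N(\epsilon)}\}$, where $\pi_i: \mathbb{R}^s \to [0, 1]$, $1 \le i \le N(\epsilon) < \infty$, is an $\epsilon$-cover of the family $\mathcal{F}$ in the usual sense that

$$\sup_{\pi \in \mathcal{F}} \min_{1 \le i \le N(\epsilon)} \|\pi - \pi_i\|_\infty \le \epsilon.$$

Note that $\mathcal{F} \subset \bigcup_{i=1}^{N(\epsilon)} B(\pi_i, \epsilon)$. Here, each member of $\mathcal{F}_\epsilon$ may or may not be a member of $\mathcal{F}$. The covering number of the family $\mathcal{F}$ with respect to the $\|\cdot\|_\infty$-norm, denoted by $N_\infty(\epsilon, \mathcal{F})$, is the cardinality $|\mathcal{F}_\epsilon|$ of the smallest $\epsilon$-cover of $\mathcal{F}$. If $N_\infty(\epsilon, \mathcal{F}) < \infty$ for every $\epsilon > 0$, then $\mathcal{F}$ is said to be totally bounded. In passing we also note the close relationship between compactness and total boundedness (also called pre-compactness): compactness implies total boundedness, but the converse is not in general true. In fact, a metric space is compact if and only if it is complete and totally bounded (this is the Heine-Borel theorem for general metric spaces). For more on these and other properties of compact metric spaces one may refer, for example, to Willard (2004).

Next, for each $\pi \in \mathcal{F}$ consider the classifier

$$g_\pi(z) = \begin{cases} 1 & \text{if } \pi(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

Let

$$\hat{L}_n(\pi) = \frac{1}{n} \sum_{i=1}^n I\{g_\pi(Z_i) \neq Y_i\} \tag{5}$$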

be the empirical error rate of $g_\pi$. Then the so-called skeleton estimate of $\pi^*$, selected from $\mathcal{F}_\epsilon$, is given by (see, for example, Chapter 28 of Devroye et al. (1996))

$$\hat{\pi}_n = \arg\min_{\pi \in \mathcal{F}_\epsilon} \hat{L}_n(\pi), \tag{6}$$

with the corresponding sample-based classifier (see (4))

$$g_{\hat{\pi}_n}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases}$$

Let $L(\hat{\pi}_n) = P\{g_{\hat{\pi}_n}(Z) \neq Y \mid (Z_i, Y_i),\ i = 1, \ldots, n\}$ be the error of the classifier $g_{\hat{\pi}_n}$. The following theorem establishes the consistency of the resulting classifier (see Theorem 28.1 of Devroye et al. (1996)).

Theorem 1. Let $\mathcal{F}$ be a totally bounded class of functions mapping $\mathbb{R}^s \to [0, 1]$. If $\pi^* \in \mathcal{F}$, then there is a sequence $\epsilon_n > 0$ and a sequence $\mathcal{F}_{\epsilon_n} \subset \mathcal{F}$ such that for the skeleton estimate $\hat{\pi}_n$, selected from $\mathcal{F}_{\epsilon_n}$, one has $L(\hat{\pi}_n) \to L^*$ a.s. Here $\epsilon_n$ can be taken as the smallest positive number for which $\log N_\infty(\epsilon_n, \mathcal{F}) \le 2n\epsilon_n^2$.

See Devroye et al. (1996; Chapter 28) for a proof of this result.

In the next section we shall consider the case where some of the components of the covariate vectors $Z_i$ may be missing. More specifically, we study the case with $Z_i = (X_i, V_i) \in \mathbb{R}^{d+p}$, $d + p = s$, where $X_i \in \mathbb{R}^d$, $d \ge 1$, is always observable, but $V_i \in \mathbb{R}^p$ may be missing for the $i$th observation. To deal with this difficulty, we propose a Horvitz-Thompson-type estimation approach which works by weighting the complete cases by the inverse of the missing data probabilities. The problem of classification with missing covariates has also been addressed by Mojirsheibani and Montazeri (2007), under different assumptions.

2 Main Results

2.1 Motivation

In this section we consider the case where some components of the $Z_i$'s may be missing. More specifically, we consider the situation where $Z_i = (X_i, V_i) \in \mathbb{R}^{d+p}$, and where $X_i \in \mathbb{R}^d$, $d \ge 1$,

is always observable, but $V_i \in \mathbb{R}^p$ may be missing for the $i$th observation. We also define the random variables

$$\Delta_i = \begin{cases} 0 & \text{if } V_i \text{ is missing}, \\ 1 & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, n.$$

Now, the data may be represented by

$$D_n = \{(Z_1, Y_1, \Delta_1), \ldots, (Z_n, Y_n, \Delta_n)\} = \{(X_1, V_1, Y_1, \Delta_1), \ldots, (X_n, V_n, Y_n, \Delta_n)\}.$$

Let $(Z, Y)$ be a new observation, for which $Y \in \{0, 1\}$ has to be predicted based on $Z$ and the data $D_n$ (here $(Z, Y) \overset{\text{iid}}{=} (Z_1, Y_1)$). Clearly the minimization in (6) is no longer possible under the current setup, where there are missing $V_i$'s among the data. This is because the computation of the right-hand side of (5) requires every $Z_i$, $i = 1, \ldots, n$. Using the complete cases alone in (5) will not solve the problem; here a complete case refers to a fully observable $Z_i$ (i.e., when $\Delta_i = 1$). The reason is that if we choose $\hat{\pi}_n$ as the minimizer of

$$\tilde{L}_n(\pi) := \frac{1}{n} \sum_{i=1}^n \Delta_i I\{g_\pi(Z_i) \neq Y_i\},$$

then the corresponding empirical process $\{\tilde{L}_n(\pi) - L(\pi) \mid \pi \in \mathcal{F}\}$ is not centered in general (not even asymptotically), and centering plays a crucial role in establishing the theoretical validity of $g_{\hat{\pi}_n}$. In fact, it is clear that $\tilde{L}_n(\pi)$ is not in general unbiased for $L(\pi)$.

To motivate the procedures of this section, we also need to define the missing probability mechanism, i.e., the quantity

$$p(Z_i, Y_i) := P\{\Delta_i = 1 \mid Z_i, Y_i\} = E(\Delta_i \mid Z_i, Y_i), \qquad i = 1, \ldots, n.$$

In what follows we shall also assume that

$$p(Z_i, Y_i) \ge p_0 > 0; \tag{7}$$

this assumption says, in a sense, that the probability that an observation is not missing is always at least $p_0 > 0$. Now, consider the hypothetical situation where the above function $p$ is known and put

$$\hat{L}_p(\pi) := \frac{1}{n} \sum_{i=1}^n \frac{\Delta_i}{p(Z_i, Y_i)} I\{g_\pi(Z_i) \neq Y_i\}, \tag{8}$$

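where $g_\pi$ is as in (4). In passing we also note that (5) is the special case of (8) when $E(\Delta_i) = 1$ for all $i$. In fact, it is straightforward to see that $\hat{L}_p(\pi)$ satisfies $E[\hat{L}_p(\pi)] = L(\pi)$, where $L(\pi) = P\{g_\pi(Z) \neq Y\}$: conditioning on $(Z_i, Y_i)$ gives $E[\Delta_i I\{g_\pi(Z_i) \neq Y_i\}/p(Z_i, Y_i)] = E[I\{g_\pi(Z_i) \neq Y_i\}\, E(\Delta_i \mid Z_i, Y_i)/p(Z_i, Y_i)] = P\{g_\pi(Z_i) \neq Y_i\}$. It is important to mention that the idea in (8) is very similar to that used by Györfi et al. (2002; Chapter 26) for the unbiased estimation of a mean from censored data. Next, define the following revised version of the estimator $\hat{\pi}_n$ in (6) and let $g_{\hat{\pi}_n}$ be its corresponding classifier, i.e.,

$$\hat{\pi}_n = \arg\min_{\pi \in \mathcal{F}_\epsilon} \hat{L}_p(\pi), \tag{9}$$

$$g_{\hat{\pi}_n}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

To study the performance of $g_{\hat{\pi}_n}$, let $L(\hat{\pi}_n) = P\{g_{\hat{\pi}_n}(Z) \neq Y \mid D_n\}$ be the misclassification error of $g_{\hat{\pi}_n}$. Then we have the following result.

Theorem 2. Let $\mathcal{F}$ be a totally bounded class of functions mapping $\mathbb{R}^{d+p} \to [0, 1]$ containing the function $\pi^*(z) = P\{Y = 1 \mid Z = z\}$. Then for every $\epsilon$ and $\delta$ satisfying $\delta > 2\epsilon > 0$ one has

$$P\{L(\hat{\pi}_n) - L^* > \delta\} \le 2N_\infty(\epsilon, \mathcal{F}) \exp\{-2n(\delta/2 - \epsilon)^2 p_0^2\},$$

where $p_0$ is as in (7).

The proofs of the theorems will be deferred until all the results have been stated. The following corollary is an immediate consequence of the Borel-Cantelli lemma.

Corollary 1. Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. Also let $\mathcal{F}$ be the class of functions defined in Theorem 2, and let $\hat{\pi}_n$ be selected from $\mathcal{F}_{\epsilon_n}$ as in (9). If, as $n \to \infty$, $\log N_\infty(\epsilon_n, \mathcal{F})/n \to 0$, then $L(\hat{\pi}_n) \to L^*$ a.s.

Thus, if the missing probability mechanism $p(Z_i, Y_i)$ were known, the above approach would provide the theoretical basis to construct strongly consistent classifiers. Unfortunately, in practice, the missing probability mechanism is almost always unknown and must be estimated. In the next section we propose a kernel-based approach to overcome this problem.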
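The inverse-weighting criterion (8) and the skeleton estimate (9) can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' code: it assumes that $p(Z_i, Y_i)$ is supplied as a known array (the hypothetical situation of this subsection), represents a missing $V_i$ by NaN, and uses a hypothetical three-element cover of logistic functions $\pi_t(z) = 1/(1 + e^{-(z_1 - t)})$ chosen only for illustration. The names `ipw_empirical_error`, `skeleton_estimate` and `make_pi` are likewise hypothetical.

```python
import numpy as np


def ipw_empirical_error(pi, Z, Y, Delta, p):
    """Inverse-probability-weighted empirical error L_p(pi) from (8).

    pi    : candidate function mapping an (m, s) array of complete rows to m values in [0, 1]
    Z     : (n, s) covariate array; the V-part of row i may be NaN when Delta[i] == 0
    Y     : (n,) labels in {0, 1}
    Delta : (n,) indicators, 1 if Z_i is fully observed, 0 otherwise
    p     : (n,) known probabilities p(Z_i, Y_i), assumed >= p0 > 0 as in (7)
    """
    observed = Delta == 1
    weighted_loss = np.zeros(len(Y))
    # Plug-in classifier (4): g_pi(z) = 1 iff pi(z) > 1/2, evaluated on complete cases only.
    g = (pi(Z[observed]) > 0.5).astype(int)
    # Misclassified complete cases are weighted by 1 / p(Z_i, Y_i); incomplete cases
    # contribute 0 but still count in the 1/n normalisation.
    weighted_loss[observed] = (g != Y[observed]) / p[observed]
    return weighted_loss.mean()


def skeleton_estimate(cover, Z, Y, Delta, p):
    """Index and criterion value of the minimiser of (8) over a finite cover, as in (9)."""
    errors = [ipw_empirical_error(pi, Z, Y, Delta, p) for pi in cover]
    best = int(np.argmin(errors))  # ties go to the first minimiser
    return best, errors[best]


# Hypothetical cover: pi_t(z) = logistic(z_1 - t), so g_{pi_t}(z) = 1 iff z_1 > t.
def make_pi(t):
    return lambda z: 1.0 / (1.0 + np.exp(-(z[:, 0] - t)))


cover = [make_pi(t) for t in (1.0, 0.0, -3.0)]
Z = np.array([[-2.0, 0.0], [-1.0, np.nan], [0.5, 1.0], [1.5, np.nan], [2.5, 2.0]])
Y = np.array([0, 0, 1, 1, 1])
Delta = np.array([1, 0, 1, 0, 1])
p = np.full(5, 0.5)

best, err = skeleton_estimate(cover, Z, Y, Delta, p)
print(best, err)
```

With these toy data the middle candidate ($t = 0$) classifies every complete case correctly, so it is selected with criterion value 0, while the other two candidates each misclassify one complete case and get $(1/0.5)/5 = 0.4$. Dividing by $n$ rather than by the number of complete cases is what keeps (8) unbiased for $L(\pi)$.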

2.2 Kernel Regression

Let $p(Z_i, Y_i) = P\{\Delta_i = 1 \mid Z_i, Y_i\}$ be the missing probability mechanism, i.e., the conditional probability that $V_i$ is observed (recall that $Z_i = (X_i, V_i)$). Under the commonly used assumption of data Missing At Random (MAR), one assumes that the probability that $V_i$ is missing does not depend on $V_i$ itself. That is,

$$P\{\Delta_i = 1 \mid Z_i, Y_i\} = P\{\Delta_i = 1 \mid X_i, Y_i\} =: q(X_i, Y_i). \tag{11}$$

When $P\{\Delta_i = 1 \mid Z_i, Y_i\} = P\{\Delta_i = 1\}$, $V_i$ is said to be Missing Completely At Random (MCAR). For these definitions and a survey of other missing patterns one may refer to the book by Little and Rubin (2002). Now consider the following kernel-based estimator of the function $q(X_i, Y_i)$ defined in (11):

$$\hat{q}(X_i, Y_i) = \frac{\sum_{j=1, j \neq i}^n \Delta_j I\{Y_j = Y_i\} K\left(\frac{X_j - X_i}{h}\right)}{\sum_{j=1, j \neq i}^n I\{Y_j = Y_i\} K\left(\frac{X_j - X_i}{h}\right)}, \tag{12}$$

with the convention $0/0 = 0$, where $K: \mathbb{R}^d \to \mathbb{R}_+$ is any kernel with smoothing parameter $h$ (here $h = h_n \to 0$ as $n \to \infty$). Next, for each $\pi \in \mathcal{F}$, put

$$\hat{L}_{\hat{q}}(\pi) := \frac{1}{n} \sum_{i=1}^n \frac{\Delta_i}{\hat{q}(X_i, Y_i)} I\{g_\pi(Z_i) \neq Y_i\},$$

and define $\hat{\pi}_n = \arg\min_{\pi \in \mathcal{F}_\epsilon} \hat{L}_{\hat{q}}(\pi)$. Then the corresponding classifier is given by

$$g_{\hat{\pi}_n}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

To assess the performance of $g_{\hat{\pi}_n}$ we will make the following assumptions:

C1: The MAR assumption (11) holds with $q(X_i, Y_i) \ge q_0 > 0$, for some positive constant $q_0$ (compare with (7)).

C2: The random vector $X$ has a compactly supported density function $f(x)$, and $f$ is bounded away from zero on its support. Furthermore, $f$ and its first-order partial derivatives are uniformly bounded on its support.

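C3: The partial derivatives $\partial q(x, y)/\partial x_i$, $i = 1, \ldots, d$, exist and are bounded on the compact support of $f$, uniformly in $x$.

C4: The kernel $K$ satisfies $\int K(u)\,du = 1$, $\int |u_i| K(u)\,du < \infty$ for $i = 1, \ldots, d$, and $\|K\|_\infty < \infty$. The smoothing parameter $h$ satisfies $h \to 0$ and $nh^d \to \infty$ as $n \to \infty$.

The following theorem gives performance bounds for the classifier $g_{\hat{\pi}_n}$.

Theorem 3. Let $\mathcal{F}$ be as in Theorem 2 and define the classifier $g_{\hat{\pi}_n}$ as in (13). Also suppose that conditions C1-C4 hold.

(i) For every $\delta > 2\epsilon > 0$ there is an $n_0 > 0$ such that for all $n > n_0$,

$$P\{L(\hat{\pi}_n) - L^* > \delta\} \le 2N_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8} + 4n\left(e^{-c_1 ((\delta - 2\epsilon)/4)^2 n h^d} + e^{-c_2 n h^d}\right),$$

where $L(\hat{\pi}_n) = P\{g_{\hat{\pi}_n}(Z) \neq Y \mid D_n\}$ and where $c_1$ and $c_2$ are positive constants not depending on $n$, $\delta$, or $\epsilon$.

(ii) Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. If, as $n \to \infty$, $\log N_\infty(\epsilon_n, \mathcal{F})/n \to 0$ and $\log n/(nh^d) \to 0$, then $L(\hat{\pi}_n) \to L^*$ a.s.

The above results, as well as those in Theorem 2 and Corollary 1, are based on the requirement that $\mathcal{F}$ is totally bounded. Furthermore, the $\epsilon$-covering number $N_\infty(\epsilon, \mathcal{F})$ of the class $\mathcal{F}$ should not grow too fast (as $\epsilon$ gets closer and closer to 0). There are many important classes of functions that satisfy these requirements; here we give two examples.

Example 1 (Differentiable functions). Let $k_1, \ldots, k_d$ be non-negative integers and put $k = (k_1, \ldots, k_d)$ and $|k| = k_1 + \cdots + k_d$. Also, for any $g: \mathbb{R}^d \to \mathbb{R}$, let $D^{(k)} g(u) = \partial^{|k|} g(u)/\partial u_1^{k_1} \cdots \partial u_d^{k_d}$. Consider the class of functions with bounded partial derivatives of order up to $r$:

$$\mathcal{G} = \Big\{g: [0, 1]^d \to \mathbb{R} \,\Big|\, \sum_{|k| \le r} \sup_u |D^{(k)} g(u)| \le A < \infty\Big\}.$$

Then, for every $\epsilon > 0$, $\log N_\infty(\epsilon, \mathcal{G}) \le M\epsilon^{-\alpha}$, where $\alpha = d/r$ and $M = M(d, r)$. This result is due to Kolmogorov and Tikhomirov (1959).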

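Example 2. Consider the class $\Psi$ of all convex functions $\psi: C \to [0, 1]$, where $C \subset \mathbb{R}^d$ is compact and convex. If each $\psi$ satisfies the Lipschitz condition $|\psi(z_1) - \psi(z_2)| \le L|z_1 - z_2|$ for all $z_1, z_2 \in C$, then $\log N_\infty(\epsilon, \Psi) \le M\epsilon^{-d/2}$ for every $\epsilon > 0$, where $M = M(d, L)$; see van der Vaart and Wellner (1996).

2.3 Least-squares Regression

In this section we consider least-squares (LS) estimates of the function $q$. The method works as follows. Suppose that $q$ belongs to a known class of functions $\mathcal{Q}$ of the form $q: \mathbb{R}^d \times \{0, 1\} \to [q_0, 1]$, where $q_0$ is as in assumption C1. The least-squares estimate of $q$ is given by

$$\hat{q} = \arg\min_{q \in \mathcal{Q}} \frac{1}{n} \sum_{i=1}^n (\Delta_i - q(X_i, Y_i))^2.$$

Now, for each $\pi \in \mathcal{F}$, let

$$\hat{L}_{\hat{q}}(\pi) := \frac{1}{n} \sum_{i=1}^n \frac{\Delta_i}{\hat{q}(X_i, Y_i)} I\{g_\pi(Z_i) \neq Y_i\},$$

and define $\hat{\pi}_n = \arg\min_{\pi \in \mathcal{F}_\epsilon} \hat{L}_{\hat{q}}(\pi)$. In this case, we consider the following classifier:

$$g_{\hat{\pi}_n}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

To study the performance of $g_{\hat{\pi}_n}$ we also need the following standard notation from empirical process theory. Fix $(x_1, y_1), \ldots, (x_n, y_n)$ and let $N_1(\epsilon, \mathcal{Q}, \{(x_i, y_i)\}_{i=1}^n)$ be the $\epsilon$-covering number of the class $\mathcal{Q}$ with respect to the empirical $L_1$ distance on the points $(x_1, y_1), \ldots, (x_n, y_n)$. That is, $N_1(\epsilon, \mathcal{Q}, \{(x_i, y_i)\}_{i=1}^n)$ is the cardinality of the smallest subclass of functions

$$\mathcal{Q}_\epsilon = \{q_1, \ldots, q_{N(\epsilon)} \mid q_i: \mathbb{R}^d \times \{0, 1\} \to [q_0, 1]\}$$

such that for every $q \in \mathcal{Q}$ there is a $q' \in \mathcal{Q}_\epsilon$ with $\frac{1}{n}\sum_{i=1}^n |q(x_i, y_i) - q'(x_i, y_i)| < \epsilon$. For more on this one may refer, for example, to Pollard (1984) or van der Vaart and Wellner (1996). We then have the following result.

Theorem 4. Let $\mathcal{F}$ be as in Theorem 2 and suppose that condition C1 holds. Also, define the classifier $g_{\hat{\pi}_n}$ as in (14) and set $L(\hat{\pi}_n) = P\{g_{\hat{\pi}_n}(Z) \neq Y \mid D_n\}$. Then: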

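(i) For every $\delta > 2\epsilon > 0$ there is an $n_0 > 0$ such that for all $n > n_0$,

$$\begin{aligned} P\{L(\hat{\pi}_n) - L^* > \delta\} \le{}& 2N_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8} + 8E\Big[N_1\Big(\frac{(\delta - 2\epsilon)q_0^2}{64}, \mathcal{Q}, \{(X_i, Y_i)\}_{i=1}^n\Big)\Big] e^{-c_3 n(\delta - 2\epsilon)^2} \\ &+ 8E\Big[N_1\Big(\frac{(\delta - 2\epsilon)^2 q_0^4}{1024}, \mathcal{Q}, \{(X_i, Y_i)\}_{i=1}^n\Big)\Big] e^{-c_4 n(\delta - 2\epsilon)^4}, \end{aligned}$$

where $c_3$ and $c_4$ are positive constants not depending on $n$, $\delta$, or $\epsilon$.

(ii) Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. If, as $n \to \infty$,

$$\frac{\log N_\infty(\epsilon_n, \mathcal{F})}{n} \to 0 \quad \text{and} \quad \frac{\log E\big[N_1(c, \mathcal{Q}, \{(X_i, Y_i)\}_{i=1}^n)\big]}{n} \to 0 \ \text{ for every } c > 0,$$

then $L(\hat{\pi}_n) \to L^*$ a.s.

3 Proofs

Proof of Theorem 2. The proof is based on standard arguments (see, for example, Devroye et al. (1996; Sec. 28.3)) and goes as follows. First observe that for any classifier $g$,

$$\begin{aligned} P\{g(Z) \neq Y\} &= 1 - P\{g(Z) = Y\} = 1 - P\{g(Z) = 1, Y = 1\} - P\{g(Z) = 0, Y = 0\} \\ &= 1 - E\big[I\{g(Z) = 1\} I\{Y = 1\}\big] - E\big[I\{g(Z) = 0\} I\{Y = 0\}\big] \\ &= 1 - E\big[E(I\{g(Z) = 1\} I\{Y = 1\} \mid Z)\big] - E\big[E(I\{g(Z) = 0\} I\{Y = 0\} \mid Z)\big] \\ &= 1 - E\big[I\{g(Z) = 1\}\pi^*(Z) + I\{g(Z) = 0\}(1 - \pi^*(Z))\big], \end{aligned}$$

where $\pi^*(Z) = P\{Y = 1 \mid Z\}$. Thus,

$$\begin{aligned} P\{g(Z) \neq Y\} - L^* &= E\big[I\{g_B(Z) = 1\}\pi^*(Z) + I\{g_B(Z) = 0\}(1 - \pi^*(Z))\big] - E\big[I\{g(Z) = 1\}\pi^*(Z) + I\{g(Z) = 0\}(1 - \pi^*(Z))\big] \\ &= E\big[\pi^*(Z)\big(I\{g_B(Z) = 1\} - I\{g(Z) = 1\}\big) + (1 - \pi^*(Z))\big(I\{g_B(Z) = 0\} - I\{g(Z) = 0\}\big)\big] \\ &= E\big[(2\pi^*(Z) - 1)\big(I\{g_B(Z) = 1\} - I\{g(Z) = 1\}\big)\big] \\ &= E\big[|2\pi^*(Z) - 1|\, I\{g_B(Z) \neq g(Z)\}\big], \end{aligned} \tag{15}$$

in view of the definitions of $g_B$ and $\pi^*$ in (1) and (2).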
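Identity (15) is easy to sanity-check numerically when $Z$ takes finitely many values, since every expectation becomes a finite weighted sum. The sketch below assumes a hypothetical four-point distribution for $Z$ (weights `w`) and hypothetical values of $\pi^*$ (`pi_star`); it enumerates all $2^4$ classifiers $g$ and compares both sides of (15). The helper names are illustrative only.

```python
import itertools

import numpy as np

# Hypothetical discrete example: Z takes the values 0, 1, 2, 3 with probabilities w,
# and pi_star[k] = P{Y = 1 | Z = k}.
w = np.array([0.1, 0.2, 0.3, 0.4])
pi_star = np.array([0.9, 0.4, 0.5, 0.2])


def error(g):
    """P{g(Z) != Y} = E[I{g(Z) = 1}(1 - pi*(Z)) + I{g(Z) = 0} pi*(Z)]."""
    return float(np.sum(w * np.where(g == 1, 1.0 - pi_star, pi_star)))


g_B = (pi_star > 0.5).astype(int)  # Bayes classifier (1)
L_star = error(g_B)                # L* from (3)


def largest_gap_in_identity():
    """Max over all g of |(P{g(Z) != Y} - L*) - E[|2 pi*(Z) - 1| I{g_B(Z) != g(Z)}]|."""
    gap = 0.0
    for bits in itertools.product([0, 1], repeat=len(w)):
        g = np.array(bits)
        lhs = error(g) - L_star
        rhs = float(np.sum(w * np.abs(2.0 * pi_star - 1.0) * (g_B != g)))
        gap = max(gap, abs(lhs - rhs))
    return gap


print(L_star, largest_gap_in_identity())
```

At the point where $\pi^* = 1/2$ both labels are Bayes-optimal; (15) reflects this through the zero weight $|2\pi^*(Z) - 1|$, so changing $g$ there does not change the error.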

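Now let $\pi \in \mathcal{F}$ and put $L(\pi) = P\{g_\pi(Z) \neq Y\}$, where

$$g_\pi(z) = \begin{cases} 1 & \text{if } \pi(z) > 1/2, \\ 0 & \text{otherwise,} \end{cases}$$

and note that by (15)

$$L(\pi) - L(\pi^*) = E\big[|2\pi^*(Z) - 1|\, I\{g_B(Z) \neq g_\pi(Z)\}\big] \le 2E|\pi(Z) - \pi^*(Z)|, \tag{16}$$

where the last line follows since $|\pi^*(Z) - 0.5| \le |\pi(Z) - \pi^*(Z)|$ whenever $g_B(Z) \neq g_\pi(Z)$. Let $\pi' \in \mathcal{F}_\epsilon$ be such that $\pi^* \in B(\pi', \epsilon)$; this is possible since $\mathcal{F}_\epsilon$ is an $\epsilon$-cover of $\mathcal{F}$ and $\pi^* \in \mathcal{F}$. Since

$$\begin{aligned} \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) - L^* &\le 2E|\pi'(Z) - \pi^*(Z)| \quad \text{(by (16))} \\ &\le 2\sup_{z \in \mathbb{R}^{d+p}} |\pi'(z) - \pi^*(z)| \le 2\epsilon \quad (\text{because } \pi^* \in B(\pi', \epsilon)), \end{aligned} \tag{17}$$

one finds that for every $\delta > 2\epsilon > 0$

$$\begin{aligned} P\{L(\hat{\pi}_n) - L^* > \delta\} &\le P\Big\{L(\hat{\pi}_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon\Big\} \\ &= P\Big\{\big[L(\hat{\pi}_n) - \hat{L}_p(\hat{\pi}_n)\big] + \big[\hat{L}_p(\hat{\pi}_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi)\big] > \delta - 2\epsilon\Big\} \\ &\le P\Big\{2\sup_{\pi \in \mathcal{F}_\epsilon} |\hat{L}_p(\pi) - L(\pi)| > \delta - 2\epsilon\Big\} \\ &\le N_\infty(\epsilon, \mathcal{F}) \sup_{\pi \in \mathcal{F}_\epsilon} P\big\{|\hat{L}_p(\pi) - L(\pi)| > \delta/2 - \epsilon\big\}. \end{aligned}$$

Now, by Hoeffding's inequality (the summands in (8) take values in $[0, 1/p_0]$), this last probability can be bounded by $2\exp\{-2n(\delta/2 - \epsilon)^2 p_0^2\}$, and this completes the proof of the theorem.

Proof of Theorem 3. Part (i). For each $\pi \in \mathcal{F}$, let

$$\hat{L}_q(\pi) := \frac{1}{n} \sum_{i=1}^n \frac{\Delta_i}{q(X_i, Y_i)} I\{g_\pi(Z_i) \neq Y_i\}$$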

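and observe that

$$|\hat{L}_{\hat{q}}(\pi) - \hat{L}_q(\pi)| = \left|\frac{1}{n}\sum_{i=1}^n \frac{\Delta_i}{\hat{q}(X_i, Y_i)} I\{g_\pi(Z_i) \neq Y_i\} - \frac{1}{n}\sum_{i=1}^n \frac{\Delta_i}{q(X_i, Y_i)} I\{g_\pi(Z_i) \neq Y_i\}\right| \le \frac{1}{n}\sum_{i=1}^n \frac{\Delta_i\, |\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{\hat{q}(X_i, Y_i)\, q(X_i, Y_i)}.$$

Furthermore, since

$$L(\hat{\pi}_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) = \big[L(\hat{\pi}_n) - \hat{L}_{\hat{q}}(\hat{\pi}_n)\big] + \big[\hat{L}_{\hat{q}}(\hat{\pi}_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi)\big] \le 2\sup_{\pi \in \mathcal{F}_\epsilon} |\hat{L}_{\hat{q}}(\pi) - L(\pi)|,$$

one finds that

$$\begin{aligned} P\{L(\hat{\pi}_n) - L^* > \delta\} &\le P\Big\{L(\hat{\pi}_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon\Big\} \quad \text{(in view of (17))} \\ &\le P\Big\{\sup_{\pi \in \mathcal{F}_\epsilon} |\hat{L}_{\hat{q}}(\pi) - L(\pi)| > \frac{\delta}{2} - \epsilon\Big\} \\ &\le P\Big\{\frac{1}{n}\sum_{i=1}^n \frac{\Delta_i\, |\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{\hat{q}(X_i, Y_i)\, q(X_i, Y_i)} > \frac{\delta}{4} - \frac{\epsilon}{2}\Big\} + P\Big\{\sup_{\pi \in \mathcal{F}_\epsilon} |\hat{L}_q(\pi) - L(\pi)| > \frac{\delta}{4} - \frac{\epsilon}{2}\Big\} \\ &:= I + II \text{ (say)}. \end{aligned} \tag{18}$$

But, using the MAR assumption (see (11)), it is straightforward to see that $E[\hat{L}_q(\pi)] = L(\pi)$. Therefore

$$II \le N_\infty(\epsilon, \mathcal{F}) \sup_{\pi \in \mathcal{F}_\epsilon} P\Big\{|\hat{L}_q(\pi) - L(\pi)| > \frac{\delta}{4} - \frac{\epsilon}{2}\Big\} \le 2N_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8} \quad \text{(via Hoeffding's inequality)}. \tag{19}$$

As for the term I in (18), first note that

$$\begin{aligned} I &\le P\Big\{\frac{1}{n}\sum_{i=1}^n \frac{\Delta_i\, |\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{\hat{q}(X_i, Y_i)\, q(X_i, Y_i)} > \frac{\delta - 2\epsilon}{4},\ \bigcap_{i=1}^n \Big[\hat{q}(X_i, Y_i) \ge \frac{q_0}{2}\Big]\Big\} + P\Big\{\bigcup_{i=1}^n \Big[\hat{q}(X_i, Y_i) < \frac{q_0}{2}\Big]\Big\} \\ &\le \sum_{i=1}^n P\Big\{\frac{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{q_0^2/2} > \frac{\delta - 2\epsilon}{4}\Big\} + \sum_{i=1}^n P\Big\{\hat{q}(X_i, Y_i) < \frac{q_0}{2}\Big\}. \end{aligned} \tag{20}$$

It will be shown at the end of the proof that for every constant $b > 0$, and $n$ large enough,

$$P\{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > b\} \le 4e^{-C_3 n h^d b^2}, \tag{21}$$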

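where $C_3$ is a positive constant not depending on $n$ or $\epsilon$. Therefore, taking $b = q_0^2(\delta - 2\epsilon)/8$ in (21), the first sum on the r.h.s. of (20) is bounded by $4n e^{-C_4 n h^d (\delta - 2\epsilon)^2}$ for $n$ large enough, where $C_4 > 0$ does not depend on $n$, $\delta$, or $\epsilon$. Similarly, since

$$P\{\hat{q}(X_i, Y_i) < q_0/2\} \le P\{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > q_0/2\},$$

one finds (via (21)) that for $n$ large enough, the second sum on the r.h.s. of (20) is bounded by $4n e^{-C_5 n h^d}$, where the constant $C_5$ is positive and does not depend on $n$ or $\epsilon$. Putting the above together, we have shown that for $n$ large enough,

$$I \le 4n e^{-C_4 n h^d (\delta - 2\epsilon)^2} + 4n e^{-C_5 n h^d}.$$

This completes the proof of part (i) of Theorem 3. Part (ii) follows from the Borel-Cantelli lemma.

Proof of (21). Since $|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| \le 1$, it is sufficient to prove (21) for $0 < b \le 1$. Now, let

$$\begin{aligned} S(X_i, Y_i) &= f(X_i)\, P\{Y = y \mid X = X_i\}\big|_{y = Y_i}\, q(X_i, Y_i), \\ \hat{S}(X_i, Y_i) &= \frac{1}{(n-1)h^d} \sum_{j=1, j \neq i}^n \Delta_j I\{Y_j = Y_i\} K\Big(\frac{X_j - X_i}{h}\Big), \\ R(X_i, Y_i) &= f(X_i)\, P\{Y = y \mid X = X_i\}\big|_{y = Y_i}, \\ \hat{R}(X_i, Y_i) &= \frac{1}{(n-1)h^d} \sum_{j=1, j \neq i}^n I\{Y_j = Y_i\} K\Big(\frac{X_j - X_i}{h}\Big), \end{aligned}$$

and observe that

$$\begin{aligned} |\hat{q}(X_i, Y_i) - q(X_i, Y_i)| &= \left|\frac{\hat{S}(X_i, Y_i)}{\hat{R}(X_i, Y_i)} - \frac{S(X_i, Y_i)}{R(X_i, Y_i)}\right| \\ &= \left|\frac{\hat{S}(X_i, Y_i)}{\hat{R}(X_i, Y_i)} \cdot \frac{R(X_i, Y_i) - \hat{R}(X_i, Y_i)}{R(X_i, Y_i)} + \frac{\hat{S}(X_i, Y_i) - S(X_i, Y_i)}{R(X_i, Y_i)}\right| \\ &\le \frac{|\hat{R}(X_i, Y_i) - R(X_i, Y_i)|}{R(X_i, Y_i)} + \frac{|\hat{S}(X_i, Y_i) - S(X_i, Y_i)|}{R(X_i, Y_i)}, \end{aligned}$$

where we have used the fact that $\hat{S}(X_i, Y_i)/\hat{R}(X_i, Y_i) \le 1$. Therefore, since $R(X_i, Y_i) > C_6 > 0$ (by assumption C2), one finds that for every $b > 0$

$$P\{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > b\} \le P\{|\hat{S}(X_i, Y_i) - S(X_i, Y_i)| > C_7 b\} + P\{|\hat{R}(X_i, Y_i) - R(X_i, Y_i)| > C_7 b\} := \pi_1 + \pi_2, \tag{22}$$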

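where $C_7 = C_6/2$. Now, by the results of Mojirsheibani et al. (2011; Lemma A.1, with $g(Z, Y) = 1$), one finds

$$\big|S(X_i, Y_i) - E[\hat{S}(X_i, Y_i) \mid X_i, Y_i]\big| \le Ch, \tag{23}$$

where $C > 0$ is a constant not depending on $n$. Therefore

$$\begin{aligned} \pi_1 &\le P\Big\{\big|\hat{S}(X_i, Y_i) - E[\hat{S}(X_i, Y_i) \mid X_i, Y_i]\big| + \big|E[\hat{S}(X_i, Y_i) \mid X_i, Y_i] - S(X_i, Y_i)\big| > C_7 b\Big\} \\ &\le P\Big\{\big|\hat{S}(X_i, Y_i) - E[\hat{S}(X_i, Y_i) \mid X_i, Y_i]\big| > C_8 b\Big\} \quad \text{(for large } n \text{ by (23), where } C_8 = C_7/2) \\ &= E\Big[P\Big\{\big|\hat{S}(X_i, Y_i) - E[\hat{S}(X_i, Y_i) \mid X_i, Y_i]\big| > C_8 b \,\Big|\, X_i, Y_i\Big\}\Big] \\ &= E\Big[P\Big\{\Big|\frac{1}{n-1}\sum_{j=1, j \neq i}^n \Gamma_j(X_i, Y_i)\Big| > C_8 b \,\Big|\, X_i, Y_i\Big\}\Big], \end{aligned} \tag{24}$$

where

$$\Gamma_j(X_i, Y_i) = h^{-d}\Big[\Delta_j I\{Y_j = Y_i\} K\Big(\frac{X_j - X_i}{h}\Big) - E\Big(\Delta_j I\{Y_j = Y_i\} K\Big(\frac{X_j - X_i}{h}\Big) \,\Big|\, X_i, Y_i\Big)\Big].$$

However, conditional on $(X_i, Y_i)$, the terms $\Gamma_j(X_i, Y_i)$, $j = 1, \ldots, n$, $j \neq i$, are independent, zero-mean random variables, bounded by $-h^{-d}\|K\|_\infty$ and $+h^{-d}\|K\|_\infty$. We also note that

$$\mathrm{Var}(\Gamma_j(X_i, Y_i) \mid X_i, Y_i) = E\big[\Gamma_j^2(X_i, Y_i) \mid X_i, Y_i\big] \le h^{-d}\|K\|_\infty \|f\|_\infty.$$

Therefore, by Bennett's inequality (Bennett, 1962), for any fixed (nonrandom) $x$ and $y$,

$$P\Big\{\Big|\frac{1}{n-1}\sum_{j=1, j \neq i}^n \Gamma_j(X_i, Y_i)\Big| > C_8 b \,\Big|\, X_i = x, Y_i = y\Big\} \le 2\exp\Big\{-\frac{(n-1)h^d C_8^2 b^2}{2\|K\|_\infty(\|f\|_\infty + C_8 b/3)}\Big\},$$

where the bound does not depend on $x$ or $y$. Therefore, in view of (24) and the fact that $0 < b \le 1$, one finds (for $n$ large enough)

$$\pi_1 \le 2\exp\Big\{-\frac{(n-1)h^d C_8^2 b^2}{2\|K\|_\infty(\|f\|_\infty + C_8/3)}\Big\}.$$

Similarly, one can also show (with, in fact, less effort) that, for $n$ large enough,

$$\pi_2 \le 2\exp\Big\{-\frac{(n-1)h^d C_9^2 b^2}{2\|K\|_\infty(\|f\|_\infty + C_9/3)}\Big\},$$

where $C_9$ is a positive constant not depending on $n$ or $b$. This completes the proof of (21).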
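The proof of (21) works with $\hat{q} = \hat{S}/\hat{R}$, where the common factor $1/((n-1)h^d)$ cancels. Below is a minimal NumPy sketch of the leave-one-out estimator (12) computed in that ratio form. The Gaussian kernel, the bandwidth $h = 0.1$, the simulated MAR mechanism $q(x, y) = 0.5 + 0.3x$ and the function names are assumptions made only for illustration; they are not taken from the paper. The $O(n^2)$ pairwise computation is intended for small $n$.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian density on R^d, applied along the last axis (an assumed choice of K)."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)


def kernel_q_hat(X, Y, Delta, h, kernel=gaussian_kernel):
    """Leave-one-out kernel estimate (12) of q(X_i, Y_i) = P{Delta_i = 1 | X_i, Y_i}.

    X : (n, d) always-observed covariates; Y : (n,) labels; Delta : (n,) 0/1 indicators.
    """
    X = np.asarray(X, dtype=float)
    Delta = np.asarray(Delta, dtype=float)
    n = X.shape[0]
    # weights[i, j] = I{Y_j = Y_i} K((X_j - X_i) / h)
    diffs = (X[None, :, :] - X[:, None, :]) / h
    weights = kernel(diffs) * (Y[None, :] == Y[:, None])
    np.fill_diagonal(weights, 0.0)     # drop the j = i term
    numerator = weights @ Delta        # proportional to S_hat(X_i, Y_i)
    denominator = weights.sum(axis=1)  # proportional to R_hat(X_i, Y_i)
    # Convention 0/0 = 0 (e.g. when no other observation shares the label Y_i).
    return np.divide(numerator, denominator, out=np.zeros(n), where=denominator > 0)


# Simulated MAR data (assumed for illustration): whether V_i is observed depends on X_i only.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(size=(n, 1))
Y = rng.integers(0, 2, size=n)
q_true = 0.5 + 0.3 * X[:, 0]
Delta = (rng.uniform(size=n) < q_true).astype(float)

q_hat = kernel_q_hat(X, Y, Delta, h=0.1)
print("mean |q_hat - q|:", np.mean(np.abs(q_hat - q_true)))
```

Swapping `gaussian_kernel` for another kernel satisfying C4 only changes the weights; the leave-one-out sums and the $0/0 = 0$ convention follow (12) as written.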

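Proof of Theorem 4. Part (i). Using (17) and the arguments that lead to (18), we find $P\{L(\hat{\pi}_n) - L^* > \delta\} \le I + II$, where II is as in (18) and

$$I := P\Big\{\frac{1}{n}\sum_{i=1}^n \frac{\Delta_i\, |\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{\hat{q}(X_i, Y_i)\, q(X_i, Y_i)} > \frac{\delta}{4} - \frac{\epsilon}{2}\Big\}.$$

But, by (19), $II \le 2N_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8}$. To deal with the term I, first note that since $\hat{q}, q \ge q_0$, one finds

$$\begin{aligned} I &\le P\Big\{\frac{1}{n}\sum_{i=1}^n |\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > \frac{(\delta - 2\epsilon)q_0^2}{4}\Big\} \\ &\le P\Big\{\Big|\frac{1}{n}\sum_{i=1}^n |\hat{q}(X_i, Y_i) - q(X_i, Y_i)| - E\big[|\hat{q}(X, Y) - q(X, Y)| \,\big|\, D_n\big]\Big| + E\big[|\hat{q}(X, Y) - q(X, Y)| \,\big|\, D_n\big] > \frac{(\delta - 2\epsilon)q_0^2}{4}\Big\} \\ &\le P\Big\{\sup_{q' \in \mathcal{Q}} \Big|\frac{1}{n}\sum_{i=1}^n |q'(X_i, Y_i) - q(X_i, Y_i)| - E|q'(X, Y) - q(X, Y)|\Big| > \frac{(\delta - 2\epsilon)q_0^2}{8}\Big\} + P\Big\{E\big[|\hat{q}(X, Y) - q(X, Y)| \,\big|\, D_n\big] > \frac{(\delta - 2\epsilon)q_0^2}{8}\Big\} \\ &:= I(A) + I(B). \end{aligned} \tag{25}$$

Standard results from empirical process theory (see, for example, Pollard (1984)) yield

$$I(A) \le 8E\Big[N_1\Big(\frac{(\delta - 2\epsilon)q_0^2}{64}, \mathcal{Q}, \{(X_i, Y_i)\}_{i=1}^n\Big)\Big]\, e^{-n(\delta - 2\epsilon)^2 q_0^4/(128 \cdot 8^2)}.$$

As for the term I(B), put

$$S_n(q) = \frac{1}{n}\sum_{i=1}^n [\Delta_i - q(X_i, Y_i)]^2$$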

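and observe that

$$\begin{aligned} I(B) &\le P\Big\{E\big[|\hat{q}(X, Y) - q(X, Y)|^2 \,\big|\, D_n\big] > \frac{(\delta - 2\epsilon)^2 q_0^4}{64}\Big\} \quad \text{(by the Cauchy-Schwarz inequality)} \\ &= P\Big\{E\big[(\Delta - \hat{q}(X, Y))^2 \,\big|\, D_n\big] - E(\Delta - q(X, Y))^2 > \frac{(\delta - 2\epsilon)^2 q_0^4}{64}\Big\} \quad \text{(since } q(X, Y) = E(\Delta \mid X, Y)) \\ &\le P\Big\{\sup_{q' \in \mathcal{Q}} \big|S_n(q') - E(\Delta - q'(X, Y))^2\big| > \frac{(\delta - 2\epsilon)^2 q_0^4}{128}\Big\}, \end{aligned}$$

where the last line above follows from the following argument:

$$\begin{aligned} E\big[(\Delta - \hat{q}(X, Y))^2 \,\big|\, D_n\big] - E(\Delta - q(X, Y))^2 &= E\big[(\Delta - \hat{q}(X, Y))^2 \,\big|\, D_n\big] - \inf_{q' \in \mathcal{Q}} E(\Delta - q'(X, Y))^2 \\ &= \sup_{q' \in \mathcal{Q}} \Big\{E\big[(\Delta - \hat{q}(X, Y))^2 \,\big|\, D_n\big] - S_n(\hat{q}) + S_n(\hat{q}) - E(\Delta - q'(X, Y))^2\Big\} \\ &\le \sup_{q' \in \mathcal{Q}} \Big\{\big|E\big[(\Delta - \hat{q}(X, Y))^2 \,\big|\, D_n\big] - S_n(\hat{q})\big| + S_n(q') - E(\Delta - q'(X, Y))^2\Big\} \\ &\le 2\sup_{q' \in \mathcal{Q}} \big|S_n(q') - E(\Delta - q'(X, Y))^2\big|, \end{aligned}$$

and where we have used the fact that $S_n(\hat{q}) - S_n(q') \le 0$ (by the definition of $\hat{q}$). Therefore

$$I(B) \le 8E\Big[N_1\Big(\frac{(\delta - 2\epsilon)^2 q_0^4}{1024}, \mathcal{Q}, \{(X_i, Y_i)\}_{i=1}^n\Big)\Big]\, e^{-Cn(\delta - 2\epsilon)^4},$$

where $C > 0$ does not depend on $n$ or $\epsilon$. Part (ii) follows from the Borel-Cantelli lemma.

Acknowledgements. The authors would like to thank Professor Hamedani and the referees for their helpful comments.

References

[1] Bennett, G. (1962). Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57.

[2] Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.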

[3] Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer.

[4] Kolmogorov, A.N. and Tikhomirov, V.M. (1959). ε-entropy and ε-capacity of sets in function spaces. Uspekhi Matematicheskikh Nauk, 14.

[5] Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis With Missing Data. Wiley, New York.

[6] Mojirsheibani, M., Montazeri, Z., and Rajaeefard, A. (2011). On classification with incomplete covariates. Statistics, 45.

[7] Mojirsheibani, M. and Montazeri, Z. (2007). Statistical classification with missing covariates. Journal of the Royal Statistical Society, Ser. B, 69.

[8] Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.

[9] van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes with Applications to Statistics. Springer-Verlag, New York.

[10] Willard, S. (2004). General Topology. Dover Publications.
