On Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates


Journal of Statistical Theory and Applications, Volume …, Number 4, …, pp. …, ISSN …

Majid Mojirsheibani and Zahra Montazeri

Abstract

This article deals with the two-group classification problem, where the class conditional probability $\pi^*(z) = P\{Y = 1 \mid Z = z\}$ belongs to a known class of functions $\mathcal{F}$ which is totally bounded with respect to the supremum norm. Given an $\epsilon$-cover $\mathcal{F}_\epsilon$ of $\mathcal{F}$, we consider kernel regression methods for constructing classifiers using members of $\mathcal{F}_\epsilon$. A Horvitz-Thompson-type inverse weighting approach will be used to handle the presence of incomplete covariates in the data. Conditions under which the resulting classifiers are strongly consistent are also given.

Key Words and Phrases. Classification, consistency, empirical process, covering number.

AMS 2000 Subject Classifications. 62H30.

M. Mojirsheibani: Department of Mathematics, California State University Northridge, CA 9330, USA. majid.mojirsheibani@csun.edu

Z. Montazeri: Department of Epidemiology and Community Medicine, Faculty of Medicine, University of Ottawa, 45 Smyth 358) Ottawa, ON, KH 8M5. zmontaze@uottawa.ca

1 Introduction

Consider the following standard two-group classification problem. Let $(Z, Y)$ be a random pair, where $Z \in \mathbb{R}^s$ is a random vector of covariates (or predictors) and $Y \in \{0, 1\}$ has to be predicted based on the vector $Z$. More precisely, one would like to find a function (a classifier) $g: \mathbb{R}^s \to \{0, 1\}$ for which the misclassification error probability $L(g) = P\{g(Z) \neq Y\}$ is as small as possible. The best classifier, called the Bayes classifier and denoted by $g_B$, is given by

$$g_B(z) = \begin{cases} 1 & \text{if } \pi^*(z) > 1/2, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where

$$\pi^*(z) = P\{Y = 1 \mid Z = z\} = E(Y \mid Z = z). \tag{2}$$

(For a proof of this fact see, for example, Devroye et al. (1996; Chapter 2).) The error of this classifier will be denoted by $L^*$ throughout this paper, i.e.,

$$L^* := P\{g_B(Z) \neq Y\}. \tag{3}$$

In passing, we also note that if $Z$ and $Y$ are independent then $\pi^*(z)$ is a constant function of $z$ and is in fact equal to $P\{Y = 1\}$. In this extreme case, $g_B(Z)$ is either always 1 or always 0. On the other hand, if $Y = I\{Z \in B\}$ for some $B \subset \mathbb{R}^s$ then $\pi^*(z) = I\{z \in B\}$ and also $g_B(z) = I\{z \in B\}$.

In practice one does not know the underlying probability distribution of the pair $(Z, Y)$, and therefore finding $g_B$ is virtually impossible. However, in statistics, one usually has access to a training sample $(Z_1, Y_1), (Z_2, Y_2), \dots, (Z_n, Y_n)$ drawn from the underlying distribution of $(Z, Y)$. The goal is then to construct a data-based classification rule $g_n$ whose conditional error rate

$$L_n(g_n) = P\{g_n(Z) \neq Y \mid (Z_i, Y_i),\ i = 1, \dots, n\}$$

is in some sense small. A desirable property for a data-based classifier is consistency: a classifier $g_n$ is said to be consistent if $L_n(g_n)$ converges to $L(g_B)$ in probability. If the convergence holds almost surely then $g_n$ is said to be strongly consistent.

Next, let $\mathcal{F}$ be a given class of functions $\pi: \mathbb{R}^s \to [0, 1]$. For any real-valued function $f$ on $\mathbb{R}^s$, let $\|f\|_\infty = \sup_{z \in \mathbb{R}^s} |f(z)|$ be its usual supremum norm and put

$$B(\pi, \epsilon) = \{h: \mathbb{R}^s \to [0, 1] \mid \|\pi - h\|_\infty < \epsilon\},$$

i.e., $B(\pi, \epsilon)$ is the open ball of functions, centered at $\pi$, with $\|\cdot\|_\infty$-radius $\epsilon > 0$. Suppose that the finite set of functions $\mathcal{F}_\epsilon = \{\pi_1, \dots, \pi_{N(\epsilon)}\}$, where $\pi_i: \mathbb{R}^s \to [0, 1]$, $1 \le i \le N(\epsilon) < \infty$, is an $\epsilon$-cover of the family $\mathcal{F}$ in the usual sense that

$$\sup_{\pi \in \mathcal{F}} \min_{1 \le i \le N(\epsilon)} \|\pi - \pi_i\|_\infty < \epsilon.$$

Note that $\mathcal{F} \subset \bigcup_{i=1}^{N(\epsilon)} B(\pi_i, \epsilon)$. Here, each member of $\mathcal{F}_\epsilon$ may or may not be a member of $\mathcal{F}$. The covering number of the family $\mathcal{F}$ with respect to the $\|\cdot\|_\infty$-norm, denoted by $\mathcal{N}_\infty(\epsilon, \mathcal{F})$, is the cardinality $|\mathcal{F}_\epsilon|$ of the smallest $\epsilon$-cover of $\mathcal{F}$. If $\mathcal{N}_\infty(\epsilon, \mathcal{F}) < \infty$ for every $\epsilon > 0$ then $\mathcal{F}$ is said to be totally bounded. In passing we also note the close relationship between compactness and total boundedness (also called pre-compactness): compactness implies total boundedness, but the converse is not in general true. In fact, a metric space is compact if and only if it is complete and totally bounded (this is the Heine-Borel theorem for general metric spaces). For more on these and other properties of compact metric spaces one may refer, for example, to Willard (2004).

Next, for each $\pi \in \mathcal{F}$ consider the classifier

$$g_\pi(z) = \begin{cases} 1 & \text{if } \pi(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

Let

$$\hat{L}_n(\pi) = \frac{1}{n} \sum_{i=1}^{n} I\{g_\pi(Z_i) \neq Y_i\} \tag{5}$$

be the empirical error rate of $g_\pi$. Then the so-called skeleton estimate of $\pi^*$, selected from $\mathcal{F}_\epsilon$, is given by (see, for example, Chapter 28 of Devroye et al. (1996)):

$$\hat\pi_n = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_n(\pi), \tag{6}$$

with the corresponding sample-based classifier (see (1)):

$$g_{\hat\pi_n}(z) = \begin{cases} 1 & \text{if } \hat\pi_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases}$$

Let $L_n(\hat\pi_n) = P\{g_{\hat\pi_n}(Z) \neq Y \mid (Z_i, Y_i),\ i = 1, \dots, n\}$ be the error of the classifier $g_{\hat\pi_n}$. The following theorem establishes the consistency of the resulting classifier (see Theorem 28.1 of Devroye et al. (1996)).

Theorem 1 Let $\mathcal{F}$ be a totally bounded class of functions mapping $\mathbb{R}^s \to [0, 1]$. If $\pi^* \in \mathcal{F}$ then there is a sequence $\{\epsilon_n > 0\}$ and a sequence $\mathcal{F}_{\epsilon_n} \subset \mathcal{F}$ such that for the skeleton estimate $\hat\pi_n$, selected from $\mathcal{F}_{\epsilon_n}$, one has

$$L_n(\hat\pi_n) \xrightarrow{\text{a.s.}} L^*.$$

Here $\epsilon_n$ can be taken as the smallest positive number for which $\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F}) \le n\epsilon_n^2$.

See Devroye et al. (1996; Chapter 28) for a proof of this result.

In the next section we shall consider the case where some of the components of the covariate vectors $Z_i$ may be missing. More specifically, we study the case with $Z_i = (X_i, V_i) \in \mathbb{R}^{d+p}$, $d + p = s$, where $X_i \in \mathbb{R}^d$, $d \ge 1$, is always observable, but $V_i \in \mathbb{R}^p$ may be missing for the $i$-th observation. To deal with this difficulty, we propose a Horvitz-Thompson-type estimation approach which works by weighting the complete cases by the inverse of the missing data probabilities. The problem of classification with missing covariates has also been addressed by Mojirsheibani and Montazeri (2007), under different assumptions.

2 Main Results

2.1 Motivation

In this section we consider the case where some components of the $Z_i$'s may be missing. More specifically, we consider the situation where $Z_i = (X_i, V_i) \in \mathbb{R}^{d+p}$, and where $X_i \in \mathbb{R}^d$, $d \ge 1$, is always observable, but $V_i \in \mathbb{R}^p$ may be missing for the $i$-th observation. We also define the random variables

$$\Delta_i = \begin{cases} 0 & \text{if } V_i \text{ is missing,} \\ 1 & \text{otherwise,} \end{cases} \qquad i = 1, \dots, n.$$

Now, the data may be represented by

$$D_n = \{(Z_1, Y_1, \Delta_1), \dots, (Z_n, Y_n, \Delta_n)\} = \{(X_1, V_1, Y_1, \Delta_1), \dots, (X_n, V_n, Y_n, \Delta_n)\}.$$

Let $(Z, Y)$ be a new observation, for which $Y \in \{0, 1\}$ has to be predicted based on $Z$ (and the data $D_n$); here $(Z, Y) \stackrel{\text{iid}}{=} (Z_1, Y_1)$. Clearly the minimization in (6) is no longer possible under the current setup where there are missing $V_i$'s among the data. This is because the computation of the right hand side of (5) requires every $Z_i$, $i = 1, \dots, n$. Using the complete cases alone in (5) will not solve the problem; here a complete case refers to a fully observable $Z_i$ (i.e., when $\Delta_i = 1$). The reason is that if we choose the estimate as the minimizer of

$$\bar{L}_n(\pi) := \frac{1}{n} \sum_{i=1}^{n} \Delta_i\, I\{g_\pi(Z_i) \neq Y_i\},$$

then the corresponding empirical process $\{\bar{L}_n(\pi) - L(\pi) \mid \pi \in \mathcal{F}\}$ is not centered in general (not even asymptotically), and centering plays a crucial role in establishing the theoretical validity of the resulting classifier. In fact, it is clear that $\bar{L}_n(\pi)$ is not in general unbiased for $L(\pi)$.

To motivate the procedures of this section, we also need to define the missing probability mechanism, i.e., the quantity

$$p(Z_i, Y_i) := P\{\Delta_i = 1 \mid Z_i, Y_i\} = E(\Delta_i \mid Z_i, Y_i), \qquad i = 1, \dots, n.$$

In what follows we shall also assume that

$$p(Z_i, Y_i) \ge p_0 > 0; \tag{7}$$

this assumption says that the probability that an observation is fully observed is bounded below by a positive constant $p_0$. Now, consider the hypothetical situation where the above function $p$ is known and put

$$\hat{L}_p(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{p(Z_i, Y_i)}\, I\{g_\pi(Z_i) \neq Y_i\}, \tag{8}$$

where $g_\pi$ is as in (4). In passing we also note that (5) is the special case of (8) when $E(\Delta_i) = 1$ for all $i$. In fact, it is straightforward to see that $\hat{L}_p(\pi)$ satisfies $E[\hat{L}_p(\pi)] = L(\pi)$, where $L(\pi) = P\{g_\pi(Z) \neq Y\}$. It is important to mention that the idea in (8) is very similar to that used by Györfi et al. (2002; Chapter 26) for the unbiased estimation of a mean from censored data. Next, define the following revised version of the estimator $\hat\pi_n$ in (6),

$$\tilde\pi_n = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_p(\pi), \tag{9}$$

and let $g_{\tilde\pi_n}$ be its corresponding classifier, i.e.,

$$g_{\tilde\pi_n}(z) = \begin{cases} 1 & \text{if } \tilde\pi_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

To study the performance of $g_{\tilde\pi_n}$, let $L_n(\tilde\pi_n) = P\{g_{\tilde\pi_n}(Z) \neq Y \mid D_n\}$ be the misclassification error of $g_{\tilde\pi_n}$. Then we have the following result.

Theorem 2 Let $\mathcal{F}$ be a totally bounded class of functions mapping $\mathbb{R}^{d+p} \to [0, 1]$ containing the function $\pi^*(z) = P\{Y = 1 \mid Z = z\}$. Then for every $\epsilon$ and $\delta$ satisfying $\delta > 2\epsilon > 0$ one has

$$P\{L_n(\tilde\pi_n) - L^* > \delta\} \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F}) \exp\{-2n(\delta/2 - \epsilon)^2 p_0^2\},$$

where $p_0$ is as in (7).

The proofs of the theorems will be deferred until all the results have been stated. The following corollary is an immediate consequence of the Borel-Cantelli lemma:

Corollary 1 Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. Also let $\mathcal{F}$ be the class of functions defined in Theorem 2. If, as $n \to \infty$,

$$\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F})}{n} \to 0,$$

then $L_n(\tilde\pi_n) \xrightarrow{\text{a.s.}} L^*$.

Thus, if the missing probability mechanism $p(Z_i, Y_i)$ were known, the above approach would provide the theoretical basis to construct strongly consistent classifiers. Unfortunately, in practice, the missing probability mechanism is almost always unknown and must be estimated. In the next section we propose a kernel-based approach to overcome this problem.
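The hypothetical known-$p$ procedure of this section, i.e., the inverse-weighted empirical error in (8) followed by the minimization in (9) over a finite cover, can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' implementation: it assumes that the cover is supplied as a finite list of Python callables, that the observation probabilities $p(Z_i, Y_i)$ are passed in as an array `p_obs`, and that NumPy is available. The function names `ipw_empirical_error` and `skeleton_estimate` are hypothetical.

```python
import numpy as np


def ipw_empirical_error(pi_vals, y, delta, p_obs):
    """Inverse-probability-weighted empirical error of g_pi, in the spirit of (8).

    pi_vals : values pi(Z_i); any placeholder may be used where Z_i is incomplete
    y       : labels in {0, 1}
    delta   : 1 if V_i is observed, 0 if it is missing
    p_obs   : p(Z_i, Y_i) = P(delta_i = 1 | Z_i, Y_i), assumed known and >= p_0 > 0
    """
    g = (np.asarray(pi_vals) > 0.5).astype(int)          # plug-in classifier g_pi
    mistakes = (g != np.asarray(y)).astype(float)        # I{g_pi(Z_i) != Y_i}
    # Complete cases get weight 1/p; incomplete cases get weight 0.
    weights = np.where(np.asarray(delta) == 1, 1.0 / np.asarray(p_obs, dtype=float), 0.0)
    return float(np.mean(weights * mistakes))


def skeleton_estimate(cover, z, y, delta, p_obs):
    """Return the index of the cover member minimizing the weighted error, and all errors."""
    errors = [ipw_empirical_error(pi(z), y, delta, p_obs) for pi in cover]
    return int(np.argmin(errors)), errors
```

For example, with a two-element cover made of the constant functions 0.9 and 0.1, `skeleton_estimate` returns the index of whichever constant has the smaller weighted error on the sample. Rows with a missing $V_i$ never influence the result, because their weight is zero.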

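2.2 Kernel Regression

Let $p(Z_i, Y_i) = P\{\Delta_i = 1 \mid Z_i, Y_i\}$ be the missing probability mechanism, i.e., the conditional probability that $V_i$ is observed (recall that $Z_i = (X_i, V_i)$). Under the commonly used assumption of data Missing At Random (MAR), one assumes that the probability that $V_i$ is missing does not depend on $V_i$ itself. That is,

$$P\{\Delta_i = 1 \mid Z_i, Y_i\} = P\{\Delta_i = 1 \mid X_i, Y_i\} =: q(X_i, Y_i). \tag{11}$$

When $P\{\Delta_i = 1 \mid Z_i, Y_i\} = P\{\Delta_i = 1\}$ then $V_i$ is said to be Missing Completely At Random (MCAR). For these definitions and a survey of other missing patterns one may refer to the book by Little and Rubin (2002). Now consider the following kernel-based estimator of the function $q(X_i, Y_i)$ defined in (11):

$$\hat{q}(X_i, Y_i) = \frac{\sum_{j=1,\, j \neq i}^{n} \Delta_j\, I\{Y_j = Y_i\}\, K\!\left(\frac{X_j - X_i}{h}\right)}{\sum_{j=1,\, j \neq i}^{n} I\{Y_j = Y_i\}\, K\!\left(\frac{X_j - X_i}{h}\right)}, \tag{12}$$

with the convention $0/0 = 0$, where $K: \mathbb{R}^d \to \mathbb{R}_+$ is any kernel with the smoothing parameter $h$ (here $h \equiv h(n) \to 0$, as $n \to \infty$). Next, for each $\pi \in \mathcal{F}$, put

$$\hat{L}_{\hat q}(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{\hat{q}(X_i, Y_i)}\, I\{g_\pi(Z_i) \neq Y_i\},$$

and define

$$\check\pi_n = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_{\hat q}(\pi).$$

Then the corresponding classifier is given by

$$g_{\check\pi_n}(z) = \begin{cases} 1 & \text{if } \check\pi_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

To assess the performance of $g_{\check\pi_n}$ we will make the following assumptions:

C1: The MAR assumption (11) holds with $q(X_i, Y_i) \ge q_0 > 0$, for some positive constant $q_0$ (compare with (7)).

C2: The random vector $X$ has a compactly supported density function, $f(x)$, and $f$ is bounded away from zero on its support. Furthermore, $f$ and its first-order partial derivatives are uniformly bounded on its support.

C3: The partial derivatives $\frac{\partial}{\partial x_i} q(x, y)$, where $i = 1, \dots, d$, exist and are bounded on the compact support of $f$, uniformly in $x$.

C4: The kernel $K$ satisfies $\int K(u)\,du = 1$ and $\int |u_i| K(u)\,du < \infty$, $i = 1, \dots, d$, and $\|K\|_\infty < \infty$. The smoothing parameter $h$ satisfies $h \to 0$ and $nh^d \to \infty$, as $n \to \infty$.

The following theorem gives performance bounds for the classifier $g_{\check\pi_n}$.

Theorem 3 Let $\mathcal{F}$ be as in Theorem 2 and define the classifier $g_{\check\pi_n}$ as in (13). Also suppose that conditions C1-C4 hold.

(i) For every $\delta > 2\epsilon > 0$ there is an $n_0 > 0$ such that for all $n > n_0$,

$$P\{L_n(\check\pi_n) - L^* > \delta\} \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8} + 4n \left( e^{-c_1 ((\delta - 2\epsilon)/4)^2 n h^d} + e^{-c_2 n h^d} \right),$$

where $L_n(\check\pi_n) = P\{g_{\check\pi_n}(Z) \neq Y \mid D_n\}$ and where $c_1$ and $c_2$ are positive constants not depending on $n$, $\delta$, or $\epsilon$.

(ii) Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. If, as $n \to \infty$,

$$\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F})}{n} \to 0 \quad \text{and} \quad \frac{\log n}{n h^d} \to 0,$$

then $L_n(\check\pi_n) \xrightarrow{\text{a.s.}} L^*$.

The above results, as well as those in Theorem 2 and Corollary 1, are based on the requirement that $\mathcal{F}$ is totally bounded. Furthermore, the $\epsilon$-covering number $\mathcal{N}_\infty(\epsilon, \mathcal{F})$ of the class $\mathcal{F}$ should not grow too fast (as $\epsilon$ gets closer and closer to 0). There are many important classes of functions that satisfy these requirements; here we give two examples:

Example 1. (Differentiable functions.) Let $k_1, \dots, k_d$ be non-negative integers and put $k = (k_1, \dots, k_d)$ and $|k| = k_1 + \dots + k_d$. Also, for any $g: \mathbb{R}^d \to \mathbb{R}$, let $D^{(k)} g(u) = \partial^{|k|} g(u) / \partial u_1^{k_1} \cdots \partial u_d^{k_d}$. Consider the class of functions with bounded partial derivatives of order $r$:

$$\mathcal{G} = \Big\{ g: [0, 1]^d \to \mathbb{R} \ \Big|\ \sum_{|k| \le r} \sup_u |D^{(k)} g(u)| \le A < \infty \Big\}.$$

Then, for every $\epsilon > 0$, $\log \mathcal{N}_\infty(\epsilon, \mathcal{G}) \le M \epsilon^{-\alpha}$, where $\alpha = d/r$ and $M \equiv M(d, r)$. This result is due to Kolmogorov and Tikhomirov (1959).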
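As a short worked illustration of how such an entropy bound interacts with the covering-number condition used in the consistency results, suppose (purely as an assumption for this example) that the cover radius is chosen as $\epsilon_n = n^{-\beta}$ for some $\beta > 0$. Under the bound of Example 1, the requirement that the log-covering number divided by $n$ tends to zero then reduces to a constraint on $\beta$:

```latex
\[
\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{G})}{n}
\;\le\; \frac{M \epsilon_n^{-\alpha}}{n}
\;=\; M\, n^{\alpha\beta - 1}
\;\longrightarrow\; 0
\qquad \text{whenever } \beta < \frac{1}{\alpha} = \frac{r}{d}.
\]
```

In this sketch, any polynomial rate slower than $n^{-r/d}$ keeps the covering-number condition satisfied, and smoother classes (larger $r$) allow faster-shrinking covers.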

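Example 2. Consider the class $\Psi$ of all convex functions $\psi: C \to [0, 1]$, where $C \subset \mathbb{R}^d$ is compact and convex. If $\psi$ satisfies the Lipschitz condition $|\psi(z_1) - \psi(z_2)| \le L \|z_1 - z_2\|$, for all $z_1, z_2 \in C$, then $\log \mathcal{N}_\infty(\epsilon, \Psi) \le M \epsilon^{-d/2}$, for every $\epsilon > 0$, where $M \equiv M(d, L)$; see van der Vaart and Wellner (1996).

2.3 Least-squares Regression

In this section we consider least-squares (LS) estimates of the function $q$. The method works as follows. Suppose that $q$ belongs to the known class of functions $\mathcal{Q}$ of the form $q: \mathbb{R}^d \times \{0, 1\} \to [q_0, 1]$, where $q_0$ is as in assumption C1. The least-squares estimate of $q$ is given by

$$\tilde{q} = \operatorname*{argmin}_{q \in \mathcal{Q}} \frac{1}{n} \sum_{i=1}^{n} \left(\Delta_i - q(X_i, Y_i)\right)^2.$$

Now, for each $\pi \in \mathcal{F}$, let

$$\hat{L}_{\tilde q}(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{\tilde{q}(X_i, Y_i)}\, I\{g_\pi(Z_i) \neq Y_i\},$$

and define

$$\breve\pi_n = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_{\tilde q}(\pi).$$

In this case, we consider the following classifier

$$g_{\breve\pi_n}(z) = \begin{cases} 1 & \text{if } \breve\pi_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

To study the performance of $g_{\breve\pi_n}$ we also need the following standard notation from empirical process theory. Fix $(x_1, y_1), \dots, (x_n, y_n)$ and let $\mathcal{N}_1(\epsilon, \mathcal{Q}, (x_i, y_i)_{i=1}^n)$ be the $\epsilon$-covering number of the class $\mathcal{Q}$ with respect to the empirical measure of the points $(x_1, y_1), \dots, (x_n, y_n)$. That is, $\mathcal{N}_1(\epsilon, \mathcal{Q}, (x_i, y_i)_{i=1}^n)$ is the cardinality of the smallest subclass of functions $\mathcal{Q}_\epsilon = \{q_1, \dots, q_{N(\epsilon)} \mid q_i: \mathbb{R}^d \times \{0, 1\} \to [q_0, 1]\}$ such that for every $q \in \mathcal{Q}$ there is a $q' \in \mathcal{Q}_\epsilon$ such that

$$\frac{1}{n} \sum_{i=1}^{n} |q(x_i, y_i) - q'(x_i, y_i)| < \epsilon.$$

For more on this one may refer, for example, to Pollard (1984) or van der Vaart and Wellner (1996). We then have the following result.

Theorem 4 Let $\mathcal{F}$ be as in Theorem 2 and suppose that condition C1 holds. Also, define the classifier $g_{\breve\pi_n}$ as in (14) and set $L_n(\breve\pi_n) = P\{g_{\breve\pi_n}(Z) \neq Y \mid D_n\}$. Then:

(i) For every $\delta > 2\epsilon > 0$ there is an $n_0 > 0$ such that for all $n > n_0$,

$$\begin{aligned} P\{L_n(\breve\pi_n) - L^* > \delta\} \le{} & 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8} \\ & + 8 E\!\left[\mathcal{N}_1\!\left(\frac{(\delta - 2\epsilon) q_0^2}{64}, \mathcal{Q}, (X_i, Y_i)_{i=1}^n\right)\right] e^{-c_3 n (\delta - 2\epsilon)^2} \\ & + 8 E\!\left[\mathcal{N}_1\!\left(\frac{(\delta - 2\epsilon)^2 q_0^4}{1024}, \mathcal{Q}, (X_i, Y_i)_{i=1}^n\right)\right] e^{-c_4 n (\delta - 2\epsilon)^4}, \end{aligned}$$

where $c_3$ and $c_4$ are positive constants not depending on $n$, $\delta$, or $\epsilon$.

(ii) Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. If, as $n \to \infty$,

$$\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F})}{n} \to 0 \quad \text{and} \quad \frac{\log E\!\left[\mathcal{N}_1\!\left(c, \mathcal{Q}, (X_i, Y_i)_{i=1}^n\right)\right]}{n} \to 0, \quad \text{for every } c > 0,$$

then $L_n(\breve\pi_n) \xrightarrow{\text{a.s.}} L^*$.

3 Proofs

Proof of Theorem 2. The proof is based on standard arguments (see, for example, Devroye et al. (1996; Sec. 28.3)), and goes as follows. First observe that for any classifier $g$

$$\begin{aligned} P\{g(Z) \neq Y\} &= 1 - P\{g(Z) = Y\} \\ &= 1 - \left(P\{g(Z) = 1, Y = 1\} + P\{g(Z) = 0, Y = 0\}\right) \\ &= 1 - E\big[I\{g(Z) = 1\} I\{Y = 1\}\big] - E\big[I\{g(Z) = 0\} I\{Y = 0\}\big] \\ &= 1 - E\big[E\big[I\{g(Z) = 1\} I\{Y = 1\} \mid Z\big]\big] - E\big[E\big[I\{g(Z) = 0\} I\{Y = 0\} \mid Z\big]\big] \\ &= 1 - E\big[I\{g(Z) = 1\}\, \pi^*(Z) + I\{g(Z) = 0\}\, (1 - \pi^*(Z))\big], \end{aligned}$$

where $\pi^*(Z) = P\{Y = 1 \mid Z\}$. Thus,

$$\begin{aligned} P\{g(Z) \neq Y\} - L^* &= E\big[I\{g_B(Z) = 1\}\, \pi^*(Z) + I\{g_B(Z) = 0\}\, (1 - \pi^*(Z))\big] \\ &\quad - E\big[I\{g(Z) = 1\}\, \pi^*(Z) + I\{g(Z) = 0\}\, (1 - \pi^*(Z))\big] \\ &= E\big[\pi^*(Z) \big(I\{g_B(Z) = 1\} - I\{g(Z) = 1\}\big) + (1 - \pi^*(Z)) \big(I\{g_B(Z) = 0\} - I\{g(Z) = 0\}\big)\big] \\ &= E\big[(2\pi^*(Z) - 1) \big(I\{g_B(Z) = 1\} - I\{g(Z) = 1\}\big)\big] \\ &= E\big[\,|2\pi^*(Z) - 1|\, I\{g_B(Z) \neq g(Z)\}\big], \end{aligned} \tag{15}$$

in view of the definitions of $g_B$ and $\pi^*$ in (1) and (2).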
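The excess-risk identity (15) is easy to check numerically when $Z$ takes only finitely many values. The following Python sketch does this; it is an illustration only, under the assumption that the distribution of $Z$ is given by a probability vector `pz` and the regression function by a vector `pi_star` of the same length (both names, like the function names, are hypothetical and not from the paper).

```python
import numpy as np


def error_prob(g, pi_star, pz):
    """P{g(Z) != Y} = E[ I{g=1}(1 - pi*(Z)) + I{g=0} pi*(Z) ] for a discrete Z."""
    return float(np.sum(pz * np.where(g == 1, 1.0 - pi_star, pi_star)))


def excess_risk_both_sides(g, pi_star, pz):
    """Return (L(g) - L*, E[|2 pi*(Z) - 1| I{g_B(Z) != g(Z)}]); the two should agree."""
    g_bayes = (pi_star > 0.5).astype(int)                  # Bayes rule, as in (1)
    lhs = error_prob(g, pi_star, pz) - error_prob(g_bayes, pi_star, pz)
    rhs = float(np.sum(pz * np.abs(2.0 * pi_star - 1.0) * (g_bayes != g)))
    return lhs, rhs
```

Calling `excess_risk_both_sides` with any 0/1 vector `g` returns the two sides of (15); points where $\pi^*(z) = 1/2$ contribute nothing to either side, since there the choice of label does not affect the error.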

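Now let $\pi \in \mathcal{F}$ and put $L(\pi) = P\{g_\pi(Z) \neq Y\}$, where

$$g_\pi(z) = \begin{cases} 1 & \text{if } \pi(z) > 1/2, \\ 0 & \text{otherwise,} \end{cases}$$

and note that by (15)

$$L(\pi) - L(\pi^*) = E\big[\,|2\pi^*(Z) - 1|\, I\{g_B(Z) \neq g_\pi(Z)\}\big] \le 2 E|\pi(Z) - \pi^*(Z)|, \tag{16}$$

where the last inequality follows since $|\pi^*(Z) - 0.5| \le |\pi(Z) - \pi^*(Z)|$ whenever $g_B(Z) \neq g_\pi(Z)$. Let $\pi' \in \mathcal{F}_\epsilon$ be such that $\pi' \in B(\pi^*, \epsilon)$; this is possible since $\mathcal{F}_\epsilon$ is an $\epsilon$-cover of $\mathcal{F}$ and $\pi^* \in \mathcal{F}$. Since

$$\begin{aligned} \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) - L^* &\le 2 E|\pi'(Z) - \pi^*(Z)| \quad \text{(by (16))} \\ &\le 2 \sup_{z \in \mathbb{R}^{d+p}} |\pi'(z) - \pi^*(z)| < 2\epsilon \quad \text{(because } \pi' \in B(\pi^*, \epsilon)\text{)}, \end{aligned} \tag{17}$$

one finds that for every $\delta > 2\epsilon > 0$

$$\begin{aligned} P\{L_n(\tilde\pi_n) - L^* > \delta\} &\le P\Big\{L_n(\tilde\pi_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon\Big\} \\ &= P\Big\{L_n(\tilde\pi_n) - \hat{L}_p(\tilde\pi_n) + \hat{L}_p(\tilde\pi_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon\Big\} \\ &\le P\Big\{2 \sup_{\pi \in \mathcal{F}_\epsilon} \big|\hat{L}_p(\pi) - L(\pi)\big| > \delta - 2\epsilon\Big\} \\ &\le \mathcal{N}_\infty(\epsilon, \mathcal{F}) \sup_{\pi \in \mathcal{F}_\epsilon} P\Big\{\big|\hat{L}_p(\pi) - L(\pi)\big| > \frac{\delta}{2} - \epsilon\Big\}. \end{aligned}$$

Now, by Hoeffding's inequality, this last probability statement appearing above can be bounded by $2\exp\{-2n(\delta/2 - \epsilon)^2 p_0^2\}$, and this completes the proof of the theorem.

Proof of Theorem 3. Part (i). For each $\pi \in \mathcal{F}$, let

$$\hat{L}_q(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{q(X_i, Y_i)}\, I\{g_\pi(Z_i) \neq Y_i\}$$

and observe that

$$\big|\hat{L}_{\hat q}(\pi) - \hat{L}_q(\pi)\big| = \left|\frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i\, I\{g_\pi(Z_i) \neq Y_i\}}{\hat{q}(X_i, Y_i)} - \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i\, I\{g_\pi(Z_i) \neq Y_i\}}{q(X_i, Y_i)}\right| \le \frac{1}{n} \sum_{i=1}^{n} \left|\frac{1}{\hat{q}(X_i, Y_i)} - \frac{1}{q(X_i, Y_i)}\right|.$$

Furthermore, since

$$L_n(\check\pi_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) = \big[L_n(\check\pi_n) - \hat{L}_{\hat q}(\check\pi_n)\big] + \Big[\hat{L}_{\hat q}(\check\pi_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi)\Big] \le 2 \sup_{\pi \in \mathcal{F}_\epsilon} \big|\hat{L}_{\hat q}(\pi) - L(\pi)\big|,$$

one finds that

$$\begin{aligned} P\{L_n(\check\pi_n) - L^* > \delta\} &\le P\Big\{L_n(\check\pi_n) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon\Big\} \quad \text{(in view of (17))} \\ &\le P\Big\{\sup_{\pi \in \mathcal{F}_\epsilon} \big|\hat{L}_{\hat q}(\pi) - L(\pi)\big| > \frac{\delta - 2\epsilon}{2}\Big\} \\ &\le P\Big\{\frac{1}{n} \sum_{i=1}^{n} \left|\frac{1}{\hat{q}(X_i, Y_i)} - \frac{1}{q(X_i, Y_i)}\right| > \frac{\delta - 2\epsilon}{4}\Big\} + P\Big\{\sup_{\pi \in \mathcal{F}_\epsilon} \big|\hat{L}_q(\pi) - L(\pi)\big| > \frac{\delta - 2\epsilon}{4}\Big\} \\ &:= I_n + II_n \quad \text{(say)}. \end{aligned} \tag{18}$$

But, using the MAR assumption (see (11)), it is straightforward to see that $E[\hat{L}_q(\pi)] = L(\pi)$. Therefore

$$II_n \le \mathcal{N}_\infty(\epsilon, \mathcal{F}) \sup_{\pi \in \mathcal{F}_\epsilon} P\Big\{\big|\hat{L}_q(\pi) - L(\pi)\big| > \frac{\delta - 2\epsilon}{4}\Big\} \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8} \quad \text{(via Hoeffding's inequality)}. \tag{19}$$

As for the term $I_n$ in (18), first note that

$$\begin{aligned} I_n &\le \sum_{i=1}^{n} P\Big\{\left|\frac{1}{\hat{q}(X_i, Y_i)} - \frac{1}{q(X_i, Y_i)}\right| > \frac{\delta - 2\epsilon}{4},\ \hat{q}(X_i, Y_i) \ge \frac{q_0}{2}\Big\} + \sum_{i=1}^{n} P\Big\{\hat{q}(X_i, Y_i) < \frac{q_0}{2}\Big\} \\ &\le \sum_{i=1}^{n} P\Big\{\big|\hat{q}(X_i, Y_i) - q(X_i, Y_i)\big|\, \frac{2}{q_0^2} > \frac{\delta - 2\epsilon}{4}\Big\} + \sum_{i=1}^{n} P\Big\{\hat{q}(X_i, Y_i) < \frac{q_0}{2}\Big\}. \end{aligned} \tag{20}$$

It will be shown at the end of the proof that for every constant $b > 0$, and $n$ large enough,

$$P\big\{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > b\big\} \le 4 e^{-C_3 n h^d b^2}, \tag{21}$$

where $C_3$ is a positive constant not depending on $n$ or $\epsilon$. Therefore, taking $b = (\delta - 2\epsilon) q_0^2/8$ in (21), the first sum on the r.h.s. of (20) is bounded by $4n\, e^{-C_4 n h^d (\delta - 2\epsilon)^2}$, for $n$ large enough, where $C_4 > 0$ does not depend on $n$, $\delta$, or $\epsilon$. Similarly, since

$$P\{\hat{q}(X_i, Y_i) < q_0/2\} \le P\{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > q_0/2\},$$

one finds (via (21)) that for $n$ large enough, the second sum on the r.h.s. of (20) is bounded by $4n\, e^{-C_5 n h^d}$, where the constant $C_5$ is positive and does not depend on $n$ or $\epsilon$. Putting the above together, we have shown that for $n$ large enough,

$$I_n \le 4n\, e^{-C_4 n h^d (\delta - 2\epsilon)^2} + 4n\, e^{-C_5 n h^d}.$$

This completes the proof of part (i) of Theorem 3. Part (ii) follows from the Borel-Cantelli lemma.

Proof of (21). Since $|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| \le 1$, it is sufficient to prove (21) for $0 < b \le 1$. Now, let

$$\begin{aligned} S(X_i, Y_i) &= f(X_i)\, P\{Y = Y_i \mid Y_i\}\, q(X_i, Y_i), \\ \hat{S}(X_i, Y_i) &= \frac{1}{(n-1) h^d} \sum_{j=1,\, j \neq i}^{n} \Delta_j\, I\{Y_j = Y_i\}\, K\!\left(\frac{X_j - X_i}{h}\right), \\ R(X_i, Y_i) &= f(X_i)\, P\{Y = Y_i \mid Y_i\}, \\ \hat{R}(X_i, Y_i) &= \frac{1}{(n-1) h^d} \sum_{j=1,\, j \neq i}^{n} I\{Y_j = Y_i\}\, K\!\left(\frac{X_j - X_i}{h}\right), \end{aligned}$$

and observe that

$$\begin{aligned} \big|\hat{q}(X_i, Y_i) - q(X_i, Y_i)\big| &= \left|\frac{\hat{S}(X_i, Y_i)}{\hat{R}(X_i, Y_i)} - \frac{S(X_i, Y_i)}{R(X_i, Y_i)}\right| \\ &= \left|\frac{\hat{S}(X_i, Y_i)/\hat{R}(X_i, Y_i)}{R(X_i, Y_i)} \big(R(X_i, Y_i) - \hat{R}(X_i, Y_i)\big) + \frac{\hat{S}(X_i, Y_i) - S(X_i, Y_i)}{R(X_i, Y_i)}\right| \\ &\le \frac{\big|\hat{R}(X_i, Y_i) - R(X_i, Y_i)\big|}{R(X_i, Y_i)} + \frac{\big|\hat{S}(X_i, Y_i) - S(X_i, Y_i)\big|}{R(X_i, Y_i)}, \end{aligned}$$

where we have used the fact that $|\hat{S}(X_i, Y_i)/\hat{R}(X_i, Y_i)| \le 1$. Therefore, since $R(X_i, Y_i) > C_6 > 0$ (by assumption C2), one finds that for every $b > 0$

$$P\big\{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > b\big\} \le P\big\{|\hat{S}(X_i, Y_i) - S(X_i, Y_i)| > C_7 b\big\} + P\big\{|\hat{R}(X_i, Y_i) - R(X_i, Y_i)| > C_7 b\big\} := \pi_{n,1} + \pi_{n,2}, \tag{22}$$

where $C_7 = C_6/2$. Now, by the results of Mojirsheibani et al. (2011; Lemma A., with $g(Z, Y) = 1$) one finds

$$\big|E\big[\hat{S}(X_i, Y_i) \mid X_i, Y_i\big] - S(X_i, Y_i)\big| \le C h, \tag{23}$$

where $C > 0$ is a constant not depending on $n$. Therefore

$$\begin{aligned} \pi_{n,1} &\le P\Big\{\big|\hat{S}(X_i, Y_i) - E\big[\hat{S}(X_i, Y_i) \mid X_i, Y_i\big]\big| + \big|E\big[\hat{S}(X_i, Y_i) \mid X_i, Y_i\big] - S(X_i, Y_i)\big| > C_7 b\Big\} \\ &\le P\Big\{\big|\hat{S}(X_i, Y_i) - E\big[\hat{S}(X_i, Y_i) \mid X_i, Y_i\big]\big| > C_8 b\Big\} \quad \text{(for large } n \text{, by (23), where } C_8 = C_7/2\text{)} \\ &= E\Big[P\Big\{\big|\hat{S}(X_i, Y_i) - E\big[\hat{S}(X_i, Y_i) \mid X_i, Y_i\big]\big| > C_8 b \ \Big|\ X_i, Y_i\Big\}\Big] \\ &= E\Big[P\Big\{\Big|\frac{1}{n-1} \sum_{j=1,\, j \neq i}^{n} \Gamma_j(X_i, Y_i)\Big| > C_8 b \ \Big|\ X_i, Y_i\Big\}\Big], \end{aligned} \tag{24}$$

where

$$\Gamma_j(X_i, Y_i) = h^{-d} \Big[\Delta_j\, I\{Y_j = Y_i\}\, K\!\left(\frac{X_j - X_i}{h}\right) - E\Big[\Delta_j\, I\{Y_j = Y_i\}\, K\!\left(\frac{X_j - X_i}{h}\right) \Big|\ X_i, Y_i\Big]\Big].$$

However, conditional on $(X_i, Y_i)$, the terms $\Gamma_j(X_i, Y_i)$, $j = 1, \dots, n$, $j \neq i$, are independent, zero-mean random variables, bounded by $-h^{-d} \|K\|_\infty$ and $+h^{-d} \|K\|_\infty$. We also note that

$$\operatorname{Var}\big(\Gamma_j(X_i, Y_i) \mid X_i, Y_i\big) = E\big[\Gamma_j^2(X_i, Y_i) \mid X_i, Y_i\big] \le h^{-d} \|K\|_\infty \|f\|_\infty.$$

Therefore, by Bennett's inequality (Bennett, 1962), for any fixed (nonrandom) $x$ and $y$

$$P\Big\{\Big|\frac{1}{n-1} \sum_{j=1,\, j \neq i}^{n} \Gamma_j(X_i, Y_i)\Big| > C_8 b \ \Big|\ X_i = x, Y_i = y\Big\} \le 2 \exp\Big\{-\frac{(n-1) h^d C_8^2 b^2}{2 \|K\|_\infty (\|f\|_\infty + C_8 b)}\Big\},$$

where the bound does not depend on $x$ or $y$. Therefore, in view of (24) and the fact that $0 < b \le 1$, one finds (for $n$ large enough),

$$\pi_{n,1} \le 2 \exp\Big\{-\frac{(n-1) h^d C_8^2}{2 \|K\|_\infty (\|f\|_\infty + C_8)}\, b^2\Big\}.$$

Similarly, one can also show (with, in fact, less effort) that, for $n$ large enough,

$$\pi_{n,2} \le 2 \exp\Big\{-\frac{(n-1) h^d C_9^2}{2 \|K\|_\infty (\|f\|_\infty + C_9)}\, b^2\Big\},$$

where $C_9$ is a positive constant not depending on $n$ or $b$. This completes the proof of (21).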
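The leave-one-out kernel estimator in (12), whose deviation bound (21) has just been established, can be sketched numerically as follows. This is a minimal illustration rather than the authors' code: the Gaussian kernel is an assumption made here for concreteness (the text allows any kernel satisfying C4), the function name `kernel_q_hat` is hypothetical, and the dense pairwise-difference array is only suitable for moderate sample sizes.

```python
import numpy as np


def kernel_q_hat(x, y, delta, h):
    """Leave-one-out kernel estimate of q(X_i, Y_i) = P(delta_i = 1 | X_i, Y_i).

    x     : array of shape (n,) or (n, d) with the always-observed covariates X_i
    y     : labels in {0, 1}
    delta : 1 if V_i is observed, 0 if it is missing
    h     : smoothing parameter (bandwidth), h > 0
    """
    y = np.asarray(y)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)      # force shape (n, d)
    delta = np.asarray(delta, dtype=float)

    diff = (x[:, None, :] - x[None, :, :]) / h               # pairwise (X_i - X_j)/h
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=2))             # Gaussian kernel (assumed)
    w = k * (y[:, None] == y[None, :])                       # keep only j with Y_j = Y_i
    np.fill_diagonal(w, 0.0)                                 # leave-one-out: drop j = i

    num = w @ delta                                          # sum_j delta_j I{.} K(.)
    den = w.sum(axis=1)                                      # sum_j I{.} K(.)
    # Convention 0/0 = 0, as stated after (12).
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

In a weighted criterion of the Horvitz-Thompson type the values returned here appear in a denominator, so in practice one would need to guard against estimates that are zero or very small; the proof above handles this through the event $\{\hat{q}(X_i, Y_i) < q_0/2\}$.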

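Proof of Theorem 4. Part (i). Using (17) and the arguments that lead to (18), we find

$$P\{L_n(\breve\pi_n) - L^* > \delta\} \le I_n + II_n,$$

where $II_n$ is as in (18) and

$$I_n := P\Big\{\frac{1}{n} \sum_{i=1}^{n} \left|\frac{1}{\tilde{q}(X_i, Y_i)} - \frac{1}{q(X_i, Y_i)}\right| > \frac{\delta - 2\epsilon}{4}\Big\}.$$

But, by (19),

$$II_n \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2/8}.$$

To deal with the term $I_n$ first note that since $\tilde{q} \ge q_0$, one finds

$$\begin{aligned} I_n &\le P\Big\{\frac{1}{q_0^2} \cdot \frac{1}{n} \sum_{i=1}^{n} \big|\tilde{q}(X_i, Y_i) - q(X_i, Y_i)\big| > \frac{\delta - 2\epsilon}{4}\Big\} \\ &\le P\Big\{\Big|\frac{1}{n} \sum_{i=1}^{n} \big|\tilde{q}(X_i, Y_i) - q(X_i, Y_i)\big| - E\big[|\tilde{q}(X, Y) - q(X, Y)| \mid D_n\big]\Big| + E\big[|\tilde{q}(X, Y) - q(X, Y)| \mid D_n\big] > \frac{(\delta - 2\epsilon) q_0^2}{4}\Big\} \\ &\le P\Big\{\sup_{q' \in \mathcal{Q}} \Big|\frac{1}{n} \sum_{i=1}^{n} \big|q'(X_i, Y_i) - q(X_i, Y_i)\big| - E\big|q'(X, Y) - q(X, Y)\big|\Big| > \frac{(\delta - 2\epsilon) q_0^2}{8}\Big\} \\ &\quad + P\Big\{E\big[|\tilde{q}(X, Y) - q(X, Y)| \mid D_n\big] > \frac{(\delta - 2\epsilon) q_0^2}{8}\Big\} \\ &:= I_n(A) + I_n(B). \end{aligned} \tag{25}$$

Standard results from empirical process theory (see, for example, Pollard (1984)) yield

$$I_n(A) \le 8 E\!\left[\mathcal{N}_1\!\left(\frac{(\delta - 2\epsilon) q_0^2}{64}, \mathcal{Q}, (X_i, Y_i)_{i=1}^n\right)\right] e^{-n(\delta - 2\epsilon)^2 q_0^4 / ((128)(8^2))}.$$

As for the term $I_n(B)$, put

$$S_n(q) = \frac{1}{n} \sum_{i=1}^{n} \big[\Delta_i - q(X_i, Y_i)\big]^2$$

and observe that

$$\begin{aligned} I_n(B) &\le P\Big\{E\big[|\tilde{q}(X, Y) - q(X, Y)|^2 \mid D_n\big] > \frac{(\delta - 2\epsilon)^2 q_0^4}{64}\Big\} \quad \text{(by the Cauchy-Schwarz inequality)} \\ &= P\Big\{E\big[|\tilde{q}(X, Y) - \Delta|^2 \mid D_n\big] - E|q(X, Y) - \Delta|^2 > \frac{(\delta - 2\epsilon)^2 q_0^4}{64}\Big\} \\ &\le P\Big\{2 \sup_{q' \in \mathcal{Q}} \Big|S_n(q') - E|q'(X, Y) - \Delta|^2\Big| > \frac{(\delta - 2\epsilon)^2 q_0^4}{64}\Big\}, \end{aligned}$$

where the last line above follows from the following argument:

$$\begin{aligned} E\big[|\tilde{q}(X, Y) - \Delta|^2 \mid D_n\big] - E|q(X, Y) - \Delta|^2 &= E\big[|\tilde{q}(X, Y) - \Delta|^2 \mid D_n\big] - \inf_{q' \in \mathcal{Q}} E|q'(X, Y) - \Delta|^2 \\ &= \sup_{q' \in \mathcal{Q}} \Big\{E\big[|\tilde{q}(X, Y) - \Delta|^2 \mid D_n\big] - S_n(\tilde{q}) + S_n(\tilde{q}) - S_n(q') + S_n(q') - E|q'(X, Y) - \Delta|^2\Big\} \\ &\le 2 \sup_{q' \in \mathcal{Q}} \Big|S_n(q') - E|q'(X, Y) - \Delta|^2\Big|, \end{aligned}$$

and where we have used the fact that $S_n(\tilde{q}) - S_n(q') \le 0$, by the definition of $\tilde{q}$. Therefore

$$I_n(B) \le 8 E\!\left[\mathcal{N}_1\!\left(\frac{(\delta - 2\epsilon)^2 q_0^4}{1024}, \mathcal{Q}, (X_i, Y_i)_{i=1}^n\right)\right] e^{-C n (\delta - 2\epsilon)^4},$$

where $C > 0$ does not depend on $n$ or $\epsilon$.

Part (ii) follows from the Borel-Cantelli lemma.

Acknowledgements. The authors would like to thank Professor Hamedani and the referees for the helpful comments.

References

[1] Bennett, G. (1962). Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57.

[2] Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.

[3] Györfi, L., Kohler, M., Krzyzak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer.

[4] Kolmogorov, A.N. and Tikhomirov, V.M. (1959). ε-entropy and ε-capacity of sets in function spaces. Uspekhi Matematicheskikh Nauk, 14.

[5] Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis With Missing Data. Wiley, New York.

[6] Mojirsheibani, M., Montazeri, Z., and Rajaeefard, A. (2011). On classification with incomplete covariates. Statistics, 45.

[7] Mojirsheibani, M. and Montazeri, Z. (2007). Statistical classification with missing covariates. Journal of the Royal Statistical Society Ser. B, 69.

[8] Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.

[9] van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes with Applications to Statistics. Springer-Verlag, New York.

[10] Willard, S. (2004). General Topology. Dover Publications.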


More information

LECTURE 8: ASYMPTOTICS I

LECTURE 8: ASYMPTOTICS I LECTURE 8: ASYMPTOTICS I We are iterested i the properties of estimators as. Cosider a sequece of radom variables {, X 1}. N. M. Kiefer, Corell Uiversity, Ecoomics 60 1 Defiitio: (Weak covergece) A sequece

More information

FUNDAMENTALS OF REAL ANALYSIS by

FUNDAMENTALS OF REAL ANALYSIS by FUNDAMENTALS OF REAL ANALYSIS by Doğa Çömez Backgroud: All of Math 450/1 material. Namely: basic set theory, relatios ad PMI, structure of N, Z, Q ad R, basic properties of (cotiuous ad differetiable)

More information

Entropy Rates and Asymptotic Equipartition

Entropy Rates and Asymptotic Equipartition Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

TERMWISE DERIVATIVES OF COMPLEX FUNCTIONS

TERMWISE DERIVATIVES OF COMPLEX FUNCTIONS TERMWISE DERIVATIVES OF COMPLEX FUNCTIONS This writeup proves a result that has as oe cosequece that ay complex power series ca be differetiated term-by-term withi its disk of covergece The result has

More information

Kernel density estimator

Kernel density estimator Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Theorem 3. A subset S of a topological space X is compact if and only if every open cover of S by open sets in X has a finite subcover.

Theorem 3. A subset S of a topological space X is compact if and only if every open cover of S by open sets in X has a finite subcover. Compactess Defiitio 1. A cover or a coverig of a topological space X is a family C of subsets of X whose uio is X. A subcover of a cover C is a subfamily of C which is a cover of X. A ope cover of X is

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

Berry-Esseen bounds for self-normalized martingales

Berry-Esseen bounds for self-normalized martingales Berry-Essee bouds for self-ormalized martigales Xiequa Fa a, Qi-Ma Shao b a Ceter for Applied Mathematics, Tiaji Uiversity, Tiaji 30007, Chia b Departmet of Statistics, The Chiese Uiversity of Hog Kog,

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

Metric Space Properties

Metric Space Properties Metric Space Properties Math 40 Fial Project Preseted by: Michael Brow, Alex Cordova, ad Alyssa Sachez We have already poited out ad will recogize throughout this book the importace of compact sets. All

More information

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002 ECE 330:541, Stochastic Sigals ad Systems Lecture Notes o Limit Theorems from robability Fall 00 I practice, there are two ways we ca costruct a ew sequece of radom variables from a old sequece of radom

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Chapter 7 Isoperimetric problem

Chapter 7 Isoperimetric problem Chapter 7 Isoperimetric problem Recall that the isoperimetric problem (see the itroductio its coectio with ido s proble) is oe of the most classical problem of a shape optimizatio. It ca be formulated

More information

ON POINTWISE BINOMIAL APPROXIMATION

ON POINTWISE BINOMIAL APPROXIMATION Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 57-66 ON POINTWISE BINOMIAL APPROXIMATION BY w-functions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real

More information

1+x 1 + α+x. x = 2(α x2 ) 1+x

1+x 1 + α+x. x = 2(α x2 ) 1+x Math 2030 Homework 6 Solutios # [Problem 5] For coveiece we let α lim sup a ad β lim sup b. Without loss of geerality let us assume that α β. If α the by assumptio β < so i this case α + β. By Theorem

More information

McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems

McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems McGill Uiversity Math 354: Hoors Aalysis 3 Fall 212 Assigmet 3 Solutios to selected problems Problem 1. Lipschitz fuctios. Let Lip K be the set of all fuctios cotiuous fuctios o [, 1] satisfyig a Lipschitz

More information

Lecture 3 : Random variables and their distributions

Lecture 3 : Random variables and their distributions Lecture 3 : Radom variables ad their distributios 3.1 Radom variables Let (Ω, F) ad (S, S) be two measurable spaces. A map X : Ω S is measurable or a radom variable (deoted r.v.) if X 1 (A) {ω : X(ω) A}

More information

f n (x) f m (x) < ɛ/3 for all x A. By continuity of f n and f m we can find δ > 0 such that d(x, x 0 ) < δ implies that

f n (x) f m (x) < ɛ/3 for all x A. By continuity of f n and f m we can find δ > 0 such that d(x, x 0 ) < δ implies that Lecture 15 We have see that a sequece of cotiuous fuctios which is uiformly coverget produces a limit fuctio which is also cotiuous. We shall stregthe this result ow. Theorem 1 Let f : X R or (C) be a

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Mi-Hwa Ko and Tae-Sung Kim

Mi-Hwa Ko and Tae-Sung Kim J. Korea Math. Soc. 42 2005), No. 5, pp. 949 957 ALMOST SURE CONVERGENCE FOR WEIGHTED SUMS OF NEGATIVELY ORTHANT DEPENDENT RANDOM VARIABLES Mi-Hwa Ko ad Tae-Sug Kim Abstract. For weighted sum of a sequece

More information

Fall 2013 MTH431/531 Real analysis Section Notes

Fall 2013 MTH431/531 Real analysis Section Notes Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

STA Object Data Analysis - A List of Projects. January 18, 2018

STA Object Data Analysis - A List of Projects. January 18, 2018 STA 6557 Jauary 8, 208 Object Data Aalysis - A List of Projects. Schoeberg Mea glaucomatous shape chages of the Optic Nerve Head regio i aimal models 2. Aalysis of VW- Kedall ati-mea shapes with a applicatio

More information

M17 MAT25-21 HOMEWORK 5 SOLUTIONS

M17 MAT25-21 HOMEWORK 5 SOLUTIONS M17 MAT5-1 HOMEWORK 5 SOLUTIONS 1. To Had I Cauchy Codesatio Test. Exercise 1: Applicatio of the Cauchy Codesatio Test Use the Cauchy Codesatio Test to prove that 1 diverges. Solutio 1. Give the series

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Notes 19 : Martingale CLT

Notes 19 : Martingale CLT Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe

More information

SOME SEQUENCE SPACES DEFINED BY ORLICZ FUNCTIONS

SOME SEQUENCE SPACES DEFINED BY ORLICZ FUNCTIONS ARCHIVU ATHEATICU BRNO Tomus 40 2004, 33 40 SOE SEQUENCE SPACES DEFINED BY ORLICZ FUNCTIONS E. SAVAŞ AND R. SAVAŞ Abstract. I this paper we itroduce a ew cocept of λ-strog covergece with respect to a Orlicz

More information

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

Lecture 2: Concentration Bounds

Lecture 2: Concentration Bounds CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

Lecture 8: Convergence of transformations and law of large numbers

Lecture 8: Convergence of transformations and law of large numbers Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges

More information

Agnostic Learning and Concentration Inequalities

Agnostic Learning and Concentration Inequalities ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture

More information

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS Ryszard Zieliński Ist Math Polish Acad Sc POBox 21, 00-956 Warszawa 10, Polad e-mail: rziel@impagovpl ABSTRACT Weak laws of large umbers (W LLN), strog

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Maximum Likelihood Estimation and Complexity Regularization

Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

ON THE FUZZY METRIC SPACES

ON THE FUZZY METRIC SPACES The Joural of Mathematics ad Computer Sciece Available olie at http://www.tjmcs.com The Joural of Mathematics ad Computer Sciece Vol.2 No.3 2) 475-482 ON THE FUZZY METRIC SPACES Received: July 2, Revised:

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 6 9/23/203 Browia motio. Itroductio Cotet.. A heuristic costructio of a Browia motio from a radom walk. 2. Defiitio ad basic properties

More information

Council for Innovative Research

Council for Innovative Research ABSTRACT ON ABEL CONVERGENT SERIES OF FUNCTIONS ERDAL GÜL AND MEHMET ALBAYRAK Yildiz Techical Uiversity, Departmet of Mathematics, 34210 Eseler, Istabul egul34@gmail.com mehmetalbayrak12@gmail.com I this

More information

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Homework 4. x n x X = f(x n x) +

Homework 4. x n x X = f(x n x) + Homework 4 1. Let X ad Y be ormed spaces, T B(X, Y ) ad {x } a sequece i X. If x x weakly, show that T x T x weakly. Solutio: We eed to show that g(t x) g(t x) g Y. It suffices to do this whe g Y = 1.

More information

The log-behavior of n p(n) and n p(n)/n

The log-behavior of n p(n) and n p(n)/n Ramauja J. 44 017, 81-99 The log-behavior of p ad p/ William Y.C. Che 1 ad Ke Y. Zheg 1 Ceter for Applied Mathematics Tiaji Uiversity Tiaji 0007, P. R. Chia Ceter for Combiatorics, LPMC Nakai Uivercity

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

arxiv: v1 [math.pr] 4 Dec 2013

arxiv: v1 [math.pr] 4 Dec 2013 Squared-Norm Empirical Process i Baach Space arxiv:32005v [mathpr] 4 Dec 203 Vicet Q Vu Departmet of Statistics The Ohio State Uiversity Columbus, OH vqv@statosuedu Abstract Jig Lei Departmet of Statistics

More information

On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities

On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities O Equivalece of Martigale Tail Bouds ad Determiistic Regret Iequalities Sasha Rakhli Departmet of Statistics, The Wharto School Uiversity of Pesylvaia Dec 16, 2015 Joit work with K. Sridhara arxiv:1510.03925

More information

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero? 2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a

More information

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.

More information

Detailed proofs of Propositions 3.1 and 3.2

Detailed proofs of Propositions 3.1 and 3.2 Detailed proofs of Propositios 3. ad 3. Proof of Propositio 3. NB: itegratio sets are geerally omitted for itegrals defied over a uit hypercube [0, s with ay s d. We first give four lemmas. The proof of

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information