On Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates
Journal of Statistical Theory and Applications, Volume 11, Number 4, 2012

Majid Mojirsheibani and Zahra Montazeri

Abstract

This article deals with the two-group classification problem, where the class conditional probability $\pi^*(z) = P\{Y = 1 \mid Z = z\}$ belongs to a known class of functions $\mathcal{F}$ which is totally bounded with respect to the supremum norm. Given an $\epsilon$-cover $\mathcal{F}_\epsilon$ of $\mathcal{F}$, we consider kernel regression methods for constructing classifiers using members of $\mathcal{F}_\epsilon$. A Horvitz-Thompson-type inverse weighting approach will be used to handle the presence of incomplete covariates in the data. Conditions under which the resulting classifiers are strongly consistent are also given.

Key Words and Phrases. Classification, consistency, empirical process, covering number.

AMS 2000 Subject Classifications. 62H30.

Majid Mojirsheibani: Department of Mathematics, California State University Northridge, CA 91330, USA. majid.mojirsheibani@csun.edu

Zahra Montazeri: Department of Epidemiology and Community Medicine, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5. zmontaze@uottawa.ca
1 Introduction

Consider the following standard two-group classification problem. Let $(Z, Y)$ be a random pair, where $Z \in \mathbb{R}^s$ is a random vector of covariates or predictors and $Y \in \{0, 1\}$ has to be predicted based on the vector $Z$. More precisely, one would like to find a function (a classifier) $g: \mathbb{R}^s \to \{0, 1\}$ for which the misclassification error probability $L(g) = P\{g(Z) \ne Y\}$ is as small as possible. The best classifier, called the Bayes classifier and denoted by $g_B$, is given by

$$g_B(z) = \begin{cases} 1 & \text{if } \pi^*(z) > 1/2, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where

$$\pi^*(z) = P\{Y = 1 \mid Z = z\} = E(Y \mid Z = z). \tag{2}$$

(For a proof of this fact see, for example, Devroye et al. (1996; Chapter 2).) The error of this classifier will be denoted by $L^*$ throughout this paper, i.e.,

$$L^* := P\{g_B(Z) \ne Y\}. \tag{3}$$

In passing, we also note that if $Z$ and $Y$ are independent then $\pi^*(z)$ is a constant function of $z$ and is in fact equal to $P\{Y = 1\}$. In this extreme case, $g_B(Z)$ is either always 1 or always 0. On the other hand, if $Y = I\{Z \in B\}$ for some $B \subset \mathbb{R}^s$ then $\pi^*(z) = I\{z \in B\}$ and also $g_B(z) = I\{z \in B\}$.

In practice one does not know the underlying probability distribution of the pair $(Z, Y)$, and therefore finding $g_B$ is virtually impossible. However, in statistics, one usually has access to a training sample $(Z_1, Y_1), (Z_2, Y_2), \dots, (Z_n, Y_n)$ drawn from the distribution of $(Z, Y)$. The goal is then to construct a data-based classification rule $g_n$ whose conditional error rate

$$L_n(g_n) = P\{g_n(Z) \ne Y \mid (Z_i, Y_i),\ i = 1, \dots, n\}$$
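is in some sense small. A desirable property for a data-based classifier is consistency: a classifier $g_n$ is said to be consistent if $L_n(g_n)$ converges to $L(g_B)$ in probability. If the convergence holds almost surely then $g_n$ is said to be strongly consistent.

Next, let $\mathcal{F}$ be a given class of functions $\pi: \mathbb{R}^s \to [0, 1]$. For any real-valued function $f$ on $\mathbb{R}^s$, let $\|f\|_\infty = \sup_{z \in \mathbb{R}^s} |f(z)|$ be its usual supremum norm and put

$$B(\pi, \epsilon) = \{h: \mathbb{R}^s \to [0, 1] \mid \|\pi - h\|_\infty < \epsilon\},$$

i.e., $B(\pi, \epsilon)$ is the open ball of functions, centered at $\pi$, with $\|\cdot\|_\infty$-radius $\epsilon > 0$. Suppose that the finite set of functions $\mathcal{F}_\epsilon = \{\pi_1, \dots, \pi_{N(\epsilon)}\}$, where $\pi_i: \mathbb{R}^s \to [0, 1]$, $1 \le i \le N(\epsilon) < \infty$, is an $\epsilon$-cover of the family $\mathcal{F}$ in the usual sense that

$$\sup_{\pi \in \mathcal{F}} \ \min_{1 \le i \le N(\epsilon)} \|\pi - \pi_i\|_\infty < \epsilon.$$

Note that $\mathcal{F} \subset \bigcup_{i=1}^{N(\epsilon)} B(\pi_i, \epsilon)$. Here, each member of $\mathcal{F}_\epsilon$ may or may not be a member of $\mathcal{F}$. The covering number of the family $\mathcal{F}$ with respect to the $\|\cdot\|_\infty$-norm, denoted by $\mathcal{N}_\infty(\epsilon, \mathcal{F})$, is the cardinality $|\mathcal{F}_\epsilon|$ of the smallest $\epsilon$-cover of $\mathcal{F}$. If $\mathcal{N}_\infty(\epsilon, \mathcal{F}) < \infty$ for every $\epsilon > 0$ then $\mathcal{F}$ is said to be totally bounded.

In passing we also note the close relationship between compactness and total boundedness (also called pre-compactness): compactness implies total boundedness, but the converse is not in general true. (In fact, a metric space is compact if and only if it is complete and totally bounded; this is the Heine-Borel theorem for general metric spaces.) For more on these and other properties of compact metric spaces one may refer, for example, to Willard (2004).

Next, for each $\pi \in \mathcal{F}$ consider the classifier

$$g_\pi(z) = \begin{cases} 1 & \text{if } \pi(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

Let

$$\hat{L}_n(\pi) = \frac{1}{n} \sum_{i=1}^{n} I\{g_\pi(Z_i) \ne Y_i\} \tag{5}$$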
be the empirical error rate of $g_\pi$. Then the so-called skeleton estimate of $\pi^*$, selected from $\mathcal{F}_\epsilon$, is given by (see, for example, Chapter 28 of Devroye et al. (1996))

$$\hat{\pi}_n = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_n(\pi), \tag{6}$$

with the corresponding sample-based classifier (see (1))

$$g_{\hat{\pi}_n}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_n(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases}$$

Let $L_n(\hat{\pi}_n) = P\{g_{\hat{\pi}_n}(Z) \ne Y \mid (Z_i, Y_i),\ i = 1, \dots, n\}$ be the error of the classifier $g_{\hat{\pi}_n}$. The following theorem establishes the consistency of the resulting classifier (see Theorem 28.1 of Devroye et al. (1996)).

Theorem 1. Let $\mathcal{F}$ be a totally bounded class of functions mapping $\mathbb{R}^s \to [0, 1]$. If $\pi^* \in \mathcal{F}$ then there is a sequence $\epsilon_n > 0$ and a sequence of covers $\mathcal{F}_{\epsilon_n} \subset \mathcal{F}$ such that for the skeleton estimate $\hat{\pi}_n$, selected from $\mathcal{F}_{\epsilon_n}$, one has

$$L_n(\hat{\pi}_n) \to L^* \quad \text{a.s.}$$

Here $\epsilon_n$ can be taken as the smallest positive number for which $\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F})$ is suitably bounded in terms of $n$ and $\epsilon_n$; see Devroye et al. (1996; Chapter 28) for the precise condition and a proof of this result.

In the next section we shall consider the case where some of the components of the covariate vectors $Z_i$ may be missing. More specifically, we study the case with $Z_i = (X_i, V_i) \in \mathbb{R}^{d+p}$, $d + p = s$, where $X_i \in \mathbb{R}^d$, $d \ge 1$, is always observable, but $V_i \in \mathbb{R}^p$ may be missing for the $i$-th observation. To deal with this difficulty, we propose a Horvitz-Thompson-type estimation approach which works by weighting the complete cases by the inverse of the missing data probabilities. The problem of classification with missing covariates has also been addressed by Mojirsheibani and Montazeri (2007), under different assumptions.

2 Main Results

2.1 Motivation

In this section we consider the case where some components of the $Z_i$'s may be missing. More specifically, we consider the situation where $Z_i = (X_i, V_i) \in \mathbb{R}^{d+p}$, and where $X_i \in \mathbb{R}^d$, $d \ge 1$,
is always observable, but $V_i \in \mathbb{R}^p$ may be missing for the $i$-th observation. We also define the random variables

$$\Delta_i = \begin{cases} 0 & \text{if } V_i \text{ is missing,} \\ 1 & \text{otherwise,} \end{cases} \qquad i = 1, \dots, n.$$

Now, the data may be represented by

$$D_n = \{(Z_1, Y_1, \Delta_1), \dots, (Z_n, Y_n, \Delta_n)\} = \{(X_1, V_1, Y_1, \Delta_1), \dots, (X_n, V_n, Y_n, \Delta_n)\}.$$

Let $(Z, Y)$ be a new observation, for which $Y \in \{0, 1\}$ has to be predicted based on $Z$ (and the data $D_n$); here $(Z, Y) \overset{\text{iid}}{=} (Z_1, Y_1)$. Clearly the minimization in (6) is no longer possible under the current setup where there are missing $V_i$'s among the data. This is because the computation of the right hand side of (5) requires every $Z_i$, $i = 1, \dots, n$. Using the complete cases alone in (5) will not solve the problem; here a complete case refers to a fully observable $Z_i$ (i.e., when $\Delta_i = 1$). The reason is that if we choose $\tilde{\pi}_n$ as the minimizer of

$$\tilde{L}_n(\pi) := \frac{1}{n} \sum_{i=1}^{n} \Delta_i \, I\{g_\pi(Z_i) \ne Y_i\},$$

then the corresponding empirical process $\{\tilde{L}_n(\pi) - L(\pi) \mid \pi \in \mathcal{F}\}$ is not centered in general (not even asymptotically), and this plays a crucial role in establishing the theoretical validity of $g_{\tilde{\pi}_n}$. In fact, it is clear that $\tilde{L}_n(\pi)$ is not in general unbiased for $L(\pi)$.

To motivate the procedures of this section, we also need to define the missing probability mechanism, i.e., the quantity

$$p(Z_i, Y_i) := P\{\Delta_i = 1 \mid Z_i, Y_i\} = E(\Delta_i \mid Z_i, Y_i), \qquad i = 1, \dots, n.$$

In what follows we shall also assume that

$$p(Z_i, Y_i) \ge p_0 > 0; \tag{7}$$

this is an assumption which says, in a sense, that there is always a nonzero probability $p_0$ that an observation is not missing. Now, consider the hypothetical situation where the above function $p$ is known and put

$$\hat{L}_p(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{p(Z_i, Y_i)} \, I\{g_\pi(Z_i) \ne Y_i\}, \tag{8}$$
where $g_\pi$ is as in (4). In passing we also note that (5) is the special case of (8) when $E(\Delta_i) = 1$ for all $i$. In fact, it is straightforward to see that $\hat{L}_p(\pi)$ satisfies

$$E[\hat{L}_p(\pi)] = L(\pi), \quad \text{where } L(\pi) = P\{g_\pi(Z) \ne Y\}.$$

It is important to mention that the idea in (8) is very similar to that used by Györfi et al. (2002; Chapter 26) for the unbiased estimation of a mean from censored data. Next, define the following revised version of the estimator $\hat{\pi}_n$ in (6),

$$\hat{\pi}_p = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_p(\pi), \tag{9}$$

and let $g_{\hat{\pi}_p}$ be its corresponding classifier, i.e.,

$$g_{\hat{\pi}_p}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_p(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

To study the performance of $g_{\hat{\pi}_p}$, let $L_n(\hat{\pi}_p) = P\{g_{\hat{\pi}_p}(Z) \ne Y \mid D_n\}$ be the misclassification error of $g_{\hat{\pi}_p}$. Then we have the following result.

Theorem 2. Let $\mathcal{F}$ be a totally bounded class of functions mapping $\mathbb{R}^{d+p} \to [0, 1]$ containing the function $\pi^*(z) = P\{Y = 1 \mid Z = z\}$. Then for every $\epsilon$ and $\delta$ satisfying $\delta > 2\epsilon > 0$ one has

$$P\{L_n(\hat{\pi}_p) - L^* > \delta\} \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F}) \exp\{-2n(\delta/2 - \epsilon)^2 p_0^2\},$$

where $p_0$ is as in (7).

The proofs of the theorems will be deferred until all the results have been stated. The following corollary is an immediate consequence of the Borel-Cantelli lemma:

Corollary 1. Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. Also let $\mathcal{F}$ be the class of functions defined in Theorem 2. If, as $n \to \infty$,

$$\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F})}{n} \to 0,$$

then $L_n(\hat{\pi}_p) \to L^*$ a.s.

Thus, if the missing probability mechanism $p(Z_i, Y_i)$ were known, the above approach would provide the theoretical basis to construct strongly consistent classifiers. Unfortunately, in practice, the missing probability mechanism is almost always unknown and must be estimated. In the next section we propose a kernel-based approach to overcome this problem.
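As an illustration of the inverse-weighted criterion (8) and the selection step (9), the following minimal sketch assumes the observation probabilities $p(Z_i, Y_i)$ are known and that the cover is given as a list of Python functions. The names `ipw_error`, `skeleton_select`, `pi_a`, `pi_b` and the toy data are hypothetical and are not taken from the paper; missing $V_i$ are stored as NaN.

```python
import numpy as np

def ipw_error(pi, Z, y, delta, p):
    """Inverse-weighted empirical error in the spirit of (8).

    Only complete cases (delta == 1) contribute; each is weighted by 1 / p_i,
    and the sum is divided by the full sample size n.
    """
    complete = delta == 1
    preds = (pi(Z[complete]) > 0.5).astype(int)   # plug-in classifier g_pi
    wrong = preds != y[complete]
    return float(np.sum(wrong / p[complete]) / len(y))

def skeleton_select(cover, Z, y, delta, p):
    """Index of the cover member minimising the weighted error, as in (9)."""
    errors = [ipw_error(pi, Z, y, delta, p) for pi in cover]
    return int(np.argmin(errors))

# Toy data (assumed for illustration): Z = (X, V) with V missing in row 0.
Z = np.array([[0.2, np.nan], [0.8, 1.0], [0.6, 0.9], [0.1, 0.5]])
y = np.array([0, 1, 1, 0])
delta = np.array([0, 1, 1, 1])
p = np.array([0.5, 0.5, 1.0, 0.5])   # assumed known P(delta_i = 1 | Z_i, Y_i)

# A hypothetical two-element "cover" of candidate regression functions.
pi_a = lambda Z: Z.mean(axis=1)
pi_b = lambda Z: 1.0 - Z.mean(axis=1)
cover = [pi_a, pi_b]

best = skeleton_select(cover, Z, y, delta, p)
```

In this toy sample `pi_a` classifies every complete case correctly, so it is the member selected; in practice the weights would have to be estimated, which is the subject of the sections that follow.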
2.2 Kernel Regression

Let $p(Z_i, Y_i) = P\{\Delta_i = 1 \mid Z_i, Y_i\}$ be the missing probability mechanism, i.e., the conditional probability that $V_i$ is observed (recall that $Z_i = (X_i, V_i)$). Under the commonly used assumption of data Missing At Random (MAR), one assumes that the probability that $V_i$ is missing does not depend on $V_i$ itself. That is,

$$P\{\Delta_i = 1 \mid Z_i, Y_i\} = P\{\Delta_i = 1 \mid X_i, Y_i\} =: q(X_i, Y_i). \tag{11}$$

When $P\{\Delta_i = 1 \mid Z_i, Y_i\} = P\{\Delta_i = 1\}$ then $V_i$ is said to be Missing Completely At Random (MCAR). For these definitions and a survey of other missing patterns one may refer to the book by Little and Rubin (2002). Now consider the following kernel-based estimator of the function $q(X_i, Y_i)$ defined in (11):

$$\hat{q}(X_i, Y_i) = \frac{\sum_{j=1, j \ne i}^{n} \Delta_j \, I\{Y_j = Y_i\} \, K\!\left(\frac{X_j - X_i}{h}\right)}{\sum_{j=1, j \ne i}^{n} I\{Y_j = Y_i\} \, K\!\left(\frac{X_j - X_i}{h}\right)}, \tag{12}$$

with the convention $0/0 = 0$, where $K: \mathbb{R}^d \to \mathbb{R}_+$ is any kernel with the smoothing parameter $h$ (here $h \equiv h(n) \to 0$, as $n \to \infty$). Next, for each $\pi \in \mathcal{F}$, put

$$\hat{L}_{\hat{q}}(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{\hat{q}(X_i, Y_i)} \, I\{g_\pi(Z_i) \ne Y_i\},$$

and define

$$\hat{\pi}_{\hat{q}} = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_{\hat{q}}(\pi).$$

Then the corresponding classifier is given by

$$g_{\hat{\pi}_{\hat{q}}}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_{\hat{q}}(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

To assess the performance of $g_{\hat{\pi}_{\hat{q}}}$ we will make the following assumptions:

C1: The MAR assumption (11) holds with $q(X_i, Y_i) \ge q_0 > 0$, for some positive constant $q_0$ (compare with (7)).

C2: The random vector $X$ has a compactly supported density function, $f(x)$, and $f$ is bounded away from zero on its support. Furthermore, $f$ and its first-order partial derivatives are uniformly bounded on its support.
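C3: The partial derivatives $\frac{\partial}{\partial x_i} q(x, y)$, where $i = 1, \dots, d$, exist and are bounded on the compact support of $f$, uniformly in $x$.

C4: The kernel $K$ satisfies $\int K(u)\,du = 1$ and $\int |u_i| K(u)\,du < \infty$, $i = 1, \dots, d$, and $\|K\|_\infty < \infty$. The smoothing parameter $h$ satisfies $h \to 0$ and $nh^d \to \infty$, as $n \to \infty$.

The following theorem gives performance bounds for the classifier $g_{\hat{\pi}_{\hat{q}}}$.

Theorem 3. Let $\mathcal{F}$ be as in Theorem 2 and define the classifier $g_{\hat{\pi}_{\hat{q}}}$ as in (13). Also suppose that conditions C1-C4 hold.

(i) For every $\delta > 2\epsilon > 0$ there is an $n_0 > 0$ such that for all $n > n_0$,

$$P\{L_n(\hat{\pi}_{\hat{q}}) - L^* > \delta\} \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2 / 8} + 4n \left( e^{-c_1 ((\delta - 2\epsilon)/4)^2 n h^d} + e^{-c_2 n h^d} \right),$$

where $L_n(\hat{\pi}_{\hat{q}}) = P\{g_{\hat{\pi}_{\hat{q}}}(Z) \ne Y \mid D_n\}$ and where $c_1$ and $c_2$ are positive constants not depending on $n$, $\delta$, or $\epsilon$.

(ii) Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. If, as $n \to \infty$,

$$\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F})}{n} \to 0 \quad \text{and} \quad \frac{\log n}{n h^d} \to 0,$$

then $L_n(\hat{\pi}_{\hat{q}}) \to L^*$ a.s.

The above results, as well as those in Theorem 2 and Corollary 1, are based on the requirement that $\mathcal{F}$ is totally bounded. Furthermore, the $\epsilon$-covering number $\mathcal{N}_\infty(\epsilon, \mathcal{F})$ of the class $\mathcal{F}$ should not grow too fast (as $\epsilon$ gets closer and closer to 0). There are many important classes of functions that satisfy these requirements; here we give two examples:

Example 1. (Differentiable functions.) Let $k_1, \dots, k_d$ be non-negative integers and put $k = (k_1, \dots, k_d)$ and $|k| = k_1 + \dots + k_d$. Also, for any $g: \mathbb{R}^d \to \mathbb{R}$, let $D^{(k)} g(u) = \partial^{|k|} g(u) / \partial u_1^{k_1} \cdots \partial u_d^{k_d}$. Consider the class of functions with bounded partial derivatives of order $r$:

$$\mathcal{G} = \Big\{ g: [0, 1]^d \to \mathbb{R} \ \Big|\ \sum_{|k| \le r} \sup_u |D^{(k)} g(u)| \le A < \infty \Big\}.$$

Then, for every $\epsilon > 0$, $\log \mathcal{N}_\infty(\epsilon, \mathcal{G}) \le M \epsilon^{-\alpha}$, where $\alpha = d/r$ and $M \equiv M(d, r)$. This result is due to Kolmogorov and Tikhomirov (1959).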
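To see how the entropy bound of Example 1 can feed the rate conditions of Corollary 1 and Theorem 3(ii), the short calculation below uses polynomial sequences $\epsilon_n = n^{-\beta}$ and $h = n^{-\gamma}$. These particular choices, and the exponents $\beta$ and $\gamma$, are assumptions made here for illustration and do not appear in the paper.

```latex
% Assumed (illustrative) choices: \epsilon_n = n^{-\beta}, \quad h = n^{-\gamma}.
\begin{align*}
\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{G})}{n}
  &\le \frac{M \epsilon_n^{-d/r}}{n}
   = M\, n^{\beta d / r - 1} \to 0
  && \text{whenever } 0 < \beta < r/d, \\
\frac{\log n}{n h^{d}}
  &= \frac{\log n}{n^{1 - \gamma d}} \to 0
  && \text{whenever } 0 < \gamma < 1/d .
\end{align*}
```

Under these assumed choices, smoother classes (larger $r$) allow the cover radius $\epsilon_n$ to shrink faster while still satisfying the entropy condition.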
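Example 2. Consider the class $\Psi$ of all convex functions $\psi: C \to [0, 1]$, where $C \subset \mathbb{R}^d$ is compact and convex. If $\psi$ satisfies the Lipschitz condition $|\psi(z_1) - \psi(z_2)| \le L \|z_1 - z_2\|$, for all $z_1, z_2 \in C$, then $\log \mathcal{N}_\infty(\epsilon, \Psi) \le M \epsilon^{-d/2}$, for every $\epsilon > 0$, where $M \equiv M(d, L)$; see van der Vaart and Wellner (1996).

2.3 Least-squares Regression

In this section we consider least-squares (LS) estimates of the function $q$. The method works as follows. Suppose that $q$ belongs to the known class of functions $\mathcal{Q}$ of the form $q: \mathbb{R}^d \times \{0, 1\} \to [q_0, 1]$, where $q_0$ is as in assumption C1. The least-squares estimate of $q$ is given by

$$\tilde{q} = \operatorname*{argmin}_{q \in \mathcal{Q}} \frac{1}{n} \sum_{i=1}^{n} \big(\Delta_i - q(X_i, Y_i)\big)^2.$$

Now, for each $\pi \in \mathcal{F}$, let

$$\hat{L}_{\tilde{q}}(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{\tilde{q}(X_i, Y_i)} \, I\{g_\pi(Z_i) \ne Y_i\},$$

and define

$$\hat{\pi}_{\tilde{q}} = \operatorname*{argmin}_{\pi \in \mathcal{F}_\epsilon} \hat{L}_{\tilde{q}}(\pi).$$

In this case, we consider the following classifier

$$g_{\hat{\pi}_{\tilde{q}}}(z) = \begin{cases} 1 & \text{if } \hat{\pi}_{\tilde{q}}(z) > 1/2, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

To study the performance of $g_{\hat{\pi}_{\tilde{q}}}$ we also need the following standard notation from the empirical process theory. Fix $(x_1, y_1), \dots, (x_n, y_n)$ and let $\mathcal{N}_1(\epsilon, \mathcal{Q}, (x_i, y_i)_{i=1}^{n})$ be the $\epsilon$-covering number of the class $\mathcal{Q}$ with respect to the empirical measure of the points $(x_1, y_1), \dots, (x_n, y_n)$. That is, $\mathcal{N}_1(\epsilon, \mathcal{Q}, (x_i, y_i)_{i=1}^{n})$ is the cardinality of the smallest subclass of functions $\mathcal{Q}_\epsilon = \{q_1, \dots, q_{N(\epsilon)} \mid q_i: \mathbb{R}^d \times \{0, 1\} \to [q_0, 1]\}$ such that for every $q \in \mathcal{Q}$ there is a $q^\dagger \in \mathcal{Q}_\epsilon$ such that

$$\frac{1}{n} \sum_{i=1}^{n} |q(x_i, y_i) - q^\dagger(x_i, y_i)| < \epsilon.$$

For more on this one may refer, for example, to Pollard (1984) or van der Vaart and Wellner (1996). We then have the following result.

Theorem 4. Let $\mathcal{F}$ be as in Theorem 2 and suppose that condition C1 holds. Also, define the classifier $g_{\hat{\pi}_{\tilde{q}}}$ as in (14) and set $L_n(\hat{\pi}_{\tilde{q}}) = P\{g_{\hat{\pi}_{\tilde{q}}}(Z) \ne Y \mid D_n\}$. Then: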
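(i) For every $\delta > 2\epsilon > 0$ there is an $n_0 > 0$ such that for all $n > n_0$,

$$\begin{aligned} P\{L_n(\hat{\pi}_{\tilde{q}}) - L^* > \delta\} \le{} & 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2 / 8} \\ & + 8 E\Big[\mathcal{N}_1\Big(\tfrac{(\delta - 2\epsilon) q_0^2}{64}, \mathcal{Q}, (X_i, Y_i)_{i=1}^{n}\Big)\Big]\, e^{-c_3 n (\delta - 2\epsilon)^2} \\ & + 8 E\Big[\mathcal{N}_1\Big(\tfrac{(\delta - 2\epsilon)^2 q_0^4}{1024}, \mathcal{Q}, (X_i, Y_i)_{i=1}^{n}\Big)\Big]\, e^{-c_4 n (\delta - 2\epsilon)^4}, \end{aligned}$$

where $c_3$ and $c_4$ are positive constants not depending on $n$, $\delta$, or $\epsilon$.

(ii) Let $\epsilon_n$ be a sequence of positive constants decreasing to 0. If, as $n \to \infty$,

$$\frac{\log \mathcal{N}_\infty(\epsilon_n, \mathcal{F})}{n} \to 0 \quad \text{and} \quad \frac{\log E\big[\mathcal{N}_1\big(c, \mathcal{Q}, (X_i, Y_i)_{i=1}^{n}\big)\big]}{n} \to 0 \ \text{ for every } c > 0,$$

then $L_n(\hat{\pi}_{\tilde{q}}) \to L^*$ a.s.

3 Proofs

Proof of Theorem 2. The proof is based on standard arguments (see, for example, Devroye et al. (1996; Sec. 28.3)), and goes as follows. First observe that for any classifier $g$,

$$\begin{aligned} P\{g(Z) \ne Y\} &= 1 - P\{g(Z) = Y\} \\ &= 1 - \big( P\{g(Z) = 1, Y = 1\} + P\{g(Z) = 0, Y = 0\} \big) \\ &= 1 - E\big[ I\{g(Z) = 1\} I\{Y = 1\} \big] - E\big[ I\{g(Z) = 0\} I\{Y = 0\} \big] \\ &= 1 - E\big[ E\big( I\{g(Z) = 1\} I\{Y = 1\} \mid Z \big) \big] - E\big[ E\big( I\{g(Z) = 0\} I\{Y = 0\} \mid Z \big) \big] \\ &= 1 - E\big[ I\{g(Z) = 1\} \pi^*(Z) + I\{g(Z) = 0\} (1 - \pi^*(Z)) \big], \end{aligned}$$

where $\pi^*(Z) = P\{Y = 1 \mid Z\}$. Thus,

$$\begin{aligned} P\{g(Z) \ne Y\} - L^* &= E\big[ I\{g_B(Z) = 1\} \pi^*(Z) + I\{g_B(Z) = 0\} (1 - \pi^*(Z)) \big] \\ &\quad - E\big[ I\{g(Z) = 1\} \pi^*(Z) + I\{g(Z) = 0\} (1 - \pi^*(Z)) \big] \\ &= E\big[ \pi^*(Z) \big( I\{g_B(Z) = 1\} - I\{g(Z) = 1\} \big) + (1 - \pi^*(Z)) \big( I\{g_B(Z) = 0\} - I\{g(Z) = 0\} \big) \big] \\ &= E\big[ (2\pi^*(Z) - 1) \big( I\{g_B(Z) = 1\} - I\{g(Z) = 1\} \big) \big] \\ &= E\big[ |2\pi^*(Z) - 1| \, I\{g_B(Z) \ne g(Z)\} \big], \end{aligned} \tag{15}$$

in view of the definitions of $g_B$ and $\pi^*$ in (1) and (2).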
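The identity (15) can be checked numerically on a small discrete example. The sketch below assumes a covariate $Z$ taking four values with made-up probabilities; the arrays `pz`, `pi_star` and `g` are hypothetical illustration data, not values from the paper.

```python
import numpy as np

# Assumed toy distribution: Z takes 4 values with probabilities pz,
# and pi_star[z] = P(Y = 1 | Z = z).
pz = np.array([0.1, 0.2, 0.3, 0.4])
pi_star = np.array([0.9, 0.4, 0.5, 0.7])

g_bayes = (pi_star > 0.5).astype(int)   # Bayes rule (1)
g = np.array([0, 0, 1, 1])              # an arbitrary competing classifier

def error(rule):
    """Exact misclassification probability P(rule(Z) != Y)."""
    # If rule(z) = 1 the error at z is P(Y = 0 | z); otherwise it is P(Y = 1 | z).
    per_point = np.where(rule == 1, 1.0 - pi_star, pi_star)
    return float(np.sum(pz * per_point))

excess = error(g) - error(g_bayes)                               # left side of (15)
rhs = float(np.sum(pz * np.abs(2 * pi_star - 1) * (g != g_bayes)))  # right side of (15)
```

Both `excess` and `rhs` are computed exactly from the assumed distribution, so they should agree up to floating-point rounding. Note that the point with $\pi^*(z) = 0.5$ contributes nothing even though the two rules disagree there.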
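Now let $\pi \in \mathcal{F}$ and put $L(\pi) = P\{g_\pi(Z) \ne Y\}$, where

$$g_\pi(z) = \begin{cases} 1 & \text{if } \pi(z) > 1/2, \\ 0 & \text{otherwise,} \end{cases}$$

and note that by (15)

$$L(\pi) - L(\pi^*) = E\big[ |2\pi^*(Z) - 1| \, I\{g_B(Z) \ne g_\pi(Z)\} \big] \le 2 E|\pi(Z) - \pi^*(Z)|, \tag{16}$$

where the last line follows since $|\pi^*(Z) - 0.5| \le |\pi(Z) - \pi^*(Z)|$ whenever $g_B(Z) \ne g_\pi(Z)$. Let $\pi^\dagger \in \mathcal{F}_\epsilon$ be such that $\pi^* \in B(\pi^\dagger, \epsilon)$; this is possible since $\mathcal{F}_\epsilon$ is an $\epsilon$-cover of $\mathcal{F}$ and $\pi^* \in \mathcal{F}$. Since

$$\begin{aligned} \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) - L^* &\le 2 E|\pi^\dagger(Z) - \pi^*(Z)| \quad \text{(by (16))} \\ &\le 2 \sup_{z \in \mathbb{R}^{d+p}} |\pi^\dagger(z) - \pi^*(z)| < 2\epsilon \quad \text{(because } \pi^* \in B(\pi^\dagger, \epsilon)\text{)}, \end{aligned} \tag{17}$$

one finds that for every $\delta > 2\epsilon > 0$

$$\begin{aligned} P\{L_n(\hat{\pi}_p) - L^* > \delta\} &\le P\Big\{ L_n(\hat{\pi}_p) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon \Big\} \\ &= P\Big\{ L_n(\hat{\pi}_p) - \hat{L}_p(\hat{\pi}_p) + \inf_{\pi \in \mathcal{F}_\epsilon} \hat{L}_p(\pi) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon \Big\} \\ &\le P\Big\{ 2 \sup_{\pi \in \mathcal{F}_\epsilon} |\hat{L}_p(\pi) - L(\pi)| > \delta - 2\epsilon \Big\} \\ &\le \mathcal{N}_\infty(\epsilon, \mathcal{F}) \sup_{\pi \in \mathcal{F}_\epsilon} P\Big\{ |\hat{L}_p(\pi) - L(\pi)| > \frac{\delta}{2} - \epsilon \Big\}. \end{aligned}$$

Now, by Hoeffding's inequality, this last probability statement appearing above can be bounded by $2\exp\{-2n(\delta/2 - \epsilon)^2 p_0^2\}$, and this completes the proof of the theorem.

Proof of Theorem 3. Part (i). For each $\pi \in \mathcal{F}$, let

$$\hat{L}_q(\pi) := \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{q(X_i, Y_i)} \, I\{g_\pi(Z_i) \ne Y_i\}$$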
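and observe that

$$\big| \hat{L}_{\hat{q}}(\pi) - \hat{L}_q(\pi) \big| = \bigg| \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i I\{g_\pi(Z_i) \ne Y_i\}}{\hat{q}(X_i, Y_i)} - \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i I\{g_\pi(Z_i) \ne Y_i\}}{q(X_i, Y_i)} \bigg| \le \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{\hat{q}(X_i, Y_i)\, q(X_i, Y_i)}.$$

Furthermore, since

$$L_n(\hat{\pi}_{\hat{q}}) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) = \big[ L_n(\hat{\pi}_{\hat{q}}) - \hat{L}_{\hat{q}}(\hat{\pi}_{\hat{q}}) \big] + \Big[ \hat{L}_{\hat{q}}(\hat{\pi}_{\hat{q}}) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) \Big] \le 2 \sup_{\pi \in \mathcal{F}_\epsilon} \big| \hat{L}_{\hat{q}}(\pi) - L(\pi) \big|,$$

one finds that

$$\begin{aligned} P\{L_n(\hat{\pi}_{\hat{q}}) - L^* > \delta\} &\le P\Big\{ L_n(\hat{\pi}_{\hat{q}}) - \inf_{\pi \in \mathcal{F}_\epsilon} L(\pi) > \delta - 2\epsilon \Big\} \quad \text{(in view of (17))} \\ &\le P\Big\{ 2 \sup_{\pi \in \mathcal{F}_\epsilon} \big| \hat{L}_{\hat{q}}(\pi) - L(\pi) \big| > \delta - 2\epsilon \Big\} \\ &\le P\bigg\{ \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{\hat{q}(X_i, Y_i)\, q(X_i, Y_i)} > \frac{\delta}{4} - \frac{\epsilon}{2} \bigg\} + P\Big\{ \sup_{\pi \in \mathcal{F}_\epsilon} \big| \hat{L}_q(\pi) - L(\pi) \big| > \frac{\delta}{4} - \frac{\epsilon}{2} \Big\} \\ &:= I_n + II_n \quad \text{(say)}. \end{aligned} \tag{18}$$

But, using the MAR assumption (see (11)), it is straightforward to see that $E[\hat{L}_q(\pi)] = L(\pi)$. Therefore

$$II_n \le \mathcal{N}_\infty(\epsilon, \mathcal{F}) \sup_{\pi \in \mathcal{F}_\epsilon} P\Big\{ \big| \hat{L}_q(\pi) - L(\pi) \big| > \frac{\delta}{4} - \frac{\epsilon}{2} \Big\} \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2 / 8} \quad \text{(via Hoeffding's inequality)}. \tag{19}$$

As for the term $I_n$ in (18), first note that

$$\begin{aligned} I_n &\le P\bigg[ \bigg\{ \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{\hat{q}(X_i, Y_i)\, q(X_i, Y_i)} > \frac{\delta - 2\epsilon}{4} \bigg\} \cap \bigcap_{i=1}^{n} \Big\{ \hat{q}(X_i, Y_i) \ge \frac{q_0}{2} \Big\} \bigg] + P\bigg[ \bigcup_{i=1}^{n} \Big\{ \hat{q}(X_i, Y_i) < \frac{q_0}{2} \Big\} \bigg] \\ &\le \sum_{i=1}^{n} P\bigg\{ \frac{|\hat{q}(X_i, Y_i) - q(X_i, Y_i)|}{q_0^2 / 2} > \frac{\delta - 2\epsilon}{4} \bigg\} + \sum_{i=1}^{n} P\Big\{ \hat{q}(X_i, Y_i) < \frac{q_0}{2} \Big\}. \end{aligned} \tag{20}$$

It will be shown at the end of the proof that for every constant $b > 0$, and $n$ large enough,

$$P\big\{ |\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > b \big\} \le 4 e^{-C_3 n h^d b^2}, \tag{21}$$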
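where $C_3$ is a positive constant not depending on $n$ or $\epsilon$. Therefore, taking $b$ proportional to $\delta - 2\epsilon$ in (21) (namely $b = q_0^2(\delta - 2\epsilon)/8$), the first sum on the r.h.s. of (20) is bounded by $4n e^{-C_4 n h^d (\delta - 2\epsilon)^2}$, for $n$ large enough, where $C_4 > 0$ does not depend on $n$, $\delta$, or $\epsilon$. Similarly, since

$$P\{\hat{q}(X_i, Y_i) < q_0/2\} \le P\big\{ |\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > q_0/2 \big\},$$

one finds (via (21)) that for $n$ large enough, the second sum on the r.h.s. of (20) is bounded by $4n e^{-C_5 n h^d}$, where the constant $C_5$ is positive and does not depend on $n$ or $\epsilon$. Putting the above together, we have shown that for $n$ large enough,

$$I_n \le 4n e^{-C_4 n h^d (\delta - 2\epsilon)^2} + 4n e^{-C_5 n h^d}.$$

This completes the proof of part (i) of Theorem 3. Part (ii) follows from the Borel-Cantelli lemma.

Proof of (21). Since $|\hat{q}(X_i, Y_i) - q(X_i, Y_i)| \le 1$, it is sufficient to prove (21) for $0 < b \le 1$. Now, let

$$\begin{aligned} S(X_i, Y_i) &= f(X_i)\, P\{Y = Y_i \mid Y_i\}\, q(X_i, Y_i), \\ \hat{S}(X_i, Y_i) &= \frac{1}{(n-1) h^d} \sum_{j=1, j \ne i}^{n} \Delta_j \, I\{Y_j = Y_i\} \, K\!\left(\frac{X_j - X_i}{h}\right), \\ R(X_i, Y_i) &= f(X_i)\, P\{Y = Y_i \mid Y_i\}, \\ \hat{R}(X_i, Y_i) &= \frac{1}{(n-1) h^d} \sum_{j=1, j \ne i}^{n} I\{Y_j = Y_i\} \, K\!\left(\frac{X_j - X_i}{h}\right), \end{aligned}$$

and observe that

$$\begin{aligned} |\hat{q}(X_i, Y_i) - q(X_i, Y_i)| &= \bigg| \frac{\hat{S}(X_i, Y_i)}{\hat{R}(X_i, Y_i)} - \frac{S(X_i, Y_i)}{R(X_i, Y_i)} \bigg| \\ &= \bigg| \frac{\hat{S}(X_i, Y_i) / \hat{R}(X_i, Y_i)}{R(X_i, Y_i)} \big( R(X_i, Y_i) - \hat{R}(X_i, Y_i) \big) + \frac{\hat{S}(X_i, Y_i) - S(X_i, Y_i)}{R(X_i, Y_i)} \bigg| \\ &\le \frac{|\hat{R}(X_i, Y_i) - R(X_i, Y_i)|}{R(X_i, Y_i)} + \frac{|\hat{S}(X_i, Y_i) - S(X_i, Y_i)|}{R(X_i, Y_i)}, \end{aligned}$$

where we have used the fact that $|\hat{S}(X_i, Y_i) / \hat{R}(X_i, Y_i)| \le 1$. Therefore, since $R(X_i, Y_i) \ge C_6 > 0$ (by assumption C2), one finds that for every $b > 0$

$$P\big\{ |\hat{q}(X_i, Y_i) - q(X_i, Y_i)| > b \big\} \le P\big\{ |\hat{S}(X_i, Y_i) - S(X_i, Y_i)| > C_7 b \big\} + P\big\{ |\hat{R}(X_i, Y_i) - R(X_i, Y_i)| > C_7 b \big\} := \pi_{n,1} + \pi_{n,2}, \tag{22}$$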
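where $C_7 = C_6 / 2$. Now, by the results of Mojirsheibani et al. (2011; lemma in Appendix A, with $g(Z, Y) = 1$) one finds

$$\big| S(X_i, Y_i) - E\big[ \hat{S}(X_i, Y_i) \mid X_i, Y_i \big] \big| \le C h, \tag{23}$$

where $C > 0$ is a constant not depending on $n$. Therefore

$$\begin{aligned} \pi_{n,1} &\le P\Big\{ \big| \hat{S}(X_i, Y_i) - E[\hat{S}(X_i, Y_i) \mid X_i, Y_i] \big| + \big| E[\hat{S}(X_i, Y_i) \mid X_i, Y_i] - S(X_i, Y_i) \big| > C_7 b \Big\} \\ &\le P\Big\{ \big| \hat{S}(X_i, Y_i) - E[\hat{S}(X_i, Y_i) \mid X_i, Y_i] \big| > C_8 b \Big\} \quad \text{(for } n \text{ large by (23), where } C_8 = C_7 / 2\text{)} \\ &= E\Big[ P\Big\{ \big| \hat{S}(X_i, Y_i) - E[\hat{S}(X_i, Y_i) \mid X_i, Y_i] \big| > C_8 b \ \Big|\ X_i, Y_i \Big\} \Big] \\ &= E\bigg[ P\bigg\{ \bigg| \frac{1}{n-1} \sum_{j=1, j \ne i}^{n} \Gamma_j(X_i, Y_i) \bigg| > C_8 b \ \bigg|\ X_i, Y_i \bigg\} \bigg], \end{aligned} \tag{24}$$

where

$$\Gamma_j(X_i, Y_i) = h^{-d} \bigg[ \Delta_j I\{Y_j = Y_i\} K\!\left(\frac{X_j - X_i}{h}\right) - E\bigg( \Delta_j I\{Y_j = Y_i\} K\!\left(\frac{X_j - X_i}{h}\right) \ \bigg|\ X_i, Y_i \bigg) \bigg].$$

However, conditional on $(X_i, Y_i)$, the terms $\Gamma_j(X_i, Y_i)$, $j = 1, \dots, n$, $j \ne i$, are independent, zero-mean random variables, bounded by $-h^{-d}\|K\|_\infty$ and $+h^{-d}\|K\|_\infty$. We also note that

$$\operatorname{Var}\big( \Gamma_j(X_i, Y_i) \mid X_i, Y_i \big) = E\big[ \Gamma_j^2(X_i, Y_i) \mid X_i, Y_i \big] \le h^{-d} \|K\|_\infty \|f\|_\infty.$$

Therefore, by Bennett's inequality (Bennett, 1962), for any fixed (nonrandom) $x$ and $y$

$$P\bigg\{ \bigg| \frac{1}{n-1} \sum_{j=1, j \ne i}^{n} \Gamma_j(X_i, Y_i) \bigg| > C_8 b \ \bigg|\ X_i = x, Y_i = y \bigg\} \le 2 \exp\bigg\{ -\frac{(n-1) h^d C_8^2 b^2}{2 \|K\|_\infty \big[ \|f\|_\infty + C_8 b \big]} \bigg\},$$

where the bound does not depend on $x$ or $y$. Therefore, in view of (24) and the fact that $0 < b \le 1$, one finds (for $n$ large enough)

$$\pi_{n,1} \le 2 \exp\bigg\{ -\frac{(n-1) h^d C_8^2}{2 \|K\|_\infty \big[ \|f\|_\infty + C_8 \big]} \, b^2 \bigg\}.$$

Similarly, one can also show (with, in fact, less effort) that, for $n$ large enough,

$$\pi_{n,2} \le 2 \exp\bigg\{ -\frac{(n-1) h^d C_9^2}{2 \|K\|_\infty \big[ \|f\|_\infty + C_9 \big]} \, b^2 \bigg\},$$

where $C_9$ is a positive constant not depending on $n$ or $b$. This completes the proof of (21).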
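For concreteness, the leave-one-out kernel estimator (12), whose deviation is controlled by (21), can be sketched as follows. The Gaussian kernel, the function name `kernel_q_hat` and the toy arrays are assumptions made for illustration; the paper allows any kernel satisfying C4 and does not prescribe an implementation.

```python
import numpy as np

def kernel_q_hat(X, y, delta, h):
    """Leave-one-out kernel estimate of q(X_i, Y_i) in the spirit of (12).

    X: (n, d) always-observed covariates; y: (n,) labels in {0, 1};
    delta: (n,) indicators (1 = V_i observed); h: bandwidth.
    Uses a Gaussian kernel (an assumption) and the convention 0/0 = 0.
    """
    n = X.shape[0]
    diffs = (X[:, None, :] - X[None, :, :]) / h           # pairwise (X_i - X_j) / h
    K = np.exp(-0.5 * np.sum(diffs ** 2, axis=2))         # symmetric kernel weights
    same_label = (y[:, None] == y[None, :]).astype(float)  # I{Y_j = Y_i}
    W = K * same_label
    np.fill_diagonal(W, 0.0)                               # exclude j = i
    num = W @ delta.astype(float)
    den = W.sum(axis=1)
    # Where no same-label neighbour carries weight, return 0 (the 0/0 = 0 convention).
    return np.divide(num, den, out=np.zeros(n), where=den > 0)

# Toy data (assumed): two points per class, one missing V in class 0.
X_toy = np.array([[0.0], [0.0], [1.0], [1.0]])
y_toy = np.array([0, 0, 1, 1])
delta_toy = np.array([1, 0, 1, 1])
q_toy = kernel_q_hat(X_toy, y_toy, delta_toy, h=0.5)

# Degenerate case: each point is the only member of its class.
q_alone = kernel_q_hat(np.array([[0.0], [1.0]]), np.array([0, 1]), np.array([1, 1]), h=0.5)
```

Because the estimate at index $i$ excludes observation $i$ itself, a complete case can still receive an estimate of 0, in which case the weight $\Delta_i / \hat{q}(X_i, Y_i)$ is undefined; any practical use of this sketch would need to handle that case explicitly.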
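Proof of Theorem 4. Part (i). Using (17) and the arguments that lead to (18), we find

$$P\{L_n(\hat{\pi}_{\tilde{q}}) - L^* > \delta\} \le I_n + II_n,$$

where $II_n$ is as in (18) and

$$I_n := P\bigg\{ \frac{1}{n} \sum_{i=1}^{n} \frac{|\tilde{q}(X_i, Y_i) - q(X_i, Y_i)|}{\tilde{q}(X_i, Y_i)\, q(X_i, Y_i)} > \frac{\delta}{4} - \frac{\epsilon}{2} \bigg\}.$$

But, by (19), $II_n \le 2\, \mathcal{N}_\infty(\epsilon, \mathcal{F})\, e^{-n(\delta - 2\epsilon)^2 q_0^2 / 8}$. To deal with the term $I_n$, first note that since $\tilde{q} \ge q_0$, one finds

$$\begin{aligned} I_n &\le P\bigg\{ \frac{1}{n q_0^2} \sum_{i=1}^{n} |\tilde{q}(X_i, Y_i) - q(X_i, Y_i)| > \frac{\delta - 2\epsilon}{4} \bigg\} \\ &= P\bigg\{ \frac{1}{n} \sum_{i=1}^{n} |\tilde{q}(X_i, Y_i) - q(X_i, Y_i)| - E\big[ |\tilde{q}(X, Y) - q(X, Y)| \,\big|\, D_n \big] + E\big[ |\tilde{q}(X, Y) - q(X, Y)| \,\big|\, D_n \big] > \frac{(\delta - 2\epsilon) q_0^2}{4} \bigg\} \\ &\le P\bigg\{ \sup_{q' \in \mathcal{Q}} \bigg| \frac{1}{n} \sum_{i=1}^{n} |q'(X_i, Y_i) - q(X_i, Y_i)| - E|q'(X, Y) - q(X, Y)| \bigg| > \frac{(\delta - 2\epsilon) q_0^2}{8} \bigg\} \\ &\quad + P\bigg\{ E\big[ |\tilde{q}(X, Y) - q(X, Y)| \,\big|\, D_n \big] > \frac{(\delta - 2\epsilon) q_0^2}{8} \bigg\} \\ &:= I_n(A) + I_n(B). \end{aligned} \tag{25}$$

Standard results from the empirical process theory (see, for example, Pollard (1984)) yield

$$I_n(A) \le 8 E\Big[ \mathcal{N}_1\Big( \tfrac{(\delta - 2\epsilon) q_0^2}{64}, \mathcal{Q}, (X_i, Y_i)_{i=1}^{n} \Big) \Big]\, e^{-n(\delta - 2\epsilon)^2 q_0^4 / ((128)(8^2))}.$$

As for the term $I_n(B)$, put

$$S_n(q) = \frac{1}{n} \sum_{i=1}^{n} \big[ \Delta_i - q(X_i, Y_i) \big]^2$$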
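and observe that

$$\begin{aligned} I_n(B) &\le P\bigg\{ \sqrt{E\big[ |\tilde{q}(X, Y) - q(X, Y)|^2 \,\big|\, D_n \big]} > \frac{(\delta - 2\epsilon) q_0^2}{8} \bigg\} \quad \text{(by the Cauchy-Schwarz inequality)} \\ &= P\bigg\{ E\big[ |\Delta - \tilde{q}(X, Y)|^2 \,\big|\, D_n \big] - E|\Delta - q(X, Y)|^2 > \frac{(\delta - 2\epsilon)^2 q_0^4}{64} \bigg\} \\ &\le P\bigg\{ 2 \sup_{q' \in \mathcal{Q}} \Big| S_n(q') - E|\Delta - q'(X, Y)|^2 \Big| > \frac{(\delta - 2\epsilon)^2 q_0^4}{64} \bigg\}, \end{aligned}$$

where the last line above follows from the following arguments:

$$\begin{aligned} E\big[ |\Delta - \tilde{q}(X, Y)|^2 \,\big|\, D_n \big] - E|\Delta - q(X, Y)|^2 &= E\big[ |\Delta - \tilde{q}(X, Y)|^2 \,\big|\, D_n \big] - \inf_{q' \in \mathcal{Q}} E|\Delta - q'(X, Y)|^2 \\ &= \sup_{q' \in \mathcal{Q}} \Big\{ E\big[ |\Delta - \tilde{q}(X, Y)|^2 \,\big|\, D_n \big] - S_n(\tilde{q}) + S_n(\tilde{q}) - S_n(q') + S_n(q') - E|\Delta - q'(X, Y)|^2 \Big\} \\ &\le 2 \sup_{q' \in \mathcal{Q}} \Big| S_n(q') - E|\Delta - q'(X, Y)|^2 \Big|, \end{aligned}$$

and where we have used the fact that $S_n(\tilde{q}) - S_n(q') \le 0$, by the definition of $\tilde{q}$. Therefore

$$I_n(B) \le 8 E\Big[ \mathcal{N}_1\Big( \tfrac{(\delta - 2\epsilon)^2 q_0^4}{1024}, \mathcal{Q}, (X_i, Y_i)_{i=1}^{n} \Big) \Big]\, e^{-C n (\delta - 2\epsilon)^4},$$

where $C > 0$ does not depend on $n$ or $\epsilon$. Part (ii) follows from the Borel-Cantelli lemma.

Acknowledgements. The authors would like to thank Professor Hamedani and the referees for the helpful comments.

References

[1] Bennett, G. (1962). Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57.

[2] Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.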
[3] Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer.

[4] Kolmogorov, A.N. and Tikhomirov, V.M. (1959). ε-entropy and ε-capacity of sets in function spaces. Uspekhi Matematicheskikh Nauk, 14.

[5] Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis With Missing Data. Wiley, New York.

[6] Mojirsheibani, M., Montazeri, Z., and Rajaeefard, A. (2011). On classification with incomplete covariates. Statistics, 45.

[7] Mojirsheibani, M. and Montazeri, Z. (2007). Statistical classification with missing covariates. Journal of the Royal Statistical Society Ser. B, 69.

[8] Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.

[9] van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes with Applications to Statistics. Springer-Verlag, New York.

[10] Willard, S. (2004). General Topology. Dover Publications.
More informationTheorem 3. A subset S of a topological space X is compact if and only if every open cover of S by open sets in X has a finite subcover.
Compactess Defiitio 1. A cover or a coverig of a topological space X is a family C of subsets of X whose uio is X. A subcover of a cover C is a subfamily of C which is a cover of X. A ope cover of X is
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationBerry-Esseen bounds for self-normalized martingales
Berry-Essee bouds for self-ormalized martigales Xiequa Fa a, Qi-Ma Shao b a Ceter for Applied Mathematics, Tiaji Uiversity, Tiaji 30007, Chia b Departmet of Statistics, The Chiese Uiversity of Hog Kog,
More informationDefinition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.
4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad
More informationMetric Space Properties
Metric Space Properties Math 40 Fial Project Preseted by: Michael Brow, Alex Cordova, ad Alyssa Sachez We have already poited out ad will recogize throughout this book the importace of compact sets. All
More informationECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002
ECE 330:541, Stochastic Sigals ad Systems Lecture Notes o Limit Theorems from robability Fall 00 I practice, there are two ways we ca costruct a ew sequece of radom variables from a old sequece of radom
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationChapter 7 Isoperimetric problem
Chapter 7 Isoperimetric problem Recall that the isoperimetric problem (see the itroductio its coectio with ido s proble) is oe of the most classical problem of a shape optimizatio. It ca be formulated
More informationON POINTWISE BINOMIAL APPROXIMATION
Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 57-66 ON POINTWISE BINOMIAL APPROXIMATION BY w-functions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece
More informationRademacher Complexity
EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for
More informationMAT1026 Calculus II Basic Convergence Tests for Series
MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real
More information1+x 1 + α+x. x = 2(α x2 ) 1+x
Math 2030 Homework 6 Solutios # [Problem 5] For coveiece we let α lim sup a ad β lim sup b. Without loss of geerality let us assume that α β. If α the by assumptio β < so i this case α + β. By Theorem
More informationMcGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems
McGill Uiversity Math 354: Hoors Aalysis 3 Fall 212 Assigmet 3 Solutios to selected problems Problem 1. Lipschitz fuctios. Let Lip K be the set of all fuctios cotiuous fuctios o [, 1] satisfyig a Lipschitz
More informationLecture 3 : Random variables and their distributions
Lecture 3 : Radom variables ad their distributios 3.1 Radom variables Let (Ω, F) ad (S, S) be two measurable spaces. A map X : Ω S is measurable or a radom variable (deoted r.v.) if X 1 (A) {ω : X(ω) A}
More informationf n (x) f m (x) < ɛ/3 for all x A. By continuity of f n and f m we can find δ > 0 such that d(x, x 0 ) < δ implies that
Lecture 15 We have see that a sequece of cotiuous fuctios which is uiformly coverget produces a limit fuctio which is also cotiuous. We shall stregthe this result ow. Theorem 1 Let f : X R or (C) be a
More informationECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors
ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic
More informationECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015
ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],
More informationMi-Hwa Ko and Tae-Sung Kim
J. Korea Math. Soc. 42 2005), No. 5, pp. 949 957 ALMOST SURE CONVERGENCE FOR WEIGHTED SUMS OF NEGATIVELY ORTHANT DEPENDENT RANDOM VARIABLES Mi-Hwa Ko ad Tae-Sug Kim Abstract. For weighted sum of a sequece
More informationFall 2013 MTH431/531 Real analysis Section Notes
Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters
More informationDistribution of Random Samples & Limit theorems
STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to
More informationSTA Object Data Analysis - A List of Projects. January 18, 2018
STA 6557 Jauary 8, 208 Object Data Aalysis - A List of Projects. Schoeberg Mea glaucomatous shape chages of the Optic Nerve Head regio i aimal models 2. Aalysis of VW- Kedall ati-mea shapes with a applicatio
More informationM17 MAT25-21 HOMEWORK 5 SOLUTIONS
M17 MAT5-1 HOMEWORK 5 SOLUTIONS 1. To Had I Cauchy Codesatio Test. Exercise 1: Applicatio of the Cauchy Codesatio Test Use the Cauchy Codesatio Test to prove that 1 diverges. Solutio 1. Give the series
More informationRandom Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.
Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationNotes 19 : Martingale CLT
Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationSOME SEQUENCE SPACES DEFINED BY ORLICZ FUNCTIONS
ARCHIVU ATHEATICU BRNO Tomus 40 2004, 33 40 SOE SEQUENCE SPACES DEFINED BY ORLICZ FUNCTIONS E. SAVAŞ AND R. SAVAŞ Abstract. I this paper we itroduce a ew cocept of λ-strog covergece with respect to a Orlicz
More informationAdvanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology
Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More information5. Likelihood Ratio Tests
1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationLecture 2: Concentration Bounds
CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy
More informationLecture 8: Convergence of transformations and law of large numbers
Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationEFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS
EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS Ryszard Zieliński Ist Math Polish Acad Sc POBox 21, 00-956 Warszawa 10, Polad e-mail: rziel@impagovpl ABSTRACT Weak laws of large umbers (W LLN), strog
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationMaximum Likelihood Estimation and Complexity Regularization
ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio
More informationON THE FUZZY METRIC SPACES
The Joural of Mathematics ad Computer Sciece Available olie at http://www.tjmcs.com The Joural of Mathematics ad Computer Sciece Vol.2 No.3 2) 475-482 ON THE FUZZY METRIC SPACES Received: July 2, Revised:
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 6 9/23/203 Browia motio. Itroductio Cotet.. A heuristic costructio of a Browia motio from a radom walk. 2. Defiitio ad basic properties
More informationCouncil for Innovative Research
ABSTRACT ON ABEL CONVERGENT SERIES OF FUNCTIONS ERDAL GÜL AND MEHMET ALBAYRAK Yildiz Techical Uiversity, Departmet of Mathematics, 34210 Eseler, Istabul egul34@gmail.com mehmetalbayrak12@gmail.com I this
More informationJanuary 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS
Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationHomework 4. x n x X = f(x n x) +
Homework 4 1. Let X ad Y be ormed spaces, T B(X, Y ) ad {x } a sequece i X. If x x weakly, show that T x T x weakly. Solutio: We eed to show that g(t x) g(t x) g Y. It suffices to do this whe g Y = 1.
More informationThe log-behavior of n p(n) and n p(n)/n
Ramauja J. 44 017, 81-99 The log-behavior of p ad p/ William Y.C. Che 1 ad Ke Y. Zheg 1 Ceter for Applied Mathematics Tiaji Uiversity Tiaji 0007, P. R. Chia Ceter for Combiatorics, LPMC Nakai Uivercity
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationarxiv: v1 [math.pr] 4 Dec 2013
Squared-Norm Empirical Process i Baach Space arxiv:32005v [mathpr] 4 Dec 203 Vicet Q Vu Departmet of Statistics The Ohio State Uiversity Columbus, OH vqv@statosuedu Abstract Jig Lei Departmet of Statistics
More informationOn Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities
O Equivalece of Martigale Tail Bouds ad Determiistic Regret Iequalities Sasha Rakhli Departmet of Statistics, The Wharto School Uiversity of Pesylvaia Dec 16, 2015 Joit work with K. Sridhara arxiv:1510.03925
More informationIf a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?
2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a
More informationEntropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP
Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.
More informationDetailed proofs of Propositions 3.1 and 3.2
Detailed proofs of Propositios 3. ad 3. Proof of Propositio 3. NB: itegratio sets are geerally omitted for itegrals defied over a uit hypercube [0, s with ay s d. We first give four lemmas. The proof of
More informationLecture 3 The Lebesgue Integral
Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More information