Exponential Convergence Rates in Classification

Exponential Convergence Rates in Classification

Vladimir Koltchinskii and Olexandra Beznosova

Department of Mathematics and Statistics, The University of New Mexico, Albuquerque, NM 87131, U.S.A.
vlad@math.unm.edu, beznosik@math.unm.edu

Abstract. Let (X, Y) be a random couple, X being an observable instance and Y ∈ {−1, 1} being a binary label to be predicted based on an observation of the instance. Let (X_i, Y_i), i = 1, ..., n be training data consisting of n independent copies of (X, Y). Consider a real valued classifier f̂_n that minimizes the following penalized empirical risk

n^{−1} Σ_{i=1}^n l(Y_i f(X_i)) + λ ||f||² → min, f ∈ H,

over a Hilbert space H of functions with norm ||·||, l being a convex loss function and λ > 0 being a regularization parameter. In particular, H might be a Sobolev space or a reproducing kernel Hilbert space. We provide some conditions under which the generalization error of the corresponding binary classifier sign(f̂_n) converges to the Bayes risk exponentially fast.

1 Introduction

Let (S, d) be a metric space and (X, Y) be a random couple taking values in S × {−1, 1} with joint distribution P. The distribution of X (which is a measure on the Borel σ-algebra in S) will be denoted by Π. Let (X_i, Y_i), i ≥ 1 be a sequence of independent copies of (X, Y). Here and in what follows all random variables are defined on some probability space (Ω, Σ, P). Let H be a Hilbert space of functions on S such that H is dense in the space C(S) of all continuous functions on S and, in addition,

∀ x, y ∈ S: |f(x)| ≤ ||f|| and |f(x) − f(y)| ≤ ||f|| d(x, y). (1)

Here ||·|| = ||·||_H is the norm of H and ⟨·, ·⟩ = ⟨·, ·⟩_H is its inner product. We have in mind two main examples. In the first one, S is a compact domain in R^d with smooth boundary. For any s ≥ 1, one can define the following inner product in the space C^∞(S) of all infinitely differentiable functions in S:

⟨f, g⟩_s := Σ_{|α| ≤ s} ∫_S D^α f D^α g dx.

(Partially supported by NSF grant DMS.)

Here α = (α_1, ..., α_d), α_j = 0, 1, ..., |α| := Σ_{i=1}^d α_i and

D^α f = ∂^{|α|} f / (∂x_1^{α_1} ... ∂x_d^{α_d}).

The Sobolev space H^s(S) is the completion of (C^∞(S), ⟨·, ·⟩_s). There is also a version of the definition for any real s > 0 that utilizes Fourier transforms. If s > d/2 + 1, then it follows from Sobolev's embedding theorems that conditions (1) hold with the metric d being the Euclidean distance (possibly, after a proper rescaling of the inner product or of the metric d to make the constants equal to 1).

In the second example, S is a metric compact and H = H_K is the reproducing kernel Hilbert space (RKHS) generated by a Mercer kernel K. This means that K is a continuous symmetric nonnegatively definite kernel and H_K is defined as the completion of the linear span of the functions {K_x : x ∈ S}, K_x(y) := K(x, y), with respect to the following inner product:

⟨Σ_i α_i K_{x_i}, Σ_j β_j K_{y_j}⟩_K := Σ_{i,j} α_i β_j K(x_i, y_j).

It is well known that H_K can be identified with a subset of C(S) and

f ∈ H_K ⟹ f(x) = ⟨f, K_x⟩_K,

implying that

|f(x)| ≤ ||f||_K sup_{x ∈ S} ||K_x||_K and |f(x) − f(y)| ≤ ||f||_K ||K_x − K_y||_K,

so again conditions (1) hold with d(x, y) := ||K_x − K_y||_K (as before, a simple rescaling is needed to ensure that the constants are equal to 1).

In binary classification problems, it is common to look for a real valued classifier f̂_n that solves the following penalized empirical risk minimization problem

n^{−1} Σ_{i=1}^n l(Y_i f(X_i)) + λ ||f||² → min, f ∈ H, (2)

where l is a nonnegative decreasing convex loss function such that l ≥ I_{(−∞,0]} and λ > 0 is a regularization parameter. For instance, if l is the hinge loss, i.e. l(u) = (1 − u) ∨ 0, and ||·|| is an RKHS-norm, this is a standard approach in kernel machines classification. Given a real valued classifier f : S → R, the corresponding binary classifier is typically defined as x ↦ sign(f(x)), where sign(u) = +1 for u ≥ 0 and −1 otherwise. The generalization error, or risk, of f is then

R_P(f) := P{(x, y) : y ≠ sign(f(x))}.
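To make problem (2) concrete, here is a minimal numerical sketch (not from the paper): penalized empirical risk minimization over the RKHS generated by a Gaussian kernel, with the logit loss used later in the paper. By the representer theorem the minimizer has the form f = Σ_j c_j K(X_j, ·) with ||f||² = Σ_{i,j} c_i c_j K(X_i, X_j), so plain gradient descent on the coefficients c suffices; the toy data, kernel width, step size and iteration count are illustrative assumptions.

```python
import math

# Toy 1-D training sample with labels Y_i in {-1, +1} (illustrative data).
X = [-2.0, -1.0, 1.0, 2.0]
Y = [-1, -1, 1, 1]
n = len(X)
lam = 0.01  # regularization parameter lambda

def kernel(s, t):
    # Gaussian (Mercer) kernel; any continuous symmetric
    # nonnegatively definite kernel would do.
    return math.exp(-(s - t) ** 2)

K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

LN2 = math.log(2.0)
def loss(u):   # logit loss l(u) = log_2(1 + e^{-u})
    return math.log(1.0 + math.exp(-u)) / LN2
def dloss(u):  # l'(u) = -1 / ((1 + e^u) ln 2)
    return -1.0 / ((1.0 + math.exp(u)) * LN2)

def objective(c):
    # n^{-1} sum_i l(Y_i f(X_i)) + lam ||f||^2 with f = sum_j c_j K(X_j, .)
    f = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    norm2 = sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))
    return sum(loss(Y[i] * f[i]) for i in range(n)) / n + lam * norm2

def gradient(c):
    f = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    g = []
    for j in range(n):
        g_j = sum(dloss(Y[i] * f[i]) * Y[i] * K[i][j] for i in range(n)) / n
        g_j += 2.0 * lam * sum(K[j][i] * c[i] for i in range(n))
        g.append(g_j)
    return g

c = [0.0] * n
obj0 = objective(c)
for _ in range(2000):  # plain gradient descent (illustrative choice)
    g = gradient(c)
    c = [c[j] - 1.0 * g[j] for j in range(n)]
obj1 = objective(c)

def f_hat(x):
    return sum(c[j] * kernel(x, X[j]) for j in range(n))

# sign(f_hat) is the resulting binary classifier
preds = [1 if f_hat(x) >= 0 else -1 for x in X]
```

On this separable toy sample the objective decreases from its value at c = 0 and sign(f̂) recovers the labels; with a small λ the penalty only keeps ||f̂|| under control, as in the proof below.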

It is well known that the minimum of R_P(f) over all measurable functions f is attained at the regression function η defined as η(x) := E(Y | X = x). The corresponding binary classifier sign(η(x)) is called the Bayes classifier, the quantity R* := R_P(η) is called the Bayes risk and, finally, the quantity R_P(f) − R* is often referred to as the excess risk of a classifier f. Our goal in this note is to show that under some (naturally restrictive) assumptions the expectation of the excess risk of f̂_n converges to 0 exponentially fast as n → ∞. Recently, Audibert and Tsybakov [1] observed a similar phenomenon in the case of plug-in classifiers and our analysis here continues this line of work.

Denote

δ(P) := sup{δ > 0 : Π{x : |η(x)| ≤ δ} = 0}.

We will assume that

(a) η is a Lipschitz function with constant L > 0 (which, for the sake of simplicity of notations, will be assumed to be 1 in what follows): |η(x) − η(y)| ≤ L d(x, y);
(b) δ(P) > 0.

These will be the two main conditions that guarantee the possibility of exponentially fast convergence rates of the generalization error to the Bayes risk. Note that condition (b), which is an extreme case of Tsybakov's low noise assumption, means that there exists δ > 0 such that Π-a.e. either η(x) ≥ δ, or η(x) ≤ −δ. The function η (as a conditional expectation) is defined up to Π-a.e. Condition (a) means that there exists a smooth (Lipschitz) version of this conditional expectation. Since smooth functions cannot jump immediately from the value −δ to the value δ, the combination of conditions (a) and (b) essentially means that there should be a wide enough corridor between the regions {η ≥ δ} and {η ≤ −δ}, but the probability of getting into this corridor is zero.
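Condition (b) can be illustrated on a toy discrete distribution (the point masses and η values below are invented for illustration): δ(P) is the supremum of δ for which the "corridor" {x : |η(x)| ≤ δ} carries no Π-mass, and on a finite sample space it is simply the smallest attained value of |η|.

```python
# Discrete marginal Pi on four points, with regression function eta(x);
# the values are purely illustrative.
points = [0.0, 1.0, 2.0, 3.0]
pi_mass = {0.0: 0.25, 1.0: 0.25, 2.0: 0.25, 3.0: 0.25}
eta = {0.0: -0.6, 1.0: -0.3, 2.0: 0.3, 3.0: 0.6}

def mass_of_corridor(delta):
    # Pi{x : |eta(x)| <= delta}
    return sum(pi_mass[x] for x in points if abs(eta[x]) <= delta)

# delta(P) = sup{delta > 0 : Pi{|eta| <= delta} = 0}. On a finite set the
# corridor is empty exactly for delta below the smallest attained |eta|,
# so the supremum equals min |eta| (note the sup itself is not attained:
# the corridor at delta(P) already contains the points with |eta| = delta(P)).
delta_P = min(abs(eta[x]) for x in points)
corridor_mass_inside = mass_of_corridor(delta_P / 2)
```

Here δ(P) = 0.3: every corridor {|η| ≤ δ} with δ < 0.3 has Π-mass zero, while the corridor at level 0.3 already carries mass 0.5.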
The fact that in such situations it is possible to construct classifiers that converge to Bayes exponentially fast is essentially rather simple, it reduces to a large deviation type phenomenon, and it is even surprising that, to the best of our knowledge, the possibility of such superfast convergence rates in classification had not been observed before Audibert and Tsybakov [1] (we apologize if someone, in fact, did it earlier). Subtle results on convergence rates of the generalization error of large margin classifiers to the Bayes risk have been obtained relatively recently, see the papers by Bartlett, Jordan and McAuliffe [3] and by Blanchard, Lugosi and Vayatis [5] on boosting, and the papers by Blanchard, Bousquet and Massart [4] and by Scovel and Steinwart [7] on SVM. These papers rely heavily on general exponential inequalities in abstract empirical risk minimization in the spirit of the papers by Bartlett, Bousquet and Mendelson [2] or Koltchinskii [6] (or even earlier work by Birgé and Massart in the 90s). The rates of convergence in classification based on this

general approach are at best of the order O(n^{−1}). In classification problems, there are many relevant probabilistic, analytic and geometric parameters to play with when one studies the convergence rates. For instance, both papers [4] and [7] deal with SVM classifiers (so, essentially, with problem (2) in the case when H is an RKHS). In [4], the convergence rates are studied under the assumption (b) above and under some conditions on the eigenvalues of the kernel. In [7], the authors determine the convergence rates under an assumption on the entropy of the unit ball in the RKHS of the same type as our assumption (3) below, under Tsybakov's low noise assumption and some additional conditions of geometric nature. The fact that under the somewhat more restrictive assumptions imposed in this paper even exponential convergence rates are possible indicates that, probably, we have not understood to the end the rather subtle interplay between the various parameters that influence the behaviour of this type of classifiers.

2 Main Result

We now turn to the precise formulation of the results. Our goal will be to explain the main ideas rather than to give the results in full generality, so we will make below several simplifying assumptions. First, we need some conditions on the loss function l and, to get this out of the way, we will just assume that l is the so called logit loss,

l(u) = log_2(1 + e^{−u}), u ∈ R

(other loss functions of the same type that are decreasing, strictly convex, satisfy the assumption l ≥ I_{(−∞,0]} and grow slower than u² as |u| → ∞ will also do). We denote (l • f)(x, y) := l(yf(x)). For a function g on S × {−1, 1}, we write

P g = ∫_{S×{−1,1}} g dP = E g(X, Y).

Let P_n be the empirical measure based on the training data (X_i, Y_i), i = 1, ..., n. We will write

P_n g = ∫_{S×{−1,1}} g dP_n = n^{−1} Σ_{i=1}^n g(X_i, Y_i).

We use similar notations for functions defined on S. A simple and well known computation shows that the function f ↦ P(l • f) attains its minimum at f* defined by

f*(x) = log[(1 + η(x)) / (1 − η(x))].

We will assume in what follows that f* ∈ H. This assumption is rather restrictive. Since functions in H are uniformly bounded (see (1)) it means, in particular, that

η is bounded away from both +1 and −1. Although there is a version of the main result below without this assumption, we are not discussing it in this note.

Next we need an assumption on the so called uniform L_2-entropy of the unit ball in H, B_H := {f ∈ H : ||f|| ≤ 1}. Given a probability measure Q on S, let N(B_H; L_2(Q); ε) denote the minimal number of L_2(Q)-balls needed to cover B_H. Suppose that for some ρ ∈ (0, 2) and for some constant A > 0

∀ Q ∀ ε > 0 : log N(B_H; L_2(Q); ε) ≤ (A/ε)^ρ. (3)

Denote by B(x, δ) the open ball in (S, d) with center x and radius δ. Also, let H(x, δ) be the set of all functions h ∈ H satisfying the following conditions:

(i) ∀ y ∈ S: 0 ≤ h(y) ≤ 2δ;
(ii) h ≥ δ on B(x, δ/2);
(iii) ∫_{B(x,δ)^c} h dΠ ≤ δ² Π(B(x, δ/2)).

It follows from (i)–(iii) that

δ Π(B(x, δ/2)) ≤ E h(X) = ∫_S h dΠ ≤ 2δ Π(B(x, δ)) + δ² Π(B(x, δ/2)).

Since there exists a continuous function h such that 0 ≤ h ≤ (3/2)δ, h ≥ (4/3)δ on B(x, δ/2) and h = 0 on B(x, δ)^c, and, on the other hand, H is dense in C(S), it is easy to see that H(x, δ) ≠ ∅. Denote

q(x, δ) := inf_{h ∈ H(x,δ)} ||h||.

The quantity q(x, δ) is often bounded from above uniformly in x ∈ S by a decreasing function of δ, say by q(δ), and this will be assumed in what follows. Often, q(δ) grows as δ^{−γ}, δ → 0, for some γ > 0.

Example. For instance, if H = H^s(S) is a Sobolev space of functions in a compact domain S ⊂ R^d, s > d/2 + 1, define

h(y) := δ φ((x − y)/δ),

where φ ∈ C^∞(R^d), 0 ≤ φ ≤ 2, φ(x) ≥ 1 if |x| ≤ 1/2 and φ(x) = 0 if |x| ≥ 1. Then h satisfies conditions (i)–(iii) (moreover, h = 0 on B(x, δ)^c). A straightforward computation of the Sobolev norm of h shows that

||h||_{H^s(S)} ≤ C δ^{1+d/2−s},

implying that q(x, δ) is uniformly bounded from above by q(δ) = C δ^{−γ} with γ = s − d/2 − 1. Similar results are also true in the case of RKHS for some kernels.

Let

p(x, δ) := δ² Π(B(x, δ/2)).

In what follows, K, C > 0 will denote sufficiently large numerical constants (whose precise values might change from place to place). Recall our assumption that δ(P) > 0. In this case it is also natural to assume that for all δ ≤ δ(P)/K and for all x such that |η(x)| ≥ δ(P)

p(x, δ) ≥ p(δ) > 0

for some fixed function p. This would be true, for instance, if S is a domain in R^d and Π has density uniformly bounded away from 0 on the set {x : |η(x)| ≥ δ(P)}. In this case we have, for all x from this set,

p(x, δ) ≥ c δ^{d+2} =: p(δ).

Define now

r(x, δ) := p(x, δ) / q(x, δ).

Then on the set {x : |η(x)| ≥ δ(P)}

r(x, δ) ≥ p(δ) / q(δ).

We set U := K(||f*|| ∨ 1) (here and in what follows ∨ stands for the maximum and ∧ for the minimum) and define

λ_+ = λ_+(P) := (4U)^{−1} inf{ r(x; δ(P)/U) : |η(x)| ≥ δ(P) }

and, for a fixed ε ≥ K log log n,

λ_− := A^{2ρ/(2+ρ)} n^{−2/(2+ρ)} ε.

Clearly,

λ_+ ≥ p(δ(P)/U) / (4U q(δ(P)/U)) > 0,

so λ_+ is a positive constant. Then, if n is large enough and ε is not too large, we have λ_− ≤ λ_+. Now we are ready to formulate the main result.

Theorem 1. Let λ ∈ [λ_−, λ_+]. Then there exists β = β(H, P) > 0 such that

E(R_P(f̂_n) − R*) ≤ exp{−βn}.

In fact, with sufficiently large K, C > 0, β is equal to C^{−1}(p(δ(P)/U) ∧ ε/n), which is positive, establishing the exponential convergence rate.
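Before turning to the proof, the closed form f*(x) = log[(1 + η(x))/(1 − η(x))] from the previous section can be sanity-checked numerically (a sketch with an arbitrary grid and tolerance, not part of the paper's argument): for a fixed x with η(x) = η, the conditional risk E[l(Yf) | X = x] = ((1+η)/2) l(f) + ((1−η)/2) l(−f) should be minimized at that value of f.

```python
import math

LN2 = math.log(2.0)
def logit_loss(u):
    # l(u) = log_2(1 + e^{-u})
    return math.log(1.0 + math.exp(-u)) / LN2

def conditional_risk(f, eta):
    # E[l(Y f) | X = x] when P(Y = 1 | X = x) = (1 + eta) / 2
    p = (1.0 + eta) / 2.0
    return p * logit_loss(f) + (1.0 - p) * logit_loss(-f)

eta = 0.5
f_star = math.log((1.0 + eta) / (1.0 - eta))  # claimed minimizer, = log 3

# Brute-force minimization over a fine grid on [-5, 5].
grid = [-5.0 + 1e-4 * k for k in range(100001)]
f_num = min(grid, key=lambda f: conditional_risk(f, eta))
```

The grid minimizer agrees with f* up to the grid resolution; since the conditional risk is strictly convex in f, this minimizer is unique.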

3 Proof

We use a well known representation of the excess risk,

R_P(f) − R* = ∫_{{sign(f) ≠ sign(η)}} |η| dΠ,

to get the following bound:

E(R_P(f̂_n) − R*) ≤ E ∫ |η(x)| I{f̂_n(x)η(x) ≤ 0} Π(dx) = ∫ |η(x)| E I{f̂_n(x)η(x) ≤ 0} Π(dx) = ∫ |η(x)| P{f̂_n(x)η(x) ≤ 0} Π(dx). (4)

Our goal now is to bound, for a given x, P{f̂_n(x)η(x) ≤ 0}. Let us assume that η(x) = δ > 0 (the other case, when η(x) < 0, is similar). We have

P{f̂_n(x)η(x) ≤ 0} = P{f̂_n(x) ≤ 0} ≤ P{f̂_n(x) ≤ 0, ||f̂_n|| ≤ U} + P{||f̂_n|| > U}. (5)

We start with bounding the first term. For δ_0 > 0 (to be chosen later), let h ∈ H(x, δ_0). Define

L_n(α) := P_n(l • (f̂_n + αh)) + λ ||f̂_n + αh||².

Since f̂_n minimizes the functional

H ∋ f ↦ P_n(l • f) + λ ||f||²,

the function α ↦ L_n(α) attains its minimum at α = 0. This function is differentiable, implying that

(dL_n/dα)(0) = n^{−1} Σ_{j=1}^n l'(Y_j f̂_n(X_j)) Y_j h(X_j) + 2λ ⟨f̂_n, h⟩ = 0.

Assuming that η(x) = δ > 0, ||f̂_n|| ≤ U and f̂_n(x) ≤ 0, we need to bound from above

n^{−1} Σ_{j=1}^n l'(Y_j f̂_n(X_j)) Y_j h(X_j) + 2λ ⟨f̂_n, h⟩,

trying to show that, everywhere except on an event of small probability, the last expression is strictly negative. This would contradict the fact that it is equal to 0, implying a bound on the probability of the event {f̂_n(x) ≤ 0, ||f̂_n|| ≤ U}. First note that

n^{−1} Σ_{j=1}^n l'(Y_j f̂_n(X_j)) Y_j h(X_j) = n^{−1} Σ_{j: Y_j=+1} l'(f̂_n(X_j)) h(X_j) − n^{−1} Σ_{j: Y_j=−1} l'(−f̂_n(X_j)) h(X_j).

Note also that the function l' is negative and increasing, h is nonnegative and f̂_n is a Lipschitz function with Lipschitz norm bounded by ||f̂_n||. The last observation and the assumption that f̂_n(x) ≤ 0 imply that, for all y ∈ B(x, δ_0),

f̂_n(y) ≤ ||f̂_n|| δ_0 ≤ U δ_0

and, as a result,

l'(f̂_n(y)) ≤ l'(Uδ_0) and −l'(−f̂_n(y)) ≤ −l'(−Uδ_0).

Also, for all y ∈ S, |f̂_n(y)| ≤ ||f̂_n|| ≤ U, implying that

|l'(f̂_n(y))| ≤ |l'(−U)| and |l'(−f̂_n(y))| ≤ |l'(−U)|.

This leads to the following upper bound:

n^{−1} Σ_j l'(Y_j f̂_n(X_j)) Y_j h(X_j)
≤ l'(Uδ_0) n^{−1} Σ_{j: X_j ∈ B(x,δ_0), Y_j=+1} h(X_j) − l'(−Uδ_0) n^{−1} Σ_{j: X_j ∈ B(x,δ_0), Y_j=−1} h(X_j) + |l'(−U)| n^{−1} Σ_{j: X_j ∈ B(x,δ_0)^c} h(X_j)
= [(l'(Uδ_0) − l'(−Uδ_0))/2] n^{−1} Σ_j h(X_j) I_{B(x,δ_0)}(X_j) + [(l'(Uδ_0) + l'(−Uδ_0))/2] n^{−1} Σ_j Y_j h(X_j) I_{B(x,δ_0)}(X_j) + |l'(−U)| n^{−1} Σ_j h(X_j) I_{B(x,δ_0)^c}(X_j).

Using the fact that for the logit loss l'' has its maximum at 0, we get

(l'(Uδ_0) + l'(−Uδ_0))/2 ≤ l'(0) + |l'(Uδ_0) − l'(0)|/2 + |l'(−Uδ_0) − l'(0)|/2 ≤ l'(0) + l''(0) Uδ_0

and

(l'(Uδ_0) − l'(−Uδ_0))/2 ≤ l''(0) Uδ_0.

Therefore,

n^{−1} Σ_j l'(Y_j f̂_n(X_j)) Y_j h(X_j) ≤ l'(0) n^{−1} Σ_j Y_j h(X_j) I_{B(x,δ_0)}(X_j) + 2 l''(0) Uδ_0 n^{−1} Σ_j h(X_j) I_{B(x,δ_0)}(X_j) + |l'(−U)| n^{−1} Σ_j h(X_j) I_{B(x,δ_0)^c}(X_j) = n^{−1} Σ_j ξ_j, (6)

where ξ, ξ_j, j = 1, ..., n are i.i.d.,

ξ := l'(0) Y h(X) I_{B(x,δ_0)}(X) + 2 l''(0) Uδ_0 h(X) I_{B(x,δ_0)}(X) + |l'(−U)| h(X) I_{B(x,δ_0)^c}(X).

To bound the sum of the ξ_j's, we will use Bernstein's inequality. To this end, we first bound the expectation and the variance of ξ. We have

E ξ = l'(0) E Y h(X) I_{B(x,δ_0)}(X) + 2 l''(0) Uδ_0 E h(X) I_{B(x,δ_0)}(X) + |l'(−U)| E h(X) I_{B(x,δ_0)^c}(X).

Since η is Lipschitz with the Lipschitz constant L and η(x) = δ, we have η(y) ≥ δ − Lδ_0 for all y ∈ B(x, δ_0). Since also h ∈ H(x, δ_0), we have:

E Y h(X) I_{B(x,δ_0)}(X) = E η(X) h(X) I_{B(x,δ_0)}(X) ≥ (δ − Lδ_0) E h(X) I_{B(x,δ_0)}(X) ≥ (δ − Lδ_0)(1 − δ_0) E h(X),

E h(X) I_{B(x,δ_0)}(X) ≤ E h(X), E h(X) I_{B(x,δ_0)^c}(X) ≤ δ_0 E h(X).

Recall that l'(0) < 0 and l''(0) > 0. So, the following bound for the expectation of ξ is immediate:

E ξ ≤ [l'(0)(δ − Lδ_0)(1 − δ_0) + 2 l''(0) Uδ_0 + |l'(−U)| δ_0] E h(X).

We will choose δ_0 small enough to make

l'(0)(δ − Lδ_0)(1 − δ_0) + 2 l''(0) Uδ_0 + |l'(−U)| δ_0 ≤ −δ_0.

A simple computation shows that it is enough to take

δ_0 = C^{−1} δ / (L + 4U + 2),

which can always be achieved by making the numerical constant C large enough. Then the expectation satisfies the bound

E ξ ≤ −δ_0 E h(X).

As far as the variance of ξ is concerned, using the elementary bound (a + b + c)² ≤ 3a² + 3b² + 3c², it is easy to check that

Var(ξ) ≤ C δ_0 E h(X)

with a sufficiently large numerical constant C. Finally, it is also straightforward that, with some C > 0, |ξ| ≤ C δ_0. Now Bernstein's inequality easily yields, with a sufficiently large numerical constant C > 0,

P{ n^{−1} Σ_j ξ_j ≥ −(1/2) δ_0 E h(X) } ≤ 2 exp{ −n δ_0 E h(X) / C }.

Then, since δ_0 E h(X) ≥ δ_0² Π(B(x, δ_0/2)) = p(x, δ_0), we have, with probability at least 1 − 2 exp{−n p(x, δ_0)/C}:

n^{−1} Σ_j l'(Y_j f̂_n(X_j)) Y_j h(X_j) + 2λ ⟨f̂_n, h⟩ ≤ −(1/2) δ_0 E h(X) + 2λ ⟨f̂_n, h⟩ ≤ −(1/2) δ_0 E h(X) + 2λ U ||h|| ≤ −(1/2) p(x, δ_0) + 2λ U q(x, δ_0). (7)

So, if

λ < p(x, δ_0) / (4U q(x, δ_0)) = r(x, δ_0) / (4U),

then

n^{−1} Σ_j l'(Y_j f̂_n(X_j)) Y_j h(X_j) + 2λ ⟨f̂_n, h⟩ < 0

with probability at least 1 − 2 exp{−n p(x, δ_0)/C}. The conclusion is that, if η(x) = δ and λ < r(x, δ_0)/(4U), then

P{f̂_n(x) ≤ 0, ||f̂_n|| ≤ U} ≤ 2 exp{ −n p(x, δ_0) / C }.
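The Bernstein step above can be illustrated empirically. The following generic sketch (the standard Bernstein inequality for variables bounded by M, not the specific ξ_j of the proof; the sample size, deviation t and trial count are arbitrary choices) checks by Monte Carlo that the tail of a centered sum of bounded i.i.d. variables stays below exp{−t² / (2(nσ² + Mt/3))}.

```python
import math
import random

random.seed(0)
n, M = 100, 1.0        # n i.i.d. variables, each bounded in absolute value by M
sigma2 = 1.0 / 12.0    # variance of Uniform(0, 1)
t = 10.0               # deviation of the sum from its mean

# Bernstein's inequality:
#   P{ sum_j (xi_j - E xi) >= t } <= exp(-t^2 / (2 (n sigma^2 + M t / 3)))
bound = math.exp(-t * t / (2.0 * (n * sigma2 + M * t / 3.0)))

trials = 20000
hits = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n)) - n * 0.5  # centered sum
    if s >= t:
        hits += 1
freq = hits / trials  # Monte Carlo estimate of the tail probability
```

Here t is about 3.5 standard deviations of the sum, so the empirical frequency is far below the Bernstein bound, which itself is of the order of a percent.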

Thus, for λ ≤ λ_+, we have

P{f̂_n(x) ≤ 0, ||f̂_n|| ≤ U} ≤ 2 exp{ −n p(δ_0) / C }. (8)

We now turn to bounding the probability P{||f̂_n|| ≥ U} for a properly chosen U. This is the only part of the proof where the condition (3) on the uniform entropy of the unit ball B_H is needed. It relies heavily on recent excess risk bounds in Koltchinskii [6] as well as on some results in the spirit of Blanchard, Lugosi and Vayatis [5] (see their Lemma 4). We formulate the bound we need in the following lemma.

Lemma 1. Suppose that condition (3) holds and (for simplicity) that l is the logit loss. Let R ≥ 1. Then there exists a constant K > 0 such that, for any t > 0, the event on which

∀ f ∈ H with ||f|| ≤ R: (9)

P(l • f) − inf_{||g|| ≤ R} P(l • g) ≤ 2 ( P_n(l • f) − inf_{||g|| ≤ R} P_n(l • g) ) + K ( R A^{2ρ/(2+ρ)} n^{−2/(2+ρ)} + tR/n ) (10)

has probability at least 1 − e^{−t}.

The argument that follows will provide a bound that is somewhat akin to some of the bounds in [7] and in [4]. Denote by E(R) the event of the lemma (together with its version in which the roles of P and P_n are swapped). Let R ≥ ||f*|| ∨ 1. On the event E(R), the condition R/2 < ||f̂_n|| ≤ R implies

λ ||f̂_n||² ≤ P_n(l • f̂_n) − inf_{||g|| ≤ R} P_n(l • g) + λ ||f̂_n||²
= inf_{||f|| ≤ R} [ P_n(l • f) − inf_{||g|| ≤ R} P_n(l • g) + λ ||f||² ]
≤ inf_{||f|| ≤ R} [ 2 ( P(l • f) − inf_{||g|| ≤ R} P(l • g) ) + λ ||f||² + K ( R A^{2ρ/(2+ρ)} n^{−2/(2+ρ)} + tR/n ) ]
≤ 2 [ P(l • f*) − inf_{||g|| ≤ R} P(l • g) ] + 2λ ||f*||² + 2K ( R A^{2ρ/(2+ρ)} n^{−2/(2+ρ)} + tR/n )
≤ 2λ ||f*||² + 2K ( R A^{2ρ/(2+ρ)} n^{−2/(2+ρ)} + tR/n ),

which implies that

R²/4 ≤ ||f̂_n||² ≤ 2 ||f*||² + (2K/λ) ( R A^{2ρ/(2+ρ)} n^{−2/(2+ρ)} + tR/n ).

Solving this inequality with respect to R shows that, on E(R), the condition R/2 ≤ ||f̂_n|| ≤ R implies

R ≤ K ( ||f*|| ∨ 1 ∨ A^{2ρ/(2+ρ)} / (λ n^{2/(2+ρ)}) ∨ t/(λn) ).

If now t = ε and λ ≥ λ_−, then it yields

R ≤ K ( ||f*|| ∨ 1 ).

Note that

P_n(l • f̂_n) + λ ||f̂_n||² ≤ l(0)

(just plug in f = 0 in the target functional). Therefore, we have λ ||f̂_n||² ≤ l(0), or

||f̂_n|| ≤ (l(0)/λ)^{1/2} =: R_n.

Define R_k := 2^{−k} R_n, k = 0, 1, 2, ..., N := [log_2 R_n] + 1. Note that, for our choice of λ, we have N ≤ C log n with some numerical constant C > 0. Let E_k := E(R_k). Clearly, P(E_k) ≥ 1 − e^{−t} and, on the event E_k, the condition R_k/2 ≤ ||f̂_n|| ≤ R_k implies ||f̂_n|| ≤ K(||f*|| ∨ 1). Thus, ||f̂_n|| can be larger than the right hand side of the last bound only on the event ∪_{k=1}^N E_k^c, whose probability is smaller than N e^{−ε}. This establishes the following inequality:

P{ ||f̂_n|| ≥ K(||f*|| ∨ 1) } ≤ N e^{−ε} ≤ e^{−ε/2}, (11)

provided that ε ≥ K log log n, as it was assumed. Combining bounds (8) and (11) and plugging the resulting bound into (5) and then into (4) easily completes the proof (subject to a minor adjustment of the constants).

Acknowledgement. The first author is very thankful to Alexandre Tsybakov for several useful and interesting conversations on the subject of the paper.

References

1. Audibert, J.-Y. and Tsybakov, A. Fast convergence rates for plug-in estimators under margin conditions. Unpublished manuscript.
2. Bartlett, P., Bousquet, O. and Mendelson, S. Local Rademacher Complexities. Annals of Statistics, 2005, to appear.
3. Bartlett, P., Jordan, M. and McAuliffe, J. Convexity, Classification and Risk Bounds. J. American Statistical Assoc., 2004, to appear.

4. Blanchard, G., Bousquet, O. and Massart, P. Statistical Performance of Support Vector Machines. Preprint, 2003.
5. Blanchard, G., Lugosi, G. and Vayatis, N. On the rates of convergence of regularized boosting classifiers. Journal of Machine Learning Research, 2003, 4.
6. Koltchinskii, V. Local Rademacher Complexities and Oracle Inequalities in Risk Minimization. Preprint.
7. Scovel, C. and Steinwart, I. Fast Rates for Support Vector Machines. Preprint.


Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function. MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied

More information

Riesz-Fischer Sequences and Lower Frame Bounds

Riesz-Fischer Sequences and Lower Frame Bounds Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.

More information

Linear Classifiers III

Linear Classifiers III Uiversität Potsdam Istitut für Iformatik Lehrstuhl Maschielles Lere Liear Classifiers III Blaie Nelso, Tobias Scheffer Cotets Classificatio Problem Bayesia Classifier Decisio Liear Classifiers, MAP Models

More information

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices A Hadamard-type lower boud for symmetric diagoally domiat positive matrices Christopher J. Hillar, Adre Wibisoo Uiversity of Califoria, Berkeley Jauary 7, 205 Abstract We prove a ew lower-boud form of

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

MA131 - Analysis 1. Workbook 2 Sequences I

MA131 - Analysis 1. Workbook 2 Sequences I MA3 - Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................

More information

arxiv: v1 [math.pr] 4 Dec 2013

arxiv: v1 [math.pr] 4 Dec 2013 Squared-Norm Empirical Process i Baach Space arxiv:32005v [mathpr] 4 Dec 203 Vicet Q Vu Departmet of Statistics The Ohio State Uiversity Columbus, OH vqv@statosuedu Abstract Jig Lei Departmet of Statistics

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

Sparsity in Multiple Kernel Learning

Sparsity in Multiple Kernel Learning Sparsity i Multiple Kerel Learig Vladimir Koltchiskii School of Mathematics Georgia Istitute of Techology Atlata, GA 30332-0160 USA vlad@math.gatech.edu ad Mig Yua School of Idustrial ad Systems Egieerig

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Linear Support Vector Machines

Linear Support Vector Machines Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate

More information

A remark on p-summing norms of operators

A remark on p-summing norms of operators A remark o p-summig orms of operators Artem Zvavitch Abstract. I this paper we improve a result of W. B. Johso ad G. Schechtma by provig that the p-summig orm of ay operator with -dimesioal domai ca be

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee

More information

1 Duality revisited. AM 221: Advanced Optimization Spring 2016

1 Duality revisited. AM 221: Advanced Optimization Spring 2016 AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

A Risk Comparison of Ordinary Least Squares vs Ridge Regression Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Amog the most useful of the various geeralizatios of KolmogorovâĂŹs strog law of large umbers are the ergodic theorems of Birkhoff ad Kigma, which exted the validity of the

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Sequences I. Chapter Introduction

Sequences I. Chapter Introduction Chapter 2 Sequeces I 2. Itroductio A sequece is a list of umbers i a defiite order so that we kow which umber is i the first place, which umber is i the secod place ad, for ay atural umber, we kow which

More information

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios

More information

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other

More information

Approximation by Superpositions of a Sigmoidal Function

Approximation by Superpositions of a Sigmoidal Function Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 22 (2003, No. 2, 463 470 Approximatio by Superpositios of a Sigmoidal Fuctio G. Lewicki ad G. Mario Abstract. We geeralize

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

On Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates

On Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates Joural of Statistical Theory ad Applicatios Volume, Number 4, 0, pp. 353-369 ISSN 538-7887 O Classificatio Based o Totally Bouded Classes of Fuctios whe There are Icomplete Covariates Majid Mojirsheibai

More information

A REMARK ON A PROBLEM OF KLEE

A REMARK ON A PROBLEM OF KLEE C O L L O Q U I U M M A T H E M A T I C U M VOL. 71 1996 NO. 1 A REMARK ON A PROBLEM OF KLEE BY N. J. K A L T O N (COLUMBIA, MISSOURI) AND N. T. P E C K (URBANA, ILLINOIS) This paper treats a property

More information

Application to Random Graphs

Application to Random Graphs A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let

More information

Empirical Processes: Glivenko Cantelli Theorems

Empirical Processes: Glivenko Cantelli Theorems Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3

More information

6. Kalman filter implementation for linear algebraic equations. Karhunen-Loeve decomposition

6. Kalman filter implementation for linear algebraic equations. Karhunen-Loeve decomposition 6. Kalma filter implemetatio for liear algebraic equatios. Karhue-Loeve decompositio 6.1. Solvable liear algebraic systems. Probabilistic iterpretatio. Let A be a quadratic matrix (ot obligatory osigular.

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Fast Rates for Support Vector Machines

Fast Rates for Support Vector Machines Fast Rates for Support Vector Machies Igo Steiwart ad Clit Scovel CCS-3, Los Alamos Natioal Laboratory, Los Alamos NM 87545, USA {igo,jcs}@lal.gov Abstract. We establish learig rates to the Bayes risk

More information

1 Convergence in Probability and the Weak Law of Large Numbers

1 Convergence in Probability and the Weak Law of Large Numbers 36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec

More information

McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems

McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems McGill Uiversity Math 354: Hoors Aalysis 3 Fall 212 Assigmet 3 Solutios to selected problems Problem 1. Lipschitz fuctios. Let Lip K be the set of all fuctios cotiuous fuctios o [, 1] satisfyig a Lipschitz

More information

Solutions to HW Assignment 1

Solutions to HW Assignment 1 Solutios to HW: 1 Course: Theory of Probability II Page: 1 of 6 Uiversity of Texas at Austi Solutios to HW Assigmet 1 Problem 1.1. Let Ω, F, {F } 0, P) be a filtered probability space ad T a stoppig time.

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

6.883: Online Methods in Machine Learning Alexander Rakhlin

6.883: Online Methods in Machine Learning Alexander Rakhlin 6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS All the algorithms preseted so far halluciate the future values as radom draws ad the perform

More information

PRELIM PROBLEM SOLUTIONS

PRELIM PROBLEM SOLUTIONS PRELIM PROBLEM SOLUTIONS THE GRAD STUDENTS + KEN Cotets. Complex Aalysis Practice Problems 2. 2. Real Aalysis Practice Problems 2. 4 3. Algebra Practice Problems 2. 8. Complex Aalysis Practice Problems

More information

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung Bull. Korea Math. Soc. 36 (999), No. 3, pp. 45{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Abstract. This paper provides suciet coditios which esure the strog cosistecy of regressio

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Lecture 13: Maximum Likelihood Estimation

Lecture 13: Maximum Likelihood Estimation ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select

More information

2 Banach spaces and Hilbert spaces

2 Banach spaces and Hilbert spaces 2 Baach spaces ad Hilbert spaces Tryig to do aalysis i the ratioal umbers is difficult for example cosider the set {x Q : x 2 2}. This set is o-empty ad bouded above but does ot have a least upper boud

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information