Exponential Convergence Rates in Classification
Vladimir Koltchinskii and Olexandra Beznosova
Department of Mathematics and Statistics, The University of New Mexico
Albuquerque, NM 87131-1141, U.S.A.
vlad@math.unm.edu, beznosik@math.unm.edu

Abstract. Let $(X, Y)$ be a random couple, $X$ being an observable instance and $Y \in \{-1, 1\}$ being a binary label to be predicted based on an observation of the instance. Let $(X_i, Y_i)$, $i = 1, \dots, n$ be training data consisting of $n$ independent copies of $(X, Y)$. Consider a real valued classifier $\hat f_n$ that minimizes the following penalized empirical risk

$$n^{-1} \sum_{i=1}^{n} l(Y_i f(X_i)) + \lambda \|f\|^2 \to \min, \quad f \in H,$$

over a Hilbert space $H$ of functions with norm $\|\cdot\|$, $l$ being a convex loss function and $\lambda > 0$ being a regularization parameter. In particular, $H$ might be a Sobolev space or a reproducing kernel Hilbert space. We provide some conditions under which the generalization error of the corresponding binary classifier $\mathrm{sign}(\hat f_n)$ converges to the Bayes risk exponentially fast.

(Partially supported by NSF grant DMS.)

1 Introduction

Let $(S, d)$ be a metric space and $(X, Y)$ be a random couple taking values in $S \times \{-1, 1\}$ with joint distribution $P$. The distribution of $X$ (which is a measure on the Borel $\sigma$-algebra in $S$) will be denoted by $\Pi$. Let $(X_i, Y_i)$, $i \geq 1$ be a sequence of independent copies of $(X, Y)$. Here and in what follows, all random variables are defined on some probability space $(\Omega, \Sigma, \mathbb{P})$. Let $H$ be a Hilbert space of functions on $S$ such that $H$ is dense in the space $C(S)$ of all continuous functions on $S$ and, in addition,

$$\forall x, y \in S: \quad |f(x)| \leq \|f\| \quad \text{and} \quad |f(x) - f(y)| \leq \|f\|\, d(x, y). \tag{1}$$

Here $\|\cdot\| = \|\cdot\|_H$ is the norm of $H$ and $\langle \cdot, \cdot \rangle = \langle \cdot, \cdot \rangle_H$ is its inner product. We have in mind two main examples. In the first one, $S$ is a compact domain in $\mathbb{R}^d$ with smooth boundary. For any $s \geq 1$, one can define the following inner product in the space $C^\infty(S)$ of all infinitely differentiable functions in $S$:

$$\langle f, g \rangle_s := \sum_{|\alpha| \leq s} \int_S D^\alpha f\, D^\alpha g\, dx.$$
Here $\alpha = (\alpha_1, \dots, \alpha_d)$, $\alpha_j = 0, 1, \dots$, $|\alpha| := \sum_{i=1}^{d} \alpha_i$ and

$$D^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}.$$

The Sobolev space $H^s(S)$ is the completion of $(C^\infty(S), \langle \cdot, \cdot \rangle_s)$. There is also a version of the definition for any real $s > 0$ that utilizes Fourier transforms. If $s > d/2 + 1$, then it follows from Sobolev's embedding theorems that conditions (1) hold with the metric $d$ being the Euclidean distance (possibly, after a proper rescaling of the inner product or of the metric $d$ to make the constants equal to 1).

In the second example, $S$ is a metric compact and $H = H_K$ is the reproducing kernel Hilbert space (RKHS) generated by a Mercer kernel $K$. This means that $K$ is a continuous symmetric nonnegatively definite kernel and $H_K$ is defined as the completion of the linear span of the functions $\{K_x : x \in S\}$, $K_x(y) := K(x, y)$, with respect to the following inner product:

$$\Big\langle \sum_i \alpha_i K_{x_i}, \sum_j \beta_j K_{y_j} \Big\rangle_K := \sum_{i,j} \alpha_i \beta_j K(x_i, y_j).$$

It is well known that $H_K$ can be identified with a subset of $C(S)$ and

$$f(x) = \langle f, K_x \rangle_K,$$

implying that

$$|f(x)| \leq \|f\|_K \sup_{x \in S} \|K_x\|_K \quad \text{and} \quad |f(x) - f(y)| \leq \|f\|_K\, \|K_x - K_y\|_K,$$

so again conditions (1) hold with $d(x, y) := \|K_x - K_y\|_K$ (as before, a simple rescaling is needed to ensure that the constants are equal to 1).

In binary classification problems, it is common to look for a real valued classifier $\hat f_n$ that solves the following penalized empirical risk minimization problem

$$n^{-1} \sum_{i=1}^{n} l(Y_i f(X_i)) + \lambda \|f\|^2 \to \min, \quad f \in H, \tag{2}$$

where $l$ is a nonnegative decreasing convex loss function such that $l \geq I_{(-\infty, 0]}$ and $\lambda > 0$ is a regularization parameter. For instance, if $l$ is the hinge loss, i.e. $l(u) = (1 - u) \vee 0$, and $\|\cdot\|$ is an RKHS-norm, this is a standard approach in kernel machines classification. Given a real valued classifier $f : S \to \mathbb{R}$, the corresponding binary classifier is typically defined as $x \mapsto \mathrm{sign}(f(x))$, where $\mathrm{sign}(u) = +1$ for $u \geq 0$ and $-1$ otherwise. The generalization error, or risk, of $f$ is then

$$R_P(f) := P\{(x, y) : y \neq \mathrm{sign}(f(x))\}.$$
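As a concrete illustration of problem (2), the following sketch fits a kernel classifier with the logit loss by gradient descent. By the representer theorem, the minimizer lies in the span of $\{K_{X_i}\}$, so one can optimize over the coefficient vector $\alpha$ directly, with $f(X_j) = (K\alpha)_j$ and $\|f\|_H^2 = \alpha^\top K \alpha$. The Gaussian kernel, the toy data and all parameter values are our own illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of problem (2): penalized empirical risk minimization
# over an RKHS with the logit loss l(u) = log2(1 + exp(-u)).  The kernel,
# data and constants below are hypothetical choices for demonstration.

rng = np.random.default_rng(0)

def gaussian_kernel(A, B, bandwidth=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def logit_loss(u):
    return np.log2(1.0 + np.exp(-u))

def logit_loss_grad(u):
    # derivative of log2(1 + e^{-u})
    return -1.0 / (np.log(2.0) * (1.0 + np.exp(u)))

# Toy data: X uniform on [-1, 1], regression function eta(x) = clip(2x, -0.9, 0.9)
n = 150
X = rng.uniform(-1.0, 1.0, size=(n, 1))
eta = np.clip(2.0 * X[:, 0], -0.9, 0.9)
Y = np.where(rng.uniform(size=n) < (1.0 + eta) / 2.0, 1.0, -1.0)

K = gaussian_kernel(X, X)
lam = 1e-3
alpha = np.zeros(n)

# Representer theorem: f = sum_i alpha_i K(X_i, .), so
# f(X_j) = (K alpha)_j and ||f||_H^2 = alpha^T K alpha.
top = np.linalg.norm(K, 2)                        # spectral norm of K
lr = 1.0 / (top ** 2 / n + 2 * lam * top + 1.0)   # conservative step size
for _ in range(5000):
    f_vals = K @ alpha
    grad = K @ (Y * logit_loss_grad(Y * f_vals)) / n + 2 * lam * (K @ alpha)
    alpha -= lr * grad

objective = logit_loss(Y * (K @ alpha)).mean() + lam * alpha @ K @ alpha
train_error = np.mean(np.sign(K @ alpha) != Y)
```

Since $l(0) = 1$, the objective at $\alpha = 0$ equals 1, so any stable descent run can only improve on it; the remaining training error then mostly reflects the label noise $(1 - |\eta|)/2$ rather than the optimization.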
It is well known that the minimum of $R_P(f)$ over all measurable functions $f$ is attained at the regression function $\eta$ defined as $\eta(x) := \mathbb{E}(Y \mid X = x)$. The corresponding binary classifier $\mathrm{sign}(\eta(x))$ is called the Bayes classifier, the quantity $R^* := R_P(\eta)$ is called the Bayes risk and, finally, the quantity $R_P(f) - R^*$ is often referred to as the excess risk of a classifier $f$.

Our goal in this note is to show that, under some (naturally restrictive) assumptions, the expectation of the excess risk of $\hat f_n$ converges to 0 exponentially fast as $n \to \infty$. Recently, Audibert and Tsybakov [1] observed a similar phenomenon in the case of plug-in classifiers, and our analysis here continues this line of work.

Denote

$$\delta(P) := \sup\{\delta > 0 : \Pi\{x : |\eta(x)| \leq \delta\} = 0\}.$$

We will assume that:

(a) $\eta$ is a Lipschitz function with constant $L > 0$ (which, for the sake of simplicity of notations, will be assumed to be 1 in what follows): $|\eta(x) - \eta(y)| \leq L\, d(x, y)$;

(b) $\delta(P) > 0$.

These will be the two main conditions that guarantee the possibility of exponentially fast convergence rates of the generalization error to the Bayes risk. Note that condition (b), which is an extreme case of Tsybakov's low noise assumption, means that there exists $\delta > 0$ such that, $\Pi$-a.e., either $\eta(x) \geq \delta$ or $\eta(x) \leq -\delta$. The function $\eta$ (as a conditional expectation) is defined only up to $\Pi$-a.e. equivalence; condition (a) means that there exists a smooth (Lipschitz) version of this conditional expectation. Since smooth functions cannot jump immediately from the value $-\delta$ to the value $\delta$, the combination of conditions (a) and (b) essentially means that there should be a wide enough corridor between the regions $\{\eta \geq \delta\}$ and $\{\eta \leq -\delta\}$, but the probability of getting into this corridor is zero.
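A toy construction may help visualize assumptions (a) and (b): below, $\Pi$ charges only the two regions where $|\eta| \geq \delta$, while a 1-Lipschitz version of $\eta$ ramps through the zero-mass corridor. The specific $\eta$, $\delta$ and sampling scheme are ours, purely for illustration.

```python
import numpy as np

# Hypothetical distribution satisfying (a) and (b): eta is 1-Lipschitz and
# Pi puts zero mass on the corridor {x : |eta(x)| < delta}.
delta = 0.25

def eta(x):
    # 1-Lipschitz ramp from -delta to +delta across the corridor
    return np.clip(x, -delta, delta)

rng = np.random.default_rng(1)
m = 100000
side = rng.choice([-1.0, 1.0], size=m)
x = side * rng.uniform(delta, 1.0, size=m)   # Pi-sample: avoids (-delta, delta)

min_abs_eta = np.min(np.abs(eta(x)))          # condition (b): should be >= delta

grid = np.linspace(-1.0, 1.0, 2001)
slopes = np.abs(np.diff(eta(grid)) / np.diff(grid))
lipschitz_est = slopes.max()                  # condition (a): should be <= 1
```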
The fact that in such situations it is possible to construct classifiers that converge to the Bayes risk exponentially fast is essentially rather simple: it reduces to a large deviation type phenomenon, and it is even surprising that, to the best of our knowledge, the possibility of such superfast convergence rates in classification had not been observed before Audibert and Tsybakov [1] (we apologize if someone, in fact, did it earlier). Subtle results on convergence rates of the generalization error of large margin classifiers to the Bayes risk have been obtained relatively recently; see the papers by Bartlett, Jordan and McAuliffe [3] and by Blanchard, Lugosi and Vayatis [5] on boosting, and the papers by Blanchard, Bousquet and Massart [4] and by Scovel and Steinwart [7] on SVM. These papers rely heavily on general exponential inequalities in abstract empirical risk minimization in the spirit of the papers by Bartlett, Bousquet and Mendelson [2] or Koltchinskii [6] (or even earlier work by Birgé and Massart in the 90s). The rates of convergence in classification based on this
general approach are at best of the order $O(n^{-1})$. In classification problems, there are many relevant probabilistic, analytic and geometric parameters to play with when one studies the convergence rates. For instance, both papers [4] and [7] deal with SVM classifiers (so, essentially, with problem (2) in the case when $H$ is an RKHS). In [4], the convergence rates are studied under assumption (b) above and under some conditions on the eigenvalues of the kernel. In [7], the authors determine the convergence rates under an assumption on the entropy of the unit ball in the RKHS of the same type as our assumption (3) below, under Tsybakov's low noise assumption and some additional conditions of a geometric nature. The fact that, under the somewhat more restrictive assumptions imposed in this paper, even exponential convergence rates are possible indicates that, probably, we have not yet understood to the end the rather subtle interplay between the various parameters that influence the behaviour of this type of classifiers.

2 Main Result

We now turn to a precise formulation of the results. Our goal will be to explain the main ideas rather than to give the results in full generality, so we will make several simplifying assumptions below. First, we need some conditions on the loss function $l$ and, to get this out of the way, we will just assume that $l$ is the so called logit loss, $l(u) = \log_2(1 + e^{-u})$, $u \in \mathbb{R}$ (other loss functions of the same type that are decreasing, strictly convex, satisfy the assumption $l \geq I_{(-\infty, 0]}$ and grow slower than $|u|^2$ as $u \to \infty$ will also do). We denote $(l \bullet f)(x, y) := l(y f(x))$. For a function $g$ on $S \times \{-1, 1\}$, we write

$$Pg = \int_{S \times \{-1, 1\}} g\, dP = \mathbb{E}\, g(X, Y).$$

Let $P_n$ be the empirical measure based on the training data $(X_i, Y_i)$, $i = 1, \dots, n$. We will write

$$P_n g = \int_{S \times \{-1, 1\}} g\, dP_n = n^{-1} \sum_{i=1}^{n} g(X_i, Y_i).$$

We use similar notations for functions defined on $S$. A simple and well known computation shows that the function $f \mapsto P(l \bullet f)$ attains its minimum at $f^*$ defined by

$$f^*(x) = \log \frac{1 + \eta(x)}{1 - \eta(x)}.$$

We will assume in what follows that $f^* \in H$. This assumption is rather restrictive.
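The formula for $f^*$ can be checked numerically: for a fixed value of $\eta$, the conditional logit risk is $\varphi(f) = \frac{1+\eta}{2}\, l(f) + \frac{1-\eta}{2}\, l(-f)$, and its minimizer on a fine grid should agree with $\log\frac{1+\eta}{1-\eta}$. This is only a sanity-check sketch; the grid and the $\eta$ values are arbitrary.

```python
import numpy as np

# Numerical sanity check that f*(x) = log((1 + eta)/(1 - eta)) minimizes the
# conditional logit risk  phi(f) = (1+eta)/2 * l(f) + (1-eta)/2 * l(-f).

def l(u):
    return np.log2(1.0 + np.exp(-u))

def conditional_risk(f, eta):
    return (1.0 + eta) / 2.0 * l(f) + (1.0 - eta) / 2.0 * l(-f)

f_grid = np.linspace(-6.0, 6.0, 240001)   # grid spacing 5e-5
max_gap = 0.0
for eta in (-0.8, -0.3, 0.2, 0.7):
    f_star = np.log((1.0 + eta) / (1.0 - eta))
    f_hat = f_grid[np.argmin(conditional_risk(f_grid, eta))]
    max_gap = max(max_gap, abs(f_hat - f_star))
```

Setting $\varphi'(f) = 0$ and using $l'(u) = -1/(\ln 2\,(1 + e^u))$ gives $e^f = (1+\eta)/(1-\eta)$ directly, which is what the grid search recovers.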
Since functions in $H$ are uniformly bounded (see (1)), it means, in particular, that
$\eta$ is bounded away from both $-1$ and $+1$. Although there is a version of the main result below without this assumption, we are not discussing it in this note.

Next we need an assumption on the so called uniform $L_2$-entropy of the unit ball in $H$, $B_H := \{f \in H : \|f\| \leq 1\}$. Given a probability measure $Q$ on $S$, let $N(B_H; L_2(Q); \varepsilon)$ denote the minimal number of $L_2(Q)$-balls needed to cover $B_H$. Suppose that for some $\rho \in (0, 2)$ and for some constant $A > 0$,

$$\forall Q\ \forall \varepsilon > 0: \quad \log N\big(B_H; L_2(Q); \varepsilon\big) \leq \Big( \frac{A}{\varepsilon} \Big)^{\rho}. \tag{3}$$

Denote by $B(x, \delta)$ the open ball in $(S, d)$ with center $x$ and radius $\delta$. Also, let $H(x, \delta)$ be the set of all functions $h \in H$ satisfying the following conditions:

(i) $\forall y \in S:\ 0 \leq h(y) \leq 2\delta$;
(ii) $h \geq \delta$ on $B(x, \delta/2)$;
(iii) $\int_{B(x, \delta)^c} h\, d\Pi \leq \delta\, \Pi(B(x, \delta/2))$.

It follows from (i)–(iii) that

$$\delta\, \Pi(B(x, \delta/2)) \leq \mathbb{E}\, h(X) = \int_S h\, d\Pi \leq 2\delta\, \Pi(B(x, \delta)) + \delta\, \Pi(B(x, \delta/2)).$$

Since there exists a continuous function $h$ such that $0 \leq h \leq \frac{3}{2}\delta$, $h \geq \frac{4}{3}\delta$ on $B(x, \delta/2)$ and $h = 0$ on $B(x, \delta)^c$, and, on the other hand, $H$ is dense in $C(S)$, it is easy to see that $H(x, \delta) \neq \emptyset$. Denote

$$q(x, \delta) := \inf_{h \in H(x, \delta)} \|h\|.$$

The quantity $q(x, \delta)$ is often bounded from above uniformly in $x \in S$ by a decreasing function of $\delta$, say by $q(\delta)$, and this will be assumed in what follows. Often, $q(\delta)$ grows as $\delta^{-\gamma}$, $\delta \to 0$, for some $\gamma > 0$.

Example. For instance, if $H = H^s(S)$ is a Sobolev space of functions in a compact domain $S \subset \mathbb{R}^d$, $s > d/2 + 1$, define

$$h(y) := \delta\, \varphi\Big( \frac{x - y}{\delta} \Big),$$

where $\varphi \in C^\infty(\mathbb{R}^d)$, $0 \leq \varphi \leq 2$, $\varphi(x) \geq 1$ if $|x| \leq 1/2$ and $\varphi(x) = 0$ if $|x| \geq 1$. Then $h$ satisfies conditions (i)–(iii) (moreover, $h = 0$ on $B(x, \delta)^c$). A straightforward computation of the Sobolev norm of $h$ shows that

$$\|h\|_{H^s(S)} \leq C\, \delta^{1 + d/2 - s}.$$
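For the example above, conditions (i)–(iii) can be checked directly on a concrete bump: we take the classical $C^\infty$ bump $\varphi(u) = 2e^{1 - 1/(1 - u^2)}$ for $|u| < 1$ (and 0 otherwise), which satisfies $0 \leq \varphi \leq 2$, $\varphi \geq 1$ on $[-1/2, 1/2]$ and $\varphi = 0$ outside $(-1, 1)$. The choice of $\varphi$, the point $x_0$ and the value of $\delta$ are ours; the paper only requires some $\varphi$ with these properties.

```python
import numpy as np

# Direct check of (i)-(iii) in d = 1 for h(y) = delta * phi((x0 - y)/delta),
# with the classical smooth bump phi(u) = 2*exp(1 - 1/(1 - u^2)) on |u| < 1.
# x0, delta and the grid are arbitrary illustrative choices.

def phi(u):
    out = np.zeros_like(u, dtype=float)
    inside = np.abs(u) < 1.0
    out[inside] = 2.0 * np.exp(1.0 - 1.0 / (1.0 - u[inside] ** 2))
    return out

x0, delta = 0.3, 0.1
y = np.linspace(-1.0, 1.0, 200001)
h = delta * phi((x0 - y) / delta)

cond_i = bool((h >= 0.0).all() and (h <= 2.0 * delta).all())
near = np.abs(y - x0) <= delta / 2.0
cond_ii = bool((h[near] >= delta).all())          # phi >= 1 on |u| <= 1/2
outside = np.abs(y - x0) >= delta
cond_iii = bool(np.all(h[outside] == 0.0))        # h vanishes off B(x0, delta),
                                                  # so (iii) holds trivially
```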
This implies that $q(x, \delta)$ is uniformly bounded from above by $q(\delta) = C\delta^{-\gamma}$ with $\gamma = s - d/2 - 1$. Similar results are also true in the case of RKHS for some kernels.

Let

$$p(x, \delta) := \delta^2\, \Pi(B(x, \delta/2)).$$

In what follows, $K, C > 0$ will denote sufficiently large numerical constants (whose precise values might change from place to place). Recall our assumption that $\delta(P) > 0$. In this case it is also natural to assume that, for all $\delta \leq \delta(P)/K$ and for all $x$ such that $|\eta(x)| \geq \delta(P)$,

$$p(x, \delta) \geq p(\delta) > 0$$

for some fixed function $p$. This would be true, for instance, if $S$ is a domain in $\mathbb{R}^d$ and $\Pi$ has density uniformly bounded away from 0 on the set $\{x : |\eta(x)| \geq \delta(P)\}$. In this case we have, for all $x$ from this set,

$$p(x, \delta) \geq c\, \delta^{d+2} =: p(\delta).$$

Define now

$$r(x, \delta) := \frac{p(x, \delta)}{q(x, \delta)}.$$

Then on the set $\{x : |\eta(x)| \geq \delta(P)\}$

$$r(x, \delta) \geq \frac{p(\delta)}{q(\delta)}.$$

We set $U := K(\|f^*\| \vee 1)$ (here and in what follows $\vee$ stands for the maximum and $\wedge$ for the minimum) and define

$$\lambda_+ = \lambda_+(P) := \frac{1}{4U}\, \inf\Big\{ r\Big(x; \frac{\delta(P)}{U}\Big) : |\eta(x)| \geq \delta(P) \Big\}$$

and, for a fixed $\varepsilon \geq K \log\log n$,

$$\lambda_- := A^{2\rho/(2+\rho)}\, n^{-2/(2+\rho)}\, \varepsilon.$$

Clearly,

$$\lambda_+ \geq \frac{p(\delta(P)/U)}{4U\, q(\delta(P)/U)} > 0,$$

so $\lambda_+$ is a positive constant. Then, if $n$ is large enough and $\varepsilon$ is not too large, we have $\lambda_- \leq \lambda_+$. Now we are ready to formulate the main result.

Theorem 1. Let $\lambda \in [\lambda_-, \lambda_+]$. Then there exists $\beta = \beta(H, P) > 0$ such that

$$\mathbb{E}\big(R_P(\hat f_n) - R^*\big) \leq \exp\{-\beta n\}.$$

In fact, with sufficiently large $K, C > 0$, one can take $\beta = C^{-1}\big( p(\delta(P)/U) \wedge \varepsilon/n \big)$, which is positive, establishing the exponential convergence rate.
3 Proof

We use a well known representation of the excess risk

$$R_P(f) - R^* = \int_{\{\mathrm{sign}(f) \neq \mathrm{sign}(\eta)\}} |\eta|\, d\Pi$$

to get the following bound:

$$\mathbb{E}\big(R_P(\hat f_n) - R^*\big) \leq \mathbb{E} \int_{\{\hat f_n(x)\eta(x) \leq 0\}} |\eta(x)|\, \Pi(dx) = \mathbb{E} \int |\eta(x)|\, I_{\{\hat f_n(x)\eta(x) \leq 0\}}\, \Pi(dx) = \int |\eta(x)|\, \mathbb{P}\{\hat f_n(x)\eta(x) \leq 0\}\, \Pi(dx). \tag{4}$$

Our goal now is to bound, for a given $x$, $\mathbb{P}\{\hat f_n(x)\eta(x) \leq 0\}$. Let us assume that $\eta(x) = \delta > 0$ (the other case, when $\eta(x) < 0$, is similar). We have

$$\mathbb{P}\{\hat f_n(x)\eta(x) \leq 0\} = \mathbb{P}\{\hat f_n(x) \leq 0\} \leq \mathbb{P}\{\hat f_n(x) \leq 0,\ \|\hat f_n\| \leq U\} + \mathbb{P}\{\|\hat f_n\| > U\}. \tag{5}$$

We start with bounding the first term. For $\delta_0 > 0$ (to be chosen later), let $h \in H(x, \delta_0)$. Define

$$L_n(\alpha) := P_n\big(l \bullet (\hat f_n + \alpha h)\big) + \lambda \|\hat f_n + \alpha h\|^2.$$

Since $\hat f_n$ minimizes the functional $H \ni f \mapsto P_n(l \bullet f) + \lambda \|f\|^2$, the function $\alpha \mapsto L_n(\alpha)$ attains its minimum at $\alpha = 0$. This function is differentiable, implying that

$$0 = \frac{dL_n}{d\alpha}(0) = n^{-1} \sum_{j=1}^{n} l'(Y_j \hat f_n(X_j))\, Y_j h(X_j) + 2\lambda \langle \hat f_n, h \rangle.$$

Assuming that $\eta(x) = \delta > 0$, $\|\hat f_n\| \leq U$ and $\hat f_n(x) \leq 0$, we need to bound from above

$$n^{-1} \sum_{j=1}^{n} l'(Y_j \hat f_n(X_j))\, Y_j h(X_j) + 2\lambda \langle \hat f_n, h \rangle,$$

trying to show that, everywhere except on an event of small probability, the last expression is strictly negative. This would contradict the fact that it is equal to 0, implying a bound on the probability of the event $\{\hat f_n(x) \leq 0,\ \|\hat f_n\| \leq U\}$. First note that

$$n^{-1} \sum_{j=1}^{n} l'(Y_j \hat f_n(X_j))\, Y_j h(X_j) = n^{-1} \sum_{j: Y_j = +1} l'(\hat f_n(X_j))\, h(X_j) - n^{-1} \sum_{j: Y_j = -1} l'(-\hat f_n(X_j))\, h(X_j).$$
Note also that the function $l'$ is negative and increasing, $h$ is nonnegative and $\hat f_n$ is a Lipschitz function with Lipschitz norm bounded by $\|\hat f_n\|$. The last observation and the assumption that $\hat f_n(x) \leq 0$ imply that, for all $y \in B(x, \delta_0)$,

$$\hat f_n(y) \leq \|\hat f_n\|\, \delta_0 \leq U\delta_0,$$

and, as a result,

$$l'(\hat f_n(y)) \leq l'(U\delta_0), \qquad -l'(-\hat f_n(y)) \leq -l'(-U\delta_0).$$

Also, for all $y \in S$, $|\hat f_n(y)| \leq \|\hat f_n\| \leq U$, implying that

$$|l'(\hat f_n(y))| \leq |l'(-U)|, \qquad |l'(-\hat f_n(y))| \leq |l'(-U)|.$$

This leads to the following upper bound:

$$n^{-1} \sum_{j=1}^{n} l'(Y_j \hat f_n(X_j))\, Y_j h(X_j) \leq l'(U\delta_0)\, n^{-1}\!\!\sum_{j: X_j \in B(x, \delta_0), Y_j = +1}\!\! h(X_j) \;-\; l'(-U\delta_0)\, n^{-1}\!\!\sum_{j: X_j \in B(x, \delta_0), Y_j = -1}\!\! h(X_j) \;+\; |l'(-U)|\, n^{-1}\!\!\sum_{j: X_j \in B(x, \delta_0)^c}\!\! h(X_j)$$

$$= \frac{l'(U\delta_0) + l'(-U\delta_0)}{2}\, n^{-1} \sum_{j=1}^{n} Y_j h(X_j) I_{B(x, \delta_0)}(X_j) + \frac{l'(U\delta_0) - l'(-U\delta_0)}{2}\, n^{-1} \sum_{j=1}^{n} h(X_j) I_{B(x, \delta_0)}(X_j) + |l'(-U)|\, n^{-1} \sum_{j=1}^{n} h(X_j) I_{B(x, \delta_0)^c}(X_j).$$

Using the fact that for the logit loss $l''$ attains its maximum at 0, we get

$$\Big| \frac{l'(U\delta_0) + l'(-U\delta_0)}{2} - l'(0) \Big| \leq \frac{|l'(U\delta_0) - l'(0)|}{2} + \frac{|l'(-U\delta_0) - l'(0)|}{2} \leq l''(0)\, U\delta_0$$

and

$$\frac{l'(U\delta_0) - l'(-U\delta_0)}{2} \leq l''(0)\, U\delta_0.$$
9 Therefore, l (Y j ˆf (X j ))Y j h(x j ) l (0) l ( U) Y j h(x j )I B(x;δ0)(X j ) + 2l (0)Uδ 0 h(x j )I c(x B(x;δ0) j) = h(x j )I B(x;δ0)(X j ) + ξ j, (6) where ξ, ξ j, j are i.i.d. ξ := l (0)Y h(x)i B(x,δ0)(X) + 2l (0)Uδ 0 h(x)i B(x,δ0)(X) + l ( U) h(x)i B(x;δ0) c(x). To boud the sum of ξ j s, we will use Berstei iequality. To this ed, we first boud the expectatio ad the variace of ξ. We have E ξ = l (0) E Y h(x)i B(x;δ0)(X) + 2l (0)Uδ 0 E h(x)i B(x;δ0)(X) + l ( U) E h(x)i B(x;δ0) c(x). Sice η is Lipschitz with the Lipschitz costat L ad η(x) = δ, η(y) δ Lδ 0 for all y B(x; δ 0 ). Sice also h H(x, δ 0 ), we have: ad E Y h(x)i B(x;δ0)(X) = E η(x)h(x)i B(x;δ0)(X) (δ Lδ 0 ) E h(x)i B(x;δ0)(X) (δ Lδ 0 )( δ 0 ) E h(x), E h(x)i B(x;δ0)(X) E h(x), E h(x)i B(x;δ0) c(x) δ 0 E h(x) Recall that l (0) < 0 ad l (0) 0. So, the followig boud for the expectatio of ξ is immediate: ] E ξ [l (0)(δ Lδ 0 )( δ 0 ) + 2l (0)Uδ 0 + l ( U) δ 0 E h(x). We will choose δ 0 small eough to make [l (0)(δ Lδ 0 )( δ 0 ) + 2l (Uδ 0 )Uδ 0 + l ( U) δ 0 ] δ 0.
A simple computation shows that it is enough to take

$$\delta_0 = \frac{\delta}{C\big( (\delta \vee L) + 4U + 2 \big)},$$

which can always be achieved by making the numerical constant $C$ large enough. Then the expectation satisfies the bound $\mathbb{E}\xi \leq -\delta_0\, \mathbb{E}\, h(X)$. As far as the variance of $\xi$ is concerned, using the elementary bound $(a+b+c)^2 \leq 3a^2 + 3b^2 + 3c^2$, it is easy to check that

$$\mathrm{Var}(\xi) \leq C \delta_0\, \mathbb{E}\, h(X)$$

with a sufficiently large numerical constant $C$. Finally, it is also straightforward that, with some $C > 0$, $|\xi| \leq C\delta_0$. Now Bernstein's inequality easily yields, with a sufficiently large numerical constant $C > 0$,

$$\mathbb{P}\Big\{ n^{-1} \sum_{j=1}^{n} \xi_j \geq -\frac{1}{2}\, \delta_0\, \mathbb{E}\, h(X) \Big\} \leq 2 \exp\Big\{ -\frac{n\, \delta_0\, \mathbb{E}\, h(X)}{C} \Big\}.$$

Then, since $\delta_0\, \mathbb{E}\, h(X) \geq \delta_0^2\, \Pi(B(x, \delta_0/2)) = p(x, \delta_0)$, we have with probability at least $1 - 2\exp\{-n\, p(x, \delta_0)/C\}$:

$$n^{-1} \sum_{j=1}^{n} l'(Y_j \hat f_n(X_j))\, Y_j h(X_j) + 2\lambda \langle \hat f_n, h \rangle \leq -\frac{1}{2}\, \delta_0\, \mathbb{E}\, h(X) + 2\lambda \langle \hat f_n, h \rangle \leq -\frac{1}{2}\, \delta_0\, \mathbb{E}\, h(X) + 2\lambda U \|h\| \leq -\frac{1}{2}\, p(x, \delta_0) + 2\lambda U q(x, \delta_0). \tag{7}$$

So, if

$$\lambda < \frac{p(x, \delta_0)}{4 U q(x, \delta_0)} = \frac{r(x, \delta_0)}{4U},$$

then

$$n^{-1} \sum_{j=1}^{n} l'(Y_j \hat f_n(X_j))\, Y_j h(X_j) + 2\lambda \langle \hat f_n, h \rangle < 0$$

with probability at least $1 - 2\exp\{-n\, p(x, \delta_0)/C\}$. The conclusion is that if $\eta(x) = \delta$ and $\lambda < r(x, \delta_0)/(4U)$, then

$$\mathbb{P}\{\hat f_n(x) \leq 0,\ \|\hat f_n\| \leq U\} \leq 2\exp\Big\{ -\frac{n\, p(x, \delta_0)}{C} \Big\}.$$
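The large-deviation mechanism behind the last display can be simulated: for i.i.d. bounded variables with a strictly negative mean, the probability that the empirical mean fails to stay below half that mean decays roughly exponentially in $n$. The uniform toy variables below stand in for the $\xi_j$ from the proof; all numbers are illustrative.

```python
import numpy as np

# Monte Carlo illustration of the Bernstein-type step: for bounded i.i.d.
# variables with mean -0.1, the chance that the empirical mean exceeds -0.05
# (half of the mean) shrinks roughly exponentially with n.
rng = np.random.default_rng(2)
mean, half_width = -0.1, 1.0

def tail_prob(n, reps=5000):
    xi = rng.uniform(mean - half_width, mean + half_width, size=(reps, n))
    return float(np.mean(xi.mean(axis=1) >= mean / 2.0))

probs = [tail_prob(n) for n in (50, 200, 800)]   # should decrease fast
```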
Thus, for $\lambda \leq \lambda_+$, we have

$$\mathbb{P}\{\hat f_n(x) \leq 0,\ \|\hat f_n\| \leq U\} \leq 2\exp\Big\{ -\frac{n\, p(\delta_0)}{C} \Big\}. \tag{8}$$

We now turn to bounding the probability $\mathbb{P}\{\|\hat f_n\| \geq U\}$ for a properly chosen $U$. This is the only part of the proof where condition (3) on the uniform entropy of the unit ball $B_H$ is needed. It relies heavily on recent excess risk bounds in Koltchinskii [6] as well as on some of the results in the spirit of Blanchard, Lugosi and Vayatis [5] (see their Lemma 4). We formulate the bound we need in the following lemma.

Lemma 1. Suppose that condition (3) holds and (for simplicity) that $l$ is the logit loss. Let $R \geq 1$. Then, there exists a constant $K > 0$ such that, for any $t > 0$, the event

$$\forall f \in H \ \text{with} \ \|f\| \leq R: \tag{9}$$

$$P_n(l \bullet f) - \inf_{\|g\| \leq R} P_n(l \bullet g) \leq 2\Big( P(l \bullet f) - \inf_{\|g\| \leq R} P(l \bullet g) \Big) + K\Big( \frac{R A^{2\rho/(2+\rho)}}{n^{2/(2+\rho)}} + \frac{tR}{n} \Big) \tag{10}$$

has probability at least $1 - e^{-t}$.

The argument that follows will provide a bound that is somewhat akin to some of the bounds in [7] and in [4]. Denote by $E(R)$ the event of the lemma. Let $R \geq \|f^*\| \vee 1$. On the event $E(R)$, the condition $R/2 < \|\hat f_n\| \leq R$ implies

$$\lambda \|\hat f_n\|^2 \leq P_n(l \bullet \hat f_n) - \inf_{\|g\| \leq R} P_n(l \bullet g) + \lambda \|\hat f_n\|^2 = \inf_{\|f\| \leq R} \Big[ P_n(l \bullet f) - \inf_{\|g\| \leq R} P_n(l \bullet g) + \lambda \|f\|^2 \Big]$$

$$\leq \inf_{\|f\| \leq R} \Big[ 2\Big( P(l \bullet f) - \inf_{\|g\| \leq R} P(l \bullet g) \Big) + \lambda \|f\|^2 \Big] + K\Big( \frac{R A^{2\rho/(2+\rho)}}{n^{2/(2+\rho)}} + \frac{tR}{n} \Big)$$

$$\leq 2\Big( P(l \bullet f^*) - \inf_{\|g\| \leq R} P(l \bullet g) \Big) + 2\lambda \|f^*\|^2 + 2K\Big( \frac{R A^{2\rho/(2+\rho)}}{n^{2/(2+\rho)}} + \frac{tR}{n} \Big) \leq 2\lambda \|f^*\|^2 + 2K\Big( \frac{R A^{2\rho/(2+\rho)}}{n^{2/(2+\rho)}} + \frac{tR}{n} \Big),$$

where in the last step we used that $f^*$ minimizes $P(l \bullet f)$ over all $f$, so $P(l \bullet f^*) - \inf_{\|g\| \leq R} P(l \bullet g) \leq 0$. This implies that

$$\frac{R^2}{4} \leq \|\hat f_n\|^2 \leq \frac{2\|f^*\|^2}{1} \wedge \|\hat f_n\|^2 \leq 2\|f^*\|^2 + \frac{2K}{\lambda}\Big( \frac{R A^{2\rho/(2+\rho)}}{n^{2/(2+\rho)}} + \frac{tR}{n} \Big).$$
Solving this inequality with respect to $R$ shows that, on $E(R)$, the condition $R/2 \leq \|\hat f_n\| \leq R$ implies

$$R \leq K\Big( \|f^*\| \vee 1 \vee \frac{A^{2\rho/(2+\rho)}}{\lambda\, n^{2/(2+\rho)}} \vee \frac{t}{n\lambda} \Big).$$

If now $t = \varepsilon$ and $\lambda \geq \lambda_-$, then it yields

$$R \leq K(\|f^*\| \vee 1).$$

Note that

$$P_n(l \bullet \hat f_n) + \lambda \|\hat f_n\|^2 \leq l(0)$$

(just plug $f = 0$ into the target functional). Therefore, we have $\lambda \|\hat f_n\|^2 \leq l(0)$, or

$$\|\hat f_n\| \leq \sqrt{\frac{l(0)}{\lambda}} =: R_n.$$

Define $R_k := R_n 2^{-k}$, $k = 0, 1, 2, \dots, N := [\log_2 R_n] + 1$. Note that, for our choice of $\lambda$, we have $N \leq C \log n$ with some numerical constant $C > 0$. Let $E_k := E(R_k)$. Clearly, $\mathbb{P}(E_k) \geq 1 - e^{-t}$ and, on the event $E_k$, the condition $R_k/2 \leq \|\hat f_n\| \leq R_k$ implies $\|\hat f_n\| \leq K(\|f^*\| \vee 1)$. Thus, $\|\hat f_n\|$ can be larger than the right hand side of the last bound only on the event $\bigcup_{k=1}^{N} E_k^c$, whose probability is smaller than $N e^{-\varepsilon}$. This establishes the following inequality:

$$\mathbb{P}\big\{ \|\hat f_n\| \geq K(\|f^*\| \vee 1) \big\} \leq N e^{-\varepsilon} \leq e^{-\varepsilon/2}, \tag{11}$$

provided that $\varepsilon \geq K \log\log n$, as it was assumed. Combining bounds (8) and (11) and plugging the resulting bound into (5) and then into (4) easily completes the proof (subject to a minor adjustment of the constants).

Acknowledgement. The first author is very thankful to Alexandre Tsybakov for several useful and interesting conversations on the subject of the paper.

References

1. Audibert, J.-Y. and Tsybakov, A. Fast convergence rates for plug-in estimators under margin conditions. Unpublished manuscript.
2. Bartlett, P., Bousquet, O. and Mendelson, S. Local Rademacher Complexities. Annals of Statistics, 2005, to appear.
3. Bartlett, P., Jordan, M. and McAuliffe, J. Convexity, Classification and Risk Bounds. J. American Statistical Assoc., 2004, to appear.
4. Blanchard, G., Bousquet, O. and Massart, P. Statistical Performance of Support Vector Machines. Preprint, 2003.
5. Blanchard, G., Lugosi, G. and Vayatis, N. On the rates of convergence of regularized boosting classifiers. Journal of Machine Learning Research, 2003, 4.
6. Koltchinskii, V. Local Rademacher Complexities and Oracle Inequalities in Risk Minimization. Preprint.
7. Scovel, C. and Steinwart, I. Fast Rates for Support Vector Machines. Preprint, to appear.
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationSingular Continuous Measures by Michael Pejic 5/14/10
Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable
More informationRandom Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.
Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)
More informationSparsity in Multiple Kernel Learning
Sparsity i Multiple Kerel Learig Vladimir Koltchiskii School of Mathematics Georgia Istitute of Techology Atlata, GA 30332-0160 USA vlad@math.gatech.edu ad Mig Yua School of Idustrial ad Systems Egieerig
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More informationLinear Support Vector Machines
Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate
More informationA remark on p-summing norms of operators
A remark o p-summig orms of operators Artem Zvavitch Abstract. I this paper we improve a result of W. B. Johso ad G. Schechtma by provig that the p-summig orm of ay operator with -dimesioal domai ca be
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationDefinition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.
4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad
More informationLecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound
Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee
More information1 Duality revisited. AM 221: Advanced Optimization Spring 2016
AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R
More informationDistribution of Random Samples & Limit theorems
STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationA Risk Comparison of Ordinary Least Squares vs Ridge Regression
Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer
More information5 Birkhoff s Ergodic Theorem
5 Birkhoff s Ergodic Theorem Amog the most useful of the various geeralizatios of KolmogorovâĂŹs strog law of large umbers are the ergodic theorems of Birkhoff ad Kigma, which exted the validity of the
More informationLecture 7: Density Estimation: k-nearest Neighbor and Basis Approach
STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.
More informationSequences I. Chapter Introduction
Chapter 2 Sequeces I 2. Itroductio A sequece is a list of umbers i a defiite order so that we kow which umber is i the first place, which umber is i the secod place ad, for ay atural umber, we kow which
More informationIntegrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number
MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios
More informationChapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities
Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other
More informationApproximation by Superpositions of a Sigmoidal Function
Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 22 (2003, No. 2, 463 470 Approximatio by Superpositios of a Sigmoidal Fuctio G. Lewicki ad G. Mario Abstract. We geeralize
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationOn Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates
Joural of Statistical Theory ad Applicatios Volume, Number 4, 0, pp. 353-369 ISSN 538-7887 O Classificatio Based o Totally Bouded Classes of Fuctios whe There are Icomplete Covariates Majid Mojirsheibai
More informationA REMARK ON A PROBLEM OF KLEE
C O L L O Q U I U M M A T H E M A T I C U M VOL. 71 1996 NO. 1 A REMARK ON A PROBLEM OF KLEE BY N. J. K A L T O N (COLUMBIA, MISSOURI) AND N. T. P E C K (URBANA, ILLINOIS) This paper treats a property
More informationApplication to Random Graphs
A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let
More informationEmpirical Processes: Glivenko Cantelli Theorems
Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3
More information6. Kalman filter implementation for linear algebraic equations. Karhunen-Loeve decomposition
6. Kalma filter implemetatio for liear algebraic equatios. Karhue-Loeve decompositio 6.1. Solvable liear algebraic systems. Probabilistic iterpretatio. Let A be a quadratic matrix (ot obligatory osigular.
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More informationFast Rates for Support Vector Machines
Fast Rates for Support Vector Machies Igo Steiwart ad Clit Scovel CCS-3, Los Alamos Natioal Laboratory, Los Alamos NM 87545, USA {igo,jcs}@lal.gov Abstract. We establish learig rates to the Bayes risk
More information1 Convergence in Probability and the Weak Law of Large Numbers
36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec
More informationMcGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems
McGill Uiversity Math 354: Hoors Aalysis 3 Fall 212 Assigmet 3 Solutios to selected problems Problem 1. Lipschitz fuctios. Let Lip K be the set of all fuctios cotiuous fuctios o [, 1] satisfyig a Lipschitz
More informationSolutions to HW Assignment 1
Solutios to HW: 1 Course: Theory of Probability II Page: 1 of 6 Uiversity of Texas at Austi Solutios to HW Assigmet 1 Problem 1.1. Let Ω, F, {F } 0, P) be a filtered probability space ad T a stoppig time.
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS All the algorithms preseted so far halluciate the future values as radom draws ad the perform
More informationPRELIM PROBLEM SOLUTIONS
PRELIM PROBLEM SOLUTIONS THE GRAD STUDENTS + KEN Cotets. Complex Aalysis Practice Problems 2. 2. Real Aalysis Practice Problems 2. 4 3. Algebra Practice Problems 2. 8. Complex Aalysis Practice Problems
More informationBull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung
Bull. Korea Math. Soc. 36 (999), No. 3, pp. 45{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Abstract. This paper provides suciet coditios which esure the strog cosistecy of regressio
More informationSupport Vector Machines and Kernel Methods
Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,
More information1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable
More informationLecture 13: Maximum Likelihood Estimation
ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select
More information2 Banach spaces and Hilbert spaces
2 Baach spaces ad Hilbert spaces Tryig to do aalysis i the ratioal umbers is difficult for example cosider the set {x Q : x 2 2}. This set is o-empty ad bouded above but does ot have a least upper boud
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More information