Sieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learning Theory, Winter 2014, Topic 6
Lecturer: Clayton Scott
Scribe: Julian Katz-Samuels, Brandon Oselio, Pin-Yu Chen

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

1 Introduction

We can decompose the excess risk of a discrimination rule as follows:

$$R(\hat h_n) - R^* = \underbrace{R(\hat h_n) - R^*_{\mathcal{H}}}_{\text{estimation error}} + \underbrace{R^*_{\mathcal{H}} - R^*}_{\text{approximation error}}$$

Here $R^*$ is the Bayes risk and $R^*_{\mathcal{H}} = \inf_{h \in \mathcal{H}} R(h)$. The first term is the estimation error and measures the performance of the discrimination rule with respect to the best hypothesis in $\mathcal{H}$. In previous lectures, we studied performance guarantees for this quantity when $\hat h_n$ is ERM. Now, we also consider the approximation error, which captures how well a class of hypotheses $\{\mathcal{H}_k\}$ approximates the Bayes decision boundary. For example, consider the histogram classifiers in Figure 1 and Figure 2. As the grid becomes more fine-grained, the class of hypotheses approximates the Bayes decision boundary with increasing accuracy. The examination of the approximation error will lead to the design of sieve estimators, which perform ERM over $\mathcal{H}_k$ where $k = k_n$ grows with $n$ at an appropriate rate such that both the approximation error and the estimation error converge to 0. Note that while the estimation error is random because it depends on the sample, the approximation error is not random.

2 Approximation Error

The following assumption will be adopted to establish universal consistency.

Definition 1. A sequence of sets of classifiers $\mathcal{H}_1, \mathcal{H}_2, \dots$ is said to have the universal approximation property (UAP) if for all $P_{XY}$,

$$R^*_{\mathcal{H}_k} \to R^* \quad \text{as } k \to \infty.$$

Observe that $R^*_{\mathcal{H}_k}$ is not random; it does not depend on the training data. It does depend on $P_{XY}$, because the risk is an expectation with respect to $P_{XY}$. The following theorem gives a sufficient condition for a sequence of classifiers to have the UAP.

Theorem 1. Suppose $\mathcal{H}_k$ consists of classifiers that are piecewise constant on a partition of $\mathcal{X}$ into cells $A_{k,1}, A_{k,2}, \dots$. If $\sup_{j \ge 1} \operatorname{diam}(A_{k,j}) \to 0$ as $k \to \infty$, then $\{\mathcal{H}_k\}$ has the UAP.

Proof. See the proof of Theorem 6.1 in [1].

Example. Suppose $\mathcal{X} = [0,1]^d$. Consider $\mathcal{H}_k = \{\text{histogram classifiers based on cells of sidelength } 1/k\}$. Then $\operatorname{diam}(A_{k,j}) = \sqrt{d}/k$ for all $j$. In particular,

$$\sup_{j \ge 1} \operatorname{diam}(A_{k,j}) = \frac{\sqrt{d}}{k} \to 0 \quad \text{as } k \to \infty.$$

By the above theorem, $\{\mathcal{H}_k\}$ has the UAP.
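
The diameter claim in the example can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `cell_diameter` and `corner_distance` are hypothetical, and the check simply compares the formula $\sqrt{d}/k$ with the Euclidean distance between opposite corners of one cube.

```python
import math

def cell_diameter(d: int, k: int) -> float:
    """Diameter of a cube of side 1/k in R^d, using the formula sqrt(d)/k."""
    return math.sqrt(d) / k

def corner_distance(d: int, k: int) -> float:
    """Euclidean distance between opposite corners (0,...,0) and (1/k,...,1/k)."""
    return math.sqrt(sum((1.0 / k) ** 2 for _ in range(d)))

# The diameter shrinks like 1/k for any fixed dimension d (here d = 3).
for k in (1, 10, 100, 1000):
    print(k, cell_diameter(3, k), corner_distance(3, k))
```

For fixed $d$ the diameter decays like $1/k$, which is all that Theorem 1 requires.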

Figure 1: Histogram classifier based on a 6 × 6 grid.
Figure 2: Histogram classifier based on a finer grid.

3 Convergence of Random Variables

This section offers a brief review of convergence in probability and almost sure convergence, and gives some useful results for proving both kinds of convergence.

Definition 2. Let $Z_1, Z_2, \dots$ be a sequence of random variables in $\mathbb{R}$. We say that $Z_n$ converges to $Z$ in probability, and write $Z_n \xrightarrow{\text{i.p.}} Z$, if for all $\epsilon > 0$,

$$\lim_{n \to \infty} \Pr(|Z_n - Z| \ge \epsilon) = 0.$$

We say $Z_n$ converges to $Z$ almost surely (with probability one), and write $Z_n \xrightarrow{\text{a.s.}} Z$, if

$$\Pr\big(\{\omega : \lim_{n \to \infty} Z_n(\omega) = Z(\omega)\}\big) = 1.$$

Example. In the analysis of discrimination rules, we consider $\Omega = \{\omega = ((X_1, Y_1), (X_2, Y_2), \dots) \in (\mathcal{X} \times \mathcal{Y})^\infty\}$ where $(X_i, Y_i) \overset{\text{i.i.d.}}{\sim} P_{XY}$. To study convergence of the risk to the Bayes risk, we take $Z_n = R(\hat h_n)$, where $\hat h_n$ is a classifier based on the first $n$ entries of $\omega$, namely $(X_1, Y_1), \dots, (X_n, Y_n)$, and $Z = R^*$.

Now, we consider some useful lemmas for proving convergence in probability and almost sure convergence. We begin with results for convergence in probability.

Lemma 1. If $Z_n \xrightarrow{\text{a.s.}} Z$, then $Z_n \xrightarrow{\text{i.p.}} Z$.

Proof. This was proved in EECS 501.

Lemma 2. If $E\{|Z_n - Z|\} \to 0$ as $n \to \infty$, then $Z_n \xrightarrow{\text{i.p.}} Z$.

Proof. By Markov's inequality,

$$\Pr(|Z_n - Z| \ge \epsilon) \le \frac{E\{|Z_n - Z|\}}{\epsilon} \to 0.$$

Now, we turn our attention to almost sure convergence. We begin with the Borel-Cantelli Lemma, which yields a useful corollary.
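
Lemma 3 (Borel-Cantelli Lemma). Consider a probability space $(\Omega, \mathcal{A}, P)$. Let $\{A_n\}_{n \ge 1}$ be a sequence of events. If

$$\sum_{n \ge 1} P(A_n) < \infty,$$

then

$$P(\limsup A_n) = 0,$$

where

$$\limsup A_n := \{\omega \in \Omega : \forall N \ \exists n \ge N \text{ such that } \omega \in A_n\} = \{\omega \text{ that occur infinitely often in the sequence of events}\} = \bigcap_{N \ge 1} \bigcup_{n \ge N} A_n.$$

Proof. Observe that $P(\limsup A_n) = P\big(\lim_{N \to \infty} \bigcup_{n \ge N} A_n\big)$. Let $B_N = \bigcup_{n \ge N} A_n$. Observe that $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$, i.e., $\{B_N\}$ is a decreasing sequence of events. Then, by continuity of $P$, we have

$$P\big(\lim_{N \to \infty} B_N\big) = \lim_{N \to \infty} P(B_N) = \lim_{N \to \infty} P\Big(\bigcup_{n \ge N} A_n\Big) \le \lim_{N \to \infty} \sum_{n \ge N} P(A_n) = 0,$$

where the inequality follows from the union bound and the last equality follows from the hypothesis that $\sum_{n \ge 1} P(A_n) < \infty$.

The following corollary of the Borel-Cantelli lemma is very useful for showing almost sure convergence.

Corollary 1. If for all $\epsilon > 0$ we have

$$\sum_{n \ge 1} \Pr(|Z_n - Z| \ge \epsilon) < \infty,$$

then $Z_n \xrightarrow{\text{a.s.}} Z$.

Proof. Define the event $A^\epsilon = \{\omega \in \Omega : \forall N \ \exists n \ge N \text{ such that } |Z_n(\omega) - Z(\omega)| \ge \epsilon\}$. Let $A_n^\epsilon = \{\omega \in \Omega : |Z_n(\omega) - Z(\omega)| \ge \epsilon\}$. Observe that $\limsup A_n^\epsilon = A^\epsilon$. By the hypothesis, we may apply the Borel-Cantelli lemma to obtain $\Pr(A^\epsilon) = 0$. Now define

$$A = \{\omega \in \Omega : Z_n(\omega) \not\to Z(\omega)\} = \{\omega \in \Omega : \exists \epsilon > 0 \ \forall N \ \exists n \ge N \text{ such that } |Z_n(\omega) - Z(\omega)| \ge \epsilon\}.$$

In words, this means: for some $\epsilon$, no matter how far along you go in the sequence, you can find some $n$ such that $|Z_n(\omega) - Z(\omega)| \ge \epsilon$. Now, let $\{\epsilon_j\}$ be a strictly decreasing sequence converging to 0. Observe that $A \subseteq \bigcup_{j \ge 1} A^{\epsilon_j}$. To see this, take some $\omega \in A$. Then, for some $\epsilon > 0$, $\forall N \ \exists n \ge N$ such that $|Z_n(\omega) - Z(\omega)| \ge \epsilon$. As $j \to \infty$, eventually there is some $j$ such that $\epsilon_j \le \epsilon$, so that $\omega \in A^{\epsilon_j}$. Therefore,

$$\Pr(A) \le \Pr\Big(\bigcup_{j \ge 1} A^{\epsilon_j}\Big) \le \sum_{j \ge 1} \Pr(A^{\epsilon_j}) = 0.$$

This concludes the proof.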
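
To make the summability condition in Corollary 1 concrete, here is a minimal sketch under an assumption that is not part of this section: $Z_n$ is the sample mean of $n$ i.i.d. variables taking values in $[0,1]$ with mean $p$, so that Hoeffding's inequality gives $\Pr(|Z_n - p| \ge \epsilon) \le 2e^{-2n\epsilon^2}$. The function names are hypothetical. The tail of this geometric series bounds, via the union bound used in the proof of Lemma 3, the probability that a deviation of size $\epsilon$ occurs at any time $n \ge N$.

```python
import math

def hoeffding_bound(n: int, eps: float) -> float:
    """Assumed bound 2 exp(-2 n eps^2) on Pr(|Z_n - p| >= eps) (Hoeffding)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

def tail_sum(N: int, eps: float) -> float:
    """Closed form of sum_{n >= N} 2 exp(-2 n eps^2), a geometric series."""
    r = math.exp(-2.0 * eps ** 2)
    return 2.0 * r ** N / (1.0 - r)

# The tail bound on Pr(exists n >= N : |Z_n - p| >= eps) vanishes as N grows.
eps = 0.1
for N in (10, 100, 1000):
    print(N, tail_sum(N, eps))
```

Because the bounds are summable, the tail goes to zero, which is exactly the mechanism that upgrades convergence in probability to almost sure convergence.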
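
4 Sieve Estimators

Let $\{\mathcal{H}_k\}$ be a sequence of sets of classifiers with the universal approximation property (UAP). Assume that we have a uniform deviation bound for $\mathcal{H}_k$ (e.g., if $\mathcal{H}_k$ is finite or a VC class). We denote by $\hat h_{n,k}$ the classifier we obtain from empirical risk minimization (ERM) over $\mathcal{H}_k$:

$$\hat h_{n,k} = \arg\min_{h \in \mathcal{H}_k} \frac{1}{n} \sum_{i=1}^{n} 1_{\{h(X_i) \ne Y_i\}}.$$

Let $k_n$ be an integer-valued sequence and define $\hat h_n = \hat h_{n,k_n}$. Since we have the UAP, the approximation error goes to 0 as long as $k_n \to \infty$ with $n$. We wish to restrict the rate of growth of $k_n$ appropriately so that the estimation error also goes to zero, in probability or almost surely. This is called a sieve estimator, a name generally attributed to Grenander [3]. Formally, we need that

$$R(\hat h_n) - R^*_{\mathcal{H}_{k_n}} \to 0, \quad \text{i.p. or a.s.}$$

Let's apply some previously developed theory to this problem. Consider the case where $|\mathcal{H}_k| < \infty$. We have seen that

$$\Pr\big(R(\hat h_n) - R^*_{\mathcal{H}_{k_n}} \ge \epsilon\big) \le 2 |\mathcal{H}_{k_n}| e^{-n\epsilon^2/2}. \qquad (1)$$

We need to make sure that $|\mathcal{H}_{k_n}|$ does not dominate the exponential term, so that the sum in Corollary 1 converges. Rewrite the right-hand side of Eq. (1) as

$$2 |\mathcal{H}_{k_n}| e^{-n\epsilon^2/2} = 2 e^{\ln |\mathcal{H}_{k_n}| - n\epsilon^2/2}.$$

Thus it suffices to have

$$\frac{\ln |\mathcal{H}_{k_n}|}{n} \to 0 \quad \text{as } n \to \infty.$$

Another way to write this is with little-oh notation. We say that a sequence $a_n = o(b_n)$ if $a_n / b_n \to 0$. So, the above is equivalent to $\ln |\mathcal{H}_{k_n}| = o(n)$. Under this condition,

$$\forall \epsilon > 0, \ \exists N_\epsilon, \ \forall n \ge N_\epsilon, \quad \frac{\ln |\mathcal{H}_{k_n}|}{n} \le \frac{\epsilon^2}{4}. \qquad (2)$$

Then for all $\epsilon > 0$,

$$\sum_{n \ge 1} \Pr\big(R(\hat h_n) - R^*_{\mathcal{H}_{k_n}} \ge \epsilon\big) \le N_\epsilon + \sum_{n \ge N_\epsilon} 2 e^{-n\epsilon^2/4} < \infty.$$

The summation on the right is simply a convergent geometric series. By Corollary 1 we have established

Theorem 2. Let $\{\mathcal{H}_k\}$ satisfy the UAP with $|\mathcal{H}_k| < \infty$. Let $k_n$ be an integer sequence such that as $n \to \infty$, $k_n \to \infty$ and $\ln |\mathcal{H}_{k_n}| = o(n)$. Then $R(\hat h_n) \to R^*$ almost surely, i.e., $\hat h_n$ is strongly universally consistent.

Corollary 2. Let $\mathcal{H}_k = \{\text{the set of histogram classifiers with side length } 1/k\}$, $\mathcal{X} = [0,1]^d$. Let $k_n \to \infty$ such that $k_n^d = o(n)$. Then $\hat h_n$ is strongly universally consistent.

Proof. We previously saw in Section 2 that histograms have the UAP. Also, $|\mathcal{H}_k| = 2^{k^d}$, so the corollary follows from Theorem 2.

In Corollary 2, we require $k_n^d / n \to 0$. Note that $n / k_n^d$ is the average number of samples per cell. Therefore the average number of samples per cell must go to infinity; this seems like a reasonable condition for strong consistency.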
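
The histogram sieve of Corollary 2 can be sketched in a few lines. This is an illustrative sketch, not the notes' own code: all function names are hypothetical, the schedule $k_n = \lceil n^{1/(d+2)} \rceil$ is just one choice satisfying $k_n \to \infty$ and $k_n^d = o(n)$, and empty cells or ties are arbitrarily assigned label 0. It relies on the fact that ERM over histogram classifiers reduces to a majority vote inside each cell, since the label of each cell can be chosen independently.

```python
import math
import random
from collections import defaultdict

def sieve_resolution(n: int, d: int) -> int:
    """Hypothetical schedule k_n = ceil(n^(1/(d+2))), so k_n^d / n -> 0."""
    return max(1, math.ceil(n ** (1.0 / (d + 2))))

def cell_index(x, k):
    """Index of the cube of side 1/k in [0,1]^d that contains the point x."""
    return tuple(min(int(xi * k), k - 1) for xi in x)

def fit_histogram_erm(points, labels, k):
    """ERM over histogram classifiers: majority vote within each cell."""
    votes = defaultdict(int)
    for x, y in zip(points, labels):
        votes[cell_index(x, k)] += 1 if y == 1 else -1

    def classify(x):
        # Empty cells and ties default to label 0 (an arbitrary convention).
        return 1 if votes.get(cell_index(x, k), 0) > 0 else 0

    return classify

# Synthetic, noiseless data: the label is 1 exactly when x_1 > 1/2.
rng = random.Random(0)
n, d = 2000, 2
X = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
Y = [1 if x[0] > 0.5 else 0 for x in X]
k_n = sieve_resolution(n, d)
h_hat = fit_histogram_erm(X, Y, k_n)
train_error = sum(h_hat(x) != y for x, y in zip(X, Y)) / n
print(k_n, n / k_n ** d, train_error)
```

The second printed value is the average number of samples per cell, $n / k_n^d$, which is the quantity that must grow without bound.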
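
Now, let's consider the case where the VC dimension of $\mathcal{H}_k$ is finite. From our discussion of VC theory, we have that for all $\epsilon > 0$,

$$\Pr\big(R(\hat h_n) - R^*_{\mathcal{H}_{k_n}} \ge \epsilon\big) \le 8 S_{\mathcal{H}_{k_n}}(n) e^{-n\epsilon^2/128}.$$

We also know from Sauer's lemma that

$$S_{\mathcal{H}}(n) \le \Big(\frac{en}{V}\Big)^{V}, \quad n \ge V,$$

where $V$ is the VC dimension of $\mathcal{H}$. Using this, we deduce

$$\Pr\big(R(\hat h_n) - R^*_{\mathcal{H}_{k_n}} \ge \epsilon\big) \le 8 S_{\mathcal{H}_{k_n}}(n) e^{-n\epsilon^2/128} \le C n^{V_{k_n}} e^{-n\epsilon^2/128} = C \exp\Big(V_{k_n} \ln n - \frac{n\epsilon^2}{128}\Big),$$

where $C$ is simply a constant that does not depend on $n$. If we choose $k_n$ such that $V_{k_n} = o(n / \ln n)$, then we obtain the following inequality, which is similar to the one in Eq. (2):

$$\forall \epsilon > 0, \ \exists N, \ \forall n \ge N, \quad \frac{V_{k_n} \ln n}{n} \le \frac{\epsilon^2}{256}, \quad \text{so that} \quad V_{k_n} \ln n - \frac{n\epsilon^2}{128} \le -\frac{n\epsilon^2}{256}.$$

Therefore

$$\sum_{n \ge 1} \Pr\big(R(\hat h_n) - R^*_{\mathcal{H}_{k_n}} \ge \epsilon\big) \le N + \sum_{n \ge N} C e^{-n\epsilon^2/256} < \infty.$$

We have established

Theorem 3. Let $\{\mathcal{H}_k\}$ satisfy the UAP, and let $V_k < \infty$ denote the VC dimension of $\mathcal{H}_k$. Let $k_n$ be an integer-valued sequence such that as $n \to \infty$, $k_n \to \infty$ and $V_{k_n} = o(n / \ln n)$. Then $R(\hat h_n) \to R^*$ almost surely, i.e., $\hat h_n$ is strongly universally consistent.

5 Rates of Convergence

One could ask whether there is a universal rate of convergence. Put more formally, is there a $\hat h_n$ such that $E[R(\hat h_n) - R^*]$ converges to 0 at a fixed rate, for any joint distribution $P_{XY}$? It turns out that the answer is no. A theorem from [2] makes this concrete.

Theorem 4. Let $\{a_n\}$ with $a_n \searrow 0$ such that $\frac{1}{16} \ge a_1 \ge a_2 \ge \cdots$. For any $\hat h_n$, there exists a joint distribution $P_{XY}$ such that $R^* = 0$ and $E R(\hat h_n) \ge a_n$.

Proof. See Chapter 7 of [1].

Since the sequence $\{a_n\}$ is arbitrary, we can choose how slowly it converges to 0, which shows that there is no universal rate of convergence. In order to establish rates of convergence, we will therefore have to place restrictions on $P_{XY}$. Some possible reasonable restrictions on $P_{XY}$ include:

- $P_{X|Y=y}$ is a continuous random variable.
- $\eta(x)$ is smooth.
- There exists a $t_0$ such that $P_X(|\eta(X) - \frac{1}{2}| \le t_0) = 0$.
- The Bayes decision boundary is smooth.

Below we consider one particular distributional assumption under which we can establish a rate of convergence.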
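
As a numerical aside on the VC-class argument behind Theorem 3, the sketch below compares two forms of Sauer's lemma and evaluates the logarithm of the resulting tail bound. It is illustrative only: the function names are hypothetical, the binomial-sum form of Sauer's lemma is the standard sharper statement rather than the one quoted in these notes, and the schedule $V = \lfloor \sqrt{n} \rfloor$ is just one example of $V = o(n / \ln n)$.

```python
import math

def sauer_sum(n: int, V: int) -> int:
    """Binomial-sum form of Sauer's lemma: S_H(n) <= sum_{i=0}^{V} C(n, i)."""
    return sum(math.comb(n, i) for i in range(V + 1))

def polynomial_bound(n: int, V: int) -> float:
    """The cruder bound (e n / V)^V quoted in the notes, valid for n >= V >= 1."""
    return (math.e * n / V) ** V

def log_vc_tail_bound(n: int, V: int, eps: float) -> float:
    """Natural log of 8 (e n / V)^V exp(-n eps^2 / 128)."""
    return math.log(8.0) + V * math.log(math.e * n / V) - n * eps ** 2 / 128.0

# With V = floor(sqrt(n)) the log-bound is eventually negative, but only for
# large n because of the constant 128 in the exponent.
for n in (10**4, 10**6, 10**8):
    V = math.isqrt(n)
    print(n, V, log_vc_tail_bound(n, V, 0.5))
```

The bound is vacuous (log-bound positive) for moderate $n$, which is consistent with the result being asymptotic: the theorem only needs the exponent to be eventually negative and the resulting series to converge.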
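
Figure 3: Illustration of Bayes decision boundary (BDB).

6 Box-Counting Class

Let $\mathcal{X} = [0,1]^d$. For $m \ge 1$, define $\mathcal{P}_m$ to be the partition of $\mathcal{X}$ into cubes of side length $1/m$. Note that $|\mathcal{P}_m| = m^d$.

Definition 3. The box-counting class $\mathcal{B}$ is the set of all $P_{XY}$ such that

(A) $P_X$ has a bounded density.
(B) There exists a constant $C$ such that for all $m \ge 1$, the Bayes decision boundary (BDB) $\{x : \eta(x) = 1/2\}$ passes through at most $C m^{d-1}$ elements of $\mathcal{P}_m$ (see Fig. 3).

Condition (B) essentially says that the Bayes decision boundary has dimension $d - 1$. Look up the term box-counting dimension for more.

We have the following rate of convergence result.

Theorem 5. Let $\mathcal{H}_k = \{\text{histogram classifiers based on } \mathcal{P}_k\}$, assume $P_{XY} \in \mathcal{B}$, and let $\hat h_n$ be a sieve estimator with $k_n \sim n^{1/(d+2)}$. Then

$$E\big[R(\hat h_n) - R^*\big] = O\big(n^{-1/(d+2)}\big).$$

Proof. Recall the decomposition into estimation and approximation errors:

$$E\big[R(\hat h_n) - R^*\big] = E\big[R(\hat h_n) - R^*_{\mathcal{H}_{k_n}}\big] + R^*_{\mathcal{H}_{k_n}} - R^*.$$

To bound the estimation error, write $k = k_n$ and denote $\Delta := R(\hat h_n) - R^*_{\mathcal{H}_k}$. By the law of total expectation,

$$E[\Delta] = \underbrace{\Pr(\Delta \ge \epsilon)}_{\le \delta} \underbrace{E[\Delta \mid \Delta \ge \epsilon]}_{\le 1} + \underbrace{\Pr(\Delta < \epsilon)}_{\le 1} \underbrace{E[\Delta \mid \Delta < \epsilon]}_{\le \epsilon} \le \delta + \epsilon,$$

where the bound $E[\Delta \mid \Delta \ge \epsilon] \le 1$ holds since $R \le 1$, and $\Pr(\Delta < \epsilon) \le 1$ is trivial. Choosing

$$\epsilon = \sqrt{\frac{2\,[k^d \ln 2 + \ln(2/\delta)]}{n}},$$

which makes the right-hand side of Eq. (1) equal to $\delta$ when $|\mathcal{H}_k| = 2^{k^d}$, and $\delta = 1/n$, we have

$$E\big[R(\hat h_n) - R^*_{\mathcal{H}_k}\big] = O\Big(\sqrt{\frac{k^d}{n}}\Big).$$

To bound the approximation error, let $h^*$ be the Bayes classifier and $h^*_k$ the best classifier in $\mathcal{H}_k$. Observe

$$R(h^*_k) - R(h^*) = 2 E_X\Big[\big|\eta(X) - \tfrac{1}{2}\big| \, 1_{\{h^*_k(X) \ne h^*(X)\}}\Big] \le E_X\big[1_{\{h^*_k(X) \ne h^*(X)\}}\big] = P_X\big(h^*_k(X) \ne h^*(X)\big), \qquad (3)$$

where the inequality holds since $|\eta(x) - \frac{1}{2}| \le \frac{1}{2}$. Let $D := \{x : h^*_k(x) \ne h^*(x)\}$, and let $f$ be the density of $P_X$. Then

$$P_X(D) = \int_D f(x)\,dx \le \|f\|_\infty \lambda(D),$$

where $\lambda$ denotes volume (Lebesgue measure). Now classification errors occur only on cells intersecting the Bayes decision boundary (see shaded cells in Fig. 3), and so

$$\lambda(D) \le \frac{C k^{d-1}}{k^d} = O\Big(\frac{1}{k}\Big).$$

Setting $\frac{1}{k} = \sqrt{k^d / n}$, we see that $k \sim n^{1/(d+2)}$ gives the best rate for histograms.

Histograms do not achieve the best rate among all discrimination rules for the box-counting class; this best rate is $E[R(\hat h_n) - R^*] = O(n^{-1/d})$ [4]. In the next set of notes we'll examine a discrimination rule that achieves this rate to within a logarithmic factor.

Exercises

1. With $\mathcal{X} = \mathbb{R}^d$ let $\mathcal{H}_k = \{h(x) = 1_{\{f(x) \ge 0\}} : f \text{ is a polynomial of degree at most } k\}$. Let $\mathcal{P}$ be the set of all joint distributions $P_{XY}$ such that $\inf_{h \in \mathcal{H}_k} R(h) \to R^*$ as $k \to \infty$. Determine an explicit sufficient condition on $k_n$ for the sieve estimator to be consistent for all $P_{XY} \in \mathcal{P}$.

2. Up to this point in our discussion of empirical risk minimization, we have always assumed that an empirical risk minimizer exists. However, it could be that the infimum of the empirical risk is not attained by any classifier. In this case, we can still have a consistent sieve estimator based on an approximate empirical risk minimizer. Thus, let $\tau_k$ be a sequence of positive real numbers decreasing to zero. Define $\hat h_{n,k}$ to be any classifier

$$\hat h_{n,k} \in \Big\{h \in \mathcal{H}_k : \hat R_n(h) \le \inf_{h' \in \mathcal{H}_k} \hat R_n(h') + \tau_k\Big\}.$$

Now consider the discrimination rule $\hat h_n := \hat h_{n,k_n}$. Show that Theorems 2 and 3 still hold for this discrimination rule. State any additional assumptions on the rate of convergence of $\tau_k$ that may be needed.

3. Assume $\mathcal{X} \subseteq \mathbb{R}^d$. A function $f : \mathcal{X} \to \mathbb{R}$ is said to be Lipschitz continuous if there exists a constant $L > 0$ such that for all $x, x' \in \mathcal{X}$, $|f(x) - f(x')| \le L \|x - x'\|$. Show that if one coordinate of the Bayes decision boundary is a Lipschitz continuous function of the others, then (B) holds in the definition of the box-counting class. An example of the stated condition is when the Bayes decision boundary is the graph of, say, $x_d = f(x_1, \dots, x_{d-1})$, where $f$ is Lipschitz.

4. Establish the following partial converse to Lemma 2: If $|Z_n - Z|$ is bounded, then $Z_n \xrightarrow{\text{i.p.}} Z$ implies $E\{|Z_n - Z|\} \to 0$. Therefore, in the context of classification, $E R(\hat h_n) \to R^*$ is equivalent to weak consistency.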
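
Returning to the trade-off at the end of the proof of Theorem 5, the choice $k \sim n^{1/(d+2)}$ can be checked by brute force. This is a minimal sketch under simplifying assumptions: all constants in the two error terms are set to 1, and the function names are hypothetical.

```python
import math

def excess_risk_bound(k: int, n: int, d: int) -> float:
    """sqrt(k^d / n) + 1/k: estimation plus approximation term, constants set to 1."""
    return math.sqrt(k ** d / n) + 1.0 / k

def best_resolution(n: int, d: int, k_max: int = 1000) -> int:
    """Integer k in [1, k_max] minimising the bound above, found by brute force."""
    return min(range(1, k_max + 1), key=lambda k: excess_risk_bound(k, n, d))

# Compare the brute-force minimiser with n^(1/(d+2)) for d = 2.
d = 2
for n in (10**3, 10**5, 10**7):
    k = best_resolution(n, d)
    print(n, k, n ** (1.0 / (d + 2)), excess_risk_bound(k, n, d))
```

With unit constants and $d = 2$ the minimiser tracks $n^{1/4}$; for other $d$ it differs from $n^{1/(d+2)}$ only by a constant factor, which does not affect the rate.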

References

[1] L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer.
[2] L. Devroye, "Any discrimination rule can have an arbitrarily bad probability of error for finite sample size," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 4.
[3] U. Grenander, Abstract Inference, Wiley.
[4] C. Scott and R. Nowak, "Minimax-Optimal Classification with Dyadic Decision Trees," IEEE Trans. Inform. Theory, vol. 52, 2006.
ECE9 Sprig 7 Statistical Learig Theory Istructor: R. Nowak Lecture : Decisio Trees Miimum Complexity Pealized Fuctio Recall the basic results of the last lectures: let X ad Y deote the iput ad output spaces
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More informationNotes 5 : More on the a.s. convergence of sums
Notes 5 : More o the a.s. covergece of sums Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: Dur0, Sectios.5; Wil9, Sectio 4.7, Shi96, Sectio IV.4, Dur0, Sectio.. Radom series. Three-series
More information4x 2. (n+1) x 3 n+1. = lim. 4x 2 n+1 n3 n. n 4x 2 = lim = 3
Exam Problems (x. Give the series (, fid the values of x for which this power series coverges. Also =0 state clearly what the radius of covergece is. We start by settig up the Ratio Test: x ( x x ( x x
More informationLecture 8: Convergence of transformations and law of large numbers
Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges
More informationA Proof of Birkhoff s Ergodic Theorem
A Proof of Birkhoff s Ergodic Theorem Joseph Hora September 2, 205 Itroductio I Fall 203, I was learig the basics of ergodic theory, ad I came across this theorem. Oe of my supervisors, Athoy Quas, showed
More informationLecture 9: Expanders Part 2, Extractors
Lecture 9: Expaders Part, Extractors Topics i Complexity Theory ad Pseudoradomess Sprig 013 Rutgers Uiversity Swastik Kopparty Scribes: Jaso Perry, Joh Kim I this lecture, we will discuss further the pseudoradomess
More informationMAS111 Convergence and Continuity
MAS Covergece ad Cotiuity Key Objectives At the ed of the course, studets should kow the followig topics ad be able to apply the basic priciples ad theorems therei to solvig various problems cocerig covergece
More informationSequences, Series, and All That
Chapter Te Sequeces, Series, ad All That. Itroductio Suppose we wat to compute a approximatio of the umber e by usig the Taylor polyomial p for f ( x) = e x at a =. This polyomial is easily see to be 3
More informationStatistical Machine Learning II Spring 2017, Learning Theory, Lecture 7
Statistical Machie Learig II Sprig 2017, Learig Theory, Lecture 7 1 Itroductio Jea Hoorio jhoorio@purdue.edu So far we have see some techiques for provig geeralizatio for coutably fiite hypothesis classes
More informationLecture Chapter 6: Convergence of Random Sequences
ECE5: Aalysis of Radom Sigals Fall 6 Lecture Chapter 6: Covergece of Radom Sequeces Dr Salim El Rouayheb Scribe: Abhay Ashutosh Doel, Qibo Zhag, Peiwe Tia, Pegzhe Wag, Lu Liu Radom sequece Defiitio A ifiite
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationLecture 2: Concentration Bounds
CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy
More informationIt is often useful to approximate complicated functions using simpler ones. We consider the task of approximating a function by a polynomial.
Taylor Polyomials ad Taylor Series It is ofte useful to approximate complicated fuctios usig simpler oes We cosider the task of approximatig a fuctio by a polyomial If f is at least -times differetiable
More informationDefinition An infinite sequence of numbers is an ordered set of real numbers.
Ifiite sequeces (Sect. 0. Today s Lecture: Review: Ifiite sequeces. The Cotiuous Fuctio Theorem for sequeces. Usig L Hôpital s rule o sequeces. Table of useful its. Bouded ad mootoic sequeces. Previous
More information4.1 Sigma Notation and Riemann Sums
0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas
More informationMA131 - Analysis 1. Workbook 2 Sequences I
MA3 - Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationON MEAN ERGODIC CONVERGENCE IN THE CALKIN ALGEBRAS
PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 0002-9939(XX0000-0 ON MEAN ERGODIC CONVERGENCE IN THE CALKIN ALGEBRAS MARCH T. BOEDIHARDJO AND WILLIAM B. JOHNSON 2
More informationn=1 a n is the sequence (s n ) n 1 n=1 a n converges to s. We write a n = s, n=1 n=1 a n
Series. Defiitios ad first properties A series is a ifiite sum a + a + a +..., deoted i short by a. The sequece of partial sums of the series a is the sequece s ) defied by s = a k = a +... + a,. k= Defiitio
More informationECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015
ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],
More informationLecture 3 The Lebesgue Integral
Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified
More information