Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Tolstikhin Ilya

Abstract. In this lecture we will prove the VC-bound, which provides a high-probability excess risk bound for the ERM algorithm when performing binary classification over classes of finite VC dimension. This result generalizes the agnostic bound for finite classes discussed in the previous lecture. Most of the material follows the exposition of Bousquet et al. (2004). I also invite the interested students to think about the questions marked in blue. You won't get extra points for them, but you will certainly get a better understanding of the material.

Let's recall the setting and some basic facts. We have an input space $\mathcal{X}$ and an output space $\mathcal{Y} := \{0, 1\}$. There is an unknown probability distribution $P$ over $\mathcal{X} \times \mathcal{Y}$. We receive a training sample $S := \{(X_i, Y_i)\}_{i=1}^n$ of $n$ i.i.d. input-output pairs from $P$. We fix a set of classifiers $\mathcal{H}$. We denote the expected risk of any $h \in \mathcal{H}$ as
$$L(h) := P_{(X,Y)\sim P}\{h(X) \neq Y\}$$
and the empirical risk as
$$L_n(h) := \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{h(X_i) \neq Y_i\}.$$
We introduce the Empirical Risk Minimization (ERM) algorithm $\hat h_n := \hat h_n(S, \mathcal{H})$:
$$L_n(\hat h_n) = \inf_{g \in \mathcal{H}} L_n(g).$$

We will require the following concentration inequality, introduced in the second lecture:

Theorem 1 (Hoeffding's inequality). Let $\xi_1, \dots, \xi_n$ be independent random variables such that $\xi_i \in [a_i, b_i]$, $a_i, b_i \in \mathbb{R}$, for $i = 1, \dots, n$ with probability one. Denote $Z_n := \sum_{i=1}^n \xi_i$. Then for any $\varepsilon > 0$ it holds that:
$$P\{Z_n - \mathbb{E}[Z_n] \geq \varepsilon\} \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
The same inequality holds for $P\{\mathbb{E}[Z_n] - Z_n \geq \varepsilon\}$. Moreover,
$$P\{|Z_n - \mathbb{E}[Z_n]| \geq \varepsilon\} \leq 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$

Show that the third inequality of Theorem 1 follows simply from the first two. The union bound is our favourite trick!
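Hoeffding's inequality can be sanity-checked numerically. The sketch below assumes a toy model in which the 0-1 loss of a fixed classifier $h$ is a Bernoulli($p$) variable, so $L(h) = p$ (the function names are illustrative, not from the lecture). Taking $\xi_i := \frac1n \mathbb{1}\{h(X_i)\neq Y_i\} \in [0, \frac1n]$ gives $\sum_i (b_i - a_i)^2 = \frac1n$, so the two-sided bound reads $P\{|L_n(h) - L(h)| \geq \varepsilon\} \leq 2e^{-2n\varepsilon^2}$, which is compared to a Monte Carlo estimate.

```python
import math
import random


def hoeffding_two_sided(eps: float, n: int) -> float:
    """Two-sided Hoeffding bound for L_n(h) = (1/n) * sum of n losses in {0, 1}.

    With xi_i = loss_i / n in [0, 1/n], sum_i (b_i - a_i)^2 = 1/n, hence
    P(|L_n(h) - L(h)| >= eps) <= 2 * exp(-2 * n * eps^2).
    """
    return 2.0 * math.exp(-2.0 * n * eps ** 2)


def empirical_deviation_prob(p: float, n: int, eps: float,
                             trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|L_n(h) - L(h)| >= eps), assuming each loss
    1{h(X_i) != Y_i} is an independent Bernoulli(p) variable, so L(h) = p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        errors = sum(rng.random() < p for _ in range(n))  # n i.i.d. 0-1 losses
        if abs(errors / n - p) >= eps:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p, n, eps = 0.3, 100, 0.1
    print("Monte Carlo estimate:", empirical_deviation_prob(p, n, eps))
    print("Hoeffding bound:     ", hoeffding_two_sided(eps, n))
```

The bound (about 0.27 for these parameters) is typically much larger than the simulated frequency: Hoeffding's inequality only uses boundedness of the summands, not their variance.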
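2 Agnostic bound for finite classes

Let us briefly recall the agnostic excess risk bound for finite classes, introduced in the second lecture. We will provide a slightly modified proof, leading to minor changes in the constant factors:

Theorem 2. Assume $\mathcal{H} = \{h_1, \dots, h_N\}$. Then for any $\delta > 0$, with probability larger than $1 - \delta$ the following holds:
$$L(\hat h_n) \leq \min_{i=1,\dots,N} L(h_i) + \sqrt{\frac{2\left(\log N + \log\frac{2}{\delta}\right)}{n}}. \qquad \text{(Th2)}$$

Proof. For our further discussion it will be useful to recall the idea behind the proof. Assuming $h^*$ is the minimizer of the expected risk over $\mathcal{H}$, we may write:
$$\begin{aligned} L(\hat h_n) - L(h^*) &= L(\hat h_n) - L_n(\hat h_n) + L_n(\hat h_n) - L_n(h^*) + L_n(h^*) - L(h^*) \\ &\leq L(\hat h_n) - L_n(\hat h_n) + L_n(h^*) - L(h^*) \qquad (*) \\ &\leq \sup_{h\in\mathcal{H}} \big(L(h) - L_n(h)\big) + \sup_{h\in\mathcal{H}} \big(L_n(h) - L(h)\big) \\ &\leq 2\sup_{h\in\mathcal{H}} |L(h) - L_n(h)|, \qquad (1)\end{aligned}$$
where the first inequality uses $L_n(\hat h_n) \leq L_n(h^*)$, which holds by the definition of ERM. Next we write:
$$P\Big\{\sup_{h\in\mathcal{H}} |L(h) - L_n(h)| \geq \varepsilon\Big\} = P\Big\{\bigcup_{i=1}^N \big\{|L(h_i) - L_n(h_i)| \geq \varepsilon\big\}\Big\} \qquad (2)$$
$$\leq \sum_{i=1}^N P\big\{|L(h_i) - L_n(h_i)| \geq \varepsilon\big\}, \qquad (3)$$
where we used the union bound in the last line. We may now apply Hoeffding's inequality of Theorem 1 with $\xi_i := \frac1n \mathbb{1}\{h(X_i) \neq Y_i\} \in [0, \frac1n]$, so that $\sum_{i=1}^n (b_i - a_i)^2 = \frac1n$, and get:
$$P\Big\{\sup_{h\in\mathcal{H}} |L(h) - L_n(h)| \geq \varepsilon\Big\} \leq N \cdot 2e^{-2\varepsilon^2/(1/n)} = 2N e^{-2n\varepsilon^2}.$$
We want the right-hand side of the previous inequality to be smaller than $\delta$. In other words, we want to find $\varepsilon$ such that $\delta = 2N e^{-2n\varepsilon^2}$. Solving the equation for $\varepsilon$ we get:
$$\varepsilon = \sqrt{\frac{\log(2N/\delta)}{2n}}.$$
Note that for this choice of $\varepsilon$ we have
$$P\Big\{\sup_{h\in\mathcal{H}} |L(h) - L_n(h)| \geq \varepsilon\Big\} \leq \delta, \quad\text{or equivalently}\quad P\Big\{\sup_{h\in\mathcal{H}} |L(h) - L_n(h)| < \varepsilon\Big\} \geq 1 - \delta.$$
In other words, with probability larger than $1 - \delta$ we have $\sup_{h\in\mathcal{H}} |L(h) - L_n(h)| \leq \sqrt{\log(2N/\delta)/(2n)}$. Inserting this bound back into (1) we conclude the proof, since $2\sqrt{\log(2N/\delta)/(2n)} = \sqrt{2(\log N + \log\frac2\delta)/n}$.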
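The constants in Theorem 2 can be checked with a few lines of arithmetic. In this sketch (function names are illustrative), `deviation_eps` solves $\delta = 2Ne^{-2n\varepsilon^2}$ for $\varepsilon$, and `finite_class_excess_bound` evaluates the excess-risk term $\sqrt{2(\log N + \log\frac2\delta)/n}$ of Theorem 2; as in the proof, the latter should be exactly twice the former.

```python
import math


def deviation_eps(N: int, n: int, delta: float) -> float:
    """Solve delta = 2 * N * exp(-2 * n * eps^2) for eps (uniform deviation level)."""
    return math.sqrt(math.log(2.0 * N / delta) / (2.0 * n))


def finite_class_excess_bound(N: int, n: int, delta: float) -> float:
    """Excess-risk term of Theorem 2: sqrt(2 * (log N + log(2 / delta)) / n)."""
    return math.sqrt(2.0 * (math.log(N) + math.log(2.0 / delta)) / n)


if __name__ == "__main__":
    N, n, delta = 1000, 10_000, 0.05
    eps = deviation_eps(N, n, delta)
    # Union bound + Hoeffding tail evaluated at this eps gives back delta.
    print(2 * N * math.exp(-2 * n * eps ** 2))
    # The excess-risk bound equals twice the uniform deviation level.
    print(finite_class_excess_bound(N, n, delta), 2 * eps)
```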
Try to slightly improve this result. You may replace $\sqrt{2(\log N + \log\frac{2}{\delta})/n}$ in the upper bound with
$$\sqrt{\frac{\log N + \log\frac{2}{\delta}}{2n}} + \sqrt{\frac{\log\frac{2}{\delta}}{2n}}.$$
For this, get back to (*) and do something smarter. Notice that $h^*$ does not depend on $S$, so why upper bound the last two terms with a supremum?

3 One step further: infinite classes $\mathcal{H}$, VC-bound

The main goal of this lecture is to drop the assumption of Theorem 2 that the class $\mathcal{H}$ is finite. Now we assume that $\mathcal{H}$ may be infinite. Actually, there can be uncountably many classifiers in $\mathcal{H}$ (just think about linear classifiers in $\mathbb{R}^d$, or simply about thresholds in one dimension).

3.1 Spoiler

Before even introducing all the necessary definitions, let us start with the statement of the theorem which we are going to prove.

Theorem 3 (VC-bound). For any $\delta > 0$, with probability larger than $1 - \delta$ it holds that:
$$L(\hat h_n) \leq \inf_{g\in\mathcal{H}} L(g) + 4\sqrt{\frac{2\left(\log \mathcal{S}_{\mathcal{H}}(2n) + \log\frac{4}{\delta}\right)}{n}}.$$

Compare this bound to (Th2). It looks almost the same, but $N$ is replaced with $\mathcal{S}_{\mathcal{H}}(2n)$, a quantity known as the growth function, which will be introduced later in the proof. For now it is instructive to note the similarity between these two results: perhaps it means that we can proceed with the same (or almost the same) proof where, magically, the $N$ events appearing on lines (2)→(3) will eventually be replaced with $\mathcal{S}_{\mathcal{H}}(2n)$ events? It turns out that this is indeed the case! In the following we present the proof of Theorem 3.

3.2 Debugging the proof

Can we still repeat the proof of Theorem 2? Let's assume for now that there is $h^* \in \mathcal{H}$ such that $L(h^*) = \inf_{g\in\mathcal{H}} L(g)$ (show that in general this is not true). It turns out that we can still repeat the first steps, but we can no longer apply the union bound. Indeed, the union bound $P(\bigcup_i A_i) \leq \sum_i P(A_i)$ holds at most for countable sets of events $A_i$. In our case, as we already mentioned, we may end up with uncountably many events. In summary, we cannot apply step (2)→(3) any more.

Let's try to find a workaround. What is actually causing the problem? Note that $L_n(h)$ in lines (1) and (2) still takes only finitely many values as $h$ runs through $\mathcal{H}$ (prove this yourself!).
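The claim that $L_n(h)$ takes only finitely many values, while $L(h)$ need not, is easy to observe in a toy example. The sketch below assumes $X \sim \mathrm{Uniform}[0,1]$, $Y = \mathbb{1}\{X \geq 1/2\}$ and the threshold class $h_t(x) = \mathbb{1}\{x \geq t\}$, $t \in [0,1]$, for which $L(h_t) = |t - 1/2|$; the function name and parameters are illustrative. On a grid of thresholds, $L(h_t)$ keeps taking new values as the grid is refined, while $L_n(h_t)$ never takes more than $n + 1$ values.

```python
import random


def count_distinct_risks(n: int = 20, grid: int = 1001, seed: int = 0):
    """Return (#distinct L(h_t), #distinct L_n(h_t)) over a grid of thresholds t.

    Toy model (an assumption for illustration): X ~ Uniform[0, 1],
    Y = 1{X >= 1/2}, h_t(x) = 1{x >= t}, so that L(h_t) = |t - 1/2|.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [int(x >= 0.5) for x in xs]
    true_risks, empirical_risks = set(), set()
    for k in range(grid):
        t = k / (grid - 1)
        true_risks.add(abs(t - 0.5))
        # L_n(h_t) only depends on which sample points lie above t
        errors = sum(int(x >= t) != y for x, y in zip(xs, ys))
        empirical_risks.add(errors / n)
    return len(true_risks), len(empirical_risks)


if __name__ == "__main__":
    n_true, n_emp = count_distinct_risks()
    print(n_true, "distinct values of L(h_t);", n_emp, "distinct values of L_n(h_t)")
```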
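If we had only $L_n(h)$ appearing inside the probability sign in (2), we could still enumerate all the different values of $L_n(h)$, get back to finitely many events, and proceed with all the previous steps. The real problem is the $L(h)$ term, which also appears in the events of (2). In principle, $L(h)$ can take any value between 0 and 1 as $h$ ranges over $\mathcal{H}$ (construct such an example yourself!). This is the reason we may end up with uncountably many events. Fortunately, the following nontrivial inequality helps us to get rid of the adversarial $L(h)$ term:

Lemma 4 (Symmetrization inequality). Assume $S' := \{(X'_i, Y'_i)\}_{i=1}^n$ is an independent copy of $S$, that is, $S \cup S'$ forms a sequence of $2n$ i.i.d. input-output pairs distributed according to $P$. Denote
$$L'_n(h) := \frac1n \sum_{i=1}^n \mathbb{1}\{h(X'_i) \neq Y'_i\}.$$
Then for any $\varepsilon > 0$ such that $n\varepsilon^2 \geq 2$ it holds that:
$$P_S\Big\{\sup_{h\in\mathcal{H}} \big(L(h) - L_n(h)\big) \geq \varepsilon\Big\} \leq 2\, P_{S,S'}\Big\{\sup_{h\in\mathcal{H}} \big(L'_n(h) - L_n(h)\big) \geq \varepsilon/2\Big\}.$$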
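The inequality also holds with $\sup_h \big(L_n(h) - L(h)\big)$ on the left-hand side and $\sup_h \big(L_n(h) - L'_n(h)\big)$ on the right-hand side.

3.3 Modifying the proof: getting rid of $L(h)$

Now, let us return to the beginning and try to apply this result:
$$\begin{aligned} L(\hat h_n) - L(h^*) &= L(\hat h_n) - L_n(\hat h_n) + L_n(\hat h_n) - L_n(h^*) + L_n(h^*) - L(h^*)\\ &\leq L(\hat h_n) - L_n(\hat h_n) + L_n(h^*) - L(h^*)\\ &\leq \sup_{h\in\mathcal{H}}\big(L(h) - L_n(h)\big) + \sup_{h\in\mathcal{H}}\big(L_n(h) - L(h)\big).\end{aligned}$$
As we already know, if for two events $A$ and $B$ it holds that $A \subseteq B$, then necessarily $P(A) \leq P(B)$. This gives us
$$P\{L(\hat h_n) - L(h^*) \geq \varepsilon\} \leq P\Big\{\sup_{h}\big(L(h) - L_n(h)\big) + \sup_{h}\big(L_n(h) - L(h)\big) \geq \varepsilon\Big\}. \qquad (4)$$
Also note that, for the same reason, for any random variables $a$ and $b$ we have
$$P\{a + b \geq \varepsilon\} \leq P\big\{\{a \geq \varepsilon/2\} \cup \{b \geq \varepsilon/2\}\big\} \leq P\{a \geq \varepsilon/2\} + P\{b \geq \varepsilon/2\}.$$
Applying this to (4) and using Lemma 4 at level $\varepsilon/2$ for both terms (which requires $n\varepsilon^2 \geq 8$), we get:
$$\begin{aligned} P\{L(\hat h_n) - L(h^*) \geq \varepsilon\} &\leq P\Big\{\sup_{h}\big(L(h) - L_n(h)\big) \geq \varepsilon/2\Big\} + P\Big\{\sup_{h}\big(L_n(h) - L(h)\big) \geq \varepsilon/2\Big\}\\ &\leq 4\, P_{S,S'}\Big\{\sup_{h}\big(L'_n(h) - L_n(h)\big) \geq \varepsilon/4\Big\}, \qquad (5)\end{aligned}$$
where in the last step we also used that $S$ and $S'$ are exchangeable, so $\sup_h \big(L_n(h) - L'_n(h)\big)$ and $\sup_h \big(L'_n(h) - L_n(h)\big)$ have the same distribution.

At this point note that, no matter what $h$ is, $L'_n(h) - L_n(h)$ can take only finitely many values (prove this yourself!). The value of $L'_n(h) - L_n(h)$ depends only on the projection of $\mathcal{H}$ on the double sample $S \cup S'$, where for any sample $S_m := \{(X_j, Y_j)\}_{j=1}^m$ we define a projection in the following way:
$$\mathcal{H}_{S_m} := \Big\{\big(\mathbb{1}\{h(X_1) \neq Y_1\}, \dots, \mathbb{1}\{h(X_m) \neq Y_m\}\big) : h \in \mathcal{H}\Big\} \subseteq \{0,1\}^m.$$
Note that $\mathcal{H}_{S\cup S'}$ is a subset of $\{0,1\}^{2n}$, and thus its cardinality $\mathrm{card}(\mathcal{H}_{S\cup S'})$ is upper bounded by $2^{2n}$. We may write
$$P\{L(\hat h_n) - L(h^*) \geq \varepsilon\} \leq 4\, P_{S,S'}\Big\{\max_{v \in \mathcal{H}_{S\cup S'}} \big(L'_n(v) - L_n(v)\big) \geq \varepsilon/4\Big\},$$
where we have overloaded the notations in a natural way: $L_n(v) := \frac1n\sum_{i=1}^n v_i$ and $L'_n(v) := \frac1n\sum_{i=1}^n v_{n+i}$ (the first $n$ coordinates of $v$ correspond to $S$, the last $n$ to $S'$). All in all, it seems that we may now proceed with the original (2)→(3) steps to bound the right-hand side of the previous inequality, since the maximum is now over a finite set. This is indeed what we did during the lecture, but the thing is, this step is not quite correct. Notice that the union bound assumes that the events $A_i$ are fixed. In our case there are finitely many events $A_v := \{L'_n(v) - L_n(v) \geq \varepsilon/4\}$ indexed by $v$, but they all depend on the random samples $S$ and $S'$ (and so does the index set $\mathcal{H}_{S\cup S'}$), so the union bound (at least in its usual form) cannot be applied.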
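As a concrete illustration of the projection $\mathcal{H}_{S_m}$, consider the threshold class $h_t(x) = \mathbb{1}\{x \geq t\}$ on the real line (an example chosen here for illustration, not from the lecture). Only thresholds placed at sample points, plus one threshold above all of them, need to be tried, so the projection has at most $m + 1$ elements, far fewer than the trivial $2^m$. The sketch computes it for a random double sample of size $2n$ with pure-noise labels; the helper name is hypothetical.

```python
import random


def threshold_projection(xs, ys):
    """Projection H_S of thresholds h_t(x) = 1{x >= t} on the sample (xs, ys).

    Returns the set of loss vectors (1{h_t(x_1) != y_1}, ..., 1{h_t(x_m) != y_m}).
    Every threshold t induces the same predictions as one of the candidates
    below (the smallest sample point >= t, or +inf if there is none), so
    enumerating the candidates yields the whole projection.
    """
    candidates = sorted(set(xs)) + [float("inf")]
    return {
        tuple(int((x >= t) != bool(y)) for x, y in zip(xs, ys))
        for t in candidates
    }


if __name__ == "__main__":
    rng = random.Random(1)
    n = 10
    # Double sample S u S' of size 2n with labels that ignore x (pure noise).
    xs = [rng.random() for _ in range(2 * n)]
    ys = [rng.random() < 0.5 for _ in range(2 * n)]
    proj = threshold_projection(xs, ys)
    print(len(proj), "loss vectors out of", 2 ** (2 * n), "possible")
```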
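3.4 Another neat trick: Rademacher symmetrization

Instead, we will proceed with a trick commonly known as Rademacher symmetrization. The next lines are taken from Section 4 of Devroye et al. (1996). Introduce random variables $\sigma_1, \dots, \sigma_n$ which are all independent (also independent of $S$ and $S'$) and take values $-1$ and $+1$ with probability 0.5 each. Rewrite (5) in the following way:
$$P\{L(\hat h_n) - L(h^*) \geq \varepsilon\} \leq 4\, P_{S,S'}\Big\{\sup_{h\in\mathcal{H}} \frac1n \sum_{i=1}^n \big(\mathbb{1}\{h(X'_i) \neq Y'_i\} - \mathbb{1}\{h(X_i) \neq Y_i\}\big) \geq \varepsilon/4\Big\}$$
and notice that the distribution of
$$\sup_{h\in\mathcal{H}} \frac1n\sum_{i=1}^n \big(\mathbb{1}\{h(X'_i)\neq Y'_i\} - \mathbb{1}\{h(X_i)\neq Y_i\}\big)$$
is the same as the distribution of
$$\sup_{h\in\mathcal{H}} \frac1n\sum_{i=1}^n \sigma_i\big(\mathbb{1}\{h(X'_i)\neq Y'_i\} - \mathbb{1}\{h(X_i)\neq Y_i\}\big)$$
(prove this yourself!). We may thus write
$$P\{L(\hat h_n) - L(h^*) \geq \varepsilon\} \leq 4\, P_{\sigma,S,S'}\Big\{\sup_{h\in\mathcal{H}} \frac1n\sum_{i=1}^n \sigma_i\big(\mathbb{1}\{h(X'_i)\neq Y'_i\} - \mathbb{1}\{h(X_i)\neq Y_i\}\big) \geq \varepsilon/4\Big\}.$$
Next we use the tower rule of expectation, which can be written for any event $A$ and any random variable $Z$ as $P(A) = \mathbb{E}_Z[P(A \mid Z)]$. This gives us
$$P\{L(\hat h_n) - L(h^*) \geq \varepsilon\} \leq 4\, \mathbb{E}_{S,S'}\Big[P_\sigma\Big\{\sup_{h\in\mathcal{H}} \frac1n\sum_{i=1}^n \sigma_i\big(\mathbb{1}\{h(X'_i)\neq Y'_i\} - \mathbb{1}\{h(X_i)\neq Y_i\}\big) \geq \frac\varepsilon4 \,\Big|\, S, S'\Big\}\Big].$$
It is left to bound the conditional probability appearing inside the expected value. Using our definition of the projection, we may rewrite
$$P_\sigma\Big\{\sup_{h\in\mathcal{H}} \frac1n \sum_{i=1}^n \sigma_i\big(\mathbb{1}\{h(X'_i)\neq Y'_i\} - \mathbb{1}\{h(X_i)\neq Y_i\}\big) \geq \frac\varepsilon4 \,\Big|\, S, S'\Big\} = P_\sigma\Big\{\max_{v\in\mathcal{H}_{S\cup S'}} \frac1n\sum_{i=1}^n \sigma_i(v'_i - v_i) \geq \frac\varepsilon4 \,\Big|\, S, S'\Big\},$$
where we once again (perhaps confusingly) used $v_i$ and $v'_i$ to denote the indicators $\mathbb{1}\{h_v(X_i) \neq Y_i\}$ and $\mathbb{1}\{h_v(X'_i) \neq Y'_i\}$, where $h_v \in \mathcal{H}$ is any classifier whose projection equals $v$. Notice that, because we conditioned on $S$ and $S'$, these samples are now fixed, and thus the projection $\mathcal{H}_{S\cup S'}$ is no longer random, but just some fixed subset of $\{0,1\}^{2n}$. We may now safely use our initial (2)→(3) trick (the union bound) and write
$$P_\sigma\Big\{\sup_{h\in\mathcal{H}} \frac1n\sum_{i=1}^n \sigma_i\big(\mathbb{1}\{h(X'_i)\neq Y'_i\} - \mathbb{1}\{h(X_i)\neq Y_i\}\big) \geq \frac\varepsilon4 \,\Big|\, S, S'\Big\} \leq \sum_{v\in\mathcal{H}_{S\cup S'}} P_\sigma\Big\{\frac1n\sum_{i=1}^n \sigma_i(v'_i - v_i) \geq \frac\varepsilon4 \,\Big|\, S, S'\Big\}.$$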
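Conditionally on $S$ and $S'$, each term of the last sum is a probability over the signs $\sigma$ only, and for a fixed $v$ it can even be computed exactly. If $k$ of the differences $d_i := v'_i - v_i \in \{-1, 0, 1\}$ are nonzero, then $\sum_i \sigma_i d_i$ has the law of $2B - k$ with $B \sim \mathrm{Binomial}(k, 1/2)$. The sketch below (illustrative names) compares this exact tail at level $\tau = \varepsilon/4$ with the Hoeffding bound $\exp(-2(n\tau)^2/(4n)) = \exp(-n\tau^2/2)$, which treats every term as ranging over $[-1, 1]$; with $\tau = \varepsilon/4$ this is $e^{-n\varepsilon^2/32}$.

```python
import math


def exact_sign_tail(k: int, n: int, tau: float) -> float:
    """Exact P_sigma{(1/n) * sum_i sigma_i * d_i >= tau} when exactly k of the
    n differences d_i = v'_i - v_i are nonzero (i.e. equal to +1 or -1).

    For d_i != 0, sigma_i * d_i is uniform on {-1, +1}, independently over i,
    so the sum equals 2B - k with B ~ Binomial(k, 1/2); zeros contribute nothing.
    """
    favourable = sum(math.comb(k, b) for b in range(k + 1) if 2 * b - k >= n * tau)
    return favourable / 2 ** k


def hoeffding_sign_bound(n: int, tau: float) -> float:
    """Hoeffding with every term in [-1, 1]: exp(-2 (n tau)^2 / (4 n)) = exp(-n tau^2 / 2)."""
    return math.exp(-n * tau ** 2 / 2)


if __name__ == "__main__":
    n, eps = 400, 0.8
    tau = eps / 4
    worst = max(exact_sign_tail(k, n, tau) for k in range(n + 1))
    print("largest exact tail over k:", worst)
    print("Hoeffding bound:          ", hoeffding_sign_bound(n, tau))
```

Multiplying the per-$v$ bound by $\mathrm{card}(\mathcal{H}_{S\cup S'})$ is exactly the union-bound step above.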
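Individual probabilities may again be bounded using Hoeffding's inequality (prove it yourself!). Hint: given $S$ and $S'$, each term $\sigma_i(v'_i - v_i)$ has zero mean and lies in $[-1, 1]$, which gives
$$P_\sigma\Big\{\frac1n\sum_{i=1}^n \sigma_i(v'_i - v_i) \geq \frac\varepsilon4 \,\Big|\, S, S'\Big\} \leq \exp\left(-\frac{2(n\varepsilon/4)^2}{4n}\right) = e^{-n\varepsilon^2/32}.$$

3.5 VC combinatorics

Putting all the bits together, we finally get:
$$P\{L(\hat h_n) - L(h^*) \geq \varepsilon\} \leq 4 e^{-n\varepsilon^2/32}\, \mathbb{E}_{S,S'}\big[\mathrm{card}(\mathcal{H}_{S\cup S'})\big].$$
Again, making the upper bound equal to $\delta$ and solving for $\varepsilon$, we get that for any $\delta > 0$, with probability larger than $1 - \delta$ it holds that:
$$L(\hat h_n) \leq \inf_{g\in\mathcal{H}} L(g) + 4\sqrt{\frac{2\left(\log \mathcal{E}_{\mathcal{H}}(2n) + \log\frac4\delta\right)}{n}},$$
where we denoted $\mathcal{E}_{\mathcal{H}}(2n) := \mathbb{E}_{S_{2n}}[\mathrm{card}(\mathcal{H}_{S_{2n}})]$, the expectation being over an i.i.d. sample $S_{2n}$ of size $2n$ from $P$. For $\delta < 1$ this choice of $\varepsilon$ satisfies $n\varepsilon^2 = 32(\log \mathcal{E}_{\mathcal{H}}(2n) + \log\frac4\delta) \geq 32\log 4 > 8$, so Lemma 4 was indeed applicable; for $\delta \geq 1$ there is nothing to prove. The quantity $\mathcal{E}_{\mathcal{H}}(2n)$ is known as the VC entropy. Obviously, the VC entropy can be upper bounded in the following (perhaps extremely crude) way:
$$\mathcal{E}_{\mathcal{H}}(2n) \leq \mathcal{S}_{\mathcal{H}}(2n), \qquad \text{where} \quad \mathcal{S}_{\mathcal{H}}(m) := \max_{S:\,\mathrm{card}(S) = m} \mathrm{card}(\mathcal{H}_S).$$
All we did was replace the average (expectation) with the maximum value. The quantity $\mathcal{S}_{\mathcal{H}}(n)$ is commonly known as the growth function. We showed that with probability larger than $1 - \delta$ it also holds that:
$$L(\hat h_n) \leq \inf_{g\in\mathcal{H}} L(g) + 4\sqrt{\frac{2\left(\log \mathcal{S}_{\mathcal{H}}(2n) + \log\frac4\delta\right)}{n}}.$$
This concludes the proof of Theorem 3.

But are we satisfied with this result? The good thing about Theorem 2 is that, as the sample size $n$ grows to infinity, the last term on the right-hand side of (Th2) decreases to zero, showing that the performance of ERM approaches the best possible one in $\mathcal{H}$. Does Theorem 3 have the same behaviour? Of course, the answer depends on the growth function $\mathcal{S}_{\mathcal{H}}(2n)$, which is defined purely by the geometry of $\mathcal{H}$. As we already mentioned, the trivial upper bound gives $\mathcal{S}_{\mathcal{H}}(2n) \leq 2^{2n}$. However, if we insert it into the VC-bound we end up with $4\sqrt{2(2n\log 2 + \log\frac4\delta)/n} \geq 4\sqrt{4\log 2} > 1$, which does not tend to zero (and is in fact vacuous, since risks never exceed 1). An important question is: what should $\mathcal{H}$ look like so that $\log \mathcal{S}_{\mathcal{H}}(2n)/n \to 0$ as $n \to \infty$?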
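The following sketch (hypothetical helper name) evaluates the final bound, i.e. the $\varepsilon$ solving $4\,\mathcal{S}_{\mathcal{H}}(2n)\,e^{-n\varepsilon^2/32} = \delta$, namely $4\sqrt{2(\log\mathcal{S}_{\mathcal{H}}(2n) + \log\frac4\delta)/n}$. These constants come from applying Lemma 4 at level $\varepsilon/2$, which leaves the threshold $\varepsilon/4$ for Hoeffding's inequality. The code contrasts the trivial growth $2^{2n}$ with the threshold class $h_t(x) = \mathbb{1}\{x \geq t\}$ on the real line, whose growth function is $\mathcal{S}_{\mathcal{H}}(m) = m + 1$ (an example assumed here, not taken from the lecture).

```python
import math


def vc_excess_bound(log_growth_2n: float, n: int, delta: float) -> float:
    """eps solving 4 * S_H(2n) * exp(-n * eps^2 / 32) = delta, i.e.
    4 * sqrt(2 * (log S_H(2n) + log(4 / delta)) / n)."""
    return 4.0 * math.sqrt(2.0 * (log_growth_2n + math.log(4.0 / delta)) / n)


if __name__ == "__main__":
    delta = 0.05
    for n in (1_000, 10_000, 100_000):
        thresholds = vc_excess_bound(math.log(2 * n + 1), n, delta)  # S_H(2n) = 2n + 1
        trivial = vc_excess_bound(2 * n * math.log(2), n, delta)     # S_H(2n) <= 2^(2n)
        print(f"n={n:>7}: thresholds {thresholds:.3f}, trivial {trivial:.3f}")
```

The threshold column shrinks roughly like $\sqrt{\log n / n}$, while the trivial column stays above 1 for every $n$.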
The answer to this question is hidden in the following definition:

Definition 5 (VC dimension). The VC dimension of the class $\mathcal{H}$ is the largest $n$ such that $\mathcal{S}_{\mathcal{H}}(n) = 2^n$. If there is no such $n$, we say that $\mathcal{H}$ has infinite VC dimension.

The following fact establishes the polynomial growth of $\mathcal{S}_{\mathcal{H}}(n)$ for classes $\mathcal{H}$ of finite VC dimension.¹

¹ There is a curious history behind this lemma. It was (apparently) proved simultaneously by several groups around the late 60s and early 70s, including Vapnik and Chervonenkis, Sauer, and Shelah and Perles. A wonderful overview of this fact can be found in Léon Bottou's slides, available online here: …papers/vapnik-symposium-2011.pdf
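Definition 5 can be explored by brute force on small examples. The sketch below uses the class of intervals $h_{a,b}(x) = \mathbb{1}\{a \leq x \leq b\}$ on the real line (an illustrative choice, not from the lecture). Because XOR-ing with fixed labels is a bijection, $\mathrm{card}(\mathcal{H}_S)$ equals the number of distinct labelings of the points, and for intervals only the order of distinct points matters, so checking the points $0, 1, \dots, m-1$ suffices. The count comes out as $m(m+1)/2 + 1$, which equals $2^m$ only for $m \leq 2$: the VC dimension is 2 and the growth is polynomial in $m$.

```python
def interval_labelings(points):
    """Distinct labelings of `points` by x -> 1{a <= x <= b}, including the
    empty interval (which labels every point 0).

    An interval labels a contiguous block of the sorted points, so it suffices
    to try endpoints a <= b taken from the points themselves.
    """
    pts = sorted(points)
    labelings = {tuple(0 for _ in points)}
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            a, b = pts[i], pts[j]
            labelings.add(tuple(int(a <= x <= b) for x in points))
    return labelings


def vc_dimension_of_intervals(max_m: int = 8) -> int:
    """Largest m <= max_m such that m distinct points are shattered by intervals."""
    d = 0
    for m in range(1, max_m + 1):
        # For intervals, any m distinct points behave like 0, 1, ..., m - 1.
        if len(interval_labelings(range(m))) == 2 ** m:
            d = m
        else:
            break  # a superset of a non-shattered set is never shattered
    return d


if __name__ == "__main__":
    for m in range(1, 7):
        print(m, len(interval_labelings(range(m))), 2 ** m)
    print("VC dimension of intervals:", vc_dimension_of_intervals())
```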
Lemma 6 (Vapnik, Chervonenkis, Sauer, Shelah). Let $\mathcal{H}$ be a class of VC dimension $d < \infty$. Then for all $n$ it holds that
$$\mathcal{S}_{\mathcal{H}}(n) \leq \sum_{i=0}^{d}\binom{n}{i},$$
and for all $n \geq d$ it holds that
$$\mathcal{S}_{\mathcal{H}}(n) \leq \left(\frac{en}{d}\right)^d.$$

We may finally state the following bound, which behaves exactly like the one of the original Theorem 2:

Theorem 7 (VC-bound). Assume $\mathcal{H}$ has VC dimension $d < \infty$. For any $\delta > 0$, with probability larger than $1 - \delta$ it holds that:
$$L(\hat h_n) \leq \inf_{g\in\mathcal{H}} L(g) + 4\sqrt{\frac{2\left(d\log\frac{2en}{d} + \log\frac4\delta\right)}{n}}.$$

References

Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi. Introduction to statistical learning theory. Lecture Notes in Artificial Intelligence, 2004. URL …/user_upload/files/publications/pdfs/pdf2819.pdf.

Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
More informationIntroduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT
Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory
More informationFeedback in Iterative Algorithms
Feedback i Iterative Algorithms Charles Byre (Charles Byre@uml.edu), Departmet of Mathematical Scieces, Uiversity of Massachusetts Lowell, Lowell, MA 01854 October 17, 2005 Abstract Whe the oegative system
More informationIf a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?
2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationDiscrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions
CS 70 Discrete Mathematics for CS Sprig 2005 Clacy/Wager Notes 21 Some Importat Distributios Questio: A biased coi with Heads probability p is tossed repeatedly util the first Head appears. What is the
More information32 estimating the cumulative distribution function
32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio
More informationLecture Chapter 6: Convergence of Random Sequences
ECE5: Aalysis of Radom Sigals Fall 6 Lecture Chapter 6: Covergece of Radom Sequeces Dr Salim El Rouayheb Scribe: Abhay Ashutosh Doel, Qibo Zhag, Peiwe Tia, Pegzhe Wag, Lu Liu Radom sequece Defiitio A ifiite
More informationSequences I. Chapter Introduction
Chapter 2 Sequeces I 2. Itroductio A sequece is a list of umbers i a defiite order so that we kow which umber is i the first place, which umber is i the secod place ad, for ay atural umber, we kow which
More informationDefinition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.
4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad
More informationDiscrete Mathematics and Probability Theory Fall 2016 Walrand Probability: An Overview
CS 70 Discrete Mathematics ad Probability Theory Fall 2016 Walrad Probability: A Overview Probability is a fasciatig theory. It provides a precise, clea, ad useful model of ucertaity. The successes of
More informationChapter 6 Infinite Series
Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More information