Machine Learning Theory (CS 6783)
- Julian Robbins
- 5 years ago
Transcription
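Lecture 3: Online Learning, minimax value, sequential Rademacher complexity

1 Recap: Minimax Theorem

We shall use the celebrated minimax theorem as a key tool to bound the minimax rate for online learning problems. Below we state a generalization of von Neumann's minimax theorem.

Theorem 1 (Borwein). Let $\mathcal{A}$ and $\mathcal{B}$ be Banach spaces. Let $A \subset \mathcal{A}$ be nonempty, weakly compact, and convex, and let $B \subset \mathcal{B}$ be nonempty and convex. Let $g : A \times B \to \mathbb{R}$ be concave with respect to $b \in B$, and convex and lower-semicontinuous with respect to $a \in A$ and weakly continuous in $a$ when restricted to $A$. Then

$$\sup_{b \in B} \inf_{a \in A} g(a, b) = \inf_{a \in A} \sup_{b \in B} g(a, b).$$

The above theorem states that under the right conditions, one can swap infimum and supremum. We shall use this in a sequential manner to swap the order of the learner and the adversary, and use this to get a handle on the minimax rate for online learning. For instance, using the above theorem we can show that for any loss $\ell$ that is lower semicontinuous in its first argument, as long as $\mathcal{Y}$ is well behaved (compact, for instance),

$$\inf_{q_t \in \Delta(\mathcal{Y})} \sup_{y_t \in \mathcal{Y}} \Big\{ \mathbb{E}_{\hat{y}_t \sim q_t}\big[\ell(\hat{y}_t, y_t)\big] + \Phi(y_t) \Big\} = \sup_{p_t \in \Delta(\mathcal{Y})} \inf_{\hat{y}_t \in \mathcal{Y}} \mathbb{E}_{y_t \sim p_t}\big[\ell(\hat{y}_t, y_t) + \Phi(y_t)\big],$$

where $\Phi$ is some arbitrary function.

2 Minimax Rate for Online Learning

Recall that the minimax rate for an online learning problem can be written as:

$$V^{sq}_n(\mathcal{F}) = \sup_{x_1 \in \mathcal{X}} \inf_{q_1 \in \Delta(\mathcal{Y})} \sup_{y_1 \in \mathcal{Y}} \mathbb{E}_{\hat{y}_1 \sim q_1} \cdots \sup_{x_n \in \mathcal{X}} \inf_{q_n \in \Delta(\mathcal{Y})} \sup_{y_n \in \mathcal{Y}} \mathbb{E}_{\hat{y}_n \sim q_n} \left[ \frac{1}{n}\sum_{t=1}^{n} \ell(\hat{y}_t, y_t) - \inf_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \ell(f(x_t), y_t) \right].$$

That is, in a sequential fashion, on each round the adversary picks the worst input instance $x_t \in \mathcal{X}$, the learner then picks the optimal $q_t \in \Delta(\mathcal{Y})$, the adversary then picks the worst outcome $y_t \in \mathcal{Y}$, and the learner draws the prediction $\hat{y}_t \sim q_t$. The learner aims to minimize regret and the adversary aims to maximize it.

We now introduce a shorthand notation. We shall use the notation $\langle\!\langle \mathrm{Operator}_t \rangle\!\rangle_{t=1}^{n}[\,\cdot\,]$ to refer to $\mathrm{Operator}_1[\mathrm{Operator}_2[\cdots \mathrm{Operator}_n[\,\cdot\,]\cdots]]$. Hence, for instance,

$$V^{sq}_n(\mathcal{F}) = \Big\langle\!\!\Big\langle \sup_{x_t \in \mathcal{X}} \inf_{q_t \in \Delta(\mathcal{Y})} \sup_{y_t \in \mathcal{Y}} \mathbb{E}_{\hat{y}_t \sim q_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \frac{1}{n}\sum_{t=1}^{n} \ell(\hat{y}_t, y_t) - \inf_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \ell(f(x_t), y_t) \right].$$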
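The single-round swap stated after Theorem 1 can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the outcome space is $\mathcal{Y} = \{0, 1\}$, the loss table `LOSS` is the zero-one loss, the values in `PHI` are arbitrary, and the infimum and supremum over distributions are approximated by a search over a finite grid. None of these choices come from the lecture itself.

```python
# Toy check of: inf_q sup_y { E_{yhat~q} loss(yhat, y) + phi(y) }
#             = sup_p inf_yhat E_{y~p} [ loss(yhat, y) + phi(y) ]
# Assumptions (illustrative only): Y = {0, 1}, zero-one loss, arbitrary phi,
# and a finite grid standing in for the set of distributions on Y.

LOSS = [[0.0, 1.0],   # LOSS[yhat][y]
        [1.0, 0.0]]
PHI = [0.3, -0.2]     # arbitrary function of the outcome y
GRID = [k / 1000 for k in range(1001)]  # probability assigned to the label 1


def randomized_learner_value(loss, phi, grid):
    """Learner commits to q first, adversary then picks the worst y."""
    best = float("inf")
    for q1 in grid:
        worst = max((1 - q1) * loss[0][y] + q1 * loss[1][y] + phi[y]
                    for y in (0, 1))
        best = min(best, worst)
    return best


def adversary_first_value(loss, phi, grid):
    """Adversary commits to p first, learner then best-responds with yhat."""
    best = float("-inf")
    for p1 in grid:
        response = min((1 - p1) * (loss[yhat][0] + phi[0])
                       + p1 * (loss[yhat][1] + phi[1])
                       for yhat in (0, 1))
        best = max(best, response)
    return best


def deterministic_learner_value(loss, phi):
    """Same game, but the learner may not randomize."""
    return min(max(loss[yhat][y] + phi[y] for y in (0, 1))
               for yhat in (0, 1))


if __name__ == "__main__":
    print(randomized_learner_value(LOSS, PHI, GRID))
    print(adversary_first_value(LOSS, PHI, GRID))
    print(deterministic_learner_value(LOSS, PHI))
```

On this toy instance the first two quantities agree, while the deterministic learner does strictly worse, which is why the learner's move is taken over $\Delta(\mathcal{Y})$ rather than over $\mathcal{Y}$.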
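We can also write the conditional value as

$$V(x_1, y_1, \ldots, x_t, y_t) = \Big\langle\!\!\Big\langle \sup_{x_j \in \mathcal{X}} \inf_{q_j \in \Delta(\mathcal{Y})} \sup_{y_j \in \mathcal{Y}} \mathbb{E}_{\hat{y}_j \sim q_j} \Big\rangle\!\!\Big\rangle_{j=t+1}^{n} \left[ \frac{1}{n}\sum_{j=t+1}^{n} \ell(\hat{y}_j, y_j) - \inf_{f \in \mathcal{F}} \frac{1}{n}\sum_{j=1}^{n} \ell(f(x_j), y_j) \right].$$

Claim 2.

$$V^{sq}_n(\mathcal{F}) = \Big\langle\!\!\Big\langle \sup_{x_t \in \mathcal{X}} \sup_{p_t \in \Delta(\mathcal{Y})} \mathbb{E}_{y_t \sim p_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \frac{1}{n}\sum_{t=1}^{n} \inf_{\hat{y}_t \in \mathcal{Y}} \mathbb{E}_{y'_t \sim p_t}\big[\ell(\hat{y}_t, y'_t)\big] - \inf_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \ell(f(x_t), y_t) \right].$$

Proof. We swap the order of the players one round at a time, starting from round $n$. The operators of rounds $1, \ldots, n-1$ and the term $\frac{1}{n}\sum_{t=1}^{n-1} \ell(\hat{y}_t, y_t)$ are not affected, so it suffices to rewrite the last-round quantity:

$$\begin{aligned}
&\sup_{x_n} \inf_{q_n \in \Delta(\mathcal{Y})} \sup_{y_n \in \mathcal{Y}} \Big\{ \tfrac{1}{n}\mathbb{E}_{\hat{y}_n \sim q_n} \ell(\hat{y}_n, y_n) - \inf_{f \in \mathcal{F}} \tfrac{1}{n}\textstyle\sum_{t=1}^{n} \ell(f(x_t), y_t) \Big\} \\
&\quad= \sup_{x_n} \inf_{q_n \in \Delta(\mathcal{Y})} \sup_{p_n \in \Delta(\mathcal{Y})} \underbrace{\mathbb{E}_{y_n \sim p_n} \Big\{ \tfrac{1}{n}\mathbb{E}_{\hat{y}_n \sim q_n} \ell(\hat{y}_n, y_n) - \inf_{f \in \mathcal{F}} \tfrac{1}{n}\textstyle\sum_{t=1}^{n} \ell(f(x_t), y_t) \Big\}}_{g(q_n,\, p_n)} \\
&\quad= \sup_{x_n} \sup_{p_n \in \Delta(\mathcal{Y})} \inf_{q_n \in \Delta(\mathcal{Y})} g(q_n, p_n) \\
&\quad= \sup_{x_n} \sup_{p_n \in \Delta(\mathcal{Y})} \inf_{\hat{y}_n \in \mathcal{Y}} \mathbb{E}_{y_n \sim p_n} \Big\{ \tfrac{1}{n}\ell(\hat{y}_n, y_n) - \inf_{f \in \mathcal{F}} \tfrac{1}{n}\textstyle\sum_{t=1}^{n} \ell(f(x_t), y_t) \Big\} \\
&\quad= \sup_{x_n} \sup_{p_n \in \Delta(\mathcal{Y})} \mathbb{E}_{y_n \sim p_n} \Big\{ \tfrac{1}{n}\inf_{\hat{y}_n \in \mathcal{Y}} \mathbb{E}_{y'_n \sim p_n} \ell(\hat{y}_n, y'_n) - \inf_{f \in \mathcal{F}} \tfrac{1}{n}\textstyle\sum_{t=1}^{n} \ell(f(x_t), y_t) \Big\}.
\end{aligned}$$

The first equality holds because a supremum over outcomes equals a supremum over distributions on outcomes. The second is the minimax theorem applied to $g$, which is linear in each argument. The third holds because the infimum of a linear function of $q_n$ is attained at a point mass. The fourth holds because the second term does not depend on $\hat{y}_n$. Repeating the same argument for rounds $n-1, \ldots, 1$, with the already rewritten later rounds playing the role of $\Phi$, we have the claim.

Notice that in the above claim we have distributions (possibly dependent) over instances, but we have essentially eliminated the role of the learner and moved to a completely stochastic object. From the above claim it is easy to show that the minimax rate is governed by a quantity measuring the rate of uniform convergence of the class $\mathcal{F}$ over martingale difference sequences.

Claim 3.

$$V^{sq}_n(\mathcal{F}) \le \sup_{P \in \Delta((\mathcal{X} \times \mathcal{Y})^n)} \mathbb{E}\left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \Big( \mathbb{E}_{t-1}\big[\ell(f(x_t), y_t)\big] - \ell(f(x_t), y_t) \Big) \right],$$

where $P$ is a joint distribution over the sequence of instances and $\mathbb{E}_{t-1}$ refers to the conditional expectation over the instance $(x_t, y_t)$ given the past instances $(x_1, y_1), \ldots, (x_{t-1}, y_{t-1})$.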
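Proof. Starting from Claim 2,

$$\begin{aligned}
V^{sq}_n(\mathcal{F})
&= \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{p_t \in \Delta(\mathcal{Y})} \mathbb{E}_{y_t \sim p_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \frac{1}{n}\sum_{t=1}^{n} \inf_{\hat{y}_t \in \mathcal{Y}} \mathbb{E}_{y'_t \sim p_t} \ell(\hat{y}_t, y'_t) - \inf_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \ell(f(x_t), y_t) \right] \\
&= \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{p_t \in \Delta(\mathcal{Y})} \mathbb{E}_{y_t \sim p_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{t=1}^{n} \inf_{\hat{y}_t \in \mathcal{Y}} \mathbb{E}_{y'_t \sim p_t} \ell(\hat{y}_t, y'_t) - \frac{1}{n}\sum_{t=1}^{n} \ell(f(x_t), y_t) \right\} \right] \\
&\le \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{p_t \in \Delta(\mathcal{Y})} \mathbb{E}_{y_t \sim p_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \Big( \mathbb{E}_{y'_t \sim p_t} \ell(f(x_t), y'_t) - \ell(f(x_t), y_t) \Big) \right] \\
&\le \sup_{P \in \Delta((\mathcal{X} \times \mathcal{Y})^n)} \mathbb{E}\left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \Big( \mathbb{E}_{t-1}\big[\ell(f(x_t), y_t)\big] - \ell(f(x_t), y_t) \Big) \right].
\end{aligned}$$

The first inequality holds because $f(x_t)$ is itself a feasible prediction, so the infimum over $\hat{y}_t$ is at most the value at $f(x_t)$. The second holds because every sequential choice of $x_t$ and $p_t$ given the past defines a joint distribution $P$.

3 Sequential Rademacher Complexity

In the statistical learning framework a key tool was symmetrization and the use of Rademacher complexity. With the use of Rademacher complexity we were able to move our focus from how the function class behaves on the entire space of instances to only how rich the class effectively is on samples of size $n$. The question now is whether there is an analogue of this for uniform convergence over martingales. Surprisingly, it turns out that there is, and we shall refer to this complexity as the sequential Rademacher complexity.

Claim 4.

$$V^{sq}_n(\mathcal{F}) \le 2\, \Big\langle\!\!\Big\langle \sup_{x_t \in \mathcal{X}} \sup_{y_t \in \mathcal{Y}} \mathbb{E}_{\epsilon_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y_t) \right],$$

where $\epsilon_1, \ldots, \epsilon_n$ are independent Rademacher random variables.

Proof. From the proof of Claim 3, and then by Jensen's inequality (moving each $\mathbb{E}_{y'_t \sim p_t}$ outside the supremum over $f$),

$$\begin{aligned}
V^{sq}_n(\mathcal{F})
&\le \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{p_t \in \Delta(\mathcal{Y})} \mathbb{E}_{y_t \sim p_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \Big( \mathbb{E}_{y'_t \sim p_t} \ell(f(x_t), y'_t) - \ell(f(x_t), y_t) \Big) \right] \\
&\le \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{p_t \in \Delta(\mathcal{Y})} \mathbb{E}_{y_t, y'_t \sim p_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \Big( \ell(f(x_t), y'_t) - \ell(f(x_t), y_t) \Big) \right].
\end{aligned}$$

Now consider the last round. Since $y_n$ and $y'_n$ are drawn independently from the same $p_n$, exchanging them does not change the value, so we may introduce a Rademacher random variable $\epsilon_n$ and then bound the expectation over $y_n, y'_n$ by a supremum:

$$\begin{aligned}
&\sup_{x_n} \sup_{p_n \in \Delta(\mathcal{Y})} \mathbb{E}_{y_n, y'_n \sim p_n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\left\{ \sum_{t=1}^{n-1} \Big( \ell(f(x_t), y'_t) - \ell(f(x_t), y_t) \Big) + \Big( \ell(f(x_n), y'_n) - \ell(f(x_n), y_n) \Big) \right\} \right] \\
&\quad= \sup_{x_n} \sup_{p_n \in \Delta(\mathcal{Y})} \mathbb{E}_{y_n, y'_n \sim p_n} \mathbb{E}_{\epsilon_n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\left\{ \sum_{t=1}^{n-1} \Big( \ell(f(x_t), y'_t) - \ell(f(x_t), y_t) \Big) + \epsilon_n \Big( \ell(f(x_n), y'_n) - \ell(f(x_n), y_n) \Big) \right\} \right] \\
&\quad\le \sup_{x_n} \sup_{y_n, y'_n \in \mathcal{Y}} \mathbb{E}_{\epsilon_n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\left\{ \sum_{t=1}^{n-1} \Big( \ell(f(x_t), y'_t) - \ell(f(x_t), y_t) \Big) + \epsilon_n \Big( \ell(f(x_n), y'_n) - \ell(f(x_n), y_n) \Big) \right\} \right].
\end{aligned}$$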
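Proceeding in similar fashion for rounds $n-1, \ldots, 1$, we get

$$\begin{aligned}
V^{sq}_n(\mathcal{F})
&\le \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{y_t, y'_t \in \mathcal{Y}} \mathbb{E}_{\epsilon_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t \Big( \ell(f(x_t), y'_t) - \ell(f(x_t), y_t) \Big) \right] \\
&\le \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{y_t, y'_t \in \mathcal{Y}} \mathbb{E}_{\epsilon_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y'_t) + \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} (-\epsilon_t)\, \ell(f(x_t), y_t) \right] \\
&\le 2\, \Big\langle\!\!\Big\langle \sup_{x_t} \sup_{y_t \in \mathcal{Y}} \mathbb{E}_{\epsilon_t} \Big\rangle\!\!\Big\rangle_{t=1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y_t) \right] \\
&= 2 \sup_{x_1 \in \mathcal{X},\, y_1 \in \mathcal{Y}} \mathbb{E}_{\epsilon_1} \cdots \sup_{x_n \in \mathcal{X},\, y_n \in \mathcal{Y}} \mathbb{E}_{\epsilon_n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y_t) \right],
\end{aligned}$$

where the third line uses the fact that $-\epsilon_t$ and $\epsilon_t$ have the same distribution.

The above complexity can be equivalently written as follows:

$$2 \sup_{x_1, y_1} \mathbb{E}_{\epsilon_1} \cdots \sup_{x_n, y_n} \mathbb{E}_{\epsilon_n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y_t) \right] = 2 \sup_{\mathbf{x}, \mathbf{y}} \mathbb{E}_{\epsilon} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell\big(f(\mathbf{x}_t(\epsilon_{1:t-1})), \mathbf{y}_t(\epsilon_{1:t-1})\big) \right] =: 2\, \mathcal{R}^{sq}_n(\ell \circ \mathcal{F}),$$

where $\mathbf{x}$ and $\mathbf{y}$ are $\mathcal{X}$-valued and $\mathcal{Y}$-valued complete binary trees of depth $n$. That is, for instance, $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$ where each $\mathbf{x}_t : \{\pm 1\}^{t-1} \to \mathcal{X}$. Below we write $\mathbf{x}_t(\epsilon)$ as shorthand for $\mathbf{x}_t(\epsilon_{1:t-1})$.

To see that the two forms are equivalent, note that, given any trees $\mathbf{x}$ and $\mathbf{y}$, replacing the suprema by the values of the trees one round at a time (starting from round $n$) can only decrease the value:

$$\begin{aligned}
&\sup_{x_1, y_1} \mathbb{E}_{\epsilon_1} \cdots \sup_{x_n, y_n} \mathbb{E}_{\epsilon_n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y_t) \right] \\
&\quad\ge \sup_{x_1, y_1} \mathbb{E}_{\epsilon_1} \cdots \sup_{x_t, y_t} \mathbb{E}_{\epsilon_t} \mathbb{E}_{\epsilon_{t+1:n}} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\left\{ \sum_{i=1}^{t} \epsilon_i\, \ell(f(x_i), y_i) + \sum_{j=t+1}^{n} \epsilon_j\, \ell\big(f(\mathbf{x}_j(\epsilon)), \mathbf{y}_j(\epsilon)\big) \right\} \right] \\
&\quad\ge \mathbb{E}_{\epsilon} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell\big(f(\mathbf{x}_t(\epsilon)), \mathbf{y}_t(\epsilon)\big) \right].
\end{aligned}$$

Since the above statement holds for any trees $\mathbf{x}$ and $\mathbf{y}$, we can take the supremum over the trees. On the other hand, define a pair of trees $\mathbf{x}^*$ and $\mathbf{y}^*$ as follows:

$$\mathbf{x}^*_1 = \operatorname*{argmax}_{x_1 \in \mathcal{X}}\ \sup_{y_1 \in \mathcal{Y}} \mathbb{E}_{\epsilon_1} \Big\langle\!\!\Big\langle \sup_{x_t \in \mathcal{X}} \sup_{y_t \in \mathcal{Y}} \mathbb{E}_{\epsilon_t} \Big\rangle\!\!\Big\rangle_{t=2}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y_t) \right]$$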
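(and similarly define $\mathbf{y}^*_1$) and subsequently, given each $\epsilon_{1:t-1}$, define

$$\mathbf{x}^*_t(\epsilon_{1:t-1}) = \operatorname*{argmax}_{x_t \in \mathcal{X}}\ \sup_{y_t \in \mathcal{Y}} \mathbb{E}_{\epsilon_t} \Big\langle\!\!\Big\langle \sup_{x_j \in \mathcal{X}} \sup_{y_j \in \mathcal{Y}} \mathbb{E}_{\epsilon_j} \Big\rangle\!\!\Big\rangle_{j=t+1}^{n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\left\{ \sum_{i=1}^{t-1} \epsilon_i\, \ell\big(f(\mathbf{x}^*_i(\epsilon)), \mathbf{y}^*_i(\epsilon)\big) + \sum_{j=t}^{n} \epsilon_j\, \ell(f(x_j), y_j) \right\} \right]$$

(and similarly $\mathbf{y}^*_t(\epsilon_{1:t-1})$). Clearly, by definition of these trees,

$$\sup_{x_1, y_1} \mathbb{E}_{\epsilon_1} \cdots \sup_{x_n, y_n} \mathbb{E}_{\epsilon_n} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell(f(x_t), y_t) \right] = \mathbb{E}_{\epsilon} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell\big(f(\mathbf{x}^*_t(\epsilon)), \mathbf{y}^*_t(\epsilon)\big) \right] \le \sup_{\mathbf{x}, \mathbf{y}} \mathbb{E}_{\epsilon} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, \ell\big(f(\mathbf{x}_t(\epsilon)), \mathbf{y}_t(\epsilon)\big) \right].$$

Since we have both inequalities, we conclude that the two forms are equivalent.

In general, for a given class $\mathcal{G}$ of real-valued functions on a space $\mathcal{Z}$, we define the sequential Rademacher complexity below.

Definition 1. Given a class $\mathcal{G} \subset \mathbb{R}^{\mathcal{Z}}$, we define the sequential Rademacher complexity of the class $\mathcal{G}$ as

$$\mathcal{R}^{sq}_n(\mathcal{G}) = \sup_{\mathbf{z}} \mathbb{E}_{\epsilon} \left[ \sup_{g \in \mathcal{G}} \frac{1}{n}\sum_{t=1}^{n} \epsilon_t\, g\big(\mathbf{z}_t(\epsilon_{1:t-1})\big) \right],$$

where the supremum is over $\mathcal{Z}$-valued complete binary trees $\mathbf{z}$ of depth $n$.

Pictorially, we can view the sequential Rademacher complexity as: [figure not available in this transcription]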
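The equivalence between the interleaved form and the tree form can be checked by brute force on a very small instance. The following is a minimal sketch under stated assumptions: $\mathcal{Z}$ is a three-point set, $\mathcal{G}$ is a hypothetical class of four threshold functions, the complexity is normalized by $1/n$, and all names (`value_for_tree`, `sup_over_trees`, `interleaved_value`, `make_threshold`) are illustrative rather than taken from the lecture. A tree is stored as a dictionary from sign prefixes $\epsilon_{1:t-1}$ to points of $\mathcal{Z}$.

```python
from itertools import product


def value_for_tree(tree, G, n):
    """E_eps sup_g (1/n) sum_t eps_t * g(z_t(eps_{1:t-1})) for one fixed tree.

    tree maps a tuple of signs of length t-1 to the point labelling that node.
    """
    total = 0.0
    for eps in product((-1, 1), repeat=n):
        path = [tree[eps[:t]] for t in range(n)]  # nodes visited along eps
        total += max(sum(e * g(z) for e, z in zip(eps, path)) for g in G)
    return total / (2 ** n) / n


def sup_over_trees(Z, G, n):
    """Tree form: supremum of value_for_tree over all Z-valued trees of depth n."""
    prefixes = [p for t in range(n) for p in product((-1, 1), repeat=t)]
    best = float("-inf")
    for labels in product(Z, repeat=len(prefixes)):
        tree = dict(zip(prefixes, labels))
        best = max(best, value_for_tree(tree, G, n))
    return best


def interleaved_value(Z, G, n):
    """Interleaved form: sup_{z_1} E_{eps_1} ... sup_{z_n} E_{eps_n} sup_g (1/n) sum."""
    def rec(history):  # history is a list of (eps_t, z_t) pairs chosen so far
        if len(history) == n:
            return max(sum(e * g(z) for e, z in history) for g in G) / n
        return max(0.5 * (rec(history + [(1, z)]) + rec(history + [(-1, z)]))
                   for z in Z)
    return rec([])


def make_threshold(theta):
    """Hypothetical example class: g(z) = +1 if z >= theta, else -1."""
    return lambda z: 1 if z >= theta else -1


Z = (0, 1, 2)
G = [make_threshold(theta) for theta in (0, 1, 2, 3)]

if __name__ == "__main__":
    for n in (2, 3):
        print(n, sup_over_trees(Z, G, n), interleaved_value(Z, G, n))
```

The enumeration over trees grows as $|\mathcal{Z}|^{2^n - 1}$, so this is only feasible for tiny $n$; the recursion in `interleaved_value` is the cheaper of the two forms to evaluate.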