Rademacher Complexity
|
|
- Clare Parrish
- 6 years ago
- Views:
Transcription
1 EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for formal publicatios. They may be distributed outside this class oly with the permissio of the Istructor. Itroductio Rademacher complexity is a measure of the richess of a class of real-valued fuctios. I this sese, it is similar to the VC dimesio. I fact, we will establish a uiform deviatio boud i terms of Rademacher complexity, ad the use this result to prove the VC iequality. Ulike VC dimesio, however, Rademacher complexity is ot restricted to biary fuctios, ad will also prove useful later i the aalysis of other learig algorithms such as kerel-based algorithms. 2 Rademacher Complexity Let G a, b Z be a set of fuctios Z a, b where a, b R, a < b. Let Z,..., Z be i.i.d. radom variables o Z followig some distributio P. Deote the sample S = Z,..., Z. The empirical Rademacher complexity of G with respect to the sample S is R S G : gz i where σ = σ,..., σ with iid uif{, }. Here σ,..., σ are kow as Rademacher radom variables. The complexity R S G is radom because of the radomess of S. The Rademacher complexity of G is R G = E S R S G. Rademacher complexity is sometimes called Rademacher average. A iterpretatio i the cotext of biary classificatio is that G is rich, equivaletly, R S G or R G is high, if we ca choose fuctios g to accurately match differet radom sig combiatios reflected by σ. Note that the complexity is bouded, sice elemets of G are bouded withi the iterval a, b. Theorem Oe-sided Rademacher complexity boud. Let Z, Z,..., Z be iid radom variables takig values i a set Z. Cosider a set of fuctios G a, b Z. δ > 0, with probability δ, we have with respect to the draw of sample S that: g G, EgZ log /δ gz i + 2R G + b a 2. I additio, δ > 0, with probability δ, we have with respect to the draw of S that: g G, EgZ gz i + 2 R log 2/δ S G + 3b a 2. 2
2 2 The fial term i both ad 2 is typically much smaller tha the Rademacher complexity. Note that ad 2 are oe-sided uiform deviatio bouds, ad that 2 is a data-depedet boud. Before provig the theorem, we first review the followig useful facts. Fact : For ay real-valued fuctios f, f 2 : X R, x f x x f 2 x x f x f 2 x. To see this, let ɛ > 0 ad let x be such that f x x f x ɛ. The, x f x x f 2 x x ɛ > 0 was arbitrary, so the result follows. f x f 2 x f x f 2 x + ɛ f x f 2 x + ɛ. x Fact 2: For ay real-valued fuctios f, f 2 : X R, x f x + f 2 x x f x + x f 2 x. Fact 3: is a covex fuctio, i.e., if x λ λ Λ ad x λ λ Λ are two sequeces where Λ is possibly ucoutable, the α 0,, λ Λ αx λ + αx λ α This is a immediate cosequece of Fact 2. λ Λ x λ + α x λ λ Λ Fact 4: Jese s iequality, i.e., if f is covex, the feu EfU. Now, we are ready to prove Theorem. Proof. For otatioal brevity, deote Eg = EgZ ad ÊSg = gz i. The idea is to apply the bouded differece iequality BDI to φs = Eg ÊSg. First, we verify the bouded differece assumptio. Deotig S i = Z,..., Z i, Z i, Z i+,..., Z, we have φs φs i = Eg ÊSg Eg ÊS g ÊS ÊSg g = gz i gz i b a. by Fact Similarly, we ca prove φs φs i b a/ ad therefore φs φs i b a/. By the BDI, we have that with probability δ, log/δ φs E S φs b a. 2 To establish, it remais to show that E S φs 2R G. Thus let us itroduce aother radom sample called a ghost sample S = Z,..., Z with Z i idepedet of S. The E S φs = E S Eg ÊSg = E S E S ÊS g ÊSg by Eg = E S Ê S g iid P,
3 E S,S ÊS g ÊSg by Facts 3 ad 4 = E S,S,S,S,S,S,S = 2R G. gz i gz i gz i gz i gz i + gz i gz i +,S gz i by Fact 2 is symmetric, i The equality holds because i Z i ad Z i are i.i.d. hece gz i gz i ad gz i gz i have the same distributio, ad ii is symmetric. To establish 2, we apply the BDI agai to φs = R S G. Observe that φs φs i = R S G R S G gz i σ j gz j + gz i j i gz i gz i b a. by Fact. Similarly, we ca prove φs φs i b a/ ad thus φs φs i b a/. Applyig the BDI, we have that with probability δ/2, R G R log 2/δ S G + b a 2. 3 Combiig with δ replaced by δ/2 ad the iequality above, we the establish 2, because Prviolatig 2 Prviolatig + Prviolatig 3 δ/2 + δ/2 = δ. 3 The followig two-sided boud also holds. Theorem 2 Two-sided Rademacher complexity boud. Cosider a set of classifiers G a, b Z. δ > 0, with probability δ, we have with respect to the draw of sample S that: EgZ log 2/δ gz i 2R G + b a 2. 4 I additio, δ > 0, with probability δ, we have with respect to the draw of S that: EgZ gz i 2 R log 4/δ S G + 3b a 2. 5 The proof is left as a exercise.
4 4 3 Bouds for Biary Classificatio Cosider a set of biary classifiers H {, } X. Let Z = X {, }. Defie aother set G based o H as G = {x, y {hx y} : h H}. Let S = {Z,..., Z } = {X, Y,..., X, Y }, ad also let T = {X,..., X }, which is the projectio of S o the domai X. The empirical Rachemacher complexity of H should be writte R T H, however, we will follow covetio ad write it as R S H. There should be o cofusio sice the domai of elemets of H is X, so oly the X i s i the sample ca be used whe evaluatig the empirical Rademacher complexity. Thus we have Lemma. R S G = 2 R S H Proof. From the defiitios, we have R S H R S G 2 = 2 = 2 R S H, hx i. {hxi Y i} Y i h X i h X i Y i h X i where the secod to last step follows from the facts that = 0 ad ad Y i have the same distributio. Now observe that E g = E {hx Y } = Rh whe g G is defied i terms of h H. Note also that g Z i = R h. This gives the followig corollary: Corollary. δ > 0, with probability δ, ad with probability δ, Rh R h R H + l /δ 2, Rh R h R l2/δ S H Remark. A two-sided versio of this corollary also holds, with δ δ/2. Example. Let Π = {A,..., A k } be a fixed partitio of X, such as a regular partitio or a recursive dyadic partitio. Let H = {classifiers that are costat o cells i Π}. The H = 2 k. We ll obtai a boud o the
5 5 empirical Rademacher complexity of H. Let la deote the label assiged to A Π. The R S H h X i = k E σ j= = A Π la i:x i A = A Π j h X i la la la. Maipulatig the terms iside the expectatio gives la la la la 2 2 la {, } Jese s iequality = #{i : X i A}, where the last lie follows because σ j = { 0, i j,, i = j. If j = #{i : X i A j }, the k R S H = j= k = j= j P A j, where P A j = j. The oly iequality i the above derivatio was Jese s iequality, ad by the Kitchie- Kahae iequality the reverse iequality holds if we iclude a multiplicative factor of 2, so the calculatio is tight up to this factor. The Rachemacher complexity i this example ca actually be computed exactly i terms of biomal probabilities. This is left as a exercise. 4 Proof of VC Iequality To prove the VC iequality, we will focus o boudig R H i terms of the shatter coefficiet.
6 6 Theorem 3. Massart s Lemma Let A R, A. Set r = max u 2. The u i r 2 l A, where u = u,..., u T. Proof. t 0, we have that exp t u i = exp t exp t exp t exp The summad is a MGF. Due to idepedece, exp t u i where the boud comes from the followig lemma: t u i u i u i u i. = Jese s iequality expoetial is strictly icreasig i exp t u i exp t 2 2u i 2 /8, Lemma 2. Let V be a radom variable o R with E V = 0 ad V a, b with probability oe. The for all t > 0, E e tv e t2 b a 2 /8. This lemma was give ad proved as Lemma i the otes o Hoeffdig s iequality. It was used to prove Hoeffdig s iequality. I our case, we used V = u i, a = u i, ad b = u i. Cotiuig with the proof of Massart s lemma, exp t 2 2u i 2 /8 Takig the log of both sides ad dividig by t gives u i = exp t 2 2 u 2 i = t 2 u 2 2 exp 2 t 2 r 2 exp 2 t 2 r 2 = A exp. 2 l A + tr2 t 2 = r 2 l A,
7 7 where the last step follows from choosig t = 2 l A r. Dividig both sides by completes the proof. This theorem is the key result that bridges the gap betwee VC theory ad Rademacher complexity. We first state ad prove a oe-sided versio of the VC iequality. Theorem 4 Oe-sided VC Iequality. For 0 < δ <, with probability δ, Rh R 8 l SH + l /δ h. Equivaletly, for ay ɛ > 0, Pr Rh R h ɛ S H e ɛ2 /8. 6 Proof. Let H = {, } X ad S = X,..., X X. Deote H S = {h X,..., h X : h H}. If u H S, the u 2 =. By Massart s lemma, R H = E S h X i S 2 l H S E S 2 l E H S 7 Jese s iequality 2 l SH, 8 where the last step follows from the fact that H S S H. From Corollary we deduce that with probability δ, Rh R 2 l SH l /δ h +. 2 We ow observe that for a, b 0, a + b a + b + a + b = 2 a + b. Therefore, for 0 < δ <, with probability δ, Rh R 8 l SH + l /δ /4 h 9 8 l SH + l /δ. This establishes the first part of the theorem. To establish the secod part, set the right-had side equal to ɛ ad solve for δ if o such δ exists, the boud holds trivially. Remark. Note that step 7 is ot ecessary. We could have goe directly to 8 usig the defiitio of the shatter coefficiet. However, the itermediate result gives a uiform deviatio bouds i terms of the expected cardiality of H S, which we used to study mootoe layers ad covex sets i the lecture o VC Theory. Fially, we state the stadard two-sided VC iequality, whose proof is left as a exercise.
8 8 Theorem 5 Two-sided VC Iequality. For 0 < δ <, with probability δ, Rh R 8 l SH + l 2/δ h. Equivaletly, for ay ɛ > 0, Pr Rh R h ɛ 2S H e ɛ2 /8. Exercises. Ca you improve the costats i the empirical Rademacher complexity boud 2 through a sigle, direct applicatio of the bouded differece iequality? 2. Determie a exact formula for the empirical Rademacher complexity of the set of classifiers based o a fixed partitio see example above. 3. Let G, G, G 2 deote arbitrary classes of fuctios Z a, b, ad let c, d be arbitrary real umbers. Show a R S cg + d = c RG, where cg + d := {g z = cgz + d g G}. b R S covg = R S G, where covg := { α ig i N, α i 0, i α i =, g i G}. c R S G + G 2 = R S G + R S G 2, where G + G 2 := {gz = g z + g 2 z g G, g 2 G 2 }. 4. Two-sided uiform deviatio bouds. a Prove Theorem 2. Hit: Apply the oe-sided Rademacher boud agai to G. b Prove Theorem 5. Hit: Observe that R h = Rh ad similarly for the empirical risk. c Show that if G = G, the the two-sided Rademacher boud holds with the same costats as the oe-sided versio. I particular, the substitutio δ δ/2 is uecessary. d Show that if H = H, the the two-sided VC iequality holds with the same costats as the oe-sided versio. I particular, the substitutio δ δ/2 is uecessary. 5. Use iequality 9 to improve the costat i the expoet of 6 at the expese of a larger term i frot of the expoetial.
1 Review and Overview
CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationLecture 2: Concentration Bounds
CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h
More informationGlivenko-Cantelli Classes
CS28B/Stat24B (Sprig 2008 Statistical Learig Theory Lecture: 4 Gliveko-Catelli Classes Lecturer: Peter Bartlett Scribe: Michelle Besi Itroductio This lecture will cover Gliveko-Catelli (GC classes ad itroduce
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationRegression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationThe Boolean Ring of Intervals
MATH 532 Lebesgue Measure Dr. Neal, WKU We ow shall apply the results obtaied about outer measure to the legth measure o the real lie. Throughout, our space X will be the set of real umbers R. Whe ecessary,
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More informationOptimally Sparse SVMs
A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationLearning Theory: Lecture Notes
Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic
More informationStatistical Machine Learning II Spring 2017, Learning Theory, Lecture 7
Statistical Machie Learig II Sprig 2017, Learig Theory, Lecture 7 1 Itroductio Jea Hoorio jhoorio@purdue.edu So far we have see some techiques for provig geeralizatio for coutably fiite hypothesis classes
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationIt is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.
MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied
More informationReview Problems 1. ICME and MS&E Refresher Course September 19, 2011 B = C = AB = A = A 2 = A 3... C 2 = C 3 = =
Review Problems ICME ad MS&E Refresher Course September 9, 0 Warm-up problems. For the followig matrices A = 0 B = C = AB = 0 fid all powers A,A 3,(which is A times A),... ad B,B 3,... ad C,C 3,... Solutio:
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationA 2nTH ORDER LINEAR DIFFERENCE EQUATION
A 2TH ORDER LINEAR DIFFERENCE EQUATION Doug Aderso Departmet of Mathematics ad Computer Sciece, Cocordia College Moorhead, MN 56562, USA ABSTRACT: We give a formulatio of geeralized zeros ad (, )-discojugacy
More informationLecture 9: Expanders Part 2, Extractors
Lecture 9: Expaders Part, Extractors Topics i Complexity Theory ad Pseudoradomess Sprig 013 Rutgers Uiversity Swastik Kopparty Scribes: Jaso Perry, Joh Kim I this lecture, we will discuss further the pseudoradomess
More informationCS / MCS 401 Homework 3 grader solutions
CS / MCS 401 Homework 3 grader solutios assigmet due July 6, 016 writte by Jāis Lazovskis maximum poits: 33 Some questios from CLRS. Questios marked with a asterisk were ot graded. 1 Use the defiitio of
More informationMath 2784 (or 2794W) University of Connecticut
ORDERS OF GROWTH PAT SMITH Math 2784 (or 2794W) Uiversity of Coecticut Date: Mar. 2, 22. ORDERS OF GROWTH. Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really
More informationDiscrete-Time Systems, LTI Systems, and Discrete-Time Convolution
EEL5: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we begi our mathematical treatmet of discrete-time s. As show i Figure, a discrete-time operates or trasforms some iput sequece x [
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More information(b) What is the probability that a particle reaches the upper boundary n before the lower boundary m?
MATH 529 The Boudary Problem The drukard s walk (or boudary problem) is oe of the most famous problems i the theory of radom walks. Oe versio of the problem is described as follows: Suppose a particle
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationLecture 7: October 18, 2017
Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More information1+x 1 + α+x. x = 2(α x2 ) 1+x
Math 2030 Homework 6 Solutios # [Problem 5] For coveiece we let α lim sup a ad β lim sup b. Without loss of geerality let us assume that α β. If α the by assumptio β < so i this case α + β. By Theorem
More informationThe log-behavior of n p(n) and n p(n)/n
Ramauja J. 44 017, 81-99 The log-behavior of p ad p/ William Y.C. Che 1 ad Ke Y. Zheg 1 Ceter for Applied Mathematics Tiaji Uiversity Tiaji 0007, P. R. Chia Ceter for Combiatorics, LPMC Nakai Uivercity
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationLinear Regression Demystified
Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to
More informationMath 61CM - Solutions to homework 3
Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig
More informationBinary classification, Part 1
Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y
More informationDefinition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.
4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationAda Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS All the algorithms preseted so far halluciate the future values as radom draws ad the perform
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More informationBasics of Probability Theory (for Theory of Computation courses)
Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationMath 155 (Lecture 3)
Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationMath 104: Homework 2 solutions
Math 04: Homework solutios. A (0, ): Sice this is a ope iterval, the miimum is udefied, ad sice the set is ot bouded above, the maximum is also udefied. if A 0 ad sup A. B { m + : m, N}: This set does
More informationSequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet
More informationChapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities
Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other
More informationSequences. Notation. Convergence of a Sequence
Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it
More informationMath 451: Euclidean and Non-Euclidean Geometry MWF 3pm, Gasson 204 Homework 3 Solutions
Math 451: Euclidea ad No-Euclidea Geometry MWF 3pm, Gasso 204 Homework 3 Solutios Exercises from 1.4 ad 1.5 of the otes: 4.3, 4.10, 4.12, 4.14, 4.15, 5.3, 5.4, 5.5 Exercise 4.3. Explai why Hp, q) = {x
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationSTAT Homework 1 - Solutions
STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 6 9/23/203 Browia motio. Itroductio Cotet.. A heuristic costructio of a Browia motio from a radom walk. 2. Defiitio ad basic properties
More informationNICK DUFRESNE. 1 1 p(x). To determine some formulas for the generating function of the Schröder numbers, r(x) = a(x) =
AN INTRODUCTION TO SCHRÖDER AND UNKNOWN NUMBERS NICK DUFRESNE Abstract. I this article we will itroduce two types of lattice paths, Schröder paths ad Ukow paths. We will examie differet properties of each,
More information1 Convergence in Probability and the Weak Law of Large Numbers
36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec
More information4.1 Data processing inequality
ECE598: Iformatio-theoretic methods i high-dimesioal statistics Sprig 206 Lecture 4: Total variatio/iequalities betwee f-divergeces Lecturer: Yihog Wu Scribe: Matthew Tsao, Feb 8, 206 [Ed. Mar 22] Recall
More informationAxioms of Measure Theory
MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that
More informationMa 530 Introduction to Power Series
Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.
More informationLecture 10: Universal coding and prediction
0-704: Iformatio Processig ad Learig Sprig 0 Lecture 0: Uiversal codig ad predictio Lecturer: Aarti Sigh Scribes: Georg M. Goerg Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved
More informationInformation Theory and Statistics Lecture 4: Lempel-Ziv code
Iformatio Theory ad Statistics Lecture 4: Lempel-Ziv code Łukasz Dębowski ldebowsk@ipipa.waw.pl Ph. D. Programme 203/204 Etropy rate is the limitig compressio rate Theorem For a statioary process (X i)
More informationSolution. 1 Solutions of Homework 1. Sangchul Lee. October 27, Problem 1.1
Solutio Sagchul Lee October 7, 017 1 Solutios of Homework 1 Problem 1.1 Let Ω,F,P) be a probability space. Show that if {A : N} F such that A := lim A exists, the PA) = lim PA ). Proof. Usig the cotiuity
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationLecture Notes for Analysis Class
Lecture Notes for Aalysis Class Topological Spaces A topology for a set X is a collectio T of subsets of X such that: (a) X ad the empty set are i T (b) Uios of elemets of T are i T (c) Fiite itersectios
More informationThe multiplicative structure of finite field and a construction of LRC
IERG6120 Codig for Distributed Storage Systems Lecture 8-06/10/2016 The multiplicative structure of fiite field ad a costructio of LRC Lecturer: Keeth Shum Scribe: Zhouyi Hu Notatios: We use the otatio
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationEntropy Rates and Asymptotic Equipartition
Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,
More informationLearnability with Rademacher Complexities
Learability with Rademacher Complexities Daiel Khashabi Fall 203 Last Update: September 26, 206 Itroductio Our goal i study of passive ervised learig is to fid a hypothesis h based o a set of examples
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More informationFall 2013 MTH431/531 Real analysis Section Notes
Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters
More informationIf a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?
2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a
More informationLesson 10: Limits and Continuity
www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals
More informationECE534, Spring 2018: Solutions for Problem Set #2
ECE534, Srig 08: s for roblem Set #. Rademacher Radom Variables ad Symmetrizatio a) Let X be a Rademacher radom variable, i.e., X = ±) = /. Show that E e λx e λ /. E e λx = e λ + e λ = + k= k=0 λ k k k!
More informationEECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1
EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationExpectation and Variance of a random variable
Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationChapter 6 Infinite Series
Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat
More informationNotes for Lecture 11
U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with
More informationEntropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP
Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.
More informationA survey on penalized empirical risk minimization Sara A. van de Geer
A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the
More informationSection 11.8: Power Series
Sectio 11.8: Power Series 1. Power Series I this sectio, we cosider geeralizig the cocept of a series. Recall that a series is a ifiite sum of umbers a. We ca talk about whether or ot it coverges ad i
More informationMATH301 Real Analysis (2008 Fall) Tutorial Note #7. k=1 f k (x) converges pointwise to S(x) on E if and
MATH01 Real Aalysis (2008 Fall) Tutorial Note #7 Sequece ad Series of fuctio 1: Poitwise Covergece ad Uiform Covergece Part I: Poitwise Covergece Defiitio of poitwise covergece: A sequece of fuctios f
More informationRandomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)
Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationLaw of the sum of Bernoulli random variables
Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible
More information