Binary classification, Part 1


Maxim Raginsky
September 25, 2014

The problem of binary classification can be stated as follows. We have a random couple $Z = (X,Y)$, where $X \in \mathbb{R}^d$ is called the feature vector and $Y \in \{-1,1\}$ is called the label (the reason why we chose $\{-1,1\}$, rather than $\{0,1\}$, for the label space is merely convenience). In the spirit of the model-free framework, we assume that the relationship between the features and the labels is stochastic and described by an unknown probability distribution $P \in \mathcal{P}(\mathsf{Z})$, where $\mathsf{Z} = \mathbb{R}^d \times \{-1,1\}$. In these lectures on binary classification, I will be following mainly two excellent sources: the book by Devroye, Györfi, and Lugosi [DGL96] and the comprehensive survey article by Bousquet, Boucheron, and Lugosi [BBL05].

As usual, we consider the case when we are given an i.i.d. sample of length $n$ from $P$. The goal is to learn a classifier, i.e., a mapping $g : \mathbb{R}^d \to \{-1,1\}$, such that the probability of classification error, $P(g(X) \neq Y)$, is small. As we have seen before, the optimal choice is the Bayes classifier

$$g^*(x) = \begin{cases} 1, & \text{if } \eta(x) > 1/2 \\ -1, & \text{otherwise} \end{cases} \tag{1}$$

where $\eta(x) = \mathbb{P}[Y = 1 \mid X = x]$ is the regression function. However, since we make no assumptions on $P$, in general we cannot hope to learn the Bayes classifier $g^*$. Instead, we focus on a more realistic goal: we fix a collection $\mathcal{G}$ of classifiers and then use the training data to come up with a hypothesis $\widehat{g}_n \in \mathcal{G}$, such that

$$P(\widehat{g}_n(X) \neq Y) \approx \inf_{g \in \mathcal{G}} P(g(X) \neq Y)$$

with high probability. By way of notation, let us write $L(g)$ for the classification error of $g$, i.e., $L(g) = P(g(X) \neq Y)$, and let $L^*(\mathcal{G})$ denote the smallest classification error attainable over $\mathcal{G}$:

$$L^*(\mathcal{G}) = \inf_{g \in \mathcal{G}} L(g).$$

We will assume that a minimizing $g^* \in \mathcal{G}$ exists. For future reference, we note that

$$L(g) = P(g(X) \neq Y) = P(Y g(X) < 0). \tag{2}$$

Warning: in what follows, we will use $C$ or $c$ to denote various absolute constants; their values may change from line to line.
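To make the setup concrete, here is a minimal Python sketch of the objects just defined, under an assumed toy distribution $P$ (the particular $\eta$ and feature law are illustrative choices, not anything prescribed by these notes): it implements the Bayes classifier (1) for a known regression function $\eta$ and estimates its error $L(g^*)$ from (2) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    """Regression function eta(x) = P[Y = 1 | X = x] (an illustrative, assumed choice)."""
    return 1.0 / (1.0 + np.exp(-4.0 * x[..., 0]))  # depends on the first coordinate only

def bayes_classifier(x):
    """Bayes classifier g*(x) = 1 if eta(x) > 1/2, -1 otherwise; see (1)."""
    return np.where(eta(x) > 0.5, 1, -1)

def sample(n, d=2):
    """Draw an i.i.d. sample of length n from P: X ~ N(0, I_d), then Y | X = 1 w.p. eta(X)."""
    X = rng.standard_normal((n, d))
    Y = np.where(rng.random(n) < eta(X), 1, -1)
    return X, Y

# Monte Carlo estimate of L(g*) = P(g*(X) != Y) = P(Y g*(X) < 0); see (2).
X, Y = sample(200_000)
L_star = np.mean(Y * bayes_classifier(X) < 0)
print(f"estimated Bayes error L(g*) ~= {L_star:.3f}")
```

Everything that follows is about what can be achieved when $\eta$ and $P$ are unknown and only the i.i.d. sample is available.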

1 Learning linear discriminant rules

One of the simplest classification rules (and one of the first to be studied) is a linear discriminant rule: given a nonzero vector $w = (w^{(1)},\dots,w^{(d)}) \in \mathbb{R}^d$ and a scalar $b \in \mathbb{R}$, let

$$g(x) = g_{w,b}(x) = \begin{cases} 1, & \text{if } \langle w, x\rangle + b > 0 \\ -1, & \text{otherwise} \end{cases} \tag{3}$$

Let $\mathcal{G}$ be the class of all such linear discriminant rules as $w$ ranges over all nonzero vectors in $\mathbb{R}^d$ and $b$ ranges over all reals: $\mathcal{G} = \{g_{w,b} : w \in \mathbb{R}^d \setminus \{0\},\ b \in \mathbb{R}\}$. Given the training sample $Z^n$, let $\widehat{g}_n \in \mathcal{G}$ be the output of the ERM algorithm, i.e.,

$$\widehat{g}_n = \operatorname*{argmin}_{g \in \mathcal{G}} \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{g(X_i) \neq Y_i\}.$$

In other words, $\widehat{g}_n$ is any classifier of the form (3) that minimizes the number of misclassifications on the training sample. Then we have the following:

Theorem 1. There exists an absolute constant $C > 0$, such that for any $n \in \mathbb{N}$ and any $\delta \in (0,1)$, the bound

$$L(\widehat{g}_n) \le L^*(\mathcal{G}) + C\sqrt{\frac{d+1}{n}} + \sqrt{\frac{2\log(1/\delta)}{n}} \tag{4}$$

holds with probability at least $1 - \delta$.
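Before turning to the proof, it may help to see what the ERM rule looks like computationally. The sketch below is my own illustration and only approximates $\widehat{g}_n$: exact minimization of the empirical 0-1 error over all halfspaces is expensive (and, as discussed in Section 1.2, NP-hard in general), so the code searches over a finite pool of random directions $w$ and, for each fixed $w$, over the offsets $b$ at which the empirical error can actually change.

```python
import numpy as np

def empirical_error(w, b, X, Y):
    """Empirical 0-1 error L_n(g_{w,b}) of the linear rule (3) on the sample (X, Y)."""
    predictions = np.where(X @ w + b > 0, 1, -1)
    return np.mean(predictions != Y)

def erm_linear(X, Y, num_directions=500, rng=None):
    """Approximate ERM over linear discriminant rules: random search over directions w,
    exhaustive search over the offsets b that matter once w is fixed."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    best_w, best_b, best_err = None, None, np.inf
    for _ in range(num_directions):
        w = rng.standard_normal(d)
        s = np.sort(X @ w)
        # Only thresholds between consecutive projected points (plus one below the minimum
        # and one above the maximum) can change the empirical error.
        thresholds = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]))
        for t in thresholds:
            err = empirical_error(w, -t, X, Y)
            if err < best_err:
                best_w, best_b, best_err = w, -t, err
    return best_w, best_b, best_err
```

On a slice of the toy sample from the earlier sketch, e.g. `erm_linear(X[:200], Y[:200])`, the returned empirical error can then be compared against the guarantee (4).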

Proof. A standard argument leads to the bound

$$L(\widehat{g}_n) \le L^*(\mathcal{G}) + 2\Delta_n(Z^n), \tag{5}$$

where

$$\Delta_n(Z^n) = \sup_{g \in \mathcal{G}} |L(g) - L_n(g)|$$

is the uniform deviation and $L_n(g)$ denotes the empirical classification error of $g$ on $Z^n$:

$$L_n(g) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{g(X_i) \neq Y_i\},$$

which is the fraction of incorrectly labeled points in the training sample $Z^n$. Consider a classifier $g \in \mathcal{G}$ and define the set

$$C_g = \left\{(x,y) \in \mathbb{R}^d \times \{-1,1\} : y(\langle w, x\rangle + b) \le 0\right\}.$$

Then it is easy to see that

$$L(g) = P(C_g) \qquad \text{and} \qquad L_n(g) = P_n(C_g),$$

where, as before,

$$P_n = \frac{1}{n}\sum_{i=1}^n \delta_{Z_i} = \frac{1}{n}\sum_{i=1}^n \delta_{(X_i,Y_i)}$$

is the empirical distribution of the sample $Z^n$. Let $\mathcal{C}$ denote the collection of all sets of the form $C = C_g$ for some $g \in \mathcal{G}$. Then

$$\Delta_n(Z^n) = \sup_{C \in \mathcal{C}} |P_n(C) - P(C)|.$$

Let $\mathcal{F} = \mathcal{F}_{\mathcal{C}}$ denote the class of indicator functions of the sets in $\mathcal{C}$: $\mathcal{F}_{\mathcal{C}} = \{\mathbf{1}_{C} : C \in \mathcal{C}\}$. Then we know that, with probability at least $1 - \delta$,

$$\Delta_n(Z^n) \le 2\,\mathbb{E} R_n(\mathcal{F}(Z^n)) + \sqrt{\frac{\log(1/\delta)}{2n}}, \tag{6}$$

where $R_n(\mathcal{F}(Z^n))$ is the Rademacher average of the projection of $\mathcal{F}$ onto the sample $Z^n$. Now,

$$\mathcal{F}(Z^n) = \{(f(Z_1),\dots,f(Z_n)) : f \in \mathcal{F}\} = \{(\mathbf{1}\{Z_1 \in C\},\dots,\mathbf{1}\{Z_n \in C\}) : C \in \mathcal{C}\}.$$

Therefore, if we prove that $\mathcal{C}$ is a VC class, then

$$R_n(\mathcal{F}(Z^n)) \le C\sqrt{\frac{V(\mathcal{C})}{n}}.$$

But this follows from the fact that any $C \in \mathcal{C}$ has the form

$$C = \left\{(x,y) \in \mathbb{R}^d \times \{-1,1\} : \sum_{j=1}^d w^{(j)} y x^{(j)} + by \le 0\right\}$$

for some $w \in \mathbb{R}^d \setminus \{0\}$ and some $b \in \mathbb{R}$, and the functions $(x,y) \mapsto y$ and $(x,y) \mapsto y x^{(j)}$, $1 \le j \le d$, span a linear space of dimension no greater than $d+1$. Hence, $V(\mathcal{C}) \le d+1$, so that

$$R_n(\mathcal{F}(Z^n)) \le C\sqrt{\frac{V(\mathcal{C})}{n}} \le C\sqrt{\frac{d+1}{n}}.$$

Combining this with (5) and (6), we see that (4) holds with probability at least $1 - \delta$.

1.1 Generalized linear discriminant rules

In the same vein, we may consider classification rules of the form

$$g(x) = \begin{cases} 1, & \text{if } \sum_{j=1}^k w^{(j)} \psi_j(x) + b > 0 \\ -1, & \text{otherwise} \end{cases} \tag{7}$$

where $k$ is some positive integer (not necessarily equal to $d$), $w = (w^{(1)},\dots,w^{(k)}) \in \mathbb{R}^k$ is a nonzero vector, $b \in \mathbb{R}$ is an arbitrary scalar, and $\Psi = \{\psi_j : \mathbb{R}^d \to \mathbb{R}\}_{j=1}^k$ is some fixed dictionary of real-valued functions on $\mathbb{R}^d$. For a fixed $\Psi$, let $\mathcal{G}$ denote the collection of all classifiers of the form (7) as $w$ ranges over all nonzero vectors in $\mathbb{R}^k$ and $b$ ranges over all reals. Then the ERM rule is, again, given by

$$\widehat{g}_n = \operatorname*{argmin}_{g \in \mathcal{G}} L_n(g) = \operatorname*{argmin}_{g \in \mathcal{G}} \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{g(X_i) \neq Y_i\}.$$

The following result can be proved pretty much along the same lines as Theorem 1:

Theorem 2. There exists an absolute constant $C > 0$, such that for any $n \in \mathbb{N}$ and any $\delta \in (0,1)$, the bound

$$L(\widehat{g}_n) \le L^*(\mathcal{G}) + C\sqrt{\frac{k+1}{n}} + \sqrt{\frac{2\log(1/\delta)}{n}} \tag{8}$$

holds with probability at least $1 - \delta$.
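Computationally, a fixed dictionary $\Psi$ simply maps each feature vector into $\mathbb{R}^k$, after which the search for an (approximate) empirical minimizer is the same as before, with $k+1$ in place of $d+1$ in the bound. Here is a brief sketch under an assumed dictionary of monomials on $\mathbb{R}^2$ (so $k = 5$); it reuses the `erm_linear` helper from the previous sketch and is, again, only an illustration.

```python
import numpy as np

def dictionary(x):
    """An illustrative fixed dictionary Psi = (psi_1, ..., psi_5) on R^2:
    the two coordinates, their squares, and the cross term."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1, x2, x1**2, x2**2, x1 * x2], axis=-1)

def erm_generalized_linear(X, Y, **kwargs):
    """Approximate ERM over rules of the form (7): map the sample through Psi and
    reuse erm_linear (defined in the previous sketch) on the transformed features."""
    features = dictionary(X)                       # n x k matrix with entries psi_j(X_i)
    w, b, err = erm_linear(features, Y, **kwargs)
    classify = lambda x_new: np.where(dictionary(x_new) @ w + b > 0, 1, -1)
    return classify, err
```

The resulting rule can represent quadratic decision boundaries in the original feature space, while Theorem 2 still gives an estimation-error term of order $\sqrt{(k+1)/n}$.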

1.2 Two fundamental issues

As Theorems 1 and 2 show, the ERM algorithm applied to the collection of all (generalized) linear discriminant rules is guaranteed to work well in the sense that the classification error of the output hypothesis will, with high probability, be close to the optimum achievable by any discriminant rule with the given structure. The same argument extends to any collection of classifiers $\mathcal{G}$ for which the error sets $\{(x,y) : y g(x) \le 0\}$, $g \in \mathcal{G}$, form a VC class of dimension much smaller than the sample size $n$. In other words, with high probability the difference

$$L(\widehat{g}_n) - L^*(\mathcal{G}) = L(\widehat{g}_n) - \inf_{g \in \mathcal{G}} L(g)$$

will be small. However, precisely because the VC dimension of $\mathcal{G}$ cannot be too large, the approximation properties of $\mathcal{G}$ will be limited.

Another problem is computational. For instance, the problem of finding an empirically optimal linear discriminant rule is NP-hard. In other words, unless P is equal to NP, there is no hope of coming up with an efficient ERM algorithm for linear discriminant rules that would work for all feature space dimensions $d$. If $d$ is fixed, then it is possible to enumerate all projections of a given sample $Z^n$ onto the class of indicators of all halfspaces in $O(n^{d-1}\log n)$ time, which allows for an exhaustive search for an ERM solution, but the usefulness of this naive approach is limited to $d < 5$.

2 Risk bounds for combined classifiers via surrogate loss functions

One way to sidestep the above approximation-theoretic and computational issues is to replace the 0-1 (Hamming) loss function that gives rise to the probability-of-error criterion with some other loss function. What we gain is the ability to bound the performance of various complicated classifiers, built up by combining simpler base classifiers, in terms of the complexity (e.g., the VC dimension) of the collection of the base classifiers, as well as considerable computational advantages, especially if the problem of minimizing the empirical surrogate loss turns out to be a convex programming problem. What we lose, though, is that, in general, we will not be able to compare the generalization error of the learned classifier to the minimum classification risk. Instead, we will have to be content with the fact that the generalization error will be close to the smallest surrogate loss.

We will consider classifiers of the form

$$g_f(x) = \operatorname{sgn} f(x) = \begin{cases} 1, & \text{if } f(x) \ge 0 \\ -1, & \text{otherwise} \end{cases} \tag{9}$$

where $f : \mathbb{R}^d \to \mathbb{R}$ is some function. From (2) we have

$$L(g_f) = P(g_f(X) \neq Y) = P(Y g_f(X) < 0) = P(Y f(X) < 0).$$

From now on, when dealing with classifiers of the form (9), we will write $L(f)$ instead of $L(g_f)$ to keep the notation simple.

Now we introduce the notion of a surrogate loss function.

Definition 1. A surrogate loss function is any function $\varphi : \mathbb{R} \to \mathbb{R}_+$ such that

$$\varphi(x) \ge \mathbf{1}\{x > 0\}. \tag{10}$$

Some examples of commonly used surrogate losses:

1. Exponential: $\varphi(x) = e^x$

2. Logit: $\varphi(x) = \log_2(1 + e^x)$

3. Hinge loss: $\varphi(x) = (x+1)_+ = \max\{x+1, 0\}$

Let $\varphi$ be a surrogate loss. Then for any $(x,y) \in \mathbb{R}^d \times \{-1,1\}$ and any $f : \mathbb{R}^d \to \mathbb{R}$ we have

$$\mathbf{1}\{y f(x) < 0\} = \mathbf{1}\{-y f(x) > 0\} \le \varphi(-y f(x)). \tag{11}$$

Therefore, defining the $\varphi$-risk of $f$ by

$$A_\varphi(f) = \mathbb{E}[\varphi(-Y f(X))]$$

and its empirical version

$$A_{\varphi,n}(f) = \frac{1}{n}\sum_{i=1}^n \varphi(-Y_i f(X_i)),$$

we see from (11) that

$$L(f) \le A_\varphi(f) \qquad \text{and} \qquad L_n(f) \le A_{\varphi,n}(f). \tag{12}$$

Now that these preliminaries are out of the way, we can state and prove the basic surrogate loss bound:

Theorem 3. Consider any learning algorithm $\mathcal{A} = \{\mathcal{A}_n\}_{n=1}^\infty$, where, for each $n$, the mapping $\mathcal{A}_n$ receives the training sample $Z^n = (Z_1,\dots,Z_n)$ as input and produces a function $\widehat{f}_n : \mathbb{R}^d \to \mathbb{R}$ from some class $\mathcal{F}$. Suppose that $\mathcal{F}$ and the surrogate loss $\varphi$ are chosen so that the following conditions are satisfied:

1. There exists some constant $B > 0$ such that

$$\sup_{(x,y) \in \mathbb{R}^d \times \{-1,1\}}\ \sup_{f \in \mathcal{F}} \varphi(-y f(x)) \le B.$$

2. There exists some constant $M_\varphi > 0$ such that $\varphi$ is $M_\varphi$-Lipschitz, i.e.,

$$|\varphi(u) - \varphi(v)| \le M_\varphi |u - v| \qquad \text{for all } u, v \in \mathbb{R}.$$

Then for any $n$ and any $\delta \in (0,1)$ the following bound holds with probability at least $1 - \delta$:

$$L(\widehat{f}_n) \le A_{\varphi,n}(\widehat{f}_n) + 4 M_\varphi\, \mathbb{E} R_n(\mathcal{F}(X^n)) + B\sqrt{\frac{\log(1/\delta)}{2n}}. \tag{13}$$
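Before the proof, here is a small sketch of the surrogate losses listed above and of the empirical $\varphi$-risk $A_{\varphi,n}(f)$. The toy data and the scoring function $f$ are illustrative assumptions; the point is simply to check (12) numerically and to show how the constants of Theorem 3 can be read off (for example, if every $f \in \mathcal{F}$ satisfies $|f(x)| \le 1$, then for the hinge loss one may take $B = 2$ and $M_\varphi = 1$).

```python
import numpy as np

# The three surrogate losses listed above; each satisfies phi(x) >= 1{x > 0}.
exponential = lambda x: np.exp(x)
logit       = lambda x: np.log2(1.0 + np.exp(x))
hinge       = lambda x: np.maximum(x + 1.0, 0.0)

def empirical_phi_risk(phi, f, X, Y):
    """A_{phi,n}(f) = (1/n) sum_i phi(-Y_i f(X_i))."""
    return np.mean(phi(-Y * f(X)))

# Toy sample and an arbitrary scoring function with |f| <= 1 (illustrative assumptions).
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 2))
Y = np.where(rng.random(1000) < 0.5, 1, -1)
f = lambda x: np.tanh(x[..., 0])

# Check the empirical half of (12): L_n(f) <= A_{phi,n}(f) for each surrogate.
L_n = np.mean(Y * f(X) < 0)
for phi in (exponential, logit, hinge):
    assert L_n <= empirical_phi_risk(phi, f, X, Y)
```

All three choices make the empirical $\varphi$-risk a convex function of the parameters whenever $f$ is linear in its parameters, which is exactly the computational advantage mentioned at the beginning of this section.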

Proof. Using (12), we can write

$$L(\widehat{f}_n) \le A_\varphi(\widehat{f}_n) = A_{\varphi,n}(\widehat{f}_n) + A_\varphi(\widehat{f}_n) - A_{\varphi,n}(\widehat{f}_n) \le A_{\varphi,n}(\widehat{f}_n) + \sup_{f \in \mathcal{F}} \left| A_\varphi(f) - A_{\varphi,n}(f) \right|.$$

Now let $\mathcal{H}$ denote the class of functions $h : \mathbb{R}^d \times \{-1,1\} \to \mathbb{R}$ of the form $h(x,y) = -y f(x)$, $f \in \mathcal{F}$. Then

$$\sup_{f \in \mathcal{F}} \left| A_\varphi(f) - A_{\varphi,n}(f) \right| = \sup_{f \in \mathcal{F}} \left| \mathbb{E}[\varphi(-Y f(X))] - \frac{1}{n}\sum_{i=1}^n \varphi(-Y_i f(X_i)) \right| = \sup_{h \in \mathcal{H}} \left| P(\varphi \circ h) - P_n(\varphi \circ h) \right|,$$

where $\varphi \circ h(z) = \varphi(h(z))$ for every $z = (x,y) \in \mathbb{R}^d \times \{-1,1\}$. Let

$$\Delta_n(Z^n) = \sup_{h \in \mathcal{H}} \left| P(\varphi \circ h) - P_n(\varphi \circ h) \right| = \sup_{h \in \mathcal{H}} \left| P(\varphi \circ h - \varphi(0)) - P_n(\varphi \circ h - \varphi(0)) \right|,$$

where the second equality follows from the fact that adding the same constant to each $\varphi \circ h$ does not change the value of $P_n(\varphi \circ h) - P(\varphi \circ h)$. Using the familiar symmetrization argument, we can write

$$\mathbb{E}\Delta_n(Z^n) \le 2\,\mathbb{E} R_n\big(\mathcal{H}_\varphi(Z^n)\big), \tag{14}$$

where $\mathcal{H}_\varphi$ denotes the class of all functions of the form $(x,y) \mapsto \varphi(h(x,y)) - \varphi(0)$, $h \in \mathcal{H}$.

We now use a very powerful result about the Rademacher averages called the contraction principle, which states the following [LT91]: if $A \subset \mathbb{R}^n$ is a bounded set and $F : \mathbb{R} \to \mathbb{R}$ is an $M$-Lipschitz function satisfying $F(0) = 0$, then

$$R_n(F \circ A) \le 2 M R_n(A), \tag{15}$$

where $F \circ A = \{(F(a_1),\dots,F(a_n)) : a = (a_1,\dots,a_n) \in A\}$. (The proof of the contraction principle is somewhat involved, and we do not give it here.)

Consider the function $F(u) = \varphi(u) - \varphi(0)$. This function clearly satisfies $F(0) = 0$, and it is $M_\varphi$-Lipschitz, by our assumptions on $\varphi$. Moreover, from our definition of $\mathcal{H}_\varphi$, we immediately see that

$$\mathcal{H}_\varphi(Z^n) = \big\{\big(\varphi(h(Z_1)) - \varphi(0),\dots,\varphi(h(Z_n)) - \varphi(0)\big) : h \in \mathcal{H}\big\} = \{(F(h(Z_1)),\dots,F(h(Z_n))) : h \in \mathcal{H}\} = F \circ \mathcal{H}(Z^n).$$

Therefore, applying (15) to $A = \mathcal{H}(Z^n)$ and then using the resulting bound in (14), we obtain

$$\mathbb{E}\Delta_n(Z^n) \le 4 M_\varphi\, \mathbb{E} R_n\big(\mathcal{H}(Z^n)\big).$$

Furthermore, letting $\sigma^n$ be an i.i.d. Rademacher tuple independent of $Z^n$, we have

$$R_n(\mathcal{H}(Z^n)) = \frac{1}{n}\,\mathbb{E}_\sigma\left[\sup_{h \in \mathcal{H}} \left|\sum_{i=1}^n \sigma_i h(Z_i)\right|\right] = \frac{1}{n}\,\mathbb{E}_\sigma\left[\sup_{f \in \mathcal{F}} \left|\sum_{i=1}^n \sigma_i Y_i f(X_i)\right|\right] = \frac{1}{n}\,\mathbb{E}_\sigma\left[\sup_{f \in \mathcal{F}} \left|\sum_{i=1}^n \sigma_i f(X_i)\right|\right] = R_n(\mathcal{F}(X^n)),$$

which leads to

$$\mathbb{E}\Delta_n(Z^n) \le 4 M_\varphi\, \mathbb{E} R_n\big(\mathcal{F}(X^n)\big). \tag{16}$$

Now, since every function $\varphi \circ h$ is bounded between $0$ and $B$, the function $\Delta_n(Z^n)$ has bounded differences with $c_1 = \dots = c_n = B/n$. Therefore, from (16) and from McDiarmid's inequality, we have for every $t > 0$ that

$$\mathbb{P}\big(\Delta_n(Z^n) \ge 4 M_\varphi\, \mathbb{E} R_n(\mathcal{F}(X^n)) + t\big) \le \mathbb{P}\big(\Delta_n(Z^n) \ge \mathbb{E}\Delta_n(Z^n) + t\big) \le e^{-2nt^2/B^2}.$$

Choosing $t = B\sqrt{(2n)^{-1}\log(1/\delta)}$, we see that

$$\Delta_n(Z^n) \le 4 M_\varphi\, \mathbb{E} R_n(\mathcal{F}(X^n)) + B\sqrt{\frac{\log(1/\delta)}{2n}}$$

with probability at least $1 - \delta$. Therefore, since

$$L(\widehat{f}_n) \le A_{\varphi,n}(\widehat{f}_n) + \Delta_n(Z^n),$$

we see that (13) holds with probability at least $1 - \delta$.

What the above theorem tells us is that the performance of the learned classifier $\widehat{f}_n$ is controlled by the Rademacher average of the class $\mathcal{F}$, and we can always arrange for it to be relatively small. We will now look at several specific examples.

3 Weighted linear combination of classifiers

Let $\mathcal{G} = \{g : \mathbb{R}^d \to \{-1,1\}\}$ be a class of base classifiers (not to be confused with Bayes classifiers!), and consider the class

$$\mathcal{F}_\lambda = \left\{ f = \sum_{j=1}^N c_j g_j : N \in \mathbb{N},\ \sum_{j=1}^N |c_j| \le \lambda;\ g_1,\dots,g_N \in \mathcal{G} \right\},$$

where $\lambda > 0$ is a tunable parameter. Then for each $f = \sum_{j=1}^N c_j g_j \in \mathcal{F}_\lambda$ the corresponding classifier $g_f$ of the form (9) is given by

$$g_f(x) = \operatorname{sgn}\left(\sum_{j=1}^N c_j g_j(x)\right).$$

A useful way of thinking about $g_f$ is that, upon receiving a feature $x \in \mathbb{R}^d$, it computes the outputs $g_1(x),\dots,g_N(x)$ of the $N$ base classifiers from $\mathcal{G}$ and then takes a weighted majority vote; indeed, if we had $c_1 = \dots = c_N = \lambda/N$, then $\operatorname{sgn}(g_f(x))$ would precisely correspond to taking the majority vote among the $N$ base classifiers. Note, by the way, that the number $N$ of base classifiers is not fixed, and can be learned from the data.

Now, Theorem 3 tells us that the performance of any learning algorithm that accepts a training sample $Z^n$ and produces a function $\widehat{f}_n \in \mathcal{F}_\lambda$ is controlled by the Rademacher average $R_n(\mathcal{F}_\lambda(X^n))$. It turns out, moreover, that we can relate it to the Rademacher average of the base class $\mathcal{G}$. To start, note that

$$\mathcal{F}_\lambda = \lambda\,\operatorname{absconv}\mathcal{G},$$

where

$$\operatorname{absconv}\mathcal{G} = \left\{\sum_{j=1}^N c_j g_j : N \in \mathbb{N};\ \|c\|_1 = \sum_{j=1}^N |c_j| \le 1;\ g_1,\dots,g_N \in \mathcal{G}\right\}$$

is the absolute convex hull of $\mathcal{G}$. Therefore

$$R_n(\mathcal{F}_\lambda(X^n)) = \lambda\, R_n(\mathcal{G}(X^n)).$$

Now note that the functions in $\mathcal{G}$ are binary-valued. Therefore, assuming that the base class $\mathcal{G}$ is a VC class, we will have

$$R_n(\mathcal{G}(X^n)) \le C\sqrt{\frac{V(\mathcal{G})}{n}}.$$

Combining these bounds with the bound of Theorem 3, we conclude that for any $\widehat{f}_n$ selected from $\mathcal{F}_\lambda$ based on the training sample $Z^n$, the bound

$$L(\widehat{f}_n) \le A_{\varphi,n}(\widehat{f}_n) + C\lambda M_\varphi \sqrt{\frac{V(\mathcal{G})}{n}} + B\sqrt{\frac{\log(1/\delta)}{2n}}$$

will hold with probability at least $1 - \delta$, where $B$ is the uniform upper bound on $\varphi(-y f(x))$ over $f \in \mathcal{F}_\lambda$ and $(x,y) \in \mathbb{R}^d \times \{-1,1\}$, and $M_\varphi$ is the Lipschitz constant of the surrogate loss $\varphi$.

Note that the above bound involves only the VC dimension of the base class, which is typically small. On the other hand, the class $\mathcal{F}_\lambda$ obtained by forming weighted combinations of classifiers from $\mathcal{G}$ is extremely rich, and will generally have infinite VC dimension! But there is a price we pay: the first term is the empirical surrogate loss $A_{\varphi,n}(\widehat{f}_n)$, rather than the empirical classification error $L_n(\widehat{f}_n)$. However, it is possible to choose the surrogate $\varphi$ in such a way that $A_{\varphi,n}(\cdot)$ can be bounded in terms of a quantity related to the number of misclassified training examples. Here is an example. Fix a positive parameter $\gamma > 0$ and consider

$$\varphi(x) = \begin{cases} 0, & \text{if } x \le -\gamma \\ 1, & \text{if } x \ge 0 \\ 1 + x/\gamma, & \text{otherwise} \end{cases}$$

This is a valid surrogate loss with $B = 1$ and $M_\varphi = 1/\gamma$, but in addition we have $\varphi(x) \le \mathbf{1}\{x > -\gamma\}$, which implies that $\varphi(-y f(x)) \le \mathbf{1}\{y f(x) < \gamma\}$. Therefore, for any $f$ we have

$$A_{\varphi,n}(f) = \frac{1}{n}\sum_{i=1}^n \varphi(-Y_i f(X_i)) \le \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{Y_i f(X_i) < \gamma\}. \tag{17}$$

The quantity

$$L_n^\gamma(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{Y_i f(X_i) < \gamma\} \tag{18}$$

is called the margin error of $f$ (a small numerical sketch of this surrogate and of the margin error follows the list below). Notice that:

- For any $\gamma > 0$, $L_n^\gamma(f) \ge L_n(f)$.
- The function $\gamma \mapsto L_n^\gamma(f)$ is increasing.
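Here is the promised sketch (my own illustration, with assumed toy data and an assumed pool of decision stumps as the base class $\mathcal{G}$): it implements the ramp surrogate above, forms a weighted combination $f = \sum_j c_j g_j$ with $\sum_j |c_j| = \lambda$, and checks numerically the chain $L_n(f) \le A_{\varphi,n}(f) \le L_n^\gamma(f)$ that follows from (12) and (17).

```python
import numpy as np

def ramp(x, gamma):
    """The surrogate above: 0 for x <= -gamma, 1 for x >= 0, 1 + x/gamma in between.
    It has B = 1 and Lipschitz constant M_phi = 1/gamma."""
    return np.clip(1.0 + x / gamma, 0.0, 1.0)

def margin_error(f_values, Y, gamma):
    """L_n^gamma(f) = (1/n) sum_i 1{Y_i f(X_i) < gamma}; see (18)."""
    return np.mean(Y * f_values < gamma)

# Toy data and a small pool of decision stumps g(x) = sign(x^{(j)} - t) (illustrative choices).
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 3))
Y = np.where(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(500) > 0, 1, -1)
stumps = [lambda x, j=j, t=t: np.where(x[:, j] > t, 1.0, -1.0)
          for j in range(3) for t in (-0.5, 0.0, 0.5)]

# A weighted combination f = sum_j c_j g_j with ||c||_1 = lambda (weights chosen arbitrarily).
lam = 2.0
c = rng.standard_normal(len(stumps))
c *= lam / np.sum(np.abs(c))
f_values = sum(cj * g(X) for cj, g in zip(c, stumps))

gamma = 0.2
L_n = np.mean(Y * f_values < 0)                 # empirical classification error L_n(f)
A_phi_n = np.mean(ramp(-Y * f_values, gamma))   # empirical phi-risk A_{phi,n}(f)
L_gamma = margin_error(f_values, Y, gamma)      # margin error L_n^gamma(f)
assert L_n <= A_phi_n <= L_gamma
print(f"L_n = {L_n:.3f}, A_phi_n = {A_phi_n:.3f}, L_n^gamma = {L_gamma:.3f}")
```

Raising $\gamma$ makes $L_n^\gamma(f)$ larger; the theorem below shows what is gained in exchange.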

Notice also that we can write

$$L_n^\gamma(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{Y_i f(X_i) < 0\} + \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{0 \le Y_i f(X_i) < \gamma\},$$

where the first term is just $L_n(f)$, while the second term is the fraction of training examples that were classified correctly, but only with "small margin" (the quantity $Y f(X)$ is often called the margin of the classifier $f$).

Theorem 4 (Margin-based risk bound for weighted linear combinations). For any $\gamma > 0$, the bound

$$L(\widehat{f}_n) \le L_n^\gamma(\widehat{f}_n) + \frac{C\lambda}{\gamma}\sqrt{\frac{V(\mathcal{G})}{n}} + \sqrt{\frac{\log(1/\delta)}{2n}} \tag{19}$$

holds with probability at least $1 - \delta$.

Remark 1. Note that the first term on the right-hand side of (19) increases with $\gamma$, while the second term decreases with $\gamma$. Hence, if the learned classifier $\widehat{f}_n$ has a small margin error for a large $\gamma$, i.e., it classifies the training samples well and with "high confidence," then its generalization error will be small.

References

[BBL05] O. Bousquet, S. Boucheron, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9:323-375, 2005.

[DGL96] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.

[LT91] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991.
