Learnability with Rademacher Complexities


Daniel Khashabi
Fall 2013; Last Update: September 26, 2016

1 Introduction

Our goal in the study of passive supervised learning is to find a hypothesis $h$, based on a set of examples, that has small error with respect to some target function. One can improve generalization by controlling the complexity of the concept class $H$ from which we choose a hypothesis. One way to achieve this is via the ideas in VC dimension. Here we introduce Rademacher complexity as another way of handling hypothesis space complexity and, as a result, deriving generalization bounds. Here are the major differences between these results and those in the discussion of VC dimension:

- One observation in the discussion of VC dimension is that it is independent of the data distribution. In other words, its guarantees hold for any data distribution; on the other hand, the bound it gives might not be tight for certain data distributions.
- The VC dimension analysis applies to discrete problems (such as classification) and does not say anything about problems like regression.

2 Rademacher Averages/Complexities

Here we define the Rademacher complexity, which will be used in bounding risk functions.

Definition 2.1 (Rademacher Average). Let $H \subseteq F = \{f : X \to \mathbb{R}\}$ be the class of functions we are exploring, defined on a domain $X$, and let $S = \{x_i\}_{i=1}^n$ be a set of samples generated by some unknown distribution $D_X$ on the same domain. Let each $\sigma_i$ be a uniform random variable on $\{\pm 1\}$. The "empirical" Rademacher average or complexity is defined as

$$\hat{R}_S(H) = \mathbb{E}_\sigma\left[\, \sup_{f \in H} \left|\frac{1}{n}\sum_{i=1}^n \sigma_i f(x_i)\right| \;\middle|\; \{x_i\} \right]$$

and the expectation of this quantity with respect to the random samples $\{x_i\}$ is called the Rademacher average or complexity:

$$R_n(H) = \mathbb{E}_S\, \hat{R}_S(H) = \mathbb{E}_{S,\sigma}\left[\, \sup_{f \in H} \left|\frac{1}{n}\sum_{i=1}^n \sigma_i f(x_i)\right| \right]$$

Implicit assumption: the supremum over the function class $H$ is measurable.
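To make the definition concrete, the following small Monte Carlo sketch (not part of the original notes) estimates $\hat{R}_S(H)$ for a finite class of threshold functions on $[0, 1]$; the class, the sample, and the number of sign draws are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50
S = rng.uniform(0.0, 1.0, size=n)               # samples x_1, ..., x_n
thresholds = np.linspace(0.0, 1.0, 21)          # H = {x -> 1{x <= t}} for 21 thresholds t
values = (S[None, :] <= thresholds[:, None]).astype(float)  # |H| x n matrix of f(x_i)

def empirical_rademacher(values, num_draws=2000):
    """Monte Carlo estimate of E_sigma sup_f |(1/n) sum_i sigma_i f(x_i)|."""
    total = 0.0
    for _ in range(num_draws):
        sigma = rng.choice([-1.0, 1.0], size=values.shape[1])    # Rademacher signs
        total += np.abs(values @ sigma / values.shape[1]).max()  # sup over finite H
    return total / num_draws

print(f"estimated R_S(H) for n = {n}: {empirical_rademacher(values):.4f}")
```

For a richer class (more thresholds) the estimate grows, matching the correlation-with-noise intuition discussed next.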

There is a similar definition without the absolute value, which has similar properties:

$$\hat{R}^a_S(H) = \mathbb{E}_\sigma\left[\, \sup_{f \in H} \frac{1}{n}\sum_{i=1}^n \sigma_i f(x_i) \;\middle|\; \{x_i\} \right] \qquad \text{and} \qquad R^a_n(H) = \mathbb{E}_S\, \hat{R}^a_S(H)$$

Another way of writing the Rademacher complexity is the following:

$$\hat{R}_S(H) = \mathbb{E}_\sigma\left[\, \sup_{f \in H} \left|\frac{1}{n}\, f_S \cdot \sigma\right| \;\middle|\; \{x_i\} \right]$$

where $f_S = (f(x_1), \ldots, f(x_n))$ and $\sigma = (\sigma_1, \ldots, \sigma_n)$. The dot product $f_S \cdot \sigma$ measures the correlation between the function values and the random noise vector. Overall, the Rademacher complexity measures how well the function class $H$ can correlate with random noise: the richer the hypothesis class is, the better it will correlate with random noise.

Here are some useful properties of Rademacher averages.

Lemma 2.1. For any $\{x_i\}_{i=1}^n$ and any function classes $F$ and $H$ that map $X \to \mathbb{R}$:

1. If $H \subseteq F$ then $\hat{R}_S(H) \le \hat{R}_S(F)$.
2. For any function $h : X \to \mathbb{R}$, $\hat{R}^a_S(F + h) = \hat{R}^a_S(F)$.
3. If $\mathrm{cvx}(F) = \{x \mapsto \mathbb{E}_{f \sim \pi} f(x) : \pi \in \Delta(F)\}$, where $\Delta(F)$ denotes distributions over $F$, then $\hat{R}^a_S(\mathrm{cvx}(F)) = \hat{R}^a_S(F)$.
4. $\hat{R}^a_S(F + H) = \hat{R}^a_S(F) + \hat{R}^a_S(H)$.

Proof. We prove each proposition (all expectations below are conditional on $\{x_i\}$):

1. A supremum over the smaller class is dominated by a supremum over the larger class:

$$\hat{R}_S(H) = \mathbb{E}_\sigma \sup_{f \in H} \left|\frac{1}{n}\sum_i \sigma_i f(x_i)\right| \le \mathbb{E}_\sigma \sup_{f \in F} \left|\frac{1}{n}\sum_i \sigma_i f(x_i)\right| = \hat{R}_S(F)$$

2. The shift term does not depend on $f$, so it comes out of the supremum and has zero mean:

$$\hat{R}^a_S(F + h) = \mathbb{E}_\sigma \sup_{f \in F} \frac{1}{n}\sum_i \sigma_i \big(f(x_i) + h(x_i)\big) = \mathbb{E}_\sigma \sup_{f \in F} \frac{1}{n}\sum_i \sigma_i f(x_i) + \mathbb{E}_\sigma\, \frac{1}{n}\sum_i \sigma_i h(x_i) = \hat{R}^a_S(F) + 0 = \hat{R}^a_S(F)$$

3. A linear functional over a convex set attains its supremum at the corners (extreme points) of the set:

$$\hat{R}^a_S(\mathrm{cvx}(F)) = \mathbb{E}_\sigma \sup_{\pi \in \Delta(F)} \frac{1}{n}\sum_i \sigma_i\, \mathbb{E}_{f \sim \pi} f(x_i) = \mathbb{E}_\sigma \sup_{\pi \in \Delta(F)} \mathbb{E}_{f \sim \pi}\, \frac{1}{n}\sum_i \sigma_i f(x_i) = \mathbb{E}_\sigma \sup_{f \in F} \frac{1}{n}\sum_i \sigma_i f(x_i) = \hat{R}^a_S(F)$$

4. Without the absolute value, the supremum over a sum splits:

$$\hat{R}^a_S(F + H) = \mathbb{E}_\sigma \sup_{f \in F,\, h \in H} \frac{1}{n}\sum_i \sigma_i \big(f(x_i) + h(x_i)\big) = \mathbb{E}_\sigma \sup_{f \in F} \frac{1}{n}\sum_i \sigma_i f(x_i) + \mathbb{E}_\sigma \sup_{h \in H} \frac{1}{n}\sum_i \sigma_i h(x_i) = \hat{R}^a_S(F) + \hat{R}^a_S(H)$$
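Property 2 can be verified exactly for tiny $n$ by enumerating all $2^n$ sign vectors; the class below (five random functions recorded by their values on the sample) and the shift $h$ are arbitrary illustrative choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 8
F = rng.normal(size=(5, n))   # rows are (f(x_1), ..., f(x_n)) for five functions f
h = rng.normal(size=n)        # values of a fixed function h on the same sample

def rademacher_no_abs(values):
    """Exact E_sigma sup_f (1/n) sum_i sigma_i f(x_i), enumerating all 2^n sign vectors."""
    n = values.shape[1]
    total = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        total += (values @ np.array(signs) / n).max()
    return total / 2 ** n

# Adding a fixed h to every f leaves R^a unchanged,
# since E_sigma (1/n) sum_i sigma_i h(x_i) = 0.
print(abs(rademacher_no_abs(F) - rademacher_no_abs(F + h)) < 1e-9)
```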

Lemma 2.2. Let $F(x)$ be a real-valued CDF and let $\mathcal{F}$ be the class of indicator functions on half-intervals, which define the empirical CDF

$$\hat{F}_S(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i \le x\}$$

with $S = (X_1, \ldots, X_n)$. Then

$$\mathbb{E}_S \sup_{x \in \mathbb{R}} \left|\hat{F}_S(x) - F(x)\right| \le 2 R_n(\mathcal{F})$$

Proof. The trick commonly used here is converting the expectation into an empirical mean by introducing fake/ghost samples $S' = (X'_1, \ldots, X'_n)$ and symmetrizing:

$$\mathbb{E}_S \sup_{x \in \mathbb{R}} |\hat{F}_S(x) - F(x)| = \mathbb{E}_S \sup_{x \in \mathbb{R}} |\hat{F}_S(x) - \mathbb{E}_{S'}\hat{F}_{S'}(x)| \le \mathbb{E}_{S,S'} \sup_{x \in \mathbb{R}} |\hat{F}_S(x) - \hat{F}_{S'}(x)|$$

$$= \mathbb{E}_{S,S'} \sup_{x \in \mathbb{R}} \left|\frac{1}{n}\sum_i \big(\mathbb{1}\{X_i \le x\} - \mathbb{1}\{X'_i \le x\}\big)\right| \stackrel{d}{=} \mathbb{E}_{S,S',\sigma} \sup_{x \in \mathbb{R}} \left|\frac{1}{n}\sum_i \sigma_i\big(\mathbb{1}\{X_i \le x\} - \mathbb{1}\{X'_i \le x\}\big)\right|$$

$$\le 2\,\mathbb{E}_{S,\sigma} \sup_{x \in \mathbb{R}} \left|\frac{1}{n}\sum_i \sigma_i \mathbb{1}\{X_i \le x\}\right| = 2 R_n(\mathcal{F}), \qquad \text{for } \mathcal{F} = \text{half-intervals}$$

The first inequality is Jensen's, the distributional equality holds because swapping $X_i$ and $X'_i$ does not change the joint distribution, and the last inequality is the triangle inequality.
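The lemma can be watched in action with a quick simulation (an illustrative sketch, taking $F = \mathrm{Uniform}[0,1]$ so that $F(x) = x$): the expected Kolmogorov-Smirnov distance decays at the $1/\sqrt{n}$ rate the Rademacher bound predicts.

```python
import numpy as np

rng = np.random.default_rng(2)

def expected_ks(n, trials=400):
    """Monte Carlo estimate of E_S sup_x |F_S(x) - F(x)| for F = Uniform[0,1]."""
    total = 0.0
    for _ in range(trials):
        x = np.sort(rng.uniform(size=n))
        grid = np.arange(1, n + 1) / n
        # for a continuous F the supremum is attained at the sample points
        total += max(np.max(grid - x), np.max(x - (grid - 1.0 / n)))
    return total / trials

for n in [50, 200, 800]:
    print(f"n = {n:4d}   E sup|F_S - F| ~ {expected_ks(n):.4f}   1/sqrt(n) = {1/np.sqrt(n):.4f}")
```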

It turns out that this observation is not specific to the CDF setting: the following lemma generalizes the bounding technique to any loss function.

Lemma 2.3. Given a class of functions $H = \{f : X \to \mathbb{R}\}$ defined on a domain $X$, we have the following general bound in terms of the Rademacher average:

$$\mathbb{E}_S \sup_{f \in H} \left|\mathbb{E} f - \hat{\mathbb{E}}_S f\right| \le 2 R_n(H)$$

with $S = (x_1, \ldots, x_n)$ and $\hat{\mathbb{E}}_S f = \frac{1}{n}\sum_{i=1}^n f(x_i)$.

Proof. The steps of the previous proof go through with minor changes. Again we convert the expectation into an empirical mean by introducing fake/ghost samples $S'$ and symmetrizing:

$$\mathbb{E}_S \sup_{f \in H} \left|\mathbb{E} f - \hat{\mathbb{E}}_S f\right| = \mathbb{E}_S \sup_{f \in H} \left|\mathbb{E}_{S'}\hat{\mathbb{E}}_{S'} f - \hat{\mathbb{E}}_S f\right| \le \mathbb{E}_{S,S'} \sup_{f \in H} \left|\hat{\mathbb{E}}_{S'} f - \hat{\mathbb{E}}_S f\right|$$

$$= \mathbb{E}_{S,S',\sigma} \sup_{f \in H} \left|\frac{1}{n}\sum_i \sigma_i\big(f(x'_i) - f(x_i)\big)\right| \le 2\,\mathbb{E}_{S,\sigma} \sup_{f \in H} \left|\frac{1}{n}\sum_i \sigma_i f(x_i)\right| = 2 R_n(H)$$

With the following lemma we show how Rademacher averages behave under Lipschitz maps.

Lemma 2.4 (Ledoux-Talagrand contraction). Let $f : \mathbb{R}^+ \to \mathbb{R}^+$ be a convex and increasing function. Also let $\phi_i : \mathbb{R} \to \mathbb{R}$ satisfy $\phi_i(0) = 0$ with Lipschitz constant $L$ (for any $x, y \in \mathbb{R}$, $|\phi_i(x) - \phi_i(y)| \le L|x - y|$). Then for any $T \subseteq \mathbb{R}^n$,

$$\mathbb{E}_\sigma f\left(\frac{1}{2}\sup_{t \in T}\left|\sum_i \sigma_i \phi_i(t_i)\right|\right) \le \mathbb{E}_\sigma f\left(L \sup_{t \in T}\left|\sum_i \sigma_i t_i\right|\right)$$

Proof. Via the definition of the Rademacher average and properties of convex functions.

The above lemma yields the following bound:

Corollary 2.5. Let $F$ be a class of functions with domain $X$ and let $\phi(\cdot)$ be an $L$-Lipschitz map from $\mathbb{R}$ to $\mathbb{R}$ with $\phi(0) = 0$. Define the composition of the map with the class as $\phi \circ F = \{\phi \circ f : f \in F\}$. Then

$$R_n(\phi \circ F) \le 2 L\, R_n(F)$$

Proof. In the previous lemma, take the convex increasing function to be the identity.
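A Monte Carlo sanity check of Corollary 2.5 (an illustrative sketch, not from the notes): the map $\phi(t) = \max(0, t)$ is 1-Lipschitz with $\phi(0) = 0$, and the random class $F$ below, recorded by its values on the sample, is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 40, 30
F = rng.normal(size=(m, n))         # rows are function values (f(x_1), ..., f(x_n))

def rad(values, draws=4000):
    """Monte Carlo estimate of E_sigma sup_f |(1/n) sum_i sigma_i f(x_i)|."""
    sig = rng.choice([-1.0, 1.0], size=(draws, values.shape[1]))
    return np.abs(sig @ values.T / values.shape[1]).max(axis=1).mean()

phi = lambda t: np.maximum(t, 0.0)  # 1-Lipschitz, phi(0) = 0
lhs, rhs = rad(phi(F)), 2 * 1 * rad(F)
print(f"R(phi o F) ~ {lhs:.4f}  <=  2 L R(F) ~ {rhs:.4f}")
```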

2.1 Rademacher complexity of linear classes

Here we analyze the Rademacher complexity of the following linear classes. These results will come in handy in analyzing the generalization bounds of many forthcoming problems involving linear models. Define the following classes:

$$H_1 = \{x \mapsto \langle x, w\rangle : \|w\|_1 \le 1\}, \qquad H_2 = \{x \mapsto \langle x, w\rangle : \|w\|_2 \le 1\}$$

Lemma 2.6. Let $S = (x_1, \ldots, x_n)$. Then

$$\hat{R}^a_S(H_2) \le \frac{\max_i \|x_i\|_2}{\sqrt{n}}$$

Proof. By linearity of the inner product, duality of the $\ell_2$ norm, and Jensen's inequality:

$$\hat{R}^a_S(H_2) = \mathbb{E}_\sigma \sup_{w : \|w\|_2 \le 1} \frac{1}{n}\sum_i \sigma_i \langle x_i, w\rangle = \mathbb{E}_\sigma \sup_{w : \|w\|_2 \le 1} \left\langle \frac{1}{n}\sum_i \sigma_i x_i,\, w \right\rangle = \frac{1}{n}\,\mathbb{E}_\sigma \left\|\sum_i \sigma_i x_i\right\|_2 \le \frac{1}{n}\sqrt{\mathbb{E}_\sigma \left\|\sum_i \sigma_i x_i\right\|_2^2}$$

Since the Rademacher random variables are independent of each other, we have:

$$\mathbb{E}_\sigma \left\|\sum_i \sigma_i x_i\right\|_2^2 = \sum_{i,j} \langle x_i, x_j\rangle\, \mathbb{E}_\sigma[\sigma_i\sigma_j] = \sum_{i \ne j} \langle x_i, x_j\rangle\, \mathbb{E}_\sigma[\sigma_i]\,\mathbb{E}_\sigma[\sigma_j] + \sum_i \langle x_i, x_i\rangle\, \mathbb{E}_\sigma[\sigma_i^2] = \sum_i \|x_i\|_2^2 \le n \max_i \|x_i\|_2^2$$

Combining the two displays proves the claim.

Lemma 2.7. Let $S = (x_1, \ldots, x_n)$ with $x_i \in \mathbb{R}^d$. Then

$$\hat{R}^a_S(H_1) \le \max_i \|x_i\|_\infty \sqrt{\frac{2\log 2d}{n}}$$

Proof. By the same argument, with $\ell_1$/$\ell_\infty$ duality:

$$\hat{R}^a_S(H_1) = \mathbb{E}_\sigma \sup_{w : \|w\|_1 \le 1} \frac{1}{n}\sum_i \sigma_i \langle x_i, w\rangle = \mathbb{E}_\sigma \sup_{w : \|w\|_1 \le 1} \left\langle \frac{1}{n}\sum_i \sigma_i x_i,\, w \right\rangle = \frac{1}{n}\,\mathbb{E}_\sigma \left\|\sum_i \sigma_i x_i\right\|_\infty$$

The last step is bounded via the Finite Class Lemma (see Lemma 4.1), since the supremum over the $\ell_1$ ball is attained at one of its $2d$ vertices.
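A numerical check of Lemma 2.6 (an illustrative sketch): since the supremum of $\langle v, w\rangle$ over the $\ell_2$ unit ball equals $\|v\|_2$, the quantity $\frac{1}{n}\mathbb{E}_\sigma\|\sum_i \sigma_i x_i\|_2$ is exactly $\hat{R}^a_S(H_2)$, and it stays below $\max_i \|x_i\|_2/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 5
X = rng.normal(size=(n, d))                          # sample x_1, ..., x_n in R^d

draws = 5000
sig = rng.choice([-1.0, 1.0], size=(draws, n))
lhs = np.linalg.norm(sig @ X, axis=1).mean() / n     # (1/n) E || sum_i sigma_i x_i ||_2
rhs = np.linalg.norm(X, axis=1).max() / np.sqrt(n)   # max_i ||x_i||_2 / sqrt(n)
print(f"R^a_S(H_2) ~ {lhs:.4f}  <=  {rhs:.4f}")
```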

3 Generalization bounds

Here is the main theorem, which contains the generalization bounds via Rademacher complexity:

Theorem 3.1. Let $F$ be a class of functions defined on a domain $X$ and mapping to $[0, 1]$. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$, for all $f \in F$:

$$\mathbb{E} f(X) \le \hat{\mathbb{E}}_n f(X) + 2 R_n(F) + \sqrt{\frac{\log 1/\delta}{2n}}$$

Also, with probability at least $1 - \delta$, for all $f \in F$:

$$\mathbb{E} f(X) \le \frac{1}{n}\sum_i f(x_i) + 2 \hat{R}_S(F) + 5\sqrt{\frac{\log 2/\delta}{2n}}$$

Similar results hold with the other definition of the Rademacher average:

Theorem 3.2. Let $F$ be a class of functions defined on a domain $X$ and mapping to $[0, 1]$. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$, for all $f \in F$:

$$\mathbb{E} f(X) \le \hat{\mathbb{E}}_n f(X) + 2 R^a_n(F) + \sqrt{\frac{\log 1/\delta}{2n}}$$

Also, with probability at least $1 - \delta$, for all $f \in F$:

$$\mathbb{E} f(X) \le \frac{1}{n}\sum_i f(x_i) + 2 \hat{R}^a_S(F) + 3\sqrt{\frac{\log 2/\delta}{2n}}$$

A side note before jumping into the proof: usually in practice the set $F$ is a composition of the input space $X$, the hypothesis functions $H$, and the loss family $\ell$ which measures the quality of the learning: $F = \ell \circ H \circ S$. For example, for SVM, $H$ is the space of linear classifiers and $\ell$ is a margin-based (hard/soft) loss. Another issue worth pointing out is that here we assumed the range of the functions in $F$ is bounded inside $[0, 1]$. If instead the range is $[0, c]$, a coefficient $c$ appears before $\sqrt{\frac{\log 2/\delta}{2n}}$ (easy to verify through the proof).

Proof. For a sample set $S = (x_1, x_2, \ldots, x_n)$, define the following function:

$$\Phi_S(F) = \sup_{f \in F}\left(\mathbb{E} f - \frac{1}{n}\sum_i f(x_i)\right)$$

The proof uses McDiarmid's bound on the function $\Phi_S(F)$. Define the sample set $S'$ to be exactly the same as $S$, except for one differing sample $x'_j$ in place of $x_j$:

$$\Phi_S(F) - \Phi_{S'}(F) \le \sup_{f \in F}\left(\frac{1}{n}\sum_{x_i \in S'} f(x_i) - \frac{1}{n}\sum_{x_i \in S} f(x_i)\right) = \sup_{f \in F} \frac{f(x'_j) - f(x_j)}{n} \le \frac{1}{n}$$

Here we used the fact that the supremum of a difference is bigger than the difference of suprema, and the implicit assumption that the functions are bounded between 0 and 1. By symmetry, $|\Phi_S(F) - \Phi_{S'}(F)| \le 1/n$. Using this bounded-differences property of $\Phi(\cdot)$ and McDiarmid's inequality, we have:

$$\Phi_S(F) \le \mathbb{E}_S \Phi_S(F) + \sqrt{\frac{\log 2/\delta}{2n}}, \quad \text{with probability at least } 1 - \delta/2$$

Note that using Lemma 2.3 we know $\mathbb{E}_S \Phi_S(F) \le 2 R_n(F)$, which gives the first inequality (with $\delta/2$ replaced by $\delta$). To get the second inequality, we apply McDiarmid's bound to the Rademacher average itself:

$$R_n(F) \le \hat{R}_S(F) + \sqrt{\frac{\log 2/\delta}{2n}}, \quad \text{with probability at least } 1 - \delta/2$$

Combining this with the previous result via a union bound gives the second inequality in the statement of the theorem, with the stated constants.
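To see the bound in use, here is a minimal sketch (with made-up numbers, not from the notes) that evaluates the second inequality of Theorem 3.1, including the range coefficient $c$ from the side note:

```python
import math

def risk_bound(emp_risk, emp_rademacher, n, delta=0.05, c=1.0):
    """Upper bound on E f from the second inequality of Theorem 3.1,
    for functions taking values in [0, c]."""
    return emp_risk + 2 * emp_rademacher + 5 * c * math.sqrt(math.log(2 / delta) / (2 * n))

# e.g. empirical risk 0.10, estimated hat-R_S(F) = 0.03, n = 2000 samples:
print(round(risk_bound(0.10, 0.03, 2000), 4))   # holds with probability >= 0.95
```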

3.1 Concentration bounds for binary classification

We start with a few examples, and then move to more general theorems.

Example 3.3. Let $f : X \to \{0, 1\}$, and let $(X_i, Y_i) \in X \times \{0, 1\}$ be i.i.d. samples from the joint distribution $P_{XY}$. Consider the empirical risk defined as

$$L_n(f) = \frac{1}{n}\sum_i \mathbb{1}\{f(X_i) \ne Y_i\}$$

1. Prove that for any $f \in F$, with probability at least $1 - \delta$,

$$L(f) - L_n(f) \le \sqrt{\frac{2 L(f)\log(1/\delta)}{n}} + \frac{2\log(1/\delta)}{3n} \qquad (1)$$

Hint: use Bernstein's inequality.

2. Use the result of the previous part to show that, for any $f \in F$, with probability at least $1 - \delta$,

$$L(f) \le L_n(f) + \sqrt{\frac{2 L_n(f)\log(1/\delta)}{n}} + \frac{4\log(1/\delta)}{n}$$

Use this to prove that if the ERM solution predicts every training example correctly, i.e., if $L_n(\hat{f}_n) = 0$, then

$$L(\hat{f}_n) \le \frac{4\log(|F|/\delta)}{n}$$

with probability at least $1 - \delta$. This bound also holds when the relationship between $X$ and $Y$ is deterministic. Hint: use the fact that for any $a, b, c \in \mathbb{R}$ with $a - b \le c\sqrt{a}$, we have $a - b \le c^2 + c\sqrt{b}$.

Lemma 3.4 (Bernstein's inequality). If $U_1, \ldots, U_n$ are i.i.d. Bernoulli random variables with parameter $p$, then

$$P\left(\frac{1}{n}\sum_i U_i < p - \epsilon\right) \le \exp\left(-\frac{n\epsilon^2}{2p + 2\epsilon/3}\right) \qquad (2)$$

3.2 Generalization bound for hard SVM using Rademacher complexity

Here we prove a generalization bound for hard SVM. We will resort to Theorem 3.2, which contains the generalization bounds based on the $R^a$ definition of the Rademacher average. For SVM, the hypothesis space is a class of linear predictors,

$$H = \{\langle w, x\rangle : w \in \mathbb{R}^d\}$$

with the hinge loss $\ell(x, y; w) = \max\{0, 1 - y\langle w, x\rangle\}$ as the loss function. Define $F = \ell \circ H \circ S$. Since the hinge loss is 1-Lipschitz, and assuming that $\|x\| \le R$ and $\|w\| \le B$, Lemma 2.6 together with Corollary 2.5 gives

$$R_n(F) \le \frac{BR}{\sqrt{n}}$$

In general, for any $\rho$-Lipschitz loss, $R_n(F) \le \rho BR/\sqrt{n}$. Plugging this into Theorem 3.2 we get the following risk bound for SVM:

$$\mathbb{E} f(X) \le \hat{\mathbb{E}}_n f(X) + \frac{2BR}{\sqrt{n}} + \sqrt{\frac{\log 1/\delta}{2n}}, \quad \text{with probability at least } 1 - \delta$$

So how should we interpret this? Suppose we know the minimizer of the empirical risk, which we denote by $w^*$ and which has zero empirical risk. Take $B = \|w^*\|$, so $H$ can simply be the set of linear classifiers with norm at most $B$. The functions in $F$ then take values in $[0,\, 1 + R\|w^*\|]$, so by the side note after Theorem 3.2 the risk bound can be refined to

$$\mathbb{E} f(X) \le \hat{L} + \frac{2R\|w^*\|}{\sqrt{n}} + (1 + R\|w^*\|)\sqrt{\frac{\log 1/\delta}{2n}}, \quad \text{with probability at least } 1 - \delta$$

With this risk bound, one can show that the sample complexity of hard-SVM scales as $\frac{R^2\|w^*\|^2}{\epsilon^2}$.

In practice, $\|w^*\|$ is not known. One way to fix this is to use the doubling trick on the weight-norm bound $B$: take $B_i = 2^i$, let $H_i$ be all the linear models with weight norm at most $B_i$, and let $\delta_i = \delta/2^i$. For each $i$ we can write an inequality for the risk; a union bound over all of these inequalities gives a unified bound that holds for all $w$ simultaneously.
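The refined bound can be packaged as a small calculator (a sketch under the assumptions above: hinge loss, $\|x\| \le R$, $B = \|w^*\|$; the constant in the sample-complexity helper is illustrative, since the analysis only pins down the rate):

```python
import math

def hard_svm_bound(emp_hinge, R, w_norm, n, delta=0.05):
    """Risk bound from Section 3.2: empirical hinge loss, plus the Rademacher
    term 2 R ||w*|| / sqrt(n), plus a (1 + R ||w*||)-scaled confidence term."""
    return (emp_hinge + 2 * R * w_norm / math.sqrt(n)
            + (1 + R * w_norm) * math.sqrt(math.log(1 / delta) / (2 * n)))

def sample_complexity(R, w_norm, eps):
    """n on the order of R^2 ||w*||^2 / eps^2 suffices for excess risk eps."""
    return math.ceil((R * w_norm / eps) ** 2)

print(round(hard_svm_bound(0.0, R=1.0, w_norm=3.0, n=10_000), 4))
print(sample_complexity(R=1.0, w_norm=3.0, eps=0.1))
```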

4 Glivenko-Cantelli Theorem

The Glivenko-Cantelli theorem guarantees uniform convergence of empirical distributions. Our characterization of it is based on the Rademacher bound and the Finite Class Lemma, though this is not the only way to derive these results. First we introduce the Finite Class Lemma, which is a tool for bounding Rademacher averages.

Lemma 4.1 (Finite Class Lemma (Massart)). Let $A$ be some finite subset of $\mathbb{R}^m$, let $\{\sigma_i\}_{i=1}^m$ be independent Rademacher random variables, and let $L = \max_{a \in A} \|a\|_2$. Then

$$R_m(A) = \mathbb{E}\left[\max_{a \in A} \frac{1}{m}\sum_{i=1}^m \sigma_i a_i\right] \le \frac{L\sqrt{2\log|A|}}{m}$$

Proof. Define

$$\mu = \mathbb{E}\left[\max_{a \in A} \sum_{i=1}^m \sigma_i a_i\right]$$

For any $\lambda > 0$, by Jensen's inequality and the sub-Gaussianity of Rademacher variables ($\mathbb{E}\, e^{\lambda\sigma x} \le e^{\lambda^2 x^2/2}$),

$$e^{\lambda\mu} \le \mathbb{E}\exp\left(\lambda \max_{a \in A}\sum_i \sigma_i a_i\right) = \mathbb{E}\max_{a \in A}\exp\left(\lambda\sum_i \sigma_i a_i\right) \le \sum_{a \in A}\prod_{i=1}^m \mathbb{E}\exp(\lambda\sigma_i a_i) \le \sum_{a \in A}\prod_{i=1}^m \exp(\lambda^2 a_i^2/2) \le |A|\exp(\lambda^2 L^2/2)$$

Taking logarithms,

$$\mu \le \frac{\ln|A|}{\lambda} + \frac{\lambda L^2}{2}$$

Set $\lambda = \sqrt{2\ln|A|}/L$, and we will have $\mu \le L\sqrt{2\ln|A|}$; dividing by $m$ completes the proof. More details: TBW.

The Finite Class Lemma can be generalized to classes of binary-valued functions. Let $F = \{f : Z \to \{0, 1\}\}$ be a class of binary-valued functions. In other words, given random samples $\{Z_i\}_{i=1}^n$, define

$$F(Z_1^n) \triangleq \{(f(Z_1), \ldots, f(Z_n)) : f \in F\}$$

We generalize the bound using the Rademacher bound for this class of functions:

Lemma 4.2 (Rademacher bound for binary-valued functions). For a class of binary-valued functions $F$,

$$R_n(F(Z_1^n)) \le \sqrt{\frac{2\log|F(Z_1^n)|}{n}}$$

Proof. See Section 7.
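Lemma 4.1 is easy to check numerically (an illustrative sketch with a random finite set $A$):

```python
import numpy as np

rng = np.random.default_rng(5)
m, N = 60, 40
A = rng.uniform(-1.0, 1.0, size=(N, m))        # finite set A in R^m with |A| = N
L = np.linalg.norm(A, axis=1).max()            # L = max_a ||a||_2

draws = 5000
sig = rng.choice([-1.0, 1.0], size=(draws, m))
lhs = (sig @ A.T).max(axis=1).mean() / m       # E max_a (1/m) sum_i sigma_i a_i
rhs = L * np.sqrt(2 * np.log(N)) / m           # Massart bound
print(f"R_m(A) ~ {lhs:.4f}  <=  {rhs:.4f}")
```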

Theorem 4.3 (Glivenko-Cantelli). Let

$$F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i \le x\}$$

Then, as $n \to \infty$,

$$\sup_x |F_n(x) - F(x)| \xrightarrow{\ a.s.\ } 0$$

Proof. The proof consists of two main parts: first, using the Rademacher average to bound the risk (Lemma 2.3), and second, using the Finite Class Lemma to bound the Rademacher average itself. More details for later.
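The two ingredients of the proof can be combined numerically (an illustrative sketch with $F = \mathrm{Uniform}[0,1]$): half-intervals realize at most $n+1$ distinct sign patterns on $n$ points, so Lemma 4.2 gives $R_n \le \sqrt{2\log(n+1)/n}$, and Lemma 2.3 doubles it.

```python
import numpy as np

rng = np.random.default_rng(6)

for n in [100, 1000, 10_000]:
    x = np.sort(rng.uniform(size=n))
    grid = np.arange(1, n + 1) / n
    ks = max(np.max(grid - x), np.max(x - (grid - 1.0 / n)))  # sup_x |F_n(x) - x|
    bound = 2 * np.sqrt(2 * np.log(n + 1) / n)                # 2 R_n via Lemmas 2.3 and 4.2
    print(f"n = {n:6d}   sup|F_n - F| = {ks:.4f}   2 R_n <= {bound:.4f}")
```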

5 Bibliographical notes

The first use of Rademacher complexity for risk bounds is probably due to [1, 2].

References

[1] Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463-482, 2002.

[2] Vladimir Koltchinskii and Dmitry Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30(1):1-50, 2002.

6 Appendix: Union bound for risk

Let us assume we have proven the following bound for any fixed $f \in F$:

$$p\big(L(f) - L_n(f) \ge a(\delta)\big) \le \delta$$

which is equivalent to

$$L(f) \le L_n(f) + b(\delta) \quad \text{with probability at least } 1 - \delta \qquad (3)$$

for some values $a, b$ (functions of the parameters). Then

$$p\big(\exists f \in F :\ L_n(f) = 0 \,\wedge\, L(f) \ge a\big) \le |F|\,\delta$$

or, equivalently,

$$L(f) \le L_n(f) + b(\delta/|F|) \quad \text{for all } f \in F, \text{ with probability at least } 1 - \delta$$

Proof.

$$p\big(\exists f \in F :\ L_n(f) = 0 \wedge L(f) \ge a\big) \le \sum_{f \in F} p\big(L_n(f) = 0 \wedge L(f) \ge a\big) \le |F|\,\delta$$

Now define $\delta' = \delta/|F|$; then, applying (3) with $\delta'$ to each $f \in F$ and taking a union bound,

$$L(f) \le L_n(f) + b(\delta') = L_n(f) + b(\delta/|F|) \quad \text{for all } f \in F, \text{ with probability at least } 1 - \delta$$

which proves our desired statement.

7 Proofs

7.1 Proof of Lemma 4.2

Proof. Since each $f$ is a binary-valued function, $F(Z_1^n) \subseteq \{0, 1\}^n$. For any set of samples $\{Z_i\}_{i=1}^n$ and any function $f \in F$, we know

$$\sqrt{\sum_{i=1}^n f(Z_i)^2} \le \sqrt{n}$$

For a fixed set of random samples $\{Z_i\}_{i=1}^n$, the set $F(Z_1^n) \triangleq \{(f(Z_1), \ldots, f(Z_n)) : f \in F\}$ plays the role of the set $A$ in Lemma 4.1, with $|A| = |F(Z_1^n)| \le 2^n$ and $L = \sqrt{n}$. As such,

$$R_n(F(Z_1^n)) \le \frac{\sqrt{n}\,\sqrt{2\log|F(Z_1^n)|}}{n} = \sqrt{\frac{2\log|F(Z_1^n)|}{n}}$$

8 Answers

Here, answers to some of the questions are included. The answers are mostly by the author and might be buggy, so read cautiously!

8.1 Answer to Example 3.3

8.1.1 First part

We first use Bernstein's inequality and simplify it. Consider Equation (2) and set $\delta = \exp\left(-\frac{n\epsilon^2}{2p + 2\epsilon/3}\right)$. Rearranging gives a quadratic in $\epsilon$:

$$\epsilon^2 - \frac{2\ln(1/\delta)}{3n}\,\epsilon - \frac{2p\ln(1/\delta)}{n} = 0 \implies \epsilon = \frac{1}{2}\left(\frac{2\ln(1/\delta)}{3n} \pm \sqrt{\left(\frac{2\ln(1/\delta)}{3n}\right)^2 + \frac{8p\ln(1/\delta)}{n}}\right)$$

Based on the assumption of the inequality, $\epsilon \ge 0$, so we choose the root with the $+$ sign in the above equation. Using $\sqrt{u^2 + v} \le u + \sqrt{v}$ for $u, v \ge 0$, this root is at most $\frac{2\ln(1/\delta)}{3n} + \sqrt{\frac{2p\ln(1/\delta)}{n}}$. Using this simplification, we can rewrite Bernstein's inequality in the following equivalent form:

$$\mathbb{E} U - \frac{1}{n}\sum_i U_i \le \sqrt{\frac{2p\ln(1/\delta)}{n}} + \frac{2\ln(1/\delta)}{3n}, \quad \text{with probability at least } 1 - \delta$$

Now, for a specific $f \in F$, we can consider $U_i = \mathbb{1}\{Y_i \ne f(X_i)\}$, a Bernoulli variable with success probability $p = \mathbb{E} U = L(f)$. The empirical mean of these Bernoulli variables is $\frac{1}{n}\sum_i U_i = L_n(f)$. Thus we can rewrite the bound as

$$L(f) - L_n(f) \le \sqrt{\frac{2L(f)\ln(1/\delta)}{n}} + \frac{2\ln(1/\delta)}{3n}, \quad \text{with probability at least } 1 - \delta$$

which proves the desired result.
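The quadratic-root simplification above is easy to double-check numerically (an illustrative sketch with made-up values of $n$, $p$, $\delta$):

```python
import math

n, p, delta = 1000, 0.2, 0.05
log_term = math.log(1 / delta)

# Exact positive root of: n eps^2 = (2 p + 2 eps / 3) log(1/delta)
b = 2 * log_term / (3 * n)
c = 2 * p * log_term / n
eps_exact = (b + math.sqrt(b * b + 4 * c)) / 2

# Simplified bound used in the text, via sqrt(u^2 + v) <= u + sqrt(v)
eps_bound = b + math.sqrt(c)
print(f"exact root {eps_exact:.5f} <= simplified bound {eps_bound:.5f}")
```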

8.1.2 Second part

We use the hint on the bound found in the previous part, Equation (1), with the following definitions:

$$a = L(f), \qquad b = L_n(f) + \frac{2\log(1/\delta)}{3n}, \qquad c = \sqrt{\frac{2\log(1/\delta)}{n}}$$

Equation (1) says $a - b \le c\sqrt{a}$, so the hint implies the following inequality:

$$L(f) \le L_n(f) + \frac{2\log(1/\delta)}{3n} + \frac{2\log(1/\delta)}{n} + \sqrt{\frac{2\log(1/\delta)}{n}\left(L_n(f) + \frac{2\log(1/\delta)}{3n}\right)}$$

We use the inequality $\sqrt{u + v} \le \sqrt{u} + \sqrt{v}$:

$$\sqrt{\frac{2\log(1/\delta)}{n}\left(L_n(f) + \frac{2\log(1/\delta)}{3n}\right)} \le \sqrt{\frac{2L_n(f)\log(1/\delta)}{n}} + \frac{2\log(1/\delta)}{\sqrt{3}\,n}$$

Collecting the constant terms, $\frac{2}{3} + 2 + \frac{2}{\sqrt{3}} \le 4$, so

$$L(f) \le L_n(f) + \sqrt{\frac{2L_n(f)\log(1/\delta)}{n}} + \frac{4\log(1/\delta)}{n}$$

which proves the desired result. Now, using this bound, we prove the last part of the question. For a fixed $f \in F$ with $L_n(f) = 0$, the bound reduces to

$$L(f) \le \frac{4\log(1/\delta)}{n}$$

Since $\hat{f}_n$ is not known a priori and can be any function in the class $F$, we need to use the union bound, as in Equation (3):

$$L(\hat{f}_n) \le \frac{4\log(|F|/\delta)}{n}, \quad \text{with probability at least } 1 - \delta$$
