Dimensionality reduction in Hilbert spaces


Maxim Raginsky
October 3, 2014

Dimensionality reduction is a generic name for any procedure that takes a complicated object living in a high-dimensional (or possibly even infinite-dimensional) space and approximates it in some sense by a finite-dimensional vector. We are interested in a particular class of dimensionality reduction methods. Consider a data source that generates vectors in some Hilbert space $H$, which is either infinite-dimensional or has a finite but extremely large dimension (think $\mathbb{R}^d$ with the usual Euclidean norm, where $d$ is huge). We will assume that the vectors of interest lie in the unit ball of $H$,

$$ B(H) := \{ x \in H : \|x\| \le 1 \}, $$

where $\|x\| = \sqrt{\langle x, x \rangle}$ is the norm on $H$. We wish to represent each $x \in B(H)$ by a vector $\hat{y} \in \mathbb{R}^k$ for some fixed $k$ (if $H$ is $d$-dimensional, then of course we must have $k \le d$). For instance, $k$ may represent some storage limitation, such as a device that can store no more than $k$ real numbers (or, more realistically, $k$ double-precision floating-point numbers, which for all practical purposes can be thought of as real numbers). The mapping $x \mapsto \hat{y}$ can be thought of as an encoding rule. In addition, given $\hat{y} \in \mathbb{R}^k$, we need a decoding rule that takes $\hat{y}$ and outputs a vector $\tilde{x} \in H$ that will serve as an approximation of $x$. In general, the cascade of mappings

$$ x \xrightarrow{\ \text{encoding}\ } \hat{y} \xrightarrow{\ \text{decoding}\ } \tilde{x} $$

will be lossy, i.e., $\tilde{x} \neq x$. So, the goal is to ensure that the squared norm error $\|x - \tilde{x}\|^2$ is as small as possible. In this lecture, we will see how Rademacher complexity techniques can be used to characterize the performance of a fairly broad class of dimensionality reduction schemes in Hilbert spaces. Our exposition here is based on a beautiful recent paper of Maurer and Pontil [MP10]. We will consider a particular type of dimensionality reduction scheme, where the encoder is a (nonlinear) projection, whereas the decoder is a linear operator from $\mathbb{R}^k$ into $H$ (the Appendix contains some basic facts pertaining to linear operators between Hilbert spaces).
To specify such a scheme, we fix a pair $(Y, T)$ consisting of a closed set $Y \subseteq \mathbb{R}^k$ and a linear operator $T : \mathbb{R}^k \to H$. We call $Y$ the codebook and use the encoding rule

$$ \hat{y} = \operatorname*{argmin}_{y \in Y} \|x - T y\|. \tag{1} $$

Unless $Y$ is a closed subspace of $\mathbb{R}^k$, this encoding map will be nonlinear. The decoding, on the other hand, is linear: $\tilde{x} = T \hat{y}$. With these definitions, the reconstruction error is given by

$$ \|x - \tilde{x}\|^2 = f_T(x) := \min_{y \in Y} \|x - T y\|^2. $$
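To make the encode/decode cascade concrete, here is a minimal numerical sketch with $H = \mathbb{R}^d$ and a finite random set standing in for the codebook $Y$. All dimensions and names (`encode`, `decode`) are illustrative choices, not anything prescribed by the text or by [MP10].

```python
import numpy as np

def encode(x, T, Y):
    # nonlinear encoder (1): pick the codeword y in Y minimizing ||x - T y||
    errs = np.linalg.norm(x[None, :] - Y @ T.T, axis=1)
    return Y[int(np.argmin(errs))]

def decode(y, T):
    # linear decoder: x_tilde = T y
    return T @ y

rng = np.random.default_rng(0)
d, k = 8, 3                              # hypothetical dimensions
T = rng.standard_normal((d, k))          # a linear decoding map R^k -> H
Y = rng.standard_normal((50, k))         # a finite random codebook
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                   # put x in the unit ball B(H)

y_hat = encode(x, T, Y)
x_tilde = decode(y_hat, T)
err = float(np.linalg.norm(x - x_tilde) ** 2)   # reconstruction error f_T(x)
```

Note that the encoder is a brute-force minimization over the finite codebook, which is exactly why it is nonlinear in $x$ unless $Y$ is a subspace.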

Now suppose that the input to our dimensionality reduction scheme is a random vector $X \in B(H)$ with some unknown distribution $P$. Then we measure the performance of the coding scheme $(Y, T)$ by its expected reconstruction error

$$ L(T) := \mathbb{E}_P[f_T(X)] = \mathbb{E}_P\Big[ \min_{y \in Y} \|X - T y\|^2 \Big] $$

(note that, even though the reconstruction error depends on the codebook $Y$, we do not explicitly indicate this dependence, since the choice of $Y$ will be fixed by a particular application). Now let $\mathcal{T}$ be some fixed class of admissible linear decoding maps $T : \mathbb{R}^k \to H$. If we knew $P$, we could find the best decoder $T^* \in \mathcal{T}$ that achieves

$$ L^*(\mathcal{T}) := \inf_{T \in \mathcal{T}} L(T) $$

(assuming, of course, that the infimum exists and is achieved by at least one $T \in \mathcal{T}$). By now, you know the drill: we don't know $P$, but we have access to a large set of samples $X_1, \ldots, X_n$ drawn i.i.d. from $P$. So we attempt to learn $T^*$ via ERM:

$$ \hat{T}_n := \operatorname*{argmin}_{T \in \mathcal{T}} \frac{1}{n} \sum_{i=1}^n f_T(X_i) = \operatorname*{argmin}_{T \in \mathcal{T}} \frac{1}{n} \sum_{i=1}^n \min_{y \in Y} \|X_i - T y\|^2. $$

Our goal is to establish the following result:

Theorem 1. Assume that $Y$ is a closed subset of the unit ball $B^k := \{ y \in \mathbb{R}^k : \|y\| \le 1 \}$, and that every $T \in \mathcal{T}$ satisfies

$$ \max_{1 \le j \le k} \|T e_j\| \le \alpha, \qquad \|T\|_Y := \sup_{y \in Y} \|T y\| \le \alpha $$

for some finite $\alpha \ge 1$, where $e_1, \ldots, e_k$ is the standard basis of $\mathbb{R}^k$. Then

$$ L(\hat{T}_n) \le L^*(\mathcal{T}) + \frac{60 \alpha^2 k^2}{\sqrt{n}} + 4 \alpha^2 \sqrt{\frac{2 \log(1/\delta)}{n}} \tag{2} $$

with probability at least $1 - \delta$. In the special case when $Y = \{e_1, \ldots, e_k\}$, the standard basis in $\mathbb{R}^k$, the event

$$ L(\hat{T}_n) \le L^*(\mathcal{T}) + \frac{40 \alpha^2 k}{\sqrt{n}} + 4 \alpha^2 \sqrt{\frac{2 \log(1/\delta)}{n}} \tag{3} $$

holds with probability at least $1 - \delta$.

Remark 1. The above result is slightly weaker than the one from [MP10]; as a consequence, the constants in Eqs. (2) and (3) are slightly worse than they could otherwise be.

1 Examples

Before we get down to business and prove the theorem, let's look at a few examples.
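Before turning to specific choices of $(Y, \mathcal{T})$, here is a small numerical illustration of the ERM rule $\hat{T}_n$ itself, with the class $\mathcal{T}$ replaced by a toy finite collection of random decoders so that the argmin can be computed by enumeration. Everything here (dimensions, class size, codebook size) is a hypothetical stand-in for the abstract setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 6, 2, 200

def f_T(x, T, Y):
    # reconstruction error f_T(x) = min_{y in Y} ||x - T y||^2
    return float(np.min(np.linalg.norm(x[None, :] - Y @ T.T, axis=1) ** 2))

# i.i.d. sample from a P supported on the unit ball (unknown to the learner)
X = rng.standard_normal((n, d))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)

# codebook: a finite sample of the unit ball B^k
Y = rng.standard_normal((30, k))
Y /= np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1.0)

# a finite stand-in for the class of admissible decoders
Ts = [rng.standard_normal((d, k)) for _ in range(10)]

# ERM: pick the decoder with smallest empirical reconstruction error
emp_risks = [float(np.mean([f_T(x, T, Y) for x in X])) for T in Ts]
T_hat = Ts[int(np.argmin(emp_risks))]
```

Theorem 1 then quantifies how far the (population) risk of such an empirically chosen decoder can be from the best risk in the class.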

1.1 Principal component analysis (PCA)

The objective of PCA is, given $k$, to construct a projection $\Pi$ onto a $k$-dimensional closed subspace of $H$ that maximizes the average energy content of the projected vector:

$$ \text{maximize } \mathbb{E}\|\Pi X\|^2 \quad \text{subject to } \dim \Pi(H) \le k. \tag{4} $$

For any $x \in H$,

$$ \|\Pi x\|^2 = \|x\|^2 - \|(I - \Pi)x\|^2, \tag{5} $$

where $I$ is the identity operator on $H$. To prove (5), expand the right-hand side:

$$ \|x\|^2 - \|(I - \Pi)x\|^2 = \|x\|^2 - \|x\|^2 + 2\langle x, \Pi x \rangle - \|\Pi x\|^2 = 2\langle x, \Pi x \rangle - \|\Pi x\|^2 = \|\Pi x\|^2, $$

where the last step is by the properties of projections ($\langle x, \Pi x \rangle = \langle \Pi x, \Pi x \rangle$). Thus,

$$ \|\Pi x\|^2 = \|x\|^2 - \|x - \Pi x\|^2 = \|x\|^2 - \min_{x' \in K} \|x - x'\|^2, \tag{6} $$

where $K$ is the range of $\Pi$ (the closure of the linear span of all vectors of the form $\Pi x$, $x \in H$). Moreover, any projection operator $\Pi : H \to K$ with $\dim K \le k$ can be factored as $T T^*$, where $T : \mathbb{R}^k \to H$ is an isometry (see Appendix for definitions and the proof of this fact). Using this fact, we can write

$$ K = \Pi(H) = \{ T y : y \in \mathbb{R}^k \}. $$

Using this in (6), we get

$$ \|\Pi x\|^2 = \|x\|^2 - \min_{y \in \mathbb{R}^k} \|x - T y\|^2. $$

Hence, solving the optimization problem (4) is equivalent to finding the best linear decoding map $T$ for the pair $(Y, \mathcal{T})$, where $Y = \mathbb{R}^k$ and $\mathcal{T}$ is the collection of all isometries $T : \mathbb{R}^k \to H$. Moreover, if we recall our assumption that $X \in B(H)$ with probability one, then we see that there is no loss of generality if we take

$$ Y = B^k = \{ y \in \mathbb{R}^k : \|y\| \le 1 \}, $$

i.e., the unit ball in $(\mathbb{R}^k, \|\cdot\|)$. This follows from the fact that $\|\Pi x\| \le \|x\|$ for any projection $\Pi$; in particular, for $\Pi = T T^*$ the encoding $\hat{y}$ in (1) can be written as $\hat{y} = T^* x$, and since $T$ is an isometry,

$$ \|\hat{y}\| = \|T \hat{y}\| = \|T T^* x\| = \|\Pi x\| \le \|x\| \le 1. $$

Thus, Theorem 1 applies with $\alpha = 1$. That said, there are much tighter bounds for PCA that rely on deeper structural results pertaining to finite-dimensional subspaces of Hilbert spaces, but that is beside the point. The key idea here is that we can already get nice bounds using the tools already at our fingertips.
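The PCA factorization $\Pi = T T^*$ is easy to exhibit numerically: the top-$k$ right singular vectors of the data matrix give an isometry $T$ (orthonormal columns), the encoder is $\hat{y} = T^* x$, and the decoder returns $\Pi x = T T^* x$. A sketch with toy dimensions (we skip the usual mean-centering to match the energy criterion (4)):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 10, 3, 500

# toy data in H = R^d with some correlation structure
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d)) / np.sqrt(d)

# top-k right singular vectors: the columns of T are orthonormal,
# so T is an isometry from R^k into R^d and Pi = T T^* is the PCA projection
_, _, Vt = np.linalg.svd(X, full_matrices=False)
T = Vt[:k].T                     # shape (d, k)

x = X[0]
y_hat = T.T @ x                  # encoder: for isometries, argmin_y ||x - Ty|| = T^* x
x_hat = T @ y_hat                # decoder: x_hat = T T^* x = Pi x
```

The Pythagorean identity (5)-(6), $\|x - \Pi x\|^2 = \|x\|^2 - \|\Pi x\|^2$, can be checked directly on this example.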

1.2 Vector quantization, or k-means clustering

Vector quantization (or $k$-means clustering) is a procedure that takes a vector $x \in H$ and maps it to its nearest neighbor in a finite set $C = \{\xi_1, \ldots, \xi_k\} \subset H$, where $k$ is a given positive integer:

$$ \tilde{x} = \operatorname*{argmin}_{\xi \in C} \|x - \xi\|. $$

If $X$ is random with distribution $P$, then the optimal $k$-point quantizer is given by a size-$k$ set $C^* = \{\xi^*_1, \ldots, \xi^*_k\}$ that minimizes the reconstruction error

$$ \mathbb{E}_P\Big[ \min_{\xi \in C} \|X - \xi\|^2 \Big] $$

over all $C \subset H$ with $|C| \le k$. We can cast the problem of finding $C^*$ in our framework by taking $Y = \{e_1, \ldots, e_k\}$ (the standard basis in $\mathbb{R}^k$) and letting $\mathcal{T}$ be the set of all linear operators $T : \mathbb{R}^k \to H$. It is easy to see that any $C \subset H$ with $|C| \le k$ can be obtained as an image of the standard basis $\{e_1, \ldots, e_k\}$ under some linear operator $T : \mathbb{R}^k \to H$. Indeed, for any $C = \{\xi_1, \ldots, \xi_k\}$, we can just define a linear operator $T$ by $T e_j = \xi_j$, $1 \le j \le k$, and then extend it to all of $\mathbb{R}^k$ by linearity:

$$ T y = T\Big( \sum_{j=1}^k y_j e_j \Big) = \sum_{j=1}^k y_j T e_j = \sum_{j=1}^k y_j \xi_j. $$

So, another way to interpret the objective of vector quantization is as follows: given a distribution $P$ supported on $B(H)$, we seek a $k$-element set $C = \{\xi_1, \ldots, \xi_k\} \subset H$, such that the random vector $X \sim P$ can be well-approximated on average by linear combinations of the form $\sum_j y_j \xi_j$, where the vector of coefficients $y = (y_1, \ldots, y_k)$ can have only one nonzero component, which is furthermore required to be equal to $1$. In fact, there is no loss of generality in assuming that $C \subset B(H)$ as well. This is a consequence of the fact that, for any $x \in B(H)$ and any $x' \in H$, we can always find some $x'' \in B(H)$ such that $\|x - x''\| \le \|x - x'\|$. Indeed, it suffices to take $x'' = \operatorname{argmin}_{z \in B(H)} \|x' - z\|$, and it is not hard to show that this $x''$ is just $x' / \max\{1, \|x'\|\}$. Thus, Theorem 1 applies with $\alpha = 1$. Moreover, the excess risk grows linearly with the dimension $k$, cf. Eq. (3). It is not known whether this linear dependence on $k$ is optimal: there are $\Omega(\sqrt{k/n})$ lower bounds for vector quantization, but it is still an open question whether these lower bounds are tight [MP10].
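In this formulation the columns of $T$ are the codepoints $\xi_j = T e_j$, and the empirical risk is minimized in practice by Lloyd-type alternation: encode each sample to its nearest column, then move each column to the mean of its cell. A minimal sketch (plain Lloyd iterations with random initialization; all sizes are toy choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, n = 5, 4, 300

# sample points in the unit ball of H = R^d
X = rng.standard_normal((n, d))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)

def risk(X, C):
    # empirical reconstruction error: mean over i of min_j ||x_i - xi_j||^2
    d2 = np.linalg.norm(X[:, :, None] - C[None, :, :], axis=1) ** 2
    return float(np.mean(np.min(d2, axis=1)))

# columns of C play the role of T e_j = xi_j
C = X[rng.choice(n, size=k, replace=False)].T.copy()
risk0 = risk(X, C)

for _ in range(20):                          # plain Lloyd iterations
    d2 = np.linalg.norm(X[:, :, None] - C[None, :, :], axis=1) ** 2
    assign = np.argmin(d2, axis=1)           # encoder: y_hat = e_{assign[i]}
    for j in range(k):                       # decoder update: centroid of each cell
        if np.any(assign == j):
            C[:, j] = X[assign == j].mean(axis=0)

emp_risk = risk(X, C)
```

Each Lloyd step can only decrease the empirical risk, though of course it finds a local rather than global minimizer of the ERM objective.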

1.3 Nonnegative matrix factorization

Consider approximating the random vector $X \sim P$, where $P$ is supported on the unit ball $B(H)$, by linear combinations of the form $\sum_{j=1}^k y_j \xi_j$, where the real vector $y = (y_1, \ldots, y_k)$ is constrained to lie in the nonnegative orthant

$$ \mathbb{R}^k_+ := \{ y = (y_1, \ldots, y_k) \in \mathbb{R}^k : y_j \ge 0,\ 1 \le j \le k \}, $$

while the unit vectors $\xi_1, \ldots, \xi_k \in B(H)$ are constrained by the positivity condition

$$ \langle \xi_j, \xi_l \rangle \ge 0, \qquad 1 \le j, l \le k. $$

This is a generalization of the nonnegative matrix factorization (NMF) problem, originally posed by Lee and Seung [LS99]. To cast NMF in our framework, let $Y = \mathbb{R}^k_+$, and let $\mathcal{T}$ be the set of all linear operators $T : \mathbb{R}^k \to H$ such that (i) $\|T e_j\| = 1$ for all $1 \le j \le k$ and (ii) $\langle T e_j, T e_l \rangle \ge 0$ for all $1 \le j, l \le k$. Then the choice of $T$ is equivalent to the choice of $\xi_1, \ldots, \xi_k \in B(H)$, as above. Moreover, it can be shown that, for any $x \in B(H)$ and any $T \in \mathcal{T}$, the minimum of $\|x - T y\|$ over all $y \in \mathbb{R}^k_+$ is achieved at some $\hat{y} \in \mathbb{R}^k_+$ with $\|\hat{y}\| \le 1$. Thus, there is no loss of generality if we take $Y = \mathbb{R}^k_+ \cap B^k$. In this case, the conditions of Theorem 1 are satisfied with $\alpha = 1$.

1.4 Sparse coding

Take $Y$ to be the $\ell_1$ unit ball

$$ B^k_1 := \Big\{ y = (y_1, \ldots, y_k) \in \mathbb{R}^k : \|y\|_1 = \sum_{j=1}^k |y_j| \le 1 \Big\}, $$

and let $\mathcal{T}$ be the collection of all linear operators $T : \mathbb{R}^k \to H$ with $\|T e_j\| \le 1$ for all $1 \le j \le k$. In this case, the dimensionality reduction problem is to approximate a random $X \in B(H)$ by a linear combination of the form $\sum_j y_j \xi_j$, where $y = (y_1, \ldots, y_k) \in \mathbb{R}^k$ satisfies the constraint $\|y\|_1 \le 1$, while the vectors $\xi_1, \ldots, \xi_k$ belong to the unit ball $B(H)$. Then for any $y = \sum_{j=1}^k y_j e_j \in Y$ we have

$$ \|T y\| = \Big\| \sum_{j=1}^k y_j T e_j \Big\| \le \sum_{j=1}^k |y_j| \, \|T e_j\| \le \|y\|_1 \max_{1 \le j \le k} \|T e_j\| \le 1, $$

where the third step is by Hölder's inequality. Then the conditions of Theorem 1 are satisfied with $\alpha = 1$.
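For a fixed dictionary $T$, the sparse-coding encoder (1) is a convex program: minimize $\|x - T y\|^2$ over the $\ell_1$ ball. One simple way to solve it is projected gradient descent using the standard sort-based Euclidean projection onto the $\ell_1$ ball; the sketch below is one illustrative solver under toy dimensions, not a method taken from the text or from [MP10].

```python
import numpy as np

def project_l1(v, r=1.0):
    # Euclidean projection onto the l1 ball of radius r (sort-based algorithm)
    if np.abs(v).sum() <= r:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - r)[0][-1]
    theta = (css[rho] - r) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def sparse_encode(x, T, iters=300):
    # projected gradient descent on f(y) = ||x - T y||^2 over the l1 ball,
    # started at y = 0 with step 1/L, where L bounds the gradient's Lipschitz constant
    L = 2.0 * np.linalg.norm(T, 2) ** 2 + 1e-12
    y = np.zeros(T.shape[1])
    for _ in range(iters):
        y = project_l1(y - (2.0 * T.T @ (T @ y - x)) / L)
    return y

rng = np.random.default_rng(6)
d, k = 12, 5
T = rng.standard_normal((d, k))
T /= np.linalg.norm(T, axis=0)        # normalize columns: ||T e_j|| = 1, so alpha = 1
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                # x in the unit ball B(H)

y_hat = sparse_encode(x, T)
recon_err = float(np.linalg.norm(x - T @ y_hat) ** 2)
```

Since $y = 0$ is feasible and projected gradient descent with step $1/L$ is monotone, the reconstruction error can never exceed $\|x\|^2$, matching the bound $0 \le f_T(x) \le 4\alpha^2$ used later in the proof with lots of room to spare.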

2 Proof of Theorem 1

Now we turn to the proof of Theorem 1. The format of the proof is the familiar one: if we consider the empirical reconstruction error

$$ L_n(T) := \frac{1}{n} \sum_{i=1}^n f_T(X_i) = \frac{1}{n} \sum_{i=1}^n \min_{y \in Y} \|X_i - T y\|^2 $$

for every $T \in \mathcal{T}$ and define the uniform deviation

$$ \Delta_n(X^n) := \sup_{T \in \mathcal{T}} |L_n(T) - L(T)|, \tag{7} $$

then

$$ L(\hat{T}_n) \le L^*(\mathcal{T}) + 2 \Delta_n(X^n). $$

Now, for any $x \in B(H)$, any $y \in Y$, and any $T \in \mathcal{T}$, we have

$$ 0 \le \|x - T y\|^2 \le 2\|x\|^2 + 2\|T y\|^2 \le 4\alpha^2, $$

since $\|x\| \le 1$, $\|T y\| \le \alpha$, and $\alpha \ge 1$. Thus, the uniform deviation $\Delta_n(X^n)$ has bounded differences with $c_1 = \cdots = c_n = 4\alpha^2/n$, so by McDiarmid's inequality,

$$ L(\hat{T}_n) \le L^*(\mathcal{T}) + 2\, \mathbb{E}\Delta_n(X^n) + 4\alpha^2 \sqrt{\frac{2 \log(1/\delta)}{n}} \tag{8} $$

with probability at least $1 - \delta$. By the usual symmetrization argument, we obtain the bound $\mathbb{E}\Delta_n(X^n) \le 2\, \mathbb{E} R_n(F(X^n))$, where $F$ is the class of functions $f_T$ for all $T \in \mathcal{T}$. Now, the whole affair hinges on getting a good upper bound on the Rademacher averages $R_n(F(X^n))$. We will do this in several steps, and we need to introduce some additional machinery along the way.

2.1 Gaussian averages

Let $\gamma_1, \ldots, \gamma_n$ be i.i.d. standard normal random variables. In analogy to the Rademacher average of a bounded set $A \subset \mathbb{R}^n$, we can define the Gaussian average of $A$ [BM02] as

$$ G_n(A) := \mathbb{E}_\gamma \sup_{a \in A} \Big| \frac{1}{n} \sum_{i=1}^n \gamma_i a_i \Big|. $$

Lemma 1 (Gaussian averages vs. Rademacher averages).

$$ R_n(A) \le \sqrt{\frac{\pi}{2}}\, G_n(A). \tag{9} $$
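The inequality (9) is easy to sanity-check by Monte Carlo: for a small random finite set $A \subset \mathbb{R}^n$, the estimated Rademacher average should not exceed $\sqrt{\pi/2}$ times the estimated Gaussian average. A quick sketch (all sizes are arbitrary toy choices; the slack in the assertion accounts for Monte Carlo error):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, trials = 6, 5, 20000
A = rng.standard_normal((m, n))     # a small finite set A of m vectors in R^n

sig = rng.choice([-1.0, 1.0], size=(trials, n))   # Rademacher draws
gam = rng.standard_normal((trials, n))            # Gaussian draws

# empirical estimates of R_n(A) and G_n(A)
R = float(np.mean(np.max(np.abs(sig @ A.T), axis=1)) / n)
G = float(np.mean(np.max(np.abs(gam @ A.T), axis=1)) / n)
```

For generic sets the two averages are quite close, so the constant $\sqrt{\pi/2} \approx 1.25$ leaves a comfortable margin.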

Proof. Let $\sigma = (\sigma_1, \ldots, \sigma_n)$ be an $n$-tuple of i.i.d. Rademacher random variables independent of $\gamma$. Since each $\gamma_i$ is a symmetric random variable, it has the same distribution as $\sigma_i |\gamma_i|$. Therefore,

$$ G_n(A) = \frac{1}{n}\, \mathbb{E}_\gamma \sup_{a \in A} \Big| \sum_{i=1}^n \gamma_i a_i \Big| = \frac{1}{n}\, \mathbb{E}_\sigma \mathbb{E}_{|\gamma|} \sup_{a \in A} \Big| \sum_{i=1}^n \sigma_i |\gamma_i| a_i \Big| \ge \frac{1}{n}\, \mathbb{E}_\sigma \sup_{a \in A} \Big| \sum_{i=1}^n \sigma_i\, \mathbb{E}|\gamma_i|\, a_i \Big| = \mathbb{E}|\gamma_1| \cdot R_n(A), $$

where the second-to-last step is by convexity (Jensen's inequality applied to the $|\gamma_i|$'s), while in the last step we have used the fact that $\gamma_1, \ldots, \gamma_n$ are i.i.d. random variables. Now, if $\gamma$ is a standard normal random variable, then

$$ \mathbb{E}|\gamma| = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |t|\, e^{-t^2/2}\, dt = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} t\, e^{-t^2/2}\, dt = \sqrt{\frac{2}{\pi}}. $$

Rearranging, we get (9). $\square$

Gaussian averages are often easier to work with than Rademacher averages. The reason for this is that, for any real constants $a_1, \ldots, a_n$, the sum $W_a := a_1 \gamma_1 + \cdots + a_n \gamma_n$ is a Gaussian random variable with mean $0$ and variance $a_1^2 + \cdots + a_n^2$. Moreover, for any finite collection of vectors $a^{(1)}, \ldots, a^{(m)} \in A$, the random variables $W_{a^{(1)}}, \ldots, W_{a^{(m)}}$ are jointly Gaussian. Thus, the collection of random variables $(W_a)_{a \in A}$ is a zero-mean Gaussian process, where we say that a collection of real-valued random variables $(W_a)$ is a Gaussian process if all finite linear combinations of the $W_a$'s are Gaussian random variables. In particular, we can compute covariances: for any $a, a' \in A$,

$$ \mathbb{E}[W_a W_{a'}] = \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}[\gamma_i \gamma_j]\, a_i a'_j = \sum_{i=1}^n a_i a'_i = \langle a, a' \rangle $$

and quantities like

$$ \mathbb{E}[(W_a - W_{a'})^2] = \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}[\gamma_i \gamma_j] (a_i - a'_i)(a_j - a'_j) = \sum_{i=1}^n (a_i - a'_i)^2 = \|a - a'\|^2. $$

The latter quantities are handy because of a very useful result called Slepian's lemma [Sle62, LT91]:

Lemma 2. Let $(W_a)_{a \in A}$ and $(V_a)_{a \in A}$ be two zero-mean Gaussian processes with some index set $A$ (not necessarily a subset of $\mathbb{R}^n$), such that

$$ \mathbb{E}[(W_a - W_{a'})^2] \le \mathbb{E}[(V_a - V_{a'})^2], \qquad \forall a, a' \in A. \tag{10} $$

Then

$$ \mathbb{E} \sup_{a \in A} W_a \le \mathbb{E} \sup_{a \in A} V_a. \tag{11} $$

Remark 2. The Gaussian processes $(W_a), (V_a)$ that appear in Slepian's lemma are not necessarily of the form $W_a = \langle a, \gamma \rangle$ with $\gamma = (\gamma_1, \ldots, \gamma_n)$ a vector of independent Gaussians. They can be arbitrary collections of random variables indexed by the elements of $A$, such that any finite linear combination of $W_a$'s or of $V_a$'s is Gaussian.

Slepian's lemma is typically used to obtain upper bounds on the expected supremum of one Gaussian process in terms of another, which is hopefully easier to handle. The only wrinkle is that we cannot apply Slepian's lemma directly to the problem of estimating the Gaussian average $G_n(A)$ because of the absolute value. However, if all $a \in A$ are uniformly bounded in norm, the absolute value makes little difference:

Lemma 3. Let $A \subset \mathbb{R}^n$ be a set of vectors uniformly bounded in norm, i.e., there exists some $L < \infty$ such that $\|a\| \le L$ for all $a \in A$. Let

$$ G'_n(A) := \frac{1}{n}\, \mathbb{E}\Big[ \sup_{a \in A} \sum_{i=1}^n \gamma_i a_i \Big]. \tag{12} $$

Then

$$ G'_n(A) \le G_n(A) \le 2\, G'_n(A) + \sqrt{\frac{2}{\pi}} \cdot \frac{L}{n}. \tag{13} $$

Proof. The first inequality in (13) is obvious. For the second inequality, pick an arbitrary $a_0 \in A$, let $W_a = \sum_{i=1}^n \gamma_i a_i$ for any $a \in A$, and write

$$ G_n(A) = \frac{1}{n}\, \mathbb{E}\Big[ \sup_{a \in A} |W_a| \Big] \le \frac{1}{n}\, \mathbb{E}\Big[ \sup_{a \in A} |W_a - W_{a_0}| \Big] + \frac{1}{n}\, \mathbb{E}|W_{a_0}|. $$

Since $a_0$ was arbitrary, this gives

$$ G_n(A) \le \inf_{a_0 \in A} \Big\{ \frac{1}{n}\, \mathbb{E}\Big[ \sup_{a \in A} |W_a - W_{a_0}| \Big] + \frac{1}{n}\, \mathbb{E}|W_{a_0}| \Big\} \le \frac{1}{n}\, \mathbb{E}\Big[ \sup_{a, a' \in A} |W_a - W_{a'}| \Big] + \frac{1}{n} \sup_{a \in A} \mathbb{E}|W_a|. \tag{14} $$

For any two $a, a'$, the random variable $W_a - W_{a'}$ is symmetric, so

$$ \mathbb{E}\Big[ \sup_{a, a' \in A} |W_a - W_{a'}| \Big] = \mathbb{E}\Big[ \sup_{a, a' \in A} (W_a - W_{a'}) \Big] \le 2\, \mathbb{E}\Big[ \sup_{a \in A} W_a \Big]. $$

Moreover, for any $a \in A$, $W_a$ is Gaussian with zero mean and variance $\|a\|^2 \le L^2$. Thus,

$$ \sup_{a \in A} \mathbb{E}|W_a| \le L\, \mathbb{E}|\gamma| = \sqrt{\frac{2}{\pi}}\, L. $$

Using the two above formulas in (14), we get the second inequality in (13), and the lemma is proved. $\square$

Armed with this lemma, we can work with the quantity $G'_n(A)$ instead of the Gaussian average $G_n(A)$. The advantage is that now we can rely on tools like Slepian's lemma.

2.2 Bounding the Rademacher average

Now everything hinges on bounding the Gaussian average $G'_n(F(x^n))$ for a fixed sample $x^n = (x_1, \ldots, x_n)$, which in turn will give us a bound on the Rademacher average $R_n(F(x^n))$, by Lemmas 1 and 3. Let $(\gamma_i)_{1 \le i \le n}$, $(\gamma_{ij})_{1 \le i \le n,\, 1 \le j \le k}$, and $(\gamma_{ijl})_{1 \le i \le n,\, 1 \le j, l \le k}$ be mutually independent sequences of i.i.d. standard Gaussian random variables. Define the following zero-mean Gaussian processes, indexed by $T \in \mathcal{T}$:

$$ W_T := \sum_{i=1}^n \gamma_i f_T(x_i), \qquad V_T := \sum_{i=1}^n \sum_{j=1}^k \gamma_{ij} \langle x_i, T e_j \rangle, \qquad U_T := \sum_{i=1}^n \sum_{j=1}^k \sum_{l=1}^k \gamma_{ijl} \langle T e_j, T e_l \rangle, $$

$$ \Upsilon_T := \sqrt{8}\, V_T + \sqrt{2}\, U_T. $$

By definition,

$$ G'_n(F(x^n)) = \mathbb{E} \sup_{T \in \mathcal{T}} \frac{1}{n} \sum_{i=1}^n \gamma_i f_T(x_i) = \frac{1}{n}\, \mathbb{E} \sup_{T \in \mathcal{T}} W_T, $$

and we define $G_n(F(x^n))$ similarly. We will use Slepian's lemma to upper-bound $G'_n(F(x^n))$ in terms of the expected suprema of $(V_T)$ and $(U_T)$. To that end, we start with

$$ \mathbb{E}[(W_T - W_{T'})^2] = \sum_{i=1}^n \big( f_T(x_i) - f_{T'}(x_i) \big)^2 = \sum_{i=1}^n \Big( \min_{y \in Y} \|x_i - T y\|^2 - \min_{y \in Y} \|x_i - T' y\|^2 \Big)^2 $$
$$ \le \sum_{i=1}^n \Big( \max_{y \in Y} \big| \|x_i - T y\|^2 - \|x_i - T' y\|^2 \big| \Big)^2 = \sum_{i=1}^n \Big( \max_{y \in Y} \big| 2 \langle x_i, T' y - T y \rangle + \|T y\|^2 - \|T' y\|^2 \big| \Big)^2 $$
$$ \le 8 \sum_{i=1}^n \Big( \max_{y \in Y} |\langle x_i, (T - T') y \rangle| \Big)^2 + 2 \sum_{i=1}^n \Big( \max_{y \in Y} \big| \|T y\|^2 - \|T' y\|^2 \big| \Big)^2, \tag{15} $$

where in the third line we have used the properties of inner products, and the last line is by the inequality $(a + b)^2 \le 2a^2 + 2b^2$. Now, for each $i$,

$$ \max_{y \in Y} |\langle x_i, (T - T') y \rangle| = \max_{y \in Y} \Big| \sum_{j=1}^k y_j \langle x_i, (T - T') e_j \rangle \Big| \le \max_{y \in Y} \|y\| \Big( \sum_{j=1}^k \langle x_i, T e_j - T' e_j \rangle^2 \Big)^{1/2} \le \Big( \sum_{j=1}^k \langle x_i, T e_j - T' e_j \rangle^2 \Big)^{1/2}, $$

where in the second step we have used Cauchy–Schwarz, and in the third the fact that $Y \subseteq B^k$. Summing over $1 \le i \le n$, we see that

$$ \sum_{i=1}^n \Big( \max_{y \in Y} |\langle x_i, (T - T') y \rangle| \Big)^2 \le \sum_{i=1}^n \sum_{j=1}^k \langle x_i, T e_j - T' e_j \rangle^2 = \mathbb{E}[(V_T - V_{T'})^2]. \tag{16} $$

Similarly,

$$ \Big( \max_{y \in Y} \big| \|T y\|^2 - \|T' y\|^2 \big| \Big)^2 = \Big( \max_{y \in Y} \Big| \sum_{j=1}^k \sum_{l=1}^k y_j y_l \big( \langle T e_j, T e_l \rangle - \langle T' e_j, T' e_l \rangle \big) \Big| \Big)^2 $$
$$ \le \max_{y \in Y} \|y\|^4 \sum_{j=1}^k \sum_{l=1}^k \big( \langle T e_j, T e_l \rangle - \langle T' e_j, T' e_l \rangle \big)^2 \le \sum_{j=1}^k \sum_{l=1}^k \big( \langle T e_j, T e_l \rangle - \langle T' e_j, T' e_l \rangle \big)^2, $$

again by Cauchy–Schwarz (note that $\sum_{j,l} (y_j y_l)^2 = \|y\|^4 \le 1$). Summing over $1 \le i \le n$,

$$ \sum_{i=1}^n \Big( \max_{y \in Y} \big| \|T y\|^2 - \|T' y\|^2 \big| \Big)^2 \le \sum_{i=1}^n \sum_{j=1}^k \sum_{l=1}^k \big( \langle T e_j, T e_l \rangle - \langle T' e_j, T' e_l \rangle \big)^2 = \mathbb{E}[(U_T - U_{T'})^2]. \tag{17} $$
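Before taking expectations, the chain (15)-(17) amounts to a deterministic inequality relating the reconstruction errors of any two decoders $T, T'$ to the quantities $\sum_{i,j} \langle x_i, (T - T') e_j \rangle^2$ and $n \sum_{j,l} (\langle T e_j, T e_l \rangle - \langle T' e_j, T' e_l \rangle)^2$, for any codebook $Y \subseteq B^k$. Here is a quick numerical check with $H = \mathbb{R}^d$, two random decoders, and a finite random codebook (all dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
d, k, n, m = 7, 3, 40, 200

X = rng.standard_normal((n, d))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)   # x_i in B(H)

# finite codebook inside the unit ball B^k
Y = rng.standard_normal((m, k))
Y /= np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1.0)

T1, T2 = rng.standard_normal((d, k)), rng.standard_normal((d, k))

def f(x, T):
    # f_T(x) = min_{y in Y} ||x - T y||^2
    return float(np.min(np.linalg.norm(x[None, :] - Y @ T.T, axis=1) ** 2))

lhs = sum((f(x, T1) - f(x, T2)) ** 2 for x in X)

D = T1 - T2
term_V = 8 * float(np.sum((X @ D) ** 2))        # 8 * sum_{i,j} <x_i, (T-T')e_j>^2
G1, G2 = T1.T @ T1, T2.T @ T2                   # Gram matrices of the columns
term_U = 2 * n * float(np.sum((G1 - G2) ** 2))  # 2 * sum_i sum_{j,l} (...)^2
```

The assertion `lhs <= term_V + term_U` is exactly the comparison that lets Slepian's lemma replace the intractable process $(W_T)$ by the simpler processes $(V_T)$ and $(U_T)$.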

Using (16) and (17) in (15), we have

$$ \mathbb{E}[(W_T - W_{T'})^2] \le 8\, \mathbb{E}[(V_T - V_{T'})^2] + 2\, \mathbb{E}[(U_T - U_{T'})^2] = \mathbb{E}[(\Upsilon_T - \Upsilon_{T'})^2]. $$

We can therefore apply Slepian's lemma (Lemma 2) to $(W_T)$ and $(\Upsilon_T)$ to write

$$ G'_n(F(x^n)) = \frac{1}{n}\, \mathbb{E} \sup_{T \in \mathcal{T}} W_T \le \frac{1}{n}\, \mathbb{E} \sup_{T \in \mathcal{T}} \Upsilon_T \le \frac{\sqrt{8}}{n}\, \mathbb{E} \sup_{T \in \mathcal{T}} V_T + \frac{\sqrt{2}}{n}\, \mathbb{E} \sup_{T \in \mathcal{T}} U_T. \tag{18} $$

We now upper-bound the expected suprema of $V_T$ and $U_T$. For the former,

$$ \mathbb{E} \sup_T V_T = \mathbb{E} \sup_T \sum_{j=1}^k \Big\langle \sum_{i=1}^n \gamma_{ij} x_i,\, T e_j \Big\rangle \qquad \text{(linearity)} $$
$$ \le \mathbb{E} \sup_T \sum_{j=1}^k \Big\| \sum_{i=1}^n \gamma_{ij} x_i \Big\| \, \|T e_j\| \qquad \text{(Cauchy–Schwarz)} $$
$$ \le \alpha\, \mathbb{E} \sum_{j=1}^k \Big\| \sum_{i=1}^n \gamma_{ij} x_i \Big\| \qquad \text{(assumption on } \mathcal{T}) $$
$$ \le \alpha \sum_{j=1}^k \Big( \mathbb{E} \Big\| \sum_{i=1}^n \gamma_{ij} x_i \Big\|^2 \Big)^{1/2} \qquad \text{(Jensen)} $$
$$ = \alpha \sum_{j=1}^k \Big( \sum_{i=1}^n \sum_{i'=1}^n \mathbb{E}[\gamma_{ij} \gamma_{i'j}] \langle x_i, x_{i'} \rangle \Big)^{1/2} = \alpha \sum_{j=1}^k \Big( \sum_{i=1}^n \|x_i\|^2 \Big)^{1/2} \qquad \text{(properties of i.i.d. Gaussians)} $$
$$ \le \alpha k \sqrt{n}. \qquad (x_i \in B(H) \text{ for all } i) $$

Similarly, for the latter,

$$ \mathbb{E} \sup_T U_T = \mathbb{E} \sup_T \sum_{j=1}^k \sum_{l=1}^k \Big( \sum_{i=1}^n \gamma_{ijl} \Big) \langle T e_j, T e_l \rangle \le \sum_{j=1}^k \sum_{l=1}^k \mathbb{E}\Big| \sum_{i=1}^n \gamma_{ijl} \Big| \, \sup_T \|T e_j\| \, \|T e_l\| \le \sqrt{\frac{2n}{\pi}}\, \alpha^2 k^2, $$

since $\sum_{i=1}^n \gamma_{ijl} \sim N(0, n)$, so that $\mathbb{E}\big| \sum_i \gamma_{ijl} \big| = \sqrt{2n/\pi}$. Substituting these bounds into (18), we have

$$ G'_n(F(x^n)) \le \frac{1}{n} \Big( \sqrt{8}\, \alpha k \sqrt{n} + \sqrt{2} \cdot \sqrt{\frac{2n}{\pi}}\, \alpha^2 k^2 \Big) = \Big( 2\sqrt{2}\, \alpha k + \frac{2}{\sqrt{\pi}}\, \alpha^2 k^2 \Big) \frac{1}{\sqrt{n}} \le \frac{5 \alpha^2 k^2}{\sqrt{n}}. $$

Thus, applying Lemmas 1 and 3 (with $L = \sup_T \|(f_T(x_1), \ldots, f_T(x_n))\| \le 4\alpha^2 \sqrt{n}$, since $\max_i f_T(x_i) \le 4\alpha^2$), we have

$$ \sqrt{\frac{\pi}{2}}\, R_n(F(x^n)) \le G_n(F(x^n)) \le 2\, G'_n(F(x^n)) + \sqrt{\frac{2}{\pi}} \cdot \frac{4\alpha^2}{\sqrt{n}} \le \frac{10 \alpha^2 k^2}{\sqrt{n}} + \sqrt{\frac{2}{\pi}} \cdot \frac{4\alpha^2}{\sqrt{n}}, $$

so that

$$ R_n(F(x^n)) \le \sqrt{\frac{2}{\pi}} \Big[ \frac{10 \alpha^2 k^2}{\sqrt{n}} + \sqrt{\frac{2}{\pi}} \cdot \frac{4\alpha^2}{\sqrt{n}} \Big] \le \frac{15 \alpha^2 k^2}{\sqrt{n}}. $$

Recalling (8), together with $\mathbb{E}\Delta_n(X^n) \le 2\, \mathbb{E} R_n(F(X^n))$, we see that the event (2) holds with probability at least $1 - \delta$.

For the special case of $k$-means clustering, i.e., when $Y = \{e_1, \ldots, e_k\}$, we follow a slightly different strategy. Define a zero-mean Gaussian process

$$ \Xi_T := \sum_{i=1}^n \sum_{j=1}^k \gamma_{ij} \|x_i - T e_j\|^2, \qquad T \in \mathcal{T}. $$

Then

$$ \mathbb{E}[(W_T - W_{T'})^2] = \sum_{i=1}^n \Big( \min_{1 \le j \le k} \|x_i - T e_j\|^2 - \min_{1 \le j \le k} \|x_i - T' e_j\|^2 \Big)^2 \le \sum_{i=1}^n \Big( \max_{1 \le j \le k} \big| \|x_i - T e_j\|^2 - \|x_i - T' e_j\|^2 \big| \Big)^2 $$
$$ \le \sum_{i=1}^n \sum_{j=1}^k \big( \|x_i - T e_j\|^2 - \|x_i - T' e_j\|^2 \big)^2 = \mathbb{E}[(\Xi_T - \Xi_{T'})^2]. $$

For the process $(\Xi_T)$, we have

$$ \mathbb{E} \sup_T \Xi_T = \mathbb{E} \sup_T \sum_{i=1}^n \sum_{j=1}^k \gamma_{ij} \big( \|x_i\|^2 - 2 \langle x_i, T e_j \rangle + \|T e_j\|^2 \big) = \mathbb{E} \sup_T \Big\{ -2 \sum_{i,j} \gamma_{ij} \langle x_i, T e_j \rangle + \sum_{i,j} \gamma_{ij} \|T e_j\|^2 \Big\} $$
$$ \le 2\, \mathbb{E} \sup_T \sum_{i,j} \gamma_{ij} \langle x_i, T e_j \rangle + \mathbb{E} \sup_T \sum_{j=1}^k \Big( \sum_{i=1}^n \gamma_{ij} \Big) \|T e_j\|^2 \le 2 \alpha k \sqrt{n} + \sqrt{\frac{2n}{\pi}}\, \alpha^2 k \le 3 k \alpha^2 \sqrt{n}, $$

where the first equality uses the fact that the term $\sum_{i,j} \gamma_{ij} \|x_i\|^2$ has zero mean and does not depend on $T$, and the remaining bounds are obtained by the same methods we used for $(V_T)$ and $(U_T)$. By Slepian's lemma, $G'_n(F(x^n)) \le \frac{1}{n}\, \mathbb{E} \sup_T \Xi_T \le 3 \alpha^2 k / \sqrt{n}$. Using Lemmas 1–3, we have

$$ \sqrt{\frac{\pi}{2}}\, R_n(F(x^n)) \le G_n(F(x^n)) \le 2\, G'_n(F(x^n)) + \sqrt{\frac{2}{\pi}} \cdot \frac{4\alpha^2}{\sqrt{n}} \le \frac{6 \alpha^2 k}{\sqrt{n}} + \sqrt{\frac{2}{\pi}} \cdot \frac{4\alpha^2}{\sqrt{n}}, $$

so that

$$ R_n(F(x^n)) \le \sqrt{\frac{2}{\pi}} \Big[ \frac{6 \alpha^2 k}{\sqrt{n}} + \sqrt{\frac{2}{\pi}} \cdot \frac{4\alpha^2}{\sqrt{n}} \Big] \le \frac{10 \alpha^2 k}{\sqrt{n}}. $$

Again, recalling (8), we see that the event (3) occurs with probability at least $1 - \delta$. The proof of Theorem 1 is complete. $\square$

A Linear operators between Hilbert spaces

We assume, for simplicity, that all Hilbert spaces $H$ of interest are separable. By definition, a Hilbert space $H$ is separable if it has a countable dense subset: there exists a countable set $\{h_1, h_2, \ldots\} \subset H$, such that for any $h \in H$ and any $\varepsilon > 0$ there exists some $j \in \mathbb{N}$, for which $\|h - h_j\|_H < \varepsilon$. Any separable Hilbert space $H$ has a countable complete and orthonormal basis, i.e., a countable set $\{\varphi_1, \varphi_2, \ldots\} \subset H$ with the following properties:

1. Orthonormality: $\langle \varphi_i, \varphi_j \rangle_H = \delta_{ij}$;
2. Completeness: if there exists some $h \in H$ which is orthogonal to all $\varphi_j$'s, i.e., $\langle h, \varphi_j \rangle = 0$ for all $j$, then $h = 0$.

As a consequence, any $h \in H$ can be uniquely represented as an infinite linear combination

$$ h = \sum_{j=1}^\infty c_j \varphi_j, \qquad \text{where } c_j = \langle h, \varphi_j \rangle_H, $$

where the infinite series converges in norm, i.e., for any $\varepsilon > 0$ there exists some $N$, such that

$$ \Big\| h - \sum_{j=1}^n c_j \varphi_j \Big\|_H < \varepsilon \qquad \text{for all } n \ge N. $$

Moreover, $\|h\|_H^2 = \sum_{j=1}^\infty c_j^2$.

Let $H$ and $K$ be two Hilbert spaces. A linear operator from $H$ into $K$ is a mapping $T : H \to K$, such that $T(\alpha h + \alpha' h') = \alpha T h + \alpha' T h'$ for any two $h, h' \in H$ and $\alpha, \alpha' \in \mathbb{R}$. A linear operator $T : H \to K$ is bounded if

$$ \|T\|_{H \to K} := \sup_{h \in H,\, h \neq 0} \frac{\|T h\|_K}{\|h\|_H} < \infty. $$

We will denote the space of all bounded linear operators $T : H \to K$ by $L(H, K)$. When $H = K$, we will write $L(H)$ instead. For any operator $T \in L(H, K)$, we have the adjoint operator $T^* \in L(K, H)$, which is characterized by

$$ \langle g, T h \rangle_K = \langle T^* g, h \rangle_H, \qquad \forall g \in K,\ h \in H. $$

If $T \in L(H)$ has the property that $T^* = T$, we say that $T$ is self-adjoint. Some examples:

- The identity operator on $H$, denoted by $I_H$, maps each $h \in H$ to itself. $I_H$ is a self-adjoint operator with $\|I_H\|_{H \to H} = 1$. We will often omit the index $H$ and just write $I$.
- A projection is an operator $\Pi \in L(H)$ satisfying $\Pi^2 = \Pi$, i.e., $\Pi(\Pi h) = \Pi h$ for any $h \in H$. This is a bounded operator with $\|\Pi\| \le 1$. Any projection is self-adjoint. (Here, "projection" always means orthogonal projection.)
- An isometry is an operator $T \in L(H, K)$, such that $\|T h\|_K = \|h\|_H$ for all $h \in H$, i.e., $T$ preserves norms. If $T$ is an isometry, then $T^* T = I_H$, while $T T^* \in L(K)$ is a projection. This is easy to see: $(T T^*)(T T^*) = T (T^* T) T^* = T T^*$.
- If $T \in L(H)$ and $T^* \in L(H)$ are both isometries, then $T$ is called a unitary operator.

If $\Pi \in L(H)$ is a projection whose range $K \subseteq H$ is a closed $k$-dimensional subspace, then there exists an isometry $T \in L(\mathbb{R}^k, K)$, such that $\Pi = T T^*$. Here, $\mathbb{R}^k$ is a Hilbert space with the usual norm. To see this, let $\{\psi_1, \ldots, \psi_k\} \subset H$ be an orthonormal basis of $K$, and complete it to a countable basis $\{\psi_1, \ldots, \psi_k, \psi_{k+1}, \psi_{k+2}, \ldots\}$ for the entire $H$. Here, the elements of $\{\psi_j\}_{j \ge k+1}$ are mutually orthonormal and orthogonal to $\{\psi_j\}_{j=1}^k$. Any $h \in H$ has a unique representation $h = \sum_j \alpha_j \psi_j$ for some real coefficients $\alpha_1, \alpha_2, \ldots$. With this, we can write out the action of $\Pi$ explicitly as

$$ \Pi h = \sum_{j=1}^k \alpha_j \psi_j. $$

Now consider the map $T : \mathbb{R}^k \to K$ that takes $\alpha = (\alpha_1, \ldots, \alpha_k) \in \mathbb{R}^k$ to $\sum_{j=1}^k \alpha_j \psi_j$. It is easy to see that $T$ is an isometry. Indeed,

$$ \|T \alpha\|_H^2 = \Big\| \sum_{j=1}^k \alpha_j \psi_j \Big\|_H^2 = \sum_{j=1}^k \alpha_j^2 = \|\alpha\|^2. $$

The adjoint of $T$ is easily computed: for any $\alpha = (\alpha_1, \ldots, \alpha_k) \in \mathbb{R}^k$ and any $h = \sum_j \beta_j \psi_j \in H$,

$$ \langle h, T \alpha \rangle_H = \langle \Pi h, T \alpha \rangle_H = \Big\langle \sum_{j=1}^k \beta_j \psi_j,\, \sum_{j=1}^k \alpha_j \psi_j \Big\rangle_H = \sum_{j=1}^k \beta_j \alpha_j = \langle T^* h, \alpha \rangle. $$

Since this must hold for arbitrary $\alpha \in \mathbb{R}^k$ and $h \in H$, we must have

$$ T^* h = T^* \Big( \sum_j \beta_j \psi_j \Big) = (\beta_1, \ldots, \beta_k). $$

Now let's compute $T T^* h$ for any $h = \sum_j \beta_j \psi_j$:

$$ T T^* h = T(T^* h) = T\big( (\beta_1, \ldots, \beta_k) \big) = \sum_{j=1}^k \beta_j \psi_j = \Pi h. $$

References

[BM02] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

[LS99] D. D. Lee and H. S. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791, 1999.

[LT91] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991.

[MP10] A. Maurer and M. Pontil. K-dimensional coding schemes in Hilbert spaces. IEEE Transactions on Information Theory, 56(11):5839–5846, November 2010.

[Sle62] D. Slepian. The one-sided barrier problem for Gaussian noise. Bell System Technical Journal, 41:463–501, 1962.


ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory 1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.

More information

Lecture 3 : Random variables and their distributions

Lecture 3 : Random variables and their distributions Lecture 3 : Radom variables ad their distributios 3.1 Radom variables Let (Ω, F) ad (S, S) be two measurable spaces. A map X : Ω S is measurable or a radom variable (deoted r.v.) if X 1 (A) {ω : X(ω) A}

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

b i u x i U a i j u x i u x j

b i u x i U a i j u x i u x j M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Amog the most useful of the various geeralizatios of KolmogorovâĂŹs strog law of large umbers are the ergodic theorems of Birkhoff ad Kigma, which exted the validity of the

More information

5.1 Review of Singular Value Decomposition (SVD)

5.1 Review of Singular Value Decomposition (SVD) MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

1 Duality revisited. AM 221: Advanced Optimization Spring 2016

1 Duality revisited. AM 221: Advanced Optimization Spring 2016 AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R

More information

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution EEL5: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we begi our mathematical treatmet of discrete-time s. As show i Figure, a discrete-time operates or trasforms some iput sequece x [

More information

Notes 27 : Brownian motion: path properties

Notes 27 : Brownian motion: path properties Notes 27 : Browia motio: path properties Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces:[Dur10, Sectio 8.1], [MP10, Sectio 1.1, 1.2, 1.3]. Recall: DEF 27.1 (Covariace) Let X = (X

More information

TENSOR PRODUCTS AND PARTIAL TRACES

TENSOR PRODUCTS AND PARTIAL TRACES Lecture 2 TENSOR PRODUCTS AND PARTIAL TRACES Stéphae ATTAL Abstract This lecture cocers special aspects of Operator Theory which are of much use i Quatum Mechaics, i particular i the theory of Quatum Ope

More information

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Solutions to home assignments (sketches)

Solutions to home assignments (sketches) Matematiska Istitutioe Peter Kumli 26th May 2004 TMA401 Fuctioal Aalysis MAN670 Applied Fuctioal Aalysis 4th quarter 2003/2004 All documet cocerig the course ca be foud o the course home page: http://www.math.chalmers.se/math/grudutb/cth/tma401/

More information

Mathematical Foundations -1- Sets and Sequences. Sets and Sequences

Mathematical Foundations -1- Sets and Sequences. Sets and Sequences Mathematical Foudatios -1- Sets ad Sequeces Sets ad Sequeces Methods of proof 2 Sets ad vectors 13 Plaes ad hyperplaes 18 Liearly idepedet vectors, vector spaces 2 Covex combiatios of vectors 21 eighborhoods,

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece

More information

A REMARK ON A PROBLEM OF KLEE

A REMARK ON A PROBLEM OF KLEE C O L L O Q U I U M M A T H E M A T I C U M VOL. 71 1996 NO. 1 A REMARK ON A PROBLEM OF KLEE BY N. J. K A L T O N (COLUMBIA, MISSOURI) AND N. T. P E C K (URBANA, ILLINOIS) This paper treats a property

More information

Support vector machine revisited

Support vector machine revisited 6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector

More information

Linear Elliptic PDE s Elliptic partial differential equations frequently arise out of conservation statements of the form

Linear Elliptic PDE s Elliptic partial differential equations frequently arise out of conservation statements of the form Liear Elliptic PDE s Elliptic partial differetial equatios frequetly arise out of coservatio statemets of the form B F d B Sdx B cotaied i bouded ope set U R. Here F, S deote respectively, the flux desity

More information

Introduction to Optimization Techniques. How to Solve Equations

Introduction to Optimization Techniques. How to Solve Equations Itroductio to Optimizatio Techiques How to Solve Equatios Iterative Methods of Optimizatio Iterative methods of optimizatio Solutio of the oliear equatios resultig form a optimizatio problem is usually

More information

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK)

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) Everythig marked by is ot required by the course syllabus I this lecture, all vector spaces is over the real umber R. All vectors i R is viewed as a colum

More information

Ma 4121: Introduction to Lebesgue Integration Solutions to Homework Assignment 5

Ma 4121: Introduction to Lebesgue Integration Solutions to Homework Assignment 5 Ma 42: Itroductio to Lebesgue Itegratio Solutios to Homework Assigmet 5 Prof. Wickerhauser Due Thursday, April th, 23 Please retur your solutios to the istructor by the ed of class o the due date. You

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Review Problems 1. ICME and MS&E Refresher Course September 19, 2011 B = C = AB = A = A 2 = A 3... C 2 = C 3 = =

Review Problems 1. ICME and MS&E Refresher Course September 19, 2011 B = C = AB = A = A 2 = A 3... C 2 = C 3 = = Review Problems ICME ad MS&E Refresher Course September 9, 0 Warm-up problems. For the followig matrices A = 0 B = C = AB = 0 fid all powers A,A 3,(which is A times A),... ad B,B 3,... ad C,C 3,... Solutio:

More information

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero? 2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Math 525: Lecture 5. January 18, 2018

Math 525: Lecture 5. January 18, 2018 Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

2 Banach spaces and Hilbert spaces

2 Banach spaces and Hilbert spaces 2 Baach spaces ad Hilbert spaces Tryig to do aalysis i the ratioal umbers is difficult for example cosider the set {x Q : x 2 2}. This set is o-empty ad bouded above but does ot have a least upper boud

More information

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et

More information

Axioms of Measure Theory

Axioms of Measure Theory MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that

More information

Bertrand s Postulate

Bertrand s Postulate Bertrad s Postulate Lola Thompso Ross Program July 3, 2009 Lola Thompso (Ross Program Bertrad s Postulate July 3, 2009 1 / 33 Bertrad s Postulate I ve said it oce ad I ll say it agai: There s always a

More information

Homework 2. Show that if h is a bounded sesquilinear form on the Hilbert spaces X and Y, then h has the representation

Homework 2. Show that if h is a bounded sesquilinear form on the Hilbert spaces X and Y, then h has the representation omework 2 1 Let X ad Y be ilbert spaces over C The a sesquiliear form h o X Y is a mappig h : X Y C such that for all x 1, x 2, x X, y 1, y 2, y Y ad all scalars α, β C we have (a) h(x 1 + x 2, y) h(x

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Beurling Integers: Part 2

Beurling Integers: Part 2 Beurlig Itegers: Part 2 Isomorphisms Devi Platt July 11, 2015 1 Prime Factorizatio Sequeces I the last article we itroduced the Beurlig geeralized itegers, which ca be represeted as a sequece of real umbers

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

1 The Haar functions and the Brownian motion

1 The Haar functions and the Brownian motion 1 The Haar fuctios ad the Browia motio 1.1 The Haar fuctios ad their completeess The Haar fuctios The basic Haar fuctio is 1 if x < 1/2, ψx) = 1 if 1/2 x < 1, otherwise. 1.1) It has mea zero 1 ψx)dx =,

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

Lecture 19. sup y 1,..., yn B d n

Lecture 19. sup y 1,..., yn B d n STAT 06A: Polyomials of adom Variables Lecture date: Nov Lecture 19 Grothedieck s Iequality Scribe: Be Hough The scribes are based o a guest lecture by ya O Doell. I this lecture we prove Grothedieck s

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information