18.S096: Principal Component Analysis in High Dimensions and the Spike Model


18.S096: Principal Component Analysis in High Dimensions and the Spike Model
Topics in Mathematics of Data Science (Fall 2015)
Afonso S. Bandeira
bandeira@mit.edu
September 18, 2015

These are lecture notes not in final form and will be continuously edited and/or corrected (as I am sure they contain many typos). Please let me know if you find any typo/mistake. Also, I am posting shorter versions of these notes (together with the open problems) on my Blog, see [Ban15b].

0.1 Brief Review of some linear algebra tools

In this Section we'll briefly review a few linear algebra tools that will be important during the course. If you need a refresher on any of these concepts, I recommend taking a look at [HJ85] and/or [Gol96].

Singular Value Decomposition

The Singular Value Decomposition (SVD) is one of the most useful tools for this course! Given a matrix M ∈ R^{m×n}, the SVD of M is given by

    M = U Σ V^T,    (1)

where U ∈ O(m) and V ∈ O(n) are orthogonal matrices (meaning that U^T U = U U^T = I and V^T V = V V^T = I) and Σ ∈ R^{m×n} is a matrix with non-negative entries on its diagonal and zero entries otherwise. The columns of U and V are referred to, respectively, as the left and right singular vectors of M, and the diagonal elements of Σ as the singular values of M.

Remark 0.1 Say m ≥ n; it is easy to see that we can also think of the SVD as having U ∈ R^{m×n} with U^T U = I, Σ ∈ R^{n×n} a diagonal matrix with non-negative entries, and V ∈ O(n).

Spectral Decomposition

If M ∈ R^{n×n} is symmetric then it admits a spectral decomposition

    M = V Λ V^T,

where V ∈ O(n) is a matrix whose columns v_k are the eigenvectors of M and Λ is a diagonal matrix whose diagonal elements λ_k are the eigenvalues of M. Similarly, we can write

    M = Σ_{k=1}^n λ_k v_k v_k^T.

When all of the eigenvalues of M are non-negative we say that M is positive semidefinite and write M ⪰ 0. In that case we can write

    M = (V Λ^{1/2}) (V Λ^{1/2})^T.

A decomposition of M of the form M = U U^T (such as the one above) is called a Cholesky decomposition.

The spectral norm of M is defined as

    ‖M‖ = max_k |λ_k(M)|.

Trace and norm

Given a matrix M ∈ R^{n×n}, its trace is given by

    Tr(M) = Σ_{k=1}^n M_{kk} = Σ_{k=1}^n λ_k(M).

Its Frobenius norm is given by

    ‖M‖_F = sqrt( Σ_{i,j} M_{ij}^2 ) = sqrt( Tr(M^T M) ).

A particularly important property of the trace is that

    Tr(AB) = Σ_{i,j=1}^n A_{ij} B_{ji} = Tr(BA).

Note that while this implies, e.g., Tr(ABC) = Tr(CAB), it does not imply, e.g., Tr(ABC) = Tr(ACB), which is not true in general!

0.2 Quadratic Forms

During the course we will be interested in solving problems of the type

    max_{V ∈ R^{n×d}, V^T V = I_{d×d}} Tr( V^T M V ),

where M is a symmetric matrix.
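These trace identities are easy to sanity-check numerically. Below is a minimal pure-Python sketch; the 2×2 matrices are arbitrary hand-picked examples, chosen only for illustration:

```python
# Check the trace identities above: Tr(AB) = Tr(BA), hence Tr(ABC) = Tr(CAB),
# while Tr(ABC) = Tr(ACB) fails in general.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    """Trace: sum of diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 3]]

t_ab, t_ba = tr(matmul(A, B)), tr(matmul(B, A))   # cyclic shift: equal
t_abc = tr(matmul(matmul(A, B), C))               # Tr(ABC)
t_cab = tr(matmul(matmul(C, A), B))               # Tr(CAB): equal to Tr(ABC)
t_acb = tr(matmul(matmul(A, C), B))               # Tr(ACB): differs in general
print(t_ab, t_ba, t_abc, t_cab, t_acb)            # → 5 5 13 13 12
```

Any triple of matrices for which ACB is not a cyclic shift of ABC will typically exhibit the same failure.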

Note that this is equivalent to

    max_{v_1,...,v_d ∈ R^n, v_i^T v_j = δ_{ij}} Σ_{k=1}^d v_k^T M v_k,    (2)

where δ_{ij} is the Kronecker delta (it is 1 if i = j and 0 otherwise). When d = 1 this reduces to the more familiar

    max_{v ∈ R^n, ‖v‖_2 = 1} v^T M v.    (3)

It is easy to see (for example, using the spectral decomposition of M) that (3) is maximized by the leading eigenvector of M and that

    max_{v ∈ R^n, ‖v‖_2 = 1} v^T M v = λ_max(M).

It is also not very difficult to see (it follows, for example, from a Theorem of Fan; see page 3 of [Mos11]) that (2) is maximized by taking v_1,...,v_d to be the d leading eigenvectors of M, and that its value is simply the sum of the d largest eigenvalues of M. The nice consequence of this is that the solution to (2) can be computed sequentially: we can first solve for d = 1, computing v_1, then v_2, and so on.

Remark 0.2 All of the tools and results above have natural analogues when the matrices have complex entries (and are Hermitian instead of symmetric).

1.1 Dimension Reduction and PCA

When faced with a high dimensional dataset, a natural approach is to try to reduce its dimension, either by projecting it to a lower dimensional space or by finding a better representation for the data. During this course we will see a few different ways of doing dimension reduction.

We will start with Principal Component Analysis (PCA). In fact, PCA continues to be one of the best (and simplest) tools for exploratory data analysis. Remarkably, it dates back to a 1901 paper by Karl Pearson [Pea01]!

Let's say we have n data points x_1,...,x_n ∈ R^p, for some p, and we are interested in (linearly) projecting the data to d < p dimensions. This is particularly useful if, say, one wants to visualize the data in two or three dimensions. There are a couple of different ways we can try to choose this projection:

1. Finding the d-dimensional affine subspace for which the projections of x_1,...,x_n on it best approximate the original points x_1,...,x_n.

2. Finding the d-dimensional projection of x_1,...,x_n that preserves as much variance of the data as possible.
As we will see below, these two approaches are equivalent and they correspond to Principal Component Analysis.
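The variational facts above (the unit-norm maximizer of v^T M v is the leading eigenvector, with optimal value λ_max(M)) can be checked numerically. A minimal power-iteration sketch in pure Python; the symmetric matrix M (eigenvalues 3 and 1) and the iteration count are illustrative choices:

```python
import math

# Power iteration: repeatedly apply M and renormalize; the iterates converge
# to the leading eigenvector, and the Rayleigh quotient to lambda_max(M).
M = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1, leading eigenvector (1,1)/sqrt(2)

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def normalize(x):
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x]

v = [1.0, 0.0]                  # arbitrary starting vector with nonzero overlap
for _ in range(100):
    v = normalize(matvec(M, v))

w = matvec(M, v)
lam = sum(v[i] * w[i] for i in range(2))   # Rayleigh quotient v^T M v
print(lam)                                 # ≈ 3.0
```

Running for d > 1 amounts to repeating this with deflation (subtracting λ_1 v_1 v_1^T), matching the sequential computation described above.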

Before proceeding, we recall a couple of simple statistical quantities associated with x_1,...,x_n that will reappear below. Given x_1,...,x_n, we define the sample mean as

    µ_n = (1/n) Σ_{k=1}^n x_k,    (4)

and the sample covariance as

    Σ_n = (1/(n−1)) Σ_{k=1}^n (x_k − µ_n)(x_k − µ_n)^T.    (5)

Remark 1.3 If x_1,...,x_n are independently sampled from a distribution, then µ_n and Σ_n are unbiased estimators for, respectively, the mean and covariance of the distribution.

We will start with the first interpretation of PCA and then show that it is equivalent to the second.

PCA as best d-dimensional affine fit

We are trying to approximate each x_k by

    x_k ≈ µ + Σ_{i=1}^d (β_k)_i v_i,    (6)

where v_1,...,v_d is an orthonormal basis for the d-dimensional subspace, µ ∈ R^p represents the translation, and β_k corresponds to the coefficients of x_k. If we represent the subspace by V = [v_1 ⋯ v_d] ∈ R^{p×d}, then we can rewrite (6) as

    x_k ≈ µ + V β_k,    (7)

where V^T V = I_{d×d}, as the vectors v_i are orthonormal.

We will measure goodness of fit in terms of least squares and attempt to solve

    min_{µ, V, β_k : V^T V = I} Σ_{k=1}^n ‖x_k − (µ + V β_k)‖_2^2.    (8)

We start by optimizing for µ. It is easy to see that the first order conditions for µ correspond to

    ∇_µ Σ_{k=1}^n ‖x_k − (µ + V β_k)‖_2^2 = 0  ⇔  Σ_{k=1}^n ( x_k − (µ + V β_k) ) = 0.

Thus, the optimal value µ* of µ satisfies

    Σ_{k=1}^n (x_k − µ*) − V ( Σ_{k=1}^n β_k ) = 0.

Because we may take Σ_{k=1}^n β_k = 0 (any nonzero sum can be absorbed into µ), the optimal µ is given by

    µ* = (1/n) Σ_{k=1}^n x_k = µ_n,

the sample mean. We can then proceed to find the solution of (8) by solving

    min_{V, β_k : V^T V = I} Σ_{k=1}^n ‖x_k − µ_n − V β_k‖_2^2.    (9)

Let us proceed by optimizing for β_k. Since the problem decouples for each k, we can focus on, for each k,

    min_{β_k} ‖x_k − µ_n − V β_k‖_2^2 = min_{β_k} ‖ x_k − µ_n − Σ_{i=1}^d (β_k)_i v_i ‖_2^2.    (10)

Since v_1,...,v_d are orthonormal, it is easy to see that the solution is given by (β_k)_i = v_i^T (x_k − µ_n), which can be succinctly written as β_k = V^T (x_k − µ_n). Thus, (9) is equivalent to

    min_{V^T V = I} Σ_{k=1}^n ‖ (x_k − µ_n) − V V^T (x_k − µ_n) ‖_2^2.    (11)

Note that

    ‖(x_k − µ_n) − V V^T (x_k − µ_n)‖_2^2
      = (x_k − µ_n)^T (x_k − µ_n) − 2 (x_k − µ_n)^T V V^T (x_k − µ_n) + (x_k − µ_n)^T V (V^T V) V^T (x_k − µ_n)
      = (x_k − µ_n)^T (x_k − µ_n) − (x_k − µ_n)^T V V^T (x_k − µ_n).

Since (x_k − µ_n)^T (x_k − µ_n) does not depend on V, minimizing (11) is equivalent to

    max_{V^T V = I} Σ_{k=1}^n (x_k − µ_n)^T V V^T (x_k − µ_n).    (12)

A few more simple algebraic manipulations using properties of the trace:

    Σ_{k=1}^n (x_k − µ_n)^T V V^T (x_k − µ_n)
      = Σ_{k=1}^n Tr[ (x_k − µ_n)^T V V^T (x_k − µ_n) ]
      = Σ_{k=1}^n Tr[ V^T (x_k − µ_n)(x_k − µ_n)^T V ]
      = Tr[ V^T ( Σ_{k=1}^n (x_k − µ_n)(x_k − µ_n)^T ) V ]
      = (n−1) Tr[ V^T Σ_n V ].

This means that the solution to (12) is given by the solution to

    max_{V ∈ R^{p×d}, V^T V = I} Tr[ V^T Σ_n V ].    (13)

As we saw above (recall (2)), the solution is given by V = [v_1, ..., v_d], where v_1,...,v_d correspond to the d leading eigenvectors of Σ_n.

Let us now show that interpretation (2), finding the d-dimensional projection of x_1,...,x_n that preserves the most variance, also arrives at the optimization problem (13).

PCA as the d-dimensional projection that preserves the most variance

We aim to find an orthonormal basis v_1,...,v_d (organized as V = [v_1,...,v_d] with V^T V = I_{d×d}) of a d-dimensional space such that the projection of x_1,...,x_n on this subspace has the most variance. Equivalently, we can ask for the points

    V^T x_1, ..., V^T x_n ∈ R^d

to have as much variance as possible. Hence, we are interested in solving

    max_{V^T V = I} Σ_{k=1}^n ‖ V^T x_k − (1/n) Σ_{r=1}^n V^T x_r ‖^2.    (14)

Note that

    Σ_{k=1}^n ‖ V^T (x_k − µ_n) ‖^2 = (n−1) Tr( V^T Σ_n V ),

showing that (14) is equivalent to (13) and that the two interpretations of PCA are indeed equivalent.

Finding the Principal Components

When given a dataset x_1,...,x_n ∈ R^p, in order to compute the Principal Components one needs to find the leading eigenvectors of

    Σ_n = (1/(n−1)) Σ_{k=1}^n (x_k − µ_n)(x_k − µ_n)^T.

A naive way of doing this would be to construct Σ_n (which takes O(n p^2) work) and then find its spectral decomposition (which takes O(p^3) work). This means that the computational complexity of this procedure is O(max{n p^2, p^3}) (see [HJ85] and/or [Gol96]).

An alternative is to use the Singular Value Decomposition (1). Let X = [x_1 ⋯ x_n]; recall that

    Σ_n = (1/(n−1)) ( X − µ_n 1^T ) ( X − µ_n 1^T )^T.

Let us take the SVD of X − µ_n 1^T = U_L D U_R^T, with U_L ∈ O(p), D diagonal, and U_R^T U_R = I. Then,

    Σ_n = (1/(n−1)) ( X − µ_n 1^T ) ( X − µ_n 1^T )^T = (1/(n−1)) U_L D U_R^T U_R D U_L^T = (1/(n−1)) U_L D^2 U_L^T,

meaning that the columns of U_L correspond to the eigenvectors of Σ_n. Computing the SVD of X − µ_n 1^T takes O(min{n^2 p, n p^2}), but if one is interested in simply computing the top d eigenvectors then this computational cost reduces to O(d n p). This can be further improved with randomized algorithms: there are randomized algorithms that compute an approximate solution in O( n p log d + (n + p) d^2 ) time (see for example [HM09, RST09, MM15]).¹

Which d should we pick?

Given a dataset, if the objective is to visualize it then picking d = 2 or d = 3 might make the most sense. However, PCA is useful for many other purposes. For example: (1) often the data belongs to a lower dimensional space but is corrupted by high dimensional noise; when using PCA it is oftentimes possible to reduce the noise while keeping the signal. (2) One may be interested in running an algorithm that would be too computationally expensive to run in high dimensions; dimension reduction may help there. In these applications (and many others) it is not clear how to pick d.

If we denote the k-th largest eigenvalue of Σ_n by λ_k(Σ_n), then the k-th principal component accounts for a λ_k(Σ_n) / Tr(Σ_n) proportion of the variance.²

A fairly popular heuristic is to try to choose the cut-off at a component that has significantly more variance than the one immediately after. This is usually visualized by a scree plot: a plot of the values of the ordered eigenvalues. Here is an example:

    [Figure: a scree plot, showing the ordered eigenvalues of Σ_n against their index k.]

¹ If there is time, we might discuss some of these methods later in the course.
² Note that Tr(Σ_n) = Σ_{k=1}^p λ_k(Σ_n).
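The whole pipeline (compute µ_n and Σ_n via (4) and (5), extract a leading eigenvector, read off the proportion of explained variance) can be sketched in pure Python. The tiny dataset below is a hand-made illustration, and power iteration stands in for a proper eigensolver or the SVD route described above:

```python
import math

# Tiny illustrative dataset in R^2 (n = 4 points), mostly spread along (1, 1).
xs = [[2.0, 1.9], [-2.0, -2.1], [1.0, 1.1], [-1.0, -0.9]]
n, p = len(xs), len(xs[0])

# Sample mean (4) and sample covariance (5).
mu = [sum(x[j] for x in xs) / n for j in range(p)]
sigma = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / (n - 1)
          for j in range(p)] for i in range(p)]

# Leading principal component by power iteration (a simple stand-in for
# an eigensolver or the SVD of the centered data matrix).
v = [1.0] + [0.0] * (p - 1)
for _ in range(200):
    w = [sum(sigma[i][j] * v[j] for j in range(p)) for i in range(p)]
    norm = math.sqrt(sum(t * t for t in w))
    v = [t / norm for t in w]

lam1 = sum(v[i] * sum(sigma[i][j] * v[j] for j in range(p)) for i in range(p))
explained = lam1 / sum(sigma[k][k] for k in range(p))   # lambda_1 / Tr(sigma)
print(mu, explained)   # explained close to 1: the data is essentially 1-dimensional
```

Here a scree plot would show one dominant eigenvalue, so the elbow heuristic below would suggest d = 1.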

It is common to then try to identify an "elbow" on the scree plot to choose the cut-off. In the next Section we will look into random matrix theory to better understand the behavior of the eigenvalues of Σ_n, and it will help us understand when to cut off.

A related open problem

We now describe an interesting open problem posed by Mallat and Zeitouni in [MZ11].

Open Problem 1.1 (Mallat and Zeitouni [MZ11]) Let g ∼ N(0, Σ) be a gaussian random vector in R^p with a known covariance matrix Σ, and let d < p. Now, for any orthonormal basis V = [v_1,...,v_p] of R^p, consider the following random variable Γ_V: given a draw of the random vector g, Γ_V is the squared ℓ_2 norm of the largest projection of g on a subspace generated by d elements of the basis V. The question is: for which basis V is E[Γ_V] maximized?

The conjecture in [MZ11] is that the optimal basis is the eigendecomposition of Σ. It is known that this is the case for d = 1 (see [MZ11]), but the question remains open for d > 1. It is not very difficult to see that one can assume, without loss of generality, that Σ is diagonal.

A particularly intuitive way of stating the problem is:

1. Given Σ ∈ R^{p×p} and d.
2. Pick an orthonormal basis v_1,...,v_p.
3. Given a draw of g ∼ N(0, Σ).
4. Pick d elements ṽ_1,...,ṽ_d of the basis.
5. Score: Σ_{i=1}^d ( ṽ_i^T g )^2.

The objective is to pick the basis in order to maximize the expected value of the Score.

Notice that if the steps of the procedure were taken in a slightly different order, in which step 4 took place before having access to the draw of g (step 3), then the best basis would indeed be the eigenbasis of Σ, and the best subset of the basis would simply be the d leading eigenvectors (notice the resemblance with PCA, as described above).

More formally, we can write the problem as finding

    argmax_{V ∈ R^{p×p}, V^T V = I} E[ max_{S ⊂ [p], |S| = d} Σ_{i ∈ S} ( v_i^T g )^2 ],

where g ∼ N(0, Σ). The observation regarding the different ordering of the steps amounts to saying that the eigenbasis of Σ is the optimal solution for

    argmax_{V ∈ R^{p×p}, V^T V = I} max_{S ⊂ [p], |S| = d} E[ Σ_{i ∈ S} ( v_i^T g )^2 ].
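The procedure is easy to simulate. The sketch below is a seeded Monte Carlo illustration with hand-picked assumptions: a diagonal Σ in R^3, d = 1 (the case where the conjecture is known to hold), and a 45-degree rotation of the first two coordinates as the competing basis:

```python
import math, random

random.seed(0)
p, trials = 3, 10000
var = [9.0, 1.0, 1.0]          # diagonal Sigma; its eigenbasis is the canonical basis

c = math.sqrt(0.5)             # 45-degree rotation mixing the first two coordinates
eig_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_basis = [[c, c, 0], [c, -c, 0], [0, 0, 1]]

def score(basis, g):
    # Best squared projection of g on d = 1 elements of the basis (step 5 above).
    return max(sum(v[i] * g[i] for i in range(p)) ** 2 for v in basis)

s_eig = s_rot = 0.0
for _ in range(trials):
    g = [math.sqrt(var[i]) * random.gauss(0, 1) for i in range(p)]
    s_eig += score(eig_basis, g) / trials
    s_rot += score(rot_basis, g) / trials
print(s_eig, s_rot)   # the eigenbasis should score higher, as known for d = 1
```

Repeating this with |S| = d > 1 subsets gives a way to probe the open case experimentally, though simulation of course proves nothing about the conjecture.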

1.2 PCA in high dimensions and Marchenko-Pastur

Let us assume that the data points x_1,...,x_n ∈ R^p are independent draws of a gaussian random variable g ∼ N(0, Σ) for some covariance Σ ∈ R^{p×p}. In this case, when we use PCA we are hoping to find low dimensional structure in the distribution, which should correspond to large eigenvalues of Σ (and their corresponding eigenvectors). For this reason (and since PCA depends on the spectral properties of Σ_n) we would like to understand whether the spectral properties of Σ_n (eigenvalues and eigenvectors) are close to the ones of Σ.

Since E[Σ_n] = Σ, if p is fixed and n → ∞ the law of large numbers guarantees that indeed Σ_n → Σ. However, in many modern applications it is not uncommon to have p on the order of n (or, sometimes, even larger!). For example, if our dataset is composed of images, then n is the number of images and p the number of pixels per image; it is conceivable that the number of pixels is on the order of the number of images in a set. Unfortunately, in that case, it is no longer clear that Σ_n → Σ. Dealing with this type of difficulty is the realm of high dimensional statistics.

For simplicity we will instead try to understand the spectral properties of

    S_n = (1/n) X X^T.

Since x ∼ N(0, Σ), we know that µ_n → 0 (and, clearly, n/(n−1) → 1), so the spectral properties of S_n will be essentially the same as those of Σ_n.³

Let us start by looking into a simple example, Σ = I. In that case the distribution has no low dimensional structure, as the distribution is rotation invariant. The following is a histogram (left) and a scree plot (right) of the eigenvalues of a sample of S_n (with Σ = I) for p = 500 and n = 1000. The red line is the eigenvalue distribution predicted by the Marchenko-Pastur distribution (15), which we will discuss below.

    [Figure: histogram (left) and scree plot (right) of the eigenvalues of a sample of S_n with Σ = I, p = 500, n = 1000; the Marchenko-Pastur density is overlaid in red.]

³ In this case, S_n is actually the Maximum Likelihood Estimator for Σ; we'll talk about Maximum Likelihood Estimation later in the course.

As one can see in the image, there are many eigenvalues considerably larger than 1 (and some considerably larger than others). Notice that, if given this profile of eigenvalues of Σ_n, one could potentially be led to believe that the data has low dimensional structure, when in truth the distribution it was drawn from is isotropic.

Understanding the distribution of eigenvalues of random matrices is at the core of Random Matrix Theory (there are many good books on Random Matrix Theory, e.g. [Tao12] and [AGZ10]). This particular limiting distribution was first established in 1967 by Marchenko and Pastur [MP67] and is now referred to as the Marchenko-Pastur distribution. They showed that, if p and n both go to ∞ with their ratio fixed, p/n = γ ≤ 1, then the sample distribution of the eigenvalues of S_n (like the histogram above) will, in the limit, be

    dF_γ(λ) = (1 / (2π γ λ)) sqrt( (γ_+ − λ)(λ − γ_-) ) 1_{[γ_-, γ_+]}(λ) dλ,    (15)

with support [γ_-, γ_+], where γ_± = (1 ± sqrt(γ))^2. This is plotted as the red line in the figure above.

Remark 1.4 We will not show the proof of the Marchenko-Pastur Theorem here (you can see, for example, [Bai99] for several different proofs of it), but one approach to a proof uses the so-called moment method. The core of the idea is to note that one can compute the moments of the eigenvalue distribution in two ways, noting that (in the limit) for any k,

    (1/p) E Tr[ ((1/n) X X^T)^k ] = (1/p) E Tr( S_n^k ) = E (1/p) Σ_{i=1}^p λ_i^k(S_n) → ∫_{γ_-}^{γ_+} λ^k dF_γ(λ),

and that the quantities (1/p) E Tr[ ((1/n) X X^T)^k ] can be estimated (these estimates rely essentially on combinatorics). The distribution dF_γ(λ) can then be computed from its moments.

A related open problem

Open Problem 1.2 (Monotonicity of singular values [BKS13]) Consider the setting above but with p = n; then X ∈ R^{n×n} is a matrix with i.i.d. N(0,1) entries. Let σ_i( (1/√n) X ) denote the i-th singular value⁴ of (1/√n) X, and define

    α_R(n) := E[ (1/n) Σ_{i=1}^n σ_i( (1/√n) X ) ],

the expected value of the average singular value of (1/√n) X. The conjecture is that, for every n ≥ 1,

    α_R(n+1) ≥ α_R(n).

⁴ The i-th diagonal element of Σ in the SVD (1/√n) X = U Σ V^T.

Moreover, for the analogous quantity α_C(n) defined over the complex numbers, meaning simply that each entry of X is an i.i.d. complex valued standard gaussian CN(0,1), the reverse inequality is conjectured for all n ≥ 1:

    α_C(n+1) ≤ α_C(n).

Notice that the singular values of (1/√n) X are simply the square roots of the eigenvalues of S_n:

    σ_i( (1/√n) X ) = sqrt( λ_i(S_n) ).

This means that we can compute α_R in the limit (since we know the limiting distribution of the λ_i(S_n)) and get (since p = n we have γ = 1, γ_- = 0, and γ_+ = 4)

    lim_{n→∞} α_R(n) = ∫_0^4 λ^{1/2} dF_1(λ) = (1/(2π)) ∫_0^4 λ^{1/2} ( sqrt( λ(4 − λ) ) / λ ) dλ = 8/(3π) ≈ 0.8488.

Also, α_R(1) simply corresponds to the expected value of the absolute value of a standard gaussian g,

    α_R(1) = E|g| = sqrt(2/π) ≈ 0.7979,

which is compatible with the conjecture.

On the complex side, the Marchenko-Pastur distribution also holds, so lim_{n→∞} α_C(n) = lim_{n→∞} α_R(n); α_C(1) can also be easily calculated and seen to be larger than the limit.

2 Spike Models and the BBP transition

What if there actually is some (linear) low dimensional structure in the data? When can we expect to capture it with PCA? A particularly simple, yet relevant, example to analyse is when the covariance matrix Σ is an identity with a rank 1 perturbation, which we refer to as a spike model,

    Σ = I + β v v^T,

for v a unit norm vector and β ≥ 0. One way to think about this instance is that each data point x consists of a signal part sqrt(β) g_0 v, where g_0 is a one-dimensional standard gaussian (so sqrt(β) g_0 v is a gaussian multiple of the fixed vector v), and a noise part g ∼ N(0, I) (independent of g_0). Then x = g + sqrt(β) g_0 v is a gaussian random variable, x ∼ N(0, I + β v v^T).

A natural question is whether this rank 1 perturbation can be seen in S_n. Let us build some intuition with an example. The following is the histogram of the eigenvalues of a sample of S_n for p = 500, n = 1000, v the first element of the canonical basis (v = e_1), and β = 1.5:

    [Figure: histogram of the eigenvalues of a sample of S_n for the spike model with p = 500, n = 1000, β = 1.5.]

The image suggests that there is an eigenvalue of S_n that pops out of the support of the Marchenko-Pastur distribution (below we will estimate the location of this eigenvalue, and that estimate corresponds to the red "x" in the figure). It is worth noticing that the largest eigenvalue of Σ is simply 1 + β = 2.5, while the largest eigenvalue of S_n appears considerably larger than that. Let us now try the same experiment for β = 0.5:

    [Figure: histogram of the eigenvalues of a sample of S_n for the spike model with p = 500, n = 1000, β = 0.5.]

and it appears that, for β = 0.5, the distribution of the eigenvalues is indistinguishable from the one for Σ = I.

This motivates the following question:

Question 2.1 For which values of γ and β do we expect to see an eigenvalue of S_n popping out of the support of the Marchenko-Pastur distribution, and what is the limit value that we expect it to take?
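Question 2.1 can also be probed empirically before any theory. The sketch below is an illustrative simulation (seeded RNG, small sizes p = 50 and n = 100, β = 3, and power iteration standing in for an eigensolver): it builds X for the spike model with v = e_1 and checks that the leading eigenvalue of S_n escapes the Marchenko-Pastur edge γ_+ = (1 + sqrt(γ))^2 and that the leading eigenvector correlates with e_1:

```python
import math, random

random.seed(1)
p, n, beta = 50, 100, 3.0
gamma = p / n
gamma_plus = (1 + math.sqrt(gamma)) ** 2          # edge of the MP support

# X has i.i.d. N(0,1) entries, with the first row scaled by sqrt(1+beta),
# so each column is a draw from N(0, I + beta e1 e1^T).
X = [[(math.sqrt(1 + beta) if i == 0 else 1.0) * random.gauss(0, 1)
      for _ in range(n)] for i in range(p)]

def Sn_matvec(v):
    # S_n v = (1/n) X (X^T v), computed without forming the p x p matrix S_n.
    Xtv = [sum(X[i][j] * v[i] for i in range(p)) for j in range(n)]
    return [sum(X[i][j] * Xtv[j] for j in range(n)) / n for i in range(p)]

v = [1.0 if i == 0 else 0.0 for i in range(p)]
for _ in range(300):                               # power iteration on S_n
    w = Sn_matvec(v)
    norm = math.sqrt(sum(t * t for t in w))
    v = [t / norm for t in w]

w = Sn_matvec(v)
lam = sum(v[i] * w[i] for i in range(p))           # leading eigenvalue of S_n
corr2 = v[0] ** 2                                  # squared overlap with e_1
print(lam, gamma_plus, corr2)                      # lam well above gamma_plus
```

At these small sizes the numbers fluctuate from seed to seed, but the separation from the bulk edge is already unmistakable for this β.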

As we will see below, there is a critical value of β below which we don't expect to see a change in the distribution of eigenvalues, and above which we expect one of the eigenvalues to pop out of the support. This is known as the BBP transition (after Baik, Ben Arous, and Péché [BBAP05]). There are many very nice papers about this phenomenon, including [Pau, Joh01, BBAP05, Pau07, BS05, Kar05].⁵

In what follows we will find the critical value of β and estimate the location of the largest eigenvalue of S_n. While the argument we will use can be made precise (and is borrowed from [Pau]), we will be ignoring a few details for the sake of exposition. In short, the argument below can be transformed into a rigorous proof, but it is not one in its present form!

First of all, it is not difficult to see that we can assume that v = e_1 (since everything else is rotation invariant). We want to understand the behavior of the leading eigenvalue of

    S_n = (1/n) Σ_{i=1}^n x_i x_i^T = (1/n) X X^T,

where

    X = [x_1, ..., x_n] ∈ R^{p×n}.

We can write X as

    X = [ sqrt(1+β) Z_1 ; Z_2 ],

where Z_1 ∈ R^{1×n} and Z_2 ∈ R^{(p−1)×n}, both populated with i.i.d. standard gaussian entries N(0,1). Then,

    S_n = (1/n) X X^T = (1/n) [ (1+β) Z_1 Z_1^T ,  sqrt(1+β) Z_1 Z_2^T ;  sqrt(1+β) Z_2 Z_1^T ,  Z_2 Z_2^T ].

Now, let λ̂ and v = [v_1 ; v_2], with v_1 ∈ R and v_2 ∈ R^{p−1}, denote, respectively, an eigenvalue and an associated eigenvector of S_n. By the definition of eigenvalue and eigenvector we have

    (1/n) [ (1+β) Z_1 Z_1^T , sqrt(1+β) Z_1 Z_2^T ; sqrt(1+β) Z_2 Z_1^T , Z_2 Z_2^T ] [v_1 ; v_2] = λ̂ [v_1 ; v_2],

which can be rewritten as

    (1/n) (1+β) Z_1 Z_1^T v_1 + (1/n) sqrt(1+β) Z_1 Z_2^T v_2 = λ̂ v_1,    (16)
    (1/n) sqrt(1+β) Z_2 Z_1^T v_1 + (1/n) Z_2 Z_2^T v_2 = λ̂ v_2.    (17)

Equation (17) is equivalent to

    (1/n) sqrt(1+β) Z_2 Z_1^T v_1 = ( λ̂ I − (1/n) Z_2 Z_2^T ) v_2.

⁵ Notice that the Marchenko-Pastur theorem does not imply that all eigenvalues are actually in the support of the Marchenko-Pastur distribution; it just rules out that a non-vanishing proportion of them are outside it. However, it is possible to show that indeed, in the limit, all eigenvalues will be in the support (see, for example, [Pau]).

If λ̂ I − (1/n) Z_2 Z_2^T is invertible (this won't be justified here, but it is in [Pau]) then we can rewrite this as

    v_2 = ( λ̂ I − (1/n) Z_2 Z_2^T )^{-1} (1/n) sqrt(1+β) Z_2 Z_1^T v_1,

which we can then plug into (16) to get

    (1/n) (1+β) Z_1 Z_1^T v_1 + (1/n) sqrt(1+β) Z_1 Z_2^T ( λ̂ I − (1/n) Z_2 Z_2^T )^{-1} (1/n) sqrt(1+β) Z_2 Z_1^T v_1 = λ̂ v_1.

If v_1 ≠ 0 (again, not properly justified here, see [Pau]) then this means that

    λ̂ = (1/n) (1+β) Z_1 Z_1^T + (1/n) sqrt(1+β) Z_1 Z_2^T ( λ̂ I − (1/n) Z_2 Z_2^T )^{-1} (1/n) sqrt(1+β) Z_2 Z_1^T.    (18)

The first observation is that, because Z_1 ∈ R^{1×n} has standard gaussian entries, (1/n) Z_1 Z_1^T → 1, meaning that

    λ̂ ≈ (1+β) [ 1 + (1/n) Z_1 Z_2^T ( λ̂ I − (1/n) Z_2 Z_2^T )^{-1} (1/n) Z_2 Z_1^T ].    (19)

Consider the SVD Z_2^T = U Σ V^T, where U ∈ R^{n×(p−1)} has orthonormal columns (U^T U = I_{(p−1)×(p−1)}), V ∈ O(p−1), and Σ ∈ R^{(p−1)×(p−1)} is diagonal. Take D = (1/n) Σ^2; then

    (1/n) Z_2 Z_2^T = (1/n) V Σ^2 V^T = V D V^T,

meaning that the diagonal entries of D correspond to the eigenvalues of (1/n) Z_2 Z_2^T, which we expect to be distributed (in the limit) according to the Marchenko-Pastur distribution for (p−1)/n → γ. Replacing back in (19),

    λ̂ = (1+β) [ 1 + ( (1/√n) Z_1 U D^{1/2} V^T ) ( λ̂ I − V D V^T )^{-1} ( V D^{1/2} U^T Z_1^T (1/√n) ) ]
       = (1+β) [ 1 + (1/n) ( U^T Z_1^T )^T D^{1/2} ( V^T ( λ̂ I − V D V^T )^{-1} V ) D^{1/2} ( U^T Z_1^T ) ]
       = (1+β) [ 1 + (1/n) ( U^T Z_1^T )^T D^{1/2} ( λ̂ I − D )^{-1} D^{1/2} ( U^T Z_1^T ) ].

Since the columns of U are orthonormal, g := U^T Z_1^T ∈ R^{p−1} is an isotropic gaussian, g ∼ N(0, I); in fact,

    E[ g g^T ] = E[ U^T Z_1^T Z_1 U ] = U^T E[ Z_1^T Z_1 ] U = U^T U = I_{(p−1)×(p−1)}.

We proceed:

    λ̂ = (1+β) [ 1 + (1/n) g^T D^{1/2} ( λ̂ I − D )^{-1} D^{1/2} g ] = (1+β) [ 1 + (1/n) Σ_{j=1}^{p−1} g_j^2 D_jj / ( λ̂ − D_jj ) ].

Because we expect the diagonal entries of D to be distributed according to the Marchenko-Pastur distribution, and g to be independent of them, we expect that (again, not properly justified here, see [Pau])

    (1/n) Σ_{j=1}^{p−1} g_j^2 D_jj / ( λ̂ − D_jj ) = ((p−1)/n) · (1/(p−1)) Σ_{j=1}^{p−1} g_j^2 D_jj / ( λ̂ − D_jj ) → γ ∫_{γ_-}^{γ_+} ( x / (λ̂ − x) ) dF_γ(x).

We thus get an equation for λ̂:

    λ̂ = (1+β) [ 1 + γ ∫_{γ_-}^{γ_+} ( x / (λ̂ − x) ) dF_γ(x) ],

which can be easily solved with the help of a program that computes integrals symbolically (such as Mathematica) to give (you can also see [Pau] for a derivation)

    λ̂ = (1+β) ( 1 + γ/β ),    (20)

which is particularly elegant (especially considering the size of some of the equations used in the derivation).

An important thing to notice is that for β = sqrt(γ) we have

    λ̂ = (1 + sqrt(γ)) ( 1 + γ/sqrt(γ) ) = (1 + sqrt(γ))^2 = γ_+,

suggesting that β = sqrt(γ) is the critical point. Indeed this is the case, and it is possible to make the above argument rigorous⁶ and show that, in the model described above,

    if β ≤ sqrt(γ) then λ_max(S_n) → γ_+,

and if β > sqrt(γ) then

    λ_max(S_n) → (1+β) ( 1 + γ/β ) > γ_+.

Another important question is whether the leading eigenvector actually correlates with the planted perturbation (in this case e_1). It turns out that very similar techniques can answer this question as well [Pau], and show that the leading eigenvector v_max of S_n will be non-trivially correlated with e_1 if and only if β > sqrt(γ). More precisely:

    if β ≤ sqrt(γ) then |⟨v_max, e_1⟩|^2 → 0,

and if β > sqrt(γ) then

    |⟨v_max, e_1⟩|^2 → ( 1 − γ/β^2 ) / ( 1 + γ/β ).

⁶ Note that in the argument above it wasn't even completely clear where it was used that the eigenvalue in question was the leading one. In the actual proof one first needs to make sure that there is an eigenvalue outside of the support, and the argument then only holds for that one; see [Pau].
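Both the Marchenko-Pastur density (15) and the claim that (20) solves the fixed-point equation can be verified numerically without any symbolic software. The sketch below uses a crude midpoint-rule integrator (the step count and the choice γ = 0.5, β = 1.5 are illustrative assumptions):

```python
import math

def mp_density(x, g):
    # Marchenko-Pastur density (15) with parameter gamma = g.
    gp, gm = (1 + math.sqrt(g)) ** 2, (1 - math.sqrt(g)) ** 2
    if x <= gm or x >= gp:
        return 0.0
    return math.sqrt((gp - x) * (x - gm)) / (2 * math.pi * g * x)

def mp_integral(f, g, steps=100000):
    # Midpoint rule for the integral of f(x) dF_gamma(x) over [gamma_-, gamma_+].
    gp, gm = (1 + math.sqrt(g)) ** 2, (1 - math.sqrt(g)) ** 2
    h = (gp - gm) / steps
    return sum(f(gm + (k + 0.5) * h) * mp_density(gm + (k + 0.5) * h, g) * h
               for k in range(steps))

gamma, beta = 0.5, 1.5                                 # beta > sqrt(gamma): supercritical
mass = mp_integral(lambda x: 1.0, gamma)               # total mass: should be 1
lam_hat = (1 + beta) * (1 + gamma / beta)              # formula (20)
rhs = (1 + beta) * (1 + gamma * mp_integral(lambda x: x / (lam_hat - x), gamma))
crit = (1 + math.sqrt(gamma)) * (1 + gamma / math.sqrt(gamma))   # (20) at beta = sqrt(gamma)
print(mass, lam_hat, rhs, crit)   # mass ≈ 1, lam_hat ≈ rhs, crit = gamma_plus
```

Note that the integrand x/(λ̂ − x) is only well behaved because λ̂ lies strictly above γ_+ in the supercritical regime; for β below the threshold, the candidate λ̂ would fall inside the support and the equation degenerates.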

A brief mention of Wigner matrices

Another very important random matrix model is the Wigner matrix (it will show up later in this course). Given an integer n, a standard gaussian Wigner matrix W ∈ R^{n×n} is a symmetric matrix with independent N(0,1) entries (except for the symmetry constraint W_ij = W_ji). In the limit, the eigenvalues of (1/√n) W are distributed according to the so-called semicircle law

    dSC(x) = (1/(2π)) sqrt(4 − x^2) 1_{[−2,2]}(x) dx,

and there is also a BBP-like transition for this matrix ensemble [FP06]. More precisely, if v is a unit-norm vector in R^n and ξ ≥ 0, then the largest eigenvalue of (1/√n) W + ξ v v^T satisfies:

    if ξ ≤ 1 then λ_max( (1/√n) W + ξ v v^T ) → 2,

and if ξ > 1 then

    λ_max( (1/√n) W + ξ v v^T ) → ξ + 1/ξ.    (21)

An open problem about spike models

Open Problem 2.1 (Spike model for the cut SDP [MS15]) Let W denote a symmetric Wigner matrix with i.i.d. entries W_ij ∼ N(0,1). Also, given B ∈ R^{n×n} symmetric, define

    Q(B) = max { Tr(BX) : X ⪰ 0, X_ii = 1 },

and define q(ξ) as

    q(ξ) = lim_{n→∞} (1/n) E Q( (ξ/n) 1 1^T + (1/√n) W ).

What is the value of ξ*, defined as

    ξ* = inf { ξ ≥ 0 : q(ξ) > 2 } ?

It is known that q(ξ) = 2 for 0 ≤ ξ ≤ 1 [MS15]. One can show that (1/n) Q(B) ≤ λ_max(B). In fact,

    max { Tr(BX) : X ⪰ 0, X_ii = 1 } ≤ max { Tr(BX) : X ⪰ 0, Tr(X) = n },

and it is also not difficult to show (hint: take the spectral decomposition of X) that

    max { Tr(BX) : X ⪰ 0, Tr(X) = n } = n λ_max(B).

This means that for ξ > 1, q(ξ) ≤ ξ + 1/ξ.
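Returning to the Wigner model above, the normalization and variance of the semicircle law, and the behavior of the limit ξ + 1/ξ from (21), can be sanity-checked numerically (midpoint rule again; the step count and the sample values of ξ are arbitrary illustrative choices):

```python
import math

def sc_density(x):
    # Semicircle density (1/2pi) sqrt(4 - x^2) on [-2, 2].
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2 * math.pi)

steps = 100000
h = 4.0 / steps
mass = second = 0.0
for k in range(steps):
    x = -2.0 + (k + 0.5) * h     # midpoint of the k-th cell of [-2, 2]
    w = sc_density(x) * h
    mass += w                    # total mass: should be 1
    second += x * x * w          # second moment: should be 1

# xi + 1/xi equals the bulk edge 2 exactly at xi = 1 and exceeds it for xi > 1,
# which is when the spiked eigenvalue separates from the semicircle bulk.
vals = [(xi, xi + 1 / xi) for xi in [0.5, 1.0, 1.5, 3.0]]
print(mass, second, vals)
```

Since ξ + 1/ξ ≥ 2 for all ξ > 0, the limiting top eigenvalue never dips below the bulk edge; it merely sticks to it throughout the subcritical regime ξ ≤ 1.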

Remark 2.1 Optimization problems of the type max { Tr(BX) : X ⪰ 0, X_ii = 1 } are semidefinite programs; they will be a major player later in the course!

Since (1/n) E Tr[ ( (ξ/n) 1 1^T + (1/√n) W ) 1 1^T ] = ξ, by taking X = 1 1^T we expect that q(ξ) ≥ ξ. These observations imply that 1 ≤ ξ* < 2 (see [MS15]). A reasonable conjecture is that it is equal to 1. This would imply that a certain semidefinite programming based algorithm for clustering under the Stochastic Block Model on 2 clusters (we will discuss these things later in the course) is optimal for detection (see [MS15]).⁷

⁷ Later in the course we will discuss clustering under the Stochastic Block Model quite thoroughly, and will see how this same SDP is known to be optimal for exact recovery [ABH14, HWX14, Ban15a].

References

[ABH14] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. Available online at arXiv [cs.SI], 2014.

[AGZ10] G. W. Anderson, A. Guionnet, and O. Zeitouni. An Introduction to Random Matrices. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.

[Bai99] Z. D. Bai. Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica, 9, 1999.

[Ban15a] A. S. Bandeira. Random Laplacian matrices and convex relaxations. Available online at arXiv [math.PR], 2015.

[Ban15b] A. S. Bandeira. Relax and Conquer BLOG: 18.S096 Principal Component Analysis in High Dimensions and the Spike Model, 2015.

[BBAP05] J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5), 2005.

[BKS13] A. S. Bandeira, C. Kennedy, and A. Singer. Approximating the little Grothendieck problem over the orthogonal group. Available online at arXiv [cs.DS], 2013.

[BS05] J. Baik and J. W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population models, 2005.

[FP06] D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272(1), 2007.

[Gol96] G. H. Golub. Matrix Computations. Johns Hopkins University Press, third edition, 1996.

[HJ85] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[HM09] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. Available online at arXiv [math.NA], 2009.

[HWX14] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite programming. Available online at arXiv, 2014.

[Joh01] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29(2), 2001.

[Kar05] N. El Karoui. Recent results about the largest eigenvalue of random covariance matrices and statistical application. Acta Physica Polonica B, 36(9), 2005.

[MM15] C. Musco and C. Musco. Stronger and faster approximate singular value decomposition via the block Lanczos method. Available at arXiv [cs.DS], 2015.

[Mos11] M. S. Moslehian. Ky Fan inequalities. Available online at arXiv [math.FA], 2011.

[MP67] V. A. Marchenko and L. A. Pastur. Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.), 72(114), 1967.

[MS15] A. Montanari and S. Sen. Semidefinite programs on sparse random graphs. Available online at arXiv [cs.DM], 2015.

[MZ11] S. Mallat and O. Zeitouni. A conjecture concerning optimality of the Karhunen-Loève basis in nonlinear reconstruction. Available online at arXiv [math.PR], 2011.

[Pau] D. Paul. Asymptotics of the leading sample eigenvalues for a spiked covariance model. Available online at http://anson.ucdavis.edu/~debashis/techrep/eigenlimit.pdf.

[Pau07] D. Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17, 2007.

[Pea01] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, Series 6, 2(11), 1901.

[RST09] V. Rokhlin, A. Szlam, and M. Tygert. A randomized algorithm for principal component analysis. Available at arXiv [stat.CO], 2009.

[Tao12] T. Tao. Topics in Random Matrix Theory. Graduate Studies in Mathematics. American Mathematical Soc., 2012.


More information

Topics in Eigen-analysis

Topics in Eigen-analysis Topics i Eige-aalysis Li Zajiag 28 July 2014 Cotets 1 Termiology... 2 2 Some Basic Properties ad Results... 2 3 Eige-properties of Hermitia Matrices... 5 3.1 Basic Theorems... 5 3.2 Quadratic Forms & Noegative

More information

Apply change-of-basis formula to rewrite x as a linear combination of eigenvectors v j.

Apply change-of-basis formula to rewrite x as a linear combination of eigenvectors v j. Eigevalue-Eigevector Istructor: Nam Su Wag eigemcd Ay vector i real Euclidea space of dimesio ca be uiquely epressed as a liear combiatio of liearly idepedet vectors (ie, basis) g j, j,,, α g α g α g α

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

Lecture 2: April 3, 2013

Lecture 2: April 3, 2013 TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

1 Last time: similar and diagonalizable matrices

1 Last time: similar and diagonalizable matrices Last time: similar ad diagoalizable matrices Let be a positive iteger Suppose A is a matrix, v R, ad λ R Recall that v a eigevector for A with eigevalue λ if v ad Av λv, or equivaletly if v is a ozero

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

LINEAR ALGEBRA. Paul Dawkins

LINEAR ALGEBRA. Paul Dawkins LINEAR ALGEBRA Paul Dawkis Table of Cotets Preface... ii Outlie... iii Systems of Equatios ad Matrices... Itroductio... Systems of Equatios... Solvig Systems of Equatios... 5 Matrices... 7 Matrix Arithmetic

More information

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices A Hadamard-type lower boud for symmetric diagoally domiat positive matrices Christopher J. Hillar, Adre Wibisoo Uiversity of Califoria, Berkeley Jauary 7, 205 Abstract We prove a ew lower-boud form of

More information

Factor Analysis. Lecture 10: Factor Analysis and Principal Component Analysis. Sam Roweis

Factor Analysis. Lecture 10: Factor Analysis and Principal Component Analysis. Sam Roweis Lecture 10: Factor Aalysis ad Pricipal Compoet Aalysis Sam Roweis February 9, 2004 Whe we assume that the subspace is liear ad that the uderlyig latet variable has a Gaussia distributio we get a model

More information

Lecture 19. sup y 1,..., yn B d n

Lecture 19. sup y 1,..., yn B d n STAT 06A: Polyomials of adom Variables Lecture date: Nov Lecture 19 Grothedieck s Iequality Scribe: Be Hough The scribes are based o a guest lecture by ya O Doell. I this lecture we prove Grothedieck s

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

The Sample Variance Formula: A Detailed Study of an Old Controversy

The Sample Variance Formula: A Detailed Study of an Old Controversy The Sample Variace Formula: A Detailed Study of a Old Cotroversy Ky M. Vu PhD. AuLac Techologies Ic. c 00 Email: kymvu@aulactechologies.com Abstract The two biased ad ubiased formulae for the sample variace

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Section 4.3. Boolean functions

Section 4.3. Boolean functions Sectio 4.3. Boolea fuctios Let us take aother look at the simplest o-trivial Boolea algebra, ({0}), the power-set algebra based o a oe-elemet set, chose here as {0}. This has two elemets, the empty set,

More information

Chandrasekhar Type Algorithms. for the Riccati Equation of Lainiotis Filter

Chandrasekhar Type Algorithms. for the Riccati Equation of Lainiotis Filter Cotemporary Egieerig Scieces, Vol. 3, 00, o. 4, 9-00 Chadrasekhar ype Algorithms for the Riccati Equatio of Laiiotis Filter Nicholas Assimakis Departmet of Electroics echological Educatioal Istitute of

More information

The random version of Dvoretzky s theorem in l n

The random version of Dvoretzky s theorem in l n The radom versio of Dvoretzky s theorem i l Gideo Schechtma Abstract We show that with high probability a sectio of the l ball of dimesio k cε log c > 0 a uiversal costat) is ε close to a multiple of the

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

arxiv: v1 [math.pr] 13 Oct 2011

arxiv: v1 [math.pr] 13 Oct 2011 A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,

More information

THE SPECTRAL RADII AND NORMS OF LARGE DIMENSIONAL NON-CENTRAL RANDOM MATRICES

THE SPECTRAL RADII AND NORMS OF LARGE DIMENSIONAL NON-CENTRAL RANDOM MATRICES COMMUN. STATIST.-STOCHASTIC MODELS, 0(3), 525-532 (994) THE SPECTRAL RADII AND NORMS OF LARGE DIMENSIONAL NON-CENTRAL RANDOM MATRICES Jack W. Silverstei Departmet of Mathematics, Box 8205 North Carolia

More information

c 2006 Society for Industrial and Applied Mathematics

c 2006 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 7, No. 3, pp. 851 860 c 006 Society for Idustrial ad Applied Mathematics EXTREMAL EIGENVALUES OF REAL SYMMETRIC MATRICES WITH ENTRIES IN AN INTERVAL XINGZHI ZHAN Abstract.

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Chapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian

Chapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian Chapter 2 EM algorithms The Expectatio-Maximizatio (EM) algorithm is a maximum likelihood method for models that have hidde variables eg. Gaussia Mixture Models (GMMs), Liear Dyamic Systems (LDSs) ad Hidde

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

ALGEBRAIC GEOMETRY COURSE NOTES, LECTURE 5: SINGULARITIES.

ALGEBRAIC GEOMETRY COURSE NOTES, LECTURE 5: SINGULARITIES. ALGEBRAIC GEOMETRY COURSE NOTES, LECTURE 5: SINGULARITIES. ANDREW SALCH 1. The Jacobia criterio for osigularity. You have probably oticed by ow that some poits o varieties are smooth i a sese somethig

More information

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday

More information

Fastest mixing Markov chain on a path

Fastest mixing Markov chain on a path Fastest mixig Markov chai o a path Stephe Boyd Persi Diacois Ju Su Li Xiao Revised July 2004 Abstract We ider the problem of assigig trasitio probabilities to the edges of a path, so the resultig Markov

More information

Last time: Moments of the Poisson distribution from its generating function. Example: Using telescope to measure intensity of an object

Last time: Moments of the Poisson distribution from its generating function. Example: Using telescope to measure intensity of an object 6.3 Stochastic Estimatio ad Cotrol, Fall 004 Lecture 7 Last time: Momets of the Poisso distributio from its geeratig fuctio. Gs () e dg µ e ds dg µ ( s) µ ( s) µ ( s) µ e ds dg X µ ds X s dg dg + ds ds

More information

Physics 324, Fall Dirac Notation. These notes were produced by David Kaplan for Phys. 324 in Autumn 2001.

Physics 324, Fall Dirac Notation. These notes were produced by David Kaplan for Phys. 324 in Autumn 2001. Physics 324, Fall 2002 Dirac Notatio These otes were produced by David Kapla for Phys. 324 i Autum 2001. 1 Vectors 1.1 Ier product Recall from liear algebra: we ca represet a vector V as a colum vector;

More information

Math Solutions to homework 6

Math Solutions to homework 6 Math 175 - Solutios to homework 6 Cédric De Groote November 16, 2017 Problem 1 (8.11 i the book): Let K be a compact Hermitia operator o a Hilbert space H ad let the kerel of K be {0}. Show that there

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Stochastic Matrices in a Finite Field

Stochastic Matrices in a Finite Field Stochastic Matrices i a Fiite Field Abstract: I this project we will explore the properties of stochastic matrices i both the real ad the fiite fields. We first explore what properties 2 2 stochastic matrices

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Symmetric Matrices and Quadratic Forms

Symmetric Matrices and Quadratic Forms 7 Symmetric Matrices ad Quadratic Forms 7.1 DIAGONALIZAION OF SYMMERIC MARICES SYMMERIC MARIX A symmetric matrix is a matrix A such that. A = A Such a matrix is ecessarily square. Its mai diagoal etries

More information

Achieving Stationary Distributions in Markov Chains. Monday, November 17, 2008 Rice University

Achieving Stationary Distributions in Markov Chains. Monday, November 17, 2008 Rice University Istructor: Achievig Statioary Distributios i Markov Chais Moday, November 1, 008 Rice Uiversity Dr. Volka Cevher STAT 1 / ELEC 9: Graphical Models Scribe: Rya E. Guerra, Tahira N. Saleem, Terrace D. Savitsky

More information

THE KALMAN FILTER RAUL ROJAS

THE KALMAN FILTER RAUL ROJAS THE KALMAN FILTER RAUL ROJAS Abstract. This paper provides a getle itroductio to the Kalma filter, a umerical method that ca be used for sesor fusio or for calculatio of trajectories. First, we cosider

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER 1 018/019 DR. ANTHONY BROWN 8. Statistics 8.1. Measures of Cetre: Mea, Media ad Mode. If we have a series of umbers the

More information

THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS

THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS DEMETRES CHRISTOFIDES Abstract. Cosider a ivertible matrix over some field. The Gauss-Jorda elimiatio reduces this matrix to the idetity

More information

MATH10212 Linear Algebra B Proof Problems

MATH10212 Linear Algebra B Proof Problems MATH22 Liear Algebra Proof Problems 5 Jue 26 Each problem requests a proof of a simple statemet Problems placed lower i the list may use the results of previous oes Matrices ermiats If a b R the matrix

More information

The Random Walk For Dummies

The Random Walk For Dummies The Radom Walk For Dummies Richard A Mote Abstract We look at the priciples goverig the oe-dimesioal discrete radom walk First we review five basic cocepts of probability theory The we cosider the Beroulli

More information

Problem Set 2 Solutions

Problem Set 2 Solutions CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S

More information

Chi-squared tests Math 6070, Spring 2014

Chi-squared tests Math 6070, Spring 2014 Chi-squared tests Math 6070, Sprig 204 Davar Khoshevisa Uiversity of Utah March, 204 Cotets MLE for goodess-of fit 2 2 The Multivariate ormal distributio 3 3 Cetral limit theorems 5 4 Applicatio to goodess-of-fit

More information

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK)

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) Everythig marked by is ot required by the course syllabus I this lecture, all vector spaces is over the real umber R. All vectors i R is viewed as a colum

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

(VII.A) Review of Orthogonality

(VII.A) Review of Orthogonality VII.A Review of Orthogoality At the begiig of our study of liear trasformatios i we briefly discussed projectios, rotatios ad projectios. I III.A, projectios were treated i the abstract ad without regard

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Homework Set #3 - Solutions

Homework Set #3 - Solutions EE 15 - Applicatios of Covex Optimizatio i Sigal Processig ad Commuicatios Dr. Adre Tkaceko JPL Third Term 11-1 Homework Set #3 - Solutios 1. a) Note that x is closer to x tha to x l i the Euclidea orm

More information

CHAPTER I: Vector Spaces

CHAPTER I: Vector Spaces CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig

More information

State Space Representation

State Space Representation Optimal Cotrol, Guidace ad Estimatio Lecture 2 Overview of SS Approach ad Matrix heory Prof. Radhakat Padhi Dept. of Aerospace Egieerig Idia Istitute of Sciece - Bagalore State Space Represetatio Prof.

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

1 Duality revisited. AM 221: Advanced Optimization Spring 2016

1 Duality revisited. AM 221: Advanced Optimization Spring 2016 AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R

More information

The Growth of Functions. Theoretical Supplement

The Growth of Functions. Theoretical Supplement The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that

More information

Optimization Methods MIT 2.098/6.255/ Final exam

Optimization Methods MIT 2.098/6.255/ Final exam Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short

More information

Matrix Theory, Math6304 Lecture Notes from October 25, 2012 taken by Manisha Bhardwaj

Matrix Theory, Math6304 Lecture Notes from October 25, 2012 taken by Manisha Bhardwaj Matrix Theory, Math6304 Lecture Notes from October 25, 2012 take by Maisha Bhardwaj Last Time (10/23/12) Example for low-rak perturbatio, re-examied Relatig eigevalues of matrices ad pricipal submatrices

More information

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions Statistical ad Mathematical Methods DS-GA 00 December 8, 05. Short questios Sample Fial Problems Solutios a. Ax b has a solutio if b is i the rage of A. The dimesio of the rage of A is because A has liearly-idepedet

More information

Lecture 9: September 19

Lecture 9: September 19 36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

if j is a neighbor of i,

if j is a neighbor of i, We see that if i = j the the coditio is trivially satisfied. Otherwise, T ij (i) = (i)q ij mi 1, (j)q ji, ad, (i)q ij T ji (j) = (j)q ji mi 1, (i)q ij. (j)q ji Now there are two cases, if (j)q ji (i)q

More information

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled 1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how

More information

Tridiagonal reduction redux

Tridiagonal reduction redux Week 15: Moday, Nov 27 Wedesday, Nov 29 Tridiagoal reductio redux Cosider the problem of computig eigevalues of a symmetric matrix A that is large ad sparse Usig the grail code i LAPACK, we ca compute

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

The Binomial Theorem

The Binomial Theorem The Biomial Theorem Robert Marti Itroductio The Biomial Theorem is used to expad biomials, that is, brackets cosistig of two distict terms The formula for the Biomial Theorem is as follows: (a + b ( k

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Notes for Lecture 5. 1 Grover Search. 1.1 The Setting. 1.2 Motivation. Lecture 5 (September 26, 2018)

Notes for Lecture 5. 1 Grover Search. 1.1 The Setting. 1.2 Motivation. Lecture 5 (September 26, 2018) COS 597A: Quatum Cryptography Lecture 5 (September 6, 08) Lecturer: Mark Zhadry Priceto Uiversity Scribe: Fermi Ma Notes for Lecture 5 Today we ll move o from the slightly cotrived applicatios of quatum

More information

Mon Apr Second derivative test, and maybe another conic diagonalization example. Announcements: Warm-up Exercise:

Mon Apr Second derivative test, and maybe another conic diagonalization example. Announcements: Warm-up Exercise: Math 2270-004 Week 15 otes We will ot ecessarily iish the material rom a give day's otes o that day We may also add or subtract some material as the week progresses, but these otes represet a i-depth outlie

More information