ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018
2 Outline
- A motivating application: graph clustering
- Distance and angles between two subspaces
- Eigen-space perturbation theory
- Extension: singular subspaces
- Extension: eigen-space for asymmetric transition matrices
Spectral methods 2-2
3 A motivating application: graph clustering
4 Graph clustering / community detection
Community structures are common in many social networks (figure credit: The Future Buzz; figure credit: S. Papadopoulos)
Goal: partition users into several clusters based on their friendships / similarities
Spectral methods 2-4
5 A simple model: stochastic block model (SBM)
x_i = 1: 1st community; x_i = −1: 2nd community
- n nodes {1, ..., n}, 2 communities
- n unknown variables: x_1, ..., x_n ∈ {1, −1} encode community memberships
Spectral methods 2-5
6 A simple model: stochastic block model (SBM)
Observe a graph G: each pair (i, j) ∈ G with prob. p if i and j are from the same community, and with prob. q else. Here, p > q and p, q ≳ (log n)/n.
Goal: recover the community memberships of all nodes, i.e. {x_i}
Spectral methods 2-6
7 Adjacency matrix
Consider the adjacency matrix A ∈ {0, 1}^{n×n} of G:
A_{i,j} = 1 if (i, j) ∈ G, and A_{i,j} = 0 else
WLOG, suppose x_1 = ... = x_{n/2} = 1 and x_{n/2+1} = ... = x_n = −1
Spectral methods 2-7
8 Adjacency matrix
A = E[A] (rank 2) + (A − E[A])
E[A] = [p 11ᵀ, q 11ᵀ; q 11ᵀ, p 11ᵀ] = ((p+q)/2) 11ᵀ (uninformative bias) + ((p−q)/2) xxᵀ, where x := [x_i]_{1≤i≤n}
Spectral methods 2-8
9 Spectral clustering
A = E[A] (rank 2) + (A − E[A])
1. compute the leading eigenvector û = [û_i]_{1≤i≤n} of A − ((p+q)/2) 11ᵀ
2. rounding: output x̂_i = 1 if û_i > 0, and x̂_i = −1 if û_i < 0
Spectral methods 2-9
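The two-step procedure above can be sketched numerically. This is a minimal illustration, not the lecture's code; the parameters n, p, q below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 0.5, 0.1          # illustrative SBM parameters

# ground-truth memberships: first half +1, second half -1
x = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

# sample a symmetric adjacency matrix: edge prob. p within, q across communities
prob = np.where(np.outer(x, x) > 0, p, q)
A = np.triu(rng.random((n, n)) < prob, 1).astype(float)
A = A + A.T

# step 1: leading eigenvector of A - (p+q)/2 * 11^T
B = A - (p + q) / 2 * np.ones((n, n))
vals, vecs = np.linalg.eigh(B)
u_hat = vecs[:, np.argmax(np.abs(vals))]

# step 2: rounding by sign
x_hat = np.where(u_hat > 0, 1.0, -1.0)

# fraction correctly clustered, up to a global sign flip
acc = max(np.mean(x_hat == x), np.mean(-x_hat == x))
```

With this eigen-gap ((p−q)n/2 = 40, much larger than the typical perturbation norm), the rounded labels recover nearly all memberships.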
10 Spectral clustering
Rationale: recovery is reliable if the perturbation A − E[A] is sufficiently small
- if ‖A − E[A]‖ = 0, then û = ±(1/√n) x ⟹ perfect clustering
Question: how to quantify the effect of the perturbation A − E[A] on û
Spectral methods 2-10
11 Distance and angles between two subspaces
12 Setup and notation
Consider 2 symmetric matrices M, M̂ = M + H ∈ R^{n×n} with eigen-decompositions
M = Σ_{i=1}^n λ_i u_i u_iᵀ and M̂ = Σ_{i=1}^n λ̂_i û_i û_iᵀ
where λ_1 ≥ ... ≥ λ_n and λ̂_1 ≥ ... ≥ λ̂_n. For simplicity, write
M = [U_0, U_1] diag(Λ_0, Λ_1) [U_0, U_1]ᵀ and M̂ = [Û_0, Û_1] diag(Λ̂_0, Λ̂_1) [Û_0, Û_1]ᵀ
Here, U_0 = [u_1, ..., u_r] and Λ_0 = diag([λ_1, ..., λ_r])
Spectral methods 2-12
13 Setup and notation
M = [U_0, U_1] [Λ_0, 0; 0, Λ_1] [U_0, U_1]ᵀ, with U_0 = [u_1, ..., u_r], U_1 = [u_{r+1}, ..., u_n], Λ_0 = diag(λ_1, ..., λ_r), Λ_1 = diag(λ_{r+1}, ..., λ_n)
Spectral methods 2-13
14 Setup and notation
- ‖M‖: spectral norm (largest singular value of M)
- ‖M‖_F: Frobenius norm (‖M‖_F = √tr(MᵀM) = √(Σ_{i,j} M_{i,j}²))
Spectral methods 2-14
15 Eigen-space perturbation theory
Main focus: how does the perturbation H affect the distance between U_0 and Û_0?
Question #0: how to define the distance between two subspaces
‖U_0 − Û_0‖_F and ‖U_0 − Û_0‖ are not appropriate, since they fall short of accounting for global orthonormal transformations: for any orthonormal R ∈ R^{r×r}, U_0 and U_0 R represent the same subspace
Spectral methods 2-15
16 Distance between two eigen-spaces
One metric that takes care of global orthonormal transformations is
dist(X_0, Z_0) := ‖X_0 X_0ᵀ − Z_0 Z_0ᵀ‖ (2.1)
This metric has several equivalent expressions:
Lemma 2.1. Suppose X := [X_0, X_1] and Z := [Z_0, Z_1] are orthonormal matrices (X_1 and Z_1 span the complement subspaces). Then
dist(X_0, Z_0) = ‖X_0ᵀ Z_1‖ = ‖Z_0ᵀ X_1‖
- sanity check: if X_0 = Z_0, then dist(X_0, Z_0) = ‖X_0ᵀ Z_1‖ = 0
Spectral methods 2-16
17 Principal angles between two eigen-spaces
In addition to distance, one might also be interested in angles.
We can quantify the similarity between two lines (represented resp. by unit vectors x_0 and z_0) by the angle between them: θ = arccos⟨x_0, z_0⟩
Spectral methods 2-17
18 Principal angles between two eigen-spaces
For r-dimensional subspaces, one needs r angles. Specifically, given ‖X_0ᵀ Z_0‖ ≤ 1, write the SVD of X_0ᵀ Z_0 ∈ R^{r×r} as
X_0ᵀ Z_0 = U diag(cos θ_1, ..., cos θ_r) Vᵀ := U cos Θ Vᵀ
where {θ_1, ..., θ_r} are called the principal angles between X_0 and Z_0
Spectral methods 2-18
19 Relations between principal angles and dist(·, ·)
As expected, principal angles and distance are closely related.
Lemma 2.2. Suppose X := [X_0, X_1] and Z := [Z_0, Z_1] are orthonormal matrices. Then
‖X_0ᵀ Z_1‖ = ‖sin Θ‖ = max{sin θ_1, ..., sin θ_r}
Lemmas 2.1 and 2.2 taken collectively give
dist(X_0, Z_0) = max{sin θ_1, ..., sin θ_r} (2.2)
Spectral methods 2-19
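Lemmas 2.1 and 2.2 are easy to verify numerically. The sketch below draws two random orthonormal bases (the dimensions n, r are arbitrary) and checks that the three expressions agree with definition (2.1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 3

# random orthonormal bases X = [X0, X1] and Z = [Z0, Z1]
X, _ = np.linalg.qr(rng.standard_normal((n, n)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))
X0, X1 = X[:, :r], X[:, r:]
Z0, Z1 = Z[:, :r], Z[:, r:]

# definition (2.1): spectral norm of the projector difference
dist = np.linalg.norm(X0 @ X0.T - Z0 @ Z0.T, 2)

# Lemma 2.1: equals ||X0^T Z1|| and ||Z0^T X1||
d1 = np.linalg.norm(X0.T @ Z1, 2)
d2 = np.linalg.norm(Z0.T @ X1, 2)

# Lemma 2.2: equals the largest principal sine, from the SVD of X0^T Z0
cos_theta = np.linalg.svd(X0.T @ Z0, compute_uv=False)
d3 = np.sqrt(1.0 - np.min(cos_theta) ** 2)   # max sin = sqrt(1 - min cos^2)
```

All four quantities coincide up to floating-point error.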
20 Proof of Lemma 2.2
‖X_0ᵀ Z_1‖ = ‖X_0ᵀ Z_1 Z_1ᵀ X_0‖^{1/2} = ‖X_0ᵀ (I − Z_0 Z_0ᵀ) X_0‖^{1/2} = ‖I − X_0ᵀ Z_0 Z_0ᵀ X_0‖^{1/2}
= ‖I − U cos² Θ Uᵀ‖^{1/2} (since X_0ᵀ Z_0 = U cos Θ Vᵀ)
= ‖I − cos² Θ‖^{1/2} = ‖sin Θ‖
Spectral methods 2-20
21 Proof of Lemma 2.1
We first claim that the SVD of X_1ᵀ Z_0 can be written as
X_1ᵀ Z_0 = Ũ sin Θ Vᵀ (2.3)
for some orthonormal Ũ (to be proved later). With this claim in place, one has
Z_0 = [X_0, X_1] [X_0ᵀ Z_0; X_1ᵀ Z_0] = [X_0, X_1] [U cos Θ Vᵀ; Ũ sin Θ Vᵀ]
⟹ Z_0 Z_0ᵀ = [X_0, X_1] [U cos² Θ Uᵀ, U cos Θ sin Θ Ũᵀ; Ũ cos Θ sin Θ Uᵀ, Ũ sin² Θ Ũᵀ] [X_0ᵀ; X_1ᵀ]
As a consequence,
X_0 X_0ᵀ − Z_0 Z_0ᵀ = [X_0, X_1] [I − U cos² Θ Uᵀ, −U cos Θ sin Θ Ũᵀ; −Ũ cos Θ sin Θ Uᵀ, −Ũ sin² Θ Ũᵀ] [X_0ᵀ; X_1ᵀ]
Spectral methods 2-21
22 Proof of Lemma 2.1 (cont.)
This further gives
‖X_0 X_0ᵀ − Z_0 Z_0ᵀ‖ = ‖[U, 0; 0, Ũ] [sin² Θ, −cos Θ sin Θ; −cos Θ sin Θ, −sin² Θ] [U, 0; 0, Ũ]ᵀ‖
= ‖[sin² Θ, −cos Θ sin Θ; −cos Θ sin Θ, −sin² Θ]‖ (‖·‖ is rotationally invariant; each block is a diagonal matrix)
= max_{1≤i≤r} ‖[sin² θ_i, −cos θ_i sin θ_i; −cos θ_i sin θ_i, −sin² θ_i]‖
= max_{1≤i≤r} sin θ_i ‖[sin θ_i, −cos θ_i; −cos θ_i, −sin θ_i]‖
= max_{1≤i≤r} sin θ_i = ‖sin Θ‖
where the last line holds since each 2×2 matrix [sin θ_i, −cos θ_i; −cos θ_i, −sin θ_i] is orthogonal.
Spectral methods 2-22
23 Proof of Lemma 2.1 (cont.)
It remains to justify (2.3). To this end, observe that
Z_0ᵀ X_1 X_1ᵀ Z_0 = Z_0ᵀ Z_0 − Z_0ᵀ X_0 X_0ᵀ Z_0 = I − V cos² Θ Vᵀ = V sin² Θ Vᵀ
and hence the right singular space (resp. singular values) of X_1ᵀ Z_0 is given by V (resp. sin Θ)
Spectral methods 2-23
24 Eigen-space perturbation theory
25 Davis-Kahan sin Θ Theorem: a simple case
Theorem 2.3 (Chandler Davis, William Kahan). Suppose M ⪰ 0 and has rank r. If ‖H‖ < λ_r(M), then
dist(Û_0, U_0) ≤ ‖H U_0‖ / (λ_r(M) − ‖H‖) ≤ ‖H‖ / (λ_r(M) − ‖H‖)
depends on the eigen-gap between λ_r (the smallest non-zero eigenvalue of M) and λ_{r+1} = 0, and on the perturbation size
Spectral methods 2-25
26 Proof of Theorem 2.3
We intend to control Û_1ᵀ U_0 by studying their interactions through H:
Û_1ᵀ H U_0 = Û_1ᵀ (Û Λ̂ Ûᵀ − U Λ Uᵀ) U_0 = Λ̂_1 Û_1ᵀ U_0 − Û_1ᵀ U_0 Λ_0 (since U_1ᵀ U_0 = Û_1ᵀ Û_0 = 0)
⟹ ‖Û_1ᵀ H U_0‖ ≥ ‖Û_1ᵀ U_0 Λ_0‖ − ‖Λ̂_1 Û_1ᵀ U_0‖ (triangle inequality)
≥ λ_r ‖Û_1ᵀ U_0‖ − ‖Û_1ᵀ U_0‖ ‖Λ̂_1‖ (2.4)
In view of Weyl's theorem, ‖Λ̂_1‖ ≤ ‖H‖, which combined with (2.4) gives
‖Û_1ᵀ U_0‖ ≤ ‖Û_1ᵀ H U_0‖ / (λ_r − ‖H‖) ≤ ‖H U_0‖ / (λ_r − ‖H‖)
This together with Lemma 2.1 completes the proof.
Spectral methods 2-26
27 Davis-Kahan sin Θ Theorem: more general case
Theorem 2.4 (Davis-Kahan sin Θ Theorem). Suppose λ_r(M) ≥ a and λ_{r+1}(M̂) ≤ a − Δ for some Δ > 0. Then
dist(Û_0, U_0) ≤ ‖H U_0‖ / Δ ≤ ‖H‖ / Δ
immediate consequence: if λ_r(M) > λ_{r+1}(M) + ‖H‖, then
dist(Û_0, U_0) ≤ ‖H‖ / (λ_r(M) − λ_{r+1}(M) − ‖H‖) (2.5)
where λ_r(M) − λ_{r+1}(M) is the spectral gap
Spectral methods 2-27
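As a sanity check, one can compare dist(Û_0, U_0) against the Davis-Kahan bound of Theorem 2.3 on a random instance. This is a sketch with arbitrary choices of n, r, and the perturbation scale:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 2

# a rank-r PSD matrix M plus a small symmetric perturbation H
G = rng.standard_normal((n, r))
M = G @ G.T                       # lambda_r(M) > 0, lambda_{r+1}(M) = 0
H = rng.standard_normal((n, n))
H = (H + H.T) / np.sqrt(n)        # scale so ||H|| = O(1) << lambda_r(M)

vals, vecs = np.linalg.eigh(M)
U0 = vecs[:, -r:]                 # top-r eigenvectors of M
vals_h, vecs_h = np.linalg.eigh(M + H)
U0_hat = vecs_h[:, -r:]           # top-r eigenvectors of M + H

dist = np.linalg.norm(U0 @ U0.T - U0_hat @ U0_hat.T, 2)
lam_r = vals[-r]
H_norm = np.linalg.norm(H, 2)

# Theorem 2.3: dist <= ||H|| / (lambda_r(M) - ||H||), valid since ||H|| < lambda_r(M)
bound = H_norm / (lam_r - H_norm)
```

The empirical distance sits below the bound, as the theorem guarantees whenever ‖H‖ < λ_r(M).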
28 Back to the stochastic block model...
Let M = E[A] − ((p+q)/2) 11ᵀ = ((p−q)/2) xxᵀ, M̂ = A − ((p+q)/2) 11ᵀ, and u = (1/√n) x with x = [1, ..., 1, −1, ..., −1]ᵀ. Then the Davis-Kahan sin Θ theorem yields
dist(û, u) ≤ ‖M̂ − M‖ / (λ_1(M) − ‖M̂ − M‖) = ‖A − E[A]‖ / ((p−q)n/2 − ‖A − E[A]‖) (2.6)
Question: how to bound ‖A − E[A]‖
Spectral methods 2-28
29 A hammer: matrix Bernstein inequality
Consider a sequence of independent random matrices {X_l ∈ R^{d_1×d_2}}:
- E[X_l] = 0 and ‖X_l‖ ≤ B for each l
- variance statistic: v := max{‖Σ_l E[X_l X_lᵀ]‖, ‖Σ_l E[X_lᵀ X_l]‖}
Theorem 2.5 (Matrix Bernstein inequality). For all τ ≥ 0,
P{‖Σ_l X_l‖ ≥ τ} ≤ (d_1 + d_2) exp(−(τ²/2) / (v + Bτ/3))
Spectral methods 2-29
30 A hammer: matrix Bernstein inequality
P{‖Σ_l X_l‖ ≥ τ} ≤ (d_1 + d_2) exp(−(τ²/2) / (v + Bτ/3))
- moderate-deviation regime (τ is small): sub-Gaussian tail behavior exp(−τ²/(2v))
- large-deviation regime (τ is large): sub-exponential tail behavior exp(−3τ/(2B)) (slower decay)
user-friendly form (exercise): with prob. 1 − O((d_1 + d_2)^{−10}),
‖Σ_l X_l‖ ≲ √(v log(d_1 + d_2)) + B log(d_1 + d_2) (2.7)
Spectral methods 2-30
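To get a feel for (2.7), the experiment below sums m independent, zero-mean random sign matrices placed at uniformly random positions (an illustrative setup; the constants are arbitrary) and compares the empirical spectral norm with the user-friendly bound √(v log(d₁+d₂)) + B log(d₁+d₂):

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 100, 500   # square case: d1 = d2 = d

# X_l = eps_l * e_i e_j^T with uniform (i, j) and a random sign eps_l,
# so E[X_l] = 0, ||X_l|| = 1 = B, and sum_l E[X_l X_l^T] = (m/d) I
norms = []
for _ in range(20):
    S = np.zeros((d, d))
    for _ in range(m):
        i, j = rng.integers(d, size=2)
        S[i, j] += rng.choice([-1.0, 1.0])
    norms.append(np.linalg.norm(S, 2))

B = 1.0
v = m / d
bound = np.sqrt(v * np.log(2 * d)) + B * np.log(2 * d)   # form (2.7), constants dropped
```

Since (2.7) hides an absolute constant, the comparison is only up to a modest factor, but the empirical norms concentrate well below a small multiple of the bound.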
31 Bounding ‖A − E[A]‖
The matrix Bernstein inequality yields:
Lemma 2.6. Consider the SBM with p > q ≳ (log n)/n. Then with high prob.,
‖A − E[A]‖ ≲ √(np log n) (2.8)
Spectral methods 2-31
32 Statistical accuracy of spectral clustering
Substitute (2.8) into (2.6) to reach
dist(û, u) ≤ ‖A − E[A]‖ / ((p−q)n/2 − ‖A − E[A]‖) ≲ √(np log n) / ((p−q)n)
provided that (p−q)n ≳ √(np log n). Thus, under the condition p − q ≳ √(p log n / n), with high prob. one has
dist(û, u) ≪ 1 ⟹ nearly perfect clustering
Spectral methods 2-32
33 Statistical accuracy of spectral clustering
p − q ≳ √(p log n / n) ⟹ nearly perfect clustering
- dense regime: if p ≍ q ≍ 1, then this condition reads p − q ≳ √(log n / n)
- sparse regime: if p = a log n / n and q = b log n / n for a, b ≍ 1, then this condition reads a − b ≳ √a
This condition is information-theoretically optimal (up to log factor) (Mossel, Neeman, Sly '15; Abbe '18)
Spectral methods 2-33
34 Proof of Lemma 2.6 To simplify presentation, assume A i,j and A j,i are independent (check: why this assumption does not change our bounds) Spectral methods 2-34
35 Proof of Lemma 2.6
Write A − E[A] = Σ_{i,j} X_{i,j}, where X_{i,j} = (A_{i,j} − E[A_{i,j}]) e_i e_jᵀ
Since Var(A_{i,j}) ≤ p, one has E[X_{i,j} X_{i,j}ᵀ] ⪯ p e_i e_iᵀ, which gives
Σ_{i,j} E[X_{i,j} X_{i,j}ᵀ] ⪯ Σ_{i,j} p e_i e_iᵀ ⪯ np I
Similarly, Σ_{i,j} E[X_{i,j}ᵀ X_{i,j}] ⪯ np I. As a result,
v = max{‖Σ_{i,j} E[X_{i,j} X_{i,j}ᵀ]‖, ‖Σ_{i,j} E[X_{i,j}ᵀ X_{i,j}]‖} ≤ np
In addition, ‖X_{i,j}‖ ≤ 1 := B
Take the matrix Bernstein inequality to conclude that with high prob.,
‖A − E[A]‖ ≲ √(v log n) + B log n ≲ √(np log n) (since p ≳ (log n)/n)
Spectral methods 2-35
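The scaling in Lemma 2.6 can also be eyeballed empirically: across a few values of n, the ratio ‖A − E[A]‖ / √(np log n) stays bounded. A rough illustration with arbitrary p, q:

```python
import numpy as np

rng = np.random.default_rng(6)
p, q = 0.3, 0.1   # illustrative edge probabilities

ratios = []
for n in (100, 200, 400):
    x = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
    prob = np.where(np.outer(x, x) > 0, p, q)
    A = np.triu(rng.random((n, n)) < prob, 1).astype(float)
    A = A + A.T
    EA = prob.copy()
    np.fill_diagonal(EA, 0.0)            # the sampled graph has no self-loops
    noise = np.linalg.norm(A - EA, 2)    # ||A - E[A]||
    ratios.append(noise / np.sqrt(n * p * np.log(n)))
```

The ratios remain of order one as n grows, consistent with (2.8).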
36 Extension: singular subspaces
37 Singular value decomposition
Consider two matrices M, M̂ = M + H with SVDs
M = [U_0, U_1] [Σ_0, 0; 0, Σ_1; 0, 0] [V_0, V_1]ᵀ
M̂ = [Û_0, Û_1] [Σ̂_0, 0; 0, Σ̂_1; 0, 0] [V̂_0, V̂_1]ᵀ
where U_0 (resp. Û_0) and V_0 (resp. V̂_0) represent the top-r singular subspaces of M (resp. M̂)
Spectral methods 2-37
38 Wedin sin Θ Theorem
The Davis-Kahan theorem generalizes to singular subspace perturbation:
Theorem 2.7 (Wedin sin Θ Theorem). Suppose σ_r(M) (the rth singular value) ≥ a and σ_{r+1}(M̂) ≤ a − Δ for some Δ > 0. Then
max{dist(Û_0, U_0), dist(V̂_0, V_0)} ≤ max{‖H V_0‖, ‖Hᵀ U_0‖} / Δ (two-sided interactions) ≤ ‖H‖ / Δ
Spectral methods 2-38
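A quick numerical check of Theorem 2.7: a random low-rank M plus small noise, with both singular subspaces compared against the bound ‖H‖/Δ. All dimensions and scales below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, r = 60, 40, 2

# rank-r M with singular values 30 and 20, plus small noise H
U = np.linalg.qr(rng.standard_normal((n1, r)))[0]
V = np.linalg.qr(rng.standard_normal((n2, r)))[0]
M = 10.0 * U @ np.diag([3.0, 2.0]) @ V.T    # sigma_r(M) = 20, sigma_{r+1}(M) = 0
H = 0.5 * rng.standard_normal((n1, n2)) / np.sqrt(n2)

Uh, sh, Vht = np.linalg.svd(M + H)
U0_hat, V0_hat = Uh[:, :r], Vht[:r, :].T

dist_u = np.linalg.norm(U @ U.T - U0_hat @ U0_hat.T, 2)
dist_v = np.linalg.norm(V @ V.T - V0_hat @ V0_hat.T, 2)

# Wedin bound with a = sigma_r(M) and Delta = a - sigma_{r+1}(M + H)
Delta = 20.0 - sh[r]
bound = np.linalg.norm(H, 2) / Delta
```

Both subspace distances fall below the bound, since σ_{r+1}(M̂) ≤ ‖H‖ ≪ σ_r(M) here.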
39 Example: low-rank matrix completion
Netflix challenge: Netflix provides highly incomplete ratings from ≈ 0.5 million users for ≈ 17,770 movies
How to predict unseen user ratings for movies?
Spectral methods 2-39
40 Example: low-rank matrix completion In general, we cannot infer missing ratings this is an underdetermined system (more unknowns than observations) Spectral methods 2-40
41 Example: low-rank matrix completion... unless rating matrix has other structure A few factors explain most of the data Spectral methods 2-41
42 Example: low-rank matrix completion... unless rating matrix has other structure A few factors explain most of the data low-rank approximation How to exploit (approx.) low-rank structure in prediction Spectral methods 2-41
43 Model for low-rank matrix completion
(figure credit: Candès)
- consider a low-rank matrix M
- each entry M_{i,j} is observed independently with prob. p
- goal: fill in missing entries
Spectral methods 2-42
44 Spectral estimate for matrix completion
1. set M̂ ∈ R^{n×n} as
M̂_{i,j} = (1/p) M_{i,j} if M_{i,j} is observed, and 0 else
rationale for rescaling: E[M̂] = M
2. compute the rank-r SVD Û Σ̂ V̂ᵀ of M̂, and return (Û, Σ̂, V̂)
Spectral methods 2-43
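A sketch of the two-step estimator for a rank-1 example (the sampling rate p and size n are illustrative; with r = 1 the subspace distance reduces to |sin ∠(u, û)|):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 600, 0.7

# rank-1 ground truth M = u v^T with random unit vectors
u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(n); v /= np.linalg.norm(v)
M = np.outer(u, v)

# step 1: observe each entry independently with prob. p, rescale by 1/p
mask = rng.random((n, n)) < p
M_hat = np.where(mask, M / p, 0.0)   # unbiased: E[M_hat] = M

# step 2: rank-1 SVD of M_hat
U, s, Vt = np.linalg.svd(M_hat)
u_hat, v_hat = U[:, 0], Vt[0, :]

# subspace distances (|sin angle|, insensitive to the SVD sign ambiguity)
dist_u = np.sqrt(1.0 - (u @ u_hat) ** 2)
dist_v = np.sqrt(1.0 - (v @ v_hat) ** 2)
```

At this sampling rate both singular vectors are recovered to nontrivial accuracy, in line with (2.9).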
45 Statistical accuracy of the spectral estimate
Let's analyze a simple case where M = uvᵀ with
u = ũ/‖ũ‖_2, v = ṽ/‖ṽ‖_2, ũ, ṽ ~ N(0, I_n)
From Wedin's theorem: if p ≳ log³ n / n, then with high prob.
max{dist(û, u), dist(v̂, v)} ≤ ‖M̂ − M‖ / (σ_1(M) − ‖M̂ − M‖) ≍ ‖M̂ − M‖ (controlled by matrix Bernstein) ≪ 1 (nearly accurate estimates) (2.9)
Spectral methods 2-44
46 Sample complexity
For rank-1 matrix completion, p ≳ log³ n / n ⟹ nearly accurate estimates
Sample complexity needed for obtaining reliable spectral estimates is
n² p ≳ n log³ n (optimal up to log factor)
Spectral methods 2-45
47 Proof of (2.9)
First, write M̂ − M = Σ_{i,j} X_{i,j}, where X_{i,j} = (M̂_{i,j} − M_{i,j}) e_i e_jᵀ. Then
‖X_{i,j}‖ ≤ (1/p) max_{i,j} |M_{i,j}| ≲ (log n)/(pn) := B (check)
Next, E[X_{i,j} X_{i,j}ᵀ] = Var(M̂_{i,j}) e_i e_iᵀ, and hence
‖Σ_{i,j} E[X_{i,j} X_{i,j}ᵀ]‖ = ‖Σ_i (Σ_j Var(M̂_{i,j})) e_i e_iᵀ‖ ≤ (n/p) max_{i,j} M_{i,j}² ≲ (log² n)/(np) (check)
Similar bounds hold for ‖Σ_{i,j} E[X_{i,j}ᵀ X_{i,j}]‖. Therefore,
v := max{‖Σ_{i,j} E[X_{i,j} X_{i,j}ᵀ]‖, ‖Σ_{i,j} E[X_{i,j}ᵀ X_{i,j}]‖} ≲ (log² n)/(np)
Take the matrix Bernstein inequality to yield: if p ≳ log³ n / n, then
‖M̂ − M‖ ≲ √(v log n) + B log n ≪ 1
Spectral methods 2-46
48 Extension: eigen-space for asymmetric transition matrices
49 Eigen-decomposition for asymmetric matrices
Eigen-decomposition for asymmetric matrices is much trickier; for example:
1. both eigenvalues and eigenvectors might be complex-valued
2. eigenvectors might not be orthogonal to each other
This lecture focuses on a special case: probability transition matrices
Spectral methods 2-48
50 Probability transition matrices
Consider a Markov chain {X_t}_{t≥0} over n states:
- transition probability P{X_{t+1} = j | X_t = i} = P_{i,j}; transition matrix P = [P_{i,j}]_{1≤i,j≤n}
- stationary distribution π = [π_1, ..., π_n] (with π_1 + ... + π_n = 1) is the 1st left eigenvector of P: πP = π
- {X_t}_{t≥0} is said to be reversible if π_i P_{i,j} = π_j P_{j,i} for all i, j
Spectral methods 2-49
51 Eigenvector perturbation for transition matrices
Define ‖a‖_π := √(Σ_{i=1}^n π_i a_i²)
Theorem 2.8 (Chen, Fan, Ma, Wang '17). Suppose P, P̂ are transition matrices with stationary distributions π, π̂, respectively. Assume P induces a reversible Markov chain. If 1 − max{λ_2(P), −λ_n(P)} > ‖P̂ − P‖_π, then
‖π̂ − π‖_π ≤ ‖π(P̂ − P)‖_π / (1 − max{λ_2(P), −λ_n(P)} (spectral gap) − ‖P̂ − P‖_π (perturbation))
- P̂ does not need to induce a reversible Markov chain
Spectral methods 2-50
52 Example: ranking from pairwise comparisons pairwise comparisons for ranking tennis players figure credit: Bozóki, Csató, Temesi Spectral methods 2-51
53 Bradley-Terry-Luce (logistic) model
n items to be ranked; w_i: preference score of item i
- assign a latent score {w_i}_{1≤i≤n} to each item, so that item i ≻ item j if w_i > w_j
- each pair of items (i, j) is compared independently, with
P{item j beats item i} = w_j / (w_i + w_j)
Spectral methods 2-52
54 Bradley-Terry-Luce (logistic) model
n items to be ranked; w_i: preference score of item i
- assign a latent score {w_i}_{1≤i≤n} to each item, so that item i ≻ item j if w_i > w_j
- each pair of items (i, j) is compared independently, with outcome
y_{i,j} = 1 with prob. w_j/(w_i + w_j), and y_{i,j} = 0 else
Spectral methods 2-52
55 Spectral ranking method
construct a probability transition matrix P̂ obeying
P̂_{i,j} = (1/(2n)) y_{i,j} if i ≠ j, and P̂_{i,i} = 1 − Σ_{l: l≠i} P̂_{i,l}
return the score estimate as the leading left eigenvector π̂ of P̂
- closely related to PageRank!
Spectral methods 2-53
56 Rationale behind the spectral method
E[P̂_{i,j}] = (1/(2n)) · w_j/(w_i + w_j) for i ≠ j
P := E[P̂] obeys w_i P_{i,j} = w_j P_{j,i} (detailed balance)
Thus, the stationary distribution π of P obeys π = (1/Σ_l w_l) w (reveals the true scores)
Spectral methods 2-54
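The construction above can be simulated end to end. This is a minimal sketch: the scores w are drawn arbitrarily with bounded ratio, and each ordered pair's outcome y_{i,j} is sampled independently (the same simplifying independence assumption used for Lemma 2.6):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
w = rng.uniform(1.0, 2.0, size=n)    # latent scores, max w_i / w_j bounded

# BTL comparison outcomes: y[i, j] = 1 iff item j beats item i
Pwin = w[None, :] / (w[:, None] + w[None, :])
y = (rng.random((n, n)) < Pwin).astype(float)

# transition matrix: off-diagonal y/(2n), diagonal completes each row to 1
P_hat = y / (2 * n)
np.fill_diagonal(P_hat, 0.0)
np.fill_diagonal(P_hat, 1.0 - P_hat.sum(axis=1))

# leading left eigenvector = stationary distribution of P_hat
vals, vecs = np.linalg.eig(P_hat.T)
pi_hat = np.real(vecs[:, np.argmax(np.real(vals))])
pi_hat = pi_hat / pi_hat.sum()       # normalize (also fixes the eigenvector sign)

pi = w / w.sum()                     # ground truth, by detailed balance
rel_err = np.linalg.norm(pi_hat - pi) / np.linalg.norm(pi)
```

Even with a single comparison per pair, the stationary distribution of P̂ tracks the true score vector up to a small relative error.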
57 Statistical guarantees for spectral ranking
Theorem (Negahban, Oh, Shah '16; Chen, Fan, Ma, Wang '17). Suppose max_{i,j} w_i/w_j ≲ 1. Then with high prob.,
‖π̂ − π‖_2 / ‖π‖_2 ≍ ‖π̂ − π‖_π / ‖π‖_π ≲ √(1/n) → 0 (nearly perfect estimate)
- consequence of Theorem 2.8 and the matrix Bernstein inequality (exercise)
Spectral methods 2-55
58 Reference
[1] C. Davis, W. Kahan, "The rotation of eigenvectors by a perturbation," SIAM Journal on Numerical Analysis, 1970.
[2] P. Wedin, "Perturbation bounds in connection with singular value decomposition," BIT Numerical Mathematics, 1972.
[3] A. Montanari, "Inference, estimation, and information processing," EE 378B lecture notes, Stanford University.
[4] D. Hsu, COMS 4772 lecture notes, Columbia University.
[5] M. Wainwright, "High-dimensional statistics: a non-asymptotic viewpoint," Cambridge University Press, 2019.
[6] J. Fan, Q. Sun, W. Zhou, Z. Zhu, "Principal component analysis for big data," arXiv preprint.
Spectral methods 2-56
59 Reference
[7] E. Abbe, "Community detection and stochastic block models," Foundations and Trends in Communications and Information Theory, 2018.
[8] E. Mossel, J. Neeman, A. Sly, "Consistency thresholds for the planted bisection model," ACM Symposium on Theory of Computing, 2015.
[9] R. Keshavan, A. Montanari, S. Oh, "Matrix completion from a few entries," IEEE Transactions on Information Theory, 2010.
[10] L. Page, S. Brin, R. Motwani, T. Winograd, "The PageRank citation ranking: bringing order to the web," 1999.
[11] S. Negahban, S. Oh, D. Shah, "Rank centrality: ranking from pairwise comparisons," Operations Research.
[12] Y. Chen, J. Fan, C. Ma, K. Wang, "Spectral method and regularized MLE are both optimal for top-K ranking," accepted to Annals of Statistics.
Spectral methods 2-57
More informationDimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining
Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can
More informationConditions for Robust Principal Component Analysis
Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationAnalysis of Robust PCA via Local Incoherence
Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu
More informationDonald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016
Optimization for Tensor Models Donald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016 1 Tensors Matrix Tensor: higher-order matrix
More informationLab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018
Lab 8: Measuring Graph Centrality - PageRank Monday, November 5 CompSci 531, Fall 2018 Outline Measuring Graph Centrality: Motivation Random Walks, Markov Chains, and Stationarity Distributions Google
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationAlgebraic Representation of Networks
Algebraic Representation of Networks 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 Hiroki Sayama sayama@binghamton.edu Describing networks with matrices (1) Adjacency matrix A matrix with rows and columns labeled by
More informationSpectra of Adjacency and Laplacian Matrices
Spectra of Adjacency and Laplacian Matrices Definition: University of Alicante (Spain) Matrix Computing (subject 3168 Degree in Maths) 30 hours (theory)) + 15 hours (practical assignment) Contents 1. Spectra
More informationLink Analysis Ranking
Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer
More informationUsing SVD to Recommend Movies
Michael Percy University of California, Santa Cruz Last update: December 12, 2009 Last update: December 12, 2009 1 / Outline 1 Introduction 2 Singular Value Decomposition 3 Experiments 4 Conclusion Last
More informationLecture 9: Low Rank Approximation
CSE 521: Design and Analysis of Algorithms I Fall 2018 Lecture 9: Low Rank Approximation Lecturer: Shayan Oveis Gharan February 8th Scribe: Jun Qi Disclaimer: These notes have not been subjected to the
More information6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities
6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 1 Outline Outline Dynamical systems. Linear and Non-linear. Convergence. Linear algebra and Lyapunov functions. Markov
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationSpectral Clustering. Spectral Clustering? Two Moons Data. Spectral Clustering Algorithm: Bipartioning. Spectral methods
Spectral Clustering Seungjin Choi Department of Computer Science POSTECH, Korea seungjin@postech.ac.kr 1 Spectral methods Spectral Clustering? Methods using eigenvectors of some matrices Involve eigen-decomposition
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University 2 Often, our data can be represented by an m-by-n matrix. And this matrix can be closely approximated by the product of two matrices that share a small common dimension
More informationDissertation Defense
Clustering Algorithms for Random and Pseudo-random Structures Dissertation Defense Pradipta Mitra 1 1 Department of Computer Science Yale University April 23, 2008 Mitra (Yale University) Dissertation
More informationhttps://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 12 Luca Trevisan October 3, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analysis Handout 1 Luca Trevisan October 3, 017 Scribed by Maxim Rabinovich Lecture 1 In which we begin to prove that the SDP relaxation exactly recovers communities
More informationA hybrid reordered Arnoldi method to accelerate PageRank computations
A hybrid reordered Arnoldi method to accelerate PageRank computations Danielle Parker Final Presentation Background Modeling the Web The Web The Graph (A) Ranks of Web pages v = v 1... Dominant Eigenvector
More informationMAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds
MAE 298, Lecture 8 Feb 4, 2008 Web search and decentralized search on small-worlds Search for information Assume some resource of interest is stored at the vertices of a network: Web pages Files in a file-sharing
More informationSparse PCA in High Dimensions
Sparse PCA in High Dimensions Jing Lei, Department of Statistics, Carnegie Mellon Workshop on Big Data and Differential Privacy Simons Institute, Dec, 2013 (Based on joint work with V. Q. Vu, J. Cho, and
More informationUnsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent
Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:
More informationAnnounce Statistical Motivation Properties Spectral Theorem Other ODE Theory Spectral Embedding. Eigenproblems I
Eigenproblems I CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Eigenproblems I 1 / 33 Announcements Homework 1: Due tonight
More informationIntroduction to Machine Learning. Introduction to ML - TAU 2016/7 1
Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger
More informationStatistical and Computational Phase Transitions in Planted Models
Statistical and Computational Phase Transitions in Planted Models Jiaming Xu Joint work with Yudong Chen (UC Berkeley) Acknowledgement: Prof. Bruce Hajek November 4, 203 Cluster/Community structure in
More informationMTH 2032 SemesterII
MTH 202 SemesterII 2010-11 Linear Algebra Worked Examples Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education December 28, 2011 ii Contents Table of Contents
More informationOnline Social Networks and Media. Link Analysis and Web Search
Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality
More informationDATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS
DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationCertifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering
Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint
More informationStochastic optimization Markov Chain Monte Carlo
Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing
More information