Dissertation Defense
1 Clustering Algorithms for Random and Pseudo-random Structures. Dissertation Defense. Pradipta Mitra, Department of Computer Science, Yale University. April 23, 2008.
2 Committee: Ravi Kannan (Advisor), Dana Angluin, Dan Spielman, Mike Mahoney (Yahoo!).
3 Outline. 1. Introduction: clustering and spectral algorithms. 2. Four results: (a) clustering using bi-partitioning, (b) clustering in sparse graphs, (c) a robust clustering algorithm, (d) an entrywise notion of spectral stability. 3. Future work.
4 Clustering. What is clustering? Given a set of objects S, partition it into disjoint sets or clusters S_1, ..., S_k. The partitioning is done according to some notion of closeness, i.e. objects in a cluster S_r are close to each other, and far from objects in other clusters. Issues: What is the right definition of closeness? Algorithms to find clusters given the right definition.
5 Clustering: Examples. Figure: from the Yale face dataset.
6 Clustering: Matrices. Term-document matrices: a table whose rows are documents (CS Doc 1-3, Medicine Doc 1-3) and whose columns are terms M, V, C, A, H, K, F, P, where M = Microprocessor, V = Virtual Memory, C = L2 Cache, A = Algorithm, H = Hemoglobin, K = Kidney, F = Fracture, P = Painkiller. Moral: clustering problems can be modelled as object-feature matrices, and objects can be seen as vectors in a high dimensional space.
7 Mixture models. Each cluster is defined by a simple (high-dimensional) probability distribution; objects are samples from these distributions. Hope: we can successfully cluster if the centers (means) are far apart. How large does the separation ‖µ_1 − µ_2‖ need to be? Figure: two circles whose centers are separated.
8 Random graphs. A G_{n,p} random graph is generated by selecting each possible edge with independent probability p. Example: G_{5,0.5}, with its expected adjacency matrix E[A] and one sample A.
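The following is a small illustrative sketch (not from the slides) of sampling a G_{n,p} adjacency matrix with numpy; the function name sample_gnp and the use of numpy are my own choices.

```python
import numpy as np

def sample_gnp(n, p, rng=None):
    # Each of the n*(n-1)/2 possible edges is included independently with
    # probability p; the adjacency matrix is symmetric with zero diagonal.
    rng = np.random.default_rng() if rng is None else rng
    coins = rng.random((n, n)) < p
    A = np.triu(coins, k=1).astype(int)        # keep the strict upper triangle
    return A + A.T

A = sample_gnp(5, 0.5)                         # one draw of G_{5,0.5}
EA = 0.5 * (np.ones((5, 5)) - np.eye(5))       # E[A]: every off-diagonal entry is p
```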
9 Planted partition model for graphs. There are n vertices in total, divided into k clusters T_1, T_2, ..., T_k of sizes n_1, ..., n_k. There are k(k+1)/2 probabilities P_rs (= P_sr) such that if v ∈ T_r and u ∈ T_s, the edge e(u, v) is present with probability P_rs.
10 Planted partition model for graphs. Example: P = E[A], a 6x6 matrix whose first three rows equal µ_1 and last three rows equal µ_2, together with one sample A, where µ_1 = (0.5, 0.5, 0.5, 0.1, 0.1, 0.1) and µ_2 = (0.1, 0.1, 0.1, 0.5, 0.5, 0.5).
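As an illustration of the planted partition model (not part of the original slides), here is a sketch that samples an adjacency matrix from given cluster sizes and a probability matrix P; the 6-vertex example matches the two centers µ_1 and µ_2 on the slide, and all names are illustrative.

```python
import numpy as np

def sample_planted_partition(sizes, P, rng=None):
    # Vertices are split into clusters T_1, ..., T_k of the given sizes; an edge
    # between u in T_r and v in T_s appears independently with probability P[r, s].
    # Returns one sample A, its expectation E[A], and the cluster label of each vertex.
    rng = np.random.default_rng() if rng is None else rng
    labels = np.repeat(np.arange(len(sizes)), sizes)
    EA = P[np.ix_(labels, labels)]                 # rows of E[A] are the centers mu_r
    A = np.triu(rng.random(EA.shape) < EA, k=1)
    return (A + A.T).astype(int), EA, labels

P = np.array([[0.5, 0.1],
              [0.1, 0.5]])
A, EA, labels = sample_planted_partition([3, 3], P)
# EA[0] = (0.5, 0.5, 0.5, 0.1, 0.1, 0.1) = mu_1
# EA[3] = (0.1, 0.1, 0.1, 0.5, 0.5, 0.5) = mu_2
```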
11 Algorithmic Issues. Heuristic analysis: analyze an algorithm known to work in practice. Spectral algorithms: use information about the spectrum (eigenvalues, eigenvectors, singular vectors etc.) of the data matrix to do the clustering. Quite popular, and seems to work in practice. Singular values and vectors can be computed efficiently. For a matrix A, the span of the top k singular vectors gives A_k, the rank-k matrix such that ‖A − A_k‖ ≤ ‖A − M‖ for all rank-k matrices M.
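A minimal sketch (my own, not from the slides) of computing the best rank-k approximation A_k from the top k singular vectors with numpy.

```python
import numpy as np

def rank_k_approx(A, k):
    # A_k keeps the top k singular values/vectors; among all rank-k matrices M
    # it minimizes the spectral norm ||A - M||.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Example: for a matrix sampled from a model with 2 clusters, the rows of
# rank_k_approx(A, 2) are typically much closer to the centers than the rows of A.
```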
12 Why might this work? Intuition: it eliminates noise, avoids the curse of dimensionality, and connects to Cheeger's inequality (relation to sparsest cut). Convention: eigen/singular values are often sorted from largest to smallest (in absolute value), |λ_1| ≥ |λ_2| ≥ |λ_3| ≥ ... The eigen/singular vector corresponding to λ_i is the i-th eigen/singular vector.
13 Why might this work? Example: an adjacency matrix A with two clusters (shown on the slide). Quick definition: if A is square and symmetric, v is an eigenvector of A if ‖v‖ = 1 and Av = λv for some λ (an eigenvalue).
14 Why might this work? For 1 = {1, ..., 1}^T we have A1 = 4·1, so 1 is the first eigenvector. For v = {1, 1, 1, 1, −1, −1, −1, −1}^T we have Av = 2v; this is the second eigenvector, and it reveals the clusters.
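A quick numerical check (not from the slides) of this picture. The slide's 8x8 matrix is not reproduced in the transcription, so the block matrix below is only a stand-in with the same numbers: two clusters of size 4, top eigenvalue 4 with the all-ones eigenvector, and second eigenvalue 2 with an eigenvector that is constant on each cluster with opposite signs.

```python
import numpy as np

g = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
A = 0.5 * np.ones((8, 8)) + 0.25 * np.outer(g, g)   # 0.75 within a cluster, 0.25 across

vals, vecs = np.linalg.eigh(A)       # eigenvalues in increasing order
print(vals[-1], vecs[:, -1])         # 4.0, proportional to the all-ones vector
print(vals[-2], vecs[:, -2])         # 2.0, the signs of the entries reveal the two clusters
```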
15 Previous Work. There is a lot of work: [B 87], [DF 89], [AK 97], [AKS 98], [VW 2002]... McSherry 2001: an instance of the planted partition model A with k clusters can be clustered with probability 1 − o(1) if the following separation condition holds (centers are far apart): for all r ≠ s, ‖µ_r − µ_s‖² ≥ cσ² log n, where σ² = max_rs P_rs and n is the number of vertices. Assumption: σ² ≥ log^6 n / n, i.e. at least polylogarithmic degree. Spectral method: take the best rank-k approximation A_k and run a greedy procedure on that matrix; this gives an approximate clustering. Clean-up: use combinatorial projections, i.e. counting edges to the approximate partitions.
16 Our Contribution. Clustering by recursive bi-partitioning: use the second singular vector to bi-partition the data, then repeat. Pseudo-random models of clustering: used to model clustering problems for sparse (constant-degree) graphs. Rotationally invariant algorithms: remove combinatorial/ad-hoc techniques for discrete distributions. Entrywise bounds for eigenvectors: a different notion of spectral stability for random graphs.
17 Spectral Clustering by Recursive Bi-partitioning
18 Spectral Clustering by Recursive Bi-partitioning. Joint work with Dasgupta, Hopcroft and Kannan (ESA 2006). Goal: instead of a rank-k approximation based method, use an incremental algorithm that bi-partitions the data at each step. Result: clustering is possible if, for all r ≠ s, ‖µ_r − µ_s‖² ≥ c(σ_r + σ_s)² log n, where σ_r² = max_s P_rs ≥ log^6 n / n.
19 Basic Step. Given A, find the unit vector v_1 that maximizes ‖AJv_1‖, where J = I − (1/n)11^T. Sort the entries of v_1: v_1(1) ≥ v_1(2) ≥ ... ≥ v_1(n). Find i such that v_1(i) − v_1(i+1) is largest. Return {1, ..., i} and {i+1, ..., n} as the bi-partition. Definition refresher: v_1 is the first right singular vector of AJ, and is close to the second right singular vector of A.
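A sketch of this Basic Step (my own reading, not code from the slides) using numpy; basic_bipartition is an illustrative name.

```python
import numpy as np

def basic_bipartition(A):
    # Multiply by J = I - (1/n) 11^T on the right (subtract each row's mean),
    # take the first right singular vector v1 of AJ, sort its entries, and cut
    # the columns at the largest gap between consecutive sorted entries.
    AJ = A - A.mean(axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(AJ, full_matrices=False)
    v1 = Vt[0]
    order = np.argsort(-v1)                      # entries in decreasing order
    gaps = v1[order][:-1] - v1[order][1:]
    i = int(np.argmax(gaps))                     # position of the largest gap
    return order[:i + 1], order[i + 1:]          # the two sides of the bi-partition
```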
20 Main algorithm. Given A, randomly partition the rows into t = 4 log n parts B_i (i = 1 to t) of equal size. Bi-partition the (same) columns t times using the Basic Step (last slide). Combine these (approximate) bi-partitions to find an accurate bi-partition.
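Putting the pieces together, here is a sketch (again my own, not from the slides) of this main loop; it assumes a basic_bipartition like the sketch above and a combine_bipartitions like the one sketched further below, and eps is an assumed misclassification rate.

```python
import numpy as np

def bipartition_columns(A, eps=0.05, rng=None):
    # Randomly split the rows into t = 4 log n equal parts, bi-partition the
    # (same) columns once per part with the Basic Step, and combine the t noisy
    # bi-partitions into one accurate bi-partition.
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    t = max(1, int(4 * np.log(n)))
    parts = np.array_split(rng.permutation(m), t)
    sides = np.zeros((t, n), dtype=int)
    for b, part in enumerate(parts):
        left, _ = basic_bipartition(A[part])     # one noisy bi-partition of the columns
        sides[b, left] = 1
    return combine_bipartitions(sides, eps)      # connected components of the agreement graph
```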
21-22 Analysis. Let us focus on one B_i; call it B, and let B̄ = E(B). Claim: v_1(BJ) is almost structured. Let v_1 = v_1(BJ). Then v_1 = Σ_r α_r g^(r) + v⊥, where g^(r) is the characteristic vector of T_r, v⊥ is orthogonal to each g^(r), and ‖v⊥‖ ≤ 1/c_2. Füredi-Komlós 81: if σ² ≥ log^6 n / n, then ‖B − B̄‖ ≤ 3σ√n. Proof sketch: write v_1 = v + v⊥ with v = Σ_r α_r g^(r). Then ‖BJ‖ = ‖BJv_1‖ ≤ ‖BJv‖ + ‖BJv⊥‖ ≤ ‖B̄J‖‖v‖ + ‖B̄Jv⊥‖ + ‖(B − B̄)Jv⊥‖ ≤ ‖B̄J‖√(1 − ‖v⊥‖²) + ‖B − B̄‖‖v⊥‖, since B̄Jv⊥ = 0. Using √(1 − x) ≤ 1 − x/2 and ‖BJ‖ ≥ ‖B̄J‖ − ‖B − B̄‖, this gives ‖v⊥‖² ≤ 4‖B − B̄‖ / ‖B̄J‖ ≤ 1/c_2².
23 Analysis. v_1 = v + v⊥, v = Σ_r α_r g^(r). Claim: when sorted, there is an Ω(1) gap in the α's. v is orthogonal to 1; this implies that the α_r cannot all have the same sign. On the other hand, 1 = ‖v_1‖² = ‖v‖² + ‖v⊥‖² ≤ Σ_r α_r² + 1/c_2², so Σ_r α_r² ≥ 1/2. (Figure: what v looks like.) Combining these proves the existence of an Ω(1) gap.
24 Analysis. v_1 = v + v⊥, v = Σ_r α_r g^(r). Claim: no more than n_min/c_3 vertices cross the gap (where n_min = min_r n_r). This is implied by the fact that ‖v⊥‖ is small: an Ω(1) gap in the α's gives a gap of at least 1/(4√n_min) in the entries of v. Suppose m vertices cross the gap; then m/(16 n_min) ≤ ‖v⊥‖² ≤ 1/c_2², so m ≤ 16 n_min / c_2². (Figure: what v_1 looks like.)
25-29 Combining the 4 log n bi-partitions. We showed that no more than n_min/c_3 vertices cross the gap; equivalently, a vertex has probability ε = 1/c_3 of being misclassified. Given the 4 log n bi-partitions, construct a graph on the vertices as follows: for each u, v ∈ [n], set e(u, v) = 1 if vertices u and v are on the same side of the bi-partition in at least a 1 − 2ε fraction of cases. Find the connected components of this graph and return them as a (bi-)partition. Need to show: (1) Clean clusters: no two vertices from the same cluster can be put in different components. Let u, v ∈ T_r. Vertex v is on the right side of the bi-partition in at least a 1 − ε fraction of cases, and the same is true for u; so u and v are on the same side in at least a 1 − 2ε fraction of cases. (2) Nontrivial partitions: we find at least two components, by a counting argument.
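A sketch (not from the slides) of this combining step; it assumes scipy is available for the connected-components computation, and all names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def combine_bipartitions(sides, eps):
    # sides is a t x n 0/1 array: sides[b, v] is the side of vertex v in the b-th
    # bi-partition.  Connect u and v if they land on the same side in at least a
    # (1 - 2*eps) fraction of the t bi-partitions, then return the connected
    # components of this agreement graph as the final (bi-)partition.
    t, n = sides.shape
    agree = (sides[:, :, None] == sides[:, None, :]).mean(axis=0)
    E = agree >= 1 - 2 * eps
    np.fill_diagonal(E, False)
    _, labels = connected_components(csr_matrix(E), directed=False)
    return labels
```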
30 Pseudo-randomness and Clustering
31 Sparse graphs? Goal: design a model that allows constant-degree graphs. Problem: the standard condition is ‖µ_r − µ_s‖² ≥ cσ² log n, and a planted partition model with σ² = Θ(d/n) for constant d will have vertices with logarithmic degree. Our result: we introduce a model in which clustering is possible if, for constant α, ‖µ_r − µ_s‖² ≥ c (α²/n) log² α.
32 Solution: use pseudo-randomness. A graph G(V, E) is (p, α) pseudo-random if for all A, B ⊆ V, |e(A, B) − p|A||B|| ≤ α√(|A||B|). Theorem: a G_{n,p} random graph is (p, 2√(np)) pseudo-random (for p ≥ log^6 n / n). Proof: E(e(A, B)) = p|A||B|. Using a Chernoff bound, P(|e(A, B) − E(e(A, B))| > 2√(np)·√(|A||B|)) ≤ exp(−2n). But there are only 2^n · 2^n = 2^(2n) pairs of sets A, B, so a union bound gives a failure probability of at most 2^(2n) exp(−2n) = o(1). The claim follows. Intuition: pseudo-random graphs are deterministic versions of random graphs.
33 The model. A graph G with k clusters T_r, r ∈ [k]. For some α and for each r, s ∈ [k] there is p_rs such that G(T_r, T_s) is (p_rs, α) pseudo-random. Also, |e(x, T_s) − p_rs|T_s|| ≤ 2α if x ∈ T_r. Algorithmic issue: Füredi-Komlós doesn't apply, and there is no independence! (Figure: Ā and A.)
34 Rotationally Invariant Algorithm for Discrete Distributions
35 Discrete vs. Continuous. Similar results can be proved for discrete and continuous models: ‖µ_r − µ_s‖² ≥ Ω(σ² log n). The algorithms (1) share the spectral part that gives an approximation, but (2) differ in the clean-up phase; continuous models seem to have more natural algorithms. Mixture of Gaussians: k high-dimensional Gaussians with centers µ_r, r = 1 to k. The pdf of the r-th cluster/Gaussian is f_r(x) ∝ exp(−(1/2)(x − µ_r)^T Σ_r^{-1} (x − µ_r)), where Σ_r is the covariance matrix.
36-40 Discrete vs. Continuous. We would like an algorithm that (1) has a simple, natural clean-up phase, (2) is rotationally invariant, and (3) is easily extensible to more complex models. Simplicity: a one-shot distance-based or projection-based algorithm, instead of combinatorial, incremental or sampling techniques. Natural assumption: if the vectors are rotated, the clustering remains the same. Extension: simpler algorithms are easier to adapt, e.g. to models without complete independence or without block structuring. McSherry 2001 conjecture: such an algorithm exists. Our result: the conjecture is true. Theorem: consider a matrix generated from a discrete mixture model with k clusters, m objects and n features. Clustering is possible if ‖µ_r − µ_s‖² ≥ cσ²(1 + n/m) log m.
41 Our algorithm. Cluster(A, k): divide A into A_1 and A_2; compute {µ̃_r} = Centers(A_1, k); run Project(A_2, µ̃_1, ..., µ̃_k). Project(A_2, µ̃_1, ..., µ̃_k): group each v ∈ A_2 with the µ̃_r that minimizes ‖v − µ̃_r‖. Centers(A_1, k): uses a spectral algorithm to find approximate clusters P_r, r ∈ [k], and returns the empirical centers µ̃_r = (1/|P_r|) Σ_{v ∈ P_r} v. (Here µ̃_r denotes the empirical center and µ_r the true center.)
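A rough sketch of this outline (not the slides' actual implementation). The slides' Centers step is a spectral algorithm; here k-means on the rank-k projection is used only as an illustrative stand-in, and the scikit-learn dependency and all names are my own choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def centers(A1, k):
    # Approximate clusters of the rows of A1 (stand-in for the spectral Centers
    # step), then return the empirical center of each cluster.
    U, s, Vt = np.linalg.svd(A1, full_matrices=False)
    A1_k = (U[:, :k] * s[:k]) @ Vt[:k, :]                 # best rank-k approximation
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(A1_k)
    return np.vstack([A1[labels == r].mean(axis=0) for r in range(k)])

def cluster(A, k, rng=None):
    # Split the rows into A1 and A2, estimate centers from A1, then assign each
    # row of A2 to the nearest estimated center (the Project step).
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(A.shape[0])
    half = A.shape[0] // 2
    A1, A2 = A[perm[:half]], A[perm[half:]]
    mu = centers(A1, k)
    dists = ((A2[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return perm[half:], dists.argmin(axis=1)              # rows of A2 and their clusters
```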
42 Analysis. Lemma: ‖µ̃_r − µ_r‖² ≤ c_1 σ²(1 + n/m) ≤ (1/20) ‖µ_r − µ_s‖² for all s ≠ r. (1) Proof idea: the spectral method returns an approximately correct partition. Let P_r* be the correctly classified part of P_r, with p_r = |P_r| and p_r* = |P_r*|, and let Q_rs be the set of vectors that should be in P_s but were placed in P_r, with q_rs = |Q_rs|. Then µ̃_r = (1/p_r) Σ_{v ∈ P_r} v, so p_r µ̃_r = Σ_{v ∈ P_r} v = Σ_{v ∈ P_r*} v + Σ_s Σ_{v ∈ Q_rs} v, and hence p_r(µ̃_r − µ_r) = Σ_{v ∈ P_r*}(v − µ_r) + Σ_s Σ_{v ∈ Q_rs}(v − µ_r).
43 Analysis. p_r(µ̃_r − µ_r) = Σ_{v ∈ P_r*}(v − µ_r) + Σ_s Σ_{v ∈ Q_rs}(v − µ_r). We need to bound ‖Σ_{v ∈ P_r*}(v − µ_r)‖ and, for all s, ‖Σ_{v ∈ Q_rs}(v − µ_r)‖ ≤ ‖Σ_{v ∈ Q_rs}(v − µ_s)‖ + ‖q_rs µ_s − q_rs µ_r‖, where ‖q_rs µ_s − q_rs µ_r‖ = q_rs ‖µ_s − µ_r‖. It turns out that q_rs decreases as ‖µ_s − µ_r‖ increases, so the two factors cancel each other out. The bound on ‖Σ_{v ∈ P_r*}(v − µ_r)‖ follows from an argument based on a spectral norm bound (a la Füredi-Komlós).
44-47 Analysis. Lemma: for each sample u, if u ∈ T_r, then for all s ≠ r, |(u − µ̃_r)·(µ̃_r − µ̃_s)| ≤ (2/5)‖µ_r − µ_s‖². Assume µ̃_r = µ_r + δ_r for all r. Then (u − µ̃_r)·(µ̃_s − µ̃_r) = (u − µ_r − δ_r)·(µ_s − µ_r − δ_r + δ_s) = (u − µ_r)·(µ_s − µ_r) − δ_r·(µ_s − µ_r) − δ_r·(δ_s − δ_r) + (u − µ_r)·(δ_s − δ_r). The term (u − µ_r)·(µ_s − µ_r) is small by the separation assumption. The term δ_r·(µ_s − µ_r) satisfies |δ_r·(µ_s − µ_r)| ≤ ‖δ_r‖ ‖µ_s − µ_r‖ by Cauchy-Schwarz, which is small since ‖δ_r‖ is small. The term δ_r·(δ_s − δ_r) is similarly small. Main challenge: bounding (u − µ_r)·(δ_s − δ_r).
48 Completing the proof. Claim: |(u − µ_r)·(δ_s − δ_r)| < c_3 σ²(1 + n/m) log m. Proof idea: (u − µ_r)·δ_r = Σ_{i ∈ [n]} (u(i) − µ_r(i)) δ_r(i) = Σ_{i ∈ [n]} x(i). This is a sum of zero-mean random variables x(i), with E(x(i)²) ≤ 2 δ_r(i)² σ², so Σ_i E(x(i)²) ≤ 2σ² ‖δ_r‖² ≤ c_3 k σ⁴(1 + n/m), and |x(i)| ≤ |δ_r(i)| ≤ 2c_4 σ², because the number of 1's in a column can be at most 1.1 m σ².
49 Completing the proof. So we have a sum of absolutely bounded, zero-mean, bounded-variance random variables, and we can apply Bernstein's inequality: let {X_i}_{i=1}^n be a collection of independent random variables with Pr{|X_i| ≤ M} = 1 for all i. Then for any ε ≥ 0, Pr{ |Σ_{i=1}^n (X_i − E[X_i])| ≥ ε } ≤ exp( −ε² / (2(θ² + Mε/3)) ), where θ² = Σ_i E[X_i²]. Plugging in our values, Pr{ |Σ_{i ∈ [n]} x(i)| ≥ c_3 σ²((1 + n/m) + log m) } ≤ 1/m³.
50 Entrywise Bounds for Eigenvectors of Random Graphs
51 Well studied: ℓ_2 norm bounds. We already saw that if A is the adjacency matrix of a G_{n,p} graph, then ‖A − E(A)‖ ≤ 3√(np); there is a lot of research on similar bounds. Here v = v_1(E(A)) = (1/√n)·1. Question: what is u = v_1(A)? Goal: study ‖u − v‖_∞ = max_{i ∈ [n]} |u(i) − v(i)|, a potentially useful notion of spectral stability.
52 Can ℓ_2 give ℓ_∞? Not directly! The spectral norm bound on ‖A − E(A)‖ can be converted to a bound on ‖u − v‖. The best bound we can get is ‖u − v‖_∞ ≤ ‖u − v‖ ≤ 3/√(np). Too weak: 1/√(np) is much larger than 1/√n.
53 Eigenvector of a Random Graph. Figure: G_{400, 0.2}.
54 Our result. Let A be the adjacency matrix of a G_{n,p} graph, and u = v_1(A). Then with probability 1 − o(1), for all i, u(i) = (1/√n)(1 ± ε), where ε = c_2 (log n / log np) √(log n / np), p ≥ log^6 n / n, and c_2 is a constant. Essentially optimal.
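An empirical check of this statement (my own illustration, not from the slides); n = 400 and p = 0.2 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 0.2
coins = rng.random((n, n)) < p
A = np.triu(coins, k=1).astype(float)
A = A + A.T                              # adjacency matrix of one G_{n,p} sample

vals, vecs = np.linalg.eigh(A)
u = vecs[:, -1]                          # eigenvector of the largest eigenvalue
u = u * np.sign(u.sum())                 # fix the sign so the entries are positive
print(vals[-1] / (n * p))                # lambda_1 is close to np
print((u * np.sqrt(n)).min(), (u * np.sqrt(n)).max())   # entries stay in a narrow window around 1/sqrt(n)
```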
55-56 Proof. We only need a few elementary properties. Let Δ = 2√(log n / np). With high probability: the degree e(i) = np(1 ± Δ) for all i ∈ [n]; |e(A, B) − p|A||B|| ≤ 2√(np)√(|A||B|) for all A, B; and λ_1(A) ≈ np. Normalize u = v_1(A) so that max_i u(i) = u(1) = 1. Then Au ≈ (np)u, so (Au)(1) ≈ (np)u(1), i.e. Σ_i A(1, i) u(i) = Σ_{i ∈ N(1)} u(i) ≈ np. Claim: at least np/2 vertices of N(1) have u(i) ≥ 1/2. If not, then Σ_{i ∈ N(1)} u(i) ≤ (np/2)·1 + (np(1 + Δ) − np/2)·(1/2) ≤ np(3/4 + Δ/2) < np, contradicting Σ_{i ∈ N(1)} u(i) ≈ np.
57 Proof (contd.). Idea: extend the argument to successive neighborhood sets. We define a sequence of sets {S_t} for t = 1, ...: S_1 = {1}, and S_{t+1} = {i : i ∈ N(S_t) and u(i) ≥ 1/(c(t + 1))}. How quickly does S_{t+1} grow? Lemma: let t* be the last index such that |S_{t*}| ≤ 2n/3. For all t ≤ t*, |S_{t+1}| ≥ (np)|S_t| / (9t²). Exponential increase!
58 Connection to Clustering. Experiments show that for our models, no clean-up is necessary at all. Needed: subtle entrywise bounds for the second (and smaller) eigenvectors in the planted model. Figure: second eigenvector of a graph with two clusters.
59 Connection to Clustering. We can show this for models with stronger separation conditions. Theorem: assume σ² = Ω(1/n). Then the second eigenvector provides a clean clustering if ‖µ_r − µ_s‖² ≥ σ^{2/3} log n. This is stronger than the standard assumption ‖µ_r − µ_s‖² ≥ σ² log n.
60 Future Work. 1. Clustering without clean-up. 2. Clustering below the variance bound Ω(σ²). 3. A Chernoff-type bound for the entrywise error? Algorithmic applications?
61 Thanks!