8.1 Concentration inequality for Gaussian random matrix (cont'd)


MGMT 69000: Topics in High-dimensional Data Analysis                                   Fall 2016

Lecture 8: Spectral clustering and Laplacian matrices
Lecturer: Jiaming Xu        Scribe: Hyun-Ju Oh and Taotao He, October 4, 2016

Outline
- Concentration inequality for Gaussian random matrix
- Spectral clustering algorithm based on Laplacian matrix

8.1 Concentration inequality for Gaussian random matrix (cont'd)

8.1.1 Brief review

Theorem 8.1. Given A_ij iid ~ N(0, 1), we have

    P{ |‖A‖_2 − E[‖A‖_2]| > t } ≤ 2 e^{−t²/2}.

It implies that ‖A‖_2 has Gaussian tail behavior.

Theorem 8.2. Given a Gaussian random matrix A ∈ R^{n×d}, we have

    E[‖A‖_2] ≤ √n + √d.

Theorem 8.1 and Theorem 8.2 imply that

    P{ ‖A‖_2 ≥ √n + √d + t } ≤ 2 e^{−t²/2}.

Therefore, as the deviation t gets bigger, the probability becomes smaller. In particular, let t = t_n, and assume that t_n → ∞ as n → ∞. Then, with probability converging to 1,

    ‖A‖_2 ≤ √n + √d + t_n.

In plain English, ‖A‖_2 is likely to be smaller than √n + √d + t_n.
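As a quick numerical sanity check of Theorems 8.1 and 8.2 (a sketch added here for illustration, not part of the original notes; the values of n, d, and the number of trials are arbitrary), one can draw Gaussian matrices and compare the operator norm with √n + √d:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 400, 100, 20

# operator norm ||A||_2 = largest singular value of A
norms = [np.linalg.norm(rng.standard_normal((n, d)), ord=2) for _ in range(trials)]
bound = np.sqrt(n) + np.sqrt(d)

print(f"average ||A||_2 over {trials} draws: {np.mean(norms):.2f}")
print(f"sqrt(n) + sqrt(d)                 : {bound:.2f}")
```

On typical runs the average lands a little below the bound, and the individual draws spread around their mean by only a few units, illustrating the concentration described above.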

Remark. Is √n + √d + t_n a tight upper bound? The following simple analysis shows that the upper bound is tight up to constant factors. View ‖A‖_2² as the maximal sum of squared lengths of the projections of the rows of A onto a unit vector v. That is,

    ‖A‖_2² = max_{‖v‖_2 = 1} ‖Av‖_2² ≥ ‖Ae_1‖_2²,                                   (8.1)

where e_1 ∈ R^d is the unit vector whose first element is equal to 1 and whose other elements are all zero. (In fact, we can use any other basis vector e_i.) Then ‖Ae_1‖_2² is the squared length of the first column of A, which follows a chi-squared distribution with n degrees of freedom (χ²(n)). (Recall that a chi-squared random variable is a sum of squared standard normal random variables.) In addition, we have shown in previous lectures that a chi-squared random variable is highly concentrated around its mean, and thus ‖A‖_2 ≳ √n with high probability. Note that the same argument applies to A^T. Hence,

    ‖A‖_2 ≳ max{ √n, √d }    with high probability.

Thus, the upper and lower bounds on ‖A‖_2 differ by at most a constant factor. In plain English, ‖A‖_2 is roughly of the same order as the length of a row or column vector of A. Notice that we have assumed that A has independent entries. It is easy to see that without such an independence assumption, ‖A‖_2 could be much larger than the length of a row or column vector of A, for example when all rows or columns of A are identical.

8.2 Spectral clustering under Gaussian mixture model (revisited)

In the previous class, we learned that the fraction of misclassified nodes is bounded as

    (Fraction of misclassified nodes) ≲ ( ‖Δ‖_2 / (√n ‖μ‖_2) )²,

where Δ = X − E[X]. Note that X ∈ R^{n×d} is the data matrix, and μ is the cluster center under the Gaussian mixture model. Notice that Δ_ij iid ~ N(0, σ²). Then, by plugging in the previous result, we have

    ‖Δ‖_2 ≤ σ (√n + √d + t_n)

with high probability, where σ behaves as a scaling factor. From this result, if √n ‖μ‖_2 ≫ σ(√n + √d + t_n), then the fraction of misclassified nodes goes to zero with high probability. Hence, we say spectral clustering achieves approximate cluster recovery under the sufficient condition

    √n ‖μ‖_2 ≫ σ (√n + √d + t_n).

If the dimension d of the dataset is proportional to the number of data points n (i.e., d = αn, where α is a fixed constant), this becomes

    ‖μ‖_2 ≫ σ ( 1 + √α + t_n/√n ).

Note that t_n can go to ∞ arbitrarily slowly as n → ∞, so t_n/√n goes to zero as n → ∞. As a result, we see that the cluster separation ‖μ‖_2 does not have to scale with the dimension d anymore. In contrast, recall that in the previous lectures we have shown that a naive thresholding algorithm needs ‖μ‖_2 to scale with d.
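To make the separation condition concrete, here is a small simulation sketch (an illustration added here, not the algorithm verbatim from the previous lecture): the rows of X come from a symmetric two-component mixture with centers ±μ, and clusters are estimated from the sign of the top left singular vector of X. The values of n, d, σ, and ‖μ‖_2 are arbitrary choices, with ‖μ‖_2 held fixed while d grows proportionally to n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 1000, 500, 1.0             # d proportional to n (alpha = 0.5)
mu = np.zeros(d)
mu[0] = 4.0 * sigma                       # separation ||mu||_2 = 4*sigma, NOT growing with d

labels = rng.choice([-1, 1], size=n)      # true cluster memberships
X = np.outer(labels, mu) + sigma * rng.standard_normal((n, d))

# cluster by the sign of the top left singular vector of X
u1 = np.linalg.svd(X, full_matrices=False)[0][:, 0]
pred = np.sign(u1)
err = min(np.mean(pred != labels), np.mean(-pred != labels))   # up to a global label swap
print(f"fraction misclassified: {err:.3f}")
```

With these numbers √n ‖μ‖_2 exceeds σ(√n + √d) by a constant factor, and the misclassification error comes out small even though ‖μ‖_2 stays bounded as d grows with n.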

8.3 Spectral clustering based on Laplacian matrix

Plan. In this lecture, we introduce the weighted similarity graph and its Laplacian matrices for spectral clustering. In the next lecture, we will introduce the stochastic block model, which is a simple probabilistic model to generate (unweighted) similarity graphs.

8.3.1 Motivation for spectral clustering based on Laplacian matrix

Recall that we have learned spectral clustering based on the data matrix X ∈ R^{n×d}; it involves the singular value decomposition of X, or the eigenvalue decomposition of XX^T and X^T X. Here, we introduce a more general setup for spectral clustering. Define a similarity matrix S ∈ R^{n×n} between every pair of data points. The component S_ij characterizes the similarity between data points X_i and X_j, where X_i denotes the ith row of X.

Example 8.1.
1. S = XX^T. Each component S_ij is the inner product of X_i and X_j. Notice that the spectral clustering algorithm discussed in the previous lectures deals with this similarity matrix.
2. Define S with

    S_ij = exp( −‖X_i − X_j‖_2² / (2σ²) ).

   This function has the form of a Gaussian density function, and it is called the Gaussian kernel function with bandwidth σ. When the two data points X_i and X_j differ a lot, the corresponding similarity score S_ij gets smaller, and vice versa. A crucial parameter is σ: when σ becomes larger, S_ij is less sensitive to the difference between X_i and X_j, and vice versa.

The similarity matrix can be equivalently viewed as a similarity graph.

Definition 8.1 (Similarity graph). Given a similarity matrix S, let G denote a weighted graph, where each node i corresponds to data point i, and every pair of nodes i and j is connected by an edge with edge weight S_ij. In this case S is called the weighted adjacency matrix of graph G.

Note that the similarity graph G defined this way is a complete graph (every pair of nodes is connected). Often, we may apply a truncation procedure to sparsify G. There are two possible truncation procedures. Let A ∈ R^{n×n} denote the similarity matrix after truncation:
1. A_ij = 1{S_ij ≥ ε}, where ε is a given threshold;
2. A_ij = S_ij · 1{S_ij ≥ ε}.
Equivalently, after truncation, two nodes i and j in graph G are connected if and only if S_ij ≥ ε. In the first case, A is a binary matrix, and thus the graph G after truncation becomes an unweighted graph.

There are at least two reasons why truncation is used in practice:
1. it can potentially remove a bit of noise;
2. A is sparse, and computation with A is faster; for instance, the eigenvalue decomposition of A gets faster.
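The following short sketch (added for illustration; the data, the bandwidth σ, and the threshold ε are arbitrary choices, not values from the notes) builds the Gaussian-kernel similarity matrix of Example 8.1 and applies the two truncation rules above.

```python
import numpy as np

def gaussian_similarity(X, sigma):
    """S_ij = exp(-||X_i - X_j||_2^2 / (2 sigma^2)) for the rows X_i of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def truncate(S, eps, binary=True):
    """Rule 1: A_ij = 1{S_ij >= eps}.  Rule 2: A_ij = S_ij * 1{S_ij >= eps}."""
    mask = (S >= eps).astype(float)
    return mask if binary else S * mask

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 2))                  # 8 data points in R^2
S = gaussian_similarity(X, sigma=1.0)
A_unweighted = truncate(S, eps=0.5, binary=True)
A_weighted   = truncate(S, eps=0.5, binary=False)
np.fill_diagonal(A_unweighted, 0)                # drop self-loops (S_ii = 1); a choice made here,
                                                 # matching the A_ii = 0 convention used later
print(A_unweighted.astype(int))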

Example 8.2 (A clustering example for truncation). Let us see why truncation could possibly remove noise by considering the following example, depicted in Fig. 8.1. In this example, the data distribution clearly does not come from a Gaussian mixture model. There are two possible clusters: one given by the outer ring, and the other given by the inner ring.

Figure 8.1: An example of two clusters: one consists of data points in the outer ring; the other consists of data points in the inner ring.

The two clusters have very close cluster centers. Hence, the spectral clustering developed under the GMM does not work well any more. To deal with this, we may apply truncation: two data points i and j are connected, i.e., A_ij > 0, if and only if S_ij ≥ ε. By carefully picking the threshold, one can get that data points in the same cluster are densely connected, while data points in different clusters are loosely connected, even though not every pair of data points in the same cluster is connected. We call edges which connect data points in two different clusters cross links. Fig. 8.2 shows that there are few cross links after proper truncation.

Figure 8.2: Construction of cross links between the inner and outer ring.
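Here is a sketch of this two-ring example (the radii, noise level, kernel bandwidth, and threshold below are illustrative choices, not values from the notes); with a suitable threshold, the surviving edges stay almost entirely within a ring.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 100                                              # points per ring
theta = rng.uniform(0.0, 2.0 * np.pi, size=2 * m)
radius = np.concatenate([np.full(m, 1.0), np.full(m, 3.0)])   # inner and outer ring
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
X += 0.05 * rng.standard_normal(X.shape)             # small noise
ring = np.repeat([0, 1], m)                          # true cluster of each point

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
S = np.exp(-sq_dists / (2.0 * 0.3 ** 2))             # Gaussian kernel, sigma = 0.3
A = (S >= 0.1).astype(int)                           # truncation at eps = 0.1
np.fill_diagonal(A, 0)

within = A[ring[:, None] == ring[None, :]].sum() // 2   # edges inside a ring
cross  = A[ring[:, None] != ring[None, :]].sum() // 2   # cross links between rings
print(f"within-ring edges: {within}, cross links: {cross}")
```

With these settings there are typically hundreds of within-ring edges and few or no cross links, so the two rings appear as (nearly) separate connected components.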

An important question is: can we recover the clusters from A? Let us consider two cases:
1. There is no cross link. In this case, we can classify the inner and outer ring very easily by finding the two connected components.
2. There are a few cross links. In this case, we will use spectral clustering based on the Laplacian matrix of a graph.

8.3.2 Weighted graph and its Laplacian matrices

Consider a general symmetric matrix A ∈ R^{n×n} with A_ii = 0 and all A_ij ≥ 0. Let G denote a graph with n nodes, where nodes i and j are connected if and only if A_ij > 0, and when they are connected, A_ij is the edge weight of edge (i, j). Notice that there is a one-to-one correspondence between A and G, and A is called the (weighted) adjacency matrix of graph G. When A is a binary matrix, there are no weights on the edges and A only represents whether two points are neighbors; in this case, A is simply called the adjacency matrix of G.

Figure 8.3: Nodes i and j are connected with edge weight A_ij.

Consider the diagonal matrix D = diag(d_1, ..., d_n), whose ith diagonal element is defined as d_i = Σ_{j=1}^n A_ij for i ∈ {1, ..., n}. If A_ij ∈ {0, 1}, then d_i represents the number of neighbors of node i. Hence d_i is called the degree of node i, and D is called the degree matrix.

Definition 8.2 (Laplacian matrix). There are three possible versions of the Laplacian matrix:
1. unnormalized version:

    L = D − A,                                                        (unnormalized)

   where A is the weighted adjacency matrix defined previously;
2. normalized version: assume that D has no zero diagonal elements, and let D^{-1/2} = diag(1/√d_1, ..., 1/√d_n); then

    L_n = D^{-1/2} L D^{-1/2} = I − D^{-1/2} A D^{-1/2};              (normalized)

3. random walk version:

    L_rw = D^{-1} L = I − D^{-1} A = D^{-1/2} L_n D^{1/2}.            (random walk)

Note. For an undirected graph G, A is symmetric (i.e., A_ij = A_ji). Hence, L and L_n are also symmetric, but L_rw may not be symmetric.
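A minimal numpy sketch of Definition 8.2 (the weighted adjacency matrix below is made up for illustration), which also checks the identity L_rw = D^{-1/2} L_n D^{1/2} and the symmetry note:

```python
import numpy as np

A = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 0.5, 0.0],
              [1.0, 0.5, 0.0, 3.0],
              [0.0, 0.0, 3.0, 0.0]])      # symmetric, zero diagonal, nonnegative weights
d = A.sum(axis=1)                          # degrees d_i = sum_j A_ij
D = np.diag(d)
D_inv_sqrt = np.diag(d ** -0.5)            # D^{-1/2}; requires every d_i > 0

L    = D - A                               # unnormalized Laplacian
L_n  = D_inv_sqrt @ L @ D_inv_sqrt         # normalized: I - D^{-1/2} A D^{-1/2}
L_rw = np.diag(1.0 / d) @ L                # random walk: I - D^{-1} A

assert np.allclose(L_rw, D_inv_sqrt @ L_n @ np.diag(d ** 0.5))   # L_rw = D^{-1/2} L_n D^{1/2}
assert np.allclose(L, L.T) and np.allclose(L_n, L_n.T)           # L and L_n are symmetric
assert not np.allclose(L_rw, L_rw.T)                             # L_rw need not be
```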

Question 1. Why is this called a Laplacian?

Recall: for a scalar function f(x): R → R, the Laplacian operator is simply defined as

    ∇²f(x) ≜ d²f(x)/dx².                                              (8.2)

We consider an (unweighted) line graph G with n nodes. Each node i is associated with a value x_i, and the values at any two neighboring nodes differ by δx, i.e., x_{i+1} − x_i = δx for all 1 ≤ i ≤ n−1.

Figure 8.4: Each (interior) point in G has two neighbors.

Assume n is large and δx is small. With this notation, we can approximate the derivatives of f at x_i and x_{i−1} respectively as

    f'(x_i) ≈ (f(x_{i+1}) − f(x_i)) / δx,        f'(x_{i−1}) ≈ (f(x_i) − f(x_{i−1})) / δx.

Then the second derivative at x_i can be approximated as

    f''(x_i) ≈ (f'(x_i) − f'(x_{i−1})) / δx ≈ (f(x_{i−1}) + f(x_{i+1}) − 2 f(x_i)) / (δx)²,
    ∇²f(x_i) = f''(x_i).                                              (8.3)
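A quick numerical check of the second-difference approximation in (8.3) (a sketch; the function f(x) = sin x, the grid, and δx are arbitrary choices made here):

```python
import numpy as np

dx = 1e-3
x = np.arange(0.0, 2.0 * np.pi, dx)
f = np.sin(x)                                      # so f''(x) = -sin(x)

# (f(x_{i-1}) + f(x_{i+1}) - 2 f(x_i)) / (dx)^2 at the interior nodes
second_diff = (f[:-2] + f[2:] - 2.0 * f[1:-1]) / dx ** 2
print(np.max(np.abs(second_diff - (-np.sin(x[1:-1])))))   # tiny discretization error
```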

Now consider the unnormalized Laplacian matrix L = D − A, where A is the adjacency matrix of the line graph G. We have

    ( L (f(x_1), ..., f(x_n))^T )_i = 2 f(x_i) − f(x_{i−1}) − f(x_{i+1}),    i = 2, ..., n−1,    (8.4)

since every such node i has the (i−1)th and (i+1)th nodes as its neighbors, so its degree is equal to 2. By comparing (8.3) and (8.4), we can observe that the ith element of (8.4) is approximately −∇²f(x_i) (δx)².

Question 2. Why is L_rw called the random walk version?

To answer this question, let us see what a random walk is.

Definition 8.3 (Random walk on a graph G). We say X_0, X_1, ..., X_t, ... is a random walk on graph G with adjacency matrix A if

    P(X_{t+1} = j | X_t = i) = A_ij / d_i.

In other words, a random walk on an undirected graph is a Markov chain defined on the graph, which jumps between the nodes at each time step, and the probability of which node it jumps to depends only on the current node. Denote by P the transition probability matrix of the random walk, that is, P_ij ≜ A_ij / d_i. Since D is a diagonal matrix, it is easy to see that L_rw = I − P. Hence L_rw is called the random walk version of the Laplacian.

8.3.3 Properties of Laplacian matrices

In this subsection, we characterize some properties of Laplacian matrices.

Proposition 8.1. Recall L = D − A and L_n = I − D^{-1/2} A D^{-1/2}. Then, for any v ∈ R^n, we have

    v^T L v = (1/2) Σ_{i,j} A_ij (v_i − v_j)²

and

    v^T L_n v = (1/2) Σ_{i,j} A_ij ( v_i/√d_i − v_j/√d_j )².

Note. Given any v ∈ R^n, let v_i denote the value associated with node i. Then the proposition shows that L and L_n characterize the quadratic difference between node values, weighted by the edge weights.
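The following sketch (with a randomly generated weighted graph, an illustration added here) checks Proposition 8.1 numerically, along with the identity L_rw = I − P from the random-walk discussion:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
W = rng.uniform(0.0, 1.0, size=(n, n))
A = np.triu(W, k=1)
A = A + A.T                                       # symmetric weights, zero diagonal
d = A.sum(axis=1)
L   = np.diag(d) - A
L_n = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)

v = rng.standard_normal(n)
z = v / np.sqrt(d)
quad_L  = 0.5 * np.sum(A * (v[:, None] - v[None, :]) ** 2)
quad_Ln = 0.5 * np.sum(A * (z[:, None] - z[None, :]) ** 2)
assert np.isclose(v @ L @ v, quad_L)     # v' L v   = 1/2 sum_ij A_ij (v_i - v_j)^2
assert np.isclose(v @ L_n @ v, quad_Ln)  # v' L_n v = 1/2 sum_ij A_ij (v_i/sqrt(d_i) - v_j/sqrt(d_j))^2

P = A / d[:, None]                                # P_ij = A_ij / d_i, rows sum to 1
assert np.allclose(np.diag(1.0 / d) @ L, np.eye(n) - P)   # L_rw = I - P
```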

Proof. We have

    v^T L v = v^T (D − A) v = Σ_i d_i v_i² − Σ_{i,j} A_ij v_i v_j
            = (1/2) ( Σ_{i=1}^n d_i v_i² − 2 Σ_{i,j} A_ij v_i v_j + Σ_{j=1}^n d_j v_j² )
            = (1/2) Σ_{i,j} A_ij (v_i − v_j)².

The argument for L_n is analogous.

The following proposition characterizes the eigenvalues and eigenvectors of Laplacian matrices.

Proposition 8.2. Consider a connected graph G, and let L be its unnormalized Laplacian matrix and L_n its normalized Laplacian matrix. Then:
1. If G is connected, then 0 = λ_1(L_n) < λ_2(L_n) ≤ ... ≤ λ_n(L_n), and similarly 0 = λ_1(L) < λ_2(L) ≤ ... ≤ λ_n(L).
2. Let u_1 be the eigenvector corresponding to λ_1(L_n). Then u_{1,i} ∝ √d_i. Similarly, let ũ_1 be the eigenvector corresponding to λ_1(L). Then ũ_1 is parallel to the all-one vector.
3. λ_2(L_n) = min_z (1/2) Σ_{i,j} A_ij (z_i − z_j)²  subject to  Σ_{i=1}^n d_i z_i = 0 and Σ_{i=1}^n d_i z_i² = 1.
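A small numerical illustration of items 1 and 2 of Proposition 8.2 (a sketch; the connected graph below, a path on five nodes, is an arbitrary choice):

```python
import numpy as np

A = np.zeros((5, 5))                      # path graph on 5 nodes (connected)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1)
L_n = np.eye(5) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)

eigvals, eigvecs = np.linalg.eigh(L_n)    # eigenvalues in increasing order
print(np.round(eigvals, 6))               # first eigenvalue is 0, second is strictly positive
u1 = eigvecs[:, 0]
print(np.round(u1 / np.sqrt(d), 6))       # a constant vector, i.e. u_{1,i} proportional to sqrt(d_i)
```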

Proof. By Proposition 8.1, L_n is a positive semidefinite matrix and u_1^T L_n u_1 = 0. Thus 0 = λ_1(L_n) ≤ λ_2(L_n) ≤ ... ≤ λ_n(L_n), and u_1 is an eigenvector corresponding to λ_1(L_n). Similar conclusions hold for L.

We next show λ_2(L_n) > 0. It suffices to show that v^T L_n v > 0 for all nonzero v ⊥ u_1. We prove it by contradiction. Suppose there exists a nonzero v ⊥ u_1 with v^T L_n v = 0. Fix any node i in graph G. Since G is connected, there must exist a path from node 1 to node i. Suppose the edges on the path are given by (1 = i_1, i_2), (i_2, i_3), ..., (i_{k−1}, i_k = i). Then A_{i_1 i_2} > 0, ..., A_{i_{k−1} i_k} > 0. Thus, v^T L_n v = 0 implies

    v_{i_1}/√d_{i_1} = v_{i_2}/√d_{i_2} = ... = v_{i_k}/√d_{i_k}.

Therefore v_1/√d_1 = v_i/√d_i. Since i was arbitrarily chosen, we have v_i/√d_i = v_1/√d_1 for all i, i.e., v is proportional to u_1, which contradicts the assumption that v is a nonzero vector orthogonal to u_1.

Finally, we give a characterization of λ_2(L_n). By definition,

    λ_2(L_n) = min_{‖v‖_2 = 1, v ⊥ u_1} v^T L_n v.

We have

    v ⊥ u_1  ⟺  Σ_{i=1}^n v_i u_{1,i} = 0  ⟺  Σ_{i=1}^n v_i √d_i = 0.

Let z_i = v_i/√d_i. Then v ⊥ u_1 ⟺ Σ_i z_i d_i = 0. Moreover, ‖v‖_2² = Σ_{i=1}^n d_i z_i² = 1. Also, by Proposition 8.1, v^T L_n v = (1/2) Σ_{i,j} A_ij (z_i − z_j)².

Next, we derive an analogue of Proposition 8.2 for a graph G with k connected components. Let S_1, ..., S_k denote the k connected components of G. Then [n] = S_1 ∪ S_2 ∪ ... ∪ S_k, S_i ∩ S_j = ∅ for i ≠ j, and A_{S_i, S_j} = 0 for i ≠ j, where A_{S,T} ≜ (A_ij)_{i∈S, j∈T}. Thus, ordering the nodes by component, the normalized Laplacian matrix of G has the block-diagonal structure

    L_n = diag( L_n^{(1)}, ..., L_n^{(k)} ),

where L_n^{(j)} is the normalized Laplacian matrix of the jth component of the graph G. The following proposition immediately follows.

Proposition 8.3. We have

    {λ_i(L_n)}_{i=1}^n = {λ_i(L_n^{(1)})}_{i∈S_1} ∪ {λ_i(L_n^{(2)})}_{i∈S_2} ∪ ... ∪ {λ_i(L_n^{(k)})}_{i∈S_k}.

In particular,

    λ_1 = λ_2 = ... = λ_k = 0 < λ_{k+1} ≤ ... ≤ λ_n,

and there are k orthogonal eigenvectors corresponding to the zero eigenvalue, which are given by u_1, ..., u_k with

    u_{l,i} = c_l √d_i 1{i ∈ S_l},    1 ≤ l ≤ k,

where c_l is a normalization constant depending on l. In particular, with the nodes ordered so that S_1, ..., S_k are consecutive,

    u_1 = c_1 (√d_1, ..., √d_{|S_1|}, 0, ..., 0)^T,
    u_2 = c_2 (0, ..., 0, √d_{|S_1|+1}, ..., √d_{|S_1|+|S_2|}, 0, ..., 0)^T,
    ...
    u_k = c_k (0, ..., 0, √d_{n−|S_k|+1}, ..., √d_n)^T.
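A numerical check of Proposition 8.3 (a sketch; the component sizes are arbitrary and each component is taken to be a complete graph): the zero eigenvalue of L_n has multiplicity equal to the number of components, and the spectrum of L_n is the union of the per-component spectra.

```python
import numpy as np

sizes = [3, 4, 5]                                   # three components (complete graphs)
n = sum(sizes)
A = np.zeros((n, n))
start = 0
for m in sizes:
    A[start:start + m, start:start + m] = np.ones((m, m)) - np.eye(m)
    start += m

d = A.sum(axis=1)
L_n = np.eye(n) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
eigvals = np.linalg.eigvalsh(L_n)
print(np.sum(np.isclose(eigvals, 0.0)))             # 3 zero eigenvalues = number of components

# the spectrum of L_n is the union of the per-component spectra
blocks, start = [], 0
for m in sizes:
    A_b = A[start:start + m, start:start + m]
    d_b = A_b.sum(axis=1)
    L_b = np.eye(m) - np.diag(d_b ** -0.5) @ A_b @ np.diag(d_b ** -0.5)
    blocks.append(np.linalg.eigvalsh(L_b))
    start += m
assert np.allclose(np.sort(np.concatenate(blocks)), eigvals)
```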

Let U ≜ [u_1, u_2, ..., u_k] ∈ R^{n×k}. Then the ith row of U is

    U_i = (c_1 √d_i, 0, ..., 0)       if i ∈ S_1,
    U_i = (0, c_2 √d_i, 0, ..., 0)    if i ∈ S_2,
    ...
    U_i = (0, ..., 0, c_k √d_i)       if i ∈ S_k.

We can normalize U_i to have unit norm:

    Z_i ≜ U_i / ‖U_i‖_2 = (1, 0, ..., 0)    if i ∈ S_1,
    Z_i = (0, 1, 0, ..., 0)                 if i ∈ S_2,
    ...
    Z_i = (0, ..., 0, 1)                    if i ∈ S_k.

Note. By Proposition 8.3, the number of zero eigenvalues is exactly the same as the number of connected components. One can recover the k connected components by clustering the n rows {U_i}_{i=1}^n. Since the zero eigenvalue has multiplicity k, the corresponding eigenvectors are not uniquely determined. In particular:

Claim 8.1. Let R ∈ O(k, k), and let [ũ_1, ..., ũ_k] = [u_1, ..., u_k] R. Then {ũ_1, ..., ũ_k} are also orthogonal eigenvectors of L_n corresponding to the zero eigenvalue.
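Finally, a sketch of the recovery procedure just described (component sizes are arbitrary, each component is a complete graph, and grouping equal rows stands in for a generic clustering step such as k-means): the rows of the row-normalized eigenvector matrix coincide exactly within a component, whichever orthonormal basis of the zero eigenspace the eigensolver happens to return, which is the content of Claim 8.1.

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = [4, 6, 5]                                  # k = 3 components of arbitrary sizes
k, n = len(sizes), sum(sizes)
truth = np.repeat(np.arange(k), sizes)             # true component of each node

A = np.zeros((n, n))
start = 0
for m in sizes:                                    # each component is a complete graph
    A[start:start + m, start:start + m] = np.ones((m, m)) - np.eye(m)
    start += m
d = A.sum(axis=1)
L_n = np.eye(n) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)

eigvals, eigvecs = np.linalg.eigh(L_n)
U = eigvecs[:, :k]                                 # some orthonormal basis of the zero eigenspace
Z = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-normalized, as above

# nodes in the same component get identical rows of Z, so grouping (rounded) equal
# rows recovers the components exactly; rounding stands in for a clustering step
reps, labels = {}, np.empty(n, dtype=int)
for i, row in enumerate(np.round(Z, 6)):
    labels[i] = reps.setdefault(tuple(row), len(reps))
print(np.array_equal(labels[:, None] == labels[None, :],
                     truth[:, None] == truth[None, :]))      # True: components recovered

# Claim 8.1: any rotation of the eigenvector basis still consists of zero-eigenvectors
R = np.linalg.qr(rng.standard_normal((k, k)))[0]             # random orthogonal k x k matrix
assert np.allclose(L_n @ (U @ R), 0.0)
```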
