Statistical Learning Notes III-Section 2.4.3

1 Statistical Learning Notes III-Section 2.4.3: "Graphical" Spectral Features
Stephen Vardeman, Analytics Iowa LLC, January 2019

2 Graphical "Spectral" Features: Roughly Connected Sets of Points in p-space
Another variant of principal components ideas concerns "spectral features" of a dataset, built on thinking of data cases as corresponding to vertices of a graph. This material has emphases in common with the local version of multi-dimensional scaling treated in Section 17.3 of the notes, and can sometimes provide a way to separate "unconventional" but distinct structures of data points in $\mathbb{R}^p$. The basic motivation is to look not necessarily for "convex" groups of points in p-space, but rather for "roughly connected"/"contiguous" sets of points of any shape in p-space.

3 Adjacencies and Symmetric Sets of Index Pairs
Begin with N vectors $x_1, x_2, \ldots, x_N$ in $\mathbb{R}^p$. Consider weights $w_{ij} = w(\|x_i - x_j\|)$ for a decreasing function $w : [0, \infty) \to [0, 1]$ and use them to define similarities/adjacencies $s_{ij}$. (For example, we might use $w(d) = \exp(-d^2/c)$ for some $c > 0$.) Similarities can be exactly $s_{ij} = w_{ij}$, but can be even more "locally" defined as follows. For fixed k consider the symmetric set of index pairs
$$N_k = \left\{ (i, j) : \text{the number of } j' \text{ with } w_{ij'} > w_{ij} \text{ is less than } k, \text{ or the number of } i' \text{ with } w_{i'j} > w_{ij} \text{ is less than } k \right\}$$
(an index pair is in the set if one of the items is in the k-nearest neighbor neighborhood of the other). One might then define $s_{ij} = w_{ij} \, I[(i, j) \in N_k]$.
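The k-nearest-neighbor restriction on this slide can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not code from the notes: the function names `gaussian_weights` and `knn_similarities`, the toy data, and the choices c = 1 and k = 2 are all hypothetical. Read literally, the definition counts each case as its own nearest neighbor, because w(0) is the largest weight in every row.

```python
import numpy as np

def gaussian_weights(X, c=1.0):
    """w_ij = exp(-||x_i - x_j||^2 / c) for all pairs of rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / c)

def knn_similarities(W, k):
    """s_ij = w_ij * I[(i, j) in N_k] for a symmetric weight matrix W."""
    # count[i, j] = number of j' with w_ij' > w_ij
    count = np.sum(W[:, None, :] > W[:, :, None], axis=-1)
    in_row = count < k           # j is among the k closest cases to i
    in_Nk = in_row | in_row.T    # ... or i is among the k closest cases to j
    return W * in_Nk

# Hypothetical toy data: four points on a line, the last one far away.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
W = gaussian_weights(X, c=1.0)
S = knn_similarities(W, k=2)
```

Here the pair of cases at 0 and 2 is dropped (neither is among the other's two nearest, counting itself), while the pair at 2 and 10 is kept because 2 is the nearest other case to 10, even though the weight itself is tiny.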

4 Adjacency Matrix and Node Degrees
We'll call the matrix $S = (s_{ij})_{i=1,\ldots,N;\ j=1,\ldots,N}$ the adjacency matrix, and use the notations
$$g_i = \sum_{j=1}^{N} s_{ij} \quad \text{and} \quad G = \mathrm{diag}(g_1, g_2, \ldots, g_N)$$
It is common to think of the points $x_1, x_2, \ldots, x_N$ in $\mathbb{R}^p$ as nodes/vertices on a graph, with edges between nodes weighted by similarities $s_{ij}$, and the $g_i$ so-called node degrees, i.e. sums of weights of the edges connected to node i. In such thinking, $s_{ij} = 0$ indicates that there is no "edge" between case i and case j.
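As a small numerical illustration of these definitions, the sketch below computes node degrees and the edge list for a made-up adjacency matrix. The matrix values are hypothetical and chosen only so the sums are easy to verify by hand.

```python
import numpy as np

# Hypothetical symmetric adjacency matrix for N = 4 cases:
# cases 0-1-2 form a chain and case 3 is weakly attached to case 2.
S = np.array([[0.0, 0.8, 0.0, 0.0],
              [0.8, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.1],
              [0.0, 0.0, 0.1, 0.0]])

g = S.sum(axis=1)   # node degrees g_i = sum_j s_ij
G = np.diag(g)      # G = diag(g_1, ..., g_N)

# Edges of the graph: index pairs i < j with s_ij > 0.
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if S[i, j] > 0]
```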

5 Graph Laplacians
The matrix $L = G - S$ is called the (unnormalized) graph Laplacian, and one standardized (with respect to the node degrees) version of this is
$$\tilde{L} = G^{-1} L = I - G^{-1} S$$
and a second (symmetric) standardized version is
$$L^{*} = G^{-1/2} L G^{-1/2} = I - G^{-1/2} S G^{-1/2} \qquad (1)$$
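The three Laplacians can be computed directly from an adjacency matrix. The sketch below is one possible way to do it, assuming every node degree is positive; the function name `graph_laplacians` and the example matrix are hypothetical.

```python
import numpy as np

def graph_laplacians(S):
    """Return (G - S, G^{-1}(G - S), G^{-1/2}(G - S)G^{-1/2}) for adjacency S."""
    g = S.sum(axis=1)
    if np.any(g <= 0):
        raise ValueError("every node degree must be positive")
    L = np.diag(g) - S                    # unnormalized Laplacian
    L_rw = L / g[:, None]                 # row i divided by g_i, i.e. G^{-1} L
    d = 1.0 / np.sqrt(g)
    L_sym = d[:, None] * L * d[None, :]   # G^{-1/2} L G^{-1/2}
    return L, L_rw, L_sym

# Hypothetical 3-node example.
S = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.5],
              [0.2, 0.5, 0.0]])
L, L_rw, L_sym = graph_laplacians(S)
```

Scaling rows and columns by broadcasting avoids forming the diagonal matrices explicitly, which is a design convenience rather than a requirement.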

6 Unnormalized Graph Laplacians are Non-negative Definite
Note that for any vector u,
$$u' L u = \sum_{i=1}^{N} g_i u_i^2 - \sum_{i=1}^{N} \sum_{j=1}^{N} u_i u_j s_{ij}$$
$$= \frac{1}{2} \left( \sum_{i=1}^{N} \sum_{j=1}^{N} s_{ij} u_i^2 + \sum_{j=1}^{N} \sum_{i=1}^{N} s_{ij} u_j^2 \right) - \sum_{i=1}^{N} \sum_{j=1}^{N} u_i u_j s_{ij}$$
$$= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} s_{ij} (u_i - u_j)^2$$
Since every $s_{ij} \geq 0$, this is nonnegative, so that the $N \times N$ symmetric L is nonnegative definite.
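The quadratic-form identity on this slide is easy to check numerically. The sketch below does so for a randomly generated symmetric, nonnegative adjacency matrix and a random vector u; the random inputs and the seed are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Hypothetical symmetric similarities with nonnegative entries.
A = rng.uniform(size=(N, N))
S = (A + A.T) / 2
np.fill_diagonal(S, 0.0)

L = np.diag(S.sum(axis=1)) - S   # unnormalized Laplacian G - S
u = rng.normal(size=N)

quad_form = u @ L @ u
pairwise = 0.5 * np.sum(S * (u[:, None] - u[None, :]) ** 2)

# Nonnegative definiteness: the smallest eigenvalue is 0 up to rounding.
min_eig = np.linalg.eigvalsh(L).min()
```

The two quantities `quad_form` and `pairwise` agree up to floating-point error, and `min_eig` is zero up to rounding, consistent with the derivation.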

7 Eigen Analysis of Graph Laplacians
Consider the spectral/eigen decomposition of L and focus on the small eigenvalues. For $v_1, \ldots, v_m$ eigenvectors corresponding to the 2nd through (m + 1)st smallest eigenvalues (since $L\mathbf{1} = 0$ there is an uninteresting 0 eigenvalue, which is skipped), let
$$V = (v_1, \ldots, v_m)$$
These are "graphical spectral features" and one might think of cases with similar rows of V as "alike." As we noted in the discussion in Section 2.4.1, small eigenvalues are associated with linear combinations of columns of L that are close to 0.
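A minimal sketch of extracting these features with `numpy.linalg.eigh`, which returns eigenvalues in ascending order, is given below. The function name `spectral_features`, the toy data (two loose groups in the plane), and the Gaussian weight with c = 10 are assumptions for illustration only.

```python
import numpy as np

def spectral_features(S, m):
    """Eigenpairs for the 2nd through (m+1)st smallest eigenvalues of G - S."""
    L = np.diag(S.sum(axis=1)) - S
    eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues
    return eigvals[1:m + 1], eigvecs[:, 1:m + 1]

# Hypothetical data: two groups of three points each in R^2.
X = np.array([[0, 0], [0, 1], [1, 0],
              [5, 5], [5, 6], [6, 5]], dtype=float)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
S = np.exp(-sq_dists / 10.0)   # s_ij = w_ij with w(d) = exp(-d^2 / c), c = 10

lam, V = spectral_features(S, m=1)
```

In this example the single feature column has one sign on the first three cases and the opposite sign on the last three, so cases in the same group have similar rows of V.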

8 Rationale for Meaning of Graphical Features
Why should this work to identify connected structures in a training set? For $v_l$ a column of V that is an eigenvector of L corresponding to a small eigenvalue $\lambda_l$, by virtue of relationship (??) (the quadratic-form identity for $u' L u$),
$$\lambda_l = v_l' L v_l = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} s_{ij} (v_{li} - v_{lj})^2 \approx 0$$
and points $x_i$ and $x_j$ with large adjacencies must have similar corresponding coordinates of the eigenvectors. HTF (at the bottom of their page 545) essentially argue that the number of "0 or nearly 0" eigenvalues of L is indicative of the number of connected structures in the original N data vectors. A series of points could be (in sequence) close to successive elements of the sequence but have very small adjacencies for points separated in the sequence. "Structures" by this methodology need NOT be "clumps" of points, but could also be serpentine "chains" of points in $\mathbb{R}^p$.
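The link between near-zero eigenvalues and connected structures can be illustrated with two disjoint "chains," where each case is adjacent only to its successor. The sketch below is a hypothetical example (the helper `chain_adjacency`, the unit weights, and the 1e-10 tolerance are all assumptions), not a general-purpose procedure.

```python
import numpy as np

def chain_adjacency(n):
    """Adjacency of a chain: unit weight between successive cases only."""
    S = np.zeros((n, n))
    idx = np.arange(n - 1)
    S[idx, idx + 1] = 1.0
    S[idx + 1, idx] = 1.0
    return S

# Two disconnected chains, of 4 and 3 cases respectively.
S = np.zeros((7, 7))
S[:4, :4] = chain_adjacency(4)
S[4:, 4:] = chain_adjacency(3)

L = np.diag(S.sum(axis=1)) - S
eigvals, eigvecs = np.linalg.eigh(L)

n_zero = int(np.sum(eigvals < 1e-10))   # count of (numerically) zero eigenvalues
V0 = eigvecs[:, :n_zero]                # eigenvectors for those eigenvalues
```

Here `n_zero` equals the number of chains, and the rows of `V0` are constant within a chain and differ between chains, even though the end cases of a chain have no direct adjacency with each other.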

9 Analysis Based on the Symmetric Normalized Laplacian
A second version of this is easily built on the symmetric normalized Laplacian (1), $L^{*}$. Its eigenvalues are nonnegative and it has a 0 eigenvalue. Let $\lambda_1^{*} \leq \cdots \leq \lambda_m^{*}$ be the 2nd through (m + 1)st smallest eigenvalues and $v_1^{*}, \ldots, v_m^{*}$ be corresponding eigenvectors. Then for $\lambda_l^{*}$ such a small non-negative eigenvalue,
$$\lambda_l^{*} = v_l^{*\prime} L^{*} v_l^{*} = v_l^{*\prime} \left( G^{-1/2} L G^{-1/2} \right) v_l^{*} = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} s_{ij} \left( \frac{v_{li}^{*}}{\sqrt{g_i}} - \frac{v_{lj}^{*}}{\sqrt{g_j}} \right)^2 \approx 0$$
and points $x_i$ and $x_j$ with large adjacencies must have similar corresponding coordinates of the vector $G^{-1/2} v_l^{*}$. So one might treat vectors $G^{-1/2} v_l^{*}$ (or perhaps normalized versions of them) as a second version of m graphical features.
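A sketch of this second version of the features follows, together with a numerical check of the displayed identity for the first feature. The function name `normalized_spectral_features` and the random adjacency matrix are assumptions for illustration, and all node degrees are assumed positive.

```python
import numpy as np

def normalized_spectral_features(S, m):
    """Features G^{-1/2} v*_l from the symmetric normalized Laplacian."""
    g = S.sum(axis=1)
    d = 1.0 / np.sqrt(g)
    L_sym = np.eye(len(g)) - d[:, None] * S * d[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # ascending, unit-norm eigenvectors
    lam = eigvals[1:m + 1]
    F = d[:, None] * eigvecs[:, 1:m + 1]       # column l is G^{-1/2} v*_l
    return lam, F

# Hypothetical symmetric adjacency matrix with positive entries.
rng = np.random.default_rng(1)
A = rng.uniform(size=(5, 5))
S = (A + A.T) / 2

lam, F = normalized_spectral_features(S, m=2)

# Check: lambda*_1 = (1/2) sum_ij s_ij (f_i - f_j)^2 with f = G^{-1/2} v*_1.
f = F[:, 0]
pairwise = 0.5 * np.sum(S * (f[:, None] - f[None, :]) ** 2)
```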

10 Graphical Spectral Features: Markov Chain Motivation for Graphical Features
It is also easy to see that $P = G^{-1} S$ is a stochastic matrix, and thus specifies an N-state stationary Markov chain. It is plausible that the standardized graph Laplacian $\tilde{L} = I - P$ identifies groups of states such that transition by such a chain between the groups is relatively infrequent (the MC more typically moves within groups).
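The Markov chain reading can be illustrated with a small hypothetical graph made of two tightly linked pairs of states joined by one weak edge. The adjacency values and the grouping below are assumptions chosen so that the one-step probability of crossing between groups is visibly small.

```python
import numpy as np

# Hypothetical adjacency: states {0, 1} and {2, 3} are strongly linked
# within each pair, with a single weak edge between states 1 and 2.
S = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.05, 0.0],
              [0.0, 0.05, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

g = S.sum(axis=1)
P = S / g[:, None]        # P = G^{-1} S: each row is divided by its degree
L_rw = np.eye(4) - P      # the standardized Laplacian I - P

# One-step probability of leaving group {0, 1} for group {2, 3}, by start state.
group_a, group_b = [0, 1], [2, 3]
cross_prob = P[np.ix_(group_a, group_b)].sum(axis=1)
```

Each row of `P` sums to 1, and from either state in the first group the chance of jumping to the second group in one step is below 5 percent, so the chain mostly moves within groups.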
