Locally-biased analytics

Locally-biased analytics
You have BIG data and want to analyze a small part of it.
Solution 1: Cut out the small part and use traditional methods. Challenge: cutting it out may be difficult a priori.
Solution 2: Develop locally-biased methods for data analysis. Challenge: most data-analysis tools (implicitly or explicitly) make strong local-global assumptions.*
*Spectral partitioning wants to find 50-50 clusters; recursive partitioning is of interest only if the recursion depth isn't too deep; eigenvectors optimize global objectives; etc.

Locally-biased analytics
Locally-biased community identification: find a community around an exogenously-specified seed node.
Locally-biased image segmentation: find a small tiger in the middle of a big picture.
Locally-biased neural connectivity analysis: find neurons that are temporally correlated with a local stimulus.
Locally-biased inference, semi-supervised learning, etc.: do machine learning with a seed set of ground-truth nodes, i.e., make predictions that draw strength from local information.

Global spectral methods DO work well
(1) Construct a graph from the data.
(2) Use the second eigenvalue/eigenvector of the Laplacian to do clustering, community detection, image segmentation, parallel computing, semi-supervised/transductive learning, etc.
Why is it useful?
(*) Connections with random walks and sparse cuts
(*) Isoperimetric structure gives control over capacity/inference
(*) Relatively easy to compute
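A minimal sketch of this global pipeline for steps (1)-(2), assuming a symmetric adjacency matrix A has already been built from the data (function and variable names here are illustrative, not from the talk):

```python
import numpy as np

def spectral_sweep_cut(A):
    """Global spectral partitioning sketch: second eigenvector plus a sweep cut.

    A: dense symmetric (n x n) adjacency matrix of a connected, undirected graph.
    Returns (S, phi): a boolean indicator of one side of the best sweep cut and
    its conductance.
    """
    d = A.sum(axis=1)                         # (weighted) degrees
    L = np.diag(d) - A                        # combinatorial Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt      # normalized Laplacian
    _, vecs = np.linalg.eigh(L_norm)
    x = D_inv_sqrt @ vecs[:, 1]               # generalized second eigenvector of (L, D)

    order = np.argsort(x)                     # sweep over thresholds of x
    vol_total = d.sum()
    in_S = np.zeros(len(d), dtype=bool)
    vol_S, cut = 0.0, 0.0
    best_phi, best_k = np.inf, 0
    for k, v in enumerate(order[:-1]):        # skip the trivial full-set cut
        w_in = A[v, in_S].sum()               # edge weight from v into the current S
        in_S[v] = True
        vol_S += d[v]
        cut += d[v] - 2.0 * w_in              # new crossing edges minus absorbed ones
        phi = cut / min(vol_S, vol_total - vol_S)
        if phi < best_phi:
            best_phi, best_k = phi, k
    S = np.zeros(len(d), dtype=bool)
    S[order[:best_k + 1]] = True
    return S, best_phi
```

For large sparse graphs one would use scipy.sparse and an iterative eigensolver instead of the dense eigh call; the dense version above is only meant to make the two steps explicit.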

Global spectral methods DON'T work well
(1) The leading nontrivial eigenvalue/eigenvector are inherently global quantities.
(2) They may NOT be sensitive to local information:
(*) Sparse cuts may be poorly correlated with the second (or any) eigenvector
(*) An interesting local region may be hidden from global eigenvectors, which are dominated by an exact orthogonality constraint
QUESTION: Can we find a locally-biased analogue of the usual global eigenvectors that comes with their good properties?
(*) Connections with random walks and sparse cuts
(*) Control over capacity/inference
(*) Relatively easy to compute

Outline
Locally-biased eigenvectors: a methodology to construct a locally-biased analogue of the leading nontrivial eigenvector of the graph Laplacian
Implicit regularization: in early-stopped iterations and teleported PageRank computations
Semi-supervised eigenvectors: extend locally-biased eigenvectors to compute multiple locally-biased eigenvectors, i.e., locally-biased SPSD kernels
Implicit regularization: in truncated diffusions and push-based approximations to PageRank; connections to strongly-local spectral methods and scalable computation

Recall spectral graph partitioning
The basic optimization problem, the combinatorial problem it relaxes, the eigenvalue problem that solves it, and what a sweep cut of the second eigenvector yields (the slide's formulas did not survive extraction; the standard forms are sketched below).
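These are the textbook formulas being recalled, in standard notation (Laplacian L, degree matrix D, all-ones vector 1); the slide's exact normalization may differ:

```latex
% Relaxation of the conductance / sparsest-cut problem:
\min_{x \in \mathbb{R}^n}\; x^T L x
\qquad \text{s.t.} \qquad x^T D x = 1, \;\; x^T D \mathbf{1} = 0 .
% Solvable via the generalized eigenvalue problem
L x = \lambda_2 D x ,
% and a sweep cut of the second eigenvector x_2 yields the Cheeger guarantee
\frac{\lambda_2}{2} \;\le\; \phi(G) \;\le\; \phi(\text{sweep cut of } x_2) \;\le\; \sqrt{2 \lambda_2} .
```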

Geometric correlation and generalized PageRank vectors
Given a cut T, define a seed vector s_T; this can be used to define a geometric notion of correlation between cuts (the slide's formulas are sketched below).
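The definitions were images on the slide; in MOV-style notation they are, up to normalization conventions, roughly:

```latex
% Seed vector associated with a cut (T, \bar{T}): unit D-norm, D-orthogonal to 1,
% positive on T and negative on \bar{T}:
s_T \;\propto\; \frac{\mathbf{1}_T}{\mathrm{vol}(T)} \;-\; \frac{\mathbf{1}_{\bar{T}}}{\mathrm{vol}(\bar{T})},
\qquad s_T^T D\, s_T = 1, \qquad s_T^T D\, \mathbf{1} = 0 .
% Geometric correlation between cuts T and T':
\mathrm{corr}(T, T') \;=\; \langle s_T, s_{T'} \rangle_D^2 \;=\; \big(s_T^T D\, s_{T'}\big)^2 \;\in\; [0, 1] .
```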

Local spectral partitioning ansatz (Mahoney, Orecchia, and Vishnoi, 2010)
Primal program and dual program (the primal is sketched below).
Interpretation of the primal: find a cut well-correlated with the seed vector s; if s is the indicator of a single node, this relaxes the problem of finding a good cut near that node.
Interpretation of the dual: embed a combination of the scaled complete graph K_n and the complete graphs on T and T-bar (K_T and K_T-bar), where the latter encourage cuts near (T, T-bar).
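The primal program, LocalSpectral(G, s, kappa), has the following form (the dual, omitted here, is the embedding described in the interpretation above):

```latex
\text{LocalSpectral}(G, s, \kappa):\qquad
\min_{x \in \mathbb{R}^n}\; x^T L x
\quad \text{s.t.} \quad
x^T D x = 1, \qquad
x^T D \mathbf{1} = 0, \qquad
(x^T D s)^2 \ge \kappa .
```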

Main results (1 of 2) (Mahoney, Orecchia, and Vishnoi, 2010)
Theorem: If x* is an optimal solution to LocalSpectral, it is a GPPR (generalized personalized PageRank) vector for a parameter α, and it can be computed as the solution to a set of linear equations.
Proof sketch: (1) relax the non-convex problem to a convex SDP; (2) strong duality holds for this SDP; (3) the solution to the SDP is rank one (from complementary slackness); (4) the rank-one solution is a GPPR vector.
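A sketch of the characterization in the theorem, in the same notation (α is the slide's GPPR parameter, c a normalization constant):

```latex
% The optimizer of LocalSpectral is a generalized personalized PageRank (GPPR) vector:
x^\star \;=\; c \,(L - \alpha D)^{+} D\, s
\qquad \text{for some } \alpha \in (-\infty, \lambda_2(G)),
% i.e. it can be obtained by solving a single linear system of the form
(L - \alpha D)\, x \;=\; D\, s \quad (\text{then rescaling}).
```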

Main results (2 of 2) (Mahoney, Orecchia, and Vishnoi, 2010)
Theorem (upper bound): If x* is an optimal solution to LocalSpectral(G,s,κ), one can find a cut of conductance at most sqrt(8 λ(G,s,κ)) in time O(n log n) with a sweep cut of x*. This is the usual upper bound from a sweep cut and Cheeger's inequality.
Theorem (lower bound): Let s be the seed vector and κ the correlation parameter. For all sets of nodes T with κ' := <s, s_T>_D^2, we have φ(T) ≥ λ(G,s,κ) if κ' ≥ κ, and φ(T) ≥ (κ'/κ) λ(G,s,κ) if κ' ≤ κ. This is a spectral version of flow-improvement algorithms.

Illustration on small graphs
Similar results hold if we use local random walks, truncated PageRank, or heat-kernel diffusions. The linear-equation formulation is more powerful than diffusions, i.e., it can access all parameter values α ∈ (-∞, λ_2(G)).

Illustration with general seeds
The seed vector doesn't need to correspond to a cut. It could be any vector on the nodes; e.g., one can find a cut near low-degree vertices with s_i = -(d_i - d_av), i ∈ [n].

New methods are useful more generally
Maji, Vishnoi, and Malik (2011) applied the method of Mahoney, Orecchia, and Vishnoi (2010) to image segmentation: one cannot find the tiger with global eigenvectors, but one can find the tiger with the LocalSpectral method!

Outline
Locally-biased eigenvectors: a methodology to construct a locally-biased analogue of the leading nontrivial eigenvector of the graph Laplacian
Implicit regularization: in early-stopped iterations and teleported PageRank computations
Semi-supervised eigenvectors: extend locally-biased eigenvectors to compute multiple locally-biased eigenvectors, i.e., locally-biased SPSD kernels
Implicit regularization: in truncated diffusions and push-based approximations to PageRank; connections to strongly-local spectral methods and scalable computation

PageRank and implicit regularization
Recall the usual characterization of PPR, and compare it with our definition of GPPR (both sketched below).
Question: Can we formalize the sense in which PageRank is a regularized version of the leading nontrivial eigenvector of the Laplacian?
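The two formulas being compared were images; in one common convention (teleportation parameter β, seed/restart vector s, random-walk matrix A D^{-1}) they read:

```latex
% Personalized PageRank (PPR):
\pi_{\beta}(s) \;=\; \beta\, s + (1 - \beta)\, A D^{-1} \pi_{\beta}(s)
\quad\Longleftrightarrow\quad
\pi_{\beta}(s) \;=\; \beta \big( I - (1-\beta)\, A D^{-1} \big)^{-1} s .
% Generalized PPR (GPPR), from the LocalSpectral solution:
x \;\propto\; (L - \alpha D)^{+} D\, s, \qquad \alpha \in (-\infty, \lambda_2(G)) .
```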

Two versions of spectral partitioning: the vector program (VP) and its regularized version (R-VP); formulas shown on the slide.

Two versions of spectral partitioning: the vector program (VP), its SDP relaxation (SDP), and their regularized versions (R-VP, R-SDP); sketched below.
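The four programs were shown as formulas; schematically, and up to degree-normalization conventions, they have the following form (F is a regularization function and η a regularization parameter; v_1 denotes the trivial top/bottom eigenvector):

```latex
% Vector program and its SDP relaxation (X plays the role of x x^T):
\text{VP:}\quad \min_{x}\; x^T L x \quad\text{s.t.}\quad x^T x = 1,\; x \perp v_1 .
\text{SDP:}\quad \min_{X}\; L \bullet X \quad\text{s.t.}\quad \mathrm{Tr}(X) = 1,\; X \succeq 0 .
% Regularized versions add a penalty, on the vector x or on the matrix X:
\text{R-VP:}\quad \min_{x}\; x^T L x + \tfrac{1}{\eta} F(x) \quad\text{s.t.}\quad x^T x = 1,\; x \perp v_1 .
\text{R-SDP:}\quad \min_{X}\; L \bullet X + \tfrac{1}{\eta} F(X) \quad\text{s.t.}\quad \mathrm{Tr}(X) = 1,\; X \succeq 0 .
```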

A simple theorem (Mahoney and Orecchia, 2010): a modification of the usual SDP form of spectral partitioning to include regularization (on the matrix X, not on the vector x).

Corollary: If F_D(X) = -log det(X) (i.e., the log-determinant regularizer), then this gives a scaled PageRank matrix, with t ~ η.
I.e., PageRank does two things: it approximately computes the Fiedler vector, and it exactly computes a regularized version of the Fiedler vector, implicitly!
(Similarly, generalized-entropy regularization is implicit in heat-kernel computations, and matrix p-norm regularization is implicit in the power iteration.)

Outline
Locally-biased eigenvectors: a methodology to construct a locally-biased analogue of the leading nontrivial eigenvector of the graph Laplacian
Implicit regularization: in early-stopped iterations and teleported PageRank computations
Semi-supervised eigenvectors: extend locally-biased eigenvectors to compute multiple locally-biased eigenvectors, i.e., locally-biased SPSD kernels
Implicit regularization: in truncated diffusions and push-based approximations to PageRank; connections to strongly-local spectral methods and scalable computation

Semi-supervised eigenvectors (Hansen and Mahoney, NIPS 2013; JMLR 2014)
Eigenvectors are inherently global quantities, and the leading ones may therefore fail to model relevant local structures.
Generalized eigenvalue problem: the solution is given by the second-smallest eigenvector and yields a Normalized Cut.
Locally-biased analogue of the second-smallest eigenvector: the optimal solution is a generalization of personalized PageRank and can be computed in nearly-linear time [MOV2012].
Semi-supervised eigenvector generalization of [HM2013]: the objective incorporates a general orthogonality constraint, allowing us to compute a sequence of localized eigenvectors.
Semi-supervised eigenvectors are efficient to compute and inherit many of the nice properties that characterize the global eigenvectors of a graph.

Semi-supervised eigenvectors (Hansen and Mahoney, NIPS 2013; JMLR 2014)
This interpolates between very localized solutions and the global eigenvectors of the graph Laplacian: for κ=0 it is the usual global generalized eigenvalue problem, and for κ=1 it returns the local seed set. For γ<0, we can compute the first semi-supervised eigenvector using local graph diffusions, i.e., personalized PageRank, and approximate the solution using the push algorithm [ACL06]; the implicit-regularization characterization is due to [MO10] and [GM14].
(Slide annotations on the objective: leading solution, general solution, norm constraint, orthogonality constraint, locality constraint, seed vector, projection operator; γ determines the locality of the solution, and the program is convex for appropriate γ. A schematic of the objective follows.)
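A schematic of the semi-supervised eigenvector program (Hansen and Mahoney's formulation; the exact constants and projection operator are as in their paper): the k-th semi-supervised eigenvector solves

```latex
\min_{x \in \mathbb{R}^n}\; x^T L x
\quad \text{s.t.} \quad
x^T D x = 1 \;(\text{norm}),\qquad
x^T D x_j = 0,\; j < k \;(\text{orthogonality}),\qquad
(x^T D s)^2 \ge \kappa \;(\text{locality, seed } s),
% with solutions again of GPPR form, x_k \propto (L - \gamma_k D)^{+} D s projected to
% satisfy the orthogonality constraints, for an appropriately chosen \gamma_k.
```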

Semi-supervised eigenvectors
Small-world example: the eigenvectors with the smallest eigenvalues capture the slowest modes of variation.
[Figure: global eigenvectors, for varying probability of random edges.]

Semi-supervised eigenvectors
Small-world example: the eigenvectors with the smallest eigenvalues capture the slowest modes of variation.
[Figure: semi-supervised eigenvectors around a seed node, at low and high correlation with the seed.]

Semi-supervised eigenvectors (Hansen and Mahoney, NIPS 2013; JMLR 2014)
[Figure: semi-supervised eigenvector scatter plots, with one seed per class and ten labeled samples per class used in a downstream classifier.]
Many real applications:
A spatially guided searchlight technique that, compared to [Kriegeskorte2006], accounts for spatially distributed signal representations
Large- and small-scale structure in DNA SNP data in population genetics
Local structure in astronomical data
Code is available at: https://sites.google.com/site/tokejansenhansen/

Local structure in SDSS spectra (Lawlor, Budavari, and Mahoney, 2014)
Data: x ∈ R^3841, N ≈ 500k; entries are photon fluxes in 10 Å bins. Preprocessing corrects for redshift and gappy regions, and normalizes by the median flux at certain wavelengths.
[Figure: example red-galaxy and blue-galaxy spectra.]

Local structure in SDSS spectra (Lawlor, Budavari, and Mahoney, 2014)
[Figures: galaxies along the bridge and bridge spectra; ROC curves for classifying AGN spectra using the top four global eigenvectors (left) and the top four semi-supervised eigenvectors (right).]

Outline
Locally-biased eigenvectors: a methodology to construct a locally-biased analogue of the leading nontrivial eigenvector of the graph Laplacian
Implicit regularization: in early-stopped iterations and teleported PageRank computations
Semi-supervised eigenvectors: extend locally-biased eigenvectors to compute multiple locally-biased eigenvectors, i.e., locally-biased SPSD kernels
Implicit regularization: in truncated diffusions and push-based approximations to PageRank; connections to strongly-local spectral methods and scalable computation

Push Algorithm for PageRank
The push method! Proposed (in a variant form) in ACL06 (see also M0x, JW03) for personalized PageRank.
Strongly related to Gauss-Seidel (see Gleich's talk at Simons for this).
Derived to show improved runtime for balanced solvers.
Applied to graphs with 10M+ nodes and 1B+ edges.
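A minimal sketch of a push-style procedure for approximate personalized PageRank; the ACL06 analysis uses a lazy-walk variant, and the graph representation, alpha, and eps below are illustrative choices:

```python
from collections import deque

def push_ppr(graph, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank by the push method.

    graph: dict mapping node -> list of neighbors (undirected, unweighted).
    seed:  node where all teleportation mass starts.
    alpha: teleportation probability; eps: per-degree residual tolerance.
    Returns a sparse dict p approximating the PPR vector; all residuals end
    below eps per degree (the ACL-style stopping rule).
    """
    p = {}                        # sparse approximation, nonzero on few nodes
    r = {seed: 1.0}               # residual mass, starts entirely on the seed
    queue = deque([seed])         # nodes whose residual may exceed the threshold
    while queue:
        u = queue.popleft()
        du = len(graph[u])
        ru = r.get(u, 0.0)
        if ru < eps * du:         # stale queue entry: residual already small
            continue
        # Push: keep an alpha fraction at u, spread the rest to the neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = 0.0
        share = (1.0 - alpha) * ru / du
        for v in graph[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
    return p

# Usage on a toy graph: a triangle attached to a path.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
print(push_ppr(g, seed=0, alpha=0.15, eps=1e-3))
```

The key property visible in the code is that work is charged only to nodes whose residual is large relative to their degree, which is why the output stays sparse and the total work scales roughly like 1/(alpha * eps) rather than with the graph size.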

Why do we care about push?
1. Widely used for empirical studies of communities.
2. Used for fast PageRank approximation: it produces sparse approximations to PageRank!
Why does the push method have such empirical utility?
[Figure: PageRank problem with a seed vector v that has a single one; the solution on Newman's netscience graph (379 vertices, 1828 non-zeros) is zero on most of the nodes.]

How might an algorithm be good?
Two ways this algorithm might be good:
Theorem 1 [ACL06]: The ACL push procedure returns a vector that is within ε of the exact PPR vector, and does so much faster.
Theorem 2 [GM14]: The ACL push procedure returns a vector that exactly solves an L1-regularized version of the PPR objective.
I.e., the push method does two things: it approximately computes the PPR vector, and it exactly computes a regularized version of the PPR vector, implicitly!

The s-t min-cut problem
(With B the unweighted edge-node incidence matrix and C the diagonal capacity matrix; the objective is sketched below.)
Consider L2 variants of this objective and show how the push method and other diffusion-based ML algorithms implicitly regularize.
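With the incidence and capacity matrices named on the slide, the two objectives are, schematically:

```latex
% s-t min-cut as an \ell_1 minimization (B: unweighted edge-node incidence matrix,
% C: diagonal capacity matrix):
\min_{x}\; \| C B x \|_1 \qquad \text{s.t.} \qquad x_s = 1,\; x_t = 0 .
% The 2-norm variant is an electrical-flow / Laplacian linear-system problem:
\min_{x}\; \| C^{1/2} B x \|_2^2 \;=\; x^T B^T C B x \qquad \text{s.t.} \qquad x_s = 1,\; x_t = 0 .
```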

The localized cut graph (Gleich and Mahoney, 2014). Solve the s-t min-cut on this localized cut graph.

s-t min-cut -> PageRank (Gleich and Mahoney, 2014)
Changing the L1 norm to an L2 norm turns the s-t min-cut problem into an electrical-flow problem, whose solution is PageRank-like.

Back to the push method (Gleich and Mahoney, 2014)
(Slide annotations on the implicitly solved objective: a degree weighting reflecting the need for normalization, and an L1 regularization term that yields sparsity; a schematic form follows.)
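A heavily schematic version of the GM14 characterization (the precise weights, and exactly how the push tolerance ε enters, are as in their paper; L-hat below denotes the Laplacian of the localized cut graph):

```latex
% What the push method implicitly solves, schematically:
\min_{x \ge 0}\;\; \tfrac{1}{2}\, x^T \hat{L}\, x \;+\; \varepsilon\, \| D x \|_1
\qquad \text{s.t.} \qquad x_s = 1,\; x_t = 0 ,
% where the degree weighting D reflects the "need for normalization" and the
% \ell_1 term is the regularization responsible for the sparsity of the output.
```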

Conclusions
Locally-biased and semi-supervised eigenvectors: local versions of the usual global eigenvectors that come with the good properties of global eigenvectors.
Strong algorithmic and statistical theory, and good initial results in several applications.
Novel connections between approximate computation and implicit regularization.
Special cases have already been scaled up to LARGE data.