Fitting a Graph to Vector Data. Samuel I. Daitch (Yale), Jonathan A. Kelner (MIT), Daniel A. Spielman (Yale)

Slide 1. Fitting a Graph to Vector Data. Samuel I. Daitch (Yale), Jonathan A. Kelner (MIT), Daniel A. Spielman (Yale). Machine Learning Summer School, June 2009.

Slides 2-3. Given a collection of vectors x_1, ..., x_n \in \mathbb{R}^d, find a natural weighted graph on V = {1, ..., n} that is:
1. theoretically well motivated,
2. helpful for solving ML (and TCS) problems on the vectors,
3. combinatorially interesting,
4. efficiently computable,
5. sparse.

Slide 4. Outline:
- Standard graphs
- Motivation for ours
- Laplacian view
- Combinatorial properties (sparsity and planarity)
- Use in classification, regression, and clustering
- Efficient computation
- Approximate sparsity
- Lovász's graphs
- Open questions

Slide 5. Standard ways to make graphs: choose one from each column.
Choice of edges: k nearest neighbors; or neighbors whenever the distance is less than a threshold \delta.
Weights of edges: unweighted; or \exp(-\|x_i - x_j\|^2 / 2\sigma^2).

Slides 6-7. Our first proposal: choose weights w_{ij} = w_{ji} \ge 0 to minimize
\sum_{i=1}^n \| d_i x_i - \sum_{j \sim i} w_{ij} x_j \|^2,
where the weighted degrees satisfy d_i = \sum_{j \sim i} w_{ij} \ge 1. (Slide 7 shows an example figure with the computed edge weights.)

Slide 8. Motivation. Consider the regression problem of solving for y_i = f(x_i) given {y_j : j \ne i}. Given a weighted graph, a natural guess is (1/d_i) \sum_{j \sim i} w_{ij} y_j. The sum of squared errors when we leave out each i is
\sum_{i=1}^n \Big( y_i - \frac{1}{d_i} \sum_{j \sim i} w_{ij} y_j \Big)^2.
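
To make the leave-one-out score concrete, here is a minimal NumPy sketch (not from the talk; the function name and the toy weight matrix are illustrative) that computes the weighted-neighbor predictions and the resulting sum of squared errors for a given symmetric weight matrix.

```python
import numpy as np

def leave_one_out_sse(W, y):
    """Sum of squared errors of the weighted-neighbor predictions.

    W : (n, n) symmetric nonnegative weight matrix with zero diagonal
    y : (n,)   labels; y_i is predicted as (1/d_i) * sum_j w_ij * y_j,
               with weighted degree d_i = sum_j w_ij.
    """
    d = W.sum(axis=1)                # weighted degrees d_i
    y_hat = (W @ y) / d              # leave-one-out predictions
    return float(np.sum((y - y_hat) ** 2))

# Tiny illustrative example (weights chosen arbitrarily):
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.7],
              [0.5, 0.7, 0.0]])
y = np.array([1.0, 2.0, 1.5])
print(leave_one_out_sse(W, y))
```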

Slides 9-11. Motivation. Try to get this right on the coordinate vectors, writing x_i = (x_i^1, ..., x_i^d). The sum of squared errors over the d coordinates is
\sum_{k=1}^d \sum_{i=1}^n \Big( x_i^k - \frac{1}{d_i} \sum_{j \sim i} w_{ij} x_j^k \Big)^2 = \sum_{i=1}^n \Big\| x_i - \frac{1}{d_i} \sum_{j \sim i} w_{ij} x_j \Big\|^2.
Choose the edge weights to minimize this. But this leads to a non-convex problem, and, as the next example shows, to degenerate solutions.

Slides 12-16. Motivation, a tricky example. Consider minimizing
\sum_{i=1}^n \Big\| x_i - \frac{1}{d_i} \sum_{j \sim i} w_{ij} x_j \Big\|^2.
The slides show a point configuration in which some edges receive weight 1 and others weight \epsilon; the objective just keeps getting better as \epsilon goes to 0, so the minimizer is degenerate.

Slides 17-18. Motivation, fixing the bad example. Weight the squared error at each vertex by its weighted degree:
\sum_{i=1}^n \Big\| x_i - \frac{1}{d_i} \sum_{j \sim i} w_{ij} x_j \Big\|^2 \;\longrightarrow\; \sum_{i=1}^n \Big\| d_i x_i - \sum_{j \sim i} w_{ij} x_j \Big\|^2.
To avoid degeneracy, require either
d_i = \sum_{j \sim i} w_{ij} \ge 1 for all i (hard graph), or
\sum_{i : d_i < 1} (1 - d_i)^2 \le \alpha n (\alpha-soft graph).

Slide 19. Example: random points in two dimensions (0.1-soft graph). [figure]

Slide 20. Example: two clusters (0.1-soft graph). [figure]

Slide 21. Example (0.1-soft graph). [figure]

Slide 22. Is the graph unique? Not necessarily: it is not unique on highly symmetric point sets (we will see why later). But it is almost always unique. Conjecture: it is unique with probability 1 after an infinitesimally small random perturbation.

Slide 23. Related work. Locally linear embeddings [Roweis, Saul]: choose the allowable neighbors first, then choose asymmetric weights to minimize the same objective function. k-means: if we restrict the graph to be a union of complete graphs, one for each cluster, we are minimizing the same objective function.

Slides 24-25. In terms of the graph Laplacian L = D - A, where A is the weighted adjacency matrix and D is the diagonal matrix of weighted degrees:
\sum_{i=1}^n \Big\| d_i x_i - \sum_{j \sim i} w_{ij} x_j \Big\|^2 = \| L X \|_F^2,
where X is the n \times d matrix whose i-th row is x_i. L is linear in the edge weights, so \| L X \|_F^2 is quadratic in them. (The objective we rejected was \| D^{-1} L X \|_F^2.)
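
As a sanity check (not from the talk; the names and random data are illustrative), the following NumPy sketch builds L = D - A from a symmetric weight matrix and verifies numerically that the degree-weighted objective equals \|LX\|_F^2.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - A for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
n, d = 6, 2
X = rng.normal(size=(n, d))          # rows are the vectors x_i
W = rng.random((n, n))
W = np.triu(W, 1); W = W + W.T       # random symmetric weights, zero diagonal

L = laplacian(W)
obj_direct = sum(np.linalg.norm(W[i].sum() * X[i] - W[i] @ X) ** 2 for i in range(n))
obj_laplacian = np.linalg.norm(L @ X, 'fro') ** 2
print(obj_direct, obj_laplacian)     # the two values agree
```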

Slide 26. Sparse solutions exist. Fix L X, i.e. the difference vectors
\Delta_i = d_i x_i - \sum_{j \sim i} w_{ij} x_j = \sum_{j \sim i} w_{ij} (x_i - x_j),
and fix the weighted degrees d_i = \sum_{j \sim i} w_{ij} (each is linear in w). This imposes only n(d+1) linear constraints on w. If w has more than n(d+1) non-zero entries, one of them can be driven to zero while holding all of these quantities fixed and keeping w non-negative.

Slide 27. Sparsity. Theorem: for n vectors in d dimensions, there is always an optimal graph with at most (d+1)n edges, i.e. average degree at most 2(d+1). Theorem (later): for n vectors in any dimension, there is an \epsilon-approximate solution with average degree O(1/\epsilon^2).

Slides 28-29. Degrees on real data. Table of the average degrees of the hard and soft graphs on real data sets (with n and dim for each): abalone (regression), glass (6 classes), heart (2 classes), housing (regression), ionosphere (2 classes), iris (3 classes), machine (regression), mpg (regression), pima (2 classes), sonar (2 classes), vehicle (4 classes), vowel (11 classes), wine (3 classes). (The numeric entries did not survive this transcription.)

Slide 30. An example for which the graph is not unique. Consider the set of vectors in {0,1}^d with even parity. A convex combination of any two optimal solutions is again optimal, so averaging a solution over all automorphisms of the point set gives an optimal solution in which every vertex has degree at least \binom{d}{2}. By the previous theorem, there is also an optimal solution with average degree at most 2(d+1). Once d is large enough, a solution cannot be both symmetric and sparse, so the optimal graph is not unique.

Slides 31-32. Planarity. For every set of vectors in two dimensions, there is a hard graph and an \alpha-soft graph that is planar (no crossings). Moreover, no point is contained in a triangle of the graph. In general dimension, the graph is a simplicial complex.

Slide 33. Planarity, proof. Among optimal graphs, one with maximum sum of weighted degrees has no crossings: given two crossing edges, decrease the weights on the crossing edges and increase the weights on the outside edges. This fixes L X, and thus \|LX\|_F^2, but increases all the weighted degrees.

Slide 34. Planarity, proof (the reweighting). Let the crossing edges be (x_0, x_1) and (y_0, y_1), crossing at the point z = \alpha_0 x_0 + \alpha_1 x_1 = \beta_0 y_0 + \beta_1 y_1 with \alpha_0 + \alpha_1 = 1 and \beta_0 + \beta_1 = 1. Decrease the weight of (x_0, x_1) by t\alpha_0\alpha_1 and of (y_0, y_1) by t\beta_0\beta_1, and increase the weight of each edge (x_a, y_b) by t\alpha_a\beta_b. This fixes L X, and thus \|LX\|_F^2, but increases all the weighted degrees.
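
For completeness, a short verification of that claim (my own working, following the slide's notation). The change in the Laplacian is
\Delta L = t\Big( \sum_{a,b \in \{0,1\}} \alpha_a \beta_b \, L_{x_a y_b} - \alpha_0 \alpha_1 \, L_{x_0 x_1} - \beta_0 \beta_1 \, L_{y_0 y_1} \Big), \qquad L_{uv} = (e_u - e_v)(e_u - e_v)^T.
The row of \Delta L \, X indexed by x_0 is
t\alpha_0 \big( \beta_0 (x_0 - y_0) + \beta_1 (x_0 - y_1) - \alpha_1 (x_0 - x_1) \big) = t\alpha_0 \big( x_0 - z - \alpha_1 x_0 + \alpha_1 x_1 \big) = t\alpha_0 \big( \alpha_0 x_0 + \alpha_1 x_1 - z \big) = 0,
and the rows for x_1, y_0, y_1 vanish the same way, so \Delta L \, X = 0 and L X is unchanged. Meanwhile the weighted degree of x_0 changes by t\alpha_0\beta_0 + t\alpha_0\beta_1 - t\alpha_0\alpha_1 = t\alpha_0^2 > 0, and similarly for the other three vertices, so every weighted degree increases.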

Slide 35. Classification and regression experiments. Given a set S of vectors with labels y_i \in \mathbb{R}, compute z minimizing
\sum_{i \sim j} w_{ij} (z_i - z_j)^2 = z^T L z
subject to z_i = y_i for i \in S [ZGL 03]. With 2 classes, use an indicator vector for one of them. With k classes, use k indicator vectors.
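
The constrained minimization has a closed form: partition L into labeled and unlabeled blocks and solve a linear system for the unlabeled entries. A minimal sketch (not the authors' code; the names are mine), assuming a Laplacian L, labeled indices, and their label values:

```python
import numpy as np

def harmonic_label_propagation(L, labeled_idx, y_labeled):
    """Minimize z^T L z subject to z[labeled_idx] = y_labeled.

    Zeroing the gradient on the unlabeled block gives the linear system
    L_UU z_U = -L_US z_S, where S are the labeled vertices.
    (Assumes every connected component contains a labeled vertex.)
    """
    n = L.shape[0]
    labeled = np.zeros(n, dtype=bool)
    labeled[labeled_idx] = True
    unlabeled = ~labeled

    z = np.zeros(n)
    z[labeled_idx] = y_labeled
    L_UU = L[np.ix_(unlabeled, unlabeled)]
    L_US = L[np.ix_(unlabeled, labeled)]
    z[unlabeled] = np.linalg.solve(L_UU, -L_US @ z[labeled])
    return z
```

With k classes, one would call this once per class indicator vector and assign each unlabeled point to the class with the largest interpolated value.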

Slide 36. Classification experiment protocol. We used 10-fold cross-validation; our method had no parameters to train. For the other ways of choosing graphs, we chose their parameters by gridding over choices and evaluating by leave-one-out cross-validation. Repeated 100 times.

Slides 37-38. Classification experiments. Table of error rates on the data sets glass, heart, ionosphere, iris, pima, sonar, vehicle, vowel, and wine (columns n, dim, number of classes), comparing the 0.1-soft graph against k-nearest-neighbor and distance-threshold graphs, and against libsvm [Chang, Lin]. For the kNN and threshold graphs we tried unweighted and exponential weights, varying \sigma, and varied k for kNN and \delta for the threshold. For libsvm, parameters were chosen by 10-fold cross-validation on the training data.

Slide 39. Regression error. Table of regression errors on abalone, housing, machine, and mpg (columns n, dim), comparing the hard graph, the 0.1-soft graph, kNN and threshold graphs, and epsilon-SVR; on abalone the hard graph took too long to compute. Data were normalized to have variance 1; 2-fold cross-validation on abalone, 10-fold otherwise.

Slides 40-41. Clustering experiments (0.1-soft graph). Tables of clustering error on glass, heart, ionosphere, iris, pima, sonar, vehicle, vowel, and wine (columns n, dim, k), comparing k-means with NJW spectral clustering using Nvec = k eigenvectors and with Nvec chosen by the heuristic on the next slide (chosen Nvec per data set: glass 12, heart 2, ionosphere 15, iris 8, pima 12, sonar 35, vehicle 6, vowel 30, wine 3). NJW: Ng, Jordan, Weiss, as implemented in Spider. We use our graph and run k-means in Nvec eigenvectors.
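
A minimal sketch of the "run k-means in Nvec eigenvectors" step (not the authors' code; it uses a plain unnormalized Laplacian embedding rather than the exact NJW normalization, and the names are mine):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(W, k, n_vec):
    """Cluster the vertices of a weighted graph W into k groups.

    Embeds each vertex using the eigenvectors of L = D - W with the
    n_vec smallest nonzero eigenvalues, then runs k-means on the
    embedded points.
    """
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    embedding = eigvecs[:, 1:n_vec + 1]       # skip the constant eigenvector
    _, labels = kmeans2(embedding, k, minit='points')
    return labels
```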

Slide 42. Choosing the number of vectors for the spectral projection. Project X onto the Laplacian eigenvectors and take vectors until all of the remaining coefficients are small. (Still fuzzy.) [The slide plots the projection coefficients against the vector number, ordered by \lambda.]
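
A small sketch of that heuristic (my own reading of the slide, not the authors' code): compute the coefficients of each coordinate of X in the Laplacian eigenbasis and look at how their magnitudes decay.

```python
import numpy as np

def spectral_projection_coefficients(W, X):
    """Coefficients of the columns of X in the eigenbasis of L = D - W.

    Returns an (n, d) array whose m-th row holds the coefficients of the
    d coordinate vectors on the m-th eigenvector (eigenvalues ascending).
    """
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs.T @ X
```

Inspecting np.linalg.norm(spectral_projection_coefficients(W, X), axis=1) gives one magnitude per eigenvector, to be scanned for the point past which all coefficients are small.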

Slide 43. Quadratic program for the edge weights. Write L = U W U^T, where U is the signed vertex-edge incidence matrix (one +1/-1 pair per edge, of size |V| \times |E|) and W is the diagonal matrix of edge weights.

Slide 44. The objective function is quadratic in the edge weights:
\| L X \|_F^2 = \sum_{k=1}^d \| L x^k \|^2 = \sum_{k=1}^d \| U W U^T x^k \|^2 = \sum_{k=1}^d \| U W y^k \|^2 = \sum_{k=1}^d \| U Y^{(k)} w \|^2 = \left\| \begin{pmatrix} U Y^{(1)} \\ \vdots \\ U Y^{(d)} \end{pmatrix} w \right\|^2,
where x^k is the k-th coordinate vector (k-th column of X), w is the vector of edge weights, y^k = U^T x^k, and Y^{(k)} = \mathrm{diag}(y^k).
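
A sketch of that construction in NumPy (illustrative names, not the authors' code), building the signed incidence matrix U, the matrices Y^{(k)}, and the stacked matrix M, and checking that \|Mw\|^2 = \|LX\|_F^2 for a random w:

```python
import numpy as np

def incidence_matrix(edges, n):
    """Signed vertex-edge incidence matrix U (n x m), one +1/-1 pair per edge."""
    U = np.zeros((n, len(edges)))
    for e, (i, j) in enumerate(edges):
        U[i, e], U[j, e] = 1.0, -1.0
    return U

def stacked_objective_matrix(U, X):
    """M = [U Y^(1); ...; U Y^(d)] with Y^(k) = diag(U^T x^k)."""
    return np.vstack([U @ np.diag(U.T @ X[:, k]) for k in range(X.shape[1])])

rng = np.random.default_rng(1)
n, d = 5, 2
X = rng.normal(size=(n, d))
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 4)]
U = incidence_matrix(edges, n)
w = rng.random(len(edges))

M = stacked_objective_matrix(U, X)
L = U @ np.diag(w) @ U.T
print(np.linalg.norm(M @ w) ** 2, np.linalg.norm(L @ X, 'fro') ** 2)  # equal
```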

Slides 45-46. Soft graph program. Minimize \|Mw\|^2 subject to
\sum_{i=1}^n \max(1 - d_i, 0)^2 \le \alpha n and w \ge 0,
where M is the stacked matrix \begin{pmatrix} U Y^{(1)} \\ \vdots \\ U Y^{(d)} \end{pmatrix} and d = |U| w is the vector of weighted degrees (|U| is the entrywise absolute value of U). In practice, minimize the penalized objective
\|Mw\|^2 + \mu \sum_{i=1}^n \max(1 - d_i, 0)^2,
searching over \mu until \sum_{i=1}^n \max(1 - d_i, 0)^2 = \alpha n.

Slide 47. Soft graph program as non-negative least squares. Introducing a slack vector s \ge 0,
minimize \|Mw\|^2 + \mu \sum_{i=1}^n \max(1 - d_i, 0)^2
becomes
minimize \|Mw\|^2 + \mu \|1 + s - |U| w\|^2 over w, s \ge 0,
a non-negative least squares problem.
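
One way to pose this for an off-the-shelf solver (a minimal sketch under my own formulation, not the authors' implementation) is to stack the two residuals into a single least-squares system in the variable z = (w, s) and hand it to scipy.optimize.nnls:

```python
import numpy as np
from scipy.optimize import nnls

def soft_graph_weights(M, U_abs, mu):
    """Solve  min_{w,s >= 0}  ||M w||^2 + mu * ||1 + s - U_abs w||^2.

    M     : stacked (n*d x m) objective matrix
    U_abs : (n x m) entrywise absolute value of the incidence matrix,
            so U_abs @ w is the vector of weighted degrees
    mu    : penalty on the degree shortfall
    Stacks both residuals into one system  A z ~ b  with z = [w; s] >= 0.
    """
    nd, m = M.shape
    n = U_abs.shape[0]
    r = np.sqrt(mu)
    A = np.block([[M,          np.zeros((nd, n))],
                  [r * U_abs, -r * np.eye(n)]])
    b = np.concatenate([np.zeros(nd), r * np.ones(n)])
    z, _ = nnls(A, b)
    return z[:m], z[m:]          # edge weights w and slacks s
```

One then adjusts \mu (for example by bisection) until the degree-shortfall term equals \alpha n, as on the previous slide.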

Slide 48. Computing the soft graph quickly. Look for a solution supported on a subset of the edges: solve the problem restricted to that subset, then check the gradient of the objective function with respect to the unused edges. If none of these gradients is negative, we are finished; otherwise, include some of the edges with negative gradient and solve again.
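
A rough sketch of that loop (my own reconstruction of the idea, not the authors' code), reusing soft_graph_weights from above and starting from an arbitrary non-empty initial edge subset:

```python
import numpy as np

def soft_graph_active_set(M, U_abs, mu, initial_edges, tol=1e-8, batch=10):
    """Grow an edge subset until no unused edge has a negative gradient.

    The objective is f(w) = ||M w||^2 + mu * ||max(1 - U_abs w, 0)||^2;
    its gradient is 2 M^T M w - 2 mu U_abs^T max(1 - U_abs w, 0).
    An unused edge with negative gradient can decrease f if added.
    """
    m = M.shape[1]
    active = sorted(set(initial_edges))
    while True:
        w_sub, _ = soft_graph_weights(M[:, active], U_abs[:, active], mu)
        w = np.zeros(m)
        w[active] = w_sub
        shortfall = np.maximum(1.0 - U_abs @ w, 0.0)
        grad = 2.0 * (M.T @ (M @ w)) - 2.0 * mu * (U_abs.T @ shortfall)
        unused = np.setdiff1d(np.arange(m), active)
        violating = unused[grad[unused] < -tol]
        if violating.size == 0:
            return w
        # add the most promising unused edges and re-solve
        worst = violating[np.argsort(grad[violating])[:batch]]
        active = sorted(set(active) | set(worst))
```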

Slide 49. Computation time (seconds). Table of the time to compute the soft and hard graphs for each data set (abalone, glass, heart, housing, ionosphere, iris, machine, mpg, pima, sonar, vehicle, vowel, wine, with n and dim).

Slides 50-51. Approximate sparse solutions. For every graph G there exists a sparse graph \tilde{G}, with average degree at most 16/\epsilon^2, so that for all x,
| x^T L^2 x - x^T \tilde{L}^2 x | \le \epsilon \, (x^T L x) \, \|L\|.
If \|LX\|_F^2 is not too small, then \|\tilde{L}X\|_F^2 \approx \|LX\|_F^2, so the leave-one-out regression scores are similar. What we would really want is
| x^T L^2 x - x^T \tilde{L}^2 x | \le \epsilon \, x^T L^2 x.

Slides 52-53. Sparsification theorem [Batson, Spielman, Srivastava 09]. For every graph G there exists a sparse graph \tilde{G} with at most 4n/\epsilon^2 edges such that for all x,
(1 - \epsilon) \, x^T \tilde{L} x \le x^T L x \le (1 + \epsilon) \, x^T \tilde{L} x.
Earlier constructions (edges / time):
[Spielman, Srivastava 08]: O(n \log n / \epsilon^2) edges, time dominated by O(\log n / \epsilon^2) linear-system solves;
[Spielman, Teng 04]: O(n \log^{29} n / \epsilon^2) edges, time O(m \log^{17} n / \epsilon^2).
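
A toy sketch of the sampling approach behind [Spielman, Srivastava 08] (my own illustrative version: it computes exact effective resistances with a dense pseudoinverse rather than the fast approximation used in that paper):

```python
import numpy as np

def sparsify_by_effective_resistance(edges, w, n, q, rng=None):
    """Spectral sparsification by sampling edges with probability ~ w_e * R_e.

    edges : list of (i, j) pairs,  w : edge weights,  n : number of vertices,
    q     : number of samples to draw (on the order of n log n / eps^2).
    Returns new edge weights w_tilde on the same edge list (mostly zeros).
    """
    rng = np.random.default_rng(rng)
    L = np.zeros((n, n))
    for (i, j), we in zip(edges, w):
        L[i, i] += we; L[j, j] += we
        L[i, j] -= we; L[j, i] -= we
    Lpinv = np.linalg.pinv(L)
    # effective resistance R_e = (e_i - e_j)^T L^+ (e_i - e_j)
    R = np.array([Lpinv[i, i] + Lpinv[j, j] - 2 * Lpinv[i, j]
                  for (i, j) in edges])
    p = w * R
    p = p / p.sum()                        # sampling probabilities
    w_tilde = np.zeros(len(edges))
    for e in rng.choice(len(edges), size=q, p=p):
        w_tilde[e] += w[e] / (q * p[e])    # reweight so E[L_tilde] = L
    return w_tilde
```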

Slide 54. Lovász's graphs (Steinitz representations and the Colin de Verdière number). Form a matrix M = P - A, where P is a positive diagonal matrix and A is non-negative (an adjacency matrix), such that M X = 0 (in fact, null(M) is the column span of X) and M has one negative eigenvalue. This works when the points are the vertices of a convex polytope; A is non-zero only on the edges of the polytope.

Slide 55. Open questions.
- Are there other natural and useful graphs?
- Are there interesting ways to modify or parameterize our construction?
- Are our graphs connected?
- Are our graphs unique (under perturbation)?
- Do there exist sparse approximate solutions in all dimensions?
- Can one sparsify for the square of the Laplacian?
- How should one interpolate off the graph, or add points?

More information