Fitting a Graph to Vector Data. Samuel I. Daitch (Yale) Jonathan A. Kelner (MIT) Daniel A. Spielman (Yale)
1 Fitting a Graph to Vector Data Samuel I. Daitch (Yale) Jonathan A. Kelner (MIT) Daniel A. Spielman (Yale) Machine Learning Summer School, June 2009
2 Given a collection of vectors $x_1, \dots, x_n \in \mathbb{R}^d$, find a natural weighted graph on $V = \{1, \dots, n\}$ that is: 1. theoretically well motivated, 2. helps solve ML problems on the vectors, 3. has interesting combinatorial properties, 4. efficiently computable, 5. sparse.
3 Given a collection of vectors $x_1, \dots, x_n \in \mathbb{R}^d$, find a natural weighted graph on $V = \{1, \dots, n\}$ that is: 1. theoretically well motivated, 2. helps solve ML problems on the vectors (and TCS problems on the vectors), 3. has interesting combinatorial properties, 4. efficiently computable, 5. sparse.
4 Outline: standard graphs; motivation for ours; Laplacian view; combinatorial properties (sparsity and planarity); use in classification, regression and clustering; efficient computation; approximate sparsity; Lovász's graphs; open questions.
5 Standard ways to make graphs. Choose one from each column. Choice of edges: $k$ nearest neighbors, or neighbors if distance is less than a threshold $\delta$. Weights of edges: unweighted, or $\exp(-\|x_i - x_j\|^2 / 2\sigma^2)$.
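As a concrete illustration of these standard constructions (my own sketch, not code from the talk; the choices of k and sigma are placeholders), here is a symmetrized k-nearest-neighbor graph with Gaussian edge weights built with numpy:

```python
import numpy as np

def knn_gaussian_graph(X, k=5, sigma=1.0):
    """Symmetrized k-NN graph with weights exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X : (n, d) array of data points.
    Returns an (n, n) symmetric weighted adjacency matrix.
    """
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    D2 = np.maximum(D2, 0.0)                         # guard against tiny negatives
    np.fill_diagonal(D2, np.inf)                     # no self-loops
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[:k]                 # indices of the k nearest neighbors
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2.0 * sigma**2))
    return np.maximum(W, W.T)                        # keep an edge if either endpoint chose it
```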
6 Our first proposal. Choose weights to minimize $\sum_{i=1}^{n} \big\| d_i x_i - \sum_{j \sim i} w_{i,j} x_j \big\|^2$ with $w_{i,j} = w_{j,i} \ge 0$, where the weighted degrees satisfy $d_i = \sum_{j \sim i} w_{i,j} \ge 1$.
7 Our first proposal (with an example graph in the figure; edge weights such as 0.67 are shown). Choose weights to minimize $\sum_{i=1}^{n} \big\| d_i x_i - \sum_{j \sim i} w_{i,j} x_j \big\|^2$ with $w_{i,j} = w_{j,i}$, where the weighted degrees satisfy $d_i = \sum_{j \sim i} w_{i,j} \ge 1$.
8 Motivation. Consider the regression problem of solving for $y_i$ given $\{y_j : j \ne i\}$, where $y_i = f(x_i)$. Given a weighted graph, a natural guess is $\frac{1}{d_i} \sum_{j \sim i} w_{i,j} y_j$. The sum of squares of errors when we leave out each $i$ is $\sum_{i=1}^{n} \big( y_i - \frac{1}{d_i} \sum_{j \sim i} w_{i,j} y_j \big)^2$.
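A small sketch of this leave-one-out score (my own illustration, assuming a dense weight matrix): each $y_i$ is predicted by the degree-normalized weighted average of its neighbors' labels.

```python
import numpy as np

def leave_one_out_error(W, y):
    """Sum of squared leave-one-out errors: sum_i (y_i - (1/d_i) sum_j w_ij y_j)^2.

    W : (n, n) symmetric weighted adjacency matrix (zero diagonal).
    y : (n,) vector of labels / regression targets.
    """
    d = W.sum(axis=1)          # weighted degrees d_i
    pred = (W @ y) / d         # neighbor-averaged prediction for each i
    return np.sum((y - pred) ** 2)
```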
9 Motivation. Try to get this right on the coordinate vectors: write $x_i = (x_i^1, \dots, x_i^d)$. The sum of squares of errors over the $d$ coordinates is $\sum_{k=1}^{d} \sum_{i=1}^{n} \big( x_i^k - \frac{1}{d_i} \sum_{j \sim i} w_{i,j} x_j^k \big)^2 = \sum_{i=1}^{n} \big\| x_i - \frac{1}{d_i} \sum_{j \sim i} w_{i,j} x_j \big\|^2$.
10 Motivation (continued). Choose edge weights to minimize this quantity.
11 Motivation (continued). Choose edge weights to minimize this quantity. But this leads to a non-convex problem, and, as the next example shows, to degenerate solutions.
12-15 Motivation, tricky example. Consider minimizing $\sum_{i=1}^{n} \big\| x_i - \frac{1}{d_i} \sum_{j \sim i} w_{i,j} x_j \big\|^2$ (the example point set is built up in the accompanying figures).
16 Motivation, tricky example. Consider minimizing $\sum_{i=1}^{n} \big\| x_i - \frac{1}{d_i} \sum_{j \sim i} w_{i,j} x_j \big\|^2$. The figure shows a graph whose edge weights are either $1$ or $\varepsilon$: the objective just gets better as $\varepsilon$ goes to $0$.
17 Motivation, fixing the bad example. Scale the $i$-th error by the weighted degree $d_i$: instead of $\sum_{i=1}^{n} \big\| x_i - \frac{1}{d_i} \sum_{j \sim i} w_{i,j} x_j \big\|^2$, sum $\sum_{i=1}^{n} \big\| d_i x_i - \sum_{j \sim i} w_{i,j} x_j \big\|^2$.
18 Motivation, fixing the bad example. Sum of squares of errors, scaled by the weighted degrees: $\sum_{i=1}^{n} \big\| d_i x_i - \sum_{j \sim i} w_{i,j} x_j \big\|^2$. To avoid degeneracy, require either $d_i = \sum_{j \sim i} w_{i,j} \ge 1$ for every $i$ (a hard graph), or $\sum_{i : d_i < 1} (1 - d_i)^2 \le \alpha n$ (an $\alpha$-soft graph).
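The following sketch (my own illustration, assuming dense numpy arrays) evaluates the degree-weighted objective and the two non-degeneracy conditions for a candidate weight matrix:

```python
import numpy as np

def fitting_objective(W, X):
    """Degree-weighted objective  sum_i || d_i x_i - sum_j w_ij x_j ||^2.

    W : (n, n) symmetric non-negative weight matrix with zero diagonal.
    X : (n, d) matrix whose rows are the data vectors.
    """
    d = W.sum(axis=1)                       # weighted degrees
    residual = d[:, None] * X - W @ X       # row i: d_i x_i - sum_j w_ij x_j
    return np.sum(residual ** 2)

def is_hard_graph(W, tol=1e-9):
    """Hard constraint: every weighted degree is at least 1."""
    return np.all(W.sum(axis=1) >= 1.0 - tol)

def is_alpha_soft_graph(W, alpha=0.1):
    """alpha-soft constraint: sum over deficient vertices of (1 - d_i)^2 is at most alpha * n."""
    d = W.sum(axis=1)
    deficiency = np.maximum(1.0 - d, 0.0)
    return np.sum(deficiency ** 2) <= alpha * len(d)
```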
19 Example: random points in 2 dimensions (0.1-soft graph).
20 Example: two clusters (0.1-soft graph).
21 Example (0.1-soft graph).
22 Unique? Not necessarily: not unique on highly symmetric point sets (we will see why later). But almost always unique. Conjecture: unique with probability 1 after an infinitesimally small random perturbation.
23 Related to: Locally linear embeddings [Roweis-Saul], which choose allowable neighbors and then choose asymmetric weights to minimize the same objective function; and k-means: if we restrict the graph to be a union of complete graphs, one for each cluster, we are minimizing the same objective function.
24 In terms of the graph Laplacian: $L = D - A$, where $A$ is the weighted adjacency matrix and $D$ is the diagonal matrix of weighted degrees.
25 In terms of the graph Laplacian: $L = D - A$, where $A$ is the weighted adjacency matrix and $D$ is the diagonal matrix of weighted degrees. Then $\sum_{i=1}^{n} \big\| d_i x_i - \sum_{j \sim i} w_{i,j} x_j \big\|^2 = \| L X \|_F^2$. Since $L$ is linear in the edge weights, $\| L X \|_F^2$ is quadratic in them. (The objective we rejected was $\| D^{-1} L X \|_F^2$.)
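As a quick numerical check (my own sketch, not from the talk), the Laplacian form of the objective can be compared against the direct sum-of-residuals form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 6, 2
X = rng.normal(size=(n, dim))

# random symmetric non-negative weights with zero diagonal
W = rng.uniform(size=(n, n))
W = np.triu(W, 1)
W = W + W.T

D = np.diag(W.sum(axis=1))      # diagonal matrix of weighted degrees
L = D - W                       # graph Laplacian

direct = np.sum((np.diag(D)[:, None] * X - W @ X) ** 2)
laplacian_form = np.linalg.norm(L @ X, 'fro') ** 2
assert np.isclose(direct, laplacian_form)
```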
26 Sparse solution. We can fix $LX$, i.e. the difference vectors $\Delta_i = d_i x_i - \sum_{j \sim i} w_{i,j} x_j = \sum_{j \sim i} w_{i,j} (x_i - x_j)$, and the weighted degrees $d = |U|\, w$ (with $U$ the signed vertex-edge incidence matrix defined below). This imposes only $n(d+1)$ constraints on $w$. If $w$ has more than $n(d+1)$ non-zero entries, we can make one of them zero while holding all of these quantities fixed and keeping $w$ non-negative.
27 Sparsity. Theorem: for $n$ vectors in $d$ dimensions, there is always such a graph with at most $(d+1)n$ edges, i.e. average degree $2(d+1)$. Theorem (later): for $n$ vectors in any dimension, there is an $\varepsilon$-approximate solution with average degree $O(1/\varepsilon^2)$.
28-29 Degrees on real data. Table over data sets (abalone: regression; glass: 6 classes; heart: 2 classes; housing: regression; ionosphere: 2 classes; iris: 3 classes; machine: regression; mpg: regression; pima: 2 classes; sonar: 2 classes; vehicle: 4 classes; vowel: 11 classes; wine: 3 classes) with columns n, dim, average degree (hard), and average degree (soft); the numeric entries were not transcribed.
30 Example for which graphs are not unique. Consider the set of vectors in $\{0,1\}^d$ of even parity. The sum of any two solutions is a solution, so sum over all isomorphisms of the point set: every vertex then gets degree at least $\binom{d}{2}$. By the previous theorem, there is a solution of average degree at most $2(d+1)$. So we cannot have both symmetry and sparsity.
31 Planarity. For every set of vectors in 2 dimensions, there is a hard and an $\alpha$-soft graph that is planar: no crossings.
32 Planarity. For every set of vectors in 2 dimensions, there is a hard and an $\alpha$-soft graph that is planar: no crossings, and no point is contained in a triangle. In general dimension, the graph forms a simplicial complex.
33 Planarity, proof. A graph maximizing the sum of weighted degrees has no crossings: decrease the weights on a pair of crossing edges and increase the weights on edges outside the crossing. This can be done so as to fix $LX$, and thus $\|LX\|_F^2$, while increasing all the weighted degrees.
34 Planarity, proof. Suppose edges $(x_0, x_1)$ and $(y_0, y_1)$ cross at a point $z$, with $z = \alpha_0 x_0 + \alpha_1 x_1 = \beta_0 y_0 + \beta_1 y_1$, $\alpha_0 + \alpha_1 = 1$, $\beta_0 + \beta_1 = 1$. Decrease the weight of $(x_0, x_1)$ by $t\,\alpha_0 \alpha_1$ and of $(y_0, y_1)$ by $t\,\beta_0 \beta_1$, and increase the weight of each non-crossing edge $(x_a, y_b)$ by $t\,\alpha_a \beta_b$, as in the figure. This fixes $LX$, and thus $\|LX\|_F^2$, while increasing all the weighted degrees.
35 Classification and regression experiments. Given a set $S$ of vectors with labels $y_i \in \mathbb{R}$, compute $z$ minimizing $\sum_{i \sim j} w_{i,j} (z_i - z_j)^2 = z^T L z$ subject to $z_i = y_i$ for $i \in S$ [ZGL 03]. With 2 classes, use an indicator vector for one class; with $k$ classes, use $k$ indicator vectors.
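The constrained minimization of $z^T L z$ has a closed form: the unlabeled entries satisfy a linear system in the Laplacian. A minimal sketch of this harmonic interpolation (my own illustration, assuming a dense Laplacian and a single indicator vector of labels):

```python
import numpy as np

def harmonic_interpolation(L, labeled_idx, y_labeled):
    """Minimize z^T L z subject to z_i = y_i on the labeled set.

    L           : (n, n) graph Laplacian.
    labeled_idx : indices of labeled vertices.
    y_labeled   : their labels (e.g. an indicator vector of one class).
    Returns the full vector z; threshold it (e.g. at 0.5) for classification.
    """
    n = L.shape[0]
    labeled = np.zeros(n, dtype=bool)
    labeled[labeled_idx] = True
    unlabeled = ~labeled

    # Setting the gradient of z^T L z to zero on the unlabeled block gives
    #   L_UU z_U = -L_US y_S
    L_UU = L[np.ix_(unlabeled, unlabeled)]
    L_US = L[np.ix_(unlabeled, labeled)]
    z = np.zeros(n)
    z[labeled_idx] = y_labeled
    z[unlabeled] = np.linalg.solve(L_UU, -L_US @ np.asarray(y_labeled, dtype=float))
    return z
```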
36 Classification experiments. Used 10-fold cross-validation. We had no parameters to train. For other means of choosing graphs, we chose their parameters by gridding over choices and evaluating by leave-one-out cross-validation. Repeated 100 times.
37 Classification experiments. Table over data sets (glass, heart, ionosphere, iris, pima, sonar, vehicle, vowel, wine) with columns n, dim, number of classes, and results for the 0.1-soft, k-NN, and threshold graphs; the numeric entries were not transcribed. For the k-NN and threshold graphs we tried unweighted and exponential weights (varying $\sigma$), and varied $k$ for k-NN and $\delta$ for the threshold.
38 Classification experiments. The same table with an additional libsvm column (data sets glass, heart, ionosphere, iris, pima, sonar, vehicle, vowel, wine; columns n, dim, classes, 0.1-soft, k-NN, threshold, libsvm; numeric entries not transcribed). For libsvm [Chang-Lin], parameters were chosen by 10-fold cross-validation on the training data.
39 Regression error. Table over data sets abalone, housing, machine, and mpg with columns n, dim and results for the hard, 0.1-soft, k-NN, and threshold graphs and epsilon-SVR (numeric entries not transcribed; the hard graph took too long on abalone). Data normalized to have variance 1; 2-fold cross-validation on abalone, 10-fold otherwise.
40 Clustering experiments (0.1-soft graph). Table over data sets glass, heart, ionosphere, iris, pima, sonar, vehicle, vowel, wine with columns n, dim, k, and results for k-means and for NJW with Nvec = k (numeric entries not transcribed). NJW: Ng-Jordan-Weiss, as implemented in Spider. We use our graph and run k-means in Nvec eigenvectors.
41 Clustering experiments (0.1-soft graph). The same table with an additional column in which Nvec is chosen per data set: glass (12), heart (2; 0.19), ionosphere (15), iris (8), pima (12), sonar (35), vehicle (6), vowel (30), wine (3); other numeric entries not transcribed. NJW: Ng-Jordan-Weiss, as implemented in Spider. We use our graph and run k-means in Nvec eigenvectors.
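A minimal sketch of this clustering step (my own, not the Spider implementation, and using the unnormalized Laplacian): take the Nvec eigenvectors with smallest eigenvalues of the fitted graph's Laplacian and run k-means on the rows, in the spirit of Ng-Jordan-Weiss.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clusters(W, k, n_vec):
    """Cluster the vertices of a weighted graph.

    W     : (n, n) symmetric weighted adjacency matrix.
    k     : number of clusters.
    n_vec : number of Laplacian eigenvectors to use (Nvec above).
    """
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalues
    embedding = eigvecs[:, :n_vec]           # each vertex becomes a point in R^{n_vec}
    _, labels = kmeans2(embedding, k, minit='++')
    return labels
```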
42 Choosing the number of vectors for the spectral projection. (The figure plots, for each eigenvector ordered by eigenvalue $\lambda$, the size of its coefficient in $X$.) Project $X$ onto the eigenvectors; take vectors until all remaining coefficients are small. (Still fuzzy.)
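A sketch of my reading of this heuristic (the tolerance is an assumption): expand the columns of X in the Laplacian eigenbasis and keep eigenvectors until every later coefficient is small.

```python
import numpy as np

def spectral_coefficients(W, X):
    """Norm of X's coefficients on each Laplacian eigenvector, ordered by eigenvalue."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)      # ascending eigenvalues
    coeffs = eigvecs.T @ X                    # row r = coefficients of X on eigenvector r
    return eigvals, np.linalg.norm(coeffs, axis=1)

def choose_n_vec(W, X, tol=0.05):
    """Heuristic Nvec: smallest prefix after which every coefficient is below tol * max."""
    _, c = spectral_coefficients(W, X)
    big = np.nonzero(c > tol * c.max())[0]
    return big.max() + 1 if big.size else 1
```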
43 Quadratic program for edge weights. $L = U W U^T$, where $U$ is the signed vertex-edge incidence matrix ($|V|$ rows, $|E|$ columns, entries $\pm 1$) and $W$ is the diagonal matrix of edge weights.
44 The objective function is quadratic in the edge weights: $\|LX\|_F^2 = \sum_{k=1}^{d} \|L x^k\|^2 = \sum_{k=1}^{d} \|U W U^T x^k\|^2 = \sum_{k=1}^{d} \|U W y^k\|^2 = \sum_{k=1}^{d} \|U Y^{(k)} w\|^2 = \Big\| \begin{pmatrix} U Y^{(1)} \\ \vdots \\ U Y^{(d)} \end{pmatrix} w \Big\|^2$, where $w$ is the vector of edge weights, $y^k = U^T x^k$, and $Y^{(k)} = \mathrm{diag}(y^k)$.
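A sketch (my own, dense numpy, small graphs only) of how this quadratic form can be assembled: build the signed incidence matrix $U$, form the blocks $U Y^{(k)}$, and stack them into a matrix $M$ with $\|Mw\|^2 = \|LX\|_F^2$.

```python
import numpy as np

def incidence_matrix(edges, n):
    """Signed vertex-edge incidence matrix U: column for edge (i, j) has +1 at i and -1 at j."""
    U = np.zeros((n, len(edges)))
    for e, (i, j) in enumerate(edges):
        U[i, e] = 1.0
        U[j, e] = -1.0
    return U

def objective_matrix(edges, X):
    """Stack the blocks U Y^{(k)}, k = 1..d, so that ||M w||^2 = ||L X||_F^2."""
    n, d = X.shape
    U = incidence_matrix(edges, n)
    Y = U.T @ X                                    # column k is y^k = U^T x^k
    blocks = [U @ np.diag(Y[:, k]) for k in range(d)]
    return np.vstack(blocks), U
```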
45 Soft graph program. Minimize $\|Mw\|^2$ subject to $\sum_{i=1}^{n} \max(1 - d_i, 0)^2 \le \alpha n$ and $w \ge 0$, where $M = \begin{pmatrix} U Y^{(1)} \\ \vdots \\ U Y^{(d)} \end{pmatrix}$ and $d = |U|\, w$.
46 Soft graph program. Minimize $\|Mw\|^2$ subject to $\sum_{i=1}^{n} \max(1 - d_i, 0)^2 \le \alpha n$ and $w \ge 0$, where $M = \begin{pmatrix} U Y^{(1)} \\ \vdots \\ U Y^{(d)} \end{pmatrix}$ and $d = |U|\, w$. In practice, minimize $\|Mw\|^2 + \mu \sum_{i=1}^{n} \max(1 - d_i, 0)^2$, searching over $\mu$ until $\sum_{i=1}^{n} \max(1 - d_i, 0)^2 = \alpha n$.
47 Soft graph program, as a non-negative least squares problem. Minimizing $\|Mw\|^2 + \mu \sum_{i=1}^{n} \max(1 - d_i, 0)^2$ is equivalent to minimizing $\|Mw\|^2 + \mu\, \| \mathbf{1} + s - |U|\, w \|^2$ over $w, s \ge 0$: a non-negative least squares problem.
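A sketch of this NNLS formulation using scipy (my own stacking of the variables; the penalty weight mu and the candidate edge set are assumptions): solve for the concatenated vector [w; s] >= 0 in a single non-negative least squares call.

```python
import numpy as np
from scipy.optimize import nnls

def soft_graph_weights(M, U_abs, mu):
    """Minimize ||M w||^2 + mu * ||1 + s - |U| w||^2 over w, s >= 0 via NNLS.

    M     : stacked (n*d, m) objective matrix with ||M w||^2 = ||L X||_F^2.
    U_abs : (n, m) unsigned incidence matrix, so weighted degrees are d = U_abs @ w.
    mu    : penalty weight on degree deficiencies.
    """
    n, m = U_abs.shape
    root_mu = np.sqrt(mu)
    # variables v = [w (m entries); s (n entries)]; residuals:
    #   top block:    M w
    #   bottom block: sqrt(mu) * (|U| w - s - 1)
    A = np.block([
        [M,               np.zeros((M.shape[0], n))],
        [root_mu * U_abs, -root_mu * np.eye(n)],
    ])
    b = np.concatenate([np.zeros(M.shape[0]), root_mu * np.ones(n)])
    v, _ = nnls(A, b)
    return v[:m]                 # the edge weights w
```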
48 Computing the soft graph quickly. Minimize $\|Mw\|^2 + \mu\, \| \mathbf{1} + s - |U|\, w \|^2$ over $w, s \ge 0$ by looking for a solution among a subset of the candidate edges. Given the solution restricted to the current subset, check the gradient of the objective function with respect to the unused edges: if none is negative, we are finished; otherwise, include some of them and solve again.
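A deliberately naive sketch of that column-generation loop (my own; solve_subset is a hypothetical helper returning the restricted NNLS solution): the gradient of the objective with respect to an unused edge weight at zero tells us whether adding that edge can lower the objective.

```python
import numpy as np

def grow_edge_set(M_full, U_abs_full, mu, solve_subset, start_edges, batch=20):
    """Active-set loop: solve on a subset of edges, then add edges with negative gradient.

    M_full, U_abs_full : objective and unsigned incidence matrices over ALL candidate edges.
    solve_subset       : function(edge_indices) -> (w_subset, s) solving the restricted NNLS.
    """
    active = list(start_edges)
    while True:
        w_sub, s = solve_subset(active)
        w = np.zeros(M_full.shape[1])
        w[active] = w_sub
        # gradient of ||M w||^2 + mu ||1 + s - |U| w||^2 with respect to each edge weight
        grad = 2.0 * M_full.T @ (M_full @ w) \
             - 2.0 * mu * U_abs_full.T @ (1.0 + s - U_abs_full @ w)
        unused = np.setdiff1d(np.arange(M_full.shape[1]), active)
        violating = unused[grad[unused] < 0]
        if violating.size == 0:
            return w             # no unused edge can improve the objective
        active.extend(violating[np.argsort(grad[violating])][:batch])
```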
49 Computation time (seconds). Table over data sets (abalone: regression; glass: 6 classes; heart: 2 classes; housing: regression; ionosphere: 2 classes; iris: 3 classes; machine: regression; mpg: regression; pima: 2 classes; sonar: 2 classes; vehicle: 4 classes; vowel: 11 classes; wine: 3 classes) with columns n, dim, soft-graph time, and hard-graph time; the numeric entries were not transcribed.
50 Approximate sparse solutions. For every graph $G$ there exists a sparse graph $\tilde{G}$ with average degree at most $16/\varepsilon^2$ such that for all $x$, $\big| x^T L^2 x - x^T \tilde{L}^2 x \big| \le \varepsilon\, (x^T L x)\, \|L\|$. If $\|LX\|_F^2$ is not too small, this gives $\|LX\|_F^2 \approx \|\tilde{L} X\|_F^2$, and the leave-one-out regression scores are similar.
51 Approximate sparse solutions. For every graph $G$ there exists a sparse graph $\tilde{G}$ with average degree at most $16/\varepsilon^2$ such that for all $x$, $\big| x^T L^2 x - x^T \tilde{L}^2 x \big| \le \varepsilon\, (x^T L x)\, \|L\|$. What we would want is $\big| x^T L^2 x - x^T \tilde{L}^2 x \big| \le \varepsilon\, x^T L^2 x$.
52 Sparsification theorem [Batson-Spielman-Srivastava 09]. For every graph $G$ there exists a sparse graph $\tilde{G}$ with at most $4n/\varepsilon^2$ edges such that for all $x$, $(1-\varepsilon)\, x^T \tilde{L} x \le x^T L x \le (1+\varepsilon)\, x^T \tilde{L} x$.
53 Sparsification theorem [Batson-Spielman-Srivastava 09]. For every graph $G$ there exists a sparse graph $\tilde{G}$ with at most $4n/\varepsilon^2$ edges such that for all $x$, $(1-\varepsilon)\, x^T \tilde{L} x \le x^T L x \le (1+\varepsilon)\, x^T \tilde{L} x$. Comparison of edges and time: [Spielman-Srivastava 08] gives $O(n \log n/\varepsilon^2)$ edges in roughly $\log n/\varepsilon^2$ linear-system solves; [Spielman-Teng 04] gives $O(n \log^{29} n/\varepsilon^2)$ edges in $O(m \log^{17} n/\varepsilon^2)$ time.
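For intuition, here is a sketch (my own, using a dense pseudoinverse rather than the fast solvers of [Spielman-Srivastava 08]) of sparsification by effective-resistance sampling: sample each edge with probability proportional to its weight times its effective resistance, and reweight the sampled edges.

```python
import numpy as np

def sparsify_by_effective_resistance(W, q, rng=None):
    """Spectral sparsifier sketch: sample q edges proportionally to w_e * R_e and reweight.

    W : (n, n) symmetric weighted adjacency matrix.
    q : number of edge samples (e.g. on the order of n log n / eps^2).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = W.shape[0]
    iu, ju = np.triu_indices(n, 1)
    mask = W[iu, ju] > 0
    ii, jj, we = iu[mask], ju[mask], W[iu, ju][mask]

    L = np.diag(W.sum(axis=1)) - W
    Lpinv = np.linalg.pinv(L)                        # slow but fine for a sketch
    # effective resistance of edge (i, j): (e_i - e_j)^T L^+ (e_i - e_j)
    Reff = Lpinv[ii, ii] + Lpinv[jj, jj] - 2.0 * Lpinv[ii, jj]

    p = we * Reff
    p = p / p.sum()
    W_sparse = np.zeros_like(W)
    for e in rng.choice(len(we), size=q, p=p):
        add = we[e] / (q * p[e])                     # each sample adds w_e / (q * p_e)
        W_sparse[ii[e], jj[e]] += add
        W_sparse[jj[e], ii[e]] += add
    return W_sparse
```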
54 Lovász's graphs: Steinitz representations and the Colin de Verdière number. Form a matrix $M = P - A$, where $P$ is positive diagonal and $A$ is non-negative (an adjacency matrix), such that $MX = 0$ (in fact, $\mathrm{null}(M)$ is spanned by the columns of $X$) and $M$ has one negative eigenvalue. Works if the points are on the convex hull of a polytope; $A$ is non-zero only for edges of the polytope.
55 Open Questions Are there other natural and useful graphs? Are there interesting ways to modify/parameterize our construction? Are our graphs connected? Are our graphs unique (under perturbation)? Do there exist sparse approximate solutions in all dimensions? Can one sparsify for the square of the Laplacian? How to interpolate off the graph, or add points?
More information