Multiscale Wavelets on Trees, Graphs and High Dimensional Data


Multiscale Wavelets on Trees, Graphs and High Dimensional Data. ICML 2010, Haifa. Matan Gavish (Weizmann/Stanford), Boaz Nadler (Weizmann), Ronald Coifman (Yale).

Motto: "... the relationships between smoothness and frequency forming the core ideas of Euclidean harmonic analysis are remarkably resilient, persisting in very general geometries." - Szlam, Maggioni, Coifman (2008)

Problem setup: processing functions on a dataset. Given a dataset X = {x_1, ..., x_N} with similarity matrix W_{i,j} (or X = {x_1, ..., x_N} ⊂ R^d), the goal is nonparametric inference of f : X → R:
- Denoise: observe g = f + ε, recover f.
- SSL / classification: extend f from a labeled subset X̃ ⊂ X to all of X.

Problem setup: data-adaptive orthobasis. We could use the local geometry W directly, but why reinvent the wheel? Enter Euclid: the harmonic-analysis wisdom in low-dimensional Euclidean space is to pick an orthobasis {ψ_i} for the space of functions f : X → R (popular bases: Fourier, wavelets) and process f in the coefficient domain, e.g. estimate or threshold coefficients. Exit Euclid: we want to build {ψ_i} according to the graph W.
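To make the "process f in the coefficient domain" step concrete, here is a minimal denoising sketch by hard thresholding in an arbitrary orthonormal basis; the basis matrix Psi and the threshold t are illustrative assumptions, not prescribed by the talk.

```python
import numpy as np

def denoise(g, Psi, t):
    """Hard-threshold denoising of g = f + noise in an orthonormal basis.
    Psi: N x N matrix whose columns are the basis functions psi_i."""
    c = Psi.T @ g              # analysis: coefficients <g, psi_i>
    c[np.abs(c) < t] = 0.0     # keep only the large coefficients
    return Psi @ c             # synthesis: back to the data domain
```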

Toy example (Chapelle, Schölkopf and Zien, Semi-Supervised Learning, 2006): the USPS benchmark. X is USPS (an ML benchmark) as 1500 vectors in R^{16×16} = R^256. Affinity W_{i,j} = exp(−‖x_i − x_j‖²). f : X → {−1, +1} is the class label.
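A minimal sketch of this affinity construction; the slide's formula has no bandwidth, which corresponds to sigma = 1 below, but a bandwidth parameter is the usual practice.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """W[i, j] = exp(-||x_i - x_j||^2 / sigma^2) for the rows x_i of X."""
    sq = (X ** 2).sum(axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(D2, 0.0, out=D2)    # clip tiny negatives from round-off
    return np.exp(-D2 / sigma ** 2)

# Stand-in for the 1500 USPS vectors in R^256:
# W = gaussian_affinity(np.random.randn(1500, 256))
```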

Toy example: visualization by kernel PCA. [Figure: 3-D kernel PCA embedding of the USPS data, colored by class label.]

Fourier basis for {f : X → R} (Belkin and Niyogi, Using manifold structure for partially labelled classification, 2003). Generalizing Fourier: the graph Laplacian eigenbasis. Take (W − D) ψ_i = λ_i ψ_i, where D_{i,i} = Σ_j W_{i,j}. [Figure: a low-frequency Laplacian eigenfunction plotted on the kernel PCA embedding.]
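A sketch of this eigenbasis computation under the slide's sign convention (W − D)ψ_i = λ_i ψ_i; dense numpy is used for clarity, while a real dataset would use a sparse W and scipy.sparse.linalg.eigsh.

```python
import numpy as np

def laplacian_eigenbasis(W, k=10):
    """First k eigenpairs of L = W - D. L is negative semidefinite,
    so sorting eigenvalues in decreasing order puts the lambda = 0
    (constant) eigenvector first, mimicking the Fourier ordering."""
    L = W - np.diag(W.sum(axis=1))
    vals, vecs = np.linalg.eigh(L)          # L is symmetric
    order = np.argsort(vals)[::-1]
    return vals[order][:k], vecs[:, order[:k]]
```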

Cons of the Laplacian ("graph Fourier") basis:
- Oscillatory, nonlocalized
- Uninterpretable
- Scalability challenging
- No theoretical bound on ⟨f, ψ_i⟩
- Empirically slow coefficient decay
- No fast transform

Toy example: Graph Laplacian Eigenbasis

Cons of the Laplacian eigenbasis vs. a dream basis.
Laplacian eigenbasis: oscillatory, nonlocalized; uninterpretable; scalability challenging; no theoretical bound on ⟨f, ψ_i⟩; empirically slow decay; no fast transform.
Dream basis: localized; interpretable; computationally scalable; provably fast coefficient decay; empirically fast decay; fast transform.

On Euclidean space, a wavelet basis solves this: localized; interpretable (scales and shifts of the same function); fast transform. The fundamental wavelet property on R is coefficient decay: if ψ is a regular wavelet and 0 < α < 1, then |f(x) − f(y)| ≤ C |x − y|^α if and only if |⟨f, ψ_{l,k}⟩| ≤ C′ 2^{−l(α + 1/2)}.

Wavelet basis for {f : X → R}? Prior art: diffusion wavelets (Coifman, Maggioni); anisotropic Haar bases (Donoho); treelets (Lee, Nadler, Wasserman).

Main Message Any Balanced Partition Tree whose metric preserves smoothness in W yields an extremely simple Wavelet Dream Basis

Cons of the Laplacian eigenbasis vs. the Haar-like basis.
Laplacian eigenbasis: oscillatory, nonlocalized; uninterpretable; scalability challenging; no theoretical bound on ⟨f, ψ_i⟩; empirically slow decay; no fast transform.
Haar-like basis: localized; easily interpretable; computationally scalable; f smooth ⇒ |⟨f, ψ_{l,k}⟩| ≤ C q^{l(α + 1/2)} (theorem below); empirically fast decay; fast transform.

Toy example: Haar-like coeffs decay

Eigenfunctions are oscillatory. [Figure: a graph Laplacian eigenfunction plotted on the kernel PCA embedding.]

Toy example: a Haar-like basis function. [Figure: one Haar-like basis function plotted on the kernel PCA embedding.]

Main Message: Any Balanced Partition Tree, whose metric preserves smoothness in W, yields an extremely simple Basis.

A Partition Tree on the nodes
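The talk does not prescribe a tree-construction algorithm; as one concrete stand-in, this sketch builds a partition tree by agglomerative (Ward) clustering with scipy and reads off the folder memberships at a few levels.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

def partition_tree(X, n_levels=6):
    """levels[l][i] = index of the folder containing point i at level l
    (first level: near-singleton folders; last level: one root folder)."""
    Z = linkage(X, method="ward")           # full dendrogram on the points
    sizes = np.linspace(len(X), 1, n_levels).astype(int)
    return [cut_tree(Z, n_clusters=s).ravel() for s in sizes]
```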

Partition Tree (Dendrogram)

The Haar Basis on [0, 1]
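For comparison with the tree construction, a sketch of the classical Haar functions ψ_{l,k} on [0, 1], vectorized over t.

```python
import numpy as np

def haar(l, k, t):
    """psi_{l,k}(t) = 2^(l/2) psi(2^l t - k), where the mother wavelet psi
    is +1 on [0, 1/2), -1 on [1/2, 1), and 0 elsewhere."""
    s = 2.0 ** l * np.asarray(t, dtype=float) - k
    mother = np.where((s >= 0) & (s < 0.5), 1.0,
                      np.where((s >= 0.5) & (s < 1.0), -1.0, 0.0))
    return 2.0 ** (l / 2) * mother
```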

Partition Tree (Dendrogram)

Partition Tree ⇒ Haar-like basis
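A sketch of the basic building block: when a folder splits into two subfolders A and B, the Haar-like function is constant on each child, sums to zero, and has unit norm. A folder with more than two children contributes several such functions (e.g. via Gram-Schmidt); the binary case below is an illustrative simplification.

```python
import numpy as np

def haar_like(A, B, N):
    """Haar-like vector for a folder split into child folders A and B
    (index lists out of N points): +1/|A| on A, -1/|B| on B, normalized.
    It sums to zero, hence is orthogonal to functions constant on the folder."""
    psi = np.zeros(N)
    psi[np.asarray(A)] = 1.0 / len(A)
    psi[np.asarray(B)] = -1.0 / len(B)
    return psi / np.linalg.norm(psi)
```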

Main Message: Any Balanced Partition Tree, whose metric preserves smoothness in W, yields an extremely simple Wavelet Basis.

f smooth in tree metric ⟺ coefficients decay. How to define smoothness: the partition tree T induces a natural tree (ultra-)metric d; measure the smoothness of f : X → R w.r.t. d.

Theorem. Let f : X → R. Then |f(x) − f(y)| ≤ C d(x, y)^α if and only if |⟨f, ψ_{l,k}⟩| ≤ C′ |supp(ψ_{l,k})|^{α + 1/2}, for any Haar-like basis {ψ_{l,k}} built on the tree T. If the tree is balanced, i.e. |offspring folder| ≥ q · |parent folder|, then |f(x) − f(y)| ≤ C d(x, y)^α if and only if |⟨f, ψ_{l,k}⟩| ≤ C′ q^{l(α + 1/2)} (for the classical Haar basis, q = 1/2).

Results:
1. Any partition tree on X induces Haar-like wavelet bases.
2. Balanced tree: f smooth ⟺ fast coefficient decay.
3. Application to semi-supervised learning.
4. Beyond the basics: comparing trees, tensor products of Haar-like bases.

Application: semi-supervised learning. Classification/regression with a Haar-like basis. Task: given the values of a smooth f on a labeled subset X̃ ⊂ X, extend f to all of X.
Step 1: Build a partition tree such that f is smooth w.r.t. the tree metric.
Step 2: Construct a Haar-like basis {ψ_{l,i}}.
Step 3: Estimate f̂ = Σ ⟨f, ψ_{l,i}⟩ ψ_{l,i}, with the coefficients estimated from the labeled points.
Control over the coefficient decay gives a non-parametric risk analysis: the bound on E‖f − f̂‖² depends only on the smoothness α of the target f and on the number of labeled points.
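A minimal sketch of Steps 2-3 as an estimator. The plug-in rescaling of each coefficient by |supp(ψ)| / (number of labeled points in the support) is an illustrative assumption; the paper's estimator is more careful, e.g. about folders with too few labels.

```python
import numpy as np

def ssl_estimate(basis, y, labeled, N):
    """basis: list of Haar-like vectors psi (length-N arrays);
    y: length-N label array, trusted only on the index array `labeled`."""
    const = np.full(N, 1.0 / np.sqrt(N))     # coarsest (constant) function
    f_hat = np.zeros(N)
    for psi in [const] + list(basis):
        supp = np.flatnonzero(psi)
        m = np.intersect1d(supp, labeled)    # labeled points in supp(psi)
        if len(m) == 0:
            continue                         # no labels in this folder
        coef = (len(supp) / len(m)) * (y[m] * psi[m]).sum()  # plug-in <f, psi>
        f_hat += coef * psi
    return f_hat                             # classify by sign(f_hat)
```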

Toy example benchmark. [Figure: test error (%) vs. number of labeled points (out of 1500), comparing Laplacian Eigenmaps, Laplacian Reg., Adaptive Threshold, the Haar-like basis, and the state of the art.]

MNIST digits, 8 vs. {3, 4, 5, 7}. [Figure: test error (%) vs. number of labeled points (out of 1000), comparing Laplacian Eigenmaps, Laplacian Reg., Adaptive Threshold, and the Haar-like basis.]

Results:
1. Any partition tree on X induces Haar-like wavelet bases.
2. Balanced tree: f smooth ⟺ fast coefficient decay.
3. Application to semi-supervised learning.
4. Beyond the basics: tensor products of Haar-like bases, coefficient thresholding.

Tensor product of Haar-like bases (Coifman and Gavish, Harmonic Analysis of Digital Data Bases, 2010).
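On a data matrix the tensor-product basis is simple to write down: each basis element is the outer product of a Haar-like function on the row tree with one on the column tree; a sketch follows.

```python
import numpy as np

def tensor_basis_function(psi_row, psi_col):
    """Rank-one basis matrix psi_row (x) psi_col for an m x n data matrix,
    given Haar-like vectors on the row tree (length m) and column tree (length n)."""
    return np.outer(psi_row, psi_col)

def tensor_coefficient(M, psi_row, psi_col):
    """Coefficient <M, psi_row (x) psi_col> = psi_row^T M psi_col."""
    return psi_row @ M @ psi_col
```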

The Big Picture (Coifman and Weiss, Extensions of Hardy spaces and their use in analysis, 1977). Geometric tools for data analysis are by now widely recognized. Analysis tools (e.g. function spaces, wavelet theory) on graphs and in general geometries are valuable and largely unexplored in the ML context. There is deep theory and a long tradition relating the geometry of X to bases for {f : X → R} ("spaces of homogeneous type").

Summary. Motto: "... the relationships between smoothness and frequency forming the core ideas of Euclidean harmonic analysis are remarkably resilient, persisting in very general geometries." - Szlam, Maggioni, Coifman (2008). Main message: any balanced partition tree whose metric preserves smoothness in W yields an extremely simple wavelet dream basis. Fascinating open question: which graphs admit balanced partition trees whose metric preserves smoothness in W?

Supporting information (proofs & code): www.stanford.edu/~gavish ; www.wisdom.weizmann.ac.il/~nadler/
Gavish, Nadler and Coifman, Multiscale wavelets on trees, graphs and high dimensional data, Proceedings of ICML 2010.
Gavish, Nadler and Coifman, Inference using Haar-like wavelets, preprint (2010).
Coifman and Gavish, Harmonic analysis of digital data bases, to appear in Wavelets: Old and New Perspectives, Springer (2010).
Coifman and Weiss, Extensions of Hardy spaces and their use in analysis, Bull. of the AMS 83(4), pp. 569-645 (1977).
Singh, Nowak and Calderbank, Detecting weak but hierarchically-structured patterns in networks, Proceedings of AISTATS 2010.