Multiscale Wavelets on Trees, Graphs and High Dimensional Data

1 Multiscale Wavelets on Trees, Graphs and High Dimensional Data. ICML 2010, Haifa. Matan Gavish (Weizmann/Stanford), Boaz Nadler (Weizmann), Ronald Coifman (Yale)

2 Boaz Nadler Ronald Coifman

3 Motto: "... the relationships between smoothness and frequency forming the core ideas of Euclidean harmonic analysis are remarkably resilient, persisting in very general geometries." - Szlam, Maggioni, Coifman (2008)

4 Problem setup: Processing functions on a dataset. Given a dataset $X = \{x_1, \dots, x_N\}$ with similarity matrix $W_{i,j}$ (or $X = \{x_1, \dots, x_N\} \subset \mathbb{R}^d$). Nonparametric inference of $f : X \to \mathbb{R}$. Denoise: observe $g = f + \varepsilon$, recover $f$. SSL / classification: extend $f$ from a labeled subset $\tilde{X} \subset X$ to $X$.

12 Problem setup: Data-adaptive orthobasis. Can use the local geometry $W$, but why reinvent the wheel? Enter Euclid: harmonic-analysis wisdom in low-dimensional Euclidean space is to use an orthobasis $\{\psi_i\}$ for the space of functions $f : X \to \mathbb{R}$. Popular bases: Fourier, wavelet. Process $f$ in the coefficient domain, e.g. estimate, threshold. Exit Euclid: we want to build $\{\psi_i\}$ according to the graph $W$.
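To make "process $f$ in the coefficient domain" concrete, here is a minimal hard-thresholding denoiser for an arbitrary orthobasis (a sketch, not the talk's method; `B` and `tau` are illustrative placeholders):

```python
import numpy as np

def denoise(g, B, tau):
    """Expand g in the orthobasis B (columns are the psi_i),
    hard-threshold the coefficients, and synthesize back."""
    c = B.T @ g                    # coefficients <g, psi_i>
    c[np.abs(c) < tau] = 0.0       # keep only the large coefficients
    return B @ c
```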

19 Toy example. Chapelle, Schölkopf and Zien, Semi-Supervised Learning, 2006. Example: USPS benchmark. $X$ is USPS (ML benchmark) as 1500 vectors in $\mathbb{R}^{256}$. Affinity $W_{i,j} = \exp(-\|x_i - x_j\|^2)$. $f : X \to \{-1, 1\}$ is the class label.
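In code, this setup looks roughly as follows (a sketch; random vectors stand in for the 1500 USPS digit images, which are an assumption here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 256))          # stand-in for the USPS vectors

# Gaussian affinity W_ij = exp(-||x_i - x_j||^2) via pairwise squared distances
sq = np.sum(X ** 2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
W = np.exp(-np.maximum(D2, 0.0))          # clip tiny negatives from roundoff
```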

23 Toy example: visualization by kernel PCA

24 Fourier Basis for $\{f : X \to \mathbb{R}\}$. Belkin and Niyogi, Using manifold structure for partially labelled classification, 2003. Generalizing Fourier: the graph Laplacian eigenbasis. Take $(W - D)\psi_i = \lambda_i \psi_i$, where $D_{i,i} = \sum_j W_{i,j}$.
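A dense-linear-algebra sketch of this eigenbasis (fine only for small $N$; the small random dataset is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))                 # small stand-in dataset
sq = np.sum(X ** 2, axis=1)
W = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

D = np.diag(W.sum(axis=1))                     # D_ii = sum_j W_ij
evals, evecs = np.linalg.eigh(W - D)           # (W - D) psi_i = lambda_i psi_i
psi = evecs[:, np.argsort(-evals)]             # smoothest (lambda ~ 0) first
```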

25 Cons of Laplacian Eigenbasis. Laplacian ("graph Fourier") basis: oscillatory, nonlocalized; uninterpretable; scalability challenging; no theoretical bound on $\langle f, \psi_i \rangle$; empirically slow coefficient decay; no fast transform.

26 Toy example: Graph Laplacian Eigenbasis

28 Cons of Laplacian Eigenbasis.
Laplacian eigenbasis: oscillatory, nonlocalized; uninterpretable; scalability challenging; no theoretical bound on $\langle f, \psi_i \rangle$; empirically slow decay; no fast transform.
Dream basis: localized; interpretable; computationally scalable; provably fast coefficient decay; empirically fast decay; fast transform.

29 On Euclidean space, the wavelet basis solves this. Localized. Interpretable: scale/shift of the same function. Fundamental wavelet property on $\mathbb{R}$, coefficient decay: if $\psi$ is a regular wavelet and $0 < \alpha < 1$, then $|f(x) - f(y)| \le C\,|x - y|^\alpha \implies |\langle f, \psi_{l,k} \rangle| \le C\, 2^{-l(\alpha + \frac{1}{2})}$. Fast transform.
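This decay can be checked numerically (a sketch, not from the talk): compute orthonormal Haar detail coefficients of a Hölder-$\alpha$ function level by level; the per-level maximum should scale by about $2^{\alpha + 1/2}$ each time the wavelet support doubles.

```python
import numpy as np

def haar_details(f):
    """Orthonormal discrete Haar transform; detail coefficients, finest first."""
    c, details = f.astype(float), []
    while len(c) > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2.0))
        c = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    return details

n, alpha = 2 ** 14, 0.5
x = (np.arange(n) + 0.5) / n
f = np.abs(x - 0.5) ** alpha              # Hölder-alpha kink at x = 0.5
d = haar_details(f / np.sqrt(n))          # 1/sqrt(n) approximates L2([0,1])
peaks = [np.abs(lev).max() for lev in d]
# ratios between successive levels should approach 2**(alpha + 0.5)
print([round(b / a, 2) for a, b in zip(peaks, peaks[1:])])
```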

30 Wavelet basis for $\{f : X \to \mathbb{R}\}$? Prior Art: Diffusion wavelets (Coifman, Maggioni); Anisotropic Haar bases (Donoho); Treelets (Nadler, Lee, Wasserman).

31 Main Message: Any Balanced Partition Tree whose metric preserves smoothness in W yields an extremely simple Wavelet Dream Basis.

33 Cons of Laplacian Eigenbasis.
Laplacian eigenbasis: oscillatory, nonlocalized; uninterpretable; scalability challenging; no theoretical bound on $\langle f, \psi_i \rangle$; empirically slow decay; no fast transform.
Haar-like basis: localized; easily interpretable; computationally scalable; $f$ smooth $\implies |\langle f, \psi_{l,k} \rangle| \le c^l$; empirically fast decay; fast transform.

34 Toy example: Haar-like coeffs decay

38 Eigenfunctions are oscillatory

39 Toy example: Haar-like basis function

40 Main Message: Any Balanced Partition Tree, whose metric preserves smoothness in W, yields an extremely simple Basis.

41 A Partition Tree on the nodes

46 Partition Tree (Dendrogram)
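One practical way to obtain such a partition tree (a sketch; the talk does not commit to this particular construction) is off-the-shelf hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                    # stand-in dataset
Z = linkage(X, method='ward')                     # bottom-up partition tree
# "folders" at one level of the tree = clusters from cutting the dendrogram
folders = fcluster(Z, t=4, criterion='maxclust')  # e.g. 4 folders at this cut
```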

53 The Haar Basis on [0, 1]
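For reference, the classical Haar functions $\psi_{l,k}(x) = 2^{l/2}\,\psi(2^l x - k)$ on $[0, 1]$ in a few lines (a small sketch for sanity checks):

```python
import numpy as np

def haar(l, k, x):
    """Classical Haar wavelet psi_{l,k}(x) = 2**(l/2) * psi(2**l * x - k)."""
    t = (2.0 ** l) * x - k
    return (2.0 ** (l / 2.0)) * (((0 <= t) & (t < 0.5)).astype(float)
                                 - ((0.5 <= t) & (t < 1.0)).astype(float))

x = np.linspace(0.0, 1.0, 1024, endpoint=False)
# orthonormality: distinct wavelets are orthogonal, each has unit norm
assert abs(np.mean(haar(1, 0, x) * haar(2, 1, x))) < 1e-12
assert abs(np.mean(haar(1, 0, x) ** 2) - 1.0) < 1e-12
```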

64 Partition Tree (Dendrogram)

71 Partition Tree → Haar-like basis
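A minimal construction of one Haar-like basis vector from a folder split into two subfolders (a sketch; folders with more than two children need one extra vector per additional child, omitted here):

```python
import numpy as np

def haar_like_vector(n, sub_a, sub_b):
    """Haar-like vector for a folder split into subfolders sub_a, sub_b:
    constant on each subfolder, zero elsewhere, zero mean, unit norm."""
    psi = np.zeros(n)
    a, b = len(sub_a), len(sub_b)
    psi[sub_a] = np.sqrt(b / (a * (a + b)))
    psi[sub_b] = -np.sqrt(a / (b * (a + b)))
    return psi

# folder {0,...,4} split into {0,1,2} and {3,4} inside a 10-point dataset
psi = haar_like_vector(10, [0, 1, 2], [3, 4])
assert abs(psi.sum()) < 1e-12 and abs(np.dot(psi, psi) - 1.0) < 1e-12
```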

83 Main Message: Any Balanced Partition Tree, whose metric preserves smoothness in W, yields an extremely simple Wavelet Basis.

84 f smooth in tree metric ⟺ coefficients decay.
How to define smoothness: the partition tree T induces a natural tree (ultra-)metric $d$ (e.g. the size of the smallest folder containing both points); measure the smoothness of $f : X \to \mathbb{R}$ w.r.t. $d$.
Theorem. Let $f : X \to \mathbb{R}$. Then
$|f(x) - f(y)| \le C\, d(x, y)^\alpha \implies |\langle f, \psi_{l,k} \rangle| \le C\, |\mathrm{supp}(\psi_{l,k})|^{\alpha + \frac{1}{2}}$
for any Haar-like basis $\{\psi_{l,k}\}$ based on the tree T.
If the tree is balanced, i.e. $|\text{offspring folder}| \le q\, |\text{parent folder}|$, then
$|f(x) - f(y)| \le C\, d(x, y)^\alpha \implies |\langle f, \psi_{l,k} \rangle| \le C\, q^{l(\alpha + \frac{1}{2})}$
(for classical Haar, $q = \frac{1}{2}$).

89 Results.
1. Any partition tree on X induces Haar-like wavelet bases.
2. Balanced tree: f smooth ⟺ fast coefficient decay.
3. Application to semi-supervised learning.
4. Beyond basics: comparing trees, tensor products of Haar-like bases.

90 Application: Semi-supervised learning. Classification/regression with a Haar-like basis.
Task: given the values of a smooth $f$ on $\tilde{X} \subset X$, extend $f$ to $X$.
Step 1: Build a partition tree s.t. $f$ is smooth w.r.t. the tree metric.
Step 2: Construct a Haar-like basis $\{\psi_{l,i}\}$.
Step 3: Estimate $\hat{f} = \sum_{l,i} \widehat{\langle f, \psi_{l,i} \rangle}\, \psi_{l,i}$.
Control over coefficient decay → non-parametric risk analysis: a bound on $\mathbb{E}\|f - \hat{f}\|^2$ that depends only on the smoothness $\alpha$ of the target $f$ and the number of labeled points. (A toy sketch of these steps follows.)
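A toy end-to-end sketch of these steps (assumptions: Ward clustering for Step 1, and the simplification that plugging empirical coefficient estimates into Step 3 amounts to predicting, for each point, the labeled mean of its smallest folder that contains labeled points):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

def extend_labels(X, labeled_idx, y_labeled):
    """Predict each point's value as the labeled mean of its smallest
    tree folder containing at least one labeled point (toy Haar-like SSL)."""
    lab = dict(zip(labeled_idx, y_labeled))
    root = to_tree(linkage(X, method='ward'))   # Step 1: partition tree
    f_hat = np.zeros(len(X))

    def fill(node, fallback):                   # Steps 2-3, folded together
        leaves = node.pre_order()               # indices of points in folder
        vals = [lab[i] for i in leaves if i in lab]
        mean = float(np.mean(vals)) if vals else fallback
        if node.is_leaf() or not vals:
            f_hat[leaves] = mean
        else:
            fill(node.left, mean)
            fill(node.right, mean)

    fill(root, 0.0)
    return f_hat

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 5)), rng.normal(2, 1, (100, 5))])
labeled_idx = [0, 1, 2, 100, 101, 102]          # a few labels per class
f_hat = extend_labels(X, labeled_idx, [-1, -1, -1, 1, 1, 1])
```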

97 Toy example benchmark: test error (%) vs. number of labeled points (out of 1500), comparing Laplacian Eigenmaps, Laplacian Reg., Adaptive Threshold, the Haar-like basis, and the state of the art.

98 MNIST digits, 8 vs. {3,4,5,7}: test error (%) vs. number of labeled points (out of 1000), comparing Laplacian Eigenmaps, Laplacian Reg., Adaptive Threshold, and the Haar-like basis.

99 Results.
1. Any partition tree on X induces Haar-like wavelet bases.
2. Balanced tree: f smooth ⟺ fast coefficient decay.
3. Application to semi-supervised learning.
4. Beyond basics: tensor products of Haar-like bases, coefficient thresholding.

100 Tensor product of Haar-like bases. Coifman and G, Harmonic Analysis of Digital Data Bases (2010).
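A sketch of the tensor-product idea for matrix data (the random orthobases below are stand-ins for Haar-like bases built from a row tree and a column tree):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for Haar-like orthobases on the rows and on the columns
B_rows = np.linalg.qr(rng.normal(size=(8, 8)))[0]
B_cols = np.linalg.qr(rng.normal(size=(6, 6)))[0]

M = rng.normal(size=(8, 6))                 # a data matrix / database
coeffs = B_rows.T @ M @ B_cols              # coefficients in the tensor basis
# entry (i, j) equals <M, psi_i phi_j^T>; reconstruction is exact:
assert np.allclose(B_rows @ coeffs @ B_cols.T, M)
```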

115 The Big Picture. Coifman and Weiss, Extensions of Hardy spaces and their use in analysis, 1977.
Geometric tools for data analysis are widely recognized.
Analysis tools (e.g. function spaces, wavelet theory) on graphs and in general geometries are valuable and largely unexplored in the ML context.
Deep theory, long tradition: geometry of X ↔ bases for $\{f : X \to \mathbb{R}\}$ ("Spaces of Homogeneous Type").

119 Summary.
Motto: "... the relationships between smoothness and frequency forming the core ideas of Euclidean harmonic analysis are remarkably resilient, persisting in very general geometries." - Szlam, Maggioni, Coifman (2008)
Main message: Any Balanced Partition Tree whose metric preserves smoothness in W yields an extremely simple Dream Wavelet Basis.
Fascinating open question: Which graphs admit Balanced Partition Trees whose metric preserves smoothness in W?

120 Supporting information (proofs & code): gavish ; nadler/
G, Nadler and Coifman, Multiscale wavelets on trees, graphs and high dimensional data, Proceedings of ICML 2010.
G, Nadler and Coifman, Inference using Haar-like wavelets, preprint (2010).
Coifman and G, Harmonic analysis of digital data bases, to appear in Wavelets: Old and New Perspectives, Springer (2010).
Coifman and Weiss, Extensions of Hardy spaces and their use in analysis, Bull. of the AMS, 83(4), 1977.
Singh, Nowak and Calderbank, Detecting weak but hierarchically-structured patterns in networks, Proceedings of AISTATS 2010.
