Global vs. Multiscale Approaches
1 Harmonic Analysis on Graphs: Global vs. Multiscale Approaches. Weizmann Institute of Science, Rehovot, Israel, July 2011. Joint work with Matan Gavish (WIS/Stanford) and Ronald Coifman (Yale); ICML 2010.
3 Challenge: Organization / Understanding of Data. In many fields massive amounts of data are collected or generated. Examples: financial data, multi-sensor data, simulations, documents, web pages, images, video streams, medical data, astrophysical data, etc. Need to organize / understand their structure; inference / learning from data.
7 Classical vs. Modern Data Analysis: Setup and Tasks. Classical setup: data typically in a (low-dimensional) Euclidean space; small to medium sample sizes (n < 1000); either all data unlabeled (unsupervised) or all labeled (supervised). Modern setup: high-dimensional data, or data encoded as a graph; huge datasets, with n = 10^6 samples or more; few labeled data. The well-developed and well-understood standard tools of statistics are not always applicable. Question: how to do harmonic analysis in such settings?
10 Harmonic Analysis and Learning. In the past 20 years (multiscale) harmonic analysis has had a profound impact on statistics and signal processing. Well-developed theory, long tradition: the geometry of a space X yields bases for {f : X → R}. Simplest example: X = [0, 1]. Construct a (multiscale) basis {ψ_i} that allows control of ⟨f, ψ_i⟩ for some (smooth) class of functions f. Theorem: let ψ be a smooth wavelet, {ψ_{l,k}} the corresponding wavelet basis, and f : [0, 1] → R. Then |f(x) − f(y)| ≤ C|x − y|^α ⟺ |⟨f, ψ_{l,k}⟩| ≤ C' 2^{−l(α+1/2)}.
15 Setting. Harmonic analysis wisdom for f : R^d → R: expand f in an orthonormal basis {ψ_i} (e.g. wavelets), f = Σ_i ⟨f, ψ_i⟩ ψ_i; shrink / estimate the coefficients. Useful when f is well approximated by a few terms (fast coefficient decay / sparsity). Can this work on general datasets? Need a data-adaptive basis {ψ_i} for the space of functions f : X → R.
16 Harmonic Analysis on Graphs. Setup: we are given a dataset X = {x_1, …, x_N} with similarity / affinity matrix W_{i,j}. Goal: statistical inference of a smooth f : X → R. Denoise f; SSL / regression / classification: extend f from a labeled subset of X to all of X.
18 Semi-Supervised Learning. In many applications it is easy to collect lots of unlabeled data, BUT labeling the data is expensive. Question: given a (small) labeled set {(x_i, y_i)} ⊂ X, with y = f(x), construct f̂ to label the rest of X. Key assumption: the function f(x) has some smoothness w.r.t. the graph affinities W_{i,j}; otherwise unlabeled data is useless or may even harm prediction.
19 The Graph Laplacian. In most previous approaches the key object is the GRAPH LAPLACIAN: L = D − W, where D_{ii} = Σ_j W_{ij}.
20 Global Graph-Laplacian-Based SSL Methods. Zhu, Ghahramani, Lafferty, SSL using Gaussian fields and harmonic functions [ICML 2003]; Azran [ICML 2007]: f̂ = argmin_{f(x_i)=y_i} (1/n²) Σ_{i,j=1}^n W_{i,j}(f_i − f_j)² = argmin_{f(x_i)=y_i} (1/n²) f^T L f. Y. Bengio, O. Delalleau, N. Le Roux [2006]; D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, B. Schölkopf [NIPS 2004]: f̂ = argmin (1/n²) Σ_{i,j=1}^n W_{i,j}(f_i − f_j)² + λ Σ_{i=1}^l (f_i − y_i)².
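The first (interpolated) objective above has a closed-form solution on the unlabeled points. A minimal numpy sketch, with function and variable names that are ours, not from the slides:

```python
import numpy as np

def harmonic_ssl(W, labeled_idx, y_labeled):
    """Gaussian-fields / harmonic-function SSL, in the spirit of Zhu et al. (2003).

    Minimizes sum_ij W_ij (f_i - f_j)^2 subject to f_i = y_i on the labeled
    points; the minimizer satisfies L_uu f_u = W_ul y_l on the unlabeled block.
    """
    n = W.shape[0]
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # unnormalized graph Laplacian
    u = np.setdiff1d(np.arange(n), labeled_idx)   # unlabeled indices
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    # Since L_ul = -W_ul (D is diagonal), stationarity gives f_u = L_uu^{-1} W_ul y_l.
    f[u] = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, labeled_idx)] @ y_labeled)
    return f
```

On a chain graph with the two endpoints labeled 0 and 1, the interior values interpolate linearly, which illustrates why the method is well posed in 1-d.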
21 Global Graph-Laplacian-Based SSL. Belkin & Niyogi (2003): given the similarity matrix W, find the first few eigenvectors of the graph Laplacian, L e_j = (D − W) e_j = λ_j e_j. Expand f̂ = Σ_{j=1}^p a_j e_j and estimate the coefficients a_j from the labeled data.
22 Statistical Analysis of Laplacian Regularization [N., Srebro, Zhou, NIPS 2009]. Theorem: in the limit of large unlabeled data from Euclidean space, with W_{i,j} = K(‖x_i − x_j‖), graph Laplacian regularization methods f̂ = argmin (1/n²) Σ_{i,j=1}^n W_{i,j}(f_i − f_j)² + λ Σ_{i=1}^l (f_i − y_i)² are well posed for underlying data in 1-d, but are not well posed for data in dimension d ≥ 2. In particular, in the limit of infinite unlabeled data, f(x) → const at all unlabeled x.
23 Eigenvector (Fourier) Methods. Belkin, Niyogi [2003] suggested a different approach based on the graph Laplacian L = W − D: given the similarity W, find the first few eigenvectors (W − D) e_j = λ_j e_j; expand ŷ(x) = Σ_{j=1}^p a_j e_j; find the coefficients a_j by least squares, (â_1, …, â_p) = argmin Σ_{j=1}^l (y_j − ŷ(x_j))².
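The eigenbasis method above is a short computation in numpy. A sketch under our naming, using the unnormalized Laplacian: expand the labels in the p smoothest eigenvectors and fit the coefficients by least squares on the labeled points only.

```python
import numpy as np

def eigenbasis_ssl(W, labeled_idx, y_labeled, p):
    """Laplacian-eigenbasis SSL in the spirit of Belkin & Niyogi (2003)."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, eigvecs = np.linalg.eigh(L)      # eigh returns eigenvalues in ascending order
    E = eigvecs[:, :p]                  # the p smoothest eigenvectors, as columns
    # Least-squares fit of the expansion coefficients on the labeled rows.
    a, *_ = np.linalg.lstsq(E[labeled_idx], y_labeled, rcond=None)
    return E @ a                        # extended label function on all of X
```

The number of eigenvectors p acts as the regularization knob: too few eigenvectors cannot represent a non-smooth label function, too many overfit the few labels.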
27 Toy Example: USPS benchmark. X is USPS (an ML benchmark) as 1500 vectors in R^256. Affinity W_{i,j} = exp(−‖x_i − x_j‖²). f : X → {−1, 1} is the class label.
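The affinity on this slide is a two-line computation; the bandwidth parameter sigma below is our addition (the slide fixes it to 1):

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / sigma^2) for rows x_i of the data matrix X."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma**2)
```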
28 Toy example: visualization by kernel PCA
35 Toy example: prior art? Generalizing Fourier: the graph Laplacian eigenbasis. Take (W − D) ψ_i = λ_i ψ_i, where D_{i,i} = Σ_j W_{i,j} (Belkin, Niyogi 2004, and others). In the Euclidean setting, the typical coefficient decay rate in the Fourier basis is polynomial, but in wavelet bases it is exponential.
36 Toy example: Graph Laplacian Eigenbasis
44 Challenge. Challenge: build multiscale "wavelet-like" bases on X. 1. Construct an adaptive multiscale basis on a general dataset. 2. Investigate: coefficient ⟨f, ψ_i⟩ decay rate ⟺ f smooth. Previous works: Diffusion Wavelets [Coifman and Maggioni]; multiscale methods for data on graphs [Jansen, Nason, Silverman]; wavelets on graphs via spectral graph theory [Hammond et al.]. Our work: a (relatively) simple construction with accompanying theory.
49 Toy example: an intriguing experiment. [Plot: sorted |coefficients| on a log10 scale, comparing the graph Laplacian eigenbasis with a Haar-like basis.]
58 Challenge and Results. Challenge: build "wavelet" bases on X. 1. Construct an adaptive multiscale basis on a general dataset. 2. Investigate: coefficient ⟨f, ψ_i⟩ decay rate ⟺ f smooth. Key results: 1. A partition tree on X induces wavelet (Haar-like) bases. 2. Assuming a balanced tree, f smooth ⟺ fast coefficient decay. 3. A novel SSL scheme with learning guarantees, assuming smooth functions on the tree.
59 The Haar Basis on [0, 1]
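The classical picture on this slide can be reproduced numerically with a discrete Haar transform: a smooth signal on a dyadic grid of [0, 1] has rapidly decaying detail coefficients. A minimal sketch (the discretization choices here are ours):

```python
import numpy as np

def haar_details(signal):
    """Orthonormal discrete Haar transform of a signal of length 2^L.

    Returns a list of detail-coefficient arrays <f, psi_{l,k}>, finest scale
    first; the decay of these across scales is what the wavelet
    characterization of smoothness is about.
    """
    a = np.asarray(signal, dtype=float)
    details = []
    while len(a) > 1:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients at this scale
        a = (even + odd) / np.sqrt(2)              # averages feed the next, coarser scale
    return details
```

For a constant signal every detail coefficient vanishes, since each Haar function has zero mean; this is the extreme case of the decay bound on the previous theorem slide.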
70 Hierarchical partition of X
75 Partition Tree (Dendrogram)
84 Partition Tree → Haar-like Basis. Simple observation [Lee, N., Wasserman, AOAS 2008]: a hierarchical tree induces a multi-resolution analysis of the space of functions, and hence a Haar-like multiscale basis. V = {f | f : X → R}; V_l = {f | f is constant on the partitions at level l}; V_1 ⊂ V_2 ⊂ … ⊂ V_L = V.
85 Partition Tree Haar-like basis
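One concrete way to realize a Haar-like basis from a partition tree, restricted for simplicity to binary trees (the construction on the slides allows arbitrary branching; the data structure and names below are illustrative):

```python
import numpy as np

def haar_like_basis(tree, n):
    """Build a Haar-like orthonormal basis from a binary partition tree of n points.

    `tree` is a nested structure: a leaf is a list of point indices, an internal
    node is a pair (left_subtree, right_subtree). Each internal node with child
    supports A, B contributes one zero-mean, unit-norm vector, constant on A
    and on B and zero elsewhere.
    """
    def leaves(node):
        if isinstance(node, tuple):
            return leaves(node[0]) + leaves(node[1])
        return list(node)

    basis = [np.ones(n) / np.sqrt(n)]           # "father" function: global constant
    def visit(node):
        if not isinstance(node, tuple):
            return
        A, B = leaves(node[0]), leaves(node[1])
        psi = np.zeros(n)
        psi[A] = np.sqrt(len(B) / (len(A) * (len(A) + len(B))))
        psi[B] = -np.sqrt(len(A) / (len(B) * (len(A) + len(B))))
        basis.append(psi)                       # zero mean and unit norm by construction
        visit(node[0]); visit(node[1])
    visit(tree)
    return np.array(basis)                      # rows form an orthonormal set
```

On the balanced tree over {0,1,2,3} with splits {0,1} | {2,3}, {0} | {1}, {2} | {3}, this recovers exactly the classical Haar basis on four points.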
96 Toy example: Haar-like basis function
100 Smoothness ⟺ Coefficient Decay. How to define smoothness? The partition tree induces a tree metric d(x_i, x_j) [an ultrametric]; this is a particular example of a space of homogeneous type [Coifman & Weiss]. Measure the smoothness of f : X → R in the tree metric. Theorem: let f : X → R. Then |f(x_i) − f(x_j)| ≤ C d(x_i, x_j)^α ⟺ |⟨f, ψ_{l,i}⟩| ≤ C' q^{−l(α+1/2)}, where q measures the tree balance and α measures function smoothness w.r.t. the tree.
101 Application: SSL. Given a dataset X with weighted graph G, similarity matrix W, and labeled points {(x_i, y_i)}: 1. Construct a balanced hierarchical tree of the graph. 2. Construct the corresponding Haar-like basis. 3. Estimate the coefficients from the labeled points.
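Step 3 of the scheme can be sketched once a Haar-like basis matrix is in hand. Here the coefficients are fit by least squares on the labeled rows; the paper's actual estimator works coefficient by coefficient and comes with guarantees, so treat this as an illustration only:

```python
import numpy as np

def estimate_and_extend(basis, labeled_idx, y_labeled):
    """Least-squares sketch of coefficient estimation from labeled points.

    `basis` has orthonormal basis vectors as rows (e.g. a Haar-like basis
    induced by a partition tree); we fit expansion coefficients on the
    labeled points and evaluate the expansion on the whole dataset.
    """
    B = basis.T                                   # columns = basis functions
    a, *_ = np.linalg.lstsq(B[labeled_idx], y_labeled, rcond=None)
    return B @ a                                  # extended labels on all of X
```

In practice one would also restrict to the coarsest basis functions or threshold the fitted coefficients, exactly the wavelet-shrinkage step the theory above licenses.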
102 Toy Example: Benchmarks. [Plot: test error (%) vs. number of labeled points (out of 1500), comparing Laplacian eigenmaps, Laplacian regularization, adaptive-threshold Haar-like basis, and the state of the art.]
103 Toy Example: MNIST 8 vs. {3,4,5,7}. [Plot: test error (%) vs. number of labeled points (out of 1000) for the same four methods.]
110 Summary. 1. A balanced partition tree induces a useful wavelet basis. 2. Designing the basis allows a theory connecting coefficient decay, smoothness, and learnability. 3. The wavelet arsenal becomes available: wavelet shrinkage, etc. 4. Interesting harmonic analysis. Promising for data analysis? 5. Computational experiments motivate theory, and vice versa. The End. nadler
On the Phase Transition Phenomenon of Graph Laplacian Eigenfunctions on Trees Naoki Saito and Ernest Woei Department of Mathematics University of California Davis, CA 9566 USA Email: saito@math.ucdavis.edu;
More informationMulti-View Point Cloud Kernels for Semi-Supervised Learning
Multi-View Point Cloud Kernels for Semi-Supervised Learning David S. Rosenberg, Vikas Sindhwani, Peter L. Bartlett, Partha Niyogi May 29, 2009 Scope In semi-supervised learning (SSL), we learn a predictive
More informationManifold Denoising. Matthias Hein Markus Maier Max Planck Institute for Biological Cybernetics Tübingen, Germany
Manifold Denoising Matthias Hein Markus Maier Max Planck Institute for Biological Cybernetics Tübingen, Germany {first.last}@tuebingen.mpg.de Abstract We consider the problem of denoising a noisily sampled
More informationSpectral Hashing. Antonio Torralba 1 1 CSAIL, MIT, 32 Vassar St., Cambridge, MA Abstract
Spectral Hashing Yair Weiss,3 3 School of Computer Science, Hebrew University, 9904, Jerusalem, Israel yweiss@cs.huji.ac.il Antonio Torralba CSAIL, MIT, 32 Vassar St., Cambridge, MA 0239 torralba@csail.mit.edu
More informationSparse Approximation: from Image Restoration to High Dimensional Classification
Sparse Approximation: from Image Restoration to High Dimensional Classification Bin Dong Beijing International Center for Mathematical Research Beijing Institute of Big Data Research Peking University
More informationWavelets on Graphs, an Introduction
Wavelets on Graphs, an Introduction Pierre Vandergheynst and David Shuman Ecole Polytechnique Fédérale de Lausanne (EPFL) Signal Processing Laboratory {pierre.vandergheynst,david.shuman}@epfl.ch Université
More informationSpectral Techniques for Clustering
Nicola Rebagliati 1/54 Spectral Techniques for Clustering Nicola Rebagliati 29 April, 2010 Nicola Rebagliati 2/54 Thesis Outline 1 2 Data Representation for Clustering Setting Data Representation and Methods
More informationA Multiscale Framework for Markov Decision Processes using Diffusion Wavelets
A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets Mauro Maggioni Program in Applied Mathematics Department of Mathematics Yale University New Haven, CT 6 mauro.maggioni@yale.edu
More informationPredicting Graph Labels using Perceptron. Shuang Song
Predicting Graph Labels using Perceptron Shuang Song shs037@eng.ucsd.edu Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005 Prediction
More informationFast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes
Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes Mauro Maggioni mauro.maggioni@yale.edu Department of Mathematics, Yale University, P.O. Box 88, New Haven, CT,, U.S.A.
More informationDimensionality Reduction:
Dimensionality Reduction: From Data Representation to General Framework Dong XU School of Computer Engineering Nanyang Technological University, Singapore What is Dimensionality Reduction? PCA LDA Examples:
More informationManifold Regularization
Manifold Regularization Vikas Sindhwani Department of Computer Science University of Chicago Joint Work with Mikhail Belkin and Partha Niyogi TTI-C Talk September 14, 24 p.1 The Problem of Learning is
More informationWeighted Nonlocal Laplacian on Interpolation from Sparse Data
Noname manuscript No. (will be inserted by the editor) Weighted Nonlocal Laplacian on Interpolation from Sparse Data Zuoqiang Shi Stanley Osher Wei Zhu Received: date / Accepted: date Abstract Inspired
More informationIntegrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction
Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction Jianhui Chen, Jieping Ye Computer Science and Engineering Department Arizona State University {jianhui.chen,
More informationMultiscale Analysis and Diffusion Semigroups With Applications
Multiscale Analysis and Diffusion Semigroups With Applications Karamatou Yacoubou Djima Advisor: Wojciech Czaja Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu
More informationOnline Manifold Regularization: A New Learning Setting and Empirical Study
Online Manifold Regularization: A New Learning Setting and Empirical Study Andrew B. Goldberg 1, Ming Li 2, Xiaojin Zhu 1 1 Computer Sciences, University of Wisconsin Madison, USA. {goldberg,jerryzhu}@cs.wisc.edu
More informationImproved Local Coordinate Coding using Local Tangents
Improved Local Coordinate Coding using Local Tangents Kai Yu NEC Laboratories America, 10081 N. Wolfe Road, Cupertino, CA 95129 Tong Zhang Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854
More informationNeural Networks, Convexity, Kernels and Curses
Neural Networks, Convexity, Kernels and Curses Yoshua Bengio Work done with Nicolas Le Roux, Olivier Delalleau and Hugo Larochelle August 26th 2005 Perspective Curse of Dimensionality Most common non-parametric
More informationDiffusion/Inference geometries of data features, situational awareness and visualization. Ronald R Coifman Mathematics Yale University
Diffusion/Inference geometries of data features, situational awareness and visualization Ronald R Coifman Mathematics Yale University Digital data is generally converted to point clouds in high dimensional
More informationCluster Kernels for Semi-Supervised Learning
Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de
More informationMultiscale Manifold Learning
Multiscale Manifold Learning Chang Wang IBM T J Watson Research Lab Kitchawan Rd Yorktown Heights, New York 598 wangchan@usibmcom Sridhar Mahadevan Computer Science Department University of Massachusetts
More informationMetabolic networks: Activity detection and Inference
1 Metabolic networks: Activity detection and Inference Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group Advanced microarray analysis course, Elsinore, Denmark, May 21th,
More informationSimilarity and kernels in machine learning
1/31 Similarity and kernels in machine learning Zalán Bodó Babeş Bolyai University, Cluj-Napoca/Kolozsvár Faculty of Mathematics and Computer Science MACS 2016 Eger, Hungary 2/31 Machine learning Overview
More informationMachine Learning: Basis and Wavelet 김화평 (CSE ) Medical Image computing lab 서진근교수연구실 Haar DWT in 2 levels
Machine Learning: Basis and Wavelet 32 157 146 204 + + + + + - + - 김화평 (CSE ) Medical Image computing lab 서진근교수연구실 7 22 38 191 17 83 188 211 71 167 194 207 135 46 40-17 18 42 20 44 31 7 13-32 + + - - +
More informationLecture: Some Practical Considerations (3 of 4)
Stat260/CS294: Spectral Graph Methods Lecture 14-03/10/2015 Lecture: Some Practical Considerations (3 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough.
More informationThe Curse of Dimensionality for Local Kernel Machines
The Curse of Dimensionality for Local Kernel Machines Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux Dept. IRO, Université de Montréal P.O. Box 6128, Downtown Branch, Montreal, H3C 3J7, Qc, Canada {bengioy,delallea,lerouxni}@iro.umontreal.ca
More informationStable MMPI-2 Scoring: Introduction to Kernel. Extension Techniques
1 Stable MMPI-2 Scoring: Introduction to Kernel Extension Techniques Liberty,E., Almagor,M., Zucker,S., Keller,Y., and Coifman,R.R. Abstract The current study introduces a new technique called Geometric
More informationGraph-Laplacian PCA: Closed-form Solution and Robustness
2013 IEEE Conference on Computer Vision and Pattern Recognition Graph-Laplacian PCA: Closed-form Solution and Robustness Bo Jiang a, Chris Ding b,a, Bin Luo a, Jin Tang a a School of Computer Science and
More informationDetecting Sparse Structures in Data in Sub-Linear Time: A group testing approach
Detecting Sparse Structures in Data in Sub-Linear Time: A group testing approach Boaz Nadler The Weizmann Institute of Science Israel Joint works with Inbal Horev, Ronen Basri, Meirav Galun and Ery Arias-Castro
More informationAn Iterated Graph Laplacian Approach for Ranking on Manifolds
An Iterated Graph Laplacian Approach for Ranking on Manifolds Xueyuan Zhou Department of Computer Science University of Chicago Chicago, IL zhouxy@cs.uchicago.edu Mikhail Belkin Department of Computer
More informationValue Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA 13 mahadeva@cs.umass.edu Mauro
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationClustering in kernel embedding spaces and organization of documents
Clustering in kernel embedding spaces and organization of documents Stéphane Lafon Collaborators: Raphy Coifman (Yale), Yosi Keller (Yale), Ioannis G. Kevrekidis (Princeton), Ann B. Lee (CMU), Boaz Nadler
More informationAppendix to: On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Appendix to: On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts Hariharan Narayanan Department of Computer Science University of Chicago Chicago IL 6637 hari@cs.uchicago.edu
More informationTreelets An Adaptive Multi-Scale Basis for Sparse Unordered Data
Treelets An Adaptive Multi-Scale Basis for Sparse Unordered Data arxiv:0707.0481v1 [stat.me] 3 Jul 2007 Ann B Lee, Boaz Nadler, and Larry Wasserman Department of Statistics (ABL, LW) Carnegie Mellon University
More informationL 2,1 Norm and its Applications
L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.
More information