Diffusion Wavelets. Ronald R. Coifman, Mauro Maggioni

Program in Applied Mathematics, Department of Mathematics, Yale University, New Haven, CT, U.S.A.

Abstract

We present a multiresolution construction for efficiently computing, compressing and applying large powers of operators that have high powers with low numerical rank. This allows the fast computation of functions of the operator, notably the associated Green's function, in compressed form, and their fast application. Classes of operators satisfying these conditions include diffusion-like operators, in any dimension, on manifolds, graphs, and in non-homogeneous media. In this case our construction can be viewed as a far-reaching generalization of Fast Multipole Methods, achieved through a different point of view, and of the non-standard wavelet representation of Calderón-Zygmund and pseudodifferential operators, achieved through a different multiresolution analysis adapted to the operator. We show how the dyadic powers of an operator can be used to induce a multiresolution analysis, as in classical Littlewood-Paley and wavelet theory, and we show how to construct, with fast and stable algorithms, scaling function and wavelet bases associated to this multiresolution analysis, together with the corresponding downsampling operators, and how to use them to compress the corresponding powers of the operator. This allows us to extend multiscale signal processing to general spaces (such as manifolds and graphs) in a very natural way, with corresponding fast algorithms.

Key words: Wavelets, Wavelets on Manifolds, Wavelets on Graphs, Heat Diffusion, Laplace-Beltrami Operator, Diffusion Semigroups, Fast Multipole Method, Matrix Compression, Spectral Graph Theory.

Email addresses: coifman@math.yale.edu (Ronald R. Coifman), mauro.maggioni@yale.edu (Mauro Maggioni).

Preprint submitted to Elsevier Science, 7 August 2004

1 Introduction

We introduce a multiresolution geometric construction for the efficient computation of high powers of local operators (in time O(n log n) for several classes of operators, where n is the cardinality of the space). The operators under consideration are positive contractions that have high powers with low numerical rank. This organization of the geometry of a matrix representing T enables fast computation of functions of the operator, notably the associated Green's function, in compressed form. Classes of operators satisfying these conditions include discretizations of differential operators, in any dimension, on manifolds, and in non-homogeneous media. Our construction can be viewed as an extension of Fast Multipole Methods [], and of the non-standard wavelet form for Calderón-Zygmund integral operators and pseudo-differential operators of []. Unlike the integral equation approach, we start from the generator T of the semigroup associated to a differential operator, rather than from the Green operator. For most of the examples above, the (dyadic) powers of this operator decrease in rank, suggesting the compression of the function (and geometric) space upon which each power acts.

The scheme we propose consists in the following: apply T to a space of test functions at the finest scale, compress the range via a local orthogonalization procedure, represent T in the compressed range, compute T², compress and orthogonalize, and so on: at scale j we obtain a compressed representation of T^{2^j}, acting on the range of T^{2^j − 1}, for which we have a (compressed) orthonormal basis; we then apply T^{2^j}, locally orthogonalize and compress the result, thus obtaining the next coarser subspace.

The computation of the inverse Laplacian (I − T)^{−1} (on the complement of the eigenspace corresponding to the eigenvalue 1, of course) can be carried out in compressed form via the Schultz method []: we have

(I − T)^{−1} f = Σ_{k=0}^{+∞} T^k f,

which implies, letting S_K = Σ_{k=0}^{2^K − 1} T^k, that

S_{K+1} f = S_K f + T^{2^K} S_K f = Π_{k=0}^{K} (I + T^{2^k}) f.   (1.1)

Since we can apply each factor T^{2^k} fast to any function f, and hence compute the product S_{K+1} f, we can apply (I − T)^{−1} to any function f fast, in time O(n log n). Moreover, since this construction adapts to the natural geometry induced by the operator, it is very promising in view of applications to the problem of homogenization of linear diffusion partial differential operators with non-constant coefficients.
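Since no code accompanies the text, the following is a minimal numpy sketch (ours) of the Schultz iteration (1.1); it uses dense matrices where the actual construction applies each dyadic power in compressed form, and the function name is our own.

import numpy as np

def apply_inverse_laplacian(T, f, K=20):
    """Apply (I - T)^{-1} to f via (1.1): S_{K+1} f = prod_k (I + T^{2^k}) f.
    Dense illustration only; assumes the eigenvalue-1 component has been
    projected out of f."""
    g = np.asarray(f, dtype=float).copy()
    P = np.asarray(T, dtype=float)      # P holds T^{2^k}
    for _ in range(K):
        g = g + P @ g                   # multiply by (I + T^{2^k})
        P = P @ P                       # square: T^{2^{k+1}}
    return g

After K steps this sums the first 2^K terms of the Neumann series, so K is chosen according to the spectral gap below 1.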

The interplay between the geometry of sets, function spaces on sets, and operators on sets is of course classical in Harmonic Analysis. Our construction views the columns of a matrix representing T as data points in Euclidean space, which are then viewed as lying on a manifold, for which the first few eigenvectors of T provide coordinates (see [3,4]). The spectral theory of T provides a Fourier Analysis on this set, relating our approach to multiscale geometric analysis and multiscale Fourier and wavelet analysis. The action of a given diffusion semigroup on the space of functions on the set is analyzed in a multiresolution fashion, where (dyadic powers of) the diffusion operator corresponds to dilations, and projections (in general non-orthogonal) correspond to downsampling. The localization of the scaling functions we construct then allows us to reinterpret these operations in function space in a geometric fashion. This mathematical construction has a numerical implementation which is fast and stable. We thus get a fast construction of multiresolution analyses on rather general spaces, with respect to the dilations induced by a diffusion semigroup, and fast algorithms for computing the corresponding transform.

The framework we consider in this work includes at once large classes of manifolds, graphs, and spaces of homogeneous type, and can be extended even further. Given a metric measure space and a symmetric diffusion semigroup on it, there is a natural multiresolution analysis associated to it, and an associated Littlewood-Paley theory for the decomposition of the natural function spaces on the metric space. These ideas are classical (see [5] for a specific instance in which the diffusion semigroup is center-stage; the literature on multiscale decompositions in general settings is vast, see e.g. [6] and references therein). Generalized Heisenberg principles exist [] that guarantee that eigenspaces are well approximated by scaling function spaces at the appropriate scale. Our work shows that scaling functions and wavelets can be constructed, and that fast numerical algorithms allowing the implementation of these ideas exist. In effect, this construction allows us to lift multiscale signal processing techniques to datasets, for compression and denoising of functions and operators on a dataset (and of the data set itself), and for approximation and learning. It also allows the introduction of dictionaries of diffusion wavelet packets [], with the corresponding fast algorithms for best basis selection.

Material related to this paper, such as examples, and Matlab scripts for their generation, will be made available on the web page at yale.edu/~mmm8/diffusionwavelets.html.

Note. Because of the scope of this article and because of space constraints, it is not possible to list all the references to all the works related to the motivations, techniques, and applications of the constructions and algorithms introduced in this paper. We provide references to the most relevant works, but many others had to be omitted.

2 The construction in the finite-dimensional discrete setting

In this section we present a particular case of our construction in a finite-dimensional, purely discrete setting, for which only finite-dimensional linear algebra is needed. We refer the reader to the diagram in Figure 2 for a visual representation of the scheme presented.

Fig. 1. Spectra of powers of T and corresponding multiscale eigenspace decomposition.

We consider a finite graph X and a symmetric, positive definite, positivity-preserving diffusion operator T on (functions on) X (for a precise definition and discussion of these hypotheses see Section 4). Without loss of generality, we can assume that ||T|| ≤ 1. For example:

(i) The graph could represent a metric space in which points are data (e.g. documents) and edges have weights (e.g. a function of the similarity between documents), and I − T could be a Laplacian on X that induces a natural diffusion process. T is then similar to a Markov matrix P representing the natural random walk on the graph [3].

(ii) X could represent the discretization of a domain and T = e^{−Δ}, where Δ is a partial differential operator such as a Laplacian on the domain, or on a manifold with smooth boundary.

Our main assumptions are that T is local, i.e. T(δ_k), where δ_k is the Dirac δ-function at k ∈ X, has small support, and that high powers of T have low numerical rank. Ideally there exists a γ < 1 such that for every j we have rk_ε(T^{2j}) < γ rk_ε(T^j), where rk_ε denotes the ε-numerical rank as in Definition 6.

We want to compute and describe efficiently the powers T^{2^j}, for j > 0, which describe the long-term behavior of the diffusion. This will allow the computation of functions of the operator in compressed form (notably of the Green's function (I − T)^{−1}), as well as the fast computation of the diffusion from any initial conditions. This is of course of great interest in the solution of discretized partial differential equations and of Markov chains, but also in learning and related classification problems.

Fig. 2. Diagram for downsampling, orthogonalization and operator compression. (All triangles are commutative by construction.)

The reason why one expects to be able to compress high powers of T is that they are low rank by assumption, so that it should be possible to represent them efficiently in an appropriate basis, at the appropriate resolution. From the analyst's perspective, these high powers are smooth functions with small gradient (even, in a sense, band-limited with small band), hence compressible.

We start by fixing a precision ε > 0, assume that T is self-adjoint and represented on the basis Φ_0 = {δ_k}_{k∈X}, and consider the columns of T, which can be interpreted as the set of functions Φ̃_1 = {Tδ_k}_{k∈X} on X. We use a local multiscale Gram-Schmidt procedure, which is a linear transformation we represent by a matrix G_0, to carefully orthonormalize these columns into a basis Φ_1 = {φ_{1,k}}_{k∈X_1} (X_1 is defined as this index set) for the range of T up to precision ε. This yields a subspace that we denote by V_1. Essentially Φ_1 is a basis for a subspace which is ε-close to the range of T, with basis elements that are well-localized. Moreover, the elements of Φ_1 are coarser than the elements of Φ_0, since they are the result of applying the dilation T once. Obviously |X_1| ≤ |X|, but the inequality may already be strict, since part of the numerical range of T may be below the precision ε. Whether this is the case or not, we then have a map M_0 from X to X_1, which is the composition of T with the orthonormalization by G_0. We can also represent T² in the basis Φ_1: we denote this matrix by T_1 and compute T_1 = M_0 M_0^T.

We proceed now by looking at the columns of T_1, which are Φ̃_2 = {T_1 δ_k}_{k∈X_1}, i.e., unravelling the bases on which this is happening, {T² φ_{1,k}}_{k∈X_1}, up to the precision ε. Again we can apply a local Gram-Schmidt procedure to orthonormalize this set: this yields a matrix G_1 and an orthonormal basis Φ_2 = {φ_{2,k}}_{k∈X_2} for the range of T_1 up to precision ε, and hence for the range of T³ up to precision 2ε. Moreover, depending on the decay of the spectrum of T, |X_2| is in general a fraction of |X_1|. The matrix M_1, which is the composition of G_1 with T_1, is then of size |X_2| × |X_1|, and T_2 = M_1 M_1^T is a representation of T⁴ acting on Φ_2.
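No implementation is given in the text, so the following dense sketch (ours) of this compression loop is purely illustrative: it truncates with an SVD where the paper uses the local multiscale Gram-Schmidt of Section 5, so the bases it produces are not localized, and the function name is our own.

import numpy as np

def build_diffusion_mra(T, n_levels, eps=1e-8):
    """Sketch of the loop of Section 2: at each level, orthonormalize the
    range of T_j up to precision eps, set M_j = G_j T_j, and form
    T_{j+1} = M_j M_j^T, a representation of T^{2^{j+1}}."""
    Tj = np.asarray(T, dtype=float)
    Ms, Ts = [], [Tj]
    for _ in range(n_levels):
        U, s, _ = np.linalg.svd(Tj)
        rank = int(np.sum(s > eps))     # eps-numerical rank of T_j
        if rank == 0:
            break
        G = U[:, :rank].T               # rows: orthonormal basis of ran(T_j)
        M = G @ Tj                      # M_j, of size |X_{j+1}| x |X_j|
        Tj = M @ M.T                    # T_{j+1} = M_j M_j^T
        Ms.append(M)
        Ts.append(Tj)
    return Ms, Ts

For a diffusion operator with fast spectral decay the sizes |X_j| shrink geometrically, mirroring the simultaneous compression of the operator and of the space.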

After j steps in this fashion, we will have a representation T_j of T^{2^j} on a basis Φ_j = {φ_{j,k}}_{k∈X_j}. Depending on the decay of the spectrum of T, we expect |X_j| << |X|; in fact, in the ideal situation, the spectrum of T decays fast enough so that there exists γ < 1 such that |X_j| < γ^{j+1} |X|. While Φ_j is naturally identified with the set of Dirac δ-functions on X_j, we can extend these functions, living on the compressed (or downsampled) graph X_j, to the whole initial graph X by writing Φ_j = M_{j−1} ··· M_0 Φ_0: since every function in Φ_0 is defined on X, so is every function in Φ_j. Hence any function on the compressed space X_j can be extended naturally to the whole of X. In particular, one can compute low-frequency eigenfunctions on X_j, and then extend them to the whole of X. The elements in Φ_j live at scale 2^{j+1}, and are much coarser, and smoother, than the initial elements in Φ_0, which is why they can be represented in compressed form. The projection of a function onto the subspace spanned by Φ_j will be, by definition, an approximation to that function at that particular scale.

There is an associated fast scaling function transform: suppose we are given f on X and want to compute <f, φ_{j,k}> for all scales j and corresponding translations k. Being given f means we are given (<f, φ_{0,k}>)_{k∈X}. Then we can compute (<f, φ_{1,k}>)_{k∈X_1} = M_0 (<f, φ_{0,k}>)_{k∈X}, and so on for all scales. The matrices M_j are sparse (since T_j and G_j are), so this computation is fast. This generalizes the classical scaling function transform. We will show later that wavelets can be constructed as well, and that a fast wavelet transform is also possible.

In the same way, any power of T can be applied fast to a function f. In particular, the Green's function (I − T)^{−1} can be applied fast to any function, since it can be represented as a product of dyadic powers of T as in (1.1), each of which can be applied fast.

We are at the same time compressing the powers of the operator T and the space X itself, at essentially the optimal rate at each scale, as dictated by the portion of the spectrum of the powers of T which is above the precision ε. Observe that each point in X_{j+1} can be considered as a local aggregation of points in X_j, which is completely dictated by the action of the operator T on functions on X: the operator itself is dictating the geometry with respect to which it should be analyzed, compressed, or applied to any vector.
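A minimal sketch (ours) of the scaling function transform cascade just described, assuming the list Ms = [M_0, M_1, ...] produced by a construction like the one sketched earlier:

import numpy as np

def scaling_function_transform(f0, Ms):
    """Given fine-scale coefficients f0 = (<f, phi_{0,k}>)_k and the
    (sparse, in practice) maps M_j, return coefficients at every scale."""
    coeffs = [np.asarray(f0, dtype=float)]
    for M in Ms:
        # (<f, phi_{j+1,k}>)_k = M_j (<f, phi_{j,k}>)_k
        coeffs.append(M @ coeffs[-1])
    return coeffs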

3 Notation and Definitions

To present our construction in some generality, we will need the following definitions and notation.

Definition 1 Let X be a set. A function d : X × X → [0, +∞) is called a quasi-metric if

(i) d(x, y) ≥ 0 for every x, y ∈ X, with equality if and only if x = y;
(ii) d(x, y) = d(y, x) for every x, y ∈ X;
(iii) there exists C > 0 such that d(x, y) ≤ C(d(x, z) + d(z, y)) for every x, y, z ∈ X (quasi-triangle inequality).

The pair (X, d) is called a quasi-metric space. (X, d) is a metric space if one can choose C = 1 in (iii).

Example 2 A weighted undirected connected graph (G, E, W), where W is the set of positive weights on the edges in E, is a metric space when the distance is defined by

d(x, y) = inf_{γ_{x,y}} Σ_{e ∈ γ_{x,y}} w_e,

where γ_{x,y} is a path connecting x and y; see the sketch after this section's definitions for a small computational illustration. Often in applications a measure on (the vertices of) G is either uniform or specified by some connectivity properties of each vertex (e.g. the sum of the weights of the edges concurrent in each vertex).

Let (X, d, µ) be a quasi-metric space with Borel measure µ. For x ∈ X, δ > 0, let B_δ(x) = {y ∈ X : d(x, y) < δ} be the open ball of radius δ around x. For a subset S of X, we will use the notation N_δ(S) = {x ∈ X : ∃ y ∈ S : d(x, y) < δ} for the δ-neighborhood of S and, for δ' < δ, we let N_{δ',δ}(S) = N_δ \ N_{δ'}.

Since we would like to handle finite, discrete and continuous situations in a unified framework, and since many finite-dimensional discrete problems arise from the discretization of continuous and infinite-dimensional problems, we introduce the following definitions.

Definition 3 (Coifman and Weiss) A quasi-metric measure space (X, d, µ) is said to be of homogeneous type [6] if µ is a non-negative Borel measure and there exists a constant C_X > 0 such that for every x ∈ X, δ > 0,

µ(B_{2δ}(x)) ≤ C_X µ(B_δ(x)).   (3.1)

We assume µ(B_δ(x)) < ∞ for all x ∈ X, δ > 0, and we will work on connected spaces X. In the continuous situation, we will assume that µ({x}) = 0 for every x ∈ X.
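To make Example 2 concrete, here is a small self-contained sketch (ours, using scipy's existing shortest-path routine) of the path metric d(x, y) on a weighted graph:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Symmetric weight matrix of a 4-vertex weighted graph; zero means no edge.
W = np.array([[0, 1, 0, 0],
              [1, 0, 2, 0],
              [0, 2, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# d(x, y) = infimum over paths of the sum of edge weights (Example 2).
d = shortest_path(csr_matrix(W), method="D", directed=False)
print(d[0, 3])  # 4.0: the path 0-1-2-3 has total weight 1 + 2 + 1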

One can replace the quasi-metric d with an equivalent quasi-metric ρ so that the δ-balls in the ρ metric have measure approximately δ: it is enough to define ρ(x, y) as the measure of the smallest ball containing x and y. One can also assume some Hölder-smoothness for ρ, in the sense that

|ρ(x, y) − ρ(x', y)| ≤ c ρ(x, x')^β [ρ(x, y) + ρ(x', y)]^{1−β}

for some β ∈ (0, 1), c > 0; see [7,8,5,6].

Example 4 Examples of spaces of homogeneous type include:

(i) Euclidean spaces of any dimension, with isotropic or anisotropic metrics induced by positive-definite bilinear forms and their powers (see e.g. [7]).
(ii) Compact Riemannian manifolds with respect to the geodesic metric, or also with respect to metrics induced by certain classes of vector fields [8].
(iii) Graphs with bounded degree (the degree of a vertex is defined as d_x = Σ_{y∼x} w_{yx}, where y ∼ x means y is connected to x), in particular regular graphs.

Definition 5 A family of functions {φ_k}_k on (X, d) is called δ-local, for some δ > 0, if

sup_k diam(supp φ_k) < δ,

i.e. there exists a set {x_k}_k ⊆ X, called a supporting set for {φ_k}_k, such that for every k we have supp φ_k ⊆ B_δ(x_k).

Notation If V is a closed linear subspace of L²(X, µ), we denote the orthogonal projection onto V by P_V. The closure of any subspace V is taken in L²(X, µ), unless otherwise specified.

Definition 6 Two subspaces V and W are ε-close if there exists a linear isomorphism L with ||I − L|| < ε mapping V onto W. An ε-numerical span of a set of vectors {φ_k}_k in L² is defined as a closed subspace which is ε-close to the closure of the span of {φ_k}_k. An ε-numerical range of an operator T is a subspace which is ε-close to the range of T. The definition of numerical kernel is analogous.

Notation If L is a self-adjoint bounded operator on L²(X, µ), with spectrum σ_L and spectral decomposition

L = ∫_{σ_L} λ dE_λ,

we define

L_ε = ∫_{{λ ∈ σ_L : |λ| > ε}} λ dE_λ.

4 Multiresolution Analysis induced by symmetric diffusion semigroups

4.1 Symmetric diffusion semigroups

We start from the following definition from [5].

Definition 7 Let {T^t}_{t ∈ [0,+∞)} be a family of operators on (X, µ), each mapping L²(X, µ) into itself. Suppose this family is a semigroup, i.e. T^0 = I and T^{t_1+t_2} = T^{t_1} T^{t_2} for any t_1, t_2 ∈ [0, +∞), and lim_{t→0+} T^t f = f in L²(X, µ) for any f ∈ L²(X, µ). Such a semigroup is called a symmetric diffusion semigroup if it satisfies the following:

(i) ||T^t||_p ≤ 1 for every 1 ≤ p ≤ +∞ (contraction property).
(ii) Each T^t is self-adjoint (symmetry property).
(iii) T^t is positivity preserving: T^t f ≥ 0 for every f ≥ 0 in L²(X) (positivity property).
(iv) The semigroup has a generator Δ, so that T^t = e^{tΔ}.   (4.1)

While some of these assumptions are not strictly necessary, their adoption simplifies this presentation without reducing the types of applications we are interested in at this time.

We will denote by σ_T the spectrum of T (so that {λ^t}_{λ∈σ_T} is the spectrum of T^t), and by {ξ_λ}_{λ∈σ_T} the corresponding basis (orthogonal since T^t is normal) of eigenvectors, normalized in L² (without loss of generality). Observe that here and in all that follows we use an abuse of notation that implicitly accounts for possible multiplicities of the eigenvalues.

Remark 8 Observe that σ_T ⊆ [0, 1]: by the semigroup property and (ii), we have

T = T^{1/2} T^{1/2} = (T^{1/2})* T^{1/2},

so the spectrum is non-negative. The upper bound obviously follows from condition (i).
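As a numerical sanity check on Definition 7 and Remark 8, here is a small sketch (ours) using a graph heat semigroup T = e^{−L}, anticipating the graph examples in the list below:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2   # symmetric edge weights
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W              # graph Laplacian, L >= 0
T = expm(-L)                                # candidate diffusion operator

eigs = np.linalg.eigvalsh(T)
print(eigs.min() > 0, eigs.max() <= 1 + 1e-12)   # sigma_T inside (0, 1]
f = rng.random(6)
print((T @ f).min() >= 0)                        # positivity preserving on f >= 0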

Remark 9 There are classes of diffusion operators that are not self-adjoint but are very important in applications. The construction and the algorithm we propose do not depend on this hypothesis; however, the interpretation of many results does depend on the spectral decomposition. In the paper [9] we will address these and related issues in a broader framework.

Example 10 Examples include:

(a) The Poisson semigroup, for example on the circle or half-space, or anisotropic and/or higher-dimensional anisotropic versions (e.g. [7]).

(b) The random walk diffusion induced by a symmetric Markov chain (on a graph, a manifold, etc.).

(c) The semigroup T^t = e^{−tL} generated by a second-order differential operator on some interval (a, b) (a and/or b possibly infinite), of the form

L f = a_2(x) d²f/dx² + a_1(x) df/dx + a_0(x) f(x),

with a_2(x) > 0 and a_0(x) ≤ 0, acting on a subspace of L²((a, b), q(x)dx), where q is an appropriate weight function, given by imposing appropriate boundary conditions, so that L is (unbounded) self-adjoint. Conditions (i) to (iii) are satisfied, and (iv) is satisfied if a_0 = 0. This extends to R^n by considering elliptic partial differential operators of the form

L f = (1/w(x)) Σ_{i,j=1}^{n} ∂/∂x_i ( a_{ij}(x) ∂f/∂x_j ) + c(x) f,

where we assume c(x) ≤ 0 and w(x) > 0, and we consider this operator as defined on a smooth manifold. L, applied to functions with appropriate boundary conditions, is formally self-adjoint and generates a semigroup satisfying (i) to (iii), and (iv) is satisfied if c = 0. An important case is the Laplace-Beltrami operator on a compact smooth manifold (or even on a Lie group), and subordinated operators [5].

(f) If (X, d, µ) is derived from a graph (G, W) as described above in Example 2, one can define D_i = Σ_j W_{ij}; then the matrix D^{−1} W is a Markov matrix, which corresponds to the natural random walk on G. The operator L = D^{−1/2}(D − W)D^{−1/2} is the normalized Laplacian on graphs; it is self-adjoint, and I − L is a self-adjoint contraction on L²(G). This discrete setting is extremely useful in applications and is widely used in a number of fields such as data analysis (e.g. clustering, learning on manifolds, parametrization of data sets), computer vision (e.g. segmentation) and computer science (e.g. network design, distributed processing). We believe our construction will have applications in all these settings. As a starting point, see for example [3] for an overview, and [] for an application to image segmentation.

(g) One-parameter groups (continuous or discrete) of dilations in R^n, or other motion groups (e.g. the Galilei group or the Heisenberg-type groups), act on square-integrable functions on these spaces (with the appropriate measures) as isometries or contractions.

Continuous wavelets in some of these settings have been studied (see e.g. [] and references therein), but an efficient discretization of such transforms seems still an open problem (see however []).

(h) The random walk diffusion on a hypergroup (see e.g. [3] and references therein).

Definition 11 A positive diffusion semigroup of operators acts η-regularly, for some η > 1, if for every x and every smooth test function φ that is δ-local around x, the function Tφ is ηδ-local around x.

4.2 Diffusion metrics and embeddings induced by a diffusion semigroup

The material in this section and its applications are discussed in greater detail in [4,3] and references therein. From now on, in all that follows, we shall assume, mainly for simplicity, that T is compact. Being positive definite, T^t induces the diffusion metric

d^{(t)}(x, y)² = Σ_{λ∈σ_T} λ^t (ξ_λ(x) − ξ_λ(y))² = <δ_x − δ_y, T^t(δ_x − δ_y)> = ||T^{t/2} δ_x − T^{t/2} δ_y||².   (4.2)

Definition 12 The metric defined above is called the diffusion metric associated to T, at time t.

If the action of T^t on L²(X, µ) can be represented by a (symmetric) kernel K_t(x, y), then d^{(t)} can be written as

d^{(t)}(x, y)² = K_t(x, x) + K_t(y, y) − 2 K_t(x, y),   (4.3)

since the spectral decomposition of K is

K_t(x, y) = Σ_{λ∈σ_T} λ^t ξ_λ(x) ξ_λ(y).   (4.4)

For any subset σ'_T ⊆ σ_T we can consider the map of metric spaces

Φ^{(t)}_{σ'_T} : (X, d^{(t)}) → (R^{|σ'_T|}, d_{Euc}),   x ↦ (λ^{t/2} ξ_λ(x))_{λ∈σ'_T},   (4.5)

which in particular cases is called an eigenmap [5,6], and is a form of local multidimensional scaling (see also [3,4]). By the definition of d^{(t)}, this map is an isometry when σ'_T = σ_T, and an approximation to an isometry when σ'_T ⊊ σ_T. If σ'_T is the set of the first n top eigenvalues, and if ∫_X K(x, y) dµ = 1, then Φ_{σ'_T} is a minimum local distortion map from (X, d^{(t)}) to R^n, in the sense that it minimizes

tr(P_{{φ_λ}_{λ∈σ'_T}} K^{(t)} P_{{φ_λ}_{λ∈σ'_T}}) = ∫_X ∫_X Σ_{λ∈σ'_T} (φ_λ(x) − φ_λ(y))² K^{(t)}(x, y) dµ(x) dµ(y)   (4.6)

among all possible maps x ↦ (φ_1(x), ..., φ_n(x)) such that <φ_i, φ_j> = δ_{ij}. This is just a rephrasing, in our situation, of the well-known fact that the top k singular vectors span the best approximating k-dimensional subspace to the domain of a linear operator, in the L² sense. See e.g. [7] and references therein for a comparison between different dimensionality reduction techniques that can be cast in this framework. These ideas are related to various techniques used for nonlinear dimension reduction: see for example [8,3,4] and references therein. Observe that for any fixed precision ε, one can find σ'_T so that the eigenfunction expansion (4.2) of d^{(t)} can be truncated to σ'_T with approximation error smaller than ε.

Finally, we observe that the semigroup generates not only a family of metrics on X, but a family of measures as well. We simply consider the σ-algebra A^{(t)} generated by the family of sets

S^{(t)} = {B^{(t)}_δ(x) := supp T^t(χ_{B_δ(x)}), x ∈ X, δ > 0}

(where χ_A denotes the characteristic function of a set A), and define µ^{(t)}(B^{(t)}_δ(x)) = µ(B_δ(x)) on S^{(t)} and extend it to A^{(t)}.

Example 13 Suppose we have a symmetric Markov chain M on (X, µ), which we can assume irreducible, with transition matrix P. We assume P is positive definite (otherwise we could consider (1/2)(I + P)). Let {λ_l}_l be the set of eigenvalues of P, ordered by increasing value, and {ξ_l}_l be the corresponding set of right eigenvectors of P, which form an orthonormal basis. Then by functional calculus we have

P^t_{i,j} = Σ_l λ^t_l ξ_l(i) ξ_l(j),

and for an initial distribution f on X we define

P^t f = Σ_l λ^t_l <f, ξ_l> ξ_l.   (4.7)

Observe that if f is a probability distribution, so is P^t f for every t, since P is Markov. The diffusion map Φ_{σ'_T} embeds the Markov chain in Euclidean space R^{|σ'_T|} in such a way that the diffusion metric induced by M on X becomes the Euclidean distance in R^{|σ'_T|}.

4.3 Construction of scaling functions and wavelets

We can interpret T as a dilation operator acting on functions in L²(X, µ), and use it to define a multiresolution structure. As in [5] and in classical wavelet theory, we start by discretizing the semigroup at the increasing sequence of times

t_j = Σ_{l=0}^{j} 2^l = 2^{j+1} − 1.   (4.8)

Let {λ_i}_i be the spectrum of T, ordered by decreasing value, and {ξ_i}_i the corresponding eigenvectors. We can define low-pass portions of the spectrum by letting

σ_{T,j} = {λ ∈ σ_T : λ^{t_j} ≥ ε}.   (4.9)

For a fixed ε ∈ (0, 1) (which we may think of as our precision), we define the (finite-dimensional!) approximation spaces by

V_j = <{ξ_λ : λ ∈ σ_T, λ^{t_j} ≥ ε}> = <{ξ_λ : λ ∈ σ_{T,j}}>.   (4.10)

The set of subspaces {V_j}_{j∈Z} is a multiresolution analysis in the sense that it satisfies the following properties:

(i) lim_{j→−∞} V_j = Ran(T), lim_{j→+∞} V_j = <{ξ_i : λ_i = 1}>;
(ii) V_{j+1} ⊆ V_j for every j ∈ Z;
(iii) {ξ_λ : λ^{t_j} ≥ ε} is an orthonormal basis for V_j.

We can also define, for j ≥ 0, the subspaces W_j as the orthogonal complement of V_{j+1} in V_j, so that we have the familiar relation between approximation and detail subspaces of the classical wavelet multiresolution constructions:

V_j = V_{j+1} ⊕ W_j.   (4.11)
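A toy calculation (ours) of how quickly dim V_j = |σ_{T,j}| shrinks under (4.8)-(4.10), for a synthetic spectrum:

import numpy as np

eps = 1e-6
lam = np.linspace(1.0, 0.0, 1024, endpoint=False)   # toy spectrum of T in (0, 1]
for j in range(8):
    t_j = 2 ** (j + 1) - 1                    # the times (4.8)
    dim_Vj = int(np.sum(lam ** t_j >= eps))   # |sigma_{T,j}|, see (4.9)-(4.10)
    print(j, t_j, dim_Vj)

Only the eigenvalues very close to 1 survive raising to the power t_j, so the dimensions drop rapidly with j.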

The direct orthogonal sum

L²(X) = ⊕_{j≥0} W_j

is a wavelet decomposition of the space, induced by the diffusion semigroup, and related to the Littlewood-Paley decomposition studied in this setting by Stein [5]. While by definition (4.10) we already have an orthonormal basis of eigenfunctions of T for each subspace V_j (and for the subspaces W_j as well), these basis functions are in general highly non-localized, being generalized global Fourier modes of the operator. Our aim is to build localized bases for each of these subspaces, starting from a basis of the finest subspace V_0 and explicitly constructing a downsampling scheme that yields an orthonormal basis for each V_j, j > 0. This is motivated by generalized Heisenberg principles (see e.g. [33] for a setting similar to ours) that guarantee that eigenfunctions have a smoothness, or frequency content, or scale, determined by the corresponding eigenvalues, and can be reconstructed from maximally localized bump functions, or atoms, at that scale. Such ideas have many applications in numerical analysis, especially to problems motivated by physics (multigrid techniques [34,35], matrix compression [36-39], etc.), and are in general behind many techniques for multiscale compression. We avoid the computation of the eigenfunctions; nevertheless, the approximation spaces Ṽ_j that we build will be roughly ε-approximations of the true V_j's.

We propose two ways of constructing the scaling functions in the multiresolution analysis. One uses the following:

Proposition 14 (Multiscale local orthonormalization) Let Φ̃ be a δ-local set of functions on X spanning a subspace V, S a supporting set for Φ̃, and let ε > 0. Then there exists an orthonormal basis

Φ = {Φ_l}_{l=0,...,L} = {{φ_{l,k}}_{k∈K_l}}_{l=0,...,L}

of V such that:

(a) for each l ∈ {0,...,L}, the family {φ_{l,k}}_{k∈K_l} is (2^l+1)δ-local and has a supporting set S_l = {x_{l,k}}_{k∈K_l} ⊆ S that is a (2^l+1)δ-lattice. In particular |K_l| ≲ µ(X)/µ(B_{2^l δ});

(b) the linear map G from Φ̃ to Φ is local and sparse;

(c) there exists a monotonic function L(·) such that for each l, {Φ_{l'}}_{l'=0,...,L(l)} spans a subspace which is ε-close to the subspace spanned by the top l singular vectors of Φ̃.

We postpone the proof of this proposition, and comments on its numerical aspects and implementation, until Section 5. Here we would like to remark that, under rather general assumptions on Φ̃ and geometric assumptions on its supporting set S, one can prove that the constructed Φ have exponential decay.

For the second orthonormalization technique, which guarantees good bounds on the decay of the orthonormal functions we build, we will need the following definition from [6].

Definition 15 A matrix (B_{j,k})_{(j,k)∈J×J} is called η-accretive if there exists an η > 0 such that for every ξ ∈ l²(J) we have

Re Σ_{j,k} B_{jk} ξ_j ξ̄_k ≥ η ||ξ||²_{l²(J)}.

In [6, Proposition 3 in Section 2.4], Coifman and Meyer prove a result suggesting the following

Proposition 16 Let (X, d, µ) be a space of homogeneous type, and let Φ̃ = {φ̃_j}_{j∈J} be a countable, δ-local Riesz basis of L²(X, µ), with supporting set S = {s_j}_{j∈J}. If the Gramian matrix G_{ij} = <φ̃_i, φ̃_j> is η-accretive and there exist C, α > 0 such that

|G_{i,j}| ≤ C e^{−α d(s_i, s_j)},

then there exist C', β > 0 and an orthonormal basis {φ_j}_{j∈J} such that

φ_j = Σ_{k∈J} γ(j, k) φ̃_k,  with |γ(j, k)| ≤ C' e^{−β d(s_j, s_k)}.

The proof proceeds exactly as in [6], with the triangle inequality on Z^n replaced by the quasi-triangle inequality for the metric d.

Definition 17 Let (X, ρ, µ) be a space of homogeneous type, and γ > 0. A subset {x_i}_i of X is a γ-lattice if ρ(x_i, x_j) > cγ for all i ≠ j, and for every y ∈ X there exists i(y) such that ρ(y, x_{i(y)}) < cγ. Here c depends only on the constant C_X of the space of homogeneous type.

We first present our construction in general.

Theorem 18 (Construction of the Multiresolution Analysis) Let {T^t} be a symmetric diffusion semigroup acting regularly on X, and let Φ_0 = {φ_{0,k}}_{k∈K_0} be a countable orthonormal δ-local basis with supporting set S_0. Let Ṽ_0 = <Φ_0> and assume Ṽ_0 is ε-close to V_0, for some fixed ε > 0.

Then we can build a set of linear maps {G_j} and {M_j} as in Figure 2, and scaling functions with the following properties:

(I) For every j, let

Φ = {{{φ_{j,l,k}}_{k∈K_{j,l}}}_{l=0,...,L_j}}_{j=0,...,J},   Φ_j = {{φ_{j,l,k}}_{k∈K_{j,l}}}_{l=0,...,L_j},

and Ṽ_j = <Φ_j>. Then Φ_j is an orthonormal basis and Ṽ_j is (j+1)ε-close to V_j.

(II) For every j, the orthonormal basis Φ_j has the following layered structure:

(a) For every l ∈ {0,...,L_j}, |K_{j,l}| ≲ µ(X)/µ(B^{(t_j)}_{2^l δ}).

(b) For every l ≤ L_j, with respect to the metric d^{(t_j)}, the set {φ_{j,l,k}}_{k∈K_{j,l}} is (2^l+1)δ-local and admits a supporting set S_{j,l} that is a (2^l+1)δ-lattice. In particular, supp φ_{j,l,k} ∩ supp φ_{j,l,k'} = ∅ for k, k' ∈ K_{j,l}, k ≠ k'.

(III) The representation of T^{2^{j+1}} on Φ_j is faithful, in the sense that T^{2^{j+1}} P_{Φ_j} is (j+1)ε-close to T^{2^{j+1}} restricted to Ran(T^{2^{j+1}}).

(IV) The orthonormalization map G_j mapping Φ̃_{j+1} = T^{2^j} Φ_j onto Φ_{j+1}, and the transformation M_j = G_j T^{2^j} mapping Φ_j onto Φ_{j+1}, are local and sparse.

A localized orthonormal basis of wavelets spanning the subspace W_j, defined as in (4.11), can be constructed. Alternatively, by using the orthogonalization of Proposition 16, a family of scaling functions Φ with exponential decay for the same subspaces Ṽ_j can be constructed.

PROOF. We apply T to the given orthonormal basis Φ_0 (dilation step). This yields an ηδ-local family of functions {φ̃_{1,k}}_{k∈K_0} on X whose span is ε-distant from V_1, by definition of V_1, because Ṽ_0 was ε-close to V_0 and T is a contraction. We then apply Proposition 14 or 16 to {φ̃_{1,k}}_{k∈K_0} (downsampling step) to get the orthonormal basis

Φ_1 = {{φ_{1,l,k}}_{k∈K_{1,l}}}_{l=0,...,L_1}

for a subspace Ṽ_1 which is ε-close to <Φ̃_1>, and hence 2ε-close to V_1, with the properties claimed in Theorem 18. At this point we represent (T²)_ε on Φ_1. Observe that only the top L(ε) layers of Φ_1 are needed to express (T²)_ε up to precision ε, because of the properties of the algorithm used to build Φ_1.

Moreover, the subset {{φ_{1,l,k}}_{k∈K_{1,l}}}_{l=0,...,L(ε)} of Φ_1 is (η(2^{L(ε)}+1))δ-local.

By induction, once we have built Φ_j with all the properties claimed, we apply (T^{2^j})_ε to Φ_j, getting a set of functions Φ̃_{j+1}. As before, observe that only the first L(ε) layers of Φ̃_{j+1} will be necessary to express T^{2^{j+1}} to precision ε, and this subset is local. Applying Proposition 14 yields an orthonormal basis Φ_{j+1} with the properties claimed. The map G_j is constructed in the proof of Proposition 14, and is a local orthogonalization map that is sparse, being a multiscale orthogonalization of local functions. The orthogonalization produces an error term of order ε in the distance between Ṽ_j and V_j.

Observe that each orthonormalization could also have been achieved by applying Proposition 16, since at each level we are applying a self-adjoint, strictly positive operator to an orthonormal basis: it is easy to see that the resulting system has an accretive Gramian matrix.

4.4 Downsampling and compression of the powers of T

Let us summarize this construction in terms of linear operators (and corresponding matrix representations), to show how this leads not only to the fast construction of the multiscale analysis and the orthonormal bases of scaling functions, but also to a multi-level compression of the operator at different scales. Figure 2 is again a visual summary of what follows.

By induction, suppose we have constructed the orthonormal scaling functions Φ_j = {{φ_{j,l,k}}_{k∈K_{j,l}}}_{l=0,...,L_j} at scale j, written on the orthonormal basis Φ_{j−1} = {{φ_{j−1,l,k}}_{k∈K_{j−1,l}}}_{l=0,...,L_{j−1}} at scale j−1, and that we compress T^{2^j} by representing it on these two bases in the domain and range respectively. Then we perform two operations: we compute Φ̃_{j+1} = {φ̃_{j+1,l,k}} = {T^{2^j} φ_{j,l,k}} for every k ∈ K_{j,l}, l ∈ {0,...,L_j} (this operation is fast because T^{2^j} is compressed and well-localized after compression), and then use a fast local modified Gram-Schmidt procedure to find the numerical span of {φ̃_{j+1,l,k}} and at the same time orthogonalize them. This is a linear map G_j that yields the orthonormal basis Φ_{j+1} = {φ_{j+1,l,k}}_{k∈K_{j+1,l}}, l ∈ {0,...,L_{j+1}}, naturally represented on the bases Φ_j (in the domain) and Φ_{j+1} (in the range), such that <Φ_{j+1}> is ε-close to <Φ̃_{j+1}>. The map M_j is the composition G_j T^{2^j}, and is thus naturally represented on the bases Φ_j (in the domain) and Φ_{j+1} (in the range).

Observe that for j = 0 we can assume that the operator T is given represented on the basis {φ_{0,k}}_{k∈K_0}; we then represent T^{2^{j+1}} on the basis Φ_{j+1} simply by computing

T_{j+1} = Φ_{j+1} T^{2^{j+1}} Φ_{j+1}^T = M_j M_j^T.   (4.12)

Φ_{j+1} is of size Σ_l |K_{j+1,l}| × Σ_l |K_{j,l}|, so T_{j+1} is square of the corresponding size, and is a compressed version of the operator T^{2^{j+1}} acting on Ṽ_{j+1}.

We can interpret this as having downsampled the subspace V_j at the critical rate for representing, up to precision ε, the action of T^{2^j} on it. This is related to the Heisenberg principle and sampling theory for these operators [], which implies that low-frequency eigenfunctions are not well localized and can be synthesized from coarse (depending on the scale) bump functions. One then expects the existence of quadrature formulas determined by very few samples: we consider this problem of great importance, and it is currently being investigated. Most of the functions {φ_{j,l,k}} are essentially as well localized as is compatible with their being in V_j.

Moreover, because of this localization property, we can interpret this downsampling in function space geometrically. We have the identifications

X_j := {x_{l,k} : x_{l,k} is the center of φ_{j,l,k}} ↔ ∪_l {φ_{j,l,k}}_{k∈K_{j,l}}.   (4.13)

The natural metric on X_j is d^{(t_j)}, which is, by (4.2), the distance between the φ_{j,l,k} in L²(X, µ), and can be computed recursively in each X_j by combining (4.2) with (4.12). This allows us to interpret X_j as a space representing the metric d^{(t_j)} in compressed form.

In our construction we only compute φ_{j,l,k} expanded on the basis {φ_{j−1,l,k}}_{k∈K_{j−1,l}} (this is the role of M_{j−1}); in other words, up to the identifications above, we know φ_{j,l,k} on the downsampled space X_{j−1} only. However, we can extend these functions to X_{j−2}, and recursively all the way down to X_0 = X, just by using the multiscale relations:

φ_{j,l,k}(x) = M_{j−1} φ_{j−1,l,k}(x), x ∈ X_{j−1}
            = M_{j−1} M_{j−2} ··· M_0 φ_{0,l,k}(x) = (Π_{l'=0}^{j−1} M_{l'}) φ_{0,l,k}(x), x ∈ X.   (4.14)

This is of course completely analogous to the standard construction of scaling functions in the Euclidean setting [4]. This formula also immediately generalizes to arbitrary functions in V_j, extending them from X_j to the whole original space X (see for example Figure 6).

The matrix M_j is sparse, since T^{2^j} is expected to be local (in the compressed space X_j) and G_j is sparse (as we show in the Appendix), so that M_j has essentially |X_j| log |X_j| elements above precision (in fact, in many cases of practical interest, only about |X_j|!), at least for j not too large.
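A short sketch (ours) of the extension (4.14): coefficients of a function on the compressed space X_j are pushed back to the original space X with the transposes of the M_l.

import numpy as np

def extend_to_X(c_j, Ms):
    """Extend a function given by coefficients c_j on the compressed space
    X_j to the original space X, via (4.14). Ms = [M_0, ..., M_{j-1}],
    where M_l has shape |X_{l+1}| x |X_l|; each row of M_{j-1} ... M_0
    expresses one scaling function phi_{j,l,k} on X."""
    values = np.asarray(c_j, dtype=float)
    for M in reversed(Ms):
        values = M.T @ values   # from X_{l+1} back down to X_l
    return values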

When j is large, the operator is in general not local anymore, but the space X_j on which it is compressed is expected to be very small. For interesting classes of operators we actually expect to have only O(|X_j|) entries in M_j which are bigger than ε, and thus the algorithms presented will have order O(n). This follows from the observation that the basis functions {{φ_{l,k}}_{k∈K_l}}_{l=0,...,L} by construction essentially span the subspace corresponding to the top singular vectors of Φ̃_{j+1} = T^{2^j} Φ_j, which correspond to the (fast-)decaying singular values of T^{2^j}.

Remark 19 We could have started from Ṽ_0 and then defined Ṽ_j as the result of j steps of our algorithm: in this way we can do without the spectral decomposition of the semigroup. This permits the application of our whole construction and algorithms to the non-self-adjoint case.

Remark 20 (Biorthogonal bases) While at this point we are mainly concerned with the construction of orthonormal bases for the approximation spaces V_j, well-conditioned bases would be just as good for most purposes, and would lead to the construction of stable biorthogonal scaling function bases. This could follow the ideas of dual operators in Coifman's formula on spaces of homogeneous type [43,6]; also, exactly as in the classical biorthogonal wavelet construction [44], we would have two ladders of approximation subspaces, with wavelet subspaces giving the oblique projection onto their corresponding duals. This will be reported in the separate work [9].

4.5 Wavelets

We would like to construct bases {ψ_{j,l,k}}_{k,l} for the spaces W_j, j ≥ 0, such that V_{j+1} ⊕ W_j = V_j. To achieve this, after having built {{φ_{j,l,k}}_{k∈K_{j,l}}}_l and {{φ_{j+1,l,k}}_{k∈K_{j+1,l}}}_l, we can apply our modified Gram-Schmidt procedure with geometric pivoting to the set of functions {(P_j − P_{j+1}) φ_{j,l,k}}_{k∈K_{j,l}}, which will yield an orthonormal basis Ψ_j of wavelets for the orthogonal complement W_j of V_{j+1} in V_j. Moreover, one can easily prove upper bounds for the diameters of the supports of the wavelets so obtained, and for their decay.

4.6 Vanishing moments for the scaling functions and wavelets

In Euclidean settings, vanishing moments are usually defined via orthogonality relations to subspaces of polynomials up to a certain degree. In our setting, the natural subspaces with respect to which to measure orthogonality are the sets of eigenfunctions ξ_λ of T.

So the number of vanishing moments of an orthonormal scaling function φ_{j,l,k} can be defined as the number of eigenfunctions, corresponding to eigenvalues in σ_{T,j} \ σ_{T,j+1} (as defined in (4.9)), to which T^{2^j} φ_{j,l,k} is orthogonal up to precision ε. Observe that this is comparable to defining the number of vanishing moments based on T^{2^{j+1}} φ_{j,l,k}, which evokes classical estimates in the multiscale theory of Calderón-Zygmund operators. In Section 5 we will see that there is an approximately monotonic relationship between the index l of a layer of scaling functions and the number of eigenfunctions to which the scaling functions are orthogonal, i.e. the number of moments of the scaling functions. In particular, the mean of the scaling functions is typically roughly constant for the first few layers, and then quickly drops below precision as l grows: this would allow us to split each scaling function space into two subspaces, one with scaling functions of non-zero mean, and one of scaling functions with zero mean, which in fact could appropriately be called wavelets.

5 The orthogonalization step: computational and numerical considerations

In this section we discuss details of the implementation, and comment on the computational complexity of the algorithms and on their numerical stability.

Suppose we are given a δ-local family Φ̃ = {φ̃_k}_{k∈K} of positive functions on X, and let X̃ = {x̃_k}_{k∈K} be a supporting set for Φ̃. With obvious meaning, we will sometimes write φ̃_{x_k} for φ̃_k. Let Ṽ = <{φ̃_k}_{k∈K}>.

We want to build an orthonormal basis Φ = {φ_k}_k whose span is ε-close to Ṽ. Out of the many possible solutions to this standard problem (see e.g. [45,46] and references therein as a starting point), we seek one for which the φ_k's have small support (ideally of the same order as the support of the φ̃_k's). Standard orthonormalization may in general completely destroy the size of the support of (most of) the φ̃_k's. We suggest two algorithms that try to remedy this problem: the first uses geometric information on the set (hence something outside the scope of linear algebra) to derive a pivoting rule that guarantees that several orthogonal functions have small support; the second uses only the matrix representing T in a local basis, and a nonlinear pivoting rule mixing L² and L^p norms, p < 2.

The computational cost of constructing the multiresolution analysis described in Theorem 18 can be as high as O(n³) in general, where n is the cardinality of X.

However, there are two distinct types of properties of T that allow one to dramatically improve the time necessary for the construction:

I. The decay of σ(T): the faster the spectrum decays, the smaller the numerical rank of the powers of T and the more these can be compressed, and hence the faster the construction of Φ_j for large j.

II. If each basis Φ_j of scaling functions that we build is such that the elements with large support (large layer index l) are in a subspace spanned by eigenvectors of T corresponding to very small eigenvalues (depending on l), then these basis functions of large support will not be needed to compute the next (dyadic) power of T. This implies that the matrices representing all the basis transformations will be uniformly sparse.

A combination of these two hypotheses, which are in fact quite natural and are verified by many operators arising in practice, allows the construction of the whole multiresolution analysis described in Theorem 18 in time O(n log n), or even O(n).

Item (I.) above can only be an assumption. Item (II.), however, is a question about the algorithms from linear algebra used to construct the bases of scaling functions. We want to comment here on several possible choices of algorithms to be used.

(a) The first is called modified Gram-Schmidt with pivoting; it is classical in numerical analysis (see for example [46]).

(b) The second is also based on finite-dimensional linear algebra, and is a refinement of the first. It constructs so-called Rank Revealing QR factorizations. Being purely based on linear algebra, it holds under minimal assumptions on T, and this comes at the expense of computational speed.

(c) The third is based on analysis and geometry, and applies at least if T looks like a Laplacian, in a sense to be made precise below. In this case there are generalized Heisenberg principles, or sampling theorems, that give information on the critical sampling rate necessary to approximate eigenvectors of T, depending on the corresponding eigenvalues. One can then construct the scaling function bases using this information, and in this way guarantee that property (II) is satisfied.

5.1 Modified Gram-Schmidt with pivoting

Given a set of functions Φ̃ on X, and ε > 0, the algorithm Modified Gram-Schmidt with Pivoting computes an orthonormal basis Φ for a subspace ε-close to <Φ̃>, as follows.

1. Let Φ̃_0 = Φ̃, k = 0.
2. Let φ̃ be an element of Φ̃_k with largest L² norm. If ||φ̃||_2 < ε/|Φ̃|^{1/2}, stop; otherwise add φ̃/||φ̃||_2 to Φ.
3. Orthogonalize all the remaining elements of Φ̃_k against φ̃, obtaining a new set Φ̃_{k+1}. Increment k by 1. Go to step 2.

When the algorithm stops, we have obtained an orthonormal basis Φ whose span is ε-close to <Φ̃>. This follows from the fact that the singular values of the residual family Φ̃_k, all of whose elements have L² norm smaller than ε/|Φ̃|^{1/2}, cannot be larger than ε.

This procedure can be shown to be numerically stable, at least when Φ̃ is not too ill-conditioned [46]. When the problem is very ill-conditioned, a loss of orthogonality in Φ may occur while running the algorithm. Re-orthogonalizing the elements already selected in Φ is enough to achieve stability; see [45] for details and references.

The main drawback of this procedure is that the supports of the functions in Φ can be arbitrarily large, even if the supports of the functions in Φ̃ are small and there exists another orthonormal basis for <Φ̃> made of well-localized functions. Since the sizes of the supports of the functions in the new basis are crucial for the speed of the algorithm, we suggest two modifications of modified Gram-Schmidt that seek to obtain orthonormal bases of functions with smaller support.

5.2 Multiscale modified Gram-Schmidt with geometric pivoting

We suggest here to use a fast local Gram-Schmidt orthogonalization with geometric pivoting, which provides some guarantees on the size of the support of the orthonormal functions it constructs, besides having complexity O(n log n), where n is the cardinality of X, when applied to a local family. Here we are assuming that the approximate (up to precision) search for the ε-neighbors of any point in X has complexity at most log n. To achieve this, X may need to be pre-processed; state-of-the-art fast nearest neighbor algorithms achieve this in time O(n^{1+η} log n), with a constant exponential in the dimension of X, defined in a reasonable way. The literature on fast approximate nearest neighbors and range searches is vast; see for example [47] as a starting point. Here we limit ourselves to observing that since T is local and is given represented in a δ-local basis, we already have a very valuable amount of information about the nearest neighbors of each point, which can be exploited.
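A dense sketch (ours) of the pivoted modified Gram-Schmidt of Section 5.1, operating on the columns of an array; a production version would exploit the sparsity and locality discussed above:

import numpy as np

def mgs_with_pivoting(Phi, eps):
    """Columns of Phi are the functions to orthonormalize; returns an
    orthonormal set whose span is eps-close to the span of Phi."""
    Phi = np.array(Phi, dtype=float, copy=True)
    m = Phi.shape[1]
    basis = []
    for _ in range(m):
        norms = np.linalg.norm(Phi, axis=0)
        k = int(np.argmax(norms))          # pivot: largest L^2 norm
        if norms[k] < eps / np.sqrt(m):    # stopping rule of step 2
            break
        q = Phi[:, k] / norms[k]
        basis.append(q)
        Phi -= np.outer(q, q @ Phi)        # orthogonalize the rest against q
    return np.array(basis).T               # columns form the basis Phi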

In the following subsection we will present an algorithm that does not use the geometry of the supports and does not need knowledge of nearest neighbors. The Gram-Schmidt procedure here is completely local, in the sense that it is organized so that each of the constructed functions needs to be orthonormalized only against nearby functions, across scales, at the appropriate scale.

PROOF. [Construction of Proposition 14, greedy version] We start by building K_0 so that {x_{0,k}}_{k∈K_0} is a δ-lattice, following [] (see also [48]). We pick x_{0,1} ∈ S arbitrarily; then we find an x_{0,2} ∈ S \ (S ∩ B_δ(x_{0,1})) closest to x_{0,1} and such that φ̃_{x_{0,2}} is δ-local (when there is more than one, choose one randomly), and so on: when x_{0,1}, ..., x_{0,k} ∈ S have been picked, pick x_{0,k+1} ∈ S \ (S ∩ N_δ({x_{0,1},...,x_{0,k}})) closest to N_δ({x_{0,1},...,x_{0,k}}), and such that φ̃_{x_{0,k+1}} is δ-local, until no such point can be found. The set {x_{0,k}}_{k∈K_0} is a δ-lattice by construction, and the functions φ_{0,k} := φ̃_{x_{0,k}}, k ∈ K_0, are orthogonal because they are δ-local and hence have disjoint compact supports. Of course, we normalize them in L² before proceeding. Now we locally orthogonalize all the remaining φ̃_k to the selected {φ_{0,k}}_{k∈K_0} (this operation is local as well): the resulting set of residual functions Φ̃¹ is (2+1)δ-local. We can repeat the procedure above on Φ̃¹ by building a (2+1)δ-lattice K_1. In the actual numerical implementation, we pick x_{1,k} such that the corresponding residual function φ̃_{x_{1,k}} has the largest L² norm among the remaining ones (as in pivoted Gram-Schmidt).

The thesis follows by induction: after having built Φ_0, ..., Φ_l = {{φ_{l',k}}_{k∈K_{l'}}}_{l'=0,...,l}, we orthogonalize the remaining φ̃_k to <Φ_0, ..., Φ_l>, thus getting a set Φ̃^{l+1}. Each function in Φ̃^{l+1} is (2^{l+1}+1)δ-local: we find a (2^{l+1}+1)δ-lattice K_{l+1} and normalize the family {φ̃_k}_{k∈K_{l+1}} of functions with disjoint supports to get {φ_{l+1,k}}_{k∈K_{l+1}}. As before, in the implementation we choose the φ_{l+1,k} with maximum norm among the residual functions in Φ̃^{l+1}.

In the non-greedy version of the algorithm, at each level, rather than choosing a closest scaling function in Φ̃^{l+1} with support disjoint from the ones already chosen at that level, we choose the scaling function with largest L² norm among the ones in the corona N_{(2^l+1)δ, (2^{l+1}+1)δ}: any of these is guaranteed to have support disjoint from the previous ones (but not too far away from at least one of them), and choosing the one with largest L² norm is the natural pivoting choice to maintain numerical stability. We stop the construction when no element φ̃_k has L² norm greater than ε.
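The greedy δ-lattice selection driving this proof can be sketched in a few lines (our own simplification, for a finite point set with precomputed pairwise distances):

import numpy as np

def greedy_delta_lattice(D, delta):
    """Greedy delta-lattice (Definition 17) on a finite set: selected points
    are pairwise at least delta apart, and every point lies within delta of
    a selected one. D is the symmetric matrix of pairwise distances."""
    selected = [0]                    # arbitrary first center
    dist_to_sel = D[0].copy()         # distance of each point to the lattice
    while True:
        outside = np.where(dist_to_sel >= delta)[0]
        if outside.size == 0:
            break
        # among the uncovered points, pick the one closest to the lattice,
        # mirroring the greedy step in the proof above
        k = int(outside[np.argmin(dist_to_sel[outside])])
        selected.append(k)
        dist_to_sel = np.minimum(dist_to_sel, D[k])
    return selected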

With these modifications, this construction inherits all the stability properties of modified Gram-Schmidt with pivoting, at least in the case in which ||φ̃_x|| is between two close constants, since in this case we are simply rearranging the pivoting based on geometric information, resulting in better packings, localization of the orthogonalization computations, and the promised computational cost.

Remark 21 When the initial system Φ̃ is very ill-conditioned and orthogonality of Φ is crucial, at each level l we need to re-orthogonalize the system built so far: see [49,45] for details and references (and observe that this only about doubles the overall cost).

At least when X is finite, the construction above is a particular, geometry-driven, QR decomposition of the matrix representing the change of basis from Φ̃ to Φ. More precisely, let (Φ̃)_{kj} = φ̃_k(x_j), with k ∈ K and j ∈ X (with the identification between j and x_j), and similarly let (Φ)_{kj} = φ_k(x_j), with k running through the indices in K_0, ..., K_L in this order. Assume that Φ̃ is δ-local. Then we have, up to a permutation of the columns,

Φ = [ D_{1,1}                           ]
    [ E_{2,1}   D_{2,2}                 ]  Φ̃,   (5.1)
    [   ...       ...      ...          ]
    [ E_{J,1}   E_{J,2}   ...   D_{J,J} ]

where D_{l,l} is a |K_l| × |K_l| diagonal matrix, and E_{i,j}, which orthogonalizes the residual functions Φ̃^i to the subspace spanned by the first j levels, i.e. <Φ_1, ..., Φ_j>, is of size |K_i| × |K_j| and sparse. In fact, it is important to realize that by construction each row of E_{i,j} contains about µ(B_{2^i δ})/µ(B_{2^j δ}) ≲ C_X^{i−j} non-zero entries.

Hence in the whole matrix there are about

Σ_{i=1}^{log|X|} Σ_{j=1}^{i} |K_i| C_X^{i−j} ≲ Σ_{i=1}^{log|X|} |K_i| C_X^{i} ≲ Σ_{i=1}^{log|X|} (|X| / C_X^{i}) C_X^{i} ≈ |X| log|X|

non-zero entries. Of these, for large classes of useful operators, the number of entries above precision is actually of the order of |X|.

It is not by chance that the form of the matrix above is very similar to the ones for wavelet-compressed Calderón-Zygmund operators as in []. There the wavelets used for the compression of the operator were fixed and universal; here they are constructed in arbitrary geometries and adapted to the operator.

Remark 22 The construction in this proposition applies in particular to the fast construction of local orthonormal (multi-)scaling functions in R^n, which we think is of independent interest. As an example, this construction can be applied to spline functions of any order, yielding an orthonormal basis of splines different from the standard one due to Strömberg [5].

Remark 23 The cost of running the above algorithm is O(n log n) times the cost of finding (approximate) r-neighbors, in addition to the cost of any pre-processing that may be necessary to perform these searches fast. This follows from the fact that all the orthogonalization steps are local. Observe that because of the geometric assumptions on Φ̃ and T, essentially all the information needed regarding nearest neighbors is encoded in the matrix representing T.

5.3 Modified Gram-Schmidt with mixed L²-L^p pivoting

In this section we propose a second algorithm for computing an orthogonal basis spanning the same subspace as a given δ-local family Φ̃. The main motivation for the algorithm in the previous section was the observation that modified Gram-Schmidt with pivoting can in general generate basis functions with large supports, while it is crucial in our setting to find orthonormal bases with rather small support. We would like to introduce a term in modified Gram-Schmidt that prefers functions with smaller support over functions with larger support. On the unit ball of L², the functions with concentrated support are those in L¹ or, more generally, in L^p, for some p < 2. We can then modify the modified Gram-Schmidt with pivoting algorithm of Section 5.1 as follows.

1. Let Φ̃_0 = Φ̃, k = 0, λ > 0, p ∈ [1, 2).
2. Let φ̃ be the element of Φ̃_k with largest L² norm, say N_k = ||φ̃||_2. Among all the elements of Φ̃_k with L² norm at least λN_k, pick the one with the smallest L^p norm: call it φ̃_k. If ||φ̃_k||_2 < ε/|Φ̃|^{1/2}, stop; otherwise add φ̃_k/||φ̃_k||_2 to Φ.
3. Orthogonalize all the remaining elements of Φ̃_k against φ̃_k, obtaining a new set Φ̃_{k+1}. Increment k by 1. Go to step 2.

Choosing the element by bounding its L² norm from below is important for numerical stability, but relaxing the condition of picking the element with the largest L² norm allows us to pick an element with smaller L^p norm, yielding potentially much better localized bases. The parameter λ controls this slack, and in practice choosing it between 3/4 and 1 gave good results. It is easy to construct examples in which the standard modified Gram-Schmidt with L² pivoting (which corresponds to λ = 1) leads to bases with most elements having large support, while our modification with mixed norms yields much better localized bases.

Remark 24 Observe that this algorithm does not require knowledge of the nearest neighbors. In the implementation, to achieve maximum speed, the matrix representing T should be in sparse form, and queries of non-zero elements by rows and by columns should be fast. We have not investigated the stability of this algorithm theoretically. We simply observed that it worked very well in the examples we tried, with stability comparable to the standard modified Gram-Schmidt with pivoting, and very often much better localization in the resulting basis functions.

5.4 Rank Revealing QR factorizations

We refer the reader to [5], and references therein, for details regarding the numerical and algorithmic aspects related to the rank-revealing factorizations discussed in this section.

Definition 25 A partial QR factorization of a matrix M ∈ Mat(n, n),

M Π = Q R = Q [ A_k  B_k ]
              [  0   C_k ],   (5.2)

where Q is orthogonal, A_k ∈ Mat(k, k) is upper-triangular with non-negative diagonal elements, B_k ∈ Mat(k, n−k), C_k ∈ Mat(n−k, n−k), and Π is

a permutation matrix, is a strong rank-revealing QR factorization if

σ_min(A_k) ≥ σ_k(M) / p_1(k, n)  and  σ_j(C_k) ≤ σ_{k+j}(M) p_1(k, n)   (5.3)

and

||A_k^{−1} B_k|| ≤ p_2(k, n),   (5.4)

where p_1(k, n) and p_2(k, n) are functions bounded by a low-degree polynomial in k and n, and σ_max(A) and σ_min(A) denote the largest and smallest singular values of a matrix A.

Modified Gram-Schmidt with pivoting, as described in Section 5.1, actually yields a decomposition satisfying (5.3), but with p_1 depending exponentially on n [5]. Let SRRQR(M, ε) be an algorithm that computes a strong rank-revealing QR factorization of M, with σ_{k+1} ≤ ε. Gu and Eisenstat present such an algorithm, which satisfies (5.3) and (5.4) with p_1(k, n) = 1 + nk(n−k) and p_2(k, n) = n. Moreover, this algorithm requires in general O(n³) operations, but faster algorithms exploiting the sparsity of the matrices involved can be devised.

We can then proceed in the construction of the multiresolution analysis described in Theorem 18 as follows. We replace each orthogonalization step G_j by the following two steps. First we apply SRRQR(T_j, ε) to get a rank-revealing QR factorization of T_j, represented on the basis Φ_j, which we write as

T_j Π_j = Q_j R_j = Q_j [ A_{j,k}  B_{j,k} ]
                        [   0      C_{j,k} ].   (5.5)

The columns of Q_j span Ṽ_{j+1} up to error ε p_1(k, n), where k = max{i : λ_i^{2^{j+1}} ≥ ε} = dim(Ṽ_{j+1}). In fact, the strong rank-revealing QR decomposition above shows that the first k columns of T_j Π_j are a well-conditioned basis that ε p_1(k, n)-spans Ṽ_{j+1}. Now, this basis is a well-selected subset of bump functions T^{2^j} Φ_j, and can be orthogonalized with Proposition 14, with estimates on the supports exactly as claimed in Theorem 18.
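In practice, an off-the-shelf column-pivoted QR (weaker than the strong RRQR of Definition 25, but readily available) is often enough to expose the ε-numerical rank. A sketch (ours) using scipy:

import numpy as np
from scipy.linalg import qr

def rank_revealing_step(Tj, eps):
    """Column-pivoted QR of T_j: the leading columns of Q span an
    approximation to V_{j+1}; the diagonal of R estimates the singular
    values, so we truncate where it falls below eps."""
    Q, R, piv = qr(Tj, pivoting=True)
    k = int(np.sum(np.abs(np.diag(R)) > eps))   # eps-numerical rank of T_j
    return Q[:, :k], piv[:k]                    # basis, and chosen columns of T_j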

28 5.5 Computation of the Wavelets In the computation of wavelets as described in section 4.5, numerically one has to make sure that, at every orthogonalization step, the construction of the wavelets is made on vectors that are numerically orthogonal to the scaling functions at the previous scale. This can be attained, in a numerically stable way, by repeatedly orthogonalizing the wavelets to the scaling functions. Observe that this orthogonalization is again a local operation, and hence can be computed fast. This is also necessary to guarantee that the construction of wavelets will stop exactly at the exhaustion of the wavelet subspace, without spilling into the scaling spaces. 6 Fast Algorithms Assume that both (I) and (II) in section 5 are satisfied or, more generally, that each orthonormalization step and change of basis can be computed in time O(n log n). Then there are fast algorithms for constructing the whole Multi-Resolution Analysis: Proposition 6 The construction of all the scaling functions, and linear transformations {G j } and {M j },forj =,...,J, as described in Theorem 8, can be done in time O(n log n). Corollary 7 (The Fast Scaling Function Transform) Let the scaling function transform of a function f L (X, µ), X = n, be the set of all coefficients {< f,φ j >} j=,...,j. All these coefficients can be computed in time O(n log n). Remark 8 For various interesting classes of operators arising in applications, the computational complexity of the transform is actually O(n) instead of O(n log n) since the number of entries in M j above precision is n instead of log n. The following is a simple consequence of Corollary 7: Corollary 9 (The Fast Wavelet Transform) Let the wavelet transform of a function f L (X, µ) be the set of all coefficients {< f,ψ j >} j=,...,j. All these coefficients can be computed in time O(n log n). 8

29 Remark 3 Remark 8 applies here as well. 6. Fast eigenfunction computation The computation of approximations to the eigenfunctions ξ λ corresponding to λ>λ can be computed efficiently in V j,wherej is min{j : λ t j >ɛ}, and then the result can be extended to the whole space X by using the extension formula (4.4). These eigenfunctions can also be used to embed the coarsened space X j into R n, along the lines of section 4., or can be extended to X and used to embed the whole space X. See [5,35] for results of multigrid techniques applied to fast eigenfunction computations. 6. Fast inversion of Laplacian-like operators Given a fast method like ours to compute the powers of T, the Laplacian (I T ) can be inverted (on the orthogonal complement of the eigenspace corresponding to the eigenvalue ) via the Schultz method []: since and, if S K = K k= T k,wehave (I T ) f = S K+ = S K + T K S K = + k= T k f K ( I + T k) f. Since we can apply fast T k to any function f, and hence the product S K+, we can apply (I T ) to any function f fast, in time O(Knlog n). The value of K depends on the gap between and the first eigenvalue smaller than, which essentially regulates the speed of convergence of a distribution to its asymptotic distribution. Observe that we never construct the full matrix representing (I T ) (which is general full!), but we only keep a compressed multiscale representation of it, which we can use to compute the action of the operator on any function. k= 9

30 6.3 Relationship with eigenmap embeddings In Section 4. we discussed metrics induced by a symmetric diffusion semigroup {T t } acting on L (X, µ), and how eigenfunctions of the generator of the semigroup can be used to embed X in Euclidean space. Similar embeddings can be obtained by using diffusion scaling functions and wavelets, since they span subspaces localized around spectral bands. So for example the top diffusion scaling functions in V j approximate well, by the arguments in this Section, the top eigenfunctions. See for example Figure 6. 7 Natural multiresolution on sampled Riemannian manifolds and on graphs The natural construction of Brownian motion and of the associated Laplace- Beltrami operator for compact Riemannian manifolds can be approximated on a finite number of points which are realizations of a random variable taking values on the manifold according to a certain probability distribution as in [4,3]. The construction starts with assuming that we have a data set Γ which is obtained by drawing according to some unknown distribution p, and whose range is a smooth manifold Γ. Given a kernel that satisfies some natural mild conditions, for example in the standard Gaussian form x y K(x, y) =e ( δ ) for some scale factor δ>, is normalized twice as follows before computing its eigenfunctions. The main observation is that any integration on the empirical data set Γ of the kernel against a function is in the form K(x, y i )f(y i ) Γ and thus it a Riemann sum associated to the integral K(x, y)f(y)p(y)dy. Γ Hence to capture only the geometric content of the manifold it is necessary to get rid of the measure p on the dataset. This can be estimated at scale δ for example by convolving with some smooth kernel (for example K itself), thus obtaining an estimated probability density p δ. One then considers the kernel K(x, y) = K(x, y) p δ (y) 3

31 as new kernel. This kernel is further normalized so that it becomes averaging on the data set, yielding K(x, y) = K(x, y) Γ K(x, y)p(y)dy. It is shown in [3] that with this normalization, in the limit Γ + and δ, the kernel K thus obtained is the one associated with the Laplace- Beltrami operator on Γ. Applying our construction to this natural kernel, we obtain the natural multiresolution analysis associated to the Laplace-Beltrami operator on a compact Riemannian manifold. This also leads to compressed representation of the heat flow and of functions of it. On weighted graphs, the canonical random walk is associated with the matrix of transition probabilities P = D W where W is the symmetric matrix of weights and D is the diagonal matrix defined by D ii = j W ij. P is in general not symmetric, but it is conjugate, via D to the symmetric matrix D WD which is a contraction on L. Our construction allows the construction of a multiresolution on the graph, together with downsampled graphs, adapted to the random walk [3]. 8 Examples and applications 8. Orthogonal spline multiresolution Spline multiresolution analyses are well-studied in the wavelet literature (see for example [5,44,53 55] and references therein, for multiwavelets see for example [56,57] and references therein) and used in several applications, image analysis, finite elements methods for PDEs and others. The classical spline functions of order L on the real line, with knots on the integer grid, are piecewise polynomials of degree L on each interval of the form [k, k +], k Z, and globally of class C L. The basic spline function of order L is defined by ( ) +e iξ L+ ˆϕ(ξ) = so that ϕ(x) =( ) L+ χ [,] (x). The space of splines with knots on the integer grid is spanned by {ϕ( k)} k Z, and in fact this is a Riesz basis. To obtain an orthonormal basis these functions are usually orthonormalized via a translation-invariant Gram-Schmidt 3

32 V V V 3 V 4 V V V V V V V. 555 V. 555 V V V Fig. 3. Multiresolution Analysis on the circle. We consider 56 points on the unit circle, start with ϕ,k = δ k and with the standard diffusion. We plot several scaling functions in each approximation space V j Fig. 4. Multiresolution Analysis on the circle: on the left we plot the compressed matrices representing powers of the diffusion operator, on the right we plot the entries of the same matrices which are above working precision procedure [5,58] yielding the so called Franklin scaling functions and the corresponding wavelets, which have support on the whole real line, but still are exponentially decaying. To preserve the compactness of the support, [44] built biorthogonal splines. We can apply our construction to this situation, by letting X = R (or Z, or perturbed versions of Z!) with standard distance and measure, and the initial family of scaling functions being the family of splines of order L, {ϕ L ( k)} k Z (restricted to Z if X = Z).. Our Gram-Schmidt orthogonalization with geometric pivoting yields infinitely many different scaling functions, and scaling functions at substrate l having support of size comparable to l diam.supp. ϕ L. 3

33 V V V3 V4 V5 V6 V7 V8 V9 V V W.5 W.5 W3 W4 W5 W6 W7 W8 W W.5.5 W.5 x V V V3 V4 V5 V6 V7 V8 V9 V V x 5 W x 4 W x 4 W x 3 W x 3 W W6. W W8 W W W Fig. 5. Multiresolution Analysis on the circle. In the same setting as for Figure 3, we compute the multiscale transform of a periodic signal on the circle, contaminated by two δ-impulses (top) and of windowed chirp (bottom). In the first column we plot the projections onto coarser and coarser scaling spaces, in the second column we plot the projection on the corresponding wavelet subspaces. Computations here were done to 5 digits of precision. 8. A simple homogenization problem We consider the non-homogeneous heat equation on the circle u t = ( c(x) ) x x u (8.) where c(x) is quite non-uniform, and we want to represent the large scale/large time behavior of the solution by compressing powers of the operator ( ) representing the discretization of the spatial differential operator x c(x) x.the spatial operator is of course one of the simplest Sturm-Liouville operators in 33

34 c(x) 3 5 φ Fig. 6. Non homogeneous medium with non-constant diffusion coefficient (plotted top left): the scaling functions for V 5 (top right). In the middle row: an eigenfunction (the 35th) of the diffusion operator (left), and the same eigenfunction reconstructed by extending the corresponding eigenvector of the compressed T (right): the L -error is of order 5. The entries above precision of the matrix T 8 compressed on V 4 (bottom left). We plot bottom right {(ϕ 8, (x i ),ϕ 8, (x i ))} i,which are an approximation to the eigenmap corresponding to the first two eigenfunctions, since V 8 has dimension 3. 34

Diffusion Wavelets and Applications

Diffusion Wavelets and Applications Diffusion Wavelets and Applications J.C. Bremer, R.R. Coifman, P.W. Jones, S. Lafon, M. Mohlenkamp, MM, R. Schul, A.D. Szlam Demos, web pages and preprints available at: S.Lafon: www.math.yale.edu/~sl349

More information

Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets.

Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets. Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets. R.R. Coifman, S. Lafon, MM Mathematics Department Program of Applied Mathematics. Yale University Motivations The main

More information

A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets

A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets Mauro Maggioni Program in Applied Mathematics Department of Mathematics Yale University New Haven, CT 6 mauro.maggioni@yale.edu

More information

Contribution from: Springer Verlag Berlin Heidelberg 2005 ISBN

Contribution from: Springer Verlag Berlin Heidelberg 2005 ISBN Contribution from: Mathematical Physics Studies Vol. 7 Perspectives in Analysis Essays in Honor of Lennart Carleson s 75th Birthday Michael Benedicks, Peter W. Jones, Stanislav Smirnov (Eds.) Springer

More information

Diffusion Geometries, Global and Multiscale

Diffusion Geometries, Global and Multiscale Diffusion Geometries, Global and Multiscale R.R. Coifman, S. Lafon, MM, J.C. Bremer Jr., A.D. Szlam, P.W. Jones, R.Schul Papers, talks, other materials available at: www.math.yale.edu/~mmm82 Data and functions

More information

Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes

Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes Mauro Maggioni mauro.maggioni@yale.edu Department of Mathematics, Yale University, P.O. Box 88, New Haven, CT,, U.S.A.

More information

Spectral Processing. Misha Kazhdan

Spectral Processing. Misha Kazhdan Spectral Processing Misha Kazhdan [Taubin, 1995] A Signal Processing Approach to Fair Surface Design [Desbrun, et al., 1999] Implicit Fairing of Arbitrary Meshes [Vallet and Levy, 2008] Spectral Geometry

More information

Chapter One. The Calderón-Zygmund Theory I: Ellipticity

Chapter One. The Calderón-Zygmund Theory I: Ellipticity Chapter One The Calderón-Zygmund Theory I: Ellipticity Our story begins with a classical situation: convolution with homogeneous, Calderón- Zygmund ( kernels on R n. Let S n 1 R n denote the unit sphere

More information

March 13, Paper: R.R. Coifman, S. Lafon, Diffusion maps ([Coifman06]) Seminar: Learning with Graphs, Prof. Hein, Saarland University

March 13, Paper: R.R. Coifman, S. Lafon, Diffusion maps ([Coifman06]) Seminar: Learning with Graphs, Prof. Hein, Saarland University Kernels March 13, 2008 Paper: R.R. Coifman, S. Lafon, maps ([Coifman06]) Seminar: Learning with Graphs, Prof. Hein, Saarland University Kernels Figure: Example Application from [LafonWWW] meaningful geometric

More information

Multiscale Analysis and Diffusion Semigroups With Applications

Multiscale Analysis and Diffusion Semigroups With Applications Multiscale Analysis and Diffusion Semigroups With Applications Karamatou Yacoubou Djima Advisor: Wojciech Czaja Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions

Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA 13 mahadeva@cs.umass.edu Mauro

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Duality of multiparameter Hardy spaces H p on spaces of homogeneous type

Duality of multiparameter Hardy spaces H p on spaces of homogeneous type Duality of multiparameter Hardy spaces H p on spaces of homogeneous type Yongsheng Han, Ji Li, and Guozhen Lu Department of Mathematics Vanderbilt University Nashville, TN Internet Analysis Seminar 2012

More information

Algebras of singular integral operators with kernels controlled by multiple norms

Algebras of singular integral operators with kernels controlled by multiple norms Algebras of singular integral operators with kernels controlled by multiple norms Alexander Nagel Conference in Harmonic Analysis in Honor of Michael Christ This is a report on joint work with Fulvio Ricci,

More information

A new class of pseudodifferential operators with mixed homogenities

A new class of pseudodifferential operators with mixed homogenities A new class of pseudodifferential operators with mixed homogenities Po-Lam Yung University of Oxford Jan 20, 2014 Introduction Given a smooth distribution of hyperplanes on R N (or more generally on a

More information

Review and problem list for Applied Math I

Review and problem list for Applied Math I Review and problem list for Applied Math I (This is a first version of a serious review sheet; it may contain errors and it certainly omits a number of topic which were covered in the course. Let me know

More information

On a class of pseudodifferential operators with mixed homogeneities

On a class of pseudodifferential operators with mixed homogeneities On a class of pseudodifferential operators with mixed homogeneities Po-Lam Yung University of Oxford July 25, 2014 Introduction Joint work with E. Stein (and an outgrowth of work of Nagel-Ricci-Stein-Wainger,

More information

Spectral Continuity Properties of Graph Laplacians

Spectral Continuity Properties of Graph Laplacians Spectral Continuity Properties of Graph Laplacians David Jekel May 24, 2017 Overview Spectral invariants of the graph Laplacian depend continuously on the graph. We consider triples (G, x, T ), where G

More information

LECTURE NOTE #11 PROF. ALAN YUILLE

LECTURE NOTE #11 PROF. ALAN YUILLE LECTURE NOTE #11 PROF. ALAN YUILLE 1. NonLinear Dimension Reduction Spectral Methods. The basic idea is to assume that the data lies on a manifold/surface in D-dimensional space, see figure (1) Perform

More information

Data-dependent representations: Laplacian Eigenmaps

Data-dependent representations: Laplacian Eigenmaps Data-dependent representations: Laplacian Eigenmaps November 4, 2015 Data Organization and Manifold Learning There are many techniques for Data Organization and Manifold Learning, e.g., Principal Component

More information

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Amit Singer Princeton University Department of Mathematics and Program in Applied and Computational Mathematics

More information

Analysis Preliminary Exam Workshop: Hilbert Spaces

Analysis Preliminary Exam Workshop: Hilbert Spaces Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H

More information

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M =

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = CALCULUS ON MANIFOLDS 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = a M T am, called the tangent bundle, is itself a smooth manifold, dim T M = 2n. Example 1.

More information

Lecture: Some Practical Considerations (3 of 4)

Lecture: Some Practical Considerations (3 of 4) Stat260/CS294: Spectral Graph Methods Lecture 14-03/10/2015 Lecture: Some Practical Considerations (3 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough.

More information

Spectral Geometry of Riemann Surfaces

Spectral Geometry of Riemann Surfaces Spectral Geometry of Riemann Surfaces These are rough notes on Spectral Geometry and their application to hyperbolic riemann surfaces. They are based on Buser s text Geometry and Spectra of Compact Riemann

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

CHAPTER VIII HILBERT SPACES

CHAPTER VIII HILBERT SPACES CHAPTER VIII HILBERT SPACES DEFINITION Let X and Y be two complex vector spaces. A map T : X Y is called a conjugate-linear transformation if it is a reallinear transformation from X into Y, and if T (λx)

More information

l(y j ) = 0 for all y j (1)

l(y j ) = 0 for all y j (1) Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that

More information

Wavelets, Multiresolution Analysis and Fast Numerical Algorithms. G. Beylkin 1

Wavelets, Multiresolution Analysis and Fast Numerical Algorithms. G. Beylkin 1 A draft of INRIA lectures, May 1991 Wavelets, Multiresolution Analysis Fast Numerical Algorithms G. Beylkin 1 I Introduction These lectures are devoted to fast numerical algorithms their relation to a

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS TSOGTGEREL GANTUMUR Abstract. After establishing discrete spectra for a large class of elliptic operators, we present some fundamental spectral properties

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Diffusion Wavelets for multiscale analysis on manifolds and graphs: constructions and applications

Diffusion Wavelets for multiscale analysis on manifolds and graphs: constructions and applications Diffusion Wavelets for multiscale analysis on manifolds and graphs: constructions and applications Mauro Maggioni EPFL, Lausanne Dec. 19th, 5 EPFL Multiscale Analysis and Diffusion Wavelets - Mauro Maggioni,

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

Topics in Harmonic Analysis Lecture 6: Pseudodifferential calculus and almost orthogonality

Topics in Harmonic Analysis Lecture 6: Pseudodifferential calculus and almost orthogonality Topics in Harmonic Analysis Lecture 6: Pseudodifferential calculus and almost orthogonality Po-Lam Yung The Chinese University of Hong Kong Introduction While multiplier operators are very useful in studying

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

L. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS

L. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS Rend. Sem. Mat. Univ. Pol. Torino Vol. 57, 1999) L. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS Abstract. We use an abstract framework to obtain a multilevel decomposition of a variety

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Linear Algebra. Min Yan

Linear Algebra. Min Yan Linear Algebra Min Yan January 2, 2018 2 Contents 1 Vector Space 7 1.1 Definition................................. 7 1.1.1 Axioms of Vector Space..................... 7 1.1.2 Consequence of Axiom......................

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

are Banach algebras. f(x)g(x) max Example 7.4. Similarly, A = L and A = l with the pointwise multiplication

are Banach algebras. f(x)g(x) max Example 7.4. Similarly, A = L and A = l with the pointwise multiplication 7. Banach algebras Definition 7.1. A is called a Banach algebra (with unit) if: (1) A is a Banach space; (2) There is a multiplication A A A that has the following properties: (xy)z = x(yz), (x + y)z =

More information

Physics 202 Laboratory 5. Linear Algebra 1. Laboratory 5. Physics 202 Laboratory

Physics 202 Laboratory 5. Linear Algebra 1. Laboratory 5. Physics 202 Laboratory Physics 202 Laboratory 5 Linear Algebra Laboratory 5 Physics 202 Laboratory We close our whirlwind tour of numerical methods by advertising some elements of (numerical) linear algebra. There are three

More information

JUHA KINNUNEN. Harmonic Analysis

JUHA KINNUNEN. Harmonic Analysis JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes

More information

homogeneous 71 hyperplane 10 hyperplane 34 hyperplane 69 identity map 171 identity map 186 identity map 206 identity matrix 110 identity matrix 45

homogeneous 71 hyperplane 10 hyperplane 34 hyperplane 69 identity map 171 identity map 186 identity map 206 identity matrix 110 identity matrix 45 address 12 adjoint matrix 118 alternating 112 alternating 203 angle 159 angle 33 angle 60 area 120 associative 180 augmented matrix 11 axes 5 Axiom of Choice 153 basis 178 basis 210 basis 74 basis test

More information

Maxwell s equations in Carnot groups

Maxwell s equations in Carnot groups Maxwell s equations in Carnot groups B. Franchi (U. Bologna) INDAM Meeting on Geometric Control and sub-riemannian Geometry Cortona, May 21-25, 2012 in honor of Andrey Agrachev s 60th birthday Researches

More information

LECTURE 15: COMPLETENESS AND CONVEXITY

LECTURE 15: COMPLETENESS AND CONVEXITY LECTURE 15: COMPLETENESS AND CONVEXITY 1. The Hopf-Rinow Theorem Recall that a Riemannian manifold (M, g) is called geodesically complete if the maximal defining interval of any geodesic is R. On the other

More information

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality (October 29, 2016) Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/fun/notes 2016-17/03 hsp.pdf] Hilbert spaces are

More information

Multiscale Frame-based Kernels for Image Registration

Multiscale Frame-based Kernels for Image Registration Multiscale Frame-based Kernels for Image Registration Ming Zhen, Tan National University of Singapore 22 July, 16 Ming Zhen, Tan (National University of Singapore) Multiscale Frame-based Kernels for Image

More information

Spectral Theory, with an Introduction to Operator Means. William L. Green

Spectral Theory, with an Introduction to Operator Means. William L. Green Spectral Theory, with an Introduction to Operator Means William L. Green January 30, 2008 Contents Introduction............................... 1 Hilbert Space.............................. 4 Linear Maps

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Wavelet Bases of the Interval: A New Approach

Wavelet Bases of the Interval: A New Approach Int. Journal of Math. Analysis, Vol. 1, 2007, no. 21, 1019-1030 Wavelet Bases of the Interval: A New Approach Khaled Melkemi Department of Mathematics University of Biskra, Algeria kmelkemi@yahoo.fr Zouhir

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

Analysis in weighted spaces : preliminary version

Analysis in weighted spaces : preliminary version Analysis in weighted spaces : preliminary version Frank Pacard To cite this version: Frank Pacard. Analysis in weighted spaces : preliminary version. 3rd cycle. Téhéran (Iran, 2006, pp.75.

More information

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Determinant lines and determinant line bundles

Determinant lines and determinant line bundles CHAPTER Determinant lines and determinant line bundles This appendix is an exposition of G. Segal s work sketched in [?] on determinant line bundles over the moduli spaces of Riemann surfaces with parametrized

More information

Data dependent operators for the spatial-spectral fusion problem

Data dependent operators for the spatial-spectral fusion problem Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.

More information

Short note on compact operators - Monday 24 th March, Sylvester Eriksson-Bique

Short note on compact operators - Monday 24 th March, Sylvester Eriksson-Bique Short note on compact operators - Monday 24 th March, 2014 Sylvester Eriksson-Bique 1 Introduction In this note I will give a short outline about the structure theory of compact operators. I restrict attention

More information

Where is matrix multiplication locally open?

Where is matrix multiplication locally open? Linear Algebra and its Applications 517 (2017) 167 176 Contents lists available at ScienceDirect Linear Algebra and its Applications www.elsevier.com/locate/laa Where is matrix multiplication locally open?

More information

Lattice Theory Lecture 4. Non-distributive lattices

Lattice Theory Lecture 4. Non-distributive lattices Lattice Theory Lecture 4 Non-distributive lattices John Harding New Mexico State University www.math.nmsu.edu/ JohnHarding.html jharding@nmsu.edu Toulouse, July 2017 Introduction Here we mostly consider

More information

The heat kernel meets Approximation theory. theory in Dirichlet spaces

The heat kernel meets Approximation theory. theory in Dirichlet spaces The heat kernel meets Approximation theory in Dirichlet spaces University of South Carolina with Thierry Coulhon and Gerard Kerkyacharian Paris - June, 2012 Outline 1. Motivation and objectives 2. The

More information

Filtering via a Reference Set. A.Haddad, D. Kushnir, R.R. Coifman Technical Report YALEU/DCS/TR-1441 February 21, 2011

Filtering via a Reference Set. A.Haddad, D. Kushnir, R.R. Coifman Technical Report YALEU/DCS/TR-1441 February 21, 2011 Patch-based de-noising algorithms and patch manifold smoothing have emerged as efficient de-noising methods. This paper provides a new insight on these methods, such as the Non Local Means or the image

More information

The spectral zeta function

The spectral zeta function The spectral zeta function Bernd Ammann June 4, 215 Abstract In this talk we introduce spectral zeta functions. The spectral zeta function of the Laplace-Beltrami operator was already introduced by Minakshisundaram

More information

COUNTEREXAMPLES TO THE COARSE BAUM-CONNES CONJECTURE. Nigel Higson. Unpublished Note, 1999

COUNTEREXAMPLES TO THE COARSE BAUM-CONNES CONJECTURE. Nigel Higson. Unpublished Note, 1999 COUNTEREXAMPLES TO THE COARSE BAUM-CONNES CONJECTURE Nigel Higson Unpublished Note, 1999 1. Introduction Let X be a discrete, bounded geometry metric space. 1 Associated to X is a C -algebra C (X) which

More information

Manifold Regularization

Manifold Regularization 9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,

More information

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint

More information

Myopic Models of Population Dynamics on Infinite Networks

Myopic Models of Population Dynamics on Infinite Networks Myopic Models of Population Dynamics on Infinite Networks Robert Carlson Department of Mathematics University of Colorado at Colorado Springs rcarlson@uccs.edu June 30, 2014 Outline Reaction-diffusion

More information

Chapter 8 Integral Operators

Chapter 8 Integral Operators Chapter 8 Integral Operators In our development of metrics, norms, inner products, and operator theory in Chapters 1 7 we only tangentially considered topics that involved the use of Lebesgue measure,

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

WAVELETS WITH COMPOSITE DILATIONS

WAVELETS WITH COMPOSITE DILATIONS ELECTRONIC RESEARCH ANNOUNCEMENTS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Pages 000 000 (Xxxx XX, XXXX S 1079-6762(XX0000-0 WAVELETS WITH COMPOSITE DILATIONS KANGHUI GUO, DEMETRIO LABATE, WANG-Q

More information

Vectors. January 13, 2013

Vectors. January 13, 2013 Vectors January 13, 2013 The simplest tensors are scalars, which are the measurable quantities of a theory, left invariant by symmetry transformations. By far the most common non-scalars are the vectors,

More information

Boolean Inner-Product Spaces and Boolean Matrices

Boolean Inner-Product Spaces and Boolean Matrices Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

Compact operators on Banach spaces

Compact operators on Banach spaces Compact operators on Banach spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto November 12, 2017 1 Introduction In this note I prove several things about compact

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings

Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

Discrete quantum random walks

Discrete quantum random walks Quantum Information and Computation: Report Edin Husić edin.husic@ens-lyon.fr Discrete quantum random walks Abstract In this report, we present the ideas behind the notion of quantum random walks. We further

More information

The oblique derivative problem for general elliptic systems in Lipschitz domains

The oblique derivative problem for general elliptic systems in Lipschitz domains M. MITREA The oblique derivative problem for general elliptic systems in Lipschitz domains Let M be a smooth, oriented, connected, compact, boundaryless manifold of real dimension m, and let T M and T

More information

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +

More information

Multiscale bi-harmonic Analysis of Digital Data Bases and Earth moving distances.

Multiscale bi-harmonic Analysis of Digital Data Bases and Earth moving distances. Multiscale bi-harmonic Analysis of Digital Data Bases and Earth moving distances. R. Coifman, Department of Mathematics, program of Applied Mathematics Yale University Joint work with M. Gavish and W.

More information

A SHORT PROOF OF THE COIFMAN-MEYER MULTILINEAR THEOREM

A SHORT PROOF OF THE COIFMAN-MEYER MULTILINEAR THEOREM A SHORT PROOF OF THE COIFMAN-MEYER MULTILINEAR THEOREM CAMIL MUSCALU, JILL PIPHER, TERENCE TAO, AND CHRISTOPH THIELE Abstract. We give a short proof of the well known Coifman-Meyer theorem on multilinear

More information

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 While these notes are under construction, I expect there will be many typos. The main reference for this is volume 1 of Hörmander, The analysis of liner

More information

10. Smooth Varieties. 82 Andreas Gathmann

10. Smooth Varieties. 82 Andreas Gathmann 82 Andreas Gathmann 10. Smooth Varieties Let a be a point on a variety X. In the last chapter we have introduced the tangent cone C a X as a way to study X locally around a (see Construction 9.20). It

More information

Diameters and eigenvalues

Diameters and eigenvalues CHAPTER 3 Diameters and eigenvalues 31 The diameter of a graph In a graph G, the distance between two vertices u and v, denoted by d(u, v), is defined to be the length of a shortest path joining u and

More information

here, this space is in fact infinite-dimensional, so t σ ess. Exercise Let T B(H) be a self-adjoint operator on an infinitedimensional

here, this space is in fact infinite-dimensional, so t σ ess. Exercise Let T B(H) be a self-adjoint operator on an infinitedimensional 15. Perturbations by compact operators In this chapter, we study the stability (or lack thereof) of various spectral properties under small perturbations. Here s the type of situation we have in mind:

More information

A characterization of the two weight norm inequality for the Hilbert transform

A characterization of the two weight norm inequality for the Hilbert transform A characterization of the two weight norm inequality for the Hilbert transform Ignacio Uriarte-Tuero (joint works with Michael T. Lacey, Eric T. Sawyer, and Chun-Yen Shen) September 10, 2011 A p condition

More information

Eigenvalues and eigenfunctions of the Laplacian. Andrew Hassell

Eigenvalues and eigenfunctions of the Laplacian. Andrew Hassell Eigenvalues and eigenfunctions of the Laplacian Andrew Hassell 1 2 The setting In this talk I will consider the Laplace operator,, on various geometric spaces M. Here, M will be either a bounded Euclidean

More information

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves Chapter 3 Riemannian Manifolds - I The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves embedded in Riemannian manifolds. A Riemannian manifold is an abstraction

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

Lebesgue Measure on R n

Lebesgue Measure on R n 8 CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets

More information

Elements of Convex Optimization Theory

Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Journal of Mathematical Analysis and Applications. Properties of oblique dual frames in shift-invariant systems

Journal of Mathematical Analysis and Applications. Properties of oblique dual frames in shift-invariant systems J. Math. Anal. Appl. 356 (2009) 346 354 Contents lists available at ScienceDirect Journal of Mathematical Analysis and Applications www.elsevier.com/locate/jmaa Properties of oblique dual frames in shift-invariant

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information