Interpolation via weighted $\ell_1$-minimization

Holger Rauhut, RWTH Aachen University, Lehrstuhl C für Mathematik (Analysis)

Matheon Workshop "Compressive Sensing and Its Applications", TU Berlin, December 11, 2013

Joint work with Rachel Ward (University of Texas at Austin)
Function interpolation

Aim: Given a function $f : D \to \mathbb{C}$ on a domain $D$, reconstruct or interpolate $f$ from sample values $f(t_1), \dots, f(t_m)$.

Approaches:
- (Linear) polynomial interpolation: assumes (classical) smoothness in order to achieve error rates; works with special interpolation points (e.g. Chebyshev points).
- Compressive sensing reconstruction: nonlinear; assumes sparsity (or compressibility) of a series expansion in terms of a certain basis (e.g. trigonometric bases); fewer (random!) sampling points than degrees of freedom.

This talk: Combine sparsity and smoothness!
A classical interpolation result

$C^r([0,1]^d)$: $r$-times continuously differentiable periodic functions.

There exist sampling points $t_1, \dots, t_m$ and a linear reconstruction operator $R : \mathbb{C}^m \to C^r([0,1]^d)$ such that for every $f \in C^r([0,1]^d)$ the approximation $\tilde f = R(f(t_1), \dots, f(t_m))$ satisfies the optimal error bound
$$\|f - \tilde f\|_\infty \le C m^{-r/d} \|f\|_{C^r}.$$

Curse of dimension: About $m \ge C_f \varepsilon^{-d/r}$ samples are needed to achieve error $\varepsilon < 1$. Exponential scaling in $d$ cannot be avoided using only smoothness (DeVore, Howard, Micchelli 1989; Novak, Wozniakowski 2009).
Sparse representation of functions

$D$: domain endowed with a probability measure $\nu$; $\psi_j : D \to \mathbb{C}$, $j \in \Gamma$ ($\Gamma$ finite or infinite).

$\{\psi_j\}_{j\in\Gamma}$ orthonormal system: $\int_D \psi_j(t) \overline{\psi_k(t)}\, d\nu(t) = \delta_{j,k}$, $j, k \in \Gamma$.

We consider functions of the form $f(t) = \sum_{j\in\Gamma} x_j \psi_j(t)$.

$f$ is called $s$-sparse if $\|x\|_0 := \#\{l : x_l \ne 0\} \le s$, and compressible if the best $s$-term approximation error
$$\sigma_s(f)_q := \sigma_s(x)_q := \inf_{z : \|z\|_0 \le s} \|x - z\|_q \qquad (0 < q \le \infty)$$
is small.

Aim: Reconstruction of sparse/compressible $f$ from samples!
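Since in the unweighted case the infimum is attained by simply keeping the $s$ largest-magnitude coefficients, $\sigma_s(x)_q$ is easy to compute; a minimal numpy sketch (the helper name is ours, not from the talk):

```python
import numpy as np

def best_s_term_error(x, s, q):
    """sigma_s(x)_q: the l_q norm of the tail after keeping the s largest |x_j|."""
    tail = np.sort(np.abs(x))[::-1][s:]   # magnitudes outside the best s-term support
    return np.sum(tail ** q) ** (1.0 / q)

# Example: a compressible sequence x_j = j^(-2), j = 1..100
x = np.arange(1, 101, dtype=float) ** -2
print(best_s_term_error(x, s=5, q=1.0))
```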
Fourier algebra and compressibility

Fourier algebra: for $0 < p \le 1$,
$$A_p = \{f \in C[0,1] : \|f\|_p < \infty\}, \qquad \|f\|_p := \|x\|_p = \Big(\sum_{j\in\mathbb{Z}} |x_j|^p\Big)^{1/p}, \quad f(t) = \sum_{j\in\mathbb{Z}} x_j \psi_j(t).$$

Motivating example, the trigonometric system: $D = [0,1]$, $\nu$ the Lebesgue measure,
$$\psi_j(t) = e^{2\pi i j t}, \qquad t \in [0,1], \ j \in \mathbb{Z}.$$

Compressibility via the Stechkin estimate:
$$\sigma_s(f)_q = \sigma_s(x)_q \le s^{1/q - 1/p} \|x\|_p = s^{1/q - 1/p} \|f\|_p, \qquad p < q.$$

Since $\|f\|_\infty := \sup_{t\in[0,1]} |f(t)| \le \|f\|_1$, the best $s$-term approximation $f_s = \sum_{j\in S} x_j \psi_j$, $|S| = s$, satisfies
$$\|f - f_s\|_\infty \le s^{1 - 1/p} \|f\|_p, \qquad p < 1.$$
Trigonometric system: smoothness and weights

$\psi_j(t) = e^{2\pi i j t}$, $j \in \mathbb{Z}$, $t \in [0,1]$. The derivatives satisfy $\|\psi_j'\|_\infty = 2\pi |j|$, $j \in \mathbb{Z}$.

For $f(t) = \sum_j x_j \psi_j(t)$ we have
$$\|f\|_\infty + \|f'\|_\infty = \Big\|\sum_j x_j \psi_j\Big\|_\infty + \Big\|\sum_j x_j \psi_j'\Big\|_\infty \le \sum_j |x_j| \big(\|\psi_j\|_\infty + \|\psi_j'\|_\infty\big) = \sum_{j\in\mathbb{Z}} |x_j| (1 + 2\pi |j|) =: \|x\|_{\omega,1}.$$

Weights model smoothness! Combine with sparsity (compressibility): weighted $\ell_p$-spaces with $0 < p \le 1$.
Weighted norms and weighted sparsity

For a weight $\omega = (\omega_j)_{j\in\Gamma}$ with $\omega_j \ge 1$, introduce
$$\|x\|_{\omega,p} := \Big(\sum_{j\in\Gamma} |x_j|^p \omega_j^{2-p}\Big)^{1/p}, \qquad 0 < p \le 2.$$

Special cases: $\|x\|_{\omega,1} = \sum_{j\in\Gamma} |x_j| \omega_j$ and $\|x\|_{\omega,2} = \|x\|_2$.

Weighted sparsity: $\|x\|_{\omega,0} := \sum_{j : x_j \ne 0} \omega_j^2$; $x$ is called weighted $s$-sparse if $\|x\|_{\omega,0} \le s$.

Note: If $\|x\|_{\omega,0} \le s$ then $\|x\|_{\omega,1} \le \sqrt{s}\, \|x\|_{\omega,2}$.
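Both quantities are elementary to evaluate; a small numpy sketch (helper names are ours) that mirrors the definitions above:

```python
import numpy as np

def weighted_norm(x, w, p):
    """||x||_{omega,p} = (sum_j |x_j|^p * w_j^(2-p))^(1/p), 0 < p <= 2."""
    return np.sum(np.abs(x) ** p * w ** (2.0 - p)) ** (1.0 / p)

def weighted_sparsity(x, w):
    """||x||_{omega,0} = sum of w_j^2 over the support of x."""
    return np.sum(w[x != 0] ** 2)

w = np.array([1.0, 2.0, 3.0])
x = np.array([0.5, 0.0, 0.1])
print(weighted_norm(x, w, 1.0))   # = 0.5*1 + 0.1*3 = 0.8
print(weighted_sparsity(x, w))    # = 1^2 + 3^2 = 10
```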
Weighted best approximation

Weighted best $s$-term approximation error:
$$\sigma_s(x)_{\omega,p} := \inf_{z : \|z\|_{\omega,0} \le s} \|x - z\|_{\omega,p}.$$

Theorem (Weighted Stechkin estimate). For a weight $\omega$, a vector $x$, $0 < p < q \le 2$ and $s > \|\omega\|_\infty^2$,
$$\sigma_s(x)_{\omega,q} \le \big(s - \|\omega\|_\infty^2\big)^{1/q - 1/p} \|x\|_{\omega,p}.$$

If $s \ge 2\|\omega\|_\infty^2$, say, then $\sigma_s(x)_{\omega,q} \le C_{p,q}\, s^{1/q - 1/p} \|x\|_{\omega,p}$ with $C_{p,q} = 2^{1/p - 1/q}$.

The lower bound on $s$ is natural: otherwise even the single-element set $S = \{j\}$ with $\omega_j = \|\omega\|_\infty$ would not be allowed as a support set.
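As a sanity check, the weighted best $s$-term error can be computed by brute force over all admissible supports for tiny $N$ and compared against the bound of the theorem; an illustrative sketch (our helper, exponential in $N$, for checking only):

```python
import itertools
import numpy as np

def sigma_weighted(x, w, s, q):
    """Brute-force sigma_s(x)_{omega,q}: search every support S with
    sum_{j in S} w_j^2 <= s and measure the tail in the weighted q-norm."""
    n, best = len(x), np.inf
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            if np.sum(w[list(S)] ** 2) > s:
                continue                      # support too heavy for budget s
            mask = np.ones(n, dtype=bool)
            mask[list(S)] = False
            tail = np.sum((np.abs(x) ** q * w ** (2 - q))[mask]) ** (1.0 / q)
            best = min(best, tail)
    return best

w = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
x = np.array([1.0, 0.5, 0.25, 0.125, 0.0625])
s, p, q = 10.0, 0.5, 1.0                      # theorem needs s > ||w||_inf^2 = 9
lhs = sigma_weighted(x, w, s, q)
rhs = (s - np.max(w) ** 2) ** (1 / q - 1 / p) \
      * np.sum(np.abs(x) ** p * w ** (2 - p)) ** (1 / p)
print(lhs, "<=", rhs)                         # weighted Stechkin estimate
```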
(Weighted) compressive sensing

Recover a weighted $s$-sparse (or weighted-compressible) vector $x$ from measurements $y = Ax$, where $A \in \mathbb{C}^{m\times N}$ with $m < N$.

Weighted $\ell_1$-minimization:
$$\min_{z\in\mathbb{C}^N} \|z\|_{\omega,1} \quad \text{subject to} \quad Az = y.$$

Noisy version:
$$\min_{z\in\mathbb{C}^N} \|z\|_{\omega,1} \quad \text{subject to} \quad \|Az - y\|_2 \le \eta.$$
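Both programs are second-order cone problems, so they are easy to prototype, e.g. with cvxpy. A minimal sketch assuming a generic measurement matrix (the function name, the Gaussian $A$, and the test vector are ours, for illustration only):

```python
import cvxpy as cp
import numpy as np

def weighted_l1_recover(A, y, w, eta=0.0):
    """minimize ||z||_{omega,1} = sum_j w_j |z_j|
    subject to Az = y (eta = 0) or ||Az - y||_2 <= eta."""
    m, N = A.shape
    z = cp.Variable(N, complex=True)
    constraints = [cp.norm(A @ z - y, 2) <= eta] if eta > 0 else [A @ z == y]
    cp.Problem(cp.Minimize(w @ cp.abs(z)), constraints).solve()
    return z.value

# Tiny demo: recover a sparse vector from m < N random measurements
rng = np.random.default_rng(0)
N, m = 64, 24
w = 1.0 + np.arange(N) ** 0.5                 # illustrative increasing weights
x = np.zeros(N, dtype=complex)
x[[2, 7, 11]] = [1.0, -0.5, 0.3j]
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_hat = weighted_l1_recover(A, A @ x, w)
print(np.linalg.norm(x_hat - x))              # should be tiny
```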
Weighted restricted isometry property (WRIP)

Definition. The weighted restricted isometry constant $\delta_{\omega,s}$ of a matrix $A \in \mathbb{C}^{m\times N}$ is defined to be the smallest constant such that
$$(1 - \delta_{\omega,s}) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_{\omega,s}) \|x\|_2^2$$
for all $x \in \mathbb{C}^N$ with $\|x\|_{\omega,0} = \sum_{l : x_l \ne 0} \omega_l^2 \le s$.

Since $\omega_j \ge 1$ by assumption, the classical RIP implies the WRIP: $\delta_{\omega,s} \le \delta_{1,s} = \delta_s$. Alternative name: Weighted Uniform Uncertainty Principle (WUUP).
Recovery via weighted $\ell_1$-minimization

Theorem. Let $A \in \mathbb{C}^{m\times N}$ and $s \ge 2\|\omega\|_\infty^2$ be such that $\delta_{\omega,3s} < 1/3$. For $x \in \mathbb{C}^N$ and $y = Ax + e$ with $\|e\|_2 \le \eta$, let $x^\sharp$ be a minimizer of
$$\min_z \|z\|_{\omega,1} \quad \text{subject to} \quad \|Az - y\|_2 \le \eta.$$
Then
$$\|x - x^\sharp\|_{\omega,1} \le C_1 \sigma_s(x)_{\omega,1} + D_1 \sqrt{s}\, \eta, \qquad \|x - x^\sharp\|_2 \le C_2 \frac{\sigma_s(x)_{\omega,1}}{\sqrt{s}} + D_2 \eta.$$
Function interpolation

$\{\psi_j\}_{j\in\Gamma}$: finite ONS on $D$ with respect to a probability measure $\nu$. Given samples $y_1 = f(t_1), \dots, y_m = f(t_m)$ of $f(t) = \sum_{j\in\Gamma} x_j \psi_j(t)$, reconstruction amounts to solving $y = Ax$ with the sampling matrix $A \in \mathbb{C}^{m\times N}$, $N = |\Gamma|$, given by
$$A_{lk} = \psi_k(t_l).$$

Use weighted $\ell_1$-minimization to recover weighted-sparse or weighted-compressible $x$ when $m < |\Gamma|$. Choose $t_1, \dots, t_m$ i.i.d. at random according to $\nu$ in order to analyze the WRIP of the sampling matrix.
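For the trigonometric system from the earlier slides, assembling this sampling matrix together with the smoothness weights $\omega_j = 1 + 2\pi|j|$ takes a few lines; a sketch with illustrative sizes:

```python
import numpy as np

def trig_sampling_matrix(t, jmax):
    """A_{lk} = psi_k(t_l) for psi_j(t) = exp(2*pi*i*j*t) on [0,1], j = -jmax..jmax."""
    js = np.arange(-jmax, jmax + 1)
    return np.exp(2j * np.pi * np.outer(t, js)), js

rng = np.random.default_rng(1)
m, jmax = 30, 20                      # m = 30 samples, N = 41 basis functions
t = rng.uniform(0.0, 1.0, m)          # i.i.d. samples from nu (Lebesgue on [0,1])
A, js = trig_sampling_matrix(t, jmax)
w = 1.0 + 2 * np.pi * np.abs(js)      # smoothness weights w_j = 1 + 2*pi*|j|
print(A.shape, w.shape)               # (30, 41) (41,)
```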
Weighted RIP of the random sampling matrix

$\psi_j : D \to \mathbb{C}$, $j \in \Gamma$, $N = |\Gamma| < \infty$, ONS w.r.t. a probability measure $\nu$; weight $\omega$ with $\|\psi_j\|_\infty \le \omega_j$. Sampling points $t_1, \dots, t_m$ taken i.i.d. at random according to $\nu$; random sampling matrix $A \in \mathbb{C}^{m\times N}$ with entries $A_{lj} = \psi_j(t_l)$.

Theorem (Rauhut, Ward 2013). If
$$m \ge C \delta^{-2} s \max\{\ln^3(s) \ln(N), \ln(\varepsilon^{-1})\},$$
then the weighted restricted isometry constant of $\frac{1}{\sqrt{m}} A$ satisfies $\delta_{\omega,s} \le \delta$ with probability at least $1 - \varepsilon$.

This generalizes previous results (Candès, Tao; Rudelson, Vershynin; Rauhut) for systems with $\|\psi_j\|_\infty \le K$ for all $j \in \Gamma$, where the sufficient condition is $m \ge C \delta^{-2} K^2 s \ln^3(s) \ln(N)$.
Abstract weighted function spaces

$$A_{\omega,p} = \Big\{f : f(t) = \sum_{j\in\Gamma} x_j \psi_j(t),\ \|f\|_{\omega,p} := \|x\|_{\omega,p} < \infty\Big\}.$$

If $\omega_j \ge \|\psi_j\|_\infty$ then $\|f\|_\infty \le \|f\|_{\omega,1}$. If $\omega_j \ge \|\psi_j\|_\infty + \|\psi_j'\|_\infty$ (when $D \subset \mathbb{R}$) then $\|f\|_\infty + \|f'\|_\infty \le \|f\|_{\omega,1}$, and so on...
Interpolation via weighted $\ell_1$-minimization

Theorem. Assume $N = |\Gamma| < \infty$, $\omega_j \ge \|\psi_j\|_\infty$ and $0 < p < 1$. Choose $t_1, \dots, t_m$ i.i.d. at random according to $\nu$, where $m \ge C s \log^3(s) \log(N)$ for $s \ge 2\|\omega\|_\infty^2$. Then with probability at least $1 - N^{-\log^3(s)}$ the following holds for each $f \in A_{\omega,p}$. Let $x^\sharp$ be the solution of
$$\min_{z\in\mathbb{C}^N} \|z\|_{\omega,1} \quad \text{subject to} \quad \sum_{j\in\Gamma} z_j \psi_j(t_l) = f(t_l), \quad l = 1, \dots, m,$$
and set $\tilde f(t) = \sum_{j\in\Gamma} x_j^\sharp \psi_j(t)$. Then
$$\|f - \tilde f\|_\infty \le \|f - \tilde f\|_{\omega,1} \le C_1 s^{1-1/p} \|f\|_{\omega,p}, \qquad \|f - \tilde f\|_{L^2_\nu} \le C_2 s^{1/2 - 1/p} \|f\|_{\omega,p}.$$
Error bound in terms of the number of samples

Solving $m \asymp s \log^3(s) \log(N)$ for $s$ and inserting into the error bounds yields
$$\|f - \tilde f\|_\infty \le \|f - \tilde f\|_{\omega,1} \le C_1 \Big(\frac{m}{\log^4(N)}\Big)^{1 - 1/p} \|f\|_{\omega,p}, \qquad \|f - \tilde f\|_{L^2_\nu} \le C_2 \Big(\frac{m}{\log^4(N)}\Big)^{1/2 - 1/p} \|f\|_{\omega,p}.$$
Quasi-interpolation in infinite-dimensional spaces

Now $\Gamma$ is infinite, $\lim_{j\to\infty} \omega_j = \infty$, and $\omega_j \ge \|\psi_j\|_\infty$.

Theorem. Let $f \in A_{\omega,p}$ for some $0 < p < 1$, and set $\Gamma_s = \{j \in \Gamma : \omega_j^2 \le s/2\}$ for some $s$. Choose $t_1, \dots, t_m$ i.i.d. at random according to $\nu$, where $m \ge C s \max\{\log^3(s) \log(|\Gamma_s|), \log(\varepsilon^{-1})\}$. With $\eta = \|f\|_{\omega,p} / s^{1/p - 1/2}$ let $x^\sharp$ be the solution to
$$\min_{z\in\mathbb{C}^{\Gamma_s}} \|z\|_{\omega,1} \quad \text{subject to} \quad \Big\|\Big(f(t_l) - \sum_{j\in\Gamma_s} z_j \psi_j(t_l)\Big)_{l=1}^m\Big\|_2 \le \sqrt{m}\, \eta,$$
and put $\tilde f(t) = \sum_{j\in\Gamma_s} x_j^\sharp \psi_j(t)$. Then with probability exceeding $1 - \varepsilon$,
$$\|f - \tilde f\|_\infty \le \|f - \tilde f\|_{\omega,1} \le C_1 s^{1-1/p} \|f\|_{\omega,p}, \qquad \|f - \tilde f\|_{L^2_\nu} \le C_2 s^{1/2-1/p} \|f\|_{\omega,p}.$$

Ideally $|\Gamma_s| \le C s^\alpha$; then $m \ge C_\alpha s \ln^4(s)$ samples are sufficient.
Numerical example I for the trigonometric system

Runge's example $f(x) = \frac{1}{1+25x^2}$; weights $\omega_j = 1 + |j|^2$; interpolation points chosen uniformly at random from $[-1,1]$.

[Figure: the original function together with the least squares solution, the unweighted $\ell_1$ minimizer, and the weighted $\ell_1$ minimizer, each shown with its residual error.]
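A hedged sketch of how an experiment of this kind could be set up, assuming the period-2 trigonometric system $\psi_j(t) = e^{i\pi j t}$ on $[-1,1]$ and the weighted $\ell_1$ program from before (all parameters are ours; this mimics, not reproduces, the slide's figure):

```python
import cvxpy as cp
import numpy as np

# Runge's example on [-1,1], expanded in the (period-2) trigonometric system
f = lambda t: 1.0 / (1.0 + 25.0 * t**2)
rng = np.random.default_rng(2)
jmax, m = 30, 40
js = np.arange(-jmax, jmax + 1)
t = rng.uniform(-1.0, 1.0, m)                 # interpolation points, uniform on [-1,1]
A = np.exp(1j * np.pi * np.outer(t, js))      # A_{lk} = psi_k(t_l)
w = 1.0 + np.abs(js) ** 2                     # weights from the slide
y = f(t).astype(complex)

z = cp.Variable(len(js), complex=True)        # weighted l1 interpolation
cp.Problem(cp.Minimize(w @ cp.abs(z)), [A @ z == y]).solve()

tt = np.linspace(-1.0, 1.0, 400)              # evaluate the interpolant on a fine grid
f_hat = np.real(np.exp(1j * np.pi * np.outer(tt, js)) @ z.value)
print(np.max(np.abs(f_hat - f(tt))))          # sup-norm residual
```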
Numerical example II for the trigonometric system

$f(x) = |x|$; weights $\omega_j = 1 + |j|$; interpolation points chosen uniformly at random from $[-1,1]$.

[Figure: the original function together with the least squares solution, the unweighted $\ell_1$ minimizer, and the weighted $\ell_1$ minimizer, each shown with its residual error.]
Chebyshev polynomials

Chebyshev polynomials $C_j$, $j = 0, 1, 2, \dots$:
$$\int_{-1}^1 C_j(x) C_k(x) \frac{dx}{\pi\sqrt{1-x^2}} = \delta_{j,k}, \quad j, k \in \mathbb{N}_0, \qquad C_0 = 1 \ \text{and} \ \|C_j\|_\infty = \sqrt{2}.$$

Stable recovery of polynomials that are $s$-sparse in the Chebyshev system via unweighted $\ell_1$-minimization from $m \ge C s \log^3(s) \log(N)$ samples drawn i.i.d. from the Chebyshev measure $\frac{dx}{\pi\sqrt{1-x^2}}$.

Also error guarantees in $\|\cdot\|_{\omega,1}$ via weighted $\ell_1$-minimization.
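The normalized Chebyshev system and i.i.d. sampling from the Chebyshev measure are both one-liners: if $u$ is uniform on $[0,1]$, then $x = \cos(\pi u)$ has distribution $\frac{dx}{\pi\sqrt{1-x^2}}$. A short Monte Carlo check of orthonormality (illustrative):

```python
import numpy as np

def chebyshev(j, x):
    """Chebyshev polynomials normalized w.r.t. the Chebyshev measure:
    C_0 = 1, C_j(x) = sqrt(2) * cos(j * arccos(x)) for j >= 1."""
    Cj = np.cos(j * np.arccos(x))
    return Cj if j == 0 else np.sqrt(2.0) * Cj

# Sampling from dx / (pi * sqrt(1 - x^2)) via x = cos(pi * u), u ~ Uniform[0,1]
rng = np.random.default_rng(3)
x = np.cos(np.pi * rng.uniform(0.0, 1.0, 10_000))

# Monte Carlo check of orthonormality: E[C_j(x) C_k(x)] = delta_{jk}
print(np.mean(chebyshev(3, x) * chebyshev(3, x)))  # ~= 1
print(np.mean(chebyshev(3, x) * chebyshev(5, x)))  # ~= 0
```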
Legendre polynomials

Legendre polynomials $L_j$, $j = 0, 1, 2, \dots$:
$$\int_{-1}^1 L_j(x) L_k(x) \frac{dx}{2} = \delta_{j,k}, \qquad \|L_j\|_\infty = \sqrt{2j+1}, \quad j, k \in \mathbb{N}_0.$$

Unweighted case: $K = \max_{j=0,\dots,N-1} \|L_j\|_\infty = \sqrt{2N-1}$ leads to the bound $m \ge C K^2 s \log^3(s) \log(N) = C N s \log^3(s) \log(N)$.
Preconditioning

The preconditioned system $Q_j(x) = v(x) L_j(x)$ with $v(x) = (\pi/2)^{1/2} (1-x^2)^{1/4}$ satisfies
$$\int_{-1}^1 Q_j(x) Q_k(x) \frac{dx}{\pi\sqrt{1-x^2}} = \delta_{j,k}, \qquad \|Q_j\|_\infty \le \sqrt{3}, \quad j, k \in \mathbb{N}_0.$$

Stable recovery of polynomials that are $s$-sparse in the Legendre system via unweighted $\ell_1$-minimization from $m \ge C s \log^3(s) \log(N)$ samples drawn i.i.d. from the Chebyshev measure $\frac{dx}{\pi\sqrt{1-x^2}}$.

Alternatively, use the weight $\omega_j = \sqrt{2j+1}$ and the uniform or Chebyshev measure.
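A quick numerical check of the preconditioned bound $\|Q_j\|_\infty \le \sqrt{3}$, using scipy's unnormalized Legendre polynomials rescaled to the $dx/2$-normalization assumed above (grid and degrees are illustrative):

```python
import numpy as np
from scipy.special import eval_legendre

def L(j, x):
    """Legendre polynomials normalized w.r.t. dx/2 on [-1,1]: ||L_j||_inf = sqrt(2j+1)."""
    return np.sqrt(2 * j + 1) * eval_legendre(j, x)

def Q(j, x):
    """Preconditioned system Q_j = v * L_j with v(x) = sqrt(pi/2) * (1-x^2)^(1/4)."""
    return np.sqrt(np.pi / 2) * (1 - x**2) ** 0.25 * L(j, x)

x = np.linspace(-1.0, 1.0, 100_001)
print(max(np.max(np.abs(Q(j, x))) for j in range(50)))  # stays below sqrt(3) ~ 1.732
```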
Numerical example for Chebyshev polynomials

$f(x) = \frac{1}{1+25x^2}$; weights $\omega_j = 1 + j$; interpolation points chosen i.i.d. at random according to the Chebyshev measure $d\nu(x) = \frac{dx}{\pi\sqrt{1-x^2}}$.

[Figure: the original function together with the least squares solution, the unweighted $\ell_1$ minimizer, and the weighted $\ell_1$ minimizer, each shown with its residual error.]
Numerical example for Legendre polynomials

$f(x) = \frac{1}{1+25x^2}$; weights $\omega_j = 1 + j$; interpolation points chosen i.i.d. at random according to the Chebyshev measure.

[Figure: the original function together with the least squares solution, the unweighted $\ell_1$ minimizer, and the weighted $\ell_1$ minimizer, each shown with its residual error.]
Spherical harmonics

$Y_l^k$, $-k \le l \le k$, $k \in \mathbb{N}_0$: orthonormal system in $L^2(S^2)$,
$$\frac{1}{4\pi} \int_0^{2\pi} \int_0^\pi Y_l^k(\phi,\theta) \overline{Y_{l'}^{k'}(\phi,\theta)} \sin(\theta)\, d\theta\, d\phi = \delta_{l,l'} \delta_{k,k'}.$$

$(\phi, \theta) \in [0, 2\pi) \times [0, \pi)$: spherical coordinates
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \cos(\phi)\sin(\theta) \\ \sin(\phi)\sin(\theta) \\ \cos(\theta) \end{pmatrix} \in S^2.$$

With the ultraspherical polynomials $p_n^\alpha$:
$$Y_l^k(\phi,\theta) = e^{il\phi} (\sin\theta)^{|l|}\, p_{k-|l|}^{|l|}(\cos\theta), \qquad (\phi,\theta) \in [0,2\pi) \times [0,\pi).$$
Unweighted RIP for spherical harmonics

$L^\infty$-bound: $\|Y_l^k\|_\infty \sim k^{1/2}$.

Preconditioning I (Krasikov 2008): with $w(\theta,\phi) = |\sin(\theta)|^{1/2}$,
$$\|w Y_l^k\|_\infty \le C k^{1/4}.$$

Preconditioning II (Burq, Dyatlov, Ward, Zworski 2012): with $v(\theta,\phi) = |\sin(\theta)|^{1/2} |\cos(\theta)|^{1/6}$,
$$\|v Y_l^k\|_\infty \le C k^{1/6}.$$

This yields the unweighted RIP for the associated preconditioned random sampling matrix $\frac{1}{\sqrt{m}} A \in \mathbb{C}^{m\times N}$, with sampling points drawn according to the measure induced by the preconditioner,
$$\nu(d\theta, d\phi) = |\tan(\theta)|^{1/3}\, d\theta\, d\phi,$$
with high probability if $m \ge C s N^{1/6} \log^4(N)$.
Weighted RIP for spherical harmonics

$Y_l^k$, $-k \le l \le k$, $k \in \mathbb{N}_0$: spherical harmonics. Recall the $L^\infty$-bound $\|Y_l^k\|_\infty \sim k^{1/2}$ and the preconditioned $L^\infty$-bound $\|v Y_l^k\|_\infty \le C k^{1/6}$ for $v(\theta,\phi) = |\sin(\theta)|^{1/2} |\cos(\theta)|^{1/6}$.

Weighted RIP: with weights $\omega_{k,l} \asymp k^{1/6}$, the preconditioned random sampling matrix $\frac{1}{\sqrt{m}} A \in \mathbb{C}^{m\times N}$ satisfies $\delta_{\omega,s} \le \delta$ with high probability if
$$m \ge C \delta^{-2} s \log^3(s) \log(N).$$
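An illustrative numerical check of the preconditioned bound, under the reconstruction $v = |\sin\theta|^{1/2}|\cos\theta|^{1/6}$ used above; note that scipy's sph_harm takes (order, degree, azimuthal, polar) arguments and normalizes over the unit sphere, which differs from the slide's $1/(4\pi)$ normalization only by a constant factor:

```python
import numpy as np
from scipy.special import sph_harm

# Polar-angle grid (avoiding the coordinate singularities at the poles)
theta = np.linspace(1e-3, np.pi - 1e-3, 400)
v = np.abs(np.sin(theta)) ** 0.5 * np.abs(np.cos(theta)) ** (1 / 6)

for k in (5, 10, 20, 40):                            # degree k
    peak = max(np.max(v * np.abs(sph_harm(l, k, 0.0, theta)))
               for l in range(-k, k + 1))            # |Y| is independent of azimuth
    print(k, peak / k ** (1 / 6))                    # ratio should stay bounded in k
```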
Comparison of error bounds

Error bound for reconstruction of $f \in A_{\omega,p}$ from $m \ge C s \log^3(s) \log(N)$ samples drawn i.i.d. at random from the measure $\nu(d\theta,d\phi) = |\tan(\theta)|^{1/3}\, d\theta\, d\phi$ via weighted $\ell_1$-minimization:
$$\|f - \tilde f\|_\infty \le \|f - \tilde f\|_{\omega,1} \le C s^{1-1/p} \|f\|_{\omega,p}, \qquad 0 < p < 1.$$

Compare to the error estimate for unweighted $\ell_1$-minimization: if $m \ge C N^{1/6} s \log^3(s) \log(N)$ then
$$\|f - \tilde f\|_1 \le C s^{1-1/p} \|f\|_p, \qquad 0 < p < 1.$$
Numerical experiments for sparse spherical harmonic recovery

Original function $f(\theta,\phi) = \frac{1}{\theta^2 + 1/10}$.

[Figure: the original function and its reconstructions via unweighted $\ell_1$-minimization and via weighted $\ell_1$-minimization with $\omega_{k,l} = k^{1/6}$ and $\omega_{k,l} = k^{1/2}$.]
High-dimensional function interpolation

Tensorized Chebyshev polynomials on $D = [-1,1]^d$:
$$C_{\mathbf{k}}(t) = C_{k_1}(t_1) C_{k_2}(t_2) \cdots C_{k_d}(t_d), \qquad \mathbf{k} \in \mathbb{N}_0^d,$$
with $C_k$ the $L^2$-normalized Chebyshev polynomials on $[-1,1]$. Then
$$\int_{[-1,1]^d} C_{\mathbf{k}}(t) C_{\mathbf{j}}(t)\, d\nu(t) = \delta_{\mathbf{k},\mathbf{j}}, \qquad \mathbf{j}, \mathbf{k} \in \mathbb{N}_0^d,$$
with $\nu$ the tensorized Chebyshev measure.

Expansions $f(t) = \sum_{\mathbf{k}\in\mathbb{N}_0^d} x_{\mathbf{k}} C_{\mathbf{k}}(t)$ with $\|x\|_p < \infty$ for $0 < p < 1$ and large $d$ (even $d = \infty$) appear in parametric PDEs (Cohen, DeVore, Schwab 2011, ...).
(Weighted) sparse recovery for tensorized Chebyshev polynomials

$L^\infty$-bound: $\|C_{\mathbf{k}}\|_\infty = 2^{\|\mathbf{k}\|_0/2}$, where $\|\mathbf{k}\|_0$ counts the nonzero entries of the multi-index $\mathbf{k}$.

Curse of dimension: the classical RIP bound requires $m \ge C 2^d s \log^3(s) \log(N)$.

Weights: $\omega_{\mathbf{k}} = 2^{\|\mathbf{k}\|_0/2}$. Weighted RIP bound: $m \ge C s \log^3(s) \log(N)$.

Approximate recovery requires $x \in \ell_{\omega,p}$!
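The weights $\omega_{\mathbf{k}} = 2^{\|\mathbf{k}\|_0/2}$ grow only with the number of active coordinates, not with $d$ itself; a small sketch enumerating a finite index set (sizes are illustrative):

```python
import itertools
import numpy as np

d, deg = 4, 3   # dimension d, maximal degree per coordinate

# Index set Gamma = {0,...,deg}^d and weights w_k = 2^(||k||_0 / 2),
# where ||k||_0 counts the nonzero entries of the multi-index k
Gamma = list(itertools.product(range(deg + 1), repeat=d))
w = np.array([2.0 ** (np.count_nonzero(k) / 2.0) for k in Gamma])
print(len(Gamma), w.min(), w.max())   # 256 indices, weights from 1 to 2^(d/2)
```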
Comparison: classical interpolation vs. weighted $\ell_1$-minimization

Classical bound:
$$\|f - \tilde f\|_\infty \le C m^{-r/d} \|f\|_{C^r}.$$

Interpolation via $\ell_1$-minimization:
$$\|f - \tilde f\|_\infty \le C \Big(\frac{m}{\ln^4(m)}\Big)^{1-1/p} \|f\|_{\omega,p}, \qquad 0 < p < 1.$$

Better rate if $1/p - 1 > r/d$, i.e., $p < \frac{1}{r/d + 1}$. For instance, when $r = d$, then $p < 1/2$ is sufficient.
Advertisement

S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis, Birkhäuser, 2013.
Thank you!