Interpolation via weighted l 1 minimization Rachel Ward University of Texas at Austin December 12, 2014 Joint work with Holger Rauhut (Aachen University)
Function interpolation Given a function f : D C on a domain D, interpolate or approximate f from sample values y 1 = f (u 1 ),..., y m = f (u m ).
Function interpolation Given a function f : D C on a domain D, interpolate or approximate f from sample values y 1 = f (u 1 ),..., y m = f (u m ). Assume the form f (u) = j Γ x j ψ j (u), Γ = N Approaches: Standard interpolation: Choose m = N. Find appropriate Γ and sampling points u 1, u 2,..., u m. Least squares regression: Choose N < m; minimize y Ax 2 where A l,j = ψ j (u l ) is the sampling matrix. Compressive sensing methods: Choose N > m. Exploit approximate sparsity of coefficient vector x to solve the underdetermined system y = Ax.
Orthonormal systems L 2-normalized Legendre polynomials D: domain endowed with a probability measure ν. ψ j : D C, j Γ (finite or infinite) {ψ j } j Γ is an orthonormal system: D ψ j(t)ψ k (t)dν(t) = δ j,k
Orthonormal systems L 2-normalized Legendre polynomials D: domain endowed with a probability measure ν. ψ j : D C, j Γ (finite or infinite) {ψ j } j Γ is an orthonormal system: D ψ j(t)ψ k (t)dν(t) = δ j,k Examples: Trigonometric system: ψ j (t) = e 2πijt, D = [0, 1], dν(t) = dt, ψ j 1.
Orthonormal systems L 2-normalized Legendre polynomials D: domain endowed with a probability measure ν. ψ j : D C, j Γ (finite or infinite) {ψ j } j Γ is an orthonormal system: D ψ j(t)ψ k (t)dν(t) = δ j,k Examples: Trigonometric system: ψ j (t) = e 2πijt, D = [0, 1], dν(t) = dt, ψ j 1. L 2 -normalized Legendre polynomials L j : D = [ 1, 1], dν(t) = dt, L j = 2j + 1.
Orthonormal systems L 2-normalized Legendre polynomials D: domain endowed with a probability measure ν. ψ j : D C, j Γ (finite or infinite) {ψ j } j Γ is an orthonormal system: D ψ j(t)ψ k (t)dν(t) = δ j,k Tensor products of Legendre polynomials: D = [ 1, 1] d, j = (j 1, j 2,..., j d ), L j = d k=1 L j k, L j = d k=1 2jk + 1. In high-dimensional problems, smoothness is not enough to avoid curse of dimension too local! We will combine smoothness and sparsity.
Smoothness and weights In general, f L + f L promotes smoothness Consider ψ j (t) = e 2πijt, j Z, t [0, 1] Derivatives satisfy ψ j = 2π j, j Z. For f (t) = j x jψ j (t), f L + f L = j x j ψ j L + j x j ψ j x j ( ψ j L + ψ j L ) j Z = x j (1 + 2π j ) =: x ω,1. j Z
Smoothness and weights In general, f L + f L promotes smoothness Consider ψ j (t) = e 2πijt, j Z, t [0, 1] Derivatives satisfy ψ j = 2π j, j Z. For f (t) = j x jψ j (t), f L + f L = j x j ψ j L + j x j ψ j x j ( ψ j L + ψ j L ) j Z = x j (1 + 2π j ) =: x ω,1. j Z Weighted l 1 -coefficient norm promotes smoothness. It also promotes sparsity!
Weighted norms and weighted sparsity Weighted l 1 norm: pay a higher price to include certain indices x ω,1 = j Γ ω j x j New: weighted sparsity: x ω,0 := j:x j 0 ω 2 j x is called weighted s-sparse if x ω,0 s. Weighted best s-term approximation error: σ s (x) ω,1 := inf z: z ω,0 s x z ω,1
(Weighted) Compressive Sensing Recover a weighted s-sparse (or weighted-compressible) vector x from measurements y = Ax, where A C m N with m < N. Weighted l 1 -minimization min z C N z ω,1 subject to Az = y Noisy version min z C N z ω,1 subject to Az y 2 η
(Weighted) Compressive Sensing Recover a weighted s-sparse (or weighted-compressible) vector x from measurements y = Ax, where A C m N with m < N. Weighted l 1 -minimization min z C N z ω,1 subject to Az = y Noisy version min z C N z ω,1 subject to Az y 2 η Keep in mind sampling matrix: ψ 1 (u 1 ) ψ 2 (u 1 )...... ψ N (u 1 ) ψ 1 (u 2 ) ψ 2 (u 2 )...... ψ N (u 2 ) A =... ψ 1 (u m ) ψ 2 (u m )...... ψ N (u m )
Numerical example - comparing different reconstructions 1 Original function Least squares D=15 1 Exact interpolation, D=30 1 0.5 0.5 0.5 0 1 0 1 0 1 0 1 0 1 0 1 Unweighted l1 minimizer 1 Weighted l1 minimizer, j =j 1/2 1 0.5 0.5 0 1 0 1 0 1 0 1 f (u) = 1. Draw m = 30 sampling points u 1+25u 2 l i.i.d. from uniform measure on [ 1, 1]. Interpolate (exactly or approximately) the samples y l = f (u l ) by various choices of {x j } in f # (u) = 80 j=0 x j e πiju *Stability of unweighted l 1 minimization given exact sparsity: Rauhut, W. 09 *Stability of least squares regression: Cohen, Davenport, Leviatan 2011
Numerical example - comparing different reconstructions Different trials correspond to different random draws of m = 30 sampling points from uniform measure on [ 1, 1].
Numerical example - comparing different reconstructions Same experiment, but now interpolating/approximating by Legendre polynomials, and sampling points are from Chebyshev measure on [ 1, 1], dν(u) = du π 1 u 2.
What is going on? Runge s function and its Legendre polynomial coefficients Compare coefficient indices picked up by various reconstruction methods: Least squares unweighted l 1 minimization weighted l 1 minimization
Back to Compressive Sensing setting Recover a weighted s-sparse (or weighted-compressible) vector x from measurements y = Ax, where A C m N with m < N. Weighted l 1 -minimization min z C N N ω j z j j=1 subject to Az = y Keep in mind sampling matrix: ψ 1 (u 1 ) ψ 2 (u 1 )...... ψ N (u 1 ) ψ 1 (u 2 ) ψ 2 (u 2 )...... ψ N (u 2 ) A =... ψ 1 (u m ) ψ 2 (u m )...... ψ N (u m )
Weighted restricted isometry property (WRIP) Definition (with H. Rauhut 13) The weighted restricted isometry constant δ ω,s of a matrix A C m N is defined to be the smallest constant such that (1 δ ω,s ) x 2 2 Ax 2 2 (1 + δ ω,s ) x 2 2 for all x C N with x ω,0 = l:x l 0 ω2 j s.
Weighted restricted isometry property (WRIP) Definition (with H. Rauhut 13) The weighted restricted isometry constant δ ω,s of a matrix A C m N is defined to be the smallest constant such that (1 δ ω,s ) x 2 2 Ax 2 2 (1 + δ ω,s ) x 2 2 for all x C N with x ω,0 = l:x l 0 ω2 j s. Unweighted uniform restricted isometry property (Candès, Tao 05): ω 1. Since we assume ω j 1, WRIP is weaker than uniform RIP Related: Model-based compressive sensing (Baraniuk, Cevher, Duarte, Hedge, Wakin 10)
Weighted restricted isometry property (WRIP) Definition (with H. Rauhut 13) The weighted restricted isometry constant δ ω,s of a matrix A C m N is defined to be the smallest constant such that (1 δ ω,s ) x 2 2 Ax 2 2 (1 + δ ω,s ) x 2 2 for all x C N with x ω,0 = l:x l 0 ω2 j s. Weights allow us to analyze sampling matrices from function systems with unbounded norm. Uniform RIP does not hold for such matrices!
Weighted RIP of random sampling matrix ψ j : D C, orthonormal system w.r.t. prob. measure ν. and ψ j ω j. Sampling points u 1,..., u m taken i.i.d. at random according to ν. Random sampling matrix A C m N with entries A lj = ψ j (u l ).
Weighted RIP of random sampling matrix ψ j : D C, orthonormal system w.r.t. prob. measure ν. and ψ j ω j. Sampling points u 1,..., u m taken i.i.d. at random according to ν. Random sampling matrix A C m N with entries A lj = ψ j (u l ). Fix s, and choose N so that ω 1, ω 2,..., ω N s/2. Theorem (with H. Rauhut, 13) If m Cδ 2 s log 3 (s) log(n) then the weighted restricted isometry constant of 1 m A satisfies δ ω,s δ with high probability.
Weighted RIP of random sampling matrix ψ j : D C, orthonormal system w.r.t. prob. measure ν. and ψ j ω j. Sampling points u 1,..., u m taken i.i.d. at random according to ν. Random sampling matrix A C m N with entries A lj = ψ j (u l ). Fix s, and choose N so that ω 1, ω 2,..., ω N s/2. Theorem (with H. Rauhut, 13) If m Cδ 2 s log 3 (s) log(n) then the weighted restricted isometry constant of 1 m A satisfies δ ω,s δ with high probability. Generalizes previous results (Candès, Tao, Rudelson, Vershynin) for uniformly bounded systems, ψ j K for all j.
Recovery via weighted l 1 -minimization Theorem (with H. Rauhut) Let A C m N have the WRIP with weights (ω) and δ ω,3s < 1/3. For x C N and y = Ax + e with e 2 η let x be a minimizer of Then min z ω,1 subject to Az y 2 η. x x ω,1 C 1 σ s (x) ω,1 + D 1 sη Generalizes unweighted l 1 minimization results (Candès, Romberg, Tao 06, Donoho 06).
Recovery via weighted l 1 -minimization Theorem (with H. Rauhut) Let A C m N have the WRIP with weights (ω) and δ ω,3s < 1/3. For x C N and y = Ax + e with e 2 η let x be a minimizer of Then min z ω,1 subject to Az y 2 η. x x ω,1 C 1 σ s (x) ω,1 + D 1 sη Generalizes unweighted l 1 minimization results (Candès, Romberg, Tao 06, Donoho 06). Should be of independent interest in structured sparse recovery problems / sparse recovery problems with nonuniform priors on support
Prior work on weighted l 1 minimization Weighted l 1 norm: x ω,1 = j Γ ω j x j Many previous works on the analysis of weighted l 1 minimization! Focused on the finite-dimensional setting, and with analysis based on unweighted sparsity von Borries, Miosso, and Potes 2007 Vaswani and Lu 2009, Jacques 2010, Xu 2010 Khajehnejad, Xu, Avestimehr, and Hassibi (2009 & 2010) Friedlander, Mansour, Saab, and Yilmaz 2012, Misra and Parrilo 2013 Peng, Hampton, Doostan 2013 - sparse polynomial chaos approximation of high-dimensional stochastic functions
Back to function interpolation Suppose that Γ = N <. Given samples y 1 = f (u 1 ),..., y m = f (u m ) of f (u) = j Γ x jψ j (u) reconstruction amounts to solving y = Ax where A C m N is the sampling matrix A l,j = ψ j (u l )
Back to function interpolation Suppose that Γ = N <. Given samples y 1 = f (u 1 ),..., y m = f (u m ) of f (u) = j Γ x jψ j (u) reconstruction amounts to solving y = Ax where A C m N is the sampling matrix A l,j = ψ j (u l ) Previous analysis suggests: use weighted l 1 -minimization with weights ω j ψ j L or ω j ψ j L to recover weighted-sparse or weighted-compressible x when m < N. Choose u 1,..., u m i.i.d. at random according to ν ψ in order to analyze WRIP of sampling matrix. x will not be exactly sparse. Measure residual error f f # in which norm? Recall g L + g L g ω,1 if ω j ψ j L
Interpolation via weighted l 1 minimization Suppose f (u) = j Γ x jψ j (u), Γ = N < Set weights ω j > ψ j
Interpolation via weighted l 1 minimization Suppose f (u) = j Γ x jψ j (u), Γ = N < Set weights ω j > ψ j Theorem From m s log 3 (s) log(n) samples y 1 = f (u 1 ),..., y m = f (u m ) where u j ν ψ, if x is the solution to min z C Γ z ω,1 subject to Az = y and f (u) = j Γ x j ψ j(u),
Interpolation via weighted l 1 minimization Suppose f (u) = j Γ x jψ j (u), Γ = N < Set weights ω j > ψ j Theorem From m s log 3 (s) log(n) samples y 1 = f (u 1 ),..., y m = f (u m ) where u j ν ψ, if x is the solution to min z C Γ z ω,1 subject to Az = y and f (u) = j Γ x j ψ j(u), then with high probability, f f f f ω,1 C 1 σ(f ) ω,1
Interpolation via weighted l 1 minimization Suppose f (u) = j Γ x jψ j (u), Γ = N < Set weights ω j > ψ j Theorem From m s log 3 (s) log(n) samples y 1 = f (u 1 ),..., y m = f (u m ) where u j ν ψ, if x is the solution to min z C Γ z ω,1 subject to Az = y and f (u) = j Γ x j ψ j(u), then with high probability, f f f f ω,1 C 1 σ(f ) ω,1 More realistic: Γ =. How to pass to a finite dimensional approximation in a principled way?
Weighted function spaces A ω,p = {f : f (u) = j Γ x j ψ j (u), f ω,p := x ω,p < } Interesting range: 0 p 1 A ω,p A ω,q if p < q Solving m s log 3 (s) log(n) for s, a Stechkin-type error bound yields ( ) log(n) f f L f f 4 1/p 1 ω,1 C 1 f ω,p m
Approximation in infinite-dimensional spaces Suppose Γ =, lim j ω j = and ω j > ψ j.
Approximation in infinite-dimensional spaces Suppose Γ =, lim j ω j = and ω j > ψ j. Theorem Suppose f A ω,p for some 0 < p < 1. Fix s, and set Γ s = {j Γ : ωj 2 s/2}.
Approximation in infinite-dimensional spaces Suppose Γ =, lim j ω j = and ω j > ψ j. Theorem Suppose f A ω,p for some 0 < p < 1. Fix s, and set Γ s = {j Γ : ωj 2 s/2}. Draw m Cs log 3 (s) log( Γ s ) samples y 1 = f (u 1 ),..., y m = f (u m ) according to ν ψ, and let x be the solution to min z C Γs z ω,1 subject to Ax y 2 m/s f f Γs ω,1 Put f = j Γ s x j ψ j.
Approximation in infinite-dimensional spaces Suppose Γ =, lim j ω j = and ω j > ψ j. Theorem Suppose f A ω,p for some 0 < p < 1. Fix s, and set Γ s = {j Γ : ωj 2 s/2}. Draw m Cs log 3 (s) log( Γ s ) samples y 1 = f (u 1 ),..., y m = f (u m ) according to ν ψ, and let x be the solution to min z C Γs z ω,1 subject to Ax y 2 m/s f f Γs ω,1 Put f = j Γ s x j ψ j. Then with high probability, ( log f f L f f 4 ) 1/p 1 (N) ω,1 C 1 f ω,p m
Approximation in infinite-dimensional spaces Suppose Γ =, lim j ω j = and ω j > ψ j. Theorem Suppose f A ω,p for some 0 < p < 1. Fix s, and set Γ s = {j Γ : ωj 2 s/2}. Draw m Cs log 3 (s) log( Γ s ) samples y 1 = f (u 1 ),..., y m = f (u m ) according to ν ψ, and let x be the solution to min z C Γs z ω,1 subject to Ax y 2 m/s f f Γs ω,1 Put f = j Γ s x j ψ j. Then with high probability, ( log f f L f f 4 ) 1/p 1 (N) ω,1 C 1 f ω,p m **Using greedy alternative such as weighted iterative hard thresholding, do not need to know m/s f f Γs ω,1, get same bound
Function approximation in high dimensions Important: For domains D = [ 1, 1] d, don t want number of measurements m in resulting bound to grow much with d.
Function approximation in high dimensions Important: For domains D = [ 1, 1] d, don t want number of measurements m in resulting bound to grow much with d. L 2 -normalized Chebyshev polynomials T j (u) = 2 cos(j arccos u), T 0 (u) 1, j N, Tensorized Chebyshev polynomials T k (u) = d T kj (u j ) = j=1 j supp k T kj (u j ), u = (u j ) j 1, k F, F = {k = (k 1, k 2,...), k j N}}
Function approximation in high dimensions Important: For domains D = [ 1, 1] d, don t want number of measurements m in resulting bound to grow much with d. L 2 -normalized Chebyshev polynomials T j (u) = 2 cos(j arccos u), T 0 (u) 1, j N, Tensorized Chebyshev polynomials T k (u) = d T kj (u j ) = j=1 Product probability measure j supp k T kj (u j ), u = (u j ) j 1, k F, F = {k = (k 1, k 2,...), k j N}} η = j 1 du j π 1 uj 2, D. {T k } k F forms orthonormal basis for L 2 (D, dη).
Sparse recovery for tensorized Chebyshev polynomials L -bound for the tensorized Chebyshev polynomials T k : Choice of weights: T k = 2 k 0/2. ω k = d (k j + 1) 1/2 2 k 0/2 j=1 Encourages both sparse and low-order tensor products
Sparse recovery for tensorized Chebyshev polynomials L -bound for the tensorized Chebyshev polynomials T k : Choice of weights: T k = 2 k 0/2. ω k = d (k j + 1) 1/2 2 k 0/2 j=1 Encourages both sparse and low-order tensor products Subset of indices forms a hyperbolic cross: Λ 0 = {k N d 0 : ω 2 k s} [Kuhn, Sickel, Ulrich, 2014, Cohen, DeVore, Foucart, Rauhut 2011]: Λ 0 e d s 2+log(d)
Apply weighted l 1 theory: Consider a function f = k Λ x k T k on [ 1, 1] d, f A ω,p Fix s, and fix Λ 0 = {k N d : ωk 2 s} Reduced basis Fix number of samples m Cs log 4 (s) log(d) Draw m samples y l = f (u l ) according to Chebyshev measure Let A C m N be sampling matrix with entries A l,k = T k (u l ). Let x # be solution of min x ω,1 subject to Ax y 2 m/s f f Λ0 ω,1 and set f # = k Λ 0 x # k T k.
Apply weighted l 1 theory: Consider a function f = k Λ x k T k on [ 1, 1] d, f A ω,p Fix s, and fix Λ 0 = {k N d : ωk 2 s} Reduced basis Fix number of samples m Cs log 4 (s) log(d) Draw m samples y l = f (u l ) according to Chebyshev measure Let A C m N be sampling matrix with entries A l,k = T k (u l ). Let x # be solution of min x ω,1 subject to Ax y 2 m/s f f Λ0 ω,1 and set f # = k Λ 0 x # k T k. Then with probability exceeding 1 e d log3 (s), ( log f f 4 ) 1/p 1 (s) log(d) L C 1 f ω,p m
Limitations of weighted l 1 approach In the previous example, N = Λ 0 = e d s 2+log 2 (d) Weighted l 1 minimization as a reconstruction method on such large scale problems is impractical Fix s s. Least squares projection onto is faster, but too greedy Can we meet in the middle? Ñ = e d s 2+log 2 (d)
Summary We introduced weighted l 1 minimization for stable and robust function interpolation, as taking into account both sparsity and smoothness present in natural functions of interest Along the way, we extended the notion of restricted isometry property to weighted restricted isometry property, a more mild condition that allows us to treat function systems with increasing norm Weighted l 1 minimization can overcome curse of dimension w.r.t. number of samples in high-dimensional approximation problems.
Extensions: We observe empirically that weighted l 1 minimization is faster than unweighted l 1 minimization. The steeper the weights, the faster. Justificaton? We observe that, with error on measurements y = f (u) + ξ, reconstruction results are similar, provided that the regularization parameter is chosen correctly in regularization methods. Equality constrained l 1 minimization, weighted or not, leads to overfitting. Why?
References Rauhut, Ward. Interpolation via weighted l 1 minimization. Arxiv preprint, 2013. Rauhut, Ward, Sparse Legendre expansions via l 1 -minimization. Journal of approximation theory 164.5 (2012): 517-533. Cohen, Davenport, Leviatan. On the stability and accuracy of least squares approximations. Foundations of computational mathematics, 13 (2013), 819 834. Jo. Iterative Hard Thresholding for weighted sparse approximation. Arxiv preprint, 2014. Burq, Dyatlov, Ward, Zworski. Weighted eigenfunction estimates with applications to compressed sensing. SIAM J. Math. Anal. 44(5) (2012) 3481-3501