Basis Pursuit Denoise with Nonsmooth Constraints


Basis Pursuit Denoise with Nonsmooth Constraints

Robert Baraldi (1), Rajiv Kumar (2), and Aleksandr Aravkin (1). (1) Department of Applied Mathematics, University of Washington. (2) Formerly School of Earth and Atmospheric Sciences, Georgia Institute of Technology, USA; currently DownUnder GeoSolutions, Perth, Australia.

Abstract — Level-set optimization formulations with data-driven constraints minimize a regularization functional subject to matching observations to a given error level. These formulations are widely used, particularly for matrix completion and sparsity promotion in data interpolation and denoising. The misfit level is typically measured in the l2 norm or another smooth metric. In this paper, we present a new flexible algorithmic framework that targets nonsmooth level-set constraints, including l1, l∞, and even l0 norms. These constraints give greater flexibility for modeling deviations in observation and denoising, and have a significant impact on the solution. Measuring error in the l1 and l0 norms makes the result more robust to large outliers, while matching many observations exactly. We demonstrate the approach for basis pursuit denoise (BPDN) problems as well as for extensions of BPDN to matrix factorization, with applications to interpolation and denoising of 5D seismic data. The new methods are particularly promising for seismic applications, where the amplitude in the data varies significantly, and measurement noise in low-amplitude regions can wreak havoc for standard Gaussian error models.

Index Terms — Nonconvex nonsmooth optimization, level-set formulations, basis pursuit denoise, interpolation, seismic data.

I. INTRODUCTION

Basis pursuit denoise (BPDN) seeks a sparse solution to an under-determined system of equations that has been corrupted by noise. The classic level-set formulation [22], [2] is given by

\min_x \|x\|_1 \quad \text{s.t.} \quad \|Ax - b\|_2 \le \sigma, \qquad (1)

where A : R^{m×n} → R^d is a linear functional taking unknown parameters x ∈ R^{m×n} to observations b ∈ R^d. Problem (1) is also known as a Morozov formulation, in contrast to Ivanov or Tikhonov formulations [17]. The functional A can include a transformation to another domain, such as wavelet, Fourier, or curvelet coefficients [7], as well as compositions of these transforms with other linear operators, such as restriction in interpolation problems. The parameter σ controls the error budget and is based on an estimate of the noise level in the data. Theoretical recovery guarantees for classes of operators A are developed in [6] and [20].

BPDN and the closely related LASSO formulation have applications to compressed sensing [8], [6] and machine learning [10], [11], as well as to applied domains including MRI [16]. Seismic data is a key use case [3], [15], [19], where acquisition is prohibitively expensive and interpolation techniques are used to fill in data volumes by promoting parsimonious representations in the Fourier [19] or curvelet [12] domains. Matricization of the data leads to low-rank interpolation schemes [3], [15], [19], [24].

While BPDN uses nonsmooth regularizers, including the l1 norm, nuclear norm, and elastic net, the inequality constraint is ubiquitously smooth, and is often taken to be the l2 norm as in (1). Prior work, including [23], [3], [9], [2], exploits the smoothness of the inequality constraint in developing algorithms for this problem class. Smooth constraints work well when errors are Gaussian, but this assumption fails for seismic data and is often violated in general.

Contributions.
The main contribution of this paper is a fast, easily adaptable algorithm for level-set formulations, including BPDN, with nonsmooth and nonconvex data constraints, together with an illustration of the efficacy of the approach on large-scale interpolation and denoising problems. To do this, we extend the universal regularization framework of [26] to level-set formulations with nonsmooth/nonconvex constraints. We develop a convergence theory for the optimization approach, and illustrate the practical performance of the new formulations for data interpolation and denoising in both sparse recovery and low-rank matrix factorization.

Roadmap. The paper proceeds as follows. Section II develops the general relaxation framework and approach. Section III specializes this framework to the BPDN setting with nonsmooth, nonconvex constraints. In Section IV we apply the approach to a sparse signal recovery problem and to sparse curvelet reconstruction. In Section V, we extend the approach to a low-rank interpolation framework, which embeds matrix factorization within the BPDN constraint. In Section VI we test the low-rank extension using synthetic examples and data extracted from a full 5D dataset simulated on the complex SEG/EAGE overthrust model.

II. NONSMOOTH, NONCONVEX LEVEL-SET FORMULATIONS

We consider the following problem class:

\min_x \varphi(Cx) \quad \text{s.t.} \quad \psi(Ax - b) \le \sigma, \qquad (2)

where φ and ψ may be nonsmooth and nonconvex, but have well-defined proximity and projection operators:

\operatorname{prox}_{\alpha\varphi}(y) = \arg\min_x \tfrac{1}{2\alpha}\|x - y\|^2 + \varphi(x), \qquad \operatorname{proj}_{\psi \le \sigma}(y) = \arg\min_{x:\,\psi(x)\le\sigma} \tfrac{1}{2}\|x - y\|^2. \qquad (3)

Here, C : \mathbb{C}^{m\times n} \to \mathbb{R}^{c} is typically a linear operator that converts x to some transform domain, while A : \mathbb{C}^{m\times n} \to \mathbb{R}^{d} is a linear observation operator, also acting on x. In the context of interpolation, A is often a restriction operator.
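The only structure (2) asks of φ and ψ is that the two operators in (3) be cheap to evaluate. As a concrete illustration, the minimal NumPy sketch below implements the proximity operator of the l1 norm and projections onto l2-, l∞-, and l0-norm balls, the building blocks used throughout the paper. The function names and the small test values are ours; for the l0 "ball," the projection simply keeps the τ largest-magnitude entries.

```python
import numpy as np

def prox_l1(y, alpha):
    """Proximal operator of alpha*||.||_1 (soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

def proj_l2_ball(y, sigma):
    """Projection onto {z : ||z||_2 <= sigma}."""
    nrm = np.linalg.norm(y)
    return y if nrm <= sigma else (sigma / nrm) * y

def proj_linf_ball(y, sigma):
    """Projection onto {z : ||z||_inf <= sigma} (entrywise clipping)."""
    return np.clip(y, -sigma, sigma)

def proj_l0_ball(y, tau):
    """Projection onto {z : ||z||_0 <= tau}: keep the tau largest-magnitude entries."""
    z = np.zeros_like(y)
    if tau >= 1:
        keep = np.argsort(np.abs(y))[-int(tau):]
        z[keep] = y[keep]
    return z

print(prox_l1(np.array([-2.0, 0.3, 1.5]), 1.0))       # [-1.   0.   0.5]
print(proj_linf_ball(np.array([3.0, -0.2]), 1.0))     # [ 1.  -0.2]
print(proj_l0_ball(np.array([3.0, -0.2, 0.5]), 1))    # [ 3.   0.   0. ]
```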

Algorithm 1 Prox-gradient for (4).
1: Input: x^0, w_1^0, w_2^0
2: Initialize: k = 0
3: while not converged do
4:   x^{k+1} ← x^k − α [ (1/η_1) C^T (C x^k − w_1^k) + (1/η_2) A^T (A x^k − w_2^k − b) ]
5:   w_1^{k+1} ← prox_{αφ} ( w_1^k − (α/η_1)(w_1^k − C x^{k+1}) )
6:   w_2^{k+1} ← proj_{σ B_ψ} ( w_2^k − (α/η_2)(w_2^k − A x^{k+1} + b) )
7:   k ← k + 1
8: end while
9: Output: w_1^k, w_2^k, x^k

This setting significantly extends that of [2], who assume ψ and φ are convex, take C = I, and use the value function

v(τ) = \min_x \psi(Ax - b) \quad \text{s.t.} \quad \varphi(x) \le \tau

to solve (2) by root-finding on v(τ) = σ. Variational properties of v are fully understood only in the convex setting, and efficient evaluation of v(τ) requires ψ to be smooth, so that efficient first-order methods are applicable. Here, we develop an approach to solve any problem of type (2), including problems with nonsmooth and nonconvex ψ, φ, using only matrix-vector products with A, A^T, C, C^T and simple nonlinear operators. In special cases, the approach can also use equation solves to gain significant speedup.

The general approach uses the relaxation formulation proposed in [26], [25]. We use relaxation to split φ, ψ from the linear map A and transformation map C, extending (2) to

\min_{x, w_1, w_2} \varphi(w_1) + \tfrac{1}{2\eta_1}\|Cx - w_1\|^2 + \tfrac{1}{2\eta_2}\|w_2 - Ax + b\|_2^2 \quad \text{s.t.} \quad \psi(w_2) \le \sigma, \qquad (4)

with w_1 ∈ R^c and w_2 ∈ R^d. In contrast to [26], we use a continuation scheme that forces η_i ↓ 0, in order to solve the original formulation (2). Thus the only external algorithmic parameter the scheme requires is σ, which controls the error budget for ψ.

There are two algorithms readily available to solve (4). The first is prox-gradient descent, detailed in Algorithm 1. We let z = (x, w_1, w_2) and define Φ(z) = φ(w_1) + δ_{ψ ≤ σ}(w_2), where the indicator function δ_{ψ ≤ σ} takes the value 0 if ψ(w_2) ≤ σ, and infinity otherwise. Problem (4) can now be written as

\min_z \underbrace{\tfrac{1}{2}\Big\| \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}} C & -\tfrac{1}{\sqrt{\eta_1}} I & 0 \\ \tfrac{1}{\sqrt{\eta_2}} A & 0 & -\tfrac{1}{\sqrt{\eta_2}} I \end{bmatrix} z - \begin{bmatrix} 0 \\ \tfrac{1}{\sqrt{\eta_2}} b \end{bmatrix} \Big\|^2}_{f(z)} + \Phi(z). \qquad (5)

Applying the prox-gradient descent iteration with step size α,

z^{k+1} = \operatorname{prox}_{\alpha\Phi}\big( z^k - \alpha \nabla f(z^k) \big), \qquad (6)

gives the coordinate updates in Algorithm 1.

Algorithm 2 Value-function optimization for (4).
1: Input: x^0, w_1^0, w_2^0
2: Initialize: k = 0
3: Define: H = (1/η_1) C^T C + (1/η_2) A^T A
4: while not converged do
5:   x^{k+1} ← H^{−1} ( (1/η_1) C^T w_1^k + (1/η_2) A^T (b + w_2^k) )
6:   w_1^{k+1} ← prox_{βφ} ( w_1^k − (β/η_1)(w_1^k − C x^{k+1}) )
7:   w_2^{k+1} ← proj_{σ B_ψ} ( w_2^k − (β/η_2)(w_2^k − A x^{k+1} + b) )
8:   k ← k + 1
9: end while
10: Output: w_1^k, w_2^k, x^k

Prox-gradient has been analyzed in the general nonconvex setting by [4]. However, Problem (5) is the sum of a convex quadratic and a nonconvex regularizer. The rate of convergence for this problem class can be quantified, and [26, Theorem 2], reproduced below, will be very useful here.

Theorem II.1 (Prox-gradient for regularized least squares). Consider the least squares objective

p(z) := \tfrac{1}{2}\|Az - a\|^2 + \Phi(z),

with p bounded below, and Φ potentially nonsmooth, nonconvex, and non-finite valued. With step α = 1/σ_max(A^T A), the iterates (6) satisfy

\min_{k=1,\dots,N} \|v_k\|^2 \le \frac{2\|A\|_2^2}{N}\big(p(z^0) - \inf p\big),

where v_{k+1} = \|A\|_2^2 \big( 2I - A^T A/\|A\|_2^2 \big)(z^k - z^{k+1}) is a subgradient (generalized gradient) of p at z^{k+1}.

We can specialize Theorem II.1 to our case by computing the norm of the least squares system in (5).

Corollary II.2 (Rate for Algorithm 1). Theorem II.1 applied to problem (4) gives

\min_{k=1,\dots,N} \|v_k\|^2 \le \frac{2\,C(\eta_1,\eta_2,C,A)}{N}\big(p(z^0) - \inf p\big), \quad \text{with} \quad C(\eta_1,\eta_2,C,A) = \tfrac{1}{\eta_1}\big(c + \|C\|_F^2\big) + \tfrac{1}{\eta_2}\big(d + \|A\|_F^2\big).
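As a sanity check on the relaxation, the sketch below runs plain prox-gradient updates on (4) for a small random instance with C = I, φ = ||·||_1, and ψ = ||·||_2, using the conservative step from the Frobenius-norm bound in Corollary II.2. The problem sizes, parameter values, and helper names are illustrative choices of ours, not settings from the paper's experiments, and η_1, η_2 are kept fixed rather than driven to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 10                          # illustrative sizes (ours)
A = rng.standard_normal((d, n))
C = np.eye(n)                          # transform operator; identity in this sketch
b = A @ rng.standard_normal(n)
eta1, eta2, sigma = 0.5, 0.5, 0.1

prox_l1 = lambda y, a: np.sign(y) * np.maximum(np.abs(y) - a, 0.0)
proj_l2 = lambda y, s: y if np.linalg.norm(y) <= s else (s / np.linalg.norm(y)) * y

def objective(x, w1, w2):
    # relaxed objective of (4) with phi = ||.||_1
    return (np.abs(w1).sum()
            + np.sum((C @ x - w1) ** 2) / (2 * eta1)
            + np.sum((w2 - A @ x + b) ** 2) / (2 * eta2))

# conservative step from the Frobenius-norm bound of Corollary II.2
alpha = 1.0 / ((n + np.linalg.norm(C, 'fro') ** 2) / eta1
               + (d + np.linalg.norm(A, 'fro') ** 2) / eta2)

x, w1, w2 = np.zeros(n), np.zeros(n), np.zeros(d)
print("initial objective:", objective(x, w1, w2))
for k in range(2000):
    grad_x = C.T @ (C @ x - w1) / eta1 + A.T @ (A @ x - w2 - b) / eta2
    x = x - alpha * grad_x
    w1 = prox_l1(w1 - (alpha / eta1) * (w1 - C @ x), alpha)
    w2 = proj_l2(w2 - (alpha / eta2) * (w2 - A @ x + b), sigma)
print("final objective:  ", objective(x, w1, w2))   # decreases monotonically
```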

Problem (4) also admits a different optimization strategy, summarized in Algorithm 2. We can formally minimize the objective in x directly via the gradient, with the minimizer given by

\bar{x}(w) = H^{-1}\big( \tfrac{1}{\eta_1} C^T w_1 + \tfrac{1}{\eta_2} A^T (w_2 + b) \big), \qquad H = \tfrac{1}{\eta_1} C^T C + \tfrac{1}{\eta_2} A^T A,

with w = (w_1, w_2). Plugging this expression back into (4) gives a regularized least squares problem in w alone:

p(w) := \varphi(w_1) + \tfrac{1}{2}\|Fw - \bar{b}\|^2 \quad \text{s.t.} \quad \psi(w_2) \le \sigma, \qquad (7)

where

F = \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}}\big(\tfrac{1}{\eta_1} C H^{-1} C^T - I\big) & \tfrac{1}{\sqrt{\eta_1}\,\eta_2} C H^{-1} A^T \\[2pt] \tfrac{1}{\sqrt{\eta_2}\,\eta_1} A H^{-1} C^T & \tfrac{1}{\sqrt{\eta_2}}\big(\tfrac{1}{\eta_2} A H^{-1} A^T - I\big) \end{bmatrix}, \qquad \bar{b} = -\begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}\,\eta_2} C H^{-1} A^T \\[2pt] \tfrac{1}{\sqrt{\eta_2}}\big(\tfrac{1}{\eta_2} A H^{-1} A^T - I\big) \end{bmatrix} b.

Algorithm 3 Block-coordinate descent for (4).
1: Input: x^0, w_1^0, w_2^0
2: Initialize: k = 0
3: Define: H = (1/η_1) C^T C + (1/η_2) A^T A
4: while not converged do
5:   x^{k+1} ← H^{−1} ( (1/η_1) C^T w_1^k + (1/η_2) A^T (b + w_2^k) )
6:   w_1^{k+1} ← prox_{η_1 φ} ( C x^{k+1} )
7:   w_2^{k+1} ← proj_{σ B_ψ} ( A x^{k+1} − b )
8:   k ← k + 1
9: end while
10: Output: w_1^k, w_2^k, x^k

Prox-gradient applied to the value function p(w) in (7) with step β gives the iteration

w^{k+1} = \operatorname{prox}_{\beta\Phi}\big( w^k - \beta F^T (F w^k - \bar{b}) \big). \qquad (8)

This iteration, as formally written, requires forming and applying the system F in (7) at each iteration. In practice we compute the w update on the fly, as detailed in Algorithm 2. The equivalence of Algorithm 2 to iteration (8) comes from the following derivative formula for value functions [5]:

F^T (Fw - \bar{b}) = \begin{bmatrix} \tfrac{1}{\eta_1}\big(w_1 - C\bar{x}(w)\big) \\[2pt] \tfrac{1}{\eta_2}\big(w_2 - A\bar{x}(w) + b\big) \end{bmatrix}.

In order to compute β, and apply Theorem II.1, we first prove the following lemma.

Lemma II.3 (Bound on ||F^T F||_2). The operator norm ||F^T F||_2 is bounded above by max(1/η_1, 1/η_2).

Proof. Consider the function

Q(w) := \tfrac{1}{2}\|Fw - \bar{b}\|^2 = \min_x \; \tfrac{1}{2\eta_1}\|Cx - w_1\|^2 + \tfrac{1}{2\eta_2}\|w_2 - Ax + b\|_2^2.

Its gradient is F^T(Fw − \bar{b}), and any Lipschitz bound L for this gradient gives ||F^T F||_2 ≤ L. On the other hand, we can write Q(w) = q(Dw), where

q(z) = \min_x \tfrac{1}{2}\Big\| z - \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}} Cx \\ \tfrac{1}{\sqrt{\eta_2}}(Ax - b) \end{bmatrix} \Big\|^2, \qquad D = \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}} I & 0 \\ 0 & \tfrac{1}{\sqrt{\eta_2}} I \end{bmatrix}.

Using Theorem 1 of [25] with g(x) = 0, the value function q is differentiable with lip(∇q) ≤ 1. Therefore Q(w) = q(Dw) is also differentiable, with

\operatorname{lip}(\nabla Q) \le \|D^T D\|_2 = \max\big(\tfrac{1}{\eta_1}, \tfrac{1}{\eta_2}\big).

This immediately gives the result.

Now we can combine iteration (8) with Theorem II.1 to get a rate of convergence for Algorithm 2.

Corollary II.4 (Convergence of Algorithm 2). When β satisfies β ≤ min(η_1, η_2), the iterates of Algorithm 2 satisfy

\min_{k=1,\dots,N} \|v_k\|^2 \le \frac{2\max\big(\tfrac{1}{\eta_1},\tfrac{1}{\eta_2}\big)}{N}\big(p(w^0) - \inf p\big),

where v_k is in the subdifferential (generalized gradient) of objective (7) at w^k. Moreover, if η_1 = η_2, then Algorithm 2 is equivalent to block-coordinate descent, as detailed in Algorithm 3.

Proof. The convergence statement comes directly from plugging the estimate of iteration (8) into Theorem II.1. The equivalence of Algorithm 3 with Algorithm 2 is obtained by plugging the step size β = η_1 = η_2 into each line of Algorithm 2.

An important consequence of Corollary II.4 is that the convergence rate of Algorithm 2 does not depend on C or A, in contrast to Algorithm 1, whose rate depends on both matrices (Corollary II.2). The rates of both algorithms are affected by η_1, η_2. We use continuation in η, driving η_1, η_2 to 0 at the same rate, and warm-starting each problem at the previous solution. A convergence theory that takes this continuation into account is left to future work.
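The following minimal sketch runs Algorithm-3-style block-coordinate updates on a small synthetic BPDN instance with C = I, φ = ||·||_1, and ψ = ||·||_2 (σ = 0 for a noiseless toy problem). Sizes and parameter values are illustrative choices of ours, and η_1, η_2 are kept fixed rather than driven to zero by continuation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 80                                   # illustrative sizes
A = rng.standard_normal((d, n))
x_true = np.zeros(n)
support = rng.choice(n, 8, replace=False)
x_true[support] = rng.standard_normal(8)
b = A @ x_true                                   # noiseless data, so sigma = 0
eta1 = eta2 = 1e-3
sigma = 0.0

soft = lambda y, t: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
proj_l2 = lambda y, s: y if np.linalg.norm(y) <= s else (s / np.linalg.norm(y)) * y

H = np.eye(n) / eta1 + A.T @ A / eta2            # C = I here
x, w1, w2 = np.zeros(n), np.zeros(n), np.zeros(d)
for k in range(500):
    x = np.linalg.solve(H, w1 / eta1 + A.T @ (b + w2) / eta2)   # line 5: exact solve
    w1 = soft(x, eta1)                           # line 6: prox_{eta1*||.||_1}(C x)
    w2 = proj_l2(A @ x - b, sigma)               # line 7: projection onto the sigma-ball
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))      # relative error vs. the true spike train
```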

TABLE I: SNR values against the true signal for different l_p norms, for BPDN with a random linear operator (rows: l2 with SPGL1, and l2, l1, l∞, l0 with Algorithm 3).

A. Inexact Least-Squares Solves

Algorithm 3 has a provably faster rate of convergence than Algorithm 1. The practical performance of these algorithms is compared in Figure 1, which solves a problem with an l1-norm regularizer and an l1-norm BPDN constraint, with α = 1/||A||_F^2, C = I, and η_1 = η_2 fixed at a small value. We see a huge performance difference in practice as well as in theory: the proximal-gradient descent of Algorithm 1 yields a much slower cost-function decay than solving exactly for x as in Algorithm 3. Indeed, Algorithm 3 admits the fastest cost-function decay, as shown in Corollary II.4, albeit at the expense of more work per iteration, since fully solving the least squares problem in line 5 is not tractable for large-scale problems. Hence, we implement Algorithm 3 inexactly, using the conjugate gradient (CG) method. Figure 1 shows the results for several fixed numbers of CG iterations per solve. Each CG iteration is implemented using matrix-vector products, and with enough CG iterations per solve the results are indistinguishable from those of Algorithm 3 with full solves; even at 5 CG iterations, the performance is remarkably close. Algorithm 3 also has a natural warm-start strategy, with the x from each previous iteration used to initialize the subsequent CG solve. Using CG with a bounded number of iterations gives fast convergence and saves computational time. This approach is used in the subsequent experiments.

Fig. 1. Objective function decay for Equation (4) with proximal-gradient descent (Algorithm 1), direct solves (Algorithm 3), and several intermediate variants of Algorithm 2 in which the system involving H is only partially solved.

III. APPLICATION TO BASIS PURSUIT DENOISE MODELS

The basis pursuit denoise problem can be formulated as

\min_x \|x\|_1 \quad \text{s.t.} \quad \rho(Ax - b) \le \sigma, \qquad (9)

where ρ is classically taken to be the l2 norm. In this problem, x represents unknown coefficients that are sparse in a transform domain, while A is a composition of the observation operator with a transform matrix; popular examples of transform domains include discrete cosine transforms, wavelets, and curvelets. The observed and noisy data b resides in the temporal/spatial domain, and σ is the misfit tolerance. This problem was famously solved with the SPGL1 algorithm [23] for ρ = ||·||_2. When the observed data is affected by large sparse noise, a smooth constraint is ineffective. A nonsmooth variant of (9) is very difficult for approaches such as SPGL1, which solves subproblems of the form

\min_x \rho(Ax - b) \quad \text{s.t.} \quad \|x\|_1 \le \tau.

However, the proposed Algorithm 2 is easily adaptable to different norms. We apply Algorithm 3 with φ = ||·||_1, taking η_1, η_2 ↓ 0, so that w_1 → x and w_2 → Ax − b. We can take many different ψ, including the l2, l1, l∞, and l0 norms.

Algorithm 3 is simple to implement. The least squares update in step 5 can be computed efficiently either by factorization with the Woodbury identity, or by an iterative method when A is too large to store. For the Woodbury approach, with C = I we have

\Big(\tfrac{1}{\eta_1} I + \tfrac{1}{\eta_2} A^T A\Big)^{-1} = \eta_1 \Big( I - \tfrac{\eta_1}{\eta_2} A^T \big(I + \tfrac{\eta_1}{\eta_2} A A^T\big)^{-1} A \Big). \qquad (10)

For moderately sized systems, we can store a Cholesky factor L L^T = I + (η_1/η_2) A A^T, whose dimension is that of the observation space, and use triangular solves with L to implement step 5 via (10). However, in the seismic/curvelet experiment described below, the left-hand side of (10) is too large to store in memory, but is positive definite. Hence, we solve the resulting linear system in step 5 of Algorithm 3 with CG, using matrix-vector products.
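As a quick numerical check of the Woodbury route, the sketch below compares a direct solve with H (for C = I) against the identity (10), which only requires solving a much smaller observation-space system. The sizes and parameter values are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 50, 400                                   # few observations, many unknowns
A = rng.standard_normal((d, n))
eta1, eta2 = 1e-3, 1e-3
rhs = rng.standard_normal(n)

# Direct solve with the (large) n x n matrix H = (1/eta1) I + (1/eta2) A^T A
H = np.eye(n) / eta1 + A.T @ A / eta2
x_direct = np.linalg.solve(H, rhs)

# Woodbury route (10): only the small d x d system I + (eta1/eta2) A A^T is solved
S = np.eye(d) + (eta1 / eta2) * (A @ A.T)
x_wood = eta1 * (rhs - (eta1 / eta2) * (A.T @ np.linalg.solve(S, A @ rhs)))
print(np.max(np.abs(x_direct - x_wood)))         # agreement up to round-off
```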
The w_1 update is implemented via the l1 proximal operator (soft thresholding), while the w_2 update requires a projection onto an l_p ball. The projectors used in our experiments are collected in Table II. The least squares solve for x is simple when C is an orthogonal matrix or a tight frame, so that C^T C = I; this is the case for Fourier transforms, wavelets, and curvelets. When A is a restriction operator, as in many data interpolation problems, A^T A is a diagonal matrix of zeros and ones, and hence H = (1/η_1) C^T C + (1/η_2) A^T A is a diagonal matrix with entries either 1/η_1 or 1/η_1 + 1/η_2; the least squares problem for the x update is then trivial.

IV. BASIS PURSUIT DENOISE EXPERIMENTS

In this application, we consider two examples: the first is a small-scale BPDN problem that illustrates the proof of concept of our technique, while the second is an application to denoising a common source gather extracted from a seismic line simulated on a 2D BG Compass model. The data set is sampled with a 4 ms temporal interval and uniform spatial sampling. For this example, we use curvelets as a sparsifying transform domain.

TABLE II: Projectors onto l_p balls of radius τ (for l0, τ is a cardinality bound).

Norm | Projection proj_{τ B_{l_p}}(z) | Solution
l2   | z if ||z||_2 ≤ τ; τ z / ||z||_2 otherwise | analytic
l∞   | entrywise: sign(z_i) min(|z_i|, τ) | analytic
l1   | see, e.g., [22] | O(n ln n) routine
l0   | z_i if i is among the τ largest-magnitude indices; 0 otherwise | analytic

Fig. 2. Residuals for the different l_p norms after algorithm termination. Note how the l1 and l0 norms capture only the outliers.

Fig. 3. Basis pursuit denoising results for a randomly generated linear model with large, sparse noise (true signal and the l2, l1, l∞, and l0 reconstructions).

The first example considers the same model as in (9), where we want to enforce sparsity on x while constraining the data misfit. The variable x is a spike train of length n, with values ±1 on a random 4% of its entries and zeros everywhere else; we observe x through a linear operator A ∈ R^{m×n} (m < n) with independent standard Gaussian entries, and b ∈ R^m is the observed data, contaminated with large, sparse noise. The noise is generated by placing large values on a small percentage of the observations and leaving everything else observed cleanly (i.e., no noise). Here, we test the efficacy of different l_p norms on the residual constraint. With the addition of large, sparse noise to the data, smooth norms on the residual constraint should not be able to deal effectively with such outlier residuals. With our adaptable formulation, it is easy to enforce sparsity both in the domain of x and in the residuals.

TABLE III: Curvelet interpolation and denoising results (SNR, SNR-W, and time) for SPGL1 and Algorithm 4 with selected l_p norms for BPDN.

Other formulations, such as SPGL1, do not have this capability. The noise is depicted as the bottom black dashed line in Figure 2. The results are shown in Figure 3 and in Table I. From these, we can clearly see that the l2 norm is not effective for sparse noise, even at the correct error budget σ. Our approach is resilient to different types of noise, since we can easily change the residual ball projection. This is seen in the nearly exact accuracy of the l1 and l0 norms, with SNRs of 33 and 45, respectively.

The next test of the BPDN formulation is a common source gather in which entries are both omitted and corrupted with synthetic noise. Here, the objective function looks for sparsity in the curvelet domain, while the residual constraint seeks to match the observed data within a tolerance σ. First, we note that interpolation only, without added noise, yields essentially the same SNR for all formulations and algorithms, that is, for all l_p norms with Algorithm 4 and for SPGL1. Here, we again want to enforce sparsity both in the curvelet domain and in the data residual Ax − b, which SPGL1 and other algorithms lack the capacity to do. Following the first experiment, we add large sparse noise to a handful of data points; in this case, we add large values to a random percentage of the observations (not including the omitted entries). The added noise values are large relative to the observed data. The interpolation and denoising results are shown in Figure 4 and Table III. Large, sparse noise cannot be filtered effectively by a smooth norm constraint, using either Algorithm 4 or SPGL1. However, the l1 and l0 norms effectively handle such noise, and can be optimized using our approach. The SNRs for these implementations approach that of the noiseless interpolation mentioned above.

Fig. 4. Interpolation and denoising results for BPDN in the curvelet domain: true data, added noise (binary mask), noisy data with missing sources, and reconstructions with SPGL1 and with the l2, l1, l∞, and l0 constraints. Observe the complete inaccuracy of smooth norms in the presence of large, sparse noise.

V. EXTENSION TO LOW-RANK MODELS

Treating the data as having matrix structure gives additional regularization tools, in particular low-rank structure in appropriate domains. The BPDN formulation for residual-constrained low-rank interpolation is given by

\min_X \|X\|_* \quad \text{s.t.} \quad \rho(\mathcal{A}(X) - b) \le \sigma, \qquad (11)

for X ∈ C^{m×n}, where A : C^{n×m} → C^p is a linear masking operator from the full data to the observed noisy data b, and σ is the misfit tolerance. The nuclear norm ||X||_* is the l1 norm of the singular values of X. Solving this problem requires a decision variable that is the size of the data, as well as updates to this variable that require SVDs at each iteration. It is much more efficient to model X as a product of two matrices L and R:

\min_{L,R} \tfrac{1}{2}\big(\|L\|_F^2 + \|R\|_F^2\big) \quad \text{s.t.} \quad \rho(\mathcal{A}(LR^T) - b) \le \sigma, \qquad (12)

where L ∈ C^{n×k}, R ∈ C^{m×k}, and L R^T is the low-rank representation of the data. The solution is guaranteed to be at most rank k, and in addition the regularizer (1/2)(||L||_F^2 + ||R||_F^2) is an upper bound for ||L R^T||_*, the sum of singular values of L R^T, further penalizing rank by proxy. The decision variables then have combined dimension k(m + n), which is much smaller than the mn variables required by convex formulations.
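A short numerical check of the claim that the factorized regularizer bounds the nuclear norm, using a random L and R of our choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 60, 40, 5
L = rng.standard_normal((n, k))
R = rng.standard_normal((m, k))

nuclear = np.linalg.norm(L @ R.T, 'nuc')                      # ||L R^T||_*
bound = 0.5 * (np.linalg.norm(L, 'fro')**2 + np.linalg.norm(R, 'fro')**2)
print(nuclear <= bound, np.linalg.matrix_rank(L @ R.T) <= k)  # True True
```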
When ρ is smooth, such problems are solved using a continuation that interchanges the roles of the objective and the constraint, solving a sequence of problems in which ρ(A(LR^T) − b) is minimized over the l2 ball [3] using projected gradient, an approach we call SPGLR below.

When ρ is not smooth, SPGLR does not apply, and there are no available implementations for (12). Nonsmooth ρ arise when we want the residual to lie in an l1-norm ball, so that we are robust to outliers in the data and can exactly fit inliers. We now extend Algorithm 3 to this case. For any ρ, smooth or nonsmooth, we introduce a latent variable W for the data matrix and solve

\min_{L,R,W} \tfrac{1}{2}\big(\|L\|_F^2 + \|R\|_F^2\big) + \tfrac{1}{2\eta}\|W - LR^T\|_F^2 \quad \text{s.t.} \quad \|\mathcal{A}(W) - b\|_p \le \sigma, \qquad (13)

with η a parameter that controls the degree of relaxation; as η ↓ 0 we have W → L R^T. The relaxation allows a simple block-coordinate descent, detailed in Algorithm 4.

Algorithm 4 Block-coordinate descent for (13).
1: Input: W^0, L^0, R^0
2: Initialize: k = 0
3: while not converged do
4:   L_{k+1} ← (1/η) W_k R_k ( I + (1/η) R_k^T R_k )^{−1}
5:   R_{k+1} ← (1/η) W_k^T L_{k+1} ( I + (1/η) L_{k+1}^T L_{k+1} )^{−1}
6:   (W_{k+1})_{ij} ← (L_{k+1} R_{k+1}^T)_{ij} for (i,j) ∉ X_obs;  A(W_{k+1}) ← b + proj_{B_{ρ,σ}}( A(L_{k+1} R_{k+1}^T) − b ) otherwise
7:   k ← k + 1
8: end while
9: Output: W_k, L_k, R_k

Algorithm 4 is also simple to implement. It requires two least squares solves, for L and R, which are inherently parallelizable. It also requires a projection of the updated data-matrix estimate L R^T onto the σ level set of the misfit penalty ρ. This step is detailed below. For unobserved data (i, j) ∉ X_obs, we have W_{ij} = (L R^T)_{ij}. For observed data, let v denote A(L R^T). Then the W update is given by solving

\min_w \tfrac{1}{2}\|w - v\|_2^2 \quad \text{s.t.} \quad \|w - b\|_p \le \sigma.

Using the substitution z = w − b, we get

\min_z \tfrac{1}{2}\|z - (v - b)\|_2^2 \quad \text{s.t.} \quad \|z\|_p \le \sigma,

which is precisely the projection of A(L R^T) − b onto B_{p,σ}, the σ level set of ρ. We use the same projectors for ρ ∈ {l1, l∞, l2, l0} as in Section IV; see Table II.

The convergence criterion for Algorithm 4 is based on the optimality of the quadratic subproblems in L, R and on a feasibility measure for W − L R^T, though in practice we compare algorithms under a fixed computational budget. This block-coordinate descent scheme converges to a stationary point of Equation (13) by [21, Theorem 4.1]. Running the block-coordinate descent until convergence produces the completed low-rank matrix. Setting ν = ||L R^T − W||_2^2, we iterate until ν < 10^{-5} or a maximum number of iterations is reached. In the next section, we develop an application of this method to seismic interpolation and denoising.
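The sketch below runs Algorithm-4-style updates on a small dense synthetic completion problem, with ρ the l2 norm and A a simple entrywise sampling mask. The sizes, rank, and parameter values are illustrative choices of ours, not the settings of the seismic experiments.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, k, eta, sigma = 60, 60, 5, 1e-2, 1e-3          # illustrative sizes/parameters
X_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))   # rank-k data
mask = rng.random((n, m)) < 0.5                      # A = sampling of observed entries
b = X_true[mask]                                     # (noiseless) observations

def proj_l2(z, s):
    nz = np.linalg.norm(z)
    return z if nz <= s else (s / nz) * z

L = rng.standard_normal((n, k))
R = rng.standard_normal((m, k))
W = np.zeros((n, m)); W[mask] = b                    # initial data estimate
for it in range(200):
    L = W @ R @ np.linalg.inv(eta * np.eye(k) + R.T @ R)        # least squares in L
    R = W.T @ L @ np.linalg.inv(eta * np.eye(k) + L.T @ L)      # least squares in R
    W = L @ R.T                                      # unobserved entries: W = L R^T
    W[mask] = b + proj_l2(W[mask] - b, sigma)        # observed: project residual onto the sigma-ball
print(np.linalg.norm(W - X_true) / np.linalg.norm(X_true))      # relative recovery error
```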
The smooth l2 norm most commonly used in BPDN fails in such examples, and nonsmooth norms on the residuals lead to better data estimates. Thus, the goal of the experiments below is to show that enforcing sparsity in the singular values (i.e., low rank) together with a sparsity-promoting residual constraint handles large, sparse noise better than the smooth residual constraints solved by most contemporary algorithms.

A. Experiment Description

This example demonstrates the efficacy of the proposed approach using data generated from a 5D dataset simulated on the complex SEG/EAGE overthrust model [1]. The model is discretized on a 25 m × 25 m × 25 m grid. The simulated data contains a 2D grid of receivers and a 2D grid of sources. We apply the Fourier transform along the time axis and extract a monochromatic frequency slice, shown in Figure 7a, which is a 4D object (source-x, source-y, receiver-x, receiver-y). We eliminate 8% of the sources and add large sparse outliers drawn from a Gaussian distribution N(0, a_i max|s_i|), with mean zero and variance on the order of the largest value in that particular source. The generated values with the highest magnitudes are kept, and these are randomly added to observations in the remaining sources (Figure 7f). The largest value of the dataset is several orders of magnitude above the smallest, which is close to zero. Thus, we are essentially increasing or decreasing a percentage of the entries by several orders of magnitude, which contaminates the data significantly, especially where the original entry was nearly 0. We use the same scaling a_i for all low-rank completion and denoising experiments. The objective is to recover the missing sources and eliminate noise from the observed data.
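Returning to the matricization choice discussed above, the sketch below makes the two arrangements concrete on a small random stand-in for a 4D monochromatic slice; the dimensions are illustrative, not the survey geometry of the experiment. It also shows why missing sources appear as empty columns in the (source-x, source-y) matricization but only as scattered sub-blocks in the (source-x, receiver-x) matricization.

```python
import numpy as np

# Small random stand-in for a 4D monochromatic slice D[sx, sy, rx, ry]
nsx, nsy, nrx, nry = 10, 10, 12, 12
rng = np.random.default_rng(6)
D = rng.standard_normal((nsx, nsy, nrx, nry))

# (source-x, source-y) matricization: both source coordinates along the columns
M_src = D.transpose(2, 3, 0, 1).reshape(nrx * nry, nsx * nsy)

# (source-x, receiver-x) matricization: rows indexed by (sx, rx), columns by (sy, ry)
M_sxrx = D.transpose(0, 2, 1, 3).reshape(nsx * nrx, nsy * nry)

# Removing one source zeroes an entire column of M_src, but only a scattered
# sub-block pattern of M_sxrx -- the structure exploited for low-rank recovery.
D_sub = D.copy()
D_sub[3, 7, :, :] = 0.0
col = 3 * nsy + 7
M_src_sub = D_sub.transpose(2, 3, 0, 1).reshape(nrx * nry, nsx * nsy)
print(np.allclose(M_src_sub[:, col], 0.0))   # True: a whole column is gone
```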

Fig. 5. Normalized singular value decay for the full data and for 5% missing sources, under the (source-x, source-y) and (source-x, receiver-x) matricizations. Source: [3].

Fig. 6. Full and subsampled matricizations used in low-rank completion: (a) full and (b) subsampled (src-x, src-y); (c) full and (d) subsampled (src-x, rec-x). Source: [14].

TABLE IV: 4D denoising results (SNR, SNR-W, and time) for SPGLR and Algorithm 4 with selected l_p norms.

We use a rank of k = 75 for the formulation (that is, L ∈ C^{n×75} and similarly for R), and run all algorithms under a fixed computational budget. We perform three experiments on the same dataset: (1) denoising only (Figure 7c); (2) interpolation only (Figure 7d); and (3) combined interpolation and denoising (Figure 7f). Since we have ground truth, we pick σ to be the exact difference between the generated noisy data and the true data; for the l0 norm, σ is a cardinality measure, so it is set to the number of noisy points added.

B. Results

Tables IV–VI display SNR values for the different algorithms and formulations on the three types of experiments, and Figures 8–10 display the results for a randomly selected set of sources. Even a small number of outliers can greatly impact the quality of low-rank denoising and interpolation for the standard, smoothly residual-constrained algorithms. The denoising-only results (Figure 8, Table IV) show that all methods perform well when all sources are available. The interpolation-only results (Figure 9, Table V) show that all constraints perform well in interpolating the missing data. This makes sense, as all algorithms simply favor the low-rank nature of the data. However, the combined denoising and interpolation experiment shows that the l0-norm approach does far better than any smooth norm in comparable time. Table VI shows that when data for similar sources is absent or not observed, the smoothly constrained formulations fail completely. When noise is added to the low-amplitude sections of the observed data, the smoothly constrained norms fail drastically, while the l0 norm effectively removes the errors. This is starkly evident in Figures 10a–e, where all panels except Figure 10e are essentially noise; the result is supported by the SNR values in Table VI. While the reconstructions in Figures 10a–e mostly capture the structure of the data where there are large values (i.e., where the seismic wave is observed in the upper-left corner of each source), only the l0 norm captures the areas of lower-energy data.

TABLE V: 4D interpolation results (SNR, SNR-W, and time) for SPGLR and Algorithm 4 with selected l_p norms.

TABLE VI: 4D combined denoising and interpolation results (SNR, SNR-W, and time) for SPGLR and Algorithm 4 with selected l_p norms.

VII. CONCLUSIONS

We proposed a new approach for level-set formulations, including basis pursuit denoise and residual-constrained low-rank formulations. The approach is easily adapted to a variety of nonsmooth and nonconvex data constraints. The resulting problems are solved using Algorithms 2 and 4, which require only that the penalty ρ admit an efficient projector. The algorithms are simple, scalable, and efficient. Sparse curvelet denoising and low-rank interpolation of a monochromatic slice from a 4D seismic data volume demonstrate the potential of the approach. A particular feature of the seismic denoising and interpolation problem is that the signal amplitudes vary significantly in space, so errors in the data are a much larger problem for low-amplitude regions. This makes it very difficult to obtain reasonable results using Gaussian misfits and constraints. Nonsmooth exact formulations, including l1 and particularly l0, appear to be extremely well suited to this magnified heteroscedastic issue.

VIII. ACKNOWLEDGEMENTS

The authors acknowledge support from the Department of Energy Computational Science Graduate Fellowship, provided under grant number DE-FG02-97ER25308, and from the Washington Research Foundation Data Science Professorship.

REFERENCES

[1] F. Aminzadeh, N. Burkhard, L. Nicoletis, F. Rocca, and K. Wyatt. SEG/EAEG 3-D modeling project: 2nd update. The Leading Edge, 1994.
[2] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization. To appear in Mathematical Programming, Series B, 2018.
[3] A. Y. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann. Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation. SIAM Journal on Scientific Computing, 36, 2014.
[4] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2), 2010.
[5] B. M. Bell and J. V. Burke. Algorithmic differentiation of implicit functions and optimal values. In Advances in Automatic Differentiation, Springer, 2008.
[6] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Transactions on Information Theory, 52(12), 2006.
[7] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
[8] C. Da Silva and F. J. Herrmann. Optimization on the hierarchical Tucker manifold - applications to tensor completion. Linear Algebra and its Applications, 481:131–173, 2015.
[9] D. Davis and W. Yin. Convergence rate analysis of several splitting schemes. In Splitting Methods in Communication, Imaging, Science, and Engineering, Springer, 2016.
[10] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.
[11] F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10(6):1455–1480, 1998.
[12] F. J. Herrmann and G. Hennenfent. Non-parametric seismic data recovery with curvelet frames. Geophysical Journal International, 173(1):233–248, 2008.
[13] A. Kadu and R. Kumar.
Decentralized full-waveform inversion. Submitted to EAGE, January 2018.
[14] R. Kumar, C. Da Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann. Efficient matrix completion for seismic data reconstruction. Geophysics, 80(5), 2015.
[15] R. Kumar, O. López, D. Davis, A. Y. Aravkin, and F. J. Herrmann. Beating level-set methods for 5-D seismic data interpolation: A primal-dual alternating approach. IEEE Transactions on Computational Imaging, 3(2), June 2017.
[16] M. Lustig, D. Donoho, and J. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.
[17] L. Oneto, S. Ridella, and D. Anguita. Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Machine Learning, 103:103–136, 2016.
[18] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
[19] M. D. Sacchi, T. J. Ulrych, and C. J. Walker. Interpolation and extrapolation using a high-resolution discrete Fourier transform. IEEE Transactions on Signal Processing, 46(1):31–38, January 1998.
[20] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3):1030–1051, March 2006.
[21] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.
[22] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2):890–912, 2008.
[23] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, November 2008.
[24] A. Yurtsever, M. Udell, J. A. Tropp, and V. Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. ArXiv e-prints, February 2017.
[25] P. Zheng and A. Aravkin. Fast methods for nonsmooth nonconvex minimization. ArXiv e-prints, February 2018.
[26] P. Zheng, T. Askham, S. L. Brunton, J. N. Kutz, and A. Y. Aravkin. A unified framework for sparse relaxed regularized regression: SR3. ArXiv e-prints, July 2018.

Fig. 7. True data and the three experiments used to test our completion algorithm: (a) fully sampled monochromatic slice; (b) the added noise alone (binary mask), created by keeping the largest entries generated from a zero-mean normal distribution with variance on the order of max|s_i|; (c) observed noisy data; (d) subsampled noiseless data, with 8% of sources omitted; (e) the subsampling with noise locations shown (binary mask); (f) subsampled and noisy data, with 8% of sources omitted and the noise above added to the remaining sources.

Fig. 8. Denoising-only results: (a) SPGLR; (b) l2; (c)–(e) l1, l∞, and l0 with Algorithm 4.

Fig. 9. Interpolation-only results: (a) SPGLR; (b) l2; (c)–(e) l1, l∞, and l0 with Algorithm 4.

Fig. 10. Combined interpolation and denoising results: (a) SPGLR; (b) l2; (c)–(e) l1, l∞, and l0 with Algorithm 4.


More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Self-Calibration and Biconvex Compressive Sensing

Self-Calibration and Biconvex Compressive Sensing Self-Calibration and Biconvex Compressive Sensing Shuyang Ling Department of Mathematics, UC Davis July 12, 2017 Shuyang Ling (UC Davis) SIAM Annual Meeting, 2017, Pittsburgh July 12, 2017 1 / 22 Acknowledgements

More information

Sparse signals recovered by non-convex penalty in quasi-linear systems

Sparse signals recovered by non-convex penalty in quasi-linear systems Cui et al. Journal of Inequalities and Applications 018) 018:59 https://doi.org/10.1186/s13660-018-165-8 R E S E A R C H Open Access Sparse signals recovered by non-conve penalty in quasi-linear systems

More information

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Adrien Todeschini Inria Bordeaux JdS 2014, Rennes Aug. 2014 Joint work with François Caron (Univ. Oxford), Marie

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

Sparsity in Underdetermined Systems

Sparsity in Underdetermined Systems Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

Frequency-Domain Rank Reduction in Seismic Processing An Overview

Frequency-Domain Rank Reduction in Seismic Processing An Overview Frequency-Domain Rank Reduction in Seismic Processing An Overview Stewart Trickett Absolute Imaging Inc. Summary Over the last fifteen years, a new family of matrix rank reduction methods has been developed

More information

Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation

Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation UIUC CSL Mar. 24 Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation Yuejie Chi Department of ECE and BMI Ohio State University Joint work with Yuxin Chen (Stanford).

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

An Homotopy Algorithm for the Lasso with Online Observations

An Homotopy Algorithm for the Lasso with Online Observations An Homotopy Algorithm for the Lasso with Online Observations Pierre J. Garrigues Department of EECS Redwood Center for Theoretical Neuroscience University of California Berkeley, CA 94720 garrigue@eecs.berkeley.edu

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation David L. Donoho Department of Statistics Arian Maleki Department of Electrical Engineering Andrea Montanari Department of

More information

Recent developments on sparse representation

Recent developments on sparse representation Recent developments on sparse representation Zeng Tieyong Department of Mathematics, Hong Kong Baptist University Email: zeng@hkbu.edu.hk Hong Kong Baptist University Dec. 8, 2008 First Previous Next Last

More information

Equivalence of Minimal l 0 and l p Norm Solutions of Linear Equalities, Inequalities and Linear Programs for Sufficiently Small p

Equivalence of Minimal l 0 and l p Norm Solutions of Linear Equalities, Inequalities and Linear Programs for Sufficiently Small p Equivalence of Minimal l 0 and l p Norm Solutions of Linear Equalities, Inequalities and Linear Programs for Sufficiently Small p G. M. FUNG glenn.fung@siemens.com R&D Clinical Systems Siemens Medical

More information

Recovering overcomplete sparse representations from structured sensing

Recovering overcomplete sparse representations from structured sensing Recovering overcomplete sparse representations from structured sensing Deanna Needell Claremont McKenna College Feb. 2015 Support: Alfred P. Sloan Foundation and NSF CAREER #1348721. Joint work with Felix

More information

Compressive Sensing and Beyond

Compressive Sensing and Beyond Compressive Sensing and Beyond Sohail Bahmani Gerorgia Tech. Signal Processing Compressed Sensing Signal Models Classics: bandlimited The Sampling Theorem Any signal with bandwidth B can be recovered

More information

Accelerated Gradient Method for Multi-Task Sparse Learning Problem

Accelerated Gradient Method for Multi-Task Sparse Learning Problem Accelerated Gradient Method for Multi-Task Sparse Learning Problem Xi Chen eike Pan James T. Kwok Jaime G. Carbonell School of Computer Science, Carnegie Mellon University Pittsburgh, U.S.A {xichen, jgc}@cs.cmu.edu

More information

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS Martin Kleinsteuber and Simon Hawe Department of Electrical Engineering and Information Technology, Technische Universität München, München, Arcistraße

More information

Overview. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Overview. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Overview Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 1/25/2016 Sparsity Denoising Regression Inverse problems Low-rank models Matrix completion

More information

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines vs for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines Ding Ma Michael Saunders Working paper, January 5 Introduction In machine learning,

More information

Optimisation Combinatoire et Convexe.

Optimisation Combinatoire et Convexe. Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix

More information

Tractable Upper Bounds on the Restricted Isometry Constant

Tractable Upper Bounds on the Restricted Isometry Constant Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.

More information

Adaptive one-bit matrix completion

Adaptive one-bit matrix completion Adaptive one-bit matrix completion Joseph Salmon Télécom Paristech, Institut Mines-Télécom Joint work with Jean Lafond (Télécom Paristech) Olga Klopp (Crest / MODAL X, Université Paris Ouest) Éric Moulines

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

Compressed Sensing: a Subgradient Descent Method for Missing Data Problems

Compressed Sensing: a Subgradient Descent Method for Missing Data Problems Compressed Sensing: a Subgradient Descent Method for Missing Data Problems ANZIAM, Jan 30 Feb 3, 2011 Jonathan M. Borwein Jointly with D. Russell Luke, University of Goettingen FRSC FAAAS FBAS FAA Director,

More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Yin Zhang Technical Report TR05-06 Department of Computational and Applied Mathematics Rice University,

More information

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng ABSTRACT Recovering Data with Group Sparsity by Alternating Direction Methods by Wei Deng Group sparsity reveals underlying sparsity patterns and contains rich structural information in data. Hence, exploiting

More information