Basis Pursuit Denoise with Nonsmooth Constraints


Basis Pursuit Denoise with Nonsmooth Constraints

Robert Baraldi (1), Rajiv Kumar (2), and Aleksandr Aravkin (1). (1) Department of Applied Mathematics, University of Washington. (2) Formerly School of Earth and Atmospheric Sciences, Georgia Institute of Technology, USA; currently DownUnder GeoSolutions, Perth, Australia.

Abstract — Level-set optimization formulations with data-driven constraints minimize a regularization functional subject to matching observations to a given error level. These formulations are widely used, particularly for matrix completion and sparsity promotion in data interpolation and denoising. The misfit level is typically measured in the l2 norm or another smooth metric. In this paper, we present a new flexible algorithmic framework that targets nonsmooth level-set constraints, including l1, l∞, and even l0 norms. These constraints give greater flexibility for modeling deviations in observation and denoising, and have a significant impact on the solution. Measuring error in the l1 and l0 norms makes the result more robust to large outliers, while matching many observations exactly. We demonstrate the approach for basis pursuit denoise (BPDN) problems as well as for extensions of BPDN to matrix factorization, with applications to interpolation and denoising of 5D seismic data. The new methods are particularly promising for seismic applications, where the amplitude in the data varies significantly, and measurement noise in low-amplitude regions can wreak havoc for standard Gaussian error models.

Index Terms — Nonconvex nonsmooth optimization, level-set formulations, basis pursuit denoise, interpolation, seismic data.

I. INTRODUCTION

Basis pursuit denoise (BPDN) seeks a sparse solution to an under-determined system of equations that has been corrupted by noise. The classic level-set formulation [22], [2] is given by

\min_x \|x\|_1 \quad \text{s.t.} \quad \|Ax - b\|_2 \le \sigma, \qquad (1)

where A : R^{m×n} → R^d is a linear functional taking unknown parameters x ∈ R^{m×n} to observations b ∈ R^d. Problem (1) is also known as a Morozov formulation, in contrast to Ivanov or Tikhonov formulations [17]. The functional A can include a transformation to another domain, such as wavelet, Fourier, or curvelet coefficients [7], as well as compositions of these transforms with other linear operators, such as restriction in interpolation problems. The parameter σ controls the error budget and is based on an estimate of the noise level in the data. Theoretical recovery guarantees for classes of operators A are developed in [6] and [20].

BPDN and the closely related LASSO formulation have applications to compressed sensing [8], [6] and machine learning [10], [11], as well as to applied domains including MRI [16]. Seismic data is a key use case [3], [15], [19], where acquisition is prohibitively expensive and interpolation techniques are used to fill in data volumes by promoting parsimonious representations in the Fourier [19] or curvelet [12] domains. Matricization of the data leads to low-rank interpolation schemes [3], [15], [19], [24].

While BPDN uses nonsmooth regularizers, including the l1 norm, nuclear norm, and elastic net, the inequality constraint is ubiquitously smooth, and is often taken to be the l2 norm as in (1). Prior work, including [23], [3], [9], [2], exploits the smoothness of the inequality constraint in developing algorithms for this problem class. Smooth constraints work well when errors are Gaussian, but this assumption fails for seismic data and is often violated in general.

Contributions.
The main contribution of this paper is a fast, easily adaptable algorithm for level-set formulations, including BPDN, with nonsmooth and nonconvex data constraints, together with an illustration of the efficacy of the approach on large-scale interpolation and denoising problems. To do this, we extend the universal regularization framework of [26] to level-set formulations with nonsmooth/nonconvex constraints. We develop a convergence theory for the optimization approach, and illustrate the practical performance of the new formulations for data interpolation and denoising in both sparse recovery and low-rank matrix factorization.

Roadmap. The paper proceeds as follows. Section II develops the general relaxation framework and approach. Section III specializes this framework to the BPDN setting with nonsmooth, nonconvex constraints. In Section IV we apply the approach to a sparse signal recovery problem and to sparse curvelet reconstruction. In Section V, we extend the approach to a low-rank interpolation framework, which embeds matrix factorization within the BPDN constraint. In Section VI we test the low-rank extension using synthetic examples and data extracted from a full 5D dataset simulated on the complex SEG/EAGE overthrust model.

II. NONSMOOTH, NONCONVEX LEVEL-SET FORMULATIONS

We consider the following problem class:

\min_x \varphi(Cx) \quad \text{s.t.} \quad \psi(Ax - b) \le \sigma, \qquad (2)

where φ and ψ may be nonsmooth and nonconvex, but have well-defined proximity and projection operators:

\operatorname{prox}_{\alpha\varphi}(y) = \arg\min_x \tfrac{1}{2\alpha}\|x - y\|^2 + \varphi(x), \qquad \operatorname{proj}_{\psi \le \sigma}(y) = \arg\min_{x:\,\psi(x)\le\sigma} \tfrac{1}{2}\|x - y\|^2. \qquad (3)

Here, C : \mathbb{C}^{m\times n} \to \mathbb{R}^{c} is typically a linear operator that converts x to some transform domain, while A : \mathbb{C}^{m\times n} \to \mathbb{R}^{d} is a linear observation operator, also acting on x. In the context of interpolation, A is often a restriction operator.
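The only structure (2) asks of φ and ψ is that the two operators in (3) be cheap to evaluate. As a concrete illustration, the minimal NumPy sketch below implements the proximity operator of the l1 norm and projections onto l2-, l∞-, and l0-norm balls, the building blocks used throughout the paper. The function names and the small test values are ours; for the l0 "ball," the projection simply keeps the τ largest-magnitude entries.

```python
import numpy as np

def prox_l1(y, alpha):
    """Proximal operator of alpha*||.||_1 (soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

def proj_l2_ball(y, sigma):
    """Projection onto {z : ||z||_2 <= sigma}."""
    nrm = np.linalg.norm(y)
    return y if nrm <= sigma else (sigma / nrm) * y

def proj_linf_ball(y, sigma):
    """Projection onto {z : ||z||_inf <= sigma} (entrywise clipping)."""
    return np.clip(y, -sigma, sigma)

def proj_l0_ball(y, tau):
    """Projection onto {z : ||z||_0 <= tau}: keep the tau largest-magnitude entries."""
    z = np.zeros_like(y)
    if tau >= 1:
        keep = np.argsort(np.abs(y))[-int(tau):]
        z[keep] = y[keep]
    return z

print(prox_l1(np.array([-2.0, 0.3, 1.5]), 1.0))       # [-1.   0.   0.5]
print(proj_linf_ball(np.array([3.0, -0.2]), 1.0))     # [ 1.  -0.2]
print(proj_l0_ball(np.array([3.0, -0.2, 0.5]), 1))    # [ 3.   0.   0. ]
```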

Algorithm 1 Prox-gradient for (4).
1: Input: x^0, w_1^0, w_2^0
2: Initialize: k = 0
3: while not converged do
4:   x^{k+1} ← x^k − α [ (1/η_1) C^T (C x^k − w_1^k) + (1/η_2) A^T (A x^k − w_2^k − b) ]
5:   w_1^{k+1} ← prox_{αφ} ( w_1^k − (α/η_1)(w_1^k − C x^{k+1}) )
6:   w_2^{k+1} ← proj_{σ B_ψ} ( w_2^k − (α/η_2)(w_2^k − A x^{k+1} + b) )
7:   k ← k + 1
8: end while
9: Output: w_1^k, w_2^k, x^k

This setting significantly extends that of [2], who assume ψ and φ are convex, take C = I, and use the value function

v(τ) = \min_x \psi(Ax - b) \quad \text{s.t.} \quad \varphi(x) \le \tau

to solve (2) by root-finding on v(τ) = σ. Variational properties of v are fully understood only in the convex setting, and efficient evaluation of v(τ) requires ψ to be smooth, so that efficient first-order methods are applicable. Here, we develop an approach to solve any problem of type (2), including problems with nonsmooth and nonconvex ψ, φ, using only matrix-vector products with A, A^T, C, C^T and simple nonlinear operators. In special cases, the approach can also use equation solves to gain significant speedup.

The general approach uses the relaxation formulation proposed in [26], [25]. We use relaxation to split φ, ψ from the linear map A and transformation map C, extending (2) to

\min_{x, w_1, w_2} \varphi(w_1) + \tfrac{1}{2\eta_1}\|Cx - w_1\|^2 + \tfrac{1}{2\eta_2}\|w_2 - Ax + b\|_2^2 \quad \text{s.t.} \quad \psi(w_2) \le \sigma, \qquad (4)

with w_1 ∈ R^c and w_2 ∈ R^d. In contrast to [26], we use a continuation scheme that forces η_i ↓ 0, in order to solve the original formulation (2). Thus the only external algorithmic parameter the scheme requires is σ, which controls the error budget for ψ.

There are two algorithms readily available to solve (4). The first is prox-gradient descent, detailed in Algorithm 1. We let z = (x, w_1, w_2) and define Φ(z) = φ(w_1) + δ_{ψ ≤ σ}(w_2), where the indicator function δ_{ψ ≤ σ} takes the value 0 if ψ(w_2) ≤ σ, and infinity otherwise. Problem (4) can now be written as

\min_z \underbrace{\tfrac{1}{2}\Big\| \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}} C & -\tfrac{1}{\sqrt{\eta_1}} I & 0 \\ \tfrac{1}{\sqrt{\eta_2}} A & 0 & -\tfrac{1}{\sqrt{\eta_2}} I \end{bmatrix} z - \begin{bmatrix} 0 \\ \tfrac{1}{\sqrt{\eta_2}} b \end{bmatrix} \Big\|^2}_{f(z)} + \Phi(z). \qquad (5)

Applying the prox-gradient descent iteration with step size α,

z^{k+1} = \operatorname{prox}_{\alpha\Phi}\big( z^k - \alpha \nabla f(z^k) \big), \qquad (6)

gives the coordinate updates in Algorithm 1.

Algorithm 2 Value-function optimization for (4).
1: Input: x^0, w_1^0, w_2^0
2: Initialize: k = 0
3: Define: H = (1/η_1) C^T C + (1/η_2) A^T A
4: while not converged do
5:   x^{k+1} ← H^{−1} ( (1/η_1) C^T w_1^k + (1/η_2) A^T (b + w_2^k) )
6:   w_1^{k+1} ← prox_{βφ} ( w_1^k − (β/η_1)(w_1^k − C x^{k+1}) )
7:   w_2^{k+1} ← proj_{σ B_ψ} ( w_2^k − (β/η_2)(w_2^k − A x^{k+1} + b) )
8:   k ← k + 1
9: end while
10: Output: w_1^k, w_2^k, x^k

Prox-gradient has been analyzed in the general nonconvex setting by [4]. However, Problem (5) is the sum of a convex quadratic and a nonconvex regularizer. The rate of convergence for this problem class can be quantified, and [26, Theorem 2], reproduced below, will be very useful here.

Theorem II.1 (Prox-gradient for regularized least squares). Consider the least squares objective

p(z) := \tfrac{1}{2}\|Az - a\|^2 + \Phi(z),

with p bounded below, and Φ potentially nonsmooth, nonconvex, and non-finite valued. With step α = 1/σ_max(A^T A), the iterates (6) satisfy

\min_{k=1,\dots,N} \|v_k\|^2 \le \frac{2\|A\|_2^2}{N}\big(p(z^0) - \inf p\big),

where v_{k+1} = \|A\|_2^2 \big( 2I - A^T A/\|A\|_2^2 \big)(z^k - z^{k+1}) is a subgradient (generalized gradient) of p at z^{k+1}.

We can specialize Theorem II.1 to our case by computing the norm of the least squares system in (5).

Corollary II.2 (Rate for Algorithm 1). Theorem II.1 applied to problem (4) gives

\min_{k=1,\dots,N} \|v_k\|^2 \le \frac{2\,C(\eta_1,\eta_2,C,A)}{N}\big(p(z^0) - \inf p\big), \quad \text{with} \quad C(\eta_1,\eta_2,C,A) = \tfrac{1}{\eta_1}\big(c + \|C\|_F^2\big) + \tfrac{1}{\eta_2}\big(d + \|A\|_F^2\big).
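As a sanity check on the relaxation, the sketch below runs plain prox-gradient updates on (4) for a small random instance with C = I, φ = ||·||_1, and ψ = ||·||_2, using the conservative step from the Frobenius-norm bound in Corollary II.2. The problem sizes, parameter values, and helper names are illustrative choices of ours, not settings from the paper's experiments, and η_1, η_2 are kept fixed rather than driven to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 10                          # illustrative sizes (ours)
A = rng.standard_normal((d, n))
C = np.eye(n)                          # transform operator; identity in this sketch
b = A @ rng.standard_normal(n)
eta1, eta2, sigma = 0.5, 0.5, 0.1

prox_l1 = lambda y, a: np.sign(y) * np.maximum(np.abs(y) - a, 0.0)
proj_l2 = lambda y, s: y if np.linalg.norm(y) <= s else (s / np.linalg.norm(y)) * y

def objective(x, w1, w2):
    # relaxed objective of (4) with phi = ||.||_1
    return (np.abs(w1).sum()
            + np.sum((C @ x - w1) ** 2) / (2 * eta1)
            + np.sum((w2 - A @ x + b) ** 2) / (2 * eta2))

# conservative step from the Frobenius-norm bound of Corollary II.2
alpha = 1.0 / ((n + np.linalg.norm(C, 'fro') ** 2) / eta1
               + (d + np.linalg.norm(A, 'fro') ** 2) / eta2)

x, w1, w2 = np.zeros(n), np.zeros(n), np.zeros(d)
print("initial objective:", objective(x, w1, w2))
for k in range(2000):
    grad_x = C.T @ (C @ x - w1) / eta1 + A.T @ (A @ x - w2 - b) / eta2
    x = x - alpha * grad_x
    w1 = prox_l1(w1 - (alpha / eta1) * (w1 - C @ x), alpha)
    w2 = proj_l2(w2 - (alpha / eta2) * (w2 - A @ x + b), sigma)
print("final objective:  ", objective(x, w1, w2))   # decreases monotonically
```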

Problem (4) also admits a different optimization strategy, summarized in Algorithm 2. We can formally minimize the objective in x directly via the gradient, with the minimizer given by

\bar{x}(w) = H^{-1}\big( \tfrac{1}{\eta_1} C^T w_1 + \tfrac{1}{\eta_2} A^T (w_2 + b) \big), \qquad H = \tfrac{1}{\eta_1} C^T C + \tfrac{1}{\eta_2} A^T A,

with w = (w_1, w_2). Plugging this expression back into (4) gives a regularized least squares problem in w alone:

p(w) := \varphi(w_1) + \tfrac{1}{2}\|Fw - \bar{b}\|^2 \quad \text{s.t.} \quad \psi(w_2) \le \sigma, \qquad (7)

where

F = \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}}\big(\tfrac{1}{\eta_1} C H^{-1} C^T - I\big) & \tfrac{1}{\sqrt{\eta_1}\,\eta_2} C H^{-1} A^T \\[2pt] \tfrac{1}{\sqrt{\eta_2}\,\eta_1} A H^{-1} C^T & \tfrac{1}{\sqrt{\eta_2}}\big(\tfrac{1}{\eta_2} A H^{-1} A^T - I\big) \end{bmatrix}, \qquad \bar{b} = -\begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}\,\eta_2} C H^{-1} A^T \\[2pt] \tfrac{1}{\sqrt{\eta_2}}\big(\tfrac{1}{\eta_2} A H^{-1} A^T - I\big) \end{bmatrix} b.

Algorithm 3 Block-coordinate descent for (4).
1: Input: x^0, w_1^0, w_2^0
2: Initialize: k = 0
3: Define: H = (1/η_1) C^T C + (1/η_2) A^T A
4: while not converged do
5:   x^{k+1} ← H^{−1} ( (1/η_1) C^T w_1^k + (1/η_2) A^T (b + w_2^k) )
6:   w_1^{k+1} ← prox_{η_1 φ} ( C x^{k+1} )
7:   w_2^{k+1} ← proj_{σ B_ψ} ( A x^{k+1} − b )
8:   k ← k + 1
9: end while
10: Output: w_1^k, w_2^k, x^k

Prox-gradient applied to the value function p(w) in (7) with step β gives the iteration

w^{k+1} = \operatorname{prox}_{\beta\Phi}\big( w^k - \beta F^T (F w^k - \bar{b}) \big). \qquad (8)

This iteration, as formally written, requires forming and applying the system F in (7) at each iteration. In practice we compute the w update on the fly, as detailed in Algorithm 2. The equivalence of Algorithm 2 to iteration (8) comes from the following derivative formula for value functions [5]:

F^T (Fw - \bar{b}) = \begin{bmatrix} \tfrac{1}{\eta_1}\big(w_1 - C\bar{x}(w)\big) \\[2pt] \tfrac{1}{\eta_2}\big(w_2 - A\bar{x}(w) + b\big) \end{bmatrix}.

In order to compute β, and apply Theorem II.1, we first prove the following lemma.

Lemma II.3 (Bound on ||F^T F||_2). The operator norm ||F^T F||_2 is bounded above by max(1/η_1, 1/η_2).

Proof. Consider the function

Q(w) := \tfrac{1}{2}\|Fw - \bar{b}\|^2 = \min_x \; \tfrac{1}{2\eta_1}\|Cx - w_1\|^2 + \tfrac{1}{2\eta_2}\|w_2 - Ax + b\|_2^2.

Its gradient is F^T(Fw − \bar{b}), and any Lipschitz bound L for this gradient gives ||F^T F||_2 ≤ L. On the other hand, we can write Q(w) = q(Dw), where

q(z) = \min_x \tfrac{1}{2}\Big\| z - \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}} Cx \\ \tfrac{1}{\sqrt{\eta_2}}(Ax - b) \end{bmatrix} \Big\|^2, \qquad D = \begin{bmatrix} \tfrac{1}{\sqrt{\eta_1}} I & 0 \\ 0 & \tfrac{1}{\sqrt{\eta_2}} I \end{bmatrix}.

Using Theorem 1 of [25] with g(x) = 0, the value function q is differentiable with lip(∇q) ≤ 1. Therefore Q(w) = q(Dw) is also differentiable, with

\operatorname{lip}(\nabla Q) \le \|D^T D\|_2 = \max\big(\tfrac{1}{\eta_1}, \tfrac{1}{\eta_2}\big).

This immediately gives the result.

Now we can combine iteration (8) with Theorem II.1 to get a rate of convergence for Algorithm 2.

Corollary II.4 (Convergence of Algorithm 2). When β satisfies β ≤ min(η_1, η_2), the iterates of Algorithm 2 satisfy

\min_{k=1,\dots,N} \|v_k\|^2 \le \frac{2\max\big(\tfrac{1}{\eta_1},\tfrac{1}{\eta_2}\big)}{N}\big(p(w^0) - \inf p\big),

where v_k is in the subdifferential (generalized gradient) of objective (7) at w^k. Moreover, if η_1 = η_2, then Algorithm 2 is equivalent to block-coordinate descent, as detailed in Algorithm 3.

Proof. The convergence statement comes directly from plugging the estimate of iteration (8) into Theorem II.1. The equivalence of Algorithm 3 with Algorithm 2 is obtained by plugging the step size β = η_1 = η_2 into each line of Algorithm 2.

An important consequence of Corollary II.4 is that the convergence rate of Algorithm 2 does not depend on C or A, in contrast to Algorithm 1, whose rate depends on both matrices (Corollary II.2). The rates of both algorithms are affected by η_1, η_2. We use continuation in η, driving η_1, η_2 to 0 at the same rate, and warm-starting each problem at the previous solution. A convergence theory that takes this continuation into account is left to future work.
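The following minimal sketch runs Algorithm-3-style block-coordinate updates on a small synthetic BPDN instance with C = I, φ = ||·||_1, and ψ = ||·||_2 (σ = 0 for a noiseless toy problem). Sizes and parameter values are illustrative choices of ours, and η_1, η_2 are kept fixed rather than driven to zero by continuation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 80                                   # illustrative sizes
A = rng.standard_normal((d, n))
x_true = np.zeros(n)
support = rng.choice(n, 8, replace=False)
x_true[support] = rng.standard_normal(8)
b = A @ x_true                                   # noiseless data, so sigma = 0
eta1 = eta2 = 1e-3
sigma = 0.0

soft = lambda y, t: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
proj_l2 = lambda y, s: y if np.linalg.norm(y) <= s else (s / np.linalg.norm(y)) * y

H = np.eye(n) / eta1 + A.T @ A / eta2            # C = I here
x, w1, w2 = np.zeros(n), np.zeros(n), np.zeros(d)
for k in range(500):
    x = np.linalg.solve(H, w1 / eta1 + A.T @ (b + w2) / eta2)   # line 5: exact solve
    w1 = soft(x, eta1)                           # line 6: prox_{eta1*||.||_1}(C x)
    w2 = proj_l2(A @ x - b, sigma)               # line 7: projection onto the sigma-ball
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))      # relative error vs. the true spike train
```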

TABLE I: SNR values against the true signal for different l_p norms, for BPDN with a random linear operator (rows: l2 with SPGL1, and l2, l1, l∞, l0 with Algorithm 3).

A. Inexact Least-Squares Solves

Algorithm 3 has a provably faster rate of convergence than Algorithm 1. The practical performance of these algorithms is compared in Figure 1, which solves a problem with an l1-norm regularizer and an l1-norm BPDN constraint, with α = 1/||A||_F^2, C = I, and η_1 = η_2 fixed at a small value. We see a huge performance difference in practice as well as in theory: the proximal-gradient descent of Algorithm 1 yields a much slower cost-function decay than solving exactly for x as in Algorithm 3. Indeed, Algorithm 3 admits the fastest cost-function decay, as shown in Corollary II.4, albeit at the expense of more work per iteration, since fully solving the least squares problem in line 5 is not tractable for large-scale problems. Hence, we implement Algorithm 3 inexactly, using the conjugate gradient (CG) method. Figure 1 shows the results for several fixed numbers of CG iterations per solve. Each CG iteration is implemented using matrix-vector products, and with enough CG iterations per solve the results are indistinguishable from those of Algorithm 3 with full solves; even at 5 CG iterations, the performance is remarkably close. Algorithm 3 also has a natural warm-start strategy, with the x from each previous iteration used to initialize the subsequent CG solve. Using CG with a bounded number of iterations gives fast convergence and saves computational time. This approach is used in the subsequent experiments.

Fig. 1. Objective function decay for Equation (4) with proximal-gradient descent (Algorithm 1), direct solves (Algorithm 3), and several intermediate variants of Algorithm 2 in which the system involving H is only partially solved.

III. APPLICATION TO BASIS PURSUIT DENOISE MODELS

The basis pursuit denoise problem can be formulated as

\min_x \|x\|_1 \quad \text{s.t.} \quad \rho(Ax - b) \le \sigma, \qquad (9)

where ρ is classically taken to be the l2 norm. In this problem, x represents unknown coefficients that are sparse in a transform domain, while A is a composition of the observation operator with a transform matrix; popular examples of transform domains include discrete cosine transforms, wavelets, and curvelets. The observed and noisy data b resides in the temporal/spatial domain, and σ is the misfit tolerance. This problem was famously solved with the SPGL1 algorithm [23] for ρ = ||·||_2. When the observed data is affected by large sparse noise, a smooth constraint is ineffective. A nonsmooth variant of (9) is very difficult for approaches such as SPGL1, which solves subproblems of the form

\min_x \rho(Ax - b) \quad \text{s.t.} \quad \|x\|_1 \le \tau.

However, the proposed Algorithm 2 is easily adaptable to different norms. We apply Algorithm 3 with φ = ||·||_1, taking η_1, η_2 ↓ 0, so that w_1 → x and w_2 → Ax − b. We can take many different ψ, including the l2, l1, l∞, and l0 norms.

Algorithm 3 is simple to implement. The least squares update in step 5 can be computed efficiently either by factorization with the Woodbury identity, or by an iterative method when A is too large to store. For the Woodbury approach, with C = I we have

\Big(\tfrac{1}{\eta_1} I + \tfrac{1}{\eta_2} A^T A\Big)^{-1} = \eta_1 \Big( I - \tfrac{\eta_1}{\eta_2} A^T \big(I + \tfrac{\eta_1}{\eta_2} A A^T\big)^{-1} A \Big). \qquad (10)

For moderately sized systems, we can store a Cholesky factor L L^T = I + (η_1/η_2) A A^T, whose dimension is that of the observation space, and use triangular solves with L to implement step 5 via (10). However, in the seismic/curvelet experiment described below, the left-hand side of (10) is too large to store in memory, but is positive definite. Hence, we solve the resulting linear system in step 5 of Algorithm 3 with CG, using matrix-vector products.
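As a quick numerical check of the Woodbury route, the sketch below compares a direct solve with H (for C = I) against the identity (10), which only requires solving a much smaller observation-space system. The sizes and parameter values are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 50, 400                                   # few observations, many unknowns
A = rng.standard_normal((d, n))
eta1, eta2 = 1e-3, 1e-3
rhs = rng.standard_normal(n)

# Direct solve with the (large) n x n matrix H = (1/eta1) I + (1/eta2) A^T A
H = np.eye(n) / eta1 + A.T @ A / eta2
x_direct = np.linalg.solve(H, rhs)

# Woodbury route (10): only the small d x d system I + (eta1/eta2) A A^T is solved
S = np.eye(d) + (eta1 / eta2) * (A @ A.T)
x_wood = eta1 * (rhs - (eta1 / eta2) * (A.T @ np.linalg.solve(S, A @ rhs)))
print(np.max(np.abs(x_direct - x_wood)))         # agreement up to round-off
```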
The w_1 update is implemented via the l1 proximal operator (soft thresholding), while the w_2 update requires a projection onto an l_p ball. The projectors used in our experiments are collected in Table II. The least squares solve for x is simple when C is an orthogonal matrix or a tight frame, so that C^T C = I; this is the case for Fourier transforms, wavelets, and curvelets. When A is a restriction operator, as in many data interpolation problems, A^T A is a diagonal matrix of zeros and ones, and hence H = (1/η_1) C^T C + (1/η_2) A^T A is a diagonal matrix with entries either 1/η_1 or 1/η_1 + 1/η_2; the least squares problem for the x update is then trivial.

IV. BASIS PURSUIT DENOISE EXPERIMENTS

In this application, we consider two examples: the first is a small-scale BPDN problem that illustrates the proof of concept of our technique, while the second is an application to denoising a common source gather extracted from a seismic line simulated on a 2D BG Compass model. The data set is sampled with a 4 ms temporal interval and uniform spatial sampling. For this example, we use curvelets as a sparsifying transform domain.

TABLE II: Projectors onto l_p balls of radius τ (for l0, τ is a cardinality bound).

Norm | Projection proj_{τ B_{l_p}}(z) | Solution
l2   | z if ||z||_2 ≤ τ; τ z / ||z||_2 otherwise | analytic
l∞   | entrywise: sign(z_i) min(|z_i|, τ) | analytic
l1   | see, e.g., [22] | O(n ln n) routine
l0   | z_i if i is among the τ largest-magnitude indices; 0 otherwise | analytic

Fig. 2. Residuals for the different l_p norms after algorithm termination. Note how the l1 and l0 norms capture only the outliers.

Fig. 3. Basis pursuit denoising results for a randomly generated linear model with large, sparse noise (true signal and the l2, l1, l∞, and l0 reconstructions).

The first example considers the same model as in (9), where we want to enforce sparsity on x while constraining the data misfit. The variable x is a spike train of length n, with values ±1 on a random 4% of its entries and zeros everywhere else; we observe x through a linear operator A ∈ R^{m×n} (m < n) with independent standard Gaussian entries, and b ∈ R^m is the observed data, contaminated with large, sparse noise. The noise is generated by placing large values on a small percentage of the observations and leaving everything else observed cleanly (i.e., no noise). Here, we test the efficacy of different l_p norms on the residual constraint. With the addition of large, sparse noise to the data, smooth norms on the residual constraint should not be able to deal effectively with such outlier residuals. With our adaptable formulation, it is easy to enforce sparsity both in the domain of x and in the residuals.

TABLE III: Curvelet interpolation and denoising results (SNR, SNR-W, and time) for SPGL1 and Algorithm 4 with selected l_p norms for BPDN.

Other formulations, such as SPGL1, do not have this capability. The noise is depicted as the bottom black dashed line in Figure 2. The results are shown in Figure 3 and in Table I. From these, we can clearly see that the l2 norm is not effective for sparse noise, even at the correct error budget σ. Our approach is resilient to different types of noise, since we can easily change the residual ball projection. This is seen in the nearly exact accuracy of the l1 and l0 norms, with SNRs of 33 and 45, respectively.

The next test of the BPDN formulation is a common source gather in which entries are both omitted and corrupted with synthetic noise. Here, the objective function looks for sparsity in the curvelet domain, while the residual constraint seeks to match the observed data within a tolerance σ. First, we note that interpolation only, without added noise, yields essentially the same SNR for all formulations and algorithms, that is, for all l_p norms with Algorithm 4 and for SPGL1. Here, we again want to enforce sparsity both in the curvelet domain and in the data residual Ax − b, which SPGL1 and other algorithms lack the capacity to do. Following the first experiment, we add large sparse noise to a handful of data points; in this case, we add large values to a random percentage of the observations (not including the omitted entries). The added noise values are large relative to the observed data. The interpolation and denoising results are shown in Figure 4 and Table III. Large, sparse noise cannot be filtered effectively by a smooth norm constraint, using either Algorithm 4 or SPGL1. However, the l1 and l0 norms effectively handle such noise, and can be optimized using our approach. The SNRs for these implementations approach that of the noiseless interpolation mentioned above.

Fig. 4. Interpolation and denoising results for BPDN in the curvelet domain: true data, added noise (binary mask), noisy data with missing sources, and reconstructions with SPGL1 and with the l2, l1, l∞, and l0 constraints. Observe the complete inaccuracy of smooth norms in the presence of large, sparse noise.

V. EXTENSION TO LOW-RANK MODELS

Treating the data as having matrix structure gives additional regularization tools, in particular low-rank structure in appropriate domains. The BPDN formulation for residual-constrained low-rank interpolation is given by

\min_X \|X\|_* \quad \text{s.t.} \quad \rho(\mathcal{A}(X) - b) \le \sigma, \qquad (11)

for X ∈ C^{m×n}, where A : C^{n×m} → C^p is a linear masking operator from the full data to the observed noisy data b, and σ is the misfit tolerance. The nuclear norm ||X||_* is the l1 norm of the singular values of X. Solving this problem requires a decision variable that is the size of the data, as well as updates to this variable that require SVDs at each iteration. It is much more efficient to model X as a product of two matrices L and R:

\min_{L,R} \tfrac{1}{2}\big(\|L\|_F^2 + \|R\|_F^2\big) \quad \text{s.t.} \quad \rho(\mathcal{A}(LR^T) - b) \le \sigma, \qquad (12)

where L ∈ C^{n×k}, R ∈ C^{m×k}, and L R^T is the low-rank representation of the data. The solution is guaranteed to be at most rank k, and in addition the regularizer (1/2)(||L||_F^2 + ||R||_F^2) is an upper bound for ||L R^T||_*, the sum of singular values of L R^T, further penalizing rank by proxy. The decision variables then have combined dimension k(m + n), which is much smaller than the mn variables required by convex formulations.
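A short numerical check of the claim that the factorized regularizer bounds the nuclear norm, using a random L and R of our choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 60, 40, 5
L = rng.standard_normal((n, k))
R = rng.standard_normal((m, k))

nuclear = np.linalg.norm(L @ R.T, 'nuc')                      # ||L R^T||_*
bound = 0.5 * (np.linalg.norm(L, 'fro')**2 + np.linalg.norm(R, 'fro')**2)
print(nuclear <= bound, np.linalg.matrix_rank(L @ R.T) <= k)  # True True
```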
When ρ is smooth, such problems are solved using a continuation that interchanges the roles of the objective and the constraint, solving a sequence of problems in which ρ(A(LR^T) − b) is minimized over the l2 ball [3] using projected gradient, an approach we call SPGLR below.

When ρ is not smooth, SPGLR does not apply, and there are no available implementations for (12). Nonsmooth ρ arise when we want the residual to lie in an l1-norm ball, so that we are robust to outliers in the data and can exactly fit inliers. We now extend Algorithm 3 to this case. For any ρ, smooth or nonsmooth, we introduce a latent variable W for the data matrix and solve

\min_{L,R,W} \tfrac{1}{2}\big(\|L\|_F^2 + \|R\|_F^2\big) + \tfrac{1}{2\eta}\|W - LR^T\|_F^2 \quad \text{s.t.} \quad \|\mathcal{A}(W) - b\|_p \le \sigma, \qquad (13)

with η a parameter that controls the degree of relaxation; as η ↓ 0 we have W → L R^T. The relaxation allows a simple block-coordinate descent, detailed in Algorithm 4.

Algorithm 4 Block-coordinate descent for (13).
1: Input: W^0, L^0, R^0
2: Initialize: k = 0
3: while not converged do
4:   L_{k+1} ← (1/η) W_k R_k ( I + (1/η) R_k^T R_k )^{−1}
5:   R_{k+1} ← (1/η) W_k^T L_{k+1} ( I + (1/η) L_{k+1}^T L_{k+1} )^{−1}
6:   (W_{k+1})_{ij} ← (L_{k+1} R_{k+1}^T)_{ij} for (i,j) ∉ X_obs;  A(W_{k+1}) ← b + proj_{B_{ρ,σ}}( A(L_{k+1} R_{k+1}^T) − b ) otherwise
7:   k ← k + 1
8: end while
9: Output: W_k, L_k, R_k

Algorithm 4 is also simple to implement. It requires two least squares solves, for L and R, which are inherently parallelizable. It also requires a projection of the updated data-matrix estimate L R^T onto the σ level set of the misfit penalty ρ. This step is detailed below. For unobserved data (i, j) ∉ X_obs, we have W_{ij} = (L R^T)_{ij}. For observed data, let v denote A(L R^T). Then the W update is given by solving

\min_w \tfrac{1}{2}\|w - v\|_2^2 \quad \text{s.t.} \quad \|w - b\|_p \le \sigma.

Using the substitution z = w − b, we get

\min_z \tfrac{1}{2}\|z - (v - b)\|_2^2 \quad \text{s.t.} \quad \|z\|_p \le \sigma,

which is precisely the projection of A(L R^T) − b onto B_{p,σ}, the σ level set of ρ. We use the same projectors for ρ ∈ {l1, l∞, l2, l0} as in Section IV; see Table II.

The convergence criterion for Algorithm 4 is based on the optimality of the quadratic subproblems in L, R and on a feasibility measure for W − L R^T, though in practice we compare algorithms under a fixed computational budget. This block-coordinate descent scheme converges to a stationary point of Equation (13) by [21, Theorem 4.1]. Running the block-coordinate descent until convergence produces the completed low-rank matrix. Setting ν = ||L R^T − W||_2^2, we iterate until ν < 10^{-5} or a maximum number of iterations is reached. In the next section, we develop an application of this method to seismic interpolation and denoising.
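The sketch below runs Algorithm-4-style updates on a small dense synthetic completion problem, with ρ the l2 norm and A a simple entrywise sampling mask. The sizes, rank, and parameter values are illustrative choices of ours, not the settings of the seismic experiments.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, k, eta, sigma = 60, 60, 5, 1e-2, 1e-3          # illustrative sizes/parameters
X_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))   # rank-k data
mask = rng.random((n, m)) < 0.5                      # A = sampling of observed entries
b = X_true[mask]                                     # (noiseless) observations

def proj_l2(z, s):
    nz = np.linalg.norm(z)
    return z if nz <= s else (s / nz) * z

L = rng.standard_normal((n, k))
R = rng.standard_normal((m, k))
W = np.zeros((n, m)); W[mask] = b                    # initial data estimate
for it in range(200):
    L = W @ R @ np.linalg.inv(eta * np.eye(k) + R.T @ R)        # least squares in L
    R = W.T @ L @ np.linalg.inv(eta * np.eye(k) + L.T @ L)      # least squares in R
    W = L @ R.T                                      # unobserved entries: W = L R^T
    W[mask] = b + proj_l2(W[mask] - b, sigma)        # observed: project residual onto the sigma-ball
print(np.linalg.norm(W - X_true) / np.linalg.norm(X_true))      # relative recovery error
```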
The smooth l2 norm most commonly used in BPDN fails in such examples, and nonsmooth norms on the residuals lead to better data estimates. Thus, the goal of the experiments below is to show that enforcing sparsity in the singular values (i.e., low rank) together with a sparsity-promoting residual constraint handles large, sparse noise better than the smooth residual constraints solved by most contemporary algorithms.

A. Experiment Description

This example demonstrates the efficacy of the proposed approach using data generated from a 5D dataset simulated on the complex SEG/EAGE overthrust model [1]. The model is discretized on a 25 m × 25 m × 25 m grid. The simulated data contains a 2D grid of receivers and a 2D grid of sources. We apply the Fourier transform along the time axis and extract a monochromatic frequency slice, shown in Figure 7a, which is a 4D object (source-x, source-y, receiver-x, receiver-y). We eliminate 8% of the sources and add large sparse outliers drawn from a Gaussian distribution N(0, a_i max|s_i|), with mean zero and variance on the order of the largest value in that particular source. The generated values with the highest magnitudes are kept, and these are randomly added to observations in the remaining sources (Figure 7f). The largest value of the dataset is several orders of magnitude above the smallest, which is close to zero. Thus, we are essentially increasing or decreasing a percentage of the entries by several orders of magnitude, which contaminates the data significantly, especially where the original entry was nearly 0. We use the same scaling a_i for all low-rank completion and denoising experiments. The objective is to recover the missing sources and eliminate noise from the observed data.
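Returning to the matricization choice discussed above, the sketch below makes the two arrangements concrete on a small random stand-in for a 4D monochromatic slice; the dimensions are illustrative, not the survey geometry of the experiment. It also shows why missing sources appear as empty columns in the (source-x, source-y) matricization but only as scattered sub-blocks in the (source-x, receiver-x) matricization.

```python
import numpy as np

# Small random stand-in for a 4D monochromatic slice D[sx, sy, rx, ry]
nsx, nsy, nrx, nry = 10, 10, 12, 12
rng = np.random.default_rng(6)
D = rng.standard_normal((nsx, nsy, nrx, nry))

# (source-x, source-y) matricization: both source coordinates along the columns
M_src = D.transpose(2, 3, 0, 1).reshape(nrx * nry, nsx * nsy)

# (source-x, receiver-x) matricization: rows indexed by (sx, rx), columns by (sy, ry)
M_sxrx = D.transpose(0, 2, 1, 3).reshape(nsx * nrx, nsy * nry)

# Removing one source zeroes an entire column of M_src, but only a scattered
# sub-block pattern of M_sxrx -- the structure exploited for low-rank recovery.
D_sub = D.copy()
D_sub[3, 7, :, :] = 0.0
col = 3 * nsy + 7
M_src_sub = D_sub.transpose(2, 3, 0, 1).reshape(nrx * nry, nsx * nsy)
print(np.allclose(M_src_sub[:, col], 0.0))   # True: a whole column is gone
```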

Fig. 5. Normalized singular value decay for the full data and for 5% missing sources, under the (source-x, source-y) and (source-x, receiver-x) matricizations. Source: [3].

Fig. 6. Full and subsampled matricizations used in low-rank completion: (a) full and (b) subsampled (src-x, src-y); (c) full and (d) subsampled (src-x, rec-x). Source: [14].

TABLE IV: 4D denoising results (SNR, SNR-W, and time) for SPGLR and Algorithm 4 with selected l_p norms.

We use a rank of k = 75 for the formulation (that is, L ∈ C^{n×75} and similarly for R), and run all algorithms under a fixed computational budget. We perform three experiments on the same dataset: (1) denoising only (Figure 7c); (2) interpolation only (Figure 7d); and (3) combined interpolation and denoising (Figure 7f). Since we have ground truth, we pick σ to be the exact difference between the generated noisy data and the true data; for the l0 norm, σ is a cardinality measure, so it is set to the number of noisy points added.

B. Results

Tables IV–VI display SNR values for the different algorithms and formulations on the three types of experiments, and Figures 8–10 display the results for a randomly selected set of sources. Even a small number of outliers can greatly impact the quality of low-rank denoising and interpolation for the standard, smoothly residual-constrained algorithms. The denoising-only results (Figure 8, Table IV) show that all methods perform well when all sources are available. The interpolation-only results (Figure 9, Table V) show that all constraints perform well in interpolating the missing data. This makes sense, as all algorithms simply favor the low-rank nature of the data. However, the combined denoising and interpolation experiment shows that the l0-norm approach does far better than any smooth norm in comparable time. Table VI shows that when data for similar sources is absent or not observed, the smoothly constrained formulations fail completely. When noise is added to the low-amplitude sections of the observed data, the smoothly constrained norms fail drastically, while the l0 norm effectively removes the errors. This is starkly evident in Figures 10a–e, where all panels except Figure 10e are essentially noise; the result is supported by the SNR values in Table VI. While the reconstructions in Figures 10a–e mostly capture the structure of the data where there are large values (i.e., where the seismic wave is observed in the upper-left corner of each source), only the l0 norm captures the areas of lower-energy data.

TABLE V: 4D interpolation results (SNR, SNR-W, and time) for SPGLR and Algorithm 4 with selected l_p norms.

TABLE VI: 4D combined denoising and interpolation results (SNR, SNR-W, and time) for SPGLR and Algorithm 4 with selected l_p norms.

VII. CONCLUSIONS

We proposed a new approach for level-set formulations, including basis pursuit denoise and residual-constrained low-rank formulations. The approach is easily adapted to a variety of nonsmooth and nonconvex data constraints. The resulting problems are solved using Algorithms 2 and 4, which require only that the penalty ρ admit an efficient projector. The algorithms are simple, scalable, and efficient. Sparse curvelet denoising and low-rank interpolation of a monochromatic slice from a 4D seismic data volume demonstrate the potential of the approach. A particular feature of the seismic denoising and interpolation problem is that the signal amplitudes vary significantly in space, so errors in the data are a much larger problem for low-amplitude regions. This makes it very difficult to obtain reasonable results using Gaussian misfits and constraints. Nonsmooth exact formulations, including l1 and particularly l0, appear to be extremely well suited to this magnified heteroscedastic issue.

VIII. ACKNOWLEDGEMENTS

The authors acknowledge support from the Department of Energy Computational Science Graduate Fellowship, provided under grant number DE-FG02-97ER25308, and from the Washington Research Foundation Data Science Professorship.

REFERENCES

[1] F. Aminzadeh, N. Burkhard, L. Nicoletis, F. Rocca, and K. Wyatt. SEG/EAEG 3-D modeling project: 2nd update. The Leading Edge, 1994.
[2] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization. To appear in Mathematical Programming, Series B, 2018.
[3] A. Y. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann. Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation. SIAM Journal on Scientific Computing, 36, 2014.
[4] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2), 2010.
[5] B. M. Bell and J. V. Burke. Algorithmic differentiation of implicit functions and optimal values. In Advances in Automatic Differentiation, Springer, 2008.
[6] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Transactions on Information Theory, 52(12), 2006.
[7] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
[8] C. Da Silva and F. J. Herrmann. Optimization on the hierarchical Tucker manifold - applications to tensor completion. Linear Algebra and its Applications, 481:131–173, 2015.
[9] D. Davis and W. Yin. Convergence rate analysis of several splitting schemes. In Splitting Methods in Communication, Imaging, Science, and Engineering, Springer, 2016.
[10] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.
[11] F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10(6):1455–1480, 1998.
[12] F. J. Herrmann and G. Hennenfent. Non-parametric seismic data recovery with curvelet frames. Geophysical Journal International, 173(1):233–248, 2008.
[13] A. Kadu and R. Kumar.
Decentralized full-waveform inversion. Submitted to EAGE, January 2018.
[14] R. Kumar, C. Da Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann. Efficient matrix completion for seismic data reconstruction. Geophysics, 80(5), 2015.
[15] R. Kumar, O. López, D. Davis, A. Y. Aravkin, and F. J. Herrmann. Beating level-set methods for 5-D seismic data interpolation: A primal-dual alternating approach. IEEE Transactions on Computational Imaging, 3(2), June 2017.
[16] M. Lustig, D. Donoho, and J. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.
[17] L. Oneto, S. Ridella, and D. Anguita. Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Machine Learning, 103:103–136, 2016.
[18] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
[19] M. D. Sacchi, T. J. Ulrych, and C. J. Walker. Interpolation and extrapolation using a high-resolution discrete Fourier transform. IEEE Transactions on Signal Processing, 46(1):31–38, January 1998.
[20] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3):1030–1051, March 2006.
[21] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.
[22] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2):890–912, 2008.
[23] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, November 2008.
[24] A. Yurtsever, M. Udell, J. A. Tropp, and V. Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. ArXiv e-prints, February 2017.
[25] P. Zheng and A. Aravkin. Fast methods for nonsmooth nonconvex minimization. ArXiv e-prints, February 2018.
[26] P. Zheng, T. Askham, S. L. Brunton, J. N. Kutz, and A. Y. Aravkin. A unified framework for sparse relaxed regularized regression: SR3. ArXiv e-prints, July 2018.

Fig. 7. True data and the three experiments used to test our completion algorithm: (a) fully sampled monochromatic slice; (b) the added noise alone (binary mask), created by keeping the largest entries generated from a zero-mean normal distribution with variance on the order of max|s_i|; (c) observed noisy data; (d) subsampled noiseless data, with 8% of sources omitted; (e) the subsampling with noise locations shown (binary mask); (f) subsampled and noisy data, with 8% of sources omitted and the noise above added to the remaining sources.

Fig. 8. Denoising-only results: (a) SPGLR; (b) l2; (c)–(e) l1, l∞, and l0 with Algorithm 4.

Fig. 9. Interpolation-only results: (a) SPGLR; (b) l2; (c)–(e) l1, l∞, and l0 with Algorithm 4.

Fig. 10. Combined interpolation and denoising results: (a) SPGLR; (b) l2; (c)–(e) l1, l∞, and l0 with Algorithm 4.


More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Self-Calibration and Biconvex Compressive Sensing

Self-Calibration and Biconvex Compressive Sensing Self-Calibration and Biconvex Compressive Sensing Shuyang Ling Department of Mathematics, UC Davis July 12, 2017 Shuyang Ling (UC Davis) SIAM Annual Meeting, 2017, Pittsburgh July 12, 2017 1 / 22 Acknowledgements

More information

Sparse signals recovered by non-convex penalty in quasi-linear systems

Sparse signals recovered by non-convex penalty in quasi-linear systems Cui et al. Journal of Inequalities and Applications 018) 018:59 https://doi.org/10.1186/s13660-018-165-8 R E S E A R C H Open Access Sparse signals recovered by non-conve penalty in quasi-linear systems

More information

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Adrien Todeschini Inria Bordeaux JdS 2014, Rennes Aug. 2014 Joint work with François Caron (Univ. Oxford), Marie

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

Sparsity in Underdetermined Systems

Sparsity in Underdetermined Systems Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

Frequency-Domain Rank Reduction in Seismic Processing An Overview

Frequency-Domain Rank Reduction in Seismic Processing An Overview Frequency-Domain Rank Reduction in Seismic Processing An Overview Stewart Trickett Absolute Imaging Inc. Summary Over the last fifteen years, a new family of matrix rank reduction methods has been developed

More information

Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation

Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation UIUC CSL Mar. 24 Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation Yuejie Chi Department of ECE and BMI Ohio State University Joint work with Yuxin Chen (Stanford).

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

An Homotopy Algorithm for the Lasso with Online Observations

An Homotopy Algorithm for the Lasso with Online Observations An Homotopy Algorithm for the Lasso with Online Observations Pierre J. Garrigues Department of EECS Redwood Center for Theoretical Neuroscience University of California Berkeley, CA 94720 garrigue@eecs.berkeley.edu

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation David L. Donoho Department of Statistics Arian Maleki Department of Electrical Engineering Andrea Montanari Department of

More information

Recent developments on sparse representation

Recent developments on sparse representation Recent developments on sparse representation Zeng Tieyong Department of Mathematics, Hong Kong Baptist University Email: zeng@hkbu.edu.hk Hong Kong Baptist University Dec. 8, 2008 First Previous Next Last

More information

Equivalence of Minimal l 0 and l p Norm Solutions of Linear Equalities, Inequalities and Linear Programs for Sufficiently Small p

Equivalence of Minimal l 0 and l p Norm Solutions of Linear Equalities, Inequalities and Linear Programs for Sufficiently Small p Equivalence of Minimal l 0 and l p Norm Solutions of Linear Equalities, Inequalities and Linear Programs for Sufficiently Small p G. M. FUNG glenn.fung@siemens.com R&D Clinical Systems Siemens Medical

More information

Recovering overcomplete sparse representations from structured sensing

Recovering overcomplete sparse representations from structured sensing Recovering overcomplete sparse representations from structured sensing Deanna Needell Claremont McKenna College Feb. 2015 Support: Alfred P. Sloan Foundation and NSF CAREER #1348721. Joint work with Felix

More information

Compressive Sensing and Beyond

Compressive Sensing and Beyond Compressive Sensing and Beyond Sohail Bahmani Gerorgia Tech. Signal Processing Compressed Sensing Signal Models Classics: bandlimited The Sampling Theorem Any signal with bandwidth B can be recovered

More information

Accelerated Gradient Method for Multi-Task Sparse Learning Problem

Accelerated Gradient Method for Multi-Task Sparse Learning Problem Accelerated Gradient Method for Multi-Task Sparse Learning Problem Xi Chen eike Pan James T. Kwok Jaime G. Carbonell School of Computer Science, Carnegie Mellon University Pittsburgh, U.S.A {xichen, jgc}@cs.cmu.edu

More information

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS Martin Kleinsteuber and Simon Hawe Department of Electrical Engineering and Information Technology, Technische Universität München, München, Arcistraße

More information

Overview. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Overview. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Overview Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 1/25/2016 Sparsity Denoising Regression Inverse problems Low-rank models Matrix completion

More information

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines vs for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines Ding Ma Michael Saunders Working paper, January 5 Introduction In machine learning,

More information

Optimisation Combinatoire et Convexe.

Optimisation Combinatoire et Convexe. Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix

More information

Tractable Upper Bounds on the Restricted Isometry Constant

Tractable Upper Bounds on the Restricted Isometry Constant Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.

More information

Adaptive one-bit matrix completion

Adaptive one-bit matrix completion Adaptive one-bit matrix completion Joseph Salmon Télécom Paristech, Institut Mines-Télécom Joint work with Jean Lafond (Télécom Paristech) Olga Klopp (Crest / MODAL X, Université Paris Ouest) Éric Moulines

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

Compressed Sensing: a Subgradient Descent Method for Missing Data Problems

Compressed Sensing: a Subgradient Descent Method for Missing Data Problems Compressed Sensing: a Subgradient Descent Method for Missing Data Problems ANZIAM, Jan 30 Feb 3, 2011 Jonathan M. Borwein Jointly with D. Russell Luke, University of Goettingen FRSC FAAAS FBAS FAA Director,

More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Yin Zhang Technical Report TR05-06 Department of Computational and Applied Mathematics Rice University,

More information

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng ABSTRACT Recovering Data with Group Sparsity by Alternating Direction Methods by Wei Deng Group sparsity reveals underlying sparsity patterns and contains rich structural information in data. Hence, exploiting

More information