Basis Pursuit Denoise with Nonsmooth Constraints
Robert Baraldi(1), Rajiv Kumar(2), and Aleksandr Aravkin(1)

(1) Department of Applied Mathematics, University of Washington. (2) Formerly School of Earth and Atmospheric Sciences, Georgia Institute of Technology, USA; currently DownUnder GeoSolutions, Perth, Australia.

Abstract: Level-set optimization formulations with data-driven constraints minimize a regularization functional subject to matching observations to a given error level. These formulations are widely used, particularly for matrix completion and sparsity promotion in data interpolation and denoising. The misfit level is typically measured in the ℓ2 norm, or other smooth metrics. In this paper, we present a new flexible algorithmic framework that targets nonsmooth level-set constraints, including ℓ1, ℓ∞, and even ℓ0 norms. These constraints give greater flexibility for modeling deviations in observation and denoising, and have significant impact on the solution. Measuring error in the ℓ1 and ℓ0 norms makes the result more robust to large outliers, while matching many observations exactly. We demonstrate the approach for basis pursuit denoise (BPDN) problems as well as for extensions of BPDN to matrix factorization, with applications to interpolation and denoising of 5D seismic data. The new methods are particularly promising for seismic applications, where the amplitude in the data varies significantly, and measurement noise in low-amplitude regions can wreak havoc for standard Gaussian error models.

Index Terms: Nonconvex nonsmooth optimization, level-set formulations, basis pursuit denoise, interpolation, seismic data.

I. INTRODUCTION

Basis Pursuit Denoise (BPDN) seeks a sparse solution to an under-determined system of equations that has been corrupted by noise. The classic level-set formulation [22], [2] is given by

    min_x ‖x‖_1  subject to  ‖Ax − b‖_2 ≤ σ,    (1)

where A : R^(m×n) → R^d is a linear functional taking unknown parameters x ∈ R^(m×n) to observations b ∈ R^d.
Problem (1) is also known as a Morozov formulation, in contrast to Ivanov or Tikhonov formulations [17]. The functional A can include a transformation to another domain, including Wavelet, Fourier, or Curvelet coefficients [7], as well as compositions of these transforms with other linear operators, such as restriction in interpolation problems. The parameter σ controls the error budget, and is based on an estimate of the noise level in the data. Theoretical recovery guarantees for classes of operators A are developed in [6] and [20]. BPDN and the closely related LASSO formulation have applications to compressed sensing [18], [6] and machine learning [10], [11], as well as to applied domains including MRI [16]. Seismic data is a key use case [3], [15], [19], where acquisition is prohibitively expensive and interpolation techniques are used to fill in data volumes by promoting parsimonious representations in the Fourier [19] or Curvelet [12] domains. Matricization of the data leads to low-rank interpolation schemes [3], [15], [19], [24]. While BPDN uses nonsmooth regularizers, including the ℓ1 norm, nuclear norm, and elastic net, the inequality constraint is ubiquitously smooth, and often taken to be the ℓ2 norm as in (1). Prior work, including [23], [3], [15], [2], exploits the smoothness of the inequality constraint in developing algorithms for this problem class. Smooth constraints work well when errors are Gaussian, but this assumption fails for seismic data and is often violated in general.

Contributions. The main contribution of this paper is to provide a fast, easily adaptable algorithm to solve nonsmooth and nonconvex data constraints in general level-set formulations including BPDN, and to illustrate the efficacy of the approach using large-scale interpolation and denoising problems. To do this, we extend the universal regularization framework of [26] to level-set formulations with nonsmooth/nonconvex constraints.
We develop a convergence theory for the optimization approach, and illustrate the practical performance of the new formulations for data interpolation and denoising in both sparse recovery and low-rank matrix factorization.

Roadmap. The paper proceeds as follows. Section II develops the general relaxation framework and approach. Section III specializes this framework to the BPDN setting with nonsmooth, nonconvex constraints. In Section IV we apply the approach to a sparse signal recovery problem and to sparse Curvelet reconstruction. In Section V, we extend the approach to a low-rank interpolation framework, which embeds matrix factorization within the BPDN constraint. In Section VI we test the low-rank extension using synthetic examples and data extracted from a full 5D dataset simulated on the complex SEG/EAGE overthrust model.

II. NONSMOOTH, NONCONVEX LEVEL-SET FORMULATIONS

We consider the following problem class:

    min_x φ(Cx)  subject to  ψ(Ax − b) ≤ σ,    (2)

where φ and ψ may be nonsmooth and nonconvex, but have well-defined proximity and projection operators:

    prox_{αφ}(y) = argmin_x (1/2α)‖x − y‖² + φ(x),
    proj_{ψ≤σ}(y) = argmin_{x : ψ(x)≤σ} (1/2)‖x − y‖².    (3)

Here, C : C^(m×n) → R^c is typically a linear operator that converts x to some transform domain, while A : C^(m×n) → R^d is a linear observation operator also acting on x. In the context of interpolation, A is often a restriction operator.
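For concreteness, both building blocks in (3) have closed forms for common choices: the prox of φ = t·‖·‖_1 is soft thresholding, and projection onto the ℓ∞ ball of radius σ is entrywise clipping. The following is a minimal numpy sketch of these two operators (our illustration, not the authors' code):

```python
import numpy as np

def prox_l1(y, t):
    """Prox of t*||.||_1: soft thresholding, shrinks each entry toward 0 by t."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def proj_linf(y, sigma):
    """Euclidean projection onto {x : ||x||_inf <= sigma}: entrywise clipping."""
    return np.clip(y, -sigma, sigma)

y = np.array([3.0, -0.2, 1.5])
print(prox_l1(y, 1.0))    # each entry moves toward zero by 1, small entries vanish
print(proj_linf(y, 1.0))  # entries clipped into [-1, 1]
```

Both operators are separable across entries, which is what makes the splitting schemes below cheap per iteration.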
Algorithm 1 Prox-gradient for (4).
1: Input: x^0, w1^0, w2^0
2: Initialize: k = 0
3: while not converged do
4:   x^(k+1) ← x^k − α[(1/η1)Cᵀ(Cx^k − w1^k) + (1/η2)Aᵀ(Ax^k − w2^k − b)]
5:   w1^(k+1) ← prox_{αφ}(w1^k − (α/η1)(w1^k − Cx^(k+1)))
6:   w2^(k+1) ← proj_{σB_ψ}(w2^k − (α/η2)(w2^k − Ax^(k+1) + b))
7:   k ← k + 1
8: end while
9: Output: w1^k, w2^k, x^k

This setting significantly extends that of [2], who assume ψ and φ are convex, take C = I, and use the value function

    v(τ) = min_x ψ(Ax − b)  subject to  φ(x) ≤ τ

to solve (2) by root-finding on v(τ) = σ. Variational properties of v are fully understood only in the convex setting, and efficient evaluation of v(τ) requires ψ to be smooth, so that efficient first-order methods are applicable. Here, we develop an approach to solve any problem of type (2), including problems with nonsmooth and nonconvex ψ and φ, using only matrix-vector products with A, Aᵀ, C, Cᵀ and simple nonlinear operators. In special cases, the approach can also use equation solves to gain significant speedups.

The general approach uses the relaxation formulation proposed in [26], [25]. We use relaxation to split φ and ψ from the linear map A and transformation map C, extending (2) to

    min_{x, w1, w2} φ(w1) + (1/2η1)‖Cx − w1‖² + (1/2η2)‖w2 − Ax + b‖²  subject to  ψ(w2) ≤ σ,    (4)

with w1 ∈ R^c and w2 ∈ R^d. In contrast to [26], we use a continuation scheme that drives η1, η2 ↓ 0 in order to solve the original formulation (2). Thus the only external algorithmic parameter the scheme requires is σ, which controls the error budget for ψ.

There are two algorithms readily available to solve (4). The first is prox-gradient descent, detailed in Algorithm 1. We let z = (x, w1, w2), and define Φ(z) = φ(w1) + δ_{ψ≤σ}(w2), where the indicator function δ_{ψ≤σ} takes the value 0 if ψ(w2) ≤ σ, and +∞ otherwise. Problem (4) can now be written as

    min_z f(z) + Φ(z),  with  f(z) = (1/2)‖Bz − b̂‖²,    (5)

where

    B = [ (1/√η1)C   −(1/√η1)I   0
          (1/√η2)A   0           −(1/√η2)I ],    b̂ = [ 0 ; (1/√η2)b ].

Applying the prox-gradient descent iteration with step size α,

    z^(k+1) = prox_{αΦ}(z^k − α∇f(z^k)),    (6)

gives the coordinate updates in Algorithm 1.

Algorithm 2 Value-function optimization for (4).
1: Input: x^0, w1^0, w2^0
2: Initialize: k = 0
3: Define: H = (1/η1)CᵀC + (1/η2)AᵀA
4: while not converged do
5:   x^(k+1) ← H⁻¹[(1/η1)Cᵀw1^k + (1/η2)Aᵀ(b + w2^k)]
6:   w1^(k+1) ← prox_{βφ}(w1^k − (β/η1)(w1^k − Cx^(k+1)))
7:   w2^(k+1) ← proj_{σB_ψ}(w2^k − (β/η2)(w2^k − Ax^(k+1) + b))
8:   k ← k + 1
9: end while
10: Output: w1^k, w2^k, x^k

Prox-gradient has been analyzed in the general nonconvex setting by [4]. However, Problem (5) is the sum of a convex quadratic and a nonconvex regularizer. The rate of convergence for this problem class can be quantified, and [26, Theorem 2], reproduced below, will be very useful here.

Theorem II.1 (Prox-gradient for regularized least squares). Consider the least-squares objective

    p(z) := (1/2)‖Az − a‖² + Φ(z),

with p bounded below, and Φ potentially nonsmooth, nonconvex, and non-finite-valued. With step α = 1/(2σ_max(AᵀA)), the iterates (6) satisfy

    min_{k=1,…,N} ‖v^(k+1)‖² ≤ (2‖A‖²/N)(p(z^1) − inf p),

where

    v^(k+1) = ‖A‖²(2I − AᵀA/‖A‖²)(z^k − z^(k+1))

is a subgradient (generalized gradient) of p at z^(k+1).

We can specialize Theorem II.1 to our case by computing the norm of the least-squares system in (5).

Corollary II.2 (Rate for Algorithm 1). Theorem II.1 applied to problem (4) gives

    min_{k=1,…,N} ‖v^(k+1)‖² ≤ (2C(η1, η2, C, A)/N)(p(z^1) − inf p),

with

    C(η1, η2, C, A) = (1/η1)(c + ‖C‖²_F) + (1/η2)(d + ‖A‖²_F).

Problem (4) also admits a different optimization strategy, summarized in Algorithm 2. We can formally minimize the objective in x directly via the gradient, with the minimizer given by

    x̄(w) = H⁻¹[(1/η1)Cᵀw1 + (1/η2)Aᵀ(w2 + b)],
    H = (1/η1)CᵀC + (1/η2)AᵀA,
with w = (w1, w2). Plugging this expression back in gives a regularized least-squares problem in w alone:

    p(w) := φ(w1) + (1/2)‖Fw − b̂‖²  subject to  ψ(w2) ≤ σ,    (7)

where

    F = [ (1/√η1)((1/η1)CH⁻¹Cᵀ − I)    (1/(η2√η1))CH⁻¹Aᵀ
          (1/(η1√η2))AH⁻¹Cᵀ            (1/√η2)((1/η2)AH⁻¹Aᵀ − I) ],

    b̂ = [ −(1/(η2√η1))CH⁻¹Aᵀ b ; (1/√η2)(I − (1/η2)AH⁻¹Aᵀ) b ].

Algorithm 3 Block-coordinate descent for (4).
1: Input: x^0, w1^0, w2^0
2: Initialize: k = 0
3: Define: H = (1/η1)CᵀC + (1/η2)AᵀA
4: while not converged do
5:   x^(k+1) ← H⁻¹[(1/η1)Cᵀw1^k + (1/η2)Aᵀ(b + w2^k)]
6:   w1^(k+1) ← prox_{η1φ}(Cx^(k+1))
7:   w2^(k+1) ← proj_{σB_ψ}(Ax^(k+1) − b)
8:   k ← k + 1
9: end while
10: Output: w1^k, w2^k, x^k

Prox-gradient applied to the value function p(w) in (7) with step β gives the iteration

    w^(k+1) = prox_{βΦ}(w^k − βFᵀ(Fw^k − b̂)).    (8)

This iteration, as formally written, requires forming and applying the system F in (7) at each iteration. In practice we compute the w update on the fly, as detailed in Algorithm 2. The equivalence of Algorithm 2 to iteration (8) comes from the following derivative formula for value functions [5]:

    Fᵀ(Fw − b̂) = [ (1/η1)(w1 − Cx̄(w)) ; (1/η2)(w2 − Ax̄(w) + b) ].

In order to compute β, and apply Theorem II.1, we first prove the following lemma.

Lemma II.3 (Bound on ‖FᵀF‖₂). The operator norm ‖FᵀF‖₂ is bounded above by max(1/η1, 1/η2).

Proof. Consider the function

    Q(w) := (1/2)‖Fw − b̂‖² = min_x (1/2η1)‖Cx − w1‖² + (1/2η2)‖w2 − Ax + b‖².

Its gradient is Fᵀ(Fw − b̂), and any Lipschitz bound L for this gradient gives ‖FᵀFw − FᵀFw′‖₂ ≤ L‖w − w′‖₂, which means ‖FᵀF‖₂ ≤ L. On the other hand, we can write Q(w) = q(Dw), where

    q(z) = min_x (1/2)‖Āx + b̄ − z‖²,  with  Ā = [ (1/√η1)C ; (1/√η2)A ],  b̄ = [ 0 ; −(1/√η2)b ],

and D = blkdiag((1/√η1)I, (1/√η2)I). Using Theorem 1 of [25] with g(x) = 0, the value function q is differentiable, with lip(∇q) ≤ 1. Therefore Q is also differentiable, with ∇Q(w) = Dᵀ∇q(Dw), and hence

    lip(∇Q) ≤ ‖DᵀD‖₂ = max(1/η1, 1/η2).

This immediately gives the result. ∎

Now we can combine iteration (8) with Theorem II.1 to get a rate of convergence for Algorithm 2.

Corollary II.4 (Convergence of Algorithm 2).
When β satisfies β ≤ min(η1, η2), the iterates of Algorithm 2 satisfy

    min_{k=1,…,N} ‖v^(k+1)‖² ≤ (2 max(1/η1, 1/η2)/N)(p(w^1) − inf p),

where v^k is in the subdifferential (generalized gradient) of objective (7) at w^k. Moreover, if η1 = η2, then Algorithm 2 is equivalent to block-coordinate descent, as detailed in Algorithm 3.

Proof. The convergence statement comes directly from plugging the estimate of iteration (8) into Theorem II.1. The equivalence of Algorithm 3 with Algorithm 2 is obtained by plugging the step size β = η1 = η2 into each line of Algorithm 2. ∎

An important consequence of Corollary II.4 is that the convergence rate of Algorithm 2 does not depend on C or A, in contrast to Algorithm 1, whose rate depends on both matrices (Corollary II.2). The rates of both algorithms are affected by η1 and η2. We use continuation in η, driving η1 and η2 to 0 at the same rate, and warm-starting each problem at the previous solution. A convergence theory that takes this continuation into account is left to future work.
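As a concrete illustration of the machinery above, the following toy sketch runs the prox-gradient iteration of Algorithm 1 on the relaxation (4) with C = I, φ = ‖·‖_1, and ψ = ‖·‖_∞. The problem sizes, step size, and η values here are ours, chosen purely for demonstration; this is not the authors' implementation:

```python
import numpy as np

def prox_l1(y, t):
    """Soft thresholding: prox of t*||.||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 31]] = 1.0          # sparse ground truth
b = A @ x_true

eta1 = eta2 = 1.0                  # loose relaxation, demo value only
sigma = 0.1                        # error budget for psi = ||.||_inf
# step size: inverse of a safe upper bound on the Lipschitz constant of f in (5)
Lip = 2.0/eta1 + 2.0*(np.linalg.norm(A, 2)**2 + 1.0)/eta2
alpha = 1.0 / Lip

x = np.zeros(50); w1 = np.zeros(50); w2 = np.zeros(20)
for _ in range(500):
    grad_x = (x - w1)/eta1 + A.T @ (A @ x - w2 - b)/eta2
    x = x - alpha * grad_x                                           # gradient step in x
    w1 = prox_l1(w1 - (alpha/eta1)*(w1 - x), alpha)                  # prox step for phi
    w2 = np.clip(w2 - (alpha/eta2)*(w2 - A @ x + b), -sigma, sigma)  # projection for psi
print(np.max(np.abs(w2)) <= sigma)  # the w2 iterate stays feasible for the constraint
```

In the full method this loop would be wrapped in continuation, shrinking eta1 and eta2 toward zero with warm starts, as described above.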
TABLE I. SNR values against the true x for different ℓp norms with Algorithm 3, for BPDN with a random linear operator (methods compared: ℓ2 with SPGL1; ℓ2, ℓ1, ℓ∞, and ℓ0 with Algorithm 3).

A. Inexact Least-Squares Solves

Algorithm 3 has a provably faster rate of convergence than Algorithm 1. The practical performance of these algorithms is compared in Figure 1, which solves a problem with both an ℓ1-norm regularizer and an ℓ1-norm BPDN constraint, with α = 1/‖A‖²_F, C = I, and η1 = η2 = 10⁻⁴. We see a huge performance difference in practice as well as in theory: the proximal-gradient descent of Algorithm 1 yields a slower cost-function decay than solving exactly for x as in Algorithm 3. Indeed, Algorithm 3 admits the fastest cost-function decay, as shown in Corollary II.4, albeit at the expense of more operations per iteration. However, fully solving the least-squares problem in line 5 is not tractable for large-scale problems. Hence, we implement Algorithm 3 inexactly, using the conjugate gradient (CG) algorithm. Figure 1 shows the results when we use 1, 5, and 20 CG iterations. Each CG iteration is implemented using matrix-vector products, and at 20 iterations the results are indistinguishable from those of Algorithm 3 with full solves. Even at 5 iterations, the performance is remarkably close to that of Algorithm 3 with full solves. Algorithm 3 has a natural warm-start strategy, with the x from each previous iteration used in the subsequent least-squares solve using CG. Using a CG method with a bounded number of iterates gives fast convergence and saves computational time. This approach is used in the subsequent experiments.

III. APPLICATION TO BASIS PURSUIT DE-NOISE MODELS

The basis pursuit de-noise problem can be formulated as

    min_x ‖x‖_1  subject to  ρ(Ax − b) ≤ σ,    (9)

where ρ is classically taken to be the ℓ2 norm.
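The truncated-CG strategy described above is only a few lines of code: run a small fixed number of CG iterations on the positive-definite system, warm-started at the previous solution. A minimal sketch (ours, on a random well-conditioned SPD stand-in for the matrix H of Algorithm 3):

```python
import numpy as np

def cg(matvec, rhs, x0, iters):
    """Conjugate gradient with a fixed small iteration budget and warm start x0."""
    x = x0.copy()
    r = rhs - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = matvec(p)
        a = rs / (p @ Hp)
        x = x + a * p
        r = r - a * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(3)
M = rng.standard_normal((50, 50))
H = M.T @ M + 50.0*np.eye(50)     # well-conditioned SPD matrix
rhs = rng.standard_normal(50)

x20 = cg(lambda v: H @ v, rhs, np.zeros(50), 20)
rel = np.linalg.norm(H @ x20 - rhs) / np.linalg.norm(rhs)
print(rel < 1e-5)  # 20 CG iterations already solve this system to high accuracy
```

Only the matvec closure is problem-specific, so the same routine serves both the explicit-matrix and matrix-free (operator) settings used later.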
In this problem, x represents unknown coefficients that are sparse in a transform domain, while A is a composition of the observation operator with a transform matrix; popular examples of transform domains include discrete cosine transforms, wavelets, and curvelets. The observed and noisy data b resides in the temporal/spatial domain, and σ is the misfit tolerance. This problem was famously solved with the SPGL1 algorithm [23] for ρ = ‖·‖₂. When the observed data is affected by large sparse noise, a smooth constraint is ineffective. A nonsmooth variant of (9) is very difficult for approaches such as SPGL1, which solves subproblems of the form

    min_x ρ(Ax − b)  subject to  ‖x‖_1 ≤ τ.    (10)

However, the proposed Algorithm 2 is easily adaptable to different norms. We apply Algorithm 3 with φ = ‖·‖_1, taking η1, η2 ↓ 0, so that w1 → x and w2 → Ax − b. We can take many different ψ, including ℓ2, ℓ1, ℓ∞, and ℓ0.

Fig. 1. Objective-function decay for Equation (4) with proximal-gradient descent (Algorithm 1), direct solving (Algorithm 3), and several steps in between, where we only partially solve the system with matrix H using Algorithm 2.

Algorithm 3 is simple to implement. The least-squares update in step 5 can be computed efficiently using either a factorization with the Woodbury identity, or an iterative method in cases where A is too large to store. For the Woodbury approach, we have

    ((1/η1)I + (1/η2)AᵀA)⁻¹ = η1 I − (η1/η2) Aᵀ((1/η1)I + (1/η2)AAᵀ)⁻¹ A.    (11)

For moderately sized systems, we can store the Cholesky factorization LLᵀ = (1/η1)I + (1/η2)AAᵀ, with L ∈ R^(m×m), and use L with (11) to implement step 5. However, in the seismic/curvelet experiment described below, the left-hand side of Equation (11) is too large to store in memory, but is positive definite. Hence, we solve the resulting linear system in step 5 of Algorithm 3 with CG, using matrix-vector products. The w1 update is implemented via the ℓ1 proximal operator (soft thresholding), while the w2 update requires a projection onto an ℓp ball. The projectors used in our experiments are collected in Table II.
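The sort-based O(n log n) ℓ1-ball projector and the "keep the τ largest entries" ℓ0 projector used in those experiments can be written compactly. The sketch below is ours, following the standard sort-based recipe for ℓ1-ball projection:

```python
import numpy as np

def proj_l1(z, tau):
    """Euclidean projection onto {x : ||x||_1 <= tau}, sort-based O(n log n)."""
    if np.abs(z).sum() <= tau:
        return z.copy()
    u = np.sort(np.abs(z))[::-1]            # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, z.size + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1.0)      # soft-threshold level
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def proj_l0(z, tau):
    """Projection onto {x : ||x||_0 <= tau}: keep the tau largest magnitudes."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[::-1][:int(tau)]
    out[keep] = z[keep]
    return out

z = np.array([3.0, -1.0, 0.5])
print(np.abs(proj_l1(z, 2.0)).sum())  # lands exactly on the l1-ball boundary
print(proj_l0(z, 1))                  # keeps only the largest-magnitude entry
```

Points already inside the ℓ1 ball are returned unchanged; otherwise the result is a soft threshold at a level computed from the sorted cumulative sums.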
The least-squares solve for x is trivial when C is an orthogonal matrix or a tight frame, so that CᵀC = I; this is the case for Fourier transforms, wavelets, and curvelets. When A is a restriction operator, as for many data interpolation problems, AᵀA is a diagonal matrix of zeros and ones, and hence H = (1/η1)CᵀC + (1/η2)AᵀA is a diagonal matrix with entries either 1/η1 or 1/η1 + 1/η2; the least-squares problem for the x update is then trivial.

IV. BASIS PURSUIT DE-NOISE EXPERIMENTS

In this application, we consider two examples: the first is a small-scale BPDN problem to illustrate proof of concept for our technique, while the second is an application to de-noising a common source gather extracted from a seismic line simulated using a 2D BG Compass model. The data is sampled in time at a 4 ms interval, with 10 m spatial sampling. For this example, we use curvelets as a sparsifying transform domain.
TABLE II. Projectors for ℓp balls.

Norm   ‖z‖                 proj_{τB_ℓp}(z)                                      Type
ℓ2     (Σᵢ zᵢ²)^(1/2)      z if ‖z‖₂ ≤ τ;  τz/‖z‖₂ if ‖z‖₂ > τ                 Analytic
ℓ∞     maxᵢ |zᵢ|           sign(zᵢ)·min(|zᵢ|, τ)  (clipping)                    Analytic
ℓ1     Σᵢ |zᵢ|             see, e.g., [22]                                      O(n log n) routine
ℓ0     #{i : zᵢ ≠ 0}       zᵢ if i is one of the τ largest-magnitude            Analytic
                           indices; 0 otherwise

Fig. 2. Residuals for different ℓp norms after algorithm termination (panels: true, ℓ2, ℓ1, ℓ∞, ℓ0). Note how the ℓ1 and ℓ0 norms can capture the outliers only.

Fig. 3. Basis pursuit de-noising results for a randomly generated linear model with large, sparse noise (panels: true, ℓ2, ℓ1, ℓ∞, ℓ0).

The first example considers the same model as in (9), where we want to enforce sparsity on x while constraining the data misfit. The variable x is a vector of length n that has values ±1 on a random 4% of its entries and zeros everywhere else; x represents a spike train that we observe using a linear operator A ∈ R^(m×n). A was generated with independent standard Gaussian entries, and b ∈ R^m is observed data with large, sparse noise. We take m = 120 and n = 512. The noise is generated by placing large values on 10% of the observations and assuming everything else was observed cleanly (i.e., no noise). Here, we test the efficacy of using different ℓp norms on the residual constraint. With the addition of large, sparse noise to the data, smooth norms on the residual constraint should not be able to deal effectively with such outlier residuals. With our adaptable formulation, it should be easy to enforce sparsity both in the x domain as well as in the residual.
TABLE III. Curvelet interpolation and denoising results for SPGL1 and Algorithm 4 for selected ℓp norms for BPDN (4D monochromatic interpolation; columns: SNR, SNR-W, time (s); methods: ℓ2 with SPGL1; ℓ2, ℓ1, ℓ∞, and ℓ0 with Algorithm 4; one entry marked "early stoppage").

Other formulations, such as SPGL1, do not have this capability. This noise is depicted as the bottom black dashed line in Figure 2. The results are shown in Figure 3 and in Table I. From these, we can clearly see that the ℓ2 norm is not effective for sparse noise, even at the correct error budget σ. Our approach is resilient to different types of noise, since we can easily change the residual-ball projection. This is seen in the almost exact accuracy of the ℓ1 and ℓ0 norms, with SNRs of 33 and 45, respectively.

The next test of the BPDN formulation is for a common source gather where entries are both omitted and corrupted with synthetic noise. Here, the objective function looks for sparsity in the curvelet domain, while the residual constraint seeks to match observed data within a certain tolerance σ. First, we note that doing interpolation only, without added noise, yields an SNR of approximately 30 for all formulations and algorithms; that is, for all ℓp norms with Algorithm 4 and for SPGL1. Here, we again want to enforce sparsity both in the curvelet domain and in the data residual Ax − b, which SPGL1 and other algorithms lack the capacity to do. Following the first experiment, we add large sparse noise to a handful of data points; in this case, we added large values to a random 10% of observations (this does not include omitted entries). The added noise is large relative to the range of the observed data. The interpolated and denoised results are shown in Figure 4 and Table III. Large, sparse noise cannot be filtered effectively by a smooth norm constraint, using either Algorithm 4 or SPGL1. However, the ℓ1 and ℓ0 norms effectively handle such noise, and can be optimized using our approach.
The SNRs for these implementations approach that of the noiseless data mentioned above.

V. EXTENSION TO LOW-RANK MODELS

Treating the data as having a matrix structure gives additional regularization tools, in particular low-rank structure in particular domains. The BPDN formulation for residual-constrained low-rank interpolation is given by

    min_X ‖X‖_*  subject to  ρ(A(X) − b) ≤ σ,

for X ∈ C^(n×m), where A : C^(n×m) → C^p is a linear masking operator from full to observed noisy data b, and σ is the misfit tolerance. The nuclear norm ‖X‖_* is the ℓ1 norm of the singular values of X. Solving this problem requires using a decision variable that is the size of the data, as well as updates to this variable that require SVDs at each iteration. It is much more efficient to model X as a product of two matrices L and R:

    min_{L,R} (1/2)(‖L‖²_F + ‖R‖²_F)  subject to  ρ(A(LRᵀ) − b) ≤ σ,    (12)

where L ∈ C^(n×k), R ∈ C^(m×k), and LRᵀ is the low-rank representation of the data. The solution is guaranteed to be at most rank k, and in addition, the regularizer (1/2)(‖L‖²_F + ‖R‖²_F) is an upper bound for ‖LRᵀ‖_*, the sum of singular values of LRᵀ, further penalizing rank by proxy. The decision variables then have combined dimension k(m + n), which is much smaller than the mn variables required by convex formulations.

Fig. 4. Interpolation and de-noising results for BPDN in the curvelet domain (panels: true data; added noise, binary; noisy data with missing sources; SPGL1; ℓ2; ℓ1; ℓ∞; ℓ0). Observe the complete inaccuracy of smooth norms with large, sparse noise.

When ρ is smooth, such problems are solved using a continuation that interchanges the roles of the objective and constraint, solving a sequence of problems in which ρ(A(LRᵀ) − b) is minimized subject to a ball constraint on the factors [3] using projected gradient; an approach we call SPGLR below.
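The rank-penalty-by-proxy claim above, ‖LRᵀ‖_* ≤ (‖L‖²_F + ‖R‖²_F)/2, is easy to sanity-check numerically (our sketch, with arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
L = rng.standard_normal((30, 5))
R = rng.standard_normal((20, 5))

nuc = np.linalg.norm(L @ R.T, ord='nuc')  # sum of singular values of L R^T
bound = 0.5 * (np.linalg.norm(L, 'fro')**2 + np.linalg.norm(R, 'fro')**2)
print(nuc <= bound)  # the Frobenius regularizer upper-bounds the nuclear norm
```

The inequality is tight when the factorization is "balanced" (L and R share the singular values of LRᵀ equally), which is why minimizing the Frobenius penalty over all factorizations recovers the nuclear norm.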
When ρ is not smooth, SPGLR does not work, and there are no available implementations for (12). Nonsmooth ρ arise when we want the residual to be in an ℓ1-norm ball, so that we are robust to outliers in the data, and can exactly fit inliers. We now extend Algorithm 3 to this case. For any ρ, smooth or nonsmooth, we introduce a latent variable W for the data matrix, and solve

    min_{L,R,W} (1/2)(‖L‖²_F + ‖R‖²_F) + (1/2η)‖W − LRᵀ‖²_F  subject to  ‖A(W) − b‖_p ≤ σ,    (13)

with η a parameter that controls the degree of relaxation; as η ↓ 0 we have W → LRᵀ. The relaxation allows a simple block-coordinate descent, detailed in Algorithm 4.

Algorithm 4 Block-coordinate descent for (13).
1: Input: W^0, L^0, R^0
2: Initialize: k = 0
3: while not converged do
4:   L^(k+1) ← W^k R^k (ηI + (R^k)ᵀR^k)⁻¹
5:   R^(k+1) ← (W^k)ᵀ L^(k+1) (ηI + (L^(k+1))ᵀL^(k+1))⁻¹
6:   W^(k+1)_ij ← (L^(k+1)(R^(k+1))ᵀ)_ij for (i, j) ∉ Ω_obs; for observed entries, W^(k+1) ← b + proj_{B_(ρ,σ)}(A(L^(k+1)(R^(k+1))ᵀ) − b)
7:   k ← k + 1
8: end while
9: Output: W^k, L^k, R^k

Algorithm 4 is also simple to implement. It requires two least-squares solves, for L and R, which are inherently parallelizable. It also requires a projection of the updated data-matrix estimate LRᵀ onto the σ-level set of the misfit penalty ρ. This step is detailed below. For unobserved data (i, j) ∉ Ω_obs, we have W_ij = (LRᵀ)_ij. For observed data, let v denote A(LRᵀ). Then the W update step is given by solving

    min_w (1/2)‖w − v‖²  subject to  ‖w − b‖_p ≤ σ.

Using the simple substitution z = w − b, we get

    min_z (1/2)‖z − (v − b)‖²  subject to  ‖z‖_p ≤ σ,

which is precisely the projection of A(LRᵀ) − b onto B_(p,σ), the σ-level set of ρ. We use the same projectors for ρ ∈ {ℓ1, ℓ∞, ℓ2, ℓ0} as in Section IV; see Table II. The convergence criterion for Algorithm 4 is based on the optimality of the quadratic subproblems in L and R and a feasibility measure of W − LRᵀ, though in practice we compare the performance of algorithms based on a computational budget. This block-coordinate descent scheme converges to a stationary point of Equation (13) by [21, Theorem 4.1]. Implementing block-coordinate descent on these forms until convergence produces the completed low-rank matrix.
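One sweep of this scheme is just a pair of small k×k linear solves plus a projection on the observed entries. The sketch below (ours, with ψ = ‖·‖_∞, hypothetical sizes, and A taken to be entrywise sampling of observed entries) mirrors steps 4-6 of Algorithm 4:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 40, 30, 3
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))  # low-rank ground truth
mask = rng.random((n, m)) < 0.5                                # observed entries
b = X[mask]
eta, sigma = 1e-2, 0.1

L = rng.standard_normal((n, k)); R = rng.standard_normal((m, k))
W = np.zeros((n, m)); W[mask] = b
for _ in range(50):
    L = W @ R @ np.linalg.inv(eta*np.eye(k) + R.T @ R)    # solve for L
    R = W.T @ L @ np.linalg.inv(eta*np.eye(k) + L.T @ L)  # solve for R
    LR = L @ R.T
    W = LR.copy()                                         # unobserved entries: W = LR^T
    W[mask] = b + np.clip(LR[mask] - b, -sigma, sigma)    # observed: b + projection
print(np.max(np.abs(W[mask] - b)) <= sigma)  # residual constraint holds at every sweep
```

The k×k inverses are cheap for the modest ranks used here; for very large k they would themselves be replaced by Cholesky solves.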
Setting ν = ‖LRᵀ − W‖², we iterate until ν < 10⁻⁵ or a maximum number of iterations is reached. In the next section, we develop an application of this method to seismic interpolation and denoising.

VI. 4D MATRIX COMPLETION WITH DE-NOISING

There are two main requirements when using the rank-minimization framework for seismic data interpolation and denoising: (i) the underlying seismic data should exhibit low-rank structure (singular values should decay fast) in some transform domain, and (ii) subsampling and noise destroy the low-rank structure (singular values decay slowly) in that domain. For exploiting the low-rank structure during interpolation and denoising, we follow the matricization strategy proposed by [8]. The (source-x, source-y) matricization, i.e., placing both the source coordinates along the columns (Figure 6a), gives slow decay of the singular values (Figure 5a), while the (source-x, receiver-x) matricization (Figure 6c) gives fast decay of the singular values (Figure 5b). To understand the effect of subsampling on the low-rank structure, we remove 50% of the sources. Subsampling destroys the fast singular-value decay in the (source-x, receiver-x) matricization, but not in the (source-x, source-y) matricization. This is because missing sources are missing columns in the (source-x, source-y) matricization, and missing sub-blocks in the (source-x, receiver-x) matricization (Figure 6b). The latter is more effective for low-rank interpolation. Similar to the BPDN experiments, we want to show that nonsmooth constraints on the data residual can be effective for dealing with large, sparse noise. The smooth ℓ2 norm that is most common in BPDN problems will fail in such examples, thereby leading to better data estimation with the implementation of nonsmooth norms on the residuals.
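The two matricizations above are just different unfoldings of the 4D frequency slice; a small numpy sketch makes the bookkeeping explicit (tensor sizes here are hypothetical):

```python
import numpy as np

# Hypothetical 4D monochromatic slice: (src_x, src_y, rec_x, rec_y)
nsx, nsy, nrx, nry = 4, 4, 6, 6
D = np.arange(nsx*nsy*nrx*nry, dtype=float).reshape(nsx, nsy, nrx, nry)

# (src-x, src-y) matricization: source pairs on one axis, receiver pairs on the other
M_src = D.reshape(nsx*nsy, nrx*nry)

# (src-x, rec-x) matricization: interleave source and receiver coordinates
M_srx = D.transpose(0, 2, 1, 3).reshape(nsx*nrx, nsy*nry)

print(M_src.shape, M_srx.shape)  # (16, 36) (24, 24)
```

Removing a source zeroes an entire row of M_src but only scattered sub-blocks of M_srx, which is why the second unfolding is the one used for low-rank completion.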
Thus, the goal of the experiments below is to show that enforcing sparsity in the singular values (i.e., low rank) and sparsity in the residual constraint can be more effective under large, sparse noise than the smooth residual constraints solved by most contemporary algorithms.

A. Experiment Description

This example demonstrates the efficacy of the proposed approach using data created by a 5D dataset based on a complex SEG/EAGE overthrust model simulation [1]. The dimensions of the model are 5 km × 10 km × 10 km, and it is discretized on a 25 m × 25 m × 25 m grid. The simulated data contains a grid of receivers sampled at 50 m and sources sampled at 100 m. We apply the Fourier transform along the time axis and extract a frequency slice at 10 Hz, as shown in Figure 7a, which is a 4D object (source-x, source-y, receiver-x, receiver-y). We eliminate 80% of the sources and add large sparse outliers drawn from the Gaussian distribution N(0, a_i · max(s_i)): mean zero and variance on the order of the largest value in that particular source. The generated values with the highest magnitudes are kept, and these are randomly added to observations in the remaining sources (Figure 7f). The largest value of our dataset is several orders of magnitude above the smallest, which is close to zero. Thus, we are essentially increasing/decreasing 10% of the entries by several orders of magnitude, which contaminates the data significantly, especially if the original entry was nearly 0. For all low-rank completion and denoising, we let a_i = 1. The objective is to recover missing sources and eliminate noise from the observed data.
Fig. 5. Normalized singular-value decay for full data and for 50% missing sources, with the two different matricizations: (a) (source-x, source-y); (b) (source-x, receiver-x). Source: [3].

Fig. 6. Full and subsampled matricizations used in low-rank completion: (a) full (src-x, src-y); (b) subsampled (src-x, src-y); (c) full (src-x, rec-x); (d) subsampled (src-x, rec-x). Source: [14].

TABLE IV. 4D de-noising results for SPGLR and Algorithm 4 for selected ℓp norms (4D monochromatic de-noising; columns: SNR, SNR-W, time (s); methods: ℓ2 with SPGLR; ℓ2, ℓ1, ℓ∞, and ℓ0 with Algorithm 4).

We use a rank of k = 75 for the formulation (that is, L ∈ C^(n×75), and similarly for R), and run all algorithms with a fixed iteration and computational budget. We perform three experiments on the same dataset: (1) de-noising only (Figure 7c); (2) interpolation only (Figure 7d); and (3) combined interpolation and de-noising (Figure 7f). Since we have ground truth, we pick σ to be the exact difference between the generated noisy data and the true data; σ for the ℓ0 norm is a cardinality measure, so it is set to the number of noisy points added.

B. Results

Tables IV-VI display SNR values for the different algorithms and formulations for the three types of experiments, and Figures 8-10 display the results for a randomly selected number of sources. Even a small number of outliers can greatly impact the quality of the low-rank denoising and interpolation for the standard, smoothly residual-constrained algorithms. The de-noising-only results (Figure 8, Table IV) show that all methods perform well when all sources are available. The interpolation-only results (Figure 9, Table V) show that all constraints perform well in interpolating the missing data.
This makes sense, as all algorithms will simply favor the low-rank nature of the data. However, the combined de-noising and interpolation experiment shows that the ℓ0-norm approach does far better than any smooth norm in comparable time. Table VI shows that when data for similar sources is absent or not observed, the smoothly-constrained formulations fail completely. When noise is added to the low-amplitude section of the observed data, the smoothly-constrained norms fail drastically, while the ℓ0 norm can effectively remove the errors. This is starkly evident in Figures 10a-10e, where all except Figure 10e are essentially noise; the result is supported by the SNR values in Table VI. While Figures 10a-10d can mostly capture the structure of the data where there were nonzero values (i.e., where the seismic wave is observed, in the upper left corner of each source), only the ℓ0 norm can capture the areas of lower-energy data.

TABLE V. 4D interpolation results for SPGLR and Algorithm 4 for selected ℓp norms (4D monochromatic interpolation; columns: SNR, SNR-W, time (s); methods: ℓ2 with SPGLR; ℓ2, ℓ1, ℓ∞, and ℓ0 with Algorithm 4).
TABLE VI. 4D combined de-noising and interpolation results for SPGLR and Algorithm 4 for selected ℓp norms (4D monochromatic de-noising & interpolation; columns: SNR, SNR-W, time (s); methods: ℓ2 with SPGLR; ℓ2, ℓ1, ℓ∞, and ℓ0 with Algorithm 4).

VII. CONCLUSIONS

We proposed a new approach for level-set formulations, including basis pursuit denoise and residual-constrained low-rank formulations. The approach is easily adapted to a variety of nonsmooth and nonconvex data constraints. The resulting problems are solved using Algorithms 2 and 4, which require only that the penalty ρ have an efficient projector. The algorithms are simple, scalable, and efficient. Sparse curvelet denoising and low-rank interpolation of a monochromatic slice from 4D seismic data volumes demonstrate the potential of the approach. A particular quality of the seismic denoising and interpolation problem is that the amplitudes of the signal have significant spatial variation. The error in the data is a much larger problem for low-amplitude data. This quality makes it very difficult to obtain reasonable results using Gaussian misfits and constraints. Nonsmooth exact formulations, including ℓ1 and particularly ℓ0, appear to be extremely well suited for this magnified heteroscedastic issue.

VIII. ACKNOWLEDGEMENTS

The authors acknowledge support from the Department of Energy Computational Science Graduate Fellowship, which is provided under grant number DE-FG02-97ER25308, and the Washington Research Foundation Data Science Professorship.

REFERENCES

[1] F. Aminzadeh, N. Burkhard, L. Nicoletis, F. Rocca, and K. Wyatt. SEG/EAEG 3-D modeling project: 2nd update. The Leading Edge, 1994.
[2] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization. To appear in Mathematical Programming, Series B, 2018.
[3] A. Y. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann.
Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation. SIAM Journal on Scientific Computing, 36(5), 2014.
[4] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2):438-457, 2010.
[5] B. M. Bell and J. V. Burke. Algorithmic differentiation of implicit functions and optimal values. In Advances in Automatic Differentiation. Springer, 2008.
[6] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52(12):5406-5425, 2006.
[7] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33-61, 1998.
[8] C. Da Silva and F. J. Herrmann. Optimization on the hierarchical Tucker manifold - applications to tensor completion. Linear Algebra and its Applications, 481:131-173, 2015.
[9] D. Davis and W. Yin. Convergence rate analysis of several splitting schemes. In Splitting Methods in Communication, Imaging, Science, and Engineering. Springer, 2016.
[10] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407-499, 2004.
[11] F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10(6):1455-1480, 1998.
[12] F. J. Herrmann and G. Hennenfent. Non-parametric seismic data recovery with curvelet frames. Geophysical Journal International, 173(1):233-248, 2008.
[13] A. Kadu and R. Kumar. Decentralized full-waveform inversion. Submitted to EAGE, January 2018.
[14] R. Kumar, C. Da Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann. Efficient matrix completion for seismic data reconstruction. Geophysics, 80(5):V97-V114, 2015.
[15] R. Kumar, O. López, D. Davis, A. Y. Aravkin, and F. J. Herrmann.
Beating level-set methods for 5-D seismic data interpolation: A primal-dual alternating approach. IEEE Transactions on Computational Imaging, 3(2), June 2017.
[16] M. Lustig, D. Donoho, and J. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.
[17] L. Oneto, S. Ridella, and D. Anguita. Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Machine Learning, 103(1):103–136, 2016.
[18] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, Aug. 2010.
[19] M. D. Sacchi, T. J. Ulrych, and C. J. Walker. Interpolation and extrapolation using a high-resolution discrete Fourier transform. IEEE Transactions on Signal Processing, 46(1):31–38, Jan. 1998.
[20] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3):1030–1051, March 2006.
[21] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.
[22] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2):890–912, 2008.
[23] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, Nov. 2008.
[24] A. Yurtsever, M. Udell, J. A. Tropp, and V. Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. ArXiv e-prints, Feb. 2017.
[25] P. Zheng and A. Aravkin. Fast methods for nonsmooth nonconvex minimization. ArXiv e-prints, Feb. 2018.
[26] P. Zheng, T. Askham, S. L. Brunton, J. N. Kutz, and A. Y. Aravkin. A unified framework for sparse relaxed regularized regression: SR3. ArXiv e-prints, July 2018.
Fig. 7. True data and three different experiments for testing the interpolation and de-noising algorithms. (a) Fully sampled monochromatic slice. (b) The sparse noise alone (binary view); the noise keeps only the largest entries drawn from a zero-mean normal distribution whose variance is scaled by the maximum signal amplitude. (c) Observed noisy data. (d) Subsampled noiseless data, with 80% of sources omitted. (e) Subsampled and noisy data, with only the noise shown (binary view). (f) Subsampled and noisy data: 80% of sources omitted and the noise described above added to the remaining sources.
Fig. 8. Denoising-only results (panels: SPGLR and the l 2, l 1, l 0, and l ∞ formulations).
Fig. 9. Interpolation-only results (panels: SPGLR and the l 2, l 1, l 0, and l ∞ formulations).
Fig. 10. Combined interpolation and denoising results (panels: SPGLR and the l 2, l 1, l 0, and l ∞ formulations).
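The conclusions note that the proposed algorithms require only that the constraint penalty ρ admit an efficient projector. As a minimal illustrative sketch (not the authors' code; function names and the NumPy formulation are ours), the Euclidean projections onto the l 1, l 0, and l ∞ balls that serve as such building blocks can be written as:

```python
import numpy as np

def proj_l1_ball(z, tau):
    """Euclidean projection onto {x : ||x||_1 <= tau}, via the standard sort-based method."""
    z = np.asarray(z, dtype=float)
    if np.abs(z).sum() <= tau:
        return z.copy()                                # already feasible
    u = np.sort(np.abs(z))[::-1]                       # magnitudes, descending
    cs = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, z.size + 1) > cs - tau)[0][-1]
    theta = (cs[rho] - tau) / (rho + 1.0)              # soft-threshold level
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def proj_l0_ball(z, k):
    """Projection onto {x : ||x||_0 <= k}: keep the k largest-magnitude entries."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-k:]                  # indices of the k largest |z_i|
    out[keep] = z[keep]
    return out

def proj_linf_ball(z, tau):
    """Projection onto {x : ||x||_inf <= tau}: clip each entry to [-tau, tau]."""
    return np.clip(np.asarray(z, dtype=float), -tau, tau)
```

Applied to the residual r = Ax − b inside a splitting or projected step, these maps are what make the nonsmooth constraints ||Ax − b||_p ≤ σ computationally tractable: each costs at most a sort, so the per-iteration overhead relative to the smooth l 2 case is negligible.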