A Linearly Convergent First-order Algorithm for Total Variation Minimization in Image Processing


Cong D. Dang, Kaiyu Dai, Guanghui Lan

(Cong D. Dang: Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, congdd@ufl.edu; partially supported by a doctoral fellowship from the Vietnam International Education Development program and an NSF CMMI grant. Kaiyu Dai: Software School, Fudan University, Shanghai, China, kydai@fudan.edu.cn; partially supported by a visiting scholar fellowship from the China Scholarship Council. Guanghui Lan: Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, glan@ise.ufl.edu; partially supported by an NSF CMMI grant.)

Abstract. We introduce a new formulation for total variation minimization in image denoising. We also present a linearly convergent first-order method for solving this reformulated problem and show that it possesses a nearly dimension-independent iteration complexity bound.

Keywords: Image denoising, Total variation, First-order methods, Complexity, Linear rate of convergence

1 Introduction

The restoration of images contaminated by noise is a fundamental problem in biomedical image processing and plays an important role in diagnostic techniques such as magnetic resonance imaging (MRI) and functional magnetic resonance imaging (fMRI). In 1992, Rudin, Osher and Fatemi (ROF) [17] proposed an influential optimization approach for image denoising based on minimizing the total variation (TV). The ROF model preserves edges and important features of the original image.

In this paper we propose an alternative formulation (or relaxation) for minimizing the total variation, which leads to denoising quality comparable to that of the classical ROF model. Moreover, we show that the relaxed model can be solved very efficiently. In particular, we present a linearly convergent first-order algorithm for solving this new model, and demonstrate that it possesses an O(ln(1/ɛ)) iteration complexity for achieving a target accuracy ɛ. Since this iteration complexity bound is almost dimension-independent and the cost per iteration depends only linearly on the dimension, the total arithmetic complexity of our algorithm is O(N² ln(1/ɛ)) for processing an N × N image. Hence, our approach is scalable to very large-scale image denoising problems. By contrast, most existing approaches for solving the original ROF model are based on an equivalent dual or primal-dual formulation (see, e.g., Chan et al. [5], Chambolle [3], Beck and Teboulle [1, 2],

Chambolle and Pock [4]). All these algorithms converge sublinearly for solving the ROF model, and the best iteration complexity bound, given in [1, 4], is O(1/√ɛ). Moreover, these complexity results depend heavily on the dimension of the problem, the selection of the initial point and certain regularization parameters. We also refer to other algorithms recently developed for solving the ROF model (e.g., [6, 8, 7, 9, 12, 13, 19]) and the references therein.

This paper is organized as follows. We review the classical ROF model and present a new strongly convex composite reformulation (or relaxation) of the ROF model in Section 2. An efficient algorithm for solving the reformulation is presented and analyzed in Section 3. We then report some promising numerical results and biomedical applications in Section 4. All the technical proofs are given in the Appendix.

2 A strongly convex composite reformulation for total variation minimization

In this section, we review the classical ROF model for image denoising and present a novel reformulation of it. We also show how these two TV-based models for image denoising are related.

For the sake of simplicity, let us assume that the images are 2-dimensional with N × N pixels. For any image u ∈ R^{N×N}, the discretized gradient operator ∇u is defined as

(∇u)_{i,j} := ((∇u)^1_{i,j}, (∇u)^2_{i,j}), i, j = 1, ..., N, (2.1)

where

(∇u)^1_{i,j} := { u_{i+1,j} − u_{i,j}, i < N; 0, i = N }  and  (∇u)^2_{i,j} := { u_{i,j+1} − u_{i,j}, j < N; 0, j = N }.

Then, the classical total variation minimization problem is given by

ū = argmin_u { φ(u) := T(u) + (λ/2) ‖u − f‖² }, (2.2)

where λ > 0 is a user-defined parameter, f is the observed noisy image and

T(u) = Σ_{i,j} ‖((∇u)^1_{i,j}, (∇u)^2_{i,j})‖.

Observe that the norm ‖·‖ in the definition of T(·) (and hereafter) can be either the l1 or the l2 norm. If ‖·‖ = ‖·‖_2, then problem (2.2) is exactly the original ROF model. It can be easily seen that the objective function φ(u) in (2.2) is a nonsmooth strongly convex function. It is known that oracle-based convex optimization techniques would require O(1/ɛ) iterations to find an ɛ-solution of (2.2), i.e., a point û such that φ(û) − φ* ≤ ɛ (see [10]). It has recently been shown that the above iteration complexity can be significantly improved to O(1/√ɛ) by using a dual or saddle point reformulation of (2.2) (e.g., [1, 4, 15]). Note, however, that all these algorithms converge sublinearly, and that their performance also heavily depends on the dimension N and the selection of starting points. In order to address these issues, we consider an alternative formulation of problem (2.2).
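As a concrete illustration of the discretization in (2.1)-(2.2), the following minimal NumPy sketch evaluates the forward-difference gradient and the ROF objective for the isotropic (l2) choice of the norm. The code and its function names are ours, not part of the paper, and assume the image is stored as a 2-D floating-point array.

import numpy as np

def grad(u):
    """Forward-difference gradient (2.1): returns (g1, g2) with zero last row/column."""
    g1 = np.zeros_like(u)              # (grad u)^1_{i,j} = u_{i+1,j} - u_{i,j} for i < N
    g1[:-1, :] = u[1:, :] - u[:-1, :]
    g2 = np.zeros_like(u)              # (grad u)^2_{i,j} = u_{i,j+1} - u_{i,j} for j < N
    g2[:, :-1] = u[:, 1:] - u[:, :-1]
    return g1, g2

def rof_objective(u, f, lam):
    """phi(u) = T(u) + (lam/2)*||u - f||^2 with the isotropic (l2) norm in T."""
    g1, g2 = grad(u)
    tv = np.sum(np.sqrt(g1**2 + g2**2))        # T(u) = sum_{i,j} ||((grad u)^1, (grad u)^2)||_2
    return tv + 0.5 * lam * np.sum((u - f)**2)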

The basic idea is to introduce an extra variable d ∈ R^{2N(N−1)}, which corresponds to the nonzero components of the gradient operator ∇u, and then to impose the following set of constraints:

d^1_{i,j} = u_{i+1,j} − u_{i,j}, i = 1, ..., N−1; j = 1, ..., N,
d^2_{i,j} = u_{i,j+1} − u_{i,j}, i = 1, ..., N; j = 1, ..., N−1.

Observe that the above constraints can be written in matrix form as

Eu + d = 0, (2.3)

where Eᵀ is a network flow matrix with N² nodes and 2N(N−1) arcs, each node having degree at most 4; E is a sparse finite-difference matrix whose building blocks are described in Section A.3. Now, denoting

T̃(d) := Σ_{i,j} ‖(d^1_{i,j}, d^2_{i,j})‖,

we consider the following optimization problem:

(u*, d*) = argmin_{u,d} { φ̃(u, d) := T̃(d) + (λ/2)[ ‖u − f‖² + q ‖Eu + d‖² ] } (2.4)

for some parameters λ, q > 0. Similar to φ(u) in (2.2), the new objective function φ̃(u, d) is also a nonsmooth strongly convex function. While the non-separable and nonsmooth term T(·) makes problem (2.2) difficult to solve, the nonsmooth term T̃(·) in (2.4) is separable with respect to the pairs (d^1_{i,j}, d^2_{i,j}). This fact will enable us to design a very efficient algorithm for solving problem (2.4) (see Section 3.2).

We would also like to provide some intuitive explanation of the reformulation given in (2.4). Observe that both terms T̃(d) and ‖Eu + d‖² can be viewed as certain regularization terms. While the first term T̃(d) enforces the sparsity of the vector d, i.e., of the estimated gradient vector, and thus helps to smooth the recovered image, the latter term ‖Eu + d‖² essentially takes into account that the computation of d is not exact because of the stochastic noise. Introducing this extra regularization term into the optimization problem protects the image from being oversmoothed, as may happen with the original formulation (2.2) (see Section 4).

It is interesting to observe some relations between problems (2.2) and (2.4).

Proposition 1 Let φ* and φ̃* be the optimal values of (2.2) and (2.4), respectively. We have

φ̃* ≤ φ* ≤ φ̃* + N²/(2λq). (2.5)

It follows from Proposition 1 that, for given λ and N, the parameter q in (2.4) should be big enough in order to approximately solve the original problem (2.2). Observe, however, that our goal is not to solve problem (2.2), but to recover the contaminated image. Due to the aforementioned role that the extra regularization term ‖Eu + d‖² plays in (2.4), we argue that it is not necessary to choose a very large q. Indeed, we observe from our computational experiments that q can be set to 2 or 4 in most cases, and that selecting a much larger value of q seems to be actually harmful to image denoising.
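The operator E is simply a (negated) sparse finite-difference matrix, so it can be built once and applied in O(N²) time. The sketch below uses our own sign and vectorization conventions (column-major stacking of u, with d = (d^1; d^2)) so that Eu + d = 0 holds when d collects the forward differences, and evaluates the relaxed objective (2.4); the function names are ours and the layout is an assumption, not the paper's.

import numpy as np
import scipy.sparse as sp

def difference_operator(N):
    # E such that Eu + d = 0 in (2.3) when d stacks the forward differences
    # (d^1; d^2) of the column-major vectorization of u.
    I = sp.identity(N, format="csr")
    D = sp.diags([-np.ones(N - 1), np.ones(N - 1)], [0, 1], shape=(N - 1, N))
    D1 = sp.kron(I, D)        # d^1_{ij} = u_{i+1,j} - u_{i,j}
    D2 = sp.kron(D, I)        # d^2_{ij} = u_{i,j+1} - u_{i,j}
    return (-sp.vstack([D1, D2])).tocsr()

def relaxed_objective(u, d, f, lam, q, E):
    # phi_tilde(u, d) = T_tilde(d) + (lam/2)[ ||u-f||^2 + q*||Eu + d||^2 ], as in (2.4).
    N = u.shape[0]
    d1 = d[:N * (N - 1)].reshape((N - 1, N), order="F")
    d2 = d[N * (N - 1):].reshape((N, N - 1), order="F")
    g1, g2 = np.zeros((N, N)), np.zeros((N, N))
    g1[:-1, :], g2[:, :-1] = d1, d2
    tv = np.sum(np.sqrt(g1**2 + g2**2))        # T_tilde(d), boundary pairs padded with zeros
    r = E @ u.flatten(order="F") + d
    return tv + 0.5 * lam * (np.sum((u - f)**2) + q * np.sum(r**2))

Since E has only two nonzeros per row, both Eu and Eᵀv cost O(N²) operations, which is what drives the per-iteration cost quoted later.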

3 A linearly convergent algorithm for TV denoising

In the previous section, we reformulated the TV minimization problem as minimizing the sum of a relatively simple nonsmooth convex function and a smooth strongly convex function. Our goal in this section is to show that such a reformulation can be solved efficiently. More specifically, we first present an accelerated gradient descent (AC-GD) method based on Nesterov's smooth optimal method [14, 16] for solving a general class of strongly convex composite optimization problems. Then, we show that, by using this algorithm, one can solve problem (2.4) in O(√q ln(1/ɛ)) iterations. Our algorithm can be viewed as a variant of the well-known FISTA algorithm by Beck and Teboulle [1, 2]. However, since FISTA does not take advantage of the strong convexity of the problem, it possesses a much worse performance guarantee than the one mentioned above.

3.1 The accelerated gradient descent (AC-GD) algorithm

Consider the following general composite problem:

Ψ* := min_{x∈X} { Ψ(x) := ψ(x) + 𝒳(x) }, (3.6)

where X ⊆ R^n is a closed convex set, 𝒳 : X → R is a simple convex function, and ψ : X → R is smooth and strongly convex with Lipschitz continuous gradient, i.e., there exist L ≥ 0 and µ ≥ 0 such that

(µ/2) ‖y − x‖² ≤ ψ(y) − ψ(x) − ⟨ψ'(x), y − x⟩ ≤ (L/2) ‖y − x‖², ∀ x, y ∈ X. (3.7)

The following AC-GD algorithm for solving (3.6) maintains the updating of three intertwined sequences, namely {x_t}, {x^ag_t} and {x^md_t}, at each iteration t. All these types of multi-step gradient algorithms originate from Nesterov's seminal work [14] (see Tseng [18] for a summary). However, very few of these algorithms can make use of the special strongly convex composite structure in (3.6), except those in [10, 11].

The AC-GD method for strongly convex composite optimization.

Input: x_0 ∈ X, step-size parameters {α_t}_{t≥1} and {γ_t}_{t≥1} such that α_1 = 1, α_t ∈ (0, 1) for any t ≥ 2, and γ_t ≥ 0 for any t ≥ 1.

0) Set the initial point x^ag_0 = x_0 and t = 1;

1) Set

x^md_t = [(1 − α_t)(µ + γ_t) / (γ_t + µ(1 − α_t²))] x^ag_{t−1} + [α_t (µ(1 − α_t) + γ_t) / (γ_t + µ(1 − α_t²))] x_{t−1}; (3.8)

2) Set

x^+_t = [α_t µ / (µ + γ_t)] x^md_t + [((1 − α_t)µ + γ_t) / (µ + γ_t)] x_{t−1}, (3.9)

x_t = argmin_{x∈X} { α_t [⟨ψ'(x^md_t), x⟩ + 𝒳(x)] + ((µ + γ_t)/2) ‖x^+_t − x‖² }, (3.10)

x^ag_t = α_t x_t + (1 − α_t) x^ag_{t−1}; (3.11)

3) Set t ← t + 1 and go to step 1.
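To make the iteration concrete, here is a compact NumPy sketch of steps (3.8)-(3.11) using the step-size policy of Corollary 3 below (as we reconstruct it). The names acgd, grad_psi and prox_X are ours; prox_X is assumed to return the exact minimizer of the composite subproblem, which for problem (2.4) is available in closed form (see Section 3.2).

import numpy as np

def acgd(grad_psi, prox_X, x0, mu, L, n_iters):
    # AC-GD for Psi(x) = psi(x) + X(x), following (3.8)-(3.11), with stepsizes (3.16).
    # prox_X(c, z, rho) must return argmin_x { <c, x> + X(x) + (rho/2)*||x - z||^2 };
    # this is subproblem (3.10) after dividing its objective by alpha_t,
    # which does not change the minimizer.
    x, x_ag, Gamma = x0.copy(), x0.copy(), 1.0
    for t in range(1, n_iters + 1):
        alpha = max(np.sqrt(mu / L), 2.0 / (t + 1))          # equals 1 at t = 1 since mu <= L
        Gamma = 1.0 if t == 1 else (1.0 - alpha) * Gamma     # (3.14)
        gamma = 2.0 * L * Gamma                              # (3.16)
        den = gamma + mu * (1.0 - alpha**2)
        x_md = ((1 - alpha) * (mu + gamma) / den) * x_ag \
             + (alpha * (mu * (1 - alpha) + gamma) / den) * x            # (3.8)
        x_plus = (alpha * mu / (mu + gamma)) * x_md \
               + (((1 - alpha) * mu + gamma) / (mu + gamma)) * x         # (3.9)
        x = prox_X(grad_psi(x_md), x_plus, (mu + gamma) / alpha)         # (3.10)
        x_ag = alpha * x + (1 - alpha) * x_ag                            # (3.11)
    return x_ag

For problem (2.4), grad_psi is the gradient of ψ in (3.18), an O(N²) sparse matrix-vector product, and prox_X is the closed-form update (3.22)-(3.23).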

The AC-GD algorithm differs from the related accelerated stochastic approximation (AC-SA) algorithm for solving strongly convex composite optimization problems [10, 11] in the following aspects. Firstly, the above algorithm is deterministic, while the one in [10] is stochastic. Secondly, the subproblem used to define x_t in (3.10) is much simpler than the corresponding one in [10]. Finally, we show that the above simple AC-GD algorithm can achieve the optimal rate of convergence for solving strongly convex composite problems possessed by a more involved multi-stage algorithm in [11].

Theorem 2 below describes the main convergence properties of the above AC-GD algorithm.

Theorem 2 Assume that {α_t}_{t≥1} and {γ_t}_{t≥1} in the AC-GD algorithm are chosen such that

µ + γ_t ≥ L α_t², (3.12)

γ_1/Γ_1 = γ_2/Γ_2 = ···, (3.13)

where

Γ_t := { 1, t = 1; (1 − α_t) Γ_{t−1}, t ≥ 2. } (3.14)

Then, we have for any t ≥ 1,

Ψ(x^ag_t) − Ψ* ≤ (Γ_t γ_1 / 2) ‖x_0 − x*‖², (3.15)

where x* is an optimal solution of (3.6).

By properly choosing the stepsize parameters α_t and γ_t, we show that the above AC-GD algorithm can achieve the optimal rate of convergence for solving problem (3.6).

Corollary 3 Let {x^ag_t}_{t≥1} be computed by the AC-GD algorithm with

α_t = max{ √(µ/L), 2/(t+1) }  and  γ_t = 2 L Γ_t, (3.16)

where Γ_t is defined in (3.14). Then we have

Ψ(x^ag_t) − Ψ* ≤ min{ (1 − √(µ/L))^{t−1}, 2/(t(t+1)) } L ‖x_0 − x*‖², ∀ t ≥ 1. (3.17)

Proof. The result follows by plugging the values of α_t and γ_t into (3.15) and noting that

Γ_t ≤ min{ (1 − √(µ/L))^{t−1}, Π_{τ=2}^t (1 − 2/(τ+1)) } = min{ (1 − √(µ/L))^{t−1}, 2/(t(t+1)) }.

3.2 The AC-GD algorithm for total variation minimization

In this subsection, we discuss how to apply the above AC-GD algorithm to the reformulated TV minimization problem (2.4). First, observe that the objective function φ̃(·) in (2.4) can be written in the composite form φ̃(u, d) = T̃(d) + ψ(u, d), where ψ(u, d) is given by

ψ(u, d) := (λ/2) ‖ A (u; d) − (f; 0) ‖²,  A := [ I_u  0 ; √q E  √q I_d ]. (3.18)

Here I_u ∈ R^{N²×N²} and I_d ∈ R^{2N(N−1)×2N(N−1)} are identity matrices. Proposition 4 below summarizes some properties of ψ(·).

Proposition 4 The function ψ(·) in (3.18) is strongly convex with modulus

µ_ψ ≥ λ / (10 + 1/q). (3.19)

Moreover, its gradient is Lipschitz continuous with constant

L_ψ ≤ λ (1 + 12q). (3.20)

In view of the composite structure of φ̃(·) and Proposition 4, we can apply the AC-GD algorithm to problem (2.4). Moreover, since A is very sparse, the computation of the gradient of ψ(·) takes only O(N²) arithmetic operations.

Second, it is worth noting that the subproblem (3.10) arising from the AC-GD method applied to problem (2.4) is easy to solve. Indeed, the subproblem (3.10) takes the form

(u, d)_t = argmin_{u,d} { ⟨c_1, d⟩ + T̃(d) + (p/2) ‖d − d^+_t‖² + ⟨c_2, u⟩ + (p/2) ‖u − u^+_t‖² } (3.21)

for some p > 0, c_1 ∈ R^{2N(N−1)} and c_2 ∈ R^{N²}. Suppose that the norm ‖·‖ in the definition of T̃(·) is the l2 norm. By examining the optimality condition of problem (3.21), we have the following explicit formulas (see Section A.4 for more details):

u_t = (1/p) (p u^+_t − c_2), (3.22)

and

d_{t,ij} = 0, if ‖p d^+_{t,ij} − c_{1,ij}‖ ≤ 1;
d_{t,ij} = [(‖p d^+_{t,ij} − c_{1,ij}‖ − 1) / (p ‖p d^+_{t,ij} − c_{1,ij}‖)] (p d^+_{t,ij} − c_{1,ij}), if ‖p d^+_{t,ij} − c_{1,ij}‖ > 1; (3.23)

where d_{t,ij} = ((d_t)^1_{ij}, (d_t)^2_{ij}), d^+_{t,ij} = ((d^+_t)^1_{ij}, (d^+_t)^2_{ij}) and c_{1,ij} = (c^1_{1,ij}, c^2_{1,ij}). Also note that one can write explicit solutions of (3.21) if ‖·‖ = ‖·‖_1. In both cases, solving the subproblem (3.10) requires only O(N²) arithmetic operations.
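The closed-form solution (3.22)-(3.23) amounts to a simple componentwise update for u and a group soft-thresholding of the pairs (d^1_{ij}, d^2_{ij}). A minimal NumPy sketch is given below; the function name and the layout of the pairs as rows of a 2 × m array are our own conventions, not the paper's.

import numpy as np

def solve_subproblem(c1, c2, d_plus, u_plus, p):
    # Closed-form solution of (3.21) for the isotropic (l2) norm.
    # c1, d_plus: arrays of shape (2, m) holding the pairs (d^1_{ij}, d^2_{ij});
    # c2, u_plus: arrays of shape (N, N).  Returns (u_t, d_t) as in (3.22)-(3.23).
    u_t = u_plus - c2 / p                       # (3.22): u_t = (1/p)(p*u_plus - c2)
    w = p * d_plus - c1                         # p*d^+_{t,ij} - c_{1,ij}, shape (2, m)
    norms = np.sqrt(np.sum(w**2, axis=0))       # ||p*d^+_{t,ij} - c_{1,ij}||
    scale = np.maximum(norms - 1.0, 0.0) / (p * np.maximum(norms, 1e-12))
    d_t = scale * w                             # (3.23): group soft-thresholding per pair
    return u_t, d_t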

We are now ready to state our main results.

Theorem 5 Let (u_0, d_0) be an initial point of the AC-GD algorithm applied to problem (2.4), and let D_0 := ‖(u_0, d_0) − (u*, d*)‖². Also assume that the parameter q ≥ 1 and that the stepsize parameters (α_t, γ_t), t ≥ 1, are set according to (3.16). Then, the iteration complexity of the AC-GD algorithm for finding an ɛ-solution of (2.4) can be bounded by

O( min{ √q ln(λ q D_0 / ɛ), √(λ q D_0 / ɛ) } ). (3.24)

Moreover, its arithmetic complexity can be bounded by

O( N² min{ √q ln(λ q D_0 / ɛ), √(λ q D_0 / ɛ) } ). (3.25)

Proof. The bound (3.24) follows immediately from Corollary 3, Proposition 4 and the observation that L_ψ/µ_ψ = O(q) when q ≥ 1. The bound (3.25) follows from (3.24) and the fact that the number of arithmetic operations in each iteration of the algorithm is bounded by O(N²).

Observe that the complexity of FISTA applied to problem (2.4) is O(√(λ q D_0 / ɛ)) (see [1, 2]), which is strictly worse than the bound in (3.24). In particular, if q is a given constant then, in view of Theorem 5, the complexity of the AC-GD algorithm depends only weakly on the accuracy ɛ, the parameter λ, and the distance D_0 (and thus on the dimension of the problem). Moreover, its total arithmetic complexity is polynomial, with a mild linear dependence on the problem dimension N².

4 Numerical Results and Biomedical Applications

In this section, we report our preliminary computational results, in which we compare our reformulation (RTVM) in (2.4) with the original TVM (OTVM) model in (2.2) for image denoising. We also compare the performance of two first-order algorithms for composite optimization, FISTA and AC-GD, applied to our reformulation. Furthermore, we discuss the application of the developed techniques to certain biomedical image denoising problems.

4.1 Numerical Study on General Image Denoising Problems

In this subsection, we conduct numerical experiments on a few classical image denoising problems. In our first experiment, we show that the reformulated TVM model is comparable to the original model in terms of the quality of the denoised images. Two image instance sets were used in this experiment. In the first instance set, we take the Lena test image, whose pixels were scaled between 0 and 1. The noisy image is obtained by adding white Gaussian noise with zero mean and various standard deviations (σ). In the second one, we use the Peppers test images with different sizes, and the noise is added in the same way as in the first set, with a fixed standard deviation. The original and noisy images for both instance sets are given in Figure 1.

We set the parameter λ to 6 for both formulations (2.2) and (2.4). We solve the original TVM model by using an efficient primal-dual algorithm [4], and we apply the AC-GD algorithm to the reformulated TVM model with different values of q. We then report the best (largest) value of the Peak Signal-to-Noise Ratio (PSNR) obtained after 100 iterations for both approaches. The results for these two instance sets are reported in Table 1 and Table 2, respectively. Moreover, Figure 2 and Figure 3, respectively, show the denoised Lena and Peppers images obtained from solving the original TVM model and the reformulated TVM model with q = 2 and q = 4.

Table 1: PSNR of the denoised Lena images obtained by the OTVM and RTVM formulations, for several noise levels σ; the RTVM results are reported for q = 0.5, 1, 2, 4, 8 and 16.
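The PSNR values in the tables are presumably the standard peak signal-to-noise ratio for images scaled to [0, 1]; a minimal sketch of this metric (our own helper, not taken from the paper):

import numpy as np

def psnr(x, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((x - ref)**2)
    return 10.0 * np.log10(peak**2 / mse)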

Figure 1: Original and noisy Lena and Peppers test images.

Figure 2: Denoised Lena images obtained by the original TVM model (λ = 6) and by the reformulated TVM model (λ = 6, with q = 2 and q = 4).

Table 2: PSNR of the denoised Peppers images obtained by the OTVM and RTVM formulations, for several image sizes N; the RTVM results are reported for q = 0.5, 1, 2, 4, 8 and 16.

Figure 3: Denoised Peppers images obtained by the original TVM model (λ = 6) and by the reformulated TVM model (λ = 6, with q = 2 and q = 4).

It can be seen from Tables 1 and 2 that the PSNR values obtained from the reformulated TVM model are not too sensitive to the choice of the parameter q under different selections of the noise level σ and the image size N. We can set q = 2 or q = 4 in practice to achieve reasonably good solution quality. We also observe that the quality of the denoised images obtained by the reformulated TVM model is comparable to that obtained by the original model. In fact, at first glance, the denoised Lena image obtained with the original TVM model seems cleaner than those obtained with the reformulated model. However, a closer examination reveals that some undesirable oversmoothing effects, e.g., the disappearance of the texture on the hat and a few extra lines at the nose of the Lena image in Figure 2, were introduced by the original TVM model. On the other hand, these oversmoothing effects were not apparent in the denoised images obtained with the reformulated model. Moreover, no significant differences could be observed between the denoised Peppers images obtained with the original and the reformulated TVM models.

In our second experiment, we demonstrate that AC-GD is faster than FISTA for solving the composite minimization problem (2.4). By our discussion in Section 3.2, the convergence rate of AC-GD always dominates that of FISTA for solving strongly convex composite optimization problems. Our goal here is to verify this claim numerically. Figure 4 shows the convergence behavior of AC-GD and FISTA applied to the Lena image. More specifically, we report the optimality gap φ̃(u_k, d_k) − φ̃* for both algorithms, where the optimal value φ̃* was estimated by running FISTA for 10,000 iterations. As shown in Figure 4, after only 50 iterations the AC-GD method already achieves a very small optimality gap. It can also be easily seen from Figure 4 that AC-GD converges linearly while FISTA converges sublinearly. This indeed reflects the difference between the theoretical convergence rates of the two algorithms applied to problem (2.4).

Figure 4: AC-GD vs. FISTA applied to denoising the Lena image: optimality gap φ̃(u_k, d_k) − φ̃* versus the iteration count k.

4.2 Applications in Biomedical Image Denoising

In this subsection, we apply the developed reformulation for TVM to magnetic resonance imaging (MRI), which provides detailed information about internal structures of the body. In comparison with other medical imaging techniques, MRI is most useful for brain and muscle imaging. Two image instance sets were used in this experiment. In the first instance set, we use the Brain MRI test image, and the noisy images are obtained by adding white Gaussian noise with zero mean and various standard deviations (σ). In the second one, we use the Knee MRI test images with different sizes and a fixed noise level. We applied the AC-GD algorithm with the same settings as in the previous experiment (λ = 6 and q = 0.5, 1, 2, 4, 8, 16) to solve the RTVM model for these two instance sets. As shown in Figures 5 and 6 and Tables 3 and 4, the results obtained for the MRI images are consistent with those in Section 4.1.

Table 3: PSNR of the denoised Brain MRIs obtained by the OTVM and RTVM formulations, for several noise levels σ; the RTVM results are reported for q = 0.5, 1, 2, 4, 8 and 16.

Figure 5: Original, noisy and denoised Knee MRI (λ = 6, with q = 2 and q = 4).

Figure 6: Original, noisy and denoised Brain MRI (λ = 6, with q = 2 and q = 4).

Table 4: PSNR of the denoised Knee MRIs obtained by the OTVM and RTVM formulations, for several image sizes N; the RTVM results are reported for q = 0.5, 1, 2, 4, 8 and 16.

5 Conclusions

In this paper, we introduced a strongly convex composite reformulation of the ROF model, a well-known approach in biomedical image processing. We showed that this reformulation is comparable with the original ROF model in terms of the quality of the denoised images. We presented a first-order algorithm that possesses a linear rate of convergence for solving the reformulated problem and is scalable to large-scale image denoising problems. We demonstrated through our numerical experiments that the developed algorithm, when applied to the reformulated model, compares favorably with existing first-order algorithms applied to the original model. In the future, we would like to generalize these reformulations to other image processing problems such as image deconvolution and image zooming.

References

[1] A. Beck and M. Teboulle. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11):2419–2434, 2009.

[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[3] A. Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20:89–97, 2004.

[4] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120–145, 2011.

[5] T.F. Chan, G.H. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing, 20(6):1964–1977, 1999.

[6] P.L. Combettes and J. Luo. An adaptive level set method for nondifferentiable constrained image recovery. IEEE Transactions on Image Processing, 2002.

[7] J. Dahl, P.C. Hansen, S.H. Jensen, and T.L. Jensen. Algorithms and software for total variation image reconstruction via first-order methods. Numerical Algorithms, 53:67–92, 2010.

[8] F. Dibos and G. Koepfler. Global total variation minimization. SIAM Journal on Numerical Analysis, 37(2):646–664, 2000.

[9] H.Y. Fu, M.K. Ng, M. Nikolova, and J.L. Barlow. Efficient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration. SIAM Journal on Scientific Computing, 27(6):1881–1902, 2006.

[10] S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, 2010. Submitted to SIAM Journal on Optimization.

[11] S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, 2010. Submitted to SIAM Journal on Optimization.

[12] D. Goldfarb and W. Yin. Second-order cone programming methods for total variation-based image restoration. SIAM Journal on Scientific Computing, 27:622–645, 2005.

[13] T. Goldstein and S. Osher. The split Bregman algorithm for l1 regularized problems. UCLA CAM Report 08-29, April 2008.

[14] Y.E. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR, 269:543–547, 1983.

[15] Y.E. Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16:235–249, 2005.

[16] Y.E. Nesterov. Smooth minimization of nonsmooth functions. Mathematical Programming, 103:127–152, 2005.

[17] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.

[18] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Manuscript, University of Washington, Seattle, May 2008.

[19] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3):248–272, 2008.

Appendix

We provide the proofs of Proposition 1, Theorem 2 and Proposition 4 in Sections A.1, A.2 and A.3, respectively. We also discuss how to solve the subproblems arising from the AC-GD algorithm applied to problem (2.4) in Section A.4.

A.1 Relation between the two formulations (Proposition 1)

In this subsection, we provide the proof of Proposition 1, which shows how the optimal values of problems (2.2) and (2.4) are related.

Proof of Proposition 1: First note that, by the definitions of T(·) and T̃(·), we have

T̃(d) = T(u), if d = −Eu. (5.26)

Let ū be an optimal solution of (2.2) and d̄ = −Eū. Using (2.2), (2.4) and (5.26), we have

φ̃* ≤ φ̃(ū, d̄) = T̃(−Eū) + (λ/2)[ ‖ū − f‖² + q ‖Eū − Eū‖² ] = T(ū) + (λ/2) ‖ū − f‖² = φ*.

We now show the second relation in (2.5). Let (u*, d*) be an optimal solution of (2.4); then

φ* ≤ φ(u*) = T(u*) + (λ/2) ‖u* − f‖².

Observe that, by the definition of T̃(·), we have

T̃(d + δ) = Σ_{i,j} ‖d_{ij} + δ_{ij}‖ ≤ Σ_{i,j} ‖d_{ij}‖ + Σ_{i,j} ‖δ_{ij}‖ ≤ T̃(d) + N ‖δ‖,

where the last inequality follows from the Cauchy–Schwarz inequality. It then follows from (5.26) and the above conclusion that

T(u*) = T̃(−Eu*) = T̃(d* − (d* + Eu*)) ≤ T̃(d*) + N ‖d* + Eu*‖.

Therefore,

φ* ≤ T̃(d*) + N ‖d* + Eu*‖ + (λ/2) ‖u* − f‖²
 = T̃(d*) + (λ/2)[ ‖u* − f‖² + q ‖d* + Eu*‖² ] + N ‖d* + Eu*‖ − (λq/2) ‖d* + Eu*‖²
 = φ̃(u*, d*) + N ‖d* + Eu*‖ − (λq/2) ‖d* + Eu*‖²
 ≤ φ̃* + N²/(2λq),

where the last relation follows from Young's inequality.
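For completeness, the last step can be spelled out as the following elementary quadratic-completion bound, with a := ‖d* + Eu*‖; the constant matches (2.5) as reconstructed above.

\[
  N a - \tfrac{\lambda q}{2}\, a^{2}
  \;=\; \tfrac{N^{2}}{2\lambda q} - \tfrac{\lambda q}{2}\Bigl(a - \tfrac{N}{\lambda q}\Bigr)^{2}
  \;\le\; \tfrac{N^{2}}{2\lambda q},
  \qquad a \ge 0 .
\]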

A.2 Convergence analysis for the AC-GD algorithm (Theorem 2)

Our main goal in this subsection is to prove the convergence results of the AC-GD algorithm described in Theorem 2. We first establish two technical results: Lemma 6 states some properties of the subproblem (3.10), and Lemma 7 establishes an important recursion of the AC-GD algorithm. Theorem 2 then follows directly from Lemma 7.

The first technical result below characterizes the solution of the projection step (3.10).

Lemma 6 Let X be a convex set and p : X → R be a convex function. Assume that û is an optimal solution of min{ p(u) + (µ̃/2) ‖x̃ − u‖² : u ∈ X }, where x̃ ∈ X and µ̃ > 0 are given. Then, for any u ∈ X,

p(û) + (µ̃/2) ‖x̃ − û‖² + (µ̃/2) ‖û − u‖² ≤ p(u) + (µ̃/2) ‖x̃ − u‖². (5.27)

Proof. Denote g(u) := p(u) + (µ̃/2) ‖x̃ − u‖². The result immediately follows from the strong convexity of g(u) and the optimality condition ⟨g'(û), u − û⟩ ≥ 0 for any u ∈ X.

The following lemma establishes an important recursion for the AC-GD algorithm.

Lemma 7 Let (x_{t−1}, x^ag_{t−1}) ∈ X × X be given. Also let (x^md_t, x_t, x^ag_t) ∈ X × X × X be computed according to (3.8)–(3.11), and suppose that (3.12) holds for the given γ_t and α_t. Then, for any x ∈ X, we have

Ψ(x^ag_t) + (µ/2) ‖x_t − x‖² ≤ (1 − α_t)[ Ψ(x^ag_{t−1}) + (µ/2) ‖x_{t−1} − x‖² ] + α_t Ψ(x) + (γ_t/2)[ ‖x_{t−1} − x‖² − ‖x_t − x‖² ]. (5.28)

Proof. We first establish some basic relations among the search points x^ag_{t−1}, x^md_t, x_t and x^+_t. Denote d_t := x^ag_t − x^md_t. It follows from (3.8), (3.11) and (3.9) that

d_t = α_t x_t + (1 − α_t) x^ag_{t−1} − x^md_t
 = α_t [ x_t − (α_t µ / (µ + γ_t)) x^md_t − (((1 − α_t)µ + γ_t) / (µ + γ_t)) x_{t−1} ]
 = α_t (x_t − x^+_t). (5.29)

Using the above result and the convexity of ψ, we have

ψ(x^md_t) + ⟨ψ'(x^md_t), d_t⟩
 = ψ(x^md_t) + ⟨ψ'(x^md_t), α_t x_t + (1 − α_t) x^ag_{t−1} − x^md_t⟩
 = (1 − α_t)[ ψ(x^md_t) + ⟨ψ'(x^md_t), x^ag_{t−1} − x^md_t⟩ ] + α_t [ ψ(x^md_t) + ⟨ψ'(x^md_t), x_t − x^md_t⟩ ]
 ≤ (1 − α_t) ψ(x^ag_{t−1}) + α_t [ ψ(x^md_t) + ⟨ψ'(x^md_t), x_t − x^md_t⟩ ]. (5.30)

It then follows from the previous two observations, (3.7), (3.11) and the convexity of 𝒳(x) that

Ψ(x^ag_t) = ψ(x^ag_t) + 𝒳(x^ag_t)
 ≤ ψ(x^md_t) + ⟨ψ'(x^md_t), d_t⟩ + (L/2) ‖d_t‖² + (1 − α_t) 𝒳(x^ag_{t−1}) + α_t 𝒳(x_t)
 ≤ (1 − α_t) Ψ(x^ag_{t−1}) + α_t [ ψ(x^md_t) + ⟨ψ'(x^md_t), x_t − x^md_t⟩ + 𝒳(x_t) ] + (L/2) ‖d_t‖²
 = (1 − α_t) Ψ(x^ag_{t−1}) + α_t [ ψ(x^md_t) + ⟨ψ'(x^md_t), x_t − x^md_t⟩ + 𝒳(x_t) ]
   + [(µ + γ_t)/(2α_t²)] ‖d_t‖² − [(µ + γ_t − Lα_t²)/(2α_t²)] ‖d_t‖². (5.31)

Now let us apply the result regarding the projection step (3.10). Specifically, by using Lemma 6 with p(·) = α_t [⟨ψ'(x^md_t), ·⟩ + 𝒳(·)], û = x_t, and x̃ = x^+_t, we have, for any x ∈ X,

[(µ + γ_t)/2] ‖x_t − x‖² + α_t [ ψ(x^md_t) + ⟨ψ'(x^md_t), x_t − x^md_t⟩ + 𝒳(x_t) ] + [(µ + γ_t)/2] ‖x_t − x^+_t‖²
 ≤ α_t [ ψ(x^md_t) + ⟨ψ'(x^md_t), x − x^md_t⟩ + 𝒳(x) ] + [(µ + γ_t)/2] ‖x − x^+_t‖²
 ≤ α_t [ ψ(x^md_t) + ⟨ψ'(x^md_t), x − x^md_t⟩ + 𝒳(x) ] + (α_t µ/2) ‖x − x^md_t‖² + [((1 − α_t)µ + γ_t)/2] ‖x − x_{t−1}‖²
 ≤ α_t Ψ(x) + [((1 − α_t)µ + γ_t)/2] ‖x − x_{t−1}‖², (5.32)

where the second inequality follows from (3.9) and the convexity of ‖·‖², and the last inequality follows from the strong convexity of ψ(·). Combining (5.31) and (5.32), we obtain

Ψ(x^ag_t) ≤ (1 − α_t)[ Ψ(x^ag_{t−1}) + (µ/2) ‖x − x_{t−1}‖² ] + α_t Ψ(x) + (γ_t/2)[ ‖x − x_{t−1}‖² − ‖x − x_t‖² ]
   − (µ/2) ‖x − x_t‖² − [(µ + γ_t − Lα_t²)/(2α_t²)] ‖d_t‖²,

which clearly implies (5.28) in view of the assumption (3.12).

We are now ready to prove Theorem 2.

Proof of Theorem 2: Dividing both sides of (5.28) by Γ_t, and using (3.14) and the fact that α_1 = 1, we have

[ Ψ(x^ag_t) + (µ/2) ‖x − x_t‖² ] / Γ_t ≤ [ Ψ(x^ag_{t−1}) + (µ/2) ‖x − x_{t−1}‖² ] / Γ_{t−1} + (α_t/Γ_t) Ψ(x) + [γ_t/(2Γ_t)] [ ‖x − x_{t−1}‖² − ‖x − x_t‖² ], t ≥ 2,

and

[ Ψ(x^ag_1) + (µ/2) ‖x − x_1‖² ] / Γ_1 ≤ (α_1/Γ_1) Ψ(x) + [γ_1/(2Γ_1)] [ ‖x − x_0‖² − ‖x − x_1‖² ].

Summing up the above inequalities, we obtain

[ Ψ(x^ag_t) + (µ/2) ‖x − x_t‖² ] / Γ_t ≤ Σ_{τ=1}^t (α_τ/Γ_τ) Ψ(x) + Σ_{τ=1}^t [γ_τ/(2Γ_τ)] [ ‖x − x_{τ−1}‖² − ‖x − x_τ‖² ]. (5.33)

Note that by (3.14) and the fact that α_1 = 1, we have

Σ_{τ=1}^t α_τ/Γ_τ = 1/Γ_1 + Σ_{τ=2}^t (1/Γ_τ)(1 − Γ_τ/Γ_{τ−1}) = 1/Γ_1 + Σ_{τ=2}^t (1/Γ_τ − 1/Γ_{τ−1}) = 1/Γ_t. (5.34)

Using the above two relations, condition (3.13) and the fact that Γ_1 = 1, we have

[ Ψ(x^ag_t) + (µ/2) ‖x − x_t‖² ] / Γ_t ≤ Ψ(x)/Γ_t + [γ_1/(2Γ_1)] [ ‖x − x_0‖² − ‖x − x_t‖² ] ≤ Ψ(x)/Γ_t + (γ_1/2) ‖x − x_0‖²,

which clearly implies (3.15).

A.3 Properties of the composite function (Proposition 4)

In this subsection, we provide the proof of Proposition 4, which gives estimates of the two crucial parameters µ_ψ and L_ψ for the smooth component ψ(·) of the composite function φ̃(·).

Proof of Proposition 4: Denote the maximum and minimum eigenvalues of M := AᵀA by λ_max and λ_min, respectively. Then, it suffices to show that

λ_max ≤ 1 + 12q (5.35)

and

λ_min ≥ 1/(10 + 1/q). (5.36)

We bound the eigenvalues of M by using Gershgorin's theorem. Observe that the network flow matrix E in (2.3) can be written explicitly in block form in terms of the following matrices: e ∈ R^{N−1} is the unit vector (1, 0, ..., 0)ᵀ; K ∈ R^{(N−1)×(N−1)} is the two-diagonal lower triangular matrix with main diagonal entries equal to 1 and sub-diagonal entries equal to −1; P_i, 1 ≤ i ≤ N, is the matrix whose ith column equals e and whose other entries equal 0; and L_{i,j} ∈ R^{(N−1)×(N−1)}, 1 ≤ i, j ≤ N, is the matrix whose ith column equals the jth column of K and whose other entries equal 0.

First, we derive the upper bound on the maximum eigenvalue of M. It is easy to see that a row of M corresponding to an interior pixel has the largest sum of absolute values of its entries, and for such a row j

Σ_{i=1}^{N²+2N(N−1)} |M_{j,i}| = 1 + 12q,

which, in view of Gershgorin's theorem, clearly implies (5.35).

Second, we derive the lower bound on the minimum eigenvalue of M. Since λ is an eigenvalue of M if and only if 1/λ is an eigenvalue of M⁻¹, we bound the maximum eigenvalue of M⁻¹ instead of the minimum eigenvalue of M.

Note that M⁻¹ = A⁻¹(A⁻¹)ᵀ and that, by applying Gauss–Jordan elimination, we easily obtain

A⁻¹ = [ I_u  0 ; −E  (1/√q) I_d ].

It is easy to see that a row of M⁻¹ corresponding to an interior arc has the largest sum of absolute values of its entries. In particular, for such a row j we have

Σ_{i=1}^{N²+2N(N−1)} |M⁻¹_{j,i}| = 10 + 1/q,

which, in view of Gershgorin's theorem, clearly implies that λ_max(M⁻¹) ≤ 10 + 1/q. Using the above observation and the fact that λ_min(M) = 1/λ_max(M⁻¹), we obtain (5.36).
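These two Gershgorin bounds are easy to check numerically on small instances. The sketch below is our own code, with the constants as reconstructed in (5.35)-(5.36) and our own sign and vectorization convention for E; it simply prints the extreme eigenvalues of AᵀA next to the claimed bounds.

import numpy as np
import scipy.sparse as sp

def check_eigenvalue_bounds(N=8, q=2.0):
    # Build E as in (2.3) for a small N x N grid, then A as in (3.18) with lambda = 1.
    I = sp.identity(N, format="csr")
    D = sp.diags([-np.ones(N - 1), np.ones(N - 1)], [0, 1], shape=(N - 1, N))
    E = -sp.vstack([sp.kron(I, D), sp.kron(D, I)])
    m = E.shape[0]                                   # 2N(N-1) arcs
    A = sp.bmat([[sp.identity(N * N), None],
                 [np.sqrt(q) * E, np.sqrt(q) * sp.identity(m)]])
    M = (A.T @ A).toarray()
    eigs = np.linalg.eigvalsh(M)
    print("lambda_max =", eigs.max(), "  claimed bound 1 + 12q =", 1 + 12 * q)
    print("lambda_min =", eigs.min(), "  claimed bound 1/(10 + 1/q) =", 1 / (10 + 1 / q))

check_eigenvalue_bounds()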

A.4 Solving the subproblem (3.21) in the AC-GD algorithm

In this subsection, we derive the explicit solution of the subproblem (3.21) in step 2 of the AC-GD algorithm. It is easy to see that (3.22) holds. By the definition of T̃(d), d_{t,ij} is the solution of

min_{d_{ij}} ⟨c_{1,ij}, d_{ij}⟩ + ‖d_{ij}‖ + (p/2) ‖d_{ij} − d^+_{t,ij}‖²,

where d_{ij} = (d^1_{ij}, d^2_{ij}), d^+_{t,ij} = ((d^+_t)^1_{ij}, (d^+_t)^2_{ij}) and c_{1,ij} = (c^1_{1,ij}, c^2_{1,ij}). For notational convenience, let us consider the following problem:

min_y ⟨r, y⟩ + ‖y‖ + (p/2) ‖y − q‖², (5.37)

where y, q, r ∈ R², p ∈ R and p > 0. We consider two cases.

Case 1: ‖pq − r‖ ≤ 1. We have

⟨r, y⟩ + ‖y‖ + (p/2) ‖y − q‖²
 = ‖y‖ + (p/2) ‖y‖² + (p/2) ‖q‖² − ⟨pq, y⟩ + ⟨r, y⟩
 = ‖y‖ + (p/2) ‖y‖² + (p/2) ‖q‖² − ⟨pq − r, y⟩
 ≥ ‖y‖ + (p/2) ‖y‖² + (p/2) ‖q‖² − ‖pq − r‖ ‖y‖
 ≥ (p/2) ‖y‖² + (p/2) ‖q‖²,

which implies that y = 0 is the solution of (5.37) when ‖pq − r‖ ≤ 1.

Case 2: ‖pq − r‖ > 1. By the optimality condition of (5.37), we have

y/‖y‖ + p y − p q + r = 0,

which means

y^1/‖y‖ + p y^1 − p q^1 + r^1 = 0,  y^2/‖y‖ + p y^2 − p q^2 + r^2 = 0.

Denoting t = ‖y‖, we then have

y^1 (1/t + p) = p q^1 − r^1,  y^2 (1/t + p) = p q^2 − r^2,

or equivalently,

y^1 = [t/(1 + tp)] (p q^1 − r^1),  y^2 = [t/(1 + tp)] (p q^2 − r^2).

Combining the above relation with t = ‖y‖, we have

[t/(1 + tp)] ‖pq − r‖ = t,  or equivalently,  t = (1/p)(‖pq − r‖ − 1),

which immediately implies that the optimal solution of (5.37) is given by

y = [(‖pq − r‖ − 1)/(p ‖pq − r‖)] (pq − r).

Replacing y, q and r by d_{ij}, d^+_{t,ij} and c_{1,ij}, respectively, we obtain (3.23). It is worth noting that this formula still holds in the case y, q, r ∈ R.
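As a quick sanity check of the closed-form solution of (5.37), the short sketch below (our own code; the function shrink and the random test instance are not from the paper) compares it against a general-purpose numerical minimizer.

import numpy as np
from scipy.optimize import minimize

def shrink(r, q, p):
    """Closed-form minimizer of <r,y> + ||y|| + (p/2)||y - q||^2 derived above."""
    w = p * q - r
    nw = np.linalg.norm(w)
    return np.zeros_like(q) if nw <= 1 else (nw - 1) / (p * nw) * w

# Compare against a general-purpose solver on a random instance.
rng = np.random.default_rng(0)
r, q, p = rng.normal(size=2), rng.normal(size=2), 1.5
obj = lambda y: r @ y + np.linalg.norm(y) + 0.5 * p * np.sum((y - q)**2)
y_num = minimize(obj, x0=np.zeros(2), method="Nelder-Mead",
                 options={"xatol": 1e-10, "fatol": 1e-12}).x
print(shrink(r, q, p), y_num)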


NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Adaptive Primal-Dual Hybrid Gradient Methods for Saddle-Point Problems

Adaptive Primal-Dual Hybrid Gradient Methods for Saddle-Point Problems Adaptive Primal-Dual Hybrid Gradient Methods for Saddle-Point Problems Tom Goldstein, Min Li, Xiaoming Yuan, Ernie Esser, Richard Baraniuk arxiv:3050546v2 [mathna] 24 Mar 205 Abstract The Primal-Dual hybrid

More information

arxiv: v1 [math.oc] 3 Jul 2014

arxiv: v1 [math.oc] 3 Jul 2014 SIAM J. IMAGING SCIENCES Vol. xx, pp. x c xxxx Society for Industrial and Applied Mathematics x x Solving QVIs for Image Restoration with Adaptive Constraint Sets F. Lenzen, J. Lellmann, F. Becker, and

More information

A primal-dual fixed point algorithm for multi-block convex minimization *

A primal-dual fixed point algorithm for multi-block convex minimization * Journal of Computational Mathematics Vol.xx, No.x, 201x, 1 16. http://www.global-sci.org/jcm doi:?? A primal-dual fixed point algorithm for multi-block convex minimization * Peijun Chen School of Mathematical

More information

arxiv: v4 [math.oc] 29 Jan 2018

arxiv: v4 [math.oc] 29 Jan 2018 Noname manuscript No. (will be inserted by the editor A new primal-dual algorithm for minimizing the sum of three functions with a linear operator Ming Yan arxiv:1611.09805v4 [math.oc] 29 Jan 2018 Received:

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming Math. Program., Ser. A (2011) 126:1 29 DOI 10.1007/s10107-008-0261-6 FULL LENGTH PAPER Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato

More information

A Variational Approach to Reconstructing Images Corrupted by Poisson Noise

A Variational Approach to Reconstructing Images Corrupted by Poisson Noise J Math Imaging Vis c 27 Springer Science + Business Media, LLC. Manufactured in The Netherlands. DOI: 1.7/s1851-7-652-y A Variational Approach to Reconstructing Images Corrupted by Poisson Noise TRIET

More information

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems Kim-Chuan Toh Sangwoon Yun March 27, 2009; Revised, Nov 11, 2009 Abstract The affine rank minimization

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Image Cartoon-Texture Decomposition and Feature Selection using the Total Variation Regularized L 1 Functional

Image Cartoon-Texture Decomposition and Feature Selection using the Total Variation Regularized L 1 Functional Image Cartoon-Texture Decomposition and Feature Selection using the Total Variation Regularized L 1 Functional Wotao Yin 1, Donald Goldfarb 1, and Stanley Osher 2 1 Department of Industrial Engineering

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

UPRE Method for Total Variation Parameter Selection

UPRE Method for Total Variation Parameter Selection UPRE Method for Total Variation Parameter Selection Youzuo Lin School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ 85287 USA. Brendt Wohlberg 1, T-5, Los Alamos National

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Instructor: Wotao Yin July 2013 Note scriber: Zheng Sun Those who complete this lecture will know what is a dual certificate for l 1 minimization

More information

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

Worst Case Complexity of Direct Search

Worst Case Complexity of Direct Search Worst Case Complexity of Direct Search L. N. Vicente October 25, 2012 Abstract In this paper we prove that the broad class of direct-search methods of directional type based on imposing sufficient decrease

More information

Iteration-complexity of first-order augmented Lagrangian methods for convex programming

Iteration-complexity of first-order augmented Lagrangian methods for convex programming Math. Program., Ser. A 016 155:511 547 DOI 10.1007/s10107-015-0861-x FULL LENGTH PAPER Iteration-complexity of first-order augmented Lagrangian methods for convex programming Guanghui Lan Renato D. C.

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

Convex Hodge Decomposition of Image Flows

Convex Hodge Decomposition of Image Flows Convex Hodge Decomposition of Image Flows Jing Yuan 1, Gabriele Steidl 2, Christoph Schnörr 1 1 Image and Pattern Analysis Group, Heidelberg Collaboratory for Image Processing, University of Heidelberg,

More information

Uniqueness Conditions for A Class of l 0 -Minimization Problems

Uniqueness Conditions for A Class of l 0 -Minimization Problems Uniqueness Conditions for A Class of l 0 -Minimization Problems Chunlei Xu and Yun-Bin Zhao October, 03, Revised January 04 Abstract. We consider a class of l 0 -minimization problems, which is to search

More information