A Linearly Convergent First-order Algorithm for Total Variation Minimization in Image Processing
A Linearly Convergent First-order Algorithm for Total Variation Minimization in Image Processing

Cong D. Dang, Kaiyu Dai, Guanghui Lan

Cong D. Dang: Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611 (congdd@ufl.edu). This author was partially supported by a doctoral fellowship from the Vietnam International Education Development program and NSF grant CMMI.
Kaiyu Dai: Software School, Fudan University, Shanghai, China (kydai@fudan.edu.cn). This author was partially supported by a visiting scholar fellowship from the China Scholarship Council.
Guanghui Lan: Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611 (glan@ise.ufl.edu). This author was partially supported by NSF grant CMMI.

October 9

Abstract

We introduce a new formulation for total variation minimization in image denoising. We also present a linearly convergent first-order method for solving this reformulated problem and show that it possesses a nearly dimension-independent iteration complexity bound.

Keywords: Image denoising, Total variation, First-order methods, Complexity, Linear rate of convergence

1 Introduction

The restoration of images contaminated by noise is a fundamental problem in biomedical image processing and plays an important role in certain diagnostic techniques such as Magnetic Resonance Imaging (MRI) and functional Magnetic Resonance Imaging (fMRI). In 1992, Rudin, Osher and Fatemi (ROF) [17] proposed an influential optimization approach for image denoising by minimizing the total variation (TV). It turns out that the ROF model can preserve edges and important features in the original image.

In this paper we propose an alternative formulation (or relaxation) for minimizing the total variation, which leads to denoising quality comparable to the classical ROF model. Moreover, we show that the relaxed model can be solved very efficiently. In particular, we present a linearly convergent first-order algorithm for solving this new model, and demonstrate that it possesses an O(ln(1/ε)) iteration complexity for achieving a target accuracy ε. Since this iteration complexity bound is almost dimension-independent and the iteration cost depends only linearly on the dimension, the total arithmetic complexity of our algorithm is O(N² ln(1/ε)) for processing an N × N image. Hence, our approach is scalable to very large-scale image denoising problems. By contrast, most existing approaches for solving the original ROF model are based on an equivalent dual or primal-dual formulation (see, e.g., Chan et al. [5], Chambolle [3], Beck and Teboulle [1, 2],
Chambolle and Pock [4]). All these algorithms converge sublinearly for solving the ROF model, and the best iteration complexity bound is given by O(1/√ε) [2, 4]. Moreover, these complexity results heavily depend on the dimension of the problem, the selection of the initial point and certain regularization parameters. We also refer to other algorithms recently developed for solving the ROF model (e.g., [6, 8, 7, 9, 12, 13, 19]) and the references therein.

This paper is organized as follows. We review the classical ROF model and present a new strongly convex composite reformulation (or relaxation) of it in Section 2. An efficient algorithm for solving the reformulation is presented and analyzed in Section 3. We then report some promising numerical results and biomedical applications in Section 4. All the technical proofs are given in the Appendix.

2 A strongly convex composite reformulation for total variation minimization

In this section, we review the classical ROF model for image denoising and present a novel reformulation of it. We also show how these two TV-based models for image denoising are related.

For the sake of simplicity, let us assume that the images are two-dimensional with N × N pixels. For any image u ∈ R^{N×N}, the discretized gradient operator ∇u is defined as

  (∇u)_{i,j} := ((∇₁u)_{i,j}, (∇₂u)_{i,j}),  i, j = 1, …, N,   (2.1)

where

  (∇₁u)_{i,j} := { u_{i+1,j} − u_{i,j}, i < N;  0, i = N }   and   (∇₂u)_{i,j} := { u_{i,j+1} − u_{i,j}, j < N;  0, j = N }.

Then, the classical total variation minimization problem is given by

  ū = arg min_u { φ(u) := T(u) + (λ/2) ‖u − f‖² },   (2.2)

where λ > 0 is a user-defined parameter, f is the observed noisy image and

  T(u) = Σ_{i,j} ‖ ((∇₁u)_{i,j}, (∇₂u)_{i,j}) ‖.

Observe that the norm ‖·‖ in the definition of T(·) (and hereafter) can be either the l₁ or the l₂ norm. If ‖·‖ = ‖·‖₂, then problem (2.2) is exactly the original ROF model.

It can easily be seen that the objective function φ(u) in (2.2) is a nonsmooth strongly convex function. It is known that oracle-based convex optimization techniques would require O(1/ε) iterations to find an ε-solution of (2.2), i.e., a point û such that φ(û) − φ* ≤ ε (see [10]). It has recently been shown that this iteration complexity can be significantly improved to O(1/√ε) by using a dual or saddle point reformulation of (2.2) (e.g., [1, 2, 4, 15]). Note, however, that all these algorithms converge sublinearly, and that their performance also heavily depends on the dimension N and the selection of the starting point.

In order to address these issues, we consider an alternative formulation of problem (2.2). The basic idea is to introduce an extra variable d ∈ R^{2N(N−1)}, which corresponds to the nonzero components
of the gradient operator ∇u, and then to impose the following set of constraints:

  d¹_{i,j} = u_{i+1,j} − u_{i,j},  i = 1, …, N−1;  j = 1, …, N,
  d²_{i,j} = u_{i,j+1} − u_{i,j},  i = 1, …, N;  j = 1, …, N−1.

Observe that the above constraints can be written in matrix form as

  Eu + d = 0,   (2.3)

where Eᵀ is a network flow matrix with N² nodes and 2N(N−1) arcs, each node having degree at most 4 (an explicit expression for E in terms of simple blocks is given in the Appendix).

Now, denoting T̃(d) := Σ_{i,j} ‖(d¹_{i,j}, d²_{i,j})‖, we consider the following optimization problem:

  (u*, d*) = arg min_{u,d} { φ̃(u, d) := T̃(d) + (λ/2) [ ‖u − f‖² + q ‖Eu + d‖² ] }   (2.4)

for some parameters λ, q > 0. Similar to φ(u) in (2.2), the new objective function φ̃(u, d) is also a nonsmooth strongly convex function. While the non-separable and nonsmooth term T(·) makes problem (2.2) difficult to solve, the nonsmooth term T̃(·) in (2.4) is separable with respect to the pairs (d¹_{i,j}, d²_{i,j}). This fact will enable us to design a very efficient algorithm for solving problem (2.4) (see Section 3.2).

We would also like to provide some intuitive explanation of the reformulation given in (2.4). Observe that both terms, T̃(d) and ‖Eu + d‖², can be viewed as certain regularization terms. While the first term T̃(d) enforces the sparsity of the vector d, i.e., the estimated gradient vector, and thus helps to smooth the recovered image, the latter term ‖Eu + d‖² takes into account that the computation of d is not exact because of the stochastic noise. Introducing this extra regularization term into the optimization problem protects the image from being oversmoothed, as might happen for the original formulation (2.2) (see Section 4).

It is interesting to observe some relations between problems (2.2) and (2.4).

Proposition 1. Let φ* and φ̃* be the optimal values of (2.2) and (2.4), respectively. We have

  φ̃* ≤ φ* ≤ φ̃* + N²/(2λq).   (2.5)

It follows from Proposition 1 that, for given λ and N, the parameter q in (2.4) should be large enough in order to approximately solve the original problem (2.2). Observe, however, that our goal is not to solve problem (2.2), but to recover the contaminated image. Due to the aforementioned role that the extra regularization term ‖Eu + d‖² plays in (2.4), we argue that it is not necessary to choose a very large q. Indeed, we observe from our computational experiments that q can be set to 2 or 4 in most cases, and that selecting a much larger value of q seems to be actually harmful to image denoising.
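To make the reformulation concrete, note that the constraint Eu + d = 0 simply states that d collects the forward differences of u (so E acts as the negative of the discrete gradient map). The following NumPy sketch, with function and variable names of our own choosing rather than anything taken from the paper, evaluates the two difference fields and the reformulated objective φ̃(u, d) in O(N²) operations.

```python
import numpy as np

def forward_diff(u):
    """Forward differences of an N-by-N image u.
    d1[i, j] = u[i+1, j] - u[i, j]  (shape (N-1, N)),
    d2[i, j] = u[i, j+1] - u[i, j]  (shape (N, N-1))."""
    d1 = u[1:, :] - u[:-1, :]
    d2 = u[:, 1:] - u[:, :-1]
    return d1, d2

def rtvm_objective(u, d1, d2, f, lam, q, iso=True):
    """Evaluate phi_tilde(u, d) = T_tilde(d) + (lam/2)*(||u - f||^2 + q*||Eu + d||^2),
    where the constraint Eu + d = 0 encodes d = forward differences of u."""
    g1, g2 = forward_diff(u)
    r1, r2 = d1 - g1, d2 - g2          # residual of the linear constraint Eu + d
    if iso:
        # l2 norm of each pair (d1_ij, d2_ij); pad both fields to a common N-by-N grid
        D1 = np.zeros(f.shape); D1[:-1, :] = d1
        D2 = np.zeros(f.shape); D2[:, :-1] = d2
        tv = np.sum(np.sqrt(D1**2 + D2**2))
    else:
        # l1 (anisotropic) version of T_tilde
        tv = np.abs(d1).sum() + np.abs(d2).sum()
    fit = np.sum((u - f)**2) + q * (np.sum(r1**2) + np.sum(r2**2))
    return tv + 0.5 * lam * fit
```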
3 A linearly convergent algorithm for TV denoising

In the previous section, we reformulated the TV minimization problem as minimizing the sum of a relatively simple nonsmooth convex function and a smooth strongly convex function. Our goal in this section is to show that such a reformulation can be solved efficiently. More specifically, we first present an accelerated gradient descent (AC-GD) method, based on Nesterov's smooth optimal method [14, 16], for solving a general class of strongly convex composite optimization problems. Then we show that, by using this algorithm, one can solve problem (2.4) in O(√q ln(1/ε)) iterations. Our algorithm can be viewed as a variant of the well-known FISTA algorithm of Beck and Teboulle [1, 2]. However, since FISTA does not take advantage of the strong convexity of the problem, it possesses a much worse performance guarantee than the one mentioned above.

3.1 The accelerated gradient descent (AC-GD) algorithm

Consider the following general composite problem:

  Ψ* := min_{x ∈ X} { Ψ(x) := ψ(x) + X(x) },   (3.6)

where X ⊆ Rⁿ is a closed convex set, X(·) : X → R is a simple convex function, and ψ : X → R is smooth and strongly convex with Lipschitz continuous gradient, i.e., there exist L ≥ μ > 0 such that

  (μ/2) ‖y − x‖² ≤ ψ(y) − ψ(x) − ⟨ψ′(x), y − x⟩ ≤ (L/2) ‖y − x‖²,  ∀ x, y ∈ X.   (3.7)

The following AC-GD algorithm for solving (3.6) maintains and updates three intertwined sequences, namely {x_t}, {x_t^ag} and {x_t^md}, at each iteration t. All these types of multi-step gradient algorithms originate from Nesterov's seminal work [14] (see Tseng [18] for a summary). However, very few of these algorithms can make use of the special strongly convex composite structure in (3.6), except those in [10, 11].

The AC-GD method for strongly convex composite optimization.

Input: x₀ ∈ X, stepsize parameters {α_t}_{t≥1} and {γ_t}_{t≥1} such that α₁ = 1, α_t ∈ (0, 1) for any t ≥ 2, and γ_t ≥ 0 for any t ≥ 1.

0) Set the initial point x₀^ag = x₀ and t = 1.

1) Set

  x_t^md = [ (1 − α_t)(μ + γ_t) / (γ_t + (1 − α_t²) μ) ] x_{t−1}^ag + [ α_t ((1 − α_t) μ + γ_t) / (γ_t + (1 − α_t²) μ) ] x_{t−1}.   (3.8)

2) Set

  x_t^+ = [ α_t μ / (μ + γ_t) ] x_t^md + [ ((1 − α_t) μ + γ_t) / (μ + γ_t) ] x_{t−1},   (3.9)

  x_t = arg min_{x ∈ X} { α_t ⟨ψ′(x_t^md), x⟩ + α_t X(x) + ((μ + γ_t)/2) ‖x_t^+ − x‖² },   (3.10)

  x_t^ag = α_t x_t + (1 − α_t) x_{t−1}^ag.   (3.11)

3) Set t ← t + 1 and go to step 1.
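For the unconstrained Euclidean case X = Rⁿ the scheme above can be written compactly as follows. This is only an illustrative sketch under our own naming (acgd, grad_psi, prox_X are not the authors' identifiers); it folds the subproblem (3.10) into a single proximal step with stepsize α_t/(μ + γ_t), and it uses the stepsize policy of Corollary 3 in the next subsection as reconstructed here.

```python
import numpy as np

def acgd(x0, grad_psi, prox_X, mu, L, n_iter):
    """AC-GD for min_x psi(x) + X(x), with psi mu-strongly convex and L-smooth.
    grad_psi(x): gradient of psi at x.
    prox_X(z, tau): argmin_x X(x) + ||x - z||^2 / (2*tau)  (prox of the simple term)."""
    x = x0.copy()
    x_ag = x0.copy()
    Gamma = 1.0
    for t in range(1, n_iter + 1):
        # stepsize policy (our reading of Corollary 3): alpha_t = max(sqrt(mu/L), 2/(t+1))
        alpha = max(np.sqrt(mu / L), 2.0 / (t + 1))
        Gamma = 1.0 if t == 1 else (1.0 - alpha) * Gamma
        gamma = 2.0 * L * Gamma
        # (3.8): middle point
        den = gamma + (1.0 - alpha**2) * mu
        x_md = ((1 - alpha) * (mu + gamma) * x_ag
                + alpha * ((1 - alpha) * mu + gamma) * x) / den
        # (3.9): prox center
        x_plus = (alpha * mu * x_md + ((1 - alpha) * mu + gamma) * x) / (mu + gamma)
        # (3.10): proximal step, written as a single prox call
        tau = alpha / (mu + gamma)
        x = prox_X(x_plus - tau * grad_psi(x_md), tau)
        # (3.11): aggregate point
        x_ag = alpha * x + (1 - alpha) * x_ag
    return x_ag
```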
The AC-GD algorithm differs from the related accelerated stochastic approximation (AC-SA) algorithm for solving strongly convex composite optimization problems [10, 11] in the following aspects. Firstly, the above algorithm is deterministic, while the one in [10] is stochastic. Secondly, the subproblem used to define x_t in (3.10) is much simpler than the corresponding one in [10]. Finally, we show that the above simple AC-GD algorithm can achieve the optimal rate of convergence for solving strongly convex composite problems, a rate otherwise attained by a more involved multi-stage algorithm in [11].

Theorem 2 below describes the main convergence properties of the AC-GD algorithm.

Theorem 2. Assume that {α_t}_{t≥1} and {γ_t}_{t≥1} in the AC-GD algorithm are chosen such that

  μ + γ_t ≥ L α_t²,   (3.12)
  γ₁/Γ₁ = γ₂/Γ₂ = …,   (3.13)

where

  Γ_t := { 1, t = 1;  (1 − α_t) Γ_{t−1}, t ≥ 2 }.   (3.14)

Then we have, for any t ≥ 1,

  Ψ(x_t^ag) − Ψ* ≤ (Γ_t γ₁ / 2) ‖x₀ − x*‖²,   (3.15)

where x* is an optimal solution of (3.6).

By properly choosing the stepsize parameters α_t and γ_t, we show that the above AC-GD algorithm achieves the optimal rate of convergence for solving problem (3.6).

Corollary 3. Let {x_t^ag}_{t≥1} be computed by the AC-GD algorithm with

  α_t = max { √(μ/L), 2/(t+1) }   and   γ_t = 2 L Γ_t,   (3.16)

where Γ_t is defined in (3.14). Then we have

  Ψ(x_t^ag) − Ψ* ≤ min { (1 − √(μ/L))^t, 2/(t(t+1)) } L ‖x₀ − x*‖²,  ∀ t ≥ 1.   (3.17)

Proof. The result follows by plugging the values of α_t and γ_t into (3.15) and noting that

  Γ_t ≤ min { (1 − √(μ/L))^t, Π_{τ=2}^{t} (1 − 2/(τ+1)) } = min { (1 − √(μ/L))^t, 2/(t(t+1)) }.

3.2 The AC-GD algorithm for total variation minimization

In this subsection, we discuss how to apply the above AC-GD algorithm to solve the reformulated TV minimization problem (2.4). First, observe that the objective function φ̃(·) in (2.4) can be written in the composite form φ̃(u, d) = T̃(d) + ψ(u, d), where ψ(u, d) is given by

  ψ(u, d) := (λ/2) ‖ A (u; d) − (f; 0) ‖²,   A := [ I_u  0 ;  √q E  √q I_d ].   (3.18)

Here I_u ∈ R^{N²×N²} and I_d ∈ R^{2N(N−1)×2N(N−1)} are identity matrices.
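Since A is very sparse, the gradient of ψ can be formed directly from finite differences. The sketch below, which reuses the hypothetical forward_diff helper from the earlier snippet and again uses names of our own choosing, computes ∇_u ψ = λ[(u − f) + q Eᵀ(Eu + d)] and ∇_d ψ = λ q (Eu + d) in O(N²) operations, writing Eu + d as d minus the forward differences of u.

```python
import numpy as np
# forward_diff(u) is the helper defined in the earlier sketch (forward differences of u).

def forward_diff_adjoint(d1, d2):
    """Adjoint (transpose) of forward_diff: maps the two difference fields back
    to an N-by-N image, so that <forward_diff(u), d> = <u, forward_diff_adjoint(d)>."""
    n = d1.shape[1]
    at = np.zeros((n, n))
    at[:-1, :] -= d1
    at[1:, :]  += d1
    at[:, :-1] -= d2
    at[:, 1:]  += d2
    return at

def grad_psi_tv(u, d1, d2, f, lam, q):
    """Gradient of psi(u, d) = (lam/2)*(||u - f||^2 + q*||Eu + d||^2)."""
    g1, g2 = forward_diff(u)
    r1, r2 = d1 - g1, d2 - g2                     # constraint residual Eu + d
    gu = lam * ((u - f) - q * forward_diff_adjoint(r1, r2))
    gd1, gd2 = lam * q * r1, lam * q * r2
    return gu, gd1, gd2
```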
Proposition 4 below summarizes some properties of ψ(·).

Proposition 4. The function ψ(·) in (3.18) is strongly convex with modulus

  μ_ψ ≥ λ / (10 + 1/q).   (3.19)

Moreover, its gradient is Lipschitz continuous with constant

  L_ψ ≤ λ (1 + 12 q).   (3.20)

In view of the composite structure of φ̃(·) and Proposition 4, we can apply the AC-GD algorithm to problem (2.4). Moreover, since A is very sparse, the computation of the gradient of ψ(·) takes only O(N²) arithmetic operations.

Second, it is worth noting that the subproblem (3.10) arising from the AC-GD method applied to problem (2.4) is easy to solve. Indeed, the subproblem (3.10) takes the form

  (u, d)_t = arg min_{u,d} { ⟨c₁, d⟩ + T̃(d) + (p/2) ‖d − d_t^+‖² + ⟨c₂, u⟩ + (p/2) ‖u − u_t^+‖² },   (3.21)

for some p > 0, c₁ ∈ R^{2N(N−1)} and c₂ ∈ R^{N²}. Suppose that the norm ‖·‖ in the definition of T̃(·) is the l₂ norm. By examining the optimality conditions of problem (3.21), we obtain the following explicit formulas (see Section A.4 for more details):

  u_t = (1/p) ( p u_t^+ − c₂ ),   (3.22)

and

  d_{t,ij} = { 0, if ‖p d⁺_{t,ij} − c_{ij}‖ ≤ 1;
              [ (‖p d⁺_{t,ij} − c_{ij}‖ − 1) / (p ‖p d⁺_{t,ij} − c_{ij}‖) ] (p d⁺_{t,ij} − c_{ij}), if ‖p d⁺_{t,ij} − c_{ij}‖ > 1, }   (3.23)

where d_{t,ij} = ((d_t)¹_{ij}, (d_t)²_{ij}), d⁺_{t,ij} = ((d_t⁺)¹_{ij}, (d_t⁺)²_{ij}) and c_{ij} = (c₁¹_{ij}, c₁²_{ij}). Also note that one can write explicit solutions of (3.21) if ‖·‖ = ‖·‖₁. For both cases, solving the subproblem (3.10) requires only O(N²) arithmetic operations.
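For the l₂ choice of the norm, formulas (3.22)–(3.23) amount to an unconstrained quadratic update in u and a pixel-wise group (isotropic) shrinkage in d. A minimal sketch of this closed-form subproblem solution is given below; the helper names are ours, and padding the two difference fields to a common grid is only an implementation convenience, not part of the paper.

```python
import numpy as np

def shrink_groups(z1, z2, thresh):
    """Group soft-thresholding: shrink each pair (z1_ij, z2_ij) toward 0 by
    `thresh` in Euclidean norm (pairs are aligned on a common N-by-N grid)."""
    n = z1.shape[1]
    Z1 = np.zeros((n, n)); Z1[:-1, :] = z1
    Z2 = np.zeros((n, n)); Z2[:, :-1] = z2
    norm = np.sqrt(Z1**2 + Z2**2)
    scale = np.maximum(norm - thresh, 0.0) / np.maximum(norm, 1e-12)
    return (Z1 * scale)[:-1, :], (Z2 * scale)[:, :-1]

def solve_subproblem(c_u, c_d1, c_d2, u_plus, d1_plus, d2_plus, p):
    """Closed-form solution of the AC-GD subproblem (3.21), l2 case:
    the u-block is a simple quadratic, the d-block is a group shrinkage."""
    u_t = u_plus - c_u / p                                   # formula (3.22)
    # d-block: argmin <c, d> + T_tilde(d) + (p/2)||d - d_plus||^2
    #        = shrink(d_plus - c/p) with per-pixel threshold 1/p, i.e., formula (3.23)
    d1_t, d2_t = shrink_groups(d1_plus - c_d1 / p, d2_plus - c_d2 / p, 1.0 / p)
    return u_t, d1_t, d2_t
```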
We are now ready to state our main results.

Theorem 5. Let (u₀, d₀) be an initial point of the AC-GD algorithm applied to problem (2.4), and let D₀ := ‖(u₀, d₀) − (u*, d*)‖². Assume also that the parameter q ≥ 1 and that the stepsize parameters (α_t, γ_t), t ≥ 1, are set according to (3.16). Then the iteration complexity of the AC-GD algorithm for finding an ε-solution of (2.4) can be bounded by

  O ( min { √q ln(λ q D₀ / ε), √(λ q D₀ / ε) } ).   (3.24)

Moreover, its arithmetic complexity can be bounded by

  O ( N² min { √q ln(λ q D₀ / ε), √(λ q D₀ / ε) } ).   (3.25)

Proof. The bound (3.24) follows immediately from Corollary 3, Proposition 4 and the observation that L_ψ/μ_ψ = O(q) when q ≥ 1. The bound (3.25) then follows from (3.24) and the fact that the number of arithmetic operations in each iteration of the algorithm is bounded by O(N²).

Observe that the complexity of FISTA applied to problem (2.4) is O(√(λ q D₀/ε)) (see [1, 2]), which is strictly worse than the bound in (3.24). In particular, suppose that q is a given constant; in view of Theorem 5, the complexity of the AC-GD algorithm then depends only weakly on the accuracy ε, the parameter λ, and the distance D₀ (and thus the dimension of the problem). Moreover, its total arithmetic complexity is polynomial, with a mild linear dependence on the problem dimension.

4 Numerical Results and Biomedical Application

In this section, we report our preliminary computational results, in which we compare our reformulation (RTVM) in (2.4) with the original TVM (OTVM) model in (2.2) for image denoising. We also compare the performance of two first-order algorithms for composite optimization, FISTA and AC-GD, applied to our reformulation. Furthermore, we discuss the application of the developed techniques to certain biomedical image denoising problems.

4.1 Numerical Study on General Image Denoising Problems

In this subsection, we conduct numerical experiments on a few classical image denoising problems. In our first experiment, we show that the reformulated TVM model is comparable to the original model in terms of the quality of the denoised images. Two image instance sets were used in this experiment. In the first instance set, we take the Lena test image, whose pixels were scaled between 0 and 1. The noisy image is obtained by adding white Gaussian noise with zero mean and various standard deviations (σ). In the second set, we use the Peppers test images with different sizes, and the noise is added as in the first set with a fixed standard deviation σ. The original and noisy images for both instance sets are given in Figure 1. We set the parameter λ to 6 for both formulations (2.2) and (2.4). We solve the original TVM model by using an efficient primal-dual algorithm [4], and we apply the AC-GD algorithm to the reformulated TVM model with different values of q. We then report the best (largest) value of the Peak Signal-to-Noise Ratio (PSNR) obtained after 100 iterations for both approaches. The results for these two instance sets are reported in Tables 1 and 2, respectively. Moreover, Figures 2 and 3, respectively, show the denoised Lena and Peppers images obtained from solving the original TVM model and the reformulated TVM model with q = 2 and q = 4.

Table 1: PSNR of denoised Lena images by the OTVM and RTVM formulations (columns: noisy image, OTVM, and RTVM with q = 0.5, 1, 2, 4, 8, 16, for several noise levels σ).
Figure 1: Original and noisy images (original and noisy Lena; original and noisy Peppers).

Figure 2: Denoised Lena images obtained by the original and reformulated TVM models (OTVM with λ = 6; RTVM with λ = 6, q = 2; RTVM with λ = 6, q = 4).
Table 2: PSNR of denoised Peppers images by the OTVM and RTVM formulations (columns: noisy image, OTVM, and RTVM with q = 0.5, 1, 2, 4, 8, 16, for several image sizes N).

Figure 3: Denoised Peppers images obtained by the original and reformulated TVM models (OTVM with λ = 6; RTVM with λ = 6, q = 2; RTVM with λ = 6, q = 4).

It can be seen from Tables 1 and 2 that the PSNR values computed from the reformulated TVM model are not too sensitive to the choice of the parameter q under different selections of the noise level σ and image size N. In practice we can set q = 2 or q = 4 to achieve reasonably good solution quality. We also observe that the quality of denoised images obtained by the reformulated TVM model is comparable to that obtained by the original model. In fact, at first glance, the denoised Lena image obtained using the original TVM model seems to be cleaner than those obtained using the reformulated model. However, a closer examination reveals that some undesirable oversmoothing effects, e.g., the disappearing texture on the hat and a few extra lines at the nose of the Lena image in Figure 2, were introduced by the original TVM model. On the other hand, these oversmoothing effects are not apparent in the denoised images obtained with the reformulated model. Moreover, no significant differences can be observed between the denoised Peppers images obtained by using the original and reformulated TVM models.

In our second experiment, we demonstrate that AC-GD is faster than FISTA for solving the composite minimization problem (2.4). From our discussion in Section 3.2, the convergence rate of AC-GD always dominates that of FISTA for solving strongly convex composite optimization problems. Our goal here is to verify this claim numerically. Figure 4 shows the convergence behavior of AC-GD and FISTA applied to the Lena image. More specifically, we report the optimality gap φ̃(u_k, d_k) − φ̃* for both algorithms, where the optimal value φ̃* was estimated by running FISTA for 10,000 iterations. As shown in Figure 4, after only 50 iterations the AC-GD method already reaches a very high accuracy. It can also easily be seen from Figure 4 that AC-GD converges linearly while FISTA converges sublinearly. This indeed reflects the difference between the theoretical
Figure 4: AC-GD vs. FISTA applied to denoising the Lena image (optimality gap φ̃(u_k, d_k) − φ̃* versus iteration count k).

convergence rates of the two algorithms applied to problem (2.4).

4.2 Applications in Biomedical Image Denoising

In this subsection, we apply the developed reformulation for TVM to magnetic resonance imaging (MRI), which provides detailed information about internal structures of the body. In comparison with other medical imaging techniques, MRI is most useful for brain and muscle imaging. Two image instance sets were used in this experiment. In the first instance set, we use the Brain MRI test image, and noisy images are obtained by adding white Gaussian noise with zero mean and various standard deviations (σ). In the second set, we use the Knee MRI test images with different sizes, and the noisy images are obtained with a fixed standard deviation σ. We applied the AC-GD algorithm with the same settings (λ = 6 and q = 0.5, 1, 2, 4, 8, 16) as in the previous experiment to solve the RTVM model for these two instance sets. As shown in Figures 5 and 6 and Tables 3 and 4, the results obtained for MRI images are consistent with those in Section 4.1.

Table 3: PSNR of denoised Brain MRIs by the OTVM and RTVM formulations (columns: noisy image, OTVM, and RTVM with q = 0.5, 1, 2, 4, 8, 16, for several noise levels σ).
Figure 5: Original, noisy and denoised Knee MRI (256 × 256; denoised with λ = 6, q = 2 and q = 4).

Figure 6: Original, noisy and denoised Brain MRI (denoised with λ = 6, q = 2 and q = 4).
Table 4: PSNR of denoised Knee MRIs by the OTVM and RTVM formulations (columns: noisy image, OTVM, and RTVM with q = 0.5, 1, 2, 4, 8, 16, for several image sizes N).

Conclusions

In this paper, we introduced a strongly convex composite reformulation of the ROF model, a well-known approach in biomedical image processing. We showed that this reformulation is comparable with the original ROF model in terms of the quality of the denoised images. We presented a first-order algorithm that possesses a linear rate of convergence for solving the reformulated problem and is scalable to large-scale image denoising problems. We demonstrated through our numerical experiments that the developed algorithm, when applied to the reformulated model, compares favorably with existing first-order algorithms applied to the original model. In the future, we would like to generalize these reformulations to other image processing problems such as image deconvolution and image zooming.
References

[1] A. Beck and M. Teboulle. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18:2419-2434, 2009.
[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2:183-202, 2009.
[3] A. Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20:89-97, 2004.
[4] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120-145, 2011.
[5] T. F. Chan, G. H. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing, 20(6):1964-1977, 1999.
[6] P. L. Combettes and J. Luo. An adaptive level set method for nondifferentiable constrained image recovery. IEEE Transactions on Image Processing.
[7] J. Dahl, P. C. Hansen, S. H. Jensen, and T. L. Jensen. Algorithms and software for total variation image reconstruction via first-order methods. Numerical Algorithms, pages 67-92.
[8] F. Dibos and G. Koepfler. Global total variation minimization. SIAM Journal on Numerical Analysis.
[9] H. Y. Fu, M. K. Ng, M. Nikolova, and J. L. Barlow. Efficient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration. SIAM Journal on Scientific Computing, 27(6):1881-1902, 2006.
[10] S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL. Submitted to SIAM Journal on Optimization.
[11] S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL. Submitted to SIAM Journal on Optimization.
[12] D. Goldfarb and W. Yin. Second-order cone programming methods for total variation-based image restoration. SIAM Journal on Scientific Computing, 27:622-645, 2005.
[13] T. Goldstein and S. Osher. The split Bregman algorithm for L1 regularized problems. Manuscript, UCLA CAM Report 08-29, April 2008.
[14] Y. E. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN SSSR, 269:543-547, 1983.
[15] Y. E. Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16:235-249, 2005.
[16] Y. E. Nesterov. Smooth minimization of nonsmooth functions. Mathematical Programming, 103:127-152, 2005.
[17] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259-268, 1992.
[18] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Manuscript, University of Washington, Seattle, May 2008.
[19] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3):248-272, 2008.
Appendix

We provide the proofs of Proposition 1, Theorem 2 and Proposition 4 in Sections A.1, A.2 and A.3, respectively. We also discuss how to solve the subproblems arising from the AC-GD algorithm applied to problem (2.4) in Section A.4.

A.1 Relation between the two formulations (Proposition 1)

In this subsection, we provide the proof of Proposition 1, which shows how the optimal values of problems (2.2) and (2.4) are related.

Proof of Proposition 1. First note that, by the definitions of T(·) and T̃(·), we have

  T̃(d) = T(u)  if  d = −Eu.   (5.26)

Let ū be an optimal solution of (2.2) and d̄ = −Eū. Using (2.2), (2.4) and (5.26), we have

  φ̃* ≤ φ̃(ū, d̄) = T̃(−Eū) + (λ/2) [ ‖ū − f‖² + q ‖Eū − Eū‖² ] = T(ū) + (λ/2) ‖ū − f‖² = φ*.

We now show the second relation in (2.5). Let (u*, d*) be an optimal solution of (2.4); then

  φ* ≤ φ(u*) = T(u*) + (λ/2) ‖u* − f‖².

Observe that, by the definition of T̃(·) and the Cauchy–Schwarz inequality, we have

  T̃(d + δ) = Σ_{i,j} ‖d_{ij} + δ_{ij}‖ ≤ Σ_{i,j} ‖d_{ij}‖ + Σ_{i,j} ‖δ_{ij}‖ ≤ T̃(d) + N ‖δ‖.

It then follows from (5.26) and the above conclusion that

  T(u*) = T̃(−Eu*) = T̃(d* − (d* + Eu*)) ≤ T̃(d*) + N ‖d* + Eu*‖.

Therefore,

  φ* ≤ T̃(d*) + N ‖d* + Eu*‖ + (λ/2) ‖u* − f‖²
     = T̃(d*) + (λ/2) [ ‖u* − f‖² + q ‖d* + Eu*‖² ] + N ‖d* + Eu*‖ − (λq/2) ‖d* + Eu*‖²
     = φ̃(u*, d*) + N ‖d* + Eu*‖ − (λq/2) ‖d* + Eu*‖²
     ≤ φ̃* + N²/(2λq),

where the last relation follows from Young's inequality.

A.2 Convergence analysis for the AC-GD algorithm (Theorem 2)

Our main goal in this subsection is to prove the convergence results of the AC-GD algorithm described in Theorem 2. We first establish two technical results. Lemma 6 states some properties of the subproblem (3.10), and Lemma 7 establishes an important recursion for the AC-GD algorithm. Theorem 2 then follows directly from Lemma 7. The first technical result below characterizes the solution of the projection step (3.10).
Lemma 6. Let X be a convex set and p(·) : X → R be a convex function. Assume that û is an optimal solution of min { p(u) + (μ/2) ‖x̃ − u‖² : u ∈ X }, where x̃ ∈ X and μ > 0 are given. Then, for any u ∈ X,

  p(û) + (μ/2) ‖x̃ − û‖² + (μ/2) ‖û − u‖² ≤ p(u) + (μ/2) ‖x̃ − u‖².   (5.27)

Proof. Denote q(u) := p(u) + (μ/2) ‖x̃ − u‖². The result immediately follows from the strong convexity of q(·) and the optimality condition ⟨q′(û), u − û⟩ ≥ 0 for any u ∈ X.

The following lemma establishes an important recursion for the AC-GD algorithm.

Lemma 7. Let (x_{t−1}, x_{t−1}^ag) ∈ X × X be given. Also let (x_t^md, x_t, x_t^ag) ∈ X × X × X be computed according to (3.8), (3.10) and (3.11), and suppose that (3.12) holds for the given γ_t and α_t. Then, for any x ∈ X, we have

  Ψ(x_t^ag) + (μ/2) ‖x_t − x‖² ≤ (1 − α_t) [ Ψ(x_{t−1}^ag) + (μ/2) ‖x_{t−1} − x‖² ] + α_t Ψ(x) + (γ_t/2) [ ‖x_{t−1} − x‖² − ‖x_t − x‖² ].   (5.28)

Proof. We first establish some basic relations among the search points x_t^ag, x_t^md, x_t and x_t^+. Denote d_t := x_t^ag − x_t^md. It follows from (3.8), (3.11) and (3.9) that

  d_t = α_t x_t + (1 − α_t) x_{t−1}^ag − x_t^md = α_t (x_t − x_t^+),   (5.29)

where the second identity follows since, by (3.8) and (3.9), x_t^md = (1 − α_t) x_{t−1}^ag + α_t x_t^+.

Using the above result and the convexity of ψ, we have

  ψ(x_t^md) + ⟨ψ′(x_t^md), d_t⟩ = ψ(x_t^md) + ⟨ψ′(x_t^md), α_t x_t + (1 − α_t) x_{t−1}^ag − x_t^md⟩
   = (1 − α_t) [ ψ(x_t^md) + ⟨ψ′(x_t^md), x_{t−1}^ag − x_t^md⟩ ] + α_t [ ψ(x_t^md) + ⟨ψ′(x_t^md), x_t − x_t^md⟩ ]
   ≤ (1 − α_t) ψ(x_{t−1}^ag) + α_t [ ψ(x_t^md) + ⟨ψ′(x_t^md), x_t − x_t^md⟩ ].   (5.30)

It then follows from the previous two observations, (3.7), (3.11) and the convexity of X(·) that

  Ψ(x_t^ag) = ψ(x_t^ag) + X(x_t^ag)
   ≤ ψ(x_t^md) + ⟨ψ′(x_t^md), d_t⟩ + (L/2) ‖d_t‖² + (1 − α_t) X(x_{t−1}^ag) + α_t X(x_t)
   ≤ (1 − α_t) Ψ(x_{t−1}^ag) + α_t [ ψ(x_t^md) + ⟨ψ′(x_t^md), x_t − x_t^md⟩ + X(x_t) ] + (L/2) ‖d_t‖²
   = (1 − α_t) Ψ(x_{t−1}^ag) + α_t [ ψ(x_t^md) + ⟨ψ′(x_t^md), x_t − x_t^md⟩ + X(x_t) ]
     + ((μ + γ_t)/(2 α_t²)) ‖d_t‖² − ((μ + γ_t − L α_t²)/(2 α_t²)) ‖d_t‖².   (5.31)
Now let us apply the result regarding the projection step (3.10). Specifically, by using Lemma 6 with p(·) = α_t [⟨ψ′(x_t^md), ·⟩ + X(·)], û = x_t and x̃ = x_t^+, we have, for any x ∈ X,

  α_t [ ψ(x_t^md) + ⟨ψ′(x_t^md), x_t − x_t^md⟩ + X(x_t) ] + ((μ + γ_t)/2) ‖x_t − x_t^+‖² + ((μ + γ_t)/2) ‖x_t − x‖²
   ≤ α_t [ ψ(x_t^md) + ⟨ψ′(x_t^md), x − x_t^md⟩ + X(x) ] + ((μ + γ_t)/2) ‖x − x_t^+‖²
   ≤ α_t [ ψ(x_t^md) + ⟨ψ′(x_t^md), x − x_t^md⟩ + X(x) ] + ((α_t μ)/2) ‖x − x_t^md‖² + (((1 − α_t) μ + γ_t)/2) ‖x − x_{t−1}‖²
   ≤ α_t Ψ(x) + (((1 − α_t) μ + γ_t)/2) ‖x − x_{t−1}‖²,   (5.32)

where the second inequality follows from (3.9) and the convexity of ‖·‖², and the last inequality follows from the strong convexity of ψ(·). Noting also that (μ + γ_t)/(2 α_t²) ‖d_t‖² = ((μ + γ_t)/2) ‖x_t − x_t^+‖² by (5.29), and combining (5.31) and (5.32), we obtain

  Ψ(x_t^ag) ≤ (1 − α_t) [ Ψ(x_{t−1}^ag) + (μ/2) ‖x − x_{t−1}‖² ] + α_t Ψ(x) + (γ_t/2) [ ‖x − x_{t−1}‖² − ‖x − x_t‖² ]
            − (μ/2) ‖x − x_t‖² − ((μ + γ_t − L α_t²)/(2 α_t²)) ‖d_t‖²,

which clearly implies (5.28), in view of the assumption (3.12).

We are now ready to prove Theorem 2.

Proof of Theorem 2. Dividing both sides of (5.28) by Γ_t, and using (3.14) and the fact that α₁ = 1, we have

  [Ψ(x_t^ag) + (μ/2) ‖x − x_t‖²] / Γ_t ≤ [Ψ(x_{t−1}^ag) + (μ/2) ‖x − x_{t−1}‖²] / Γ_{t−1} + (α_t/Γ_t) Ψ(x) + (γ_t/(2Γ_t)) [ ‖x − x_{t−1}‖² − ‖x − x_t‖² ],  t ≥ 2,

and

  [Ψ(x₁^ag) + (μ/2) ‖x − x₁‖²] / Γ₁ ≤ (α₁/Γ₁) Ψ(x) + (γ₁/(2Γ₁)) [ ‖x − x₀‖² − ‖x − x₁‖² ].

Summing up the above inequalities, we obtain

  [Ψ(x_t^ag) + (μ/2) ‖x − x_t‖²] / Γ_t ≤ ( Σ_{τ=1}^{t} α_τ/Γ_τ ) Ψ(x) + Σ_{τ=1}^{t} (γ_τ/(2Γ_τ)) [ ‖x − x_{τ−1}‖² − ‖x − x_τ‖² ].   (5.33)

Note that, by (3.14) and the fact that α₁ = 1, we have

  Σ_{τ=1}^{t} α_τ/Γ_τ = 1/Γ₁ + Σ_{τ=2}^{t} (1/Γ_τ − 1/Γ_{τ−1}) = 1/Γ_t.   (5.34)

Using the above two relations, condition (3.13) and the fact that Γ₁ = 1, we have

  [Ψ(x_t^ag) + (μ/2) ‖x − x_t‖²] / Γ_t ≤ (1/Γ_t) Ψ(x) + (γ₁/2) [ ‖x − x₀‖² − ‖x − x_t‖² ] ≤ (1/Γ_t) Ψ(x) + (γ₁/2) ‖x − x₀‖²,

which clearly implies (3.15).
A.3 Properties of the composite function (Proposition 4)

In this subsection, we provide the proof of Proposition 4, which gives estimates of the two crucial parameters μ_ψ and L_ψ of the smooth component ψ(·) of the composite function φ̃(·).

Proof of Proposition 4. Denote by λ_max and λ_min the maximum and minimum eigenvalues of M := AᵀA, respectively. It then suffices to show that

  λ_max ≤ 1 + 12 q   (5.35)

and

  λ_min ≥ 1 / (10 + 1/q).   (5.36)

We bound the eigenvalues of M by using Gershgorin's theorem. Observe that the network flow matrix E in (2.3) can be written explicitly as a block matrix built from the following blocks: e ∈ R^{N−1} is the unit vector (1, 0, …, 0)ᵀ; K ∈ R^{(N−1)×(N−1)} denotes the two-diagonal lower triangular matrix with main diagonal entries equal to 1 and sub-diagonal entries equal to −1; P_i, 1 ≤ i ≤ N, denotes the matrix having its i-th column equal to e and all other entries equal to 0; and L_{i,j} ∈ R^{(N−1)×(N−1)}, 1 ≤ i, j ≤ N, denotes the matrix whose i-th column equals the j-th column of K and whose other entries are 0.

First, we derive the upper bound for the maximum eigenvalue of M. It is easy to see that the N-th row of M has the largest sum of absolute values of its entries, and we have

  Σ_{i=1}^{N²+2N(N−1)} |M_{N,i}| = 1 + 12 q,

which, in view of Gershgorin's theorem, clearly implies (5.35).

Second, we derive the lower bound for the minimum eigenvalue of M. Since λ is an eigenvalue of M if and only if 1/λ is an eigenvalue of M⁻¹, we instead derive an upper bound for the maximum eigenvalue of M⁻¹. Note that M⁻¹ = A⁻¹(A⁻¹)ᵀ and that, by applying
Gauss–Jordan elimination, we easily obtain an explicit formula for A⁻¹, namely

  A⁻¹ = [ I_u  0 ;  −E  q^{−1/2} I_d ],

where the block −E can again be written out in terms of the blocks e, K, P_i and L_{i,j} defined above. It is easy to see that the (N² + N)-th row of M⁻¹ has the largest sum of absolute values of its entries. In particular, we have

  Σ_{i=1}^{N²+2N(N−1)} |(M⁻¹)_{N²+N, i}| = 10 + 1/q,

which, in view of Gershgorin's theorem, clearly implies that λ_max(M⁻¹) ≤ 10 + 1/q. Using this observation and the fact that λ_min(M) = 1/λ_max(M⁻¹), we obtain (5.36).

A.4 Solving the subproblem (3.21) in the AC-GD algorithm

In this subsection, we derive the explicit solution of the subproblem (3.21) arising in step 2 of the AC-GD algorithm. It is easy to see that (3.22) holds. By the definition of T̃(d), d_{t,ij} is the solution of

  min_{d_{ij}} ⟨c_{ij}, d_{ij}⟩ + ‖d_{ij}‖ + (p/2) ‖d_{ij} − d⁺_{t,ij}‖²,

where d_{ij} = (d¹_{ij}, d²_{ij}), d⁺_{t,ij} = ((d_t⁺)¹_{ij}, (d_t⁺)²_{ij}) and c_{ij} = (c¹_{ij}, c²_{ij}). For notational convenience, let us consider the following problem:

  min_y ⟨r, y⟩ + ‖y‖ + (p/2) ‖y − q‖²,   (5.37)
where y, q, r ∈ R², p ∈ R and p > 0. We consider two cases.

Case 1: ‖pq − r‖ ≤ 1. We have

  ⟨r, y⟩ + ‖y‖ + (p/2) ‖y − q‖² = ‖y‖ + (p/2) ‖y‖² + (p/2) ‖q‖² − ⟨pq − r, y⟩
   ≥ ‖y‖ + (p/2) ‖y‖² + (p/2) ‖q‖² − ‖pq − r‖ ‖y‖
   ≥ (p/2) ‖y‖² + (p/2) ‖q‖²,

which implies that y = 0 is the solution of (5.37) in the case ‖pq − r‖ ≤ 1.

Case 2: ‖pq − r‖ > 1. By the optimality condition of (5.37), we have

  y/‖y‖ + p y − pq + r = 0,

which means

  y₁/‖y‖ + p y₁ − p q₁ + r₁ = 0,
  y₂/‖y‖ + p y₂ − p q₂ + r₂ = 0.

Denoting t := ‖y‖, we then have

  y₁ (1/t + p) = p q₁ − r₁,
  y₂ (1/t + p) = p q₂ − r₂,

or equivalently,

  y₁ = [t/(1 + tp)] (p q₁ − r₁),
  y₂ = [t/(1 + tp)] (p q₂ − r₂).

Combining the above relations with t = ‖y‖, we have

  [t/(1 + tp)] ‖pq − r‖ = t,

or equivalently, t = (1/p)(‖pq − r‖ − 1), which immediately implies that the optimal solution of (5.37) is given by

  y = [ (‖pq − r‖ − 1) / (p ‖pq − r‖) ] (pq − r).

Replacing y, q and r by d_{t,ij}, d⁺_{t,ij} and c_{ij}, respectively, we obtain (3.23). It is worth noting that this formula still holds when y, q and r are of any dimension.
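The closed-form minimizer derived above is easy to sanity-check numerically. The short script below is our own illustration rather than anything from the paper: it compares the formula against a brute-force grid search on a random two-dimensional instance of (5.37).

```python
import numpy as np

def closed_form(r, q, p):
    """Minimizer of <r, y> + ||y|| + (p/2)||y - q||^2 from the derivation above."""
    w = p * q - r
    nw = np.linalg.norm(w)
    if nw <= 1.0:
        return np.zeros_like(q)
    return (nw - 1.0) / (p * nw) * w

# brute-force check on a coarse grid of 2-D candidates
rng = np.random.default_rng(0)
r, q, p = rng.normal(size=2), rng.normal(size=2), 1.5
obj = lambda y: r @ y + np.linalg.norm(y) + 0.5 * p * np.sum((y - q) ** 2)
grid = np.linspace(-3.0, 3.0, 201)
ys = np.stack(np.meshgrid(grid, grid), -1).reshape(-1, 2)
best = ys[np.argmin([obj(y) for y in ys])]
print(closed_form(r, q, p), best)   # the two answers should agree up to grid resolution
```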