arxiv: v3 [math.na] 9 May 2014


arXiv:3.683v3 [math.NA] 9 May 2014

New explicit thresholding/shrinkage formulas for one class of regularization problems with overlapping group sparsity and their applications

Gang Liu, Ting-Zhu Huang, Xiao-Guang Lv, Jun Liu

Abstract

Least-squares regression problems and inverse problems have been widely studied in many fields such as compressive sensing, signal processing, and image processing. To solve such ill-posed problems, a regularization term (i.e., a regularizer) should be introduced, under the assumption that the solutions have some specific properties, such as sparsity and group sparsity. Widely used regularizers include the $\ell_1$ norm, the total variation (TV) semi-norm, and so on. Recently, a new regularization term with overlapping group sparsity has been considered. Majorization-minimization iteration methods or variable duplication methods are often applied to solve the resulting problems. However, there have been no direct methods for solving them because of the difficulty caused by overlapping. In this paper, we propose new explicit shrinkage formulas for one class of these problems, whose regularization terms have translation invariant overlapping groups. Moreover, we apply our results to TV deblurring and denoising with overlapping group sparsity, using the alternating direction method of multipliers to solve the resulting problem iteratively. Numerical results verify the validity and effectiveness of our new explicit shrinkage formulas.

Key words: overlapping group sparsity; regularization; explicit shrinkage formula; total variation; ADMM; deblurring

1 Introduction

Least-squares regression problems and inverse problems have been widely studied in many fields such as compressive sensing, signal processing, image processing, statistics and machine learning. Regularization terms with sparse representations (for instance the $\ell_1$ norm regularizer) have recently been developed into an important tool in these applications [7, 8, 9, 33].
Footnotes: The work of Gang Liu, Ting-Zhu Huang and Jun Liu is supported by 973 Program (03CB39404), NSFC (637047), Sichuan Province Sci. & Tech. Research Project (0GZX0080). The work of Xiao-Guang Lv is supported by Postdoctoral Research Funds (03M540454, 30064B). Gang Liu: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 673, P. R. China (wd5577@63.com). Ting-Zhu Huang: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 673, P. R. China (tingzhuhuang@6.com). Xiao-Guang Lv: School of Mathematical Sciences, Nanjing Normal University, Nanjing, Jiangsu, 0097, P. R. China (xiaoguanglv@6.com). Jun Liu: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 673, P. R. China (junliucd@63.com).

These methods are based on the assumption that signals or images have a sparse representation, that is, only containing a few nonzero entries. To further improve the solutions, more recent studies suggested going beyond sparsity and taking into account additional information about the underlying structure of the solutions [8, 9, ]. In particular, a wide class of solutions with a specific group sparsity structure is considered. In this case, a group sparse vector can be divided into groups of components satisfying a) only a few groups contain nonzero values and b) these groups need not be sparse. This property is sometimes called joint sparsity: a set of sparse vectors whose supports have a sparse union [7]. Many works have considered these new sparse problems [7, 8, 9,, 8, 33]. Putting such group vectors into a matrix as its row vectors, this matrix will have only a few nonzero rows, and these rows may not be sparse. These problems are typically obtained by replacing the problem

$$\min_z \|z\|_1 + \frac{\beta}{2}\|z - x\|_2^2, \tag{1}$$

with

$$\min_z \sum_{i=1}^{r} \|z[i]\|_2 + \frac{\beta}{2}\|z - x\|_2^2, \tag{2}$$

where $x \in \mathbb{R}^n$ is a given vector, $z \in \mathbb{R}^n$, $\|z\|_p = \left(\sum_{i=1}^{n} |z_i|^p\right)^{1/p}$ (with $p = 1, 2$) denotes the $\ell_p$ norm of the vector $z$, $|z_i|$ is the absolute value of $z_i$, and $z[i]$ is the $i$th group of $z$ with $z[i] \cap z[j] = \emptyset$ and $\bigcup_{i=1}^{r} z[i] = z$. The first term of the former equations is called the regularization term, the second term is called the fidelity term, and $\beta > 0$ is the regularization parameter.

Group sparse solutions have better representation properties and have been widely studied for both convex and nonconvex cases [7, 8,, 6, 6, 8]. More recently, overlapping group sparsity (OGS) has been considered [,, 7, 8,, 0,, 4, 5, 30, 3]. These methods are based on the assumption that signals or images have a special sparse representation with OGS. The task is to solve the following problem

$$\min_z \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2, \tag{3}$$

where $\|z\|_{2,1} = \sum_{i=1}^{n} \|(z_i)_g\|_2$ is the generalized $\ell_{2,1}$-norm. Here, each $(z_i)_g$ is a group vector containing $s$ (called the group size) elements surrounding the $i$th entry of $z$. For example, $(z_i)_g = (z_{i-1}, z_i, z_{i+1})$ with $s = 3$.
In this case, $(z_i)_g$, $(z_{i+1})_g$ and $(z_{i+2})_g$ all contain the $(i+1)$th entry of $z$ ($z_{i+1}$), which means the groups overlap, in contrast to the form of group sparsity (2). In particular, if $s = 1$, the generalized $\ell_{2,1}$-norm degenerates into the original $\ell_1$-norm, and the regularization problem (3) degenerates to (1).

To be more general, we consider the weighted generalized $\ell_{2,1}$-norm $\|z\|_{w,2,1} = \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2$ (we only consider the case where each group has the same weight, which means translation invariance) instead of the former generalized $\ell_{2,1}$-norm. The task can then be extended to

$$\min_z \|z\|_{w,2,1} + \frac{\beta}{2}\|z - x\|_2^2, \tag{4}$$

where $w_g$ is a nonnegative real vector with the same size as $(z_i)_g$ and $\circ$ is the point-wise (Hadamard) product. For instance, $w_g \circ (z_i)_g = ((w_g)_1 z_i, (w_g)_2 z_{i+1}, (w_g)_3 z_{i+2})$ with $s = 3$. In particular, the weighted generalized $\ell_{2,1}$-norm degenerates into the generalized $\ell_{2,1}$-norm if each entry of $w_g$ equals 1.

The problems (3) and (4) have been considered in [,, 8,, 0,, 30]. These works solve the problems by using variable duplication methods (variable splitting, latent/auxiliary variables, etc.). In particular, Deng et al. in [8] introduced a diagonal matrix $G$ for this variable duplication method. This matrix $G$ was not easy to find and would break the structure of the coefficient matrix, which makes the problem difficult to solve for high-dimensional vectors. Moreover, it is difficult to extend this method to the matrix case.

Considering the matrix case of the problem (4), we get

$$\min_A \|A\|_{W,2,1} + \frac{\beta}{2}\|A - X\|_F^2, \tag{5}$$

where $X, A \in \mathbb{R}^{m \times n}$ and $\|A\|_{W,2,1} = \sum_{i=1}^{m}\sum_{j=1}^{n} \|W_g \circ (A_{i,j})_g\|_F$. Here, each $(A_{i,j})_g$ is a group matrix containing $K_1 \times K_2$ (called the group size) elements surrounding the $(i,j)$th entry of $A$. For example,

$$W_g \circ (A_{i,j})_g = \begin{pmatrix} (W_g)_{1,1} A_{i-l_1, j-l_2} & (W_g)_{1,2} A_{i-l_1, j-l_2+1} & \cdots & (W_g)_{1,K_2} A_{i-l_1, j+r_2} \\ (W_g)_{2,1} A_{i-l_1+1, j-l_2} & (W_g)_{2,2} A_{i-l_1+1, j-l_2+1} & \cdots & (W_g)_{2,K_2} A_{i-l_1+1, j+r_2} \\ \vdots & \vdots & \ddots & \vdots \\ (W_g)_{K_1,1} A_{i+r_1, j-l_2} & (W_g)_{K_1,2} A_{i+r_1, j-l_2+1} & \cdots & (W_g)_{K_1,K_2} A_{i+r_1, j+r_2} \end{pmatrix} \in \mathbb{R}^{K_1 \times K_2},$$

where $l_1 = \left[\frac{K_1 - 1}{2}\right]$, $l_2 = \left[\frac{K_2 - 1}{2}\right]$, $r_1 = \left[\frac{K_1}{2}\right]$, $r_2 = \left[\frac{K_2}{2}\right]$ (with $l_1 + r_1 + 1 = K_1$ and $l_2 + r_2 + 1 = K_2$) and $[x]$ denotes the largest integer less than or equal to $x$. In particular, if $K_2 = 1$, this problem degenerates to the former vector case (4). If $K_1 = K_2 = 1$, this problem degenerates to the original $\ell_1$ regularization problem (1) for the matrix case. If $(W_g)_{i,j} \equiv 1$ for $i = 1, \ldots, K_1$, $j = 1, \ldots, K_2$, this problem has been considered by Chen et al. [4]. However, they used an iterative algorithm based on the principle of majorization-minimization (MM) to solve it.

In this paper, we propose new explicit shrinkage formulas for all the former problems (3), (4) and (5), which give accurate solutions without the iteration used in [4], the variable duplication (variable splitting, latent/auxiliary variables, etc.) used in [,,, 0,, 30], or the search for a matrix $G$ as in [8]. Numerical results verify the validity and effectiveness of our new explicit shrinkage formulas. Moreover, this new method can be used to solve the subproblems in many other OGS problems, such as compressive sensing with $\ell_1$ regularization and image restoration with total variation (TV) regularization.
Within the framework of ADMM, convergence results for these OGS problems are easy to obtain, because their subproblems are solved accurately by our new explicit shrinkage formulas. For example, we will apply our results to image restoration using TV with OGS, and we will obtain the corresponding convergence theorems. Numerical results also verify the validity and effectiveness of our new methods.

The rest of this paper is organized as follows. In Section 2, we derive in detail our explicit shrinkage formulas for the OGS problems (3), (4) and (5). In Section 3, we propose some extensions of these shrinkage formulas. In Section 4, we apply our results to image deblurring and denoising problems with OGS TV. Numerical results are given in Section 5. Finally, we conclude the paper in Section 6.

2 OGS-shrinkage

2.1 Original shrinkage

For the original sparse representation, we often want to solve the following problems

$$\min_z \|z\|_p + \frac{\beta}{2}\|z - x\|_2^2, \quad p = 1, 2. \tag{6}$$

Definition 1. Define the shrinkage mappings $\mathrm{Sh}_1$ and $\mathrm{Sh}_2$ from $\mathbb{R}^N \times \mathbb{R}^+$ to $\mathbb{R}^N$ by

$$\mathrm{Sh}_1(x, \beta)_i = \mathrm{sgn}(x_i) \max\left\{|x_i| - \frac{1}{\beta}, 0\right\}, \tag{7}$$

$$\mathrm{Sh}_2(x, \beta) = \frac{x}{\|x\|_2} \max\left\{\|x\|_2 - \frac{1}{\beta}, 0\right\}, \tag{8}$$

where both expressions are taken to be zero when the second factor is zero, and sgn is the signum function indicating the sign of a number, that is, $\mathrm{sgn}(x) = 0$ if $x = 0$, $\mathrm{sgn}(x) = -1$ if $x < 0$ and $\mathrm{sgn}(x) = 1$ if $x > 0$.

The shrinkage (7) is known as soft thresholding and occurs in many algorithms related to sparsity, since it is the proximal mapping of the $\ell_1$ norm. The minimizer of (6) with $p = 1$ is given by

$$\arg\min_z \|z\|_1 + \frac{\beta}{2}\|z - x\|_2^2 = \mathrm{Sh}_1(x, \beta). \tag{9}$$

Thanks to the additivity and separability of both the $\ell_1$ norm and the square of the $\ell_2$ norm, the shrinkage (7) can be deduced easily from

$$\min_z \|z\|_1 + \frac{\beta}{2}\|z - x\|_2^2 = \sum_{i=1}^{n} \min_{z_i} \left( |z_i| + \frac{\beta}{2}|z_i - x_i|^2 \right). \tag{10}$$

The minimizer of (6) with $p = 2$ is given by

$$\arg\min_z \|z\|_2 + \frac{\beta}{2}\|z - x\|_2^2 = \mathrm{Sh}_2(x, \beta). \tag{11}$$

This formula is deduced from the Euler equation of (6) with $p = 2$. Clearly,

$$\beta(z - x) + \frac{z}{\|z\|_2} \ni 0, \tag{12}$$

$$\left(1 + \frac{1}{\beta \|z\|_2}\right) z - x \ni 0. \tag{13}$$

It follows that a necessary condition is that the vector $z$ is parallel to the vector $x$, that is, $\frac{z}{\|z\|_2} = \frac{x}{\|x\|_2}$. Substituting this into (13) gives the formula (8). For more details, please refer to [37, 38]. Our new explicit OGS shrinkage formulas are based on these observations, especially the properties of additivity, separability and parallelism.

Remark 1. The problem (2) can also be solved easily by a simple shrinkage formula, which is not used in this work. For more details, refer to [8, 34, 35].

2.2 The OGS shrinkage formulas

We first focus on the problem (3). The difficulty of this problem is the overlapping. Therefore, we need some special techniques to deal with the overlapping. That is the point of our new explicit OGS shrinkage formulas.
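The two classical shrinkage mappings (7) and (8) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `sh1` and `sh2` are chosen here for readability and do not come from the paper.

```python
import numpy as np

def sh1(x, beta):
    """Soft thresholding (7): componentwise proximal map of the l1 norm."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - 1.0 / beta, 0.0)

def sh2(x, beta):
    """Vector shrinkage (8): shrink the l2 norm of x by 1/beta, keep its direction."""
    x = np.asarray(x, dtype=float)
    nrm = np.linalg.norm(x)
    if nrm == 0.0:
        # Convention from Definition 1: the result is zero when x is zero.
        return np.zeros_like(x)
    return x / nrm * max(nrm - 1.0 / beta, 0.0)

z1 = sh1([3.0, -0.2, 0.5], beta=2.0)   # threshold 1/beta = 0.5
z2 = sh2([3.0, 4.0], beta=1.0)         # norm 5 shrinks to 4, direction kept
```

Note that `sh1` acts entry by entry, while `sh2` treats the whole vector as one group; the OGS formulas of Section 2.2 combine both ideas.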

It is obvious that the first term of the problem (3) is additive and separable. So if we find a rule such that the second term of the problem (3) has the same properties in the same variables as the first term, the solution of (3) can be found easily, similarly to (10).

Assuming periodic boundary conditions, we observe that each entry $z_i$ of the vector $z$ appears exactly $s$ times in the first term. Therefore, to keep the vectors $z$ and $x$ in the same form, we multiply the second term by $s$. To leave the problem (3) unchanged, we also divide by $s$, and after some manipulations we have

$$\begin{aligned} \min_z f_m(z) &= \min_z \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2 \\ &= \min_z \sum_{i=1}^{n} \|(z_i)_g\|_2 + \frac{\beta}{2s}\, s\, \|z - x\|_2^2 \\ &= \min_z \sum_{i=1}^{n} \|(z_i)_g\|_2 + \frac{\beta}{2s} \sum_{i=1}^{n} \|(z_i)_g - (x_i)_g\|_2^2 \\ &= \min_z \sum_{i=1}^{n} \left( \|(z_i)_g\|_2 + \frac{\beta}{2s} \|(z_i)_g - (x_i)_g\|_2^2 \right), \end{aligned} \tag{14}$$

where $(x_i)_g$ is defined in the same way as $(z_i)_g$.

For example, we set $s = 3$ and define $(z_i)_g = (z_i, z_{i+1}, z_{i+2})$. The generalized $\ell_{2,1}$-norm $\|z\|_{2,1}$ can be treated as a generalized $\ell_1$ norm of generalized points, where each entry $(z_i)_g$ is itself a vector and the absolute value of each entry is replaced by the $\ell_2$ norm of $(z_i)_g$. See Figure 1(a), where the top line is the vector $z$, the rectangles with dashed lines are the original $(z_i)_g$, and the rectangles with solid lines are the generalized points. Because of the periodic boundary conditions, each line of Figure 1(a) is a translation of the others. We treat the vector $x$ in the same way as the vector $z$. Putting these generalized points (rectangles with solid lines in the figure) as the columns of a matrix, $s\|z - x\|_2^2$ can be regarded as the squared Frobenius norm $\left\| \left((z_i)_g\right) - \left((x_i)_g\right) \right\|_F^2$ of the matrix in Figure 1(a), with every line being a row of the matrix. This is why the second equality in (14) holds.

Therefore, for each $i$ in the last line of (14), from the equations (8) and (11), we obtain

$$\arg\min_{(z_i)_g} \|(z_i)_g\|_2 + \frac{\beta}{2s}\|(z_i)_g - (x_i)_g\|_2^2 = \mathrm{Sh}_2\left((x_i)_g, \frac{\beta}{s}\right),$$

and then

$$(z_i)_g = \max\left\{ \|(x_i)_g\|_2 - \frac{s}{\beta}, 0 \right\} \frac{(x_i)_g}{\|(x_i)_g\|_2}, \qquad (z_i)_g = \left( [(z_i)_g]_1, [(z_i)_g]_2, \ldots, [(z_i)_g]_s \right). \tag{15}$$

As in Figure 1(a), for each $i$, the $i$th entry $x_i$ (or $z_i$) of the vector $x$ (or $z$) appears $s$ times, so we need to compute each $z_i$ $s$ times, once in each of $s$ different groups. However, applying (15) directly is wrong, because the values of $z_i$ obtained from the $s$ different groups generally differ. That means the $s$ results cannot be satisfied simultaneously in this way.

Moreover, for each $i$ in the last line of (14), the result (15) relies on the vector $(z_i)_g$ being parallel to the vector $(x_i)_g$. Keeping this point in mind and ignoring the cases $(z_i)_g = 0$ or $(x_i)_g = 0$, for $s = 4$ and $(z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})$, the vector $z$ can be split as follows:

$$\begin{aligned} z = (z_1, z_2, z_3, z_4, z_5, \ldots, z_{n-2}, z_{n-1}, z_n) = {} & \tfrac{1}{4}(z_1, z_2, z_3, 0, 0, \ldots, 0, 0, z_n) \\ {} + {} & \tfrac{1}{4}(z_1, z_2, z_3, z_4, 0, \ldots, 0, 0, 0) \\ {} + {} & \tfrac{1}{4}(0, z_2, z_3, z_4, z_5, \ldots, 0, 0, 0) \\ {} + {} & \cdots \\ {} + {} & \tfrac{1}{4}(z_1, 0, 0, 0, 0, \ldots, z_{n-2}, z_{n-1}, z_n) \\ {} + {} & \tfrac{1}{4}(z_1, z_2, 0, 0, 0, \ldots, 0, z_{n-1}, z_n). \end{aligned} \tag{16}$$

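Let $(z_i)_g' = (0, \ldots, 0, z_{i-1}, z_i, z_{i+1}, z_{i+2}, 0, \ldots, 0)$ be the expansion of $(z_i)_g$ to length $n$, with $(z_1)_g' = (z_1, z_2, z_3, 0, \ldots, 0, z_n)$, $(z_{n-1})_g' = (z_1, 0, 0, \ldots, 0, z_{n-2}, z_{n-1}, z_n)$, $(z_n)_g' = (z_1, z_2, 0, \ldots, 0, z_{n-1}, z_n)$. Let $(x_i)_g' = (0, \ldots, 0, x_{i-1}, x_i, x_{i+1}, x_{i+2}, 0, \ldots, 0)$ be the expansion of $(x_i)_g$, defined similarly. Then we have $z = \frac{1}{4}\sum_{i=1}^{n} (z_i)_g'$ and $x = \frac{1}{4}\sum_{i=1}^{n} (x_i)_g'$. Moreover, we easily obtain $\|(z_i)_g'\|_2 = \|(z_i)_g\|_2$ and $\|(x_i)_g'\|_2 = \|(x_i)_g\|_2$ for every $i$.

On one hand, the Euler equation of $f_m(z)$ (with $s = 4$) is given by

$$\beta(z - x) + \frac{(z_1)_g'}{\|(z_1)_g\|_2} + \cdots + \frac{(z_n)_g'}{\|(z_n)_g\|_2} \ni 0, \tag{17}$$

$$\frac{\beta}{4}\sum_{i=1}^{n}\left((z_i)_g' - (x_i)_g'\right) + \frac{(z_1)_g'}{\|(z_1)_g\|_2} + \cdots + \frac{(z_i)_g'}{\|(z_i)_g\|_2} + \cdots + \frac{(z_n)_g'}{\|(z_n)_g\|_2} \ni 0. \tag{18}$$

From the derivation of the shrinkage formula (8) in Section 2.1, we know that a necessary condition for minimizing the $i$th term of the last line in (14) is that $(z_i)_g$ is parallel to $(x_i)_g$. That is, $(z_i)_g'$ is parallel to $(x_i)_g'$ for every $i$. For example,

$$(z_2)_g' = (z_1, z_2, z_3, z_4, 0, \ldots, 0, 0) \;\parallel\; (x_2)_g' = (x_1, x_2, x_3, x_4, 0, \ldots, 0, 0). \tag{19}$$

Then we obtain $\frac{(z_i)_g'}{\|(z_i)_g\|_2} = \frac{(x_i)_g'}{\|(x_i)_g\|_2}$. Therefore, (18) changes to

$$\frac{\beta}{4}\sum_{i=1}^{n}\left((z_i)_g' - (x_i)_g'\right) + \frac{(x_1)_g'}{\|(x_1)_g\|_2} + \cdots + \frac{(x_i)_g'}{\|(x_i)_g\|_2} + \cdots + \frac{(x_n)_g'}{\|(x_n)_g\|_2} \ni 0, \tag{20}$$

$$\beta(z - x) + \frac{(x_1)_g'}{\|(x_1)_g\|_2} + \cdots + \frac{(x_i)_g'}{\|(x_i)_g\|_2} + \cdots + \frac{(x_n)_g'}{\|(x_n)_g\|_2} \ni 0, \tag{21}$$

$$z - x = -\frac{1}{\beta}\left( \frac{(x_1)_g'}{\|(x_1)_g\|_2} + \cdots + \frac{(x_i)_g'}{\|(x_i)_g\|_2} + \cdots + \frac{(x_n)_g'}{\|(x_n)_g\|_2} \right), \tag{22}$$

and for each component we obtain

$$z_i - x_i = -\frac{1}{\beta}\left( \frac{x_i}{\|(x_{i-2})_g\|_2} + \frac{x_i}{\|(x_{i-1})_g\|_2} + \frac{x_i}{\|(x_i)_g\|_2} + \frac{x_i}{\|(x_{i+1})_g\|_2} \right). \tag{23}$$

Therefore, when $(z_i)_g \neq 0$ and $(x_i)_g \neq 0$, we find a minimizer of (3) in the direction where all the vectors $(z_i)_g$ are parallel to the vectors $(x_i)_g$. In addition, when $\frac{\beta}{4}\left((z_i)_g' - (x_i)_g'\right) + \frac{(z_i)_g'}{\|(z_i)_g\|_2} = 0$ holds for every $i$, (18) holds; then $\left(\frac{\beta}{4} + \frac{1}{\|(z_i)_g\|_2}\right)(z_i)_g' = \frac{\beta}{4}(x_i)_g'$, and therefore $\frac{(z_i)_g'}{\|(z_i)_g\|_2} = \frac{(x_i)_g'}{\|(x_i)_g\|_2}$ holds. Moreover, because of the strict convexity of $f_m(z)$, the minimizer is unique. This minimizer $z$ is accurate.

On the other hand, when $(z_i)_g = 0$ or $(x_i)_g = 0$, our method may not obtain the accurate minimizer.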
When $(x_i)_g = 0$, the minimizer of the subproblem $\min_{(z_i)_g} \|(z_i)_g\|_2 + \frac{\beta}{2s}\|(z_i)_g - (x_i)_g\|_2^2$ is exactly $(z_i)_g = 0$. When $(x_i)_g \neq 0$ and the minimizer of the subproblem $\min_{(z_i)_g} \|(z_i)_g\|_2 + \frac{\beta}{2s}\|(z_i)_g - (x_i)_g\|_2^2$ is $(z_i)_g = 0$ (which happens when the parameter $\beta/s$ is too small), our method is not able to obtain the accurate minimizer. For example, $(z_i)_g = 0$ together with $(z_{i+1})_g \neq 0$ makes the element $z_i$ of $z$ take different values in different subproblems. However, we can obtain an approximate minimizer in this case, in which the element $z_i$ is a simple sum of the contributions of the subproblems containing $z_i$. We will show in the experiments of Section 5 that this approximate minimizer is also good. Moreover, when we use this problem as a subproblem of an image processing problem, we can set the parameter $\beta$ large enough to make sure that the minimizer of the subproblem is accurate. The convergence results applied in Section 4 rely on this accuracy.

In addition, from (23), we see that the element $z_i$ of the minimizer can be computed in $s$ subproblems independently, and the results then combined. After some manipulations, we arrive at the following two general formulas.

1).

$$\arg\min_z \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2 = \mathrm{Sh}_{OGS}(x, \beta), \tag{24}$$

with

$$\mathrm{Sh}_{OGS}(x, \beta)_i = z_i = \max\left\{ 1 - \frac{1}{\beta}F(x_i), 0 \right\} x_i. \tag{25}$$

Here, for instance, when the group size is $s = 4$, $(z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})$ in $z$, and $(x_i)_g$ is defined similarly, we have $F(x_i) = \frac{1}{\|(x_{i-2})_g\|_2} + \frac{1}{\|(x_{i-1})_g\|_2} + \frac{1}{\|(x_i)_g\|_2} + \frac{1}{\|(x_{i+1})_g\|_2}$. The term for $(x_j)_g$ is contained in $F(x_i)$ if and only if $(x_j)_g$ has the component $x_i$, and we follow the convention $(1/0) = 1$ in $F(x_i)$, because $\|(x_i)_g\|_2 = 0$ implies $x_i = 0$ and the value of $F(x_i)$ then does not affect (25).

2).

$$\arg\min_z \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2 = \mathrm{Sh}_{OGS}(x, \beta), \tag{26}$$

with

$$\mathrm{Sh}_{OGS}(x, \beta)_i = z_i = G(x_i)\, x_i. \tag{27}$$

Here, the symbols are the same as in 1), and

$$G(x_i) = \max\left(\frac{1}{s} - \frac{1}{\beta\|(x_{i-2})_g\|_2}, 0\right) + \max\left(\frac{1}{s} - \frac{1}{\beta\|(x_{i-1})_g\|_2}, 0\right) + \max\left(\frac{1}{s} - \frac{1}{\beta\|(x_i)_g\|_2}, 0\right) + \max\left(\frac{1}{s} - \frac{1}{\beta\|(x_{i+1})_g\|_2}, 0\right).$$

When $\beta$ is sufficiently large or sufficiently small, the two formulas coincide and both are accurate. For other values of $\beta$, we find from the experiments that 2) gives a better approximation than 1), so we choose the formula 2). We then obtain the following algorithm for finding the minimizer of (3).

Algorithm 1 Direct shrinkage algorithm for the minimization problem (3)
Input: Given vector $x = (x_1, x_2, \ldots, x_n)$, group size $s$, parameter $\beta$.
Compute:
Definition of $(x_i)_g$, for example $(x_i)_g = (x_{i-s_l}, \ldots, x_i, \ldots, x_{i+s_r})$, with $s_l = \left[\frac{s-1}{2}\right]$ and $s_r = \left[\frac{s}{2}\right]$ ($s_l + s_r + 1 = s$).
1. w = ones(1,s) = [1, 1, ..., 1]. If $s$ is even, then w = [w, 0] and $s = s + 1$. Let $w^r$ = fliplr(w), that is, w flipped by 180 degrees.
2. Compute $X_n = \left( \|(x_1)_g\|_2, \ldots, \|(x_i)_g\|_2, \ldots, \|(x_n)_g\|_2 \right)$ by convolution of w and x.^2 (pointwise square), followed by a pointwise square root.
3. Compute $X_n' = \max\left( \frac{1}{s} - 1./(X_n \beta), 0 \right)$ pointwise.
4. Compute $G(x_i)$ by correlation of w and $X_n'$, or by convolution of $w^r$ and $X_n'$.

We can see that Algorithm 1 only needs two convolution computations with time complexity $ns$, which is exactly the time complexity of one iteration step of the MM method in [4]. Therefore, our method is much more efficient than the MM method or other variable duplication methods.
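The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: periodic boundary conditions are realized with `np.roll` instead of explicit convolution calls, the function name `ogs_shrink` is hypothetical, and the final multiplication by `x` applies formula (27).

```python
import numpy as np

def ogs_shrink(x, s, beta):
    """Formula (27) with periodic boundary conditions (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    sl, sr = (s - 1) // 2, s // 2          # group (x_j)_g = (x_{j-sl}, ..., x_{j+sr})
    offsets = range(-sl, sr + 1)
    x2 = x ** 2
    # norms[j] = ||(x_j)_g||_2; np.roll(x2, -k)[j] equals x2[j + k] (indices mod n)
    norms = np.sqrt(sum(np.roll(x2, -k) for k in offsets))
    with np.errstate(divide="ignore"):
        # a zero group norm gives -inf inside the max, hence a zero contribution
        t = np.maximum(1.0 / s - 1.0 / (beta * norms), 0.0)
    # G[i] adds t[j] over the s groups j = i - k that contain x_i
    G = sum(np.roll(t, k) for k in offsets)
    return G * x

# With s = 1 the formula reduces to soft thresholding (7).
z_soft = ogs_shrink([3.0, -0.2, 0.5], s=1, beta=2.0)
# For a constant signal the result is again constant.
z_const = ogs_shrink(2.0 * np.ones(8), s=4, beta=2.0)
```

For the constant input the value can be checked by hand: each group norm is 4, so each of the four terms of $G$ equals $1/4 - 1/8$, giving $G = 1/2$ and $z_i = 1$, which is also the exact minimizer of (3) for this input.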

Figure 1: Vector case. (a) Vector; (b) weighted.

In Section 5, we will give numerical experiments comparing our method with the MM method. Moreover, if $s = 1$, our Algorithm 1 degenerates to classic soft thresholding, as our formula (27) degenerates to (7). When $\beta$ is sufficiently large or sufficiently small, our algorithm is accurate, while the MM method is still approximate.

Remark 2. Our new explicit algorithm can be treated as an average estimation algorithm that solves all the overlapping group subproblems independently. In Section 5, our numerical experiments show that our new explicit algorithm is accurate when $\beta$ is sufficiently large or sufficiently small, and is close to other methods, for instance the MM method, for other $\beta$.

For the problem (4), similarly to (14), we get

$$\begin{aligned} \min_z f_w(z) &= \min_z \|z\|_{w,2,1} + \frac{\beta}{2}\|z - x\|_2^2 \\ &= \min_z \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2\sum_{k=1}^{s}(w_g)_k^2} \left(\sum_{k=1}^{s}(w_g)_k^2\right) \|z - x\|_2^2 \\ &= \min_z \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2\|w_g\|_2^2} \sum_{i=1}^{n} \left\|w_g \circ \left((z_i)_g - (x_i)_g\right)\right\|_2^2 \\ &= \min_z \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2\|w_g\|_2^2} \sum_{i=1}^{n} \left\|w_g \circ (z_i)_g - w_g \circ (x_i)_g\right\|_2^2 \\ &= \min_z \sum_{i=1}^{n} \left( \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2\|w_g\|_2^2} \left\|w_g \circ (z_i)_g - w_g \circ (x_i)_g\right\|_2^2 \right). \end{aligned} \tag{28}$$

See Figure 1(b). All the symbols are the same as before. As before, a necessary condition for minimizing the $i$th term of the last line in (28) is that $w_g \circ (z_i)_g$ is parallel to $w_g \circ (x_i)_g$. That is, $\frac{w_g \circ (z_i)_g}{\|w_g \circ (z_i)_g\|_2} = \frac{w_g \circ (x_i)_g}{\|w_g \circ (x_i)_g\|_2}$ for every $i$.

On the other hand, let $W = \mathrm{diag}(w_g)$ be the diagonal matrix whose diagonal is the vector $w_g$; then $w_g \circ (x_i)_g = W(x_i)_g = W^T(x_i)_g$. If a vector $x$ (of the same size as $(x_i)_g$) is parallel to a vector $z$, we have $x = \alpha z$. Then $Wx = W\alpha z = \alpha Wz$, so the vector $Wx$ is also parallel to the vector $Wz$. Therefore, $\frac{W(z_i)_g}{\|W(z_i)_g\|_2} = \frac{W(x_i)_g}{\|W(x_i)_g\|_2}$, and each of them is a unit vector. Then $\frac{W^T W(z_i)_g}{\|W(z_i)_g\|_2} = \frac{W^T W(x_i)_g}{\|W(x_i)_g\|_2}$. That is,

$$\frac{w_g \circ w_g \circ (z_i)_g}{\|w_g \circ (z_i)_g\|_2} = \frac{w_g \circ w_g \circ (x_i)_g}{\|w_g \circ (x_i)_g\|_2}.$$

In particular, we first consider $s = 4$, $(z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})$ and $w_g = (w_1, w_2, w_3, w_4)$. We write $(w_g \circ (z_i)_g)'$ for the expansion of $w_g \circ (z_i)_g$, defined similarly to $(z_i)_g'$, and we get $z = \frac{1}{\sum_{k=1}^{4} w_k^2} \sum_{i=1}^{n} (w_g \circ w_g \circ (z_i)_g)'$, and likewise for $x$. Then, the Euler equation of $f_w(z)$ is given by

$$\beta(z - x) + \frac{(w_g \circ w_g \circ (z_1)_g)'}{\|w_g \circ (z_1)_g\|_2} + \cdots + \frac{(w_g \circ w_g \circ (z_n)_g)'}{\|w_g \circ (z_n)_g\|_2} \ni 0, \tag{29}$$

$$\frac{\beta}{\sum_{k=1}^{4} w_k^2} \sum_{i=1}^{n} \left( w_g \circ w_g \circ (z_i)_g - w_g \circ w_g \circ (x_i)_g \right)' + \frac{(w_g \circ w_g \circ (z_1)_g)'}{\|w_g \circ (z_1)_g\|_2} + \cdots + \frac{(w_g \circ w_g \circ (z_n)_g)'}{\|w_g \circ (z_n)_g\|_2} \ni 0. \tag{30}$$

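Using the parallelism condition, the Euler equation becomes

$$\beta(z - x) + \frac{(w_g \circ w_g \circ (x_1)_g)'}{\|w_g \circ (x_1)_g\|_2} + \cdots + \frac{(w_g \circ w_g \circ (x_n)_g)'}{\|w_g \circ (x_n)_g\|_2} \ni 0, \tag{31}$$

$$z - x = -\frac{1}{\beta}\left( \frac{(w_g \circ w_g \circ (x_1)_g)'}{\|w_g \circ (x_1)_g\|_2} + \cdots + \frac{(w_g \circ w_g \circ (x_n)_g)'}{\|w_g \circ (x_n)_g\|_2} \right), \tag{32}$$

and for each component we obtain

$$z_i - x_i = -\frac{1}{\beta}\left( \frac{w_4^2 x_i}{\|w_g \circ (x_{i-2})_g\|_2} + \frac{w_3^2 x_i}{\|w_g \circ (x_{i-1})_g\|_2} + \frac{w_2^2 x_i}{\|w_g \circ (x_i)_g\|_2} + \frac{w_1^2 x_i}{\|w_g \circ (x_{i+1})_g\|_2} \right). \tag{33}$$

Since $\sum_{k=1}^{4} w_k^2 = \|w_g\|_2^2$, another expression is as follows:

$$z_i = \left( \frac{w_4^2 x_i}{\|w_g\|_2^2} - \frac{w_4^2 x_i}{\beta\|w_g \circ (x_{i-2})_g\|_2} \right) + \left( \frac{w_3^2 x_i}{\|w_g\|_2^2} - \frac{w_3^2 x_i}{\beta\|w_g \circ (x_{i-1})_g\|_2} \right) + \left( \frac{w_2^2 x_i}{\|w_g\|_2^2} - \frac{w_2^2 x_i}{\beta\|w_g \circ (x_i)_g\|_2} \right) + \left( \frac{w_1^2 x_i}{\|w_g\|_2^2} - \frac{w_1^2 x_i}{\beta\|w_g \circ (x_{i+1})_g\|_2} \right). \tag{34}$$

Similarly to Algorithm 1, we obtain the following algorithm for finding the minimizer of (4).

Algorithm 2 Direct shrinkage algorithm for the minimization problem (4)
Input: Given vector $x$, group size $s$, parameter $\beta$, weight vector $w_g = (w_1, w_2, \ldots, w_s)$.
Compute:
Definition of $(x_i)_g$, for example $(x_i)_g = (x_{i-s_l}, \ldots, x_i, \ldots, x_{i+s_r})$, with $s_l = \left[\frac{s-1}{2}\right]$ and $s_r = \left[\frac{s}{2}\right]$.
1. If $s$ is even, then $w_g = [0, w_g]$ and $s = s + 1$. Let $w_g^r$ = fliplr($w_g$), that is, $w_g$ flipped by 180 degrees.
2. Compute $X_n = \left( \|w_g \circ (x_1)_g\|_2, \ldots, \|w_g \circ (x_i)_g\|_2, \ldots, \|w_g \circ (x_n)_g\|_2 \right)$ by correlation of $w_g$.^2 (pointwise square) and x.^2, or by convolution of $(w_g^r)$.^2 and x.^2, followed by a pointwise square root.
3. Compute $X_n' = \max\left( \frac{1}{\|w_g\|_2^2} - 1./(X_n \beta), 0 \right)$ pointwise.
4. Compute $F(x_i)$ by convolution of $w_g$.^2 and $X_n'$, or by correlation of $(w_g^r)$.^2 and $X_n'$.

We can also see that Algorithm 2 only needs two convolution computations with time complexity $ns$, which is exactly the time complexity of one iteration step of the MM method in [4]. Therefore, our method is much more efficient than the MM method.

Here, without loss of generality, let $x \in [0, 1]^n$. Then, by elementary inequalities, if

$$\beta \le \frac{\|w_g\|_2}{\sqrt{s}} \le \frac{\|w_g\|_2}{\|x_g\|_2} = \frac{\|w_g\|_2^2}{\|w_g\|_2\|x_g\|_2} \le \frac{\|w_g\|_2^2}{\|w_g \circ x_g\|_2},$$

then $\beta$ is sufficiently small. However, we do not directly know which $\beta$ is sufficiently large. In Section 5, after more than one thousand tests, we find that $\beta \ge 30\frac{\|w_g\|_2}{\sqrt{s}}$ is generally sufficiently large.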
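The weighted steps of Algorithm 2 can be sketched in the same way. This is an illustrative sketch only: the name `weighted_ogs_shrink` is hypothetical, periodic boundary conditions are realized with `np.roll`, and the result multiplies the combined factor by `x` in analogy with formula (27).

```python
import numpy as np

def weighted_ogs_shrink(x, w, beta):
    """Weighted OGS shrinkage with periodic boundary conditions (sketch)."""
    x = np.asarray(x, dtype=float)
    w2 = np.asarray(w, dtype=float) ** 2   # only w_k^2 enters, so signs of w are irrelevant
    s = w2.size
    sl = (s - 1) // 2                      # group j covers x_{j-sl}, ..., x_{j-sl+s-1}
    x2 = x ** 2
    # norms[j]^2 = sum_k w_k^2 * x_{j-sl+k}^2; np.roll(x2, sl-k)[j] equals x2[j-sl+k]
    norms = np.sqrt(sum(w2[k] * np.roll(x2, sl - k) for k in range(s)))
    with np.errstate(divide="ignore"):
        t = np.maximum(1.0 / w2.sum() - 1.0 / (beta * norms), 0.0)
    # x_i sits at position k of group j = i + sl - k and is weighted there by w_k^2
    G = sum(w2[k] * np.roll(t, k - sl) for k in range(s))
    return G * x

x = np.array([3.0, -0.2, 0.5, 1.0])
z_l1 = weighted_ogs_shrink(x, [1.0, 0.0, 0.0], beta=2.0)      # one active weight
z_flip = weighted_ogs_shrink(x, [-1.0, 0.0, 0.0], beta=2.0)   # same weights, sign flipped
```

With a single nonzero weight each group reduces to one entry, so `z_l1` coincides with soft thresholding of `x`; `z_flip` illustrates that only the absolute values of the weights matter.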
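For the problem (5), similarly to (28), we obtain

$$\begin{aligned} \min_A f_W(A) &= \min_A \|A\|_{W,2,1} + \frac{\beta}{2}\|A - X\|_F^2 \\ &= \min_A \sum_{i=1}^{m}\sum_{j=1}^{n} \|W_g \circ (A_{i,j})_g\|_F + \frac{\beta}{2\sum_{k_1=1}^{K_1}\sum_{k_2=1}^{K_2}(W_g)_{k_1,k_2}^2} \left(\sum_{k_1=1}^{K_1}\sum_{k_2=1}^{K_2}(W_g)_{k_1,k_2}^2\right) \|A - X\|_F^2 \\ &= \min_A \sum_{i=1}^{m}\sum_{j=1}^{n} \|W_g \circ (A_{i,j})_g\|_F + \frac{\beta}{2\|W_g\|_F^2} \sum_{i=1}^{m}\sum_{j=1}^{n} \left\|W_g \circ \left((A_{i,j})_g - (X_{i,j})_g\right)\right\|_F^2 \\ &= \min_A \sum_{i=1}^{m}\sum_{j=1}^{n} \|W_g \circ (A_{i,j})_g\|_F + \frac{\beta}{2\|W_g\|_F^2} \sum_{i=1}^{m}\sum_{j=1}^{n} \left\|W_g \circ (A_{i,j})_g - W_g \circ (X_{i,j})_g\right\|_F^2 \\ &= \min_A \sum_{i=1}^{m}\sum_{j=1}^{n} \left( \|W_g \circ (A_{i,j})_g\|_F + \frac{\beta}{2\|W_g\|_F^2} \left\|W_g \circ (A_{i,j})_g - W_g \circ (X_{i,j})_g\right\|_F^2 \right). \end{aligned} \tag{35}$$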

For example, we set $K_1 = 2$, $K_2 = 2$, and define $(A_{i,j})_g = (A_{i,j}, A_{i,j+1}; A_{i+1,j}, A_{i+1,j+1})$ and $W_g = (W_{1,1}, W_{1,2}; W_{2,1}, W_{2,2})$. As in the vector case, each $(A_{i,j})_g$ is a matrix with $\|(A_{i,j})_g\|_F = \sqrt{\sum_{k_1=1}^{K_1}\sum_{k_2=1}^{K_2} \left((A_{i,j})_g\right)_{k_1,k_2}^2}$. Notice that the Frobenius norm of a matrix is equal to the $\ell_2$ norm of the vector obtained by reshaping the matrix. Then, the Euler equation of $f_W(A)$ is given by

$$\beta(A - X) + \frac{(W_g \circ W_g \circ (A_{1,1})_g)'}{\|W_g \circ (A_{1,1})_g\|_F} + \cdots + \frac{(W_g \circ W_g \circ (A_{m,n})_g)'}{\|W_g \circ (A_{m,n})_g\|_F} \ni 0, \tag{36}$$

where $((A_{i,j})_g)'$ is defined similarly to $((z_i)_g)'$, that is, as an expansion of $(A_{i,j})_g$. This notation is used consistently throughout the paper. Using the parallelism condition as before,

$$\beta(A - X) + \frac{(W_g \circ W_g \circ (X_{1,1})_g)'}{\|W_g \circ (X_{1,1})_g\|_F} + \cdots + \frac{(W_g \circ W_g \circ (X_{m,n})_g)'}{\|W_g \circ (X_{m,n})_g\|_F} \ni 0, \tag{37}$$

$$A - X = -\frac{1}{\beta}\left( \frac{(W_g \circ W_g \circ (X_{1,1})_g)'}{\|W_g \circ (X_{1,1})_g\|_F} + \cdots + \frac{(W_g \circ W_g \circ (X_{m,n})_g)'}{\|W_g \circ (X_{m,n})_g\|_F} \right), \tag{38}$$

and for each component we obtain

$$A_{i,j} - X_{i,j} = -\frac{1}{\beta}\left( \frac{W_{2,2}^2 X_{i,j}}{\|W_g \circ (X_{i-1,j-1})_g\|_F} + \frac{W_{2,1}^2 X_{i,j}}{\|W_g \circ (X_{i-1,j})_g\|_F} + \frac{W_{1,2}^2 X_{i,j}}{\|W_g \circ (X_{i,j-1})_g\|_F} + \frac{W_{1,1}^2 X_{i,j}}{\|W_g \circ (X_{i,j})_g\|_F} \right). \tag{39}$$

Therefore, based on the formula (39), we can obtain an algorithm similar to Algorithm 2 for finding the minimizer of (5).

3 Several extensions

3.1 Other boundary conditions

In Section 2, we gave the explicit shrinkage formulas for one class of OGS problems (3), (4) and (5). To keep the derivation simple, we assumed periodic boundary conditions. One may wonder whether periodic boundary conditions are always appropriate for regularization problems in signal or image processing, since natural signals or images are often asymmetric. However, in these problems, assuming some kind of boundary condition is necessary to simplify the problem and make the computation possible [7]. There are several kinds of boundary conditions, such as zero boundary conditions and reflective boundary conditions.
Periodic boundary conditions are often used in optimization because they allow fast computation, as above, or other fast computations, for example with block circulant matrices with circulant blocks via fast Fourier transforms [7, 37, 38].

In this section, we consider other boundary conditions, such as zero boundary conditions and reflective boundary conditions. For simplicity, we only consider the vector case; the results can easily be extended to the matrix case as in Section 2. When zero boundary conditions are used, we extend the original vectors (or signals) $z$ ($= (z_i)_{i=1}^{n}$) and $x$ by two vectors of length $s$ on both ends, giving $\tilde{z} = \left[ z_{-s+1}, \ldots, z_0, (z_i)_{i=1}^{n}, z_{n+1}, \ldots, z_{n+s} \right] = \left[ 0_s, (z_i)_{i=1}^{n}, 0_s \right]$ and $\tilde{x}$ respectively. Then, the results and algorithms of Section 2 apply as if periodic boundary conditions were assumed on $\tilde{z}$ and $\tilde{x}$. See Figure 2(a). Moreover, although, according to the definition of the weighted generalized $\ell_{2,1}$-norm, zero boundary conditions seem better than periodic boundary conditions, our numerical results will show that the results from different boundary conditions are almost the same in practice (see Section 5). Therefore, we will choose the zero boundary conditions to solve the problems (3), (4) and (5) in the following sections. When reflective boundary conditions are assumed, the results are also the same. We only need to extend the original vector $x$ to $\hat{x}$ with reflective boundary conditions. See Figure 2(b).

Figure 2: Vector case with other boundary conditions. (a) zero boundary conditions; (b) reflective boundary conditions.

3.2 Nonpositive weights and different weights in groups

In this section, we show that the weight vector $w_g$ and the weight matrix $W_g$ of the former sections can contain an arbitrary number $s$ of entries with arbitrary real values. On one hand, any entries of the weight vector or matrix can be zero. For example, the original $\ell_1$ regularization problem (1) can be seen as a special form of the weighted generalized $\ell_{2,1}$-norm regularization problem (4) with $s = 1$, or with $s = 3$ and $w_g = (1, 0, 0)$ for the example of Figure 1(b). On the other hand, norms are positively homogeneous, so $\|(-1) \cdot w_g\|_{any} = \|w_g\|_{any}$, where "any" can be an arbitrary norm such as $\|\cdot\|_1$ or $\|\cdot\|_2$. Therefore, for the regularization problems (4) (or (5)), using the weight vector $w_g$ (or matrix $W_g$) is the same as using $|w_g|$ (or $|W_g|$), where $|\cdot|$ is the point-wise absolute value. Therefore, in general, our results hold whatever the number of entries in the weight vector $w_g$ and matrix $W_g$, and whatever the real values of these entries.

However, if the weight $w_g$ depends on the group index $i$, for example $w_{i,g} \circ (z_i)_g = \left( (w_{i,g})_1 z_i, (w_{i,g})_2 z_{i+1}, (w_{i,g})_3 z_{i+2} \right)$, we cannot solve the relevant problems easily, since our method fails. As mentioned before, we only focus on the case where the weight $w_g$ is independent of the group index $i$, that is, $w_{i,g} = w_g$ for all $i$, which means translation invariant overlapping groups.
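The zero boundary extension of Section 3.1 can be sketched as a thin wrapper around a periodic routine: pad with $s$ zeros on both ends, shrink, and crop. This is an illustrative sketch; `ogs_shrink_periodic` and `ogs_shrink_zero_bc` are hypothetical names, and the periodic routine repeats formula (27) only so that the block is self-contained.

```python
import numpy as np

def ogs_shrink_periodic(x, s, beta):
    """Formula (27) under periodic boundary conditions (sketch)."""
    x = np.asarray(x, dtype=float)
    offsets = range(-((s - 1) // 2), s // 2 + 1)
    norms = np.sqrt(sum(np.roll(x ** 2, -k) for k in offsets))
    with np.errstate(divide="ignore"):
        t = np.maximum(1.0 / s - 1.0 / (beta * norms), 0.0)
    return sum(np.roll(t, k) for k in offsets) * x

def ogs_shrink_zero_bc(x, s, beta):
    """Zero boundary conditions: extend x by s zeros on both ends, then crop."""
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.zeros(s), x, np.zeros(s)])
    return ogs_shrink_periodic(padded, s, beta)[s:-s]

# A signal supported away from both ends: no group wraps onto nonzero entries,
# so the two boundary conditions give the same result here.
x_inner = np.array([0.0, 0.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0])
z_zero = ogs_shrink_zero_bc(x_inner, s=3, beta=2.0)
z_per = ogs_shrink_periodic(x_inner, s=3, beta=2.0)
```

The padding length `s` is enough to stop any group from wrapping around onto entries of the original signal, which is why the periodic routine can be reused unchanged.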
4 Applications in TV regularization problems with OGS

The TV regularizer was first introduced by Rudin et al. [3] (ROF). It is used in many fields, for instance in denoising and deblurring problems. Several fast algorithms have been proposed, such as Chambolle's algorithm [3] and fast TV deconvolution (FTVd) [37, 38]. The corresponding minimization task is

$$\min_f \|f\|_{TV} + \frac{\mu}{2}\|Hf - g\|_2^2, \tag{40}$$

where $\|f\|_{TV} := \sum_{1 \le i,j \le n} \|(\nabla f)_{i,j}\|_2 = \sum_{1 \le i,j \le n} \sqrt{|(\nabla_x f)_{i,j}|^2 + |(\nabla_y f)_{i,j}|^2}$ (called isotropic TV, ITV), or $\|f\|_{TV} := \sum_{1 \le i,j \le n} \|(\nabla f)_{i,j}\|_1 = \sum_{1 \le i,j \le n} |(\nabla_x f)_{i,j}| + |(\nabla_y f)_{i,j}|$ (called anisotropic TV, ATV), $H$ denotes the blur matrix, and $g$ denotes the given observed image with blur and noise. The operator $\nabla : \mathbb{R}^{n^2} \to \mathbb{R}^{2 \times n^2}$ denotes the discrete gradient operator (under periodic boundary conditions), which is defined by $(\nabla f)_{i,j} = ((\nabla_x f)_{i,j}, (\nabla_y f)_{i,j})$, with

$$(\nabla_x f)_{i,j} = \begin{cases} f_{i+1,j} - f_{i,j} & \text{if } i < n, \\ f_{1,j} - f_{n,j} & \text{if } i = n, \end{cases} \qquad (\nabla_y f)_{i,j} = \begin{cases} f_{i,j+1} - f_{i,j} & \text{if } j < n, \\ f_{i,1} - f_{i,n} & \text{if } j = n, \end{cases}$$

for $i, j = 1, 2, \ldots, n$, where $f_{i,j}$ refers to the $((j-1)n + i)$th entry of the vector $f$ (it is the $(i,j)$th pixel location of the $n \times n$ image, and this notation remains valid throughout the paper unless otherwise specified). Notice that $H$ is a block circulant matrix with circulant blocks (BCCB) when periodic boundary conditions are applied, and has other structures when other boundary conditions are applied [7].

Recently, Selesnick et al. [3] proposed an OGS TV regularizer for one-dimensional signal denoising. They applied the MM method to solve their model. Their numerical experiments showed that their method can overcome staircase effects effectively and give better results. However, their method has the disadvantages of a low computation speed and of being difficult to extend to the two-dimensional image case, because they did not use a variable substitution method. More recently, Liu et al. [5] proposed an OGS TV regularizer for two-dimensional image denoising and deblurring under Gaussian noise, and Liu et al. [4] proposed an OGS TV regularizer for image deblurring under impulse noise. Both used a variable substitution method and the ADMM framework with an inner MM iteration for solving subproblems similar to (3). Therefore, they did not obtain convergence results, because of the inner iterations. However, with our results in Section 2, we can solve the subproblems exactly, and we can obtain convergence results under the ADMM framework.
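The discrete gradient under periodic boundary conditions and the two TV semi-norms defined above can be sketched as follows. This is an illustrative sketch: the image is stored as a 2-D array `f[i, j]` rather than as the stacked vector used in the text, and the function names are chosen here.

```python
import numpy as np

def grad_periodic(f):
    """Forward differences with wrap-around, as in the definition of the gradient."""
    f = np.asarray(f, dtype=float)
    fx = np.roll(f, -1, axis=0) - f   # f[i+1, j] - f[i, j], wrapping at the last row
    fy = np.roll(f, -1, axis=1) - f   # f[i, j+1] - f[i, j], wrapping at the last column
    return fx, fy

def tv_isotropic(f):
    fx, fy = grad_periodic(f)
    return np.sum(np.sqrt(fx ** 2 + fy ** 2))

def tv_anisotropic(f):
    fx, fy = grad_periodic(f)
    return np.sum(np.abs(fx) + np.abs(fy))

f = np.array([[0.0, 1.0],
              [2.0, 3.0]])
itv = tv_isotropic(f)      # every pixel has gradient (+-2, +-1)
atv = tv_anisotropic(f)
```

For this 2 by 2 example every pixel has a gradient of magnitude $\sqrt{5}$ and $\ell_1$ size 3, so the isotropic value is $4\sqrt{5}$ and the anisotropic value is 12.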
Moreover, when the MM method is used in the inner iterations in [4, 5], they can only solve the ATV case but not the ATV case with OGS while our methods can solve both ATV and ITV cases. Firstly, we defined the ATV case with OGS under Gaussian noise and impulse noise respectively which is similar like [4, 5]. For the Gaussian noise case, For the impulse case, f ( x f ) W,, + ( y f ) W,, + µ H f g. (4) f ( x f ) W,, + ( y f ) W,, +µ H f g. (4) We call the former model as TV OGS L model and the latter model as TV OGS L model respectively. Then, we defined the ITV case with OGS. For Gaussian noise and impulse noise, we only change the former two terms of (4) and (4) respectively by A W,,. Here, A is a high-dimensional matrix with each entry A i, j = (( x f ) i, j ; ( y f ) i, j ). Remark 3. Here and the following sections, A can be treated as (( x f ); ( y f )) for simplicity in vector and matrix computation, although the computation of A W,, is should be tread as pointwise with A i, j = (( x f ) i, j ; ( y f ) i, j ) since the computation are almost all pointwise. Moreover, we also consider constrained model as listing in [6, 4, 5]. For any true digital image, its pixel value can attain only a finite number of values. Hence, it is natural to require all pixel values of the restored image to lie in a certain interval [a, b], see [6] for more details. In general, with the easy computation and the certified results in [6], we only consider all the

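images located in the standard range $[0,1]$. Therefore, we define a projection operator $P_\Omega$ onto the set $\Omega=\{f\in\mathbb{R}^{n\times n}\mid 0\le f\le 1\}$,

$$P_\Omega(f)_{i,j}=\begin{cases}0, & f_{i,j}<0,\\ f_{i,j}, & f_{i,j}\in[0,1],\\ 1, & f_{i,j}>1.\end{cases} \tag{43}$$

4.1 Constrained TV OGS L2 model

For the constrained model (the ATV case) (called CATVOGSL2), we have

$$\min_{u\in\Omega,\,v_x,\,v_y,\,f}\left\{\|v_x\|_{W,2,1}+\|v_y\|_{W,2,1}+\frac{\mu}{2}\|Hf-g\|_2^2\ :\ v_x=\nabla_x f,\ v_y=\nabla_y f,\ u=f\right\}. \tag{44}$$

The augmented Lagrangian function of (44) is

$$\begin{aligned}L(v_x,v_y,u,f;\lambda_1,\lambda_2,\lambda_3)={}&\|v_x\|_{W,2,1}-\lambda_1^T(v_x-\nabla_x f)+\frac{\beta_1}{2}\|v_x-\nabla_x f\|_2^2\\ &+\|v_y\|_{W,2,1}-\lambda_2^T(v_y-\nabla_y f)+\frac{\beta_1}{2}\|v_y-\nabla_y f\|_2^2\\ &-\lambda_3^T(u-f)+\frac{\beta_2}{2}\|u-f\|_2^2+\frac{\mu}{2}\|Hf-g\|_2^2,\end{aligned} \tag{45}$$

where $\beta_1,\beta_2>0$ are penalty parameters and $\lambda_1,\lambda_2,\lambda_3\in\mathbb{R}^{n^2}$ are the Lagrange multipliers. The solution follows the ADMM scheme of Gabay [5], and we refer to some applications in image processing which can be solved by ADMM, e.g., [9, 0, 3, 8, 9, 39 4]. For a given $(v_x^k,v_y^k,u^k,f^k;\lambda_1^k,\lambda_2^k,\lambda_3^k)$, the next iterate $(v_x^{k+1},v_y^{k+1},u^{k+1},f^{k+1};\lambda_1^{k+1},\lambda_2^{k+1},\lambda_3^{k+1})$ is generated as follows:

1. Fix $f=f^k$, $\lambda_1=\lambda_1^k$, $\lambda_2=\lambda_2^k$, $\lambda_3=\lambda_3^k$, and minimize (45) with respect to $v_x$, $v_y$ and $u$. With respect to $v_x$ and $v_y$,

$$v_x^{k+1}=\arg\min_{v_x}\ \|v_x\|_{W,2,1}-{\lambda_1^k}^T(v_x-\nabla_x f^k)+\frac{\beta_1}{2}\|v_x-\nabla_x f^k\|_2^2=\arg\min_{v_x}\ \|v_x\|_{W,2,1}+\frac{\beta_1}{2}\left\|v_x-\nabla_x f^k-\frac{\lambda_1^k}{\beta_1}\right\|_2^2, \tag{46}$$

$$v_y^{k+1}=\arg\min_{v_y}\ \|v_y\|_{W,2,1}-{\lambda_2^k}^T(v_y-\nabla_y f^k)+\frac{\beta_1}{2}\|v_y-\nabla_y f^k\|_2^2=\arg\min_{v_y}\ \|v_y\|_{W,2,1}+\frac{\beta_1}{2}\left\|v_y-\nabla_y f^k-\frac{\lambda_2^k}{\beta_1}\right\|_2^2. \tag{47}$$

Problems (46) and (47) match the framework of problem (5), so their minimizers can be obtained with the explicit shrinkage formulas derived earlier. With respect to $u$,

$$u^{k+1}=\arg\min_{u\in\Omega}\ -{\lambda_3^k}^T(u-f^k)+\frac{\beta_2}{2}\|u-f^k\|_2^2=\arg\min_{u\in\Omega}\ \frac{\beta_2}{2}\left\|u-f^k-\frac{\lambda_3^k}{\beta_2}\right\|_2^2.$$

The minimizer is given explicitly by

$$u^{k+1}=P_\Omega\left[f^k+\frac{\lambda_3^k}{\beta_2}\right]. \tag{48}$$

2. Compute $f^{k+1}$ by solving the normal equation

$$\left(\beta_1(\nabla_x^*\nabla_x+\nabla_y^*\nabla_y)+\mu H^*H+\beta_2 I\right)f^{k+1}=\nabla_x^*(\beta_1 v_x^{k+1}-\lambda_1^k)+\nabla_y^*(\beta_1 v_y^{k+1}-\lambda_2^k)+\mu H^*g+\beta_2\left(u^{k+1}-\frac{\lambda_3^k}{\beta_2}\right), \tag{49}$$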

where $*$ denotes the conjugate transpose; see [36] for more details. Since all the parameters are positive, the coefficient matrix in (49) is always invertible and symmetric positive-definite. In addition, note that $H$, $\nabla_x$, $\nabla_y$ have BCCB structure under periodic boundary conditions. Computations with BCCB matrices can be carried out very efficiently with fast Fourier transforms.

3. Update the multipliers via

$$\begin{aligned}\lambda_1^{k+1}&=\lambda_1^k-\gamma\beta_1(v_x^{k+1}-\nabla_x f^{k+1}),\\ \lambda_2^{k+1}&=\lambda_2^k-\gamma\beta_1(v_y^{k+1}-\nabla_y f^{k+1}),\\ \lambda_3^{k+1}&=\lambda_3^k-\gamma\beta_2(u^{k+1}-f^{k+1}).\end{aligned} \tag{50}$$

Based on the discussion above, we present the ADMM algorithm for solving the convex CATVOGSL2 model (44), which is shown as Algorithm 3.

Algorithm 3 CATVOGSL2 for the minimization problem (44)
initialization: Starting point $f^0=g$, $k=0$, $\beta_1,\beta_2,\gamma,\mu$, group size $K_1\times K_2$, weight matrix $W_g$, $\lambda_i^0=0$, $i=1,2,3$.
iteration:
1. Compute $v_x^{k+1}$ and $v_y^{k+1}$ according to (46) and (47), and compute $u^{k+1}$ according to (48).
2. Compute $f^{k+1}$ by solving (49).
3. Update $\lambda_i^{k+1}$, $i=1,2,3$, according to (50).
4. $k=k+1$.
until a stopping criterion is satisfied.

Algorithm 3 is an application of ADMM for the case with two blocks of variables $(v_x,v_y,u)$ and $f$. Thus, its convergence is guaranteed by the theory of ADMM [0, 5, 8], and we summarize it in the following theorem.

Theorem. 1). When $K_1=K_2=1$, our method CATVOGSL2 degenerates to the constrained ATV L2 model in Chan et al. [6]. Therefore, for $\beta_1,\beta_2>0$ and $\gamma\in\left(0,\frac{1+\sqrt5}{2}\right)$, the sequence $(v_x^k,v_y^k,u^k,f^k;\lambda_1^k,\lambda_2^k,\lambda_3^k)$ generated by Algorithm 3 from any initial point $(f^0;\lambda_1^0,\lambda_2^0,\lambda_3^0)$ converges to $(v_x^\ast,v_y^\ast,u^\ast,f^\ast;\lambda_1^\ast,\lambda_2^\ast,\lambda_3^\ast)$, where $(v_x^\ast,v_y^\ast,u^\ast,f^\ast)$ is a solution of (44). 2). When $K_1\neq1$ or $K_2\neq1$, by the shrinkage formulas derived earlier, when $\beta_1$ is sufficiently large, the computation in every step of Algorithm 3 is accurate.
Therefore, for $\beta_2>0$, $\beta_1>L>0$ ($L$ is a sufficiently large real number) and $\gamma\in\left(0,\frac{1+\sqrt5}{2}\right)$, the sequence $(v_x^k,v_y^k,u^k,f^k;\lambda_1^k,\lambda_2^k,\lambda_3^k)$ generated by Algorithm 3 from any initial point $(f^0;\lambda_1^0,\lambda_2^0,\lambda_3^0)$ converges to $(v_x^\ast,v_y^\ast,u^\ast,f^\ast;\lambda_1^\ast,\lambda_2^\ast,\lambda_3^\ast)$, where $(v_x^\ast,v_y^\ast,u^\ast,f^\ast)$ is a solution of (44).

Remark 4. Here, for the image $f$, we use periodic boundary conditions because of the fast computation. However, for $v_x$, $v_y$, we use zero boundary conditions. Because $v_x$ and $v_y$ substitute for the gradient of the image, zero boundary conditions seem better suited to the definition of the generalized $\ell_{2,1}$ norm on $v_x$ and $v_y$, as mentioned in Section 3. Therefore, the boundary conditions of the image and of its gradient are different and independent. This remains valid throughout the paper unless otherwise specified.

For the ITV case, the constrained model (CITVOGSL2) is

$$\min_{u\in\Omega,\,A,\,f}\left\{\|A\|_{W,2,1}+\frac{\mu}{2}\|Hf-g\|_2^2\ :\ A=(\nabla_x f;\nabla_y f),\ u=f\right\}. \tag{51}$$

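We can also obtain a convergence theorem similar to the one above; the details are presented in the next section for the constrained TV OGS L1 model. We call the corresponding algorithm CITVOGSL2.

4.2 Constrained TV OGS L1 model

For the constrained model (the ITV case) (called CITVOGSL1), we have

$$\min_{u\in\Omega,\,A,\,f}\left\{\|A\|_{W,2,1}+\mu\|z\|_1\ :\ z=Hf-g,\ A=(\nabla_x f;\nabla_y f),\ u=f\right\}. \tag{52}$$

The augmented Lagrangian function of (52) is

$$\begin{aligned}L(A,z,u,f;\lambda_1,\lambda_2,\lambda_3,\lambda_4)={}&\|A\|_{W,2,1}-(\lambda_1^T,\lambda_2^T)\left(A-\begin{bmatrix}\nabla_x f\\ \nabla_y f\end{bmatrix}\right)+\frac{\beta_1}{2}\left\|A-\begin{bmatrix}\nabla_x f\\ \nabla_y f\end{bmatrix}\right\|_2^2\\ &+\mu\|z\|_1-\lambda_3^T(z-(Hf-g))+\frac{\beta_2}{2}\|z-(Hf-g)\|_2^2\\ &-\lambda_4^T(u-f)+\frac{\beta_3}{2}\|u-f\|_2^2,\end{aligned} \tag{53}$$

where $\beta_1,\beta_2,\beta_3>0$ are penalty parameters and $\lambda_1,\lambda_2,\lambda_3,\lambda_4\in\mathbb{R}^{n^2}$ are the Lagrange multipliers. For a given $(A^k,z^k,u^k,f^k;\lambda_1^k,\lambda_2^k,\lambda_3^k,\lambda_4^k)$, the next iterate $(A^{k+1},z^{k+1},u^{k+1},f^{k+1};\lambda_1^{k+1},\lambda_2^{k+1},\lambda_3^{k+1},\lambda_4^{k+1})$ is generated as follows:

1. Fix $f=f^k$, $\lambda_1=\lambda_1^k$, $\lambda_2=\lambda_2^k$, $\lambda_3=\lambda_3^k$, $\lambda_4=\lambda_4^k$, and minimize (53) with respect to $A$, $z$ and $u$. With respect to $A$,

$$\begin{aligned}A^{k+1}=\begin{bmatrix}A_1^{k+1}\\ A_2^{k+1}\end{bmatrix}&=\arg\min_A\ \|A\|_{W,2,1}-({\lambda_1^k}^T,{\lambda_2^k}^T)\left(A-\begin{bmatrix}\nabla_x f^k\\ \nabla_y f^k\end{bmatrix}\right)+\frac{\beta_1}{2}\left\|A-\begin{bmatrix}\nabla_x f^k\\ \nabla_y f^k\end{bmatrix}\right\|_2^2\\ &=\arg\min_A\ \|A\|_{W,2,1}+\frac{\beta_1}{2}\left\|A-\begin{bmatrix}\nabla_x f^k\\ \nabla_y f^k\end{bmatrix}-\begin{bmatrix}\lambda_1^k/\beta_1\\ \lambda_2^k/\beta_1\end{bmatrix}\right\|_2^2.\end{aligned} \tag{54}$$

Problem (54) matches the framework of problem (5), so its solution can be obtained with the explicit shrinkage formulas derived earlier. With respect to $z$,

$$z^{k+1}=\arg\min_z\ \mu\|z\|_1-{\lambda_3^k}^T\left(z-(Hf^k-g)\right)+\frac{\beta_2}{2}\|z-(Hf^k-g)\|_2^2=\arg\min_z\ \mu\|z\|_1+\frac{\beta_2}{2}\left\|z-(Hf^k-g)-\frac{\lambda_3^k}{\beta_2}\right\|_2^2.$$

The minimization with respect to $z$ can be given explicitly by (7) and (9), that is,

$$z^{k+1}=\operatorname{sgn}\left(Hf^k-g+\frac{\lambda_3^k}{\beta_2}\right)\circ\max\left\{\left|Hf^k-g+\frac{\lambda_3^k}{\beta_2}\right|-\frac{\mu}{\beta_2},\ 0\right\}. \tag{55}$$

With respect to $u$,

$$u^{k+1}=\arg\min_{u\in\Omega}\ -{\lambda_4^k}^T(u-f^k)+\frac{\beta_3}{2}\|u-f^k\|_2^2=\arg\min_{u\in\Omega}\ \frac{\beta_3}{2}\left\|u-f^k-\frac{\lambda_4^k}{\beta_3}\right\|_2^2.$$

The minimizer is given explicitly by

$$u^{k+1}=P_\Omega\left[f^k+\frac{\lambda_4^k}{\beta_3}\right]. \tag{56}$$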

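2. Compute $f^{k+1}$ by solving the following normal equation, as in the last section:

$$\left(\beta_1(\nabla_x^*\nabla_x+\nabla_y^*\nabla_y)+\beta_2H^*H+\beta_3I\right)f^{k+1}=\nabla_x^*(\beta_1A_1^{k+1}-\lambda_1^k)+\nabla_y^*(\beta_1A_2^{k+1}-\lambda_2^k)+H^*(\beta_2z^{k+1}-\lambda_3^k)+\beta_2H^*g+\beta_3\left(u^{k+1}-\frac{\lambda_4^k}{\beta_3}\right). \tag{57}$$

3. Update the multipliers via

$$\begin{aligned}\begin{bmatrix}\lambda_1^{k+1}\\ \lambda_2^{k+1}\end{bmatrix}&=\begin{bmatrix}\lambda_1^k\\ \lambda_2^k\end{bmatrix}-\gamma\beta_1\left(\begin{bmatrix}A_1^{k+1}\\ A_2^{k+1}\end{bmatrix}-\begin{bmatrix}\nabla_x f^{k+1}\\ \nabla_y f^{k+1}\end{bmatrix}\right),\\ \lambda_3^{k+1}&=\lambda_3^k-\gamma\beta_2\left(z^{k+1}-(Hf^{k+1}-g)\right),\\ \lambda_4^{k+1}&=\lambda_4^k-\gamma\beta_3(u^{k+1}-f^{k+1}).\end{aligned} \tag{58}$$

Based on the discussion above, we present the ADMM algorithm for solving the convex CITVOGSL1 model (52), which is shown as Algorithm 4.

Algorithm 4 CITVOGSL1 for the minimization problem (52)
initialization: Starting point $f^0=g$, $k=0$, $\beta_1,\beta_2,\beta_3,\gamma,\mu$, group size $K_1\times K_2$, weight matrix $W_g$, $\lambda_i^0=0$, $i=1,2,3,4$.
iteration:
1. Compute $A^{k+1}$ according to (54), compute $z^{k+1}$ according to (55), and compute $u^{k+1}$ according to (56).
2. Compute $f^{k+1}$ by solving (57).
3. Update $\lambda_i^{k+1}$, $i=1,2,3,4$, according to (58).
4. $k=k+1$.
until a stopping criterion is satisfied.

Algorithm 4 is an application of ADMM for the case with two blocks of variables $(A,z,u)$ and $f$. Thus, its convergence is guaranteed by the theory of ADMM [0, 5, 8], and we summarize it in the following theorem.

Theorem. 1). When $K_1=K_2=1$, our method CITVOGSL1 degenerates to the constrained ITV L1 model, similarly to Chan et al. [6]. Therefore, for $\beta_1,\beta_2,\beta_3>0$ and $\gamma\in\left(0,\frac{1+\sqrt5}{2}\right)$, the sequence $(A^k,z^k,u^k,f^k;\lambda_1^k,\lambda_2^k,\lambda_3^k,\lambda_4^k)$ generated by Algorithm 4 from any initial point $(f^0;\lambda_1^0,\lambda_2^0,\lambda_3^0,\lambda_4^0)$ converges to $(A^\ast,z^\ast,u^\ast,f^\ast;\lambda_1^\ast,\lambda_2^\ast,\lambda_3^\ast,\lambda_4^\ast)$, where $(A^\ast,z^\ast,u^\ast,f^\ast)$ is a solution of (52). 2). When $K_1\neq1$ or $K_2\neq1$, by the shrinkage formulas derived earlier, when $\beta_1$ is sufficiently large, the computation in every step of Algorithm 4 is accurate. Therefore, for $\beta_2,\beta_3>0$, $\beta_1>L>0$ ($L$ is a sufficiently large real number) and $\gamma\in\left(0,\frac{1+\sqrt5}{2}\right)$, the sequence $(A^k,z^k,u^k,f^k;\lambda_1^k,\lambda_2^k,\lambda_3^k,\lambda_4^k)$ generated by Algorithm 4 from any initial point $(f^0;\lambda_1^0,\lambda_2^0,\lambda_3^0,\lambda_4^0)$ converges to $(A^\ast,z^\ast,u^\ast,f^\ast;\lambda_1^\ast,\lambda_2^\ast,\lambda_3^\ast,\lambda_4^\ast)$, where $(A^\ast,z^\ast,u^\ast,f^\ast)$ is a solution of (52).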
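Apart from the overlapping group shrinkage for $A$ and the linear solve for $f$, the remaining steps of the algorithm above are pointwise. The following sketch illustrates the soft-thresholding step (55), the box projection (56) and the generic multiplier update pattern of (58). The function names, the flat-list data layout and the sample numbers are assumptions made here for illustration; they are not taken from the paper.

```python
import math


def soft_threshold(x, tau):
    """Componentwise sgn(x) * max(|x| - tau, 0), the closed form used in (55)."""
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in x]


def project_box(x, lo=0.0, hi=1.0):
    """Componentwise projection onto [lo, hi], the role of P_Omega in (56)."""
    return [min(max(v, lo), hi) for v in x]


def z_update(residual, lam3, beta2, mu):
    """residual holds H f^k - g; returns the z-update of (55)."""
    shifted = [r + l / beta2 for r, l in zip(residual, lam3)]
    return soft_threshold(shifted, mu / beta2)


def u_update(f, lam4, beta3):
    """Returns P_Omega[f + lam4 / beta3], the u-update of (56)."""
    return project_box([fi + l / beta3 for fi, l in zip(f, lam4)])


def multiplier_update(lam, constrained, target, beta, gamma):
    """lam - gamma * beta * (constrained - target), the pattern of (58)."""
    return [l - gamma * beta * (c - t) for l, c, t in zip(lam, constrained, target)]


# Hypothetical toy numbers, chosen only to exercise the three functions.
z = z_update([0.5, -0.05, -1.0], [0.0, 0.0, 0.0], beta2=10.0, mu=1.0)
u = u_update([-0.2, 0.5, 1.3], [0.0, 1.0, 0.0], beta3=2.0)
lam4 = multiplier_update([0.0, 1.0, 0.0], u, [-0.2, 0.5, 1.3], beta=2.0, gamma=1.0)
```

In the toy call, entries of the residual smaller in magnitude than $\mu/\beta_2=0.1$ are set to zero, and `u` is clipped into $[0,1]$. In the algorithm itself the multiplier update uses the new iterate $f^{k+1}$ as the target.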
For the ATV case, the constrained model (CATVOGSL1) is

$$\min_{u\in\Omega,\,v_x,\,v_y,\,f}\left\{\|v_x\|_{W,2,1}+\|v_y\|_{W,2,1}+\mu\|z\|_1\ :\ z=Hf-g,\ v_x=\nabla_x f,\ v_y=\nabla_y f,\ u=f\right\}. \tag{59}$$

We can also obtain a convergence theorem similar to Theorem 5, whose details have been presented in the last section for the constrained TV OGS L2 model. We call the corresponding algorithm CATVOGSL1.
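The normal equations (49) and (57) are cheap to solve because, under periodic boundary conditions, every matrix involved is diagonalized by the Fourier transform. The following one-dimensional analogue is a sketch of that idea only: circulant matrices stand in for the BCCB matrices, a naive $O(n^2)$ DFT stands in for the FFT, and all function names and sample values are assumptions made here rather than the authors' code.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def circulant_apply(c, x, transpose=False):
    """Multiply by the circulant matrix with first column c (or by its transpose)."""
    n = len(c)
    if transpose:
        return [sum(c[(j - i) % n] * x[j] for j in range(n)) for i in range(n)]
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def difference_column(n):
    """First column of the periodic forward difference D, (Dx)_i = x_{i+1} - x_i (n >= 2)."""
    d = [0.0] * n
    d[0], d[-1] = -1.0, 1.0
    return d


def apply_system(h, f, beta1, beta2, mu):
    """Dense evaluation of (beta1 D^T D + mu H^T H + beta2 I) f, used only for checking."""
    d = difference_column(len(f))
    dtd = circulant_apply(d, circulant_apply(d, f), transpose=True)
    hth = circulant_apply(h, circulant_apply(h, f), transpose=True)
    return [beta1 * a + mu * b + beta2 * c for a, b, c in zip(dtd, hth, f)]


def solve_normal_equation(h, rhs, beta1, beta2, mu):
    """Solve the same system in the Fourier domain by pointwise division."""
    d_hat, h_hat, r_hat = dft(difference_column(len(rhs))), dft(h), dft(rhs)
    f_hat = [r / (beta1 * abs(dk) ** 2 + mu * abs(hk) ** 2 + beta2)
             for r, dk, hk in zip(r_hat, d_hat, h_hat)]
    return [v.real for v in dft(f_hat, inverse=True)]


h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.25]          # hypothetical periodic blur kernel
f_true = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2]
rhs = apply_system(h, f_true, beta1=35.0, beta2=2.0, mu=50.0)
f_rec = solve_normal_equation(h, rhs, beta1=35.0, beta2=2.0, mu=50.0)
```

Since $\beta_2>0$, the denominator is strictly positive at every frequency, which mirrors the remark that the coefficient matrix is always invertible. In two dimensions the same pointwise division is applied to two-dimensional transforms.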

5 Numerical results

In this section, we present several numerical results to illustrate the performance of the proposed method. All experiments are carried out on a desktop computer using Matlab 2010a. Our computer is equipped with an Intel Core i3-2130 CPU (3.4 GHz), 3.4 GB of RAM and a 32-bit Windows 7 operating system.

5.1 Comparison with the MM method on problem (5)

In this section, we solve an example of (5) with X = rand(00, 00) and X(45:55, 45:55) = 0. As a simple example, we only compare the results of our explicit shrinkage formulas with the most recent MM iteration method proposed in [4]. In particular, we set the weight matrix $W_g\in\mathbb{R}^{3\times3}$ with $(W_g)_{i,j}\equiv1$, and restate (5) as

$$\min_A\ \mathrm{Fun}(A)=\|A\|_{2,1}+\frac{\beta}{2}\|A-X\|_F^2. \tag{60}$$

We use different values of the parameter $\beta$ and run the MM iteration for a fixed number of steps. For comparison, we repeat the result of our explicit shrinkage formulas to the same length as the number of MM iteration steps. The values of the objective function $\mathrm{Fun}(A)$ against the iterations are shown in the top row of Figure 3. Cross sections of the minimizer $A$ are shown in the bottom row of Figure 3. We use both zero boundary conditions (0BC) and periodic boundary conditions (PBC) for our method and for the MM method.

Our method is faster than the MM method because the shrinkage formulas are explicit. From the top row of Figure 3, we observe that the MM method is also fast, because it needs only a few steps (sometimes 5) to converge. The relative error between the function values of the two methods is much less than percent for the different parameters $\beta$. From the bottom row of Figure 3, we can see that when $\beta$ is sufficiently small, and when $\beta\ge30$, which is sufficiently large, our result is almost the same as the final result of the MM iteration method. This shows that our method is accurate and that the error of the MM method is very small.
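To make the objective (60) concrete, the sketch below evaluates an overlapping group sparsity term plus the quadratic fit. It rests on assumptions that are not stated in this part of the text: each group is taken to be the $K_1\times K_2$ window around a pixel, weighted entrywise by $W_g$, with periodic wrap-around at the border, and the names `ogs_norm` and `fun` are hypothetical. The exact group alignment and boundary handling used in the paper may differ.

```python
import math


def ogs_norm(a, weights):
    """Sum over all pixels of the weighted l2 norm of the K1-by-K2 group around the pixel.

    Assumption: groups are roughly centered on the pixel and wrap around periodically.
    """
    n_rows, n_cols = len(a), len(a[0])
    k1, k2 = len(weights), len(weights[0])
    off1, off2 = (k1 - 1) // 2, (k2 - 1) // 2
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            sq = 0.0
            for p in range(k1):
                for q in range(k2):
                    v = a[(i - off1 + p) % n_rows][(j - off2 + q) % n_cols]
                    sq += (weights[p][q] * v) ** 2
            total += math.sqrt(sq)
    return total


def fun(a, x, beta, weights):
    """Objective of the form (60): group sparsity term plus (beta / 2) * ||A - X||_F^2."""
    fit = sum((av - xv) ** 2 for ra, rx in zip(a, x) for av, xv in zip(ra, rx))
    return ogs_norm(a, weights) + 0.5 * beta * fit


ones = [[1.0] * 4 for _ in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
w3 = [[1.0] * 3 for _ in range(3)]
value = fun(ones, zeros, beta=2.0, weights=w3)
```

With a $1\times1$ group and unit weight the group term reduces to the sum of absolute values, which matches the remark that the model degenerates to the ordinary case when $K_1=K_2=1$.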
For intermediate values of $\beta$ below 30, the minimizer computed by our method is approximate, and its error is much larger than that of the MM method, although the difference between the function values of the two methods is very small. However, in this case, from the table and the figure, we can see that our result can still be regarded as an approximate solution. Moreover, our method is much more efficient because its time complexity is the same as that of a single iteration of the MM method.

In Table 1, we show the numerical comparison of our method and the MM method in three respects: the relative error of the function value (ReE of $f_A(X)$), the relative error of the minimizer $X$ (ReE of $X$) and the mean absolute error of $X$ (MAE of $X$). From the table, we can see that our formula gives almost the same results as the MM method when $\beta$ is sufficiently large, and approximate results close to those of the MM method for other $\beta$. This is further evidence for the feasibility of our formula.

After more than one thousand similar tests on the example X = rand(00, 00), we find that when $\beta\ge30$, which is sufficiently large, our method is always accurate, and the minimizers obtained by our method and by the MM method (with the same number of steps as above) are almost the same. Therefore, in view of the convergence theorems of Section 4, we choose $L=30$ (sufficiently large) and choose a suitable parameter $\beta$ in the next section to ensure convergence. Moreover, we can see the efficiency of our method in the next section.

As with the weight matrix $W_g\in\mathbb{R}^{3\times3}$ with $(W_g)_{i,j}\equiv1$, we also tested other weight matrices more than one thousand times. For the examples with X = rand(00, 00) and X(X>=0.5) = 0 (or X(X<=0.5) = 1) (every element is in $[0,1]$), we find that, generally, β w g s is sufficiently

small and β 30 W g s is sufficiently large. These results again illustrate the earlier finding that, for $W_g\in\mathbb{R}^{3\times3}$ with $(W_g)_{i,j}\equiv1$, $\beta\ge30$ is sufficiently large. However, in practice, if $\beta$ is too large, then $A=X$, and the minimization problem (60) is meaningless. In our experiments, after more than one thousand tests, we find that, when every element of $X$ is in $[0,1]$, 30 W g β 300 W g s s is sufficiently large but not too large to make the minimization problem meaningless. In this case, our formula is useful and effective in practice. On the other hand, when some elements of $A$ are not in $[0,1]$, we can first project or stretch $A$ into the range $[0,1]$, and then choose a sufficiently small or large parameter $\beta$.

Figure 3: Comparison between our method and the MM method. Top row: function value versus iterations. Bottom row: value of solutions versus position of points. Each panel shows the MM method and our method under PBC and 0BC for one value of beta.

Table 1.
Comparison of our method and the MM method for two kinds of boundary conditions (BC): zero boundary conditions (0BC) and periodic boundary conditions (PBC).

BC β= ReE of f A (X) 5.9e e-4.4e-4 5.4e-5.4e-5.6e-6 0BC ReE of X e-4 4.7e-5 7.5e-6 7.8e-7 MAE of X 3.4e e-4.4e-4 ReE of f A (X) 6.5e e-4 5.5e-5.e-5 6.3e-7.3e-6 PBC ReE of X e-4.9e-5 4.3e-6 4.5e-7 MAE of X 3.8e e-4.e-4

5.2 Comparison with TV methods and OGS TV methods with inner MM iterations for image deblurring and denoising

In this section, we compare all our algorithms with other methods. All the test images are shown in Fig. 4: one 1024-by-1024 image, (a) Man, and three 512-by-512 images, (b) Car, (c) Parlor and (d) Housepole. The quality of the restoration results is measured quantitatively by the peak signal-to-noise ratio (PSNR) in decibels (dB) and the relative error (ReE):

$$\mathrm{PSNR}=20\log_{10}\frac{n\,\mathrm{Max}_I}{\|f-\bar f\|_2},\qquad \mathrm{ReE}=\frac{\|f-\bar f\|_2}{\|f\|_2},$$

where $f$ and $\bar f$ denote the original and restored images respectively, and $\mathrm{Max}_I$ represents the maximum possible pixel value of the image. In our experiments, $\mathrm{Max}_I=1$.
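The two quality measures can be computed directly from their definitions. The sketch below assumes images flattened into lists of $n^2$ pixel values in $[0,1]$; the function names and the sample data are illustrative assumptions, not the authors' code.

```python
import math


def l2_distance(a, b):
    """Euclidean norm of the difference of two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def psnr(original, restored, max_i=1.0):
    """PSNR in dB, 20 * log10(n * Max_I / ||f - f_bar||_2), with n^2 = number of pixels."""
    err = l2_distance(original, restored)
    if err == 0.0:
        return math.inf  # identical images
    n = math.sqrt(len(original))
    return 20.0 * math.log10(n * max_i / err)


def relative_error(original, restored):
    """ReE = ||f - f_bar||_2 / ||f||_2."""
    return l2_distance(original, restored) / math.sqrt(sum(x * x for x in original))


# Hypothetical 2-by-2 image, flattened.
f = [0.0, 0.5, 1.0, 0.5]
f_bar = [0.1, 0.5, 0.9, 0.5]
quality = (psnr(f, f_bar), relative_error(f, f_bar))
```

A higher PSNR and a lower ReE both indicate a restoration closer to the original image.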

Figure 4: Original images: (a) Man, (b) Car, (c) Parlor, (d) Housepole.

The stopping criterion used in our work is the same as in the other methods,

$$\frac{|F^{k+1}-F^k|}{|F^k|}<10^{-5}, \tag{61}$$

where $F^k$ is the objective function value of the respective model in the $k$th iteration. We compare our methods with some other methods, such as Chan's TV method proposed in [6], Liu's method proposed in [5], and Liu's method proposed in [4]. The latter two methods [5] and [4] both use inner MM iterations for the OGS TV problems, with the number of inner iterations set to 5 by their authors. In particular, for a fair comparison, we set the weight matrix $W_g\in\mathbb{R}^{3\times3}$ with $(W_g)_{i,j}\equiv1$ in all the experiments of our methods, as in [4, 5].

5.2.1 Experiments for the constrained TV OGS L2 model

In this section, we compare our methods (CATVOGSL2 and CITVOGSL2) with some other methods, such as Chan's method proposed in [6] (the algorithm in [6] for the constrained TV-L2 model) and Liu's method proposed in [5]. Regarding the penalty parameters $\beta$ in all our algorithms, theoretically any positive values ensure convergence. Therefore, we set the penalty parameters β = 35, β = 0 for the ATV case, β = 00, β = 0 for the ITV case, and the relaxation parameter $\gamma=1.618$. The blur kernels are generated by the Matlab built-in function fspecial('average',9) for a $9\times9$ average blur. We generate all blurring effects using the Matlab built-in function imfilter(I,psf,'circular','conv') under periodic boundary conditions, with I the original image and psf the blur kernel. We first generated the blurred images by applying the blur above to images (a)-(c), and then corrupted them with zero-mean Gaussian noise with BSNR = 40. The BSNR is given by

$$\mathrm{BSNR}=20\log_{10}\frac{\|g\|_2}{\|\eta\|_2},$$

where $g$ and $\eta$ are the observed image and the noise, respectively. The numerical results of the three methods are shown in Table 2. We have tuned the parameters for all the methods as in Table 2. From Table 2, we can see that the PSNR and ReE of our methods (both ATV and ITV cases) are almost the same as those of Liu et al.
[5], which used MM inner iterations to solve the subproblems (46) and (47) (only for the ATV case). However, in the experiments each outer iteration of our methods is nearly twice as fast as that of Liu et al. [5]. The time per outer iteration of our methods is almost the same as that of the traditional TV method in Chan et al. [6]. In Figure 5 we display

the restored Parlor images from different algorithms. We can see that the OGS TV regularizers give clearer edges on the desk in the image than the TV regularizer.

Now we compute the complexity of each step of our methods and Liu's method [5]. Firstly, we know that the complexity of all the methods is (4 times n log n) except the OGS subproblems. Then, for the OGS subproblems, the MM method in [5] with 5 steps inner iteration, the complexity is ( subproblems). The complexity of our methods is in CATVOGSL2 and in CITVOGSL2. Therefore, the total complexity of [5] is 5 5, the total complexity of CATVOGSL2 is 5 08 and the total complexity of CITVOGSL2 is That means each step of our methods is more than twice as fast as the inner iteration method [5]. In the next section, the computation shared by our methods and the inner iteration method is much larger, so our methods are only nearly twice as fast.

Remark 5. Here we do not list the results of Image (d) because they are almost the same as those of Images (b) and (c). Moreover, when $\beta_1<30$, which is not sufficiently large, the numerical results are also good, although we do not have convergence theorems like those of Section 4. This shows that the approximate part of our shrinkage formula is also good, and that when the number of inner steps in [4, 5] is chosen to be 5 the numerical experiments converge, although they did not find a convergence control sequence.

5.2.2 Experiments for the constrained TV OGS L1 model

In this section, we compare our methods (CATVOGSL1 and CITVOGSL1) with some other methods, such as Chan's method proposed in [6] (the algorithm in [6] for the constrained TV-L1 model) and Liu's method proposed in [4]. As in the last section, we set the penalty parameters β = 80, β = 000, β 3 = for the ATV case, β = 80, β = 000, β 3 = for the ITV case, and the relaxation parameter $\gamma=1.618$. The blur kernel is generated by the Matlab built-in function fspecial('gaussian',7,5) for a $7\times7$ Gaussian blur with standard deviation 5.
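For readers without Matlab, a Gaussian blur kernel of this kind can be sketched as a sampled Gaussian normalized to unit sum. This mirrors the usual construction and is not claimed to reproduce `fspecial` bit for bit; the function name `gaussian_kernel` is an assumption made here.

```python
import math


def gaussian_kernel(size, sigma):
    """size-by-size sampled Gaussian centered on the grid, normalized to sum to 1."""
    half = (size - 1) / 2.0
    raw = [[math.exp(-((r - half) ** 2 + (c - half) ** 2) / (2.0 * sigma ** 2))
            for c in range(size)]
           for r in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


# Kernel size and standard deviation as quoted in the text above.
kernel = gaussian_kernel(7, 5.0)
```

Normalizing to unit sum keeps the mean intensity of the image unchanged after blurring. With a standard deviation this large relative to the window, the kernel is fairly flat, so the blur behaves much like an average filter.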
We first generated the blurred images by applying the Gaussian blur above to images (b)-(d), and then corrupted them with salt-and-pepper noise at levels from 30% to 50%. We generate all noise effects with the Matlab built-in function imnoise(B,'salt & pepper',level), with B the blurred image, and fix the same random matrix for the different methods. The numerical results of the three methods are shown in Table 3. For Chan et al. [6], we have tuned the parameters manually to give the best PSNR improvement, as listed in Table 3 for the different images. For Liu et al. [4], we use the given default parameters µ of 00,80,60 for noise levels 30% to 50% respectively. For our method CATVOGSL1, we set µ to 80,40,00 for 30% to 50% respectively. For our method CITVOGSL1, we set µ to 40,00,80 for 30% to 50% respectively. In experiments,

Table 2: Numerical comparison of Chan et al. [6], Liu et al. [5], CATVOGSL2 and CITVOGSL2 for images (a)-(c) in Figure 4. PSNR: dB, Time: s.

Images (a) Man (b) Car (c) Parlor Method µ ( 0 5 ) Itrs/PSNR/Time/ReE Itrs/PSNR/Time/ReE Itrs/PSNR/Time/ReE Chan.[6] 0.5 5/30.34/ 6.0/ /3.3/.00/0.046 /3.70/.0/0.05 Liu.[5] /30.60/ 9.06/ /3.98/ 3.93/ /3.46/ 4./ CATVOGSL 7/30.6/ 3.37/0.07 9/3.68/.76/ /3.40/.34/0.047 CITVOGSL 8/30.59/ 3.70/0.076 /3.60/.04/ /3.03/.6/0.049

Figure 5: Top row: blurred and noisy image (left), restored images of Chan et al. [6] (middle) and Liu et al. [5] (right). Bottom row: original image (left), restored images of CATVOGSL2 (middle) and CITVOGSL2 (right).

Table 3: Numerical comparison of Chan et al. [6], Liu et al. [4], CATVOGSL1 and CITVOGSL1 for images (b)-(d) in Figure 4. PSNR: dB, Time: s, Is is short for Images.

Is (a) (b) (c) (d) Noise Chan. [6] Liu. [4] CATVOGSL CITVOGSL level µ/ Itrs/PSNR/ Time/ReE Itrs/PSNR/ Time/ReE Itrs/PSNR/ Time/ReE Itrs/PSNR/ Time/ReE 30% 5/30/9.59/6.57/ /30.9/30.3/ /3.7/7.97/ /3.34/.56/ % 8/ 99/8.85/46.4/ /30./3.54/ /30.0/8.03/ /30.06/9.95/ % 5/ 83/8.03/ 39.53/ /9.0/ 4.50/ /8.64/4.38/ /8.60/9.69/ % 5/8/9.7/.90/ /3.8/.93/ /3.77/ 5.97/ /3.75/ 6.6/ % 0/ 98/8.59/7.80/ /30.70/4.99/ /30.47/ 6./ /30./ 6.38/ % 5/ 74/7.8/3.8/ /8.9/7.63/ /8.59/ 9.7/ /8.6/ 9.64/ % 6/38/30./5.65/ /3.5/.84/ /3.30/ 5.8/ /3.08/ 6.75/ % /05/9.0/9.7/ /3.04/3.67/ /3.0/ 5.76/ /30.50/ 6.9/ % 5/ 76/7.80/4.37/ /9.8/5.43/ /8.85/ 7.85/ /8.40/ 8.78/ % 6/7/30.4/3.7/ /3.47/3.5/ /3.54/ 5.68/ /3.5/ 6.75/ % 0/ 95/9.44/7.43/ /3.43/4.7/ /3.35/ 6.40/ /3.06/ 7.33/ % 6/ 77/8.33/4.9/ /9.8/7.05/ /9.44/ 8.83/0.6 45/9.9/ 9.98/0.48

we find that the parameters of our methods are robust and can be chosen from a wide range. Therefore, we set the same µ for different images. From Table 3, we can also see that the PSNR and ReE of our methods (both ATV and ITV cases) are almost the same as those of Liu et al. [4], which used MM inner iterations to solve the subproblems (46) and (47) (only for the ATV case). However, in the experiments each outer iteration of our methods is nearly twice as fast as that of Liu et al. [4]. The time per outer iteration of our methods is almost the same as that of the traditional TV method in Chan et al. [6]. Moreover, we can also see that for OGS TV sometimes ATV is better than ITV and sometimes the reverse holds. Finally, in Figure 6, we display the degraded image, the original image and the restored images for the 30% noise level on Image (d) by the four methods. From the figure, we can see that both our methods and Liu et al. [4] give better edges (handrail and window) than Chan et al. [6].

Figure 6: Top row: blurred and noisy image (left), restored images of Chan et al. [6] (middle) and Liu et al. [4] (right). Bottom row: original image (left), restored images of CATVOGSL1 (middle) and CITVOGSL1 (right).

6 Conclusion

In this paper, we propose explicit shrinkage formulas for one class of OGS regularization problems, namely those with translation invariant overlapping groups. These formulas can be extended to several other regularization problems, for instance nonconvex regularizers with overlapping group sparsity. We apply our results to OGS TV regularization problems (deblurring and denoising), and obtain convergence theorems and good experimental results. Furthermore, we also extend the image deblurring problems with OGS ATV in [4, 5] to both the ATV and ITV cases. For both of them, we have convergence theorems by using ADMM, both for L and


More information

Variational Image Restoration

Variational Image Restoration Variational Image Restoration Yuling Jiao yljiaostatistics@znufe.edu.cn School of and Statistics and Mathematics ZNUFE Dec 30, 2014 Outline 1 1 Classical Variational Restoration Models and Algorithms 1.1

More information

Recent developments on sparse representation

Recent developments on sparse representation Recent developments on sparse representation Zeng Tieyong Department of Mathematics, Hong Kong Baptist University Email: zeng@hkbu.edu.hk Hong Kong Baptist University Dec. 8, 2008 First Previous Next Last

More information

THE solution of the absolute value equation (AVE) of

THE solution of the absolute value equation (AVE) of The nonlinear HSS-like iterative method for absolute value equations Mu-Zheng Zhu Member, IAENG, and Ya-E Qi arxiv:1403.7013v4 [math.na] 2 Jan 2018 Abstract Salkuyeh proposed the Picard-HSS iteration method

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information
