Image Compression Using Simulated Annealing


Aritra Dutta, Geonwoo Kim, Meiqin Li, Carlos Ortiz Marrero, Mohit Sinha, Cole Stiegler

August 5-14, 2015
Mathematical Modeling in Industry XIX, Institute for Mathematics and its Applications
Proposed by: 1QB Information Technologies
Mentors: Michael P. Lamoureux (Pacific Institute for the Mathematical Sciences), Pooya Ronagh (1QB Information Technologies, Vancouver, B.C.)

Abstract. Over the course of this two-week workshop, we developed an algorithm for image compression using an optimized implementation of simulated annealing intended to solve Ising spin problems. Our motivation is to be able to execute this algorithm using the 1QBit interface for the D-Wave quantum computational hardware system. We also explore a combination of simulated annealing and regression techniques and compare their performance. Finally, we discuss ways to optimize our algorithm in order to make it feasible for a D-Wave architecture.

Affiliations: University of Central Florida; Pusan National University; Texas A&M University; University of Houston; University of Minnesota, Twin Cities; University of Iowa.

1 Introduction

Simulated annealing (SA) is a commonly used algorithm for heuristic optimization. Inspired by the study of thermal processes, this algorithm has been particularly successful at providing approximate solutions to NP-hard problems [4]. The algorithm essentially performs a Monte Carlo simulation together with a transition function. At the beginning of the simulation the algorithm is encouraged to explore the landscape of the objective function, while the probability of abrupt transitions is slowly lowered. If the process is carried out slowly enough, then after a number of repetitions one can hope to find the optimal value.

Our goal is to apply this algorithm to image compression and image reconstruction. We use an implementation of SA optimized to solve Ising spin type problems [5]. The algorithm's original purpose was to provide a highly optimized implementation of SA and compare its performance against a D-Wave device. It turns out that finding the lowest energy state in an Ising model is equivalent to solving a quadratic unconstrained binary optimization (QUBO) problem. This investigation is motivated by the assumption that our algorithm could be implemented on the processor produced by D-Wave Systems, Inc. An advantage of the D-Wave system is that it can produce a spectrum of optimal and suboptimal answers to a QUBO/Ising optimization problem, rather than merely the lowest energy point. This work grew out of the question of whether the sparse recovery problem is feasible for the D-Wave architecture.

2 The Ising Model

The Ising spin model is a widely used model that can describe any system consisting of individual elements interacting via pairwise interactions [1].

Definition 2.1 (Ising Problem). Let G = (V, E) be a graph on n vertices with vertex set V and edge set E, and let s_i ∈ {−1, 1} for i ∈ V. For a given configuration s = (s_1, s_2, ..., s_n) ∈ {−1, 1}^n, the energy of the system is given by

H(s) = Σ_{k ∈ V} h_k s_k + Σ_{(i,j) ∈ E} J_ij s_i s_j = ⟨h, s⟩ + ⟨s, Js⟩,

where h = (h_1, ..., h_n) ∈ R^n and J = (J_ij) ∈ M_n(R).
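To make these ingredients concrete, the following is a minimal Python sketch, not the optimized solver of [5], of evaluating the energy H(s) and running a basic single-spin-flip simulated annealing loop on a small random Ising instance; the function names, cooling schedule, and parameter values are our own illustrative choices.

```python
import numpy as np

def ising_energy(s, h, J):
    """Energy H(s) = <h, s> + <s, J s> for spins s in {-1, +1}^n."""
    return h @ s + s @ (J @ s)

def simulated_annealing(h, J, n_sweeps=2000, T0=5.0, T1=0.01, rng=None):
    """Basic single-spin-flip SA with a geometric cooling schedule."""
    rng = np.random.default_rng(rng)
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    for T in np.geomspace(T0, T1, n_sweeps):
        for i in rng.permutation(n):
            # Energy change of flipping spin i (J assumed symmetric with zero diagonal).
            dE = -2 * s[i] * (h[i] + 2 * (J[i] @ s))
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i] = -s[i]
    return s

# Tiny random instance, just to exercise the routine.
rng = np.random.default_rng(0)
n = 20
J = np.triu(rng.normal(size=(n, n)), k=1)
J = (J + J.T) / 2                      # symmetric, zero diagonal
h = rng.normal(size=n)
s_best = simulated_annealing(h, J, rng=1)
print("energy:", ising_energy(s_best, h, J))
```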

The simulated annealing implementation [5] we use is designed to minimize the function H(s).

3 Image Compression/Reconstruction Problem

The underlying optimization problem studied here is to find a sparse solution to the underdetermined linear system Ax = b, where A is an m × n matrix, b is an m-vector, and m ≤ n. The problem can be written as

(1)  minimize_{x ∈ {0,1}^n} ||x||_0  subject to  ||Ax − b||_2^2 ≤ ε.

Note that for a binary vector x ∈ {0,1}^n, the norms ||x||_0 and ||x||_1 are identical. This problem can be interpreted as an image compression problem, where the goal is to find a sparse binary vector x such that its image under A is close to the original image b. Alternatively, one can think of it as an image reconstruction problem, where the goal is to recover a sparse binary vector x that was corrupted by A, given access only to the corrupted image b. Here we discuss only the image compression problem.

4 QUBO and the Ising Model

Problem (1) can be relaxed to the following unconstrained optimization problem,

(2)  min_{x ∈ {0,1}^n} ||x||_0 + λ ||Ax − b||_2^2,

for a large penalty parameter λ > 0. Let e := (1, ..., 1) ∈ {0,1}^n and consider the following reformulation of (2):

||x||_0 + λ||Ax − b||_2^2 = ⟨e, x⟩ + λ⟨Ax − b, Ax − b⟩
= ⟨e, x⟩ + λ⟨Ax, Ax⟩ − 2λ⟨Ax, b⟩ + λ⟨b, b⟩
= ⟨e, x⟩ + λ⟨A^T Ax, x⟩ − 2λ⟨Ax, b⟩ + λ⟨b, b⟩
= ⟨e − 2λA^T b, x⟩ + λ⟨A^T Ax, x⟩ + λ⟨b, b⟩.

Hence (2) is a QUBO problem.
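As an illustration of the reformulation above, the short sketch below (Python with NumPy; the helper name compression_qubo and the test values are ours, not taken from the workshop code) assembles the QUBO data Q = λA^T A and c = e − 2λA^T b and checks the identity against the original objective on a random binary vector.

```python
import numpy as np

def compression_qubo(A, b, lam):
    """Return (Q, c, const) so that, for binary x,
    ||x||_0 + lam * ||A x - b||^2  ==  x @ Q @ x + c @ x + const."""
    Q = lam * (A.T @ A)                               # quadratic term  lam <A^T A x, x>
    c = np.ones(A.shape[1]) - 2 * lam * (A.T @ b)     # linear term  <e - 2 lam A^T b, x>
    const = lam * (b @ b)                             # constant  lam <b, b>
    return Q, c, const

# Sanity check on a tiny random instance.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
b = rng.normal(size=6)
lam = 100.0
Q, c, const = compression_qubo(A, b, lam)
x = rng.integers(0, 2, size=4).astype(float)
lhs = x.sum() + lam * np.sum((A @ x - b) ** 2)        # ||x||_0 = sum(x) for binary x
rhs = x @ Q @ x + c @ x + const
assert np.isclose(lhs, rhs)
```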

Definition 4.1 (QUBO). Let g(x) = ⟨x, Qx⟩ + ⟨c, x⟩, where Q is a symmetric matrix, c ∈ R^n, and x = (x_1, ..., x_n) ∈ {0,1}^n. The quadratic unconstrained binary optimization problem is to minimize the function g(x) over {0,1}^n.

In order to use the solver we need to formulate our problem as an Ising problem. Consider the change of variables

x = (1/2)(s + e), with s ∈ {−1,1}^n, so that x ∈ {0,1}^n.

Equation (2) then becomes

min_{s ∈ {−1,1}^n} (1/2) e^T (s + e) + λ ||(1/2) A(s + e) − b||_2^2.

5 Results

In this section we discuss our implementation of the problem described in Section 3, illustrating how our algorithm effectively reduces the image size while maintaining a reasonable level of quality. Recall that our problem is

min_{x ∈ {0,1}^n} ||x||_1 + λ ||Ax − b||_2^2,

where b is the original image and A is our blurring operator, a convolution expressed explicitly as a sparse matrix. To define the convolution we must first choose an appropriate kernel. We found that Gaussian and averaging disk kernels performed significantly better than other kernel types, so we limit our discussion to these. We use the normalized mean square error (NMSE) as our performance measure,

NMSE = ||Ax − b||_2 / ||b||_2,

where b is an array of grayscale values representing a target image and Ax is the blurring convolution applied to the binary compression x.

The following computations were performed on a MacBook Pro running OS X with a 2.3 GHz Intel Core i7 and 16 GB of memory. For our images, we used Matlab's built-in clown and mandrill images. Each took approximately 40 seconds for the entire computation. Most of the computation time was spent in the C solver written by Troyer et al. [5]. Since the solver is standing in for the quantum annealer, the long computation time is not a concern; true quantum hardware would execute this step much faster.
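For reference, here is one possible way, a sketch only and not the Matlab code actually used, to build the blurring operator A as a sparse convolution matrix from a Gaussian kernel and to evaluate the NMSE defined above; the helper names and the image size in the example are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def gaussian_kernel(size, sigma):
    """size x size Gaussian kernel, normalized to sum to 1."""
    r = np.arange(size) - (size - 1) / 2
    g = np.exp(-r**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def blur_matrix(shape, kernel):
    """Sparse matrix A such that A @ x.ravel() applies `kernel` to the
    2D image x, with zero padding at the borders."""
    H, W = shape
    kh, kw = kernel.shape
    oh, ow = kh // 2, kw // 2
    rows, cols, vals = [], [], []
    for i in range(H):
        for j in range(W):
            r = i * W + j
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = i + di - oh, j + dj - ow
                    if 0 <= ii < H and 0 <= jj < W:
                        rows.append(r)
                        cols.append(ii * W + jj)
                        vals.append(kernel[di, dj])
    return sp.csr_matrix((vals, (rows, cols)), shape=(H * W, H * W))

def nmse(A, x, b):
    """Normalized mean square error ||A x - b||_2 / ||b||_2."""
    return np.linalg.norm(A @ x - b) / np.linalg.norm(b)

# Small example: a 3x3 Gaussian with sigma = 0.9 applied to a random binary image.
rng = np.random.default_rng(0)
shape = (20, 32)
A = blur_matrix(shape, gaussian_kernel(3, 0.9))
b = rng.random(shape).ravel()                  # stand-in for the grayscale target
x = rng.integers(0, 2, size=b.size).astype(float)
print("NMSE of a random binary x:", nmse(A, x, b))
```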

Figure 1: True image, binary representation, and smoothed reconstruction.

The original clown image, the binary image, and the smoothed binary image are shown in Figure 1, with penalty parameter λ = 100. Initially, we started with a rather large penalty to encourage ||Ax − b|| to be as small as possible, at the expense of less sparse results. On average, the binary images had 30-35% nonzero entries in the solution before being blurred. With λ = 100 we obtained the results shown in Table 1 for the different kernels. Note that smaller NMSE values indicate better performance.

In an attempt to further improve our results, we combined the blurred images Ax in a couple of different ways.

Table 1: NMSE results for various smoothing kernels (smaller is better), for Gaussian kernels of size 3x3, 5x5, and 7x7 at three different standard deviations, and for disc kernels of size 3x3, 5x5, 7x7, and 9x9 (standard deviation not applicable).

The first was a simple averaging combination, in which we took the mean of {A_1 x_1, A_2 x_2, ...} and measured the difference between that mean and the original image. The other combination was formed by taking the element-wise maximum over {A_1 x_1, A_2 x_2, ...} and using that as the final value for each entry of Ax. The averaging combination proved more effective with a larger penalty, when the individual results were already quite good, although there the combination made little difference. The maximum combination was more effective with the sparser images corresponding to a lower penalty, producing a significant improvement in error in those cases. We believe that in the sparse case this can be thought of as kernel selection: the best-shaped kernel for a given pixel region is selected in the binary representation and then blurred out by the appropriate operation.

We combined the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.9 together with the 5x5 and 7x7 discs, and separately the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.8 together with the 3x3 and 9x9 discs, computing the max-combination and average-combination errors for each grouping. In general, with a large penalty, the average-combination error was smaller than the max-combination error.
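The two combination rules are straightforward to express in code. The sketch below (our own illustrative function, not the workshop script) forms the average and element-wise-maximum combinations of several blurred reconstructions A_k x_k and scores each against the target b.

```python
import numpy as np

def combine_and_score(blurred_list, b):
    """Given blurred reconstructions [A_1 x_1, A_2 x_2, ...] and target b,
    return NMSE of the mean combination and of the element-wise max combination."""
    stack = np.stack(blurred_list)            # shape (num_kernels, num_pixels)
    avg_combo = stack.mean(axis=0)
    max_combo = stack.max(axis=0)
    nmse = lambda v: np.linalg.norm(v - b) / np.linalg.norm(b)
    return nmse(avg_combo), nmse(max_combo)

# Example with three synthetic reconstructions of a random target.
rng = np.random.default_rng(0)
b = rng.random(1000)
blurred = [b + 0.1 * rng.normal(size=b.size) for _ in range(3)]
avg_err, max_err = combine_and_score(blurred, b)
print(f"average-combination NMSE: {avg_err:.4f}, max-combination NMSE: {max_err:.4f}")
```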

The best result in the large-penalty experiments came from the average combination, which is visually rather close to the original image and is shown in Figure 2.

Figure 2: Average-combination of kernels, λ = 100.

We also used a small penalty to encourage sparseness, making the ||x||_1 term more influential in the minimization problem. With penalty λ = 0.7, the results had on average 10% nonzero entries in the binary solutions. Increasing sparseness further is possible with an even lower penalty, but at the expense of a large increase in error. As before, we combined the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.9 together with the 5x5 and 7x7 discs, and the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.8 together with the 3x3 and 9x9 discs, computing the max-combination and average-combination errors for each grouping. In general, with this smaller penalty, the average-combination error was larger than the max-combination error. The best result in this case came from the max combination and is shown in Figure 3. However, the large sparseness results in poor image reconstruction.

Overall, this approach is valid as an image compression and reconstruction technique, at least with the larger penalty, since the method converts a grayscale (8-bit) image into a binary (1-bit) image, a reduction in memory storage of 87.5%.

Figure 3: Max-combination of kernels, λ = 0.7.

The remaining memory required to reconstruct the image to the 86% accuracy found above is at most two or three parameters: the type, the size, and possibly the standard deviation of the kernel. Thus we have accomplished a large reduction in memory with only a small loss in image accuracy.

Having examined exact NMSE results for two specific penalty values, we now investigate the relationship between the penalty on the least squares term in the objective function, the sparsity of the binary images, and the NMSE of the blurred images over a range of penalty values. Sparsity, as depicted below, is the number of nonzero elements (ones) in the binary image divided by the total number of pixels (64000). As the actual error and sparsity varied little between different types of Gaussian kernels, a 3x3 Gaussian kernel with standard deviation 0.9 was used to produce these results.

First we look at the relationship between penalty and sparsity, shown in Figure 4. The sparsity starts quite low and rapidly levels out as the penalty increases. This is unsurprising: with a very low penalty our minimization problem simply minimizes the number of ones in the image with no constraint, while with a higher penalty the least squares term dominates and we effectively ignore the sparsity component of the objective function.

Figure 4: Sparsity vs. penalty.

Next, we compare the penalty and the NMSE of the blurred image, shown in Figure 5. Again, the results are not surprising. The least squares term in our objective function is essentially the error measured by the NMSE, so as we increase the penalty to weight the objective function more heavily towards the least squares term, we see a corresponding decrease in the NMSE. For both of the above plots, applying a log transform to the penalty reveals roughly, though not exactly, inverse relationships, as shown in Figure 6.

Finally, we look at the relationship between sparsity and error, shown in Figure 7. Unsurprisingly, as sparsity decreases (the fraction of nonzeros goes up) we see a decrease in NMSE. Ultimately, the increase in sparsity (a lower nonzero fraction) does not justify the corresponding increase in error. While 5% sparsity would take a sixth of the memory of 30% sparsity, the tradeoff between about 20% error for the less sparse image and about 90% error for the sparser image is not nearly worth it. By compressing our image to a binary image we have already accomplished a significant memory reduction with minimal increase in error, so a further, smaller, reduction in memory usage does not justify a massive decrease in image quality.

Figure 5: NMSE vs. penalty.

Figure 6: NMSE vs. log(penalty).

Figure 7: NMSE vs. sparsity.

6 Regression techniques

In this section we discuss our explorations of other image reconstruction methods, used to compare and contrast with the SA implementation developed above. In particular, we considered both ordinary least squares and ridge regression (Tikhonov regularization) to reconstruct the image.

6.1 Least squares and ridge regression

Let A ∈ R^{m×n} be the measurement matrix, with m > n and rank(A) = n. Unless the measurements are perfect, the image vector b lies outside the column space of A. It is therefore in general impossible to find an element x ∈ R^n that solves the overdetermined system

(3)  Ax = b

exactly, even though the underlying noiseless target lies in the range of A. One can still obtain an approximate solution to (3) by solving the minimization problem

(4)  x̂_LS = arg min_{x ∈ R^n} ||Ax − b||_2^2.

Figure 8: The projection p = Ax̂ is closest to b, so x̂ minimizes E = ||b − Ax||_2^2.

This least squares solution is given by x̂_LS := (A^T A)^{-1} A^T b. Note that finding x̂_LS involves inverting the matrix A^T A. If m ≥ n but the matrix A is ill-conditioned, then A^T A is singular or nearly singular. Moreover, if x̂_LS has all n components nonzero, then it is not suitable as a sparse vector for explaining the data. To give preference to a particular solution with desirable properties, one can solve the regularized problem

(5)  min_{x ∈ R^n} ||Ax − b||_2^2 + λ ||x||_2^2,

where λ > 0 is a fixed balancing parameter. In Figure 9, the solid blue area represents the constraint region induced by ||x||_2^2, while the red ellipses are the contours of the least squares error function. The ridge regression solution to (5) is given by

x̂_Ridge = (A^T A + λ I_n)^{-1} A^T b.

The minimum eigenvalue of A^T A + λ I_n is greater than or equal to λ, which guarantees that A^T A + λ I_n is invertible. If the measurement matrix A is augmented with the n additional rows √λ I_n, and the vector b with n zeros, then (5) can be viewed as an ordinary least squares problem on the augmented data. That is, (5) is equivalent to solving

(6)  x̂_Ridge = arg min_{x ∈ R^n} || [A; √λ I_n] x − [b; 0_n] ||_2^2,

where [A; √λ I_n] denotes A stacked on top of √λ I_n.
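Both estimators are easy to compute directly. The following sketch (illustrative names and data; λ chosen arbitrarily) solves the normal equations for least squares and ridge regression and verifies the augmented-system formulation (6) numerically.

```python
import numpy as np

def least_squares(A, b):
    """x_LS = (A^T A)^{-1} A^T b, computed via a linear solve rather than an explicit inverse."""
    return np.linalg.solve(A.T @ A, A.T @ b)

def ridge(A, b, lam):
    """x_Ridge = (A^T A + lam I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def ridge_augmented(A, b, lam):
    """Equivalent formulation (6): ordinary least squares on the augmented system."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
lam = 2.0
assert np.allclose(ridge(A, b, lam), ridge_augmented(A, b, lam))
print("least squares:", least_squares(A, b)[:3])
print("ridge        :", ridge(A, b, lam)[:3])
```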

Figure 9: Ridge regression estimate in R^2.

6.2 Implementation

Recall that the image obtained from SA has binary entries. Let x ∈ {0,1}^n be the binary image obtained from SA, and denote by T := {i : x_i = 1} the support set of x. We form a truncated matrix A_T from A by keeping the columns of A indexed by T, so that A_T ∈ R^{m×|T|}, where |T| is the cardinality of the set T. We use A_T to solve the truncated least squares problem

(7)  x̂_TLS = arg min_{x ∈ R^{|T|}} ||A_T x − b||_2^2,

and we replace x_T by x̂_TLS. Next, we use truncated ridge regression and solve the minimization problem

(8)  x̂_TR = arg min_{x ∈ R^{|T|}} ||A_T x − b||_2^2 + λ ||x||_2^2.

As before, we replace x_T by x̂_TR. Figures 10 and 11 show a comparison of the results from our SA implementation, least squares, and ridge regression. Observe that the SA implementation is quite successful in comparison with these other two methods, and that truncated ridge regression performs better than truncated least squares.
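A sketch of this truncation step, in our own notation rather than the original code: restrict A to the columns indexed by the support T of the SA solution, solve the truncated least squares or ridge problem, and write the refitted coefficients back into the support positions.

```python
import numpy as np

def truncated_fit(A, b, x_binary, lam=0.0):
    """Refit the nonzero entries of a binary SA solution.
    lam = 0 gives truncated least squares (7); lam > 0 gives truncated ridge (8)."""
    T = np.flatnonzero(x_binary)            # support set T = {i : x_i = 1}
    A_T = A[:, T]                           # keep only the columns indexed by T
    k = len(T)
    coeffs = np.linalg.solve(A_T.T @ A_T + lam * np.eye(k), A_T.T @ b)
    x = np.zeros(A.shape[1])
    x[T] = coeffs                           # replace x_T by the refitted values
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 30))
x_bin = (rng.random(30) < 0.3).astype(float)   # stand-in for the SA output
b = A @ x_bin + 0.05 * rng.normal(size=60)
x_tls = truncated_fit(A, b, x_bin, lam=0.0)    # truncated least squares
x_tr = truncated_fit(A, b, x_bin, lam=1.0)     # truncated ridge regression
print("residual TLS  :", np.linalg.norm(A @ x_tls - b))
print("residual ridge:", np.linalg.norm(A @ x_tr - b))
```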

Figure 10: (a) Original image, (b) SA reconstruction.

Figure 11: Comparison between (a) truncated least squares and (b) truncated ridge regression.

7 SPGL1: A Solver for Large-scale Sparse Optimization

In this section we discuss our use of SPGL1, a standard large-scale sparse solver (see reference [3]), and compare its reconstructions of the image against our SA results.

7.1 Outline of the method and results

Solving the system Ax = b, where A ∈ R^{m×n} with m ≪ n, suffers from ill-posedness. The classic sparse convex optimization problems that attempt to solve this system are

1. min_x ||x||_1 subject to Ax = b. (BP)
2. min_x ||x||_1 subject to ||Ax − b||_2 ≤ σ. (BP_σ)
3. min_x ||Ax − b||_2 subject to ||x||_1 ≤ τ. (LS_τ)

Homotopy approaches, Basis Pursuit Denoising (BPDN) formulated as a cone program, BPDN formulated as a linear program, and projected gradient methods are the classic approaches to solving these problems. If b ∈ R(A) and b ≠ 0, denote by x_τ the optimal solution of (LS_τ). In SPGL1 one considers the single-parameter function

φ(τ) = ||r_τ||_2,  with r_τ := b − A x_τ,

which gives the optimal value of (LS_τ) for each τ > 0. The method then amounts to finding a root of φ(τ) = σ.
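To make the root-finding idea concrete, here is a simplified sketch, not the SPGL1 code itself: the subproblem (LS_τ) is solved by projected gradient with an ℓ1-ball projection, φ(τ) is the resulting residual norm, and τ is updated with the Newton step and the derivative formula φ'(τ) = −||A^T r_τ||_∞ / ||r_τ||_2 from the theorem stated below; all function names, tolerances, and iteration counts are illustrative.

```python
import numpy as np

def project_l1(v, tau):
    """Euclidean projection of v onto the l1 ball of radius tau."""
    if tau <= 0:
        return np.zeros_like(v)
    if np.abs(v).sum() <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.flatnonzero(u - (css - tau) / np.arange(1, len(u) + 1) > 0)[-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def solve_lasso_tau(A, b, tau, iters=500):
    """Approximately solve (LS_tau): min ||Ax - b||_2 s.t. ||x||_1 <= tau,
    by projected gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = project_l1(x - step * (A.T @ (A @ x - b)), tau)
    return x

def pareto_root_find(A, b, sigma, tau0=0.0, newton_iters=10):
    """Find tau with phi(tau) = sigma via the Newton update
    tau_{k+1} = tau_k + (sigma - phi(tau_k)) / phi'(tau_k)."""
    tau = tau0
    x = np.zeros(A.shape[1])
    for _ in range(newton_iters):
        x = solve_lasso_tau(A, b, tau)
        r = b - A @ x
        phi = np.linalg.norm(r)
        if phi <= sigma or phi < 1e-12:
            break
        dphi = -np.linalg.norm(A.T @ r, np.inf) / phi   # phi'(tau), from duality
        tau = tau + (sigma - phi) / dphi
    return x, tau

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=40)
x_hat, tau = pareto_root_find(A, b, sigma=0.1)
print("final tau:", tau, " residual:", np.linalg.norm(A @ x_hat - b))
```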

In order to derive the dual of (LS_τ), the method considers the equivalent problem

min_{r,x} ||r||_2  subject to  Ax + r = b, ||x||_1 ≤ τ.

The dual of the above problem is

max_{y, λ} min_{r,x} { ||r||_2 − y^T(Ax + r − b) + λ(||x||_1 − τ) },  for λ ≥ 0.

Finally, the dual of (LS_τ) reduces to

max_{y, λ} b^T y − τλ  subject to  ||y||_2 ≤ 1, ||A^T y||_∞ ≤ λ.

Theorem. With this setup, the following holds:

1. The function φ is convex and non-increasing.
2. For all τ ∈ (0, τ_BP), φ is continuously differentiable, φ'(τ) = −λ_τ, and the optimal dual variable is λ_τ = ||A^T y_τ||_∞, where y_τ = r_τ / ||r_τ||_2.
3. For τ ∈ [0, τ_BP], ||x_τ||_1 = τ, and φ is strictly decreasing.

The algorithm. Based on the Newton iteration, compute

τ_{k+1} = τ_k + Δτ_k,  where Δτ_k = (σ − φ(τ_k)) / φ'(τ_k).

In Figure 12 we contrast the results from SA minimization with the SPGL1 results. It is interesting to note that SPGL1 immediately gives a greyscale image, as it optimizes over a continuous range of x-values, while the SA result shown here uses only binary values. The reconstructed image in Figure 1 is a better indication of the good results obtainable with SA.

8 Other attempts

While working on the project, we had the idea that we might be able to use SA to remove systematic blur from an image, such as the simulated motion blur shown in Figure 13. It was an interesting idea, but it is not clear that we obtained useful results, so we simply mention it here as an idea possibly worth pursuing.

Figure 12: Comparison between (a) SA and (b) SPGL1 minimization.

Figure 13: Blurred image.

9 Tuning and Optimizing the Algorithm

A significant concern with a real implementation on the quantum optimizer is that the computing hardware has a limited number of nodes to represent data in the Ising model. For instance, current hardware from D-Wave limits this to about 1000 nodes. The image compression algorithm requires hundreds of thousands of nodes, which is problematic for existing hardware, so we had to consider methods for breaking the large problem into smaller, computable problems. In this section we discuss an efficient way to tune and optimize the algorithm.

Recall that the original linear system is Ax = b, where A ∈ R^{m×n} and m < n. In our examples, m is very large and solving the system requires a huge amount of memory or compute nodes. However, things are better if one can find a B ∈ R^{k×m}, with k < m, such that for a predefined ε > 0 we have ||x̂ − x||_2 < ε, where x̂ solves Â x̂ = b̂ with Â = BA and b̂ = Bb.

9.1 Optimizing the Algorithm: Reducing Rows

Construct a vector b̃ from b by sorting its entries, so that b̃ = (b_(i))_{i=1}^m ∈ R^m with b_(1) ≥ b_(2) ≥ ... ≥ b_(m) ≥ 0. For a tolerance 0 < δ ≤ 1, choose b̃(1 : k) if

(9)  ||b̃(1 : k)||_2 / ||b||_2 > δ.

Let S = {i_1, ..., i_k} be the set of indices of the k selected entries of b, and construct Â ∈ R^{k×n} from the corresponding rows of A. We form B = [e_{i_1}; e_{i_2}; ...; e_{i_k}], where e_{i_j} is a 1 × m vector with a 1 in the i_j-th position and 0 elsewhere. To summarize, B acts as an indicator matrix that constructs Â = BA based on (9). We then use SA on Â x̂ = b̂ to reconstruct the image, as shown in Figure 14.

Figure 14: Reduced-row SA reconstruction (number of rows: 10843).

Indeed this is a memory-efficient reconstruction. Originally A had 64000 rows; using the indicator matrix B with δ = 0.7 reduced the number of rows to 10843. On the other hand, we also sacrificed the quality of the reconstructed image.

For a better reconstruction we target smaller blocks of the image instead of the entire matrix. We divide b into sub-vectors b_1, ..., b_p, partition A into corresponding blocks A_1, ..., A_p, solve each block system A_i x_i = b_i, and assemble the recovered image as x_Recovered = (x_i : 1 ≤ i ≤ p). We then apply the row-reduction idea to each block: for each block, using the previous technique, we solve Â_i x̂_i = b̂_i, where Â_i = B_i A_i and b̂_i = B_i b_i, and we obtain the recovered image x_Recovered = (x̂_i : 1 ≤ i ≤ p). For a predefined ε > 0, we can guarantee Σ_{i=1}^p ||x_i − x̂_i||_2 < ε.

The result of the row-reduced block reconstruction is shown in Figure 15. We partitioned the image array into 40 sub-matrices, compressed each block, and reconstructed. One can notice the partition lines in the SA-reconstructed image in Figure 15.
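Before turning to overlapping blocks, here is a sketch of the row-selection step of criterion (9) in Python (our own rendering; the threshold and test data are illustrative): sort the entries of b by magnitude, keep the smallest leading block whose norm exceeds δ||b||_2, and form the corresponding selection matrix B.

```python
import numpy as np
import scipy.sparse as sp

def row_reduction(A, b, delta):
    """Keep the smallest leading block of the magnitude-sorted entries of b whose
    norm exceeds delta * ||b||_2 (criterion (9)); return (A_hat, b_hat, B)."""
    order = np.argsort(-np.abs(b))                 # indices of b by decreasing magnitude
    frac = np.sqrt(np.cumsum(np.abs(b[order]) ** 2)) / np.linalg.norm(b)
    k = min(int(np.searchsorted(frac, delta)) + 1, len(b))
    keep = order[:k]
    # B is a k x m selection matrix whose rows are e_{i_1}, ..., e_{i_k}.
    B = sp.csr_matrix((np.ones(k), (np.arange(k), keep)), shape=(k, len(b)))
    return B @ A, b[keep], B

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
b = rng.normal(size=200) ** 3                      # heavy-tailed, so a few rows dominate
A_hat, b_hat, B = row_reduction(A, b, delta=0.7)
print(f"rows reduced from {A.shape[0]} to {A_hat.shape[0]}")
```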

Figure 15: Row-reduced block SA reconstruction.

To avoid the partition lines, we use overlapping block partitions of the image together with the row-reduced SA reconstruction technique. At the end we merge the reconstructed overlapping blocks and obtain a much better image, as shown in Figure 16.

Figure 16: Overlapping-blocks row-reduced SA reconstruction.
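The overlapping-block merge can be sketched as follows (block size, overlap, and the identity per-block "reconstruction" standing in for the row-reduced SA solve are all illustrative): process overlapping tiles independently, accumulate the results, and average wherever tiles overlap, which is what suppresses the visible partition lines.

```python
import numpy as np

def reconstruct_with_overlap(image, block=64, overlap=8, reconstruct_block=None):
    """Split `image` into overlapping blocks, reconstruct each one independently,
    and merge by averaging wherever blocks overlap."""
    if reconstruct_block is None:
        reconstruct_block = lambda blk: blk        # placeholder for the per-block SA solve
    H, W = image.shape
    out = np.zeros_like(image, dtype=float)
    weight = np.zeros_like(image, dtype=float)
    step = block - overlap
    for i in range(0, H, step):
        for j in range(0, W, step):
            i2, j2 = min(i + block, H), min(j + block, W)
            out[i:i2, j:j2] += reconstruct_block(image[i:i2, j:j2])
            weight[i:i2, j:j2] += 1.0
    return out / weight                            # average the overlapping contributions

rng = np.random.default_rng(0)
img = rng.random((200, 320))                       # same size as the clown image
merged = reconstruct_with_overlap(img, block=64, overlap=8)
assert np.allclose(merged, img)                    # identity per-block solve reproduces the image
```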

10 Conclusion

To summarize, our compression algorithm accomplished a large reduction in memory while maintaining a minimal loss in image accuracy when the penalty was large enough. We obtained the best reconstruction by using an average of kernels. We found that by encouraging sparsity (decreasing the penalty) we lose accuracy. After trying to reconstruct the image with the SPGL1 and regression techniques, we found no improvement over the kernel-based reconstruction. Now that we have developed a working algorithm for image compression, a natural next step is to test this algorithm on a quantum annealer. This is where our optimization methods may come in handy when implementing the algorithm on a D-Wave system.

References

[1] Z. Bian, F. Chudak, W. G. Macready, and G. Rose, The Ising model: teaching an old problem new tricks, D-Wave Systems Technical Report, Aug. 30, 2010.

[2] E. van den Berg and M. P. Friedlander, Probing the Pareto frontier for basis pursuit solutions, SIAM J. on Scientific Computing, vol. 31, no. 2, pp. 890-912, Nov. 2008.

[3] E. van den Berg and M. P. Friedlander, SPGL1: A solver for large-scale sparse reconstruction, software, June 2007.

[4] V. Cerny, Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm, J. Optimization Theory and Applications, vol. 45, no. 1, pp. 41-51, Jan. 1985.

[5] S. V. Isakov, I. N. Zintchenko, T. F. Rønnow, and M. Troyer, Optimized simulated annealing for Ising spin glasses, Computer Physics Communications, vol. 192, pp. 265-271, Jul. 2015.
