Image Compression Using Simulated Annealing


Aritra Dutta, Geonwoo Kim, Meiqin Li, Carlos Ortiz Marrero, Mohit Sinha, Cole Stiegler

August 5-14, 2015
Mathematical Modeling in Industry XIX, Institute for Mathematics and its Applications
Proposed by: 1QB Information Technologies
Mentors: Michael P. Lamoureux (Pacific Institute for the Mathematical Sciences), Pooya Ronagh (1QB Information Technologies, Vancouver, B.C.)

Abstract. Over the course of this two-week workshop, we developed an algorithm for image compression using an optimized implementation of simulated annealing intended to solve Ising spin problems. Our motivation is to be able to execute this algorithm using the 1QBit interface for the D-Wave quantum computational hardware system. We also explore a combination of simulated annealing and regression techniques and compare their performance. Finally, we discuss ways to optimize our algorithm in order to make it feasible for a D-Wave architecture.

Affiliations: University of Central Florida; Pusan National University; Texas A&M University; University of Houston; University of Minnesota, Twin Cities; University of Iowa.

1 Introduction

Simulated annealing (SA) is a commonly used algorithm for heuristic optimization. Inspired by the study of thermal processes, this algorithm has been particularly successful at providing approximate solutions to NP-hard problems [4]. The algorithm essentially performs a Monte Carlo simulation together with a transition function. At the beginning of the simulation the algorithm is encouraged to explore the landscape of the objective function, while the probability of abrupt transitions is slowly lowered. If the process is carried out slowly enough, then after a number of repetitions one can hope to find the optimal value.

Our goal is to apply this algorithm to image compression and image reconstruction. We use an implementation of SA optimized to solve Ising spin type problems [5]. The algorithm's original purpose was to provide a highly optimized implementation of SA and compare its performance against a D-Wave device. It turns out that finding the lowest energy state in an Ising model is equivalent to solving a quadratic unconstrained binary optimization (QUBO) problem. This investigation is motivated by the assumption that our algorithm could be implemented on the processor produced by D-Wave Systems, Inc. An advantage of the D-Wave system is that it can produce a spectrum of optimal and suboptimal answers to a QUBO/Ising optimization problem, rather than merely the lowest energy point. This work grew out of the question of whether the sparse recovery problem is feasible for the D-Wave architecture.

2 The Ising Model

The Ising spin model is a widely used model that can describe any system consisting of individual elements interacting via pairwise interactions [1].

Definition 2.1 (Ising Problem). Let G = (V, E) be a graph on n vertices with vertex set V and edge set E, and let s_i ∈ {−1, 1} for i ∈ V. For a given configuration s = (s_1, s_2, ..., s_n) ∈ {−1, 1}^n, the energy of the system is given by

H(s) = Σ_{k ∈ V} h_k s_k + Σ_{(i,j) ∈ E} J_ij s_i s_j = ⟨h, s⟩ + ⟨s, Js⟩,

where h = (h_1, ..., h_n) ∈ R^n and J = (J_ij) ∈ M_n(R).
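To make these ingredients concrete, the following is a minimal Python sketch, not the optimized solver of [5], of evaluating the energy H(s) and running a basic single-spin-flip simulated annealing loop on a small random Ising instance; the function names, cooling schedule, and parameter values are our own illustrative choices.

```python
import numpy as np

def ising_energy(s, h, J):
    """Energy H(s) = <h, s> + <s, J s> for spins s in {-1, +1}^n."""
    return h @ s + s @ (J @ s)

def simulated_annealing(h, J, n_sweeps=2000, T0=5.0, T1=0.01, rng=None):
    """Basic single-spin-flip SA with a geometric cooling schedule."""
    rng = np.random.default_rng(rng)
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    for T in np.geomspace(T0, T1, n_sweeps):
        for i in rng.permutation(n):
            # Energy change of flipping spin i (J assumed symmetric with zero diagonal).
            dE = -2 * s[i] * (h[i] + 2 * (J[i] @ s))
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i] = -s[i]
    return s

# Tiny random instance, just to exercise the routine.
rng = np.random.default_rng(0)
n = 20
J = np.triu(rng.normal(size=(n, n)), k=1)
J = (J + J.T) / 2                      # symmetric, zero diagonal
h = rng.normal(size=n)
s_best = simulated_annealing(h, J, rng=1)
print("energy:", ising_energy(s_best, h, J))
```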

The simulated annealing implementation [5] we use is designed to minimize the function H(s).

3 Image Compression/Reconstruction Problem

The underlying optimization problem studied here is to find a sparse solution to the underdetermined linear system Ax = b, where A is an m × n matrix, b is an m-vector, and m ≤ n. The problem can be written as

(1)  minimize_{x ∈ {0,1}^n} ||x||_0  subject to  ||Ax − b||_2^2 ≤ ε.

Note that for a binary vector x ∈ {0,1}^n, the norms ||x||_0 and ||x||_1 are identical. This problem can be interpreted as an image compression problem, where the goal is to find a sparse binary vector x such that its image under A is close to the original image b. Alternatively, one can think of it as an image reconstruction problem, where the goal is to recover a sparse binary vector x that was corrupted by A, given access only to the corrupted image b. Here we discuss only the image compression problem.

4 QUBO and the Ising Model

Problem (1) can be relaxed to the following unconstrained optimization problem,

(2)  min_{x ∈ {0,1}^n} ||x||_0 + λ ||Ax − b||_2^2,

for a large penalty parameter λ > 0. Let e := (1, ..., 1) ∈ {0,1}^n and consider the following reformulation of (2):

||x||_0 + λ||Ax − b||_2^2 = ⟨e, x⟩ + λ⟨Ax − b, Ax − b⟩
= ⟨e, x⟩ + λ⟨Ax, Ax⟩ − 2λ⟨Ax, b⟩ + λ⟨b, b⟩
= ⟨e, x⟩ + λ⟨A^T Ax, x⟩ − 2λ⟨Ax, b⟩ + λ⟨b, b⟩
= ⟨e − 2λA^T b, x⟩ + λ⟨A^T Ax, x⟩ + λ⟨b, b⟩.

Hence (2) is a QUBO problem.
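As an illustration of the reformulation above, the short sketch below (Python with NumPy; the helper name compression_qubo and the test values are ours, not taken from the workshop code) assembles the QUBO data Q = λA^T A and c = e − 2λA^T b and checks the identity against the original objective on a random binary vector.

```python
import numpy as np

def compression_qubo(A, b, lam):
    """Return (Q, c, const) so that, for binary x,
    ||x||_0 + lam * ||A x - b||^2  ==  x @ Q @ x + c @ x + const."""
    Q = lam * (A.T @ A)                               # quadratic term  lam <A^T A x, x>
    c = np.ones(A.shape[1]) - 2 * lam * (A.T @ b)     # linear term  <e - 2 lam A^T b, x>
    const = lam * (b @ b)                             # constant  lam <b, b>
    return Q, c, const

# Sanity check on a tiny random instance.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
b = rng.normal(size=6)
lam = 100.0
Q, c, const = compression_qubo(A, b, lam)
x = rng.integers(0, 2, size=4).astype(float)
lhs = x.sum() + lam * np.sum((A @ x - b) ** 2)        # ||x||_0 = sum(x) for binary x
rhs = x @ Q @ x + c @ x + const
assert np.isclose(lhs, rhs)
```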

Definition 4.1 (QUBO). Let g(x) = ⟨x, Qx⟩ + ⟨c, x⟩, where Q is a symmetric matrix, c ∈ R^n, and x = (x_1, ..., x_n) ∈ {0,1}^n. The quadratic unconstrained binary optimization problem is to minimize the function g(x) over {0,1}^n.

In order to use the solver we need to formulate our problem as an Ising problem. Consider the change of variables

x = (1/2)(s + e), with s ∈ {−1,1}^n, so that x ∈ {0,1}^n.

Equation (2) then becomes

min_{s ∈ {−1,1}^n} (1/2) e^T (s + e) + λ ||(1/2) A(s + e) − b||_2^2.

5 Results

In this section we discuss our implementation of the problem described in Section 3, illustrating how our algorithm effectively reduces the image size while maintaining a reasonable level of quality. Recall that our problem is

min_{x ∈ {0,1}^n} ||x||_1 + λ ||Ax − b||_2^2,

where b is the original image and A is our blurring operator, a convolution expressed explicitly as a sparse matrix. To define the convolution we must first choose an appropriate kernel. We found that Gaussian and averaging disk kernels performed significantly better than other kernel types, so we limit our discussion to these. We use the normalized mean square error (NMSE) as our performance measure,

NMSE = ||Ax − b||_2 / ||b||_2,

where b is an array of grayscale values representing a target image and Ax is the blurring convolution applied to the binary compression x.

The following computations were performed on a MacBook Pro running OS X with a 2.3 GHz Intel Core i7 and 16 GB of memory. For our images, we used Matlab's built-in clown and mandrill images. Each took approximately 40 seconds for the entire computation. Most of the computation time was spent in the C solver written by Troyer et al. [5]. Since the solver is standing in for the quantum annealer, the long computation time is not a concern; true quantum hardware would execute this step much faster.
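For reference, here is one possible way, a sketch only and not the Matlab code actually used, to build the blurring operator A as a sparse convolution matrix from a Gaussian kernel and to evaluate the NMSE defined above; the helper names and the image size in the example are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def gaussian_kernel(size, sigma):
    """size x size Gaussian kernel, normalized to sum to 1."""
    r = np.arange(size) - (size - 1) / 2
    g = np.exp(-r**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def blur_matrix(shape, kernel):
    """Sparse matrix A such that A @ x.ravel() applies `kernel` to the
    2D image x, with zero padding at the borders."""
    H, W = shape
    kh, kw = kernel.shape
    oh, ow = kh // 2, kw // 2
    rows, cols, vals = [], [], []
    for i in range(H):
        for j in range(W):
            r = i * W + j
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = i + di - oh, j + dj - ow
                    if 0 <= ii < H and 0 <= jj < W:
                        rows.append(r)
                        cols.append(ii * W + jj)
                        vals.append(kernel[di, dj])
    return sp.csr_matrix((vals, (rows, cols)), shape=(H * W, H * W))

def nmse(A, x, b):
    """Normalized mean square error ||A x - b||_2 / ||b||_2."""
    return np.linalg.norm(A @ x - b) / np.linalg.norm(b)

# Small example: a 3x3 Gaussian with sigma = 0.9 applied to a random binary image.
rng = np.random.default_rng(0)
shape = (20, 32)
A = blur_matrix(shape, gaussian_kernel(3, 0.9))
b = rng.random(shape).ravel()                  # stand-in for the grayscale target
x = rng.integers(0, 2, size=b.size).astype(float)
print("NMSE of a random binary x:", nmse(A, x, b))
```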

Figure 1: True image, binary representation, and smoothed reconstruction.

The original clown image, the binary image, and the smoothed binary image are shown in Figure 1, with penalty parameter λ = 100. Initially, we started with a rather large penalty to encourage ||Ax − b|| to be as small as possible, at the expense of less sparse results. On average, the binary images had 30-35% nonzero entries in the solution before being blurred. With λ = 100 we obtained the results shown in Table 1 for the different kernels. Note that smaller NMSE values indicate better performance.

In an attempt to further improve our results, we combined the blurred images Ax in a couple of different ways.

Table 1: NMSE results for various smoothing kernels (smaller is better), for Gaussian kernels of size 3x3, 5x5, and 7x7 at three different standard deviations, and for disc kernels of size 3x3, 5x5, 7x7, and 9x9 (standard deviation not applicable).

The first was a simple averaging combination, in which we took the mean of {A_1 x_1, A_2 x_2, ...} and measured the difference between that mean and the original image. The other combination was formed by taking the element-wise maximum over {A_1 x_1, A_2 x_2, ...} and using that as the final value for each entry of Ax. The averaging combination proved more effective with a larger penalty, when the individual results were already quite good, although there the combination made little difference. The maximum combination was more effective with the sparser images corresponding to a lower penalty, producing a significant improvement in error in those cases. We believe that in the sparse case this can be thought of as kernel selection: the best-shaped kernel for a given pixel region is selected in the binary representation and then blurred out by the appropriate operation.

We combined the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.9 together with the 5x5 and 7x7 discs, and separately the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.8 together with the 3x3 and 9x9 discs, computing the max-combination and average-combination errors for each grouping. In general, with a large penalty, the average-combination error was smaller than the max-combination error.
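The two combination rules are straightforward to express in code. The sketch below (our own illustrative function, not the workshop script) forms the average and element-wise-maximum combinations of several blurred reconstructions A_k x_k and scores each against the target b.

```python
import numpy as np

def combine_and_score(blurred_list, b):
    """Given blurred reconstructions [A_1 x_1, A_2 x_2, ...] and target b,
    return NMSE of the mean combination and of the element-wise max combination."""
    stack = np.stack(blurred_list)            # shape (num_kernels, num_pixels)
    avg_combo = stack.mean(axis=0)
    max_combo = stack.max(axis=0)
    nmse = lambda v: np.linalg.norm(v - b) / np.linalg.norm(b)
    return nmse(avg_combo), nmse(max_combo)

# Example with three synthetic reconstructions of a random target.
rng = np.random.default_rng(0)
b = rng.random(1000)
blurred = [b + 0.1 * rng.normal(size=b.size) for _ in range(3)]
avg_err, max_err = combine_and_score(blurred, b)
print(f"average-combination NMSE: {avg_err:.4f}, max-combination NMSE: {max_err:.4f}")
```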

The best result in the large-penalty experiments came from the average combination, which is visually rather close to the original image and is shown in Figure 2.

Figure 2: Average-combination of kernels, λ = 100.

We also used a small penalty to encourage sparseness, making the ||x||_1 term more influential in the minimization problem. With penalty λ = 0.7, the results had on average 10% nonzero entries in the binary solutions. Increasing sparseness further is possible with an even lower penalty, but at the expense of a large increase in error. As before, we combined the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.9 together with the 5x5 and 7x7 discs, and the 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.8 together with the 3x3 and 9x9 discs, computing the max-combination and average-combination errors for each grouping. In general, with this smaller penalty, the average-combination error was larger than the max-combination error. The best result in this case came from the max combination and is shown in Figure 3. However, the large sparseness results in poor image reconstruction.

Overall, this approach is valid as an image compression and reconstruction technique, at least with the larger penalty, since the method converts a grayscale (8-bit) image into a binary (1-bit) image, a reduction in memory storage of 87.5%.

Figure 3: Max-combination of kernels, λ = 0.7.

The remaining memory required to reconstruct the image to the 86% accuracy found above is at most two or three parameters: the type, the size, and possibly the standard deviation of the kernel. Thus we have accomplished a large reduction in memory with only a small loss in image accuracy.

Having examined exact NMSE results for two specific penalty values, we now investigate the relationship between the penalty on the least squares term in the objective function, the sparsity of the binary images, and the NMSE of the blurred images over a range of penalty values. Sparsity, as depicted below, is the number of nonzero elements (ones) in the binary image divided by the total number of pixels (64000). As the actual error and sparsity varied little between different types of Gaussian kernels, a 3x3 Gaussian kernel with standard deviation 0.9 was used to produce these results.

First we look at the relationship between penalty and sparsity, shown in Figure 4. The sparsity starts quite low and rapidly levels out as the penalty increases. This is unsurprising: with a very low penalty our minimization problem simply minimizes the number of ones in the image with no constraint, while with a higher penalty the least squares term dominates and we effectively ignore the sparsity component of the objective function.

Figure 4: Sparsity vs. penalty.

Next, we compare the penalty and the NMSE of the blurred image, shown in Figure 5. Again, the results are not surprising. The least squares term in our objective function is essentially the error measured by the NMSE, so as we increase the penalty to weight the objective function more heavily towards the least squares term, we see a corresponding decrease in the NMSE. For both of the above plots, applying a log transform to the penalty reveals roughly, though not exactly, inverse relationships, as shown in Figure 6.

Finally, we look at the relationship between sparsity and error, shown in Figure 7. Unsurprisingly, as sparsity decreases (the fraction of nonzeros goes up) we see a decrease in NMSE. Ultimately, the increase in sparsity (a lower nonzero fraction) does not justify the corresponding increase in error. While 5% sparsity would take a sixth of the memory of 30% sparsity, the tradeoff between about 20% error for the less sparse image and about 90% error for the sparser image is not nearly worth it. By compressing our image to a binary image we have already accomplished a significant memory reduction with minimal increase in error, so a further, smaller, reduction in memory usage does not justify a massive decrease in image quality.

Figure 5: NMSE vs. penalty.

Figure 6: NMSE vs. log(penalty).

Figure 7: NMSE vs. sparsity.

6 Regression techniques

In this section we discuss our explorations of other image reconstruction methods, used to compare and contrast with the SA implementation developed above. In particular, we considered both ordinary least squares and ridge regression (Tikhonov regularization) to reconstruct the image.

6.1 Least squares and ridge regression

Let A ∈ R^{m×n} be the measurement matrix, with m > n and rank(A) = n. Unless the measurements are perfect, the image vector b lies outside the column space of A. It is therefore in general impossible to find an element x ∈ R^n that solves the overdetermined system

(3)  Ax = b

exactly, even though the underlying noiseless target lies in the range of A. One can still obtain an approximate solution to (3) by solving the minimization problem

(4)  x̂_LS = arg min_{x ∈ R^n} ||Ax − b||_2^2.

Figure 8: The projection p = Ax̂ is closest to b, so x̂ minimizes E = ||b − Ax||_2^2.

This least squares solution is given by x̂_LS := (A^T A)^{-1} A^T b. Note that finding x̂_LS involves inverting the matrix A^T A. If m ≥ n but the matrix A is ill-conditioned, then A^T A is singular or nearly singular. Moreover, if x̂_LS has all n components nonzero, then it is not suitable as a sparse vector for explaining the data. To give preference to a particular solution with desirable properties, one can solve the regularized problem

(5)  min_{x ∈ R^n} ||Ax − b||_2^2 + λ ||x||_2^2,

where λ > 0 is a fixed balancing parameter. In Figure 9, the solid blue area represents the constraint region induced by ||x||_2^2, while the red ellipses are the contours of the least squares error function. The ridge regression solution to (5) is given by

x̂_Ridge = (A^T A + λ I_n)^{-1} A^T b.

The minimum eigenvalue of A^T A + λ I_n is greater than or equal to λ, which guarantees that A^T A + λ I_n is invertible. If the measurement matrix A is augmented with the n additional rows √λ I_n, and the vector b with n zeros, then (5) can be viewed as an ordinary least squares problem on the augmented data. That is, (5) is equivalent to solving

(6)  x̂_Ridge = arg min_{x ∈ R^n} || [A; √λ I_n] x − [b; 0_n] ||_2^2,

where [A; √λ I_n] denotes A stacked on top of √λ I_n.
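Both estimators are easy to compute directly. The following sketch (illustrative names and data; λ chosen arbitrarily) solves the normal equations for least squares and ridge regression and verifies the augmented-system formulation (6) numerically.

```python
import numpy as np

def least_squares(A, b):
    """x_LS = (A^T A)^{-1} A^T b, computed via a linear solve rather than an explicit inverse."""
    return np.linalg.solve(A.T @ A, A.T @ b)

def ridge(A, b, lam):
    """x_Ridge = (A^T A + lam I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def ridge_augmented(A, b, lam):
    """Equivalent formulation (6): ordinary least squares on the augmented system."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
lam = 2.0
assert np.allclose(ridge(A, b, lam), ridge_augmented(A, b, lam))
print("least squares:", least_squares(A, b)[:3])
print("ridge        :", ridge(A, b, lam)[:3])
```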

Figure 9: Ridge regression estimate in R^2.

6.2 Implementation

Recall that the image obtained from SA has binary entries. Let x ∈ {0,1}^n be the binary image obtained from SA, and denote by T := {i : x_i = 1} the support set of x. We form a truncated matrix A_T from A by keeping the columns of A indexed by T, so that A_T ∈ R^{m×|T|}, where |T| is the cardinality of the set T. We use A_T to solve the truncated least squares problem

(7)  x̂_TLS = arg min_{x ∈ R^{|T|}} ||A_T x − b||_2^2,

and we replace x_T by x̂_TLS. Next, we use truncated ridge regression and solve the minimization problem

(8)  x̂_TR = arg min_{x ∈ R^{|T|}} ||A_T x − b||_2^2 + λ ||x||_2^2.

As before, we replace x_T by x̂_TR. Figures 10 and 11 show a comparison of the results from our SA implementation, least squares, and ridge regression. Observe that the SA implementation is quite successful in comparison with these other two methods, and that truncated ridge regression performs better than truncated least squares.
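A sketch of this truncation step, in our own notation rather than the original code: restrict A to the columns indexed by the support T of the SA solution, solve the truncated least squares or ridge problem, and write the refitted coefficients back into the support positions.

```python
import numpy as np

def truncated_fit(A, b, x_binary, lam=0.0):
    """Refit the nonzero entries of a binary SA solution.
    lam = 0 gives truncated least squares (7); lam > 0 gives truncated ridge (8)."""
    T = np.flatnonzero(x_binary)            # support set T = {i : x_i = 1}
    A_T = A[:, T]                           # keep only the columns indexed by T
    k = len(T)
    coeffs = np.linalg.solve(A_T.T @ A_T + lam * np.eye(k), A_T.T @ b)
    x = np.zeros(A.shape[1])
    x[T] = coeffs                           # replace x_T by the refitted values
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 30))
x_bin = (rng.random(30) < 0.3).astype(float)   # stand-in for the SA output
b = A @ x_bin + 0.05 * rng.normal(size=60)
x_tls = truncated_fit(A, b, x_bin, lam=0.0)    # truncated least squares
x_tr = truncated_fit(A, b, x_bin, lam=1.0)     # truncated ridge regression
print("residual TLS  :", np.linalg.norm(A @ x_tls - b))
print("residual ridge:", np.linalg.norm(A @ x_tr - b))
```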

Figure 10: (a) Original image, (b) SA reconstruction.

Figure 11: Comparison between (a) truncated least squares and (b) truncated ridge regression.

7 SPGL1: A Solver for Large-scale Sparse Optimization

In this section we discuss our use of SPGL1, a standard large-scale sparse solver (see reference [3]), and compare its reconstructions of the image against our SA results.

7.1 Outline of the method and results

Solving the system Ax = b, where A ∈ R^{m×n} with m ≪ n, suffers from ill-posedness. The classic sparse convex optimization problems that attempt to solve this system are

1. min_x ||x||_1 subject to Ax = b. (BP)
2. min_x ||x||_1 subject to ||Ax − b||_2 ≤ σ. (BP_σ)
3. min_x ||Ax − b||_2 subject to ||x||_1 ≤ τ. (LS_τ)

Homotopy approaches, Basis Pursuit Denoising (BPDN) formulated as a cone program, BPDN formulated as a linear program, and projected gradient methods are the classic approaches to solving these problems. If b ∈ R(A) and b ≠ 0, denote by x_τ the optimal solution of (LS_τ). In SPGL1 one considers the single-parameter function

φ(τ) = ||r_τ||_2,  with r_τ := b − A x_τ,

which gives the optimal value of (LS_τ) for each τ > 0. The method then amounts to finding a root of φ(τ) = σ.
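To make the root-finding idea concrete, here is a simplified sketch, not the SPGL1 code itself: the subproblem (LS_τ) is solved by projected gradient with an ℓ1-ball projection, φ(τ) is the resulting residual norm, and τ is updated with the Newton step and the derivative formula φ'(τ) = −||A^T r_τ||_∞ / ||r_τ||_2 from the theorem stated below; all function names, tolerances, and iteration counts are illustrative.

```python
import numpy as np

def project_l1(v, tau):
    """Euclidean projection of v onto the l1 ball of radius tau."""
    if tau <= 0:
        return np.zeros_like(v)
    if np.abs(v).sum() <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.flatnonzero(u - (css - tau) / np.arange(1, len(u) + 1) > 0)[-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def solve_lasso_tau(A, b, tau, iters=500):
    """Approximately solve (LS_tau): min ||Ax - b||_2 s.t. ||x||_1 <= tau,
    by projected gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = project_l1(x - step * (A.T @ (A @ x - b)), tau)
    return x

def pareto_root_find(A, b, sigma, tau0=0.0, newton_iters=10):
    """Find tau with phi(tau) = sigma via the Newton update
    tau_{k+1} = tau_k + (sigma - phi(tau_k)) / phi'(tau_k)."""
    tau = tau0
    x = np.zeros(A.shape[1])
    for _ in range(newton_iters):
        x = solve_lasso_tau(A, b, tau)
        r = b - A @ x
        phi = np.linalg.norm(r)
        if phi <= sigma or phi < 1e-12:
            break
        dphi = -np.linalg.norm(A.T @ r, np.inf) / phi   # phi'(tau), from duality
        tau = tau + (sigma - phi) / dphi
    return x, tau

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=40)
x_hat, tau = pareto_root_find(A, b, sigma=0.1)
print("final tau:", tau, " residual:", np.linalg.norm(A @ x_hat - b))
```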

In order to derive the dual of (LS_τ), the method considers the equivalent problem

min_{r,x} ||r||_2  subject to  Ax + r = b, ||x||_1 ≤ τ.

The dual of the above problem is

max_{y, λ} min_{r,x} { ||r||_2 − y^T(Ax + r − b) + λ(||x||_1 − τ) },  for λ ≥ 0.

Finally, the dual of (LS_τ) reduces to

max_{y, λ} b^T y − τλ  subject to  ||y||_2 ≤ 1, ||A^T y||_∞ ≤ λ.

Theorem. With this setup, the following holds:

1. The function φ is convex and non-increasing.
2. For all τ ∈ (0, τ_BP), φ is continuously differentiable, φ'(τ) = −λ_τ, and the optimal dual variable is λ_τ = ||A^T y_τ||_∞, where y_τ = r_τ / ||r_τ||_2.
3. For τ ∈ [0, τ_BP], ||x_τ||_1 = τ, and φ is strictly decreasing.

The algorithm. Based on the Newton iteration, compute

τ_{k+1} = τ_k + Δτ_k,  where Δτ_k = (σ − φ(τ_k)) / φ'(τ_k).

In Figure 12 we contrast the results from SA minimization with the SPGL1 results. It is interesting to note that SPGL1 immediately gives a greyscale image, as it optimizes over a continuous range of x-values, while the SA result shown here uses only binary values. The reconstructed image in Figure 1 is a better indication of the good results obtainable with SA.

8 Other attempts

While working on the project, we had the idea that we might be able to use SA to remove systematic blur from an image, such as the simulated motion blur shown in Figure 13. It was an interesting idea, but it is not clear that we obtained useful results, so we simply mention it here as an idea possibly worth pursuing.

Figure 12: Comparison between (a) SA and (b) SPGL1 minimization.

Figure 13: Blurred image.

9 Tuning and Optimizing the Algorithm

A significant concern with a real implementation on the quantum optimizer is that the computing hardware has a limited number of nodes to represent data in the Ising model. For instance, current hardware from D-Wave limits this to about 1000 nodes. The image compression algorithm requires hundreds of thousands of nodes, which is problematic for existing hardware, so we had to consider methods for breaking the large problem into smaller, computable problems. In this section we discuss an efficient way to tune and optimize the algorithm.

Recall that the original linear system is Ax = b, where A ∈ R^{m×n} and m < n. In our examples, m is very large and solving the system requires a huge amount of memory or compute nodes. However, things are better if one can find a B ∈ R^{k×m}, with k < m, such that for a predefined ε > 0 we have ||x̂ − x||_2 < ε, where x̂ solves Â x̂ = b̂ with Â = BA and b̂ = Bb.

9.1 Optimizing the Algorithm: Reducing Rows

Construct a vector b̃ from b by sorting its entries, so that b̃ = (b_(i))_{i=1}^m ∈ R^m with b_(1) ≥ b_(2) ≥ ... ≥ b_(m) ≥ 0. For a tolerance 0 < δ ≤ 1, choose b̃(1 : k) if

(9)  ||b̃(1 : k)||_2 / ||b||_2 > δ.

Let S = {i_1, ..., i_k} be the set of indices of the k selected entries of b, and construct Â ∈ R^{k×n} from the corresponding rows of A. We form B = [e_{i_1}; e_{i_2}; ...; e_{i_k}], where e_{i_j} is a 1 × m vector with a 1 in the i_j-th position and 0 elsewhere. To summarize, B acts as an indicator matrix that constructs Â = BA based on (9). We then use SA on Â x̂ = b̂ to reconstruct the image, as shown in Figure 14.

Figure 14: Reduced-row SA reconstruction (number of rows: 10843).

Indeed this is a memory-efficient reconstruction. Originally A had 64000 rows; using the indicator matrix B with δ = 0.7 reduced the number of rows to 10843. On the other hand, we also sacrificed the quality of the reconstructed image.

For a better reconstruction we target smaller blocks of the image instead of the entire matrix. We divide b into sub-vectors b_1, ..., b_p, partition A into corresponding blocks A_1, ..., A_p, solve each block system A_i x_i = b_i, and assemble the recovered image as x_Recovered = (x_i : 1 ≤ i ≤ p). We then apply the row-reduction idea to each block: for each block, using the previous technique, we solve Â_i x̂_i = b̂_i, where Â_i = B_i A_i and b̂_i = B_i b_i, and we obtain the recovered image x_Recovered = (x̂_i : 1 ≤ i ≤ p). For a predefined ε > 0, we can guarantee Σ_{i=1}^p ||x_i − x̂_i||_2 < ε.

The result of the row-reduced block reconstruction is shown in Figure 15. We partitioned the image array into 40 sub-matrices, compressed each block, and reconstructed. One can notice the partition lines in the SA-reconstructed image in Figure 15.
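Before turning to overlapping blocks, here is a sketch of the row-selection step of criterion (9) in Python (our own rendering; the threshold and test data are illustrative): sort the entries of b by magnitude, keep the smallest leading block whose norm exceeds δ||b||_2, and form the corresponding selection matrix B.

```python
import numpy as np
import scipy.sparse as sp

def row_reduction(A, b, delta):
    """Keep the smallest leading block of the magnitude-sorted entries of b whose
    norm exceeds delta * ||b||_2 (criterion (9)); return (A_hat, b_hat, B)."""
    order = np.argsort(-np.abs(b))                 # indices of b by decreasing magnitude
    frac = np.sqrt(np.cumsum(np.abs(b[order]) ** 2)) / np.linalg.norm(b)
    k = min(int(np.searchsorted(frac, delta)) + 1, len(b))
    keep = order[:k]
    # B is a k x m selection matrix whose rows are e_{i_1}, ..., e_{i_k}.
    B = sp.csr_matrix((np.ones(k), (np.arange(k), keep)), shape=(k, len(b)))
    return B @ A, b[keep], B

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
b = rng.normal(size=200) ** 3                      # heavy-tailed, so a few rows dominate
A_hat, b_hat, B = row_reduction(A, b, delta=0.7)
print(f"rows reduced from {A.shape[0]} to {A_hat.shape[0]}")
```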

Figure 15: Row-reduced block SA reconstruction.

To avoid the partition lines, we use overlapping block partitions of the image together with the row-reduced SA reconstruction technique. At the end we merge the reconstructed overlapping blocks and obtain a much better image, as shown in Figure 16.

Figure 16: Overlapping-blocks row-reduced SA reconstruction.
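The overlapping-block merge can be sketched as follows (block size, overlap, and the identity per-block "reconstruction" standing in for the row-reduced SA solve are all illustrative): process overlapping tiles independently, accumulate the results, and average wherever tiles overlap, which is what suppresses the visible partition lines.

```python
import numpy as np

def reconstruct_with_overlap(image, block=64, overlap=8, reconstruct_block=None):
    """Split `image` into overlapping blocks, reconstruct each one independently,
    and merge by averaging wherever blocks overlap."""
    if reconstruct_block is None:
        reconstruct_block = lambda blk: blk        # placeholder for the per-block SA solve
    H, W = image.shape
    out = np.zeros_like(image, dtype=float)
    weight = np.zeros_like(image, dtype=float)
    step = block - overlap
    for i in range(0, H, step):
        for j in range(0, W, step):
            i2, j2 = min(i + block, H), min(j + block, W)
            out[i:i2, j:j2] += reconstruct_block(image[i:i2, j:j2])
            weight[i:i2, j:j2] += 1.0
    return out / weight                            # average the overlapping contributions

rng = np.random.default_rng(0)
img = rng.random((200, 320))                       # same size as the clown image
merged = reconstruct_with_overlap(img, block=64, overlap=8)
assert np.allclose(merged, img)                    # identity per-block solve reproduces the image
```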

10 Conclusion

To summarize, our compression algorithm accomplished a large reduction in memory while maintaining a minimal loss in image accuracy when the penalty was large enough. We obtained the best reconstruction by using an average of kernels. We found that by encouraging sparsity (decreasing the penalty) we lose accuracy. After trying to reconstruct the image with the SPGL1 and regression techniques, we found no improvement over the kernel-based reconstruction. Now that we have developed a working algorithm for image compression, a natural next step is to test this algorithm on a quantum annealer. This is where our optimization methods may come in handy when implementing the algorithm on a D-Wave system.

References

[1] Z. Bian, F. Chudak, W. G. Macready, and G. Rose, The Ising model: teaching an old problem new tricks, D-Wave Systems Technical Report, Aug. 30, 2010.

[2] E. van den Berg and M. P. Friedlander, Probing the Pareto frontier for basis pursuit solutions, SIAM J. on Scientific Computing, vol. 31, no. 2, pp. 890-912, Nov. 2008.

[3] E. van den Berg and M. P. Friedlander, SPGL1: A solver for large-scale sparse reconstruction, software, June 2007.

[4] V. Cerny, Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm, J. Optimization Theory and Applications, vol. 45, no. 1, pp. 41-51, Jan. 1985.

[5] S. V. Isakov, I. N. Zintchenko, T. F. Rønnow, and M. Troyer, Optimized simulated annealing for Ising spin glasses, Computer Physics Communications, vol. 192, pp. 265-271, Jul. 2015.
