Solving l1 Regularized Least Square Problems with Hierarchical Decomposition

Solving l1 Regularized Least Square Problems with Hierarchical Decomposition
mzhong1@umd.edu, AMSC and CSCAMM, University of Maryland College Park
Project for AMSC 663, October 2nd, 2012


Compressed Sensing — Example (A Sparse Signal in Compressed Sensing): how can we encode a large sparse signal using a relatively small number of linear measurements? (http://dsp.rice.edu/cscamera)

A Formulation — An Intuitive Approach (The Original One):

min_{x ∈ R^n} { ‖x‖_0 : Ax = b }    (1)

where the l0 "norm" counts the number of non-zero elements of a vector, A ∈ R^{m×n}, b ∈ R^m, and m < n (under-determined system). The l0 problem is solved mainly using combinatorial optimization, and is thus NP-hard (B. K. Natarajan, '95). Since the l0 norm is not convex, one can convexify the problem by using either the l1 or the l2 norm, and then (1) can be solved using convex optimization techniques (namely linear programming).
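The combinatorial nature of (1) is easy to see in code: the only general way to certify a minimal support is to try supports of increasing size. The brute-force routine below is purely illustrative and not part of this project (which uses the convex relaxation instead); the dimensions, seed, and the name l0_bruteforce are assumptions made for the example.

```python
# Illustrative brute force for (1): enumerate supports of growing size until Ax = b
# is solvable. Exponential in n, which is why the l0 problem is treated as intractable.
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 8, 2
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

def l0_bruteforce(A, b, tol=1e-10):
    n = A.shape[1]
    for size in range(1, n + 1):                         # smallest supports first
        for S in itertools.combinations(range(n), size):
            cols = list(S)
            xs, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ xs - b) < tol:
                x = np.zeros(n)
                x[cols] = xs
                return x
    return None

print(np.nonzero(l0_bruteforce(A, b))[0])                # typically recovers the 2-sparse support
```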

A Better Formulation — from here on we write p instead of lp. (The better one)

min_{x ∈ R^n} { ‖x‖_p : Ax = b }    (2)

Ax = b has infinitely many solutions. When p = 2, x = A^T (A A^T)^{-1} b is the global minimizer (proof in my report); however, it is not sparse, due to the limitation of the l2 norm. When p = 1, the norm can induce sparsity, and under the Robust Uncertainty Principles (E. J. Candès & T. Tao, '05) one can recover the minimizer of the p = 0 problem by solving (2) with p = 1.
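A quick numerical check of the p = 2 claim above, assuming a random Gaussian A (the dimensions and seed are arbitrary): the minimum l2-norm solution satisfies Ax = b but is dense.

```python
# The p = 2 minimizer x = A^T (A A^T)^{-1} b is feasible but not sparse.
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 40
A = rng.standard_normal((m, n))
x_sparse = np.zeros(n)
x_sparse[:3] = 1.0                                # a 3-sparse signal
b = A @ x_sparse

x_l2 = A.T @ np.linalg.solve(A @ A.T, b)          # minimum l2-norm solution of Ax = b
print(np.allclose(A @ x_l2, b))                   # True: it is feasible
print(np.count_nonzero(np.abs(x_l2) > 1e-8))      # typically all 40 entries are nonzero, not 3
```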


Unconstrained Optimization with parameter λ — with p = 1 and a regularization parameter λ, we have:

min_{x ∈ R^n} { ‖x‖_1 + (λ/2)‖b − Ax‖_2^2 }    (3)

We need λ > 1/‖A^T b‖_∞ in order to have a non-zero optimizer (J. J. Fuchs, 2004). As λ decreases, x tends to be sparser (R. Tibshirani, 1996). The l1 penalty is used to reduce overfitting, while λ also adds bias to the problem (a larger λ puts more emphasis on the least square part). Similar formulations appear in Signal Processing (BPDN), Statistics (LASSO), Geophysics, etc.

Nonlinear Equation — solving (3) is equivalent to solving the following problem:

(Signum Equation)  sgn(x) + λ(A^T A x − A^T b) = 0    (4)

where sgn(a) = 1 for a > 0 and sgn(a) = −1 for a < 0 for scalars, applied component-wise to vectors. It is derived using the calculus of variations. It is a nonlinear equation, and no closed-form solution is known so far. It can be solved using "A Fixed-Point Continuation Method" (Elaine T. Hale, et al., '07).
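For concreteness, here is a minimal sketch of the shrinkage fixed-point iteration that underlies the Fixed-Point Continuation idea for (3). The project code will be written in Matlab; this Python sketch, the fixed step size, and the iteration cap are assumptions made for illustration only.

```python
# Shrinkage fixed-point iteration for (3): x <- shrink(x - tau*lam*A^T(Ax - b), tau),
# where shrink(y, t) = sign(y) * max(|y| - t, 0). At a fixed point, the nonzero
# components satisfy the signum equation (4).
import numpy as np

def shrink(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fixed_point_l1(A, b, lam, iters=500):
    x = np.zeros(A.shape[1])
    tau = 1.0 / (lam * np.linalg.norm(A, 2) ** 2)   # tau < 2 / (lam * ||A^T A||) for convergence
    for _ in range(iters):
        x = shrink(x - tau * lam * (A.T @ (A @ x - b)), tau)
    return x
```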


Disadvantages — we try to address them all. (2) can be defined more generally as:

min_{x ∈ R^n} { J(x) : Ax = b }    (5)

where J(x) is continuous and convex. When J(x) is coercive, the set of solutions of (5) is nonempty and convex. When J(x) is strictly or strongly convex, the solution of (5) is unique. However, J(x) = ‖x‖_1 is not strictly convex. Moreover, the solutions of (3) and (4) depend on a wise choice of the parameter λ.


History — in image processing, hierarchical decomposition is used to solve the following problem:

min_{x ∈ R^n} { ‖x‖_U + (λ/2)‖b − Tx‖_W^p }    (6)

where λ simply represents different scales. The first 2 papers were published with U = BV, W = l2, p = 2, and T = I (E. Tadmor, S. Nezzar & L. Vese, '04 and '08). 3 more papers were published with T being a differential operator (E. Tadmor & P. Athavale, '09, '10, and '11). 2 more papers followed with emphasis on the choice of T (E. Tadmor & C. Tan, '10 and preprint). We'll do it with U = l1, W = l2, p = 2, and T = A.

Theorems — Guidelines for Validation.

Theorem (Validation Principles): x solves (6) iff

⟨x, T^*(b − Tx)⟩ = ‖x‖_U ‖T^*(b − Tx)‖_{U^*}  and  ‖T^*(b − Tx)‖_{U^*} = 1/λ    (7)

where λ is the regularization parameter in (3), ⟨·,·⟩ is an inner product, and U^* is the dual space of U. An initial λ_0 is also found, as long as it satisfies:

1/2 < λ_0 ‖T^* b‖_{U^*} ≤ 1    (8)

An optimal stopping λ_J is also found (S. Osher, et al., '05).
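Since this project takes U = l1 (so the dual norm is the l-infinity norm) and T = A, the theorem gives a concrete validation test. The sketch below is an assumed helper (its name and tolerance are not from the slides) that checks both conditions in (7) for a candidate solution x of (3).

```python
# Validation test from (7) with U = l1, T = A: check the equality case of the duality
# pairing and that ||A^T(b - Ax)||_inf equals 1/lambda.
import numpy as np

def check_optimality(A, b, x, lam, tol=1e-6):
    g = A.T @ (b - A @ x)                           # T^*(b - Tx) with T = A
    dual_norm = np.max(np.abs(g))                   # ||.||_{U^*} is the l-infinity norm when U = l1
    cond1 = abs(x @ g - np.linalg.norm(x, 1) * dual_norm) <= tol
    cond2 = abs(dual_norm - 1.0 / lam) <= tol
    return cond1, cond2
```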


Hierarchical Decomposition for l1 Regularized Least Squares
Data: A and b; pick λ_0 (from (8)).
Result: x = Σ_{j=0}^{J} x_j
Initialize: r_0 = b, and j = 0;
while a certain λ_J is not found do
    x_j(λ_j, r_j) := argmin_{x ∈ R^n} { ‖x‖_1 + (λ_j/2)‖r_j − Ax‖_2^2 };
    r_{j+1} = r_j − A x_j;
    λ_{j+1} = 2 λ_j;
    j = j + 1;
end
Then b = A(Σ_{j=0}^{J} x_j) + r_{J+1}, and by (7), ‖A^T r_{J+1}‖_∞ = 1/λ_{J+1}.
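A compact sketch of this loop, assuming any inner solver solve_l1_ls(A, r, lam) for min ‖x‖_1 + (lam/2)‖r − Ax‖_2^2 (e.g. GPSR or fixed-point continuation). The solver argument, the level cap, and the stopping threshold are assumptions; the project's actual stopping rule is the optimal λ_J mentioned above.

```python
# Hierarchical decomposition: solve at scale lam_j on the residual r_j, accumulate x_j,
# double lam, and repeat until the residual carries little remaining l1 structure.
import numpy as np

def hierarchical_decomposition(A, b, lam0, solve_l1_ls, max_levels=20, target=1e-3):
    x_total = np.zeros(A.shape[1])
    r, lam = b.astype(float).copy(), lam0
    for _ in range(max_levels):
        xj = solve_l1_ls(A, r, lam)                 # x_j(lam_j, r_j)
        x_total += xj                               # x = sum_j x_j
        r = r - A @ xj                              # r_{j+1} = r_j - A x_j
        lam *= 2.0                                  # lam_{j+1} = 2 lam_j
        if np.max(np.abs(A.T @ r)) < target:        # assumed stopping test on ||A^T r||_inf
            break
    return x_total, r
```

For instance, the fixed_point_l1 sketch above already has the required (A, r, lam) signature and could be passed in as solve_l1_ls.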

Examples — Numerical Results. Figure: increase in the details with successive increment of scales (E. Tadmor, S. Nezzar & L. Vese, '08).


Gradient Projection for Sparse Reconstruction (GPSR) — The Setup. We solve min_{x ∈ R^n} { τ‖x‖_1 + (1/2)‖b − Ax‖_2^2 } (with τ = 1/λ) using the following transformation. Let (a)_+ = a for a ≥ 0 and 0 otherwise, and set

u = (x)_+,  v = (−x)_+,  z = [u; v],  y = A^T b,  c = τ 1_{2n} + [−y; y],  B = [A^T A, −A^T A; −A^T A, A^T A].

Then the problem becomes:

min_{z ∈ R^{2n}} { c^T z + (1/2) z^T B z ≡ F(z) : z ≥ 0 }.
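The transformation can be written down directly; the dense helper below is only to make the objects concrete (GPSR itself never forms B explicitly, as noted later). The function names are assumptions for the example.

```python
# Build the QP data of the GPSR formulation: c = tau*1_{2n} + [-y; y] with y = A^T b,
# B = [A^T A, -A^T A; -A^T A, A^T A], and F(z) = c^T z + 0.5 z^T B z.
import numpy as np

def gpsr_qp_data(A, b, tau):
    n = A.shape[1]
    y = A.T @ b
    c = tau * np.ones(2 * n) + np.concatenate([-y, y])
    AtA = A.T @ A
    B = np.block([[AtA, -AtA], [-AtA, AtA]])
    return B, c

def F(z, B, c):
    return c @ z + 0.5 * z @ (B @ z)
```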

GPSR
Data: A, b, τ, and z^(0); pick β ∈ (0, 1) (β controls the step length α_0) and µ ∈ (0, 1/2) (µ makes sure F(·) is decreased sufficiently, as in the Armijo rule).
Result: z^(K)(τ) := argmin_{z ∈ R^{2n}} { c^T z + (1/2) z^T B z : z ≥ 0 }
Initialize: k = 0;
while a convergence test is not satisfied do
    Compute α_0 = argmin_α F(z^(k) − α g^(k)), where g^(k) is the projected gradient;
    Let α^(k) be the first value in the sequence α_0, βα_0, β²α_0, ... such that
        F((z^(k) − α^(k) ∇F(z^(k)))_+) ≤ F(z^(k)) − µ ∇F(z^(k))^T (z^(k) − (z^(k) − α^(k) ∇F(z^(k)))_+),
    and set z^(k+1) = (z^(k) − α^(k) ∇F(z^(k)))_+;
    k = k + 1;
end
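A compact sketch of this iteration, assuming the dense B and c from the previous snippet; the convergence test, iteration cap, and default β, µ values are illustrative assumptions, not the reference implementation.

```python
# GPSR-Basic sketch: projected gradient with backtracking from the exact line-search
# step alpha_0 = g^T g / g^T B g along the projected gradient g.
import numpy as np

def gpsr_basic(B, c, z0, beta=0.5, mu=0.1, max_iter=200, tol=1e-8):
    F = lambda z: c @ z + 0.5 * z @ (B @ z)
    z = np.maximum(z0, 0.0)
    for _ in range(max_iter):
        grad = c + B @ z                                     # grad F(z) = c + Bz
        g = np.where((z > 0) | (grad < 0), grad, 0.0)        # projected gradient g^(k)
        if np.linalg.norm(g) < tol:
            break
        gBg = g @ (B @ g)
        alpha = (g @ g) / gBg if gBg > 0 else 1.0            # alpha_0 = argmin_a F(z - a g)
        while True:                                          # try alpha_0, beta*alpha_0, ...
            z_new = np.maximum(z - alpha * grad, 0.0)        # (z - alpha grad F(z))_+
            if F(z_new) <= F(z) - mu * grad @ (z - z_new):   # sufficient decrease condition
                break
            alpha *= beta
        z = z_new
    return z
```

With z = [u; v], the recovered signal is x = z[:n] − z[n:].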

GPSR — Convergence and Computational Efficiency. The convergence of the algorithm has already been shown (M. A. T. Figueiredo, et al., 2007). It is more robust than IST, l1_ls, the l1-magic toolbox, and the homotopy method. The description of the algorithm is clear and easy to implement. Moreover,

Bz = [A^T A(u − v); −A^T A(u − v)],  c^T z = τ 1_n^T (u + v) − y^T (u − v),  z^T B z = ‖A(u − v)‖_2^2,  and ∇F(z) = c + Bz.

Hence the 2n × 2n system can be computed as an n × n system. Ax and A^T x can be defined as function calls instead of direct matrix-vector multiplications in order to save memory.
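Following the identities above, the same quantities can be evaluated matrix-free, with A and A^T supplied as function handles (callables in Python); only n-dimensional vectors ever appear. The helper names below are assumptions.

```python
# Matrix-free GPSR objects: F(z) and grad F(z) = c + Bz split into u- and v-blocks,
# using only products with A and A^T.
import numpy as np

def make_matrix_free_objects(A_mv, At_mv, b, tau):
    """A_mv(x) = A @ x and At_mv(r) = A.T @ r are user-supplied callables."""
    y = At_mv(b)
    def F(u, v):
        x = u - v
        Ax = A_mv(x)
        return tau * np.sum(u + v) - y @ x + 0.5 * (Ax @ Ax)   # c^T z + 0.5 z^T B z
    def grad(u, v):
        w = At_mv(A_mv(u - v))                                 # A^T A (u - v), an n-vector
        return tau + w - y, tau - w + y                        # u-block and v-block of c + Bz
    return F, grad
```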

Milestones — things that I'm working on and will do:
Implement and validate GPSR by the end of October 2012.
Implement the Fixed-Point Continuation Method by early December 2012.
Validate the Fixed-Point Continuation Method by the end of January 2013.
Implement the whole algorithm by the end of February 2013.
Validate the whole code by the end of March 2013.
Codes will be implemented in Matlab, and validations are provided by (7). Deliverables: Matlab codes, presentation slides, and the complete project.

For Further Reading I (Appendix)
Eitan Tadmor, Suzanne Nezzar, and Luminita Vese. A Multiscale Image Representation Using Hierarchical (BV, L2) Decompositions. 2004.
Eitan Tadmor, Suzanne Nezzar, and Luminita Vese. Multiscale Hierarchical Decomposition of Images with Applications to Deblurring, Denoising and Segmentation. 2008.

For Further Reading II (Appendix)
Mario A. T. Figueiredo, Robert D. Nowak, and Stephen J. Wright. Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. 2007.
Elaine T. Hale, Wotao Yin, and Yin Zhang. A Fixed-Point Continuation Method for l1-Regularized Minimization with Applications to Compressed Sensing. 2007.

For Further Reading III (Appendix)
Seung-Jean Kim, K. Koh, M. Lustig, Stephen Boyd, and Dmitry Gorinevsky. An Interior-Point Method for Large-Scale l1-Regularized Least Squares. 2007.
Stanley Osher, Yu Mao, Bin Dong, and Wotao Yin. Fast Linearized Bregman Iteration for Compressive Sensing and Sparse Denoising. 2008.