Convex Optimization and l1-minimization


Convex Optimization and l1-minimization
Sangwoon Yun, Computational Sciences, Korea Institute for Advanced Study
December 11, 2009, NIMS Thematic Winter School

Outline

I. Convex Optimization
II. l1-Minimization: Sparse Model

I. Convex Optimization

Contents

(Mathematical) Optimization
Convex Optimization
Brief History of Convex Optimization
Linear Programming and Least Squares Problems
Unconstrained Minimization
Equality Constrained Minimization
Interior-Point Methods
Conclusions

Mathematical Optimization

(Mathematical) optimization problem:

    minimize   f0(x)
    subject to fi(x) ≤ bi,  i = 1, ..., m

x ∈ Rⁿ: optimization variable
f0: Rⁿ → R: objective function
fi: Rⁿ → R, i = 1, ..., m: constraint functions

An optimal solution x* has the smallest value of f0 among all vectors that satisfy the constraints.

Optimization Problem in Standard Form

    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, ..., m
               hi(x) = 0,  i = 1, ..., p

x ∈ Rⁿ: optimization variable
f0: Rⁿ → R: objective or cost function
fi: Rⁿ → R, i = 1, ..., m: inequality constraint functions
hi: Rⁿ → R, i = 1, ..., p: equality constraint functions

Optimal value:

    p* = inf { f0(x) | fi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p }

p* = ∞ if the problem is infeasible (no x satisfies the constraints)
p* = −∞ if the problem is unbounded below

Examples

Portfolio optimization
  variables: amounts invested in different assets
  constraints: budget, max./min. investment per asset, minimum return
  objective: overall risk or return variance

Data fitting
  variables: model parameters
  constraints: prior information, parameter limits
  objective: measure of misfit or prediction error

Solving Optimization Problems

The general optimization problem is very difficult to solve; methods involve some compromise, e.g., very long computation time, or not always finding the solution.

Exceptions: certain problem classes can be solved efficiently and reliably:
  least-squares problems
  linear programming problems
  convex optimization problems

Convex Optimization

    minimize   f0(x)
    subject to fi(x) ≤ bi,  i = 1, ..., m

The objective and constraint functions are convex: for all x, y and all α ≥ 0, β ≥ 0 with α + β = 1,

    fi(αx + βy) ≤ αfi(x) + βfi(y)
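The defining inequality is easy to check numerically for a candidate function. A minimal sketch (not from the slides), using the squared Euclidean norm as the example convex function:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # example convex function: squared Euclidean norm
    return np.dot(x, x)

# sample random pairs (x, y) and convex weights alpha + beta = 1
ok = True
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    alpha = rng.uniform()
    beta = 1.0 - alpha
    # convexity: f(alpha*x + beta*y) <= alpha*f(x) + beta*f(y)
    if f(alpha * x + beta * y) > alpha * f(x) + beta * f(y) + 1e-12:
        ok = False
```

A random check like this can only refute convexity, never prove it, but it is a quick sanity test.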

Standard Form Convex Optimization Problem

    minimize   f0(x)
    subject to fi(x) ≤ 0,   i = 1, ..., m
               aiᵀx = bi,   i = 1, ..., p

f0, f1, ..., fm convex; equality constraints are affine.

Often written as

    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, ..., m
               Ax = b

Important property: the feasible set of a convex optimization problem is convex.

Optimality Criterion for Differentiable f0

Unconstrained problem: x is optimal if and only if

    x ∈ dom f0,   ∇f0(x) = 0

Equality constrained problem

    minimize f0(x)  subject to Ax = b

x is optimal if and only if there exists a ν such that

    x ∈ dom f0,   Ax = b,   ∇f0(x) + Aᵀν = 0

Minimization over the nonnegative orthant

    minimize f0(x)  subject to x ≥ 0

x is optimal if and only if

    x ∈ dom f0,   x ≥ 0,   ∇f0(x)i ≥ 0 if xi = 0,   ∇f0(x)i = 0 if xi > 0

Solving convex optimization problems

no analytical solution in general
reliable and efficient algorithms
computation time (roughly) proportional to max{n³, n²m, F}, where F is the cost of evaluating the fi's and their first and second derivatives
surprisingly many problems can be solved via convex optimization
many tricks for transforming problems into convex form

Linear Programming and Least Squares Problems

Linear programming

    minimize   cᵀx
    subject to aiᵀx ≤ bi,  i = 1, ..., m

Example (diet problem): choose quantities x1, ..., xn of n foods
  one unit of food j costs cj and contains amount aij of nutrient i
  a healthy diet requires at least bi of nutrient i
To find the cheapest healthy diet: minimize cᵀx subject to Ax ≥ b, x ≥ 0.

Solving linear programming problems

no analytical formula for the solution
reliable and efficient algorithms and software
computation time proportional to n²m if m ≥ n; less with structure
a few standard tricks convert problems into linear programs (e.g., problems involving l1- or l∞-norms, piecewise-linear functions)

Least Squares Problem

    minimize ||Ax − b||₂²

Solving least-squares problems
  analytical solution: x* = (AᵀA)⁻¹Aᵀb
  reliable and efficient algorithms and software
  computation time proportional to n²m (A ∈ R^{m×n}); less if structured
  least-squares problems are easy to recognize
  a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
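The analytical solution above can be sketched in a few lines, assuming NumPy; in practice one calls a least-squares routine rather than forming AᵀA explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# analytical solution via the normal equations: x* = (A^T A)^{-1} A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# numerically preferred: SVD-based least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Forming the normal equations squares the condition number of A, which is why library routines based on QR or SVD are preferred for ill-conditioned problems.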

Quadratic Program (QP)

    minimize   (1/2) xᵀPx + qᵀx + r
    subject to Gx ≤ h
               Ax = b

P ∈ Sⁿ₊, so the objective is convex quadratic: minimize a convex quadratic function over a polyhedron.

Quadratically Constrained Quadratic Program (QCQP)

    minimize   (1/2) xᵀP0x + q0ᵀx + r0
    subject to (1/2) xᵀPix + qiᵀx + ri ≤ 0,  i = 1, ..., m
               Ax = b

Pi ∈ Sⁿ₊; objective and constraints are convex quadratic.
If P1, ..., Pm ∈ Sⁿ₊₊, the feasible region is the intersection of m ellipsoids and an affine set.

Brief History of Convex Optimization

Theory (convex analysis): 1900–1970

Algorithms
  1947: simplex algorithm for linear programming (Dantzig)
  1960s: early interior-point methods (Fiacco & McCormick, Dikin, ...)
  1970s: ellipsoid method and other subgradient methods
  1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)
  late 1980s–now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)

Applications
  before 1990: mostly in operations research; few in engineering
  since 1990: many new applications in engineering (control, signal/image processing, communications, circuit design, ...); new problem classes (semidefinite and second-order cone programming)

Second-Order Cone Programming

    minimize   fᵀx
    subject to ||Aix + bi||₂ ≤ ciᵀx + di,  i = 1, ..., m
               Fx = g

(Ai ∈ R^{ni×n}, F ∈ R^{p×n})

The inequalities are called second-order cone (SOC) constraints: (Aix + bi, ciᵀx + di) lies in the second-order cone in R^{ni+1}.
For ni = 0, the SOCP reduces to an LP; if ci = 0, it reduces to a QCQP. SOCP is more general than QCQP and LP.

Semidefinite Programming (SDP)

    minimize   bᵀx
    subject to x1F1 + x2F2 + ... + xnFn ⪯ C

with Fi, C ∈ Sⁿ. The inequality constraint is called a linear matrix inequality (LMI).

Primal form

    minimize   ⟨C, X⟩
    subject to ⟨Fi, X⟩ = bi,  i = 1, ..., m
               X ⪰ 0

We refer to this problem as a primal semidefinite program.

Unconstrained Minimization

    minimize f(x)

f convex, twice continuously differentiable (hence dom f open)
we assume the optimal value p* = inf_x f(x) is attained (and finite)

Unconstrained minimization methods
  produce a sequence of points x_k ∈ dom f, k = 0, 1, ..., with f(x_k) → p*
  can be interpreted as iterative methods for solving the optimality condition ∇f(x*) = 0

Descent Methods

    x_{k+1} = x_k + t_k Δx_k,  with f(x_{k+1}) < f(x_k)

Δx is the step, or search direction; t is the step size, or step length.
From convexity, f(x + tΔx) < f(x) implies ∇f(x)ᵀΔx < 0 (i.e., Δx is a descent direction).

General descent method
given a starting point x ∈ dom f
repeat
  1. Determine a descent direction Δx.
  2. Line search. Choose a step size t > 0.
  3. Update. x := x + tΔx.
until stopping criterion is satisfied.

Line Search Types

Exact line search: t = argmin_{t>0} f(x + tΔx)

Backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1)):
starting at t = 1, repeat t := βt until

    f(x + tΔx) < f(x) + αt ∇f(x)ᵀΔx

Gradient Descent Method

General descent method with Δx = −∇f(x):
given a starting point x ∈ dom f
repeat
  1. Δx := −∇f(x).
  2. Line search. Choose step size t via exact or backtracking line search.
  3. Update. x := x + tΔx.
until stopping criterion is satisfied.

stopping criterion usually of the form ||∇f(x)||₂ ≤ ε
convergence result: for strongly convex f (∇²f(x) ⪰ δI, δ > 0),

    f(x_k) − p* ≤ cᵏ (f(x_0) − p*),

where c ∈ (0, 1) depends on δ, x_0, and the line search type.
simple, but often very slow; rarely used in practice
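The method above, with the backtracking line search from the previous slide, can be sketched as follows (a sketch, not the lecture's code; the quadratic objective is an illustrative choice):

```python
import numpy as np

def grad_descent(f, grad, x, alpha=0.25, beta=0.5, eps=1e-6, max_iter=500):
    """Gradient descent with backtracking line search."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # stopping criterion ||grad f||_2 <= eps
            break
        dx = -g                        # descent direction
        t = 1.0
        # backtracking: shrink t until the sufficient-decrease condition holds
        while f(x + t * dx) >= f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t * dx
    return x

# example: strongly convex quadratic f(x) = 1/2 x^T P x - q^T x
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ P @ x - q @ x
grad = lambda x: P @ x - q

x_star = grad_descent(f, grad, np.zeros(2))   # minimizer satisfies P x = q
```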

Newton Step

    Δx_nt = −∇²f(x)⁻¹ ∇f(x)

Interpretations
  x + Δx_nt minimizes the second-order approximation

      f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2) vᵀ∇²f(x)v

  x + Δx_nt solves the linearized optimality condition

      ∇f(x + v) ≈ ∇f̂(x + v) = ∇f(x) + ∇²f(x)v = 0

Newton Decrement

a measure of the proximity of x to x*:

    λ(x) = (∇f(x)ᵀ ∇²f(x)⁻¹ ∇f(x))^{1/2}

Properties
  gives an estimate of f(x) − p*, using the quadratic approximation f̂:

      f(x) − inf_y f̂(y) = (1/2) λ(x)²

  equals the norm of the Newton step in the quadratic Hessian norm:

      λ(x) = (Δx_ntᵀ ∇²f(x) Δx_nt)^{1/2}

  directional derivative in the Newton direction: ∇f(x)ᵀΔx_nt = −λ(x)²

Newton's Method

given a starting point x ∈ dom f, tolerance ε > 0
repeat
  1. Compute the Newton step and decrement:
     Δx_nt := −∇²f(x)⁻¹∇f(x);  λ² := ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
  2. Stopping criterion. Quit if λ²/2 ≤ ε.
  3. Line search. Choose step size t by backtracking line search.
  4. Update. x := x + tΔx_nt.
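A minimal sketch of the algorithm, assuming NumPy; the smooth convex test function (a sum of exponentials) is an illustration of my own choosing, not from these slides:

```python
import numpy as np

def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

def hess(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([[a + b + c, 3*a - 3*b],
                     [3*a - 3*b, 9*a + 9*b]])

def newton(f, grad, hess, x, alpha=0.25, beta=0.5, eps=1e-10, max_iter=50):
    """Damped Newton's method with backtracking line search."""
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = -np.linalg.solve(H, g)   # Newton step: -H^{-1} g
        lam2 = -g @ dx                # Newton decrement squared
        if lam2 / 2 <= eps:           # stopping criterion
            break
        t = 1.0                       # backtracking line search
        while f(x + t*dx) >= f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t*dx
    return x

x_star = newton(f, grad, hess, np.array([1.0, 1.0]))
```

Note how few iterations the method needs once it enters the quadratically convergent phase; this is visible if you log λ² per iteration.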

Classical Convergence Analysis

Assumptions
  f strongly convex with constant δ
  ∇²f Lipschitz continuous with constant L > 0:

      ||∇²f(x) − ∇²f(y)||₂ ≤ L ||x − y||₂

  (L measures how well f can be approximated by a quadratic function)

Outline: there exist constants η ∈ (0, δ²/L), γ > 0 such that
  if ||∇f(x)||₂ ≥ η, then f(x_{k+1}) − f(x_k) ≤ −γ
  if ||∇f(x)||₂ < η, then

      (L/(2δ²)) ||∇f(x_{k+1})||₂ ≤ ((L/(2δ²)) ||∇f(x_k)||₂)²

damped Newton phase (||∇f(x)||₂ ≥ η)
  most iterations require backtracking steps
  function value decreases by at least γ
  if p* > −∞, this phase ends after at most (f(x_0) − p*)/γ iterations

quadratically convergent phase (||∇f(x)||₂ < η)
  all iterations use step size t = 1
  ||∇f(x)||₂ converges to zero quadratically: if ||∇f(x_k)||₂ < η, then for l ≥ k

      (L/(2δ²)) ||∇f(x_l)||₂ ≤ ((L/(2δ²)) ||∇f(x_k)||₂)^{2^{l−k}} ≤ (1/2)^{2^{l−k}}

conclusion: the number of iterations until f(x) − p* ≤ ε is bounded above by

    (f(x_0) − p*)/γ + log₂ log₂(ε₀/ε)

γ, ε₀ are constants that depend on δ, L, x_0
the second term is small and almost constant for practical purposes
in practice, the constants δ, L (hence γ, ε₀) are usually unknown
the bound provides qualitative insight into the convergence properties (i.e., it explains the two algorithm phases)

Equality Constrained Minimization

    minimize   f(x)
    subject to Ax = b

f convex, twice continuously differentiable
A ∈ R^{m×n} with rank A = m
we assume p* is finite and attained

Optimality conditions: x* is optimal iff there exists a ν* such that

    ∇f(x*) + Aᵀν* = 0,   Ax* = b

Newton Step

The Newton step Δx_nt of f at a feasible x is given by the solution v of

    [ ∇²f(x)  Aᵀ ] [ v ]   [ −∇f(x) ]
    [ A       0  ] [ w ] = [ 0      ]

Interpretations
  Δx_nt solves the second-order approximation (with variable v):

      minimize   f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2) vᵀ∇²f(x)v
      subject to A(x + v) = b

  the Δx_nt equations follow from linearizing the optimality conditions:

      ∇f(x + v) + Aᵀw ≈ ∇f(x) + ∇²f(x)v + Aᵀw = 0,   A(x + v) = b

Newton's Method with Equality Constraints

given a starting point x ∈ dom f with Ax = b, tolerance ε > 0
repeat
  1. Compute the Newton step and decrement Δx_nt, λ(x).
  2. Stopping criterion. Quit if λ²/2 ≤ ε.
  3. Line search. Choose step size t by backtracking line search.
  4. Update. x := x + tΔx_nt.

a feasible descent method: x_k feasible and f(x_{k+1}) < f(x_k)
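For a quadratic objective, a single KKT solve already gives the constrained minimizer, so the method's core reduces to the block linear system on the previous slide. A sketch with illustrative names, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2

# equality constrained QP: minimize 1/2 x^T P x + q^T x  s.t.  Ax = b
M = rng.normal(size=(n, n))
P = M @ M.T + n * np.eye(n)     # positive definite Hessian
q = rng.normal(size=n)
A = rng.normal(size=(m, n))     # full row rank (generic random matrix)
b = rng.normal(size=m)

# KKT system: [P A^T; A 0] [x; w] = [-q; b]
K = np.block([[P, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(K, rhs)
x, w = sol[:n], sol[n:]         # primal solution and multiplier
```

For a non-quadratic f, this solve is repeated at each iterate with P replaced by ∇²f(x) and q by ∇f(x), exactly as in the algorithm above.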

Interior-Point Methods

Inequality constrained minimization

    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, ..., m
               Ax = b

fi convex, twice continuously differentiable
A ∈ R^{p×n} with rank A = p
we assume p* is finite and attained
we assume the problem is strictly feasible: there exists x̃ with x̃ ∈ dom f0, fi(x̃) < 0, i = 1, ..., m, Ax̃ = b

Examples: LP, QP, QCQP, SOCP, SDP

Logarithmic Barrier

Reformulation using the indicator function:

    minimize   f0(x) + Σᵢ I(fi(x))
    subject to Ax = b

where I(u) = 0 if u ≤ 0, I(u) = ∞ otherwise.

Approximation via logarithmic barrier:

    minimize   f0(x) − (1/t) Σᵢ log(−fi(x))
    subject to Ax = b

an equality constrained problem
for t > 0, −(1/t) log(−u) is a smooth approximation of I(u)
the approximation improves as t → ∞

Central Path

For t > 0, define x*(t) as the solution of

    minimize   t f0(x) − Σᵢ log(−fi(x))
    subject to Ax = b

(for now, assume x*(t) exists and is unique for each t > 0)
the central path is {x*(t) | t > 0}

Barrier Method

given strictly feasible x, t := t0 > 0, µ > 1, tolerance ε > 0
repeat
  1. Centering step. Compute x*(t) by minimizing t f0(x) − Σᵢ log(−fi(x)), subject to Ax = b.
  2. Update. x := x*(t).
  3. Stopping criterion. Quit if m/t ≤ ε.
  4. Increase t. t := µt.

terminates with f0(x) − p* ≤ ε (stopping criterion follows from f0(x*(t)) − p* ≤ m/t)
centering is usually done using Newton's method, starting at the current x
the choice of µ involves a trade-off: large µ means fewer outer iterations but more inner (Newton) iterations
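A minimal sketch of the barrier method on a tiny LP over the box 0 ≤ x ≤ 1 (minimize cᵀx), with the centering step done by damped Newton iterations on the diagonal barrier problem; the example problem and all names are illustrative, not from the slides:

```python
import numpy as np

def center(c, t, x, iters=50):
    """Newton centering for: minimize t*c^T x - sum log x_i - sum log(1 - x_i)."""
    for _ in range(iters):
        g = t * c - 1.0 / x + 1.0 / (1.0 - x)        # gradient
        h = 1.0 / x**2 + 1.0 / (1.0 - x)**2          # diagonal Hessian
        dx = -g / h                                  # Newton step
        s = 1.0
        # damp the step so the iterate stays strictly inside (0, 1)
        while np.any(x + s * dx <= 0) or np.any(x + s * dx >= 1):
            s *= 0.5
        x = x + s * dx
    return x

def barrier(c, mu=10.0, t0=1.0, eps=1e-6):
    n = len(c)
    m = 2 * n                        # number of inequality constraints
    x = np.full(n, 0.5)              # strictly feasible starting point
    t = t0
    while m / t > eps:               # duality-gap stopping criterion m/t <= eps
        x = center(c, t, x)          # centering step (warm-started)
        t *= mu                      # increase t
    return x

c = np.array([1.0, -2.0, 3.0])
x = barrier(c)                       # each x_i is pushed toward 0 or 1 by sign of c_i
```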

Convergence Analysis

number of outer (centering) iterations: exactly

    ⌈ log(m/(ε t0)) / log µ ⌉

plus the initial centering step (to compute x*(t0))

centering problem:

    minimize t f0(x) − Σᵢ log(−fi(x))

see the convergence analysis of Newton's method; the classical analysis requires strong convexity and the Lipschitz condition

Primal-dual interior-point methods

more efficient than the barrier method when high accuracy is needed
update primal and dual variables at each iteration; no distinction between inner and outer iterations
often exhibit superlinear asymptotic convergence
search directions can be interpreted as Newton directions for modified KKT conditions
can start at infeasible points
cost per iteration same as the barrier method

Conclusions

mathematical optimization
  problems in engineering design, data analysis and statistics, economics, management, ..., can often be expressed as mathematical optimization problems

tractability
  roughly speaking, tractability in optimization requires convexity
  algorithms for nonconvex optimization find local (suboptimal) solutions, or are very expensive
  surprisingly many applications can be formulated as convex problems

Theoretical consequences of convexity
  local optima are global
  extensive duality theory (systematic way of deriving lower bounds on the optimal value; necessary and sufficient optimality conditions)
  solution methods with polynomial worst-case complexity theory

Practical consequences of convexity
  (most) convex problems can be solved globally and efficiently
  interior-point methods require 20–80 steps in practice
  basic algorithms (e.g., Newton's method, barrier method, ...) are easy to implement and work well for small and medium-size problems (larger problems if structure is exploited)
  more and more high-quality implementations of advanced algorithms and modeling tools are becoming available

Reference: Convex Optimization by Boyd and Vandenberghe (http://www.stanford.edu/~boyd/cvxbook/)

II. l1-Minimization: Sparse Model

Contents

Underdetermined Systems of Linear Equations
Sparse Model
l1-Minimization & l1-Regularization

Underdetermined Systems of Linear Equations

Many problems arising in signal/image processing and data mining/classification entail finding a solution of an underdetermined system of linear equations:

    Ax = b,  where A ∈ R^{m×n} (m < n) and b ∈ R^m,

or a "best" solution of an inconsistent system of linear equations, Ax ≈ b. That is, the objective is to find a "best" guess of the unknown u from the linear measurement model

    b = Au + η,

where η is measurement error.

Least Squares Approach

Minimizing the 2-norm of the residual Ax − b yields the well-known linear least squares problem:

    min_x ||Ax − b||₂²

In many cases, components of x are parameters that must lie within certain bounds:

    min_x ||Ax − b||₂²  s.t.  l ≤ x ≤ u,

where l ≤ u (possibly with −∞ or ∞ components). This is a bound-constrained convex quadratic program.
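Such bound-constrained least-squares problems can be handed directly to a standard solver; a sketch assuming SciPy, with an illustrative box 0 ≤ x ≤ 1:

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 4))
b = rng.normal(size=30)

# minimize ||Ax - b||_2^2  subject to  0 <= x <= 1
res = lsq_linear(A, b, bounds=(0.0, 1.0))
x = res.x
```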

There is considerable evidence that these problems arising in signal/image processing often have sparse solutions. Advances in finding sparse solutions to underdetermined systems have energized research on such signal and image processing problems:

Signal analysis
Signal/image compression
Signal/image denoising
Inverse problems: even more generally, suppose that we observe not b, but a noisy indirect measurement of it, b̃ = Hb + η (b̃ = HAu + η). Here the linear operator H generates blurring, masking, or some other kind of degradation, and η is noise as before.
Compressed sensing: for signals which are sparsely generated, one can obtain good reconstructions from a reduced number of measurements, thereby compressing the sensing process rather than the traditionally sensed data.

Morphological Component Analysis (MCA): suppose that the observed signal is a superposition of two different subsignals b1, b2 (i.e., b = b1 + b2). Can we separate the two sources? Such source separation problems are fundamental in the processing of acoustic signals, for example, in the separation of speech from impulsive noise by independent component analysis (ICA) algorithms. An appealing image processing application that relies on MCA is inpainting, where missing pixels in an image are filled in based on a sparse representation of the existing pixels. MCA is necessary because the piecewise smooth (cartoon) and texture contents of the image must be separated as part of this recovery process (image decomposition into cartoon and texture).

This work lies at the intersection of signal processing and applied mathematics, and arose initially from the wavelets and harmonic analysis research communities. Sparsity of representation is key to widely used techniques of transform-based image compression. Transform sparsity is also a driving factor for other important signal and image processing problems, including image denoising and image deblurring.

Sparsity Optimization (l0-minimization):

    min_x ||x||₀  s.t.  Ax = b

This recovers the sparse solution when the solution is sparse and the columns of A are sufficiently incoherent, but the problem is NP-hard!

Compressed Sensing: Is it possible to reconstruct a signal accurately from a few observed samples (measurements)?

Impossible in general, but if the signal is known to be sparse in some basis, then accurate recovery is possible by l1-minimization, known as basis pursuit (Candès et al. '06, Donoho '06):

    min_x ||x||₁  s.t.  Ax = b,

where A ∈ R^{m×n} (m ≤ n) satisfies a certain restricted isometry property.

Definition: A matrix A satisfies the restricted isometry property of order s with constant δs ∈ (0, 1) if

    (1 − δs) ||x||₂² ≤ ||Ax||₂² ≤ (1 + δs) ||x||₂²

for all s-sparse vectors x.

Linear Programming reformulation:

    min_{x⁺, x⁻}  eᵀx⁺ + eᵀx⁻
    s.t.          A(x⁺ − x⁻) = b
                  x⁺, x⁻ ≥ 0,

where e = (1, 1, ..., 1) ∈ Rⁿ.
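This reformulation can be handed to any LP solver; a sketch assuming SciPy's linprog, with illustrative problem sizes and an exact 3-sparse ground truth:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 20, 40
A = rng.normal(size=(m, n)) / np.sqrt(m)

# sparse ground truth and exact measurements b = A x0
x0 = np.zeros(n)
x0[[3, 17, 31]] = [1.0, -2.0, 1.5]
b = A @ x0

# LP: min e^T x+ + e^T x-  s.t.  A(x+ - x-) = b,  x+, x- >= 0
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
```

Since x0 is feasible for the LP, the recovered x_hat always satisfies ||x_hat||₁ ≤ ||x0||₁; under the RIP-type conditions above it coincides with x0.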

From Exact to Approximate Solutions: If the observation b is contaminated with noise, i.e., b = Au + η, then Ax = b might not be feasible, so an appropriate norm of the residual Ax − b should be minimized or treated as a constraint.

1. l1-minimization with quadratic constraint:

    min_x ||x||₁  s.t.  ||Ax − b||₂ ≤ λ,

where λ > 0.

2. Basis pursuit denoising (l1-regularized linear least squares):

    min_x (1/2) ||Ax − b||₂² + µ ||x||₁,

where µ > 0.
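The l1-regularized least-squares problem can be solved by iterative soft-thresholding (a proximal gradient method). That algorithm is not covered in these slides, so treat this as an illustrative sketch:

```python
import numpy as np

def soft_threshold(v, k):
    """Prox of k*||.||_1: componentwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def ista(A, b, mu, iters=500):
    """Minimize 1/2 ||Ax - b||_2^2 + mu ||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2    # Lipschitz constant of the smooth gradient
    t = 1.0 / L                      # step size
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # gradient step on the least-squares term, then prox of the l1 term
        x = soft_threshold(x - t * A.T @ (A @ x - b), t * mu)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 40))
x0 = np.zeros(40)
x0[[5, 25]] = [3.0, -3.0]
b = A @ x0 + 0.01 * rng.normal(size=20)

x = ista(A, b, mu=0.1)
obj = 0.5 * np.linalg.norm(A @ x - b)**2 + 0.1 * np.sum(np.abs(x))
obj0 = 0.5 * np.linalg.norm(b)**2    # objective at the starting point x = 0
```

With step size 1/L the objective decreases monotonically; accelerated variants of this scheme exist but are beyond the scope of the slides.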

3. LASSO (least absolute shrinkage and selection operator):

    min_x (1/2) ||Ax − b||₂²  s.t.  ||x||₁ ≤ τ,

where τ > 0.

4. Dantzig selector (l1-minimization with bounded residual correlation):

    min_x ||x||₁  s.t.  ||Aᵀ(Ax − b)||∞ ≤ σ,

where σ > 0.