Gauge optimization and duality

Junfeng Yang, Department of Mathematics, Nanjing University
Joint work with Shiqian Ma, CUHK
September 2015

Outline

Introduction
Duality: Lagrange duality, Fenchel duality, Gauge duality


Gauge function

Definition (gauge function). A function f : R^n → R ∪ {+∞} is called a gauge function if
1. f is convex: for all x, y ∈ R^n and α ∈ (0, 1), f(αx + (1−α)y) ≤ αf(x) + (1−α)f(y);
2. f is positively homogeneous: for all x ∈ R^n and λ > 0, f(λx) = λf(x);
3. f is nonnegative: f(x) ≥ 0 for all x;
4. f vanishes at the origin: f(0) = 0.

Norms and semi-norms are particular gauge functions.
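To make the definition concrete, here is a minimal numerical sketch (not from the slides) that spot-checks the four gauge properties for the l1 norm on random vectors; it assumes only that numpy is available.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda v: np.linalg.norm(v, 1)  # the l1 norm, a prototypical gauge

x, y = rng.normal(size=5), rng.normal(size=5)
alpha, lam = 0.3, 2.5

# 1. convexity (checked on a single pair of points)
assert f(alpha * x + (1 - alpha) * y) <= alpha * f(x) + (1 - alpha) * f(y) + 1e-12
# 2. positive homogeneity
assert np.isclose(f(lam * x), lam * f(x))
# 3. nonnegativity and 4. vanishing at the origin
assert f(x) >= 0 and f(np.zeros(5)) == 0
print("the l1 norm satisfies the four gauge properties on this sample")
```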

The problem

We are interested in the following gauge optimization problem:
    min_x { κ(x) : ρ(Ax − b) ≤ ε },
where κ and ρ are gauge functions with ρ^{-1}(0) = {0}, and A, b and ε > 0 are given data.

Sparsity

Sparse solutions

Problem: find a sparse solution x,
    min_x { ||x||_0 : Ax = b },
or find x such that the residual is sparse,
    min_x ||Ax − b||_0.
Drawback: combinatorial optimization, difficult.

Sparse representation

- Represent a signal y as a superposition of elementary signal atoms: y = Σ_i φ_i x_i, or y = Φx;
- the representation is unique if the basis is orthonormal;
- overcomplete dictionaries have more flexibility, but the representation is not unique;
- a "good" transform leads to fast decay of the coefficients.

Sparse representation

[A sequence of figure-only slides illustrating sparse representations; no text content to transcribe.]

LASSO/BP/BPDN

The LASSO model (Tibshirani, 1996):
    min_x { ||Ax − b||_2 : ||x||_1 ≤ k }.
The BP/BPDN model (Donoho et al., 1998; Candès et al., 2006):
    min_x { ||x||_1 : ||Ax − b||_2 ≤ ε }.
Joint sparsity (Malioutov-Cetin-Willsky, 2003):
    min_X { Σ_j ||X(:, j)||_2 : ||AX − B||_2 ≤ ε }.
The last two are gauge optimization problems.
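As a concrete illustration (not part of the original slides), the following minimal sketch solves the BPDN model on synthetic data; it assumes the cvxpy package and its default solvers are available.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n, k = 40, 100, 5
A = rng.normal(size=(m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
b = A @ x_true + 1e-3 * rng.normal(size=m)
eps = 1e-2

# BPDN: minimize the l1 gauge subject to an l2 residual constraint
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(x, 1)), [cp.norm(A @ x - b, 2) <= eps])
prob.solve()
print("recovery error:", np.linalg.norm(x.value - x_true))
```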

Matrix completion

Seek a matrix X ∈ R^{m×n} with some desired structure (e.g., low rank) that matches certain observations, possibly noisy:
    min_X { κ(X) : ||A(X) − b|| ≤ ε },
where A is a linear mapping (e.g., observation of certain entries of X). Setting κ to be the nuclear norm, i.e., the sum of all singular values, promotes low-rank solutions. Applications in recommender systems, e.g., Netflix, Amazon (Recht et al., 2010).
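A hypothetical low-rank completion instance in the same spirit (again a sketch assuming cvxpy is installed; the mask, sizes, and tolerance are made up for illustration):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
m, n, r = 20, 20, 2
M = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))       # rank-2 ground truth
mask = (rng.random((m, n)) < 0.5).astype(float)             # observe roughly half of the entries
eps = 1e-3

# minimize the nuclear-norm gauge subject to fitting the observed entries
X = cp.Variable((m, n))
residual = cp.norm(cp.multiply(mask, X - M), "fro")         # error on observed entries only
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), [residual <= eps])
prob.solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```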

Phase retrieval

In phase retrieval, one recovers a signal, e.g., an image, from magnitude-only measurements, in which the phase information is lost (Candès et al., 2012; Waldspurger, 2015). Applications include X-ray crystallography, where it is used to image the molecular structure of a crystal (Harrison, 1993).
- Unknown signal x ∈ C^n;
- measurements b_k = |⟨x, a_k⟩|^2, k = 1, ..., m, for vectors a_k that encode the waveforms used to illuminate the signal.

PhaseLift

These measurements can be understood as
    b_k = ⟨xx*, a_k a_k*⟩ = ⟨X, A_k⟩
in terms of the lifted signal X := xx*, where A_k := a_k a_k* is the k-th rank-1 measurement matrix. The PhaseLift convex model
    min_{X ∈ S^n} { ⟨I, X⟩ + ι_{⪰0}(X) : ||A(X) − b|| ≤ ε },
where ι_{⪰0} is the indicator of the positive semidefinite cone, also falls into the class of gauge optimization problems.
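For illustration, here is a small real-valued PhaseLift sketch of my own (the slides' setting is complex-valued, which would use a Hermitian variable instead); it assumes cvxpy with an SDP-capable solver such as SCS.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, m = 8, 40
x_true = rng.normal(size=n)
a = rng.normal(size=(m, n))                     # real measurement vectors a_k
b = (a @ x_true) ** 2                           # b_k = <a_k, x>^2 (noiseless)

# PhaseLift: minimize trace(X) over PSD matrices consistent with the lifted measurements
X = cp.Variable((n, n), PSD=True)
meas = cp.hstack([cp.trace(np.outer(ak, ak) @ X) for ak in a])
prob = cp.Problem(cp.Minimize(cp.trace(X)), [cp.norm(meas - b, 2) <= 1e-3])
prob.solve()

# a rank-1 factor of X recovers x up to a global sign
w, V = np.linalg.eigh(X.value)
x_hat = np.sqrt(max(w[-1], 0.0)) * V[:, -1]
print("error up to sign:", min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true)))
```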

Image reconstruction

In image reconstruction, one recovers an unknown image from a set of linear measurements b = Ax + noise. The total variation model (Rudin-Osher-Fatemi, 1992):
    min_x { TV(x) : ||Ax − b|| ≤ ε },
where TV(x) := Σ_i ||D_i x|| and A is
- the identity (denoising),
- a blurring operator (deconvolution),
- a partial convolution/wavelet operator (inpainting),
- a partial Fourier operator (MRI).
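A toy TV denoising instance (A = identity) can be written with cvxpy's built-in total variation atom; this is an illustrative sketch with a made-up noise level and residual budget, not the solver used in the experiments below (those use ADM).

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n = 64
clean = np.kron(rng.integers(0, 4, size=(4, 4)), np.ones((16, 16)))  # piecewise-constant image
noisy = clean + 0.2 * rng.normal(size=(n, n))
eps = 0.2 * n                                                         # residual budget near the noise level

# TV denoising: A is the identity, so the constraint is ||x - b||_F <= eps
x = cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(cp.tv(x)), [cp.norm(x - noisy, "fro") <= eps])
prob.solve()
print("noisy error:   ", np.linalg.norm(noisy - clean))
print("denoised error:", np.linalg.norm(x.value - clean))
```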

Ill-posed problems

In deconvolution/deblurring, A is a convolution matrix (square and invertible). The least squares estimate (LSE) of x is
    x_LSE = arg min_x ||Ax − f||.
[Figure: left, the blurry and noisy observation f; right, the direct-inverse solution x_LSE.]

Ill-posed problems

In MRI, the observed data satisfy f = PFx + noise, where F is the 2D Fourier matrix and P is a projection (sampling) matrix.
[Figure: left, k-space sampling pattern (9.36%); right, back projection.]

RGB image recovery

Model: min_x { MTV(x) : ||Kx − f|| ≤ ε }.
[Figures: degraded image, SNR 6.25dB; recovered image, SNR 17.53dB, CPU 2.78s, 10 iterations. Cross-channel blur, Gaussian noise, 256×256.]

Deconvolution with impulsive noise

Model: min_x { MTV(x) : ||Kx − f||_1 ≤ ε }.
[Figures: corrupted image, 60% corruption; recovered image, SNR 11.74dB, CPU 18.06s, 35 iterations. Cross-channel blur, 60% salt-and-pepper noise, 256×256.]

Wavelet inpainting

Model: min_x { TV(x) : ||PWx − f|| ≤ ε }.
[Figures: back projection from 50% of the data; ADM recovery, SNR 29.1dB, CPU 46.2s.]

MRI reconstruction

Model: min_x { TV(x) + τ||Wx||_1 : ||PFx − f|| ≤ ε }.
[Figures: left, original (256×256); middle, k-space samples (9.36%, white noise 10^-3); right, recovered by ADM (relative error 0.48%, CPU 3s).]

Recovery from incomplete data

Model: min_x { TV(x) : ||PKx − f|| ≤ ε }.
[Figures: Lena, 512×512; Gaussian blur, 15×15; white noise 10^-3; sampling ratios 5%, 3%, 1%; SNRs 13.32, 11.98, 6.14dB; CPU 43, 46, 57s.]

Robust linear regression

Given feature vectors a_i ∈ R^n and outcomes b_i ∈ R, i = 1, ..., m, find weights x that predict the outcomes accurately, i.e., Ax ≈ b. The robust linear regression problem
    min_x ||Ax − b||_1
is less sensitive to outliers. Equivalent gauge problem:
    min_{x,y} { ||y||_1 : Ax + y = b }.
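A quick numerical comparison (illustrative only, assuming cvxpy) of the l1 fit against ordinary least squares when a few measurements are grossly corrupted:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
m, n = 60, 3
A = rng.normal(size=(m, n))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=m)
b[:5] += 10.0                                    # a few gross outliers

# robust l1 regression vs. ordinary least squares
x1 = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(A @ x1 - b, 1))).solve()
x2 = np.linalg.lstsq(A, b, rcond=None)[0]
print("l1 error:", np.linalg.norm(x1.value - x_true))
print("ls error:", np.linalg.norm(x2 - x_true))
```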

Semidefinite programming

The semidefinite programming (SDP) problem
    min_X { ⟨C, X⟩ : A(X) = b, X ⪰ 0 }
can also be reformulated as gauge optimization. To see this, choose a dual feasible point y so that Ĉ := C − A*y ⪰ 0. Then this SDP is equivalent to the gauge optimization problem
    min_X { ⟨Ĉ, X⟩ + ι_{⪰0}(X) : A(X) = b }.

Outline

Introduction
Duality: Lagrange duality, Fenchel duality, Gauge duality

Lagrange duality

Given a general nonlinear programming problem, referred to as the primal problem, there is a closely related nonlinear programming problem called the Lagrange dual problem. Under suitable regularity conditions, such as convexity and constraint qualifications, the primal and dual problems have equal optimal objective values.

Lagrange duality

Let f : R^n → R, h : R^n → R^m, g : R^n → R^q and X ⊆ R^n. The nonlinear constrained optimization problem:
    p* := inf_x { f(x) : h(x) = 0, g(x) ≤ 0, x ∈ X }.
The Lagrange function:
    L(x, λ, µ) = f(x) + ⟨λ, h(x)⟩ + ⟨µ, g(x)⟩.
The Lagrange dual problem:
    d* := sup_{λ,µ} { d(λ, µ) : µ ≥ 0 },
where d(λ, µ) := inf_x { L(x, λ, µ) : x ∈ X } is the dual objective.

Lagrange duality

- The vectors λ and µ are the Lagrange multipliers;
- the multipliers corresponding to inequality constraints are restricted to be nonnegative, while those corresponding to equality constraints are unrestricted;
- the dual problem is a concave maximization problem;
- weak duality d* ≤ p* always holds;
- strong duality d* = p* is in general not true; for convex problems (X convex, f and g convex, h affine), if Slater's condition is satisfied, then the duality gap vanishes;
- different Lagrange dual problems can be derived depending on the formulation of the problem.

Example

Consider the l1-regularized least squares problem
    min_x ||x||_1 + (1/2)||Ax − b||^2.
Two equivalent formulations:
    min_{x,z} { ||x||_1 + (1/2)||Az − b||^2 : x = z },
    min_{x,z} { ||x||_1 + (1/2)||z − b||^2 : Ax = z }.
Their respective Lagrange dual problems:
    max_{||y||_∞ ≤ 1} min_z ⟨y, z⟩ + (1/2)||Az − b||^2,
    max_{||A^T y||_∞ ≤ 1} ⟨b, y⟩ − (1/2)||y||^2.
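A quick numerical check of strong duality for the second formulation (a sketch assuming cvxpy; the primal and dual optimal values should agree up to solver tolerance):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
m, n = 15, 30
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# primal: l1-regularized least squares
x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.norm(x, 1) + 0.5 * cp.sum_squares(A @ x - b)))
primal.solve()

# Lagrange dual of the second formulation: max <b,y> - 0.5*||y||^2  s.t. ||A^T y||_inf <= 1
y = cp.Variable(m)
dual = cp.Problem(cp.Maximize(b @ y - 0.5 * cp.sum_squares(y)),
                  [cp.norm(A.T @ y, "inf") <= 1])
dual.solve()
print(primal.value, dual.value)   # equal up to solver tolerance
```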

Fenchel duality

Let f : R^n → R ∪ {+∞}. Consider the unconstrained optimization problem
    min_x f(x).
Any constraints must be incorporated into the objective via indicator functions. The key ingredient of Fenchel duality is that any closed convex function is the pointwise supremum of the affine functions that minorize it. This leads to the Fenchel conjugate function:
    f*(y) = sup_x { ⟨y, x⟩ − f(x) },  for all y.

Fenchel duality

Assume that φ(x, y) is a perturbation function of f(x), i.e., φ(x, 0) = f(x) for all x, and define
    h(y) := inf_x φ(x, y),  for all y.
Then minimizing f is equivalent to evaluating h(0). One can show that
    −inf_y φ*(0, y) = h**(0) ≤ h(0) = inf_x φ(x, 0).
The problem of evaluating h**(0) is called the Fenchel dual problem. Strong duality h**(0) = h(0) holds if and only if
    inf_x sup_y { ⟨x, y⟩ − h(y) } = sup_y inf_x { ⟨x, y⟩ − h(y) }.

Fenchel duality

Theorem (Fenchel duality theorem). Let f : R^n → R ∪ {+∞} be a closed proper convex function and g : R^n → R ∪ {−∞} a closed proper concave function. Then, under regularity conditions,
    inf_x { f(x) − g(x) } = sup_y { g_*(y) − f*(y) },
where f* is the convex conjugate of f and g_* is the concave conjugate of g, i.e.,
    f*(y) := sup_x { ⟨y, x⟩ − f(x) },
    g_*(y) := inf_x { ⟨y, x⟩ − g(x) }.
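A tiny worked instance of the theorem, checked numerically by brute force (my own sketch, not from the slides): take f(x) = (1/2)(x − 2)^2 and g(x) = −|x| on R, so that f*(y) = 2y + (1/2)y^2 and g_*(y) equals 0 on |y| ≤ 1 and −∞ elsewhere; both sides of the duality come out to 1.5.

```python
import numpy as np

# primal side: inf_x f(x) - g(x) = inf_x 0.5*(x-2)^2 + |x|
xs = np.linspace(-10.0, 10.0, 200001)
primal = np.min(0.5 * (xs - 2.0) ** 2 + np.abs(xs))

# dual side: sup_y g_*(y) - f*(y), where dom g_* restricts y to [-1, 1]
ys = np.linspace(-1.0, 1.0, 200001)
dual = np.max(0.0 - (2.0 * ys + 0.5 * ys ** 2))

print(primal, dual)   # both approximately 1.5
```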

Geometric interpretation

[Figures: a sequence of plots illustrating the dual objective as the slope of an affine minorant is varied; each plot is labeled with [slope, dual obj], and the pairs shown are (−2.05, −109.28), (−1.60, −73.00), (−1.10, −38.62), (−0.60, −10.50), (−0.10, 11.38), (1.40, 39.50), (1.90, 36.38), (2.40, 27.00).]

Gauge optimization

Definition (gauge optimization). Minimizing a closed gauge function κ over a closed convex set C is called gauge optimization, that is,
    min_x { κ(x) : x ∈ C }.   (GP)

Definition (feasible point).
- A point x ∈ C is called a feasible point of (GP);
- if x ∈ C and κ(x) = +∞, then x is essentially infeasible;
- if x ∈ C and κ(x) < +∞, then x is strongly feasible.
Similar definitions apply to the gauge dual problem defined below.

Gauge optimization

Definition (antipolar set). The antipolar of a closed convex set C is given by
    C′ = { y ∈ R^n : ⟨x, y⟩ ≥ 1 for all x ∈ C }.
Definition (polar function). The polar function of a gauge κ is defined as
    κ°(y) = sup { ⟨x, y⟩ : κ(x) ≤ 1 }.
In particular, the polar function of a norm is its dual norm.
Definition (gauge dual). The nonlinear gauge dual problem of (GP) is defined by
    min_y { κ°(y) : y ∈ C′ }.   (GD)
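As a sanity check on the polar (my own small sketch, numpy only): for κ = l1 norm the supremum in the definition is attained at a signed coordinate vector, and κ° is exactly the l∞ norm.

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(size=6)

# polar of the l1 gauge: kappa_polar(y) = sup { <x, y> : ||x||_1 <= 1 } = ||y||_inf
i = np.argmax(np.abs(y))
x_star = np.zeros(6)
x_star[i] = np.sign(y[i])                 # maximizing vertex of the l1 ball
assert np.isclose(x_star @ y, np.abs(y).max())

# any other feasible point gives a value no larger than ||y||_inf
x = rng.normal(size=6)
x /= np.abs(x).sum()
assert x @ y <= np.abs(y).max() + 1e-12
print("polar of the l1 norm at y equals ||y||_inf =", np.abs(y).max())
```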

Gauge duality

Theorem (weak duality, Freund 1987). Let p* and d* be, respectively, the optimal values of (GP) and (GD).
1. If x and y are, respectively, strongly feasible for (GP) and (GD), then κ(x)κ°(y) ≥ 1, and hence p* d* ≥ 1.
2. If p* = 0, then d* = +∞, i.e., (GD) is essentially infeasible.
3. If d* = 0, then p* = +∞, i.e., (GP) is essentially infeasible.

Gauge duality

Theorem (strong duality, Freund 1987). Let p* and d* be, respectively, the optimal values of (GP) and (GD). If (ri C) ∩ (ri dom κ) ≠ ∅ and (ri C′) ∩ (ri dom κ°) ≠ ∅, then p* d* = 1, and each program attains its optimum.

Gauge duality

The Fenchel dual of (GP) is given by
    d_f* = max_y { −ι_C*(−y) : κ°(y) ≤ 1 },   (FD)
where ι_C* is the conjugate of the indicator of C, i.e., the support function of C.

Theorem (weak duality, Friedlander et al., 2014). Suppose that dom κ ∩ C ≠ ∅. Then p* ≥ d_f* = 1/d* > 0. Furthermore,
1. if y* solves (FD), then y*/d_f* solves (GD);
2. if y* solves (GD) and d* > 0, then y*/d* solves (FD).

Gauge duality

Theorem (strong duality, Friedlander et al., 2014).
1. Suppose that dom κ ∩ C ≠ ∅ and (ri dom κ) ∩ (ri C) ≠ ∅. Then p* d* = 1 and (GD) attains its optimal value.
2. Suppose that (ri dom κ) ∩ (ri C) ≠ ∅ and (ri dom κ°) ∩ (ri C′) ≠ ∅. Then p* d* = 1 and both (GP) and (GD) attain their optimal values.

Gauge duality

Consider the gauge optimization problem
    min_x { κ(x) : ρ(Ax − b) ≤ ε }.
Lagrange dual:
    max_y { b^T y − ε ρ°(y) : κ°(A^T y) ≤ 1 }.
Gauge dual:
    min_y { κ°(A^T y) : b^T y − ε ρ°(y) ≥ 1 }.
In general, problems with simpler constraints and a more complex objective are preferred to those with a simpler objective but more complex constraints.
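To close the loop, a small numerical check (a cvxpy sketch under my own choice of κ = l1 norm and ρ = l2 norm) that the primal and gauge dual optimal values multiply to 1, as strong gauge duality predicts:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
m, n, eps = 20, 50, 0.1
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n)          # ||b|| >> eps, so x = 0 is infeasible and p* > 0

# gauge primal: min ||x||_1  s.t.  ||Ax - b||_2 <= eps
x = cp.Variable(n)
gp = cp.Problem(cp.Minimize(cp.norm(x, 1)), [cp.norm(A @ x - b, 2) <= eps])
gp.solve()

# gauge dual: min ||A^T y||_inf  s.t.  b^T y - eps*||y||_2 >= 1
y = cp.Variable(m)
gd = cp.Problem(cp.Minimize(cp.norm(A.T @ y, "inf")),
                [b @ y - eps * cp.norm(y, 2) >= 1])
gd.solve()
print("p* * d* =", gp.value * gd.value)   # approximately 1
```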

Acknowledgements

Some pictures are taken from Michael P. Friedlander's SIAM Optimization talk in 2011.