Making Flippy Floppy


1  Making Flippy Floppy
   James V. Burke, UW Mathematics
   Aleksandr Y. Aravkin, IBM T.J. Watson Research
   Michael P. Friedlander, UBC Computer Science
   Vietnam National University, April 2013

2  Outline
   Application: Imaging, Migration Velocity Analysis
   BPDN & LASSO (van den Berg and Friedlander)
     SPGL1: Probing the Pareto frontier for basis pursuit solutions, SIAM J. Sci. Comput. 31 (2008)
     Sparse optimization with least-squares constraints, SIOPT 21 (2011)
   Making Flippy Floppy
   Convex Problems
   Duality and Variational Analysis
   Derivatives for Piecewise Linear-Quadratic Functions

3  Imaging Application: Migration Velocity Analysis

4  Imaging: Migration Velocity Analysis
   After collecting seismic data, and given a smooth estimate of the subsurface velocity model, high-quality images are obtained by solving an optimization problem for the model update.
   Smallest 2D images: variable size 1/2 million
   Target 3D images: variable size in the billions
   [Figures: depth vs. lateral position images, axes in units of 24 meters]

5  Sparse Formulation for Migration
   BP_σ:  min ‖x‖₁  s.t.  ‖r − JCx‖₂ ≤ σ
   Problem specification:
     r  residual at the smooth model estimate
     m  smooth velocity estimate
     J  Jacobian of the forward model
     C  curvelet transform
     x  curvelet coefficients of the update
     σ  error level
   Results: improved recovery compared to least-squares (LS) inversion.
   [Figures: depth vs. lateral position images, axes in units of 24 meters]

6 BPDN & LASSO

7-11  BPDN & LASSO
   A ∈ R^{m×n} with m ≪ n
   Basis Pursuit (Mallat and Zhang (1993), Chen, Donoho, Saunders (1998))
     BP:   min ‖x‖₁  s.t.  Ax = b                       [optimal value = τ_BP]
   Basis Pursuit De-Noising (BPDN) (Chen, Donoho, Saunders (1998))
     BP_σ: min ‖x‖₁  s.t.  ‖b − Ax‖₂ ≤ σ                [target problem (SPGL1)]
   LASSO (Least Absolute Shrinkage and Selection Operator) (Tibshirani (1996))
     LS_τ: min ½‖b − Ax‖₂²  s.t.  ‖x‖₁ ≤ τ              [easiest to solve (SPG)]
   Lagrangian formulation
     QP_λ: min ½‖b − Ax‖₂² + λ‖x‖₁
   Candès, Romberg, and Tao (2006): BP gives least-support solutions (fewest non-zeros).

12 SPGL1: PROBING THE PARETO FRONTIER FOR BASIS PURSUIT SOLUTIONS van den Berg and Friedlander (2008)

13  Optimal Value Function
    BP_σ: min ‖x‖₁  s.t.  ½‖Ax − b‖₂² ≤ σ
    LS_τ: min ½‖Ax − b‖₂²  s.t.  ‖x‖₁ ≤ τ
    The key is the value function
      v(τ) := min_{‖x‖₁ ≤ τ} ½‖Ax − b‖₂²,   v(τ) = ½‖Ax_τ − b‖₂²
    Algorithm
      1. Evaluate v(τ) by solving LS_τ inexactly (projected gradient)
      2. Compute v′(τ) inexactly (duality theory)
      3. Solve v(τ) = σ (inexact Newton's method)
    [Figure: Pareto curve of v(τ) with the point (τ, σ) and the root τ_BP]
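The three steps above can be sketched in a few lines. Below is a hypothetical Python illustration (not the authors' SPGL1 implementation) of step 1: evaluating v(τ) by a projected-gradient solve of LS_τ, using a standard sort-based projection onto the ℓ1 ball. Function names, step size, and tolerances are illustrative choices.

```python
# Hypothetical sketch: evaluate v(tau) = min { 0.5*||A x - b||^2 : ||x||_1 <= tau }
# by projected gradient (step 1 of the algorithm on slide 13).
import numpy as np

def project_l1_ball(y, tau):
    """Euclidean projection of y onto the l1 ball of radius tau (sort-based)."""
    if tau <= 0:
        return np.zeros_like(y)
    if np.abs(y).sum() <= tau:
        return y.copy()
    u = np.sort(np.abs(y))[::-1]                  # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(y) + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1.0)            # soft-threshold level
    return np.sign(y) * np.maximum(np.abs(y) - theta, 0.0)

def eval_value_function(A, b, tau, x0=None, max_iter=500, tol=1e-8):
    """Inexactly solve LS_tau by projected gradient; return v(tau), x_tau, residual r_tau."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else project_l1_ball(x0, tau)
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    for _ in range(max_iter):
        r = A @ x - b
        g = A.T @ r                               # gradient of 0.5*||Ax - b||^2
        x_new = project_l1_ball(x - step * g, tau)
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            x = x_new
            break
        x = x_new
    r = A @ x - b
    return 0.5 * r @ r, x, r
```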

14  Optimal Value Function: Variational Properties
    v(τ) := min_{‖x‖₁ ≤ τ} ½‖Ax − b‖₂²
    Theorem [van den Berg & Friedlander, 2008, 2011]
      1. v(τ) is convex.
      2. For all τ ∈ (0, τ_BP), v is continuously differentiable and
         v′(τ) = −λ_τ,  with  λ_τ = ‖Aᵀr_τ‖_∞  and  r_τ = Ax_τ − b,
         where x_τ solves LS_τ.
    [Figure: Pareto curve v(τ) on (0, τ_BP)]

15  Root Finding: v(τ) = σ
    Approximately solve    minimize ½‖Ax − b‖₂²  subject to  ‖x‖₁ ≤ τ_k
    Newton update          τ_{k+1} = τ_k − (v_k − σ)/v′_k
    Early termination      monitor the duality gap
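A minimal sketch of the outer root-finding loop, assuming the eval_value_function helper from the previous sketch and the derivative formula v′(τ) = −‖Aᵀr_τ‖_∞ quoted on slide 14; the names and safeguards are illustrative, not the SPGL1 code.

```python
# Hypothetical sketch of the Newton iteration on v(tau) = sigma.
import numpy as np

def spgl1_style_newton(A, b, sigma, tau0=0.0, max_newton=30, tol=1e-6):
    """Solve BP_sigma by root finding on the value function v(tau) = sigma."""
    tau, x = tau0, None
    for _ in range(max_newton):
        v, x, r = eval_value_function(A, b, tau, x0=x)
        if abs(v - sigma) <= tol * max(1.0, sigma):
            break
        dv = -np.linalg.norm(A.T @ r, np.inf)     # v'(tau) from the dual information
        if dv >= 0:                               # safeguard: v should be decreasing here
            break
        tau = max(tau - (v - sigma) / dv, 0.0)    # inexact Newton step on v(tau) = sigma
    return x, tau
```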

16 EXTENSIONS Sparse Optimization with Least-Squares Constraints van den Berg and Friedlander (2011)

17  Gauge Functions
    U ⊆ Rⁿ non-empty, closed, and convex (usually 0 ∈ U).
    The gauge functional associated with U is given by
      γ(x | U) := inf { t ≥ 0 : x ∈ tU }.
    Examples:
      1. U = B, the closed unit ball of a norm ‖·‖:   γ(x | B) = ‖x‖
      2. U = K, a convex cone:   γ(x | K) = δ(x | K) := 0 if x ∈ K, +∞ if x ∉ K
      3. U = B ∩ K:   γ(x | B ∩ K) = ‖x‖ + δ(x | K)
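A concrete instance of example 3 (added here, not on the slide), combining the ℓ1 unit ball with the nonnegative orthant; this is the gauge behind the non-negative basis pursuit application mentioned on slide 19.

```latex
% Added example: gauge of the l1 ball intersected with the nonnegative orthant.
\gamma\big(x \mid B_1 \cap \mathbb{R}^n_+\big)
  \;=\; \|x\|_1 + \delta\big(x \mid \mathbb{R}^n_+\big)
  \;=\;
  \begin{cases}
    \|x\|_1, & x \ge 0,\\
    +\infty, & \text{otherwise.}
  \end{cases}
```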

18  Optimal Value Function
    BP_σ: min γ(x | U)  s.t.  ½‖Ax − b‖₂² ≤ σ
    LS_τ: min ½‖Ax − b‖₂²  s.t.  γ(x | U) ≤ τ
    The key is the value function
      v(τ) := min_{γ(x|U) ≤ τ} ½‖Ax − b‖₂²,   v(τ) = ½‖Ax_τ − b‖₂²
    Algorithm
      1. Evaluate v(τ) by solving LS_τ inexactly (projected gradient)
      2. Compute v′(τ) inexactly (duality theory)
      3. Solve v(τ) = σ (inexact Newton's method)
    [Figure: Pareto curve with the point (τ, σ), where τ = γ(x_τ | U)]

19  Applications of Gauge Functionals
    Sparse optimization with least-squares constraints, van den Berg and Friedlander (2011)
    Non-negative basis pursuit: source localization, mass spectrometry
    Nuclear-norm minimization: matrix completion problems

20 HOW FAR DOES THE FLIPPING IDEA GO?

21  How far does flipping go?
    Let ψᵢ : X ⊆ Rⁿ → R, i = 1, 2, be arbitrary functions and X an arbitrary set.
      epi(ψ) := { (x, μ) : ψ(x) ≤ μ }
      δ((x, μ) | epi(ψ)) = 0 if (x, μ) ∈ epi(ψ); +∞ otherwise.
    P_{1,2}(σ):  v₁(σ) := inf_{x ∈ X} ψ₁(x) + δ((x, σ) | epi(ψ₂))
    P_{2,1}(τ):  v₂(τ) := inf_{x ∈ X} ψ₂(x) + δ((x, τ) | epi(ψ₁))
    S_{1,2} := { σ ∈ R : ∅ ≠ argmin P_{1,2}(σ) ⊆ { x ∈ X : ψ₂(x) = σ } }
    Then, for every σ ∈ S_{1,2},
      (a) v₂(v₁(σ)) = σ, and
      (b) argmin P_{1,2}(σ) = argmin P_{2,1}(v₁(σ)) ∩ { x ∈ X : ψ₁(x) = v₁(σ) }.
    Moreover, S_{2,1} = { v₁(σ) : σ ∈ S_{1,2} } and { (σ, v₁(σ)) : σ ∈ S_{1,2} } = { (v₂(τ), τ) : τ ∈ S_{2,1} }.

22  Making Flippy Floppy
    P_{1,2}(σ) (target problem):    v₁(σ) := inf_{x ∈ X} ψ₁(x) + δ((x, σ) | epi(ψ₂))
    P_{2,1}(τ) (easier to solve):   v₂(τ) := inf_{x ∈ X} ψ₂(x) + δ((x, τ) | epi(ψ₁))
    GOAL: Solve P_{1,2}(σ) by solving P_{2,1}(τ) for perhaps several values of τ.
    The van den Berg-Friedlander method: given σ, solve the equation v₂(τ) = σ for τ = τ_σ. Then
      argmin P_{2,1}(τ_σ) = argmin P_{1,2}(σ).

23-24  When is the van den Berg-Friedlander method viable?
    Key considerations:
      (A) The problem P_{2,1}(τ):  v₂(τ) := inf_{x ∈ X} ψ₂(x) + δ((x, τ) | epi(ψ₁))
          must be easily and accurately solvable.
      (B) We must be able to solve equations of the form v₂(τ) = σ.
      (C) v₂(τ) should have reasonable variational properties (continuity, differentiability, subdifferentiability, convexity).
    Fact: v₂ is non-increasing for τ > τ_min, where
      τ_min := inf { τ : P_{2,1}(τ) is feasible and finite valued },
      τ_max := sup { τ : P_{2,1}(τ) is feasible and finite valued },
    and so it is differentiable a.e. on (τ_min, τ_max). It is therefore possible to apply bisection or golden-mean search for zero finding, as sketched below.
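Since v₂ is non-increasing, a derivative-free bracketing method is always available as a fallback. A hypothetical sketch with illustrative names (not code from the talk):

```python
# Hypothetical sketch: bisection on the non-increasing value function v2.
# `value_fn` is any callable tau -> v2(tau), e.g. a wrapper around
# eval_value_function from the earlier sketch.

def bisect_value_function(value_fn, sigma, tau_lo, tau_hi, max_iter=60, tol=1e-8):
    """Find tau with v2(tau) ~= sigma, assuming v2 is non-increasing on [tau_lo, tau_hi]
    and v2(tau_lo) >= sigma >= v2(tau_hi)."""
    for _ in range(max_iter):
        tau_mid = 0.5 * (tau_lo + tau_hi)
        v_mid = value_fn(tau_mid)
        if abs(v_mid - sigma) <= tol * max(1.0, sigma) or tau_hi - tau_lo <= tol:
            return tau_mid
        if v_mid > sigma:        # still above the target level: move right
            tau_lo = tau_mid
        else:                    # below the target level: move left
            tau_hi = tau_mid
    return 0.5 * (tau_lo + tau_hi)
```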

25-26  What generalizations should we consider?
    In the motivating models, we minimize a sparsity-inducing regularizing function subject to a linear least-squares misfit measure for the data.
    Data misfit: ‖Ax − b‖₂²    Statistical model: b = Ax + ε,  ε ~ N(0, I).
    Some alternatives:
      Statistical model               Misfit measure        Error model
      Gaussian                        (aᵢᵀx − bᵢ)²          εᵢ ~ N(0, 1)
      Laplace                         |aᵢᵀx − bᵢ|           εᵢ ~ L(0, 1)
      Huber                           ρ_H(aᵢᵀx − bᵢ)        εᵢ ~ H(0, 1)
      Vapnik (ε-insensitive loss)     ρ_V(aᵢᵀx − bᵢ)        εᵢ ~ H(0, 1)
    Hybrids: Gauss-nik? Hube-nik?

27  Gauss, Laplace, Huber, Vapnik
    Gauss:    V(x) = ½x²
    Laplace:  V(x) = |x|
    Huber:    V(x) = −Kx − ½K²  for x < −K;   ½x²  for −K ≤ x ≤ K;   Kx − ½K²  for K < x
    Vapnik:   V(x) = −x − ε  for x < −ε;   0  for −ε ≤ x ≤ ε;   x − ε  for ε ≤ x
    [Figure: plots of the four penalties]
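For reference, the four penalties can be written as one-line NumPy functions; this is an added illustration using the slide's parameters K and ε, not code from the talk.

```python
# Hypothetical sketch: the four penalties from slide 27 as elementwise NumPy functions.
import numpy as np

def gauss_penalty(x):
    return 0.5 * x**2

def laplace_penalty(x):
    return np.abs(x)

def huber_penalty(x, K=1.0):
    # quadratic inside [-K, K], linear (slope K) outside, matched to be C^1 at +/-K
    return np.where(np.abs(x) <= K, 0.5 * x**2, K * np.abs(x) - 0.5 * K**2)

def vapnik_penalty(x, eps=0.1):
    # zero inside the eps-insensitive band, linear outside
    return np.maximum(np.abs(x) - eps, 0.0)
```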

28 ROBUSTNESS, SPARSENESS, AND BEYOND! Arbitrary Convex Pairs

29-30  Assume ρ and φ are closed, proper, and convex
    P₁(σ): min φ(x)  s.t.  ρ(b − Ax) ≤ σ
    P₂(τ): min ρ(b − Ax)  s.t.  φ(x) ≤ τ
    P₁(σ) is the target problem; P₂(τ) is the easier flipped problem.
    Problems P₁(σ) and P₂(τ) are linked by the value function
      v₂(τ) := min ρ(b − Ax) + δ((x, τ) | epi(φ))
    [Figure: Pareto curve of ρ(b − Ax) versus φ(x) through the point (τ, σ)]
    Broad summary of results:
      1. v₂(τ) is always convex, but may not be differentiable.
      2. The equation v₂(τ) = σ can be solved via an inexact secant method.
      3. We have precise knowledge of the variational properties of v₂(τ) for a large class of problems P₂(τ).

31  Convexity of general optimal value functions: Inf-Projection
    Theorem: v₂(τ) is non-increasing and convex.
    Proof: f(x, τ) := ρ(b − Ax) + δ((x, τ) | epi(φ)) is convex in (x, τ). Therefore the inf-projection in the variable x is convex in τ:
      v₂(τ) = inf_x f(x, τ).
    [Figure: Pareto curve of ρ(b − Ax) versus φ(x) through the point (τ, σ)]

32-33  Inexact Secant Method for v₂(τ) = σ
    Theorem: the inexact secant method for finding v₂(τ) = σ, given by
      τ_{k+1} = τ_k − (l(τ_k) − σ)/m_k,   m_k = (l(τ_k) − u(τ_{k−1}))/(τ_k − τ_{k−1}),   0 < l_k ≤ v₂(τ_k) ≤ u_k,
    is superlinearly convergent as long as
      1. u(τ_k) − l(τ_k) shrinks fast enough, and
      2. the left Dini derivative of v₂(τ) at τ_σ is negative.
    Details for "u(τ_k) − l(τ_k) shrinks fast enough": given t_k ↓ 0,
      u_k − l_k ≤ min{ t_k (τ_k − τ_{k−1}), 1 }   and   (u_{k−1}/l_k)(u_{k−1} − l_k) ≤ t_k (τ_k − τ_{k−1}).
    [Figure: secant iterates τ₂, τ₃, τ₄, τ₅ approaching τ_σ on the Pareto curve]
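A hypothetical sketch of the inexact secant iteration, assuming a user-supplied bounds_fn that returns lower and upper bounds 0 < l(τ) ≤ v₂(τ) ≤ u(τ) from an inexact solve of P₂(τ); the stopping rule and safeguards are illustrative.

```python
# Hypothetical sketch of the inexact secant iteration from slides 32-33.

def inexact_secant(bounds_fn, sigma, tau0, tau1, max_iter=50, tol=1e-8):
    """Drive l(tau_k) toward sigma using secant slopes built from inexact bounds."""
    _, u_prev = bounds_fn(tau0)
    tau_prev, tau = tau0, tau1
    for _ in range(max_iter):
        l, u = bounds_fn(tau)
        if abs(l - sigma) <= tol * max(1.0, sigma):
            break
        m = (l - u_prev) / (tau - tau_prev)        # inexact secant slope m_k
        if m >= 0:                                 # safeguard: expect v2 decreasing here
            break
        tau_next = tau - (l - sigma) / m           # secant update toward v2(tau) = sigma
        tau_prev, u_prev, tau = tau, u, tau_next
    return tau
```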

34 Duality and the Variational Properties of v₂

35  Basic Definitions
    Convex indicator: for any convex set C, the convex indicator function of C is
      δ(x | C) := 0 if x ∈ C,  +∞ if x ∉ C.
    Support functional: for any set C, the support functional of C is
      δ*(x | C) := sup_{z ∈ C} ⟨x, z⟩.
    Convex conjugate: for any convex function g(x), the convex conjugate is given by
      g*(y) = sup_x [ ⟨x, y⟩ − g(x) ] = δ*((y, −1) | epi(g)).
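As a quick check of these definitions (an added worked example, not on the slide), the conjugate and support pair for the ℓ1 norm is exactly what produces the ‖Aᵀr‖_∞ derivative seen on slide 14.

```latex
% Added worked example: with g(x) = \|x\|_1 and B_\infty = \{y : \|y\|_\infty \le 1\},
\begin{aligned}
  g^*(y) &= \sup_x\,[\langle x, y\rangle - \|x\|_1] = \delta(y \mid B_\infty), \\
  \delta^*(y \mid B_1) &= \sup_{\|z\|_1 \le 1} \langle y, z\rangle = \|y\|_\infty .
\end{aligned}
```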

36  Basic Definitions
    The horizon function of h:
      h^∞(z) := sup_{x ∈ dom(h)} [ h(x + z) − h(x) ].
    The perspective function of h:
      (z, λ) ↦ λ h(λ⁻¹z)  if λ > 0;   δ(z | {0})  if λ = 0;   +∞  if λ < 0.
    The closure of the perspective function of h:
      h^π(z, λ) := λ h(λ⁻¹z)  if λ > 0;   h^∞(z)  if λ = 0;   +∞  if λ < 0.

37  Basic Relationships
    Let h : Rⁿ → R be closed, proper, and convex. Then
      δ*((y, μ) | epi(h)) = (h*)^π(y, −μ)
    and
      δ*(y | lev_h(τ)) = cl( inf_{μ ≥ 0} [ τμ + (h*)^π(y, μ) ] ),
    where
      epi(h) := { (x, μ) : h(x) ≤ μ }   and   lev_h(τ) := { x : h(x) ≤ τ }.

38  Subdifferentials and Normal Cones
    Let h : Rⁿ → R be closed, proper, and convex and let x ∈ dom(h).
    Subdifferential of h:
      ∂h(x) := { g : h(y) ≥ h(x) + ⟨g, y − x⟩ for all y }
    Let C ⊆ Rⁿ be non-empty, closed, and convex with x ∈ C.
    Normal cone of C:
      N(x | C) := ∂δ(x | C) = { g : 0 ≥ ⟨g, y − x⟩ for all y ∈ C }

39-41  Perturbation Framework and Duality (Rockafellar (1970))
    The perturbation function:
      f(x, b, τ) := ρ(b − Ax) + δ((x, τ) | epi(φ))
    Its conjugate:
      f*(y, u, μ) = (φ*)^π(y + Aᵀu, −μ) + ρ*(u).
    The primal problem P(b, τ):   v(b, τ) := min_x f(x, b, τ).
    The dual problem D(b, τ):     v̂(b, τ) := sup_{u,μ} ⟨b, u⟩ + τμ − f*(0, u, μ).
    The subdifferential: if (b, τ) ∈ int(dom(v)), then v(b, τ) = v̂(b, τ) and
      ∂v(b, τ) = argmax_{u,μ} D(b, τ).

42-43  The Constraint Qualification (b, τ) ∈ int(dom(v))
    The primal problem P(b, τ):
      v(b, τ) := min_x ρ(b − Ax) + δ((x, τ) | epi(φ))
    The constraint qualification (Slater CQ):
      (b, τ) ∈ int(dom(v))  ⇔  ∃ x̂ s.t. φ(x̂) < τ and b − Ax̂ ∈ int(dom(ρ))
    Solution existence: coercivity conditions.
      The dual objective is coercive  iff  b ∈ int( dom(ρ) + A(lev_φ(τ)) ).
      The primal objective is coercive  iff  hzn(φ) ∩ [ −A⁻¹ hzn(ρ) ] = {0},
      where hzn(p) := { y : p^∞(y) ≤ 0 } = [ lev_p(τ) ]^∞ for τ > inf p.

44  Other Representations for the Dual
    Define g_τ(u) := ρ*(u) + δ*(Aᵀu | lev_φ(τ)). Then the dual optimal value function v̂ has the following equivalent representations:
      v̂(b, τ) := sup_{u,μ} ⟨b, u⟩ + τμ − (φ*)^π(Aᵀu, −μ) − ρ*(u)
                = sup_u [ ⟨b, u⟩ − ρ*(u) − inf_{μ ≥ 0} [ τμ + (φ*)^π(Aᵀu, μ) ] ]
                = sup_u [ ⟨b, u⟩ − ρ*(u) − δ*(Aᵀu | lev_φ(τ)) ]        (reduced dual)
                = g_τ*(b) = cl(v(·, τ))(b).

45-46  KKT Conditions and Subdifferential Representations
    KKT conditions:  ū ∈ ∂ρ(b − Ax̄)  and  Aᵀū ∈ N(x̄ | lev_φ(τ)).
    Subdifferential representation 1: if ∂v(b, τ) ≠ ∅, then
      ∂v(b, τ) = { (ū, −μ̄) : (x̄, ū) ∈ Rⁿ × Rᵐ satisfy the KKT conditions and μ̄ ∈ argmin_{μ ≥ 0} [ τμ + (φ*)^π(Aᵀū, μ) ] }.
    Subdifferential representation 2: if ∂v(b, τ) ≠ ∅ and cone(∂φ(x̄)) is closed for all x̄ ∈ argmin_x f(x, b, τ), then
      ∂v(b, τ) = { (ū, −μ̄) : ∃ x̄ s.t. ū ∈ ∂ρ(b − Ax̄), 0 ∈ −Aᵀū + μ̄ ∂φ(x̄), with μ̄ ≥ 0 and μ̄ (φ(x̄) − τ) = 0 }.
    Finally, ∂v₂(τ) = { ω : ∃ u s.t. (u, ω) ∈ ∂v(b, τ) }.

47 Derivatives for Quadratic Support Functions

48  Piecewise Linear-Quadratic Functions
    Quadratic support (QS) functions:
      φ(x) := sup_{u ∈ U} [ ⟨x, u⟩ − ½ uᵀBu ],
    where U ⊆ Rⁿ is nonempty, closed, and convex with 0 ∈ U, and B ∈ Rⁿˣⁿ is symmetric positive semi-definite.
    Examples:
      1. Support functionals: B = 0
      2. Gauge functionals: γ(· | U) = δ*(· | U°)
      3. Norms: with B the closed unit ball of ‖·‖, ‖·‖ = γ(· | B)
      4. Least-squares: U = Rⁿ, B = I
      5. Huber: U = [−ε, ε]ⁿ, B = I
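To make example 5 concrete, here is the one-dimensional calculation (an added worked example, not on the slide) showing that U = [−ε, ε] and B = 1 reproduce the Huber penalty of slide 27 with K = ε.

```latex
% Added worked example: the scalar Huber penalty as a quadratic support function.
\rho_\varepsilon(x) \;=\; \sup_{|u|\le\varepsilon}\Big[\,xu - \tfrac12 u^2\,\Big]
  \;=\;
  \begin{cases}
    \tfrac12 x^2, & |x| \le \varepsilon \quad (\text{unconstrained maximizer } u = x),\\[2pt]
    \varepsilon|x| - \tfrac12\varepsilon^2, & |x| > \varepsilon \quad (\text{boundary maximizer } u = \varepsilon\,\mathrm{sign}(x)).
  \end{cases}
```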

49  Computing Derivatives for PLQ Functions
    φ(x) := sup_{u ∈ U} [ ⟨x, u⟩ − ½ uᵀBu ]
    P(b, τ):  v(b, τ) := min ρ(b − Ax)  s.t.  φ(x) ≤ τ
    ∂v(b, τ) = { (ū, −μ̄) : (x̄, ū) satisfy the KKT conditions for P(b, τ) and
                 μ̄ = max{ γ(Aᵀū | U), √( ūᵀABAᵀū / (2τ) ) } }.

50  Quadratic Support KKT Conditions
    φ(x) := sup_{u ∈ U} [ ⟨x, u⟩ − ½ uᵀBu ]
    v(b, τ) := min ρ(b − Ax)  s.t.  φ(x) ≤ τ
    KKT conditions:
      ū ∈ ∂ρ(b − Ax̄)  and  Aᵀū ∈ N(x̄ | lev_φ(τ)).
    KKT conditions, multiplier form: if U ∩ Nul(B) = {0}, then the KKT conditions can be written in multiplier form as
      ū ∈ ∂ρ(b − Ax̄),   Aᵀū = μ̄ w̄,   x̄ ∈ B w̄ + N(w̄ | U),   and   μ̄ (φ(x̄) − τ) = 0.

51  More specific examples of derivative computations
    v(b, τ) := min ½‖b − Ax‖₂²  s.t.  φ(x) ≤ τ
    Optimal solution x̄;  optimal residual r̄ = Ax̄ − b.
    1. Support functionals: φ(x) = δ*(x | U), 0 ∈ U  ⇒  v₂′(τ) = −δ*(Aᵀr̄ | U°) = −γ(Aᵀr̄ | U)
    2. Gauge functionals: φ(x) = γ(x | U), 0 ∈ U  ⇒  v₂′(τ) = −γ(Aᵀr̄ | U°) = −δ*(Aᵀr̄ | U)
    3. Norms: φ(x) = ‖x‖  ⇒  v₂′(τ) = −‖Aᵀr̄‖_* (the dual norm)
    4. Huber: φ(x) = sup_{u ∈ [−ε,ε]ⁿ} [ ⟨x, u⟩ − ½ uᵀu ]  ⇒  v₂′(τ) = −max{ ‖Aᵀr̄‖_∞/ε, ‖Aᵀr̄‖₂/√(2τ) }
    5. Vapnik: φ(x) = ⟨(x − ε1)₊ + (−x − ε1)₊, 1⟩  ⇒  v₂′(τ) = −( ‖Aᵀr̄‖_∞ + ε‖Aᵀr̄‖₁ )

52-53  Sparse and Robust Formulation
    HBP_σ:  min ‖x‖₁  s.t.  ρ(b − Ax) ≤ σ
    Signal recovery problem specification:
      x  20-sparse spike train in R^512
      b  measurements in R^120, with 5 outliers
      A  measurement matrix satisfying RIP
      ρ  Huber function
      σ  error level set at 0.01
    Results: in the presence of outliers, the robust (Huber) formulation recovers the spike train, while the standard least-squares formulation does not.
    [Figures: recovered signals (Truth, LS, Huber) and the corresponding residuals]
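A hypothetical end-to-end sketch of this robust formulation, obtained by flipping HBP_σ into a Huber-misfit LASSO and root finding on its value function. It reuses project_l1_ball and huber_penalty from the earlier sketches; the solver choices (fixed-step projected gradient, simple Newton updates) are illustrative rather than the setup used for the experiment on the slide.

```python
# Hypothetical sketch: solve min ||x||_1 s.t. huber(b - Ax) <= sigma by root finding on
# v2(tau) = min { sum huber(b - Ax) : ||x||_1 <= tau }.
import numpy as np

def huber_grad(r, K=1.0):
    """Gradient of the elementwise Huber penalty."""
    return np.clip(r, -K, K)

def eval_huber_value_function(A, b, tau, K=1.0, max_iter=500):
    """Inexactly solve the flipped problem min huber(b - Ax) s.t. ||x||_1 <= tau."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # Huber gradient is 1-Lipschitz
    for _ in range(max_iter):
        r = b - A @ x
        x = project_l1_ball(x + step * (A.T @ huber_grad(r, K)), tau)
    r = b - A @ x
    return huber_penalty(r, K).sum(), x, r

def robust_bp(A, b, sigma, K=1.0, max_newton=20):
    """Outer Newton-style loop: drive v2(tau) down to sigma."""
    tau, x = 0.0, None
    for _ in range(max_newton):
        v, x, r = eval_huber_value_function(A, b, tau, K)
        if v <= sigma * (1 + 1e-3):
            break
        # slope estimate built from the dual vector u = huber_grad(r),
        # in the spirit of the dual-norm formula on slide 51
        dv = -np.linalg.norm(A.T @ huber_grad(r, K), np.inf)
        if dv >= 0:
            break
        tau -= (v - sigma) / dv
    return x
```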
