Optimal Value Function Methods in Numerical Optimization Level Set Methods

1 Optimal Value Function Methods in Numerical Optimization Level Set Methods James V Burke Mathematics, University of Washington, (jvburke@uw.edu) Joint work with Aravkin (UW), Drusvyatskiy (UW), Friedlander (UBC/Davis), Roy (UW) The Hong Kong Polytechnic University Applied Mathematics Colloquium February 4, 2016

2 Motivation: Optimization in Large-Scale Inference
A range of large-scale data science applications can be modeled using optimization:
- Inverse problems (medical and seismic imaging)
- High-dimensional inference (compressive sensing, LASSO, quantile regression)
- Machine learning (classification, matrix completion, robust PCA, time series)
These applications are often solved using side information:
- Sparsity or low rank of the solution
- Constraints (topography, non-negativity)
- Regularization (priors, total variation, dirty data)
We need efficient large-scale solvers for nonsmooth programs.

6 The Prototypical Problem
Sparse Data Fitting: Find sparse x with Ax ≈ b.
Example: Model Selection. Consider y = aᵀx, where y ∈ R is an observation and a ∈ Rⁿ is a vector of covariates. Suppose y is a disease classifier and a is micro-array data (n ≈ 10⁴). Given data {(yᵢ, aᵢ)}ᵢ₌₁ᵐ, find x so that yᵢ ≈ aᵢᵀx.
Since m << n, one can always find x such that yᵢ = aᵢᵀx, i = 1, ..., m. But such an x gives little insight into the role of the covariates a in determining the observations y. We prefer the most parsimonious subset of covariates that can explain the observations, that is, the sparsest model among the 2ⁿ possible models. Such models are used to further our knowledge of disease mechanisms and to develop efficient disease assays.

7 The Prototypical Problem
Sparse Data Fitting: Find sparse x with Ax ≈ b.
There are numerous other applications:
- system identification
- image segmentation
- compressed sensing
- grouped sparsity for remote sensor location
- ...

13 The Prototypical Problem
Sparse Data Fitting: Find sparse x with Ax ≈ b.
Convex approaches: use ‖x‖₁ as a sparsity surrogate (Candès-Tao-Donoho).
BPDN:                  min_x ‖x‖₁          s.t. ½‖Ax − b‖₂² ≤ σ
LASSO:                 min_x ½‖Ax − b‖₂²   s.t. ‖x‖₁ ≤ τ
Lagrangian (Penalty):  min_x ½‖Ax − b‖₂² + λ‖x‖₁
BPDN: often the most natural and transparent formulation (physical considerations guide σ).
Lagrangian: ubiquitous in practice (no constraints).
All three are (essentially) equivalent computationally!
Basis for SPGL1 (van den Berg-Friedlander 08).

16 Optimal Value or Level Set Framework
Problem class: Solve
P(σ):   min_{x ∈ X} φ(x)   s.t.   ρ(Ax − b) ≤ σ
Strategy: Consider the flipped problem
Q(τ):   v(τ) := min_{x ∈ X} ρ(Ax − b)   s.t.   φ(x) ≤ τ
Then opt-val(P(σ)) is the minimal root of the equation v(τ) = σ.
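A concrete instance, matching the BPDN/LASSO pair from the earlier slides (X = Rⁿ, φ = ‖·‖₁, ρ = ½‖·‖₂²):
P(σ):   min_x ‖x‖₁   s.t.   ½‖Ax − b‖₂² ≤ σ        Q(τ):   v(τ) = min_x ½‖Ax − b‖₂²   s.t.   ‖x‖₁ ≤ τ
Solving P(σ) then amounts to finding the smallest τ with v(τ) = σ; the root-finding schemes for doing this are developed on the next slides.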

18 Queen Dido's Problem
The intuition behind the proposed framework has a distinguished history, appearing even in antiquity. Perhaps the earliest instance is Queen Dido's problem and the fabled origins of Carthage. In short, the problem is to find the maximum area that can be enclosed by an arc of fixed length and a given line. The converse problem is to find an arc of least length that traps a fixed area between a line and the arc. Although these two problems reverse the objective and the constraint, the solution in each case is a semi-circle.
Other historical examples abound. More recently, these observations provide the basis for Markowitz mean-variance portfolio theory.

21 The Role of Convexity
Convex Sets: Let C ⊆ Rⁿ. We say that C is convex if (1 − λ)x + λy ∈ C whenever x, y ∈ C and 0 ≤ λ ≤ 1.
Convex Functions: Let f : Rⁿ → R̄ := R ∪ {+∞}. We say that f is convex if the set epi(f) := {(x, µ) : f(x) ≤ µ} is a convex set.
[Figure: the graph of a convex function lies below its chords; equivalently, f((1 − λ)x₁ + λx₂) ≤ (1 − λ)f(x₁) + λf(x₂).]

24 Convex Functions
Convex indicator functions: Let C ⊆ Rⁿ be convex. Then the function
δ_C(x) := 0 if x ∈ C,  +∞ if x ∉ C,
is a convex function.
Addition: Non-negative linear combinations of convex functions are convex: if fᵢ is convex and λᵢ ≥ 0 for i = 1, ..., k, then so is f(x) := Σᵢ₌₁ᵏ λᵢ fᵢ(x).
Infimal Projection: If f : Rⁿ × Rᵐ → R̄ is convex, then so is v(x) := inf_y f(x, y), since epi(v) = {(x, µ) : ∃ y s.t. f(x, y) ≤ µ}.

25 Convexity of v
When X, ρ, and φ are convex, the optimal value function v is a non-increasing convex function, by infimal projection:
v(τ) := min_{x ∈ X} ρ(Ax − b)  s.t.  φ(x) ≤ τ
      = min_x [ ρ(Ax − b) + δ_epi(φ)(x, τ) + δ_X(x) ].

29 Newton and Secant Methods
For f convex and non-increasing, solve f(τ) = 0.
Problem: f is often not differentiable.
Solution: use the convex subdifferential
∂f(x) := { z : f(y) ≥ f(x) + zᵀ(y − x) for all y ∈ Rⁿ }.

32 Superlinear Convergence
Let τ* := inf{τ : f(τ) ≤ 0} and g* := inf{g : g ∈ ∂f(τ*)} < 0 (non-degeneracy).
Initialization: τ₋₁ < τ₀ < τ*.
Newton:  τ_{k+1} := τ_k if f(τ_k) = 0;  otherwise τ_{k+1} := τ_k − f(τ_k)/g_k for some g_k ∈ ∂f(τ_k).
Secant:  τ_{k+1} := τ_k if f(τ_k) = 0;  otherwise τ_{k+1} := τ_k − [(τ_k − τ_{k−1}) / (f(τ_k) − f(τ_{k−1}))] f(τ_k).
If either sequence terminates finitely at some τ_k, then τ_k = τ*; otherwise
|τ* − τ_{k+1}| ≤ (1 − g*/γ_k) |τ* − τ_k|,   k = 1, 2, ...,
where γ_k = g_k (Newton) and γ_k ∈ ∂f(τ_{k−1}) (secant). In either case, γ_k → g* and τ_k → τ* globally q-superlinearly.
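A minimal numerical sketch of the two recursions (Python; the toy function f(τ) = exp(−τ) − 1/2 is convex, decreasing, and differentiable, so the subgradient is just the derivative; the function and starting points are illustrative assumptions, not part of the slides):

    import math

    def f(t):            # convex, non-increasing toy function with root at ln 2
        return math.exp(-t) - 0.5

    def df(t):           # its derivative; for smooth f the subdifferential is {f'(t)}
        return -math.exp(-t)

    def newton(t0, tol=1e-12):
        t = t0
        while abs(f(t)) > tol:
            t = t - f(t) / df(t)          # tau_{k+1} = tau_k - f(tau_k)/g_k
        return t

    def secant(tm1, t0, tol=1e-12):
        a, b = tm1, t0                     # tau_{k-1}, tau_k
        while abs(f(b)) > tol:
            a, b = b, b - (b - a) / (f(b) - f(a)) * f(b)
        return b

    print(newton(-1.0), secant(-2.0, -1.0), math.log(2))

Starting to the left of the root (here τ₋₁ = −2 < τ₀ = −1 < τ* = ln 2), both iterations increase monotonically toward τ*, mirroring the convergence statement above.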

36 Inexact Root Finding
Problem: Find a root of the inexactly known convex function v(·) − σ.
Bisection is one approach, but:
- nonmonotone iterates (bad for warm starts)
- at best linear convergence (even with perfect information)
Solution: modified secant and approximate Newton methods.
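For contrast, a bisection-style sketch with an inexact oracle (Python; the routine solve_Q is a hypothetical stand-in for an approximate solve of Q(τ), assumed to return a lower and an upper bound on v(τ); this plain bracketing loop illustrates the inexact setting, it is not the modified secant/Newton schemes of the talk):

    def bisect_level_set(solve_Q, sigma, tau_lo, tau_hi, tol=1e-6):
        """Bracket the smallest tau with v(tau) <= sigma, for v convex and non-increasing.

        solve_Q(tau) must return (lower, upper) with lower <= v(tau) <= upper.
        tau_lo, tau_hi must satisfy v(tau_lo) > sigma >= v(tau_hi).
        """
        while tau_hi - tau_lo > tol:
            tau = 0.5 * (tau_lo + tau_hi)
            lower, upper = solve_Q(tau)
            if lower > sigma:        # certainly v(tau) > sigma: the root lies to the right
                tau_lo = tau
            elif upper <= sigma:     # certainly v(tau) <= sigma: the root lies to the left (or here)
                tau_hi = tau
            else:                    # bounds straddle sigma: the oracle is not accurate enough here
                break
        return tau_hi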

37-42 Inexact Root Finding: Secant (figure-only slides)

43-48 Inexact Root Finding: Newton (figure-only slides)

49-52 Inexact Root Finding: Convergence (figure-only slides)
Key observation: C = C(τ₀) is independent of v'(τ*).

53-55 Minorants from Duality (figure-only slides)

56 Robustness: 1 ≤ u/ℓ ≤ α, where α ∈ [1, 2)
(a) k = 13, α = 1.3   (b) k = 770, α = 1.99   (c) k = 18, α =   (d) k = 9, α = 1.3   (e) k = 15, α = 1.99   (f) k = 10, α = 1.3
Figure: Inexact secant (top) and Newton (bottom) for f₁(τ) = (τ − 1)² − 10 (first two columns) and f₂(τ) = τ² (last column). Below each panel, α is the oracle accuracy and k is the number of iterations needed to converge, i.e., to reach fᵢ(τ_k) ≤ ε.

57 Sensor Network Localization (SNL)
Given a weighted graph G = (V, E, d), find a realization: points p₁, ..., p_n ∈ R² with d_ij = ‖p_i − p_j‖₂ for all ij ∈ E.

62 Sensor Network Localization (SNL)
SDP relaxation (Weinberger et al. 04, Biswas et al. 06):
max tr(X)   s.t.   ‖P_E K(X) − d²‖₂² ≤ σ,   Xe = 0,   X ⪰ 0,
where [K(X)]_{ij} = X_ii + X_jj − 2X_ij and d² denotes the vector of squared observed distances.
Intuition: X = PPᵀ, with p_i the ith row of P; then tr(X) = (1/(n+1)) Σ_{i,j=1}^n ‖p_i − p_j‖², so maximizing the trace spreads the points apart.
Flipped problem:
min ‖P_E K(X) − d²‖₂²   s.t.   tr(X) = τ,   Xe = 0,   X ⪰ 0.
Perfectly adapted to the Frank-Wolfe method.
Key point: the failure of Slater's condition (which always occurs here) is irrelevant.
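A rough Frank-Wolfe sketch for the flipped problem (Python/NumPy; the step-size rule, iteration count, and data handling are illustrative assumptions; the linear minimization step uses the standard fact that minimizing ⟨G, S⟩ over {S ⪰ 0, tr S = τ, Se = 0} is attained at τ·vvᵀ, with v a unit eigenvector for the smallest eigenvalue of the gradient restricted to the subspace orthogonal to e):

    import numpy as np

    def frank_wolfe_snl(edges, d2, n, tau, iters=200):
        """Sketch: min 0.5 * sum_{ij in E} (K(X)_ij - d_ij^2)^2  s.t. tr X = tau, Xe = 0, X >= 0.

        edges: list of index pairs (i, j); d2: squared distances for those edges.
        """
        C = np.eye(n) - np.ones((n, n)) / n               # projector onto the subspace orthogonal to e
        Q = np.linalg.svd(C)[0][:, : n - 1]               # orthonormal basis of that subspace
        X = tau * C / (n - 1)                             # feasible start: tr X = tau, Xe = 0, X >= 0
        for k in range(iters):
            G = np.zeros((n, n))                          # gradient of the least-squares objective
            for (i, j), dij2 in zip(edges, d2):
                r = X[i, i] + X[j, j] - 2 * X[i, j] - dij2
                G[i, i] += r; G[j, j] += r
                G[i, j] -= r; G[j, i] -= r
            # Linear minimization over {S >= 0, tr S = tau, Se = 0}:
            w, V = np.linalg.eigh(Q.T @ G @ Q)            # eigenvalues ascending
            v = Q @ V[:, 0]                               # eigenvector of the smallest eigenvalue
            S = tau * np.outer(v, v)
            X = X + (2.0 / (k + 2.0)) * (S - X)           # standard Frank-Wolfe step size
        return X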

63-64 Approximate Newton (figure-only slides: Pareto curve, σ = 0.25 and σ = 0)

65-66 Max-trace (figure-only slides; panels: Newton_max, Newton_min, Refine positions)

67 Observations
- Simple strategy for optimizing over complex domains
- Rigorous convergence guarantees
- Insensitivity to ill-conditioning
- Many applications:
  - Sensor Network Localization (Drusvyatskiy-Krislock-Voronin-Wolkowicz 15)
  - Sparse/Robust Estimation and Kalman Smoothing (Aravkin-B-Pillonetto 13)
  - Large-scale SDP and LP (cf. Renegar 14)
  - Chromosome reconstruction (Aravkin-Becker-Drusvyatskiy-Lozano 15)
  - Phase retrieval (Aravkin-B-Drusvyatskiy-Friedlander-Roy 16)
  - Generalized linear models (Aravkin-B-Drusvyatskiy-Friedlander-Roy 16)
  - ...

70 Conjugate Functions and Duality
Convex Indicator: For any convex set C, the convex indicator function for C is
δ(x | C) := 0 if x ∈ C,  +∞ if x ∉ C.
Support Functionals: For any set C, the support functional for C is
δ*(x | C) := sup_{z ∈ C} ⟨x, z⟩.
Convex Conjugates: For any convex function g(x), the convex conjugate is given by
g*(y) := δ*((y, −1) | epi(g)) = sup_x [⟨x, y⟩ − g(x)].
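A standard example to make the definitions concrete (a routine computation, not from the slides): for the ℓ₁ norm g(x) = ‖x‖₁, the conjugate is g*(y) = sup_x [⟨x, y⟩ − ‖x‖₁] = δ(y | B_∞), the indicator of the unit ℓ_∞ ball, since the supremum equals 0 when ‖y‖_∞ ≤ 1 and +∞ otherwise. Dually, ‖x‖₁ = δ*(x | B_∞) is the support functional of that ball.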

75 Conjugates and the Subdifferential
g*(y) = sup_x [⟨x, y⟩ − g(x)].
The Bi-Conjugate Theorem: If epi(g) is closed and dom(g) ≠ ∅, then (g*)* = g.
The Young-Fenchel Inequality: g(x) + g*(z) ≥ ⟨z, x⟩ for all x, z ∈ Rⁿ, with equality if and only if z ∈ ∂g(x) and x ∈ ∂g*(z). In particular, ∂g(x) = argmax_z [⟨z, x⟩ − g*(z)].
Maximal Monotone Operator: If epi(g) is closed and dom(g) ≠ ∅, then ∂g is a maximal monotone operator with (∂g)⁻¹ = ∂g*.
Note: The lsc hull of g is cl g := (g*)*.

77 The Perspective Function
epi(g^π) := cl cone(epi(g)) = cl( ∪_{λ>0} λ epi(g) )
g^π(z, λ) := λ g(λ⁻¹ z) if λ > 0;  g^∞(z) if λ = 0;  +∞ if λ < 0,
where g^∞ is the horizon function of g:
g^∞(z) := sup_{x ∈ dom g} [g(x + z) − g(x)].
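A quick example (a standard computation, not on the slide): for g(x) = ½x² one gets g^π(z, λ) = z²/(2λ) for λ > 0, while the horizon function is g^∞(z) = sup_x [½(x + z)² − ½x²] = sup_x [xz + ½z²], which equals 0 if z = 0 and +∞ otherwise; hence g^π(·, 0) = δ(· | {0}).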

78 Support Functions for epi(g) and lev_g(τ)
Let g : Rⁿ → R̄ be closed, proper, and convex. Then
δ*((y, µ) | epi(g)) = (g*)^π(y, −µ)
and
δ*(y | [g ≤ τ]) = cl inf_{µ ≥ 0} [τµ + (g*)^π(y, µ)],
where epi(g) := {(x, µ) : g(x) ≤ µ},  [g ≤ τ] := {x : g(x) ≤ τ},  and  δ*(z | C) := sup_{w ∈ C} ⟨z, w⟩.
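For example (a direct check, consistent with the derivative formulas later in the talk, and not itself on the slide): with g = ‖·‖₁ one has [g ≤ τ] = {x : ‖x‖₁ ≤ τ}, and δ*(y | [g ≤ τ]) = τ‖y‖_∞, the support functional of the ℓ₁ ball of radius τ.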

82 Perturbation Framework and Duality (Rockafellar (1970))
The perturbation function:
f(x, b, τ) := ρ(b − Ax) + δ((x, τ) | epi(φ)).
Its conjugate:
f*(y, u, µ) = (φ*)^π(y + Aᵀu, µ) + ρ*(u).
The Primal Problem P(b, τ): infimal projection in x,
v(b, τ) := min_x f(x, b, τ).
The Dual Problem D(b, τ):
v̂(b, τ) := sup_{u, µ} [⟨b, u⟩ + τµ − f*(0, u, µ)]
(reduced dual)  = sup_u [⟨b, u⟩ − ρ*(u) − δ*(Aᵀu | [φ ≤ τ])].
The Subdifferential: If (b, τ) ∈ int(dom v), then v(b, τ) = v̂(b, τ) and ∂v(b, τ) = argmax_{u, µ} D(b, τ).

83 Piecewise Linear-Quadratic (PLQ) Penalties
φ(x) := sup_{u ∈ U} [⟨x, u⟩ − ½ uᵀBu],
where U ⊆ Rⁿ is nonempty, closed, and convex with 0 ∈ U (not necessarily polyhedral), and B ∈ Rⁿˣⁿ is symmetric positive semi-definite.
Examples:
1. Support functionals: B = 0
2. Gauge functionals: γ(· | U) = δ*(· | U°)
3. Norms: ‖·‖ = γ(· | B), where B is the closed unit ball of ‖·‖
4. Least-squares: U = Rⁿ, B = I
5. Huber: U = [−ε, ε]ⁿ, B = I
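To see how item 5 produces the Huber function shown on the next slide (a short computation under the stated choices U = [−ε, ε]ⁿ and B = I; the threshold ε plays the role of K there), work coordinate-wise:
sup_{|u| ≤ ε} [xu − ½u²] = ½x² if |x| ≤ ε,  and  ε|x| − ½ε² if |x| > ε,
since the unconstrained maximizer u = x is clipped to ±ε when |x| > ε.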

84 PLQ Densities: Gauss, Laplace, Huber, Vapnik
Gauss:   V(x) = ½x²
Laplace (ℓ₁):   V(x) = |x|
Huber:   V(x) = −Kx − ½K² for x < −K;   V(x) = ½x² for −K ≤ x ≤ K;   V(x) = Kx − ½K² for K < x
Vapnik:   V(x) = −x − ε for x < −ε;   V(x) = 0 for −ε ≤ x ≤ ε;   V(x) = x − ε for ε ≤ x
[Figures: the four penalties plotted against x.]

85 Computing ∂v for PLQ Penalties φ
φ(x) := sup_{u ∈ U} [⟨x, u⟩ − ½ uᵀBu]
P(b, τ):   v(b, τ) := min_x ρ(b − Ax)  s.t.  φ(x) ≤ τ
Then ∂v(b, τ) = {(ū, −µ̄)}, where ū ∈ ∂ρ(b − Ax̄) for a solution x̄ satisfying 0 ∈ −Aᵀ∂ρ(b − Ax̄) + µ̄ ∂φ(x̄), and
µ̄ = max{ γ(Aᵀū | U), √(ūᵀABAᵀū / (2τ)) }.

86 A Few Special Cases
v(τ) := min_x ½‖b − Ax‖₂²  s.t.  φ(x) ≤ τ
Optimal solution: x̄;  optimal residual: r̄ := Ax̄ − b.
1. Support functionals: φ(x) = δ*(x | U), 0 ∈ U  ⟹  v'(τ) = −δ*(Aᵀr̄ | U°) = −γ(Aᵀr̄ | U)
2. Gauge functionals: φ(x) = γ(x | U), 0 ∈ U  ⟹  v'(τ) = −γ(Aᵀr̄ | U°) = −δ*(Aᵀr̄ | U)
3. Norms: φ(x) = ‖x‖  ⟹  v'(τ) = −‖Aᵀr̄‖_* (the dual norm)
4. Huber: φ(x) = sup_{u ∈ [−ε,ε]ⁿ} [⟨x, u⟩ − ½uᵀu]  ⟹  v'(τ) = −max{ ε⁻¹‖Aᵀr̄‖_∞, ‖Aᵀr̄‖₂ / √(2τ) }
5. Vapnik: φ(x) = ⟨(x − ε)₊ + (−x − ε)₊, 1⟩  ⟹  v'(τ) = −(‖Aᵀr̄‖ + ε‖Aᵀr̄‖₂)
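Case 3 with φ = ‖·‖₁ (whose dual norm is ℓ_∞) is exactly what SPGL1-style solvers exploit: each Newton step on v(τ) = σ needs only the LASSO value v(τ) and the residual r̄, since v'(τ) = −‖Aᵀr̄‖_∞. Below is a small self-contained sketch in Python/NumPy; the inner projected-gradient LASSO solver, the ℓ₁-ball projection, the iteration counts, and the stopping tolerances are all illustrative choices, not the algorithm of the cited references:

    import numpy as np

    def project_l1(z, tau):
        """Euclidean projection of z onto the l1 ball of radius tau (sort-based)."""
        if tau <= 0:
            return np.zeros_like(z)
        if np.abs(z).sum() <= tau:
            return z.copy()
        u = np.sort(np.abs(z))[::-1]
        css = np.cumsum(u)
        k = np.nonzero(u * np.arange(1, len(z) + 1) > css - tau)[0][-1]
        theta = (css[k] - tau) / (k + 1.0)
        return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

    def lasso_value(A, b, tau, iters=500):
        """Projected gradient for v(tau) = min 0.5*||Ax - b||^2  s.t.  ||x||_1 <= tau."""
        L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            x = project_l1(x - (A.T @ (A @ x - b)) / L, tau)
        r = b - A @ x
        return 0.5 * r @ r, r, x

    def bpdn_level_set(A, b, sigma, newton_iters=20):
        """Approximate BPDN  min ||x||_1 s.t. 0.5*||Ax - b||^2 <= sigma  via Newton on v(tau) = sigma."""
        tau = 0.0
        for _ in range(newton_iters):
            val, r, x = lasso_value(A, b, tau)
            if val <= sigma:
                break
            slope = -np.linalg.norm(A.T @ r, np.inf)   # v'(tau) from case 3 (assumes A^T r != 0)
            tau = tau - (val - sigma) / slope           # Newton step on v(tau) - sigma = 0
        return x, tau

    # tiny usage example with synthetic data
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    x_true = np.zeros(100); x_true[:5] = rng.standard_normal(5)
    b = A @ x_true + 0.01 * rng.standard_normal(40)
    x_hat, tau_hat = bpdn_level_set(A, b, sigma=0.5 * 0.01**2 * 40)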

88 Basis Pursuit with Outliers
BP_σ:   min ‖x‖₁  s.t.  ρ(b − Ax) ≤ σ
Standard least-squares: ρ(z) = ‖z‖₂ or ρ(z) = ‖z‖₂².
Quantile Huber:
ρ_{κ,τ}(r) = τ|r| − κτ²/2             if r < −τκ,
ρ_{κ,τ}(r) = r²/(2κ)                  if r ∈ [−κτ, (1 − τ)κ],
ρ_{κ,τ}(r) = (1 − τ)|r| − κ(1 − τ)²/2  if r > (1 − τ)κ.
Standard Huber when τ = 0.5.
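A direct transcription of the quantile Huber penalty as a NumPy function (an illustrative helper; the parameter names follow the formula above, and the three branches agree at the breakpoints):

    import numpy as np

    def quantile_huber(r, kappa, tau):
        """Quantile Huber penalty rho_{kappa,tau}, elementwise; standard Huber at tau = 0.5."""
        r = np.asarray(r, dtype=float)
        left  = tau * np.abs(r) - kappa * tau**2 / 2.0               # r < -tau*kappa
        mid   = r**2 / (2.0 * kappa)                                 # -kappa*tau <= r <= (1-tau)*kappa
        right = (1 - tau) * np.abs(r) - kappa * (1 - tau)**2 / 2.0   # r > (1-tau)*kappa
        return np.where(r < -tau * kappa, left, np.where(r > (1 - tau) * kappa, right, mid))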

89 Huber (figure-only slide)

90 Sparse and Robust Formulation
HBP_σ:   min ‖x‖₁  s.t.  ρ(b − Ax) ≤ σ, with ρ the Huber function.
Signal recovery problem specification:
- x: 20-sparse spike train in R⁵¹²
- b: measurements in R¹²⁰
- A: measurement matrix satisfying RIP
- ρ: Huber function
- σ: error level set at .01
- 5 outliers
Results: In the presence of outliers, the robust (Huber) formulation recovers the spike train, while the standard least-squares formulation does not.
[Figures: recovered signals (Truth, Huber, LS) and residuals.]

92 References
- Probing the Pareto frontier for basis pursuit solutions. van den Berg, Friedlander. SIAM J. Sci. Comput. 31 (2008).
- Sparse optimization with least-squares constraints. van den Berg, Friedlander. SIAM J. Optim. 21 (2011).
- Variational properties of value functions. Aravkin, Burke, Friedlander. SIAM J. Optim. 23 (2013).
- Level-set methods for convex optimization. Aravkin, Burke, Drusvyatskiy, Friedlander, Roy. Preprint, 2016.
