Model Selection with Partly Smooth Functions
Samuel Vaiter, Gabriel Peyré and Jalal Fadili
vaiter@ceremade.dauphine.fr
August 27, 2014, iTWIST'14
Model Consistency of Partly Smooth Regularizers, arXiv:1405.1004, 2014
Linear Inverse Problems
Forward model: y = Φ x_0 + w
Forward operator: Φ : R^n → R^q linear, with q ≤ n, so the problem is ill-posed.
Examples: denoising, inpainting, deblurring.
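As a minimal sketch (the operator, dimensions, and noise level below are illustrative assumptions, not values from the talk), the forward model y = Φ x_0 + w can be simulated as:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 100, 40                        # ambient dimension n, measurements q <= n
Phi = rng.standard_normal((q, n))     # random linear forward operator
x0 = np.zeros(n)
x0[[10, 50, 90]] = [2.0, -1.0, 3.0]   # low-complexity (sparse) signal
w = 0.01 * rng.standard_normal(q)     # small additive noise
y = Phi @ x0 + w                      # observations
```

Since q < n, Φ has a non-trivial kernel: recovering x_0 from y requires a prior.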
Variational Regularization
Trade-off between prior regularization and data fidelity:
    x⋆ ∈ Argmin_{x ∈ R^n} J(x) + (1/(2λ)) ‖y − Φx‖²     (P_{y,λ})
As λ → 0⁺:
    x⋆ ∈ Argmin_{x ∈ R^n} J(x) subject to y = Φx     (P_{y,0})
Here J is a convex, bounded from below, finite-valued function, typically non-smooth.
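For J = ‖·‖₁, a standard way to solve the equivalent scaled form of (P_{y,λ}), min_x ½‖y − Φx‖² + λ‖x‖₁, is iterative soft-thresholding (ISTA). This is a generic illustrative sketch, not an algorithm from the talk:

```python
import numpy as np

def ista_l1(Phi, y, lam, n_iter=500):
    """Solve min_x 0.5*||y - Phi x||^2 + lam*||x||_1 by ISTA."""
    L = np.linalg.norm(Phi, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = Phi.T @ (Phi @ x - y)          # gradient of the data fidelity
        z = x - g / L                      # gradient step
        # soft-thresholding = proximal operator of (lam/L)*||.||_1
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x
```

The non-smoothness of the ℓ¹ term is what forces many entries of the iterates exactly to zero, i.e. onto a low-complexity model.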
Objective
[diagram: recover from the observations y = Φ x_0 + w a solution x⋆ close to x_0]
Low Complexity Models
Sparsity: J(x) = Σ_{i=1,…,n} |x_i|, with model M_x = { x' : supp(x') ⊆ supp(x) }.
Group sparsity: J(x) = Σ_{b ∈ B} ‖x_b‖, with M_x = { x' : supp(x') ⊆ supp(x) }.
Low rank: J(x) = Σ_{i=1,…,n} σ_i(x), with M_x = { x' : rank(x') = rank(x) }.
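The three priors above are easy to evaluate numerically; this small sketch (illustrative, not from the slides) makes the definitions concrete:

```python
import numpy as np

def l1_norm(x):
    """Sparsity prior: sum of absolute values."""
    return np.abs(x).sum()

def group_norm(x, groups):
    """Group sparsity prior: sum of per-block Euclidean norms."""
    return sum(np.linalg.norm(x[b]) for b in groups)

def nuclear_norm(X):
    """Low-rank prior: sum of singular values of a matrix."""
    return np.linalg.svd(X, compute_uv=False).sum()
```

Each norm is non-smooth exactly on its model set: the ℓ¹ norm at sparse vectors, the group norm at group-sparse vectors, the nuclear norm at rank-deficient matrices.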
Partly Smooth Functions [Lewis 2002]
J is partly smooth at x relative to a C² manifold M if:
- Smoothness: J restricted to M is C² around x.
- Sharpness: for every h ∈ (T_x M)^⊥, t ↦ J(x + th) is non-smooth at t = 0.
- Continuity: the subdifferential ∂J, restricted to M, is continuous around x.
Calculus: if J and G are partly smooth, so are J + G, J ∘ D (D a linear operator), and the spectral lift J ∘ σ.
Examples: ‖·‖_1, ‖·‖_∞, ‖·‖_{1,2}, and x ↦ max_i (⟨d_i, x⟩)_+ are partly smooth.
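As an illustrative sketch (not from the slides), the three conditions can be checked by hand for J = ‖·‖₁ relative to the fixed-support manifold:

```latex
% J(x) = \|x\|_1, \quad M = \{ x' : \mathrm{supp}(x') = \mathrm{supp}(x) \},
% \quad I = \mathrm{supp}(x).
% Smoothness: on M, J is linear in the active entries, hence C^2:
J|_M(x') = \sum_{i \in I} \operatorname{sign}(x_i)\, x'_i .
% Sharpness: T_x M = \{ h : \mathrm{supp}(h) \subseteq I \}, and for
% h \in (T_x M)^\perp,\ h \neq 0, and |t| small,
t \mapsto \|x + t h\|_1 = \|x\|_1 + |t| \sum_{i \notin I} |h_i|
\quad \text{is non-smooth at } t = 0 .
% Continuity: \partial J(x) =
% \{ \eta : \eta_I = \operatorname{sign}(x_I),\ \|\eta_{I^c}\|_\infty \le 1 \}
% varies continuously as x moves along M.
```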
Dual Certificates
    x⋆ ∈ Argmin_{x ∈ R^n} J(x) subject to y = Φx     (P_{y,0})
Source condition: there exists p such that Φ* p ∈ ∂J(x_0).
Non-degenerate source condition: Φ* p ∈ ri ∂J(x_0).
Proposition: there exists a dual certificate p if, and only if, x_0 is a solution of (P_{y,0}).
Linearized Precertificate
Minimal norm certificate: p_0 = argmin { ‖p‖ : Φ* p ∈ ∂J(x_0) }.
Linearized precertificate: p_F = argmin { ‖p‖ : Φ* p ∈ aff ∂J(x_0) }.
Proposition: assume Ker Φ ∩ T_{x_0}M = {0}. Then Φ* p_F ∈ ri ∂J(x_0) ⟺ p_F = p_0.
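For J = ‖·‖₁, aff ∂J(x_0) fixes Φ_I* p = sign(x_{0,I}) on the support I, so p_F is a least-norm solution of a linear system. A sketch under that ℓ¹ specialization (illustrative, not code from the talk):

```python
import numpy as np

def precertificate_l1(Phi, x0):
    """Linearized precertificate for J = l1:
    least-norm p with Phi[:, I].T @ p = sign(x0[I]), I = supp(x0)."""
    I = np.flatnonzero(x0)
    s = np.sign(x0[I])
    p = np.linalg.pinv(Phi[:, I].T) @ s   # pinv gives the minimal-norm solution
    return p, I

def is_nondegenerate(Phi, x0, tol=1e-10):
    """Check Phi* p_F in ri of the l1 subdifferential:
    strict inequality |<phi_j, p_F>| < 1 off the support."""
    p, I = precertificate_l1(Phi, x0)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    return np.max(np.abs(Phi[:, Ic].T @ p)) < 1 - tol
```

When the check passes, p_F = p_0 and it certifies stable model selection.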
Manifold Selection Theorem
Assume J is partly smooth at x_0 relative to M, and that
    Φ* p_F ∈ ri ∂J(x_0)  and  Ker Φ ∩ T_{x_0}M = {0}.
Then there exists C > 0 such that if max(λ, ‖w‖/λ) ≤ C, the unique solution x⋆ of (P_{y,λ}) satisfies x⋆ ∈ M and ‖x⋆ − x_0‖ = O(‖w‖).
The analysis is almost sharp: if Φ* p_F ∉ ∂J(x_0), then x⋆ ∉ M_{x_0}.
Special cases: [Fuchs 2004] for ℓ¹; [Bach 2008] for ℓ¹-ℓ² and the nuclear norm.
Sparse Spike Deconvolution
Forward model: Φx = Σ_i x_i φ(· − i), a convolution with a kernel φ of width γ; J(x) = ‖x‖₁.
With I = supp(x_0), the non-degeneracy condition η_F = Φ* p_F ∈ ri ∂‖x_0‖₁ reads
    ‖Φ_{I^c}* Φ_I^{+,*} sign(x_{0,I})‖_∞ < 1  ⟹  stable recovery.
[figure: ‖η_{0,I^c}‖_∞ as a function of the kernel width γ, crossing 1 at a critical value γ_crit]
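The non-degeneracy quantity above can be evaluated numerically. This sketch uses a Gaussian kernel on a discrete grid; the kernel choice, grid size, and spike layout are assumptions for illustration:

```python
import numpy as np

def spike_certificate_sup(positions, signs, n, gamma):
    """Return || Phi_{I^c}^* Phi_I^{+,*} sign(x0_I) ||_inf for
    deconvolution with a Gaussian kernel of width gamma on a grid of size n."""
    grid = np.arange(n)
    # Column i of Phi is the kernel centered at grid point i
    Phi = np.exp(-((grid[:, None] - grid[None, :]) ** 2) / (2 * gamma ** 2))
    I = np.array(positions)
    # Least-norm p with Phi_I^T p = signs, then correlate with all atoms
    eta = Phi.T @ (np.linalg.pinv(Phi[:, I].T) @ np.array(signs, float))
    Ic = np.setdiff1d(grid, I)
    return np.max(np.abs(eta[Ic]))
```

A value strictly below 1 certifies stable support recovery for that spike configuration; sweeping gamma locates the critical width γ_crit where the bound is first violated.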
1D Total Variation and Jump Set
J(x) = ‖D x‖₁ with D the discrete derivative; M_x = { x' : supp(D x') ⊆ supp(D x) }; Φ = Id.
The precertificate takes the form Φ* p_F = div u for a dual vector u equal to ±1 on the jump set of x_0; a jump is stable when |u| < 1 away from the jump set.
[figure: staircase signal x_i with the dual vector u_k in [−1, 1]; stable vs unstable jumps]
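The dual vector u can be computed by a least-squares problem: fix u = sign(D x_0) on the jump set and minimize ‖D* u‖ over the remaining entries. An illustrative sketch of this specialization:

```python
import numpy as np

def tv_precertificate(x0):
    """1D TV (J = ||D x||_1, Phi = Id): dual vector u with u = sign(D x0)
    on the jump set I; jumps are stable when |u| < 1 off I."""
    n = len(x0)
    D = np.diff(np.eye(n), axis=0)            # (n-1) x n forward differences
    d = D @ x0
    I = np.flatnonzero(np.abs(d) > 1e-12)     # jump set of x0
    Ic = np.setdiff1d(np.arange(n - 1), I)
    u = np.zeros(n - 1)
    u[I] = np.sign(d[I])
    # minimize ||D^T u||^2 over the free entries u_{Ic}
    A = D.T[:, Ic]
    b = -D.T[:, I] @ u[I]
    u[Ic] = np.linalg.lstsq(A, b, rcond=None)[0]
    return u, I, Ic
```

For a single step signal, u ramps linearly from 0 at the boundaries to ±1 at the jump, so it stays strictly inside (−1, 1) off the jump set: the jump is stable.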
Take-away Message
Partial smoothness: encodes models using singularities.
Future Work
Extended-valued functions: minimization under constraints
    min_{x ∈ R^n} ½ ‖y − Φx‖² + λ J(x) subject to x ≥ 0
Non-convexity: fidelity and regularization, dictionary learning
    min_{x_k ∈ R^n, D ∈ 𝒟} Σ_k ½ ‖y − Φ D x_k‖² + λ J(x_k)
Infinite-dimensional problems: partial smoothness for BV, Besov
    min_{f ∈ BV(Ω) ∩ L²(Ω)} ½ ‖g − Ψ f‖²_{L²(Ω)} + λ |Df|(Ω)
Compressed sensing: optimal bounds for partly smooth regularizers
Thanks for your attention