Fonctions Perspectives et Statistique en Grande Dimension (Perspective Functions and High-Dimensional Statistics)


1 Fonctions Perspectives et Statistique en Grande Dimension
Patrick L. Combettes
Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA
Based on joint work with C. L. Müller, Flatiron Institute, New York
Journées MAS-MODE de la SMAI, Institut Henri Poincaré, Paris, 9 January 2017
Patrick L. Combettes, Fonctions Perspectives et Statistique, 1/31


5 Some optimization problems in statistics
Standard finite-dimensional linear model: observation z = Xb + σe = (ζ_i)_{1≤i≤n} ∈ R^n, unknown b = (β_j)_{1≤j≤p} ∈ R^p.
Belloni et al.'s square-root lasso (2011): minimize over b ∈ R^p: ‖Xb − z‖₂ + α‖b‖₁
Sun and Zhang's scaled lasso (2012): minimize over b ∈ R^p, σ > 0: ‖Xb − z‖₂²/(2nσ) + σ/2 + α‖b‖₁
Lederer and Müller's TREX estimator (2015): minimize over b ∈ R^p: ‖Xb − z‖₂²/‖Xᵀ(Xb − z)‖_∞ + α‖b‖₁
Owen's penalized concomitant M-estimators (2007): minimize over b, σ, τ: nσ + σ Σ_{i=1}^n Huber((ζ_i − ⟨b | x_i⟩)/σ) + pτ + τ Σ_{j=1}^p Berhu(β_j/τ)
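A quick numerical sketch (not part of the slides; the data values are hypothetical) of how the scaled lasso relates to the square-root lasso: minimizing the scaled-lasso objective over σ in closed form gives σ̂ = ‖Xb − z‖₂/√n, and the partially minimized data term is exactly ‖Xb − z‖₂/√n, the square-root-lasso data term up to rescaling.

```python
import math

def scaled_lasso_obj(sigma, r_norm_sq, n):
    # (1/(2n)) * ||Xb - z||^2 / sigma + sigma / 2
    # (the l1 term in b is omitted: it does not depend on sigma)
    return r_norm_sq / (2 * n * sigma) + sigma / 2

n, r_norm_sq = 50, 7.3                   # hypothetical residual norm ||Xb - z||^2
sigma_star = math.sqrt(r_norm_sq / n)    # claimed closed-form minimizer

# a grid search confirms the closed-form minimizer
grid = [0.01 * k for k in range(1, 2000)]
sigma_grid = min(grid, key=lambda s: scaled_lasso_obj(s, r_norm_sq, n))
assert abs(sigma_grid - sigma_star) < 0.01

# the partially minimized objective equals ||Xb - z||_2 / sqrt(n),
# i.e., the square-root-lasso data term
assert abs(scaled_lasso_obj(sigma_star, r_norm_sq, n)
           - math.sqrt(r_norm_sq / n)) < 1e-12
```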


12 Some optimization problems in statistics
Problems involving the Fisher information of a multidimensional density x > 0 (Fisher, 1925): ∫_{R^N} ‖∇x(t)‖₂²/x(t) dt
Problems involving various notions of divergence between x > 0 and y > 0:
pth order Hellinger: ∫_{R^N} |x(t)^{1/p} − y(t)^{1/p}|^p dt
Kullback-Leibler: ∫_{R^N} x(t) ln(x(t)/y(t)) dt
Rényi: ∫_{R^N} x(t)^α y(t)^{1−α} dt
Pearson: ∫_{R^N} |x(t) − y(t)|²/y(t) dt
What is the common structure underlying these formulations?
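A numerical hint at the common structure (an editorial illustration with hypothetical discrete densities, not from the slides): each divergence integrand can be written as a perspective-type expression y φ(x/y). For Kullback-Leibler, with φ(t) = t ln t, the identity y φ(x/y) = x ln(x/y) holds pointwise:

```python
import math

def perspective(phi, eta, y):
    # perspective of phi at (eta, y), on the eta > 0 branch only
    if eta > 0:
        return eta * phi(y / eta)
    raise ValueError("only the eta > 0 branch is needed here")

phi_kl = lambda t: t * math.log(t)   # KL integrand phi(t) = t ln t

# two hypothetical discrete positive densities
x = [0.2, 0.5, 0.3]
y = [0.4, 0.4, 0.2]

kl_direct = sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))
kl_perspective = sum(perspective(phi_kl, yi, xi) for xi, yi in zip(x, y))

assert abs(kl_direct - kl_perspective) < 1e-12
```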

13 Perspective functions: Definition
H, G real Hilbert spaces.
Γ₀(G): set of proper lower semicontinuous convex functions from G to ]−∞,+∞], with dom ϕ = { x ∈ G : ϕ(x) < +∞ }.
For ϕ ∈ Γ₀(G), rec ϕ is the recession function of ϕ: given z ∈ dom ϕ,
(∀y ∈ G) (rec ϕ)(y) = sup_{x ∈ dom ϕ} ( ϕ(x + y) − ϕ(x) ).
(Lower semicontinuous envelope of the) perspective function of ϕ:
ϕ̃ : R × G → ]−∞,+∞] : (η, y) ↦
  η ϕ(y/η), if η > 0;
  (rec ϕ)(y), if η = 0;
  +∞, if η < 0.
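To make the definition concrete, take G = R and ϕ = |·|²; then ϕ̃(η, y) = y²/η for η > 0, which is jointly convex in (η, y) and positively homogeneous. A pure-Python spot check (random midpoint-convexity test; the tolerances are illustrative):

```python
import random

def pers_sq(eta, y):
    # perspective of phi(t) = t**2: eta * phi(y / eta) = y**2 / eta for eta > 0
    return y * y / eta

random.seed(0)
for _ in range(1000):
    e1, e2 = random.uniform(0.1, 5), random.uniform(0.1, 5)
    y1, y2 = random.uniform(-5, 5), random.uniform(-5, 5)
    mid = pers_sq((e1 + e2) / 2, (y1 + y2) / 2)
    # joint (midpoint) convexity in (eta, y)
    assert mid <= (pers_sq(e1, y1) + pers_sq(e2, y2)) / 2 + 1e-9
    # positive homogeneity: pers(lam*eta, lam*y) = lam * pers(eta, y)
    lam = random.uniform(0.1, 3)
    assert abs(pers_sq(lam * e1, lam * y1) - lam * pers_sq(e1, y1)) < 1e-6
```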


17 Perspective functions: Properties
Let ϕ ∈ Γ₀(G). Then:
ϕ̃ ∈ Γ₀(R × G)
ϕ̃ is positively homogeneous
Let C = { (µ, u) ∈ R × G : µ + ϕ*(u) ≤ 0 }. Then ϕ̃ = σ_C and (ϕ̃)* = ι_C
Let η ∈ R and y ∈ G. Then
∂ϕ̃(η, y) =
  { (ϕ(y/η) − ⟨y | u⟩/η, u) : u ∈ ∂ϕ(y/η) }, if η > 0;
  { (µ, u) ∈ C : σ_{dom ϕ*}(y) = ⟨u | y⟩ }, if η = 0 and y ≠ 0;
  C, if η = 0 and y = 0;
  ∅, if η < 0


20 Perspective functions: Properties
Let ϕ ∈ Γ₀(G). Then:
Let ψ ∈ Γ₀(G) be such that dom ϕ ∩ dom ψ ≠ ∅, and let λ ∈ ]0,+∞[. Then [λϕ + ψ]~ = λϕ̃ + ψ̃ ∈ Γ₀(R × G).
Let Λ : H → G be linear and bounded, with ran Λ ∩ dom ϕ ≠ ∅. Set Λ̃ : R × H → R × G : (ξ, x) ↦ (ξ, Λx). Then [ϕ ∘ Λ]~ = ϕ̃ ∘ Λ̃ ∈ Γ₀(R × H).
Let ψ ∈ Γ₀(G) and let C be a closed convex subset of G such that C ∩ dom ψ ≠ ∅. Set
g : (η, y) ↦
  η ψ(y/η), if η > 0 and y ∈ η(C ∩ dom ψ);
  (rec ψ)(y), if η = 0 and y ∈ rec C;
  +∞, otherwise.
Then g = [ι_C + ψ]~ ∈ Γ₀(R × G).

21 Perspective functions: Examples
Let ψ ∈ Γ₀(G) and let env ψ : y ↦ inf_{x∈G} ( ψ(x) + ‖y − x‖²/2 ) be the Moreau envelope of ψ. Set
g : (η, y) ↦
  ‖y‖²/(2η) − η(env ψ)(y/η), if η > 0;
  σ_{dom ψ}(y), if η = 0;
  +∞, if η < 0.
Then g = [env(ψ*)]~ ∈ Γ₀(R × G).

22 Perspective functions: Examples
Take ψ = ι_{B(0;1)} in the previous example and set
g : (η, y) ↦
  ‖y‖ − η/2, if ‖y‖ > η and η > 0;
  ‖y‖²/(2η), if ‖y‖ ≤ η and η > 0;
  ‖y‖, if η = 0;
  +∞, if η < 0.
Then g = ϕ̃, where ϕ = env‖·‖ = ‖·‖²/2 − d_{B(0;1)}²/2 is the generalized Huber function. In computer vision, g is called the bivariate Huber function. It also shows up in Owen's concomitant M-estimator formulation.
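The Moreau-envelope identity behind the Huber function can be checked numerically in one dimension, where env|·| = |·|²/2 − d_{[−1,1]}²/2 equals the classical Huber function with parameter 1 (a grid-based editorial sketch):

```python
def huber(y):
    # closed-form Huber function with parameter 1
    return y * y / 2 if abs(y) <= 1 else abs(y) - 0.5

def env_abs(y, step=1e-4):
    # Moreau envelope of |.|: inf_x |x| + (y - x)**2 / 2, by grid search
    xs = [k * step for k in range(-60000, 60001)]
    return min(abs(x) + (y - x) ** 2 / 2 for x in xs)

def via_distance(y):
    # |y|^2/2 - d(y, [-1,1])^2/2
    d = max(abs(y) - 1.0, 0.0)
    return y * y / 2 - d * d / 2

for y in [-3.0, -0.7, 0.0, 0.4, 2.5]:
    assert abs(env_abs(y) - huber(y)) < 1e-3      # envelope matches Huber
    assert abs(via_distance(y) - huber(y)) < 1e-12  # distance identity is exact
```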

23 Perspective functions: Examples
Let C and D be nonempty closed convex subsets of G, and let ρ ∈ ]0,+∞[. Set
g : (η, y) ↦
  η d_C²(y/η)/(2ρ) + σ_D(y), if η > 0 and y ∉ ηC;
  σ_D(y), if η > 0 and y ∈ ηC;
  σ_D(y), if η = 0 and y ∈ rec C;
  +∞, otherwise.
Then g = ϕ̃ ∈ Γ₀(R × G), where ϕ = d_C²/(2ρ) + σ_D. A special case of g appears in computer vision. If G = R and D = [−1, 1], ϕ is the Berhu (reversed Huber) function used in mechanics and in Owen's concomitant M-estimator formulation.

24 Perspective functions: Examples
Let ψ : G → [0,+∞] be a proper lower semicontinuous positively homogeneous convex function, let δ ∈ R, let ρ ∈ [0,+∞[, let p ∈ [1,+∞[, and set
g : (η, y) ↦
  δη + (ρη^p + ψ^p(y))^{1/p}, if η ≥ 0;
  +∞, if η < 0.
Then g = [δ + (ρ + ψ^p)^{1/p}]~ ∈ Γ₀(R × G).
Let φ ∈ Γ₀(R) be an even function, let v ∈ G, let δ ∈ R, and set
g : (η, y) ↦
  η φ(‖y‖/η) + ⟨y | v⟩ + δη, if η > 0;
  (rec φ)(‖y‖) + ⟨y | v⟩, if η = 0;
  +∞, if η < 0.
Then g = [φ ∘ ‖·‖ + ⟨· | v⟩ + δ]~ ∈ Γ₀(R × G).

25 Perspective functions: Examples
The divergences between x > 0 and y > 0 discussed earlier are of the form ∫_{R^N} ϕ̃(y(t), x(t)) dt, where:
pth order Hellinger: ϕ(ξ) = |ξ^{1/p} − 1|^p, if ξ > 0; +∞, otherwise
Kullback-Leibler: ϕ(ξ) = ξ ln ξ, if ξ > 0; +∞, otherwise
Rényi: ϕ(ξ) = ξ^α, if ξ > 0; +∞, otherwise
Pearson: ϕ(ξ) = |ξ − 1|²

26 Composite perspective functions
Let L : H → G be linear and bounded, let ϕ ∈ Γ₀(G), let r ∈ G, let u ∈ H, let ρ ∈ R, and set
f : x ↦
  (⟨x | u⟩ − ρ) ϕ((Lx − r)/(⟨x | u⟩ − ρ)), if ⟨x | u⟩ > ρ;
  (rec ϕ)(Lx − r), if ⟨x | u⟩ = ρ;
  +∞, if ⟨x | u⟩ < ρ.
Suppose that there exists z ∈ H such that Lz − r ∈ (⟨z | u⟩ − ρ) dom ϕ and ⟨z | u⟩ − ρ > 0, and set A : H → R × G : x ↦ (⟨x | u⟩ − ρ, Lx − r). Then f = ϕ̃ ∘ A ∈ Γ₀(H).

28 Composite perspective functions: Examples
Let L : H → G be linear and bounded, let |||·||| be a norm on G such that, for some χ ∈ ]0,+∞[, |||·||| ≤ χ‖·‖, let r ∈ G, let u ∈ H, let ρ ∈ R, and let q and s be in ]1,+∞[. Set
h : x ↦
  |||Lx − r|||^{qs}/(⟨x | u⟩ − ρ)^{(q−1)s}, if ⟨x | u⟩ > ρ;
  0, if Lx = r and ⟨x | u⟩ = ρ;
  +∞, otherwise.
Then h ∈ Γ₀(H).
Let (Ω, F, P) be a probability space, let H = L²(Ω, F, P), let p ∈ ]1, 2], and let q and s be in ]1,+∞[. Set
h : X ↦
  (E|X|^p)^{qs/p}/(EX)^{(q−1)s}, if EX > 0;
  0, if X = 0 a.s.;
  +∞, otherwise.
Then h ∈ Γ₀(H).

29 Composite perspective functions: Examples
Let (Ω, F, µ) be a measure space, let G be a separable real Hilbert space, and let ϕ ∈ Γ₀(G). Set 𝓗 = L²((Ω,F,µ); R) and 𝓖 = L²((Ω,F,µ); G), and suppose that µ(Ω) < +∞ or ϕ ≥ ϕ(0) = 0. For every x ∈ 𝓗, set Ω₀(x) = { ω ∈ Ω : x(ω) = 0 } and Ω₊(x) = { ω ∈ Ω : x(ω) > 0 }. Define
Φ : 𝓗 × 𝓖 → ]−∞,+∞] : (x, y) ↦
  ∫_{Ω₀(x)} (rec ϕ)(y(ω)) µ(dω) + ∫_{Ω₊(x)} x(ω) ϕ(y(ω)/x(ω)) µ(dω),
    if x ≥ 0 a.e. and (rec ϕ)(y)1_{Ω₀(x)} + xϕ(y/x)1_{Ω₊(x)} ∈ L¹((Ω,F,µ); R);
  +∞, otherwise.
Then Φ ∈ Γ₀(𝓗 × 𝓖).

30 Composite perspective functions: Examples
Corollary: let Ω be a nonempty open subset of R^N and let H be the Sobolev space H¹(Ω), i.e., H = { x ∈ L²(Ω) : ∇x ∈ (L²(Ω))^N }. For every x ∈ H, set Ω₋(x) = { t ∈ Ω : x(t) < 0 }, Ω₀(x) = { t ∈ Ω : x(t) = 0 }, and Ω₊(x) = { t ∈ Ω : x(t) > 0 }. Let ϕ ∈ Γ₀(R^N) be such that ϕ ≥ ϕ(0) = 0, and define
f : H → ]−∞,+∞] : x ↦
  ∫_{Ω₀(x)} (rec ϕ)(∇x(t)) dt + ∫_{Ω₊(x)} x(t) ϕ(∇x(t)/x(t)) dt, if x ≥ 0 a.e.;
  +∞, otherwise.
Then f ∈ Γ₀(H).

31 Composite perspective functions: Examples
The Fisher information
f : H¹(Ω) → ]−∞,+∞] : x ↦
  ∫_{Ω₊(x)} ‖∇x(t)‖₂²/x(t) dt, if x ≥ 0 a.e. and [x = 0 ⇒ ∇x = 0] a.e.;
  +∞, otherwise
is in Γ₀(H¹(Ω)).
For (x, y) = ((ξ_i)_{i∈I}, (η_i)_{i∈I}) ∈ R^{2N}, set I₀(x, y) = { i ∈ I : ξ_i = 0 and η_i < 0 } and
d_φ(x, y) =
  Σ_{i∈I₀(x)} η_i + Σ_{i∈I₊(x)} |η_i^{1/p} − ξ_i^{1/p}|^p, if I₋(x) ∪ I₀(x, y) = ∅;
  +∞, otherwise.
Then d_φ ∈ Γ₀(R^{2N}). We recover the Kolmogorov variational divergence for p = 1 and the Hellinger divergence for p = 2.

32 Perspective functions: Proximity operator
The Moreau proximity operator of g ∈ Γ₀(G) is
prox_g : G → G : x ↦ argmin_{y∈G} ( g(y) + (1/2)‖x − y‖² ).
It is an essential tool in the design of splitting algorithms for solving a variety of convex minimization problems, especially in data science over the past decade.
PLC and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., vol. 4, 2005
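For instance, the proximity operator of γ|·| on R is soft thresholding; a brute-force argmin of the defining objective agrees with the closed form (an illustrative sketch, not from the slides):

```python
def soft(x, gamma):
    # closed-form prox of gamma * |.| : soft thresholding
    return max(abs(x) - gamma, 0.0) * (1.0 if x >= 0 else -1.0)

def prox_numeric(x, gamma, step=1e-4):
    # direct minimization of gamma*|y| + (x - y)^2 / 2 over a grid
    ys = [k * step for k in range(-50000, 50001)]
    return min(ys, key=lambda y: gamma * abs(y) + (x - y) ** 2 / 2)

for x, gamma in [(2.3, 0.5), (-1.2, 0.4), (0.1, 0.3)]:
    assert abs(prox_numeric(x, gamma) - soft(x, gamma)) < 1e-3
```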

33 Proximity operators
Many common convex functions in data processing (statistics, machine learning, image recovery, data denoising, support vector machines, signal processing) have explicit proximity operators:
ℓ₁ norm, Schatten norm, nuclear norm, Huber's function, Berhu function, elastic net regularizer, hinge loss, Fisher information, distance function, Vapnik's ε-insensitive loss, Burg's entropy, etc.

35 Proximity operators
Basic properties:
p = prox_f x ⇔ x − p ∈ ∂f(p)
prox_f + prox_{f*} = Id (Moreau's decomposition)
For f = ι_V, V a closed vector subspace: P_V + P_{V⊥} = Id
prox_{ρ|·|} = Id − prox_{(ρ|·|)*} = Id − P_{[−ρ,ρ]} = soft_ρ
(prox_f x, x − prox_f x) = (prox_f x, prox_{f*} x) ∈ gra ∂f
Fix prox_f = Argmin f
‖prox_f x − prox_f y‖² ≤ ‖x − y‖² − ‖(x − prox_f x) − (y − prox_f y)‖²
The last two properties suggest the conceptual algorithm x_{n+1} = prox_f x_n to minimize f, which is at the root of proximal splitting algorithms.
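Moreau's decomposition can be verified for f = γ|·| on R: prox_f is soft thresholding, prox_{f*} is the projection onto [−γ, γ], and they sum to the identity (editorial sketch):

```python
def soft(x, gamma):
    # prox of f = gamma * |.|
    return max(abs(x) - gamma, 0.0) * (1.0 if x >= 0 else -1.0)

def proj_interval(x, gamma):
    # prox of f* = iota_[-gamma, gamma]: projection onto [-gamma, gamma]
    return min(max(x, -gamma), gamma)

gamma = 0.7
for x in [-2.0, -0.5, 0.0, 0.3, 1.9]:
    # Moreau's decomposition: prox_f + prox_{f*} = Id
    assert abs(soft(x, gamma) + proj_interval(x, gamma) - x) < 1e-12
```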

36 Proximal splitting methods in convex optimization
f ∈ Γ₀(H), ϕ_k ∈ Γ₀(G_k), ℓ_k ∈ Γ₀(G_k) strongly convex, L_k : H → G_k linear and bounded with ‖L_k‖ = 1, h : H → R convex and smooth:
minimize over x ∈ H: f(x) + Σ_{k=1}^p (ϕ_k □ ℓ_k)(L_k x − r_k) + h(x)
where ϕ_k □ ℓ_k : x ↦ inf_y ( ϕ_k(y) + ℓ_k(x − y) ) is the infimal convolution.
Example: multiview total variation image recovery from observations r_k = L_k x + w_k:
minimize over x ∈ H: Σ_{k∈N} φ_k(⟨x | e_k⟩) + Σ_{k=1}^p α_k d_{C_k}(L_k x − r_k) + β‖x‖_{1,2}
(each distance term d_{C_k} relaxes the hard constraint ι_{C_k}).
A splitting algorithm activates each function and each linear operator individually.

37 Proximal splitting methods in convex optimization
Algorithm: for n = 0, 1, ...
  y_{1,n} = x_n − (∇h(x_n) + Σ_{k=1}^p L_k* v_{k,n})
  p_{1,n} = prox_f y_{1,n}
  for k = 1, ..., p:
    y_{2,k,n} = v_{k,n} + (L_k x_n − ∇ℓ_k*(v_{k,n}))
    p_{2,k,n} = prox_{ϕ_k*}(y_{2,k,n} − r_k)
    q_{2,k,n} = p_{2,k,n} + (L_k p_{1,n} − ∇ℓ_k*(p_{2,k,n}))
    v_{k,n+1} = v_{k,n} − y_{2,k,n} + q_{2,k,n}
  q_{1,n} = p_{1,n} − (∇h(p_{1,n}) + Σ_{k=1}^p L_k* p_{2,k,n})
  x_{n+1} = x_n − y_{1,n} + q_{1,n}
(x_n)_{n∈N} converges weakly to a solution.
PLC, Systems of structured monotone inclusions: Duality, algorithms, and applications, SIAM J. Optim., vol. 23, 2013

38 Perspective functions: Proximity operator
Let ϕ ∈ Γ₀(G), let γ ∈ ]0,+∞[, let η ∈ R, and let y ∈ G.
Suppose that η + γϕ*(y/γ) ≤ 0. Then prox_{γϕ̃}(η, y) = (0, 0).
Suppose that dom ϕ* is open and that η + γϕ*(y/γ) > 0. Then prox_{γϕ̃}(η, y) = (η + γϕ*(p), y − γp), where p is the unique solution of the inclusion y ∈ γp + (η + γϕ*(p)) ∂ϕ*(p). If ϕ* is differentiable at p, then p is characterized by y = γp + (η + γϕ*(p)) ∇ϕ*(p).
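As an illustration of this characterization in the scalar case ϕ = |·|²/2 (so ϕ* = |·|²/2 and ∇ϕ*(p) = p; an editorial sketch under these assumptions, with η ≥ 0 so the equation is monotone): the condition y = γp + (η + γϕ*(p))∇ϕ*(p) becomes a cubic in p, solvable by bisection, and the resulting point agrees with a brute-force prox computation.

```python
def prox_perspective_sq(eta, y, gamma):
    # prox of gamma * (perspective of phi = |.|^2/2) at (eta, y)
    # phi*(u) = u**2/2; if eta + gamma*phi*(y/gamma) <= 0, the prox is (0, 0)
    if eta + gamma * (y / gamma) ** 2 / 2 <= 0:
        return (0.0, 0.0)
    # solve y = gamma*p + (eta + gamma*p**2/2)*p by bisection
    # (the left-hand side is increasing in p when eta >= 0)
    f = lambda p: gamma * p + (eta + gamma * p * p / 2) * p - y
    lo, hi = -100.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) <= 0 else (lo, mid)
    p = (lo + hi) / 2
    return (eta + gamma * p * p / 2, y - gamma * p)

eta, y, gamma = 1.0, 2.0, 0.5
chi, q = prox_perspective_sq(eta, y, gamma)

# brute-force check: (chi, q) minimizes
# gamma * v^2/(2c) + ((c - eta)^2 + (v - y)^2)/2 over c > 0
def obj(c, v):
    return gamma * v * v / (2 * c) + ((c - eta) ** 2 + (v - y) ** 2) / 2

best = min((obj(0.01 * i, 0.01 * j), 0.01 * i, 0.01 * j)
           for i in range(1, 400) for j in range(-400, 400))
assert obj(chi, q) <= best[0] + 1e-4
assert abs(chi - best[1]) < 0.02 and abs(q - best[2]) < 0.02
```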

39 Perspective functions: Proximity operator
Let v ∈ G, let δ ∈ R, and let φ ∈ Γ₀(R) be an even function such that φ* is differentiable on R. Define
g : (η, y) ↦
  ηφ(‖y‖/η) + ⟨y | v⟩ + δη, if η > 0;
  0, if y = 0 and η = 0;
  +∞, otherwise.
Let γ ∈ ]0,+∞[, let η ∈ R, let y ∈ G, and set ψ : s ↦ (φ*(s) + η/γ − δ) φ*′(s) + s. Then ψ is invertible. Moreover, if η + γφ*(‖y/γ − v‖) > γδ, set
t = ψ⁻¹(‖y/γ − v‖) and p = v + t(y − γv)/‖y − γv‖.
Then
prox_{γg}(η, y) =
  (η + γ(φ*(t) − δ), y − γp), if η + γφ*(‖y/γ − v‖) > γδ;
  (0, 0), if η + γφ*(‖y/γ − v‖) ≤ γδ.

40 Perspective functions: Proximity operator
Let v ∈ G, let δ ∈ R, let α ∈ ]0,+∞[, let q ∈ ]1,+∞[, and consider the function
g : (η, y) ↦
  ‖y‖^q/(αη^{q−1}) + ⟨y | v⟩ + δη, if η > 0;
  0, if y = 0 and η = 0;
  +∞, otherwise.
Let γ ∈ ]0,+∞[, set q* = q/(q − 1), set ϑ = (α(1 − 1/q*))^{q*−1}, and take η ∈ R and y ∈ G. If q*γ^{q*−1}(η − γδ) + ϑ‖y − γv‖^{q*} > 0, let t ∈ [0,+∞[ be the unique solution of the equation
t^{2q*−1} + (q*/(ϑγ))(η − γδ) t^{q*−1} + (q*/ϑ²)(t − ‖y − γv‖/γ) = 0
and set p = v + t(y − γv)/‖y − γv‖. Then
prox_{γg}(η, y) =
  (η + γ(ϑt^{q*}/q* − δ), y − γp), if q*γ^{q*−1}(η − γδ) + ϑ‖y − γv‖^{q*} > 0;
  (0, 0), if q*γ^{q*−1}(η − γδ) + ϑ‖y − γv‖^{q*} ≤ 0.

41 Perspective functions: Proximity operator
Let (Ω, F, µ), G, ϕ, 𝓗, 𝓖, and Φ be as on slide 29 (integral functional built from the perspective of ϕ, with µ(Ω) < +∞ or ϕ ≥ ϕ(0) = 0). Let x ∈ 𝓗 and y ∈ 𝓖, and set, for µ-almost every ω ∈ Ω, (p(ω), q(ω)) = prox_{ϕ̃}(x(ω), y(ω)). Then prox_Φ(x, y) = (p, q).

42 Perspective functions: Proximity operator
We can also handle cases where dom ϕ* is not open. Consider the perspective function
ϕ̃ : R² → ]−∞,+∞] : (η, y) ↦
  d_{[−εη, εη]}(y), if η ≥ 0;
  +∞, if η < 0
of the Vapnik loss function ϕ = max{|·| − ε, 0}; here ϕ* = ε|·| + ι_{[−1,1]}.
Let η ∈ R, let y ∈ R, and set (χ, q) = prox_{γϕ̃}(η, y). Then:
If η + ε|y| ≤ 0 and |y| ≤ γ, then (χ, q) = (0, 0).
If η ≤ −γε and |y| > γ, then (χ, q) = (0, y − γ sign(y)).
If η > −γε and |y| > εη + γ(1 + ε²), then (χ, q) = (η + γε, y − γ sign(y)).
If |y| > −η/ε and εη < |y| ≤ εη + γ(1 + ε²), then (χ, q) = ((η + ε|y|)/(1 + ε²), ε(η + ε|y|) sign(y)/(1 + ε²)).
If η ≥ 0 and |y| ≤ εη, then (χ, q) = (η, y).
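For comparison, the prox of the underlying univariate Vapnik loss γ max{|·| − ε, 0} alone (not of its perspective) has a simple three-zone form, checked here against a brute-force argmin (editorial sketch):

```python
def prox_vapnik(x, eps, gamma):
    # prox of gamma * max(|.| - eps, 0)
    ax, s = abs(x), (1.0 if x >= 0 else -1.0)
    if ax <= eps:
        return x               # inside the insensitivity band: left untouched
    if ax <= eps + gamma:
        return s * eps         # shrunk onto the band boundary
    return s * (ax - gamma)    # shifted toward the band

def prox_numeric(x, eps, gamma, step=1e-4):
    ys = [k * step for k in range(-60000, 60001)]
    return min(ys, key=lambda y: gamma * max(abs(y) - eps, 0.0)
               + (x - y) ** 2 / 2)

eps, gamma = 0.5, 0.8
for x in [-3.0, -1.0, -0.3, 0.2, 0.9, 2.4]:
    assert abs(prox_vapnik(x, eps, gamma) - prox_numeric(x, eps, gamma)) < 2e-3
```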

43 Applications in high-dimensional statistics
Linear data model: z = Xb + σe.
Penalized concomitant M-estimators:
minimize over σ ∈ R, τ ∈ R, b ∈ R^p: Σ_{i=1}^n ϕ̃_i(σ, ⟨X_{i:} | b⟩ − ζ_i) + Σ_{j=1}^p ψ̃_j(τ, ⟨a_j | b⟩).
This model unifies various robust regression procedures.
It can be solved efficiently by the block-iterative proximal splitting method of PLC and J. Eckstein, Asynchronous block-iterative primal-dual decomposition methods for monotone inclusions, Mathematical Programming, published online.
Another model of interest: the generalized TREX:
minimize over b ∈ R^p: ‖Xb − z‖₂^q / (α‖Xᵀ(Xb − z)‖_∞^{q−1}) + ‖b‖₁

44 Applications in high-dimensional statistics
The nonconvex generalized TREX problem can be rewritten as a system of 2p convex problems:
minimize over b ∈ R^p such that x_jᵀ(Xb − z) > 0: ‖Xb − z‖₂^q / (α(x_jᵀ(Xb − z))^{q−1}) + ‖b‖₁,
where x_j = sX_{:j}, s ∈ {−1, 1}. Each subproblem involves the (shifted) perspective function
g_j : (η, y) ↦
  ‖y − z‖₂^q / (α(η − x_jᵀz)^{q−1}), if η > x_jᵀz;
  0, if y = z and η = x_jᵀz;
  +∞, otherwise
of ‖·‖₂^q/α composed with the linear operator b ↦ (x_jᵀXb, Xb), and h = [‖·‖₁]~ = ‖·‖₁. It can be solved, for instance, by a Douglas-Rachford-like algorithm.

45 Applications in high-dimensional statistics
prox_h is the standard soft thresholding operator.
With q* = q/(q − 1) and ϑ = (α(1 − 1/q*))^{q*−1}, we have
prox_{γg_j}(η, y) =
  (η + γϑt^{q*}/q*, y − γp), if q*γ^{q*−1}(η − x_jᵀz) + ϑ‖y − z‖₂^{q*} > 0;
  (x_jᵀz, z), if q*γ^{q*−1}(η − x_jᵀz) + ϑ‖y − z‖₂^{q*} ≤ 0,
where
p =
  t(y − z)/‖y − z‖, if y ≠ z;
  0, if y = z,
and t is the unique solution in ]0,+∞[ of the reduced equation
s^{2q*−1} + (q*/(ϑγ))(η − x_jᵀz) s^{q*−1} + (q*/ϑ²)(s − ‖y − z‖/γ) = 0.

46 Applications in high-dimensional statistics
Algorithm for the jth generalized TREX subproblem: for k = 0, 1, ...
  q_k = M_j x_k − y_k
  b_k = x_k − R_j q_k
  c_k = M_j b_k
  z_k = prox_{γh}(2b_k − x_k)
  t_k = prox_{γg_j}(2c_k − y_k)
  x_{k+1} = x_k + µ_k(z_k − b_k)
  y_{k+1} = y_k + µ_k(t_k − c_k)
(b_k)_{k∈N} converges to a solution b of the subproblem. See the paper for a detailed numerical application to sparse regression.
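The Douglas-Rachford iteration pattern behind this scheme can be sketched on a toy problem: minimize ½(x − z)² + λ|x|, whose solution is the soft threshold soft(z, λ). This is an editorial illustration, not the TREX subproblem itself (M_j, R_j, and g_j above are problem-specific):

```python
def prox_f(x, z, gamma):
    # prox of gamma * (1/2)(x - z)^2
    return (x + gamma * z) / (1 + gamma)

def prox_g(x, gamma):
    # prox of gamma * |.| : soft thresholding
    return max(abs(x) - gamma, 0.0) * (1.0 if x >= 0 else -1.0)

def douglas_rachford(z, lam, gamma=1.0, mu=1.0, iters=200):
    y = 0.0
    for _ in range(iters):
        b = prox_g(y, gamma * lam)        # b_k = prox_{gamma g}(y_k)
        r = prox_f(2 * b - y, z, gamma)   # reflected point through prox of f
        y = y + mu * (r - b)              # Douglas-Rachford governing update
    return prox_g(y, gamma * lam)

z, lam = 2.0, 0.7
sol = douglas_rachford(z, lam)
expected = max(abs(z) - lam, 0.0) * (1.0 if z >= 0 else -1.0)  # soft(z, lam)
assert abs(sol - expected) < 1e-6
```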

47 References
PLC and C. L. Müller, Perspective functions: Proximal calculus and applications in high-dimensional statistics, J. Math. Anal. Appl., 2017 (published online)
PLC, Perspective functions: Properties, constructions, and examples
H. H. Bauschke and PLC, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed., Springer, New York, February 2017


More information

Making Flippy Floppy

Making Flippy Floppy Making Flippy Floppy James V. Burke UW Mathematics jvburke@uw.edu Aleksandr Y. Aravkin IBM, T.J.Watson Research sasha.aravkin@gmail.com Michael P. Friedlander UBC Computer Science mpf@cs.ubc.ca Vietnam

More information

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

PROXIMAL THRESHOLDING ALGORITHM FOR MINIMIZATION OVER ORTHONORMAL BASES

PROXIMAL THRESHOLDING ALGORITHM FOR MINIMIZATION OVER ORTHONORMAL BASES PROXIMAL THRESHOLDING ALGORITHM FOR MINIMIZATION OVER ORTHONORMAL BASES Patrick L. Combettes and Jean-Christophe Pesquet Laboratoire Jacques-Louis Lions UMR CNRS 7598 Université Pierre et Marie Curie Paris

More information

MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications. Class 08: Sparsity Based Regularization. Lorenzo Rosasco

MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications. Class 08: Sparsity Based Regularization. Lorenzo Rosasco MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications Class 08: Sparsity Based Regularization Lorenzo Rosasco Learning algorithms so far ERM + explicit l 2 penalty 1 min w R d n n l(y

More information

Subgradient Projectors: Extensions, Theory, and Characterizations

Subgradient Projectors: Extensions, Theory, and Characterizations Subgradient Projectors: Extensions, Theory, and Characterizations Heinz H. Bauschke, Caifang Wang, Xianfu Wang, and Jia Xu April 13, 2017 Abstract Subgradient projectors play an important role in optimization

More information

Sparse Regularization via Convex Analysis

Sparse Regularization via Convex Analysis Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which

More information

arxiv: v4 [math.oc] 29 Jan 2018

arxiv: v4 [math.oc] 29 Jan 2018 Noname manuscript No. (will be inserted by the editor A new primal-dual algorithm for minimizing the sum of three functions with a linear operator Ming Yan arxiv:1611.09805v4 [math.oc] 29 Jan 2018 Received:

More information

The proximal mapping

The proximal mapping The proximal mapping http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/37 1 closed function 2 Conjugate function

More information

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control RTyrrell Rockafellar and Peter R Wolenski Abstract This paper describes some recent results in Hamilton- Jacobi theory

More information

On the order of the operators in the Douglas Rachford algorithm

On the order of the operators in the Douglas Rachford algorithm On the order of the operators in the Douglas Rachford algorithm Heinz H. Bauschke and Walaa M. Moursi June 11, 2015 Abstract The Douglas Rachford algorithm is a popular method for finding zeros of sums

More information

Brøndsted-Rockafellar property of subdifferentials of prox-bounded functions. Marc Lassonde Université des Antilles et de la Guyane

Brøndsted-Rockafellar property of subdifferentials of prox-bounded functions. Marc Lassonde Université des Antilles et de la Guyane Conference ADGO 2013 October 16, 2013 Brøndsted-Rockafellar property of subdifferentials of prox-bounded functions Marc Lassonde Université des Antilles et de la Guyane Playa Blanca, Tongoy, Chile SUBDIFFERENTIAL

More information

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Accelerated Block-Coordinate Relaxation for Regularized Optimization Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth

More information

Inertial Douglas-Rachford splitting for monotone inclusion problems

Inertial Douglas-Rachford splitting for monotone inclusion problems Inertial Douglas-Rachford splitting for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek Christopher Hendrich January 5, 2015 Abstract. We propose an inertial Douglas-Rachford splitting algorithm

More information

The Fitzpatrick Function and Nonreflexive Spaces

The Fitzpatrick Function and Nonreflexive Spaces Journal of Convex Analysis Volume 13 (2006), No. 3+4, 861 881 The Fitzpatrick Function and Nonreflexive Spaces S. Simons Department of Mathematics, University of California, Santa Barbara, CA 93106-3080,

More information

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop

More information

Learning with stochastic proximal gradient

Learning with stochastic proximal gradient Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Existence and Approximation of Fixed Points of. Bregman Nonexpansive Operators. Banach Spaces

Existence and Approximation of Fixed Points of. Bregman Nonexpansive Operators. Banach Spaces Existence and Approximation of Fixed Points of in Reflexive Banach Spaces Department of Mathematics The Technion Israel Institute of Technology Haifa 22.07.2010 Joint work with Prof. Simeon Reich General

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

Fenchel-Moreau Conjugates of Inf-Transforms and Application to Stochastic Bellman Equation

Fenchel-Moreau Conjugates of Inf-Transforms and Application to Stochastic Bellman Equation Fenchel-Moreau Conjugates of Inf-Transforms and Application to Stochastic Bellman Equation Jean-Philippe Chancelier and Michel De Lara CERMICS, École des Ponts ParisTech First Conference on Discrete Optimization

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Stochastic model-based minimization under high-order growth

Stochastic model-based minimization under high-order growth Stochastic model-based minimization under high-order growth Damek Davis Dmitriy Drusvyatskiy Kellie J. MacPhee Abstract Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively

More information

Proximal Methods for Optimization with Spasity-inducing Norms

Proximal Methods for Optimization with Spasity-inducing Norms Proximal Methods for Optimization with Spasity-inducing Norms Group Learning Presentation Xiaowei Zhou Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Monotone Operator Splitting Methods in Signal and Image Recovery

Monotone Operator Splitting Methods in Signal and Image Recovery Monotone Operator Splitting Methods in Signal and Image Recovery P.L. Combettes 1, J.-C. Pesquet 2, and N. Pustelnik 3 2 Univ. Pierre et Marie Curie, Paris 6 LJLL CNRS UMR 7598 2 Univ. Paris-Est LIGM CNRS

More information

Stability of optimization problems with stochastic dominance constraints

Stability of optimization problems with stochastic dominance constraints Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM

More information

arxiv: v2 [math.oc] 21 Nov 2017

arxiv: v2 [math.oc] 21 Nov 2017 Unifying abstract inexact convergence theorems and block coordinate variable metric ipiano arxiv:1602.07283v2 [math.oc] 21 Nov 2017 Peter Ochs Mathematical Optimization Group Saarland University Germany

More information

A General Framework for a Class of Primal-Dual Algorithms for TV Minimization

A General Framework for a Class of Primal-Dual Algorithms for TV Minimization A General Framework for a Class of Primal-Dual Algorithms for TV Minimization Ernie Esser UCLA 1 Outline A Model Convex Minimization Problem Main Idea Behind the Primal Dual Hybrid Gradient (PDHG) Method

More information

Adaptive Primal Dual Optimization for Image Processing and Learning

Adaptive Primal Dual Optimization for Image Processing and Learning Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University

More information

THE L 2 -HODGE THEORY AND REPRESENTATION ON R n

THE L 2 -HODGE THEORY AND REPRESENTATION ON R n THE L 2 -HODGE THEORY AND REPRESENTATION ON R n BAISHENG YAN Abstract. We present an elementary L 2 -Hodge theory on whole R n based on the minimization principle of the calculus of variations and some

More information

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION ALESSIO FIGALLI AND YOUNG-HEON KIM Abstract. Given Ω, Λ R n two bounded open sets, and f and g two probability densities concentrated

More information

THROUGHOUT this paper, we let C be a nonempty

THROUGHOUT this paper, we let C be a nonempty Strong Convergence Theorems of Multivalued Nonexpansive Mappings and Maximal Monotone Operators in Banach Spaces Kriengsak Wattanawitoon, Uamporn Witthayarat and Poom Kumam Abstract In this paper, we prove

More information

PROXIMAL THRESHOLDING ALGORITHM FOR MINIMIZATION OVER ORTHONORMAL BASES

PROXIMAL THRESHOLDING ALGORITHM FOR MINIMIZATION OVER ORTHONORMAL BASES SIAM J. Optim. to appear PROXIMAL THRESHOLDING ALGORITHM FOR MINIMIZATION OVER ORTHONORMAL BASES PATRICK L. COMBETTES AND JEAN-CHRISTOPHE PESQUET Abstract. The notion of soft thresholding plays a central

More information

Conductivity imaging from one interior measurement

Conductivity imaging from one interior measurement Conductivity imaging from one interior measurement Amir Moradifam (University of Toronto) Fields Institute, July 24, 2012 Amir Moradifam (University of Toronto) A convergent algorithm for CDII 1 / 16 A

More information

Controllability of linear PDEs (I): The wave equation

Controllability of linear PDEs (I): The wave equation Controllability of linear PDEs (I): The wave equation M. González-Burgos IMUS, Universidad de Sevilla Doc Course, Course 2, Sevilla, 2018 Contents 1 Introduction. Statement of the problem 2 Distributed

More information

On Total Convexity, Bregman Projections and Stability in Banach Spaces

On Total Convexity, Bregman Projections and Stability in Banach Spaces Journal of Convex Analysis Volume 11 (2004), No. 1, 1 16 On Total Convexity, Bregman Projections and Stability in Banach Spaces Elena Resmerita Department of Mathematics, University of Haifa, 31905 Haifa,

More information

Extensions of the CQ Algorithm for the Split Feasibility and Split Equality Problems

Extensions of the CQ Algorithm for the Split Feasibility and Split Equality Problems Extensions of the CQ Algorithm for the Split Feasibility Split Equality Problems Charles L. Byrne Abdellatif Moudafi September 2, 2013 Abstract The convex feasibility problem (CFP) is to find a member

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

OWL to the rescue of LASSO

OWL to the rescue of LASSO OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,

More information

ON PROXIMAL POINT-TYPE ALGORITHMS FOR WEAKLY CONVEX FUNCTIONS AND THEIR CONNECTION TO THE BACKWARD EULER METHOD

ON PROXIMAL POINT-TYPE ALGORITHMS FOR WEAKLY CONVEX FUNCTIONS AND THEIR CONNECTION TO THE BACKWARD EULER METHOD ON PROXIMAL POINT-TYPE ALGORITHMS FOR WEAKLY CONVEX FUNCTIONS AND THEIR CONNECTION TO THE BACKWARD EULER METHOD TIM HOHEISEL, MAXIME LABORDE, AND ADAM OBERMAN Abstract. In this article we study the connection

More information

Sparse and Regularized Optimization

Sparse and Regularized Optimization Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity

More information

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE Fixed Point Theory, Volume 6, No. 1, 2005, 59-69 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.htm WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE YASUNORI KIMURA Department

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization

Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization Radu Ioan Boţ Christopher Hendrich November 7, 202 Abstract. In this paper we

More information

FINDING BEST APPROXIMATION PAIRS RELATIVE TO A CONVEX AND A PROX-REGULAR SET IN A HILBERT SPACE

FINDING BEST APPROXIMATION PAIRS RELATIVE TO A CONVEX AND A PROX-REGULAR SET IN A HILBERT SPACE FINDING BEST APPROXIMATION PAIRS RELATIVE TO A CONVEX AND A PROX-REGULAR SET IN A HILBERT SPACE D. RUSSELL LUKE Abstract. We study the convergence of an iterative projection/reflection algorithm originally

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

Second order forward-backward dynamical systems for monotone inclusion problems

Second order forward-backward dynamical systems for monotone inclusion problems Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from

More information

Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces

Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces Laboratoire d Arithmétique, Calcul formel et d Optimisation UMR CNRS 6090 Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces Emil Ernst Michel Théra Rapport de recherche n 2004-04

More information

STRONG CONVERGENCE THEOREMS FOR COMMUTATIVE FAMILIES OF LINEAR CONTRACTIVE OPERATORS IN BANACH SPACES

STRONG CONVERGENCE THEOREMS FOR COMMUTATIVE FAMILIES OF LINEAR CONTRACTIVE OPERATORS IN BANACH SPACES STRONG CONVERGENCE THEOREMS FOR COMMUTATIVE FAMILIES OF LINEAR CONTRACTIVE OPERATORS IN BANACH SPACES WATARU TAKAHASHI, NGAI-CHING WONG, AND JEN-CHIH YAO Abstract. In this paper, we study nonlinear analytic

More information

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS XIANTAO XIAO, YONGFENG LI, ZAIWEN WEN, AND LIWEI ZHANG Abstract. The goal of this paper is to study approaches to bridge the gap between

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim Korean J. Math. 25 (2017), No. 4, pp. 469 481 https://doi.org/10.11568/kjm.2017.25.4.469 GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS Jong Kyu Kim, Salahuddin, and Won Hee Lim Abstract. In this

More information

On the validity of the Euler Lagrange equation

On the validity of the Euler Lagrange equation J. Math. Anal. Appl. 304 (2005) 356 369 www.elsevier.com/locate/jmaa On the validity of the Euler Lagrange equation A. Ferriero, E.M. Marchini Dipartimento di Matematica e Applicazioni, Università degli

More information

2D HILBERT-HUANG TRANSFORM. Jérémy Schmitt, Nelly Pustelnik, Pierre Borgnat, Patrick Flandrin

2D HILBERT-HUANG TRANSFORM. Jérémy Schmitt, Nelly Pustelnik, Pierre Borgnat, Patrick Flandrin 2D HILBERT-HUANG TRANSFORM Jérémy Schmitt, Nelly Pustelnik, Pierre Borgnat, Patrick Flandrin Laboratoire de Physique de l Ecole Normale Suprieure de Lyon, CNRS and Université de Lyon, France first.last@ens-lyon.fr

More information

First-order methods for structured nonsmooth optimization

First-order methods for structured nonsmooth optimization First-order methods for structured nonsmooth optimization Sangwoon Yun Department of Mathematics Education Sungkyunkwan University Oct 19, 2016 Center for Mathematical Analysis & Computation, Yonsei University

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

Coordinate Update Algorithm Short Course Operator Splitting

Coordinate Update Algorithm Short Course Operator Splitting Coordinate Update Algorithm Short Course Operator Splitting Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 25 Operator splitting pipeline 1. Formulate a problem as 0 A(x) + B(x) with monotone operators

More information

Convex Optimization Algorithms for Machine Learning in 10 Slides

Convex Optimization Algorithms for Machine Learning in 10 Slides Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,

More information

Γ-convergence of functionals on divergence-free fields

Γ-convergence of functionals on divergence-free fields Γ-convergence of functionals on divergence-free fields N. Ansini Section de Mathématiques EPFL 05 Lausanne Switzerland A. Garroni Dip. di Matematica Univ. di Roma La Sapienza P.le A. Moro 2 0085 Rome,

More information

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract

More information