Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques Université Pierre et Marie Curie Paris 6 75005 Paris, France Edinburgh, May 7, 2015 Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 1/25

2/25 Framework A wide range of problems in applied nonlinear analysis can be reduced to finding a point in a closed convex subset F of a Hilbert space H. The solution set F is often constructed from prior information and the observation of data. Mathematical model: find x F n N Fix T n, T n : H H quasi-nonexpansive Algorithmic model: x n+1 = T n x n Asymptotic analysis: under suitable assumptions, x n ) n N converges weakly/strongly/linearly) to a point in F Application areas: variational inequalities, game theory, optimization, statistics, partial differential equations, inverse problems, mechanics, signal and image processing, machine learning, computer vision, transport theory, optics,... Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 2/25

2/25 Basic convergence principle T n is quasi-nonexpansive, i.e., with F Fix T n. Then: x H) y Fix T n ) T n x y x y, Fejér monotonicity: n N) y F) x n+1 y x n y Suppose that the set Wx n ) n N of weak cluster points of x n ) n N is in F. Then x n x F Elementary example: alternating projection method for finding a point in the intersection of two closed convex sets C 1 and C 2. Set T 2n = P 2 and T 2n+1 = P 1. Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 3/25

2/25 Example: Special cases Fixed point methods: Krasnosel skĭı Mann, string averaging, extrapolated barycentric methods, Martinet s cyclic firmly nonexpansive iteration, etc. Projections methods for convex feasibility problems Projections methods for split feasibility problems Subgradient projections methods for systems of convex inequalities Splitting methods for monotone inclusions: forward-backward, Douglas-Rachford, forward-backward-forward, Spingarn, etc. Convex optimization methods: projected gradient method, augmented Lagrangian method, ADMM, various proximal splitting methods, etc. Iterative methods for variational inequalites in mechanics, traffic theory, and finance etc. Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 4/25

2/25 Very large scale problems I: Block-coordinate approach The basic iteration x n+1 = T n x n in the Hilbert space H may be too involved computations, memory) to be operational We assume that H can be decomposed in m factors H = H 1 H m in which each T n has an explicit decomposition T n : x T 1,n x,...,t m,n x) The strategy is to update only arbitrarily chosen coordinates of x n+1 = x 1,n+1,...,x m,n+1 ) up to some tolerance: x i,n+1 = x i,n +ε i,n Ti,n x n + a i,n x i,n ), where ε i,n {0, 1} activation variable) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 5/25

2/25 Very large scale problems I: Block-coordinate approach The basic iteration x n+1 = T n x n in the Hilbert space H may be too involved computations, memory) to be operational We assume that H can be decomposed in m factors H = H 1 H m in which each T n has an explicit decomposition T n : x T 1,n x,...,t m,n x) The strategy is to update only arbitrarily chosen coordinates of x n+1 = x 1,n+1,...,x m,n+1 ) up to some tolerance: x i,n+1 = x i,n +ε i,n Ti,n x n + a i,n x i,n ), where ε i,n {0, 1} activation variable) Our goal is to extend available fixed points methods to this block-coordinate setting while preserving their convergence properties Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 5/25

2/25 A roadblock The nice properties of an operator T: H H are destroyed by coordinate sampling For instance, consider the nonexpansive 1-Lipschitz) operator T: x 1, x 2 ) x 2, x 1 ) Then Q 1 : x 1, x 2 ) x 1, x 1 ) Q 2 : x 1, x 2 ) x 2, x 2 ) Q 1 Q 2 Q 2 Q 1 are no longer nonexpansive Fejér monotonicity is destroyed Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 6/25

2/25 A roadblock The nice properties of an operator T: H H are destroyed by coordinate sampling For instance, consider the nonexpansive 1-Lipschitz) operator T: x 1, x 2 ) x 2, x 1 ) Then Q 1 : x 1, x 2 ) x 1, x 1 ) Q 2 : x 1, x 2 ) x 2, x 2 ) Q 1 Q 2 Q 2 Q 1 are no longer nonexpansive Fejér monotonicity is destroyed.. and even for jointly convex functions with a unique minimizer, alternating minimizations fail, etc. Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 6/25

2/25 A roadblock The nice properties of an operator T: H H are destroyed by coordinate sampling For instance, consider the nonexpansive 1-Lipschitz) operator T: x 1, x 2 ) x 2, x 1 ) Then Q 1 : x 1, x 2 ) x 1, x 1 ) Q 2 : x 1, x 2 ) x 2, x 2 ) Q 1 Q 2 Q 2 Q 1 are no longer nonexpansive Fejér monotonicity is destroyed.. and even for jointly convex functions with a unique minimizer, alternating minimizations fail, etc. introduce stochasticity and renorming Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 6/25

Notation H: separable real Hilbert space; : weak convergence Wx n ) n N : set of weak cluster points of x n ) n N H N Γ 0 H): proper lower semicontinuous convex functions from H to ],+ ] Ω, F, P): underlying probability space Given a sequence x n ) n N of H-valued random variables, X = X n ) n N, where n N) X n = σx 0,...,x n ) l + X ): set of sequences of [0,+ [-valued random variables ξ n ) n N such that, for every n N, ξ n is X n -measurable { l p } +X )= ξ n ) n N l + X ) ξn p < + P-a.s., p ]0,+ [ l + {ξ X ) = n ) n N l + X ) n N sup n N } ξ n < + P-a.s. Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 7/25

Stochastic quasi-fejér sequences Ø F H, φ: [0,+ [ [0,+ [, φt) + as t + Deterministic definition: A sequence x n ) n N in H is Fejér monotone with respect to F if for every z F, n N) φ x n+1 z ) φ x n z ) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 8/25

Stochastic quasi-fejér sequences Ø F H, φ: [0,+ [ [0,+ [, φt) + as t + Stochastic definition 1: A sequence x n ) n N of H-valued random variables is stochastically Fejér monotone with respect to F if, for every z F, n N) Eφ x n+1 z X n ) φ x n z ) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 8/25

Stochastic quasi-fejér sequences Ø F H, φ: [0,+ [ [0,+ [, φt) + as t + Stochastic definition 2: A sequence x n ) n N of H-valued random variables is stochastically quasi-fejér monotone with respect to F if, for every z F, there exist χ n z)) n N l 1 + X ), ϑ n z)) n N l + X ), and η n z)) n N l 1 +X ) such that n N) Eφ x n+1 z ) X n )+ϑ n z) 1+χ n z))φ x n z )+η n z) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 8/25

Stochastic quasi-fejér sequences Ø F H, φ: [0,+ [ [0,+ [, φt) + as t + Stochastic definition 2: A sequence x n ) n N of H-valued random variables is stochastically quasi-fejér monotone with respect to F if, for every z F, there exist χ n z)) n N l 1 + X ), ϑ n z)) n N l + X ), and η n z)) n N l 1 +X ) such that Theorem n N) Eφ x n+1 z ) X n )+ϑ n z) 1+χ n z))φ x n z )+η n z) Suppose x n ) n N is stochastically quasi-fejér monotone. Then z F) [ n N ϑ nz) < + P-a.s. ] [Wx n ) n N F P-a.s.] [x n ) n N converges weakly P-a.s. to an F-valued random variable] Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 8/25

An abstract stochastic iterative scheme Theorem Let Ø F H. Suppose that x n ) n N, t n ) n N, and e n ) n N are sequences of H-valued random variables such that: n N) x n+1 = x n +λ n t n + e n x n ), λ n ]0, 1] n N λ n E en 2 X n ) < + P-a.s. For every z F, there exist θ n z)) n N l + X ), µ n z)) n N l + X ), and ν n z)) n N l + X ) such that λ n µ n z)) n N l 1 +X ), λ n ν n z)) n N l 1/2 + X ), and n N) E t n z 2 X n )+θ n z) 1+µ n z)) x n z 2 +ν n z) Then z F) [ n N λ nθ n z) < + P-a.s. ] and [Wx n ) n N F P-a.s.] x n ) n N converges weakly P-a.s. to an F-valued random variable. Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 9/25

Single-layer algorithm H = H 1 H m, H i ) 1 i m separable real Hilbert spaces T n : H H: x T i,n x) 1 i m quasinonexpansive F = n N Fix T n Ø x 0 and the errors a n ) n N are H-valued random variables ε n ) n N identically distributed D-valued random variables with D = {0, 1} m {0} Algorithm: for n = 0, 1,... for i = 1,...,m xi,n+1 = x i,n +ε i,n λ n Ti,n x 1,n,...,x m,n )+a i,n x i,n ) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 10/25

Single-layer algorithm Theorem Set n N) X n = σx 0,...,x n ) and E n = σε n ). Assume that inf n N λ n > 0 and sup n N λ n < 1. n N E an 2 X n ) < +. Wx n ) n N F P-a.s. For every n N, E n and X n are independent. For every i {1,...,m}, p i = P[ε i,0 = 1] > 0. Then x n ) n N converges weakly P-a.s. to an F-valued r.v. Proof. We achieve stochastic quasi-fejér monotonicity w.r.t. the norm x 2 = m i=1 x i 2 /p i Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 11/25

Example: Krasnosel skĭı Mann iteration T: H H: x T i x) 1 i m nonexpansive operator F = Fix T Ø x 0 and the errors a n ) n N are H-valued random variables ε n ) n N identically distributed D-valued random variables with D = {0, 1} m {0} Algorithm: for n = 0, 1,... for i = 1,...,m xi,n+1 = x i,n +ε i,n λ n Ti x 1,n,...,x m,n )+a i,n x i,n ) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 12/25

Example: Krasnosel skĭı Mann iteration Theorem Set n N) X n = σx 0,...,x n ) and E n = σε n ). Assume that inf n N λ n > 0 and sup n N λ n < 1 n N E an 2 X n ) < + For every n N, E n and X n are independent For every i {1,...,m}, P[ε i,0 = 1] > 0 Then x n ) n N converges weakly P-a.s. to an F-valued r.v. Proof. Apply the single-layer theorem with T n = T. Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 13/25

Example: Block-coordinate, primal-dual splitting of coupled composite monotone inclusions Let F be the set of solutions to the problem find x 1 H 1,...,x m H m such that i {1,...,m}) 0 A i x i + p L ki B m k L kj x j ), where each A i : H i 2 H i, Bk : G k 2 G k are maximally monotone, L ki : H i G k linear&bounded k=1 Let F be the set of solutions to the dual problem find v 1 G 1,...,v p G p such that m p ) k {1,...,p}) 0 L ki A 1 i L li v l i=1 l=1 j=1 + B 1 k v k Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 14/25

Example: Block-coordinate, primal-dual splitting of coupled composite monotone inclusions Let Q j 1 j m+p) be the jth component of the projector P V onto the subspace { m } V = x, y) H G k {1,...,p}) y k = L ki x i Algorithm: γ ]0,+ [ and for n = 0, 1,... µ n ]0, 2[ for i = 1,...,m zi,n+1 = z i,n +ε i,n Qi x 1,n,...,x m,n, y 1,n,...,y p,n )+c i,n z i,n ) x i,n+1 = x i,n +ε i,n µ n JγAi 2z i,n+1 x i,n )+a i,n z i,n+1 ) for k = 1,...,p wk,n+1 = w k,n +ε m+k,n Qm+k x 1,n,...,x m,n, y 1,n,...,y p,n )+d k,n w k,n y k,n+1 = y k,n +ε m+k,n µ n JγBk 2w k,n+1 y k,n )+b k,n w k,n+1 ). i=1 Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 15/25

Example: Block-coordinate, primal-dual splitting of coupled composite monotone inclusions Under the same conditions as before: z n ) n N converges weakly P-a.s. to an F-valued random variable γ 1 w n y n )) n N converges weakly P-a.s. to an F - valued random variable Proof: This relies on the single-layer theorem and a nonstandard implementation of the Douglas-Rachford algorithm in the product space H G = H 1 H m G 1 G p : solve 0, 0) Ax By+N V x, y), where A: H 2 H : x m i=1 A ix i B: G 2 G : x p { k=1b k y k V = x, y) H G k {1,...,p}) y k = } m i=1 L kix i Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 16/25

Example: Nonsmooth, block-coordinate, primal-dual multivariate minimization Let F be the set of solutions to the problem minimize x 1 H 1,...,x m H m m f i x i )+ i=1 p m ) g k L ki x i where f i Γ 0 H i ), g k Γ 0 G k ), L ki : H i G k linear&bounded k=1 Let F be the set of solutions to the dual problem minimize v 1 G 1,...,v p G p m i=1 f i k=1 i=1 p ) L ki v k + p g k v k) k=1 Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 17/25

Example: Nonsmooth, block-coordinate, primal-dual multivariate minimization Algorithm: γ ]0,+ [ and for n = 0, 1,... µ n ]0, 2[ for i = 1,...,m zi,n+1 = z i,n +ε i,n Qi x 1,n,...,x m,n, y 1,n,...,y p,n )+c i,n z i,n ) x i,n+1 = x i,n +ε i,n µ n proxγfi 2z i,n+1 x i,n )+a i,n z i,n+1 ) for k = 1,...,p wk,n+1 = w k,n +ε m+k,n Qm+k x 1,n,...,x m,n, y 1,n,...,y p,n )+d k,n w k,n y k,n+1 = y k,n +ε m+k,n µ n proxγgk 2w k,n+1 y k,n )+b k,n w k,n+1 ). Under the same conditions as before: z n ) n N converges weakly P-a.s. to an F-valued random variable γ 1 w n y n )) n N converges weakly P-a.s. to an F -valued random variable Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 18/25

Double-layer random block-coordinate algorithms T n : H H: x T i,n x) 1 i m is α n -averaged Id+α 1 n T n Id) is nonexpansive) R n : H H is β n -averaged F = n N Fix T n R n Ø x 0, a n ) n N, and b n ) n N are H-valued random variables ε n ) n N identically distributed D-valued random variables with D = {0, 1} m {0} Algorithm: for n = 0, 1,... y n = R n x n + b n for i = 1,...,m ) xi,n+1 = x i,n +ε i,n Ti,n y n + a i,n x i,n Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 19/25

Double-layer random block-coordinate algorithms Theorem Set n N) X n = σx 0,...,x n ) and E n = σε n ). Assume that n N E an 2 X n ) < +, n N E bn 2 X n ) < + Wx n ) n N F P-a.s. For every n N, E n and X n are independent For every i {1,...,m}, p i = P[ε i,0 = 1] > 0 Then x n ) n N converges weakly P-a.s. to an F-valued r.v. Proof. We achieve stochastic quasi-fejér monotonicity w.r.t. the norm x 2 = m i=1 x i 2 /p i Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 20/25

Block-coordinate forward-backward splitting The forward-backward splitting algorithm is important because: It models many problems of interest and tolerates errors: PLC and Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., vol. 4, 2005. Applied to the dual problem of a strongly monotone/convex composite problem, it provides a primal-dual algorithm: PLC, Dung, and Vũ, Dualization of signal recovery problems, Set-Valued Var. Anal., vol. 18, 2010. Applied in a renormed product space, it covers/extends various methods e.g., Chambolle-Pock): Vũ, A splitting algorithm for dual monotone inclusions involving cocoercive operators, Adv. Comput. Math., vol. 38, 2013. It can be implemented with variable metrics: PLC and Vũ, Variable metric forward-backward splitting with applications to monotone inclusions in duality, Optimization, vol. 63, 2014. In minimization problems it provides Fejér-monotonicity, convergent sequences, and monotone minimizing sequences. Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 21/25

Block-coordinate forward-backward splitting Let F be the set of solutions to the problem find x 1 H 1,...,x m H m such that i {1,...,m}) 0 A i x i + B i x 1,...,x m ) where A i : H i 2 H i is maximally monotone and, for some ϑ > 0, m m x H) y H) x i y i B i x B i y ϑ B i x B i y 2 Algorithm: i=1 for n = 0, 1,... ε γ n 2 ε)ϑ for i = 1,...,m xi,n+1 = x i,n +ε i,n JγnA i xi,n γ n B i x 1,n,...,x m,n )+c i,n ) ) ) +a i,n x i,n i=1 Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 22/25

Block-coordinate forward-backward splitting Under the same conditions as before almost sure weak convergence of x n ) n N to a point in F is achieved. Proof: In double-layer theorem, set A: H 2 H : x m i=1 A ix i B: H H: x B i x) 1 i m T n = J γna R n = Id γ n B, F = zera+b) b n = γ n c n α n = 1/2 β n = γ n /2ϑ) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 23/25

Block-coordinate forward-backward splitting: convex minimization Let F be the set of solutions to the problem minimize x 1 H 1,...,x m H m m p m ) f i x i )+ g k L ki x i i=1 where f i Γ 0 H i ), g k : G k R differentiable, convex, g k τ k -Lipschitz, L ki : H i G k linear&bounded Let Algorithm: k=1 i=1 k=1 p m ϑ = τ k L ki L ki ) 1 for n = 0, 1,... for i = 1,...,m p ri,n = ε i,n xi,n γ n k=1 L ki g m k j=1 L ) )) kjx j,n + ci,n ) x i,n+1 = x i,n +ε i,n λ n proxγnf i r i,n + a i,n x i,n. i=1 Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 24/25

References PLC and J.-C. Pesquet, Stochastic quasi-fejér block-coordinate fixed point iterations with random sweeping, SIAM J. Optim., 2015 arxiv, April 2014) H. H. Bauschke and PLC, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, 2011. 2nd ed., Fall 2015) Patrick L. Combettes joint work with J.-C. Pesquet) Block-coordinate splitting algorithms 25/25