Convergence of Fixed-Point Iterations

Instructor: Wotao Yin (UCLA Math)
July 2016

Why study fixed-point iterations?
- They abstract many existing algorithms in optimization, numerical linear algebra, and differential equations
- They often require only minimal conditions
- They simplify complicated convergence proofs

Notation
- space: a Hilbert space H equipped with ⟨·, ·⟩ and ‖·‖
- it is fine to think in R² (though not always)
- an operator T : H → H (or C → C, where C is a closed subset of H)
- our focus: when Fix T := {x ∈ H : x = T(x)} is nonempty, the convergence of x^{k+1} ← T(x^k)
- simplification: T(x) is often written as Tx

Examples
Unconstrained C¹ minimization: minimize f(x)
- x* is a stationary point if ∇f(x*) = 0
- gradient-descent operator: for γ > 0, T := I − γ∇f
- the gradient-descent iteration: x^{k+1} ← Tx^k
- lemma: x* is a stationary point if, and only if, x* ∈ Fix T
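The lemma can be checked numerically. Below is a minimal sketch (my own illustration, not from the slides) that runs the gradient-descent operator T = I − γ∇f as a fixed-point iteration on a simple quadratic; the matrix, vector, and step size are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative example: f(x) = 0.5 * x^T A x - b^T x,
# so grad f(x) = A x - b and the stationary point solves A x* = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad_f(x):
    return A @ x - b

gamma = 0.2                             # step size, small enough for convergence
T = lambda x: x - gamma * grad_f(x)     # gradient-descent operator T = I - gamma * grad f

x = np.zeros(2)
for k in range(200):
    x = T(x)                            # fixed-point iteration x^{k+1} <- T x^k

x_star = np.linalg.solve(A, b)          # true stationary point
print(np.allclose(x, x_star), np.linalg.norm(grad_f(x)))  # x in Fix T  <=>  grad f(x) = 0
```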

Examples
Constrained C¹ minimization: minimize f(x) subject to x ∈ C
- assume: f is proper closed convex, C is nonempty closed convex
- projected-gradient operator: for γ > 0, T := proj_C ∘ (I − γ∇f)
- x^{k+1} ← Tx^k is the projected-gradient iteration x^{k+1} ← proj_C(x^k − γ∇f(x^k))
- x* is optimal if ⟨∇f(x*), x − x*⟩ ≥ 0 for all x ∈ C
- lemma: x* is optimal if, and only if, x* ∈ Fix T
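As an illustration (my own sketch, not part of the slides), the projected-gradient operator for a box constraint C = [0, 1]² can be written directly, since proj_C is a componentwise clip:

```python
import numpy as np

# Illustrative sketch: minimize f(x) = 0.5*||x - c||^2 subject to x in C = [0, 1]^2.
c = np.array([2.0, -0.5])                   # unconstrained minimizer, outside the box
grad_f = lambda x: x - c
proj_C = lambda x: np.clip(x, 0.0, 1.0)     # projection onto the box [0, 1]^2

gamma = 0.5
T = lambda x: proj_C(x - gamma * grad_f(x)) # T = proj_C o (I - gamma * grad f)

x = np.array([0.5, 0.5])
for k in range(100):
    x = T(x)

print(x)                      # constrained minimizer: [1.0, 0.0]
print(np.allclose(x, T(x)))   # x is (numerically) a fixed point of T
```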

Lipschitz operator
- definition: an operator T is L-Lipschitz, L ∈ [0, ∞), if ‖Tx − Ty‖ ≤ L‖x − y‖ for all x, y ∈ H
- definition: an operator T is L-quasi-Lipschitz, L ∈ [0, ∞), if for any x* ∈ Fix T (assumed to exist), ‖Tx − x*‖ ≤ L‖x − x*‖ for all x ∈ H

Contractive operator
- definition: T is contractive if it is L-Lipschitz for some L ∈ [0, 1)
- definition: T is quasi-contractive if it is L-quasi-Lipschitz for some L ∈ [0, 1)

Banach fixed-point theorem
Theorem: If T is contractive, then
- T admits a unique fixed point x* (existence and uniqueness)
- x^k → x* (convergence)
- ‖x^k − x*‖ ≤ L^k ‖x^0 − x*‖ (speed)
Holds in a Banach space. Closely related to the Picard–Lindelöf theorem for ODEs.
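A minimal numerical illustration of the theorem (my own example, not from the slides): T(x) = cos(x) maps [0, 1] into itself and is contractive there with L = sin(1) ≈ 0.84, so the error should decay at least geometrically.

```python
import math

# Illustrative contraction: T(x) = cos(x) on [0, 1], L = sin(1) < 1,
# so Banach's theorem gives a unique fixed point and geometric convergence.
T = math.cos
L = math.sin(1.0)

x_star = 1.0
for _ in range(200):          # iterate long enough to obtain the fixed point
    x_star = T(x_star)

x0 = 1.0
x = x0
for k in range(1, 8):
    x = T(x)
    bound = L**k * abs(x0 - x_star)
    print(f"k={k}  error={abs(x - x_star):.2e}  <=  L^k * |x0 - x*| = {bound:.2e}")
```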

Examples
Minimize a Lipschitz-differentiable, strongly convex function: minimize f(x)
- definition: a convex f is L-Lipschitz-differentiable if ‖∇f(x) − ∇f(y)‖² ≤ L⟨x − y, ∇f(x) − ∇f(y)⟩ for all x, y ∈ dom f
- definition: a differentiable f is µ-strongly convex if ⟨∇f(x) − ∇f(y), x − y⟩ ≥ µ‖x − y‖² for all x, y ∈ dom f
- lemma: the gradient-descent operator T := I − γ∇f is C-contractive for all γ in a certain interval
- exercise: find the interval of γ and the formula of C in terms of γ, L, µ
- also true for a projected-gradient operator if C is closed convex and C ⊆ dom f
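For a quadratic f(x) = ½xᵀAx with µI ⪯ A ⪯ LI, the operator T = I − γA is linear and its Lipschitz constant is max(|1 − γµ|, |1 − γL|). The sketch below (my own check, not the slides' solution of the exercise) verifies this numerically, which is one way to approach the exercise.

```python
import numpy as np

# Illustrative check: for f(x) = 0.5 x^T A x with mu*I <= A <= L*I,
# the Lipschitz constant of T = I - gamma*A is the spectral norm of I - gamma*A,
# i.e. max(|1 - gamma*mu|, |1 - gamma*L|).
A = np.diag([1.0, 4.0])          # mu = 1, L = 4
mu, L = 1.0, 4.0

for gamma in [0.1, 0.25, 0.4, 0.5, 0.6]:
    M = np.eye(2) - gamma * A                      # the linear operator T
    lip = np.linalg.norm(M, 2)                     # operator (spectral) norm
    formula = max(abs(1 - gamma * mu), abs(1 - gamma * L))
    print(f"gamma={gamma}:  Lipschitz={lip:.3f}  formula={formula:.3f}  contractive={lip < 1}")
```

For this example the iteration is contractive exactly when γ ∈ (0, 2/L).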

Nonexpansive operator
- definition: an operator T is nonexpansive if it is 1-Lipschitz, i.e., ‖Tx − Ty‖ ≤ ‖x − y‖ for all x, y ∈ H
- properties:
  - T may not have a fixed point x*
  - if x* exists, the iteration x^{k+1} = Tx^k is bounded but may diverge
- examples: rotation, alternating reflection
[figure: the iterates x, Tx, T²x, T³x of a rotation by angle θ]
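A quick numerical illustration of the rotation example (my own sketch): the iterates stay bounded (their norm is constant) but never converge, even though the fixed point 0 exists.

```python
import numpy as np

# Rotation by angle theta about the origin: nonexpansive (an isometry),
# with Fix T = {0}, but x^{k+1} = T x^k does not converge when x^0 != 0.
theta = 0.5
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])
for k in range(6):
    print(f"k={k}  x={x}  ||x||={np.linalg.norm(x):.3f}")
    x = T @ x        # the norm stays 1: bounded, but the sequence keeps rotating
```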

Between L = 1 and L < 1
- L < 1: linear (or geometric) convergence
- L = 1: bounded, may diverge
- A vast set of algorithms (often with sublinear convergence) cannot be characterized by L alone:
  - alternating projection (von Neumann)
  - gradient descent without strong convexity
  - proximal-point algorithm without strong convexity
  - operator splitting algorithms

Averaged operator
- fixed-point residual operator: R := I − T; note Rx = 0 ⟺ x = Tx
- averaged operator: for some η > 0, ‖Tx − Ty‖² ≤ ‖x − y‖² − η‖Rx − Ry‖² for all x, y ∈ H
- quasi-averaged operator: for some η > 0 and any x* ∈ Fix T, ‖Tx − x*‖² ≤ ‖x − x*‖² − η‖Rx‖² for all x ∈ H
- interpretation: each step improves by the amount of fixed-point violation
- speed: may become slower as x^k gets closer to the minimizer

Convention: use α instead of η
- following η > 0, take α ∈ (0, 1) and η := (1 − α)/α
- α-averaged operator: ‖Tx − Ty‖² ≤ ‖x − y‖² − ((1 − α)/α)‖Rx − Ry‖² for all x, y ∈ H
- special cases:
  - α = 1/2: T is called firmly nonexpansive
  - α = 1 (violating α ∈ (0, 1)): T is called nonexpansive

Why called "averaged"?
Lemma. T is α-averaged if, and only if, there exists a nonexpansive map T′ so that T = (1 − α)I + αT′; or equivalently, T′ := (1 − 1/α)I + (1/α)T is nonexpansive.
Proof. From T′ := (1 − 1/α)I + (1/α)T = I − (1/α)R, basic algebraic manipulation gives, for any x and y,
  α(‖x − y‖² − ‖T′x − T′y‖²) = ‖x − y‖² − ‖Tx − Ty‖² − ((1 − α)/α)‖Rx − Ry‖².
Therefore, T′ is nonexpansive ⟺ T is α-averaged.
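To make the lemma concrete (my own sketch, not from the slides): reflection across a line is nonexpansive, and averaging it with the identity at α = 1/2 gives the projection onto that line, which then satisfies the α-averaged inequality.

```python
import numpy as np

# Illustrative sketch: T' = reflection across the x-axis (nonexpansive).
# Averaging with the identity, T = (1 - alpha)*I + alpha*T', gives an
# alpha-averaged operator; for alpha = 1/2 it is the projection onto the x-axis.
Tprime = np.diag([1.0, -1.0])          # reflection: nonexpansive
alpha = 0.5
I = np.eye(2)
T = (1 - alpha) * I + alpha * Tprime   # = diag(1, 0): projection onto the x-axis
R = I - T                              # fixed-point residual operator

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):                  # check the alpha-averaged inequality on random pairs
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    lhs = np.linalg.norm(T @ x - T @ y)**2
    rhs = np.linalg.norm(x - y)**2 - (1 - alpha) / alpha * np.linalg.norm(R @ x - R @ y)**2
    ok &= lhs <= rhs + 1e-12
print(ok)   # True: T is 1/2-averaged (firmly nonexpansive)
```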

Properties
Assume: T is α-averaged and T has a fixed point x*. Iteration: x^{k+1} ← Tx^k.
Claims about the iteration:
(a) step by step, ‖x^{k+1} − x*‖² ≤ ‖x^k − x*‖² − ((1 − α)/α)‖Rx^k‖²   (using Rx* = 0)
(b) by telescoping (a), ‖x^{k+1} − x*‖² ≤ ‖x^0 − x*‖² − ((1 − α)/α) Σ_{j=0}^{k} ‖Rx^j‖²
(c) {‖Rx^k‖²} is summable and ‖Rx^k‖ → 0

Claims (cont.):
(d) by (a), {‖x^k − x*‖²} is monotonically decreasing until x^k ∈ Fix T
(e) by (d), lim_k ‖x^k − x*‖² exists (but is not necessarily zero)
(f) by (d), {x^k} is bounded and thus has a weak cluster point x̄ (note: bounded sets in H are weakly sequentially compact)
Next: we will show that x̄ ∈ Fix T and then x^k ⇀ x̄.

Claims (cont.):
(h) demiclosedness principle: let T be nonexpansive and R := I − T. If x^j ⇀ x̄ and lim_j Rx^j = 0, then Rx̄ = 0.
Proof. The goal is to expand ‖Rx̄‖² into terms that converge as j → ∞:
  ‖Rx̄‖² = ‖Rx^j‖² + 2⟨Rx^j, Tx^j − Tx̄⟩ + ‖Tx^j − Tx̄‖² − ‖x^j − x̄‖² − 2⟨Rx̄, x^j − x̄⟩
         ≤ ‖Rx^j‖² + 2⟨Rx^j, Tx^j − Tx̄⟩ − 2⟨Rx̄, x^j − x̄⟩.
Each term on the RHS → 0 as j → ∞. Therefore, ‖Rx̄‖² = 0.
(i) by applying (h) to any weakly converging subsequence, each weak cluster point x̄ of {x^k} is a fixed point.

Claims (cont.):
(j) by (e) and (i), x̄ is the unique weak cluster point.
Proof. Let ȳ also be a cluster point; then ȳ ∈ Fix T, just like x̄.
- by (e), both lim_k ‖x^k − x̄‖² and lim_k ‖x^k − ȳ‖² exist
- algebraically, 2⟨x^k, x̄ − ȳ⟩ = ‖x^k − ȳ‖² − ‖x^k − x̄‖² + ‖x̄‖² − ‖ȳ‖², whose RHS converges to a constant, say c
- passing to the limits along the two subsequences, weakly to x̄ and to ȳ: 2⟨x̄, x̄ − ȳ⟩ = 2⟨ȳ, x̄ − ȳ⟩ = c
- hence, ‖x̄ − ȳ‖² = 0.

Theorem (Krasnosel'skiĭ)
Let T be an averaged operator with a fixed point. Then the iteration x^{k+1} ← Tx^k converges weakly to a fixed point of T.

Mann's version
Let T be a nonexpansive operator with a fixed point. Then the iteration
  x^{k+1} ← (1 − λ_k)x^k + λ_k Tx^k
(known as the KM iteration) converges weakly to a fixed point of T as long as λ_k > 0 and Σ_k λ_k(1 − λ_k) = ∞.
The λ_k condition is ensured if λ_k ∈ [ɛ, 1 − ɛ] (bounded away from 0 and 1).
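The contrast with the unrelaxed iteration can be seen on the rotation example in a short sketch (my own, not from the slides): the plain iterates of a rotation never converge, but the KM iterates with λ_k ≡ 1/2 converge to the fixed point 0.

```python
import numpy as np

# Illustrative sketch: T = rotation by theta (nonexpansive, Fix T = {0}).
# The plain iteration x <- T x does not converge, but the KM iteration
# x <- (1 - lam)*x + lam*T x with lam in (0, 1) converges (weak = strong in R^2).
theta = 0.5
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
lam = 0.5

x_plain = np.array([1.0, 0.0])
x_km = np.array([1.0, 0.0])
for k in range(1000):
    x_plain = T @ x_plain
    x_km = (1 - lam) * x_km + lam * (T @ x_km)

print(np.linalg.norm(x_plain))   # stays 1: no convergence
print(np.linalg.norm(x_km))      # tends to 0: converges to the fixed point
```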

Remarks
- Can be relaxed to quasi-averagedness
- Summable errors can be added to the iteration
- In finite dimensions, the demiclosedness principle is not needed
- This fundamental result is largely ignored, yet often reproved, in R^n
- Browder–Göhde–Kirk fixed-point theorem: if T has no fixed point and λ_k is bounded away from 0 and 1, then the sequence {x^k} is unbounded
- Speed: ‖Rx^k‖² = o(1/k); no rate for ‖x^k − x*‖
- Many more applications than Banach's fixed-point theorem

Special cases
Proximal-point algorithm
- problem: minimize f(x)
- proximal operator: let λ > 0, T := prox_{λf}
- since T is firmly nonexpansive, x^{k+1} ← prox_{λf}(x^k) converges weakly to a minimizer of f, if one exists
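A concrete instance (my own sketch, not from the slides): for f(x) = |x| in one dimension, the proximal operator is soft-thresholding, and the proximal-point iteration reaches the minimizer 0 after finitely many steps.

```python
import numpy as np

# Illustrative sketch: f(x) = |x| (1-D), whose proximal operator is
# soft-thresholding: prox_{lam*f}(x) = sign(x) * max(|x| - lam, 0).
lam = 0.3
prox = lambda x: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = 2.0
for k in range(10):
    x = prox(x)          # proximal-point iteration x^{k+1} <- prox_{lam*f}(x^k)
    print(f"k={k}  x={x:.2f}")
# x decreases by lam each step and reaches the minimizer x* = 0 after a few steps
```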

Special cases
Gradient descent
- define the gradient-descent operator T := I − γ∇f
- iteration: x^{k+1} ← Tx^k = x^k − γ∇f(x^k)
- Baillon–Haddad theorem: if f is convex and ∇f is L-Lipschitz, then ‖∇f(x) − ∇f(y)‖² ≤ L⟨x − y, ∇f(x) − ∇f(y)⟩
- if f has a minimizer x*, then (2/(Lγ))‖γ∇f(x^k)‖² ≤ 2⟨x^k − x*, γ∇f(x^k)⟩

Directly expand ‖x^{k+1} − x*‖²:
  ‖x^{k+1} − x*‖² = ‖x^k − γ∇f(x^k) − x*‖²
                  = ‖x^k − x*‖² − 2⟨x^k − x*, γ∇f(x^k)⟩ + ‖γ∇f(x^k)‖²
                  ≤ ‖x^k − x*‖² − (2/(Lγ) − 1)‖γ∇f(x^k)‖².
Therefore, T is quasi-averaged if γ ∈ (0, 2/L). In fact, it is easy to show that T is averaged. The convergence result applies to gradient descent.

Composition of operators
- if T_1, ..., T_m : H → H are nonexpansive, then T_1 ∘ ⋯ ∘ T_m is nonexpansive
- if T_1, ..., T_m : H → H are averaged, then T_1 ∘ ⋯ ∘ T_m is averaged
- the averagedness constants get worse: let T_i be α_i-averaged (allowing α_i = 1); then T = T_1 ∘ ⋯ ∘ T_m is α-averaged with
    α = m / (m − 1 + 1/max_i α_i)
- in addition, if any T_i is contractive, then T_1 ∘ ⋯ ∘ T_m is contractive
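A small sketch of the constant (my own helper, not from the slides): for m = 2 firmly nonexpansive operators (α_1 = α_2 = 1/2), the formula gives α = 2/3 for the composition.

```python
# Illustrative helper: averagedness constant of a composition T_1 o ... o T_m,
# where T_i is alpha_i-averaged, using alpha = m / (m - 1 + 1/max_i alpha_i).
def composed_alpha(alphas):
    m = len(alphas)
    return m / (m - 1 + 1.0 / max(alphas))

print(composed_alpha([0.5, 0.5]))        # two firmly nonexpansive maps -> 2/3
print(composed_alpha([0.5, 0.5, 0.5]))   # three -> 3/4
print(composed_alpha([1.0, 0.5]))        # composing with a merely nonexpansive map -> 1.0
```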

Special cases
Projected-gradient method
- convex problem: minimize_x f(x) subject to x ∈ C
- assume sufficient intersection between dom f and C
- define: T := proj_C ∘ (I − λ∇f)
- assume ∇f is L-Lipschitz, and let λ ∈ (0, 2/L)
- since both proj_C and (I − λ∇f) are averaged, T is averaged
- therefore, the following sequence weakly converges to a minimizer, if one exists:
    x^{k+1} ← Tx^k = proj_C(x^k − λ∇f(x^k))

Special cases
Prox-gradient method
- convex problem: minimize_x f(x) + h(x)
- assume sufficient intersection between dom f and dom h
- define: T := prox_{λh} ∘ (I − λ∇f)
- assume ∇f is L-Lipschitz, and let λ ∈ (0, 2/L)
- since both prox_{λh} and (I − λ∇f) are averaged, T is averaged
- therefore, the following sequence weakly converges to a minimizer, if one exists:
    x^{k+1} ← Tx^k = prox_{λh}(x^k − λ∇f(x^k))
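A minimal sketch of this iteration (my own, not from the slides) for the standard example f(x) = ½‖Ax − b‖², h(x) = µ‖x‖₁, where prox_{λh} is soft-thresholding; the data are random and illustrative.

```python
import numpy as np

# Illustrative sketch: prox-gradient (ISTA) for the lasso-type problem
#   minimize  0.5*||A x - b||^2 + mu*||x||_1,
# with grad f(x) = A^T (A x - b), L = ||A^T A||_2,
# and prox_{lam*h} = soft-thresholding with threshold lam*mu.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
mu = 1.0

L = np.linalg.norm(A.T @ A, 2)
lam = 1.0 / L                                  # step size in (0, 2/L)
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(5)
for k in range(500):
    grad = A.T @ (A @ x - b)
    x = soft(x - lam * grad, lam * mu)         # x^{k+1} <- prox_{lam*h}(x^k - lam*grad f(x^k))

print(x)                                       # approximate minimizer (typically sparse)
```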

Special cases
Later in this course, we will see more special cases:
- forward-backward iteration
- Douglas-Rachford and Peaceman-Rachford iterations
- ADMM
- Tseng's forward-backward-forward iteration
- Davis-Yin iteration
- primal-dual iterations

Summary
- Fixed-point iterations and their analysis are powerful tools
- Contractive T: a fixed point exists and is unique; the iteration converges strongly
- Nonexpansive T: the iteration is bounded, if a fixed point exists
- Averaged T: the iteration converges weakly, if a fixed point exists
- More power: closedness under composition