Convergence rate analysis for averaged fixed point iterations in the presence of Hölder regularity


Jonathan M. Borwein, Guoyin Li, Matthew K. Tam

October 23, 2015

Abstract. In this paper, we establish sublinear and linear convergence of fixed point iterations generated by averaged operators in a Hilbert space. Our results are achieved under a bounded Hölder regularity assumption which generalizes the well-known notion of bounded linear regularity. As an application of our results, we provide a convergence rate analysis for Krasnoselskii–Mann iterations, the cyclic projection algorithm, and the Douglas–Rachford feasibility algorithm along with some variants. In the important case in which the underlying sets are convex sets described by convex polynomials in a finite dimensional space, we show that the Hölder regularity properties are automatically satisfied, from which sublinear convergence follows.

MSC(2010). Primary 41A25, 90C25; Secondary 41A50, 90C31.

Keywords. averaged operator; fixed point iteration; convergence rate; Hölder regularity; semi-algebraic; Douglas–Rachford algorithm.

1 Introduction

Consider the problem of finding a point in the intersection of a finite family of closed convex subsets of a Hilbert space. This problem, often referred to as the convex feasibility problem, frequently arises throughout diverse areas of mathematics, science and engineering. For details, we refer the reader to the surveys [5, 20], the monographs [6, 23], any of [, 3, 9], and the references therein.

One approach to solving convex feasibility problems involves designing a nonexpansive operator whose fixed point set can be used to easily produce a point in the target intersection (in the simplest case, the fixed point set coincides with the target intersection). The operator's fixed point iteration can then be used as the basis of an iterative algorithm which, in the limit, yields a desired solution. An important class of such methods is the so-called projection and reflection methods, which employ various combinations of projection and reflection operations with respect to the underlying constraint sets. Notable methods of this kind include the alternating projection algorithm [4, 5, 24] and the Douglas–Rachford (DR) algorithm [33, 34, 39], along with many extensions and variants [8, 7, 8, 43]. Even in settings without convexity [3, 4, 37, 38], such methods remain a popular choice, due largely to their simplicity, ease of implementation and relatively (often surprisingly) good performance.

CARMA, Univ. of Newcastle, Callaghan, NSW 2308, Australia. E-mail: jonathan.borwein@newcastle.edu.au.
Dept. of Applied Maths, Univ. of New South Wales, Sydney, NSW 2052, Australia. E-mail: g.li@unsw.edu.au.
CARMA, Univ. of Newcastle, Callaghan, NSW 2308, Australia. E-mail: matthew.tam@uon.edu.au.

The origins of the Douglas–Rachford (DR) algorithm can be traced to [33], where it was used to solve problems arising in nonlinear heat flow. In its full generality, the method finds zeros of the sum of two maximal monotone operators. Weak convergence of the scheme was originally proven by Lions and Mercier [39], and the result was recently strengthened by Svaiter [4]. Specialized to feasibility problems, Svaiter's result implies that the iterates generated by the DR algorithm are always weakly convergent, and that the shadow sequence converges weakly to a point in the intersection of the two closed convex sets. The scheme has also been examined in [34], where its relationship with another popular method, the proximal point algorithm, was revealed and explained.

Motivated by the computational observation that the Douglas–Rachford algorithm sometimes outperforms other projection methods, many researchers have studied the actual convergence rate of the algorithm in the convex case. By convergence rate, we mean how fast the sequences generated by the algorithm converge to their limit points. For the Douglas–Rachford algorithm, the first such result, which appeared in [36] and was later extended by [9], showed that the algorithm converges linearly whenever the two constraint sets are closed subspaces with a closed sum, and, further, that the rate is governed exactly by the cosine of the Friedrichs angle between the subspaces. When the sum of the two subspaces is not closed, convergence of the method, while still assured, need not be linear [9, Sec. 6]. For most projection methods, it is typical that there exist instances in which the rate of convergence is arbitrarily slow and not even sublinear or arithmetic [, 2]. Most recently, a preprint of Davis and Yin shows that the Douglas–Rachford method may indeed converge arbitrarily slowly in infinite dimensions [22, Th. 9].

In potentially nonconvex settings, a number of recent works [35, 36, 40] have established local linear convergence rates for the DR algorithm under commonly used constraint qualifications. When specialized to the convex case, these results state that the DR algorithm exhibits locally linear convergence for convex feasibility problems in a finite dimensional space whenever the relative interiors of the two convex sets have a non-empty intersection. On the other hand, when such a regularity condition is not satisfied, the DR algorithm can fail to exhibit linear convergence, even in simple two dimensional cases, as observed in [10, Ex. 5.4(iii)] (see Section 6 for further examples and discussion). This therefore calls for further research aimed at answering the question: Can the global convergence rate of the DR algorithm and its variants be established or estimated for some reasonable class of convex sets without the above mentioned regularity condition?

The goal of this paper is to provide some partial answers to the above question, as well as simple tools for establishing sublinear and linear convergence of the Douglas–Rachford algorithm and variants. Our analysis is performed within the more general setting of fixed point iterations described by averaged nonexpansive operators. We pay special attention to the case in which the underlying sets are convex basic semi-algebraic sets in a finite dimensional space. Such sets comprise a broad sub-class of convex sets that we shall show satisfy Hölder regularity properties without requiring any further assumptions; they capture all polyhedra and all convex sets described by convex quadratic functions. Furthermore, convex basic semi-algebraic structure can often be relatively easily identified.

1.1 Content and structure of the paper

The detailed contributions of this paper are summarized as follows:

(I) We first examine an abstract algorithm which we refer to as the quasi-cyclic algorithm. This algorithm covers many iterative fixed-point methods including various Krasnoselskii–Mann iterations, the cyclic projection algorithm, and Douglas–Rachford feasibility algorithms. We show that the norm of the successive change in iterates of the quasi-cyclic algorithm converges to zero with order at least o(1/√t) (Proposition 3.2). This is a quantitative form of asymptotic regularity [5]. In the presence of so-called bounded Hölder regularity properties, sublinear and linear convergence of the quasi-cyclic algorithm is then established (Theorem 3.6).

(II) We next specialise our results regarding the quasi-cyclic algorithm, in particular, to the Douglas–Rachford algorithm and its variants (Section 4). We show that the results apply, for instance, to the important case of feasibility problems for which the underlying sets are convex basic semi-algebraic sets in a finite dimensional space.

(III) We also examine a damped variant of the Douglas–Rachford algorithm. In the case where again the underlying sets are convex basic semi-algebraic sets in a finite dimensional space, we obtain a more explicit estimate of the sublinear convergence rate in terms of the dimension of the underlying space and the maximum degree of the polynomials involved (Theorem 5.5).

The remainder of the paper is organized as follows: in Section 2 we recall definitions and key facts used in our analysis. In Section 3 we investigate the rate of convergence of the quasi-cyclic algorithm, which encompasses an averaged fixed point iteration, in the presence of Hölder regularity. In Section 4 we specialize these results to the classical Douglas–Rachford algorithm and its cyclic variants. In Section 5 we consider a damped version of the Douglas–Rachford algorithm. In Section 6 we establish explicit convergence rates for two illustrative problems. We conclude the paper in Section 7 and mention some possible future research directions.

2 Preliminaries

Throughout this paper our setting is a (real) Hilbert space H with inner product ⟨·,·⟩. The induced norm is defined by ‖x‖ := √⟨x, x⟩ for all x ∈ H. Given a closed convex subset A of H, the (nearest point) projection operator is the operator P_A : H → A given by

    P_A(x) = argmin_{a ∈ A} ‖x − a‖.

Let us now recall various definitions and facts used throughout this work, beginning with the notion of Fejér monotonicity.

Definition 2.1 (Fejér monotonicity). Let A be a non-empty convex subset of a Hilbert space H. A sequence (x_k)_{k∈N} in H is Fejér monotone with respect to A if, for all a ∈ A, we have

    ‖x_{k+1} − a‖ ≤ ‖x_k − a‖   for all k ∈ N.

Fact 2.2 (Shadows of Fejér monotone sequences [5, Th. 5.7(iv)]). Let A be a non-empty closed convex subset of a Hilbert space H and let (x_k)_{k∈N} be Fejér monotone with respect to A. Then P_A(x_k) → x̄, in norm, for some x̄ ∈ A.

Fact 2.3 (Fejér monotone convergence [4, Th. 3.3(iv)]). Let A be a non-empty closed convex subset of a Hilbert space H and let (x_k)_{k∈N} be Fejér monotone with respect to A with x_k → x̄ ∈ A, in norm. Then ‖x_k − x̄‖ ≤ 2 dist(x_k, A).
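The projections and the Fejér monotonicity property above are easy to experiment with numerically. The following minimal sketch (Python with NumPy; the two sets, the starting point and the reference point are illustrative assumptions, not taken from the paper) implements P_A for a halfspace and for a closed ball and checks that an alternating projection sequence is Fejér monotone with respect to a point of the intersection.

import numpy as np

def proj_halfspace(x, a, b):
    # Projection onto the halfspace {y : <a, y> <= b}.
    viol = float(np.dot(a, x) - b)
    return x if viol <= 0 else x - viol * a / float(np.dot(a, a))

def proj_ball(x, center, radius):
    # Projection onto the closed ball B(center, radius).
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

# A = {y : y_1 <= 0}, B = B((-1, 0), 1); the origin belongs to A ∩ B.
a, b = np.array([1.0, 0.0]), 0.0
center, radius = np.array([-1.0, 0.0]), 1.0
x = np.array([3.0, 2.0])
ref = np.zeros(2)                 # a point of A ∩ B
dists = []
for _ in range(50):
    x = proj_ball(proj_halfspace(x, a, b), center, radius)
    dists.append(np.linalg.norm(x - ref))
# Fejér monotonicity with respect to A ∩ B: distances to ref are nonincreasing.
assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(dists, dists[1:]))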

We now turn our attention to a Hölder regularity property for (usually finite) collections of sets.

Definition 2.4 (Bounded Hölder regular intersection). Let {C_j}_{j∈J} be a collection of closed convex subsets in a Hilbert space H with non-empty intersection. The collection {C_j}_{j∈J} has a bounded Hölder regular intersection if, for each bounded set K, there exist an exponent γ ∈ (0, 1] and a scalar β > 0 such that

    dist(x, ∩_{j∈J} C_j) ≤ β ( max_{j∈J} dist(x, C_j) )^γ   for all x ∈ K.

Furthermore, if the exponent γ does not depend on the set K, we say the collection {C_j}_{j∈J} is bounded Hölder regular with exponent γ (i.e., we explicitly specify the exponent).

It is clear, from Definition 2.4, that any collection containing only a single set trivially has a bounded Hölder regular intersection with exponent γ = 1. More generally, Definition 2.4 with γ = 1 is well studied in the literature where it appears, amongst other names, as bounded linear regularity [5]. For a recent study, the reader is referred to [29, Remark 7]. The local counterpart to Definition 2.4 has been characterized in [29, Th. 1] under the name of metric [γ]-subregularity.

We next turn our attention to a nonexpansivity notion for operators.

Definition 2.5. An operator T : H → H is:

(a) nonexpansive if, for all x, y ∈ H,
    ‖T(x) − T(y)‖ ≤ ‖x − y‖;

(b) firmly nonexpansive if, for all x, y ∈ H,
    ‖T(x) − T(y)‖² + ‖(I − T)(x) − (I − T)(y)‖² ≤ ‖x − y‖²;

(c) α-averaged for some α ∈ (0, 1), if there exists a nonexpansive mapping R : H → H such that T = (1 − α)I + αR.

The class of firmly nonexpansive mappings comprises precisely the 1/2-averaged mappings, and any α-averaged operator is nonexpansive [6, Ch. 4]. The following fact provides a characterization of averaged maps that is useful for our purposes.

Fact 2.6 (Characterization of averaged maps [6, Prop. 4.25(iii)]). Let T : H → H be an α-averaged operator on a Hilbert space with α ∈ (0, 1). Then, for all x, y ∈ H,

    ‖T(x) − T(y)‖² + ((1 − α)/α) ‖(I − T)(x) − (I − T)(y)‖² ≤ ‖x − y‖².

Denote the set of fixed points of an operator T : H → H by Fix T = {x ∈ H : T(x) = x}. The following definition is of a Hölder regularity property for operators.

Definition 2.7 (Bounded Hölder regular operators). Let D be a subset of a Hilbert space H. An operator T : H → H is bounded Hölder regular if, for each bounded set K ⊆ H, there exist an exponent γ ∈ (0, 1] and a scalar µ > 0 such that

    dist(x, Fix T) ≤ µ ‖x − T(x)‖^γ   for all x ∈ K.

Furthermore, if the exponent γ does not depend on the set K, we say that T is bounded Hölder regular with exponent γ (i.e., we explicitly specify the exponent).

Note that, in the case when γ = 1, Definition 2.7 collapses to the well-studied concept of bounded linear regularity [5] and has been used in [8] to analyze linear convergence of algorithms involving nonexpansive mappings. Moreover, it is also worth noting that if an operator T is bounded Hölder regular with exponent γ ∈ (0, 1], then the mapping x ↦ x − T(x) is bounded Hölder metric subregular with exponent γ. Hölder metric subregularity, which is a natural extension of metric subregularity, and Hölder-type error bounds have recently been studied in [28, 30–32].

Finally, we recall the definitions of semi-algebraic functions and semi-algebraic sets.

Definition 2.8 (Semi-algebraic sets and functions [2]). A set D ⊆ R^n is semi-algebraic if

    D := ∪_{j=1}^{s} ∩_{i=1}^{l} {x ∈ R^n : f_ij(x) = 0, h_ij(x) < 0}

for some integers l, s and some polynomial functions f_ij, h_ij on R^n (1 ≤ i ≤ l, 1 ≤ j ≤ s). A mapping F : R^n → R^p is said to be semi-algebraic if its graph, gph F := {(x, F(x)) : x ∈ R^n}, is a semi-algebraic set in R^n × R^p.

The next fact summarises some fundamental properties of semi-algebraic sets and functions.

Fact 2.9 (Properties of semi-algebraic sets/functions). The following statements hold.

(P1) Any polynomial is a semi-algebraic function.

(P2) Let D be a semi-algebraic set. Then dist(·, D) is a semi-algebraic function.

(P3) If f, g are semi-algebraic functions on R^n and λ ∈ R, then f + g, λf, max{f, g} and fg are semi-algebraic.

(P4) If f_i are semi-algebraic functions, i = 1, ..., m, and λ ∈ R, then the sets {x : f_i(x) = λ, i = 1, ..., m} and {x : f_i(x) ≤ λ, i = 1, ..., m} are semi-algebraic sets.

(P5) If F : R^n → R^p and G : R^p → R^q are semi-algebraic mappings, then their composition G ∘ F is also a semi-algebraic mapping.

(P6) (Łojasiewicz's inequality) If φ, ψ are two continuous semi-algebraic functions on a compact semi-algebraic set K ⊆ R^n such that φ^{−1}(0) ⊆ ψ^{−1}(0), then there exist constants c > 0 and τ ∈ (0, 1] such that

    ψ(x) ≤ c |φ(x)|^τ   for all x ∈ K.

Proof. (P1) and (P4) follow directly from the definitions. See [2, Prop. 2.2.8] for (P2), [2, Prop. 2.2.6] for (P3) and (P5), and [2, Cor. 2.6.7] for (P6).

Definition 2.10 (Basic semi-algebraic convex sets in R^n). A set C ⊆ R^n is a basic semi-algebraic convex set if there exist γ ∈ N and convex polynomial functions g_j, j = 1, ..., γ, such that C = {x ∈ R^n : g_j(x) ≤ 0, j = 1, ..., γ}.

Any basic semi-algebraic convex set is clearly convex and semi-algebraic. On the other hand, there exist sets which are both convex and semi-algebraic but fail to be basic semi-algebraic convex sets; see [5]. It transpires that any finite collection of basic semi-algebraic convex sets has an intersection which is boundedly Hölder regular (without requiring further regularity assumptions). In the following lemma, B(n) denotes the central binomial coefficient with respect to n, given by (n choose [n/2]), where [·] denotes the integer part of a real number.

Lemma 2.11 (Hölder regularity of basic semi-algebraic convex sets in R^n [5]). Let C_i be basic convex semi-algebraic sets in R^n given by

    C_i = {x ∈ R^n : g_ij(x) ≤ 0, j = 1, ..., m_i},   i = 1, ..., m,

where the g_ij are convex polynomials on R^n with degree at most d. Let θ > 0 and let K ⊆ R^n be a compact set. Then there exists c > 0 such that

    dist^θ(x, C) ≤ c ( ∑_{i=1}^{m} dist^θ(x, C_i) )^γ   for all x ∈ K,

where C := C_1 ∩ ... ∩ C_m and

    γ = [ min{ ((2d − 1)^n + 1)/2, B(n − 1)d^n } ]^{−1}.

We also recall the following useful recurrence relationship established in [5].

Lemma 2.12 (Recurrence relationship [5]). Let p > 0, and let {δ_t}_{t∈N} and {β_t}_{t∈N} be two sequences of nonnegative numbers such that

    β_{t+1} ≤ β_t − δ_t β_t^{1+p}   for all t ∈ N.

Then

    β_t ≤ ( β_0^{−p} + p ∑_{i=0}^{t−1} δ_i )^{−1/p}   for all t ∈ N.

[We use the convention that 1/0 = +∞.]

3 The rate of convergence of the quasi-cyclic algorithm

In this section we investigate the rate of convergence of an abstract algorithm we call quasi-cyclic. To define the algorithm, let J be a finite set, and let {T_j}_{j∈J} be a finite family of operators on a Hilbert space H. Given an initial point x^0 ∈ H, the quasi-cyclic algorithm generates a sequence according to

    x^{t+1} = ∑_{j∈J} w_{j,t} T_j(x^t),   t ∈ N,    (1)

where w_{j,t} ≥ 0 for all j ∈ J, and ∑_{j∈J} w_{j,t} = 1.

The quasi-cyclic algorithm was proposed in [8], where linear convergence of the algorithm was established under suitable regularity conditions. We note that, as we will see later, the quasi-cyclic algorithm provides a broad framework which covers many important existing algorithms such as Douglas–Rachford algorithms, the cyclic projection algorithm and the Krasnoselskii–Mann method.

We first examine the complexity of the successive change of the sequence generated by the quasi-cyclic algorithm. To establish this, we prove a more general result which shows that the successive change of a sequence generated by iterating averaged operators is at worst of order o(1/√t).

Proposition 3.1 (Complexity of the successive change). Let {T_t}_{t∈N} be a family of α-averaged operators on a Hilbert space H with ∩_{t∈N} Fix T_t ≠ ∅ and α ∈ (0, 1). Let x^0 ∈ H and set x^{t+1} = T_t(x^t) for all t ∈ N. Then there exists a sequence ε_t ∈ [0, 1] with ε_t → 0 such that

    ‖x^{t+1} − x^t‖ ≤ ε_t √( α / ((1 − α)(t + 1)) ) dist(x^0, ∩_{t∈N} Fix T_t)   for all t ∈ N.

Proof. First of all, if x^0 ∈ ∩_{t∈N} Fix T_t, then x^t = x^0 for all t ∈ N and so the conclusion follows trivially. Thus, we can assume that x^0 ∉ ∩_{t∈N} Fix T_t. Fix x̄ ∈ ∩_{t∈N} Fix T_t. For all t ∈ N, since T_t is α-averaged, Fact 2.6 with x = x^t and y = x̄ gives us that

    ‖x^{t+1} − x̄‖² + ((1 − α)/α) ‖x^t − x^{t+1}‖² ≤ ‖x^t − x̄‖².

This implies that, for all t ∈ N,

    ‖x^{t+1} − x̄‖ ≤ ‖x^t − x̄‖ ≤ ... ≤ ‖x^0 − x̄‖   and   ((1 − α)/α) ∑_{t=0}^{∞} ‖x^t − x^{t+1}‖² ≤ ‖x^0 − x̄‖².

Since (‖x^t − x^{t+1}‖²)_{t∈N} is nonincreasing, we have that

    ‖x^{n+1} − x^n‖² ≤ (1/(n + 1)) ∑_{t=0}^{n} ‖x^t − x^{t+1}‖² ≤ (1/(n + 1)) (α/(1 − α)) ‖x^0 − x̄‖².

Thus, by replacing n with t and taking the infimum over all x̄ ∈ ∩_{t∈N} Fix T_t, we see that

    ‖x^{t+1} − x^t‖ ≤ √( α / ((1 − α)(t + 1)) ) dist(x^0, ∩_{t∈N} Fix T_t).    (2)

Now, as ∑_{t=0}^{+∞} ‖x^{t+1} − x^t‖² < +∞, observe that ∑_{N=t}^{2t−1} ‖x^{N+1} − x^N‖² → 0 as t → ∞. Note that, for each t ∈ N,

    ‖x^{t+2} − x^{t+1}‖ = ‖T_t(x^{t+1}) − T_t(x^t)‖ ≤ ‖x^{t+1} − x^t‖,

where the last inequality follows from the fact that T_t is an α-averaged operator (and in particular, is nonexpansive). Then, we have

    t ‖x^{2t} − x^{2t−1}‖² ≤ ‖x^{t+1} − x^t‖² + ... + ‖x^{2t} − x^{2t−1}‖² = ∑_{N=t}^{2t−1} ‖x^{N+1} − x^N‖² → 0   as t → ∞.

This implies that √t ‖x^{t+1} − x^t‖ → 0 as t → ∞. Let

    ε_t := ‖x^{t+1} − x^t‖ / ( √( α / ((1 − α)(t + 1)) ) dist(x^0, ∩_{t∈N} Fix T_t) ).

Then ε_t → 0 as t → ∞. Moreover, from (2), we see that ε_t ∈ [0, 1]. So, the conclusion follows.
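As a concrete illustration of iteration (1), the sketch below (Python with NumPy; the operators, weights and iteration count are illustrative assumptions, not from the paper) runs the quasi-cyclic algorithm with the T_j taken to be projections onto halfspaces, which are firmly nonexpansive and hence 1/2-averaged, and records the successive change ‖x^{t+1} − x^t‖ whose decay is quantified in Proposition 3.1 and Corollary 3.2.

import numpy as np

def proj_halfspace(x, a, b):
    # Projection onto {y : <a, y> <= b}; firmly nonexpansive, hence 1/2-averaged.
    viol = float(np.dot(a, x) - b)
    return x if viol <= 0 else x - viol * a / float(np.dot(a, a))

# Two averaged operators with a common fixed point (their halfspaces intersect).
ops = [lambda x: proj_halfspace(x, np.array([1.0, 0.0]), 0.0),   # {x : x_1 <= 0}
       lambda x: proj_halfspace(x, np.array([1.0, 1.0]), 0.0)]   # {x : x_1 + x_2 <= 0}

def quasi_cyclic(x0, weights, iters=200):
    # weights[k] is a vector (w_{j,t})_j with w_{j,t} >= 0 and sum_j w_{j,t} = 1, cf. (1).
    x, changes = np.asarray(x0, dtype=float), []
    for t in range(iters):
        w = weights[t % len(weights)]
        x_next = sum(w_j * T(x) for w_j, T in zip(w, ops))
        changes.append(np.linalg.norm(x_next - x))
        x = x_next
    return x, changes

x_bar, changes = quasi_cyclic([2.0, 1.0], weights=[(0.5, 0.5), (1.0, 0.0)])
print(x_bar, changes[-1])   # successive changes decay, cf. Corollary 3.2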

Corollary 3.2 (Complexity of quasi-cyclic algorithm). Let J be a finite set and let {T_j}_{j∈J} be a finite family of α-averaged operators on a Hilbert space H with ∩_{j∈J} Fix T_j ≠ ∅ and α ∈ (0, 1). For each t ∈ N, let w_{j,t} ∈ R, j ∈ J, be such that w_{j,t} ≥ 0 and ∑_{j∈J} w_{j,t} = 1. Let x^0 ∈ H and consider the quasi-cyclic algorithm generated by (1). Then there exists a sequence ε_t ∈ (0, 1] with ε_t → 0 such that

    ‖x^{t+1} − x^t‖ ≤ ε_t √( α / ((1 − α)(t + 1)) ) dist(x^0, ∩_{j∈J} Fix T_j)   for all t ∈ N.

Proof. Define T_t = ∑_{j∈J} w_{j,t} T_j. As each T_j is an α-averaged operator, w_{j,t} ≥ 0 and ∑_{j∈J} w_{j,t} = 1, it can be verified that T_t is also an α-averaged operator. Note that ∩_{j∈J} Fix T_j ⊆ ∩_{t∈N} Fix T_t. Thus the conclusion follows immediately by applying Proposition 3.1.

Remark 3.3 (Comments on the convergence order of the successive change). Corollary 3.2 shows that the norm square of the successive change, ‖x^{t+1} − x^t‖², generated by the quasi-cyclic algorithm is at worst of the order o(1/t), whenever the operators T_j are averaged operators. As we shall see, this proposition applies, in particular, to the classical Douglas–Rachford algorithm, for which an O(1/t) complexity of ‖x^{t+1} − x^t‖² has been proved recently (see [26, 27]). Herein, we obtain a slightly improved complexity result for a more general algorithmic framework. Moreover, we note that, very recently, an o(1/t) complexity of ‖x^{t+1} − x^t‖² has also been established in [2] for the forward–Douglas–Rachford splitting method under an additional Lipschitz gradient assumption. We note that, as pointed out in [2, Section 1.4], this method is a special case of the Krasnoselskii–Mann iterative method, and so is a particular case of the quasi-cyclic algorithm. We also note that the norm square of the successive change, ‖x^{t+1} − x^t‖², provides a straightforward numerical necessary condition for convergence of x^t to a point in ∩_{j∈J} Fix T_j. Of course ‖x^{t+1} − x^t‖ → 0 does not necessarily guarantee that the sequence {x^t} converges, and so the convergence order of the successive change is not enough to establish convergence of {x^t}.

To establish the convergence rate of the quasi-cyclic algorithm, we require the following two lemmas.

Lemma 3.4. Let J be a finite set and let {T_j}_{j∈J} be a finite family of α-averaged operators on a Hilbert space H with ∩_{j∈J} Fix T_j ≠ ∅ and α ∈ (0, 1). For each t ∈ N, let w_{j,t} ∈ R, j ∈ J, be such that w_{j,t} ≥ 0 and ∑_{j∈J} w_{j,t} = 1. Let x^0 ∈ H and consider the quasi-cyclic algorithm generated by (1). Suppose that

    σ := inf_{t∈N} inf_{j∈J⁺(t)} {w_{j,t}} > 0,   where J⁺(t) = {j ∈ J : w_{j,t} > 0} for each t ∈ N.

Then, for each j ∈ J, we have that ‖x^t − T_j(x^t)‖ → 0 as t → ∞.

Proof. Let y ∈ ∩_{j∈J} Fix T_j. Then, for all t ∈ N, convexity of ‖·‖² yields

    ‖x^{t+1} − y‖² = ‖∑_{j∈J} w_{j,t} T_j(x^t) − y‖² ≤ ∑_{j∈J} w_{j,t} ‖T_j(x^t) − y‖² ≤ ‖x^t − y‖²,    (3)

where the last inequality follows from the fact that each T_j is α-averaged (and so is nonexpansive). Thus (‖x^t − y‖²)_{t∈N} is a decreasing, and hence convergent, sequence. Furthermore,

    lim_{t→∞} ∑_{j∈J} w_{j,t} ‖T_j(x^t) − y‖² = lim_{t→∞} ‖x^t − y‖².    (4)

Since T_j is α-averaged for each j ∈ J, Fact 2.6 implies, for all t ∈ N,

    ‖T_j(x^t) − y‖² + ((1 − α)/α) ‖x^t − T_j(x^t)‖² ≤ ‖x^t − y‖²,

from which, for sufficiently large t, we deduce

    ((1 − α)/α) σ ‖x^t − T_j(x^t)‖² ≤ ((1 − α)/α) ∑_{j∈J} w_{j,t} ‖x^t − T_j(x^t)‖² ≤ ‖x^t − y‖² − ∑_{j∈J} w_{j,t} ‖T_j(x^t) − y‖².

Together with (4), this implies ‖x^t − T_j(x^t)‖ → 0 for all j ∈ J.

The following proposition gives a convergence rate for Fejér monotone sequences which satisfy an additional property which, as we will later show, is satisfied in the presence of Hölder regularity.

Proposition 3.5. Let F be a non-empty closed convex set in a Hilbert space H. Suppose the sequence {x^t} is Fejér monotone with respect to F and satisfies

    dist²(x^{t+1}, F) ≤ dist²(x^t, F) − δ dist^{2θ}(x^t, F)   for all t ∈ N,    (5)

for some δ > 0 and θ ≥ 1. Then x^t → x̄ for some x̄ ∈ F. Moreover, there exist M > 0 and r ∈ [0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−1/(2(θ−1))}   if θ > 1,
    ‖x^t − x̄‖ ≤ M r^t              if θ = 1.

Further, when θ = 1, δ necessarily lies in (0, 1].

Proof. Let β_t = dist²(x^t, F) and p = θ − 1 ≥ 0. Then (5) becomes

    β_{t+1} ≤ β_t − δ β_t^{1+p}.    (6)

We now distinguish two cases based on the value of θ.

Case 1: Suppose θ ∈ (1, +∞). Then Lemma 2.12 implies

    β_t ≤ ( β_0^{−(θ−1)} + (θ − 1)δt )^{−1/(θ−1)}   for all t ∈ N.

So we see that there exists M₁ > 0 such that dist(x^t, F) = √β_t ≤ M₁ t^{−1/(2(θ−1))}. In particular, ‖x^t − P_F(x^t)‖ = dist(x^t, F) → 0. By Fact 2.2, P_F(x^t) → x̄ for some x̄ ∈ F, and hence x^t → x̄ ∈ F. This together with Fact 2.3 implies that

    ‖x^t − x̄‖ ≤ 2 dist(x^t, F) ≤ 2M₁ t^{−1/(2(θ−1))}.

Case 2: Suppose θ = 1. Then (6) simplifies to β_{t+1} ≤ (1 − δ)β_t for all t ∈ N. Moreover, this shows that δ ∈ (0, 1] and that dist(x^t, F) = √β_t ≤ √β_0 (√(1 − δ))^t. Then, by the same argument as used in Case 1, for some x̄ ∈ F, we see that

    ‖x^t − x̄‖ ≤ 2 dist(x^t, F) ≤ 2√β_0 (√(1 − δ))^t.

The conclusion follows on setting M = max{2M₁, 2√β_0} and r = √(1 − δ) ∈ [0, 1).

1 A simple example is 0, 1/2, 1, 1/3, 2/3, 1, ..., 1/t, 2/t, ..., (t−1)/t, 1, ...
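The recurrence of Lemma 2.12, which drives Proposition 3.5, can be checked numerically. The following sketch (Python; the parameters are illustrative assumptions) generates a sequence satisfying β_{t+1} = β_t − δ β_t^{1+p} and verifies the stated bound.

# Verify Lemma 2.12 numerically for beta_{t+1} = beta_t - delta * beta_t^(1+p).
p, delta, beta0 = 1.0, 0.1, 0.5   # illustrative parameters (assumed)
beta, betas = beta0, [beta0]
for _ in range(1000):
    beta = beta - delta * beta ** (1.0 + p)
    betas.append(beta)
for t, b in enumerate(betas):
    # Bound of Lemma 2.12 with constant delta_i = delta.
    bound = (beta0 ** (-p) + p * delta * t) ** (-1.0 / p)
    assert b <= bound + 1e-12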

We are now in a position to state our first main convergence result, which we simultaneously prove for both variants of the Hölder regularity assumption (with and without the Hölder exponents being independent of the choice of bounded set).

Theorem 3.6 (Rate of convergence of the quasi-cyclic algorithm). Let J be a finite set and let {T_j}_{j∈J} be a finite family of α-averaged operators on a Hilbert space H with ∩_{j∈J} Fix T_j ≠ ∅ and α ∈ (0, 1). For each t ∈ N, let w_{j,t} ∈ R, j ∈ J, be such that w_{j,t} ≥ 0 and ∑_{j∈J} w_{j,t} = 1. Let x^0 ∈ H and consider the quasi-cyclic algorithm generated by (1). Suppose the following assumptions hold:

(a) For each j ∈ J, the operator T_j is bounded Hölder regular.

(b) {Fix T_j}_{j∈J} has a boundedly Hölder regular intersection.

(c) σ := inf_{t∈N} inf_{j∈J⁺(t)} {w_{j,t}} > 0, where J⁺(t) = {j ∈ J : w_{j,t} > 0} for each t ∈ N.

Then there exists ρ > 0 such that x^t → x̄ ∈ ∩_{j∈J} Fix T_j with sublinear rate O(t^{−ρ}). Furthermore, if we assume the following stronger assumptions:

(a') For each j ∈ J, the operator T_j is bounded Hölder regular with exponent γ_{1,j} ∈ (0, 1];

(b') {Fix T_j}_{j∈J} has a bounded Hölder regular intersection with exponent γ₂ ∈ (0, 1].

Then there exist M > 0 and r ∈ [0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−γ/(2(1−γ))}   if γ ∈ (0, 1),
    ‖x^t − x̄‖ ≤ M r^t              if γ = 1,

where γ := γ₁γ₂ and γ₁ := min{γ_{1,j} : j ∈ J}.

Proof. Suppose first that assumptions (a), (b) and (c) hold. Since T_j is α-averaged for each j ∈ J, Fact 2.6 implies that, for all x, y ∈ H,

    ‖T_j(x) − T_j(y)‖² + ((1 − α)/α) ‖(I − T_j)(x) − (I − T_j)(y)‖² ≤ ‖x − y‖².

Set F = ∩_{j∈J} Fix T_j. Then, for all x ∈ H and for all y ∈ F,

    ‖T_j(x) − y‖² + ((1 − α)/α) ‖x − T_j(x)‖² ≤ ‖x − y‖².

Hence, for all x ∈ H and for all y ∈ F,

    ‖∑_{j∈J} w_{j,t} T_j(x) − y‖² = ‖∑_{j∈J} w_{j,t} (T_j(x) − y)‖²
        ≤ ∑_{j∈J} w_{j,t} ‖T_j(x) − y‖²
        ≤ ∑_{j∈J} w_{j,t} ( ‖x − y‖² − ((1 − α)/α) ‖x − T_j(x)‖² )
        = ‖x − y‖² − ((1 − α)/α) ∑_{j∈J} w_{j,t} ‖x − T_j(x)‖²
        ≤ ‖x − y‖² − ((1 − α)/α) σ ‖x − T_j(x)‖²   for all j ∈ J,

where the first inequality follows from convexity of ‖·‖², and the final inequality follows from assumption (c). Setting x = x^t yields, for all y ∈ F,

    ‖x^{t+1} − y‖² ≤ ‖x^t − y‖² − ((1 − α)/α) σ ‖x^t − T_j(x^t)‖²   for all j ∈ J.

In particular, the sequence (x^t)_{t∈N} is bounded and Fejér monotone with respect to F. Further, setting y = P_F(x^t) gives

    dist²(x^{t+1}, F) ≤ ‖x^{t+1} − P_F(x^t)‖² ≤ dist²(x^t, F) − ((1 − α)/α) σ ‖x^t − T_j(x^t)‖²   for all j ∈ J.    (7)

Let K be a bounded set such that {x^t : t ∈ N} ⊆ K. For each j ∈ J, since the operator T_j is bounded Hölder regular, there exist exponents γ_{1,j} > 0 and scalars µ_j > 0 such that

    dist(x, Fix T_j) ≤ µ_j ‖x − T_j(x)‖^{γ_{1,j}}   for all x ∈ K.    (8)

Setting γ₁ = min{γ_{1,j} : j ∈ J} gives

    ‖x − T_j(x)‖^{γ_{1,j}} ≤ ‖x − T_j(x)‖^{γ₁}   for all x ∈ K ∩ {x : ‖x − T_j(x)‖ ≤ 1}.

By Lemma 3.4, there exists t_0 ∈ N such that ‖x^t − T_j(x^t)‖ ≤ 1 for all t ≥ t_0 and j ∈ J. Hence, for all t ≥ t_0, it follows from (7) and (8) that

    ((1 − α)/α) σ µ_j^{−2/γ₁} dist^{2/γ₁}(x^t, Fix T_j) ≤ dist²(x^t, F) − dist²(x^{t+1}, F)   for all j ∈ J.

Taking the maximum over all j ∈ J and letting µ = max{µ_j : j ∈ J}, we have

    ((1 − α)/α) σ µ^{−2/γ₁} max_{j∈J} dist^{2/γ₁}(x^t, Fix T_j) ≤ dist²(x^t, F) − dist²(x^{t+1}, F).    (9)

Since {Fix T_j}_{j∈J} has a bounded Hölder regular intersection, there exists β > 0 such that

    β^{−2/(γ₁γ₂)} dist^{2θ}(x^t, F) ≤ ( max_{j∈J} dist(x^t, Fix T_j) )^{2/γ₁} = max_{j∈J} dist^{2/γ₁}(x^t, Fix T_j),    (10)

where θ := 1/(γ₁γ₂). Altogether, combining (9) and (10), we see that, for all t ≥ t_0,

    dist²(x^{t+1}, F) ≤ dist²(x^t, F) − δ dist^{2θ}(x^t, F),

where δ := ((1 − α)/α) σ µ^{−2/γ₁} β^{−2/(γ₁γ₂)} > 0. Then the first assertion follows from Proposition 3.5.

To see the second assertion, we suppose that assumptions (a'), (b') and (c) hold. Proceeding with the same proof as above, and noting that the exponents γ_{1,j} and γ₂ are now independent of the choice of K, we see that the second assertion also follows.

Remark 3.7. Theorem 3.6 is a generalization of [8, Th. 6.1], which considered the case in which the Hölder exponents are independent of the bounded set K and given by γ_{1,j} = γ₂ = 1, j = 1, ..., m.

We next provide three important specializations of Theorem 3.6. The first result is concerned with a simple fixed point iteration, the second with a Krasnoselskii–Mann scheme, and the third with the method of cyclic projections.

Corollary 3.8 (Averaged fixed point iterations with Hölder regularity). Let T be an α-averaged operator on a Hilbert space H with Fix T ≠ ∅ and α ∈ (0, 1). Suppose T is bounded Hölder regular. Let x^0 ∈ H and set x^{t+1} = T(x^t). Then there exists ρ > 0 such that x^t → x̄ ∈ Fix T with sublinear rate O(t^{−ρ}). Furthermore, if T is bounded Hölder regular with exponent γ ∈ (0, 1], then there exist M > 0 and r ∈ [0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−γ/(2(1−γ))}   if γ ∈ (0, 1),
    ‖x^t − x̄‖ ≤ M r^t              if γ = 1.

Proof. The conclusion follows immediately from Theorem 3.6.

Corollary 3.9 (Krasnoselskii–Mann iterations with Hölder regularity). Let T be an α-averaged operator on a Hilbert space H with Fix T ≠ ∅ and α ∈ (0, 1). Suppose T is bounded Hölder regular. Let (λ_t)_{t∈N} be a sequence in (0, 1) with σ_0 := inf_{t∈N} {λ_t(1 − λ_t)} > 0. Given an initial point x^0 ∈ H, set x^{t+1} = x^t + λ_t (T(x^t) − x^t). Then there exists ρ > 0 such that x^t → x̄ ∈ Fix T with sublinear rate O(t^{−ρ}). Furthermore, if T is bounded Hölder regular with exponent γ ∈ (0, 1], then there exist M > 0 and r ∈ [0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−γ/(2(1−γ))}   if γ ∈ (0, 1),
    ‖x^t − x̄‖ ≤ M r^t              if γ = 1.

Proof. First observe that the sequence (x^t)_{t∈N} is given by x^{t+1} = T_t(x^t), where T_t = (1 − λ_t)I + λ_t T. Here, λ_t ≥ σ_0 > 0 and 1 − λ_t ≥ σ_0 > 0 for all t ∈ N by our assumption. A straightforward manipulation shows that the identity map, I, is bounded Hölder regular with exponent 1. Since Fix I = H, the collection {Fix I, Fix T} has a bounded Hölder regular intersection with exponent 1. The result now follows from Theorem 3.6.

The following result includes [5, Th. 4.4] and [5, Th. 3.2] as special cases.

Corollary 3.10 (Cyclic projection algorithm with Hölder regularity). Let J = {1, 2, ..., m} and let {C_j}_{j∈J} be a collection of closed convex subsets of a Hilbert space H with non-empty intersection. Given x^0 ∈ H, set x^{t+1} = P_{C_j}(x^t) where j = (t mod m) + 1. Suppose that {C_j}_{j∈J} has a bounded Hölder regular intersection. Then there exists ρ > 0 such that x^t → x̄ ∈ ∩_{j∈J} C_j with sublinear rate O(t^{−ρ}). Furthermore, if the collection {C_j}_{j∈J} is bounded Hölder regular with exponent γ ∈ (0, 1], there exist M > 0 and r ∈ [0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−γ/(2(1−γ))}   if γ ∈ (0, 1),
    ‖x^t − x̄‖ ≤ M r^t              if γ = 1.

Proof. First note that the projection operator onto a closed convex set is 1/2-averaged. Now, for each j ∈ J, C_j = Fix P_{C_j}, and hence

    d(x, C_j) = d(x, Fix P_{C_j}) = ‖x − P_{C_j}(x)‖   for all x ∈ H.

That is, for each j ∈ J, the projection operator P_{C_j} is bounded Hölder regular with exponent 1. The result follows from Theorem 3.6.

4 The rate of convergence of DR algorithms

We now specialize our convergence results to the classical DR algorithm and its variants, and so obtain a convergence rate under the Hölder regularity condition. Recall that the basic Douglas–Rachford algorithm for two-set feasibility problems can be stated as follows:

Algorithm 1: Basic Douglas–Rachford algorithm
Data: Two closed and convex sets C, D ⊆ H
Choose an initial point x^0 ∈ H;
for t = 0, 1, 2, 3, ... do
    Set:
        y^{t+1} := P_C(x^t),
        z^{t+1} := P_D(2y^{t+1} − x^t),          (11)
        x^{t+1} := x^t + (z^{t+1} − y^{t+1}).
end

Direct verification shows that the relationship between consecutive terms in the sequence (x^t) of (11) can be described in terms of the firmly nonexpansive (two-set) Douglas–Rachford operator, which is of the form

    T_{C,D} = (1/2)(I + R_D R_C),    (12)

where I is the identity mapping and R_C := 2P_C − I is the reflection operator with respect to the set C ("reflect-reflect-average").

We shall also consider the following abstraction which chooses two constraint sets from some finite collection at each iteration. Note that iterations (11) and (13) have the same structure.

Algorithm 2: A multiple-sets Douglas–Rachford algorithm
Data: A family of m closed and convex sets C_1, C_2, ..., C_m ⊆ H
Choose a list of 2-tuples Ω_1, ..., Ω_s ⊆ {(i, j) : i, j = 1, 2, ..., m and i ≠ j} with ∪_{j=1}^{s} Ω_j = {1, ..., m};
Choose an initial point x^0 ∈ H;
for t = 0, 1, 2, 3, ... do
    Set the indices (i, j) := Ω_{t̄} where t̄ = t mod m;
    Set
        y^{t+1} := P_{C_i}(x^t),
        z^{t+1} := P_{C_j}(2y^{t+1} − x^t),       (13)
        x^{t+1} := x^t + (z^{t+1} − y^{t+1}).
end
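A minimal sketch of Algorithm 1 (Python with NumPy; the constraint sets and starting point are illustrative assumptions, not from the paper): each pass performs the update (11), which is equivalent to applying the operator T_{C,D} of (12). Since the two sets below have intersecting relative interiors, linear convergence is expected here (see Remark 4.7); Section 6 examines configurations where this fails.

import numpy as np

def proj_halfspace(x, a=np.array([1.0, 0.0]), b=0.0):   # P_C, C = {x : x_1 <= 0}
    viol = float(np.dot(a, x) - b)
    return x if viol <= 0 else x - viol * a / float(np.dot(a, a))

def proj_ball(x, center=np.zeros(2), radius=1.0):        # P_D, D = closed unit ball
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

def dr_step(x):
    # One step of Algorithm 1: y = P_C(x), z = P_D(2y - x), x+ = x + (z - y), cf. (11);
    # equivalently x+ = T_{C,D}(x) with T_{C,D} = (1/2)(I + R_D R_C) as in (12).
    y = proj_halfspace(x)
    z = proj_ball(2.0 * y - x)
    return x + (z - y), y

x = np.array([3.0, 2.0])
for t in range(100):
    x, shadow = dr_step(x)
print(shadow)   # the shadow P_C(x^t) converges to a point of C ∩ D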

The motivation for studying Algorithm 2 is that, beyond Algorithm 1, it includes two further DR-type schemes from the literature. The first scheme is the cyclic DR algorithm and is generated according to:

    x^{t+1} = (T_{C_m,C_1} T_{C_{m−1},C_m} ... T_{C_2,C_3} T_{C_1,C_2})(x^t)   for t ∈ N,

which corresponds to Algorithm 2 with s = m and Ω_j = (j, j+1), j = 1, ..., m−1, and Ω_m = (m, 1). The second scheme is the cyclically anchored DR algorithm and is generated according to:

    x^{t+1} = (T_{C_1,C_m} ... T_{C_1,C_3} T_{C_1,C_2})(x^t)   for t ∈ N,

which corresponds to Algorithm 2 with s = m − 1 and Ω_j = (1, j+1), j = 1, ..., m−1. The following lemma shows that the underlying operators of both these methods are also averaged.

Lemma 4.1 (Compositions of DR operators). Let s be a positive integer. The composition of s Douglas–Rachford operators is s/(s+1)-averaged.

Proof. The two-set Douglas–Rachford operator of (12) is firmly nonexpansive, and hence 1/2-averaged. The result follows by [6, Prop. 4.32].

As a consequence of Lemma 4.1 and Proposition 3.2, we now obtain the complexity of the successive change for the many-set DR algorithm. For convenience, in the following result, for a set Ω_j = {(i_1^j, i_2^j)} with indices i_1^j, i_2^j ∈ {1, ..., m}, j = 1, ..., s, we denote T_{Ω_j} := T_{C_{i_1^j}, C_{i_2^j}}.

Corollary 4.2 (Complexity of the multiple-sets DR algorithm). Let C_1, C_2, ..., C_m be closed convex sets in a Hilbert space H with non-empty intersection. Let {Ω_j}_{j=1}^{s} and {(y^t, z^t, x^t)} be as used by the multiple-sets Douglas–Rachford algorithm (13). Then there exists a sequence ε_t → 0 with 0 ≤ ε_t ≤ 1 such that, for each t ∈ N,

    ‖x^{t+1} − x^t‖ ≤ ε_t √( s/(t + 1) ) dist(x^0, ∩_{j=1}^{s} Fix T_{Ω_j}).

Proof. Let J = {1}, T = T_{Ω_s} T_{Ω_{s−1}} ... T_{Ω_1} and w_{t,1} ≡ 1. From Lemma 4.1, T is s/(s+1)-averaged. Note that ∩_{j=1}^{s} Fix T_{Ω_j} ⊆ Fix T. Thus the conclusion follows by applying Proposition 3.1 with α = s/(s+1).

Remark 4.3. The previous corollary holds with s = m for the cyclic DR algorithm, and with s = m − 1 for the cyclically anchored DR algorithm.

Corollary 4.4 (Convergence rate for the multiple-sets DR algorithm). Let C_1, C_2, ..., C_m be closed convex sets in a Hilbert space H with non-empty intersection. Let {Ω_j}_{j=1}^{s} and {(y^t, z^t, x^t)} be generated by the multiple-sets Douglas–Rachford algorithm (13). Suppose that:

(a) For each j ∈ {1, ..., s}, the operator T_{Ω_j} is bounded Hölder regular.

(b) The collection {Fix T_{Ω_j}}_{j=1}^{s} has a bounded Hölder regular intersection.

Then there exists ρ > 0 such that x^t → x̄ ∈ ∩_{j=1}^{s} Fix T_{Ω_j} with sublinear rate O(t^{−ρ}). Furthermore, suppose we assume the stronger assumptions:

(a') For each j ∈ {1, ..., s}, the operator T_{Ω_j} is bounded Hölder regular with exponent γ_{1,j}.

(b') The collection {Fix T_{Ω_j}}_{j=1}^{s} has a bounded Hölder regular intersection with exponent γ₂ ∈ (0, 1].

Then there exist M > 0 and r ∈ (0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−γ/(2(1−γ))}   if γ ∈ (0, 1),
    ‖x^t − x̄‖ ≤ M r^t              if γ = 1,

where γ := γ₁γ₂ and γ₁ := min{γ_{1,j} : 1 ≤ j ≤ s}.

Proof. Let J = {1, 2, ..., s}. For all j ∈ J, set T_j = T_{Ω_j} and w_{t,j} = 1 if j = t mod m, and w_{t,j} = 0 otherwise. Since each T_{Ω_j} is firmly nonexpansive (that is, 1/2-averaged), the conclusion follows from Theorem 3.6.

We next observe that bounded Hölder regularity of the Douglas–Rachford operators T_{Ω_j} and Hölder regularity of the intersection of the collection {Fix T_{Ω_j}}_{j=1}^{s} are automatically satisfied in the semi-algebraic convex case, and so sublinear convergence follows in this case without any further regularity conditions. This follows from:

Proposition 4.5 (Semi-algebraicity implies Hölder regularity & sublinear convergence). Let C_1, C_2, ..., C_m be basic convex semi-algebraic sets in R^n with non-empty intersection, given by C_j = {x ∈ R^n : g_ij(x) ≤ 0, i = 1, ..., m_j}, where the g_ij, i = 1, ..., m_j, j = 1, ..., m, are convex polynomials on R^n with degree at most d. Let {Ω_j}_{j=1}^{s} be a list of 2-tuples with ∪_{j=1}^{s} Ω_j = {1, ..., m}. Then,

(a) For each j ∈ {1, ..., s}, the operator T_{Ω_j} is bounded Hölder regular. Moreover, if d = 1, then T_{Ω_j} is bounded Hölder regular with exponent 1.

(b) The collection {Fix T_{Ω_j}}_{j=1}^{s} has a bounded Hölder regular intersection.

In particular, let {(y^t, z^t, x^t)} be generated by the multiple-sets Douglas–Rachford algorithm (13). Then there exists ρ > 0 such that x^t → x̄ ∈ ∩_{j=1}^{s} Fix T_{Ω_j} with sublinear rate O(t^{−ρ}).

Proof. Fix any j ∈ {1, ..., s}. We first verify that the operator T_{Ω_j} is bounded Hölder regular. Without loss of generality, we assume that Ω_j = {1, 2} and so T_{Ω_j} = T_{C_1,C_2}, where T_{C_1,C_2} is the Douglas–Rachford operator for C_1 and C_2. Recall that, for each x ∈ R^n,

    T_{C_1,C_2}(x) − x = P_{C_2}(R_{C_1}(x)) − P_{C_1}(x).

We now distinguish two cases depending on the value of the degree d of the polynomials which describe C_j, j = 1, 2.

Case 1 (d > 1): We first observe that, for a closed convex semi-algebraic set C ⊆ R^n, the projection mapping x ↦ P_C(x) is a semi-algebraic mapping. This implies that, for i = 1, 2, x ↦ P_{C_i}(x) and x ↦ R_{C_i}(x) = 2P_{C_i}(x) − x are all semi-algebraic mappings. Since the composition of semi-algebraic maps remains semi-algebraic ((P5) of Fact 2.9), we deduce that f : x ↦ ‖x − T_{C_1,C_2}(x)‖² is a continuous semi-algebraic function. By (P4) of Fact 2.9, Fix T_{C_1,C_2} = {x : f(x) = 0}, which is therefore a semi-algebraic set. By (P2) of Fact 2.9, the function dist(·, Fix T_{C_1,C_2}) is semi-algebraic, and clearly dist(·, Fix T_{C_1,C_2})^{−1}(0) = f^{−1}(0). By the Łojasiewicz inequality for semi-algebraic functions ((P6) of Fact 2.9), we see that for every ρ > 0, one can find µ > 0 and γ ∈ (0, 1] such that

    dist(x, Fix T_{C_1,C_2}) ≤ µ ‖x − T_{C_1,C_2}(x)‖^γ   for all x ∈ B(0, ρ).

So the Douglas–Rachford operator T_{C_1,C_2} is bounded Hölder regular in this case.

Case 2 (d = 1): In this case, both C_1 and C_2 are polyhedral, hence their projections P_{C_1} and P_{C_2} are piecewise affine mappings. Noting that a composition of piecewise affine mappings remains piecewise affine [42], we deduce that F : x ↦ x − T_{C_1,C_2}(x) is continuous and piecewise affine. Then Robinson's theorem on metric subregularity of piecewise affine mappings [44] implies that for all a ∈ R^n, there exist µ > 0 and ε > 0 such that

    dist(x, Fix T_{C_1,C_2}) = dist(x, F^{−1}(0)) ≤ µ ‖F(x)‖ = µ ‖x − T_{C_1,C_2}(x)‖   for all x ∈ B(a, ε).

Then a standard compactness argument shows that the Douglas–Rachford operator T_{C_1,C_2} is bounded linearly regular, that is, bounded Hölder regular with exponent 1.

Next, we assert that the collection {Fix T_{Ω_j}}_{j=1}^{s} has a bounded Hölder regular intersection. To see this, as in the proof of part (a), we can show that for each j = 1, ..., s, Fix T_{Ω_j} is a semi-algebraic set. Then their intersection ∩_{j=1}^{s} Fix T_{Ω_j} is also a semi-algebraic set. Thus, ψ(x) = dist(x, ∩_{j=1}^{s} Fix T_{Ω_j}) and φ(x) = max_{1≤j≤s} dist(x, Fix T_{Ω_j}) are semi-algebraic functions. It is easy to see that φ^{−1}(0) = ψ^{−1}(0), and hence the Łojasiewicz inequality for semi-algebraic functions ((P6) of Fact 2.9) implies that the collection {Fix T_{Ω_j}}_{j=1}^{s} has a bounded Hölder regular intersection. The final conclusion follows by Theorem 3.6.

Next, we establish the convergence rate for the DR algorithm assuming bounded Hölder regularity of the Douglas–Rachford operator T_{C,D}. Indeed, by supposing that T_{C,D} is bounded Hölder regular with exponent γ ∈ (0, 1], it is immediate from Proposition 3.2 and the firm nonexpansiveness of T_{C,D} that dist(x^t, Fix T_{C,D}) converges to 0 at the order of o(t^{−γ/2}). Below, under the same assumption, we show a stronger result: x^t converges to x̄ for some x̄ ∈ Fix T_{C,D} at least at the order of O(t^{−γ/(2(1−γ))}) if γ ∈ (0, 1), and x^t converges to x̄ linearly if γ = 1.

Corollary 4.6 (Convergence rate for the DR algorithm). Let C, D be two closed convex sets in a Hilbert space H with C ∩ D ≠ ∅, and let T_{C,D} be the Douglas–Rachford operator. Let {(y^t, z^t, x^t)} be generated by the Douglas–Rachford algorithm (11). Suppose that T_{C,D} is bounded Hölder regular. Then there exists ρ > 0 such that x^t → x̄ ∈ Fix T_{C,D} with sublinear rate O(t^{−ρ}). Furthermore, if T_{C,D} is bounded Hölder regular with exponent γ ∈ (0, 1], then there exist M > 0 and r ∈ (0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−γ/(2(1−γ))}   if γ ∈ (0, 1),
    ‖x^t − x̄‖ ≤ M r^t              if γ = 1.

Proof. Let J = {1}, T = T_{C,D} and w_{t,1} ≡ 1. Note that T_{C,D} is firmly nonexpansive (that is, 1/2-averaged) and that any collection containing only one set has a Hölder regular intersection with exponent one. The conclusion then follows immediately from Theorem 3.6.

Similar to Proposition 4.5, if C and D are basic convex semi-algebraic sets, then the DR algorithm exhibits a sublinear convergence rate.

Remark 4.7 (Linear convergence of the DR algorithm). We note that if H = R^n and ri C ∩ ri D ≠ ∅, then T_{C,D} is bounded linearly regular, that is, bounded Hölder regular with exponent 1. Thus, the Douglas–Rachford algorithm converges linearly in this case. This had been shown in [8]. Also, if C and D are both subspaces such that C + D is closed (as is automatic in finite dimensions), then T_{C,D} is also bounded linearly regular, and so the DR algorithm converges linearly in this case as well. This was established in [9]. It should be noted that [9] deduced the stronger result that the linear convergence rate is exactly the cosine of the Friedrichs angle.

5 The rate of convergence of the damped DR algorithm

We now investigate a variant of Algorithm 1 which we refer to as the damped Douglas–Rachford algorithm. To proceed, let η > 0, let A be a closed convex set in H, and define the operator P^η_A by

    P^η_A = (1/(2η + 1)) I + (2η/(2η + 1)) P_A,

where I denotes the identity operator on H. The operator P^η_A can be considered as a relaxation of the projection mapping. Further, a direct verification shows that

    lim_{η→∞} P^η_A(x) = P_A(x)   for all x ∈ H, in norm,

and

    P^η_A(x) = prox_{η dist²(·,A)}(x) = argmin_{y∈H} { dist²(y, A) + (1/(2η)) ‖y − x‖² }   for all x ∈ H,    (14)

where prox_f denotes the proximity operator of the function f. The damped variant can be stated as follows:

Algorithm 3: Damped Douglas–Rachford algorithm
Data: Two closed sets C, D ⊆ H
Choose η > 0 and step-size parameters λ_t ∈ (0, 2];
Choose an initial point x^0 ∈ H;
for t = 0, 1, 2, 3, ... do
    Choose λ_t ∈ (0, 2] with λ_t ≥ λ and set:
        y^{t+1} := P^η_C(x^t),
        z^{t+1} := P^η_D(2y^{t+1} − x^t),         (15)
        x^{t+1} := x^t + λ_t (z^{t+1} − y^{t+1}).
end

Remark 5.1. Algorithm 3 can be found, for instance, in [6, Cor. 27.4]. A similar relaxation of the Douglas–Rachford algorithm for lattice cone constraints has been proposed and analyzed in [6]. Whilst it is possible to analyze the damped Douglas–Rachford algorithm within the quasi-cyclic framework, we learn more by proving the result below (Theorem 5.2) directly; before doing so, we record a small computational sketch of the damped iteration.
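The following sketch of one pass of Algorithm 3 (Python with NumPy; the sets C, D and the parameters η and λ_t are illustrative assumptions, not from the paper) uses the relaxed projection P^η_A = (1/(2η+1)) I + (2η/(2η+1)) P_A from (14).

import numpy as np

def proj_halfspace(x, a=np.array([1.0, 0.0]), b=0.0):    # P_C, C = {x : x_1 <= 0}
    viol = float(np.dot(a, x) - b)
    return x if viol <= 0 else x - viol * a / float(np.dot(a, a))

def proj_ball(x, center=np.zeros(2), radius=1.0):         # P_D, D = closed unit ball
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

def relaxed(proj, x, eta):
    # P^eta_A(x) = x/(2*eta + 1) + (2*eta/(2*eta + 1)) * P_A(x), cf. (14).
    return x / (2 * eta + 1) + (2 * eta / (2 * eta + 1)) * proj(x)

def damped_dr_step(x, eta=1.0, lam=1.0):
    # One step of Algorithm 3: y = P^eta_C(x), z = P^eta_D(2y - x), x+ = x + lam*(z - y), cf. (15).
    y = relaxed(proj_halfspace, x, eta)
    z = relaxed(proj_ball, 2.0 * y - x, eta)
    return x + lam * (z - y)

x = np.array([3.0, 2.0])
for t in range(200):
    x = damped_dr_step(x)
print(x)   # the iterates x^t themselves approach a point of C ∩ D (Theorem 5.2)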

Theorem 5.2 Convergence Rate for the damped Douglas Rachford algorithm). Let C, D be two closed and convex sets in a Hilbert space H with C D. Let λ = inf t N λ t > 0 with λ t 0, 2] and let {y t, z t, x t )} be generated by the damped Douglas Rachford algorithm 5). Suppose that the pair of sets {C, D} has a bounded Hölder regular intersection. Then there exists ρ > 0 such that x t x C D with sublinear rate Ot ρ ). Furthermore, if the pair {C, D} has a bounded Hölder regular intersection with exponent γ 0, ] then there exist M > 0 and r 0, ) such that { x t Mt γ 2 γ) if γ 0, ) x M r t if γ =. Proof. Step A Fejér monotonicity type inequality for x t ): Let x C D. We first show that 2ηλ dist 2 y t+, C) + dist 2 z t+, D) ) x t x 2 x t+ x 2. 6) To see this, note that, for any closed and convex set A, dist 2, A) is a differentiable convex function satisfying dist 2 )x, A) = 2x P A x)) which is 2-Lipschitz. Using the convex subgradient inequality, we have 2ηλ dist 2 y t+, C) + dist 2 z t+, D) ) 2ηλ t dist 2 y t+, C) + dist 2 z t+, D) ) = 2ηλ t dist 2 y t+, C) dist 2 x, C) + dist 2 z t+, D) dist 2 x, D) ) 4ηλ t y t+ P C y t+ ), y t+ x + z t+ P D z t+ ), z t+ x ) = 4ηλ t y t+ P C y t+ ), y t+ x + z t+ P D z t+ ), z t+ y t+ + z t+ P D z t+ ), y t+ x ) = 4ηλ t y t+ P C y t+ ) + z t+ P D z t+ ), y t+ x + z t+ P D z t+ ), z t+ y t+ ) = 4η λ t y t+ P C y t+ ) + z t+ P D z t+ ), y t+ x + z t+ P D z t+ ), x t+ x t ) 7) where the last equality follows from the last relation in 5). Now using 4), we see that 0 = dist 2, C) + 2η xt 2 )y t+ ) = 2 y t+ P C y t+ ) ) + η yt+ x t ), and 0 = dist 2, D) + 2η 2yt+ x t ) 2 )z t+ ) = 2 z t+ P D z t+ ) ) + η zt+ 2y t+ + x t ). Summing these two equalities and multiplying by λ t yields Note also that λ t y t+ P C y t+ ) + z t+ P D z t+ ) ) = λ t 2η zt+ y t+ ) = 2η xt+ x t ). x t + z t+ y t+ = x t+ + λ t )z t+ y t+ ) = x t+ + λ t λ t x t+ x t ). 8

Substituting the last two equations into 7) gives 2ηλ dist 2 y t+, C) + dist 2 z t+, D) ) 4η z t+ P D z t+ ) 2η yt+ x ), x t+ x t = 4η 2η zt+ 2y t+ + x t ) 2η yt+ x ), x t+ x t = 2 z t+ y t+ + x t x, x t+ x t = 2 x t+ x, x t+ x t 2 λ t x t+ x t λ t = x t x 2 x t+ x 2 x t+ x t 2) 2 λ t x t+ x t λ t = x t x 2 x t+ x 2 2 λ t x t+ x t. λ t Step 2 establishing a recurrence for dist 2 x t, C D)): First note that y t+ = P η C xt ) = 2η + xt + 2η 2η + P Cx t ). This shows that y t+ lies in the line segment between x t and its projection onto C. So, P C y t+ ) = P C x t ) and hence, ) 2 ) 2 dist 2 y t+, C) = y t+ P C x t ) 2 = P C x t ) x t 2 = dist 2 x t, C). 2η + 2η + Similarly, as z t+ = P η D 2yt+ x t ) = 2η + 2yt+ x t ) + 2η 2η + P D2y t+ x t ), the point z t+ lies in the line segment between 2y t+ x t and its projection onto D. Thus P D z t+ ) = P D 2y t+ x t ) and so, dist 2 z t+, D) = z t+ P D 2y t+ x t ) 2 ) 2 = P D 2y t+ x t ) 2y t+ x t ) 2 2η + ) 2 = dist 2 2y t+ x t, D). 2η + Now, using the non-expansiveness of dist, D), we have dist 2 x t, D) x t 2y t+ x t ) + dist2y t+ x t, D) = ) 2 2 x t y t+ + dist2y t+ x t, D) ) 2 = 4η 2η + distxt, C) + dist2y t+ x t, D) ) c dist 2 x t, C) + dist 2 2y t+ x t, D), ) 2 9

where c = 2max{ 4η 2η+, })2, and where the last inequality above follows from the following elementary inequalities: for all α, x, y R +, Therefore, we have So, using 6), we have αx + y max{α, }x + y), x + y) 2 2x 2 + y 2 ). dist 2 2y t+ x t, D) c dist 2 x t, D) dist 2 x t, C). x t x 2 x t+ x 2 2ηλ dist 2 y t+, C) + dist 2 z t+, D) ) ) 2 ) = 2ηλ dist 2 x t, C) + dist 2 2y t+ x t, D). 2η + Note that dist 2 x t, C) + dist 2 2y t+ x t, D) dist 2 x t, C) + c dist 2 x t, D) dist 2 x t, C) = c dist 2 x t, D) and It follows that dist 2 x t, C) + dist 2 2y t+ x t, D) dist 2 x t, C). ) 2 x t x 2 x t+ x 2 2η c max{dist 2 x t, C), dist 2 x t, D)} 2η + ) 2 = 2η c max{distx t, C), distx t, D)}) 2. 8) 2η + In particular, we see that the sequence {x t } is bounded and Fejér monotone with respect to C D. Thence, letting K be a bounded set containing {x t }, by bounded Hölder regularity of {C, D}, there exists µ > 0 and γ 0, ] such that Thus there exists a δ > 0 such that distx, C D) µ max{distx, C), distx, D)} γ x K. x t x 2 x t+ x 2 δ dist 2θ x t, C D) where θ = γ [, ). So, Fact 2.2 implies that P C Dx t ) x for some x C D. Setting x = P C D x t ) in 8) we therefore obtain dist 2 x t+, C D) dist 2 x t, C D) δdist 2θ x t, C D). Now, the conclusion follows by applying Proposition 3.5 with θ = /γ. Remark 5.3 DR versus damped DR). Note that Theorem 5.2 only requires Hölder regularity of the underlying collection of constraint sets, rather than the damped DR operator explicitly. A careful examination of the proof of Theorem 5.2 shows that the inequality 7) does not hold for the basic DR algorithm which would require setting η = + ). 20

Remark 5.4 (Comments on linear convergence). In the case when 0 ∈ sri(C − D), where sri is the strong relative interior, the Hölder regularity result holds with exponent γ = 1 (see [5]). The preceding result therefore implies that the damped Douglas–Rachford method converges linearly in the case where 0 ∈ sri(C − D).

We next show that an explicit sublinear convergence rate estimate can be achieved in the case where H = R^n and C, D are basic convex semi-algebraic sets.

Theorem 5.5 (Convergence rate for the damped DR algorithm with semi-algebraic sets). Let C, D be two basic convex semi-algebraic sets in R^n with C ∩ D ≠ ∅, where C, D are given by

    C = {x ∈ R^n : g_i(x) ≤ 0, i = 1, ..., m_1}   and   D = {x ∈ R^n : h_j(x) ≤ 0, j = 1, ..., m_2},

where g_i, h_j, i = 1, ..., m_1, j = 1, ..., m_2, are convex polynomials on R^n with degree at most d. Let λ := inf_{t∈N} λ_t > 0 with λ_t ∈ (0, 2] and let {(y^t, z^t, x^t)} be generated by the damped Douglas–Rachford algorithm (15). Then x^t → x̄ ∈ C ∩ D. Moreover, there exist M > 0 and r ∈ (0, 1) such that

    ‖x^t − x̄‖ ≤ M t^{−γ/(2(1−γ))}   if d > 1,
    ‖x^t − x̄‖ ≤ M r^t              if d = 1,

where γ = [ min{ ((2d − 1)^n + 1)/2, B(n − 1)d^n } ]^{−1} and B(n − 1) is the central binomial coefficient with respect to n − 1, given by (n−1 choose [(n−1)/2]).

Proof. By Lemma 2.11 with θ = 1, we see that for any compact set K, there exists c > 0 such that, for all x ∈ K,

    dist(x, C ∩ D) ≤ c ( dist(x, C) + dist(x, D) )^γ ≤ 2^γ c max{dist(x, C), dist(x, D)}^γ,

where γ = [ min{ ((2d − 1)^n + 1)/2, B(n − 1)d^n } ]^{−1}. Note that γ = 1 if d = 1, while γ ∈ (0, 1) if d > 1. The conclusion now follows from Theorem 5.2.

6 Examples

In this section we fully examine two concrete problems which illustrate the difficulty of establishing optimal rates. We begin with an example consisting of two sets having an intersection which is bounded Hölder regular but not bounded linearly regular. In the special case where n = 1, it has previously been examined in detail as part of [10, Ex. 5.4].

Example 6.1 (Half-space and epigraphical set described by ‖x‖^d). Consider the sets

    C = {(x, r) ∈ R^n × R : r ≤ 0}   and   D = {(x, r) ∈ R^n × R : r ≥ ‖(x_1, ..., x_n)‖^d},

where d > 0 is an even number. Clearly, C ∩ D = {0_{R^{n+1}}} and ri C ∩ ri D = ∅. It can be directly verified that {C, D} does not have a bounded linearly regular intersection because, for x^k = (1/k, 0, ..., 0) ∈ R^n and r_k = k^{−d}, the distance dist((x^k, r_k), C ∩ D) is of order 1/k, while

    max{ dist((x^k, r_k), C), dist((x^k, r_k), D) } = k^{−d}.

Let T C,D be the Douglas Rachford operator with respect to the sets C and D. We will verify that T C,D is bounded Hölder regular with exponent d. Granting this, by Corollary 4.4, the sequence x t, r t ) generated by the Douglas Rachford algorithm converges to a point in Fix T C,D = {0 R n} R + at least at the order of t 2d ), regardless of the chosen initial point Firstly, it can be verified see also [7, Cor. 3.9]) that and so, Moreover, for all x, r) R n R, Fix T C,D = C D + N C D 0) = {0 R n} R +, distx, r), Fix T ) = { x if r 0 x, r) if r < 0. x, r) T C,D x, r) = P D R C x, r)) P C x, r) = P D x, r ) x, min{r, 0}). Note that, for any z, s) R n R, denote z +, s + ) = P D z, s). Then we have s + = z + d and z + z) + d z + d 2 z + d s ) z + = 0. Let a, γ) = P D x, r ). Then, a = 0 R n if and only if x = 0 R n, a x = d a d 2 a d + r )a and γ = a d. It follows that So, a = + d a 2d 2 + d a d 2 x. 9) r x, r) T C,D x, r) = d 2 a d + r )a, a d min{r, 0}) d a { = d a d 2 a d + r)a, a d ) if r 0 d a d 2 a d r)a, a d r) if r < 0. Let K be any bounded set of R n+ and consider any x, r) K. By the nonexpansivity of the projection mapping, a, γ) = P D x, r ) is also bounded for any x, x 2 ) K. Let M > 0 be such that a, γ) M and x, r) M for all x, r) K. To verify the bounded Hölder regularity, we divide the discussion into two cases depending on the sign of r. Case r 0): As d is even, it follows that for all x, r) K with x 0 R n x, r) T C,D x, r) 2 x 2d = d2 a 2d ) a d + r) 2 + a 2d x 2d a 2d x 2d = + d a 2d 2 + d a d 2 r + dm 2d 2 + dm d ) 2d, ) 2d where the equality follows from 9). This shows that, for all x, r) K, distx, r), Fix T C,D ) + dm 2d 2 + dm d ) x, r) T C,D x, r) d. 22

Case 2 r < 0): As d is even, it follows that for all x, r) K\{0 R n+}, x, r) T C,D x, r) 2 x 2d + r 2d = + d2 a 2d ) ) a d r) 2 x 2d + r 2d a 2d + r 2 x 2d + r 2d = a 2d + r 2d M 2 2d x 2d + r 2d +d a 2d 2 +d a d 2 r ) 2d x 2d + r 2d M 2 2d x 2d + r 2d ) 2d min{ + dm 2d 2 + dm d, M 2 2d }, where the equality follows from 9). Therefore, there exists µ > 0 such that, for all x, r) K, distx, r), Fix T C,D ) µ x, r) T C,D x, r) d. Combining these two cases, we see that T C,D is bounded Hölder regular with exponent d, and so, the sequence x t, r t ) generated by the Douglas Rachford algorithm converges to a point in Fix T C,D = {0 R n} R + at least at the order of t 2d ). We note that, for n =, it was shown in [0] by examining the generated DR sequence directly) that the sequence x t converges to zero at the order t d 2 where d > 2. Note that r t = x t d. It follows that the actual convergence rate for x t, r t ) for this example is t d 2 in the case n =. Thus, our convergence rate estimate is not tight for this example in the case n =. On the other hand, as noted in [0], their analysis is largely limited to the 2-dimensional case and it is not clear how it can be extended to the higher dimensional setting. We now examine an even more concrete example involving a subspace and a lower level set of a convex quadratic function in the plane. Example 6.2 Hölder regularity of the DR operator involving a ball and a tangent line). Consider the following basic convex semi-algebraic sets in R 2 : C = {x R 2 x = 0} and D = {x R 2 x +, 0) 2 }, which have intersection C D = {0}. We now show that the DR operator T C,D is bounded Hölder regular. Since C D = [0, ] R, by [7, Cor. 3.9], the fixed point set is given by We therefore have that Fix T C,D = C D + N C D 0) =, 0] {0}. distx, Fix T C,D ) = { x x > 0, x 2 x 0. Setting α = / max{, x, 0) }, a direct computation shows that ) I + RD R C T C,D x := x = α αx, αx 2 ), 20) 2 23
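The configuration of Example 6.2 is simple enough to experiment with directly. The following sketch (Python with NumPy; not part of the paper) runs the Douglas–Rachford iteration for C = {x ∈ R² : x_1 = 0} and D = {x ∈ R² : ‖x + (1, 0)‖ ≤ 1} and prints the distance of the shadow P_C(x^t) to the unique intersection point 0 at a few iteration counts, which can be compared with the rates discussed in this section.

import numpy as np

def proj_line(x):                        # P_C, C = {x in R^2 : x_1 = 0}
    return np.array([0.0, x[1]])

def proj_ball(x, center=np.array([-1.0, 0.0]), radius=1.0):   # P_D, D = B((-1, 0), 1)
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

x = np.array([0.0, 1.0])
for t in range(1, 10001):
    y = proj_line(x)
    z = proj_ball(2.0 * y - x)
    x = x + (z - y)
    if t in (10, 100, 1000, 10000):
        # ||P_C(x^t)|| is the distance of the shadow to C ∩ D = {0}.
        print(t, np.linalg.norm(proj_line(x)))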