Convergence of Fixed-Point Iterations
Instructor: Wotao Yin (UCLA Math)
July 2016
1 / 30
Why study fixed-point iterations?
- They abstract many existing algorithms in optimization, numerical linear algebra, and differential equations
- They often require only minimal conditions
- They simplify complicated convergence proofs
Notation
- space: a Hilbert space H equipped with inner product ⟨·,·⟩ and norm ‖·‖
- fine to think in R² (though not always)
- an operator T : H → H (or C → C, where C is a closed subset of H)
- our focus: when Fix T := {x ∈ H : x = T(x)} is nonempty, the convergence of x^{k+1} ← T(x^k)
- simplification: T(x) is often written as Tx
Examples
unconstrained C¹ minimization: minimize_x f(x)
- x* is a stationary point if ∇f(x*) = 0
- gradient-descent operator: for γ > 0, T := I − γ∇f
- the gradient-descent iteration: x^{k+1} ← Tx^k
- lemma: x* is a stationary point if, and only if, x* ∈ Fix T
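A minimal numerical sketch of this slide's lemma (the function f(x) = (x − 3)² and the step size are the editor's illustrative choices, not from the slides): gradient descent viewed as the fixed-point iteration x^{k+1} ← Tx^k with T = I − γ∇f, whose fixed point is a stationary point of f.

```python
def grad_f(x):
    return 2.0 * (x - 3.0)        # gradient of f(x) = (x - 3)^2

def T(x, gamma=0.1):
    return x - gamma * grad_f(x)  # gradient-descent operator I - gamma*grad f

x = 0.0
for _ in range(200):
    x = T(x)                      # x^{k+1} <- T x^k

# x is (numerically) a fixed point of T, i.e. a stationary point of f
print(abs(x - 3.0) < 1e-8)        # converged to the minimizer x* = 3
print(abs(grad_f(x)) < 1e-7)      # grad f(x*) = 0, so x* is in Fix T
```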
Examples
constrained C¹ minimization: minimize f(x) subject to x ∈ C
- assume: f is proper closed convex, C is nonempty closed convex
- projected-gradient operator: for γ > 0, T := proj_C ∘ (I − γ∇f)
- x^{k+1} ← Tx^k is the projected-gradient iteration: x^{k+1} ← proj_C(x^k − γ∇f(x^k))
- x* is optimal if ⟨∇f(x*), x − x*⟩ ≥ 0 for all x ∈ C
- lemma: x* is optimal if, and only if, x* ∈ Fix T
Lipschitz operator
definition: an operator T is L-Lipschitz, L ∈ [0, ∞), if
  ‖Tx − Ty‖ ≤ L‖x − y‖, ∀x, y ∈ H
definition: an operator T is L-quasi-Lipschitz, L ∈ [0, ∞), if for any x* ∈ Fix T (assumed to exist),
  ‖Tx − x*‖ ≤ L‖x − x*‖, ∀x ∈ H
Contractive operator
definition: T is contractive if it is L-Lipschitz with L ∈ [0, 1)
definition: T is quasi-contractive if it is L-quasi-Lipschitz with L ∈ [0, 1)
Banach fixed-point theorem
Theorem: If T is contractive, then
- T admits a unique fixed point x* (existence and uniqueness)
- x^k → x* (convergence)
- ‖x^k − x*‖ ≤ L^k ‖x^0 − x*‖ (speed)
Holds in a Banach space.
Also known as the Picard–Lindelöf theorem.
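A numerical illustration of the theorem's three claims (the contraction T(x) = 0.5x + 1 is an assumed example, not from the slides): T is 0.5-Lipschitz with unique fixed point x* = 2, and the iterates obey the geometric bound ‖x^k − x*‖ ≤ L^k ‖x^0 − x*‖.

```python
def T(x):
    return 0.5 * x + 1.0     # 0.5-Lipschitz, hence contractive

x_star = 2.0                 # unique fixed point: x* = T(x*)
x, L = 10.0, 0.5
err0 = abs(x - x_star)

for k in range(1, 30):
    x = T(x)
    # geometric-rate bound from the theorem (small tolerance for float error)
    assert abs(x - x_star) <= L**k * err0 + 1e-12

print(abs(x - x_star) < 1e-6)  # convergence to the unique fixed point
```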
Examples
minimize a Lipschitz-differentiable strongly convex function: minimize_x f(x)
definition: a convex f is L-Lipschitz-differentiable if
  ‖∇f(x) − ∇f(y)‖² ≤ L⟨x − y, ∇f(x) − ∇f(y)⟩, ∀x, y ∈ dom f
definition: a differentiable f is µ-strongly convex if
  ⟨∇f(x) − ∇f(y), x − y⟩ ≥ µ‖x − y‖², ∀x, y ∈ dom f
lemma: the gradient-descent operator T := I − γ∇f is C-contractive for all γ in a certain interval
exercise: find the interval of γ and the formula of C in terms of γ, L, µ
Also true for a projected-gradient operator if C is closed convex and C ∩ dom f is nonempty
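For reference, the commonly cited answer to this exercise, stated here without proof (try to derive it yourself before reading):

```latex
% For T = I - \gamma \nabla f with f \mu-strongly convex and
% \nabla f L-Lipschitz (0 < \mu \le L):
\| Tx - Ty \| \le C \, \| x - y \|, \qquad
C = \max\{\, |1 - \gamma\mu|, \; |1 - \gamma L| \,\} < 1
\quad \text{for } \gamma \in (0,\, 2/L),
% with the best factor C = (L - \mu)/(L + \mu) at \gamma = 2/(L + \mu).
```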
Nonexpansive operator
definition: an operator T is nonexpansive if it is 1-Lipschitz, i.e.,
  ‖Tx − Ty‖ ≤ ‖x − y‖, ∀x, y ∈ H
properties:
- T may not have a fixed point x*
- if x* exists, the iteration x^{k+1} = Tx^k is bounded but may diverge
examples: rotation, alternating reflection
Between L = 1 and L < 1
- L < 1: linear (or geometric) convergence
- L = 1: bounded, may diverge
A vast set of algorithms (often with sublinear convergence) cannot be characterized by L:
- alternating projection (von Neumann)
- gradient descent without strong convexity
- proximal-point algorithm without strong convexity
- operator-splitting algorithms
Averaged operator
fixed-point residual operator: R := I − T, so Rx = 0 ⇔ x = Tx
averaged operator: for some η > 0,
  ‖Tx − Ty‖² ≤ ‖x − y‖² − η‖Rx − Ry‖², ∀x, y ∈ H
quasi-averaged operator: for some η > 0,
  ‖Tx − x*‖² ≤ ‖x − x*‖² − η‖Rx‖², ∀x ∈ H
interpretation: each step improves by an amount proportional to the fixed-point violation
speed: may become slower as x^k gets closer to the minimizer
convention: use α instead of η, via the correspondence
  η > 0 ↔ α ∈ (0, 1), η := (1 − α)/α
α-averaged operator: the special case
  ‖Tx − Ty‖² ≤ ‖x − y‖² − ((1 − α)/α)‖Rx − Ry‖², ∀x, y ∈ H
α = 1/2: T is called firmly nonexpansive
α = 1 (violating α ∈ (0, 1)): T is called nonexpansive
Why called averaged?
Lemma: T is α-averaged if, and only if, there exists a nonexpansive map T′ so that
  T = (1 − α)I + αT′,
or equivalently, T′ := (1 − 1/α)I + (1/α)T is nonexpansive.
Proof. From T′ = (1 − 1/α)I + (1/α)T = I − (1/α)R, basic algebraic manipulation gives us: for any x and y,
  α(‖x − y‖² − ‖T′x − T′y‖²) = ‖x − y‖² − ‖Tx − Ty‖² − ((1 − α)/α)‖Rx − Ry‖².
Therefore, T′ is nonexpansive ⇔ T is α-averaged.
Properties
assume: T is α-averaged and T has a fixed point x*
iteration: x^{k+1} ← Tx^k
claims about the iteration, step by step:
(a) ‖x^{k+1} − x*‖² ≤ ‖x^k − x*‖² − ((1 − α)/α)‖Rx^k‖²   (using Rx* = 0)
(b) by telescoping sum on (a),
  ‖x^{k+1} − x*‖² ≤ ‖x^0 − x*‖² − ((1 − α)/α) Σ_{j=0}^{k} ‖Rx^j‖²
(c) {‖Rx^k‖²} is summable and Rx^k → 0
claims (cont.):
(d) by (a), {‖x^k − x*‖²} is monotonically decreasing until x^k ∈ Fix T
(e) by (d), lim_{k→∞} ‖x^k − x*‖² exists (but is not necessarily zero)
(f) by (d), {x^k} is bounded and thus has a weak cluster point x̄ (note: H is weakly sequentially closed)
Next: we will show that x̄ ∈ Fix T and then x^k ⇀ x̄.
claims (cont.):
(h) demiclosedness principle: let T be nonexpansive and R := I − T. If x^j ⇀ x̄ and lim ‖Rx^j‖ = 0, then Rx̄ = 0.
Proof. The goal is to expand ‖Rx̄‖² into terms that vanish as j → ∞:
  ‖Rx̄‖² = ‖Rx^j‖² + 2⟨Rx^j, Tx^j − Tx̄⟩ + ‖Tx^j − Tx̄‖² − ‖x^j − x̄‖² − 2⟨Rx̄, x^j − x̄⟩
        ≤ ‖Rx^j‖² + 2⟨Rx^j, Tx^j − Tx̄⟩ − 2⟨Rx̄, x^j − x̄⟩   (by nonexpansiveness of T).
Each term on the RHS → 0 as j → ∞. Therefore, ‖Rx̄‖² = 0.
(i) by applying (h) to any weakly converging subsequence, each cluster point x̄ of {x^k} is a fixed point.
claims (cont.):
(j) by (e) and (i), x̄ is the unique cluster point.
Proof. Let ȳ also be a cluster point; ȳ ∈ Fix T, just like x̄.
- by (e), both lim_k ‖x^k − x̄‖² and lim_k ‖x^k − ȳ‖² exist.
- algebraically,
  2⟨x^k, x̄ − ȳ⟩ = ‖x^k − ȳ‖² − ‖x^k − x̄‖² + ‖x̄‖² − ‖ȳ‖²,
  whose RHS converges to a constant, say c.
- passing to the limits along the two subsequences, one ⇀ x̄ and one ⇀ ȳ:
  2⟨x̄, x̄ − ȳ⟩ = 2⟨ȳ, x̄ − ȳ⟩ = c.
- hence, ‖x̄ − ȳ‖² = 0.
Theorem (Krasnosel'skiĭ)
Let T be an averaged operator with a fixed point. Then, the iteration x^{k+1} ← Tx^k converges weakly to a fixed point of T.
Mann's version
Let T be a nonexpansive operator with a fixed point. Then, the iteration
  x^{k+1} ← (1 − λ_k)x^k + λ_k Tx^k
(known as the KM iteration) converges weakly to a fixed point of T as long as λ_k > 0 and Σ_k λ_k(1 − λ_k) = ∞.
The λ_k condition is ensured if λ_k ∈ [ε, 1 − ε] (bounded away from 0 and 1).
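A sketch of why the averaging matters (the operator is the editor's example, not from the slides): T = rotation by 90° in R² is nonexpansive with unique fixed point (0, 0), and the plain iteration x^{k+1} = Tx^k circles forever; the KM iteration with a constant λ_k = λ ∈ (0, 1) converges.

```python
import math

def T(x):
    return (-x[1], x[0])     # rotation by 90 degrees: nonexpansive, Fix T = {0}

def km_step(x, lam=0.5):
    # KM iteration: x^{k+1} = (1 - lam) x^k + lam T x^k
    tx = T(x)
    return ((1 - lam) * x[0] + lam * tx[0],
            (1 - lam) * x[1] + lam * tx[1])

x = (1.0, 0.0)
for _ in range(100):
    x = km_step(x)

print(math.hypot(x[0], x[1]) < 1e-6)  # KM iterates reach the fixed point 0
```

Note that the plain iteration x ← T(x) keeps ‖x‖ = 1 forever here, so the averaging is what produces convergence.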
Remarks
- averagedness can be relaxed to quasi-averagedness
- summable errors can be added to the iteration
- in finite dimensions, the demiclosedness principle is not needed
- this fundamental result is largely ignored, yet often reproved, in R^n
- Browder–Göhde–Kirk fixed-point theorem: if T has no fixed point and λ_k is bounded away from 0 and 1, the sequence {x^k} is unbounded
- speed: ‖Rx^k‖² = o(1/k); no rate for ‖x^k − x*‖
- many more applications than Banach's fixed-point theorem
Special cases
proximal-point algorithm
problem: minimize_x f(x)
proximal operator: for λ > 0, T := prox_{λf}
Since T is firmly nonexpansive, x^{k+1} ← prox_{λf}(x^k) converges weakly to a minimizer of f, if one exists.
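A concrete sketch (editor's example) of the proximal-point algorithm for f(x) = |x|, whose prox is the soft-thresholding operator in closed form; prox_{λf} is firmly nonexpansive, so the iteration converges to the minimizer x* = 0.

```python
def prox_abs(x, lam):
    # prox_{lam|.|}(x) = sign(x) * max(|x| - lam, 0)  (soft-thresholding)
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

x, lam = 5.0, 0.5
for _ in range(20):
    x = prox_abs(x, lam)       # proximal-point iteration x^{k+1} = prox_{lam f}(x^k)

print(x == 0.0)  # reaches the minimizer of |x| in finitely many steps
```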
Special cases
gradient descent
define the gradient-descent operator: T := I − γ∇f
iteration: x^{k+1} ← Tx^k = x^k − γ∇f(x^k)
Baillon–Haddad theorem: if f is convex and ∇f is L-Lipschitz, then
  ‖∇f(x) − ∇f(y)‖² ≤ L⟨x − y, ∇f(x) − ∇f(y)⟩
If f has a minimizer x*, then
  (2/(Lγ))‖γ∇f(x^k)‖² ≤ 2⟨x^k − x*, γ∇f(x^k)⟩
Directly expand ‖x^{k+1} − x*‖²:
  ‖x^{k+1} − x*‖² = ‖x^k − γ∇f(x^k) − x*‖²
    = ‖x^k − x*‖² − 2⟨x^k − x*, γ∇f(x^k)⟩ + ‖γ∇f(x^k)‖²
    ≤ ‖x^k − x*‖² − (2/(Lγ) − 1)‖γ∇f(x^k)‖².
Therefore, T is quasi-averaged if γ ∈ (0, 2/L). In fact, it is easy to show that T is averaged.
The convergence result applies to gradient descent.
Composition of operators
- If T_1, ..., T_m : H → H are nonexpansive, then T_1 ∘ ... ∘ T_m is nonexpansive.
- If T_1, ..., T_m : H → H are averaged, then T_1 ∘ ... ∘ T_m is averaged.
- The averagedness constants get worse: let T_i be α_i-averaged (allowing α_i = 1); then T = T_1 ∘ ... ∘ T_m is α-averaged with
  α = m / (m − 1 + 1/max_i α_i)
- In addition, if any T_i is contractive, then T_1 ∘ ... ∘ T_m is contractive.
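A numerical sketch of composing averaged operators (the two lines in R² are the editor's example): projections onto closed convex sets are firmly nonexpansive (1/2-averaged), so their composition is averaged, and iterating it converges to a point in the intersection (here, the origin).

```python
def proj_x_axis(p):
    return (p[0], 0.0)               # projection onto the line {y = 0}

def proj_diag(p):
    t = (p[0] + p[1]) / 2.0          # projection onto the line {y = x}
    return (t, t)

x = (4.0, 1.0)
for _ in range(60):
    x = proj_x_axis(proj_diag(x))    # composition of two 1/2-averaged maps

# converged to the intersection of the two lines, {(0, 0)}
print(abs(x[0]) < 1e-8 and abs(x[1]) < 1e-8)
```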
Special cases
projected-gradient method
convex problem: minimize_x f(x) subject to x ∈ C
assume sufficient intersection between dom f and C
define: T := proj_C ∘ (I − λ∇f)
assume ∇f is L-Lipschitz; let λ ∈ (0, 2/L)
since both proj_C and (I − λ∇f) are averaged, T is averaged
therefore, the following sequence converges weakly to a minimizer, if one exists:
  x^{k+1} ← Tx^k = proj_C(x^k − λ∇f(x^k))
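A minimal sketch of the projected-gradient iteration (the problem instance is the editor's choice): f(x) = (x − 3)² over C = [0, 1], so L = 2 and λ = 0.5 lies in (0, 2/L) = (0, 1); the constrained minimizer is x* = 1.

```python
def grad_f(x):
    return 2.0 * (x - 3.0)           # gradient of f(x) = (x - 3)^2, L = 2

def proj_C(x):
    return min(max(x, 0.0), 1.0)     # projection onto the interval [0, 1]

x, lam = 0.0, 0.5
for _ in range(50):
    x = proj_C(x - lam * grad_f(x))  # x^{k+1} = proj_C(x^k - lam * grad f(x^k))

print(x == 1.0)  # constrained minimizer of (x - 3)^2 over [0, 1] is x* = 1
```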
Special cases
prox-gradient method
convex problem: minimize_x f(x) + h(x)
assume sufficient intersection between dom f and dom h
define: T := prox_{λh} ∘ (I − λ∇f)
assume ∇f is L-Lipschitz; let λ ∈ (0, 2/L)
since both prox_{λh} and (I − λ∇f) are averaged, T is averaged
therefore, the following sequence converges weakly to a minimizer, if one exists:
  x^{k+1} ← Tx^k = prox_{λh}(x^k − λ∇f(x^k))
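A sketch of the prox-gradient iteration on an assumed instance (editor's example): f(x) = ½(x − 2)² (so L = 1) and h(x) = |x|, whose prox is soft-thresholding; setting the subgradient of f + h to zero gives the minimizer x* = 1.

```python
def grad_f(x):
    return x - 2.0                    # gradient of 0.5*(x - 2)^2, L = 1

def prox_abs(x, lam):
    # soft-thresholding: prox of lam*|.|
    return max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0)

x, lam = 0.0, 1.0                     # lam = 1 is in (0, 2/L) = (0, 2)
for _ in range(100):
    # x^{k+1} = prox_{lam h}(x^k - lam * grad f(x^k))
    x = prox_abs(x - lam * grad_f(x), lam)

print(abs(x - 1.0) < 1e-9)  # minimizer of 0.5*(x-2)^2 + |x| is x* = 1
```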
Special cases
Later in this course, we will see more special cases:
- forward-backward iteration
- Douglas–Rachford and Peaceman–Rachford iterations
- ADMM
- Tseng's forward-backward-forward iteration
- Davis–Yin iteration
- primal-dual iterations
Summary
- Fixed-point iteration and analysis are powerful tools
- Contractive T: a fixed point exists and is unique; the iteration converges strongly
- Nonexpansive T: the iteration is bounded, if a fixed point exists
- Averaged T: the iteration converges weakly, if a fixed point exists
- More power: closedness under composition