On moving averages

Heinz H. Bauschke, Joshua Sarada, and Xianfu Wang

April 1, 2013

Abstract

We show that the moving arithmetic average is closely connected to a Gauss–Seidel type fixed point method studied by Bauschke, Wang and Wylie, which was observed to converge only numerically. Our analysis establishes a rigorous proof of convergence of their algorithm in a special case; moreover, the limit is explicitly identified. Moving averages in Banach spaces and Kolmogorov means are also studied. Furthermore, we consider moving proximal averages and epi-averages of convex functions.

2010 Mathematics Subject Classification: Primary 15B51, 26E60, 47H10; Secondary 39A06, 65H04, 47J25, 49J53.

Keywords: Arithmetic mean, difference equation, epi-average, Kolmogorov mean, linear recurrence relation, means, moving average, proximal average, stochastic matrix.

Author addresses: Mathematics, University of British Columbia, Kelowna, BC V1V 1V7, Canada (heinz.bauschke@ubc.ca); 59 Tuscany Valley Hts NW, Calgary, Alberta T3L 2E7, Canada (jshsarada@gmail.com); Mathematics, University of British Columbia, Kelowna, BC V1V 1V7, Canada (shawnwang@ubc.ca).

1 Introduction

Throughout this paper, we assume that $m \in \{2, 3, \dots\}$ and that

(1) $X = \mathbb{R}^m$

is an $m$-dimensional real Euclidean space with standard inner product $\langle \cdot, \cdot \rangle$

and induced norm $\|\cdot\|$. We also assume that

(2) $\alpha_0, \dots, \alpha_{m-1}$ are nonnegative real numbers such that $\sum_{i=0}^{m-1} \alpha_i = 1$, and $\alpha_m := \alpha_0$.

It will be convenient to introduce the following notation for the partial sums and associated weights:

(3) $(\forall k \in \{0, 1, \dots, m-1\})\quad a_k = \sum_{i=0}^{k} \alpha_i$ and $a = (a_0, a_1, \dots, a_{m-1})$,

and

(4) $(\forall k \in \{0, 1, \dots, m-1\})\quad \lambda_k = \dfrac{a^\top e_k}{a^\top e} = \dfrac{a_k}{\sum_{i=0}^{m-1} a_i} = \dfrac{\sum_{i=0}^{k} \alpha_i}{\sum_{j=0}^{m-1} \sum_{i=0}^{j} \alpha_i}$ and $\lambda = (\lambda_0, \dots, \lambda_{m-1})$.

We will study the homogeneous linear difference equation

(5) $(\forall n \geq m)\quad y_n = \alpha_{m-1} y_{n-1} + \cdots + \alpha_0 y_{n-m}$,

where $(y_0, \dots, y_{m-1}) \in \mathbb{R}^m$. In the literature, this is called the moving average (it is also known as the rolling average, rolling mean or running average). It says that, given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by shifting forward, excluding the first number of the series and including the next number following the original subset in the series. This creates a new subset of numbers, which is averaged. This process is repeated over the entire data series. The moving average is widely applied in statistics, signal processing, econometrics and mathematical finance; see, e.g., [9, 10, 15, 13, 26]. In [4, 5], we observed the numerical convergence of a Gauss–Seidel type fixed point iteration, but we were unable to provide a rigorous proof. In this note, we present a connection between the moving average and this fixed point recursion; this allows us to give an analytical proof for the case when all monotone operators are zero.

Moreover, the limit is identified. However, this approach is unlikely to generalize to the general fixed point iteration due to the interlaced nonlinear resolvents. While the results rest primarily on results from linear algebra, we consider in the second half of the paper several related, highly nonlinear moving averages that exhibit a hidden linearity and thus can be rigorously studied.

The paper is organized as follows. In Section 2, we present various facts and auxiliary results rooted ultimately in linear algebra. In Section 3 we establish the connection between the moving average and the Gauss–Seidel type iteration scheme studied by Bauschke, Wang and Wylie, which leads to a rigorous proof for the convergence of their algorithm in the aforementioned special case. In Section 4 we show that the iteration matrix has a closed form when $\alpha_0 = 0$ and $\alpha_1 = \cdots = \alpha_{m-1} = 1/(m-1)$. Moving Kolmogorov means (also known as $f$-means) are considered in Section 5. In particular, various known means such as the arithmetic mean, harmonic mean, resolvent mean, etc. [10, 7] all turn out to be special cases of this general framework. The final Section 6 concerns moving proximal averages and epi-averages of convex functions.

Our notation follows [3, 19, 24, 25]. The identity operator $\mathrm{Id}$ is defined by $X \to X \colon x \mapsto x$. A mapping $T \colon X \to X$ is nonexpansive (Lipschitz-1) if $(\forall x \in X)(\forall y \in X)$ $\|Tx - Ty\| \leq \|x - y\|$. The set of fixed points of $T$ is denoted by $\mathrm{Fix}\, T = \{ x \in X : x = Tx \}$. The following matrix plays a central role in this paper:

(6) $A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \alpha_0 & \alpha_1 & \alpha_2 & \cdots & \alpha_{m-1} \end{pmatrix}.$

Note that

(7) $\mathrm{Fix}\, A = D = \{ x = (x, x, \dots, x) \in X : x \in \mathbb{R} \},$

where $D$ is also called the diagonal in $X$. By [19, page 648], the characteristic polynomial of $A$ is

(8) $p(x) = x^m - \alpha_{m-1} x^{m-1} - \cdots - \alpha_1 x - \alpha_0,$

and, in turn, $A$ is the (transpose of the) companion matrix of $p(x)$. As usual, we use $\mathbb{C}$ for the field of complex numbers; $\mathbb{R}_+$ ($\mathbb{R}_{++}$) for the nonnegative real numbers (positive real numbers); $\mathbb{N} = \{0, 1, 2, \dots\}$ for the natural numbers.

B C m m, ρ(b) = max λ σ(b) λ is the spectral radius of B and σ(b) denotes the spectrum of B, ie, the set of eigenvalues of B Given λ σ(b), we say that λ is simple if its algebraic multiplicity is 1 and that it is semisimple if its algebraic and geometric multiplicities coincide A simple eigenvalue is always semisimple, see [19, pages 510 511] A vector y C m {0} satisfying By = λy (y B = λy ) is called a right-hand (left-hand) eigenvector of B with respect to the eigenvalue λ (The denotes the conjugate transpose of a vector or a matrix) Then range of an operator B is denoted by ran(b); if B is linear, we denote its kernel by ker(b) It will be convenient to denote the ith standard unit column vector by e i and to also set e = m i=1 e i = (1, 1,, 1) X We denote by S N N the space of all N N real symmetric matrices, while S+ N N and S++ N N stand, respectively, for the set of N N positive semidefinite and positive definite matrices The greatest common divisor of a set of integers S is denoted by gcd(s) Turning to functions, we let Γ(X) be the set of all functions that are convex, lower semicontinuous, and proper We denote the quadratic energy function by q = 2 1 2 Finally, the Fenchel conjugate g of a function g is given by g (x) = sup y X ( x, y g(y)) 2 Linear algebraic results To make our analysis self-contained, let us gather in this section some useful facts and auxiliary results on stochastic matrices, linear recurrence relations and the convergence of matrix powers Basic properties of stochastic matrices Recall that a matrix B with nonnegative entries is called stochastic (or row-stochastic) if each row sum is equal to 1 (See [8, Chapter 8] or [19, Section 84] for more on stochastic matrices) Fact 21 (See [19, pages 689 and 696]) Let B R m m be stochastic Then ρ(b) = 1 and 1 is a semisimple eigenvalue of B with eigenvector e The proof of the next result is a simple verification and hence omitted Proposition 22 The vectors a and e are, respectively, left-hand and 
right-hand eigenvectors of A associated with the eigenvalue 1 4
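Proposition 2.2 is easy to try out numerically. The following NumPy sketch (the weight vector is randomly generated with an arbitrary seed; it is an illustration, not part of the paper's argument) builds $A$ as in (6) and checks both eigenvector identities:

```python
import numpy as np

# Numerical check of Proposition 2.2 for randomly drawn weights:
# e is a right-hand and a a left-hand eigenvector of A for eigenvalue 1.
rng = np.random.default_rng(0)
m = 5
alpha = rng.random(m)
alpha /= alpha.sum()                          # nonnegative weights summing to 1, cf. (2)

A = np.zeros((m, m))
A[np.arange(m - 1), np.arange(1, m)] = 1.0    # shift rows of A, cf. (6)
A[-1, :] = alpha                              # bottom row (alpha_0, ..., alpha_{m-1})

e = np.ones(m)
a = np.cumsum(alpha)                          # partial sums a_k from (3)
print(np.allclose(A @ e, e), np.allclose(a @ A, a))   # True True
```

The left-hand identity $a^\top A = a^\top$ works because column $j \geq 2$ of $A$ contributes $a_{j-2} + a_{m-1}\alpha_{j-1} = a_{j-1}$, and $a_{m-1} = 1$ takes care of the first column.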

Linear recurrence relations and polynomials

Consider the linear recurrence relation

(9) $(\forall n \geq m)\quad y_n = \alpha_{m-1} y_{n-1} + \cdots + \alpha_0 y_{n-m},$

where $(y_0, \dots, y_{m-1}) \in \mathbb{R}^m$. Setting $(\forall n \in \mathbb{N})$ $y_{[n]} = (y_n, y_{n+1}, \dots, y_{n+m-1})$, we see that we can rewrite (9) as

(10) $(\forall n \in \mathbb{N})\quad y_{[n+1]} = A y_{[n]} = \cdots = A^{n+1} y_{[0]}.$

This explains our interest in understanding the limiting behaviour of the powers of $A$, which yield information about the limiting behaviour of $(y_{[n]})_{n \in \mathbb{N}}$ and hence of $(y_n)_{n \in \mathbb{N}}$. The solution to (9) depends on the roots of the characteristic polynomial

(11) $p(x) = x^m - \alpha_{m-1} x^{m-1} - \cdots - \alpha_1 x - \alpha_0$

of $A$.

Fact 2.3 (See [22, page 90] or [21, pages 74-75]) Let $k$ be the number of distinct roots $u_1, \dots, u_k$ of (11), with multiplicities $m_1, \dots, m_k$, respectively. Then for each $i \in \{1, \dots, k\}$, there exists a polynomial $q_i$ of degree $m_i - 1$ such that the general solution of (9) is

(12) $(\forall n \in \mathbb{N})\quad y_n = \sum_{i=1}^{k} q_i(n)\, u_i^n.$

Consequently, if $k = m$, i.e., all roots are distinct, then there exists $(\nu_1, \dots, \nu_m) \in \mathbb{C}^m$ such that

(13) $(\forall n \in \mathbb{N})\quad y_n = \sum_{i=1}^{m} \nu_i u_i^n.$

Fact 2.4 (Ostrowski) (See [22, Theorem 12.2]) Let $(b_1, \dots, b_m) \in \mathbb{R}^m_+$ and set

(14) $q(x) = x^m - b_1 x^{m-1} - \cdots - b_m.$

Suppose that $\gcd\{ k \in \{1, \dots, m\} : b_k > 0 \} = 1$. Then $q$ has a unique positive root $r$. Moreover, $r$ is a simple root of $q$, and the modulus of every other root of $q$ is strictly less than $r$.

See also [12, 23] for further results on polynomials. Let us now state a basic assumption that we will impose repeatedly.

Basic Hypothesis 2.5 The polynomial $p(x)$ given by (11) has a unique positive root, which is simple and equal to 1, and each other root has modulus strictly less than 1. This happens if one of the following holds:

(i) $\gcd\{ i \in \{1, \dots, m\} : \alpha_{m-i} > 0 \} = 1$.

(ii) $\alpha_{m-1} > 0$.

(iii) $(\exists i \in \{1, \dots, m-1\})$ $\alpha_{m-i}\,\alpha_{m-(i+1)} > 0$.

(iv) $(\exists \{i, j\} \subseteq \{1, \dots, m-1\})$ $\gcd\{i, j\} = 1$ and $\alpha_{m-i}\,\alpha_{m-j} > 0$.

Proof. (i): This is clear from Fact 2.4 and the definition of $p(x)$. (ii)-(iv): Each condition implies (i).

Limits of matrix powers and linear recurrence relations

Fact 2.6 (See [19, pages 383-386, 518 and 630]) Let $B \in \mathbb{C}^{m \times m}$. Then the following hold:

(i) $(B^n)_{n \in \mathbb{N}}$ converges to a nonzero matrix if and only if 1 is a semisimple eigenvalue of $B$ and every other eigenvalue of $B$ has modulus strictly less than 1.

(ii) If $\lim_{n \in \mathbb{N}} B^n$ exists, then it is the projector onto $\ker(\mathrm{Id} - B)$ along $\mathrm{ran}(\mathrm{Id} - B)$.

(iii) If $\rho(B) = 1$ is a simple eigenvalue with right-hand and left-hand eigenvectors $x$ and $y$, respectively, then $\lim_{n \in \mathbb{N}} B^n = \dfrac{x y^*}{y^* x}$.

Corollary 2.7 Suppose that the Basic Hypothesis 2.5 holds. Then

(15) $\lim_{n \in \mathbb{N}} A^n = \dfrac{e a^\top}{a^\top e} = \dfrac{1}{\sum_{k=0}^{m-1} \sum_{i=0}^{k} \alpha_i} \begin{pmatrix} \alpha_0 & \sum_{i=0}^{1} \alpha_i & \cdots & \sum_{i=0}^{m-2} \alpha_i & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ \alpha_0 & \sum_{i=0}^{1} \alpha_i & \cdots & \sum_{i=0}^{m-2} \alpha_i & 1 \end{pmatrix} = e \lambda^\top.$

Proof. Clear from Proposition 2.2, Fact 2.6(iii) and the definitions.

Corollary 2.8 Suppose that the Basic Hypothesis 2.5 holds and consider the sequence $(y_n)_{n \in \mathbb{N}}$ generated by the linear recurrence relation

(16) $(\forall n \geq m)\quad y_n = \alpha_{m-1} y_{n-1} + \cdots + \alpha_0 y_{n-m},$

where $y = (y_0, \dots, y_{m-1}) \in \mathbb{R}^m$. Then

(17) $\lim_{n \in \mathbb{N}} y_n = \dfrac{a^\top y}{a^\top e} = \dfrac{\sum_{k=0}^{m-1} a_k y_k}{\sum_{k=0}^{m-1} a_k} = \dfrac{\sum_{k=0}^{m-1} \bigl( \sum_{i=0}^{k} \alpha_i \bigr) y_k}{\sum_{k=0}^{m-1} \sum_{i=0}^{k} \alpha_i} = \sum_{k=0}^{m-1} \lambda_k y_k.$

Proof. This follows from Corollary 2.7 and (10).

Definition 2.9 Let $Y$ be a real Banach space and consider the sequence $(y_n)_{n \in \mathbb{N}}$ in $Y$ generated by the linear recurrence relation

(18) $(\forall n \geq m)\quad y_n = \alpha_{m-1} y_{n-1} + \cdots + \alpha_0 y_{n-m},$

where $y = (y_0, \dots, y_{m-1}) \in Y^m$. If $\lim_{n \in \mathbb{N}} y_n$ exists, then we set it equal to $l(y)$ and we write $y \in \mathrm{dom}\, l$.

Remark 2.10 Note that $\mathrm{dom}\, l$ is a linear subspace of $Y^m$, that $\{ y = (y, \dots, y) \in Y^m : y \in Y \} \subseteq \mathrm{dom}\, l$, and that $l \colon \mathrm{dom}\, l \to Y$ is a linear operator. Furthermore,

(19) $(\forall (y_0, \dots, y_{m-1}) \in Y^m)\quad \{y_n\}_{n \in \mathbb{N}} \subseteq \mathrm{conv}\{y_0, \dots, y_{m-1}\} \subseteq \mathrm{span}\{y_0, \dots, y_{m-1}\}.$

This implies

(20) $(\forall y = (y_0, \dots, y_{m-1}) \in \mathrm{dom}\, l)\quad l(y) \in \mathrm{conv}\{y_0, \dots, y_{m-1}\} \subseteq \mathrm{span}\{y_0, \dots, y_{m-1}\}$

because the convex hull of finitely many points is compact (see, e.g., [1, Corollary 5.30]). We are now ready to lift Corollary 2.8 to general Banach spaces.

Corollary 2.11 Suppose that the Basic Hypothesis 2.5 holds and consider the sequence $(y_n)_{n \in \mathbb{N}}$ generated by the linear recurrence relation

(21) $(\forall n \geq m)\quad y_n = \alpha_{m-1} y_{n-1} + \cdots + \alpha_0 y_{n-m},$

where $y = (y_0, \dots, y_{m-1}) \in Y^m$ and $Y$ is a real Banach space. Then

(22) $\lim_{n \in \mathbb{N}} y_n = l(y_0, \dots, y_{m-1}) = \dfrac{\sum_{k=0}^{m-1} a_k y_k}{\sum_{k=0}^{m-1} a_k} = \dfrac{\sum_{k=0}^{m-1} \bigl( \sum_{i=0}^{k} \alpha_i \bigr) y_k}{\sum_{k=0}^{m-1} \sum_{i=0}^{k} \alpha_i} = \sum_{k=0}^{m-1} \lambda_k y_k.$

Proof. Denote the right-hand side of (22) by $z$, let $y^* \in Y^*$ and define $(\forall n \in \mathbb{N})$ $\eta_n = y^*(y_n)$. Applying $y^*$ to (21) gives rise to the linear recurrence relation

(23a) $(\forall n \geq m)\quad \eta_n = \alpha_{m-1} \eta_{n-1} + \cdots + \alpha_0 \eta_{n-m},$

where

(23b) $(\eta_0, \dots, \eta_{m-1}) = \bigl( y^*(y_0), \dots, y^*(y_{m-1}) \bigr) \in \mathbb{R}^m.$

Corollary 2.8 and the linearity and continuity of $y^*$ imply that

(24) $y^*(y_n) = \eta_n \to \dfrac{\sum_{k=0}^{m-1} a_k \eta_k}{\sum_{k=0}^{m-1} a_k} = y^*\Bigl( \dfrac{\sum_{k=0}^{m-1} a_k y_k}{\sum_{k=0}^{m-1} a_k} \Bigr) = y^*(z).$

It follows that $(y_n)_{n \in \mathbb{N}}$ converges weakly to $z$. On the other hand, $(y_n)_{n \in \mathbb{N}}$ lies not only in a compact subset but also in a finite-dimensional subspace of $Y$ (see Remark 2.10). Altogether, we deduce that $(y_n)_{n \in \mathbb{N}}$ converges strongly to $z$.
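Corollary 2.8 and the Basic Hypothesis 2.5 can be illustrated with a short numerical experiment; the weight vector and starting values below are arbitrary illustrative choices (with $\alpha_{m-1} > 0$, so condition (ii) of the Basic Hypothesis applies):

```python
import numpy as np

# Numerical sketch of Corollary 2.8 for m = 4.
alpha = np.array([0.1, 0.2, 0.3, 0.4])      # alpha_{m-1} > 0: Basic Hypothesis 2.5(ii)
m = alpha.size

# Roots of p(x) = x^m - alpha_{m-1} x^{m-1} - ... - alpha_0, cf. (11):
roots = np.roots(np.concatenate(([1.0], -alpha[::-1])))
print(np.isclose(np.max(np.abs(roots)), 1.0))   # True: the dominant root is 1
print(int(np.sum(np.abs(roots) > 0.9)))         # 1: all other roots lie inside the unit disc

# Iterate y_n = alpha_{m-1} y_{n-1} + ... + alpha_0 y_{n-m}:
y = [1.0, 4.0, 2.0, 7.0]                        # arbitrary y_0, ..., y_{m-1}
for _ in range(200):
    y.append(float(np.dot(alpha, y[-m:])))      # alpha_0 pairs with y_{n-m}, etc.

# Limit predicted by (17):
a = np.cumsum(alpha)                            # partial sums a_k from (3)
lam = a / a.sum()                               # weights lambda_k from (4)
print(abs(y[-1] - np.dot(lam, y[:m])) < 1e-12)  # True: iterates settle at the weighted mean
```

For these weights $\lambda = (0.05, 0.15, 0.3, 0.5)$, so the iterates approach $0.05 \cdot 1 + 0.15 \cdot 4 + 0.3 \cdot 2 + 0.5 \cdot 7 = 4.75$.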

Reducible matrices

Recall (see, e.g., [19, page 671]) that $B \in \mathbb{C}^{m \times m}$ is reducible if there exists a permutation matrix $P$ such that

(25) $P^\top B P = \begin{pmatrix} U & W \\ 0 & V \end{pmatrix},$

where $U$ and $V$ are nontrivial square matrices; otherwise, $B$ is irreducible.

Proposition 2.12 Let $B \in \mathbb{C}^{m \times m}$ have at least one zero column. Then $B$ is reducible.

Proof. By assumption, there exists a permutation matrix $P$ such that the first column of $BP$ is equal to the zero vector. Then

(26) $P^\top B P = \begin{pmatrix} U & W \\ 0 & V \end{pmatrix},$

where $U = 0 \in \mathbb{C}^{1 \times 1}$ and $V \in \mathbb{C}^{(m-1) \times (m-1)}$ are nontrivial square matrices.

Circulant matrices

It is interesting to compare the above results to circulant matrices. To this end, we set

(27) $C = \begin{pmatrix} \alpha_0 & \alpha_1 & \alpha_2 & \cdots & \alpha_{m-1} \\ \alpha_{m-1} & \alpha_0 & \alpha_1 & \cdots & \alpha_{m-2} \\ \alpha_{m-2} & \alpha_{m-1} & \alpha_0 & \cdots & \alpha_{m-3} \\ \vdots & & & \ddots & \vdots \\ \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_0 \end{pmatrix}.$

Fact 2.13 (See [18, Theorems 1 and 2]) Set $U = \{ k \in \{1, \dots, m\} : \alpha_k > 0 \}$. Then

(28) $L = \lim_{n \in \mathbb{N}} C^n$ exists if and only if $\gamma := \gcd(U \cup \{m\}) = \gcd((U - U) \cup \{m\}),$

in which case the entries of $L$ satisfy

(29) $L_{i,j} = \begin{cases} \gamma/m, & \text{if } i \equiv j \pmod{\gamma}; \\ 0, & \text{otherwise.} \end{cases}$

Remark 2.14 Note that $C$ is bistochastic, i.e., both $C$ and $C^\top$ are stochastic. Since permutation matrices are clearly nonexpansive, it follows from Birkhoff's theorem (see, e.g., [17, Theorem 8.7.1]) that $C$ is a convex combination of nonexpansive matrices. Hence, $C$ is nonexpansive as well. Moreover, one verifies readily that $\mathrm{Fix}\, C = D$.
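As a quick numerical companion to Fact 2.13, take $m = 4$ and $\alpha = (0, 1/2, 1/2, 0)$, an arbitrary illustrative choice: then $U = \{1, 2\}$, both gcds in (28) equal 1, and the powers of $C$ should converge to the matrix with every entry equal to $\gamma/m = 1/4$:

```python
import numpy as np

# Illustration of Fact 2.13 with m = 4 and alpha = (0, 1/2, 1/2, 0):
# U = {1, 2}, gcd(U u {m}) = gcd((U - U) u {m}) = 1, so the powers
# of C converge and every entry of the limit equals gamma/m = 1/4.
alpha = np.array([0.0, 0.5, 0.5, 0.0])
m = alpha.size
C = np.array([np.roll(alpha, i) for i in range(m)])   # circulant matrix as in (27)
limit = np.full((m, m), 1.0 / m)
print(np.allclose(np.linalg.matrix_power(C, 200), limit))   # True
```

The subdominant eigenvalues of this $C$ have modulus $\sqrt{2}/2$, so 200 powers are far more than enough for machine-precision agreement.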

Example 2.15 Suppose there exists $i \in \{1, \dots, m-1\}$ such that $\{i, i+1\} \subseteq U$. Then $(C^n)_{n \in \mathbb{N}}$ converges to the orthogonal projector onto $D$, i.e.,

(30) $\lim_{n \in \mathbb{N}} C^n = \frac{1}{m} e e^\top = \frac{1}{m} \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$

For further results on circulant matrices, see [14] and in particular [16].

3 Main results

In this section, we study the convergence of the Gauss–Seidel type fixed point iteration scheme proposed in [4, 5]. Recalling (2), we set

(31a) $T_1 = \begin{pmatrix} \alpha_0 & \alpha_1 & \alpha_2 & \cdots & \alpha_{m-1} \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \quad T_2 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \alpha_{m-1} & \alpha_0 & \alpha_1 & \cdots & \alpha_{m-2} \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \quad \dots,$

(31b) $T_m = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_0 \end{pmatrix};$

or entrywise

(32) $(T_k)_{i,j} = \begin{cases} \alpha_{m-k+j}, & \text{if } i = k \text{ and } 1 \leq j < k; \\ \alpha_{j-k}, & \text{if } i = k \text{ and } k \leq j \leq m; \\ 1, & \text{if } i \neq k \text{ and } j = i; \\ 0, & \text{otherwise.} \end{cases}$

Furthermore, we define

(33) $T = T_m \cdots T_1.$

When $\alpha_0 = 0$ and $\alpha_1 = \cdots = \alpha_{m-1} = \frac{1}{m-1}$, iterating $T$ corresponds exactly to the algorithm investigated in [4, 5] when all monotone operators are zero. The authors there observed the numerical convergence but were not able to provide a rigorous proof. We will present a rigorous proof by connecting $T$ and $A$. We start with some basic properties.

Proposition 3.1 Let $k \in \{1, \dots, m\}$. Then the following hold:

(i) $T_k$ and $T$ are stochastic.

(ii) If $\alpha_0 = 0$, then neither $T_k$ nor $T$ is nonexpansive.

(iii) If $\alpha_0 = 0$, then neither $T_k$ nor $T$ is irreducible.

Proof. (i): By (2), $T_k$ is stochastic, and so is therefore $T$ as a product of stochastic matrices. (ii): Indeed, in this case $T_k(e - e_k) = e$ and thus $\|T_k(e - e_k)\|^2 = m > m - 1 = \|e - e_k\|^2$. Similarly, $T(e - e_1) = e$ and thus $\|T(e - e_1)\|^2 = m > m - 1 = \|e - e_1\|^2$. (iii): In this case, $T_1$ and $T_k$ have a zero column, hence so does $T$. Now apply Proposition 2.12.

Proposition 3.1 illustrates that neither the theory of nonexpansive mappings nor that of irreducible matrices (as used in, e.g., [9, 19]) is applicable to study limiting properties of $(T^n)_{n \in \mathbb{N}}$. Fortunately, we are able to base our analysis on the right-shift and left-shift operators, which are respectively given by

(34) $R = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \quad L = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix}.$

Note that $R$ and $L$ are permutation matrices which satisfy the following properties, which we will use repeatedly:

(35) $R^m = L^m = \mathrm{Id}, \quad R^\top = R^{-1} = L, \quad L^\top = L^{-1} = R, \quad R^{m-k} = L^k, \text{ where } k \in \{0, \dots, m\}.$

A key observation is the following result, which connects $T_k$ and the companion matrix $A$.

Proposition 3.2 For every $k \in \{1, \dots, m\}$, we have

(36a) $T_k = R^k A L^{k-1},$

(36b) $T_k T_{k-1} \cdots T_1 = R^k A^k,$

(36c) $T = A^m.$

Proof. We prove this by induction on $k$. Clearly, (36a) and (36b) hold when $k = 1$. Now assume that (36a) and (36b) hold for some $k \in \{1, \dots, m-1\}$. Then $T_{k+1} = R T_k L = R (R^k A L^{k-1}) L = R^{k+1} A L^k$, which is (36a) for $k+1$. Hence $T_{k+1} \cdots T_1 = T_{k+1} (T_k \cdots T_1) = (R^{k+1} A L^k)(R^k A^k) = R^{k+1} A^{k+1}$, which is (36b) for $k+1$. Finally, (36c) follows from (33) and (36b) with $k = m$.

We are now able to derive our main result, which resolves a special case of an open problem posed in [5].

Theorem 3.3 (main result) Suppose that the Basic Hypothesis 2.5 holds. Then

(37) $\lim_{n \in \mathbb{N}} T^n = \dfrac{e a^\top}{a^\top e} = \dfrac{1}{\sum_{k=0}^{m-1} \sum_{i=0}^{k} \alpha_i} \begin{pmatrix} \alpha_0 & \sum_{i=0}^{1} \alpha_i & \cdots & \sum_{i=0}^{m-2} \alpha_i & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ \alpha_0 & \sum_{i=0}^{1} \alpha_i & \cdots & \sum_{i=0}^{m-2} \alpha_i & 1 \end{pmatrix} = e \lambda^\top$

and $\mathrm{Fix}\, T = D = \mathrm{Fix}\, T_1 \cap \cdots \cap \mathrm{Fix}\, T_m$.

Proof. In view of (36c), it is clear that $(T^n)_{n \in \mathbb{N}}$ is a subsequence of $(A^n)_{n \in \mathbb{N}}$. Hence (37) follows from Corollary 2.7. Now set $L = \lim_{n \in \mathbb{N}} T^n = e a^\top / (a^\top e)$. On the one hand, $D = \mathrm{Fix}\, A \subseteq \mathrm{Fix}\, T$ by (7) and (36c). On the other hand, $\mathrm{Fix}\, T \subseteq \mathrm{ran}\, L \subseteq D$. Altogether, $\mathrm{Fix}\, T = D$. Finally, clearly $\mathrm{Fix}\, T_1 \cap \cdots \cap \mathrm{Fix}\, T_m \subseteq \mathrm{Fix}\, T = D \subseteq \mathrm{Fix}\, T_1 \cap \cdots \cap \mathrm{Fix}\, T_m$.

Remark 3.4 Theorem 3.3, with the choice $(\alpha_0, \dots, \alpha_{m-1}) = (m-1)^{-1}(0, 1, 1, \dots, 1)$, settles [5, Questions Q1 and Q2 on p. 36] when all resolvents coincide with $\mathrm{Id}$. Furthermore, (37) gives an explicit formula for the limit of $(T^n)_{n \in \mathbb{N}}$.

4 A special case: $\alpha_0 = 0$ and $\alpha_1 = \cdots = \alpha_{m-1} = 1/(m-1)$

This assignment of parameters was the setting of [5]. In this case, even closed forms for $T$, and for the partial products $T_k T_{k-1} \cdots T_1$ and $A^k$, are available, as we demonstrate in this section. To this end, we abbreviate

(38) $\gamma = \dfrac{1}{m-1}.$

Then (31) and (6) turn into

(39a) $T_1 = \begin{pmatrix} 0 & \gamma & \gamma & \cdots & \gamma \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \quad T_2 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \gamma & 0 & \gamma & \cdots & \gamma \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \quad \dots,$

(39b) $T_m = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ \gamma & \gamma & \gamma & \cdots & 0 \end{pmatrix}, \quad \text{and} \quad A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & \gamma & \gamma & \cdots & \gamma \end{pmatrix}.$

Proposition 4.1 Define $\widetilde{T} \in \mathbb{R}^{m \times m}$ entrywise by

(40) $\widetilde{T}_{i,j} := \begin{cases} \gamma (1+\gamma)^{i-1}, & \text{if } i < j; \\ \gamma (1+\gamma)^{i-1} - \gamma (1+\gamma)^{i-j}, & \text{if } i \geq j. \end{cases}$

Let $k \in \{1, \dots, m\}$. Then

(41) $T_k T_{k-1} \cdots T_1 = \begin{pmatrix} \text{first } k \text{ rows of } \widetilde{T} \\ \begin{matrix} 0_{(m-k) \times k} & I_{m-k} \end{matrix} \end{pmatrix};$

or entrywise

(42) $\bigl( T_k T_{k-1} \cdots T_1 \bigr)_{i,j} := \begin{cases} \widetilde{T}_{i,j}, & \text{if } 1 \leq i \leq k; \\ 1, & \text{if } k+1 \leq i \leq m \text{ and } j = i; \\ 0, & \text{if } k+1 \leq i \leq m \text{ and } j \neq i. \end{cases}$

In particular,

(43) $T = \widetilde{T}.$

Proof. Set

(44) $S_k = T_k T_{k-1} \cdots T_1.$

We prove (41) by induction on $k \in \{1, \dots, m\}$; the rest follows readily.

$k = 1$: On the one hand, by definition,

(45) $T_1 = \begin{pmatrix} 0 & \gamma & \gamma & \cdots & \gamma \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$

On the other hand, the first row of $\widetilde{T}$, using (40), is $(0, \gamma, \dots, \gamma)$. Hence the statement is true for $k = 1$.

$k - 1 \to k$: Assume the statement is true for $k - 1$, where $k \in \{2, \dots, m\}$, i.e.,

(46) $T_{k-1} \cdots T_1 = \begin{pmatrix} \text{first } k-1 \text{ rows of } \widetilde{T} \\ \begin{matrix} 0_{(m-k+1) \times (k-1)} & I_{m-k+1} \end{matrix} \end{pmatrix};$

or entrywise, this matrix $S = S_{k-1}$ satisfies

(47) $(\forall i)(\forall j)\quad S_{i,j} := \begin{cases} \widetilde{T}_{i,j}, & \text{if } 1 \leq i \leq k-1; \\ 1, & \text{if } k \leq i \leq m \text{ and } j = i; \\ 0, & \text{if } k \leq i \leq m \text{ and } j \neq i. \end{cases}$

Recall that

(48) $(\forall i)(\forall j)\quad (T_k)_{i,j} = \begin{cases} \gamma, & \text{if } i = k \text{ and } j \neq k; \\ 1, & \text{if } i \neq k \text{ and } j = i; \\ 0, & \text{otherwise.} \end{cases}$

It is clear that $T_k S = T_k T_{k-1} \cdots T_1$. We must show that

(49) $T_k S \stackrel{?}{=} \begin{pmatrix} \text{first } k \text{ rows of } \widetilde{T} \\ \begin{matrix} 0_{(m-k) \times k} & I_{m-k} \end{matrix} \end{pmatrix}.$

We do this entrywise, and thus fix $i$ and $j$ in $\{1, \dots, m\}$.

Case 1: $1 \leq i \leq k-1$. Then $(T_k S)_{i,j} = \sum_{l=1}^{m} (T_k)_{i,l} S_{l,j} = (T_k)_{i,i} S_{i,j} = S_{i,j}$, which shows that the first $k-1$ rows of $T_k S$ are the same as the first $k-1$ rows of $S$, which in turn are the same as the first $k-1$ rows of $\widetilde{T}$, as required.

Case 2: $i = k$. Then $(T_k S)_{i,j} = (T_k S)_{k,j} = \sum_{l=1}^{m} (T_k)_{k,l} S_{l,j} = \sum_{l=1}^{k-1} (T_k)_{k,l} S_{l,j} + \sum_{l=k+1}^{m} (T_k)_{k,l} S_{l,j} = \sum_{l=1}^{k-1} \gamma \widetilde{T}_{l,j} + \sum_{l=k+1}^{m} \gamma S_{l,j}.$

Subcase 2.1: $j \leq k-1$. Then $(T_k S)_{i,j} = (T_k S)_{k,j} = \sum_{l=1}^{k-1} \gamma \widetilde{T}_{l,j} = \gamma \sum_{l=1}^{j-1} \widetilde{T}_{l,j} + \gamma \sum_{l=j}^{k-1} \widetilde{T}_{l,j} = \gamma \sum_{l=1}^{j-1} \gamma(1+\gamma)^{l-1} + \gamma \sum_{l=j}^{k-1} \bigl( \gamma(1+\gamma)^{l-1} - \gamma(1+\gamma)^{l-j} \bigr) = \gamma^2 \sum_{l=1}^{k-1} (1+\gamma)^{l-1} - \gamma \sum_{l=j}^{k-1} \gamma(1+\gamma)^{l-j} = \gamma(1+\gamma)^{k-1} - \gamma(1+\gamma)^{k-j} = \widetilde{T}_{k,j} = \widetilde{T}_{i,j}.$

Subcase 2.2: $j = k$. Then $(T_k S)_{i,j} = (T_k S)_{k,k} = \sum_{l=1}^{k-1} \gamma \widetilde{T}_{l,k} = \gamma \sum_{l=1}^{k-1} \gamma(1+\gamma)^{l-1} = \gamma(1+\gamma)^{k-1} - \gamma = \widetilde{T}_{k,k} = \widetilde{T}_{i,j}.$

Subcase 2.3: $j \geq k+1$. Then $(T_k S)_{i,j} = (T_k S)_{k,j} = \sum_{l=1}^{k-1} \gamma \widetilde{T}_{l,j} + \gamma = \gamma \sum_{l=1}^{k-1} \gamma(1+\gamma)^{l-1} + \gamma = \gamma(1+\gamma)^{k-1} = \widetilde{T}_{k,j} = \widetilde{T}_{i,j}.$

Case 3: $i \geq k+1$. Then $(T_k S)_{i,j} = \sum_{l=1}^{m} (T_k)_{i,l} S_{l,j} = (T_k)_{i,i} S_{i,j} = S_{i,j}.$

Subcase 3.1: $j = i$. Then $(T_k S)_{i,j} = S_{i,i} = 1.$

Subcase 3.2: $j \neq i$. Then $(T_k S)_{i,j} = S_{i,j} = 0.$

Altogether, we have shown that

(50) $T_k S = \begin{pmatrix} \text{first } k \text{ rows of } \widetilde{T} \\ \begin{matrix} 0_{(m-k) \times k} & I_{m-k} \end{matrix} \end{pmatrix}.$

The "In particular" part is the case when $k = m$.

Corollary 4.2 Let $k \in \{1, \dots, m\}$. Then

(51) $A^k = \begin{pmatrix} \begin{matrix} 0_{(m-k) \times k} & I_{m-k} \end{matrix} \\ \text{first } k \text{ rows of } \widetilde{T} \end{pmatrix};$

or entrywise

(52) $\bigl( A^k \bigr)_{i,j} = \begin{cases} 1, & \text{if } 1 \leq i \leq m-k \text{ and } j = i + k; \\ 0, & \text{if } 1 \leq i \leq m-k \text{ and } j \neq i + k; \\ \widetilde{T}_{i+k-m,j}, & \text{if } m-k+1 \leq i \leq m. \end{cases}$

Proof. Proposition 3.2 yields $A^k = L^k (T_k T_{k-1} \cdots T_1)$, and the result now follows from Proposition 4.1.

Remark 4.3 We do not know whether it is possible to obtain a closed form for the powers of $T$ that does not rely on the eigenvalue analysis from Section 2. If such a formula exists, one may be able to construct a different proof of Theorem 3.3 in this setting.
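Both the identity $T = A^m$ of Proposition 3.2 and the closed form (40) can be sanity-checked numerically. The sketch below (an illustration with arbitrarily chosen weights and a small $m$, not part of the proofs) builds each $T_k$ by placing a cyclically shifted copy of the weight vector in row $k$:

```python
import numpy as np

def companion(alpha):
    """The matrix A of (6) for a weight vector alpha."""
    m = alpha.size
    A = np.zeros((m, m))
    A[np.arange(m - 1), np.arange(1, m)] = 1.0
    A[-1, :] = alpha
    return A

def gauss_seidel_T(alpha):
    """T = T_m ... T_1 of (33); row k of T_k is alpha shifted right by k-1, cf. (31)/(32)."""
    m = alpha.size
    T = np.eye(m)
    for k in range(1, m + 1):
        Tk = np.eye(m)
        Tk[k - 1, :] = np.roll(alpha, k - 1)
        T = Tk @ T
    return T

# Proposition 3.2: T = A^m, for an arbitrary weight vector.
alpha = np.array([0.1, 0.2, 0.3, 0.4])
ok_prop32 = np.allclose(gauss_seidel_T(alpha),
                        np.linalg.matrix_power(companion(alpha), alpha.size))
print(ok_prop32)   # True

# Proposition 4.1: the closed form (40) for alpha_0 = 0,
# alpha_1 = ... = alpha_{m-1} = 1/(m-1), here with m = 5.
m = 5
gamma = 1.0 / (m - 1)
special = np.concatenate(([0.0], np.full(m - 1, gamma)))
Ttilde = np.zeros((m, m))
for i in range(1, m + 1):
    for j in range(1, m + 1):
        Ttilde[i - 1, j - 1] = gamma * (1 + gamma) ** (i - 1)
        if i >= j:
            Ttilde[i - 1, j - 1] -= gamma * (1 + gamma) ** (i - j)
print(np.allclose(gauss_seidel_T(special), Ttilde))   # True
```

The shift `np.roll(alpha, k - 1)` reproduces the entrywise description (32): entry $j$ of row $k$ is $\alpha_{(j-k) \bmod m}$.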

5 Kolmogorov means

We now turn to moving Kolmogorov means, which are sometimes also called $f$-means.

Theorem 5.1 (moving Kolmogorov means) Let $D$ be a nonempty topological Hausdorff space, let $Z$ be a real Banach space, and let $f \colon D \to Z$ be an injective mapping such that $\mathrm{ran}\, f$ is convex. Consider the recurrence relation

(53a) $(\forall n \geq m)\quad y_n = f^{-1}\bigl( \alpha_{m-1} f(y_{n-1}) + \cdots + \alpha_0 f(y_{n-m}) \bigr),$

where

(53b) $(y_0, \dots, y_{m-1}) \in D^m.$

Then the following hold:

(i) The sequence $(y_n)_{n \in \mathbb{N}}$ is well defined and the sequence $(f(y_n))_{n \in \mathbb{N}}$ lies in the compact convex set $\mathrm{conv}\{ f(y_0), \dots, f(y_{m-1}) \}$.

(ii) Suppose that the Basic Hypothesis 2.5 holds and that $f^{-1}$ is continuous from $f(D)$ to $D$. Then

(54) $\lim_{n \in \mathbb{N}} y_n = f^{-1}\Bigl( \sum_{k=0}^{m-1} \lambda_k f(y_k) \Bigr) = f^{-1}\Bigl( \dfrac{\sum_{k=0}^{m-1} a_k f(y_k)}{\sum_{k=0}^{m-1} a_k} \Bigr) = f^{-1}\Bigl( \dfrac{\sum_{k=0}^{m-1} \bigl( \sum_{i=0}^{k} \alpha_i \bigr) f(y_k)}{\sum_{k=0}^{m-1} \sum_{i=0}^{k} \alpha_i} \Bigr).$

Proof. (i): This is similar to Remark 2.10 and proved inductively. (ii): By Corollary 2.11, $(f(y_n))_{n \in \mathbb{N}}$ converges to $l(f(y_0), \dots, f(y_{m-1}))$, which belongs to $\mathrm{conv}\{ f(y_0), \dots, f(y_{m-1}) \} \subseteq \mathrm{conv}\, f(D) = f(D) = \mathrm{ran}\, f$ by assumption and (i). Since $f^{-1}$ is continuous, the result follows from (22).

Theorem 5.1(ii) allows for a universe of examples by appropriately choosing $f$ and $(\alpha_0, \dots, \alpha_{m-1})$. Let us present some classical moving means on (subsets of) the real line with the equal weights $\alpha_0 = \cdots = \alpha_{m-1} = 1/m$. Note that the Basic Hypothesis 2.5 is satisfied.

Corollary 5.2 (some classical means) Let $D$ be a nonempty subset of $\mathbb{R}$ and let $(y_0, \dots, y_{m-1}) \in D^m$. Here is a list of choices for $f$ in Theorem 5.1, along with the limits obtained by (54):

(i) If $D = \mathbb{R}$ and $f = \mathrm{Id}$, then the moving arithmetic mean sequence satisfies

(55) $y_n = \frac{1}{m} y_{n-1} + \cdots + \frac{1}{m} y_{n-m} \to \frac{2}{m(m+1)} \sum_{j=0}^{m-1} (j+1) y_j.$

(ii) If $D = \mathbb{R}_{++}$ and $f = \ln$, then the moving geometric mean sequence satisfies

(56) $y_{n+m} = \sqrt[m]{y_{n+m-1} \cdots y_n} \to \Bigl( \prod_{j=0}^{m-1} y_j^{\,j+1} \Bigr)^{\frac{2}{m(m+1)}}.$

(iii) If $D = \mathbb{R}_+$ and $f \colon x \mapsto x^p$, then the moving Hölder mean sequence satisfies

(57) $y_{n+m} = \Bigl( \frac{1}{m} y_{n+m-1}^p + \cdots + \frac{1}{m} y_n^p \Bigr)^{1/p} \to \Bigl( \frac{2}{m(m+1)} \sum_{j=0}^{m-1} (j+1) y_j^p \Bigr)^{1/p}.$

(iv) If $D = \mathbb{R}_{++}$ and $f \colon x \mapsto 1/x$, then the moving harmonic mean sequence satisfies

(58) $y_{n+m} = \Bigl( \frac{1}{m} \frac{1}{y_{n+m-1}} + \cdots + \frac{1}{m} \frac{1}{y_n} \Bigr)^{-1} \to \Bigl( \frac{2}{m(m+1)} \sum_{j=0}^{m-1} (j+1) \frac{1}{y_j} \Bigr)^{-1}.$

Let us provide some means situated in a space of matrices.

Corollary 5.3 (some matrix means) Let $(Y_0, \dots, Y_{m-1}) \in (\mathcal{S}^{N \times N}_{++})^m$. Then the following hold:

(i) The moving arithmetic mean satisfies

(59) $Y_n = \frac{1}{m} Y_{n-1} + \cdots + \frac{1}{m} Y_{n-m} \to \frac{2}{m(m+1)} \sum_{j=0}^{m-1} (j+1) Y_j.$

(ii) The moving harmonic mean satisfies

(60) $Y_n = \Bigl( \frac{1}{m} Y_{n-1}^{-1} + \cdots + \frac{1}{m} Y_{n-m}^{-1} \Bigr)^{-1} \to \Bigl( \frac{2}{m(m+1)} \sum_{j=0}^{m-1} (j+1) Y_j^{-1} \Bigr)^{-1}.$

(iii) The moving resolvent mean (see also [7]) satisfies

(61) $Y_n = \Bigl( \frac{1}{m} (Y_{n-1} + \mathrm{Id})^{-1} + \cdots + \frac{1}{m} (Y_{n-m} + \mathrm{Id})^{-1} \Bigr)^{-1} - \mathrm{Id} \to \Bigl( \frac{2}{m(m+1)} \sum_{j=0}^{m-1} (j+1)(Y_j + \mathrm{Id})^{-1} \Bigr)^{-1} - \mathrm{Id}.$

Proof. This follows from Theorem 5.1 with $D = \mathcal{S}^{N \times N}_{++}$ and $f = \mathrm{Id}$, $f \colon Z \mapsto Z^{-1}$, and $f \colon Z \mapsto (Z + \mathrm{Id})^{-1}$, respectively. Note that matrix inversion is continuous; see, e.g., [19, Example 6.2.7].
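The moving geometric mean of (56) is easy to simulate; the starting values below are arbitrary illustrative choices:

```python
import numpy as np

# Moving geometric mean with m = 3 and equal weights, cf. (56):
# y_n = (y_{n-1} y_{n-2} y_{n-3})^(1/3), started at arbitrary positive values.
m = 3
y0 = [2.0, 8.0, 4.0]
y = list(y0)
for _ in range(200):
    y.append(float(np.prod(y[-m:]) ** (1.0 / m)))

# Closed-form limit (prod_j y_j^(j+1))^(2/(m(m+1))) from (56):
limit = np.prod([v ** (j + 1) for j, v in enumerate(y0)]) ** (2.0 / (m * (m + 1)))
print(abs(y[-1] - limit) < 1e-9)    # True
```

Taking logarithms turns this into the linear recurrence (16), which is exactly how Theorem 5.1 applies with $f = \ln$; here the limit is $(2 \cdot 8^2 \cdot 4^3)^{1/6} = 2^{13/6} \approx 4.49$.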

6 Moving proximal and epi-averages for functions

In this last section, we apply the moving average (in particular Corollary 2.8) to functions in the context of proximal and epi-averages. We set

(62) $Z = \mathbb{R}^l$, where $l \in \{1, 2, \dots\}.$

Let us start by reviewing a key notion (see also [2, 11, 25]).

Definition 6.1 (epi-convergence) (See [25, Proposition 7.2]) Let $g$ and $(g_n)_{n \in \mathbb{N}}$ be functions from $Z$ to $\left]-\infty, +\infty\right]$. Then

(i) $(g_n)_{n \in \mathbb{N}}$ pointwise converges to $g$, in symbols $g_n \xrightarrow{p} g$, if for every $z \in Z$ we have $g(z) = \lim_{n \in \mathbb{N}} g_n(z)$;

(ii) $(g_n)_{n \in \mathbb{N}}$ epi-converges to $g$, in symbols $g_n \xrightarrow{e} g$, if for every $z \in Z$ one has

(a) $(\forall (z_n)_{n \in \mathbb{N}})$ $z_n \to z \Rightarrow g(z) \leq \varliminf g_n(z_n)$, and

(b) $(\exists (y_n)_{n \in \mathbb{N}})$ $y_n \to z$ and $\varlimsup g_n(y_n) \leq g(z)$.

Let $g \colon Z \to \left]-\infty, +\infty\right]$ be an extended real valued function. Recall that

(63) $(g \mathbin{\square} q) \colon z \mapsto \inf_{y \in Z} \bigl( g(y) + \tfrac{1}{2} \|z - y\|^2 \bigr)$

is the Moreau envelope [20] of $g$. By [25, Theorem 2.26] (see also [20]), if $g \in \Gamma(Z)$, then $g \mathbin{\square} q$ is convex and continuously differentiable on $Z$. Especially important are the following connections among epi-convergence, pointwise convergence of envelopes, and the epicontinuity of Fenchel conjugation.

Fact 6.2 (See [25, Theorems 7.37 and 11.34]) Let $(g_n)_{n \in \mathbb{N}}$ and $g$ be in $\Gamma(Z)$. Then

(64) $g_n \xrightarrow{e} g \iff g_n \mathbin{\square} q \xrightarrow{p} g \mathbin{\square} q \iff g_n^* \xrightarrow{e} g^*.$

We now turn to the moving proximal average (see [6] for further information on this operation).

Theorem 6.3 (moving proximal average) Let $(g_0, \dots, g_{m-1}) \in (\Gamma(Z))^m$ and define

(65) $(\forall n \geq m)\quad g_n = \bigl( \alpha_{m-1} (g_{n-1} + q)^* + \cdots + \alpha_0 (g_{n-m} + q)^* \bigr)^* - q.$

Then the following hold:

(i) The sequence $(g_n)_{n \in \mathbb{N}}$ lies in $\Gamma(Z)$.

(ii) $(\forall n \geq m)\quad g_n \mathbin{\square} q = \alpha_{m-1} (g_{n-1} \mathbin{\square} q) + \cdots + \alpha_0 (g_{n-m} \mathbin{\square} q).$

(iii) Suppose that the Basic Hypothesis 2.5 holds. Then

(66a) $g_n \xrightarrow{e} g = \bigl( \lambda_{m-1} (g_{m-1} + q)^* + \cdots + \lambda_0 (g_0 + q)^* \bigr)^* - q$

and

(66b) $g_n^* \xrightarrow{e} g^* = \bigl( \lambda_{m-1} (g_{m-1}^* + q)^* + \cdots + \lambda_0 (g_0^* + q)^* \bigr)^* - q.$

Proof. (i): Combine [6, Propositions 4.3 and 5.2]. (ii): See [6, Theorem 6.2]. (iii): It follows from (ii) and Corollary 2.8 that $g_n \mathbin{\square} q \xrightarrow{p} \sum_{k=0}^{m-1} \lambda_k (g_k \mathbin{\square} q) = g \mathbin{\square} q$. The result now follows from Fact 6.2 and [6, Theorem 5.1].

We conclude the paper with the moving epi-average. Recall that for $\alpha > 0$ and $g \in \Gamma(Z)$, $\alpha \star g = \alpha g \circ \alpha^{-1} \mathrm{Id}$, and that $0 \star g = \iota_{\{0\}}$, where $\iota_{\{0\}}(0) = 0$ and $\iota_{\{0\}}(z) = +\infty$ if $z \in Z \smallsetminus \{0\}$.

Theorem 6.4 (moving epi-average) Let $(g_0, \dots, g_{m-1}) \in (\Gamma(Z))^m$ and define

(67) $(\forall n \geq m)\quad g_n = (\alpha_{m-1} \star g_{n-1}) \mathbin{\square} \cdots \mathbin{\square} (\alpha_0 \star g_{n-m}).$

Suppose that for every $k \in \{0, \dots, m-1\}$, $g_k$ is cofinite, and that the Basic Hypothesis 2.5 holds. Then

(68) $g_n \xrightarrow{e} g = (\lambda_{m-1} \star g_{m-1}) \mathbin{\square} \cdots \mathbin{\square} (\lambda_0 \star g_0).$

Proof. It follows from [3, Proposition 15.7(iv)] that $(\forall n \in \mathbb{N})$ $g_n \in \Gamma(Z)$, $g_n$ is cofinite, and $g_n^*$ is continuous everywhere. Furthermore, taking the Fenchel conjugate of (67) yields

(69) $(\forall n \geq m)\quad g_n^* = \alpha_{m-1} g_{n-1}^* + \cdots + \alpha_0 g_{n-m}^*.$

This and Corollary 2.8 imply not only that $g_n^* \xrightarrow{p} g^*$ but also that the sequence $(g_n^*)_{n \in \mathbb{N}}$ is equi-lsc everywhere (see [25, pages 248f]). In turn, [25, Theorem 7.10] implies that $g_n^* \xrightarrow{e} g^*$. The conclusion therefore follows from Fact 6.2.
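The Moreau envelope (63), which drives both results above, can be computed numerically on a grid. The sketch below is a standalone illustration (not part of the paper's argument) for $g = |\cdot|$ on the real line, whose envelope is the classical Huber function:

```python
import numpy as np

# Grid approximation of the Moreau envelope (63) for g = |.| on R.
# For this g the envelope is the Huber function:
#   z^2/2 if |z| <= 1, and |z| - 1/2 otherwise.
ys = np.linspace(-5.0, 5.0, 200001)           # fine grid over which the infimum is taken

def envelope(z):
    return float(np.min(np.abs(ys) + 0.5 * (z - ys) ** 2))

def huber(z):
    return 0.5 * z * z if abs(z) <= 1 else abs(z) - 0.5

for z in (-2.0, -0.3, 0.0, 0.7, 3.0):
    print(abs(envelope(z) - huber(z)) < 1e-6)   # True at each test point
```

The smoothing is visible in the formula itself: the nonsmooth kink of $|\cdot|$ at 0 is replaced by the quadratic piece $z^2/2$, in line with the continuous differentiability of $g \mathbin{\square} q$ noted after (63).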

Acknowledgments

The authors are grateful to an anonymous referee, Jon Borwein and Tamás Erdélyi for some helpful comments. Heinz Bauschke was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Canada Research Chair Program. Joshua Sarada was partially supported by the Irving K. Barber Endowment Fund. Xianfu Wang was partially supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] C.D. Aliprantis and K.C. Border, Infinite Dimensional Analysis, third edition, Springer, 2006.

[2] H. Attouch, Variational Convergence for Functions and Operators, Applicable Mathematics Series, Pitman Advanced Publishing Program, Boston, MA, 1984.

[3] H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.

[4] H.H. Bauschke, X. Wang, and C.J.S. Wylie, Fixed points of averages of resolvents: geometry and algorithms, http://arxiv.org/pdf/1102.1478v1, February 2011.

[5] H.H. Bauschke, X. Wang, and C.J.S. Wylie, Fixed points of averages of resolvents: geometry and algorithms, SIAM Journal on Optimization 22 (2012), 24-40.

[6] H.H. Bauschke, R. Goebel, Y. Lucet, and X. Wang, The proximal average: basic theory, SIAM Journal on Optimization 19 (2008), no. 2, 766-785.

[7] H.H. Bauschke, S.M. Moffat, and X. Wang, The resolvent average for positive semidefinite matrices, Linear Algebra and its Applications 432 (2010), no. 7, 1757-1771.

[8] A. Berman and R.J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, SIAM, 1994.

[9] D. Borwein, J.M. Borwein, and B. Sims, On the solution of linear mean recurrences, American Mathematical Monthly, to appear.

[10] J.M. Borwein and P.B. Borwein, Pi and the AGM, Wiley, New York, 1987.

[11] J.M. Borwein and J.D. Vanderwerff, Convex Functions, Cambridge University Press, 2010.

[12] P. Borwein and T. Erdélyi, Polynomials and Polynomial Inequalities, Springer-Verlag, 1995.

[13] G. Box, G. Jenkins, and G. Reinsel, Time Series Analysis: Forecasting and Control, third edition, Prentice Hall, Englewood Cliffs, NJ, 1994.

[14] W.S. Chou, B.S. Du, and P.J.-S. Shiue, A note on circulant transition matrices in Markov chains, Linear Algebra and its Applications 429 (2008), 1699-1704.

[15] P.L. Combettes and T. Pennanen, Generalized Mann iterates for constructing fixed points in Hilbert spaces, Journal of Mathematical Analysis and Applications 275 (2002), no. 2, 521-536.

[16] P.J. Davis, Circulant Matrices, Wiley, 1979.

[17] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1985.

[18] O. Krafft and M. Schaefer, Convergence of the powers of a circulant stochastic matrix, Linear Algebra and its Applications 127 (1990), 59-69.

[19] C.D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000.

[20] J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bulletin de la Société Mathématique de France 93 (1965), 273-299.

[21] J.M. Ortega, Numerical Analysis: A Second Course, second edition, SIAM, 1990.

[22] A.M. Ostrowski, Solution of Equations and Systems of Equations, second edition, Academic Press, New York and London, 1966.

[23] V.V. Prasolov, Polynomials, Springer-Verlag, Berlin, 2004.

[24] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.

[25] R.T. Rockafellar and R.J.-B. Wets, Variational Analysis, Springer, corrected third printing, 2009.

[26] C.P. Tsokos, K-th moving, weighted and exponential moving average for time series forecasting models, European Journal of Pure and Applied Mathematics 3 (2010), 406-416.