Some Stochastic Shape Applications in Time Series and Markov Chains
by Ying Zhao
(Under the direction of Robert Lund)

Abstract

This dissertation explores several shape ordering applications in statistics, primarily in time series and reversible Markov chains. The dissertation has two major goals. First, we introduce shape orderings for stationary time series and explore their convergence rate ramifications. The shapes explored include increasing likelihood ratio, decreasing hazard rate, and new better than used, structures reminiscent of stochastic process settings. Examples of ARMA(p, q) time series having these shapes are presented. The shapes are then applied to obtain explicit geometric convergence rates for several one-step-ahead forecasting quantities. The second goal is to identify a monotonicity property in reversible Markov chains and to examine consequences of this structure. In particular, we show that the return times to every state in a reversible chain have a decreasing hazard rate structure on the even time indices. Good, and sometimes even optimal, convergence rates of a time-reversible Markov chain are deduced from this monotonicity.

Index words: Autocorrelation; Convergence Rate; Decreasing Hazard Rate; Decreasing Likelihood Ratio; Increasing Likelihood Ratio; Markov Chain; Mean Squared Error; Monotonicity; New Better than Used; Partial Autocorrelation; Renewal Sequence.

Some Stochastic Shape Applications in Time Series and Markov Chains by Ying Zhao. B.S., Tianjin University, P.R. China, 2000; M.A., York University, Canada, 2001. A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy. Athens, Georgia, 2005.

© 2005 Ying Zhao. All Rights Reserved.

Some Stochastic Shape Applications in Time Series and Markov Chains by Ying Zhao. Approved: Major Professor: Robert Lund. Committee: Nicole Lazar, William P. McCormick, Jaxk Reeves, John Stufken. Electronic Version Approved: Maureen Grasso, Dean of the Graduate School, The University of Georgia, August 2005.

5 Acknowledgements While the work with this dissertation has been extensive and trying, it has also been exciting, instructive, and fun. I am greatly indebted to many persons; without their trust, help, support and encouragement, this dissertation would not have been possible. First of all, I would like to thank my supervisor Dr. Robert Lund for his encouraging and inspiring way that guided me to a deeper understanding of knowledge during the work. Dr. Lund has such a kind spirit as well as a wealth of knowledge and an amazing ability to effectively mentor students. Your technical and editorial advice was essential in completing this dissertation and has taught me innumerable lessons and insights on the workings of academic research in general. I am also grateful to the faculty, staff, and graduate students at the Department of Statistics, with whom I have had valuable associations. You all helped me in various ways and made my experience at University of Georgia unforgettable. Special thanks go to Dr. Nicole Lazar, Dr. William McCormick, Dr. Jaxk Reeves, Dr. John Stufken, my advisory committee members, for timely assistance and insightful feedback. I thank my family in particular my parents Kehua and Xiuling and my sister Lin for their unconditional love and endless support to pursue my interests, for always being there for me. At last, to Rui Dai thanks for supporting me with your love and understanding, all the interesting discussions concerning with the work and the happiness you brought to me in the past years of my Ph.D. study life. iv

Table of Contents

Acknowledgements

1 Introduction
  1.1 References

2 Literature Review
  2.1 Time Series Overview
  2.2 Distribution Classes of Discrete Random Variables
  2.3 Monotone Markov Chains
  2.4 Renewal Theory
  2.5 References

3 Shape Orderings for Stationary Time Series
  3.1 Introduction
  3.2 Definition of Orderings
  3.3 Examples
  3.4 Convergence Rates
  3.5 References

4 A Monotonicity of Reversible Markov Chains
  4.1 Introduction
  4.2 Background
  4.3 Results
  4.4 Convergence Rates of Reversible Chains
  4.5 Examples
  4.6 Proofs
  4.7 References

5 Conclusions and Future Work
  5.1 Convergence of AR(p) Coefficients
  5.2 How Fast Can A Time-reversible Markov Chain Converge?
  5.3 References

8 Chapter 1 Introduction This dissertation explores applications of shape orderings in stationary time series and reversible Markov chains. The dissertation has two major goals: (1) to introduce shape orderings for stationary time series autocovariances and explore some of their convergence rate ramifications; and (2) to identify a monotonicity property in all reversible Markov chains and examine some consequences of this structure. The first part of this research introduces stochastic shape orderings into stationary time series analyses. We suggest orderings for the autocorrelation (ACF) and partial autocorrelation (PACF) functions of stationary series and study ramifications of such structures. The orderings proposed have direct analogies in stochastic process and reliability settings and include new worse than used, new better than used, increasing hazard rate, decreasing hazard rate, increasing likelihood ratio, and decreasing likelihood ratio. Whereas these orderings are not exhaustive decreasing reversed hazard rate, new better than used in expectation, increasing hazard rate average, for example, are useful orderings that are not expounded upon (cf. Shaked and Shanthikumar 1987, 1994; Kijima 1997) the flavor of what can be achieved will become clear. The primary contribution of the first part is the introduction of stochastic orderings into the time series analyst s toolbox. We are unaware of any previous literature with this slant. Given the utility of stochastic orderings in stochastic processes and reliability, the raw idea is expected to prove fruitful. Nonetheless, we do not make 1

9 2 any attempt at completing the issue here; indeed, the applications pursued are basically limited to explicit convergence rate bounds for mean squared prediction errors and Innovations Algorithm coefficients. It is expected that such shape orderings will eventually prove useful in likelihood computations, forecasting, and quantifying closeness of weighted and least squares estimators. The second application of stochastic shape orderings in this dissertation establishes a decreasing hazard rate (DHR) structure in reversible Markov chains on countable state spaces and studies some ramifications of this property. Time reversible Markov chains arise frequently in practice (cf. Ross 1996, Stroock 2005, Chen 2005) and include many Markov chain Monte Carlo (MCMC) generated chains. The research here is motivated by the Markov chain convergence rate problem. MCMC simulation algorithms were voted as one of the top 10 algorithms of all time by the IEEE society, the official bookkeepers of algorithms. Statisticians have been clever in constructing Markov chains that converge statistically to an equilibrium distribution with preset characteristics. This allows one to study many complex systems via simulation. Unfortunately, the number of iterations n needed to run the chain for a burn-in period to reach equilibrium remains unclear. In complex simulation settings where each iteration could take days, this is not feasible even with the best available computing. Such convergence rate issues have plagued statisticians in a variety of settings. Recent progress on convergence rates for stochastically ordered Markov Chains has been made (cf. Meyn and Tweedie 1993, 1994; Lund and Tweedie 1996; Lund et al. 1996; Kijima 1997, and the references therein). Hence, one is motivated to study renewal convergence rates under more general ordering structures. Our work here shows that the return time distribution to each and every state in a countable state-space reversible Markov chain has the decreasing hazard rate (DHR) property when examined on the even time indices. This structure is first proven for finite

10 3 state Markov chains and then extended to countable state spaces via a truncation argument. The DHR property identified is then used to derive a clean and explicit convergence rate bound for reversible Markov chains. This bound is even optimal in some cases. The DHR result imparts a geometry into reversible chain analyses. For example, a DHR first return distribution implies that if a fixed state has not been visited in the last l time units, then the chances of visiting the state in the next l time units are even smaller (compared to unconditional information). Markov chain analysts have frequently used stochastic orderings to drive discourse. For instance, stochastic monotonicity was used by Lund and Tweedie (1996) in coupling arguments to extract very sharp rates of convergence for ordered chains. Lindvall (1992) and Kijima (1997) are prominent references where stochastic orderings are used to assess stability of various chains. Keilson and Kester (1978), Brown (1980), Shaked and Shanthikumar (1994), Liggett (1989), Hansen and Frenk (1991), and Berenhaut and Lund (2002) are other authors who mix renewal theory, Markov chains, and stochastic orderings. That the DHR structure is identified along the subsequence of even integers is also noteworthy. For this implies that the two-step chain might be more amenable to analysis in reversible settings than the one-step chain. For MCMC simulators, this merely involves iterating the chain twice at each step instead of once, a straightforward task. The remainder of the dissertation proceeds as follows. Chapter 2 presents a brief review of stationary time series, distribution classes of discrete random variables, stochastically monotone Markov chains, and renewal theory. Chapter 3 and 4 are selfcontained manuscripts; each chapter includes its own list of references. In Chapter 3, we clarify our definition of orderings for the autocorrelation (ACF) and partial autocorrelation (PACF) functions of stationary time series. Examples of common autoregressive moving-average (ARMA(p, q)) series which have one or more of the

11 4 introduced orderings are provided. Chapter 3 then applies the results to derive some explicit geometric decay rates for quantities encountered in forecasting of stationary series. Chapter 4 identifies a monotonicity inherent to all reversible Markov chains and examines some consequences of this structure. In particular, it shows that the return times to every state in a reversible chain have a decreasing hazard rate on the even time indices. Good, and sometimes even optimal convergence rates of a time reversible Markov chain are deduced from this monotonicity. Chapter 5 concludes the dissertation by presenting some avenues for future work. 1.1 References [1] Berenhaut, K. S., and Lund, R. B. (2002). Renewal convergence rates for DHR and NWU lifetimes, Probability in the Engineering and Informational Sciences, 16, [2] Brown, M. (1980). Bounds, inequalities, and monotonicity properties for some specialized renewal processes, The Annals of Probability, 8, [3] Chen, M. F. (2005). Eigenvalues, Inequalities and Ergodic Theory, London: Springer. [4] Hansen, B. G., and Frenk, J. B. G. (1991). Some monotonicity properties of the delayed renewal function, Journal of Applied Probability, 28, [5] Keilson, J., and Kester, A. (1978). Unimodality preservation in Markov chains, Stochastic Processes and their Applications, 7, [6] Kijima, M. (1997). Markov Processes for Stochastic Modeling, London: Chapman and Hall.

12 5 [7] Liggett, T. (1989). Total positivity and renewal theory, In: Probability, Statistics, and Mathematics, Edited by T. W. Anderson, K. B. Athreya, and D. L. Iglehart, , Boston: Academic Press. [8] Lindvall, T. (1992). Lectures on the Coupling Method. New York: Wiley. [9] Lund, R. B., Meyn, S. P., and Tweedie, R. L. (1996). Computable exponential convergence rates for stochastically ordered Markov processes, Annals of Applied Probability, 6, [10] Lund, R. B. and Tweedie, R. L. (1996). Geometric convergence rates for stochastically ordered Markov chains, Mathematics of Operations Research, 20, [11] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability, New York: Springer. [12] Meyn, S. P. and Tweedie, R. L. (1994). Computable bounds for geometric convergence rates of Markov chains, Annals of Applied Probability, 4, [13] Ross, S. M. (1996). Stochastic Processes, second edition, New York: Wiley. [14] Shaked, M., and Shanthikumar, J. G. (1987). IFRA properties of some Markov processes with a general state space, Mathematics of Operations Research, 12, [15] Shaked, M., and Shanthikumar, J. G. (1994). Stochastic Orders and their Applications, New York: Academic Press.

13 [16] Stroock, D. (2005). An Introduction to Markov Processes, New York: Springer. 6

Chapter 2 Literature Review

2.1 Time Series Overview

This subsection reviews some results on time series. A good general reference is Brockwell and Davis (1991). A time series is a random sequence {X_t}, observed in a time-ordered fashion, with t denoting time. Here, t runs over a suitable index set T, very often T = {0, ±1, ±2, ...}, {1, 2, 3, ...}, [0, ∞), or (−∞, ∞). In this dissertation, T will be a subset of R. Moreover, we concentrate on discrete time, where the observations are taken over a discrete, uniformly spaced set, as is the case when observations are recorded at fixed time intervals. Time series analysis, unlike many other branches of statistics, relies critically on the assumption that the data values represent consecutive measurements taken at equally spaced time intervals.

Stationary Time Series

A second-order description of {X_t} specifies E[X_t] and Cov(X_t, X_s) for all t and s. To gain insight into the dependence between the observations in a time series, we introduce the autocovariance function, which extends the idea of a covariance matrix to infinite collections of random variables.

Definition (The Autocovariance Function). If {X_t, t ∈ T} is a process such that Var(X_t) < ∞ for each t ∈ T, then the autocovariance function γ_X(·,·) of {X_t} is defined by

    γ_X(r, s) = Cov(X_r, X_s) = E[(X_r − E X_r)(X_s − E X_s)],   r, s ∈ T.   (2.1.1)

Definition (Stationarity). The time series {X_t} is said to be stationary if (i) E[X_t^2] < ∞ for all t ∈ Z, (ii) E[X_t] is constant for all t ∈ Z, and (iii) γ_X(r, s) = γ_X(r + t, s + t) for all r, s, t ∈ Z.

Remark 1. The above notion is frequently referred to in the literature as weak stationarity, covariance stationarity, stationarity in the wide sense, or second-order stationarity (cf. Brockwell and Davis 1991).

Remark 2. If {X_t} is stationary, then γ_X(r, s) = γ_X(r − s, 0) for all r, s ∈ Z. It is therefore convenient to redefine the autocovariance function of a stationary sequence in just one variable:

    γ_X(h) := γ_X(h, 0) = Cov(X_{t+h}, X_t)   for all t, h ∈ Z.   (2.1.2)

The function γ_X(·) will be referred to as the autocovariance function (ACVF) of {X_t} and γ_X(h) as its value at lag h. The autocorrelation function (ACF) of {X_t} is defined at lag h by

    ρ_X(h) := γ_X(h)/γ_X(0) = Corr(X_{t+h}, X_t)   for all t, h ∈ Z.   (2.1.3)

The partial autocorrelation function, like the autocorrelation function, conveys vital information regarding the dependence structure of a stationary sequence. Like the autocorrelation function, the PACF depends only on the second-order properties of {X_t}. The partial autocorrelation α(h) at lag h is defined as the correlation between X_1 and X_{h+1} adjusted for the intervening observations X_2, ..., X_h.
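Both functions are easy to estimate from data. As a concrete illustration, the following minimal numpy sketch (the AR(1) model, sample size, seed, and function names are illustrative choices) estimates a sample ACF and converts it to partial autocorrelations with the Durbin-Levinson recursion (cf. Brockwell and Davis 1991, Chapter 5); for an AR(1), the estimated PACF should be negligible beyond lag 1.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag), using the usual
    divide-by-n autocovariance estimator."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([np.dot(xc[:n - h], xc[h:]) / n for h in range(max_lag + 1)])
    return gamma / gamma[0]

def pacf_from_acf(rho):
    """Partial autocorrelations alpha(1..m) from rho(0..m) via the
    Durbin-Levinson recursion."""
    m = len(rho) - 1
    alpha = np.zeros(m + 1)          # alpha[h] = PACF at lag h (alpha[0] unused)
    phi = np.zeros((m + 1, m + 1))   # phi[k, j] = j-th coefficient of the order-k predictor
    phi[1, 1] = alpha[1] = rho[1]
    for k in range(2, m + 1):
        num = rho[k] - np.dot(phi[k - 1, 1:k], rho[1:k][::-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], rho[1:k])
        phi[k, k] = alpha[k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
    return alpha[1:]

# Simulate an AR(1): X_t = 0.6 X_{t-1} + Z_t.  Its theoretical ACF is 0.6**h.
rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
x = np.zeros_like(z)
for t in range(1, len(z)):
    x[t] = 0.6 * x[t - 1] + z[t]

rho_hat = sample_acf(x, 10)
print(np.round(rho_hat, 3))
print(np.round(pacf_from_acf(rho_hat), 3))
```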

16 Definition (Partial Autocorrelation Function). The partial autocorrelation function α( ) of a stationary time series {X t } is defined at lags h 1 by 9 α(1) = Corr(X 2,X 1 ) = ρ(1), and for h 2, ( ) α(h) = Corr X h+1 P sp{1,x2,...,x h }(X h+1 ),X 1 P sp{1,x2,...,x h }(X 1 ).(2.1.4) Here, sp( ) denotes closed linear span; the projections P sp{1,x2,...,x h }(X h+1 ) and P sp{1,x2,...,x h }(X 1 ) can be calculated from the classical prediction equations: n P sp{1,z1,...,z n}(x) = α i Z i, (2.1.5) where α 0 = E[X t ], Z 0 = 1, Z 1,...,Z n are mean zero random variables and α 1,...,α n satisfy the linear system of equations i=0 n α i Cov(Z i,z j ) = Cov(X,Z j ), j = 0, 1,...,n. (2.1.6) i=0 It is perhaps less well known that the ACF and PACF essentially uniquely determine each other (Ramsey, 1974). In what follows, we let {X t } be a zero mean covariance stationary series and let γ(h) = Cov(X t+h,x t ), ρ(h) = Corr(X t+h,x t ), and α(h) = Corr(X h+1,x 1 X 2,X 3,...,X h ) denote the autocovariance (ACVF), autocorrelation (ACF), and partial autocorrelation (PACF) of {X t }, respectively, at lag h Stationary ARMA Process The family of autoregressive moving-average (ARMA) processes plays a key role in modelling time series data. The ARMA family of stationary time series is defined in terms of solutions to linear difference equations with constant coefficients. The linear structure of ARMA processes leads also to a very simple theory of linear prediction,

17 10 and ARMA models are known to be dense in stationary short memory structures (cf. Brockwell and Davis 1991). Definition (The ARMA(p, q) Processes). The sequence {X t,t = 0, ±1, ±2,...} is said to be an ARMA(p, q) process if {X t } is stationary and if for every t, X t φ 1 X t 1 φ p X t p = Z t + θ 1 Z t θ q Z t q, (2.1.7) where {Z t } WN(0,σ 2 ), where WN(0,σ 2 ) denotes a white noise sequence (uncorrelated random variables with the constant variance σ 2 ). We say that {X t } is an ARMA(p, q) process with mean µ if {X t µ} is an ARMA(p, q) process. In (2.1.7), φ 1,...,φ p are called autoregressive coefficients and θ 1,...,θ q are called movingaverage coefficients. Equation (2.1.7) can be written symbolically in the compact form φ(b)x t = θ(b)z t, t = 0, ±1, ±2,..., (2.1.8) where φ and θ are the p th and q th degree polynomials φ(z) = 1 φ 1 z φ p z p (2.1.9) and θ(z) = 1 + θ 1 z + + θ q z q, (2.1.10) and B is the backward shift operator defined by B j X t = X t j, j = 0, ±1, ±2,... (2.1.11) The polynomials φ and θ will be referred to as the autoregressive and moving-average polynomials respectively in the difference equation (2.1.8). Example (MA(q) Processes). If φ(z) 1 then

18 11 X t = θ(b)z t (2.1.12) and {X t } is said to be a moving-average sequence of order q (or MA(q)). All MAs are stationary, since they are clearly linear combinations of stationary series (in this case, white noise). Furthermore, the moving-average ACVF can be derived as γ(h) = q h σ 2 j=0 θ j θ j+ h, if h q 0, if h > q. (2.1.13) Example (AR(p) Processes). If θ(z) 1, then φ(b)x t = Z t (2.1.14) and the sequence {X t } is said to be an autoregressive sequence of order p (or AR(p)). Not every AR equation has a stationary solution; indeed, an AR(1) with φ 1 = 1 is a random walk (without an initial condition, e.g. X 0 = 0) which is not stationary. In AR(p) cases the existence and uniqueness of a stationary solution to (2.1.14) merits closer investigation. For an AR(1) model, iterating k times yields X t = Z t + φ 1 X t 1 (2.1.15) As X t = Z t + φ 1 Z t φ k 1Z t k + φ k+1 1 X t k 1. (2.1.16) φ j 1Z t j is mean-square convergent (by the Cauchy criterion) when φ 1 < 1, j=0 we conclude that the solution to (2.1.15) is X t = φ j 1Z t j. (2.1.17) j=0

19 12 and Cov(X t+h,x t ) = lim E n = σ 2 φ h 1 [ ( n j=0 j=0 φ j 1Z t+h j)( n k=0 φ 2j 1 φ k 1Z t+k ) ] = σ 2 φ h 1 /(1 φ 2 1). (2.1.18) Moreover, {X t } as defined by (2.1.17) satisfies the difference equations (2.1.16) and can be shown to be the unique (in mean square) stationary solution. Definition (Causality). An ARMA(p, q) sequence {X t } is said to be causal with respect to {Z t }, if there exists a sequence of constants {ψ j } j=0 such that ψ j < and j=0 X t = ψ j Z t j, t = 0, ±1, ±2,... (2.1.19) j=0 Definition (Invertibility). An ARMA(p, q) sequence {X t } is said to be invertible with respect to {Z t }, if there exists a sequence of constants {π j } j=0 such that π j < and j=0 Z t = π j X t j, t = 0, ±1, ±2,... (2.1.20) j=0 A fundamental ARMA result is contained in the following theorem. Theorem Suppose that {X t } satisfies the ARMA(p,q) difference equation with {Z t } WN(0,σ 2 ) and that the polynomials φ( ) and θ( ) have no common zeroes. Then {X t } is causal if and only if φ(z) 0 for all z C such that z 1. The coefficients {ψ j } j=0 in (2.1.19) are determined by the relation

ψ(z) = Σ_{j=0}^∞ ψ_j z^j = θ(z)/φ(z),   |z| ≤ 1.   (2.1.21)

Proof. See Brockwell and Davis (1991).

Theorem. Suppose that {X_t} satisfies the ARMA(p, q) difference equation with {Z_t} ~ WN(0, σ²) and that φ(·) and θ(·) have no common zeroes. Then {X_t} is invertible if and only if θ(z) ≠ 0 for all z ∈ C such that |z| ≤ 1. The coefficients {π_j}_{j=0}^∞ in (2.1.20) are determined by the relation

    π(z) = Σ_{j=0}^∞ π_j z^j = φ(z)/θ(z),   |z| ≤ 1.   (2.1.22)

Proof. See Brockwell and Davis (1991).

2.2 Distribution Classes of Discrete Random Variables

This subsection overviews stochastic orderings (shapes) of discrete random variables. Kijima (1997) and Muller and Stoyan (2002) are comprehensive references on this topic. Let X be a discrete random variable taking values in {1, 2, 3, ...} with probability vector a = (a_i)_{i=1}^∞, where a_i = P[X = i] and Σ_{i=1}^∞ a_i = 1. If the support of X is finite, i.e., there is some N such that Σ_{i=1}^N a_i = 1, then a_i = 0 for all i > N. The hazard rate function of X is defined by

    h_i = a_i / A_i,   (2.2.1)

where A_i = P[X ≥ i] = Σ_{k=i}^∞ a_k, whenever A_i > 0. Note that A_i = 0 implies A_j = 0 for all j > i. An interpretation of the hazard rate function is merely

    h_i = P[X = i] / P[X ≥ i] = P[X = i | X ≥ i];   (2.2.2)

that is, h_i is the probability that, conditional on survival up to time i, the lifetime X is equal to i.

Definition (Increasing Hazard Rate). A discrete random variable X with distribution a = (a_i)_{i=1}^∞ is said to have an increasing hazard rate, denoted by X ∈ IHR or a ∈ IHR, if

    A_{i+1}² ≥ A_i A_{i+2},   i ≥ 0.   (2.2.3)

If the inequality is reversed, X is said to have a decreasing hazard rate and is denoted by X ∈ DHR or a ∈ DHR.

We note that, since A_i = 0 implies that A_j = 0 for j > i, X ∈ IHR if and only if

    A_{i+1}/A_i ≥ A_{i+2}/A_{i+1},   i ≥ 0,   (2.2.4)

with the convention 0/0 = 0. Since

    A_{i+1}/A_i = 1 − a_i/A_i = 1 − h_i,   (2.2.5)

provided that A_i > 0, X is IHR if and only if the hazard rate function of X is increasing on the support of X. Similarly, it is easily seen that X ∈ DHR if and only if A_{i+1}/A_i is increasing in i; that is, X is DHR if and only if the hazard rate function of X is decreasing. These observations justify the terms IHR and DHR in the definition above.

Next, the likelihood ratio function of X is defined by

    l_i = a_{i+1} / a_i,   (2.2.6)

whenever a_i > 0.

Definition (Decreasing Likelihood Ratio). A discrete random variable X with distribution a = (a_i)_{i=1}^∞ is said to have a decreasing likelihood ratio, denoted by X ∈ DLR or a ∈ DLR, if

    a_{i+1}² ≥ a_i a_{i+2}.   (2.2.7)

If the inequality is reversed, X is said to have an increasing likelihood ratio and is denoted by X ∈ ILR or a ∈ ILR.

Now suppose for simplicity that 0 < a_1 < 1. Then, if X ∈ DLR, a_i = 0 implies that a_j = 0 for all j > i. Hence X ∈ DLR implies that

    a_{i+1}/a_i = l_i ≥ l_{i+1} = a_{i+2}/a_{i+1},   i ≥ 0,   (2.2.8)

with the convention 0/0 = 0; that is, the likelihood ratio function of X is decreasing. Similarly, if X ∈ ILR then a_i > 0 for all i ≥ 0 and the likelihood ratio function of X is increasing.

Definition (New Better than Used). A discrete random variable X with distribution a = (a_i)_{i=1}^∞ is called new better than used, denoted by X ∈ NBU or a ∈ NBU, if

    A_{i+j} ≤ A_i A_j,   i, j ≥ 0.   (2.2.9)

If the inequality is reversed, X is called new worse than used and is denoted by X ∈ NWU or a ∈ NWU.

The NBU property can be written as

    P[X ≥ i] = A_i ≥ A_{i+j}/A_j = P[X ≥ i + j | X ≥ j],   j = 1, 2, ...,   (2.2.10)

with the convention 0/0 = 0. The left-hand side of the above inequality is the survival probability of a new unit after i time intervals, whereas the right-hand side is the survival probability of an old unit of age j for i additional time periods. The term new better than used is thus interpreted as saying that a new item lasts stochastically longer than a used item. Analogous to (2.2.10), the NWU property is written as

    P[X ≥ i] = A_i ≤ A_{i+j}/A_j = P[X ≥ i + j | X ≥ j],   j = 1, 2, ...,   (2.2.11)

with the convention 0/0 = 0. The term new worse than used is thus interpreted as saying that a new item lasts stochastically shorter than a used item.

Theorem. For the distribution classes of discrete random variables above, we have (i) DLR ⊂ IHR ⊂ NBU; (ii) ILR ⊂ DHR ⊂ NWU.

Proof. See Kijima (1997).

The containments in this theorem are all strict. In particular, there are IHR distributions that are not DLR.

2.3 Monotone Markov Chains

Markov chains are discrete-time stochastic processes that obey the so-called Markov forgetfulness property: the distribution of future states depends only on the current state and not on the previous history of the process. The Markov property was proposed by A. A. Markov (1856-1922) as part of his work on generalizing the classical limit theorems of probability. Markov chains have many applications in, for example, operations research, biology, engineering, and economics. We take the case of a discrete state space on the states i_0, i_1, i_2, ....

Definition (Markov Property). The sequence {X_n}_{n=0}^∞ is called a Markov chain on the states i_0, i_1, i_2, ... if, for each n ≥ 0 and all states i_0, ..., i_n and j,

    P[X_{n+1} = j | X_0 = i_0, ..., X_n = i_n] = P[X_{n+1} = j | X_n = i_n].   (2.3.1)
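Returning briefly to the distribution classes of Section 2.2, the defining inequalities are easy to check numerically for a given probability vector. The sketch below (function names and the truncation at 30 support points are illustrative choices) tests the IHR, DLR, and NBU conditions (2.2.3), (2.2.7), and (2.2.9). A geometric lifetime has constant hazard rate and sits on the boundary of all three classes, while a mixture of geometrics illustrates the DHR/ILR side of part (ii) of the containment theorem above.

```python
import numpy as np

def survival(a):
    """A_i = P[X >= i] for i = 1, ..., len(a), from a pmf a on {1, 2, ...}."""
    a = np.asarray(a, dtype=float)
    return a[::-1].cumsum()[::-1]

def is_ihr(a, tol=1e-12):
    """IHR check: A_{i+1}^2 >= A_i * A_{i+2} along the support, as in (2.2.3)."""
    A = survival(a)
    return bool(np.all(A[1:-1] ** 2 >= A[:-2] * A[2:] - tol))

def is_dlr(a, tol=1e-12):
    """DLR check: a_{i+1}^2 >= a_i * a_{i+2}, as in (2.2.7)."""
    a = np.asarray(a, dtype=float)
    return bool(np.all(a[1:-1] ** 2 >= a[:-2] * a[2:] - tol))

def is_nbu(a, tol=1e-12):
    """NBU check: A_{i+j} <= A_i * A_j for all i, j, as in (2.2.9)."""
    A = np.concatenate(([1.0], survival(a)))   # prepend A_0 = 1
    n = len(A)
    return all(A[i + j] <= A[i] * A[j] + tol
               for i in range(n) for j in range(n - i))

# Geometric lifetime a_i = p (1-p)^(i-1): constant hazard, so all three checks pass.
p = 0.3
geom = p * (1 - p) ** np.arange(30)          # truncated tail; adequate for illustration
print(is_dlr(geom), is_ihr(geom), is_nbu(geom))

# A mixture of two geometrics is DHR and ILR (its pmf is log-convex), so the IHR
# and DLR checks should both fail.
mix = 0.5 * 0.2 * 0.8 ** np.arange(30) + 0.5 * 0.7 * 0.3 ** np.arange(30)
print(is_ihr(mix), is_dlr(mix))
```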

Given the whole history {X_0 = i_0, ..., X_n = i_n}, the Markov property (2.3.1) states that the current state X_n = i_n determines the distributions of all future chain values X_{n+h} for h ≥ 0.

Definition (Monotone Markov Chain). A Markov chain is said to be stochastically monotone if, whenever {X_n}_{n=0}^∞ and {X'_n}_{n=0}^∞ are two realizations of the chain with X_0 ≥ X'_0 stochastically (i.e., P[X_0 > x] ≥ P[X'_0 > x] for all x), then X_n ≥ X'_n stochastically for all n ≥ 1. A monotone Markov chain merely stipulates that a higher initial level produces a (stochastically) higher level at all other times.

Example. Suppose that {X_n}_{n=0}^∞ is a stochastically monotone Markov chain on the states {0, 1, 2, ...}. Then the return time to state 0,

    τ_0 = inf{n ≥ 1 : X_n = 0},   given that X_0 = 0,   (2.3.2)

is NWU.

Proof.

    P[τ_0 ≥ i + j | X_0 = 0] = P[τ_0 ≥ i + j | τ_0 ≥ i, X_0 = 0] P[τ_0 ≥ i | X_0 = 0]
                             ≥ P[τ_0 ≥ j | X_0 = 0] P[τ_0 ≥ i | X_0 = 0],   (2.3.3)

where the last line follows from stochastic monotonicity.

Berenhaut and Lund (2002) discuss how to obtain a total variation Markov chain convergence rate from a chain with a DHR state, and give the following explicit rate and first constant.

Theorem. Suppose that {X_n}_{n=0}^∞ is an ergodic (aperiodic, positive recurrent, irreducible) Markov chain on a countable state space and that the chain has a state k such that τ_k (starting from X_0 = k) is DHR. Then for all n ≥ 0,

    sup_A |P[X_n ∈ A | X_0 = k] − π(A)| ≤ ((n + 1)/(R_F − 1)) R_F^{−n},   (2.3.4)

where π is the unique stationary measure of the chain and R_F denotes the radius of convergence of E[r^{τ_k} | X_0 = k] as a power series in r.

Proof. See Berenhaut and Lund (2002).

Computing R_F in practice may be difficult. However, one can show that the same bound applies to any r > 1 satisfying E[r^{τ_k} | X_0 = k] < ∞:

    sup_A |P[X_n ∈ A | X_0 = k] − π(A)| ≤ ((n + 1)/(r − 1)) r^{−n}.   (2.3.5)

Foster-Lyapunov drift methods (cf. Meyn and Tweedie, 1993), for example, allow practical identification of an r > 1 such that E_k[r^{τ_k}] < ∞ with minimal effort. Stochastic shapes and Markov chains have been linked for over twenty years now (Keilson and Kester, 1978); however, connecting shapes with explicit convergence rates remains relatively unexplored, with the dissertation by Berenhaut (2000) being an exception.

2.4 Renewal Theory

This section overviews some results in renewal theory and its applications. A stochastic process {N(t), t ≥ 0} is said to be a counting process if N(t) is the total number of events that have occurred up to time t. A counting process for which the times between successive events are independent and identically distributed with an arbitrary distribution is called a renewal process. For example, a Poisson process is a counting process for which the times between successive events are independent and identically distributed exponential random variables.
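Before developing the renewal machinery further, it may help to see how cheap the quantities in (2.3.3) and (2.3.5) are to evaluate for a small chain. The sketch below (the transition matrix, the truncation level N, and r = 1.2 are illustrative choices only) computes the first-return distribution to state 0 for a stochastically monotone birth-death chain, confirms the NWU inequality from the Example numerically, and tabulates the right-hand side of (2.3.5) next to the exact total variation distance; whether τ_0 is DHR, as the theorem assumes, is not verified here, so the comparison is purely illustrative.

```python
import numpy as np

# A small stochastically monotone (birth-death) chain on {0, 1, 2, 3}.
P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.6, 0.2, 0.2, 0.0],
              [0.0, 0.6, 0.2, 0.2],
              [0.0, 0.0, 0.6, 0.4]])

# First-return probabilities f[n] = P[tau_0 = n | X_0 = 0]: propagate the law of the
# chain started from 0, removing at each step the mass that has just hit state 0.
N = 400
f = np.zeros(N + 1)
v = P[0, :].copy()                       # law of X_1 given X_0 = 0
for n in range(1, N + 1):
    f[n] = v[0]                          # probability of first return exactly at step n
    v[0] = 0.0
    v = v @ P

A = f[::-1].cumsum()[::-1]               # A[n] = P[tau_0 >= n] for n >= 1

# Numerical confirmation of the NWU property established in (2.3.3).
nwu_ok = all(A[i + j] >= A[i] * A[j] - 1e-12
             for i in range(1, 60) for j in range(1, 60))
print("NWU holds numerically:", nwu_ok)

# Stationary distribution, and the right-hand side of (2.3.5) next to the exact
# total variation distance.  E[r^tau_0] is finite for this r (checked below on the
# truncated pmf); the DHR hypothesis itself is not checked in this sketch.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()
r = 1.2
print("E[r^tau_0] (truncated):", np.sum(f * r ** np.arange(N + 1)))
for n in [10, 25, 50]:
    tv = 0.5 * np.abs(np.linalg.matrix_power(P, n)[0, :] - pi).sum()
    bound = (n + 1) / (r - 1) * r ** (-n)
    print(n, round(tv, 8), round(bound, 6))
```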

26 Definition If the sequence of nonnegative random variables {X 1,X 2,...} is independent and identically distributed, then the counting process {N(t), t 0} is said to be a renewal process. We will work in discrete time in what follows and take X i to be supported on {1, 2, 3,...}. We call the X i in this case lifetimes. For a renewal process having lifetimes {X i } i=1 with distribution given by P[X i = k] = f k, set S 0 = 0, and S n = n i=1 X i, for n 1. Then S n denotes the time of the nth renewal. The renewal sequence {u n } n=0 is defined by u n = k=0 P[S k = n], for n 0. Hence u n is the probability of a renewal occurring at time n. The convention u 0 = 1 is made. We assume that X 1 is not supported on a sublattice of {1, 2,...}, that the mean recurrent time µ def = E[X 1 ] <, and that X 1 is not degenerate in the sense that 19 The classical recurrent event relation is P[X 1 > 1] > 0. (2.4.1) n u n = f k u n k, n 1. (2.4.2) k=1 It can be shown that (2.4.2) defines a one-to-one map between probability mass functions on the positive integers and renewal probabilities. The basic renewal theorem (Erdös, Feller and Pollard, 1949) states that as n for any non-lattice lifetime X 1. 1 def u n E[X 1 ] = u (2.4.3) The conventional approach to obtain rates of convergence in (2.4.3) places restrictions on the tails of the sequence {q n } n=0, where q n = P[X 1 > n], usually in the form of moment conditions. For example, Feller (1957) proved that if E[X 2 1] <, then

27 20 ( ) def 1 e n = u n u = o n (2.4.4) as n. Theorem (Kendall 1959) An s > 1 exists such that E[s X 1 ] < if and only if an r > 1 exists satisfying e n = o(r n ). (2.4.5) Lindvall (1979) gives a coupling proof of Theorem However, the rate r in the statement of Theorem is not explicit. For instance, if E[s X 1 ] <, (2.4.5) may not hold with r = s. Our goal here is to investigate conditions under which explicit r > 1 can be identified satisfying (2.4.5), and to identify the largest such r if possible. It will be shown in Chapter 4, that under decreasing hazard rate lifetimes, a good convergence rate bound can be obtained. Berenhaut and Lund (2001) rehashed Heathcote (1967) and Malyshev and Spieksma (1995) to show that the optimal geometric convergence rate in (2.4.5) involves roots of a power series. For complex z with z < 1, define U(z) = u n z n and F(z) = P[X 1 = n]z n = E[z X 1 ]. (2.4.6) n=0 n=1 The relationship U(z) = (1 F(z)) 1 follows from the classical recurrent event equation (2.4.2) and relates the two power series in (2.4.6). The radius of convergence of the power series (z) = (u n u )z n (2.4.7) n=0

28 21 is denoted by R and is the optimal geometric decay rate of u n to u. R exceeds unity by Kendall s renewal theorem whenever X 1 has a finite geometric moment. A useful identity (Heathcote, 1967) is (z) = F (1) (z) 1 F(z) 1 = ( E[X 2 1 ] E[X 1 ] 2E[X 1 ] 2 ) F (2) (z) F (1) (z) (2.4.8) for z < 1, where F (1) (z) = E[z X(1) 1 ] and F (2) (z) = E[z X(2) 1 ] are the generating functions of the first two distributions derived from the tails of X 1 : P[X (1) 1 = n] = E[X 1 ] 1 P[X 1 > n] for n 0 and P[X (2) = n] = E[X (1) 1 ] 1 = P[τ (1) 1 > n] for n 0. One can also obtain F (1) (z) = F(z) 1 E[τ 1 ](z 1) and F (2) (z) = F (1) (z) 1 E[τ (1) 1 ](z 1). (2.4.9) The radius of convergence of F, denoted by R F > 1, is also the radius of convergence of F (1) and F (2). The decay rate R can be related to the location of zeros (possibly complex) of F (1). Since F (1) is also a probability generating function, F (1) (1) = 1 and z = 1 is not a zero of F (1). Let r 0 = z 0 denote the magnitude of the smallest nonzero root of F (1) (z 0 is a solution to F (1) (z 0 ) = 0) and suppose that r 0 < R F <. We now argue that R = r 0. First, note that F (1) and F (2) cannot have a common zero inside their common radius of convergence. For if F (1) (z 0 ) = 0 for some z 0 < R F, then (2.4.9) gives F (2) (z 0 ) = E[τ (1) 1 ] 1 0. Hence, F (2) (z 0 )/F (1) (z 0 ) =, and because power series that agree on the disk z < 1 must agree elsewhere (uniqueness of Taylor expansions), (2.4.8) gives (z 0 ) =. A triangle inequality establishes the absolute divergence = (z 0 ) u n u r 0 n. (2.4.10) n=0

29 22 Hence, R r 0. To argue that R r 0, return to (2.4.8) and use the fact that ratios of analytic functions are analytic in regions where the denominator has no zeros. Hence, (z) is analytic in z < r 0 and R r 0. Many applications of renewal theory are made to Markov chains. Here X 1 typically represents the first return time to a fixed state starting from that same state in the chain. It will be shown that good convergence rates for reversible Markov chains can be obtained from distributional orderings of X 1. Renewal theory and stochastic orderings have been previously considered by Kaluza (1928), Shanhbag (1977), Brown (1980, 1981), Shanthikumar (1988), Hansen and Frenk (1991) and Sengupta et al. (1995). In general, these authors explore monotonicities and inequalities for the renewal function, not rates of convergence. Liggett (1989) and Embrechts and Omey (1984) do obtain some limited results on geometric renewal convergence rates. 2.5 References [1] Berenhaut, K. S. (2000). Geometric Renewal Convergence Rates and Discrete Lifetime Distribution Classes, Ph.D. Dissertation, University of Georgia. [2] Berenhaut, K. S., and Lund, R. B. (2001). Geometric renewal convergence rates from hazard rates, Journal of Applied Probability, 38, [3] Berenhaut, K. S., and Lund, R. B. (2002). Renewal convergence rates for DHR and NWU lifetimes, Probability in the Engineering and Informational Sciences, 16, [4] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, second edition, New York: Springer.

30 [5] Brown, M. (1980). Bounds, inequalities, and monotonicity properties for some specialized renewal processes, The Annals of Probability, 8, [6] Brown, M. (1981). Further monotonicity properties for specialized renewal processes, The Annals of Probability, 9, [7] Embrechts, P., and Omey, E. (1984). Functions of power series, Yokohama Mathematical Journal, 32, [8] Erdös, P., Feller, W., and Pollard, H. (1945). A property of power series with positive coefficients, Bulletin of the American Mathematical Society, 55, [9] Feller W. (1957). An Introduction to Probability Theory and Its Applications, second edition, New York: John Wiley & Sons. [10] Hansen, B. G., and Frenk, J. B. G. (1991). Some monotonicity properties of the delayed renewal function, Journal of Applied Probability, 28, [11] Heathcote, C. R. (1967). Complete exponential convergence and related topics, Journal of Applied Probability, 4, [12] Kaluza, T. (1928). Uber die Koeffizienten Reziproks Potenzreihen, Mathematische Zeitschrift, 28, Keilson, J., and A. Kester (1978). Unimodality preservation in Markov chains, Stochastic Processes and their Applications, 7, [13] Kendall, D. G. (1959). Unitary dilations of Markov transition operators and the corresponding integral representations for transition probability matrices,

31 In: Probability and Statistics, Edited by U. Grenander, , New York: Wiley. 24 [14] Kijima, M. (1997). Markov Processes for Stochastic Modeling, London: Chapman and Hall. [15] Liggett, T. (1989). Total positivity and renewal theory, In: Probability, Statistics, and Mathematics, Edited by T. W. Anderson, K. B. Athreya, and D. L. Iglehart, , Boston: Academic Press. [16] Lindvall, T. (1979). On coupling of discrete renewal processes, Z. Wahrsch. Verw. Gebiete, 48, [17] Malyshev, V. A., and Spieksma, F. M. (1995). Intrinsic convergence rate of countable Markov chains, Markov Processes and Related Fields,1, [18] Marshall, A. W., and Shaked, M. (1986). NBU processes with general state space. Math. Operat. Res. 11, [19] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability, New York: Springer-Verlag. [20] Muller, A., and Stoyan, D. (2002). Comparison Methods for Stochastic Models and Risks, New York: Wiley. [21] Ramsey, F. L. (1974). Characterization of the partial autocorrelation function, Ann. of Statist., 2,

32 25 [22] Sengupta, D., Chatterjee, A., and Chakraborty, B. (1995). Reliability bounds and other inequalities for discrete life distributions, Microelectronics and Reliability, 35, [23] Shanhbag, D. N. (1977). On renewal sequences, Bulletin of the London Mathematical Society, 9, [24] Shanthikumar, J. G. (1988). DFR property of first-passage times and its preservation under geometric compounding, Annals of Probability, 16,
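To close this literature review, the renewal quantities of Section 2.4 also invite a quick numerical experiment. The sketch below (the lifetime distribution, the horizon N, and all names are illustrative choices) iterates the recurrent event relation (2.4.2), confirms the renewal theorem limit u_n → 1/E[X_1] in (2.4.3), and compares the empirical geometric decay of u_n − u_∞ with 1/r_0, where r_0 is the smallest-modulus zero of F^(1) as discussed above.

```python
import numpy as np

# Lifetime distribution f_k = P[X_1 = k] on {1, 2, 3}; numbers chosen so that the
# dominant zero of F^(1) is real.
f = np.array([0.1, 0.8, 0.1])
mu = np.dot(np.arange(1, len(f) + 1), f)        # E[X_1]
u_inf = 1.0 / mu                                # limit in the renewal theorem (2.4.3)

# Renewal sequence via the recurrent event relation (2.4.2): u_n = sum_k f_k u_{n-k}.
N = 80
u = np.zeros(N + 1)
u[0] = 1.0
for n in range(1, N + 1):
    k = np.arange(1, min(n, len(f)) + 1)
    u[n] = np.dot(f[k - 1], u[n - k])

# Geometric decay rate 1/r_0, where r_0 is the smallest-modulus zero of F^(1)(z),
# proportional to the tail-probability polynomial q_0 + q_1 z + ... .
q = 1.0 - np.concatenate(([0.0], f.cumsum()))[:-1]   # q_j = P[X_1 > j]
r0 = np.abs(np.roots(q[::-1])).min()

print("u_N and 1/mu:", u[N], u_inf)
print("empirical decay |u_N - u_inf|^(1/N):", abs(u[N] - u_inf) ** (1.0 / N))
print("predicted decay 1/r_0:", 1.0 / r0)
```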

33 Chapter 3 Shape Orderings for Stationary Time Series 1 1 Y. Zhao and R. Lund Submitted to Journal of Applied Probability, 10/

34 27 Abstract This paper introduces shape orderings for stationary time series autocorrelation and partial autocorrelation functions and explores some of their convergence rate ramifications. The shapes explored include increasing likelihood ratio, decreasing hazard rate, and new better than used, structures familiar from stochastic processes and reliability settings. Examples of autoregressive moving-average time series having these shapes are first presented. The shapes are then applied to obtain explicit geometric convergence rates of several forecasting quantities. Key Words and Phrases: Autocorrelation; Convergence Rate; Decreasing Hazard Rate; Increasing Likelihood Ratio; Mean Squared Error; New Better Than Used; Partial Autocorrelation. 3.1 Introduction This paper introduces stochastic shape orderings for the autocorrelation (ACF) and partial autocorrelation (PACF) functions of stationary time series and studies some of their convergence rate ramifications. The orderings introduced are familiar from stochastic process and reliability settings and include new worse than used, new better than used, increasing hazard rate, decreasing hazard rate, increasing likelihood ratio, and decreasing likelihood ratio. The utility of such orderings can be appreciated by browsing Keilson and Kester (1978), Brown (1980), Shaked and Shanthikumar (1994), Liggett (1989), Hansen and Frenk (1991), Sengupta et al. (1995), Kijima (1997), Berenhaut and Lund (2001), and Muller and Stoyan (2003). The above list of ordering types is not exhaustive: decreasing reversed hazard rate, new better than used in expectation, increasing hazard rate average, etc. orderings have also proven useful in applications (cf. Shaked and Shanthikumar 1987, 1994; Kijima 1997; Muller and Stoyan 2003).

35 28 The main purpose of this article is to introduce stochastic orderings into the time series analyst s toolbox. We are unaware of any previous literature with this slant. Given the utility of ordering methods in stochastic processes, the theme is expected to prove fruitful. Nonetheless, we do not make any attempt at completing the issue here; indeed, the applications pursued here reside only with explicit convergence rate bounds for mean squared prediction errors and Innovations Algorithm coefficients. It is expected that shape orderings will eventually be useful in likelihood computations, forecasting, and quantifying how close weighted and least squares estimators are. The remainder of this paper proceeds as follows. Section 2 clarifies the orderings that we study. Section 3 gives examples of common autoregressive moving-average (ARMA(p, q)) time series models which obey one or more of the introduced orderings. Section 4 applies the shape ideas by deriving explicit geometric convergence rates for several quantities encountered in forecasting. 3.2 Definition of Orderings Let {X t } be a zero mean covariance stationary series with autocovariance γ(h) = Cov(X t+h,x t ) at lag h. We denote the lag h autocorrelations and partial autocorrelations by ρ(h) = Corr(X t+h,x t ) and α(h) = Corr(X h+1,x 1 X 2,X 3,...,X h ), respectively. Clarifying, a conditional correlation refers to correlation after adjustment for best linear prediction given X 2,...,X h : ( ) α(h) = Corr X h+1 P sp{x2,...,x h }(X h+1 ),X 1 P sp{x2,...,x h }(X 1 ), (3.2.1) where sp( ) denotes closed linear span and P( ) indicates a linear prediction.

The autocorrelation function ρ(·) is said to be new better than used (NBU) if

    ρ(i + j) ≤ ρ(i)ρ(j),   i, j ≥ 0.   (3.2.2)

If the inequality in (3.2.2) is reversed to ρ(i + j) ≥ ρ(i)ρ(j), i, j ≥ 0, then {X_t} is said to have a new worse than used (NWU) ACF.

The series {X_t} is said to have a decreasing likelihood ratio (DLR) ACF if

    ρ(h + 1)² ≥ ρ(h)ρ(h + 2),   h ≥ 0.   (3.2.3)

If the inequality in (3.2.3) is reversed to

    ρ(h + 1)² ≤ ρ(h)ρ(h + 2),   h ≥ 0,   (3.2.4)

then the ACF of {X_t} is said to have an increasing likelihood ratio (ILR).

The series {X_t} is said to have an increasing hazard rate (IHR) ACF if the hazard rates h_i defined by

    h_i = ρ(i) / Σ_{k=i}^∞ ρ(k),   i ≥ 0,   (3.2.5)

are nondecreasing in i; if (3.2.5) is decreasing as i increases, then {X_t} is said to have a decreasing hazard rate (DHR) ACF. We say that {X_t} has a monotone autocorrelation function if ρ(h) is decreasing as h increases; {X_t} is said to have a convex autocorrelation function if ρ(h) is convex in h in the sense that ρ(h) − 2ρ(h − 1) + ρ(h − 2) ≥ 0, h ≥ 2.

37 30 Since many convex sequences with ρ(0) > 0 are non-negative definite (Polya s criterion), such sequences are indeed legitimate stationary autocovariance functions (cf. Problem 26.3 in Billingsley, 1995, for sufficient conditions). The above definitions are analogues of orderings taking the same name for a univariate random variable X {0, 1, 2...}: we have merely interchanged P(X > h) with ρ(h). Some properties follow immediately from stochastic processes theory. For example, any DHR ACF is also NWU; however, there exist NWU ACFs that are not DHR (cf. Kijima 1997; Berenhaut and Lund 2001). Likewise, any IHR ACF is NBU, but such containment is not exclusive. As ρ(h) = γ(0) 1 γ(h), the above shapes are also meaningful for the ACVF without modification. For shapes of the PACF, we merely exchange ρ( ) with α( ). For example, {X t } is said to have an NBU PACF if α(i + j) α(i)α(j), i,j Examples This section presents examples of ARMA(p, q) time series models possessing some of the above orderings. As a very simple first example, consider a first-order causal autoregression {X t } satisfying X t = φx t 1 + Z t, where {Z t } is zero mean white noise with variance σ 2. Here, ρ(h) = φ h for h 0 ( φ < 1 is due to causality). Hence, the AR(1) ACF is NWU, NBU, ILR, and DLR. When φ (0, 1), the AR(1) ACF is monotone and convex. It is not too hard to show that a first order autoregression is the only stationary series with an ACF that is both NWU and NBU.
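These ACF orderings can be verified numerically for any model whose autocorrelations are available in closed form. The following sketch (function names, the lag cutoff of 25, and the parameter values are illustrative choices) checks the AR(1) claims above and previews the MA(1) example of the next section.

```python
import numpy as np

def is_nbu_acf(rho, tol=1e-12):
    """NBU check (3.2.2): rho(i+j) <= rho(i) rho(j) for all admissible i, j."""
    n = len(rho)
    return all(rho[i + j] <= rho[i] * rho[j] + tol
               for i in range(n) for j in range(n - i))

def is_nwu_acf(rho, tol=1e-12):
    """NWU check: rho(i+j) >= rho(i) rho(j) for all admissible i, j."""
    n = len(rho)
    return all(rho[i + j] >= rho[i] * rho[j] - tol
               for i in range(n) for j in range(n - i))

def is_dlr_acf(rho, tol=1e-12):
    """DLR check (3.2.3): rho(h+1)^2 >= rho(h) rho(h+2)."""
    r = np.asarray(rho, dtype=float)
    return bool(np.all(r[1:-1] ** 2 >= r[:-2] * r[2:] - tol))

def is_ilr_acf(rho, tol=1e-12):
    """ILR check (3.2.4): rho(h+1)^2 <= rho(h) rho(h+2)."""
    r = np.asarray(rho, dtype=float)
    return bool(np.all(r[1:-1] ** 2 <= r[:-2] * r[2:] + tol))

# AR(1) with phi in (0, 1): rho(h) = phi^h should pass all four checks (with equality).
phi = 0.6
rho_ar1 = phi ** np.arange(25)
print([f(rho_ar1) for f in (is_nbu_acf, is_nwu_acf, is_dlr_acf, is_ilr_acf)])

# MA(1): rho = (1, theta/(1+theta^2), 0, 0, ...) is NBU and DLR, but not NWU or ILR.
theta = 0.5
rho_ma1 = np.zeros(25)
rho_ma1[0] = 1.0
rho_ma1[1] = theta / (1 + theta ** 2)
print([f(rho_ma1) for f in (is_nbu_acf, is_nwu_acf, is_dlr_acf, is_ilr_acf)])
```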

38 31 Example Every first-order moving-average (MA(1)) has a DLR ACF. Proof. Write the MA(1) as X t = Z t + θz t 1, (3.3.1) where {Z t } is zero mean white noise with variance σ 2. The ACF of {X t } is ρ(h) = 1 {0} (h)+θ/(1+θ 2 )1 {±1} (h). Hence, ρ 2 (0) = 1 and ρ(1) 2 = θ 2 /[1+θ 2 ] 2. As ρ(h)ρ(h+ 2) = θ 2 /[1 + θ 2 ] 2 for h = 1 and zero otherwise, the claim now follows. Example Every second-order causal autoregression (AR(2)) has a DLR ACVF. Proof. Here, {X t } is a solution to the difference equation X t φ 1 X t 1 φ 2 X t 2 = Z t, (3.3.2) where {Z t } is zero mean white noise with variance σ 2. Factoring the AR(2) polynomial into its roots gives (1 ξ 1 1 B)(1 ξ 1 2 B)X t = Z t, where B is the usual backshift operator and causality implies that ξ 1 > 1 and ξ 2 > 1. We proceed only with the case of unequal roots: ξ 1 ξ 2. Using the relations φ 1 = ξ ξ 1 2 and φ 2 = ξ 1 1 ξ 1 2 in a difference equationbased expression for AR(2) autocovariances (cf. Brockwell and Davis 1991, Chapter 3) allows us to write γ(h) = σ 2 ξ 2 1ξ 2 2 (ξ 1 ξ 2 1)(ξ 2 ξ 1 ) [(ξ2 1 1) 1 ξ 1 h 1 (ξ 2 2 1) 1 ξ 1 h 2 ]. (3.3.3) Writing ξ 1 = re iθ and ξ 2 = re iθ for some θ (0,π] and r > 0 (ξ 1 and ξ 2 are complex conjugates) in (3.3.3) gives

39 32 γ(h) = σ 2 r 4 r h sin(hθ + ψ) (r 2 1)(r 4 2r 2 cos(2θ) + 1) 1 2 sin θ, where tan(ψ) = (r 2 + 1)(r 2 1) 1 tan(θ) and cos(ψ) has the same sign as cos(θ). Hence, and ρ(h) = r h sin(hθ + ψ) sin ψ ( ) sin(hθ) = tanψ + cos(hθ) r h (3.3.4) r i+j ρ(i)ρ(j) = = ( ) sin(iθ) sin(jθ) sin(iθ) cos(jθ) + cos(iθ) sin(jθ) + + cos(iθ) cos(jθ) tan 2 ψ tanψ [ ( sin((i + j)θ) 1 + tanψ 2 tan 2 ψ + 1 ) cos((i j)θ) + 2 ( ) ] cos((i + j)θ). (3.3.5) 2 tan 2 ψ Equation (3.3.5) gives the identities r 2(i+1) ρ(i)ρ(i + 2) = + [ ( sin(2(i + 1)θ) 1 + tanψ 2 tan 2 ψ + 1 ) cos(2θ) 2 ( ) ] cos(2(i + 1)θ) 2 tan 2 ψ (3.3.6) and

40 33 r 2(i+1) ρ 2 (i + 1) = = = = [ ] 2 sin((i + 1)θ) + cos((i + 1)θ) tanψ [ ] sin 2 ((i + 1)θ) + cos 2 2 sin((i + 1)θ) cos((i + 1)θ) ((i + 1)θ) + tan 2 θ tanψ [ ] 1 cos(2(i + 1)θ) 1 + cos(2(i + 1)θ) sin(2(i + 1)θ) tan 2 ψ 2 tanψ [ ( ) sin(2(i + 1)θ) 1 + tanψ 2 1 cos(2(i + 1)θ) + 2 tan 2 ψ ( )] (3.3.7) 2 tan 2 ψ Combining (3.3.6) and (3.3.7) gives ρ 2 (i + 1) ρ(i)ρ(i + 2) = ( ) (1 cos(2θ))r 2(i+1). 2 tan 2 ψ As 1 cos(2θ) 0, the DLR property of causal AR(2) ACFs now follows. Example A causal and invertible ARMA(1,1) series has a PACF whose square is monotonically decreasing and ILR. Proof. The ARMA(1,1) difference equation is X t φx t 1 = Z t + θz t 1, (3.3.8) where {Z t } is zero mean white noise with variance σ 2. Causality and invertibility imply that φ < 1 and θ < 1. The partial autocorrelation function of an ARMA(1,1) series can be explicitly identified from the result of Problem 5.13 in Brockwell and Davis (1991): α 2 (n) = The claimed monotonicity follows from (3.3.9) since θ 2n 2 (1 θ 2 ) 2 (θ + φ) 2 (1 + θφ) 2 [(θ + φ) 2 (1 θ 2n ) + (1 φ 2 )(1 θ 2 )] 2. (3.3.9)

41 34 α 2 (n 1) α 2 (n) = 1 θ 2 [(θ + φ) 2 (1 θ 2n ) + (1 φ 2 )(1 θ 2 )] 2 [(θ + φ) 2 (1 θ 2n 2 ) + (1 φ 2 )(1 θ 2 )] 2 1 θ 2 1, (3.3.10) which implies that α 2 (n) is nonincreasing in n. From this monotonicity, the ILR claim can be verified to follow. 3.4 Convergence Rates This section explores convergence rate consequences of the above orderings in onestep-ahead linear prediction settings. This topic is also studied in Section of Pourahamadi (2001). For notation, let ˆX t+1 = P(X t+1 X 1,...,X t ) = P sp{x1,...,x t}(x t+1 ) (3.4.1) be the one-step-ahead linear prediction of X t+1 from elements in the span of X 1,...,X t and let v t = E[(X t+1 ˆX t+1 ) 2 ] denote its unconditional mean squared prediction error. Let v = lim t v t denote the limiting mean squared prediction error; this limit exists as v t is nonincreasing in t. The mean squared prediction error can be expressed via the square of the PACF: t v t = γ(0) (1 α 2 (j)) (3.4.2) j=1

(cf. Brockwell and Davis 1991 and Section 7.5 of Pourahamadi 2001). To avoid trite work, we assume that the covariance matrix of (X_1, ..., X_t) is invertible for all positive integers t. Sufficient for this is merely that γ(h) → 0 as h → ∞ (cf. Brockwell and Davis 1991), as is the case for any causal ARMA(p, q) series. From (3.4.2), we deduce that

    v_t − v_∞ = γ(0) Π_{j=1}^t (1 − α²(j)) − γ(0) Π_{j=1}^∞ (1 − α²(j))
              = γ(0) [Π_{j=1}^t (1 − α²(j))] [1 − Π_{j=t+1}^∞ (1 − α²(j))]
              ≤ γ(0) [Π_{j=1}^t (1 − α²(j))] Σ_{j=t+1}^∞ α²(j),   (3.4.3)

where the inequality 1 − Π_{j=t+1}^∞ (1 − α²(j)) ≤ Σ_{j=t+1}^∞ α²(j) has been applied. Note that (3.4.3) is tight for an autoregression of order p; specifically, the right-hand side of (3.4.3) is zero for lags t > p (implying that v_t = σ² = v_∞ for t > p).

Now suppose that a shape structure is imposed; specifically, consider the case where {X_t} has a DLR PACF. Then α(t + h) ≤ α(t)α(1)^h for all t, h ≥ 0. Using this in (3.4.3) produces a very clean and explicit convergence bound:

    v_t − v_∞ ≤ γ(0) α²(1) [Π_{j=2}^t (1 − α²(j))] α²(t)
              ≤ γ(0) α²(t),   (3.4.4)

where the last line follows after 0 ≤ Π_{j=2}^t (1 − α²(j)) ≤ 1 and α²(1) ≤ 1 are applied. It is important to note that (3.4.4) holds for each and every t and as such is a convergence rate bound. This differs radically in structure from an asymptotic approximation that is applicable for large n only. One could pursue other shapes and inequalities, and we do this to an extent in the MA(1) and ARMA(1,1) examples below; however, the important theme is simply

that a shape constraint can provide clean and explicit convergence rates for the mean squared prediction errors. It is worth commenting that the shapes studied here are also useful in obtaining explicit geometric convergence rates of Markov chains to stationarity (cf. Lund, Meyn, and Tweedie 1996; Berenhaut and Lund 2001).

Now if {X_t} is an ARMA(p, q) series satisfying the difference equation

    X_t − φ_1 X_{t−1} − ... − φ_p X_{t−p} = Z_t + θ_1 Z_{t−1} + ... + θ_q Z_{t−q},

where {Z_t} is zero mean white noise with variance σ² and the AR and MA polynomials are causal and invertible, then the classic one-step-ahead recursive prediction formula is

    X̂_{n+1} = Σ_{k=1}^p φ_k X_{n+1−k} + Σ_{k=1}^q θ_{n,k} (X_{n+1−k} − X̂_{n+1−k}),   n ≥ max(p, q).   (3.4.5)

The θ_{n,k}'s are identified by applying the Innovations Algorithm to a strategic linear transform of {X_t} (cf. Ansley 1979; Brockwell and Davis 1991, Chapter 5):

    θ_{n,k} = v_{n−k}^{−1} E[X_{n+1}(X_{n+1−k} − X̂_{n+1−k})],   1 ≤ k ≤ q; n ≥ max(p, q).   (3.4.6)

In the case of an MA(1) series, one has an ILR PACF structure. This is harder to work with than DLR structures, but can be handled as we show in the next example.

Example. The θ_{n,1} in an invertible MA(1) series converge to θ with an explicit geometric convergence rate bound.

Proof. Equation (3.4.6) gives

    |θ_{n,1} − θ| ≤ |θ|^n / (1 − θ²).   (3.4.7)
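The bounds in this section are simple to check numerically once the PACF is in hand. The sketch below (an MA(1) with θ = 0.6, a lag horizon of 60, and the helper function are illustrative choices; the infinite sums are truncated at the horizon) computes v_t from (3.4.2) via the Durbin-Levinson recursion, verifies the distribution-free bound (3.4.3), and tabulates |θ_{n,1} − θ| against the geometric rate appearing in (3.4.7).

```python
import numpy as np

def pacf_from_acf(rho):
    """Durbin-Levinson: partial autocorrelations alpha(1..m) from rho(0..m)."""
    m = len(rho) - 1
    alpha = np.zeros(m + 1)
    phi = np.zeros((m + 1, m + 1))
    phi[1, 1] = alpha[1] = rho[1]
    for k in range(2, m + 1):
        num = rho[k] - np.dot(phi[k - 1, 1:k], rho[1:k][::-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], rho[1:k])
        phi[k, k] = alpha[k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
    return alpha[1:]

# Invertible MA(1): X_t = Z_t + theta Z_{t-1}, Var(Z_t) = sigma2.
theta, sigma2 = 0.6, 1.0
gamma0 = sigma2 * (1 + theta ** 2)
m = 60                                            # lag horizon; tails beyond are negligible here
rho = np.zeros(m + 1)
rho[0] = 1.0
rho[1] = theta / (1 + theta ** 2)

alpha = pacf_from_acf(rho)                        # alpha[j-1] = alpha(j)
v = np.concatenate(([gamma0], gamma0 * np.cumprod(1 - alpha ** 2)))   # v[t] = v_t, v[0] = gamma(0)
v_inf = sigma2                                    # limiting one-step mean squared prediction error

# Check the distribution-free bound (3.4.3), with the infinite sum truncated at lag m.
for t in [1, 2, 5, 10]:
    lhs = v[t] - v_inf
    rhs = gamma0 * np.prod(1 - alpha[:t] ** 2) * np.sum(alpha[t:] ** 2)
    print(t, lhs, rhs, lhs <= rhs + 1e-12)

# Innovations coefficients theta_{n,1} = gamma(1)/v_{n-1} and the rate claimed in (3.4.7).
gamma1 = sigma2 * theta
for n in [1, 2, 5, 10]:
    theta_n1 = gamma1 / v[n - 1]
    print(n, abs(theta_n1 - theta), abs(theta) ** n / (1 - theta ** 2))
```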


More information

at least 50 and preferably 100 observations should be available to build a proper model

at least 50 and preferably 100 observations should be available to build a proper model III Box-Jenkins Methods 1. Pros and Cons of ARIMA Forecasting a) need for data at least 50 and preferably 100 observations should be available to build a proper model used most frequently for hourly or

More information

Discrete time processes

Discrete time processes Discrete time processes Predictions are difficult. Especially about the future Mark Twain. Florian Herzog 2013 Modeling observed data When we model observed (realized) data, we encounter usually the following

More information

TIME SERIES AND FORECASTING. Luca Gambetti UAB, Barcelona GSE Master in Macroeconomic Policy and Financial Markets

TIME SERIES AND FORECASTING. Luca Gambetti UAB, Barcelona GSE Master in Macroeconomic Policy and Financial Markets TIME SERIES AND FORECASTING Luca Gambetti UAB, Barcelona GSE 2014-2015 Master in Macroeconomic Policy and Financial Markets 1 Contacts Prof.: Luca Gambetti Office: B3-1130 Edifici B Office hours: email:

More information

6.3 Forecasting ARMA processes

6.3 Forecasting ARMA processes 6.3. FORECASTING ARMA PROCESSES 123 6.3 Forecasting ARMA processes The purpose of forecasting is to predict future values of a TS based on the data collected to the present. In this section we will discuss

More information

Lecture 1: Fundamental concepts in Time Series Analysis (part 2)

Lecture 1: Fundamental concepts in Time Series Analysis (part 2) Lecture 1: Fundamental concepts in Time Series Analysis (part 2) Florian Pelgrin University of Lausanne, École des HEC Department of mathematics (IMEA-Nice) Sept. 2011 - Jan. 2012 Florian Pelgrin (HEC)

More information

Gaussian processes. Basic Properties VAG002-

Gaussian processes. Basic Properties VAG002- Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

More information

Introduction to Stochastic processes

Introduction to Stochastic processes Università di Pavia Introduction to Stochastic processes Eduardo Rossi Stochastic Process Stochastic Process: A stochastic process is an ordered sequence of random variables defined on a probability space

More information

Econometría 2: Análisis de series de Tiempo

Econometría 2: Análisis de series de Tiempo Econometría 2: Análisis de series de Tiempo Karoll GOMEZ kgomezp@unal.edu.co http://karollgomez.wordpress.com Segundo semestre 2016 III. Stationary models 1 Purely random process 2 Random walk (non-stationary)

More information

Ch 4. Models For Stationary Time Series. Time Series Analysis

Ch 4. Models For Stationary Time Series. Time Series Analysis This chapter discusses the basic concept of a broad class of stationary parametric time series models the autoregressive moving average (ARMA) models. Let {Y t } denote the observed time series, and {e

More information

Contents. 1 Time Series Analysis Introduction Stationary Processes State Space Modesl Stationary Processes 8

Contents. 1 Time Series Analysis Introduction Stationary Processes State Space Modesl Stationary Processes 8 A N D R E W T U L L O C H T I M E S E R I E S A N D M O N T E C A R L O I N F E R E N C E T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Time Series Analysis 5

More information

Convergence Rates for Renewal Sequences

Convergence Rates for Renewal Sequences Convergence Rates for Renewal Sequences M. C. Spruill School of Mathematics Georgia Institute of Technology Atlanta, Ga. USA January 2002 ABSTRACT The precise rate of geometric convergence of nonhomogeneous

More information

STAD57 Time Series Analysis. Lecture 8

STAD57 Time Series Analysis. Lecture 8 STAD57 Time Series Analysis Lecture 8 1 ARMA Model Will be using ARMA models to describe times series dynamics: ( B) X ( B) W X X X W W W t 1 t1 p t p t 1 t1 q tq Model must be causal (i.e. stationary)

More information

A Functional Central Limit Theorem for an ARMA(p, q) Process with Markov Switching

A Functional Central Limit Theorem for an ARMA(p, q) Process with Markov Switching Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 339 345 DOI: http://dxdoiorg/105351/csam2013204339 A Functional Central Limit Theorem for an ARMAp, q) Process with Markov Switching

More information

Difference equations. Definitions: A difference equation takes the general form. x t f x t 1,,x t m.

Difference equations. Definitions: A difference equation takes the general form. x t f x t 1,,x t m. Difference equations Definitions: A difference equation takes the general form x t fx t 1,x t 2, defining the current value of a variable x as a function of previously generated values. A finite order

More information

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Tessa L. Childers-Day February 8, 2013 1 Introduction Today s section will deal with topics such as: the mean function, the auto- and cross-covariance

More information

Introduction to ARMA and GARCH processes

Introduction to ARMA and GARCH processes Introduction to ARMA and GARCH processes Fulvio Corsi SNS Pisa 3 March 2010 Fulvio Corsi Introduction to ARMA () and GARCH processes SNS Pisa 3 March 2010 1 / 24 Stationarity Strict stationarity: (X 1,

More information

Autoregressive Moving Average (ARMA) Models and their Practical Applications

Autoregressive Moving Average (ARMA) Models and their Practical Applications Autoregressive Moving Average (ARMA) Models and their Practical Applications Massimo Guidolin February 2018 1 Essential Concepts in Time Series Analysis 1.1 Time Series and Their Properties Time series:

More information

Introduction to Time Series Analysis. Lecture 7.

Introduction to Time Series Analysis. Lecture 7. Last lecture: Introduction to Time Series Analysis. Lecture 7. Peter Bartlett 1. ARMA(p,q) models: stationarity, causality, invertibility 2. The linear process representation of ARMA processes: ψ. 3. Autocovariance

More information

Stochastic processes: basic notions

Stochastic processes: basic notions Stochastic processes: basic notions Jean-Marie Dufour McGill University First version: March 2002 Revised: September 2002, April 2004, September 2004, January 2005, July 2011, May 2016, July 2016 This

More information

On the Convolution Order with Reliability Applications

On the Convolution Order with Reliability Applications Applied Mathematical Sciences, Vol. 3, 2009, no. 16, 767-778 On the Convolution Order with Reliability Applications A. Alzaid and M. Kayid King Saud University, College of Science Dept. of Statistics and

More information

Forecasting. Simon Shaw 2005/06 Semester II

Forecasting. Simon Shaw 2005/06 Semester II Forecasting Simon Shaw s.c.shaw@maths.bath.ac.uk 2005/06 Semester II 1 Introduction A critical aspect of managing any business is planning for the future. events is called forecasting. Predicting future

More information

Covariance Stationary Time Series. Example: Independent White Noise (IWN(0,σ 2 )) Y t = ε t, ε t iid N(0,σ 2 )

Covariance Stationary Time Series. Example: Independent White Noise (IWN(0,σ 2 )) Y t = ε t, ε t iid N(0,σ 2 ) Covariance Stationary Time Series Stochastic Process: sequence of rv s ordered by time {Y t } {...,Y 1,Y 0,Y 1,...} Defn: {Y t } is covariance stationary if E[Y t ]μ for all t cov(y t,y t j )E[(Y t μ)(y

More information

Empirical Market Microstructure Analysis (EMMA)

Empirical Market Microstructure Analysis (EMMA) Empirical Market Microstructure Analysis (EMMA) Lecture 3: Statistical Building Blocks and Econometric Basics Prof. Dr. Michael Stein michael.stein@vwl.uni-freiburg.de Albert-Ludwigs-University of Freiburg

More information

Faithful couplings of Markov chains: now equals forever

Faithful couplings of Markov chains: now equals forever Faithful couplings of Markov chains: now equals forever by Jeffrey S. Rosenthal* Department of Statistics, University of Toronto, Toronto, Ontario, Canada M5S 1A1 Phone: (416) 978-4594; Internet: jeff@utstat.toronto.edu

More information

STAT 248: EDA & Stationarity Handout 3

STAT 248: EDA & Stationarity Handout 3 STAT 248: EDA & Stationarity Handout 3 GSI: Gido van de Ven September 17th, 2010 1 Introduction Today s section we will deal with the following topics: the mean function, the auto- and crosscovariance

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 011 MODULE 3 : Stochastic processes and time series Time allowed: Three Hours Candidates should answer FIVE questions. All questions carry

More information

Time Series 3. Robert Almgren. Sept. 28, 2009

Time Series 3. Robert Almgren. Sept. 28, 2009 Time Series 3 Robert Almgren Sept. 28, 2009 Last time we discussed two main categories of linear models, and their combination. Here w t denotes a white noise: a stationary process with E w t ) = 0, E

More information

Review Session: Econometrics - CLEFIN (20192)

Review Session: Econometrics - CLEFIN (20192) Review Session: Econometrics - CLEFIN (20192) Part II: Univariate time series analysis Daniele Bianchi March 20, 2013 Fundamentals Stationarity A time series is a sequence of random variables x t, t =

More information

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis Chapter 12: An introduction to Time Series Analysis Introduction In this chapter, we will discuss forecasting with single-series (univariate) Box-Jenkins models. The common name of the models is Auto-Regressive

More information

Chapter 4: Models for Stationary Time Series

Chapter 4: Models for Stationary Time Series Chapter 4: Models for Stationary Time Series Now we will introduce some useful parametric models for time series that are stationary processes. We begin by defining the General Linear Process. Let {Y t

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

ESSE Mid-Term Test 2017 Tuesday 17 October :30-09:45

ESSE Mid-Term Test 2017 Tuesday 17 October :30-09:45 ESSE 4020 3.0 - Mid-Term Test 207 Tuesday 7 October 207. 08:30-09:45 Symbols have their usual meanings. All questions are worth 0 marks, although some are more difficult than others. Answer as many questions

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Expressions for the covariance matrix of covariance data

Expressions for the covariance matrix of covariance data Expressions for the covariance matrix of covariance data Torsten Söderström Division of Systems and Control, Department of Information Technology, Uppsala University, P O Box 337, SE-7505 Uppsala, Sweden

More information

A time series is called strictly stationary if the joint distribution of every collection (Y t

A time series is called strictly stationary if the joint distribution of every collection (Y t 5 Time series A time series is a set of observations recorded over time. You can think for example at the GDP of a country over the years (or quarters) or the hourly measurements of temperature over a

More information

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity.

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. Important points of Lecture 1: A time series {X t } is a series of observations taken sequentially over time: x t is an observation

More information

Time Series Analysis -- An Introduction -- AMS 586

Time Series Analysis -- An Introduction -- AMS 586 Time Series Analysis -- An Introduction -- AMS 586 1 Objectives of time series analysis Data description Data interpretation Modeling Control Prediction & Forecasting 2 Time-Series Data Numerical data

More information

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations Ch 15 Forecasting Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series For the present we proceed

More information

Identifiability, Invertibility

Identifiability, Invertibility Identifiability, Invertibility Defn: If {ǫ t } is a white noise series and µ and b 0,..., b p are constants then X t = µ + b 0 ǫ t + b ǫ t + + b p ǫ t p is a moving average of order p; write MA(p). Q:

More information

Ross Bettinger, Analytical Consultant, Seattle, WA

Ross Bettinger, Analytical Consultant, Seattle, WA ABSTRACT DYNAMIC REGRESSION IN ARIMA MODELING Ross Bettinger, Analytical Consultant, Seattle, WA Box-Jenkins time series models that contain exogenous predictor variables are called dynamic regression

More information

7. Forecasting with ARIMA models

7. Forecasting with ARIMA models 7. Forecasting with ARIMA models 309 Outline: Introduction The prediction equation of an ARIMA model Interpreting the predictions Variance of the predictions Forecast updating Measuring predictability

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Module 4. Stationary Time Series Models Part 1 MA Models and Their Properties

Module 4. Stationary Time Series Models Part 1 MA Models and Their Properties Module 4 Stationary Time Series Models Part 1 MA Models and Their Properties Class notes for Statistics 451: Applied Time Series Iowa State University Copyright 2015 W. Q. Meeker. February 14, 2016 20h

More information

Lecture 3-4: Probability models for time series

Lecture 3-4: Probability models for time series Lecture 3-4, page 1 Lecture 3-4: Probability models for time series Outline of lesson 3-4 (chapter 3) The most heavy (and theoretical) part (and lesson) AR(p) processes MA(q) processes ARMA(p,q) processes

More information

Forecasting with ARMA

Forecasting with ARMA Forecasting with ARMA Eduardo Rossi University of Pavia October 2013 Rossi Forecasting Financial Econometrics - 2013 1 / 32 Mean Squared Error Linear Projection Forecast of Y t+1 based on a set of variables

More information

Chapter 3. ARIMA Models. 3.1 Autoregressive Moving Average Models

Chapter 3. ARIMA Models. 3.1 Autoregressive Moving Average Models Chapter 3 ARIMA Models Classical regression is often insu cient for explaining all of the interesting dynamics of a time series. For example, the ACF of the residuals of the simple linear regression fit

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY PREFACE xiii 1 Difference Equations 1.1. First-Order Difference Equations 1 1.2. pth-order Difference Equations 7

More information

Trend-Cycle Decompositions

Trend-Cycle Decompositions Trend-Cycle Decompositions Eric Zivot April 22, 2005 1 Introduction A convenient way of representing an economic time series y t is through the so-called trend-cycle decomposition y t = TD t + Z t (1)

More information

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING ERIC SHANG Abstract. This paper provides an introduction to Markov chains and their basic classifications and interesting properties. After establishing

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

STOCHASTIC PROCESSES Basic notions

STOCHASTIC PROCESSES Basic notions J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving

More information

Ch. 14 Stationary ARMA Process

Ch. 14 Stationary ARMA Process Ch. 14 Stationary ARMA Process A general linear stochastic model is described that suppose a time series to be generated by a linear aggregation of random shock. For practical representation it is desirable

More information

Strictly Stationary Solutions of Autoregressive Moving Average Equations

Strictly Stationary Solutions of Autoregressive Moving Average Equations Strictly Stationary Solutions of Autoregressive Moving Average Equations Peter J. Brockwell Alexander Lindner Abstract Necessary and sufficient conditions for the existence of a strictly stationary solution

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Ch 6. Model Specification. Time Series Analysis

Ch 6. Model Specification. Time Series Analysis We start to build ARIMA(p,d,q) models. The subjects include: 1 how to determine p, d, q for a given series (Chapter 6); 2 how to estimate the parameters (φ s and θ s) of a specific ARIMA(p,d,q) model (Chapter

More information

EC402: Serial Correlation. Danny Quah Economics Department, LSE Lent 2015

EC402: Serial Correlation. Danny Quah Economics Department, LSE Lent 2015 EC402: Serial Correlation Danny Quah Economics Department, LSE Lent 2015 OUTLINE 1. Stationarity 1.1 Covariance stationarity 1.2 Explicit Models. Special cases: ARMA processes 2. Some complex numbers.

More information

STA 6857 Autocorrelation and Cross-Correlation & Stationary Time Series ( 1.4, 1.5)

STA 6857 Autocorrelation and Cross-Correlation & Stationary Time Series ( 1.4, 1.5) STA 6857 Autocorrelation and Cross-Correlation & Stationary Time Series ( 1.4, 1.5) Outline 1 Announcements 2 Autocorrelation and Cross-Correlation 3 Stationary Time Series 4 Homework 1c Arthur Berg STA

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

Some Time-Series Models

Some Time-Series Models Some Time-Series Models Outline 1. Stochastic processes and their properties 2. Stationary processes 3. Some properties of the autocorrelation function 4. Some useful models Purely random processes, random

More information

Statistics of stochastic processes

Statistics of stochastic processes Introduction Statistics of stochastic processes Generally statistics is performed on observations y 1,..., y n assumed to be realizations of independent random variables Y 1,..., Y n. 14 settembre 2014

More information

LECTURE 10 LINEAR PROCESSES II: SPECTRAL DENSITY, LAG OPERATOR, ARMA. In this lecture, we continue to discuss covariance stationary processes.

LECTURE 10 LINEAR PROCESSES II: SPECTRAL DENSITY, LAG OPERATOR, ARMA. In this lecture, we continue to discuss covariance stationary processes. MAY, 0 LECTURE 0 LINEAR PROCESSES II: SPECTRAL DENSITY, LAG OPERATOR, ARMA In this lecture, we continue to discuss covariance stationary processes. Spectral density Gourieroux and Monfort 990), Ch. 5;

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture

More information

Calculation of ACVF for ARMA Process: I consider causal ARMA(p, q) defined by

Calculation of ACVF for ARMA Process: I consider causal ARMA(p, q) defined by Calculation of ACVF for ARMA Process: I consider causal ARMA(p, q) defined by φ(b)x t = θ(b)z t, {Z t } WN(0, σ 2 ) want to determine ACVF {γ(h)} for this process, which can be done using four complementary

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Ch 5. Models for Nonstationary Time Series. Time Series Analysis

Ch 5. Models for Nonstationary Time Series. Time Series Analysis We have studied some deterministic and some stationary trend models. However, many time series data cannot be modeled in either way. Ex. The data set oil.price displays an increasing variation from the

More information

Time Series: Theory and Methods

Time Series: Theory and Methods Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition With 124 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition vn ix CHAPTER 1 Stationary

More information

Chapter 9: Forecasting

Chapter 9: Forecasting Chapter 9: Forecasting One of the critical goals of time series analysis is to forecast (predict) the values of the time series at times in the future. When forecasting, we ideally should evaluate the

More information

Università di Pavia. Forecasting. Eduardo Rossi

Università di Pavia. Forecasting. Eduardo Rossi Università di Pavia Forecasting Eduardo Rossi Mean Squared Error Forecast of Y t+1 based on a set of variables observed at date t, X t : Yt+1 t. The loss function MSE(Y t+1 t ) = E[Y t+1 Y t+1 t ]2 The

More information