
DETECTION OF OUTLIER PATCHES IN AUTOREGRESSIVE TIME SERIES

Ana Justel¹, Daniel Peña² and Ruey S. Tsay³

¹ Department of Mathematics, Universidad Autónoma de Madrid, ana.justel@uam.es
² Department of Statistics and Econometrics, Universidad Carlos III de Madrid, dpena@est-econ.uc3m.es
³ Graduate School of Business, University of Chicago, rst@gsbrst.uchicago.edu

Abstract

This paper proposes a procedure to detect patches of outliers in an autoregressive process. The procedure is an improvement over the existing detection methods via Gibbs sampling. We show that the standard outlier detection via Gibbs sampling may be extremely inefficient in the presence of severe masking and swamping effects. The new procedure identifies the beginning and end of possible outlier patches using the existing Gibbs sampling, then carries out an adaptive procedure with block interpolation to handle patches of outliers. Empirical and simulated examples show that the proposed procedure is effective.

Key words: Gibbs sampler, multiple outliers, sequential learning, time series.

Short running title: Outlier patches in autoregressive processes.

Correspondence to: Ana Justel, Departamento de Matemáticas, Universidad Autónoma de Madrid, 28049 Madrid, Spain. Phone: +34 9 397 537. Fax: +34 9 397 4889.

1. INTRODUCTION

Outliers in a time series can have adverse effects on model identification and parameter estimation. Fox (1972) defined additive and innovative outliers in a univariate time series. Let $\{x_t\}$ be an autoregressive process of order $p$, AR($p$), satisfying

$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + a_t$,   (1.1)

where $\{a_t\}$ is a sequence of independent and identically distributed Gaussian variables with mean zero and variance $\sigma_a^2$, and the polynomial $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ has all its zeros outside the unit circle. An observed time series $y_t$ has an additive outlier (AO) at time $T$ of size $\beta$ if it satisfies $y_t = \beta I_t^{(T)} + x_t$, $t = 1,\ldots,n$, where $I_t^{(T)}$ is an indicator variable such that $I_t^{(T)} = 0$ if $t \neq T$ and $I_t^{(T)} = 1$ if $t = T$. The series has an innovative outlier (IO) at time $T$ if the outlier directly affects the noise process, that is, $y_t = \phi(B)^{-1}\beta I_t^{(T)} + x_t$, $t = 1,\ldots,n$. Chang and Tiao (1983) showed that additive outliers can cause serious bias in parameter estimation, whereas innovative outliers have only minor effects on estimation.

We deal with additive outliers that occur in patches in an AR process. The main motivation for our study is that multiple outliers, especially those which occur closely in time, often have severe masking effects that can render the usual outlier detection methods ineffective. Several procedures are available in the literature to handle outliers in a time series. Chang and Tiao (1983), Chang, Tiao and Chen (1988) and Tsay (1986, 1988) proposed iterative procedures to detect four types of disturbance in an autoregressive integrated moving-average (ARIMA) model. However, these procedures may fail to detect multiple outliers due to masking effects. They can also misspecify "good" data points as outliers, resulting in what is commonly referred to as the swamping or smearing effect. Chen and Liu (1993) proposed a modified iterative procedure that reduces masking effects by jointly estimating the model parameters and the magnitudes of the outlier effects. This procedure may also fail, because it starts with parameter estimation that assumes no outliers in the data; see Sánchez and Peña (1997). Peña (1987, 1990) proposed diagnostic statistics to measure the influence of an observation.

Similar to the case of independent data, influence measures based on data deletion (or, equivalently, on missing-value techniques in time series analysis) encounter difficulties due to masking effects.

A special case of multiple outliers is a patch of additive outliers. This type of outlier can appear in a time series for various reasons. First, and perhaps most importantly, as shown by Tsay, Peña and Pankratz (1998), a multivariate innovative outlier in a vector time series can introduce a patch of additive outliers in the univariate marginal time series. Second, an unusual shock may temporarily affect the mean and variance of a univariate time series in a manner that cannot be adequately described by the four types of outlier commonly used in the literature, or by conditional heteroscedastic models. The effect is a patch of additive outliers. Because outliers within a patch tend to interact with each other, introducing masking or smearing, it is important in applications to detect them. Bruce and Martin (1989) were the first to analyze patches of outliers in a time series. They proposed a procedure to identify outlying patches by deleting blocks of consecutive observations. However, efficient procedures to determine the block sizes and to carry out the necessary computation have not been developed. McCulloch and Tsay (1994) and Barnett, Kohn and Sheather (1996, 1997) used Markov chain Monte Carlo (MCMC) methods to detect outliers and to compute the posterior distribution of the parameters in an ARIMA model. In particular, McCulloch and Tsay (1994) showed that Gibbs sampling provides accurate parameter estimation and effective outlier detection for an AR process when the additive outliers are not in patches. However, as clearly shown by the following example, the usual Gibbs sampling may be very inefficient when the outliers occur in a patch.

Consider the outlier-contaminated time series shown in Figure 1. The outlier-free data consist of a random realization of n = 50 observations generated from the AR(3) model

$x_t = 2.1\,x_{t-1} - 1.46\,x_{t-2} + 0.336\,x_{t-3} + a_t$,   $t = 1,\ldots,50$,   (1.2)

where $\sigma_a^2 = 1$. The roots of the autoregressive polynomial are 0.6, 0.7 and 0.8, so that the series is stationary.

A single additive outlier of size −3 has been added at time index t = 27, and a patch of four consecutive additive outliers has been introduced from t = 38 to t = 41. The data are available from the authors upon request.

Assuming that the AR order p = 3 is known, we performed the usual Gibbs sampling to estimate the model parameters and to detect outliers. Figure 1 gives some summary statistics of the Gibbs sampling output, using the last 1,000 samples from a Gibbs sampler of 26,000 iterations (when the Markov chains have stabilized, as shown in Figure 1-c). Figure 1-a provides the posterior probability of being an outlier for each data point, and Figure 1-b gives the posterior means of the outlier sizes. From the plots it is clear that the usual Gibbs sampler easily detects the isolated outlier at t = 27, with posterior probability close to one and posterior mean outlier size −3.3. Meanwhile, the usual Gibbs sampler encounters several difficulties. First, it fails to detect the inner points of the outlying patch as outliers (the outlying posterior probabilities are very low at t = 39 and 40). This phenomenon is referred to as masking. Second, the sampler misspecifies the "good" data points at t = 37 and 42 as outliers, because the outlying posterior probabilities of these two points are close to unity; the posterior means of the sizes of these two erroneous outliers are −3.42 and −2.9, respectively. In short, the two "good" data points at t = 37 and 42 are swamped by the patch of outliers. Third, the sampler correctly identifies the boundary points of the outlier patch at t = 38 and 41 as outliers, but substantially underestimates their sizes (the posterior means of the outlier sizes are only 3.3 and 4.5, respectively).

The example clearly illustrates the masking and swamping problems encountered by the usual Gibbs sampler when additive outliers occur in a patch. The objective of this paper is to propose a new procedure to overcome these difficulties. Limited experience shows that the proposed approach is effective.

The paper is organized as follows. Section 2 reviews the application of the standard Gibbs sampler to outlier identification in an AR time series. Section 3 proposes a new adaptive Gibbs algorithm to detect outlier patches. The conditional posterior distributions of blocks of observations are obtained and used to expedite the convergence of Gibbs sampling.
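As a concrete illustration of this setup, the following Python sketch simulates a contaminated series of the same form. It is not the authors' code; the random seed and the patch sizes are hypothetical placeholders (the exact sizes used in the paper are not reproduced), and NumPy is assumed.

```python
import numpy as np

# Simulated example: AR(3) of equation (1.2) plus an isolated additive outlier at
# t = 27 and a patch of additive outliers at t = 38,...,41 (1-based time indexes).
rng = np.random.default_rng(0)

n, p, burn = 50, 3, 200
phi = np.array([2.1, -1.46, 0.336])            # phi(B) = (1 - 0.6B)(1 - 0.7B)(1 - 0.8B)

a = rng.normal(0.0, 1.0, n + burn)             # innovations, sigma_a^2 = 1
x = np.zeros(n + burn)
for t in range(p, n + burn):
    x[t] = phi @ x[t - p:t][::-1] + a[t]       # x_t = phi_1 x_{t-1} + ... + phi_3 x_{t-3} + a_t
x = x[burn:]                                   # drop burn-in so the start is close to stationarity

y = x.copy()
y[27 - 1] += -3.0                              # isolated AO of size -3 at t = 27
patch_sizes = np.array([10.0, 10.0, 9.0, 10.0])  # hypothetical sizes, for illustration only
y[38 - 1:41] += patch_sizes                    # patch at t = 38,...,41
```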

Figure 1. Top: AR(3) artificial time series with five outliers at periods t = 27 and 38-41 (marked with a dot). Bottom: results of the Gibbs sampler after 26,000 iterations: (a) posterior probability for each data point to be an outlier, (b) posterior mean estimates of the outlier sizes for each data point, and (c) convergence monitoring by plotting the parameter values drawn in each iteration.

Section 4 illustrates the performance of the proposed procedure in two examples.

2. OUTLIER DETECTION IN AN AR PROCESS

2.1. AR model with additive outliers

Suppose the observed data $y = (y_1,\ldots,y_n)$ are generated by $y_t = \delta_t\beta_t + x_t$, $t = 1,\ldots,n$, where $x_t$ is given by (1.1) and $\delta = (\delta_1,\ldots,\delta_n)$ is a binary random vector of outlier indicators; that is, $\delta_t = 1$ if the $t$-th observation is contaminated by an additive outlier of size $\beta_t$, and $\delta_t = 0$ otherwise. For simplicity, assume that $x_1,\ldots,x_p$ are fixed and $x_t = y_t$ for $t = 1,\ldots,p$, i.e. there are no outliers in the first $p$ observations. The indicator vector of outliers then becomes $\delta = (\delta_{p+1},\ldots,\delta_n)$ and the size vector is $\beta = (\beta_{p+1},\ldots,\beta_n)$. Let $z_t = (1, x_{t-1},\ldots,x_{t-p})'$ and $\phi = (\phi_0, \phi_1,\ldots,\phi_p)'$.

The observed series can be expressed as a multiple linear regression model given by

$y_t = \delta_t\beta_t + z_t'\phi + a_t$,   $t = p+1,\ldots,n$.   (2.3)

We assume that the outlier indicator $\delta_t$ and the outlier magnitude $\beta_t$ are independent and distributed as Bernoulli($\omega$) and $N(0,\tau^2)$, respectively, for all $t$, where $\omega$ and $\tau^2$ are hyperparameters. Therefore, the prior probability of being contaminated by an outlier is the same for all observations, namely $P(\delta_t = 1) = \omega$ for $t = p+1,\ldots,n$; the prior distribution for the contamination parameter $\omega$ is Beta($\gamma_1,\gamma_2$), with expectation $E(\omega) = \gamma_1/(\gamma_1+\gamma_2)$.

Abraham and Box (1979) obtained the posterior distributions of the parameters in model (2.3) under Jeffreys' reference prior distribution for $(\phi,\sigma_a^2)$. In this case the conditional posterior distribution of $\phi$, given any outlier configuration $\delta_r$, is a multivariate-t distribution, where $\delta_r$ is one of the $2^{n-p}$ possible outlier configurations. Then the posterior distribution of $\phi$ is a mixture of $2^{n-p}$ multivariate-t distributions:

$P(\phi \mid y) = \sum_r w_r\, P(\phi \mid y, \delta_r)$,   (2.4)

where the summation is over all $2^{n-p}$ possible outlier configurations and $w_r$ is the weight of configuration $\delta_r$, i.e. $w_r = P(\delta_r \mid y)$. For such a model, we can identify the outliers using the posterior marginals of the elements of $\delta$,

$p_t = P(\delta_t = 1 \mid y) = \sum_{r:\ \delta_{rt}=1} P(\delta_{rt} \mid y)$,   $t = p+1,\ldots,n$,   (2.5)

where the summation is now over the $2^{n-p-1}$ outlier configurations $\delta_{rt}$ with $\delta_t = 1$ (the posterior probabilities $P(\delta_{rt} \mid y)$ are easy to compute). The posterior distributions of the outlier magnitudes are mixtures of Student-t distributions:

$P(\beta_t \mid y) = \sum_r w_r\, P(\beta_t \mid y, \delta_r)$,   $t = p+1,\ldots,n$.   (2.6)

Therefore, it is possible to derive the posterior distributions of the parameters in model (2.3). In practice, however, the computation is very intensive even when the sample size is small, since these probabilities are mixtures of $2^{n-p}$ or $2^{n-p-1}$ distributions. The approach becomes infeasible when the sample size is moderate or large, and some alternative approach is needed. One alternative is to use MCMC methods.

2.2. The standard Gibbs sampling and its difficulties

McCulloch and Tsay (1994) proposed to compute the posterior distributions (2.4)-(2.6) by Gibbs sampling. The procedure requires the full conditional posterior distribution of each parameter in model (2.3) given all the other parameters. Barnett, Kohn and Sheather (1996) generalized the model to include innovative outliers and order selection; they used MCMC methods with Metropolis-Hastings and Gibbs sampling algorithms. For ease of reference, we summarize the conditional posterior distributions, obtained first by McCulloch and Tsay (1994) with conjugate prior distributions for $\phi$ and $\sigma_a^2$. We use non-informative priors for these parameters. Note that, as the priors for $\delta$ and $\beta$ are proper, the joint posterior is proper even if we assume improper priors for $(\phi,\sigma_a^2)$.

If the hyperparameters $\gamma_1$, $\gamma_2$ and $\tau^2$ are known, the conditional posterior distribution of the AR parameter vector $\phi$ is multivariate normal $N_{p+1}(\Sigma\phi^*,\ \sigma_a^2\Sigma)$, where the mean vector and the covariance matrix are determined by

$\phi^* = \sum_{t=p+1}^{n} z_t x_t$   and   $\Sigma^{-1} = \sum_{t=p+1}^{n} z_t z_t'$,

with $x_t = y_t - \delta_t\beta_t$ the outlier-adjusted series. The conditional posterior distribution of the innovational variance $\sigma_a^2$ is Inverted-Gamma$\big((n-p)/2,\ (\sum_{t=p+1}^{n} a_t^2)/2\big)$. The conditional posterior distribution of $\omega$ depends only on the vector $\delta$; it is Beta$\big(\gamma_1 + \sum_t \delta_t,\ \gamma_2 + (n-p) - \sum_t \delta_t\big)$. The conditional posterior mean of $\omega$ can then be expressed as a linear combination of the prior mean and the sample mean of the indicators: $E(\omega \mid \delta) = \kappa\, E(\omega) + (1-\kappa)\,\bar\delta$, where $\kappa = (\gamma_1+\gamma_2)/(\gamma_1+\gamma_2+n-p)$ and $\bar\delta$ is the mean of the $\delta_t$.
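A compact sketch of these three conditional draws, assuming NumPy and the notation defined above (function and variable names are our own choices, not the authors'):

```python
import numpy as np

def draw_model_params(y, delta, beta, phi, p, gamma1, gamma2, rng):
    """One Gibbs pass for (phi, sigma_a^2, omega) given the outlier indicators and sizes,
    following the conditional posteriors summarized above (non-informative priors for
    phi and sigma_a^2). phi has length p+1 and includes the intercept phi_0."""
    n = len(y)
    x = y - delta * beta                                   # outlier-adjusted series
    Z = np.column_stack([np.ones(n - p)] +
                        [x[p - j:n - j] for j in range(1, p + 1)])
    xt = x[p:]
    # sigma_a^2 | rest ~ Inverted-Gamma((n-p)/2, sum a_t^2 / 2), residuals from current phi
    a = xt - Z @ phi
    sigma2 = (a @ a) / 2.0 / rng.gamma((n - p) / 2.0)
    # phi | rest ~ N_{p+1}(Sigma * phi_star, sigma_a^2 * Sigma), Sigma^{-1} = Z'Z
    Sigma = np.linalg.inv(Z.T @ Z)
    phi_new = rng.multivariate_normal(Sigma @ (Z.T @ xt), sigma2 * Sigma)
    # omega | delta ~ Beta(gamma1 + sum(delta), gamma2 + (n - p) - sum(delta))
    s = delta[p:].sum()
    omega = rng.beta(gamma1 + s, gamma2 + (n - p) - s)
    return phi_new, sigma2, omega
```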

The conditional posterior distribution of $\delta_j$, $j = p+1,\ldots,n$, is Bernoulli with probability

$P(\delta_j = 1 \mid y, \phi, \omega, \sigma_a^2, \beta, \delta_{(j)}) = \dfrac{\omega}{\omega + (1-\omega)\,B_{(j)}}$,   (2.7)

where $\delta_{(j)}$ is obtained from $\delta$ by eliminating the element $\delta_j$ and $B_{(j)}$ is the Bayes factor. The logarithm of the Bayes factor $B_{(j)}$ is

$\log B_{(j)} = \dfrac{1}{2\sigma_a^2}\left[\ \sum_{t=j}^{T_j} e_t(1)^2 - \sum_{t=j}^{T_j} e_t(0)^2\ \right]$,   (2.8)

where $T_j = \min(n, j+p)$, $e_t(\delta_j) = \tilde x_t - \phi_0 - \sum_{i=1}^{p}\phi_i\tilde x_{t-i}$, and $\tilde x_t = y_t - \delta_t\beta_t$, so that $e_t(\delta_j)$ is the residual at time $t$ when the series is corrected by the outliers identified in $\delta_{(j)}$ and by $\delta_j$. It is easy to see that $e_t(1) = e_t(0) + \psi_{t-j}\beta_j$, where $\psi_0 = -1$ and $\psi_i = \phi_i$ for $i = 1,\ldots,p$.

The probability (2.7) has a simple interpretation. The two hypotheses $\delta_j = 1$ ($y_j$ is contaminated by an outlier) and $\delta_j = 0$ ($y_j$ is outlier free), given the data, only affect the residuals $e_j,\ldots,e_{T_j}$. Assuming the parameters are known, we can judge the likelihoods of these hypotheses by (a) computing the residuals $e_t(1)$ for $t = j,\ldots,T_j$, (b) computing the residuals $e_t(0)$, and (c) comparing the two sets of residuals. The Bayes factor is the usual way of comparing the likelihoods of the two hypotheses. Since the residuals are one-step-ahead prediction errors, (2.8) compares the sums of prediction errors in the periods $j, j+1,\ldots,T_j$ when the forecasts are evaluated under the hypothesis $\delta_j = 1$ and under the hypothesis $\delta_j = 0$. This is equivalent to the Chow (1960) test for structural changes when the variance is known.

The conditional posterior distributions of the outlier magnitudes $\beta_j$, for $j = p+1,\ldots,n$, are $N(\beta_j^*, \tau_j^{*2})$, where

$\beta_j^* = \dfrac{\tau_j^{*2}}{\sigma_a^2}\left[\ e_j(0) - \phi_1 e_{j+1}(0) - \cdots - \phi_{T_j-j}\, e_{T_j}(0)\ \right]$   and   $\tau_j^{*2} = \dfrac{\tau^2\sigma_a^2}{\tau^2\nu^2_{T_j-j} + \sigma_a^2}$,   (2.9)

with $\nu^2_{T_j-j} = 1 + \phi_1^2 + \cdots + \phi^2_{T_j-j}$. When $y_j$ is identified as an outlier and there is no prior information about the outlier magnitude ($\tau^2 \to \infty$), the conditional posterior mean of $\beta_j$ tends to $\hat\beta_j = \nu^{-2}_{T_j-j}\,[e_j(0) - \phi_1 e_{j+1}(0) - \cdots - \phi_{T_j-j} e_{T_j}(0)]$, which is the least squares estimate when the parameters are known (Chang, Tiao and Chen, 1988), and the conditional posterior variance of $\beta_j$ tends to the variance of the estimate $\hat\beta_j$.
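A sketch of the indicator draw implied by (2.7)-(2.8); the residual windows are assumed to be computed beforehand as described above, and the function name is ours.

```python
import numpy as np

def draw_outlier_indicator(e0, e1, sigma2, omega, rng):
    """Draw delta_j from the Bernoulli conditional (2.7).
    e0: residuals e_t(0), t = j..T_j, with y_j left uncorrected;
    e1: residuals e_t(1) over the same window, with y_j corrected by beta_j;
    sigma2: current sigma_a^2; omega: current contamination probability."""
    log_B = ((e1 @ e1) - (e0 @ e0)) / (2.0 * sigma2)   # log Bayes factor (2.8)
    log_B = np.clip(log_B, -700.0, 700.0)              # guard against exp overflow
    prob = omega / (omega + (1.0 - omega) * np.exp(log_B))
    return int(rng.random() < prob), prob
```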

The conditional posterior mean in (2.9) can also be seen as a linear combination of the prior mean and the outlier magnitude estimated from the data: the magnitude estimate is the difference between the observation $y_j$ and the conditional expectation of $y_j$ given all the other data, $\hat y_{j|n}$, which is the linear predictor of $y_j$ that minimizes the mean squared error. Then (2.9) can be expressed as

$\beta_j^* = \dfrac{\tau^2\nu^2_{T_j-j}}{\tau^2\nu^2_{T_j-j} + \sigma_a^2}\,(y_j - \hat y_{j|n}) + \dfrac{\sigma_a^2}{\tau^2\nu^2_{T_j-j} + \sigma_a^2}\,\mu$,   (2.10)

where $\mu$ is the prior mean of $\beta_j$ (zero in this paper). For the AR($p$) model under study, the optimal linear predictor $\hat y_{j|n}$ is a combination of the $p$ values of the series before and after $y_j$,

$\hat y_{j|n} = \nu^{-2}_{T_j-j}\,\tilde\phi_{T_j-j}\,\phi_0 \;-\; \nu^{-2}_{T_j-j}\left(\ \sum_{i=1}^{p}\Big(\sum_{t=0}^{T_j-j-i}\psi_t\psi_{t+i}\Big)x_{j-i} \;+\; \sum_{i=1}^{T_j-j}\Big(\sum_{t=0}^{T_j-j-i}\psi_t\psi_{t+i}\Big)x_{j+i}\ \right)$,   (2.11)

where $\tilde\phi_t = 1 - \phi_1 - \cdots - \phi_t$ for $t \le p$. Using the truncated autoregressive polynomial $\phi_{T_j-j}(B) = (1 - \phi_1 B - \cdots - \phi_{T_j-j}B^{T_j-j})$ and the "truncated" variance $\nu^2_{T_j-j} = (1 + \phi_1^2 + \cdots + \phi^2_{T_j-j})$ of the dual process

$x^D_t = \tilde\phi_p\,\phi_0 + a_t - \phi_1 a_{t-1} - \cdots - \phi_p a_{t-p}$,   (2.12)

the filter (2.11) can be written as a function of the "truncated" autocorrelation generating function $\rho^D_{T_j-j}(B) = \nu^{-2}_{T_j-j}\,\phi_p(B)\,\phi_{T_j-j}(B^{-1})$ of the dual process: $\hat y_{j|n} = \nu^{-2}_{T_j-j}\,\tilde\phi_{T_j-j}\,\phi_0 + [1 - \rho^D_{T_j-j}(B)]\,x_j$.

We may use the above results to draw a sample of a Markov chain using Gibbs sampling. This Markov chain converges to the joint posterior distribution of the parameters. When the number of iterations is sufficiently large, the Gibbs draws can be regarded as a sample from the joint posterior distribution. These draws are easy to obtain because they come from well-known probability distributions. However, as shown by the simple example of Section 1, such a Gibbs sampling procedure may fare poorly when additive outliers appear in patches.
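A sketch of the corresponding magnitude draw, under the paper's zero prior mean for $\beta_j$; NumPy is assumed and the variable names are ours.

```python
import numpy as np

def draw_outlier_size(e0, phi, sigma2, tau2, rng):
    """Draw beta_j from the Normal conditional (2.9).
    e0: residuals e_t(0) for t = j..T_j of the series corrected for all *other*
        identified outliers; phi: AR coefficients (phi_1,...,phi_p);
    sigma2, tau2: current sigma_a^2 and the prior variance of beta_j."""
    m = len(e0) - 1                               # m = T_j - j (at most p)
    nu2 = 1.0 + np.sum(phi[:m] ** 2)              # nu^2_{T_j-j} = 1 + phi_1^2 + ... + phi_m^2
    var = tau2 * sigma2 / (tau2 * nu2 + sigma2)   # tau*_j^2
    mean = (var / sigma2) * (e0[0] - phi[:m] @ e0[1:])   # beta*_j with zero prior mean
    return rng.normal(mean, np.sqrt(var))
```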

To understand the situation more fully, consider the simplest situation, in which the time series follows an AR(1) model with mean zero and there exists a patch of three additive outliers at times T−1, T and T+1. To simplify the analysis, we first assume that the three outliers have the same size, $\beta_t = \beta$ for $t = T-1, T, T+1$, and that the AR parameter $\phi$ is known. Checking whether an observation is contaminated by an additive outlier amounts to comparing the observed value with its optimal interpolator derived from the model and the rest of the data. For an AR(1) process the optimal interpolator (2.11) is $\hat y_{t|n} = \phi(1+\phi^2)^{-1}(y_{t-1}+y_{t+1})$. The first outlier in the patch is tested by comparing $y_{T-1} = x_{T-1} + \beta$ with $\hat y_{T-1|n} = \hat x_{T-1|n} + \phi(1+\phi^2)^{-1}\beta$. It is detected if $\beta$ is sufficiently large, but its size will be underestimated; for instance, if $\phi$ is close to unity the estimated size will be only about half the true size. For the second outlier, we compare $y_T = x_T + \beta$ with $\hat y_{T|n} = \hat x_{T|n} + \phi(1+\phi^2)^{-1}(\beta + \beta) = \hat x_{T|n} + \phi(1+\phi^2)^{-1}(2\beta)$. If $\phi$ is close to one the outlier cannot be easily detected, because the difference between the observed value and the optimal interpolator is $x_T - \hat x_{T|n} + (1-\phi)^2(1+\phi^2)^{-1}\beta$, which is small when $\phi$ is close to unity. Note that in practice the masking effect is likely to remain even if we correctly identify an outlier at T−1 and adjust $y_{T-1}$ accordingly because, as mentioned above, the outlier size at T−1 is underestimated. In short, if $\phi$ is close to unity the middle outlier at time T is very hard to detect. The detection of the outlier at time T+1 is also difficult, and its size is underestimated.

Next consider the case in which the AR parameter is estimated from the sample. Without outliers, the least squares estimate of the AR parameter is $\hat\phi = \sum x_t x_{t-1}(\sum x_t^2)^{-1} = r_x(1)$, the lag-1 sample autocorrelation. Suppose the series is contaminated by a patch of $2k+1$ additive outliers at times $T-k,\ldots,T,\ldots,T+k$, of sizes $\beta_t = o_t s_x$, where $s_x^2 = \sum x_t^2/n$ is the sample variance of the outlier-free series. In this case the least squares estimate of the AR coefficient based on the observed time series $y_t$ is given, dropping terms of order $o(n^{-1})$, by

$\hat\phi_y = \dfrac{\ r_x(1) + n^{-1}\sum_{j=-k}^{k} o_{T+j}(\tilde x_{T+j-1} + \tilde x_{T+j+1}) + n^{-1}\sum_{j=-k}^{k} o_{T+j}\,o_{T+j-1}\ }{\ 1 + 2n^{-1}\sum_{j=-k}^{k} o_{T+j}\,\tilde x_{T+j} + n^{-1}\sum_{j=-k}^{k} (o_{T+j})^2\ }$,

where $\tilde x_t = x_t/s_x$. If the outliers are large, with the same sign and similar sizes, then for a fixed sample size $n$, $\hat\phi_y$ will approach unity as the outlier sizes increase. This makes the identification of outliers difficult.
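A quick numerical check of the AR(1) masking argument; the values of $\phi$ and $\beta$ below are arbitrary choices for illustration.

```python
# Masking at the middle of a patch of three equal outliers in an AR(1) series.
phi, beta = 0.9, 10.0                      # illustrative values
c = phi / (1.0 + phi**2)                   # AR(1) interpolator weight from (2.11)

# Boundary point T-1: observation shifted by beta, interpolator only by c*beta.
print(beta - c * beta)                     # about 5.03 -> detectable, but size roughly halved

# Middle point T: both neighbours are also shifted by beta.
print(beta - c * 2 * beta)                 # (1-phi)^2/(1+phi^2)*beta = 0.055 -> nearly invisible
```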

Note that the characteristics of the outlier patch will appear clearly if the length of the patch is known and one interpolates the whole patch using the observations before and after the patch. This is the main idea of the adaptive Gibbs sampling proposed in the next section. For the standard outlier detection procedure using Gibbs sampling, proper "block" interpolation can only occur if one uses ideal initial values that identify each point of the outlier patch as an outlier.

Figure 2 shows the evolution of the masking problem as the Gibbs sampling iterations start for the data simulated from (1.2). At each iteration, the Gibbs sampler checks whether an observation is an outlier by comparing the observed value with its optimal interpolator, using (2.8) and (2.10). The outlier sizes for t = 37 to 42 generated in each iteration are represented in Figure 2 as continuous sequences; grey shadows represent iterations where the data points are identified as outliers. The outliers at t = 38 and 41 are identified and corrected in a few iterations, but their sizes are underestimated. The central outliers, t = 39 and 40, are identified in very few iterations, and their sizes on those occasions are small. We can see that the four outliers are never identified in the same iteration. If we use ideal initial values that assign each point of the outlier patch as an outlier, the sizes obtained at each iteration are the dots in Figure 2 (horizontal lines represent the true outlier sizes). At the right side of the graphs, the estimates of the posterior distributions are represented for standard Gibbs sampling (the full curve) and for the Gibbs sampler with "block" interpolation.

One can regard these difficulties as a practical convergence problem for the Gibbs sampler. This problem also appears in the regression case with multiple outliers, as shown by Justel and Peña (1996). High parameter correlations and the large dimension of the parameter space slow down the convergence of the Gibbs sampler (see Hills and Smith (1992)). Note that the dimension of the parameter space is 2n + p + 3 and, for outliers in a patch, the correlations are large among the outlier positions and among the outlier magnitudes. When the Gibbs draws are from a joint distribution of highly correlated parameters, movements from one iteration to the next are in the principal component directions of the parameter space instead of parallel to the coordinate axes.

Figure 2. Artificial time series: outlier sizes $\beta_{37}$ to $\beta_{42}$ generated in each iteration. Continuous sequences are for the standard Gibbs sampler, and dots are for the Gibbs sampler with "block" interpolation. The grey shadows show the iterations where the data points are identified as outliers. The horizontal lines represent the true values.

3. DETECTING OUTLIER PATCHES

Our new procedure consists of two Gibbs runs. In the first run, the standard Gibbs sampling of Section 2 is applied to the data. The results of this Gibbs run are then used to implement a second Gibbs sampling that is adaptive in treating identified outliers and in using block interpolation to reduce possible masking and swamping effects. In what follows, we divide the details of the proposed procedure into subsections. The discussion focuses on the second Gibbs run and assumes that the results of the first Gibbs run are available. For ease of reference, let $\hat\phi^{(s_1)}$, $\hat\sigma_a^{(s_1)}$, $\hat p^{(s_1)}$ and $\hat\beta^{(s_1)}$ be the posterior means based on the last $r$ of the $s_1$ iterations used in the first Gibbs run.

The $j$-th element of $\hat p^{(s_1)}$ is $\hat p^{(s_1)}_{p+j}$, the posterior probability that $y_{p+j}$ is contaminated by an outlier.

3.1. Location and joint estimation of outlier patches

The biases in $\hat\beta^{(s_1)}$ induced by the masking effects of multiple outliers come from several sources. Two main sources are (a) drawing the values of $\beta_j$ one by one, and (b) the misspecification of the prior mean of $\beta_j$, fixed at zero. One-by-one drawing overlooks the dependence between parameters. For an AR($p$) process, an additive outlier affects $p+1$ residuals, and the usual interpolation (or filtering) involves $p$ observations before and after the time index of interest. Consequently, an additive outlier affects the conditional posterior distributions of $2p+1$ observations; see (2.8) and (2.9). Chen and Liu (1993) pointed out that estimates of outlier magnitudes computed separately can differ markedly from those obtained from a joint estimation. The situation becomes more serious in the presence of $k$ consecutive additive outliers, for which the outliers affect $2p+k$ observations.

We make use of the results of the first Gibbs sampler to identify possible locations and block sizes of outlier patches. The tentative specification of locations and block sizes of outlier patches is done by a forward-backward search using a window around the outliers identified by the first Gibbs run. Let $c_1$ be a critical value between 0 and 1 used to identify potential outliers. An observation whose posterior probability of being an outlier exceeds $c_1$ is classified as an "identified" outlier; more specifically, $y_j$ is identified as an outlier if $\hat p^{(s_1)}_j > c_1$. Typically we use $c_1 = 0.5$. Let $\{t_1,\ldots,t_m\}$ be the collection of time indexes of the outliers identified by the first Gibbs run.

Consider next the patch size. We select another critical value $c_2$, with $c_2 \le c_1$, to specify the beginning and end points of a "potential" outlier patch associated with an identified outlier. In addition, because the length of an outlier patch cannot be too large relative to the sample size, we select a window of length $2hp$ to search for the boundary points of a possible outlier patch.

For example, consider an "identified" outlier $y_{t_i}$. First, we check the $hp$ observations before $y_{t_i}$ and compare their posterior probabilities $\hat p^{(s_1)}_j$ with $c_2$. Any point within the window with $\hat p^{(s_1)}_j > c_2$ is regarded as a possible beginning point of an outlier patch associated with $y_{t_i}$; we then select the point farthest from $y_{t_i}$ as the beginning point of the outlier patch, and denote it by $y_{t_i-k_i}$. Second, we do the same for the $hp$ observations after $y_{t_i}$ and select the point farthest from $y_{t_i}$ with $\hat p^{(s_1)}_j > c_2$ as the end point of the outlier patch, denoted by $y_{t_i+v_i}$. Combine the two blocks to form a tentative candidate for an outlier patch associated with $y_{t_i}$, denoted by $(y_{t_i-k_i},\ldots,y_{t_i+v_i})$. Finally, consider jointly all the identified outlier patches for further refinement: overlapping or consecutive patches should be merged to form a larger patch; if the total number of outliers is greater than $n/2$, where $n$ is the sample size, increase $c_2$ and re-specify the possible outlier patches; if increasing $c_2$ cannot sufficiently reduce the total number of outliers, choose a smaller $h$ and re-specify the outlier patches.
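A minimal sketch of this forward-backward boundary search, assuming 0-based indexing and leaving out the merging of overlapping patches; the function name and signature are ours.

```python
def specify_patches(p_hat, identified, c2, h, p):
    """Tentative outlier-patch boundaries from first-run posterior probabilities.
    p_hat: posterior outlier probabilities p_t^(s1) (0-based array/list);
    identified: indices t_i with p_hat[t_i] > c1; h*p is the half-window width."""
    n = len(p_hat)
    patches = []
    for ti in identified:
        lo, hi = ti, ti
        before = [j for j in range(max(0, ti - h * p), ti) if p_hat[j] > c2]
        if before:
            lo = min(before)                               # farthest candidate before t_i
        after = [j for j in range(ti + 1, min(n, ti + h * p + 1)) if p_hat[j] > c2]
        if after:
            hi = max(after)                                # farthest candidate after t_i
        patches.append((lo, hi))
    return patches
```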

With the outlier patches tentatively specified, we draw Gibbs samples jointly within a patch. Suppose that a patch of $k$ outliers starting at time index $j$ is identified. Let $\delta_{j,k} = (\delta_j,\ldots,\delta_{j+k-1})$ and $\beta_{j,k} = (\beta_j,\ldots,\beta_{j+k-1})$ be the vectors of outlier indicators and magnitudes, respectively, for the patch. To complete the sampling scheme we need the conditional posterior distributions of $\delta_{j,k}$ and $\beta_{j,k}$, given the others. We give these distributions in the next theorem; the derivations are in the Appendix.

Theorem 1. Let $y = (y_1,\ldots,y_n)$ be a vector of observations generated according to (2.3), with no outliers in the first $p$ data points. Assume $\delta_t \sim$ Bernoulli($\omega$), $t = p+1,\ldots,n$, and

$P(\phi, \sigma_a^2, \omega, \beta) \propto \sigma_a^{-2}\,\omega^{\gamma_1-1}(1-\omega)^{\gamma_2-1}\exp\Big(-\dfrac{1}{2\tau^2}\sum_{t=p+1}^{n}\beta_t^2\Big)$,

where the parameters $\gamma_1$, $\gamma_2$ and $\tau^2$ are known. Let $e_t(\delta_{j,k}) = x_t - \phi_0 - \sum_{i=1}^{p}\phi_i x_{t-i}$ be the residual at time $t$ when the series is adjusted for all identified outliers not in the interval $[j, j+k-1]$ and for the outliers identified in $\delta_{j,k}$, and let $T_{j,k} = \min\{n,\ j+k+p-1\}$. Then the following hold.

a) The conditional posterior probability of a block configuration $\delta_{j,k}$, given the sample and the other parameters, is

$p_{\delta_{j,k}} = C\,\omega^{s_{j,k}}(1-\omega)^{k-s_{j,k}}\exp\Big(-\dfrac{1}{2\sigma_a^2}\sum_{t=j}^{T_{j,k}} e_t(\delta_{j,k})^2\Big)$,   (3.3)

where $s_{j,k} = \sum_{t=j}^{j+k-1}\delta_t$, and $C$ is a normalization constant so that the total probability of the $2^k$ possible configurations of $\delta_{j,k}$ is one.

b) The conditional posterior distribution of $\beta_{j,k}$ given the sample and the other parameters is $N_k(\beta^*_{j,k},\ \Sigma^*_{j,k})$, with

$\beta^*_{j,k} = -\,\Sigma^*_{j,k}\,\sigma_a^{-2}\,D_{j,k}\sum_{t=j}^{T_{j,k}} e_t(0)\,\psi_{t-j}$,   (3.4)

$\Sigma^{*-1}_{j,k} = \sigma_a^{-2}\,D_{j,k}\Big(\sum_{t=j}^{T_{j,k}}\psi_{t-j}\psi_{t-j}'\Big)D_{j,k} + \tau^{-2}I$,   (3.5)

where $D_{j,k}$ is a $k\times k$ diagonal matrix with elements $\delta_j,\ldots,\delta_{j+k-1}$, and $\psi_t = (\psi_t,\psi_{t-1},\ldots,\psi_{t-k+1})'$ is a $k\times 1$ vector, with $\psi_0 = -1$, $\psi_i = \phi_i$ for $i = 1,\ldots,p$, and $\psi_i = 0$ for $i < 0$ or $i > p$.

After computing the probabilities (3.3) for all $2^k$ possible configurations of the block $\delta_{j,k}$, the outlying status of each observation in the outlier patch can be classified separately. Another possibility, suggested by a referee, is to engineer some Metropolis-Hastings moves by using Theorem 1. An advantage of such moves is that one can generate large block configurations from (3.3) without computing $C$, but an optimal criterion (in the sense of acceptance rate and mixing) should be found. This possibility will be explored in future work.
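For moderate patch lengths $k$, the block indicator draw of part (a) can be sketched by enumerating the $2^k$ configurations; the residual computation is passed in as a placeholder function (the block interpolation itself is not reproduced here), and the names are ours.

```python
from itertools import product
import numpy as np

def draw_patch_indicators(resid_fn, k, sigma2, omega, rng):
    """Draw the whole indicator block delta_{j,k} from (3.3) by enumerating the 2^k
    configurations. resid_fn(config) must return the residuals e_t(delta_{j,k}) over
    t = j..T_{j,k} for a given 0/1 tuple (a placeholder for the block interpolation)."""
    configs = list(product([0, 1], repeat=k))
    logw = np.empty(len(configs))
    for i, cfg in enumerate(configs):
        e = np.asarray(resid_fn(cfg))
        s = sum(cfg)
        logw[i] = s * np.log(omega) + (k - s) * np.log(1 - omega) - (e @ e) / (2.0 * sigma2)
    w = np.exp(logw - logw.max())
    w /= w.sum()                        # normalization constant C handled implicitly
    return configs[rng.choice(len(configs), p=w)]
```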

Let $W_1 = \sigma_a^{-2}\,\Sigma^*_{j,k}\,D_{j,k}\big(\sum_{t=j}^{T_{j,k}}\psi_{t-j}\psi_{t-j}'\big)D_{j,k}$ and $W_2 = \tau^{-2}\,\Sigma^*_{j,k}$. Then $\beta^*_{j,k}$ can be written as $\beta^*_{j,k} = W_1\tilde\beta_{j,k} + W_2\mu$, where $W_1 + W_2 = I$, implying that the mean of the conditional posterior distribution of $\beta_{j,k}$ is a linear combination of the prior mean vector and the least squares estimate (or the maximum likelihood estimate) of the outlier magnitudes for an outlier patch,

$\tilde\beta_{j,k} = \Big(D_{j,k}\sum_{t=j}^{T_{j,k}}\psi_{t-j}\psi_{t-j}'\,D_{j,k}\Big)^{-1}\Big(-\sum_{t=j}^{T_{j,k}} e_t(0)\,D_{j,k}\psi_{t-j}\Big)$.   (3.6)

Peña and Maravall (1991) proved that, when $\delta_t = 1$, the estimate in (3.6) is equivalent to the vector of differences between the observations $(y_j,\ldots,y_{j+k-1})$ and the predictions $\hat y_t = E(y_t \mid y_1,\ldots,y_{j-1},y_{j+k},\ldots,y_n)$ for $t = j,\ldots,j+k-1$. The matrix $\Omega = \sum_{t=j}^{T_{j,k}}\psi_{t-j}\psi_{t-j}'$ is the $k\times k$ submatrix of the "truncated" autocovariance generating matrix of the dual process in (2.12). Specifically,

$\Omega = \begin{pmatrix} \nu^2_{T_{j,k}-j} & \rho^D_{1,T_{j,k}-j-1} & \cdots & \rho^D_{k-1,T_{j,k}-j-k+1} \\ \rho^D_{-1,T_{j,k}-j} & \nu^2_{T_{j,k}-j-1} & \cdots & \rho^D_{k-2,T_{j,k}-j-k+1} \\ \vdots & & \ddots & \vdots \\ \rho^D_{-k+1,T_{j,k}-j} & \rho^D_{-k+2,T_{j,k}-j-1} & \cdots & \nu^2_{T_{j,k}-j-k+1} \end{pmatrix}$,

where $\nu^2_{j'}$ is the "truncated" variance of the dual process and $\rho^D_{i,j'}$ is the coefficient of $B^i$ in the "truncated" autocorrelation generating function of the dual process, i.e. $\rho^D_{j'}(B) = \nu^{-2}_{j'}\,\phi_p(B)\,\phi_{j'}(B^{-1})$.

3.2. The Second Gibbs Sampling

We now discuss the second, adaptive Gibbs run of the proposed procedure. The results of the first Gibbs run provide useful information to start the second Gibbs sampling and to specify the prior distributions of the parameters. The starting values of $\delta_t$ are as follows: $\delta_t^{(0)} = 1$ if $\hat p^{(s_1)}_t > 0.5$, i.e., if $y_t$ belongs to an identified outlier patch; otherwise, $\delta_t^{(0)} = 0$. The prior distributions of $\beta_t$ are as follows.

a) If $y_t$ is identified as an isolated outlier, the prior distribution of $\beta_t$ is $N(\hat\beta^{(s_1)}_t, \tau^2)$, where $\hat\beta^{(s_1)}_t$ is the Gibbs estimate of $\beta_t$ from the first Gibbs run.

b) If $y_t$ belongs to an outlier patch, the prior distribution of $\beta_t$ is $N(\tilde\beta^{(s_1)}_t, \tau^2)$, where $\tilde\beta^{(s_1)}_t$ is the conditional posterior mean given in (3.6).

c) If $y_t$ does not belong to any outlier patch and is not an isolated outlier, then the prior distribution of $\beta_t$ is $N(0, \tau^2)$.

For each outlier patch, the results of Theorem 1 are used to draw $\delta_{j,k}$ and $\beta_{j,k}$ in the second Gibbs sampling. The second Gibbs sampling is also run for $s_2$ iterations, but only the results of the last $r$ iterations are used for inference. The number $s_2$ can be determined by any sequential method proposed in the literature to monitor the convergence of Gibbs sampling. We use a method that can be easily implemented, based on comparing the estimates of the outlying probability for each data point computed with non-overlapping segments of samples. Specifically, after a burn-in period of $b$ iterations, we assume that convergence has been achieved if the standard test for the equality of two proportions is not rejected. Thus, calling $\hat p^{(s)}_t$ the probability that $y_t$ is an outlier computed with the iterations from $s-r+1$ to $s$, we assume convergence if, for all $t = p+1,\ldots,n$, the differences $|\hat p^{(s)}_t - \hat p^{(s-r)}_t|$ are smaller than a threshold $\lambda$ given by the usual three-standard-error bound for the difference of two proportions estimated from $r$ draws each.
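A minimal sketch of this stopping rule; the worst-case two-proportion bound $3\sqrt{2(0.5)^2/r}$ used below is our assumption for the exact form of the threshold.

```python
import numpy as np

def converged(p_curr, p_prev, r, z=3.0):
    """Convergence check for the outlier probabilities: p_curr and p_prev are the
    estimates p_hat_t computed from two consecutive, non-overlapping blocks of r
    Gibbs draws. Uses the worst-case (p = 0.5) normal bound for two proportions."""
    lam = z * np.sqrt(2.0 * 0.5 ** 2 / r)
    return bool(np.all(np.abs(p_curr - p_prev) < lam))
```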

An alternative procedure for handling outlier patches is to use the ideas of Bruce and Martin (1989). This procedure involves two steps. First, select a positive integer $k$ in the interval $[1, n/2]$ as the maximum length of an outlier patch. Second, start the Gibbs sampler with $n-k-p$ parallel trials; in the $j$th trial, for $j = 1,\ldots,n-k-p$, the points at $t = p+j$ to $p+k+j$ are assigned initially as outliers, and for the other data points the usual method is used to assign initial values. In applications, one can use several different values of $k$. However, such a procedure requires intensive computation, especially when $n$ is large.

4. APPLICATIONS

Here we re-analyze the simulated time series of Section 1 and then consider some real data. We compare the results of the usual Gibbs sampling, referred to as standard Gibbs sampling, with those of the adaptive Gibbs sampling to see the efficacy of the latter algorithm. The examples demonstrate the applicability and effectiveness of the adaptive Gibbs sampling, and they show that patches of outliers occur in applications.

t             27      37      38      39      40      41      42
True value    -3      0                                        0
Standard GS   -3.3    -3.42   3.3     .5      -.5     4.5     -2.9
Adaptive GS   -3.6    -.9     .97     .63     .43     .9      -.23

Table 1. Outlier magnitudes: true values and estimates obtained by the standard and the adaptive Gibbs samplings.

4.1. Simulated Data Revisited

As shown in Figure 1, standard Gibbs sampling can easily detect the isolated outlier at t = 27 of the simulated AR(3) example, but it has difficulty with the outlier patch in the period 38-41. For the adaptive Gibbs sampling we choose the hyperparameters $\gamma_1 = 5$, $\gamma_2 = 95$ and $\tau = 3\hat\sigma_a$, implying that the contamination parameter has a prior mean of 0.05 and that the prior standard deviation of $\beta_t$ is three times the residual standard deviation. Using $\lambda = 0.047$ to monitor convergence, we obtained $s_1 = 26{,}000$ iterations for the first Gibbs sampling and $s_2 = 7{,}000$ iterations for the second, adaptive Gibbs sampling. All parameter estimates reported are the sample means of the last $r = 1{,}000$ iterations. For specifying the location of an outlier patch, we chose $c_1 = 0.5$, $c_2 = 0.3$, and a window of length $2p$ to search for the boundary points of possible outlier patches, where $p = 3$ is the autoregressive order of the series. Additional checking confirms that the results are stable under minor modifications of these parameter values.

The results of the first run are shown in Figure 1 and summarized in Table 1. As before, the procedure indicates a possible patch of outliers from t = 37 to 42. In the second run the initial conditions and the prior distributions are specified by the proposed adaptive procedure. The posterior probability of being an outlier for each data point, $\hat p^{(s_2)}_t$, is shown in Figure 3-a, and the posterior means of the outlier sizes are shown in Figure 3-b. Adaptive Gibbs sampling successfully specifies all the outliers, and there are no swamping or masking effects. In Figure 3-c we compare the posterior distributions from the adaptive (shaded area) and the standard (dotted curve) Gibbs sampling for the error variance and the autoregressive parameters; in Table 1 we compare some of the estimated outlier sizes with the true values.

Figure 3. Adaptive Gibbs sampling results with 7,000 iterations for the artificial time series with five outliers: (a) posterior probability for each data point to be an outlier, (b) posterior mean estimates of the outlier sizes for each data point, and (c) kernel estimates of the posterior marginal parameter distributions; the dotted lines are estimates from the first run and the vertical lines mark the true values.

One clearly sees the efficacy and added value of the adaptive Gibbs sampling in this way. Figure 4 shows the scatterplots of the Gibbs draws of the outlier sizes for t = 37,...,42. The right panel is for adaptive Gibbs sampling, whereas the left panel is for the standard Gibbs sampling; the plots on the diagonals are histograms of the outlier sizes. Adaptive Gibbs sampling exhibits high correlations between the sizes of consecutive outliers, in agreement with the outlier sizes used. On the other hand, the scatterplots of the standard Gibbs sampling do not adequately show the correlations between outlier sizes.

Finally, we also ran a more extensive simulation study in which the locations of the isolated outlier and of the patch were randomly selected. Using the same $x_t$ sequence, we ran standard and adaptive Gibbs sampling for 200 cases, with the following results.

1. The isolated outlier was identified by both the standard and the adaptive Gibbs sampling procedures in all cases.

2. The standard Gibbs sampler failed to detect all the outliers in each of the 200 simulations. A typical outcome, as pointed out in Section 3, had the extreme points of the outlier patch and their neighboring points identified as outliers; observations in the middle of the outlier patch were subject to masking effects. Occasionally, the standard Gibbs sampler did correctly identify three of the four outliers in the patch. Figure 5-left shows a bar plot of the relative frequencies of outlier detection for each data point in the 200 runs; we summarize the relative frequency of identification for each outlier in the patch (bars 1st-4th) and for their two neighboring "good" observations.

Figure 4. Scatterplots of the standard (left) and adaptive (right) Gibbs sampler output for $\beta_{37}$ to $\beta_{42}$, with the histograms of each magnitude on the diagonal.

3. In all simulations, the proposed procedure indicates a possible patch of outliers that includes the true outlier patch.

4. The adaptive Gibbs sampler failed in 3 cases, corresponding to randomly selected patches in the same region of the series. Figure 5-right shows the positive performance of the adaptive algorithm in detecting all the outliers in the patch.

4.2. A Real Example

Consider the data of monthly U.S. industry unfilled orders for radio and TV, in millions of dollars, as studied by Bruce and Martin (1989) among others. We use the logged series from January 1958 to October 1980, and focus on the seasonally adjusted series, in which the seasonal component was removed by the well-known X-11 ARIMA procedure.

Figure 5. Bar plots of the relative frequencies, in the 200 simulations, of outlier detection for each data point in the patch (bars 1st-4th) and for the previous and next "good" observations. Left: standard Gibbs sampling. Right: adaptive Gibbs sampling.

The seasonally adjusted series is shown in Figure 6 and can be downloaded from the same web site as the simulated data. An AR(3) model is fitted to the data, and the estimated parameter values are given in the first row of Table 2. The residual plot of the fit, also shown in Figure 6, indicates possible isolated outliers and outlier patches, especially in the latter part of the series. The hyperparameters needed to run the adaptive Gibbs algorithm are set by the same criteria as those of the simulated example: $\gamma_1 = 5$, $\gamma_2 = 95$, $\tau = 3\hat\sigma_a = 0.273$. In this particular instance we take $r = 1{,}000$, and the stopping criterion $\lambda = 0.047$ determines the number of iterations of each Gibbs run. As before, to specify possible outlier patches prior to running the adaptive Gibbs sampling, the window width is set to twice the AR order, with $c_1 = 0.5$ and $c_2 = 0.3$. In addition, we assume that the maximum length of an outlier patch is 11 months, just below one year.

Using 0.5 as the cut-off posterior probability to identify outliers, we summarize the results of the standard and adaptive Gibbs sampling in Table 3. The standard Gibbs algorithm identifies 13 data points as outliers: nine isolated outliers and two outlier patches, both of length 2. The two outlier patches are 6-7/1968 and 2-3/1978. The outlier posterior probability for each data point is shown in Figure 7.

Figure 6. Top: seasonally adjusted series of the logarithm of U.S. industry unfilled orders for radio and TV. Bottom: residual plot for the AR(3) model fitted to the series.

On the other hand, the second, adaptive Gibbs sampling specifies 11 data points as outliers: six isolated outliers and two outlier patches of length 2 and 3 at 2-3/1978 and 3-5/1979, respectively. The outlier posterior probabilities based on adaptive Gibbs sampling are also presented in Figure 7.

Finally, Table 4 presents the outlier posterior probabilities for each data point in the detected outlier patches, for both the standard and the adaptive Gibbs samplings. For the possible outlier patch from January to July 1968, the two algorithms show different results: standard Gibbs sampling identifies the patch 6-7/1968 as outliers, whereas adaptive Gibbs sampling detects an isolated outlier at 5/68. For the possible outlier patch from September 1976 to March 1977, the standard algorithm detects an isolated outlier at 1/77, while the adaptive algorithm does not detect any outlier within the patch. For the possible outlier patch from March to September 1979, the standard algorithm identifies two isolated outliers, in April and September. On the other hand, the adaptive algorithm substantially increases the outlying posterior probabilities for March and April of 1979 and, hence, changes an isolated outlier into a patch of three outliers; the isolated outlier in September remains unchanged. Based on the estimated outlier sizes in Table 3, the standard Gibbs algorithm seems to encounter severe masking effects for March and April of 1979.

Parameter      phi_0   phi_1   phi_2   phi_3   sigma_a
Initial model  .28     .6      .9      .5      .9
Standard GS    .8      .83     .9      -.5     .62
Adaptive GS    .8      .78     .23     -.4     .62

Table 2. Estimated parameter values for the initial model and those obtained by the standard and the adaptive Gibbs sampling algorithms.

Date                        5/68   6/68   7/68   /73    9/76   1/77   3/77   8/77
Standard GS  probability    .3     .74    .66    .54    .99    .53    .99    .99
             size           -.7    .7     .2     -.9    -.22   -.8    -.23   .2
Adaptive GS  probability    .62    .45    .27    .37    .99    .24    .99    .
             size           -.5    .4     .3     -.6    -.2    -.3    -.2    .2

Date                        2/78   3/78   9/78   3/79   4/79   5/79   9/79
Standard GS  probability    .      .      .      .5     .36    .      .
             size           -.26   -.27   .37    .9     .6     -.4    .26
Adaptive GS  probability    .      .      .      .92    .89    .      .
             size           -.28   -.28   .37    .24    .23    -.28   .32

Table 3. Posterior probabilities for each data point to be an outlier, and estimated outlier sizes, by the standard and the adaptive Gibbs sampling algorithms.

Figure 8 shows the time plot of the posterior means of the residuals obtained by adaptive Gibbs sampling; it should be compared with the residuals obtained without outlier adjustment in Figure 6. The results of this example demonstrate that (a) outlier patches occur frequently in practice, and (b) the standard Gibbs sampling approach to outlier detection may encounter severe masking and swamping effects, while the adaptive Gibbs sampling performs reasonably well.

Figure 7. Posterior probability for each data point to be an outlier, with the standard (top) and the adaptive (bottom) Gibbs sampling.

Figure 8. Mean residuals with the adaptive Gibbs sampler.

Date          1/68   2/68   3/68   4/68   5/68   6/68   7/68
Standard GS   .35    .24    .7     .293   .38    .738   .66
Adaptive GS   .3     .5     .4     .489   .62    .447   .269

Date          9/76   10/76  11/76  12/76  1/77   2/77   3/77
Standard GS   .999   .4     .3     .47    .526   .97    .99
Adaptive GS   .999   .      .3     .59    .244   .36    .989

Date          3/79   4/79   5/79   6/79   7/79   8/79   9/79
Standard GS   .5     .356   .      .23    .25    .36    .
Adaptive GS   .98    .893   .      .56    .      .66    .

Table 4. Posterior outlier probabilities for each data point in the three larger possible outlier patches.

ACKNOWLEDGEMENTS

The research of Ana Justel was financed by the European Commission under the Training and Mobility of Researchers Programme (TMR). Ana Justel and Daniel Peña acknowledge research support provided by DGES (Spain), under grants PB97-2 and PB96-, respectively. The work of Ruey S. Tsay is supported by the National Science Foundation, the Chiang Ching-Kuo Foundation, and the Graduate School of Business, University of Chicago.

APPENDIX: PROOF OF THEOREM 1

The conditional distribution of $\delta_{j,k}$ given the sample and the other parameters is

$P(\delta_{j,k} \mid y, \beta_{j,k}, \cdot) \propto f(y \mid \delta_{j,k}, \beta_{j,k}, \cdot)\,\omega^{s_{j,k}}(1-\omega)^{k-s_{j,k}}$.   (4.1)

The likelihood function can be factorized as

$f(y \mid \delta_{j,k}, \beta_{j,k}, \cdot) = f(y^{j-1}_{p+1} \mid \delta_{j,k}, \beta_{j,k}, \cdot)\; f(y^{T_{j,k}}_{j} \mid y^{j-1}_{p+1}, \delta_{j,k}, \beta_{j,k}, \cdot)\; f(y^{n}_{T_{j,k}+1} \mid y^{T_{j,k}}_{p+1}, \delta_{j,k}, \beta_{j,k}, \cdot)$,

where $y^{k}_{j} = (y_j,\ldots,y_k)$. Only $f(y^{T_{j,k}}_{j} \mid y^{j-1}_{p+1}, \delta_{j,k}, \beta_{j,k}, \cdot)$ depends on $\delta_{j,k}$, and it is the product of the conditional densities

$f(y_j \mid y^{j-1}_{p+1}, \delta_{j,k}, \beta_{j,k}, \cdot) \propto \exp\{-\tfrac{1}{2\sigma_a^2}(e_j(0) - \delta_j\beta_j)^2\}$,
  ...
$f(y_{j+k-1} \mid y^{j+k-2}_{p+1}, \delta_{j,k}, \beta_{j,k}, \cdot) \propto \exp\{-\tfrac{1}{2\sigma_a^2}(e_{j+k-1}(0) - \delta_{j+k-1}\beta_{j+k-1} + \phi_1\delta_{j+k-2}\beta_{j+k-2} + \cdots + \phi_{k-1}\delta_j\beta_j)^2\}$,
$f(y_{j+k} \mid y^{j+k-1}_{p+1}, \delta_{j,k}, \beta_{j,k}, \cdot) \propto \exp\{-\tfrac{1}{2\sigma_a^2}(e_{j+k}(0) + \phi_1\delta_{j+k-1}\beta_{j+k-1} + \cdots + \phi_k\delta_j\beta_j)^2\}$,
  ...
$f(y_{T_{j,k}} \mid y^{T_{j,k}-1}_{p+1}, \delta_{j,k}, \beta_{j,k}, \cdot) \propto \exp\{-\tfrac{1}{2\sigma_a^2}(e_{T_{j,k}}(0) + \phi_{T_{j,k}-j-k+1}\delta_{j+k-1}\beta_{j+k-1} + \cdots + \phi_{T_{j,k}-j}\delta_j\beta_j)^2\}$.

Hence the likelihood function can be expressed as

$f(y \mid \delta_{j,k}, \beta_{j,k}, \cdot) \propto \exp\Big\{-\tfrac{1}{2\sigma_a^2}\Big(\sum_{t=j}^{j+k-1}\big(e_t(0) + \sum_{i=0}^{t-j}\psi_i\delta_{t-i}\beta_{t-i}\big)^2 + \sum_{t=j+k}^{T_{j,k}}\big(e_t(0) + \sum_{i=t-j-k+1}^{t-j}\psi_i\delta_{t-i}\beta_{t-i}\big)^2\Big)\Big\}$,

and the residual $e_t(\delta_{j,k})$ is given by

$e_t(\delta_{j,k}) = \begin{cases} e_t(0) + \sum_{i=0}^{t-j}\psi_i\delta_{t-i}\beta_{t-i} & \text{if } t = j,\ldots,j+k-1, \\ e_t(0) + \sum_{i=t-j-k+1}^{t-j}\psi_i\delta_{t-i}\beta_{t-i} & \text{if } t > j+k-1, \end{cases}$   (4.2)

where $\psi_0 = -1$, $\psi_i = \phi_i$ for $i = 1,\ldots,p$, and $\psi_i = 0$ for $i < 0$ and $i > p$. Therefore, using (4.2), the likelihood can be written as

$f(y \mid \delta_{j,k}, \beta_{j,k}, \cdot) \propto \exp\Big\{-\tfrac{1}{2\sigma_a^2}\sum_{t=j}^{T_{j,k}} e_t(\delta_{j,k})^2\Big\}$.

Then, replacing in (4.1), we obtain (3.3) for any configuration of the vector $\delta_{j,k}$.

The conditional distribution of $\beta_{j,k}$ given the sample and the other parameters is

$P(\beta_{j,k} \mid y, \delta_{j,k}, \cdot) \propto f(y \mid \delta_{j,k}, \beta_{j,k}, \cdot)\,P(\beta_{j,k})$.

Using (4.2),

$f(y \mid \delta_{j,k}, \beta_{j,k}, \cdot) \propto \exp\Big\{-\tfrac{1}{2\sigma_a^2}\sum_{t=j}^{T_{j,k}}\big(e_t(0) + \psi_{t-j}'D_{j,k}\beta_{j,k}\big)^2\Big\}$.

Therefore, since the prior mean $\mu$ of $\beta_{j,k}$ is zero,

$P(\beta_{j,k} \mid y, \delta_{j,k}, \cdot) \propto \exp\Big\{-\tfrac{1}{2\sigma_a^2}\sum_{t=j}^{T_{j,k}}\big(e_t(0) + \psi_{t-j}'D_{j,k}\beta_{j,k}\big)^2\Big\}\exp\Big\{-\tfrac{1}{2\tau^2}(\beta_{j,k}-\mu)'(\beta_{j,k}-\mu)\Big\}$

$\propto \exp\Big\{-\tfrac{1}{2}\,\beta_{j,k}'\Big(\sigma_a^{-2}D_{j,k}\sum_{t=j}^{T_{j,k}}\psi_{t-j}\psi_{t-j}'D_{j,k} + \tau^{-2}I\Big)\beta_{j,k} \;-\; \sigma_a^{-2}\Big(\sum_{t=j}^{T_{j,k}} e_t(0)\psi_{t-j}'\Big)D_{j,k}\beta_{j,k}\Big\}$

$\propto \exp\Big\{-\tfrac{1}{2}(\beta_{j,k} - \beta^*_{j,k})'\,\Sigma^{*-1}_{j,k}\,(\beta_{j,k} - \beta^*_{j,k})\Big\}$,

where $\beta^*_{j,k}$ and $\Sigma^*_{j,k}$ are defined in (3.4) and (3.5), respectively.

REFERENCES

Abraham, B. and Box, G.E.P. (1979). Bayesian analysis of some outlier problems in time series. Biometrika 66, 229-236.

Barnett, G., Kohn, R. and Sheather, S. (1996). Bayesian estimation of an autoregressive model using Markov chain Monte Carlo. Journal of Econometrics 74, 237-254.

Barnett, G., Kohn, R. and Sheather, S. (1997). Robust Bayesian estimation of autoregressive-moving average models. Journal of Time Series Analysis 18, 11-28.

Bruce, A.G. and Martin, R.D. (1989). Leave-k-out diagnostics for time series (with discussion). Journal of the Royal Statistical Society, Series B 51, 363-424.

Chang, I. and Tiao, G.C. (1983). Estimation of time series parameters in the presence of outliers. Technical Report 8, Statistics Research Center, University of Chicago.

Chang, I., Tiao, G.C. and Chen, C. (1988). Estimation of time series parameters in the presence of outliers. Technometrics 30, 193-204.

Chen, C. and Liu, L. (1993). Joint estimation of model parameters and outlier effects in time series. Journal of the American Statistical Association 88, 284-297.

Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591-605.

Fox, A.J. (1972). Outliers in time series. Journal of the Royal Statistical Society, Series B 34, 350-363.

Hills, S.E. and Smith, A.F.M. (1992). Parameterization issues in Bayesian inference. In Bayesian Statistics 4 (Eds. J.M. Bernardo, J. Berger, A.P. Dawid and A.F.M. Smith), 641-649. Oxford University Press.

Justel, A. and Peña, D. (1996). Gibbs sampling will fail in outlier problems with strong masking. Journal of Computational and Graphical Statistics 5, 176-189.

McCulloch, R.E. and Tsay, R.S. (1994). Bayesian analysis of autoregressive time series via the Gibbs sampler. Journal of Time Series Analysis 15, 235-250.

Peña, D. (1987). Measuring the importance of outliers in ARIMA models. In New Perspectives in Theoretical and Applied Statistics (Eds. Puri et al.), 109-118. John Wiley, New York.

Peña, D. (1990). Influential observations in time series. Journal of Business & Economic Statistics 8, 235-241.

Peña, D. and Maravall, A. (1991). Interpolation, outliers and inverse autocorrelations. Communications in Statistics - Theory and Methods 20, 3175-3186.

Sánchez, M.J. and Peña, D. (1997). The identification of multiple outliers in ARIMA models. Working Paper 97-76, Universidad Carlos III de Madrid.

Tsay, R.S. (1986). Time series model specification in the presence of outliers. Journal of the American Statistical Association 81, 132-141.

Tsay, R.S. (1988). Outliers, level shifts, and variance changes in time series. Journal of Forecasting 7, 1-20.

Tsay, R.S., Peña, D. and Pankratz, A.E. (1998). Outliers in multivariate time series. Working Paper 98-96, Universidad Carlos III de Madrid.