

Bayesian Outlier Detection in Regression Model

Younshik Chung and Hyungsoon Kim
Department of Statistics, Pusan National University, Pusan, 609-735, Korea

Abstract

The problem of 'outliers', observations which look suspicious in some way, has long been one of the foremost concerns of experimenters and data analysts. We propose a model for the outlier problem and analyze it for the linear regression model using a Bayesian approach. We use the mean-shift model together with the SSVS idea of George and McCulloch (1993), which is based on the data augmentation method. The advantage of the proposed method is that it finds, through posterior probabilities, the subset of the data that is most suspicious under the given model. The MCMC method (Gibbs sampler) is used to overcome the complicated Bayesian computation. Finally, the proposed method is applied to a simulated data set and to a real data set.

KEYWORDS: Gibbs sampler; Latent variable; Linear mixed normal model; Linear regression model; Mean-shift model; Outlier; Variance-inflation model.

1. INTRODUCTION

The problem of 'outliers', observations which look suspicious in some way, has long been one of the foremost concerns of experimenters and data analysts. In this paper we propose a model for the outlier problem and analyze it using a Bayesian approach. Bayesian approaches to outlier detection can be classified into two groups according to whether or not an alternative model for the outliers is adopted. Without an alternative model, the predictive distribution is used by Geisser (1985) and Pettit and Smith (1985), and the posterior distribution by Johnson and Geisser (1983), Chaloner and Brant (1988) and Guttman and Peña (1993). With an alternative model, the mean-shift model or the variance-inflation model is used, as in Guttman (1973) and Sharples (1990).

Let Y be an observation vector from N(θ, σ²). Under the mean-shift model a suspicious observation is instead distributed as N(θ + m, σ²); if m is not zero, the corresponding observation is declared an outlier. Guttman (1973) applied the mean-shift model to a linear model. Under the variance-inflation model an observation y_i comes from N(θ, b_i σ²), and an observation y_i with b_i >> 1 is treated as an outlier (Box and Tiao, 1968).

Sharples (1990) showed how variance inflation can be incorporated easily into general hierarchical models, retaining tractability of analysis.

In this paper we use the mean-shift model in the linear regression model. Specifically, we assume that for some particular (n × p) matrix X of constants (the design matrix), it is intended to generate data Y = (Y_1, ..., Y_n)^t such that

Y = Xβ + ε,   (1.1)

where β = (β_1, ..., β_p)^t is a set of p unknown regression parameters and the (n × 1) error vector ε is normally distributed with mean 0 and variance-covariance matrix σ²I_n, with σ² unknown. Although (1.1) is the intended generation scheme, it is feared that departures from this model will occur; indeed, the observations y may be generated as follows: for i = 1, ..., n,

y_i = Σ_{j=1}^{p} x_{ij}β_j + m_i + ε_i.   (1.2)

Therefore, if m_i = 0 the ith observation is not an outlier; otherwise it is considered an outlier. For detecting outliers we use the SSVS (stochastic search variable selection) method of George and McCulloch (1993). SSVS was introduced as a method for selecting the best predictors in the multiple regression model using the Gibbs sampler. In the usual likelihood approaches one finds the single most suspicious observation, then the most suspicious pair, then the most suspicious triple, and so on; the single most suspicious observation cannot then be compared with the most suspicious pair, and masking sometimes occurs. Our Bayesian method overcomes such problems. Its greatest advantage is that it finds the subset of observations that is most suspicious among the data.

The plan of this article is as follows. In Section 2, in order to find the outliers, we introduce and motivate the hierarchical framework that depends on the SSVS of George and McCulloch (1993). In Section 3 we illustrate the proposed method on a simulated example and on a real data set (Darwin's data: Box and Tiao, 1973). Finally, in Section 4 we extend the proposed method to the normal linear mixed model.

2. SSVS For Outliers

2.1. Hierarchical Bayesian Formulation

For detecting outliers in the linear regression model, we apply the mean-shift model to the linear regression model.

As noted above, although (1.1) is the intended generation scheme, it is feared that departures from this model will occur; indeed, the observations y may be generated as follows: for i = 1, ..., n,

y_i = Σ_{j=1}^{p} x_{ij}β_j + m_i + ε_i.   (2.1)

Therefore our model is reexpressed as

Y = X̃β̃ + ε,   (2.2)

where X̃ = [X, I_n], β̃ = (β_1, ..., β_p, m_1, ..., m_n)^t, and I_n denotes the n × n identity matrix.

By introducing the latent variable γ_j = 0 or 1, we represent our normal mixture as

m_j | γ_j ∼ (1 − γ_j) N(0, τ_j²) + γ_j N(0, c_j²τ_j²)   (2.3)

and

Pr(γ_j = 1) = 1 − Pr(γ_j = 0) = p_j.   (2.4)

This is based on the data augmentation idea of Tanner and Wong (1987). George and McCulloch (1993) use the same structure (2.3) and (2.4) for selecting variables in the linear regression model. Diebolt and Robert (1994) have also successfully used this approach for selecting the number of components in a mixture distribution. When γ_i = 0, m_i ∼ N(0, τ_i²), and when γ_i = 1, m_i ∼ N(0, c_i²τ_i²). Following George and McCulloch (1993), we first set τ_i (> 0) small, so that if γ_i = 0 then m_i would probably be so small that it could be "safely" estimated by 0. Second, we set c_i large (c_i > 1 always), so that if γ_i = 1 then a non-zero estimate of m_i means that the corresponding observation y_i should probably be regarded as an outlier in the given model. To obtain (2.3) as the prior for m_i | γ_i, we use the multivariate normal prior

m | γ ∼ N_n(0, D_γ R D_γ),   (2.5)

where m = (m_1, ..., m_n), γ = (γ_1, ..., γ_n), R is the prior correlation matrix, and

D_γ = diag[a_1τ_1, ..., a_nτ_n]   (2.6)

with a_i = 1 if γ_i = 0 and a_i = c_i if γ_i = 1. For selecting the values of the tuning factors c_i and τ_i, see George and McCulloch (1993) and Section 2.2. Also see George and McCulloch (1993) for the selection of R; of particular interest, the identity matrix can be used for R.
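To make the hierarchy (2.3)-(2.6) concrete, here is a minimal sketch in Python/NumPy (not from the paper; the function name draw_m_given_gamma and the toy values are illustrative) that builds D_γ and draws the mean-shift vector m from the prior (2.5) for a given indicator vector γ.

```python
import numpy as np

def draw_m_given_gamma(gamma, tau, c, R, rng):
    """Draw the mean-shift vector m from the prior (2.5)-(2.6).

    gamma : 0/1 indicator vector (gamma[i] = 1 marks y_i as a suspected outlier)
    tau   : small prior standard deviations tau_i
    c     : inflation factors c_i (> 1)
    R     : prior correlation matrix (the identity is a common choice)
    """
    a = np.where(gamma == 1, c, 1.0)     # a_i = c_i if gamma_i = 1, else 1
    D = np.diag(a * tau)                 # D_gamma = diag[a_1 tau_1, ..., a_n tau_n]
    cov = D @ R @ D                      # prior covariance D_gamma R D_gamma
    return rng.multivariate_normal(np.zeros(len(gamma)), cov)

rng = np.random.default_rng(0)
n = 5
gamma = np.array([0, 0, 1, 0, 0])        # observation 3 flagged a priori
tau = np.full(n, 0.5)
c = np.full(n, 10.0)
m = draw_m_given_gamma(gamma, tau, c, np.eye(n), rng)
print(m)   # the flagged component tends to be much larger in magnitude
```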

The choice of f(γ) should incorporate any available prior information about which subsets of y_1, ..., y_n are likely to be outliers in the given model. Although this may seem difficult with 2^n possible choices, especially for large n, symmetry considerations may simplify the task. For example, a reasonable choice might take the γ_i's independent with marginal distributions (2.4), so that

f(γ | p_1, ..., p_n) = Π_{i=1}^{n} p_i^{γ_i} (1 − p_i)^{1−γ_i}.   (2.7)

Although (2.7) implies that whether y_i is an outlier is independent of whether y_j is an outlier for all i ≠ j, we have found it to work well in various situations. The uniform or indifference prior f(γ) = 2^{−n} is the special case of (2.7) in which each y_i has an equal chance of being an outlier. Also, the prior of β = (β_1, ..., β_p) is assumed to be a normal distribution with mean vector μ = (μ_1, ..., μ_p) and variance-covariance matrix Ψ^{−1}; that is,

β ∼ N(μ, Ψ^{−1}).   (2.8)

Finally, we use the conjugate inverse gamma prior

σ² ∼ IG(ν/2, νλ/2).   (2.9)

In this procedure, our main purpose in embedding the normal linear model (1.1) in the hierarchical mixture model is to obtain the marginal posterior distribution f(γ | Y) ∝ f(Y | γ) f(γ), which contains the information relevant to outliers. As mentioned before, f(γ) may be interpreted as the statistician's prior probability that the y_i's corresponding to the non-zero components of γ are outliers in the given model. The posterior density f(γ | Y) updates these prior probabilities on each of the 2^n possible values of γ. Identifying each γ with a subset of the data (γ_i = 1 if and only if y_i is an outlier), the values of γ with higher posterior probability f(γ | Y) identify the subsets of the data that are most suspicious in light of the data and the statistician's prior information. Therefore f(γ | Y) provides a ranking that can be used to select the most suspicious subset of the data.

2.2. Choice of c_j and τ_j

As discussed in George and McCulloch (1993), the choice of c_j and τ_j should be based on two considerations. First, the choice determines a neighborhood of 0 in which SSVS treats m_j as equivalent to zero. This neighborhood is the interval (−δ_j, δ_j), where −δ_j and δ_j are the intersection points of the densities N(0, τ_j²) and N(0, c_j²τ_j²). Because (−δ_j, δ_j) is the interval on which N(0, τ_j²) dominates N(0, c_j²τ_j²), the posterior probability of γ_j = 0 is more likely to be large when m_j lies in (−δ_j, δ_j). Thus SSVS entails estimating m_j by 0 or not according to whether m_j lies in (−δ_j, δ_j). Second, the choice determines how different the two densities are. If N(0, τ_j²) is too peaked or N(0, c_j²τ_j²) is too spread out, the Gibbs sampler may converge too slowly when the data are not very informative.

Our recommendation is to first choose δ_j. This may be done on practical grounds, such as setting δ_j equal to the largest value of |m_j| for which a plausible change in y_j would make no practical difference. Once δ_j has been chosen, we recommend choosing c_j between 10 and 100. This range for c_j seems to provide a separation between N(0, τ_j²) and N(0, c_j²τ_j²) that is large enough to yield a useful posterior and small enough to avoid Gibbs sampling convergence problems. When such prior information is available, we also recommend choosing c_j so that N(0, c_j²τ_j²) gives reasonable probability to all plausible values of m_j. Finally, τ_j is obtained from δ_j and c_j through

δ_j = [2 log(c_j) c_j² / (c_j² − 1)]^{1/2} τ_j.

Note that δ_j ≈ 2.15 τ_j for c_j = 10 and δ_j ≈ 3.04 τ_j for c_j = 100. For selecting the regression parameters in the linear model, a similar semiautomatic strategy for choosing τ_j and c_j based on statistical significance is described in George and McCulloch (1993).
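The conversion between δ_j, c_j and τ_j is easy to script. The following sketch (Python; not part of the paper; the function name tau_from_delta and the threshold 0.5 are illustrative) assumes the intersection-point relation above and returns the τ_j implied by a chosen practical threshold δ_j and inflation factor c_j.

```python
import numpy as np

def tau_from_delta(delta, c):
    """Return tau_j such that +/- delta_j are the intersection points of
    N(0, tau_j^2) and N(0, c_j^2 tau_j^2), using
    delta_j = tau_j * sqrt(2 * log(c_j) * c_j^2 / (c_j^2 - 1))."""
    factor = np.sqrt(2.0 * np.log(c) * c**2 / (c**2 - 1.0))
    return delta / factor

# Example: a shift of less than 0.5 in y_j is judged practically negligible.
for c in (10.0, 100.0):
    tau = tau_from_delta(0.5, c)
    print(f"c = {c:5.0f}:  tau = {tau:.3f}  (delta/tau = {0.5 / tau:.2f})")
```

For c_j = 10 the ratio δ_j/τ_j is about 2.16, and for c_j = 100 about 3.04, matching the values quoted above up to rounding.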

2.3. Full Conditional Distributions

Densities are denoted generically by brackets, so that joint, conditional, and marginal forms appear, for example, as [X, Y], [X | Y] and [X], respectively. The joint posterior density of β, m, σ², γ given y_1, ..., y_n is

[β, m, σ², γ | y_1, ..., y_n]
∝ (2π)^{−n/2} |σ²I_n|^{−1/2} exp{ −(1/2σ²)(Y − X̃β̃)^t(Y − X̃β̃) }
× (2π)^{−p/2} |Ψ|^{1/2} exp{ −(1/2)(β − μ)^t Ψ (β − μ) }
× (2π)^{−n/2} |D_γ R D_γ|^{−1/2} exp{ −(1/2) m^t (D_γ R D_γ)^{−1} m }
× [(νλ/2)^{ν/2} / Γ(ν/2)] (σ²)^{−(ν/2+1)} exp{ −νλ/(2σ²) }
× Π_{i=1}^{n} p_i^{γ_i} (1 − p_i)^{1−γ_i}.   (2.10)

To run the Gibbs sampler, the following full conditional distributions are needed:

[β | m, σ², γ, y_1, ..., y_n] ∝ exp{ −(1/2)[ σ^{−2}(β̃^t X̃^t X̃ β̃ − 2β̃^t X̃^t Y) + β^t Ψ β − 2μ^t Ψ β ] },   (2.11)

[m | β, σ², γ, y_1, ..., y_n] ∝ exp{ −(1/2)[ σ^{−2}(β̃^t X̃^t X̃ β̃ − 2β̃^t X̃^t Y) + m^t (D_γ R D_γ)^{−1} m ] },   (2.12)

[σ² | β, m, γ, y_1, ..., y_n] = IG( (n + ν)/2, (|Y − X̃β̃|² + νλ)/2 ).   (2.13)

Finally, the vector γ is obtained componentwise by sampling consecutively from the conditional distribution of γ_i,

[γ_i | y_1, ..., y_n, β, m, σ², γ_{(−i)}] = [γ_i | m, σ², γ_{(−i)}],   (2.14)

where γ_{(−i)} = (γ_1, ..., γ_{i−1}, γ_{i+1}, ..., γ_n). The equality in (2.14) holds because, by the hierarchical structure, γ affects Y only through m, and γ is independent of β by the assumptions above. Since [m, γ_i, γ_{(−i)}, σ²] = [m | γ_i, γ_{(−i)}, σ²][σ² | γ_i, γ_{(−i)}][γ_i, γ_{(−i)}],

[γ_i = 1 | y_1, ..., y_n, β, m, σ², γ_{(−i)}] = [m, γ_i = 1, γ_{(−i)}, σ²] / Σ_{k=0}^{1} [m, γ_i = k, γ_{(−i)}, σ²] = a/(a + b),   (2.15)

where a = [m | γ_i = 1, γ_{(−i)}, σ²][σ² | γ_i = 1, γ_{(−i)}][γ_i = 1, γ_{(−i)}] and b = [m | γ_i = 0, γ_{(−i)}, σ²][σ² | γ_i = 0, γ_{(−i)}][γ_i = 0, γ_{(−i)}]. Note that under the prior (2.7) on γ, and when the prior parameters for σ² in (2.9) do not depend on γ, (2.15) is obtained more simply with

a = [m | γ_i = 1, γ_{(−i)}] p_i,   b = [m | γ_i = 0, γ_{(−i)}](1 − p_i).   (2.16)
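For concreteness, the following is a minimal sketch of one sweep of the Gibbs sampler implied by (2.11)-(2.16), written in Python/NumPy. It is not the authors' code: it assumes R = I_n (so that the m_i and γ_i updates factorize), a N(μ, Ψ^{−1}) prior on β (Ψ may be a zero matrix), and the IG(ν/2, νλ/2) prior on σ²; the conjugate normal forms of (2.11) and (2.12) are written out explicitly.

```python
import numpy as np

def normal_pdf(x, sd):
    return np.exp(-0.5 * (x / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)

def gibbs_sweep(y, X, beta, m, sig2, gamma, tau, c, p, mu, Psi, nu, lam, rng):
    """One sweep over the full conditionals (2.11)-(2.16), assuming R = I."""
    n = len(y)
    a_sd = np.where(gamma == 1, c, 1.0) * tau       # prior sd of m_i given gamma_i

    # (2.11) beta | m, sig2, y ~ N(A^{-1} r, A^{-1}) with
    #        A = X'X / sig2 + Psi,  r = X'(y - m)/sig2 + Psi mu
    A = X.T @ X / sig2 + Psi
    r = X.T @ (y - m) / sig2 + Psi @ mu
    cov_beta = np.linalg.inv(A)
    beta = rng.multivariate_normal(cov_beta @ r, cov_beta)

    # (2.12) m_i | beta, sig2, gamma, y  (componentwise because R = I)
    resid = y - X @ beta
    post_var = 1.0 / (1.0 / sig2 + 1.0 / a_sd ** 2)
    post_mean = post_var * resid / sig2
    m = rng.normal(post_mean, np.sqrt(post_var))

    # (2.13) sig2 | beta, m, y ~ IG((n + nu)/2, (|y - X beta - m|^2 + nu*lam)/2)
    shape = 0.5 * (n + nu)
    scale = 0.5 * (np.sum((y - X @ beta - m) ** 2) + nu * lam)
    sig2 = 1.0 / rng.gamma(shape, 1.0 / scale)

    # (2.14)-(2.16) gamma_i | m with a = N(m_i; 0, c_i^2 tau_i^2) p_i,
    #                                b = N(m_i; 0, tau_i^2) (1 - p_i)
    a = normal_pdf(m, c * tau) * p
    b = normal_pdf(m, tau) * (1.0 - p)
    gamma = rng.binomial(1, a / (a + b))

    return beta, m, sig2, gamma
```

Iterating this sweep and recording the sampled γ vectors yields the Monte Carlo approximation to f(γ | Y) that is used below to rank outlier subsets.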

3. Illustrative Examples

3.1. Simulated Data

In this section we illustrate the performance of SSVS on a simulated example. We consider the simple linear regression model

y_i = β_0 + β_1x_i + ε_i,  i = 1, ..., n,   (3.1)

where x_i ∼ N(0, 1) and ε_i ∼ N(0, 1), with β_0 = 0.5; we let m_i = 0 for i ≠ 7 and give m_7 a large non-zero value, so that observation 7 is an outlier in this data set. For notational convenience, let

μ_{β_0} = [Σ_i y_i + σ²(ψ_{00}μ_0 + ψ_{01}μ_1) − β_1(Σ_i x_i + σ²ψ_{01}) − Σ_i m_i] / (n + σ²ψ_{00})

and

μ_{β_1} = [Σ_i x_iy_i + σ²(ψ_{11}μ_1 + ψ_{01}μ_0) − β_0(Σ_i x_i + σ²ψ_{01}) − Σ_i m_ix_i] / (Σ_i x_i² + σ²ψ_{11}),

where μ = (μ_0, μ_1)^t and ψ_{jk} is the (j, k) element of Ψ. Then

[β_0 | β_1, m, σ², γ, y_1, ..., y_n] = N(μ_{β_0}, σ²/(n + σ²ψ_{00})),   (3.2)

[β_1 | β_0, m, σ², γ, y_1, ..., y_n] = N(μ_{β_1}, σ²/(Σ_i x_i² + σ²ψ_{11})).   (3.3)

For i = 1, ..., n,

[m_i | β_0, β_1, m_{(−i)}, σ², γ, y_1, ..., y_n] = N(μ_{m_i}, [σ^{−2} + (a_iτ_i)^{−2}]^{−1}),   (3.4)

where μ_{m_i} = (y_i − β_0 − β_1x_i) / (1 + σ²(a_iτ_i)^{−2}). Also,

[σ² | β, m, γ, y_1, ..., y_n] = IG( n/2, |Y − X̃β̃|²/2 ),   (3.5)

[γ_i = 1 | y_1, ..., y_n, β, m, σ², γ_{(−i)}] = a/(a + b),   (3.6)

where a = [m | γ_i = 1, γ_{(−i)}]p_i and b = [m | γ_i = 0, γ_{(−i)}](1 − p_i).

We applied SSVS to our model with the indifference prior f(γ) = (1/2)^n, τ_1 = ⋯ = τ_n = 0.5, c_1 = ⋯ = c_n = 10, R = I, and ν = 0.
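The "proportion" column reported in the tables below is the relative frequency with which each outlier subset (each distinct value of γ) is visited by the Gibbs sampler, which approximates its posterior probability f(γ | Y). A minimal tabulation sketch (Python; the array gamma_draws, holding one sampled indicator vector per retained iteration, is assumed to come from a run of the sampler):

```python
from collections import Counter
import numpy as np

def subset_proportions(gamma_draws):
    """Tabulate sampled indicator vectors into outlier subsets and
    return their visit frequencies (approximate posterior probabilities)."""
    counts = Counter(
        tuple(int(j) + 1 for j in np.flatnonzero(g))   # 1-based observation labels
        for g in gamma_draws
    )
    total = len(gamma_draws)
    return {subset: count / total for subset, count in counts.most_common()}

# Illustration with a fake chain of 4 draws for n = 10 observations:
draws = np.zeros((4, 10), dtype=int)
draws[0, 6] = draws[1, 6] = 1          # subset {7} visited twice
print(subset_proportions(draws))       # -> {(7,): 0.5, (): 0.5}
```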

Table 3.1.

Outliers        Proportion
none            0.6
·               0.0
3               0.0
4               0.0
6               0.0
7               0.36
10              0.05
1, 7            0.0
2, 7            0.0
4, 7            0.0
5, 7            0.0
7, 9            0.0
7, 10           0.03
9, 10           0.0
1, 2, 5         0.0
·, 6, 7         0.0
·, 4, 6, 10     0.0
·, 7, 8, 9      0.0

In Table 3.1, observation 7 is identified as an outlier, since its posterior probability, 0.36, is the highest among all subsets. We may also consider that there are no outliers in the data set, since the empty subset has the second highest posterior probability.

3.2. Darwin's Data

Consider the analysis of Darwin's data on the differences in heights of self- and cross-fertilized plants, quoted by Box and Tiao (1973, p. 153). The data consist of measurements on 15 pairs of plants; each pair contained a self-fertilized and a cross-fertilized plant grown in the same pot and from the same seed. Arranged for convenience in order of magnitude, the n = 15 observations (differences in height, in eighths of an inch, of the self- and cross-fertilized plants) are:

−67, −48, 6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75.

Guttman, Dutter and Freeman (1978) re-examined Darwin's data to detect outlier(s) using a Bayesian approach with the model

Y = θ1 + ε,   ε ∼ N(0, σ²I_n),   (3.7)

where 1 = (1, ..., 1)^t. Guttman et al. (1978) noted that observations 1 and 2, having values −67 and −48, are identified as spurious observations since they have the highest posterior probability. Table 3.2 shows that observations 1 and 2, having values −67 and −48 respectively, are likewise identified as outliers by our method, since the subset {1, 2} has the highest posterior probability, 0.57. Observations 1, 2 and 7 may also be considered as outliers.

Table 3.2.

Outliers        Proportion
none            0.00
·               0.00
1, 2            0.57
1, 2, 7         0.38
1, 2, 8         0.01
1, 2, 7, 8      0.04

4. Extension to the Linear Mixed Normal Model

In this section we use the mean-shift model in the linear mixed model. Specifically, we assume that for some particular (n × p) matrix X and (n × l) matrix Z of constants (the design matrices), it is intended to generate data Y = (Y_1, ..., Y_n)^t such that

Y = Xβ + Zb + ε,   (4.1)

where β = (β_1, ..., β_p)^t and b = (b_1, ..., b_l)^t. We assume that the (n × 1) error vector ε is normally distributed with mean 0 and variance-covariance matrix σ²I_n, where σ² is unknown, and that the (l × 1) vector b is normal with zero mean vector and variance-covariance matrix σ_b²I_l; b and ε are assumed independent. For detecting the outliers, we consider that the observations y may be generated as follows: for i = 1, ..., n,

y_i = Σ_{j=1}^{p} x_{ij}β_j + Σ_{j=1}^{l} z_{ij}b_j + m_i + ε_i.   (4.2)

Therefore, if m_i = 0 the ith observation is not an outlier; otherwise it is considered an outlier. For detecting outliers we use the SSVS (stochastic search variable selection) method of Section 2. Our model is therefore reexpressed as

Y = X̃β̃ + Zb + ε,   (4.3)

where X̃ = [X, I_n], β̃ = (β_1, ..., β_p, m_1, ..., m_n)^t, and I_n denotes the n × n identity matrix. As before, we use all the forms (2.3)-(2.7) for our model (4.3). Also, the prior of β = (β_1, ..., β_p) is assumed to be a normal distribution with mean vector μ = (μ_1, ..., μ_p) and variance-covariance matrix Ψ^{−1}; that is,

β ∼ N(μ, Ψ^{−1}).   (4.4)

Finally, we use the conjugate inverse gamma priors

σ² ∼ IG(ν/2, νλ/2),   σ_b² ∼ IG(ν_b/2, ν_bλ_b/2).   (4.5)

4.1. Full Conditional Densities

The joint posterior density of β, b, m, σ², γ, σ_b² given y_1, ..., y_n is

[β, b, m, σ², γ, σ_b² | y_1, ..., y_n]
∝ (σ²)^{−n/2} exp{ −(1/2σ²)(Y − X̃β̃ − Zb)^t(Y − X̃β̃ − Zb) }
× |Ψ|^{1/2} exp{ −(1/2)(β − μ)^t Ψ (β − μ) }
× (2π)^{−n/2} |D_γ R D_γ|^{−1/2} exp{ −(1/2) m^t (D_γ R D_γ)^{−1} m }
× (σ²)^{−(ν/2+1)} exp{ −νλ/(2σ²) } × (σ_b²)^{−(ν_b/2+1)} exp{ −ν_bλ_b/(2σ_b²) }
× Π_{i=1}^{n} p_i^{γ_i} (1 − p_i)^{1−γ_i} × (σ_b²)^{−l/2} exp{ −b^t b/(2σ_b²) }.   (4.6)

To run the Gibbs sampler, the following full conditional distributions are needed:

[β | b, m, σ², γ, y_1, ..., y_n] ∝ exp{ −(1/2)[ σ^{−2}(β̃^t X̃^t X̃ β̃ − 2β̃^t X̃^t Y + 2β̃^t X̃^t Zb) + β^t Ψ β − 2μ^t Ψ β ] },   (4.7)

[b | β, m, σ², γ, σ_b², y_1, ..., y_n] ∝ exp{ −(1/(2σ²σ_b²))[ σ_b²(−2Y^t Zb + 2β̃^t X̃^t Zb + b^t Z^t Zb) + σ² b^t b ] },   (4.8)

[m | β, b, σ², γ, σ_b², y_1, ..., y_n] ∝ exp{ −(1/2σ²)(Y − X̃β̃ − Zb)^t(Y − X̃β̃ − Zb) − (1/2) m^t (D_γ R D_γ)^{−1} m },   (4.9)

[σ² | β, b, m, σ_b², γ, y_1, ..., y_n] = IG( (n + ν)/2, (|Y − X̃β̃ − Zb|² + νλ)/2 ),   (4.10)

[σ_b² | β, b, m, σ², γ, y_1, ..., y_n] = IG( (l + ν_b)/2, (b^t b + ν_bλ_b)/2 ).   (4.11)

Finally, the vector γ is sampled as in (2.15) and (2.16).
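Relative to the sketch given after Section 2.3, the mixed model adds only a block update for the random effects b and a draw of σ_b². A minimal Python/NumPy sketch of those two extra steps (again not the authors' code; Z is the random-effects design matrix and nu_b, lam_b are the IG hyperparameters of (4.5)):

```python
import numpy as np

def update_b_and_sig2b(y, X, Z, beta, m, sig2, sig2_b, nu_b, lam_b, rng):
    """Extra Gibbs steps for the linear mixed model, following (4.8) and (4.11),
    with prior b ~ N(0, sig2_b * I_l)."""
    l = Z.shape[1]
    resid = y - X @ beta - m                  # everything except Zb

    # (4.8) b | rest ~ N(A^{-1} Z' resid / sig2, A^{-1}),
    #       A = Z'Z / sig2 + I_l / sig2_b
    A = Z.T @ Z / sig2 + np.eye(l) / sig2_b
    cov_b = np.linalg.inv(A)
    b = rng.multivariate_normal(cov_b @ (Z.T @ resid) / sig2, cov_b)

    # (4.11) sig2_b | b ~ IG((l + nu_b)/2, (b'b + nu_b*lam_b)/2)
    shape = 0.5 * (l + nu_b)
    scale = 0.5 * (b @ b + nu_b * lam_b)
    sig2_b = 1.0 / rng.gamma(shape, 1.0 / scale)

    return b, sig2_b
```

The σ² draw is then as in (4.10), with the residual now Y − X̃β̃ − Zb.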

4.2. Generated Data

In this section we illustrate the performance of SSVS on a simulated example from the linear mixed normal model. We consider the simple model

y_i = β_0 + β_1x_i + b_1z_{1i} + b_2z_{2i} + ε_i,  i = 1, ..., n,   (4.12)

where x_i, z_{1i}, z_{2i} ∼ N(0, 1) and ε_i ∼ N(0, 1), with β_0 = 0.5 and b_i ∼ N(0, 1) for i = 1, 2. We let m_i = 0 for i ≠ 7 and give m_7 a large non-zero value, so that observation 7 is an outlier in this data set. For notational convenience, let

μ_{β_0} = [Σ_i y_i + σ²(ψ_{00}μ_0 + ψ_{01}μ_1) − β_1(Σ_i x_i + σ²ψ_{01}) − b_1Σ_i z_{1i} − b_2Σ_i z_{2i} − Σ_i m_i] / (n + σ²ψ_{00}),

μ_{β_1} = [σ²(ψ_{11}μ_1 + ψ_{01}μ_0) − β_0(Σ_i x_i + σ²ψ_{01}) − (b_1Σ_i x_iz_{1i} + b_2Σ_i x_iz_{2i} + Σ_i m_ix_i − Σ_i x_iy_i)] / (Σ_i x_i² + σ²ψ_{11}),

μ_{b_1} = σ_b²[Σ_i y_iz_{1i} − Σ_i(β_0 + β_1x_i + m_i)z_{1i} − b_2Σ_i z_{1i}z_{2i}] / (σ_b²Σ_i z_{1i}² + σ²),

μ_{b_2} = σ_b²[Σ_i y_iz_{2i} − Σ_i(β_0 + β_1x_i + m_i)z_{2i} − b_1Σ_i z_{1i}z_{2i}] / (σ_b²Σ_i z_{2i}² + σ²),

where ψ_{jk} is the (j, k) element of Ψ and μ = (μ_0, μ_1)^t. Then

[β_0 | β_1, b, m, σ², γ, y_1, ..., y_n] = N(μ_{β_0}, σ²/(n + σ²ψ_{00})),   (4.13)

[β_1 | β_0, b, m, σ², γ, y_1, ..., y_n] = N(μ_{β_1}, σ²/(Σ_i x_i² + σ²ψ_{11})),   (4.14)

[b_1 | b_2, β, m, σ², σ_b², γ, y_1, ..., y_n] = N(μ_{b_1}, σ²σ_b²/(σ_b²Σ_i z_{1i}² + σ²)),   (4.15)

[b_2 | b_1, β, m, σ², σ_b², γ, y_1, ..., y_n] = N(μ_{b_2}, σ²σ_b²/(σ_b²Σ_i z_{2i}² + σ²)).   (4.16)

For i = 1, ..., n,

[m_i | β_0, β_1, b, m_{(−i)}, σ², γ, y_1, ..., y_n] = N(μ_{m_i}, [σ^{−2} + (a_iτ_i)^{−2}]^{−1}),   (4.17)

where μ_{m_i} = (y_i − β_0 − β_1x_i − b_1z_{1i} − b_2z_{2i}) / (1 + σ²(a_iτ_i)^{−2}). Also,

[σ² | β, b, m, γ, y_1, ..., y_n] = IG( n/2, |Y − X̃β̃ − Zb|²/2 ),   (4.18)

[γ_i = 1 | y_1, ..., y_n, β, m, σ², γ_{(−i)}] = a/(a + b),   (4.19)

where a = [m | γ_i = 1, γ_{(−i)}]p_i and b = [m | γ_i = 0, γ_{(−i)}](1 − p_i).

We applied SSVS to our model with the indifference prior f(γ) = (1/2)^n, τ_1 = ⋯ = τ_n = 0.5, c_1 = ⋯ = c_n = 10, R = I, ν = 0 and ν_b = 0. Table 4.1 shows that observation 7 is identified as an outlier, since its proportion (which is an approximate posterior probability) is the highest. Also, it may be considered that there are no outliers in the data set.

Table 4.1.

Outliers        Proportion
none            0.3
·               0.09
4               0.03
6               0.0
7               0.35
8               0.07
10              0.0
·, 4            0.0
·, 7            0.06
·, 9            0.0
4, 6            0.0
4, 8            0.0
6, 7            0.0
6, 10           0.0
·, 4, 8         0.0
4, 8, 7         0.0
·, 4, 9, 10     0.0
·, 7, 8, 9      0.0

References

Box, G. E. P. and Tiao, G. C. (1968), A Bayesian Approach to Some Outlier Problems, Biometrika, 55, 119-129.

Box, G. E. P. and Tiao, G. C. (1973), Bayesian Inference in Statistical Analysis, Addison-Wesley Publishing Co., U.S.A.

Chaloner, K. and Brant, R. (1988), A Bayesian Approach to Outlier Detection and Residual Analysis, Biometrika, 75, 651-659.

Diebolt, J. and Robert, C. (1994), Estimation of Finite Mixture Distributions Through Bayesian Sampling, Journal of the Royal Statistical Society, Ser. B, 56, 363-375.

Geisser, S. (1985), On the Predicting of Observables: A Selective Update, in Bayesian Statistics 2, Eds. Bernardo, J. M., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M., 203-230, Amsterdam: North-Holland.

Gelfand, A. and Smith, A. F. M. (1990), Sampling-Based Approaches to Calculating Marginal Densities, Journal of the American Statistical Association, 85, 398-409.

George, E. I. and McCulloch, R. E. (1993), Variable Selection Via Gibbs Sampling, Journal of the American Statistical Association, 88, 881-889.

Guttman, I. (1973), Care and Handling of Univariate or Multivariate Outliers in Detecting Spuriosity - A Bayesian Approach, Technometrics, 15, 4, 723-738.

Guttman, I., Dutter, R. and Freeman, P. R. (1978), Care and Handling of Univariate Outliers in the General Linear Model to Detect Spuriosity - A Bayesian Approach, Technometrics, 20, 2, 187-193.

Guttman, I. and Peña, D. (1993), A Bayesian Look at Diagnostics in the Univariate Linear Model, Statistica Sinica, 3, 367-390.

Johnson, W. and Geisser, S. (1983), A Predictive View of the Detection and Characterization of Influential Observations in Regression Analysis, Journal of the American Statistical Association, 78, 137-144.

Pettit, L. I. and Smith, A. F. M. (1985), Outliers and Influential Observations in Linear Models, in Bayesian Statistics 2, Eds. Bernardo, J. M., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M., 473-494, Amsterdam: North-Holland.

Sharples, L. D. (1990), Identification and Accommodation of Outliers in General Hierarchical Models, Biometrika, 77, 3, 445-453.

Tanner, M. and Wong, W. (1987), The Calculation of Posterior Distributions by Data Augmentation (with discussion), Journal of the American Statistical Association, 82, 528-550.