On the Conditional Distribution of the Multivariate t Distribution

arXiv:1604.00561v1 [math.ST] 2 Apr 2016

Peng Ding*

Abstract

As alternatives to normal distributions, t distributions are widely applied in robust analyses of data with outliers or heavy tails. The properties of the multivariate t distribution are well documented in Kotz and Nadarajah's book, which, however, states a wrong conclusion about the conditional distribution of the multivariate t distribution. Previous literature has recognized that the conditional distribution of the multivariate t distribution also follows a multivariate t distribution. We provide an intuitive proof without directly manipulating the complicated density function of the multivariate t distribution.

Key Words: Bayes' Theorem; Data augmentation; Mahalanobis distance; Normal mixture; Representation.

*Peng Ding is Assistant Professor (Email: pengdingpku@gmail.com), Department of Statistics, University of California, Berkeley, 425 Evans Hall, Berkeley, CA 94720, USA. The author thanks the Editor, Associate Editor, and three reviewers for their helpful comments.

1 Introduction

The conventional version of the multivariate t (MVT) distribution $X \sim t_p(\mu, \Sigma, \nu)$, with location $\mu$, scale matrix $\Sigma$, and degrees of freedom $\nu$, has probability density function

$$f(x) = \frac{\Gamma\{(\nu+p)/2\}}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}\,|\Sigma|^{1/2}} \left\{ 1 + \nu^{-1}(x-\mu)^\top \Sigma^{-1}(x-\mu) \right\}^{-(\nu+p)/2}. \quad (1)$$

We can see from the above that the tail probability of the MVT distribution decays at a polynomial rate, resulting in heavier tails than those of the multivariate normal distribution. Because of this property, the MVT distribution is widely applied in robust data analysis, including linear and nonlinear regressions (Lange et al. 1989; Liu 1994, 2004), linear mixed-effects models (Pinheiro et al. 2001), and sample selection models (Marchenko and Genton 2012; Ding 2014).
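Density (1) is straightforward to evaluate numerically. The sketch below (our illustration, not part of the original paper) computes (1) on the log scale with NumPy and, assuming SciPy 1.6 or later (which ships scipy.stats.multivariate_t), checks the result against the library implementation; the helper name mvt_pdf is ours.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_t  # available in SciPy >= 1.6

def mvt_pdf(x, mu, Sigma, nu):
    """Evaluate density (1) of t_p(mu, Sigma, nu), computed on the log scale."""
    p = len(mu)
    dev = x - mu
    maha = dev @ np.linalg.solve(Sigma, dev)  # (x - mu)' Sigma^{-1} (x - mu)
    log_f = (gammaln((nu + p) / 2) - gammaln(nu / 2)
             - (p / 2) * np.log(nu * np.pi)
             - 0.5 * np.linalg.slogdet(Sigma)[1]
             - ((nu + p) / 2) * np.log1p(maha / nu))
    return np.exp(log_f)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 1.5]])
x = np.array([0.0, 1.0, 2.0])
print(mvt_pdf(x, mu, Sigma, nu=5.0))                       # direct evaluation of (1)
print(multivariate_t(loc=mu, shape=Sigma, df=5.0).pdf(x))  # should agree
```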

In the following discussion, we use $W \sim \chi^2_b/c$ to denote the scaled $\chi^2$ distribution, with density proportional to $w^{b/2-1} e^{-cw/2}$. We exploit the following representation of the MVT distribution:

$$X = \mu + \Sigma^{1/2} Z / \sqrt{q}, \quad (2)$$

where $Z$ follows a $p$-dimensional standard normal distribution, $q \sim \chi^2_\nu/\nu$, and $Z$ is independent of $q$ (Kotz and Nadarajah 2004; Nadarajah and Kotz 2005). The representation differs from the multivariate normal distribution $N_p(\mu, \Sigma)$ only by the random scaling factor $q$. Representation (2) implies $X \mid q \sim N_p(\mu, \Sigma/q)$, i.e., $X$ follows a multivariate normal distribution given the latent variable $q$. If we partition $X$ into two parts, $X_1$ and $X_2$, with dimensions $p_1$ and $p_2$, we obtain the following normal mixture representation conditional on $q$:

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\Big|\, q \sim N_{p_1+p_2}\left\{ \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \Big/ q \right\}, \quad (3)$$

where the location and scale parameters are partitioned corresponding to the partition of $X$. Marginally, we have $X_1 \mid q \sim N_{p_1}(\mu_1, \Sigma_{11}/q)$, and therefore $X_1 \sim t_{p_1}(\mu_1, \Sigma_{11}, \nu)$, a $p_1$-dimensional MVT distribution with degrees of freedom $\nu$.
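Representation (2) also yields an immediate sampler: draw $Z$ and $q$ independently and combine them. A minimal sketch, assuming NumPy; the helper name rmvt is ours. For $\nu > 2$ the covariance of $X$ is $\nu\Sigma/(\nu-2)$, which gives a quick sanity check.

```python
import numpy as np

def rmvt(n, mu, Sigma, nu, rng):
    """Draw n samples of t_p(mu, Sigma, nu) via representation (2):
    X = mu + Sigma^{1/2} Z / sqrt(q), Z ~ N_p(0, I), q ~ chi^2_nu / nu."""
    p = len(mu)
    L = np.linalg.cholesky(Sigma)        # a matrix square root of Sigma
    Z = rng.standard_normal((n, p))
    q = rng.chisquare(nu, size=n) / nu   # scaled chi-square, independent of Z
    return mu + (Z @ L.T) / np.sqrt(q)[:, None]

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
nu = 5.0
X = rmvt(200_000, mu, Sigma, nu, rng)
print(np.cov(X.T))            # for nu > 2, Cov(X) = nu / (nu - 2) * Sigma
print(nu / (nu - 2) * Sigma)
```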

Although we can obtain the marginal distribution in this obvious way, the conditional distribution of $X_2$ given $X_1$ is less transparent. In fact, the conditional distribution of the MVT distribution is also an MVT distribution, but with degrees of freedom different from those of the original distribution.

2 Conditional Distribution

Kotz and Nadarajah (2004, page 17) and Nadarajah and Kotz (2005) claimed that the conditional distribution of the MVT distribution is not an MVT distribution except in some extreme cases. For the case with $\mu = 0$, define $\bar{x}_2 = \Sigma_{21}\Sigma_{11}^{-1} x_1$, and define $\bar\Sigma_{22} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ as the Schur complement of the block $\Sigma_{11}$ in the matrix $\Sigma$. They derived the conditional density of $X_2$ given $X_1$ by calculating $f(x)/f_1(x_1)$ using the probability density function in (1). Ignoring the normalizing constant in formula (15) of Nadarajah and Kotz (2005), we present only the key term:

$$f_{2|1}(x_2 \mid x_1) \propto \left\{ 1 + \frac{\nu + p_1}{\nu + x_1^\top \Sigma_{11}^{-1} x_1} \cdot \frac{(x_2 - \bar{x}_2)^\top \bar\Sigma_{22}^{-1} (x_2 - \bar{x}_2)}{\nu + p_1} \right\}^{-(\nu + p_1 + p_2)/2}. \quad (4)$$

Kotz and Nadarajah (2004, page 17) and Nadarajah and Kotz (2005) obtained the correct conditional density function, but due to the complex form of their formula (15), they did not recognize that the key term in (4) is essentially the unnormalized probability density function of a $p_2$-dimensional MVT distribution with location $\Sigma_{21}\Sigma_{11}^{-1} x_1$, scale matrix $\{(\nu + x_1^\top \Sigma_{11}^{-1} x_1)/(\nu + p_1)\}\,\bar\Sigma_{22}$, and degrees of freedom $\nu + p_1$. In the literature, some authors stated the right conclusion about the conditional distribution without providing a proof (e.g., Liu 1994, page 15; DeGroot 2005, page 61), and other authors re-derived the correct conditional density function in (4) and pointed out that it is another MVT distribution (e.g., Roth 2013b). Instead of directly calculating the conditional density or invoking the general theory of elliptically contoured distributions (Cambanis et al. 1981; Fang et al. 1990, Theorems 2.8 and 3.8; Kibria and Joarder 2006), we provide an elementary and thus more transparent proof based on the normal mixture representation in (3).

3 Proof via Representation

Our proof proceeds in two steps: we first condition on $q$, and then average over $q$.

First, we condition on both $X_1$ and $q$. By the standard conditioning property of the multivariate normal distribution, $X_2$ is multivariate normal given $(X_1, q)$:

$$X_2 \mid (X_1, q) \sim N_{p_2}\left[ \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_1),\; \{\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\}/q \right] = N_{p_2}(\bar\mu_2, \bar\Sigma_{22}/q),$$

where $\bar\mu_2 = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_1)$ is the linear regression of $X_2$ on $X_1$.
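This first step is ordinary Gaussian conditioning, so the only bookkeeping is the partitioned regression coefficient and the Schur complement. A small sketch of that bookkeeping, assuming NumPy; the helper name conditional_moments is ours.

```python
import numpy as np

def conditional_moments(mu, Sigma, x1, p1):
    """Partition X = (X1, X2) with dim(X1) = p1 and return the regression mean
    mu2_bar = mu2 + Sigma21 Sigma11^{-1} (x1 - mu1) and the Schur complement
    Sigma22_bar = Sigma22 - Sigma21 Sigma11^{-1} Sigma12."""
    mu1, mu2 = mu[:p1], mu[p1:]
    S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
    S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]
    A = S21 @ np.linalg.inv(S11)      # regression coefficient of X2 on X1
    mu2_bar = mu2 + A @ (x1 - mu1)
    Sigma22_bar = S22 - A @ S12
    return mu2_bar, Sigma22_bar

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 1.5]])
print(conditional_moments(mu, Sigma, x1=np.array([0.0, 3.0]), p1=2))
```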

Second, we obtain the conditional distribution of $q$ given $X_1$. By Bayes' Theorem, the conditional probability density function of $q$ given $X_1$ satisfies

$$f(q \mid X_1) \propto f(X_1 \mid q)\,\pi(q) \propto |\Sigma_{11}/q|^{-1/2} \exp\left\{ -\frac{q}{2}(X_1 - \mu_1)^\top \Sigma_{11}^{-1}(X_1 - \mu_1) \right\} \times q^{\nu/2 - 1} e^{-\nu q/2} \propto q^{(\nu + p_1)/2 - 1} \exp\left\{ -\frac{q}{2}(\nu + d_1) \right\},$$

where $d_1 = (X_1 - \mu_1)^\top \Sigma_{11}^{-1}(X_1 - \mu_1)$ is the squared Mahalanobis distance of $X_1$ from $\mu_1$ with scale matrix $\Sigma_{11}$. Therefore, recognizing the density of the scaled $\chi^2$ distribution, the conditional distribution of $q$ given $X_1$ is $q \mid X_1 \sim \chi^2_{\nu + p_1}/(\nu + d_1)$.

Finally, we can represent the conditional distribution of $X_2$ given $X_1$ as

$$X_2 \mid X_1 \sim \bar\mu_2 + \bar\Sigma_{22}^{1/2} Z_2 \big/ \sqrt{q \mid X_1} \sim \bar\mu_2 + \bar\Sigma_{22}^{1/2} Z_2 \Big/ \sqrt{\frac{\chi^2_{\nu+p_1}}{\nu + d_1}} = \bar\mu_2 + \left(\frac{\nu + d_1}{\nu + p_1}\right)^{1/2} \bar\Sigma_{22}^{1/2} Z_2 \Big/ \sqrt{\frac{\chi^2_{\nu+p_1}}{\nu + p_1}}, \quad (5)$$

where $Z_2$ is a $p_2$-dimensional standard multivariate normal vector, independent of $\chi^2_{\nu+p_1}/(\nu + p_1)$. From the normal mixture representation of the MVT distribution in (2), we can read off the following conditional distribution.

Conclusion One: The conditional distribution of $X_2$ given $X_1$ is

$$X_2 \mid X_1 \sim t_{p_2}\left( \bar\mu_2,\; \frac{\nu + d_1}{\nu + p_1}\,\bar\Sigma_{22},\; \nu + p_1 \right).$$

The conditional distribution of the multivariate t distribution is thus very similar to that of the multivariate normal distribution. The conditional location parameter is the linear regression of $X_2$ on $X_1$. The conditional scale matrix is $\bar\Sigma_{22}$ inflated or deflated by the factor $(\nu + d_1)/(\nu + p_1)$. Because $d_1/p_1 \sim F(p_1, \nu)$, the mean of $d_1$ is $p_1\nu/(\nu - 2)$ when $\nu > 2$ and infinite when $\nu \le 2$ (Roth 2013a, page 83), so on average the factor $(\nu + d_1)/(\nu + p_1)$ is larger than one. With more extreme values of $X_1$, the conditional distribution of $X_2$ becomes more disperse. More interestingly, the conditional degrees of freedom increase to $\nu + p_1$: the more dimensions we condition on, the lighter the tails become.
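Conclusion One is easy to check by simulation, because the two-step proof and the stated t distribution give two independent routes to draws from $X_2 \mid X_1$. A sketch in the scalar case $p_1 = p_2 = 1$, assuming NumPy; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
nu = 4.0
mu1, mu2 = 1.0, -2.0                      # p1 = p2 = 1 for a scalar illustration
S11, S12, S22 = 2.0, 0.6, 1.0             # partitioned scale "matrix"
x1 = 3.0                                  # value we condition on

mu2_bar = mu2 + S12 / S11 * (x1 - mu1)    # linear regression of X2 on X1
S22_bar = S22 - S12 ** 2 / S11            # Schur complement
d1 = (x1 - mu1) ** 2 / S11                # squared Mahalanobis distance

n = 500_000
# Route 1: Bayes step q | X1 ~ chi^2_{nu+p1} / (nu + d1), then X2 | X1, q normal.
q = rng.chisquare(nu + 1, size=n) / (nu + d1)
draws_mixture = mu2_bar + np.sqrt(S22_bar / q) * rng.standard_normal(n)
# Route 2: Conclusion One, X2 | X1 ~ t(mu2_bar, (nu+d1)/(nu+p1) * S22_bar, nu+p1).
scale = np.sqrt((nu + d1) / (nu + 1) * S22_bar)
draws_t = mu2_bar + scale * rng.standard_t(nu + 1, size=n)

probs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(draws_mixture, probs))  # the two quantile vectors should match
print(np.quantile(draws_t, probs))
```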

When $\nu = 1$, the MVT distribution is also called the multivariate Cauchy distribution. Its marginal distributions are still Cauchy, but its conditional distributions are MVT rather than Cauchy. Although the marginal means of the multivariate Cauchy distribution do not exist in any dimension, all of its conditional means exist, because the conditional distributions of the multivariate Cauchy distribution are MVT distributions with degrees of freedom at least as large as two.

As a byproduct, we can easily obtain from representation (5) that

$$\sqrt{\frac{\nu + p_1}{\nu + d_1}}\,(X_2 - \bar\mu_2) \,\Big|\, X_1 \sim t_{p_2}(0, \bar\Sigma_{22}, \nu + p_1),$$

which does not depend on $X_1$. This result further implies the following conclusion.

Conclusion Two: $X_1$ and $\sqrt{(\nu + p_1)/(\nu + d_1)}\,(X_2 - \bar\mu_2)$ are independent.

It is straightforward to show that $X_1$ and $X_2 - \bar\mu_2$ are uncorrelated for both multivariate normal and MVT distributions. If $X$ follows the multivariate normal distribution, this also implies independence between $X_1$ and $X_2 - \bar\mu_2$. If $X$ follows the MVT distribution, we need only adjust the linear regression residual $X_2 - \bar\mu_2$ by the factor $\sqrt{(\nu + p_1)/(\nu + d_1)}$, and the independence result still holds.
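A small simulation (ours, assuming NumPy, in the scalar case $p_1 = p_2 = 1$ with $\nu = 10$ so that the correlations below exist) illustrates the distinction: the raw residual is uncorrelated with $X_1$ yet not independent of it, while the adjusted residual is independent.

```python
import numpy as np

rng = np.random.default_rng(2)
nu, n = 10.0, 400_000                  # nu > 4 so the moments used below are finite
S11, S12, S22 = 1.0, 0.5, 1.0          # bivariate scale matrix, mu = 0, p1 = p2 = 1

# Draw (X1, X2) jointly via representation (2).
q = rng.chisquare(nu, size=n) / nu
L = np.linalg.cholesky(np.array([[S11, S12], [S12, S22]]))
X = (rng.standard_normal((n, 2)) @ L.T) / np.sqrt(q)[:, None]
x1, x2 = X[:, 0], X[:, 1]

resid = x2 - S12 / S11 * x1                      # raw regression residual
d1 = x1 ** 2 / S11                               # squared Mahalanobis distance
scaled = np.sqrt((nu + 1) / (nu + d1)) * resid   # adjusted residual, p1 = 1

# Both residuals are uncorrelated with X1, but only the adjusted one is
# independent of it: the squared raw residual still tracks d1.
print(np.corrcoef(d1, resid ** 2)[0, 1])    # clearly positive
print(np.corrcoef(d1, scaled ** 2)[0, 1])   # approximately zero
```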

4 Discussion

Kotz and Nadarajah (2004, page 17) and Nadarajah and Kotz (2005) failed to recognize that the conditional distribution of the MVT distribution is also an MVT distribution, due to the complexity of the conditional density function. Our proof, based on the normal mixture representation of the MVT distribution, offers a more direct and transparent way to reveal this property of the conditional distribution. The representation in Section 3 implicitly exploits the data augmentation formula $f_{2|1}(x_2 \mid x_1) = \int f_{2|1,q}(x_2 \mid x_1, q)\, f_q(q \mid x_1)\, dq$ (Tanner and Wong 1987). To make the proof more intuitive, we used representation (5) to avoid the integral, which is essentially a Monte Carlo version of the data augmentation formula.

Conclusion One could also follow from the general theory of elliptically contoured distributions, which includes the MVT distribution as a special case (Cambanis et al. 1981; Fang et al. 1990). Conditional distributions of elliptically contoured distributions are again elliptically contoured distributions. But this does not immediately guarantee that conditional distributions of MVT distributions are also MVT distributions without some further algebra.

The MVT distribution has the property that both its marginal and its conditional distributions are MVT. It would be interesting to find other sub-classes of elliptically contoured distributions with the same property.

References

Cambanis, S., Huang, S., and Simons, G. (1981). On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11, 368–385.

Ding, P. (2014). Bayesian robust inference of sample selection using selection-t models. Journal of Multivariate Analysis, 124, 451–464.

DeGroot, M. H. (2005). Optimal Statistical Decisions. New York: John Wiley & Sons.

Fang, K. T., Kotz, S., and Ng, K. W. (1990). Symmetric Multivariate and Related Distributions. London: Chapman & Hall.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis, 3rd edition. London: Chapman & Hall/CRC.

Kibria, B. G., and Joarder, A. H. (2006). A short review of multivariate t-distribution. Journal of Statistical Research, 40, 59–72.

Kotz, S. and Nadarajah, S. (2004). Multivariate t Distributions and Their Applications. New York: Cambridge University Press.

Lange, K. L., Little, R. J., and Taylor, J. M. (1989). Robust statistical modeling using the t distribution.

Journal of the American Statistical Association, 84, 881–896.

Liu, C. (1994). Statistical Analysis Using the Multivariate t Distribution. Doctoral dissertation, Harvard University.

Liu, C. (2004). Robit regression: a simple robust alternative to logistic and probit regression. In Gelman, A. and Meng, X.-L. (Eds.), Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives (pp. 227–238). New York: John Wiley & Sons.

Marchenko, Y. V., and Genton, M. G. (2012). A Heckman selection-t model. Journal of the American Statistical Association, 107, 304–317.

Nadarajah, S. and Kotz, S. (2005). Mathematical properties of the multivariate t distribution. Acta Applicandae Mathematicae, 89, 53–84.

Pinheiro, J. C., Liu, C., and Wu, Y. N. (2001). Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. Journal of Computational and Graphical Statistics, 10, 249–276.

Roth, M. (2013a). Kalman Filters for Nonlinear Systems and Heavy-Tailed Noise. Licentiate of Engineering Thesis, Linköping University, Linköping.

Roth, M. (2013b). On the multivariate t distribution. Technical report from Automatic Control at Linköpings Universitet. Report no.: LiTH-ISY-R-3059.

Tanner, M. A., and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–540.