Bayesian inference for the location parameter of a Student-t density
Jean-François Angers
CRM-2642
February 2000
Dép. de mathématiques et de statistique; Université de Montréal; C.P. 6128, Succ. Centre-ville; Montréal, Québec, H3C 3J7; angers@dms.umontreal.ca
This research has been partially funded by NSERC, Canada.
Abstract

Student-t densities play an important role in Bayesian statistics. For example, when the mean of a normal population with unknown variance is to be estimated, the marginal posterior density of the mean is often a Student-t density. In this paper, estimation of the location parameter of a Student-t density is considered when its prior is also a Student-t density. It is shown that the posterior mean and variance can each be written as the ratio of two finite sums when the numbers of degrees of freedom of both the likelihood function and the prior are odd. When one of them (or both) is even, approximations for the posterior mean and variance are given. The behavior of the posterior mean is also investigated in the presence of outlying observations. When robustness is achieved, second order approximations of the estimator and its posterior expected loss are given.

Mathematics Subject Classification: 62C10, 62F15, 62F35.
Keywords: Robust estimator, Fourier transform, Convolution of Student-t densities.
1 Introduction

Heavy-tailed priors play an important role in Bayesian statistics. They can be viewed as an alternative to noninformative priors since they lead to estimators which are insensitive to misspecification of the prior parameters. However, they allow the use of prior information when it is available. Because of its heavier tails, the Student-t density is a robust alternative to the normal density when large observations are expected. (Here, robustness means that the prior information is ignored when it conflicts with the information contained in the data.) This density is also encountered when the data come from a normal population with unknown variance.

In this paper, the problem of estimating the location of a Student-t density, under squared-error loss, is considered. To obtain an estimator which will ignore the prior information when it conflicts with the likelihood information, the prior density proposed in the paper is another Student-t density. However, it has fewer degrees of freedom than the likelihood. Consequently, the prior tails are heavier than those of the likelihood, resulting in an estimator which is insensitive to prior misspecification (cf. O'Hagan, 1979).

This problem has been previously studied by Fan and Berger (1990), Angers and Berger (1991), Angers (1992) and Fan and Berger (1992). However, some conditions have to be imposed on the degrees of freedom in order to obtain an analytic expression for the estimator. A statistical motivation of the importance of this problem can be found in Fan and Berger (1990).

In Section 2 of this paper, it is assumed that the degrees of freedom of both the prior and the likelihood are odd. Using Angers (1996a), an alternative form for the estimator, which is sometimes easier to use (cf. Angers, 1996b), is also proposed in this section. In Section 3, it is shown that the effect of a large observation on the proposed estimator is limited.
In the last section, using Saleh (1994), an approximation is considered for the case where the number of degrees of freedom of the likelihood function is even.

2 Development of the estimator: odd degrees of freedom

Let us consider the following model:

$$X \mid \theta \sim T_{2k+1}(\theta, \sigma), \qquad \theta \sim T_{2\kappa+1}(\mu, \tau),$$

where $\sigma$, $\mu$ and $\tau$ are known and both $k$ and $\kappa$ are in $\mathbb{N}$. The notation $T_m(\eta, \nu)$ denotes the Student-t density with $m$ degrees of freedom and location and scale parameters respectively given by $\eta$ and $\nu$, that is

$$f_m(x \mid \eta, \nu) = \frac{\Gamma([m+1]/2)}{\nu\sqrt{m\pi}\,\Gamma(m/2)} \left(1 + \frac{(x-\eta)^2}{m\nu^2}\right)^{-[m+1]/2}. \tag{1}$$

Since the hyperparameters are assumed to be known, we suppose, without loss of generality, that $\mu = 0$ and $\sigma = 1$. The general case can be obtained by replacing $X$ by $\sigma x + \mu$ and $\theta$ by $\theta + \mu$ in Theorems 2 and 3. In Angers (1996a), the following theorem is proved.
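Theorem 1. If $X \mid \theta \sim g(x - \theta)$ and if $\theta \mid \tau \sim \tau^{-1} h(\theta/\tau)$, then

$$m(x) = \text{marginal density of } X \text{ evaluated at } x = I_0(x), \tag{2}$$

$$\hat{\theta}(x) = \text{posterior expected mean of } \theta = x - i\,\frac{I_1(x)}{I_0(x)}, \tag{3}$$

$$\rho(x) = \text{posterior variance of } \theta = \left(\frac{I_1(x)}{I_0(x)}\right)^2 - \frac{I_2(x)}{I_0(x)}, \tag{4}$$

where $i = \sqrt{-1}$, $I_j(x) = \mathcal{F}^{-1}\{\hat{h}(\tau s)\,\hat{g}^{(j)}(s); x\}$, $\hat{h}(s)$ denotes the Fourier transform of $h(x)$, $\hat{g}^{(j)}(s)$ the $j$th derivative of the Fourier transform of $g(x)$ and $\mathcal{F}^{-1}\{\hat{f}; x\}$ represents the inverse Fourier transform of $\hat{f}$ evaluated at $x$.

Applications of this theorem to several models can be found in Leblanc and Angers (1995) and Angers (1996a, 1996b). In order to compute equations (2), (3) and (4), the Fourier transform of a Student-t density is needed. It is given, along with its first two derivatives, in the following proposition. Since the proof is mostly technical, it is omitted.

Proposition 1. If $X \sim T_m(0, \sigma)$, then

$$\hat{f}_m(s) = \frac{(\sqrt{m}\,\sigma|s|)^{m/2}}{2^{[m-2]/2}\,\Gamma(m/2)}\, K_{m/2}(\sqrt{m}\,\sigma|s|),$$

$$\hat{f}'_m(s) = -\sqrt{m}\,\sigma\,\mathrm{sign}(s)\, \frac{(\sqrt{m}\,\sigma|s|)^{m/2}}{2^{[m-2]/2}\,\Gamma(m/2)}\, K_{[m-2]/2}(\sqrt{m}\,\sigma|s|),$$

$$\hat{f}''_m(s) = m\sigma^2 \hat{f}_m(s) - m(m-1)\sigma^2\, \frac{(\sqrt{m}\,\sigma|s|)^{[m-2]/2}}{2^{[m-2]/2}\,\Gamma(m/2)}\, K_{[m-2]/2}(\sqrt{m}\,\sigma|s|),$$

where $K_{m/2}(s)$ denotes the modified Bessel function of the second kind of order $m/2$.

Note that if $m = 2k + 1$ where $k \in \mathbb{N}$, then, using Gradshteyn and Ryzhik (1980, equation (8.468)), we have that

$$K_{m/2}(\sqrt{m}\,\sigma|s|) = K_{k+1/2}(\sqrt{2k+1}\,\sigma|s|) = \frac{\sqrt{\pi}\; e^{-\sqrt{2k+1}\,\sigma|s|}}{(2\sigma\sqrt{2k+1})^{k+1/2}\,|s|^{k+1/2}} \sum_{p=0}^{k} \frac{(2k-p)!}{p!\,(k-p)!}\,(2\sigma\sqrt{2k+1}\,|s|)^p. \tag{5}$$

To obtain the marginal density of $x$ and the posterior mean and variance of $\theta$, we need to compute $\mathcal{F}^{-1}\{\hat{f}_{2\kappa+1}(\tau s)\,\hat{f}^{(j)}_{2k+1}(s); x\}$ for $j = 0$, $1$ and $2$. Hence, the following two integrals need to be evaluated:

$$A_{k,l}(x) = \int_0^\infty \cos(|x| s)\, s^{k+\kappa-l+1}\, K_{k-l+1/2}(\sqrt{2k+1}\,s)\, K_{\kappa+1/2}(\sqrt{2\kappa+1}\,\tau s)\, ds, \tag{6}$$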
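for $l = 0$ and $1$, and

$$B_k(x) = \int_0^\infty \sin(|x| s)\, s^{k+\kappa+1}\, K_{k-1/2}(\sqrt{2k+1}\,s)\, K_{\kappa+1/2}(\sqrt{2\kappa+1}\,\tau s)\, ds. \tag{7}$$

Using Angers (1997), we can also show that

Theorem 2.

$$m_{2k+1}(x) = \frac{(2k+1)^{[2k+1]/4}\,(2\kappa+1)^{[2\kappa+1]/4}\,\tau^{[2\kappa+1]/2}}{2^{k+\kappa-1}\,\pi\,\Gamma(k+1/2)\,\Gamma(\kappa+1/2)}\, A_{k,0}(x),$$

$$\hat{\theta}_{2k+1}(x) = x - \sqrt{2k+1}\;\mathrm{sign}(x)\,\frac{B_k(x)}{A_{k,0}(x)},$$

$$\rho_{2k+1}(x) = (2k+1)\left\{\frac{2k}{\sqrt{2k+1}}\,\frac{A_{k,1}(x)}{A_{k,0}(x)} - \left[1 + \left(\frac{B_k(x)}{A_{k,0}(x)}\right)^2\right]\right\}.$$

In order to compute equations (6) and (7), we need the following lemma. This lemma can be proven using Gradshteyn and Ryzhik (1980, equations (3.944.5) and (3.944.6)).

Lemma 1.

$$\int_0^\infty s^a \cos(xs)\, e^{-bs}\, ds = \frac{\Gamma(a+1)}{(b^2+x^2)^{[a+1]/2}}\, \cos\!\big([a+1]\tan^{-1}(x/b)\big),$$

$$\int_0^\infty s^a \sin(xs)\, e^{-bs}\, ds = \frac{\Gamma(a+1)}{(b^2+x^2)^{[a+1]/2}}\, \sin\!\big([a+1]\tan^{-1}(x/b)\big).$$

Using Lemma 1 and equation (5), the functions $A_{k,l}(x)$ and $B_k(x)$ can be easily evaluated and they are given in the following theorem.

Theorem 3.

$$A_{k,l}(x) = \frac{\pi}{2^{k-l+\kappa+1}\,(2k+1)^{[2(k-l)+1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}} \sum_{p=0}^{k-l}\sum_{q=0}^{\kappa} \frac{(2[k-l]-p)!}{(k-l-p)!}\,\frac{(2\kappa-q)!}{(\kappa-q)!}\binom{p+q}{q} \frac{2^{p+q}\,(2k+1)^{p/2}\,([2\kappa+1]\tau^2)^{q/2}}{\big([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2\big)^{[p+q+1]/2}} \cos\!\left([p+q+1]\tan^{-1}\!\left[\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right]\right), \tag{8}$$

$$B_k(x) = \frac{\pi}{2^{k+\kappa}\,(2k+1)^{[2k-1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}} \sum_{p=0}^{k-1}\sum_{q=0}^{\kappa} \frac{(2[k-1]-p)!}{(k-1-p)!}\,\frac{(2\kappa-q)!}{(\kappa-q)!}\,(p+q+1)\binom{p+q}{q} \frac{2^{p+q}\,(2k+1)^{p/2}\,([2\kappa+1]\tau^2)^{q/2}}{\big([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2\big)^{[p+q+2]/2}} \sin\!\left([p+q+2]\tan^{-1}\!\left[\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right]\right). \tag{9}$$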
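The finite sums of Theorem 3 are straightforward to evaluate numerically. The following is a minimal sketch, not the author's implementation, of how equations (8) and (9) could be combined with Theorem 2 to obtain the posterior mean and variance when $\mu = 0$, $\sigma = 1$ and $k \geq 1$. The function names `_double_sum` and `posterior_mean_var` are hypothetical, and the factor $(p+q+1)$ in the sine sum comes from $\Gamma(p+q+2)$ in Lemma 1.

```python
import math


def _double_sum(n, k, kappa, tau, x, sine):
    """Finite double sum of Theorem 3 (hypothetical helper).

    sine=False returns A_{k,l}(x) with n = k - l (equation (8));
    sine=True returns B_k(x) with n = k - 1 (equation (9)).
    """
    a = math.sqrt(2 * k + 1)               # sqrt(2k+1)
    b = tau * math.sqrt(2 * kappa + 1)     # tau * sqrt(2*kappa+1)
    c = a + b
    r2 = c * c + x * x
    phi = math.atan(abs(x) / c)
    total = 0.0
    for p in range(n + 1):
        fp = math.factorial(2 * n - p) / math.factorial(n - p)
        for q in range(kappa + 1):
            fq = math.factorial(2 * kappa - q) / math.factorial(kappa - q)
            w = fp * fq * math.comb(p + q, q) * (2 * a) ** p * (2 * b) ** q
            if sine:
                m = p + q + 2
                w *= (p + q + 1) * math.sin(m * phi)
            else:
                m = p + q + 1
                w *= math.cos(m * phi)
            total += w / r2 ** (m / 2)
    scale = 2 ** (n + kappa + 1) * a ** (n + 0.5) * b ** (kappa + 0.5)
    return math.pi * total / scale


def posterior_mean_var(x, k, kappa, tau):
    """Posterior mean and variance from Theorem 2 (mu = 0, sigma = 1)."""
    if k < 1 or kappa < 0 or tau <= 0:
        raise ValueError("this sketch assumes k >= 1, kappa >= 0, tau > 0")
    a0 = _double_sum(k, k, kappa, tau, x, sine=False)       # A_{k,0}(x)
    a1 = _double_sum(k - 1, k, kappa, tau, x, sine=False)   # A_{k,1}(x)
    bk = _double_sum(k - 1, k, kappa, tau, x, sine=True)    # B_k(x)
    root = math.sqrt(2 * k + 1)
    mean = x - root * math.copysign(1.0, x) * bk / a0
    var = (2 * k + 1) * (2 * k / root * a1 / a0 - 1.0 - (bk / a0) ** 2)
    return mean, var


if __name__ == "__main__":
    # t_3 likelihood (k = 1) with a Cauchy prior (kappa = 0) of scale 1
    print(posterior_mean_var(2.0, 1, 0, 1.0))
```

A call such as `posterior_mean_var(2.0, 1, 0, 1.0)` needs only a handful of arithmetic operations, so the result can be checked cheaply against a direct numerical integration of the posterior density.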
Using Theorems 2 and 3, the posterior expected mean and the posterior variance can each be computed as a ratio of two finite sums. In Section 4, the case where the likelihood function is a Student-t density with an even number of degrees of freedom is considered. In this situation, the posterior quantities cannot be written using finite sums, although they can be expressed as the ratio of two infinite series (cf. Angers, 1997). However, using an approximation for the Student-t density (cf. Saleh, 1994), $\hat{\theta}_{2k}(x)$ and $\rho_{2k}(x)$ can be approximated accurately. Before doing so, we first discuss two limit cases, that is, when $|x|$ is large and when $\tau \to \infty$.

3 Special cases

The main advantage of using a heavy-tailed prior is that the resulting Bayes estimator, under the squared-error loss, is insensitive to the choice of prior when there is a conflict between the prior and the likelihood information. This situation is considered in the next subsection.

3.1 Behavior of $\hat{\theta}_{2k+1}(x)$ for large $|x|$

In order to study the behavior of $\hat{\theta}_{2k+1}(x)$ for large values of $|x|$, it should first be noted that

$$\tan^{-1}\!\left(\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right) = \cos^{-1}\!\left(\frac{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}{\sqrt{[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2}}\right) = \sin^{-1}\!\left(\frac{|x|}{\sqrt{[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2}}\right).$$

Using these last equalities in equations (8) and (9), the following theorem can be proven.

Theorem 4.

$$A_{k,l}(x) = \frac{\pi}{2^{k-l+\kappa+1}\,(2k+1)^{[2(k-l)+1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}} \left[\frac{c_l}{\big([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2\big)^2} + O(|x|^{-6})\right],$$

$$B_k(x) = \frac{\pi}{2^{k+\kappa}\,(2k+1)^{[2k-1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}} \left[\frac{4c_1|x|}{\big([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2\big)^3} + O(|x|^{-6})\right],$$

where

$$c_l = \frac{(2[k-l])!\,(2\kappa)!}{(k-l)!\,\kappa!}\,[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}] \left\{2[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2 - 3\left(\frac{2[\kappa-1]}{2\kappa-1}(2\kappa+1)\tau^2 I_2(\kappa) + 2\tau\sqrt{2k+1}\sqrt{2\kappa+1}\,I_1(\kappa)I_1(k-l) + \frac{2[k-l-1]}{2[k-l]-1}(2k+1)I_2(k-l)\right)\right\},$$

$$I_a(b) = \begin{cases} 1 & \text{if } b \in \{a, a+1, \ldots\}, \\ 0 & \text{otherwise.} \end{cases}$$
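Using this theorem, it can be shown that if $|x| \to \infty$, then

$$\hat{\theta}_{2k+1}(x) = \left(1 - \frac{8(2k+1)\,c_1}{c_0\,\big([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2\big)}\right)x + O(|x|^{-2}),$$

$$\rho_{2k+1}(x) = (2k+1)\left(4k\,\frac{c_1}{c_0} - 1\right) + O(|x|^{-2}).$$

Note that, as expected, $\hat{\theta}_{2k+1}(x)$ collapses to $x$ when a conflict occurs between the prior and the likelihood information.

3.2 Behavior of $\hat{\theta}_{2k+1}(x)$ for large $\tau$

If the prior scale parameter is large, the resulting Bayes estimator should be close to the one obtained using a uniform prior on $\theta$ (i.e., $\pi(\theta) \propto 1$). In this subsection, the behavior of $\hat{\theta}_{2k+1}(x)$ and $\rho_{2k+1}(x)$ is considered when $\tau \to \infty$. If $\tau$ is large,

$$\tan^{-1}\!\left(\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right) = \frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}} + O(\tau^{-3}).$$

Consequently,

$$\cos\!\left([p+q+1]\tan^{-1}\!\left[\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right]\right) = 1 + O(\tau^{-2}), \tag{10}$$

$$\sin\!\left([p+q+2]\tan^{-1}\!\left[\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right]\right) = \frac{(p+q+2)\,|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}} + O(\tau^{-3}). \tag{11}$$

Substituting equations (10) and (11) in equations (8) and (9), we obtain the following theorem.

Theorem 5.

$$A_{k,l}(x) = \frac{\pi\,(2[k-l])!}{2^{k-l+\kappa+1}\,(2k+1)^{(2[k-l]+1)/4}\,([2\kappa+1]\tau^2)^{(2\kappa+1)/4}\,(k-l)!} \left[\frac{S_0}{\sqrt{[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2}} + O(\tau^{-2})\right],$$

$$B_k(x) = \frac{\pi\,(2[k-1])!}{2^{k+\kappa}\,(2k+1)^{(2k-1)/4}\,([2\kappa+1]\tau^2)^{(2\kappa+1)/4}\,(k-1)!} \left[\frac{S_1\,|x|}{\tau\sqrt{2\kappa+1}\,\big([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2\big)} + O(\tau^{-4})\right],$$

where

$$S_0 = \sum_{q=0}^{\kappa} \frac{(2\kappa-q)!}{(\kappa-q)!}\,2^q, \qquad S_1 = \sum_{q=0}^{\kappa} \frac{(2\kappa-q)!}{(\kappa-q)!}\,(q+1)(q+2)\,2^q.$$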
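As an illustration of Theorems 2, 3 and 5, the simplest non-trivial case can be worked out by hand. The following sketch assumes $k = 1$ (a $T_3$ likelihood) and $\kappa = 0$ (a Cauchy prior), and uses the shorthand $c = \sqrt{3} + \tau$ and $r^2 = c^2 + x^2$, which is introduced here only for this example. Equations (8) and (9) then reduce to a few terms:

```latex
\begin{align*}
A_{1,0}(x) &= \frac{\pi}{4\cdot 3^{3/4}\,\tau^{1/2}}
              \left[\frac{2c}{r^{2}} + \frac{2\sqrt{3}\,(c^{2}-x^{2})}{r^{4}}\right],
&
B_{1}(x) &= \frac{\pi}{2\cdot 3^{1/4}\,\tau^{1/2}}\;\frac{2c\,|x|}{r^{4}},
\\[4pt]
\hat{\theta}_{3}(x) &= x - \sqrt{3}\,\mathrm{sign}(x)\,\frac{B_{1}(x)}{A_{1,0}(x)}
 = x - \frac{6(\sqrt{3}+\tau)\,x}{(\sqrt{3}+\tau)^{2}(2\sqrt{3}+\tau) + \tau x^{2}} .
\end{align*}
% Large tau: S_0 = 1 and S_1 = 2 when kappa = 0, and
\begin{equation*}
\hat{\theta}_{3}(x) = \left(1 - \frac{6}{\tau^{2}}\right)x + O(\tau^{-3}),
\end{equation*}
% which agrees with the ratio of the leading terms of Theorem 5.
```

In this special case the estimator shrinks $x$ toward the prior location $0$, and the amount of shrinkage vanishes both as $\tau \to \infty$ and as $|x| \to \infty$.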
Using the previous theorem, it can be shown that

$$\hat{\theta}_{2k+1}(x) = \left(1 - \frac{(2k+1)\,S_1}{(2k-1)\,S_0}\;\frac{1}{\tau\sqrt{2\kappa+1}\,\sqrt{(\sqrt{2k+1}+\tau\sqrt{2\kappa+1})^2+x^2}}\right)x + O(\tau^{-3}),$$

$$\rho_{2k+1}(x) = \frac{2k+1}{2k-1} + O(\tau^{-1}).$$

Hence, $\hat{\theta}_{2k+1}(x)$ has the desired behavior. In the next section, approximations for $\hat{\theta}_{2k}(x)$ and $\rho_{2k}(x)$ are given.

4 Even number of degrees of freedom for the likelihood function

In Saleh (1994), it is shown that

$$f_{2k}(x) = \frac{2k-1}{4k}\,f_{2k-1}(x) + \frac{2k+1}{4k}\,f_{2k+1}(x) + e_k(x), \tag{12}$$

where $f_m(x)$ is given by equation (1) with $\eta = 0$ and $\nu = 1$. (Note that other approximations for the Student-t density are discussed in Saleh (1994).) The term $e_k(x)$ represents an error term. Using Mathematica, the maximum of $|e_k(\eta)|$ is approximately given by

$$\max_{\eta \in \mathbb{R}} |e_k(\eta)| \approx 1.58 \times 10^{-5} / k^{6.782}.$$

In Table 1, we computed $\max_\eta |e_k(\eta)|$ for $k = 1$ to $15$ along with the values of $\eta$, denoted by $\eta^*$, where the maximum occurs. It can be seen that the maximum error becomes negligible as $k$ increases.

Table 1: Maximum error for k = 1, 2, ..., 15

k                   1           2           3           4           5
η*                  2.737       2.400       2.299       2.253       2.226
max |e_k(η)|        5.2×10^-6   1.2×10^-7   9.6×10^-9   1.4×10^-9   2.9×10^-10
η*                  3.049       2.55        2.391       2.330       2.294
max |η e_k(η)|      1.5×10^-5   3.0×10^-7   2.2×10^-8   3.2×10^-9   6.6×10^-10

k                   6           7           8           9           10
η*                  2.208       2.195       2.185       2.177       2.172
max |e_k(η)|        8.0×10^-11  2.6×10^-11  9.9×10^-12  4.1×10^-12  1.8×10^-12
η*                  2.272       2.256       2.244       2.234       2.227
max |η e_k(η)|      1.8×10^-10  5.8×10^-11  2.2×10^-11  9.1×10^-12  4.1×10^-12

k                   11          12          13          14          15
η*                  2.167       2.163       2.159       2.156       2.154
max |e_k(η)|        9.2×10^-13  4.8×10^-13  2.6×10^-13  1.5×10^-13  8.7×10^-14
η*                  2.221       2.216       2.212       2.208       2.205
max |η e_k(η)|      2.0×10^-12  1.0×10^-12  5.7×10^-13  3.2×10^-13  1.9×10^-13
Using equation (12), the Fourier transform of $f_{2k}(x)$ given in Proposition 1 can also be approximated by

$$\hat{f}_{2k}(s) \approx \frac{2k-1}{4k}\,\hat{f}_{2k-1}(s) + \frac{2k+1}{4k}\,\hat{f}_{2k+1}(s) = \frac{2k-1}{4k}\;\frac{(\sqrt{2k-1}\,|s|)^{k-1/2}}{2^{k-3/2}\,\Gamma(k-1/2)}\,K_{k-1/2}(\sqrt{2k-1}\,|s|) + \frac{2k+1}{4k}\;\frac{(\sqrt{2k+1}\,|s|)^{k+1/2}}{2^{k-1/2}\,\Gamma(k+1/2)}\,K_{k+1/2}(\sqrt{2k+1}\,|s|).$$

Using this approximation, the following theorem can be proved.

Theorem 6. If $X \mid \theta \sim T_{2k}(\theta, 1)$ and $\theta \sim T_{2\kappa+1}(0, \tau)$, then

$$\hat{\theta}_{2k}(x) \approx w_k(x)\,\hat{\theta}_{2k-1}(x) + (1 - w_k(x))\,\hat{\theta}_{2k+1}(x), \tag{13}$$

$$\rho_{2k}(x) \approx w_k(x)\,\rho_{2k-1}(x) + (1 - w_k(x))\,\rho_{2k+1}(x) + w_k(x)(1 - w_k(x))\big(\hat{\theta}_{2k+1}(x) - \hat{\theta}_{2k-1}(x)\big)^2,$$

where

$$w_k(x) = \frac{(2k-1)^{[2k+7]/4}\,A_{k-1,0}(x)}{(2k-1)^{[2k+7]/4}\,A_{k-1,0}(x) + (2k+1)^{[2k+5]/4}\,A_{k,0}(x)}.$$

In order to see if the approximation given in Theorem 6 is accurate, let $\hat{\theta}_{2k}(x)$ denote the exact Bayes estimator of $\theta$ (cf. Angers, 1997) and $\tilde{\theta}_{2k}(x)$ its approximation using equation (13). Then, it can be shown that

$$\tilde{\theta}_{2k}(x) - \hat{\theta}_{2k}(x) = \frac{E_1(x) - E_0(x)\,\big(x - \tilde{\theta}_{2k}(x)\big)}{m(x) + E_0(x)},$$

where $E_i(x) = \int \eta^i\, e_k(\eta)\, \pi(x - \eta)\, d\eta$ for $i = 0, 1$ and $m(x)$ represents the marginal density of $x$ using the approximation given by equation (12). Using Mathematica, we also tabulated in Table 1 the value of $\max_{\eta \in \mathbb{R}} |\eta\, e_k(\eta)|$ for $k = 1$ to $15$ along with the value of $\eta$, denoted by $\eta^*$, for which the maximum occurs. Fitting a log-log regression model, we obtain that

$$\max_{\eta \in \mathbb{R}} |\eta\, e_k(\eta)| \approx 3.058 \times 10^{-5} / k^{6.862}.$$

The plot of the approximation error (in absolute value), that is $|\hat{\theta}_{2k}(x) - \tilde{\theta}_{2k}(x)|$, is also given in Figure 1 for $2k = 4$, $6$ and $10$, $\kappa = 0$ and $0 \leq x \leq 10$. (Note that $\hat{\theta}_{2k}(x)$ has been computed using equation (3) and the $I_l(x)$ integrals were evaluated using the Monte Carlo integration technique.) The marginal density ($10\,m_3(x)$) is also plotted in Figure 1 to indicate which values of $x$ have a large likelihood. From Figure 1 it can be seen that the maximum error occurs around $x = 5$ and that it decreases as $k$ increases. For small values of $x$ (values for which the marginal of $X$ is maximal), the error does not depend much on $k$. For large values of $x$ the approximation is better for larger $k$.
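The structure of Theorem 6 can be understood as a two-component mixture argument. The following derivation is a sketch under the assumption that the error term $e_k$ in equation (12) is simply dropped. The symbols $\pi_m(\theta \mid x)$, for the posterior density under a $T_m$ likelihood, and $C$, for the factor common to both marginals, are introduced here only for this illustration.

```latex
\begin{align*}
f_{2k}(x-\theta)\,\pi(\theta)
 &\approx \frac{2k-1}{4k}\,m_{2k-1}(x)\,\pi_{2k-1}(\theta \mid x)
        + \frac{2k+1}{4k}\,m_{2k+1}(x)\,\pi_{2k+1}(\theta \mid x),
\\[4pt]
w_k(x) &= \frac{(2k-1)\,m_{2k-1}(x)}{(2k-1)\,m_{2k-1}(x) + (2k+1)\,m_{2k+1}(x)} .
\end{align*}
% Theorem 2 together with Gamma(k+1/2) = (k-1/2) Gamma(k-1/2) gives
\begin{align*}
(2k-1)\,m_{2k-1}(x) &= C\,(2k-1)^{[2k+7]/4}\,A_{k-1,0}(x), &
(2k+1)\,m_{2k+1}(x) &= C\,(2k+1)^{[2k+5]/4}\,A_{k,0}(x),
\\
C &= \frac{(2\kappa+1)^{[2\kappa+1]/4}\,\tau^{[2\kappa+1]/2}}
          {2^{k+\kappa-1}\,\pi\,\Gamma(k+1/2)\,\Gamma(\kappa+1/2)} .
\end{align*}
```

The approximate posterior is therefore a mixture of the two odd-degree posteriors with weights $w_k(x)$ and $1 - w_k(x)$. Equation (13) is the mean of this mixture, and the last term in the expression for $\rho_{2k}(x)$ is the usual between-component contribution to a mixture variance.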
Figure 1: Approximation error for 2k = 4 (top curve), 2k = 6 (middle curve) and 2k = 10 (bottom curve) and κ = 0

5 Conclusion

In this paper, we provide an exact (and closed form) solution for the estimation of a Student-t location parameter when the prior is also a Student-t density and both numbers of degrees of freedom are odd. This estimator is also shown to be insensitive to misspecification of the prior location and scale parameters. It also corresponds to the generalized Bayes estimator (based on $\pi(\theta) \propto 1$) when $\tau$ is large. When the number of degrees of freedom of the likelihood function is even, the previous estimator does not apply. However, based on this estimator, an approximation of $\hat{\theta}_{2k}(x)$ is proposed. This case can be easily generalized to the cases where either the prior or the likelihood, or both, are Student-t densities with an even number of degrees of freedom.

References

[1] Angers, J.-F. (1992). Use of Student-t prior for the estimation of normal means: A computational approach. Bayesian Statistics IV (J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, eds.), Oxford University Press, 567–575.
[2] Angers, J.-F. (1996a). Fourier transform and Bayes estimator of a location parameter. Statistics & Probability Letters 29, 353–359.
[3] Angers, J.-F. (1996b). Protection against outliers using a symmetric stable law prior. In IMS Lecture Notes - Monograph Series, 29, 273–283.
[4] Angers, J.-F. (1997). Bayesian estimator of the location parameter of a Student-t density. Technical Report 97-07, University of Nottingham, Nottingham University Statistics Group.
[5] Angers, J.-F. and J. O. Berger (1991). Robust hierarchical Bayes estimation of exchangeable means. The Canadian Journal of Statistics 19, 39–56.
[6] Fan, T. H. and J. O. Berger (1990). Exact convolution of t distributions, with applications to Bayesian inference for a normal mean with t prior distributions. Journal of Statistical Computation and Simulation 36, 209–228.
[7] Fan, T. H. and J. O. Berger (1992). Behaviour of the posterior distribution and inferences for a normal means with t prior distributions. Statistics & Decisions 10, 99–120.
[8] Gradshteyn, I. S. and I. M. Ryzhik (1980). Table of integrals, series and products. New York: Academic Press.
[9] Leblanc, A. and J.-F. Angers (1995). Fast Fourier transforms and Bayesian estimation of location parameters. Technical Report DMS-380, Université de Montréal, Département de mathématiques et de statistique.
[10] O'Hagan, A. (1979). On outlier rejection phenomena in Bayes inference. Journal of the Royal Statistical Society Ser. B 41, 358–367.
[11] Saleh, A. A. (1994). Approximating the characteristic function of the Student's t distribution. The Egyptian Statistical Journal 39(2), 77–95.