Default Bayesian Analysis of the Skew-Normal Distribution


Brunero Liseo, Università di Roma La Sapienza
Nicola Loperfido, Università di Urbino

August 23, 2002

Corresponding author: Dipartimento di studi geoeconomici, statistici e storici per l'analisi regionale, Università di Roma La Sapienza, Via del Castro Laurenziano, 9 I Roma, Italia; brunero.liseo@uniroma1.it

Abstract

The Skew-Normal (SN, hereafter) class of densities has posed several interesting inferential problems. In particular, the maximum likelihood estimator of the shape parameter $\lambda$ may take infinite values with positive sampling probability. To overcome these problems we propose an objective Bayesian approach, based on reference priors. We show that the reference prior for $\lambda$ is proper when $\lambda$ is the only parameter in the model; also, the reference prior for $\lambda$ is marginally proper when location and scale parameters are added. This fact allows us i) to provide a coherent estimation strategy which can also be justified from a pure frequentist viewpoint, since the reference priors are second-order matching priors, and ii) to consider the regular version of the Bayes factor for testing and model selection purposes.

Recent years have seen a dramatic increase in statistical papers dealing with departures from normality, especially in terms of skewness and tail behaviour. Among the many proposals, the SN class of densities stands out for its important mathematical properties, which have facilitated a large number of generalizations of the family in different directions. Despite this fact, many questions remain unsettled, especially from an inferential perspective. The strange behaviour of the maximum likelihood estimator is not the only problem: the method of moments can also provide point estimates of the parameters which are outside the admissible range; moreover, when location and scale parameters are added to the model, the profile likelihood for $\lambda$ always has a stationary point at $\lambda = 0$, independently of the observed data.

Our default Bayes approach provides a general solution which overcomes these problems, and it can be easily implemented, as we illustrate with simulated and real data. In particular we build a skew-in-mean GARCH model to represent financial time series data, where innovation terms are assumed to follow a SN distribution. We apply this model to the United Kingdom FTSE index. This paper can also be of interest from a pure theoretical perspective: in fact, it provides the first example of a proper Jeffreys (or reference) prior for an unbounded parameter.

Key words: Anomalies of MLE; GARCH models; Jeffreys prior; Objective Bayes inference; Reference prior; Skewness.

1 Introduction

The Skew-Normal (SN, hereafter) class of densities appeared independently several times in the statistical literature [Roberts (1966), O'Hagan and Leonard (1976), Aigner and Lovell (1977)]. It was given its present name by Azzalini (1985) and it was recently generalized to the multivariate case by Azzalini and Dalla Valle (1996) and Azzalini and Capitanio (1999). The SN class of densities extends the Normal model by allowing a shape parameter to account for skewness. The density function of the generic element of the class is

$$f(x; \lambda, \mu, \sigma) = \frac{2}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{x-\mu}{\sigma}\right), \qquad (1)$$

where $\varphi$ and $\Phi$ represent the pdf and the cdf of the standard Normal distribution, respectively, and $\lambda$ is a real parameter. Positive (negative) values of $\lambda$ indicate positive (negative) skewness; when $\lambda = 0$, one gets back to the Normal density. We shall write $X \sim SN(\lambda, \mu, \sigma)$ to denote a r.v. with density (1).

The SN class enjoys remarkable properties in terms of mathematical tractability and it has proved quite useful in modelling real data sets (Azzalini and Capitanio 1999). There is a growing statistical literature which focuses on sampling models able to capture non-Gaussian behaviour of the data: stochastic frontier models (Aigner and Lovell 1977), portfolio selection of financial assets (Adcock and Shutes 1999) and detection of skewness in stock returns [Bartolucci, De Luca and Loperfido (2000); De Luca and Loperfido (2001)] are the most recent examples. In a related work, Branco and Dey (2001) introduced a class of skewed elliptical densities which model non-normality in the tails. The SN class plays a role in a Bayesian context too: it has been considered as a sampling model in Liseo (1990), as a class of prior densities in a hierarchical Gaussian model in O'Hagan and Leonard (1976), and as a prior distribution for wavelet shrinkage analysis in Mukhopadhyay and Vidakovic (1995).
Recently, Liseo and Loperfido (2002) generalised the results of O'Hagan and Leonard to the multivariate case.

Despite the nice properties of the SN family, problems arise in the inferential steps. Since the pioneering paper of Azzalini (1985) it has been known that the estimation of the parameters is not easy. For simplicity consider the standard case ($\sigma = 1$, $\mu = 0$). From (1) one sees that the likelihood function associated with an $n$-dimensional sample is

the product of $n$ cdf's of the standard normal distribution: if all the observed $x_i$'s are positive (negative), then the likelihood function is monotonically increasing (decreasing) and the maximum likelihood estimate of $\lambda$ is (minus) infinity. However, even with positive and negative observations, the maximum likelihood estimator is quite unstable, with a positive probability of obtaining an infinite MLE: as a result, the sampling distribution of the MLE is difficult to work with and it is not clear how to compute the standard error of the estimates. In the general three-parameter case things are complicated by two facts:

i) the Fisher information matrix is singular as $\lambda$ goes to 0;
ii) the profile likelihood function for $\lambda$ has a stationary point at $\lambda = 0$, independently of the observed sample.

Azzalini (1985) addresses the first problem by proposing a different parameterization: see also Chiogna (1998) and Pewsey (2001). The second problem is more serious: as stated in Azzalini and Capitanio (1999), "... there are cases where the likelihood shape and the MLE are problematic. We are not referring here to difficulties with numerical maximization, but to the intrinsic properties of the likelihood function, not removable by change of parameterization. In case of this sort, the behaviour of the MLE appears quite unsatisfactory, and an alternative estimation method is called for. ..."

Likelihood estimation methods are not the only frequentist methods which encounter difficulties with the SN model. The method of moments can give even worse results: see for example Azzalini's website on the SN distribution (…SN/index.html). The fact that the difficulties are intrinsically tied to the likelihood shape suggests calibrating the likelihood with a weight function. In our view the most natural calibration of the experimental information provided by the likelihood is the Bayesian approach, where the prior distribution plays the role of the weight function.
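The monotone behaviour of the likelihood in the standard case can be checked numerically. The following is a minimal sketch, not the authors' code: the sample is hypothetical and chosen so that every observation is positive, and the function names are illustrative. It evaluates the log-likelihood implied by density (1) with $\mu = 0$, $\sigma = 1$ on a grid of $\lambda$ values and compares it with the bound approached as $\lambda \to +\infty$.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # erfc keeps accuracy in the lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def sn_loglik(lam, xs):
    """Log-likelihood of lambda for an i.i.d. sample from density (1) with mu = 0, sigma = 1."""
    total = 0.0
    for x in xs:
        p = norm_cdf(lam * x)
        if p <= 0.0:  # Phi underflowed: the likelihood is numerically zero
            return -math.inf
        total += math.log(2.0 * norm_pdf(x)) + math.log(p)
    return total

# Hypothetical sample in which every observation is positive.
sample = [0.3, 1.2, 0.8, 0.5, 1.9, 0.1, 0.7, 1.1, 0.4, 0.9]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0, 5.0, 10.0, 50.0]
values = [sn_loglik(lam, sample) for lam in grid]

# Bound approached as lambda -> +infinity: every Phi(lambda * x_i) tends to 1.
sup_loglik = sum(math.log(2.0 * norm_pdf(x)) for x in sample)

for lam, v in zip(grid, values):
    print(f"lambda = {lam:6.1f}   log-likelihood = {v:9.4f}")
print(f"bound for lambda -> +inf    = {sup_loglik:9.4f}")
```

On such a sample the log-likelihood keeps increasing along the grid and levels off at a finite value instead of decaying, which is why no finite maximiser exists.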
In this paper we present a noninformative Bayesian analysis of the SN model based on reference priors [Berger and Bernardo (1992), Kass and Wasserman (1996)]. This approach is important for both practical and theoretical reasons; from the practical viewpoint, we provide point estimates of the parameters of the SN model with remarkable

frequentist properties, even for small sample sizes. We also provide a simple testing procedure to verify the hypothesis of symmetry ($\lambda = 0$ vs. $\lambda \neq 0$). Our results can also be of some interest from a more theoretical perspective: in fact, the SN model provides, to our knowledge, the first example of a proper reference prior for an unbounded parameter, namely the skewness parameter $\lambda$. In the standard case ($\mu = 0$, $\sigma = 1$) the reference (or Jeffreys) prior for $\lambda$ has tails of order $O(\lambda^{-3/2})$. This fact automatically calibrates the odd behaviour of the likelihood function and allows for a reliable estimation procedure. In the three-parameter case the joint reference prior for $(\lambda, \mu, \sigma)$, say $q_R(\lambda, \mu, \sigma)$, factorizes into $q_R(\lambda)$ and the usual default prior for location-scale parameters, namely $q_R(\mu, \sigma) \propto 1/\sigma$; again, $q_R(\lambda)$ is marginally proper with tails of order $O(\lambda^{-3/2})$.

The paper is organised as follows: in Section 2 we obtain the reference prior for the SN model in the simplest (standard) case; we also discuss its theoretical properties and provide simulation evidence on the implied estimation procedures. The fact that the default prior is proper allows a direct use of the Bayes factor for testing the nested models

$$H_0: \lambda = 0 \quad \text{vs.} \quad H_1: \lambda \neq 0.$$

This Bayes test can be fruitfully used for checking the normality assumption against skewness in a given data set. Due to the mathematical intractability of the likelihood function and to the fact that the reference prior can be written only in an integral form, both estimation and testing procedures are based on Markov chain Monte Carlo algorithms and numerical integration.

In Section 3 we generalise our result to the general scalar case. Here we provide a closed form for the integrated likelihood function

$$L(\lambda) = \int\!\!\int \prod_{i=1}^{n} f(x_i; \lambda, \mu, \sigma)\, q(\mu, \sigma)\, d\mu\, d\sigma, \qquad (2)$$

when the prior $q(\cdot, \cdot)$ for the location and scale parameters belongs to the conjugate normal-gamma family. Formula (2) can be expressed in terms of the cdf of the multivariate Student t distribution.
Results for the noninformative analysis naturally follow as a particular case. The closed form of the integrated likelihood of the scalar parameter $\lambda$ allows a simple default or subjective analysis of the SN model even in the general case.

This paper can be of some interest both from the methodological and from the applied perspectives. From the methodological viewpoint, it shows an example where likelihood approaches to inference pose problems which an objective and proper Bayes analysis may solve. From the applied viewpoint, it provides a complete inferential strategy for the analysis of possibly skew data sets which is, at the same time, objective (since it is based on a default prior which satisfies some frequentist requisites) and fully Bayesian (since the prior is proper). In the last section we propose the use of the SN class to model the innovation terms of a UK stock return time series. This approach allows us to take into account explicitly the skewness of the returns, as implied by the so-called leverage effect (Peirò 1999).

2 Noninformative analysis of the SN model

In this section we confine ourselves to the standard case, to highlight the inferential difficulties of the model and to show the remarkable properties of a default Bayesian analysis. The general case will be discussed in Section 3. In Section 2.1 we derive the Jeffreys prior for the skewness parameter $\lambda$ and in Section 2.2 we provide simulation results and discuss some possible estimation strategies.

2.1 The Jeffreys prior

Let $X_1, X_2, \ldots, X_n$ be $n$ independent replications of a SN random variable with density function

$$f(z; \lambda) = 2\varphi(z)\Phi(\lambda z). \qquad (3)$$

The Jeffreys prior associated with this model is

$$q_J(\lambda) \propto I^{1/2}(\lambda), \qquad (4)$$

where

$$I(\lambda) = \mathbb{E}_\lambda\!\left[\left(\frac{\partial}{\partial\lambda}\log f(z; \lambda)\right)^{2}\right] = \int 2z^2\varphi(z)\,\frac{\varphi^2(\lambda z)}{\Phi(\lambda z)}\,dz$$

represents the Fisher expected information; $I(\lambda)$ cannot be written in a closed form (Azzalini, 1985). In this section we explore some relevant characteristics of $q_J(\lambda)$.
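Since $I(\lambda)$ has no closed form, it has to be evaluated numerically. The sketch below is one possible way of doing so and is not the authors' implementation: it folds the integral onto $z > 0$ (splitting the integral in (4) at zero gives $I(\lambda) = 2\int_0^\infty z^2\varphi(z)\varphi^2(\lambda z)/[\Phi(\lambda z)(1-\Phi(\lambda z))]\,dz$, so symmetry in $\lambda$ is built in) and applies Simpson's rule. The truncation point and the number of steps are assumptions chosen for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fisher_info(lam, n_steps=2000):
    """Simpson approximation of I(lambda), using the integral folded onto z > 0."""
    a = abs(lam)
    # The integrand is negligible once z > 8 or |lam| * z > 12.
    z_max = 8.0 if a == 0.0 else min(8.0, 12.0 / a)
    h = z_max / n_steps  # n_steps must be even for Simpson's rule

    def integrand(z):
        s = a * z
        upper = 0.5 * math.erfc(s / SQRT2)  # 1 - Phi(s), accurate in the tail
        lower = 1.0 - upper                 # Phi(s) for s >= 0
        return z * z * phi(z) * phi(s) ** 2 / (lower * upper)

    total = integrand(0.0) + integrand(z_max)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return 2.0 * total * h / 3.0

def jeffreys_unnormalised(lam):
    """Unnormalised Jeffreys prior (4): the square root of the Fisher information."""
    return math.sqrt(fisher_info(lam))

for lam in (0.0, 0.5, 1.0, 3.0, 20.0, 40.0):
    print(f"lambda = {lam:5.1f}   I(lambda) = {fisher_info(lam):.6e}")
```

At $\lambda = 0$ the integral can be done by hand and equals $2/\pi$, which gives a quick check of the quadrature. Doubling a large $\lambda$ should divide $I(\lambda)$ by roughly $2^3 = 8$, in line with tails of order $\lambda^{-3}$ for $I(\lambda)$ and $\lambda^{-3/2}$ for $q_J(\lambda)$.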

Theorem 1. i) $q_J(\lambda)$ is symmetric about $\lambda = 0$ and it is decreasing in $|\lambda|$; ii) the tails of $q_J(\lambda)$ are of order $O(\lambda^{-3/2})$.

Proof. See Appendix A.

The Jeffreys prior (4) deserves several comments. First, it is a proper prior although the support of the parameter is unbounded. A possible explanation of this unusual phenomenon is that the Jeffreys prior compensates for the fact that the likelihood function of a $SN(\lambda)$ sample might not vanish in the tails. This also implies a strong disagreement between frequentist and default Bayesian estimation of the parameter $\lambda$, at least for finite sample sizes, which we shall explore in the next subsections. Also, from a practical point of view, the propriety of the Jeffreys prior allows its use in a hypothesis testing context: as a result, the regular Bayes factor for testing the sharp hypothesis $H_0: \lambda = \lambda_0$ versus composite alternative hypotheses can be used in a default sense. It is also important to note that this result cannot be generalized to skew-elliptical distributions. For example, the scalar Skew-Cauchy distribution, introduced by Arnold and Beaver (2000), does not have a finite Fisher information as the shape parameter $\lambda$ tends to zero.

2.2 Simulation results

It has been well known since Welch and Peers (1963) that, under general conditions (which are fulfilled in the one-parameter case) and in the absence of nuisance parameters, the Jeffreys prior is a second-order matching prior; that is, the frequentist coverage of one-sided Bayesian intervals induced by the prior is equal to the nominal value plus a remainder of order $O(n^{-1})$. The agreement between default Bayesian analysis and maximum likelihood estimation is, however, only asymptotic.
In the skew-normal case, for finite sample sizes, the frequentist approaches are more problematic and the efforts to find reasonable estimators for $\lambda$ have been very discouraging: for example, the maximum likelihood estimates are infinite with a non-negligible sampling probability, especially for small sample sizes.

In this section we provide the results of a simulation study concerning the Bayesian procedures implied by the use of the Jeffreys prior. For each true value of the parameter ($\lambda = 0.5, 1, 2, 3, 5$) and for each fixed sample size ($n = 10, 30$), ten thousand independent samples were generated; for each of them we calculate

- the posterior mean of $\lambda$, $\mathbb{E}(\lambda \mid x)$;
- the posterior median $Me(\lambda \mid x)$, the 5th and the 95th quantiles;
- the maximum likelihood estimator $\hat\lambda$;
- the Bayes factor of $\lambda = 0$ vs. $\lambda \neq 0$, $B_0$;
- the Bayes factor of $\lambda = \lambda^*$ vs. $\lambda \neq \lambda^*$, $B_*$, where $\lambda^*$ denotes the true value.

Table 1 contains the summary of our results: for each combination $(\lambda, n)$ we provide

- the frequency of times that $B_0$ incorrectly chooses the value $\lambda = 0$ (Column 2) and the frequency of times that $B_*$ correctly provides strong evidence in favor of the nested hypothesis (Column 3); here we interpret as strong evidence in favor of a hypothesis values of $B_0 < 0.5$ and values of $B_* > 2$;
- the median of the simulated sampling distribution of the posterior median (Column 4);
- the 0.05 and 0.95 frequentist coverage of the one-tailed credible set based on the Jeffreys prior (Columns 5 and 6);
- the mean of the sampling distributions of the posterior mean $\mathbb{E}(\lambda \mid x)$ (Column 7) and of the MLE (Column 8). Note that both these estimators can be infinite with the same sampling probability, reported in Column 9.

Computations were done using simple numerical integration techniques for the posterior mean and simple numerical maximization for the MLE. A Metropolis-Hastings algorithm was used for the computation of the median and the quantiles; the proposal distribution was chosen to be a Cauchy distribution centered at the previous value of the chain, in order to mimic the tails of $q_J(\lambda)$. Computation of the denominator of the Bayes factors involved the calculation of the marginal likelihood of the data: this task is nowadays made simple by the method of Chib and Jeliazkov (2001). Fortran and R codes are available upon request.
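The Metropolis-Hastings step described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the Fortran or R code mentioned in the text: the function names, the proposal scale, the quadrature settings, the burn-in length and the data are all hypothetical. Because the Cauchy random-walk proposal is symmetric, the acceptance ratio reduces to the ratio of unnormalised posterior densities, here the standard-case likelihood times the Jeffreys prior (4).

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def log_jeffreys(lam, n_steps=200):
    """log of the unnormalised Jeffreys prior, 0.5 * log I(lambda), by Simpson's rule."""
    a = abs(lam)
    z_max = 8.0 if a == 0.0 else min(8.0, 12.0 / a)
    h = z_max / n_steps

    def g(z):
        s = a * z
        upper = 0.5 * math.erfc(s / SQRT2)  # 1 - Phi(s)
        return z * z * phi(z) * phi(s) ** 2 / ((1.0 - upper) * upper)

    total = g(0.0) + g(z_max)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return 0.5 * math.log(2.0 * total * h / 3.0)

def log_posterior(lam, xs):
    """Unnormalised log posterior of lambda in the standard case (mu = 0, sigma = 1)."""
    total = log_jeffreys(lam)
    for x in xs:
        p = 0.5 * math.erfc(-lam * x / SQRT2)  # Phi(lam * x)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total

def mh_cauchy(xs, n_iter, scale=1.0, start=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a Cauchy proposal centred at the current value."""
    rng = random.Random(seed)
    lam, lp = start, log_posterior(start, xs)
    chain = []
    for _ in range(n_iter):
        prop = lam + scale * math.tan(math.pi * (rng.random() - 0.5))
        lp_prop = log_posterior(prop, xs)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            lam, lp = prop, lp_prop
        chain.append(lam)
    return chain

# Hypothetical all-positive sample: the MLE of lambda is infinite here.
data = [0.3, 1.2, 0.8, 0.5, 1.9, 0.1, 0.7, 1.1, 0.4, 0.9]
chain = mh_cauchy(data, n_iter=2000)
draws = sorted(chain[500:])  # discard an arbitrary burn-in
median = draws[len(draws) // 2]
print(f"posterior median of lambda (sketch): {median:.3f}")
```

Posterior quantiles are read off the sorted draws in the same way as the median. A production run would use a longer chain and convergence checks, which are omitted here.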

Results of the Bayesian strategies are encouraging, especially when compared with the maximum likelihood estimator. There is always a bias towards smaller values of $\lambda$, although this bias decreases for large $n$. The bad behaviour of the likelihood function is definitely mitigated by the use of the Jeffreys prior; the median appears to be a more precise synthetic index of the posterior distribution. Also, it always exists, while the posterior mean is undefined when the likelihood function is increasing or decreasing.

Table 1. Finite sample size simulations, for n = 10 (first panel) and n = 30 (second panel). Each panel has one row per true value of $\lambda$ and the columns: True value; $B_0 < 0.5$; $B_* > 2$; Median; 5th q.; 95th q.; Mean; MLE; %.

We conclude this section by discussing the choice of point estimates for $\lambda$. Although a Bayesian analysis can be considered complete once the posterior distribution is available, it is common practice to suggest some specific point estimate for each unknown parameter. Statisticians too often bypass this point and uncritically use the posterior mean. In the SN context this choice can be dangerous because the posterior mean of $\lambda$ exists only for those samples which would provide a finite maximum likelihood estimate! The choice is therefore restricted to the posterior median and the posterior mode.

Simulations reported in Table 1 suggest that both of them perform reasonably well and the choice between them should be made in terms of computational convenience. In particular, those who prefer to use Markov chain Monte Carlo methods would find it much easier to compute the posterior median than the posterior mode.

3 The general case

The estimation problems are even worse in the three-parameter case. It has been known since Azzalini (1985) that the maximum likelihood estimator of $\lambda$ converges to the limit distribution at a rate which is slower than the usual $O(n^{-1/2})$ for $\lambda$ close to 0. Consequently, finite sample size performances of the maximum likelihood estimator are not satisfactory and the use of alternative estimators, such as the one based on the method of moments, does not solve the problem. To avoid these difficulties, it is possible to introduce an alternative, interest-respecting parameterization, where the parameter of interest becomes the skewness index $\gamma_1 = \gamma_1(\lambda)$ [see Chiogna (1998) and Pewsey (2001) for details].

From a Bayesian perspective, parameterizations are irrelevant in principle and they can be chosen on computational grounds. Here we shall consider a default Bayes analysis for the general scalar case in the original parameterization, that is the three-parameter model whose generic element is given in (1); $\mu$ and $\sigma$ play the role of location and scale parameters: when $\lambda$ is the actual parameter of interest, they are essentially nuisance parameters and they should be integrated out with respect to some suitable prior distribution in order to work with a marginal model depending only on $\lambda$. We are implicitly assuming that $(\mu, \sigma)$ are a priori independent of $\lambda$; moreover we assume that they follow a normal-gamma distribution, that is

$$\mu \mid \sigma \sim N\!\left(\mu_0;\, \frac{\sigma^2}{\tau}\right), \qquad \sigma^{-2} \sim Ga(\alpha; \beta), \qquad (5)$$

where $Ga(\alpha; \beta)$ represents the usual gamma distribution with positive parameters $\alpha$ and $\beta$.
Note that, as a particular case, one can use the usual default prior for location-scale parameters, that is

$$q_R(\mu, \sigma) \propto \frac{1}{\sigma}. \qquad (6)$$

Incidentally, we note that (6) is actually the conditional reference prior for $(\mu, \sigma)$ given $\lambda$ (Berger, Liseo and Wolpert 1999). The following theorem provides a closed form of the integrated likelihood for the parameter $\lambda$.

Theorem 2. Let $X_1, \ldots, X_n$ be i.i.d. random variables with density $SN(\mu, \sigma, \lambda)$, and assume that $(\mu, \sigma)$ are distributed as in (5). Let $\bar{x}$ and $s^2$ be the sample mean and variance. Then the integrated likelihood (2) has the following form:

$$L(\lambda) \propto F_{V(\Omega)}\!\left(\lambda\sqrt{n + 2\beta}\; v\right),$$

where $F_{V(\Omega)}(\cdot)$ is the cdf of a centered $n$-dimensional t distribution with $n + 2\beta$ degrees of freedom and scale matrix

$$\Omega(\lambda) = I_n + \frac{\lambda^2}{\tau + n}\, 1_n 1_n',$$

and $v = (v_1, v_2, \ldots, v_n)$ has generic element

$$v_i = \frac{x_i - \dfrac{n\bar{x} + \tau\mu_0}{\tau + n}}{\sqrt{2\alpha + n s^2 + \dfrac{\tau n}{\tau + n}\,(\bar{x} - \mu_0)^2}}.$$

Proof. See Appendix B.

The next corollary is trivial but important for practical purposes.

Corollary 2 (Noninformative case). The reference-integrated likelihood is given by

$$L_R(\lambda) \propto F_{V(\Omega_R)}\!\left(\lambda\,\frac{x - \bar{x} 1_n}{s}\right), \qquad (7)$$

where $F_{V(\Omega_R)}$ is the cdf of a centered $n$-dimensional t distribution with $n$ degrees of freedom and scale matrix $\Omega_R(\lambda)$, obtained from $\Omega(\lambda)$ by setting $\tau = 0$.

The behaviour of $L_R(\lambda)$ for large $|\lambda|$ deserves some comments. The following result, whose proof is given in Appendix E, can be shown.

Corollary 3. The tails of $L_R(\lambda)$ are such that

$$\lim_{\lambda \to +\infty} L_R(\lambda) = F_n\!\left(\sqrt{n}\,\frac{x_{(1)} - \bar{x}}{s}\right),$$

$$\lim_{\lambda \to -\infty} L_R(\lambda) = F_n\!\left(\sqrt{n}\,\frac{\bar{x} - x_{(n)}}{s}\right),$$

where $F_n$ denotes the cdf of a univariate Student t distribution with $n$ degrees of freedom, and $x_{(1)}$ and $x_{(n)}$ represent the minimum and the maximum of the sample, respectively.

Because of the internality property of the mean, one can see that $L_R(\lambda)$ does not vanish for large values of $|\lambda|$: this phenomenon makes it difficult, for example, to produce likelihood intervals for $\lambda$; see Example 4.1 below and Berger et al. (1999) for details on this issue. On the other hand this problem need not be serious in practice, especially for large $n$.

A correct default analysis for this problem would be carried out by deriving the Jeffreys prior for the marginal model whose likelihood function is given by (7). Alternatively one can use the reference algorithm approach (Berger and Bernardo 1992) to show that the actual joint reference prior for the three-parameter case is

$$\pi_R(\lambda, \mu, \sigma) = \pi_R(\mu, \sigma)\,\pi_R(\lambda) \propto \frac{1}{\sigma}\, g^{1/2}(\lambda), \qquad (11)$$

where $g(\lambda)$ is a complicated function of the parameter of interest $\lambda$: details are reported in Appendix C. The use of the exact reference prior (11) is obviously cumbersome. However, as in the standard case, the tails of $\pi_R(\lambda)$ are of order $\lambda^{-3/2}$ (see Appendix D): again, the marginal prior is proper and it can be used together with (7) to produce a proper posterior distribution for $\lambda$. We exploit this tail similarity by approximating the actual marginal prior $\pi_R(\lambda)$ with the Jeffreys prior obtained in the standard case.

4 Applications

Example 4.1. We shall first consider the data set available at …. It consists of $n = 50$ simulated data from a $SN(0, 1, 5)$. Data are reproduced in Figure 1.1; they are quite challenging because the maximum likelihood estimate of $\lambda$ is infinite and no other frequentist method seems to work properly. A graphical approach suggests using $\lambda = 8.1$ to fit the data (see Fig. 1.1, reproduced from Azzalini and Capitanio (1999)). Figure 1.2 shows the integrated likelihood for $\lambda$: although the location is not

excellent, most of the problems seem to be solved. The maximum integrated likelihood estimate is $\hat\theta_R = $ …. However, one can do better, using a default Bayesian analysis with prior (11) in order to calibrate $L_R$. Figure 1.3 shows the marginal posterior distribution of $\lambda$. This posterior does not have a finite mean and a reasonable point estimate for $\lambda$ is the median, which is about 2.1 (the same as $\hat\theta_R$). While this value is far from the true one ($\lambda = 5$), it is important to place matters in perspective: in the SN setup, there simply are no methods of objective inference that will be viewed as broadly satisfactory; when a case is made for having a standard default analysis available, the default Bayes approach seems to be the best candidate.

Example 4.2. Detecting skewness in stock return distributions. A major challenge in the analysis of financial time series is the explicit modelling of the skewness of the returns distribution. Various empirical studies (Mills 1995) have shown that returns distributions tend to be negatively skewed; a possible economic explanation is the so-called volatility feedback effect, illustrated in Black (1976). Peirò (1999) discusses the problem in the context of time series analysis, where the dependence among data renders the usual Mardia's test for asymmetry inapplicable. Following De Luca and Loperfido (2001), we propose the use of a GARCH(1,1) model with innovation terms being an i.i.d. sequence of SN random variables. Since arbitrage constraints force the mean of the process to be zero, we use error terms with a general $SN(\mu, \tau, \lambda)$ distribution under the constraint that

$$\mu = \mu(\lambda) = -\sqrt{\frac{2}{\pi}}\,\frac{\lambda\tau}{\sqrt{1 + \lambda^2}}. \qquad (12)$$

In detail, our Bayesian model is given by the following: for $t = 1, \ldots, T$, let

$$y_t = \sigma_t x_t,$$

where the conditional variances $\sigma_t^2 = \alpha_0 + \alpha_1 y_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ follow a GARCH(1,1) structure, and the innovation terms $x_t$ are i.i.d. SN, that is

$$x_t \sim SN(\mu(\lambda), \tau(\lambda), \lambda),$$

where $\mu(\lambda)$ is given by (12); to maintain the actual meaning of variance for $\sigma_t^2$, the scale parameter of $x_t$ is set to be

$$\tau(\lambda) = \left(1 - \frac{2\lambda^2}{\pi(1 + \lambda^2)}\right)^{-1/2}.$$

Note that, while the innovation terms $x_t$ follow a SN distribution, the random variables $y_t$ have an analytically intractable distribution which can only be handled via a Markov chain Monte Carlo approach. Following the approach of Vrontos, Dellaportas and Politis (2000), the prior distributions for the GARCH parameters $(\alpha_0, \alpha_1, \beta_1)$ and the initial value $\sigma_0$ are chosen to be vague: we used $\pi(\alpha_1, \beta_1) = 2$ over the stationary region ($\alpha_1 + \beta_1 < 1$), while $\alpha_0$ and $\sigma_0$ follow a Gamma distribution with hyperparameters equal to 0.01; the prior distribution for $\lambda$ is of course $\pi_J$.

We illustrate the performance of the above model with a series of $T = 639$ observations of the UK FTSE 100 index. Observations are taken at the end of each week from January 93 to June 02; we use our skew GARCH model to represent the log ratios $\log(y_t / y_{t-1})$. Figure 2.1 shows the data set, whose observed skewness is …. We basically generalize the Markov chain Monte Carlo approach of Vrontos et al. (2000) to include skew normal innovations (R codes are available upon request). Figure 2.2 shows the marginal posterior distribution of $\lambda$, with median equal to $\lambda^* = -1.204$, which induces $\mu(\lambda^*) = 0.93$ and $\tau(\lambda^*) = $ …. By exploiting the relations between the direct and centered parametrizations of the SN family (Azzalini and Capitanio 1999), one can see that these values correspond to a skewness index equal to -0.20, close enough to the observed sampling estimate. Note that with the same data, the usual EGARCH model fails to detect the negativity of the skewness.

APPENDICES

Appendix A. Proof of Theorem 1.

Since the square root is a monotonic transformation, it suffices to prove the results for $I(\lambda)$.

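i) The symmetry of $I(\lambda)$ is proved by splitting the integral in two parts:

$$I(\lambda) = 2\int_0^{\infty} z^2\varphi(z)\,\frac{\varphi^2(\lambda z)}{\Phi(\lambda z)}\,dz + 2\int_0^{\infty} z^2\varphi(-z)\,\frac{\varphi^2(-\lambda z)}{1 - \Phi(\lambda z)}\,dz = 2\int_0^{\infty} \frac{z^2\varphi(z)\varphi^2(\lambda z)}{\Phi(\lambda z)\,(1 - \Phi(\lambda z))}\,dz = I(-\lambda).$$

The monotonicity of $q_J(\cdot)$ for $\lambda > 0$ is proved by calculating the first derivative:

$$\frac{\partial}{\partial\lambda} I(\lambda) = -2\int_0^{\infty} \frac{z^3\varphi(z)\varphi^2(\lambda z)}{\Phi^2(\lambda z)\,(1 - \Phi(\lambda z))^2}\,\Big[2\lambda z\,\Phi(\lambda z)(1 - \Phi(\lambda z)) + \varphi(\lambda z)(1 - 2\Phi(\lambda z))\Big]\,dz. \qquad (13)$$

To prove that $\partial I(\lambda)/\partial\lambda$ is negative it suffices to show that, for all $s > 0$,

$$2s\,\Phi(s)\Phi(-s) + \varphi(s)\,(1 - 2\Phi(s)) > 0.$$

Since for all $s > 0$, $s\,\Phi(-s) < \varphi(s)$, the result follows immediately.

ii) From the symmetry of $I(\lambda)$ it is enough to study the right tail only. Let

$$I(\lambda) = A(\lambda) + B(\lambda) = 2\int_0^{\infty} z^2\varphi(z)\,\frac{\varphi^2(\lambda z)}{\Phi(\lambda z)}\,dz + 2\int_0^{\infty} z^2\varphi(z)\,\frac{\varphi^2(\lambda z)}{1 - \Phi(\lambda z)}\,dz.$$

For all $\lambda > 0$ and $z > 0$, $1 < 1/\Phi(\lambda z) < 2$. Then $A^*(\lambda) < A(\lambda) < 2A^*(\lambda)$, where

$$A^*(\lambda) = 2\int_0^{\infty} z^2\varphi(z)\varphi^2(\lambda z)\,dz = \frac{1}{2\pi(1 + 2\lambda^2)^{3/2}};$$

then $A(\lambda) = O(1/\lambda^3)$. Also,

$$B(\lambda) = 2\int_0^{1/\lambda} z^2\varphi(z)\,\frac{\varphi^2(\lambda z)}{1 - \Phi(\lambda z)}\,dz + 2\int_{1/\lambda}^{\infty} z^2\varphi(z)\,\frac{\varphi^2(\lambda z)}{1 - \Phi(\lambda z)}\,dz = B_1(\lambda) + B_2(\lambda).$$

When $0 < z < \lambda^{-1}$, $1/2 < \Phi(\lambda z) < \Phi(1)$; then $2 < (1 - \Phi(\lambda z))^{-1} < 6.30 = c$. It follows that $B_1^*(\lambda) < B_1(\lambda) < (c/2)\,B_1^*(\lambda)$, where

$$B_1^*(\lambda) = 4\int_0^{1/\lambda} z^2\varphi(z)\varphi^2(\lambda z)\,dz = \frac{2}{\pi^{3/2}(1 + 2\lambda^2)^{3/2}}\;\Gamma\!\left(\frac{3}{2},\, \frac{1 + 2\lambda^2}{2\lambda^2}\right),$$

and $\Gamma(a, x)$ denotes the incomplete Gamma function

$$\Gamma(a, x) = \int_0^{x} t^{a-1}\exp\{-t\}\,dt.$$

It follows that $B_1(\lambda) = O(1/\lambda^3)$. Finally, for fixed positive $\lambda$ and $z > \lambda^{-1}$ we use the following inequality (Feller 1971):

$$\frac{1}{1 - \Phi(\lambda z)} \le \frac{\lambda^3 z^3}{(\lambda^2 z^2 - 1)\,\varphi(\lambda z)}.$$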

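Then

$$B_2(\lambda) \le 2\lambda^3\int_{1/\lambda}^{\infty} \frac{z^5}{\lambda^2 z^2 - 1}\,\varphi(z)\varphi(\lambda z)\,dz \le 2\lambda^3\int_{1/\lambda}^{\infty} z^5\varphi(z)\varphi(\lambda z)\,dz < \frac{2}{\sqrt{2\pi}}\,\frac{\lambda^3}{(1 + \lambda^2)^3}\int_0^{\infty} w^5\varphi(w)\,dw = \frac{16}{2\pi}\,\frac{\lambda^3}{(1 + \lambda^2)^3}.$$

It follows that $I(\lambda)$ is the sum of three functions of order $O(1/\lambda^3)$ and, consequently, the tails of $q_J(\lambda)$ are of order $-3/2$.

Appendix B. Proof of Theorem 2.

The sampling distribution is

$$f(x \mid \lambda, \mu, \sigma) = \left(\frac{2}{\pi\sigma^2}\right)^{n/2}\exp\left\{-\frac{n}{2\sigma^2}\left[s^2 + (\bar{x} - \mu)^2\right]\right\}\prod_{i=1}^{n}\Phi\!\left(\lambda\,\frac{x_i - \mu}{\sigma}\right).$$

Integrating out the location parameter $\mu$, one obtains

$$f(x \mid \lambda, \sigma) = \sqrt{\frac{\tau 2^{n-1}}{(\pi\sigma^2)^{n+1}}}\;\exp\left[-\frac{n s^2}{2\sigma^2}\right]\int \exp\left[-\frac{1}{2\sigma^2}\left(\tau(\mu - \mu_0)^2 + n(\bar{x} - \mu)^2\right)\right]\prod_{i=1}^{n}\Phi\!\left(\lambda\,\frac{x_i - \mu}{\sigma}\right)d\mu. \qquad (14)$$

Using a well-known identity (Box and Tiao (1973), p. 418), one gets

$$f(x \mid \lambda, \sigma) = \sqrt{\frac{\tau 2^{n-1}}{(\pi\sigma^2)^{n+1}}}\;\exp\left[-\frac{n}{2\sigma^2}\left(s^2 + \frac{\tau(\bar{x} - \mu_0)^2}{n + \tau}\right)\right]h(\lambda, \sigma), \qquad (15)$$

where

$$h(\lambda, \sigma) = \int \exp\left\{-\frac{\tau + n}{2\sigma^2}(\mu - m)^2\right\}\prod_{i=1}^{n}\Phi\!\left(\lambda\,\frac{x_i - \mu}{\sigma}\right)d\mu,$$

and $m = (n\bar{x} + \tau\mu_0)/(\tau + n)$. A change of variable $y = \sqrt{\tau + n}\,(\mu - m)/\sigma$ allows us to write

$$h(\lambda, \sigma) = \frac{\sigma}{\sqrt{\tau + n}}\int \exp\left\{-\frac{y^2}{2}\right\}\prod_{i=1}^{n}\Phi\!\left[\lambda\left(\frac{x_i - m}{\sigma} - \frac{y}{\sqrt{\tau + n}}\right)\right]dy.$$

Let us define the r.v.'s $Y, Z_1, \ldots, Z_n$ as i.i.d. standard Gaussian random variables; the above equation can be written as

$$h(\lambda; \sigma) = \sqrt{\frac{2\pi\sigma^2}{\tau + n}}\;\mathbb{E}_Y\!\left[P\!\left(\bigcap_{i=1}^{n}\left\{Z_i + \frac{\lambda Y}{\sqrt{\tau + n}} \le \lambda\,\frac{x_i - m}{\sigma}\right\}\;\Big|\;Y\right)\right], \qquad (16)$$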

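where $\mathbb{E}_Y$ denotes the expected value with respect to $Y$. For $i = 1, \ldots, n$, let $U_i = Z_i + \lambda Y/\sqrt{\tau + n}$. Standard multivariate normal theory shows that $U = (U_1, \ldots, U_n)$ has a centered multivariate normal density with covariance matrix

$$\Omega(\lambda) = I_n + \frac{\lambda^2}{\tau + n}\,1_n 1_n'.$$

It follows that $h(\lambda; \sigma)$ is proportional to the joint cumulative distribution function of $U$, evaluated at $\frac{\lambda}{\sigma}(x - m 1_n)$. Using the marginal prior distribution for $\sigma^2$ defined in (5), the integrated likelihood of $\lambda$ is proportional to

$$L(\lambda) \propto \int_0^{\infty} \frac{1}{\sigma^{n+1+2\beta}}\exp\left\{-\frac{r^2}{2\sigma^2}\right\}F_U\!\left(\lambda\,\frac{x - m 1_n}{\sigma}\right)d\sigma, \qquad (17)$$

where

$$r = \sqrt{n s^2 + 2\alpha + \frac{n\tau}{n + \tau}(\bar{x} - \mu_0)^2}.$$

With a new change of variable, $w = r^2/\sigma^2$, one gets

$$L(\lambda) \propto \int_0^{\infty} w^{n/2 + \beta - 1}e^{-w/2}\;\mathrm{Prob}\!\left(U \le \sqrt{w}\,\lambda\,\frac{x - m 1_n}{r}\right)dw \propto \mathbb{E}_W\!\left[\mathrm{Prob}\!\left(U\sqrt{\frac{n + 2\beta}{W}} \le \lambda\sqrt{n + 2\beta}\;\frac{x - m 1_n}{r}\right)\right],$$

where $W \sim \chi^2_{n + 2\beta}$. It follows from standard multivariate normal theory (Mardia, Kent and Bibby (1979), p. 43) that the random vector $V = U\sqrt{n + 2\beta}/\sqrt{W}$ has an $n$-dimensional t distribution with $n + 2\beta$ degrees of freedom and the same scale matrix as $U$. Then

$$L(\lambda) \propto F_{V(\Omega)}\!\left(\lambda\sqrt{n + 2\beta}\;\frac{x - m 1_n}{r}\right), \qquad (18)$$

and the proof is complete.

Appendix C. The reference prior for the three-parameter case.

We follow the notation and the approach of Berger and Bernardo (1992). With rows and columns ordered as $(\lambda, \sigma, \mu)$, the Fisher matrix is

$$I(\lambda, \mu, \sigma) = \{i_{jk}\} = \begin{pmatrix} a_2 & -\lambda a_2/\sigma & \left(\dfrac{b}{(1+\lambda^2)^{3/2}} - \lambda a_1\right)\Big/\sigma \\[2ex] -\lambda a_2/\sigma & (2 + \lambda^2 a_2)/\sigma^2 & \left(b\delta\,\dfrac{1+2\lambda^2}{1+\lambda^2} + \lambda^2 a_1\right)\Big/\sigma^2 \\[2ex] \left(\dfrac{b}{(1+\lambda^2)^{3/2}} - \lambda a_1\right)\Big/\sigma & \left(b\delta\,\dfrac{1+2\lambda^2}{1+\lambda^2} + \lambda^2 a_1\right)\Big/\sigma^2 & (1 + \lambda^2 a_0)/\sigma^2 \end{pmatrix},$$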

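where

$$b = \sqrt{\frac{2}{\pi}}, \qquad \delta = \frac{\lambda}{\sqrt{1 + \lambda^2}},$$

and

$$a_j = a_j(\lambda) = \int_{\mathbb{R}} 2z^j\,\frac{\varphi^2(\lambda z)\varphi(z)}{\Phi(\lambda z)}\,dz, \qquad j = 0, 1, 2.$$

We choose a nested sequence of compact subsets in the parameter space $\Omega$ as follows (note that the final answer does not depend on the particular choice of the sequence):

$$\Omega_1 \subset \Omega_2 \subset \cdots \subset \Omega_l \subset \cdots \subset \Omega,$$

where $\Omega_l = \Lambda_l \times M_l \times \Sigma_l$, with

$$\Lambda_l = \{\lambda : -t_l \le \lambda \le t_l\}, \qquad M_l = \{\mu : -b_l \le \mu \le b_l\}, \qquad \Sigma_l = \{\sigma : 1/c_l \le \sigma \le c_l\};$$

here $\{t_l\}$, $\{b_l\}$, and $\{c_l\}$ are real monotonic sequences diverging to infinity. With respect to $\Omega_l$, it is easy to see that the conditional reference prior for $\mu$ is

$$q_3^l(\mu \mid \lambda, \sigma) = \frac{I_{M_l}(\mu)}{V(M_l)},$$

where $V(A)$ denotes the volume of the set $A$. Also,

$$q_2^l(\mu, \sigma \mid \lambda) = q_3^l(\mu \mid \lambda, \sigma)\;\frac{\exp\left\{\frac{1}{2}\int_{M_l}\log h_2\; q_3^l(\mu \mid \lambda, \sigma)\,d\mu\right\}}{\int_{\Sigma_l}\exp\left\{\frac{1}{2}\int_{M_l}\log h_2\; q_3^l(\mu \mid \lambda, \sigma)\,d\mu\right\}d\sigma},$$

where $h_2$ is (as in Berger and Bernardo (1992)) the lower right corner of the matrix $S_2^{-1}$, $S_2$ is the 2 by 2 upper left corner of the inverse of the Fisher matrix, and

$$h_2 = \frac{1}{\sigma^2}\;\frac{(2 + \lambda^2 a_2)(1 + \lambda^2 a_0) - \left(b\delta\,\frac{1+2\lambda^2}{1+\lambda^2} + \lambda^2 a_1\right)^2}{1 + \lambda^2 a_0} = \frac{1}{\sigma^2}\,\tilde{g}(\lambda).$$

It follows that $q_2^l(\mu, \sigma \mid \lambda) \propto \text{const.}/\sigma$. Finally, defining $h_1$ similarly to $h_2$ (see Berger and Bernardo (1992) for details),

$$h_1 = \frac{i_{11}i_{22}i_{33} + 2 i_{12}i_{13}i_{23} - i_{11}i_{23}^2 - i_{33}i_{12}^2 - i_{22}i_{13}^2}{i_{33}i_{22} - i_{23}^2} = g(\lambda) \qquad (19)$$

and

$$q_1^l(\lambda, \mu, \sigma) = \frac{1}{\sigma}\;\frac{\sqrt{g(\lambda)}}{\int_{\Lambda_l}\sqrt{g(\lambda)}\,d\lambda},$$

and a standard passage to the limit in $l$ provides the result

$$q_R(\lambda, \mu, \sigma) \propto \frac{1}{\sigma}\sqrt{g(\lambda)}.$$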

Appendix D. The tail behaviour of the marginal prior of $\lambda$ in the three-parameter case.

Since $q_R(\lambda, \mu, \sigma) \propto \sqrt{g(\lambda)}/\sigma$, it suffices to study $g(\lambda)$. We have already proved (see Appendix A, part ii) that $a_2(\lambda)$ has tails of order $O(\lambda^{-3})$. Using exactly the same argument it is easy to show that, in general, $a_j(\lambda)$ has tails of order $O(\lambda^{-(j+1)})$, for integer $j \ge 0$. Write

$$c(\lambda) = \frac{b}{(1+\lambda^2)^{3/2}} - \lambda a_1, \qquad d(\lambda) = b\delta\,\frac{1 + 2\lambda^2}{1 + \lambda^2} + \lambda^2 a_1, \qquad D(\lambda) = (2 + \lambda^2 a_2)(1 + \lambda^2 a_0) - d^2(\lambda).$$

Recalling the expression (19) of $g(\lambda)$, one obtains

$$\lim_{\lambda \to \infty}\frac{g(\lambda)}{\lambda^{-3}} = \lim_{\lambda \to \infty}\lambda^3\left[a_2 - \frac{2\lambda a_2\,c(\lambda)\,d(\lambda) + \lambda^2 a_2^2\,(1 + \lambda^2 a_0) + (2 + \lambda^2 a_2)\,c^2(\lambda)}{D(\lambda)}\right] = k,$$

where $k$ is a fixed non-zero constant. A similar argument can be used for the left tail; this concludes the proof.

Appendix E. The tail behaviour of $L_R(\lambda)$.

We prove the proposition for $\lambda \to +\infty$, the case where $\lambda \to -\infty$ being similar. Starting from (7), and assuming $\lambda$ positive, one can alternatively write $L_R(\lambda)$ as

$$L_R(\lambda) \propto F_W\!\left(\sqrt{n}\,\frac{x - \bar{x} 1_n}{s}\right), \qquad W \sim T_n\!\left(0;\; \frac{n}{\lambda^2} I_n + 1_n 1_n'\right). \qquad (20)$$

Then,

$$\lim_{\lambda \to +\infty} L_R(\lambda) \propto F_U\!\left(\sqrt{n}\,\frac{x - \bar{x} 1_n}{s}\right), \qquad U \sim T_n\!\left(0;\; 1_n 1_n'\right). \qquad (21)$$

The rank of the scale matrix of $U$ is 1; also, all the elements of the scale matrix are positive. It follows that, with probability 1,

$$U_1 = U_2 = \cdots = U_n.$$

Then

$$F_U\!\left(\sqrt{n}\,\frac{x - \bar{x} 1_n}{s}\right) = P\!\left(U_1 \le \min_{1 \le i \le n}\sqrt{n}\,\frac{x_i - \bar{x}}{s}\right).$$

Since the distribution of $U$ is multivariate Student with $n$ degrees of freedom, the distribution of $U_1$ is univariate Student with $n$ degrees of freedom and the proof is complete.
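The two limiting values in Corollary 3 are easy to evaluate for a given sample. The sketch below is only an illustration under stated assumptions: it takes the limits exactly as stated in the corollary, uses a hypothetical sample, computes the univariate Student t cdf by Simpson's rule rather than through a statistical library, and takes $s^2$ with divisor $n$, as implied by the form of the sampling distribution in Appendix B.

```python
import math

def t_pdf(t, df):
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(t * t / df))

def t_cdf(t, df, n_steps=2000):
    """Student t cdf: 0.5 plus a Simpson integral of the density over [0, t]."""
    if t == 0.0:
        return 0.5
    h = t / n_steps  # a negative step for negative t gives a signed integral
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * t_pdf(k * h, df)
    return 0.5 + total * h / 3.0

def tail_limits(xs):
    """Limits of the reference-integrated likelihood as lambda -> -inf and +inf."""
    n = len(xs)
    mean = sum(xs) / n
    # Divisor n, so that n * s^2 = sum of squared deviations from the mean.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    right = t_cdf(math.sqrt(n) * (min(xs) - mean) / s, n)
    left = t_cdf(math.sqrt(n) * (mean - max(xs)) / s, n)
    return left, right

# Hypothetical sample.
sample = [0.3, 1.2, 0.8, 0.5, 1.9, 0.1, 0.7, 1.1, 0.4, 0.9]
left, right = tail_limits(sample)
print(f"limit for lambda -> -inf: {left:.3e}")
print(f"limit for lambda -> +inf: {right:.3e}")
```

Both values are strictly positive whenever the sample is not constant, which is the sense in which $L_R(\lambda)$ does not vanish in the tails.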

References

Adcock C. and Shutes K. (1999) Portfolio selection based on the multivariate skewed normal distribution, in Financial Modelling (A. Skulimowski, Editor), Progress and Business Publishers, Krakow.
Aigner D.J. and Lovell C.A.K. (1977) Formulation and estimation of stochastic frontier production function model, Econometrics, 12.
Arnold B.C. and Beaver R.J. (2000) The skew-Cauchy distribution, Statist. Probab. Lett., 49.
Azzalini A. (1985) A class of distributions which includes the normal ones, Scand. J. Statist., 12.
Azzalini A. and Capitanio A. (1999) Statistical applications of the multivariate skew-normal distributions, J. R. Statist. Soc. B, 61.
Azzalini A. and Dalla Valle A. (1996) The multivariate skew-normal distribution, Biometrika, 83.
Bartolucci F., De Luca G. and Loperfido N. (2000) A generalization for skewness in the basic stochastic volatility model, Proceedings of the 15th International Workshop on Statistical Modelling.
Berger J. and Bernardo J. (1992) On the development of reference priors, Bayesian Statistics, 4, Oxford Univ. Press, New York.
Berger J.O., Liseo B. and Wolpert R.L. (1999) Integrating Likelihood Methods for Eliminating Nuisance Parameters, Statistical Science, 14.
Black F. (1976) Studies of stock price volatility changes, Proceedings of the 1976 meeting of Business and Economics Statistics, A.S.A.
Box G. and Tiao R. (1973) Bayesian Inference in Statistical Analysis, Wiley, New York.
Branco M.D. and Dey D. (2001) A general class of multivariate skew-elliptical distributions, Journal of Multivariate Analysis, 77.
Chib S. and Jeliazkov I. (2001) Marginal likelihood from the Metropolis-Hastings output, Journal of the American Statistical Association, 96.
Chiogna M. (1998) Some results on the scalar skew-normal distribution, J. Ital. Statist. Soc., 7.
De Luca G. and Loperfido N. (2001) A skew-in-mean GARCH model for the Italian Stock Exchange Index, Technical Report, Università di Urbino.
Feller W. (1971) An Introduction to Probability Theory, Vol. II, Wiley, New York.
Kass R. and Wasserman L. (1996) Formal Rules for Selecting Prior Distributions: A Review and Annotated Bibliography, Journal of the American Statistical Association, 91.
Liseo B. (1990) La classe delle densità normali sghembe: aspetti inferenziali da un punto di vista bayesiano, Statistica, 50.
Liseo B. and Loperfido N. (2002) A Bayesian interpretation of the multivariate skew normal distribution, Technical Report, Università di Roma.
Mardia K.V., Kent J. and Bibby J. (1979) Multivariate Analysis, Academic Press, New York.
Mills T. (1995) Modelling skewness and kurtosis in the London stock exchange FT-SE index return distributions, Journal of the Royal Statistical Society, D (The Statistician), 44.
Mukhopadhyay S. and Vidakovic B. (1995) Efficiency of linear Bayes rules for a normal mean: skewed prior class, J. R. Statist. Soc. D, 44.
O'Hagan A. and Leonard T. (1976) Bayes estimation subject to uncertainty about parameter constraints, Biometrika, 63.
Peirò A. (1999) Skewness in financial returns, Journal of Banking and Finance, 23.
Pewsey A. (2001) Problems of inference for Azzalini's skew-normal distribution, Journal of Applied Statistics, 27.
Roberts C. (1966) A correlation model useful in the study of twins, Journal of the American Statistical Association, 61.
Vrontos I., Dellaportas P. and Politis D. (2000) Full Bayesian Inference for GARCH and EGARCH Models, Journal of Business and Economic Statistics, 18.
Welch B. and Peers H. (1963) On formulae for confidence points based on integrals of weighted likelihoods, J. Roy. Statist. Soc. B, 25.

Figure 1 (Figure 1.1): Kernel and SN eye density estimates for the frontier data (axes: x, density; curves: kernel estimate, SN eye estimate). Reprinted from JRSS-B, Azzalini and Capitanio (1999): simulated data points (small circles) leading to $\hat\lambda = \infty$, with nonparametric and parametric estimates.

Figure 2 (Figure 1.2): Non-normalized integrated likelihood with the frontier data (axes: λ, L(λ)). Frontier data: integrated likelihood for λ.

Figure 3 (Figure 1.3): Marginal posterior distribution of λ with the frontier data (axes: λ, posterior density). Frontier data: marginal posterior distribution for λ using $\pi^R$.

Figure 4 (Figure 2.1): Time series of UK closure data (axes: t, y_t). The UK FTSE 100 index time series from January 93 to June

Figure 5 (Figure 2.2): Marginal posterior density of λ, UK series (axes: λ, π(λ | y)). The marginal posterior of the skewness parameter for the S-GARCH model with the UK FTSE 100 index time series data.
