Posterior propriety in objective Bayesian extreme value analyses


Please cite: Northrop, P. J. and Attalides, N. (2015) Posterior propriety in Bayesian extreme value analyses using reference priors. Statistica Sinica, doi:10.5705/ss.2014.034.

Posterior propriety in objective Bayesian extreme value analyses

Paul J. Northrop and Nicolas Attalides
University College London

Abstract

The Generalized Pareto (GP) and Generalized Extreme Value (GEV) distributions play an important role in extreme value analyses, as models for threshold excesses and block maxima respectively. For each extreme value distribution we consider objective Bayesian inference using so-called noninformative or default prior distributions for the model parameters, specifically a Jeffreys prior, the maximal data information (MDI) prior and independent uniform priors on separate model parameters. We investigate the important issue of whether these improper priors lead to proper posterior distributions. We show that, in the GP and GEV cases, the MDI prior, unless modified, never yields a proper posterior and that in the GEV case this also applies to the Jeffreys prior. We also show that a sample size of three (four) is sufficient for independent uniform priors to yield a proper posterior distribution in the GP (GEV) case.

1 Introduction

Extreme value theory provides asymptotic justification for particular families of models for extreme data. Let X_1, X_2, ..., X_N be a sequence of independent and identically distributed random variables and let u_N be a threshold, increasing with N. Pickands (1975) showed that if there is a non-degenerate limiting distribution for appropriately linearly rescaled excesses of u_N then this limit is a Generalized Pareto (GP) distribution. In practice, a suitably high threshold u is chosen empirically.
Given that there is an exceedance of u, the excess Z = X − u is modelled by a GP(σ_u, ξ) distribution, with threshold-dependent scale parameter σ_u, shape parameter ξ and distribution function

F_GP(z) = 1 − (1 + ξz/σ_u)_+^{−1/ξ},  ξ ≠ 0;  F_GP(z) = 1 − exp(−z/σ_u),  ξ = 0,   (1)

where z > 0, z_+ = max(z, 0), σ_u > 0 and ξ ∈ R. The use of the generalized extreme value (GEV) distribution (Jenkinson, 1955), with distribution function

F_GEV(y) = exp{−[1 + ξ(y − µ)/σ]_+^{−1/ξ}},  ξ ≠ 0;  F_GEV(y) = exp{−exp[−(y − µ)/σ]},  ξ = 0,   (2)

Research Report number 33, Department of Statistical Science, University College London. Date: January 2014. Address for correspondence: Paul Northrop, Department of Statistical Science, University College London, Gower Street, London WC1E 6BT, UK. E-mail: p.northrop@ucl.ac.uk
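As a numerical sketch of (1) and (2) (the function names below are ours, not part of the original analysis), the two distribution functions can be checked against scipy's parameterizations; note that scipy's genextreme takes c = −ξ, while genpareto takes c = ξ directly.

```python
import numpy as np
from scipy import stats

def gp_cdf(z, sigma_u, xi):
    """GP distribution function (1); the xi = 0 branch is the exponential limit."""
    z = np.asarray(z, dtype=float)
    if xi == 0.0:
        return 1.0 - np.exp(-z / sigma_u)
    t = np.maximum(1.0 + xi * z / sigma_u, 0.0)  # the (.)_+ operation
    return 1.0 - t ** (-1.0 / xi)

def gev_cdf(y, mu, sigma, xi):
    """GEV distribution function (2); the xi = 0 branch is the Gumbel limit."""
    y = np.asarray(y, dtype=float)
    if xi == 0.0:
        return np.exp(-np.exp(-(y - mu) / sigma))
    t = np.maximum(1.0 + xi * (y - mu) / sigma, 0.0)
    return np.exp(-t ** (-1.0 / xi))
```

For ξ < 0 the support is bounded above, which both functions handle through the (.)_+ truncation.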

where σ > 0 and µ, ξ ∈ R, as a model for block maxima is motivated by considering the behaviour of Y = max{X_1, ..., X_b} as b → ∞ (Fisher and Tippett, 1928; Leadbetter et al., 1983).

Commonly-used frequentist methods of inference for extreme value distributions are maximum likelihood estimation (MLE) and probability-weighted moments (PWM). However, conditions on ξ are required for the asymptotic theory on which inferences are based to apply: ξ > −1/2 for MLE (Smith, 1984, 1985) and ξ < 1/2 for PWM (Hosking et al., 1985; Hosking and Wallis, 1987). Alternatively, a Bayesian approach (Coles, 2001; Coles and Powell, 1996; Stephenson and Tawn, 2004) can avoid conditions on the value of ξ and performs predictive inference about future observations naturally and conveniently using Markov chain Monte Carlo (MCMC) output. A distinction can be made between subjective analyses, in which the prior distribution supplies information from an expert (Coles and Tawn, 1996) or more general experience of the quantity under study (Martins and Stedinger, 2000, 2001), and objective analyses (Berger, 2006). In the latter, a prior is constructed using a formal rule, providing an objective prior when no subjective information is to be incorporated into the analysis. Many such formal rules have been proposed: Kass and Wasserman (1996) provide a comprehensive review. In this paper we consider three priors that have been used in extreme value analyses: the Jeffreys prior (Eugenia Castellanos and Cabras, 2007; Beirlant et al., 2004), the maximal data information (MDI) prior (Beirlant et al., 2004), and the uniform prior (Pickands, 1994). These priors are improper, that is, they do not integrate to a finite number and therefore do not correspond to a proper probability distribution. An improper prior can lead to an improper posterior, which is clearly undesirable. There is no general theory providing simple conditions under which an improper prior yields a proper posterior for a particular model, so this must be investigated case-by-case.
Eugenia Castellanos and Cabras (2007) establish that the Jeffreys prior for the GP distribution always yields a proper posterior, but no such results exist for the other improper priors we consider. It is important that posterior propriety is established because impropriety may not create obvious numerical problems; for example, MCMC output may appear perfectly reasonable (Hobert and Casella, 1996). One way to ensure posterior propriety is to use a diffuse proper prior, such as a normal prior with a large variance (Coles and Tawn, 2005; Smith, 2005), or to truncate an improper prior (Smith and Goodman, 2000). For example, Coles (2001, chapter 9) uses a GEV(µ, σ, ξ) model for annual maximum sea-levels, placing independent normal priors on µ, log σ and ξ with respective variances 10^4, 10^4 and 100. However, one needs to check that the posterior is not sensitive to the choice of proper prior and, as Bayarri and Berger (2004) note, "... these posteriors will essentially be meaningless if the limiting improper objective prior would have resulted in an improper posterior distribution." Therefore, independent uniform priors on separate model parameters are of interest in their own right and as the limiting case of independent diffuse normal priors.

In section 2 we give the general form of the three objective priors we consider in this paper. In section 3 we investigate whether or not these priors yield a proper posterior distribution when updated using a random sample z = (z_1, ..., z_m) from the GP distribution, and, in cases where propriety is possible, we derive sufficient conditions for this to occur. We repeat this for a random sample y = (y_1, ..., y_n) from a GEV distribution in section 4. Proofs of results are presented in the appendix.

2 Objective priors for extreme value distributions

Let Y be a random variable with density function f(y | φ), indexed by a parameter vector φ, and define the Fisher information matrix I(φ) by I(φ)_{ij} = −E[∂² ln f(Y | φ)/∂φ_i ∂φ_j].

Uniform priors. Priors that are flat, i.e. equal to a positive constant, suffer from the problem that they are not automatically invariant to reparameterization: for example, if we give log σ a uniform distribution then the implied prior on σ is not uniform. Thus, it matters which particular parameterization is used to define the prior.

Jeffreys priors. Jeffreys' general rule (Jeffreys, 1961) is

π_J(φ) ∝ det(I(φ))^{1/2}.   (3)

An attractive property of this rule is that it produces a prior that is invariant to reparameterization. Jeffreys suggested a modification of this rule for use in location-scale problems. We will follow this modification, which is summarised on page 1345 of Kass and Wasserman (1996). If there is no location parameter then (3) is used. If there is a location parameter µ, say, then φ = (µ, θ) and

π_J(µ, θ) ∝ det(I(θ))^{1/2},   (4)

where I(θ) is calculated holding µ fixed. In the current context the GP distribution does not have a location parameter whereas the GEV distribution does.

MDI prior. The MDI prior (Zellner, 1971) is defined as

π_M(φ) ∝ exp{E[log f(Y | φ)]}.   (5)

This is the prior for which the increase in average information, provided by the data via the likelihood function, is maximised. For further information see Zellner (1998).

3 Generalized Pareto (GP) distribution

Without loss of generality we take the m threshold excesses to be ordered: z_1 < ... < z_m. For simplicity we denote the GP scale parameter by σ rather than σ_u. We consider a class of priors of the form π(σ, ξ) ∝ π_ξ(ξ)/σ, σ > 0, ξ ∈ R, that is, a priori σ and ξ are independent and log σ has an improper uniform prior over the real line.
The posterior is given by

π_G(σ, ξ | z) = C_m^{−1} π_ξ(ξ) σ^{−(m+1)} Π_{i=1}^m (1 + ξz_i/σ)^{−(1+1/ξ)},  σ > 0,  ξ > −σ/z_m,

where

C_m = ∫_{−∞}^{∞} ∫_{max(0, −ξz_m)}^{∞} π_ξ(ξ) σ^{−(m+1)} Π_{i=1}^m (1 + ξz_i/σ)^{−(1+1/ξ)} dσ dξ   (6)

and the inequality ξ > −σ/z_m comes from the constraints 1 + ξz_i/σ > 0, i = 1, ..., m, in the likelihood.
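For numerical work it is convenient to evaluate the unnormalised log posterior appearing inside (6). The sketch below is ours (names and the optional log-prior argument are illustrative); it returns −∞ outside the support constraint ξ > −σ/z_m.

```python
import numpy as np
from scipy import stats

def gp_log_posterior_kernel(sigma, xi, z, log_prior_xi=lambda xi: 0.0):
    """Log of pi_xi(xi) sigma^{-(m+1)} prod_i (1 + xi z_i/sigma)^{-(1+1/xi)},
    the unnormalised GP posterior in (6); -inf outside the support."""
    z = np.asarray(z, dtype=float)
    m = z.size
    if sigma <= 0.0 or 1.0 + xi * z.max() / sigma <= 0.0:
        return -np.inf
    if xi == 0.0:
        log_lik = -m * np.log(sigma) - z.sum() / sigma  # exponential limit
    else:
        log_lik = -m * np.log(sigma) - (1.0 + 1.0 / xi) * np.log1p(xi * z / sigma).sum()
    return log_lik - np.log(sigma) + log_prior_xi(xi)
```

With the default flat log-prior for ξ this is the GP log-likelihood plus the −log σ contribution of the prior class, so it agrees with summed genpareto.logpdf values.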

3.1 Prior densities

Using (3) with φ = (σ, ξ) gives the Jeffreys prior

π_J,GP(σ, ξ) ∝ 1/[σ (1 + ξ)(1 + 2ξ)^{1/2}],  σ > 0,  ξ > −1/2.

Eugenia Castellanos and Cabras (2007) show that a proper posterior density results for any sample size n ≥ 1. Using (5) gives the MDI prior

π_M,GP(σ, ξ) ∝ σ^{−1} e^{−(ξ+1)} ∝ σ^{−1} e^{−ξ},  σ > 0,  ξ ∈ R.   (7)

Beirlant et al. (2004, page 447) use this prior but they do not investigate the propriety of the posterior. Placing independent uniform priors on log σ and ξ gives the prior

π_U,GP(σ, ξ) ∝ 1/σ,  σ > 0,  ξ ∈ R.   (8)

This prior was proposed by Pickands (1994).

Figure 1 shows the Jeffreys and MDI priors for the GP parameters as functions of ξ. The MDI prior increases without limit as ξ → −∞.

Figure 1: Scaled Jeffreys and MDI GP prior densities against ξ.

3.2 Results

Theorem 1. A sufficient condition for the prior π(σ, ξ) ∝ π_ξ(ξ)/σ, σ > 0, ξ ∈ R to yield a proper posterior density function is that π_ξ(ξ) is (proportional to) a proper density function.
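As a numerical aside on the Jeffreys density above: it corresponds to det I(σ, ξ) = σ^{−2}(1 + ξ)^{−2}(1 + 2ξ)^{−1}, which can be checked by computing the expected information directly. The sketch below is our construction (not from the original analysis): it takes second derivatives of the GP log-density by central finite differences and integrates them against the GP density.

```python
import numpy as np
from scipy import integrate, stats

def gp_fisher_det(sigma, xi, h=1e-4):
    """Numerical determinant of the GP Fisher information I(sigma, xi);
    should equal sigma^-2 (1+xi)^-2 (1+2xi)^-1 for xi > -1/2."""
    theta = np.array([sigma, xi])

    def logf(z, s, x):
        return stats.genpareto.logpdf(z, c=x, scale=s)

    def entry(i, j):
        ei = np.eye(2)[i] * h
        ej = np.eye(2)[j] * h

        def minus_d2(z):
            # central-difference estimate of -d^2 log f / d theta_i d theta_j
            return -(logf(z, *(theta + ei + ej)) - logf(z, *(theta + ei - ej))
                     - logf(z, *(theta - ei + ej)) + logf(z, *(theta - ei - ej))) / (4.0 * h * h)

        val, _ = integrate.quad(
            lambda z: minus_d2(z) * stats.genpareto.pdf(z, c=xi, scale=sigma),
            0.0, np.inf)
        return val

    info = np.array([[entry(0, 0), entry(0, 1)],
                     [entry(1, 0), entry(1, 1)]])
    return float(np.linalg.det(info))
```

The square root of this determinant reproduces the 1/[σ(1+ξ)(1+2ξ)^{1/2}] shape of π_J,GP up to proportionality.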

Proof. This trivial extension of the proof of theorem 1 in Eugenia Castellanos and Cabras (2007) is given in the appendix.

The MDI prior (7) does not satisfy the condition in theorem 1 because exp{−(ξ + 1)} is not a proper density function on ξ ∈ R.

Theorem 2. There is no sample size for which the MDI prior (7) yields a proper posterior density function.

Proof. See appendix.

The problem with the MDI prior is due to its behaviour for negative ξ, so a simple solution is to place a lower bound on ξ a priori. This approach is common in extreme value analyses; for example, Martins and Stedinger (2000) constrain ξ to (−1/2, 1/2) a priori. We suggest

π_M,GP(σ, ξ) ∝ σ^{−1} e^{−(ξ+1)},  ξ ≥ −1,   (9)

that is, a (proper) unit exponential prior on ξ + 1. Any finite lower bound on ξ ensures propriety of the posterior, but ξ = −1, for which the GP distribution reduces to a uniform distribution on (0, σ), seems sensible. For smaller values of ξ the GP density is increasing over its support, which seems unlikely in an application where interest is in modelling the upper tail of a distribution.

Theorem 3. The truncated MDI prior (9) yields a proper posterior density function for m ≥ 1.

Proof. This follows directly from theorem 1.

Theorem 4. A sufficient condition for the uniform prior (8) to yield a proper posterior density function is that m ≥ 3.

Proof. See appendix.

4 Generalized extreme value (GEV) distribution

Without loss of generality we take the n block maxima to be ordered: y_1 < ... < y_n. We consider a class of priors of the form π(µ, σ, ξ) ∝ π_ξ(ξ)/σ, σ > 0, µ, ξ ∈ R, that is, a priori µ, σ and ξ are independent and µ and log σ have improper uniform priors over the real line. Based on a random sample y_1, ..., y_n the posterior density for (µ, σ, ξ) is proportional to

π_ξ(ξ) σ^{−(n+1)} {Π_{i=1}^n [1 + ξ(y_i − µ)/σ]^{−(1+1/ξ)}} exp{−Σ_{i=1}^n [1 + ξ(y_i − µ)/σ]^{−1/ξ}},   (10)

where σ > 0 and, if ξ > 0 then µ − σ/ξ < y_1, while if ξ < 0 then µ − σ/ξ > y_n.

4.1 Prior densities

Kotz and Nadarajah (2000, page 63) give the Fisher information matrix for the GEV distribution (2). Using (4) with φ = (µ, σ, ξ) gives the Jeffreys prior

π_J,GEV(µ, σ, ξ) ∝ (1/(σξ²)) { [1 − 2Γ(2 + ξ) + p] [π²/6 + (1 − γ + 1/ξ)² − 2q/ξ + p/ξ²] − [1 − γ − (1 − Γ(2 + ξ))/ξ − q + p/ξ]² }^{1/2},  µ ∈ R, σ > 0, ξ > −1/2,   (11)

where p = (1 + ξ)² Γ(1 + 2ξ), q = Γ(2 + ξ){ψ(1 + ξ) + (1 + ξ)/ξ}, ψ(r) = ∂ log Γ(r)/∂r and γ ≈ 0.577 is Euler's constant. van Noortwijk et al. (2004) give an alternative form for the Jeffreys prior, based on (3). Beirlant et al. (2004, page 435) give the form of the MDI prior:

π_M,GEV(µ, σ, ξ) ∝ σ^{−1} e^{−(γξ + 1 + γ)} ∝ σ^{−1} e^{−γ(1+ξ)},  σ > 0,  µ, ξ ∈ R.   (12)

Placing independent uniform priors on µ, log σ and ξ gives the prior

π_U,GEV(µ, σ, ξ) ∝ 1/σ,  σ > 0,  µ, ξ ∈ R.   (13)

Figure 2 shows the Jeffreys and MDI priors for the GEV parameters as functions of ξ. The MDI prior increases without limit as ξ → −∞ and the Jeffreys prior increases without limit as ξ → ∞.

Figure 2: Scaled Jeffreys and MDI GEV prior densities against ξ.

4.2 Results

Theorem 5. For the prior π(µ, σ, ξ) ∝ π_ξ(ξ)/σ, σ > 0, µ, ξ ∈ R to yield a proper posterior density function it is necessary that n ≥ 2 and, in that event, it is sufficient that π_ξ(ξ) is (proportional to) a proper density function.
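The MDI form (12) follows from (5) because, for the GEV, E[log f(Y | µ, σ, ξ)] = −log σ − γ(1 + ξ) − 1. A Monte Carlo sketch of that expectation (our code; recall that scipy's genextreme takes c = −ξ):

```python
import numpy as np
from scipy import stats

def gev_mean_loglik(mu, sigma, xi, n=200_000, seed=1):
    """Monte Carlo estimate of E[log f(Y | mu, sigma, xi)] under the GEV;
    analytically -log(sigma) - euler_gamma*(1 + xi) - 1, the exponent in (12)."""
    rng = np.random.default_rng(seed)
    y = stats.genextreme.rvs(c=-xi, loc=mu, scale=sigma, size=n, random_state=rng)
    return stats.genextreme.logpdf(y, c=-xi, loc=mu, scale=sigma).mean()
```

Exponentiating the analytic value and discarding constants gives σ^{−1} e^{−γ(1+ξ)}, i.e. the MDI prior (12); the µ-dependence vanishes, which is why the prior is flat in µ.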

Proof. See appendix.

Theorem 6. There is no sample size for which the independence Jeffreys prior (11) yields a proper posterior density function.

Proof. See appendix.

Truncation of the independence Jeffreys prior to ξ ≤ ξ_+ would yield a proper posterior density function if n ≥ 2. In this event theorem 5 requires only that ∫_{−1/2}^{ξ_+} π_ξ(ξ) dξ is finite. From the proof of theorem 6 we have π_ξ(ξ) < 2[π²/6 + (1 − γ)²]^{1/2} (1 + 2ξ)^{−1/2} for ξ ∈ (−1/2, −1/2 + ε), where ε > 0. Therefore,

∫_{−1/2}^{−1/2+ε} π_ξ(ξ) dξ < 2[π²/6 + (1 − γ)²]^{1/2} ∫_{−1/2}^{−1/2+ε} (1 + 2ξ)^{−1/2} dξ = 2^{3/2} [π²/6 + (1 − γ)²]^{1/2} ε^{1/2}.

The integral over (−1/2 + ε, ξ_+) is also finite. However, the choice of an a priori upper limit for ξ may be less obvious than the choice of a lower limit.

Theorem 7. There is no sample size for which the MDI prior (12) yields a proper posterior density function.

Proof. See appendix.

As in the GP case, truncating the MDI prior to ξ ≥ −1, that is,

π_M,GEV(µ, σ, ξ) ∝ σ^{−1} e^{−γ(1+ξ)},  µ ∈ R, σ > 0, ξ ≥ −1,   (14)

is a sensible way to yield a proper posterior distribution.

Theorem 8. The truncated MDI prior (14) yields a proper posterior density function for n ≥ 2.

Proof. This follows directly from theorem 5.

Theorem 9. A sufficient condition for the uniform prior (13) to yield a proper posterior density function is that n ≥ 4.

Proof. See appendix.

5 Discussion

We have shown that some of the objective priors used, or proposed for use, in extreme value modelling do not yield a proper posterior distribution unless we are willing to truncate the possible values of ξ a priori. An interesting aspect of our findings is that the Jeffreys prior (11) for the GEV parameters fails to yield a proper posterior, whereas the uniform prior (13) requires only weak conditions to ensure posterior propriety. This is the opposite of more general experience, summarised by Berger (2006, page 393) and Yang and Berger (1998, page 5), that a Jeffreys prior almost always yields a proper posterior whereas a uniform prior often fails to do so.
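The truncation argument near ξ = −1/2 rests on the elementary identity ∫_{−1/2}^{−1/2+ε} (1 + 2ξ)^{−1/2} dξ = (2ε)^{1/2} (the constant in front of the bound carries through unchanged). A quadrature sketch:

```python
import numpy as np
from scipy import integrate

# Endpoint-singular but integrable:
# int_{-1/2}^{-1/2+eps} (1 + 2*xi)^(-1/2) dxi = sqrt(2*eps)
eps = 0.05
val, _ = integrate.quad(lambda x: (1.0 + 2.0 * x) ** -0.5, -0.5, -0.5 + eps)
```

The square-root singularity at ξ = −1/2 is integrable, which is exactly why a finite upper truncation point ξ_+ is enough to restore propriety.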
The impropriety of the posterior under the Jeffreys prior is due to the high rate at which the component π_ξ(ξ) of this prior increases for large ξ. An

alternative prior based on Jeffreys' general rule (3) (van Noortwijk et al., 2004) also has this property.

The conditions sufficient for posterior propriety under the uniform priors (8) and (13) are weak. Therefore, a posterior yielded by diffuse normal priors is meaningful, but such a prior could be replaced by an improper uniform prior. Although it is reassuring to know that one's posterior is proper, with a sufficiently informative sample posterior impropriety might not present a practical problem (Kass and Wasserman, 1996, section 5.2). This may explain why Beirlant et al. (2004, pages 435 and 447) obtain sensible results using (untruncated) MDI priors. However, posterior impropriety may be evident for smaller sample sizes.

In making inferences about high quantiles of the marginal distribution of X, the GP model for threshold excesses is combined with a binomial(N, p_u) model for the number of excesses, where p_u = P(X > u). Objective priors for a binomial probability have been studied extensively; see, for example, Tuyl et al. (2009). An approximately equivalent approach is the non-homogeneous Poisson process (NHPP) model (Smith, 1989), which is parameterized in terms of GEV parameters µ, σ and ξ relating to the distribution of max{X_1, ..., X_b}. Suppose that m observations x_1, ..., x_m exceed u. Under the NHPP model the posterior density for (µ, σ, ξ) is proportional to

σ^{−(m+1)} π_ξ(ξ) exp{−n [1 + ξ(u − µ)/σ]_+^{−1/ξ}} Π_{i=1}^m [1 + ξ(x_i − µ)/σ]_+^{−(1+1/ξ)},   (15)

where n is the (notional) number of blocks into which the data are divided in defining (µ, σ, ξ). Without loss of generality, we take n = m. The exponential term in (15) is an increasing function of u, and x_i > u, i = 1, ..., m. Therefore,

exp{−n [1 + ξ(u − µ)/σ]_+^{−1/ξ}} < exp{−(n/m) Σ_{i=1}^m [1 + ξ(x_i − µ)/σ]_+^{−1/ξ}}

and (15) is less than

σ^{−(m+1)} π_ξ(ξ) exp{−Σ_{i=1}^m [1 + ξ(x_i − µ)/σ]_+^{−1/ξ}} Π_{i=1}^m [1 + ξ(x_i − µ)/σ]_+^{−(1+1/ξ)}.   (16)

Equation (16) is of the same form as (10), with n = m and y_i = x_i, i = 1, ..., n.
Therefore, theorems 5 and 9 apply to the NHPP model, that is, for posterior propriety it is sufficient that either (a) n ≥ 2 and π(µ, σ, ξ) ∝ π_ξ(ξ)/σ, for σ > 0, µ, ξ ∈ R, where ∫_{−∞}^{∞} π_ξ(ξ) dξ is finite, or (b) n ≥ 4 and π(µ, σ, ξ) ∝ 1/σ, for σ > 0, µ, ξ ∈ R.

One possible extension of our work is to regression modelling using extreme value response distributions. For example, Roy and Dey (2013) use GEV regression modelling to analyze reliability data. They prove posterior propriety under relatively strong conditions on the prior distribution placed on (σ, ξ). The conditions required in our theorem 9 are weaker. We expect that the proof of theorem 7 in Roy and Dey (2013) could be adapted using the approach we use to prove our theorem 9. Another extension is to explore other formal rules for constructing priors, such as reference priors (Berger et al., 2009) and probability matching priors (Datta et al., 2000). Ho (2010) considers the latter for the GP distribution.

Acknowledgements

One of us (NA) was funded by an Engineering and Physical Sciences Research Council studentship while carrying out this work.

References

Abramowitz, M. and I. A. Stegun (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover Publications.

Alzer, H. (1999). Inequalities for the gamma function. Proc. Amer. Math. Soc. 128 (1), 141–147.

Bayarri, M. J. and J. O. Berger (2004). The interplay of Bayesian and frequentist analysis. Statist. Sci. 19 (1), 58–80.

Beirlant, J., Y. Goegebeur, J. Teugels, J. Segers, D. De Waal, and C. Ferro (2004). Statistics of Extremes. John Wiley & Sons.

Berger, J. (2006). The case for objective Bayesian analysis. Bayesian Analysis 1 (3), 385–402.

Berger, J., J. Bernardo, and D. Sun (2009). The formal definition of reference priors. The Annals of Statistics 37, 905–938.

Coles, S. and E. Powell (1996). Bayesian methods in extreme value modelling: a review and new developments. International Statistical Review 64, 119–136.

Coles, S. and J. Tawn (1996). A Bayesian analysis of extreme rainfall data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 45 (4), 463–478.

Coles, S. and J. A. Tawn (2005). Bayesian modelling of extreme surges on the UK east coast. Phil. Trans. R. Soc. A 363 (1831), 1387–1406.

Coles, S. G. (2001). An Introduction to Statistical Modelling of Extreme Values. London: Springer.

Datta, G., R. Mukerjee, M. Ghosh, and T. Sweeting (2000). Bayesian prediction with approximate frequentist validity. The Annals of Statistics 28, 1414–1426.

Eugenia Castellanos, M. and S. Cabras (2007). A default Bayesian procedure for the generalized Pareto distribution. Journal of Statistical Planning and Inference 137 (2), 473–483.

Fisher, R. A. and L. H. C. Tippett (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Phil. Soc. 24, 180–190.

Giles, D. E. and H. Feng (2009). Bias-corrected maximum likelihood estimation of the parameters of the generalized Pareto distribution.
Econometric Working Paper EWP0902, University of Victoria.

Gradshteyn, I. S. and I. M. Ryzhik (2007). Table of Integrals, Series and Products (7th ed.). New York: Academic Press.

Ho, K.-W. (2010). A matching prior for extreme quantile estimation of the generalized Pareto distribution. Journal of Statistical Planning and Inference 140 (6), 1513–1518.

Hobert, J. P. and G. Casella (1996). The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association 91 (436), 1461–1473.

Hosking, J. R. M. and J. R. Wallis (1987). Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29 (3), 339–349.

Hosking, J. R. M., J. R. Wallis, and E. F. Wood (1985). Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27 (3), 251–261.

Jeffreys, H. (1961). Theory of Probability, The International Series of Monographs on Physics. Oxford University Press.

Jenkinson, A. F. (1955). The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Q. J. R. Meteorol. Soc. 81, 158–171.

Kass, R. E. and L. Wasserman (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association 91 (435), 1343–1370.

Kotz, S. and S. Nadarajah (2000). Extreme Value Distributions: Theory and Applications. London: Imperial College Press.

Leadbetter, M. R., G. Lindgren, and H. Rootzén (1983). Extremes and Related Properties of Random Sequences and Processes. New York: Springer.

Martins, E. S. and J. R. Stedinger (2000). Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resources Research 36 (3), 737–744.

Martins, E. S. and J. R. Stedinger (2001). Generalized maximum likelihood Pareto-Poisson estimators for partial duration series. Water Resources Research 37 (10), 2551–2557.

Mitrinović, D. S. (1964). Elementary Inequalities. Groningen: Noordhoff.

Pickands, J. (1975). Statistical inference using extreme order statistics. Ann. Stat. 3, 119–131.

Pickands, J. (1994). Bayes quantile estimation and threshold selection for the generalized Pareto family. In J. Galambos, J. Lechner, and E.
Simiu (Eds.), Extreme Value Theory and Applications, pp. 123–138. Springer US.

Qiu, S. L. and M. Vuorinen (2004). Some properties of the gamma and psi functions, with applications. Math. Comp. 74 (250), 723–742.

Roy, V. and D. K. Dey (2013). Propriety of posterior distributions arising in categorical and survival models under generalized extreme value distribution. Statistica Sinica. To appear.

Smith, E. (2005). Bayesian Modelling of Extreme Rainfall Data. Ph.D. thesis, University of Newcastle upon Tyne.

Smith, R. (1984). Threshold methods for sample extremes. In J. Oliveira (Ed.), Statistical Extremes and Applications, Volume 131 of NATO ASI Series, pp. 621–638. Springer Netherlands.

Smith, R. and D. Goodman (2000). Bayesian risk analysis. In P. Embrechts (Ed.), Extremes and Integrated Risk Management, pp. 235–251. London: Risk Books.

Smith, R. L. (1985). Maximum likelihood estimation in a class of non-regular cases. Biometrika 72, 67–90.

Smith, R. L. (1989). Extreme value analysis of environmental time series: An application to trend detection in ground-level ozone. Statistical Science 4, 367–377.

Stephenson, A. and J. Tawn (2004). Inference for extremes: accounting for the three extremal types. Extremes 7 (4), 291–307.

Tuyl, F., R. Gerlach, and K. Mengersen (2009). Posterior predictive arguments in favor of the Bayes-Laplace prior as the consensus prior for binomial and multinomial parameters. Bayesian Analysis 4 (1), 151–158.

van Noortwijk, J. M., H. J. Kalk, and E. H. Chbab (2004). Bayesian estimation of design loads. HERON 49 (2), 189–205.

Yang, R. and J. Berger (1998). A catalog of noninformative priors. ISDS Discussion Paper 97-42, Duke University. Available at http://www.stats.org.uk/priors/noninformative/yangberger1998.pdf.

Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics, Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons.

Zellner, A. (1998). Past and recent results on maximal data information priors. J. Statist. Res. 32 (1).

Appendix

Moments of a GP distribution

We give some moments of the GP distribution for later use. Suppose that Z ~ GP(σ, ξ), where ξ < 1/r. Then (Giles and Feng, 2009)

E(Z^r) = r! σ^r / Π_{i=1}^r (1 − iξ).   (17)

Now suppose that ξ < 0. Then, for a constant a > ξ, and using the substitution x = −ξv/σ, we have

E(Z^{−a/ξ}) = ∫_0^{−σ/ξ} v^{−a/ξ} σ^{−1} (1 + ξv/σ)^{−(1+1/ξ)} dv
= (−σ/ξ)^{−a/ξ} (−1/ξ) ∫_0^1 x^{−a/ξ} (1 − x)^{−(1+1/ξ)} dx
= (−σ/ξ)^{−a/ξ} (−1/ξ) Γ(1 − a/ξ) Γ(−1/ξ) / Γ(1 − (a + 1)/ξ),   (18)

where we have used integral number 1 in section 3.251 on page 324 of Gradshteyn and Ryzhik (2007), namely

∫_0^1 x^{µ−1} (1 − x^λ)^{ν−1} dx = (1/λ) B(µ/λ, ν) = Γ(µ/λ) Γ(ν) / [λ Γ(µ/λ + ν)],  λ > 0, ν > 0, µ > 0,

with λ = 1, µ = 1 − a/ξ and ν = −1/ξ.

In the following proofs we use the generic notation π_ξ(ξ) for the component of the prior relating to ξ: the form of π_ξ(ξ) varies depending on the prior being considered.

Proof of theorem 1

Suppose m = 1, with an observation z. The normalizing constant C_1 of the posterior distribution is given by

C_1 = ∫_{−∞}^{0} π_ξ(ξ) ∫_{−ξz}^{∞} σ^{−2} (1 + ξz/σ)^{−(1+1/ξ)} dσ dξ + ∫_{0}^{∞} π_ξ(ξ) ∫_{0}^{∞} σ^{−2} (1 + ξz/σ)^{−(1+1/ξ)} dσ dξ
= (1/z) ∫_{−∞}^{∞} π_ξ(ξ) dξ.

If the latter integral is finite, that is, π_ξ(ξ) is proportional to a proper density function, then the posterior distribution is proper for m = 1 and therefore, by successive iterations of Bayes' theorem, it is proper for m ≥ 1.

Proof of theorem 2

Let A(ξ) = e^{−ξ} and B(σ, ξ) = σ^{−(m+1)} Π_{i=1}^m (1 + ξz_i/σ)^{−(1+1/ξ)}. Then, from (6) we have

C_m = ∫_{−∞}^{∞} A(ξ) ∫_{max(0, −ξz_m)}^{∞} B(σ, ξ) dσ dξ
= ∫_{−∞}^{−1} A(ξ) ∫_{−ξz_m}^{∞} B(σ, ξ) dσ dξ + ∫_{−1}^{0} A(ξ) ∫_{−ξz_m}^{∞} B(σ, ξ) dσ dξ + ∫_{0}^{∞} A(ξ) ∫_{0}^{∞} B(σ, ξ) dσ dξ.

The latter two integrals converge for m ≥ 1. However, the first integral diverges for all sample sizes. For ξ < −1, (1 + ξz/σ)^{−(1+1/ξ)} > 1 when z is in the support (0, −σ/ξ) of the GP(σ, ξ) density. Therefore B(σ, ξ) > σ^{−(m+1)}. Thus, the first integral above satisfies

∫_{−∞}^{−1} A(ξ) ∫_{−ξz_m}^{∞} B(σ, ξ) dσ dξ > ∫_{−∞}^{−1} A(ξ) ∫_{−ξz_m}^{∞} σ^{−(m+1)} dσ dξ
= (1/m) ∫_{−∞}^{−1} A(ξ) (−ξz_m)^{−m} dξ
= (1/(m z_m^m)) ∫_{1}^{∞} v^{−m} e^{v} dv,

where v = −ξ. This integral is divergent for all m, so there is no sample size for which the posterior is proper.
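The inner σ-integral in the proof of theorem 1 above equals 1/z for every ξ ≠ 0, which is what gives C_1 = (1/z) ∫ π_ξ(ξ) dξ. A quadrature sketch (the helper name is ours):

```python
import numpy as np
from scipy import integrate

def sigma_integral(z, xi):
    """Integral over sigma > max(0, -xi*z) of sigma^-2 (1 + xi*z/sigma)^(-(1+1/xi));
    analytically equal to 1/z for any xi != 0 (proof of theorem 1)."""
    lower = max(0.0, -xi * z)
    f = lambda s: s ** -2.0 * (1.0 + xi * z / s) ** (-(1.0 + 1.0 / xi))
    val, _ = integrate.quad(f, lower, np.inf)
    return val
```

The substitution t = 1/σ reduces the integrand to (1 + ξzt)^{−(1+1/ξ)}, whose antiderivative −(1/z)(1 + ξzt)^{−1/ξ} gives the value 1/z directly for both signs of ξ.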

Proof of theorem 4

We need to show that C_3 is finite. We split the range of integration over ξ so that C_3 = I_1 + I_2 + I_3, where

I_1 = ∫_{−∞}^{−1} ∫_{−ξz_3}^{∞} B(σ, ξ) dσ dξ,  I_2 = ∫_{−1}^{0} ∫_{−ξz_3}^{∞} B(σ, ξ) dσ dξ,  I_3 = ∫_{0}^{∞} ∫_{0}^{∞} B(σ, ξ) dσ dξ   (19)

and B(σ, ξ) = σ^{−4} Π_{i=1}^3 (1 + ξz_i/σ)^{−(1+1/ξ)}. For convenience we let ρ = ξ/σ.

Proof that I_1 is finite. We have ξ < −1 and so 0 < (1 + 1/ξ) < 1, ρ < 0 and 0 < 1 + ρz_i < 1 for i = 1, 2, 3. Noting that −ρz_3 < 1 gives

(1 + ρz_1)(1 + ρz_2)(1 + ρz_3) > (−ρz_3 + ρz_1)(−ρz_3 + ρz_2)(1 + ρz_3)
= ρ² (z_3 − z_1)(z_3 − z_2)(1 + ρz_3)
= (ξ/σ)² (z_3 − z_1)(z_3 − z_2)(1 + ρz_3).   (20)

Therefore,

Π_{i=1}^3 (1 + ξz_i/σ)^{−(1+1/ξ)} < [ξ² σ^{−2} (z_3 − z_1)(z_3 − z_2)]^{−(1+1/ξ)} (1 + ξz_3/σ)^{−(1+1/ξ)}.

Thus, I_1 ≤ ∫_{−∞}^{−1} ξ^{−2(1+1/ξ)} [(z_3 − z_1)(z_3 − z_2)]^{−(1+1/ξ)} I_1' dξ, where

I_1' = ∫_{−ξz_3}^{∞} σ^{−(2−2/ξ)} (1 + ξz_3/σ)^{−(1+1/ξ)} dσ
= ∫_{0}^{−1/(ξz_3)} v^{−2/ξ} (1 + ξz_3 v)^{−(1+1/ξ)} dv
= z_3^{−1} (−1/(ξz_3))^{−2/ξ} (−1/ξ) Γ(1 − 2/ξ) Γ(−1/ξ) / Γ(1 − 3/ξ),

where v = 1/σ and the last line follows from (18) on noting that the integral is z_3^{−1} E(V^{−2/ξ}), where V ~ GP(1/z_3, ξ). Therefore,

I_1 ≤ ∫_{−∞}^{−1} (−ξ)^{−3} [(z_3 − z_1)(z_3 − z_2)]^{−(1+1/ξ)} z_3^{−(1−2/ξ)} Γ(1 − 2/ξ) Γ(−1/ξ) / Γ(1 − 3/ξ) dξ
= [z_3 (z_3 − z_1)(z_3 − z_2)]^{−1} ∫_{−∞}^{−1} (−ξ)^{−3} [(z_3 − z_1)(z_3 − z_2)/z_3²]^{−1/ξ} Γ(1 − 2/ξ) Γ(−1/ξ) / Γ(1 − 3/ξ) dξ
= [z_3 (z_3 − z_1)(z_3 − z_2)]^{−1} ∫_{0}^{1} x [(z_3 − z_1)(z_3 − z_2)/z_3²]^{x} Γ(1 + 2x) Γ(x) / Γ(1 + 3x) dx
= [z_3 (z_3 − z_1)(z_3 − z_2)]^{−1} ∫_{0}^{1} [(z_3 − z_1)(z_3 − z_2)/z_3²]^{x} Γ(1 + 2x) Γ(1 + x) / Γ(1 + 3x) dx,   (21)

where x = −1/ξ and we have used the relation Γ(1 + x) = x Γ(x). The integrand in (21) is finite over the range of integration so this integral is finite and therefore I_1 is finite.
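The moment formula (17), used via E(V²) in the remaining two parts of this proof, is easy to check numerically against scipy (the helper name is ours):

```python
import numpy as np
from scipy import stats
from math import factorial

def gp_moment(r, sigma, xi):
    """E(Z^r) = r! sigma^r / prod_{i=1}^r (1 - i*xi) for Z ~ GP(sigma, xi),
    valid when xi < 1/r, as in (17)."""
    denom = np.prod([1.0 - i * xi for i in range(1, r + 1)])
    return factorial(r) * sigma ** r / denom
```

For example, with r = 2 the formula gives E(Z²) = 2σ²/[(1 − ξ)(1 − 2ξ)], the expression substituted for E(V²) below.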

Proof that $I_2$ is finite. We have $-1 < \xi < 0$ and so $-(1+1/\xi) > 0$, and $(1+\xi z/\sigma)^{-(1+1/\xi)} < 1$ and decreases in $z$ over the support $(0, -\sigma/\xi)$ of the ${\rm GP}(\sigma,\xi)$ density. Therefore, retaining only the $i = 3$ factor,
\[
I_2 = \int_{-1}^{0}\int_{-\xi z_3}^{\infty} \sigma^{-4}\prod_{i=1}^{3}\left(1+\frac{\xi z_i}{\sigma}\right)^{-(1+1/\xi)}{\rm d}\sigma\,{\rm d}\xi
\leq \int_{-1}^{0}\int_{-\xi z_3}^{\infty} \sigma^{-4}\left(1+\frac{\xi z_3}{\sigma}\right)^{-(1+1/\xi)}{\rm d}\sigma\,{\rm d}\xi
= \int_{-1}^{0}\int_{0}^{-1/(\xi z_3)} v^{2}\,(1+\xi z_3 v)^{-(1+1/\xi)}\,{\rm d}v\,{\rm d}\xi
\]
\[
= \frac{2}{z_3^3}\int_{-1}^{0} (1-\xi)^{-1}(1-2\xi)^{-1}\,{\rm d}\xi
= \frac{2}{z_3^3}\int_{-1}^{0} \left\{\frac{2}{1-2\xi} - \frac{1}{1-\xi}\right\}{\rm d}\xi
= \frac{2}{z_3^3}\,\ln(3/2),
\]
where $v = 1/\sigma$.

Proof that $I_3$ is finite. We have $\xi > 0$ and so $-(1+1/\xi) < 0$. We let $g_n = (\prod_{i=1}^{n} z_i)^{1/n}$. Using Mitrinović (1964):
\[
\prod_{k=1}^{n}(1+a_k) \geq (1+b)^n, \qquad a_k > 0;\ \ \prod_{k=1}^{n} a_k = b^n, \tag{12}
\]
with $a_k = \xi z_k/\sigma$ gives
\[
\prod_{i=1}^{3}\left(1+\frac{\xi z_i}{\sigma}\right)^{-(1+1/\xi)} \leq \left(1+\frac{\xi g_3}{\sigma}\right)^{-3(1+1/\xi)},
\]
and therefore
\[
I_3 \leq \int_{0}^{\infty}\int_{0}^{\infty} \sigma^{-4}\left(1+\frac{\xi g_3}{\sigma}\right)^{-(3+3/\xi)}{\rm d}\sigma\,{\rm d}\xi
= \int_{0}^{\infty}\int_{0}^{\infty} v^{2}\,(1+\xi g_3 v)^{-(3+3/\xi)}\,{\rm d}v\,{\rm d}\xi
= \int_{0}^{\infty} \beta \int_{0}^{\infty} v^{2}\,\frac{1}{\beta}\left(1+\frac{\alpha v}{\beta}\right)^{-(1+1/\alpha)}{\rm d}v\,{\rm d}\xi,
\]
where $v = 1/\sigma$, $\alpha = 1/(2+3/\xi)$ and $\beta = \alpha/(\xi g_3) = 1/\{(2\xi+3)\,g_3\}$. We note that for $\xi > 0$, $\alpha < 1/2$ and so this inner integral is the second moment of a ${\rm GP}(\beta, \alpha)$ distribution.
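The second-moment fact invoked here, namely ${\rm E}(V^2) = 2\beta^2/\{(1-\alpha)(1-2\alpha)\}$ for $V \sim {\rm GP}(\beta,\alpha)$ with $\alpha < 1/2$, can be checked numerically. The following sketch is illustrative only (the values $\alpha = 0.2$, $\beta = 1$ are arbitrary) and compares numerical integration against the closed form:

```python
import math

def gp_second_moment_numeric(beta, alpha, upper=4000.0, steps=400_000):
    # Midpoint-rule approximation of E(V^2) for V ~ GP(beta, alpha)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += v ** 2 * (1.0 / beta) * (1.0 + alpha * v / beta) ** (-(1.0 + 1.0 / alpha))
    return total * h

def gp_second_moment_closed(beta, alpha):
    # E(V^2) = 2 * beta^2 / ((1 - alpha) * (1 - 2*alpha)), valid for alpha < 1/2
    return 2.0 * beta ** 2 / ((1.0 - alpha) * (1.0 - 2.0 * alpha))

num = gp_second_moment_numeric(1.0, 0.2)
exact = gp_second_moment_closed(1.0, 0.2)
print(num, exact)
```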

Therefore,
\[
I_3 \leq \int_{0}^{\infty} \frac{2\beta^3}{(1-\alpha)(1-2\alpha)}\,{\rm d}\xi = \frac{2}{3 g_3^3}\int_{0}^{\infty} \frac{{\rm d}\xi}{(2\xi+3)(\xi+3)} = \frac{1}{3 g_3^3}\int_{0}^{\infty} \frac{{\rm d}\xi}{(\xi+3/2)(\xi+3)} = \frac{2}{9 g_3^3}\,\ln 2.
\]
The normalizing constant $C_3$ is finite, so $\pi_{U,{\rm GP}}(\sigma, \xi)$ yields a proper posterior density for $m = 3$ and therefore does so for $m \geq 3$.

Proof of theorem 5. Throughout the following proofs we define $\delta_i = y_i - y_1$, $i = 1,\ldots,n$. We make the parameter transformation $\phi = \mu - \sigma/\xi$. Then the posterior density for $(\phi, \sigma, \xi)$ is given by
\[
\pi(\phi,\sigma,\xi) = K_n^{-1}\,\pi_\xi(\xi)\,\xi^{-n(1+1/\xi)}\,G_n(\phi,\sigma),
\]
where
\[
G_n(\phi,\sigma) = \sigma^{n/\xi-1}\left\{\prod_{i=1}^{n}(y_i-\phi)^{-(1+1/\xi)}\right\}\exp\left\{-\xi^{-1/\xi}\,\sigma^{1/\xi}\sum_{i=1}^{n}(y_i-\phi)^{-1/\xi}\right\}
\]
and, if $\xi > 0$ then $\phi < y_1$ and if $\xi < 0$ then $\phi > y_n$.

We let $b = \xi^{-1/\xi}\sum_{i=1}^{n}(y_i-\phi)^{-1/\xi}$ and $v = \sigma^{1/\xi}$. The normalizing constant $K_n$ is given by
\[
K_n = \int\!\!\int\!\!\int \pi_\xi(\xi)\,\xi^{-n(1+1/\xi)}\,G_n(\phi,\sigma)\,{\rm d}\sigma\,{\rm d}\phi\,{\rm d}\xi
= \int\!\!\int \pi_\xi(\xi)\,\xi^{-n(1+1/\xi)}\left\{\prod_{i=1}^{n}(y_i-\phi)^{-(1+1/\xi)}\right\}\int_0^\infty \sigma^{n/\xi-1}\exp\{-b\,\sigma^{1/\xi}\}\,{\rm d}\sigma\,{\rm d}\phi\,{\rm d}\xi
\]
\[
= \int\!\!\int \pi_\xi(\xi)\,\xi^{-n(1+1/\xi)}\left\{\prod_{i=1}^{n}(y_i-\phi)^{-(1+1/\xi)}\right\}\int_0^\infty v^{n-1}\,e^{-bv}\,\xi\,{\rm d}v\,{\rm d}\phi\,{\rm d}\xi
= \int\!\!\int \pi_\xi(\xi)\,\xi^{-n(1+1/\xi)+1}\left\{\prod_{i=1}^{n}(y_i-\phi)^{-(1+1/\xi)}\right\}\Gamma(n)\,b^{-n}\,{\rm d}\phi\,{\rm d}\xi
\]
\[
= (n-1)!\int\!\!\int \pi_\xi(\xi)\,\xi^{-(n-1)}\left\{\prod_{i=1}^{n}(y_i-\phi)^{-(1+1/\xi)}\right\}\left\{\sum_{i=1}^{n}(y_i-\phi)^{-1/\xi}\right\}^{-n}{\rm d}\phi\,{\rm d}\xi. \tag{13}
\]
For $n = 1$ the integral $\int_{\{\phi:\,\xi(y_1-\phi)>0\}} (y_1-\phi)^{-1}\,{\rm d}\phi$ is divergent so if $n = 1$ the posterior is not proper for any prior in this class.
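The integration over $\sigma$ used in deriving (the expression for) $K_n$, namely $\int_0^\infty \sigma^{n/\xi-1}\exp\{-b\,\sigma^{1/\xi}\}\,{\rm d}\sigma = \xi\,\Gamma(n)\,b^{-n}$ for $\xi > 0$, can be verified numerically. A sketch, illustrative only, with arbitrary values $n = 3$, $\xi = 0.7$, $b = 1.3$:

```python
import math

def sigma_integral_numeric(n_obs, xi, b, upper=40.0, steps=400_000):
    # Midpoint rule for integral_0^upper sigma^(n/xi - 1) * exp(-b * sigma^(1/xi)) dsigma
    h = upper / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += s ** (n_obs / xi - 1.0) * math.exp(-b * s ** (1.0 / xi))
    return total * h

def sigma_integral_closed(n_obs, xi, b):
    # xi * Gamma(n) * b^(-n), obtained via the substitution v = sigma^(1/xi)
    return xi * math.factorial(n_obs - 1) * b ** (-n_obs)

num = sigma_integral_numeric(3, 0.7, 1.3)
exact = sigma_integral_closed(3, 0.7, 1.3)
print(num, exact)
```

The truncation point is chosen so that the omitted tail is negligible for these parameter values.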

Now we take $n = 2$ and for clarity consider the cases $\xi > 0$ and $\xi < 0$ separately, with respective contributions $K_2^+$ and $K_2^-$ to $K_2$. For $\xi > 0$, using the substitution $u = (y_1-\phi)^{-1}$ in (13) gives
\[
K_2^+ = \int_0^\infty \pi_\xi(\xi)\,\xi^{-1}\int_{-\infty}^{y_1} (y_1-\phi)^{-(1+1/\xi)}(y_2-\phi)^{-(1+1/\xi)}\left\{(y_1-\phi)^{-1/\xi}+(y_2-\phi)^{-1/\xi}\right\}^{-2}{\rm d}\phi\,{\rm d}\xi
\]
\[
= \int_0^\infty \pi_\xi(\xi)\,\xi^{-1}\int_0^\infty (1+\delta_2 u)^{-(1+1/\xi)}\left\{1+(1+\delta_2 u)^{-1/\xi}\right\}^{-2}{\rm d}u\,{\rm d}\xi
= \frac{1}{2\delta_2}\int_0^\infty \pi_\xi(\xi)\,{\rm d}\xi,
\]
the final step following because the $u$-integrand is a multiple $(\xi\delta_2^{-1})$ of a shifted log-logistic density function with location, scale and shape parameters of $0$, $\xi\delta_2^{-1}$ and $\xi$ respectively, and the location of this distribution equals the median. For $\xi < 0$ an analogous calculation using the substitution $v = (\phi-y_2)^{-1}$ in (13) gives
\[
K_2^- = \frac{1}{2\delta_2}\int_{-\infty}^{0} \pi_\xi(\xi)\,{\rm d}\xi.
\]
Therefore,
\[
K_2 = K_2^+ + K_2^- = \frac{1}{2\delta_2}\int_{-\infty}^{\infty} \pi_\xi(\xi)\,{\rm d}\xi.
\]
Thus, $K_2$ is finite if $\int_{-\infty}^{\infty}\pi_\xi(\xi)\,{\rm d}\xi$ is finite, and the result follows.

Proof of theorem 6. The crucial aspects are the rates at which $\pi_\xi(\xi) \to \infty$ as $\xi \to -1/2$ and as $\xi \to \infty$. The component $\pi_\xi(\xi)$ of the Jeffreys prior involving $\xi$ can be expressed as
\[
\pi_\xi^2(\xi) = \xi^{-4}\,(T_1 + T_2), \tag{14}
\]
where
\[
T_1 = \left[\frac{\pi^2}{6} + (1-\gamma)^2\right](1+\xi)^2\,\Gamma(1+2\xi), \tag{15}
\]
\[
T_2 = 2\left[\frac{\pi^2}{6} - (1-\gamma)\{\gamma+\psi(1+\xi)\}\right]\Gamma(2+\xi) + \left[1+\psi(1+\xi)\right]^2\left[\Gamma(2+\xi)\right]^2 + \frac{\pi^2}{6}. \tag{16}
\]
Firstly, we derive a lower bound for $\pi_\xi(\xi)$ that holds for $\xi > 3$. Using the duplication formula (Abramowitz and Stegun, 1972, page 256; 6.1.18)
\[
\Gamma(2z) = (2\pi)^{-1/2}\,2^{2z-1/2}\,\Gamma(z)\,\Gamma(z+1/2),
\]
with $z = 1/2+\xi$ in (15) we have
\[
T_1 = \left[\frac{\pi^2}{6} + (1-\gamma)^2\right](1+\xi)^2\,\pi^{-1/2}\,2^{2\xi}\,\Gamma(1/2+\xi)\,\Gamma(1+\xi).
\]
We note that
\[
\Gamma(1/2+\xi) = \frac{\Gamma(3/2+\xi)}{1/2+\xi} > \frac{\Gamma(1+\xi)}{1/2+\xi} = \frac{2\,\Gamma(1+\xi)}{1+2\xi} > \frac{\Gamma(1+\xi)}{1+\xi},
\]

where for the first inequality to hold it is sufficient that $\xi > 1/2$; and that, for $\xi \geq 3$, $2^{2\xi} \geq (1+\xi)^3$. Therefore,
\[
T_1 > \left[\frac{\pi^2}{6} + (1-\gamma)^2\right]\pi^{-1/2}\,(1+\xi)^4\,[\Gamma(1+\xi)]^2. \tag{17}
\]
Completing the square in (16) gives
\[
T_2 = \left\{[1+\psi(1+\xi)]\,\Gamma(2+\xi) + f(\xi)\right\}^2 - [f(\xi)]^2 + \frac{\pi^2}{6},
\]
where
\[
f(\xi) = \frac{\pi^2/6 - (1-\gamma)\{\gamma+\psi(1+\xi)\}}{1+\psi(1+\xi)} = \frac{\pi^2/6+(1-\gamma)^2}{1+\psi(1+\xi)} - (1-\gamma).
\]
For $\xi > 0$, we have $\psi(1+\xi) < \ln(1+\xi) - \{2(1+\xi)\}^{-1}$ (Qiu and Vuorinen, 2004, theorem C) and $\ln(1+\xi) \leq \xi$ (Abramowitz and Stegun, 1972, page 68; 4.1.33); also $\psi(1+\xi)$ increases with $\xi$, so $f(\xi)$ decreases with $\xi$ and $f(\xi) > -(1-\gamma)$. Therefore, for $\xi > 3$, $-(1-\gamma) < f(\xi) < f(3) \approx 0.39$, so that $-[f(\xi)]^2 + \pi^2/6 > 0$ and
\[
T_2 > \left\{[1+\psi(1+\xi)]\,\Gamma(2+\xi) - (1-\gamma)\right\}^2.
\]
For $\xi > 3$, $(1-\gamma) - \Gamma(2+\xi)/2 < 0$, so, noting that $\Gamma(2+\xi) = (1+\xi)\,\Gamma(1+\xi)$, we have
\[
T_2 > \tfrac{1}{4}\,[\Gamma(2+\xi)]^2 = \tfrac{1}{4}\,(1+\xi)^2\,[\Gamma(1+\xi)]^2. \tag{18}
\]
Substituting (17) and (18) in (14) gives, for $\xi > 3$,
\[
\pi_\xi^2(\xi) > \frac{(1+\xi)^4}{\xi^4}\left\{\left[\frac{\pi^2}{6}+(1-\gamma)^2\right]\pi^{-1/2}\right\}[\Gamma(1+\xi)]^2 > c\,[\Gamma(1+\xi)]^2 > c\,(1+\xi)^{2(\lambda\xi-\gamma)},
\]
where $c = [\pi^2/6+(1-\gamma)^2]\,\pi^{-1/2} \approx 1.03$ and the final step uses the inequality $\Gamma(x) > x^{\lambda(x-1)-\gamma}$, for $x > 1$ (Alzer, 1999), where $\lambda = (\pi^2/6-\gamma)/2 \approx 0.534$. Thus, a lower bound for the $\xi$ component of the Jeffreys prior is given by
\[
\pi_\xi(\xi) > c^{1/2}\,(1+\xi)^{\lambda\xi-\gamma}, \qquad \text{for } \xi > 3. \tag{19}
\]
[In fact, numerical work shows that this lower bound holds for $\xi > 1/2$.]

Let $K_n^+$ denote the contribution to $K_n$ for $\xi > 3$. Using the substitution $u = (y_1-\phi)^{-1}$ in (13) gives
\[
K_n^+ = (n-1)!\int_3^\infty \pi_\xi(\xi)\,\xi^{-(n-1)}\int_0^\infty u^{n-2}\left\{\prod_{i=2}^{n}(1+\delta_i u)^{-(1+1/\xi)}\right\}\left\{1+\sum_{i=2}^{n}(1+\delta_i u)^{-1/\xi}\right\}^{-n}{\rm d}u\,{\rm d}\xi. \tag{30}
\]
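Two gamma-function facts used in this proof, the Legendre duplication formula and the Alzer-type lower bound $\Gamma(x) > x^{\lambda(x-1)-\gamma}$ for $x > 1$ with $\lambda = (\pi^2/6-\gamma)/2$, can be checked numerically on a grid. An illustrative sketch (not part of the original proof):

```python
import math

GAMMA_E = 0.5772156649015329                 # Euler-Mascheroni constant
LAM = (math.pi ** 2 / 6.0 - GAMMA_E) / 2.0   # ~ 0.534

def duplication_gap(z):
    # log of both sides of Gamma(2z) = (2*pi)^(-1/2) * 2^(2z-1/2) * Gamma(z) * Gamma(z+1/2)
    left = math.lgamma(2.0 * z)
    right = (-0.5 * math.log(2.0 * math.pi) + (2.0 * z - 0.5) * math.log(2.0)
             + math.lgamma(z) + math.lgamma(z + 0.5))
    return left - right

def alzer_margin(x):
    # log Gamma(x) - (LAM*(x-1) - gamma) * log x; positive means the bound holds at x
    return math.lgamma(x) - (LAM * (x - 1.0) - GAMMA_E) * math.log(x)

dup_ok = all(abs(duplication_gap(0.5 + 0.1 * k)) < 1e-10 for k in range(1, 200))
alzer_ok = all(alzer_margin(1.0 + 0.1 * k) > 0.0 for k in range(1, 300))
print(dup_ok, alzer_ok)
```

The Alzer margin is very small just above $x = 1$ and grows with $x$, consistent with the bound being sharp near $1$.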

For $\xi > 0$ we have $1+\sum_{i=2}^{n}(1+\delta_i u)^{-1/\xi} \leq n$ and $\prod_{i=2}^{n}(1+\delta_i u)^{-(1+1/\xi)} \geq (1+\delta_n u)^{-(n-1)(1+1/\xi)}$. Applying these inequalities to (30) gives
\[
K_n^+ \geq n^{-n}(n-1)!\int_3^\infty \pi_\xi(\xi)\,\xi^{-(n-1)}\int_0^\infty u^{n-2}\,(1+\delta_n u)^{-(n-1)(1+1/\xi)}\,{\rm d}u\,{\rm d}\xi
= n^{-n}(n-1)!\int_3^\infty \pi_\xi(\xi)\,\xi^{-(n-1)}\,\beta\int_0^\infty u^{n-2}\,\frac{1}{\beta}\left(1+\frac{\alpha u}{\beta}\right)^{-(1+1/\alpha)}{\rm d}u\,{\rm d}\xi, \tag{31}
\]
where $\beta = \alpha/\delta_n$, $\alpha = [n-2+(n-1)/\xi]^{-1}$ and $0 < \alpha < (n-2)^{-1}$. The $u$-integrand is $u^{n-2}$ multiplied by the density function of a ${\rm GP}(\beta,\alpha)$ distribution and so, using (7) with $r = n-2$, the integral over $u$ in (31) is given by
\[
\beta\,{\rm E}(U^{n-2}) = (n-2)!\,\beta^{n-1}\prod_{i=1}^{n-2}(1-i\alpha)^{-1} = (n-2)!\,\xi^{n-1}\,\delta_n^{-(n-1)}\prod_{i=0}^{n-2}\left\{i\xi+(n-1)\right\}^{-1}. \tag{32}
\]
Substituting (32) into (31) gives
\[
K_n^+ \geq n^{-n}(n-1)!\,(n-2)!\,\delta_n^{-(n-1)}\int_3^\infty \prod_{i=0}^{n-2}\left\{i\xi+(n-1)\right\}^{-1}\pi_\xi(\xi)\,{\rm d}\xi
= n^{-n}(n-1)!\,(n-2)!\,\delta_n^{-(n-1)}(n-1)^{-(n-1)}\int_3^\infty \prod_{i=0}^{n-2}\left(1+\frac{i\,\xi}{n-1}\right)^{-1}\pi_\xi(\xi)\,{\rm d}\xi
\]
\[
> C(n)\int_3^\infty (1+\xi)^{-(n-2)}\,\pi_\xi(\xi)\,{\rm d}\xi,
\]
where $C(n) = n^{-n}(n-1)!\,(n-2)!\,\delta_n^{-(n-1)}(n-1)^{-(n-1)}$. Applying (19) gives
\[
K_n^+ > C(n)\,c^{1/2}\int_3^\infty (1+\xi)^{-(n-2)+\lambda\xi-\gamma}\,{\rm d}\xi.
\]
For any sample size $n$ the integrand $\to \infty$ as $\xi \to \infty$. Therefore, the integral diverges and the result follows.

Now we derive an upper bound for $\pi_\xi(\xi)$ that applies for $\xi$ close to $-1/2$. We note that for $-1/2 < \xi < 0$ we have $\Gamma(1+2\xi) = \Gamma(2+2\xi)/(1+2\xi) < (1+2\xi)^{-1}$. From (14) we have
\[
\pi_\xi^2(\xi) = \left[\frac{\pi^2}{6} + (1-\gamma)^2\right]\frac{(1+\xi)^2}{\xi^4}\,\Gamma(1+2\xi) + \frac{T_2}{\xi^4},
\]
where $T_2$ remains bounded as $\xi \to -1/2$. Noting that $(1+\xi)^2/\xi^4 \to 4$ as $\xi \to -1/2$ shows that
\[
\pi_\xi(\xi) < 2\left[\frac{\pi^2}{6} + (1-\gamma)^2\right]^{1/2}(1+2\xi)^{-1/2}
\]
for $\xi \in (-1/2, -1/2+\epsilon)$, for some $\epsilon > 0$. In fact numerical work shows that $\epsilon \approx 0.9$.

Proof of theorem 7. We show that the integral $K_n^-$, giving the contribution to the normalizing constant from $\xi < -1$, diverges. From the proof of theorem 5 we have
\[
K_n^- = (n-1)!\int_{-\infty}^{-1} e^{-\gamma(1+\xi)}\,(-\xi)^{-(n-1)}\int_{y_n}^{\infty}\left\{\prod_{i=1}^{n}(\phi-y_i)^{-(1+1/\xi)}\right\}\left\{\sum_{i=1}^{n}(\phi-y_i)^{-1/\xi}\right\}^{-n}{\rm d}\phi\,{\rm d}\xi.
\]
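The algebraic reduction of $\beta^{n-1}(n-2)!\prod_{i=1}^{n-2}(1-i\alpha)^{-1}$, with $\alpha = [n-2+(n-1)/\xi]^{-1}$ and $\beta = \alpha/\delta_n$, to $(n-2)!\,\xi^{n-1}\,\delta_n^{-(n-1)}\prod_{i=0}^{n-2}\{i\xi+(n-1)\}^{-1}$, used in the proof of theorem 6 above, is pure algebra and can be confirmed numerically. An illustrative sketch (the value $\delta_n = 2$ is arbitrary):

```python
import math

def moment_form(n, xi, delta_n):
    # beta^(n-1) * (n-2)! / prod_{i=1}^{n-2} (1 - i*alpha),
    # with alpha = 1/(n-2+(n-1)/xi) and beta = alpha/delta_n
    alpha = 1.0 / (n - 2 + (n - 1) / xi)
    beta = alpha / delta_n
    prod = 1.0
    for i in range(1, n - 1):
        prod *= 1.0 - i * alpha
    return beta ** (n - 1) * math.factorial(n - 2) / prod

def reduced_form(n, xi, delta_n):
    # (n-2)! * xi^(n-1) * delta_n^(-(n-1)) / prod_{i=0}^{n-2} (i*xi + n - 1)
    prod = 1.0
    for i in range(0, n - 1):
        prod *= i * xi + n - 1
    return math.factorial(n - 2) * xi ** (n - 1) * delta_n ** (-(n - 1)) / prod

ok = all(abs(moment_form(n, xi, 2.0) - reduced_form(n, xi, 2.0))
         < 1e-12 * abs(reduced_form(n, xi, 2.0))
         for n in (3, 4, 5, 6) for xi in (0.5, 1.0, 3.7, 10.0))
print(ok)
```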

For $\xi < -1$ we have $-(1+1/\xi) < 0$ and $-1/\xi > 0$. Therefore, for $i = 1,\ldots,n$, $(\phi-y_i)^{-(1+1/\xi)} > (\phi-y_1)^{-(1+1/\xi)}$ and $(\phi-y_i)^{-1/\xi} < (\phi-y_1)^{-1/\xi}$, and thus the $\phi$-integrand is greater than $n^{-n}\,(\phi-y_1)^{-n}$. Therefore,
\[
K_n^- > (n-1)!\,n^{-n}\int_{-\infty}^{-1} e^{-\gamma(1+\xi)}\,(-\xi)^{-(n-1)}\int_{y_n}^{\infty}(\phi-y_1)^{-n}\,{\rm d}\phi\,{\rm d}\xi
= (n-1)!\,n^{-n}(n-1)^{-1}(y_n-y_1)^{-(n-1)}\int_{-\infty}^{-1} e^{-\gamma(1+\xi)}\,(-\xi)^{-(n-1)}\,{\rm d}\xi
\]
\[
= (n-1)!\,n^{-n}(n-1)^{-1}(y_n-y_1)^{-(n-1)}\,e^{-\gamma}\int_{1}^{\infty} x^{-(n-1)}\,e^{\gamma x}\,{\rm d}x,
\]
where $x = -\xi$. For all sample sizes $n$ this integral diverges so the result follows.

Proof of theorem 9. We need to show that $K_4$ is finite. We split the range of integration over $\xi$ in (13) so that $K_4 = J_1 + J_2 + J_3$, with respective contributions from $\xi < -1$, $-1 \leq \xi < 0$ and $\xi > 0$.

Proof that $J_1$ is finite. We use the substitution $u = (\phi-y_1)^{-1}$ in (13) to give
\[
J_1 = 3!\int_{-\infty}^{-1}(-\xi)^{-3}\int_{y_4}^{\infty}\left\{\prod_{i=1}^{4}(\phi-y_i)^{-(1+1/\xi)}\right\}\left\{\sum_{i=1}^{4}(\phi-y_i)^{-1/\xi}\right\}^{-4}{\rm d}\phi\,{\rm d}\xi
= 3!\int_{-\infty}^{-1}(-\xi)^{-3}\int_{0}^{1/\delta_4} u^{2}\left\{\prod_{i=2}^{4}(1-\delta_i u)^{-(1+1/\xi)}\right\}\left\{1+\sum_{i=2}^{4}(1-\delta_i u)^{-1/\xi}\right\}^{-4}{\rm d}u\,{\rm d}\xi.
\]
A similar calculation to (10) gives
\[
\prod_{i=2}^{4}(1-\delta_i u)^{-(1+1/\xi)} \leq u^{-2(1+1/\xi)}\left\{\prod_{i=2}^{3}(\delta_4-\delta_i)\right\}^{-(1+1/\xi)}(1-\delta_4 u)^{-(1+1/\xi)}.
\]
Noting also that $1+\sum_{i=2}^{4}(1-\delta_i u)^{-1/\xi} \geq 1$ we have
\[
J_1 \leq 3!\int_{-\infty}^{-1}(-\xi)^{-3}\left\{\prod_{i=2}^{3}(\delta_4-\delta_i)\right\}^{-(1+1/\xi)}\int_{0}^{1/\delta_4} u^{-2/\xi}\,(1-\delta_4 u)^{-(1+1/\xi)}\,{\rm d}u\,{\rm d}\xi
= 3!\int_{-\infty}^{-1}(-\xi)^{-3}\left\{\prod_{i=2}^{3}(\delta_4-\delta_i)\right\}^{-(1+1/\xi)}\beta\int_{0}^{1/\delta_4} u^{-2/\xi}\,\frac{1}{\beta}\left(1+\frac{\xi u}{\beta}\right)^{-(1+1/\xi)}{\rm d}u\,{\rm d}\xi
\]
\[
= 3!\int_{-\infty}^{-1}(-\xi)^{-3}\left\{\prod_{i=2}^{3}(\delta_4-\delta_i)\right\}^{-(1+1/\xi)}\delta_4^{2/\xi-1}\,\frac{\Gamma(1-2/\xi)\,\Gamma(-1/\xi)}{\Gamma(1-3/\xi)}\,{\rm d}\xi,
\]

where $\beta = -\xi/\delta_4$ and the last line follows from (8) on noting that the integrand is proportional to ${\rm E}(U^{-2/\xi})$, where $U \sim {\rm GP}(\beta, \xi)$. Therefore, noting that $\delta_4 = y_4-y_1$ and $\delta_4-\delta_i = y_4-y_i$,
\[
J_1 \leq 3!\,[(y_4-y_1)(y_4-y_2)(y_4-y_3)]^{-1}\int_{-\infty}^{-1}(-\xi)^{-3}\prod_{i=2}^{3}\left(\frac{y_4-y_i}{y_4-y_1}\right)^{-1/\xi}\frac{\Gamma(1-2/\xi)\,\Gamma(-1/\xi)}{\Gamma(1-3/\xi)}\,{\rm d}\xi
= 3!\,[(y_4-y_1)(y_4-y_2)(y_4-y_3)]^{-1}\int_{0}^{1}\prod_{i=2}^{3}\left(\frac{y_4-y_i}{y_4-y_1}\right)^{x}\frac{x\,\Gamma(1+2x)\,\Gamma(x)}{\Gamma(1+3x)}\,{\rm d}x
\]
\[
= 3!\,[(y_4-y_1)(y_4-y_2)(y_4-y_3)]^{-1}\int_{0}^{1}\prod_{i=2}^{3}\left(\frac{y_4-y_i}{y_4-y_1}\right)^{x}\frac{\Gamma(1+2x)\,\Gamma(1+x)}{\Gamma(1+3x)}\,{\rm d}x, \tag{33}
\]
where $x = -1/\xi$ and we have used the relation $\Gamma(1+x) = x\,\Gamma(x)$. The integrand in (33) is finite over the range of integration so this integral is finite and therefore $J_1$ is finite.

Proof that $J_2$ is finite. Using the substitution $u = (\phi-y_1)^{-1}$ in (13) gives
\[
J_2 = 3!\int_{-1}^{0}(-\xi)^{-3}\int_{0}^{1/\delta_4} u^{2}\left\{\prod_{i=2}^{4}(1-\delta_i u)^{-(1+1/\xi)}\right\}\left\{1+\sum_{i=2}^{4}(1-\delta_i u)^{-1/\xi}\right\}^{-4}{\rm d}u\,{\rm d}\xi.
\]
For $-1 \leq \xi < 0$ we have $-(1+1/\xi) \geq 0$. Noting that $0 < 1-\delta_i u < 1$ gives
\[
\prod_{i=2}^{4}(1-\delta_i u)^{-(1+1/\xi)} \leq (1-\delta_4 u)^{-(1+1/\xi)}.
\]
Noting also that $1+\sum_{i=2}^{4}(1-\delta_i u)^{-1/\xi} \geq 1$ we have
\[
J_2 \leq 3!\int_{-1}^{0}(-\xi)^{-3}\int_{0}^{1/\delta_4} u^{2}\,(1-\delta_4 u)^{-(1+1/\xi)}\,{\rm d}u\,{\rm d}\xi
= 3!\int_{-1}^{0}(-\xi)^{-3}\,\beta\int_{0}^{1/\delta_4} u^{2}\,\frac{1}{\beta}\left(1+\frac{\xi u}{\beta}\right)^{-(1+1/\xi)}{\rm d}u\,{\rm d}\xi
\]
\[
= 2\cdot 3!\,\delta_4^{-3}\int_{-1}^{0}(1-\xi)^{-1}(1-2\xi)^{-1}\,{\rm d}\xi
= 2\cdot 3!\,(y_4-y_1)^{-3}\ln(3/2),
\]
where $\beta = -\xi/\delta_4$ and the penultimate line follows from (8) on noting that the integrand is proportional to ${\rm E}(U^2)$, where $U \sim {\rm GP}(\beta, \xi)$.

Proof that $J_3$ is finite. Using the substitution $u = (y_1-\phi)^{-1}$ in (13) gives
\[
J_3 = 3!\int_{0}^{\infty}\xi^{-3}\int_{-\infty}^{y_1}\left\{\prod_{i=1}^{4}(y_i-\phi)^{-(1+1/\xi)}\right\}\left\{\sum_{i=1}^{4}(y_i-\phi)^{-1/\xi}\right\}^{-4}{\rm d}\phi\,{\rm d}\xi
= 3!\int_{0}^{\infty}\xi^{-3}\int_{0}^{\infty} u^{2}\left\{\prod_{i=2}^{4}(1+\delta_i u)^{-(1+1/\xi)}\right\}\left\{1+\sum_{i=2}^{4}(1+\delta_i u)^{-1/\xi}\right\}^{-4}{\rm d}u\,{\rm d}\xi.
\]
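The beta-integral evaluation used for the inner integral of $J_1$, namely $\int_0^{1/\delta_4} u^{-2/\xi}(1-\delta_4 u)^{-(1+1/\xi)}\,{\rm d}u = \delta_4^{2/\xi-1}\,\Gamma(1-2/\xi)\,\Gamma(-1/\xi)/\Gamma(1-3/\xi)$ for $\xi < -1$, can be checked numerically. The sketch below (illustrative only) uses the arbitrary values $\xi = -2$, $\delta_4 = 1$:

```python
import math

def inner_numeric(xi, delta4, steps=500_000):
    # Midpoint rule for integral_0^(1/delta4) u^(-2/xi) * (1 - delta4*u)^(-(1+1/xi)) du
    h = 1.0 / (delta4 * steps)
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += u ** (-2.0 / xi) * (1.0 - delta4 * u) ** (-(1.0 + 1.0 / xi))
    return total * h

def inner_closed(xi, delta4):
    # delta4^(2/xi - 1) * Gamma(1-2/xi) * Gamma(-1/xi) / Gamma(1-3/xi)
    return delta4 ** (2.0 / xi - 1.0) * math.exp(
        math.lgamma(1.0 - 2.0 / xi) + math.lgamma(-1.0 / xi) - math.lgamma(1.0 - 3.0 / xi))

num = inner_numeric(-2.0, 1.0)
exact = inner_closed(-2.0, 1.0)   # equals B(2, 1/2) = 4/3 for these values
print(num, exact)
```

The midpoint rule converges slowly here because the integrand is singular at the upper endpoint, so only moderate agreement is expected.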

Noting that for $\xi > 0$ we have $-(1+1/\xi) < 0$, using (12) with $a_k = \delta_k u$ gives
\[
\prod_{i=2}^{4}(1+\delta_i u)^{-(1+1/\xi)} \leq (1+g u)^{-3(1+1/\xi)},
\]
where $g = (\delta_2\delta_3\delta_4)^{1/3}$. Noting also that $1+\sum_{i=2}^{4}(1+\delta_i u)^{-1/\xi} \geq 1$ we have
\[
J_3 \leq 3!\int_{0}^{\infty}\xi^{-3}\int_{0}^{\infty} u^{2}\,(1+g u)^{-(3+3/\xi)}\,{\rm d}u\,{\rm d}\xi
= 3!\int_{0}^{\infty}\xi^{-3}\,\beta\int_{0}^{\infty} u^{2}\,\frac{1}{\beta}\left(1+\frac{\alpha u}{\beta}\right)^{-(1+1/\alpha)}{\rm d}u\,{\rm d}\xi,
\]
where $\alpha = \xi/(2\xi+3)$ and $\beta = \alpha/g$. Therefore,
\[
J_3 \leq 3!\int_{0}^{\infty}\xi^{-3}\,\frac{2\beta^3}{(1-\alpha)(1-2\alpha)}\,{\rm d}\xi
= \frac{4}{g^3}\int_{0}^{\infty}\frac{{\rm d}\xi}{(2\xi+3)(\xi+3)}
= \frac{2}{g^3}\int_{0}^{\infty}\frac{{\rm d}\xi}{(\xi+3/2)(\xi+3)}
= \frac{4}{3}\,g^{-3}\ln 2.
\]
The normalizing constant $K_4$ is finite, so $\pi_{U,{\rm GEV}}(\mu,\sigma,\xi)$ yields a proper posterior density for $n = 4$ and therefore does so for $n \geq 4$.
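The final two steps for $J_3$, the algebraic identity $3!\,\xi^{-3}\cdot 2\beta^3/\{(1-\alpha)(1-2\alpha)\} = 4/\{g^3(2\xi+3)(\xi+3)\}$ and the value $(2/3)\ln 2$ of $\int_0^\infty \{(\xi+3/2)(\xi+3)\}^{-1}{\rm d}\xi$, can be confirmed numerically. An illustrative sketch (not part of the original proof; $g = 1.7$ is arbitrary):

```python
import math

def lhs(xi, g):
    # 3! * xi^(-3) * 2*beta^3 / ((1-alpha)(1-2alpha)), alpha = xi/(2xi+3), beta = alpha/g
    alpha = xi / (2.0 * xi + 3.0)
    beta = alpha / g
    return 6.0 * xi ** (-3.0) * 2.0 * beta ** 3 / ((1.0 - alpha) * (1.0 - 2.0 * alpha))

def rhs(xi, g):
    # 4 / (g^3 * (2 xi + 3) * (xi + 3))
    return 4.0 / (g ** 3 * (2.0 * xi + 3.0) * (xi + 3.0))

algebra_ok = all(abs(lhs(0.1 + 0.5 * k, 1.7) - rhs(0.1 + 0.5 * k, 1.7)) < 1e-10
                 for k in range(40))

# integral_0^inf dxi / ((xi + 3/2)(xi + 3)), via the change of variable xi = t/(1-t)
n = 200_000
h = 1.0 / n
total = 0.0
for k in range(n):
    t = (k + 0.5) * h
    xi = t / (1.0 - t)
    total += 1.0 / ((xi + 1.5) * (xi + 3.0)) / (1.0 - t) ** 2
total *= h
print(algebra_ok, total, (2.0 / 3.0) * math.log(2.0))
```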