The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
John R. Michael, Significance, Inc., and William R. Schucany, Southern Methodist University

The mixture approach is an exact methodology for simulating new families of bivariate distributions with specified correlation coefficients. It accommodates the entire range of correlations, produces bivariate surfaces that are intuitively appealing, and is often remarkably easy to implement. The approach is introduced in a Bayesian context and demonstrated for the conjugate families of beta and gamma distributions, with special attention given to the bivariate uniform. For these distributions, formulas for correlations have simple closed forms and computations are easy.

KEY WORDS: Bayes, beta, conjugate prior, correlation, exchangeable, gamma, generating, hierarchical models, Markov Chain Monte Carlo, posterior, uniform.

1. Introduction

The generation of multivariate distributions has widespread applications in research and practice. An important application is the computer evaluation of new statistical methods for analyzing multivariate data. Many, if not most, multivariate methods assume a specific form for the underlying distribution of the observed data. Through simulation studies, the researcher can investigate properties of methods, estimators, or test statistics for a variety of multivariate distributions that may be encountered in practice.

A popular multivariate distribution is the normal. It is easy to simulate, often permits closed-form theoretical results, and is perhaps the most widely known statistical distribution. The popularity of the normal might be due in part to its familiarity and convenience rather than to its appropriateness. Statistical distributions encountered in practice, particularly those underlying observational data, are often non-normal, e.g., skewed or relatively heavy-tailed.
The development of convenient simulation techniques for alternative distributions can help increase their familiarity and use, both in research and practice. For non-normal data, some general approaches have been developed to
simulate multivariate distributions, such as the conditional distribution approach, the transformation approach, and the rejection approach. Unfortunately, these approaches often suffer from computational difficulties, are restrictive in form, or model only weak dependence (Johnson, 1987).

This paper presents a new exact methodology for simulating multivariate distributions, which we term the mixture approach. It uses concepts that are found in Bayesian analysis, which inspired its development. Accordingly, to help fix ideas, we loosely refer to certain distributions in this paper as the prior, likelihood, joint, marginal, and posterior, where the likelihood is the conditional sampling distribution of the observed data. The essence of the mixture approach lies in a deceptively simple concept: for a specified prior, observations simulated from appropriate posteriors have the same marginal as the prior. The correlation between the two marginals depends on the intermediate likelihood for some disposable data that effectively "mixes" posteriors. Specifying the parameters of the likelihood precisely controls the correlation coefficient. In the language of Markov Chain Monte Carlo (MCMC) methods (see Tierney, 1994), we capitalize on the structure of sequences that are in equilibrium. In effect, we exploit Gibbs samplers that have actually converged (see Casella and George, 1992).

Section 2 presents the theory of the mixture approach. Section 3 summarizes the simulation steps using the mixture approach. Sections 4 and 5 apply the approach to create new bivariate beta and gamma families, respectively, both of which are remarkably easy to generate. Finally, Section 6 offers some concluding remarks and describes extensions that are underway. The appendices contain derivations of formulas for correlation coefficients and the density function for a specific bivariate uniform distribution.

2. Theoretical Basis of the Mixture Approach

Let the random variable X1 have a prior represented by the pdf

g(x1; θ),   (1)

where the parameter θ may be multidimensional. Next, conditioning on X1 = x1, let the random variable K for the data have the likelihood represented by the pmf (or pdf)

h(k | x1; η),   (2)

where the parameter η may also be multidimensional. Multiplying (1) and (2) yields the joint distribution of X1 and K,

j(x1, k; θ, η) = g(x1; θ) h(k | x1; η).   (3)

Integrating out x1 yields the marginal of K,

m(k; θ, η) = ∫ j(z, k; θ, η) dz.   (4)

Dividing (3) by (4) yields the posterior of X2 given K = k,

p(x2 | k; θ, η) = j(x2, k; θ, η) / m(k; θ, η).   (5)

Formulas (1) through (4) parallel the conventional approach in Bayesian inference; however, the introduction of the new random variable X2 in (5) is a fundamental departure. For continuous priors, the probability is zero that an observation, x2, simulated from the posterior will exactly equal the corresponding value, x1, simulated from the prior. Nevertheless, the marginal of X2 is identical to the prior of X1. This is because the simulation process makes no use of the data as a conditioning factor, thereby "disposing" of the intermediate information about K. The resulting distribution is a weighted average (mixture) of different posteriors that exactly reproduces the prior.

We now formally demonstrate the equality of the marginal distributions of X1 and X2. Multiplying (3) and (5) yields the trivariate distribution of X1, X2, and K,

f(x2, x1, k; θ, η) = j(x1, k; θ, η) j(x2, k; θ, η) / m(k; θ, η).   (6)

The symmetry of (6) with respect to x1 and x2 constitutes a proof that X1 and X2 have the same marginal distribution, g(x2; θ).

3. Mixture Simulation Steps

A bivariate pair (x1, x2) is generated by sequentially simulating observations from the prior, likelihood, and posterior as follows:

1. Generate an observation, x1, from the prior;
2. Generate an observation, k, from the likelihood, which is conditioned on x1;
3. Generate an observation, x2, from the posterior.

Table 1 lists the steps in the mixture simulation approach alongside major steps in conventional Bayesian inference.
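The three numbered steps above translate almost line-for-line into code. The following is a minimal Python sketch of our own (the function and argument names are illustrative, not from the paper); the caller supplies samplers for the prior, the likelihood given x1, and the posterior given k:

```python
import numpy as np

def mixture_pair(prior_rvs, lik_rvs, post_rvs, rng):
    """Simulate one (x1, x2) pair by the mixture approach:
    prior -> likelihood -> posterior, then discard k."""
    x1 = prior_rvs(rng)       # step 1: draw from the prior
    k = lik_rvs(x1, rng)      # step 2: draw disposable data given x1
    x2 = post_rvs(k, rng)     # step 3: draw from the posterior given k
    return x1, x2

# Example: beta(3, 3) prior, binomial(2, x1) likelihood,
# beta(3 + k, 3 + 2 - k) conjugate posterior.
rng = np.random.default_rng(0)
pairs = [mixture_pair(lambda r: r.beta(3, 3),
                      lambda x1, r: r.binomial(2, x1),
                      lambda k, r: r.beta(3 + k, 3 + 2 - k),
                      rng)
         for _ in range(5)]
```

Any conjugate prior-likelihood pair can be plugged in; the beta and gamma families of Sections 4 and 5 are instances of this scheme.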
Table 1. Steps in Mixture Simulation and Conventional Bayesian Inference

Step  Mixture Simulation                                  Bayesian Inference
1     Specify the prior.                                  Specify the prior.
2     Simulate prior values, x1.
3     Formulate likelihoods conditioned upon values x1.   Formulate the likelihood.
4     Simulate the data, k, from conditional likelihoods. Collect the data, k.
5     Derive conditional posteriors.                      Derive the posterior.
6     Simulate values, x2, from conditional posteriors.
7     Form bivariate pairs (x1, x2).                      Draw inferences about x.

4. Simulating a New Bivariate Beta Family

The beta(α, β) is the familiar conjugate prior for estimating a binomial parameter x1 in a Bayesian context. Applying the mixture approach, the associated prior, likelihood, posterior, and trivariate distributions are

g(x1; α, β) ∝ x1^(α−1) (1−x1)^(β−1),
h(k | x1; ν) ∝ C(ν, k) x1^k (1−x1)^(ν−k),
p(x2 | k; α, β, ν) ∝ x2^(α+k−1) (1−x2)^(β+ν−k−1), and
f(x1, x2, k; α, β, ν) ∝ (x1 x2)^(α+k−1) [(1−x1)(1−x2)]^(β+ν−k−1),

where θ = (α, β) in (1), η = ν in (2), and C(ν, k) is the binomial coefficient. (Some proportionality constants have been omitted in the above formulas for the sake of simplicity.) The formula for the correlation coefficient, ρ_beta, between X1 and X2 reduces to the simple intuitive expression

ρ_beta = ν / (ν + α + β).   (7)
(See Appendix 1 for technical details.) Assuming the values of α and β are prespecified and fixed, the value of the correlation coefficient is controlled by specifying the value of ν, which represents the amount of information to be realized from the likelihood. Solving for ν, the required "effective" sample size for a specified correlation coefficient is

ν = (α + β) ρ_beta / (1 − ρ_beta).

(For the binomial case, if fractional sample sizes are needed, one can determine another member of the generalized power series family of distributions, which permits fractional values of ν and can be used to expand the new bivariate beta family presented here. See Patil, Sharadchandra, & Rao, 1968, for details.)

Inspection of (7) reveals that the entire range of correlations is accommodated by the mixture simulation approach. In particular, ρ_beta is an increasing function of the effective sample size ν, approaching 1 asymptotically, and ρ_beta is a decreasing function of (α + β), approaching 0 asymptotically. Of course, negative correlations are obtained by using 1 − x2 in place of x2 (which preserves the marginal when α = β). These properties parallel those in a Bayesian context: the dependence of the Bayes estimator on the data increases with the sample size from the likelihood, and decreases with the specificity of the prior.

4.1 Graphical Illustration of Mixture Simulation

The following simple illustration is intended to elucidate the concepts upon which mixture simulation is based. We simulated five independent bivariate pairs of observations from the bivariate beta with α = β = 3 and ρ_beta = 0.25, which by (7) implies that ν = 2. Table 2 summarizes the relevant steps, which can be easily implemented, say, using the functions rbeta() and rbinom() in S-Plus 2000 (MathSoft, Inc., Seattle). It is instructive to follow the steps in the simulation of a single observation.
The first x1 value was 0.85, which produced k = 2, which in turn led to the first x2 value, drawn from the beta(5, 3) posterior. By disposing of our knowledge of which x2 value was simulated from which posterior, we can now regard the sample of x2 values as unconditioned and governed only by the prior. Thus the sample of x2 values constitutes a random sample from a distribution that is precisely beta(3, 3). The sample in Figure 1 exhibits the anticipated positive correlation. For illustrative purposes, we purposefully chose this sample from a number of simulated samples because of its regularity. Many of the other samples exhibited
substantially greater variability in some aspects. For example, in one simulation all five values of k were equal to 1.

Table 2. Steps Used to Simulate Five Bivariate Pairs

Step 1: Specified a beta(3, 3) prior, graphed in Figure 1a.
Step 2: Simulated five independent values of X1, marked in Figure 1a.
Step 3: Simulated five values of K from conditional binomials (2, x1), marked in Figure 1b.
Step 4: Formed three distinct posteriors, overlaid in Figure 1c, and simulated five values of X2, marked in Figure 1c. The conditional posteriors beta(α + k, β + ν − k) for the five pairs were beta(5, 3), beta(4, 4), beta(3, 5), beta(5, 3), and beta(4, 4), respectively.
Step 5: Formed (x1, x2) pairs, plotted in Figure 1d.
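The recipe in Table 2 is easy to implement with modern samplers (the paper's illustration used rbeta and rbinom in S-Plus). Below is a numpy sketch of our own, with ν chosen from the target correlation by inverting formula (7):

```python
import numpy as np

def bivariate_beta(alpha, beta, rho, n, seed=0):
    """Simulate n mixture-approach pairs with corr(X1, X2) = rho,
    where rho = nu / (nu + alpha + beta)."""
    # Effective sample size from formula (7), rounded to an integer
    # (fractional nu would need a generalized power-series likelihood).
    nu = int(round((alpha + beta) * rho / (1.0 - rho)))
    rng = np.random.default_rng(seed)
    x1 = rng.beta(alpha, beta, size=n)       # step 1: prior draws
    k = rng.binomial(nu, x1)                 # step 2: disposable binomial data
    x2 = rng.beta(alpha + k, beta + nu - k)  # step 3: conjugate posterior draws
    return x1, x2

# The Table 2 setting: beta(3, 3) prior with nu = 2, so rho = 2/8 = 0.25.
x1, x2 = bivariate_beta(3, 3, rho=0.25, n=200_000)
print(np.corrcoef(x1, x2)[0, 1])   # close to 0.25
```

The sample correlation converges to the target, and each margin remains exactly beta(α, β).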
Figure 1. Illustrative Simulation of a New Bivariate Beta(3, 3): (a) beta(3, 3) prior density g(x1); (b) conditional binomial data k; (c) three beta posteriors p(x2 | k); (d) scatterplot of (x1, x2) for the bivariate beta.
Howard (1998) argues that it may be appropriate to use dependent priors for a Bayesian analysis of two binomials. One of his main recommendations is that Bayesians give careful consideration to dependent priors. Our bivariate beta provides a very satisfactory alternative to the augmented versions of independent betas presented by Howard (1998). Furthermore, the mixture approach has the advantage of preserving the marginals.

The Dirichlet distribution is a special three-parameter bivariate beta family that has been previously investigated. Loukas (1984) examines several alternative approaches to generating bivariate beta variates and recommends combining independent gammas for reasons of efficiency. This family is not flexible enough to permit a range of correlations. In this regard, see Gupta and Wong (1985) for an extension of the Morgenstern system.

4.2 Importance of the Bivariate Uniform

When α = β = 1, the beta reverts to the familiar uniform on the unit interval. Efficient simulation of correlated uniform observations is particularly useful because bivariate distributions with other specified marginals can then be obtained by applying inverse probability integral transformations. In particular, suppose we wish to simulate (Y, Z), where Y and Z have marginal cdf's F and G and continuous quantile functions (inverse marginal cdf's) V(u) = F^(-1)(u) and W(u) = G^(-1)(u), respectively. It follows that (Y, Z) = [V(X1), W(X2)] has the desired bivariate distribution.
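For example (an illustration of our own, not from the paper): taking α = β = 1 with ν = 2 gives correlated uniforms with ρ_beta = 0.5, and applying the logistic quantile function to each coordinate yields a correlated pair with logistic marginals:

```python
import numpy as np

rng = np.random.default_rng(1)
n, nu = 100_000, 2                 # nu = 2 with alpha = beta = 1 gives rho = 0.5
u1 = rng.uniform(size=n)           # prior draws: uniform(0, 1)
k = rng.binomial(nu, u1)           # disposable binomial data
u2 = rng.beta(1 + k, 1 + nu - k)   # posterior draws; marginally uniform(0, 1)

mu, sigma = 0.0, 1.0                         # illustrative logistic parameters
y = mu + sigma * np.log(u1 / (1 - u1))       # quantile transform of each margin
z = mu + sigma * np.log(u2 / (1 - u2))
# (y, z) has logistic marginals; its correlation is no longer exactly 0.5
# and must be derived case by case for the chosen transformation.
print(np.corrcoef(y, z)[0, 1])
```

The uniform pair retains the specified correlation of 0.5, while the transformed pair's correlation depends on the quantile functions applied.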
Figure 2. Scatterplots of 100 Observations Simulated from the Bivariate Beta(3, 3) with Correlations 0.5, 0.7, 0.9, and 0.95.
For some distributions, the appropriate quantile function can be applied with relative ease. For example, if Y = V(X1) = µ + σ log[X1 / (1 − X1)], then Y is logistic with location and scale parameters µ and σ, respectively. Although the correlation between simulated bivariate uniforms can be precisely controlled with the mixture approach, the correlation between the transformed bivariate uniforms, Y and Z, depends on the functional form of the transformation and must be derived on a case-by-case basis. A Taylor series expansion of the transformation function might provide a satisfactory approximation.

Conversely, any continuous bivariate distribution can be transformed to bivariate uniformity by applying the appropriate probability integral transformation to each member of the pair. In particular, (U1, U2) = [F(Y), G(Z)] is bivariate uniform. This distribution is generally not a member of the new bivariate family of uniforms generated by the mixture approach; however, a member of the latter family might provide a satisfactory approximation to the distribution of (U1, U2). Assuming so, it is then possible to simulate the bivariate pair (Y, Z) by applying the appropriate quantile functions, as described in the previous paragraph. In practice, when the data are observational, the functional forms of the true cdf's are typically unknown and can be unique to the process that generates the data. In such cases, smoothed empirical cdf's and corresponding quantile functions might be adequate for transforming the observed data to and from approximate uniformity, thereby permitting simulation of quite specialized joint distributions.

4.3 Graphical Illustration of the Bivariate Uniform

Figure 3 is a contour plot of a bivariate uniform generated by the mixture method.
The special case where ν = 2 and ρ_beta = 0.5 is plotted because of the relative simplicity of the joint distribution of X1 and X2, which has pdf

b(x1, x2) = 3[(1 − x1)^2 (1 − x2)^2 + 4 x1 x2 (1 − x1)(1 − x2) + x1^2 x2^2].

(See Appendix 2 for technical details.)
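This pdf is the binomial(2, x1) mixture of the three beta(1 + k, 3 − k) posterior densities, which makes it easy to verify numerically. A sanity-check sketch of our own (midpoint-rule integration on the unit square), confirming unit total mass and uniform marginals:

```python
import numpy as np

def b(x1, x2):
    """nu = 2 bivariate uniform density: the binomial(2, x1) mixture of
    the beta(1 + k, 3 - k) posterior densities of x2, for k = 0, 1, 2."""
    return ((1 - x1) ** 2 * 3 * (1 - x2) ** 2         # k = 0: beta(1, 3)
            + 2 * x1 * (1 - x1) * 6 * x2 * (1 - x2)   # k = 1: beta(2, 2)
            + x1 ** 2 * 3 * x2 ** 2)                  # k = 2: beta(3, 1)

# Midpoint grid on (0, 1)^2: grid averages approximate integrals.
t = (np.arange(1000) + 0.5) / 1000
X1, X2 = np.meshgrid(t, t)
vals = b(X1, X2)
total = vals.mean()        # approximates the double integral; should be 1
marg = vals.mean(axis=0)   # approximates the marginal density of x1
print(round(float(total), 4))   # -> 1.0
```

The marginal values come out flat at 1, confirming that each margin is exactly uniform(0, 1).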
The contour plot in Figure 3 bears some resemblance to the corresponding plot for the Farlie-Gumbel-Morgenstern family of bivariate distributions (see Johnson, 1987, page 181). However, the latter family is subject to the restriction that |ρ_beta| ≤ 1/3, whereas the mixture approach accommodates the entire range of correlation coefficients. Falk (1999) describes a distinctly different approach using probability integral transformations of multivariate normals. Although Falk's approach allows a prescribed correlation structure, there are some constraints for a positive semi-definite result.

Figure 3. Contour Plot of the Bivariate Uniform Distribution with ρ_beta = 0.5.
5. Simulating a New Bivariate Gamma Family

The gamma(r, λ) is the familiar conjugate prior for estimating a Poisson parameter x1 in a Bayesian context. The presentation here parallels that for the beta in the previous section. Applying the mixture approach, the associated prior, likelihood, posterior, and trivariate distributions are

g(x1; r, λ) ∝ x1^(r−1) e^(−λ x1),
h(k | x1; τ) ∝ x1^k e^(−τ x1) / k!,
p(x2 | k; λ, τ) ∝ x2^(r+k−1) e^(−(λ+τ) x2), and
f(x2, k, x1; λ, τ) ∝ (x1 x2)^(r+k−1) e^(−(λ+τ)(x2 + x1)),

where θ = (r, λ) and η = τ. Once again, the formula for the correlation coefficient between X1 and X2 reduces to a simple intuitive expression,

ρ_gamma = τ / (τ + λ).   (8)

(See Appendix 3 for technical details.) Assuming the value of λ is pre-specified and fixed, the value of the correlation coefficient is controlled by specifying the value of τ, which represents the amount of information realized from the likelihood. Solving for τ, the required Poisson parameter for a specified correlation coefficient is

τ = λ ρ_gamma / (1 − ρ_gamma).

6. Concluding Remarks

The mixture approach is a method of simulating new families of bivariate distributions. The approach has the advantage of accommodating the entire range of correlation coefficients, which can be precisely controlled. And for some distributions, such as the beta and gamma, the computations are very easy. Application of the mixture approach with a conjugate normal prior for a normal mean yields the familiar bivariate normal distribution. Although this approach is new, it is not very efficient compared to the standard simulation approach (see Johnson, 1987, pages 52-54), because it requires the generation of three independent normals rather than two.
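The bivariate gamma of Section 5 is equally direct to generate. A numpy sketch of our own (note that numpy's gamma sampler is parameterized by shape and scale = 1/rate), solving (8) for τ to hit a target correlation:

```python
import numpy as np

def bivariate_gamma(r, lam, rho, n, seed=2):
    """Mixture-approach bivariate gamma with corr(X1, X2) = rho = tau/(tau + lam)."""
    tau = lam * rho / (1.0 - rho)   # solve (8) for the Poisson exposure tau
    rng = np.random.default_rng(seed)
    x1 = rng.gamma(shape=r, scale=1.0 / lam, size=n)      # gamma(r, lam) prior
    k = rng.poisson(tau * x1)                             # disposable Poisson counts
    x2 = rng.gamma(shape=r + k, scale=1.0 / (lam + tau))  # conjugate posterior draws
    return x1, x2

x1, x2 = bivariate_gamma(r=2.0, lam=1.0, rho=0.6, n=200_000)
print(np.corrcoef(x1, x2)[0, 1])   # close to 0.6
```

Both margins are exactly gamma(r, λ), and the sample correlation converges to τ/(τ + λ).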
Manuscripts are in preparation that extend the mixture approach with specified correlations in two directions: to the multivariate case with three or more random variables; and to the bivariate beta with different (arbitrary) marginal distributions. The latter extension is particularly relevant to problems encountered in practice where marginal distributions have different shapes.

References

Casella, G., & George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46.
Falk, M. (1999). A simple approach to the generation of uniformly distributed random variables with prescribed correlations. Communications in Statistics - Simulation and Computation, 28.
Gupta, A. K., & Wong, C. F. (1985). On three and five parameter bivariate beta distributions. Metrika, 32.
Howard, J. V. (1998). The 2x2 table: A discussion from a Bayesian viewpoint. Statistical Science, 13.
Johnson, M. E. (1987). Multivariate Statistical Simulation. New York: John Wiley & Sons.
Loukas, S. (1984). Simple methods for computer generation of bivariate beta random variables. Journal of Statistical Computation and Simulation, 20.
Patil, G. P., Sharadchandra, W. J., & Rao, C. R. (1968). A Dictionary and Bibliography of Discrete Distributions. New York: Hafner.
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). The Annals of Statistics, 22.
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.0 Discrete distributions in statistical analysis Discrete models play an extremely important role in probability theory and statistics for modeling count data. The use of discrete
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationA Bayesian perspective on GMM and IV
A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all
More informationThe Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More information36-463/663: Hierarchical Linear Models
36-463/663: Hierarchical Linear Models Taste of MCMC / Bayes for 3 or more levels Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 Outline Practical Bayes Mastery Learning Example A brief taste of JAGS
More informationApproximate Bayesian computation for spatial extremes via open-faced sandwich adjustment
Approximate Bayesian computation for spatial extremes via open-faced sandwich adjustment Ben Shaby SAMSI August 3, 2010 Ben Shaby (SAMSI) OFS adjustment August 3, 2010 1 / 29 Outline 1 Introduction 2 Spatial
More informationHastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model
UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced
More informationFAV i R This paper is produced mechanically as part of FAViR. See for more information.
Bayesian Claim Severity Part 2 Mixed Exponentials with Trend, Censoring, and Truncation By Benedict Escoto FAV i R This paper is produced mechanically as part of FAViR. See http://www.favir.net for more
More informationEstimation of Quantiles
9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles
More informationMultivariate Normal & Wishart
Multivariate Normal & Wishart Hoff Chapter 7 October 21, 2010 Reading Comprehesion Example Twenty-two children are given a reading comprehsion test before and after receiving a particular instruction method.
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationLecture 16: Mixtures of Generalized Linear Models
Lecture 16: Mixtures of Generalized Linear Models October 26, 2006 Setting Outline Often, a single GLM may be insufficiently flexible to characterize the data Setting Often, a single GLM may be insufficiently
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More informationBayesian nonparametric models for bipartite graphs
Bayesian nonparametric models for bipartite graphs François Caron Department of Statistics, Oxford Statistics Colloquium, Harvard University November 11, 2013 F. Caron 1 / 27 Bipartite networks Readers/Customers
More informationBayesian Graphical Models
Graphical Models and Inference, Lecture 16, Michaelmas Term 2009 December 4, 2009 Parameter θ, data X = x, likelihood L(θ x) p(x θ). Express knowledge about θ through prior distribution π on θ. Inference
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationMonte Carlo in Bayesian Statistics
Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the
More informationRobust estimation of skew-normal distribution with location and scale parameters via log-regularly varying functions
Robust estimation of skew-normal distribution with location and scale parameters via log-regularly varying functions Shintaro Hashimoto Department of Mathematics, Hiroshima University October 5, 207 Abstract
More informationBayesian Nonparametrics
Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent
More informationBayesian Nonparametric Models
Bayesian Nonparametric Models David M. Blei Columbia University December 15, 2015 Introduction We have been looking at models that posit latent structure in high dimensional data. We use the posterior
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationLikelihood-free MCMC
Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte
More informationMarkov Switching Regular Vine Copulas
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS057) p.5304 Markov Switching Regular Vine Copulas Stöber, Jakob and Czado, Claudia Lehrstuhl für Mathematische Statistik,
More informationHierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31
Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationExponential Families
Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant
More informationEstimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith
Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith Melbourne Business School, University of Melbourne (Joint with Mohamad Khaled, University of Queensland)
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationHow to select a good vine
Universitetet i Oslo ingrihaf@math.uio.no International FocuStat Workshop on Focused Information Criteria and Related Themes, May 9-11, 2016 Copulae Regular vines Model selection and reduction Limitations
More informationPROBABILITY DISTRIBUTIONS. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
PROBABILITY DISTRIBUTIONS Credits 2 These slides were sourced and/or modified from: Christopher Bishop, Microsoft UK Parametric Distributions 3 Basic building blocks: Need to determine given Representation:
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationGentle Introduction to Infinite Gaussian Mixture Modeling
Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for
More informationDoing Bayesian Integrals
ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationHANDBOOK OF APPLICABLE MATHEMATICS
HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume II: Probability Emlyn Lloyd University oflancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester - New York - Brisbane
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown
More information