Bayes factor testing of equality and order constraints on measures of association in social research


arXiv: v1 [stat.ME] 16 Jul 2018

Joris Mulder & John P.T.M. Gelissen

July 17, 2018

Abstract

Measures of association play a central role in the social sciences to quantify the degree of association between the variables of interest. In many applications researchers can translate scientific expectations into hypotheses with equality and/or order constraints on these measures of association. In this paper a Bayes factor test is proposed for testing multiple hypotheses with constraints on the measures of association between ordinal and/or continuous variables, possibly after correcting for certain covariates. This test can be used to obtain a direct answer to the research question of how much evidence there is in the data for a social science theory relative to competing theories. The accompanying software package BCT allows users to apply the methodology in an easy manner. An empirical application from leisure studies about the associations between life, leisure and relationship satisfaction is used to illustrate the methodology.

1 Introduction

In the social sciences, statistical measures that quantify the degree of association between variables of interest are a fundamental tool for making causal inferences from correlational data when the design or the logic of the study makes it plausible that a variable X is an antecedent to a variable Y. The best-known example of such a measure is the Pearson product-moment correlation coefficient r, which expresses the strength of the linear relationship between two continuous variables. If the variables are measured on an ordinal scale, ordinal measures of association such as Spearman's rho are needed to quantify the strength of the linear relationship between the variables. In many analyses, the researcher is not only interested in a zero-order association between the variables of interest, but also wants to rule out that any association found is the result of the two focal variables having a common cause, which would make the zero-order correlation spurious. A classical tool for this purpose is the partial correlation coefficient, which can be used to measure the linear association between two variables while controlling for other variables.

Many researchers stop after the second step, in which other variables are controlled or partialled out, and merely conclude that the evidence is in favor of or against a spurious association. But such a basic correlational analysis also has the potential for testing more complex and interesting substantive sociological hypotheses. It may be of interest to find out which variables in such a correlational analysis are most important, with some more or less informed expectation about a meaningful ordering in the sizes of the controlled associations, and about which other controlled correlations are equal to each other.

In this paper a general framework is presented for testing multiple hypotheses with equality and order constraints on measures of association. The current approach builds upon the earlier work of Mulder (2016), who presented a test for order constraints on bivariate correlations between continuous variables. The current paper proposes several important extensions.

First, the proposed method can also be used for testing hypotheses with equality constraints, hypotheses with order constraints, as well as hypotheses with combinations of equality and order constraints on the correlations. The importance of this extension is evident given the importance of the (equality constrained) null hypothesis in scientific research, e.g., an association equals zero, or the associations between all variables are exactly equal.

Second, the methodology can be used for testing the association between continuous variables, ordinal variables, and combinations of ordinal and continuous variables. This is particularly relevant in sociological research where variables are commonly measured on an ordinal scale. Table 1 shows an overview of the types of association measures that can be tested using the proposed methodology.

Third, the methodology can be used to test constraints on partial correlations by correcting for external covariates. Controlling for external (confounding) covariates is very important to avoid spurious relationships between the variables of interest.

Table 1: Types of measures of association depending on the measurement scale of the variables.

| Scale of Y1 | Y2: Dichotomous | Y2: Polychotomous-ordinal categories | Y2: Continuous-interval |
|---|---|---|---|
| Dichotomous | Tetrachoric | Polychoric | Biserial |
| Polychotomous-ordinal categories | | Polychoric | Polyserial |
| Continuous-interval | | | Product-moment |

When testing statistical hypotheses on measures of association, different testing criteria can be used. The Fisherian p-value is perhaps most commonly used for testing a single correlation. In this paper, however, we shall not develop p-values for testing multiple hypotheses with equality and/or order constraints on the measures of association between the variables of interest. There are two important reasons for this. First, p-values cannot be used to directly test nonnested hypotheses with order constraints, such as $H_1: \rho_{21} < \rho_{31} < \rho_{32}$ versus $H_2: \rho_{21} > \rho_{31} > \rho_{32}$; instead p-values can only be used for testing a precise null hypothesis with only equality constraints against an ordered alternative or for testing an ordered null hypothesis against the unconstrained alternative (Silvapulle & Sen, 2004). Second, p-value tests are not suitable for testing multiple hypotheses simultaneously; instead the p-value test is specifically designed to test one specific null hypothesis against one alternative hypothesis. In this paper, however, we do not limit ourselves to this specific scenario because researchers can often formulate multiple scientific expectations on measures of association, as will be exemplified in this paper.

Another important class of criteria for testing statistical hypotheses is the class of information criteria. Well-known examples are the AIC (Akaike, 1973), the BIC (Schwarz, 1978; Raftery, 1995; Wagenmakers, 2007), and the DIC (Spiegelhalter et al., 2002). These information criteria explicitly balance fit and complexity when quantifying which model or hypothesis is best for the data at hand. Although information criteria can straightforwardly be used for testing multiple hypotheses or models simultaneously (unlike classical p-values), these criteria are not suitable for testing order constrained hypotheses. The reason is that these information criteria incorporate the complexity of a model based on the number of free parameters, which is ill-defined when order constraints are present. As an example, it is unclear how many free parameters are present under the order-constrained hypothesis $H_1: \rho_{21} < \rho_{31} < \rho_{32}$. Therefore information criteria will not be considered further in this paper.

The criterion that will be used is the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995). A useful property of the Bayes factor is its straightforward interpretation as the relative evidence in the data between two hypotheses. Thus, the Bayes factor can also be used to quantify evidence for a null hypothesis (e.g., Rouder et al., 2009), which is not the case for the p-value; the p-value can only be used to falsify the null. Another important property of the Bayes factor is that it can straightforwardly be used for testing multiple hypotheses simultaneously (e.g., Berger & Mortera, 1999). In addition, Bayes factors can be transformed to so-called posterior probabilities of the hypotheses. These posterior probabilities provide a direct answer to the research question of how plausible (in a Bayesian sense) each hypothesis is based on the observed data. Bayes factors are also particularly suitable for testing hypotheses with order constraints. This has been shown by Klugkist et al. (2005) for group means in ANOVA designs, Mulder et al. (2009) for repeated measurement means, Mulder et al. (2010) for multivariate regression models, Klugkist et al. (2010) for contingency tables, Mulder & Fox (2013) for intraclass correlations, and Gu et al. (2014) for structural equation models. Finally, it is important to note that under very general conditions, Bayes factors and posterior probabilities are consistent, which implies that the evidence for the true hypothesis goes to infinity as the sample size grows to infinity.

In order to compute the Bayes factor for testing a set of hypotheses with constraints on the correlations, two challenges need to be overcome. The first challenge is the specification of the prior of the parameters under each hypothesis. The prior reflects which values of the correlations are most likely a priori. The choice of the prior is particularly important when testing hypotheses with equality constraints on the model parameters. For example, arbitrarily vague priors should not be used when computing Bayes factors because the evidence for the null then becomes arbitrarily large. This is also known as Bartlett's phenomenon (Lindley, 1957; Bartlett, 1957; Jeffreys, 1961). In this paper we propose a prior specification method for the correlations under the hypotheses based on uniform distributions. This prior assumes that every combination of the correlations under each hypothesis is equally likely, which seems a reasonable default setting. Also note that Jeffreys (1961) originally proposed a default Bayes factor for testing a single bivariate correlation using a uniform prior (Ly et al., 2016). The proposed methodology can be seen as a generalization of Jeffreys' original approach.

The second challenge is the computation of the marginal likelihood, a key ingredient of the Bayes factor. To compute the marginal likelihood of a hypothesis we need to compute the integral of the product of the likelihood and prior over the parameter space of the free parameters. In complex settings the computation of this integral can take a lot of time, which limits general utilization of the methodology for applied users. To tackle this problem we first provide a general expression of a Bayes factor for a hypothesis with equality and order constraints on the parameters of interest versus an unconstrained alternative. This result can be used for any testing scenario where the unconstrained posterior and prior have an (approximate) normal distribution under the unconstrained model. The result is used for the current problem of testing equality and order constraints on measures of association. To ensure that the unconstrained posterior is approximately normal, a Fisher transformation is applied to the measures of association.

The algorithm for computing Bayes factors and the posterior probabilities for the hypotheses based on the new methodology is implemented in a Fortran software program called BCT (Bayesian Correlation Testing). The program allows users to test a general class of equality and order constrained hypotheses on measures of association which are commonly observed in social research. A user manual is included.

This paper is structured as follows. In the next section we illustrate some equality and order constrained hypotheses that we develop for associations between life and domain satisfaction as addressed in Quality of Life research. Then we explain the methodology and numerical computational details of the method, and report evidence of the method's performance using a small simulation study. We then return to our empirical example and discuss the results of testing the equality and order constrained hypotheses we developed. Finally, we conclude by summarizing the properties and advantages of applying the method to social science research problems.

2 Exemplary hypotheses: Associations between life, leisure and relationship satisfaction

An important research question in Quality of Life research concerns how a person's satisfaction in certain life domains (e.g. about leisure, relationship, or work) relates to overall life satisfaction (Felce & Perry, 1995; Newman et al., 2014). A prevailing hypothesis is that women are much more relational than men (Hook et al., 2003). As these authors point out, women like being connected (i.e., experiencing "we-ness"), doing things together with others, and they place great emphasis on talking and emotional sharing. Men, on the other hand, see togetherness more as an activity than a state of being, as it is for women. They favor interactions that involve doing rather than being. Unlike women, men prefer to have an element of separation included in their relationships with others and the doing orientation seems to promote this (Hook et al., 2003, p. 465). Research has furthermore shown that men have on average more leisure time than women (Mattingly & Blanchi, 2003).

Based on these general theoretical ideas and observations concerning gender differences in relational issues and leisure, we anticipate that such systematic differences will also be observed in how satisfaction with one's relationship and leisure is associated with overall life satisfaction, with leisure satisfaction relating more strongly to overall life satisfaction among men and relationship satisfaction relating more strongly to overall life satisfaction among women.

Consider the following graphical model about the partial associations between three focal dependent variables that might be included in such an analysis: the degree of life satisfaction, leisure satisfaction, and relationship satisfaction (Figure 1).

[Figure 1: Graphical model describing partial associations between life and domain satisfaction variables. Nodes: Life Satisfaction (y1), Leisure Satisfaction (y2), Relationship Satisfaction (y3), Self-reported Health (x1), Mood at Survey (x2).]

In this example we are interested in testing various informative hypotheses about conditional partial associations between the variables concerned, with gender also potentially moderating such partial associations. As can be seen, in this model the associations between the key variables of interest are controlled for differences in self-reported health and mood at survey. We use data from the Dutch LISS panel (Longitudinal Internet Studies for the Social sciences) to test various informative hypotheses about this model. In particular, the LISS panel contains the following variables used to operationalize the variables in the conceptual model:

1. Life Satisfaction ($y_1$): respondents were asked to rate the following statement: "I am satisfied with my life"; ordered categorical variable, with 1 = strongly disagree to 7 = strongly agree.
2. Leisure Satisfaction ($y_2$): respondents were asked to indicate how satisfied they are with the way in which they spend their leisure time; continuous variable, with 1 = entirely dissatisfied to 11 = entirely satisfied.
3. Relationship Satisfaction ($y_3$): respondents were asked to indicate how satisfied they are with their current relationship; continuous variable, with 1 = entirely dissatisfied to 11 = entirely satisfied.
4. Self-reported Health ($x_1$): respondents were asked: "How would you describe your health, generally speaking?"; ordered categorical variable, with 1 = poor to 5 = excellent.
5. Mood at Survey ($x_2$): respondents were asked to indicate how they feel at the moment of completing the survey; ordered categorical variable, with 1 = very bad and 7 = very good.
6. Gender (grouping variable $g$): the respondent's gender, with 1 = men and 2 = women.

Several competing informative hypotheses can be formulated about the ordering of, and equalities between, the partial correlations between $y_1$, $y_2$ and $y_3$ conditional on $g$. Here, we consider the following hypotheses.

Hypotheses about partial associations between satisfaction domains for men:

H1a: For men, the partial association between leisure satisfaction and life satisfaction is stronger than the partial association between relationship satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_1 y_2 y_1} > \rho_{g_1 y_3 y_1} > \rho_{g_1 y_3 y_2}$.

H1b: For men, the partial association between relationship satisfaction and life satisfaction is stronger than the partial association between leisure satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_1 y_3 y_1} > \rho_{g_1 y_2 y_1} > \rho_{g_1 y_3 y_2}$.

H1c: For men, the partial associations between life satisfaction, leisure satisfaction, and relationship satisfaction are equal: $\rho_{g_1 y_3 y_1} = \rho_{g_1 y_2 y_1} = \rho_{g_1 y_3 y_2}$.

H1d: The complement hypothesis, under which neither H1a, nor H1b, nor H1c holds.

Hypotheses about partial associations between satisfaction domains for women:

H2a: For women, the partial association between leisure satisfaction and life satisfaction is stronger than the partial association between relationship satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_2 y_2 y_1} > \rho_{g_2 y_3 y_1} > \rho_{g_2 y_3 y_2}$.

H2b: For women, the partial association between relationship satisfaction and life satisfaction is stronger than the partial association between leisure satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_2 y_3 y_1} > \rho_{g_2 y_2 y_1} > \rho_{g_2 y_3 y_2}$.

H2c: For women, the partial associations between life satisfaction, leisure satisfaction, and relationship satisfaction are equal: $\rho_{g_2 y_3 y_1} = \rho_{g_2 y_2 y_1} = \rho_{g_2 y_3 y_2}$.

H2d: The complement hypothesis, under which neither H2a, nor H2b, nor H2c holds.

Hypotheses about conditional partial associations between life satisfaction and leisure satisfaction, with gender as moderator:

H3a: The partial association between leisure satisfaction and life satisfaction among men is stronger than the partial association among women: $\rho_{g_1 y_2 y_1} > \rho_{g_2 y_2 y_1}$.

H3b: The partial association between leisure satisfaction and life satisfaction among men is equally strong as the partial association among women: $\rho_{g_1 y_2 y_1} = \rho_{g_2 y_2 y_1}$.

H3c: The partial association between leisure satisfaction and life satisfaction among women is stronger than the partial association among men: $\rho_{g_1 y_2 y_1} < \rho_{g_2 y_2 y_1}$.

Hypotheses about conditional partial associations between relationship satisfaction and life satisfaction, with gender as moderator:

H4a: The partial association between relationship satisfaction and life satisfaction among women is stronger than the partial association among men: $\rho_{g_2 y_3 y_1} > \rho_{g_1 y_3 y_1}$.

H4b: The partial association between relationship satisfaction and life satisfaction among men is equally strong as the partial association among women: $\rho_{g_2 y_3 y_1} = \rho_{g_1 y_3 y_1}$.

H4c: The partial association between relationship satisfaction and life satisfaction among men is stronger than the partial association among women: $\rho_{g_2 y_3 y_1} < \rho_{g_1 y_3 y_1}$.

3 Model specification

3.1 The generalized multivariate probit model

A generalized multivariate probit model will be used for modeling combinations of continuous and ordinal dependent variables, which is based on Albert & Chib (1993), Chib & Greenberg (1998), and Boscardin et al. (2008). We consider $G$ independent populations. The $P$-dimensional vector of dependent variables of subject $i$ in population $g$ will be denoted by $y_{ig} = (v_{ig}, c_{ig})$, of which the first $P_1$ elements, $v_{ig} = (v_{ig1}, \ldots, v_{igP_1})$, are continuous normally distributed variables and the remaining $P_2 = P - P_1$ elements, $c_{ig} = (c_{ig1}, \ldots, c_{igP_2})$, are measured on an ordinal scale, for $i = 1, \ldots, n_g$ and $g = 1, \ldots, G$. Furthermore, we assume that the $p$-th ordinal variable can assume the categories $1, \ldots, K_p$, for $p = 1, \ldots, P_2$.

As is common in Bayesian multivariate probit modeling, a multivariate normal latent variable, denoted by $z_{ig}$, is used for each ordinal vector $c_{ig}$. This implies that the $p$-th ordinal variable of subject $i$ in population $g$ falls in category $k$, i.e., $c_{igp} = k$, if $z_{igp} \in (\gamma_{gp(k-1)}, \gamma_{gpk}]$, for $k = 1, \ldots, K_p$, where $\gamma_{gpk}$ is the upper cut-point of the $k$-th category of the $p$-th ordinal variable in the $g$-th population. In order to ensure identification of the model it is necessary to set $\gamma_{gp0} = -\infty$, $\gamma_{gp1} = 0$, and $\gamma_{gpK_p} = \infty$, for $g = 1, \ldots, G$ and $p = 1, \ldots, P_2$, and to fix the error variances of the latent variables to 1.

As is common in regression models, the mean structure of each dependent variable is assumed to be a linear combination of $K$ external covariates $x_{ig}$. Subsequently, the generalized multivariate probit model can be defined by

$$\begin{bmatrix} v_{ig} \\ z_{ig} \end{bmatrix} \sim N(B_g x_{ig}, \Sigma_g), \quad \text{where} \tag{1}$$

$$\Sigma_g = \operatorname{diag}(\sigma_g, 1_{P_2})\, C_g\, \operatorname{diag}(\sigma_g, 1_{P_2}), \tag{2}$$

where $1_{P_2}$ is a vector of ones of length $P_2$. In this model $B_g$ is a $P \times K$ matrix with regression coefficients of the $g$-th population where element $(p, k)$ reflects the effect of the $k$-th covariate on the $p$-th dependent variable, for $k = 1, \ldots, K$ and $p = 1, \ldots, P$, the $P_1$ error standard deviations of $v_{ig}$ in population $g$ are contained in $\sigma_g$, and $C_g$ denotes the $P \times P$ correlation matrix of population $g$. Note that the $(p, q)$-th element of $C_g$ denotes the linear association of the $p$-th and $q$-th dependent variable, for $p \neq q$, while controlling for the covariates. Note that if $K = 1$ and $x_{ig} = 1$, no covariates are incorporated in the model, which implies that the correlations are bivariate correlations.
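To make the data-generating process in (1) and (2) concrete, the following is a minimal simulation sketch for a single population. It is not the BCT implementation; the function name `simulate_mixed_data`, the choice of standard normal covariates, and all numerical values are assumptions made purely for illustration.

```python
import numpy as np

def simulate_mixed_data(n, B, sigma, C, cutpoints, rng):
    """Draw n subjects from the mixed continuous/ordinal probit model (one population).

    B         : (P, K) regression coefficients
    sigma     : (P1,) error standard deviations of the continuous variables
    C         : (P, P) correlation matrix
    cutpoints : list of P2 increasing arrays with the finite cut-points
                gamma_1 = 0 < gamma_2 < ... < gamma_{K_p - 1}
    """
    P, K = B.shape
    P1 = len(sigma)
    P2 = P - P1
    # Illustrative covariates: an intercept plus K - 1 standard normal columns.
    X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
    # Sigma = diag(sigma, 1) C diag(sigma, 1), as in equation (2).
    scale = np.concatenate([sigma, np.ones(P2)])
    Sigma = np.diag(scale) @ C @ np.diag(scale)
    errors = rng.multivariate_normal(np.zeros(P), Sigma, size=n)
    latent = X @ B.T + errors                      # equation (1)
    v, z = latent[:, :P1], latent[:, P1:]
    # c = k when z lies in (gamma_{k-1}, gamma_k]; side="left" gives that half-open rule.
    c = np.column_stack([
        np.searchsorted(cutpoints[p], z[:, p], side="left") + 1
        for p in range(P2)
    ])
    return X, v, c

# Hypothetical setting: two continuous variables, one ordinal variable with 4 categories.
rng = np.random.default_rng(0)
B = np.array([[1.0, 0.5], [0.0, -0.5], [0.2, 0.3]])
sigma = np.array([1.5, 2.0])
C = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
cutpoints = [np.array([0.0, 0.8, 1.6])]
X, v, c = simulate_mixed_data(500, B, sigma, C, cutpoints, rng)
```

The first cut-point is fixed at 0 and the latent error variances at 1, which mirrors the identification constraints stated above.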

3.2 Multiple hypothesis testing on measures of association

In the context of testing constraints on correlations, which is the goal of the current paper, the correlation matrices $C_g$ are of central importance while the parameter matrix $B_g$ and the error standard deviations $\sigma_g$ are treated as nuisance parameters. The correlations in $C_g$ are contained in the vector $\rho_g$, e.g.,

$$C_g = \begin{bmatrix} 1 & & \\ \rho_{g21} & 1 & \\ \rho_{g31} & \rho_{g32} & 1 \end{bmatrix} \;\Rightarrow\; \rho_g = (\rho_{g21}, \rho_{g31}, \rho_{g32}). \tag{3}$$

Furthermore, the correlations over all populations will be combined in the vector $\rho = (\rho_1, \ldots, \rho_G)$ of length $L = \tfrac{1}{2} G P (P - 1)$. Similarly we combine the parameter matrices $B_g$ and the standard deviations $\sigma_g$ over all $G$ populations in the matrices $B$ and $\sigma$, respectively, and subsequently, the vectorization of $B$ is denoted by the vector $\beta$.

We consider a multiple hypothesis test of $T$ hypotheses $H_1, \ldots, H_T$ of the form

$$H_t : R^E_t \rho = r^E_t, \quad R^I_t \rho > r^I_t, \tag{4}$$

for $t = 1, \ldots, T$, where $[R^E_t \mid r^E_t]$ is a $q^E_t \times (L + 1)$ matrix of coefficients that capture the set of equality constraints under $H_t$ and $[R^I_t \mid r^I_t]$ is a $q^I_t \times (L + 1)$ matrix of coefficients that capture the set of inequality (or order) constraints under $H_t$. In most applications researchers either compare two correlations with each other, e.g., $\rho_{121} > \rho_{131}$, or compare a single correlation to a constant, e.g., $\rho_{121} > .5$. Therefore each row of the matrices $R^E_t$ and $R^I_t$ is either a permutation of $(1, -1, 0, \ldots, 0)$ with the corresponding constant in $r^E_t$ or $r^I_t$ equal to 0, or a permutation of $(\pm 1, 0, \ldots, 0)$ with corresponding constant $r \in (-1, 1)$. As an example, hypothesis H1a: $\rho_{121} > \rho_{131} > \rho_{132}$ from the previous section (with the index labels omitted) would have the following matrix form

$$\text{H1a}: \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \rho_{121} \\ \rho_{131} \\ \rho_{132} \\ \rho_{221} \\ \rho_{231} \\ \rho_{232} \end{bmatrix} > \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
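The matrix form in (4) translates directly into array operations. The sketch below encodes the order constraints of H1a for two populations and three variables and checks whether a given correlation vector satisfies them. The helper name `satisfies` and the tolerance argument are illustrative assumptions, not part of the paper or of BCT.

```python
import numpy as np

# Stacked correlation vector: (rho_121, rho_131, rho_132, rho_221, rho_231, rho_232).
# H1a: rho_121 > rho_131 > rho_132, written as R_I @ rho > r_I.
R_I = np.array([[1.0, -1.0, 0.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, -1.0, 0.0, 0.0, 0.0]])
r_I = np.zeros(2)

def satisfies(rho, R_E=None, r_E=None, R_I=None, r_I=None, tol=1e-12):
    """Return True if rho meets R_E rho = r_E (up to tol) and R_I rho > r_I."""
    rho = np.asarray(rho, dtype=float)
    ok = True
    if R_E is not None:
        ok = ok and bool(np.all(np.abs(R_E @ rho - r_E) <= tol))
    if R_I is not None:
        ok = ok and bool(np.all(R_I @ rho > r_I))
    return ok

in_h1a = satisfies([0.5, 0.3, 0.1, 0.0, 0.0, 0.0], R_I=R_I, r_I=r_I)
not_in_h1a = satisfies([0.3, 0.5, 0.1, 0.0, 0.0, 0.0], R_I=R_I, r_I=r_I)
```

Here the first vector respects the ordering and the second violates it, so the two calls return `True` and `False` respectively.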

The allowed parameter space under the constrained $H_t$ will be denoted by $C_t$. In certain parts of the paper we refer to an unconstrained hypothesis, denoted by $H_u$, which does not assume any constraints on the correlations besides the necessary constraints on $\rho$ that ensure that the corresponding correlation matrices are positive definite.

4 Methodology

4.1 Marginal likelihoods, Bayes factors, and posterior probabilities

The Bayes factor is a Bayesian criterion that quantifies the relative evidence in the data between two hypotheses. The Bayes factor of hypothesis $H_1$ versus $H_2$ is defined by the ratio of the marginal likelihoods under the respective hypotheses, i.e.,

$$B_{12} = \frac{m_1(Y)}{m_2(Y)}, \tag{5}$$

where the marginal likelihood is defined by

$$m_t(Y) = \iiint_{C_t} f_t(Y \mid X, \beta, \sigma, \rho)\, \pi_t(\beta, \sigma, \rho)\, d\rho\, d\sigma\, d\beta, \tag{6}$$

where $f_t$ denotes the likelihood under $H_t$, which follows directly from (1) and (2), and $\pi_t$ denotes the prior for the unknown model parameters under $H_t$. The marginal likelihood captures how likely the observed data are under a hypothesis and its respective prior.

One of the main strengths of the Bayes factor is its intuitive interpretation. For example, the Bayes factor is symmetrical in the sense that if, say, $B_{12} = .1$, which implies that $H_1$ receives 10 times less evidence than $H_2$, it follows naturally from (5) that $B_{21} = B_{12}^{-1} = 10$, which implies that $H_2$ receives 10 times more evidence from the data than $H_1$. Furthermore, Bayes factors are transitive in the sense that if $H_1$ received 10 times more evidence than $H_2$, i.e., $B_{12} = 10$, and $H_2$ received 5 times more evidence than $H_3$, i.e., $B_{23} = 5$, it again follows naturally from the definition in (5) that hypothesis $H_1$ received 50 times more evidence than $H_3$ because $B_{13} = B_{12} \times B_{23} = 10 \times 5 = 50$.

Various researchers have provided an indication of how to interpret Bayes factors (e.g. Jeffreys, 1961; Raftery, 1995). For completeness we provide the guidelines that were given by Raftery (1995) in Table 2. These guidelines are helpful for researchers who are new to Bayes factors. We do not recommend using these guidelines as strict rules because researchers should decide for themselves when they feel that the Bayes factor indicates strong evidence.

Table 2: Rough guidelines for interpreting Bayes factors (Raftery, 1995).

| $B_{12}$ | Evidence |
|---|---|
| < 1/150 | Very strong evidence for $H_2$ |
| 1/150 to 1/20 | Strong evidence for $H_2$ |
| 1/20 to 1/3 | Positive evidence for $H_2$ |
| 1/3 to 1 | Weak evidence for $H_2$ |
| 1 | No preference |
| 1 to 3 | Weak evidence for $H_1$ |
| 3 to 20 | Positive evidence for $H_1$ |
| 20 to 150 | Strong evidence for $H_1$ |
| > 150 | Very strong evidence for $H_1$ |

The Bayes factor can be used to update the prior odds between two hypotheses being true before observing the data to obtain the posterior odds that the hypotheses are true after observing the data, according to the following formula which follows directly from Bayes' theorem:

$$\frac{\Pr(H_1 \mid Y)}{\Pr(H_2 \mid Y)} = B_{12} \times \frac{\Pr(H_1)}{\Pr(H_2)}, \tag{7}$$

where $\Pr(H_t)$ and $\Pr(H_t \mid Y)$ denote the prior and posterior probability that $H_t$ is true, respectively, for $t = 1$ or 2. In the general case of $T$ hypotheses, the posterior hypothesis probabilities can be obtained as follows:

$$\Pr(H_t \mid Y) = \frac{\Pr(H_t)\, B_{t1}}{\sum_{t'=1}^{T} \Pr(H_{t'})\, B_{t'1}}. \tag{8}$$

Posterior hypothesis probabilities are useful because they provide a direct answer to the research question of how plausible each hypothesis is in light of the observed data. Researchers typically find these posterior probabilities easier to interpret than Bayes factors because the posterior probabilities add up to one. It should be noted, however, that in the case of equal prior probabilities for the hypotheses, i.e., $\Pr(H_1) = \Pr(H_2)$, which is the default setting, the posterior odds between two hypotheses correspond exactly to the respective Bayes factor, as can be seen in (7).
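Equation (8) is simple enough to sketch in a few lines. The example below reuses the illustrative numbers from the transitivity example ($B_{12} = 10$, $B_{23} = 5$, hence $B_{13} = 50$), expresses every hypothesis against $H_1$, and assumes equal prior probabilities. The function name `posterior_probabilities` is a hypothetical helper and not part of BCT.

```python
import numpy as np

def posterior_probabilities(bf_vs_ref, prior_probs=None):
    """Posterior hypothesis probabilities from Bayes factors B_t1 against a common reference.

    bf_vs_ref   : sequence of B_t1 for t = 1, ..., T (so the first entry is B_11 = 1)
    prior_probs : prior probabilities Pr(H_t); equal priors are used when omitted
    """
    bf = np.asarray(bf_vs_ref, dtype=float)
    if prior_probs is None:
        prior_probs = np.full(bf.shape, 1.0 / bf.size)
    weights = np.asarray(prior_probs, dtype=float) * bf
    return weights / weights.sum()      # equation (8)

# B_11 = 1, B_21 = 1 / B_12 = 0.1, B_31 = 1 / B_13 = 0.02
probs = posterior_probabilities([1.0, 0.1, 0.02])
```

With equal prior probabilities the ratio of any two posterior probabilities reproduces the corresponding Bayes factor, in line with (7).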

4.2 Prior specification

In order to compute the marginal likelihoods (and subsequently, the Bayes factors and posterior hypothesis probabilities), prior distributions need to be formulated for the unknown parameters $\beta$, $\sigma$, and $\rho$ under each constrained hypothesis. A prior distribution reflects the plausibility of the possible values of the parameters before observing the data. Under each hypothesis, we set independent priors for the three different types of model parameters, i.e.,

$$\pi_t(\beta, \sigma, \rho) = \pi^N_t(\beta) \times \pi^N_t(\sigma) \times \pi^U_t(\rho) \times I(\rho \in C_t), \tag{9}$$

with noninformative improper priors for the nuisance parameters

$$\pi^N_t(\beta) = 1, \tag{10}$$

$$\pi^N_t(\sigma) = \prod_{g=1}^{G} \sigma_{g,1}^{-1} \cdots \sigma_{g,P_1}^{-1}. \tag{11}$$

It is well known that proper priors (i.e., prior distributions that integrate to one) need to be formulated for the parameters that are tested, i.e., the correlations, in order for the Bayes factor to be well-defined. In this paper we consider a uniform prior for the correlations under a hypothesis in its allowed constrained subspace. This implies that every combination of values for the correlations that satisfies the constraints is equally likely a priori under each hypothesis. The prior for $\rho$ is zero outside the constrained subspace under each hypothesis. Because the constrained parameter space $C_t$ is bounded, a proper uniform prior can be formulated for the correlations under every hypothesis, i.e.,

$$\pi^U_t(\rho) = V_t^{-1} \times I(\rho \in C_t), \tag{12}$$

where the normalizing constant $V_t$ is given by

$$V_t = \int_{C_t} 1\, d\rho. \tag{13}$$

The normalizing constant $V_t$ can be seen as a measure of the size or volume of the constrained space.

To illustrate the priors under different constrained hypotheses on measures of association, let us consider a model with 3 dependent variables and one population (the correlation matrix was given in (3), where we omit the population index $g$). Furthermore, let us consider the following hypotheses:

$H_u$: $\rho_{21}, \rho_{31}, \rho_{32}$ (the unconstrained hypothesis).
$H_1$: $\rho_{21} = \rho_{31} = \rho_{32}$ (all $\rho$'s are equal).
$H_2$: $\rho_{31} = 0$, $\rho_{21}, \rho_{32}$ (only $\rho_{31}$ is restricted to zero).
$H_3$: $\rho_{31} = 0$, $\rho_{21} > \rho_{32}$ ($\rho_{31}$ is restricted to zero and $\rho_{21}$ is larger than $\rho_{32}$).

The unconstrained parameter space of $\rho = (\rho_{21}, \rho_{31}, \rho_{32})$ that results in a positive definite correlation matrix is displayed in Figure 2a (taken from Rousseeuw & Molenberghs (1994) with permission). Joe (2006) gives the volume of this 3-dimensional subspace, which is the normalizing constant $V_u$ in (13) under the unconstrained hypothesis $H_u$. The unconstrained uniform prior for the correlations therefore equals $\pi^U_u(\rho) = V_u^{-1} \times I(\rho \in C_u)$ in (12).

When all $\rho$'s are equal as under $H_1$, the common correlation, say, $\rho$, must lie in the interval $(-\tfrac{1}{2}, 1)$ to ensure positive definiteness (e.g. Mulder & Fox, ). Thus, the size of the parameter space of $\rho$ corresponds to the length of the interval, which is $V_1 = \tfrac{3}{2}$. Therefore, the uniform prior for $\rho$ under $H_1$ corresponds to $\pi^U_1(\rho) = \tfrac{2}{3} \times I(\rho \in (-\tfrac{1}{2}, 1))$, which is plotted in Figure 2b. The diagonal $\rho_{21} = \rho_{31} = \rho_{32}$ is also plotted in Figure 2a as a dashed line, where the thick part lies within the positive definite subspace $C_u$.

When $\rho_{31}$ is restricted to zero as under $H_2$, the allowed parameter space for $(\rho_{21}, \rho_{32})$ that results in a positive definite correlation matrix must satisfy $\rho_{21}^2 + \rho_{32}^2 < 1$, i.e., a circle with radius 1 (Rousseeuw & Molenberghs, 1994). Therefore, the uniform prior under $H_2: \rho_{31} = 0$ is given by $\pi_2(\rho_{21}, \rho_{32}) = \tfrac{1}{\pi} \times I(\rho_{21}^2 + \rho_{32}^2 < 1)$ (Figure 2c) because a circle of radius 1 has an area of $V_2 = \pi$.

When $\rho_{31}$ is restricted to zero and $\rho_{21}$ is larger than $\rho_{32}$ as under $H_3$, the subspace is half as large as under $H_2$. Therefore the uniform prior density for the free parameters under $H_3$ is twice as large as the prior under $H_2$ to ensure that the prior integrates to one. Thus, the uniform prior under $H_3$ is given by $\pi_3(\rho_{21}, \rho_{32}) = \tfrac{2}{\pi} \times I(\rho_{21}^2 + \rho_{32}^2 < 1 \;\&\; \rho_{21} > \rho_{32})$ (Figure 2d).
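The volumes $V_t$ in (13) can be checked numerically. The following Monte Carlo sketch estimates them by uniform sampling from the enclosing cube, square, or interval and counting the fraction of draws inside each constrained region. The sample size, the seed, and the helper name `is_positive_definite` are arbitrary illustrative choices. For the unconstrained case the estimate can be compared with $\pi^2/2 \approx 4.93$, the known closed-form volume of the set of 3 x 3 correlation matrices.

```python
import numpy as np

def is_positive_definite(r21, r31, r32):
    # For a 3x3 correlation matrix with all |rho| < 1, positive definiteness
    # is equivalent to a positive determinant.
    return 1 - r21**2 - r31**2 - r32**2 + 2 * r21 * r31 * r32 > 0

rng = np.random.default_rng(1)
n = 400_000

# H_u: fraction of the cube (-1, 1)^3 (volume 8) that is positive definite.
r21, r31, r32 = rng.uniform(-1, 1, size=(n, 3)).T
V_u = 8 * is_positive_definite(r21, r31, r32).mean()

# H_1: common correlation rho on (-1, 1) (length 2); allowed when rho > -1/2.
rho = rng.uniform(-1, 1, size=n)
V_1 = 2 * is_positive_definite(rho, rho, rho).mean()

# H_2: rho_31 = 0, so (rho_21, rho_32) must lie in the unit disc (square area 4).
pts = rng.uniform(-1, 1, size=(n, 2))
in_disc = (pts**2).sum(axis=1) < 1
V_2 = 4 * in_disc.mean()

# H_3: the same disc with the extra order constraint rho_21 > rho_32.
V_3 = 4 * (in_disc & (pts[:, 0] > pts[:, 1])).mean()
```

The estimates for $V_1$, $V_2$, and $V_3$ should land near $3/2$, $\pi$, and $\pi/2$, matching the uniform prior densities $2/3$, $1/\pi$, and $2/\pi$ derived above.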


Figure 2: (a) Graphical representation of the subspace of $(\rho_{21}, \rho_{31}, \rho_{32})$ for which the 3-dimensional correlation matrix is positive definite (taken from Rousseeuw & Molenberghs (1994) with permission). The thick diagonal line from $(-\frac{1}{2}, -\frac{1}{2}, -\frac{1}{2})$ to $(1, 1, 1)$ represents the correlations that satisfy $\rho_{21} = \rho_{31} = \rho_{32}$ and result in a positive definite correlation matrix. (b) Uniform prior $\pi_1^U$ for the common correlation $\rho$ under $H_1: \rho_{21} = \rho_{31} = \rho_{32}$ in the allowed region $C_1 = \{\rho \mid \rho \in (-\frac{1}{2}, 1)\}$. (c) Uniform prior $\pi_2^U$ for the free parameters under $H_2: \rho_{31} = 0$ in the allowed region $C_2 = \{(\rho_{21}, \rho_{32}) \mid \rho_{21}^2 + \rho_{32}^2 < 1\}$. (d) Uniform prior $\pi_3^U$ for the free correlations under $H_3: \rho_{31} = 0, \rho_{21} > \rho_{32}$ in the allowed region $C_3 = \{(\rho_{21}, \rho_{32}) \mid \rho_{21}^2 + \rho_{32}^2 < 1, \rho_{21} > \rho_{32}\}$.

4.3 An expression of the Bayes factor for testing (in)equality constrained hypotheses

In order to compute the marginal likelihood under each constrained hypothesis via (6) using the constrained uniform prior in (15), a complex multivariate integral must be solved. This endeavor can be somewhat simplified by using the fact that the uniform prior for the correlations under each hypothesis $H_t$ can be written as a truncation of a uniform unconstrained prior under the unconstrained hypothesis $H_u$, i.e.,

\[
\pi_t^U(\beta, \sigma, \rho) = \frac{V_u}{V_t}\, \pi_u^U(\beta, \sigma, \rho)\, I(\rho \in C_t), \tag{14}
\]

where the unconstrained prior is given by

\[
\pi_u^U(\beta, \sigma, \rho) = V_u^{-1}\, \sigma_{1,1}^{-1} \cdots \sigma_{1,P_1}^{-1}\, I(\rho \in C_u), \tag{15}
\]

where $V_u = \int_{C_u} 1\, d\rho$, which can be interpreted as the volume of the subspace of $\rho$ that results in positive definite correlation matrices under all populations. Note that the normalizing constant in (14) is equal to the reciprocal of the unconstrained prior integrated over the constrained space $C_t$,

\[
\int_{C_t} \pi_u^U(\rho)\, d\rho = V_u^{-1} \int_{C_t} 1\, d\rho = \frac{V_t}{V_u}. \tag{16}
\]

This relationship between each constrained prior and the unconstrained prior allows us to use the following general result for computing the Bayes factor of a hypothesis with certain constraints on the parameters of interest, say, $\theta$, against a larger unconstrained hypothesis in which the constrained hypothesis is nested.

Lemma 1 Consider a model where a hypothesis is formulated with equality and inequality constraints on the parameter vector $\theta$ of length $Q$ of the form $H_t: R_t^E \theta = r_t^E,\ R_t^I \theta > r_t^I$, where the $q^E \times Q$ matrix $R_t^E$ has rank $q^E \le Q$, and a larger unconstrained hypothesis $H_u$ in which $H_t$ is nested. Let the vector of nuisance parameters in the model be denoted by $\phi$. If the prior of $\theta$ under $H_t$ is defined as the truncation of a proper prior for $\theta$ under $H_u$ in its constrained subspace, then the Bayes factor of $H_t$ against $H_u$ can be written as

\[
B_{tu} = \frac{\Pr(R_t^I T^{-1}[r^E, \xi^I] > r_t^I \mid \xi^E = r_t^E, H_u, Y)}{\Pr(R_t^I T^{-1}[r^E, \xi^I] > r_t^I \mid \xi^E = r_t^E, H_u)} \times \frac{\pi_u(\xi^E = r_t^E \mid Y)}{\pi_u(\xi^E = r_t^E)}, \tag{17}
\]

with $\xi^E = R_t^E \theta$ and $\xi^I = R_t \theta$, where $R_t$ is an arbitrary $(Q - q^E) \times Q$ matrix such that the $Q \times Q$ matrix $T = \begin{bmatrix} R_t^E \\ R_t \end{bmatrix}$ has rank $Q$.
The conditional probabilities in the numerator and denominator of the first factor in (17) are computed based on the conditional posterior and prior of $\xi^I \mid \xi^E = r_t^E$ under $H_u$, respectively.
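To make the two factors of (17) concrete, the sketch below evaluates each of them in a deliberately simple conjugate toy model rather than in the generalized multivariate probit model of this paper: observations $y_i \sim N(\theta, 1)$ with an unconstrained prior $\theta \sim N(0, \tau^2)$. The function names and the example numbers are hypothetical and chosen only for illustration. An equality constraint $\theta = 0$ is evaluated with the density ratio (second factor) and an order constraint $\theta > 0$ with the probability ratio (first factor).

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def posterior_normal_mean(ybar, n, prior_var):
    """Posterior mean and variance of theta for y_i ~ N(theta, 1), theta ~ N(0, prior_var)."""
    post_var = 1.0 / (n + 1.0 / prior_var)
    post_mean = post_var * n * ybar
    return post_mean, post_var


def bf_equality(ybar, n, prior_var):
    """Second factor of (17): posterior over prior density at theta = 0."""
    m, v = posterior_normal_mean(ybar, n, prior_var)
    return normal_pdf(0.0, m, v) / normal_pdf(0.0, 0.0, prior_var)


def bf_order(ybar, n, prior_var):
    """First factor of (17): posterior over prior probability of theta > 0."""
    m, v = posterior_normal_mean(ybar, n, prior_var)
    post_prob = 1.0 - normal_cdf(0.0, m, v)
    prior_prob = 0.5  # the N(0, prior_var) prior is symmetric around zero
    return post_prob / prior_prob


# Hypothetical data summary: sample mean 0.4 from n = 24 observations.
print("B(theta = 0 vs unconstrained):", bf_equality(0.4, 24, 1.0))
print("B(theta > 0 vs unconstrained):", bf_order(0.4, 24, 1.0))
```

In this toy setting each hypothesis involves only one type of constraint, so only one factor of (17) is active. For a hypothesis that combines both types, the order probabilities would be computed conditionally on the equality constraints and the two ratios multiplied.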

Proof. The derivation combines the results of Dickey (1971), Klugkist et al. (2005), Pericchi et al. (2008), Wetzels et al. (2010), Mulder (2014a), and Gu et al. (2017). A proof is given in Appendix A.

Note that the second factor in (17) is equal to the well-known Savage-Dickey density ratio (Dickey, 1971; Verdinelli & Wasserman, 1995; Wetzels et al., 2010). The ratio of posterior and prior probabilities in the first factor was also observed in Klugkist et al. (2005) when there are no equality constraints under $H_t$. The conditional posterior probability, i.e., the numerator of the first term of (17), can be interpreted as a measure of fit of the order constraints of $H_t$ relative to $H_u$; the marginal posterior density, i.e., the numerator of the second term of (17), can be seen as a measure of fit of the equality constraints of $H_t$ relative to $H_u$; the conditional prior probability, i.e., the denominator of the first term of (17), can be interpreted as a measure of complexity of the order constraints of $H_t$ relative to $H_u$; the marginal prior density, i.e., the denominator of the second term of (17), can be seen as a measure of complexity based on the equality constraints of $H_t$ relative to $H_u$; see also Mulder (2014a) and Gu et al. (2017).

That the equality constraints and the inequality constraints conditional on the equality constraints can be evaluated separately was shown by Pericchi et al. (2008). The contribution here is that the Bayes factor is derived for the general case of a set of linear equality constraints and a set of linear inequality constraints where the prior under $H_t$ is a truncation of the unconstrained prior. We have not seen (17) elsewhere in the literature. In the current paper, the parameter vector $\theta$ corresponds to the vector of correlations $\rho$ and the nuisance parameters $\phi$ correspond to the vector containing $B$ and $\sigma$.
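As a worked illustration of these fit and complexity terms, consider the hypothesis $H_3: \rho_{31} = 0, \rho_{21} > \rho_{32}$ from Section 4.2 under the uniform unconstrained prior. The sketch below is written on the correlation scale and assumes $P = 3$ and a single population. Because the Fisher transformation has unit derivative at zero, the prior density at zero is the same on the transformed scale.

```latex
\[
B_{3u} =
\underbrace{\frac{\Pr(\rho_{21} > \rho_{32} \mid \rho_{31} = 0, H_u, Y)}
                 {\Pr(\rho_{21} > \rho_{32} \mid \rho_{31} = 0, H_u)}}_{\text{order constraint: fit / complexity}}
\times
\underbrace{\frac{\pi_u(\rho_{31} = 0 \mid Y)}
                 {\pi_u(\rho_{31} = 0)}}_{\text{equality constraint: fit / complexity}},
\]
\[
\Pr(\rho_{21} > \rho_{32} \mid \rho_{31} = 0, H_u) = \frac{V_3}{V_2} = \frac{1}{2},
\qquad
\pi_u(\rho_{31} = 0) = \frac{V_2}{V_u} = \frac{\pi}{\pi^2 / 2} = \frac{2}{\pi} \approx 0.64 .
\]
```

Both complexity terms follow from the volumes of Section 4.2: given $\rho_{31} = 0$ the prior is uniform on the unit disc, which the line $\rho_{21} = \rho_{32}$ splits into two equal halves, and the marginal prior density of $\rho_{31}$ at zero is the area of that disc divided by $V_u$. Only the two fit terms in the numerators depend on the data.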
The computation of the Bayes factor in (17) for a constrained hypothesis $H_t$ on measures of association is discussed in the following section.

5 Numerical computation

In this section we discuss a general numerical method to compute the elements in (17) for a hypothesis $H_t$ with equality and/or inequality constraints on the measures of association in the generalized multivariate probit model in Section 3.1 using the uniform prior discussed in Section 4.2. The computation of the posterior parts in the numerators in (17) is discussed first, followed by the prior parts in the denominators.

5.1 Computation of the posterior probability and posterior density

To compute the posterior probability in the numerator of the first factor in (17) and the posterior density in the numerator of the second factor of (17) under the unconstrained hypothesis, first the unconstrained posterior is approximated using an unconstrained posterior sample from the unconstrained generalized multivariate probit model (Section 3.1). This sample can be obtained using an MCMC sampler. Because the unconstrained uniform prior in (15) is not conjugate, Metropolis-Hastings steps are needed when sampling the model parameters. An MCMC sampler was written based on the algorithm of Boscardin et al. (2008), where the correlation matrices are efficiently sampled using the Metropolis-Hastings algorithm of Liu & Daniels (2006). The cut-off parameters $\gamma$ for modeling ordinal dependent variables are sampled using uniform distributions.

Computing the posterior probability in (17) based on the proportion of unconstrained draws that satisfy the inequality constraints can be very inefficient (Mulder et al., 2012). For this reason an alternative method is used. First note that the sampling distribution of the Fisher transformed sample correlation $r$ given a population correlation $\rho$ is approximately normal. Furthermore, the population correlation $\rho$ and the sample correlation $r$ have a similar role in the integrated likelihood of $r$ given $\rho$ (Johnson & Kotz, 1970; Mulder, 2016). Now because a uniform prior is used for $\rho$, which is relatively vague, the posterior of $\rho$ is completely dominated by the likelihood. Therefore, the posterior of the Fisher transformed parameters, which will be denoted by $\eta$ (see footnote 1), is approximately normal.
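The first step of this argument, approximate normality of the Fisher transformed sample correlation, can be checked with a small simulation. The sketch below is an illustration only: it uses bivariate normal data instead of the probit model, and the function name, number of replications and seed are assumptions. It compares the simulated mean and standard deviation of $\operatorname{atanh}(r)$ with the textbook approximations $\operatorname{atanh}(\rho)$ and $1/\sqrt{n - 3}$.

```python
import numpy as np


def fisher_z_of_sample_correlations(rho, n, n_reps=5000, seed=1):
    """Simulate n_reps bivariate normal samples of size n and return atanh(r)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_reps, n))
    e = rng.standard_normal((n_reps, n))
    y = rho * x + np.sqrt(1.0 - rho**2) * e  # corr(x, y) = rho

    # Row-wise Pearson correlation.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    return np.arctanh(r)  # Fisher transformation 0.5 * log((1 + r) / (1 - r))


z = fisher_z_of_sample_correlations(rho=0.25, n=40)
print("mean of z:", z.mean(), "vs atanh(rho):", np.arctanh(0.25))
print("sd of z:  ", z.std(ddof=1), "vs 1/sqrt(n - 3):", 1 / np.sqrt(37))
# Skewness close to zero is one informal sign of approximate normality.
print("skewness: ", ((z - z.mean()) ** 3).mean() / z.std() ** 3)
```

This only addresses the sampling distribution of the transformed sample correlation. The normality of the posterior of $\eta$ in the probit model is the separate claim made in the text.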
To illustrate the accuracy of the normal approximation, data of size $n = 40$ were generated from the generalized multivariate probit model of dimension $P = 3$ where the first dependent variable was normally distributed, the second dependent variable was ordinal with two categories, and the third dependent variable was ordinal with three categories. The population correlations were set to $\rho = (\rho_{21}, \rho_{31}, \rho_{32}) = (.25, .25, 0)$. A posterior sample of size 10,000 was obtained for $\rho$, which was then transformed to the Fisher transformed parameters $\eta = (\eta_{21}, \eta_{31}, \eta_{32})$. The posterior draws of $(\eta_{21}, \eta_{31})$ are plotted in Figure 3, including a contour plot of the bivariate density as well as density estimates of the univariate posteriors. Normal approximations are also displayed (dashed lines). As can be seen, the normal approximation is very accurate. For larger samples, which are typical in social research, the normal approximation will be even better.

Footnote 1: Let $\rho_{p_1 p_2}$ be the association between the $p_1$-th and $p_2$-th dependent variable. Then, $\eta_{p_1 p_2} = \frac{1}{2} \log\left(\frac{1 + \rho_{p_1 p_2}}{1 - \rho_{p_1 p_2}}\right)$ is the Fisher transformed association. Note that the constraints under $H_t$ are identical for $\eta$ because the Fisher transformation is monotonically increasing.

Figure 3: Scatter plot of posterior draws of $(\eta_{21}, \eta_{31})$ (red dots) with additional contour plot and univariate density plots (solid lines) and normal approximations (dashed lines).

Thus, we can write $\pi(\eta \mid Y) \approx N(\mu_\eta, \Sigma_\eta)$, where the posterior mean $\mu_\eta$ can be estimated from the posterior sample as the arithmetic mean of the Fisher transformed draws of $\eta$, and the posterior covariance matrix $\Sigma_\eta$ can be estimated using least squares. Subsequently, the linear transformation $\xi = T\eta$ is used, with $T$ obtained by stacking $R_t^E$ on top of $R_t$ as in Lemma 1. Approximate normality also holds for $\xi$, with $\pi(\xi \mid Y) \approx N(T\mu_\eta, T\Sigma_\eta T')$. Therefore, the posterior density can be estimated by plugging in $\xi^E = r_t^E$ in the multivariate normal density $\pi(\xi^E \mid Y) \approx N(R_t^E \mu_\eta, R_t^E \Sigma_\eta R_t^{E\prime})$. The conditional posterior probability can also be efficiently computed using the normal approximation. First

the inequality constraints are rewritten via a linear transformation, e.g., $\Pr(\eta_{21} < \eta_{31} < \eta_{32}) = \Pr(\zeta_1 > 0, \zeta_2 > 0) = \Pr(\zeta_1 > 0 \mid \zeta_2 > 0) \Pr(\zeta_2 > 0)$, where $(\zeta_1, \zeta_2) = (\eta_{31} - \eta_{21}, \eta_{32} - \eta_{31})$ (Mulder, 2016). The conditional probabilities can then be estimated efficiently from MCMC output (Morey et al., 2011; Gu et al., 2017).

5.2 Computation of the prior probability and prior density

Similarly to the posterior components, the conditional prior probability and prior density in (17) are computed by first approximating the unconstrained uniform prior using a prior sample. This can be done using the algorithm of Joe (2006). The algorithm is provided in Appendix B for completeness. To avoid the Borel-Kolmogorov paradox in the case of equality constraints (Wetzels et al., 2010), the Fisher transformation is also applied to the prior draws of the $\rho$'s, resulting in prior draws for the corresponding $\eta$'s. As for the posterior, a linear transformation is then applied to obtain prior draws for $\xi$, i.e., $\xi = T\eta$. Unlike the posterior, the prior of $\xi$ is not approximately normal. To estimate the prior density of $\xi^E$ at $r^E$ (i.e., the Fisher transformed values of $r_t^E$), we use the fact that

\[
\Pr\left(|\xi_1 - r_1^E| < \tfrac{\delta}{2}, \ldots, |\xi_{q^E} - r_{q^E}^E| < \tfrac{\delta}{2} \,\middle|\, H_u\right) \approx \delta^{q^E} \pi_u(\xi^E = r^E)
\]

for sufficiently small $\delta$. Because a large prior sample can efficiently be obtained without needing MCMC, estimating the above probability as the proportion of draws satisfying the constraints is quite efficient. Thus, the prior density will be estimated as

\[
\hat{\pi}_u(\xi^E = r^E) = \delta^{-q^E} S^{-1} \sum_{s=1}^{S} I\left(|\xi_1^{(s)} - r_1^E| < \tfrac{\delta}{2}, \ldots, |\xi_{q^E}^{(s)} - r_{q^E}^E| < \tfrac{\delta}{2}\right),
\]

for sufficiently large $S$ and some small value for $\delta > 0$. To compute the conditional prior probability, we approximate the prior conditional of $\xi^I$ given $\xi^E = r^E$ with a normal distribution where the mean and covariance matrix are estimated as the arithmetic mean and the least

squares estimate based on the prior draws that satisfy $\xi^E \approx r^E$, respectively. The same approximation is used as for the equality constraints, i.e., $|\xi_q^{(s)} - r_q^E| < \frac{\delta}{2}$, for $q = 1, \ldots, q^E$, where $\xi_q^{(s)}$ is the $s$-th draw of $\xi_q$. A normal approximation is justified for the computation of the conditional prior probability because this probability is not very sensitive to the exact distributional form. For example, the probability that a parameter is larger than 0 is identical for a uniform distribution in $(-1, 1)$ as for a standard normal distribution or for any other distribution that is symmetric around zero. Note that this is not the case for the prior density at 0, which is why a normal approximation was not used for estimating the prior density. Based on the normal approximation, which can be summarized as $\pi_u(\xi^I \mid \xi^E = r^E) \approx N(\mu_{\xi_0}, \Sigma_{\xi_0})$, the same procedure can be used for estimating the conditional prior probability of $R_t^I T^{-1}[r^E, \xi^I] > r_t^I$ as was used for the conditional posterior probability.

6 Performance of the Bayes factor test

To illustrate the behavior of the methodology we consider the following multiple hypothesis test

$H_1$: $\rho_{21} = \rho_{31} = \rho_{32}$
$H_2$: $\rho_{21} > \rho_{31} > \rho_{32}$
$H_3$: neither $H_1$ nor $H_2$,

for a generalized multivariate probit model with $P = 3$ dependent variables of which the first is normally distributed (with variance 1), the second is ordinal with two categories, and the third is ordinal with three categories, for one population (the population index $j$ is therefore omitted) and an intercept matrix $B = (1, 1, 1)$. Data sets were generated for populations where the matrix with the measures of association,

\[
C = \begin{bmatrix} 1 & & \\ \rho_{21} & 1 & \\ \rho_{31} & \rho_{32} & 1 \end{bmatrix},
\]

was parameterized by a single effect size $\rho$, for $\rho = -.7, -.6, \ldots, .6, .7$. Note that for $\rho = 0$, $\rho > 0$, and $\rho < 0$, hypothesis $H_1$, $H_2$, and $H_3$ are true, respectively. Sample sizes of $n = 30$, 100, 500, and 5000 were considered. Equal prior probabilities were set for the hypotheses, i.e., $P(H_1) = P(H_2) = P(H_3)$. For each data set, the Bayes factors of

the constrained hypotheses against the unconstrained hypothesis were first computed using the methodology of Section 5. Note that the Bayes factor of the complement hypothesis $H_3$ against the unconstrained hypothesis can be obtained as the ratio of the posterior and prior probabilities that the order constraints of $H_2$ do not hold. The equality constraints of hypothesis $H_1$ are not evaluated because they have zero probability under $H_3$. The Bayes factors between the constrained hypotheses can then be computed using their transitive relationship. Finally, posterior probabilities for $H_1$, $H_2$, and $H_3$ can be computed using (8). These posterior probabilities are plotted in Figure 4.

The plots show consistent behavior of the posterior probabilities for the hypotheses: as the sample size becomes very large, the posterior probability of the true hypothesis goes to 1 and the posterior probabilities of the incorrect hypotheses go to zero. Furthermore it can be seen that for small data sets with $n = 30$, relatively large effects (approximately $\rho = .5$) need to be observed before either $H_2$ or $H_3$ (depending on the sign of the effect) receives more evidence than the null hypothesis. This implies that a null hypothesis with a uniform prior is better able to predict the data than the alternative hypotheses in the case of moderate effects and small samples. As the sample size grows, only very small to zero effects can best be explained by the null hypothesis, as expected, and larger effects can best be explained by the alternative hypotheses.

7 Software

The methodology is implemented in a Fortran software package to ensure fast computation and general utilization of the new methodology. The software package is referred to as BCT (Bayesian Correlation Testing) and is available for download. The user only needs to specify the model characteristics (such as the number of dependent variables, the measurement level of the dependent variables, and the number of covariates) and the hypotheses with competing equality and order constraints on the measures of association. After running the program, an output file is generated that contains the posterior probabilities of all constrained hypotheses, as well as the complement hypothesis, based on equal prior probabilities for the hypotheses. Furthermore, unconstrained Bayesian estimates of the model parameters and 95%-credibility intervals are provided. A user manual for

BCT can be found in Appendix C.

Figure 4: Posterior probabilities of $H_1: \rho_{21} = \rho_{31} = \rho_{32}$ (solid line), $H_2: \rho_{21} > \rho_{31} > \rho_{32}$ (dotted line), and $H_3$: neither $H_1$ nor $H_2$ (dashed line) for different effects $\rho$ and different sample sizes $n$.

8 Exemplary hypotheses: Associations between life, leisure and relationship satisfaction (revisited)

We now return to our empirical example and evaluate the informative hypotheses that we developed at the beginning of this contribution. Table 3 reports for each set of hypotheses the posterior probabilities that each hypothesis is true after observing the data when assuming equal prior probabilities for the hypotheses. Based on these results we can conclude that for men there is overwhelming evidence of equal partial associations between life satisfaction, leisure

Table 3: Posterior probabilities for the competing hypotheses about partial associations between life satisfaction and domain satisfaction in leisure and relationship.

Hypotheses for men:
$H1_a$: $\rho_{g1y2y1} > \rho_{g1y3y1} > \rho_{g1y3y2}$
$H1_b$: $\rho_{g1y3y1} > \rho_{g1y2y1} > \rho_{g1y3y2}$
$H1_c$: $\rho_{g1y3y1} = \rho_{g1y2y1} = \rho_{g1y3y2}$
$H1_d$: not $H1_a$, nor $H1_b$, nor $H1_c$

Hypotheses for women:
$H2_a$: $\rho_{g2y2y1} > \rho_{g2y3y1} > \rho_{g2y3y2}$
$H2_b$: $\rho_{g2y3y1} > \rho_{g2y2y1} > \rho_{g2y3y2}$
$H2_c$: $\rho_{g2y3y1} = \rho_{g2y2y1} = \rho_{g2y3y2}$
$H2_d$: not $H2_a$, nor $H2_b$, nor $H2_c$

Hypotheses with gender as moderator:
$H3_a$: $\rho_{g1y2y1} > \rho_{g2y2y1}$
$H3_b$: $\rho_{g1y2y1} = \rho_{g2y2y1}$
$H3_c$: $\rho_{g1y2y1} < \rho_{g2y2y1}$
$H4_a$: $\rho_{g1y3y1} > \rho_{g2y3y1}$
$H4_b$: $\rho_{g1y3y1} = \rho_{g2y3y1}$
$H4_c$: $\rho_{g1y3y1} < \rho_{g2y3y1}$

satisfaction, and relationship satisfaction, controlling for differences in mood at survey and self-reported health (97%). The same conclusion is reached for women, for whom the posterior probability of the hypothesis assuming equal partial correlations is estimated at 98%. Finally, we consider the hypotheses that expect either ordered or equal conditional (i.e. with gender as a moderator) partial associations between the satisfaction variables. Here, we see that the evidence is about equally large for the hypotheses that life satisfaction and each type of domain-specific satisfaction relate equally strongly to each other for men and women (about 77%). In summary, the analysis portrays a picture of a large degree of equality of association between satisfaction indicators while holding other variables constant, and little evidence for the expected ordering of partial associations among the variables considered.

9 Concluding remarks

This paper presented a flexible framework for testing statistical hypotheses on the most commonly observed measures of association in social research. The methodology has the following useful properties.

- Constrained hypotheses can be formulated on tetrachoric correlations, polychoric correlations, biserial correlations, polyserial correlations, and product-moment correlations.
- These measures of association can be corrected for certain covariates to avoid spurious relations.
- A broad class of hypotheses can be tested with competing equality and/or order constraints on the measures of association.
- Multiple (more than two) hypotheses can be tested simultaneously in a straightforward manner.
- A simple answer is provided to the research question which hypothesis receives most evidence from the data and how much, using Bayes factors and posterior probabilities.
- The proposed test is consistent, which implies that the posterior probability of the true hypothesis goes to one as the sample size goes to infinity.
- The software package BCT allows social science researchers to easily apply the methodology in real-life examples.
