Bayes factor testing of equality and order constraints on measures of association in social research


arXiv: v1 [stat.ME] 16 Jul 2018

Joris Mulder & John P.T.M. Gelissen
July 17, 2018

Abstract

Measures of association play a central role in the social sciences to quantify the degree of association between the variables of interest. In many applications researchers can translate scientific expectations to hypotheses with equality and/or order constraints on these measures of association. In this paper a Bayes factor test is proposed for testing multiple hypotheses with constraints on the measures of association between ordinal and/or continuous variables, possibly after correcting for certain covariates. This test can be used to obtain a direct answer to the research question of how much evidence there is in the data for a social science theory relative to competing theories. The accompanying software package BCT allows users to apply the methodology in an easy manner. An empirical application from leisure studies about the associations between life, leisure and relationship satisfaction is used to illustrate the methodology.

1 Introduction

In the social sciences, statistical measures that quantify the degree of association between variables of interest are a fundamental tool for making causal inferences from correlational data when the design or the logic of the study makes it plausible that a variable X is an antecedent to a variable Y. The best-known example of such a measure is the Pearson product-moment correlation coefficient r, which expresses the strength of the linear relationship between two continuous variables. If the variables are measured on an ordinal scale, ordinal measures of association such as Spearman's rho are needed to quantify the strength of the linear relationship between the variables. In many analyses, researchers are not only interested in a zero-order association between the variables of interest, but also want to rule out that any association found is the result of the two focal variables having a common cause, which would make the zero-order correlation spurious. A classical tool for this purpose is the partial correlation coefficient, which can be used to measure the linear association between two variables while controlling for other variables.

Many researchers stop after this second step, in which other variables are controlled for or partialled out, and merely conclude that the evidence is for or against a spurious association. But such a basic correlational analysis also has the potential for testing more complex and interesting substantive sociological hypotheses. It may be of interest to find out which variables in such a correlational analysis are most important, based on a more or less informed expectation that the controlled associations follow a meaningful ordering in size and that some other controlled correlations are equal to each other. In this paper a general framework is presented for testing multiple hypotheses with equality and order constraints on measures of association. The current approach builds upon the earlier work of Mulder (2016), who presented a test for order constraints on bivariate correlations between continuous variables. The current paper proposes several important extensions.
First, the proposed method can be used for testing hypotheses with equality constraints, hypotheses with order constraints, as well as hypotheses with combinations of equality and order constraints on the correlations. This extension matters given the importance of the (equality constrained) null hypothesis in scientific research, e.g., an association equals zero, or the associations between all variables are exactly equal. Second, the methodology can be used for testing the association between continuous variables, ordinal variables, and combinations of ordinal and continuous variables. This is particularly relevant in sociological research where variables are commonly measured on an ordinal scale. Table 1 shows an overview of the types of association measures that can be tested using the proposed methodology. Third, the methodology can be used to test constraints on partial correlations by correcting for external covariates. Controlling for external (confounding) covariates is very important to avoid spurious relationships between the variables of interest.

Table 1: Types of measures of association depending on the measurement scale of the variables.

| Scale of Y1 \ Scale of Y2            | Dichotomous | Polychotomous (ordinal categories) | Continuous (interval) |
| Dichotomous                          | Tetrachoric | Polychoric                         | Biserial              |
| Polychotomous (ordinal categories)   |             | Polychoric                         | Polyserial            |
| Continuous (interval)                |             |                                    | Product-moment        |

When testing statistical hypotheses on measures of association, different testing criteria can be used. The Fisherian p-value is perhaps most commonly used for testing a single correlation. In this paper, however, we shall not develop p-values for testing multiple hypotheses with equality and/or order constraints on the measures of association between the variables of interest. There are two important reasons for this. First, p-values cannot be used to directly test nonnested hypotheses with order constraints, such as $H_1: \rho_{21} < \rho_{31} < \rho_{32}$ versus $H_2: \rho_{21} > \rho_{31} > \rho_{32}$; instead, p-values can only be used for testing a precise null hypothesis with only equality constraints against an ordered alternative, or for testing an ordered null hypothesis against the unconstrained alternative (Silvapulle & Sen, 2004). Second, p-value tests are not suitable for testing multiple hypotheses simultaneously; instead, the p-value test is specifically designed to test one specific null hypothesis against one alternative hypothesis. In this paper, however, we do not limit ourselves to this specific scenario because researchers can often formulate multiple scientific expectations on measures of association, as will be exemplified in this paper.

Another important class of criteria for testing statistical hypotheses is the class of information criteria. Well-known examples are the AIC (Akaike, 1973), the BIC (Schwarz, 1978; Raftery, 1995; Wagenmakers, 2007), and the DIC (Spiegelhalter et al., 2002). These information criteria explicitly balance fit and complexity when quantifying which model or hypothesis is best for the data at hand.
Although information criteria can straightforwardly be used for testing multiple hypotheses or models simultaneously (unlike classical p-values), these criteria are not suitable for testing order constrained hypotheses. The reason is that these information criteria incorporate the complexity of a model based on the number of free parameters, which is ill-defined when order constraints are present. As an example, it is unclear how many free parameters are present under the order-constrained hypothesis $H_1: \rho_{21} < \rho_{31} < \rho_{32}$. Therefore information criteria will not be considered further in this paper.

The criterion that will be used is the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995). A useful property of the Bayes factor is its straightforward interpretation as the relative evidence in the data between two hypotheses. Thus, the Bayes factor can also be used to quantify evidence for a null hypothesis (e.g., Rouder et al., 2009), which is not the case for the p-value; the p-value can only be used to falsify the null. Another important property of the Bayes factor is that it can straightforwardly be used for testing multiple hypotheses simultaneously (e.g., Berger & Mortera, 1999). In addition, Bayes factors can be transformed to so-called posterior probabilities of the hypotheses. These posterior probabilities provide a direct answer to the research question of how plausible (in a Bayesian sense) each hypothesis is based on the observed data. Bayes factors are also particularly suitable for testing hypotheses with order constraints. This has been shown by Klugkist et al. (2005) for group means in ANOVA designs, Mulder et al. (2009) for repeated measurement means, Mulder et al. (2010) for multivariate regression models, Klugkist et al. (2010) for contingency tables, Mulder & Fox (2013) for intraclass correlations, and Gu et al. (2014) for structural equation models.
Finally, it is important to note that under very general conditions, Bayes factors and posterior probabilities are consistent, which implies that the evidence for the true hypothesis goes to infinity as the sample size grows to infinity.

In order to compute the Bayes factor for testing a set of hypotheses with constraints on the correlations, two challenges need to be overcome. The first challenge is the specification of the prior of the parameters under each hypothesis. The prior reflects which values of the correlations are most likely a priori. The choice of the prior is particularly important when testing hypotheses with equality constraints on the model parameters. For example, arbitrarily vague priors should not be used when computing Bayes factors because the evidence for the null then becomes arbitrarily large. This is also known as Bartlett's phenomenon (Lindley, 1957; Bartlett, 1957; Jeffreys, 1961). In this paper we propose a prior specification method for the correlations under the hypotheses based on uniform distributions. This prior assumes that every combination of the correlations under each hypothesis is equally likely, which seems a reasonable default setting. Also note that Jeffreys (1961) originally proposed a default Bayes factor for testing a single bivariate correlation using a uniform prior (Ly et al., 2016). The proposed methodology can be seen as a generalization of Jeffreys' original approach.

The second challenge is the computation of the marginal likelihood, a key ingredient of the Bayes factor. To compute the marginal likelihood of a hypothesis we need to compute the integral of the product of the likelihood and prior over the parameter space of the free parameters. In complex settings the computation of this integral can take a lot of time, which limits general utilization of the methodology for applied users. To tackle this problem we first provide a general expression of a Bayes factor for a hypothesis with equality and order constraints on the parameters of interest versus an unconstrained alternative. This result can be used for any testing scenario where the unconstrained posterior and prior have an (approximate) normal distribution under the unconstrained model. The result is used for the current problem of testing equality and order constraints on measures of association. To ensure that the unconstrained posterior is approximately normal, a Fisher transformation is applied to the measures of association. The algorithm for computing Bayes factors and the posterior probabilities for the hypotheses based on the new methodology is implemented in a Fortran software program called BCT (Bayesian Correlation Testing).
The program allows users to test a general class of equality and order constrained hypotheses on measures of association which are commonly observed in social research. A user manual is included.

This paper is structured as follows. In the next section we illustrate some equality and order constrained hypotheses that we develop for associations between life and domain satisfaction as addressed in Quality of Life research. Then we explain the methodology and numerical computational details of the method, and report evidence of the method's performance using a small simulation study. We then return to our empirical example and discuss the results of testing the equality and order constrained hypotheses we developed. Finally, we conclude by summarizing the properties and advantages of applying the method to social science research problems.
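The Fisher transformation mentioned in the Introduction can be illustrated with a minimal Python sketch. It only shows the transformation itself and its inverse; the function names are hypothetical and are not taken from BCT, and the sketch does not reproduce how the paper uses the transformed scale.

```python
import math


def fisher_z(rho: float) -> float:
    """Fisher transformation: maps a correlation in (-1, 1) to the real line."""
    return math.atanh(rho)


def inverse_fisher_z(z: float) -> float:
    """Inverse Fisher transformation: maps a real number back into (-1, 1)."""
    return math.tanh(z)


# The bounded correlation scale is stretched near the boundaries -1 and 1,
# which is what makes a normal approximation on the z scale more plausible.
for rho in (-0.9, 0.0, 0.5, 0.9):
    print(f"rho = {rho:+.2f}  ->  z = {fisher_z(rho):+.4f}")
```

The transformation is monotone, so order constraints between correlations are preserved on the transformed scale.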

2 Exemplary hypotheses: Associations between life, leisure and relationship satisfaction

An important research question in Quality of Life research concerns how a person's satisfaction in certain life domains (e.g., leisure, relationship, or work) relates to overall life satisfaction (Felce & Perry, 1995; Newman et al., 2014). A prevailing hypothesis is that women are much more relational than men (Hook et al., 2003). As these authors point out, women like being connected (i.e., experiencing "we-ness"), doing things together with others, and they place great emphasis on talking and emotional sharing. Men, on the other hand, see togetherness more as an activity than a state of being, as it is for women. They favor interactions that involve "doing" rather than "being". Unlike women, men prefer to have an element of separation included in their relationships with others and the "doing" orientation seems to promote this (Hook et al., 2003, p. 465). Research has furthermore shown that men have on average more leisure time than women (Mattingly & Bianchi, 2003).

Based on these general theoretical ideas and observations concerning gender differences in relational issues and leisure, we anticipate that such systematic differences will also be observed in how satisfaction with one's relationship and leisure is associated with overall life satisfaction, with leisure satisfaction relating more strongly to overall life satisfaction among men and relationship satisfaction relating more strongly to overall life satisfaction among women. Consider the following graphical model about the partial associations between three focal dependent variables that might be included in such an analysis: the degree of life satisfaction, leisure satisfaction, and relationship satisfaction (Figure 1).
In this example we are interested in testing various informative hypotheses about conditional partial associations between the variables concerned, with gender also potentially moderating such partial associations. As can be seen, in this model the associations between the key variables of interest are controlled for differences in self-reported health and mood at survey. We use data from the Dutch LISS panel (Longitudinal Internet Studies for the Social sciences) to test various informative hypotheses about this model.

[Figure 1: Graphical model describing partial associations between life and domain satisfaction variables. Nodes: Life Satisfaction (y1), Leisure Satisfaction (y2), Relationship Satisfaction (y3), Self-reported Health (x1), Mood at Survey (x2).]

In particular, the LISS panel contains the following variables used to operationalize the variables in the conceptual model:

1. Life Satisfaction ($y_1$): respondents were asked to rate the following statement: "I am satisfied with my life"; ordered categorical variable, with 1 = strongly disagree to 7 = strongly agree.
2. Leisure Satisfaction ($y_2$): respondents were asked to indicate how satisfied they are with the way in which they spend their leisure time; continuous variable, with 1 = entirely dissatisfied to 11 = entirely satisfied.
3. Relationship Satisfaction ($y_3$): respondents were asked to indicate how satisfied they are with their current relationship; continuous variable, with 1 = entirely dissatisfied to 11 = entirely satisfied.
4. Self-reported Health ($x_1$): respondents were asked: "How would you describe your health, generally speaking?"; ordered categorical variable, with 1 = poor to 5 = excellent.
5. Mood at Survey ($x_2$): respondents were asked to indicate how they feel at the moment of completing the survey; ordered categorical variable, with 1 = very bad to 7 = very good.
6. Gender (grouping variable $g$): the respondent's gender, with 1 = men and 2 = women.

Several competing informative hypotheses can be formulated about the ordering of, and equalities between, the partial correlations between $y_1$, $y_2$ and $y_3$ conditional on $g$. Here, we consider the following hypotheses:

Hypotheses about partial associations between satisfaction domains for men:

H1a: For men, the partial association between leisure satisfaction and life satisfaction is stronger than the partial association between relationship satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_1 y_2 y_1} > \rho_{g_1 y_3 y_1} > \rho_{g_1 y_3 y_2}$.

H1b: For men, the partial association between relationship satisfaction and life satisfaction is stronger than the partial association between leisure satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_1 y_3 y_1} > \rho_{g_1 y_2 y_1} > \rho_{g_1 y_3 y_2}$.

H1c: For men, the partial associations between life satisfaction, leisure satisfaction, and relationship satisfaction are equal: $\rho_{g_1 y_3 y_1} = \rho_{g_1 y_2 y_1} = \rho_{g_1 y_3 y_2}$.

H1d: The complement hypothesis, under which none of H1a, H1b, or H1c holds.

Hypotheses about partial associations between satisfaction domains for women:

H2a: For women, the partial association between leisure satisfaction and life satisfaction is stronger than the partial association between relationship satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_2 y_2 y_1} > \rho_{g_2 y_3 y_1} > \rho_{g_2 y_3 y_2}$.

H2b: For women, the partial association between relationship satisfaction and life satisfaction is stronger than the partial association between leisure satisfaction and life satisfaction, and in turn this association is stronger than the partial association between leisure satisfaction and relationship satisfaction: $\rho_{g_2 y_3 y_1} > \rho_{g_2 y_2 y_1} > \rho_{g_2 y_3 y_2}$.

H2c: For women, the partial associations between life satisfaction, leisure satisfaction, and relationship satisfaction are equal: $\rho_{g_2 y_3 y_1} = \rho_{g_2 y_2 y_1} = \rho_{g_2 y_3 y_2}$.

H2d: The complement hypothesis, under which none of H2a, H2b, or H2c holds.

Hypotheses about conditional partial associations between life satisfaction and leisure satisfaction, with gender as moderator:

H3a: The partial association between leisure satisfaction and life satisfaction among men is stronger than the partial association among women: $\rho_{g_1 y_2 y_1} > \rho_{g_2 y_2 y_1}$.

H3b: The partial association between leisure satisfaction and life satisfaction among men is equally strong as the partial association among women: $\rho_{g_1 y_2 y_1} = \rho_{g_2 y_2 y_1}$.

H3c: The partial association between leisure satisfaction and life satisfaction among women is stronger than the partial association among men: $\rho_{g_1 y_2 y_1} < \rho_{g_2 y_2 y_1}$.

Hypotheses about conditional partial associations between relationship satisfaction and life satisfaction, with gender as moderator:

H4a: The partial association between relationship satisfaction and life satisfaction among women is stronger than the partial association among men: $\rho_{g_2 y_3 y_1} > \rho_{g_1 y_3 y_1}$.

H4b: The partial association between relationship satisfaction and life satisfaction among men is equally strong as the partial association among women: $\rho_{g_2 y_3 y_1} = \rho_{g_1 y_3 y_1}$.

H4c: The partial association between relationship satisfaction and life satisfaction among men is stronger than the partial association among women: $\rho_{g_2 y_3 y_1} < \rho_{g_1 y_3 y_1}$.

3 Model specification

3.1 The generalized multivariate probit model

A generalized multivariate probit model will be used for modeling combinations of continuous and ordinal dependent variables, which is based on Albert & Chib (1993), Chib & Greenberg (1998), and Boscardin et al. (2008). We consider $G$ independent populations. The $P$-dimensional vector of dependent variables of subject $i$ in population $g$ will be denoted by $y_{ig} = (v_{ig}', c_{ig}')'$, of which the first $P_1$ elements, $v_{ig} = (v_{ig1}, \ldots, v_{igP_1})'$, are continuous normally distributed variables and the remaining $P_2 = P - P_1$ elements, $c_{ig} = (c_{ig1}, \ldots, c_{igP_2})'$, are measured on an ordinal scale, for $i = 1, \ldots, n_g$, and $g = 1, \ldots, G$. Furthermore, we assume that the $p$-th ordinal variable can assume the categories $1, \ldots, K_p$, for $p = 1, \ldots, P_2$.

As is common in Bayesian multivariate probit modeling, a multivariate normal latent variable, denoted by $z_{ig}$, is used for each ordinal vector $c_{ig}$. This implies that the $p$-th ordinal variable of subject $i$ in population $g$ falls in category $k$, i.e., $c_{igp} = k$, if $z_{igp} \in (\gamma_{gp(k-1)}, \gamma_{gpk}]$, for $k = 1, \ldots, K_p$, where $\gamma_{gpk}$ is the upper cut-point of the $k$-th category of the $p$-th ordinal variable in the $g$-th population. In order to ensure identification of the model it is necessary to set $\gamma_{gp0} = -\infty$, $\gamma_{gp1} = 0$, and $\gamma_{gpK_p} = \infty$, for $g = 1, \ldots, G$ and $p = 1, \ldots, P_2$, and to fix the error variances of the latent variables to 1.

As is common in regression models, the mean structure of each dependent variable is assumed to be a linear combination of $K$ external covariates $x_{ig}$. Subsequently, the generalized multivariate probit model can be defined by

$$\begin{bmatrix} v_{ig} \\ z_{ig} \end{bmatrix} \sim N(B_g x_{ig}, \Sigma_g), \quad \text{where} \tag{1}$$

$$\Sigma_g = \mathrm{diag}(\sigma_g, 1_{P_2}) \, C_g \, \mathrm{diag}(\sigma_g, 1_{P_2}), \tag{2}$$

where $1_{P_2}$ is a vector of ones of length $P_2$. In this model $B_g$ is a $P \times K$ matrix with regression coefficients of the $g$-th population where element $(p, k)$ reflects the effect of the $k$-th covariate on the $p$-th dependent variable, for $k = 1, \ldots, K$ and $p = 1, \ldots, P$, the $P_1$ error standard deviations of $v_{ig}$ in population $g$ are contained in $\sigma_g$, and $C_g$ denotes the $P \times P$ correlation matrix of population $g$.
Note that the $(p, q)$-th element of $C_g$ denotes the linear association of the $p$-th and $q$-th dependent variable, for $p \neq q$, while controlling for the covariates. Note that if $K = 1$ and $x_{ig} = 1$, no covariates are incorporated in the model, which implies that the correlations are bivariate correlations.
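To make the latent-variable construction concrete, the following Python sketch simulates data for the smallest mixed case: one continuous variable and one ordinal variable in a single population, with an intercept as the only covariate. The function name, argument names, and parameter values are hypothetical choices for illustration; this is not the estimation algorithm of the paper, only a sketch of the data-generating model in (1) and (2).

```python
import math
import random


def simulate_mixed_pair(n, rho, sigma, free_cutpoints, mean_v=0.0, mean_z=0.0, seed=1):
    """Simulate n pairs (v, c) with v continuous and c ordinal.

    rho            : correlation between v and the latent variable z
    sigma          : error standard deviation of the continuous variable v
    free_cutpoints : increasing cut-points above the fixed cut-point 0
    """
    rng = random.Random(seed)
    # Identification constraints: first cut-point -inf, second fixed at 0, last +inf.
    gamma = [-math.inf, 0.0] + list(free_cutpoints) + [math.inf]
    data = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        v = mean_v + sigma * e1
        # Latent error has variance 1 and correlation rho with the error of v.
        z = mean_z + rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
        # Category k is observed when gamma[k-1] < z <= gamma[k].
        c = next(k for k in range(1, len(gamma)) if gamma[k - 1] < z <= gamma[k])
        data.append((v, c))
    return data


sample = simulate_mixed_pair(n=500, rho=0.5, sigma=2.0, free_cutpoints=[1.0])
print(sample[:5])
```

With one free cut-point the ordinal variable has three categories; the correlation between the continuous variable and the latent variable is the polyserial-type association listed in Table 1.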

3.2 Multiple hypothesis testing on measures of association

In the context of testing constraints on correlations, which is the goal of the current paper, the correlation matrices $C_g$ are of central importance while the parameter matrix $B_g$ and the standard deviations $\sigma_g$ are treated as nuisance parameters. The correlations in $C_g$ are contained in the vector $\rho_g$, e.g.,

$$C_g = \begin{bmatrix} 1 & & \\ \rho_{g21} & 1 & \\ \rho_{g31} & \rho_{g32} & 1 \end{bmatrix} \;\Rightarrow\; \rho_g = (\rho_{g21}, \rho_{g31}, \rho_{g32})'. \tag{3}$$

Furthermore, the correlations over all populations will be combined in the vector $\rho = (\rho_1', \ldots, \rho_G')'$ of length $L = \frac{1}{2} G P (P - 1)$. Similarly we combine the parameter matrices $B_g$ and standard deviations $\sigma_g$ over all $G$ populations in the matrices $B$ and $\sigma$, respectively, and subsequently, the vectorization of $B$ is denoted by the vector $\beta$.

We consider a multiple hypothesis test of $T$ hypotheses $H_1, \ldots, H_T$ of the form

$$H_t: R^E_t \rho = r^E_t, \quad R^I_t \rho > r^I_t, \tag{4}$$

for $t = 1, \ldots, T$, where $[R^E_t \mid r^E_t]$ is a $q^E_t \times (L + 1)$ matrix of coefficients that capture the set of equality constraints under $H_t$ and $[R^I_t \mid r^I_t]$ is a $q^I_t \times (L + 1)$ matrix of coefficients that capture the set of inequality (or order) constraints under $H_t$. In most applications researchers either compare two correlations with each other, e.g., $\rho_{121} > \rho_{131}$, or compare a single correlation to a constant, e.g., $\rho_{121} > .5$. Therefore each row of the matrices $R^E_t$ and $R^I_t$ is either a permutation of $(1, -1, 0, \ldots, 0)$ with the corresponding constant in $r^E_t$ or $r^I_t$ equal to 0, or a permutation of $(\pm 1, 0, \ldots, 0)$ with corresponding constant $r \in (-1, 1)$. As an example, hypothesis H1a: $\rho_{121} > \rho_{131} > \rho_{132}$ from the previous section (with the index labels omitted) would have the following matrix form

$$\text{H1a}: \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \rho_{121} \\ \rho_{131} \\ \rho_{132} \\ \rho_{221} \\ \rho_{231} \\ \rho_{232} \end{bmatrix} > \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
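The matrix form in (4) can be checked numerically. The sketch below encodes the inequality constraints of H1a for two populations with three dependent variables each, and verifies whether a given vector of correlations satisfies them. The helper names and the example values of the correlations are hypothetical and serve only as an illustration.

```python
def dot(row, vec):
    """Inner product of one constraint row with the vector of correlations."""
    return sum(a * b for a, b in zip(row, vec))


def satisfies_hypothesis(rho, R_eq=(), r_eq=(), R_ineq=(), r_ineq=(), tol=1e-9):
    """Check R_eq rho = r_eq (up to tol) and R_ineq rho > r_ineq for one hypothesis."""
    eq_ok = all(abs(dot(row, rho) - c) <= tol for row, c in zip(R_eq, r_eq))
    ineq_ok = all(dot(row, rho) > c for row, c in zip(R_ineq, r_ineq))
    return eq_ok and ineq_ok


# Order of elements: (rho_121, rho_131, rho_132, rho_221, rho_231, rho_232).
R_ineq_H1a = [
    [1, -1, 0, 0, 0, 0],   # rho_121 - rho_131 > 0
    [0, 1, -1, 0, 0, 0],   # rho_131 - rho_132 > 0
]
r_ineq_H1a = [0, 0]

rho_example = [0.5, 0.3, 0.1, 0.2, 0.2, 0.2]   # hypothetical values
print(satisfies_hypothesis(rho_example, R_ineq=R_ineq_H1a, r_ineq=r_ineq_H1a))
```

An equality constraint such as H3b, $\rho_{121} = \rho_{221}$, would be encoded in the same way with the row $(1, 0, 0, -1, 0, 0)$ in the equality matrix and constant 0.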

The allowed parameter space under the constrained $H_t$ will be denoted by $C_t$. In certain parts of the paper we refer to an unconstrained hypothesis, denoted by $H_u$, which does not assume any constraints on the correlations besides the necessary constraints on $\rho$ that ensure that the corresponding correlation matrices are positive definite.

4 Methodology

4.1 Marginal likelihoods, Bayes factors, and posterior probabilities

The Bayes factor is a Bayesian criterion that quantifies the relative evidence in the data between two hypotheses. The Bayes factor of hypothesis $H_1$ versus $H_2$ is defined by the ratio of the marginal likelihoods under the respective hypotheses, i.e.,

$$B_{12} = \frac{m_1(Y)}{m_2(Y)}, \tag{5}$$

where the marginal likelihood is defined by

$$m_t(Y) = \iiint_{C_t} f_t(Y \mid X, \beta, \sigma, \rho) \, \pi_t(\beta, \sigma, \rho) \, d\rho \, d\sigma \, d\beta, \tag{6}$$

where $f_t$ denotes the likelihood under $H_t$, which follows directly from (1) and (2), and $\pi_t$ denotes the prior for the unknown model parameters under $H_t$. The marginal likelihood captures how likely the observed data are under a hypothesis and its respective prior.

One of the main strengths of the Bayes factor is its intuitive interpretation. For example, the Bayes factor is symmetrical in the sense that if, say, $B_{12} = .1$, which implies that $H_1$ receives 10 times less evidence than $H_2$, it follows naturally from (5) that $B_{21} = B_{12}^{-1} = 10$, which implies that $H_2$ receives 10 times more evidence from the data than $H_1$. Furthermore, Bayes factors are transitive in the sense that if $H_1$ received 10 times more evidence than $H_2$, i.e., $B_{12} = 10$, and $H_2$ received 5 times more evidence than $H_3$, i.e., $B_{23} = 5$, it again follows naturally from the definition in (5) that hypothesis $H_1$ received 50 times more evidence than $H_3$ because $B_{13} = B_{12} \times B_{23} = 10 \times 5 = 50$.

Various researchers have provided an indication of how to interpret Bayes factors (e.g., Jeffreys, 1961; Raftery, 1995). For completeness we provide the guidelines that were given by Raftery (1995) in Table 2. These guidelines are helpful for researchers who are new to Bayes factors. We do not recommend using these guidelines as strict rules because researchers should decide for themselves when they feel that the Bayes factor indicates strong evidence.

Table 2: Rough guidelines for interpreting Bayes factors (Raftery, 1995).

| $B_{12}$       | Evidence                         |
| < 1/150        | Very strong evidence for $H_2$   |
| 1/150 to 1/20  | Strong evidence for $H_2$        |
| 1/20 to 1/3    | Positive evidence for $H_2$      |
| 1/3 to 1       | Weak evidence for $H_2$          |
| 1              | No preference                    |
| 1 to 3         | Weak evidence for $H_1$          |
| 3 to 20        | Positive evidence for $H_1$      |
| 20 to 150      | Strong evidence for $H_1$        |
| > 150          | Very strong evidence for $H_1$   |

The Bayes factor can be used to update the prior odds between two hypotheses being true before observing the data to obtain the posterior odds that the hypotheses are true after observing the data according to the following formula, which follows directly from Bayes' theorem:

$$\frac{\Pr(H_1 \mid Y)}{\Pr(H_2 \mid Y)} = B_{12} \times \frac{\Pr(H_1)}{\Pr(H_2)}, \tag{7}$$

where $\Pr(H_t)$ and $\Pr(H_t \mid Y)$ denote the prior and posterior probability that $H_t$ is true, respectively, for $t = 1$ or 2. In the general case of $T$ hypotheses, the posterior hypothesis probabilities can be obtained as follows:

$$\Pr(H_t \mid Y) = \frac{\Pr(H_t) B_{t1}}{\sum_{t'=1}^{T} \Pr(H_{t'}) B_{t'1}}. \tag{8}$$

Posterior hypothesis probabilities are useful because they provide a direct answer to the research question of how plausible each hypothesis is in light of the observed data. Researchers typically find these posterior probabilities easier to interpret than Bayes factors because the posterior probabilities add up to one. It should be noted, however, that in the case of equal prior probabilities for the hypotheses, i.e., $\Pr(H_1) = \Pr(H_2)$, which is the default setting, the posterior odds between two hypotheses correspond exactly to the respective Bayes factor, as can be seen in (7).
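As a small worked illustration of (8), the Python sketch below converts Bayes factors of each hypothesis against a common reference hypothesis into posterior hypothesis probabilities. It reuses the transitivity example from this section ($B_{12} = 10$ and $B_{23} = 5$, hence $B_{13} = 50$); the function name is a hypothetical choice and equal prior probabilities are assumed by default.

```python
def posterior_probabilities(bf_vs_reference, prior_probs=None):
    """Posterior hypothesis probabilities from Bayes factors B_t1, following (8).

    bf_vs_reference : list with B_t1 for t = 1, ..., T (so the first entry is 1)
    prior_probs     : prior probabilities Pr(H_t); equal priors if omitted
    """
    n_hyp = len(bf_vs_reference)
    if prior_probs is None:
        prior_probs = [1.0 / n_hyp] * n_hyp
    weights = [p * b for p, b in zip(prior_probs, bf_vs_reference)]
    total = sum(weights)
    return [w / total for w in weights]


# B_12 = 10 and B_23 = 5 imply B_13 = 50, so B_21 = 1/10 and B_31 = 1/50.
bf_vs_h1 = [1.0, 1.0 / 10.0, 1.0 / 50.0]
probs = posterior_probabilities(bf_vs_h1)
for t, p in enumerate(probs, start=1):
    print(f"Pr(H{t} | Y) = {p:.3f}")
```

With equal prior probabilities the ratio of any two posterior probabilities reproduces the corresponding Bayes factor, in line with (7).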

4.2 Prior specification

In order to compute the marginal likelihoods (and subsequently, the Bayes factors and posterior hypothesis probabilities), prior distributions need to be formulated for the unknown parameters $\beta$, $\sigma$, and $\rho$ under each constrained hypothesis. A prior distribution reflects the plausibility of the possible values of the parameters before observing the data. Under each hypothesis, we set independent priors for the three different types of model parameters, i.e.,

$$\pi_t(\beta, \sigma, \rho) = \pi^N_t(\beta) \times \pi^N_t(\sigma) \times \pi^U_t(\rho) \times I(\rho \in C_t), \tag{9}$$

with noninformative improper priors for the nuisance parameters

$$\pi^N_t(\beta) = 1, \tag{10}$$

$$\pi^N_t(\sigma) = \prod_{g=1}^{G} \sigma_{g,1}^{-1} \cdots \sigma_{g,P_1}^{-1}. \tag{11}$$

It is well known that proper priors (i.e., prior distributions that integrate to one) need to be formulated for the parameters that are tested, i.e., the correlations, in order for the Bayes factor to be well-defined. In this paper we consider a uniform prior for the correlations under a hypothesis in its allowed constrained subspace. This implies that every combination of values for the correlations that satisfies the constraints is equally likely a priori under each hypothesis. The prior for $\rho$ is zero outside the constrained subspace under each hypothesis. Because the constrained parameter space $C_t$ is bounded, a proper uniform prior can be formulated for the correlations under every hypothesis, i.e.,

$$\pi^U_t(\rho) = V_t^{-1} \times I(\rho \in C_t), \tag{12}$$

where the normalizing constant $V_t$ is given by

$$V_t = \int_{C_t} 1 \, d\rho. \tag{13}$$

The normalizing constant $V_t$ can be seen as a measure of the size or volume of the constrained space.

To illustrate the priors under different constrained hypotheses on measures of association, let us consider a model with 3 dependent variables and one population (the correlation matrix was given in (3), where we omit the population index $g$). Furthermore, let us consider the following hypotheses:

$H_u$: $\rho_{21}, \rho_{31}, \rho_{32}$ (the unconstrained hypothesis).
$H_1$: $\rho_{21} = \rho_{31} = \rho_{32}$ (all $\rho$'s are equal).
$H_2$: $\rho_{31} = 0$, $\rho_{21}$, $\rho_{32}$ (only $\rho_{31}$ is restricted to zero).
$H_3$: $\rho_{31} = 0$, $\rho_{21} > \rho_{32}$ ($\rho_{31}$ is restricted to zero and $\rho_{21}$ is larger than $\rho_{32}$).

The unconstrained parameter space of $\rho = (\rho_{21}, \rho_{31}, \rho_{32})$ that results in a positive definite correlation matrix is displayed in Figure 2a (taken from Rousseeuw & Molenberghs (1994) with permission). As noted by Joe (2006), the volume of this 3-dimensional subspace equals $\frac{1}{2}\pi^2 \approx 4.93$. Therefore, the volume under the unconstrained hypothesis $H_u$ is given by $V_u = \frac{1}{2}\pi^2$ in (13), and the unconstrained uniform prior for the correlations equals $\pi^U_u(\rho) = \frac{2}{\pi^2} I(\rho \in C_u)$ in (12).

When all $\rho$'s are equal as under $H_1$, the common correlation, say, $\rho$, must lie in the interval $(-\frac{1}{2}, 1)$ to ensure positive definiteness (e.g., Mulder & Fox). Thus, the size of the parameter space of $\rho$ corresponds to the length of the interval, which is $V_1 = \frac{3}{2}$. Therefore, the uniform prior for $\rho$ under $H_1$ corresponds to $\pi^U_1(\rho) = \frac{2}{3} I(\rho \in (-\frac{1}{2}, 1))$, which is plotted in Figure 2b. The diagonal $\rho_{21} = \rho_{31} = \rho_{32}$ is also plotted in Figure 2a as a dashed line, where the thick part lies within the positive definite subspace $C_u$.

When $\rho_{31}$ is restricted to zero as under $H_2$, the allowed parameter space for $(\rho_{21}, \rho_{32})$ that results in a positive definite correlation matrix must satisfy $\rho_{21}^2 + \rho_{32}^2 < 1$, i.e., the interior of a circle with radius 1 (Rousseeuw & Molenberghs, 1994). Therefore, the uniform prior under $H_2: \rho_{31} = 0$ is given by $\pi_2(\rho_{21}, \rho_{32}) = \frac{1}{\pi} I(\rho_{21}^2 + \rho_{32}^2 < 1)$ (Figure 2c) because a circle of radius 1 has an area of $V_2 = \pi$.

When $\rho_{31}$ is restricted to zero and $\rho_{21}$ is larger than $\rho_{32}$ as under $H_3$, the subspace is half as large as under $H_2$, so that $V_3 = \frac{\pi}{2}$. Therefore the uniform prior density for the free parameters under $H_3$ is twice as large as the prior under $H_2$ to ensure the prior integrates to one. Thus, the uniform prior under $H_3$ is given by $\pi_3(\rho_{21}, \rho_{32}) = \frac{2}{\pi} I(\rho_{21}^2 + \rho_{32}^2 < 1 \ \&\ \rho_{21} > \rho_{32})$ (Figure 2d).
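The volumes of the constrained spaces discussed above can be checked by simple Monte Carlo integration. The following Python sketch is only a numerical sanity check under the assumption of one population with three dependent variables; it is not how BCT computes anything, and the function names, seed, and number of draws are hypothetical choices. Positive definiteness of the 3 x 3 correlation matrix is checked through its leading principal minors.

```python
import math
import random


def is_positive_definite(r21, r31, r32):
    """Positive definiteness of a 3x3 correlation matrix via leading principal minors."""
    det = 1.0 + 2.0 * r21 * r31 * r32 - r21 ** 2 - r31 ** 2 - r32 ** 2
    return 1.0 - r21 ** 2 > 0.0 and det > 0.0


def estimate_volumes(n_draws=200_000, seed=2018):
    """Monte Carlo estimates of V_u, V_2 and V_3 from uniform draws on [-1, 1]."""
    rng = random.Random(seed)
    hits_u = hits_2 = hits_3 = 0
    for _ in range(n_draws):
        r21 = rng.uniform(-1.0, 1.0)
        r31 = rng.uniform(-1.0, 1.0)
        r32 = rng.uniform(-1.0, 1.0)
        if is_positive_definite(r21, r31, r32):      # unconstrained space C_u
            hits_u += 1
        if is_positive_definite(r21, 0.0, r32):      # H_2: rho_31 = 0
            hits_2 += 1
            if r21 > r32:                            # H_3: additionally rho_21 > rho_32
                hits_3 += 1
    v_u = 8.0 * hits_u / n_draws   # the cube [-1, 1]^3 has volume 8
    v_2 = 4.0 * hits_2 / n_draws   # the square [-1, 1]^2 has area 4
    v_3 = 4.0 * hits_3 / n_draws
    return v_u, v_2, v_3


v_u, v_2, v_3 = estimate_volumes()
print(f"V_u estimate: {v_u:.3f}   (pi^2 / 2 = {math.pi ** 2 / 2:.3f})")
print(f"V_2 estimate: {v_2:.3f}   (pi       = {math.pi:.3f})")
print(f"V_3 estimate: {v_3:.3f}   (pi / 2   = {math.pi / 2:.3f})")
```

The volume $V_1 = 3/2$ under $H_1$ needs no simulation because it is the length of the interval on which the common correlation yields a positive definite matrix.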

[Figure 2, panels (a) to (d); axes: $\rho_{21}$, $\rho_{31}$, $\rho_{32}$ in panel (a), $\rho$ in panel (b), and $\rho_{21}$, $\rho_{32}$ in panels (c) and (d).]

Figure 2: (a) Graphical representation of the subspace of $(\rho_{21}, \rho_{31}, \rho_{32})$ for which the 3-dimensional correlation matrix is positive definite (taken from Rousseeuw & Molenberghs (1994) with permission). The thick diagonal line from $(-\frac{1}{2}, -\frac{1}{2}, -\frac{1}{2})$ to $(1, 1, 1)$ represents the correlations that satisfy $\rho_{21} = \rho_{31} = \rho_{32}$ and result in a positive definite correlation matrix. (b) Uniform prior $\pi^U_1$ for the common correlation $\rho$ under $H_1: \rho_{21} = \rho_{31} = \rho_{32}$ in the allowed region $C_1 = \{\rho \mid \rho \in (-\frac{1}{2}, 1)\}$. (c) Uniform prior $\pi^U_2$ for the free parameters under $H_2: \rho_{31} = 0$ in the allowed region $C_2 = \{(\rho_{21}, \rho_{32}) \mid \rho_{21}^2 + \rho_{32}^2 < 1\}$. (d) Uniform prior for the free correlations under $H_3: \rho_{31} = 0, \rho_{21} > \rho_{32}$ in the allowed region $C_3 = \{(\rho_{21}, \rho_{32}) \mid \rho_{21}^2 + \rho_{32}^2 < 1, \rho_{21} > \rho_{32}\}$.

4.3 An expression of the Bayes factor for testing (in)equality constrained hypotheses

In order to compute the marginal likelihood under each constrained hypothesis via (6) using the constrained uniform prior in (12), a complex multivariate integral must be solved. This endeavor can be somewhat simplified by using the fact that the uniform prior for the correlations under each hypothesis $H_t$ can be written as a truncation of a uniform unconstrained prior under the unconstrained hypothesis $H_u$, i.e.,

$$\pi^U_t(\beta, \sigma, \rho) = \frac{V_u}{V_t} \times \pi^U_u(\beta, \sigma, \rho) \times I(\rho \in C_t), \tag{14}$$

where the unconstrained prior is given by

$$\pi^U_u(\beta, \sigma, \rho) = V_u^{-1} \times \prod_{g=1}^{G} \sigma_{g,1}^{-1} \cdots \sigma_{g,P_1}^{-1} \times I(\rho \in C_u), \tag{15}$$

where $V_u = \int_{C_u} 1 \, d\rho$, which can be interpreted as the volume of the subspace of $\rho$ that results in positive definite correlation matrices under all populations. Note that the normalizing constant in (14) is equal to the reciprocal of the unconstrained prior integrated over the constrained space $C_t$,

$$\int_{C_t} \pi^U_u(\rho) \, d\rho = V_u^{-1} \int_{C_t} 1 \, d\rho = \frac{V_t}{V_u}. \tag{16}$$

This relationship between each constrained prior and the unconstrained prior allows us to use the following general result for computing the Bayes factor of a hypothesis with certain constraints on the parameters of interest, say, $\theta$, against a larger unconstrained hypothesis in which the constrained hypothesis is nested.

Lemma 1. Consider a model where a hypothesis is formulated with equality and inequality constraints on the parameter vector $\theta$ of length $Q$ of the form $H_t: R^E_t \theta = r^E_t, R^I_t \theta > r^I_t$, where the $q^E \times Q$ matrix $R^E_t$ has rank $q^E \leq Q$, and a larger unconstrained hypothesis $H_u$ in which $H_t$ is nested. Let the vector of nuisance parameters in the model be denoted by $\phi$. If the prior of $\theta$ under $H_t$ is defined as the truncation of a proper prior for $\theta$ under $H_u$ in its constrained subspace, then the Bayes factor of $H_t$ against $H_u$ can be written as

$$B_{tu} = \frac{\Pr\left(R^I_t T^{-1} \begin{bmatrix} r^E_t \\ \xi^I \end{bmatrix} > r^I_t \,\middle|\, \xi^E = r^E_t, H_u, Y\right)}{\Pr\left(R^I_t T^{-1} \begin{bmatrix} r^E_t \\ \xi^I \end{bmatrix} > r^I_t \,\middle|\, \xi^E = r^E_t, H_u\right)} \times \frac{\pi_u(\xi^E = r^E_t \mid Y)}{\pi_u(\xi^E = r^E_t)}, \tag{17}$$

with $\xi^E = R^E_t \theta$, $\xi^I = R_t \theta$, where $R_t$ is an arbitrary $(Q - q^E) \times Q$ matrix such that the $Q \times Q$ matrix $T = \begin{bmatrix} R^E_t \\ R_t \end{bmatrix}$ has rank $Q$.
The conditional probabilities in the numerator and denominator in the first factor in (17) are computed based on the conditional posterior and prior of ξ I ξ E = r E t under H u, respectively. 17
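To make the prior probability in the denominator of the first factor of (17) concrete, consider an order hypothesis without equality constraints, $\rho_{21} > \rho_{31} > \rho_{32}$. By (16), its prior probability under the unconstrained uniform prior equals $V_t / V_u$. Because the positive-definiteness condition is symmetric in the three correlations, each of the 3! orderings is equally likely, so this probability is 1/6. The sketch below checks this by simulation. It assumes simple rejection sampling from the cube rather than the algorithm of Joe (2006), and all names are illustrative.

```python
import random


def draw_uniform_correlations(rng):
    """Draw (r21, r31, r32) uniformly from C_u by rejection from the cube (-1, 1)^3."""
    while True:
        r21, r31, r32 = (rng.uniform(-1.0, 1.0) for _ in range(3))
        # Determinant of the 3x3 correlation matrix. For |r21| < 1 a positive
        # determinant is equivalent to positive definiteness.
        det = 1 - r21**2 - r31**2 - r32**2 + 2 * r21 * r31 * r32
        if det > 0:
            return r21, r31, r32


def prior_probability(constraint, n_draws=100_000, seed=2018):
    """Proportion of uniform prior draws that satisfy a constraint function."""
    rng = random.Random(seed)
    hits = sum(constraint(*draw_uniform_correlations(rng)) for _ in range(n_draws))
    return hits / n_draws


def order_constraint(r21, r31, r32):
    """Inequality constraints of the hypothesis rho21 > rho31 > rho32."""
    return r21 > r31 > r32


complexity = prior_probability(order_constraint)
print(f"Pr(rho21 > rho31 > rho32 | H_u) ~ {complexity:.3f}  (exact: 1/6 ~ 0.167)")
```

Counting draws works well here because prior draws are cheap; the same counting strategy applied to posterior draws can be inefficient when the constraints are rarely satisfied.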

Proof. The derivation combines the results of Dickey (1971), Klugkist et al. (2005), Pericchi et al. (2008), Wetzels et al. (2010), Mulder (2014a), and Gu et al. (2017). A proof is given in Appendix A.

Note that the second factor in (17) is equal to the well-known Savage-Dickey density ratio (Dickey, 1971; Verdinelli & Wasserman, 1995; Wetzels et al., 2010). The ratio of posterior and prior probabilities in the first factor was also observed in Klugkist et al. (2005) when there are no equality constraints under $H_t$. The four components of (17) can be interpreted as follows. The conditional posterior probability, i.e., the numerator of the first factor, is a measure of fit of the order constraints of $H_t$ relative to $H_u$. The marginal posterior density, i.e., the numerator of the second factor, is a measure of fit of the equality constraints of $H_t$ relative to $H_u$. The conditional prior probability, i.e., the denominator of the first factor, is a measure of complexity of the order constraints of $H_t$ relative to $H_u$. The marginal prior density, i.e., the denominator of the second factor, is a measure of complexity of the equality constraints of $H_t$ relative to $H_u$; see also Mulder (2014a) and Gu et al. (2017).

That the equality constraints, and the inequality constraints conditional on the equality constraints, can be evaluated separately was shown by Pericchi et al. (2008). The contribution here is that the Bayes factor is derived for the general case of a set of linear equality constraints and a set of linear inequality constraints where the prior under $H_t$ is a truncation of the unconstrained prior. We have not seen (17) elsewhere in the literature. In the current paper, the parameter vector $\theta$ corresponds to the vector of correlations $\rho$, and the vector of nuisance parameters $\phi$ contains $B$ and $\sigma$.
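The fit and complexity reading of (17) can be illustrated with a small helper that multiplies the two ratios. This is only a sketch: the function name and all numerical inputs are hypothetical and are not results from the paper.

```python
def bayes_factor_tu(post_prob=1.0, prior_prob=1.0, post_dens=1.0, prior_dens=1.0):
    """Combine the four components of (17) into B_tu.

    post_prob / prior_prob : fit / complexity of the order constraints
    post_dens / prior_dens : fit / complexity of the equality constraints
    A component left at 1.0 means that type of constraint is absent.
    """
    if prior_prob <= 0 or prior_dens <= 0:
        raise ValueError("prior probability and prior density must be positive")
    return (post_prob / prior_prob) * (post_dens / prior_dens)


# Order constraints only (hypothetical numbers): the posterior probability of
# the ordering is 0.90 and its prior probability is 1/6.
b_order = bayes_factor_tu(post_prob=0.90, prior_prob=1 / 6)

# Equality constraints only (hypothetical numbers): Savage-Dickey density ratio.
b_equal = bayes_factor_tu(post_dens=2.0, prior_dens=0.4)

print(f"order-only Bayes factor:    {b_order:.2f}")
print(f"equality-only Bayes factor: {b_equal:.2f}")
```

A hypothesis is rewarded when its constraints fit the data (large numerators) and when it makes a precise prediction (small denominators).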
The computation of the Bayes factor in (17) for a constrained hypothesis $H_t$ on measures of association is discussed in the following section.

5 Numerical computation

In this section we discuss a general numerical method to compute the elements in (17) for a hypothesis $H_t$ with equality and/or inequality constraints on the measures of association in the generalized multivariate probit model in Section 3.1 using the uniform prior discussed in Section 4.2. The computation of the posterior parts in the numerators in (17) is discussed first, followed by the prior parts in the denominators.

5.1 Computation of the posterior probability and posterior density

To compute the posterior probability in the numerator of the first factor in (17) and the posterior density in the numerator of the second factor in (17) under the unconstrained hypothesis, the unconstrained posterior is first approximated using a posterior sample from the unconstrained generalized multivariate probit model (Section 3.1). This sample can be obtained using an MCMC sampler. Because the unconstrained uniform prior in (15) is not conjugate, Metropolis-Hastings steps are needed when sampling the model parameters. An MCMC sampler was written based on the algorithm of Boscardin et al. (2008), where the correlation matrices are efficiently sampled using the Metropolis-Hastings algorithm of Liu & Daniels (2006). The cut-off parameters $\gamma$ for modeling ordinal dependent variables are sampled using uniform distributions.

Computing the posterior probability in (17) as the proportion of unconstrained draws that satisfy the inequality constraints can be very inefficient (Mulder et al., 2012). For this reason an alternative method is used. First note that the sampling distribution of the Fisher transformed sample correlation $r$ given a population correlation $\rho$ is approximately normal. Furthermore, the population correlation $\rho$ and the sample correlation $r$ have a similar role in the integrated likelihood of $r$ given $\rho$ (Johnson & Kotz, 1970; Mulder, 2016). Because a uniform prior is used for $\rho$, which is relatively vague, the posterior of $\rho$ is dominated by the likelihood. Therefore, the posterior of the Fisher transformed parameters, which will be denoted by $\eta$ (see footnote 1), is approximately normal.
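As a minimal sketch of how such a normal approximation can be used, the code below Fisher transforms draws of two correlations and evaluates a normal approximation of the posterior density of their difference at zero, which is the quantity needed for an equality constraint such as $\rho_{21} = \rho_{31}$. The draws are synthetic stand-ins for MCMC output, and the sample mean and sample standard deviation are used as moment estimates; both are assumptions made for illustration only.

```python
import math
import random
import statistics


def fisher_z(rho):
    """Fisher transformation 0.5 * log((1 + rho) / (1 - rho)), i.e. atanh(rho)."""
    return math.atanh(rho)


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_density_of_difference(rho_a_draws, rho_b_draws, value=0.0):
    """Normal approximation of the density of eta_a - eta_b evaluated at `value`."""
    diffs = [fisher_z(a) - fisher_z(b) for a, b in zip(rho_a_draws, rho_b_draws)]
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return normal_pdf(value, mean, sd)


# Synthetic "posterior draws" (NOT real MCMC output): normal on the Fisher
# scale with mean 0.25 and standard deviation 0.15, mapped back to correlations.
rng = random.Random(40)
rho21_draws = [math.tanh(rng.gauss(0.25, 0.15)) for _ in range(10_000)]
rho31_draws = [math.tanh(rng.gauss(0.25, 0.15)) for _ in range(10_000)]

density_at_zero = posterior_density_of_difference(rho21_draws, rho31_draws)
print(f"approximate posterior density of eta21 - eta31 at 0: {density_at_zero:.2f}")
```

Because the Fisher transformation is strictly increasing, order and equality constraints between correlations carry over unchanged to the transformed parameters.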
Footnote 1: Let $\rho_{p_1 p_2}$ be the association between the $p_1$-th and $p_2$-th dependent variable. Then $\eta_{p_1 p_2} = \tfrac{1}{2} \log\left(\frac{1 + \rho_{p_1 p_2}}{1 - \rho_{p_1 p_2}}\right)$ is the Fisher transformed association. Note that the constraints under $H_t$ are identical for $\eta$ because the Fisher transformation is monotonically increasing.

To illustrate the accuracy of the normal approximation, data of size $n = 40$ were generated from the generalized multivariate probit model of dimension $P = 3$, where the first dependent variable was normally distributed, the second dependent variable was ordinal with two categories, and the third dependent variable was ordinal with three categories. The population correlations were set to $\rho = (\rho_{21}, \rho_{31}, \rho_{32}) = (.25, .25, 0)$. A posterior sample of size 10,000 was obtained for $\rho$, which was then transformed to the Fisher transformed parameters $\eta = (\eta_{21}, \eta_{31}, \eta_{32})$. The posterior draws of $(\eta_{21}, \eta_{31})$ are plotted in Figure 3, including a contour plot of the bivariate density as well as density estimates of the univariate posteriors. Normal approximations are also displayed (dashed lines). As can be seen, the normal approximation is very accurate. For larger samples, which are typical in social research, the normal approximation will be even better.

Figure 3: Scatter plot of posterior draws of $(\eta_{21}, \eta_{31})$ (red dots) with additional contour plot and univariate density plots (solid lines) and normal approximations (dashed lines).

Thus, we can write $\pi(\eta \mid Y) \approx N(\mu_\eta, \Sigma_\eta)$, where the posterior mean $\mu_\eta$ can be estimated from the posterior sample as the arithmetic mean of the Fisher transformed draws of $\eta$, and the posterior covariance matrix $\Sigma_\eta$ can be estimated using least squares. Subsequently, the linear transformation $\xi = T\eta$ is used, with $T$ as in Lemma 1. Approximate normality also holds for $\xi$, with $\pi(\xi \mid Y) \approx N(T\mu_\eta, T\Sigma_\eta T')$. Therefore, the posterior density can be estimated by plugging in $\xi^E = r_t^E$ in the multivariate normal density $\pi(\xi^E \mid Y) \approx N(R_t^E \mu_\eta, R_t^E \Sigma_\eta R_t^{E\prime})$.

The conditional posterior probability can also be efficiently computed using the normal approximation. First the inequality constraints are rewritten via a linear transformation, e.g.,

\[
\Pr(\eta_{21} < \eta_{31} < \eta_{32}) = \Pr(\zeta_1 > 0, \zeta_2 > 0) = \Pr(\zeta_1 > 0 \mid \zeta_2 > 0)\, \Pr(\zeta_2 > 0),
\]

where $(\zeta_1, \zeta_2) = (\eta_{31} - \eta_{21}, \eta_{32} - \eta_{31})$ (Mulder, 2016). The conditional probabilities can then be estimated efficiently from MCMC output (Morey et al., 2011; Gu et al., 2017).

5.2 Computation of the prior probability and prior density

Similar to the posterior components, the conditional prior probability and prior density in (17) are computed by first approximating the unconstrained uniform prior using a prior sample. This can be done using the algorithm of Joe (2006). The algorithm is provided in Appendix B for completeness. To avoid the Borel-Kolmogorov paradox in the case of equality constraints (Wetzels et al., 2010), the Fisher transformation is also applied to the prior draws of the $\rho$'s, resulting in prior draws for the corresponding $\eta$'s. As for the posterior, a linear transformation is then applied to the prior draws, i.e., $\xi = T\eta$. Unlike the posterior, the prior of $\xi$ is not approximately normal. To estimate the prior density of $\xi^E$ at $r_t^E$ (i.e., the Fisher transformed values of $r_t^E$), with elements $r_1^E, \ldots, r_{q^E}^E$, we use the fact that

\[
\Pr\left(|\xi_1 - r_1^E| < \tfrac{\delta}{2}, \ldots, |\xi_{q^E} - r_{q^E}^E| < \tfrac{\delta}{2} \,\Big|\, H_u\right) \approx \delta^{q^E}\, \pi_u(\xi^E = r^E)
\]

for sufficiently small $\delta$. Because a large prior sample can efficiently be obtained without needing MCMC, estimating the above probability as the proportion of draws satisfying the constraints is quite efficient. Thus, the prior density will be estimated as

\[
\hat{\pi}_u(\xi^E = r^E) = \delta^{-q^E} S^{-1} \sum_{s=1}^{S} I\left(|\xi_1^{(s)} - r_1^E| < \tfrac{\delta}{2}, \ldots, |\xi_{q^E}^{(s)} - r_{q^E}^E| < \tfrac{\delta}{2}\right),
\]

for sufficiently large $S$ and some small value $\delta > 0$.

To compute the conditional prior probability, we approximate the conditional prior of $\xi^I$ given $\xi^E = r^E$ with a normal distribution where the mean and covariance matrix are estimated as the arithmetic mean and the least squares estimate based on the prior draws that satisfy $\xi^E \approx r^E$, respectively. The same approximation is used as for the equality constraints, i.e., $|\xi_q^{(s)} - r_q^E| < \tfrac{\delta}{2}$, for $q = 1, \ldots, q^E$, where $\xi_q^{(s)}$ is the $s$-th draw of $\xi_q$. A normal approximation is justified for the computation of the conditional prior probability because this probability is not very sensitive to the exact distributional form. For example, the probability that a parameter is larger than 0 is the same for a uniform distribution on $(-1, 1)$ as for a standard normal distribution or for any other distribution that is symmetric around zero. Note that this is not the case for the prior density at 0, which is why a normal approximation was not used for estimating the prior density. Based on the normal approximation, which can be summarized as $\pi_u(\xi^I \mid \xi^E \approx r^E) \approx N(\mu_{\xi 0}, \Sigma_{\xi 0})$, the same procedure can be used for estimating the conditional prior probability of $R_t^I T^{-1}[r^{E\prime}, \xi^{I\prime}]' > r_t^I$ as was used for the conditional posterior probability.

6 Performance of the Bayes factor test

To illustrate the behavior of the methodology we consider the following multiple hypothesis test

$H_1: \rho_{21} = \rho_{31} = \rho_{32}$
$H_2: \rho_{21} > \rho_{31} > \rho_{32}$
$H_3:$ not $H_1, H_2$,

for a generalized multivariate probit model with $P = 3$ dependent variables of which the first is normally distributed (with variance 1), the second is ordinal with two categories, and the third is ordinal with three categories, for one population (the population index $j$ is therefore omitted) and an intercept matrix $B = (1, 1, 1)$. Data sets were generated for populations in which the matrix with the measures of association,

\[
C = \begin{pmatrix} 1 & & \\ \rho_{21} & 1 & \\ \rho_{31} & \rho_{32} & 1 \end{pmatrix},
\]

was determined by a single effect $\rho$, for $\rho = -.7, -.6, \ldots, .6, .7$. Note that for $\rho = 0$, $\rho > 0$, and $\rho < 0$, hypothesis $H_1$, $H_2$, and $H_3$ are true, respectively. Sample sizes of $n = 30$, 100, 500, and 5000 were considered. Equal prior probabilities were set for the hypotheses, i.e., $P(H_1) = P(H_2) = P(H_3)$.

For each data set, the Bayes factors of the constrained hypotheses against the unconstrained hypothesis were first computed using the methodology of Section 5. Note that the Bayes factor of the complement hypothesis $H_3$ against the unconstrained hypothesis can be obtained as the ratio of the posterior and prior probabilities that the order constraints of $H_2$ do not hold. The equality constraints of hypothesis $H_1$ are not evaluated because they have zero probability under $H_3$. The Bayes factors between the constrained hypotheses can then be computed using their transitive relationship. Finally, posterior probabilities for $H_1$, $H_2$, and $H_3$ can be computed using (8). These posterior probabilities are plotted in Figure 4.

The plots show consistent behavior of the posterior probabilities for the hypotheses: as the sample size becomes very large, the posterior probability of the true hypothesis goes to 1 and the posterior probabilities of the incorrect hypotheses go to zero. Furthermore, it can be seen that for small data sets with $n = 30$, relatively large effects (approximately $|\rho| = .5$) need to be observed before either $H_2$ or $H_3$ (depending on the sign of the effect) receives more evidence than the null hypothesis $H_1$. This implies that a null hypothesis with a uniform prior is better able to predict the data than the alternative hypotheses in the case of moderate effects and small samples. As the sample size grows, only very small to zero effects can best be explained by the null hypothesis, as expected, and larger effects can best be explained by the alternative hypotheses.

7 Software

The methodology is implemented in a Fortran software package to ensure fast computation and general utilization of the new methodology.
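Independently of the Fortran implementation, the last steps of the procedure in Section 6 are simple enough to sketch in a few lines. The code below is not BCT code. It assumes that (8), which is not reproduced in this section, is the standard normalization of Bayes factors against $H_u$ weighted by the prior probabilities, and all numerical inputs are hypothetical.

```python
def complement_bayes_factor(post_prob_order, prior_prob_order):
    """Bayes factor of 'order constraints do not hold' against H_u."""
    return (1.0 - post_prob_order) / (1.0 - prior_prob_order)


def posterior_probabilities(bayes_factors, prior_probs=None):
    """Turn Bayes factors against H_u into posterior hypothesis probabilities.

    Assumes the usual formula P(H_t | Y) = B_tu P(H_t) / sum_t' B_t'u P(H_t');
    equal prior probabilities are used when none are supplied.
    """
    names = list(bayes_factors)
    if prior_probs is None:
        prior_probs = {h: 1.0 / len(names) for h in names}
    weights = {h: bayes_factors[h] * prior_probs[h] for h in names}
    total = sum(weights.values())
    return {h: weights[h] / total for h in names}


# Hypothetical inputs for H1 (equality), H2 (order) and H3 (complement).
b_1u = 3.0 / 0.5            # posterior density / prior density at the equality
b_2u = 0.60 / (1.0 / 6.0)   # posterior / prior probability of the ordering
b_3u = complement_bayes_factor(0.60, 1.0 / 6.0)

post = posterior_probabilities({"H1": b_1u, "H2": b_2u, "H3": b_3u})
for name, prob in post.items():
    print(f"P({name} | Y) = {prob:.3f}")
```

Because every Bayes factor is taken against the same unconstrained hypothesis, the Bayes factor between any two constrained hypotheses is simply the ratio of their entries, for example `b_1u / b_2u`.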
The software package is referred to as BCT (Bayesian Correlation Testing) and is available for download. The user only needs to specify the model characteristics (such as the number of dependent variables, the measurement level of the dependent variables, and the number of covariates) and the hypotheses with competing equality and order constraints on the measures of association. After running the program, an output file is generated that contains the posterior probabilities of all constrained hypotheses, as well as the complement hypothesis, based on equal prior probabilities for the hypotheses. Furthermore, unconstrained Bayesian estimates of the model parameters and 95% credibility intervals are provided. A user manual for BCT can be found in Appendix C.

Figure 4: Posterior probabilities of $H_1: \rho_{21} = \rho_{31} = \rho_{32}$ (solid line), $H_2: \rho_{21} > \rho_{31} > \rho_{32}$ (dotted line), and $H_3$: not $H_1, H_2$ (dashed line) for different effects $\rho$ and different sample sizes $n$.

8 Exemplary hypotheses: Associations between life, leisure and relationship satisfaction (revisited)

We now return to our empirical example and evaluate the informative hypotheses that we developed at the beginning of this contribution. Table 3 reports for each set of hypotheses the posterior probabilities that each hypothesis is true after observing the data when assuming equal prior probabilities for the hypotheses.

Based on these results we can conclude that for men there is overwhelming evidence of equal partial associations between life satisfaction, leisure satisfaction, and relationship satisfaction, controlling for differences in mood at survey and self-reported health (97%). The same conclusion is reached for women, for whom the posterior probability of the hypothesis assuming equal partial correlations is estimated at 98%. Finally, we consider the hypotheses that expect either ordered or equal conditional (i.e., with gender as a moderator) partial associations between the satisfaction variables. Here, we see that the evidence is about equally large for the hypotheses that life satisfaction and each type of domain-specific satisfaction relate equally strongly to each other for men and women (about 77%). In summary, the analysis portrays a picture of a large degree of equality of association between satisfaction indicators while holding other variables constant, and little evidence for the expected ordering of partial associations among the variables considered.

Table 3: Posterior probabilities for the competing hypotheses about partial associations between life satisfaction and domain satisfaction in leisure and relationship.

Hypotheses for men:
H1a: $\rho_{g1y2y1} > \rho_{g1y3y1} > \rho_{g1y3y2}$
H1b: $\rho_{g1y3y1} > \rho_{g1y2y1} > \rho_{g1y3y2}$
H1c: $\rho_{g1y3y1} = \rho_{g1y2y1} = \rho_{g1y3y2}$
H1d: not H1a, nor H1b, nor H1c

Hypotheses for women:
H2a: $\rho_{g2y2y1} > \rho_{g2y3y1} > \rho_{g2y3y2}$
H2b: $\rho_{g2y3y1} > \rho_{g2y2y1} > \rho_{g2y3y2}$
H2c: $\rho_{g2y3y1} = \rho_{g2y2y1} = \rho_{g2y3y2}$
H2d: not H2a, nor H2b, nor H2c

Hypotheses with gender as moderator:
H3a: $\rho_{g1y2y1} > \rho_{g2y2y1}$
H3b: $\rho_{g1y2y1} = \rho_{g2y2y1}$
H3c: $\rho_{g1y2y1} < \rho_{g2y2y1}$
H4a: $\rho_{g1y3y1} > \rho_{g2y3y1}$
H4b: $\rho_{g1y3y1} = \rho_{g2y3y1}$
H4c: $\rho_{g1y3y1} < \rho_{g2y3y1}$

9 Concluding remarks

This paper presented a flexible framework for testing statistical hypotheses on the most commonly observed measures of association in social research. The methodology has the following useful properties.

- Constrained hypotheses can be formulated on tetrachoric correlations, polychoric correlations, biserial correlations, polyserial correlations, and product-moment correlations.
- These measures of association can be corrected for certain covariates to avoid spurious relations.
- A broad class of hypotheses can be tested with competing equality and/or order constraints on the measures of association.
- Multiple (more than two) hypotheses can be tested simultaneously in a straightforward manner.
- A simple answer is provided to the research question which hypothesis receives most evidence from the data and how much, using Bayes factors and posterior probabilities.
- The proposed test is consistent, which implies that the posterior probability of the true hypothesis goes to one as the sample size goes to infinity.
- The software package BCT allows social science researchers to easily apply the methodology in real-life examples.


Correlation: Relationships between Variables Correlation Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means However, researchers are

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

Bayesian Evaluation of Informative Hypotheses

Bayesian Evaluation of Informative Hypotheses Bayesian Evaluation of Informative Hypotheses Xin Gu Utrecht University BAYESIAN EVALUATION OF INFORMATIVE HYPOTHESES BAYESIAANSE EVALUATIE VAN INFORMATIEVE HYPOTHESES (met een samenvatting in het Nederlands)

More information

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i,

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i, Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives Often interest may focus on comparing a null hypothesis of no difference between groups to an ordered restricted alternative. For

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

BAYESIAN MODEL SELECTION FOR CONSTRAINED MULTIVARIATE NORMAL LINEAR MODELS. Joris Mulder Utrecht University

BAYESIAN MODEL SELECTION FOR CONSTRAINED MULTIVARIATE NORMAL LINEAR MODELS. Joris Mulder Utrecht University BAYESIAN MODEL SELECTION FOR CONSTRAINED MULTIVARIATE NORMAL LINEAR MODELS Joris Mulder Utrecht University BAYESIAN MODEL SELECTION FOR CONSTRAINED MULTIVARIATE NORMAL LINEAR MODELS BAYESIAANSE MODELSELECTIE

More information

Multivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal?

Multivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal? Multivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal? Dale J. Poirier and Deven Kapadia University of California, Irvine March 10, 2012 Abstract We provide

More information

Data Analysis as a Decision Making Process

Data Analysis as a Decision Making Process Data Analysis as a Decision Making Process I. Levels of Measurement A. NOIR - Nominal Categories with names - Ordinal Categories with names and a logical order - Intervals Numerical Scale with logically

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

Factor Analysis. Qian-Li Xue

Factor Analysis. Qian-Li Xue Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale

More information

Running head: BSEM: SENSITIVITY TO THE PRIOR 1. Prior Sensitivity Analysis in Default Bayesian Structural Equation Modeling

Running head: BSEM: SENSITIVITY TO THE PRIOR 1. Prior Sensitivity Analysis in Default Bayesian Structural Equation Modeling Running head: BSEM: SENSITIVITY TO THE PRIOR 1 Prior Sensitivity Analysis in Default Bayesian Structural Equation Modeling Sara van Erp, Joris Mulder & Daniel L. Oberski Tilburg University, The Netherlands

More information

Structural Equation Modeling and Confirmatory Factor Analysis. Types of Variables

Structural Equation Modeling and Confirmatory Factor Analysis. Types of Variables /4/04 Structural Equation Modeling and Confirmatory Factor Analysis Advanced Statistics for Researchers Session 3 Dr. Chris Rakes Website: http://csrakes.yolasite.com Email: Rakes@umbc.edu Twitter: @RakesChris

More information

Subjective and Objective Bayesian Statistics

Subjective and Objective Bayesian Statistics Subjective and Objective Bayesian Statistics Principles, Models, and Applications Second Edition S. JAMES PRESS with contributions by SIDDHARTHA CHIB MERLISE CLYDE GEORGE WOODWORTH ALAN ZASLAVSKY \WILEY-

More information

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic STATISTICS ANCILLARY SYLLABUS (W.E.F. the session 2014-15) Semester Paper Code Marks Credits Topic 1 ST21012T 70 4 Descriptive Statistics 1 & Probability Theory 1 ST21012P 30 1 Practical- Using Minitab

More information

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Interpreting and using heterogeneous choice & generalized ordered logit models

Interpreting and using heterogeneous choice & generalized ordered logit models Interpreting and using heterogeneous choice & generalized ordered logit models Richard Williams Department of Sociology University of Notre Dame July 2006 http://www.nd.edu/~rwilliam/ The gologit/gologit2

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

A Study into Mechanisms of Attitudinal Scale Conversion: A Randomized Stochastic Ordering Approach

A Study into Mechanisms of Attitudinal Scale Conversion: A Randomized Stochastic Ordering Approach A Study into Mechanisms of Attitudinal Scale Conversion: A Randomized Stochastic Ordering Approach Zvi Gilula (Hebrew University) Robert McCulloch (Arizona State) Ya acov Ritov (University of Michigan)

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

Nesting and Equivalence Testing

Nesting and Equivalence Testing Nesting and Equivalence Testing Tihomir Asparouhov and Bengt Muthén August 13, 2018 Abstract In this note, we discuss the nesting and equivalence testing (NET) methodology developed in Bentler and Satorra

More information

Bayesian Hypothesis Testing: Redux

Bayesian Hypothesis Testing: Redux Bayesian Hypothesis Testing: Redux Hedibert F. Lopes and Nicholas G. Polson Insper and Chicago Booth arxiv:1808.08491v1 [math.st] 26 Aug 2018 First draft: March 2018 This draft: August 2018 Abstract Bayesian

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Estimating the Correlation in Bivariate Normal Data with Known Variances and Small Sample Sizes 1

Estimating the Correlation in Bivariate Normal Data with Known Variances and Small Sample Sizes 1 Estimating the Correlation in Bivariate Normal Data with Known Variances and Small Sample Sizes 1 Bailey K. Fosdick and Adrian E. Raftery Department of Statistics University of Washington Technical Report

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Reminder: Student Instructional Rating Surveys

Reminder: Student Instructional Rating Surveys Reminder: Student Instructional Rating Surveys You have until May 7 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs The survey should be available

More information

SEM for Categorical Outcomes

SEM for Categorical Outcomes This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Another Statistical Paradox

Another Statistical Paradox Another Statistical Paradox Eric-Jan Wagenmakers 1, Michael Lee 2, Jeff Rouder 3, Richard Morey 4 1 University of Amsterdam 2 University of Califoria at Irvine 3 University of Missouri 4 University of

More information

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci

More information

Xia Wang and Dipak K. Dey

Xia Wang and Dipak K. Dey A Flexible Skewed Link Function for Binary Response Data Xia Wang and Dipak K. Dey Technical Report #2008-5 June 18, 2008 This material was based upon work supported by the National Science Foundation

More information

Bayesian tests of hypotheses

Bayesian tests of hypotheses Bayesian tests of hypotheses Christian P. Robert Université Paris-Dauphine, Paris & University of Warwick, Coventry Joint work with K. Kamary, K. Mengersen & J. Rousseau Outline Bayesian testing of hypotheses

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM)

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SEM is a family of statistical techniques which builds upon multiple regression,

More information

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 What is SEM? When should we use SEM? What can SEM tell us? SEM Terminology and Jargon Technical Issues Types of SEM Models Limitations

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already

More information

Model Comparison. Course on Bayesian Inference, WTCN, UCL, February Model Comparison. Bayes rule for models. Linear Models. AIC and BIC.

Model Comparison. Course on Bayesian Inference, WTCN, UCL, February Model Comparison. Bayes rule for models. Linear Models. AIC and BIC. Course on Bayesian Inference, WTCN, UCL, February 2013 A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. This is implemented

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Nemours Biomedical Research Biostatistics Core Statistics Course Session 4. Li Xie March 4, 2015

Nemours Biomedical Research Biostatistics Core Statistics Course Session 4. Li Xie March 4, 2015 Nemours Biomedical Research Biostatistics Core Statistics Course Session 4 Li Xie March 4, 2015 Outline Recap: Pairwise analysis with example of twosample unpaired t-test Today: More on t-tests; Introduction

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified

More information