Adaptive testing of conditional association through Bayesian recursive mixture modeling


Li Ma

February 12, 2013

Abstract

In many case-control studies, a central goal is to test for association or dependence between the predictors and the response. Relevant covariates must be conditioned on to avoid false positives and loss of power. Conditioning on covariates is easy in parametric frameworks such as logistic regression, where the covariates enter the model as additional variables. In contrast, nonparametric methods such as the Cochran-Mantel-Haenszel test accomplish conditioning by dividing the data into strata, one for each possible covariate value. In modern applications this often gives rise to numerous strata, most of which are sparse due to the multi-dimensionality of the covariate and/or predictor space, while in reality the covariate space often consists of just a small number of subsets with differential response-predictor dependence. We introduce a Bayesian approach to inferring such an effective stratification from the data and testing for association accordingly. The core of our framework is a recursive mixture model on the retrospective distribution of the predictors, whose mixing distribution is a prior on the partitions of the covariate space. Inference under the model can proceed efficiently in closed form through a sequence of recursions, striking a balance between model flexibility and computational tractability. A power study shows that our method substantially outperforms classical tests under various scenarios.

1 Introduction

The case-control or retrospective sampling scheme is a popular design for studying the dependence relationship between a set of predictors and a binary response variable. It is more cost efficient than prospective (or cohort) studies when one of the response categories is much less common than the other [1]. In the past 15 years, this design has been widely adopted in areas such as epidemiology, most notably in the so-called genetic association studies [9], which aim at finding the genetic factors that affect disease risks. In these studies, a central focus is to test for the association (or dependence) between the predictors (the gene markers) and the response (the disease).

The classical approach to analyzing case-control studies centers around inference on the odds ratio [2]. In particular, the most popular class of methods of this sort is logistic regression, which imposes linear modeling assumptions on the odds ratio on the log-transformed scale. In many applications such as genetic association studies, there is no particular reason to believe that genetic factors add to (or subtract from) the disease risk in a linear, additive fashion under a particular scale such as the logit. Indeed, despite the ever increasing sample sizes involved in such studies, the genetic factors discovered for most common complex diseases, such as heart disease and diabetes, contribute to only a small fraction of the heritability of these diseases [8]. One potential direction, among others, toward unraveling this mystery of missing heritability is to develop more flexible tests for association, in particular tests without any parametric modeling assumptions.

Several nonparametric methods for testing association have been adopted in applications such as genetic association studies. They are classical tests for independence on contingency tables, including Pearson's χ²-test [1, Sec. 3.2] as well as their Bayesian counterparts based on computing the appropriate Bayes factor (BF) [6, 10]. While these classical nonparametric tests afford more flexibility in capturing the different modes of predictor-response association than parametric methods, they have a few important shortcomings that have significantly

limited their usefulness. One of the most important limitations is the difficulty, in comparison to parametric methods, of incorporating covariates that should be properly conditioned on. In genetic association studies, for example, such covariates may include gender, race, diet, and smoking status. Without proper conditioning, phenomena such as Simpson's paradox may arise, leading to false positives and loss of power in detecting true signals.

Classical extensions to these tests, such as the generalized Cochran-Mantel-Haenszel (CMH) test [1], allow conditioning on covariates by dividing the marginal predictor-response table into conditional ones, the so-called strata, based on the covariate values. In particular, the CMH method treats the observations with each possible covariate value combination as a separate stratum, and computes a global test statistic essentially by adding up statistics summarizing the evidence from each stratum. In many modern applications, the strata can be very sparse, containing few data points, due to the multi-dimensionality of the covariates and/or predictors. In addition, when adding up the information across strata, the CMH test relies on a homogeneous effect assumption: that the dependence structure, as measured by the odds ratio, is the same (or in practice at least similar) across all the strata. When this assumption holds, the test is moderately robust to data sparsity, but the assumption is often unrealistic in modern applications with multiple covariates. Consequently, classical methods based on such brute-force conditioning often perform very poorly in modern applications such as genetic association studies and are not commonly adopted.

In many practical situations, the covariate space can often be divided into just a small number of homogeneous groups with respect to their relation to the predictor-response dependence. For example, in epidemiological studies, even though many covariates may be considered, one often expects only a small number of them to be relevant. In other words, the underlying actual number of strata necessary for appropriate conditioning is small, suggesting that the brute-force approach to conditioning can be extremely wasteful.

Following this reasoning, it is desirable to condition on the covariates based on a stratification

of the covariate space that is as parsimonious as possible. But of course how the covariate space should be partitioned to provide sufficient conditioning is typically not known a priori. A key motivation of this work is that such information is contained in the data and, with the appropriate method, can be inferred. In particular, we introduce a Bayesian framework for nonparametric testing of association that achieves such adaptive conditioning on the covariates. Inference under our framework proceeds entirely in a principled, probabilistic fashion that properly takes into account all sources of uncertainty, including that involved in inferring the stratification. The framework is based on a recursive mixture formulation for the retrospective distribution of the predictors, for which the mixing probabilities are defined by a recursively constructed probability distribution on the collection of partitions over the covariate space. This recursive mixture design allows inference under the framework to be carried out efficiently in closed form through a sequence of recursions, striking a balance between model flexibility and computational tractability.

The rest of the paper is organized as follows. In Section 2 we start by briefly reviewing existing Bayesian methods for testing association without any covariates. Then we introduce our method as an extension of the existing approach in the simple case with a binary covariate. Next we generalize our framework to cover the scenario with multiple predictors and covariates. In Section 3 we illustrate our method through numerical examples and carry out a power study to investigate its performance and compare it to those of two classical methods. In Section 4 we discuss choices of prior specification and investigate the robustness of our method to different choices. Finally we conclude with some discussion in Section 5.

While the framework we propose can be applied to both discrete and continuous predictors and covariates, we focus our presentation on the discrete case, partly due to our prime motivating example, genetic association studies, and partly for simplicity. In the following, we use G to denote predictors, E for covariates, and D for the response, standing for genes, environment, and disease respectively, as in a genetic association study.

2 Methods

2.1 Testing association without covariates

We start from the simple case in which there is a single discrete predictor G and no covariates. For illustrative purposes, let G be a single trinary predictor, such as a single-nucleotide polymorphism (SNP) marker commonly encountered in genetic association studies, so that G ∈ Ω_G = {0, 1, 2}. Also, let θ_d = (θ_{d0}, θ_{d1}, θ_{d2}) for d = 0 and 1 be the distribution of G given the response D = d, that is, P(G | D = d). In other words, θ_0 and θ_1 denote the control and case retrospective predictor distributions, respectively. Accordingly, we let n_{dg} for d = 0, 1 and g = 0, 1, 2 denote the number of observations with D = d and G = g, and let n_d = (n_{d0}, n_{d1}, n_{d2}) for d = 0, 1. Under the null hypothesis there is no association between G and D,

H_0 : θ_0 = θ_1,

and under the alternative there is association,

H_1 : θ_0 ≠ θ_1.

A simple Bayesian approach for testing H_0 against H_1 is to place Dirichlet priors on these probability vectors [10] and compute the corresponding marginal likelihood under each hypothesis. One can then calculate the Bayes factor (BF) [6] as a ratio of these marginal likelihoods and use it as an instrument for testing. More specifically, under H_0, we let θ_0 = θ_1 ∼ Dirichlet(α), where α = (α_0, α_1, α_2) are the prior pseudo-count parameters of the Dirichlet prior. A common non-informative choice is to let α_0 = α_1 = α_2 = 0.5. It is easy to show that given this prior, the marginal likelihood under H_0 is

M_0 = D(n_0 + n_1 + α) / D(α)

where the function D(·), when applied to a vector of length p, is defined as

D(t_1, t_2, ..., t_p) = ∏_{i=1}^p Γ(t_i) / Γ(∑_{i=1}^p t_i).

Similarly, under the alternative hypothesis H_1, we let θ_0 ∼ Dirichlet(α_0) and θ_1 ∼ Dirichlet(α_1), where α_d = (α_{d0}, α_{d1}, α_{d2}) for d = 0 or 1 are the pseudo-count parameters for the Dirichlet priors. The marginal likelihood under H_1 is then

M_1 = D(n_0 + α_0) D(n_1 + α_1) / [D(α_0) D(α_1)].

The BF for comparing H_1 against H_0 is then BF = M_1 / M_0. Stephens and Balding [10] called this BF the retrospective BF, or rBF, since it is based on modeling the retrospective distribution of G, P(G | D).

Besides computing the BF, one may further assign prior probabilities to the two hypotheses (or competing models) H_0 and H_1: π(H_0) = γ and π(H_1) = 1 − γ. One can think of this as modeling the retrospective distribution P(G | D) using a mixture model with two mixing components H_0 and H_1. The marginal likelihood under this mixture model is then M = γ M_0 + (1 − γ) M_1. By Bayes' theorem, the probability of H_0 given the data, that is, the posterior probability of no association between G and D, is

π(H_0 | data) = π(H_0) M_0 / M = γ M_0 / [γ M_0 + (1 − γ) M_1] = γ / [γ + (1 − γ) BF].    (1)
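
The computation above is simple enough to sketch in code. Below is a minimal illustration of the Dirichlet-multinomial marginal likelihoods and the resulting posterior probability in Eq. (1); the function names and the example counts are our own illustrations, not part of the paper, and a practical implementation would follow the same recipe.

```python
# A minimal sketch (not the paper's code) of the no-covariate test in Section 2.1.
import numpy as np
from scipy.special import gammaln

def log_dirichlet_ml(counts, alpha):
    """log[ D(counts + alpha) / D(alpha) ]: the multinomial-Dirichlet marginal likelihood."""
    counts, alpha = np.asarray(counts, float), np.asarray(alpha, float)
    def log_D(t):
        return np.sum(gammaln(t)) - gammaln(np.sum(t))
    return log_D(counts + alpha) - log_D(alpha)

def posterior_null_prob(n0, n1, alpha=(0.5, 0.5, 0.5), gamma=0.5):
    """Posterior probability of H_0 (no association), Eq. (1)."""
    log_M0 = log_dirichlet_ml(np.add(n0, n1), alpha)                      # pooled counts under H_0
    log_M1 = log_dirichlet_ml(n0, alpha) + log_dirichlet_ml(n1, alpha)    # separate theta's under H_1
    log_BF = log_M1 - log_M0                                              # retrospective Bayes factor
    return gamma / (gamma + (1 - gamma) * np.exp(log_BF))

# Example: counts of G = 0, 1, 2 among controls (D = 0) and cases (D = 1)
print(posterior_null_prob(n0=[40, 45, 15], n1=[25, 50, 25]))
```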

2.2 Testing conditional association with a single binary covariate

Next we consider the case with covariates, starting with the simplest such scenario in which there is a single binary covariate E ∈ Ω_E = {0, 1}. This covariate may be an environment variable (e.g., smoking or not) that may affect the joint distribution of the predictors and the response. There are two possibilities for the relationship between the covariate E and the distribution of G given D: either the distribution P(G | D, E) depends on the value of E or it does not. An appropriate model for the retrospective distribution should take into account both possibilities.

To present such a model, we introduce some new notation. Let θ_{ed} = (θ_{ed0}, θ_{ed1}, θ_{ed2}) for d = 0, 1 and e = 0, 1 be the probability distribution vector of G given D = d and E = e, that is, P(G | D = d, E = e). Similarly, let n_{edg} be the number of observations with D = d, E = e and G = g, and let n_{ed} = (n_{ed0}, n_{ed1}, n_{ed2}). For any non-empty A ⊆ Ω_E, that is, A = {0}, {1}, or {0, 1}, we define

H(A) : θ_{e0} ≡ θ_0(A) and θ_{e1} ≡ θ_1(A) for all e ∈ A.

In other words, H(A) is the hypothesis that A forms a homogeneous stratum: the retrospective distribution of G given D does not depend on E given E ∈ A, or P(G | D, E = e) = P(G | D, E ∈ A) for all e ∈ A. Note that when A = {0} or {1}, it has only one element, and so H(A) holds trivially.

The hypothesis H(A) can be further divided into two sub-hypotheses depending on whether P(G | D, E ∈ A) depends on D:

H_0(A) : H(A) is true, and θ_0(A) = θ_1(A)
H_1(A) : H(A) is true, but θ_0(A) ≠ θ_1(A).

In words, H_0(A) represents the scenario in which A forms a single stratum and, given E ∈ A, no association exists between G and D. In contrast, H_1(A) represents the case in which the retrospective distribution differs across the two response groups, so there is association between G and D conditional on E ∈ A.

To compare H_0(A) and H_1(A) in a Bayesian framework, we can again place Dirichlet

priors on the retrospective distribution of G given D for E ∈ A. Under H_0(A), we let

θ_0(A) = θ_1(A) ≡ θ(A) ∼ Dirichlet(α(A)),

where α(A) = (α_0(A), α_1(A), α_2(A)) are the pseudo-count parameters. Under H_1(A), we let

θ_0(A) ∼ Dirichlet(α_0(A)) and θ_1(A) ∼ Dirichlet(α_1(A)),

where α_d(A) = (α_{d0}(A), α_{d1}(A), α_{d2}(A)) for d = 0, 1 are the pseudo-count parameters. With this setup, the marginal likelihoods from the data with E ∈ A under H_0(A) and H_1(A) are

M_0(A) = D(n_0(A) + n_1(A) + α(A)) / D(α(A)) and M_1(A) = D(n_0(A) + α_0(A)) D(n_1(A) + α_1(A)) / [D(α_0(A)) D(α_1(A))],    (2)

where n_d(A) = (n_{d0}(A), n_{d1}(A), n_{d2}(A)) is the sample size vector with

n_{dg}(A) := ∑_{e ∈ A} n_{edg} = # of observations with D = d, G = g and E ∈ A.

The corresponding BF for the two components H_0(A) versus H_1(A) is BF(A) = M_1(A)/M_0(A), which measures the evidence against no association given E ∈ A under H(A).

Similar to before, we can place prior probabilities on H_0(A) and H_1(A) under H(A): π(H_0(A) | H(A)) = γ(A) = 1 − π(H_1(A) | H(A)), that is, we model the retrospective distribution of G given D and E ∈ A under H(A) using a mixture with components H_0(A) and H_1(A). The marginal likelihood under H(A) is then

M(A) = γ(A) M_0(A) + (1 − γ(A)) M_1(A).    (3)

By Bayes' theorem, we get the probability of H_0(A) given the data

γ_post(A) := π(H_0(A) | H(A), data) = γ(A) M_0(A) / M(A) = γ(A) / [γ(A) + (1 − γ(A)) BF(A)].    (4)
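
For concreteness, here is a small sketch of Eqs. (2)-(4) for a covariate subset A, reusing log_dirichlet_ml from the earlier sketch; the data layout (a dictionary mapping each covariate value e to a pair of count vectors) is our own assumption, not notation from the paper.

```python
# A hedged sketch of Eqs. (2)-(4) for a covariate subset A; not the paper's code.
# counts_by_e[e] = (n_e0, n_e1): count vectors of G for controls and cases with E = e.
import numpy as np

def stratum_quantities(counts_by_e, A, alpha=(0.5, 0.5, 0.5), gamma=0.5):
    # Pool counts over the covariate values in A to get n_0(A) and n_1(A)
    n0_A = np.sum([counts_by_e[e][0] for e in A], axis=0)
    n1_A = np.sum([counts_by_e[e][1] for e in A], axis=0)
    log_M0 = log_dirichlet_ml(n0_A + n1_A, alpha)                           # Eq. (2), under H_0(A)
    log_M1 = log_dirichlet_ml(n0_A, alpha) + log_dirichlet_ml(n1_A, alpha)  # Eq. (2), under H_1(A)
    M0, M1 = np.exp(log_M0), np.exp(log_M1)   # for larger samples, keep these on the log scale
    M = gamma * M0 + (1 - gamma) * M1                                       # Eq. (3)
    gamma_post = gamma * M0 / M                                             # Eq. (4)
    return M0, M1, M, gamma_post
```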

Figure 1: The two-layer mixture model for P(G | D, E) for Ω_E = {0, 1}. Red indicates the conditional mixing probabilities induced by the latent decisions.

This is completely analogous to what we have seen in Eq. (1) for the case without covariates. Either γ_post(A) or BF(A) can be used to test H_0(A) versus H_1(A). In particular, when A = {0, 1}, this allows us to test H_0({0, 1}) (no association between G and D) against H_1({0, 1}) (there is association between G and D) under H({0, 1}), that is, when the covariate does not affect the distribution of G given D at all. This alone is not satisfactory, as we must also take into account the case in which the dependence between G and D does depend on E, that is, when {0, 1} does not form a single stratum. A further extension is needed to produce a test that incorporates the latter possibility.

We address this need in the Bayesian way by introducing a further layer of mixture over the competing hypotheses H({0, 1}) versus H^c({0, 1}). We let π(H({0, 1})) = ρ({0, 1}) and π(H^c({0, 1})) = 1 − ρ({0, 1}) be the mixing (or prior) probabilities for the two hypotheses. The idea is that we will let the data speak as to whether {0, 1} forms a single stratum or not. Figure 1 illustrates the structure of this two-layer mixture model for P(G | D, E).

To see how we may test for association with this model, let Φ({0, 1}) be the overall

marginal likelihood (conditional on the covariate) under the two-layer mixture model. To find Φ({0, 1}), first note that given H({0, 1}), the marginal likelihood is M({0, 1}). Under H^c({0, 1}), on the other hand, the distribution of G given D is different for E = 0 versus E = 1, and so the likelihood in this case is the product of the likelihoods from the data in each of the covariate subsets {0} and {1}, or M({0}) M({1}). Accordingly, the overall marginal likelihood on {0, 1} under the two-layer mixture model is

Φ({0, 1}) = ρ({0, 1}) M({0, 1}) + (1 − ρ({0, 1})) M({0}) M({1}).    (5)

Eqs. (2), (3), and (5) provide a complete recipe for computing the overall marginal likelihood Φ({0, 1}) using the marginal likelihoods from subsets of the data, M({0}) and M({1}). Note that the BF for comparing H({0, 1}) versus its complement H^c({0, 1}) is M({0}) M({1}) / M({0, 1}), and the posterior probability of H({0, 1}) given the data is

ρ_post({0, 1}) := π(H({0, 1}) | data) = ρ({0, 1}) M({0, 1}) / Φ({0, 1}),    (6)

which can be used for testing whether the distribution P(G | D, E) depends on E.

Next let us find the posterior probability of association between G and D given E, or whether the distribution P(G | D, E) depends on the response D. By plugging (3) into (5), we get a seemingly tedious expression for the marginal likelihood:

Φ({0, 1}) = ρ({0, 1}) [γ({0, 1}) M_0({0, 1}) + (1 − γ({0, 1})) M_1({0, 1})]
          + (1 − ρ({0, 1})) [γ({0}) M_0({0}) + (1 − γ({0})) M_1({0})] [γ({1}) M_0({1}) + (1 − γ({1})) M_1({1})].

However, after rearranging terms we can write Φ({0, 1}) = Ψ({0, 1}) + ∆({0, 1}), where

Ψ({0, 1}) = ρ({0, 1}) γ({0, 1}) M_0({0, 1}) + (1 − ρ({0, 1})) γ({0}) γ({1}) M_0({0}) M_0({1})    (7)

and ∆({0, 1}) is defined to be Φ({0, 1}) − Ψ({0, 1}). What are the meanings of Ψ({0, 1}) and ∆({0, 1})? It turns out that Ψ({0, 1}) is the marginal probability of the data jointly with the event

H_0({0, 1}) ∪ (H^c({0, 1}) ∩ H_0({0}) ∩ H_0({1})),

or in words, {G and D are not associated conditional on E}, while ∆({0, 1}) = Φ({0, 1}) − Ψ({0, 1}) is the marginal probability of the data jointly with the event

H_1({0, 1}) ∪ (H^c({0, 1}) ∩ (H_1({0}) ∪ H_1({1}))),

or in words, {G and D are associated conditional on E}. By Bayes' theorem, the posterior probability of no association is thus

P(G and D are not associated conditional on E | data) = Ψ({0, 1}) / Φ({0, 1}).    (8)

Accordingly, the posterior probability of association is 1 − Ψ({0, 1})/Φ({0, 1}). This suggests a three-step recipe for testing the association between G and D conditional on E:

1. Compute M_0(A), M_1(A), and M(A) for A = {0}, {1}, and {0, 1} by Eqs. (2) and (3).

2. Compute Φ({0, 1}) by Eq. (5), and compute Ψ({0, 1}) by Eq. (7).

3. Compute Ψ({0, 1})/Φ({0, 1}) and use it as a test statistic for association.

This test is nonparametric, as our model places no distributional assumptions on P(G | D, E).
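
As a concrete illustration, the following sketch strings Eqs. (5), (7), and (8) together for a single binary covariate, reusing stratum_quantities from the sketch above; the prior probabilities are set to the default 0.5, and, as before, a production implementation would keep all quantities on the log scale.

```python
# A minimal sketch of the three-step recipe for a single binary covariate E.
# counts_by_e has keys 0 and 1; this mirrors Eqs. (5), (7), and (8), not the paper's code.
def no_association_posterior(counts_by_e, rho=0.5, gamma=0.5):
    # Step 1: M_0(A), M_1(A), M(A) for A = {0, 1}, {0}, and {1}
    M0_01, _, M_01, _ = stratum_quantities(counts_by_e, A=(0, 1), gamma=gamma)
    M0_0,  _, M_0,  _ = stratum_quantities(counts_by_e, A=(0,),   gamma=gamma)
    M0_1,  _, M_1,  _ = stratum_quantities(counts_by_e, A=(1,),   gamma=gamma)
    # Step 2: overall marginal likelihood, Eq. (5), and its no-association part, Eq. (7)
    Phi = rho * M_01 + (1 - rho) * M_0 * M_1
    Psi = rho * gamma * M0_01 + (1 - rho) * (gamma * M0_0) * (gamma * M0_1)
    # Step 3: posterior probability of no conditional association, Eq. (8)
    return Psi / Phi
```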

To carry out this testing procedure one must specify the prior mixing probabilities ρ({0, 1}), γ({0, 1}), γ({0}), and γ({1}). While in certain examples one may have a priori beliefs about these probabilities, most often such knowledge is lacking and there need to be some guidelines for choosing these parameters. To this end, a simple choice is to set all of the mixing probabilities to a non-extreme constant value. For example, one may set those probabilities to 0.5, that is, whenever there is a choice between two mixing components we assign 50/50 probability to either. This type of prior specification is made out of convenience, but our experience suggests that it generally leads to good performance and that the resulting inference is robust to this choice: choosing a constant value between 0.1 and 0.9 often leads to similar results. (We investigate the robustness of inference to such choices and justify this claim in Section 4 through a sensitivity analysis.) Additionally, one also needs to specify the prior pseudo-count vectors α(A), α_0(A), and α_1(A) for the Dirichlet priors. For these, we adopt Jeffreys' prior and set all the pseudo-counts to 0.5.

2.3 Testing conditional association with multivariate covariates

Next we consider the more general case with multiple covariates. Let E = (E_1, E_2, ..., E_h) be h covariates with joint sample space Ω_E = Ω_{E_1} × Ω_{E_2} × ··· × Ω_{E_h}. To illustrate ideas, let us consider the case with binary covariates, that is, Ω_E = {0, 1}^h. (Our framework is applicable to other discrete, continuous, or mixed spaces. The case with binary covariates greatly simplifies presentation.) Let A be any non-empty rectangular subset of Ω_E, that is, A = A_1 × A_2 × ··· × A_h, where each A_i for i = 1, 2, ..., h is a non-empty subset of Ω_{E_i} = {0, 1}. We define the hypotheses H(A), H_0(A), and H_1(A) in the same way as we did in Section 2.2, and again model the retrospective distribution of G given D and E ∈ A as a mixture of H(A) and H^c(A) with mixing probabilities ρ(A) and 1 − ρ(A). Similar to before, under H(A), that is, when A forms a homogeneous stratum, we model P(G | D, E) as a mixture of H_0(A) and H_1(A) with mixing probabilities γ(A) and 1 − γ(A). Accordingly, the expressions for the marginal likelihood terms M_0(A), M_1(A), and M(A) all stay the same as given in Eqs. (2) and (3).

The difference from the case with only one covariate arises under H^c(A), that is, when

the retrospective distribution of G given D does depend on E for E ∈ A. Earlier, for A containing more than one value, for example A = Ω_E = {0, 1}, under H^c(A) we modeled the retrospective distribution P(G | D, E) for E = 0 and for E = 1 separately. Similarly, here we want to model the retrospective distribution differently for varying E values, but it is undesirable to adopt a separate model for each possible value of E, as A may contain a large number of possible values. In particular, A = Ω_E contains 2^h values. In real problems, for a large Ω_E, many, if not all, of the E values will correspond to only a small fraction of the observations, while a gigantic model with a huge number of free parameters would be needed to model the retrospective distribution separately under all possible E values, costing power for detecting association.

An oracle, who knows exactly how the retrospective distribution depends on the covariates, would know the optimal way to divide Ω_E for modeling. For example, suppose a study has four covariates: gender (M/F), smoking (Y/N), exercise (Y/N) and diet (Vegetarian/Non-Vegetarian), and that the subjects who smoke and do not exercise have a different retrospective distribution of G given D from the others. The oracle would then divide Ω_E into two blocks Ω_E = Ω_{E,1} ∪ Ω_{E,2} according to the subjects' smoking and exercise status, and adopt two separate models for the retrospective distributions P(G | D, E ∈ Ω_{E,1}) and P(G | D, E ∈ Ω_{E,2}).

In practical situations, however, such oracular knowledge is unavailable, at least not perfectly. Good divisions of the covariate space are not known a priori. But such information is contained in the data and with appropriate methods can be inferred! From a statistical perspective, this is but a model comparison/selection problem, where the different models are indexed by the corresponding stratifications (i.e., partitions) of the covariate space. Indeed, we have already adopted this idea in the simple case of a single binary covariate. There were only two possible partitions of {0, 1}: either {0, 1} is undivided or it is partitioned into {0} and {1}, corresponding to H({0, 1}) and H^c({0, 1}) respectively. The mixing probabilities ρ({0, 1}) and 1 − ρ({0, 1}) form a prior on this (albeit small) model space, which

allows inference on the appropriate stratification through the standard Bayesian machinery.

For general covariate spaces, a natural generalization is to place a prior over the collection of partitions or stratifications of Ω_E, which represent different models for P(G | D, E). This will allow us to infer the appropriate division and thereby achieve adaptive conditioning on the covariates. In principle, we can choose any probability distribution over the collection of partitions of Ω_E. But in making this choice, one must balance the need for modeling flexibility against that for computational tractability. More specifically, we want the prior to cover a wide class of partitions while still allowing inference to be carried out efficiently. The latter is especially important in large-scale studies such as genome-wide studies, where the scientist often needs to test a large number of genomic locations. Having to run, say, a separate Markov chain Monte Carlo (MCMC) chain for each location, aside from the accumulation of Monte Carlo errors, is often impractical.

One way to construct such a prior distribution on the space of partitions that meets this dual goal is to randomly divide the covariate space recursively, in a sequential manner. Recursive partitioning has large support over the space of partitions. Moreover, as we will see, efficient posterior inference can be attained by utilizing the self-similar nature of the prior. Probability distributions over recursive partitions have been extensively studied in the literature; a notable example is Bayesian CART [3, 4, 12]. The prior we adopt here, which we call the optional tree (OT) distribution, is a constructive process recently introduced in the context of density estimation [11].

The key motivation for adopting the OT prior on stratifications is that it gives rise to a recursive mixture model for P(G | D, E) that naturally extends our earlier two-layer mixture in the single binary covariate case. As a result, inference can also proceed efficiently in a way that generalizes the recipe for the single binary covariate case. In the following, we provide the details of this recursive mixture framework and show how it can be used for adaptively testing conditional association. We start with a brief description of the OT distribution.

The optional tree (OT) distribution

The OT distribution can be described in terms of the following constructive procedure, termed the OT procedure, that generates a random partition of Ω_E recursively. Starting from the whole space A = Ω_E, draw a Bernoulli random variable S(A) ∼ Bernoulli(ρ(A)). If S(A) = 1, then we terminate the partitioning procedure on A and end up with a trivial partition of a single block. If instead S(A) = 0, then we proceed to divide A into smaller pieces as follows. Suppose there are a total of N(A) available ways to partition A. We randomly select one of the N(A) ways to divide A and let λ_j(A) denote the probability of choosing the jth way of partitioning. That is, we draw a random variable J(A) ∈ {1, 2, ..., N(A)} with P(J(A) = j) = λ_j(A) for j = 1, 2, ..., N(A) and ∑_{j=1}^{N(A)} λ_j(A) = 1. If J(A) = j, then we divide A in the jth possible way into K_j(A) children,

A = ∪_{i=1}^{K_j(A)} A_i^j,

where A_i^j denotes the ith child of A under the jth possible way of partitioning. For example, if we allow only dimension-wise dyadic cuts on A, that is, each A can be divided into two halves according to the value of one of the covariates, then for A = Ω_E = {0, 1}^h we have N(A) = h, K_j(A) = 2 for all j = 1, 2, ..., h, and the two children A_1^j and A_2^j are

A_1^j = {(e_1, e_2, ..., e_h) : e_j ∈ {0}, and e_i ∈ {0, 1} for i ≠ j, i = 1, 2, ..., h}
A_2^j = {(e_1, e_2, ..., e_h) : e_j ∈ {1}, and e_i ∈ {0, 1} for i ≠ j, i = 1, 2, ..., h}.

This completes the partition step for A = Ω_E, and we then restart the same procedure on each of the children A_i^j, beginning with the drawing of a Bernoulli stopping variable S(A_i^j). This recursive partitioning procedure continues, but it will eventually stop everywhere, thereby producing a random partition of Ω_E consisting of rectangular blocks on Ω_E.
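
To make the constructive procedure concrete, here is a small sketch that draws one random partition of {0, 1}^h under dimension-wise dyadic cuts with a constant stopping probability; the block representation (a tuple of per-coordinate value sets) and the uniform choice of cutting coordinate are our own illustrative choices.

```python
# A sketch of the OT procedure on Omega_E = {0,1}^h with dyadic cuts; not the paper's code.
import random

def sample_ot_partition(block, rho=0.5):
    """block: a tuple of per-coordinate value sets, e.g. ({0, 1},) * h for the whole space."""
    splittable = [j for j, vals in enumerate(block) if len(vals) > 1]
    # Atomic blocks cannot be divided further, so rho(A) = 1 there and we must stop.
    if not splittable or random.random() < rho:          # S(A) = 1: A becomes one block
        return [block]
    j = random.choice(splittable)                        # J(A): which coordinate to cut
    out = []
    for v in sorted(block[j]):                           # the two children under cut j
        child = block[:j] + ({v},) + block[j + 1:]
        out.extend(sample_ot_partition(child, rho))      # recurse, drawing S on each child
    return out

# Example: one random stratification of {0,1}^3
print(sample_ot_partition(({0, 1},) * 3))
```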

The process is specified by the hyperparameters ρ(A) and λ_j(A) for j = 1, 2, ..., N(A), for each set A ⊆ Ω_E that can potentially arise during the procedure. Note that the partitioning procedure will always stop, because the atomic subsets, those A's that contain only one possible value of E, such as A = {1} × {1} × {1} × {1}, cannot be further divided, so we must let ρ(A) = 1 for these sets. Hence if the partitioning procedure has reached all the way down to the atomic sets on some parts of Ω_E, then it is forced to stop there, as no further partitioning is possible. For h = 1, we have Ω_E = {0, 1}, and so A = {0} and A = {1} are the only two atomic subsets, as we have seen in Section 2.2. (In the more general case when some of the covariates are continuous, so that there are no atomic subsets, one can show that the partitioning procedure will stop almost everywhere on Ω_E with probability 1, provided that all prior stopping probabilities ρ(A) are bounded below by a common constant c > 0.)

A recursive mixture model for P(G | D, E)

Now that we have a prior distribution on the partitions of Ω_E, let us associate the random partitions arising from this process with the corresponding models for the distribution of G given D and E. If the OT procedure stops on a subset A, that is, S(A) = 1, we let A form a single stratum and model the retrospective distribution P(G | D, E) for E ∈ A with H(A), specified as a mixture model over H_0(A) and H_1(A) with mixing probabilities γ(A) and 1 − γ(A), as we did for the case with h = 1 in Section 2.2. If instead S(A) = 0 and J(A) = j, so that A is not a single stratum and is divided into its children A = A_1^j ∪ A_2^j, then we adopt H^c(A) for the retrospective distribution P(G | D, E) for E ∈ A and build separate models for the retrospective distribution for E in each A_i^j, in the same manner as we have just described for A. Again, since the partitioning will eventually stop, this recursive mapping from partitions to models is well-defined. An important feature of this recursive prior/model specification is its self-similarity: the prior and model specified on a rectangular subset A are in exactly the same form as those on its children. As we will see below, this feature leads

to an analytic approach to inference based on recursive computation. From now on, we shall refer to this self-similar prior/model specification on A as the local (recursive mixture) model on A. Figure 2 illustrates the self-similar design of each local model.

Figure 2: The local model for P(G | D, E) for E ∈ A when Ω_E = {0, 1}^h and dimension-wise dyadic cuts are allowed. Red indicates the conditional mixing probabilities induced by the latent decisions. For simplicity, we have suppressed notation and N(A) is denoted as N.

Adaptive conditional association test under the recursive mixture model

How do we test for predictor-response association conditional on the covariates under this recursive model? Also in a recursive manner! To see this, we again define Φ(A) to be the marginal likelihood (conditional on E) under the local model on A, computed from the observations with E ∈ A. From the design of the model, Φ(A) can be written recursively as

Φ(A) = ρ(A) M(A) + (1 − ρ(A)) ∑_{j=1}^{N(A)} λ_j(A) ∏_{i=1}^{K_j(A)} Φ(A_i^j)    (9)

where M(A) is the marginal likelihood under H(A) as in Eq. (3), and ∑_{j=1}^{N(A)} λ_j(A) ∏_{i=1}^{K_j(A)} Φ(A_i^j) is the marginal likelihood under H^c(A). Eq. (9) is a generalization of Eq. (5), with the main difference being the additional integration over the N(A) different ways to partition A.

In order to infer whether G is associated with D conditional on E, it is again useful to decompose Φ(A) into two pieces, Φ(A) = Ψ(A) + ∆(A), where

Ψ(A) = ρ(A) γ(A) M_0(A) + (1 − ρ(A)) ∑_{j=1}^{N(A)} λ_j(A) ∏_{i=1}^{K_j(A)} Ψ(A_i^j)    (10)

is the part of Φ(A) contributed by the models with no association between G and D given E. Eq. (10) shows that Ψ(A) also has a recursive representation. To understand Eq. (10), note that when H(A) holds, the marginal likelihood on A under no association is M_0(A) as defined in Eq. (2), while if H(A) does not hold and A is divided in the jth way, then the contribution to the marginal likelihood on A from the models of no association is the product of such contributions from A's children. Finally, when A is an atomic subset, ρ(A) = 1 and Ψ(A) = γ(A) M_0(A) by design, so this recursion will also eventually terminate.

Eqs. (9) and (10) provide a recipe for recursively computing Φ(A) and Ψ(A). But how do we use these terms? In very much the same way as we did in Section 2.2. More specifically, Ψ(Ω_E) is the probability of the data jointly with the event

W = {G and D are not associated conditional on E}.

By Bayes' theorem, the posterior probability of the above event is

Pr(W | data) = Ψ(Ω_E) / Φ(Ω_E).

Accordingly, the posterior probability of association is 1 − Ψ(Ω_E)/Φ(Ω_E).
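
The following sketch implements the recursions (9)-(10) for binary covariates with dimension-wise dyadic cuts, reusing stratum_quantities from the Section 2.2 sketch. The block representation (a tuple of per-coordinate frozensets), the dictionary counts_by_e keyed by covariate vectors, and the memoization are our own implementation choices; the memoization simply ensures that each rectangular subset is visited once.

```python
# A hedged sketch of Eqs. (9)-(10); not the paper's code.
# counts_by_e[e] = (n_e0, n_e1) for each covariate vector e in {0,1}^h.
from functools import lru_cache
from itertools import product

def make_phi_psi(counts_by_e, rho=0.5, gamma=0.5):
    @lru_cache(maxsize=None)
    def phi_psi(block):                                   # block: tuple of frozensets, one per covariate
        A = list(product(*[sorted(vals) for vals in block]))    # covariate vectors contained in A
        M0, _, M, _ = stratum_quantities(counts_by_e, A, gamma=gamma)
        splittable = [j for j, vals in enumerate(block) if len(vals) > 1]
        if not splittable:                                # atomic subset: rho(A) = 1
            return M, gamma * M0
        lam = 1.0 / len(splittable)                       # lambda_j(A) = 1 / N(A)
        phi_split = psi_split = 0.0
        for j in splittable:                              # sum over the N(A) dyadic cuts
            phi_prod = psi_prod = 1.0
            for v in sorted(block[j]):                    # product over the two children
                child_phi, child_psi = phi_psi(block[:j] + (frozenset({v}),) + block[j + 1:])
                phi_prod *= child_phi
                psi_prod *= child_psi
            phi_split += lam * phi_prod
            psi_split += lam * psi_prod
        return (rho * M + (1 - rho) * phi_split,           # Phi(A), Eq. (9)
                rho * gamma * M0 + (1 - rho) * psi_split)  # Psi(A), Eq. (10)

    return phi_psi

# Usage: Phi, Psi = make_phi_psi(counts)((frozenset({0, 1}),) * h); test statistic = Psi / Phi
```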

To summarize, under the proposed framework one may test the association between G and D conditional on E through the following three-step recipe (which is in complete analogy to the three-step recipe presented at the end of Section 2.2):

1. Compute M_0(A), M_1(A), and M(A) for each rectangular subset A of Ω_E.

2. Compute Φ(A) and Ψ(A) recursively by Eqs. (9) and (10) for the rectangular subsets.

3. Compute Ψ(Ω_E)/Φ(Ω_E) and use it as a statistic for testing association.

Again, to carry out this test one needs to specify the prior mixing probabilities ρ(A), γ(A), λ(A), and the pseudo-counts for each rectangular subset A of Ω_E. A simple choice that often leads to good performance in common situations is to set ρ(A) ≡ 0.5, λ_j(A) = 1/N(A) for all non-atomic A's, γ(A) ≡ 0.5 for all A, and all the pseudo-counts to 0.5. We defer a more careful study of the effect of prior specifications on inference to Section 4.

2.4 General Bayesian inference under the recursive mixture model

In the previous two subsections we have introduced a recursive mixture model for the distribution of G given D and E, and have provided recipes for testing association between G and D conditional on E based on this model. This is achieved through a sequence of recursive computations of the terms Φ(A) and Ψ(A). Next, we show that once the Φ(A)'s are computed, we can in fact derive the exact posterior of the recursive mixture model, and therefore inference can proceed through the general Bayesian paradigm. For example, one may take on the prediction task through Bayesian model averaging (BMA) [5].

The main result (Theorem 1) states that the recursive mixture model is conjugate. Here we must first make clear what we mean by the conjugacy of a model, as conjugacy typically refers to the relation between a prior-model pair. To this end, note that the random recursive procedure we have introduced in Section 2.3 is essentially a prior for the conditional retrospective distribution P(G | D, E). Conjugacy here refers to the fact that given

the data, the corresponding posterior for P(G | D, E) has exactly the same recursive mixture representation, with the hyperparameters updated to their posterior values.

Theorem 1. Suppose the conditional retrospective distribution of the predictors P(G | D, E) arises from the recursive mixture model described in the previous subsection. In other words, suppose P(G | D, E) arises from a prior that (1) randomly divides Ω_E in a recursive manner according to the hyperparameters ρ(A) and λ(A), (2) for each stopped set A, i.e., each stratum, randomly assigns a hypothesis H_0(A) or H_1(A) to P(G | D, E ∈ A) with probability γ(A) and 1 − γ(A) respectively, and (3) generates P(G | D, E ∈ A) by drawing from the corresponding Dirichlet distributions with hyperparameters α(A) under H_0(A), or α_0(A) and α_1(A) under H_1(A). Then given the data, the posterior distribution for P(G | D, E) can be represented by exactly the same recursive mixture model with the hyperparameters updated to the following values.

1. Posterior stopping probabilities:

ρ_post(A) = ρ(A) M(A) / Φ(A).

2. Posterior selection probabilities:

λ_j^post(A) = (1 − ρ(A)) λ_j(A) ∏_{i=1}^{K_j(A)} Φ(A_i^j) / [Φ(A) − ρ(A) M(A)], for j = 1, 2, ..., N(A).

3. Posterior mixing probability for H_0(A):

γ_post(A) = γ(A) M_0(A) / M(A).

4. Posterior Dirichlet pseudo-counts:

α_post(A) = α(A) + n_0(A) + n_1(A), α_0^post(A) = α_0(A) + n_0(A) and α_1^post(A) = α_1(A) + n_1(A).

Proof. See Online Supplementary Materials S1.

This theorem generalizes Eqs. (4) and (6) and shows that once we have computed the Φ(A) terms recursively, the posterior can also be computed exactly. One can therefore draw samples directly from the posterior through simulation of the recursive mixture model with the updated parameter values, and use the simulated posterior for inference.
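
For completeness, here is a small sketch of the updates in Theorem 1 for a single rectangular subset A, assuming M(A), M_0(A), Φ(A), and the children's Φ products have already been computed as in the earlier sketches; the function signature is our own.

```python
# A sketch of the posterior hyperparameter updates in Theorem 1 for one subset A.
def posterior_hyperparams(rho, gamma, lam, M, M0, Phi, child_phi_products):
    """lam[j] and child_phi_products[j] correspond to the j-th way of dividing A."""
    rho_post = rho * M / Phi
    lam_post = [(1 - rho) * lam[j] * child_phi_products[j] / (Phi - rho * M)
                for j in range(len(lam))]
    gamma_post = gamma * M0 / M
    return rho_post, lam_post, gamma_post
# The Dirichlet pseudo-counts simply add the observed counts:
# alpha_post(A) = alpha(A) + n_0(A) + n_1(A) and alpha_d_post(A) = alpha_d(A) + n_d(A).
```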

2.5 The case with multiple predictors and covariates

Up to this point, we have been assuming that G is a single trinary variable, which has allowed us to model the retrospective distribution of G given D using multinomial distributions together with Dirichlet priors. The multinomial-Dirichlet conjugacy provides simple closed-form expressions for the marginal likelihoods under the hypotheses H_0(A) and H_1(A), namely M_0(A) and M_1(A) as given in Eq. (2). It is these closed-form marginal likelihoods that in turn have allowed us to carry out the recursive computation of the Φ(A) and Ψ(A) terms exactly.

In many studies the predictors under investigation are more complicated. For example, in the context of genetic association studies, instead of testing the association of a single SNP at a time, one may want to test several neighboring SNPs jointly. In principle the multinomial-Dirichlet modeling strategy will still apply as long as the predictor space stays finite. However, when the predictor space Ω_G is large, containing many possible predictor combinations, modeling the retrospective distribution of the predictors with multinomial distributions becomes inefficient. Such models ignore the sparsity of the data points in the predictor sample space and incur more degrees of freedom than the data would allow reliable inference upon.

To overcome this difficulty, we can adopt the optional Pólya tree (OPT) model/prior introduced in [11] in place of the multinomial-Dirichlet conjugate pair. Instead of using a multinomial model on the sample space and placing a Dirichlet prior on that multinomial model, the OPT framework essentially adopts a mixture of multinomial models, each defined on a different partition of the predictor space, where the mixing distribution on the collection of partitions is also the OT distribution. (In fact, this was the context in which the OT distribution was first introduced.) The OPT prior can be considered a conjugate prior to this mixture-of-multinomials model, just as the Dirichlet is to a standard multinomial model. The idea is that some of the partitions being mixed over will capture the structure of the underlying distribution of G very well. In particular, these partitions will divide more finely the regions of the sample space where the underlying distribution changes more abruptly, while leaving undivided the parts of the space where the distribution is relatively flat. (See [11] for details.)

In the current context, replacing the multinomial-Dirichlet pair with the OPT, we can model P(G | D, E ∈ A) with a single OPT under H_0(A) and with two independent OPTs under H_1(A). But it turns out that this extension alone is still not enough to address the curse of dimensionality. The remaining challenge is that after conditioning on E, each response group (cases or controls) will often populate Ω_G so sparsely that if under H_1(A) we model P(G | D = 0, E ∈ A) and P(G | D = 1, E ∈ A) independently, either using two OPTs or two multinomial-Dirichlet pairs, we will not be able to draw reliable inference on the two distributions. Consequently, even when the two retrospective distributions are indeed different, the marginal likelihood under H_0(A), or M_0(A), will often tend to be higher, in fact often much higher, than that under H_1(A), or M_1(A), because under H_0(A) the two response groups are pooled together to infer a single retrospective distribution P(G | E ∈ A), thereby suffering less from data sparsity.

To address this difficulty, one needs to allow borrowing of information between the two response groups under H_1(A). To this end, a further extension utilizes the coupling optional Pólya tree (co-OPT) prior [7] to replace the OPT prior for P(G | D, E ∈ A) under H(A).

Specifically, under H(A), we place a co-OPT prior on the pair of retrospective distributions,

(θ_0(A), θ_1(A)) ∼ co-OPT(γ_A, λ_A, α_A, ρ^b_A, λ^b_A, α^b_A),

where γ_A, λ_A, α_A, ρ^b_A, λ^b_A, and α^b_A are the hyperparameters of the co-OPT prior. The subscript A indicates that we may have different prior specifications across different A's, although typically a specification shared across all A's will suffice. Given the constraint of space, here we just provide the intuition for why the co-OPT is powerful in the current context. More technical details, including a brief overview of the OPT and co-OPT priors, are provided in the Online Supplementary Materials S2. Readers interested in a full treatment of these priors may refer to [7].

The co-OPT is a prior for two distributions on the same sample space, which here is the predictor space Ω_G, and it generalizes our two-component mixture model for H(A). Under the co-OPT, the retrospective distribution is again modeled as a mixture of H_0(A) and H_1(A). Under H_0(A), the common retrospective distribution for the two response groups, P(G | E ∈ A), or θ(A) in our earlier notation, is modeled as a single OPT, whereas under H_1(A), the two retrospective distributions P(G | D = 0, E ∈ A), or θ_0(A), and P(G | D = 1, E ∈ A), or θ_1(A), are modeled as two dependent OPTs that can randomly couple and become a single OPT on subsets of Ω_G. The motivation for introducing such dependence is that in many situations, even when two distributions are different, they still share some common structure in at least some parts of the sample space. Accordingly, the possible coupling, or combining, of the two samples for inferring shared structures allows exactly this borrowing of information between the two response groups. As our numerical examples will illustrate, this strategy becomes especially beneficial in higher-dimensional situations where data are sparse and so effective sharing of information is crucial for overcoming the data sparsity.

An important feature of the co-OPT framework, in analogy to the multinomial-Dirichlet

conjugate pair, is that the corresponding marginal likelihood terms can also be computed exactly, through a sequence of recursive computations. As a result, we can still compute the corresponding marginal likelihoods such as M_0(A) and M(A) in closed form. Consequently, we can compute the recursive terms Φ(A) and Ψ(A) in the same way as before, and our three-step inferential procedure for testing association remains the same as before. (For details see Online Supplementary Materials S2.) Moreover, the posterior conjugacy of our proposed recursive mixture model (Theorem 1) also remains true, with the corresponding posterior update for the co-OPT distribution, according to Theorem 4 in [7], taking the place of the update for the Dirichlet pseudo-counts in Theorem 1.

3 A power study

In this section we illustrate the workings of our recursive mixture method through numerical examples and carry out a power study to investigate its performance for testing conditional association between the predictors G and the response D given the covariates E. In addition, we compare our method to two other nonparametric methods for testing association: one is the χ²-test [1], which aims at testing the marginal association between G and D ignoring the covariates E, and the other is the generalized CMH test [1].

To imitate the sampling of case-control studies, we first simulate the predictors G and the covariates E for a population, and then simulate the response D through a prospective disease model. We then retrospectively sample a case group and a control group, and apply the three methods to the simulated case-control data. We consider disease models that correspond to a variety of dependence relationships among G, D, and E. Moreover, we vary the dimensionality of G and E to study how increasing sparsity of the data influences the performance of the three methods.

More specifically, we simulate the data under four different scenarios, representing four different disease models motivated by common situations in genetic association studies.

Under each scenario, the predictors are all independent trinary variables taking values 0, 1, and 2 with probabilities 0.16, 0.48, and 0.36. (The reader may think of them as SNP genotype markers under Hardy-Weinberg equilibrium with a minor allele frequency of 0.4.) The covariates are all independent binary variables. One may imagine them being binary environmental variables such as gender, diet, and smoking status in a genetic association study. In all of the examples, we choose ρ(A) ≡ 0.5 and λ_j(A) = 1/N(A) for all non-atomic A's, and γ(A) ≡ 0.5 for all A's.

The first two scenarios (see below) correspond to the case when the predictors and covariates are independent of each other in the general population, that is, without conditioning on the response. In both scenarios the covariates are simulated as independent Bernoulli(0.5) variables. Under each scenario, we simulate a population of 100,000 individuals with k_g predictors G = (G_1, G_2, ..., G_{k_g}) and k_e covariates E = (E_1, E_2, ..., E_{k_e}), and then we retrospectively sample a case-control data set. We carry out 500 such simulations for each of the 16 combinations of (k_g, k_e) for k_g and k_e ranging from 2 to 5. So the number of cells in the corresponding contingency tables ranges from 3² × 2² = 36 to 3⁵ × 2⁵ = 7,776. For the larger tables, for instance for k_g = k_e = 5, the data are sparse, as the corresponding sample sizes we consider range from 1,200 to 2,500 per group while the table has 7,776 cells per group. Such sparsity is typical in genetic association studies when multiple markers are considered jointly.

For each simulated case-control data set, we calculate three statistics: (1) Ψ(Ω_E)/Φ(Ω_E), (2) the χ² statistic (with 3^{k_g} − 1 degrees of freedom) applied to the marginal table of G and D, and (3) the generalized CMH statistic applied to the stratified table of G and D with E defining the strata. Sample sizes for the case and control groups are chosen to make the comparison across the methods discriminative.

Scenario 1. In this case, the prospective model for the response is

D | G, E ∼ Bernoulli(0.35) if G_1 = 1 and G_2 = 0,
           Bernoulli(0.2) otherwise.

So the predictors are associated with the response, while the covariates play no role.

Scenario 2. In this case, the prospective model for the response is

D | G, E ∼ Bernoulli(0.4) if G_1 = 0, E_1 = 1, and E_2 = 1,
           Bernoulli(0.4) if G_2 = 0, E_1 = 0, and E_2 = 0,
           Bernoulli(0.2) otherwise.

So the predictors and the covariates interact with each other in affecting the response distribution. In this case, the homogeneous effect assumption, which the CMH test relies on, does not hold.

Figure 3 and Figure 4 present the ROC curves for the three test statistics under the two scenarios. (In order to construct the ROC curves, we also need to simulate the test statistics under a null hypothesis where there is no association between G and D. For this purpose we simulated from the null model D | G, E ∼ Bernoulli(0.2).) Note that the larger k_g is, the sparser the tables are, and so the harder it is to detect the association. Consequently we increase the sample sizes n_0 for the controls and n_1 for the cases together with k_g to keep the ROC curves informative (as opposed to lying along the 45 degree line). The specific sample sizes we used for each simulation are given in the figures as well.
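
As an illustration of how such data can be generated, here is a hedged sketch of the Scenario 1 simulation and retrospective sampling; the population size, genotype probabilities, and disease model follow the text, while the code organization, the random seed, and the default dimensions and sample sizes are our own choices.

```python
# A sketch of the Scenario 1 simulation and case-control sampling; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def simulate_scenario1(k_g=3, k_e=3, n0=500, n1=500, pop=100_000):
    G = rng.choice([0, 1, 2], size=(pop, k_g), p=[0.16, 0.48, 0.36])   # trinary predictors
    E = rng.binomial(1, 0.5, size=(pop, k_e))                          # independent binary covariates
    risk = np.where((G[:, 0] == 1) & (G[:, 1] == 0), 0.35, 0.2)        # prospective model, Scenario 1
    D = rng.binomial(1, risk)                                          # disease status
    controls = rng.choice(np.flatnonzero(D == 0), size=n0, replace=False)
    cases = rng.choice(np.flatnonzero(D == 1), size=n1, replace=False)
    idx = np.concatenate([controls, cases])
    return G[idx], E[idx], D[idx]                                      # the retrospective sample

G_s, E_s, D_s = simulate_scenario1()
```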

Let us first look at Figure 3. Under Scenario 1, the covariates do not enter into the disease model P(D | G, E), and so it is in fact not necessary to condition on the covariates. Although our method takes E into account, it is still more sensitive than the other two statistics, due partly to the mixing component H(Ω_E) that leaves Ω_E undivided. The ROC curves of the three statistics are similar for small tables (k_g = 2), but the advantage of our method becomes apparent as k_g increases. This is expected, as the χ² test and the CMH test do not take into account the sparsity of the data when the predictor space expands. In contrast, our method, with the co-OPT prior as the model for the retrospective distribution under H(A), is robust to the sparsity of the data through the borrowing of information across the two response groups, as discussed in Section 2.5.

The results for Scenario 2 (Figure 4) are similar to those for Scenario 1, but now our method outperforms the other two even more. It is interesting that the χ² test performs slightly better than the CMH test, despite the fact that the former ignores the covariates. This is probably due to two reasons: first, the conditional dependence structure induces fairly strong marginal dependence, which the χ² test directly targets, and second, the homogeneous effect assumption, which the CMH test relies on, fails.

Next, we simulate from two additional scenarios that illustrate the importance of conditioning on relevant covariates to avoid false positives and improve power for true detections.

Scenario 3. In this case, one of the covariates, E_1, is no longer simulated as a Bernoulli(0.5) variable. Instead, its (prospective) distribution is determined by the predictor G_2:

E_1 | G ∼ Bernoulli(0.7) if G_2 ≥ 1,
          Bernoulli(0.5) otherwise.

In addition, the prospective model for the response is

D | G, E ∼ Bernoulli(0.4) if E_1 = 1,
           Bernoulli(0.2) otherwise.

Figure 3: ROC curves for the three test statistics under Scenario 1. Panels correspond to k_e, k_g = 2, ..., 5, with per-group sample sizes n_0 = n_1 = 300, 500, 800, and 1200 for k_g = 2, 3, 4, and 5 respectively.

Figure 4: ROC curves for the three test statistics under Scenario 2. Panels correspond to k_e, k_g = 2, ..., 5, with per-group sample sizes n_0 = n_1 = 400, 750, 1500, and 2000 for k_g = 2, 3, 4, and 5 respectively.


More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

A Note on Lenk s Correction of the Harmonic Mean Estimator

A Note on Lenk s Correction of the Harmonic Mean Estimator Central European Journal of Economic Modelling and Econometrics Note on Lenk s Correction of the Harmonic Mean Estimator nna Pajor, Jacek Osiewalski Submitted: 5.2.203, ccepted: 30.0.204 bstract The paper

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

Machine Learning Overview

Machine Learning Overview Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 2. Types of Information Processing Problems Solved 1. Regression

More information

Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10 Class Summary Last time... We began our discussion of adaptive clinical trials Specifically,

More information

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

Bayesian model selection for computer model validation via mixture model estimation

Bayesian model selection for computer model validation via mixture model estimation Bayesian model selection for computer model validation via mixture model estimation Kaniav Kamary ATER, CNAM Joint work with É. Parent, P. Barbillon, M. Keller and N. Bousquet Outline Computer model validation

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Spatial Bayesian Nonparametrics for Natural Image Segmentation Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes

More information

Nonparametric Bayes tensor factorizations for big data

Nonparametric Bayes tensor factorizations for big data Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Bayesian Methods for Multivariate Categorical Data. Jon Forster (University of Southampton, UK)

Bayesian Methods for Multivariate Categorical Data. Jon Forster (University of Southampton, UK) Bayesian Methods for Multivariate Categorical Data Jon Forster (University of Southampton, UK) Table 1: Alcohol intake, hypertension and obesity (Knuiman and Speed, 1988) Alcohol intake (drinks/day) Obesity

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling

2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling 2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction and Motivating

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Ordered Designs and Bayesian Inference in Survey Sampling

Ordered Designs and Bayesian Inference in Survey Sampling Ordered Designs and Bayesian Inference in Survey Sampling Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu Siamak Noorbaloochi Center for Chronic Disease

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Part IV Statistics in Epidemiology

Part IV Statistics in Epidemiology Part IV Statistics in Epidemiology There are many good statistical textbooks on the market, and we refer readers to some of these textbooks when they need statistical techniques to analyze data or to interpret

More information

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

More information

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 Suggested Projects: www.cs.ubc.ca/~arnaud/projects.html First assignement on the web: capture/recapture.

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

The STS Surgeon Composite Technical Appendix

The STS Surgeon Composite Technical Appendix The STS Surgeon Composite Technical Appendix Overview Surgeon-specific risk-adjusted operative operative mortality and major complication rates were estimated using a bivariate random-effects logistic

More information

Approximate Bayesian Computation: a simulation based approach to inference

Approximate Bayesian Computation: a simulation based approach to inference Approximate Bayesian Computation: a simulation based approach to inference Richard Wilkinson Simon Tavaré 2 Department of Probability and Statistics University of Sheffield 2 Department of Applied Mathematics

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Introduction. Basic Probability and Bayes Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 26/9/2011 (EPFL) Graphical Models 26/9/2011 1 / 28 Outline

More information

Quantifying the Price of Uncertainty in Bayesian Models

Quantifying the Price of Uncertainty in Bayesian Models Provided by the author(s) and NUI Galway in accordance with publisher policies. Please cite the published version when available. Title Quantifying the Price of Uncertainty in Bayesian Models Author(s)

More information

ASA Section on Survey Research Methods

ASA Section on Survey Research Methods REGRESSION-BASED STATISTICAL MATCHING: RECENT DEVELOPMENTS Chris Moriarity, Fritz Scheuren Chris Moriarity, U.S. Government Accountability Office, 411 G Street NW, Washington, DC 20548 KEY WORDS: data

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract

Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract Bayesian analysis of a vector autoregressive model with multiple structural breaks Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus Abstract This paper develops a Bayesian approach

More information

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21 Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Bayesian Models in Machine Learning

Bayesian Models in Machine Learning Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of

More information

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles Jeremy Gaskins Department of Bioinformatics & Biostatistics University of Louisville Joint work with Claudio Fuentes

More information

2.1.3 The Testing Problem and Neave s Step Method

2.1.3 The Testing Problem and Neave s Step Method we can guarantee (1) that the (unknown) true parameter vector θ t Θ is an interior point of Θ, and (2) that ρ θt (R) > 0 for any R 2 Q. These are two of Birch s regularity conditions that were critical

More information