Bayesian Inference in the Multivariate Probit Model


Bayesian Inference in the Multivariate Probit Model: Estimation of the Correlation Matrix

by Aline Tabet

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of Graduate Studies (Statistics)

The University of British Columbia
August 2007

© Aline Tabet, 2007

Abstract

Correlated binary data arise in many applications. Any analysis of this type of data should take into account the correlation structure among the variables. The multivariate Probit model (MVP), introduced by Ashford and Snowden (1970), is a popular class of models particularly suitable for the analysis of correlated binary data. In this class of models, the response is multivariate, correlated and discrete. Generally speaking, the MVP model assumes that, given a set of explanatory variables, the multivariate response is an indicator of the event that some unobserved latent variable falls within a certain interval. The latent variable is assumed to arise from a multivariate normal distribution. Difficulties with the multivariate Probit are mainly computational, as the likelihood of the observed discrete data is obtained by integrating over a multidimensional constrained space of latent variables. In this work, we adopt a Bayesian approach and develop an efficient Markov chain Monte Carlo algorithm for estimation in MVP models under the full correlation and the structured correlation assumptions. In addition to simulation results, we present an application of our method to the Six Cities data set. Our algorithm has many advantages over previous approaches: namely, it handles identifiability and uses a marginally uniform prior on the correlation matrix directly.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication

Part I: Thesis
1 Introduction
  1.1 Motivation
  1.2 Outline
2 The Multivariate Probit Model
  2.1 Model Specification and Notation
  2.2 Difficulty with Multivariate Probit Regression: Identifiability
  2.3 Bayesian Inference in Multivariate Probit Models
    Prior Specification on β
    Prior Specification on the correlation matrix R
3 Correlation Estimation in the Saturated Model
  3.1 Introduction
  3.2 Parameter Expansion and Data Augmentation
    Data Augmentation
    Parameter Expansion for Data Augmentation
    Data Transformation
  3.3 Proposed Model
    Imputation Step
    Posterior Sampling Step
  3.4 Simulations
    Results for T = 3
    Results for T = 8
    Convergence Assessment
  3.5 Application: Six Cities Data
4 Correlation Estimation in the Structured Model
  4.1 Introduction
  4.2 Conditional Independence
  4.3 Gaussian Graphical Models
    Graph Theory
    The Hyper-inverse Wishart Distribution
  4.4 Marginally Uniform Prior for Structured Covariance
  4.5 PX-DA in Gaussian Graphical Models
  4.6 Simulations
    Loss Under the Saturated Model and the Structured Model
    Effect of Decreasing Sample Size
    Prediction Accuracy
  4.7 Application: Six Cities Data Revisited
5 Conclusion
  5.1 Summary
  5.2 Extensions, Applications, and Future Work
Bibliography

Part II: Appendices
A Distributions and Identities
  A.1 The Multivariate Normal (Gaussian) Distribution
  A.2 The Gamma Distribution
  A.3 The Standard Inverse Wishart Distribution
B Marginal Prior on R: proof from Barnard et al. (2000)
C Computation of the Jacobian J : Z → W
D Sampling from the Multivariate Truncated Gaussian
E Sampling from the Hyper-Inverse Wishart Distribution (Carvalho et al., 2007)
F Simulation Results

List of Tables

2.1 Summary of how identifiability has been handled in some previous work
3.1 Correlation results from simulations for T = 3
3.2 Regression coefficient results from simulations for T = 3
3.3 Correlation results from simulations for T = 8
3.4 Regression coefficient results from simulations when T = 8
3.5 Six Cities Data: posterior estimates using the marginal prior, MLE estimates using MCEM, and posterior estimates using the jointly uniform prior (Chib and Greenberg (1998))
4.1 Simulation results: entropy and quadratic loss averaged over 5 data sets generated by different correlation matrices with the same structure
4.2 Entropy and quadratic loss obtained by estimating the true correlation and partial correlation matrix with the PX-DA algorithm under the saturated and structured model assumptions
4.3 Simulation results on the unconstrained correlation coefficients corresponding to the model in 4.1, with T = 8
4.4 Simulation results on the constrained correlation coefficients corresponding to the model in 4.1, with T = 8
4.5 Simulation results on the unconstrained correlation coefficients corresponding to the model in 4.1, with a reduced sample size and T = 8
4.6 Simulation results on the constrained correlation coefficients corresponding to the model in 4.1, with a reduced sample size and T = 8
4.7 Six Cities Data: posterior estimates under the structured model assumption, MLE estimates using MCEM, and posterior estimates using the jointly uniform prior under a saturated model assumption (Chib and Greenberg (1998))
F.1 Simulation results: entropy and quadratic loss for 5 data sets generated by different correlation matrices with the same structure
F.2 Table F.1 continued

List of Figures

2.1 A graphical representation of the model in (2.3) under a full correlation structure; observed nodes are shaded
2.2 Marginal prior density for r_12 when T = 3 and for a larger T under the jointly uniform prior p(R) (figure reproduced from Barnard et al. (2000))
2.3 Marginal correlations obtained using the prior in (2.12) by sampling from a standard inverse Wishart with degrees of freedom ν = T + 1
3.1 Correlation estimates for ρ = 0.4, T = 3 and increasing sample size
3.2 Correlation estimates for ρ = 0.8, T = 3 and increasing sample size
3.3 β estimates for ρ = 0.4, T = 3 (smaller sample size)
3.4 β estimates for ρ = 0.4, T = 3 (larger sample size)
3.5 β estimates for ρ = 0.8, T = 3 (smaller sample size)
3.6 β estimates for ρ = 0.8, T = 3 (larger sample size)
3.7 Correlation estimates for ρ = 0.2, T = 8 and increasing sample size
3.8 Correlation estimates for ρ = 0.6, T = 8 and increasing sample size
3.9 β estimates for ρ = 0.2, T = 8 (smaller sample size)
3.10 β estimates for ρ = 0.2, T = 8 (larger sample size)
3.11 β estimates for ρ = 0.6, T = 8 (smaller sample size)
3.12 β estimates for ρ = 0.6, T = 8 (larger sample size)
3.13 T = 3: trace plots as the number of iterations increases post burn-in; the algorithm has started to converge shortly after burn-in
3.14 T = 3: autocorrelation plots of a randomly chosen parameter from the correlation matrices for the cases ρ = 0.2, 0.4, 0.6 and 0.8
3.15 Trace plots of the cumulative mean and cumulative standard deviation of randomly chosen parameters from the correlation matrices as ρ is varied over 0.2, 0.4, 0.6 and 0.8 (T = 3); the vertical line marks the burn-in value used in the simulations
3.16 Six Cities Data: trace plots and density plots of the correlation coefficients; vertical lines denote the 95% credible interval and the red line indicates the posterior mean reported by Chib and Greenberg (1998)
3.17 Six Cities Data: trace plots, density plots and autocorrelation plots of the regression coefficients; vertical lines denote the 95% credible interval and the red line indicates the posterior mean reported by Chib and Greenberg (1998)
4.1 A graphical representation of a structured MVP model for T = 3; the edge between Z_i1 and Z_i3 is missing, which is equivalent to r_13 = 0; this structure is typical of longitudinal models where each variable is strongly associated with the one before it and after it, given the other variables in the model
4.2 A graphical model with T = 7 vertices; Z_1 is a neighbor of Z_2; Z_3, Z_2 and Z_7 form a complete subgraph, or clique; this graph can be decomposed into two cliques {Z_1, Z_2, Z_3, Z_5, Z_4} and {Z_3, Z_6, Z_7}, and {Z_3} separates the two cliques
4.3 Marginal distributions of the prior on the correlation matrix corresponding to the model in 4.1
4.4 Illustration of the marginally uniform prior on the structure of the graph in Figure 4.2, which has unequal clique sizes |C_1| = 5 and |C_2| = 3
4.5 Box plots of the entropy and quadratic loss obtained by generating data from 5 correlation structures and computing the loss functions under a full correlation structure versus a structured correlation structure
4.6 Six Cities Data: correlation and partial correlation estimates
4.7 Six Cities Data: trace plots, density plots and autocorrelation plots of the regression coefficients under a structured model assumption; vertical lines denote the 95% credible interval and the red line indicates the posterior mean reported by Chib and Greenberg (1998)

Acknowledgements

I would like to thank my supervisors, Dr. Arnaud Doucet and Dr. Kevin Murphy. This work would not have been possible without their valued advice and suggestions. I also thank the staff and faculty members of the Statistics Department at UBC, in particular Dr. Paul Gustafson, Dr. Harry Joe and Dr. Matias Salibian-Barrera, for their help, advice and mentorship. I am forever grateful to my family, Salma, Ghassan, Najat, Sal and Rhea, for their continued support and encouragement. The numerous sacrifices they made over the last few years allowed me to pursue my aspirations and reach important milestones in my professional career. Finally, I want to thank my friends and fellow graduate students, both in the Statistics Department and in Computer Science, for providing theoretical advice, computer support and much help, but most importantly for making the last two years a memorable journey.

Dedication

To my mom and dad: your love and support make everything possible.

Part I
Thesis

Chapter 1
Introduction

1.1 Motivation

Correlated discrete data, whether binary, nominal or ordinal, arise in many applications. Examples range from the study of group randomized clinical trials to consumer behavior, panel data, sample surveys and longitudinal studies. Modeling dependencies between binary variables can be done using Markov random fields (e.g., Ising models). However, an attractive alternative is to use a latent variable model, where the observed binary variables are assumed independent given latent Gaussian variables, which are correlated. An example of such a model is the multivariate Probit model (MVP), introduced by Ashford and Snowden (1970). In this class of models, the response is multivariate, correlated and discrete. Generally speaking, the MVP model assumes that, given a set of explanatory variables, the multivariate response is an indicator of the event that some unobserved latent variable falls within a certain interval. The latent variable is assumed to arise from a multivariate normal distribution. The likelihood of the observed discrete data is then obtained by integrating over the multidimensional constrained space of latent variables:

P(Y_i = y_i | X_i, β, Σ) = ∫_{A_iT} ··· ∫_{A_i1} φ_T(Z_i | X_i β, R) dZ_i1 ··· dZ_iT   (1.1)

where i = 1, ..., n indexes the independent observations, j = 1, ..., T indexes the dimensions of the response, Y_i is a T-dimensional vector taking values in {0, 1}, A_ij is the interval (0, ∞) if Y_ij = 1 and the interval (−∞, 0] otherwise, β is the matrix of regression coefficients, Σ is the covariance matrix, and φ_T(Z_i | X_i β, R) is the probability density function of the multivariate normal distribution defined in A.1.
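The integral in (1.1) has no closed form, but it can be approximated by simple Monte Carlo over the latent variables. The following is a minimal sketch of that idea (my own illustration, not code from the thesis; function names and the example values are arbitrary):

```python
import numpy as np

def mvp_prob_mc(y, X, beta, R, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(Y_i = y_i | X_i, beta, R) in (1.1):
    draw latent Z_i ~ N(X_i beta, R) and count how often the sign
    pattern of Z_i matches the observed binary vector y_i."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(X @ beta, R, size=n_draws)   # latent Gaussian draws
    match = np.all((Z > 0) == (np.asarray(y) == 1), axis=1)  # membership in the A_ij intervals
    return match.mean()

# Example: T = 3 responses, p = 2 covariates (illustrative values)
X = np.array([[0.3, -0.1], [0.0, 0.4], [-0.2, 0.2]])   # T x p design matrix
beta = np.array([1.0, -1.0])
R = np.array([[1.0, 0.4, 0.4], [0.4, 1.0, 0.4], [0.4, 0.4, 1.0]])
print(mvp_prob_mc([1, 0, 1], X, beta, R))
```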

The MVP model has been proposed as an alternative to the multivariate logistic model, which is defined as:

P(Y_ij = 1 | X_i, β, Σ) = exp(x_i′ β_j) / Σ_{k=1}^T exp(x_i′ β_k)   (1.2)

The appeal of the Probit model is that it relaxes the independence of irrelevant alternatives (IIA) property assumed by the logit model. The IIA assumption states that if choice A is preferred to choice B out of the choice set {A, B}, then introducing a third alternative C, thus expanding the choice set to {A, B, C}, must not make B preferred to A. This means that adding or deleting alternative outcome categories does not affect the odds among the remaining outcomes. More specifically, in the logistic regression model the odds of choosing m versus n do not depend on which other outcomes are possible. That is, the odds are determined only by the coefficient vectors for m and n, namely β_m and β_n:

P(Y_im = 1 | X_i, β, Σ) / P(Y_in = 1 | X_i, β, Σ) = [exp(x_i′ β_m) / Σ_{k=1}^T exp(x_i′ β_k)] / [exp(x_i′ β_n) / Σ_{k=1}^T exp(x_i′ β_k)] = exp(x_i′ (β_m − β_n))   (1.3)

In many cases, this is considered to be an unrealistic assumption (see for example McFadden (1974)), particularly when the alternatives are similar or redundant, as is the case in many econometric applications. Until recently, estimation of MVP models, despite their appeal, has been difficult due to computational intractability, especially when the response is high dimensional. However, recent advances in computational and simulation methods have made this class of models more widely used. Both classical and Bayesian methods have been extensively developed for estimation of these models. For a low dimensional response, finding the maximum likelihood estimator numerically using quadrature methods for solving the multidimensional integral is possible, but this becomes intractable as the number of dimensions T increases, usually past 3.
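The IIA property in (1.3) is easy to verify numerically. Below is a small illustration (my own, with arbitrary coefficient values): adding a third category C leaves the odds of A versus B unchanged.

```python
import numpy as np

def choice_probs(x, betas):
    """Multinomial logit probabilities as in (1.2)."""
    u = np.exp(np.array([x @ b for b in betas]))
    return u / u.sum()

x = np.array([1.0, 0.5])
b_A, b_B, b_C = np.array([0.2, 0.4]), np.array([-0.1, 0.3]), np.array([0.5, -0.2])

p2 = choice_probs(x, [b_A, b_B])        # choice set {A, B}
p3 = choice_probs(x, [b_A, b_B, b_C])   # choice set {A, B, C}
print(p2[0] / p2[1], p3[0] / p3[1])     # identical odds: IIA in action
```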

Lerman and Manski (1981) suggest the method of simulated maximum likelihood (SML). This method is based on Monte Carlo simulations to approximate the high dimensional integral in order to estimate the probability of each choice. McFadden (1989) introduced the method of simulated moments (MSM). This method also requires simulating the probability of each outcome based on moment conditions. Natarajan et al. (2000) introduced a Monte Carlo variant of the Expectation Maximization algorithm (MCEM) to find the maximum likelihood estimator without solving the high dimensional integral. Other frequentist methods were also developed using Generalized Estimating Equations (GEE) (e.g., Chaganty and Joe (2004)). On the Bayesian side, Albert and Chib (1993) introduced a method that involves a Gibbs sampling algorithm using data augmentation for the univariate Probit model. McCulloch and Rossi (1994) extended this model to the multivariate case. The Bayesian method entails iteratively alternating between sampling the latent data and estimating the unknown parameters by drawing from their conditional distributions. The idea is that, under mild conditions, successive sampling from the conditional distributions produces a Markov chain which converges in distribution to the desired joint conditional distribution. Other work on the Bayesian side includes that of Chib and Greenberg (1998), and more recently Liu (2001), Liu and Daniels (2006), and Zhang et al. (2006). These methods will be examined in more detail in Chapter 2. Geweke et al. (1994) compared the performance of the classical frequentist methods SML and MSM with the Bayesian Gibbs sampling method and found the Bayesian method to be superior, especially when the covariates are correlated and the error variances vary across responses.

1.2 Outline

In this work we adopt a Bayesian approach for estimation in the multivariate Probit class of models. The multinomial and the ordinal models are generalizations of the binary case. The multivariate binary response is a special case of the multinomial response with only two categories.

The ordinal model is also a special case of the multinomial model, where the categories are expected to follow a certain order. All the methods herein are developed for the multivariate binary model, but they could easily be extended to include the multinomial and ordinal cases. The aim is to find a general framework to estimate the parameters required for inference in the MVP model, especially in high dimensional problems. We particularly focus on the estimation of an identifiable correlation matrix under a full correlation assumption and a constrained partial correlation assumption. This thesis is structured as follows. In Chapter 2, we introduce the notation that will be used throughout the thesis. We discuss the problem of identifiability in the MVP class of models. We briefly compare several possible choices of prior distributions for Bayesian modeling, and review some methods that have been proposed in the literature to deal with identifiability and prior selection. In Chapter 3, we detail a method for estimating an identifiable correlation matrix under the saturated model. The saturated model admits a full covariance matrix where all off-diagonal elements are assumed to be non-zero. We show simulation results on a low dimensional and a higher dimensional problem. Finally, we further investigate the method by applying it to a widely studied data set, the Six Cities data. In Chapter 4, we extend the method developed in Chapter 3 to the case where a structure on the partial correlation matrix is imposed. To do so, we motivate the use of Gaussian graphical models and the hyper-inverse Wishart distribution. We provide a general introduction to Gaussian graphical models, and we adapt the algorithm and the priors developed in Chapter 3 to the new framework. Throughout this chapter, we assume that the structure of the inverse correlation matrix is known and given. Simulation results are presented, as well as an application to the Six Cities data set from Chapter 3. We conclude in Chapter 5 by summarizing the work and the results. We also discuss possible extensions, applications and future work.

Chapter 2
The Multivariate Probit Model

2.1 Model Specification and Notation

The multivariate Probit model assumes that each subject has T distinct binary responses, and a matrix of covariates that can be any mixture of discrete and continuous variables. Specifically, let Y_i = (Y_i1, ..., Y_iT) denote the T-dimensional vector of observed binary 0/1 responses on the ith subject, i = 1, ..., n. Let X_i be a T × p design matrix, and let Z_i = (Z_i1, ..., Z_iT) denote a T-variate normal vector of latent variables such that

Z_i = X_i β + ε_i,   i = 1, ..., n   (2.1)

The relationship between Z_ij and Y_ij in the multivariate Probit model is given by

Y_ij = 1 if Z_ij > 0, and Y_ij = 0 otherwise,   j = 1, ..., T   (2.2)

so that

P(Y_i = 1 | β, Σ) = Φ(Z_i),   Z_i ∼ N(X_i β, Σ)   (2.3)

where Φ is the Probit link, which denotes the cumulative distribution function of the normal distribution as defined in A.1. Here β = (β_1, ..., β_T) is a p × T matrix of unknown regression coefficients, ε_i is a T × 1 vector of residual errors distributed as N_T(0, Σ), and Σ is the T × T correlation matrix of Z_i.
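To make the notation concrete, here is a short sketch of simulating data from (2.1)-(2.2) (my own illustration; the covariate range and coefficient values are assumptions for the example, not the thesis's settings):

```python
import numpy as np

def simulate_mvp(n, T, p, beta, R, seed=0):
    """Simulate from the MVP model (2.1)-(2.2): Z_i = X_i beta + eps_i with
    eps_i ~ N_T(0, R), and Y_ij = 1 exactly when Z_ij > 0."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-0.5, 0.5, size=(n, T, p))          # covariates (assumed range)
    eps = rng.multivariate_normal(np.zeros(T), R, size=n)
    Z = np.einsum('itp,p->it', X, beta) + eps            # latent Gaussian variables
    Y = (Z > 0).astype(int)                              # observed binary responses
    return X, Z, Y

R = 0.4 * np.ones((3, 3)) + 0.6 * np.eye(3)              # equicorrelated, rho = 0.4
X, Z, Y = simulate_mvp(n=500, T=3, p=2, beta=np.array([-1.0, 1.0]), R=R)
```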

[Figure 2.1: A graphical representation of the model in (2.3) under a full correlation structure, with nodes β, X_i1, X_i2, X_i3, Z_i1, Z_i2, Z_i3, Y_i1, Y_i2, Y_i3, Σ and plate i = 1:n. Observed nodes are shaded.]

The posterior distribution of Z_i is given by

f(Z_i | Y_i, β, R) ∝ φ_T(Z_i | X_i β, R) ∏_{j=1}^{T} { I(Z_ij > 0) I(Y_ij = 1) + I(Z_ij ≤ 0) I(Y_ij = 0) }   (2.4)

This is a multivariate truncated Gaussian, where φ_T(·) is the probability density function of the multivariate normal distribution as in A.1. The likelihood of the observed data Y is obtained by integrating over the latent variables Z:

P(Y_i = y_i | X_i, β, R) = ∫_{A_iT} ··· ∫_{A_i1} φ_T(Z_i | X_i β, R) dZ_i   (2.5)

where A_ij is the interval (0, ∞) if Y_ij = 1 and the interval (−∞, 0] otherwise. This formulation of the model is the most general, since it allows the regression parameters as well as the covariates to vary across the T categories. In this work we let the covariates vary across categories; however, we constrain the regression coefficients β to be fixed across categories by requiring β_1 = ... = β_T = β.

2.2 Difficulty with Multivariate Probit Regression: Identifiability

In the multivariate Probit model, the unknown parameters (β, Σ) are not identifiable from the observed-data model (e.g., Chib and Greenberg (1998), Keane (1992)). This can easily be seen: if we scale Z by a constant c > 0, we get

cZ = c(Xβ + ε)   (2.6)
   = X(cβ) + cε   (2.7)

From equation (2.2), Y clearly has the same value given Z and given cZ, which means that the likelihood of Y | X, β, Σ is the same as that of Y | X, cβ, c²Σ. Furthermore, we have no way of estimating the value of c. In order to handle this identifiability issue in the MVP model, restrictions need to be imposed on the covariance matrix. In the univariate case, this restriction is handled by setting the variance to one. However, imposing such a restriction in the multivariate case is a little more complicated. It is not uncommon to ignore the identifiability problem, perform the analysis on the unidentified model, and post-process the samples by scaling with the sampled variances using the separation strategy R = D⁻¹ΣD⁻¹, where D is a diagonal matrix with diagonal elements d_ii = √Σ_ii. This method is adopted by McCulloch and Rossi (1994), and is widely used (e.g., Edwards and Allenby (2003)). Many researchers are uncomfortable working with unidentified parameters.
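A minimal sketch of this post-processing step (my own illustration): rescale each draw of Σ to the identified correlation scale. The non-identifiability of the scale c is visible in the fact that Σ and c²Σ map to the same R.

```python
import numpy as np

def cov_to_corr(Sigma):
    """Separation strategy: R = D^{-1} Sigma D^{-1}, with D = diag(sqrt(Sigma_ii)).
    Maps an (unidentified) covariance draw to the identified correlation scale."""
    d = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(d, d)

# The scale is not identified: Sigma and c^2 * Sigma give the same R.
Sigma = np.array([[2.0, 0.6, 0.2], [0.6, 1.5, 0.3], [0.2, 0.3, 0.5]])
assert np.allclose(cov_to_corr(Sigma), cov_to_corr(4.0 * Sigma))
```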

For instance, ignoring identifiability adds difficulty in the choice of prior distributions, since priors are placed on unidentified parameters. Therefore, if the prior is improper, it is difficult to verify that the scaled draws are from a proper posterior distribution. Koop (2003, p. 227) gives an empirical illustration of the effect of ignoring identifiability. From simulation results, he shows that unidentifiable parameters have higher standard errors, and furthermore, with non-informative priors there is nothing stopping estimates from going to infinity. McCulloch et al. (2000) address identifiability by setting the first diagonal element of the covariance matrix to σ_11 = 1. However, this means that the standard priors for covariance matrices can no longer be used. They propose a prior directly on the identified parameters, but their method is computationally expensive and slow to converge, as pointed out by Nobile (2000). Nobile suggests an alternative way of normalizing the covariance by drawing from an inverse Wishart conditional on σ_11 = 1 (Linardakis and Dellaportas, 2003). The approach of constraining one element of the covariance adds difficulty in the interpretability of the parameters and priors, and is computationally demanding and slow to converge. Other approaches impose constraints on Σ⁻¹, the precision matrix. Webb and Forster (2006) parametrize Σ⁻¹ in terms of its Cholesky decomposition: Σ⁻¹ = Ψ′ΛΨ. In this parametrization, Ψ is an upper triangular matrix with diagonal elements equal to 1, and Λ is a diagonal matrix. The elements of Ψ can be regarded as the regression coefficients obtained by regressing the latent variable on its predecessors. Each λ_jj is interpreted as the conditional precision of the latent data corresponding to variable j given the latent data for all the variables preceding j in the decomposition. Identifiability is addressed in this case by setting λ_jj to 1. This approach only works if the data follow a specific ordering, for example a time series. Dobra et al. (2004) propose an algorithm to search over possible orderings; however, this becomes very computationally expensive in high dimensions. Alternatively, identifiability can be handled by restricting the covariance matrix Σ to be a correlation matrix R (Chib and Greenberg (1998)). The correlation matrix admits additional constraints since, in addition to being positive semi-definite, it is required to have diagonal elements equal to 1 and off-diagonal elements in [−1, 1].

Furthermore, just as in the covariance case, the number of parameters to be estimated increases quadratically with the dimension of the matrix. Barnard et al. (2000) use the decomposition Σ = DRD, and place separate priors on R and D directly. They use a Griddy Gibbs sampler (Ritter and Tanner, 1992) to sample the correlation matrix. Their approach involves drawing the correlation elements one at a time and requires setting grid sizes and boundaries. This approach is inefficient, especially in high dimensions. Chib and Greenberg (1998) use a Metropolis-Hastings random walk algorithm to sample the correlation matrix. This is more efficient than the Griddy Gibbs approach because it draws the correlation coefficients in blocks. However, the resulting correlation matrix is not guaranteed to be positive definite, which requires the algorithm to have an extra rejection step. Furthermore, as with random walk algorithms in general, the mixing is slow in high dimensions. Alternatively, some approaches use parameter expansion as described in Liu and Wu (1999) together with data augmentation, for example Liu (2001), Zhang et al. (2006), Liu and Daniels (2006), and others. The idea is to propose an alternative parametrization that moves from the constrained correlation space to sampling a less constrained covariance matrix, which is then transformed back to a correlation matrix. These approaches differ mainly in the choice of priors and in how the covariance matrix is sampled. The different possibilities for priors will be discussed in more detail in the next section, and an in-depth explanation of the parameter expansion with data augmentation algorithm is given in the next chapter. Table 2.1 gives a summary of how identifiability has been handled in the Probit model.

2.3 Bayesian Inference in Multivariate Probit Models

A Bayesian framework treats parameters as random variables and therefore requires the computation of the posterior distribution of the unknown random parameters conditional on the data.

Table 2.1: Summary of how identifiability has been handled in some previous work

Identifiability                      Paper
Ignored                              McCulloch and Rossi (1994)
Restrict σ_11 = 1                    McCulloch et al. (2000); Nobile (2000)
Restrict λ_jj = 1 in Σ⁻¹ = Ψ′ΛΨ      Webb and Forster (2006)
Restrict Σ to R                      Barnard et al. (2000); Liu (2001); Liu and Daniels (2006); Zhang et al. (2006)

A straightforward application of Bayes' rule results in the posterior distribution of (β, R), where R is the correlation matrix, β is the matrix of regression coefficients, and D is the data:

π(β, R | D) ∝ f(D | β, R) π(β, R)   (2.8)

In order to estimate the posterior distribution, a prior distribution on the unknown parameters β and R needs to be specified. In the absence of prior knowledge, it is often desirable to have uninformative flat priors on the parameters we are estimating.

2.3.1 Prior Specification on β

It is common to assume that a priori β and R are independent. Liu (2001) proposes a prior on β that depends on R to facilitate computations. There are several other choices of priors in the literature for the regression coefficients β. The most common choice is a multivariate Gaussian distribution centered at B, with known diagonal covariance matrix Ψ_β. It is typical to choose large values for the diagonal elements of Ψ_β so that the prior on β is uninformative. This is the proper conjugate prior. In addition, without loss of generality,

we can set B to 0:

π(β̃) ∼ N_{pT}(0, Ψ_β ⊗ I_T)   (2.9)

where β̃ is the pT-dimensional vector obtained by stacking up the columns of the p × T regression coefficient matrix β. In this work, we constrain the regression parameter to be constant across the T categories.

2.3.2 Prior Specification on the Correlation Matrix R

To handle identifiability, we restrict the covariance matrix Σ to be a correlation matrix, which means that the standard conjugate inverse Wishart prior for covariances cannot be used. Instead, a prior needs to be placed on R directly. However, as mentioned previously, there does not exist a conjugate prior for correlation matrices. Barnard et al. (2000) discuss possible choices of diffuse priors on R. The first is the proper jointly uniform prior:

π(R) ∝ 1,   R ∈ ℛ_T   (2.10)

where the space of correlation matrices ℛ_T is a compact subspace of the hypercube [−1, 1]^{T(T−1)/2}. The posterior distribution resulting from this prior is not easy to sample from. Barnard et al. use the Griddy Gibbs approach (Ritter and Tanner, 1992), which is inefficient. The approach in Chib and Greenberg (1998) uses this prior as well. Liu and Daniels (2006) use this prior for inference; however, they use a different prior to generate their sampling proposal. It is important to note that using a jointly uniform prior does not result in uniform marginals on each r_ij. Barnard et al. (2000) show that a jointly uniform prior will tend to favor marginal correlations close to 0, making it highly informative, marginally. This problem becomes more apparent as T increases (see Figure 2.2).

Another commonly used uninformative prior is Jeffreys' prior

π(R) ∝ |R|^{−(p+1)/2}   (2.11)

This prior is used by Liu (2001). Liu and Daniels (2006) use it for generating their proposal.

[Figure 2.2: Marginal prior density of r_12 under the jointly uniform prior p(R), for T = 3 and for a larger T. (Figure reproduced from Barnard et al. (2000).)]

It has been shown that, in the context of parameter expansion, this prior helps facilitate computations. However, it suffers from the disadvantage of being improper. Improper priors are not guaranteed to yield a proper posterior distribution and, in addition, cannot be used for model selection due to Lindley's paradox. Furthermore, it has been shown that the use of improper priors on covariance matrices is in fact informative and tends to favor marginal correlations close to ±1 (Rossi et al., 2005, Chapter 2). Alternatively, Barnard et al. (2000) propose a prior on R such that marginally each r_ij is uniform on the interval [−1, 1]. This is achieved by taking the joint distribution of R to be:

π(R) ∝ |R|^{T(T−1)/2 − 1} ( ∏_i |R_ii| )^{−(T+1)/2}   (2.12)

where R_ii denotes the principal submatrix of R obtained by deleting its ith row and column. The above distribution is difficult to sample from directly. However, they show that sampling from it can be achieved by sampling from a standard inverse Wishart with degrees of freedom equal to ν = T + 1 and transforming back to a correlation matrix using the separation strategy (Σ = DRD). The proof is reproduced in Appendix B and the result is illustrated in Figure 2.3.

[Figure 2.3: Marginal densities of the correlations (ρ_12, ρ_13, ρ_23) obtained using the prior in (2.12) by sampling from a standard inverse Wishart with degrees of freedom ν = T + 1.]

The marginally uniform prior seems convenient, since it is proper and we are able to compute its normalizing constant. It does not push correlations toward 0 or ±1, even in high dimensions. Most importantly, because it is proper, it opens the possibility for Bayesian model selection. However, multiplying together the distribution of Z in equation (2.4) and the marginally uniform prior in (2.12) results in a posterior distribution that is complicated and not easily sampled from.
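As a quick illustration of this construction (a sketch, not the thesis's code): draw Σ from a standard inverse Wishart with ν = T + 1 degrees of freedom and an identity scale, then rescale to a correlation matrix; histograms of the resulting r_ij are flat on [−1, 1].

```python
import numpy as np
from scipy.stats import invwishart

def sample_marginally_uniform_R(T, rng):
    """One draw of R under the marginally uniform prior (2.12):
    Sigma ~ IW(nu = T + 1, I_T), then R = D^{-1} Sigma D^{-1}."""
    Sigma = invwishart.rvs(df=T + 1, scale=np.eye(T), random_state=rng)
    d = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(d, d)

rng = np.random.default_rng(0)
draws = np.array([sample_marginally_uniform_R(5, rng)[0, 1] for _ in range(2000)])
print(draws.min(), draws.max())  # values spread roughly uniformly over (-1, 1)
```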

Nevertheless, we show in the next chapter that the marginal prior, when used in the context of parameter expansion, is actually computationally convenient for sampling from the posterior distribution.

Chapter 3
Correlation Estimation in the Saturated Model

3.1 Introduction

As we have seen in the previous chapter, inference in the MVP model is complicated by the identifiability issue, which requires constraining the covariance to be a correlation matrix. There is no conjugate prior for correlation matrices, and therefore the posterior is not easily sampled from. In this chapter, we build on previous work and adopt a Bayesian approach that uses a combination of Gibbs sampling and data augmentation. Furthermore, we use a re-parametrization leading to an expansion of the parameter space. This helps significantly with the computation of the posterior distribution. We focus on R being a full T × T correlation matrix.

3.2 Parameter Expansion and Data Augmentation

3.2.1 Data Augmentation

Data augmentation (DA) is an algorithm introduced by Tanner and Wong (1987), very popular in statistics and used mainly to facilitate computation. These methods center on the construction of iterative algorithms by introducing artificial variables, referred to as missing data or latent variables. These variables may or may not have a physical interpretation, but are mainly there for computational convenience. Let Y be the observed data and θ the unknown parameter of interest.

If we are interested in making draws from f(Y | θ), the idea is to find a latent variable Z such that the joint distribution f(Y, Z | θ) is easily sampled from. The distribution of the observed-data model is recovered by marginalizing out the latent variable:

f(Y | θ) = ∫ f(Y, Z | θ) dZ   (3.1)

Algorithm 3.1 Data Augmentation
At iteration i:
1. Draw Z ∼ f(Z | θ, Y) ∝ f(Y, Z | θ)
2. Draw θ ∼ f(θ | Z, Y) ∝ f(Y, Z | θ) f(θ)

The data augmentation algorithm 3.1 iterates between an imputation step, where the latent variables are sampled, and a posterior estimation step, until convergence. The samples of the unknown parameter θ can then be used for inference.

3.2.2 Parameter Expansion for Data Augmentation

Parameter Expansion for Data Augmentation (PX-DA), introduced by Liu and Wu (1999), is a technique useful for accelerating convergence. The idea is that if we can find a hidden parameter α in the complete-data model f(Y, Z | θ), we can expand this model to a larger model p(Y, W | θ, α) that preserves the distribution of the observed-data model:

∫ p(Y, W | θ, α) dW = f(Y | θ)   (3.2)

We adopt the notation used in Liu and Wu (1999), and use W instead of Z and p instead of f to denote the latent data and the distributions under the expanded model. To implement the DA algorithm in this setting, a joint prior on the expansion parameter α and the original parameter of interest θ needs to be specified, such that the prior on θ is the same under the original model and the expanded model (∫ p(θ, α) dα = f(θ)). This can be done by maintaining the prior for θ at f(θ) and specifying a prior p(α | θ).

By iterating through the steps of algorithm 3.2, we are able to achieve a faster rate of convergence than with the DA algorithm in 3.1.

Algorithm 3.2 PX-DA Algorithm
At iteration i:
1. Draw (α, W) jointly by drawing
   α ∼ p(α | θ)
   W ∼ p(W | θ, α, Y) ∝ p(Y, W | θ, α)
2. Draw (α, θ) jointly from
   α, θ | Y, W ∝ p(Y, W | θ, α) p(α | θ) f(θ)

3.2.3 Data Transformation

Under certain conditions, an alternative view of PX-DA treats W as the result of a transformation of the latent data Z induced by the expansion parameter α (Liu and Wu, 1999, Scheme 1). For this interpretation to hold, a transformation Z = t_α(W) needs to be defined such that, for any fixed value of α, t_α(W) is a one-to-one differentiable mapping between Z and W:

p(Y, W | θ, α) = f(Y, t_α(W) | θ) |J_α(W)|   (3.3)

where |J_α(W)| is the determinant of the Jacobian of the transformation t_α evaluated at W. The algorithm is detailed in 3.3. Note that in the second step of algorithm 3.3, α is sampled from its prior distribution. This interpretation of the PX-DA algorithm is particularly useful in the case of MVP regression.

3.3 Proposed Model

In the model we are proposing, we want to use PX-DA mainly to simplify computation. We adopt the scheme described in algorithm 3.3 (corresponding to Scheme 1 in Liu and Wu (1999)).

Algorithm 3.3 PX-DA Algorithm / Data Transformation (Scheme 1)
At iteration i:
1. Draw Z ∼ f(Z | Y, θ); compute W = t_α⁻¹(Z)
2. Draw (α, θ) jointly, conditional on the latent data:
   α, θ | Y, W ∝ p(Y, t_α(W) | θ) |J_α(W)| p(α | θ) f(θ)

3.3.1 Imputation Step

Let θ = (R, β) be the identifiable parameter of interest. The first step of algorithm 3.3 involves drawing Z conditional on the identifiable parameter θ. This is achieved by sampling from a multivariate truncated Gaussian as in equation (2.4). For the generation of multivariate truncated Gaussian variables, we follow the approach outlined in Appendix D. This approach uses Gibbs steps to cycle through a series of univariate truncated Gaussians. In each step Z_ij is simulated from Z_ij | Z_i,−j, β, R, which is a univariate Gaussian distribution truncated to [0, ∞) if Y_ij = 1 and to (−∞, 0] if Y_ij = 0. The parameters of the untruncated distribution Z_ij | Z_i,−j, β, R are obtained from the usual formulae for the moments of conditional Gaussians.
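A sketch of this Gibbs update for a single subject (my own illustration of the approach described in Appendix D; function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import truncnorm

def impute_latent(Z, y, mu, R, rng):
    """One Gibbs sweep over Z_i1, ..., Z_iT for a single subject.
    Each Z_ij is drawn from its conditional N(m_j, s_j^2), truncated to
    [0, inf) if y_j = 1 and (-inf, 0] if y_j = 0.  mu = X_i beta."""
    T = len(y)
    for j in range(T):
        o = [k for k in range(T) if k != j]              # the other coordinates
        R_oo_inv = np.linalg.inv(R[np.ix_(o, o)])
        r_jo = R[j, o]
        m = mu[j] + r_jo @ R_oo_inv @ (Z[o] - mu[o])     # conditional mean
        s = np.sqrt(R[j, j] - r_jo @ R_oo_inv @ r_jo)    # conditional std dev
        lo, hi = ((0.0 - m) / s, np.inf) if y[j] == 1 else (-np.inf, (0.0 - m) / s)
        Z[j] = truncnorm.rvs(lo, hi, loc=m, scale=s, random_state=rng)
    return Z
```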

3.3.2 Posterior Sampling Step

Given the latent data sampled in step 1, we would like to draw (α, θ) from its posterior distribution. In order to implement step 2 of algorithm 3.3, we need to find an expansion parameter α that is not identifiable from the observed-data model, but is identifiable from the complete-data model. Subsequently, we need to define a transformation of the latent data.

Defining the Expansion Parameter and the Transformation

Let

Z = t_α(W) = D⁻¹ W   (3.4)

or, alternatively, W = DZ, where D is a diagonal matrix with positive diagonal elements d_ii = √Σ_ii. The scale parameter D is not identifiable. For reasons which will become clear later, we can conveniently pick α = (α_1, ..., α_T) to be a function of D by taking

α_i = r^{ii} / (2 d_i²)   (3.5)

where r^{ii} is the ith diagonal element of R⁻¹ and d_i is the ith diagonal element of D. In this case, for any fixed value of α, D is a one-to-one function of α, and t_α(W) is a one-to-one differentiable mapping between Z and W. This choice of α is not arbitrary. It is conveniently picked so that, when combined with the prior on (R, β), the transformed likelihood and the Jacobian, it results in a posterior distribution that is easily sampled from.

The Transformed Complete Likelihood: p(Y, t_α(W) | θ) |J_α(W)|

For a given α, the determinant of the Jacobian of the transformation from Z to W is given by (see a 3 × 3 example in Appendix C):

|J : Z → W| = | ∂(Z_1, ..., Z_n) / ∂(W_1, ..., W_n) |   (3.6)
            = | I_n ⊗ D⁻¹ |   (3.7)
            = |D|⁻ⁿ   (3.8)

Combining the complete likelihood in equation (2.4) with the Jacobian, and after some algebra, we get:

p(Y, t_α(W) | β, R) |J : Z → W| = p(Y, Z | β, R) |J : Z → W|   (3.9)
  = |R|^{−n/2} exp( −½ Σ_{i=1}^n (Z_i − X_iβ)′ R⁻¹ (Z_i − X_iβ) ) |J : Z → W|
  = |D|⁻ⁿ |R|^{−n/2} exp( −½ Σ_{i=1}^n (D(Z_i − X_iβ))′ (DRD)⁻¹ (D(Z_i − X_iβ)) )
  = |DRD|^{−n/2} exp( −½ Σ_{i=1}^n (W_i − D X_iβ)′ (DRD)⁻¹ (W_i − D X_iβ) )

If we define

Σ = DRD   (3.10)
ε = D(Z − Xβ)   (3.11)

we can re-write the likelihood under the expanded-data model in (3.9) as

p(Y, t_α(W) | R, β) |J_α(W)| ∝ |Σ|^{−n/2} exp( −½ tr(Σ⁻¹ ε′ε) )   (3.12)

The Prior: p(α | θ) f(θ)

For Bayesian inference, we need to define a joint prior on θ = (β, R) and α. We assume that β and R are independent a priori, so that π(β, R, α) = p(α | R) f(R) f(β). Under the transformation Σ = DRD, Barnard et al. (2000) showed that if we take Σ to follow a standard inverse Wishart distribution as in A.4, we can re-write the distribution of Σ as in B.1:

π(Σ) = π(α, R) |J : Σ → (D, R)|,   where π(α, R) = f(R) p(α | R)   (3.13)

With a particular choice of parameters, namely ν = T + 1, the distribution f(R) is as in (2.12), so that each r_ij is uniform on the interval [−1, 1].

Furthermore, the distribution p(α | R) is Gamma with shape parameter (T + 1)/2 and rate parameter 1. Therefore, we are able to obtain the desired prior distribution π(α | R)π(R) by sampling Σ from a standard inverse Wishart with degrees of freedom ν = T + 1 and transforming using Σ = DRD. Here we point out that the prior distributions of both R and β are the same under the expanded model and the observed-data model. This is a condition required for the PX-DA algorithm. In addition, we note that R and α are not a priori independent. The independence of these parameters is a necessary condition only to prove the optimality of the convergence of algorithm 3.3. In this case, their independence is not key, since we are using PX-DA mainly for convenience, in that it results in a posterior distribution that is easily sampled from.

Posterior Distribution of (α, θ)

Now that we have specified the expanded likelihood and the prior on the parameters of interest (R, β) and the expansion parameter α, the joint posterior distribution of (β, R, α) conditional on the latent data can be computed:

β, R, α | Y, W ∝ p(Y, t_α(W) | β, R) |J_α(W)| f(R) f(β) p(α | R)   (3.14)

where t_α(W) = Z = D⁻¹W is the transformation of the latent data and |J_α(W)| is the determinant of the Jacobian of going from Z to W. Putting together the likelihood in (3.12), the marginally uniform prior on R in (2.12), the Gamma prior on α in (3.13), and the prior on β in (2.9), we get:

π(R, α, β | Y, W) ∝ |Σ|^{−n/2} exp( −½ tr(Σ⁻¹ ε′ε) ) · |R|^{T(T−1)/2 − 1} ( ∏_i |R_ii| )^{−(T+1)/2} · ∏_i Gamma(α_i; (T+1)/2, 1) · exp( −½ β′ Ψ_β⁻¹ β )   (3.15)

where the Gamma distribution is defined as in A.2. In order to sample from the joint posterior distribution in (3.15), we use a Gibbs sampling framework, where we sample β | Z, R and then sample R, α | W. Since, given R, the parameter β is identifiable, we sample it prior to transforming the data. Straightforward computations give the posterior distribution of β | Y, Z, R. The normal prior is conjugate, therefore the posterior distribution of β also follows a multivariate normal distribution, with covariance Ψ̄_β and mean β̄, where

Ψ̄_β = ( Ψ_β⁻¹ + Σ_{i=1}^n X_i′ R⁻¹ X_i )⁻¹
β̄ = Ψ̄_β ( Σ_{i=1}^n X_i′ R⁻¹ Z_i )

The joint posterior π(R, α | Y, W, β) can be obtained from (3.15):

π(R, α | Y, W, β) ∝ |Σ|^{−n/2} exp( −½ tr(Σ⁻¹ ε′ε) ) · |R|^{T(T−1)/2 − 1} ( ∏_i |R_ii| )^{−(T+1)/2} · ∏_i Gamma(α_i; (T+1)/2, 1)   (3.16)

We perform the change of variables Σ = DRD:

π(Σ | Y, W, β) ∝ π(R, α | Y, W, β) |J : (D, R) → Σ|
             = |Σ|^{−n/2} exp( −½ tr(Σ⁻¹ ε′ε) ) · |Σ|^{−(2(T+1))/2} exp( −½ tr(Σ⁻¹) )
             = |Σ|^{−(ν+T+1)/2} exp( −½ tr(Σ⁻¹ S) )   (3.17)

This is an inverse Wishart distribution with ν = n + T + 1 and S = I_T + ε′ε. The second line in the equation above is obtained by reversing the steps of the proof in Appendix B.

Algorithm 3.4 Full PX-DA Sampling Scheme in the Multivariate Probit Model
At iteration i:
1. Imputation Step
   Draw Z ∼ f(Z | Y, β, R) from a truncated multivariate normal distribution TMVN(Xβ, R), as described in Appendix D.
2. Posterior Sampling Step
   Draw (β, R, α) jointly, conditional on the latent data:
   - Draw β | Z, Y, R from a multivariate normal distribution, β ∼ MVN(β̄, Ψ̄_β).
   - Draw α ∼ p(α | R) from a Gamma distribution, α_i ∼ G((T + 1)/2, 1).
   - Compute the diagonal matrix D, where each diagonal element is d_i = √(r^{ii} / (2α_i)) and r^{ii} is the ith diagonal element of R⁻¹. Compute W = t_α⁻¹(Z) = DZ, or equivalently ε = D(Z − Xβ).
   - Draw Σ | β, Y, W from an inverse Wishart distribution, Σ ∼ IW(ν, S), where ν = n + T + 1 and S = I_T + ε′ε.
   - Compute R = D⁻¹ΣD⁻¹, where D is now the diagonal matrix with d_ii = √Σ_ii taken from the new draw of Σ.
Repeat until convergence.
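The posterior sampling step of Algorithm 3.4 can be sketched as follows. This is my own illustration under the assumptions above (a regression coefficient vector shared across the T categories, and the inverse Wishart scale I_T + ε′ε following the reconstruction of (3.17)); helper names are illustrative, not the thesis's code.

```python
import numpy as np
from scipy.stats import invwishart

def posterior_step(Z, X, R, Psi_beta_inv, rng):
    """One posterior-sampling step of Algorithm 3.4 (sketch).
    Z: (n, T) latent draws, X: (n, T, p) design arrays, R: current correlation,
    Psi_beta_inv: (p, p) prior precision of beta."""
    n, T = Z.shape
    R_inv = np.linalg.inv(R)

    # beta | Z, Y, R: conjugate multivariate normal update
    prec = Psi_beta_inv + np.einsum('itp,ts,isq->pq', X, R_inv, X)
    rhs = np.einsum('itp,ts,is->p', X, R_inv, Z)
    cov_beta = np.linalg.inv(prec)
    beta = rng.multivariate_normal(cov_beta @ rhs, cov_beta)

    # alpha | R ~ Gamma((T+1)/2, 1); expand with d_i = sqrt(r^{ii} / (2 alpha_i))
    alpha = rng.gamma(shape=(T + 1) / 2.0, scale=1.0, size=T)
    d = np.sqrt(np.diag(R_inv) / (2.0 * alpha))

    # W = D Z, residuals eps = D (Z - X beta); then Sigma | W ~ IW(n+T+1, I + eps'eps)
    eps = (Z - np.einsum('itp,p->it', X, beta)) * d     # row-wise scaling by D
    S = np.eye(T) + eps.T @ eps
    Sigma = invwishart.rvs(df=n + T + 1, scale=S, random_state=rng)

    # back to the identified correlation scale
    s = np.sqrt(np.diag(Sigma))
    return beta, Sigma / np.outer(s, s)
```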

3.4 Simulations

In order to test the performance of the algorithm developed in the previous section, we conduct several simulation studies, first with T = 3 and then increasing the dimension to T = 8. The data are simulated as follows: we generate a design matrix with p = 2 covariates drawn from a uniform distribution, we fix the regression coefficients β, and we generate random errors from a multivariate Gaussian distribution centered at 0 with a full correlation matrix R. We fix R such that all off-diagonal elements ρ_ij are of equal value, and we try different values of ρ, namely 0.2, 0.4, 0.6, and 0.8. The following two loss functions are considered to evaluate the accuracy of the estimated correlation matrix:

L_1(R̂, R) = tr(R̂R⁻¹) − log|R̂R⁻¹| − T   (3.18)

L_2(R̂, R) = tr( (R̂R⁻¹ − I)² )   (3.19)

where R̂ is the estimated correlation matrix and R is the true correlation matrix used to generate the data. The first loss function is the entropy loss and the second is the quadratic loss. These loss functions are discussed in more detail in Yang and Berger (1994). In each case, N Gibbs samples are drawn and the initial draws are discarded as burn-in. We tried multiple runs to ensure convergence of the results. The correlation matrix is always initialized at the identity matrix, and the latent variables are initialized at 0.

3.4.1 Results for T = 3

For T = 3, three parameters in the correlation matrix are estimated. Table 3.1 outlines results from the simulations for the correlation matrix. The posterior median estimate is reported, along with the number of parameters falling within the 95% credible interval, the average interval length, the entropy loss and the quadratic loss. The 95% credible intervals are calculated based on the 2.5% and 97.5% quantiles of the estimates. We can see that the likelihood carries more information with larger correlation values: estimation of the correlation becomes more accurate and credible intervals become smaller on average. Similarly, with more data, estimates become more precise and, furthermore, we see a decrease in both the entropy and the quadratic loss. Except in one case (r_ij = 0.2, n = 5), the true correlation coefficient was always included in the 95% credible interval. Figures 3.1 and 3.2 provide examples of trace plots and density plots for the correlation matrix with ρ_ij = 0.4 and ρ_ij = 0.8, respectively. Subfigures (a) and (b) in each case show how the density becomes narrower as the sample size increases. Furthermore, we see that the algorithm mixes very well and converges fast.
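For reference, the two losses in (3.18) and (3.19) are straightforward to compute; a small sketch (my own illustration):

```python
import numpy as np

def entropy_loss(R_hat, R):
    """L1(R_hat, R) = tr(R_hat R^{-1}) - log|R_hat R^{-1}| - T  (entropy loss)."""
    M = R_hat @ np.linalg.inv(R)
    sign, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - R.shape[0]

def quadratic_loss(R_hat, R):
    """L2(R_hat, R) = tr((R_hat R^{-1} - I)^2)  (quadratic loss)."""
    M = R_hat @ np.linalg.inv(R) - np.eye(R.shape[0])
    return np.trace(M @ M)
```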

[Table 3.1: Correlation results from simulations for T = 3; columns: sample size, r_ij, number of parameters whose 95% credible interval contains the truth, average CI length, entropy loss, quadratic loss.]

Table 3.2 shows simulation results for the regression coefficients β. For each coefficient, we report the median of the posterior distribution, a 95% credible interval and the standard error. The true regression coefficients seem always to fall within the 95% credible interval. Standard errors, and consequently credible interval lengths, tend to become smaller as the correlation increases and as the sample size increases. Figures 3.3, 3.4, 3.5, and 3.6 provide trace plots, density plots and autocorrelation plots for the regression coefficients in the cases where the correlation matrix has elements ρ_ij = 0.4 and ρ_ij = 0.8, with increasing sample size. The density becomes narrower with a larger sample size, and here too the algorithm appears to mix well.

[Table 3.2: Regression coefficient results from simulations for T = 3; columns: sample size, r_ij, β̂_1 with confidence interval and standard error, β̂_2 with confidence interval and standard error.]

[Figure 3.1: Correlation estimates for ρ = 0.4, T = 3, with increasing sample size; panels (a) and (b) show the smaller and larger sample sizes.]

[Figure 3.2: Correlation estimates for ρ = 0.8, T = 3, with increasing sample size; panels (a) and (b) show the smaller and larger sample sizes.]

[Figure 3.3: β estimates for ρ = 0.4, T = 3; trace plots, density plots and autocorrelation plots.]


More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

VCMC: Variational Consensus Monte Carlo

VCMC: Variational Consensus Monte Carlo VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Bayesian Inference: Probit and Linear Probability Models

Bayesian Inference: Probit and Linear Probability Models Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-1-2014 Bayesian Inference: Probit and Linear Probability Models Nate Rex Reasch Utah State University Follow

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

Gibbs Sampling for the Probit Regression Model with Gaussian Markov Random Field Latent Variables

Gibbs Sampling for the Probit Regression Model with Gaussian Markov Random Field Latent Variables Gibbs Sampling for the Probit Regression Model with Gaussian Markov Random Field Latent Variables Mohammad Emtiyaz Khan Department of Computer Science University of British Columbia May 8, 27 Abstract

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017 Bayesian inference Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 10, 2017 1 / 22 Outline for today A genetic example Bayes theorem Examples Priors Posterior summaries

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Gerdie Everaert 1, Lorenzo Pozzi 2, and Ruben Schoonackers 3 1 Ghent University & SHERPPA 2 Erasmus

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract Bayesian Estimation of A Distance Functional Weight Matrix Model Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies Abstract This paper considers the distance functional weight

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

variability of the model, represented by σ 2 and not accounted for by Xβ

variability of the model, represented by σ 2 and not accounted for by Xβ Posterior Predictive Distribution Suppose we have observed a new set of explanatory variables X and we want to predict the outcomes ỹ using the regression model. Components of uncertainty in p(ỹ y) variability

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Wageningen Summer School in Econometrics. The Bayesian Approach in Theory and Practice

Wageningen Summer School in Econometrics. The Bayesian Approach in Theory and Practice Wageningen Summer School in Econometrics The Bayesian Approach in Theory and Practice September 2008 Slides for Lecture on Qualitative and Limited Dependent Variable Models Gary Koop, University of Strathclyde

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative

More information

MCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

MCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham MCMC 2: Lecture 3 SIR models - more topics Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. What can be estimated? 2. Reparameterisation 3. Marginalisation

More information

ST 740: Linear Models and Multivariate Normal Inference

ST 740: Linear Models and Multivariate Normal Inference ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee September 03 05, 2017 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles Linear Regression Linear regression is,

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous

More information

A BAYESIAN APPROACH TO SPATIAL CORRELATIONS IN THE MULTIVARIATE PROBIT MODEL

A BAYESIAN APPROACH TO SPATIAL CORRELATIONS IN THE MULTIVARIATE PROBIT MODEL A BAYESIAN APPROACH TO SPATIAL CORRELATIONS IN THE MULTIVARIATE PROBIT MODEL by Jervyn Ang B.Sc, Simon Fraser University, 2008 a Project submitted in partial fulfillment of the requirements for the degree

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci

More information

MULTILEVEL IMPUTATION 1

MULTILEVEL IMPUTATION 1 MULTILEVEL IMPUTATION 1 Supplement B: MCMC Sampling Steps and Distributions for Two-Level Imputation This document gives technical details of the full conditional distributions used to draw regression

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html

More information

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

AMS-207: Bayesian Statistics

AMS-207: Bayesian Statistics Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

A Bayesian Probit Model with Spatial Dependencies

A Bayesian Probit Model with Spatial Dependencies A Bayesian Probit Model with Spatial Dependencies Tony E. Smith Department of Systems Engineering University of Pennsylvania Philadephia, PA 19104 email: tesmith@ssc.upenn.edu James P. LeSage Department

More information

Monte Carlo Composition Inversion Acceptance/Rejection Sampling. Direct Simulation. Econ 690. Purdue University

Monte Carlo Composition Inversion Acceptance/Rejection Sampling. Direct Simulation. Econ 690. Purdue University Methods Econ 690 Purdue University Outline 1 Monte Carlo Integration 2 The Method of Composition 3 The Method of Inversion 4 Acceptance/Rejection Sampling Monte Carlo Integration Suppose you wish to calculate

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals John W. Mac McDonald & Alessandro Rosina Quantitative Methods in the Social Sciences Seminar -

More information

Supplementary Material for Analysis of Job Satisfaction: The Case of Japanese Private Companies

Supplementary Material for Analysis of Job Satisfaction: The Case of Japanese Private Companies Supplementary Material for Analysis of Job Satisfaction: The Case of Japanese Private Companies S1. Sampling Algorithms We assume that z i NX i β, Σ), i =1,,n, 1) where Σ is an m m positive definite covariance

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Bayesian inference for factor scores

Bayesian inference for factor scores Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor

More information

Practical Bayesian Quantile Regression. Keming Yu University of Plymouth, UK

Practical Bayesian Quantile Regression. Keming Yu University of Plymouth, UK Practical Bayesian Quantile Regression Keming Yu University of Plymouth, UK (kyu@plymouth.ac.uk) A brief summary of some recent work of us (Keming Yu, Rana Moyeed and Julian Stander). Summary We develops

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Monte Carlo Integration using Importance Sampling and Gibbs Sampling

Monte Carlo Integration using Importance Sampling and Gibbs Sampling Monte Carlo Integration using Importance Sampling and Gibbs Sampling Wolfgang Hörmann and Josef Leydold Department of Statistics University of Economics and Business Administration Vienna Austria hormannw@boun.edu.tr

More information