Data Augmentation for the Bayesian Analysis of Multinomial Logit Models

Steven L. Scott, University of Southern California
Bridge Hall 401-H, Los Angeles, CA

Key Words: Markov chain Monte Carlo, logistic regression, data augmentation

1. Introduction

This article introduces a Markov chain Monte Carlo (MCMC) method for sampling the parameters of a multinomial logit model from their posterior distribution. Let $y_i \in \{0,\dots,M\}$ denote the categorical response of subject $i$ with covariates $x_i = (x_{i1},\dots,x_{ip})^T$. Let $X = (x_1,\dots,x_n)^T$ denote the design matrix, and let $y = (y_1,\dots,y_n)^T$. Multinomial logit models relate $y_i$ to $x_i$ through
\[
p(y_i = m) \propto \exp\{g_m(x_i, \beta)\} = \lambda_{im}, \tag{1}
\]
where $g_m(x_i, \beta)$ is a linear function and $\beta$ is a parameter vector. Adding the same constant to each $g_m(x, \beta)$ leaves (1) unchanged, so one commonly assumes $g_0(x, \beta) = 0$ to preserve identifiability.

The general function notation masks subtleties in the linear predictor that distinguish several varieties of multinomial logit models. For example, equation (1) can be made to model either ordinal or nominal responses by suitably constraining the linear predictor. By extending $x$ through basis expansions, equation (1) includes generalized additive multinomial logit models (Abe, 1999).

Variants of multinomial logit models occur frequently in many areas of application. The models are especially important in econometrics (McFadden, 1974), where they are referred to as discrete choice models, and as a component of the partial credit models used in educational testing (Muraki, 1992).

Despite the obvious importance of multinomial logit models in applied research, Bayesian statisticians typically prefer to work with multinomial probit models instead. The preference for multinomial probit is largely due to a convenient Gibbs sampling algorithm introduced by McCulloch and Rossi (1994), which is a multivariate version of the probit regression algorithm of Albert and Chib (1993) (henceforth MRAC). The MRAC algorithm is a data augmentation scheme which alternates between simulating a latent multivariate Gaussian vector for each subject, given observed data and model parameters, and simulating model parameters given complete data. The MRAC algorithm is widely used, even though computationally superior methods have been developed (e.g. van Dyk and Meng, 2001; Liu and Wu, 1999). MRAC's appeal lies in its aesthetic simplicity, which derives from stochastically replacing the nonlinear probit likelihood with a complete data likelihood based on the identity link. The identity link allows model parameters to be simulated from closed form full conditional distributions. Consequently MRAC avoids the tuning constants required by most Metropolis-Hastings algorithms, which means that MRAC is a default method that can be implemented with little expertise on the part of the user. It is easy to program and easy to explain to clients. The analytically tractable complete data likelihood also makes it easy to embed probit regression models into more elaborate hierarchical or random effects models.

The sampler introduced in this article is the natural extension of MRAC to multinomial logit models. Unlike multinomial probit models, multinomial logit models do not admit a Gibbs sampler with closed form full conditional distributions. However, it is possible to simulate a set of latent variables with mean $g_m(x, \beta)$ and known, constant variance.
The latent variables can be combined with frequentist asymptotic theory for linear models, or with the Bayesian method of moments (Zellner, 1997), to produce a closed form surrogate distribution that approximates the full conditional distribution of the model parameters given complete data. The draw from the surrogate distribution is filtered using a Metropolis-Hastings probability (Metropolis et al., 1953; Hastings, 1970) to produce a draw from the desired posterior distribution $p(\beta \mid X, y)$.

The proposed sampler inherits MRAC's considerable aesthetic appeal, and several practical features as well. First, the method is simple to program, and it readily extends to complex stochastic systems which include multinomial logit models as embedded components. Second, the proposal distribution is tailored to the target distribution at each iteration of the sampler without invoking iterative root finding methods, which might fail for computational reasons at some unlucky draw of $\beta$.

Third, the sampler requires none of the tuning constants typically needed for random walk Metropolis-Hastings algorithms, so its burden on practitioners is minimal. Computationally, the sampler evaluates the complete data likelihood only once during each iteration. The complete data likelihood is simpler than the full multinomial logit likelihood because it avoids the multinomial logit normalizing constant. As a result the sampler is computationally faster than one-scalar-at-a-time sampling methods, such as adaptive rejection sampling, which require several likelihood evaluations per draw of $\beta$. The proposed method can handle both continuous and categorical covariates, so it is more flexible than methods based on the multinomial-Poisson transformation (Baker, 1994; Spiegelhalter, Thomas, Best, and Gilks, 1996; Chen and Kuo, 2001). Finally, the proposed sampler can be modified to work with Poisson regression.

The remainder of the article is structured as follows. Section 2 explains the latent exponential sampler in general terms, without reference to a specific form for $g_m(x_i, \beta)$. Section 3 reviews several subfamilies of multinomial logit models and explains how the sampler can be applied to each. Section 4 illustrates the algorithm on a real data set. Section 5 provides a concluding discussion.

2. The Latent Exponential Sampler

Let $\mathcal{E}(\lambda)$ denote the exponential distribution with rate $\lambda$, and let $Z = (z_{im})$ denote a matrix of independent exponential random variables with rows $z_i = (z_{i0},\dots,z_{iM})$, where $p(z_{im} \mid X, \beta) = \mathcal{E}(\lambda_{im})$ with $\lambda_{im}$ defined in (1). If $y_i = \arg\min_m (z_{im})$ then $p(y_i = m \mid X, \beta) \propto \lambda_{im}$, which is the multinomial logit model. Note that $y$ is a deterministic function of $Z$, so the complete data likelihood that would be obtained if $Z$ were observed is
\[
p(Z \mid X, \beta) = \prod_{i=1}^{n} \prod_{m=0}^{M} \exp\{ g_m(x_i, \beta_m) - z_{im} \exp\{g_m(x_i, \beta_m)\} \}, \tag{2}
\]
which does not involve $y$.

Equation (2) implies a convenient conditional independence property. Many variants of multinomial logit models are parameterized so that $g_m(x, \beta) = g(x; \delta, \beta_m)$, where $\beta_m$ and $\beta_{m'}$ are distinct for $m \neq m'$ and $\delta$ is a parameter shared by all response levels. If the $\{\beta_m\}$ are independent in the prior distribution $p(\beta \mid X, \delta)$ then they remain independent in $p(\beta \mid X, Z, \delta)$. This property is absent from $p(\beta \mid X, y, \delta)$ because of the normalizing constant in (1).

The latent exponential sampler cycles between three steps: sampling $Z$ from $p(Z \mid X, y, \beta)$, proposing a new value of $\beta$ from a surrogate for $p(\beta \mid X, Z)$ constructed using a transformation of $Z$, and promoting either the proposal or the current $\beta$ according to a Metropolis-Hastings probability.

2.1 Sampling Latent Data

Sampling $Z$ from $p(Z \mid X, y, \beta)$ is trivial because $z_1,\dots,z_n$ are conditionally independent given $(\beta, X, y)$. To draw $z_i$, first draw the minimal element $z_{iy_i}$ from $p(z_{iy_i} \mid X, y, \beta) = \mathcal{E}(\sum_m \lambda_{im})$. Then the memoryless property of the exponential distribution implies $z_{im} = z_{iy_i} + \tilde{z}_{im}$, with $\tilde{z}_{im} \sim \mathcal{E}(\lambda_{im})$ independently for $m \neq y_i$. If the identifiability constraint $g_0(x, \beta) = 0$ is imposed then $\beta$ is independent of $z_{i0}$ in (2). However $z_{i0}$ must still be sampled during the data augmentation step to maintain the scale of the imputed variables.
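To make the step concrete, here is a minimal Python/NumPy sketch of the draw of $Z$ given $(X, y, \beta)$. It is an illustration, not code from the article; the names draw_latent_Z and G (the $n \times (M+1)$ matrix of linear predictors $g_m(x_i, \beta)$) are conventions I have assumed.

```python
import numpy as np

def draw_latent_Z(G, y, rng):
    """Draw Z from p(Z | X, y, beta) as in Section 2.1.

    G : (n, M+1) array of linear predictors g_m(x_i, beta), so that
        lam = exp(G) holds the exponential rates lambda_im.
    y : (n,) array of observed responses in {0, ..., M}.
    Returns an (n, M+1) array Z whose row-wise argmin equals y.
    """
    n, _ = G.shape
    lam = np.exp(G)
    # The minimum of independent exponentials is exponential with the
    # summed rate, so z_{i y_i} ~ E(sum_m lambda_im).
    z_min = rng.exponential(1.0 / lam.sum(axis=1))
    # Memoryless property: the non-minimal z_im exceed z_min by
    # independent E(lambda_im) increments.
    Z = z_min[:, None] + rng.exponential(1.0 / lam)
    Z[np.arange(n), y] = z_min  # overwrite the winning category
    return Z
```

Taking the column argmin of fresh exponential draws with these rates reproduces $y$ with the probabilities in (1), which provides a quick check on the representation.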
2.2 Sampling β

Equation (2) fails to suggest a closed form full conditional distribution for $\beta$, but a closed form surrogate exists. Any exponential random variable $z \sim \mathcal{E}(\lambda)$ may be written $z = e/\lambda$, with $e \sim \mathcal{E}(1)$. Thus $\log(z) = \log(e) - \log(\lambda)$ has mean $\mu - \log(\lambda)$ and variance $\sigma^2$, where $\mu$ and $\sigma^2$ are the mean and variance of $\log(e)$. Because $\log(e)$ follows an extreme value distribution (Johnson et al., 1995) we have $\mu = -\gamma$, the negative of Euler's constant, and $\sigma^2 = \pi^2/6$. In particular, note that $\sigma^2$ does not depend on $\lambda$. Therefore
\[
u_{im} \equiv \mu - \log(z_{im}) \sim [\,g_m(x_i, \beta),\; \sigma^2\,], \tag{3}
\]
with square brackets denoting a random variable's mean and variance.

The random variables in $U = (u_{im})$ are independent observations with constant variance whose expected value is the linear predictor. Therefore, frequentist theory for linear models implies that $\hat{\beta}$, the least squares estimate of $\beta$ in a regression of $U$ on $X$, has limiting distribution $p(\hat{\beta} \mid X, \beta) = N(\beta, V)$, where $V$ is a known function of $X$ and $\sigma^2$. If the prior distribution for $\beta$ is Gaussian, say $\beta \sim N(\alpha, \Sigma)$, then a closed form surrogate for the full conditional distribution is
\[
p(\beta \mid X, \hat{\beta}) = N\left(\Omega(\Sigma^{-1}\alpha + V^{-1}\hat{\beta}),\; \Omega\right), \tag{4}
\]
where $\Omega^{-1} = \Sigma^{-1} + V^{-1}$. In many cases equation (4) is the full conditional distribution that would be obtained if the observations in (3) were Gaussian with the specified mean and variance.
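A companion sketch, under the Gaussian-prior specialization worked out in equation (6) of Section 3.1 (so that $V^{-1} = X^T X/\sigma^2$), takes one column of $U = \mu - \log Z$ and draws from the surrogate (4); propose_beta_m is a hypothetical name, not notation from the text.

```python
import numpy as np

MU = -np.euler_gamma          # mean of log(e) for e ~ E(1)
SIGMA2 = np.pi ** 2 / 6.0     # variance of log(e)

def propose_beta_m(X, u_m, alpha, Sigma_inv, rng):
    """Draw from the Gaussian surrogate (4) for one coefficient block.

    X : (n, p) design matrix; u_m : (n,) column of U = MU - log(Z);
    alpha : (p,) prior mean; Sigma_inv : (p, p) prior precision.
    """
    prec = Sigma_inv + X.T @ X / SIGMA2          # Omega^{-1}
    mean = np.linalg.solve(prec, Sigma_inv @ alpha + X.T @ u_m / SIGMA2)
    # Sample N(mean, prec^{-1}) via the Cholesky factor of the precision.
    L = np.linalg.cholesky(prec)                 # prec = L L'
    return mean + np.linalg.solve(L.T, rng.standard_normal(len(mean)))
```

Solving against the Cholesky factor of the precision avoids forming $\Omega$ explicitly, which is the natural way to exploit the closed form in (4).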

The same proposal distribution can be justified, without asymptotics, based on the maximum entropy principle using the Bayesian method of moments (Zellner, 1997).

A candidate $\tilde{\beta}$ is drawn from equation (4) and compared to the current $\beta^{(t)}$ through
\[
\alpha = \min\left\{ \frac{f(\tilde{\beta})/f(\beta^{(t)})}{q(\tilde{\beta})/q(\beta^{(t)})},\; 1 \right\}, \tag{5}
\]
where $f(\beta) = p(\beta \mid X, Z)$ and $q(\beta) = p(\beta \mid X, \hat{\beta})$. The candidate $\tilde{\beta}$ is promoted to $\beta^{(t+1)}$ with probability $\alpha$; otherwise $\beta^{(t+1)} = \beta^{(t)}$. (A sketch of this accept/reject step appears at the end of Section 2.3.) Note that the conditional independence properties of equation (2) are also present in (4).

2.3 Asymptotics

The proposal distribution violates the likelihood principle because $\hat{\beta}$ is not a sufficient statistic for $\beta$. The cost of replacing the full conditional distribution $p(\beta \mid X, Z)$ with its method of moments approximation $p(\beta \mid X, \hat{\beta})$ can be seen by examining the limiting behavior of the two distributions as $n \to \infty$. If $y$ truly follows a multinomial logit model with parameter $\beta_0$, and if the prior distribution $p(\beta)$ is such that it is eventually dominated by the likelihood, then both $p(\beta \mid X, \hat{\beta})$ and $p(\beta \mid X, Z)$ are asymptotically normal with mean $\beta_0$ (Le Cam and Yang, 2000). However, it is easy to show that the asymptotic variance of $p(\beta \mid X, \hat{\beta})$ is $\sigma^2$ times that of $p(\beta \mid X, Z)$.

Figure 1 contrasts the proposal and full conditional distributions for $\beta$ based on a simulated data set of 100 observations from the model $z_i \sim \mathcal{E}(\exp(0.2\,x_i))$, where $x_i \sim U(0,1)$. Figure 1 reminds us that the proposal and full conditional distributions can have different means in finite samples, even though the means would be the same in the limit. The inflated variance of the proposal distribution relative to the full conditional is readily apparent.

Upon viewing Figure 1, one is tempted to replace the variance of $p(\beta \mid X, \hat{\beta})$ with the asymptotic variance of $p(\beta \mid X, Z)$ by simply setting $\sigma^2 = 1$ in the computer code that fits the model. However, notice that doing so places much smaller posterior mass near the true $\beta_0 = 0.2$ in Figure 1 than either the proposal or the full conditional distribution. In fact, we know that Metropolis-Hastings algorithms with heavy tailed proposals have desirable mixing properties (see, e.g., Mengersen and Tweedie, 1996), so the increased variance of $p(\beta \mid X, \hat{\beta})$ relative to $p(\beta \mid X, Z)$ is actually something of a blessing. In practice one prefers to inflate the tails of (4) even further, for example by replacing it with a multivariate $T$ distribution with small (e.g. 3) degrees of freedom.

[Figure 1: Comparing $p(\beta \mid X, Z)$ ("Full Cond.") and $p(\beta \mid X, \hat{\beta})$ ("Proposal"), assuming the prior $p(\beta) \propto 1$. "Adj. Var." is the proposal distribution rescaled to have the same asymptotic variance as $p(\beta \mid X, Z)$.]
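The accept/reject step (5) can be sketched as follows for a single $\beta_m$, assuming the flat prior $p(\beta_m) \propto 1$ used in the data example of Section 4, so that $f$ reduces to the complete data likelihood (2); the function names are mine, and the sketch assumes $p \ge 2$ coefficients.

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_data_loglik(beta_m, X, z_m):
    """log p(z_m | X, beta_m) for one response level, from (2)."""
    g = X @ beta_m
    return np.sum(g - z_m * np.exp(g))

def mh_filter(beta_cur, X, z_m, prop_mean, prop_cov, rng):
    """One Metropolis-Hastings step implementing (5) under a flat prior,
    so f(beta) is the complete data likelihood alone."""
    q = multivariate_normal(mean=prop_mean, cov=prop_cov)
    beta_new = np.atleast_1d(q.rvs(random_state=rng))
    log_alpha = (complete_data_loglik(beta_new, X, z_m)
                 - complete_data_loglik(beta_cur, X, z_m)
                 - q.logpdf(beta_new) + q.logpdf(beta_cur))
    return beta_new if np.log(rng.uniform()) < log_alpha else beta_cur
```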
2.4 Motivation

The latent exponential sampler can be motivated in either of two primary ways. The first is a utility maximization argument which has been heavily used in the econometrics literature since its introduction by McFadden (1974); see Train (2003) for a recent review and bibliography. Conceptually, $u_{im}$ represents the perceived utility of choice $m$ for subject $i$, which is linearly related to $x_i$. Subject $i$ then chooses $y_i$ to maximize his perceived utility. The proposed sampler stochastically restores the unobservable utilities. Of course, the utilities need not physically exist for the sampler to use them as a computational device. That is, multinomial logit models apply equally well to physical systems which lack rational decision makers exercising free will.

The sampler may also be viewed as a multinomial-Poisson transformation, a name given to the dual relationship between the multinomial and Poisson likelihoods. Several authors (Baker, 1994; Chen and Kuo, 2001; Spiegelhalter et al., 1996) have used the multinomial-Poisson transformation to approximate $p(\beta \mid X, y)$ with a normal distribution based on a Poisson regression of $y$ on $X$ determined by iteratively reweighted least squares. However, the approximation involves an additional parameter for each distinct covariate pattern in $X$, which limits its usefulness when $X$ contains continuously measured variables.

The latent exponential sampler views the exponential likelihood as primitive, rather than the Poisson. The advantage is that one can achieve linearity on the scale of the parameters by taking the log of exponential data. The same cannot be said for Poisson data, which have a positive probability of being zero.

3. Multinomial Logit Models

To illustrate the variety of multinomial logit models with which the latent exponential sampler can be used, this Section reviews several versions and extensions of the model in equation (1) and explains how the sampler can be applied to each. The models discussed include multinomial logistic regression, discrete choice models, ordinal logit models, and additive multinomial logit models. More elaborate models, such as random effects models and partial credit models, can also be accommodated, but are not discussed here due to space constraints (but see Scott and Ip, 2002).

3.1 Multinomial Logistic Regression

Multinomial logistic regression sets $g_m(x_i, \beta) = \beta_m^T x_i$, where the $\{\beta_m\}$ are distinct across $m$, with $\beta_0 = 0$ for identifiability. If $u_m$ denotes column $m$ of $U = (u_{im})$, and if one assumes independent prior distributions $\beta_m \sim N(\alpha_m, \Sigma_m)$ for each $m$, then the proposal distribution for $\beta_m$ is
\[
p(\beta_m \mid X, \hat{\beta}) = N\left[\Omega_m(\Sigma_m^{-1}\alpha_m + X^T u_m / \sigma^2),\; \Omega_m\right], \tag{6}
\]
where $\Omega_m^{-1} = \Sigma_m^{-1} + X^T X / \sigma^2$. Note that $\Omega_m$ depends only on known quantities, so it only needs to be computed once, and it depends on $m$ only through the prior distribution. The $\{\beta_m\}$ can be treated separately because equation (2) implies their conditional independence given $Z$. That contrasts with the asymptotic variance obtained from the Hessian matrix of the observed multinomial logistic regression log likelihood,
\[
-\frac{\partial^2 \ell}{\partial \beta\, \partial \beta^T} = \sum_{i=1}^{n} \left(\mathrm{diag}(\pi_i) - \pi_i \pi_i^T\right) \otimes x_i x_i^T, \tag{7}
\]
where $\pi_i = (\pi_{i1},\dots,\pi_{iM})$ with $\pi_{im} = \lambda_{im} / \sum_{k=0}^{M} \lambda_{ik}$. The Hessian matrix in (7) has $M$ times as many rows and columns as the variance of the proposal distribution in (6). Thus, conditioning on $Z$ replaces one large matrix factorization with $M$ smaller ones differing only in the prior variance $\Sigma_m$.

3.2 Discrete Choice Models

Discrete choice models differ from ordinary multinomial logit models in that some covariates are response specific while others are subject specific. For example, if $y_i$ indicates the type of car purchased by customer $i$ then the car's gas mileage is response specific, whereas the customer's age is subject specific. Response specific covariates shift a subscript from the coefficient to the covariate, which substantially reduces the dimension of the parameter space. Let $x_i$ denote the $(p \times 1)$ vector of subject specific characteristics for observation $i$, and let $w_{im}$ denote the $(q \times 1)$ vector of response specific characteristics for potential response $m$ on observation $i$. Then one may write
\[
p(y_i = m) \propto \exp(\beta_m^T x_i + \delta^T w_{im}). \tag{8}
\]
Assuming $\beta_0 = 0$ for identifiability, a convenient algorithm for sampling from $p(\beta, \delta \mid X, y)$ is as follows: (1) generate $U = (u_{im})$ as in Section 2; (2) sample $\delta$ from $p(\delta \mid X, Z, \beta)$; (3) sample $\beta_m$ from $p(\beta_m \mid X, Z, \delta)$ for $m = 1,\dots,M$. (A sketch of step 2 appears at the end of this subsection.) Step 2 can be accomplished by defining $u^{(d)}_{im} = u_{im} - \beta_m^T x_i$, then stacking the $u^{(d)}_{im}$ into the $(nM \times 1)$ vector $u^{(d)}$, and the $w_{im}^T$ into the $(nM \times q)$ matrix $W$. Assuming independent Gaussian priors $\delta \sim N(\alpha_d, \Sigma_d)$ and $\beta_m \sim N(\alpha_m, \Sigma_m)$, the proposal distribution for $\delta$ is
\[
p(\delta \mid X, \beta, \hat{\delta}) = N\left[\Omega_d(\Sigma_d^{-1}\alpha_d + W^T u^{(d)} / \sigma^2),\; \Omega_d\right], \tag{9}
\]
where $\Omega_d^{-1} = \Sigma_d^{-1} + W^T W / \sigma^2$. Step 3 can be realized by creating $u^{(b)}_{im} = u_{im} - \delta^T w_{im}$, and then proceeding as in Section 3.1.
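The $\delta$ step can be sketched as follows. Unlike the text, which stacks the $nM$ rows for levels $m = 1,\dots,M$, this illustration stacks all $M+1$ levels for simplicity, and the argument names are hypothetical.

```python
import numpy as np

SIGMA2 = np.pi ** 2 / 6.0   # variance of log(e), as in (3)

def propose_delta(Xs, Wr, U, Beta, alpha_d, Sigma_d_inv, rng):
    """Proposal for delta from (9) in the discrete choice model (8).

    Xs : (n, p) subject specific covariates.
    Wr : (n, K, q) response specific covariates w_im, with K = M + 1.
    U  : (n, K) latent utilities from (3).
    Beta : (K, p) rows beta_m, row 0 fixed at zero for identifiability.
    """
    n, K, q = Wr.shape
    Ud = U - Xs @ Beta.T          # u_im^(d) = u_im - beta_m' x_i
    u_d = Ud.reshape(n * K)       # stacked response vector u^(d)
    W = Wr.reshape(n * K, q)      # stacked design matrix W
    prec = Sigma_d_inv + W.T @ W / SIGMA2
    mean = np.linalg.solve(prec, Sigma_d_inv @ alpha_d + W.T @ u_d / SIGMA2)
    L = np.linalg.cholesky(prec)  # draw N(mean, prec^{-1})
    return mean + np.linalg.solve(L.T, rng.standard_normal(q))
```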
3.3 Generalized Additive Multinomial Logit Models

Generalized additive multinomial logit models (Abe, 1999) extend discrete choice models by assuming
\[
\log \lambda_{im} = \sum_p s_{mp}(x_{ip}; \beta_m) + \sum_q s_q(w_{imq}; \delta),
\]
where $s_{mp}$ and $s_q$ are scalar functions of scalar arguments to be estimated by a spline or some other smoother indexed by the parameters $\beta_m$ and $\delta$. Hastie and Tibshirani (2000) describe a Bayesian backfitting algorithm for fitting additive models under the linear link. Their advice for fitting generalized additive models under other link functions is to use the Metropolis algorithm, but they offer no guidance on how to create appropriate proposal distributions.

By conditioning on $Z$, the latent exponential sampler splits $\log \lambda_{im}$ into $M+1$ conditionally independent additive models under the identity link. Thus sampling $\delta$ and $\beta$ can proceed as in Section 3.2, where a proposal distribution for $\delta$ or $\beta_m$ (for each $m$) is attained by one iteration of Hastie and Tibshirani's Bayesian backfitting algorithm. Note that if $s_{mp}$ and $s_q$ are splines then the Bayesian backfitting algorithm simply applies the methods of Section 3.2 after a basis expansion of $x_i$ and $w_{im}$, albeit with computational tricks to capitalize on the structure of the spline basis functions.

3.4 Ordinal Logit Models

Ordinal logit models, which are described by McCullagh and Nelder (1989, Chapter 5.2.3) and Agresti (1990, Chapter 6), assume $g_m(x_i) = \eta_m + s_m \gamma^T x_i$, where $\gamma$ and $\eta_m$ are parameters and $s_m$ is a known, real-valued score assigned to level $m$. In practice one often sets $s_m = m$ unless a better alternative presents itself. These ordinal logit models are distinct from cumulative logit models, in which the support of an unobserved logistic distribution is partitioned into regions corresponding to the levels of the observed response (Johnson and Albert, 1999).

Assuming the identifiability constraint $s_0 = \eta_0 = 0$, one may construct a proposal distribution for ordinal logit models as follows (a sketch of the design construction follows this paragraph). Let $s = (s_1,\dots,s_M)^T$, let $X_i = (s x_i^T, I)$ where $I$ is the $M \times M$ identity matrix, and set $\beta = (\gamma^T, \eta^T)^T$. Then $g_m(x_i, \beta)$ is row $m$ of the vector $X_i \beta$. Form the design matrix by stacking the $X_i$, so that $X = (X_1^T,\dots,X_n^T)^T$, and form the response vector $u$ analogously. Then the proposal distribution is obtained by regressing $u$ on $X$ with parameter $\beta$, as in (6).
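A small sketch of this design construction; ordinal_design is a hypothetical helper, not notation from the text.

```python
import numpy as np

def ordinal_design(x_i, s):
    """Build X_i = (s x_i', I) from Section 3.4: row m of X_i @ beta
    gives s_m * gamma'x_i + eta_m for beta = (gamma', eta')'."""
    M = len(s)
    return np.hstack([np.outer(s, x_i), np.eye(M)])   # (M, p + M)

def stacked_ordinal_design(X, s):
    """Stack the X_i over observations into the (n*M, p + M) design,
    ready for the regression of u on X as in (6)."""
    return np.vstack([ordinal_design(x_i, s) for x_i in X])
```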
4. Data Example

In lieu of a simulation study, this Section considers a traditional multinomial logit problem where likelihood methods perform adequately. It compares the latent exponential sampler to several random walk Metropolis samplers using data describing automobile preferences for $n = 263$ customers (Foster et al., 1998). The example assumes the multinomial logistic regression model with flat priors $p(\beta_m) \propto 1$ for all $m$. The outcome variable is the type of car purchased (Family = 0, Sporty = 1, or Work = 2). The covariates used are Age (in years), Sex (1 if Female, 0 if Male), and Marital Status (1 if Single, 0 if Married); all are subject specific variables. Note that this example precludes comparisons with the multinomial-Poisson transformation because the data include Age, a continuous covariate.

Table 1 and Figure 2 compare MCMC output for the latent exponential method to samplers labeled $RW_k$, where the subscript $k$ describes how many parameters are simulated in each Metropolis proposal. The $RW_k$ samplers propose $\beta^{(t+1)} \sim N[\beta^{(t)}, \tau^2 I]$, where $I$ is the $k \times k$ identity matrix. Thus $RW_{MP}$ proposes all elements of $\beta$ in a single draw and accepts or rejects the entire parameter vector, $RW_M$ proposes, accepts, or rejects each $\beta_m$ vector individually, and $RW_1$ performs the Metropolis algorithm on each scalar element of $\beta$. (A sketch of the $RW_{MP}$ update appears at the end of this Section's text.) Table 1 records the time required for each sampler to produce 10,000 iterations, the fraction of proposals that were accepted, and the estimated posterior means and standard deviations of each component of $\beta$. Figure 2 displays the corresponding MCMC sample paths for the coefficient of Sex on Sporty cars (plots of other coefficients are similar). The latent exponential sampler converges almost immediately, accepts a high fraction of proposed deviates, and closely agrees with maximum likelihood point estimates and standard errors.

The computational speed of the latent exponential sampler compares favorably with the fastest random walk Metropolis algorithms. This is because most of the computational effort in Metropolis-Hastings samplers comes from evaluating the likelihoods required to compute the acceptance probability. The latent exponential and $RW_{MP}$ samplers each have only one such evaluation per iteration.

The $RW_k$ samplers fare poorly in Table 1 and Figure 2. None have converged by the end of 10,000 iterations, which causes them to produce misleading estimates of posterior means and standard errors. The $RW_{MP}$ samplers are computationally fast, but mix poorly because most of their proposals are rejected. To obtain an acceptance rate competitive with the latent exponential method, $RW_k$ must either choose $\tau$ to be small or $k$ to be small, which slows the algorithm with respect to mixing, computational time, or both.
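For contrast, here is a minimal sketch of the $RW_{MP}$ comparison sampler under the example's flat prior. It is my own illustration; note that, unlike the complete data likelihood (2), each evaluation must compute the multinomial logit normalizing constant.

```python
import numpy as np
from scipy.special import logsumexp

def observed_loglik(Beta, X, y):
    """Observed data multinomial logit log likelihood; unlike (2) it
    must evaluate the normalizing constant over all M + 1 levels."""
    G = X @ Beta.T                                   # (n, M+1) predictors
    return np.sum(G[np.arange(len(y)), y] - logsumexp(G, axis=1))

def rw_mp_step(Beta, X, y, tau, rng):
    """One RW_MP update: perturb every free element of beta at once
    and accept with probability min{f(prop)/f(cur), 1} (flat prior)."""
    prop = Beta.copy()
    prop[1:] += tau * rng.standard_normal(prop[1:].shape)  # beta_0 = 0
    log_alpha = observed_loglik(prop, X, y) - observed_loglik(Beta, X, y)
    return prop if np.log(rng.uniform()) < log_alpha else Beta
```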

Table 1: MCMC output summaries for several samplers, each run for 10,000 iterations on the auto choice data. Estimated posterior means are shown with estimated posterior standard errors in parentheses; only the standard errors, reproduced below, are legible in this copy, and the Time (s) and Accept % columns are not. The first line of coefficients in each group is for sporty cars, the second is for work cars. Family cars are the baseline.

Sampler               Intercept   Age      Sex      Mar.Stat.
RW_1  (τ = .05)       (0.61)      (0.02)   (0.09)   (0.35)
                      (0.24)      (0.01)   (0.21)   (0.18)
RW_1  (τ = .01)       (0.43)      (0.02)   (0.15)   (0.55)
                      (0.28)      (0.01)   (0.20)   (0.16)
RW_M  (τ = .05)       (1.24)      (0.04)   (0.29)   (0.36)
                      (0.44)      (0.02)   (0.31)   (0.30)
RW_M  (τ = .01)       (0.32)      (0.02)   (0.18)   (0.47)
                      (0.36)      (0.01)   (0.25)   (0.24)
RW_MP (τ = .05)       (1.14)      (0.04)   (0.31)   (0.31)
                      (1.12)      (0.03)   (0.43)   (0.42)
RW_MP (τ = .01)       (0.51)      (0.02)   (0.24)   (0.36)
                      (0.42)      (0.01)   (0.26)   (0.34)
Latent Exponential    (0.92)      (0.03)   (0.31)   (0.32)
                      (0.95)      (0.03)   (0.34)   (0.39)
MLE                   (0.95)      (0.03)   (0.31)   (0.32)
                      (0.97)      (0.03)   (0.35)   (0.38)

[Figure 2: Sample paths for the coefficient of Sex on Sporty cars. Panels: (a) RW_1 (τ = .05), (b) RW_M (τ = .05), (c) RW_MP (τ = .05), (d) RW_1 (τ = .01), (e) RW_M (τ = .01), (f) RW_MP (τ = .01), (g) latent exponential. The reference lines represent the maximum likelihood estimate computed from SAS, and ±2 and ±3 standard errors.]

5. Discussion

The latent exponential sampler is a convenient method for sampling the posterior distribution of general multinomial logit models. The sampler requires simulating a collection of exponential random variables $Z$, a minor computational burden. Conditioning on $Z$ allows $\beta$ to be sampled from approximate full conditional proposal distributions which require no tuning, without resorting to iterative root finding methods. The latent exponential method accommodates both categorical and continuous covariates, and it allows parameters of $(g_m)$ which are distinct across $m$ to be independently sampled.

There are several ways to motivate the latent exponential sampler. It is the natural generalization of Albert and Chib (1993) to multilevel categorical responses under the logit link. The utility restoration argument in Section 2 will be familiar to econometricians. One can also view the method as a twist on the well established relationship between the multinomial and Poisson likelihoods, known at least since Birch (1965). By exploiting the duality between the Poisson and exponential distributions, the latent exponential method is able to replace the log link for the Poisson likelihood with the identity link. Conditioning on $Z$ allows the modeler to work with the identity link by transforming imputed data instead of functions of model parameters, which removes a layer of complexity from the model.

Finally, this article has focused on the fact that the latent exponential sampler is a convenient method for sampling from the posterior distribution of multinomial logit models. It may be the key to finding a rapidly mixing sampler as well. The parameter expansion methods described by van Dyk and Meng (2001) and Liu and Wu (1999) have produced much more rapidly mixing samplers in binomial probit regression models. I am currently investigating whether similar results can be obtained for multinomial logit models.

References

Abe, M. (1999). A generalized additive model for discrete-choice data. Journal of Business and Economic Statistics 17.

Agresti, A. (1990). Categorical Data Analysis. Wiley.

Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88.

Baker, S. G. (1994). The multinomial-Poisson transformation. The Statistician 43.

Birch, M. W. (1965). The detection of partial association, II: The general case. Journal of the Royal Statistical Society, Series B 27.

Chen, Z. and Kuo, L. (2001). A note on the estimation of multinomial logit models with random effects. The American Statistician 55.

Foster, D. P., Stine, R. A., and Waterman, R. P. (1998). Business Analysis Using Regression. Springer.

Hastie, T. and Tibshirani, R. (2000). Bayesian backfitting (with discussion). Statistical Science 15.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, vol. 2, 2nd edn. Wiley Interscience, Somerset, NJ.

Johnson, V. E. and Albert, J. H. (1999). Ordinal Data Modeling. Springer-Verlag.

Le Cam, L. M. and Yang, G. L. (2000). Asymptotics in Statistics: Some Basic Concepts. Springer-Verlag.

Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association 94.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd edn. Chapman & Hall.

McCulloch, R. and Rossi, P. E. (1994). An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka, ed., Frontiers in Econometrics. Academic Press.

Mengersen, K. L. and Tweedie, R. L. (1996). Rates of convergence of the Hastings and Metropolis algorithms. The Annals of Statistics 24.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement 16.
Scott, S. L. and Ip, E. H. (2002). Empirical Bayes and item clustering effects in a latent variable hierarchical model: A case study from the National Assessment of Educational Progress. Journal of the American Statistical Association 97(458).

Spiegelhalter, D. J., Thomas, A., Best, N. G., and Gilks, W. R. (1996). BUGS: Bayesian inference Using Gibbs Sampling, Version 0.5 (version ii).

Train, K. E. (2003). Discrete Choice Methods with Simulation. Cambridge University Press, New York.

van Dyk, D. A. and Meng, X.-L. (2001). The art of data augmentation (with discussion). Journal of Computational and Graphical Statistics 10.

Zellner, A. (1997). The Bayesian method of moments (BMOM): Theory and applications. Advances in Econometrics 12.
