Data Augmentation for the Bayesian Analysis of Multinomial Logit Models
Steven L. Scott, University of Southern California
Bridge Hall 401-H, Los Angeles, CA

Key Words: Markov chain Monte Carlo, logistic regression, data augmentation

1. Introduction

This article introduces a Markov chain Monte Carlo (MCMC) method for sampling the parameters of a multinomial logit model from their posterior distribution. Let y_i ∈ {0, ..., M} denote the categorical response of subject i with covariates x_i = (x_i1, ..., x_ip)^T. Let X = (x_1, ..., x_n)^T denote the design matrix, and let y = (y_1, ..., y_n)^T. Multinomial logit models relate y_i to x_i through

    p(y_i = m) ∝ exp{g_m(x_i, β)} = λ_im,    (1)

where g_m(x_i, β) is a linear function and β is a parameter vector. Adding the same constant to each g_m(x, β) leaves (1) unchanged, so one commonly assumes g_0(x, β) = 0 to preserve identifiability.

The general function notation masks subtleties in the linear predictor that distinguish several varieties of multinomial logit models. For example, equation (1) can be made to model either ordinal or nominal responses by suitably constraining the linear predictor. By extending x through basis expansions, equation (1) includes generalized additive multinomial logit models (Abe, 1999).

Variants of multinomial logit models occur frequently in many areas of application. The models are especially important in econometrics (McFadden, 1974), where they are referred to as discrete choice models, and as a component of the partial credit models used in educational testing (Muraki, 1992). Despite multinomial logit models' obvious importance in applied research, Bayesian statisticians typically prefer to work with multinomial probit models instead. The preference for multinomial probit is largely due to a convenient Gibbs sampling algorithm introduced by McCulloch and Rossi (1994), which is a multivariate version of the probit regression algorithm of Albert and Chib (1993) (henceforth MRAC).
The MRAC algorithm is a data augmentation scheme which alternates between simulating a latent multivariate Gaussian vector for each subject, given observed data and model parameters, and simulating model parameters given complete data. The MRAC algorithm is widely used, even though computationally superior methods have been developed (e.g. van Dyk and Meng, 2001; Liu and Wu, 1999). MRAC's appeal lies in its aesthetic simplicity, which derives from stochastically replacing the nonlinear probit likelihood with a complete data likelihood based on the identity link. The identity link allows model parameters to be simulated from closed form full conditional distributions. Consequently MRAC avoids the tuning constants required by most Metropolis-Hastings algorithms, which means that MRAC is a default method that can be implemented with little expertise on the part of the user. It is easy to program and easy to explain to clients. The analytically tractable complete data likelihood also makes it easy to embed probit regression models into more elaborate hierarchical or random effects models.

The sampler introduced in this article is the natural extension of MRAC to multinomial logit models. Unlike MRAC, multinomial logit models do not admit a Gibbs sampler with closed form full conditional distributions. However, it is possible to simulate a set of latent variables with mean g_m(x, β) and known, constant variance. The latent variables can be combined with frequentist asymptotic theory for linear models, or with the Bayesian method of moments (Zellner, 1997), to produce a closed form surrogate distribution that approximates the full conditional distribution of the model parameters given complete data. The draw from the surrogate distribution is filtered using a Metropolis-Hastings probability (Metropolis et al., 1953; Hastings, 1970) to produce a draw from the desired posterior distribution p(β | X, y).
The proposed sampler inherits MRAC's considerable aesthetic appeal, and several practical features as well. First, the method is simple to program, and it readily extends to complex stochastic systems which include multinomial logit models as embedded components. Second, the proposal distribution is tailored to the target distribution at each iteration of the sampler without invoking iterative root finding methods, which might fail for computational reasons at some unlucky draw of β. Third, the sampler requires none of the tuning constants typically needed for random walk Metropolis-Hastings algorithms, so its burden on practitioners is minimal. Computationally, the sampler evaluates the complete data likelihood only once during each iteration. The complete data likelihood is simpler than the full multinomial logit likelihood because it avoids the multinomial logit normalizing constant. As a result the sampler is computationally faster than one-scalar-at-a-time sampling methods, such as adaptive rejection sampling, which require several likelihood evaluations per draw of β. The proposed method can handle both continuous and categorical covariates, so it is more flexible than methods based on the multinomial-Poisson transformation (Baker, 1994; Spiegelhalter, Thomas, Best, and Gilks, 1996; Chen and Kuo, 2001). Finally, the proposed sampler can be modified to work with Poisson regression.

The remainder of the article is structured as follows. Section 2 explains the latent exponential sampler in general terms, without reference to a specific form for g_m(x_i, β). Section 3 reviews several subfamilies of multinomial logit models and explains how the sampler can be applied to each. Section 4 illustrates the algorithm on a real data set. Section 5 provides a concluding discussion.

2. The Latent Exponential Sampler

Let E(λ) denote the exponential distribution with rate λ, and let Z = (z_im) denote a matrix of independent exponential random variables with rows z_i = (z_i0, ..., z_iM), where p(z_im | X, β) = E(λ_im) with λ_im defined in (1). If y_i = argmin_m(z_im) then p(y_i = m | X, β) ∝ λ_im, which is the multinomial logit model. Note that y is a deterministic function of Z, so the complete data likelihood that would be obtained if Z were observed is

    p(Z | X, β) = Π_{i=1}^{n} Π_{m=0}^{M} exp{ g_m(x_i, β) − z_im exp[g_m(x_i, β)] },    (2)

which does not involve y. Equation (2) implies a convenient conditional independence property.
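The exponential-race representation above is easy to check by simulation. The following sketch (an illustration of mine, not code from the paper) draws z_m ~ E(λ_m) for one subject many times and verifies that the argmin frequencies match the multinomial logit probabilities λ_m / Σ_k λ_k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear predictors g_m for a single subject; g_0 = 0 for identifiability.
g = np.array([0.0, 0.7, -0.3])
lam = np.exp(g)                          # rates lambda_m = exp{g_m}

# The exponential race: z_m ~ E(lambda_m), and y is the index of the minimum.
n_draws = 200_000
z = rng.exponential(scale=1.0 / lam, size=(n_draws, lam.size))
y = z.argmin(axis=1)

empirical = np.bincount(y, minlength=lam.size) / n_draws
logit_probs = lam / lam.sum()            # p(y = m) under the multinomial logit model
print(np.round(empirical, 3), np.round(logit_probs, 3))
```

With 200,000 draws the two vectors agree to about two decimal places, illustrating that p(y_i = m | X, β) ∝ λ_im.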
Many variants of multinomial logit models are parameterized so that g_m(x, β) = g(x; δ, β_m), where β_m and β_{m′} are distinct for m ≠ m′ and δ is a parameter shared by all response levels. If the {β_m} are independent in the prior distribution p(β | X, δ) then they remain independent in p(β | X, Z, δ). This property is absent from p(β | X, y, δ) because of the normalizing constant in (1). The latent exponential sampler cycles between three steps: sampling Z from p(Z | X, y, β), proposing a new value of β from a surrogate for p(β | X, Z) constructed using a transformation of Z, and promoting either the proposal or the current β according to a Metropolis-Hastings probability.

2.1 Sampling Latent Data

Sampling Z from p(Z | X, y, β) is trivial because z_1, ..., z_n are conditionally independent given (β, X, y). To draw z_i, first draw the minimal element z_{i,y_i} from p(z_{i,y_i} | X, y, β) = E(Σ_m λ_im). Then the memoryless property of the exponential distribution implies z_im = z_{i,y_i} + z′_im, with z′_im ~ E(λ_im) independently for m ≠ y_i. If the identifiability constraint g_0(x, β) = 0 is imposed then β is independent of z_i0 in (2). However, z_i0 must be sampled during the data augmentation step to maintain the scale of the imputed variables.

2.2 Sampling β

Equation (2) fails to suggest a closed form full conditional distribution for β, but a closed form surrogate exists. Any exponential random variable z ~ E(λ) may be written z = e/λ, with e ~ E(1). Thus log(z) = log(e) − log(λ) has mean µ − log(λ) and variance σ², where µ and σ² are the mean and variance of log(e). Because log(e) follows an extreme value distribution (Johnson et al., 1995) we have µ = −γ, the negative of Euler's constant, and σ² = π²/6. In particular, note that σ² does not depend on λ. Therefore

    u_im ≡ µ − log(z_im) ~ [g_m(x_i, β), σ²],    (3)

with square brackets denoting a random variable's mean and variance. The random variables in U = (u_im) are independent observations with constant variance whose expected value is the linear predictor.
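The draw of Z in Section 2.1 and the transformation to U in (3) can be sketched as follows. This is a minimal illustration with invented helper names; the rate matrix and responses are simulated here only to exercise the function.

```python
import numpy as np

GAMMA = 0.5772156649015329            # Euler's constant; mu = -GAMMA, sigma^2 = pi^2/6

def sample_latent(rng, lam, y):
    """Draw Z from p(Z | X, y, beta) given rates lam (n x (M+1)) and responses y.

    The observed category gets the minimum, z_{i,y_i} ~ E(sum_m lam_im); by the
    memoryless property every other category gets that minimum plus a fresh
    exponential draw with its own rate.
    """
    n = lam.shape[0]
    z_min = rng.exponential(1.0 / lam.sum(axis=1))
    z = z_min[:, None] + rng.exponential(1.0 / lam)   # z_im = z_{i,y_i} + z'_im
    z[np.arange(n), y] = z_min
    return z

rng = np.random.default_rng(1)
lam = np.exp(rng.normal(size=(500, 3)))               # rates lambda_im = exp{g_m(x_i, beta)}
y = np.array([rng.choice(3, p=row / row.sum()) for row in lam])
Z = sample_latent(rng, lam, y)
U = -GAMMA - np.log(Z)    # u_im = mu - log(z_im): mean g_m(x_i, beta), variance pi^2/6
print((Z.argmin(axis=1) == y).all())                  # imputed draws respect the observed y
```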
Therefore, frequentist theory for linear models implies that β̂, the least squares estimate of β in a regression of U on X, has limiting distribution p(β̂ | X, β) = N(β, V), where V is a known function of X and σ². If the prior distribution for β is Gaussian, say β ~ N(α, Σ), then a closed form surrogate for the full conditional distribution is

    p(β | X, β̂) = N(Ω(Σ⁻¹α + V⁻¹β̂), Ω),    (4)

where Ω⁻¹ = Σ⁻¹ + V⁻¹. In many cases equation (4) is the full conditional distribution that would be obtained if the observations in (3) were Gaussian with
the specified mean and variance. The same proposal distribution can be justified, without asymptotics, based on the maximum entropy principle using the Bayesian method of moments (Zellner, 1997). A candidate β* is drawn from equation (4) and compared to the current β^(t) through

    α = min{ [f(β*)/f(β^(t))] / [q(β*)/q(β^(t))], 1 },    (5)

where f(β) = p(β | X, Z) and q(β) = p(β | X, β̂). The candidate β* is promoted to β^(t+1) with probability α; otherwise β^(t+1) = β^(t). Note that the conditional independence properties of equation (2) are also present in (4).

2.3 Asymptotics

The proposal distribution violates the likelihood principle because β̂ is not a sufficient statistic for β. The cost of replacing the full conditional distribution p(β | X, Z) with its method of moments approximation p(β | X, β̂) can be seen by examining the limiting behavior of the two distributions as n → ∞. If y truly follows a multinomial logit model with parameter β_0, and if the prior distribution p(β) is such that it is eventually dominated by the likelihood, then both p(β | X, β̂) and p(β | X, Z) are asymptotically normal with mean β_0 (Le Cam and Yang, 2000). However, it is easy to show that the asymptotic variance of p(β | X, β̂) is σ² times that of p(β | X, Z). Figure 1 contrasts the proposal and full conditional distributions for β based on a simulated data set of 100 observations from the model z_i ~ E(exp(0.2 x_i)), where x_i ~ U(0, 1). Figure 1 reminds us that the proposal and full conditional distributions can have different means in finite samples, even though the means would be the same in the limit. The inflated variance of the proposal distribution relative to the full conditional is readily apparent. Upon viewing Figure 1, one is tempted to replace the variance of p(β | X, β̂) with the asymptotic variance of p(β | X, Z) by simply setting σ² = 1 in the computer code that fits the model.
However, notice that doing so places much smaller posterior mass near the true β_0 = 0.2 in Figure 1 than either the proposal or the full conditional distribution. In fact, we know that Metropolis-Hastings algorithms with heavy tailed proposals have desirable mixing properties (see, e.g., Mengersen and Tweedie, 1996), so the increased variance of p(β | X, β̂) relative to p(β | X, Z) is actually something of a blessing. In practice one prefers to inflate the tails of (4) even further, for example by replacing it with a multivariate T distribution with small (e.g. 3) degrees of freedom.

Figure 1: Comparing p(β | X, Z) ("Full Cond.") and p(β | X, β̂) ("Proposal") assuming the prior p(β) ∝ 1. "Adj. Var." is the proposal distribution rescaled to have the same asymptotic variance as p(β | X, Z).

2.4 Motivation

The latent exponential sampler can be motivated by either of two primary means. The first is a utility maximization argument which has been heavily used in the econometrics literature since its introduction by McFadden (1974). See Train (2003) for a recent review and bibliography. Conceptually, u_im represents the perceived utility of choice m for subject i, which is linearly related to x_i. Then subject i chooses y_i to maximize his perceived utility. The proposed sampler stochastically restores the unobservable utilities. Of course the utilities need not physically exist for the sampler to use them as a computational device. That is, multinomial logit models apply equally well to physical systems which lack rational decision makers exercising free will. The sampler may also be viewed as a multinomial-Poisson transformation, a name given to the dual relationship between the multinomial and Poisson likelihoods.
Several authors (Baker, 1994; Chen and Kuo, 2001; Spiegelhalter et al., 1996) have used the multinomial-Poisson transformation to approximate p(β | X, y) with a normal distribution based on a Poisson regression of y on X determined by iteratively reweighted least squares. However, the approximation involves an additional parameter for each distinct covariate pattern in X, which limits its usefulness when X contains continuously measured variables. The
latent exponential sampler views the exponential likelihood as primitive, rather than the Poisson. The advantage is that one can achieve linearity on the scale of the parameters by taking the log of exponential data. The same cannot be said for Poisson data, which have a positive probability of being zero.

3. Multinomial Logit Models

To illustrate the variety of multinomial logit models with which the latent exponential sampler can be used, this Section reviews several versions and extensions of the model in equation (1) and explains how the sampler can be applied to each. The models discussed include multinomial logistic regression, discrete choice models, ordinal logit models, and additive multinomial logit models. More elaborate models, such as random effects models and partial credit models, can also be accommodated, but are not discussed here due to space constraints (but see Scott and Ip, 2002).

3.1 Multinomial Logistic Regression

Multinomial logistic regression sets g_m(x_i, β) = β_m^T x_i, where the {β_m} are distinct across m, with β_0 = 0 for identifiability. If u_m denotes column m of U = (u_im), and if one assumes independent prior distributions β_m ~ N(α_m, Σ_m) for each m, then the proposal distribution for β_m is

    p(β_m | X, β̂) = N[Ω_m(Σ_m⁻¹ α_m + X^T u_m / σ²), Ω_m],    (6)

where Ω_m⁻¹ = Σ_m⁻¹ + X^T X / σ². Note that Ω_m depends only on known quantities, so it only needs to be computed once, and it depends on m only through the prior distribution. The {β_m} can be treated separately because equation (2) implies their conditional independence given Z. That contrasts with the asymptotic variance obtained from the Hessian matrix of the observed multinomial logistic regression log likelihood,

    −∂²ℓ / ∂β ∂β^T = Σ_{i=1}^{n} (diag(π_i) − π_i π_i^T) ⊗ x_i x_i^T,    (7)

where π_i = (π_i1, ..., π_iM) with π_im = λ_im / Σ_{k=0}^{M} λ_ik. The Hessian matrix in (7) has M times as many rows and columns as the variance of the proposal distribution in (6).
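The following sketch puts Section 3.1 together for one parameter-update sweep. It is my own illustration, not the paper's code: the common N(0, 100 I) prior, the dimensions, and the way Z is generated here (from the model, rather than from p(Z | X, y, β)) are all assumptions made for the demo. It computes the shared Ω of equation (6) once, draws each β_m candidate independently, and filters the joint draw with the Metropolis-Hastings ratio (5).

```python
import numpy as np

GAMMA = 0.5772156649015329
SIG2 = np.pi**2 / 6                       # variance of the log of a unit exponential

rng = np.random.default_rng(3)
n, p, M = 300, 2, 2                       # M non-baseline categories; beta_0 = 0
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
beta = rng.normal(size=(M, p))            # current state: beta_1, ..., beta_M

# Demo shortcut: draw Z from the model; the real sampler draws Z | X, y, beta.
G = np.column_stack([np.zeros(n), X @ beta.T])        # g_0 = 0, g_m = beta_m^T x
Z = rng.exponential(1.0 / np.exp(G))
U = -GAMMA - np.log(Z)                    # u_im with mean g_m and variance SIG2

# Equation (6): Omega is identical for every m under a common prior N(0, 100 I),
# so it is factored once and reused for all M proposal draws.
Sigma_inv = np.eye(p) / 100.0
Omega = np.linalg.inv(Sigma_inv + X.T @ X / SIG2)
L = np.linalg.cholesky(Omega)
means = np.array([Omega @ (X.T @ U[:, m + 1] / SIG2) for m in range(M)])
cand = means + rng.normal(size=(M, p)) @ L.T          # independent beta_m candidates

def logpost(b):                           # log complete-data posterior: (2) plus prior
    g = np.column_stack([np.zeros(n), X @ b.T])
    return np.sum(g - Z * np.exp(g)) - 0.5 * np.sum(b * b) / 100.0

def logq(b):                              # log surrogate density (4), up to a constant
    d = b - means
    return -0.5 * np.sum(d * np.linalg.solve(Omega, d.T).T)

# Metropolis-Hastings ratio (5): alpha = min{[f(cand)/f(beta)] / [q(cand)/q(beta)], 1}.
log_alpha = (logpost(cand) - logpost(beta)) - (logq(cand) - logq(beta))
if np.log(rng.uniform()) < log_alpha:
    beta = cand
print(beta.shape)
```

In the full sampler this sweep alternates with the latent-data draw of Section 2.1, and the two likelihood evaluations here are the only ones needed per iteration.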
Thus, conditioning on Z replaces one large matrix factorization with M smaller ones differing only in the prior variance Σ_m.

3.2 Discrete Choice Models

Discrete choice models differ from ordinary multinomial logit models in that some covariates are response specific while others are subject specific. For example, if y_i indicates the type of car purchased by customer i, then the car's gas mileage is response specific, whereas the customer's age is subject specific. Response specific covariates shift a subscript from the coefficient to the covariate, which substantially reduces the dimension of the parameter space. Let x_i denote the (p × 1) vector of subject specific characteristics for observation i, and let w_im denote the (q × 1) vector of response specific characteristics for potential response m on observation i. Then one may write

    p(y_i = m) ∝ exp(β_m^T x_i + δ^T w_im).    (8)

Assuming β_0 = 0 for identifiability, a convenient algorithm for sampling from p(β, δ | X, y) is as follows. (1) Generate U = (u_im) as in Section 2. (2) Sample δ from p(δ | X, Z, β). (3) Sample β_m from p(β_m | X, Z, δ) for m = 1, ..., M. Step 2 can be accomplished by defining u^(d)_im = u_im − β_m^T x_i, then stacking the u^(d)_im into the nM × 1 vector u^(d), and the w_im^T into the nM × q matrix W. Assuming independent Gaussian priors δ ~ N(α_d, Σ_d) and β_m ~ N(α_m, Σ_m), the proposal distribution for δ is

    p(δ | X, β, δ̂) = N[Ω_d(Σ_d⁻¹ α_d + W^T u^(d) / σ²), Ω_d],    (9)

where Ω_d⁻¹ = Σ_d⁻¹ + W^T W / σ². Step 3 can be realized by creating u^(b)_im = u_im − δ^T w_im, and then proceeding as in Section 3.1.

3.3 Generalized Additive Multinomial Logit Models

Generalized additive multinomial logit models (Abe, 1999) extend discrete choice models by assuming

    log λ_im = Σ_p s_mp(x_ip; β_m) + Σ_q s_q(w_imq; δ),

where the s_mp and s_q are scalar functions of scalar arguments to be estimated by a spline or some other smoother indexed by the parameters β_m and δ.
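When the smooths s_mp and s_q are splines or other basis expansions, the additive predictor is linear in an expanded design, so the sampling machinery of Section 3.2 carries over unchanged. A minimal sketch, with a plain polynomial basis standing in for a spline basis (function names are mine):

```python
import numpy as np

def power_basis(x, degree=3):
    """Polynomial basis expansion of a scalar covariate (a stand-in for a spline basis)."""
    return np.column_stack([x**d for d in range(1, degree + 1)])

rng = np.random.default_rng(4)
n = 50
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)

# The additive predictor s_1(x1) + s_2(x2) is linear in the expanded design B,
# so the proposal distributions of Section 3.2 apply to B directly.
B = np.hstack([power_basis(x1), power_basis(x2)])
beta = rng.normal(size=B.shape[1])
log_lam = B @ beta
print(B.shape)                            # each smooth contributes `degree` columns
```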
Hastie and Tibshirani (2000) describe a Bayesian backfitting algorithm for fitting additive models under the linear link. Their advice for fitting generalized additive models under other link functions is to use the Metropolis algorithm, but they offer no guidance on how to create appropriate proposal distributions.
By conditioning on Z, the latent exponential sampler splits log λ_im into M + 1 conditionally independent additive models under the identity link. Thus sampling δ and β can proceed as in Section 3.2, where a proposal distribution for δ or β_m (for each m) is attained by one iteration of Hastie and Tibshirani's Bayesian backfitting algorithm. Note that if s_mp and s_q are splines then the Bayesian backfitting algorithm simply applies the methods of Section 3.2 after a basis expansion of x_i and w_im, albeit with computational tricks to capitalize on the structure of the spline basis functions.

3.4 Ordinal Logit Models

Ordinal logit models, which are described by McCullagh and Nelder (1989, Chapter 5.2.3) and Agresti (1990, Chapter 6), assume g_m(x_i, β) = η_m + s_m γ^T x_i, where γ and η_m are parameters and s_m is a known, real-valued score assigned to level m. In practice one often sets s_m = m unless a better alternative presents itself. These ordinal logit models are distinct from cumulative logit models, in which the support of an unobserved logistic distribution is partitioned into regions corresponding to the levels of the observed response (Johnson and Albert, 1999). Assuming the identifiability constraint s_0 = η_0 = 0, one may construct a proposal distribution for ordinal logit models as follows. Let s = (s_1, ..., s_M)^T, let X_i = (s x_i^T, I), where I is the M × M identity matrix, and set β = (γ^T, η^T)^T. Then g_m(x_i, β) is row m of the vector X_i β. Form the design matrix by stacking the X_i, so that X = (X_1^T, ..., X_n^T)^T, and form the response vector u analogously. Then the proposal distribution is obtained by regressing u on X with parameter β, as in (6).

4. Data Example

In lieu of a simulation study, this Section considers a traditional multinomial logit problem where likelihood methods perform adequately.
This section compares the latent exponential sampler to several random walk Metropolis samplers using data describing automobile preferences for n = 263 customers (Foster et al., 1998). This example assumes the multinomial logistic regression model with flat priors p(β_m) ∝ 1 for all m. The outcome variable is the type of car purchased (Family = 0, Sporty = 1, or Work = 2). The covariates used are Age (in years), Sex (1 if Female, 0 if Male), and Marital Status (1 if Single, 0 if Married). All are subject specific variables. Note that this example precludes comparisons with the multinomial-Poisson transformation because the data include Age, a continuous covariate.

Table 1 and Figure 2 compare MCMC output for the latent exponential method to samplers labeled RW_k, where k is the number of parameters simulated in each Metropolis proposal. The RW_k samplers propose β^(t+1) ~ N[β^(t), τ² I], where I is the k × k identity matrix. Thus RW_all proposes all elements of β in a single draw and accepts or rejects the entire parameter, RW_p proposes, accepts, or rejects each β_m vector individually, and RW_1 performs the Metropolis algorithm on each scalar element of β. Table 1 records the time required for each sampler to produce 10,000 iterations, the fraction of proposals that were accepted, and the estimated posterior means and standard deviations of each component of β. Figure 2 displays the corresponding MCMC sample paths for the coefficient of Sex on Sporty cars (plots of other coefficients are similar). The latent exponential sampler converges almost immediately, accepts a high fraction of proposed deviates, and closely agrees with maximum likelihood point estimates and standard errors. The computational speed of the latent exponential sampler compares favorably with the fastest random walk Metropolis algorithms. This is because most of the computational effort in Metropolis-Hastings samplers comes from evaluating the likelihoods required to compute the acceptance probability.
The latent exponential and RW_all samplers each have only one such evaluation per iteration. The RW_k samplers fare poorly in Table 1 and Figure 2. None have converged by the end of 10,000 iterations, which causes them to produce misleading estimates of posterior means and standard errors. The RW_all samplers are computationally fast, but mix poorly because most of their proposals are rejected. To obtain an acceptance rate competitive with the latent exponential and hybrid methods, RW_k must either choose τ to be small or k to be large, which slows the algorithm with respect to mixing, computational time, or both.

5. Discussion

The latent exponential sampler is a convenient method for sampling the posterior distribution of general multinomial logit models. The sampler requires simulating a collection of exponential random variables Z, a minor computational burden. Conditioning on Z allows β to be sampled from approximate full conditional proposal distributions which require no tuning, without resorting to iterative root finding methods. The latent exponential method accommodates both categorical and continuous covariates, and allows parameters of (g_m) which are distinct
Table 1: MCMC output summaries for several samplers, each run for 10,000 iterations on the auto choice data. Estimated posterior means are shown, with estimated posterior standard errors in parentheses. The first line of coefficients in each group is for sporty cars, the second is for work cars. Family cars are the baseline.

Sampler               Time (s)  Accept %  Intercept  Age     Sex     Mar.Stat.
RW_1 (τ = .05)                            (0.61)     (0.02)  (0.09)  (0.35)
                                          (0.24)     (0.01)  (0.21)  (0.18)
RW_1 (τ = .01)                            (0.43)     (0.02)  (0.15)  (0.55)
                                          (0.28)     (0.01)  (0.20)  0.16
RW_M (τ = .05)                            (1.24)     (0.04)  (0.29)  (0.36)
                                          (0.44)     (0.02)  (0.31)  0.30
RW_M (τ = .01)                            (0.32)     (0.02)  (0.18)  (0.47)
                                          (0.36)     (0.01)  (0.25)  (0.24)
RW_MP (τ = .05)                           (1.14)     (0.04)  (0.31)  (0.31)
                                          (1.12)     (0.03)  (0.43)  (0.42)
RW_MP (τ = .01)                           (0.51)     (0.02)  (0.24)  (0.36)
                                          (0.42)     (0.01)  (0.26)  (0.34)
Latent Exponential                        (0.92)     (0.03)  (0.31)  (0.32)
                                          (0.95)     (0.03)  (0.34)  (0.39)
MLE                                       (0.95)     (0.03)  (0.31)  (0.32)
                                          (0.97)     (0.03)  (0.35)  (0.38)
Figure 2: Sample paths for the coefficient of Sex on Sporty cars: (a) RW_1 (τ = .05), (b) RW_M (τ = .05), (c) RW_MP (τ = .05), (d) RW_1 (τ = .01), (e) RW_M (τ = .01), (f) RW_MP (τ = .01), (g) latent exponential. The reference lines represent the maximum likelihood estimate computed from SAS, and ±2 and ±3 standard errors.
across m to be independently sampled.

There are several ways to motivate the latent exponential sampler. It is the natural generalization of Albert and Chib (1993) to multilevel categorical responses under the logit link. The utility restoration argument in Section 2 will be familiar to econometricians. One can also view the method as a twist on the well established relationship between the multinomial and Poisson likelihoods, known at least since Birch (1965). By exploiting the duality between the Poisson and exponential distributions, the latent exponential method is able to replace the log link for the Poisson likelihood with the identity link. Conditioning on Z allows the modeler to work with the identity link by transforming imputed data instead of functions of model parameters, which removes a layer of complexity from the model.

Finally, this article has focused on the fact that the latent exponential sampler is a convenient method for sampling from the posterior distribution of multinomial logit models. It may be the key to finding a rapidly mixing sampler as well. The parameter expansion methods described by van Dyk and Meng (2001) and Liu and Wu (1999) have produced much more rapidly mixing samplers in binomial probit regression models. I am currently investigating whether similar results can be obtained for multinomial logit models.

References

Abe, M. (1999). A generalized additive model for discrete-choice data. Journal of Business and Economic Statistics 17.
Agresti, A. (1990). Categorical Data Analysis. Wiley.
Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88.
Baker, S. G. (1994). The multinomial-Poisson transformation. The Statistician 43.
Birch, M. W. (1965). The detection of partial association, II: The general case. Journal of the Royal Statistical Society, Series B, Methodological 27.
Chen, Z. and Kuo, L. (2001).
A note on the estimation of multinomial logit models with random effects. The American Statistician 55.
Foster, D. P., Stine, R. A., and Waterman, R. P. (1998). Business Analysis Using Regression. Springer.
Hastie, T. and Tibshirani, R. (2000). Bayesian backfitting (with discussion). Statistical Science 15.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, vol. 2. Wiley Interscience, Somerset, NJ, 2nd edn.
Johnson, V. E. and Albert, J. H. (1999). Ordinal Data Modeling. Springer-Verlag.
Le Cam, L. M. and Yang, G. L. (2000). Asymptotics in Statistics: Some Basic Concepts. Springer-Verlag.
Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association 94.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (Second edition). Chapman & Hall.
McCulloch, R. and Rossi, P. E. (1994). An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka, ed., Frontiers in Econometrics, Academic Press.
Mengersen, K. L. and Tweedie, R. L. (1996). Rates of convergence of the Hastings and Metropolis algorithms. The Annals of Statistics 24.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement 16.
Scott, S. L. and Ip, E. H. (2002). Empirical Bayes and item clustering effects in a latent variable hierarchical model: A case study from the National Assessment of Educational Progress. Journal of the American Statistical Association 97 (458).
Spiegelhalter, D. J., Thomas, A., Best, N. G., and Gilks, W. R. (1996).
BUGS: Bayesian inference Using Gibbs Sampling, Version 0.5 (version ii).
Train, K. E. (2003). Discrete Choice Methods with Simulation. Cambridge University Press, New York.
van Dyk, D. A. and Meng, X.-L. (2001). The art of data augmentation (with discussion). Journal of Computational and Graphical Statistics 10.
Zellner, A. (1997). The Bayesian method of moments (BMOM): Theory and applications. Advances in Econometrics 12.
More informationA Bayesian Mixture Model with Application to Typhoon Rainfall Predictions in Taipei, Taiwan 1
Int. J. Contemp. Math. Sci., Vol. 2, 2007, no. 13, 639-648 A Bayesian Mixture Model with Application to Typhoon Rainfall Predictions in Taipei, Taiwan 1 Tsai-Hung Fan Graduate Institute of Statistics National
More informationMarkov Chain Monte Carlo in Practice
Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationA Bayesian perspective on GMM and IV
A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all
More informationMarginal Specifications and a Gaussian Copula Estimation
Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required
More informationMULTILEVEL IMPUTATION 1
MULTILEVEL IMPUTATION 1 Supplement B: MCMC Sampling Steps and Distributions for Two-Level Imputation This document gives technical details of the full conditional distributions used to draw regression
More informationOnline appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US
Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Gerdie Everaert 1, Lorenzo Pozzi 2, and Ruben Schoonackers 3 1 Ghent University & SHERPPA 2 Erasmus
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationSampling Methods (11/30/04)
CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with
More informationDynamic Generalized Linear Models
Dynamic Generalized Linear Models Jesse Windle Oct. 24, 2012 Contents 1 Introduction 1 2 Binary Data (Static Case) 2 3 Data Augmentation (de-marginalization) by 4 examples 3 3.1 Example 1: CDF method.............................
More informationIndex. Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables.
Index Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables. Adaptive rejection metropolis sampling (ARMS), 98 Adaptive shrinkage, 132 Advanced Photo System (APS), 255 Aggregation
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationMultivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal?
Multivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal? Dale J. Poirier and Deven Kapadia University of California, Irvine March 10, 2012 Abstract We provide
More informationMotivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University
Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationSubjective and Objective Bayesian Statistics
Subjective and Objective Bayesian Statistics Principles, Models, and Applications Second Edition S. JAMES PRESS with contributions by SIDDHARTHA CHIB MERLISE CLYDE GEORGE WOODWORTH ALAN ZASLAVSKY \WILEY-
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationOutline. Clustering. Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models
Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models Collaboration with Rudolf Winter-Ebmer, Department of Economics, Johannes Kepler University
More informationLecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH
Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationBayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference
Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of
More informationDiscussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance
Discussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance by Casarin, Grassi, Ravazzolo, Herman K. van Dijk Dimitris Korobilis University of Essex,
More informationModelling geoadditive survival data
Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationA short introduction to INLA and R-INLA
A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationREVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350
bsa347 Logistic Regression Logistic regression is a method for predicting the outcomes of either-or trials. Either-or trials occur frequently in research. A person responds appropriately to a drug or does
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationInvestigating Models with Two or Three Categories
Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might
More informationDefault Priors and Efficient Posterior Computation in Bayesian Factor Analysis
Default Priors and Efficient Posterior Computation in Bayesian Factor Analysis Joyee Ghosh Institute of Statistics and Decision Sciences, Duke University Box 90251, Durham, NC 27708 joyee@stat.duke.edu
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationRandom Effects Models for Network Data
Random Effects Models for Network Data Peter D. Hoff 1 Working Paper no. 28 Center for Statistics and the Social Sciences University of Washington Seattle, WA 98195-4320 January 14, 2003 1 Department of
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationNumerical Analysis for Statisticians
Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method
More informationeqr094: Hierarchical MCMC for Bayesian System Reliability
eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167
More informationF denotes cumulative density. denotes probability density function; (.)
BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models
More informationThe Bayesian Approach to Multi-equation Econometric Model Estimation
Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationPACKAGE LMest FOR LATENT MARKOV ANALYSIS
PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,
More informationMONTE CARLO METHODS. Hedibert Freitas Lopes
MONTE CARLO METHODS Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu
More informationDiscussion of Maximization by Parts in Likelihood Inference
Discussion of Maximization by Parts in Likelihood Inference David Ruppert School of Operations Research & Industrial Engineering, 225 Rhodes Hall, Cornell University, Ithaca, NY 4853 email: dr24@cornell.edu
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationLongitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents
Longitudinal and Panel Data Preface / i Longitudinal and Panel Data: Analysis and Applications for the Social Sciences Table of Contents August, 2003 Table of Contents Preface i vi 1. Introduction 1.1
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationGeneralized Linear Models (GLZ)
Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the
More informationRank Regression with Normal Residuals using the Gibbs Sampler
Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters
More informationNon-Markovian Regime Switching with Endogenous States and Time-Varying State Strengths
Non-Markovian Regime Switching with Endogenous States and Time-Varying State Strengths January 2004 Siddhartha Chib Olin School of Business Washington University chib@olin.wustl.edu Michael Dueker Federal
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationKobe University Repository : Kernel
Kobe University Repository : Kernel タイトル Title 著者 Author(s) 掲載誌 巻号 ページ Citation 刊行日 Issue date 資源タイプ Resource Type 版区分 Resource Version 権利 Rights DOI URL Note on the Sampling Distribution for the Metropolis-
More informationBayesian Inference for DSGE Models. Lawrence J. Christiano
Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.
More informationBayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i,
Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives Often interest may focus on comparing a null hypothesis of no difference between groups to an ordered restricted alternative. For
More informationOn the Optimal Scaling of the Modified Metropolis-Hastings algorithm
On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationOnline Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access
Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci
More informationNon-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public
More informationDavid B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison
AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University
More informationEM Algorithm II. September 11, 2018
EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data
More informationBayesian Inference in the Multivariate Probit Model
Bayesian Inference in the Multivariate Probit Model Estimation of the Correlation Matrix by Aline Tabet A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science
More informationGibbs Sampling in Endogenous Variables Models
Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take
More informationMH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution
MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationCharles E. McCulloch Biometrics Unit and Statistics Center Cornell University
A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components
More informationKneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"
Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationControl Variates for Markov Chain Monte Carlo
Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability
More informationPetr Volf. Model for Difference of Two Series of Poisson-like Count Data
Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More information