STAT 425: Introduction to Bayesian Analysis
Marina Vannucci, Rice University, USA
Fall 2017
Bayesian Analysis (Part 3)

Part 3: Hierarchical and Linear Models
- Hierarchical models
- Linear regression models
- Generalized linear models (logistic and Poisson)
- Hierarchical linear and mixed models

Data augmentation techniques for binary responses
- Binary response case. Basic idea: re-express discrete-data regression models in terms of unobserved (latent) continuous data.
- Aids interpretation and allows convenient MCMC sampling.
- Used for both logistic and probit regression.
- Albert and Chib (1993) demonstrated an auxiliary variable approach that simplifies binary probit regression.
- Introduce extra variables $z$ into the model such that $y = g(z)$, with $g$ any non-decreasing function for interpretability.
- Can also be used for multinomial/ordinal data (see Hoff, Chapter 12, Section 12.1).

Binary regression model
We observe $y_i \in \{0, 1\}$ and a set of covariates $X_i$, $i = 1, \ldots, n$:
$$y_i \sim \text{Bernoulli}\big(g^{-1}(\eta_i)\big), \qquad \eta_i = X_i \beta, \qquad \beta \sim \pi(\beta) \qquad (1)$$
- Probit regression: $g(u) = \Phi^{-1}(u)$, where $\Phi$ is the standard normal CDF.
- Logit regression: $g(u) = \text{logit}(u) = \log\frac{u}{1 - u}$ (logit link).

Probit link for binary outcome (Chapter 12)
The auxiliary variable formulation for binary outcomes assumes that a continuous latent variable $z_i$ exists and is related to the binary $y_i$ via
$$y_i = \begin{cases} 1 & \text{if } z_i > 0 \\ 0 & \text{if } z_i \le 0 \end{cases}$$
Associated with the $i$-th response, the values of $k$ covariates $x_{i1}, \ldots, x_{ik}$ are observed. The latent value $z_i$ is related to the $k$ covariates by the normal regression model
$$z_i = x_{i1}\beta_1 + \cdots + x_{ik}\beta_k + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1)$$

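Then we can show that
$$\begin{aligned} p(y_i = 1 \mid \beta) &= p(y_i = 1 \mid z_i > 0, \beta)\, p(z_i > 0 \mid \beta) + p(y_i = 1 \mid z_i \le 0, \beta)\, p(z_i \le 0 \mid \beta) \\ &= 1 \cdot p(z_i > 0 \mid \beta) + 0 \cdot p(z_i \le 0 \mid \beta) \\ &= p(z_i - \eta_i > -\eta_i \mid \beta) = 1 - \Phi(-\eta_i) = \Phi(\eta_i), \end{aligned}$$
with $\eta_i = x_{i1}\beta_1 + \cdots + x_{ik}\beta_k$ and where $\Phi(\cdot)$ is the CDF of a standard normal distribution. The last step uses $z_i - \eta_i = \varepsilon_i \sim N(0, 1)$ and the symmetry of the normal distribution.
The latent values $z_i$ are viewed as additional parameters. Gibbs sampling can be used to obtain posterior draws of $\beta$ and $z = (z_1, \ldots, z_n)$.
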
If we place a uniform prior on $\beta$, $p(\beta) \propto 1$, the full conditionals are given by
$$\beta \mid z, X, y \sim N_k\big((X^T X)^{-1} X^T z,\ (X^T X)^{-1}\big)$$
$$z_i \mid \beta, X, y \sim \begin{cases} N(x_i^T\beta, 1)\, I\{z_i > 0\} & \text{if } y_i = 1 \\ N(x_i^T\beta, 1)\, I\{z_i \le 0\} & \text{if } y_i = 0 \end{cases}$$

If we want to specify, instead, a normal prior density for $\beta$, $\beta \sim N(0, S_0)$, its full conditional becomes
$$\beta \mid z, X, y \sim N_k\big((X^T X + S_0^{-1})^{-1} X^T z,\ (X^T X + S_0^{-1})^{-1}\big)$$
Note: sampling from a truncated normal density, $y \sim N(\mu, \sigma^2)\, I(a < y < b)$, via the inverse CDF transformation method:
1. Set $u_1 = \Phi(a; \mu, \sigma^2)$ and $u_2 = \Phi(b; \mu, \sigma^2)$.
2. Sample $u \sim U(u_1, u_2)$.
3. Set $y = \Phi^{-1}(u; \mu, \sigma^2)$.

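The three steps of the inverse CDF method translate almost directly into base R. The following is a minimal sketch rather than the course's own code: the function name `rtnorm_inv` and the example values are hypothetical, and `pnorm`, `qnorm` and `runif` play the roles of $\Phi$, $\Phi^{-1}$ and $U(u_1, u_2)$.

```r
# Inverse-CDF sampler for N(mu, sigma^2) truncated to (a, b).
# rtnorm_inv is a hypothetical helper name, not part of any package.
rtnorm_inv <- function(n, mu, sigma, a, b) {
  u1 <- pnorm(a, mean = mu, sd = sigma)   # step 1: CDF at the lower bound
  u2 <- pnorm(b, mean = mu, sd = sigma)   #         CDF at the upper bound
  u  <- runif(n, min = u1, max = u2)      # step 2: uniform draw on (u1, u2)
  qnorm(u, mean = mu, sd = sigma)         # step 3: map back through the inverse CDF
}

set.seed(1)
draws <- rtnorm_inv(1000, mu = 0.5, sigma = 1, a = 0, b = Inf)
range(draws)   # all draws lie in the truncation region (0, Inf)
```

When the truncation region sits far in a tail, `u1` and `u2` can round to 0 or 1 and `qnorm` may return infinite values, so a dedicated truncated-normal sampler is likely to be more robust in that case.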
Example: Donner Party (from Bayesian Computation with R by Jim Albert)
The Donner Party was a group of American pioneers who set out for California in a wagon train. They spent the winter of 1846-47 snowbound in the Sierra Nevada. The first relief party did not arrive until the middle of February 1847, almost four months after the wagon train became trapped. Forty-eight of the 87 members of the party survived to reach California.
The dataset donner.dat contains the age (in years), gender (MALE) and survival status (1 if survived) for 45 members of the Donner Party. We want to fit the probit model for $\pi_i = P(y_i = 1)$:
$$\Phi^{-1}(\pi_i) = \beta_0 + \beta_1\, \text{MALE}_i + \beta_2\, \text{AGE}_i$$

    library(MASS)   # provides mvrnorm
    donner = read.table("donner.dat", header=TRUE, sep="\t")
    y = donner$survival; n = length(y)
    X = as.matrix(cbind(rep(1, n), donner[,1:2]))   # intercept plus the two covariates
    k = dim(X)[2]
    T = 10000; BETA = matrix(NA, T, k); Z = matrix(NA, T, n)
    set.seed(1)
    # initial value
    z = rnorm(n, 0, 1)

    # Implement Gibbs sampler
    library(truncnorm)      # assumed source of rtruncnorm(n, a, b, mean, sd)
    vb = solve(t(X)%*%X)    # (X'X)^{-1}, constant across iterations
    for(t in 1:T) {
      # Update beta
      mb = vb%*%t(X)%*%z
      beta = mvrnorm(1, mb, vb)
      # Update the z_i's
      mu = drop(X%*%beta)
      for(i in 1:n) {
        if(y[i]==1) z[i] = rtruncnorm(1, a=0, b=Inf, mean=mu[i], sd=1)
        else z[i] = rtruncnorm(1, a=-Inf, b=0, mean=mu[i], sd=1)
      }
      BETA[t,] = beta; Z[t,] = z
    }

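The sampler above depends on donner.dat and on add-on packages. As a self-contained illustration, the sketch below runs the same two-block Gibbs scheme (flat prior on $\beta$) on simulated data using base R only. The simulated design, `beta_true`, the chain length and the use of inverse-CDF draws for the truncated normals are all assumptions made for this example.

```r
# Albert-Chib Gibbs sampler for probit regression on simulated data (base R only).
set.seed(2017)
n <- 300
X <- cbind(1, rnorm(n))                      # intercept and one covariate
beta_true <- c(-0.5, 1)                      # assumed values for the simulation
y <- as.integer(drop(X %*% beta_true) + rnorm(n) > 0)

n_iter <- 1500; n_burn <- 500
V <- solve(crossprod(X))                     # (X'X)^{-1}
L <- t(chol(V))                              # L %*% t(L) = V
z <- ifelse(y == 1, 0.5, -0.5)               # starting values consistent with y
BETA <- matrix(NA_real_, n_iter, ncol(X))

for (s in seq_len(n_iter)) {
  # beta | z ~ N_k((X'X)^{-1} X'z, (X'X)^{-1})
  beta <- drop(V %*% crossprod(X, z) + L %*% rnorm(ncol(X)))
  # z_i | beta: N(eta_i, 1) truncated to (0, Inf) if y_i = 1, to (-Inf, 0] if y_i = 0
  eta   <- drop(X %*% beta)
  p0    <- pnorm(0, mean = eta, sd = 1)      # P(z_i <= 0)
  lower <- ifelse(y == 1, p0, 0)
  upper <- ifelse(y == 1, 1, p0)
  z     <- qnorm(runif(n, lower, upper), mean = eta, sd = 1)
  BETA[s, ] <- beta
}

post_mean <- colMeans(BETA[-seq_len(n_burn), ])
```

Because the data are simulated, `post_mean` can be compared directly with `beta_true` as a sanity check on the sampler.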
Let's calculate some posterior summaries:

    nburn = 1000
    apply(BETA[(nburn+1):T,], 2, mean)                        # posterior means
    apply(BETA[(nburn+1):T,], 2, sd)                          # posterior standard deviations
    apply(BETA[(nburn+1):T,], 2, quantile, c(0.025, 0.975))   # 95% credible intervals

Let's compare the results with a maximum likelihood fit of the probit model:

    fit.probit = glm(survival ~ ., family=binomial(link=probit), data=donner)
    summary(fit.probit)

The summary reports Estimate, Std. Error, z value and Pr(>|z|) for (Intercept), age and male; each coefficient is flagged with a single star (*).

Model Fit and Model Choice (Hoff, Chapter 9.3)
Good statisticians question whether their model is an adequate approximation to reality. A chosen model may not be the best model to fit the data. For example, a different set or combination of the available covariates should be considered. This is a problem of model selection.
A first attempt at model and prior criticism is to assess the adequacy of the fit with common regression diagnostic tools, e.g. by inspecting the residuals of the fit provided by the posterior mean:
$$\hat\epsilon_i = y_i - x_i^T \hat\beta, \qquad \text{with } \hat\beta = E(\beta \mid y).$$
Alternatively, one could think of more formal, and perhaps more Bayesian, ways to compare models.

Why variable selection?
- Avoid the use of redundant variables (problems with interpretation).
- Inclusion of unnecessary terms yields less precise estimates, particularly if explanatory variables are highly correlated with each other.
- Reduced MSE: reduced variance, but possibly higher bias.
- It is too expensive to use all variables.

Model selection criteria
Model selection criteria have been devised to compare different models. Kadane and Lazar (2004) review model selection from Bayesian and frequentist perspectives. In regression settings, the candidate models are distinguished by different covariate combinations or transformations of predictor variables:
1. Adjusted $R^2$
2. Stepwise regression
3. Regularization (Ridge, LASSO)
4. Akaike Information Criterion (AIC)
5. Bayesian Information Criterion (BIC)
6. Deviance Information Criterion (DIC)
7. Watanabe-Akaike Information Criterion (WAIC)
8. Log pseudo marginal likelihood (LPML)
9. Bayes factors

Model Selection
In linear regression, one may want to understand which predictors (and models) fit the data best. For example, we can consider the 6 models below:

Variable Selection in Linear Models
A prior on the regression coefficients that is often considered is the spike-and-slab prior, which is a mixture prior:
$$\beta_i \sim \gamma\, \delta_0(\cdot) + (1 - \gamma)\, N(0, b) \quad \text{for large } b.$$
This prior sets $\beta_i = 0$ if $\gamma = 1$ and draws $\beta_i \sim N(0, b)$ if $\gamma = 0$.
The variable $\gamma$ is a latent auxiliary variable such that $\gamma \sim \text{Bern}(\pi)$, with $\pi \in (0, 1)$.
The spike-and-slab prior achieves dimension reduction: under this parameterization, where $\gamma = 0$ selects the slab, a variable is included in the model if $P(\gamma = 0 \mid \text{data}) > \lambda$ for some threshold $\lambda$.

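To make the mixture concrete, the sketch below draws coefficients from the spike-and-slab prior using the slide's parameterization ($\gamma = 1$ gives the point mass at zero, $\gamma = 0$ the normal slab). The values of `pi_spike` and `b` and the number of draws are hypothetical, chosen only for illustration.

```r
# Draws from the spike-and-slab prior: gamma ~ Bern(pi), beta = 0 if gamma = 1,
# beta ~ N(0, b) if gamma = 0. All numerical settings are illustrative assumptions.
set.seed(42)
p        <- 10000    # number of prior draws
pi_spike <- 0.7      # P(gamma = 1), i.e. prior probability of the spike
b        <- 100      # slab variance ("large b")

gamma <- rbinom(p, size = 1, prob = pi_spike)
beta  <- ifelse(gamma == 1, 0, rnorm(p, mean = 0, sd = sqrt(b)))

mean(beta == 0)      # fraction of exact zeros, close to pi_spike
```

The exact zeros are what distinguishes this prior from a purely continuous shrinkage prior: a positive fraction of draws is excluded outright.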
With the point masses, a subset of predictors can be excluded with positive probability, which can be treated as direct shrinkage to zero.
The continuous components in the prior also pull the coefficients included in the model towards their prior centers, which are usually zero, to achieve another layer of shrinkage.
Indeed, instead of $N(0, b)$, one can use the g-prior
$$\beta \mid g, \tau \sim \gamma\, \delta_0(\cdot) + (1 - \gamma)\, N\big(0,\ g\, \tau\, (X^T X)^{-1}\big)$$

Bayes Factors
Suppose we have two models, $M_0$ and $M_1$, with sampling densities
$$f(y \mid \theta_0, M_0) \quad \text{and} \quad f(y \mid \theta_1, M_1).$$
The two models and the vectors $\theta_0$ and $\theta_1$ may not have anything in common; the two parameters need not even have the same dimension.
We have prior distributions on the parameters under the two models:
$$p_{M_0}(\theta_0) \quad \text{and} \quad p_{M_1}(\theta_1).$$
The Bayes factor ($BF_{01}$) is calculated as the ratio of the marginal distributions of the data,
$$p(y \mid M_0) = \int_{\Theta_0} f(y \mid \theta_0, M_0)\, p_{M_0}(\theta_0)\, d\theta_0 \quad \text{and} \quad p(y \mid M_1) = \int_{\Theta_1} f(y \mid \theta_1, M_1)\, p_{M_1}(\theta_1)\, d\theta_1.$$

Bayes Factors
By Bayes' theorem, the posterior probability of model $M_j$, $j = 0, 1$, is
$$P(M_j \mid y) = \frac{p(y \mid M_j)\, p(M_j)}{p(y \mid M_0)\, p(M_0) + p(y \mid M_1)\, p(M_1)}.$$
We can then consider the posterior odds:
$$\frac{P(M_0 \mid y)}{P(M_1 \mid y)} = \frac{p(y \mid M_0)}{p(y \mid M_1)} \cdot \frac{p(M_0)}{p(M_1)} = BF_{01} \cdot \frac{p(M_0)}{p(M_1)}.$$
If $p(M_0) = p(M_1) = \frac{1}{2}$, then
$$\frac{P(M_0 \mid y)}{P(M_1 \mid y)} = BF_{01}.$$
Also, usually one looks at $LBF = \log(BF_{01})$ or $2\,LBF$.

Bayes Factors - Strength of evidence
The Bayes factor can also be used more generally to test two competing hypotheses, besides two competing models.
Traditionally, the strength of evidence for model $M_0$ (or hypothesis $M_0$) is decided based on the following table for BFs (Kass and Raftery, 1995):

    2LBF       Strength of evidence
    0 to 2     not really worth considering
    2 to 6     positive
    6 to 10    strong
    > 10       very strong

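A small worked case may help fix ideas. The sketch below assumes a binomial experiment (15 successes in 20 trials, made-up numbers) and compares $M_0$: $\theta = 0.5$ against $M_1$: $\theta \sim \text{Beta}(1, 1)$, for which both marginal likelihoods are available in closed form. The helper `evidence` is a hypothetical name that simply looks up the table above.

```r
# Bayes factor for a point null against a uniform prior in a binomial model.
# Data and models are illustrative assumptions, not taken from the course.
n <- 20; y <- 15

m0 <- dbinom(y, n, 0.5)                        # p(y | M0): theta fixed at 0.5
m1 <- choose(n, y) * beta(y + 1, n - y + 1)    # p(y | M1): theta integrated over Beta(1, 1)

bf01   <- m0 / m1                              # Bayes factor of M0 against M1
prior0 <- 0.5                                  # equal prior model probabilities
post0  <- m0 * prior0 / (m0 * prior0 + m1 * (1 - prior0))
two_lbf10 <- 2 * log(m1 / m0)                  # 2LBF in favour of M1

# Look up the strength-of-evidence category for a non-negative 2LBF value
evidence <- function(two_lbf) {
  as.character(cut(two_lbf, breaks = c(0, 2, 6, 10, Inf),
                   labels = c("not really worth considering", "positive",
                              "strong", "very strong"),
                   include.lowest = TRUE))
}
evidence(two_lbf10)
```

With equal prior probabilities the posterior odds equal the Bayes factor, so `post0` is `bf01 / (1 + bf01)`. Since `bf01` is below 1 here, the table is read for the reciprocal Bayes factor, i.e. as evidence for $M_1$.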
Bayesian Model Choice
- Models for the variable selection problem are based on a subset of the variables $X_1, \ldots, X_p$.
- Encode models with a vector $\gamma = (\gamma_1, \ldots, \gamma_p)$, where $\gamma_j \in \{0, 1\}$ is an indicator for whether variable $X_j$ should be included in the model $M_\gamma$: $\gamma_j = 0 \Leftrightarrow \beta_j = 0$.
- Each value of $\gamma$ represents one of the $2^p$ models.
- Under model $M_\gamma$:
$$Y \mid \alpha, \beta, \sigma^2, \gamma \sim N(\mathbf{1}\alpha + X_\gamma \beta_\gamma,\ \sigma^2 I)$$
where $X_\gamma$ is the design matrix using the columns of $X$ for which $\gamma_j = 1$, and $\beta_\gamma$ is the subset of $\beta$ that is non-zero.

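Posterior Probabilities of Models
- Posterior model probabilities:
$$p(M_j \mid Y) = \frac{p(Y \mid M_j)\, p(M_j)}{\sum_{j'} p(Y \mid M_{j'})\, p(M_{j'})}$$
- The marginal likelihood of a model is proportional to
$$p(Y \mid M_\gamma) = \iint p(Y \mid \beta_\gamma, \sigma^2)\, p(\beta_\gamma \mid \gamma, \sigma^2)\, p(\sigma^2 \mid \gamma)\, d\beta\, d\sigma^2$$
- Bayes factor $BF[i : j]$:
$$\frac{P(M_i \mid Y)}{P(M_j \mid Y)} = \frac{p(Y \mid M_i)}{p(Y \mid M_j)} \times \frac{P(M_i)}{P(M_j)}$$
that is, posterior odds = Bayes factor $\times$ prior odds.
- Probability that $\beta_j \neq 0$ (marginal posterior inclusion probability):
$$P(\beta_j \neq 0 \mid Y) = \sum_{M_\gamma:\, \beta_j \neq 0} p(M_\gamma \mid Y)$$
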
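Zellner's g-prior within Models
- Centered model: $Y = \mathbf{1}_n \alpha + X^c_\gamma \beta_\gamma + \epsilon$
- Common parameters: $p(\alpha, \phi) \propto \phi^{-1}$
- Model-specific parameters: $\beta_\gamma \mid \alpha, \phi, \gamma \sim N\big(0,\ g\, \phi^{-1} (X^{cT}_\gamma X^c_\gamma)^{-1}\big)$
- The marginal likelihood of $M_\gamma$ is proportional to
$$p(Y \mid M_\gamma) = C\, (1 + g)^{\frac{n - p_\gamma - 1}{2}} \big(1 + g(1 - R^2_\gamma)\big)^{-\frac{n - 1}{2}}$$
where $R^2_\gamma$ is the usual $R^2$ for model $M_\gamma$, $p_\gamma$ is the number of predictors in $M_\gamma$, and $C$ is a constant equal to $p(Y \mid M_0)$ (the model with intercept alone).
- Uniform distribution over the space of models: $p(M_\gamma) = 1/2^p$
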
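Because the g-prior marginal likelihood depends on the data only through $R^2_\gamma$ and the model size, posterior model probabilities can be computed by brute-force enumeration when $p$ is small. The sketch below does this on simulated data; the data-generating values, the choice $g = n$ and the helper name `log_bf` are assumptions made for illustration.

```r
# Enumerate all 2^p models under Zellner's g-prior and a uniform model prior.
set.seed(7)
n <- 60; p <- 3
g <- n                                   # assumed choice of g
X <- matrix(rnorm(n * p), n, p)
y <- 1 + 2 * X[, 1] + rnorm(n)           # only the first predictor matters

# log Bayes factor of M_gamma against the intercept-only model M_0
log_bf <- function(r2, p_gamma) {
  0.5 * (n - 1 - p_gamma) * log(1 + g) - 0.5 * (n - 1) * log(1 + g * (1 - r2))
}

models <- as.matrix(expand.grid(rep(list(0:1), p)))   # one row per inclusion vector gamma
lbf <- apply(models, 1, function(gam) {
  if (sum(gam) == 0) return(0)                         # M_0 against itself
  fit <- lm(y ~ X[, gam == 1, drop = FALSE])
  log_bf(summary(fit)$r.squared, sum(gam))
})

post <- exp(lbf - max(lbf))
post <- post / sum(post)                 # posterior model probabilities (uniform prior)
incl <- colSums(models * post)           # marginal posterior inclusion probabilities
round(incl, 3)
```

Subtracting `max(lbf)` before exponentiating avoids overflow; the constant $C$ and the uniform prior $1/2^p$ cancel in the normalisation.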
Computing the Bayes Factors for large p
- In many cases, the Bayes factors can be computed from a single posterior sample. It is then easy to compute both the numerator and the denominator of the Bayes factor by using post-MCMC compositional sampling (Monte Carlo) techniques based on the output of the MCMC chains.
- One of the neat features of Bayes factors is their transitivity. If Model A outperforms Model B by a factor of 3, and Model B outperforms Model C by a factor of 4, then Model A outperforms Model C by a factor of $3 \times 4 = 12$.
- On the other hand, they are not defined with improper priors.
- One criticism of Bayes factors is the (implicit) assumption that one of the competing models ($M_0$ or $M_1$) is correct.
- For complex models, the post-MCMC compositional sampling (Monte Carlo) may be very inefficient and computationally costly.

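The transitivity property follows directly from the definition of the Bayes factor as a ratio of marginal likelihoods, since the term for Model B cancels. Written out with $BF_{AB}$ denoting the Bayes factor of A against B (notation introduced here for the illustration):

```latex
BF_{AC}
  = \frac{p(y \mid M_A)}{p(y \mid M_C)}
  = \frac{p(y \mid M_A)}{p(y \mid M_B)} \cdot \frac{p(y \mid M_B)}{p(y \mid M_C)}
  = BF_{AB} \cdot BF_{BC}
  = 3 \times 4 = 12 .
```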
Hierarchical Linear and Mixed Models - Outline
- Hierarchical regression models
- Generalized linear mixed models
- Examples

Hierarchical Regression Models (Chapter 11)
- Hierarchical/multilevel models are used with nested designs (multiple levels of sampling) or with clustered/correlated observations within groups. Example: data on an education system (students within schools within districts).
- Hierarchical linear models extend hierarchical models to situations where (i) a regression model describes within-group variation and (ii) a multivariate normal distribution captures heterogeneity among regressions (naive regressions are unreasonable).
- Recall the example on math scores for 10th grade students from 100 schools: (i) we estimated school-specific expected math scores and (ii) assessed variation of the estimates across schools.
- With hierarchical linear models we can model the relationship between math scores and other variables (SES), assuming the relationship is linear and that it varies from school to school.

Example: math score & SES
- Regress math scores on socioeconomic status (SES).
- Center SES scores within each school (the intercepts are then school-level average math scores).
- Figure: LS regression lines and plots of estimates versus group sample size.
- Individual regressions are not optimal (we want to borrow strength across schools, especially for small sample sizes).

Random Effects Model (simple case)
For $Y = (y_1, \ldots, y_n)$ falling into $m$ groups, we can write a simple random effects model as
$$Y \mid \beta, \sigma^2 \sim N(X\beta,\ \sigma^2 I)$$
$$\beta \mid \theta, s^2 \sim N(\theta,\ s^2 I)$$
- $s^2 \to 0$ implies all $\beta_i$'s are equal.
- $s^2 \to \infty$ implies all $\beta_i$'s are unrelated.
- Check that the posterior is not sensitive to the prior for $s^2$.

Random Effects Model (general case)
For $Y = (y_{ij})$, $i = 1, \ldots, n_j$ (observations) and $j = 1, \ldots, m$ (groups), we can write a general random effects model as
$$y_{ij} = \beta_j^T x_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$
$$\beta_1, \ldots, \beta_m \sim N(\theta, \Sigma)$$
With $Y_j = (y_{1j}, \ldots, y_{n_j j})$ we have
$$Y_j \sim N(X_j \beta_j,\ \sigma^2 I)$$
Notice the exchangeability assumptions.

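Simulating from the general random effects model shows why separate least squares fits are noisy for small groups. The sketch below generates group-specific coefficients $\beta_j \sim N(\theta, \Sigma)$, simulates $y_{ij}$, and then fits one regression per group; all numerical settings (`m`, `theta`, `Sigma`, `sigma`, group sizes) are made up for the illustration.

```r
# Simulate from y_ij = beta_j' x_ij + eps_ij with beta_j ~ N(theta, Sigma),
# then fit a separate least squares line in every group.
set.seed(3)
m     <- 50                                   # number of groups
theta <- c(50, 2)                             # population intercept and slope
Sigma <- matrix(c(9, 1, 1, 1), 2, 2)          # between-group covariance
sigma <- 5                                    # within-group standard deviation
n_j   <- sample(5:30, m, replace = TRUE)      # unequal group sizes

L    <- t(chol(Sigma))                        # L %*% t(L) = Sigma
beta <- t(theta + L %*% matrix(rnorm(2 * m), 2, m))   # m x 2 matrix of beta_j

dat <- do.call(rbind, lapply(seq_len(m), function(j) {
  x <- rnorm(n_j[j])
  x <- x - mean(x)                            # centre the covariate within group
  data.frame(group = j, x = x,
             y = beta[j, 1] + beta[j, 2] * x + rnorm(n_j[j], 0, sigma))
}))

# Group-by-group OLS estimates (no borrowing of strength)
ols <- t(sapply(split(dat, dat$group), function(d) coef(lm(y ~ x, data = d))))
plot(n_j, ols[, 2], xlab = "group sample size", ylab = "OLS slope")
abline(h = theta[2], lty = 2)
```

In such a plot the slopes from small groups typically scatter more widely around the population value than those from large groups, which is the motivation for the hierarchical model.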
Mixed Effects Model
Linear mixed effects models have fixed effects (parameters for the entire population) and random effects (parameters for smaller units sampled from the population).
Reparameterize the previous model as
$$\beta_j = \theta + \gamma_j, \qquad \text{with } \gamma_1, \ldots, \gamma_m \sim N(0, \Sigma);$$
then we have
$$y_{ij} = \beta_j^T x_{ij} + \epsilon_{ij} = \theta^T x_{ij} + \gamma_j^T x_{ij} + \epsilon_{ij}$$
with $\theta$ the fixed effect and $\gamma_1, \ldots, \gamma_m$ the random effects.

The mixed effects model in the more general Laird-Ware form is
$$y_{ij} = \theta^T x_{ij} + \gamma_j^T z_{ij} + \epsilon_{ij}, \qquad \gamma_j \sim N(0, \Sigma), \quad \epsilon_{ij} \sim N(0, \sigma^2)$$
where $x_{ij}$ and $z_{ij}$ can be vectors of different length, with overlapping or non-overlapping variables.
Typically $x_{ij}$ contains group-specific predictors (constant within groups), while $z_{ij}$ contains effects specific to subunit $i$ that can be thought of as extra error terms inducing intra-cluster dependence.
Note: "random" and "fixed" are confusing terms to a Bayesian (all parameters are random). Refer to fixed effect coefficients as those which are constant for all subjects, and to random effect coefficients as those which are subject-specific.

Examples
- Random intercepts: $(X\beta + Zu)_{ij} = \beta_0 + u_i + \beta_1 x_{ij}$, with $\text{cov}(u) = \sigma^2$.
- Random intercepts and slopes: $(X\beta + Zu)_{ij} = \beta_0 + u_i + (\beta_1 + v_i)\, x_{ij}$, with $\text{cov}(u) = \Sigma_{2 \times 2}$.
Here $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$.

Prior Model
Semi-conjugate priors (see the multivariate normal model) lead to a straightforward Gibbs sampler:
$$\theta \sim N(\mu_0, \Lambda_0), \qquad \Sigma \sim IW(\eta_0, S_0^{-1}), \qquad \sigma^2 \sim IG(\nu_0/2,\ \nu_0 \sigma_0^2/2)$$
- Full conditional of $\beta_1, \ldots, \beta_m$: multivariate normal
- Full conditional of $\theta$: multivariate normal
- Full conditional of $\Sigma$: inverse Wishart
- Full conditional of $\sigma^2$: inverse gamma
The prior on $\theta$ is usually flat.

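For reference, the four full conditionals listed above take the following standard forms under these semi-conjugate priors, as presented in Hoff's Chapter 11. The symbols $\bar\beta$, $S_\theta$ and $SSR$ are shorthand introduced here, and the inverse-Wishart is parameterized as in the prior above.

```latex
\begin{aligned}
\beta_j \mid y_j, X_j, \theta, \Sigma, \sigma^2
  &\sim N\!\Big( V_j \big(\Sigma^{-1}\theta + X_j^T y_j/\sigma^2\big),\; V_j \Big),
  & V_j &= \big(\Sigma^{-1} + X_j^T X_j/\sigma^2\big)^{-1}, \\
\theta \mid \beta_1,\dots,\beta_m, \Sigma
  &\sim N(\mu_m, \Lambda_m),
  & \Lambda_m &= \big(\Lambda_0^{-1} + m\Sigma^{-1}\big)^{-1},\quad
    \mu_m = \Lambda_m\big(\Lambda_0^{-1}\mu_0 + m\Sigma^{-1}\bar\beta\big), \\
\Sigma \mid \theta, \beta_1,\dots,\beta_m
  &\sim IW\!\big(\eta_0 + m,\; [S_0 + S_\theta]^{-1}\big),
  & S_\theta &= \sum_{j=1}^m (\beta_j - \theta)(\beta_j - \theta)^T, \\
\sigma^2 \mid \beta_1,\dots,\beta_m, y
  &\sim IG\!\Big(\tfrac{\nu_0 + \sum_j n_j}{2},\; \tfrac{\nu_0\sigma_0^2 + SSR}{2}\Big),
  & SSR &= \sum_{j=1}^m \sum_{i=1}^{n_j} \big(y_{ij} - \beta_j^T x_{ij}\big)^2 .
\end{aligned}
```

Here $\bar\beta = \frac{1}{m}\sum_j \beta_j$. A Gibbs sampler cycles through these four draws in turn.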
Example (continued)
- Regress math scores on socioeconomic status (SES).
- Center SES scores within each school (the intercepts are then school-level average math scores).
- Figure:

Generalized Linear Mixed Models
GLMMs combine GLMs with LMMs. They are used for data with a hierarchical structure where the normal model is not an appropriate within-group model (for example, counts or binary data).
For $m$ groups:
$$\beta_1, \ldots, \beta_m \sim N(\theta, \Sigma)$$
$$f(y_j \mid X_j, \beta_j, \phi) = \prod_{i=1}^{n_j} f(y_{ij} \mid \beta_j^T x_{ij}, \phi)$$
with $f(y \mid \beta^T x, \phi)$ a density whose mean depends on $\beta^T x$, and where we assume exchangeability across groups.
More generally:
$$p(Y \mid \gamma) = \exp\big\{ Y^T(\theta^T X + \gamma^T Z) - \mathbf{1}^T b(\theta^T X + \gamma^T Z) + \mathbf{1}^T c(Y) \big\}, \qquad \gamma \sim N(0, \Gamma)$$
where $b(\cdot)$ varies with the model (e.g. $b(x) = \exp(x)$ for Poisson).

Bayesian GLMM
Assume diffuse or non-informative priors for the fixed effects $\theta$.
MCMC is needed because of the intractable form of the posteriors and the marginals:
- Full conditionals for $(\theta, \Sigma)$.
- Metropolis step for $\beta_j$, with a normal proposal centered at the previous value and with variance-covariance matrix equal to a scaled version of the sampled $\Sigma^{(s)}$.
More readings: hierarchical centering of certain parameters (Gelfand, Sahu and Carlin 1995) and data-augmentation methods for non-conjugate priors (van Dyk and Meng 2001).

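The Metropolis step for a single $\beta_j$ can be sketched as follows for a Poisson within-group model with log link. This is an illustrative sketch only: the function names, the proposal scale of 0.05 and the simulated data are assumptions, and $\theta$ and $\Sigma$ are held fixed here, whereas a full sampler would also update them from their full conditionals.

```r
# One-group Metropolis update for beta_j in a Poisson GLMM (log link).
# Target: prod_i Poisson(y_i | exp(x_i' beta)) * N(beta | theta, Sigma).
log_post_beta <- function(beta, y, X, theta, Sigma_inv) {
  eta <- drop(X %*% beta)
  d   <- beta - theta
  sum(dpois(y, lambda = exp(eta), log = TRUE)) - 0.5 * drop(t(d) %*% Sigma_inv %*% d)
}

metropolis_beta <- function(beta, y, X, theta, Sigma, scale = 0.05) {
  Sigma_inv <- solve(Sigma)
  L    <- t(chol(scale * Sigma))                     # proposal covariance: scaled Sigma
  prop <- beta + drop(L %*% rnorm(length(beta)))     # centred at the previous value
  log_r <- log_post_beta(prop, y, X, theta, Sigma_inv) -
           log_post_beta(beta, y, X, theta, Sigma_inv)
  if (log(runif(1)) < log_r) prop else beta          # symmetric proposal: plain ratio
}

# Illustration on simulated data for a single group
set.seed(11)
X <- cbind(1, rnorm(40))
y <- rpois(40, lambda = exp(drop(X %*% c(0.5, 0.3))))
theta <- c(0, 0); Sigma <- diag(2)

draws <- matrix(NA_real_, 2000, 2)
b <- c(0, 0)
for (s in seq_len(2000)) {
  b <- metropolis_beta(b, y, X, theta, Sigma)
  draws[s, ] <- b
}
colMeans(draws[-(1:500), ])    # posterior means after burn-in
```

In practice the `scale` factor is tuned so that the acceptance rate is neither very low nor close to one.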
Examples of GLMM
- Poisson regression. Example: Africa data
- Logistic regression model. Example: Seeds data

Seeds data
Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which occur at the community level. To gain insight into species responses, a sample of seeds was selected from a suite of eight species chosen to represent the range of regeneration types which occur in this community. This representative community was then placed in experimental plots manipulated to mimic the natural variation in light conditions found in rain forests. Mammals were excluded from one half of each plot in order to assess their effects on the regeneration of rain forest trees. Six seeds of each type were planted, and an indicator of whether they germinated and survived was recorded.
Which variables are important in determining whether a seedling will survive? Are there interactions that influence survival probabilities?

Variables:
- SURV: survival (No = 0, Yes = 1) of the seedling. Indicator of whether there was a seedling present at the end of the observation period.
- GAP: 0/1 indicator for understory versus clearing.
- CAGE: 0/1 (absent/present). Enclosure to prevent mammals from eating the seeds.
- LITTER: different levels (0, 1, 2, 4).
- SPECIES: names on slides. Size = 1 (smallest) to 8 (largest). E = epigeal (cotyledons), H = hypogeal (food reserves in seed). Epigeal species rely on the cotyledons for photosynthesis and production of energy to become established; seed size tends to be small, with few reserves in the seeds. Hypogeal species tend to have larger seeds and can rely on reserves in the seed to produce energy; thus, if initial leaves are lost to predators, there may still be reserves that can be used to produce additional leaves. Larger seeds are easier for predators to spot.
- LIGHT: measure of light levels at the forest floor.

The dataset seeds.txt includes columns in the following order:
- PLOT (number 1 to 8)
- SUBPLT (within the plot)
- SPECIES (character string with names above)
- IND (seedling number within plot/subplot)
- SURV (indicator of survival)
- GERM (indicator for germination)
- ESTAB (intermediate measure of survival germination)
- LIGHT (measure of light at the forest floor for the plot - observational)
- LITTER (ordered categorical variable; manipulated litter levels)
- CAGE (indicator of enclosure)
- GAP (indicator for clearing in forest - observational)

Graduate level courses on Bayesian Statistics
- More on MCMC
- Formal derivations of posterior distributions
- Nonparametric regression
- Bayesian survival analysis
- Bayesian spatial analysis
- Multicomparison testing
- Bayesian time series
- Graphical models
STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed
More informationStandard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j
Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationBayesian GLMs and Metropolis-Hastings Algorithm
Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationModels for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data
Hierarchical models for spatial data Based on the book by Banerjee, Carlin and Gelfand Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5. Geo-referenced data arise
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationLinear Regression With Special Variables
Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationModule 11: Linear Regression. Rebecca C. Steorts
Module 11: Linear Regression Rebecca C. Steorts Announcements Today is the last class Homework 7 has been extended to Thursday, April 20, 11 PM. There will be no lab tomorrow. There will be office hours
More informationDynamic Generalized Linear Models
Dynamic Generalized Linear Models Jesse Windle Oct. 24, 2012 Contents 1 Introduction 1 2 Binary Data (Static Case) 2 3 Data Augmentation (de-marginalization) by 4 examples 3 3.1 Example 1: CDF method.............................
More informationFrailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.
Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationAn Introduction to Bayesian Linear Regression
An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,
More informationLecture Notes based on Koop (2003) Bayesian Econometrics
Lecture Notes based on Koop (2003) Bayesian Econometrics A.Colin Cameron University of California - Davis November 15, 2005 1. CH.1: Introduction The concepts below are the essential concepts used throughout
More informationBayesian Nonparametric Regression for Diabetes Deaths
Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,
More informationEstimating Sparse High Dimensional Linear Models using Global-Local Shrinkage
Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Daniel F. Schmidt Centre for Biostatistics and Epidemiology The University of Melbourne Monash University May 11, 2017 Outline
More informationModel Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection
Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist
More informationST 740: Model Selection
ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationST 740: Linear Models and Multivariate Normal Inference
ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /
More informationMarginal Specifications and a Gaussian Copula Estimation
Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationGibbs Sampling in Endogenous Variables Models
Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take
More informationSTAT 705 Generalized linear mixed models
STAT 705 Generalized linear mixed models Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 24 Generalized Linear Mixed Models We have considered random
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationBayesian and frequentist cross-validation methods for explanatory item response models. Daniel C. Furr
Bayesian and frequentist cross-validation methods for explanatory item response models by Daniel C. Furr A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of
More informationBayesian Analysis of Latent Variable Models using Mplus
Bayesian Analysis of Latent Variable Models using Mplus Tihomir Asparouhov and Bengt Muthén Version 2 June 29, 2010 1 1 Introduction In this paper we describe some of the modeling possibilities that are
More informationLinear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model
Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationUsing Bayesian Priors for More Flexible Latent Class Analysis
Using Bayesian Priors for More Flexible Latent Class Analysis Tihomir Asparouhov Bengt Muthén Abstract Latent class analysis is based on the assumption that within each class the observed class indicator
More informationChapter 1. Modeling Basics
Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical
More informationPart 7: Hierarchical Modeling
Part 7: Hierarchical Modeling!1 Nested data It is common for data to be nested: i.e., observations on subjects are organized by a hierarchy Such data are often called hierarchical or multilevel For example,
More informationIndex. Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables.
Index Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables. Adaptive rejection metropolis sampling (ARMS), 98 Adaptive shrinkage, 132 Advanced Photo System (APS), 255 Aggregation
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationAMS-207: Bayesian Statistics
Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of
More informationA Bayesian Probit Model with Spatial Dependencies
A Bayesian Probit Model with Spatial Dependencies Tony E. Smith Department of Systems Engineering University of Pennsylvania Philadephia, PA 19104 email: tesmith@ssc.upenn.edu James P. LeSage Department
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationA general mixed model approach for spatio-temporal regression data
A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression
More informationDepartamento de Economía Universidad de Chile
Departamento de Economía Universidad de Chile GRADUATE COURSE SPATIAL ECONOMETRICS November 14, 16, 17, 20 and 21, 2017 Prof. Henk Folmer University of Groningen Objectives The main objective of the course
More informationA Bayesian Mixture Model with Application to Typhoon Rainfall Predictions in Taipei, Taiwan 1
Int. J. Contemp. Math. Sci., Vol. 2, 2007, no. 13, 639-648 A Bayesian Mixture Model with Application to Typhoon Rainfall Predictions in Taipei, Taiwan 1 Tsai-Hung Fan Graduate Institute of Statistics National
More informationLecture 13 Fundamentals of Bayesian Inference
Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods
More informationWU Weiterbildung. Linear Mixed Models
Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes
More information