A Workshop on Bayesian Nonparametric Regression Analysis


1 A Workshop on Bayesian Nonparametric Regression Analysis George Karabatsos University of Illinois-Chicago Methodological Illustration Presentation (90 min.) Modern Modeling Methods (M3) Conference University of Connecticut Neag School of Education Wednesday, May 21, 2014, 1:00-2:30pm Supported by NSF-MMS Research Grant SES

2 Workshop Outline I. Introduction (20 minutes). A. Regression is ubiquitous in applied social science research. B. Review of linear regression by least-squares (LS). C. Review of the Bayesian approach to linear regression: Prior, Data, Posterior (Denison et al. 2002). 1. Bayes estimate of (β, σ²): a combination of the LS estimate of (β, σ²) and information from the prior π(β, σ²). 2. MCMC methods to estimate the posterior, π(β, σ² | Data). 3. Extensions of the Bayesian linear model for binary regression (probit and logit), ordinal regression, and the HLM/multilevel model.

3 Workshop Outline (continued) I. Introduction (20 minutes) (continued). D. Why Bayesian nonparametric (BNP) regression? Compared to linear models, BNP regression models: 1. Provide more modeling flexibility for real data sets, which are typically more complex than what can be adequately described by a linear model. 2. Provide a richer approach to data analysis than linear models, which focus only on the mean of Y | x. A BNP regression model provides inference on how the entire distribution of Y | x depends on x, and/or provides flexible modeling of random effects. 3. Have an infinite number of parameters. The estimation of the parameters of these models from data can be adequately handled by the Bayesian approach to statistical inference.

4 Workshop Outline (continued) II. BNP regression models (30 min.) A. General Overview of Bayesian Inference and MCMC. 1. Prior-to-posterior updating. 2. Posterior predictive inference. 3. Posterior predictive model fit evaluation. 4. MCMC methods, and convergence assessment. 4

5 Workshop Outline (continued) II. BNP regression models (30 min.) B. BNP Regression, based on infinite mixtures 1. General BNP (density) regression model a) General Dependent Dirichlet process (DDP) b) Dirichlet process (DP) c) General stick-breaking process d) Other normalized random measures 2. ANOVA/Linear DDP model (De Iorio et al. 2004) 3. Infinite probits model (Karabatsos & Walker, 2012b) 4. Infinite probits model, with automatic covariate selection 5

6 Workshop Outline (continued) III. How to apply BNP regression models for data analysis (40 minutes), using my Bayesian Regression Software. A. PIRLS data (students nested within schools) 1. Dependent variable: reading achievement. Covariates of the student, teacher, classroom, & school. 2. Binary regression, and ordinal regression. 3. Multilevel BNP regression analysis via the DDP. 4. Causal analysis, regression discontinuity design (effect of class size on student achievement). B. Larynx data: Censored survival data (survival analysis). C. School Calendar data: Meta-analysis. D. Item response data (NAEP). BNP-IRT analysis. (All data applications above are time-permitting.) IV. Conclusions.

7 Regression and the Social Sciences Social science research questions usually have the form: How does Y depend on x = (x_1, …, x_p)? Examples of such questions: Association of Y and X_1, given X_2 = x_2, …, X_p = x_p? Association of Y and (X_1, X_2), given X_3 = x_3, …, X_p = x_p? Which of (X_1, …, X_p) are important predictors of Y? How do item responses Y_ij relate to examinee ability and to item difficulty? As in Item Response Theory (IRT). (x: indicators (0,1) of examinees; indicators (0,1) of items.) How is Y causally affected by X_1? How is Y causally affected by X_1, given X_2 = x_0? Possibly with some Y observations censored (survival analysis).

8 Regression and the Social Sciences Social science research questions usually have the form: How does Y depend on x = (x_1, …, x_p)? It is often of interest to investigate this dependence based on one or more features of the distribution of Y: Mean of Y (common approach; as in linear regression) Variance of Y (for variance regression) Median of Y (for median regression) Quantiles of Y (for quantile regression; incl. median reg.) Survival function of Y (for survival analysis) Hazard function of Y (for survival analysis) p.d.f. of Y (for density regression) c.d.f. of Y (for density regression)

9 Linear Regression (least-squares) Linear model: y_i = x_i'β + ε_i, ε_i ~ N(0, σ²), i = 1, …, n; i.e., f(y_i | x_i; ζ) = n(y_i | x_i'β, σ²), i = 1, …, n. Notation: x'β = β_0 + β_1 x_1 + ⋯ + β_p x_p, with x = (1, x_1, …, x_p)'. σ² is the variance of the regression errors (the ε_i). N(μ, σ²) is the normal distribution, with mean and variance parameters (μ, σ²), and with probability density function (p.d.f.) n(· | μ, σ²) (the p.d.f. is a bell-shaped curve). f(y | x; ζ) is the likelihood of y conditional on x, given all model parameters ζ. For the linear regression model, ζ = (β, σ²).

10 Linear Regression (least-squares) Linear model: y_i = x_i'β + ε_i, ε_i ~ N(0, σ²), i = 1, …, n; i.e., f(y_i | x_i; ζ) = n(y_i | x_i'β, σ²), i = 1, …, n. Given sample data D_n = {(y_i, x_i)}_{i=1:n}, the maximum-likelihood estimate (MLE) of (β, σ²) is defined by: (β̂, σ̂²) = argmax over (β, σ²) ∈ R^{p+1} × R_+ of ∏_{i=1}^n n(y_i | x_i'β, σ²). For the linear model, the MLE is given by ordinary least-squares (OLS): β̂ = (X'X)⁻¹X'y and σ̂² = (1/n) Σ_{i=1}^n (y_i − x_i'β̂)², where X = ((1, x_i'))_{n×(p+1)} is the design matrix and y = (y_1, …, y_n)'.
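A minimal sketch of this OLS/MLE computation in Python, on simulated data (the n, p, true β, and error s.d. below are illustrative assumptions, not values from the workshop):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
beta_true = np.array([1.0, 0.5, -0.3, 0.0])
y = X @ beta_true + rng.normal(scale=0.8, size=n)

# OLS/MLE: beta_hat = (X'X)^{-1} X'y ; sigma2_hat = (1/n) sum_i (y_i - x_i'beta_hat)^2
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)
print(beta_hat, sigma2_hat)
```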

11 Bayesian Linear Regression Model f(y_i | x_i; ζ) = n(y_i | x_i'β, σ²), i = 1, …, n; β | σ² ~ N_{p+1}(m, σ²V); σ² ~ IG(a, b) (a = shape, b = rate). Prior density: π(β, σ²) = n_{p+1}(β | m, σ²V) ig(σ² | a, b), where n_{p+1}(· | m, Σ) is the density function of the (p+1)-variate normal N_{p+1}(m, Σ), and ig(· | a, b) is the density function of the inverse-gamma distribution IG(a, b). The data, D_n = {(y_i, x_i)}_{i=1:n}, update the prior π(β, σ²) to a posterior density: π(β, σ² | D_n) = ∏_{i=1}^n n(y_i | x_i'β, σ²) π(β, σ²) / ∬ ∏_{i=1}^n n(y_i | x_i'β, σ²) π(β, σ²) dβ dσ². The posterior distribution (density), π(β, σ² | D_n), conveys the probable values of the model parameters (β, σ²), given the data (D_n) and the prior π(β, σ²).

12 Bayesian Linear Regression Model We can estimate the posterior distribution (density) π(β, σ² | D_n) by constructing a Markov (MCMC) chain {(β, σ²)^(s)}_{s=1:S}. First, initialize with (β, σ²)^(0) = (β, σ²) and stage s = 1. Then this chain is constructed by sampling/drawing from the following full conditional posterior distributions (f.c.p.d.s), in the following Sampling Steps: Sampling Step 1. Draw β | X, y, σ² ~ N(m*, σ²V*). Sampling Step 2. Draw σ² | X, y, β ~ IG(a*, b*). Then set (β, σ²)^(s) = (β, σ²), s = s + 1, and repeat Steps 1-2 if s < S. Notation: V* = (V⁻¹ + X'X)⁻¹, m* = V*(V⁻¹m + X'y), a* = a + n/2, b* = b + (1/2)(m'V⁻¹m + y'y − (m*)'(V*)⁻¹m*). As S → ∞, the chain {(β, σ²)^(s)}_{s=1:S} converges to samples from the posterior distribution (density), π(β, σ² | D_n).
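A minimal sketch of this two-step sampler in Python, assuming the conjugate prior above (m, V, a, b supplied by the user); because the prior is conjugate, the σ² draw here does not depend on β, so each pass yields an exact posterior draw:

```python
import numpy as np

def gibbs_linear(X, y, m, V, a, b, S=5000, seed=1):
    rng = np.random.default_rng(seed)
    n, k = X.shape
    Vinv = np.linalg.inv(V)
    Vstar_inv = Vinv + X.T @ X
    Vstar = np.linalg.inv(Vstar_inv)              # V* = (V^{-1} + X'X)^{-1}
    mstar = Vstar @ (Vinv @ m + X.T @ y)          # m* = V*(V^{-1}m + X'y)
    astar = a + n / 2.0
    bstar = b + 0.5 * (m @ Vinv @ m + y @ y - mstar @ Vstar_inv @ mstar)
    draws = np.empty((S, k + 1))
    for s in range(S):
        # Step 2: sigma^2 | X, y ~ IG(a*, b*), drawn as 1 / Gamma(shape=a*, rate=b*)
        sigma2 = 1.0 / rng.gamma(astar, 1.0 / bstar)
        # Step 1: beta | X, y, sigma^2 ~ N(m*, sigma^2 V*)
        beta = rng.multivariate_normal(mstar, sigma2 * Vstar)
        draws[s] = np.r_[beta, sigma2]
    return draws
```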

13 OLS/MLE & Bayesian Connections Recall that the prior density of the linear model is: π(β, σ²) = n_{p+1}(β | m, σ²V) ig(σ² | a, b). If the prior density is non-informative or neutral, i.e., with m = 0, V⁻¹ → 0, a = b = 0, so that the density π(β, σ²) is flat over the range of (β, σ²), then the mode of the posterior density π(β, σ² | D_n) coincides with the OLS/MLE estimator (β̂, σ̂²), and m* = β̂ = E[β | D_n] = V*X'y, with V* = (X'X)⁻¹. Then the sampling distribution of β̂ (frequentist statistics) coincides with the posterior distribution of β given σ²: β | σ², D_n ~ N(m*, σ²V*).

14 Extensions of Bayesian Linear Model Extension to a binary dependent variable, Y ∈ {0, 1}: Likelihood: f(y | x; ζ) = Φ(x'β/σ)^y [1 − Φ(x'β/σ)]^{1−y}, with Φ(·) = Pr[Y* ≤ ·]. Probit model: Φ(·) is the c.d.f. of the normal N(0,1) distribution. Logit model: Φ(·) = exp(·)/[1 + exp(·)] is the Logistic(0,1) c.d.f. MCMC estimation methods are the same as in the linear model, after adding an MCMC sampling step that draws the latent responses: y*_i | D_n, β ~ N(x_i'β, σ²) 1(y*_i > 0)^{y_i} 1(y*_i < 0)^{1−y_i}, i = 1, …, n, and after replacing y with y* = (y*_1, …, y*_i, …, y*_n). Probit model: Fix σ² ≡ 1 (Albert & Chib, 1993). Logit model: Add another MCMC step to sample σ² from the Kolmogorov-Smirnov distribution (Holmes & Held, 2006).
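A minimal sketch of the added data-augmentation step for the probit model (Albert & Chib, 1993) in Python: each latent y*_i is drawn from a normal truncated to (0, ∞) if y_i = 1 and to (−∞, 0) if y_i = 0; X, y, and the current β are assumed to come from the surrounding Gibbs sampler, with σ ≡ 1:

```python
import numpy as np
from scipy.stats import truncnorm

def draw_latent_probit(X, y, beta, rng):
    mu = X @ beta
    # truncnorm is parameterized in standard-normal units, so bounds are shifted by -mu
    lower = np.where(y == 1, -mu, -np.inf)   # y_i = 1: y*_i > 0
    upper = np.where(y == 1, np.inf, -mu)    # y_i = 0: y*_i < 0
    return mu + truncnorm.rvs(lower, upper, random_state=rng)
```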

15 Extensions of Bayesian Linear Model Extension to an ordinal dependent variable, Y ∈ {c = 0, 1, …, C}: Likelihood, cumulative logits or probits model: f(y | x; ζ) = ∏_c [Φ({w_c − x'β}/σ) − Φ({w_{c−1} − x'β}/σ)]^{1(y=c)}. Add a prior π(w) ∝ 1(w_{−1} < w_0 = 0 < w_1 < w_2 < ⋯ < w_C) for the cutoff parameters to the linear model. Then MCMC estimation methods are the same as in the linear model, after adding MCMC sampling steps to draw from the following f.c.p.d.s: y*_i | D_n, w ~ N(x_i'β, σ²) ∏_c 1(w_{c−1} < y*_i < w_c)^{1(y_i=c)} (for i = 1, …, n); w_c | D_n ∝ ∏_{i=1:n} n(y*_i | x_i'β, σ²) 1(w_{c−1} < w_c < w_{c+1}) (c = 1, …, C−1), and after replacing y with y* = (y*_1, …, y*_i, …, y*_n).

16 Extensions of Bayesian Linear Model Extension to a censored dependent variable, Y_i: Suppose that a continuous-valued response y_i (e.g., log survival time) is censored, i.e., known only to fall within a fixed interval A_i = [Y_LBi, Y_UBi]. Interval censored: −∞ < Y_LBi < Y_UBi < ∞. Right censored: −∞ < Y_LBi < Y_UBi = ∞. Left censored: −∞ = Y_LBi < Y_UBi < ∞. Likelihood of the censored response: f(y_i | x_i; ζ) = Φ({Y_UBi − x_i'β}/σ) − Φ({Y_LBi − x_i'β}/σ). MCMC estimation methods are the same as in the linear model, after adding an MCMC sampling step to draw from the f.c.p.d.: y*_i | D_n ~ N(x_i'β, σ²) 1(y*_i ∈ A_i), and after replacing y_i with y*_i.

17 Extensions of Bayesian Linear Model Bayesian HLM / multilevel model: Likelihood: f(y | x; ζ) = n(y | x'β + z'u_h, σ²), for Level-2 groups h = 1, …, H, adding priors u_h | T ~ iid N(0, T) and T ~ π(T) to the linear model. Then MCMC estimation methods are the same as in the linear model, after adding sampling steps in the MCMC algorithm to draw from the following f.c.p.d.s: u_h | D_n, T ∝ ∏_{i=1:n} n(y_i | x_i'β + z_i'u_h, σ²) n(u_h | 0, T) (for h = 1, …, H); T | D_n ∝ ∏_{h=1:H} n(u_h | 0, T) π(T).

18 Why Nonparametric? Why Bayesian nonparametric (BNP) regression? Modeling flexibility is needed for real data sets (typically complex): Simple linear/HLM model → stronger (less credible) data assumptions → inaccurate statistical inferences. More flexible BNP regression model → weaker (more credible) data assumptions → more accurate statistical inferences. BNP also provides a richer approach to data analysis: Linear model/HLM: How does the mean of Y change with x? BNP regression: How does Y change with x? For the mean, variance, p.d.f. (density regression), quantiles (quantile regression), and survival functions of Y. BNP-HLM: Supports all random-effect distributions. Also provides a model-based cluster analysis.

19 [Figure: Scatterplot of student literacy score (z-standardized) vs. years of teaching experience (TEXP). Blue circles: data points (n = 565). Solid red line: normal linear model fit, the OLS estimate of E[Y | x]. Dashed lines: ±2 times the error s.d. estimate.]

20 [Figure: Probability density function (p.d.f.) estimates of the literacy score, in panels at several values of TEXP. Red: normal linear model p.d.f. Blue: estimate of the true p.d.f.]

21 Why Bayes Nonparametric? Why Bayes? A BNP regression model provides a very flexible and rich approach to data analysis by employing an infinite number of parameters. The estimation of the parameters of these models from data can be handled by the Bayesian approach to statistical inference. Hence, before we describe BNP regression models, we give a brief overview of the general approach to Bayesian statistical inference and MCMC.

22 Overview of Bayesian Modeling In general, consider a model (m) with likelihood density function f(· | x; ζ_m), parameter ζ_m ∈ Ω_m (⊆ R^K), and a prior probability density π_m(ζ_m) assigned over the space Ω_m. Data, D_n = {(y_i, x_i)}_{i=1:n}, update the prior π_m(ζ_m) to a posterior density: π_m(ζ_m | D_n) = ∏_{i=1}^n f(y_i | x_i; ζ_m) π_m(ζ_m) / ∫ ∏_{i=1}^n f(y_i | x_i; ζ_m) π_m(ζ_m) dζ_m, with Π_m(ζ_m | D_n) the c.d.f. of π_m(ζ_m | D_n). The posterior density π_m(ζ_m | D_n) conveys the probable values of ζ_m, given the data D_n = {(y_i, x_i)}_{i=1:n} and the prior π_m(ζ_m).

23 Overview of Bayesian Modeling Model predictions from the prior predictive density: f(y | x, m) = ∫ f(y | x; ζ_m) dΠ_m(ζ_m). f(y | x, m) is the likelihood of y given model m and prior Π_m(ζ_m). Bayes factor (B_12): Evidence in favor of model M_1 vs. model M_2 (Jeffreys, 1961; Kass & Raftery, 1995). Table of evidence:
B_12      | 2 log B_12 | Evidence in favor of M_1 vs. M_2
1 to 3    | 0 to 2     | Not worth more than a bare mention
3 to 20   | 2 to 6     | Positive
20 to 150 | 6 to 10    | Strong
> 150     | > 10       | Very strong
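A quick worked reading of the table: a Bayes factor of B_12 = 30 gives 2 log B_12 = 2 ln(30) ≈ 6.8, which falls in the 6-to-10 band, i.e., strong evidence in favor of M_1 over M_2.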

24 Overview of Bayesian Modeling Model predictions from the posterior predictive density: f_n(y | x, m) = ∫ f(y | x; ζ_m) dΠ(ζ_m | D_n); c.d.f. F_n(y | x, m) = ∫_{Y≤y} f(y | x; ζ_m) dΠ(ζ_m | D_n). Model predictive functionals: Mean E_n[Y | x, m] = ∫ y dF_n(y | x, m). Variance Var_n[Y | x] = ∫ (y − E_n[Y | x])² dF_n(y | x, m). c.d.f. F_n(y | x, m) = Pr_n(Y ≤ y | x, m). Median F_n⁻¹(.5 | x, m). Quantile F_n⁻¹(u | x, m), for a choice u ∈ [0, 1]. Survival function 1 − F_n(y | x, m). Hazard function f_n(y | x, m) / [1 − F_n(y | x, m)]. Etc.

25 Overview of Bayesian Modeling Partial dependence (PD) method (Friedman, 2001): Let x_S ⊂ x denote 1 or 2 covariates of interest for studying predictions, with x = x_S ∪ x_C and x_S ∩ x_C = ∅. PD posterior predictives: Mean E_n[y | x_S] = (1/n) Σ_{i=1}^n E_n[y | x_S, x_Ci]. Variance Var_n[y | x_S] = (1/n) Σ_{i=1}^n Var_n[y | x_S, x_Ci]. Density f_n(y | x_S) = (1/n) Σ_{i=1}^n f_n(y | x_S, x_Ci). c.d.f. F_n(y | x_S) = (1/n) Σ_{i=1}^n F_n(y | x_S, x_Ci). Median F_n⁻¹(.5 | x_S) = (1/n) Σ_{i=1}^n F_n⁻¹(.5 | x_S, x_Ci). Quantile F_n⁻¹(u | x_S) = (1/n) Σ_{i=1}^n F_n⁻¹(u | x_S, x_Ci), u ∈ [0, 1]. Survival 1 − F_n(y | x_S) = (1/n) Σ_{i=1}^n [1 − F_n(y | x_S, x_Ci)]. Etc. One can then study such PD predictions over a range of x_S, via scatterplots, Trellis plots, etc.
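A minimal sketch of the PD posterior-predictive mean in Python: the mean at x_S is averaged over the n observed values of the complementary covariates x_C; `pred_mean` stands in for the model's posterior predictive mean E_n[Y | x] (an assumed user-supplied function, not part of the workshop software):

```python
import numpy as np

def pd_mean(xS_grid, XC, pred_mean):
    # xS_grid: values of the covariate(s) of interest x_S
    # XC: (n, q) array of observed complementary covariates x_C
    # pred_mean(xS, xC): posterior predictive mean at the combined covariate vector
    return np.array([np.mean([pred_mean(xS, xC) for xC in XC]) for xS in xS_grid])
```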

26 Overview of Bayesian Modeling Measures of the predictive fit of a model (m), for data D_n: Response y_i fit: standardized residual z_i = {y_i − E_n(Y | x_i, m)} / √Var_n(Y | x_i, m); y_i is an outlier when |z_i| > 2. Model fit: D(m) criterion: D(m) = Σ_{i=1}^n {y_i − E_n(Y | x_i, m)}² + Σ_{i=1}^n Var_n(Y | x_i, m), a predictive mean-square criterion. The first term measures data fit. The second term is a penalty, which increases with model complexity (Laud & Ibrahim, 1995; Gelfand & Ghosh, 1998). Among a set of M models compared, {m = 1, …, M}, the model with the best predictive fit for the data is identified as the one with the smallest value of D(m).
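A minimal sketch of these fit measures in Python, given arrays of posterior predictive means and variances at each x_i (assumed already computed from the model's MCMC output):

```python
import numpy as np

def fit_measures(y, pred_mean, pred_var):
    z = (y - pred_mean) / np.sqrt(pred_var)              # standardized residuals z_i
    D = np.sum((y - pred_mean) ** 2) + np.sum(pred_var)  # data fit + complexity penalty
    outliers = np.abs(z) > 2                             # flag |z_i| > 2 as outliers
    return z, D, outliers
```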

27 Bayesian Modeling and MCMC Posterior density of model: π_m(ζ_m | D_n) = ∏_{i=1}^n f(y_i | x_i; ζ_m) π_m(ζ_m) / ∫ ∏_{i=1}^n f(y_i | x_i; ζ_m) π_m(ζ_m) dζ_m. In typical applications of Bayesian modeling, π(ζ | D_n) cannot be solved directly, especially when the dimensionality of ζ is high and/or when the prior π(ζ) is not conjugate to the likelihood f(· | x; ζ). To infer the posterior π(ζ | D_n), and all posterior functionals of interest, one can use Markov chain Monte Carlo (MCMC). MCMC involves constructing a discrete-time Harris ergodic Markov chain {ζ^(s)}_{s=1:S} having stationary distribution Π(ζ | D_n). Ergodicity is ensured by a proper prior π(ζ) (Robert & Casella, 2004). As S → ∞, the chain {ζ^(s)}_{s=1:S} converges to samples from the posterior Π(ζ | D_n).

28 Bayesian Modeling and MCMC Posterior density of model: π_m(ζ_m | D_n) ∝ ∏_{i=1}^n f(y_i | x_i; ζ_m) π_m(ζ_m). An MCMC chain {ζ^(s)}_{s=1:S} can be constructed by sampling blocks ζ_b, for b = 1, …, B, of ζ, repeatedly S times (∪_b ζ_b = ζ and ζ_b ∩ ζ_c = ∅ for all b ≠ c). At each stage s, sample ζ_mb^(s) ~ π(ζ_mb | D_n, ζ_{m/b}), for b = 1, …, B, with ζ_{m/b} = (ζ_m1^(s), …, ζ_{m,b−1}^(s), ζ_{m,b+1}^(s−1), …, ζ_mB^(s−1)). Gibbs sampler: for sampling a known π(ζ_b | D_n, ζ_{m/b}). Rejection sampler: for sampling π(ζ_b | D_n, ζ_{m/b}) when we only know π(ζ_mb | D_n, ζ_{m/b}) ∝ ∏_{i=1:n} f(y_i | x_i; ζ_m) π(ζ_mb | ζ_{m/b}); e.g., Metropolis-Hastings, adaptive rejection, or slice sampling.
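A minimal sketch of one random-walk Metropolis update for a block ζ_b whose f.c.p.d. is known only up to proportionality; `log_target` is an assumed user-supplied function returning the log of ∏_i f(y_i | x_i; ζ) times the block's prior, as a function of ζ_b:

```python
import numpy as np

def metropolis_step(zeta_b, log_target, step, rng):
    prop = zeta_b + step * rng.normal(size=zeta_b.shape)  # random-walk proposal
    log_accept = log_target(prop) - log_target(zeta_b)    # log acceptance ratio
    return prop if np.log(rng.uniform()) < log_accept else zeta_b
```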

29 Bayesian Modeling and MCMC Given a sampled MCMC chain, {ζ^(s)}_{s=1:S}, check whether S is large enough for the chain to provide samples from the posterior distribution (density), π(ζ | D_n), by performing an MCMC convergence analysis (2 steps; Geyer 2011): 1. Trace plots: For each univariate parameter ζ of interest, make a trace plot of the MCMC samples {ζ^(s)}_{s=1:S} to verify good mixing, i.e., that the samples appear to adequately explore the posterior π(ζ | D_n). The plot should look stable and hairy. 2. 95% Monte Carlo confidence intervals (MCCIs): For each parameter, and each posterior estimate of interest, use overlapping batch means and/or sub-sampling methods (Flegal & Jones, 2011) to calculate the 95% MCCI and verify that it is small enough for practical purposes. If the trace plots and MCCIs are not both satisfactory, then run additional MCMC samples until they are.
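A minimal sketch of a 95% MCCI for a posterior mean via batch means (a simpler non-overlapping variant of the Flegal & Jones 2011 methods); `chain` is a 1-D array of MCMC draws of one parameter:

```python
import numpy as np

def mcci_batch_means(chain, n_batches=30):
    S = len(chain) - len(chain) % n_batches           # trim to a multiple of n_batches
    batches = chain[:S].reshape(n_batches, -1).mean(axis=1)
    est = chain[:S].mean()
    se = batches.std(ddof=1) / np.sqrt(n_batches)     # Monte Carlo standard error
    return est - 1.96 * se, est + 1.96 * se           # 95% MCCI for the posterior mean
```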

30 BNP Regression Modeling A flexible and rich approach to BNP regression is provided by Bayesian density regression models. Bayesian density regression is a relatively new research area that has already seen many important developments. Nearly all proposed Bayesian density regression models have the general form: f(y | x; ζ) = ∫ f(y | x, θ, ξ) dG_x(θ) = Σ_{j=1}^∞ ω_j(x) f(y | x, θ_j(x), ξ).

31 General BNP Model f(y | x; ζ) = ∫ f(y | x, θ, ξ) dG_x(θ) = Σ_{j=1}^∞ ω_j(x) f(y | x, θ_j(x), ξ). Here {f(· | x, θ, ξ) : (θ, ξ)} is a chosen family of parametric densities (kernel densities); G_x = Σ_{j=1}^∞ ω_j(x) δ_{θ_j(x)} is the mixing distribution; the ω_j(x) are mixing weights that sum to 1 at every x ∈ X; δ_{θ(x)} is the probability measure degenerate at θ(x); {ω_j(x)}_{j=1,2,…} and {θ_j(x)}_{j=1,2,…} are infinite collections of processes indexed by X; and ξ collects other model parameters, not part of the mixture.

32 General BNP Model f(y | x; ζ) = Σ_{j=1}^∞ ω_j(x) f(y | x, θ_j(x), ξ). The Bayesian density regression model is completed by the specification of a prior distribution Π(ζ) on the parameters ζ = (ξ, (ω_j(x), θ_j(x))_{j=1,2,…}), x ∈ X. Current research aims to construct such prior distributions so as to guarantee large supports for f(y | x; ζ). The general model encompasses linear models, normal LMMs/HLMs, GLMs, GLMMs, finite-mixture latent class models, hierarchical mixtures-of-experts regression models, and mixtures of Gaussian process regressions.

33 Examples, General BNP Model f(y | x; ζ) = ∫ f(y | x, θ, ξ) dG_x(θ) = Σ_{j=1}^∞ ω_j(x) f(y | x, θ_j(x), ξ). Most Bayesian density regression models are based on the Dependent Dirichlet Process (DDP) (MacEachern, 1999, 2000, 2001). The DDP prior is denoted by: G_x ~ DDP(α, G_0x). A random distribution G_x from a DDP is constructed by: G_x = Σ_{j=1}^∞ ω_j(x) δ_{θ_j(x)}, with stick-breaking weights ω_j(x) = v_j(x) ∏_{l=1}^{j−1} [1 − v_l(x)], j = 1, 2, …, where v_j ~ Q_j, v_j : X → [0, 1], and atoms θ_j(x) ~ ind G_0x (Sethuraman, 1994).

34 Examples, General BNP Model f(y | x; ζ) = ∫ f(y | x, θ, ξ) dG(θ) = Σ_{j=1}^∞ ω_j f(y | x, θ_j, ξ). The ordinary Dirichlet process (DP; Ferguson, 1973) is not covariate-dependent, and is denoted by: G ~ DP(α, G_0), where the random distribution function G = Σ_{j=1}^∞ ω_j δ_{θ_j} is constructed by covariate-independent stick-breaking weights ω_j = v_j ∏_{l=1}^{j−1} (1 − v_l), with v_j ~ iid Beta(1, α), and atoms θ_j ~ iid G_0, j = 1, 2, …. Under DP(α, G_0), E[G(·)] = G_0(·) and Var[G(·)] = {G_0(·)[1 − G_0(·)]} / (α + 1).
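A minimal sketch of Sethuraman's stick-breaking construction of a DP(α, G_0) draw in Python, truncated at J atoms; G_0 is taken to be N(0, 1) purely for illustration:

```python
import numpy as np

def dp_stick_breaking(alpha, J=500, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=J)                # v_j ~ iid Beta(1, alpha)
    w = v * np.cumprod(np.r_[1.0, 1.0 - v[:-1]])    # w_j = v_j * prod_{l<j} (1 - v_l)
    atoms = rng.normal(size=J)                      # theta_j ~ iid G_0 = N(0, 1)
    return w, atoms                                 # G ~= sum_j w_j * delta_{theta_j}
```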

35 Examples, General BNP Model f(y | x; ζ) = Σ_{j=1}^∞ ω_j f(y | x, θ_j, ξ). The more general nonparametric stick-breaking process, G ~ Stick-Breaking((a_j, b_j)_{j=1,2,…}, G_0), for the stick-breaking mixture weights ω_j = v_j ∏_{l=1}^{j−1} (1 − v_l), lets v_j ~ ind Beta(a_j, b_j), j = 1, 2, … (Ishwaran & James, 2001). Special cases of the stick-breaking process: G ~ DP(α, G_0), when a_j = 1 and b_j = α > 0. G ~ PitmanYor(a, b, G_0), when a_j = 1 − a and b_j = b + ja, with 0 ≤ a < 1 and b > −a.

36 ANOVA/Linear DDP Mixture Model f(y | x; ζ) = ∫ f(y | x, θ, ξ) dG_x(θ) = Σ_{j=1}^∞ ω_j(x) f(y | x, θ_j, ξ). A multi-level model, for groups h = 1, …, N with within-group observations i(h) = 1, …, n_h:
y_{i(h)} | X_h ~ f(y_h | X_h), h = 1, …, N
f(y_h | X_h) = ∏_{i(h)=1}^{n_h} Σ_{j=1}^∞ ω_j n(y_{i(h)} | x_{i(h)}'β_j, σ_j²)
ω_j = v_j ∏_{l=1}^{j−1} (1 − v_l), v_j ~ iid Beta(1, α)
(β_j | μ, T) ~ iid N(μ, T), σ_j² ~ iid IG(a_0/2, a_0/2)
μ ~ N(0, r_0 I_{p+1}), T ~ IW(p + 3, s_0 I_{p+1})
α ~ Ga(a, 1/b)
(De Iorio, Müller, Rosner, & MacEachern, 2004)

37 ANOVA/Linear DDP Mixture Model f(y | x; ζ) = ∫ f(y | x, θ, ξ) dG_x(θ) = Σ_{j=1}^∞ ω_j(x) f(y | x, θ_j(x), ξ). While the ANOVA/linear DDP model, as represented on the previous slide, is a mixture model with mixing distribution G ~ Stick-Breaking((a_j, b_j)_{j=1,2,…}, G_0), the model is identical to a model with mixing distribution G_x ~ ANOVA-DDP((a_j, b_j)_{j=1,2,…}, G_0): G_x = Σ_{j=1}^∞ ω_j δ_{θ_j(x)}, with θ_j(x) = x'β_j, β_j ~ iid G_0 = N(μ, T), and normal kernel n(y | θ, σ²). (De Iorio, Müller, Rosner, & MacEachern, 2004, p. 208)

38 Infinite Probits (IP) Model (v1) f(y | x; ζ) = ∫ f(y | θ, ξ) dG_x(θ) = Σ_j ω_j(x) f(y | θ_j, ξ):
y_i | x_i ~ f(y | x_i), i = 1, …, n
f(y | x) = Σ_j n(y | μ_j, σ_j²) ω_j(x)    (Φ(·) is the N(0,1) c.d.f.)
ω_j(x) = Φ([j − x'β] exp(x'λ)^{−1/2}) − Φ([j − 1 − x'β] exp(x'λ)^{−1/2})
(μ_j, σ_j²) | (μ, σ_μ²) ~ N(μ, σ_μ²) IG(1, ·)
with hyperpriors μ ~ N(0, σ_0²), σ_μ ~ U(0, b), and Ga(a_0/2, a_0/2) and N(0, vI) priors on the remaining weight parameters.
(Karabatsos & Walker, 2012b)
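A minimal sketch of the probit-difference mixture weights as reconstructed above, with mean x'β and scale exp(x'λ)^{1/2} (this parameterization is my reading of the garbled slide, so treat it as an assumption); summed over a wide enough range of j, the weights total 1 at every x:

```python
import numpy as np
from scipy.stats import norm

def ip_weights(x, beta, lam, j_range=np.arange(-50, 51)):
    mu = x @ beta                                   # mean of the weight process at x
    s = np.sqrt(np.exp(x @ lam))                    # scale of the weight process at x
    w = norm.cdf((j_range - mu) / s) - norm.cdf((j_range - 1 - mu) / s)
    return j_range, w                               # w sums to ~1 over a wide j_range
```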

39 [Figure: Mixture weights ω_j(x) vs. index j, and the corresponding densities f(y | x), for several values of σ(x).] σ(x) controls the level of multimodality of f(y | x): x more informative → f(y | x) becomes more unimodal; x less informative → f(y | x) becomes more multimodal.

40 Infinite Probits (IP) Model (v2) f(y | x; ζ) = ∫ f(y | θ, ξ) dG_x(θ) = Σ_j ω_j(x) f(y | θ_j, ξ):
y_i | x_i ~ f(y | x_i), i = 1, …, n
f(y | x) = Σ_j n(y | μ_j, σ_j²) ω_j(x)    (Φ(·) is the N(0,1) c.d.f.)
ω_j(x) = Φ([j − x'β] exp(x'λ)^{−1/2}) − Φ([j − 1 − x'β] exp(x'λ)^{−1/2})
(μ_j, σ_j²) | (μ, σ_μ²) ~ N(μ, σ_μ²) IG(1, ·), with μ ~ N(0, σ_0²), σ_μ ~ U(0, b), and Ga(a_0/2, a_0/2) as in v1.
Covariate selection priors: β | γ ~ N(0, diag(v_1^{γ_k} v_0^{1−γ_k})), γ_k ~ iid Ber(1/2), k = 0, 1, …, p.
(Karabatsos & Walker, 2012b)

41 Infinite Probits (IP) Model (v3) f(y | x; ζ) = Σ_j ω_j(x) f(y | x, θ_j(x), ξ):
y_i | x_i ~ f(y | x_i), i = 1, …, n
f(y | x) = Σ_j n(y | μ_j(x), σ_j²) ω_j(x)    (Φ(·) is the N(0,1) c.d.f.)
μ_j(x) = x'β_j, with weights ω_j(x) as in v1; σ_ω ~ U(0, b)
β_j0 ~ N(0, σ²); β_jk | σ², γ_k ~ N(0, σ² v_1^{γ_k} v_0^{1−γ_k}), γ_k ~ Ber(w), k = 1, …, p
σ² ~ IG(a_0/2, a_0/2); λ ~ N(0, σ² vI); σ² ~ IG(a/2, a/2)
(Karabatsos & Walker, 2012b)

42 MCMC Sampling of BNP models The standard Gibbs samplers of Kalli et al. (2010) can be used to perform MCMC sampling of BNP regression models, combined with standard Gibbs sampling methods for sampling the f.c.p.d.s of parameters of ordinary linear/HLM models that are assigned conjugate priors, and possibly combined with rejection sampling methods (e.g., Metropolis or slice sampling) for sampling the f.c.p.d.s of parameters that are assigned non-conjugate priors. MCMC extensions of the linear DDP model to binary, ordinal, or censored Y are similar to those for the linear model, after replacing N(x_i'β, σ²) with N(x_i'β_{z_i}, σ²) in the f.c.p.d.s for the y*_i's, along with the f.c.p.d. for each latent component label, z_i ∝ n(y*_i | x_i'β_{z_i}, σ²) ω_{z_i}. MCMC extensions for v1 or v2 (or v3) of the infinite probits model would instead use N(μ_{z_i}, σ_{z_i}²), with f.c.p.d. z_i ∝ n(y*_i | μ_{z_i}, σ_{z_i}²) ω_{z_i}(x_i) (respectively, N(μ_{z_i} + x_i'β, σ²) and z_i ∝ n(y*_i | μ_{z_i} + x_i'β, σ²) ω_{z_i}(x_i) for v3). See Karabatsos & Walker (2012a,b).
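A minimal sketch of the latent-label draw common to these mixture samplers: each z_i is drawn from its f.c.p.d., proportional to ω_j(x_i) n(y_i | μ_j, σ_j²). This sketch truncates the mixture at J components for simplicity; the slice sampler of Kalli et al. (2010) avoids the truncation:

```python
import numpy as np
from scipy.stats import norm

def draw_labels(y, W, mu, sigma, rng):
    # y: (n,) responses; W: (n, J) weights w_j(x_i); mu, sigma: (J,) kernel parameters
    p = W * norm.pdf(y[:, None], loc=mu, scale=sigma)  # (n, J) unnormalized f.c.p.d.
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(mu), p=pi) for pi in p])
```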

43 Applying BNP Regression Applying BNP regression models for data analysis, using the Bayesian Regression Software. A. PIRLS data (students nested within schools) 1. Dependent variable: reading achievement. Covariates of the student, teacher, classroom, & school. 2. Binary regression, and ordinal regression. 3. Multilevel BNP regression analysis via the DDP. 4. Causal analysis, regression discontinuity design (effect of class size). B. Larynx data: Censored survival data (survival analysis). C. School Calendar data: Meta-analysis. D. Item response data (NAEP). BNP-IRT analysis. (All data applications above are time-permitting.)

44 Bayesian Regression Software Website for the Bayesian Regression Software: Software requirements: The software can run on a Windows (PC), Linux, or Macintosh computer, which has installed either (1) the (free) MATLAB Compiler Runtime (MCR) R2013b (8.2) software for Windows 64-bit (click here if you need to download and install the free MCR R2013b (8.2) software for Windows on your computer); or (2) MATLAB version R2013b. Click this link to download the Bayesian Regression software package (file BayesRegression_2014b_pkg.exe). (8 MB)

45 Bayesian Regression Software For a Macintosh or Linux computer, the Bayesian Regression software can be run under a free software program that can run executable (.exe) files, such as Wine, Wine Bottler, Darwine, VirtualBox, or Bochs PC Emulator. 45

46 Bayesian Regression Software After downloading the Bayesian Regression software onto your computer desktop, click the file (icon): BayesRegression_2014b_pkg.exe to unpack several files, including the software file BayesRegression_2014b.exe, a readme file, a folder containing several example data files, and the software user's manual. The software user's manual provides step-by-step instructions on how to analyze a given data set, using a model of your choice. Any one of the example data files may be used to illustrate the Bayesian Regression Software. The Bayesian Regression software is opened/run by clicking the icon (file) BayesRegression_2014b.exe (it may take a few moments to load on screen). 46

47 [Screenshot slide.]

48 You may use the Import data menu option. This will import your data set for analysis. The data set must be a comma-delimited (.csv) file (variable names in the first line; all data numeric). After the data is imported, the software will automatically rename the data file to have the *.DAT extension. If you already have a data file with the *.DAT extension, then you may just select the Open Data (*.DAT) file menu option (the first menu option).

49 Imported (or opened) data file appears below. 49

50 You may initially inspect the data by running basic descriptive statistics (summaries, frequencies, and/or correlations of variables) and/or by constructing plots of the data (e.g., histograms, Q-Q plots, scatter plots, box plots, etc.).

51 Univariate descriptive statistics for variables: zread, MALE, AGE, CLSIZE. Sample size, mean, median, s.d., min, max, percentiles, etc. 51

52 Bivariate correlations (Pearson r statistics) for the variables: zread, MALE, AGE, CLSIZE.

53 [Screenshot slide.]

54 Also, with any of these menu options, you may reformat the data: e.g., perform simple variable transformations, such as constructing z-scores; create new variables by dummy coding or creating interaction variables; impute missing values of variables; etc.

55 We will construct z-scores of 8 variables, which will be used as covariates for a BNP regression analysis that we will perform shortly. 55

56 After clicking through the Simple variable transformations > Z-score menu options, the new variables, the z-scores of the 8 variables, are instantly created.

57 Select a BNP regression model for data analysis. We select the infinite probits regression model (v1) (one of 43 possible model choices in the software).

58 Select the Dependent Variable for the model: zread, the student literacy score (a z-standardized version of READSC).

59 Select Covariates for the model: We select the z-standardized covariates: Z:MALE, Z:AGE, Z:CLSIZE, Z:ELL, Z:TEXP4, Z:EDLEVEL, Z:ENROL, Z:SCHSAFE.

60 Enter the Parameters of the Prior Distribution for the model: We select the default choices. 60

61 After menu selections, the Infinite Probits regression model appears (middle screen); along with a list of the choices of prior parameters (upper left), and a list of the choices of the dependent variables and covariates (upper right of screen). 61

62 When necessary (optional): (1) If you have censored data for your dependent variable, then you may select variables that indicate the nature of the censoring (e.g., for a survival analysis). (2) If your data observations are weighted, then you may select the variable for the observation weights (e.g., for a meta-analysis). See the software user's manual for more details.

63 Click the Run Posterior Analysis button. This will generate 50K MCMC samples from the model's posterior distribution. Before clicking the button, we entered 50000 in the MC samples box.

64 The analysis is in progress: A wait bar appears, indicating the current number of MCMC sampling iterations run (out of 50K), the time elapsed, the time left to 50K MCMC iterations, and the current value of the model fit (D(m) statistic).

65 After end of analysis run, this text output file is generated and opened automatically. Text display of BNP regression model. 65

66 Text display of chosen prior parameters of BNP regression model. 66

67 Text display of chosen dependent variable and covariates for BNP regression model. 67

68 Reports of fit of the BNP regression model: D(m) = 44.5; R-squared = .97; no outliers, as all standardized residuals were less than 1 in absolute value.

69 Posterior estimates of the parameters of the BNP regression model. 69

70 We need to evaluate the accuracy of the posterior estimates (reported in the text output), through an MCMC convergence analysis of the 50K MCMC samples that we generated. We can do so by clicking the Posterior Summaries button. Recall that in analyzing MCMC convergence, we need to generate, for all model parameters of interest: (1) trace plots, to evaluate MCMC mixing; and (2) 95% Monte Carlo confidence intervals, for posterior estimates of interest.

71 After clicking Posterior Summaries button, select menu option Trace Plots of MC samples. 71

72 Select the parameters that you want to view in the trace plots. 72

73 [Screenshot: trace plots of the selected parameters.]

74 Now we evaluate the 95% Monte Carlo confidence intervals (MCCIs). After clicking Posterior Summaries button, select menu option: Table: Posterior Summaries, with 95% MCCIs. 74

75 For the model parameters, the 95% MCCI half-widths are near .00 to 2 decimal places, for nearly all parameter estimates. Given this result and the trace plots, we conclude convergence. If convergence cannot be confirmed by either the trace plots or the 95% MCCIs, then we may generate additional MCMC samples, beyond the 50K MCMC samples already generated, in order to attain better MCMC convergence. Additional MCMC samples can be generated by clicking the Run Posterior Analysis button of the software again.

76 Under the model, we can investigate the posterior predictions of Y (zread), as a function of one or more covariates of interest. We can do so by clicking the Posterior Predictive button. 76

77 We will choose to investigate how the mean, variance, p.d.f., survival function, and various quantiles, namely the 10th, 25th, 50th (median), 75th, and 90th percentiles, of Y (zread) vary as a function of one covariate, school enrollment (Z:ENROL).

78 Now select the covariate, school enrollment (Z:ENROL). 78

79 We now choose to investigate how posterior predictions of Y (zread) vary as a function of the covariate Z:ENROL, for values of this covariate ranging from −1.7 to 1.7, with the values in this range separated by .1. Therefore, in the text box above, we enter −1.7 : .1 : 1.7.

80 When investigating how posterior predictions of Y (zread) vary as a function of the covariate Z:ENROL, we choose to make posterior predictions using the partial dependence method (Friedman, 2001).

81 Plot of the posterior predictive of Y (zread), as a function of zenrol, in terms of the mean and the 10th, 25th, 50th, 75th, and 90th percentiles of Y.

82 Plot of posterior predictive variance of Y (zread), as a function of zenrol. 82

83 Plot of posterior predictive p.d.f. of Y (zread), as a function of zenrol. 83

84 Plot of posterior predictive survival function of Y (zread), as a function of zenrol. 84

85 Output table of specific numerical estimates of the posterior predictives of Y (zread), as a function of zenrol. Comma-delimited output files of these results are also generated. Also, comma-delimited output files of all posterior predictive results are generated, in case the data analyst wants to further analyze or plot these results on her/his own.

86 Run another posterior predictive analysis by clicking the Posterior Predictive button of the software again. Here, a plot of the posterior predictive p.d.f. of Y (zread), by gender group, using the partial dependence method. zmale = −1.02 is female, and zmale = .98 is male.

87 Plot of the posterior predictive of Y (zread), for male vs. female, in terms of the mean and the 10th, 25th, 50th, 75th, and 90th percentiles of Y. [Panels labeled Female and Male.]

88 Conclusions Linear models, while often used in social science research, do not adequately describe many data sets, which usually exhibit complex relationships between the Y and x variables. In such cases, linear models can provide misleading inferences. Also, ordinary linear models can only convey how the mean of Y varies with x. We presented some BNP regression models that: (1) are flexible enough to adequately capture any complex relationships between the Y and x variables; and (2) provide a richer approach to data analysis, in terms of how the mean, variance, quantiles, p.d.f., survival function, etc., of Y change with x. We illustrated these models on several real data sets, using my free software: Bayesian Regression: Nonparametric and Parametric Models (Karabatsos, 2014a, 2014b).

89 References
Albert, J., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88,
Denison, D.G.T., Holmes, C.C., Mallick, B.K., & Smith, A.F.M. (2002). Bayesian Methods for Nonlinear Classification and Regression. John Wiley & Sons.
De Iorio, M., Müller, P., Rosner, G.L., & MacEachern, S.N. (2004). An ANOVA model for dependent random measures. Journal of the American Statistical Association, 99,
Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29,
Gelfand, A., & Ghosh, J. (1998). Model choice: A minimum posterior predictive loss approach. Biometrika, 85,
George, E.I., & McCulloch, R.E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7,
Holmes, C., & Held, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1,
Ishwaran, H., & James, L.F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96,
Jeffreys, H. (1961). The Theory of Probability (3rd ed.). Oxford. (p. 432)
Kalli, M., Griffin, J., & Walker, S.G. (2010). Slice sampling mixture models. Statistics and Computing, 21,
Karabatsos, G., & Walker, S. (2012a). Bayesian nonparametric mixed random utility models. Computational Statistics and Data Analysis, 56,

90 References (continued)
Karabatsos, G., & Walker, S. (2012b). Adaptive-modal Bayesian nonparametric regression. Electronic Journal of Statistics, 6,
Karabatsos, G. (2014a). Bayesian Regression: Nonparametric and Parametric Models, Version 2014b [Software]. University of Illinois-Chicago. Available from
Karabatsos, G. (2014b). Bayesian Regression: Nonparametric and Parametric Models, Version 2014b. Software User's Manual. University of Illinois-Chicago. Available from
Kass, R.E., & Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90,
Laud, P., & Ibrahim, J. (1995). Predictive model selection. Journal of the Royal Statistical Society, Series B, 57,
MacEachern, S.N. (1999). Dependent nonparametric processes. Proceedings of the Bayesian Statistical Sciences Section of the American Statistical Association.
MacEachern, S.N. (2000). Dependent Dirichlet Processes. Technical Report, Department of Statistics, The Ohio State University.
MacEachern, S.N. (2001). Decision theoretic aspects of dependent nonparametric processes. In Bayesian Methods with Applications to Science, Policy and Official Statistics (E. George, ed.). International Society for Bayesian Analysis, Crete.
Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4,


More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

A note on Reversible Jump Markov Chain Monte Carlo

A note on Reversible Jump Markov Chain Monte Carlo A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction

More information

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo (and Bayesian Mixture Models) David M. Blei Columbia University October 14, 2014 We have discussed probabilistic modeling, and have seen how the posterior distribution is the critical

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

A Nonparametric Model-based Approach to Inference for Quantile Regression

A Nonparametric Model-based Approach to Inference for Quantile Regression A Nonparametric Model-based Approach to Inference for Quantile Regression Matthew Taddy and Athanasios Kottas Department of Applied Mathematics and Statistics, University of California, Santa Cruz, CA

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Bayesian Semiparametric GARCH Models

Bayesian Semiparametric GARCH Models Bayesian Semiparametric GARCH Models Xibin (Bill) Zhang and Maxwell L. King Department of Econometrics and Business Statistics Faculty of Business and Economics xibin.zhang@monash.edu Quantitative Methods

More information

Default priors for density estimation with mixture models

Default priors for density estimation with mixture models Bayesian Analysis ) 5, Number, pp. 45 64 Default priors for density estimation with mixture models J.E. Griffin Abstract. The infinite mixture of normals model has become a popular method for density estimation

More information

Bayesian Semiparametric GARCH Models

Bayesian Semiparametric GARCH Models Bayesian Semiparametric GARCH Models Xibin (Bill) Zhang and Maxwell L. King Department of Econometrics and Business Statistics Faculty of Business and Economics xibin.zhang@monash.edu Quantitative Methods

More information

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i,

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i, Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives Often interest may focus on comparing a null hypothesis of no difference between groups to an ordered restricted alternative. For

More information

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017 Bayesian Statistics Debdeep Pati Florida State University April 3, 2017 Finite mixture model The finite mixture of normals can be equivalently expressed as y i N(µ Si ; τ 1 S i ), S i k π h δ h h=1 δ h

More information

Nonparametric Bayesian Modeling for Multivariate Ordinal. Data

Nonparametric Bayesian Modeling for Multivariate Ordinal. Data Nonparametric Bayesian Modeling for Multivariate Ordinal Data Athanasios Kottas, Peter Müller and Fernando Quintana Abstract We propose a probability model for k-dimensional ordinal outcomes, i.e., we

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Modeling conditional distributions with mixture models: Theory and Inference

Modeling conditional distributions with mixture models: Theory and Inference Modeling conditional distributions with mixture models: Theory and Inference John Geweke University of Iowa, USA Journal of Applied Econometrics Invited Lecture Università di Venezia Italia June 2, 2005

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION Pak. J. Statist. 2017 Vol. 33(1), 37-61 INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION A. M. Abd AL-Fattah, A.A. EL-Helbawy G.R. AL-Dayian Statistics Department, Faculty of Commerce, AL-Azhar

More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown

More information