Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models


Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Dimitris Fouskakis, Department of Mathematics, School of Applied Mathematical and Physical Sciences, National Technical University of Athens, Athens, Greece.

Joint work with: Ioannis Ntzoufras (Department of Statistics, Athens University of Economics and Business, Athens, Greece) and David Draper (Department of Applied Mathematics and Statistics, University of California, Santa Cruz, USA).

Presentation is available at: fouskakis/presentations/irvine/presentation Irvine.pdf.

Synopsis

1. Bayesian Variable Selection
2. Prior Specification
3. Expected-Posterior Priors
4. Motivation
5. Power-Expected-Posterior (PEP) Methodology
6. MCMC for Sampling from the Posterior
7. Marginal Likelihood Computation
8. Simulated Example
9. Real-Life Example

1 Bayesian Variable Selection

Within the Bayesian framework, identifying the best set of predictors among the M = {0, 1}^p competitors is equivalent (assuming a zero-one loss function) to finding the model m with the highest posterior model probability, defined as

f(m | y) = f(y | m) f(m) / Σ_{m_l ∈ M} f(y | m_l) f(m_l),

where f(y | m) is the marginal likelihood under model m and f(m) is the prior probability of model m.

The marginal likelihood in the above calculation can be further expanded to include the effect of the model parameters:

f(y | m) = ∫ f(y | θ_m, m) f(θ_m | m) dθ_m,

where f(y | θ_m, m) is the likelihood under model m with parameters θ_m and f(θ_m | m) is the prior distribution of the model parameters given model m.
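For concreteness, here is a tiny numerical sketch (ours, not part of the original slides) of how posterior model probabilities follow from log marginal likelihoods and prior model probabilities; the numerical values are hypothetical.

```python
# Posterior model probabilities from (log) marginal likelihoods and prior probabilities.
import numpy as np
from scipy.special import logsumexp

def posterior_model_probs(log_marglik, log_prior):
    """log_marglik, log_prior: arrays of log f(y|m) and log f(m) over the model space."""
    log_post = log_marglik + log_prior
    return np.exp(log_post - logsumexp(log_post))   # normalize on the log scale

# Hypothetical values for three competing models under a uniform model prior.
print(posterior_model_probs(np.array([-120.3, -118.9, -125.0]), np.log(np.ones(3) / 3)))
```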

Posterior odds and Bayes factors

Based on the above posterior model probabilities, the pairwise comparison of any two models m_k and m_l is given by the posterior odds (PO)

PO_{m_k, m_l} ≡ f(m_k | y) / f(m_l | y) = [ f(y | m_k) / f(y | m_l) ] × [ f(m_k) / f(m_l) ] = B_{m_k, m_l} × O_{m_k, m_l},

which is a function of the Bayes factor B_{m_k, m_l} and the prior odds O_{m_k, m_l}. The posterior model probability can then be expressed entirely in terms of Bayes factors and prior odds as

f(m | y) = ( Σ_{m_l ∈ M} PO_{m_l, m} )^{-1} = [ Σ_{m_l ∈ M} B_{m_l, m} O_{m_l, m} ]^{-1}.

2 Prior Specification

Prior on the model space:

- Uniform prior on the model space: f(m) ∝ I[m ∈ M]. This prior is equivalent to assuming that each covariate has prior probability 0.5 of entering the model.
- Beta-binomial hierarchical prior on the model size: W | θ ~ Bin(p, θ) and θ ~ Beta(α, β). If α = β = 1 we obtain a uniform prior on the model size.
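A small sketch (ours) contrasting the two model-space priors above: the prior probability of one specific model with k out of p covariates under the uniform prior and under the beta-binomial prior on the model size (α = β = 1 gives a uniform prior on the size).

```python
import numpy as np
from scipy.special import betaln

def log_prior_uniform(k, p):
    return -p * np.log(2.0)                       # every one of the 2^p models gets 2^{-p}

def log_prior_betabinomial(k, p, alpha=1.0, beta=1.0):
    # integrate theta^k (1-theta)^(p-k) over theta ~ Beta(alpha, beta)
    return betaln(alpha + k, beta + p - k) - betaln(alpha, beta)

p = 15
for k in (0, 5, 10, 15):
    print(k, log_prior_uniform(k, p), log_prior_betabinomial(k, p))
```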

Prior on model parameters

- Proper prior distributions (conjugate if available). For example, in the case of Gaussian regression models a popular choice is Zellner's g-prior (Zellner, 1986). Main issue: specification of the hyperparameter g that controls the prior variance.
  - Large values of g lead to Bartlett's paradox (e.g., Bartlett, 1957, Biometrika).
  - For g = n we obtain the unit-information prior (Kass & Wasserman, 1995, JASA).
  - Beta prior on g/(g+1): mixtures of g-priors (e.g., Liang et al., 2008, JASA).
- Non-local priors (e.g., Johnson and Rossell, 2010, JRSS B). They have zero mass for values of the parameter under the null hypothesis. Products of independent normal moment priors.
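A quick numerical illustration (ours) of Bartlett's paradox: under the standard closed form for the g-prior Bayes factor against the null model (as in Liang et al., 2008, assuming an intercept with a Jeffreys prior on the intercept and error variance), the evidence for a model with genuine signal collapses towards the null as g grows; the simulated data are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.standard_normal((n, k))
y = 1.0 + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

Xc, yc = X - X.mean(axis=0), y - y.mean()          # center out the intercept
beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]
r2 = 1.0 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)

for g in (n, 1e4, 1e8, 1e12):                      # g = n is the unit-information choice
    log_bf = 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))
    print(f"g = {g:.0e}:  log BF(model vs null) = {log_bf:.2f}")   # decreases as g grows
```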

Prior on model parameters (cont.)

- Shrinkage priors, e.g. the Bayesian lasso (Park and Casella, 2008, JASA), the horseshoe prior (Carvalho et al., 2010, Biometrika), etc.
- Improper (reference) priors (defined up to arbitrary constants). Objectivity; Jeffreys prior. Bayes factors cannot be determined.
- Priors defined via imaginary data:
  - Power prior (Ibrahim & Chen, 2000, Statistical Science).
  - Expected-posterior prior (Pérez & Berger, 2002, Biometrika).

3 Expected-Posterior Priors

Expected-posterior priors (EPPs) are defined as the posterior distribution of the parameter vector of the model under consideration, averaged over all possible imaginary samples y* coming from a suitable predictive distribution m*(y*). Hence the EPP for the parameters of any model m_l ∈ M, with M denoting the model space, is

π_l^E(θ_l) = ∫ f(θ_l | y*, m_l) m*(y*) dy*
           = ∫ π_l^N(θ_l | y*) m*(y*) dy*
           = ∫ [ f(y* | θ_l, m_l) π_l^N(θ_l) / m_l^N(y*) ] m*(y*) dy*,   (1)

where m_l^N(y*) = ∫ f(y* | θ_l, m_l) π_l^N(θ_l) dθ_l.
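A toy simulation sketch (ours, with an illustrative conjugate setup not taken from the slides) of the EPP construction: draw an imaginary sample y* from the reference model's predictive and then draw the parameter from its posterior given y*. Here y | θ ~ N(θ, 1) with a flat baseline prior on θ, and the reference model is N(0, 1); the resulting mixture is the EPP for θ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_star = 5                                          # size of the imaginary sample
draws = []
for _ in range(10_000):
    y_star = rng.normal(0.0, 1.0, size=n_star)      # y* ~ m*(y*) (reference model)
    theta = rng.normal(y_star.mean(), 1 / np.sqrt(n_star))  # theta | y*, flat baseline prior
    draws.append(theta)
print(np.var(draws))                                # close to 2/n_star = 0.4 in this toy case
```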

Expected-posterior priors (cont.)

The baseline prior π_l^N(θ_l) can be proper or improper. If improper, Bayes factors can still be defined.

Different choices of m* include:

- Base-model approach. Select a reference model m_0 for the training sample and define m*(y*) = m_0^N(y*) ≡ f(y* | m_0) to be the prior predictive distribution, evaluated at y*, of the reference model m_0 under the baseline prior π_0^N(θ_0). The reference model should be at least as simple as the other competing models; therefore a reasonable choice is to take m_0 to be the constant model (with no predictors).
- Use of the empirical distribution of the data for m*.
- Subjective specification of m*.

Under the base-model approach the resulting expected-posterior Bayes factors become essentially identical to the arithmetic intrinsic Bayes factors (IBF) of Berger and Pericchi (1996, JASA).

Imaginary data

We need to specify the imaginary responses y* (of size n*) and the imaginary design matrix X* (of size n* × d). The EPP then depends on X* but not on y*, since the latter is integrated out.

How large should such training samples be, i.e. how large should n* be? Minimal training sample: with p covariates and unknown error variance, we set n* = (p + 1) + 1 = d + 1. Then, to specify X*, we randomly select a sub-sample of the rows of the original matrix X (n* should be < n). When the sample size is small in comparison to the number of covariates, working with a minimal training sample can result in an influential prior.

How should the rows of X* be chosen?

- With (n, p) = (100, 50) there are about 10^29 possible choices.
- One option is the arithmetic mean of the Bayes factors over all possible X*.
- Alternatively, take a random sample from the set of all possible training samples, but this adds an extraneous layer of Monte-Carlo noise to the model-comparison process.
- If the data derive from a highly structured situation, such as a complete randomized-blocks experiment, any choice of a small part of the data to act as a training sample would be somewhat untypical.
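As a quick check of the combinatorial count mentioned above (a one-liner, ours):

```python
from math import comb

n, p = 100, 50
n_star = (p + 1) + 1          # minimal training-sample size d + 1 = p + 2
print(comb(n, n_star))        # about 9.3e28 possible training samples, i.e. roughly 10**29
```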

4 Motivation

1. Produce a less influential expected-posterior prior.
2. Diminish the effect of training samples on the expected-posterior prior methodology.

We combine ideas from the power-prior approach and the unit-information prior approach to obtain the power-expected-posterior (PEP) prior: we raise the likelihood involved in the expected-posterior prior distribution to the power 1/δ. For δ = n* this produces a prior information content equivalent to one data point.

The method is sufficiently insensitive to the size of n* that one may take n* = n and dispense with training samples altogether; this both removes the instability arising from the random choice of training samples and greatly reduces computing time.

Specification

We focus on variable-selection problems for Gaussian linear models. We consider two models m_l (l = 0, 1) with parameters θ_l = (β_l, σ_l²) and likelihood specified by

Y | X_l, β_l, σ_l², m_l ~ N_n(X_l β_l, σ_l² I_n),

where Y = (Y_1, ..., Y_n) is a vector containing the responses for all subjects, X_l is an n × d_l design matrix containing the values of the explanatory variables in its columns, I_n is the n × n identity matrix, β_l is a vector of length d_l summarizing the effects of the covariates on the response Y, and σ_l² is the error variance.

Two different baseline prior choices:

- Independence Jeffreys prior, i.e. π_l^N(β_l, σ_l² | m_l) ∝ σ_l^{-2}.
- Zellner's g-prior.

Base-model approach; the null model is the reference model.

5 Power-Expected-Posterior (PEP) Methodology

The PEP prior is

π_l^{PE}(β_l, σ_l² | X_l*, δ) = ∫ [ π_l^N(β_l, σ_l² | X_l*) f_{N_{n*}}(y*; X_l* β_l, δ σ_l² I_{n*}) / m_l^N(y* | X_l*, δ) ] m_0^N(y* | X_0*, δ) dy*.

The posterior under the PEP prior becomes

π_l^{PE}(β_l, σ_l² | y; X_l, X_l*, δ) ∝ ∫ f(β_l, σ_l² | y, y*, m_l; X_l, X_l*, δ) m_l^N(y | y*; X_l, X_l*, δ) m_0^N(y* | X_0*, δ) dy*,

where

- f(β_l, σ_l² | y, y*, m_l; X_l, X_l*, δ) is the posterior using data y and design matrix X_l under the prior f(β_l, σ_l² | y*, m_l; X_l*, δ) (i.e. the posterior under the powered normal likelihood and the baseline prior);
- m_l^N(y | y*; X_l, X_l*, δ) is the marginal likelihood of model m_l, using data y and design matrix X_l, under the prior f(β_l, σ_l² | y*, m_l; X_l*, δ).

PEP under the Jeffreys baseline

Baseline prior:

π_l^N(β_l, σ_l² | X_l*) = c_l / σ_l²,   (2)

where c_l is an unknown normalizing constant.

J-PEP prior:

π_l^{PE}(β_l, σ_l² | X_l*, δ) = ∫ f_{N_{d_l}}(β_l; β̂_l*, δ (X_l*^T X_l*)^{-1} σ_l²) f_{IG}(σ_l²; (n* − d_l)/2, RSS_l*/(2δ)) m_0^N(y* | X_0*, δ) dy*,

with β̂_l* = (X_l*^T X_l*)^{-1} X_l*^T y*, RSS_l* = y*^T ( I_{n*} − X_l* (X_l*^T X_l*)^{-1} X_l*^T ) y*, and

m_l^N(y* | X_l*, δ) = c_l π^{-(n* − d_l)/2} |X_l*^T X_l*|^{-1/2} Γ((n* − d_l)/2) (RSS_l*)^{-(n* − d_l)/2}.

PEP under the Jeffreys baseline (cont.)

The posterior under the J-PEP prior:

π_l^{PE}(β_l, σ_l² | y; X_l, X_l*, δ) ∝ ∫ f_{N_{d_l}}(β_l; β̃_N, Σ̃_N σ_l²) f_{IG}(σ_l²; ã_l^N, b̃_l^N) m_l^N(y | y*; X_l, X_l*, δ) m_0^N(y* | X_0*, δ) dy*,

with

β̃_N = Σ̃_N (X_l^T y + δ^{-1} X_l*^T y*),  Σ̃_N = [X_l^T X_l + δ^{-1} X_l*^T X_l*]^{-1},

ã_l^N = (n + n* − d_l)/2,  b̃_l^N = (SS_l^N + δ^{-1} RSS_l*)/2,  where

SS_l^N = (y − X_l β̂_l*)^T [ I_n + δ X_l (X_l*^T X_l*)^{-1} X_l^T ]^{-1} (y − X_l β̂_l*),

while

m_l^N(y | y*; X_l, X_l*, δ) = f_{St_n}( y; n* − d_l, X_l β̂_l*, (RSS_l* / (δ (n* − d_l))) [ I_n + δ X_l (X_l*^T X_l*)^{-1} X_l^T ] ).

PEP under the Zellner's g-prior baseline

Baseline prior:

π_l^N(β_l | σ_l²; X_l*) = f_{N_{d_l}}(β_l; 0, g (X_l*^T X_l*)^{-1} σ_l²)  and  π_l^N(σ_l²) = f_{IG}(σ_l²; a_l, b_l).   (3)

Z-PEP prior:

π_l^{PE}(β_l, σ_l² | X_l*, δ) = ∫ f_{N_{d_l}}(β_l; w β̂_l*, w δ (X_l*^T X_l*)^{-1} σ_l²) f_{IG}(σ_l²; a_l + n*/2, b_l + SS_l*/2) m_0^N(y* | X_0*, δ) dy*.

Here w = g/(g + δ), β̂_l* = (X_l*^T X_l*)^{-1} X_l*^T y*, SS_l* = y*^T Λ_l* y*, and

m_l^N(y* | X_l*, δ) = f_{St_{n*}}( y*; 2 a_l, 0, (b_l / a_l) Λ_l*^{-1} ),

with

Λ_l* = δ^{-1} [ I_{n*} − (g/(g + δ)) X_l* (X_l*^T X_l*)^{-1} X_l*^T ],  so that  Λ_l*^{-1} = δ I_{n*} + g X_l* (X_l*^T X_l*)^{-1} X_l*^T.

PEP under the Zellner's g-prior baseline (cont.)

Theorem 1. Under the baseline prior setup (3), the power-expected-posterior prior mean of β_l is equal to zero (i.e. E[β_l] = 0), while the power-expected-posterior prior variance is given by

V[β_l] = { [ δ w / (a_l − 1 + n*/2) ] [ b_l + (b_0 / (2 (a_0 − 1))) tr(Λ_l* Λ_0*^{-1}) ] I_{d_l} + w² (b_0 / (a_0 − 1)) (X_l*^T X_l*)^{-1} X_l*^T Λ_0*^{-1} X_l* } (X_l*^T X_l*)^{-1},

where tr(A) is the trace of matrix A.

Theorem 2. Under the baseline prior setup (3), the power-expected-posterior prior mean of σ_l² is given by

E[σ_l²] = (b_0 / (a_0 − 1)) [ tr(Λ_l* Λ_0*^{-1})/2 + (a_0 − 1) b_l / b_0 ] / (n*/2 + a_l − 1),

while the power-expected-posterior prior variance is given by

V[σ_l²] = E[σ_l⁴] − (E[σ_l²])²
 = [ b_l² + b_l b_0 tr(Λ_l* Λ_0*^{-1}) / (a_0 − 1) + b_0² { 2 tr(Λ_l* Λ_0*^{-1} Λ_l* Λ_0*^{-1}) + tr(Λ_l* Λ_0*^{-1})² } / (4 (a_0 − 1)(a_0 − 2)) ] / [ (n*/2 + a_l − 1)(n*/2 + a_l − 2) ]
 − [ (b_0 / (a_0 − 1)) ( tr(Λ_l* Λ_0*^{-1})/2 + (a_0 − 1) b_l / b_0 ) / (n*/2 + a_l − 1) ]².

PEP under the Zellner's g-prior baseline (cont.)

The posterior under the Z-PEP prior:

π_l^{PE}(β_l, σ_l² | y; X_l, X_l*, δ) ∝ ∫ f_{N_{d_l}}(β_l; β̃_N, Σ̃_N σ_l²) f_{IG}(σ_l²; ã_l^N, b̃_l^N) m_l^N(y | y*; X_l, X_l*, δ) m_0^N(y* | X_0*, δ) dy*,

with

β̃_N = Σ̃_N (X_l^T y + δ^{-1} X_l*^T y*),  Σ̃_N = [X_l^T X_l + (w δ)^{-1} X_l*^T X_l*]^{-1},

ã_l^N = (n + n*)/2 + a_l,  b̃_l^N = (SS_l^N + SS_l*)/2 + b_l,  where

SS_l^N = (y − w X_l β̂_l*)^T [ I_n + δ w X_l (X_l*^T X_l*)^{-1} X_l^T ]^{-1} (y − w X_l β̂_l*),

while

m_l^N(y | y*; X_l, X_l*, δ) = f_{St_n}( y; 2 a_l + n*, w X_l β̂_l*, ((2 b_l + SS_l*)/(2 a_l + n*)) [ I_n + w δ X_l (X_l*^T X_l*)^{-1} X_l^T ] ).

Hyper-parameters

The parameter g in the normal baseline prior is set to δ n*, so that with δ = n* we use g = (n*)². This choice makes the g-prior contribute information equal to one data point within the posterior f(β_l, σ_l² | y*, m_l; X_l*, δ). In this manner, the entire PEP prior accounts for information equal to (1 + 1/δ) data points.

We set the parameters a_l and b_l in the inverse-gamma baseline prior to 0.01, yielding a baseline prior mean of 1 and variance of 100 (i.e., a large amount of prior uncertainty) for the precision parameter. (If strong prior information about the model parameters is available, Theorems 1 and 2 can be used to guide the choice of a_l and b_l.)

By comparing the posterior distributions under the two different baseline schemes described above, it is straightforward to show that they coincide for large g (and therefore for w → 1), a_l = −d_l/2 and b_l = 0.

6 MCMC for Sampling from the Posterior

We consider the following augmented conditional distribution:

f(β_l, σ_l², y* | y; X_l, X_l*, δ) ∝ f_{N_{d_l}}(β_l; β̃_N, Σ̃_N σ_l²) f_{IG}(σ_l²; ã_l^N, b̃_l^N) f(y* | y; X_l, X_l*, δ),

with

f(y* | y; X_l, X_l*, δ) ∝ m_l^N(y* | y; X_l, X_l*, δ) m_0^N(y* | X_0*, δ) / m_l^N(y* | X_l*, δ),

where

m_l^N(y* | y; X_l, X_l*, δ) = ∫ f(y* | β_l, σ_l², m_l; X_l*, δ) f(β_l, σ_l² | y, m_l; X_l) dβ_l dσ_l².   (4)

For the Zellner's baseline prior, (4) becomes equal to

f_{St_{n*}}( y*; 2 a_l + n, (g/(g+1)) X_l* β̂_l, ((2 b_l + SS_l)/(2 a_l + n)) [ δ I_{n*} + (g/(g+1)) X_l* (X_l^T X_l)^{-1} X_l*^T ] ),

where SS_l = y^T [ I_n − (g/(g+1)) X_l (X_l^T X_l)^{-1} X_l^T ] y, while, for the Jeffreys baseline prior, (4) becomes

f_{St_{n*}}( y*; n − d_l, X_l* β̂_l, (SS_l/(n − d_l)) [ δ I_{n*} + X_l* (X_l^T X_l)^{-1} X_l*^T ] ),

with the posterior sum of squares now given by SS_l = y^T [ I_n − X_l (X_l^T X_l)^{-1} X_l^T ] y and β̂_l = (X_l^T X_l)^{-1} X_l^T y.

MCMC for Sampling from the Posterior (cont.)

Using the previous expressions, we can specify the following MCMC scheme:

(1) Generate y* from f(y* | y; X_l, X_l*, δ).
(2) Generate σ_l² from IG(ã_l^N, b̃_l^N).
(3) Generate β_l from N_{d_l}(β̃_N, Σ̃_N σ_l²).

In step (1), we can generate the imaginary data y* using a Metropolis-Hastings step with proposal q(y*) = m_l^N(y* | y; X_l, X_l*, δ), given in (4), and acceptance probability

α = min[ 1, ( m_0^N(y*ʹ | X_0*, δ) / m_l^N(y*ʹ | X_l*, δ) ) × ( m_l^N(y* | X_l*, δ) / m_0^N(y* | X_0*, δ) ) ],

where y*ʹ denotes the proposed and y* the current value of the imaginary data.
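The following is a minimal sketch (ours, not the authors' code) of the three-step sampler above for the Z-PEP prior, assuming scipy ≥ 1.6 for scipy.stats.multivariate_t. Function names such as pep_gibbs and student_pred are our own; the imaginary design X* is taken equal to X (n* = n), as suggested earlier, and g, δ would be supplied by the caller (e.g. δ = n*, g = δ n*).

```python
import numpy as np
from scipy.stats import multivariate_t, invgamma

def student_pred(y_obs, X_obs, X_new, a, b, g, delta):
    """St predictive of imaginary data y* given observed y under the g-prior baseline
    (the Zellner case of equation (4)): returns (location, scale matrix, df)."""
    n, _ = X_obs.shape
    P = X_new @ np.linalg.solve(X_obs.T @ X_obs, X_new.T)    # X*(X'X)^{-1}X*'
    bhat = np.linalg.solve(X_obs.T @ X_obs, X_obs.T @ y_obs)
    w1 = g / (g + 1.0)
    SS = y_obs @ y_obs - w1 * y_obs @ X_obs @ bhat           # y'[I - w1 P_l]y
    df = 2 * a + n
    return w1 * X_new @ bhat, (2 * b + SS) / df * (delta * np.eye(X_new.shape[0]) + w1 * P), df

def pep_gibbs(y, X, X0, g, delta, a=0.01, b=0.01, iters=2000, rng=None):
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    Xs, X0s = X, X0                                 # imaginary designs (here n* = n)
    ns = Xs.shape[0]
    # Proposal q(y*) = m_l^N(y* | y) and the two prior-predictive St densities.
    loc_q, S_q, df_q = student_pred(y, X, Xs, a, b, g, delta)
    q = multivariate_t(loc=loc_q, shape=S_q, df=df_q)
    Lam_l_inv = delta * np.eye(ns) + g * Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)
    Lam_0_inv = delta * np.eye(ns) + g * X0s @ np.linalg.solve(X0s.T @ X0s, X0s.T)
    m_l = multivariate_t(loc=np.zeros(ns), shape=b / a * Lam_l_inv, df=2 * a)
    m_0 = multivariate_t(loc=np.zeros(ns), shape=b / a * Lam_0_inv, df=2 * a)
    w = g / (g + delta)
    Sigma = np.linalg.inv(X.T @ X + Xs.T @ Xs / (w * delta))  # Sigma_N (constant here)
    ystar = q.rvs(random_state=rng)
    draws = []
    for _ in range(iters):
        # Step 1: Metropolis-Hastings update of the imaginary data y*.
        prop = q.rvs(random_state=rng)
        log_alpha = (m_0.logpdf(prop) - m_l.logpdf(prop)) - (m_0.logpdf(ystar) - m_l.logpdf(ystar))
        if np.log(rng.uniform()) < log_alpha:
            ystar = prop
        # Steps 2-3: conditional (sigma^2, beta) draws given y* (Z-PEP formulas).
        beta_t = Sigma @ (X.T @ y + Xs.T @ ystar / delta)
        bhat_s = np.linalg.solve(Xs.T @ Xs, Xs.T @ ystar)
        SS_star = ystar @ (ystar - w * Xs @ bhat_s) / delta   # y*' Lambda_l* y*
        V = np.eye(n) + delta * w * X @ np.linalg.solve(Xs.T @ Xs, X.T)
        r = y - w * X @ bhat_s
        SS_N = r @ np.linalg.solve(V, r)
        sigma2 = invgamma.rvs(a + (n + ns) / 2.0, scale=b + (SS_N + SS_star) / 2.0,
                              random_state=rng)
        beta = rng.multivariate_normal(beta_t, sigma2 * Sigma)
        draws.append((beta, sigma2))
    return draws
```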

7 Marginal Likelihood Computation

The marginal likelihood of any model m_l ∈ M is given by

m_l^{PE}(y | X_l, X_l*, δ) = ∫∫ f(y | β_l, σ_l², m_l; X_l) π_l^{PE}(β_l, σ_l² | X_l*, δ) dβ_l dσ_l²
 = m_l^N(y | X_l) ∫ [ m_l^N(y* | y; X_l, X_l*, δ) / m_l^N(y* | X_l*, δ) ] m_0^N(y* | X_0*, δ) dy*.

In the above expression m_l^N(y | X_l) is the marginal likelihood of model m_l for the actual data under the baseline prior. Under the Zellner's baseline prior it is given by

m_l^N(y | X_l) = f_{St_n}( y; 2 a_l, 0, (b_l / a_l) [ I_n + g X_l (X_l^T X_l)^{-1} X_l^T ] ),

while under the Jeffreys baseline prior it is given by

m_l^N(y | X_l) = c_l π^{-(n − d_l)/2} |X_l^T X_l|^{-1/2} Γ((n − d_l)/2) RSS_l^{-(n − d_l)/2},

with RSS_l = y^T ( I_n − X_l (X_l^T X_l)^{-1} X_l^T ) y.

Marginal likelihood computation (cont.)

(1) Generate y*^{(t)} (t = 1, ..., T) from m_0^N(y* | X_0*, δ) and estimate the marginal likelihood by

m̂_l^{PE}(y | X_l, X_l*, δ) = m_l^N(y | X_l) [ (1/T) Σ_{t=1}^T m_l^N(y*^{(t)} | y; X_l, X_l*, δ) / m_l^N(y*^{(t)} | X_l*, δ) ].   (5)

(2) Generate y*^{(t)} (t = 1, ..., T) from m_0^N(y* | y; X_0, X_0*, δ) and estimate the marginal likelihood by

m̂_l^{PE}(y | X_l, X_l*, δ) = m_0^N(y | X_0) [ (1/T) Σ_{t=1}^T m_l^N(y | y*^{(t)}; X_l, X_l*, δ) / m_0^N(y | y*^{(t)}; X_0, X_0*, δ) ].   (6)

(3) Generate y*^{(t)} (t = 1, ..., T) from m_l^N(y* | y; X_l, X_l*, δ) and estimate the marginal likelihood by

m̂_l^{PE}(y | X_l, X_l*, δ) = m_l^N(y | X_l) [ (1/T) Σ_{t=1}^T m_0^N(y*^{(t)} | X_0*, δ) / m_l^N(y*^{(t)} | X_l*, δ) ].   (7)

(4) Generate y*^{(t)} (t = 1, ..., T) from m_l^N(y* | y; X_l, X_l*, δ) and estimate the marginal likelihood by

m̂_l^{PE}(y | X_l, X_l*, δ) = m_0^N(y | X_0) [ (1/T) Σ_{t=1}^T ( m_l^N(y | y*^{(t)}; X_l, X_l*, δ) m_0^N(y*^{(t)} | y; X_0, X_0*, δ) ) / ( m_0^N(y | y*^{(t)}; X_0, X_0*, δ) m_l^N(y*^{(t)} | y; X_l, X_l*, δ) ) ].   (8)
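Below is a small numerical sketch (ours, not the authors' code) of estimator (5) for the Z-PEP setup: average m_l^N(y* | y) / m_l^N(y* | X_l*) over draws y* from the null model's prior predictive, computed on the log scale. It assumes scipy ≥ 1.6 for multivariate_t; the helper name is our own.

```python
import numpy as np
from scipy.stats import multivariate_t
from scipy.special import logsumexp

def log_marglik_zpep_scheme1(y, X_l, X_star, X0_star, g, delta, a=0.01, b=0.01,
                             T=1000, rng=None):
    rng = np.random.default_rng(rng)
    n = len(y)
    n_star = X_star.shape[0]

    def proj(A, B):                               # B (A'A)^{-1} B'
        return B @ np.linalg.solve(A.T @ A, B.T)

    # m_0^N(y* | X_0*, delta): prior predictive of the null model (used for draws).
    m0 = multivariate_t(np.zeros(n_star),
                        b / a * (delta * np.eye(n_star) + g * proj(X0_star, X0_star)), df=2 * a)
    # m_l^N(y* | X_l*, delta): prior predictive of model m_l (density in the denominator).
    ml = multivariate_t(np.zeros(n_star),
                        b / a * (delta * np.eye(n_star) + g * proj(X_star, X_star)), df=2 * a)
    # m_l^N(y* | y): Student predictive of the imaginary data given the actual data, eq. (4).
    bhat = np.linalg.solve(X_l.T @ X_l, X_l.T @ y)
    w1 = g / (g + 1.0)
    SS = y @ y - w1 * y @ X_l @ bhat
    ml_cond = multivariate_t(w1 * X_star @ bhat,
                             (2 * b + SS) / (2 * a + n) *
                             (delta * np.eye(n_star) + w1 * proj(X_l, X_star)), df=2 * a + n)
    # m_l^N(y | X_l): marginal likelihood of the actual data under the g-prior baseline.
    log_ml_y = multivariate_t(np.zeros(n), b / a * (np.eye(n) + g * proj(X_l, X_l)),
                              df=2 * a).logpdf(y)

    ystars = m0.rvs(size=T, random_state=rng)
    log_ratios = ml_cond.logpdf(ystars) - ml.logpdf(ystars)
    return log_ml_y + logsumexp(log_ratios) - np.log(T)
```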

Model search algorithm

We propose to modify the standard MC³ method by sampling a binary vector γ indicating the variables included in the model (see, e.g., George & McCulloch, JASA, 1993), using a Metropolis-within-Gibbs approach as follows.

(1) Generate y*^{(t)} (t = 1, ..., T) from m_0^N(y* | X_0*, δ) (first Monte-Carlo marginal-likelihood scheme) or from m_0^N(y* | y; X_0, X_0*, δ) (second scheme).

(2) For the current model m_l, corresponding to the vector of variable-inclusion indicators γ, repeat the following steps for j = 1, ..., p (selected in random order):
  (a) Propose γ'_j = 1 − γ_j with probability one.
  (b) Keep the remaining indicators unchanged: γ'_k = γ_k for all k ≠ j.
  (c) Identify the model m_l' corresponding to the vector γ' with elements γ'_k, k = 1, ..., p.
  (d) If m_l' has not been visited previously, calculate and store its estimated marginal likelihood m_l'^{PE}(y | X_l', X_l'*, δ), given by (5) (first scheme) or (6) (second scheme).
  (e) Set m_l = m_l' (i.e., accept the proposed model m_l') with probability

  α = min[ 1, f(m_l' | y) / f(m_l | y) ] = min[ 1, ( m_l'^{PE}(y | X_l', X_l'*, δ) f(m_l') ) / ( m_l^{PE}(y | X_l, X_l*, δ) f(m_l) ) ].   (9)

(3) Store m_l as the current model.

(4) Repeat steps (2)-(3) until a target number of models has been visited or a pre-specified CPU budget is exhausted.

For the third and fourth Monte-Carlo marginal-likelihood estimates, we start the above MC³ algorithm from step (2), and in step (2)(d) we generate y*^{(t)} (t = 1, ..., T) from m_l'^N(y* | y; X_l', X_l'*, δ), which now depends on the proposed model, and estimate the marginal likelihood of that model using expressions (7) and (8), respectively.
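A small sketch (ours) of the Metropolis-within-Gibbs model search above. For illustration only, the PEP marginal-likelihood estimate is replaced by a cheap closed-form stand-in (the standard g-prior log Bayes factor against the null model); in the authors' algorithm one would plug in estimators (5)-(8) instead. A uniform model-space prior is assumed, so the prior odds in (9) equal one.

```python
import numpy as np

def log_bf_gprior(y, X, gamma, g):
    """Log Bayes factor of model gamma vs the null model under a g-prior (stand-in)."""
    n = len(y)
    yc = y - y.mean()
    k = int(gamma.sum())
    if k == 0:
        return 0.0
    Xg = X[:, gamma.astype(bool)]
    Xg = Xg - Xg.mean(axis=0)
    beta = np.linalg.lstsq(Xg, yc, rcond=None)[0]
    r2 = 1.0 - np.sum((yc - Xg @ beta) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

def mc3(y, X, iters=5000, g=None, rng=None):
    rng = np.random.default_rng(rng)
    n, p = X.shape
    g = n if g is None else g
    gamma = np.zeros(p, dtype=int)
    cache = {}                                   # store marginal likelihoods of visited models

    def logml(gam):
        key = tuple(gam)
        if key not in cache:
            cache[key] = log_bf_gprior(y, X, gam, g)
        return cache[key]

    visits = np.zeros(p)
    for _ in range(iters):
        for j in rng.permutation(p):             # step (2): flip each indicator in random order
            prop = gamma.copy()
            prop[j] = 1 - prop[j]
            if np.log(rng.uniform()) < logml(prop) - logml(gamma):   # acceptance as in (9)
                gamma = prop
        visits += gamma                          # step (3): record the current model
    return visits / iters                        # estimated marginal inclusion probabilities
```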

8 Simulated Example

We illustrate the proposed methods by considering the simulated data set of Nott & Kohn (2005, Biometrika). This data set consists of n = 50 observations and p = 15 covariates. The first 10 covariates are generated from a standardized normal distribution, while

X_ij ~ N( 0.3 X_i1 + 0.5 X_i2 + 0.7 X_i3 + 0.9 X_i4 + 1.1 X_i5, 1 ) for j = 11, ..., 15, i = 1, ..., 50,

and the response from

Y_i ~ N( 4 + 2 X_i1 − X_i5 + 1.5 X_i7 + X_i11 + 0.5 X_i13, 2.5² ), for i = 1, ..., 50.
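A short simulation sketch (ours) following the generating mechanism described above; the response coefficients are taken from the reconstruction given in the text and should be treated as indicative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 15
X = np.empty((n, p))
X[:, :10] = rng.standard_normal((n, 10))                     # first 10 covariates
mean_extra = X[:, :5] @ np.array([0.3, 0.5, 0.7, 0.9, 1.1])  # X11-X15 depend on X1-X5
X[:, 10:] = mean_extra[:, None] + rng.standard_normal((n, 5))
beta = np.zeros(p)
beta[[0, 4, 6, 10, 12]] = [2.0, -1.0, 1.5, 1.0, 0.5]         # X1, X5, X7, X11, X13
y = 4.0 + X @ beta + rng.normal(scale=2.5, size=n)
```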

Monte-Carlo standard errors

In order to check the efficiency of the four Monte-Carlo estimates, we initially performed a small experiment. For Z-PEP, we estimated the logarithm of the marginal likelihood for the models X1 + X5 + X7 + X11 and X1 + X7 + X11 by running each Monte-Carlo scheme 100 times for 1,000 iterations, and we calculated the Monte-Carlo standard errors.

Table 1: Monte-Carlo standard errors of the estimates of the logarithm of the marginal likelihoods for models X1 + X5 + X7 + X11 and X1 + X7 + X11, when running the four Monte-Carlo schemes 100 times for 1,000 iterations, using the Z-PEP methodology.

As expected, the third and fourth Monte-Carlo schemes produce much lower Monte-Carlo standard errors. The third scheme (for Z-PEP) and the fourth scheme (for J-PEP) are therefore used in all the following runs, keeping the number of iterations equal to 1,000.

Posterior model probabilities

Table 2: Posterior model probabilities for the best models, together with Bayes factors of the MAP model (m_1) against m_j, j = 2, ..., 7, for the Z-PEP and the J-PEP prior methodologies. The seven best models under Z-PEP, with the corresponding J-PEP rank in parentheses, are:

m_1: X1 + X5 + X7 + X11 (2)
m_2: X1 + X7 + X11 (1)
m_3: X1 + X5 + X6 + X7 + X11 (3)
m_4: X1 + X6 + X7 + X11 (4)
m_5: X1 + X7 + X10 + X11 (5)
m_6: X1 + X5 + X7 + X10 + X11 (9)
m_7: X1 + X5 + X7 + X11 + X13 (10)

Sensitivity analysis on n* for Z-PEP

Figure 1: Posterior marginal inclusion probabilities for different values of n*, for the Z-PEP methodology (curves shown for covariates X7, X1, X11, X5, X6, X10 and X13; x-axis: imaginary-data sample size n*).

Sensitivity analysis on n* for Z-PEP (cont.)

Figure 2: Posterior model probabilities of the six best models obtained for each n*, for the Z-PEP methodology; bullets indicate models that entered the top six for at least one value of n*, and lines connect posterior model probabilities of models of the same rank over different values of n*. Models appearing in the plot: X1+X5+X6+X7+X11, X1+X5+X7+X10+X11, X1+X5+X7+X11, X1+X5+X7+X11+X13, X1+X6+X7+X11, X1+X7, X1+X7+X10+X11 and X1+X7+X11 (x-axis: imaginary-data sample size n*).

Comparisons

In the following we compare the Bayes factor between the two best models (X1 + X5 + X7 + X11 versus X1 + X5 + X7) according to Z-PEP and J-PEP with the corresponding Bayes factors using J-EPP (the EPP with no power and with the Jeffreys prior as baseline) and the IBF. For the IBF and J-EPP we randomly select 100 training samples of size n* = 6 (the minimal training sample for the estimation of these two models) and n* = 17 (the minimal training sample for the estimation of the full model when all p = 15 covariates are considered), while for Z-PEP and J-PEP we randomly select 100 training samples of sizes n* = 6, 17 and the larger sizes n* = 30, 40 and 50 shown in the following figure. Each marginal-likelihood estimate for Z-PEP is obtained with 1,000 iterations of the third Monte-Carlo scheme, and for J-PEP and J-EPP with 1,000 iterations of the fourth scheme.

Comparisons (cont.)

[Figure: log Bayes factor of X1 + X5 + X7 + X11 versus X1 + X5 + X7 over the 100 randomly selected training samples, for each method and training-sample size: IBF and J-EP at n* = 6 and 17, and Z-PEP and J-PEP at n* = 6, 17, 30, 40 and 50; x-axis: log Bayes factor.]

9 Real-Life Example

We use the ozone data (Breiman & Friedman, 1985, JASA) to implement our approaches. The data we used were slightly modified on the basis of an initial exploratory analysis. As the response we use a standardized version of the logarithm of the ozone-concentration variable of the original data set. The standardized versions of nine main effects, 9 quadratic terms, 2 cubic terms, and 36 two-way interactions (a total of 56 variables) were used as possible covariates. The main effects are:

X1: Day of year
X2: Wind speed (mph) at LAX
X3: 500 mb pressure height (m) at VAFB
X4: Humidity (%) at LAX
X5: Temperature (°F) at Sandburg
X6: Inversion base height (feet) at LAX
X7: Pressure gradient (mm Hg) from LAX to Daggett
X8: Inversion base temperature (°F) at LAX
X9: Visibility (miles) at LAX

Reduced model space

(1) First we used MC³ to identify variables with high posterior marginal inclusion probabilities, and we created a reduced model space consisting only of those variables whose marginal probabilities were above 0.3.

(2) Then we used the same model-search algorithm as in step (1) on the reduced space to estimate posterior model probabilities (and the corresponding odds).

We ran MC³ for 100,000 iterations for both the Z-PEP and the EIBF methods. For EIBF we used 30 randomly selected minimal training samples (n* = 58). The reduced model space was formed from those variables that in either run had posterior marginal inclusion probability above 0.3. With this approach we reduced the initial list of p = 56 available candidates down to 22 predictors.

We then ran MC³ for 220,000 iterations for the Z-PEP, J-PEP and EIBF approaches. For EIBF we used 30 randomly selected minimal training samples (n* = 24).
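A tiny follow-up sketch (ours) of the screening step in (1): keep the covariates whose estimated posterior marginal inclusion probability exceeds 0.3. The inclusion probabilities would come from an MC³ run such as the earlier sketch; here they are replaced by random stand-in values so the snippet runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
incl_prob = rng.uniform(size=56)            # stand-in for MC3 marginal inclusion probabilities
keep = np.flatnonzero(incl_prob > 0.3)      # covariate indices retained in the reduced space
print(f"{keep.size} of 56 candidate terms retained")
```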

Table 3: Posterior odds (PO_1k) of the three best models within each analysis versus the current model k, for the reduced model space of the ozone data set. Variables common to all three analyses: X1 + X2 + X8 + X9 + X10 + X15 + X16 + X18 + X43. For each analysis the three best models are listed below with their additional variables and, in parentheses, their ranks under the other two methods.

J-PEP ranking (Z-PEP rank, EIBF rank) — additional variables:
1. (>3, >3) — no additional variable
2. (1, >3) — X7 + X12 + X13 + X
3. (>3, >3) — X7 + X13 + X

Z-PEP ranking (J-PEP rank, EIBF rank) — additional variables:
1. (2, >3) — X7 + X12 + X13 + X
2. (>3, >3) — X5 + X7 + X12 + X13 + X
3. (>3, 3) — X5 + X7 + X12 + X13 + X20 + X

EIBF ranking (J-PEP rank, Z-PEP rank) — additional variables:
1. (>3, >3) — X7 + X12 + X13 + X20 + X
2. (>3, >3) — X5 + X7 + X12 + X13 + X20 + X26 + X
3. (>3, 3) — X5 + X7 + X12 + X13 + X20 + X

Comparison of the predictive performance

We evaluate and compare the out-of-sample predictive performance of the highest a-posteriori (MAP) models indicated by the methods illustrated above. To do so, we divide the data in half, considering 50 randomly selected splits. For each split, we generate an MCMC sample of T iterations from the model of interest m_l and then calculate the average root mean square error

ARMSE_l = (1/T) Σ_{t=1}^T RMSE_l^{(t)},  with  RMSE_l^{(t)} = sqrt( (1/n_V) Σ_{i ∈ V} ( y_i − ŷ_{i m_l}^{(t)} )² )

being the root mean square error over the validation data set V of size n_V, calculated at iteration t of the MCMC, where ŷ_{i m_l}^{(t)} = X_{l(i)} β_l^{(t)} is the expected value of y_i according to the assumed model at iteration t, β_l^{(t)} are the model parameters at iteration t, and X_{l(i)} is the i-th row of the matrix X_l of model m_l.

Detailed results for the MAP models are provided in the next table; results for the full model are also included for comparison. For comparison purposes, we have also included the split-half RMSE measures for these models using predictions based on direct fitting of the model with the independence Jeffreys prior.
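A small sketch (ours) of the split-half predictive check above: the average RMSE on a validation set over T posterior draws of β for a given model.

```python
import numpy as np

def armse(beta_draws, X_val, y_val):
    """beta_draws: (T, d) array of MCMC draws; X_val: (n_V, d); y_val: (n_V,)."""
    preds = beta_draws @ X_val.T                     # (T, n_V) fitted values per draw
    rmse_t = np.sqrt(np.mean((y_val - preds) ** 2, axis=1))
    return rmse_t.mean()                             # average RMSE over the T draws
```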

Comparison of the predictive performance (cont.)

Table 4: Comparison of the predictive performance of the PEP and J-EP methods using the full and the MAP models in the reduced model space of the ozone data set. Entries are means (standard deviations) over 50 different split-half out-of-sample evaluations.

RMSE standard deviations over the 50 splits:

Model       | J-PEP    | Z-PEP    | J-EP     | Jeffreys prior
Full        | (0.0087) | (0.0097) | (0.0169) | (0.0104)
J-PEP MAP   | (0.0063) | (0.0051) | (0.0626) | (0.0052)
Z-PEP MAP   | (0.0071) | (0.0060) | (0.0734) | (0.0049)
EIBF MAP    | (0.0066) | (0.0072) | (0.0800) | (0.0061)

Comparison with the full model (percentage changes):

Model       | d_l   | R²      | R²_adj  | J-PEP   | Z-PEP   | J-EP    | Jeffreys prior
J-PEP MAP   | -59%  | -5.06%  | -4.48%  | -0.22%  | +3.81%  | +21.5%  | +3.23%
Z-PEP MAP   | -41%  | -1.50%  | -1.06%  | +0.10%  | +1.01%  | +12.7%  | +0.37%
EIBF MAP    | -36%  | -1.20%  | -0.78%  | +3.24%  | +0.44%  | +10.9%  | -0.23%

[Figure: split-half predictive comparison (RMSE over the 50 random splits) for (a) the full model, (b) the MAP model under the Z-PEP prior, (c) the MAP model under the J-EP prior and (d) the MAP model under the J-PEP prior; within each panel, results are shown for the J-PEP, Jeffreys, J-EP and Z-PEP approaches.]

Thank You Irvine!
