Caterpillar Regression Example: Conjugate Priors, Conditional & Marginal Posteriors, Predictive Distribution, Variable Selection


1 Caterpillar Regression Example: Conjugate Priors, Conditional & Marginal Posteriors, Predictive Distribution, Variable Selection Prof. Nicholas Zabaras University of Notre Dame Notre Dame, IN, USA URL: September 22,

2 Contents The Caterpillar Regression Problem, Linear Regression Model, MLE Estimator, Likelihood, MLE Estimates for our example, Conjugate Priors, Conditional and Marginal Posteriors, Ridge Regression, Predictive Distribution, Implementation, Influence of the Conjugate Prior; Zellner's G-Prior, Marginal Posterior Mean and Variance, Predictive Modeling, Credible Intervals; Jeffreys' Non-informative Prior, Credible Intervals; Zellner's G-Prior Marginal Distribution of y, Point Null Hypothesis and Bayes Factors; Variable Selection, Model Competition, Variable Selection Prior, Stochastic Search for the Most Probable Model, Gibbs Sampling for Variable Selection, Implementation. C. P. Robert, The Bayesian Core, Springer, 2nd edition, chapter 3 (full text available)

3 Regression Regression refers to statistical analysis that deals with the representation of dependencies between several variables. In particular, we want to find a representation of the distribution f(y | θ, x) of an observable variable y given a vector of observables x, using samples (x_i, y_i), i = 1, ..., n. Here, we will consider a particular example of modeling dependencies in pine processionary caterpillar colony size.

4 Linear Regression Models In linear regression, we analyze the linear influence of some variables on others. In our particular example of pine processionary caterpillar colonies: Response variable: y is the number of processionary caterpillar colonies. Explanatory variables: covariates x = (x_1, x_2, ..., x_k) as defined next (in general they can be continuous, discrete, or of mixed type).

5 Caterpillar Regression Problem The pine processionary caterpillar colony size (y = number of nests) is influenced by the following explanatory variables:
x_1 is the altitude (in meters)
x_2 is the slope (in degrees)
x_3 is the number of pines in the square
x_4 is the height (in meters) of the tree sampled at the center of the square
x_5 is the diameter of the tree sampled at the center of the square
x_6 is the index of the settlement density
x_7 is the orientation of the square (from 1 if southbound to 2 otherwise)
x_8 is the height (in meters) of the dominant tree
x_9 is the number of vegetation strata
x_10 is the mix settlement index (from 1 if not mixed to 2 if mixed)

6 Caterpillar Regression Problem [Figure: semilog-y plots of the data (x_i, y), i = 1, ..., 9, one panel per covariate x_1 through x_9.] An implementation is available (MatLab, C++).

7 Regression The distribution of y given x is considered in the context of a set of experimental data, i = 1, ..., n, on which both y_i and x_{i1}, ..., x_{ik} are measured. The dataset is made from the outcomes y = (y_1, ..., y_n) and the n x (k+1) matrix of explanatory variables

X = [ 1 x_11 x_12 ... x_1k
      1 x_21 x_22 ... x_2k
      ...            ...
      1 x_n1 x_n2 ... x_nk ]

8 Linear Regression The most common linear regression model is of the form y | β, σ², X ~ N_n(Xβ, σ² I_n). From this we conclude that E[y_i | β, σ², X] = β_0 + β_1 x_i1 + ... + β_k x_ik and Var[y_i | β, σ², X] = σ². Note the difference between finite-valued regressors x_i (like in our caterpillar problem) and categorical variables, which also take a finite number of values but whose range has no numerical meaning. It makes no sense to involve such an x directly in the regression: replace the single regressor x [belonging in {1, ..., m}, say] with m indicator (or dummy) variables.

9 Categorical Variables Note the difference between finite-valued regressors x_i (like in our caterpillar problem) and categorical variables, which also take a finite number of values but whose range has no numerical meaning. It makes no sense to use such an x directly in the regression: replace the single regressor x [in {1, ..., m}, say] with m indicator variables x_1 = I_1(x), x_2 = I_2(x), ..., x_m = I_m(x). Use a different β_i for each categorical variable value (class): E[y | β, σ², X] = β_1 I_1(x) + ... + β_m I_m(x). Identifiability requires eliminating one of the classes (e.g. β_1 = 0) since I_1(x) + ... + I_m(x) = 1.

10 Linear Regression Returning back to our linear regression model, y | β, σ², X ~ N_n(Xβ, σ² I_n), from which E[y_i | β, σ², X] = β_0 + β_1 x_i1 + ... + β_k x_ik and Var[y_i | β, σ², X] = σ². Assume that k + 1 < n and that X is of full rank: rank(X) = k + 1. X is of full rank if and only if X^T X is invertible. The likelihood is then

l(β, σ² | y, X) = (2πσ²)^{-n/2} exp( -(1/(2σ²)) (y - Xβ)^T (y - Xβ) )

11 MLE Estimator The MLE of β is the solution of the least squares minimization problem

min_β (y - Xβ)^T (y - Xβ) = min_β Σ_{i=1}^n ( y_i - β_0 - β_1 x_i1 - ... - β_k x_ik )²,

namely β̂ = (X^T X)^{-1} X^T y. Since β̂ = (X^T X)^{-1} X^T y is a linear transform of y ~ N_n(Xβ, σ² I_n),

β̂ ~ N( (X^T X)^{-1} X^T X β, (X^T X)^{-1} X^T σ² I_n X (X^T X)^{-1} ) = N( β, σ² (X^T X)^{-1} ).

Thus β̂ is an unbiased estimator and Var(β̂ | σ², X) = σ² (X^T X)^{-1}. Also, β̂ is the best linear unbiased estimator of β: for all α ∈ R^{k+1}, Var(α^T β̂ | σ², X) ≤ Var(α^T β̄ | σ², X), where β̄ is any unbiased linear estimator of β.
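A minimal numerical sketch of this least-squares computation (Python/NumPy is assumed here rather than the MatLab/C++ implementations referenced in these slides; the synthetic X and y below only stand in for the caterpillar data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 40, 3                                                  # small synthetic problem, not the caterpillar data
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])    # design matrix with intercept column
beta_true = np.array([1.0, 0.5, 0.0, -0.3])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# MLE / least-squares estimate: beta_hat = (X^T X)^{-1} X^T y
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# unbiased estimate of sigma^2 and the approximate covariance of beta_hat
s2 = np.sum((y - X @ beta_hat) ** 2)                          # s^2 = (y - X beta_hat)^T (y - X beta_hat)
sigma2_hat = s2 / (n - k - 1)
cov_beta_hat = sigma2_hat * np.linalg.inv(XtX)                # Var(beta_hat | sigma^2, X) ~ sigma2_hat (X^T X)^{-1}
```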

12 MLE Estimator The MLE of σ² is the solution of

min_{σ²} [ (1/(2σ²)) (y - Xβ)^T (y - Xβ) + (n/2) ln σ² ] evaluated at β = β̂,

which gives σ̂²_MLE = (y - Xβ̂)^T (y - Xβ̂) / n. Let M = X (X^T X)^{-1} X^T be the n x n matrix projecting the data y onto the column space of X. Then

E[σ̂²_MLE] = E[ (y - Xβ̂)^T (y - Xβ̂) ] / n = E[ (y - My)^T (y - My) ] / n = E[ y^T (I - M)^T (I - M) y ] / n = E[ y^T (I - M) y ] / n.

You can easily show that (see this slide): for Y an n-dimensional random vector with E[Y] = μ and Cov[Y] = V, and A an n x n matrix, E[Y^T A Y] = tr(AV) + μ^T A μ.

13 MLE Estimator Applying the result of the earlier slide, we have

E[σ̂²_MLE] = E[ y^T (I - M) y ] / n = [ tr( (I - M) σ² I_n ) + (Xβ)^T (I - M) (Xβ) ] / n = σ² (n - k - 1) / n,

since (I - M) X = 0 and tr(I - M) = tr(I_n) - tr(M) = n - tr( X (X^T X)^{-1} X^T ) = n - tr( (X^T X)^{-1} X^T X ) = n - tr(I_{k+1}) = n - k - 1. Note: I - M has 1 as its eigenvalue with multiplicity n - k - 1 and 0 with multiplicity k + 1.

14 MLE Estimator The expectation of the MLE of σ² is then E[σ̂²_MLE] = (n - k - 1) σ² / n. The unbiased estimator of σ² is thus

σ̂² = (y - Xβ̂)^T (y - Xβ̂) / (n - k - 1) = s² / (n - k - 1), where s² = (y - Xβ̂)^T (y - Xβ̂).

Indeed, using the result from the earlier slide, E[σ̂²] = E[ (y - Xβ̂)^T (y - Xβ̂) ] / (n - k - 1) = σ² (n - k - 1) / (n - k - 1) = σ². To approximate the covariance of β̂ in β̂ ~ N( β, σ² (X^T X)^{-1} ), use Var(β̂ | σ², X) ≈ σ̂² (X^T X)^{-1}.

15 Appendix We can prove this theorem quite easily as follows. For Y an n-dimensional random vector with E[Y] = μ and Cov[Y] = V, and A an n x n matrix, E[Y^T A Y] = tr(AV) + μ^T A μ. Indeed,

(Y - μ)^T A (Y - μ) = Y^T A Y - μ^T A Y - Y^T A μ + μ^T A μ.

Using the identity E[tr W] = tr E[W] for any random square matrix W, we can write the expectation of the l.h.s. of the equation above as

E[ (Y - μ)^T A (Y - μ) ] = E[ tr( A (Y - μ)(Y - μ)^T ) ] = tr( A E[ (Y - μ)(Y - μ)^T ] ) = tr(AV).

Taking expectations on the r.h.s. gives E[Y^T A Y] - μ^T A μ, so E[Y^T A Y] = tr(AV) + μ^T A μ. This proves the theorem.

16 T-Statistic We can define the standard t-statistic as follows:

T_i = ( β̂_i - β_i ) / sqrt( σ̂² ω_ii ) ~ T( n - k - 1, 0, 1 ), where ω_ii = (X^T X)^{-1}_{(i,i)}.

This statistic can be used for hypothesis testing, e.g. accept H_0: β_i = 0 versus H_1: β_i ≠ 0 at level α if

|β̂_i| / sqrt( σ̂² ω_ii ) ≤ F^{-1}_{n-k-1}( 1 - α/2 ), the ( 1 - α/2 ) quantile of T_{n-k-1}.

17 T-Statistic The frequentist argument in using this bound is that there is significant evidence against H_0 if the p-value is smaller than α:

p_i = P_{H_0}( |T_i| > |t_i| ), with t_i = β̂_i / sqrt( σ̂² ω_ii ),
    = P_{H_0}( T_i < -|t_i| ) + P_{H_0}( T_i > |t_i| ) = F_{n-k-1}( -|t_i| ) + 1 - F_{n-k-1}( |t_i| ) < α.

Finally, the statistic T_i can be used to derive the frequentist marginal confidence interval

{ β_i : |β_i - β̂_i| ≤ sqrt( σ̂² ω_ii ) F^{-1}_{n-k-1}( 1 - α/2 ) }.
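A short sketch of how these t-statistics, p-values, and confidence intervals could be computed (Python with NumPy/SciPy assumed; the synthetic data is only a stand-in for the caterpillar dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])    # synthetic design with intercept
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = s2 / (n - k - 1)
omega = np.diag(XtX_inv)                                      # omega_ii = (X^T X)^{-1}_{(i,i)}

t_stat = beta_hat / np.sqrt(sigma2_hat * omega)               # t_i = beta_hat_i / sqrt(sigma2_hat * omega_ii)
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - k - 1)          # p_i = P_{H0}(|T_i| > |t_i|)

# 1 - alpha frequentist marginal confidence intervals
alpha = 0.05
half = stats.t.ppf(1 - alpha / 2, df=n - k - 1) * np.sqrt(sigma2_hat * omega)
ci = np.column_stack([beta_hat - half, beta_hat + half])
```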

18 MLE (Least Squares) Estimates Using p_i = P_{H_0}( |T_i| > |t_i| = |β̂_i| / sqrt( σ̂² ω_ii ) ) for each coefficient: [Table: Estimate, Std. Error, t-value and Pr(>|t|) for the intercept and the ten covariates XV1-XV10, with significance codes; the numerical values are not preserved in this transcription.] An implementation is available (MatLab, C++).

19 Likelihood

l(β, σ² | y, X) = (2πσ²)^{-n/2} exp( -(1/(2σ²)) (y - Xβ)^T (y - Xβ) )

Note that the likelihood above can be written in the following very useful form:

l(β, σ² | y, X) = (2πσ²)^{-n/2} exp( -(1/(2σ²)) (y - Xβ̂ + Xβ̂ - Xβ)^T (y - Xβ̂ + Xβ̂ - Xβ) )
               = (2πσ²)^{-n/2} exp( -(1/(2σ²)) (y - Xβ̂)^T (y - Xβ̂) - (1/(2σ²)) (β - β̂)^T X^T X (β - β̂) )
               = (2πσ²)^{-n/2} exp( -(1/(2σ²)) s² - (1/(2σ²)) (β - β̂)^T X^T X (β - β̂) ),

where s² = (y - Xβ̂)^T (y - Xβ̂) and β̂ = (X^T X)^{-1} X^T y.

20 Conjugate Priors Observing the form of the likelihood suggests the following conjugate prior:

β | σ², X ~ N_{k+1}( β̃, σ² M^{-1} ), M a (k+1) x (k+1) positive definite symmetric matrix,
σ² | X ~ InvGamma(a, b), a, b > 0.

The posterior is then

π(β, σ² | β̂, s², X) ∝ (2πσ²)^{-n/2} exp( -(1/(2σ²)) s² - (1/(2σ²)) (β - β̂)^T X^T X (β - β̂) )
   × (σ²)^{-(k+1)/2} exp( -(1/(2σ²)) (β - β̃)^T M (β - β̃) ) × (σ²)^{-(a+1)} exp( -b/σ² )
 ∝ σ^{-(k + n + 2a + 3)} exp( -(1/(2σ²)) { s² + 2b + (β - β̂)^T X^T X (β - β̂) + (β - β̃)^T M (β - β̃) } )
 = σ^{-(k + n + 2a + 3)} exp( -(1/(2σ²)) { s² + 2b + β^T (M + X^T X) β - 2 β^T (M β̃ + X^T X β̂) + β̃^T M β̃ + β̂^T X^T X β̂ } ).

21 Conjugate Priors: Posterior of β given σ² Completing the square in β,

π(β, σ² | β̂, s², X) ∝ σ^{-(k + n + 2a + 3)} exp( -(1/(2σ²)) { s² + 2b
   + ( β - (M + X^T X)^{-1}( M β̃ + X^T X β̂ ) )^T (M + X^T X) ( β - (M + X^T X)^{-1}( M β̃ + X^T X β̂ ) )
   + β̃^T M β̃ + β̂^T X^T X β̂ - ( M β̃ + X^T X β̂ )^T (M + X^T X)^{-1} ( M β̃ + X^T X β̂ ) } ).

Based on this, the posterior of β given σ² is

π(β | σ², y, X) ∝ exp( -(1/(2σ²)) ( β - E[β | σ², y, X] )^T (M + X^T X) ( β - E[β | σ², y, X] ) ),

i.e. β | σ², y, X ~ N_{k+1}( (M + X^T X)^{-1}( X^T X β̂ + M β̃ ), σ² (M + X^T X)^{-1} ), where E[β | σ², y, X] = (M + X^T X)^{-1}( M β̃ + X^T X β̂ ) and Var[β | σ², y, X] = σ² (M + X^T X)^{-1}.

22 Conjugate Priors: Marginal Posterior of σ²

π(β, σ² | β̂, s², X) ∝ σ^{-(k + n + 2a + 3)} exp( -(1/(2σ²)) { s² + 2b
   + ( β - E[β | σ², y, X] )^T (M + X^T X) ( β - E[β | σ², y, X] )
   + β̃^T M β̃ + β̂^T X^T X β̂ - ( M β̃ + X^T X β̂ )^T (M + X^T X)^{-1} ( M β̃ + X^T X β̂ ) } ).

Integrating out β gives the following marginal for σ²:

π(σ² | β̂, s², X) ∝ σ^{-(n + 2a + 2)} exp( -(1/(2σ²)) { β̃^T M β̃ + β̂^T X^T X β̂ + s² + 2b
   - ( M β̃ + X^T X β̂ )^T (M + X^T X)^{-1} ( M β̃ + X^T X β̂ ) } ).

To simplify, we will use the following two identities:

A: (M + X^T X)^{-1} = (X^T X)^{-1} - (X^T X)^{-1} [ M^{-1} + (X^T X)^{-1} ]^{-1} (X^T X)^{-1}
B: X^T X (M + X^T X)^{-1} M = M (M + X^T X)^{-1} X^T X = [ M^{-1} + (X^T X)^{-1} ]^{-1}

23 Conjugate Priors: Marginal Posterior of σ²

π(σ² | β̂, s², X) ∝ σ^{-(n + 2a + 2)} exp( -(1/(2σ²)) { β̃^T M β̃ + β̂^T X^T X β̂ + s² + 2b
   - ( M β̃ + X^T X β̂ )^T (M + X^T X)^{-1} ( M β̃ + X^T X β̂ ) } ).

We can simplify the last term above using identities A and B. Expanding,

( M β̃ + X^T X β̂ )^T (M + X^T X)^{-1} ( M β̃ + X^T X β̂ )
   = 2 β̂^T X^T X (M + X^T X)^{-1} M β̃ + β̃^T M (M + X^T X)^{-1} M β̃ + β̂^T X^T X (M + X^T X)^{-1} X^T X β̂,

where, by identity B, the cross term equals 2 β̂^T [ M^{-1} + (X^T X)^{-1} ]^{-1} β̃, and by identity A (and its analogue with the roles of M and X^T X exchanged), M (M + X^T X)^{-1} M = M - [ M^{-1} + (X^T X)^{-1} ]^{-1} and X^T X (M + X^T X)^{-1} X^T X = X^T X - [ M^{-1} + (X^T X)^{-1} ]^{-1}. Collecting terms,

β̃^T M β̃ + β̂^T X^T X β̂ - ( M β̃ + X^T X β̂ )^T (M + X^T X)^{-1} ( M β̃ + X^T X β̂ ) = ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ).

Finally:

π(σ² | β̂, s², X) ∝ σ^{-(n + 2a + 2)} exp( -(1/(2σ²)) { ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) + s² + 2b } ).

24 Conjugate Priors: Marginal Posterior of σ²

π(σ² | β̂, s², X) ∝ σ^{-(n + 2a + 2)} exp( -(1/(2σ²)) { ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) + s² + 2b } )

The marginal posterior is

σ² | y, X ~ InvGamma( n/2 + a, b + s²/2 + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) / 2 ).

The marginal posterior mean (for n + 2a > 2) is then

E^π[σ² | y, X] = { 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) } / ( n + 2a - 2 ).

25 Conjugate Priors: MLE, Posterior Mean and Ridge Estimator Note that setting M = I_{k+1}/c, c > 0, and β̃ = 0_{k+1} in the conditional posterior mean

E[β | σ², y, X] = (M + X^T X)^{-1}( M β̃ + X^T X β̂ )

we obtain the classical ridge regression estimate:

E[β | σ², y, X] = ( I_{k+1}/c + X^T X )^{-1} X^T X β̂ = ( I_{k+1}/c + X^T X )^{-1} X^T y.

The general estimator can be seen as a weighted average of the prior mean and the MLE.

26 Conjugate Priors: Marginal Posterior of β

π(β | y, X) = ∫ π(β | σ², y, X) π(σ² | y, X) dσ²,
π(β | σ², y, X) = N_{k+1}( (M + X^T X)^{-1}( X^T X β̂ + M β̃ ), σ² (M + X^T X)^{-1} ),
π(σ² | y, X) = IG( n/2 + a, b + s²/2 + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) / 2 ).

To integrate, we use the following transformations: τ = 1/σ², dσ² = -dτ/τ², μ = (M + X^T X)^{-1}( X^T X β̂ + M β̃ ), and

Z² = (τ/2) { ( β - μ )^T (M + X^T X) ( β - μ ) + 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) } = (τ/2) G, so that dτ = (2/G) d(Z²).

27 Conjugate Priors: Marginal Posterior of β With τ = 1/σ², μ = (M + X^T X)^{-1}( X^T X β̂ + M β̃ ) and

G = ( β - μ )^T (M + X^T X) ( β - μ ) + 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ),

the marginal now takes the form

π(β | y, X) ∝ ∫_0^∞ (σ²)^{-(k+1)/2 - n/2 - a - 1} e^{-G/(2σ²)} dσ² = ∫_0^∞ τ^{(k+1)/2 + n/2 + a - 1} e^{-τ G/2} dτ ∝ G^{-( (k+1)/2 + n/2 + a )},

since the remaining integral (after the further substitution Z² = τ G/2) is a scalar quantity independent of β.

28 Conjugate Priors: Marginal Posterior of β Thus integrating out σ² leads to a multivariate Student's t marginal posterior on β:

π(β | y, X) ∝ [ ( β - μ )^T (M + X^T X) ( β - μ ) + 2b + s² + ( β̃ - β̂ )^T ( M^{-1} + (X^T X)^{-1} )^{-1} ( β̃ - β̂ ) ]^{-(k + n + 2a + 1)/2},

where μ = (M + X^T X)^{-1}( X^T X β̂ + M β̃ ). Recall that the density of the multivariate T_p(ν, θ, Σ) is

f(t) = Γ( (ν + p)/2 ) / [ Γ(ν/2) (νπ)^{p/2} ] det(Σ)^{-1/2} [ 1 + (t - θ)^T Σ^{-1} (t - θ)/ν ]^{-(ν + p)/2}.

29 Conjugate Priors: Marginal Posterior of β We can now see that our marginal

π(β | y, X) ∝ [ ( β - μ )^T (M + X^T X) ( β - μ ) + 2b + s² + ( β̃ - β̂ )^T ( M^{-1} + (X^T X)^{-1} )^{-1} ( β̃ - β̂ ) ]^{-(k + n + 2a + 1)/2}

can be written (check by substituting the expressions below) in the form of the multivariate T_p(ν, θ, Σ) density given above, with p = k + 1 and ν = n + 2a:

β | y, X ~ T_{k+1}( n + 2a, μ, Σ ),
μ = (M + X^T X)^{-1}( X^T X β̂ + M β̃ ),
Σ = [ 2b + s² + ( β̃ - β̂ )^T ( M^{-1} + (X^T X)^{-1} )^{-1} ( β̃ - β̂ ) ] / ( n + 2a ) × (M + X^T X)^{-1}.
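A sketch of the conjugate-prior posterior quantities derived above (conditional posterior mean, marginal inverse-gamma posterior for σ², and the parameters of the marginal Student's t for β), assuming Python/NumPy and the particular choices M = I_{k+1}/c and β̃ = 0 used later in the implementation slides; the data below is synthetic, not the caterpillar data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

# conjugate prior: beta | sigma^2 ~ N(beta_tilde, sigma^2 M^{-1}), sigma^2 ~ InvGamma(a, b)
a, b, c = 2.1, 2.0, 100.0
beta_tilde = np.zeros(k + 1)
M = np.eye(k + 1) / c

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)
s2 = np.sum((y - X @ beta_hat) ** 2)

A = M + XtX
mu = np.linalg.solve(A, XtX @ beta_hat + M @ beta_tilde)       # E[beta | sigma^2, y, X]

d = beta_tilde - beta_hat
quad = d @ np.linalg.solve(np.linalg.inv(M) + np.linalg.inv(XtX), d)

# marginal posterior: sigma^2 | y, X ~ InvGamma(n/2 + a, b + s2/2 + quad/2)
ig_shape = n / 2 + a
ig_scale = b + s2 / 2 + quad / 2
post_mean_sigma2 = ig_scale / (ig_shape - 1)                   # = (2b + s2 + quad) / (n + 2a - 2)

# marginal posterior: beta | y, X ~ T_{k+1}(n + 2a, mu, Sigma)
Sigma = (2 * b + s2 + quad) / (n + 2 * a) * np.linalg.inv(A)
post_var_beta = (n + 2 * a) / (n + 2 * a - 2) * Sigma          # covariance of a t with nu = n + 2a dof
```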

30 Multivariate Student's-t Distribution Recall the properties of the multivariate Student's-t distribution (density, mean, and covariance).* [Table reproduced from the reference below.] *Note that in the notation of the table the degrees of freedom of the distribution, rather than the dimensionality, are shown as subscripts, e.g. t_ν vs. t_d in the earlier slide. Hopefully the notation will be clear from the discussion. Bayesian Data Analysis, A. Gelman, J. Carlin, H. Stern and D. Rubin.

31 Conjugate Priors: Predictive Distribution For a given m x (k+1) explanatory matrix X̃, the outcome ỹ can be inferred through the predictive distribution* π(ỹ | σ², y, X, X̃). Since π(ỹ | X̃, β, σ²) = N( X̃β, σ² I_m ), and since the posterior of β conditional on σ² is given as

β | σ², y, X ~ N_{k+1}( (M + X^T X)^{-1}( X^T X β̂ + M β̃ ), σ² (M + X^T X)^{-1} ),

we can see that

π(ỹ | σ², y, X, X̃) = ∫ π(ỹ | β, σ², y, X, X̃) π(β | σ², y, X) dβ

is a Gaussian. * We will later on integrate σ² out to compute π(ỹ | y, X, X̃).

32 Appendix We will make use next of the following conditional expectation equalities:

E[X | Z] = E[ E[X | Y, Z] | Z ]
Var[X | Z] = Var[ E[X | Y, Z] | Z ] + E[ Var[X | Y, Z] | Z ]

The proof of these is quite straightforward, e.g.

E[X | Z] = ∫ x p(x | z) dx = ∫ x ( ∫ p(x, y | z) dy ) dx = ∫ x ( ∫ p(x | y, z) p(y | z) dy ) dx = ∫ ( ∫ x p(x | y, z) dx ) p(y | z) dy = E[ E[X | Y, Z] | Z ].

33 Conjugate Priors: Predictive Distribution We can compute the mean and the variance of this predictive distribution as follows:

E^π[ỹ | σ², y, X, X̃] = E^π[ E^π(ỹ | β, σ², y, X, X̃) | σ², y, X, X̃ ] = E^π[ X̃β | σ², y, X, X̃ ] = X̃ (M + X^T X)^{-1}( X^T X β̂ + M β̃ ),

and

Var^π[ỹ | σ², y, X, X̃] = E^π[ Var(ỹ | β, σ², y, X, X̃) | σ², y, X, X̃ ] + Var^π[ E(ỹ | β, σ², y, X, X̃) | σ², y, X, X̃ ]
   = E^π[ σ² I_m | σ², y, X, X̃ ] + Var^π[ X̃β | σ², y, X, X̃ ] = σ² I_m + X̃ σ² (M + X^T X)^{-1} X̃^T = σ² ( I_m + X̃ (M + X^T X)^{-1} X̃^T ).

In conclusion:

ỹ | σ², y, X, X̃ ~ N_m( X̃ E[β | σ², y, X], σ² I_m + X̃ Var[β | σ², y, X] X̃^T ), with E[β | σ², y, X] = (M + X^T X)^{-1}( X^T X β̂ + M β̃ ).

34 Conjugate Priors: Predictive Distribution

ỹ | σ², y, X, X̃ ~ N_m( X̃ E[β | σ², y, X], σ² I_m + X̃ Var[β | σ², y, X] X̃^T ),
E[ỹ | σ², y, X, X̃] = X̃ E[β | σ², y, X] = X̃ (M + X^T X)^{-1}( X^T X β̂ + M β̃ ),
Var[ỹ | σ², y, X, X̃] = σ² I_m + X̃ Var[β | σ², y, X] X̃^T = σ² ( I_m + X̃ (M + X^T X)^{-1} X̃^T ).

We now integrate σ² against the posterior distribution

σ² | y, X ~ IG( n/2 + a, b + s²/2 + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) / 2 )

to obtain

π(ỹ | y, X, X̃) = ∫_0^∞ N( X̃ E[β | σ², y, X], σ² I_m + X̃ Var[β | σ², y, X] X̃^T ) IG( n/2 + a, b + s²/2 + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) / 2 ) dσ².

35 Conjugate Priors: Predictive Distribution We substitute σ² I_m + X̃ Var[β | σ², y, X] X̃^T = σ²( I_m + X̃ (M + X^T X)^{-1} X̃^T ); the Gaussian-times-inverse-gamma product then gives

π(ỹ | y, X, X̃) ∝ ∫_0^∞ σ^{-(m + n + 2a + 2)} exp( -(1/(2σ²)) { 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ )
   + ( ỹ - E[ỹ | σ², y, X, X̃] )^T ( I_m + X̃ (M + X^T X)^{-1} X̃^T )^{-1} ( ỹ - E[ỹ | σ², y, X, X̃] ) } ) dσ².

36 Conjugate Priors: Predictive Distribution

π(ỹ | y, X, X̃) ∝ ∫_0^∞ σ^{-(m + n + 2a + 2)} exp( -(1/(2σ²)) { 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ )
   + ( ỹ - E[ỹ | σ², y, X, X̃] )^T ( I_m + X̃ (M + X^T X)^{-1} X̃^T )^{-1} ( ỹ - E[ỹ | σ², y, X, X̃] ) } ) dσ².

If we call the term inside {·} G, then

π(ỹ | y, X, X̃) ∝ ∫_0^∞ (σ²)^{-(m + n + 2a + 2)/2} e^{-G/(2σ²)} dσ² ∝ G^{-(m + n + 2a)/2},

the remaining integral (after the substitution Z² = G/(2σ²)) being a constant independent of ỹ. Hence

π(ỹ | y, X, X̃) ∝ { 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ )
   + ( ỹ - E[ỹ | σ², y, X, X̃] )^T ( I_m + X̃ (M + X^T X)^{-1} X̃^T )^{-1} ( ỹ - E[ỹ | σ², y, X, X̃] ) }^{-(m + n + 2a)/2}.

37 Conjugate Priors: Predictive Distribution

π(ỹ | y, X, X̃) ∝ { 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ )
   + ( ỹ - E[ỹ | σ², y, X, X̃] )^T ( I_m + X̃ (M + X^T X)^{-1} X̃^T )^{-1} ( ỹ - E[ỹ | σ², y, X, X̃] ) }^{-(m + n + 2a)/2}

This corresponds to a Student's t distribution T_p(ν, θ, Σ) (with the density given earlier):

ỹ | y, X, X̃ ~ T_m( n + 2a, μ̃, Σ̃ ),
μ̃ = E[ỹ | σ², y, X, X̃] = X̃ (M + X^T X)^{-1}( X^T X β̂ + M β̃ ),
Σ̃ = [ 2b + s² + ( β̃ - β̂ )^T ( M^{-1} + (X^T X)^{-1} )^{-1} ( β̃ - β̂ ) ] / ( n + 2a ) × ( I_m + X̃ (M + X^T X)^{-1} X̃^T ).
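A sketch computing the parameters of this predictive Student's t under the same assumptions as before (Python/NumPy, M = I/c, β̃ = 0; the synthetic data and the hypothetical new design matrix X_new are placeholders, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 40, 3, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)
X_new = np.column_stack([np.ones(m), rng.normal(size=(m, k))])   # m new design points (hypothetical)

a, b, c = 2.1, 2.0, 100.0
beta_tilde, M = np.zeros(k + 1), np.eye(k + 1) / c

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)
s2 = np.sum((y - X @ beta_hat) ** 2)
d = beta_tilde - beta_hat
quad = d @ np.linalg.solve(np.linalg.inv(M) + np.linalg.inv(XtX), d)

A_inv = np.linalg.inv(M + XtX)
pred_mean = X_new @ A_inv @ (XtX @ beta_hat + M @ beta_tilde)            # location of the predictive t
pred_scale = (2 * b + s2 + quad) / (n + 2 * a) * (np.eye(m) + X_new @ A_inv @ X_new.T)
# y_new | y, X, X_new ~ T_m(n + 2a, pred_mean, pred_scale)
```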

38 Implementation of the Conjugate Prior Our prior: β | σ², X ~ N_{k+1}( β̃, σ² M^{-1} ), M a (k+1) x (k+1) positive definite symmetric matrix; σ² | X ~ IG(a, b), a, b > 0. We assume that there is no precise information available about β̃, M, a, b. Let us choose M = I_{k+1}/c, so that β | σ², X ~ N_{k+1}( 0_{k+1}, c σ² I_{k+1} ), and take a = 2.1 and b = 2, i.e. prior mean and prior variance of σ² equal to 1.82 and 33.06, with β̃ = 0_{k+1}. The intuition is that if c is large, the prior on β should be more diffuse and have less bearing on the outcome. However, this turns out not to be the case: there is a lasting influence of c on the posterior means of σ² and β_0.

39 Influence of the Prior Scale c on Bayesian Estimates **************** Conjugate Priors ******************** [Table: E[σ² | y, X], E[β_0 | y, X] and V[β_0 | y, X] for a range of values of c; the numerical values are not preserved in this transcription.] An implementation is available (MatLab, C++).

E^π[σ² | y, X] = { 2b + s² + ( β̃ - β̂ )^T [ M^{-1} + (X^T X)^{-1} ]^{-1} ( β̃ - β̂ ) } / ( n + 2a - 2 )
E^π[β | y, X] = (M + X^T X)^{-1}( X^T X β̂ + M β̃ )
Var[β | y, X] = { 2b + s² + ( β̃ - β̂ )^T ( M^{-1} + (X^T X)^{-1} )^{-1} ( β̃ - β̂ ) } / ( n + 2a - 2 ) × (M + X^T X)^{-1}

40 Bayesian Estimates of for c=100 ************** Conjugate Priors*************** Bayes estimates of beta for c= i y, X V i y, X beta_i E[beta_i y, X] V[beta_i y, X] beta_ beta_ beta_ beta_ beta_ beta_ beta_ beta_ beta_ beta_ beta_ An implementation is available MatLab, C++ 40

41 Influence of the Conjugate Prior The value of c thus has a significant influence on the estimators and an even greater one on the posterior variance. The Bayes estimates stabilize only for very large values of c. Thus the prior associated with a particular c should not be considered as a weak or pseudo-noninformative prior but, on the contrary, as carrying specific proper prior information. The dependence on (a, b) is equally strong. Considering these limitations of conjugate priors on at least the posterior variance, a more sophisticated noninformative strategy is needed. We first look at a middle-ground perspective which settles the problem of the choice of M.

42 Zellner's Informative G-Prior We start with a middle-ground solution by introducing only information about the location parameter of the regression while bypassing the selection of the prior correlation structure:

β | σ², X ~ N_{k+1}( β̃, c σ² (X^T X)^{-1} ),
π(σ² | X) ∝ σ^{-2} (improper Jeffreys prior).

Zellner, A. (1971). An Introduction to Bayesian Econometrics. John Wiley, New York. Zellner, A. (1984). Basic Issues in Econometrics. University of Chicago Press, Chicago. Carlin, B. and Louis, T. (1996). Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, New York.

43 Zellner's Informative G-Prior We start with a middle-ground solution by introducing only information about the location parameter of the regression while bypassing the selection of the prior correlation structure:

β | σ², X ~ N_{k+1}( β̃, c σ² (X^T X)^{-1} ),
π(σ² | X) ∝ σ^{-2} (improper Jeffreys prior).

The prior determination is restricted to the choices of β̃ and the constant c. c can be interpreted as a measure of the information available in the prior relative to the sample. E.g., we will see that setting 1/c = 0.5 gives the prior the same weight as 50% of the sample. There is still a strong influence of c.

44 Zellner's Informative G-Prior: Conditional Posterior of β The joint posterior now takes the form (note that X^T X is used in both likelihood and prior):

π(β, σ² | y, X) ∝ (σ²)^{-n/2} exp( -(1/(2σ²)) (y - Xβ̂)^T (y - Xβ̂) - (1/(2σ²)) (β - β̂)^T X^T X (β - β̂) )
   × (σ²)^{-(k+1)/2} exp( -(1/(2cσ²)) (β - β̃)^T X^T X (β - β̃) ) × (σ²)^{-1}.

We can now compute the following:

π(β | σ², y, X) ∝ exp( -(1/(2σ²)) (β - β̂)^T X^T X (β - β̂) - (1/(2cσ²)) (β - β̃)^T X^T X (β - β̃) )
   ∝ exp( -( (c+1)/(2cσ²) ) ( β - (c/(c+1))( β̃/c + β̂ ) )^T X^T X ( β - (c/(c+1))( β̃/c + β̂ ) ) ),

i.e. β | σ², y, X ~ N_{k+1}( (c/(c+1))( β̃/c + β̂ ), ( cσ²/(c+1) ) (X^T X)^{-1} ).

45 Zellner's Informative G-Prior: Posterior Marginal of σ² Starting again with the posterior

π(β, σ² | y, X) ∝ (σ²)^{-(n + k + 3)/2} exp( -(1/(2σ²)) s² - (1/(2σ²)) (β - β̂)^T X^T X (β - β̂) - (1/(2cσ²)) (β - β̃)^T X^T X (β - β̃) )
   ∝ (σ²)^{-(k+1)/2} exp( -( (c+1)/(2cσ²) ) ( β - (c/(c+1))( β̃/c + β̂ ) )^T X^T X ( β - (c/(c+1))( β̃/c + β̂ ) ) )
     × (σ²)^{-n/2 - 1} exp( -(1/(2σ²)) { s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (c + 1) } ),

we can now compute the following posterior marginal by integrating in β:

σ² | y, X ~ InvGamma( n/2, s²/2 + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (2(c + 1)) ),
E[σ² | y, X] = { s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (c + 1) } / ( n - 2 ).

46 Zellner's Informative G-Prior: Posterior Marginal of β Starting again with the posterior

π(β, σ² | y, X) ∝ (σ²)^{-(n + k + 3)/2} exp( -(1/(2σ²)) [ ( β - (c/(c+1))( β̃/c + β̂ ) )^T ( (c+1)/c ) X^T X ( β - (c/(c+1))( β̃/c + β̂ ) )
   + s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (c + 1) ] ),

we can now compute the following posterior marginal by integrating in σ²:

π(β | y, X) ∝ [ ( β - (c/(c+1))( β̃/c + β̂ ) )^T ( (c+1)/c ) X^T X ( β - (c/(c+1))( β̃/c + β̂ ) ) + s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) ]^{-(n + k + 1)/2},

i.e. β | y, X ~ T_{k+1}( n, (c/(c+1))( β̃/c + β̂ ), { c ( s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) ) / ( n(c + 1) ) } (X^T X)^{-1} ).

47 Zellner's Informative G-Prior: Bayes Estimates The Bayes estimate of β can be derived from

β | y, X ~ T_{k+1}( n, (c/(c+1))( β̃/c + β̂ ), { c ( s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) ) / ( n(c + 1) ) } (X^T X)^{-1} ):

E[β | y, X] = ( β̃ + c β̂ ) / ( c + 1 ).

The posterior variance of β is also given from the above as

V^π[β | y, X] = ( c/(c + 1) ) { s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) } / ( n - 2 ) × (X^T X)^{-1}.
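A sketch of these Zellner G-prior Bayes estimates (Python/NumPy assumed; the prior mean β̃ = 0 and the synthetic data are assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, c = 40, 3, 100.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)
beta_tilde = np.zeros(k + 1)                     # G-prior mean (assumed 0 here)

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)
s2 = np.sum((y - X @ beta_hat) ** 2)
d = beta_tilde - beta_hat
quad = d @ XtX @ d                               # (beta_tilde - beta_hat)^T X^T X (beta_tilde - beta_hat)

post_mean_beta = (beta_tilde + c * beta_hat) / (c + 1)                          # E[beta | y, X]
post_var_beta = (c / (c + 1)) * (s2 + quad / (c + 1)) / (n - 2) * np.linalg.inv(XtX)
post_mean_sigma2 = (s2 + quad / (c + 1)) / (n - 2)                              # E[sigma^2 | y, X]
```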

48 Zellner's Informative G-Prior: Bayes Estimates E[β | y, X] = ( β̃ + c β̂ ) / ( c + 1 ). If c = 1, you can see from the above equation that it is like putting the same weight on the prior information and on the sample: E[β | y, X] = ( β̃ + β̂ ) / 2, which is the average of the prior mean and the MLE estimator. If c = 100, the prior weighs 1% of the sample.

49 Zellner's Informative G-Prior: Bayes Estimates The Bayes estimate of σ² can be derived from (use the mean of the InvGamma)

σ² | y, X ~ IG( n/2, s²/2 + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (2(c + 1)) )

as E[σ² | y, X] = { s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (c + 1) } / ( n - 2 ). Note that only when c goes to infinity does the influence of the prior vanish.

50 Zellner's Informative G-Prior: Marginal Posterior Mean and Variance of β **************** Zellner's G-Prior ***************** Posterior mean and variance of beta for c = 100, with

E[β | y, X] = ( β̃ + c β̂ ) / ( c + 1 ),
V^π[β | y, X] = ( c/(c + 1) ) { s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) } / ( n - 2 ) × (X^T X)^{-1}.

[Table: E[β_i | y, X], V[β_i | y, X] and log10(BF) for the intercept and X1 through X10, with significance flags; the numerical values are not fully preserved in this transcription.] See here for the BF calculation. An implementation is available (MatLab, C++). Evidence against H_0: (****) decisive, (***) strong, (**) substantial, (*) poor.

51 Zellner's Informative G-Prior: Marginal Posterior Mean and Variance of β **************** Zellner's G-Prior ***************** Posterior mean and variance of beta for c = 1000, with

E[β | y, X] = ( β̃ + c β̂ ) / ( c + 1 ),
V^π[β | y, X] = ( c/(c + 1) ) { s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) } / ( n - 2 ) × (X^T X)^{-1}.

[Table: E[β_i | y, X], V[β_i | y, X] and log10(BF) for the intercept and X1 through X10, with significance flags; the numerical values are not fully preserved in this transcription.] See here for the BF calculation. An implementation is available (MatLab, C++). Evidence against H_0: (****) decisive, (***) strong, (**) substantial, (*) poor.

52 Zellner's Informative G-Prior: Predictive Modeling We want to predict m ≥ 1 future observations ỹ when the explanatory variables X̃, but not the outcome variables, have been observed:

ỹ ~ N_m( X̃β, σ² I_m ).

The predictive distribution on ỹ is defined as the marginal of the joint posterior distribution on (ỹ, β, σ²). It can be computed analytically:

π(ỹ | y, X, X̃) ∝ ∫ π(ỹ | σ², y, X, X̃) π(σ² | y, X, X̃) dσ².

53 Zellner's Informative G-Prior: Predictive Modeling Since β | σ², y, X ~ N_{k+1}( (c/(c+1))( β̃/c + β̂ ), ( cσ²/(c+1) ) (X^T X)^{-1} ), conditional on σ² the future vector of observations has a Gaussian distribution with

E^π[ỹ | σ², y, X, X̃] = E^π[ E^π(ỹ | β, σ², y, X, X̃) | σ², y, X, X̃ ] = E^π[ X̃β | σ², y, X, X̃ ] = X̃ ( β̃ + c β̂ ) / ( c + 1 ),

independent of σ². This representation is quite intuitive, being the product of the matrix of explanatory variables X̃ by the Bayes estimate of β.

54 Zellner's Informative G-Prior: Predictive Modeling With β | σ², y, X ~ N_{k+1}( (c/(c+1))( β̃/c + β̂ ), ( cσ²/(c+1) ) (X^T X)^{-1} ), similarly we can compute

V^π[ỹ | σ², y, X, X̃] = E^π[ V(ỹ | β, σ², y, X, X̃) | σ², y, X, X̃ ] + V^π[ E(ỹ | β, σ², y, X, X̃) | σ², y, X, X̃ ]
   = E^π[ σ² I_m | σ², y, X, X̃ ] + V^π[ X̃β | σ², y, X, X̃ ]
   = σ² ( I_m + ( c/(c + 1) ) X̃ (X^T X)^{-1} X̃^T ).

55 Zellner's Informative G-Prior: Credible Regions - Highest Posterior Density Regions Here we are interested in the highest posterior density (HPD) regions on subvectors of the parameter β, derived from the marginal posterior distribution of β. For a single parameter,

β_i | y, X ~ T_1( n, ( β̃_i + c β̂_i ) / ( c + 1 ), { c ( s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) ) / ( n(c + 1) ) } ω_ii ),

where ω_ii is the (i, i) element of (X^T X)^{-1}.

56 Zellner's Informative G-Prior: Credible Regions - Highest Posterior Density Regions Let us define

τ = ( β̃ + c β̂ ) / ( c + 1 )  and  K = { c ( s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) ) / ( n(c + 1) ) } (X^T X)^{-1} = (K_ij).

The variable ( β_i - τ_i ) / sqrt(K_ii) has a T-distribution with n degrees of freedom.

57 Zellner's Informative G-Prior: Credible Regions - Highest Posterior Density Regions A 1 - α HPD interval on β_i is thus given by (see also here)

[ τ_i - sqrt(K_ii) F_n^{-1}( 1 - α/2 ), τ_i + sqrt(K_ii) F_n^{-1}( 1 - α/2 ) ],

where F_n is the CDF of T(ν = n).
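A sketch of this HPD computation for the individual β_i's (Python with SciPy's Student-t quantiles; the synthetic data, the choice c = 100, β̃ = 0, and the credibility level are placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k, c, level = 40, 3, 100.0, 0.95
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)
beta_tilde = np.zeros(k + 1)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2)
d = beta_tilde - beta_hat
quad = d @ (X.T @ X) @ d

tau = (beta_tilde + c * beta_hat) / (c + 1)                         # marginal posterior location
K = c * (s2 + quad / (c + 1)) / (n * (c + 1)) * XtX_inv             # scale matrix K
half = stats.t.ppf(1 - (1 - level) / 2, df=n) * np.sqrt(np.diag(K))
hpd = np.column_stack([tau - half, tau + half])                     # HPD interval per coefficient
```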

58 Zellner's Informative G-Prior: High Posterior Density (HPD) Intervals for the β_i's Note that these HPD intervals are different from the frequentist confidence intervals defined earlier:

β_i | y, X ~ T( n - k - 1, β̂_i, ω_ii s² / ( n - k - 1 ) ),
( β_i - β̂_i ) / sqrt( ω_ii s² / ( n - k - 1 ) ) ~ T( ν = n - k - 1, 0, 1 ), where ω_ii = (X^T X)^{-1}_{(i,i)},
{ β_i : |β_i - β̂_i| ≤ F^{-1}_{n-k-1}( 1 - α/2 ) sqrt( ω_ii s² / ( n - k - 1 ) ) }, s² = (y - Xβ̂)^T (y - Xβ̂).

59 Zellner's Informative G-Prior: 95% High Posterior Density (HPD) Intervals for the β_i's, c = 100. [Table: 95% HPD intervals for beta_0 through beta_10; the transcription preserves only part of the values, e.g. the lower endpoints 4.6518 for beta_0 and 0.0152 for beta_5.] An implementation is available (MatLab, C++).

{ β_i : |β_i - β̂_i| ≤ F^{-1}_{n-k-1}( 1 - α/2 ) sqrt( ω_ii s² / ( n - k - 1 ) ) }

60 Zellner's Informative G-Prior: 90% High Posterior Density (HPD) Intervals for the β_i's, c = 100. [Table: 90% HPD intervals for beta_0 through beta_10; the transcription preserves only part of the values, e.g. the lower endpoints 5.7435 for beta_0 and 0.0524 for beta_5.] An implementation is available (MatLab, C++). Note: the results given in C. P. Robert, The Bayesian Core, Springer, 2nd edition, chapter 3, refer to 90% HPD intervals rather than the 95% ones posted earlier; those results agree with what is shown above.

61 Zellner's Informative G-Prior: Marginal Distribution of y The marginal distribution of y (the evidence) is a multivariate t. Since β satisfies β | σ², X ~ N_{k+1}( β̃, c σ² (X^T X)^{-1} ), the linear transform Xβ satisfies

Xβ | σ², X ~ N( Xβ̃, c σ² X (X^T X)^{-1} X^T ), which implies y | σ², X ~ N_n( Xβ̃, σ² ( I_n + c X (X^T X)^{-1} X^T ) ).

Integration in σ² with π(σ²) ∝ σ^{-2},

f(y | X, c) = ∫_0^∞ (2π)^{-n/2} (c + 1)^{-(k+1)/2} (σ²)^{-n/2 - 1} exp( -(1/(2σ²)) (y - Xβ̃)^T ( I_n + c X (X^T X)^{-1} X^T )^{-1} (y - Xβ̃) ) dσ²,

gives (see the Appendices):

f(y | X, c) = (c + 1)^{-(k+1)/2} π^{-n/2} Γ(n/2) [ y^T y - ( c/(c + 1) ) y^T X (X^T X)^{-1} X^T y + ( 1/(c + 1) ) β̃^T X^T X β̃ - ( 2/(c + 1) ) y^T Xβ̃ ]^{-n/2}.

62 Appendix It can be easily shown that

( I_n + c X (X^T X)^{-1} X^T )^{-1} = I_n - ( c/(c + 1) ) X (X^T X)^{-1} X^T.

Indeed:

( I_n - ( c/(c+1) ) X (X^T X)^{-1} X^T )( I_n + c X (X^T X)^{-1} X^T )
   = I_n + c X (X^T X)^{-1} X^T - ( c/(c+1) ) X (X^T X)^{-1} X^T - ( c²/(c+1) ) X (X^T X)^{-1} X^T X (X^T X)^{-1} X^T
   = I_n + ( c - c/(c+1) - c²/(c+1) ) X (X^T X)^{-1} X^T = I_n.

This simplifies the term in the exponential in the earlier slide:

(y - Xβ̃)^T ( I_n + c X (X^T X)^{-1} X^T )^{-1} (y - Xβ̃) = (y - Xβ̃)^T ( I_n - ( c/(c+1) ) X (X^T X)^{-1} X^T ) (y - Xβ̃)
   = y^T y - ( c/(c+1) ) y^T X (X^T X)^{-1} X^T y - 2 y^T Xβ̃ + ( 2c/(c+1) ) y^T Xβ̃ + β̃^T X^T X β̃ - ( c/(c+1) ) β̃^T X^T X β̃
   = y^T y - ( c/(c+1) ) y^T X (X^T X)^{-1} X^T y + ( 1/(c+1) ) β̃^T X^T X β̃ - ( 2/(c+1) ) y^T Xβ̃.

63 Appendix Let us revisit the matrix I_n + c X (X^T X)^{-1} X^T. Note that for any θ ∈ R^{k+1},

( I_n + c X (X^T X)^{-1} X^T ) Xθ = Xθ + c Xθ = (1 + c) Xθ,

which implies that Xθ is an eigenvector with eigenvalue 1 + c. There are k + 1 of those. Finally note that for any z in the null space of X^T, i.e. X^T z = 0,

( I_n + c X (X^T X)^{-1} X^T ) z = z,

which implies that these z (n - k - 1 in number) are eigenvectors of our matrix with 1 as the eigenvalue. The determinant of the matrix is then (1 + c)^{k+1} · 1^{n-k-1} = (1 + c)^{k+1}. This explains the derivation in the earlier slide.

64 Zellner's Informative G-Prior: Point Null Hypothesis If a null hypothesis is H_0: Rβ = r, R being a q x (k+1) matrix, the model under H_0 can be rewritten as

y | β_0, σ², X_0 ~ N_n( X_0 β_0, σ² I_n ) under H_0,

where β_0 is (k + 1 - q)-dimensional. Under the prior

β_0 | σ², X_0 ~ N_{k+1-q}( β̃_0, c_0 σ² (X_0^T X_0)^{-1} ),

the marginal distribution of y under H_0 is

f(y | X_0, H_0) = (c_0 + 1)^{-(k+1-q)/2} π^{-n/2} Γ(n/2) [ y^T y - ( c_0/(c_0 + 1) ) y^T X_0 (X_0^T X_0)^{-1} X_0^T y + ( 1/(c_0 + 1) ) β̃_0^T X_0^T X_0 β̃_0 - ( 2/(c_0 + 1) ) y^T X_0 β̃_0 ]^{-n/2}.

65 Zellner's Informative G-Prior: Bayes Factor The Bayes factor is then given in analytical form as

B^π_10 = f(y | X) / f(y | X_0, H_0)
   = ( (c_0 + 1)^{(k+1-q)/2} / (c + 1)^{(k+1)/2} )
     × { [ y^T y - ( c_0/(c_0+1) ) y^T X_0 (X_0^T X_0)^{-1} X_0^T y + ( 1/(c_0+1) ) β̃_0^T X_0^T X_0 β̃_0 - ( 2/(c_0+1) ) y^T X_0 β̃_0 ]
       / [ y^T y - ( c/(c+1) ) y^T X (X^T X)^{-1} X^T y + ( 1/(c+1) ) β̃^T X^T X β̃ - ( 2/(c+1) ) y^T Xβ̃ ] }^{n/2}.

Note that we use the same σ² in both models. The Bayes factor depends on c_0 and c. This calculation can be used to evaluate the inclusion or not of each of the terms β_i, i = 1, ..., k + 1, in our regression model. Note: X_0 = X[:, setdiff(1:k+1, j)].
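A sketch of this Bayes factor computation for dropping a single covariate, assuming β̃ = β̃_0 = 0 and the same c in both models (Python/NumPy; the log_evidence helper and the synthetic data are illustrative, not from the slides):

```python
import numpy as np

def log_evidence(y, Xg, c):
    """log f(y | X_g, c) up to a common constant, Zellner G-prior with prior mean 0."""
    n, p = Xg.shape
    fit = y @ Xg @ np.linalg.solve(Xg.T @ Xg, Xg.T @ y)        # y^T X_g (X_g^T X_g)^{-1} X_g^T y
    return -0.5 * p * np.log(c + 1) - 0.5 * n * np.log(y @ y - c / (c + 1) * fit)

rng = np.random.default_rng(6)
n, k, c = 40, 3, 100.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

# Bayes factor B10 for H0: beta_j = 0 (drop column j), using the same c in both models
for j in range(1, k + 1):
    X0 = np.delete(X, j, axis=1)
    log10_BF = (log_evidence(y, X, c) - log_evidence(y, X0, c)) / np.log(10)
    print(f"x_{j}: log10 B10 = {log10_BF:.2f}")
```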

66 Noninformative Prior Analysis: Jeffreys Prior Considering the robustness issues of the two priors examined earlier in the case of a complete lack of prior information, we consider now the non-informative Jeffreys prior. The Jeffreys prior is flat in (β, log σ²):

π^J(β, σ² | X) ∝ σ^{-2}.

The posterior is then given as

π^J(β, σ² | y, X) ∝ (σ²)^{-n/2} exp( -(1/(2σ²)) (y - Xβ̂)^T (y - Xβ̂) - (1/(2σ²)) (β - β̂)^T X^T X (β - β̂) ) σ^{-2}
   = (σ²)^{-(k+1)/2} exp( -(1/(2σ²)) (β - β̂)^T X^T X (β - β̂) ) × (σ²)^{-(n-k-1)/2 - 1} exp( -s²/(2σ²) ).

67 Noninformative Prior Analysis: Jeffreys Prior

π^J(β, σ² | y, X) ∝ (σ²)^{-(k+1)/2} exp( -(1/(2σ²)) (β - β̂)^T X^T X (β - β̂) ) × (σ²)^{-(n-k-1)/2 - 1} exp( -s²/(2σ²) )

From this joint posterior, we can immediately evaluate the following conditional and marginal posteriors:

π^J(σ² | y, X) ∝ (σ²)^{-(n-k-1)/2 - 1} exp( -s²/(2σ²) ), i.e. σ² | y, X ~ IG( (n - k - 1)/2, s²/2 ),
π^J(β | σ², y, X) ∝ (σ²)^{-(k+1)/2} exp( -(1/(2σ²)) (β - β̂)^T X^T X (β - β̂) ) = N_{k+1}( β̂, σ² (X^T X)^{-1} ).

From the first of these equations, the Bayes estimate of σ² is E[σ² | y, X] = s² / ( n - k - 3 ). This estimate is larger (more pessimistic) than the earlier estimates.

68 Noninformative Prior Analysis: Jeffreys Prior

π(β, σ² | y, X) ∝ (σ²)^{-n/2 - 1} exp( -( s² + (β - β̂)^T X^T X (β - β̂) ) / (2σ²) )

To compute the marginal posterior of β, we need to integrate in σ²:

π(β | y, X) ∝ ∫ (σ²)^{-n/2 - 1} exp( -( s² + (β - β̂)^T X^T X (β - β̂) ) / (2σ²) ) dσ²
   ∝ [ s² + (β - β̂)^T X^T X (β - β̂) ]^{-n/2}
   ∝ [ 1 + (β - β̂)^T ( s² (X^T X)^{-1} / (n - k - 1) )^{-1} (β - β̂) / (n - k - 1) ]^{-n/2}.

This is a Student's t distribution: π(β | y, X) = T_{k+1}( ν = n - k - 1, θ, Σ ), with θ = β̂ and Σ = s² (X^T X)^{-1} / (n - k - 1). Thus the Bayes estimate of β is E^π[β | y, X] = β̂. The HPD intervals coincide with the frequentist confidence intervals.

69 Zellner's Non-Informative G-Prior The main difference with the informative G-prior setup is that we now consider c as unknown. We use the same G-prior distribution with β̃ = 0_{k+1} conditional on c, and introduce a diffuse prior on c,

π(c) ∝ c^{-1} I_{N*}(c).

The corresponding marginal posterior is given as

π(β, σ² | y, X) = Σ_{c=1}^∞ π(β, σ² | y, X, c) π(c | y, X),
π(c | y, X) ∝ f(y | X, c) π(c) ∝ f(y | X, c) c^{-1}.

70 Zellner's Non-Informative G-Prior

π(β, σ² | y, X) ∝ Σ_{c=1}^∞ π(β, σ² | y, X, c) f(y | X, c) c^{-1}.

We have proved earlier (for constant c) that

f(y | X, c) = (c + 1)^{-(k+1)/2} π^{-n/2} Γ(n/2) [ y^T y - ( c/(c + 1) ) y^T X (X^T X)^{-1} X^T y + ( 1/(c + 1) ) β̃^T X^T X β̃ - ( 2/(c + 1) ) y^T Xβ̃ ]^{-n/2}.

For our problem with β̃ = 0_{k+1} this becomes

f(y | X, c) ∝ (c + 1)^{-(k+1)/2} [ y^T y - ( c/(c + 1) ) y^T X (X^T X)^{-1} X^T y ]^{-n/2}.

71 Zellner's Non-Informative G-Prior: Posterior Mean of β Recall that

β | y, X, c ~ T_{k+1}( n, (c/(c+1))( β̃/c + β̂ ), { c ( s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ )/(c + 1) ) / ( n(c + 1) ) } (X^T X)^{-1} ).

The Bayes estimate of β for β̃ = 0_{k+1} is now given by

E^π[β | y, X] = E^π[ E^π(β | y, X, c) | y, X ] = E^π[ ( c/(c + 1) ) β̂ | y, X ] = Σ_{c=1}^∞ ( c/(c + 1) ) π(c | y, X) β̂
   = { Σ_{c=1}^∞ ( c/(c + 1) ) f(y | X, c) c^{-1} } / { Σ_{c=1}^∞ f(y | X, c) c^{-1} } β̂.

72 Zellner's Non-Informative G-Prior: Posterior Mean of σ² Recall that

σ² | y, X, c ~ IG( n/2, s²/2 + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (2(c + 1)) ), E[σ² | y, X, c] = { s² + ( β̃ - β̂ )^T X^T X ( β̃ - β̂ ) / (c + 1) } / ( n - 2 ).

The Bayes estimate of σ² is given similarly (with β̃ = 0) by

E^π[σ² | y, X] = { Σ_{c=1}^∞ [ ( s² + β̂^T X^T X β̂ / (c + 1) ) / ( n - 2 ) ] f(y | X, c) c^{-1} } / { Σ_{c=1}^∞ f(y | X, c) c^{-1} }.

Both Bayes estimates involve infinite summations over c. The denominator in both cases is the normalizing constant of the posterior, Σ_{c=1}^∞ f(y | X, c) c^{-1}.

73 Zellner's Non-Informative G-Prior: Posterior Variance of β With (for β̃ = 0)

β | y, X, c ~ T_{k+1}( n, ( c/(c + 1) ) β̂, { c ( s² + β̂^T X^T X β̂ /(c + 1) ) / ( n(c + 1) ) } (X^T X)^{-1} ),

we have

V^π[β | y, X] = E^π[ V^π(β | y, X, c) | y, X ] + V^π[ E^π(β | y, X, c) | y, X ]
   = ( { Σ_{c=1}^∞ [ c ( s² + β̂^T X^T X β̂ /(c + 1) ) / ( (n - 2)(c + 1) ) ] f(y | X, c) c^{-1} } / { Σ_{c=1}^∞ f(y | X, c) c^{-1} } ) (X^T X)^{-1}
   + { E^π[ ( c/(c + 1) )² | y, X ] - ( E^π[ c/(c + 1) | y, X ] )² } β̂ β̂^T,

where the expectations over c are again ratios of sums weighted by f(y | X, c) c^{-1}. See the earlier slide for the term f(y | X, c).
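A sketch of how these infinite sums over c can be approximated by truncation (Python/NumPy; the truncation level c_max and the synthetic data are assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)
s2 = np.sum((y - X @ beta_hat) ** 2)
fit = y @ X @ beta_hat                        # y^T X (X^T X)^{-1} X^T y
quad_hat = beta_hat @ XtX @ beta_hat          # beta_hat^T X^T X beta_hat (equals fit here)

c_max = 100_000                               # truncation of the infinite sum over c (assumption)
c = np.arange(1, c_max + 1, dtype=float)
# log f(y | X, c) up to a constant, prior mean beta_tilde = 0
log_f = -0.5 * (k + 1) * np.log(c + 1) - 0.5 * n * np.log(y @ y - c / (c + 1) * fit)
w = np.exp(log_f - log_f.max()) / c           # weights f(y | X, c) * pi(c), with pi(c) = 1/c
w /= w.sum()                                  # normalize (the common constant cancels)

post_mean_beta = (w * c / (c + 1)).sum() * beta_hat
post_mean_sigma2 = (w * (s2 + quad_hat / (c + 1)) / (n - 2)).sum()
```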

74 Zellner's Non-Informative G-Prior: Marginal Distribution The marginal distribution of the dataset is available in closed form:

f(y | X) = Σ_{c=1}^∞ π(c) f(y | X, c) ∝ Σ_{c=1}^∞ c^{-1} (c + 1)^{-(k+1)/2} [ y^T y - ( c/(c + 1) ) y^T X (X^T X)^{-1} X^T y ]^{-n/2}.

The T-shape means that we can also compute the normalizing constant!

75 Zellner's Non-Informative G-Prior: Posterior Mean and Variance [Table: E[β_i | y, X] and V[β_i | y, X] for the intercept and X1 through X10; the numerical values are not preserved in this transcription.] A Matlab implementation can be downloaded here.

E^π[β | y, X] = { Σ_{c=1}^∞ ( c/(c + 1) ) f(y | X, c) c^{-1} } / { Σ_{c=1}^∞ f(y | X, c) c^{-1} } β̂,

with V^π[β | y, X] as on the previous slide. We use β̃ = 0_{11} and a finite truncation of the sums over c.

76 Zellner's Non-Informative G-Prior: Point Null Hypothesis If a null hypothesis is H_0: Rβ = r, the model under H_0 can be rewritten as

y | β_0, σ², X_0 ~ N_n( X_0 β_0, σ² I_n ) under H_0,

where β_0 is (k + 1 - q)-dimensional. Under the prior π(c) ∝ c^{-1} and

β_0 | X_0, σ², c ~ N_{k+1-q}( 0_{k+1-q}, c σ² (X_0^T X_0)^{-1} ),

the marginal distribution of y under H_0 is

f(y | X_0, H_0) ∝ Σ_{c=1}^∞ c^{-1} (c + 1)^{-(k+1-q)/2} [ y^T y - ( c/(c + 1) ) y^T X_0 (X_0^T X_0)^{-1} X_0^T y ]^{-n/2}.

The Bayes factor B^π_10 = f(y | X) / f(y | X_0, H_0) can now be computed.

77 Zellner's Non-Informative G-Prior: Posterior Mean and Variance For H_0: β_i = 0, the log10(BF) values are tabulated together with the posterior moments. [Table: E[β_i | y, X], V[β_i | y, X] and log10(BF) for the intercept and X1 through X10, with significance flags; the numerical values are not preserved in this transcription.] A Matlab implementation can be downloaded here. (We use β̃ = 0.) Evidence against H_0: (****) decisive, (***) strong, (**) substantial, (*) poor.

78 Variable Selection Let us return to our regression model with one dependent random variable y and a set {x_1, x_2, ..., x_k} of k explanatory variables. Are all the x_i's needed in the regression? We assume that every subset {x_{i_1}, x_{i_2}, ..., x_{i_q}}, 0 ≤ q ≤ k, of the explanatory variables is a proper set of explanatory variables for the regression of y (as before, the intercept is included in all models). We have a total of 2^k models to select from!

79 Variable Selection Following earlier notation, we denote X = [ 1_n x_1 x_2 ... x_k ] as the matrix that contains 1_n and the k potential predictor variables. Each model M_γ is associated with a binary indicator vector γ ∈ {0, 1}^k, where γ_i = 1 means that the variable x_i is included in the model M_γ and γ_i = 0 that it is not. The number of variables included in the model M_γ is q_γ = 1^T γ. The indices of the variables included in and excluded from the model are denoted, respectively, t_1(γ) and t_0(γ).

80 Variable Selection: Models in Competition For β ∈ R^{k+1} and X, we define β_γ as the sub-vector ( β_0, (β_i)_{i ∈ t_1(γ)} ). Let X_γ be the submatrix of X where only the column 1_n and the columns in t_1(γ) have been kept. The model M_γ is then defined as

y | γ, β_γ, σ², X ~ N_n( X_γ β_γ, σ² I_n ),

where (β_γ, σ²) ∈ R^{q_γ + 1} x R*_+ are the unknown parameters. The σ² is common to all models and we use the same prior for all models.

81 Variable Selection: Models in Competition We have a high number 2^k of models in competition. We cannot specify a prior on every M_γ in a completely subjective and autonomous manner. We derive all priors from a single global prior associated with the full model that corresponds to γ = (1, ..., 1).

82 Zellner's Informative Prior: Variable Selection For the full model that corresponds to γ = (1, ..., 1), we use Zellner's informative G-prior:

β | σ², X ~ N_{k+1}( β̃, c σ² (X^T X)^{-1} ), π(σ² | X) ∝ σ^{-2} (improper Jeffreys prior).

For each model M_γ, the prior distribution of β_γ conditional on σ² is fixed as

β_γ | γ, σ² ~ N_{q_γ + 1}( β̃_γ, c σ² (X_γ^T X_γ)^{-1} ), where β̃_γ = (X_γ^T X_γ)^{-1} X_γ^T X β̃,

and the same prior is used on σ².

83 Zellner's Informative Prior: Variable Selection Prior The joint prior for model M_γ is the improper prior

π(β_γ, σ² | γ) ∝ (σ²)^{-(q_γ + 1)/2 - 1} exp( -(1/(2cσ²)) ( β_γ - β̃_γ )^T X_γ^T X_γ ( β_γ - β̃_γ ) ).

There are infinitely many ways of defining a prior on the model index γ: our choice is a uniform prior, π(γ | X) = 2^{-k}. The posterior distribution of γ is central to variable selection since it is proportional to the marginal density of y on M_γ (the evidence of M_γ).

84 Zellner's Informative Prior: Variable Selection Prior The posterior distribution of γ is proportional to the marginal density of y on M_γ (so it can also be used to compute Bayes factors):

π(γ | y, X) ∝ f(y | γ, X) π(γ | X) ∝ f(y | γ, X) = ∫∫ f(y | γ, β, σ², X) π(β | γ, σ², X) π(σ² | X) dβ dσ²,

where (see the earlier derivation)

f(y | γ, σ², X) = ∫ f(y | γ, β, σ², X) π(β | γ, σ², X) dβ
   = (c + 1)^{-(q_γ + 1)/2} (2π)^{-n/2} (σ²)^{-n/2} exp( -(1/(2σ²)) y^T y + (1/(2σ²(c + 1))) [ c y^T X_γ (X_γ^T X_γ)^{-1} X_γ^T y - β̃_γ^T X_γ^T X_γ β̃_γ + 2 y^T X_γ β̃_γ ] ).

85 Zellner's Informative Prior: Variable Selection Prior

π(γ | y, X) ∝ ∫ f(y | γ, σ², X) π(σ² | X) dσ².

The posterior distribution of γ is then given as

π(γ | y, X) ∝ (c + 1)^{-(q_γ + 1)/2} [ y^T y - ( c/(c + 1) ) y^T X_γ (X_γ^T X_γ)^{-1} X_γ^T y + ( 1/(c + 1) ) β̃_γ^T X_γ^T X_γ β̃_γ - ( 2/(c + 1) ) y^T X_γ β̃_γ ]^{-n/2}.

We have already seen this distribution earlier for fixed c:

f(y | X, c) = (c + 1)^{-(k+1)/2} π^{-n/2} Γ(n/2) [ y^T y - ( c/(c + 1) ) y^T X (X^T X)^{-1} X^T y + ( 1/(c + 1) ) β̃^T X^T X β̃ - ( 2/(c + 1) ) y^T Xβ̃ ]^{-n/2}.
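For small k, the 2^k model posteriors can be enumerated directly; a sketch under the informative G-prior with β̃ = 0 (Python/NumPy; the log_post_gamma helper and the synthetic data are illustrative assumptions, not the caterpillar data):

```python
import numpy as np
from itertools import product

def log_post_gamma(y, X_gamma, c):
    """log pi(gamma | y, X) up to a constant, Zellner G-prior with beta_tilde = 0."""
    n, p = X_gamma.shape                                    # p = q_gamma + 1 (intercept always kept)
    fit = y @ X_gamma @ np.linalg.solve(X_gamma.T @ X_gamma, X_gamma.T @ y)
    return -0.5 * p * np.log(c + 1) - 0.5 * n * np.log(y @ y - c / (c + 1) * fit)

rng = np.random.default_rng(8)
n, k, c = 40, 4, 100.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, 0.0, 0.0, -0.5]) + rng.normal(size=n)

models = list(product([0, 1], repeat=k))                    # all 2^k indicator vectors gamma
logp = np.array([log_post_gamma(y, X[:, [0] + [j + 1 for j in range(k) if g[j]]], c)
                 for g in models])
prob = np.exp(logp - logp.max())
prob /= prob.sum()                                          # normalize over the 2^k models

for g, p in sorted(zip(models, prob), key=lambda t: -t[1])[:5]:
    print(g, round(p, 4))                                   # top 5 models by posterior probability
```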

86 Model Selection Most likely models, ordered by decreasing posterior probabilities, using Zellner's informative G-prior with c = 100. [Table: t_1(γ) and π(γ | y, X) for the top models; the numerical values are not preserved in this transcription.] An implementation is available (MatLab, C++).

87 Model Selection The model M_γ with the highest posterior probability is t_1(γ) = (1, 2, 4, 5), which corresponds to the variables altitude, slope, height of the tree sampled at the center of the area, and diameter of the tree sampled at the center of the area.

88 Model Selection For Zellner's non-informative prior with π(c) ∝ 1/c, we have (with β̃ = 0_{k+1}):

π(γ | y, X) ∝ Σ_{c=1}^∞ c^{-1} (c + 1)^{-(q_γ + 1)/2} [ y^T y - ( c/(c + 1) ) y^T X_γ (X_γ^T X_γ)^{-1} X_γ^T y ]^{-n/2}.

Again we have seen this before as

f(y | X) = Σ_{c=1}^∞ π(c) f(y | X, c) ∝ Σ_{c=1}^∞ c^{-1} (c + 1)^{-(k+1)/2} [ y^T y - ( c/(c + 1) ) y^T X (X^T X)^{-1} X^T y ]^{-n/2}.

89 Model Selection Most likely models, ordered by decreasing posterior probabilities, using Zellner's non-informative G-prior. [Table: t_1(γ) and π(γ | y, X) for the top models; the numerical values are not preserved in this transcription.] An implementation is available (MatLab, C++).

90 Stochastic Search for the Most Likely Model When k is large, it becomes computationally intractable to compute the posterior probabilities of the 2^k models. We need a tailored algorithm that samples from π(γ | y, X) and selects the most likely models. This can be done by Gibbs sampling*, given the availability of the full conditional posterior probabilities of the γ_i's. If γ_{-i} = (γ_1, ..., γ_{i-1}, γ_{i+1}, ..., γ_k) (1 ≤ i ≤ k), then π(γ_i | y, γ_{-i}, X) ∝ π(γ | y, X) (to be evaluated at both γ_i = 0 and γ_i = 1). * Gibbs and other sampling algorithms will be introduced and discussed in detail in forthcoming lectures.

91 Gibbs Sampling for Variable Selection Initialization: draw γ^(0) from the uniform distribution on {0, 1}^k. Iteration t: given (γ_1^(t-1), ..., γ_k^(t-1)), generate

γ_1^(t) according to π( γ_1 | y, γ_2^(t-1), ..., γ_k^(t-1), X ),
γ_2^(t) according to π( γ_2 | y, γ_1^(t), γ_3^(t-1), ..., γ_k^(t-1), X ),
...
γ_k^(t) according to π( γ_k | y, γ_1^(t), γ_2^(t), ..., γ_{k-1}^(t), X ).

92 Gibbs Sampling for Variable Selection Question: How do we sample γ_1^(t) according to π( γ_1 | y, γ_2^(t-1), ..., γ_k^(t-1), X )? 1. The conditional distribution π( γ_1 | y, γ_2^(t-1), ..., γ_k^(t-1), X ) is proportional to π(γ | y, X). 2. Since γ_1 has only two possible values, 0 and 1, we compute

p_0 ∝ π( γ_1 = 0, γ_2^(t-1), ..., γ_k^(t-1) | y, X ) and p_1 ∝ π( γ_1 = 1, γ_2^(t-1), ..., γ_k^(t-1) | y, X ),

so the probability that γ_1^(t) = 1 is p_1 / ( p_0 + p_1 ). We can then use Gibbs sampling to approximate the distribution of {γ_i}.
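A sketch of this Gibbs sampler over γ (Python/NumPy, informative G-prior with β̃ = 0; the chain length, burn-in, and synthetic data are placeholders, not from the slides):

```python
import numpy as np

def log_post_gamma(y, X, gamma, c):
    """log pi(gamma | y, X) up to a constant (Zellner G-prior, beta_tilde = 0, intercept always in)."""
    cols = [0] + [j + 1 for j in range(len(gamma)) if gamma[j]]
    Xg = X[:, cols]
    fit = y @ Xg @ np.linalg.solve(Xg.T @ Xg, Xg.T @ y)
    return -0.5 * len(cols) * np.log(c + 1) - 0.5 * len(y) * np.log(y @ y - c / (c + 1) * fit)

rng = np.random.default_rng(9)
n, k, c = 40, 4, 100.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, 0.0, 0.0, -0.5]) + rng.normal(size=n)

T, T0 = 2000, 500                                   # chain length and burn-in (assumptions)
gamma = rng.integers(0, 2, size=k)                  # gamma^(0) uniform on {0,1}^k
draws = np.zeros((T, k), dtype=int)

for t in range(T):
    for i in range(k):                              # full sweep over gamma_1, ..., gamma_k
        lp = np.empty(2)
        for v in (0, 1):
            gamma[i] = v
            lp[v] = log_post_gamma(y, X, gamma, c)
        p1 = 1.0 / (1.0 + np.exp(lp[0] - lp[1]))    # P(gamma_i = 1 | y, gamma_{-i}, X) = p1/(p0 + p1)
        gamma[i] = int(rng.random() < p1)
    draws[t] = gamma

incl_prob = draws[T0:].mean(axis=0)                 # empirical P(gamma_i = 1 | y, X) after burn-in
print(incl_prob)
```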

93 Gibbs Sampling: Posterior Probabilities After T ≫ 1 MCMC iterations, we approximate the posterior probabilities π(γ | y, X) by the empirical averages

π̂(γ | y, X) = ( 1 / (T - T_0 + 1) ) Σ_{t=T_0}^{T} I_{ γ^(t) = γ }.

The first T_0 values (burn-in) of the MCMC chain are eliminated.

94 Model Choice Comparison: Gibbs Estimates First-level informative G-prior model (with β̃ = 0_{11}, c = 100) compared with the Gibbs estimates of the top ten posterior probabilities. [Table: t_1(γ), π(γ | y, X) and π̂(γ | y, X) for the top ten models; the numerical values are not preserved in this transcription.] A MatLab implementation is available.

95 Model Choice Comparison: Gibbs Estimates Non-informative G-prior variable model choice compared with the Gibbs estimates of the top ten posterior probabilities. [Table: t_1(γ), π(γ | y, X) and π̂(γ | y, X) for the top ten models; the numerical values are not preserved in this transcription.] A MatLab implementation is available.

96 Gibbs Sampling: Probabilities of Inclusion An approximation of the probability of including the i-th variable:

P^π( γ_i = 1 | y, X ) ≈ ( 1 / (T - T_0 + 1) ) Σ_{t=T_0}^{T} I_{ γ_i^(t) = 1 }.

97 Probabilities of Inclusion Estimates Informative (β̃ = 0_{11}, c = 100) and non-informative G-prior variable inclusion estimates (based on the same Gibbs output as in the earlier two tables). [Table: P^π(γ_i = 1 | y, X) under both priors for gamma_1 through gamma_10; the numerical values are not preserved in this transcription.] A MatLab implementation is available.


More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Bayesian Inference. Chapter 1. Introduction and basic concepts

Bayesian Inference. Chapter 1. Introduction and basic concepts Bayesian Inference Chapter 1. Introduction and basic concepts M. Concepción Ausín Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master

More information

Bayesian Regressions in Experimental Design

Bayesian Regressions in Experimental Design Bayesian Regressions in Experimental Design Stanley Sawyer Washington University Vs. April 9, 8. Introduction. The purpose here is to derive a formula for the posterior probabilities given observed data

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Using prior knowledge in frequentist tests

Using prior knowledge in frequentist tests Using prior knowledge in frequentist tests Use of Bayesian posteriors with informative priors as optimal frequentist tests Christian Bartels Email: h.christian.bartels@gmail.com Allschwil, 5. April 2017

More information

Heriot-Watt University

Heriot-Watt University Heriot-Watt University Heriot-Watt University Research Gateway Prediction of settlement delay in critical illness insurance claims by using the generalized beta of the second kind distribution Dodd, Erengul;

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Lecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT / 15

Lecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT / 15 Lecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT 2013 1 / 15 Contingency table analysis North Carolina State University

More information

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece; e-mail: ntzoufras@aueb.gr.

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

A Few Special Distributions and Their Properties

A Few Special Distributions and Their Properties A Few Special Distributions and Their Properties Econ 690 Purdue University Justin L. Tobias (Purdue) Distributional Catalog 1 / 20 Special Distributions and Their Associated Properties 1 Uniform Distribution

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

MAXIMUM LIKELIHOOD, SET ESTIMATION, MODEL CRITICISM

MAXIMUM LIKELIHOOD, SET ESTIMATION, MODEL CRITICISM Eco517 Fall 2004 C. Sims MAXIMUM LIKELIHOOD, SET ESTIMATION, MODEL CRITICISM 1. SOMETHING WE SHOULD ALREADY HAVE MENTIONED A t n (µ, Σ) distribution converges, as n, to a N(µ, Σ). Consider the univariate

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Bayesian Model Averaging

Bayesian Model Averaging Bayesian Model Averaging Hoff Chapter 9, Hoeting et al 1999, Clyde & George 2004, Liang et al 2008 October 24, 2017 Bayesian Model Choice Models for the variable selection problem are based on a subset

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic

More information

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Dale J. Poirier University of California, Irvine September 1, 2008 Abstract This paper

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors

Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors The Canadian Journal of Statistics Vol. xx No. yy 0?? Pages?? La revue canadienne de statistique Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors Aixin Tan

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

STAT Advanced Bayesian Inference

STAT Advanced Bayesian Inference 1 / 32 STAT 625 - Advanced Bayesian Inference Meng Li Department of Statistics Jan 23, 218 The Dirichlet distribution 2 / 32 θ Dirichlet(a 1,...,a k ) with density p(θ 1,θ 2,...,θ k ) = k j=1 Γ(a j) Γ(

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Environmentrics 00, 1 12 DOI: 10.1002/env.XXXX Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Regina Wu a and Cari G. Kaufman a Summary: Fitting a Bayesian model to spatial

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

Bayesian Inference for the Multivariate Normal

Bayesian Inference for the Multivariate Normal Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract

Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract Bayesian analysis of a vector autoregressive model with multiple structural breaks Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus Abstract This paper develops a Bayesian approach

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information