
4 Bayesian Prediction Methodology

…from

$$
\begin{bmatrix} Y^{te} \\ Y^{tr} \end{bmatrix} \Bigg|\, \vartheta \;\sim\; N_{n_e+n_s}\!\left( \begin{bmatrix} F^{te} \\ F^{tr} \end{bmatrix} \beta,\; \frac{1}{\lambda_Z} \begin{bmatrix} R^{te} & R^{te,tr} \\ \left(R^{te,tr}\right)^{\!\top} & R \end{bmatrix} \right) \qquad (4.1.3)
$$

where ϑ depends on the case being studied. A second stage specifies the prior distribution [ϑ] of ϑ.

The following notation is used throughout the chapter. [W] denotes the distribution of W; where needed explicitly, π(w), E{W}, and Cov(W) denote the joint probability density function of W, the mean of W, and the variance-covariance matrix of W, respectively. F^te is the n_e × p matrix whose ith row consists of the known regression functions for the input x^te_i, i = 1, …, n_e; F^tr denotes the n_s × p matrix of known regression functions for the n_s training data inputs; β denotes the unknown p × 1 vector of regression coefficients; λ_Z denotes the precision (the reciprocal of the variance) of the GP that describes deviations from the regression; R^te is the n_e × n_e correlation matrix Cor(Y^te, Y^te); R^{te,tr} is the n_e × n_s cross-correlation matrix Cor(Y^te, Y^tr); R is the n_s × n_s correlation matrix Cor(Y^tr, Y^tr); κ denotes the vector of parameters that determine a given correlation function; and ϑ denotes the vector of all unknown model parameters for the model under discussion: ϑ is β in Section 4.2.1, ϑ is (β, λ_Z) in Section 4.2.2, and ϑ is (β, λ_Z, κ) in Section 4.3.

For easy reference, note that when (4.1.3) holds, then conditionally

$$
Y^{te} \mid Y^{tr} = y^{tr}, \vartheta \;\sim\; N_{n_e}\!\left( F^{te}\beta + R^{te,tr} R^{-1}\!\left( y^{tr} - F^{tr}\beta \right),\; \frac{1}{\lambda_Z} \left( R^{te} - R^{te,tr} R^{-1} \left(R^{te,tr}\right)^{\!\top} \right) \right) \qquad (4.1.4)
$$

where ϑ depends on the case being studied.

The chapter is organized as follows. Section 4.2 presents conjugate cases where analytic expressions can be derived for π(y^te | y^tr), E{Y^te | Y^tr}, and Var(Y^te | Y^tr). The case ϑ = β is a toy example that is sufficiently simple to illustrate the calculations that become more involved in the remaining conjugate and non-conjugate cases. The conjugate case ϑ = (β, λ_Z) assumes that the local min/max structure of the residual process is known and hence is usually not directly useful in applications. However, in conjunction with the conjugate results for ϑ = (β, λ_Z), Section 4.3 describes Bayesian methodology for the case ϑ = (β, λ_Z, κ), which is used in practical settings.
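Because everything that follows manipulates the conditional distribution (4.1.4), a small numerical sketch may help fix the notation. The fragment below is an editorial illustration rather than code from the chapter; the Gaussian correlation family, the helper names, and all inputs are assumptions.

```python
import numpy as np

def gauss_corr(x1, x2, theta):
    """Gaussian correlation: element (i, j) is exp(-theta * (x1_i - x2_j)^2)."""
    return np.exp(-theta * (x1[:, None] - x2[None, :]) ** 2)

def conditional_moments(x_te, x_tr, y_tr, F_te, F_tr, beta, lam_Z, theta):
    """Mean and covariance of Y_te | Y_tr = y_tr, (beta, lam_Z) from (4.1.4)."""
    R = gauss_corr(x_tr, x_tr, theta)          # n_s x n_s training correlations
    R_te = gauss_corr(x_te, x_te, theta)       # n_e x n_e test correlations
    R_te_tr = gauss_corr(x_te, x_tr, theta)    # n_e x n_s cross-correlations
    resid = y_tr - F_tr @ beta                 # training residuals from the regression
    mean = F_te @ beta + R_te_tr @ np.linalg.solve(R, resid)
    cov = (R_te - R_te_tr @ np.linalg.solve(R, R_te_tr.T)) / lam_Z
    return mean, cov
```

Any other positive definite correlation family could be substituted for gauss_corr without changing the rest of the computation.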

4.2 Examples of Conjugate Bayesian Models and Prediction

The idea used in this and the following sections is that the predictive density π(y^te | y^tr) can be obtained as

$$
\pi\!\left( y^{te} \mid y^{tr} \right) = \int \pi\!\left( y^{te}, \vartheta \mid y^{tr} \right) d\vartheta = \int \pi\!\left( y^{te} \mid \vartheta, y^{tr} \right) \pi\!\left( \vartheta \mid y^{tr} \right) d\vartheta \qquad (4.2.1)
$$

where ϑ are the unknown model parameters. Inference about the unknown model parameters ϑ can be obtained from the conditional distribution of ϑ given the training data, i.e., from [ϑ | y^tr]. For example, the mean of [ϑ | y^tr] is a Bayesian estimate of ϑ, while the standard deviation of this posterior measures the uncertainty in the estimated ϑ. Theorems 4.1 and 4.2 also provide the [ϑ | y^tr] conditional distributions.

4.2.1 Predictive Distributions When ϑ = β

This subsection illustrates the application of (4.2.1) in a toy example that is sufficiently simple that the calculations can be provided in full. It is assumed that the training and test data can be described as draws from the regression + stationary GP model in which the regression coefficient is unknown but the process precision and correlations are known, i.e., ϑ = β. The predictive distribution [Y^te | Y^tr] is derived; as a consequence, the Bayesian predictor E{Y^te | Y^tr} and the uncertainty quantification Var(Y^te | Y^tr) are immediately available. Section 4.2.2 will consider a more challenging model that has application to the usual case where all parameters of the regression + stationary GP model are unknown.

The following theorem provides the predictive distribution of Y^te for two different conjugate choices of second-stage priors for β. The first choice is the normal prior, which can be regarded as an informative choice, while the second can be thought of as a non-informative prior. Formally, the non-informative prior and its predictive distribution are obtained by letting the prior precision λ_β → 0 in the normal prior of case (a).
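The decomposition (4.2.1) also suggests a generic Monte Carlo approximation for models where the integral has no closed form: draw ϑ from [ϑ | y^tr] and average the conditional densities. A minimal sketch, with hypothetical callables cond_density and posterior_sampler supplied by the user:

```python
import numpy as np

def predictive_density_mc(y_te, cond_density, posterior_sampler, n_draws=2000, seed=0):
    """Monte Carlo version of (4.2.1): average pi(y_te | theta, y_tr)
    over draws theta ~ [theta | y_tr]."""
    rng = np.random.default_rng(seed)
    values = [cond_density(y_te, posterior_sampler(rng)) for _ in range(n_draws)]
    return float(np.mean(values))
```

In the conjugate cases treated next, this machinery is unnecessary because the integral is available analytically.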

Theorem 4.1. Suppose (Y^te, Y^tr) follows a two-stage model with first stage

$$
\begin{bmatrix} Y^{te} \\ Y^{tr} \end{bmatrix} \Bigg|\, \beta \;\sim\; N_{n_e+n_s}\!\left( \begin{bmatrix} F^{te} \\ F^{tr} \end{bmatrix} \beta,\; \frac{1}{\lambda_Z} \begin{bmatrix} R^{te} & R^{te,tr} \\ \left(R^{te,tr}\right)^{\!\top} & R \end{bmatrix} \right)
$$

where β is unknown while λ_Z and all correlations are known.

(a) If

$$
[\beta] \sim N_p\!\left( b_\beta,\; \frac{1}{\lambda_\beta} V_\beta \right) \qquad (4.2.2)
$$

is the second-stage model, where V_β is a known positive definite correlation matrix with b_β and λ_β also known, then the posterior distribution of β is β | Y^tr ~ N_p[µ_β, Σ_β], where

$$
\mu_\beta = \left( \lambda_Z \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + \lambda_\beta V_\beta^{-1} \right)^{-1} \left( \lambda_Z \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr}\, \widehat{\beta} + \lambda_\beta V_\beta^{-1} b_\beta \right)
$$

and

$$
\Sigma_\beta = \left( \lambda_Z \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + \lambda_\beta V_\beta^{-1} \right)^{-1}.
$$

Here β̂ is the generalized least squares (BLUP) estimator of β defined in part (b). The predictive distribution of Y^te is Y^te | Y^tr = y^tr ~ N_{n_e}[µ^te, Σ^te], with mean and covariance matrix

$$
\mu^{te} = F^{te} \mu_\beta + R^{te,tr} R^{-1}\!\left( y^{tr} - F^{tr} \mu_\beta \right), \qquad (4.2.5)
$$

$$
\Sigma^{te} = \frac{1}{\lambda_Z} \left( R^{te} - \begin{bmatrix} F^{te} & R^{te,tr} \end{bmatrix} \begin{bmatrix} -\frac{\lambda_\beta}{\lambda_Z} V_\beta^{-1} & \left(F^{tr}\right)^{\!\top} \\ F^{tr} & R \end{bmatrix}^{-1} \begin{bmatrix} \left(F^{te}\right)^{\!\top} \\ \left(R^{te,tr}\right)^{\!\top} \end{bmatrix} \right). \qquad (4.2.6)
$$

(b) If π(β) ≡ 1 on ℝ^p, then the posterior distribution of β is

$$
\beta \mid Y^{tr} \sim N_p\!\left[ \widehat{\beta} = \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1} \left(F^{tr}\right)^{\!\top} R^{-1} y^{tr},\;\; \frac{1}{\lambda_Z} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1} \right].
$$

The predictive distribution of Y^te is Y^te | Y^tr = y^tr ~ N_{n_e}[µ^te, Σ^te], where the mean µ^te and covariance Σ^te are modifications of (4.2.5) and (4.2.6), respectively, in which µ_β is replaced by β̂ and (λ_β/λ_Z)V_β^{-1} is replaced by the p × p matrix of zeroes.

The proof of Theorem 4.1 is given in Section 4.5. It requires straightforward calculations to implement the right-hand integral in (4.2.1).
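As an illustration of Theorem 4.1(a), the posterior mean and covariance of β can be computed directly from the formulas above; the helper name is invented, and the inputs are assumed to be NumPy arrays of conforming dimensions.

```python
import numpy as np

def posterior_beta(F_tr, R, y_tr, lam_Z, lam_beta, V_beta, b_beta):
    """mu_beta and Sigma_beta of Theorem 4.1(a)."""
    FtRF = F_tr.T @ np.linalg.solve(R, F_tr)                     # F'R^{-1}F
    beta_hat = np.linalg.solve(FtRF, F_tr.T @ np.linalg.solve(R, y_tr))
    V_inv = np.linalg.inv(V_beta)
    precision = lam_Z * FtRF + lam_beta * V_inv                  # posterior precision
    mu_beta = np.linalg.solve(precision,
                              lam_Z * FtRF @ beta_hat + lam_beta * V_inv @ b_beta)
    Sigma_beta = np.linalg.inv(precision)
    return mu_beta, Sigma_beta
```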

Consider first some observations about inferences concerning β provided by the posterior distribution [β | Y^tr]. The posterior mean of β depends only on the ratio λ_β/λ_Z because

$$
\begin{aligned}
\mu_\beta &= \left( \lambda_Z \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + \lambda_\beta V_\beta^{-1} \right)^{-1} \left( \lambda_Z \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr}\, \widehat{\beta} + \lambda_\beta V_\beta^{-1} b_\beta \right) \\
&= \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + V_\beta^{-1}\, \lambda_\beta/\lambda_Z \right)^{-1} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr}\, \widehat{\beta} + V_\beta^{-1} b_\beta\, \lambda_\beta/\lambda_Z \right).
\end{aligned}
$$

In the special case of (a) where the prior variance of β is identical to the Y(x) process variance, i.e., λ_β = λ_Z, both the posterior mean and covariance simplify further. In this situation, the Bayes estimator of β can be written as

$$
\mu_\beta = \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + V_\beta^{-1} \right)^{-1} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr}\, \widehat{\beta} + V_\beta^{-1} b_\beta \right) = \Omega\, \widehat{\beta} + \left( I_p - \Omega \right) b_\beta,
$$

where Ω = ((F^tr)^⊤ R^{-1} F^tr + V_β^{-1})^{-1} (F^tr)^⊤ R^{-1} F^tr, which is a matrix convex combination of the BLUP β̂ and the prior mean b_β. In contrast, the posterior covariance,

$$
\Sigma_\beta = \left( \lambda_Z \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + \lambda_\beta V_\beta^{-1} \right)^{-1} = \frac{1}{\lambda_Z} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + V_\beta^{-1}\, \lambda_\beta/\lambda_Z \right)^{-1},
$$

depends on the individual precision parameters whether or not λ_β = λ_Z.

Prediction at a Single Test Input x^te

To provide additional insight about the nature of the Bayesian predictor µ^te and its uncertainty quantification σ²_te given by Theorem 4.1, we restrict attention to the case of a single test input x^te to simplify the discussion. Examining first the predictive mean, the following properties hold for both cases (a) and (b). Algebra shows that µ^te = µ^te(x^te) is linear in y^tr and, with additional calculation, that it is an unbiased predictor of Y(x^te); i.e., µ^te(x^te) is a linear unbiased predictor of Y(x^te). The continuity and other smoothness properties of µ^te are inherited from those of the correlation function R(·) and the regressors {f_j}_{j=1}^p because

$$
\mu^{te} = f^{\top}\!\left(x^{te}\right) \mu_\beta + r_{te}^{\top} R^{-1}\!\left( y^{tr} - F^{tr} \mu_\beta \right) = \sum_{j=1}^{p} f_j\!\left(x^{te}\right) \mu_{\beta,j} + \sum_{i=1}^{n_s} d_i\, R\!\left( x^{te} - x_i \right),
$$

where µ_{β,j} is the jth element of µ_β and d = (d_1, …, d_{n_s})^⊤ = R^{-1}(y^tr − F^tr µ_β). Thus in case (a), µ^te depends only on the ratio of λ_β and λ_Z because this is true of µ_β. Lastly, µ^te interpolates the training data. This is true because when x^te = x_i for some i ∈ {1, …, n_s}, f_te = f(x^te) = f(x_i) and r_te^⊤ R^{-1} = e_i^⊤, the ith unit vector. Thus

$$
\mu^{te} = f^{\top}\!\left(x_i\right) \mu_\beta + r_{te}^{\top} R^{-1}\!\left( Y^{tr} - F^{tr} \mu_\beta \right) = f^{\top}\!\left(x_i\right) \mu_\beta + e_i^{\top}\!\left( Y^{tr} - F^{tr} \mu_\beta \right) = f^{\top}\!\left(x_i\right) \mu_\beta + Y_i - f^{\top}\!\left(x_i\right) \mu_\beta = Y_i.
$$
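The interpolation property is easy to verify numerically. The following sketch uses assumed values throughout (a constant-mean model, Gaussian correlation with θ = 10, λ_β = λ_Z = 1, V_β = I, and an arbitrary b_β; the seven training inputs are placeholders, not the design used for the book's figures):

```python
import numpy as np

theta = 10.0
x_tr = np.linspace(0.05, 0.95, 7)
y_tr = np.exp(-1.4 * x_tr) * np.cos(3.5 * np.pi * x_tr)
F_tr = np.ones((7, 1))                                # constant mean, p = 1
b_beta = np.array([0.5])                              # arbitrary prior mean

R = np.exp(-theta * (x_tr[:, None] - x_tr[None, :]) ** 2)
FtRF = F_tr.T @ np.linalg.solve(R, F_tr)
beta_hat = np.linalg.solve(FtRF, F_tr.T @ np.linalg.solve(R, y_tr))

# With lam_beta = lam_Z and V_beta = I: mu_beta = Omega beta_hat + (I - Omega) b_beta.
Omega = np.linalg.solve(FtRF + np.eye(1), FtRF)
mu_beta = Omega @ beta_hat + (np.eye(1) - Omega) @ b_beta

x_te = x_tr[3]                                        # predict at a training input
r_te = np.exp(-theta * (x_te - x_tr) ** 2)
mu_te = mu_beta.item() + r_te @ np.linalg.solve(R, y_tr - F_tr @ mu_beta)
print(np.isclose(mu_te, y_tr[3]))                     # True: the predictor interpolates
```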

For prior (b), at a single input x^te the n_e × n_e posterior covariance Σ^te reduces to

$$
\sigma^2_{te}\!\left(x^{te}\right) = \frac{1}{\lambda_Z}\left( 1 - r_{te}^{\top} R^{-1} r_{te} + h^{\top} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1} h \right) \qquad (4.2.10)
$$

where h = f_te − (F^tr)^⊤ R^{-1} r_te; equation (4.2.10) was given previously as the variance of the BLUP. For prior (a),

$$
\begin{aligned}
\sigma^2_{te} &= \frac{1}{\lambda_Z}\left( 1 - \begin{bmatrix} f_{te}^{\top} & r_{te}^{\top} \end{bmatrix} \begin{bmatrix} -\frac{\lambda_\beta}{\lambda_Z} V_\beta^{-1} & \left(F^{tr}\right)^{\!\top} \\ F^{tr} & R \end{bmatrix}^{-1} \begin{bmatrix} f_{te} \\ r_{te} \end{bmatrix} \right) \\
&= \frac{1}{\lambda_Z}\left( 1 - \left\{ - f_{te}^{\top} Q^{-1} f_{te} + 2 f_{te}^{\top} Q^{-1} \left(F^{tr}\right)^{\!\top} R^{-1} r_{te} + r_{te}^{\top} R^{-1}\!\left( R - F^{tr} Q^{-1} \left(F^{tr}\right)^{\!\top} \right) R^{-1} r_{te} \right\} \right) \\
&= \frac{1}{\lambda_Z}\left( 1 - r_{te}^{\top} R^{-1} r_{te} + f_{te}^{\top} Q^{-1} f_{te} - 2 f_{te}^{\top} Q^{-1} \left(F^{tr}\right)^{\!\top} R^{-1} r_{te} + r_{te}^{\top} R^{-1} F^{tr} Q^{-1} \left(F^{tr}\right)^{\!\top} R^{-1} r_{te} \right) \qquad (4.2.11) \\
&= \frac{1}{\lambda_Z}\left( 1 - r_{te}^{\top} R^{-1} r_{te} + h^{\top} Q^{-1} h \right), \qquad (4.2.12)
\end{aligned}
$$

where h = f(x^te) − (F^tr)^⊤ R^{-1} r_te and

$$
Q = \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + \frac{\lambda_\beta}{\lambda_Z} V_\beta^{-1};
$$

the equality in (4.2.11) follows from Lemma C.3.

Intuitively, the variance of the posterior of Y(x^te) given the training data should be zero whenever x^te = x_i, 1 ≤ i ≤ n_s, because we know exactly the response at each of the training data sites and there is no measurement error term in the stochastic process model. This was shown previously for (4.2.10), prior (b). To see that this is also the case for prior (a), fix x^te = x_1, say. In this case, recall that r_te^⊤ R^{-1} = e_1^⊤, and observe that f(x^te) = f(x_1). From (4.2.12),

$$
\begin{aligned}
\sigma^2_{te}\!\left(x_1\right) &= \frac{1}{\lambda_Z}\left( 1 - r_{te}^{\top} R^{-1} r_{te} + \left( f\!\left(x^{te}\right) - \left(F^{tr}\right)^{\!\top} R^{-1} r_{te} \right)^{\!\top} Q^{-1} \left( f\!\left(x^{te}\right) - \left(F^{tr}\right)^{\!\top} R^{-1} r_{te} \right) \right) \\
&= \frac{1}{\lambda_Z}\left( 1 - e_1^{\top} r_{te} + \left( f\!\left(x_1\right) - \left(F^{tr}\right)^{\!\top} e_1 \right)^{\!\top} Q^{-1} \left( f\!\left(x_1\right) - \left(F^{tr}\right)^{\!\top} e_1 \right) \right) \\
&= \frac{1}{\lambda_Z}\left( 1 - 1 + \left( f\!\left(x_1\right) - f\!\left(x_1\right) \right)^{\!\top} Q^{-1} \left( f\!\left(x_1\right) - f\!\left(x_1\right) \right) \right) = \frac{1}{\lambda_Z}\left( 0 + 0 \right) = 0,
\end{aligned}
$$

where Q is given in (4.2.12).

Perhaps the most important use of Theorem 4.1 is to provide pointwise uncertainty bands about the predictor µ^te(x^te) in (4.2.5). The bands can be obtained by using the fact that

$$
\frac{ Y\!\left(x^{te}\right) - \mu^{te}\!\left(x^{te}\right) }{ \sigma_{te}\!\left(x^{te}\right) } \sim N(0, 1).
$$
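A numerical check of the single-input variance formula (4.2.12), including the fact that it vanishes at a training site, can be sketched as follows (sigma2_te is an invented helper; the constant-mean model and Gaussian correlation are assumptions):

```python
import numpy as np

def sigma2_te(x_te, x_tr, F_tr, f_te, lam_Z, lam_beta, V_beta, theta):
    """Predictive variance (4.2.12) at a single input x_te under prior (a)."""
    R = np.exp(-theta * (x_tr[:, None] - x_tr[None, :]) ** 2)
    r = np.exp(-theta * (x_te - x_tr) ** 2)
    Q = F_tr.T @ np.linalg.solve(R, F_tr) + (lam_beta / lam_Z) * np.linalg.inv(V_beta)
    h = f_te - F_tr.T @ np.linalg.solve(R, r)
    return (1.0 - r @ np.linalg.solve(R, r) + h @ np.linalg.solve(Q, h)) / lam_Z

x_tr = np.linspace(0.05, 0.95, 7)
F_tr = np.ones((7, 1))
# At a training site the variance collapses to zero (up to rounding error):
print(sigma2_te(x_tr[2], x_tr, F_tr, np.ones(1), 1.0, 1.0, np.eye(1), 10.0))
```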

Fig. 4.1 The function y(x) = exp{−1.4x} cos(3.5πx) (solid line); a seven-point training data set (solid circles); and the Bayesian predictor µ^te = µ_β + r_te^⊤ R^{-1}(y^tr − 1_{n_s} µ_β) of (4.2.14) for three values of the prior precision λ_β (blue, red, and green curves).

This gives the posterior prediction interval

$$
P\!\left\{ Y\!\left(x^{te}\right) \in \mu^{te}\!\left(x^{te}\right) \pm \sigma_{te}\!\left(x^{te}\right)\, z^{\alpha/2} \;\middle|\; Y^{tr} \right\} = 1 - \alpha,
$$

where z^{α/2} is the upper α/2 critical point of the standard normal distribution (see Appendix A). As a special case, if the input x^te ∈ (a, b), then µ^te(x^te) ± σ_te(x^te) z^{α/2} are pointwise 100(1 − α)% prediction bands for Y(x^te), a < x^te < b. Below, we illustrate the prediction band calculation for the hierarchical [Y^te, Y^tr] model presented in Theorem 4.2.

Example 4.1 (Damped Sine Curve). This example illustrates the effect of the prior [β] on the mean µ^te(x^te) of the predictive distribution in Theorem 4.1. Consider the damped cosine function

$$
y(x) = e^{-1.4x} \cos\!\left( 7\pi x / 2 \right), \quad 0 < x < 1,
$$

and n_s = 7 points of training data, which are shown as the solid curve and dots in Figure 4.1, respectively. For any x^te ∈ (0, 1), the predictive distribution of Y(x^te) is based on the hierarchical Bayes model whose first stage is the stationary stochastic process

$$
Y(x) = \beta_0 + Z(x), \quad 0 < x < 1,
$$

where β_0 ∈ ℝ and, for the purpose here of illustrating the case where only β_0 is unknown, R(h) is taken to be the Gaussian correlation R(h) = exp{−θh²} with θ = 10.0. Suppose in part (a) of Theorem 4.1 it is assumed that β_0 ~ N(b_β, v²_β/λ_β) with v_β = 1 and with known prior mean b_β and known prior precision λ_β. For any x^te ∈ (0, 1), the Bayesian predictor of y(x^te) is the posterior mean

$$
\mu^{te} = \mu^{te}\!\left(x^{te}\right) = \mu_\beta + r_{te}^{\top} R^{-1}\!\left( y^{tr} - 1_{n_s} \mu_\beta \right), \qquad (4.2.14)
$$

where µ_β is the posterior mean of β_0 given Y^tr.
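Given µ^te and σ²_te evaluated on a grid of inputs, the normal prediction band is a one-line computation; a sketch with an invented helper name, using the SciPy normal quantile:

```python
import numpy as np
from scipy.stats import norm

def normal_band(mu_te, s2_te, alpha=0.05):
    """Pointwise 100(1 - alpha)% band: mu_te +/- z^{alpha/2} * sigma_te."""
    z = norm.ppf(1.0 - alpha / 2.0)        # upper alpha/2 critical point
    half = z * np.sqrt(np.asarray(s2_te))
    return mu_te - half, mu_te + half
```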

Fig. 4.2 The factor 1 − r_te^⊤ R^{-1} 1_{n_s} versus x^te ∈ (0, 1).

That posterior mean is

$$
\mu_\beta = \frac{ 1_{n_s}^{\top} R^{-1} y^{tr} + b_\beta\, \lambda_\beta/\lambda_Z }{ 1_{n_s}^{\top} R^{-1} 1_{n_s} + \lambda_\beta/\lambda_Z } = \omega\, b_\beta + \left( 1 - \omega \right) \left( 1_{n_s}^{\top} R^{-1} 1_{n_s} \right)^{-1} 1_{n_s}^{\top} R^{-1} y^{tr} = \omega\, b_\beta + \left( 1 - \omega \right) \widehat{\beta}_0, \qquad (4.2.15)
$$

where ω = λ_β / (λ_Z 1_{n_s}^⊤ R^{-1} 1_{n_s} + λ_β) ∈ (0, 1). In words, (4.2.15) can be interpreted as saying that the posterior mean of β_0 given Y^tr is a convex combination of the prior mean b_β and the MLE of β_0. For fixed process precision λ_Z, ω → 1 and µ_β → b_β as the β_0 prior precision λ_β → ∞; the predictor guesses the prior mean and ignores the data. Similarly, ω → 0 and µ_β → β̂_0 as the β_0 prior precision λ_β → 0; the predictor uses only the data and ignores the prior information.

However, the impact of the prior precision of β_0 on the predictor µ^te(x^te) of y(x^te) can be relatively minor. Consider the data shown in Figure 4.1 by the solid circles that are superimposed on the damped sine curve. Calculation gives the MLE β̂_0 of β_0 for these data. Suppose b_β = 5, θ = 10.0, and λ_Z = 6; then µ_β → 5 as λ_β → ∞ and µ_β → β̂_0 as λ_β → 0. Figure 4.1 shows that the effect on µ^te(x^te) of changing the prior precision is relatively minor. The calculation that shows this effect is to observe that

$$
\mu^{te}\!\left(x^{te}\right) = r_{te}^{\top} R^{-1} y^{tr} + \left( 1 - r_{te}^{\top} R^{-1} 1_{n_s} \right) \mu_\beta,
$$

which shows that the magnitude of the Bayes predictor depends on the posterior mean µ_β only through the factor 1 − r_te^⊤ R^{-1} 1_{n_s}. Figure 4.2 shows that this factor is small near the center of the training data but increases as x^te moves away from the training data.
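A sketch of the convex-combination formula (4.2.15) for the constant-mean case (the helper name is invented); passing a large lam_beta drives the result to b_beta, and a small lam_beta drives it to the generalized least squares estimate, as described above:

```python
import numpy as np

def mu_beta_constant_mean(y_tr, R, b_beta, lam_beta, lam_Z):
    """Posterior mean (4.2.15): omega * prior mean + (1 - omega) * MLE of beta_0."""
    one = np.ones_like(y_tr)
    Rinv_one = np.linalg.solve(R, one)
    omega = lam_beta / (lam_Z * (one @ Rinv_one) + lam_beta)
    beta0_hat = (Rinv_one @ y_tr) / (one @ Rinv_one)   # generalized least squares mean
    return omega * b_beta + (1.0 - omega) * beta0_hat
```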

4.2.2 Predictive Distributions When ϑ = (β, λ_Z)

Theorem 4.2 states the predictive distribution of Y^te given Y^tr for informative and non-informative (β, λ_Z) priors. The informative prior is stated in terms of the factors [β, λ_Z] = [β | λ_Z] × [λ_Z]. In both cases, the [Y^te | Y^tr] posterior is a location-shifted and scaled multivariate t distribution having degrees of freedom that are increased by informative prior information from either β or λ_Z (see Appendix C.4 for the definition of the non-central m-variate t distribution T_m(ν, µ, Σ)).

The informative conditional [β | λ_Z] choice is the multivariate normal distribution with known mean b_β and known correlation matrix V_β; lacking more definitive information, V_β is often taken to be diagonal, if not simply the identity matrix. This prior model makes strong assumptions; for example, it says that each component of β is equally likely to be less than or greater than the corresponding component of b_β. The non-informative β prior is the intuitive choice π(β) ≡ 1 used in Theorem 4.1. The informative [λ_Z] prior is the gamma distribution with specified mean and variance. This prior can be made quite diffuse and hence is also stated as an option for the non-informative prior case. A second non-informative prior for λ_Z is the Jeffreys prior π(λ_Z) = 1/λ_Z, λ_Z > 0 (see Jeffreys (1961), who gives arguments for this choice).

Theorem 4.2. Suppose (Y^te, Y^tr) follows a two-stage model in which the conditional distribution [Y^te, Y^tr | β, λ_Z] is given by (4.1.3) and all correlations are known.

(a) If [β, λ_Z] = [β | λ_Z] × [λ_Z] has prior specified by

$$
[\beta \mid \lambda_Z] \sim N_p\!\left( b_\beta,\; \frac{1}{\lambda_Z} V_\beta \right) \quad \text{and} \quad [\lambda_Z] \sim \Gamma(c, d)
$$

with known prior parameters, then the posterior distributions of β and λ_Z are

$$
\lambda_Z \mid y^{tr} \sim \Gamma\!\left( c_a, d_a \right) \quad \text{and} \quad \beta \mid y^{tr} \sim T_p\!\left( 2c + n_s,\; \mu_\beta,\; \frac{d_a}{c_a}\, \Sigma_\beta \right),
$$

where

$$
\begin{aligned}
c_a &= \left( 2c + n_s \right) / 2, \\
d_a &= \left( 2d + \left( y^{tr} - F^{tr} \widehat{\beta} \right)^{\!\top} R^{-1} \left( y^{tr} - F^{tr} \widehat{\beta} \right) + \left( \widehat{\beta} - b_\beta \right)^{\!\top} \Sigma_\pi^{-1} \left( \widehat{\beta} - b_\beta \right) \right) \Big/ 2, \\
\widehat{\beta} &= \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1} \left(F^{tr}\right)^{\!\top} R^{-1} y^{tr}, \\
\Sigma_\pi &= V_\beta + \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1}, \\
\mu_\beta &= \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} + V_\beta^{-1} \right)^{-1} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr}\, \widehat{\beta} + V_\beta^{-1} b_\beta \right), \\
\Sigma_\beta &= \left( V_\beta^{-1} + \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1}.
\end{aligned}
$$
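The posterior quantities just defined are direct to compute. A sketch from the formulas above (the helper name is invented and the inputs are assumed to be conforming arrays):

```python
import numpy as np

def posterior_lambda_a(y_tr, F_tr, R, V_beta, b_beta, c, d):
    """Gamma(c_a, d_a) posterior parameters for lam_Z in Theorem 4.2(a)."""
    FtRF = F_tr.T @ np.linalg.solve(R, F_tr)
    beta_hat = np.linalg.solve(FtRF, F_tr.T @ np.linalg.solve(R, y_tr))
    resid = y_tr - F_tr @ beta_hat
    Sigma_pi = V_beta + np.linalg.inv(FtRF)
    diff = beta_hat - b_beta
    c_a = c + len(y_tr) / 2.0                            # (2c + n_s) / 2
    d_a = d + 0.5 * (resid @ np.linalg.solve(R, resid)
                     + diff @ np.linalg.solve(Sigma_pi, diff))
    return c_a, d_a
```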

The predictive distribution of Y^te is

$$
Y^{te} \mid Y^{tr} = y^{tr} \sim T_{n_e}\!\left( 2c + n_s,\; \mu^{te},\; \frac{d_a}{c_a}\, M^{te} \right)
$$

where

$$
\mu^{te} = F^{te} \mu_\beta + R^{te,tr} R^{-1}\!\left( y^{tr} - F^{tr} \mu_\beta \right), \qquad M^{te} = R^{te} - R^{te,tr} R^{-1} \left(R^{te,tr}\right)^{\!\top} + H^{te}\, \Sigma_\beta \left(H^{te}\right)^{\!\top},
$$

and H^te = F^te − R^{te,tr} R^{-1} F^tr.

(b) Suppose that β and λ_Z are independent with [β] ≡ 1 and [λ_Z] having prior (b.1) or (b.2) in the following table:

    [λ_Z] prior     Case designation
    Γ(c, d)         (b.1)
    1/λ_Z           (b.2)

For (b.1) the posterior distributions of β and λ_Z are

$$
\lambda_Z \mid y^{tr} \sim \Gamma\!\left( c_{b.1}, d_{b.1} \right) \quad \text{and} \quad \beta \mid y^{tr} \sim T_p\!\left( 2c + n_s - p,\; \widehat{\beta},\; \frac{d_{b.1}}{c_{b.1}} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1} \right),
$$

where β̂ is defined in (a) and

$$
c_{b.1} = \left( 2c + n_s - p \right) / 2, \qquad d_{b.1} = \left( 2d + \left( y^{tr} - F^{tr} \widehat{\beta} \right)^{\!\top} R^{-1} \left( y^{tr} - F^{tr} \widehat{\beta} \right) \right) \Big/ 2.
$$

The predictive distribution of Y^te is

$$
Y^{te} \mid Y^{tr} = y^{tr} \sim T_{n_e}\!\left( 2c + n_s - p,\; \mu^{te},\; \frac{d_{b.1}}{c_{b.1}}\, M^{te} \right)
$$

where

$$
\mu^{te} = F^{te} \widehat{\beta} + R^{te,tr} R^{-1}\!\left( y^{tr} - F^{tr} \widehat{\beta} \right), \qquad M^{te} = R^{te} - R^{te,tr} R^{-1} \left(R^{te,tr}\right)^{\!\top} + H^{te} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1} \left(H^{te}\right)^{\!\top}.
$$

For (b.2) the posterior distributions of β and λ_Z are

$$
\lambda_Z \mid y^{tr} \sim \Gamma\!\left( c_{b.2}, d_{b.2} \right) \quad \text{and} \quad \beta \mid y^{tr} \sim T_p\!\left( n_s - p,\; \widehat{\beta},\; \frac{d_{b.2}}{c_{b.2}} \left( \left(F^{tr}\right)^{\!\top} R^{-1} F^{tr} \right)^{-1} \right),
$$

where

$$
c_{b.2} = \left( n_s - p \right) / 2, \qquad d_{b.2} = \left( y^{tr} - F^{tr} \widehat{\beta} \right)^{\!\top} R^{-1} \left( y^{tr} - F^{tr} \widehat{\beta} \right) \Big/ 2.
$$
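For the Jeffreys-prior case (b.2), the posterior reduces to generalized least squares quantities; a sketch with an invented helper:

```python
import numpy as np

def posterior_lambda_b2(y_tr, F_tr, R):
    """Case (b.2): beta_hat and the Gamma(c_b2, d_b2) posterior for lam_Z."""
    FtRF = F_tr.T @ np.linalg.solve(R, F_tr)
    beta_hat = np.linalg.solve(FtRF, F_tr.T @ np.linalg.solve(R, y_tr))
    resid = y_tr - F_tr @ beta_hat
    n_s, p = F_tr.shape
    c_b2 = (n_s - p) / 2.0
    d_b2 = 0.5 * (resid @ np.linalg.solve(R, resid))
    return beta_hat, c_b2, d_b2
```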

The predictive distribution of Y^te is

$$
Y^{te} \mid Y^{tr} = y^{tr} \sim T_{n_e}\!\left( n_s - p,\; \mu^{te},\; \frac{d_{b.2}}{c_{b.2}}\, M^{te} \right)
$$

where µ^te and M^te are defined as in (b.1).

The formulas above for the degrees of freedom, location shift, and scale factor in the Y^te predictive distribution all have intuitive interpretations. The base value for the degrees of freedom is n_s − p, which is augmented by p additional degrees of freedom when β has the informative Gaussian prior of case (a), and is further augmented by 2c degrees of freedom when λ_Z has the informative gamma prior (cases (a) and (b.1)).

The mean of the Y^te predictive distribution is the same for the two λ_Z cases of Theorems 4.1 and 4.2. In the case of Theorem 4.1, which has known λ_Z, (4.2.5) and its case (b) modification give the predictive mean to be

$$
\mu^{te} = F^{te} \mu_\beta + R^{te,tr} R^{-1}\!\left( y^{tr} - F^{tr} \mu_\beta \right),
$$

where µ_β is the mean of the conditional distribution of β for the informative or non-informative prior. In the case of Theorem 4.2, which takes λ_Z to be a hierarchical parameter, the predictive mean is the location parameter of the t distribution. Examination of µ^te shows it is identical to that of Theorem 4.1. Thus the Bayesian predictor is the same for the two cases.

The uncertainty quantifications of Y^te, for known λ_Z as given in Theorem 4.1 and for unknown λ_Z with a prior as given in Theorem 4.2, are related. To simplify the discussion, consider the case of a single input x^te at which prediction is desired. When λ_Z is known and it is assumed that λ_β = λ_Z, Theorem 4.1 gives the predictive variance of Y^te to be

$$
\sigma^2_{te} = \frac{1}{\lambda_Z}\left( 1 - r_{te}^{\top} R^{-1} r_{te} + h^{\top} Q^{-1} h \right),
$$

where h = f(x^te) − (F^tr)^⊤ R^{-1} r_te and Q = (F^tr)^⊤ R^{-1} F^tr + V_β^{-1} or Q = (F^tr)^⊤ R^{-1} F^tr as the informative or non-informative β prior is assumed. When λ_Z is unknown but has the gamma or Jeffreys prior, Theorem 4.2 gives the predictive variance of Y^te to be

$$
\sigma^2_{te} = \frac{d_{b.i}}{c_{b.i}}\, K \left( 1 - r_{te}^{\top} R^{-1} r_{te} + h^{\top} Q^{-1} h \right), \quad i = 1, 2,
$$

where h is as above and K = (2c + n_s − p)/(2c + n_s − p − 2) or K = (n_s − p)/(n_s − p − 2), assuming the denominator is positive. The final factor is the same as in the known-λ_Z case. The product of the first two factors in the second display should be thought of as an estimator of the factor 1/λ_Z in the first. This is because the posterior of λ_Z is gamma with parameters c_{b.i} and d_{b.i}, i = 1, 2; thus c_{b.i}/d_{b.i} is the mean of the λ_Z posterior and d_{b.i}/c_{b.i} is a naive guess of 1/λ_Z.
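Since a T_ν(µ, σ²) variable has variance σ²ν/(ν − 2), the predictive variance above is a one-line computation; a sketch (hypothetical helper; base is the factor 1 − r'R^{-1}r + h'Q^{-1}h):

```python
def t_predictive_variance(d_b, c_b, dof, base):
    """(d_b / c_b) * K * base with K = dof / (dof - 2); see the display above."""
    if dof <= 2:
        raise ValueError("variance is undefined for dof <= 2")
    return (d_b / c_b) * dof / (dof - 2.0) * base
```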

Fig. 4.3 95% pointwise prediction bands for y(x) at n_e = 103 equally spaced x^te values over (0, 1). The solid black curve is the damped sine curve used to generate the n_s = 7 training data points (solid circles). The left panel assumes θ = 10 and the right panel assumes θ = 75 in the Gaussian correlation function R(h | θ) below.

The degrees-of-freedom correction K will be near unity for a small regression model (small p) and a large training sample and/or substantial λ_Z prior information.

Again considering the case of a single input x^te, Theorem 4.2 can be used to place pointwise prediction bands about y(x^te). Using the fact that, given Y^tr,

$$
\frac{ Y\!\left(x^{te}\right) - \mu^{te}\!\left(x^{te}\right) }{ \sigma_{te}\!\left(x^{te}\right) } \sim T\!\left( \text{d.o.f.}, 0, 1 \right),
$$

where d.o.f. is either 2c + n_s − p or n_s − p for the informative or non-informative λ_Z prior, respectively, gives

$$
P\!\left\{ Y\!\left(x^{te}\right) \in \mu^{te}\!\left(x^{te}\right) \pm \sigma_{te}\!\left(x^{te}\right)\, t^{\alpha/2}_{\text{d.o.f.}} \;\middle|\; Y^{tr} \right\} = 1 - \alpha,
$$

where t^{α/2}_ν is the upper α/2 critical point of the univariate central t distribution with ν degrees of freedom (see Appendix A). When x^te ∈ (a, b), then µ^te(x^te) ± σ_te(x^te) t^{α/2}_ν for a < x^te < b are pointwise 100(1 − α)% prediction bands for y(x^te).

Example 4.1 [Continued] Figure 4.3 plots the 95% pointwise prediction bands

$$
\mu^{te} \pm t^{0.025}_{6} \sqrt{ \frac{d_{b.2}}{c_{b.2}} \left( 1 - r_{te}^{\top} R^{-1} r_{te} + h^2 / Q \right) }
$$

obtained for the prior (b.2) of Theorem 4.2, based on the n_s = 7 point training data set obtained from the damped sine curve and shown as filled circles. Here h = 1 − 1_7^⊤ R^{-1} r_te, Q = 1_7^⊤ R^{-1} 1_7, and c_{b.2} and d_{b.2} are specified in Theorem 4.2. The first stage of the model is a GP with constant mean (p = 1) and Gaussian correlation function

$$
R(h \mid \theta) = \exp\!\left\{ -\theta h^2 \right\}.
$$
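A sketch that mirrors this band computation under stated assumptions (the seven training inputs are placeholders rather than the book's design; θ = 10 corresponds to the left panel of Figure 4.3):

```python
import numpy as np
from scipy.stats import t as t_dist

theta = 10.0                                      # use 75.0 for the right panel
x_tr = np.linspace(0.05, 0.95, 7)                 # placeholder 7-point design
y_tr = np.exp(-1.4 * x_tr) * np.cos(3.5 * np.pi * x_tr)
R = np.exp(-theta * (x_tr[:, None] - x_tr[None, :]) ** 2)
one = np.ones(7)

Rinv_one = np.linalg.solve(R, one)
Q = one @ Rinv_one                                # 1'R^{-1}1 (scalar since p = 1)
beta_hat = (Rinv_one @ y_tr) / Q
resid = y_tr - beta_hat * one
c_b2, d_b2 = (7 - 1) / 2.0, 0.5 * (resid @ np.linalg.solve(R, resid))
tcrit = t_dist.ppf(0.975, df=6)                   # n_s - p = 6 degrees of freedom

x_te = np.linspace(0.0, 1.0, 103)                 # n_e = 103 prediction sites
r = np.exp(-theta * (x_te[:, None] - x_tr[None, :]) ** 2)     # 103 x 7
Rinv_rT = np.linalg.solve(R, r.T)                 # columns are R^{-1} r_te
mu_te = beta_hat + r @ np.linalg.solve(R, y_tr - beta_hat * one)
h = 1.0 - one @ Rinv_rT
s2 = (d_b2 / c_b2) * (1.0 - np.sum(r * Rinv_rT.T, axis=1) + h ** 2 / Q)
s2 = np.maximum(s2, 0.0)                          # guard tiny negative rounding error
lower, upper = mu_te - tcrit * np.sqrt(s2), mu_te + tcrit * np.sqrt(s2)
```

The arrays lower and upper collapse to the data values at the training sites, matching the zero-width property noted in the surrounding discussion.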

The bands have been computed for θ ∈ {10, 75} to show the effect of assuming a stronger correlation structure between the Y(x) values (θ = 10) and a weaker correlation structure between the Y(x) values (θ = 75). Intuitively, when the model assumption allows Y(x) to vary more over a given interval of x, the confidence bands should be wider, as seen in the right panel of the figure. For any θ, the bands have zero width at each of the true data points. Finally, the pointwise y(x) predictor, µ^te, is relatively insensitive to the correlation structure while interpolating the data, although the different left-hand prediction values show that this need not be the case when extrapolating.

4.3 Examples of Non-conjugate Bayesian Models and Prediction

4.3.1 Introduction

Subsections 4.2.1 and 4.2.2 assumed that the correlations among the observations are known, i.e., R and r_0 are known. Now we assume that y(·) has a hierarchical Gaussian random field prior with parametric correlation function R(· | ψ) having an unknown vector of parameters ψ, as introduced in Subsection ?? and previously considered in Subsection ?? for predictors. To facilitate the discussion below, suppose that the mean and variance of the normal predictive distribution in ?? are denoted by µ_{0|n}(x_0) = µ_{0|n}(x_0 | ψ) and σ²_{0|n}(x_0) = σ²_{0|n}(x_0 | ψ), where ψ was known in these earlier sections. Similarly, recall that the location and scale parameters of the predictive t distributions in ?? are denoted by µ_i(x_0) = µ_i(x_0 | ψ) and σ²_i(x_0) = σ²_i(x_0 | ψ), for i ∈ {1, 2, 3, 4}.

We consider two issues. The first is the assessment of the standard error of the plug-in predictor µ_{0|n}(x_0 | ψ̂) of Y_0(x_0), which is derived from µ_{0|n}(x_0 | ψ) by substituting ψ̂, an estimator of ψ that might be the MLE or the REML estimator. This question is implicitly stated from the frequentist viewpoint. The second issue is Bayesian; we describe the Bayesian approach to uncertainty in ψ, which is to model it by a prior distribution.

When ψ is known, recall that σ²_{0|n}(x_0 | ψ) is the MSPE of µ_{0|n}(x_0 | ψ). This suggests estimating the MSPE of µ_{0|n}(x_0 | ψ̂) by the plug-in MSPE σ²_{0|n}(x_0 | ψ̂). The correct expression for the MSPE of µ_{0|n}(x_0 | ψ̂) is

$$
\mathrm{MSPE}\!\left( \mu_{0|n}\!\left( x_0 \mid \widehat{\psi} \right), \psi \right) = E_\psi\!\left\{ \left( \mu_{0|n}\!\left( x_0 \mid \widehat{\psi} \right) - Y\!\left( x_0 \right) \right)^{2} \right\}. \qquad (4.3.1)
$$
