Bayesian Regression with Heteroscedastic Error Density and Parametric Mean Function


Justinas Pelenis
Institute for Advanced Studies, Vienna

May 8, 2013

Abstract

This paper considers Bayesian estimation of restricted conditional moment models, with linear regression as a particular example. A common practice in the Bayesian literature on linear regression and other semi-parametric models is to use flexible families of distributions for the errors and to assume that the errors are independent of the covariates. However, a model with flexible covariate-dependent error distributions should be preferred for the following reason: assuming that the error distribution is independent of the predictors might lead to inconsistent estimation of the parameters of interest when the errors and covariates are dependent. To address this issue, we develop a Bayesian regression model with a parametric mean function defined by a conditional moment condition and flexible predictor-dependent error densities. Sufficient conditions to achieve posterior consistency of the regression parameters and conditional error densities are provided. In experiments, the proposed method compares favorably with classical and alternative Bayesian estimation methods for the estimation of the regression coefficients and conditional densities.

Keywords: Bayesian semi-parametrics, Bayesian conditional density estimation, heteroscedastic linear regression.

Previously circulated under the title "Bayesian Semiparametric Regression." I am very thankful to Andriy Norets, Bo Honore, Jia Li, Ulrich Müller and Chris Sims, as well as seminar participants at Princeton, Royal Holloway, the Institute for Advanced Studies (Vienna), Wirtschaftsuniversität Wien, the European Seminar on Bayesian Econometrics, the Seminar on Bayesian Inference in Econometrics and Statistics (SBIES) and the Cowles summer conferences, for helpful discussions and multiple suggestions that greatly contributed to improving the quality of the manuscript. All remaining errors are mine.

1 Introduction

Estimation of regression coefficients in linear regression models can be consistent but inefficient if heteroscedasticity is ignored. Furthermore, the regression curve only provides a summary of the mean effects and does not provide any information about the conditional error distributions, which might be of interest to the decision maker. Estimation of conditional error distributions is useful in settings where forecasting and out-of-sample prediction are the objects of interest. In this paper we propose a novel Bayesian method for consistent estimation of both linear regression coefficients and conditional residual distributions when the data generating process satisfies a linear conditional moment restriction $E[y \mid x] = x'\beta$, or the more general restricted conditional moment condition $E[y \mid x] = h(x, \theta)$ for some known function $h$. The contribution of this proposal is that the model is correctly specified for a large class of true data generating processes without imposing specific restrictions on the conditional error distributions. Hence consistent and efficient estimation of the parameters of interest might be expected.

The most widely used method to estimate the mean of a continuous response variable as a function of predictors is, without doubt, the linear regression model. Often the models considered impose the assumptions of constant variance and/or symmetric and unimodal error distributions. Such restrictions are often inappropriate for real-life datasets in which conditional variability, skewness and asymmetry are present. The prediction intervals obtained using models with constant variance and/or symmetric error distributions are likely to be inferior to the prediction intervals obtained from models with predictor-dependent residual densities. To achieve full inference on the parameters of interest and the conditional error densities, we propose a semi-parametric Bayesian model with a parametric mean function and a non-parametric prior for covariate-dependent error densities, which simultaneously estimates both the regression coefficients and the predictor-dependent error densities. A Bayesian approach might be more effective in small samples as it enables

exact inference given the observed data instead of relying on asymptotic approximations.

There is a substantial semi-parametric Bayesian literature that focuses on non-parametric priors for the mean function and parametric priors on the error distributions. An alternative common strand of the literature places either parametric or non-parametric priors on the mean function and explicitly non-parametric priors on the error distributions, and it is this choice of the priors on the error distributions that motivates the discussion in this manuscript. The common assumption is that the errors are generated independently of the regressors $x$ and usually satisfy either a median or a quantile restriction. Estimation and consistency of such models are discussed in Kottas and Gelfand (2001), Hirano (2002), Amewou-Atisso et al. (2003), Conley et al. (2008) and Wu and Ghosal (2008), among others. However, estimation of the parameters and error densities might be inconsistent if the errors and covariates are dependent. For example, under heteroscedasticity or conditional asymmetry of the error distributions, the pseudo-true values of the regression coefficients in a linear model with errors generated by covariate-independent mixtures of normals are not generally equal to the true parameter values. One of the contributions of this paper is to show that the model proposed in this manuscript, which incorporates predictor-dependent residual densities, is flexible and leads to consistent estimation of both the parameters of interest $\theta$ and the conditional error densities. Other Bayesian proposals that incorporate predictor-dependent residual density modeling into parametric models are by Pati and Dunson (forthcoming), where the residual density is restricted to be symmetric; by Kottas and Krnjajic (2009) for quantile regression, but without accompanying consistency theorems; and by Leslie et al. (2007), who accommodate heteroscedasticity by multiplying the error term by a predictor-dependent factor. However, none of these papers address the issue of conditional error asymmetry, and the estimation of regression coefficients by these methods might be inconsistent in the presence of residual asymmetry, as the proposed models are misspecified.

Flexible models with covariate-dependent error densities might lead to a more efficient estimator of the regression coefficients. For a linear regression problem, often only the regression coefficient $\beta$ is of interest. It is a well known fact that if the conditional moment restriction holds, then the weighted least squares estimator is more efficient than the ordinary least squares estimator under heteroscedasticity. It is known that in parametric models, by assertion of Le Cam's parametric Bernstein-von Mises theorem, the posterior behaves as if one had observed a normally distributed maximum likelihood estimator with variance equal to the inverse of the Fisher information; see van der Vaart (1998). Semiparametric versions of the Bernstein-von Mises theorem have been obtained by Shen (2002) and Kleijn and Bickel (2012), but the conditions are hard to verify. Nonetheless there is an expectation that the posterior distribution of $\beta$ is normal and centered at the true value in correctly specified semi-parametric models if the priors are carefully chosen. Since the most popular frequentist approach of using OLS with a heteroscedasticity-robust covariance matrix (White (1982)) is suboptimal in a linear regression model with a conditional moment restriction, one should expect to achieve a more efficient estimator by estimating the correctly specified model proposed here. Simulation results presented in Section 4 support the hypothesis that the proposed model gives a more efficient estimator of the regression coefficients under heteroscedasticity.

The defining feature of the proposed model is that we impose a zero mean restriction on the conditional error densities at every predictor value. Imposing this conditional restriction on the error distributions can be expected to be beneficial, as a more efficient estimation of the parameters of interest might be expected. We model the residual distributions flexibly as finite or infinite mixtures of a base kernel. The base kernel for the residual density is a mixture of two normal distributions with a joint mean of 0. The probability weights in both the finite and infinite mixtures are predictor dependent and vary smoothly with changes in predictor values. We consider a finite smoothly mixing

regression model similar to the ones considered by Geweke and Keane (2007) and Norets (2010) and show that estimation would be consistent if the number of mixture components is allowed to increase. In such models, an appropriate number of mixture components needs to be selected, which presents an additional complication. To avoid such complications, an alternative is to estimate a fully nonparametric model (i.e. an infinite mixture). We consider the kernel stick-breaking process as a fully non-parametric approach to inference in a restricted moment model defined by a conditional moment restriction. This flexible approach leads to consistent estimation of both the parameters of interest and the conditional error densities. Another contribution of this paper is to provide weak posterior consistency theorems for conditional density estimation in a Bayesian framework for a large class of true data generating processes using the kernel stick-breaking process (KSBP) with an exponential kernel proposed by Dunson and Park (2008).

There are two alternative approaches to conditional density estimation in the Bayesian literature. The first general approach is to use dependent Dirichlet processes (MacEachern (1999), De Iorio et al. (2004), Griffin and Steel (2006) and others) to model the conditional density directly. The second approach is to model the joint unconditional distribution (Müller et al. (1996), Norets and Pelenis (2012) and others) and extract the conditional densities of interest from the joint distribution of observables. Even though many varying approaches for direct modeling of conditional distributions have been considered, consistency properties have been largely unstudied, and only the recent studies of Tokdar et al. (2010), Norets and Pelenis (forthcoming) and Pati et al. (2013) address this question using different setups. We provide a set of sufficient conditions to ensure weak posterior consistency of the conditional residual densities using the KSBP with an exponential kernel and mixtures of Gaussian distributions, and indirectly achieve posterior consistency of the parametric part.

The outline of the paper is as follows: Section 2 introduces the finite and infinite models for estimation of a semi-parametric linear regression with a conditional moment

constraint. Section 3 provides theoretical results regarding the posterior consistency of both the parametric and nonparametric components of the model. Section 4 contains small sample simulation results where we show that the proposed method performs well for both conditional density and parameter estimation and does not suffer from the possible misspecification biases in the way that other approaches do. The proofs and details of posterior computation are contained in the Appendix.

2 Restricted Moment Model

The data consist of $N$ observations $(Y^N, X^N) = \{(y_1, x_1), (y_2, x_2), \ldots, (y_N, x_N)\}$, where $y_i \in \mathcal{Y} \subseteq \mathbb{R}$ is a response variable and $x_i \in \mathcal{X} \subseteq \mathbb{R}^d$ are the covariates. The observations are independently and identically distributed, $(y_i, x_i) \sim F_0$, under the assumption that the data generating process (DGP) satisfies $E_{F_0}[y \mid x] = h(x, \theta_0)$ for all $x \in \mathcal{X}$ for some known function $h : \mathcal{X} \times \Theta \to \mathcal{Y}$. Alternatively, the restricted moment model can be written as

$$y_i = h(x_i, \theta_0) + \epsilon_i, \quad (y_i, x_i) \sim F_0, \quad i = 1, \ldots, n,$$

with $E_{F_0}[\epsilon \mid x] = 0$ for all $x \in \mathcal{X}$. The unknown parameters of this semi-parametric model are $(\theta, f_{\epsilon|x})$, where $\theta$ is the finite dimensional parameter of interest and $f_{\epsilon|x}$ is the infinite dimensional parameter. Let $\Xi = \mathcal{F}_{\epsilon|x} \times \Theta$ be the parameter space, where $\Theta$ denotes the space of $\theta$ and $\mathcal{F}_{\epsilon|x}$ the space of conditional densities with mean zero. That is, $\theta \in \Theta \subseteq \mathbb{R}^p$ and

$$\mathcal{F}_{\epsilon|x} = \left\{ f_{\epsilon|x} : \mathbb{R} \times \mathcal{X} \to [0, \infty) : \int_{\mathbb{R}} f_{\epsilon|x}(\epsilon, x)\, d\epsilon = 1, \ \int_{\mathbb{R}} \epsilon f_{\epsilon|x}(\epsilon, x)\, d\epsilon = 0 \ \ \forall x \in \mathcal{X} \right\}.$$

The primary objective is to construct a model to consistently estimate the parameter of interest $\theta_0$, while consistent estimation of the conditional error densities $f_{0,\epsilon|x}$ is of

secondary interest. This joint objective is achieved by proposing a flexible covariate-dependent model for the residual densities that allows the residual density to vary with the predictors $x \in \mathcal{X}$. We will show that the proposed model is correctly specified under weak restrictions on $\mathcal{F}_{\epsilon|x}$ and leads to consistent estimation of both $\theta_0$ and the conditional error densities. Furthermore, the simulation results in Section 4 show that this flexible approach might help achieve a more efficient estimate of the parameter of interest $\theta_0$.

2.1 Finite Smoothly Mixing Regression

First, we define a density $f_2(\cdot)$ which is a mixture of two normal distributions with a joint mean of zero. Given the parameters $\{\pi, \mu, \sigma_1, \sigma_2\}$, the density $f_2$ is defined as

$$f_2(\epsilon; \pi, \mu, \sigma_1, \sigma_2) = \pi\, \phi(\epsilon; \mu, \sigma_1^2) + (1 - \pi)\, \phi\!\left(\epsilon; -\frac{\mu \pi}{1 - \pi}, \sigma_2^2\right),$$

where $\phi(\epsilon; \mu, \sigma^2)$ is the normal density with mean $\mu$ and variance $\sigma^2$, evaluated at $\epsilon$. Note that by construction a random variable $\epsilon$ with probability density function $f_2$ has an expected value of 0, as desired. In Section 3 we show that any density belonging to a large class of densities with mean 0 can be approximated by a countable collection of mixtures of $f_2$.
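To make the base kernel concrete, here is a minimal numerical sketch (not part of the paper; the parameter values are illustrative) of $f_2$ that checks the zero-mean construction: the second component's mean $-\mu\pi/(1-\pi)$ exactly offsets the first, so the mixture integrates to one and has mean zero.

```python
# Sketch of the base kernel f_2: a two-component normal mixture whose second
# mean is chosen so that the mixture mean is exactly zero.
import numpy as np
from scipy.stats import norm

def f2_pdf(eps, pi, mu, sigma1, sigma2):
    """f_2(eps; pi, mu, sigma1, sigma2); pi*mu + (1 - pi)*mu2 = 0 by construction."""
    mu2 = -pi * mu / (1.0 - pi)
    return pi * norm.pdf(eps, loc=mu, scale=sigma1) + (1 - pi) * norm.pdf(eps, loc=mu2, scale=sigma2)

# Numerical check: the kernel integrates to one and has mean zero.
grid = np.linspace(-30.0, 30.0, 200001)
dens = f2_pdf(grid, pi=0.3, mu=1.2, sigma1=0.8, sigma2=1.5)
print(np.trapz(dens, grid))         # ~ 1.0
print(np.trapz(grid * dens, grid))  # ~ 0.0
```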

The proposed finite smoothly mixing regression model that imposes a conditional moment restriction is a special case of the mixtures of experts introduced by Jacobs et al. (1991). Let the proposed model be defined by a set of parameters $(\eta^k, \theta)$, where $\theta$ is the parameter of interest and $\eta^k$ are the nuisance parameters that induce the conditional densities $f_{\epsilon|x}$. The density of the observable $y_i$ is modeled as:

$$p(y_i \mid x_i, \theta, \eta^k) = \sum_{j=1}^{k} \alpha_j(x_i)\, f_2(y_i - h(x_i, \theta); \pi_j, \mu_j, \sigma_{j1}, \sigma_{j2}), \tag{1}$$

$$\sum_{j=1}^{k} \alpha_j(x_i) = 1, \quad \forall x_i \in \mathcal{X},$$

where $\alpha_j(x_i)$ is a regressor-dependent, smoothly varying probability weight. Note that by construction $E_p[y \mid x] = h(x, \theta)$, as desired. The conditional distribution of the residuals is modeled as a flexible countable mixture of densities $f_2$ with predictor-dependent mixing weights.

Modeling of $\alpha_j(x)$ is the choice of the econometrician, and there are a few available alternatives. We will use the linear logit regression considered by Norets (2010), as it has desirable theoretical properties. The mixing probabilities $\alpha_j(x_i)$ are constructed as

$$\alpha_j(x_i) = \frac{\exp(\rho_j + \gamma_j' x_i)}{\sum_{l=1}^{k} \exp(\rho_l + \gamma_l' x_i)}. \tag{2}$$

The linear logit regression is not the unique choice, as Geweke and Keane (2007) considered a multinomial probit model for $\alpha_j(x)$, and a number of alternative possibilities have been considered in the predictor-dependent stick-breaking process literature. Generally, this finite mixture model can be considered as a special case of a smoothly mixing regression model for conditional density estimation that imposes a linear mean but leaves the residual densities unconstrained.

The full finite mixture model is characterized by the parameter of interest $\theta$ and the nuisance parameters $\eta^k \equiv \{\pi_j, \mu_j, \sigma_{j1}, \sigma_{j2}, \rho_j, \gamma_j\}_{j=1}^{k}$. To complete the characterization of this model, one would specify a prior $\Pi_\theta$ on $\Theta$ and a prior $\Pi_\eta$ on the parameters $\eta^k$ that induces a prior $\Pi_{f_{\epsilon|x}}$ on $\mathcal{F}_{\epsilon|x}$. These priors induce a joint prior $\Pi = \Pi_\theta \times \Pi_{f_{\epsilon|x}}$ on $\Xi$.
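The finite model in Equations (1)-(2) can be evaluated directly. The sketch below (the parameter values are hypothetical, chosen only for illustration) computes the logit mixing weights and the resulting conditional density of $y$ around the linear mean $h(x, \theta) = x'\theta$.

```python
# Sketch of the finite smoothly mixing regression density, Eqs. (1)-(2):
# logit weights alpha_j(x) over k zero-mean f_2 kernels centered at x'theta.
import numpy as np
from scipy.stats import norm

def f2_pdf(eps, pi, mu, s1, s2):
    return pi * norm.pdf(eps, mu, s1) + (1 - pi) * norm.pdf(eps, -pi * mu / (1 - pi), s2)

def logit_weights(x, rho, gamma):
    """Eq. (2): alpha_j(x) proportional to exp(rho_j + gamma_j' x)."""
    z = rho + gamma @ x
    z -= z.max()                     # stabilize the softmax numerically
    w = np.exp(z)
    return w / w.sum()

def clr_density(y, x, theta, rho, gamma, pi, mu, s1, s2):
    """Eq. (1): p(y | x) = sum_j alpha_j(x) f_2(y - x'theta; ...)."""
    alpha = logit_weights(x, rho, gamma)
    return float(np.sum(alpha * f2_pdf(y - x @ theta, pi, mu, s1, s2)))

# Illustration with k = 3 components and d = 2 covariates.
k, d = 3, 2
rng = np.random.default_rng(0)
rho, gamma = rng.normal(size=k), rng.normal(size=(k, d))
pi, mu = np.full(k, 0.4), rng.normal(size=k)
s1, s2 = np.full(k, 0.7), np.full(k, 1.3)
x, theta = np.array([1.0, -0.2]), np.array([0.3, 1.1])
print(clr_density(0.5, x, theta, rho, gamma, pi, mu, s1, s2))
```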

In Section 3 we show that for any true DGP $F_0$ there exist $k$ large enough and parameters $(\theta, \eta^k)$ such that the proposed model is arbitrarily close in KL divergence to the true DGP. This property can be used to show that consistent estimation of $\theta_0$ would be obtained as $k \to \infty$.

2.2 Infinite Smoothly Mixing Regression

Estimation of a finite mixture model introduces the additional complication of having to estimate the number of mixture components $k$. An alternative solution is to consider an infinite smoothly mixing regression. The conditional density of the observable $y_i$ is modeled as:

$$p(y_i \mid x_i, \theta, \eta) = \sum_{j=1}^{\infty} p_j(x_i)\, f_2(y_i - h(x_i, \theta); \pi_j, \mu_j, \sigma_{j1}, \sigma_{j2}),$$

where $\eta$ are nuisance parameters to be specified later, $p_j(x_i)$ is a predictor-dependent probability weight, and $\sum_{j=1}^{\infty} p_j(x) = 1$ a.s. for all $x \in \mathcal{X}$. To construct this infinite mixture model we will employ predictor-dependent stick-breaking processes. Similarly to the choice of $\alpha_j(x)$ in the finite smoothly mixing regressions, various constructions of $p_j(x)$ have been considered in the literature. Those methods include the order-based dependent Dirichlet processes ($\pi$DDP) proposed by Griffin and Steel (2006), the probit stick-breaking process (Chung and Dunson (2009)), the kernel stick-breaking process (Dunson and Park (2008)) and the local Dirichlet process (lDP) (Chung and Dunson (2011)), which is a special case of the kernel stick-breaking process.

We will be employing the kernel stick-breaking process introduced by Dunson and Park (2008). It is defined using a countable sequence of mutually independent random components $V_j \sim \text{Beta}(a_j, b_j)$ and $\Gamma_j \sim H$, independently for each $j = 1, \ldots$. The covariate-dependent mixing weights are

defined as:

$$p_j(x) = V_j K_\varphi(x, \Gamma_j) \prod_{l < j} \left(1 - V_l K_\varphi(x, \Gamma_l)\right), \quad \text{for all } x \in \mathcal{X},$$

where $K : \mathbb{R}^d \times \mathbb{R}^d \to [0, 1]$ is any bounded kernel function. Kernel functions that have been considered in practice are $K_\varphi(x, \Gamma_j) = \exp(-\varphi \|x - \Gamma_j\|^2)$ and $K_\varphi(x, \Gamma_j) = 1(\|x - \Gamma_j\| < \varphi)$, where $\|\cdot\|$ is the Euclidean distance. We will focus on the exponential kernel in this manuscript. Jointly, the conditional density of $y_i$ given the covariate $x_i$ is defined as

$$p(y_i \mid x_i, \theta, \eta) = \sum_{j=1}^{\infty} p_j(x_i)\, f_2(y_i - h(x_i, \theta); \pi_j, \mu_j, \sigma_{j1}, \sigma_{j2}), \tag{3}$$

$$p_j(x) = V_j K_\varphi(x, \Gamma_j) \prod_{l < j} \left(1 - V_l K_\varphi(x, \Gamma_l)\right).$$

For Bayesian analysis the parameters are endowed with these priors: $\{\pi_j, \mu_j, \sigma_{j1}^2, \sigma_{j2}^2\} \sim G_0$, $\Gamma_j \sim H$, $V_j \sim \text{Beta}(a_j, b_j)$, $\varphi \sim \Pi_\varphi$ and $\theta \sim \Pi_\theta$. The nuisance parameter is $\eta = \{\varphi, \{\pi_j, \mu_j, \sigma_{j1}, \sigma_{j2}, V_j, \Gamma_j\}_{j=1}^{\infty}\}$, and jointly these priors on the nuisance parameters induce a prior $\Pi_{f_{\epsilon|x}}$ on $\mathcal{F}_{\epsilon|x}$. This is a very flexible model for predictor-dependent conditional densities; however, it also imposes the desired property that the conditional error densities have a mean of zero, in order to identify the parameter of interest $\theta$. We will show that this is a correctly specified model for the DGP, as the posterior concentrates on the true parameter $\theta_0$ and on a weak neighborhood of the true conditional densities $f_{0,\epsilon|x}$ for a particular choice of kernel function. The exponential kernel is chosen as it is closely related to the linear logit regression used in the finite mixture model.
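The stick-breaking weights are easy to compute once the process is truncated. The following sketch (the truncation level $J$ is a computational device assumed here, not part of the theory above) evaluates $p_j(x)$ under the exponential kernel; the leftover stick mass beyond $J$ is the amount by which the truncated weights fall short of one.

```python
# Sketch of truncated kernel stick-breaking weights with the exponential kernel
# K_phi(x, Gamma_j) = exp(-phi * ||x - Gamma_j||^2), as in Dunson and Park (2008).
import numpy as np

def ksbp_weights(x, V, Gamma, phi):
    """p_j(x) = V_j K(x,Gamma_j) * prod_{l<j} (1 - V_l K(x,Gamma_l)), j <= J."""
    K = np.exp(-phi * np.sum((x - Gamma) ** 2, axis=1))   # kernel values, shape (J,)
    piece = V * K                                          # stick broken at each j
    survive = np.concatenate(([1.0], np.cumprod(1.0 - piece)[:-1]))
    return piece * survive

rng = np.random.default_rng(1)
J, d = 50, 2
V = rng.beta(1.0, 1.0, size=J)              # V_j ~ Beta(a_j, b_j)
Gamma = rng.uniform(-2.0, 2.0, size=(J, d)) # Gamma_j ~ H
w = ksbp_weights(np.array([0.5, -1.0]), V, Gamma, phi=1.0)
print(w.sum())   # < 1; the remainder is the stick mass beyond the truncation
```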

Even though practical suggestions have been plentiful, theoretical results regarding the consistency properties of predictor-dependent stick-breaking processes are scarce. Related theoretical results are presented by Tokdar et al. (2010), Pati et al. (2013) and Norets and Pelenis (forthcoming). Key contributions of this paper are Theorem 2 and Corollary 1 in Section 3, which show that kernel stick-breaking processes with an exponential kernel can be used to consistently estimate flexible unrestricted conditional densities and the parameters of interest. One hopes that consistent estimation of the error densities could lead to more efficient estimation of the parameters of interest, such as linear regression coefficients, compared to other methods that do not directly impose a conditional moment restriction.

3 Consistency Properties

We provide general sufficient conditions on the true data generating process that lead to posterior consistency in estimating the regression parameters and conditional residual densities. We show that the residual densities induced by the proposed models can be chosen to be arbitrarily close in Kullback-Leibler divergence to true conditional densities that satisfy the conditional moment restriction. That is, the Kullback-Leibler (KL) closure of the models proposed in Section 2 includes all true data generating distributions that satisfy a set of general conditions presented in this section.

To motivate the necessity of the consistency properties of the models proposed in Section 2, we provide a stylized data generating example where alternative flexible constructions might fail to deliver consistent estimation of the regression function. Suppose that one is estimating a model $y = g(x) + \epsilon$, where $g(\cdot)$ is either an unspecified function or some known function $h(x, \theta)$, such as $h(x, \theta) = x'\theta$. One common piece of textbook advice is to assume that $\epsilon$ is independent of the covariates and to model the error distribution as a Dirichlet process mixture to account for possible non-Gaussianity. Such a model is proposed by Chib and Greenberg (2010). An alternative approach by Pati and Dunson (forthcoming) allows the error distribution to be covariate dependent, but restricts it to be symmetric. We hope

that the stylized example below will highlight the issues that both of these approaches might face when the true error distribution is covariate dependent and asymmetric.

Consider this particular discrete stylized data generating example, such that $E[y \mid x] = x\theta_0 = 0$ and $\theta_0 = 0$. Suppose that $x \in \{-1, 1\}$, as then only two regression means have to be estimated, $g(x = -1)$ and $g(x = 1)$, and hence $g(x)$ can be estimated as $g(x) = x\theta$. Let the true data generating process be $P(x = -1) = P(x = 1) = 0.5$; $P(y = 2 \mid x = 1) = 1/3$ and $P(y = -1 \mid x = 1) = 2/3$; and $P(y = 1 \mid x = -1) = 2/3$ and $P(y = -2 \mid x = -1) = 1/3$. Then $E[y \mid x] = 0$ by design; however, the pseudo-true value of $\theta$ would be $\tilde\theta = 0.5$, or alternatively $\tilde{E}[y \mid x = 1] = 0.5$ and $\tilde{E}[y \mid x = -1] = -0.5$, if one assumes that the residuals are either covariate independent, or covariate dependent and symmetric, and allows the residual distribution to be otherwise fully flexible. This inconsistency arises because in both cases the conditional error distribution would concentrate around the probability distribution $P(\epsilon = 1.5 \mid x) = 0.5$ and $P(\epsilon = -1.5 \mid x) = 0.5$ for both $x = -1$ and $x = 1$. This example illustrates the possible risk of misspecification of the error density; the effect of such misspecification on the estimates of the regression coefficients will be explored in a simulation example in Section 4. Therefore, we advocate a model with flexible, asymmetric, covariate-dependent error distributions, as this allows us to obtain consistent estimates of the parameters of interest.
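The pseudo-true value in this example can be checked numerically. The sketch below is a heuristic of our own (not the paper's estimation procedure): since a flexible covariate-independent error model effectively fits the pooled residual distribution, the pseudo-true $\theta$ approximately maximizes the expected log-density of the pooled residual atoms, smoothed with a small Gaussian bandwidth to make the criterion continuous.

```python
# Heuristic check of the stylized example: the maximizer is ~0.5, not theta_0 = 0.
import numpy as np
from scipy.stats import norm

# Atoms (y, x, prob) of the stylized DGP with E[y|x] = 0 and theta_0 = 0.
atoms = [(2.0, 1.0, 1/6), (-1.0, 1.0, 1/3), (1.0, -1.0, 1/3), (-2.0, -1.0, 1/6)]
p = np.array([pr for _, _, pr in atoms])
sigma = 0.3   # smoothing bandwidth for the atoms

def expected_loglik(theta):
    # Residuals of each atom; the fitted covariate-independent error density is
    # approximated by the pooled (x-marginal) smoothed residual distribution.
    res = np.array([y - theta * x for y, x, _ in atoms])
    dens = (p[None, :] * norm.pdf(res[:, None] - res[None, :], scale=sigma)).sum(axis=1)
    return float(p @ np.log(dens))

grid = np.linspace(-1.0, 1.0, 201)
print(grid[np.argmax([expected_loglik(t) for t in grid])])   # 0.5
```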

Next, we proceed to show that, under very general assumptions on the true data generating process and the prior, estimation of the models defined in Sections 2.1 and 2.2 leads to posterior consistency. Let $p(y \mid x, M)$ be the conditional density of $y$ given $x$ implied by some model $M$. Let the true data generating joint density of $(y, x)$ be $f_0(y \mid x) f_0(x)$; then the joint marginal density induced by the model $M$ is $p(y \mid x, M) f_0(x)$. Note that in the models considered in Section 2 we modeled only the conditional error density and left the data generating density $f_0(x)$ of $x \in \mathcal{X}$ unspecified. The KL divergence between $f_0(y \mid x) f_0(x)$ and $p(y \mid x, M) f_0(x)$ is defined as

$$d_{KL}(f_0, p_M) = \int \log \frac{f_0(y \mid x) f_0(x)}{p(y \mid x, M) f_0(x)}\, F_0(dy, dx) = \int \log \frac{f_0(y \mid x)}{p(y \mid x, M)}\, F_0(dy, dx). \tag{4}$$

Given the true conditional data generating density $f_0(y \mid x)$, define $f_{0,\epsilon|x}$ as $f_{0,\epsilon|x}(\epsilon \mid x) = f_0(\epsilon + h(x, \theta_0) \mid x)$. We say that the posterior is consistent for estimating $(f_{0,\epsilon|x}, \theta_0)$ if $\Pi(W \mid Y^n, X^n)$ converges to 1 in $P_{F_0}^n$ probability as $n \to \infty$ for any neighborhood $W$ of $(f_{0,\epsilon|x}, \theta_0)$ when the true data generating distribution is $F_0$. We define a weak neighborhood $U_\delta(f_{\epsilon|x})$ as

$$U_\delta(f_{\epsilon|x}) = \Big\{ \tilde{f}_{\epsilon|x} \in \mathcal{F}_{\epsilon|x} : \Big| \int_{\mathbb{R} \times \mathcal{X}} g \tilde{f}_{\epsilon|x}(\epsilon, x) f_0(x)\, d\epsilon\, dx - \int_{\mathbb{R} \times \mathcal{X}} g f_{\epsilon|x}(\epsilon, x) f_0(x)\, d\epsilon\, dx \Big| < \delta,$$
$$g : \mathbb{R} \times \mathcal{X} \to \mathbb{R} \text{ bounded and uniformly continuous} \Big\}.$$

We consider a neighborhood $W$ of $(f_{0,\epsilon|x}, \theta_0)$ of the form $U_\delta(f_{0,\epsilon|x}) \times \{\theta : \|\theta - \theta_0\| < \rho\}$ for any $\delta > 0$ and $\rho > 0$.

First, we consider the case of the finite model described in Section 2.1 by Equations (1) and (2). Let the proposed model be defined by the parameters $(\eta^k, \theta)$. We show that there exist $k$ large enough and a set of parameters $(\eta^k, \theta)$ such that the KL divergence between the true conditional densities and the ones implied by the finite model is arbitrarily close to 0.

Theorem 1. Assume that

1. $E[y \mid x] = h(x, \theta_0)$ a.s. $F_0$.
2. $f_0(y \mid x)$ is continuous in $(y, x)$ a.s. $F_0$.
3. $\mathcal{X}$ has bounded support and $E_{F_0}[y^2 \mid x] < \infty$ for all $x \in \mathcal{X}$.
4. $h$ is Lipschitz continuous in $x$.

5. There exists $\delta > 0$ such that

$$\int \log \frac{f_0(y \mid x)}{\inf_{|y - z| < \delta,\, \|x - t\| < \delta} f_0(z \mid t)}\, F_0(dy, dx) < \infty. \tag{5}$$

Let $p(\cdot \mid \cdot, \theta, \eta^k)$ be defined as in Equations (1) and (2). Then, for any $\epsilon > 0$ there exists $(\eta^k, \theta)$ such that $d_{KL}(f_0(\cdot \mid \cdot), p(\cdot \mid \cdot, \theta, \eta^k)) < \epsilon$.

Theorem 1 is proved in the Appendix. The general idea behind this approximation result is as follows. First we show that the mixing weights $\alpha_j(x)$ are flexible enough that a subset of the mixing weights approximates an indicator function on the neighborhood of some particular covariate value $x$, while the other mixing weights for that particular $x$ approach 0. Then, given this particular point $x$, we can approximate the conditional error density at this $x$ by a finite mixture of $f_2$ densities associated with that particular subset of mixing weights. As the covariate $x$ is varied, different mixing weights approximate the indicator function on the neighborhood of $x$, and different mixtures of $f_2$ densities are used to approximate the conditional error density at that particular value of $x$.

The results above imply the existence of a large number $k$ of mixture components such that the induced conditional densities are close to the true values of the DGP. However, this does not provide a direct method of estimating $k$, the number of mixture components, to be used in applications. Furthermore, one can show that any finite model could have pseudo-true values of $\theta$ different from the true values for some data generating distributions that belong to the general class $\mathcal{F}$ of DGPs. Therefore, we would like to establish that as the KL approximation improves, our estimate of $\theta$ has to approach the true value $\theta_0$. We show that these concerns are addressed when an infinite smoothly mixing regression model induced by a predictor-dependent stick-breaking process prior is used for inference. We provide the general conditions under which estimation of the

infinite mixture model leads to posterior consistency of $f_{0,\epsilon|x}$ and $\theta_0$. Hence, we provide the necessary theoretical foundation for the use of the infinite mixture model if consistency is a desired requirement of the estimation procedure.

For the infinite mixture model defined in Equation (3), the priors $G_0$, $H$, $\Pi_V$, $\Pi_\varphi$, $\Pi_\theta$ and a choice of kernel function $K_\varphi$ induce a prior $\Pi$ on $\Xi$. By choosing the priors on the nuisance parameters, we induce a prior on the set of conditional error densities with mean 0. A conditional density function $f_{\cdot|x}$ is said to be in the KL support of the prior $\Pi$ (i.e. $f_{\cdot|x} \in KL(\Pi)$) if for all $\epsilon > 0$, $\Pi(K_\epsilon(f_{\cdot|x})) > 0$, where $K_\epsilon(f_{\cdot|x}) \equiv \{(\theta, \eta) : d_{KL}(f_{\cdot|x}(\cdot), p(\cdot \mid \cdot, \theta, \eta)) < \epsilon\}$ and $d_{KL}(\cdot, \cdot)$ is defined in Equation (4). The next theorem shows that if the true data generating distribution $F_0$ satisfies the assumptions of Theorem 1, then $f_0$ belongs to the KL support of $\Pi$ under general conditions on the prior distributions and for a particular kernel function.

Theorem 2. Assume $F_0$ satisfies the assumptions of Theorem 1 and $f_0(\cdot \mid \cdot)$ are the covariate-dependent conditional densities of $y \in \mathcal{Y}$ induced by $F_0$. Let $K_\varphi(x, \Gamma) = \exp(-\varphi \|x - \Gamma\|^2)$ and let the prior $\Pi$ be induced by the priors $G_0$, $H$, $\Pi_V$, $\Pi_\varphi$, $\Pi_\theta$ and the model defined in Equation (3). If the priors are such that $\theta_0$ is an interior point of the support of $\Pi_\theta$, $\Pi(\sigma_{j1} < \delta) > 0$ for any $\delta > 0$, and $\mathcal{X} \subseteq \text{supp}(H)$, then $f_0 \in KL(\Pi)$.

Theorem 2 is proved in the Appendix. The intuition is that for any given set of parameters of the finite mixture model, there is a set of nuisance parameters of the infinite model that approximates the finite model in the KL sense. The minimal restrictions on the priors ensure that any set of parameters in a neighborhood of this particular instance of the infinite model is arbitrarily close in the KL sense, and that this neighborhood has positive prior support. Once the KL support property is established, we proceed to use Schwartz's posterior consistency theorem (Schwartz (1965)) to show that the posterior is weakly consistent at $f_{0,\epsilon|x}$ and $\theta_0$. First, we consider the case of linear regression with

$h(x, \theta) = x'\theta$ as an illustrative example of the additional assumptions that are necessary to achieve posterior consistency. Furthermore, assume that a constant is included in the regressors $x$.

Theorem 3 (an (almost) immediate implication of Schwartz (1965)). Given the model defined by Equation (3), suppose that $F_0$ satisfies the assumptions of Theorem 1, that the prior distributions satisfy the requirements of Theorem 2, and that $E_{F_0}[y \mid x] = x'\theta_0$. Furthermore, suppose that $E_{F_0}[xx']$ is a non-singular matrix. The prior is restricted so that there exists a large $L$ such that $E_f[\epsilon^2 \mid x] < L$ for all $x \in \mathcal{X}$ and all $f \in \text{supp}(\Pi_{f_{\epsilon|x}})$. Let $W = U_\delta(f_{0,\epsilon|x}) \times \{\theta : \|\theta - \theta_0\| < \rho\}$ for some $\delta > 0$ and $\rho > 0$; then $\Pi(W^c \mid Y^N, X^N) \to 0$ a.s. $P_{F_0}^\infty$.

The assumption that $E_{F_0}[xx']$ is non-singular allows us to identify the unique parameter of interest $\theta_0$. If $E_{F_0}[xx']$ were singular, then in the presence of multi-collinearity the parameter of interest is only set-identified, and one could expect the posterior to concentrate on this set, but this conjecture is not formally explored. The theorem is proved rigorously in the Appendix. The proof consists of the construction of exponentially consistent tests for testing $H_0 : (f_{\epsilon|x}, \theta) = (f_{0,\epsilon|x}, \theta_0)$ against the alternative hypothesis $H_1 : (f_{\epsilon|x}, \theta) \in W^c$. Once that is accomplished, it is a straightforward application of Schwartz's posterior consistency theorem, as the KL support property is already proved in Theorem 2.

Theorem 3 can be extended to other restricted moment models beyond the linear regression case with an additional assumption. Let $\gamma \in \{-1, 1\}^p$ and define a quadrant $Q_\gamma = \{z \in \mathbb{R}^p : z_j \gamma_j > 0 \text{ for all } j = 1, \ldots, p\}$.

Corollary 1. Suppose that $F_0$ satisfies the assumptions of Theorem 1 and that the prior distributions satisfy the requirements of Theorem 2. Additionally, assume that for each $\gamma$ and $\theta - \theta_0 \in Q_\gamma$ there exists $X_{h,\gamma,\xi} = \{x \in \mathcal{X} : |h(x, \theta_0) - h(x, \theta)| > \xi \|\theta - \theta_0\| \ \forall\, \theta - \theta_0 \in Q_\gamma, \ \|x\| > \xi\}$ with $F_0(X_{h,\gamma,\xi}) > \zeta > 0$ for some $\xi > 0$ small enough. Furthermore, the prior is restricted so that there exists a large $L$ such that $E_f[\epsilon^2 \mid x] < L$ for all $x \in \mathcal{X}$ and

all $f \in \text{supp}(\Pi_{f_{\epsilon|x}})$. Let $W = U_\delta(f_{0,\epsilon|x}) \times \{\theta : \|\theta - \theta_0\| < \rho\}$ for some $\delta > 0$ and $\rho > 0$; then $\Pi(W^c \mid Y^N, X^N) \to 0$ a.s. $P_{F_0}^\infty$.

This corollary is proved in the Appendix, being a simple extension of Theorem 3. However, the assumptions of this corollary require some restrictions on the possible functions $h$ and on the distribution of the covariates. This assumption could be satisfied for a number of generalized linear models with a monotonic link function. The corollary establishes that consistent estimation of the parameter of interest and the conditional error densities is achieved if the true data generating process satisfies a conditional moment restriction. Given the desirable theoretical property that both the parametric and nuisance parts are consistently estimated, it achieves the two objectives. Firstly, the estimation of the parameter of interest is consistent. Secondly, consistent estimation of the nuisance parameter, which here is the set of conditional error densities, might lead to more efficient estimation of the parameter of interest, which is a justification for estimating the full semi-parametric model as opposed to some alternative simplified model.

There have been a number of papers that estimate the mean regression function $E[y \mid x]$ without specific parametric assumptions on the form of the regression mean. For example, Chan et al. (2006) and Chib and Greenberg (2010) used splines to flexibly model the regression mean, while Pati and Dunson (forthcoming) used a Gaussian process prior for the mean regression function and proved consistency properties of their proposal for the case of symmetric residual densities. Unsurprisingly, using the results from Norets (2010) one could show that a well behaved mean regression function can be approximated by finite order polynomials in the covariates, and Theorems 1 and 2 could be extended to show that $f_0$ and $E[y \mid x] = g(x)$ are in the Kullback-Leibler support of $\Pi$ under some general conditions. Extending these results to posterior consistency is less straightforward due to the construction of exponentially consistent tests for a non-parametric

mean. Such tests based on a conditional mean restriction were built to test hypotheses on particular chosen subspaces of a finite dimensional space $\Theta$, and the choice of such subspaces is not straightforward in the case of an unrestricted regression mean function. However, similar exponentially consistent tests based on a conditional median restriction have been constructed by Pati and Dunson (forthcoming); one would hope that such an extension is possible for testing a conditional mean restriction as well, and it is an interesting future research topic.

4 Simulation Examples

A number of simulation examples are considered to assess the performance of the method proposed in this paper. Consider a linear regression model with

$$y_i = \alpha + x_i'\beta + \epsilon_i, \quad (y_i, x_i) \sim \text{iid } F_0, \quad i = 1, \ldots, N,$$

and $E_{F_0}[\epsilon_i \mid x_i] = 0$, where $x$ is one-dimensional and $N = 200$. We consider four alternative data generating processes (DGPs), with the first three suggested by Müller (forthcoming); a simulation sketch is given after the list.

1. Case (i): $y_i = x_i + \epsilon_i$, $\epsilon_i \sim N(0, 1)$.

2. Case (ii): $y_i = x_i + \epsilon_i$, $\epsilon_i \mid x_i \sim N(0, a^2(|x_i| + 0.5)^2)$, where the constant $a$ is calibrated so that the normalization below holds.

3. Case (iii): $y_i = x_i + \epsilon_i$, $\epsilon_i \mid x_i, s \sim N([1 - 2 \cdot 1(x_i < 0)]\mu_s, \sigma_s^2)$, where $P(s = 1) = 0.8$, $P(s = 2) = 0.2$, $\mu_1 = 0.25$, $\sigma_1 = 0.75$, $\mu_2 = -1$, and $\sigma_2$ is calibrated so that the normalization below holds.

4. Case (iv): $y_i = 0 + 0 \cdot x_i + \epsilon_i$, $\epsilon_i \mid x_i \sim N(x_i \mu_s, \sigma^2)$, where $P(s = 1) = P(s = 2) = 0.5$ and $\mu_1 = -\mu_2$.

All four DGPs are such that $E[(x_i \epsilon_i)^2] = 1$ and $x_i \sim N(0, 1)$, so that inference based on heteroscedasticity-robust OLS and infeasible GLS is comparable across the four cases.
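For concreteness, the following sketch simulates the four DGPs. The source's numeric values for $a$ in Case (ii) and $\sigma_2$ in Case (iii) are backed out from the normalization $E[(x_i\epsilon_i)^2] = 1$ stated above; the Case (iv) value of $\mu_1$ is garbled in the source, so an illustrative value is assumed and $\sigma$ is derived from the same normalization.

```python
# Sketch of the four simulation DGPs; see the lead-in for which constants are
# derived from the normalization E[(x*eps)^2] = 1 and which are assumed.
import numpy as np

rng = np.random.default_rng(0)

def simulate(case, n=200):
    x = rng.normal(size=n)
    if case == 1:                                    # (i) homoscedastic N(0,1) errors
        y = x + rng.normal(size=n)
    elif case == 2:                                  # (ii) sd(eps|x) = a*(|x| + 0.5)
        # a^2 * E[x^2 (|x|+0.5)^2] = a^2 * (E x^4 + E|x|^3 + 0.25 E x^2) = 1
        a = 1.0 / np.sqrt(3.0 + 2.0 * np.sqrt(2.0 / np.pi) + 0.25)
        y = x + a * (np.abs(x) + 0.5) * rng.normal(size=n)
    elif case == 3:                                  # (iii) skewed, sign flips at x < 0
        s = rng.choice([0, 1], size=n, p=[0.8, 0.2])
        mu = np.array([0.25, -1.0])[s]
        sd = np.array([0.75, np.sqrt(1.5)])[s]       # sigma_2^2 = 1.5 from normalization
        y = x + np.where(x < 0.0, -1.0, 1.0) * mu + sd * rng.normal(size=n)
    else:                                            # (iv) beta = 0, bimodal eps | x
        mu1 = 0.4                                    # assumed (source value garbled)
        sd = np.sqrt(1.0 - 3.0 * mu1 ** 2)           # E[x^2 eps^2] = 3 mu1^2 + sd^2 = 1
        m = rng.choice([mu1, -mu1], size=n)
        y = x * m + sd * rng.normal(size=n)
    return y, x

y, x = simulate(case=3)
print(y[:3], x[:3])
```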

Inference is based on the following methods. First, inference based on infeasible generalized least squares (GLS) with a correct covariance matrix specification. Second, Bayesian inference based on the artificial sandwich posterior (OLS) as proposed by Müller (forthcoming). Let $\theta = (\alpha, \beta)$; then $\theta \sim N(\hat\theta, \hat\Sigma)$, where $\hat\theta$ is the ordinary least squares coefficient and $\hat\Sigma$ is the sandwich covariance matrix. Note that inference based on this sandwich posterior is asymptotically equivalent to inference using the Bayesian bootstrap (Lancaster (2003)), so there is a Bayesian alternative to this frequentist-inspired procedure when the regression coefficient is the object of interest. Third, Bayesian inference based on a normal regression model (NLR), where $\epsilon_i \mid x_i \sim N(0, \sigma^2)$ with priors $\theta \sim N(0, (0.01 I_2)^{-1})$ and $\sigma^2 \sim \text{InvGamma}(3/2, 2)$. Fourth, inference based on the Robinson (1987) asymptotically efficient uniform-weight k-NN estimator with $k_n = n^{4/5}$. Finally, Bayesian inference based on the conditional linear regression model (CLR) proposed in this paper. At this stage, for computational reasons, we consider the finite model with $k = 5$ states, as defined in Equations (1) and (2).

We will consider three versions of the CLR model. The first version is the unrestricted regular CLR with priors set to $\theta \sim N(0, (0.01 I_2)^{-1})$, $\rho_j \sim N(0, 1)$, $\gamma_j \sim N(0, 4)$, $\sigma_{ji} \sim IG(3/2, 2)$, $\mu_j \sim N(0, 4)$ and $(\pi_j, 1 - \pi_j) \sim \text{Dir}(5, 5)$ for all $j$ and $i$. The second version is an unconditional linear regression (ULR) model, which allows for a flexible error distribution that is assumed to be independent of the covariates. Such a model is estimated by restricting $\gamma_j = 0$ for all $j$, with the same prior distributions as for the CLR. The third version is a symmetric conditional linear model (SymCLR), in which the error densities are restricted to be symmetric (but possibly heteroscedastic) by setting $\pi_j = 0.5$ and $\sigma_{j1} = \sigma_{j2}$ for all $j$; the rest of the priors are the same. By comparing these three model choices we hope to pinpoint the possible shortcomings of the homoscedasticity and symmetry restrictions discussed in Section 3.
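The artificial sandwich posterior used as the OLS benchmark above is simple to implement. A minimal sketch (assuming the textbook heteroscedasticity-robust covariance, without finite-sample corrections):

```python
# Sketch of the artificial sandwich posterior: theta | data ~ N(theta_hat, Sigma_hat)
# with theta_hat the OLS estimate and Sigma_hat the sandwich covariance.
import numpy as np

def sandwich_posterior_draws(y, x, ndraws=5000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])   # constant plus scalar covariate
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_hat = XtX_inv @ X.T @ y               # OLS coefficients (alpha, beta)
    u = y - X @ theta_hat                       # OLS residuals
    meat = X.T @ (X * u[:, None] ** 2)          # sum_i u_i^2 x_i x_i'
    Sigma_hat = XtX_inv @ meat @ XtX_inv        # sandwich covariance matrix
    return rng.multivariate_normal(theta_hat, Sigma_hat, size=ndraws)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = x + rng.normal(size=200)
draws = sandwich_posterior_draws(y, x)
print(np.quantile(draws[:, 1], [0.025, 0.975]))  # 95% interval for beta
```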

The priors for $\mu$, $\sigma$, $\pi$ are chosen to reflect the range and variance of the observable $y$ at possible covariate values. Furthermore, the priors for $\rho$, $\gamma$ control the sharing of information across covariate values, and the prior for $\gamma$ should be of a magnitude similar to the range of the covariates $x$, so as to allow the conditional error density to differ across covariate values. While no preprocessing steps were taken for this simulation exercise, in some other instances it would be beneficial to standardize the covariates so that the prior expectation of the influence of each predictor variable is equal. Posterior computation for the finite model and a full description of the priors are contained in the Appendix. Posterior simulation for the infinite mixture model can be accomplished using the retrospective or slice sampling methods proposed by Papaspiliopoulos and Roberts (2008), Walker (2007) and Kalli et al. (2011). The approach used in this paper for infinite mixture sampling is also presented in the Appendix.

The parameter of interest is $\beta \in \mathbb{R}$ and we consider three separate criteria for the evaluation of the performance of the proposed estimators. First, we compute the root mean squared error (RMSE). Second, we construct 95% Bayesian credibility regions using the 0.025 and 0.975 quantiles of the posterior distribution and report coverage probabilities. Third, we consider the length of a credibility region as another indicator of the performance of an estimator, and we compare the average interval lengths of the competing methods. Similar approaches for evaluating performance have been considered by Conley et al. (2008). We repeat the simulation exercise 1000 times for each of the four DGPs. The results are displayed in Table 1.
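The three criteria are computed from the Monte Carlo replications as follows (a minimal sketch; the stand-in posterior draws are synthetic):

```python
# Sketch of the evaluation criteria: RMSE of the posterior-mean estimate of beta,
# coverage of the 95% equal-tailed credibility region, and its average length.
import numpy as np

def evaluate(draws_per_rep, beta0):
    """draws_per_rep: one 1-d array of posterior draws of beta per replication."""
    est = np.array([d.mean() for d in draws_per_rep])
    lo = np.array([np.quantile(d, 0.025) for d in draws_per_rep])
    hi = np.array([np.quantile(d, 0.975) for d in draws_per_rep])
    rmse = float(np.sqrt(np.mean((est - beta0) ** 2)))
    coverage = float(np.mean((lo <= beta0) & (beta0 <= hi)))
    length = float(np.mean(hi - lo))
    return rmse, coverage, length

rng = np.random.default_rng(2)
reps = [rng.normal(1.0, 0.1, size=5000) for _ in range(1000)]  # synthetic posteriors
print(evaluate(reps, beta0=1.0))
```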

Table 1: Simulation results. For each DGP (Cases (i)-(iv), in columns) and each method of inference (GLS, OLS, NLR, k-NN, CLR, ULR, SymCLR, in rows), the table reports the RMSE, the coverage of the 95% Bayesian credibility region, and the average interval length of the credibility region (numeric entries omitted here). Notes: DGPs are in columns and methods of inference in rows. Bayesian inference for each method is implemented by a Gibbs sampler, with the first 5000 draws discarded as burn-in, for each of the 1000 datasets drawn from each DGP.

Relative performance of the methods is similar whether RMSE, coverage, or interval length is used as the evaluation criterion. The results show that the conditional linear regression model proposed in this paper performs better than the alternatives in Cases (ii) and (iv), in the presence of heteroscedasticity, and performs comparably in the other cases. In Cases (i) and (iii) the best performing models should be OLS and NLR, since it is well known that the OLS estimator achieves semi-parametric efficiency

under homoscedasticity. Note that under heteroscedasticity, in Cases (ii) and (iv), the 95% coverage of the highest posterior density interval of the Bayesian NLR is well below 95% due to the misspecification of the error distribution. The unconditional ULR and symmetric SymCLR versions perform worse in Case (iii) due to the (conditional) asymmetries of the error distribution. When the true conditional error density is skewed, the misspecified models lead to inconsistent estimation of the regression coefficients, as conjectured in Section 3. Furthermore, note that the unconditional ULR performs poorly in Cases (ii) and (iv) in the presence of heteroscedasticity, as expected. As demonstrated in this simulation example, estimation of linear models with conditional error densities modeled as covariate-independent mixtures of normals, or with flexible symmetric residual densities as proposed in Pati and Dunson (forthcoming), might be misguided if the regression coefficients are the object of interest. As expected, the CLR model proposed in this paper outperforms, or performs comparably to, the other alternatives, even when compared to the asymptotically efficient k-NN estimator.

Furthermore, the proposed methods are able to provide estimates of the conditional error densities as well, and this is the aspect that we explore below. First, we present some suggestive evidence that the infinite mixture model defined in Equation (3) with an exponential kernel function is capable of consistently estimating conditional error densities. Furthermore, we illustrate the possible shortcoming of an infinite mixture model that is restricted to be symmetric. We consider three different finite sample sizes: 200, 800, … . The priors used for the infinite mixture model are set to $\theta \sim N(0, (0.01 I_2)^{-1})$, $\varphi \sim \text{Exp}(2)$, $V_j \sim \text{Beta}(1, 1)$, $\Gamma_j \sim U(\min\{X\}, \max\{X\})$, $\sigma_{ji} \sim IG(3/2, 2)$, $\mu_j \sim N(0, 4)$ and $(\pi_j, 1 - \pi_j) \sim \text{Dir}(5, 5)$ for all $j$ and $i$ for the unrestricted infinite model. For the symmetric model we enforce $\pi_j = 1/2$ and $\sigma_{j1} = \sigma_{j2}$, as it can be shown that the infinite mixture model defined in Equation (3) with these restrictions can consistently estimate any general symmetric conditional error density, as proposed by Pati et al. (2013). The models are estimated via MCMC Gibbs sampling

methods as described in the Appendix, keeping the last 5000 draws with the first 5000 discarded as burn-in. Below, we explore in more detail the asymmetric DGP (iii). In Figure 1 we present the results of the conditional density estimation. It can be seen that with an increasing sample size the unconstrained infinite mixture model consistently estimates the conditional error densities. In addition, in Figure 2 we present the contour plots for the largest sample size. From Figure 2, one can note that the constrained symmetric infinite mixture model, depicted in the center panel, is unable to estimate this skewed distribution. Moreover, one can infer that the pseudo-true value of the linear regression coefficient in this case is indeed negative, as has been suggested by Müller (forthcoming). Hence, the proposed unconstrained infinite CLR model consistently estimates both the coefficients and the conditional error densities, as desired, while in some cases, such as DGP (iii), the flexible heteroscedastic symmetric restriction might lead to inconsistent estimation of both the parameters of interest and the conditional densities.

Figure 1: Estimated conditional densities for different covariate values and sample sizes. The solid lines are the true values, the red dotted lines are the posterior means, and the blue dashed lines are pointwise 95% equal-tailed credible intervals.

Figure 2: Left panel: Posterior means of $p(y \mid x)$ for the unconstrained CLR. Center panel: Posterior means of $p(y \mid x)$ for the symmetric SymCLR. Right panel: True $p(y \mid x)$.

We can also compare the performance of the unconstrained CLR and the symmetric CLR for DGP (iv). As expected, the unconstrained CLR is able to consistently estimate the conditional error densities, as can be seen from Figure 3. However, in this case the symmetric CLR correctly estimates the conditional residual densities as well, as can be seen from the contour plots in Figure 4, since the true error densities are symmetric. As these examples show, the approach considered in this paper is capable of consistently estimating heteroscedastic error densities and a parametric mean function, while some other common specifications of non-parametric priors for residual densities might be misspecified. The unfortunate consequence of such approaches is that even when only the error density is misspecified and the parametric mean function is correctly specified, the parameters of interest, such as linear regression coefficients, may still be estimated inconsistently. The models considered in this paper propose a new specification of a heteroscedastic error density in order to obtain posterior consistency for both the mean regression function and the conditional error densities.

Figure 3: Estimated conditional densities for different covariate values and sample sizes. The solid lines are the true values, the red dotted lines are the posterior means, and the dashed lines are pointwise 95% equal-tailed credible intervals.

Figure 4: Left panel: Posterior means of $p(y \mid x)$ for the unconstrained CLR. Center panel: Posterior means of $p(y \mid x)$ for the symmetric SymCLR. Right panel: True $p(y \mid x)$.

5 Appendix

5.1 Proofs

Proof. (Theorem 1) Let $A_j^m$, $j = 0, 1, \ldots, m$, be a partition of $\mathcal{Y} \subseteq \mathbb{R}$, where $A_1^m, \ldots, A_m^m$ are adjacent intervals with side length $h_m$ and $A_0^m$ is the rest of the set $\mathcal{Y}$. Let $B_j^m$, $j = 1, \ldots, N(m)$, be a partition of the bounded set $\mathcal{X}$, where $B_1^m, \ldots, B_{N(m)}^m$ are adjacent cubes with side length $\lambda_m$ that cover the whole of $\mathcal{X}$ and $N(m) = m^d$. Furthermore, let $x_j^m$ be the center of the cube $B_j^m$. We require that, by construction, $h_m, \lambda_m \to 0$ as $m \to \infty$.

We will construct a model $(\theta, \eta^k)$ such that $d_{KL}(f_0(\cdot \mid \cdot), p(\cdot \mid \cdot, \theta, \eta^k)) < \epsilon$. First choose $\theta = \theta_0$ and consider a model defined by some nuisance parameters $\eta^m$ as

$$p(y \mid x, \theta_0, \eta^m) = \sum_{j=1}^{N(m)} \alpha_j(x) \sum_{i=0}^{m} F_0(A_i^m \mid x_j^m)\, \phi\left(y - h(x, \theta_0); \mu_{ji}, \sigma_{ji}^2\right),$$

$$\alpha_j(x) = \frac{\exp\left(-c_m (x_j^{m\prime} x_j^m - 2 x_j^{m\prime} x)\right)}{\sum_{l=0}^{N(m)} \exp\left(-c_m (x_l^{m\prime} x_l^m - 2 x_l^{m\prime} x)\right)}, \tag{6}$$

such that $\sum_{i=0}^{m} F_0(A_i^m \mid x_j^m)\, \mu_{ji} = 0$ for all $j \in \{1, \ldots, N(m)\}$, so as to impose the conditional moment restriction in the proposed model. We will show that, under appropriate conditions on $c_m$, some collection of the $\alpha_j(x)$ approximates $1_{B_j^m}(x)$. Given $x \in \mathcal{X}$, define $I_m(x, s_m) = \{j : \|x_j^m - x\|^2 < s_m\}$, where $s_m = \lambda_m^2 d$ is the squared diagonal of $B_j^m$. Using the arguments of the proof of Proposition 4.1 of Norets (2010), we know that for $m$ large enough,

$$\sum_{j \in I_m(x, s_m)} \alpha_j(x) \geq 1 - \exp\{-c_m s_m\}/N(m). \tag{7}$$

One can always pick $c_m$ such that $\exp\{-c_m s_m\}/N(m) \to 0$, and then the collection of $\alpha_j(x)$, $j \in I_m(x, s_m)$, approximates $1_{B_j^m}(x)$.

By Assumption 4 there exists $L$ large enough such that $|h(x, \theta_0) - h(x_j^m, \theta_0)| \leq L \|x - x_j^m\|$, as $h$ is Lipschitz continuous. Let $\bar\delta_m = \delta_m + L s_m^{1/2}$, where $\delta_m \to 0$. Then for any $j \in I_m(x, s_m)$ and $A_i^m \subseteq C_{\bar\delta_m}(y)$, where $C_\delta(y)$ is an interval centered at $y$ with length $\delta$, the following holds:

$$F_0(A_i^m \mid x_j^m) \geq \lambda(A_i^m) \inf_{z \in C_{\bar\delta_m}(y),\, \|t - x\|^2 \leq s_m} f_0(z \mid t). \tag{8}$$

In order to impose a conditional mean of 0 we require that, for each $x_j^m$, the parameters $\{\mu_{ji}\}_{i=0}^{m}$ satisfy $\sum_{i=0}^{m} F_0(A_i^m \mid x_j^m)\, \mu_{ji} = 0$. Then, given $x_j^m$ and $j$, for each $i = 0, \ldots, m$ define $\mu_{ji}$ as

$$\mu_{ji} = \frac{\int_{A_i^m} f_0(y \mid x_j^m)\left(y - h(x_j^m, \theta_0)\right) dy}{F_0(A_i^m \mid x_j^m)}$$

if $F_0(A_i^m \mid x_j^m) > 0$. If $F_0(A_i^m \mid x_j^m) = 0$, then define $\mu_{ji} = \bar\mu_i - h(x_j^m, \theta_0)$, where $\bar\mu_i$ is the center of the interval $A_i^m$; for $i = 0$ define $\mu_{j0} = 0$. Then by construction $\sum_{i=0}^{m} F_0(A_i^m \mid x_j^m)\, \mu_{ji} = 0$ for each $j = 1, \ldots, N(m)$.

For some $j \in I_m(x, s_m)$ let $y_j = y - (h(x, \theta_0) - h(x_j^m, \theta_0))$; then $C_{\delta_m}(y_j) \subseteq C_{\bar\delta_m}(y)$ by the definitions of $\delta_m$ and $\bar\delta_m$. Let $\sigma_{ji} = \sigma_m$ if $i > 0$ and $\sigma_{j0} = \sigma_0$, to be chosen later. For $m$ large enough,

$$\sum_{i :\, A_i^m \subseteq C_{\delta_m}(y_j)} \lambda(A_i^m)\, \phi\left(y - h(x, \theta_0); \mu_{ji}, \sigma_m^2\right) = \sum_{i :\, A_i^m \subseteq C_{\delta_m}(y_j)} \lambda(A_i^m)\, \phi\left(y_j; \tilde\mu_{ji}, \sigma_m^2\right) \geq 1 - \frac{3 h_m}{(2\pi)^{1/2} \sigma_m} - \frac{8 \sigma_m}{(2\pi)^{1/2} \delta_m}, \tag{9}$$

with the last inequality derived from Lemmas 1 and 2 of Norets and Pelenis (2012),$^1$ where $\tilde\mu_{ji} = \int_{A_i^m} f_0(y \mid x_j^m)\, y\, dy \,/\, F_0(A_i^m \mid x_j^m)$.

Then equation (6) and inequalities (7), (8) and (9) can be combined to show that for any given $(x, y)$ there exists $m$ large enough such that

$$p(y \mid x, \theta_0, \eta^m) > \sum_{j \in I_m(x, s_m)} \alpha_j(x) \sum_{i :\, A_i^m \subseteq C_{\delta_m}(y_j)} F_0(A_i^m \mid x_j^m)\, \phi\left(y - h(x, \theta_0); \mu_{ji}, \sigma_m^2\right)$$
$$\geq \inf_{z \in C_{\bar\delta_m}(y),\, \|t - x\|^2 \leq s_m} f_0(z \mid t) \left(1 - \frac{3 h_m}{(2\pi)^{1/2} \sigma_m} - \frac{8 \sigma_m}{(2\pi)^{1/2} \delta_m}\right) \left(1 - \frac{\exp\{-c_m s_m\}}{N(m)}\right).$$

Let $\{\delta_m, \sigma_m, h_m, c_m, s_m\}$ satisfy the following: $\delta_m \to 0$, $\sigma_m / \delta_m \to 0$, $h_m / \sigma_m \to 0$, $c_m \to \infty$, $s_m \to 0$, $\exp\{-c_m s_m\}/N(m) \to 0$. Hence for any given $(x, y)$ and a given $\epsilon > 0$ there exists $M_1$ large enough such that $m > M_1$ implies

$$p(y \mid x, \theta_0, \eta^m) > \inf_{z \in C_{\bar\delta_m}(y),\, \|t - x\|^2 \leq s_m} f_0(z \mid t)\, (1 - \epsilon).$$

By Assumption 2, $f_0(y \mid x)$ is continuous in $(y, x)$, and if $f_0(y \mid x) > 0$, then there exists $M_2$ large enough such that $m > M_2$ implies

$$\frac{f_0(y \mid x)}{\inf_{z \in C_{\bar\delta_m}(y),\, \|t - x\|^2 \leq s_m} f_0(z \mid t)} \leq 1 + \epsilon,$$

since $s_m \to 0$ and $\bar\delta_m \to 0$. Then for any $m \geq \max\{M_1, M_2\}$,

$$\max\left\{1, \frac{f_0(y \mid x)}{p(y \mid x, \theta_0, \eta^m)}\right\} \leq \frac{f_0(y \mid x)}{\inf_{z \in C_{\bar\delta_m}(y),\, \|t - x\|^2 \leq s_m} f_0(z \mid t)\, (1 - \epsilon)} \leq \frac{1 + \epsilon}{1 - \epsilon}.$$

Henceforth, $\log \max\{1, f_0(y \mid x)/p(y \mid x, \theta_0, \eta^m)\} \to 0$ a.s. $F_0$, as long as $f_0(y \mid x)$ is continuous

$^1$ With a minor adjustment in the proofs, since $\tilde\mu_{ji} \in A_i^m$ rather than being the center of $A_i^m$.

in $(y, x)$ a.s. $F_0$. This result establishes pointwise convergence. Furthermore, we would like to apply the Dominated Convergence Theorem (DCT) to show that $d_{KL}(f_0(\cdot \mid \cdot), p(\cdot \mid \cdot, \theta_0, \eta^m)) \to 0$ as $m \to \infty$. To apply the DCT we need to establish an integrable upper bound. We know from equation (7) that for any $x \in \mathcal{X}$ there exists $m$ large enough such that $\sum_{j \in I_m(x, s_m)} \alpha_j(x) > 1/2$. Similarly, from equation (9), for any $(y, x)$ the bound on the Riemann sum is bounded below by $1/2$ for $m$ large enough. These facts will be used in deriving an integrable upper bound. Further note that pointwise convergence is obtained if, instead of $F_0(A_0^m \mid x_j^m)\, \phi(y - h(x, \theta_0); \mu_{j0}, \sigma_0^2)$ in the mixture model, we consider $0.5\, F_0(A_0^m \mid x_j^m)\left(\phi(y - h(x, \theta_0); 2\mu_{j0}, \sigma_0^2) + \phi(y - h(x, \theta_0); 0, \sigma_0^2)\right)$. Combining these three observations, we know that for $m$ large enough and $j \in I_m(x, s_m)$,

$$p(y \mid x, \theta_0, \eta^m) = \sum_{j=1}^{N(m)} \alpha_j(x) \sum_{i=0}^{m} F_0(A_i^m \mid x_j^m)\, \phi\left(y - h(x, \theta_0); \mu_{ji}, \sigma_{ji}^2\right)$$
$$> 0.25\, \delta\, \phi\left(y - h(x, \theta_0); 0, \sigma_0^2\right) \inf_{z \in C_\delta(y),\, \|t - x\|^2 \leq \delta} f_0(z \mid t),$$

where the inequality is derived by the same reasoning as in the proof of Proposition 2.1 of Norets (2010). Then

$$\log \max\left\{1, \frac{f_0(y \mid x)}{p(y \mid x, \theta_0, \eta^m)}\right\} \leq \log \max\left\{1, \frac{f_0(y \mid x)}{0.25\, \delta\, \phi(y - h(x, \theta_0); 0, \sigma_0^2)\, \inf_{z \in C_\delta(y),\, \|t - x\|^2 \leq \delta} f_0(z \mid t)}\right\}$$
$$\leq \log \frac{f_0(y \mid x)}{0.25\, \delta\, \phi(y - h(x, \theta_0); 0, \sigma_0^2)} + \log \frac{f_0(y \mid x)}{\inf_{z \in C_\delta(y),\, \|t - x\|^2 \leq \delta} f_0(z \mid t)},$$

and the first logarithm above is integrable by Assumption 3, while the second logarithm is integrable by Assumption 5. Therefore, applying the DCT we get that $d_{KL}(f_0(\cdot \mid \cdot), p(\cdot \mid \cdot, \theta_0, \eta^m)) \to 0$.


More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Priors for the frequentist, consistency beyond Schwartz

Priors for the frequentist, consistency beyond Schwartz Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist

More information

Bayesian Nonparametric Modelling with the Dirichlet Process Regression Smoother

Bayesian Nonparametric Modelling with the Dirichlet Process Regression Smoother Bayesian Nonparametric Modelling with the Dirichlet Process Regression Smoother J. E. Griffin and M. F. J. Steel University of Warwick Bayesian Nonparametric Modelling with the Dirichlet Process Regression

More information

Bayesian Heteroskedasticity-Robust Regression. Richard Startz * revised February Abstract

Bayesian Heteroskedasticity-Robust Regression. Richard Startz * revised February Abstract Bayesian Heteroskedasticity-Robust Regression Richard Startz * revised February 2015 Abstract I offer here a method for Bayesian heteroskedasticity-robust regression. The Bayesian version is derived by

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

BAYESIAN INFERENCE IN A CLASS OF PARTIALLY IDENTIFIED MODELS

BAYESIAN INFERENCE IN A CLASS OF PARTIALLY IDENTIFIED MODELS BAYESIAN INFERENCE IN A CLASS OF PARTIALLY IDENTIFIED MODELS BRENDAN KLINE AND ELIE TAMER UNIVERSITY OF TEXAS AT AUSTIN AND HARVARD UNIVERSITY Abstract. This paper develops a Bayesian approach to inference

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity Zhengyu Zhang School of Economics Shanghai University of Finance and Economics zy.zhang@mail.shufe.edu.cn

More information

Multivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal?

Multivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal? Multivariate Versus Multinomial Probit: When are Binary Decisions Made Separately also Jointly Optimal? Dale J. Poirier and Deven Kapadia University of California, Irvine March 10, 2012 Abstract We provide

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Inconsistency of Bayesian inference when the model is wrong, and how to repair it

Inconsistency of Bayesian inference when the model is wrong, and how to repair it Inconsistency of Bayesian inference when the model is wrong, and how to repair it Peter Grünwald Thijs van Ommen Centrum Wiskunde & Informatica, Amsterdam Universiteit Leiden June 3, 2015 Outline 1 Introduction

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Gaussian kernel GARCH models

Gaussian kernel GARCH models Gaussian kernel GARCH models Xibin (Bill) Zhang and Maxwell L. King Department of Econometrics and Business Statistics Faculty of Business and Economics 7 June 2013 Motivation A regression model is often

More information

Gibbs Sampling in Linear Models #1

Gibbs Sampling in Linear Models #1 Gibbs Sampling in Linear Models #1 Econ 690 Purdue University Justin L Tobias Gibbs Sampling #1 Outline 1 Conditional Posterior Distributions for Regression Parameters in the Linear Model [Lindley and

More information

Semi-Parametric Importance Sampling for Rare-event probability Estimation

Semi-Parametric Importance Sampling for Rare-event probability Estimation Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Dirichlet Process Mixtures of Generalized Linear Models

Dirichlet Process Mixtures of Generalized Linear Models Dirichlet Process Mixtures of Generalized Linear Models Lauren A. Hannah Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08544, USA David M. Blei Department

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

On the Support of MacEachern s Dependent Dirichlet Processes and Extensions

On the Support of MacEachern s Dependent Dirichlet Processes and Extensions Bayesian Analysis (2012) 7, Number 2, pp. 277 310 On the Support of MacEachern s Dependent Dirichlet Processes and Extensions Andrés F. Barrientos, Alejandro Jara and Fernando A. Quintana Abstract. We

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)

Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Willem van den Boom Department of Statistics and Applied Probability National University

More information

Verifying Regularity Conditions for Logit-Normal GLMM

Verifying Regularity Conditions for Logit-Normal GLMM Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Dirichlet Process Mixtures of Generalized Linear Models

Dirichlet Process Mixtures of Generalized Linear Models Lauren A. Hannah David M. Blei Warren B. Powell Department of Computer Science, Princeton University Department of Operations Research and Financial Engineering, Princeton University Department of Operations

More information

BAYESIAN NONPARAMETRIC MODELLING WITH THE DIRICHLET PROCESS REGRESSION SMOOTHER

BAYESIAN NONPARAMETRIC MODELLING WITH THE DIRICHLET PROCESS REGRESSION SMOOTHER Statistica Sinica 20 (2010), 1507-1527 BAYESIAN NONPARAMETRIC MODELLING WITH THE DIRICHLET PROCESS REGRESSION SMOOTHER J. E. Griffin and M. F. J. Steel University of Kent and University of Warwick Abstract:

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS F. C. Nicolls and G. de Jager Department of Electrical Engineering, University of Cape Town Rondebosch 77, South

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003.

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003. A Note on Bootstraps and Robustness Tony Lancaster, Brown University, December 2003. In this note we consider several versions of the bootstrap and argue that it is helpful in explaining and thinking about

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density ISSN 1440-771X Australia Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Bayesian estimation of bandwidths for a nonparametric regression model

More information

Bayesian nonparametric regression with varying residual density

Bayesian nonparametric regression with varying residual density Ann Inst Stat Math 014) 66:1 31 DOI 10.1007/s10463-013-0415-z Bayesian nonparametric regression with varying residual density Debdeep Pati David B. Dunson Received: 0 May 010 / Revised: 7 March 013 / Published

More information

Bayesian Aggregation for Extraordinarily Large Dataset

Bayesian Aggregation for Extraordinarily Large Dataset Bayesian Aggregation for Extraordinarily Large Dataset Guang Cheng 1 Department of Statistics Purdue University www.science.purdue.edu/bigdata Department Seminar Statistics@LSE May 19, 2017 1 A Joint Work

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Averaging Estimators for Regressions with a Possible Structural Break

Averaging Estimators for Regressions with a Possible Structural Break Averaging Estimators for Regressions with a Possible Structural Break Bruce E. Hansen University of Wisconsin y www.ssc.wisc.edu/~bhansen September 2007 Preliminary Abstract This paper investigates selection

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

Bayesian nonparametric regression. with varying residual density

Bayesian nonparametric regression. with varying residual density Bayesian nonparametric regression with varying residual density Debdeep Pati, David B. Dunson Department of Statistical Science, Duke University, NC 27708 email: debdeep.pati@stat.duke.edu, dunson@stat.duke.edu

More information

NONPARAMETRIC BAYESIAN INFERENCE ON PLANAR SHAPES

NONPARAMETRIC BAYESIAN INFERENCE ON PLANAR SHAPES NONPARAMETRIC BAYESIAN INFERENCE ON PLANAR SHAPES Author: Abhishek Bhattacharya Coauthor: David Dunson Department of Statistical Science, Duke University 7 th Workshop on Bayesian Nonparametrics Collegio

More information

Approximate Bayesian computation for spatial extremes via open-faced sandwich adjustment

Approximate Bayesian computation for spatial extremes via open-faced sandwich adjustment Approximate Bayesian computation for spatial extremes via open-faced sandwich adjustment Ben Shaby SAMSI August 3, 2010 Ben Shaby (SAMSI) OFS adjustment August 3, 2010 1 / 29 Outline 1 Introduction 2 Spatial

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Bayesian Nonparametric Analysis of Longitudinal. Studies in the Presence of Informative Missingness

Bayesian Nonparametric Analysis of Longitudinal. Studies in the Presence of Informative Missingness Bayesian Nonparametric Analysis of Longitudinal Studies in the Presence of Informative Missingness Antonio R. Linero Department of Statistics, Florida State University, 214 Rogers Building, 117 N. Woodward

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Dale J. Poirier University of California, Irvine September 1, 2008 Abstract This paper

More information

Overall Objective Priors

Overall Objective Priors Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University

More information

POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL

POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL COMMUN. STATIST. THEORY METH., 30(5), 855 874 (2001) POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL Hisashi Tanizaki and Xingyuan Zhang Faculty of Economics, Kobe University, Kobe 657-8501,

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Dirichlet Process Mixtures of Generalized Linear Models

Dirichlet Process Mixtures of Generalized Linear Models Dirichlet Process Mixtures of Generalized Linear Models Lauren A. Hannah Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08544, USA David M. Blei Department

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Lecture Notes based on Koop (2003) Bayesian Econometrics

Lecture Notes based on Koop (2003) Bayesian Econometrics Lecture Notes based on Koop (2003) Bayesian Econometrics A.Colin Cameron University of California - Davis November 15, 2005 1. CH.1: Introduction The concepts below are the essential concepts used throughout

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information