
Bayesian Inference for Generalized Additive Mixed Models Based on Markov Random Field Priors

Ludwig Fahrmeir and Stefan Lang
University of Munich, Ludwigstr. 33, Munich

Abstract

Most regression problems in practice require flexible semiparametric forms of the predictor for modelling the dependence of responses on covariates. Moreover, it is often necessary to add random effects accounting for overdispersion caused by unobserved heterogeneity or for correlation in longitudinal or spatial data. We present a unified approach for Bayesian inference via Markov chain Monte Carlo (MCMC) simulation in generalized additive and semiparametric mixed models. Different types of covariates, such as usual covariates with fixed effects, metrical covariates with nonlinear effects, unstructured random effects, trend and seasonal components in longitudinal data and spatial covariates, are all treated within the same general framework by assigning appropriate priors with different forms and degrees of smoothness. The approach is particularly appropriate for discrete and other fundamentally non-Gaussian responses, where Gibbs sampling techniques developed for Gaussian models cannot be applied, but it also works well for Gaussian responses. We use the close relation between nonparametric regression and dynamic or state space models to develop posterior sampling procedures based on Markov random field priors. They include recent Metropolis-Hastings block move algorithms for dynamic generalized linear models and extensions for spatial covariates as building blocks. We illustrate the approach with a number of applications that arose out of consulting cases, showing that the methods are computationally feasible also in problems with many covariates and large data sets.
Keywords: generalized semiparametric mixed models, Markov chain Monte Carlo, random effects, semiparametric Bayesian inference, spatial and spatio-temporal data, state space and Markov random field models, varying coefficients

1 Introduction

This work has been motivated by various regression problems that resulted from consulting cases. We consider the following applications: credit-scoring, where the aim is to model the probability that a client with certain covariates ("risk factors") will not pay back his credit as agreed upon by contract; a longitudinal study on forest damage to assess the impact of covariates like age of trees, pH value of soil and canopy density of the stand on the defoliation degree of trees; an analysis of rents paid for flats or apartments in Munich, depending on floor space, year of construction, location in the city, and a large number of indicators for equipment and

type of the flat; and an investigation of duration of unemployment in West Germany from 1980 to 1995, with spatio-temporal data from the German Federal Employment Office, including several time scales, regional effects and personal characteristics of unemployed persons. With the exception of the rent example, responses are discrete. In all cases, we have metrical or spatially correlated covariates x_1, ..., x_p, say, with unknown, possibly nonlinear effects, and a vector w of further covariates, whose influence on the predictor is assumed to be linear. In the forest damage problem, we will add unstructured, tree-specific random effects b to account for unobserved heterogeneity or correlation. Therefore an additive predictor of the form

η = f_1(x_1) + ... + f_p(x_p) + w'γ + b   (1)

seems reasonable. Together with an exponential family observation model and a suitable link function, (1) defines a generalized additive or semiparametric mixed model (GAMM). A further extension, which we consider in the forest damage study, leads to a varying coefficient mixed model (VCMM).

In the rent data application, we include structured, spatially correlated random effects that are specific for subquarters in Munich, to account for spatial heterogeneity not explained by other covariates. Similarly, in the unemployment duration analysis we include district-specific spatial random effects. From a classical perspective, spatial effects are considered as correlated random effects with an appropriate prior reflecting neighborhood relationships. For our Bayesian approach, it is more useful to consider them as spatially correlated effects f_j(x_j) of a spatial covariate x_j, and to incorporate them formally as a component of the nonparametric additive terms in (1). This leads to a unified treatment of metrical covariates and spatially correlated random effects or, in other words, spatial covariates. Nevertheless, we will call such models GAMM or VCMM.
Inference for these models with existing methodology has some drawbacks and limitations, in particular for non-Gaussian responses. For GAMMs, Lin and Zhang (1999) propose approximate inference using smoothing splines and double penalized quasi-likelihood, extending previous work of Breslow and Clayton (1993) and Breslow and Lin (1995) for generalized linear mixed models (GLMMs). As they point out in the discussion, similarly to approximate inference in GLMMs, there are bias problems, especially with binary data or correlated random effects, so that MCMC methods may be an attractive alternative. S-Plus has an option for exploratory data analysis in GAMMs with splines or loess functions, but there is no possibility for estimating variance components jointly with smoothing parameters.

In this paper we propose a general Bayesian approach via Markov chain Monte Carlo (MCMC) for inference in generalized additive and varying coefficients models, including mixed models with structured or unstructured random effects. As an important feature, all different types of covariates or effects are considered from a unified viewpoint. Usual covariates with fixed effects, metrical covariates, such as time scales with nonparametric trend or seasonal effects, unstructured random effects and spatial covariates are treated within the same general framework by assigning Markov random field smoothness priors with common structure but different degrees of smoothness to the corresponding effects. Data driven choice of smoothing parameters is automatically included. Although models with Gaussian responses are also

covered by the framework, our main interest lies in models for fundamentally non-Gaussian responses, such as binary or other discrete-valued responses. The data examples in the applications section show the practical feasibility in situations with many covariates and with large data sets. The MCMC procedure provides samples from all posteriors of interest and permits estimation of posterior means, medians, quantiles, confidence bands and predictive distributions. No approximations based on conjectures of asymptotic normality have to be made, and no "plug-in" procedures are needed.

For Gaussian responses, Gibbs sampling can be used for fully Bayesian analysis based on smoothness priors, see for example Wong and Kohn (1996), who use state space or dynamic model representations of splines in additive models without any inclusion of unstructured or spatial random effects, or Hastie and Tibshirani (1998), who derive the Gibbs sampler as a Bayesian version of backfitting. Bayesian basis function approaches with regression splines or a more general class of piecewise polynomials are proposed by Smith and Kohn (1996) and Denison et al. (1998), again without random effects. For fundamentally non-Gaussian models, more general MCMC techniques than Gibbs sampling are needed, and there is still a lack of methods and of practical experience with existing suggestions. Hastie and Tibshirani (1998) sketch a Metropolis-Hastings-type algorithm for GAMs, but do not give any examples or statements about performance. Mallick et al. (1999) propose a Bayesian MARS method for GLMs as an extension of the Bayesian curve fitting method of Denison et al. (1998), but, as they state, the sampler has slow convergence. The more recent RJMCMC algorithm developed by Biller (1999) for adaptive regression splines in semiparametric GLMs seems to have better convergence properties, and an extension to GAMMs and VCMMs might be promising.
Our approach is based on the close relationship between dynamic generalized linear models (see, e.g., Fahrmeir and Tutz, 1997, ch. 8) and generalized additive or varying coefficient models (Hastie and Tibshirani, 1990, 1993). This relationship is well known for the classical smoothing problem, where observations y = (y(1), ..., y(n)) are assumed to be the sum

y(t) = f(t) + ε(t),   ε(t) ~ N(0, σ²)   (2)

of a smooth regression function f, evaluated at equidistant design points t = 1, ..., n, and independent Gaussian noise variables. Within a dynamic or state space framework, the observation model (2) is supplemented by a linear Gaussian Markov model for the parameters or states f = (f(1), ..., f(n)). A common choice as a so-called smoothness prior is a second order random walk model f(t) = 2f(t-1) - f(t-2) + u(t), u(t) ~ N(0, τ²). For given variances σ² and τ², posterior means f^(t) and variances can be efficiently computed by the Kalman filter and smoother. For a full Bayesian analysis with hyperpriors for the variances σ² and τ², the Kalman filter and smoother can be exploited for efficient, blockwise Gibbs sampling (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994). Due to the Gaussian observation model (2), the posterior mean estimates f^(t) and posterior mode estimates coincide, and are equivalent to the solution of a corresponding penalized least squares criterion, see for example Fahrmeir and Tutz (1997, ch. 8.1), or Green and

Silverman (1994), in the related context of cubic smoothing splines. Basically, this equivalence extends to additive Gaussian models as well as to non-equally spaced design points or covariate observations. For fundamentally non-Gaussian responses as considered in this paper, the observation model (2) has to be replaced by a non-Gaussian model, and, as a consequence, the equivalence between posterior mean and posterior mode estimation is lost. The linear Kalman filter and smoother is no longer applicable, and Gibbs sampling techniques cannot be reasonably applied. Therefore, we incorporate the Metropolis-Hastings algorithm with conditional prior proposals, developed by Knorr-Held (1999) in the context of dynamic generalized linear models, for drawing block move samples from posteriors of nonlinear effects f_j(x_j) of metrical covariates x_j. For spatial covariates with Markov random field priors, this block move sampler is extended in a computationally efficient way for drawing from posteriors of spatial effects.

The rest of the paper is organized as follows: Bayesian semiparametric mixed models are described in Section 2, while Section 3 contains details about the chosen MCMC techniques. In Section 4 we demonstrate the practical feasibility of our approach through the data applications mentioned in the beginning. In the first example, we reanalyze the credit-scoring data used in Fahrmeir and Tutz (1997) with a semiparametric additive model without random effects, and we apply a VCMM to the longitudinal study on forest damage. In the third example, a Gaussian additive mixed model with spatially correlated random effects is applied to the rent data example, showing that the method works well also for Gaussian responses in problems with many covariates and a large data set.
Finally, we present a space-time analysis of unemployment durations in West Germany with a massive data set from the German Federal Employment Office, through a discrete hazard model with a GAMM predictor that includes several time scales, a number of personal characteristics of unemployed persons, and regional effects.

2 Bayesian semiparametric mixed models

Consider now regression situations where observations (y_i, x_i1, ..., x_ip, w_i), i = 1, ..., n, on a response y, a vector x = (x_1, ..., x_p) of metrical or spatial covariates and a vector w of further covariates are given. In longitudinal studies, as in our applications to forest damage or to duration of unemployment in Section 4, the covariate vector will typically include one or more time scales, such as duration and calendar time. In some studies, as in the rents for flats data and the study on unemployment, a spatial covariate, such as the location of flats or districts, may be considered and appropriately incorporated into the model. Generalized additive and semiparametric models (Hastie and Tibshirani, 1990) assume that, given x_i = (x_i1, ..., x_ip) and w_i, the distribution of y_i belongs to an exponential family, with mean μ_i = E(y_i | x_i, w_i) linked to an additive semiparametric predictor η_i by

μ_i = h(η_i),   η_i = f_1(x_i1) + ... + f_p(x_ip) + w_i'γ.   (3)

Here h is a known link or response function, and f_1, ..., f_p are unknown smooth

functions of the covariates. Note that spatially correlated (random) effects of a spatial covariate x_j are included as a nonparametric component f_j(x_j). For identifiability reasons, the unknown functions are centered appropriately.

Observation models of the form (3) may be inappropriate if heterogeneity among units is not sufficiently described by covariates, or for correlated data in a longitudinal study. A common way to deal with this problem is the inclusion of additive random effects into the predictor. This leads to generalized additive mixed models (GAMMs) with a predictor of the form

η_i = f_1(x_i1) + ... + f_p(x_ip) + w_i'γ + b_{g_i},   (4)

where b_{g_i} is a unit- or group-specific random effect, with b_{g_i} = b_g if unit i is in group g, g = 1, ..., G. For example, in our analysis of the forest damage data, b_g is an additional tree-specific effect to account for correlation and unobserved heterogeneity. Due to the large number of trees, a fixed effects approach will not be feasible, and a random effects model is chosen instead. A further extension leads to varying coefficient mixed models (VCMMs),

η_i = f_1(x_i1) z_i1 + ... + f_p(x_ip) z_ip + w_i'γ + b_{g_i},   (5)

see Hastie and Tibshirani (1993) for the case without random effects. The design vector z = (z_1, ..., z_p) may contain components of x or w as well as some additional covariates. If a design variable is identically 1, e.g. z_j = 1, then the corresponding function f_j is the main effect of x_j, while terms like f_p(x_ip) z_ip model an effect of z_p that varies over x_p or, in other words, an interaction between x_p and z_p. We will use a VCMM in the forest damage study, with time-varying effects and a tree-specific random effect.

For Bayesian semiparametric inference, the unknown functions f_1, ..., f_p, more exactly the corresponding vectors of function evaluations, the parameters γ = (γ_1, ..., γ_r) and the random effects b = (b_1, ..., b_G) are all considered as random variables.
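To make the predictor structure concrete, here is a minimal numerical sketch (all names and data are illustrative, not from the paper) of evaluating the VCMM predictor (5); setting all z_ij = 1 recovers the GAMM predictor (4):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2

x = rng.uniform(0.0, 1.0, size=(n, p))   # metrical covariates x_i1, ..., x_ip
z = np.ones((n, p))                      # design variables; all 1 -> main effects, i.e. a GAMM
w = rng.normal(size=(n, 3))              # covariates with fixed (linear) effects
gamma = np.array([0.5, -1.0, 0.2])       # fixed effects
group = np.array([0, 0, 1, 1, 2, 2])     # group index g_i
b = np.array([0.3, -0.1, 0.4])           # group-specific random intercepts b_g

# stand-in smooth functions; in the paper the f_j carry smoothness priors
f = [np.sin, np.cos]

# eta_i = sum_j f_j(x_ij) z_ij + w_i' gamma + b_{g_i}
eta = sum(f[j](x[:, j]) * z[:, j] for j in range(p)) + w @ gamma + b[group]
print(eta.shape)  # (6,)
```

In an exponential family model, η would then be mapped to the mean via the response function h, e.g. the logistic function for binary responses.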
The observation models (3), (4) or (5) are understood to be conditional upon these random variables, and have to be supplemented by appropriate prior distributions. Priors for the unknown functions f_1, ..., f_p depend on the type of the covariates and on prior beliefs about the smoothness of f_j. Priors for time scales and metrical covariates are based on Gaussian smoothness priors that are common in dynamic generalized linear models, see, for example, Fahrmeir and Tutz (1997, ch. 8). For spatial covariates, priors are based on (Gaussian) Markov random fields, see for example Besag (1974), Besag et al. (1991) or Besag and Kooperberg (1995).

Let us first consider the case of a metrical covariate x with equally spaced observations x_i, i = 1, ..., n, with m ≤ n different values. Then the ordered sequence x_(1) < ... < x_(t) < ... < x_(m) defines an equidistant grid on the x-axis. The typical case for this situation arises if the covariate x corresponds to time t, and the grid points correspond to time units such as weeks, months, or years, but generally x can be any ordered dimension. Define f(t) := f(x_(t)) and let f = (f(1), ..., f(t), ..., f(m))' denote the vector of function evaluations. Then, just as for the time trends example in Section 1, common priors for smooth functions are, respectively, first or second

order random walk models

f(t) = f(t-1) + u(t)   or   f(t) = 2f(t-1) - f(t-2) + u(t)   (6)

with Gaussian errors u(t) ~ N(0, τ²) and diffuse priors f(1) ∝ const, and f(1) and f(2) ∝ const, for the initial values, respectively. Both specifications act as smoothness priors that penalize functions f that are too rough. A first order random walk penalizes abrupt jumps f(t) - f(t-1) between successive states, and a second order random walk penalizes deviations from the linear trend 2f(t-1) - f(t-2). Note that a second order random walk is derived by computing second differences, i.e. the differences of neighboring first order differences. The difference in practice between the two specifications is that estimated functions tend to be somewhat smoother for second order random walk priors. Of course, higher order difference priors are also possible. For example, if the covariate x is time t, measured in months, then a common smoothness prior for a seasonal component f(t) is

f(t) + f(t-1) + ... + f(t-11) = u(t) ~ N(0, τ²).   (7)

Next we consider the general case with non-equally spaced observations. Let x_(1) < ... < x_(t) < ... < x_(m) denote the m ≤ n strictly ordered, different observations of the covariate x, and f = (f(1), ..., f(t), ..., f(m))', with f(t) := f(x_(t)), the vector of function evaluations. Random walk or autoregressive priors have to be modified to account for the nonequal distances δ_t = x_(t) - x_(t-1) between observations. Random walks of first order are now specified by

f(t) = f(t-1) + u(t),   u(t) ~ N(0, δ_t τ²),   (8)

i.e., by adjusting the error variances from τ² to δ_t τ². Random walks of second order are

f(t) = (1 + δ_t/δ_{t-1}) f(t-1) - (δ_t/δ_{t-1}) f(t-2) + u(t),   (9)

with u(t) ~ N(0, w_t τ²), where w_t is an appropriate weight. Several possibilities are conceivable for the weights. The simplest one is w_t = δ_t, as for a first order random walk. Another choice is w_t = δ_t (1 + δ_t/δ_{t-1}), which also takes into account the distance δ_{t-1}. It can be derived from the difference

(f(t) - f(t-1))/δ_t - (f(t-1) - f(t-2))/δ_{t-1}

of first order differences, treating them as if they were independent. A related, yet different proposal for a second order autoregressive prior is given by Berzuini and Larizza (1996). Another possibility would be to use state space representations

of stochastic differential equation priors based on the work by Kohn and Ansley (1987). Biller and Fahrmeir (1997) follow this idea, but there are significant problems associated with convergence and mixing behaviour of posterior samples, compared with the priors chosen here.

Due to the Markovian specification by random walks or other autoregressive models, priors for the vector f of function evaluations are seemingly defined in an asymmetric, directed way. However, these priors can always be rewritten in an undirected, symmetric form. This follows from the fact that any discrete Markov process can be formulated in the undirected form of a Markov random field by conditioning not only on previous variables f(t-1), f(t-2), etc., but also on future variables f(t+1), f(t+2), etc. This also becomes evident from the joint Gaussian prior for the entire vector f: for each of the priors (8), (9) or (7), f has a partially improper Gaussian prior

f ~ N(0, τ² K⁻),

where K⁻ is a generalized inverse of a band-diagonal precision or penalty matrix K. For example, it is easy to see that for a random walk of first order (8), the penalty matrix is the tridiagonal matrix

K = [  δ_2^{-1}     -δ_2^{-1}                                                        ]
    [ -δ_2^{-1}   δ_2^{-1} + δ_3^{-1}     -δ_3^{-1}                                  ]
    [               -δ_3^{-1}   δ_3^{-1} + δ_4^{-1}    ...                           ]
    [                               ...   δ_{m-1}^{-1} + δ_m^{-1}     -δ_m^{-1}      ]
    [                                               -δ_m^{-1}        δ_m^{-1}        ]

which reduces to

K = [  1  -1              ]
    [ -1   2  -1          ]
    [      ...  ...  ...  ]
    [         -1   2  -1  ]
    [             -1   1  ]

for equidistant x-values.

Let us now turn our attention to a spatial covariate x, where the values of x represent the location or site in connected geographical regions. For example, in the unemployment study x represents the district in which the unemployed have their domicile, whereas in the rents for flats example x indicates the location of the flats in Munich. A common way to deal with spatial covariates is to assume that neighboring sites are more alike than two arbitrary sites. Thus, for a valid prior definition, a set of neighbors for each site x_t must be defined.
For geographical data as considered in this paper, one usually assumes that two sites x_t and x_j are neighbors if they share a common boundary. However, more sophisticated neighborhood definitions are possible, see for example Besag et al. (1991). We assume the following spatial smoothness prior for the function evaluations f(t), t = 1, ..., m, of the m different sites x_t:

f(t) | f(j), j ≠ t, τ²  ~  N( Σ_{j∈∂t} f(j)/N_t ,  τ²/N_t ),   (10)

where N_t is the number of adjacent sites and j ∈ ∂t denotes that site x_j is a neighbor of site x_t. Thus, the (conditional) mean of f(t) is an unweighted average of the function evaluations of neighboring sites. Note that for spatial data the conditioning is undirected, since there is no natural ordering of the different sites x_t, as is the case for time scales or metrical covariates. A more general prior, including (10) as a special case, is given by

f(t) | f(j), j ≠ t, τ²  ~  N( Σ_{j∈∂t} (w_tj/w_{t+}) f(j) ,  τ²/w_{t+} ),   (11)

where the w_tj are known (not necessarily equal) weights, and "+" denotes summation over the missing subscript. Such a prior is called a Gaussian intrinsic autoregression, see Besag et al. (1991) and Besag and Kooperberg (1995). The prior (10) is obtained as a special case by specifying w_tj = 1, resulting in equal weights for each neighbor of site x_t. Unequal weights are, for example, based on the common boundary length of neighboring sites or on the distance of the centroids of two sites, see Besag et al. (1991) for details. However, the applications in this paper are restricted to the prior (10) based on adjacency weights. As in the case of the autoregressive priors (8), (9) or (7), definition (11) may equivalently be written in terms of a penalty matrix K, i.e.

p(f | τ²) ∝ exp( - (1/(2τ²)) f'Kf ),   (12)

where the elements of K are given by k_tt = w_{t+} and

k_tj = -w_tj  if j ∈ ∂t,   k_tj = 0  else.

Usually this prior is improper, since K is rank deficient and thus not invertible. It is important to note that the autoregressive priors (8), (9) and (7) are all special cases of the Gaussian intrinsic autoregression (11), or the equivalent definition (12). For example, a first order random walk for equidistant observations is obtained by defining w_tj = 1 for j = t ± 1 and w_tj = 0 else.
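The construction k_tt = w_{t+}, k_tj = -w_tj covers the random walk and the spatial case alike; a small sketch of our own (illustrative, not the paper's code) of building K from a symmetric weight matrix, with the RW1 case recovered exactly as described above:

```python
import numpy as np

def penalty_matrix(W):
    """Penalty matrix of the intrinsic autoregression (11)/(12):
    k_tt = w_{t+} (row sums), k_tj = -w_tj for neighbors, 0 otherwise."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

# RW1 for m = 4 equidistant points: w_tj = 1 for j = t +/- 1, 0 else.
m = 4
W_rw1 = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
K = penalty_matrix(W_rw1)
print(K)
# the familiar tridiagonal RW1 penalty:
# [[ 1. -1.  0.  0.]
#  [-1.  2. -1.  0.]
#  [ 0. -1.  2. -1.]
#  [ 0.  0. -1.  1.]]

# K is rank deficient (the prior is improper): K annihilates constants.
print(np.allclose(K @ np.ones(m), 0.0))  # True
```

The same function applied to a 0/1 adjacency matrix of districts yields the spatial penalty matrix of prior (10).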
This close formal similarity of our priors for the function f of a metrical or a spatial covariate allows a unified MCMC algorithm that depends essentially on the penalty matrix K and is more or less independent of the type of covariate and of the definition of smoothness. This is described in detail in the next section.

For a fully Bayesian analysis, hyperpriors for the variances are introduced in a further stage of the hierarchy. This allows for simultaneous estimation of the unknown functions and the amount of smoothness. A common choice are highly dispersed inverse gamma priors

p(τ²) ~ IG(a, b).

A common choice for a and b is very small a = b, for example a = b = 0.0001, leading to almost diffuse priors for the variance parameters. An alternative, proposed for example in Besag et al. (1995), is a = 1 and a small value for b, such as b = 0.005. However, since estimation results tend to be sensitive to the choice of hyperpriors, especially in situations where data is sparse, some kind of sensitivity analysis should always be performed.

For the fixed effect parameters γ_1, ..., γ_r, we will assume independent diffuse priors p(γ_j) ∝ const, j = 1, ..., r. Another choice would be highly dispersed Gaussian priors. For random effects, we make the usual assumption that the b_g's are i.i.d. Gaussian,

b_g | v² ~ N(0, v²),   g = 1, ..., G,

and define again a highly dispersed hyperprior for v². In the following, let f = (f_1, ..., f_p), τ² = (τ²_1, ..., τ²_p), γ = (γ_1, ..., γ_r), b = (b_1, ..., b_G) denote the parameter vectors for function evaluations, variances, fixed, and random effects. Then the Bayesian model specification is completed by the following conditional independence assumptions:

i) For given covariates and parameters f, γ and b, the observations y_i are conditionally independent.
ii) The priors p(f_j | τ²_j), j = 1, ..., p, are conditionally independent.
iii) The priors for fixed and random effects, and the hyperpriors for τ²_j, j = 1, ..., p, are mutually independent.

3 MCMC inference

Full Bayesian inference is based on the entire posterior distribution

p(f, τ², γ, b | y) ∝ p(y | f, τ², γ, b) p(f, τ², γ, b).

By assumption i), the conditional distribution of the observed data y is the product of the individual likelihoods:

p(y | f, τ², γ, b) = ∏_{i=1}^n L_i(y_i; η_i),   (13)

with L_i(y_i; η_i) determined by the specific exponential family distribution and the form chosen for the predictor. Together with the conditional independence assumptions ii) and iii), we have

p(f, τ², γ, b | y) ∝ ∏_{i=1}^n L_i(y_i; η_i) ∏_{j=1}^p { p(f_j | τ²_j) p(τ²_j) } ∏_{k=1}^r p(γ_k) ∏_{g=1}^G p(b_g | v²) p(v²)

for the posterior. Bayesian inference via MCMC simulation is based on updating full conditionals of single parameters or blocks of parameters, given the rest and the data. Single-move steps, as in Carlin et al. (1992), which update each parameter f(t) separately, suffer from problems with convergence and mixing. For Gaussian models, Gibbs sampling with so-called multimove steps can be applied, see, for example, Carter and Kohn (1994) and Wong and Kohn (1996). For non-Gaussian responses, Gibbs sampling is no longer feasible and more general Metropolis-Hastings algorithms are needed. We adopt and extend a computationally very efficient MH algorithm with conditional prior proposals, developed recently by Knorr-Held (1999) for dynamic generalized linear models. Convergence and mixing are considerably improved by block moves, where blocks f[r, s] = (f(r), ..., f(s)) of parameters are updated instead of single parameters f(t). Suppressing conditioning parameters and data notationally, the full conditionals for the blocks f[r, s] are

p(f[r, s] | ·) ∝ L(f[r, s]) p(f[r, s] | f(l), l ∉ [r, s], τ²).

The first factor L(f[r, s]) is the product of all likelihood contributions in (13) that depend on f[r, s]. The second factor, the conditional distribution of f[r, s] given the rest f(l), l ∉ [r, s], is a multivariate Gaussian distribution. Its mean and covariance matrix can be written in terms of the precision matrix K of f. Let K[r, s] denote the submatrix of K given by the rows and columns numbered r to s, and let K[1, r-1] and K[s+1, m] denote the matrices left and right of K[r, s].
Then the (conditional) mean μ[r, s] and the covariance matrix Σ[r, s] are given by

μ[r, s] = -K[r, s]^{-1} K[s+1, m] f[s+1, m]                              for r = 1,
μ[r, s] = -K[r, s]^{-1} K[1, r-1] f[1, r-1]                              for s = m,
μ[r, s] = -K[r, s]^{-1} ( K[1, r-1] f[1, r-1] + K[s+1, m] f[s+1, m] )    else,

and

Σ[r, s] = τ² K[r, s]^{-1},

respectively. This result may be obtained by applying the usual formulae for conditional Gaussian distributions. MH block move updates for f[r, s] are obtained by drawing a conditional prior proposal f*[r, s] from the conditional Gaussian N(μ[r, s], Σ[r, s]) and accepting it with probability

min{ 1, L(f*[r, s]) / L(f[r, s]) }.
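A minimal sketch of one such block move, in our own notation and with an illustrative Gaussian likelihood (the paper targets non-Gaussian likelihoods, but the update has the same form):

```python
import numpy as np

rng = np.random.default_rng(1)

def block_update(f, K, tau2, r, s, loglik_block):
    """One MH step with a conditional prior proposal for f[r..s].
    K is the penalty (prior precision / tau^2) matrix of the full vector f."""
    idx = slice(r, s + 1)
    K_rs = K[idx, idx]
    # Sigma[r,s] = tau^2 K[r,s]^{-1}
    Sigma = tau2 * np.linalg.inv(K_rs)
    # mu[r,s] = -K[r,s]^{-1} (K_left f_left + K_right f_right); tau^2 cancels
    rest = np.r_[0:r, s + 1:len(f)]
    mu = -np.linalg.solve(K_rs, K[idx, :][:, rest] @ f[rest])
    proposal = rng.multivariate_normal(mu, Sigma)
    # prior terms cancel: only the likelihood ratio enters the acceptance
    log_ratio = loglik_block(proposal) - loglik_block(f[idx])
    if np.log(rng.uniform()) < log_ratio:
        f[idx] = proposal
    return f

# toy use: RW1 penalty on m = 6 points, Gaussian likelihood around y
K = np.diag([1., 2, 2, 2, 2, 1]) - np.diag(np.ones(5), 1) - np.diag(np.ones(5), -1)
y = np.array([0.0, 0.5, 1.0, 1.0, 0.5, 0.0])
f = np.zeros(6)
f = block_update(f, K, tau2=1.0, r=2, s=3,
                 loglik_block=lambda fb: -0.5 * np.sum((y[2:4] - fb) ** 2))
print(f.shape)  # (6,)
```

In a full sampler this step is repeated over a partition of [1, m] into blocks, cycling with the updates for γ, b and the variances described below.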

A fast implementation of the MCMC updates requires efficient computation of the mean μ[r, s]. For that reason we define the matrices K[r, s]_l = K[r, s]^{-1} K[1, r-1] and K[r, s]_r = K[r, s]^{-1} K[s+1, m]. Then the conditional mean may be rewritten as

μ[r, s] = -K[r, s]_r f[s+1, m]                              for r = 1,
μ[r, s] = -K[r, s]_l f[1, r-1]                              for s = m,
μ[r, s] = -( K[r, s]_l f[1, r-1] + K[r, s]_r f[s+1, m] )    else.

Knorr-Held's (1999) conditional prior proposal refers to autoregressive priors in dynamic models, where the precision matrix K has nonzero diagonal bands, implying sparse structures for K[r, s]_r and K[r, s]_l, where most elements are zero. For example, for a first order random walk only the first column of K[r, s]_r and the last column of K[r, s]_l contain nonzero elements. For spatial covariates, the precision matrix is no longer band-diagonal but still sparse, with nonzero entries reflecting neighborhood relationships, so that sparse matrix operations can be used to improve computational efficiency. Computing μ[r, s] and Σ[r, s]^{1/2}, as required for drawing the conditional prior proposal f*[r, s], then proceeds as follows: for every block [r, s] we compute and store the matrices K[r, s]^{-1/2}, K[r, s]_l and K[r, s]_r in advance, where the last two are stored as sparse matrices. In every iteration of our MCMC algorithm, computing μ[r, s] and Σ[r, s]^{1/2} requires at most two (sparse) matrix multiplications with a column vector and multiplication of the result with a scalar involving τ². From a computational point of view, another main advantage is the simple form of the acceptance probability: only the likelihood has to be computed, no first or second derivatives are involved, which considerably reduces the number of calculations.

The full conditional for a variance parameter τ² is again an inverse gamma distribution,

p(τ² | ·) ~ IG(a', b').   (14)

Its parameters a' and b' can again be expressed in terms of the penalty matrix K, and are given by

a' = a + rank(K)/2   and   b' = b + f'Kf/2,

respectively.
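The inverse gamma update can be sketched as follows (our illustrative code; NumPy parametrizes the Gamma distribution, so we draw the precision 1/τ² ~ Gamma(a', 1/b') and invert):

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_tau2(f, K, a=1.0, b=0.005):
    """Gibbs draw from the full conditional (14): IG(a', b') with
    a' = a + rank(K)/2 and b' = b + f'Kf/2."""
    a_new = a + np.linalg.matrix_rank(K) / 2.0
    b_new = b + 0.5 * f @ K @ f
    return 1.0 / rng.gamma(shape=a_new, scale=1.0 / b_new)

# RW1 penalty for m = 4 equidistant points (rank m - 1 = 3)
K = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])
tau2 = draw_tau2(np.array([0.1, 0.4, 0.2, 0.0]), K)
print(tau2 > 0)  # True
```

Note that f'Kf is the quadratic penalty of the current function evaluations, so rougher functions pull τ² upward, which is exactly the data-driven smoothing described above.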
Thus, updating of the variance parameters can be done by simple Gibbs steps, drawing directly from the inverse gamma densities (14). Once again, a fast implementation requires sparse matrix multiplications for computing f'Kf. With a diffuse prior p(γ_j) ∝ const for the fixed effects parameters, the full conditional for γ is

p(γ | ·) ∝ ∏_{i=1}^n L_i(y_i; η_i).

Updating of γ can in principle be done by MH steps with a random walk proposal q(γ, γ*), but a serious problem is tuning, i.e. specifying a suitable covariance matrix for the proposal that guarantees high acceptance rates and good mixing. Especially when the dimension of γ is high, with significant correlations among components, tuning "by hand" is no longer feasible. An alternative is the weighted least squares proposal suggested by Gamerman (1997). Here a Gaussian proposal with mean m(γ) and covariance matrix C(γ) is used, where γ is the current state of the chain. The mean m(γ) is obtained by making one Fisher scoring step to maximize the full conditional p(γ | ·), and C(γ) is the inverse of the expected Fisher information, evaluated at the current state of the chain. In this case the acceptance probability of a proposed new vector γ* is

min{ 1, [ p(γ* | ·) q(γ*, γ) ] / [ p(γ | ·) q(γ, γ*) ] }.   (15)

Note that q is not symmetric, because the covariance matrix C of q depends on γ. Thus, in principle, the fraction q(γ*, γ)/q(γ, γ*) cannot be omitted from (15). In practice, however, experience shows that this fraction is almost always near one, so omitting it does not significantly affect the efficiency of the algorithm, but leads to a considerable saving in computation. Further computing time can be saved by omitting the Fisher scoring step when computing the mean of Gamerman's proposal, and simply taking the current state of the chain as the mean. Compared to Gamerman's original proposal, our slightly modified updating scheme for the fixed effects parameters is more efficient and avoids tuning "by hand".

For an additional random intercept, the full conditional for the parameter b_g is given by

p(b_g | ·) ∝ ∏_{i: g_i = g} L_i(y_i; η_i) p(b_g | v²).

Here a simple Gaussian random walk proposal with mean b_g and variance v² works well in most cases. To improve mixing, tuning is sometimes required, by multiplying the prior variance v² in the proposal with a constant factor, e.g. 2.
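The random walk update for a random intercept can be sketched as follows; the tuning factor c = 2 is the one mentioned above, while the Bernoulli-logit likelihood and all data are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def logit_loglik(y, eta):
    # log-likelihood of independent Bernoulli responses with logit link
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

def update_bg(b_g, v2, y_g, eta_rest, c=2.0):
    """One MH step for b_g; eta_rest is the predictor of group g without b_g.
    Propose b* ~ N(b_g, c * v^2), accept with likelihood-times-prior ratio."""
    prop = rng.normal(b_g, np.sqrt(c * v2))
    log_ratio = (logit_loglik(y_g, eta_rest + prop)
                 - logit_loglik(y_g, eta_rest + b_g)
                 - 0.5 * (prop ** 2 - b_g ** 2) / v2)  # N(0, v^2) prior ratio
    return prop if np.log(rng.uniform()) < log_ratio else b_g

b_new = update_bg(0.0, 1.0, y_g=np.array([1, 0, 1]),
                  eta_rest=np.array([0.2, -0.1, 0.5]))
print(np.isfinite(b_new))  # True
```

Unlike the conditional prior proposal for the blocks f[r, s], the prior ratio does not cancel here, because the proposal is a plain random walk rather than a draw from the prior conditional.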
An alternative is, again, Gamerman's weighted least squares proposal or a slight modification of it. This becomes especially attractive when the observation model contains one or more random slope parameters in addition to the random intercept. By analogy to the variance parameters τ²_j of the nonparametric terms, the full conditional of v² is again an inverse gamma distribution, so updating is straightforward.

4 Applications

The following applications arose out of consulting cases and show the practical feasibility of the methods even for complex models. In the first example, a reanalysis of the credit-scoring data described in Fahrmeir and Tutz (1997) illustrates performance for a semiparametric model without random effects. The data are available from the authors. In the other three applications, we use a VCMM with unstructured random effects for the forest damage data, and GAMMs with spatial random effects

and many covariates for the large data set applications on rents and unemployment duration. To assess the sensitivity of the estimated functions to the hyperparameters a and b of the prior for the variance parameters, in all applications models were (re)estimated with different choices of a and b. We tested four different choices of hyperparameters, that is a = b = 0.01, a = b = 0.0001, a = 1, b = 0.01 and a = 1, b = 0.005. However, since in all applications except the forest damage data estimation results differ only slightly across the different choices of hyperpriors, results are printed only for our standard choice of hyperparameters, that is a = 1 and b = 0.005. Only for the forest damage application do we give a comparison of relative changes of the estimated functions with respect to the different hyperparameters.

4.1 Credit-Scoring

In our first application, we reanalyze the credit-scoring problem described in Fahrmeir and Tutz (1997, ch. 2.1). The aim of credit-scoring is to model or predict the probability that a client with certain covariates ("risk factors") is to be considered as a potential risk, and therefore will probably not pay back his credit as agreed upon by contract. The data set consists of 1000 consumers' credits from a South German bank. The response variable is "creditability", which is given in dichotomous form (y = 0 for creditworthy, y = 1 for not creditworthy). In addition, 20 covariates assumed to influence creditability were collected.
As in Fahrmeir and Tutz (1997), we will use a subset of these data, containing only the following covariates, which are partly metrical and partly categorical:

x1  running account, trichotomous with categories "no running account" (= 1), "good running account" (= 2), "medium running account" ("less than 200 DM" = 3 = reference category)
x3  duration of credit in months, metrical
x4  amount of credit in DM, metrical
x5  payment of previous credits, dichotomous with categories "good", "bad" (= reference category)
x6  intended use, dichotomous with categories "private" or "professional" (= reference category)
x8  marital status, with reference category "living alone".

Effect coding is used for all categorical covariates. A parametric logit model for the probability pr(y = 1 | x) of being not creditworthy leads to the conclusion that the covariate "amount of credit" has no significant influence on the risk. Here, we reanalyze the data with a semiparametric logit model

log[pr(y = 1 | x) / (1 − pr(y = 1 | x))] = β0 + β1 x1(1) + β2 x1(2) + f3(x3) + f4(x4) + β5 x5 + β6 x6 + β8 x8,

where x1(1) and x1(2) are dummies for the categories "good" and "medium" running account. The predictor has semiparametric additive form: the smooth functions f3(x3), f4(x4) of the metrical covariates "duration of credit" and "amount of credit"

are estimated nonparametrically using second order random walk models for non-equally spaced observations. The constant β0 and the effects β1, β2, β5, β6, β8 of the remaining categorical covariates are considered as fixed and estimated jointly with the curves. Figure 1 shows estimates for the curves f3 and f4. For comparison, cubic smoothing splines are included in addition to posterior mean estimates. Although cubic splines are posterior mode estimators and the penalty terms are not exactly the same, both estimates are close. While the effect of the variable "duration of credit" is almost linear, the effect of "amount of credit" is clearly nonlinear. The curve has a bathtub shape, and indicates that not only high credits but also low credits increase the risk, compared to "medium" credits between 3000 and 6000 DM. Apparently, if the influence is misspecified by assuming a linear function β4 x4 instead of f4(x4), the estimated effect β̂4 will be near zero, corresponding to an almost horizontal line β̂4 x4 near zero, and falsely considered as nonsignificant. Table 1 gives the posterior means together with 80% credible intervals and, for comparison, maximum likelihood estimates of the remaining effects. Both estimates are in close agreement. They also have the same signs and are quite near to the estimates for a parametric logit model given in Fahrmeir and Tutz (1997), so that interpretation remains qualitatively the same for these constant effects. Closeness of fixed effects is in agreement with asymptotic normality results for penalized likelihood estimation in semiparametric GLMs by Mammen and van de Geer (1997). Closeness of posterior means and modes of the nonparametric components is in agreement with empirical evidence and our own experience in the related context of non-Gaussian state space models. A heuristic justification is given in Fahrmeir and Wagenpfeil (1997), but rigorous asymptotic results are still missing.
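As an illustration of the RW(2) smoothness prior used for f3 and f4, the following sketch builds the penalty (precision) matrix for equally spaced function values; the non-equally spaced version actually used here additionally weights the differences by the distances between observations. The function name is ours.

```python
import numpy as np

def rw2_precision(m):
    """Precision (penalty) matrix K = D'D of a second-order random walk
    prior over m equally spaced function values. K is rank-deficient:
    its null space contains constant and linear sequences, which are
    therefore left unpenalized."""
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]   # second differences
    return D.T @ D
```

The null space of K is what makes the prior partially improper, so the level and slope of each curve are determined by the data alone.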
A main interest in credit scoring is prediction, that is, to predict the probability of failure π0 for a new client with characteristics x0. Within a Bayesian framework this can easily be done by drawing samples from the predictive distribution. The advantage compared to existing frequentist approaches is that we obtain not only point estimates but also exact credible intervals to assess the uncertainty of the prediction. Suppose for example that we have a new client with covariate vector x0 = (−1, −1, 16, 1.4, 1, −1, 1). Then sampling from the predictive distribution results in a predicted probability of π0 = 0.35 (posterior mean) and a credible interval of (0.2, 0.54) based on the posterior 10 and 90 percent quantiles. It should be noted that the credit scoring data come from a stratified sample with 300 "bad" and 700 "good" credits. This explains the high value of the risk π0. To obtain realistic values a stratification correction is necessary, compare Anderson (1980).

Table 1: Estimates of constant parameters for the credit-scoring data.
covariate | mean | 10% quantile | 90% quantile | ML estimator
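Sampling from the predictive distribution, as done for the new client above, amounts to pushing each MCMC draw of the predictor through the logit link. A minimal sketch, where eta0_samples stands for draws of the new client's predictor η0 (fixed effects plus nonparametric terms evaluated at x0) taken from the sampler's output; the function name is ours:

```python
import numpy as np

def predictive_risk(eta0_samples):
    """Monte Carlo predictive summary for a new client: transform each
    posterior draw of eta_0 to a probability and summarize by the
    posterior mean and the 10%/90% quantiles (an 80% credible interval,
    as used in the text)."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(eta0_samples)))
    return p.mean(), tuple(np.quantile(p, [0.10, 0.90]))
```

Because the quantiles are taken on the probability scale, the interval automatically respects the (0, 1) bounds.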

Figure 1: Estimated effects of duration of credit (in months) and amount of credit (in 1000 DM). Shown is the posterior mean within 80% credible regions and, for comparison, cubic smoothing splines (dotted lines).

4.2 Forest damage

In this longitudinal study, we analyze the influence of calendar time, age of trees, pH value of the soil and canopy density of the stand on the damage state of beeches at the stand. Data have been collected in yearly forest damage inventories carried out in the forest district of Rothenbuch in northern Bavaria from 1983 onwards. There are 80 observation points with occurrence of beeches spread over the whole area. We use the degree of defoliation as a binary indicator for damage state, with y_it = 1 for "light or distinct damage" of tree i in year t, y_it = 0 for "no damage". Figure 2 shows relative frequencies of the response "damage state" over the years. There is a clear pattern with a maximum of damage around 1986, while trees seem to recover in the following five years. A detailed data description can be found in Göttlein and Pruscha (1996). Covariates used here are defined as follows:

Figure 2: Relative frequencies of the response "damage state" over the years.

A   age of tree at the beginning of the study in 1983, measured in three categories: "below 50 years" (= 1), "between 50 and 120 years" (= 2), and "above 120 years" (reference category);
CD  canopy density at the stand, with categories "low" (= 1), "medium" (= 2), and "high" (reference category);
pH  pH value of the soil near the surface, measured as a metrical covariate with values ranging from a minimum of 3.3 to a maximum of 6.1.

The covariates pH value and canopy density are time varying, while age is time constant by definition. Based on results of Gieger (1998) with a marginal model, we use a logistic VCMM with predictor

η = f1(t) + f2(t) A(1) + f3(t) A(2) + f4(pH) + β1 CD(1) + β2 CD(2) + b,

where t is calendar time in years, and A(1), A(2), CD(1), CD(2) are dummy variables for A and CD. The impact of calendar time is modelled by a baseline effect f1(t) and time-varying effects f2(t), f3(t) of the age categories, and the possibly nonlinear effect of pH is also modelled nonparametrically, using RW(2) smoothness priors. The tree-specific random effect b accounts for correlation and unobserved heterogeneity. Figure 3 shows posterior mean estimates for the baseline effect f1(t), the time-varying effects f2(t) and f3(t) of age, and the effect of the pH value, f4(pH). The baseline effect f1(t) corresponds to the time trend of old trees over 120 years. Trees in this age group recovered somewhat after 1986, but then the probability of damage increases again. This is in contrast to the effect of the age groups A(1) (young trees) and A(2) (medium age). The estimated curve f2(t) is significantly negative and decreasing, that is, in comparison to old trees, younger trees have lower probabilities of being damaged and they seem to recover better over the years. The effect for trees of medium age in figure 3c) is similar but less pronounced.
As we might have expected, low pH values, i.e., acid soil, have a positive effect on damage probability; this effect decreases, however, as the soil becomes less acid, until it vanishes.

Figure 3: Estimated nonparametric functions for the forest damage data: a) baseline effect of calendar time, b) time-varying effect of age category 1, c) time-varying effect of age category 2, d) effect of pH value. Shown is the posterior mean within 80% credible regions.

Table 2: Relative changes of estimated functions for different choices of hyperparameters (columns: a, b, f1(t), f2(t), f3(t), f4(pH)).

Note that the impact of this effect seems to be very small, because the credible intervals are very large, indicating strong uncertainty about the estimated effect. This becomes even more obvious if we compare estimation results for different choices of hyperparameters of the variance parameters. Table 2 compares relative changes of estimated functions for different choices of hyperpriors with respect to our standard choice a = 1 and b = 0.005. As can be seen, estimates for the effect of the pH value change considerably for different hyperpriors, while the remaining effects are more or less unaffected (at least the plotted functions cannot be distinguished) by the respective choice of hyperparameters. This demonstrates once again the uncertainty about the effect of the pH value. Estimated effects for canopy density are given in Table 3. Stands with low (CD(1)) or medium (CD(2)) density have an increased probability of damage compared to

Table 3: Estimates of constant parameters for the forest damage data (columns: covariate, mean, 10% quantile, 90% quantile; rows: CD(1), CD(2)).

stands with high canopy density: it seems that beeches get more shelter from bad environmental influences in stands with high canopy density.

4.3 Rents for flats in Munich

According to the German rental law, owners of apartments or flats can base an increase in the amount that they charge for rent on "average rents" for flats comparable in type, size, equipment, quality and location in a community. To provide information about these "average rents", most larger cities publish "rental guides", which can be based on regression analyses with rent as the dependent variable. We use data from the City of Munich, collected in 1998 by Infratest Sozialforschung for a random sample of more than 3000 flats. As response variable we chose

R   monthly net rent per square meter in German marks, that is, the monthly rent minus calculated or estimated utility costs.

Covariates characterizing the flat were constructed from almost 200 variables out of a questionnaire answered by tenants of the flats. In the following reanalysis, we include 27 covariates. Here is a selection of some typical covariates:

F   floor space in square meters
A   year of construction
H   0/1 indicator of no central heating
B   0/1 indicator of no bathroom
E   0/1 indicator of bathroom equipment above average
K   0/1 indicator of kitchen equipment above average
W   0/1 indicator of no central warm water system
S   0/1 indicator of a large balcony to the south or west
O   0/1 indicator of a simple, old building, constructed before 1949
R   0/1 indicator of an old building, renovated, in good state
N   0/1 indicator of a new modern building, built after 1978

For the official Munich '99 rental guide, location in the city was assessed in three categories (average, good, top) by experts. Our main focus here is to investigate empirically the validity of this experts' assessment.
For that reason we estimated two models, one with the experts' assessment excluded from the predictor and another with it included. In both models we include additional spatially correlated random effects that are specific for subquarters ("Bezirksviertel") in Munich to account for extra spatial variation. If the experts' assessment is valid, the extra spatial variation measured by the random effects should decrease considerably in model two.

So we choose Gaussian additive mixed models with predictor

η = β0 + f1(F) + f2(A) + f3(s) + w′γ

for model one and

η = β0 + f1(F) + f2(A) + f3(s) + w′γ + β1 L1 + β2 L2

for model two, where L1 and L2 are 0/1 indicators for good and top locations, assessed by experts. In both cases w is the vector of 25 binary indicators, including those mentioned above, and f3(s) is a spatial random effect for the subquarter s = s_i in which flat i is located. Except for the random effects, we give here only results for model one, since results for the other effects differ only slightly. Figure 4a) shows the strong influence of floor space on rents: small flats and apartments are considerably more expensive than larger ones, but this nonlinear effect becomes smaller with increasing floor space. The effect of year of construction on rents in figure 4b) is more or less constant until the '50s. It then distinctly increases until about 1990, and stabilizes on a high level in the '90s. Estimates for fixed effects of selected binary indicators are given in Table 4, showing positive or negative effects as expected. By construction, an estimated effect is the additional positive or negative amount of net rent per square meter caused by the presence (= 1) of the indicator for a flat.

Figure 4: Estimated effects of floor space (in square meters) and year of construction for the rent data. Shown is the posterior mean within 80% credible regions.

Figure 5a) shows a map of Munich, displaying subquarters and their spatial random effects f3(s) for model one, i.e., without the experts' location indicators. Figure 5b) shows the corresponding map for model two, with inclusion of the location indicators L1 and L2. In figure 5a) there is still a lot of spatial variation, showing increased rents in the downtown area and near parks. The remaining variation after inclusion of L1 and L2 is much smoother.
This shows that the experts' assessment explains much, but not all, of the effect of location on rents.
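The spatially correlated random effects for subquarters rest on a Markov random field prior whose precision matrix is determined by the neighbourhood structure of the map. A minimal sketch of the usual intrinsic specification, with a hypothetical adjacency list standing in for the Munich subquarter map; the function name is ours:

```python
import numpy as np

def mrf_precision(neighbors):
    """Precision matrix of an intrinsic Markov random field prior for
    spatial effects: K[s, s] = number of neighbours of site s, and
    K[s, r] = -1 if sites r and s are adjacent. 'neighbors' maps each
    site index to the list of its adjacent site indices."""
    n = len(neighbors)
    K = np.zeros((n, n))
    for s, nbrs in neighbors.items():
        K[s, s] = len(nbrs)
        for r in nbrs:
            K[s, r] = -1.0
    return K
```

As with the random walk priors, K has zero row sums, so only contrasts between neighbouring sites are penalized.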

Figure 5: Estimated spatial random effects for model one and model two. Shown is the posterior mean.

Table 4: Estimates of selected constant parameters for the rents of flats (columns: covariate, mean, 10% quantile, 90% quantile; rows: const, H, B, E, K, W, S, O, R, N).

4.4 Duration of unemployment

In this fourth application, we analyze official unemployment data from the German Federal Employment Office ("Bundesanstalt für Arbeit"). Our analysis is based on a sample of approximately 6300 males with one or more unemployment spells during the observation period from 1980 to 1995. Since the dataset must be augmented to apply an ordinary logistic regression model, see below, we end up with a very large dataset. Typical questions that arise in studies on duration of unemployment are: How can the baseline effect (duration dependence) be modelled? How can trend and seasonal effects of calendar time be flexibly incorporated? What effect has age? Are there regional differences in the probability of leaving unemployment and finding a new job? An important problem in connection with persistent unemployment in Europe in the 1990s is the effect of unemployment compensation and social welfare. Are there negative side-effects of public unemployment compensation? Our analysis is based on the following covariates:

D   calendar time, measured in months
A   age (in years) at the beginning of unemployment
N   nationality, dichotomous with categories "German" and "foreigner" (= reference category)
U   unemployment compensation, trichotomous with categories "unemployment benefit" (= reference category), "unemployment assistance" (U1) and "subsistence allowance" (U2)
C   district in which the unemployed have their domicile

Note that calendar time D and unemployment compensation U are both duration-time dependent covariates. As in our first application, effect coding is used for all categorical covariates. Since duration of unemployment is measured in months, we use a discrete time duration model as described in Fahrmeir and Tutz (1997, ch. 9).
Let T ∈ {1, …, q + 1} denote the duration of unemployment, where T = t means that the spell ends in month t after the beginning of unemployment.
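The augmentation mentioned above, which makes the dataset so large, turns each spell into one binary observation per month at risk, so that an ordinary logistic regression on the augmented rows estimates the discrete-time hazard. A minimal sketch with hypothetical names:

```python
def person_period(durations, events):
    """Augment duration data to person-period form: a spell of length t
    contributes t binary rows, with y = 0 for months 1, ..., t-1 and
    y = event indicator (1 = spell ended, 0 = censored) in month t.
    Returns (spell index, month, y) triples."""
    rows = []
    for i, (t, d) in enumerate(zip(durations, events)):
        for month in range(1, t + 1):
            y = int(d) if month == t else 0
            rows.append((i, month, y))
    return rows
```

Duration-time dependent covariates such as D and U are then simply attached to each (spell, month) row at their value in that month.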


More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35 Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate

More information

Implementing componentwise Hastings algorithms

Implementing componentwise Hastings algorithms Computational Statistics & Data Analysis 48 (2005) 363 389 www.elsevier.com/locate/csda Implementing componentwise Hastings algorithms Richard A. Levine a;, Zhaoxia Yu b, William G. Hanley c, John J. Nitao

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

PENALIZED STRUCTURED ADDITIVE REGRESSION FOR SPACE-TIME DATA: A BAYESIAN PERSPECTIVE

PENALIZED STRUCTURED ADDITIVE REGRESSION FOR SPACE-TIME DATA: A BAYESIAN PERSPECTIVE Statistica Sinica 14(2004), 715-745 PENALIZED STRUCTURED ADDITIVE REGRESSION FOR SPACE-TIME DATA: A BAYESIAN PERSPECTIVE Ludwig Fahrmeir, Thomas Kneib and Stefan Lang University of Munich Abstract: We

More information

Bayesian Modeling of Conditional Distributions

Bayesian Modeling of Conditional Distributions Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Deposited on: 07 September 2010

Deposited on: 07 September 2010 Lee, D. and Shaddick, G. (2008) Modelling the effects of air pollution on health using Bayesian dynamic generalised linear models. Environmetrics, 19 (8). pp. 785-804. ISSN 1180-4009 http://eprints.gla.ac.uk/36768

More information

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals John W. Mac McDonald & Alessandro Rosina Quantitative Methods in the Social Sciences Seminar -

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Spatially Adaptive Smoothing Splines

Spatially Adaptive Smoothing Splines Spatially Adaptive Smoothing Splines Paul Speckman University of Missouri-Columbia speckman@statmissouriedu September 11, 23 Banff 9/7/3 Ordinary Simple Spline Smoothing Observe y i = f(t i ) + ε i, =

More information

output dimension input dimension Gaussian evidence Gaussian Gaussian evidence evidence from t +1 inputs and outputs at time t x t+2 x t-1 x t+1

output dimension input dimension Gaussian evidence Gaussian Gaussian evidence evidence from t +1 inputs and outputs at time t x t+2 x t-1 x t+1 To appear in M. S. Kearns, S. A. Solla, D. A. Cohn, (eds.) Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 999. Learning Nonlinear Dynamical Systems using an EM Algorithm Zoubin

More information

CV-NP BAYESIANISM BY MCMC. Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUEZ

CV-NP BAYESIANISM BY MCMC. Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUEZ CV-NP BAYESIANISM BY MCMC Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUE Department of Mathematics and Statistics University at Albany, SUNY Albany NY 1, USA

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

* * * * * * * * * * * * * * * ** * **

* * * * * * * * * * * * * * * ** * ** Generalized Additive Models Trevor Hastie y and Robert Tibshirani z Department of Statistics and Division of Biostatistics Stanford University 12th May, 1995 Regression models play an important role in

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

13 Notes on Markov Chain Monte Carlo

13 Notes on Markov Chain Monte Carlo 13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful

More information

Bayesian Analysis of Vector ARMA Models using Gibbs Sampling. Department of Mathematics and. June 12, 1996

Bayesian Analysis of Vector ARMA Models using Gibbs Sampling. Department of Mathematics and. June 12, 1996 Bayesian Analysis of Vector ARMA Models using Gibbs Sampling Nalini Ravishanker Department of Statistics University of Connecticut Storrs, CT 06269 ravishan@uconnvm.uconn.edu Bonnie K. Ray Department of

More information

Duration Analysis. Joan Llull

Duration Analysis. Joan Llull Duration Analysis Joan Llull Panel Data and Duration Models Barcelona GSE joan.llull [at] movebarcelona [dot] eu Introduction Duration Analysis 2 Duration analysis Duration data: how long has an individual

More information

Research Division Federal Reserve Bank of St. Louis Working Paper Series

Research Division Federal Reserve Bank of St. Louis Working Paper Series Research Division Federal Reserve Bank of St Louis Working Paper Series Kalman Filtering with Truncated Normal State Variables for Bayesian Estimation of Macroeconomic Models Michael Dueker Working Paper

More information

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape Nikolaus Umlauf https://eeecon.uibk.ac.at/~umlauf/ Overview Joint work with Andreas Groll, Julien Hambuckers

More information

1. INTRODUCTION Propp and Wilson (1996,1998) described a protocol called \coupling from the past" (CFTP) for exact sampling from a distribution using

1. INTRODUCTION Propp and Wilson (1996,1998) described a protocol called \coupling from the past (CFTP) for exact sampling from a distribution using Ecient Use of Exact Samples by Duncan J. Murdoch* and Jerey S. Rosenthal** Abstract Propp and Wilson (1996,1998) described a protocol called coupling from the past (CFTP) for exact sampling from the steady-state

More information

data lam=36.9 lam=6.69 lam=4.18 lam=2.92 lam=2.21 time max wavelength modulus of max wavelength cycle

data lam=36.9 lam=6.69 lam=4.18 lam=2.92 lam=2.21 time max wavelength modulus of max wavelength cycle AUTOREGRESSIVE LINEAR MODELS AR(1) MODELS The zero-mean AR(1) model x t = x t,1 + t is a linear regression of the current value of the time series on the previous value. For > 0 it generates positively

More information

Pruscha: Semiparametric Estimation in Regression Models for Point Processes based on One Realization

Pruscha: Semiparametric Estimation in Regression Models for Point Processes based on One Realization Pruscha: Semiparametric Estimation in Regression Models for Point Processes based on One Realization Sonderforschungsbereich 386, Paper 66 (1997) Online unter: http://epub.ub.uni-muenchen.de/ Projektpartner

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

On estimation of the Poisson parameter in zero-modied Poisson models

On estimation of the Poisson parameter in zero-modied Poisson models Computational Statistics & Data Analysis 34 (2000) 441 459 www.elsevier.com/locate/csda On estimation of the Poisson parameter in zero-modied Poisson models Ekkehart Dietz, Dankmar Bohning Department of

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public

More information

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER MLE Going the Way of the Buggy Whip Used to be gold standard of statistical estimation Minimum variance

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

The lmm Package. May 9, Description Some improved procedures for linear mixed models

The lmm Package. May 9, Description Some improved procedures for linear mixed models The lmm Package May 9, 2005 Version 0.3-4 Date 2005-5-9 Title Linear mixed models Author Original by Joseph L. Schafer . Maintainer Jing hua Zhao Description Some improved

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

The Bayesian approach to inverse problems

The Bayesian approach to inverse problems The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Lecture Notes based on Koop (2003) Bayesian Econometrics

Lecture Notes based on Koop (2003) Bayesian Econometrics Lecture Notes based on Koop (2003) Bayesian Econometrics A.Colin Cameron University of California - Davis November 15, 2005 1. CH.1: Introduction The concepts below are the essential concepts used throughout

More information

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

Fahrmeir, Pritscher: Regression analysis of forest damage by marginal models for correlated ordinal responses

Fahrmeir, Pritscher: Regression analysis of forest damage by marginal models for correlated ordinal responses Fahrmeir, Pritscher: Regression analysis of forest damage by marginal models for correlated ordinal responses Sonderforschungsbereich 386, Paper 9 (1995) Online unter: http://epub.ub.uni-muenchen.de/ Projektpartner

More information

Part I State space models

Part I State space models Part I State space models 1 Introduction to state space time series analysis James Durbin Department of Statistics, London School of Economics and Political Science Abstract The paper presents a broad

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Estimation and Inference on Dynamic Panel Data Models with Stochastic Volatility

Estimation and Inference on Dynamic Panel Data Models with Stochastic Volatility Estimation and Inference on Dynamic Panel Data Models with Stochastic Volatility Wen Xu Department of Economics & Oxford-Man Institute University of Oxford (Preliminary, Comments Welcome) Theme y it =

More information