Module 4: Bayesian Methods Lecture 7: Generalized Linear Modeling


Jon Wakefield
Departments of Statistics and Biostatistics, University of Washington

Outline:
Introduction and Motivating Examples
Generalized Linear Models: Definition; Bayes Linear Model; Bayes Logistic Regression
Generalized Linear Mixed Models
Approximate Bayes Inference: The Approximation; Case-Control Example
Conclusions

Introduction

In this lecture we discuss Bayesian modeling in the context of Generalized Linear Models (GLMs), including the addition of random effects to give Generalized Linear Mixed Models (GLMMs). Estimation via a quick, R-based technique (INLA) will be demonstrated. An approximation technique that is useful in the context of Genome Wide Association Studies (GWAS), in which the number of tests is large, will also be introduced.

Motivating Example I: Logistic Regression

We consider case-control data for the disease Leber Hereditary Optic Neuropathy (LHON), with genotype data for a marker rs (identifier truncated in the transcription):

              CC    CT    TT   Total
  Cases        6     8    75      89
  Controls    10    66   163     239
  Total       16    74   238     328

Let x = 0, 1, 2 represent the number of T alleles, and p(x) the probability of being a case, given x alleles. Within a likelihood framework one may fit the multiplicative odds model:

    p(x) / (1 - p(x)) = exp(α) exp(θx).

Interpretation: exp(α) is of little use given the case-control sampling; exp(θ) is the odds ratio describing the multiplicative change in odds for each additional T allele.
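As a small sketch (not in the original slides), the observed odds for each genotype, and the crude odds ratios between adjacent genotypes, can be read straight off the table; the model above summarizes these with a single per-allele factor exp(θ):

cases <- c(6, 8, 75)      # CC, CT, TT
controls <- c(10, 66, 163)
odds <- cases / controls  # observed odds of being a case, per genotype
odds[2] / odds[1]         # crude odds ratio, CT versus CC
odds[3] / odds[2]         # crude odds ratio, TT versus CT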

R code for Logistic Regression: Estimation via Likelihood

x <- c(0, 1, 2)
y <- c(6, 8, 75)      # cases
z <- c(10, 66, 163)   # controls
logitmod <- glm(cbind(y, z) ~ x, family = binomial)
thetahat <- logitmod$coeff[2]   # log odds ratio
exp(thetahat)                   # odds ratio
# Odds of disease are associated with an increase
# of 61% for each extra T allele.
V <- vcov(logitmod)[2, 2]       # squared standard error
# Asymptotic 95% confidence interval for the odds ratio:
exp(thetahat - 1.96 * sqrt(V))
exp(thetahat + 1.96 * sqrt(V))
# The 95% interval (just) contains 1.

R code for Logistic Regression: Hypothesis Testing via a LRT

# Now a likelihood ratio test: the drop in deviance between the null
# model (2 degrees of freedom) and the fitted model (1 residual degree
# of freedom) is referred to a chi-squared distribution on 1 df.
pchisq(logitmod$null.deviance - logitmod$deviance, 1, lower.tail = FALSE)
# Just significant at the 5% level.

So for these data both estimation and testing point towards borderline significance. Under Hardy-Weinberg equilibrium (HWE) we would estimate, in the controls, the minor allele (C) frequency (MAF) as

    (2 × 10 + 66) / (2 × 239) = 0.18,

which helps to explain the lack of power.
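For comparison with the likelihood ratio test, a Wald test can be computed from the same thetahat and V defined above; this is an added sketch, not part of the original slides:

# Two-sided Wald test of theta = 0
zstat <- thetahat / sqrt(V)
2 * pnorm(abs(zstat), lower.tail = FALSE)  # Wald p-value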

Motivating Example II: FTO Data Revisited

We have:

    y  = weight
    xg = FTO heterozygote indicator, xg ∈ {0, 1}
    xa = age in weeks, xa ∈ {1, 2, 3, 4, 5}

We examine the fit of the linear model

    E[Y | xg, xa] = β0 + βg xg + βa xa + βint xg xa.

Linear Model Example

> fto <- read.table("http://www.stat.washington.edu/~hoff/sisg/fto_data.txt", header = TRUE)
> liny <- fto$y
> linxg <- fto$xg
> linxa <- fto$xa
> linxint <- linxg * linxa   # interaction term
> ftodf <- list(liny = liny, linxg = linxg, linxa = linxa, linxint = linxint)
> ols.fit <- lm(liny ~ linxg + linxa + linxint, data = ftodf)
> summary(ols.fit)
# Coefficient table: estimates, standard errors, t values and p-values
# for (Intercept), linxg, linxa and linxint; the age effect linxa is
# highly significant (p-value of order 1e-06).

Motivating Example III: Salamander Data

McCullagh and Nelder (1989) describe data on the success of matings between male and female salamanders of two population types, roughbutts (RB) and whitesides (WS). The experimental design is complex: it involves three experiments, each with multiple pairings, and each salamander is involved in multiple matings, so that the data are not independent. One way of modeling the dependence is through crossed random effects. The first lab experiment was in the summer of 1986, and the second and third were in the fall of the same year.

Motivating Example III: Salamander Data (continued)

There are 20 females and 20 males, with half of each gender being RB and half being WS. In each experiment, each female is paired with 3 males of each population type, and each male with 3 females of each type, so that each of the four crosses (RR, RW, WR, WW) involves 30 matings.

[Table: observed number (Y) and percentage (%) of successes out of 30 matings, for the crosses RR, RW, WR and WW, in each of Experiments 1, 2 and 3; the counts did not survive transcription.]

Motivating Example III: Salamander Data (mixed models)

Mixed effects models consist of fixed effects and random effects. Fixed effects are population characteristics, while random effects are unit-specific quantities that are assigned a distribution. Mixed effects models allow dependencies in responses to be modeled.

Example: the simplest case is the one-way ANOVA model with n observations in each of m groups:

    Y_ij = β0 + b_i + ε_ij,   i = 1,...,m; j = 1,...,n,

with ε_ij ~iid N(0, σ²) and b_i ~iid N(0, σb²). In this model β0 is the fixed effect and b_1,...,b_m are the random effects. (A software sketch for this model appears below.)
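As an added sketch (not in the slides), the one-way ANOVA model can also be fitted by REML with the lme4 package; here simdata is assumed to be a data frame with response y and group indicator ind, as simulated later in these notes:

library(lme4)
# REML fit of Y_ij = beta0 + b_i + eps_ij with random intercepts b_i
anova.fit <- lmer(y ~ 1 + (1 | ind), data = simdata)
summary(anova.fit)   # reports the estimates of beta0, sigma_b^2 and sigma^2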

Crossed versus Nested Designs

Suppose we have two factors, A and B, with a and b levels respectively. If each level of A is crossed with each level of B we have a factorial (crossed) design. In a nested design, by contrast, level j = 1 of factor B within level 1 of factor A has no meaningful connection with level j = 1 of factor B within level 2 of factor A.

[Figure: left, a crossed design; right, a nested design. The plotting symbols show where observations are measured, by treatment and subject.]

Motivating Example III: Salamander Data (model)

Let Y_ijk denote the response (failure/success) for female i and male j in experiment k. There are 360 binary responses in total. For illustration we fit model C, previously considered by Karim and Zeger (1992) and Breslow and Clayton (1993):

    logit Pr(Y_ijk = 1 | β_k, b_ik^f, b_jk^m) = x_ijk β_k + b_ik^f + b_jk^m,

where x_ijk is a 1 × 4 vector containing the intercept and indicators for female WS, male WS, and female and male both WS, and β_k is the corresponding fixed effect vector (so that this model allows the fixed effects to vary by experiment). The model contains six random effects distributions,

    b_ik^f ~iid N(0, σ_fk²),   b_jk^m ~iid N(0, σ_mk²),   k = 1, 2, 3,

one for females and one for males, in each experiment.
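For orientation (an added sketch, not the original analysis), crossed random effects of this kind can be fitted by approximate maximum likelihood with lme4::glmer; the data frame salam1 and its columns (binary response y, indicators fW, mW, WW, and id factors female and male, for one experiment) are hypothetical stand-ins:

library(lme4)
# Crossed female and male random intercepts for the binary mating outcomes
fit.c <- glmer(y ~ fW + mW + WW + (1 | female) + (1 | male),
               family = binomial, data = salam1)
summary(fit.c)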

Generalized Linear Models

Generalized Linear Models (GLMs) provide a very useful extension of the linear model class. A GLM has three elements:
1. The responses follow an exponential family distribution.
2. The mean model is linear in the covariates on some scale.
3. A link function relates the mean of the data to the covariates.

In a GLM the responses y_i are independently distributed and follow an exponential family, so that the distribution is of the form

    p(y_i | θ_i, α) = exp( [y_i θ_i - b(θ_i)]/α + c(y_i, α) ),

where θ_i and α are scalars. Examples: normal, Poisson, binomial.

It is straightforward to show that

    E[Y_i | θ_i, α] = μ_i = b'(θ_i),    var(Y_i | θ_i, α) = b''(θ_i) α,

for i = 1,...,n, with cov(Y_i, Y_j | θ_i, θ_j, α) = 0 for i ≠ j.

Generalized Linear Models: Links

The link function g(·) provides the connection between μ = E[Y | θ, α] and the linear predictor xβ, via g(μ) = xβ, where x is a vector of explanatory variables and β is a vector of regression parameters. Common choices (illustrated in the sketch below):
- For normal data, the usual link is the identity, g(μ) = μ.
- For binary data, a common link is the logit, g(μ) = log( μ / (1 - μ) ).
- For Poisson data, a common link is the log, g(μ) = log(μ).
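To make the links concrete, here is a sketch (not from the slides) of how they appear in the family argument of R's glm, reusing the FTO and case-control objects defined earlier:

# Identity link for normal data, logit link for binomial data;
# for count data one would use family = poisson(link = "log")
glm(liny ~ linxg + linxa + linxint, family = gaussian(link = "identity"))
glm(cbind(y, z) ~ x, family = binomial(link = "logit"))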

Bayesian Modeling with GLMs

For a generic GLM, with regression parameters β and a scale parameter α, the posterior is

    p(β, α | y) ∝ p(y | β, α) p(β, α).

Two problems immediately arise:
- How do we specify a prior distribution p(β, α)?
- How do we perform the computations required to summarize the posterior distribution (including the calculation of Bayes factors)?

Bayesian Computation

Various approaches are available:
- Conjugate analysis: the prior combines with the likelihood in such a way as to provide analytic tractability (at least for some parameters).
- Analytical approximations: asymptotic arguments (e.g. the Laplace approximation).
- Numerical integration.
- Direct (Monte Carlo) sampling from the posterior, as we have already seen (a small sketch follows this list).
- Markov chain Monte Carlo (MCMC): very complex models can be implemented, for example within the free software WinBUGS.
- Integrated nested Laplace approximation (INLA), which cleverly combines analytical approximations and numerical integration.
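As a tiny illustration of direct Monte Carlo sampling (an added sketch, not from the slides): with a conjugate Beta(1, 1) prior, the posterior for the probability of being a case among the TT individuals in the LHON table is Beta(1 + 75, 1 + 163), and can be summarized by simulation:

draws <- rbeta(10000, 1 + 75, 1 + 163)   # posterior draws for p(2)
quantile(draws, c(0.025, 0.5, 0.975))    # posterior median and 95% interval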

Integrated Nested Laplace Approximation (INLA)

To download INLA (the homepage of the software is www.r-inla.org, which describes the fitting of many common models and gives lots of examples):

install.packages(c("fields", "numDeriv", "pixmap", "mvtnorm", "R.utils"))
source("http://www.math.ntnu.no/inla/givemeINLA.R")
library(INLA)
inla.upgrade(testing = TRUE)

INLA can fit GLMs, GLMMs and many other model classes.

INLA for the Linear Model

We first fit a linear model to the FTO data with the default prior settings:

> liny <- fto$y
> linxg <- fto$x[, 2]
> linxa <- fto$x[, 3]
> linxint <- fto$x[, 4]
> ftodf <- list(liny = liny, linxg = linxg, linxa = linxa, linxint = linxint)
> lin.mod <- inla(liny ~ linxg + linxa + linxint, data = ftodf, family = "gaussian")
> summary(lin.mod)
# Fixed effects: posterior mean, sd and (0.025, 0.5, 0.975) quantiles
# for (Intercept), linxg, linxa and linxint.
# Model hyperparameters: the precision of the Gaussian observations.

The posterior means and standard deviations are in very close agreement with the OLS fits presented earlier.

R Code for Marginal Distributions

betaxggrid <- lin.mod$marginals.fixed$linxg[, 1]
betaxgmarg <- lin.mod$marginals.fixed$linxg[, 2]
betaxagrid <- lin.mod$marginals.fixed$linxa[, 1]
betaxamarg <- lin.mod$marginals.fixed$linxa[, 2]
betaxintgrid <- lin.mod$marginals.fixed$linxint[, 1]
betaxintmarg <- lin.mod$marginals.fixed$linxint[, 2]
precgrid <- lin.mod$marginals.hyperpar$`Precision for the Gaussian observations`[, 1]
precmarg <- lin.mod$marginals.hyperpar$`Precision for the Gaussian observations`[, 2]

par(mfrow = c(2, 2))
plot(betaxgmarg ~ betaxggrid, type = "l", xlab = expression(beta[g]),
     ylab = "Marginal Density", cex.lab = 1.5)
abline(v = 0, col = "red")
plot(betaxamarg ~ betaxagrid, type = "l", xlab = expression(beta[a]),
     ylab = "Marginal Density", cex.lab = 1.5)
abline(v = 0, col = "red")
plot(betaxintmarg ~ betaxintgrid, type = "l", xlab = expression(beta[int]),
     ylab = "Marginal Density", cex.lab = 1.5)
abline(v = 0, col = "red")
plot(precmarg ~ precgrid, type = "l", xlab = expression(sigma^{-2}),
     ylab = "Marginal Density", cex.lab = 1.5)

FTO Posterior Marginal Distributions

[Figure: marginal posterior densities of the three regression coefficients corresponding to xg, xa and the interaction, and of the precision σ^{-2}.]

FTO Extended Analysis

In order to carry out model checking we rerun the analysis, but now switch on a flag to obtain fitted values:

lin.mod <- inla(liny ~ linxg + linxa + linxint, data = ftodf,
                family = "gaussian", control.predictor = list(compute = TRUE))
fitted <- lin.mod$summary.fitted.values[, 1]
sigmamed <- 1 / sqrt(0.2924)   # invert the posterior median of the precision

With the fitted values we can examine the fit of the model. In particular:
- Normality of the errors (the sample size is relatively small).
- Constant error variance (and uncorrelated errors).
- Adequacy of the linear model.

Assessing the Model

residuals <- (liny - fitted) / sigmamed
par(mfrow = c(2, 2))
qqnorm(residuals, main = "")
title("(a)")
abline(0, 1, lty = 2, col = "red")
plot(residuals ~ linxa, ylab = "Residuals", xlab = "Age")
title("(b)")
abline(h = 0, lty = 2, col = "red")
plot(residuals ~ fitted, ylab = "Residuals", xlab = "Fitted")
title("(c)")
abline(h = 0, lty = 2, col = "red")
plot(fitted ~ liny, xlab = "Observed", ylab = "Fitted")
title("(d)")
abline(0, 1, lty = 2, col = "red")

FTO Diagnostic Plots

[Figure: plots to assess model adequacy: (a) normal QQ plot, (b) residuals versus age, (c) residuals versus fitted values, (d) fitted versus observed values.]

Bayes Logistic Regression

The likelihood is

    Y(x) | p(x) ~ Binomial(N(x), p(x)),   x = 0, 1, 2,

with logistic link

    log( p(x) / (1 - p(x)) ) = α + θx.

The prior is p(α, θ) = p(α) p(θ), with α ~ N(μα, σα²) and θ ~ N(μθ, σθ²).

The first analysis uses the default priors in INLA (which are relatively flat). In the second analysis we specify α ~ N(0, 1/0.1) and θ ~ N(0, W), where W is such that the 97.5% point of the prior for θ is log(1.5); i.e., we believe the odds ratio lies between 2/3 and 3/2 with probability 0.95.
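A short check of this prior specification (an added sketch): with θ ~ N(0, W) and W = (log(1.5)/1.96)², the implied 2.5% and 97.5% prior points for the odds ratio exp(θ) are indeed 2/3 and 1.5:

W <- (log(1.5) / 1.96)^2
exp(qnorm(0.975, mean = 0, sd = sqrt(W)))  # = 1.5
exp(qnorm(0.025, mean = 0, sd = sqrt(W)))  # = 2/3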

R Code for Logistic Regression Example

> x <- c(0, 1, 2)
> y <- c(6, 8, 75)
> z <- c(10, 66, 163)
> cc.dat <- as.data.frame(list(x = x, y = y, z = z))
> cc.mod <- inla(y ~ x, family = "binomial", data = cc.dat, Ntrials = y + z)
> summary(cc.mod)
# Posterior mean, sd and (0.025, 0.5, 0.975) quantiles for (Intercept) and x.

# Now with informative priors:
> W <- (log(1.5) / 1.96)^2
> cc.mod2 <- inla(y ~ x, family = "binomial", data = cc.dat, Ntrials = y + z,
                  control.fixed = list(mean.intercept = 0, prec.intercept = 0.1,
                                       mean = 0, prec = 1/W))
> summary(cc.mod2)

The quantiles for θ can be translated to odds ratios by exponentiating.

R Code For Extracting the Marginal Densities

We obtain plots of the marginal distributions of α and θ from the model with informative priors. Alternatively, one can use plot.inla.

alphagrid <- cc.mod2$marginals.fixed$`(Intercept)`[, 1]
alphamarg <- cc.mod2$marginals.fixed$`(Intercept)`[, 2]
thetagrid <- cc.mod2$marginals.fixed$x[, 1]
thetamarg <- cc.mod2$marginals.fixed$x[, 2]
par(mfrow = c(1, 2))
plot(alphamarg ~ alphagrid, type = "l", xlab = expression(alpha),
     ylab = "Marginal Density", cex.lab = 1.5)
plot(thetamarg ~ thetagrid, type = "l", xlab = expression(theta),
     ylab = "Marginal Density", cex.lab = 1.5)
abline(v = 0, col = "red")

Logistic Marginal Plots

[Figure: posterior marginal densities for the intercept α and the log odds ratio θ.]

A Simple ANOVA Example

We begin with simulated data from the simple one-way ANOVA model

    Y_ij = β0 + b_i + ε_ij,   i = 1,...,10; j = 1,...,5,

with ε_ij ~iid N(0, σ²), b_i ~iid N(0, σb²), and β0 = 0.5, σ² = 0.2², σb² = 0.3².

Simulation:

sigma.b <- 0.3
sigma.e <- 0.2
m <- 10
ni <- 5
beta0 <- 0.5
b <- rnorm(m, mean = 0, sd = sigma.b)
e <- rnorm(m * ni, mean = 0, sd = sigma.e)
Yvec <- beta0 + rep(b, each = ni) + e
simdata <- data.frame(y = Yvec, ind = rep(1:m, each = ni))

A Simple ANOVA Example (continued)

> result <- inla(y ~ f(ind, model = "iid"), data = simdata)
> summary(result)
# Fixed effects: posterior summaries for the intercept.
# Model hyperparameters: the precision of the Gaussian observations
# and the precision for ind.

# Extract the posterior medians of the precisions and invert
# (column 4 of the hyperparameter summary holds the 0.5 quantile):
> sigma.est <- 1 / sqrt(result$summary.hyperpar[, 4])
> sigma.est

The Salamander Data: INLA Analysis

We assume that for each of the random effects the residual odds lies between 0.1 and 10 with probability 0.9, so that Ga(1, 0.622) priors are used for each of the six precisions σ_fk^{-2}, σ_mk^{-2}, k = 1, 2, 3; see Fong et al. (2010) for more details.

We present results for the fixed effects and variance components, with the model fitted using both restricted ML (REML) and INLA. We see the reduction in standard errors in the REML analysis. There is some attenuation of the Bayesian results here due to the INLA approximation strategy; again, see Fong et al. (2010) for more details.

[Table: REML and INLA summaries (estimate ± standard error) for the intercept, WS female, WS male and WS female × WS male fixed effects, and for σ_f and σ_m, in each of the three experiments; for the entries marked with an asterisk, standard errors were unavailable. Most numerical entries did not survive transcription.]

INLA Code for Salamander Data: Data Setup

## Crossed random effects: salamander data
load("salam.RData")
## organize data into a form suitable for logistic regression
dat0 <- data.frame(
  y   = c(salam$y),
  f.W = as.integer(salam$x[, "W/R"] == 1 | salam$x[, "W/W"] == 1),
  m.W = as.integer(salam$x[, "R/W"] == 1 | salam$x[, "W/W"] == 1),
  W.W = as.integer(salam$x[, "W/W"] == 1))
## add salamander id
id <- t(apply(salam$z, 1, function(x) {
  tmp <- which(x == 1)
  tmp[2] <- tmp[2] - 20
  tmp
}))
## ids are suitable for models A and C, but not B
id.modA <- rbind(id, id + 40, id + 20)
colnames(id.modA) <- c("f.modA", "m.modA")
dat0 <- cbind(dat0, id.modA, group = 1)
dat0$experiment <- as.factor(rep(1:3, each = 120))
dat0$group <- as.factor(dat0$group)
salamander.e1 <- subset(dat0, dat0$experiment == 1)
salamander.e2 <- subset(dat0, dat0$experiment == 2)
salamander.e3 <- subset(dat0, dat0$experiment == 3)

INLA Code for Salamander Data: Model Fitting

Below is the code for fitting to the data from experiment 1, along with the output:

> salamander.e1.inla.fit <- inla(
    y ~ f.W + m.W + W.W +
      f(f.modA, model = "iid", param = c(1, 0.622)) +
      f(m.modA, model = "iid", param = c(1, 0.622)),
    family = "binomial", data = salamander.e1,
    Ntrials = rep(1, nrow(salamander.e1)))
> salamander.e1.hyperpar <- inla.hyperpar(salamander.e1.inla.fit)
> summary(salamander.e1.inla.fit)
# Fixed effects: posterior summaries for (Intercept), f.W, m.W and W.W.
# Model hyperparameters: the precisions for f.modA and m.modA.

Approximate Bayes Inference

Particularly in the context of a large number of tests, a quick and accurate method is desirable. We describe such a method in the context of a GWAS. We first recap the normal-normal Bayes model; subsequently we describe the approximation and provide an example.

Recall: The Normal Posterior Distribution

For the model with prior θ ~ N(μ0, τ0²) and likelihood Y1,...,Yn | θ ~iid N(θ, σ²), the posterior is

    θ | y1,...,yn ~ N(μ, τ²),

where the posterior precision is

    1/τ² = 1/τ0² + n/σ²,

and the posterior mean is

    μ = (μ0/τ0² + ȳ n/σ²) / (1/τ0² + n/σ²)
      = μ0 (1/τ0²)/(1/τ0² + n/σ²) + ȳ (n/σ²)/(1/τ0² + n/σ²),

a precision-weighted average of the prior mean μ0 and the sample mean ȳ.
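A numerical sketch (not in the slides) of these formulas, with hypothetical values for the prior parameters and the data summaries:

mu0 <- 0; tau0sq <- 1              # prior mean and variance (hypothetical)
sigsq <- 4; n <- 20; ybar <- 1.2   # data variance, sample size, sample mean (hypothetical)
post.var <- 1 / (1 / tau0sq + n / sigsq)                 # tau^2
post.mean <- post.var * (mu0 / tau0sq + n * ybar / sigsq)  # mu
c(post.mean, post.var)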

A Normal-Normal Approximate Bayes Model

Consider again the logistic regression model

    logit p_i = α + x_i θ,

with interest focusing on θ. We require priors for α and θ, and some numerical/analytical technique for estimation and Bayes factor calculation.

Wakefield (2007, 2009) considered replacing the likelihood by the approximation

    p(θ | θ̂) ∝ p(θ̂ | θ) p(θ),

where θ̂ | θ ~ N(θ, V) is the asymptotic distribution of the MLE, and θ ~ N(0, W) is the prior on the log relative risk. We can choose W so that 95% of relative risks lie in some range, e.g. [2/3, 1.5].

Posterior Distribution

Under the alternative, the posterior distribution for the log odds ratio θ is

    θ | θ̂ ~ N(r θ̂, r V),   where r = W / (V + W).

Hence we have shrinkage towards the prior mean of 0. The posterior median for the odds ratio is exp(r θ̂), and a 95% credible interval is exp(r θ̂ ± 1.96 √(rV)). Note that as W → ∞ the non-Bayesian point and interval estimates are recovered.

A Normal-Normal Approximate Bayes Model: Testing

We are interested in the hypotheses H0: θ = 0 and H1: θ ≠ 0, with evaluation of the Bayes factor

    BF = p(θ̂ | H0) / p(θ̂ | H1).

Using the approximate likelihood and the normal prior we obtain

    Approximate Bayes Factor (ABF) = (1/√(1 - r)) exp(-Z² r / 2),

with Z = θ̂/√V the usual Z statistic and r = W/(V + W).

The approximation can be combined with the prior odds on H0, PO = (1 - π1)/π1, where π1 is the prior probability of the alternative, to give

    Posterior Odds on H0 = ABF × PO = BFDP / (1 - BFDP),

where BFDP is the Bayesian False Discovery Probability. BFDP depends on the power, through r.

For implementation, all that we need from the data are the Z-score and V (the square of the standard error), or a confidence interval. Hence published results that report confidence intervals can be converted into Bayes factors for interpretation (see Lecture 8). The approximation relies on large sample sizes, so that the normal distribution of the estimator provides a good summary of the information in the data.
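These formulas are simple enough to code directly; the following is an added sketch (the original analysis instead sources the author's BFDPfunV function, used below):

# Approximate Bayes factor p(thetahat | H0) / p(thetahat | H1)
ABF <- function(thetahat, V, W) {
  r <- W / (V + W)
  Z <- thetahat / sqrt(V)
  sqrt(1 / (1 - r)) * exp(-Z^2 * r / 2)
}
# Bayesian false discovery probability, given prior P(H1) = pi1
BFDP <- function(thetahat, V, W, pi1) {
  PO <- (1 - pi1) / pi1   # prior odds on H0
  postodds <- ABF(thetahat, V, W) * PO
  postodds / (1 + postodds)
}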

Bayesian Logistic Regression: Estimation

> source("http://faculty.washington.edu/jonno/bfdp.r")
# The 97.5% point of the prior for theta is log(1.5), so that with
# probability 0.95 we believe the odds ratio lies in (2/3, 3/2):
> W <- (log(1.5) / 1.96)^2
> x <- c(0, 1, 2)
> y <- c(6, 8, 75)
> z <- c(10, 66, 163)
> logitmod <- glm(cbind(y, z) ~ x, family = binomial)
> thetahat <- logitmod$coef[2]
> V <- vcov(logitmod)[2, 2]
> r <- W / (V + W)
> r
# Not so much data here, so the weight on the prior is high.
# Bayesian posterior median, shrunk towards the prior median of 1:
> exp(r * thetahat)
# Bayesian approximate 95% credible interval:
> exp(r * thetahat - 1.96 * sqrt(r * V))
> exp(r * thetahat + 1.96 * sqrt(r * V))

Bayesian Logistic Regression: Hypothesis Testing

Now we turn to testing using Bayes factors. We examine the sensitivity to the prior on the alternative, π1.

> pi1 <- c(1/2, 1/100, 1/1000, 1/10000, 1/100000)
> BFcall <- BFDPfunV(thetahat, V, W, pi1)
> BFcall
# Returns the Bayes factor ($BF), the densities p(thetahat | H0) and
# p(thetahat | H1) ($pH0, $pH1), and the BFDP for each prior ($BFDP).

So the data are roughly twice as likely under the alternative as under the null. Apart from with the 0.5 prior, under these priors the overall evidence is of no association.

Combination of Data Across Studies

Suppose we wish to combine data from two studies, in which we assume a common log odds ratio θ. The estimates from the two studies are θ̂1 and θ̂2, with asymptotic variances V1 and V2. The Bayes factor is

    p(θ̂1, θ̂2 | H0) / p(θ̂1, θ̂2 | H1).

The approximate Bayes factor factorizes as

    ABF(θ̂1, θ̂2) = ABF(θ̂1) × ABF(θ̂2 | θ̂1),        (1)

where

    ABF(θ̂2 | θ̂1) = p(θ̂2 | H0) / p(θ̂2 | θ̂1, H1)

and

    p(θ̂2 | θ̂1, H1) = E_{θ | θ̂1}[ p(θ̂2 | θ) ],

so that the density is averaged with respect to the posterior for θ given the first study. Important point: the Bayes factors are not independent.

Combination of Data Across Studies (continued)

This leads to an approximate Bayes factor, summarizing the data from the two studies, of

    ABF(θ̂1, θ̂2) = √( W / (R V1 V2) ) × exp( -(1/2) [ Z1² R V2 + 2 Z1 Z2 R √(V1 V2) + Z2² R V1 ] ),

where

    R = W / (V1 W + V2 W + V1 V2),

and Z1 = θ̂1/√V1 and Z2 = θ̂2/√V2 are the usual Z statistics. The ABF will be small (evidence for H1) when the absolute values of Z1 and Z2 are large and they are of the same sign.
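A direct coding of the two-study ABF (an added sketch following the formula above):

ABF2 <- function(th1, V1, th2, V2, W) {
  R <- W / (V1 * W + V2 * W + V1 * V2)
  Z1 <- th1 / sqrt(V1)
  Z2 <- th2 / sqrt(V2)
  sqrt(W / (R * V1 * V2)) *
    exp(-0.5 * (Z1^2 * R * V2 + 2 * Z1 * Z2 * R * sqrt(V1 * V2) + Z2^2 * R * V1))
}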

Example of Combination of Studies in a GWAS

Frayling et al. (2007) report a GWAS for Type II diabetes. For one SNP, rs (identifier truncated in the transcription), the two stages give odds ratio estimates of 1.27 and 1.15.

[Table: for each stage (1st, 2nd, combined), the estimate with confidence interval, the p-value, log10 BF, and Pr(H0 | data) under prior probabilities 1/5,000 and 1/50,000 on the alternative; most numerical entries did not survive transcription.]

The combined evidence is stronger than that from each stage separately, since the point estimates are in agreement.

For summarizing inference, the (5%, 50%, 95%) points for the relative risk are (the 5% and 95% points were lost in transcription):

    Prior         1.00
    First stage   1.26
    Combined      1.21

Conclusions

- Computationally, GLMs and GLMMs can now be fitted in a relatively straightforward way.
- INLA is very convenient and is being constantly improved.
- As with all analyses, it is crucial to check modeling assumptions (and there are usually more of them in a Bayesian analysis).
- For binary observations INLA can produce inaccurate estimates; Markov chain Monte Carlo provides an alternative for computation. WinBUGS is one popular implementation, and other MCMC possibilities include JAGS and BayesX.

References

Breslow, N. and Clayton, D. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88.

Fong, Y., Rue, H., and Wakefield, J. (2010). Bayesian inference for generalized linear mixed models. Biostatistics, 11.

Frayling, T., Timpson, N., Weedon, M., Zeggini, E., Freathy, R., et al. (2007). A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science, 316.

Karim, M. and Zeger, S. (1992). Generalized linear models with random effects: salamander mating revisited. Biometrics, 48.

McCullagh, P. and Nelder, J. (1989). Generalized Linear Models, Second Edition. Chapman and Hall, London.

Wakefield, J. (2007). A Bayesian measure of the probability of false discovery in genetic epidemiology studies. American Journal of Human Genetics, 81.

Wakefield, J. (2009). Bayes factors for genome-wide association studies: comparison with p-values. Genetic Epidemiology, 33.
