Statistical Methods III Statistics 212. Problem Set 2 - Answer Key


1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term follow-up of 1423 patients being treated in a kidney-stone clinic. The data set contains the number of kidney stones that each patient has formed from the beginning of treatment at the clinic, together with the number of years of treatment. Patient age (at baseline) and sex are recorded. A small number of patients have only one functional kidney, either because the kidney was surgically removed or never functioned; this is also recorded in the data set. Only patients with at least one year of follow-up (treatment) are included in the dataset. The data can be found on the course web-site at dgillen/stat212/data/stones.txt. Scientific interest lies in determining whether patients with only one functioning kidney have a higher or lower rate of stone development when compared to those patients that have two functioning kidneys.

(a) Produce descriptive statistics relevant to the scientific question of interest.

Solution: Answers to this will vary. At minimum, summary statistics of the population stratified by the predictor of interest should have been produced (this should have included years of follow-up). A histogram of the rate of stone development stratified by the predictor of interest would also have helped to visualize the fact that the data are zero-inflated. Finally, a boxplot of rates of stone development by the predictor of interest could also have been produced.

(b) Specify an a priori model for answering the scientific question of interest. You should justify the choice of adjustment variables in your model.

Solution: There are few covariates available for the analysis. A very quick reading of any literature would show that age and male sex are risk factors for kidney stones.
In addition, multiple reports have suggested that there is an interaction between these covariates (at younger ages, males have higher rates of stone development, but females catch up at later ages). As such, I would be adjusting for these as potential precision variables (and possible confounders, though there is no information on why individuals have one kidney, whether because of failure or because they were a donor). I note that there are a lot of potential unmeasured confounding factors. Among them are sodium intake, water consumption, hypertension, and hereditary kidney disease.

(c) Notationally, write down your a priori regression model. Explain, using your model, how the rate is related to the actual count of kidney stones. Provide an interpretation of the intercept (presuming you have one in your model) and the coefficient associated with the predictor of interest. Note that you will probably wish to transform these parameters to provide meaningful interpretations.

Solution: My a priori model is given as follows:

    log(λ_i) = β_0 + β_1 (age_i - 40) + β_2 sex_i + β_3 (age_i - 40) × sex_i + β_4 nx1_i

where λ_i is the rate of kidney stone development for subject i. If Y_i is the number of kidney stones for subject i and Y_i has mean µ_i, then µ_i = λ_i × yrfu_i, and so we have the model

    log(µ_i) = β_0 + β_1 (age_i - 40) + β_2 sex_i + β_3 (age_i - 40) × sex_i + β_4 nx1_i + log(yrfu_i)

(d) Use Poisson regression to fit your model and interpret the estimated intercept (presuming you have one in your model) and the estimated coefficient associated with the predictor of interest. Note that you will probably wish to transform these parameters to provide meaningful interpretations.

Solution: Below is my initial model fit. Note that I have computed the average age over the course of follow-up, as this is probably a more honest measure of subject age (given the high variance of follow-up times).

> ### Create variable for the average age over the course of followup
> stones$mean.age <- (stones$age + (stones$age + stones$yrfu))/2
> ### Fit a priori regression model with Poisson regression
> fit <- glm( stones ~ I(mean.age-40)*sex + nx1, data=stones,
+             family=poisson, offset=log(yrfu) )
> glmci( fit )
                     exp( Est )  ci95.lo  ci95.hi  z value  Pr(>|z|)
(Intercept)
I(mean.age - 40)
sex
nx1
I(mean.age - 40):sex
[numeric values not preserved in transcription]

Interpretations of the (transformed) intercept and coefficient associated with the predictor of interest are as follows: From our model, we estimate that the rate of stones among 40-year-old male patients with two functioning kidneys is approximately 0.1174 per person per year. We estimate that the rate of stone development among patients with one functioning kidney is approximately 50% that of patients with two functioning kidneys who are similar with respect to age and sex.

(e) Using your fitted model, examine the data for overdispersion. What do you conclude? Specifically, do the kidney stone count data have a Poisson distribution for a randomly-selected person given their covariates? Why or why not?

Solution: From the squared Pearson residual plot, it certainly appears that these data are overdispersed relative to the Poisson distribution (the smoother is consistently above the horizontal line y = 1). Note that the plot does not support a simple scalar (i.e., quasi-Poisson) form of overdispersion.

[Figure 1: Squared Pearson residual plot to assess overdispersion (squared Pearson residual vs. fitted mean; plot is zoomed-in for a better visual representation).]

(f) Refit your model accounting for overdispersion by:

i. using a scaled overdispersion model, scaling the standard errors by the Pearson statistic

ii. using the robust variance estimator to adjust the standard errors of the regression coefficient estimates

Solution:

> ### Quasi fit and robust SE
> fit.quasi <- glm( stones ~ I(mean.age-40)*sex + nx1, data=stones,
+                   family=quasipoisson, offset=log(yrfu) )
> summary( fit.quasi )

Call:
glm(formula = stones ~ I(mean.age - 40) * sex + nx1, family = quasipoisson,
    data = stones, offset = log(yrfu))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
[numeric values not preserved in transcription]

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                                        <2e-16 ***
I(mean.age - 40)                                          *
sex
nx1
I(mean.age - 40):sex
[numeric values not preserved in transcription]
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be [not preserved])

    Null deviance: [not preserved]  on 1422  degrees of freedom
Residual deviance: [not preserved]  on 1418  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 6

> glmci( fit.quasi )
                     exp( Est )  ci95.lo  ci95.hi  z value  Pr(>|z|)
(Intercept)
I(mean.age - 40)
sex
nx1
I(mean.age - 40):sex
[numeric values not preserved in transcription]

> glmci( fit, robust=TRUE )
                     exp( Est )  robust ci95.lo  robust ci95.hi  robust z value  robust Pr(>|z|)
(Intercept)
I(mean.age - 40)
sex
nx1
I(mean.age - 40):sex
[numeric values not preserved in transcription]

Notice from the above that, regardless of the method, the interval estimates are much wider after accounting for overdispersion. While the scaled version is the more extreme correction, the Pearson plot does not support such a model for the overdispersion.

(g) Now refit your model using negative binomial regression fit via maximum likelihood (see Problem Set 1). How do your estimates and corresponding inference compare with the Poisson fit and the overdispersion models in (f)?

Solution:

> ### Negative binomial fit
> library(MASS)
> fit.nb <- glm.nb( stones ~ I(mean.age-40)*sex + nx1 + offset(log(yrfu)), data=stones )
> summary( fit.nb )

Call:
glm.nb(formula = stones ~ I(mean.age - 40) * sex + nx1 + offset(log(yrfu)),
    data = stones, init.theta = 0.1394, link = log)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
[numeric values not preserved in transcription]

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)
I(mean.age - 40)
sex
nx1
I(mean.age - 40):sex
[numeric values not preserved in transcription]
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(0.1394) family taken to be 1)

    Null deviance: [not preserved]  on 1422  degrees of freedom
Residual deviance: [not preserved]  on 1418  degrees of freedom
AIC: 2658

Number of Fisher Scoring iterations: 1

              Theta:  0.1394
          Std. Err.:  [not preserved]
 2 x log-likelihood:  [not preserved]

We can see that there are slight differences in the regression parameter estimates because of the different weights being applied with the negative binomial model (recall that the quasi-Poisson and robust variance estimators will have the same coefficient estimates as the usual Poisson model). However, the estimates are certainly not qualitatively different, and the resulting inference from the negative binomial model is similar to that from using the robust variance estimator.

(h) Which of the overdispersion models would you select and why?

Solution: Given the relatively large sample size, I would prefer to use the robust variance estimator for these data. The data clearly seem to be overdispersed relative to the Poisson distribution, and the scalar form of overdispersion is not supported by the Pearson plot. The Pearson plot would be approximately linear (with non-zero slope) under the negative binomial model; this again is not supported by the data. The robust variance estimator makes no assumption about the form of the mean-variance relationship. It may produce less efficient estimators than the negative binomial model if the negative binomial assumption were indeed correct (since the optimal weights are likely not being used), but that certainly does not seem to be the case here, and it is truly less of a concern to me than guaranteeing valid inference across a wide range of possible mean-variance relationships.

(i) Based upon your final model, what do you conclude regarding the association between the number of functioning kidneys and the rate of kidney stone development? What are the limitations of your analysis?

Solution: We estimate that the rate of stone development among patients with one functioning kidney is approximately 50% that of patients with two functioning kidneys who are similar with respect to age and sex (aRR = 0.499; 95% CI: 0.286, 0.871; p = 0.0145).
However, this study comes with multiple limitations: All patients were sampled from a single specialty clinic, calling into question the generalizability of the results. Further, there is no information on the reason for patients having only one kidney, nor is there information on the actual timing of stone development. Finally, there are numerous unmeasured potential confounding factors, as previously discussed.

2. (To be turned in on Wednesday, April 25th) Following our discussion in Lecture 1 that pertained to the impact of correlation on precision, in this problem we will consider the impact of correlated data in the setting of linear regression. Specifically, we will consider the validity of ordinary least squares (OLS) estimates when outcomes are dependent and covariates vary within and between clusters. We will induce correlation by assuming each subject has a random intercept term so that a subject's response is given by the model:

    Y_{ij} = β_0 + β_1 t_{ij} + b_{0,i} + ε_{ij},   i = 1, ..., n,   j = 1, ..., J,

where β = (β_0, β_1) consists of fixed effect parameters defining the mean model for the response, and we assume that b_{0,i} ~ iid Normal(0, τ^2) and ε_{ij} ~ iid Normal(0, σ^2), with b_{0,i} and ε_{ij} independent. Further assume that observations on different clusters are independent, so that the correlation between Y_{ij} and Y_{kl} is 0 for i ≠ k.

(a) Based upon the above model specification, what is the mean and variance of Y_{ij}?

Solution:

Expectation:

    E[Y_{ij}] = E[β_0 + β_1 t_{ij} + b_{0,i} + ε_{ij}]
              = β_0 + β_1 t_{ij} + E[b_{0,i}] + E[ε_{ij}]
              = β_0 + β_1 t_{ij}

Variance:

    Var[Y_{ij}] = Var[β_0 + β_1 t_{ij} + b_{0,i} + ε_{ij}]
                = Var[b_{0,i} + ε_{ij}]
                = Var[b_{0,i}] + Var[ε_{ij}] + 2 Cov(b_{0,i}, ε_{ij})
                = τ^2 + σ^2

(b) Based upon the above model specification, what is the covariance and correlation between Y_{ij} and Y_{ij'}, j ≠ j'? Combine this with your answer from (a) to write down the covariance matrix for Y_i, the vector of responses for cluster i.

Solution:

Covariance:

    Cov(Y_{ij}, Y_{ij'}) = Cov(β_0 + β_1 t_{ij} + b_{0,i} + ε_{ij}, β_0 + β_1 t_{ij'} + b_{0,i} + ε_{ij'})
                         = Cov(b_{0,i} + ε_{ij}, b_{0,i} + ε_{ij'})
                         = Cov(b_{0,i}, b_{0,i}) + Cov(b_{0,i}, ε_{ij'}) + Cov(ε_{ij}, b_{0,i}) + Cov(ε_{ij}, ε_{ij'})
                         = τ^2

Correlation:

    Corr(Y_{ij}, Y_{ij'}) = Cov(Y_{ij}, Y_{ij'}) / sqrt( Var(Y_{ij}) Var(Y_{ij'}) )
                          = τ^2 / (τ^2 + σ^2)

Covariance matrix (J × J, exchangeable):

    Cov(Y_i) = [ τ^2 + σ^2   τ^2         ...   τ^2
                 τ^2         τ^2 + σ^2   ...   τ^2
                 ...         ...         ...   ...
                 τ^2         τ^2         ...   τ^2 + σ^2 ]

(c) Now consider using OLS to estimate β where t_{ij} is constant within a cluster. Consider the case n = 25 and J = 10 (i.e., 10 measurements on each of 25 clusters). Further suppose that σ^2 = 10 and τ^2 = 5. Sample the values of t_{ij} from a Uniform(1,10) distribution using R with a random seed of 12345. Using these values of t_{ij}, i = 1, ..., 25, j = 1, ..., 10, compute the true variance of the OLS estimator of β. (Hint: It may be useful to note that the OLS estimator can be written as β̂ = ( Σ_{i=1}^n X_i^T X_i )^{-1} Σ_{i=1}^n X_i^T Y_i, where X_i is the design matrix (dimension 10 × 2) corresponding to cluster i.)
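The Sigmai matrix constructed in the R solution code below is exactly this exchangeable Cov(Y_i) with τ^2 = 5 and σ^2 = 10. As a hypothetical cross-check (a Python sketch, not part of the original R solution), one can verify the implied within-cluster correlation τ^2/(τ^2 + σ^2) = 5/15 = 1/3 directly from the matrix, and confirm the off-diagonal covariance τ^2 by simulating the random-intercept model:

```python
import numpy as np

# Values from part (c): tau^2 = 5, sigma^2 = 10, J = 10
tau2, sigma2, J = 5.0, 10.0, 10

# Exchangeable Cov(Y_i): tau^2 + sigma^2 on the diagonal, tau^2 off the diagonal
Sigma = tau2 * np.ones((J, J)) + sigma2 * np.eye(J)

# Implied within-cluster correlation: tau^2 / (tau^2 + sigma^2)
rho = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])

# Monte Carlo confirmation: simulate b_{0,i} + eps_{ij} for many clusters,
# two observations per cluster; their empirical covariance should be near tau^2
rng = np.random.default_rng(0)
m = 200_000
b = rng.normal(0.0, np.sqrt(tau2), size=(m, 1))      # shared random intercept
eps = rng.normal(0.0, np.sqrt(sigma2), size=(m, 2))  # independent errors
Y = b + eps
emp_cov = np.cov(Y[:, 0], Y[:, 1])[0, 1]             # approximately tau^2 = 5
```

This is only a numerical illustration of the derivation; the graded solution uses the R code below to build the same matrix.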
Solution:

set.seed(12345)
n <- 25
J <- 10
id <- rep( 1:n, each=J )
t.ij <- rep( runif(n,1,10), each=J )
# cbind(id, t.ij)
Sigmai <- matrix(5, nrow=J, ncol=J) + diag(10, nrow=J)
XtX <- Reduce( "+", lapply( split(t.ij,id),
               function(x){ t(cbind(1,x)) %*% cbind(1,x) } ) )
XtSigmaX <- Reduce( "+", lapply( split(t.ij,id),
               function(x){ t(cbind(1,x)) %*% Sigmai %*% cbind(1,x) } ) )
Var.beta <- solve(XtX) %*% XtSigmaX %*% solve(XtX)
colnames(Var.beta) <- c("Beta_0", "Beta_1")
rownames(Var.beta) <- c("Beta_0", "Beta_1")
Var.beta
[numeric output not preserved in transcription; per part (d), the (Beta_1, Beta_1) element, the true variance of the OLS slope estimator, is approximately 0.034]

(d) Now simulate 10,000 datasets for each parameter scenario given in the table below using the model and sampling scheme given in (c) (you may assume that the values of t_{ij} are fixed by design and hence do not vary by simulation). For each dataset, compute the OLS estimate of β_1 and the model-based variance estimate as computed by lm(). Use the results of your simulation study to fill in a table of the form:

β_0   β_1   σ^2   τ^2   E[β̂_1]   Mean V̂ar[β̂_1]   Obs Var[β̂_1]   Cov. Prob.   Type I Error
[table entries not preserved in transcription]

where the columns represent the mean OLS estimate of β_1, the mean of the model-based variance, the observed variance, the coverage probability of a 95% confidence interval for β_1 based upon the model-based variance, and the observed type I error rate (only for the case β_1 = 0). Comment on the validity of the OLS estimate and corresponding inference in each of the above cases. Specifically, comment on how the mean of the model-based and observed variance relate to the true variance you computed in (c) for the case τ^2 = 5.

Solution: Code is posted on the course webpage. Briefly, OLS is consistent for the true value regardless of the value of τ^2. When τ^2 = 0 (i.e., under uncorrelated data), the model-based variance (i.e., Mean V̂ar[β̂_1]) and the observed variance, which is an estimate of the true variance, are the same. Therefore, inference based on the model has the correct coverage probability and correct Type I error rate. For τ^2 > 0 (i.e., under correlated data), as the table in part (d) shows, the model-based variance is less than the observed variance, indicating that OLS underestimates the true variance. This results in a lower coverage probability and a higher Type I error rate than the nominal values (i.e., unreliable inference when τ^2 > 0). In particular, when τ^2 = 5 the observed variance, which is expected to be an estimate of the true variance, is very close to the variance computed in part (c) (both are about 0.034), while the model-based variance (OLS model) is far less than the true variance. This leads to a lower coverage probability and a higher Type I error rate.
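Since the graded simulation code is posted separately (in R) and the table entries were not preserved, the following hypothetical Python sketch reproduces the qualitative part (d) finding with illustrative fixed effects β_0 = β_1 = 1 (assumed values, not from the original table) and a reduced number of replicates: with a cluster-constant covariate, the average model-based variance from OLS understates the observed sampling variance of β̂_1.

```python
import numpy as np

rng = np.random.default_rng(12345)
n, J = 25, 10
sigma2, tau2 = 10.0, 5.0
beta0, beta1 = 1.0, 1.0      # illustrative fixed effects (assumed)

# Cluster-constant covariate, as in part (c)
t = np.repeat(rng.uniform(1, 10, n), J)
X = np.column_stack([np.ones(n * J), t])
XtX_inv = np.linalg.inv(X.T @ X)

est, model_var = [], []
for _ in range(2000):        # 2,000 replicates for speed (10,000 in the problem)
    b = np.repeat(rng.normal(0, np.sqrt(tau2), n), J)   # random intercepts
    y = beta0 + beta1 * t + b + rng.normal(0, np.sqrt(sigma2), n * J)
    bhat = XtX_inv @ (X.T @ y)                          # OLS estimate
    s2 = np.sum((y - X @ bhat) ** 2) / (n * J - 2)      # residual variance, as lm() uses
    est.append(bhat[1])
    model_var.append(s2 * XtX_inv[1, 1])                # model-based Var(beta1_hat)

mean_est = float(np.mean(est))            # close to beta1 (OLS remains consistent)
obs_var = float(np.var(est, ddof=1))      # observed sampling variance
mean_model_var = float(np.mean(model_var))
# With correlated data and a cluster-constant covariate,
# mean_model_var is substantially smaller than obs_var.
```

The same comparison in R would use lm() within the simulation loop; the point of the sketch is only the ordering of the two variance summaries.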
(e) Now consider using OLS to estimate β where t_{ij} varies within a cluster. Again consider the case n = 25, J = 10, and suppose that σ^2 = 10 and τ^2 = 5. Assume that t_{ij} = j for all i = 1, ..., 25, j = 1, ..., 10. Compute the true variance of the OLS estimator of β.

Solution:

##
##### Compute true variance of OLSE under varying within-cluster covariate
##
n <- 25
J <- 10
id <- rep( 1:n, each=J )
t.ij <- rep( 1:J, n )
Sigmai <- matrix(5, nrow=J, ncol=J) + diag(10, nrow=J)
XtX <- Reduce( "+", lapply( split(t.ij,id),
               function(x){ t(cbind(1,x)) %*% cbind(1,x) } ) )
XtSigmaX <- Reduce( "+", lapply( split(t.ij,id),
               function(x){ t(cbind(1,x)) %*% Sigmai %*% cbind(1,x) } ) )
Var.beta <- solve(XtX) %*% XtSigmaX %*% solve(XtX)
colnames(Var.beta) <- c("Beta_0", "Beta_1")
rownames(Var.beta) <- c("Beta_0", "Beta_1")
Var.beta
[numeric output not preserved in transcription]

(f) Again simulate 10,000 datasets using the above model and values of t_{ij} given in (e). For each dataset, compute the OLS estimate of β_1 and the model-based variance estimate as computed by lm(), and produce a table analogous to that given in part (d) (same values for β_1, σ^2, and τ^2). Comment on the validity of the OLS estimate and corresponding inference in each of the above cases. Specifically, comment on how the mean of the model-based and observed variance relate to the true variance you computed in (e) for the case τ^2 = 5.

Solution:

β_0   β_1   σ^2   τ^2   E[β̂_1]   Mean V̂ar[β̂_1]   Obs Var[β̂_1]   Cov. Prob.   Type I Error
[table entries not preserved in transcription]

Again, OLS is consistent for the true value regardless of the value of τ^2. When τ^2 = 0 (i.e., under uncorrelated data), the model-based variance (i.e., Mean V̂ar[β̂_1]) and the observed variance, which is an estimate of the true variance, are the same. Therefore, inference based on the model has the correct coverage probability and correct Type I error rate. For τ^2 > 0 (i.e., under correlated data), as the table in part (f) shows, the model-based variance is greater than the observed variance, indicating that OLS overestimates the true variance. This results in a higher coverage probability and a lower Type I error rate than the nominal values (i.e., unreliable inference when τ^2 > 0). In particular, when τ^2 = 5 the observed variance, which is expected to be an estimate of the true variance, is very close to the variance computed in part (e), while the model-based variance (OLS model) is greater than the true variance. This leads to a higher coverage probability and a lower Type I error rate.
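An analogous hypothetical Python sketch for part (f), again with assumed fixed effects β_0 = β_1 = 1 and fewer replicates than the problem requires (the graded solution used R and lm()), shows the direction reversing: with t_{ij} = j balanced within every cluster, the random intercepts cancel out of the slope estimator, so the model-based variance now overstates the observed sampling variance of β̂_1.

```python
import numpy as np

rng = np.random.default_rng(2023)
n, J = 25, 10
sigma2, tau2 = 10.0, 5.0
beta0, beta1 = 1.0, 1.0      # illustrative fixed effects (assumed)

# Covariate varying within cluster: t_ij = j, as in part (e)
t = np.tile(np.arange(1, J + 1, dtype=float), n)
X = np.column_stack([np.ones(n * J), t])
XtX_inv = np.linalg.inv(X.T @ X)

est, model_var = [], []
for _ in range(2000):        # 2,000 replicates for speed (10,000 in the problem)
    b = np.repeat(rng.normal(0, np.sqrt(tau2), n), J)   # random intercepts
    y = beta0 + beta1 * t + b + rng.normal(0, np.sqrt(sigma2), n * J)
    bhat = XtX_inv @ (X.T @ y)                          # OLS estimate
    s2 = np.sum((y - X @ bhat) ** 2) / (n * J - 2)      # residual variance, as lm() uses
    est.append(bhat[1])
    model_var.append(s2 * XtX_inv[1, 1])                # model-based Var(beta1_hat)

mean_est = float(np.mean(est))            # still close to beta1
obs_var = float(np.var(est, ddof=1))      # observed sampling variance
mean_model_var = float(np.mean(model_var))
# Because the within-cluster means of t are identical across clusters, the
# intercepts b_{0,i} contribute nothing to beta1_hat, yet they still inflate s2:
# mean_model_var exceeds obs_var, the crossover-design direction of the error.
```

Together with the part (d) sketch, this illustrates the two directions of model-based variance error summarized in part (g).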
(g) Summarize the conclusions from this exercise regarding the impact of correlated data when using OLS to estimate model parameters. Comment on how the results you observed here relate to our discussion of the repeated measures and case-crossover designs from Lecture 1.

Solution: Ordinary least squares (OLS) estimation ignores any correlation in the data. Under correlated data, the model-based variances of the OLS estimates are incorrect; however, the estimates themselves remain consistent. Depending on the study design, the model-based variance may underestimate (part d) or overestimate (part f) the true variance. In part (d), data are simulated under the repeated measures design. Under this design, the OLS model underestimates the true variance and therefore leads to a lower coverage probability and a higher Type I error rate. In part (f), data are simulated under the crossover design. Under this design, the OLS model overestimates the true variance and therefore leads to a higher coverage probability and a lower Type I error rate.


More information

PAPER 30 APPLIED STATISTICS

PAPER 30 APPLIED STATISTICS MATHEMATICAL TRIPOS Part III Wednesday, 5 June, 2013 9:00 am to 12:00 pm PAPER 30 APPLIED STATISTICS Attempt no more than FOUR questions, with at most THREE from Section A. There are SIX questions in total.

More information

2.1 Linear regression with matrices

2.1 Linear regression with matrices 21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and

More information

Poisson Regression. The Training Data

Poisson Regression. The Training Data The Training Data Poisson Regression Office workers at a large insurance company are randomly assigned to one of 3 computer use training programmes, and their number of calls to IT support during the following

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

More information

BOOTSTRAPPING WITH MODELS FOR COUNT DATA

BOOTSTRAPPING WITH MODELS FOR COUNT DATA Journal of Biopharmaceutical Statistics, 21: 1164 1176, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.607748 BOOTSTRAPPING WITH MODELS FOR

More information

Contents. 1 Introduction: what is overdispersion? 2 Recognising (and testing for) overdispersion. 1 Introduction: what is overdispersion?

Contents. 1 Introduction: what is overdispersion? 2 Recognising (and testing for) overdispersion. 1 Introduction: what is overdispersion? Overdispersion, and how to deal with it in R and JAGS (requires R-packages AER, coda, lme4, R2jags, DHARMa/devtools) Carsten F. Dormann 07 December, 2016 Contents 1 Introduction: what is overdispersion?

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models 1/37 The Kelp Data FRONDS 0 20 40 60 20 40 60 80 100 HLD_DIAM FRONDS are a count variable, cannot be < 0 2/37 Nonlinear Fits! FRONDS 0 20 40 60 log NLS 20 40 60 80 100 HLD_DIAM

More information

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes Lecture 2.1 Basic Linear LDA 1 Outline Linear OLS Models vs: Linear Marginal Models Linear Conditional Models Random Intercepts Random Intercepts & Slopes Cond l & Marginal Connections Empirical Bayes

More information

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure What do you do when you don t have a line? Nonlinear Models Spores 0e+00 2e+06 4e+06 6e+06 8e+06 30 40 50 60 70 longevity What do you do when you don t have a line? A Quadratic Adventure 1. If nonlinear

More information

Checking model assumptions with regression diagnostics

Checking model assumptions with regression diagnostics @graemeleehickey www.glhickey.com graeme.hickey@liverpool.ac.uk Checking model assumptions with regression diagnostics Graeme L. Hickey University of Liverpool Conflicts of interest None Assistant Editor

More information

Checking the Poisson assumption in the Poisson generalized linear model

Checking the Poisson assumption in the Poisson generalized linear model Checking the Poisson assumption in the Poisson generalized linear model The Poisson regression model is a generalized linear model (glm) satisfying the following assumptions: The responses y i are independent

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Lecture 3 Linear random intercept models

Lecture 3 Linear random intercept models Lecture 3 Linear random intercept models Example: Weight of Guinea Pigs Body weights of 48 pigs in 9 successive weeks of follow-up (Table 3.1 DLZ) The response is measures at n different times, or under

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

MSH3 Generalized linear model Ch. 6 Count data models

MSH3 Generalized linear model Ch. 6 Count data models Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear

More information

Lecture 1 Intro to Spatial and Temporal Data

Lecture 1 Intro to Spatial and Temporal Data Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1

More information

1/15. Over or under dispersion Problem

1/15. Over or under dispersion Problem 1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let

More information

PAPER 206 APPLIED STATISTICS

PAPER 206 APPLIED STATISTICS MATHEMATICAL TRIPOS Part III Thursday, 1 June, 2017 9:00 am to 12:00 pm PAPER 206 APPLIED STATISTICS Attempt no more than FOUR questions. There are SIX questions in total. The questions carry equal weight.

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Chapter 3: Generalized Linear Models

Chapter 3: Generalized Linear Models 92 Chapter 3: Generalized Linear Models 3.1 Components of a GLM 1. Random Component Identify response variable Y. Assume independent observations y 1,...,y n from particular family of distributions, e.g.,

More information

Random Intercept Models

Random Intercept Models Random Intercept Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline A very simple case of a random intercept

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours Instructions: STATS216v Introduction to Statistical Learning Stanford University, Summer 2017 Remember the university honor code. Midterm Exam (Solutions) Duration: 1 hours Write your name and SUNet ID

More information

Logistic Regression 21/05

Logistic Regression 21/05 Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Review of Poisson Distributions. Section 3.3 Generalized Linear Models For Count Data. Example (Fatalities From Horse Kicks)

Review of Poisson Distributions. Section 3.3 Generalized Linear Models For Count Data. Example (Fatalities From Horse Kicks) Section 3.3 Generalized Linear Models For Count Data Review of Poisson Distributions Outline Review of Poisson Distributions GLMs for Poisson Response Data Models for Rates Overdispersion and Negative

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analsis of Longitudinal Data Patrick J. Heagert PhD Department of Biostatistics Universit of Washington 1 Auckland 2008 Session Three Outline Role of correlation Impact proper standard errors Used to weight

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 Work all problems. 60 points needed to pass at the Masters level, 75 to pass at the PhD

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Lecture 9 STK3100/4100

Lecture 9 STK3100/4100 Lecture 9 STK3100/4100 27. October 2014 Plan for lecture: 1. Linear mixed models cont. Models accounting for time dependencies (Ch. 6.1) 2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3) Examples

More information

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 29, 2015 Lecture 5: Multiple Regression Review of ANOVA & Simple Regression Both Quantitative outcome Independent, Gaussian errors

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58 Generalized Linear Mixed-Effects Models Copyright c 2015 Dan Nettleton (Iowa State University) Statistics 510 1 / 58 Reconsideration of the Plant Fungus Example Consider again the experiment designed to

More information

Generalized Linear Models in R

Generalized Linear Models in R Generalized Linear Models in R NO ORDER Kenneth K. Lopiano, Garvesh Raskutti, Dan Yang last modified 28 4 2013 1 Outline 1. Background and preliminaries 2. Data manipulation and exercises 3. Data structures

More information

Generalized Estimating Equations (gee) for glm type data

Generalized Estimating Equations (gee) for glm type data Generalized Estimating Equations (gee) for glm type data Søren Højsgaard mailto:sorenh@agrsci.dk Biometry Research Unit Danish Institute of Agricultural Sciences January 23, 2006 Printed: January 23, 2006

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

General Regression Model

General Regression Model Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

Lecture 2: Linear and Mixed Models

Lecture 2: Linear and Mixed Models Lecture 2: Linear and Mixed Models Bruce Walsh lecture notes Introduction to Mixed Models SISG, Seattle 18 20 July 2018 1 Quick Review of the Major Points The general linear model can be written as y =

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information