Generalized linear models IV Examples


1 Generalized linear models IV: Examples
Peter McCullagh, Department of Statistics, University of Chicago
Polokwane, South Africa, November 2013

2 Outline
Decay rates of vitamin C
Ship damage data
Fisher's tuberculin data
Birth date and death date
Drosophila diet and assortative mating

3 Example: Decay rates of vitamin C
Ascorbic acid concentrations of snap-beans after cold storage
[Table: concentration by weeks of storage and storage temperature (°F), with totals; entries not recoverable]
Response Y = concentration
Gaussian model versus gamma model
Half-life estimation
Model checking
Compatibility with pure error estimate

4 Gaussian non-linear model
Response: Y = concentration (not log(concentration))
Exponential decay model: $E(Y(t)) = \mu(t) = \exp(\beta_0 - \beta_T t)$
Log link: $\log \mu(t) = \beta_0 - \beta_T t$
Model formula: as.factor(temp):time
$\beta_0 = \log \mu(0)$ (same for every temp)
Half-life: $\log(2)/\beta_T$ at storage temperature T
    glm(y ~ temp:time, family=gaussian(link=log))
The fitted temp:time coefficients estimate $-\beta_T$.
Coefficients (numeric entries lost where marked […]):
                 Estimate  Std. Error  t-value  Pr(>|t|)
    (Intercept)    […]       […]        […]     […]e-16 ***
    temp0:time     […]       […]        […]     […]
    temp10:time    […]       […]        […]     […] ***
    temp20:time    […]       […]        […]     […]e-08 ***
(Dispersion parameter for gaussian family taken to be […])
Null deviance: […] on 11 degrees of freedom
Residual deviance: […] on 8 degrees of freedom
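A minimal sketch of the fit above, run on simulated stand-in data rather than the snap-bean measurements. The storage times, decay rates, starting concentration and noise level are all assumptions chosen only so that the design has 12 observations, matching the 11 and 8 degrees of freedom quoted on the slide.

```r
# Simulated stand-in data: NOT the snap-bean measurements.
set.seed(1)
temp <- factor(rep(c(0, 10, 20), each = 4))
time <- rep(c(2, 4, 6, 8), times = 3)            # hypothetical weeks of storage
rate <- c(0.005, 0.02, 0.13)[as.integer(temp)]   # hypothetical decay rates per week
mu   <- exp(log(45) - rate * time)               # common intercept log mu(0)
y    <- mu + rnorm(length(mu), sd = 0.5)         # Gaussian errors on the original scale

# Gaussian family with log link: shared intercept, one slope per temperature.
fit <- glm(y ~ temp:time, family = gaussian(link = "log"))

beta_hat  <- -coef(fit)[-1]      # glm reports -beta_T, so flip the sign
half_life <- log(2) / beta_hat   # half-life in weeks at each temperature
print(round(half_life, 1))
```

Because the intercept is shared, the three temperatures differ only through their slopes, which is what the `temp:time` formula without a `temp` main effect expresses.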

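5 Half-life
Decay rate $\beta \ge 0$; half-life: $\log(2)/\beta$
CI for $\beta$: $(\hat\beta - t\,\mathrm{se},\ \hat\beta + t\,\mathrm{se})_+$
Confidence interval for half-life (90%): $\big(\log(2)/(\hat\beta + t\,\mathrm{se}),\ \log(2)/(\hat\beta - t\,\mathrm{se})\big)_+$
Half-life CI (weeks) by storage temperature (columns $\hat\beta$, SE and $\log(2)/\hat\beta$ not recoverable):
    Model      first temp     second temp    third temp
    Gaussian   (89.4, […])    (22.3, 43.0)   (4.8, 5.7)
    Gamma      (82.2, […])    (21.9, 42.3)   (4.9, 5.6)
Half-life intervals not symmetric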
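The inversion of a CI for the decay rate into a CI for the half-life can be sketched as follows. The function name `half_life_ci` and the numeric inputs are hypothetical; the positive-part truncation follows the "+" subscript on the slide, so a lower rate limit of zero gives an infinite upper half-life limit.

```r
# Hypothetical helper: CI for half-life log(2)/beta from beta_hat and its SE.
half_life_ci <- function(beta_hat, se, df, level = 0.90) {
  tq <- qt(1 - (1 - level) / 2, df)
  lo_beta <- max(beta_hat - tq * se, 0)   # positive part of the rate limits
  hi_beta <- max(beta_hat + tq * se, 0)
  c(estimate = log(2) / beta_hat,
    lower    = log(2) / hi_beta,          # large rate  -> short half-life
    upper    = log(2) / lo_beta)          # Inf when lo_beta is 0
}

ci_a <- half_life_ci(beta_hat = 0.02,  se = 0.004, df = 8)  # illustrative inputs
ci_b <- half_life_ci(beta_hat = 0.005, se = 0.004, df = 8)  # rate CI reaches 0
print(ci_a)
print(ci_b)
```

The reciprocal map is what makes the half-life interval asymmetric about the point estimate even though the interval for the rate is symmetric.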

6 Model checking using replicate info
External check: each response is a sum of measurements for 3 packets: $\mathrm{var}(Y_i) = 3\sigma^2$, where $\sigma^2$ = packet variance
Individual measurements not available, but replicate mean squared error = […] on 24 df
Model mean squared error […] on 8 df
F-ratio: F = 1.125/[…] = 0.53, at the 18% point of the $F_{8,24}$ distribution
External variance estimate helps to avoid over-fitting
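The position of the F-ratio within its reference distribution can be checked with `pf`. This sketch uses only the ratio 0.53 and the degrees of freedom quoted on the slide; the variable names are illustrative.

```r
F_ratio <- 0.53                               # value quoted on the slide
p_lower <- pf(F_ratio, df1 = 8, df2 = 24)     # lower-tail probability under F(8, 24)
print(p_lower)
```

A lower-tail probability that is neither very small nor very large indicates that the model mean square is compatible with the external pure-error estimate.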

7 Ship damage data
Background: structured data from Lloyd's of London in 1980
Cargo-carrying ships of five types A-E
Construction year: 4-level factor with levels 60-64; 65-69; 70-74; […]
Operation period: 2-level factor with levels […] and 74-79 (pre and post OPEC)
Exposure unit t: ship-months at risk ([…])
Response: number of damage incidents caused by waves to the forward section...
Same ship may experience more than one incident
Same ship may operate in both periods

8 Number of reported damage incidents and aggregate months service by ship type, year of construction and period of operation
[Table with columns: ship type (A-E), year of construction, period of operation, aggregate months service, number of damage incidents; eight rows per ship type, one of them marked *; numeric entries not recoverable]

9 Statistical considerations for ship damage data
Response is an event count in (0, t), a non-negative integer, suggesting a Poisson process or a renewal process
Possibility of moderate over-dispersion, $\mathrm{var}(Y) > E(Y)$, for various reasons...
Focus on event rates, suggesting $E(Y_t) \propto t$; t = 0 implies $\mu = 0$ and Y = 0
Three factors stype, cons, oper affecting accident rate; multiplicative effects more plausible than additive
Leading to initial log-linear model $Y_i \sim \mathrm{Po}(\mu_i)$ (independent components)
Main-effects additive model: $\log \mu_i = \beta_0 + \log(t_i) + \alpha_{\mathrm{stype}} + \beta_{\mathrm{cons}} + \gamma_{\mathrm{oper}}$
Role of log(t): offset (coefficient = 1), not a covariate; does not appear in the model formula
    glm(y ~ stype+cons+oper, family=poisson(), offset=log(t))
Six cells with t = 0 are uninformative and may be omitted
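A sketch of the main-effects fit, assuming that the `ships` data frame in the R package MASS (columns `type`, `year`, `period`, `service`, `incidents`) is the same data set as the one on the slides. The variable names therefore differ from the slide's `stype`, `cons`, `oper` and `t`.

```r
library(MASS)                         # assumed source of the ship damage data

d <- subset(ships, service > 0)       # drop the cells with no exposure (t = 0)
d$year   <- factor(d$year)            # construction period as a factor
d$period <- factor(d$period)          # operation period as a factor

# log(service) enters as an offset: its coefficient is fixed at 1.
fit <- glm(incidents ~ type + year + period,
           family = poisson(), offset = log(service), data = d)

print(c(deviance = deviance(fit), df = df.residual(fit)))
```

Cells with zero exposure are removed before fitting because `log(0)` is not a usable offset value.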

17 Some conclusions for ship damage data
Main-effects model: deviance = 38.7 on 25 df; $X^2$ = 42.3 on 25 df
Suggests moderate over-dispersion (or interaction): $X^2/\mathrm{df} = 1.69$
Stationarity: coefficient of log(t): $\hat\beta = 0.9 \pm 0.1$, consistent with $\beta = 1$
Interactions: by adding 2-factor terms (one at a time); not much evidence of interaction
    X2 <- sum((y - fit$fitted)^2/fit$fitted)
    summary(fit, dispersion=X2/25)
to get correct standard errors with dispersion factor
Conclusions regarding risk factors
ship type relative rates: 1.00, 0.58, 0.51, 0.93, 1.38
operation: pre 74: 1.00, post 74: 1.47
construction period: […]
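The dispersion adjustment, the rate ratios and the stationarity check can be sketched together, again assuming that `MASS::ships` holds the slide's data. The object names `phi` and `fit_t` are illustrative.

```r
library(MASS)                                   # assumed source of the data
d <- subset(ships, service > 0)
d$year <- factor(d$year); d$period <- factor(d$period)

fit <- glm(incidents ~ type + year + period,
           family = poisson(), offset = log(service), data = d)

# Pearson statistic and the implied dispersion factor.
X2  <- sum(residuals(fit, type = "pearson")^2)
phi <- X2 / df.residual(fit)
print(summary(fit, dispersion = phi)$coefficients)   # inflated standard errors

# Multiplicative effects on the incident rate, relative to the baseline levels.
rate_ratio <- exp(coef(fit))
print(round(rate_ratio, 2))

# Stationarity check: let log(service) have a free coefficient instead of 1.
fit_t <- glm(incidents ~ type + year + period + log(service),
             family = poisson(), data = d)
b_t <- coef(fit_t)[["log(service)"]]
print(b_t)
```

Passing `dispersion = phi` rescales the standard errors by the square root of the dispersion factor without changing the point estimates.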

23 Tuberculin study design: a 4 × 4 Latin square
Design for tuberculin assay
    Site on    Cow class
    neck       I    II   III  IV
    1          A    B    C    D
    2          B    A    D    C
    3          C    D    A    B
    4          D    C    B    A
Responses in mm, by cow class I-IV: [table entries not recoverable]
Treatments:
A: Standard double: Wey=0; log vol = 1
B: Standard single: Wey=0; log vol = 0
C: Weybridge single: Wey=1; log vol = 0
D: Weybridge half: Wey=1; log vol = -1
    y <- c(454, 249, 349, 249, 408, 322,...,290)
    site <- gl(4, 4, 16); class <- gl(4, 1, 16)
    wey <- c(0,0,1,1, 0,0,1,1, 1,1,0,0, 1,1,0,0)
    vol <- c(1,0,0,-1, 0,1,-1,0, 0,-1,1,0, -1,0,0,1)
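As a consistency check on the coding, the treatment letters can be rebuilt from the `wey` and `vol` vectors and compared with the Latin square. The helper names `trt` and `layout` are illustrative; the responses are not needed for this check.

```r
site  <- gl(4, 4, 16)                 # sites 1..4, four observations each
class <- gl(4, 1, 16)                 # cow classes I..IV cycling within site
wey <- c(0,0,1,1, 0,0,1,1, 1,1,0,0, 1,1,0,0)
vol <- c(1,0,0,-1, 0,1,-1,0, 0,-1,1,0, -1,0,0,1)

# Recover the treatment letter from (Wey, log vol).
trt <- ifelse(wey == 0,
              ifelse(vol == 1, "A", "B"),    # Standard: double / single
              ifelse(vol == 0, "C", "D"))    # Weybridge: single / half

layout <- matrix(trt, 4, 4, byrow = TRUE,
                 dimnames = list(site = 1:4, class = c("I", "II", "III", "IV")))
print(layout)

# Latin-square property: each treatment once per site and once per class.
print(table(trt, site))
print(table(trt, class))
```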

24 Log-linear model for tuberculin data
Note: response is a measured variable, not a count
    fit <- glm(y~site+class+wey+vol, family=poisson(link=log))
    X2 <- sum((y - fit$fitted)^2/fit$fitted)
    summary(fit, dispersion=X2/7)
Coefficient output for wey and vol: […]
Fitted response: ... + $\beta_w I(W) + \beta \log(\mathrm{vol})$
Relative potency of A to B: ratio vol(B)/vol(A) required to produce equal responses
$\beta_w + \beta \log(\mathrm{vol}(w)) = \beta_s + \beta \log(\mathrm{vol}(s))$
$\log(\mathrm{vol}(s)/\mathrm{vol}(w)) = (\beta_w - \beta_s)/\beta$
Need a CI for the ratio $\beta_w/\beta$ of regression coefficients

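25 Fieller's method (CI for ratio of means)
Suppose $(Y_1, Y_2)^T \sim N_2\big((\mu_1, \mu_2)^T, \Sigma\big)$, $\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$, with $\Sigma$ known (for simplicity).
We observe $(y_1, y_2)$ and want a CI for $\theta = \mu_1/\mu_2$.
Fieller's (1954) pivotal argument:
$Y_1 - \theta Y_2 \sim N(0, \tau^2(\theta))$, with $\tau^2(\theta) = \sigma_{11} - 2\theta\sigma_{12} + \theta^2\sigma_{22}$
$R(Y; \theta) = \dfrac{Y_1 - \theta Y_2}{\sqrt{\sigma_{11} - 2\theta\sigma_{12} + \theta^2\sigma_{22}}} \sim N(0, 1)$
90% CI: $\{\theta : -1.645 < R(y; \theta) < 1.645\}$
Properties:
exact 90% coverage under assumptions as stated
Interval always includes $\hat\theta = y_1/y_2$ (non-empty)
Either $I = (\theta_L(y), \theta_U(y))$ (bounded) or I is the complement of $(\theta_L(y), \theta_U(y))$
Interval may be whole space: $I = \mathbb{R}$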

26 Fieller interval with estimated variance
$(Y_1, Y_2)^T \sim N_2\big((\mu_1, \mu_2)^T, \sigma^2 C\big)$, $C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$, with C known.
$s^2 \sim \sigma^2 \chi^2_f / f$, independent of Y
Fieller pivotal t-statistic: $R(Y; \theta) = \dfrac{Y_1 - \theta Y_2}{s\sqrt{C_{11} - 2\theta C_{12} + \theta^2 C_{22}}} \sim t_f$
$(1 - \alpha)$ CI: $\{\theta : t_f(\alpha/2) < R(y; \theta) < t_f(1 - \alpha/2)\}$
Boundary: $R^2(y, \theta) = t_f^2(\alpha/2)$ (quadratic in $\theta$)
Generates random intervals of same structure:
exact coverage as stated
may be a bounded real interval or the complement
may be the whole space
does not meet with universal approval
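Solving the quadratic for the interval end points can be sketched as below. The function `fieller` and its argument names are hypothetical, the inputs in the examples are made up, and the degenerate case in which the leading coefficient is exactly zero is not handled.

```r
# Hypothetical helper: Fieller set for theta = mu1/mu2.
# Solves (y1 - theta*y2)^2 <= t^2 * s2 * (C11 - 2*theta*C12 + theta^2*C22).
fieller <- function(y1, y2, C, s2, df, level = 0.95) {
  g <- qt(1 - (1 - level) / 2, df)^2 * s2
  a <- y2^2    - g * C[2, 2]          # coefficient of theta^2
  b <- y1 * y2 - g * C[1, 2]          # minus half the coefficient of theta
  k <- y1^2    - g * C[1, 1]          # constant term
  disc <- b^2 - a * k
  if (disc < 0) {                     # no real roots: every theta is accepted
    return(list(type = "whole line", limits = c(-Inf, Inf)))
  }
  roots <- sort((b + c(-1, 1) * sqrt(disc)) / a)
  list(type = if (a > 0) "bounded" else "complement", limits = roots)
}

f1 <- fieller(1,   1,   diag(2), s2 = 0.01, df = 7)  # denominator well away from 0
f2 <- fieller(1,   0.1, diag(2), s2 = 0.01, df = 7)  # denominator near 0
f3 <- fieller(0.1, 0.1, diag(2), s2 = 0.01, df = 7)  # both means near 0
print(f1); print(f2); print(f3)
```

For the "complement" type the accepted set is everything outside the two reported limits, which is the unbounded case listed on the slide.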

27 Tuberculin relative potency (continued)
Relative potency of Weybridge to Standard: ratio vol(s)/vol(w) required to produce equal responses
$\beta_w + \beta \log(\mathrm{vol}(w)) = \beta_s + \beta \log(\mathrm{vol}(s))$
$\theta = \log(\mathrm{vol}(s)/\mathrm{vol}(w)) = (\beta_w - \beta_s)/\beta$
Need a CI for the ratio of regression coefficients:
$(\hat\beta_w, \hat\beta)^T \sim N_2\big((\beta_w, \beta)^T, \sigma^2 C\big)$ (entries of C not recoverable); $s^2$ = […] on 7 d.f.
Point estimate: $\hat\theta$ = […]/[…] = […]
Fieller 95% CI for $\theta$: (0.8735, […]) for log relative potency
CI for relative potency $2^\theta$: (1.83, 2.22) (logs coded to base 2)
Weybridge roughly twice as potent as Standard

28 Graphical illustration of Fieller interval
[Figure: $R^2(y, \theta)$ plotted against $\theta$]
95% confidence interval for the log relative potency of Weybridge to Standard using the Fieller (or LR) pivotal statistic.
95% cutoff = qf(0.95, 1, 7) = […]
95% CI: (0.874, […], 1.153)
99% CI: (0.804, […], 1.223)
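Two small calculations connect the quoted numbers: the F cutoff is the square of the corresponding t quantile, and the limits on the base-2 log scale map to relative potency through $2^\theta$. The variable names are illustrative.

```r
cutoff <- qf(0.95, df1 = 1, df2 = 7)       # cutoff for R^2(y, theta)
t_sq   <- qt(0.975, df = 7)^2              # same quantity via the t distribution
print(c(cutoff = cutoff, t_squared = t_sq))

log2_limits <- c(0.874, 1.153)             # 95% limits for theta quoted above
potency     <- 2^log2_limits               # back-transform (logs to base 2)
print(round(potency, 2))
```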

29 Association between birth month and death month
Dates for 348 famous Americans: Phillips and Feldman (1973)
[Table: 12 × 12 counts with columns indexed by month of death (J-D) and row labels J-D; counts not recoverable]
Q1: Is there an association between B and D?
Q2: Is the distribution of differences D - B unusual?

30 Association between birth month and death month
[Table: the same 12 × 12 layout by month of death, with row and column totals (T) added; counts not recoverable]
Q1: Is there an association between B and D?
Q2: Is the distribution of differences D - B unusual?

31 Statistical strategies for birth and death
1. Terminology: the table and the values
   M = {Jan, Feb, ..., Nov, Dec}
   Tbl = $M^2$ (144 ordered pairs (empty cells))
   $Y : \mathrm{Tbl} \to \mathbb{R}$, value in the table
   structure in the table; pattern in the values
2. The indexing system is the table
   Homologous factors A and B (same set of levels)
   Diagonal is special: A = B
3. Circularly ordered levels give additional structure
   a metric $d(x, x') = d(x', x)$ on M is a function on $M^2$, a covariate
4. Q: Is there a pattern in the values associated with the table structure, e.g. with the metric?
Without structure: $X^2$ = 109.9; dev = […] on 121 df; no indication of any pattern...

32 Focusing the question by exploiting structure
Data tabulated by metric: death month - birth month (mod 12)
[Table with columns diff, count, fitted, resid; entries not recoverable]
(uniform): $X^2$ = 22.1 on 11 df; p = 2.4%
(non-uniform): $X^2$ = 21.7 on 11 df; p = 2.7%
Total deaths (3m before, 3m after) = (73, 114)
Total deaths (5m before, 5m after) = (124, 174)
Similar analysis using GLMs
    fit0 <- glm(y ~ birth+death, offset=log(days), family=poisson())
    fit1 <- glm(y ~ birth+death+diff,...
Dev0 - Dev1 = 22.6 on 11 df
Other conclusions:
Excess of famous births in Jan; big deficit May, June
Excess deaths April, July; big deficit in Nov
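A sketch of the two nested fits on simulated counts, since the birth and death table itself is not available here. The counts are drawn with no association, the `days` offset from the slide is omitted, and the data frame name `tab` is illustrative. The point is only how the circular difference enters as an 11-df factor.

```r
set.seed(2)
month <- factor(month.abb, levels = month.abb)
tab <- expand.grid(birth = month, death = month)       # the 144 cells of M^2

# Circular difference death - birth (mod 12), used as a factor with 12 levels.
tab$diff <- factor((as.integer(tab$death) - as.integer(tab$birth)) %% 12)

# Simulated counts with no association: NOT the Phillips and Feldman data.
tab$y <- rpois(nrow(tab), lambda = 348 / 144)

fit0 <- glm(y ~ birth + death, family = poisson(), data = tab)
fit1 <- update(fit0, . ~ . + diff)
print(anova(fit0, fit1, test = "Chisq"))               # Dev0 - Dev1 on 11 df
```

The residual degrees of freedom of `fit0` are 144 - 23 = 121, which matches the reference distribution quoted for the unstructured analysis.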

33 Hypergeometric distributions
Space: 2-way tables of non-negative integers $y_{rs}$
Row and column totals specified
Hypergeometric distribution: $p(y \mid \text{totals}) = \dfrac{\prod_r y_{r.}! \ \prod_s y_{.s}!}{y_{..}! \ \prod_{r,s} y_{rs}!}$
Hypergeometric distribution by random matching
Two lists of length $n = y_{..}$
R-labels: $R = (R_1, \ldots, R_n)$ (values $1, \ldots, k$); $y_{r.}$ individuals have $R_i = r$
C-labels: $C = (C_1, \ldots, C_n)$ (values $1, \ldots, k'$); $y_{.c}$ individuals have $C_i = c$
List matching: $i \mapsto \pi(i)$, giving $(R, C_\pi) = (R_1, C_{\pi(1)}), \ldots, (R_n, C_{\pi(n)})$
Table: $H_{rs} = \#\{i : (R_i, C_{\pi(i)}) = (r, s)\}$
Row totals of H: same as those of y
Uniform random matching: $\pi$ uniform on the group of permutations

34 p-values by hypergeometric simulation
Given 2-way array Y
Generate a random hypergeometric table $Y^* \sim H(\ldots)$ with the same marginal totals as Y
    rowsums <- rowSums(y); colsums <- colSums(y); n <- sum(y)
    xr <- rep(1:nrow(y), rowsums)
    xc <- rep(1:ncol(y), colsums)
    ystar <- table(xr, xc[order(runif(n))])
Given a scalar statistic T defined on tables, compute the distribution of $T(Y^*)$ with $Y^* \sim H(\ldots)$
    value <- numeric(nsim)
    for(sim in 1:nsim){
      ystar <- table(xr, xc[order(runif(n))])
      value[sim] <- T(ystar)
    }
Compute Monte Carlo p-value:
    sum(value >= T(y)) / nsim
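The simulation recipe can be packaged as a self-contained function. This is a sketch: `pearson` and `mc_pvalue` are hypothetical names, the 3 × 3 table is made up, and `sample()` replaces `order(runif(n))` as an equivalent way to draw a uniform random permutation.

```r
# Pearson statistic for independence in a two-way table.
pearson <- function(tab) {
  e <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - e)^2 / e)
}

# Monte Carlo p-value under the hypergeometric (fixed-margins) distribution.
mc_pvalue <- function(y, stat = pearson, nsim = 2000) {
  # Factors with explicit levels keep the table shape fixed in every draw.
  xr <- factor(rep(seq_len(nrow(y)), rowSums(y)), levels = seq_len(nrow(y)))
  xc <- factor(rep(seq_len(ncol(y)), colSums(y)), levels = seq_len(ncol(y)))
  value <- replicate(nsim, stat(table(xr, sample(xc))))  # random matching
  mean(value >= stat(y))
}

set.seed(3)
y <- matrix(c(12,  5,  3,
               4, 10,  6,
               2,  6, 11), 3, 3, byrow = TRUE)   # made-up table
p <- mc_pvalue(y)
print(p)
```

Any scalar statistic can be passed as `stat`, so the same function covers the deviance or the reduced statistics based on the circular difference.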

35 Distribution of Pearson statistic in sparse case
[Figure, four panels: Pearson statistic versus chisq(121); deviance statistic versus chisq(121); reduced P statistic versus chisq(11); reduced D statistic versus chisq(11)]

36 Experimental Design: PNAS paper Fig. 1

37 Mating events for 18 generations
[Table by generation (Gen) with columns: null; single mating wells cc, cs, sc, ss; double mating wells cc.cs, sc.ss, cc.ss, cs.sc; activity level zero, two, three, four flies; Tot; entries not recoverable]
$X^2$ = 48.4; $X^2$ = 30.2; $X^2$ = 7.8; $X^2$ = […]
Data taken from the Yekutieli report
Activity level: double mating rate decreasing with gen
Given the activity level, the type distribution is constant?

38 Are the events in different wells independent?
y: homogamic rate for single mating wells
x: homogamic rate for double mating wells
weighted Pearson correlation: […]

39 Null distribution of Pearson correlation statistic
Statistic: weighted correlation of 18 pairs of binomial fractions
    cor(y1/m1, y2/m2, wt)
Simulate a table Y1 for homrate1
Simulate an independent table Y2 for homrate2
same row and column totals as in observed tables
Compute the sample correlation r
Repeat $10^4$ times for a null distribution
Where does the observed value lie relative to the distribution?
Answer by simulation: $F(-0.65) \approx 1/850$
Answer by normal approx: $F(-0.65) \approx 1/350$
Are events in distinct wells independent?
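A simplified sketch of such a null simulation. It departs from the slide in two labelled ways: the tables are replaced by independent binomial draws rather than hypergeometric tables with the observed margins, and the well counts `m1`, `m2`, the success probability and the weights are all made up. The weighted correlation uses `cov.wt` from the stats package, since `cor` itself takes no weights.

```r
# Weighted correlation of two vectors; weights are normalised to sum to 1.
wcor <- function(x, y, w) {
  cov.wt(cbind(x, y), wt = w / sum(w), cor = TRUE)$cor[1, 2]
}

set.seed(4)
m1 <- rep(40, 18)            # hypothetical numbers of single mating wells
m2 <- rep(20, 18)            # hypothetical numbers of double mating wells
w  <- m1 + m2                # hypothetical weights

# Null distribution: the two homogamic rates are generated independently.
null_r <- replicate(10000, {
  y1 <- rbinom(18, m1, 0.5)
  y2 <- rbinom(18, m2, 0.5)
  wcor(y1 / m1, y2 / m2, w)
})

tail_prob <- mean(null_r <= -0.65)   # simulated lower-tail probability
print(tail_prob)
```

The simulated tail probability from this simplified scheme need not agree with the 1/850 quoted above, which was obtained with the observed margins held fixed.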

40 Are the events in different wells independent?
[Figure: density of the weighted sample correlation under the null distribution; $F(-0.65) = 1/850$]

41 Limitations of significance testing
Posterior odds versus significance levels
posterior odds = $\dfrac{P(\text{data} \mid \text{independence})}{P(\text{data} \mid \text{non-independence})} \times$ prior odds
How to evaluate the denominator?
Big question: Why are events in different wells not independent?
Open scientific question, not a statistical question
Speculations:


More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

STA 450/4000 S: January

STA 450/4000 S: January STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Various Issues in Fitting Contingency Tables

Various Issues in Fitting Contingency Tables Various Issues in Fitting Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Complete Tables with Zero Entries In contingency tables, it is possible to have zero entries in a

More information

Reaction Days

Reaction Days Stat April 03 Week Fitting Individual Trajectories # Straight-line, constant rate of change fit > sdat = subset(sleepstudy, Subject == "37") > sdat Reaction Days Subject > lm.sdat = lm(reaction ~ Days)

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Improving the Precision of Estimation by fitting a Generalized Linear Model, and Quasi-likelihood.

Improving the Precision of Estimation by fitting a Generalized Linear Model, and Quasi-likelihood. Improving the Precision of Estimation by fitting a Generalized Linear Model, and Quasi-likelihood. P.M.E.Altham, Statistical Laboratory, University of Cambridge June 27, 2006 This article was published

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Categorical Variables and Contingency Tables: Description and Inference

Categorical Variables and Contingency Tables: Description and Inference Categorical Variables and Contingency Tables: Description and Inference STAT 526 Professor Olga Vitek March 3, 2011 Reading: Agresti Ch. 1, 2 and 3 Faraway Ch. 4 3 Univariate Binomial and Multinomial Measurements

More information

Regression Models for Risks(Proportions) and Rates. Proportions. E.g. [Changes in] Sex Ratio: Canadian Births in last 60 years

Regression Models for Risks(Proportions) and Rates. Proportions. E.g. [Changes in] Sex Ratio: Canadian Births in last 60 years Regression Models for Risks(Proportions) and Rates Proportions E.g. [Changes in] Sex Ratio: Canadian Births in last 60 years Parameter of Interest: (male)... just above 0.5 or (male)/ (female)... "ODDS"...

More information

Linear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model

Linear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables

More information

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015 Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Faculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models

Faculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models Faculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models Examiner: Professor K.J. Worsley Associate Examiner: Professor R. Steele Date: Thursday, April 17, 2008 Time: 14:00-17:00

More information

Notes for week 4 (part 2)

Notes for week 4 (part 2) Notes for week 4 (part 2) Ben Bolker October 3, 2013 Licensed under the Creative Commons attribution-noncommercial license (http: //creativecommons.org/licenses/by-nc/3.0/). Please share & remix noncommercially,

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Poisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Poisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Poisson Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Poisson Regression 1 / 49 Poisson Regression 1 Introduction

More information

Answer Key for STAT 200B HW No. 8

Answer Key for STAT 200B HW No. 8 Answer Key for STAT 200B HW No. 8 May 8, 2007 Problem 3.42 p. 708 The values of Ȳ for x 00, 0, 20, 30 are 5/40, 0, 20/50, and, respectively. From Corollary 3.5 it follows that MLE exists i G is identiable

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y

INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y Predictor or Independent variable x Model with error: for i = 1,..., n, y i = α + βx i + ε i ε i : independent errors (sampling, measurement,

More information

MSH3 Generalized linear model

MSH3 Generalized linear model Contents MSH3 Generalized linear model 7 Log-Linear Model 231 7.1 Equivalence between GOF measures........... 231 7.2 Sampling distribution................... 234 7.3 Interpreting Log-Linear models..............

More information

options description set confidence level; default is level(95) maximum number of iterations post estimation results

options description set confidence level; default is level(95) maximum number of iterations post estimation results Title nlcom Nonlinear combinations of estimators Syntax Nonlinear combination of estimators one expression nlcom [ name: ] exp [, options ] Nonlinear combinations of estimators more than one expression

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Generalized linear models

Generalized linear models Generalized linear models Christopher F Baum ECON 8823: Applied Econometrics Boston College, Spring 2016 Christopher F Baum (BC / DIW) Generalized linear models Boston College, Spring 2016 1 / 1 Introduction

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

Package HGLMMM for Hierarchical Generalized Linear Models

Package HGLMMM for Hierarchical Generalized Linear Models Package HGLMMM for Hierarchical Generalized Linear Models Marek Molas Emmanuel Lesaffre Erasmus MC Erasmus Universiteit - Rotterdam The Netherlands ERASMUSMC - Biostatistics 20-04-2010 1 / 52 Outline General

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Generalised linear models. Response variable can take a number of different formats

Generalised linear models. Response variable can take a number of different formats Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 Work all problems. 60 points are needed to pass at the Masters Level and 75

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs Outline Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009,

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

Review of Poisson Distributions. Section 3.3 Generalized Linear Models For Count Data. Example (Fatalities From Horse Kicks)

Review of Poisson Distributions. Section 3.3 Generalized Linear Models For Count Data. Example (Fatalities From Horse Kicks) Section 3.3 Generalized Linear Models For Count Data Review of Poisson Distributions Outline Review of Poisson Distributions GLMs for Poisson Response Data Models for Rates Overdispersion and Negative

More information

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News: Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Sample solutions. Stat 8051 Homework 8

Sample solutions. Stat 8051 Homework 8 Sample solutions Stat 8051 Homework 8 Problem 1: Faraway Exercise 3.1 A plot of the time series reveals kind of a fluctuating pattern: Trying to fit poisson regression models yields a quadratic model if

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal

More information

Case-Control Association Testing. Case-Control Association Testing

Case-Control Association Testing. Case-Control Association Testing Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/ Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.

More information

Lecture 3 Linear random intercept models

Lecture 3 Linear random intercept models Lecture 3 Linear random intercept models Example: Weight of Guinea Pigs Body weights of 48 pigs in 9 successive weeks of follow-up (Table 3.1 DLZ) The response is measures at n different times, or under

More information

Generalized Linear Models in R

Generalized Linear Models in R Generalized Linear Models in R NO ORDER Kenneth K. Lopiano, Garvesh Raskutti, Dan Yang last modified 28 4 2013 1 Outline 1. Background and preliminaries 2. Data manipulation and exercises 3. Data structures

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

MSH3 Generalized linear model Ch. 6 Count data models

MSH3 Generalized linear model Ch. 6 Count data models Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence ST3241 Categorical Data Analysis I Two-way Contingency Tables Odds Ratio and Tests of Independence 1 Inference For Odds Ratio (p. 24) For small to moderate sample size, the distribution of sample odds

More information