Stat 704: Data Analysis I, Fall 2010


1 Stat 704: Data Analysis I, Fall 2010

Generalized linear models

Generalized linear models (GLMs) generalize regular regression to non-normal data {(Y_i, x_i)}_{i=1}^N, most often Bernoulli or Poisson Y_i. The general theory of GLMs has been developed for outcomes in the exponential family (normal, gamma, Poisson, binomial, negative binomial, ordinal/nominal multinomial).

The i-th mean is µ_i = E(Y_i).

The i-th linear predictor is η_i = β_0 + β_1 x_{i1} + ... + β_k x_{ik} = x_i'β.

A GLM relates the mean to the linear predictor through a link function g(·):

    g(µ_i) = η_i = x_i'β.
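
To make the link function concrete, here are two standard special cases (a supplementary illustration, not from the original slide):

    Y_i ~ Poisson(µ_i):  canonical link g(µ) = log(µ),          so log(µ_i) = x_i'β;
    Y_i ~ Bern(π_i):     canonical link g(π) = log[π/(1 − π)],  so logit(π_i) = x_i'β.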

2 Chapter 14: Logistic Regression

14.1, 14.2 Binary response regression

Let Y_i ~ Bern(π_i). Y_i might indicate the presence/absence of a disease, whether someone has obtained their driver's license or not, etc.

We wish to relate the probability of success to explanatory covariates x_i = (x_{i1},...,x_{ik})' through π_i = π(x_i) = g^{-1}(x_i'β). So then Y_i ~ Bern(π(x_i)), with E(Y_i) = π(x_i) and var(Y_i) = π(x_i)[1 − π(x_i)].

g(·) is called a link function. It relates E(Y_i) = π(x_i) to η_i = x_i'β.

3 Simplest link: the identity, g(x) = x

When g(x) = x, the identity link, we have π(x_i) = x_i'β. When x_i is one-dimensional, this reduces to Y_i ~ Bern(α + βx_i).

When x_i is large or small, π(x_i) can be less than zero or greater than one, so this link is appropriate only for a restricted range of x_i values.

It can of course be extended to π(x_i) = x_i'β where x_i = (1, x_{i1},...,x_{ik})', and can be fit in SAS PROC GENMOD.

4 Example: Association between snoring (as measured by a snoring score) and heart disease. Let s be someone's snoring score, s ∈ {0, 2, 4, 5}.

                                Heart disease     Proportion
  Snoring              s        yes      no       yes
  Never                0         24    1355       0.017
  Occasionally         2         35     603       0.055
  Nearly every night   4         21     192       0.099
  Every night          5         30     224       0.118

This is fit in PROC GENMOD:

data glm;
input snoring disease total @@;
datalines;
0 24 1379  2 35 638  4 21 213  5 30 254
;
proc genmod; model disease/total = snoring / dist=bin link=identity;
run;

5 The GENMOD Procedure (selected output; "…" marks values lost in transcription)

Model Information
  Description           Value
  Distribution          BINOMIAL
  Link Function         IDENTITY
  Dependent Variable    DISEASE
  Dependent Variable    TOTAL
  Observations Used     4
  Number Of Events      110
  Number Of Trials      2484

Criteria For Assessing Goodness Of Fit
  Criterion            DF   Value   Value/DF
  Deviance              2     …        …
  Pearson Chi-Square    2     …        …
  Log Likelihood        …     …        …

Analysis Of Parameter Estimates
  Parameter   DF   Estimate   Std Err   ChiSquare   Pr>Chi
  INTERCEPT    1    0.0172       …          …          …
  SNORING      1    0.0198       …          …          …
  SCALE        0       …         …          …          …

6 The fitted model is π̂(s) = 0.0172 + 0.0198s. For every unit increase in snoring score s, the probability of heart disease increases by about 2%.

The p-values test H_0: α = 0 and H_0: β = 0. The latter is more interesting, and it is soundly rejected: the probability of heart disease is strongly, linearly related to the snoring score.

What do you think that SCALE term is in the output?

Note: P(χ²_2 > deviance) gives a goodness-of-fit p-value (later...)

7 14.2, 14.3 Logistic regression

Often a fixed change in x has less impact when π(x) is near zero or one.

Example: Let π(x) be the probability of getting an A in a statistics class, where x is the number of hours a week you work on homework. When x = 0, increasing x by 1 will change your (very small) probability of an A very little. When x = 4, adding an hour will change your probability quite a bit. When x = 20, that additional hour probably won't improve your chances of getting an A much; you were at essentially π(x) ≈ 1 at x = 10.

Of course, this is a mean model. Individuals will vary.

8 The most widely used nonlinear function to model probabilities is the canonical logit link:

    logit(π_i) = log[π_i/(1 − π_i)] = α + βx_i.

Solving for π_i and then dropping the subscript, the probability of success (Y = 1) as a function of x is

    π(x) = exp(α + βx) / [1 + exp(α + βx)].

When β > 0 the function increases from 0 to 1; when β < 0 it decreases. When β = 0 the function is constant for all values of x, and Y is unrelated to x.

The inverse logit function is logit^{-1}(x) = e^x/(1 + e^x).
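
As a quick concrete illustration (a sketch, not from the original slides; the data set and variable names are made up), curves like those in Figure 1 below can be drawn with a DATA step and PROC GPLOT:

data curves;
do x = -10 to 10 by 0.1;
  p1 = exp(0 + 1.0*x)/(1 + exp(0 + 1.0*x));   * (alpha, beta) = (0, 1);
  p2 = exp(0 + 0.4*x)/(1 + exp(0 + 0.4*x));   * (alpha, beta) = (0, 0.4);
  p3 = exp(-2 + 0.4*x)/(1 + exp(-2 + 0.4*x)); * (alpha, beta) = (-2, 0.4);
  output;
end;
run;
proc gplot data=curves;
symbol1 i=join; symbol2 i=join; symbol3 i=join;
plot (p1 p2 p3)*x / overlay;
run;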

9 Figure 1: Logistic curves π(x) = e^{α+βx}/(1 + e^{α+βx}) with (α, β) = (0, 1), (0, 0.4), (−2, 0.4), (−3, 1). What about (α, β) = (log 2, 0)?

10 To fit the snoring data with the logistic regression model we use the same SAS code as before (PROC GENMOD), except we specify LINK=LOGIT, and obtain α̂ = −3.87 and β̂ = 0.40 as maximum likelihood estimates.

Criteria For Assessing Goodness Of Fit: Deviance and Pearson Chi-Square (each on 2 DF) and Log Likelihood (values lost in transcription).

Analysis Of Parameter Estimates: rows for INTERCEPT, SNORING, and SCALE, each with Estimate, Std Err, ChiSquare, and Pr>Chi (values lost in transcription).

You can also use PROC LOGISTIC to fit binary regression models:

proc logistic; model disease/total = snoring;

11 The LOGISTIC Procedure (selected output)

  Response Variable (Events): DISEASE
  Response Variable (Trials): TOTAL
  Number of Observations: 4
  Link Function: Logit

Response Profile
  Ordered Value   Binary Outcome   Count
  1               EVENT              110
  2               NO EVENT          2374

Model Fitting Information and Testing Global Null Hypothesis BETA=0: AIC and -2 LOG L for the intercept-only and intercept-and-covariates fits (values lost in transcription); the -2 LOG L chi-square for covariates and the Score test each have 1 DF (p=0.0001).

Analysis of Maximum Likelihood Estimates: rows for INTERCPT and SNORING, each with Estimate, Standard Error, Wald Chi-Square, Pr > Chi-Square, Standardized Estimate, and Odds Ratio.

12 Association of Predicted Probabilities and Observed Responses: Concordant = 58.6%, Discordant = 16.7%, Tied = 24.7% (Somers' D, Gamma, Tau-a, the number of pairs, and c were lost in transcription).

The fitted model is then

    π̂(x) = exp(−3.87 + 0.40x) / [1 + exp(−3.87 + 0.40x)].

As before, we reject H_0: β = 0; there is a strong, positive association between snoring score and developing heart disease.

Note: P(χ²_2 > deviance) gives a goodness-of-fit p-value (later...)

13 Example: Crab mating

Data on N = 173 female horseshoe crabs:

- C = color (1, 2, 3, 4 = light medium, medium, dark medium, dark).
- S = spine condition (1, 2, 3 = both good, one worn or broken, both worn or broken).
- W = carapace width (cm).
- Wt = weight (kg).
- Sa = number of satellites (additional male crabs besides her nest-mate husband) nearby.

We are initially interested in the probability that a female horseshoe crab has one or more satellites (Y_i = 1) as a function of carapace width.

14 data crabs;
input color spine width satell weight;
weight=weight/1000; color=color-1;
y=0; if satell>0 then y=1;
datalines;
  (the 173 data lines were lost in transcription)

15 ;
proc logistic data=crabs;
model y(event='1')=width / link=logit aggregate scale=none lackfit;

event='1' tells SAS to model π_i = P(Y_i = 1) rather than π_i = P(Y_i = 0).

The default link is logit (giving logistic regression); I specify it here anyway for transparency.

aggregate scale=none lackfit gives goodness-of-fit statistics; more later...

16 The LOGISTIC Procedure (selected output)

Model Information
  Response Variable           y
  Number of Response Levels   2
  Model                       binary logit
  Optimization Technique      Fisher's scoring

Response Profile: frequencies 62 (y = 0) and 111 (y = 1). Probability modeled is y=1.

Deviance and Pearson Goodness-of-Fit Statistics: Value, DF, Value/DF, and Pr > ChiSq for each (values lost in transcription). Number of unique profiles: 66.

Model Fit Statistics: AIC, SC, and -2 Log L for the intercept-only and intercept-and-covariates fits (values lost in transcription).

17 Testing Global Null Hypothesis: BETA=0
  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio       …         1    <.0001
  Score                  …         1    <.0001
  Wald                   …         1    <.0001

Analysis of Maximum Likelihood Estimates
  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept    1    -12.35          …                …            <.0001
  width        1      0.497         …                …            <.0001

Odds Ratio Estimates
  Effect   Point Estimate   95% Wald Confidence Limits
  width        1.64               (1.3, 2.0)

Hosmer and Lemeshow Goodness-of-Fit Test: Chi-Square on 8 DF, p = 0.73.

18 14.3 Model interpretation

Let's start with simple logistic regression:

    Y_i ~ bin(n_i, e^{α+βx_i}/(1 + e^{α+βx_i})).

An odds ratio: let's look at how the odds of success change when we increase x by one unit:

    π(x+1)/[1 − π(x+1)]     [e^{α+βx+β}/(1+e^{α+βx+β})] / [1/(1+e^{α+βx+β})]     e^{α+βx+β}
    -------------------  =  -----------------------------------------------  =  ----------  =  e^β.
      π(x)/[1 − π(x)]         [e^{α+βx}/(1+e^{α+βx})] / [1/(1+e^{α+βx})]          e^{α+βx}

When we increase x by one unit, the odds of an event occurring increase by a factor of e^β, regardless of the value of x.
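
A quick worked instance (only the snoring estimates α̂ = −3.87 and β̂ = 0.40 from the earlier fit are assumed):

    odds ratio = e^{β̂} = e^{0.40} ≈ 1.49,

so each one-point increase in snoring score is estimated to multiply the odds of heart disease by about 1.5, no matter the current score.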

19 So e^β is an odds ratio.

We also have

    ∂π(x)/∂x = βπ(x)[1 − π(x)].

This gives approximately how much π(x) changes when x increases by one unit. Note that π(x) changes more when π(x) is near 0.5 than when π(x) is near zero or one, and this increase depends on x, unlike the odds ratio.

Horseshoe crab data

Let's look at Y_i = 1 if a female crab has one or more satellites, and Y_i = 0 if not. So

    π(x) = e^{α+βx}/(1 + e^{α+βx})

is the probability of a female having more than her nest-mate around, as a function of her width x.
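
A worked instance of the derivative formula (assuming only the width estimate β̂ ≈ 0.5 reported in the output on the next slide): at a width where π̂(x) = 0.5 the curve is steepest, and

    ∂π̂(x)/∂x = β̂ π̂(x)[1 − π̂(x)] = 0.5 × 0.5 × 0.5 ≈ 0.12,

so the fitted probability of a satellite rises by roughly 0.12 per extra cm of width near the middle of the curve.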

20 Analysis of Maximum Likelihood Estimates
  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept    1    -12.35          …                …            <.0001
  width        1      0.497         …                …            <.0001

Odds Ratio Estimates
  Effect   Point Estimate   95% Wald Confidence Limits
  width        1.64               (1.3, 2.0)

We estimate the probability of a satellite as

    π̂(x) = e^{-12.35+0.497x} / (1 + e^{-12.35+0.497x}).

The odds of having a satellite increase by a factor of between 1.3 and 2.0 for every cm increase in carapace width.

The coefficient table houses the estimates β̂_j, se(β̂_j), and the Wald statistic z²_j = {β̂_j/se(β̂_j)}² and p-value for testing H_0: β_j = 0. What do we conclude here?

21 14.7 Goodness of fit and grouping

The deviance GOF statistic is defined to be

    D = 2 Σ_{i=1}^N { y_i log[y_i/(n_i π̂_i)] + (n_i − y_i) log[(n_i − y_i)/(n_i − n_i π̂_i)] },

where π̂_i = e^{x_i'β̂}/(1 + e^{x_i'β̂}) are fitted values.

Pearson's GOF statistic is

    X² = Σ_{i=1}^N (y_i − n_i π̂_i)² / [n_i π̂_i(1 − π̂_i)].

Both statistics are approximately χ²_{N−p} in large samples, assuming that the number of trials n = Σ_{i=1}^N n_i increases in such a way that each n_i increases. (This is the same type of GOF test requiring replicates as in Sections 3.7 & 6.8.)
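
A small sketch of how D and X² could be computed by hand (not from the original slides; it assumes a data set named GROUPED holding counts y, trials n, and fitted values pihat, e.g. saved via OUTPUT OUT=... PRED=pihat):

data gof; set grouped;
* terms of the form 0*log(0) are taken as 0 when y=0 or y=n;
d1 = 0; if y > 0 then d1 = y*log(y/(n*pihat));
d2 = 0; if y < n then d2 = (n-y)*log((n-y)/(n - n*pihat));
dev = 2*(d1 + d2);                            * contribution to D;
x2 = (y - n*pihat)**2 / (n*pihat*(1-pihat));  * contribution to X2;
run;
proc means data=gof sum; var dev x2; run;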

22 Group your data

Binomial data are often recorded as individual (Bernoulli) records, one per subject, e.g. (illustrative values; the original table's numbers were lost):

    i   y_i   n_i   x_i
    1    0     1     9
    2    0     1     7
    3    1     1     9
    4    1     1     7

Grouping the data over like covariate values yields an identical model:

    i   y_i   n_i   x_i
    1    1     2     9
    2    1     2     7

β̂, se(β̂_j), and L(β̂) don't care whether the data are grouped.

The quality of residuals and GOF statistics depends on how data are grouped; D and Pearson's X² will change! (Bottom, p. 590.)

23 In PROC LOGISTIC, type AGGREGATE and SCALE=NONE after the MODEL statement to get D and X² based on grouped data. This option does not compute residuals based on the grouped data. You can aggregate over all variables or a subset, e.g. AGGREGATE=(width).

The Hosmer and Lemeshow test statistic orders observations (x_i, Y_i) by fitted probabilities π̂(x_i) from smallest to largest and divides them into (typically) g = 10 groups of roughly the same size. A Pearson test statistic is then computed from these g groups.

24 The statistic would have a χ²_{g−p} distribution if each group had exactly the same predictor x for all observations. In general, the null distribution is approximately χ²_{g−2} (see text). It is termed a near-replicate GOF test. The LACKFIT option in PROC LOGISTIC gives this statistic.

Pearson, deviance, and Hosmer & Lemeshow all provide a p-value for the null H_0: the model fits, based on χ²_{c−p}, where c is the number of distinct covariate levels (using AGGREGATE). For the first two, the alternative model is the saturated model, where every µ_i is simply replaced by y_i.

One can also test logit{π(x)} = β_0 + β_1 x versus the more general model logit{π(x)} = β_0 + β_1 x + β_2 x² via H_0: β_2 = 0.

25 Raw (Bernoulli) data with aggregate scale=none lackfit:

Deviance and Pearson Goodness-of-Fit Statistics: Value, DF, Value/DF, and Pr > ChiSq for each (values lost in transcription). Number of unique profiles: 66.

Partition for the Hosmer and Lemeshow Test: for each of the 10 groups, the group total and the observed and expected counts of y = 1 and y = 0 (values lost in transcription).

Hosmer and Lemeshow Goodness-of-Fit Test: Chi-Square on 8 DF, p = 0.73.

26 Comments:

- There are c = 66 distinct widths {x_i} out of N = 173 crabs. For the χ² approximation to hold, we must keep sampling crabs that have one of only these 66 fixed widths! Does that make sense here?
- The Hosmer and Lemeshow test gives a p-value of 0.73 based on g = 10 groups. Are the assumptions going into this p-value met?
- None of the GOF tests have assumptions that are met in practice for continuous predictors. Are they still useful?
- The raw statistics do not tell you where lack of fit occurs. Deviance and Pearson residuals do tell you this (later). Also, the partition table provided by the H-L test tells you which groups are ill-fit, should you reject H_0: logistic model holds.
- GOF tests are meant to detect gross deviations from model assumptions. No model ever truly fits data except hypothetically.

27 Categorical predictors

Say we wish to include a variable X, a categorical variable that takes on values x ∈ {1, 2, ..., I}. We need to allow each level of X = x to affect π(x) differently. This is accomplished by the use of dummy variables, typically done in one of two ways.

Define z_1, z_2, ..., z_{I−1} as follows:

    z_j =  1  if X = j,
          −1  if X ≠ j.

This is the default in PROC LOGISTIC with a CLASS X statement. Say I = 3; then the model is

    logit π(x) = β_0 + β_1 z_1 + β_2 z_2,

28 which gives

    logit π(x) = β_0 + β_1 − β_2   when X = 1,
    logit π(x) = β_0 − β_1 + β_2   when X = 2,
    logit π(x) = β_0 − β_1 − β_2   when X = 3.

An alternative method uses zero/one dummies instead:

    z_j = 1  if X = j,
          0  if X ≠ j.

This is the default in PROC GENMOD with a CLASS X statement, and can also be obtained in PROC LOGISTIC with the PARAM=REF option. It sets class X = I as the baseline. Say I = 3; then the model is

    logit π(x) = β_0 + β_1 z_1 + β_2 z_2,

29 which gives

    logit π(x) = β_0 + β_1   when X = 1,
    logit π(x) = β_0 + β_2   when X = 2,
    logit π(x) = β_0         when X = 3.

I prefer the latter method because, for me, it's easier to think about. You can choose a different baseline category with REF=FIRST in the CLASS statement.
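
A minimal sketch of how the two parameterizations are requested in SAS (the variable names x and y are placeholders):

proc logistic; class x;                       model y = x;  * default 1/-1 coding;
proc logistic; class x / param=ref;           model y = x;  * 0/1 dummies, last level baseline;
proc logistic; class x / param=ref ref=first; model y = x;  * 0/1 dummies, first level baseline;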

30 Example: Study of the relationship between drinking during pregnancy and congenital malformations.

                        Drinks per day
  Malformation      0       <1     1-2   3-5   ≥6
  Absent          17,066  14,464   788   126   37
  Present             48      38     5     1    1

data mal;
input cons present absent;
total=present+absent;
datalines;
1 48 17066
2 38 14464
3 5 788
4 1 126
5 1 37
;
proc logistic; class cons / param=ref;
model present/total = cons;

31 Testing Global Null Hypothesis: BETA=0
  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio       …         4       …
  Score                  …         4       …
  Wald                   …         4       …

Type 3 Analysis of Effects
  Effect   DF   Wald Chi-Square   Pr > ChiSq
  cons      4         …               …

Analysis of Maximum Likelihood Estimates: rows for Intercept and cons 1-4, each with Estimate, Standard Error, Wald Chi-Square, and Pr > ChiSq (values lost in transcription).

Odds Ratio Estimates
  Effect        Point Estimate   95% Wald Confidence Limits
  cons 1 vs 5         …                    …
  cons 2 vs 5         …                    …
  cons 3 vs 5         …                    …
  cons 4 vs 5         …                    …

32 The model is

    logit π(x) = β_0 + β_1 I{X = 1} + β_2 I{X = 2} + β_3 I{X = 3} + β_4 I{X = 4},

where X denotes alcohol consumption, X = 1, 2, 3, 4, 5.

Type 3 analyses test whether all dummy variables associated with a categorical predictor are simultaneously zero, here H_0: β_1 = β_2 = β_3 = β_4 = 0. If we accept this, then the categorical predictor is not needed in the model.

PROC LOGISTIC gives estimates and CIs for e^{β_j}, j = 1, 2, 3, 4. Here these are interpreted as the odds of developing malformation when X = 1, 2, 3, or 4 versus the odds when X = 5.

We are not as interested in the individual Wald tests H_0: β_j = 0 for a categorical predictor. Why is that? Because they only compare a level X = 1, 2, 3, 4 to the baseline X = 5, not to each other.

33 The "Testing Global Null Hypothesis: BETA=0" entries are three tests that no predictor is needed: H_0: logit{π(x)} = β_0 versus H_1: logit{π(x)} = x'β. Anything wrong here? There are exact tests available...

Note that the Wald test for H_0: β = 0 is the same as the Type 3 test that consumption is not important. Why is that?

Let Y = 1 denote malformation for a randomly sampled individual. To get an odds ratio for malformation from increasing from, say, X = 2 to X = 4, note that

    [P(Y = 1|X = 2)/P(Y = 0|X = 2)] / [P(Y = 1|X = 4)/P(Y = 0|X = 4)] = e^{β_2 − β_4}.

This is estimated with the CONTRAST command.

34 Let's use X = 1 as the reference level. Then the model is

    logit π(x) = β_0 + β_1 I{X = 2} + β_2 I{X = 3} + β_3 I{X = 4} + β_4 I{X = 5}.

We may be interested in how the odds of malformation change when dropping from 3-5 drinks per day (X = 4) to less than one drink per day (X = 2), given by e^{β_3 − β_1}.

A contrast is a linear combination c'β = c_1 β_1 + c_2 β_2 + ... + c_{p−1} β_{p−1}. We are specifically interested in H_0: β_3 = β_1, or equivalently H_0: β_3 − β_1 = 0, as well as in estimating e^{β_3 − β_1}.

proc logistic; class cons / param=ref ref=first;
model present/total = cons;
contrast "beta3-beta1" cons -1 0 1 0;
contrast "exp(beta3-beta1)" cons -1 0 1 0 / estimate=exp;
contrast "beta3-beta1" cons -1 0 1 0 / estimate;

35 Analysis of Maximum Likelihood Estimates
  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept    1      …             …                …            <.0001
  cons 2       1    -0.068          …                …               …
  cons 3       1     0.814          …                …               …
  cons 4       1     1.038          …                …               …
  cons 5       1     2.263          …                …               …

Odds Ratio Estimates
  Effect        Point Estimate   95% Wald Confidence Limits
  cons 2 vs 1       0.934                   …
  cons 3 vs 1       2.256                   …
  cons 4 vs 1       2.822                   …
  cons 5 vs 1       9.610                   …

Let θ_ij be the odds ratio for malformation when going from level X = i to X = j. We automatically get θ̂_21 = e^{-0.068} = 0.934, θ̂_31 = e^{0.814} = 2.26, etc. Since θ_42 = θ_41/θ_21, we can estimate θ̂_42 = 2.822/0.934 = 3.02, or else directly from the dummy variable coefficients, e^{1.038 − (−0.068)} = 3.02.

36 The CONTRAST command allows us to further test H_0: β_3 = β_1 and to get a 95% CI for the odds ratio θ_42 = e^{β_3 − β_1}.

Contrast Test Results
  Contrast            DF   Wald Chi-Square   Pr > ChiSq
  beta3-beta1          1         …               …
  exp(beta3-beta1)     1         …               …
  beta3-beta1          1         …               …

Contrast Rows Estimation and Testing Results: for each contrast, the Type (EXP or PARM), Estimate, Standard Error, Alpha, Confidence Limits, Wald Chi-Square, and Pr > ChiSq (values lost in transcription).

We are allowed linear contrasts or the exponential of linear contrasts. To get, for example, the relative risk of malformation,

    h(β) = P(Y = 1|X = 4)/P(Y = 1|X = 2) = {e^{β_0+β_3}/[1 + e^{β_0+β_3}]} / {e^{β_0+β_1}/[1 + e^{β_0+β_1}]},

takes more work.
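
One way to get that relative risk (a sketch, not from the original slides): PROC NLMIXED can maximize the same binomial likelihood, and its ESTIMATE statement applies the delta method to any function of the coefficients. The 0/1 dummies z2-z5 are assumed to have been created in a DATA step (zj = 1 when cons = j, 0 otherwise):

proc nlmixed data=mal;
parms b0=-6 b1=0 b2=0 b3=0 b4=0;            * starting values;
eta = b0 + b1*z2 + b2*z3 + b3*z4 + b4*z5;   * linear predictor, X=1 baseline;
p = exp(eta)/(1 + exp(eta));
model present ~ binomial(total, p);
estimate 'RR: X=4 vs X=2'
  (exp(b0+b3)/(1+exp(b0+b3))) / (exp(b0+b1)/(1+exp(b0+b1)));
run;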

37 14.4 Multiple predictors

Now we have p − 1 predictors, x_i = (1, x_{i1},...,x_{i,p−1})', and fit

    Y_i ~ bin(n_i, exp(β_0 + β_1 x_{i1} + ... + β_{p−1} x_{i,p−1}) / [1 + exp(β_0 + β_1 x_{i1} + ... + β_{p−1} x_{i,p−1})]).

Many of these predictors may be sets of dummy variables associated with categorical predictors.

e^{β_j} is now termed the adjusted odds ratio. This is how the odds of the event occurring change when x_j increases by one unit, keeping the remaining predictors constant. This interpretation may not make sense if two predictors are highly related. Examples?

38 14.5: Tests of regression effects

An overall test of H_0: logit π(x) = β_0 versus H_1: logit π(x) = x'β is generated in PROC LOGISTIC three different ways: LRT, score, and Wald versions. This checks whether some subset of variables in the model is important.

Recall the crab data covariates:

- C = color (1, 2, 3, 4 = light medium, medium, dark medium, dark).
- S = spine condition (1, 2, 3 = both good, one worn or broken, both worn or broken).
- W = carapace width (cm).
- Wt = weight (kg).

We'll take C = 4 and S = 3 as baseline categories.

39 There are two categorical predictors, C and S, and two continuous predictors, W and Wt. Let Y = 1 if a randomly drawn crab has one or more satellites, and let x = (C, S, W, Wt) be her covariates. An additive model including all four covariates would look like

    logit π(x) = β_0 + β_1 I{C = 1} + β_2 I{C = 2} + β_3 I{C = 3}
                 + β_4 I{S = 1} + β_5 I{S = 2} + β_6 W + β_7 Wt.

This model is fit via

proc logistic data=crabs1 descending;
class color spine / param=ref;
model y = color spine width weight / lackfit;

The H-L GOF statistic yields p = 0.88, so there's no evidence of gross lack of fit. The parameter estimates are:

40 Analysis of Maximum Likelihood Estimates: rows for Intercept, color 1-3, spine 1-2, width, and weight, each with Estimate, Standard Error, Wald Chi-Square, and Pr > ChiSq (most values lost in transcription).

Color seems to be important. Plugging in β̂ for β,

    logit π̂(x) = β̂_0 + β̂_1 I{C = 1} + β̂_2 I{C = 2} + β̂_3 I{C = 3} − 0.40 I{S = 1} − 0.50 I{S = 2} + β̂_6 W + β̂_7 Wt

(only the spine coefficients, −0.40 and −0.50, survived transcription).

Overall checks that one or more predictors are important:

Testing Global Null Hypothesis: BETA=0
  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio       …         7    <.0001
  Score                  …         7    <.0001
  Wald                   …         7       …

41 The Type 3 tests are (1) H_0: β_1 = β_2 = β_3 = 0, that color is not needed to explain whether a female has satellite(s); (2) H_0: β_4 = β_5 = 0, whether spine is needed; (3) H_0: β_6 = 0, whether width is needed; and (4) H_0: β_7 = 0, whether weight is needed:

Type 3 Analysis of Effects
  Effect   DF   Wald Chi-Square   Pr > ChiSq
  color     3         …               …
  spine     2         …               …
  width     1         …               …
  weight    1         …               …

The largest p-value is 0.6, for dropping spine condition from the model. When refitting the model without spine condition, we still strongly reject H_0: β_1 = β_2 = β_3 = β_4 = β_5 = β_6 = 0, and the H-L test shows no evidence of lack of fit. We have:

Type 3 Analysis of Effects
  Effect   DF   Wald Chi-Square   Pr > ChiSq
  color     3         …               …
  width     1         …               …
  weight    1         …               …

42 We do not reject that we can drop weight from the model, and so we do:

Testing Global Null Hypothesis: BETA=0
  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio       …         4    <.0001
  Score                  …         4    <.0001
  Wald                   …         4    <.0001

Type 3 Analysis of Effects
  Effect   DF   Wald Chi-Square   Pr > ChiSq
  color     3         …               …
  width     1         …            <.0001

Analysis of Maximum Likelihood Estimates
  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept    1      …             …                …            <.0001
  color 1      1      …             …                …               …
  color 2      1      …             …                …               …
  color 3      1      …             …                …               …
  width        1      …             …                …            <.0001

43 The new model is

    logit π(x) = β_0 + β_1 I{C = 1} + β_2 I{C = 2} + β_3 I{C = 3} + β_4 W.

We do not reject that color can be dropped from the model, H_0: β_1 = β_2 = β_3 = 0, but we do reject that the dummy for C = 2 can be dropped, H_0: β_2 = 0. Maybe unnecessary levels in color are clouding its importance. Let's see what happens when we try to combine levels of C:

proc logistic data=crabs1 descending;
class color spine / param=ref;
model y = color width / lackfit;
contrast '1 vs 2' color 1 -1 0;
contrast '1 vs 3' color 1 0 -1;
contrast '1 vs 4' color 1 0 0;
contrast '2 vs 3' color 0 1 -1;
contrast '2 vs 4' color 0 1 0;
contrast '3 vs 4' color 0 0 1;

44 p-values for combining levels:

Contrast Test Results
  Contrast   DF   Wald Chi-Square   Pr > ChiSq
  1 vs 2      1         …               …
  1 vs 3      1         …               …
  1 vs 4      1         …               …
  2 vs 3      1         …               …
  2 vs 4      1         …               …
  3 vs 4      1         …               …

We reject that we can combine levels C = 2 and C = 4, and almost reject combining C = 3 and C = 4. Let's combine C = 1, 2, 3 into one category, D = 1 "not dark," with C = 4 becoming D = 2, "dark":

proc logistic data=crabs1 descending;
class dark / param=ref ref=first;
model y = dark width / lackfit;

We include dark=1; if color=4 then dark=2; in the DATA step. Annotated output:

45 Testing Global Null Hypothesis: BETA=0
  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio       …         2    <.0001

Type 3 Analysis of Effects
  Effect   DF   Wald Chi-Square   Pr > ChiSq
  dark      1         …               …
  width     1         …            <.0001

Analysis of Maximum Likelihood Estimates
  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept    1      …             …                …            <.0001
  dark 2       1      …             …                …               …
  width        1      …             …                …            <.0001

Odds Ratio Estimates
  Effect        Point Estimate   95% Wald Confidence Limits
  dark 2 vs 1       0.27                    …
  width             1.6                     …

Hosmer and Lemeshow Goodness-of-Fit Test: Chi-Square on 8 DF (values lost in transcription).

46 Comments:

- The odds of having satellite(s) are significantly decreased for dark crabs, multiplied by about 0.27 (roughly a quarter), regardless of width.
- The odds of having satellite(s) significantly increase by a factor of 1.6 for every cm increase in carapace width, regardless of color.
- Lighter, wider crabs tend to have satellite(s) more often.
- The H-L GOF test shows no gross LOF.
- We didn't check for interactions. If an interaction between color and width existed, then the odds ratio of satellite(s) for dark versus not-dark crabs would change with how wide she is.

47 Interactions and quadratic effects

An additive model is easily interpreted because an odds ratio from changing the values of one predictor does not change with the levels of another predictor. However, often this is incorrect, and we may introduce additional terms into the model, such as interactions.

An interaction between two predictors allows the odds ratio for increasing one predictor to change with the levels of another. For example, in the last model fit, the odds of having satellite(s) are multiplied by 0.27 for dark vs. not-dark crabs regardless of carapace width.

A two-way interaction is defined by multiplying the variables together; if one or both variables are categorical, then all possible pairings of dummy variables are considered.

48 Example: Say we have two categorical predictors, X = 1, 2, 3 and Z = 1, 2, 3, 4. An additive model is

    logit π(X, Z) = β_0 + β_1 I{X = 1} + β_2 I{X = 2}
                    + β_3 I{Z = 1} + β_4 I{Z = 2} + β_5 I{Z = 3}.

The model that includes an interaction between X and Z adds (3 − 1)(4 − 1) = 6 additional dummy variables, accounting for all possible ways, i.e. at all levels of Z, that the log odds can change from X = i to X = j. The new model is rather cumbersome:

    logit π(X, Z) = β_0 + β_1 I{X = 1} + β_2 I{X = 2}
                    + β_3 I{Z = 1} + β_4 I{Z = 2} + β_5 I{Z = 3}
                    + β_6 I{X = 1}I{Z = 1} + β_7 I{X = 1}I{Z = 2}
                    + β_8 I{X = 1}I{Z = 3} + β_9 I{X = 2}I{Z = 1}
                    + β_{10} I{X = 2}I{Z = 2} + β_{11} I{X = 2}I{Z = 3}.

49 In PROC GENMOD and PROC LOGISTIC, categorical variables are defined through the CLASS statement, and all dummy variables are created and handled internally. The Type 3 table provides a test of whether the interaction can be dropped; the table of regression coefficients tells you whether individual dummies can be dropped.

Let's consider the crab data again, but with an interaction between categorical D and continuous W:

proc logistic data=crabs1 descending;
class dark / param=ref ref=first;
model y = dark width dark*width / lackfit;

Type 3 Analysis of Effects
  Effect        DF   Wald Chi-Square   Pr > ChiSq
  dark           1         …               …
  width          1         …            <.0001
  width*dark     1         …               …

We accept that the interaction is not needed.

50 Let's consider the interaction model anyway, for illustration.

Analysis of Maximum Likelihood Estimates
  Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept      1      …             …                …            <.0001
  dark 2         1      …             …                …               …
  width          1      …             …                …            <.0001
  width*dark 2   1    -0.32           …                …               …

The model is

    logit π(D, W) = β̂_0 + β̂_1 I{D = 2} + β̂_2 W − 0.32 I{D = 2}W

(only the interaction estimate, −0.32, survived transcription). The odds ratio for the probability of satellite(s) going from D = 2 to D = 1 is estimated by

    [P(Y = 1|D = 2, W)/P(Y = 0|D = 2, W)] / [P(Y = 1|D = 1, W)/P(Y = 0|D = 1, W)] = e^{β̂_1 − 0.32W}.

How about the odds ratio going from W to W + 1?
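
For reference, a worked answer to the question above, in general symbols since most estimates were lost: in the model logit π(D, W) = β_0 + β_1 I{D = 2} + β_2 W + β_3 I{D = 2}W, increasing W to W + 1 multiplies the odds by

    e^{β_2}          for not-dark crabs (D = 1),
    e^{β_2 + β_3}    for dark crabs (D = 2),

so with β̂_3 = −0.32 the estimated width effect is weaker for dark crabs.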

51 For a categorical predictor X with I levels, adding I − 1 dummy variables allows for a different event probability at each level of X.

For a continuous predictor Z, the model assumes that the log odds of the event increase linearly with Z. This may or may not be a reasonable assumption, and it can be checked by adding nonlinear terms, the simplest being Z².

Consider a simple model with continuous Z:

    logit π(Z) = β_0 + β_1 Z.

LOF from this model can manifest itself in a rejected GOF test (Pearson, deviance, or H-L) or in a residual plot that shows curvature (in a few slides...).

52 Adding a quadratic term,

    logit π(Z) = β_0 + β_1 Z + β_2 Z²,

may improve fit, and allows testing the adequacy of the simpler model via H_0: β_2 = 0. Higher-order powers can be added, but the model can become unstable with, say, higher than cubic powers. A better approach might be to fit a generalized additive model (GAM),

    logit π(Z) = f(Z),

where f(·) is estimated from the data, often using splines. We will discuss this next week...

Adding a simple quadratic term can be done, e.g.,

proc logistic; model y/n = z z*z;

53 14.6 Model selection

Two competing goals:

- The model should fit the data well.
- The model should be simple to interpret (smooth rather than overfit: the principle of parsimony).

Often hypotheses on how the outcome is related to specific predictors will help guide the model-building process.

Agresti (2002) suggests a rule of thumb: at least 10 events and 10 non-events should occur for each predictor in the model (including dummies). So if Σ_{i=1}^N y_i = 40 and Σ_{i=1}^N n_i = 830, you should have no more than 40/10 = 4 predictors in the model.

54 Horseshoe crab data

Recall that in all models fit we strongly rejected H_0: logit π(x) = β_0 in favor of H_1: logit π(x) = x'β:

Testing Global Null Hypothesis: BETA=0
  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio       …         …    <.0001
  Score                  …         …    <.0001
  Wald                   …         …       …

However, it was not until we carved superfluous predictors from the model that we showed significance for the included model effects. This is an indication that several covariates may be highly related, or correlated. If one or more predictors are perfectly predicted as a linear combination of other predictors, the model is overspecified and unidentifiable. Here's an example:

    logit π(x) = β_0 + β_1 x_1 + β_2 x_2 + β_3 (x_1 − 3x_2).

55 The MLE β̂ = (β̂_0, β̂_1, β̂_2, β̂_3)' is not unique, and the model is said to be unidentifiable. The variable x_1 − 3x_2 is totally predicted, and redundant, given x_1 and x_2. Although a perfect linear relationship is usually not met in practice, often variables are highly correlated and therefore one or more are redundant. We need to get rid of some!

Although not ideal, automated model selection is necessary with large numbers of predictors. With p − 1 = 10 predictors there are 2^10 = 1024 possible models; with p − 1 = 20 there are 1,048,576 to consider.

Backwards elimination starts with a large pool of potential predictors and step-by-step eliminates those with (Wald) p-values larger than a cutoff (the default is 0.05 in SAS PROC LOGISTIC).

56 proc logistic data=crabs1 descending;
class color spine / param=ref;
model y = color spine width weight color*spine color*width
          color*weight spine*width spine*weight width*weight
          / selection=backward;

When starting from all main effects and two-way interactions, the default p-value cutoff of 0.05 yields only the model with width as a predictor.

Summary of Backward Elimination
  Step   Effect Removed   DF   Number In   Wald Chi-Square   Pr > ChiSq
  1      color*spine       6       9             …               …
  2      width*color       3       8             …               …
  3      width*spine       2       7             …               …
  4      weight*spine      2       6             …               …
  5      spine             2       5             …               …
  6      width*weight      1       4             …               …
  7      weight*color      3       3             …               …
  8      weight            1       2             …               …
  9      color             3       1             …               …

Analysis of Maximum Likelihood Estimates
  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept    1      …             …                …            <.0001
  width        1      …             …                …            <.0001

57 Let's change the criterion for removing a predictor to p-value ≥ 0.15:

model y = color spine width weight color*spine color*width
          color*weight spine*width spine*weight width*weight
          / selection=backward slstay=0.15;

This yields a more complicated model:

Summary of Backward Elimination
  Step   Effect Removed   DF   Number In   Wald Chi-Square   Pr > ChiSq
  1      color*spine       6       9             …               …
  2      width*color       3       8             …               …
  3      width*spine       2       7             …               …
  4      weight*spine      2       6             …               …
  5      spine             2       5             …               …

Analysis of Maximum Likelihood Estimates: rows for Intercept, color 1-3, width, weight, weight*color 1-3, and width*weight, each with Estimate, Standard Error, Wald Chi-Square, and Pr > ChiSq (values lost in transcription).

58 Let's test whether we can simultaneously drop width and width*weight from this model. From the (voluminous) output we find:

Model Fit Statistics
  Criterion   Intercept Only   Intercept and Covariates
  AIC               …                    …
  SC                …                    …
  -2 Log L          …                    …

Fitting the simpler model with color, weight, and color*weight yields:

Model Fit Statistics
  Criterion   Intercept Only   Intercept and Covariates
  AIC               …                    …
  SC                …                    …
  -2 Log L          …                    …

There are 2 more parameters in the larger model (for width and width*weight), and we obtain −2(L_0 − L_1) = 4.9 with P(χ²_2 > 4.9) = 0.086. We barely accept that we can drop width and width*weight at the 5% level. (This is like an F-test...)

59 Forward selection starts by fitting each model with one predictor separately and including the model with the smallest p-value under a cutoff (the default is 0.05 in PROC LOGISTIC). When we instead use SELECTION=FORWARD in the MODEL statement, we obtain the model with only width. Changing the cutoff to SLENTRY=0.15 gives the model with width and color.

Starting from the main effects and working backwards by hand, we ended up with width and color in the model. We further simplified color to dark and not-dark crabs.

Using backwards elimination with a cutoff of 0.05, we ended up with just width. A cutoff of 0.15 and another by-hand step (at the 0.05 level) yielded weight, color, and weight*color.

Agresti (2002) considers backwards elimination starting with a three-way interaction model including color, spine condition, and width. The end model is color and width.

60 AIC & model selection

"No model is correct, but some are more useful than others." — George Box

It is often of interest to examine several competing models. In light of the underlying biology or science, one or more models may have relevant interpretations within the context of why the data were collected in the first place.

In the absence of scientific input, a widely used model selection tool is the Akaike information criterion (AIC),

    AIC = −2[L(β̂; y) − p].

L(β̂; y) represents model fit: if you add a parameter to a model, L(β̂; y) has to increase. If we used only L(β̂; y) as a criterion, we'd keep adding predictors until we ran out. The p penalizes for the number of predictors in the model.

61 The AIC has very nice properties in large samples in terms of prediction. The smaller the AIC, the better the model fit (asymptotically).

  Model             AIC
  W                  …
  C + Wt + C*Wt      …
  C + W              …
  D + Wt + D*Wt      …
  D + W              …

If we pick one model, it's W + D, the additive model with width and the dark/not-dark category.

62 14.8 Diagnostics

GOF tests are global checks for model adequacy. The data are (x_i, Y_i) for i = 1,...,N.

The i-th fitted value is an estimate of µ_i = E(Y_i), namely Ê(Y_i) = µ̂_i = n_i π̂_i, where π_i = e^{β'x_i}/(1 + e^{β'x_i}) and π̂_i = e^{β̂'x_i}/(1 + e^{β̂'x_i}).

The raw residual is what we see, Y_i, minus what we predict, n_i π̂_i. The Pearson residual divides this by an estimate of its standard deviation:

    e_i = (y_i − n_i π̂_i) / sqrt{n_i π̂_i(1 − π̂_i)}.

The Pearson GOF statistic is

    X² = Σ_{i=1}^N e_i².

63 The standardized Pearson residual is given by

    r_i = (y_i − n_i π̂_i) / sqrt{n_i π̂_i(1 − π̂_i)(1 − ĥ_i)},

where ĥ_i is the i-th diagonal element of the hat matrix Ĥ = Ŵ^{1/2} X(X'ŴX)^{-1} X'Ŵ^{1/2}, X is the design matrix

        [ 1   x_11   ...   x_{1,p-1} ]
    X = [ 1   x_21   ...   x_{2,p-1} ]
        [ ...                        ]
        [ 1   x_N1   ...   x_{N,p-1} ],

and

64     Ŵ = diag{ n_1 π̂_1(1 − π̂_1), n_2 π̂_2(1 − π̂_2), ..., n_N π̂_N(1 − π̂_N) }.

Alternatively, p. 592 defines a deviance residual.

Comments:

Plots of residuals r_i versus one of the p − 1 predictors x_ij, for i = 1,...,N, might show systematic lack of fit (i.e. a pattern). Adding nonlinear terms or interactions can improve fit.

An overall plot is r_i versus the linear predictor η̂_i = β̂'x_i. This plot will tell you whether the model tends to over- or under-predict the observed data for ranges of the linear predictor.

65 The r_i are approximately N(0, 1) when n_i is not small. With truly continuous predictors, n_i = 1, and the residual plots will have a distinct pattern. I usually flag |r_i| > 3 as being ill-fit by the model.

You can look at individual r_i to determine model fit. For the crab data, this might flag some individual crabs as ill-fit or unusual relative to the model. The model can't tell the difference between, e.g., two not-dark crabs with carapace widths of 23 cm.

You can aggregate over like values of the predictors to slightly improve the residuals; this way the approximate N(0, 1) distribution may be a bit better. Ill-fitting residuals then become predictor values where the aggregated number of events doesn't match what we'd expect under the model.

66 Let's look at W + D for the crab data. We'll consider both width and width truncated to an integer cm. The DATA step is

data crabs1;
input color spine width satell weight;
weight=weight/1000; color=color-1;
y=0; n=1; if satell>0 then y=1;
dark=1; if color=4 then dark=2;
w=int(width); * round down;

Two models are fit and the r_i plotted:

proc logistic data=crabs1 descending; * each crab has n_i=1;
class dark;
model y = dark width;
output out=diag1 reschi=p h=h xbeta=eta;
data diag2; set diag1; r=p/sqrt(1-h);
proc gplot; plot r*width; plot r*dark; plot r*eta; * plot r_i vs width, dark, eta_i;

proc sort data=crabs1; by w dark; * aggregate over coarser widths;
proc means data=crabs1 noprint; by w dark;
var y n; output out=crabs2 sum=sumy sumn;
proc logistic data=crabs2;
class dark;
model sumy/sumn = dark w;
output out=diag3 reschi=p h=h xbeta=eta;
data diag4; set diag3; r=p/sqrt(1-h);
proc gplot; plot r*w; plot r*dark; plot r*eta;

67 Figure 2: Raw data & residual plots, n_i = 1.

68 Figure 3: Raw data & residual plots, aggregated.

69 Influential observations

Unlike in linear regression, the leverage ĥ_i in logistic regression depends on the model fit β̂ as well as on the covariates X. Points that have extreme predictor values x_i may not have high leverage ĥ_i if π̂_i is close to 0 or 1. Here are the influence diagnostics available in PROC LOGISTIC:

- Leverage ĥ_i. Still may be useful for detecting extreme predictor values x_i.
- c_i = e_i² ĥ_i/(1 − ĥ_i)² measures the change in the joint confidence region for β when observation i is left out.
- DFBETA_ij is the standardized change in β̂_j when observation i is left out.
- The change in the X² GOF statistic when observation i is left out is DIFCHISQ_i = e_i²/(1 − ĥ_i). (ΔX²_i in your book.)

70 I suggest looking at plots of c_i vs. i, and possibly the DFBETAs versus i. Let's look at output from the aggregated crab data:

proc logistic data=crabs2;
class dark;
model sumy/sumn = dark w / influence iplots;

Index plots of the Pearson residuals (1 unit = 0.4) and deviance residuals (1 unit = 0.27) by case number, alongside the covariate values dark1 and w for each case (the printed values and asterisk plots were lost in transcription).

71 Index plots of the hat matrix diagonal (1 unit = 0.02) and the Intercept DfBeta (1 unit = 0.07) by case number (values lost in transcription).

72 Index plots of the dark1 DfBeta (1 unit = 0.1) and the w DfBeta (1 unit = 0.08) by case number (values lost in transcription).

73 Index plots of the confidence interval displacements C (1 unit = 0.05) and CBar (1 unit = 0.03) by case number (values lost in transcription).

74 Index plots of the Delta Deviance (1 unit = 0.32) and Delta Chi-Square (1 unit = 0.66) by case number (values lost in transcription).

75 Obs. 3 has a large e_i (and a larger r_i) and is flagged as ill-fit; it also has the largest DIFCHISQ. Obs. 3 is a single (n_3 = 1) skinny (22 cm) dark crab that had a satellite. Recall that the probability of having a satellite increases for light crabs and for wider crabs. This observation does not have much of an effect on β̂ as measured by c_i and the DFBETAs, perhaps because it's only 1 crab.

Obs. 9 and 13 have the largest c_i. Refining their influence, both 9 and 13 have the largest (in magnitude) DFBETAs for the dark dummy variable. However, with relatively small e_i, these observations are not ill-fit. Obs. 9 represents y_9 = 3 dark crabs out of n_9 = 6 that have satellites at width 25 cm. Obs. 13 is y_13 = 1 out of n_13 = 4 dark crabs at 27 cm. These affect the estimate of dark's regression coefficient (adjusting for width) more than the other observations do, but they are fit well by the model. These two could be driving the significance of the color effect.

76 Listing of the aggregated observations (Obs, w, dark, sumy, sumn) for the cases discussed above (values lost in transcription).

77 Comments:

Your book suggests lowess smooths of residual plots (pp. ...), based on the identity E(Y_i − π_i) = 0 for Bernoulli data. I aggregated observations to achieve something similar, but the lowess approach has obvious advantages, e.g. it's more or less automatic, easy to carry out, etc. In your homework you'll consider lowess-smoothed residual plots. You are looking for a line that is approximately zero, not perfectly zero. The line will have a natural increase/decrease at either end if there are lots of zeros or ones, e.g. the last two plots on p. ...

It's easy to get estimated probabilities and associated confidence intervals. This is discussed in 14.9 and later in these notes under optional material. Examples of how to obtain this from PROC LOGISTIC are in the sample SAS code.

78 14.9 Inferences for the mean function

Consider the full model

    logit{π(x)} = β_0 + β_1 x_1 + ... + β_{p−1} x_{p−1} = x'β.

Most types of inferences are functions of β, say g(β). Some examples:

- g(β) = β_j, the j-th regression coefficient.
- g(β) = e^{β_j}, the j-th odds ratio.
- g(β) = e^{x'β}/(1 + e^{x'β}), the probability π(x).

If β̂ is the MLE of β, then g(β̂) is the MLE of g(β). This provides an estimate. The delta method is an all-purpose method for obtaining a standard error for g(β̂).

79 We know β̂ ~ N_p(β, côv(β̂)), approximately. Let g(β) be a function from R^p to R. Taylor's theorem implies, as long as the MLE β̂ is somewhat close to the true value β, that

    g(β) ≈ g(β̂) + [Dg(β̂)]'(β − β̂),

where Dg(β) is the vector of first partial derivatives,

    Dg(β) = ( ∂g(β)/∂β_1, ∂g(β)/∂β_2, ..., ∂g(β)/∂β_p )'.

80 Then (β̂ − β) ~ N_p(0, côv(β̂)) implies

    [Dg(β)]'(β̂ − β) ~ N(0, [Dg(β)]' côv(β̂) [Dg(β)]),

and finally

    g(β̂) ~ N(g(β), [Dg(β̂)]' côv(β̂) [Dg(β̂)]).

So se{g(β̂)} = sqrt{[Dg(β̂)]' côv(β̂) [Dg(β̂)]}. This can be used to get confidence intervals for probabilities, etc.
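
A worked instance (standard delta-method algebra, supplementary to the slide): for the fitted probability g(β) = e^{x'β}/(1 + e^{x'β}) = π(x), the gradient is Dg(β) = π(x)[1 − π(x)] x, so

    se{π̂(x)} = π̂(x)[1 − π̂(x)] sqrt{x' côv(β̂) x}.

An interval built this way can spill outside [0, 1]; computing a Wald interval for x'β first and pushing its endpoints through the inverse logit avoids that.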

81 proc logistic data=crabs1 descending;
model y = width;
output out=crabs2 pred=p lower=l upper=u;
proc sort data=crabs2; by width;
proc gplot data=crabs2;
title "Estimated probabilities with pointwise 95% CIs";
symbol1 i=join color=black;
symbol2 i=join color=red line=3;
symbol3 i=join color=black;
axis1 label=( );
plot (l p u)*width / overlay vaxis=axis1;

82 Fitting logistic regression models (pp. ...) (Optional...)

The data are (x_i, Y_i) for i = 1,...,N. The model is

    Y_i ~ bin(n_i, e^{β'x_i}/(1 + e^{β'x_i})).

The pmf of Y_i in terms of β is

    p(y_i; β) = (n_i choose y_i) [e^{β'x_i}/(1 + e^{β'x_i})]^{y_i} [1 − e^{β'x_i}/(1 + e^{β'x_i})]^{n_i − y_i}.

The likelihood is the product of all N of these, and the log-likelihood simplifies to

    L(β) = Σ_{j=1}^p β_j Σ_{i=1}^N y_i x_ij − Σ_{i=1}^N n_i log[1 + exp(Σ_{j=1}^p β_j x_ij)] + constant.

83 The likelihood (or score) equations are obtained by taking partial derivatives of L(β) with respect to the elements of β and setting them equal to zero. Newton-Raphson is used to get β̂; see the following optional slides if interested.

The inverse of the covariance of β̂ has ij-th element

    −∂²L(β)/(∂β_i ∂β_j) = Σ_{s=1}^N x_si x_sj n_s π_s(1 − π_s),

where π_s = e^{β'x_s}/(1 + e^{β'x_s}). The estimated covariance matrix côv(β̂) is obtained by replacing β with β̂. This can be rewritten as

    côv(β̂) = {X' diag[n_i π̂_i(1 − π̂_i)] X}^{-1}.
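
For completeness (a standard computation, not spelled out on the slide), the score equations just mentioned work out to

    ∂L(β)/∂β_j = Σ_{i=1}^N (y_i − n_i π_i) x_ij = 0,   j = 1,...,p,

where π_i = e^{β'x_i}/(1 + e^{β'x_i}): each predictor's observed cross-product Σ_i y_i x_ij is set equal to its fitted expectation Σ_i n_i π_i x_ij.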

84 How to get the estimates? (Optional...)

Newton-Raphson in one dimension: say we want to find where f(x) = 0 for a differentiable f(x). Let x_0 be such that f(x_0) = 0. Taylor's theorem tells us

    f(x_0) ≈ f(x) + f'(x)(x_0 − x).

Plugging in f(x_0) = 0 and solving for x_0, we get x̂_0 = x − f(x)/f'(x). Starting at an x near x_0, x̂_0 should be closer to x_0 than x was. Let's iterate this idea t times:

    x^{(t+1)} = x^{(t)} − f(x^{(t)})/f'(x^{(t)}).

Eventually, if things go right, x^{(t)} should be close to x_0.
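
A sketch of the multivariate version actually used for logistic regression (standard Newton-Raphson/Fisher scoring, stated here as a supplement): apply the same idea to the score vector, with f' replaced by the negative of the information from slide 83:

    β^{(t+1)} = β^{(t)} + {X'W^{(t)}X}^{-1} X'(y − µ^{(t)}),

where µ_i^{(t)} = n_i π_i^{(t)} and W^{(t)} = diag[n_i π_i^{(t)}(1 − π_i^{(t)})], with π_i^{(t)} evaluated at β^{(t)}. Iterate until the change in β^{(t)} is negligible; at convergence, côv(β̂) = {X'ŴX}^{-1} comes for free.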


1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

One-Way Tables and Goodness of Fit

One-Way Tables and Goodness of Fit Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

The material for categorical data follows Agresti closely.

The material for categorical data follows Agresti closely. Exam 2 is Wednesday March 8 4 sheets of notes The material for categorical data follows Agresti closely A categorical variable is one for which the measurement scale consists of a set of categories Categorical

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine. Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev Variable selection: Suppose for the i-th observational unit (case) you record ( failure Y i = 1 success and explanatory variabales Z 1i Z 2i Z ri Variable (or model) selection: subject matter theory and

More information

Short Course Introduction to Categorical Data Analysis

Short Course Introduction to Categorical Data Analysis Short Course Introduction to Categorical Data Analysis Alan Agresti Distinguished Professor Emeritus University of Florida, USA Presented for ESALQ/USP, Piracicaba Brazil March 8-10, 2016 c Alan Agresti,

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

7.1, 7.3, 7.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1/ 31

7.1, 7.3, 7.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1/ 31 7.1, 7.3, 7.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 31 7.1 Alternative links in binary regression* There are three common links considered

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Likelihoods for Generalized Linear Models

Likelihoods for Generalized Linear Models 1 Likelihoods for Generalized Linear Models 1.1 Some General Theory We assume that Y i has the p.d.f. that is a member of the exponential family. That is, f(y i ; θ i, φ) = exp{(y i θ i b(θ i ))/a i (φ)

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Analysing categorical data using logit models

Analysing categorical data using logit models Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014

Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2014 1 / 13 Chapter 8: Polynomials & Interactions

More information

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla. Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Statistical Modelling with Stata: Binary Outcomes

Statistical Modelling with Stata: Binary Outcomes Statistical Modelling with Stata: Binary Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 21/11/2017 Cross-tabulation Exposed Unexposed Total Cases a b a + b Controls

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information