6 Applying Logistic Regression Models


I Model Selection and Diagnostics

I.1 Model Selection

How many x's can be entered in the model? Rule of thumb: the number of events of each type (both [Y = 1] and [Y = 0]) per x should be at least 10.
Be aware of collinearity among the x's.
Traditional model selection procedures (used when p << n):
1. Forward selection (simple one + variant)
2. Backward elimination
Modern model selection procedures, usually in the form of penalized likelihood (can handle p > n); a new research area.
Slide 344

Use the LRT for nested models (e.g., Table 6.2).
Use AIC (Akaike information criterion) or BIC (Bayesian information criterion) for model selection (the models need not be nested). The smaller the AIC/BIC, the better:
$$\mathrm{AIC} = -2\{\ell_{\max} - p\}, \qquad \mathrm{BIC} = -2\{\ell_{\max} - 0.5\log(n)\,p\}.$$
Note: BIC tends to yield a simpler model than AIC.
Use common sense in model building (e.g., time ordering; see Table 6.3).
Slide 345
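The two criteria are simple functions of the maximized log-likelihood. The sketch below evaluates them for two candidate models; the log-likelihoods, parameter counts and sample size are hypothetical values chosen only for illustration, and the function names are not from the slides.

```python
import math

def aic(loglik_max, p):
    """AIC = -2 * (l_max - p), with p the number of parameters."""
    return -2.0 * (loglik_max - p)

def bic(loglik_max, p, n):
    """BIC = -2 * (l_max - 0.5 * log(n) * p), with n the sample size."""
    return -2.0 * (loglik_max - 0.5 * math.log(n) * p)

# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
fits = {"small": (-100.0, 3), "large": (-98.5, 6)}
n_obs = 173  # hypothetical sample size

for name, (loglik, p) in fits.items():
    print(name, round(aic(loglik, p), 2), round(bic(loglik, p, n_obs), 2))
```

Because log(n) exceeds 2 once n is at least 8, BIC charges more per parameter than AIC, which is why it tends to pick the simpler model.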

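I.2 Model Diagnostics

Use standardized residuals to check model fit and identify outliers. The model is
$$y_i \mid x_i \overset{\text{ind}}{\sim} \mathrm{Bin}(n_i, \pi_i), \qquad \mathrm{logit}(\pi_i) = x_i^T\beta, \qquad \hat\pi_i = \frac{e^{x_i^T\hat\beta}}{1 + e^{x_i^T\hat\beta}}.$$
1. Standardized Pearson residual:
$$e_i = \frac{y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1-\hat\pi_i)}}, \qquad e_i^{st} = \frac{e_i}{\sqrt{1-h_i}},$$
where $h_i$ is the leverage of observation i.
Slide 346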

2. Standardized deviance residual:
$$d_i^2 = 2\left\{ y_i \log\frac{y_i}{n_i\hat\pi_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i - n_i\hat\pi_i} \right\}, \qquad d_i = \sqrt{d_i^2}\;\mathrm{sign}(y_i - n_i\hat\pi_i), \qquad d_i^{st} = \frac{d_i}{\sqrt{1-h_i}}.$$
If $|e_i^{st}|$ (or $|d_i^{st}|$) exceeds 2 or 3, observation i is a potential outlier.
Plots of $e_i^{st}$ (or $d_i^{st}$) vs. $x_i$ or $x_i^T\hat\beta$ may detect lack of fit.
When $n_i = 1$, $e_i^{st}$ (or $d_i^{st}$) is not very informative.
Note: Proc Logistic does not report $e_i^{st}$ and $d_i^{st}$; use Proc GenMod to get them.
Slide 347
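As a rough illustration of the two residual definitions, the following sketch computes both standardized residuals for a single binomial observation. The fitted probability and the leverage are taken as given inputs (in practice they come from the fitted model), and the numbers in the example call are hypothetical.

```python
import math

def _xlogy(a, b):
    """a * log(a / b), with the convention that the term is 0 when a == 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def standardized_residuals(y, n, pi_hat, h):
    """Return (e_st, d_st) for y successes out of n, fitted prob pi_hat, leverage h."""
    fitted = n * pi_hat
    # Pearson residual
    e = (y - fitted) / math.sqrt(n * pi_hat * (1.0 - pi_hat))
    # Deviance residual: signed square root of the deviance contribution
    d2 = 2.0 * (_xlogy(y, fitted) + _xlogy(n - y, n - fitted))
    d = math.copysign(math.sqrt(max(d2, 0.0)), y - fitted)
    scale = math.sqrt(1.0 - h)
    return e / scale, d / scale

# Hypothetical observation: 8 successes out of 10, fitted probability 0.5, leverage 0.2.
e_st, d_st = standardized_residuals(8, 10, 0.5, 0.2)
print(round(e_st, 3), round(d_st, 3))
```

Both values land just above 2 here, so this hypothetical observation would be flagged for a closer look under the rule of thumb above.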

Example 1: Residual plot for the crab data.
Model: $\mathrm{logit}(P[Y = 1 \mid x, c]) = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \beta_3 c_3 + \beta_4 x$

    data crab;
      input color spine width satell weight;
      weight=weight/1000; color=color-1;
      satbin=(satell>0);
      c1 = (color=1); c2 = (color=2); c3 = (color=3); c4 = (color=4);
      s1 = (spine=1); s2 = (spine=2);
      datalines;
    ...
    ;

    proc genmod data=crab descending;
      model satbin = width c1 c2 c3 / dist=bin link=logit;
      output out=resid ResRaw=ResRaw ResChi=ResChi StdReschi=StdReschi;
    run;

    data _null_;
      set resid;
      file "crab_res";
      put stdreschi width;
    run;
Slide 348

[Figure: Standardized Pearson residual plot for the crab data. Vertical axis: standardized Pearson residual; horizontal axis: carapace width.]
Slide 349

Example 2: Heart disease and blood pressure (Table 6.5, P. 217)

    data HD;
      input bp $ n y;
      if bp="<117" then x=111.5;
      else if bp=" " then x=121.5;
      else if bp=" " then x=131.5;
      else if bp=" " then x=141.5;
      else if bp=" " then x=151.5;
      else if bp=" " then x=161.5;
      else if bp=" " then x=176.5;
      else x=191.5;
      cards;
    ...
    ;

    proc genmod;
      model y/n = x / dist=bin link=logit residual;
    run;
Slide 350

Criteria For Assessing Goodness Of Fit
    Criterion             DF    Value    Value/DF
    Deviance
    Scaled Deviance
    Pearson Chi-Square
    Scaled Pearson X2

Analysis Of Maximum Likelihood Parameter Estimates
    Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square
    Intercept
    x

Residuals reported for each observation: Raw Residual, Pearson Residual, Deviance Residual, Std Deviance Residual, Std Pearson Residual, Likelihood Residual.
Slide 351

Example 3: Admission to Graduate School at UF (Table 6.6)

Let $\pi(k, g) = P[\text{admission} \mid D = k, G = g]$ for department D = k and gender G = g. We consider three models:
1. $\mathrm{logit}\,\pi(k, g) = \alpha + \beta^D_k$: Admission is independent of gender within each department.
2. $\mathrm{logit}\,\pi(k, g) = \alpha + \beta^D_k + \beta^G_g$: The admission-gender association is the same across departments.
3. $\mathrm{logit}\,\pi(k, g) = \alpha + \beta^G_g$: The marginal admission-gender association, collapsed over departments.

    options ls=75 ps=100;
    data admit;
      input dept $ gender y yno;
      n = y+yno;
      male=gender-1;
      cards;
    ...
Slide 352

    ...
    ;

    title "Model 1: Logistic model assuming gender and admission are";
    title2 "conditional independent given department";
    proc genmod;
      class dept;
      model y/n = dept / dist=bin link=logit;
      output out=resid Resraw=Resraw Reschi=Reschi StdReschi=StdReschi;
    run;

    data resid;
      set resid;
      keep dept male Resraw Reschi StdReschi;
    run;

    title "Residuals from Model 1";
    proc print data=resid;
    run;

    title "Model 2: Logistic model with homogeneous GA and DA association";
    proc genmod data=admit;
      class dept;
      model y/n = dept male;
    run;

    title "Model 3: Logistic model for marginal GA association";
    proc genmod data=admit;
      model y/n = male;
    run;
Slide 353

Part of the output:

Model 1: Logistic model assuming gender and admission are conditional independent given department

Criteria For Assessing Goodness Of Fit
    Criterion             DF    Value    Value/DF
    Deviance
    Scaled Deviance
    Pearson Chi-Square
    Scaled Pearson X2

Residuals from Model 1 (columns: Obs, dept, male, Reschi, Resraw, StdReschi), two rows per department:
anth, astr, chem, clas, comm, comp, engl, geog, geol, germ
Slide 354

Residuals from Model 1, continued (two rows per department):
hist, lati, ling, math, phil, phys, poli, psyc, reli, roma, soci, stat, zool

Model 2: Logistic model with homogeneous GA and DA association

Criteria For Assessing Goodness Of Fit
    Criterion             DF    Value    Value/DF
    Deviance
    Scaled Deviance
    Pearson Chi-Square
    Scaled Pearson X2
Slide 355

Analysis Of Maximum Likelihood Parameter Estimates (Model 2)
    Parameter    DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square
    Intercept
    dept anth
    dept astr
    ...
    male

Model 3: Logistic model for marginal GA association

Criteria For Assessing Goodness Of Fit
    Criterion             DF    Value    Value/DF
    Deviance
    Scaled Deviance
    Pearson Chi-Square
    Scaled Pearson X2

Analysis Of Maximum Likelihood Parameter Estimates (Model 3)
    Parameter    DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square
    Intercept
    male

Models 2 & 3 show Simpson's Paradox.
Slide 356

II Inference on the Conditional Association in 2 × 2 × K Tables

Example: Multi-center clinical trial evaluating a cream in curing skin infection (Table 6.9, P. 226). For each center Z = 1, ..., 8 there is a 2 × 2 table of treatment (trt, control) by response (S, F).
What we observed: there is a lot of variation in success probabilities among centers.
Slide 357

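If we collapse the tables over centers, we get a single 2 × 2 table of X (trt, control) by Y (S, F) and the marginal odds-ratio estimate $\hat\theta_{XY}$.
This estimate may not be very useful: the collapsed data are not a random sample, so we cannot use the familiar formula for the variance of $\log\hat\theta_{XY}$,
$$\widehat{\mathrm{var}}(\log\hat\theta_{XY}) \approx \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}.$$
Should focus on conditional association!
Slide 358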

II.1 Testing Conditional Independence between X and Y Given Z ($H_0: X \perp Y \mid Z$)

1. Method 1: Use a logistic model with ML inference (good when K is fixed and small to moderate).
Let
    Y = 1 for success, 0 for failure;
    x = 1 for treatment, 0 for control;
    z = 1, 2, ..., 8 for centers;
    $\pi(x, z) = P[Y = 1 \mid x, z]$,
and consider the (homogeneous) model
$$\mathrm{logit}\,\pi(x, z = k) = \beta x + \beta^z_k \qquad (*)$$
This is a common odds-ratio model:
$$\frac{\pi(x=1,z=k)/\{1-\pi(x=1,z=k)\}}{\pi(x=0,z=k)/\{1-\pi(x=0,z=k)\}} = e^{\beta}.$$
Slide 359

$$\frac{\pi(x=0,z=k)}{1-\pi(x=0,z=k)} = e^{\beta^z_k}$$
Under this model, $H_0: \beta = 0 \iff H_0: X \perp Y \mid Z$.

    data table6_9;
      input center trt y y0;
      n=y+y0;
      cards;
    ...
    ;

    title "Use homogeneous model to test no treatment effect at each center";
    proc logistic;
      class center / param=ref;
      model y/n = center trt / selection=f include=1 slentry=1;
    run;

Output:
    Use homogeneous model to test no treatment effect at each center
    The LOGISTIC Procedure
    The following effects will be included in each model: Intercept center
    Step 0. The INCLUDE effects were entered.
Slide 360

Model Fit Statistics (Step 0)
    Criterion    Intercept Only    Intercept and Covariates
    -2 Log L

Residual Chi-Square Test
    Chi-Square    DF    Pr > ChiSq

Step 1. Effect trt entered:

Model Fit Statistics
    Criterion    Intercept Only    Intercept and Covariates
    -2 Log L

Analysis of Maximum Likelihood Estimates
    Parameter    DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
    Intercept
    center (seven rows, one with Pr > ChiSq <.0001)
    trt
Slide 361

Three tests for $H_0: \beta = 0$:
(a) Score test: $\chi^2$ = , df = 1, P =
(b) LRT: $G^2$ = = 6.669, df = 1, P =
(c) Wald test: $\chi^2$ = , P =
Strong evidence to reject $H_0: \beta = 0$.
$\hat\beta$ = , $e^{\hat\beta}$ = 2.17. At each center, the odds of success (infection is cured) for treated patients are 2.17 times the odds of success for untreated patients.
Note 1: The above test results are based on the homogeneous model (*). When β = 0, model (*) reduces to
$$\mathrm{logit}\,\pi(x, z = k) = \beta^z_k \iff H_0: X \perp Y \mid Z,$$
so $H_0$ can also be tested by conducting the GOF test for this reduced model.
Slide 362
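For a statistic with one degree of freedom, the upper-tail chi-square probability can be obtained from the complementary error function, since a chi-square variable with 1 df is a squared standard normal. The sketch below applies this to the LRT value 6.669 quoted above and recovers the log odds ratio implied by the reported odds ratio 2.17; the helper name is an assumption for illustration.

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

g2 = 6.669         # LRT statistic quoted in the text
odds_ratio = 2.17  # e^{beta_hat} quoted in the text

p_value = chi2_sf_df1(g2)
beta_implied = math.log(odds_ratio)  # log odds ratio implied by 2.17

print(f"LRT p-value: {p_value:.4f}")
print(f"implied log odds ratio: {beta_implied:.4f}")
```

The same helper works for the score and Wald statistics, which are also referred to a chi-square distribution with 1 df.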

    title "Use goodness-of-fit statistics to test conditional independence";
    proc genmod;
      class center;
      model y/n = center;
    run;

Output:
    Use goodness-of-fit statistics to test conditional independence
    The GENMOD Procedure

    Response Profile
    Ordered Value    Binary Outcome    Total Frequency
    1                Event
    2                Nonevent          171

    Criteria For Assessing Goodness Of Fit
    Criterion             DF    Value    Value/DF
    Deviance
    Scaled Deviance
    Pearson Chi-Square
    Scaled Pearson X2

$\chi^2$ = 13.71, df = 16 − 8 = 8, P =
$G^2$ = 16.42, df = 8, P =
Less powerful.
Slide 363

Note 2: We can also test the adequacy of the homogeneous model (*) using its GOF statistics:

    title "Use goodness-of-fit statistics to test homogeneity";
    proc genmod;
      class center;
      model y/n = center trt;
    run;

Output:
    Use goodness-of-fit statistics to test homogeneity
    The GENMOD Procedure

    Criteria For Assessing Goodness Of Fit
    Criterion             DF    Value    Value/DF
    Deviance
    Scaled Deviance
    Pearson Chi-Square
    Scaled Pearson X2

$\chi^2$ = , df = 7, P = 0.33
$G^2$ = , df = 7, P = 0.20; adequate fit.
Slide 364

2. Method 2: Use the Cochran-Mantel-Haenszel (CMH) test for $H_0: X \perp Y \mid Z$ (good when $K \to \infty$, or K is fixed but $n_{++k} \to \infty$).
The above analysis, which assumes N = 2 × K = 2 × 8 = 16 is fixed, may be problematic in many situations. One way to test $X \perp Y \mid Z$ is to use the CMH test. Data from center Z = k:

                      Y
                  S          F
    X  trt        n_11k      n_12k      n_1+k
       control    n_21k      n_22k      n_2+k
                  n_+1k      n_+2k
Slide 365

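Under $H_0: X \perp Y \mid Z$, $n_{11k} \mid n_{1+k}, n_{+1k}$ follows a hypergeometric distribution:
$$E(n_{11k} \mid H_0, n_{1+k}, n_{+1k}) = \frac{n_{1+k}\,n_{+1k}}{n_{++k}} = \mu_{11k},$$
$$\mathrm{var}(n_{11k} \mid H_0, n_{1+k}, n_{+1k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)}.$$
CMH statistic:
$$\chi^2 = \frac{\left[\sum_{k=1}^{K}(n_{11k} - \mu_{11k})\right]^2}{\sum_{k=1}^{K}\mathrm{var}(n_{11k} \mid H_0, n_{1+k}, n_{+1k})} \overset{H_0}{\sim} \chi^2_1.$$
CMH with continuity correction:
$$\chi^2_c = \frac{\left\{\left|\sum_{k=1}^{K}(n_{11k} - \mu_{11k})\right| - 0.5\right\}^2}{\sum_{k=1}^{K}\mathrm{var}(n_{11k} \mid H_0, n_{1+k}, n_{+1k})} \overset{H_0}{\sim} \chi^2_1.$$
The CMH test does not require the homogeneous model.
Slide 366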
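The CMH statistic can be computed directly from the stratum counts. The following is a minimal sketch of the formulas above; the two 2 × 2 tables in the example are hypothetical (the Table 6.9 counts are not reproduced here), and the function name is an assumption.

```python
def cmh_statistic(tables, correction=False):
    """CMH chi-square for a list of 2x2 tables given as (n11, n12, n21, n22)."""
    diff_sum = 0.0  # sum over strata of n11k - mu11k
    var_sum = 0.0   # sum over strata of the hypergeometric variances
    for n11, n12, n21, n22 in tables:
        n1p, n2p = n11 + n12, n21 + n22   # row totals
        np1, np2 = n11 + n21, n12 + n22   # column totals
        n = n1p + n2p
        if n < 2:
            continue  # the variance is undefined for such a stratum
        diff_sum += n11 - n1p * np1 / n
        var_sum += n1p * n2p * np1 * np2 / (n * n * (n - 1))
    num = abs(diff_sum) - 0.5 if correction else diff_sum
    return num * num / var_sum

# Hypothetical strata, one tuple (n11, n12, n21, n22) per center.
tables = [(10, 5, 6, 9), (8, 7, 4, 11)]
print(round(cmh_statistic(tables), 4))
print(round(cmh_statistic(tables, correction=True), 4))
```

Note that the deviations are summed over strata before squaring, so strata with effects in opposite directions cancel; this is why the test is most powerful when the association has the same direction in every stratum.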

    data y1; set table6_9;
      count=y; drop y0; y=1;
    run;

    data y0; set table6_9;
      count=y0; drop y0; y=0;
    run;

    data new; set y1 y0;
    run;

    title "MH test for conditional independence and MH common OR";
    proc freq data=new order=data;
      weight count;
      tables center*trt*y / nopercent norow nocol cmh;
    run;

Output:
    MH test for conditional independence and MH common OR
    The FREQ Procedure
    Summary Statistics for trt by y, Controlling for center

    Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
    Statistic    Alternative Hypothesis     DF    Value    Prob
    1            Nonzero Correlation
    2            Row Mean Scores Differ
    3            General Association
Slide 367

Estimates of the Common Relative Risk (Row1/Row2)
    Type of Study             Method             Value    95% Confidence Limits
    Case-Control              Mantel-Haenszel
    (Odds Ratio)              Logit **
    Cohort                    Mantel-Haenszel
    (Col1 Risk)               Logit **
    Cohort                    Mantel-Haenszel
    (Col2 Risk)               Logit **
    ** These logit estimators use a correction of 0.5 in every cell of those tables that contain a zero.

Breslow-Day Test for Homogeneity of the Odds Ratios
    Chi-Square
    DF            7
    Pr > ChiSq

CMH $\chi^2$ = , df = 1, P =
MH common odds-ratio estimate $\hat\theta_{MH}$ = with 95% CI [1.1776, ].
Breslow-Day test for a common odds ratio: $\chi^2$ = , df = 7, P = , similar to the GOF test.
Slide 368

3. Method 3: Use a conditional logistic regression under the homogeneous model (*) (good even when $K \to \infty$):
$$\mathrm{logit}\,\pi(x, k) = x\beta + \beta_k.$$
Problem: the number of $\beta_k$'s may $\to \infty$; we want to get rid of them.
Idea: find sufficient statistics for the $\beta_k$ and conduct inference on β based on the conditional distribution of the data given those sufficient statistics.
Data from center Z = k:

                      Y
                  S          F
    X  trt        n_11k      n_12k      n_1+k
       control    n_21k      n_22k      n_2+k
Slide 369

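Given
$$n_{11k} \mid n_{1+k} \sim \mathrm{Bin}(n_{1+k}, \pi(1, k)), \qquad n_{21k} \mid n_{2+k} \sim \mathrm{Bin}(n_{2+k}, \pi(0, k)),$$
we get the likelihood function of β and $(\beta_1, \ldots, \beta_K)$:
$$L(\beta, \beta_1, \ldots, \beta_K) = \prod_{k=1}^{K} L_k(\beta, \beta_k),$$
where $L_k(\beta, \beta_k)$ is the likelihood contributed by the data from center Z = k:
$$L_k(\beta, \beta_k) = \{\pi(1, k)\}^{n_{11k}} \{1 - \pi(1, k)\}^{n_{12k}} \{\pi(0, k)\}^{n_{21k}} \{1 - \pi(0, k)\}^{n_{22k}},$$
$$\pi(1, k) = \frac{e^{\beta+\beta_k}}{1 + e^{\beta+\beta_k}}, \qquad \pi(0, k) = \frac{e^{\beta_k}}{1 + e^{\beta_k}}.$$
Slide 370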

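$$L_k(\beta, \beta_k) = \left(\frac{e^{\beta+\beta_k}}{1 + e^{\beta+\beta_k}}\right)^{n_{11k}} \left(\frac{1}{1 + e^{\beta+\beta_k}}\right)^{n_{12k}} \left(\frac{e^{\beta_k}}{1 + e^{\beta_k}}\right)^{n_{21k}} \left(\frac{1}{1 + e^{\beta_k}}\right)^{n_{22k}}$$
$$= \frac{e^{\beta n_{11k} + \beta_k (n_{11k} + n_{21k})}}{(1 + e^{\beta+\beta_k})^{n_{11k}+n_{12k}}\,(1 + e^{\beta_k})^{n_{21k}+n_{22k}}} = \frac{e^{\beta n_{11k} + \beta_k n_{+1k}}}{(1 + e^{\beta+\beta_k})^{n_{1+k}}\,(1 + e^{\beta_k})^{n_{2+k}}}.$$
Since $n_{1+k}$ and $n_{2+k}$ are already fixed, $n_{+1k} = n_{11k} + n_{21k}$ (the total number of successes in center k) is a sufficient statistic for $\beta_k$.
Hence $L_k(\beta, \beta_k \mid n_{+1k})$ should be free of $\beta_k$; it is a noncentral hypergeometric distribution.
Slide 371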
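The claim that conditioning on the success total removes $\beta_k$ can be checked numerically. The sketch below computes the conditional probability of $n_{11k}$ two ways: directly from the two binomials (which uses $\beta_k$), and from the noncentral hypergeometric form (which does not). The counts and parameter values are hypothetical, and the function names are assumptions.

```python
from math import comb, exp

def expit(t):
    return 1.0 / (1.0 + exp(-t))

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def conditional_from_binomials(n11, n1p, n2p, np1, beta, beta_k):
    """P(n11 | n+1) obtained by conditioning the product of the two binomials."""
    def joint(t):
        return (binom_pmf(t, n1p, expit(beta + beta_k))
                * binom_pmf(np1 - t, n2p, expit(beta_k)))
    lo, hi = max(0, np1 - n2p), min(n1p, np1)  # support of n11 given the margins
    return joint(n11) / sum(joint(t) for t in range(lo, hi + 1))

def noncentral_hypergeom_pmf(n11, n1p, n2p, np1, beta):
    """Same probability written without beta_k: weights C(n1+,t) C(n2+,n+1-t) e^{beta t}."""
    def weight(t):
        return comb(n1p, t) * comb(n2p, np1 - t) * exp(beta * t)
    lo, hi = max(0, np1 - n2p), min(n1p, np1)
    return weight(n11) / sum(weight(t) for t in range(lo, hi + 1))

# Hypothetical center: n1+ = 10, n2+ = 12, n+1 = 9, observed n11 = 6, beta = 0.7.
for beta_k in (-2.0, 0.0, 1.5):
    print(beta_k, conditional_from_binomials(6, 10, 12, 9, 0.7, beta_k))
print(noncentral_hypergeom_pmf(6, 10, 12, 9, 0.7))
```

The printed conditional probabilities agree for every choice of beta_k, which is the property that lets the conditional likelihood depend on β alone.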

The conditional logistic inference (on β) is based on the conditional likelihood
$$L_c(\beta \mid \{n_{+1k}\}) = \prod_{k=1}^{K} L_k(\beta, \beta_k \mid n_{+1k}),$$
which has only one parameter β no matter how large K is!
Treating this as a regular likelihood function, we can estimate β by maximizing $L_c(\beta \mid \{n_{+1k}\})$. We can also conduct the Wald, score and LR tests of $H_0: \beta = 0$.
Slide 372

SAS program and output:

    title "Use a conditional logistic regression to assess treatment effect";
    proc logistic;
      class center;
      model y/n = trt;
      strata center;
    run;

Output:
    Use a conditional logistic regression to assess treatment effect
    The LOGISTIC Procedure
    Conditional Analysis

    Model Information
    Data Set                      WORK.TABLE6_9
    Response Variable (Events)    y
    Response Variable (Trials)    n
    Number of Strata              8
    Model                         binary logit
    Optimization Technique        Newton-Raphson ridge

    Testing Global Null Hypothesis: BETA=0
    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio
    Score
    Wald
Slide 373

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    trt

$\hat\beta$ = (SE = ), $e^{\hat\beta}$ = 2.13, similar to before since K = 8 is small.
LRT $G^2$ = , Score $\chi^2$ = , Wald $\chi^2$ = . Reject $H_0: \beta = 0$.
Note 1: The score $\chi^2$ statistic based on $L_c(\beta \mid \{n_{+1k}\})$ is equivalent to the CMH $\chi^2$.
Note 2: We can make exact conditional inference for a regression coefficient in a regular regression model using the same idea.
$Y_i$ = 1/0 for success/failure; covariates: $x_{i1}, x_{i2}, \ldots, x_{ip}$; $\pi(x_i) = P[Y_i = 1 \mid x_i]$.
Slide 374

Model:
$$\mathrm{logit}\{\pi(x_i)\} = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}.$$
We can find a sufficient statistic for each $\beta_k$, denoted by $T_k$.
Suppose we would like to make exact conditional inference on $\beta_p$, say. The exact inference can then be based on
$$f(y_1, y_2, \ldots, y_n \mid T_1, T_2, \ldots, T_{p-1}) = L(\beta_p).$$
For an exact test of $H_0: \beta_p = 0$, the conditional distribution of the data $(Y_1, Y_2, \ldots, Y_n)$ given $T_1, T_2, \ldots, T_{p-1}$ is completely known.
We can do an exact score test based on $L(\beta_p)$. We can also construct an exact CI for $\beta_p$ based on $L(\beta_p)$.
Software:

    proc logistic descending;
      model y = x1 x2 x3 / link=logit;
      exact x3;
    run;
Slide 375

Warning: It is usually very time consuming to conduct the exact inference, especially for non-sparse data, in which case no exact inference is needed.
Note 3: If we apply the above procedure to our homogeneous model (*)
$$\mathrm{logit}\,\pi(x, k) = x\beta + \beta_k,$$
we can make exact conditional inference on the treatment effect β. In this case L(β) is the conditional likelihood obtained before with the conditional logistic approach. Therefore, we get an exact CMH test of $H_0: \beta = 0$.

    title "Exact p-value for MH test of no treatment effect at each center";
    proc logistic data=table6_9;
      class center / param=ref;
      model y/n = center trt;
      exact trt;
    run;

Output:
    Exact p-value for MH test of no treatment effect at each center
    The LOGISTIC Procedure
Slide 376

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept
    center (seven rows, one with Pr > ChiSq <.0001)
    trt

    Exact conditional tests
    Effect    Test           Statistic    p-value (Exact)    p-value (Mid)
    trt       Score
    trt       Probability

The score statistic (row 1) is the CMH $\chi^2$, which is the score statistic based on L(β). We can also conduct Fisher's exact test of $H_0: \beta = 0$ using the table probability (row 2).
Slide 377

4. Method 4: Use a mixed model approach (good when $K \to \infty$ as $n \to \infty$):
$$\mathrm{logit}\,\pi(x, k) = x\beta + \beta_k.$$
Data from center Z = k:

                  Y
               1          0
    X  1       n_11k      n_12k      n_1+k
       0       n_21k      n_22k      n_2+k

Here the 8 centers are probably a random sample drawn from a large population of centers. The analysis should take this into account: the data are clustered.
$\beta_k$ is the log odds of success for patients in center k if they all receive the control treatment. It reflects the general health status of patients in center k.
Slide 378

Since center k is randomly sampled, it is reasonable to assume that $\beta_k$ is a random variable with a distribution. A commonly used distribution is
$$\beta_k \sim N(\mu, \sigma^2).$$
Let $b_k = \beta_k - \mu$; then $b_k \sim N(0, \sigma^2)$ and our model becomes
$$\mathrm{logit}\,\pi(x, k) = \mu + x\beta + b_k.$$
There are only 3 model parameters: µ, β and $\sigma^2$. The likelihood function of $(\mu, \beta, \sigma^2)$ is
$$L(\mu, \beta, \sigma^2) = \prod_{k=1}^{K} \int f(n_{11k} \mid b_k)\, f(n_{21k} \mid b_k)\, f(b_k)\, db_k.$$
The inference on β is based on $L(\mu, \beta, \sigma^2)$.
Slide 379
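Each factor of this likelihood is a one-dimensional integral over the random intercept. The sketch below approximates one center's contribution with a plain trapezoid rule on a grid of width ±8σ; this is a stand-in chosen for simplicity, not the Gauss-Hermite quadrature that a mixed-model procedure would typically use. All counts and parameter values in the example are hypothetical.

```python
from math import comb, exp, pi, sqrt

def expit(t):
    return 1.0 / (1.0 + exp(-t))

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def center_marginal_lik(n11, n1p, n21, n2p, mu, beta, sigma2, grid=2001, width=8.0):
    """Integrate f(n11|b) f(n21|b) f(b) over b ~ N(0, sigma2) by the trapezoid rule."""
    sigma = sqrt(sigma2)
    step = 2.0 * width * sigma / (grid - 1)
    total = 0.0
    for j in range(grid):
        b = -width * sigma + j * step
        density = exp(-0.5 * (b / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
        value = (binom_pmf(n11, n1p, expit(mu + beta + b))   # treated arm
                 * binom_pmf(n21, n2p, expit(mu + b))        # control arm
                 * density)
        total += value * (0.5 if j in (0, grid - 1) else 1.0)
    return total * step

# Hypothetical center: 11 of 36 successes on treatment, 10 of 37 on control.
print(center_marginal_lik(11, 36, 10, 37, mu=-1.0, beta=0.7, sigma2=2.0))
```

The full likelihood is the product of such terms over the K centers, which a numerical optimizer would then maximize over (µ, β, σ²).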

SAS program and output:

    title "Proc glimmix treating center effect as random";
    proc glimmix method=quad data=table6_9;
      class center;
      model y/n = trt / s dist=bin;
      random int / subject=center type=vc;
    run;

Output:
    Proc glimmix treating center effect as random
    The GLIMMIX Procedure

    Model Information
    Data Set                      WORK.TABLE6_9
    Response Variable (Events)    y
    Response Variable (Trials)    n
    Response Distribution         Binomial
    Link Function                 Logit
    Variance Function             Default
    Variance Matrix Blocked By    center
    Estimation Technique          Maximum Likelihood
    Likelihood Approximation      Gauss-Hermite Quadrature
    Degrees of Freedom Method     Containment

    Class Level Information
    Class     Levels    Values
    center

    Number of Observations Read    16
Slide 380

    Number of Observations Used    16
    Number of Events               102
    Number of Trials               273

    Iteration History
    Iteration    Restarts    Evaluations    Objective Function    Change    Max Gradient
    Convergence criterion (GCONV=1E-8) satisfied.

    Fit Statistics
    -2 Log Likelihood
    AIC (smaller is better)
    AICC (smaller is better)
    BIC (smaller is better)
    CAIC (smaller is better)
    HQIC (smaller is better)

    Fit Statistics for Conditional Distribution
    -2 log L(y | r. effects)
Slide 381

    Pearson Chi-Square         8.37
    Pearson Chi-Square / DF    0.52

    Covariance Parameter Estimates
    Cov Parm     Subject    Estimate    Standard Error
    Intercept    center

    Solutions for Fixed Effects
    Effect       Estimate    Standard Error    DF    t Value    Pr > |t|
    Intercept
    trt

From the output, we see $\hat\mu$ = , $\hat\beta$ = (SE = ), $e^{\hat\beta}$ = 2.1.
$\hat\sigma^2$ = , the variation in the log odds of success among centers. Huge variation.
The success probability for patients receiving control at center k is
$$\pi^0_k = \pi(0, k) = \frac{e^{\mu + b_k}}{1 + e^{\mu + b_k}}$$
Slide 382

and the success probability for patients receiving treatment at center k is
$$\pi^1_k = \pi(1, k) = \frac{e^{\mu + \beta + b_k}}{1 + e^{\mu + \beta + b_k}},$$
so we can generate a random sample of $b_k$'s to get a feeling for the distributions of $\pi^0_k$ and $\pi^1_k$:
$\bar\pi^0 = \hat E(\pi^0_k) = 0.29$, $\bar\pi^1 = \hat E(\pi^1_k) = 0.42$, $\theta_{XY}$ =
R function:

    # mu.hat and beta.hat are the fixed-effect estimates from the GLIMMIX output;
    # set them before running (their values are not given here).
    postscript(file="cream-prob.ps", horizontal = FALSE)
    par(mfrow=c(1,2), pty="s")
    b <- rnorm(10000, 0, sqrt(1.9591))
    expeta0 <- exp(mu.hat + b)
    expeta1 <- exp(mu.hat + beta.hat + b)
    pi0 <- expeta0/(1+expeta0)
    pi1 <- expeta1/(1+expeta1)
    mean0 <- mean(pi0)
    mean1 <- mean(pi1)
    hist(pi0, main="histogram of pi_0")
    hist(pi1, main="histogram of pi_1")
    dev.off()
Slide 383
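The same simulation idea can be sketched in Python with the standard library only. The variance 1.9591 is the value used in the R listing and the treatment effect is taken as log(2.1) from the reported odds ratio; the intercept below is a hypothetical placeholder, so the printed means are illustrative rather than a reproduction of the figures quoted above.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def simulate_success_probs(mu, beta, sigma2, n_draws=10000, seed=1):
    """Draw b_k ~ N(0, sigma2); return the mean control and treatment success probabilities."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    sum0 = sum1 = 0.0
    for _ in range(n_draws):
        b = rng.gauss(0.0, sigma)
        sum0 += expit(mu + b)          # pi_k^0: control
        sum1 += expit(mu + beta + b)   # pi_k^1: treatment
    return sum0 / n_draws, sum1 / n_draws

mu_assumed = -1.2                # hypothetical intercept, for illustration only
beta_from_or = math.log(2.1)     # from the reported e^{beta_hat} = 2.1
mean0, mean1 = simulate_success_probs(mu_assumed, beta_from_or, 1.9591)
print(round(mean0, 3), round(mean1, 3))
```

Averaging the probabilities over the random-effect distribution gives population-averaged success rates, which generally differ from the probabilities evaluated at $b_k = 0$.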

[Figure: two histograms, "Histogram of pi_0" and "Histogram of pi_1". Vertical axis: Frequency; horizontal axes: pi0 and pi1.]
Slide 384

II.2 Estimation of the Common Odds Ratio in 2 × 2 × K Tables

Each of the above methods provides an estimate of the common odds ratio in 2 × 2 × K tables, except the CMH method (Method 2).
There is also an MH estimate of the common odds ratio:
$$\hat\theta_{MH} = \frac{\sum_{k=1}^{K} n_{11k}\,n_{22k}/n_{++k}}{\sum_{k=1}^{K} n_{12k}\,n_{21k}/n_{++k}}.$$
Motivation of $\hat\theta_{MH}$: we could estimate θ using the data from the kth table as
$$\hat\theta = \frac{n_{11k}\,n_{22k}}{n_{12k}\,n_{21k}}.$$
Slide 385

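Estimating equation:
$$\theta\, n_{12k} n_{21k} = n_{11k} n_{22k}$$
$$\theta\, n_{12k} n_{21k}/n_{++k} = n_{11k} n_{22k}/n_{++k}$$
$$\theta \sum_{k=1}^{K} n_{12k} n_{21k}/n_{++k} = \sum_{k=1}^{K} n_{11k} n_{22k}/n_{++k}$$
$$\Longrightarrow\quad \hat\theta_{MH} = \frac{\sum_{k=1}^{K} n_{11k}\,n_{22k}/n_{++k}}{\sum_{k=1}^{K} n_{12k}\,n_{21k}/n_{++k}}.$$
CDA provides a variance formula for $\log(\hat\theta_{MH})$ on P. 229, which can be used to construct CIs for the common odds ratio θ.
Slide 386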
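The estimator is a ratio of two weighted sums and is straightforward to compute. A minimal sketch follows; the stratum counts are hypothetical and the function name is an assumption.

```python
def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio for 2x2 tables given as (n11, n12, n21, n22)."""
    num = 0.0
    den = 0.0
    for n11, n12, n21, n22 in tables:
        n = n11 + n12 + n21 + n22
        num += n11 * n22 / n
        den += n12 * n21 / n
    return num / den

# Hypothetical strata, one tuple (n11, n12, n21, n22) per center.
tables = [(10, 5, 6, 9), (8, 7, 4, 11)]
print(round(mh_odds_ratio(tables), 4))
```

With a single stratum the estimate reduces to the usual sample odds ratio, and a stratum containing a zero cell still contributes to the sums instead of producing an undefined stratum-specific odds ratio.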

For our cream example, $\hat\theta_{MH}$ is reported in the PROC FREQ output; see Method 2 in the previous section for the SAS program and output.
Slide 387

III Summarizing Predictive Power, Classification Tables and ROC Curves (P. 223)

Suppose we have a binary response $Y_i$ = 1/0 (success/failure) and $x_i$, a vector of covariates:
$$\pi(x_i) = P[Y_i = 1 \mid x_i], \qquad \mathrm{logit}\{\pi(x_i)\} = x_i^T\beta.$$
After fitting the model we have $\hat\beta$, and hence
$$\hat\pi_i = \frac{e^{x_i^T\hat\beta}}{1 + e^{x_i^T\hat\beta}}.$$
Choose a known value $\pi_0$ (e.g., $\pi_0 = 0.5$) and form the prediction
$$\hat Y_i = \begin{cases} 1 & \text{if } \hat\pi_i > \pi_0, \\ 0 & \text{otherwise,} \end{cases}$$
Slide 388

and then construct the classification table:

                  Ŷ
               1        0
    Y  1       n_11     n_12
       0       n_21     n_22

The following two quantities tell us how good the prediction is:
$$\text{sensitivity} = \frac{n_{11}}{n_{11} + n_{12}}, \qquad \text{specificity} = \frac{n_{22}}{n_{21} + n_{22}}.$$
Using only one table with one $\pi_0$ loses information.
Solution: use many different values of $\pi_0$ ⇒ many classification tables ⇒ many pairs of sensitivity and specificity ⇒ plot sensitivity vs. 1 − specificity ⇒ the ROC (receiver operating characteristic) curve.
The area under the ROC curve summarizes the predictive power of the model and is often called the c-index.
Slide 389
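The construction can be sketched in a few lines: for each threshold, classify with the rule $\hat\pi_i > \pi_0$, then compute sensitivity and specificity. The six observations below are illustrative values assumed for this sketch (three with Y = 1 and three with Y = 0); the function name is also an assumption.

```python
def sens_spec(y, p_hat, pi0):
    """Sensitivity and specificity of the rule 'predict 1 if p_hat > pi0'."""
    n11 = sum(1 for yi, pi in zip(y, p_hat) if yi == 1 and pi > pi0)   # true positives
    n12 = sum(1 for yi, pi in zip(y, p_hat) if yi == 1 and pi <= pi0)  # false negatives
    n21 = sum(1 for yi, pi in zip(y, p_hat) if yi == 0 and pi > pi0)   # false positives
    n22 = sum(1 for yi, pi in zip(y, p_hat) if yi == 0 and pi <= pi0)  # true negatives
    return n11 / (n11 + n12), n22 / (n21 + n22)

# Illustrative data (assumed): observed outcomes and fitted probabilities.
y = [1, 1, 1, 0, 0, 0]
p_hat = [0.9, 0.7, 0.5, 0.8, 0.6, 0.4]

# One (1 - specificity, sensitivity) point of the ROC curve per threshold.
for pi0 in [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    se, sp = sens_spec(y, p_hat, pi0)
    print(pi0, round(se, 3), round(1 - sp, 3))
```

Raising the threshold trades sensitivity for specificity, so the points trace the curve from the top-right corner toward the bottom-left.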

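An example with three observations with Y = 1 and three with Y = 0. For each threshold $\pi_0$ = 0.3, ..., 0.9 we form the prediction $\hat Y_{\pi_0}$ and the corresponding classification table of Ŷ by Y; the resulting sensitivity (se) and specificity (sp) are:

    π0    0.3    0.4    0.5    0.6    0.7    0.8    0.9
    se    3/3    3/3    2/3    2/3    1/3    1/3    0/3
    sp    0/3    1/3    1/3    2/3    2/3    3/3    3/3
Slide 390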

[Figure: ROC curve for the example. Vertical axis: Sensitivity; horizontal axis: 1 − Specificity.]
Slide 391

The AUC for the above ROC curve is 2/3, which equals the proportion of concordant pairs in $(Y_i, \hat\pi_i)$ among all pairs with different outcomes $Y_i$. A pair is concordant when the observation with $Y_i = 1$ has the larger $\hat\pi_i$.
Number of pairs with different outcomes: 3 × 3 = 9.
Number of concordant pairs: 6.
If there are ties in the $\hat\pi_i$'s, some adjustment is needed. For example, suppose the $\hat\pi_i$ for one observation with $Y_i = 1$ and one with $Y_i = 0$ are the same (0.4):
Slide 392

For each threshold $\pi_0$ = 0.4, ..., 0.9 we again form $\hat Y_{\pi_0}$ and the corresponding classification table of Ŷ by Y; the resulting sensitivity (se) and specificity (sp) are:

    π0    0.4    0.5    0.6    0.7    0.8    0.9
    se    3/3    2/3    2/3    1/3    1/3    0/3
    sp    0/3    1/3    2/3    2/3    3/3    3/3
Slide 393

[Figure: ROC curve when there are tied predictive probabilities. Vertical axis: Sensitivity; horizontal axis: 1 − Specificity.]
Slide 394

$$\mathrm{AUC} = \frac{5.5}{9},$$
where 9 is the number of pairs with different outcomes and 5.5 = number of concordant pairs (5) + 0.5 × number of ties in the $\hat\pi_i$'s among pairs with different outcomes (1).
Slide 395
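The pair-counting definition translates directly into code: compare every observation with Y = 1 against every observation with Y = 0, count 1 for a concordant pair and 0.5 for a tie. The two data sets below are illustrative values assumed for this sketch, chosen so that the counts match the ones quoted above (6 of 9 concordant without ties; 5 concordant plus 1 tie with a tie at 0.4).

```python
def auc_by_pairs(y, p_hat):
    """AUC as (concordant pairs + 0.5 * tied pairs) / (pairs with different outcomes)."""
    ones = [p for yi, p in zip(y, p_hat) if yi == 1]
    zeros = [p for yi, p in zip(y, p_hat) if yi == 0]
    score = 0.0
    for p1 in ones:
        for p0 in zeros:
            if p1 > p0:
                score += 1.0   # concordant pair
            elif p1 == p0:
                score += 0.5   # tied pair counts one half
    return score / (len(ones) * len(zeros))

y = [1, 1, 1, 0, 0, 0]
p_no_ties = [0.9, 0.7, 0.5, 0.8, 0.6, 0.4]    # assumed values, no ties
p_with_tie = [0.8, 0.6, 0.4, 0.7, 0.5, 0.4]   # assumed values, one tie at 0.4

print(auc_by_pairs(y, p_no_ties))
print(auc_by_pairs(y, p_with_tie))
```

The double loop costs a number of comparisons proportional to the product of the two group sizes, which is fine for small examples; rank-based formulas give the same number more efficiently for large data.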


More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Three-Way Contingency Tables

Three-Way Contingency Tables Newsom PSY 50/60 Categorical Data Analysis, Fall 06 Three-Way Contingency Tables Three-way contingency tables involve three binary or categorical variables. I will stick mostly to the binary case to keep

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

Some comments on Partitioning

Some comments on Partitioning Some comments on Partitioning Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/30 Partitioning Chi-Squares We have developed tests

More information

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval Epidemiology 9509 Wonders of Biostatistics Chapter 13 - Effect Measures John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being covered 1. risk factors 2. risk

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Example 7b: Generalized Models for Ordinal Longitudinal Data using SAS GLIMMIX, STATA MEOLOGIT, and MPLUS (last proportional odds model only)

Example 7b: Generalized Models for Ordinal Longitudinal Data using SAS GLIMMIX, STATA MEOLOGIT, and MPLUS (last proportional odds model only) CLDP945 Example 7b page 1 Example 7b: Generalized Models for Ordinal Longitudinal Data using SAS GLIMMIX, STATA MEOLOGIT, and MPLUS (last proportional odds model only) This example comes from real data

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information

Chapter 11: Analysis of matched pairs

Chapter 11: Analysis of matched pairs Chapter 11: Analysis of matched pairs Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 42 Chapter 11: Models for Matched Pairs Example: Prime

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Stat 704: Data Analysis I, Fall 2010

Stat 704: Data Analysis I, Fall 2010 Stat 704: Data Analysis I, Fall 2010 Generalized linear models Generalize regular regression to non-normal data {(Y i,x i )} N i=1, most often Bernoulli or Poisson Y i. The general theory of GLMs has been

More information

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study 1.4 0.0-6 7 8 9 10 11 12 13 14 15 16 17 18 19 age Model 1: A simple broken stick model with knot at 14 fit with

More information

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal

More information

Appendix: Computer Programs for Logistic Regression

Appendix: Computer Programs for Logistic Regression Appendix: Computer Programs for Logistic Regression In this appendix, we provide examples of computer programs to carry out unconditional logistic regression, conditional logistic regression, polytomous

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

Logistic Regression Analyses in the Water Level Study

Logistic Regression Analyses in the Water Level Study Logistic Regression Analyses in the Water Level Study A. Introduction. 166 students participated in the Water level Study. 70 passed and 96 failed to correctly draw the water level in the glass. There

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Analysis of Categorical Data Three-Way Contingency Table

Analysis of Categorical Data Three-Way Contingency Table Yu Lecture 4 p. 1/17 Analysis of Categorical Data Three-Way Contingency Table Yu Lecture 4 p. 2/17 Outline Three way contingency tables Simpson s paradox Marginal vs. conditional independence Homogeneous

More information

Ordinal Variables in 2 way Tables

Ordinal Variables in 2 way Tables Ordinal Variables in 2 way Tables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 C.J. Anderson (Illinois) Ordinal Variables

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

R Hints for Chapter 10

R Hints for Chapter 10 R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

STAT 705 Generalized linear mixed models

STAT 705 Generalized linear mixed models STAT 705 Generalized linear mixed models Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 24 Generalized Linear Mixed Models We have considered random

More information

Regression modeling for categorical data. Part II : Model selection and prediction

Regression modeling for categorical data. Part II : Model selection and prediction Regression modeling for categorical data Part II : Model selection and prediction David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625 http://math.agrocampus-ouest.fr/infogluedeliverlive/membres/david.causeur

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Models for binary data

Models for binary data Faculty of Health Sciences Models for binary data Analysis of repeated measurements 2015 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen 1 / 63 Program for

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

Chapter 11: Models for Matched Pairs

Chapter 11: Models for Matched Pairs : Models for Matched Pairs Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Chapter 14 Logistic and Poisson Regressions

Chapter 14 Logistic and Poisson Regressions STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y =

More information

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

More information

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

E509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou.

E509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou. E509A: Principle of Biostatistics (Week 11(2): Introduction to non-parametric methods ) GY Zou gzou@robarts.ca Sign test for two dependent samples Ex 12.1 subj 1 2 3 4 5 6 7 8 9 10 baseline 166 135 189

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Matched Pair Data. Stat 557 Heike Hofmann

Matched Pair Data. Stat 557 Heike Hofmann Matched Pair Data Stat 557 Heike Hofmann Outline Marginal Homogeneity - review Binary Response with covariates Ordinal response Symmetric Models Subject-specific vs Marginal Model conditional logistic

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Lecture 5: Poisson and logistic regression

Lecture 5: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Analyzing Residuals in a PROC SURVEYLOGISTIC Model

Analyzing Residuals in a PROC SURVEYLOGISTIC Model Paper 1477-2017 Analyzing Residuals in a PROC SURVEYLOGISTIC Model Bogdan Gadidov, Herman E. Ray, Kennesaw State University ABSTRACT Data from an extensive survey conducted by the National Center for Education

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla. Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information