Logistic Regression Analyses in the Water Level Study

Similar documents
Testing Independence

Simple logistic regression

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Binary Dependent Variables

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

Models for Binary Outcomes

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STAT 7030: Categorical Data Analysis

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers

ST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples

Count data page 1. Count data. 1. Estimating, testing proportions

Logistic Regression Analysis

Sociology 362 Data Exercise 6 Logistic Regression 2

Inference for Binomial Parameters

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Longitudinal Modeling with Logistic Regression

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ;

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Logistic regression: Miscellaneous topics

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

SAS Analysis Examples Replication C11

Lecture 12: Effect modification, and confounding in logistic regression

Multinomial Logistic Regression Models

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Binary Logistic Regression

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

COMPLEMENTARY LOG-LOG MODEL

9 Generalized Linear Models

Table 1: Fish Biomass data set on 26 streams

Some comments on Partitioning

Investigating Models with Two or Three Categories

The material for categorical data follows Agresti closely.

A novel method for testing goodness of fit of a proportional odds model : an application to an AIDS study

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

13.1 Categorical Data and the Multinomial Experiment

ssh tap sas913, sas

Three-Way Contingency Tables

16.400/453J Human Factors Engineering. Design of Experiments II

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Appendix: Computer Programs for Logistic Regression

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

Advanced Quantitative Data Analysis

AMS 7 Correlation and Regression Lecture 8

Psych 230. Psychological Measurement and Statistics

Lab 11. Multilevel Models. Description of Data

An ordinal number is used to represent a magnitude, such that we can compare ordinal numbers and order them by the quantity they represent.

2.1 Linear regression with matrices

STA6938-Logistic Regression Model

Lecture (chapter 13): Association between variables measured at the interval-ratio level

6 Applying Logistic Regression Models

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Model Estimation Example

BIOS 625 Fall 2015 Homework Set 3 Solutions

CHAPTER 1: BINARY LOGIT MODEL

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

BIOSTATS Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Lecture 2: Poisson and logistic regression

8 Nominal and Ordinal Logistic Regression

Chapter 1 Statistical Inference

Review of Multiple Regression

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017

Nonlinear Econometric Analysis (ECO 722) : Homework 2 Answers. (1 θ) if y i = 0. which can be written in an analytically more convenient way as

E509A: Principle of Biostatistics. GY Zou

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Lecture 4: Generalized Linear Mixed Models

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

BIOMETRICS INFORMATION

Testing and Model Selection

Statistical Methods in Clinical Trials Categorical Data

Booklet of Code and Output for STAC32 Final Exam

Formulas and Tables by Mario F. Triola

Formulas and Tables for Elementary Statistics, Eighth Edition, by Mario F. Triola 2001 by Addison Wesley Longman Publishing Company, Inc.

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

Statistics Handbook. All statistical tables were computed by the author.

Independent Samples t tests. Background for Independent Samples t test

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

Introducing Generalized Linear Models: Logistic Regression

Lecture 5: Poisson and logistic regression

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

Ron Heck, Fall Week 3: Notes Building a Two-Level Model

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:

In many situations, there is a non-parametric test that corresponds to the standard test, as described below:

Introduction to SAS proc mixed

General Linear Model (Chapter 4)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Transcription:

Logistic Regression Analyses in the Water Level Study A. Introduction. 166 students participated in the Water level Study. 70 passed and 96 failed to correctly draw the water level in the glass. There were two main research questions: 1. Why was the passing rate so low? What factors affect passing? 2. There was a major difference in the proportion of females and males who passed? Can some of the variables in the study explain this? B. Frequency Tables: 1. Tabl e o f y by se x Frequency Percent Sex Row Pct Col Pct female male Total ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 75 21 96 29.91% of females passed Fail 45.18 12.65 57.83 (32/107) 78.13 21.88 70.09 35.59 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 32 38 70 64.41% of males passed Pass 19.28 22.89 42.17 (38/59) 45.71 54.29 29.91 64.41 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 107 59 166 64.46 35.54 100.00 Statistics for Table of y by sex Statistic DF Value Prob ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chi-Square 1 18.5617 <.0001 Likelihood Ratio Chi-Square 1 18.6578 <.0001 2. Table o f y by gravi ty Frequency Percent Gravity Score Row Pct Col Pct 0 1 2 3 4 5 Total ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Fail 10 23 18 18 23 4 96 6.02 13.86 10.84 10.84 13.86 2.41 57.83 10.42 23.96 18.75 18.75 23.96 4.17 100.00 92.00 64.29 58.06 53.49 13.79 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Pass 0 2 10 13 20 25 70 0.00 1.20 6.02 7.83 12.05 15.06 42.17 0.00 2.86 14.29 18.57 28.57 35.71 0.00 8.00 35.71 41.94 46.51 86.21 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 10 25 28 31 43 29 166 6.02 15.06 16.87 18.67 25.90 17.47 100.00 Statistics for Table of y by gravity Page 2 Statistic DF Value Prob ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chi-Square 5 43.1342 <.0001 Likelihood Ratio Chi-Square 5 50.7636 <.0001 C. Logistic Regression of Pass/Fail in Water Level Study on Sex β 0 + β 1, for females : ln{π(sex)/[1-π(sex)]= β 0 + β 1 *(sex) =

β 0, for males options ls=72; Note 1: Females are coded 1 data sex; Males are coded 0 input sex r n @@; Note 2: Frequency counts are used cards; 1 32 107 0 38 59 ; Proc Logistic ; r/n=sex ; output out=pred p=phat lower=lcl upper=ucl; proc print; run; Output: WORK.SEX Response Variable (Events) r Response Variable (Trials) n Number of Observations 2 Response Profile Ordered Binary Total Value Outcome Frequency 1 Event 70 2 Nonevent 96 and AIC 228.036 211.378 SC 231.148 217.602-2 Log L 226.036 207.378 Page 3 Likelihood Ratio 18.6578 1 <.0001 Score 18.5617 1 <.0001 Wald 17.6086 1 <.0001 Test H 0 : No sex effect or H 0 : β 1 = 0 vs. H a : β 1 0. G 2 = 18.6578 = LRT Reject H 0 : No sex effect and conclude there is a statistically significant difference between females and males in proportion passing the task. 1 0.5931 0.2719 4.7572 0.0292 sex 1-1.4446 0.3443 17.6086 <.0001 Fitted : fitted logit(females) = 0.5931 1.4446 = -0.8515 for females fitted logit(males) = 0.5931 for males

Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits sex 0.236 0.120 0.463 Odds ratio (females vs. males) = s -1.4446 = 0.236 Obs sex r n phat lcl ucl 1 1 32 107 0.29911 0.22005 0.39229 2 0 38 59 0.64407 0.51503 0.75510 Odds ratio (males vs females): Pass Fail Males 38 21 Females 32 75 Odds Ratio = (38)(75)/(21)(32 = 4.24 = s 1.4446 D. Logistic Regression of Pass/Fail in Water Level Study on x = Gravity : ln π(x)/[1-π(x)]. SAS Program: Page 4 options ls=72; data gravity; input gravity r n @@; cards; 0 0 10 1 2 25 2 10 28 3 13 31 4 20 43 5 25 29 ; Proc Logistic ; r/n=gravity ; output out=pred p=phat lower=lcl upper=ucl; proc print; run; WORK.GRAVITY Response Variable (Events) r Response Variable (Trials) n Number of Observations 6 Response Profile Ordered Binary Total Value Outcome Frequency 1 Event 70 2 Nonevent 96 and AIC 228.036 187.859 SC 231.148 194.083-2 Log L 226.036 183.859

Likelihood Ratio 42.1765 1 <.0001 Score 37.8303 1 <.0001 Wald 31.1219 1 <.0001 Test H 0 : No gravity effect or H 0 : β 1 = 0 vs. H a : β 1 0. G 2 = 42.1765 = LRT Reject H 0 : No gravity effect and conclude there is a statistically significant difference between gravity score and proportion passing the task. Page 5 1-2.8155 0.5048 31.1055 <.0001 gravity 1 0.7998 0.1434 31.1219 <.0001 Fitted : Estimated logit[π(x)] = -2.8156+0.7998x Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits gravity 2.225 1.680 2.947 Odds o f passing the water l e ve l task incre ase by 2.225 for e ach additional right answe r on gravity items. gravity r n r/n phat lcl ucl score (pass) (fail) 0 0 10.0000 0.05649 0.02178 0.13871 1 2 25.0800 0.11756 0.06012 0.21719 2 10 28.3571 0.22864 0.15170 0.32945 3 13 31.4194 0 39742 0.31490 0.48622 4 20 43.4651 0.59473 0.49485 0.68733 5 25 29.8621 0.76554 0.64295 0.85550 A graph of observed and fitted proportions is Given above, right. How does the fit look? 0.9 0.8 0.7 0.6 Y-Data 0.5 0.4 0.3 0.2 0.1 Scatterplot of r/n, phat vs gravity Variable r/n phat E. Logistic regression of Pass/Fail on sex and gravity: 0.0 0 1 2 3 gravity 4 5 options ls=72; Note: Females are coded 1 data water; Males are coded 2 input obs y sex gravity ; cards; 1 0 1 4 2 1 2 5............ 166 0 1 4 ; Proc Logistic descending; Y=sex gravity; output out=pred p=phat lower=lcl upper=ucl; proc print;

Proc Logistic; Y=sex gravity; run; Note: descending not specified E1. Logistic Regression of Pass/Fail on Sex and Gravity : logit[π(sex, gravity)] = β 0 + β 1 *(sex) + β 2 *gravity Page 6 (β 0 + β 1 ) + β 2 *gravity, for females = (β 0 + 2β 1 )+ β 2 *gravity, for males WORK.WATER Response Variable y Number of Response Levels 2 Number of Observations 166 Probability modeled is y=1. and AIC 228.036 181.059 SC 231.148 190.395-2 Log L 226.036 175.059 Likelihood Ratio 50.9766 2 <.0001 Score 45.0940 2 <.0001 Wald 35.2414 2 <.0001 Test H 0 : Sex and gravity together do not affect passing the water level task or H 0 : β 1 = β 2 = 0 vs. H a : at least one of the parameters is not 0. G 2 = 50.9766 = LRT Conclude the logistic regression of pass/fail on sex and gravity is statistically significant. 1-4.1676 0.7228 33.2425 <.0001 sex 1 1.1220 0.3824 8.6117 0.0033 gravity 1 0.7404 0.1466 25.4979 <.0001 Estimated logit(sex,gravity) = -4.1676 + 1.1220sex + 0.7404gravity. Note that sex is coded as 1 for females and 2 for males. Test the hypothesis that there is no gravity effect, adjusted for sex Page 7

Calculate the change in G 2 for the models with both variables included and with only sex. G 2 (sex, gravity) - G 2 (sex) = 50.9766 42.1765 = 8.801, or calculate the change in - 2log likelihood: -2ln (sex) - [-2ln(sex, gravity) = 183.859 175.059 = 8.800. compare this value with the Wald chi-square 8.6117. Test the hypothesis that there is no sex effect, adjusted for gravity score: Calculate the change in G 2 for the models with both variables included and with only gravity. G 2 (sex, gravity) - G 2 (sex) = 50.9766 18.6568 = 32.319, or calculate the change in - 2log likelihood: -2ln (gravity) - [-2ln(sex, gravity) = 207.478 175.059. Compare this value with the Wald chi-square 25.4979. Point 95% Wald Effect Estimate Confidence Limits sex 3.071 1.452 6.498 gravity 2.097 1.573 2.795 Odds Ratio Estimates Predicted Values and Confidence Limits for Population Proportions: Obs obs y sex gravity _LEVEL_ phat lcl ucl 1 1 0 1 4 1 0.47898 0.35601 0.60455 2 2 1 2 5 1 0.85548 0.73431 0.92689................... 166 166 0 1 4 1 0.47898 0.35601 0.60455 Edited Fitted Values are given below; a plot of phat vs. gravity for females and for males is given in the graph. Row sex gravity phat lcl ucl 1 1 0 0.04541 0.01658 0.11831 2 1 1 0.09069 0.04332 0.18012 3 1 2 0.17295 0.10478 0.27199 4 1 3 0.30481 0.21613 0.41080 5 1 4 0.47898 0.35601 0.60455 6 1 5 0.65841 0.49314 0.79246 7 2 1 0.23448 0.11133 0.42822 8 2 2 0.39107 0.24091 0.56514 9 2 3 0.57384 0.42507 0.71034 10 2 4 0.73844 0.60244 0.84026 11 2 5 0.85548 0.73431 0.92689 0.9 0.8 0.7 0.6 phat 0.5 0.4 0.3 0.2 0.1 0.0 0 Scatterplot of phat vs gravity 1 2 3 4 gravity 5 sex 1 2 Page 8 E2. Logistic Regression of Pass/Fail on Sex, Gravity and Sex*Gravity (Interaction ) : logit[π(sex, gravity)] = β 0 + β 1 *(sex) + β 2 *gravity+β 3 *(sex*gravity) = (β 0 + β 1 ) + (β 2 + β 3 )gravity, for females

(β 0 + 2β 1 )+ (β 2 +2β 3 )gravity, for males WORK.PRED Response Variable y Number of Response Levels 2 Number of Observations 166 Ordered Total Value y Frequency 1 0 96 2 1 70 Response Profile Probabil ity mode l ed is y= 0. and AIC 228.036 182.944 SC 231.148 195.392-2 Log L 226.036 174.944 Likelihood Ratio 51.0922 3 <.0001 Score 45.1521 3 <.0001 Wald 34.9621 3 <.0001 1 4.6340 1.5633 8.7873 0.0030 sex 1-1.4606 1.0646 1.8822 0.1701 gravity 1-0.8823 0.4452 3.9281 0.0475 sex*gravity 1 0.1026 0.3009 0.1162 0.7332