STA6938-Logistic Regression Model

Dr. Ying Zhang
STA6938-Logistic Regression Model
Topic 2-Multiple Logistic Regression Model

Outline:
1. Model Fitting
2. Statistical Inference for Multiple Logistic Regression Model
3. Interpretation of Fitted Model
   a. Interpretation of Regression Parameters
   b. Issues about Confounding and Interaction
   c. Interpretation of Fitted Value

1. Model Fitting

Observed data: i.i.d. copies of $(Y, X)$: $(Y_i, X_i)$, $i = 1, 2, \ldots, n$, where
- $Y$: a dichotomous variable
- $X = (X_1, X_2, \ldots, X_p)^T$: a covariate vector

Goal: Study $\pi(x) = P(Y = 1 \mid X = x)$.

Logistic Regression Model:
$$\pi(x) = \frac{e^{g(x)}}{1 + e^{g(x)}}$$
What is $g(x)$?

a. If all $x_i$, $i = 1, 2, \ldots, p$, are continuous variables,
$$g(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.$$
b. If, for example, $x_j$ is a categorical variable such as race, gender, or treatment method, we need to create some dummy variables.

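Suppose $X_j$ is categorical with $k$ levels, coded $X_j = 1$ (level 1), $2$ (level 2), $\ldots$, $k$ (level $k$). We create a vector of dummy variables with $k - 1$ components:
- $X_{j,1} = 1$ if level 2, $0$ otherwise
- $X_{j,2} = 1$ if level 3, $0$ otherwise
- $\ldots$
- $X_{j,k-1} = 1$ if level $k$, $0$ otherwise

Then the vector $(X_{j,1}, X_{j,2}, \ldots, X_{j,k-1})$ determines the value of the categorical variable $X_j$; level 1, for which all dummy variables equal 0, serves as the reference level. The model becomes
$$g(x) = \beta_0 + \beta_1 x_1 + \cdots + \sum_{l=1}^{k-1} \beta_{j,l}\, x_{j,l} + \cdots + \beta_p x_p.$$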
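A minimal Python sketch of this reference-cell coding, in which level 1 is the baseline and each of the other $k - 1$ levels gets its own 0/1 column. The helper name `reference_dummies` is hypothetical; in SAS, the CLASS statement with PARAM=REFERENCE builds such columns automatically.

```python
def reference_dummies(values, levels, reference):
    """Reference-cell coding: one 0/1 indicator column per non-reference level.

    A variable with k levels yields k - 1 columns; the reference level is the
    row of all zeros.
    """
    others = [lvl for lvl in levels if lvl != reference]
    return [[1 if v == lvl else 0 for lvl in others] for v in values]


race = [1, 2, 3, 2, 1]  # 1 = White, 2 = Black, 3 = Other, as on the code sheet
design = reference_dummies(race, levels=[1, 2, 3], reference=1)
for value, row in zip(race, design):
    print(value, row)  # columns: (level 2 indicator, level 3 indicator)
```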

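Maximum Likelihood Estimation

Log likelihood function:
$$l(\beta) = \sum_{i=1}^{n} \log f(y_i \mid x_i) = \sum_{i=1}^{n} \log\left\{\pi(x_i)^{y_i}\,[1-\pi(x_i)]^{1-y_i}\right\} = \sum_{i=1}^{n}\left\{ y_i \log\frac{e^{g(x_i)}}{1+e^{g(x_i)}} + (1-y_i)\log\frac{1}{1+e^{g(x_i)}} \right\}$$
MLE of $\beta$, $\hat\beta$: $l(\hat\beta) = \max_{\beta} l(\beta)$.

Numerical algorithm (Newton-Raphson iterative method):
$$\hat\beta^{(k+1)} = \hat\beta^{(k)} - \left[\ddot l(\hat\beta^{(k)})\right]^{-1} \dot l(\hat\beta^{(k)}), \qquad k = 0, 1, \ldots$$
For the logistic model, $\dot l(\beta) = X^T(y - \pi)$ and $\ddot l(\beta) = -X^T V X$, with $X$ and $V$ as defined below, evaluated at the current iterate.

Asymptotic result:
$$\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{d} N\left(0,\ I^{-1}(\beta)\right)$$
Consistent estimate of the asymptotic covariance matrix: $\hat I^{-1}(\hat\beta)$, where
$$\hat I(\hat\beta) = X^T V X.$$
Here $X^T V X$ is the information for the whole sample ($n$ times the per-observation information $I(\beta)$ in the limit above), so in practice $\widehat{\mathrm{var}}(\hat\beta) = (X^T V X)^{-1}$.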

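$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \qquad V = \begin{pmatrix} \hat\pi_1(1-\hat\pi_1) & & & \\ & \hat\pi_2(1-\hat\pi_2) & & \\ & & \ddots & \\ & & & \hat\pi_n(1-\hat\pi_n) \end{pmatrix},$$
where, with $x_j = (1, x_{j1}, \ldots, x_{jp})^T$,
$$\hat\pi_j = \hat P(Y_j = 1 \mid x_j) = \frac{e^{\hat\beta^T x_j}}{1 + e^{\hat\beta^T x_j}}, \qquad j = 1, 2, \ldots, n.$$

Asymptotic normality for the $j$th parameter: let $e_j = (0, 0, \ldots, 0, 1, 0, \ldots, 0)^T$, with the 1 in position $j + 1$. Then
$$\sqrt{n}\,(\hat\beta_j - \beta_j) = \sqrt{n}\, e_j^T(\hat\beta - \beta) \xrightarrow{d} N\left(0,\ e_j^T I^{-1}(\beta)\, e_j\right),$$
where $e_j^T I^{-1}(\beta)\, e_j$ is the $(j+1)$th diagonal element of $I^{-1}(\beta)$.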
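A minimal pure-Python sketch of the Newton-Raphson iteration above: since $\dot l(\beta) = X^T(y - \pi)$ and $\ddot l(\beta) = -X^T V X$, each step is $\hat\beta^{(k+1)} = \hat\beta^{(k)} + (X^T V X)^{-1} X^T (y - \hat\pi)$, and at convergence $(X^T V X)^{-1}$ is returned as the estimated covariance matrix. For the logit link the Newton-Raphson and Fisher scoring steps coincide, which is consistent with the "Fisher's scoring" line in the SAS output. The function names are hypothetical, and the sketch omits the safeguards (step halving, separation checks) that production software uses.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_and_information(X, y, beta):
    """Return X^T (y - pi) and X^T V X with V = diag{pi_i (1 - pi_i)}."""
    p = len(beta)
    pi = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
    score = [sum((yi - q) * row[j] for row, yi, q in zip(X, y, pi)) for j in range(p)]
    info = [[sum(q * (1 - q) * row[j] * row[k] for row, q in zip(X, pi))
             for k in range(p)] for j in range(p)]
    return score, info


def fit_logistic(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta <- beta + (X^T V X)^{-1} X^T (y - pi)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_and_information(X, y, beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(X, y, beta)  # information at the final estimate
    p = len(beta)
    # Invert X^T V X column by column; the result estimates var(beta_hat)
    cols = [solve(info, [1.0 if i == j else 0.0 for i in range(p)]) for j in range(p)]
    cov = [[cols[j][i] for j in range(p)] for i in range(p)]
    return beta, cov


# Toy data: an intercept column plus one binary covariate
X = [[1, 0]] * 4 + [[1, 1]] * 4
y = [1, 0, 0, 0, 1, 1, 1, 0]
beta, cov = fit_logistic(X, y)
se = [math.sqrt(cov[j][j]) for j in range(len(beta))]
print(beta, se)
```

With a single binary covariate the MLE has a closed form (here $\hat\beta_0 = \log(1/3)$ and $\hat\beta_1 = 2\log 3$), which gives a convenient check on the iteration.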

2. Statistical Inference for Multiple Logistic Regression Model

Example 2.1. The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.

Table: Code Sheet for the Variables in the Low Birth Weight Data Set

| Columns | Variable | Abbreviation |
| --- | --- | --- |
| | Identification Code | ID |
| 10 | Low Birth Weight (0 = Birth Weight ≥ 2500 g, 1 = Birth Weight < 2500 g) | LOW |
| 17-18 | Age of the Mother in Years | AGE |
| | Weight in Pounds at the Last Menstrual Period | LWT |
| 32 | Race (1 = White, 2 = Black, 3 = Other) | RACE |
| 40 | Smoking Status During Pregnancy (1 = Yes, 0 = No) | SMOKE |
| 48 | History of Premature Labor (0 = None, 1 = One, etc.) | PTL |
| 55 | History of Hypertension (1 = Yes, 0 = No) | HT |
| 61 | Presence of Uterine Irritability (1 = Yes, 0 = No) | UI |
| 67 | Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.) | FTV |
| | Birth Weight in Grams | BWT |

SAS code for fitting the multiple logistic regression model:

proc logistic data=logistic.lowbwt Descending;
  class RACE /PARAM=REFERENCE Descending;
  model LOW=AGE LWT RACE FTV;
run;

Some remarks:
- The option Descending in the PROC statement tells SAS to model $P(Y = 1 \mid x)$.
- If there is any categorical variable, it should be declared in the CLASS statement.
- The option PARAM= in the CLASS statement tells SAS how to code the categorical variable.
- By setting PARAM=REFERENCE, one sets one level as the baseline (reference) level; the regression parameters associated with this variable then represent the comparison of each other level with this reference level.
- By default, the reference level is the largest level in the numerical order. To reverse the order, simply add the option Descending to the CLASS statement, as above, so that RACE = 1 (White) becomes the reference level.

[PROC LOGISTIC output. Model Information: data set LOGISTIC.LOWBWT; response variable LOW (Low Birth Weight); number of response levels 2; number of observations 189; model binary logit; optimization technique Fisher's scoring. Response Profile: ordered values and total frequencies of LOW; probability modeled is LOW=1. Class Level Information: design variables for RACE. Model Convergence Status: convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics: AIC, SC and -2 Log L for the intercept-only model and the model with intercept and covariates.]

[PROC LOGISTIC output, continued. Testing Global Null Hypothesis: BETA=0: chi-square, DF and Pr > ChiSq for the Likelihood Ratio, Score and Wald tests. Type III Analysis of Effects: DF, Wald chi-square and Pr > ChiSq for AGE, LWT, RACE and FTV. Analysis of Maximum Likelihood Estimates: DF, estimate, standard error, Wald chi-square and Pr > ChiSq for Intercept, AGE, LWT, the two RACE design variables and FTV. Odds Ratio Estimates: point estimates and 95% Wald confidence limits for AGE, LWT, RACE 3 vs 1, RACE 2 vs 1 and FTV.]

The use of PROC GENMOD: the logistic regression model can also be fitted using PROC GENMOD (generalized linear models in SAS).

SAS code:

proc genmod DATA=logistic.lowbwt DESCENDING;
  class RACE;
  model LOW=AGE LWT RACE FTV /D=B;  /* binomial distribution; its default link is the logit */
run;

SAS report:
[PROC GENMOD output. Model Information: data set LOGISTIC.LOWBWT; distribution Binomial; link function Logit; dependent variable LOW (Low Birth Weight); 189 observations used. Class Level Information: RACE, 3 levels, values 1 2 3. Response Profile: ordered values and total frequencies of LOW. PROC GENMOD is modeling the probability that LOW='1'.]

[PROC GENMOD output, continued. Criteria For Assessing Goodness Of Fit: DF, Value and Value/DF for Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2 and Log Likelihood. Algorithm converged. Analysis Of Parameter Estimates: DF, estimate, standard error, Wald 95% confidence limits, chi-square and Pr > ChiSq for Intercept, AGE, LWT, RACE 1, RACE 2, RACE 3, FTV and Scale. NOTE: The scale parameter was held fixed.]

Note that PROC GENMOD automatically treats RACE=3 as the reference level. To make the results comparable with those reported by PROC LOGISTIC, we need to recode the data.

SAS code:

Data a;
  set logistic.lowbwt;
  /* White (1) becomes 5 and Black (2) becomes 4, so White is the last level and serves as the reference */
  if RACE=1 THEN RACE=5;
  if RACE=2 THEN RACE=4;
run;
proc genmod DATA=a descending;
  class RACE;
  model LOW=AGE LWT RACE FTV /D=B;
run;

SAS report:
[PROC GENMOD output for the recoded data. Model Information: data set WORK.A; distribution Binomial; link function Logit; dependent variable LOW (Low Birth Weight); 189 observations used. Class Level Information: RACE, 3 levels, values 3 4 5. Response Profile: ordered values and total frequencies of LOW. PROC GENMOD is modeling the probability that LOW='1'. Criteria For Assessing Goodness Of Fit: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2 and Log Likelihood. Algorithm converged.]

[Analysis Of Parameter Estimates: DF, estimate, standard error, Wald 95% confidence limits, chi-square and Pr > ChiSq for Intercept, AGE, LWT, RACE 3, RACE 4, RACE 5, FTV and Scale. NOTE: The scale parameter was held fixed.]

Remarks:
- The results from PROC GENMOD are the same as those from PROC LOGISTIC.
- PROC LOGISTIC reports the likelihood ratio test statistic for testing whether the fitted model is better than nothing (the intercept-only model). Rejection of the null hypothesis indicates that the fitted model is indeed better than nothing.
- PROC GENMOD reports the deviance, which can test whether the fitted model can be improved by adding interactions, i.e., whether the fitted model is the best model using the selected variables (goodness of fit). Rejection of the null hypothesis indicates that the fitted model can be improved somehow.

Hypothesized model:
$$g(x) = \beta_0 + \beta_1\,\mathrm{AGE} + \beta_2\,\mathrm{LWT} + \beta_3\,\mathrm{RACE\_2} + \beta_4\,\mathrm{RACE\_3} + \beta_5\,\mathrm{FTV}$$
Estimating equation:
$$\hat g(x) = \hat\beta_0 + \hat\beta_1\,\mathrm{AGE} - 0.014\,\mathrm{LWT} + 1.004\,\mathrm{RACE\_2} + \hat\beta_4\,\mathrm{RACE\_3} + \hat\beta_5\,\mathrm{FTV}$$

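Hypothesis Testing:

1. Overall significance test

$\beta = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)^T$; $H_0: \beta = 0$ vs. $H_a: \beta \neq 0$.

Likelihood ratio test:
$$G = 2\log\frac{L(\text{with the variables})}{L(\text{without the variables})} \sim \chi^2_5 \text{ under } H_0, \qquad p\text{-value} = \Pr(\chi^2_5 > G) < 0.05.$$
Reject the null hypothesis at significance level 0.05.

Wald test: obtain the MLE $\hat\beta$ and the consistent estimate $\widehat{\mathrm{var}}(\hat\beta)$ of its covariance matrix (here the $5 \times 5$ block for $\beta_1, \ldots, \beta_5$); then
$$W = \hat\beta^T\,[\widehat{\mathrm{var}}(\hat\beta)]^{-1}\,\hat\beta \sim \chi^2_5 \text{ under } H_0.$$
For this example, $p\text{-value} = \Pr(\chi^2_5 > W) > 0.05$: no evidence to reject $H_0$ at level 0.05.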
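A small Python sketch of how the p-values above are obtained from $G$ (or $W$): for an integer number of degrees of freedom the chi-square upper tail has a closed form, so only the standard library is needed. The function names and the log-likelihood values in the usage lines are hypothetical, not output from the low birth weight example; the same `likelihood_ratio_test` call applies to a reduced-vs-full comparison, with df equal to the number of dropped parameters.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with integer df > x), via the closed forms for even and odd df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} * sum_{i=1}^{(df-1)/2} x^{i-1/2} / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # the i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(loglik_without, loglik_with, df):
    """G = 2 [l(with the variables) - l(without the variables)], referred to chi-square(df)."""
    g = 2.0 * (loglik_with - loglik_without)
    return g, chi2_sf(g, df)


# Hypothetical log-likelihoods, only to show the call (5 slopes tested, as above)
g, p_value = likelihood_ratio_test(loglik_without=-120.0, loglik_with=-114.0, df=5)
print(f"G = {g:.2f}, p = {p_value:.4f}")
```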

What can we do when the two tests disagree?
1. In most situations, the results from both tests agree.
2. The likelihood ratio test is usually the more powerful test and is commonly recommended in practice.

2. Univariate tests

After concluding that the fitted model is significantly better than nothing, we want to know which variables significantly affect the response:
$$H_0: \beta_i = 0 \quad \text{vs.} \quad H_a: \beta_i \neq 0 \ (\text{or } \beta_i > 0, \text{ or } \beta_i < 0), \qquad i = 1, 2, \ldots, p.$$
Wald test:
$$Z_i = \frac{\hat\beta_i}{\widehat{\mathrm{se}}(\hat\beta_i)} \xrightarrow{d} N(0, 1), \qquad W_i = Z_i^2 = \frac{\hat\beta_i^2}{\widehat{\mathrm{var}}(\hat\beta_i)} \xrightarrow{d} \chi^2_1.$$
At significance level 0.05, it appears that only LWT and RACE are significant factors for the response.

Note: if our goal is to obtain the best-fitting model while minimizing the number of parameters, we should fit a reduced model containing only those variables thought to be significant, and compare it to the full model containing all the variables.

3. Reduced model vs. full model

Compare the reduced model with the full model. The reduced model should be nested in the full model, which means that all the variables appearing in the reduced model should also be present in the full model. For Example 2.1, since only LWT and RACE are significant, we consider the reduced model containing only LWT and RACE.

proc logistic data=logistic.lowbwt Descending;
  class RACE /PARAM=REFERENCE Descending;
  model LOW=LWT RACE;
run;

[PROC LOGISTIC output for the reduced model. Model Fit Statistics: AIC, SC and -2 Log L for the intercept-only model and the model with intercept and covariates. Testing Global Null Hypothesis: BETA=0: Likelihood Ratio, Score and Wald tests.]

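[PROC LOGISTIC output for the reduced model, continued. Type III Analysis of Effects: Wald chi-square for LWT and RACE. Analysis of Maximum Likelihood Estimates: Intercept, LWT and the two RACE design variables.]

$H_0$: the reduced model is good enough compared with the full model, i.e., $H_0: \beta_1 = \beta_5 = 0$ (the coefficients of AGE and FTV in the hypothesized model).

Test statistic:
$$G = 2\,[\,l(\text{full}) - l(\text{reduced})\,] \sim \chi^2_2 \text{ under } H_0, \qquad p\text{-value} = \Pr(\chi^2_2 > G) > 0.05.$$
Not enough evidence to reject the null hypothesis, so the reduced model is adequate.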
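The univariate Wald test and the 2-df reduced-vs-full likelihood ratio test can be sketched in a few lines of Python, using $\Pr(\chi^2_1 > Z^2) = \Pr(|N(0,1)| > |Z|)$ and $\Pr(\chi^2_2 > G) = e^{-G/2}$. The function names and all numbers in the usage lines are hypothetical, not values from the SAS output.

```python
import math


def wald_z_test(estimate, se):
    """Univariate Wald test of H0: beta_i = 0.

    Returns Z = estimate / se, W = Z**2 (chi-square with 1 df under H0)
    and the two-sided p-value P(|N(0,1)| > |Z|) = erfc(|Z| / sqrt(2)).
    """
    z = estimate / se
    return z, z * z, math.erfc(abs(z) / math.sqrt(2))


def nested_lr_test_2df(loglik_reduced, loglik_full):
    """Reduced vs. full model with two dropped parameters.

    G = 2 [l(full) - l(reduced)]; under H0, G ~ chi-square(2), whose upper
    tail has the closed form P(chi2_2 > G) = exp(-G / 2).
    """
    g = 2.0 * (loglik_full - loglik_reduced)
    return g, math.exp(-g / 2)


# Hypothetical inputs, only to show the calls
z, w, p_z = wald_z_test(estimate=-0.015, se=0.0065)
g, p_g = nested_lr_test_2df(loglik_reduced=-100.5, loglik_full=-100.0)
print(f"Z = {z:.3f}, W = {w:.3f}, p = {p_z:.4f}")
print(f"G = {g:.3f}, p = {p_g:.4f}")
```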

3. Interpretation of Fitted Model

a. Interpretation of Regression Parameters

Study goal: What do the estimated coefficients in the model tell us about the research questions that motivated the study?

Model example:
$$g(x) = \log\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x_1 + \beta_{21} x_{21} + \beta_{22} x_{22},$$
where $x = (x_1, x_2) = (x_1, x_{21}, x_{22})$;
- $x_1$: continuous variable;
- $x_2$: categorical variable with three levels, and thus two dummy variables introduced. Assume that $x_{21}$ and $x_{22}$ indicate levels 2 and 3 of $x_2$, respectively.

I. Continuous independent variable
$$\log\frac{\pi(x_1 = x_0 + 1;\, x_2)/[1 - \pi(x_1 = x_0 + 1;\, x_2)]}{\pi(x_1 = x_0;\, x_2)/[1 - \pi(x_1 = x_0;\, x_2)]} = g(x_1 = x_0 + 1;\, x_2) - g(x_1 = x_0;\, x_2)$$
$$= [\beta_0 + \beta_1 (x_0 + 1) + \beta_{21} x_{21} + \beta_{22} x_{22}] - [\beta_0 + \beta_1 x_0 + \beta_{21} x_{21} + \beta_{22} x_{22}] = \beta_1$$
$\beta_1$: the log of the odds ratio when the variable $x_1$ increases by one unit, given the other variable fixed.

Notation: $OR(x_1 = x_0 + 1,\ x_1 = x_0 \mid \text{other}) = e^{\beta_1}$.

The odds of developing the symptom when $x_1 = x_0 + 1$ are $e^{\beta_1}$ times the odds when $x_1 = x_0$, adjusting for the other variable (keeping the other variable fixed, for similar subjects, etc.).

In general, $OR(x_1 = a,\ x_1 = b \mid \text{other}) = e^{\beta_1 (a - b)}$: the odds of developing the symptom when $x_1 = a$ are $e^{\beta_1 (a - b)}$ times the odds when $x_1 = b$, adjusting for the other variable. This matters because the odds ratio for a one-unit increase in a continuous variable may not be clinically meaningful.

II. Categorical (polychotomous) independent variable
$$\log\frac{\pi(x_2 = \text{2nd level};\, x_1)/[1 - \pi(x_2 = \text{2nd level};\, x_1)]}{\pi(x_2 = \text{1st level};\, x_1)/[1 - \pi(x_2 = \text{1st level};\, x_1)]} = g(x_2 = \text{2nd level};\, x_1) - g(x_2 = \text{1st level};\, x_1)$$
$$= [\beta_0 + \beta_1 x_1 + \beta_{21} \cdot 1 + \beta_{22} \cdot 0] - [\beta_0 + \beta_1 x_1 + \beta_{21} \cdot 0 + \beta_{22} \cdot 0] = \beta_{21}$$
$OR(\text{2nd level},\ \text{1st level} \mid \text{other}) = e^{\beta_{21}}$.

The odds of developing the symptom are multiplied by $e^{\beta_{21}}$ when the status of $x_2$ changes from the 1st level to the 2nd level, adjusting for the other variable (keeping the other variable fixed, for similar subjects, etc.).
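A one-function Python helper makes the $c$-unit odds ratio $e^{\beta_1(a - b)}$ concrete. The slope $-0.0152$ is a hypothetical value, chosen so that $e^{10\beta_1} \approx 0.859$, the 10-pound odds ratio quoted for the reduced model of the low birth weight example.

```python
import math


def odds_ratio(beta, a, b):
    """OR(x = a, x = b | other covariates fixed) = exp(beta * (a - b))."""
    return math.exp(beta * (a - b))


beta_lwt = -0.0152  # hypothetical log-odds change per pound
per_pound = odds_ratio(beta_lwt, 101, 100)      # one-unit change: close to 1, hard to read
per_10_pounds = odds_ratio(beta_lwt, 110, 100)  # 10-unit change: exp(10 * beta)
print(round(per_pound, 4), round(per_10_pounds, 3))
```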

For the reduced model in Example 2.1:

[PROC LOGISTIC output. Analysis of Maximum Likelihood Estimates: DF, estimate, standard error, Wald chi-square and Pr > ChiSq for Intercept, LWT and the two RACE design variables.]

- For pregnant women of the same race, the odds ratio of having a small baby is approximately $e^{10\hat\beta_1} = 0.859$ when a woman is compared with a counterpart who was 10 pounds lighter at the last menstrual period.
- For pregnant women of similar weight at the last menstrual period, the odds of having a small baby for Black women are approximately $e^{\hat\beta_{21}} = e^{1.081} \approx 2.95$ times, i.e., almost three times, the odds for White women.
- For pregnant women of similar weight at the last menstrual period, the odds of having a small baby for women of a race other than White or Black are approximately 1.6 times ($e^{\hat\beta_{22}} = 1.617$) the odds for White women.

Confidence Interval Estimation

The MLE gives only a point estimate of the regression parameters. When we interpret the point estimate, we effectively have 0% confidence: the chance that the true regression parameter exactly equals the estimated value is essentially zero. A confidence interval attaches a confidence level to the statement that the true regression parameter falls in a specific interval.

$100(1 - \alpha)\%$ CI for $\beta_i$, $i = 1, 2, \ldots, p$:
$$\hat\beta_i \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat\beta_i)$$

For the reduced model in Example 2.1:

[PROC LOGISTIC output. Analysis of Maximum Likelihood Estimates for Intercept, LWT and the two RACE design variables.]

We are 95% confident that
$$\beta_1 \in \left[\hat\beta_1 - z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_1),\ \hat\beta_1 + z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_1)\right].$$
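The Wald interval $\hat\beta_i \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat\beta_i)$, and the odds-ratio interval obtained by exponentiating its endpoints, can be sketched in Python as follows. The argument `c` is an assumption of this sketch that rescales the comparison (for example `c=10` for a 10-pound difference); the estimates and standard errors in the usage lines are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # upper 2.5% point of N(0, 1)


def wald_ci(estimate, se, c=1.0, z=Z_975):
    """Wald CI for c * beta and for the odds ratio exp(c * beta).

    The interval for c * beta is c * estimate +/- z * |c| * se; exponentiating
    its endpoints gives the interval for the odds ratio.
    """
    half_width = z * abs(c) * se
    lo, hi = c * estimate - half_width, c * estimate + half_width
    return (lo, hi), (math.exp(lo), math.exp(hi))


# Hypothetical estimate and standard error for one dummy variable
beta_ci, or_ci = wald_ci(estimate=0.9, se=0.4)
excludes_one = or_ci[0] > 1 or or_ci[1] < 1  # does the OR interval exclude 1?
# Hypothetical slope per pound, reported for a 10-pound difference
slope_ci, or10_ci = wald_ci(estimate=-0.015, se=0.0065, c=10)
print(beta_ci, or_ci, excludes_one)
print(slope_ci, or10_ci)
```

Exponentiating the endpoints, rather than building an interval on the odds-ratio scale directly, keeps the interval inside $(0, \infty)$ and preserves the normal approximation on the log-odds scale.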

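$$\beta_{21} \in \left[\hat\beta_{21} - z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{21}),\ \hat\beta_{21} + z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{21})\right]$$
$$\beta_{22} \in \left[\hat\beta_{22} - z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{22}),\ \hat\beta_{22} + z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{22})\right]$$

$100(1 - \alpha)\%$ CI for the odds ratio: exponentiate the endpoints of the interval for the corresponding coefficient. We are 95% confident that
$$OR(x_1 = x_0 + 10,\ x_1 = x_0;\ x_2) \in \left[e^{10[\hat\beta_1 - z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_1)]},\ e^{10[\hat\beta_1 + z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_1)]}\right]$$
$$OR(\text{black},\ \text{white};\ x_1) \in \left[e^{\hat\beta_{21} - z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{21})},\ e^{\hat\beta_{21} + z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{21})}\right] = [1.325,\ \ldots]$$
$$OR(\text{other},\ \text{white};\ x_1) \in \left[e^{\hat\beta_{22} - z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{22})},\ e^{\hat\beta_{22} + z_{0.025}\,\widehat{\mathrm{se}}(\hat\beta_{22})}\right]$$

Interpretation:
- We are 95% confident that the odds ratio of having a small baby for a pregnant woman vs. her counterpart who was 10 pounds lighter at the last menstrual period, adjusting for race, lies between the two limits of the first interval.
- We are 95% confident that the odds ratio of having a small baby for a Black woman vs. a White woman of similar weight at the last menstrual period could be as little as 1.325 or as large as the upper limit of the second interval.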

- We are 95% confident that the odds ratio of having a small baby for a woman of a race other than Black or White vs. a White woman of similar weight at the last menstrual period lies within the interval for $OR(\text{other},\ \text{white};\ x_1)$.

It seems that Black women tend to have a greater risk of having a small baby than White women, because the confidence interval for $OR(\text{black},\ \text{white};\ x_1)$ excludes 1.

b. Issues about Confounding and Interaction

Epidemiologists use the term confounder to describe a covariate that is associated with both the outcome variable of interest and a primary independent variable or risk factor.

In most epidemiological research, we are primarily interested in the association between the outcome and a potential risk factor. For example, the association between the incidence of coronary heart disease (CHD) and smoking is one such research interest. Suppose we are able to follow two study cohorts, one smoking and one non-smoking, and can also control the other possible risk factors (their distributions are the same in the two cohorts). Then we can simply fit a univariate logistic regression model to describe the association between CHD and SMOKE, i.e.,

$$\log\frac{P(\mathrm{CHD})}{1 - P(\mathrm{CHD})} = \beta_0 + \beta_1\,\mathrm{SMOKE}$$
$\beta_1$ is the log odds ratio, so $e^{\beta_1}$ is the odds ratio of developing CHD for smokers vs. non-smokers.

However, what if in practice we are not able to control another risk factor, say AGE? Then the true model is
$$\log\frac{P(\mathrm{CHD})}{1 - P(\mathrm{CHD})} = \beta_0 + \beta_1\,\mathrm{SMOKE} + \beta_2\,\mathrm{AGE}.$$
Also consider the situation in which the age distribution differs between smokers and non-smokers: smokers tend to be older than non-smokers. If we mistakenly fit the previous model, ignoring AGE, then approximately
$$\log OR(\text{smoker},\ \text{non-smoker}) \approx \beta_1 + \beta_2\,[E(\mathrm{age} \mid \text{smoking}) - E(\mathrm{age} \mid \text{non-smoking})].$$
So this model estimates the effect of smoking incorrectly; with $\beta_2 > 0$ it exaggerates the risk of developing CHD for smokers.

For a variable to be considered a confounding variable, it should have two characteristics:
a. The variable is related to the outcome.
b. The variable is also related to the primary risk factor.

[Diagram: the Confounder is linked to both the Outcome and the Primary Risk Factor.]

Example 2.2. We consider a simulated data set:

Data CHDSIMUL;
  do ID=1 to 200 by 1;
    smoke=ranbin(int(time()),1,0.5);
    if smoke=1 then age=50+5*normal(int(time()));
    else age=42+5*normal(int(time()));
    p=exp(… *smoke+0.5*age+0.20*smoke*age)/(1+exp(… *smoke+0.5*age+0.20*smoke*age));
    CHD=RANBIN(int(time()),1,p);
    output;
  end;
run;
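The bias from ignoring a confounder can be reproduced with a Python simulation in the spirit of CHDSIMUL. The coefficients here are hypothetical (the assumed model has no interaction and a true age-adjusted smoking log odds ratio of 0.5), and the sample is much larger than 200 so that sampling noise does not hide the effect. With a single binary covariate, $e^{\hat\beta_1}$ from the SMOKE-only model equals the cross-product ratio of the 2×2 SMOKE by CHD table, so no iterative fit is needed for the crude estimate.

```python
import math
import random


def simulate(n, seed=2024):
    """Hypothetical CHD data: smokers are older, and age raises CHD risk."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoke = 1 if rng.random() < 0.5 else 0
        age = (50 if smoke else 42) + 5 * rng.gauss(0, 1)
        # Assumed true model (no interaction): logit P(CHD) = -6 + 0.5*SMOKE + 0.1*AGE
        eta = -6.0 + 0.5 * smoke + 0.1 * age
        chd = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        rows.append((smoke, age, chd))
    return rows


def crude_odds_ratio(rows):
    """Cross-product ratio of the SMOKE x CHD table, i.e. exp(beta1_hat) of the SMOKE-only model."""
    n11 = sum(1 for s, _, y in rows if s == 1 and y == 1)
    n10 = sum(1 for s, _, y in rows if s == 1 and y == 0)
    n01 = sum(1 for s, _, y in rows if s == 0 and y == 1)
    n00 = sum(1 for s, _, y in rows if s == 0 and y == 0)
    return (n11 * n00) / (n10 * n01)


rows = simulate(20000)
crude = crude_odds_ratio(rows)
adjusted_truth = math.exp(0.5)  # true age-adjusted OR under the assumed model
print(f"crude OR = {crude:.2f}, true age-adjusted OR = {adjusted_truth:.2f}")
```

Under these assumptions the crude odds ratio comes out well above $e^{0.5} \approx 1.65$, because smokers are older and age raises the risk, mirroring the exaggerated SMOKE effect in the univariate fit.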

Model fitting:

a. Univariate model, containing only SMOKE:

proc logistic data=logistic.chdsimul Descending;
  model CHD=smoke;
run;

[PROC LOGISTIC output. Analysis of Maximum Likelihood Estimates: Intercept (Pr > ChiSq < .0001) and smoke (Pr > ChiSq < .0001).]

$OR(\text{smoker},\ \text{non-smoker}) = e^{\hat\beta_1} = 7.38$

b. In this study, age is clearly a confounding variable: by construction it is related to CHD and differs between smokers and non-smokers. Multivariate model, containing SMOKE and AGE:

proc logistic data=logistic.chdsimul Descending;
  model CHD=smoke age;
run;

[PROC LOGISTIC output. Analysis of Maximum Likelihood Estimates: Intercept (Pr > ChiSq < .0001), smoke, and age (Pr > ChiSq < .0001).]

The coefficient associated with SMOKE is greatly reduced.

This type of analysis is called analysis of covariance in epidemiology. We should interpret the odds ratio adjusting for the confounder AGE:
$$OR(\text{smoker},\ \text{non-smoker} \mid \text{age}) = e^{\hat\beta_1} = 1.94$$

Interaction: effect modifier

Suppose we fit a logistic regression model with two risk factors. If the effect of one risk factor on the outcome depends on the level of the other factor, we say that the two factors interact in affecting the outcome. In any model, interaction is incorporated by including the cross-product of the two risk factors. For Example 2.2, suppose we fit the model
$$\log\frac{P(\mathrm{CHD})}{1 - P(\mathrm{CHD})} = \beta_0 + \beta_1\,\mathrm{SMOKE} + \beta_2\,\mathrm{AGE} + \beta_3\,\mathrm{SMOKE} \times \mathrm{AGE}.$$
Now consider the effect of age on developing CHD. For smokers:
$$\log\frac{P(\mathrm{CHD} \mid \text{smoker})}{1 - P(\mathrm{CHD} \mid \text{smoker})} = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,\mathrm{AGE}$$

For non-smokers:
$$\log\frac{P(\mathrm{CHD} \mid \text{non-smoker})}{1 - P(\mathrm{CHD} \mid \text{non-smoker})} = \beta_0 + \beta_2\,\mathrm{AGE}$$
Evidently, $\beta_3$ modifies the effect of age on CHD: we cannot interpret the age effect on CHD without knowing the smoking status.

c. Fit the interaction model

proc logistic data=logistic.chdsimul Descending;
  model CHD=smoke age smoke*age;
run;

[PROC LOGISTIC output. Analysis of Maximum Likelihood Estimates: Intercept, smoke, age and smoke*age.]

$$OR(\mathrm{age} = a + 10,\ \mathrm{age} = a \mid \text{non-smoker}) = e^{10\hat\beta_2}, \qquad OR(\mathrm{age} = a + 10,\ \mathrm{age} = a \mid \text{smoker}) = e^{10(\hat\beta_2 + \hat\beta_3)}$$
Do you get the idea of an effect modifier?
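A Python sketch of the two age odds ratios in the interaction model: the 10-year age odds ratio is $e^{10\beta_2}$ for non-smokers and $e^{10(\beta_2 + \beta_3)}$ for smokers, so their ratio $e^{10\beta_3}$ measures the effect modification. The function name and coefficient values are hypothetical.

```python
import math


def age_odds_ratio(beta_age, beta_smoke_age, smoker, delta=10.0):
    """OR(age = a + delta, age = a | smoking status) in the model
    logit P(CHD) = b0 + b1*SMOKE + b2*AGE + b3*SMOKE*AGE.

    Non-smokers: exp(delta * b2); smokers: exp(delta * (b2 + b3)).
    The result depends on delta and SMOKE, not on the starting age a.
    """
    slope = beta_age + (beta_smoke_age if smoker else 0.0)
    return math.exp(delta * slope)


# Hypothetical coefficients, for illustration only
b2, b3 = 0.10, 0.05
or_nonsmoker = age_odds_ratio(b2, b3, smoker=False)
or_smoker = age_odds_ratio(b2, b3, smoker=True)
print(or_nonsmoker, or_smoker, or_smoker / or_nonsmoker)  # last value is exp(10 * b3)
```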

Model-fitting principles in biomedical studies:
- The inclusion of confounders is very important; otherwise the relationship between the risk factor and the outcome may be misinterpreted.
- When the regression coefficients associated with the risk factors change substantially after a covariate is dropped, that covariate should be viewed as a potential confounder, even if it is not statistically significant in a hypothesis test.
- In biomedical studies, SEX and AGE are normally treated as confounders.
- When an interaction effect is included, the interpretation of the risk factor becomes complicated. Unless the interaction is statistically significant, it is normally advisable not to put the interaction in the model.
- When a covariate is an effect modifier, its status as a confounder is of secondary importance, since the estimated effect of the risk factor depends on the specific value of the covariate.

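c. Interpretation of Fitted Value

$100(1 - \alpha)\%$ PI (prediction interval) for the odds

Given a set of covariates, we can predict the log odds by plugging these covariates into the prediction equation, obtaining the point prediction
$$\hat g(x) = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p.$$
To obtain the interval, we need the distribution of $\hat g(x)$. Note that $\hat g(x) = x^T \hat\beta$ with $x = (x_0, x_1, \ldots, x_p)^T$, $x_0 = 1$, and $\hat\beta \overset{d}{\approx} N\left(\beta,\ \mathrm{var}(\hat\beta)\right)$; hence
$$\hat g(x) \overset{d}{\approx} N\left(x^T \beta,\ \mathrm{var}[\hat g(x)]\right), \qquad \mathrm{var}[\hat g(x)] = x^T\,\mathrm{var}(\hat\beta)\,x,$$
$$\widehat{\mathrm{var}}[\hat g(x)] = \sum_{j=0}^{p} x_j^2\,\widehat{\mathrm{var}}(\hat\beta_j) + 2\sum_{j=0}^{p-1}\sum_{k=j+1}^{p} x_j x_k\,\widehat{\mathrm{cov}}(\hat\beta_j, \hat\beta_k).$$
The interval for the odds is then $\exp\{\hat g(x) \pm z_{\alpha/2}\,\widehat{\mathrm{se}}[\hat g(x)]\}$; for the probability $\pi(x)$, apply $\pi = \text{odds}/(1 + \text{odds})$ to both endpoints.

SAS provides the estimated covariance matrix of the regression parameter estimates when the option COVB is added to the MODEL statement.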
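The calculation above (log odds, then odds, then probability) can be sketched in Python as below. The function name is hypothetical, and `cov` stands for the matrix SAS prints with COVB. The second usage call does not use a full covariance matrix: it plugs in $\hat g = -1.444$ from case i and a standard error of 0.2771, a value back-calculated from the reported odds interval [0.137, 0.406] rather than taken from the output, so it only checks the back-transformation.

```python
import math

Z_975 = 1.959963984540054  # upper 2.5% point of N(0, 1)


def fitted_value_interval(x, beta, cov, z=Z_975):
    """Point estimate and Wald intervals for the log odds, odds and probability at x.

    x includes the leading 1 for the intercept; cov is the estimated covariance
    matrix of beta_hat. Uses var(g_hat) = x' cov x.
    """
    p = len(x)
    g = sum(x[j] * beta[j] for j in range(p))
    var_g = sum(x[j] * cov[j][k] * x[k] for j in range(p) for k in range(p))
    if var_g < 0:
        raise ValueError("x' cov x < 0: the supplied covariance matrix is not positive semidefinite")
    half = z * math.sqrt(var_g)
    odds = (math.exp(g - half), math.exp(g + half))
    prob = tuple(o / (1.0 + o) for o in odds)  # pi = odds / (1 + odds) is increasing
    return g, var_g, odds, prob


# Toy two-parameter example: var = 0.04 + 2*2*0.01 + 2**2*0.09 = 0.44
toy = fitted_value_interval([1, 2], [0.5, -0.25], [[0.04, 0.01], [0.01, 0.09]])
# Back-transformation check with g_hat = -1.444 and an assumed se of 0.2771
case_i = fitted_value_interval([1.0], [-1.444], [[0.2771 ** 2]])
print(toy[1], case_i[2], case_i[3])
```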

For the reduced model in Example 2.1:

proc logistic data=logistic.lowbwt Descending;
  class RACE /PARAM=REFERENCE Descending;
  model LOW=LWT RACE /covb;
run;

The estimate of the covariance matrix:
[PROC LOGISTIC output. Estimated Covariance Matrix with rows and columns Intercept, LWT, RACE3 and RACE2; Analysis of Maximum Likelihood Estimates for Intercept, LWT and the two RACE design variables.]

Let us consider two particular cases.

i. LWT = 150, RACE = White:
$$\hat g(150 \text{ pounds and White}) = \hat\beta_0 + 150\,\hat\beta_1 = -1.444$$

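$$\widehat{\mathrm{var}}[\hat g(150 \text{ pounds and White})] = \widehat{\mathrm{var}}(\hat\beta_0) + 150^2\,\widehat{\mathrm{var}}(\hat\beta_1) + 2 \cdot 150\,\widehat{\mathrm{cov}}(\hat\beta_0, \hat\beta_1),$$
because $x_{21} = x_{22} = 0$ for a White woman, so every term involving $\hat\beta_{21}$ or $\hat\beta_{22}$ vanishes.

We are 95% confident that the odds for this woman fall in
$$\left[e^{-1.444 - 1.96\,\widehat{\mathrm{se}}},\ e^{-1.444 + 1.96\,\widehat{\mathrm{se}}}\right] = [0.137,\ 0.406],$$
where $\widehat{\mathrm{se}} = \sqrt{\widehat{\mathrm{var}}[\hat g(150 \text{ pounds and White})]}$. Converting with $\pi = \text{odds}/(1 + \text{odds})$, we are 95% confident that $\pi(150 \text{ pounds and White}) \in [0.120,\ 0.289]$, i.e., the chance of having a small baby for this woman could be as little as 12.0% or as large as 28.9%.

ii. LWT = 120, RACE = Asian (other):
$$\hat g(120 \text{ pounds and Asian}) = \hat\beta_0 + 120\,\hat\beta_1 + \hat\beta_{22}$$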

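$$\widehat{\mathrm{var}}[\hat g(120 \text{ pounds and Asian})] = \widehat{\mathrm{var}}(\hat\beta_0) + 120^2\,\widehat{\mathrm{var}}(\hat\beta_1) + \widehat{\mathrm{var}}(\hat\beta_{22}) + 2 \cdot 120\,\widehat{\mathrm{cov}}(\hat\beta_0, \hat\beta_1) + 2\,\widehat{\mathrm{cov}}(\hat\beta_0, \hat\beta_{22}) + 2 \cdot 120\,\widehat{\mathrm{cov}}(\hat\beta_1, \hat\beta_{22})$$
Evaluated with the printed covariance matrix, this variance comes out negative. What is wrong with that? Isn't $\widehat{\mathrm{var}}(\hat\beta)$ a positive definite matrix? Note that the eigenvalues of the printed $\widehat{\mathrm{var}}(\hat\beta)$ are $\lambda = [0.732,\ 0.259,\ 0.088,\ \ldots]$.

It is purely a numerical problem: on the pound scale the entries involving LWT are so small that the printed, rounded values lose the precision needed once they are multiplied by 120 or $120^2$. How can we fix it? Change the scale of the variable LWT by dividing LWT by 100:

data a;
  set logistic.lowbwt;
  lwt=lwt/100;
run;
proc logistic data=a Descending;
  class RACE /PARAM=REFERENCE Descending;
  model LOW=LWT RACE /covb;
run;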

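[PROC LOGISTIC output for the rescaled model (LWT in hundreds of pounds). Estimated Covariance Matrix with rows and columns Intercept, LWT, RACE3 and RACE2; Analysis of Maximum Likelihood Estimates for Intercept, LWT and the two RACE design variables.]

Note that under the current scale the eigenvalues of $\widehat{\mathrm{var}}(\hat\beta)$ are $\lambda = [\ldots,\ 0.269,\ 0.095,\ 0.009]$, all positive. With LWT = 1.2 (hundreds of pounds),
$$\hat g(120 \text{ pounds and Asian}) = \hat\beta_0 + 1.2\,\hat\beta_1 + \hat\beta_{22}$$
$$\widehat{\mathrm{var}}[\hat g(120 \text{ pounds and Asian})] = \widehat{\mathrm{var}}(\hat\beta_0) + 1.2^2\,\widehat{\mathrm{var}}(\hat\beta_1) + \widehat{\mathrm{var}}(\hat\beta_{22}) + 2 \cdot 1.2\,\widehat{\mathrm{cov}}(\hat\beta_0, \hat\beta_1) + 2\,\widehat{\mathrm{cov}}(\hat\beta_0, \hat\beta_{22}) + 2 \cdot 1.2\,\widehat{\mathrm{cov}}(\hat\beta_1, \hat\beta_{22})$$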

We are 95% confident that the odds for this woman fall in
$$\left[e^{\hat g - 1.96\,\widehat{\mathrm{se}}},\ e^{\hat g + 1.96\,\widehat{\mathrm{se}}}\right] = [0.352,\ 0.963].$$
Converting with $\pi = \text{odds}/(1 + \text{odds})$, we are 95% confident that $\pi(120 \text{ pounds and Asian}) \in [0.260,\ 0.491]$, i.e., the chance of having a small baby for this woman could be as little as 26.0% or as large as 49.1%.
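A small Python illustration of the numerical problem in case ii, under a stated assumption: the covariance entries are taken from printed output rounded to four decimals (the notes do not say how many digits were used). The 2×2 covariance matrix for (intercept, LWT slope) is hypothetical, built with a correlation of about −0.98. On the pound scale, rounding wipes out the tiny LWT variance and $x^T \hat\Sigma x$ at LWT = 120 turns negative; after dividing LWT by 100, the same rounding is harmless.

```python
def quad_form(x, cov):
    """x' cov x, the estimated variance of g_hat(x) = x' beta_hat."""
    return sum(x[j] * cov[j][k] * x[k] for j in range(len(x)) for k in range(len(x)))


def rounded(cov, digits=4):
    """Mimic reading a covariance matrix off printed output with a fixed number of decimals."""
    return [[round(v, digits) for v in row] for row in cov]


# Hypothetical covariance of (intercept, LWT slope) with LWT in pounds
cov_pounds = [[0.64, -0.0052058], [-0.0052058, 0.0000441]]
x_pounds = [1.0, 120.0]

# The same fit with LWT / 100: the slope is multiplied by 100, so its variance
# is multiplied by 100**2 and its covariance with the intercept by 100
cov_hundreds = [[0.64, -0.52058], [-0.52058, 0.441]]
x_hundreds = [1.0, 1.2]

exact = quad_form(x_pounds, cov_pounds)
from_rounded_pounds = quad_form(x_pounds, rounded(cov_pounds))
from_rounded_hundreds = quad_form(x_hundreds, rounded(cov_hundreds))
print(exact, from_rounded_pounds, from_rounded_hundreds)
```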


More information

Appendix: Computer Programs for Logistic Regression

Appendix: Computer Programs for Logistic Regression Appendix: Computer Programs for Logistic Regression In this appendix, we provide examples of computer programs to carry out unconditional logistic regression, conditional logistic regression, polytomous

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Multiple linear regression

Multiple linear regression Multiple linear regression Course MF 930: Introduction to statistics June 0 Tron Anders Moger Department of biostatistics, IMB University of Oslo Aims for this lecture: Continue where we left off. Repeat

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Chapter 11: Analysis of matched pairs

Chapter 11: Analysis of matched pairs Chapter 11: Analysis of matched pairs Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 42 Chapter 11: Models for Matched Pairs Example: Prime

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

Generalized Linear Modeling - Logistic Regression

Generalized Linear Modeling - Logistic Regression 1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided

More information

Epidemiology Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures. John Koval

Epidemiology Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures. John Koval Epidemiology 9509 Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being covered

More information

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

More information

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

Chapter 14 Logistic and Poisson Regressions

Chapter 14 Logistic and Poisson Regressions STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y =

More information

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ;

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; SAS Analysis Examples Replication C8 * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; libname ncsr "P:\ASDA 2\Data sets\ncsr\" ; data c8_ncsr ; set ncsr.ncsr_sub_13nov2015

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Chapter 11: Models for Matched Pairs

Chapter 11: Models for Matched Pairs : Models for Matched Pairs Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Yig Zhag STA6938-Logistic Regressio Model Topic -Simple (Uivariate) Logistic Regressio Model Outlies:. Itroductio. A Example-Does the liear regressio model always work? 3. Maximum Likelihood Curve

More information

Introduction to the Logistic Regression Model

Introduction to the Logistic Regression Model CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable

More information

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies. Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark http://staff.pubhealth.ku.dk/~bxc/ Department of Biostatistics, University of Copengen 11 November 2011

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Logistic Regression Models for Multinomial and Ordinal Outcomes

Logistic Regression Models for Multinomial and Ordinal Outcomes CHAPTER 8 Logistic Regression Models for Multinomial and Ordinal Outcomes 8.1 THE MULTINOMIAL LOGISTIC REGRESSION MODEL 8.1.1 Introduction to the Model and Estimation of Model Parameters In the previous

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology

More information

Homework 1 Solutions

Homework 1 Solutions 36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

STA102 Class Notes Chapter Logistic Regression

STA102 Class Notes Chapter Logistic Regression STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Homework Solutions Applied Logistic Regression

Homework Solutions Applied Logistic Regression Homework Solutions Applied Logistic Regression WEEK 6 Exercise 1 From the ICU data, use as the outcome variable vital status (STA) and CPR prior to ICU admission (CPR) as a covariate. (a) Demonstrate that

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Introduction to logistic regression

Introduction to logistic regression Introduction to logistic regression Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia What we are going to learn

More information

Count data page 1. Count data. 1. Estimating, testing proportions

Count data page 1. Count data. 1. Estimating, testing proportions Count data page 1 Count data 1. Estimating, testing proportions 100 seeds, 45 germinate. We estimate probability p that a plant will germinate to be 0.45 for this population. Is a 50% germination rate

More information

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors.

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors. EDF 7405 Advanced Quantitative Methods in Educational Research Data are available on IQ of the child and seven potential predictors. Four are medical variables available at the birth of the child: Birthweight

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval Epidemiology 9509 Wonders of Biostatistics Chapter 13 - Effect Measures John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being covered 1. risk factors 2. risk

More information

SAS PROC NLMIXED Mike Patefield The University of Reading 12 May

SAS PROC NLMIXED Mike Patefield The University of Reading 12 May SAS PROC NLMIXED Mike Patefield The University of Reading 1 May 004 E-mail: w.m.patefield@reading.ac.uk non-linear mixed models maximum likelihood repeated measurements on each subject (i) response vector

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

STAT 4385 Topic 03: Simple Linear Regression

STAT 4385 Topic 03: Simple Linear Regression STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis

More information

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies Section 9c Propensity scores Controlling for bias & confounding in observational studies 1 Logistic regression and propensity scores Consider comparing an outcome in two treatment groups: A vs B. In a

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

Changes Report 2: Examples from the Australian Longitudinal Study on Women s Health for Analysing Longitudinal Data

Changes Report 2: Examples from the Australian Longitudinal Study on Women s Health for Analysing Longitudinal Data ChangesReport: ExamplesfromtheAustralianLongitudinal StudyonWomen shealthforanalysing LongitudinalData June005 AustralianLongitudinalStudyonWomen shealth ReporttotheDepartmentofHealthandAgeing ThisreportisbasedonthecollectiveworkoftheStatisticsGroupoftheAustralianLongitudinal

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

ECONOMETRICS II TERM PAPER. Multinomial Logit Models

ECONOMETRICS II TERM PAPER. Multinomial Logit Models ECONOMETRICS II TERM PAPER Multinomial Logit Models Instructor : Dr. Subrata Sarkar 19.04.2013 Submitted by Group 7 members: Akshita Jain Ramyani Mukhopadhyay Sridevi Tolety Trishita Bhattacharjee 1 Acknowledgement:

More information

Models for binary data

Models for binary data Faculty of Health Sciences Models for binary data Analysis of repeated measurements 2015 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen 1 / 63 Program for

More information