Chapter 14 Logistic and Poisson Regressions

Size: px
Start display at page:

Download "Chapter 14 Logistic and Poisson Regressions"

Transcription

1 STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang

2 Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y = 1) vs Not diseased (Y = 0) Employed (Y = 1) vs Unemployed (Y = 0) Response is binary or dichotomous Can model response using Bernoulli distribution Pr(Y i = 1) = π i Pr(Y i = 0) = 1 π i E(Y i ) = π i Goal to link π i with covariates X i 14-1

3 Standard Linear Model? Consider simple linear model (single predictor X) Y i = β 0 +β 1 X i +ε i π i = E[Y i ] = β 0 +β 1 X i Assumption violations: Non-normal error terms When Y i = 0 : ε i = β 0 β 1 X i When Y i = 1 : ε i = 1 β 0 β 1 X i Non-constant variance Var(Y i ) = π i (1 π i ) = (β 0 +β 1 X i )(1 β 0 β 1 X i ) Response function constraints 0 π i

4 Logistic Response Function Consider a sigmoidal response function π i = E[Y i ] = exp(β 0 +β 1 X i ) 1+exp(β 0 +β 1 X i ) Example of a nonlinear model = {1+exp( β 0 β 1 X i )}

5 Properties Monotonic increasing/decreasing function Restricts 0 π i = E[Y i ] 1 Can be linearized through transformation log ( πi 1 π i ) = β 0 +β 1 X i Known as the logit transformation Thus, use logit link to relate π i with X i Other links possible (i.e., probit, complementary log-log described on p ) 14-4

6 Estimation Given the distribution of Y i, we can formulate the likelihood function log(l) = log n π Y i i=1 i (1 π i) 1 Y i = Y i log(π i )+ (1 Y i )log(1 π i ) π i = Y i log( )+ log(1 π i ) 1 π i = Y i (β 0 +β 1 X i ) log(1+exp(β 0 +β 1 X i )) MLEs do not have closed forms SAS performs iterative reweighted least squares (IRWLS) Given b 0 and b 1, can calculate ˆπ i 14-5

7 Interpretation ˆπ i is the estimated probability of individual i having response Y i = 1 b 1 is no longer the slope of a linear relationship but rather the slope of the logit relationship logit(ˆπ(x i +1)) logit(ˆπ(x i )) = b 1 Logit transformation is also the log of the odds Thus, exp(b 1 ) becomes the odds ratio Popular summary in epidemiologic studies 14-6

8 Example (Page 625) Board of directors are interested in the effect of a due increase Randomly surveyed n = 30 members X i is the due increase posited to member X i varied between $30 and $50 Y i is whether membership would continue 14-7

9 data a1; infile u:\.www\datasets525\ch14pr07.txt ; input norenew increase; renew=1-norenew; proc print data=a1; var renew increase; run; quit; Obs renew increase

10 /* Scatterplot */ goptions colors=( none ); symbol1 v=circle i=sm70; proc gplot data=a1; plot renew*increase; run; quit; 14-9

11 /* DESCENDING: SAS will predict the event of 1, otherwise of 0 */ proc logistic data=a1 descending; model renew = increase; output out=a2 p=pred; run; quit; Model Information Model Optimization Technique binary logit Fisher s scoring Response Profile Ordered Total Value renew Frequency Probability modeled is renew=1. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC SC Log L

12 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio Score Wald Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept increase Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits increase Association of Predicted Probabilities and Observed Responses Percent Concordant 68.3 Somers D Percent Discordant 28.1 Gamma Percent Tied 3.6 Tau-a Pairs 224 c

13 /* Scatterplot/Fit */ symbol1 v=circle i=none; symbol2 v=star i=sm30; proc gplot data=a2; plot renew*increase pred*increase /overlay; run; quit; 14-12

14 Multiple Logistic Regression Easily extension using matrix formulations Similar model building strategies (SAS option of MODEL) Stepwise: SELECTION=STEPWISE Forward: SELECTION=FORWARD Backward: SELECTION=BACKWARD Best Subset: SELECTION=SCORE SELECTION=SCORE PROC LOGISTIC uses the branch and bound algorithm of Furnival and Wilson (1974) to find a specified number of models (e.g., BEST=1) with the highest score (chi-square) test statistic for all possible model sizes. The score test statistic is to test whether all the coefficients are zero

15 Example (Page 573) Investigating epidemic outbreak of a disease spread by mosquitoes Randomly sampled individuals within two sectors of city Assessed whether individual had symptoms of disease and obtained other info X i1 is age X i2 indicator of middle class X i3 indicator of lower class X i4 is the sector Y i is whether they had symptoms 14-14

16 data a3; infile u:\.www\datasets525\apc3.dat ; input case age socioecon sector Y jnk; X2=0; if socioecon=2 then X2=1; X3=0; if socioecon=3 then X3=1; drop jnk; proc logistic data=a3 descending; model Y = age X2 X3 sector/selection=score best=1; run; quit; Regression Models Selected by Score Criterion Number of Score Variables Chi-Square Variables Included in Model sector age sector age X3 sector age X2 X3 sector Which model should be selected? X2 and X3 should be included/excluded together! Note: selection=score does not support class variables! 14-15

17 proc logistic data=a3 descending; class socioecon; model Y = age socioecon sector/selection=stepwise; run; quit; Summary of Stepwise Selection Effect Number Score Wald Step Entered Removed DF In Chi-Square Chi-Square Pr > ChiSq 1 sector age proc logistic data=a3 descending; model Y = age X2 X3 sector; test x2=0,x3=0; run; quit; Linear Hypotheses Testing Results Wald Label Chi-Square DF Pr > ChiSq Test

18 /* Remove X2 & X */ proc logistic data=a3 descending; model Y = age sector; run; quit; Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio <.0001 Score <.0001 Wald <.0001 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept <.0001 age sector Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits age sector

19 PROC GENMOD fits a generalized linear model by ML. The available distributions: normal, binomial, Poisson, gamma, inverse Gaussian, negative binomial, multinomial. proc genmod data=a3 descending; model Y = age X2 X3 sector / link=logit noscale dist=bin; contrast socioeconomic X2 1, X3 1; contrast sector sector 1; run; quit; Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 age X X sector Scale Contrast Results Contrast DF Chi-Square Pr > ChiSq Type socioeconomic LR sector LR 14-18

20 Hypothesis Testing Hypothesis testing and formulation of test statistics is done using a likelihood ratio test, which is similar to the general linear test (i.e., comparing the full and reduced models). The deviance of a fitted model is the difference between the log-likelihood of the fitted model and a model that has a parameter (π i ) for each observation Y i (i.e., use all of the degrees of freedom so that the residuals will be zero). For logistic regression DEV(X 1,X 2,...,X p 1 ) = 2log(L(b 0,b 1,...,b p 1 )) 14-19

21 Can use deviance to compare models Models must be hierarchical (Full/Reduced) DEV(X q,...,x p 1 X 0,...,X q 1 ) = DEV(X 0,...,X q 1 ) DEV(X 0,...,X p 1 ) Partial deviance approx χ 2 with p q df Must compute by hand or with PROC GENMOD For example: Testing H O : Socioeconomic=0 Model Fit Statistics from PROC LOGISTIC Output of model Y = age X2 X3 sector; -2 Log L Output of model Y = age sector; -2 Log L DEV(Socioecon Age,Sector) = = p-value=

22 Can also use Wald s Test for H 0 : l β = 0 Described on page 578 Test is similar to squared z test S = (L ˆβ) (L ΣL) 1 (L ˆβ) Under H 0, S χ 2 r where r is rank of L Available in PROC GENMOD and PROC LOGISTIC, using ESTIMATE statement

23 Diagnostics Goodness of fit : Overall measure of fit Consider both replicated and unreplicated binary data Replicated (c unique classes of predictors) a. Deviance goodness of fit : Compare deviance to χ 2 c p b. Pearson goodness of fit : Compare (O jk E jk ) 2 /E jk to χ 2 c p Unreplicated a. Hosemer-Lemeshow goodness of fit: Group observations into classes (usually around 10) according to fitted logit values. Assess overall fit to each class using a Pearson goodness of fit approach. (p ) 14-22

24 Hosemer-Lemeshow Goodness of Fit Test Divide obs up into 10 groups of equal size based on percentiles of the estimated probabilities Expected # of 1 s is ˆπ i Expected # of 0 s is n i ˆπ i Compare expected with observed through χ 2 = (O ij E ij ) 2 E ij Example (Page 625) /* LACKFIT: Hosmer and Lemeshow goodness-of-fit test */ proc logistic data=a1 descending; model renew = increase / lackfit; output out=a2 p=pred; proc print; run; quit; 14-23

25 Obs norenew increase renew _LEVEL_ pred

26 In this example E 11 = ( ) = 0.59 E 10 = 2.41 E 21 = ( ) = 0.77 E 20 = 2.23 Partition for the Hosmer and Lemeshow Test renew = 1 renew = 0 Group Total Observed Expected Observed Expected Hosmer and Lemeshow Goodness-of-Fit Test Chi-Square DF Pr > ChiSq

27 Measures of Agreement Want to measure the association of the pair (Y i,ˆπ i ), i = 1,2,,N Desired to have close (positive) association in some sense. Compare predicted probabilities of pair (Y i = 1,Y j = 0) Concordant if ˆπ i > ˆπ j (#concordant = n c ) Discordant if ˆπ i < ˆπ j (#discordant = n d ) Tie if ˆπ i = ˆπ j Consider all pairs of distinct responses In this example t = =

28 Measures of agreement Somers D = n c n d t : ranges from 1 (all pairs disagree) to 1 (all pairs agree); Goodman-Kruskal Gamma = n c n d n c +n d : range from -1.0 (no association) to 1.0 (perfect association). Not penalizing for ties and generally greater than Somer s D; Kendall s Tau-a = n c n d : modification of Somer s D N(N 1)/2 by considering all possible paired observations. Usually much smaller than Somer s D; c = n c+(t n c n d )/2 t : equivalent to the well known measure ROC, ranging from 0.5 (randomly predicting the response) to 1 (perfectly discriminating the response). Association of Predicted Probabilities and Observed Responses Percent Concordant 68.3 Somers D Percent Discordant 28.1 Gamma Percent Tied 3.6 Tau-a Pairs 224 c

29 Diagnostics Residual analysis Distribution of residuals under correct model is unknown and thus common residual plot uninformative. Pearson residual is the ordinary residual divided by the standard error of Y i. Sum of squared residuals equals Pearson X 2 r P i = Y i ˆπ i ˆπ i (1 ˆπ i ) Studentized Pearson residual is similar but divided by its standard error so they have unit variance Deviance residual is the signed square root of the contribution to the model deviance (r D i ) 2 = DEV(X 0,...,X p 1 ) Sign depends on ˆπ i >

30 Residual analysis Can plot residual by predicted value : A flat lowess smooth to this plot suggests the model is correct Can generate probability plot with envelope DFFITS, DFBETAS SAS contains influence and iplots options Example (Page 625)) ods html; ods graphics on; proc logistic data=a1 descending; model renew = increase / iplots influence lackfit; ods graphics off; ods html close; run; quit; 14-29

31 Regression Diagnostics Pearson Residual Deviance Residual Covariates Case (1 unit = 0.24) (1 unit = 0.22) increase Value Value * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 14-30

32 Hat Matrix Diagonal Intercept Case (1 unit = 6.E-03) DfBeta (1 unit = 0.07) Value Value * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 14-31

33 Influence Plots 14-32

34 Residual Plots 14-33

35 Ordinal Logistic Regression Have more than two possible outcomes on ordered scale (i.e., Likert scale), say J outcomes Often called proportional odds model ( ) Pr(Yi j) log = β 0j +β 1 X i1 + +β p 1 X i,p 1, 1 Pr(Y i j) j = 1,2,,J 1 Shared values of β 1,,β p 1 J 1 intercepts: β 01,β 02,,β 0,J 1 Proportional odds (irrelevant to the values of predictors) Pr(Y i j X i ) 1 Pr(Y i j X i ) / Pr(Y i k X i ) 1 Pr(Y i k X i ) = eβ 0j β 0k 14-34

36 Dose-Symptoms Example data dsymp; input dose symptoms $ ldose = log10(dose); cards; 10 None Mild 7 10 Severe None Mild Severe None Mild 3 30 Severe None 9 40 Mild 8 40 Severe 32 ; /* ORDER: specifies sorting order for classification variables */ /* ORDER=DATA: order of appearance in the input data set */ proc logistic data=dsymp order=data; class symptoms; model symptoms = ldose; weight n; run; quit; Response Profile Ordered Total Total Value symptoms Frequency Weight 1 None Mild Severe Probs modeled are cumulated over the lower Ordered Values

37 Score Test for the Proportional Odds Assumption Chi-Square DF Pr > ChiSq Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio <.0001 Score <.0001 Wald <.0001 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept None <.0001 Intercept Mild <.0001 ldose <.0001 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits ldose

38 Polytomous Logistic Regression For polytomous or multicategory, consider there are J response categories Select one as baseline or reference category, ( ) P(Yi = j) log = β 0j +β 1j X i1 + +β p 1,j X i,p 1, P(Y i = J) log ( ) πij π ij = X i β j, j = 1,,J 1; Results in J 1 parameter vectors, β 1,,β J

39 Dose-Symptoms Example /* CATMOD: performs categorical data modeling */ /* DIRECT: lists numeric predictors */ proc catmod data=dsymp order=data; direct ldose; weight n; model symptoms = ldose; run; quit; Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq Intercept <.0001 ldose <.0001 Likelihood Ratio Analysis of Maximum Likelihood Estimates Function Standard Chi- Parameter Number Estimate Error Square Pr > ChiSq Intercept < ldose <

40 Overdispersion Binomial/Bernoulli Model E(Y i ) = π i Var(Y i ) = π i (1 π i ) Beta-Binomial Model E(Y i ) = π i Var(Y i ) = σ 2 π i (1 π i ) Mean unaffected but Cov(ˆβ) σ 2 (X WX) 1 Can show E(χ 2 /(N p)) σ

41 Example (Page 625) proc genmod data=a1 descending; model renew = increase / dist=binomial noscale link=logit; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood proc genmod data=a1 descending; model renew = increase / dist=binomial scale=2 link=logit; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood

42 proc genmod data=a1 descending; model renew = increase / dist=binomial noscale link=logit; run; quit; Analysis Of Parameter Estimates proc genmod data=a1 descending; model renew = increase / dist=binomial scale=2 link=logit; run; quit; Analysis Of Parameter Estimates Standard Chi- Parameter DF Estimate Error Square Pr > ChiSq Intercept increase Scale Standard Chi- Parameter DF Estimate Error Square Pr > ChiSq Intercept increase Scale

43 INGOTS Example Taken from Cox and Snell (1989, p.10-11) The data consist of the number, r, of ingots not ready for rolling, out of n tested, for a number of combinations of heating time (heat) and soaking time (soak). data ingots; input heat soak r cards; ; 14-42

44 proc logistic data = ingots; model r/n = heat soak; run; quit; Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC SC Log L Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio Score Wald Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept <.0001 heat soak

45 proc genmod data = ingots; model r/n = heat soak / dist=binomial link=logit scale=p; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 heat soak Scale NOTE: The scale parameter was estimated by the square root of Pearson s Chi-Square/ 14-44

46 proc genmod data = ingots; model r/n = heat soak / dist=binomial link=logit scale=d; run; quit; Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 heat soak Scale NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF

47 Poisson Regression Like logistic regression, this is a nonlinear regression model for discrete outcomes Used when response variable is a count, for example, Number of acute asthma attacks in a week Number of visits to mall during December Useful when large count is a rare event Otherwise, standard linear model with normal errors appropriate Poisson counts use transformation 14-46

48 Regression Model Standard linear/nonlinear regression form: Y i = E(Y i )+ε i Generalized linear model: Exponential family distribution for Y and link between the mean and covariates Common link g( ) such that µ i = g 1 (X i β) µ i = X i β (LINK=IDENTITY) µ i = exp(x iβ) (LINK=LOG) µ i = (X i β)2 (LINK=POWER(0.5)) All must result in µ i nonnegative Model is such that the Y i s are independent Poisson random variables with expected values µ i = g 1 (X i β) 14-47

49 Estimation Given the distribution of Y i, we can formulate the likelihood function log(l) = Y i log{µ(x i,β)} µ(x i,β)+c MLEs do not have closed forms Iterative reweighted least squares or other numerical search procedures used to solve for β 14-48

50 Hypothesis Testing Hypothesis testing and formulation of test statistics is done in similar manner to logistic regression. The deviance of a fitted model is the difference between the log-likelihood of the fitted model and a model that has a parameter (µ i ) for each observation Y i (i.e., use all of the degrees of freedom so that the residuals will be zero). For Poisson regression Yi DEV(X 1,X 2,...,X p 1 ) = 2[ log ( ) Yi ˆµ i (Y i ˆµ i ) ] 14-49

51 Can use deviance to compare models Models must be hierarchical (Full/Reduced) DEV(X q,...,x p 1 X 0,...,X q 1 ) = DEV(X 0,...,X q 1 ) DEV(X 0,...,X p 1 ) Partial deviance approx χ 2 with p q df Must compute by hand or with PROC GENMOD Prediction Mean response for predictors X Probability of specific count (i.e., Y = 0) 14-50

52 Example (Page 621) Lumber company interested in the relationship between the number of customers from a census tract and X 1 : Number of housing units X 2 : Average income X 3 : Average housing unit age X 4 : Distance to nearest competitor X 5 : Distance to store Will use LINK=LOG log(µ i ) = X β data ppoi1; infile U:\.www\datasets525\CH14TA08.DAT ; input ncust x1 x2 x3 x4 x5; proc genmod; model ncust= x1 x2 x3 x4 x5 / link=log dist=poi type3; run; quit; 14-51

53 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood Analysis Of Parameter Estimates NOTE: The scale parameter was held fixed. LR Statistics For Type 3 Analysis Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square P>ChiSq Intercept <.0001 x <.0001 x <.0001 x x <.0001 x <.0001 Scale Chi- Source DF Square Pr > ChiSq x <.0001 x <.0001 x x <.0001 x <

54 Diagnostics Goodness of fit Deviance goodness of fit : Compare overall fit with χ 2 n p Residual analysis Distribution of residuals under correct model is unknown and thus common residual plot uninformative. Deviance residual is the signed square root of the contribution to the model deviance (r D i ) 2 = DEV(X 0,...,X p 1 ) Sign depends on whether ˆµ i > Y i Can plot r D i by observation (index plot) Can generate probability plot with envelope 14-53

55 Chapter Review Logistic Regression Background Model Inference Diagnostics and remedies Ordinal Logistic Regression Polytomous Logistic Regression Poisson Regression Background Model Inference 14-54

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials. The GENMOD Procedure MODEL Statement MODEL response = < effects > < /options > ; MODEL events/trials = < effects > < /options > ; You can specify the response in the form of a single variable or in the

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

ssh tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm

ssh tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm Kedem, STAT 430 SAS Examples: Logistic Regression ==================================== ssh abc@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm a. Logistic regression.

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Binary Response: Logistic Regression. STAT 526 Professor Olga Vitek

Binary Response: Logistic Regression. STAT 526 Professor Olga Vitek Binary Response: Logistic Regression STAT 526 Professor Olga Vitek March 29, 2011 4 Model Specification and Interpretation 4-1 Probability Distribution of a Binary Outcome Y In many situations, the response

More information

Section Poisson Regression

Section Poisson Regression Section 14.13 Poisson Regression Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 26 Poisson regression Regular regression data {(x i, Y i )} n i=1,

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Count data page 1. Count data. 1. Estimating, testing proportions

Count data page 1. Count data. 1. Estimating, testing proportions Count data page 1 Count data 1. Estimating, testing proportions 100 seeds, 45 germinate. We estimate probability p that a plant will germinate to be 0.45 for this population. Is a 50% germination rate

More information

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure).

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure). 1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Age 55 (x = 1) Age < 55 (x = 0)

Age 55 (x = 1) Age < 55 (x = 0) Logistic Regression with a Single Dichotomous Predictor EXAMPLE: Consider the data in the file CHDcsv Instead of examining the relationship between the continuous variable age and the presence or absence

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

ST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples

ST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples ST3241 Categorical Data Analysis I Logistic Regression An Introduction and Some Examples 1 Business Applications Example Applications The probability that a subject pays a bill on time may use predictors

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 29 14.1 Regression Models

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Chapter 14 Logistic regression

Chapter 14 Logistic regression Chapter 14 Logistic regression Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 62 Generalized linear models Generalize regular regression

More information

Statistical Modelling with Stata: Binary Outcomes

Statistical Modelling with Stata: Binary Outcomes Statistical Modelling with Stata: Binary Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 21/11/2017 Cross-tabulation Exposed Unexposed Total Cases a b a + b Controls

More information

Statistics 262: Intermediate Biostatistics Model selection

Statistics 262: Intermediate Biostatistics Model selection Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

Generalized Linear Modeling - Logistic Regression

Generalized Linear Modeling - Logistic Regression 1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

(c) Interpret the estimated effect of temperature on the odds of thermal distress.

(c) Interpret the estimated effect of temperature on the odds of thermal distress. STA 4504/5503 Sample questions for exam 2 1. For the 23 space shuttle flights that occurred before the Challenger mission in 1986, Table 1 shows the temperature ( F) at the time of the flight and whether

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ;

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; SAS Analysis Examples Replication C8 * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; libname ncsr "P:\ASDA 2\Data sets\ncsr\" ; data c8_ncsr ; set ncsr.ncsr_sub_13nov2015

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Stat 704: Data Analysis I, Fall 2010

Stat 704: Data Analysis I, Fall 2010 Stat 704: Data Analysis I, Fall 2010 Generalized linear models Generalize regular regression to non-normal data {(Y i,x i )} N i=1, most often Bernoulli or Poisson Y i. The general theory of GLMs has been

More information

Introduction to Generalized Models

Introduction to Generalized Models Introduction to Generalized Models Today s topics: The big picture of generalized models Review of maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013 Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Lecture 8: Summary Measures

Lecture 8: Summary Measures Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

Overview Scatter Plot Example

Overview Scatter Plot Example Overview Topic 22 - Linear Regression and Correlation STAT 5 Professor Bruce Craig Consider one population but two variables For each sampling unit observe X and Y Assume linear relationship between variables

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

Some comments on Partitioning

Some comments on Partitioning Some comments on Partitioning Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/30 Partitioning Chi-Squares We have developed tests

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

7.1, 7.3, 7.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1/ 31

7.1, 7.3, 7.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1/ 31 7.1, 7.3, 7.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 31 7.1 Alternative links in binary regression* There are three common links considered

More information

LINEAR REGRESSION. Copyright 2013, SAS Institute Inc. All rights reserved.

LINEAR REGRESSION. Copyright 2013, SAS Institute Inc. All rights reserved. LINEAR REGRESSION LINEAR REGRESSION REGRESSION AND OTHER MODELS Type of Response Type of Predictors Categorical Continuous Continuous and Categorical Continuous Analysis of Variance (ANOVA) Ordinary Least

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Generalized Multilevel Models for Non-Normal Outcomes

Generalized Multilevel Models for Non-Normal Outcomes Generalized Multilevel Models for Non-Normal Outcomes Topics: 3 parts of a generalized (multilevel) model Models for binary, proportion, and categorical outcomes Complications for generalized multilevel

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Introduction to the Logistic Regression Model

Introduction to the Logistic Regression Model CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev Variable selection: Suppose for the i-th observational unit (case) you record ( failure Y i = 1 success and explanatory variabales Z 1i Z 2i Z ri Variable (or model) selection: subject matter theory and

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information