Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Ronald Heck

This week we'll take up a couple of issues. The first is working with a probit link function. While the logit link function is most commonly used with dichotomous data, the probit link function is another option. It is often used in dose-response experiments. The probit model makes use of what would be the natural response rate (i.e., the probability of getting a response with no dose). So if we applied this type of model to an educational problem, the natural response rate would be the proportion of students who would be proficient in math without receiving a particular treatment. The REGRESSION: Probit routine is set up to run this type of model. In the absence of this type of experimental setting, it is often easier to use a logistic model. We can also use GENLIN (or REGRESSION: Ordinal) to obtain probit estimates for outcomes that are dichotomous or ordinal.

Let's say we have 55 students who participate in a study of a treatment to raise their achievement in math. We assign 29 of them to a control group and 26 of them to a treatment group. We measure them on two occasions, once before the treatment is implemented and once after it is implemented. We want to see whether the treatment results in a greater probability of students attaining proficiency compared with the control group. At the first measurement, 33 students were not proficient and 22 students were proficient. This gives a natural response rate of 22/55, or 0.40 (40%).

Table 1. premath1 * postmath2 Crosstabulation (rows: premath1 = 0, 1, Total; columns: postmath2 = 0, 1, Total; each cell reports Count and Expected Count). Marginal totals given in the text: premath1 has 33 not proficient and 22 proficient; postmath2 has 31 not proficient and 24 proficient; N = 55.

Students were randomly assigned to receive the treatment whether or not they were proficient. At the end, 31 students were not proficient and 24 were proficient.
The issue is whether the treatment helped students become proficient in the second measurement, after taking into consideration their previous proficiency status (premath1).

Logit Model

It should be noted that if the proportion on the dependent variable is somewhere between 0.1 and 0.9, the logit and probit models will typically provide very similar results. Where the proportion is outside of this range, the logit model generally provides more accurate results because it spreads the proportions over a wider range of the transformed scale (Hox, 2010).

Below are the results of the logistic model. We can calculate the probability of being proficient in the logit model for a student who is not in the treatment (coded 0) and was not proficient (coded 0) on the pre-treatment measure (premath1) from the formula odds/(1 + odds). In this case that will be 0.069/1.069, or a probability of about 0.065 (6.5%).

Table 2. Parameter Estimates (columns: Parameter, B, Std. Error, Hypothesis Test, Exp(B); rows: (Intercept), treatment, premath1). Dependent Variable: postmath2. a. Fixed at the displayed value.

We can see that the model with two predictors is better than the intercept-only model.

Table 3. Omnibus Test (Likelihood Ratio Chi-Square). Dependent Variable: postmath2. a. Compares the fitted model against the intercept-only model.
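The odds-to-probability conversion used for the logit intercept can be checked with a few lines of Python. This is only a minimal sketch: the function name `odds_to_probability` is made up for illustration, and the only input is the intercept Exp(B) of 0.069 quoted in the text.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability with odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Exp(B) for the intercept, as quoted in the notes: the odds of proficiency
# for a control-group student who was not proficient on premath1.
intercept_odds = 0.069

p_control = odds_to_probability(intercept_odds)
print(f"Probability of proficiency (control, premath1 = 0): {p_control:.3f}")
```

The same function applies to any product of odds ratios, since multiplying Exp(B) values is equivalent to adding the logit coefficients before converting.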

Here is the pseudo R-square and -2 log likelihood (-2LL) information.

Table 4. Model Summary (columns: Step, -2 Log likelihood, Cox & Snell R Square, Nagelkerke R Square).

In the next table we have the estimated means for the treatment and control groups, showing the impact of the treatment on ending math proficiency for both groups.

Table 5. Estimated Means (columns: treatment, Mean, Std. Error, 95% Wald Confidence Interval Lower and Upper). Covariates appearing in the model are fixed at the following values: premath1 = 0.40.

Probit Model

Here is the same model using probit coefficients. The probit model assumes that proficiency represents a somewhat arbitrary cut point on an underlying continuous (latent) variable. A probit (probability unit) coefficient β means that a one-unit increase in the independent variable corresponds to a β standard deviation increase in the latent dependent variable. The probit of a proportion is the point on the normal curve that has that proportion (area under the normal curve) to its left. Converting a probit back to a proportion therefore always gives a predicted probability between 0 and 1. Probit units are not as easily interpreted because they represent the change in the probit-transformed predicted score for every unit change in the predictor. A positive probit coefficient means that an increase in the predictor leads to an increase in the predicted probability. A negative coefficient means that an increase in the predictor leads to a decrease in the predicted probability.

We can see that the intercept (-1.561) refers to the z-score on the posttest of a student who has values of 0 for the independent variables; that is, the student was not proficient at the beginning of the study and was not in the treatment group. As a reference point, a z-score of 1.0 corresponds to a cumulative proportion of about 0.84, while a z-score of -1.0 corresponds to about 0.16. If we consult a table of proportions under the normal curve, we find that a z-score of 1.56 takes in about 0.94 of the area under the curve (leaving about 0.06). This suggests that an intercept z-score of -1.561 would take in about 0.06 of the area to the left under the curve, so the probability of being proficient for students who were 0 on the predictors in the probit model would be about 6.0%. An online normal-distribution calculator makes this conversion easy.

Being in the treatment increases the predicted probit index by 0.948 of a standard deviation. The corresponding logit increase is 1.764. We note that logit coefficients are usually somewhere between 1.6 and 1.8 times as large as probit coefficients. In this case, if we take the ratio of the logit coefficient for treatment to the corresponding probit coefficient (1.764/0.948), we find it is a bit more than 1.8 (1.86). For the premath1 estimate, the ratio is 1.75 (3.758/2.151).

Table 6. Parameter Estimates (columns: Parameter, B, Std. Error, Hypothesis Test, df, Sig.; rows: (Intercept), [treatment=1], [treatment=0] (set to 0 as the reference category), premath1). Dependent variable: postmath2.

Here are the predicted means for the two groups. You can see that the mean of the treatment group differs slightly from the logit model.

Table 7. Estimates (columns: treatment, Mean, Std. Error, 95% Wald Confidence Interval Lower and Upper). Covariates appearing in the model are fixed at the following values: premath1 = 0.40.
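As a rough check on the probit arithmetic above, the normal-table lookup can be replaced by the standard normal cumulative distribution function. The sketch below uses only the Python standard library; the helper name `normal_cdf` is an illustrative choice, and the coefficients are the ones quoted in the text (probit intercept -1.561; logit/probit pairs 1.764/0.948 and 3.758/2.151).

```python
import math


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probit intercept from the notes: control group, not proficient at pretest.
probit_intercept = -1.561
p_probit_control = normal_cdf(probit_intercept)
print(f"Probit probability at the intercept: {p_probit_control:.3f}")

# Ratio of logit to probit coefficients quoted in the notes.
ratio_treatment = 1.764 / 0.948
ratio_premath = 3.758 / 2.151
print(f"Logit/probit ratio, treatment: {ratio_treatment:.2f}")
print(f"Logit/probit ratio, premath1:  {ratio_premath:.2f}")
```

The probit probability comes out close to the 6.0% reported in the notes, and slightly below the 6.5% obtained from the logit model.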

Interpreting Probit Coefficients

Although it is possible to interpret the probit coefficients as changes in z-scores, we typically convert the z-scores into the probability that Y = 1. Because the probit function is based on the normal distribution, a one-unit increase in the z-score does not affect the probability that Y = 1 in a uniform way. If we look at the table of z-scores and their probabilities, we can see that at the mean (z = 0), a variable that produces a z-score increase of 1.0 (which is similar to the treatment effect of 0.95) increases the probability of being proficient from 0.50 to about 0.84. In contrast, if we start at a z-score of -2.0, a variable that produces a z-score increase of 1.0 standard deviation changes the probability of being proficient from about 0.02 to about 0.16. This example confirms that the size of the effect of X on the outcome differs depending on where we begin on the normal curve. Because probit analyses are often presented in terms of probabilities, the increase in the probability that Y = 1 depends on the interval of the normal curve over which we are predicting.

Table 8. Z-scores and Associated Probabilities (columns: z-score, Probability).

In contrast, when we work with odds ratios, an increase of one unit in X results in the same proportional increase in predicted odds because, as X increases by one unit, the expected odds of proficiency are multiplied by e^β. Of course, when we consider the relationship between X and the probability that Y = 1 in logistic regression, we also find that the rate of change in the predicted probability varies depending on the value of X (giving the shape of an S curve). However, it is important to note that the results of logistic regression analyses are not typically presented as probabilities.
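The non-uniform effect of a one-unit z-score increase described above can be illustrated numerically. This is a minimal sketch using the standard library only; `normal_cdf` and `probability_change` are hypothetical helper names, and the starting points (z = 0 and z = -2.0) are the two cases discussed in the text.

```python
import math


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_change(start_z: float, increase: float = 1.0) -> float:
    """Change in P(Y = 1) when the z-score rises by `increase` from `start_z`."""
    return normal_cdf(start_z + increase) - normal_cdf(start_z)


change_at_mean = probability_change(0.0)   # from z = 0 to z = 1
change_in_tail = probability_change(-2.0)  # from z = -2 to z = -1

for start in (-2.0, 0.0):
    end = start + 1.0
    print(f"z {start:+.1f} -> {end:+.1f}: "
          f"{normal_cdf(start):.3f} -> {normal_cdf(end):.3f}")
```

The same one-unit shift adds roughly 0.34 to the probability near the mean but only about 0.14 in the lower tail, which is why probit effects are usually reported at chosen values of the predictors.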
In this case the adjusted intercept is -1.561. We see that being in the treatment increases the predicted probit index by 0.948 of a standard deviation. We can calculate the probability of being proficient when one is in the treatment group (and not proficient to begin). This will be -1.561 + 0.948 = -0.613. The resulting proportion under the curve for a z-score of about -0.61 is about 0.271. We would say then that the probability of being proficient for a student who was not proficient at the start but was in the treatment group would be about 0.271 (or 27.1%). Estimated from the logistic link function, we would multiply the intercept and treatment odds ratios (0.069 * 5.834 = 0.403). Using the formula odds/(1 + odds), we can estimate the probability as about 0.287. A second way to do this is to choose values for the variables where the probability that Y = 1 is close to 0.50, as this is where the slope is steepest.

Count Data (Poisson Distribution)

Poisson (count) models have only one parameter, an event rate ($\lambda$). For the Poisson distribution the mean and the variance are both expressed by $\mu = \sigma^2 = \lambda$. That is, the mean and the variance are both equal to the event rate ($\lambda$). With real data, however, if the event rate increases, it is generally the case that the frequency of higher counts increases and thus the variance increases. This can introduce overdispersion into the model. It is recommended, therefore, to use robust standard errors to adjust for mild violations of the distributional assumption (Hox, 2010).

We will begin with a model to examine variables that might explain students' likelihood of failing a core course (English, math, social studies, science) during their ninth-grade year. The outcome then has a range from 0 to 4. The mean event rate is 0.602; squaring the standard deviation gives the variance.

Table 9. Continuous Variable Information (columns: N, Minimum, Maximum, Mean, Std. Deviation; row: Dependent Variable fail).

The intercept is the following:

Table 10. Parameter Estimates (columns: Parameter, B, Std. Error, Hypothesis Test, Exp(B); row: (Intercept)). Dependent Variable: fail. a. Fixed at the displayed value.

The natural log of the expected count of failing a course during freshman year in the population is -0.508. If we exponentiate the logged count ($e^{-0.508}$) we obtain the expected count ($\lambda$ = 0.602) in the descriptive table above. This is because the natural log is the canonical link function for the Poisson distribution (Azen & Walker, 2011). The inverse of the canonical link function returns the value on the original scale. For the Poisson distribution the inverse function (or mean function) is then the exponential function. The result can also be referred to as an event rate (i.e., the number of events per specific time period) or incident rate. Once we know the rate, we can determine the probability of failing a given number of courses in the population. The probability of a count of c events is defined as follows:

$P(Y = c) = \dfrac{e^{-\lambda}\lambda^{c}}{c!}$

We can calculate the probability of not failing any course then as

$P(Y = 0) = \dfrac{e^{-0.60}(0.60)^{0}}{0!} = \dfrac{e^{-0.60}(1)}{1} \approx 0.549$

The probability of failing one course will then be

$P(Y = 1) = \dfrac{e^{-0.60}(0.60)^{1}}{1!} \approx 0.329$

The probability of failing two courses will be

$P(Y = 2) = \dfrac{e^{-0.60}(0.60)^{2}}{(2)(1)} \approx 0.099$

You can fill in the probabilities for three and four courses.

Model 2

We might decide to investigate the effect of gender (male) on failing a course. Here we can see that males are significantly more likely to fail a course than females (Exp(B) = 1.275, p < .001). We can work out the meaning of this incident rate for males versus females. Using the formula above, the probability of a female failing no courses will be

$P(Y = 0) = \dfrac{e^{-0.527}(0.527)^{0}}{0!} \approx 0.590$
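The Poisson probabilities above can be reproduced directly from the formula for a count of c events. The following is a minimal sketch with the standard library; `poisson_pmf` is an illustrative name, and the rates (0.60 overall, 0.527 for females) and the log count (-0.508) are the values quoted in the text.

```python
import math


def poisson_pmf(c: int, rate: float) -> float:
    """P(Y = c) = exp(-rate) * rate**c / c! for a Poisson event rate."""
    return math.exp(-rate) * rate**c / math.factorial(c)


# Exponentiating the intercept recovers the expected count.
expected_count = math.exp(-0.508)
print(f"Expected count: {expected_count:.3f}")

# Probabilities of failing 0 to 4 courses at the rounded rate of 0.60.
probs = {c: poisson_pmf(c, 0.60) for c in range(5)}
for c, p in probs.items():
    print(f"P(Y = {c}) = {p:.4f}")

# Probability that a female student fails no courses (rate 0.527).
p_female_zero = poisson_pmf(0, 0.527)
print(f"P(Y = 0 | female) = {p_female_zero:.3f}")
```

Note that the Poisson distribution places a small amount of probability on counts above 4, so the five probabilities printed here sum to slightly less than 1.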

Table 11. Parameter Estimates (columns: Parameter, B, Std. Error, Hypothesis Test, Exp(B); rows: (Intercept), Male). Dependent Variable: fail. a. Fixed at the displayed value.

For males, the event rate for failing a course will be 0.672 (0.527 * 1.275). Using the above formula, the probability of a male not failing a course will then be

$P(Y = 0) = \dfrac{e^{-0.672}(0.672)^{0}}{0!} \approx 0.511$

It can also be noted that the meaning of the predictor's (gender) effect on the outcome is that a unit change in X (i.e., from female to male) multiplies the expected incident rate by Exp(B). The ratio of the incident rate of males (0.672) to females (0.527) will then be 1.275. This suggests that the event rate for males is 1.275 times (or 27.5% more than) the event rate for females.

Considering Possible Overdispersion

There is information in the table below about possible overdispersion in the model. Typically the scale factor is set to 1.0, which means that the residuals are assumed to follow a Poisson distribution exactly. The ratio of the model deviance to the degrees of freedom is one indicator of possible overdispersion. If there is none, the ratio will be close to 1.0. Some prefer the ratio of the Pearson chi-square to the degrees of freedom for this purpose. When the ratio is much larger than expected under the assumptions of the Poisson distribution, one possible solution is to add a dispersion parameter. SPSS does not support this under the Poisson distribution but allows it under the negative binomial model, which allows extra variance in the counts (Hox, 2010). This amounts to adding an error term to the model. The user can set the new parameter at some chosen value (instead of the fixed parameter of 1.0) or allow it to be estimated by the program. We note that when the scale parameter in the negative binomial is fixed at 0, the two distributions will be the same.

Table 12. Goodness of Fit (columns: Value, df, Value/df; rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson Chi-Square, Log Likelihood, Akaike's Information Criterion (AIC), Finite Sample Corrected AIC (AICC), Bayesian Information Criterion (BIC), Consistent AIC (CAIC)). Dependent Variable: fail. a. Information criteria are in small-is-better form. b. The kernel of the log likelihood function is displayed and used in computing information criteria.

Negative Binomial Distribution (To include an error term for dispersion)

Following we have the same model estimated with the negative binomial distribution, with the scale parameter set to 0. This will result in the same model as estimated with the Poisson distribution.

Table 13. Parameter Estimates (columns: Parameter, B, Std. Error, Hypothesis Test, Exp(B); rows: (Intercept), Male, (Negative binomial) fixed at 0). Dependent Variable: fail. Model: (Intercept), male. a. Fixed at the displayed value.
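To see why fixing the negative binomial parameter at 0 reproduces the Poisson model, it may help to look at the variance function. The notes do not state it, so the sketch below assumes the common parameterization in which the variance is mean + k * mean², with k as the dispersion parameter. The function name `nb_variance` is illustrative, and k = 1.0 is an arbitrary value chosen only for comparison; the event rates 0.527 and 0.672 are the female and male rates quoted above.

```python
def nb_variance(mean: float, dispersion: float) -> float:
    """Variance under the assumed form: mean + dispersion * mean**2."""
    return mean + dispersion * mean**2


rates = {"female": 0.527, "male": 0.672}

for group, rate in rates.items():
    poisson_like = nb_variance(rate, 0.0)   # k = 0: variance equals the mean
    overdispersed = nb_variance(rate, 1.0)  # k = 1.0 is illustrative only
    print(f"{group}: mean = {rate:.3f}, "
          f"variance (k = 0) = {poisson_like:.3f}, "
          f"variance (k = 1) = {overdispersed:.3f}")
```

Under this assumption, any positive k makes the variance exceed the mean, and the gap widens as the event rate grows, which matches the description of overdispersion given earlier.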

Here is the model when the overdispersion is estimated by the program (2.316). You need to use the custom designation (negative binomial) to estimate this model (with log link). In this case, there is no visible difference in the estimates, except that the model fit indices (not included) indicate a better fit. For example, AIC is 15,... in the present model versus 17,... in the previous model (which did not account for overdispersion).

Table 14. Parameter Estimates (columns: Parameter, B, Std. Error, Hypothesis Test, Exp(B); rows: (Intercept), Male, (Negative binomial)). Dependent Variable: fail. a. Fixed at the displayed value.

Reference

Hox, J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York: Routledge Academic.
