SOS3003 Applied data analysis for social science. Lecture note. Erling Berge, Department of sociology and political science, NTNU.


Literature: Logistic regression II. Hamilton Ch 7, p7-4.

Definitions I

The probability that person no. i has the value 1 on the variable Y_i is written Pr(Y_i = 1). Then Pr(Y_i = 0) = 1 - Pr(Y_i = 1). The odds that person no. i has the value 1 on the variable Y_i, here called O_i, is the ratio between two probabilities:

O_i = Pr(y_i = 1) / Pr(y_i = 0) = p_i / (1 - p_i)

Definitions II

The LOGIT, L_i, for person no. i (corresponding to Pr(Y_i = 1)) is the natural logarithm of the odds, O_i, that person no. i has the value 1 on variable Y_i. This is written:

L_i = ln(O_i) = ln{p_i / (1 - p_i)}

The model assumes that L_i is a linear function of the explanatory variables x_ji, i.e.:

L_i = β_0 + Σ_j β_j x_ji, where j = 1, .., K-1, and i = 1, .., n
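The definitions above are easy to check numerically. A minimal sketch (function names are my own, not from the lecture):

```python
import math

def odds(p):
    """Odds O = p / (1 - p) for a probability p = Pr(Y = 1)."""
    return p / (1.0 - p)

def logit(p):
    """Logit L = ln(O) = ln(p / (1 - p))."""
    return math.log(odds(p))

def inverse_logit(L):
    """Back-transform a logit to a probability: p = 1 / (1 + exp(-L))."""
    return 1.0 / (1.0 + math.exp(-L))
```

For example, p = 0.5 gives odds 1 and logit 0, and `inverse_logit` undoes `logit` exactly.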

Logistic regression: assumptions

- The model is correctly specified:
  - The logit is linear in its parameters
  - All relevant variables are included
  - No irrelevant variables are included
- x-variables are measured without error
- Observations are independent
- No perfect multicollinearity
- No perfect discrimination
- Sufficiently large sample

Assumptions that cannot be tested

- Model specification: all relevant variables are included
- x-variables are measured without error
- Observations are independent

Two will be tested automatically: if the model can be estimated there is no perfect multicollinearity and no perfect discrimination.

LOGISTIC REGRESSION

Statistical problems may be due to:
- Too small a sample
- A high degree of multicollinearity, leading to large standard errors (imprecise estimates). Multicollinearity is discovered and treated in the same way as in OLS regression
- A high degree of discrimination (or separation), leading to large standard errors (imprecise estimates). This will be discovered automatically by SPSS

Assumptions that can be tested

- Model specification: the logit is linear in the parameters; no irrelevant variables are included
- Sufficiently large sample

What constitutes a sufficiently large sample is not always clear. It depends on how the cases are distributed between the 0 and 1 categories: if one of these is too small there will be problems estimating partial effects. It also depends on the number of different patterns in the sample and how cases are distributed across these.

LOGISTIC REGRESSION: TESTING (1)

Testing implies an assessment of whether statistical problems lead to departures from the assumptions. Two tests are useful:
(1) The likelihood ratio test statistic, χ²_H. Can be used analogously to the F-test.
(2) The Wald test. The square root of this can be used analogously to the t-test.

LOGISTIC REGRESSION: TESTING (2)

The likelihood ratio test: a ratio between two likelihoods is equivalent to a difference between two log-likelihoods. The difference between the log-likelihoods (LL) of two nested models, estimated on the same data, can be used to test which of the two models fits the data best, just like the F-statistic is used in OLS regression. The test can also be used for single regression coefficients (single variables). In small samples it has better properties than the Wald statistic.

LOGISTIC REGRESSION: TESTING (3)

The likelihood ratio test statistic

χ²_H = -2[LL(model 1) - LL(model 2)]

will, if the null hypothesis of no difference between the two models is correct, be distributed approximately (for large n) as the chi-square distribution with a number of degrees of freedom equal to the difference in the number of parameters in the two models (H).

Example of a likelihood ratio test

Model 1: just the constant. Model 2: the constant plus one variable.

χ²_H = -2[LL(model 1) - LL(model 2)] = -2LL(model 1) + 2LL(model 2)

Find the value of the chi-square and the number of degrees of freedom, e.g.:

LogLikelihood(mod 1) = 09,/(-2)
LogLikelihood(mod 2) = 95,67/(-2)

From Tab 7.1, the -2 log-likelihood at each iteration: 09,; 95,684; 95,69; 95,67; 95,67.
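A sketch of the test with made-up log-likelihood values (3.84 is the 5% chi-square critical value for H = 1 degree of freedom):

```python
def likelihood_ratio_chi2(ll_restricted, ll_full):
    """Chi-square statistic for two nested models: -2[LL(model 1) - LL(model 2)]."""
    return -2.0 * (ll_restricted - ll_full)

# Hypothetical values: constant-only model vs constant plus one variable
chi2 = likelihood_ratio_chi2(-54.6, -47.8)   # H = 1 extra parameter
significant_at_5pct = chi2 > 3.84            # chi-square critical value, df = 1
```

Here the restricted model is rejected, so the extra variable improves the fit.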

Logistic Regression in SPSS I

Case Processing Summary: unweighted cases, with selected cases included in the analysis, missing cases, unselected cases, and totals (N and percent). Note: if weight is in effect, see the classification table for the total number of cases.

Dependent Variable Encoding: original value OPEN coded as internal value 0, CLOSE as 1.

Logistic Regression in SPSS IIa

Iteration History (Step 0): the -2 log-likelihood and the coefficient for the constant at each iteration. Notes: (a) the constant is included in the model; (b) initial -2 log-likelihood: 09,; (c) estimation terminated at iteration 3 because the parameter estimates changed by less than ,00.

Logistic Regression in SPSS IIb

Classification Table (Step 0): observed vs predicted SCHOOLS SHOULD CLOSE (OPEN/CLOSE), percentage correct, and overall percentage. Notes: (a) the constant is included in the model; (b) the cut value is ,500.

Variables in the Equation (Step 0): B, S.E., Wald, df, Sig. and Exp(B) for the constant.

Variables not in the Equation (Step 0): score, df and Sig. for the variable lived, plus overall statistics.

Logistic Regression in SPSS IIIa

Iteration History for the model with lived entered (method: Enter): the -2 log-likelihood and the coefficients (constant, lived) at each iteration. Estimation terminated at iteration 4 because the parameter estimates changed by less than ,00.

Logistic Regression in SPSS IIIb

Omnibus Tests of Model Coefficients (Step 1): chi-square, df and Sig. for step, block and model.

Model Summary (Step 1): -2 log-likelihood, Cox & Snell R Square and Nagelkerke R Square. Estimation terminated at iteration 4 because the parameter estimates changed by less than ,00.

Logistic Regression in SPSS IIIc

Classification Table (Step 1): observed vs predicted SCHOOLS SHOULD CLOSE (OPEN/CLOSE) and percentage correct; the cut value is ,500.

Variables in the Equation (Step 1): B, S.E., Wald, df, Sig. and Exp(B) for lived and the constant. Variable(s) entered on step 1: lived.

LOGISTIC REGRESSION: TESTING (4)

The Wald test: the Wald (chi-square) test statistic provided by SPSS equals t² = (b_k / SE(b_k))² (where t is the t used by Hamilton). It can be used for testing single parameters, similarly to the t-statistic of OLS regression. Check out Hamilton table 7.2: find t², see Wald above.
- If the null hypothesis is correct, t will (for large n) in logistic regression be approximately normally distributed.
- If the null hypothesis is correct, the Wald statistic will (for large n) in logistic regression be approximately chi-square distributed with df = 1.

Excerpt from Hamilton Table 7.2

Iteration history, -2 log-likelihood: 09,; 5,534; 49,466; 49,38; 49,38; 49,38

Variables | B | S.E. | Wald | df | Sig. | Exp(B)
Lived | -,046 | ,05 | 9,698 | 1 | ,00 | ,955
Educ | -,66 | ,090 | 3,404 | 1 | ,065 | ,847
Contam | ,08 | ,465 | 6,739 | 1 | ,009 | 3,347
Hsc | ,73 | ,464 | ,99 | 1 | ,000 | 8,784
Constant | ,73 | ,30 | ,768 | 1 | ,84 | 5,649
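The Wald statistic is simple to compute by hand from a coefficient and its standard error; a sketch with made-up values (not the table's, whose digits are partly unreadable in this transcription):

```python
import math

def wald_statistic(b, se):
    """Wald chi-square for a single coefficient: (b / SE(b))^2."""
    t = b / se
    return t * t

# Hypothetical coefficient and standard error
w = wald_statistic(-0.046, 0.015)
t = math.sqrt(w)   # the t used by Hamilton is the square root of Wald
```

Compare w against the chi-square distribution with df = 1, or t against the standard normal.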

Confidence interval for parameter estimates

Can be constructed based on the fact that the square root of the Wald statistic (chi-square with 1 degree of freedom) approximately follows a normal distribution:

b_k - t_α * SE(b_k) < β_k < b_k + t_α * SE(b_k)

where t_α is a value taken from the table of the normal distribution with level of significance equal to α.

Can be constructed based on the t-distribution

If a table of the normal distribution is missing one may use the t-distribution, since the t-distribution is approximately normally distributed for large n-k (e.g. for n-k > 0).
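A sketch of the interval for α = 0.05, where 1.96 is the familiar normal-table value:

```python
def wald_ci(b, se, z=1.96):
    """Approximate confidence interval b ± z * SE(b); z = 1.96 gives 95%."""
    return (b - z * se, b + z * se)

# Hypothetical coefficient: b = 1.0 with SE = 0.5
low, high = wald_ci(1.0, 0.5)
```

If the interval excludes 0, the coefficient is significant at level α.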

Excerpt from Hamilton Table 7.3 (from SPSS)

(STATA reports t and Prob>t; SPSS reports the corresponding Wald and Sig.)

Step 1 | B | S.E. | Wald | df | Sig. | Exp(B)
lived | -,047 | ,07 | 7,550 | 1 | ,006 | ,954
educ | -,06 | ,093 | 4,887 | 1 | ,07 | ,84
contam | ,8 | ,48 | 7,094 | 1 | ,008 | 3,604
hsc | ,48 | ,50 | ,508 | 1 | ,000 | ,3
female | -,05 | ,557 | ,009 | 1 | ,96 | ,950
kids | -,67 | ,566 | ,406 | 1 | ,36 | ,5
nodad | -,6 | ,999 | 4,964 | 1 | ,06 | ,08
Constant | ,894 | ,603 | 3,59 | 1 | ,07 | 8,060

More from Hamilton Table 7.3: the iteration history, showing the -2 log-likelihood and the coefficients (const, lived, educ, contam, hsc, female, kids, nodad) at each iteration, from 09, at step 0 down to 4,049 at the final iteration.

Is the model in table 7.3 better than the model in table 7.2?

LL(model in 7.3) = 4,049/(-2)
LL(model in 7.2) = 49,38/(-2)

χ²_H = -2[LL(model 7.2) - LL(model 7.3)]

Find the χ²_H value, find H, and look up the table of the chi-square distribution.

The model of the probability of observing y = 1 for person i

Pr(y_i = 1) = E[y_i | x] = exp(L_i) / (1 + exp(L_i)) = 1 / (1 + exp(-L_i))

where the logit L_i = β_0 + Σ_{j=1}^{K-1} β_j x_ji is a linear function of the explanatory variables X_i. It is not easy to interpret the meaning of the β coefficients just based on this formula.

The odds ratio

The odds ratio, O, can be interpreted as the relative effect of having one variable value rather than another, e.g. if x_ki = t+1 in L_1i and x_ki = t in L_2i:

O = O_i(Y_i = 1 | L_1i) / O_i(Y_i = 1 | L_2i) = exp[L_1i] / exp[L_2i] = exp[β_k]

Why β_k?

The odds ratio: example I

(Norwegian variable names: Alder = age, Kvinne = woman, E.utd = own education, Barn i HH = children in the household.)

The odds for answering yes = e^(b_0 + b_1*Alder + b_2*Kvinne + b_3*E.utd + b_4*Barn_i_HH)

The odds ratio for answering yes between women and men =

e^(b_0 + b_1*Alder + b_2*1 + b_3*E.utd + b_4*Barn_i_HH) / e^(b_0 + b_1*Alder + b_2*0 + b_3*E.utd + b_4*Barn_i_HH) = e^(b_2)

Remember the rules of power exponents.
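The cancellation can be seen numerically. A sketch with made-up coefficients (the b values below are illustrative, not estimates from the lecture):

```python
import math

# Hypothetical coefficients for: constant, Alder (age), Kvinne (woman),
# E.utd (education), Barn_i_HH (children in household)
b = {"const": -1.2, "alder": 0.03, "kvinne": 0.45, "eutd": 0.10, "barn": -0.20}

def odds_yes(alder, kvinne, eutd, barn):
    """Odds of answering yes: exp(logit)."""
    L = (b["const"] + b["alder"] * alder + b["kvinne"] * kvinne
         + b["eutd"] * eutd + b["barn"] * barn)
    return math.exp(L)

# Odds ratio women vs men: every term except the Kvinne term cancels,
# leaving exp(b["kvinne"]) regardless of the other variable values.
ratio = odds_yes(40, 1, 12, 2) / odds_yes(40, 0, 12, 2)
```

Changing age, education or children in the computation above leaves `ratio` unchanged.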

Conditional Effect Plot

Set all x-variables except x_k to fixed values and enter these into the equation for the logit. Then plot Pr(Y=1) as a function of x_k, i.e.

P = 1/(1 + exp[-L]) = 1/(1 + exp[-konst - b_k x_k])

for all reasonable values of x_k, where konst is the constant obtained by entering into the logit the fixed values of the variables other than x_k.

Excerpt from Hamilton Table 7.4

Variable | B | S.E. | Wald | df | Sig. | Exp(B) | Minimum | Maximum | Mean
lived | -,040 | ,05 | 6,559 | 1 | ,00 | ,96 | ,00 | 8,00 | 9,680
educ | -,97 | ,093 | 4,509 | 1 | ,034 | ,8 | 6,00 | 0,00 | ,954
contam | ,99 | ,477 | 7,43 | 1 | ,006 | 3,664 | ,00 | ,00 | ,80
hsc | ,79 | ,490 | ,59 | 1 | ,000 | 9,763 | ,00 | ,00 | ,307
nodad | -,73 | ,75 | 5,696 | 1 | ,07 | ,77 | ,00 | ,00 | ,699
Constant | ,8 | ,330 | ,69 | 1 | ,0 | 8,866

Logit: L = constant - ,040*lived - ,97*educ + ,99*contam + ,79*hsc - ,73*nodad

Here we let lived vary and set in reasonable values for the other variables.

Conditional effect plot from Hamilton table 7.4 (fig 7.5): effect of living for a long time in town. Three curves of the form y = 1/(1 + exp(-(konst + b*x))) are drawn, with the other variables fixed at their mean, maximum and minimum values.

Conditional effect plot from Hamilton table 7.4 (fig 7.6): effect of pollution on own land, again with the other variables fixed at their mean, maximum and minimum values.
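The curves in such plots can be sketched as follows; konst and b_k below are made-up illustrative values, not the table's estimates:

```python
import math

def conditional_effect(x_k, b_k, konst):
    """P = 1 / (1 + exp(-(konst + b_k * x_k))), with all variables other
    than x_k held fixed inside konst."""
    return 1.0 / (1.0 + math.exp(-(konst + b_k * x_k)))

# Probability curve as years lived in town varies from 0 to 80
curve = [conditional_effect(x, b_k=-0.040, konst=1.5) for x in range(0, 81, 10)]
```

With a negative b_k the predicted probability falls monotonically as x_k grows, which is the S-shaped decline the figures show.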

Coefficients of determination

Logistic regression does not provide measures comparable to the coefficient of determination in OLS regression. Several measures analogous to R² have been proposed; they are often called pseudo R². Hamilton uses Aldrich and Nelson's pseudo R² = χ²/(χ² + n), where χ² is the test statistic for the test of the whole model against a model with just a constant, and n is the number of cases.

Some pseudo R² in SPSS

SPSS reports Cox and Snell and Nagelkerke (in the Model Summary, together with the -2 log-likelihood), and in multinomial logistic regression also McFadden's proposal for R² (in the Pseudo R-Square table). Aldrich and Nelson's pseudo R² = χ²/(χ² + n) can easily be computed by ourselves.
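Computing Aldrich and Nelson's measure by hand is a one-liner:

```python
def aldrich_nelson_pseudo_r2(chi2, n):
    """Aldrich and Nelson's pseudo R-square: chi2 / (chi2 + n), where chi2
    tests the whole model against a constant-only model."""
    return chi2 / (chi2 + n)

# Hypothetical model chi-square of 25.0 with n = 75 cases
pr2 = aldrich_nelson_pseudo_r2(25.0, 75)
```

Note the measure is bounded below 1 by construction, unlike OLS R².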

Statistical problem: linearity of the logit

Curvilinearity of the logit can give biased parameter estimates. A scatter plot of y against x is not informative since y only has the values 0 and 1. To test whether the logit is linear in an x-variable one may do as follows:
- Group the x-variable
- For every group, find the average of y and compute the logit for this value
- Make a graph of the logits against the grouped x

Y = closing school vs. x = years lived in town: the scatter plot of SCHOOLS SHOULD CLOSE (0/1) against YEARS LIVED IN WILLIAMSTOWN is not very informative.

Linearity in logit: example

Recall: Logit = L_i = ln(O_i) = ln{p_i/(1-p_i)}

For SCHOOLS SHOULD CLOSE by YEARS LIVED IN WILLIAMSTOWN (banded), the table shows, for each band, the N for OPEN and CLOSE, the within-group mean of y (= p), and the logit ln(p/(1-p)) of that mean.

Is the logit linear in years lived in town? Plotting the grouped logits ln(p/(1-p)) against the grouped lived variable: maybe!
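The grouping procedure above can be sketched directly (the function name and bin scheme are my own; groups where p is 0 or 1 are dropped because the logit is undefined there):

```python
import math

def grouped_logits(xs, ys, bin_width):
    """Band x, take the mean of the 0/1 outcome y per band, and return the
    logit ln(p / (1 - p)) of each band mean -- a rough linearity check."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x // bin_width, []).append(y)
    out = {}
    for g, vals in sorted(groups.items()):
        p = sum(vals) / len(vals)
        if 0 < p < 1:                      # logit undefined at p = 0 or 1
            out[g * bin_width] = math.log(p / (1 - p))
    return out
```

Plotting the returned logits against the band midpoints shows whether a straight line is plausible.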

In case of curvilinearity the odds ratio is non-constant

Assume the logit is curvilinear in education, containing both E.utd and (E.utd)². Then the odds ratio for answering yes when adding one year of education is:

e^(b_0 + b_a*Alder + b_k*Kvinne + b_utd1*(E.utd+1) + b_utd2*(E.utd+1)²) / e^(b_0 + b_a*Alder + b_k*Kvinne + b_utd1*E.utd + b_utd2*(E.utd)²) = e^(b_utd1 + b_utd2*(2*E.utd + 1))

so the odds ratio depends on the level of E.utd.

Statistical problems: influence

Influence from outliers and unusual x-values is just as problematic in logistic regression as in OLS regression. Transformation of x-variables to symmetry will minimize the influence of extreme variable values. Large residuals are indicators of large influence.

Influence: residuals

There are several ways to standardize residuals:
- Pearson residuals
- Deviance residuals

Measures of influence can be based on:
- The Pearson residual
- The deviance residual
- Leverage (potential for influence), i.e. the statistic h_j

Diagnostic graphs

Outlier plots can be based on plots of the estimated probability of Y_i = 1 (estimated P_i) against Delta* B, ΔB_j, or Delta* Pearson chi-square, Δχ²_P(j), or Delta* deviance chi-square, Δχ²_D(j).

* Delta can be translated as "change in"

SPSS output

Cook's = delta B in Hamilton. The logistic regression analogue of Cook's influence statistic: a measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients.

Leverage Value = h in Hamilton. The relative influence of each observation on the model's fit.

DfBeta(s) is not used by Hamilton in logistic regression. The difference in beta value is the change in the regression coefficient that results from the exclusion of a particular case. A value is computed for each term in the model, including the constant.

[Figure: Delta B, the analogue of Cook's influence statistic, plotted against the predicted probability.]

SPSS output from Save (1)

Unstandardized residuals: the difference between an observed value and the value predicted by the model,

e_i = y_i - π̂_i

Logit residual:

e_i / [π̂_i (1 - π̂_i)]

where π_i is the probability that y_i = 1; the hat means estimated value.

SPSS output from Save (2)

Standardized = Pearson residual. The command Standardized will make SPSS write a variable called ZRE_1, labelled "Normalized residual". This is the same as the Pearson residual in Hamilton.

Studentized = sqrt(delta deviance chi-square). The command Studentized will make SPSS write a variable called SRE_1, labelled "Standardized residual". This is the same as the square root of delta deviance chi-square in Hamilton, i.e. delta deviance chi-square = (SRE_1)².

Deviance = deviance residual. The command Deviance will make SPSS write a variable called DEV_1, labelled "Deviance value". This is the same as the deviance residual in Hamilton.

Computation of Δχ²_P(j)

Based on the quantities provided by SPSS we can compute delta Pearson chi-square. Where the formula says r_j we put in ZRE_1, and where it says h_j we put in LEV_1:

Δχ²_P(j) = r_j² / (1 - h_j)

Computation of Δχ²_D(j)

Based on the quantities provided by SPSS we can compute delta deviance chi-square: we simply square SRE_1,

Δχ²_D(j) = SRE_1 * SRE_1

Alternatively we put d_j = DEV_1 and h_j = LEV_1 into the formula

Δχ²_D(j) = d_j² / (1 - h_j)
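The two Compute steps amount to the following (r_j, d_j and h_j are the saved SPSS variables ZRE_1, DEV_1 and LEV_1):

```python
def delta_pearson_chi2(r_j, h_j):
    """Delta Pearson chi-square: r_j^2 / (1 - h_j); r_j = Pearson residual
    (ZRE_1), h_j = leverage (LEV_1)."""
    return r_j ** 2 / (1.0 - h_j)

def delta_deviance_chi2(d_j, h_j):
    """Delta deviance chi-square: d_j^2 / (1 - h_j); d_j = deviance residual
    (DEV_1)."""
    return d_j ** 2 / (1.0 - h_j)
```

With h_j = 0 both reduce to the squared residual, which is why squaring SRE_1 gives delta deviance chi-square directly.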

[Figure: delta deviance chi-square plotted against the predicted probability, labelled with case numbers; cases 96, 99, 65 and 3 stand out.]

[Figure: the same delta deviance chi-square plot, labelled with delta B values instead of case numbers.]

[Figure: delta Pearson chi-square plotted against the predicted probability, labelled with case numbers; cases 96, 99, 65 and 3 stand out.]

[Figure: the same delta Pearson chi-square plot, labelled with delta B values instead of case numbers.]

Cases with large influence

Variable | Case No 96 | Case No 65 | Case No 99
Y=close | ,00 | ,00 | ,00
lived | 68,00 | 40,00 | ,00
educ | ,00 | ,00 | ,00
contam | ,00 | ,00 | ,00
hsc | ,00 | ,00 | ,00
nodad | ,00 | ,00 | ,00
PRE_1 | ,05 | ,86 | ,97
COO_1 | ,64 | ,34 | ,4
RES_1 | ,95 | -,86 | -,97
SRE_1 | ,46 | -,04 | -,6
ZRE_1 | 4, | -,48 | -5,36
DEV_1 | ,4 | -,98 | -,6
DFB0_1 | -,3 | ,0 | -,36
DFB1_1 | ,0 | ,00 | ,00
DFB2_1 | ,0 | ,0 | ,0
DFB3_1 | -,08 | -,5 | -,8
DFB4_1 | -,06 | -,7 | -,9
DFB5_1 | -,08 | ,6 | ,4
Delta Pearson chi-square | 8,34 | 6,47 | 9,0
Delta deviance chi-square | 6,07 | 4,4 | 6,89

From Cases to Patterns

The figures shown previously are not identical to those you see in Hamilton: Hamilton has corrected for the effect of identical patterns.

Influence from a shared pattern of x-variables

In a logistic regression with few variables, many cases will have the same values on all x-variables. Every combination of x-variable values is called a pattern. When many cases share the same pattern, each case on its own may have a small influence, but collectively they may have an unusually large influence on the parameter estimates. Influential patterns in x-values can give biased parameter estimates.

Influence: patterns in x-values

The predicted value, and hence the residual, will be the same for all cases with the same pattern. Influence from pattern j can be found by means of:
- The frequency of the pattern
- The Pearson residual
- The deviance residual
- Leverage, i.e. the statistic h_j

Finding x-patterns by means of SPSS

In the Data menu, find the command Identify Duplicate Cases. Mark the x-variables that are used in the model and move them to "Define matching cases by". Tick "Sequential count of matching cases in each group" and "Display frequencies for created variables". This produces two new variables. One, MatchSequence, numbers cases sequentially (1, 2, ...) where several patterns are identical; if the pattern is unique this variable has the value 0. The other variable, Primary, has the value 0 for duplicates and 1 for unique patterns.

X-patterns in SPSS; Hamilton p38-4

The resulting frequency tables show the share of duplicate vs primary cases, and the sequential count of matching cases, i.e. how many patterns occur with one case only and how many with two or three cases.

Hamilton table 7.6: Symbols

J — number of unique patterns of x-values in the data (J <= n)
m_j — number of cases with the pattern j (m_j >= 1)
P̂_j — predicted probability of Y=1 for a case with pattern j
Y_j — sum of y-values for cases with pattern j (= number of cases with pattern j and y=1)
r_j — Pearson residual for pattern j
χ²_P — Pearson chi-square statistic
d_j — deviance residual for pattern j
χ²_D — deviance chi-square statistic
h_i — leverage for case i
h_j — leverage for pattern j

New values for Δχ²_P(j) and Δχ²_D(j)

By Compute one may calculate the Pearson residual (equation 7.9 in Hamilton) and delta Pearson chi-square (equation 7.4 in Hamilton) once more; this will provide the correct values. The same applies for the deviance residual (equation 7.) and delta deviance chi-square (equation 7.5a).

Leverage and residuals (1)

The leverage of a pattern is obtained as the number of cases with the pattern times the leverage of a case with this pattern; the leverage of a case is the same as in OLS regression:

h_j = m_j * h_i

The Pearson residual can be found from

r_j = (Y_j - m_j P̂_j) / sqrt( m_j P̂_j (1 - P̂_j) )

Leverage and residuals (2)

The deviance residual can be found from

d_j = ± sqrt( 2 [ Y_j ln( Y_j / (m_j P̂_j) ) + (m_j - Y_j) ln( (m_j - Y_j) / (m_j (1 - P̂_j)) ) ] )
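The two pattern-level residuals can be sketched as follows; the sign convention (the sign of Y_j - m_j*P̂_j) and the handling of the 0·ln(0) limits as 0 are my additions:

```python
import math

def pattern_pearson_residual(Y_j, m_j, p_j):
    """Pearson residual for pattern j: (Y_j - m_j*p_j) / sqrt(m_j*p_j*(1-p_j))."""
    return (Y_j - m_j * p_j) / math.sqrt(m_j * p_j * (1.0 - p_j))

def pattern_deviance_residual(Y_j, m_j, p_j):
    """Deviance residual for pattern j, signed like (Y_j - m_j*p_j)."""
    term = 0.0
    if Y_j > 0:
        term += Y_j * math.log(Y_j / (m_j * p_j))
    if m_j - Y_j > 0:
        term += (m_j - Y_j) * math.log((m_j - Y_j) / (m_j * (1.0 - p_j)))
    sign = 1.0 if Y_j >= m_j * p_j else -1.0
    return sign * math.sqrt(2.0 * term)
```

Both residuals are exactly zero when the observed count Y_j equals the expected count m_j*P̂_j.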

Two chi-square statistics

The Pearson chi-square statistic and the deviance chi-square statistic; the equations are the same for both cases and patterns:

χ²_P = Σ_{j=1}^{J} r_j²

χ²_D = Σ_{j=1}^{J} d_j²

The chi-square statistics

Both chi-square statistics, (1) Pearson chi-square χ²_P and (2) deviance chi-square χ²_D, can be read as a test of the null hypothesis of no difference between the estimated model and a saturated model, that is, a model with as many parameters as there are cases/patterns.

Large values of measures of influence

Measures of influence are based on the change in (Δ) a statistic/parameter value due to excluding all cases with pattern j:
- ΔB_j, delta B, analogue to Cook's D
- Δχ²_P(j), delta Pearson chi-square
- Δχ²_D(j), delta deviance chi-square

What is a large value of Δχ²_P(j) and Δχ²_D(j)?

Both Δχ²_P(j) and Δχ²_D(j) measure how badly the model fits pattern j. Large values indicate that the model would fit the data much better if all cases with this pattern were excluded. Since both measures are distributed asymptotically as the chi-square distribution, values larger than 4 indicate that a pattern affects the estimated parameters significantly.

ΔB_j, delta B

Measures the standardized change in the estimated parameters (b_k) that obtains when all cases with a given pattern j are excluded:

ΔB_j = r_j² h_j / (1 - h_j)²

Larger values mean larger influence; ΔB_j >= 1 must in any case be seen as large influence.

[Figure: delta B plotted against the predicted probability, labelled with case numbers; cases 99, 96, 66, 56 and 65 stand out.]
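Delta B combines the residual and the leverage; a minimal sketch of the formula and the lecture's rule of thumb:

```python
def delta_b(r_j, h_j):
    """Delta B (analogue of Cook's D): r_j^2 * h_j / (1 - h_j)^2."""
    return r_j ** 2 * h_j / (1.0 - h_j) ** 2

# Rule of thumb from the lecture: delta B >= 1 signals large influence
flagged = delta_b(3.0, 0.2) >= 1.0   # hypothetical residual and leverage
```

A pattern can be flagged either through a large residual, a large leverage, or both.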

Δχ²_P(j), delta Pearson chi-square

Measures the reduction in Pearson χ² that obtains from excluding all cases with pattern j:

Δχ²_P(j) = r_j² / (1 - h_j)

[Figure: the pattern-corrected delta Pearson chi-square plotted against the predicted probability, labelled with delta B values.]

Δχ²_D(j), delta deviance chi-square

Measures the change in deviance that obtains from excluding all cases with pattern j:

Δχ²_D(j) = d_j² / (1 - h_j)

This is equivalent to

Δχ²_D(j) = -2[LL_K - LL_K(j)]

where LL_K is the log-likelihood of a model with K parameters estimated on the whole sample, and LL_K(j) is from the estimate of the same model when all cases with pattern j are excluded.

[Figure: the pattern-corrected delta deviance chi-square plotted against the predicted probability, labelled with delta B values.]

Influence of excluded cases/patterns

Logit coefficients for the whole sample vs the model re-estimated without the most influential cases:

Variable | Sample | Excluding case 99 (Δχ²_P = 8,34) | Excluding case 96 (Δχ²_P = 9,0)
lived | -,040 | -,045 | -,05
educ | -,97 | -,4 | -,4
contam | ,99 | ,490 | ,38
hsc | ,79 | ,49 | ,347
nodad | -,73 | -,889 | -,658
Constant | ,8 | ,575 | ,530
2*LL(model) | -4,65 | -35,45 | -36,4

[Figure: conditional effect plots of y = 1/(1 + exp(-(konst + b*x))) redrawn for the three sets of coefficients.]

Conclusions (1)

Ordinary OLS does not work well for dichotomous dependent variables, since it is impossible to obtain normally distributed errors or homoscedasticity, and since the model predicts probabilities outside the interval [0, 1]. The logit model is for theoretical reasons better.
- The likelihood ratio test statistic can be used to test nested models, analogous to the F-statistic.
- In large samples the chi-square distributed Wald statistic [or the normally distributed t = sqrt(Wald)] can test single coefficients and provide confidence intervals.
- There is no statistic similar to the coefficient of determination.

Conclusions (2)

Coefficients of estimated models can be interpreted by:
1. Log-odds (direct interpretation)
2. Odds
3. Odds ratio
4. Probability (conditional effect plot)

Non-linearity, cases with influence, and multicollinearity lead to the same kinds of problems as in OLS regression (inaccurate or uncertain parameter values). Discrimination leads to problems of uncertain parameter values (large variance estimates). Diagnostic work is important.


Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

1. BINARY LOGISTIC REGRESSION

1. BINARY LOGISTIC REGRESSION 1. BINARY LOGISTIC REGRESSION The Model We are modelling two-valued variable Y. Model s scheme Variable Y is the dependent variable, X, Z, W are independent variables (regressors). Typically Y values are

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Count data page 1. Count data. 1. Estimating, testing proportions

Count data page 1. Count data. 1. Estimating, testing proportions Count data page 1 Count data 1. Estimating, testing proportions 100 seeds, 45 germinate. We estimate probability p that a plant will germinate to be 0.45 for this population. Is a 50% germination rate

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Chapter 15 - Multiple Regression

Chapter 15 - Multiple Regression 15.1 Predicting Quality of Life: Chapter 15 - Multiple Regression a. All other variables held constant, a difference of +1 degree in Temperature is associated with a difference of.01 in perceived Quality

More information

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model Logistic Regression In previous lectures, we have seen how to use linear regression analysis when the outcome/response/dependent variable is measured on a continuous scale. In this lecture, we will assume

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS The data used in this example describe teacher and student behavior in 8 classrooms. The variables are: Y percentage of interventions

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Procedia - Social and Behavioral Sciences 109 ( 2014 )

Procedia - Social and Behavioral Sciences 109 ( 2014 ) Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 09 ( 04 ) 730 736 nd World Conference On Business, Economics And Management - WCBEM 03 Categorical Principal

More information

SPSS LAB FILE 1

SPSS LAB FILE  1 SPSS LAB FILE www.mcdtu.wordpress.com 1 www.mcdtu.wordpress.com 2 www.mcdtu.wordpress.com 3 OBJECTIVE 1: Transporation of Data Set to SPSS Editor INPUTS: Files: group1.xlsx, group1.txt PROCEDURE FOLLOWED:

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Ch 6: Multicategory Logit Models

Ch 6: Multicategory Logit Models 293 Ch 6: Multicategory Logit Models Y has J categories, J>2. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. In R, we will fit these models using the

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Chapter 11. Regression with a Binary Dependent Variable

Chapter 11. Regression with a Binary Dependent Variable Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/

More information

2. We care about proportion for categorical variable, but average for numerical one.

2. We care about proportion for categorical variable, but average for numerical one. Probit Model 1. We apply Probit model to Bank data. The dependent variable is deny, a dummy variable equaling one if a mortgage application is denied, and equaling zero if accepted. The key regressor is

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur

Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Lecture 10 Software Implementation in Simple Linear Regression Model using

More information

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017 Poisson Regression Gelman & Hill Chapter 6 February 6, 2017 Military Coups Background: Sub-Sahara Africa has experienced a high proportion of regime changes due to military takeover of governments for

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

Checking model assumptions with regression diagnostics

Checking model assumptions with regression diagnostics @graemeleehickey www.glhickey.com graeme.hickey@liverpool.ac.uk Checking model assumptions with regression diagnostics Graeme L. Hickey University of Liverpool Conflicts of interest None Assistant Editor

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Analysis Mahinda Samarakoon April 6, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 25 Table of contents 1 Building and applying logistic regression models (Chap

More information

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)

More information

Analysing categorical data using logit models

Analysing categorical data using logit models Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

Econometrics 1. Lecture 8: Linear Regression (2) 黄嘉平

Econometrics 1. Lecture 8: Linear Regression (2) 黄嘉平 Econometrics 1 Lecture 8: Linear Regression (2) 黄嘉平 中国经济特区研究中 心讲师 办公室 : 文科楼 1726 E-mail: huangjp@szu.edu.cn Tel: (0755) 2695 0548 Office hour: Mon./Tue. 13:00-14:00 The linear regression model The linear

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

The Logit Model: Estimation, Testing and Interpretation

The Logit Model: Estimation, Testing and Interpretation The Logit Model: Estimation, Testing and Interpretation Herman J. Bierens October 25, 2008 1 Introduction to maximum likelihood estimation 1.1 The likelihood function Consider a random sample Y 1,...,

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Regression Diagnostics Procedures

Regression Diagnostics Procedures Regression Diagnostics Procedures ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X For any fixed value of the independent variable X, the distribution of the

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a

More information