SOS3003 Applied data analysis for social science. Lecture note. Erling Berge, Department of sociology and political science, NTNU.


Literature: Logistic regression II. Hamilton Ch 7, p7-4.

Definitions I

The probability that person no. i has the value 1 on the variable Y_i is written Pr(Y_i = 1). Then Pr(Y_i = 0) = 1 - Pr(Y_i = 1). The odds that person no. i has the value 1 on the variable Y_i, here called O_i, is the ratio between two probabilities:

O_i = Pr(y_i = 1) / Pr(y_i = 0) = p_i / (1 - p_i)

Definitions II

The LOGIT, L_i, for person no. i (corresponding to Pr(Y_i = 1)) is the natural logarithm of the odds, O_i, that person no. i has the value 1 on variable Y_i. This is written:

L_i = ln(O_i) = ln{p_i / (1 - p_i)}

The model assumes that L_i is a linear function of the explanatory variables x_ji, i.e.:

L_i = β_0 + Σ_j β_j x_ji, where j = 1, .., K-1, and i = 1, .., n
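The definitions above are easy to check numerically. A minimal sketch (function names are my own, not from the lecture):

```python
import math

def odds(p):
    """Odds O = p / (1 - p) for a probability p = Pr(Y = 1)."""
    return p / (1.0 - p)

def logit(p):
    """Logit L = ln(O) = ln(p / (1 - p))."""
    return math.log(odds(p))

def inverse_logit(L):
    """Back-transform a logit to a probability: p = 1 / (1 + exp(-L))."""
    return 1.0 / (1.0 + math.exp(-L))
```

For example, p = 0.5 gives odds 1 and logit 0, and `inverse_logit` undoes `logit` exactly.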

Logistic regression: assumptions

- The model is correctly specified:
  - The logit is linear in its parameters
  - All relevant variables are included
  - No irrelevant variables are included
- x-variables are measured without error
- Observations are independent
- No perfect multicollinearity
- No perfect discrimination
- Sufficiently large sample

Assumptions that cannot be tested

- Model specification: all relevant variables are included
- x-variables are measured without error
- Observations are independent

Two will be tested automatically: if the model can be estimated there is no perfect multicollinearity and no perfect discrimination.

LOGISTIC REGRESSION

Statistical problems may be due to:
- Too small a sample
- A high degree of multicollinearity, leading to large standard errors (imprecise estimates). Multicollinearity is discovered and treated in the same way as in OLS regression
- A high degree of discrimination (or separation), leading to large standard errors (imprecise estimates). This will be discovered automatically by SPSS

Assumptions that can be tested

- Model specification: the logit is linear in the parameters; no irrelevant variables are included
- Sufficiently large sample

What constitutes a sufficiently large sample is not always clear. It depends on how the cases are distributed between the 0 and 1 categories: if one of these is too small there will be problems estimating partial effects. It also depends on the number of different patterns in the sample and how cases are distributed across these.

LOGISTIC REGRESSION: TESTING (1)

Testing implies an assessment of whether statistical problems lead to departures from the assumptions. Two tests are useful:
(1) The likelihood ratio test statistic, χ²_H. Can be used analogously to the F-test.
(2) The Wald test. The square root of this can be used analogously to the t-test.

LOGISTIC REGRESSION: TESTING (2)

The likelihood ratio test: a ratio between two likelihoods is equivalent to a difference between two log-likelihoods. The difference between the log-likelihoods (LL) of two nested models, estimated on the same data, can be used to test which of the two models fits the data best, just like the F-statistic is used in OLS regression. The test can also be used for single regression coefficients (single variables). In small samples it has better properties than the Wald statistic.

LOGISTIC REGRESSION: TESTING (3)

The likelihood ratio test statistic

χ²_H = -2[LL(model 1) - LL(model 2)]

will, if the null hypothesis of no difference between the two models is correct, be distributed approximately (for large n) as the chi-square distribution with a number of degrees of freedom equal to the difference in the number of parameters in the two models (H).

Example of a likelihood ratio test

Model 1: just the constant. Model 2: the constant plus one variable.

χ²_H = -2[LL(model 1) - LL(model 2)] = -2LL(model 1) + 2LL(model 2)

Find the value of the chi-square and the number of degrees of freedom, e.g.:

LogLikelihood(mod 1) = 09,/(-2)
LogLikelihood(mod 2) = 95,67/(-2)

From Tab 7.1, the -2 log-likelihood at each iteration: 09,; 95,684; 95,69; 95,67; 95,67.
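A sketch of the test with made-up log-likelihood values (3.84 is the 5% chi-square critical value for H = 1 degree of freedom):

```python
def likelihood_ratio_chi2(ll_restricted, ll_full):
    """Chi-square statistic for two nested models: -2[LL(model 1) - LL(model 2)]."""
    return -2.0 * (ll_restricted - ll_full)

# Hypothetical values: constant-only model vs constant plus one variable
chi2 = likelihood_ratio_chi2(-54.6, -47.8)   # H = 1 extra parameter
significant_at_5pct = chi2 > 3.84            # chi-square critical value, df = 1
```

Here the restricted model is rejected, so the extra variable improves the fit.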

Logistic Regression in SPSS I

Case Processing Summary: unweighted cases, with selected cases included in the analysis, missing cases, unselected cases, and totals (N and percent). Note: if weight is in effect, see the classification table for the total number of cases.

Dependent Variable Encoding: original value OPEN coded as internal value 0, CLOSE as 1.

Logistic Regression in SPSS IIa

Iteration History (Step 0): the -2 log-likelihood and the coefficient for the constant at each iteration. Notes: (a) the constant is included in the model; (b) initial -2 log-likelihood: 09,; (c) estimation terminated at iteration 3 because the parameter estimates changed by less than ,00.

Logistic Regression in SPSS IIb

Classification Table (Step 0): observed vs predicted SCHOOLS SHOULD CLOSE (OPEN/CLOSE), percentage correct, and overall percentage. Notes: (a) the constant is included in the model; (b) the cut value is ,500.

Variables in the Equation (Step 0): B, S.E., Wald, df, Sig. and Exp(B) for the constant.

Variables not in the Equation (Step 0): score, df and Sig. for the variable lived, plus overall statistics.

Logistic Regression in SPSS IIIa

Iteration History for the model with lived entered (method: Enter): the -2 log-likelihood and the coefficients (constant, lived) at each iteration. Estimation terminated at iteration 4 because the parameter estimates changed by less than ,00.

Logistic Regression in SPSS IIIb

Omnibus Tests of Model Coefficients (Step 1): chi-square, df and Sig. for step, block and model.

Model Summary (Step 1): -2 log-likelihood, Cox & Snell R Square and Nagelkerke R Square. Estimation terminated at iteration 4 because the parameter estimates changed by less than ,00.

Logistic Regression in SPSS IIIc

Classification Table (Step 1): observed vs predicted SCHOOLS SHOULD CLOSE (OPEN/CLOSE) and percentage correct; the cut value is ,500.

Variables in the Equation (Step 1): B, S.E., Wald, df, Sig. and Exp(B) for lived and the constant. Variable(s) entered on step 1: lived.

LOGISTIC REGRESSION: TESTING (4)

The Wald test: the Wald (chi-square) test statistic provided by SPSS equals t² = (b_k / SE(b_k))² (where t is the t used by Hamilton). It can be used for testing single parameters, similarly to the t-statistic of OLS regression. Check out Hamilton table 7.2: find t², see Wald above.
- If the null hypothesis is correct, t will (for large n) in logistic regression be approximately normally distributed.
- If the null hypothesis is correct, the Wald statistic will (for large n) in logistic regression be approximately chi-square distributed with df = 1.

Excerpt from Hamilton Table 7.2

Iteration history, -2 log-likelihood: 09,; 5,534; 49,466; 49,38; 49,38; 49,38

Variables | B | S.E. | Wald | df | Sig. | Exp(B)
Lived | -,046 | ,05 | 9,698 | 1 | ,00 | ,955
Educ | -,66 | ,090 | 3,404 | 1 | ,065 | ,847
Contam | ,08 | ,465 | 6,739 | 1 | ,009 | 3,347
Hsc | ,73 | ,464 | ,99 | 1 | ,000 | 8,784
Constant | ,73 | ,30 | ,768 | 1 | ,84 | 5,649
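The Wald statistic is simple to compute by hand from a coefficient and its standard error; a sketch with made-up values (not the table's, whose digits are partly unreadable in this transcription):

```python
import math

def wald_statistic(b, se):
    """Wald chi-square for a single coefficient: (b / SE(b))^2."""
    t = b / se
    return t * t

# Hypothetical coefficient and standard error
w = wald_statistic(-0.046, 0.015)
t = math.sqrt(w)   # the t used by Hamilton is the square root of Wald
```

Compare w against the chi-square distribution with df = 1, or t against the standard normal.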

Confidence interval for parameter estimates

Can be constructed based on the fact that the square root of the Wald statistic (chi-square with 1 degree of freedom) approximately follows a normal distribution:

b_k - t_α * SE(b_k) < β_k < b_k + t_α * SE(b_k)

where t_α is a value taken from the table of the normal distribution with level of significance equal to α.

Can be constructed based on the t-distribution

If a table of the normal distribution is missing one may use the t-distribution, since the t-distribution is approximately normally distributed for large n-k (e.g. for n-k > 0).
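A sketch of the interval for α = 0.05, where 1.96 is the familiar normal-table value:

```python
def wald_ci(b, se, z=1.96):
    """Approximate confidence interval b ± z * SE(b); z = 1.96 gives 95%."""
    return (b - z * se, b + z * se)

# Hypothetical coefficient: b = 1.0 with SE = 0.5
low, high = wald_ci(1.0, 0.5)
```

If the interval excludes 0, the coefficient is significant at level α.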

Excerpt from Hamilton Table 7.3 (from SPSS)

(STATA reports t and Prob>t; SPSS reports the corresponding Wald and Sig.)

Step 1 | B | S.E. | Wald | df | Sig. | Exp(B)
lived | -,047 | ,07 | 7,550 | 1 | ,006 | ,954
educ | -,06 | ,093 | 4,887 | 1 | ,07 | ,84
contam | ,8 | ,48 | 7,094 | 1 | ,008 | 3,604
hsc | ,48 | ,50 | ,508 | 1 | ,000 | ,3
female | -,05 | ,557 | ,009 | 1 | ,96 | ,950
kids | -,67 | ,566 | ,406 | 1 | ,36 | ,5
nodad | -,6 | ,999 | 4,964 | 1 | ,06 | ,08
Constant | ,894 | ,603 | 3,59 | 1 | ,07 | 8,060

More from Hamilton Table 7.3: the iteration history, showing the -2 log-likelihood and the coefficients (const, lived, educ, contam, hsc, female, kids, nodad) at each iteration, from 09, at step 0 down to 4,049 at the final iteration.

Is the model in table 7.3 better than the model in table 7.2?

LL(model in 7.3) = 4,049/(-2)
LL(model in 7.2) = 49,38/(-2)

χ²_H = -2[LL(model 7.2) - LL(model 7.3)]

Find the χ²_H value, find H, and look up the table of the chi-square distribution.

The model of the probability of observing y = 1 for person i

Pr(y_i = 1) = E[y_i | x] = exp(L_i) / (1 + exp(L_i)) = 1 / (1 + exp(-L_i))

where the logit L_i = β_0 + Σ_{j=1}^{K-1} β_j x_ji is a linear function of the explanatory variables X_i. It is not easy to interpret the meaning of the β coefficients just based on this formula.

The odds ratio

The odds ratio, O, can be interpreted as the relative effect of having one variable value rather than another, e.g. if x_ki = t+1 in L_1i and x_ki = t in L_2i:

O = O_i(Y_i = 1 | L_1i) / O_i(Y_i = 1 | L_2i) = exp[L_1i] / exp[L_2i] = exp[β_k]

Why β_k?

The odds ratio: example I

(Norwegian variable names: Alder = age, Kvinne = woman, E.utd = own education, Barn i HH = children in the household.)

The odds for answering yes = e^(b_0 + b_1*Alder + b_2*Kvinne + b_3*E.utd + b_4*Barn_i_HH)

The odds ratio for answering yes between women and men =

e^(b_0 + b_1*Alder + b_2*1 + b_3*E.utd + b_4*Barn_i_HH) / e^(b_0 + b_1*Alder + b_2*0 + b_3*E.utd + b_4*Barn_i_HH) = e^(b_2)

Remember the rules of power exponents.
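The cancellation can be seen numerically. A sketch with made-up coefficients (the b values below are illustrative, not estimates from the lecture):

```python
import math

# Hypothetical coefficients for: constant, Alder (age), Kvinne (woman),
# E.utd (education), Barn_i_HH (children in household)
b = {"const": -1.2, "alder": 0.03, "kvinne": 0.45, "eutd": 0.10, "barn": -0.20}

def odds_yes(alder, kvinne, eutd, barn):
    """Odds of answering yes: exp(logit)."""
    L = (b["const"] + b["alder"] * alder + b["kvinne"] * kvinne
         + b["eutd"] * eutd + b["barn"] * barn)
    return math.exp(L)

# Odds ratio women vs men: every term except the Kvinne term cancels,
# leaving exp(b["kvinne"]) regardless of the other variable values.
ratio = odds_yes(40, 1, 12, 2) / odds_yes(40, 0, 12, 2)
```

Changing age, education or children in the computation above leaves `ratio` unchanged.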

Conditional Effect Plot

Set all x-variables except x_k to fixed values and enter these into the equation for the logit. Then plot Pr(Y=1) as a function of x_k, i.e.

P = 1/(1 + exp[-L]) = 1/(1 + exp[-konst - b_k x_k])

for all reasonable values of x_k, where konst is the constant obtained by entering into the logit the fixed values of the variables other than x_k.

Excerpt from Hamilton Table 7.4

Variable | B | S.E. | Wald | df | Sig. | Exp(B) | Minimum | Maximum | Mean
lived | -,040 | ,05 | 6,559 | 1 | ,00 | ,96 | ,00 | 8,00 | 9,680
educ | -,97 | ,093 | 4,509 | 1 | ,034 | ,8 | 6,00 | 0,00 | ,954
contam | ,99 | ,477 | 7,43 | 1 | ,006 | 3,664 | ,00 | ,00 | ,80
hsc | ,79 | ,490 | ,59 | 1 | ,000 | 9,763 | ,00 | ,00 | ,307
nodad | -,73 | ,75 | 5,696 | 1 | ,07 | ,77 | ,00 | ,00 | ,699
Constant | ,8 | ,330 | ,69 | 1 | ,0 | 8,866

Logit: L = constant - ,040*lived - ,97*educ + ,99*contam + ,79*hsc - ,73*nodad

Here we let lived vary and set in reasonable values for the other variables.

Conditional effect plot from Hamilton table 7.4 (fig 7.5): effect of living for a long time in town. Three curves of the form y = 1/(1 + exp(-(konst + b*x))) are drawn, with the other variables fixed at their mean, maximum and minimum values.

Conditional effect plot from Hamilton table 7.4 (fig 7.6): effect of pollution on own land, again with the other variables fixed at their mean, maximum and minimum values.
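The curves in such plots can be sketched as follows; konst and b_k below are made-up illustrative values, not the table's estimates:

```python
import math

def conditional_effect(x_k, b_k, konst):
    """P = 1 / (1 + exp(-(konst + b_k * x_k))), with all variables other
    than x_k held fixed inside konst."""
    return 1.0 / (1.0 + math.exp(-(konst + b_k * x_k)))

# Probability curve as years lived in town varies from 0 to 80
curve = [conditional_effect(x, b_k=-0.040, konst=1.5) for x in range(0, 81, 10)]
```

With a negative b_k the predicted probability falls monotonically as x_k grows, which is the S-shaped decline the figures show.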

Coefficients of determination

Logistic regression does not provide measures comparable to the coefficient of determination in OLS regression. Several measures analogous to R² have been proposed; they are often called pseudo R². Hamilton uses Aldrich and Nelson's pseudo R² = χ²/(χ² + n), where χ² is the test statistic for the test of the whole model against a model with just a constant, and n is the number of cases.

Some pseudo R² in SPSS

SPSS reports Cox and Snell and Nagelkerke (in the Model Summary, together with the -2 log-likelihood), and in multinomial logistic regression also McFadden's proposal for R² (in the Pseudo R-Square table). Aldrich and Nelson's pseudo R² = χ²/(χ² + n) can easily be computed by ourselves.
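Computing Aldrich and Nelson's measure by hand is a one-liner:

```python
def aldrich_nelson_pseudo_r2(chi2, n):
    """Aldrich and Nelson's pseudo R-square: chi2 / (chi2 + n), where chi2
    tests the whole model against a constant-only model."""
    return chi2 / (chi2 + n)

# Hypothetical model chi-square of 25.0 with n = 75 cases
pr2 = aldrich_nelson_pseudo_r2(25.0, 75)
```

Note the measure is bounded below 1 by construction, unlike OLS R².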

Statistical problem: linearity of the logit

Curvilinearity of the logit can give biased parameter estimates. A scatter plot of y against x is not informative since y only has the values 0 and 1. To test whether the logit is linear in an x-variable one may do as follows:
- Group the x-variable
- For every group, find the average of y and compute the logit for this value
- Make a graph of the logits against the grouped x

Y = closing school vs. x = years lived in town: the scatter plot of SCHOOLS SHOULD CLOSE (0/1) against YEARS LIVED IN WILLIAMSTOWN is not very informative.

Linearity in logit: example

Recall: Logit = L_i = ln(O_i) = ln{p_i/(1-p_i)}

For SCHOOLS SHOULD CLOSE by YEARS LIVED IN WILLIAMSTOWN (banded), the table shows, for each band, the N for OPEN and CLOSE, the within-group mean of y (= p), and the logit ln(p/(1-p)) of that mean.

Is the logit linear in years lived in town? Plotting the grouped logits ln(p/(1-p)) against the grouped lived variable: maybe!
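The grouping procedure above can be sketched directly (the function name and bin scheme are my own; groups where p is 0 or 1 are dropped because the logit is undefined there):

```python
import math

def grouped_logits(xs, ys, bin_width):
    """Band x, take the mean of the 0/1 outcome y per band, and return the
    logit ln(p / (1 - p)) of each band mean -- a rough linearity check."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x // bin_width, []).append(y)
    out = {}
    for g, vals in sorted(groups.items()):
        p = sum(vals) / len(vals)
        if 0 < p < 1:                      # logit undefined at p = 0 or 1
            out[g * bin_width] = math.log(p / (1 - p))
    return out
```

Plotting the returned logits against the band midpoints shows whether a straight line is plausible.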

In case of curvilinearity the odds ratio is non-constant

Assume the logit is curvilinear in education, containing both E.utd and (E.utd)². Then the odds ratio for answering yes when adding one year of education is:

e^(b_0 + b_a*Alder + b_k*Kvinne + b_utd1*(E.utd+1) + b_utd2*(E.utd+1)²) / e^(b_0 + b_a*Alder + b_k*Kvinne + b_utd1*E.utd + b_utd2*(E.utd)²) = e^(b_utd1 + b_utd2*(2*E.utd + 1))

so the odds ratio depends on the level of E.utd.

Statistical problems: influence

Influence from outliers and unusual x-values is just as problematic in logistic regression as in OLS regression. Transformation of x-variables to symmetry will minimize the influence of extreme variable values. Large residuals are indicators of large influence.

Influence: residuals

There are several ways to standardize residuals:
- Pearson residuals
- Deviance residuals

Measures of influence can be based on:
- The Pearson residual
- The deviance residual
- Leverage (potential for influence), i.e. the statistic h_j

Diagnostic graphs

Outlier plots can be based on plots of the estimated probability of Y_i = 1 (estimated P_i) against Delta* B, ΔB_j, or Delta* Pearson chi-square, Δχ²_P(j), or Delta* deviance chi-square, Δχ²_D(j).

* Delta can be translated as "change in"

SPSS output

Cook's = delta B in Hamilton. The logistic regression analogue of Cook's influence statistic: a measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients.

Leverage Value = h in Hamilton. The relative influence of each observation on the model's fit.

DfBeta(s) is not used by Hamilton in logistic regression. The difference in beta value is the change in the regression coefficient that results from the exclusion of a particular case. A value is computed for each term in the model, including the constant.

[Figure: Delta B, the analogue of Cook's influence statistic, plotted against the predicted probability.]

SPSS output from Save (1)

Unstandardized residuals: the difference between an observed value and the value predicted by the model,

e_i = y_i - π̂_i

Logit residual:

e_i / [π̂_i (1 - π̂_i)]

where π_i is the probability that y_i = 1; the hat means estimated value.

SPSS output from Save (2)

Standardized = Pearson residual. The command Standardized will make SPSS write a variable called ZRE_1, labelled "Normalized residual". This is the same as the Pearson residual in Hamilton.

Studentized = sqrt(delta deviance chi-square). The command Studentized will make SPSS write a variable called SRE_1, labelled "Standardized residual". This is the same as the square root of delta deviance chi-square in Hamilton, i.e. delta deviance chi-square = (SRE_1)².

Deviance = deviance residual. The command Deviance will make SPSS write a variable called DEV_1, labelled "Deviance value". This is the same as the deviance residual in Hamilton.

Computation of Δχ²_P(j)

Based on the quantities provided by SPSS we can compute delta Pearson chi-square. Where the formula says r_j we put in ZRE_1, and where it says h_j we put in LEV_1:

Δχ²_P(j) = r_j² / (1 - h_j)

Computation of Δχ²_D(j)

Based on the quantities provided by SPSS we can compute delta deviance chi-square: we simply square SRE_1,

Δχ²_D(j) = SRE_1 * SRE_1

Alternatively we put d_j = DEV_1 and h_j = LEV_1 into the formula

Δχ²_D(j) = d_j² / (1 - h_j)
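The two Compute steps amount to the following (r_j, d_j and h_j are the saved SPSS variables ZRE_1, DEV_1 and LEV_1):

```python
def delta_pearson_chi2(r_j, h_j):
    """Delta Pearson chi-square: r_j^2 / (1 - h_j); r_j = Pearson residual
    (ZRE_1), h_j = leverage (LEV_1)."""
    return r_j ** 2 / (1.0 - h_j)

def delta_deviance_chi2(d_j, h_j):
    """Delta deviance chi-square: d_j^2 / (1 - h_j); d_j = deviance residual
    (DEV_1)."""
    return d_j ** 2 / (1.0 - h_j)
```

With h_j = 0 both reduce to the squared residual, which is why squaring SRE_1 gives delta deviance chi-square directly.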

[Figure: delta deviance chi-square plotted against the predicted probability, labelled with case numbers; cases 96, 99, 65 and 3 stand out.]

[Figure: the same delta deviance chi-square plot, labelled with delta B values instead of case numbers.]

[Figure: delta Pearson chi-square plotted against the predicted probability, labelled with case numbers; cases 96, 99, 65 and 3 stand out.]

[Figure: the same delta Pearson chi-square plot, labelled with delta B values instead of case numbers.]

Cases with large influence

Variable | Case No 96 | Case No 65 | Case No 99
Y=close | ,00 | ,00 | ,00
lived | 68,00 | 40,00 | ,00
educ | ,00 | ,00 | ,00
contam | ,00 | ,00 | ,00
hsc | ,00 | ,00 | ,00
nodad | ,00 | ,00 | ,00
PRE_1 | ,05 | ,86 | ,97
COO_1 | ,64 | ,34 | ,4
RES_1 | ,95 | -,86 | -,97
SRE_1 | ,46 | -,04 | -,6
ZRE_1 | 4, | -,48 | -5,36
DEV_1 | ,4 | -,98 | -,6
DFB0_1 | -,3 | ,0 | -,36
DFB1_1 | ,0 | ,00 | ,00
DFB2_1 | ,0 | ,0 | ,0
DFB3_1 | -,08 | -,5 | -,8
DFB4_1 | -,06 | -,7 | -,9
DFB5_1 | -,08 | ,6 | ,4
Delta Pearson chi-square | 8,34 | 6,47 | 9,0
Delta deviance chi-square | 6,07 | 4,4 | 6,89

From Cases to Patterns

The figures shown previously are not identical to those you see in Hamilton: Hamilton has corrected for the effect of identical patterns.

Influence from a shared pattern of x-variables

In a logistic regression with few variables, many cases will have the same values on all x-variables. Every combination of x-variable values is called a pattern. When many cases share the same pattern, each case on its own may have a small influence, but collectively they may have an unusually large influence on the parameter estimates. Influential patterns in x-values can give biased parameter estimates.

Influence: patterns in x-values

The predicted value, and hence the residual, will be the same for all cases with the same pattern. Influence from pattern j can be found by means of:
- The frequency of the pattern
- The Pearson residual
- The deviance residual
- Leverage, i.e. the statistic h_j

Finding x-patterns by means of SPSS

In the Data menu, find the command Identify Duplicate Cases. Mark the x-variables that are used in the model and move them to "Define matching cases by". Tick "Sequential count of matching cases in each group" and "Display frequencies for created variables". This produces two new variables. One, MatchSequence, numbers cases sequentially (1, 2, ...) where several patterns are identical; if the pattern is unique this variable has the value 0. The other variable, Primary, has the value 0 for duplicates and 1 for unique patterns.

X-patterns in SPSS; Hamilton p38-4

The resulting frequency tables show the share of duplicate vs primary cases, and the sequential count of matching cases, i.e. how many patterns occur with one case only and how many with two or three cases.

Hamilton table 7.6: Symbols

J — number of unique patterns of x-values in the data (J <= n)
m_j — number of cases with the pattern j (m_j >= 1)
P̂_j — predicted probability of Y=1 for a case with pattern j
Y_j — sum of y-values for cases with pattern j (= number of cases with pattern j and y=1)
r_j — Pearson residual for pattern j
χ²_P — Pearson chi-square statistic
d_j — deviance residual for pattern j
χ²_D — deviance chi-square statistic
h_i — leverage for case i
h_j — leverage for pattern j

New values for Δχ²_P(j) and Δχ²_D(j)

By Compute one may calculate the Pearson residual (equation 7.9 in Hamilton) and delta Pearson chi-square (equation 7.4 in Hamilton) once more; this will provide the correct values. The same applies for the deviance residual (equation 7.) and delta deviance chi-square (equation 7.5a).

Leverage and residuals (1)

The leverage of a pattern is obtained as the number of cases with the pattern times the leverage of a case with this pattern; the leverage of a case is the same as in OLS regression:

h_j = m_j * h_i

The Pearson residual can be found from

r_j = (Y_j - m_j P̂_j) / sqrt( m_j P̂_j (1 - P̂_j) )

Leverage and residuals (2)

The deviance residual can be found from

d_j = ± sqrt( 2 [ Y_j ln( Y_j / (m_j P̂_j) ) + (m_j - Y_j) ln( (m_j - Y_j) / (m_j (1 - P̂_j)) ) ] )
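The two pattern-level residuals can be sketched as follows; the sign convention (the sign of Y_j - m_j*P̂_j) and the handling of the 0·ln(0) limits as 0 are my additions:

```python
import math

def pattern_pearson_residual(Y_j, m_j, p_j):
    """Pearson residual for pattern j: (Y_j - m_j*p_j) / sqrt(m_j*p_j*(1-p_j))."""
    return (Y_j - m_j * p_j) / math.sqrt(m_j * p_j * (1.0 - p_j))

def pattern_deviance_residual(Y_j, m_j, p_j):
    """Deviance residual for pattern j, signed like (Y_j - m_j*p_j)."""
    term = 0.0
    if Y_j > 0:
        term += Y_j * math.log(Y_j / (m_j * p_j))
    if m_j - Y_j > 0:
        term += (m_j - Y_j) * math.log((m_j - Y_j) / (m_j * (1.0 - p_j)))
    sign = 1.0 if Y_j >= m_j * p_j else -1.0
    return sign * math.sqrt(2.0 * term)
```

Both residuals are exactly zero when the observed count Y_j equals the expected count m_j*P̂_j.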

Two chi-square statistics

The Pearson chi-square statistic and the deviance chi-square statistic; the equations are the same for both cases and patterns:

χ²_P = Σ_{j=1}^{J} r_j²

χ²_D = Σ_{j=1}^{J} d_j²

The chi-square statistics

Both chi-square statistics, (1) Pearson chi-square χ²_P and (2) deviance chi-square χ²_D, can be read as a test of the null hypothesis of no difference between the estimated model and a saturated model, that is, a model with as many parameters as there are cases/patterns.

Large values of measures of influence

Measures of influence are based on the change in (Δ) a statistic/parameter value due to excluding all cases with pattern j:
- ΔB_j, delta B, analogue to Cook's D
- Δχ²_P(j), delta Pearson chi-square
- Δχ²_D(j), delta deviance chi-square

What is a large value of Δχ²_P(j) and Δχ²_D(j)?

Both Δχ²_P(j) and Δχ²_D(j) measure how badly the model fits pattern j. Large values indicate that the model would fit the data much better if all cases with this pattern were excluded. Since both measures are distributed asymptotically as the chi-square distribution, values larger than 4 indicate that a pattern affects the estimated parameters significantly.

ΔB_j, delta B

Measures the standardized change in the estimated parameters (b_k) that obtains when all cases with a given pattern j are excluded:

ΔB_j = r_j² h_j / (1 - h_j)²

Larger values mean larger influence; ΔB_j >= 1 must in any case be seen as large influence.

[Figure: delta B plotted against the predicted probability, labelled with case numbers; cases 99, 96, 66, 56 and 65 stand out.]
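Delta B combines the residual and the leverage; a minimal sketch of the formula and the lecture's rule of thumb:

```python
def delta_b(r_j, h_j):
    """Delta B (analogue of Cook's D): r_j^2 * h_j / (1 - h_j)^2."""
    return r_j ** 2 * h_j / (1.0 - h_j) ** 2

# Rule of thumb from the lecture: delta B >= 1 signals large influence
flagged = delta_b(3.0, 0.2) >= 1.0   # hypothetical residual and leverage
```

A pattern can be flagged either through a large residual, a large leverage, or both.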

Δχ²_P(j), delta Pearson chi-square

Measures the reduction in Pearson χ² that obtains from excluding all cases with pattern j:

Δχ²_P(j) = r_j² / (1 - h_j)

[Figure: the pattern-corrected delta Pearson chi-square plotted against the predicted probability, labelled with delta B values.]

Δχ²_D(j), delta deviance chi-square

Measures the change in deviance that obtains from excluding all cases with pattern j:

Δχ²_D(j) = d_j² / (1 - h_j)

This is equivalent to

Δχ²_D(j) = -2[LL_K - LL_K(j)]

where LL_K is the log-likelihood of a model with K parameters estimated on the whole sample, and LL_K(j) is from the estimate of the same model when all cases with pattern j are excluded.

[Figure: the pattern-corrected delta deviance chi-square plotted against the predicted probability, labelled with delta B values.]

Influence of excluded cases/patterns

Logit coefficients for the whole sample vs the model re-estimated without the most influential cases:

Variable | Sample | Excluding case 99 (Δχ²_P = 8,34) | Excluding case 96 (Δχ²_P = 9,0)
lived | -,040 | -,045 | -,05
educ | -,97 | -,4 | -,4
contam | ,99 | ,490 | ,38
hsc | ,79 | ,49 | ,347
nodad | -,73 | -,889 | -,658
Constant | ,8 | ,575 | ,530
2*LL(model) | -4,65 | -35,45 | -36,4

[Figure: conditional effect plots of y = 1/(1 + exp(-(konst + b*x))) redrawn for the three sets of coefficients.]

Conclusions (1)

Ordinary OLS does not work well for dichotomous dependent variables, since it is impossible to obtain normally distributed errors or homoscedasticity, and since the model predicts probabilities outside the interval [0, 1]. The logit model is for theoretical reasons better.
- The likelihood ratio test statistic can be used to test nested models, analogous to the F-statistic.
- In large samples the chi-square distributed Wald statistic [or the normally distributed t = sqrt(Wald)] can test single coefficients and provide confidence intervals.
- There is no statistic similar to the coefficient of determination.

Conclusions (2)

Coefficients of estimated models can be interpreted by:
1. Log-odds (direct interpretation)
2. Odds
3. Odds ratio
4. Probability (conditional effect plot)

Non-linearity, cases with influence, and multicollinearity lead to the same kinds of problems as in OLS regression (inaccurate or uncertain parameter values). Discrimination leads to problems of uncertain parameter values (large variance estimates). Diagnostic work is important.


Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

1. BINARY LOGISTIC REGRESSION

1. BINARY LOGISTIC REGRESSION 1. BINARY LOGISTIC REGRESSION The Model We are modelling two-valued variable Y. Model s scheme Variable Y is the dependent variable, X, Z, W are independent variables (regressors). Typically Y values are

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Count data page 1. Count data. 1. Estimating, testing proportions

Count data page 1. Count data. 1. Estimating, testing proportions Count data page 1 Count data 1. Estimating, testing proportions 100 seeds, 45 germinate. We estimate probability p that a plant will germinate to be 0.45 for this population. Is a 50% germination rate

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Chapter 15 - Multiple Regression

Chapter 15 - Multiple Regression 15.1 Predicting Quality of Life: Chapter 15 - Multiple Regression a. All other variables held constant, a difference of +1 degree in Temperature is associated with a difference of.01 in perceived Quality

More information

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model Logistic Regression In previous lectures, we have seen how to use linear regression analysis when the outcome/response/dependent variable is measured on a continuous scale. In this lecture, we will assume

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS The data used in this example describe teacher and student behavior in 8 classrooms. The variables are: Y percentage of interventions

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Procedia - Social and Behavioral Sciences 109 ( 2014 )

Procedia - Social and Behavioral Sciences 109 ( 2014 ) Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 09 ( 04 ) 730 736 nd World Conference On Business, Economics And Management - WCBEM 03 Categorical Principal

More information

SPSS LAB FILE 1

SPSS LAB FILE  1 SPSS LAB FILE www.mcdtu.wordpress.com 1 www.mcdtu.wordpress.com 2 www.mcdtu.wordpress.com 3 OBJECTIVE 1: Transporation of Data Set to SPSS Editor INPUTS: Files: group1.xlsx, group1.txt PROCEDURE FOLLOWED:

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Ch 6: Multicategory Logit Models

Ch 6: Multicategory Logit Models 293 Ch 6: Multicategory Logit Models Y has J categories, J>2. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. In R, we will fit these models using the

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Chapter 11. Regression with a Binary Dependent Variable

Chapter 11. Regression with a Binary Dependent Variable Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/

More information

2. We care about proportion for categorical variable, but average for numerical one.

2. We care about proportion for categorical variable, but average for numerical one. Probit Model 1. We apply Probit model to Bank data. The dependent variable is deny, a dummy variable equaling one if a mortgage application is denied, and equaling zero if accepted. The key regressor is

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur

Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Lecture 10 Software Implementation in Simple Linear Regression Model using

More information

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017 Poisson Regression Gelman & Hill Chapter 6 February 6, 2017 Military Coups Background: Sub-Sahara Africa has experienced a high proportion of regime changes due to military takeover of governments for

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

Checking model assumptions with regression diagnostics

Checking model assumptions with regression diagnostics @graemeleehickey www.glhickey.com graeme.hickey@liverpool.ac.uk Checking model assumptions with regression diagnostics Graeme L. Hickey University of Liverpool Conflicts of interest None Assistant Editor

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Analysis Mahinda Samarakoon April 6, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 25 Table of contents 1 Building and applying logistic regression models (Chap

More information

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)

More information

Analysing categorical data using logit models

Analysing categorical data using logit models Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

Econometrics 1. Lecture 8: Linear Regression (2) 黄嘉平

Econometrics 1. Lecture 8: Linear Regression (2) 黄嘉平 Econometrics 1 Lecture 8: Linear Regression (2) 黄嘉平 中国经济特区研究中 心讲师 办公室 : 文科楼 1726 E-mail: huangjp@szu.edu.cn Tel: (0755) 2695 0548 Office hour: Mon./Tue. 13:00-14:00 The linear regression model The linear

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

The Logit Model: Estimation, Testing and Interpretation

The Logit Model: Estimation, Testing and Interpretation The Logit Model: Estimation, Testing and Interpretation Herman J. Bierens October 25, 2008 1 Introduction to maximum likelihood estimation 1.1 The likelihood function Consider a random sample Y 1,...,

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Regression Diagnostics Procedures

Regression Diagnostics Procedures Regression Diagnostics Procedures ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X For any fixed value of the independent variable X, the distribution of the

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a

More information