A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS

RENATO MICELI
UNIVERSITY OF TORINO

TPM Vol. 14, No. 2, 83-98, Summer 2007, © 2007 Cises

After a brief presentation of the main extensions of the classical coefficient of determination ($R^2$), a new index is proposed that can be used with logistic regression models for ungrouped data. This index is a direct extension of the classical coefficient of determination for linear models (identity link function and normally distributed errors), and the two share the same properties. The performance of the indexes (including the one proposed here) is compared by means of simulated data.

Key words: Model fit; Coefficient of determination; Logistic regression models; Generalized Linear Models; Log likelihood.

Correspondence concerning this article should be addressed to Renato Miceli, Dipartimento di Psicologia, Università degli Studi di Torino, Via Verdi 10, 10124 Torino (TO), Italy. E-mail: miceli@psych.unito.it

INTRODUCTION

A large number of research studies in psychology apply models with categorical and limited dependent variables in statistical analysis. Such models usually belong to the large family of Generalized Linear Models (GLM) (McCullagh & Nelder, 1983; Nelder & Wedderburn, 1972). When data are gathered with non-experimental research methods (as in many studies using logistic regression models), assessing goodness of fit is problematic because there is no summary measure that can be interpreted as easily as the coefficient of determination in classical linear regression models.

The coefficient of determination ($R^2$) in classical linear models (identity link function and normally distributed errors) is widely used as a goodness-of-fit measure because of its interesting properties (Rao, 1973): (i) it ranges between 0 and 1 (the better the fit, the closer $R^2$ is to 1, which is reached when the model perfectly reproduces the observed data); (ii) it is dimensionless, i.e., it is independent of the unit of measurement used for the variables; (iii) it is independent of sample size ($N$); (iv) it can be immediately and easily interpreted, in that it can be expressed as the proportion of the deviance explained by the model with respect to the total deviance to be explained.

In classical linear models, the parameters ($\hat{\theta}_0, \hat{\theta}_1, \ldots, \hat{\theta}_K$) can be estimated by the Ordinary Least Squares (OLS) criterion, and $R^2$ can be expressed as the ratio between explained deviance and deviance to explain ($N$ observations and $K$ variables):

$$R^2 = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{1}$$

where $\hat{y}_i = \hat{\theta}_0 + \sum_{k=1}^{K} \hat{\theta}_k x_{ik}$ and $\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$.

Numerous suggestions have been made for generalizing $R^2$ to models other than the classical linear one, even when deviance has to be replaced by the more general concept of variability and the parameters are Maximum Likelihood (ML) estimates. Efforts were primarily made to extend $R^2$ to discrete models, in particular to logistic regression models for ungrouped data (Aldrich & Nelson, 1984; Cox & Snell, 1989; Maddala, 1983; Magee, 1990; Nagelkerke, 1991).

The index (here referred to as $R^2_{CS}$) originally suggested by Maddala (1983), and subsequently by Cox and Snell (1989) and Magee (1990), can be expressed as:

$$R^2_{CS} = 1 - \left( \frac{L_0}{L_1} \right)^{2/N} \tag{2}$$

where $N$ is the sample size, and $L_1$ and $L_0$ denote the likelihoods of the fitted and the null (intercept only) model, respectively.

The index (here referred to as $R^2_A$) proposed by Aldrich and Nelson (1984) can be expressed as:

$$R^2_A = \frac{c}{N + c}$$

where $c = 2 \log \frac{L_1}{L_0}$ is generally referred to as the likelihood ratio.

Even if both indexes present interesting aspects, they do not have property (i). In both cases, the maximum value is less than 1. In particular, the maximum value of $R^2_{CS}$ equals:

$$\max R^2_{CS} = 1 - L_0^{2/N} \tag{3}$$

Nagelkerke (1991) proposed to correct $R^2_{CS}$, suggesting an index (here referred to as $\bar{R}^2$) that satisfies property (i), and that can be expressed as:

$$\bar{R}^2 = \frac{R^2_{CS}}{\max R^2_{CS}} \tag{4}$$

It is easily found (Nagelkerke, 1991) that $R^2_{CS}$ has properties (ii), (iii) and (iv), but not (i). While properties (i), (ii) and (iii) hold for $\bar{R}^2$, the same is not true of property (iv), which is of fundamental importance in providing a clear interpretation of the index values. Given that $\bar{R}^2$ is a popular diagnostic tool in research and that it varies between 0 and 1, there is a high risk that its values may be interpreted as explained variation. Furthermore, this risk could be even higher if, as suggested by our simulations, the index values always tend to suggest an optimistic interpretation of the explanatory power of the model under consideration. Obviously, in order to claim this, a measure having all four properties mentioned above is needed. For this reason it appears useful to propose a new index, here referred to as MR or Maximal Ratio Index.

THE MAXIMAL RATIO (MR) INDEX

It is useful to start by considering a metric dependent variable ($y$) and a group of $K$ metric explanatory variables (independent variables, or covariates). In such a context, $K$ nested linear models (identity link function and normally distributed errors) and the intercept only model can be estimated: besides the intercept, model M1 will contain only the variable $x_1$; model M2 will contain $x_1$ and $x_2$, and so forth. Equation (1) shows the strict proportionality linking the $K$ values of $R^2$ to as many values of the deviance explained by each model. In addition, when the parameters are obtained through the ML estimator, the explained deviance is equivalent to the likelihood ratio (omitting the scale factor $1/\sigma^2$) often referred to as $c$ (Aldrich & Nelson, 1984, p. 55). Such a ratio can be expressed as:

$$\Lambda = \frac{L_1}{L_0}$$

where $L_1$ denotes the likelihood of the fitted model, and $L_0$ denotes the likelihood of the null or intercept only model. The deviance explained by the fitted model can thus be expressed as:

$$c = 2 \log \Lambda = 2 \left[ \log(L_1) - \log(L_0) \right] \tag{5}$$

Therefore, within classical linear models (identity link function and normally distributed errors), $R^2$ can be interpreted as in (iv) by taking into account the increments in the explained deviance, as well as the increments in the likelihood ratio. In the context of logistic regression models the concept of explained deviance has to be replaced by the more general concept of explained variability and, given that $c$ measures the latter, it seems natural to develop a measure of fit proportional to this statistic.
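Because the likelihood ratio in (5) and the indexes in (2) to (4) depend only on the two log-likelihoods and on the sample size, they can be sketched in a few lines. The following is a minimal illustration, not code from the article: the function names are hypothetical, and the formulas are evaluated on the log scale (an implementation choice made here to avoid underflow of the raw likelihoods).

```python
import math

def likelihood_ratio(loglik_null, loglik_fit):
    """c = 2 * [log(L1) - log(L0)], as in equation (5)."""
    return 2.0 * (loglik_fit - loglik_null)

def cox_snell_r2(loglik_null, loglik_fit, n):
    """Maddala / Cox-Snell / Magee index: 1 - (L0 / L1) ** (2 / N) = 1 - exp(-c / N)."""
    return 1.0 - math.exp(-likelihood_ratio(loglik_null, loglik_fit) / n)

def aldrich_nelson_r2(loglik_null, loglik_fit, n):
    """Aldrich-Nelson index: c / (N + c)."""
    c = likelihood_ratio(loglik_null, loglik_fit)
    return c / (n + c)

def max_cox_snell_r2(loglik_null, n):
    """Upper bound of the Cox-Snell index: 1 - L0 ** (2 / N), equation (3)."""
    return 1.0 - math.exp(2.0 * loglik_null / n)

def nagelkerke_r2(loglik_null, loglik_fit, n):
    """Nagelkerke correction: Cox-Snell index divided by its maximum, equation (4)."""
    return cox_snell_r2(loglik_null, loglik_fit, n) / max_cox_snell_r2(loglik_null, n)

# Illustrative (made-up) values: N = 100 binary observations, half of them equal to 1,
# so the null log-likelihood is 100 * log(0.5); the fitted log-likelihood is invented.
n = 100
l0 = n * math.log(0.5)
l1 = -50.0
indexes = {
    "cox_snell": cox_snell_r2(l0, l1, n),
    "aldrich_nelson": aldrich_nelson_r2(l0, l1, n),
    "nagelkerke": nagelkerke_r2(l0, l1, n),
}
```

With a balanced binary response the upper bound in (3) is 0.75, which shows concretely why the uncorrected index cannot reach 1.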
On the other hand, within GLM, a statistic also called likelihood ratio (see Dobson, 1990) is often used, but its meaning is completely different from that of statistic $c$. Such a ratio can be expressed as:

$$\lambda = \frac{L_{\max}}{L_1}$$

where $L_1$ denotes the likelihood of the fitted model, but $L_{\max}$ denotes the likelihood of the maximal or full model. Nelder and Wedderburn (1972) proposed using twice the logarithm of this ratio as a measure of fit of any generalized linear model. They called this statistic deviance, so as to evoke the statistic that has the same name in classical linear models, and to underline the extension of the concept to the whole family of generalized linear models, even when the simple residual sum of squares can no longer be calculated or is meaningless. For a generic fitted model, this statistic can thus be expressed as:

$$D = 2 \log \lambda = 2 \left[ \log(L_{\max}) - \log(L_1) \right] \tag{6}$$

While statistic $c$ expresses the contribution of the covariates to the model fit of the dependent variable (so to speak, the distance already covered thanks to the model), statistic $D$ expresses the amount of discrepancy that is still present in spite of the model (the distance that still has to be covered). The use of a maximal model in the assessment of fit is commonly associated with a certain type of models (for example, log-linear models), or with particular research contexts (confirmatory or experimental methods), when the model may comprise as many covariates as there are observations. Conversely, a maximal model is not suitable for studies conducted with non-experimental methods when, for exploratory purposes, researchers deal with a large number of observations and no a priori defined group of covariates, as often happens when using a logistic regression model. This may be the reason why, in research practice, each statistic (both $c$ and $D$) is restricted to its own specific domain.

Nonetheless, there is a point at which the two domains meet: the intercept model. The calculation of statistic $D$ for the intercept model (of any model belonging to the GLM family, and hence also for logistic regression models) yields a measure of the variability that the covariates still have to explain. This statistic is here referred to as $D_0$:

$$D_0 = 2 \log \frac{L_{\max}}{L_0} = 2 \left[ \log(L_{\max}) - \log(L_0) \right] \tag{7}$$

Now, in the context of logistic regression models, having a measure of explained variability ($c$) and a measure of variability to explain ($D_0$) at our disposal, the Maximal Ratio (MR) can be expressed as:

$$MR = \frac{c}{D_0} \tag{8}$$

It is easy to demonstrate that, in the case of classical linear models (identity link function and normally distributed errors), this ratio coincides with $R^2$ (Miceli, 2001), and obviously it has the same well known properties, including that of varying between 0 and 1 and of being proportional to the amount of explained variability.
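The coincidence between the ratio $c / D_0$ and $R^2$ in the classical linear model can also be checked numerically. The sketch below is only an illustration under stated assumptions: the data, the coefficients, and the value of the dispersion parameter are made up, the same $\sigma^2$ is used for the fitted, null, and maximal models, and `gaussian_loglik` is a hypothetical helper name.

```python
import numpy as np

def gaussian_loglik(y, mu, sigma2):
    """Log-likelihood of independent normal observations with means mu and variance sigma2."""
    n = len(y)
    return float(-0.5 * n * np.log(2.0 * np.pi * sigma2)
                 - np.sum((y - mu) ** 2) / (2.0 * sigma2))

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # intercept + 3 covariates
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=n)  # made-up coefficients

# OLS estimates (identical to ML estimates of the mean parameters in this model).
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ theta_hat

sigma2 = 1.0  # assumption: any fixed positive value works, since it cancels in the ratio
l1 = gaussian_loglik(y, y_hat, sigma2)                  # fitted model
l0 = gaussian_loglik(y, np.full(n, y.mean()), sigma2)   # intercept only model
l_max = gaussian_loglik(y, y, sigma2)                   # maximal model: fitted values equal y

c = 2.0 * (l1 - l0)        # explained variability
d0 = 2.0 * (l_max - l0)    # variability to explain
mr = c / d0

r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
```

Up to floating-point error, `mr` and `r2` agree, whatever positive value is chosen for `sigma2`.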
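The main steps of the demonstration are reported below. For classical linear models (identity link function and normally distributed errors), the log-likelihood function of the generic model with $k$ covariates ($k < N$) and dispersion parameter $\sigma^2$ can be written as:

$$l_1 = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 - \frac{N}{2} \log(2\pi\sigma^2)$$

where $\hat{y}_i = \hat{\theta}_0 + \sum_{k=1}^{K} \hat{\theta}_k x_{ik}$.

The log-likelihood function of the maximal or full model, when $y_i = \hat{y}_i$ ($\forall i$), is:

$$l_{\max} = -\frac{N}{2} \log(2\pi\sigma^2)$$

The log-likelihood function of the null or intercept only model, when $\hat{y}_i = \bar{y}$ ($\forall i$) and $\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$, is:

$$l_0 = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \bar{y})^2 - \frac{N}{2} \log(2\pi\sigma^2)$$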

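Then

$$c = 2(l_1 - l_0) = -\frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 - N \log(2\pi\sigma^2) + \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \bar{y})^2 + N \log(2\pi\sigma^2) = \frac{1}{\sigma^2} \left[ \sum_{i=1}^{N} (y_i - \bar{y})^2 - \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \right]$$

$$D_0 = 2(l_{\max} - l_0) = -N \log(2\pi\sigma^2) + \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \bar{y})^2 + N \log(2\pi\sigma^2) = \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \bar{y})^2$$

and

$$MR = \frac{c}{D_0} = \frac{\frac{1}{\sigma^2} \left[ \sum_{i=1}^{N} (y_i - \bar{y})^2 - \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \right]}{\frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{N} (y_i - \bar{y})^2 - \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = R^2$$

The last equality holds because, for OLS estimates with an intercept, the total deviance splits into explained and residual deviance, so the numerator equals $\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$.

EMPIRICAL COMPARISON BETWEEN THE DIFFERENT INDEXES

Through simulated data, it is now possible to compare the performance of the different indexes. Simulations were conducted by generating, for three sample sizes (including $N = 3000$), a continuous latent variable ($y$), obtained from

$$y_i = \frac{\exp(X_i)}{1 + \exp(X_i)}$$

where $X_i$ denotes the linear combination of 15 normally distributed random variables and as many coefficients (plus the intercept). For each sample size, two types of continuous variable ($y$) were generated, as shown in Figure 1: simulation type A, with about 36% of its values falling into the reference interval, thus presenting a clear-cut logistic trend; and simulation type B, with about 86% of its values falling into the same interval, thus presenting a quasi-linear trend. For each simulation type (A and B) and for each sample size, nine cutting points were then defined, in order to generate as many dummy variables (D1, D2, ..., D9), so that each of them had a different frequency of value 1, as illustrated below:

Dummy variable          D1   D2   D3   D4   D5   D6   D7   D8   D9
Frequencies of 1 (%)     3   10   25   40   50   60   75   90   97

Each of the 54 dummies thus generated was then used as dependent variable in 15 logistic regression models, so that ML estimates were computed for 810 models overall. The variables of the various logistic regression models were organized so as to define, for each dummy, a group of 15 nested models (M1, M2, ..., M15).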
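A rough sketch of one cell of such a design is given below. It is not the author's simulation code, and several choices are assumptions made only for illustration: the coefficients are invented, a dummy is taken to be 1 when the latent value exceeds its cutting point (here the median, so that value 1 has a frequency of about 50%), only the first three nested models are fitted, and the logistic models are estimated with a small hand-written Newton-Raphson routine. For ungrouped binary data the maximal model reproduces every observation, so its log-likelihood is 0 and $D_0 = -2 \log(L_0)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_loglik(y, p):
    """Log-likelihood of ungrouped binary data y given fitted probabilities p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Newton-Raphson ML fit; X must already include the intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(2007)
n, k = 3000, 15
Z = rng.normal(size=(n, k))        # 15 normally distributed covariates
coefs = np.full(k, 0.5)            # hypothetical coefficients, intercept set to 0
latent = sigmoid(Z @ coefs)        # y_i = exp(X_i) / (1 + exp(X_i))

# Assumed dichotomization rule: 1 above the cutting point, 0 otherwise.
d = (latent > np.quantile(latent, 0.5)).astype(float)

l0 = bernoulli_loglik(d, np.full(n, d.mean()))   # intercept only model
d0 = 2.0 * (0.0 - l0)                            # variability to explain

mr = []
for m in (1, 2, 3):                              # nested models M1, M2, M3
    X = np.column_stack([np.ones(n), Z[:, :m]])
    l1 = bernoulli_loglik(d, sigmoid(X @ fit_logistic(X, d)))
    mr.append(2.0 * (l1 - l0) / d0)              # MR = c / D0
```

Since the models are nested, the fitted log-likelihood cannot decrease from one model to the next, so the values in `mr` grow with the number of covariates, in step with $c$.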

FIGURE 1
Two Types of Latent Dependent Variable $y$. Panels: simulation type A, latent dependent variable ($y$); simulation type B, latent dependent variable ($y$).

The results obtained, partially reported in Tables 1a, 1b, 1c and in Figure 2, lead to the following considerations (due to space limitations, Tables 1a, 1b, 1c only report some results from simulation type A estimates, with $N = 3000$ and dependent variables D1, D3, and D5; Figure 2 reports simulation type A graphs; the remaining results are in line with the ones presented here):

(a) the four indexes provide different indications of model fit; $R^2_{CS}$ and $R^2_A$ even show discordant values;
(b) $\bar{R}^2$ offers a model-data fit value closer to MR than the other indexes do, yielding higher values on all occasions;

(c) MR and $\bar{R}^2$ yield very similar values in almost all simulations; however, discrepancies (with increasingly higher values of $\bar{R}^2$) become larger in proximity of central values, and when the frequency of value 1 in the dependent variable is more or less balanced (40%-60%).

TABLE 1A
Comparison among Fit Indexes from Simulation Type A ($N = 3000$)
Columns: Model, $D_0$, $c$, MR, followed by the three other fit indexes.

M1   8853 39 353 54 36 363
M2   8853 4833 594 67 59 576
M3   8853 95 35 5 33 99
M4   8853 869 93 54 63 5858
M5   8853 946 4 5 66 676
M6   8853 367 796 3 97 973
M7   8853 3358 47 476 57 5
M8   8853 344 39 57 8 5
M9   8853 454 3 349 4
M10  8853 43 7 448 7 8
M11  8853 5685 3836 69 8 678
M12  8853 694 537 78 38 88
M13  8853 6987 793 7 94 353
M14  8853 7787 8785 8 37
M15  8853 7986 875 89 37 9

Note. Fifteen nested models were simulated for dependent variable D1 (frequency of value 1 = 3%).

TABLE 1B
Comparison among Fit Indexes from Simulation Type A ($N = 3000$)
Columns: Model, $D_0$, $c$, MR, followed by the three other fit indexes.

M1   3374 838 474 46 74 77
M2   3374 464 734 8 789 759
M3   3374 3695 965 8 978
M4   3374 6393 966 45 58
M5   3374 6684 84 58 98 4
M6   3374 9 35 77 88 49
M7   3374 3 3545 654 43 393
M8   3374 635 449 76 5 95
M9   3374 398 44 57 75 79
M10  3374 469 3548 735 87 875
M11  3374 8753 5599 885 649 8473
M12  3374 87 7577 884 33 383
M13  3374 437 44 64 5 4483
M14  3374 8633 488 9 5 884
M15  3374 336 964 98 739 844

Note. Fifteen nested models were simulated for dependent variable D3 (frequency of value 1 = 25%).

TABLE 1C
Comparison among Fit Indexes from Simulation Type A ($N = 3000$)
Columns: Model, $D_0$, $c$, MR, followed by the three other fit indexes.

M1   4588 964 36 4 36 3
M2   4588 5 679 78 88 777
M3   4588 4 4 7 4
M4   4588 7848 865 68 73
M5   4588 89 459 5 64 5
M6   4588 567 9 56 4 57
M7   4588 39 3464 949 7 69
M8   4588 477 55 85 889 995
M9   4588 7683 534 94 455 793
M10  4588 864 4758 64 63 89
M11  4588 3787 797 3 475 45
M12  4588 886 94 39 79 934
M13  4588 347 377 55 379 393
M14  4588 3589 63 33 977 447
M15  4588 443 963 983 487 8

Note. Fifteen nested models were simulated for dependent variable D5 (frequency of value 1 = 50%).


FIGURE 2
Comparison among Fit Indexes from Simulation Type A ($N = 3000$). Nine panels, one per dependent variable (D1 to D9), each showing the fit indexes for models M1 to M15.

If it is important that the fit index can be interpreted as a proportion of explained variation, then it should be noted that $\bar{R}^2$ always tends to suggest an optimistic interpretation of the explanatory power of the fitted model, that is to say a larger proportion of explained variation. In addition, this optimistic interpretation is not constant when data and models vary. This aspect can be verified by assessing the congruence between the increments in the variability explained by each model (expressed by statistic $c$) and the corresponding increments in the fit index. Such an evaluation can be done with nested models, as in this study. The strict proportionality between $c$ and MR follows from formula (8). On the contrary, as shown in Table 2, $\bar{R}^2$ never strictly follows the increments in the explained variability: above all, the relation is not constant, and larger differences (with $r$ values considerably lower than +1) are observed for those dependent variables that present a higher balance between 0 and 1 (D4, D5, and D6). Further, Table 2 suggests that increasingly larger discrepancies can be observed as the sample size increases, and the more the latent variable ($y$) moves away from linearity (discrepancies in simulation type A are larger than in simulation type B). Table 2 reports $r$ values calculated across the increments of $c$ and $\bar{R}^2$ in relation to the 15 nested models estimated for each dependent variable. The values of the other two indexes ($R^2_{CS}$ and $R^2_A$) were not reported due to space limitations. However, they are very similar to those of $\bar{R}^2$; usually, $R^2_A$ values are remarkably lower.

TABLE 2
Pearson Correlations between Likelihood Ratio $c$ and $\bar{R}^2$ (for each simulation type and each dependent variable)

N and simulation type   D1 D2 D3 D4 D5
A     9843 7663 44 8679 66
B     9843 854 469 889 75
3 A   9633 646 838 7664 936
3 B   9753 7586 4 3887 547
3 A   949 478 439 7345 5768
3 B   9556 69 54 5746 54

N and simulation type   D6 D7 D8 D9
A     456 386 7786 967
B     859 68 7498 9883
3 A   4979 443 587 9448
3 B   77 348 8364 988
3 A   995 464 368 94
3 B   66 9877 6763 9448

The results of the present study are summarized in Figure 3. For each dependent variable, the values of the four fit indexes (on the ordinate) for each estimated model are shown, so that the trend of these values can be compared with the trend of the likelihood ratio $c$ (the explained variation) on the abscissa.

CONCLUSIONS

The assessment of goodness of fit for logistic regression models (ungrouped data) can be facilitated by an index allowing an easy interpretation, such as the coefficient of determination for classical linear regression models. The new index developed in this study (MR) can be used as an alternative to the common indexes (proposed by Cox & Snell, 1989, and by Nagelkerke, 1991) that today are supplied by the most common statistical software packages.

This paper compares the performance of MR with the other known indexes by means of simulated data.


FIGURE 3
Comparison among Fit Indexes, with Likelihood Ratio $c$ on the abscissa, from Simulation Type A ($N = 3000$). Nine panels, one per dependent variable (D1 to D9).

In particular, the main distinctive features of the MR index are the following: it is easy to compute; in the case of classical linear models (identity link function and normally distributed errors) it coincides with the classical coefficient of determination ($R^2$); it varies between 0 and 1; its values may be interpreted as the variation explained by the fitted model with respect to the total variation to be explained.

REFERENCES

Aldrich, J. H., & Nelson, F. D. (1984). Linear Probability, Logit, and Probit Models. Sage University Paper Series on Quantitative Applications in the Social Sciences (07-045). Beverly Hills and London: Sage Publications.
Cox, D. R., & Snell, E. J. (1989). The Analysis of Binary Data (2nd ed.). London: Chapman & Hall.
Dobson, A. J. (1990). An Introduction to Generalized Linear Models. London: Chapman & Hall.
Maddala, G. S. (1983). Limited-dependent and Qualitative Variables in Econometrics. New York: Cambridge University Press.
Magee, L. (1990). R² Measures Based on Wald and Likelihood Ratio Joint Significance Tests. American Statistician, 44, 250-253.
McCullagh, P., & Nelder, J. A. (1983). Generalized Linear Models. New York: Chapman & Hall.
Miceli, R. (2001). Percorsi di Ricerca e Analisi dei Dati [Research methods and data analysis]. Torino: Bollati Boringhieri.
Nagelkerke, N. J. D. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78, 691-692.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society, Series A, 135, 370-384.
Rao, C. R. (1973). Linear Statistical Inference and its Applications (2nd ed.). New York: Wiley.