
American Society for Quality

A Note on the Graphical Analysis of Multidimensional Contingency Tables
Author(s): D. R. Cox and Elizabeth Lauh
Source: Technometrics, Vol. 9, No. 3 (Aug., 1967), pp. 481-488
Published by: Taylor & Francis, Ltd. on behalf of American Statistical Association and American Society for Quality
Stable URL: http://www.jstor.org/stable/1266517
Accessed: 08-06-2017 09:27 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://about.jstor.org/terms

American Statistical Association, American Society for Quality, Taylor & Francis, Ltd. are collaborating with JSTOR to digitize, preserve and extend access to Technometrics.

TECHNOMETRICS VOL. 9, NO. 3 AUGUST 1967

A Note on the Graphical Analysis of Multidimensional Contingency Tables

D. R. COX AND ELIZABETH LAUH*
Imperial College, London, and Bell Telephone Laboratories, Incorporated

The technique of half-normal plotting in the analysis of factorial experiments is adapted to multidimensional contingency tables. A logistic transformation is applied to the cell proportions before calculating and plotting the standard contrasts. An approximate theoretical slope is available for the half-normal plot. One use of the method is in the preliminary analysis of complex quantal data, in particular to help in choosing a model for more formal analysis. An illustrative example is discussed.

INTRODUCTION

Daniel (1956, 1959) introduced the half-normal plotting technique for the analysis of $2^m$ factorial experiments. The standard contrasts are ranked in order of increasing absolute magnitude and plotted against the quantiles of a half-normal distribution. A modification of the method is to exclude main effects from the plot, since it is usually reasonable to expect appreciable main effects to be present. In a null situation a straight line through the origin is obtained, its slope estimating the error standard deviation, $\sigma$. When there is a relatively small number of substantial real effects, all except the last few points lie near a line of slope determined by $\sigma$. The objective of the graphical analysis is partly to estimate $\sigma$ without making prior assumptions about which effects are unimportant, and partly to obtain an indication of such occurrences as outliers, plot-splitting, etc.

In the present paper, we illustrate an adaptation of half-normal plotting to the analysis of multidimensional contingency tables; in particular, we consider contingency tables in which there is a binary response variable, i.e., a response taking one of two forms, say success and failure, and a number of factors.
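The ranking-and-plotting step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Daniel's own procedure: the plotting positions $(j - \frac12)/k$ and the least-squares line through the origin are choices made here, and the function names are hypothetical.

```python
from statistics import NormalDist


def half_normal_quantiles(k):
    """Half-normal quantiles for k ranked absolute contrasts.

    Assumption: plotting positions p_j = (j - 1/2) / k, mapped through
    Phi^{-1}(1/2 + p_j / 2). Other plotting positions are also in use.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf(0.5 + 0.5 * (j - 0.5) / k) for j in range(1, k + 1)]


def half_normal_points(contrasts):
    """Pair each quantile with the contrasts ranked by increasing absolute value."""
    ranked = sorted(abs(c) for c in contrasts)
    return list(zip(half_normal_quantiles(len(ranked)), ranked))


def slope_through_origin(points):
    """Least-squares slope of |contrast| on quantile for a line through the origin.

    In a null situation this slope is a rough estimate of the error
    standard deviation sigma.
    """
    sxy = sum(q * a for q, a in points)
    sxx = sum(q * q for q, _ in points)
    return sxy / sxx
```

In use, one would plot the pairs returned by `half_normal_points` (leaving out main effects if desired) and judge by eye which of the largest points depart from the line; the fitted slope is only a summary of the bulk of the points.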
It is required to assess the effect of the factors on the probability of success. The basis of the method is the application of the logistic transformation to binary data and in particular to contingency tables.

(Received July, 1966; revised February, 1967. *Now at Lockheed, Sunnyvale, California.)

LINEAR LOGISTIC MODELS

The basis of the method to be proposed is a representation in terms of effects additive on a logistic scale. That is, if $\theta_i$ is the probability of success for the $i$th factor combination,

$$\log\left(\frac{\theta_i}{1-\theta_i}\right) \tag{1}$$
is assumed to be given by a sum of terms, in the same way as the expected value of an observation in a linear model for a factorial experiment. Thus, with a $2^2$ system, we can consider
$$\log\left(\frac{\theta_i}{1-\theta_i}\right) = \mu \pm \alpha_1 \pm \alpha_2 \pm \beta, \tag{2}$$
where $\alpha_1$, $\alpha_2$ correspond to main effects of factors I and II and $\beta$ to their interaction. The signs depend on the factor levels in the usual way; for example, at the lower level of I and the upper level of II the signs are $-$, $+$, $-$. Alternatively to (2), we might consider a model with only main effects, namely,
$$\log\left(\frac{\theta_i}{1-\theta_i}\right) = \mu \pm \alpha_1 \pm \alpha_2. \tag{3}$$
It is, of course, possible also to consider other types of model, for example, models additive on some other scale. Thus, instead of (3), we might consider
$$\theta_i = \mu \pm \alpha_1 \pm \alpha_2. \tag{4}$$
A general reason for preferring (3) to (4) is that the parameters of (3) are unrestricted, in the sense that a given value of, say, $\alpha_1$ is admissible whatever the values of $(\mu, \alpha_2)$; in (4), by contrast, the parameters must keep every $\theta_i$ between 0 and 1. Lewis (1962), in his review of the analysis of multidimensional contingency tables, gives a brief comparison of logistic and other models. The logistic model was used implicitly by Bartlett (1935) in his significance test for the $2 \times 2 \times 2$ table. Dyke and Patterson (1952) analysed a logistic contingency table model by maximum likelihood; see Birch (1964, 1965) for recent work and for further references.

It is useful to distinguish between models saturated with parameters, i.e., those for which the number of parameters is equal to the number of distinct binomial probabilities $\theta_i$, and unsaturated models. In the former, the logistic model puts no direct restriction on the form of the $\theta_i$'s; to be fruitful, however, the logistic contrasts must in some sense be more useful summaries of the population than the $\theta_i$'s themselves. In an unsaturated model such as (3) or (4), on the other hand, a testable assumption about the form of the $\theta_i$'s is made. The usual arguments apply in favor of the use of an additive scale of effects, if such a scale can be found.

EMPIRICAL LOGIT TRANSFORM

Suppose that for the $i$th combination of factor levels there are $n_i$ trials, $R_i$ of which are successes. If the trials are independent with constant probability of success $\theta_i$, then $R_i$ has a binomial distribution with mean $n_i\theta_i$ and variance $n_i\theta_i(1-\theta_i)$.
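Combining model (2) with the binomial sampling just described, the following minimal Python sketch computes $\theta_i$ for a $2^2$ system, with (3) as the special case $\beta = 0$, and the binomial mean and variance of $R_i$. The function names and the $-1/+1$ coding of factor levels are illustrative assumptions. The last function shows how the probability-scale model (4) can leave the interval $[0, 1]$, which the logistic form cannot.

```python
import math


def inv_logit(x):
    """Map a value on the logistic scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def signs_2x2(level_1, level_2):
    """Signs of (alpha_1, alpha_2, beta) in model (2).

    Levels are coded -1 (lower) and +1 (upper); the interaction sign is the
    product of the two main-effect signs.
    """
    return level_1, level_2, level_1 * level_2


def theta_logistic(mu, alpha_1, alpha_2, beta, level_1, level_2):
    """theta_i under model (2); beta = 0 gives the main-effects model (3)."""
    s1, s2, s12 = signs_2x2(level_1, level_2)
    return inv_logit(mu + s1 * alpha_1 + s2 * alpha_2 + s12 * beta)


def binomial_moments(n, theta):
    """Mean n*theta and variance n*theta*(1 - theta) of R_i."""
    return n * theta, n * theta * (1.0 - theta)


def theta_additive(mu, alpha_1, alpha_2, level_1, level_2):
    """theta_i under the probability-scale model (4); nothing keeps it in [0, 1]."""
    return mu + level_1 * alpha_1 + level_2 * alpha_2


# Large effects remain admissible on the logistic scale ...
print(theta_logistic(0.0, 3.0, 3.0, 0.0, 1, 1))  # close to, but below, 1
# ... whereas moderate effects on the probability scale can exceed 1.
print(theta_additive(0.5, 0.4, 0.3, 1, 1))  # about 1.2: not a probability
```

With `signs_2x2(-1, 1)` the signs come out as $-$, $+$, $-$, as in the example following (2).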
If we contemplate a linear model on the logistic scale, one approach to the analysis is to transform the $R_i$'s so that the expected value is as nearly as possible $\log\{\theta_i/(1-\theta_i)\}$. Haldane (1955) and Anscombe (1956) showed that, among all transforms of the type $\log\{(R_i + a)/(n_i - R_i + a)\}$, where $a$ is a

constant, the form
$$Z_i = \log\left(\frac{R_i + \frac{1}{2}}{n_i - R_i + \frac{1}{2}}\right) \tag{5}$$
has asymptotic expectation
$$\log\left(\frac{\theta_i}{1-\theta_i}\right) + O\!\left(\frac{1}{n_i^2}\right),$$
other values of $a$ leading to a correction term $O(1/n_i)$. Asymptotically,
$$\operatorname{var}(Z_i) = \frac{1}{n_i\theta_i(1-\theta_i)},$$
and Haldane suggested estimating this by
$$v_i = \frac{n_i + 2}{(R_i + 1)(n_i - R_i + 1)}. \tag{6}$$
Note that this will overestimate the variance if $\theta_i$ is very [...]. If the $n_i$ are not too small and the $\theta_i$ not too near zero or one, a linear model can be fitted to the $Z_i$ by weighted least squares, with the weights estimated as $1/v_i$. This is essentially the technique proposed by Berkson (1944, 1953) for fitting a quantal dose-response curve; he used the term minimum logit $\chi^2$. Note that the above argument for the choice of the constant $a = \frac{1}{2}$ is relevant only when unweighted combinations of the $Z_i$ are used (Hitchcock, 1962). In the present paper we shall, however, be concerned with unweighted combinations and will use the definitions (5) and (6); we will not consider the question of alternative definitions of $Z_i$ and $v_i$ that might be more relevant for use with weighted least squares.

THE $2 \times 2^m$ SYSTEM

We now apply the ideas of Sections 2 and 3 to the $2 \times 2^m$ system, i.e., to a system in which the response takes two possible forms and there are $m$ factors, each at two levels. Calculate $Z_i$ and $v_i$ for each of the $2^m$ factor combinations. Under the saturated model, the unique (almost) unbiased estimates of the factorial contrasts have the form
$$C_s = \sum_i \pm Z_i \qquad (s = 1, \ldots, 2^m - 1), \tag{7}$$
and are given by the standard formulas; for instance, Yates's algorithm can be used to compute them. The contrasts $C_s$ have equal variance, which can be estimated by
$$v = \sum_i v_i. \tag{8}$$
Note that, even though the $R_i$'s are independent, the $C_s$'s are not in general independent, because the $Z_i$'s do not have equal variance. However, the factorial contrasts are approximately normal with zero means in the absence of real effects, and this gives some justification for making a half-normal plot with theoretical slope $\sqrt{v}$.
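The quantities (5) to (8) can be computed as in the Python sketch below. The helper names are hypothetical; `yates` assumes the $2^m$ cell values are supplied in standard order ($(1), a, b, ab, \ldots$) and uses the "upper level minus lower level" sign convention, which is an assumption that may differ from a given table's.

```python
import math


def empirical_logit(r, n):
    """Z_i of (5): log{(R_i + 1/2) / (n_i - R_i + 1/2)}."""
    return math.log((r + 0.5) / (n - r + 0.5))


def haldane_variance(r, n):
    """v_i of (6): (n_i + 2) / {(R_i + 1)(n_i - R_i + 1)}."""
    return (n + 2) / ((r + 1) * (n - r + 1))


def yates(values):
    """Yates's algorithm for 2^m cell values given in standard order.

    Returns [total, C_1, ..., C_{2^m - 1}]; each C_s is a sum of the inputs
    with coefficients +1 or -1, as in (7).
    """
    k = len(values)
    m = k.bit_length() - 1
    if k == 0 or k != 2 ** m:
        raise ValueError("number of cells must be a power of two")
    column = list(values)
    for _ in range(m):
        pairs = range(k // 2)
        # Sums of adjacent pairs go to the top half, differences to the bottom.
        sums = [column[2 * j] + column[2 * j + 1] for j in pairs]
        diffs = [column[2 * j + 1] - column[2 * j] for j in pairs]
        column = sums + diffs
    return column


def contrast_variance(v_values):
    """v of (8): with +/-1 coefficients every C_s has estimated variance sum(v_i)."""
    return sum(v_values)
```

For the first cell of Table 1 below ($r_i = 68$, $n_i = 110$), `empirical_logit` gives about 0.477 and `haldane_variance` about 0.0377, matching the tabulated values.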

Note that while (7) is the most convenient form for calculation and plotting, for estimation and detailed interpretation it will usually be best to convert (7) into a difference of two means by dividing by $2^{m-1}$.

To check on the possible effect of correlation and of the discreteness of the underlying data, a series of sampling experiments was run in which sets of 20 half-normal plots were produced, corresponding to various combinations of $\theta_i$'s and $n_i$'s. Even for the smallest values of $n_i$ used (one half of the $n_i$ equal to 5 and the other half equal to 15; all $n_i$ equal to 10), plots were obtained showing reasonable agreement with a straight line of the theoretical slope. One type of departure noted in a few of the plots was a tendency to a step-like appearance; this arises in part because the discreteness of the underlying observations means that there is an appreciable probability of finding two contrasts exactly equal.

Knowledge of a theoretical slope is very helpful in interpreting the plots and also allows an assessment of goodness of fit to a model in which a substantial number of parameters are small. In some applications, however, the independence of the trials within a cell may be suspect. It might then be reasonable to consider an approximation in which
$$\operatorname{var}(R_i) = \gamma^2 n_i\theta_i(1-\theta_i), \tag{9}$$
where $\gamma$ is a constant, equal to unity when the trials are independent. The slope of the half-normal plot will now be $\gamma\sqrt{v}$, and estimation of $\gamma$ is possible.

A very important distinction between the present situation and the standard normal-theory factorial analysis is that if we pass to an unsaturated model, re-estimation of the parameters will be necessary, unless the $Z_i$'s happen to have approximately equal variance. The most common use of the half-normal plot is likely to be as an aid in selecting a model for more detailed weighted analysis.
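One simple way to estimate $\gamma$ in (9), sketched below under assumptions made here (plotting positions $(j - \frac12)/k$ and an unweighted least-squares line through the origin), is to divide the fitted slope of the half-normal plot by $\sqrt{v}$. The second helper applies the conversion of (7) to a difference of two means. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def estimate_gamma(contrasts, v):
    """Rough estimate of gamma in (9) from a half-normal plot of the C_s.

    Fits a line through the origin to the ranked |C_s| against half-normal
    quantiles and divides the slope by the binomial-theory slope sqrt(v).
    """
    k = len(contrasts)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf(0.5 + 0.5 * (j - 0.5) / k) for j in range(1, k + 1)]
    ranked = sorted(abs(c) for c in contrasts)
    slope = sum(q * a for q, a in zip(quantiles, ranked)) / sum(q * q for q in quantiles)
    return slope / math.sqrt(v)


def as_mean_difference(contrast, m):
    """Convert a +/-1 contrast C_s from (7) into a difference of two means."""
    return contrast / 2 ** (m - 1)
```

A value of `estimate_gamma` well above 1 would point towards extra-binomial variation; since the largest contrasts may reflect real effects, one might refit using only the smaller ones.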
MORE COMPLEX FACTORIAL SYSTEMS

Some of the simplicity of the analysis of Section 4 arises because the coefficients defining the contrasts are all plus or minus unity, so that the contrasts all have equal variance. To apply half-normal plotting to systems in which not all factors are at two levels, we consider, if possible, a meaningful set of single-degree-of-freedom contrasts, chosen usually to be orthogonal when the observations have equal variance. Suppose that one such contrast is $\sum_i l_{si} Z_i$; this has variance estimated by $\sum_i l_{si}^2 v_i$. Since for half-normal plotting we want contrasts with equal variance, we are led to define
$$C'_s = \left(\sum_i l_{si} Z_i\right)\left(\sum_i l_{si}^2 v_i\right)^{-1/2} \tag{10}$$
and to plot these as having theoretically unit variance. An alternative definition for plotting may be used when most of the observed variation in the $v_i$'s is random. Then we write

$$C''_s = \left(\sum_i l_{si} Z_i\right)\left(\sum_i l_{si}^2\right)^{-1/2} \tag{11}$$
and treat these as having a theoretical variance
$$\operatorname{ave}(v_i), \tag{12}$$
where ave denotes average over all factor combinations. Note that these definitions [...] and plotting. For estimation it is [...] an unbiased estimate of [...].

AN EXAMPLE

To illustrate the above procedures, we use a $2 \times (3 \times 2^2)$ experiment comparing two detergents, a new product X and a standard product M (Ries and Smith, 1963). The three factors were water softness, at three levels; temperature, at two levels; and a factor whose two levels corresponded to previous experience and no previous experience with M. For each of the 12 factor combinations, a number $n_i$ of individuals, between 48 and 116, used both detergents, and $r_i$ of these preferred X, the remainder preferring M. Table 1 gives the data.

TABLE 1
Number r_i of Preferences for Brand X out of n_i Individuals

                          M previous non-user     M previous user
                          Temperature             Temperature
Water softness            Low       High          Low       High
Hard         r_i          68        42            37        24
             n_i          110       72            89        67
             Z_i          .477      .332          -.337     -.575
             v_i          .0377     .0555         .0452     .0627
Medium       r_i          66        33            47        23
             n_i          116       56            102       70
             Z_i          .275      .355          -.156     -.704
             v_i          .0345     .0711         .0387     .0625
Soft         r_i          63        29            57        19
             n_i          116       56            106       48
             Z_i          .171      .070          .150      -.414
             v_i          .0341     .0690         .0372     .0833

To construct a set of single-degree-of-freedom contrasts, the three levels of water softness can be regarded as equally spaced and the effects involving softness split into linear and quadratic components. The quantities $Z_i$ and $v_i$, computed from (5) and (6), are given in Table 1, and the standardized contrasts are computed from (10). For example, for the quadratic component of the main effect of softness, we compute
$$(.477 + .332 - .337 - .575) - 2(.275 + .355 - .156 - .704) + (.171 + .070 + .150 - .414) = .334.$$
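The standardized contrasts (10) to (12) can be checked against the quadratic softness calculation. The Python sketch below hard-codes the rounded $Z_i$ and $v_i$ of Table 1; the cell ordering within each softness level (non-user low, non-user high, user low, user high) and the function names are illustrative assumptions.

```python
import math

# Table 1 values, rounded as published; within each softness level the order is
# (non-user low, non-user high, user low, user high).
Z = {
    "hard": [0.477, 0.332, -0.337, -0.575],
    "medium": [0.275, 0.355, -0.156, -0.704],
    "soft": [0.171, 0.070, 0.150, -0.414],
}
V = {
    "hard": [0.0377, 0.0555, 0.0452, 0.0627],
    "medium": [0.0345, 0.0711, 0.0387, 0.0625],
    "soft": [0.0341, 0.0690, 0.0372, 0.0833],
}


def standardized_contrast(coefs, z, v):
    """Raw contrast, its estimated variance, and C'_s of (10)."""
    value = sum(l * zi for l, zi in zip(coefs, z))
    variance = sum(l * l * vi for l, vi in zip(coefs, v))
    return value, variance, value / math.sqrt(variance)


def alternative_contrast(coefs, z, v):
    """C''_s of (11) and its reference variance ave(v_i) from (12)."""
    value = sum(l * zi for l, zi in zip(coefs, z))
    scaled = value / math.sqrt(sum(l * l for l in coefs))
    return scaled, sum(v) / len(v)


# Quadratic softness coefficients (1, -2, 1) over (hard, medium, soft),
# repeated across the four temperature-by-M cells at each softness level.
levels = ("hard", "medium", "soft")
weights = {"hard": 1, "medium": -2, "soft": 1}
coefs = [weights[s] for s in levels for _ in range(4)]
z = [x for s in levels for x in Z[s]]
v = [x for s in levels for x in V[s]]

print(standardized_contrast(coefs, z, v))
```

The printed triple is approximately (0.334, 1.252, 0.2985), agreeing with the values quoted in the text to the accuracy of the rounded table entries.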

The estimated variance is
$$(.0377 + \cdots + .0627) + 4(.0345 + \cdots + .0625) + (.0341 + \cdots + .0833) = 1.252,$$
and finally the standardized contrast (10) is $.334/\sqrt{1.252} = .300$; this has approximately unit theoretical variance. The contrasts $C'$ from (10) are collected in Table 2.

TABLE 2
Logistic Factorial Contrasts Estimated from the Data of Table 1

Effect                        Symbol          Standardized contrast, C'
Temperature                   T                1.908
M user vs. non-user           M                4.674
Softness, linear              SL               0.121
Softness, quadratic           SQ               0.300
                              T × M           -1.487
                              T × SL           0.432
                              T × SQ           0.099
                              M × SL          -1.862
                              M × SQ           0.673
                              T × M × SL      -0.568
                              T × M × SQ      -0.626

A preliminary plot, or inspection of Table 2, shows that the main effect of M is very highly significant; note that in Table 1 each $Z_i$ for previous non-users of M is greater than $Z_i$ for the corresponding previous users. The main effect of M was therefore omitted from the plot, and Fig. 1 shows the remaining contrasts, with a line corresponding to the theoretical unit variance. The three largest, in order from highest, are T, M × SL and T × M. [...] significance individually, the fact that thr[...], and that M is a factor corresponding to units (individuals) rather than to a treatment, [suggests an analysis conditional] on factor M, i.e., analyzing separately the results for previous users and non-users of M. The good agreement of the remaining points with the theoretical line gives some check on the absence of additional com[...].

To do this, we analyze separately the two halves of Table 1 corresponding to the two levels of M. Note that positive $Z_i$ corresponds to an average preference for X over M, and negative $Z_i$ to an average preference for M. The conclusions are briefly as follows. For previous non-users of M, all cells show an average preference for X, i.e., positive $Z_i$. There is [...] temperature, but a steady decrease in preference for X as the water becomes softer, the average difference in $Z_i$ between soft and hard being [...], with approximate standard error estimated from the $v_i$. For previous users of M,

all cells except one show an average preference for M, i.e., negative $Z_i$. The preference is stronger at the higher temperature, the average difference of $Z_i$ between high and low temperatures being $-.450 \pm .191$. There is a decrease in strength of preference for M with increasing water softness; in terms of $Z_i$ this is an effect of opposite sign from that observed with previous non-users of M. The average difference in $Z_i$ between soft and hard is $.324 \pm .239$. Thus, in both cases the effects of softness, while suggestive, are not clearly established.

[Figure 1: Half-normal plot for logistic factorial contrasts from data of Table 1, omitting main effect of M. Line gives theoretical slope. Horizontal axis: standard semi-normal quantile; labelled points: M × SL, T, T × M.]

To obtain some appreciation of the meaning of differences of a given magnitude on the logistic scale, the reader may find it helpful to prepare a diagram showing, for a series of values of $\Delta$, curves joining points $(p_1, p_2)$ differing by $\Delta$ on a logistic scale, i.e., related by the equation
$$\log\left(\frac{p_2}{1-p_2}\right) = \log\left(\frac{p_1}{1-p_1}\right) + \Delta. \tag{13}$$

The above are broadly the conclusions reached by Ries and Smith (1963) by a series of chi-squared tests. The present approach allows estimation as well as significance testing of effects. The rather informal graphical approach is likely to be most useful in more complex systems. Then some rough preliminary analysis may be necessary before fitting a suitably simplified model by maximum likelihood (Dyke and Patterson, 1952) or by weighted least squares.

ACKNOWLEDGMENTS

We are grateful to Harry Smith, Jr. for suggesting the illustrative example. This work was done at Bell Telephone Laboratories, Murray Hill, New Jersey.
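The separate analysis of previous users of M reported in the example can be reproduced from Table 1 under an assumption made here rather than stated in the text: the $Z_i$ are treated as independent with variances $v_i$, so that a mean of $k$ cells has variance $\sum v_i / k^2$. The Python sketch below also evaluates (13). Names are illustrative.

```python
import math

# Previous users of M, from Table 1: (Z_i, v_i) by (softness, temperature).
USERS = {
    ("hard", "low"): (-0.337, 0.0452),
    ("hard", "high"): (-0.575, 0.0627),
    ("medium", "low"): (-0.156, 0.0387),
    ("medium", "high"): (-0.704, 0.0625),
    ("soft", "low"): (0.150, 0.0372),
    ("soft", "high"): (-0.414, 0.0833),
}


def mean_difference(cells, group_a, group_b):
    """Average Z_i of group_a minus that of group_b, with a standard error.

    Assumes independent Z_i with variances v_i, so a mean of k cells has
    variance sum(v_i) / k**2.
    """
    za = [cells[key][0] for key in group_a]
    zb = [cells[key][0] for key in group_b]
    var_a = sum(cells[key][1] for key in group_a) / len(group_a) ** 2
    var_b = sum(cells[key][1] for key in group_b) / len(group_b) ** 2
    diff = sum(za) / len(za) - sum(zb) / len(zb)
    return diff, math.sqrt(var_a + var_b)


def shift_on_logit_scale(p1, delta):
    """p2 from (13): logit(p2) = logit(p1) + delta."""
    logit_p2 = math.log(p1 / (1 - p1)) + delta
    return 1 / (1 + math.exp(-logit_p2))


softness = ("hard", "medium", "soft")
temp_diff = mean_difference(USERS, [(s, "high") for s in softness],
                            [(s, "low") for s in softness])
soft_vs_hard = mean_difference(USERS, [("soft", "low"), ("soft", "high")],
                               [("hard", "low"), ("hard", "high")])
print(temp_diff, soft_vs_hard)
```

This gives $-.450$ with standard error about $.191$ for high minus low temperature, and $.324$ with standard error about $.239$ for soft minus hard, in agreement with the quoted values; `shift_on_logit_scale` traces one curve of the diagram suggested for (13) when evaluated over a grid of `p1`.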

REFERENCES

ANSCOMBE, F. J., 1956. On estimating binomial response relations. Bio[...].
BARTLETT, M. S., 1935. Contingency table interactions. Suppl. J. R. S[...].
BERKSON, J., 1944. Application of the logistic function to bioassay. J. A[...], 357-365.
BERKSON, J., 1953. A statistically precise and relatively simple method of estimating the bioassay with quantal response, based on the logistic function. J. Amer. Statist. Assoc. 4[...], 565-599.
BIRCH, M. W., 1964. The detection of partial association, I: the 2 × 2 case. J. R. Statist. Soc. B 26, 313-324.
BIRCH, M. W., 1965. The detection of partial association, II: the general case. J. R. Statist. Soc. B 27, 111-124.
DANIEL, C., 1956. Fractional replication in industrial research. Proc. 3rd Berkeley Symp. on Math. Statist. and Prob. 5, 87-98.
DANIEL, C., 1959. Use of half-normal plots in interpreting factorial two-level experiments. Technometrics 1, 311-341.
DYKE, A. V. and PATTERSON, H. D., 1952. Analysis of factorial arrangements when the data are proportions. Biometrics 8, 1-12.
HALDANE, J. B. S., 1955. The estimation and significance of the logarithm of a ratio of frequencies. Ann. Hum. Genetics 20, 309-311.
HITCHCOCK, S. E., 1962. A note on the estimation of the parameters of the logistic function, using the minimum logit χ² method. Biometrika 49, 250-252.
LEWIS, B. N., 1962. On the analysis of interaction in multi-dimensional contingency tables. J. R. Statist. Soc. A 125, 88-117.
RIES, P. N. and SMITH, H., 1963. The use of chi-square for preference testing in multi-dimensional problems. Chemical Engineering Progress 59, 39-43.