Interval Estimates for Efficiency of Plating

Similar documents
(Jennison, 1937). This lack of agreement may be overcome, at

A booklet Mathematical Formulae and Statistical Tables might be needed for some questions.

1 Mathematics and Statistics in Science

Volume Title: The Interpolation of Time Series by Related Series. Volume URL:

Significance Testing with Incompletely Randomised Cases Cannot Possibly Work

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU

University of Regina. Lecture Notes. Michael Kozdron

1 Measurement Uncertainties

ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS

FACTORIZATION AND THE PRIMES

Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model

Solution: chapter 2, problem 5, part a:

Chapter 1 Statistical Inference

The Statistical Analysis of Plant Virus Assays: a Transformation to Include Lesion Numbers with Small Means

Probability Distributions

Statistical Experiment A statistical experiment is any process by which measurements are obtained.

The American Mathematical Monthly, Vol. 100, No. 8. (Oct., 1993), pp

More on Estimation. Maximum Likelihood Estimation.

A booklet Mathematical Formulae and Statistical Tables might be needed for some questions.

Post Scriptum to "Final Note"

CHAPTER 1. Review of Algebra

Confidence intervals and the Feldman-Cousins construction. Edoardo Milotti Advanced Statistics for Data Analysis A.Y

Probability. Table of contents

Non-parametric Statistics

Hardy s Paradox. Chapter Introduction

Take the measurement of a person's height as an example. Assuming that her height has been determined to be 5' 8", how accurate is our result?

Keppel, G. & Wickens, T.D. Design and Analysis Chapter 2: Sources of Variability and Sums of Squares

Chapter 13 - Inverse Functions

Statistics 3 WEDNESDAY 21 MAY 2008

Examiners Report/ Principal Examiner Feedback. June GCE Core Mathematics C2 (6664) Paper 1

1 Measurement Uncertainties

was prepared by the method of Beeby and Whitehouse and sodium hypochlorite were tested periodically; no changes were detected over the experimental

Introduction. Probability and distributions

2012 Assessment Report. Mathematics with Calculus Level 3 Statistics and Modelling Level 3

Examiners Report/ Principal Examiner Feedback. Summer GCE Core Mathematics C3 (6665) Paper 01

The Annals of Human Genetics has an archive of material originally published in print format by the Annals of Eugenics ( ).

SEQUENTIAL TESTS FOR COMPOSITE HYPOTHESES

Statistical Practice

Chapter 4: An Introduction to Probability and Statistics

Machine Learning Lecture Notes

FUNCTIONS AND MODELS

Topic 17: Simple Hypotheses

DEMBSKI S SPECIFIED COMPLEXITY: A SIMPLE ERROR IN ARITHMETIC

Chapter 3. Introduction to Linear Correlation and Regression Part 3

Time: 1 hour 30 minutes

A Note on Bayesian Inference After Multiple Imputation

A-LEVEL STATISTICS. SS04 Report on the Examination June Version: 1.0

Quadratic Equations. All types, factorising, equation, completing the square. 165 minutes. 151 marks. Page 1 of 53

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Slides for Data Mining by I. H. Witten and E. Frank

Non-independence in Statistical Tests for Discrete Cross-species Data

Probability Distribution

Supporting Australian Mathematics Project. A guide for teachers Years 11 and 12. Probability and statistics: Module 25. Inference for means

VARIABILITY OF KUDER-RICHARDSON FOm~A 20 RELIABILITY ESTIMATES. T. Anne Cleary University of Wisconsin and Robert L. Linn Educational Testing Service

The Integers. Peter J. Kahn

Chapter 3. Estimation of p. 3.1 Point and Interval Estimates of p

Chapter 9. Non-Parametric Density Function Estimation

The Art of Dumbassing Brian Hamrick July 2, 2010

Supplementary Logic Notes CSE 321 Winter 2009

Treatment of Error in Experimental Measurements

Just Enough Likelihood

Foundations of Probability and Statistics

Basic Probability Reference Sheet

Time: 1 hour 30 minutes

Introduction to Advanced Mathematics (C1) WEDNESDAY 9 JANUARY 2008

Uncertainty. Michael Peters December 27, 2013

This chapter follows from the work done in Chapter 4 of the Core topics book involving quadratic equations.

Selection should be based on the desired biological interpretation!

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Chapter Three. Hypothesis Testing


CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Module 8 Probability

ter. on Can we get a still better result? Yes, by making the rectangles still smaller. As we make the rectangles smaller and smaller, the

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

1 Random and systematic errors

MEI STRUCTURED MATHEMATICS 4751

Time: 1 hour 30 minutes

Model Estimation Example

Probability and Statistics

ON THE APPROXIMATION OF THE TOTAL AMOUNT OF CLAIMS. T. PENTIKAINEN Helsinki

2010 HSC NOTES FROM THE MARKING CENTRE MATHEMATICS EXTENSION 1

EXERCISES IN QUATERNIONS. William Rowan Hamilton. (The Cambridge and Dublin Mathematical Journal, 4 (1849), pp ) Edited by David R.

Journeys of an Accidental Statistician

CHAPTER FIVE FUNDAMENTAL CONCEPTS OF STATISTICAL PHYSICS "

Food delivered. Food obtained S 3

Chapter 26: Comparing Counts (Chi Square)

2009 Sample Math Proficiency Exam Worked Out Problems

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

ACCURATE FREE VIBRATION ANALYSIS OF POINT SUPPORTED MINDLIN PLATES BY THE SUPERPOSITION METHOD

Numbers, proof and all that jazz.

Probability Distributions - Lecture 5

Mark Scheme (Results) January 2009

Maximum-Likelihood fitting

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Stochastic Quantum Dynamics I. Born Rule

Physics 509: Error Propagation, and the Meaning of Error Bars. Scott Oser Lecture #10

Introduction to Thermodynamic States Gases

GCE EXAMINERS' REPORTS

arxiv:quant-ph/ v1 19 Oct 1995

Transcription:

J. gen. Virol. (1968), 2, 13-I8 Printed in Great Britain I3 Interval Estimates for Efficiency of Plating By T. WILLIAMS* Department of Mathematics, Duke University, Durham, North Carolina 27706, U.S.A.,t and Guinness-Lister Research Unit, Lister Institute of Preventive Medicine, Chelsea Bridge Road, London, S. IV. I (Accepted 8 July i967) SUMMARY The observed efficiency of plating (e.o.p.) is defined as the ratio of two plaque or colony counts and is in practice subject to substantial sampling fluctuations. The true e.o.p, is here defined as the ratio of the underlying mean values, of which the two counts are Poissonian estimates: thus, when the number of counts on each plate is assumed to increase indefinitely, the observed e.o.p, converges in probability to the true e.o.p. A contour map is provided, by means of which one may rapidly determine, for any given pair of counts in the range 50 to IOOO, an interval which will in 95 % of all cases contain the true e.o.p. INTRODUCTION The technique of estimating the efficiency of plating (e.o.p.) of one viral suspension with respect to another, by plating out on two hosts and computing the ratio of the plaque (or focus) counts, has long been in use (Ellis & Delbriick, I939) and has a wide variety of applications. Thus it may be used to compare the activities of two preparations of virus on the same type of host cell, to assess differences in susceptibility between hosts or to compare biological activity with counts of particles. Numerous variants exist; for example if one expects 1 in l0 s organisms to be a mutant resistant to an antibiotic, one may plate with and without the antibiotic carrying tenfold serial dilutions six stages further for the untreated medium. THE STATISTICAL PROBLEM Plainly, the efficiency of plating so defined is not a fixed quantity but a random variable governed by some probability distribution, for one would be greatly surprised if the e.o.p, was exactly the same on repeating the experiment. Indeed, if equal volumes are pipetted from a homogeneous suspension, the numbers of organisms taken up will not be constant from one pipette to the next, but will follow a Poisson distribution with mean given by the product of the common volume and the number of organisms per ml. of suspension (Fisher, Thornton & Mackenzie, 1922; Meynell & Meynell, 1965). * Present address: Centre International de Recherche sur le Cancer, 16 Avenue Mar6chal Foch, 69, Lyon (6~me), France t Permanent address.

14 T. WILLIAMS Suppose, then, that replicate counts are performed on a series of pairs of plates, producing on the first culture medium a sample sequence of counts, X1, distributed as a Poisson variate with mean #1: symbolically, and on the second medium a sequence )(2, The true e.o.p, must then be defined as X1 ~ Poisson {/~1}; whilst, for any sample pair of counts, we have X2 ~ Poisson {#2}. (2) true e.o.p. = #1/#2, (3) observed e.o.p. -- XI/X~. (4) Equation (3) is a necessary definition inasmuch as (4) converges in probability to (3) as X1 and X~ approach infinity. Convergence in probability (Kendall & Stuart, I96I, chapter I7) is most simply exemplified in a tossing experiment with a true coin, where the observed frequency of appearance of heads approaches o'5 in probability as the number of trials increases without limit. In other words, if extremely large counts were made, it would be most unlikely for X1/X2 to differ appreciably from #1[#2, and it is clearly the latter that the experimenter has in mind when he sets out to measure the e.o.p. The true e.o.p, is a parameter of the system, the observed e.o.p, a statistic; what we are therefore confronted with is the familiar statistical problem of estimating a parameter on the basis of a sample, namely the values X1 and X2. In general, the common practice of taking (4) as a point estimate of (3) may prove misleading. It may, for example, spuriously suggest the existence of a trend in the e.o.p, plotted against an adjustable parameter of the system, when in fact all the variability observed can be accounted for by chance alone, the effects of which, as we shall see in a moment, can be quite considerable even for fairly high pairs of counts. For this reason, it is desirable to give not a point estimate but an interval estimate of the e.o.p. : to state, with 95 % probability of being correct, that the true e.o.p, lies within certain specified limits. One makes such assertions in the expectation that one will, in the long run, prove to be in error in one case in twenty. (I) DETERMINATION OF THE INTERVAL The contour map shown in the accompanying Figure provides such limits. Suppose that the two counts were 2oo and 4oo; a point estimate of the e.o.p, would then be 200/400, or 0"5. To obtain an interval estimate, we read the numerator of the fraction on the vertical logarithmic scale along the left-hand side of the Figure and the denominator on the horizontal logarithmic scale along the bottom. The point (4oo, 20o) so obtained is situated between the e.o.p, contours (the lines lying diagonally athwart the Figure) for 0"3 and 0"5. Passing up and to the right from the point (4oo, 20o) in parallel with the contours, one reaches the e.o.p, scale along the outside of the righthand edge of the Figure at the value 0.42: this is the lower end of the interval. One

Interval estimates for efficiency of plating 15 next interchanges the co-ordinates, obtaining the point (2oo, 400) from which one proceeds parallel to the contours; until coming to the e.o.p, scale along the top edge of the Figure at the value 0"59, which is the upper end of the interval. We may therefore state here with 95% confidence that the true e.o.p, lies between 0"42 and 0"59; the precise limits in this example are, to four decimal places, o.422o and 0-5925, and even the most hurried readings from the Figure should in general be correct to two significant figures. This example emphasizes the need for interval estimates, for even with these rather substantial counts the true e.o.p, may, nineteen times in twenty, be as much as I5"6% lower, or I8-5% higher, than that observed. Using the map, the reader may easily persuade himself how much more pronounced the chance variations become for smaller counts. It will be noticed that the e.o.p, scale of the top edge has been continued down the inside of the right edge as far as I'5, with marks at I.I, 1"15, 1.2, etc. This is because the e.o.p, contours slowly decline from 45 as they rise; therefore, starting from a point above but close to the principal diagonal and moving parallel to the contours, one will occasionally be carried down across the diagonal and finally on to the stretch of the e.o.p, scale in question. Obviously, the converse situation (crossing up through the principal diagonal) can never arise. THE STANDARD ERRORS The most obvious approach is to consider (4) as a statistic which estimates the parameter (3), and to set confidence limits on the latter by calculating the sampling variance of the former; this was done by Irwin (1938) in his note to Miles & Misra's paper. Now, it is easy to show on the strength of (i) and (2) that, when/.1 and/*2 are large, the variance of (4) is given by o-? = ~(~+#2), (5) and so, if we assume approximate normality, we find, with 95 % confidence, Xl #1 ~ 1-96o" x. (6) -!.96o-1 ~< ~ /*~ An inequality like (6) is called pivotal, since, although it is set up as a bounding statement on the statistic, X1/X2, it may be re-east as one about the parameter,/~1/#~. Indeed, one sees immediately from (6) that x~xa I"96 '1 ~< #3/*~ ~ ~+ 1"96 "1' (7) thus providing an interval estimate for the true e.o.p. The difficulty with (7), of course, is that one does not really know o'1, since (5) is in terms of #1 and/~, and if we knew these we would know the true e.o.p, without any further ado. The best we can do here is to use the readings from (I) and (2) as Poisson estimates of/*~ and #2 and replace (5) by xl o-? ~ ~ (xl + x~), (8) the swung dashes designating approximate equality.

I6 T. WILLIAMS Ordinarily, the practice of replacing unknown parameter values by sample estimates is to be frowned upon, since it tends to lead to unrealistically narrow confidence intervals: the classic example is Student's t-distribution, which is based on the sample variance from a normal population and is much broader than the distribution obtaining when the parent variance is known. One would expect here that the Poisson variability inherent in (1) and (2) would be sufficiently great to invalidate (8) as an adequate approximation to (5); curiously enough, however, this does not appear to be the case. Thus, the use of (7) and (8) produces for the example X1 = 2oo, X8 = 40o already discussed, the confidence interval (o.415i, o-5849)--remarkably good agreement with the limits (0.4220, o'5929) cited earlier, when one considers that these were based upon a fiducial argument to be described in the next section. When/fi and #8 are close in value, the skewness of the distribution of X1/X2 becomes appreciable and it is desirable to invoke a normalizing transformation (Kendall & Stuart, chapter 31). This is very simply provided here by the statistic X1/(XI+X2) which, since it is confined to the interval (o, I), is inherently more symmetric. It may be shown that its variance is given by in terms of which we may write the inequality which may be rearranged to yield If, finally, we replace (9) by 0"82 = #1#81(#, +,u2) 3, (9) - t.96o" 8 ~< X1 /q ~< I'96trs, (lo) XI + X8 tq + /z2 ( X1/ [ X 1 + X~]) - 1"96tr 2 ( X1/ [ X 1 + X2]) + l'96tr 8 ~< #1 ~ (II) ( Xs/[X~ + Xz]) + I'96 '2 /~8 ( XJ[X~ + X2])-1"96 '8 ~? ~ xlxd(x~ + xs) 3, (12) we have a pair of equations from which approximate confidence intervals may be computed. In the test case we have been discussing it leads to the limits (o.4197, o'59oo), only slightly different from those obtained from (7) and (8). For more nearly equal X1 and X2, however, the difference between (6) and (1 I) will be more marked, although the latter appears always to be in substantial agreement with estimates derived from the Figure. Hence the reader who distrusts the sort of fiducial argument presented in the following section may construe the map to be no more than a shortcut version of (i1) and 02). THE FIDUCIAL ARGUMENT It is the exception rather than the rule when confidence intervals can be obtained for a parameter by means of a pivotal inequality or through eliminating a scale factor by studentization (Kendall & Stuart, chapter 2o). In general it is necessary to have recourse to a fiducial argument, which rests upon the idea of a prior distribution of parameter values. To see why this is a very natural concept, let us consider more closely the laboratory procedure implicit in plating. It is reasonable to assume that the investigator begins by knowing #1 and/~8 to

Interval estimates for efficiency of plating 17 an order of magnitude; he can then, by making tenfold serial dilutions, always succeed in reducing them to the range (IOO, IOOO). In the event he misjudges, he may either find himself with numbers so small as to vitiate his accuracy, or so great that counting becomes tedious and inaccurate; in either case he will reject the trial and commence afresh, with a clearer idea of the correct values. Since the initial concentrations come randomly from nature, a moment's thought shows that the modus operandi just described should cause the prior distributions of the common logarithms of both #1 and #3 to be uniform (flat) between 2-o and 3"o. In fine, to know the prior distribution is neither more nor less than to know the conditions under which the experiment is performed, without which knowledge we plainly cannot in general hope to make probabilistic predictions. A result known as Fieller's Theorem (Finney, 1952, pp. 28 ft.) is available for setting fiducial limits to a ratio such as (4), by means of the t-distribution. Unfortunately, this requires the existence of a X 2 variance estimate, as well as normality of the underlying distributions, and so is not applicable here. We have therefore to look elsewhere. Clearly #1 and/z~ must both come from the same prior distribution: this is merely to say that the experimental technique employed on both series of plates is the same. Rather than the uniform logarithmic distribution just discussed, another form was found to be more flexible and mathematically convenient. It was assumed that #1 and /z 2 were both exponentially distributed with the same expected value--a necessary restriction since, as we have just noted, the two prior distributions should be identical; the prior expectations actually cancel out in the calculation, so it is immaterial what values they have, so long as they are equal. It is perhaps worth noting that the Figure may even be employed under more general conditions, namely if we assume that the prior distributions of #1 and/~ are both X~, ~ with the same expected value: we have only to add n- I to both counts before reading the e.o.p. The 122, distribution has coefficient of variation (i.e. the standard deviation in units of the mean) equal to I/~/n; thus for n = z5, about 95% of the distribution lies in the range of from 6o % to 14o % of the mean. An investigator who could consistently manage to hold his values of/21 and #3 within such narrow bounds would indeed be performing with remarkable virtuosity. Yet even in this unlikely situation the final interval estimate for the true e.o.p, differs but little from that already obtained in our example; for, altering the co-ordinates to (224, 424) changes the fiducial interval to (o'45, o.62). This robustness, i.e. the fact that the interval is so insensitive to the prior distribution, affords further warrant for the fiducial approach. With X~, prior distributions of #1 and/*2, the posterior distribution of the true e.o.p., incorporating the added information conveyed by the knowledge that the two counts observed were 2"1 and X2, is a/?-distribution of the second kind (Kendall & Stuart, chapter 6), with parameters p = X, + n and q = XI+ n. The Figure is therefore nothing more than a relief map of the 2"5 and 97"5 percentiles of such a distribution. This work was supported by grant no. GM AU 13491--Ol, 2 of the U.S. National Institutes of Health and was begun during the summer of 1966 while the author was visiting professor at the Lister Institute, for whose generous hospitality he is grateful. The computations were performed on the Triangle University Computing Center's IBM 36o computer, which drew the map directly by means of a Calcomp plotter. 2 J. Virol. 2

I8 T. WILLIAMS It is a pleasure for the author to express his thanks to Professor G. G. Meynell, who suggested the problem; to Professor Joseph W. Beard, who commented on several draft versions of the manuscript; and to Elinor A. Ciftan, who supervised the computer programming. REFERENCES ELLIS, E. L. & DELBRUCK, M. (I939). The growth of bacteriophage. J. gen. PhysioL zz, 365. FINNEY, D. J. 0952). Statistical Method in Biological Assay. London: Charles Griffin and Co. Ltd. FISHER, R. A., THORNTON, H. G. & MACKEr~Zm, W. A. (I922). The accuracy of the plating method of estimating the density of bacterial populations. Ann. appl. Biol. 9, 325. IRWlN, J. O. (1938). A note on the standard error of the survival rate of bacteria in blood-bacterium mixtures. J. Hyg., Camb. 38, 748. KENOALL, M. G. & STUART, A. (I96I). The Advanced Theory of Statistics, Vols. I and II. London: Charles Griffin and Co. Ltd. MEYNELL, G. G. & MEYNELL, E. (I965). Theory and Practice in Experimental Bacteriology. Cambridge University Press. MILES, A. A. & MISRA, S. S. (I938). The estimation of the bactericidal power of the blood. J. Hyg., Camb. 38, 732. (Received 20 January 1967)

-f. d-- --k 0'07 008 0,09 0-1 - 0-5 - 0.6-0"7-0-8-0,9 - 'i.o c Numer E) 50- ~ ~s Ng ~, Boo ~ ~.~ 70 80 100 ~ ~-~- ~~ ~ 200 d- ~g -I- ~g g~ -I- Lower limit