GUIDELINES ON REPRESENTATIVE SAMPLING

DRUGS WORKING GROUP

VALIDATION OF THE GUIDELINES ON REPRESENTATIVE SAMPLING

Document type: Validation report
Ref. code: DWG-SGL-001
Issue No.: 002
Issue date: 08 December 2012

Updates of the last version

Date: December 2012
Update: Section 2, Sampling based on the hypergeometric distribution, and the corresponding tables have been withdrawn.
Page: Section 2, Appendix 1, Appendix 2

1 Introduction

This document presents the results of the validation of the tables and sampling software as published in the Guidelines on representative sampling of the ENFSI. Validation is described by the ENFSI as a process of

1. establishing the performance characteristics and limits of a method and the identification of the influences that may change these characteristics and to what extent, and
2. verifying that a method is fit for purpose, i.e. for use for solving a particular analytical problem.

(See Validation and implementation of (new) methods, QCC, ENFSI, 2006).

More specifically, this document validates the three statistical sampling methods based on, respectively, the hypergeometric distribution (Section 2), the binomial distribution (Section 3), and the Bayesian approach (Section 4), as well as the method of estimating the average weight of a drug unit (Section 5). In these sections we verify, in turn, whether the formulas described in the Guidelines are correctly implemented, and whether the tables and the computer software provide the expected results. In Section 6 we discuss the performance characteristics, the limits and the influences on these characteristics. We refer to the Guidelines for a detailed description of these methods, and for further considerations and recommendations.

March 2009, Reinoud Stoel
Netherlands Forensic Institute, The Hague

2 Sampling based on the hypergeometric distribution

Section withdrawn. Please see the new DWG document, reference code: DWG-SGL-002.

3 Sampling based on the binomial distribution

The binomial distribution assumes sampling with replacement. However, when the seizure is very large and the sample is relatively small, the hypergeometric distribution can be approximated by the less complex binomial distribution. Similar to the hypergeometric distribution, the binomial distribution can be used to calculate the required sample size such that it can be stated with at least (1-α)100% confidence that at least a proportion k of the units is positive. Please note that N, the population size, is not a parameter. It should be kept in mind that the binomial distribution is an approximation, and that the sample size will be (slightly) overestimated.

The first step in the validation process is again to check whether the numbers presented in the table correspond to the results obtained from the computer software. All sample sizes as presented in Table 5.3 correspond to the output of the computer software.

The second step is to verify whether the correct formulas are used, and to validate whether the computer software provides correct results. Determination of the minimum required sample size (n) is based on a test of the hypothesis H0: θ ≤ k against H1: θ > k, where θ = N1/N. To select n, the equation to be solved is

$$P(X \ge x \mid \theta = k) = \sum_{i=x}^{n} \binom{n}{i}\,\theta^{i}(1-\theta)^{n-i} \le \alpha$$

So, given that θ = k, the required minimal sample size (n) is the smallest value for which P(X ≥ x | θ = k) ≤ α.
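The search for the smallest n can be sketched in a few lines of Python. This is only an illustration of the criterion stated above, not the code of the validated Excel software; the function names `binomial_upper_tail` and `min_sample_size`, the `negatives` argument (the number n - x of sampled units allowed to be negative) and the search limit `n_max` are assumptions made for this example.

```python
from math import comb


def binomial_upper_tail(x: int, n: int, theta: float) -> float:
    """Return P(X >= x) for X ~ Binomial(n, theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta) ** (n - i) for i in range(x, n + 1))


def min_sample_size(k: float, alpha: float, negatives: int = 0, n_max: int = 10_000) -> int:
    """Smallest n for which P(X >= n - negatives | theta = k) <= alpha.

    `negatives` and `n_max` are illustrative parameters, not names from the Guidelines.
    """
    for n in range(negatives + 1, n_max + 1):
        if binomial_upper_tail(n - negatives, n, k) <= alpha:
            return n
    raise ValueError("no sample size up to n_max satisfies the criterion")


if __name__ == "__main__":
    # 95% confidence, no negatives expected
    for k in (0.5, 0.7, 0.9):
        print(k, min_sample_size(k, alpha=0.05))  # 5, 9 and 29 respectively
```

For k = .5, .7 and .9 with α = .05 and no negatives, the sketch returns n = 5, 9 and 29, the sample sizes that also appear in Table 5 of the Appendix.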

When all sampled drug units are expected to contain drugs (i.e. x = n), X is distributed as a binomial random variable, X ~ BIN(x, n, θ), resulting in

$$P(X \ge x \mid \theta = k,\; x = n) = \theta^{n} \tag{4}$$

In this case the minimum required sample size is readily obtained by solving

$$n \ge \frac{\log \alpha}{\log \theta} \tag{5}$$

When at most one sampled drug unit is expected not to contain drugs (i.e. x ≥ n-1), X is distributed as a mixture of two binomial random variables:

$$P(X \ge x \mid \theta = k,\; x \ge n-1) = \theta^{n} + \binom{n}{1}\theta^{n-1}(1-\theta) \tag{6}$$

When at most two sampled drug units are expected not to contain drugs (i.e. x ≥ n-2), X is distributed as a mixture of three binomial random variables:

$$P(X \ge x \mid \theta = k,\; x \ge n-2) = \theta^{n} + \binom{n}{1}\theta^{n-1}(1-\theta) + \binom{n}{2}\theta^{n-2}(1-\theta)^{2} \tag{7}$$

3.1 Results

The software implements Formula 4, 6 and 7 by making use of the standard formulas for the binomial distribution of MS Excel 2003. After having removed the protection of the software, it is concluded that the formulas used for computing the probabilities for x = n, x ≥ n-1 and x ≥ n-2 are equivalent to, respectively, Formula 4, 6, and 7. Furthermore, the required sample size (n) is based on the correct cut-off (i.e. P(X ≥ x | θ = k) ≤ α).

Further validation of the software is performed by comparing the probabilities of the software (P_software) with those of hand calculations (P_hand) using Formula 4, 6, and 7, as well as with calculation by means of the binomial distribution function implemented in the computer software R (P_R). For a selected number of combinations of k, (1-α)100%, and n-x, the probabilities are computed and compared. Validation is successful if

P_software = P_hand = P_R

Table 2 in the Appendix presents the results. As expected, based on the equivalence of the formulas, all computed probabilities are equivalent given the parameters.

3.2 Conclusion

The binomial approach to sampling, as described by the Guidelines on representative sampling, is correctly implemented in the software.
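The same kind of cross-check can be repeated outside Excel and R. The sketch below compares the closed forms for zero, one and two negatives (written out as θ^n, θ^n + nθ^(n-1)(1-θ), and that sum plus C(n,2)θ^(n-2)(1-θ)^2) with a generic binomial tail sum, and prints 1 - P in the layout of Table 2. It is an illustrative re-computation under these assumptions; all function names are hypothetical.

```python
from math import comb


def tail_sum(x: int, n: int, theta: float) -> float:
    """Generic P(X >= x) for X ~ Binomial(n, theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta) ** (n - i) for i in range(x, n + 1))


def closed_form(n: int, theta: float, negatives: int) -> float:
    """Closed-form tail for 0, 1 or 2 negatives (Formula 4, 6 and 7 as read here)."""
    p = theta**n                                   # all n units positive
    if negatives >= 1:
        p += n * theta ** (n - 1) * (1 - theta)    # exactly one negative
    if negatives >= 2:
        p += comb(n, 2) * theta ** (n - 2) * (1 - theta) ** 2  # exactly two negatives
    if negatives > 2:
        raise ValueError("closed forms are only written out for 0, 1 or 2 negatives")
    return p


if __name__ == "__main__":
    for k, n in ((0.5, 3), (0.7, 20), (0.9, 30)):
        for negatives in (0, 1, 2):
            by_formula = closed_form(n, k, negatives)
            by_sum = tail_sum(n - negatives, n, k)
            assert abs(by_formula - by_sum) < 1e-12
            print(k, negatives, n, round(1 - by_formula, 3))
```

Agreement between the two routes to within floating-point precision mirrors the criterion P_software = P_hand = P_R used in the validation.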

4 Sampling based on the Bayesian approach

The Bayesian approach assumes that, although the population proportion is not known, there may be some ideas about the size of this proportion. These ideas are represented by a probability distribution P(θ). Instead of estimating P(X ≥ x | θ = k), the Bayesian approach directly estimates P(θ > k | x, n).

4.1 Seizures containing > 50 units (with relatively small samples)

Select a sample size n such that P(θ > k | x, n) = (1-α)100%. θ follows a beta distribution with parameters x+a and n-x+b:

$$f(\theta \mid x, n, a, b) = \frac{\theta^{x+a-1}(1-\theta)^{n-x+b-1}}{B(x+a,\; n-x+b)} \tag{8}$$

Selection of the parameters a and b should be based on prior information about the content(s) of the units.

4.2 Seizures containing < 50 units

For small seizures it may be better to use the number of positives, Y, in the unexamined units. Select a sample size n such that P(Y > y | x, n, N) = (1-α)100%. Y follows a beta-binomial distribution:

$$f(Y = y \mid x, n, N, a, b) = \binom{N-n}{y}\frac{\Gamma(n+a+b)\,\Gamma(y+x+a)\,\Gamma(N-x-y+b)}{\Gamma(x+a)\,\Gamma(n-x+b)\,\Gamma(N+a+b)} \tag{9}$$

4.3 Results

4.3.1 > 50 units

Please note that the software uses N ≥ 50 instead of N > 50. The software implements Formula 8 by making use of the standard formulas for the Beta distribution of MS Excel 2003.

The first step in the validation process is to check whether the numbers presented in the table correspond to the results obtained from the computer software. All sample sizes as presented in Table 5.4 correspond to the output of the computer software.
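As an illustration of the posterior probability P(θ > k | x, n), the sketch below evaluates the tail of a Beta(x+a, n-x+b) distribution with the Python standard library only. It relies on the identity I_q(α, β) = P(B ≥ α) for B ~ Binomial(α+β-1, q), which holds for positive integer α and β, so it is restricted to integer prior parameters a and b (it does not cover a = b = .5). The function names are assumptions made for this example.

```python
from math import comb


def beta_cdf_integer(q: float, alpha: int, beta: int) -> float:
    """Beta(alpha, beta) distribution function at q, for positive integer parameters.

    Uses I_q(alpha, beta) = P(B >= alpha) with B ~ Binomial(alpha + beta - 1, q).
    """
    if not (isinstance(alpha, int) and isinstance(beta, int) and alpha > 0 and beta > 0):
        raise ValueError("this sketch only handles positive integer parameters")
    m = alpha + beta - 1
    return sum(comb(m, j) * q**j * (1 - q) ** (m - j) for j in range(alpha, m + 1))


def posterior_prob_above(k: float, x: int, n: int, a: int = 1, b: int = 1) -> float:
    """P(theta > k | x, n) when theta | x, n ~ Beta(x + a, n - x + b)."""
    return 1.0 - beta_cdf_integer(k, x + a, n - x + b)


if __name__ == "__main__":
    # Three positives in a sample of three, uniform prior (a = b = 1)
    print(round(posterior_prob_above(0.5, x=3, n=3), 4))  # 0.9375
```

With a uniform prior and three positives out of three, the posterior is Beta(4, 1) and the result is 1 - .5^4 = .9375, the first entry of Table 3 in the Appendix.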

Further validation of the software is performed by comparing the probabilities of the software (P_software) with those calculated by means of the Beta distribution function implemented in the computer software R (P_R). For a selected number of combinations of k, n, a, b and the number of negatives, the probabilities are computed and compared. Validation is successful if

P_software = P_R

Table 3 in the Appendix presents the results. As expected, based on the equivalence of the formulas, all computed probabilities are equivalent given the parameters.

4.3.2 < 50 units

The software implements Formula 9 by making use of the standard formulas for the Beta distribution of MS Excel 2003. Further validation of the software is performed by comparing the probabilities of the software (P_software) with those calculated by means of the Gamma distribution function implemented in the computer software R (P_R). For a selected number of combinations of k, n, a, b and the number of negatives, the probabilities are computed and compared. Validation is successful if

P_software = P_R

Table 4 in the Appendix presents the results. As expected, based on the equivalence of the formulas, all computed probabilities are equivalent given the parameters.

4.4 Conclusion

The Bayesian approach to sampling, as described by the Guidelines on representative sampling, is correctly implemented in the software.
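For small seizures, the beta-binomial probabilities can be re-computed with log-gamma functions, as sketched below. The sketch assumes that Y counts the positives among the N - n unexamined units and that the tabled probability is the tail P(Y ≥ y_min), with y_min the smallest count for which the overall proportion of positives is at least k (11 for N = 20, k = .7, x = 3; 27 for N = 49, k = .9, x = 18). That reading of Table 4, and the function names, are assumptions of this example.

```python
from math import comb, exp, lgamma, log


def beta_binomial_pmf(y: int, x: int, n: int, N: int, a: float, b: float) -> float:
    """P(Y = y | x, n, N, a, b): y positives among the N - n unexamined units."""
    log_p = (
        log(comb(N - n, y))
        + lgamma(n + a + b) + lgamma(y + x + a) + lgamma(N - x - y + b)
        - lgamma(x + a) - lgamma(n - x + b) - lgamma(N + a + b)
    )
    return exp(log_p)


def prob_at_least(y_min: int, x: int, n: int, N: int, a: float, b: float) -> float:
    """P(Y >= y_min | x, n, N, a, b), summing the probability mass function."""
    return sum(beta_binomial_pmf(y, x, n, N, a, b) for y in range(max(y_min, 0), N - n + 1))


if __name__ == "__main__":
    # N = 20, three positives in a sample of three, uniform prior, k = .7
    print(round(prob_at_least(11, x=3, n=3, N=20, a=1, b=1), 4))
    # N = 49, 18 positives in a sample of 20, a = 1, b = 3, k = .9
    print(round(prob_at_least(27, x=18, n=20, N=49, a=1, b=3), 4))
```

Working on the log scale avoids overflow in the Gamma terms, and because `lgamma` accepts non-integer arguments the sketch also handles priors such as a = b = .5.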

5 Estimation of weight

A procedure is provided for the estimation of a (1-α)100% confidence interval of the weight of a drug unit, and of the total weight in the seizure ($N\bar{X}$):

$$\frac{n-r}{n}N\bar{X} - \frac{n-r}{n}\,N\,\frac{s}{\sqrt{n}}\,t_{\alpha}\sqrt{\frac{N-n}{N}} \;\le\; W \;\le\; \frac{n-r}{n}N\bar{X} + \frac{n-r}{n}\,N\,\frac{s}{\sqrt{n}}\,t_{\alpha}\sqrt{\frac{N-n}{N}} \tag{10}$$

where $\sqrt{(N-n)/N}$ is the finite population correction factor and $(n-r)/n$ is the correction factor for the number of sampled units without drugs. $(n-r)/n$ is actually the proportion of drug units in the sample, and $N(n-r)/n$ is an estimate of the total number of drug units in the population. Please note that the uncertainty in $N(n-r)/n$ is not taken into account (see also Aitken & Lucy, 2002).

Inequality 10 is based on Tzidony & Ravreby (1992, p.1546) (see also Aitken & Lucy, 2002, p.3); however, it seems to contain an error. Since there are r units without drugs, the effective sample size on which the estimation of the total weight is based should equal (n-r). This has consequences for both the standard error and the degrees of freedom of the corresponding t-distribution, which should equal (n-r-1) instead of (n-1), resulting in a slightly higher critical value $t^{*}_{\alpha}$. Inequality 10 should, therefore, be rewritten into:

$$\frac{n-r}{n}N\bar{X} - \frac{n-r}{n}\,N\,\frac{s}{\sqrt{n-r}}\,t^{*}_{\alpha}\sqrt{\frac{N-n}{N}} \;\le\; W \;\le\; \frac{n-r}{n}N\bar{X} + \frac{n-r}{n}\,N\,\frac{s}{\sqrt{n-r}}\,t^{*}_{\alpha}\sqrt{\frac{N-n}{N}} \tag{11}$$

Based on the above results it is advised to correct the formula in the booklet and the program (see Stoel & Bolck, in press).

5.1 Conclusion

Although the formulas used in the program are not optimal, they are correctly implemented. That is, the program produces the results as expected from the formulas provided in the booklet.

Please note again that the uncertainty in P_corr is not taken into account. The confidence interval is, as a consequence, interpreted conditionally on $\hat{P}$. Incorporating the uncertainty in $\hat{P}$ will result in a complex confidence interval, with the degrees of freedom of the t-distribution being a random variable. Alberink, Bolck and Stoel (submitted) performed a simulation study and showed that the confidence intervals are indeed not optimal, since they are based on an underestimation of the variation of the underlying statistical process. They present two other formulas which appear more reliable and asymptotically correct, in particular in situations where the proportion of positives becomes increasingly smaller than 1. Future research is needed to investigate these new confidence intervals and to contrast them with the Bayesian approach to weight estimation (see Aitken & Lucy, 2002).

It is advised to correct the formulas in the booklet as described by Stoel and Bolck (2009, in press), and to add a note that the resulting interval is not optimal if the proportion of positives is substantially smaller than 1. Furthermore, the Bayesian approach to weight estimation could also be considered for inclusion in the software, but this will require a lot of additional effort.
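The effect of the proposed correction can be illustrated with a small calculation. The sketch below assumes that the corrected interval has the form point estimate ± half-width, with point estimate (n-r)/n · N · x̄ and half-width (n-r)/n · N · s/√(n-r) · t* · √((N-n)/N), where t* is the critical value of a t-distribution with n-r-1 degrees of freedom. That form is this example's reading of the corrected inequality, and the function name, argument names and input values are hypothetical. The critical value is passed in by the caller because the Python standard library has no t quantile function.

```python
from math import sqrt


def total_weight_interval(N: int, n: int, r: int, mean: float, sd: float, t_crit: float):
    """Confidence interval for the total weight W, conditional on (n - r) / n.

    N: seizure size, n: sample size, r: sampled units without drugs,
    mean, sd: mean and standard deviation of the unit weights,
    t_crit: t critical value with n - r - 1 degrees of freedom (supplied by the caller).
    """
    if not 0 <= r < n <= N:
        raise ValueError("need 0 <= r < n <= N")
    p_corr = (n - r) / n                      # proportion of drug units in the sample
    fpc = sqrt((N - n) / N)                   # finite population correction factor
    point = p_corr * N * mean
    half_width = p_corr * N * sd / sqrt(n - r) * t_crit * fpc
    return point - half_width, point + half_width


if __name__ == "__main__":
    # Made-up numbers: 100 units seized, 10 sampled, 2 without drugs.
    # 2.365 is the two-sided 95% critical value for 7 degrees of freedom.
    low, high = total_weight_interval(N=100, n=10, r=2, mean=1.0, sd=0.1, t_crit=2.365)
    print(round(low, 2), round(high, 2))
```

As the text notes, an interval of this kind is conditional on the observed proportion of positives and does not account for the uncertainty in that proportion.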

6 Discussion

This document described the validation of the tables and software as presented in the booklet Guidelines on representative sampling of the ENFSI. Based on the results regarding the correctness of the probabilities produced by the program, it is concluded that the sample sizes resulting from the hypergeometric, the binomial and the Bayesian approach are also correct.

Regarding the weight estimation, some further attention should be given to an error in one of the formulas. As far as we can judge, the error can be attributed to a paper by Tzidony & Ravreby (1992) and has subsequently persisted in the literature. Stoel & Bolck (2009, in press) have proposed a correction to this formula (see also Section 5). From a statistical point of view it is advised to implement this correction in the booklet as well as in the software, although the consequences in practice will not be large.

One issue remains to be discussed. In the frequentist approaches, sample size is determined based on a maximum value of the Type I error. That is, the probability of rejecting the null hypothesis while it is true should have a value smaller than, or equal to, α100%. Thus, in selecting an appropriate sample size the focus is on preventing the conclusion that the proportion of units containing drugs is greater than k, while in the population it is not. This approach ignores the probability of a Type II error, which is the probability of not rejecting the null hypothesis while it is false. In other words, the approach described in the booklet may not be the optimal approach from a statistical point of view; however, it may well be that the Type I error is the most important type of error in drug sampling. We leave the answer to this question to the experts in the field.

As an illustration of the possible consequences of ignoring the Type II error, suppose a very large seizure is encountered, and the binomial approach is used for computing the sample size. No negatives are expected, (1-α)100% = 95%, and k = .5. The required sample size is then equal to n = 5. Table 5 presents the resulting probability of a Type II error for several values of the true proportion of units containing drugs. It becomes clear that the probability of a Type II error is relatively large, and much larger than the commonly advised level of .2 (see Cohen, 1988). The Type II error depends on the true proportion of units containing drugs in the population.
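The Type II error in this illustration can be computed directly. The sketch below assumes that the null hypothesis is rejected only when at least n - negatives of the n sampled units are positive, so that the Type II error at a true proportion θ equals P(X < n - negatives | θ), which reduces to 1 - θ^n when no negatives are allowed. The function name and arguments are illustrative.

```python
from math import comb


def type_ii_error(true_theta: float, n: int, negatives: int = 0) -> float:
    """P(H0 is not rejected | true proportion), i.e. P(X < n - negatives)."""
    x = n - negatives
    reject = sum(
        comb(n, i) * true_theta**i * (1 - true_theta) ** (n - i) for i in range(x, n + 1)
    )
    return 1.0 - reject


if __name__ == "__main__":
    # k = .5, alpha = .05, no negatives expected: n = 5
    for theta in (0.51, 0.6, 0.7, 0.8, 0.9, 0.99):
        print(theta, round(type_ii_error(theta, n=5), 2))
```

For n = 5 this gives .97, .92, .83, .67, .41 and .05 for true proportions of .51, .6, .7, .8, .9 and .99, in line with the k = .5 rows of Table 5.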

7 Literature

Aitken, C.G.G. & Lucy, D. (2002). Estimation of the quantity of a drug in a consignment from measurements on a sample. J Forensic Sci 47: 968-975.

Alberink, I., Bolck, A. & Stoel, R.D. (submitted). Comparison of frequentist methods for estimating the total weight of consignments of drugs. Submitted for publication to J Forensic Sci.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Second Edition. New Jersey: Lawrence Erlbaum Associates.

Stoel, R.D. & Bolck, A. (2009, in press). A correction to Tzidony and Ravreby (1992): A statistical approach to drug sampling: A case study. J Forensic Sci.

Tzidony, D. & Ravreby, M. (1992). Statistical approach to drug sampling: A case study. J Forensic Sci 37: 1541-1549.

Appendix 1

Table 1: withdrawn

Table 2: probabilities, 1-P(X ≥ x | θ = k), of the binomial distribution as computed with the software, by hand, and with R.

k     n-x   n     1-P_software   1-P_hand   1-P_R
.5    0     3     .875           .875       .875
.5    1     3     .500           .500       .500
.5    2     3     .125           .125       .125
.7    0     20    .999           .999       .999
.7    1     20    .992           .992       .992
.7    2     20    .965           .965       .965
.9    0     30    .958           .958       .958
.9    1     30    .816           .816       .816
.9    2     30    .589           .589       .589

Note: the tabled probabilities are computed in R by means of the function: 1-pbinom(x-1, n, k)

Table 3: probabilities, P(θ > k | x, n), of the beta distribution as computed with the software and with R.

k     x     n     a     b     P_software   P_R
.5    3     3     1     1     .9375        .9375
.5    2     3     1     1     .6875        .6875
.5    1     3     1     1     .3125        .3125
.7    3     3     1     1     .7599        .7599
.7    2     3     1     1     .3483        .3483
.5    10    10    .5    .5    .9998        .9998
.5    9     10    .5    .5    .9963        .9963
.9    19    20    1     3     .1927        .1927
.9    18    20    1     3     .0731        .0731

Note: the tabled probabilities are computed in R by means of the function: 1-pbeta(x+a, n-x+b)

Table 4: probabilities, P(Y > y | x, n, N), of the beta-binomial distribution as computed with the software and with R.

N     k     x     n     a     b     P_software   P_R
20    .7    3     3     1     1     .8327        .8327
49    .9    18    20    1     3     .1258        .1258

Note: the tabled probabilities are computed in R by means of the function:

Table 5: probability of a Type II error

k     n     Type I error (α)   True proportion of drugs   Type II error
.5    5     .05                .51                        .97
.5    5     .05                .6                         .92
.5    5     .05                .7                         .83
.5    5     .05                .8                         .67
.5    5     .05                .9                         .41
.5    5     .05                .99                        .05
.7    9     .05                .75                        .96
.7    9     .05                .8                         .92
.7    9     .05                .9                         .86
.7    9     .05                .99                        .06
.9    29    .05                .95                        .77
.9    29    .05                .99                        .25

Appendix 2

Withdrawn.