International Journal of Pure and Applied Mathematics
Volume 21, No. 3, 2005, 387-394

THE VARIANCE OF SAMPLE VARIANCE FROM A FINITE POPULATION

Eungchun Cho (1), Moon Jung Cho (2), John Eltinge (3)

(1) Department of Mathematics and Sciences, Kentucky State University, 400 E. Main Street, Frankfort, KY 40601, USA, e-mail: eccho@gwmail.kysu.edu (correspondence author)
(2, 3) Office of Survey Methods Research, Bureau of Labor Statistics, Washington, DC 20212, USA, e-mail: Cho.Moon@bls.gov, Eltinge.John@bls.gov

Received: May 10, 2005. © 2005, Academic Publications Ltd.

Abstract: A formula for the variance of the sample variance over all samples taken from a finite population is given in terms of the second and fourth moments of the population. Closed-form expressions are derived for finite populations following the discrete uniform and the binomial distributions, and numerical values are given for populations of uniform, binomial, normal, Poisson and hypergeometric distributions.

AMS Subject Classification: 62D05, 62J10
Key Words: variance of sample variance, variance estimator, sampling variance, randomization variance, moments

1. Introduction

For an arbitrary finite population, one can establish the sample mean as an unbiased estimator for the population mean and evaluate the randomization variance of the sample mean. One can subsequently use similar developments for related randomized designs, for example, stratified random sampling and

some forms of cluster sampling. In applications of this approach to practical problems, it is often important to evaluate the variance of the variance estimator. When the variance of a finite population is estimated from the variance of a sample, it is important to know the relationship between the variance of the population and that of the sample, and between the sizes of the population and the sample. Though the sample variance is an unbiased estimator of the population variance, it is problematic if the variance of the sample variance is too large. Some cluster sample designs, for example, may be considered problematic if the resulting variance estimator is unstable, that is, if the variance estimator has an unreasonably large variance. Historically, introductory sampling textbooks have addressed this issue in a relatively limited form, either through a brief reference to the elaborate algebra of polykays developed by Tukey [4, pp. 37-54] and Wishart [8, pp. 1-13], or through a direct appeal to large-sample approximations based on the normal and chi-square distributions. See Cochran [1, pp. 29, 96] for examples of these two approaches. In addition, the statistical literature has developed computational tools to implement the above-mentioned polykay results. However, a simple formula that can be directly coded in programming languages or in computer algebra systems (such as Maple or Mathematica) does not exist in the literature. We present a relatively simple formula for the variance of the sample variance. The formula is derived directly by considering all possible samples (without replacement) from a finite population. We show the application of the formula in the special cases when the population follows the uniform, binomial, normal, Poisson, or hypergeometric distribution.

2. Functions of Random Samples

Consider a finite population $A$ of $N$ numbers and the list $L_{n,A}$ of all $\binom{N}{n}$ possible samples of size $n$ selected without replacement from $A$.
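The objects just introduced are easy to enumerate for a toy population. The following sketch (an editorial illustration, not part of the original paper; the population values are arbitrary) lists $L_{n,A}$ and checks its length against $\alpha = \binom{N}{n}$ using only Python's standard library:

```python
# Enumerate L_{n,A}, the list of all without-replacement samples of
# size n from A, and compare its length with alpha = C(N, n).
from itertools import combinations
from math import comb

A = [1, 2, 3, 4, 5]              # toy population, N = 5
n = 3
L_nA = list(combinations(A, n))  # the list L_{n,A}
alpha = comb(len(A), n)          # N! / (n! (N - n)!)

print(len(L_nA), alpha)          # -> 10 10
```

Each element of `L_nA` plays the role of one sample $S_j$, and simple random sampling without replacement amounts to picking one of these tuples with probability $1/\alpha$.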
One can select a without-replacement simple random sample of size $n$ from $A$ by selecting one element of $L_{n,A}$ in such a way that each sample $S_j$ has the same probability $1/\alpha$ of being selected. Two prominent examples of functions $f$ defined on $L_{n,A}$ are the sample mean and the sample variance. Evaluating the randomization properties of $f(S)$ for $S \in L_{n,A}$ is conceptually straightforward. For example, $E(f(S))$, the expected value of $f(S)$, is obtained by computing its arithmetic average taken over the $\alpha$ equally likely samples in $L_{n,A}$, and $V(f(S))$, the variance of $f(S)$, is

defined to be the expectation of the squared deviations, $E\big((f(S) - E(f(S)))^2\big)$:

$$E(f(S)) = \frac{1}{\alpha} \sum_{S \in L_{n,A}} f(S), \qquad V(f(S)) = \frac{1}{\alpha} \sum_{S \in L_{n,A}} \big(f(S) - E(f(S))\big)^2.$$

3. The Variance of Sample Variance

Let $S$ be a sample of size $n$ from $A$, and let $m(S)$ and $v(S)$ denote the mean and the variance of $S$. Let $\mu$ and $v(A)$ denote the mean of $A$ and the full finite-population analogue of the sample variance:

$$m(S) = \frac{1}{n} \sum_{a_i \in S} a_i, \qquad v(S) = \frac{1}{n-1} \sum_{a_i \in S} (a_i - m(S))^2, \tag{1}$$

$$\mu = \frac{1}{N} \sum_{i=1}^{N} a_i, \qquad v(A) = \frac{1}{N-1} \sum_{i=1}^{N} (a_i - \mu)^2. \tag{2}$$

A routine argument (e.g., Cochran [1], Theorems 2.1, 2.2 and 2.4) shows

$$E\{m(S)\} = \mu, \qquad E\{v(S)\} = v(A), \qquad v\{m(S)\} = \frac{1}{n} \frac{N-n}{N} v(A). \tag{3}$$

Obviously, $v(S)$ is an unbiased estimator of $v(A)$. We present a simple expression for the variance of $v(S)$,

$$v(v(S)) = \frac{1}{\alpha} \sum_{S \in L_{n,A}} \{v(S) - v(A)\}^2, \tag{4}$$

in terms of $n$, $N$, and the moments of $A$. We note that any direct computation of the variance of $v(S)$ from Definition (4) is impractical, due to the large value of $\alpha$, unless the size $N$ of the population is small.

4. The Main Formula

Notation:

$$A = [a_1, a_2, \ldots, a_N], \qquad L_{n,A} = [S_1, S_2, \ldots, S_\alpha], \qquad V_n = [v(S_1), v(S_2), \ldots, v(S_\alpha)],$$

$$v(S) = \frac{1}{n-1} \sum_{a_i \in S} \Big( a_i - \frac{1}{n} \sum_{a_j \in S} a_j \Big)^2,$$
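For small $N$, Definition (4) can be evaluated by brute force, which makes a useful reference point for the closed-form results that follow. A minimal sketch (an editorial illustration, not from the paper), using only the standard library:

```python
# Brute-force v(v(S)): the average squared deviation of the sample
# variance v(S) from its expectation v(A), over all C(N, n) samples.
from itertools import combinations
from statistics import variance   # uses the n - 1 divisor, as in (1)

def v_of_v_bruteforce(A, n):
    vs = [variance(S) for S in combinations(A, n)]
    alpha = len(vs)
    Ev = sum(vs) / alpha          # equals v(A), by (3)
    return sum((v - Ev) ** 2 for v in vs) / alpha

# For A = [1, 2, 3, 4] and n = 3, the four sample variances are
# 1, 7/3, 7/3, 1 around v(A) = 5/3, so v(v(S)) = 4/9.
print(v_of_v_bruteforce([1, 2, 3, 4], 3))   # -> 0.444... (= 4/9)
```

The cost grows like $\binom{N}{n}$, which is exactly the impracticality noted above for all but small populations.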

390 E. Cho, M.J. Cho, J. Eltinge v vs)) = E [vs) E{vS)}] 2), µ 2 = 1 N a i µ) 2, α = i=1 µ 4 = 1 N ) N = n a i µ) 4, i=1 N! n!n n)!. The following lemma expresses the variance of sample variance vvs)) in terms of all the fourth order products of the elements in A, i.e., a i 4, a i 2 a j 2, a i 2 a j a k, a i 3 a j, and a i a j a k a l. The terms in the formula are combined so that they appear only once. For example, a i a j a k a l appears only for the indices arranged in increasing order i < j < k < l. where and Lemma 1. v vs)) = C C 1 s 4 + C 2 s 31 + C 3 s 22 + C 4 s 211 + C 5 s 1111 ), N n C = N 1) 2 N 2 n, C 1 = N 1) 2, 2Nn 3N 3n + 3) C 2 = 4N 1), C 3 =, n 1) 82Nn 3N 3n + 3) 482Nn 3N 3n + 3) C 4 =, C 5 = N 2) n 1) N 2) N 3) n 1) s d = s 31 = s 211 = a d i, d = 1,2,3,4, i a 3 i a j, s 22 = i j i j,i k j<k a 2 ia 2 j, i<j a 2 ia j a k, s 1111 = i<j<k<l a i a j a k a l. Proof. Expand the equation 4), then determine the coefficients C i of the terms a i 4, a i 2 a j 2, a i 2 a j a k, a i 3 a j, and a i a j a k a l that appears in the summation. We assume, to avoid trivial cases, N 4 and 3 n N 1. Since the variance is invariant under the shifting by a constant, we will assume µ is zero by shifting every element of A by µ. That µ = 0 simplifies the formula substantially.,

Theorem 1. The variance of sample variance $v(v(S))$ for $S \in L_{n,A}$ is a linear combination of $\mu_4$ and $\mu_2^2$,

$$v(v(S)) = a_1 \mu_4 - a_2 \mu_2^2, \tag{5}$$

with coefficients $a_1$ and $a_2$ that are rational expressions in $N$ and $n$:

$$a_1 = \frac{N (N-n) (Nn - N - n - 1)}{(N-3)(N-2)(N-1)(n-1) n}, \qquad a_2 = \frac{N (N-n) (N^2 n - 3n - 3N^2 + 6N - 3)}{(N-3)(N-2)(N-1)^2 (n-1) n}.$$

Factoring a common factor out of $a_1$ and $a_2$,

$$v(v(S)) = c \, (b_1 \mu_4 - b_2 \mu_2^2), \tag{6}$$

where

$$c = \frac{N (N-n)}{(N-3)(N-2)(N-1)(n-1) n}, \qquad b_1 = Nn - N - n - 1, \qquad b_2 = \frac{N^2 n - 3n - 3N^2 + 6N - 3}{N-1}.$$

Corollary 1. If $N$ is very large compared to $n$, then

$$v(v(S)) \approx \frac{1}{n} \Big( \mu_4 - \frac{n-3}{n-1} \mu_2^2 \Big).$$

Proof. $\lim_{N \to \infty} a_1 = 1/n$ and $\lim_{N \to \infty} a_2 = (n-3)/((n-1) n)$.

Corollary 2. If $n = N-1$, as in the jackknife setup, then

$$v(v(S)) = \Big( \frac{N}{(N-1)(N-2)} \Big)^2 (\mu_4 - \mu_2^2).$$

Proof. Simple substitution of $N-1$ for $n$ in formula (5).

Proof of Theorem 1. A sketch of the proof is given. The sums $s_{31}$, $s_{22}$, $s_{211}$ and $s_{1111}$ can be expressed in terms of $s_2$ and $s_4$ by routine expansion and simplification, together with the fact that $\sum_{i=1}^{N} a_i = 0$ (recall that $\mu = 0$):

$$s_{31} = -s_4, \qquad s_{22} = \tfrac{1}{2} (s_2^2 - s_4), \qquad s_{211} = -\tfrac{1}{2} (s_2^2 - 2 s_4), \qquad s_{1111} = \tfrac{1}{8} (s_2^2 - 2 s_4).$$

Substituting these relations into the equation of the Lemma, we obtain formula (5) of Theorem 1.
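Formula (5) is the kind of expression that can be coded directly, which is the motivation stated in the Introduction. A short sketch (an editorial illustration, not from the paper) that also checks Corollary 2 in the jackknife case $n = N - 1$ on a small population:

```python
# v(v(S)) via Theorem 1, formula (5): a1 * mu4 - a2 * mu2^2.
def v_of_v_formula(A, n):
    N = len(A)
    mu  = sum(A) / N
    mu2 = sum((a - mu) ** 2 for a in A) / N   # second central moment
    mu4 = sum((a - mu) ** 4 for a in A) / N   # fourth central moment
    a1 = (N * (N - n) * (N * n - N - n - 1)
          / ((N - 3) * (N - 2) * (N - 1) * (n - 1) * n))
    a2 = (N * (N - n) * (N ** 2 * n - 3 * n - 3 * N ** 2 + 6 * N - 3)
          / ((N - 3) * (N - 2) * (N - 1) ** 2 * (n - 1) * n))
    return a1 * mu4 - a2 * mu2 ** 2

# Corollary 2 check for A = [1, ..., 5] with n = N - 1 = 4:
A = [1, 2, 3, 4, 5]
N = len(A)
mu2 = sum((a - 3) ** 2 for a in A) / N        # = 2.0
mu4 = sum((a - 3) ** 4 for a in A) / N        # = 6.8
lhs = v_of_v_formula(A, N - 1)
rhs = (N / ((N - 1) * (N - 2))) ** 2 * (mu4 - mu2 ** 2)
print(lhs, rhs)   # both equal 70/144 = 0.4861...
```

The function assumes $N \geq 4$ and $3 \leq n \leq N - 1$, matching the non-trivial cases treated above.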

Example 1. (Discrete Uniform Distribution) Let $A$ be the population of $N$ numbers distributed uniformly on the interval $[0,1]$,

$$A = \Big[ \frac{i}{N-1} : i = 0, \ldots, N-1 \Big].$$

It follows from the definition that

$$\mu = \frac{1}{2}, \qquad \mu_2 = \frac{N+1}{12(N-1)}, \qquad \mu_4 = \frac{(N+1)(3N^2 - 7)}{240 (N-1)^3}.$$

Substitution of $\mu_2$ and $\mu_4$ into formula (5) of the theorem and taking the limit as $N$ approaches $+\infty$ gives

$$v(v(S)) \approx \frac{2n + 3}{360 \, (n-1) \, n}.$$

Example 2. (Binomial Distribution) Let $A$ be the population of $N = 2^k$ integers between $0$ and $k$ such that the integer $r$ appears in $A$ exactly $\binom{k}{r}$ times. It follows from the definition that

$$\mu = \frac{k}{2}, \qquad \mu_2 = \frac{k}{4}, \qquad \mu_4 = \frac{3}{16} k^2 - \frac{1}{8} k.$$

If $N$ is large, $a_1 \approx 1/n$ and $a_2 \approx (n-3)/(n(n-1))$, and

$$v(v(S)) \approx \frac{k}{8n} \Big( \frac{n}{n-1} k - 1 \Big).$$

Example 3. (Numerical Examples) In these examples, $A$ is a population of size 2048 generated by a random variable with the given distribution (using built-in random number generators in Matlab), and the sample size is 32.

Uniform Distribution. $A$ is a population generated by a random variable uniformly distributed on $[0,1]$:

$$\mu = 0.5059, \qquad \mu_2 = 0.0843, \qquad \mu_4 = 0.0126, \qquad v(v(S)) = 0.0006.$$

Note that $\mu = 0.5$, $\mu_2 \approx 0.083415$, $\mu_4 \approx 0.012555$, and $v(v(S)) \approx 0.000588$, if $A$ were an ideal population of 2048 numbers uniformly distributed on $[0,1]$.

Binomial Distribution. $A$ is a population generated by a random variable of the binomial distribution with $k = 11$ and $p = 0.5$:

$$\mu = 5.4985, \qquad \mu_2 = 2.7783, \qquad \mu_4 = 22.4375, \qquad v(v(S)) = 0.9148.$$

Poisson Distribution. $A$ is a population generated by a random variable of the Poisson distribution with $\lambda = 3$:

$$\mu = 3.0176, \qquad \mu_2 = 3.0973, \qquad \mu_4 = 36.2807, \qquad v(v(S)) = 1.3958.$$

Hypergeometric Distribution. $A$ is a population generated by a random variable of the hypergeometric distribution with $M = 10$, $K = 5$ and $n = 5$:

$$\mu = 2.4829, \qquad \mu_2 = 0.7048, \qquad \mu_4 = 1.3838, \qquad v(v(S)) = 0.0570.$$

Normal Distribution. $A$ is a population generated by a random variable of the standard normal distribution:

$$\mu = 0.0040, \qquad \mu_2 = 1.0826, \qquad \mu_4 = 3.6113, \qquad v(v(S)) = 0.1452.$$

Acknowledgements

The first author was partially supported by Kentucky State University.

References

[1] W.G. Cochran, Sampling Techniques, Third Edition, John Wiley (1977), 29, 96.
[2] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics, Addison-Wesley (1989).
[3] J.W. Tukey, Some sampling simplified, Journal of the American Statistical Association, 45 (1950), 501-519.
[4] J.W. Tukey, Keeping moment-like sampling computation simple, The Annals of Mathematical Statistics, 27 (1956), 37-54.

[5] J.W. Tukey, Variances of variance components: I. Balanced designs, The Annals of Mathematical Statistics, 27 (1956), 722-736.
[6] J.W. Tukey, Variances of variance components: II. Unbalanced single classifications, The Annals of Mathematical Statistics, 28 (1957), 43-56.
[7] J.W. Tukey, Variance components: III. The third moment in a balanced single classification, The Annals of Mathematical Statistics, 28 (1957), 378-384.
[8] J. Wishart, Moment coefficients of the k-statistics in samples from a finite population, Biometrika, 39 (1952), 1-13.