Multistage Methodologies for Partitioning a Set of Exponential. populations.

Similar documents
A Generalization of The Partition Problem in Statistics

SELECTING THE NORMAL POPULATION WITH THE SMALLEST COEFFICIENT OF VARIATION

FIXED-WIDTH CONFIDENCE INTERVAL FOR A LOGNORMAL MEAN

Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds.

Selecting the Best Simulated System. Dave Goldsman. Georgia Tech. July 3 4, 2003

A GENERAL FRAMEWORK FOR THE ASYMPTOTIC VALIDITY OF TWO-STAGE PROCEDURES FOR SELECTION AND MULTIPLE COMPARISONS WITH CONSISTENT VARIANCE ESTIMATORS

Proceedings of the 2014 Winter Simulation Conference A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds.

Theory of Screening Procedures to Identify Robust Product Designs Using Fractional Factorial Experiments

Lecture 3. Inference about multivariate normal distribution

PERFORMANCE OF VARIANCE UPDATING RANKING AND SELECTION PROCEDURES

Laplace Distribution Based Stochastic Models

Invariant HPD credible sets and MAP estimators

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Complexity of two and multi-stage stochastic programming problems

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

FINDING THE BEST IN THE PRESENCE OF A STOCHASTIC CONSTRAINT. Sigrún Andradóttir David Goldsman Seong-Hee Kim

HIGHER THAN SECOND-ORDER APPROXIMATIONS VIA TWO-STAGE SAMPLING

Testing Algebraic Hypotheses

On prediction and density estimation Peter McCullagh University of Chicago December 2004

Empirical Power of Four Statistical Tests in One Way Layout

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Modified Simes Critical Values Under Positive Dependence

ISSN X On misclassification probabilities of linear and quadratic classifiers

Indifference-Zone-Free Selection of the Best

On the rate of convergence in limit theorems for geometric sums of i.i.d. random variables

E-Companion to Fully Sequential Procedures for Large-Scale Ranking-and-Selection Problems in Parallel Computing Environments

Nonstationary Panels

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC

6-1. Canonical Correlation Analysis

Ranking and Selection Procedures for Bernoulli and Multinomial Data. Gwendolyn Joy Malone

MONTE CARLO ANALYSIS OF CHANGE POINT ESTIMATORS

Multistage Tests of Multiple Hypotheses

J.l(Fi) == EFiXill SELECTING THE BEST SYSTEM IN TRANSIENT SIMULATIONS WITH VARIANCES KNOWN ABSTRACT 1 INTRODUCTION

Biased Urn Theory. Agner Fog October 4, 2007

A Note on Certain Stability and Limiting Properties of ν-infinitely divisible distributions

2 Mathematical Model, Sequential Probability Ratio Test, Distortions

Seminar über Statistik FS2008: Model Selection

Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C.

Exercises Chapter 4 Statistical Hypothesis Testing

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition

Econometric Methods for Panel Data

STATISTICAL SCREENING, SELECTION, AND MULTIPLE COMPARISON PROCEDURES IN COMPUTER SIMULATION

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E.

p(x ω i 0.4 ω 2 ω

NEW RESULTS ON PROCEDURES THAT SELECT THE BEST SYSTEM USING CRN. Stephen E. Chick Koichiro Inoue

On the Asymptotic Validity of Fully Sequential Selection Procedures for Steady-State Simulation

Abstract Dynamic Programming

Research Article Multiple-Decision Procedures for Testing the Homogeneity of Mean for k Exponential Distributions

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices

Proceedings of the 2015 Winter Simulation Conference L. Yilmaz, W. K V. Chan, I. Moon, T. M. K. Roeder, C. Macal, and M. D. Rossetti, eds.

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

arxiv: v1 [stat.me] 14 Jan 2019

Comparing Forecast Accuracy of Different Models for Prices of Metal Commodities

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control

Research Article The Laplace Likelihood Ratio Test for Heteroscedasticity

Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm. by Korbinian Schwinger

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds.

A Maximally Controllable Indifference-Zone Policy

Approximation of Average Run Length of Moving Sum Algorithms Using Multivariate Probabilities

The Level and Power of a Studentized Range Test for Testing the Equivalence of Means

CS 195-5: Machine Learning Problem Set 1

ORDER RESTRICTED STATISTICAL INFERENCE ON LORENZ CURVES OF PARETO DISTRIBUTIONS. Myongsik Oh. 1. Introduction

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

MATH5745 Multivariate Methods Lecture 07

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Long-Run Covariability

GOTEBORG UNIVERSITY. Department of Statistics

Does k-th Moment Exist?

Mi-Hwa Ko. t=1 Z t is true. j=0

A Simulation Comparison Study for Estimating the Process Capability Index C pm with Asymmetric Tolerances

Research Article Optimal Portfolio Estimation for Dependent Financial Returns with Generalized Empirical Likelihood

THE UNIVERSITY OF BRITISH COLUMBIA DEPARTMENT OF STATISTICS TECHNICAL REPORT #253 RIZVI-SOBEL SUBSET SELECTION WITH UNEQUAL SAMPLE SIZES

COMBINED RANKING AND SELECTION WITH CONTROL VARIATES. Shing Chih Tsai Barry L. Nelson

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles

Construction and analysis of Es 2 efficient supersaturated designs

Fully Sequential Selection Procedures with Control. Variates

Probability Theory and Statistics. Peter Jochumzen

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

ON CONSTRUCTING T CONTROL CHARTS FOR RETROSPECTIVE EXAMINATION. Gunabushanam Nedumaran Oracle Corporation 1133 Esters Road #602 Irving, TX 75061

STAT 7032 Probability Spring Wlodek Bryc

Generalized nonparametric tests for one-sample location problem based on sub-samples

Inference on reliability in two-parameter exponential stress strength model

Minimum average value-at-risk for finite horizon semi-markov decision processes

General Notation. Exercises and Problems

Eves Equihoop, The Card Game SET, and Abstract SET

Economics 583: Econometric Theory I A Primer on Asymptotics

Test of fit for a Laplace distribution against heavier tailed alternatives

Detection of structural breaks in multivariate time series

The Third International Workshop in Sequential Methodologies

Conway s RATS Sequences in Base 3

Effective Sample Size in Spatial Modeling

Research Article Degenerate-Generalized Likelihood Ratio Test for One-Sided Composite Hypotheses

Random Eigenvalue Problems Revisited

A Fully Sequential Elimination Procedure for Indi erence-zone Ranking and Selection with Tight Bounds on Probability of Correct Selection

A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification

Bridging the Gap between Center and Tail for Multiscale Processes

Transcription:

Multistage Methodologies for Partitioning a Set of Exponential Populations Department of Mathematics, University of New Orleans, 2000 Lakefront, New Orleans, LA 70148, USA tsolanky@uno.edu Tumulesh K. S. Solanky Abstract. We consider the problem of partitioning a set of k exponential populations with respect to a control population. For this problem some multistage methodologies are proposed and their theoretical properties are derived. Using the Monte Carlo simulation techniques, the small and moderate sample size performance of the proposed procedure are studied. Keywords. Correct partition, Probability of correct decision, sequential procedure, simulations, two-parameter negative exponential populations. 1 Introduction The problem of comparisons with a control has been an active research area for over seven decades. The desire of the experimenter to have the best population to be some specified amount better than what is already in use has motivated the research in this area. This problem has been studied by many and under different formulations. Tong (1969) formulated the partition problem using Bechhofer s (1954) indifference zone formulation and constructed a two-stage and a purely sequential procedure. Among slightly different formulations of this problem is the problem of selection of the best treatment relative to a control population or selecting a subset of the treatments having means greater than that of a control. For more on such related formulations one is recommended Bechhofer, Santner and Goldsman (1995). For a general overview of sequential methodology and of the partition problem, the reader is recommended Ghosh, Mukhopadhyay and Sen (1997), and, Mukhopadhyay and Solanky (1994). Suppose that we have π 0, π 1,..., π k, independent and exponentially distributed populations, with density function of π i, i = 0, 1,, k given by f X (x) = σ 1 exp{ (x θ i )/σ}i(x > θ i ). Assume that the location parameters θ i, i = 0, 1,, k and the common scale parameter σ are all unknown. We refer to π 0 as the control population. The goal is to partition the set of treatments Ω = (π i : i = 1, 2,, k), into two disjoint and exhaustive subsets, corresponding to Good and Bad populations compared to the control population with a pre specified probability of correct partition. Given arbitrary but fixed constants δ 1 and δ 2, δ 1 < δ 2, we define three subsets of Ω along the lines of Bechhofer s (1954) indifference-zone formulation, as: Ω L = {π i : θ i θ 0 + δ 1, i = 1,, k}, Ω M = {π i : θ 0 + δ 1 < θ i < θ 0 + δ 2, i = 1,, k}, Ω R = {π i : θ i θ 0 + δ 2, i = 1,, k}. (1) We refer to Ω R as the set of good populations and Ω L as the set of bad populations. The set Ω M would be referred to as the set of mediocre populations. Adopting the Bechhofer s indifference zone approach, we are interested in the correct partition of the populations in Ω R and Ω L. And, we will be indifferent to partition of populations in Ω M. That is, with high accuracy we want to partition the set Ω into two disjoint subsets P L and P R, such that, Ω L P L and Ω R P R. Such a partition is known in the literature as a correct decision (CD). In other words, given a pre assigned number P, 2 k < P < 1, we seek statistical methodologies to determine P L and P R, such that P {CD θ, σ 2, } P θ R k+1, σ R +. (2)

2 Solanky We will use the following notation in the rest of this paper for convenience: d = { (δ 1 + δ 2 )/2, a = ( δ 1 + δ 2 )/2, λ = σ/a, and, k/2 if k is even; r = (k + 1)/2 if k is odd. (3) 2 Known σ case We assume that we observe random variables X 0j, X 1j,, X kj from π 0, π 1,, π k, j = 1,, n, respectively, in a sequential framework, where n is to be determined below. Assuming that σ is known, we denote T i = min 1 j n (X ij ), i = 0, 1,, n. (4) Along the lines of Desu et al. (1977), we define a pooled estimator of σ as ˆσ = Σ k i=0σ n j=1(x ij T i )/((k + 1)(n 1)). (5) Note that 2(k+1)(n 1)ˆσ/σ is independent of T i and has a chi-squared distribution with 2(k+1)(n 1) degrees of freedom. Next, we Consider the decision rule defined as: P L = {π i : T i T 0 < d, i = 1,, k}, P R = {π i : T i T 0 > d, i = 1,, k}. (6) Next, observe that for a mean vector θ to be a least favorable configuration under, the set Ω M must be empty, and, all the populations in Ω L and Ω R must have common means θ 0 + δ 1 and θ 0 + δ 2, respectively. Let θ 0 (r ) be the configuration such that θ i = µ 0 + δ 2 and θ j = µ 0 + δ 1, 0 < i r, r < j k for some r such that 0 < r k. Then, we have P CD θ 0 (r ), σ, Tj θ = P j T 0 θ 0 = P T j T 0 < d, T i T 0 > d, 0 < i r, r < j k, < d θ j+θ 0, T i θ i T 0 θ 0 > d θ i+θ 0, 0 < i r, r < j k. Next, we write Y j = T j θ j, r < j k, Y i = T i θ i, 0 < i r, Y 0 = T 0 θ 0. Note that Y i, Y j and Y 0, 0 < i r, r < j k all have standard exponential distributions. Note that for r < j k, d θ j +θ 0 d δ 1, and for 0 < i r, d θ i+θ 0 P CD θ 0 (r ), σ, d δ 2. Next using the notations from (3), we have P Y j Y 0 < an σ, Y 0 Y i < an σ, 0 < i r, r < j k. (7) Let the (k k) covariance matrix Σ r = (σ ij ) be given by 1 for i = j, σ ij = 1 for i j, and, 0 < i, j r or r < i, j k, 1 for 0 < i r, and, r < j k. (8) Let us define z j = Y j Y 0 for r < j k and z i = Y 0 Y i for 0 < i r. Note that the vector Z = (z 1,..., z k ) has a symmetric multivariate Laplace distribution with mean vector zero and the covariance matrix Σ r as defined above. For details on Multivariate Laplace Distribution one may look at Kotz et al. (2001). Note that (7) gives the infimum of the probability of correct decision under for

Partitioning Exponential Populations 3 the set of all configurations such that there are r populations in Ω R and k r in Ω L. Next, along the lines of Tong (1969), one can derive the Least Favorable Configuration (LFC) under the decision rule as: µ 1 = = µ r = µ 0 + δ 2, and, µ r+1 = = µ k = µ 0 + δ 1, where r is defined in (3). We will refer to the LFC as θ 0. Next, along the lines of (8) with r in place of r, we define the covariance matrix Σ as: 1 1 1 1........ Σ = 1 1 1 1 1 1 1 1. (9)........ 1 1 1 1 Next, let b = b(p, k) be the solution of the equation P = b b 2(2π) k 2 Σ 1 2 (Y Σ 1 Y /2) ν/2 K ν ( 2Y Σ 1 Y ) k dy i, (10) where ν = (2 k)/2 and K ν (.) is the modified Bessel function of the third kind given by K ν (u) = 1 u ν 2 2 0 t ν 1 exp( t u2 4t dt, u > 0. Then, one can immediately note that provided n satisfies P CD θ, σ, P, n bσ a (= n, say). (11) In other words, if σ is known, and one collects a sample of size n from each of π 0, π 1,..., π k, and, uses the decision rule given by (6) to partition the k populations, then the probability requirement (2) is achieved. However, if σ is unknown, then there does not exist a single-stage procedure which can achieve the probability requirement (2). For general details on non-existence of a single-stage procedure, one may refer to Dudewicz (1971). For normal populations case, Tong (1969) gave a single-stage procedure for the partition problem when the common variance known. The single-stage procedure provided above is an extension of Tong s (1969) single-stage procedure. For normal populations case when the common variance is unknown, Tong (1969) constructed a two-stage and a purely sequential procedure. Datta and Mukhopadhyay (1998) studied this problem further and constructed a fine-tuned purely sequential procedure and some other multistage methodologies, emphasizing the second-order asymptotics. Solanky (2001) has constructed an elimination type procedure for the normal populations partition problem which takes samples of unequal sizes. The reader is also recommended to look at Chen and Rollin (2004), Aoshima and Takada (2000), and Solanky (2006), who have studied various aspects of the partition problem. Many other additional references to the partition problem are available in the articles mentioned in this paragraph. Next, We construct a purely sequential procedure for this problem. The theoretical properties of the proposed procedures are derived and verified using Monte Carlo simulation studies. i=1 3 Purely sequential procedure The purely sequential procedure starts with observations X 0j, X 1j,, X kj, j = 1,, m, where m ( 2) is the starting sample size from π 0, π 1,, π k. After this, the sampling continues with one observation from π 0, π 1,, π k, at each step, according to the stopping rule N = inf{n m : n bˆσ }, (12) a where ˆσ is an estimator of σ based on a sample of size n along the lines of (5).

4 Solanky Theorem 1. For the purely sequential procedure (12) and using the decision rule (6) based on a sample of size N from π 0, π 1,, π k, we have as a 0: (i) N/n c 1 w.p. 1; (ii) E(N/n c) 1; (iii) lim inf P (CD) P for all µ R k+1 ; where n c = bσ a and b comes from (10). Proof: The proof follows along the lines of the proof of the Theorem 4.4.1 of Mukhopadhyay and Solanky (1994). The details are omitted for brevity. 4 Monte carlo simulations In this section, we will study the performance of the purely sequential procedure (12) via Monte Carlo simulation studies. The purely sequential procedure (12) was simulated for m = 10, k = 10 and P =.95, under a LFC. Without loss of generality we took σ = 1 for the purpose of generating populations. We took δ 1 = δ 2, giving a = δ 2 (= δ, say). Next, using n = bσ a, we computed the values of δ corresponding to n = 25, 50, 75, and 100. Then, each procedure was independently repeated 1000 times. The performance of the purely sequential procedure (12) is summarized in the Figure 1 via ploting the average value of the observed P(CS) against the optimal sample sizes. The Figure 1 shows that even for small sample sizes, the purely sequential procedure achieves the target value 0.95 quite precisely. Fig. 1. Performance of the purely sequential procedure (12) for: k=10, m=10, P =.95 References Aoshima, M. and Takada, Y. (2000). Second order properties of a two stage procedure for comparing several treatments with a control. Journal of Japan Statistical Society, 30, No. 1, 27-41. Bechhofer, R. E. (1954). A single-sample multiple decision procedure for ranking means of normal populations with known variances. Annals of Mathematical Statistics, 25, 16-39. Bechhofer, R. E., Santner, T.J, and Goldsman, D.M. (1995). Design and analysis of Experiments for statistical selection, screening, and multiple comparisons, John Wiley & Sons, Inc., New York. Chen, P. and Rollin, L.M. (2004). A two-stage selection and testing design for comparing several normal means with a standard. Sequential Analysis, 23, 75-101. Datta, S. and Mukhopadhyay, N. (1998). Second-order asymptotics for multistage methodologies in partitioning a set of normal populations having a common unknown variance. Statistics and Decisions, 16, No. 2, 191-205. Dudewicz, E.J. (1971). Non-existence of a single-sample selection procedure whose P(CS) is independent of the variance. South African Statistics Journal, 5, No. 4, 37-39. Ghosh, M., Mukhopadhyay, N. and Sen, P. K. (1997). Sequential Estimation, John Wiley & Sons, Inc., New York.

Partitioning Exponential Populations 5 Kotz, S., Kozubowski, T.J. and Podgórski, K. (2001). The Laplace Distribution and Generalizations: A Revisit With Applications to Communications, Economics, Engineering, and Finance, Birkhäuser, Boston. Mukhopadhyay, N. and Solanky, T. K. S. (1994). Multistage selection and ranking procedures, Marcel Dekker, New York. Solanky, T.K.S. (2001). A sequential procedure with elimination for partitioning a set of normal populations having a common unknown variance. Sequential Analysis, 20, No. 4, 279-292. Solanky, T.K.S. (2006). A two-stage procedure with elimination for partitioning a set of normal populations with respect to a control. Sequential Analysis, 25, No. 3, 297-310. Tong, Y. L. (1969). On partitioning a set of normal populations by their locations with respect to a control. Annals of Mathematical Statistics, 40, 1300-1324.